
List:       mysql
Subject:    Re: optimization help
From:       Brent Baisley <brenttech () gmail ! com>
Date:       2007-06-27 23:47:20
Message-ID: 13E18BA8-CDF7-4D74-A699-ABC0E03AB67F () gmail ! com

That's quite a query. You may not be able to optimize it well with
those nested selects. You may want to think about restructuring it a
little, perhaps joining pieces of data in whatever programming language
you're using on the front end. Right now MySQL is doing a lot of work
and probably transferring a lot of data. If some of those subselects
are pulling data that is redundant across many rows, it may be more
efficient to join it on the front end using arrays or something similar.
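
For example (just a sketch of the shape, untested, using the table and
column names from the query you posted below), the two correlated
lookups into config__DataProductAttributes could become plain joins
instead of being re-run for every output row:

   -- Sketch only: fetch the per-product attributes with LEFT JOINs
   -- rather than correlated subselects.
   SELECT
      E.product_id,
      D.top_level_product_name,
      fmt.attribute_value  AS format_string,
      conv.attribute_value AS conv_params
   FROM display__DataProducts_in_Element AS E
   INNER JOIN config__DataProducts AS D
         ON E.product_id = D.product_id
   LEFT JOIN config__DataProductAttributes AS fmt
         ON fmt.product_id = E.product_id
        AND fmt.attribute_name = 'FormatString'
   LEFT JOIN config__DataProductAttributes AS conv
         ON conv.product_id = E.product_id
        AND conv.attribute_name = 'ConvParams'
   WHERE E.element_id = id;

That way the attribute values come back once per product as ordinary
columns, and any per-row formatting can happen on the front end.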

Another alternative would be to use a different table type, like a
MERGE table. That lets you query multiple tables as one, which is
something you had asked about. You need to be aware of its limitations,
though, such as unique indexes not being enforced across the underlying
tables. So if you want to use auto_increment, you need to set the
starting value yourself whenever you create a new table to add to the
MERGE setup.
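
Roughly, a MERGE setup over per-test tables could look like this (just
a sketch; the per-test table names and the starting auto_increment
value are placeholders, and the column list mirrors the SHOW CREATE
TABLE output you posted):

   CREATE TABLE data__Frames_test18 LIKE data__ProcessedDataFrames;
   CREATE TABLE data__Frames_test19 LIKE data__ProcessedDataFrames;

   CREATE TABLE data__AllFrames (
     `processed_id` int(10) unsigned NOT NULL auto_increment,
     `top_level_product_name` varchar(255) default NULL,
     `test_id` int(10) unsigned default NULL,
     `payload_time` double default NULL,
     `universal_time` double default NULL,
     `processed_data` mediumblob,
     KEY (`processed_id`),
     KEY `test_id` (`test_id`),
     KEY `payload_time` (`payload_time`),
     KEY `top_level_product_name` (`top_level_product_name`)
   ) ENGINE=MERGE UNION=(data__Frames_test18, data__Frames_test19)
     INSERT_METHOD=LAST DEFAULT CHARSET=latin1;

   -- Uniqueness is not enforced across the underlying tables, so bump
   -- each new table's auto_increment past the current maximum before
   -- adding it to the UNION list.
   CREATE TABLE data__Frames_test20 LIKE data__ProcessedDataFrames;
   ALTER TABLE data__Frames_test20 AUTO_INCREMENT = 2000000;
   ALTER TABLE data__AllFrames
     UNION = (data__Frames_test18, data__Frames_test19, data__Frames_test20);

Queries against data__AllFrames then see all the per-test tables as
one, and merging a tester's data into your main repository becomes a
matter of copying its table over and adding it to the UNION list.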


On Jun 27, 2007, at 12:16 PM, Dave G wrote:

> I think I can do that:
>
> I don't have any other indexes, just the keys.
>
> mysql> show create table data__ProcessedDataFrames;
> +----------------------------+----------------------------------------------+
> | Table                      | Create Table                                 |
> +----------------------------+----------------------------------------------+
> | data__ProcessedDataFrames | CREATE TABLE `data__ProcessedDataFrames` (
>   `processed_id` int(10) unsigned NOT NULL auto_increment,
>   `top_level_product_name` varchar(255) default NULL,
>   `test_id` int(10) unsigned default NULL,
>   `payload_time` double default NULL,
>   `universal_time` double default NULL,
>   `processed_data` mediumblob,
>   PRIMARY KEY  (`processed_id`),
>   KEY `test_id` (`test_id`),
>   KEY `payload_time` (`payload_time`),
>   KEY `top_level_product_name` (`top_level_product_name`)
> ) ENGINE=MyISAM AUTO_INCREMENT=1392568 DEFAULT CHARSET=latin1 |
> +----------------------------+----------------------------------------------+
> 1 row in set (0.00 sec)
>
> mysql>
>
> As for the amount of data I expect to get ... infinite, really.
>
> Our setup: we have several boxes we run tests from, with MySQL as the
> storage engine on each local box, and then a main storage area for all
> relevant tests.  Based on past data, the table will be pushing 4G as it
> is (when we port the data), and I expect at least that much more over
> the life of this software; but since the payloads I will be getting the
> data from have not been developed yet, I can't be entirely sure.  One
> of the reasons I was asking whether breaking it up into several tables
> would be a good idea is that it would make it easier for me to merge
> the data from the different testers into the main data repository.
> Otherwise I will have to figure out a good way of redoing the test_id
> in each test that is stored in the main repository.
>
> Slow queries will be a little hard to show without giving a full
> evaluation of my system, so I'll simplify it a little.  I'm doing
> several joins in a stored procedure to get the right parameters to
> query this table ... but when it comes down to it, the query on this
> table is the big one.  I can modify my other joins; making the query on
> this table fast is my concern.  Example query:
>
> select payload_time, HEX(processed_data) from data__ProcessedDataFrames
>   where test_id=18 AND top_level_product_name="DataProduct"
>   AND payload_time > 118080000.74704 AND payload_time < 1180564096.24967;
>
> What I'm concerned about is that with how much data I will eventually
> have, even scanning over the keys will take a long time.
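>
> Would one composite key that matches that WHERE clause help here?
> Something like this (just a guess on my part, untested):
>
>   ALTER TABLE data__ProcessedDataFrames
>     ADD KEY test_product_time (test_id, top_level_product_name, payload_time);
>
> The thought being that the equality matches on test_id and
> top_level_product_name would narrow the index first, and the range on
> payload_time would then only scan that slice rather than a whole
> single-column key.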
>
> Thanks
>
> Dave G.
>
> BTW: here's the giant query that I use.
>    SELECT
>       E.product_id,
>       product_name,
>       D.top_level_product_name,
>       processed_id,
>       product_offset,
>       product_length,
>       version_id,
>       byte_order,
>       ROUND(R.payload_time,6) as payload_time,
>       SUBSTR(
>          BINARY(processed_data),
>          FLOOR(product_offset/8)+1,
>          CEIL(product_length/8)) as substring,
>       (SELECT HEX(substring)) as raw_data,
>       (SELECT toString(
>          substring,
>          round(char_length(raw_data)/2,0),
>          data_type,
>          (SELECT attribute_value FROM config__DataProductAttributes
>             WHERE attribute_name = 'FormatString' AND
>                   config__DataProductAttributes.product_id = E.product_id),
>          product_offset % 8,
>          (product_length + (product_offset % 8)) % 8,
>          product_length,
>          byte_order,
>          (SELECT attribute_value FROM config__DataProductAttributes
>             WHERE attribute_name = 'ConvParams' AND
>                   config__DataProductAttributes.product_id = E.product_id))) as converted_data,
>       (SELECT enum_name FROM config__DataProductEnumConversions WHERE
>          product_id = E.product_id AND
>          enum_value = converted_data) as enumerated_data,
>       (SELECT metricTest(converted_data,
>          (SELECT xmlTestMetric FROM test__TestMetrics
>           WHERE product_id = E.product_id))) as test_metric,
>       data_type,
>       R.test_id
>    FROM display__DataProducts_in_Element AS E INNER JOIN
>       (config__DataProducts AS D INNER JOIN
>          (data__ProcessedDataFrames AS R INNER JOIN
>             (SELECT E.element_id, R.test_id, R.top_level_product_name,
>                     max(round(payload_time,6)) as payload_time
>              FROM display__DataProducts_in_Element E, config__DataProducts D,
>                   data__ProcessedDataFrames R
>              WHERE payload_time <= ptime AND test_id like str_tid AND
>                    D.top_level_product_name = R.top_level_product_name AND
>                    E.product_id = D.product_id AND
>                    E.element_id = id
>              GROUP BY D.top_level_product_name) AS S
>           ON R.test_id = S.test_id AND
>              R.top_level_product_name = S.top_level_product_name AND
>              R.payload_time = S.payload_time)
>        ON D.top_level_product_name = R.top_level_product_name)
>     ON E.product_id = D.product_id
>    WHERE E.element_id = id
>    GROUP BY product_id;
>
> toString is a UDF I wrote for data conversions.  The config__ tables
> are used to interpret the blob in the data__ProcessedDataFrames table.
>
> Thanks again for the help.
>
>> Relevant data might include the actual schema for your tables (DESC
>> does not tell us what indexes you have or what table types you are
>> using) and samples of the queries that are running slowly.  Any
>> estimates you may have worked up about how much data you expect this
>> thing to hold would also be useful.
>>
>>  - michael dykman
>>
>>
>> On 6/27/07, Dave G <mysql@godseyfamily.com> wrote:
>>> I have a table in my database (currently) that grows to be huge (and
>>> I need to keep the data).  I'm in a redesign phase and I'm trying to
>>> do it right.  So here are the relevant details:
>>>
>>> The table has several keys involved:
>>>
>>> mysql> desc data__ProcessedDataFrames;
>>> +------------------------+------------------+------+-----+---------+----------------+
>>> | Field                  | Type             | Null | Key | Default | Extra          |
>>> +------------------------+------------------+------+-----+---------+----------------+
>>> | processed_id           | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
>>> | top_level_product_name | varchar(255)     | YES  | MUL | NULL    |                |
>>> | test_id                | int(10) unsigned | YES  | MUL | NULL    |                |
>>> | payload_time           | double           | YES  | MUL | NULL    |                |
>>> | universal_time         | double           | YES  |     | NULL    |                |
>>> | processed_data         | mediumblob       | YES  |     | NULL    |                |
>>> +------------------------+------------------+------+-----+---------+----------------+
>>> 6 rows in set (0.00 sec)
>>>
>>> This is the table that contains the data I'm interested in currently.
>>> Queries on this table when it gets large are slow as molasses.  I'm
>>> thinking about making a new table for anything with a different
>>> test_id ... any opinions as to whether this is good or bad?
>>>
>>> Before you make fun of me for my questions, I am a bit new to
>>> database programming.
>>>
>>> If it is better design to break it into smaller tables (for speed,
>>> anyway), then I would need to know how to query over multiple tables
>>> as though they were one table.  A join will do this, but that takes
>>> forever (unless of course I'm doing it wrong), so that's not a good
>>> option.  I need to be able to query over multiple test_ids, which
>>> will be multiple tables, for a specific top_level_product_name,
>>> within some time range (using payload_time).
>>>
>>> Any help would be appreciated.  I will happily give more information
>>> if you need it to offer an educated opinion.
>>>
>>> Thanks
>>>
>>> David Godsey
>>>
>>>
>>
>>
>> --
>>  - michael dykman
>>  - mdykman@gmail.com
>>
>>  - All models are wrong.  Some models are useful.
>>
>
>
>


