
List:       mysql-java
Subject:    Re: Slow  Query Question - Need help of Gurus.
From:       Michael Stassen <Michael.Stassen () verizon ! net>
Date:       2004-04-19 17:08:13
Message-ID: 4084077D.8010102 () verizon ! net



Ben Dinnerville wrote:

<snip>
>>>Note that sorting by the count can't use an index, so it will be slower
>>>than if you had ordered by `Call Svc Tag ID`.
> 
> This is something that will need testing. Ordering on a varchar(255) column
> (call svc tag ID) is going to be a lot less efficient than ordering on a
> numeric column, especially when the ordering is happening on the reduced
> result set of the group by and predicate clauses, however the indexing
> available on the "real" column may negate any deficit.

The ordering happens on the reduced result set either way, so that isn't 
relevant here.  The key point is the index.  Without any index, it is 
certainly true that a varchar(255) should take longer to sort than an int, 
but in this case the varchar column is already sorted in the index, while 
the int is the result of a calculation.  No additional work is needed to 
sort the varchar, while the int must be sorted.  On the other hand, sorting 
ints is usually fast, so I don't think this should be a big factor.
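
To make the contrast concrete, here's a sketch of the two orderings (the 
table name `journal` is my assumption; `Call Svc Tag ID` is from the 
original query):

```sql
-- Sorting on the indexed column: rows can come out of the index
-- already in order, so no extra sort pass is needed.
SELECT `Call Svc Tag ID`, COUNT(*) AS cnt
FROM journal
GROUP BY `Call Svc Tag ID`
ORDER BY `Call Svc Tag ID`;

-- Sorting on the aggregate: cnt only exists after grouping, so
-- MySQL must sort the (already reduced) grouped result itself.
SELECT `Call Svc Tag ID`, COUNT(*) AS cnt
FROM journal
GROUP BY `Call Svc Tag ID`
ORDER BY cnt;
```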

>>> MySQL handles BETWEEN just fine. If you think about it, you
>>> explicitly set the range with BETWEEN, but the optimizer has to put
>>> the two inequalities joined with AND together to get the same range.
>>> In other words, BETWEEN is  easier.
> 
> I am sure that MySQL does handle the "between" just fine, but this is not to
> say that it will handle the between better than a gt / lt clause in this
> particular scenario. I know the between sounds like it might be easier, but
> internally the between is likely to be translated into a gt / lt clause by
> the optimiser anyhow as the only way to check if something is between 2
> values is to check if it is greater than the min and less than the max.
> Just give the 2 different queries a go and see if there is any time
> difference.

I won't pretend to be an expert on the internals of the optimizer, but I 
think you are missing the value of the index.  Another way to tell if a 
value is between two others is to check its relative position in the index.
Both the between and the 2 inequalities define a range.  With the index,
the optimizer need merely find the start and end of the range and then grab 
everything from the one to the other.  I am certainly in favor of testing, 
but I'd be surprised if you saw a measurable difference between the two.
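
If you want to see what the optimizer actually does, EXPLAIN both forms; 
they should report the same range access on the index (the table and 
column names here are my guesses, not from the original schema):

```sql
EXPLAIN SELECT * FROM journal
WHERE `Journal Create Date` BETWEEN '2004-01-01' AND '2004-03-31';

-- The equivalent pair of inequalities:
EXPLAIN SELECT * FROM journal
WHERE `Journal Create Date` >= '2004-01-01'
  AND `Journal Create Date` <= '2004-03-31';
```

Both define the same closed range, so with an index on the column the 
optimizer finds the start and end keys and scans between them either way.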

> Also noticed that the table seems to be a fairly flat structure (hard to
> tell definitely from looking at one sample). By this I mean that there are a
> lot of varchar(255)'s in the table, and looking at the naming of the
> columns, these seem to be "id" type fields. You might be able to gain some
> performance by normalising the database a little by moving some of the large
> varchar fields out to a related table and referencing by a numeric type id,
> this could greatly reduce the amount of data in your base table, especially
> considering the number of rows you are talking about storing and could give
> you an IO based performance increase. Once again, might need to test and
> play around a little with different models here.

Excellent point.  Normalizing would help a lot.  Unless there are 500,000 
Call Reps, there's a LOT of wasted space in this table.  To make matters 
worse, the rows are variable length, so there's a penalty for wasted space. 
At the least, the Call Rep info should be in its own table, and the 6 Call
Rep columns should be replaced with an int column containing the Call Rep 
key.  Similarly, the Case Status and Substatus should be moved to their own 
table(s?).  If at all possible, all columns should be fixed length (CHARs) of 
a reasonable size (will you ever really have a 255 char Call Rep Employee 
Name?) rather than variable length (VARCHARs).  Usually, I'd say that disk 
is cheap, so go ahead and waste some space to save on time, but in this 
case, I expect we'll save so much space from normalizing that even changing 
VARCHAR to CHAR, we'll still end up ahead space-wise.
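
For example (all the names and sizes here are my guesses, just to show the 
shape of the change):

```sql
-- New lookup table holding the Call Rep info once per rep.
CREATE TABLE call_rep (
  call_rep_id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  employee_name  CHAR(50) NOT NULL
  -- ...the other Call Rep columns, sized realistically
);

-- In the main table, the 6 Call Rep varchar(255) columns become one:
--   `Call Rep ID` INT NOT NULL
-- with a join back to call_rep when the details are needed.
```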

Dropping any unused indexes could also help, in general.  For example, 
unless we need to select based on time of day without regard to date, the 
index on `Journal Create Time` is unlikely to be used.  In that case drop 
both date and time indexes and replace them with one multicolumn index.
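
Something along these lines (the index names are assumed; check SHOW INDEX 
for the real ones):

```sql
ALTER TABLE journal
  DROP INDEX idx_journal_create_date,
  DROP INDEX idx_journal_create_time,
  ADD INDEX idx_journal_created (`Journal Create Date`, `Journal Create Time`);
```

The composite index still serves date-only lookups (leftmost prefix) and 
also handles date-plus-time ranges.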

<snip>
> Also, are you looking for null values, or the word "null" in the column?  

I noticed that too and forgot to mention it.
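
For the record, the two tests look like this (column name assumed):

```sql
-- Rows where the column really is NULL:
SELECT COUNT(*) FROM journal WHERE `Call Svc Tag ID` IS NULL;

-- Rows where the column holds the four-character string 'null':
SELECT COUNT(*) FROM journal WHERE `Call Svc Tag ID` = 'null';

-- Note: WHERE `Call Svc Tag ID` = NULL matches nothing at all,
-- because comparing with NULL yields unknown; you must use IS NULL.
```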

Michael


-- 
MySQL Java Mailing List
For list archives: http://lists.mysql.com/java
To unsubscribe:    http://lists.mysql.com/java?unsub=mysql-java@progressive-comp.com

