'Re: Performance boost by splitting up large table?'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       mysql
Subject:    Re: Performance boost by splitting up large table?
From:       Larry Martell <larry.martell () gmail ! com>
Date:       2014-05-16 0:03:16
Message-ID: CACwCsY51A+G9zfsrRpbECy_QzuH3x6da=H36jL2-ZbpSn0mFVg () mail ! gmail ! com
[Download RAW message or body]

On Thu, May 15, 2014 at 11:01 AM, Johan De Meersman <vegivamp@tuxera.be> wrote:
> ----- Original Message -----
> > From: "Larry Martell" <larry.martell@gmail.com>
> > Subject: Re: Performance boost by splitting up large table?
> > 
> > This table is queried based on requests from the users. There are 10
> > different lookup columns they can specify, and they can provide any or
> 
> That makes it rather more of a bother, as MySQL can't (yet) skip columns in an \
> index, as far as I'm aware. Someone please correct me if I'm wrong here. 
> 
> > all of these. Currently each one of the columns has an index on it.
> 
> I'm a bit fuzzy on multiple-index queries, but I think support isn't all too sharp.
> 
> 
> > Would it be beneficial to create an index with all 10? Rarely are all
> > 10 specified in the query. Typically it's 3 or 4. Would it be
> > worthwhile to see which are much commonly specified and create an
> > index with just those? Or would it be better to put the commonly
> > selected columns on the index with the lookup columns?
> 
> 
> You may want to grab a day or week's worth of queries (either general_log with all \
> the overhead and disk space that entails, or tcpdump) and pump that through \
> pt-query-digest (Percona Toolkit) to see which combinations of fields are most \
> often used, and add the necessary covering indices to help those queries.

We have a log going back over 2 years with over 200,000 queries, and
I've written python scripts to give the frequency of the column pulled
for the report and the column used for filtering. Not surprising, a
very small number of each are used most of the time, and a huge number
used just 1 or 2 times. I'll be digging into that next week.

> A few points to keep in mind during analysis:
> * order of fields in the where clause is largely irrelevant (although \
>                 most-selective-first is preferred)
> * not all the fields in the index must be queried; but you MUST query a full prefix \
>                 set
> -> ie, for index (a, b, c, d) you can query (a, b) but for (a, c) only (a) will be \
>                 used
> * every index constitutes a (small) performance penalty upon table updates, so \
> don't go too wild either :-) 
> Also helpful:
> * plenty of memory, at least enough to keep the working set in memory
> * faster disks (SSD, or FusionIO things) helps bundles for data that doesn't fit in \
> memory

The db host is running with 250GB memory, db size is 470GB, spinning
2.5" 15k rpm drives, and 40 cores.

> * if you notice a tendency for multiple users (say, a dozen) to run identical \
> queries (and that means /bitwise/ identical query text, down to the last space), \
> the query cache might help. Don't make it too big, though, a couple of meg should \
> suffice.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic