[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-performance
Subject:    Re: [PERFORM] Insert performance and multi-column index order
From:       Bob Lunney <bob_lunney () yahoo ! com>
Date:       2009-06-30 12:31:43
Message-ID: 387856.59902.qm () web39707 ! mail ! mud ! yahoo ! com
[Download RAW message or body]


Greg,

Thanks for the mental prod!  Yes, the original data is more closely sorted by the \
timestamptz column, since they represent events coming into the collection system in \
real time.  As for the distribution of data values, it goes without saying the \
timestamptz value is monotonically increasing, with roughly 1300 entries having the \
same timestamptz value.  The other three columns' values are essentially reference \
data, with 400 values for the varchar, 680 for the first text column, and 60 for the \
second text column.  The distribution is fairly even, with some small spikes but \
nothing significant.

The "duh" moment came for me when you pointed out the implicit sort order of the \
data.  After resorting the data into the new index column order the insert \
performance was largely restored.  I didn't monitor the process with vmstat, however \
- the end result is good enough for me.  I believe that the index maintenance of page \
splitting, etc., that you describe below was exactly the culprit, and that presorting \
the data solved that problem.  

I call it my "duh" moment since I've presorted data for Sybase and Oracle for exactly \
the same reason, but forgot to apply the lesson to PostgreSQL.

BTW, this is PG 8.2.1 and 8.3.7 running on SLES 10.3, although I don't think it \
matters.

Thanks for the help, Greg and Tom!

--- On Sat, 6/27/09, Greg Smith <gsmith@gregsmith.com> wrote:

> From: Greg Smith <gsmith@gregsmith.com>
> Subject: Re: [PERFORM] Insert performance and multi-column index order
> To: bob_lunney@yahoo.com
> Cc: pgsql-performance@postgresql.org
> Date: Saturday, June 27, 2009, 1:08 AM
> On Fri, 26 Jun 2009, bob_lunney@yahoo.com
> wrote:
> 
> > The original unique index was in the order
> (timestamptz, varchar, text, text) and most queries against
> it were slow.  I changed the index order to (varchar, text,
> timestamptz, text) and queries now fly, but loading data
> (via copy from stdin) in the table is 2-4 times slower.
> 
> Is the input data closer to being sorted by the timestamptz
> field than the varchar field?  What you might be seeing
> is that the working set of index pages needed to keep
> building the varchar index are bigger or have more of a
> random access component to them as they spill in and out of
> the buffer cache.  Usually you can get a better idea
> what the difference is by comparing the output from vmstat
> while the two are loading.  More random read/write
> requests in the mix will increase the waiting for I/O
> percentage while not increasing the total amount
> read/written per second.
> 
> --
> * Greg Smith gsmith@gregsmith.com
> http://www.gregsmith.com Baltimore, MD




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic