[prev in list] [next in list] [prev in thread] [next in thread]
List: pgsql-performance
Subject: Re: [PERFORM] Insert performance and multi-column index order
From: Bob Lunney <bob_lunney () yahoo ! com>
Date: 2009-06-30 12:31:43
Message-ID: 387856.59902.qm () web39707 ! mail ! mud ! yahoo ! com
[Download RAW message or body]
Greg,
Thanks for the mental prod! Yes, the original data is more closely sorted by the \
timestamptz column, since they represent events coming into the collection system in \
real time. As for the distribution of data values, it goes without saying the \
timestamptz value is monotonically increasing, with roughly 1300 entries having the \
same timestamptz value. The other three columns' values are essentially reference \
data, with 400 values for the varchar, 680 for the first text column, and 60 for the \
second text column. The distribution is fairly even, with some small spikes but \
nothing significant.
The "duh" moment came for me when you pointed out the implicit sort order of the \
data. After resorting the data into the new index column order the insert \
performance was largely restored. I didn't monitor the process with vmstat, however \
- the end result is good enough for me. I believe that the index maintenance of page \
splitting, etc., that you describe below was exactly the culprit, and that presorting \
the data solved that problem.
I call it my "duh" moment since I've presorted data for Sybase and Oracle for exactly \
the same reason, but forgot to apply the lesson to PostgreSQL.
BTW, this is PG 8.2.1 and 8.3.7 running on SLES 10.3, although I don't think it \
matters.
Thanks for the help, Greg and Tom!
--- On Sat, 6/27/09, Greg Smith <gsmith@gregsmith.com> wrote:
> From: Greg Smith <gsmith@gregsmith.com>
> Subject: Re: [PERFORM] Insert performance and multi-column index order
> To: bob_lunney@yahoo.com
> Cc: pgsql-performance@postgresql.org
> Date: Saturday, June 27, 2009, 1:08 AM
> On Fri, 26 Jun 2009, bob_lunney@yahoo.com
> wrote:
>
> > The original unique index was in the order
> (timestamptz, varchar, text, text) and most queries against
> it were slow. I changed the index order to (varchar, text,
> timestamptz, text) and queries now fly, but loading data
> (via copy from stdin) in the table is 2-4 times slower.
>
> Is the input data closer to being sorted by the timestamptz
> field than the varchar field? What you might be seeing
> is that the working set of index pages needed to keep
> building the varchar index are bigger or have more of a
> random access component to them as they spill in and out of
> the buffer cache. Usually you can get a better idea
> what the difference is by comparing the output from vmstat
> while the two are loading. More random read/write
> requests in the mix will increase the waiting for I/O
> percentage while not increasing the total amount
> read/written per second.
>
> --
> * Greg Smith gsmith@gregsmith.com
> http://www.gregsmith.com Baltimore, MD
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic