List: pgsql-performance
Subject: Re: [PERFORM] Insert performance for large transaction with multiple COPY FROM
From: Horst Dehmer <horst.dehmer@gmail.com>
Date: 2013-01-18 23:15:15
Message-ID: 273CF8C2-159B-44FF-B382-61A76541F435@gmail.com
[Download RAW message or body]
Hey Jeff (and others)!
First of all: Thanks for your detailed explanations and guidelines.
On 17.01.2013, at 18:12, Jeff Janes <jeff.janes@gmail.com> wrote:
> So the theory is that the presence of idx_4 is causing the trigger to
> pick a poor plan (i.e. one using idx_4) while its absence removes that
> temptation?
Yes. And auto_explain confirms this for the first record (obj_item_loc_obj_item_id_idx = idx_4 from my last mail):
2013-01-18 22:50:21 CET LOG: duration: 0.021 ms plan:
  Query Text: SELECT * FROM obj_item_loc WHERE obj_item_loc.obj_item_id = NEW.obj_item_id AND obj_item_loc.loc_id = NEW.loc_id AND obj_item_loc.obj_item_loc_ix = NEW.obj_item_loc_ix
  Index Scan using obj_item_loc_obj_item_id_idx on obj_item_loc (cost=0.00..8.27 rows=1 width=382)
    Index Cond: (obj_item_id = $15)
    Filter: ((loc_id = $16) AND (obj_item_loc_ix = $17))
2013-01-18 22:50:21 CET CONTEXT: SQL statement "SELECT * FROM obj_item_loc WHERE obj_item_loc.obj_item_id = NEW.obj_item_id AND obj_item_loc.loc_id = NEW.loc_id AND obj_item_loc.obj_item_loc_ix = NEW.obj_item_loc_ix"
  PL/pgSQL function obj_item_loc_before_insert() line 5 at SQL statement
  COPY obj_item_loc, line 1: "10903011224100007276 10903010224100015110 10900024100000029720 \N \N \N \N \N \N \N \N \N \N \N \N \..."
and for one of the last records:
2013-01-18 22:53:20 CET LOG: duration: 16.088 ms plan:
  Query Text: SELECT * FROM obj_item_loc WHERE obj_item_loc.obj_item_id = NEW.obj_item_id AND obj_item_loc.loc_id = NEW.loc_id AND obj_item_loc.obj_item_loc_ix = NEW.obj_item_loc_ix
  Index Scan using obj_item_loc_obj_item_id_idx on obj_item_loc (cost=0.00..8.27 rows=1 width=382)
    Index Cond: (obj_item_id = $15)
    Filter: ((loc_id = $16) AND (obj_item_loc_ix = $17))
I see a linear increase of the duration from 0.0x ms to over 16 ms (apart from a few nasty outliers of about 22 ms). Even at the end there are still a few durations < 0.03 ms, but most are 15 ms and above.
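Looking at the plan above, only obj_item_id is used as an index condition while loc_id and obj_item_loc_ix are applied as a filter. If many rows share the same obj_item_id late in the load, a composite index covering all three lookup columns might keep the trigger's SELECT cheap. Just a sketch (the index name is invented; the columns come from the plan above):

```sql
-- Hypothetical composite index so all three predicates of the trigger's
-- SELECT become index conditions instead of a per-row filter:
CREATE INDEX obj_item_loc_lookup_idx
    ON obj_item_loc (obj_item_id, loc_id, obj_item_loc_ix);
```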
> True disk reads are much more expensive, but given how few reads you
> have relative to hits, I now think that in aggregate the hits are more
> of a concern than the reads are. In other words, you seem to be CPU
> bound, not IO bound.
Yes, definitely CPU bound, as top shows 99+% CPU utilization.
> Even more so I think (but not with much confidence) that most of your
> "reads" are actually coming from the OS cache and not from the disk.
> PG cannot distinguish true disk reads from OS cache reads.
>
> When was the last time you reset the stats? That is, are your
> reported numbers accumulated over several loads, with some having idx4
> and some not?
I set up a fresh database before each test run. So the stats should be clean.
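(For anyone comparing runs without a fresh database each time, the counters for the current database can also be cleared in place:)

```sql
-- Reset the statistics counters between test runs, so hit/read numbers
-- are not accumulated over several loads:
SELECT pg_stat_reset();
```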
> More formally, use auto_explain and set
> auto_explain.log_nested_statements to true. I haven't verified this
> works with triggers, just going by the description I think it should.
Nice tip! Works for triggers as well.
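For reference, enabling this looks roughly like the following (a sketch; the zero threshold logs every statement and is only sensible for a test run):

```sql
-- Load auto_explain for this session; it can also be preloaded via
-- shared_preload_libraries in postgresql.conf.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;          -- log the plan of every statement
SET auto_explain.log_nested_statements = true;  -- include statements run inside triggers/functions
```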
> Your use case is a little unusual. If you are bulk loading into an
> initially empty table, usually you would remove the trigger and add it
> after the load (with some kind of bulk operation to make up for
> whatever it was the trigger would have been doing). On the other
> hand, if you are bulk loading into a "live" table and so can't drop
> the trigger, then the live table should have good-enough preexisting
> statistics to make the trigger choose a good plan.
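The usual bulk-load pattern Jeff describes would look something like this (sketch only: the trigger name and file path are made up, and the commented UPDATE stands in for whatever set-based work replaces the per-row trigger logic):

```sql
BEGIN;
-- Disable the per-row trigger for the bulk load (hypothetical trigger name):
ALTER TABLE obj_item_loc DISABLE TRIGGER obj_item_loc_before_insert_trg;

COPY obj_item_loc FROM '/path/to/obj_item_loc.dat';

-- Make up for the trigger's per-row work with one set-based statement
-- (depends entirely on what obj_item_loc_before_insert() actually does):
-- UPDATE obj_item_loc SET ... WHERE ...;

ALTER TABLE obj_item_loc ENABLE TRIGGER obj_item_loc_before_insert_trg;
COMMIT;
```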
My case is indeed unusual: for the whole model of 276 tables there will never be an update or a delete on any row. The model is rather short-lived, from a few hours to a few months. COPY FROM/TO are the only ways to get data into the database and back out. In between there is lots of graph traversal and calculation of convex hulls. But the lengthy transactions are by far not the common case.
Having said that, I'm no longer sure if a RDBMS is the right tool for the backend. \
Maybe indexing and storing with a plain full text search engine is. Dunno...
Thanks again!
--
Horst
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance