[prev in list] [next in list] [prev in thread] [next in thread]
List: pgsql-performance
Subject: Re: [PERFORM] Fast HashJoin only after a cluster/recreate table
From: Alexandre de Arruda Paes <adaldeia () gmail ! com>
Date: 2016-04-01 17:34:37
Message-ID: CAGewt-u4JVuibCPuRvPwXz8bcgi=SOagS1d=WKUxKdX1k_H5Cg () mail ! gmail ! com
[Download RAW message or body]
Hi David,
Thanks for the explanations. But I don't undestand why when I recreate the
table, the planner choose a best mode for sometime...
> I wonder what the planner would do if you pulled out the join to ES08T. If
that generates a better plan, then providing that es08tipdoc is the primary
key of that table, then you could just put a
> subquery in the SELECT clause to lookup the es08desdoc.
We have a problem with this approach. Actually, this querys are generated
by a framework and can't be 'rewrite'.
Can you think in another solution directly in DB (perhaps a partial index,
table partitioning, etc) ???
Best regards,
Alexandre
2016-04-01 10:17 GMT-03:00 David Rowley <david.rowley@2ndquadrant.com>:
> On 1 April 2016 at 15:44, Alexandre de Arruda Paes <adaldeia@gmail.com>
> wrote:
>
> > In the query below, the planner choose an extreme slow mergejoin(380
> > seconds). 'Vacuum analyze' can't help.
> >
>
> Yeah, it appears that planner is estimating the WHERE clause on es09t
> quite badly, expecting just 1 row, but there's actually 1212 rows. This
> seems to throw the whole plan decision out quite a bit, as, if you notice
> in the merge left join for t1.es09tipdoc = t2.es08tipdoc, it expect just 2
> rows to be present, therefore most likely thinks that it's not worth
> sorting those results on t1.es09tipdoc, t1.es09numdoc in order for it to
> match the known output order of Materialize node on the inner side of that
> join. Instead it assumes the Merge join will be good enough on just the
> es09tipdoc order, and just adds the other two columns as join filters....
>
> ... and this seems to be what's killing the performance. This Merge Join
> constantly has to perform a mark/restore on the Materialize node. This why
> you're getting the insanely high "Rows Removed by Join Filter: 992875295",
> in other words the join filter throw away that many row combinations
> because they didn't match.
>
> This mark/restore is basically the rewind process that the merge join
> algorithm needs to do to match many to many rows. In actual fact, this
> rewind is pretty useless in this case as the GROUP BY subquery ensures that
> no duplicate values will make it into the inner side of that merge join.
> The planner is not currently smart enough to detect this.
>
> There are some patches currently pending for 9.6 which might help fix this
> problem in the future;
>
> 1. multivariate statistics; this might help the poor estimates on es09t.
> If this was available you could add statistics on es09codemp, es09datreq,
> which may well improve the estimate and cause the plan to change.
> https://commitfest.postgresql.org/9/450/
> 2. Unique joins. This tackles the problem a different way and allows the
> Merge Join algorithm to skip the restore with some new smarts that are
> added to the planner to detect when the inner side of the join can only
> produce, at most, a single row for each outer row.
> https://commitfest.postgresql.org/9/129/
>
> If you feel like compiling 9.6 devel from source and applying each of
> these patches independently and seeing if it helps... Of course that does
> not solve your 9.4 production dilemma, but it may help evaluation of each
> of these two patches for 9.6 or beyond.
>
> I wonder what the planner would do if you pulled out the join to ES08T. If
> that generates a better plan, then providing that es08tipdoc is the primary
> key of that table, then you could just put a subquery in the SELECT clause
> to lookup the es08desdoc.
>
>
> >
> > QUERY PLAN
> >
> >
> > ---------------------------------------------------------------------------------- \
> > ------------------------------------------------------------------------------------
> > Sort (cost=289546.93..289546.94 rows=2 width=78) (actual
> > time=380405.796..380405.929 rows=2408 loops=1)
> > Sort Key: t1.es09numdoc, t1.es09tipdoc
> > Sort Method: quicksort Memory: 435kB
> > Buffers: shared hit=82163
> > -> Merge Left Join (cost=47.09..289546.92 rows=2 width=78) (actual
> > time=1133.077..380398.160 rows=2408 loops=1)
> > Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
> > Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND
> > (es09t1.es09numdoc = t1.es09numdoc))
> > Rows Removed by Join Filter: 992875295
> > Buffers: shared hit=82163
> > -> Merge Left Join (cost=46.53..49.29 rows=2 width=70) (actual
> > time=12.206..18.155 rows=2408 loops=1)
> > Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
> > Buffers: shared hit=6821
> > -> Sort (cost=9.19..9.19 rows=2 width=44) (actual
> > time=11.611..12.248 rows=2408 loops=1)
> > Sort Key: t1.es09tipdoc
> > Sort Method: quicksort Memory: 285kB
> > Buffers: shared hit=6814
> > -> Nested Loop (cost=1.11..9.18 rows=2 width=44)
> > (actual time=0.040..10.398 rows=2408 loops=1)
> > Buffers: shared hit=6814
> > -> Index Scan using ad_es09t_1 on es09t t3
> > (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212
> > loops=1)
> > Index Cond: ((es09codemp = 1) AND
> > (es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
> > Filter: (es09usuari ~~
> > '%%%%%%%%%%%%%%%%%%%%'::text)
> > Buffers: shared hit=108
> > -> Index Scan using es09t1_pkey on es09t1 t1
> > (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2
> > loops=1212)
> > Index Cond: ((es09codemp = 1) AND
> > (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
> > Filter: (es09tipdoc ~~ '%%%%%'::text)
> > Buffers: shared hit=6706
> > -> Sort (cost=37.35..38.71 rows=547 width=32) (actual
> > time=0.592..2.206 rows=2919 loops=1)
> > Sort Key: t2.es08tipdoc
> > Sort Method: quicksort Memory: 67kB
> > Buffers: shared hit=7
> > -> Seq Scan on es08t t2 (cost=0.00..12.47 rows=547
> > width=32) (actual time=0.003..0.126 rows=547 loops=1)
> > Buffers: shared hit=7
> > -> Materialize (cost=0.56..287644.85 rows=716126 width=23)
> > (actual time=0.027..68577.800 rows=993087854 loops=1)
> > Buffers: shared hit=75342
> > -> GroupAggregate (cost=0.56..278693.28 rows=716126
> > width=15) (actual time=0.025..4242.453 rows=3607573 loops=1)
> > Group Key: es09t1.es09codemp, es09t1.es09tipdoc,
> > es09t1.es09numdoc
> > Buffers: shared hit=75342
> > -> Index Only Scan using es09t1_pkey on es09t1
> > (cost=0.56..199919.49 rows=7161253 width=15) (actual time=0.016..1625.031
> > rows=7160921 loops=1)
> > Index Cond: (es09codemp = 1)
> > Heap Fetches: 51499
> > Buffers: shared hit=75342
> > Planning time: 50.129 ms
> > Execution time: 380419.435 ms
> >
>
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
[Attachment #3 (text/html)]
<div dir="ltr">Hi David,<div><br></div><div>Thanks for the explanations. But I \
don't undestand why when I recreate the table, the planner choose a best mode for \
sometime...</div><div><br></div><div><div>>I wonder what the planner would do if \
you pulled out the join to ES08T. If that generates a better plan, then providing \
that es08tipdoc is the primary key of that table, then you could just put a \
</div><div>> subquery in the SELECT clause to lookup the \
es08desdoc.</div><div><div></div></div></div><div><br></div><div>We have a problem \
with this approach. Actually, this querys are generated by a framework and can't \
be 'rewrite'. </div><div>Can you think in another solution directly in DB \
(perhaps a partial index, table partitioning, etc) ???</div><div><br></div><div>Best \
regards,</div><div><br></div><div>Alexandre</div><div><br></div><div><br><div \
class="gmail_extra"><br><div class="gmail_quote">2016-04-01 10:17 GMT-03:00 David \
Rowley <span dir="ltr"><<a href="mailto:david.rowley@2ndquadrant.com" \
target="_blank">david.rowley@2ndquadrant.com</a>></span>:<br><blockquote \
class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div \
dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On 1 April 2016 at \
15:44, Alexandre de Arruda Paes <span dir="ltr"><<a \
href="mailto:adaldeia@gmail.com" target="_blank">adaldeia@gmail.com</a>></span> \
wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div \
dir="ltr"><div>In the query below, the planner choose an extreme slow mergejoin(380 \
seconds). 'Vacuum analyze' can't \
help.<br></div></div></blockquote><div><br></div></span><div>Yeah, it appears that \
planner is estimating the WHERE clause on es09t quite badly, expecting just 1 row, \
but there's actually 1212 rows. This seems to throw the whole plan decision \
out quite a bit, as, if you notice in the merge left join for t1.es09tipdoc = \
t2.es08tipdoc, it expect just 2 rows to be present, therefore most likely thinks that \
it's not worth sorting those results on t1.es09tipdoc, t1.es09numdoc in order \
for it to match the known output order of Materialize node on the inner side of that \
join. Instead it assumes the Merge join will be good enough on just the es09tipdoc \
order, and just adds the other two columns as join \
filters....</div><div><br></div><div>... and this seems to be what's killing the \
performance. This Merge Join constantly has to perform a mark/restore on the \
Materialize node. This why you're getting the insanely high "Rows Removed by \
Join Filter: 992875295", in other words the join filter throw away that many row \
combinations because they didn't match.</div><div><br></div><div>This \
mark/restore is basically the rewind process that the merge join algorithm needs to \
do to match many to many rows. In actual fact, this rewind is pretty useless in this \
case as the GROUP BY subquery ensures that no duplicate values will make it into the \
inner side of that merge join. The planner is not currently smart enough to detect \
this.</div><div><br></div><div>There are some patches currently pending for 9.6 which \
might help fix this problem in the future;</div><div><br></div><div>1. multivariate \
statistics; this might help the poor estimates on es09t. If this was available you \
could add statistics on es09codemp, es09datreq, which may well improve the estimate \
and cause the plan to change. <a href="https://commitfest.postgresql.org/9/450/" \
target="_blank">https://commitfest.postgresql.org/9/450/</a></div><div>2. Unique \
joins. This tackles the problem a different way and allows the Merge Join algorithm \
to skip the restore with some new smarts that are added to the planner to detect when \
the inner side of the join can only produce, at most, a single row for each outer \
row. <a href="https://commitfest.postgresql.org/9/129/" \
target="_blank">https://commitfest.postgresql.org/9/129/</a></div><div><br></div><div>If \
you feel like compiling 9.6 devel from source and applying each of these patches \
independently and seeing if it helps... Of course that does not solve your 9.4 \
production dilemma, but it may help evaluation of each of these two patches for 9.6 \
or beyond.</div><div><br></div><div>I wonder what the planner would do if you pulled \
out the join to ES08T. If that generates a better plan, then providing that \
es08tipdoc is the primary key of that table, then you could just put a subquery in \
the SELECT clause to lookup the es08desdoc.</div><div><div><div> </div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div \
dir="ltr"><div></div><div> \
QUERY PLAN \
<br></div><div><div>------------------------------------------------------------------ \
----------------------------------------------------------------------------------------------------</div><div> \
Sort (cost=289546.93..289546.94 rows=2 width=78) (actual \
time=380405.796..380405.929 rows=2408 loops=1)</div><div> Sort Key: \
t1.es09numdoc, t1.es09tipdoc</div><div> Sort Method: quicksort Memory: \
435kB</div><div> Buffers: shared hit=82163</div><div> -> Merge Left Join \
(cost=47.09..289546.92 rows=2 width=78) (actual time=1133.077..380398.160 rows=2408 \
loops=1)</div><div> Merge Cond: (t1.es09tipdoc = \
es09t1.es09tipdoc)</div><div> Join Filter: ((es09t1.es09codemp = \
t1.es09codemp) AND (es09t1.es09numdoc = t1.es09numdoc))</div><div> Rows \
Removed by Join Filter: 992875295</div><div> Buffers: shared \
hit=82163</div><div> -> Merge Left Join (cost=46.53..49.29 rows=2 \
width=70) (actual time=12.206..18.155 rows=2408 loops=1)</div><div> \
Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)</div><div> Buffers: \
shared hit=6821</div><div> -> Sort (cost=9.19..9.19 \
rows=2 width=44) (actual time=11.611..12.248 rows=2408 loops=1)</div><div> \
Sort Key: t1.es09tipdoc</div><div> Sort Method: \
quicksort Memory: 285kB</div><div> Buffers: shared \
hit=6814</div><div> -> Nested Loop \
(cost=1.11..9.18 rows=2 width=44) (actual time=0.040..10.398 rows=2408 \
loops=1)</div><div> Buffers: shared \
hit=6814</div><div> -> Index Scan using \
ad_es09t_1 on es09t t3 (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 \
rows=1212 loops=1)</div><div> Index \
Cond: ((es09codemp = 1) AND (es09datreq >= '2016-02-02'::date) AND \
(es09datreq <= '2016-02-02'::date))</div><div> \
Filter: (es09usuari ~~ '%%%%%%%%%%%%%%%%%%%%'::text)</div><div> \
Buffers: shared hit=108</div><div> -> \
Index Scan using es09t1_pkey on es09t1 t1 (cost=0.56..4.59 rows=1 width=19) (actual \
time=0.006..0.007 rows=2 loops=1212)</div><div> \
Index Cond: ((es09codemp = 1) AND (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = \
t3.es09numdoc))</div><div> Filter: \
(es09tipdoc ~~ '%%%%%'::text)</div><div> \
Buffers: shared hit=6706</div><div> -> Sort \
(cost=37.35..38.71 rows=547 width=32) (actual time=0.592..2.206 rows=2919 \
loops=1)</div><div> Sort Key: t2.es08tipdoc</div><div> \
Sort Method: quicksort Memory: 67kB</div><div> \
Buffers: shared hit=7</div><div> -> Seq Scan on \
es08t t2 (cost=0.00..12.47 rows=547 width=32) (actual time=0.003..0.126 rows=547 \
loops=1)</div><div> Buffers: shared \
hit=7</div><div> -> Materialize (cost=0.56..287644.85 rows=716126 \
width=23) (actual time=0.027..68577.800 rows=993087854 loops=1)</div><div> \
Buffers: shared hit=75342</div><div> -> GroupAggregate \
(cost=0.56..278693.28 rows=716126 width=15) (actual time=0.025..4242.453 rows=3607573 \
loops=1)</div><div> Group Key: es09t1.es09codemp, \
es09t1.es09tipdoc, es09t1.es09numdoc</div><div> \
Buffers: shared hit=75342</div><div> -> Index \
Only Scan using es09t1_pkey on es09t1 (cost=0.56..199919.49 rows=7161253 width=15) \
(actual time=0.016..1625.031 rows=7160921 loops=1)</div><div> \
Index Cond: (es09codemp = 1)</div><div> Heap \
Fetches: 51499</div><div> Buffers: shared \
hit=75342</div><div> Planning time: 50.129 ms</div><div> Execution time: 380419.435 \
ms</div></div></div></blockquote></div></div></div><span><font color="#888888"><br \
clear="all"><div><br></div>-- <br><div><div dir="ltr"><div><div dir="ltr"><div><div \
dir="ltr"><div><div dir="ltr"><font color="#000000"><span style="font-size:12.8px"> \
David Rowley </span><a href="http://www.2ndQuadrant.com/" \
target="_blank">http://www.2ndQuadrant.com/</a><br></font><div \
style="font-size:12.8px"><div></div><div><font color="#000000"> PostgreSQL \
Development, 24x7 Support, Training & \
Services</font></div></div></div></div></div></div></div></div></div></div> \
</font></span></div></div> </blockquote></div><br></div></div></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic