'Re: [PERFORM] Fast HashJoin only after a cluster/recreate table'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-performance
Subject:    Re: [PERFORM] Fast HashJoin only after a cluster/recreate table
From:       Alexandre de Arruda Paes <adaldeia () gmail ! com>
Date:       2016-04-01 17:34:37
Message-ID: CAGewt-u4JVuibCPuRvPwXz8bcgi=SOagS1d=WKUxKdX1k_H5Cg () mail ! gmail ! com
[Download RAW message or body]

Hi David,

Thanks for the explanations. But I don't undestand why when I recreate the
table, the planner choose a best mode for sometime...

> I wonder what the planner would do if you pulled out the join to ES08T. If
that generates a better plan, then providing that es08tipdoc is the primary
key of that table, then you could just put a
> subquery in the SELECT clause to lookup the es08desdoc.

We have a problem with this approach. Actually, this querys are generated
by a framework and can't be 'rewrite'.
Can you think in another solution directly in DB (perhaps a partial index,
table partitioning, etc) ???

Best regards,

Alexandre



2016-04-01 10:17 GMT-03:00 David Rowley <david.rowley@2ndquadrant.com>:

> On 1 April 2016 at 15:44, Alexandre de Arruda Paes <adaldeia@gmail.com>
> wrote:
> 
> > In the query below, the planner choose an extreme slow mergejoin(380
> > seconds). 'Vacuum analyze' can't help.
> > 
> 
> Yeah, it appears that planner is estimating the WHERE clause on es09t
> quite badly, expecting just 1 row, but there's actually 1212 rows.  This
> seems to throw the whole plan decision out quite a bit, as, if you notice
> in the merge left join for t1.es09tipdoc = t2.es08tipdoc, it expect just 2
> rows to be present, therefore most likely thinks that it's not worth
> sorting those results on t1.es09tipdoc, t1.es09numdoc in order for it to
> match the known output order of Materialize node on the inner side of that
> join. Instead it assumes the Merge join will be good enough on just the
> es09tipdoc order, and just adds the other two columns as join filters....
> 
> ... and this seems to be what's killing the performance. This Merge Join
> constantly has to perform a mark/restore on the Materialize node. This why
> you're getting the insanely high "Rows Removed by Join Filter: 992875295",
> in other words the join filter throw away that many row combinations
> because they didn't match.
> 
> This mark/restore is basically the rewind process that the merge join
> algorithm needs to do to match many to many rows. In actual fact, this
> rewind is pretty useless in this case as the GROUP BY subquery ensures that
> no duplicate values will make it into the inner side of that merge join.
> The planner is not currently smart enough to detect this.
> 
> There are some patches currently pending for 9.6 which might help fix this
> problem in the future;
> 
> 1. multivariate statistics; this might help the poor estimates on es09t.
> If this was available you could add statistics on es09codemp, es09datreq,
> which may well improve the estimate and cause the plan to change.
> https://commitfest.postgresql.org/9/450/
> 2. Unique joins. This tackles the problem a different way and allows the
> Merge Join algorithm to skip the restore with some new smarts that are
> added to the planner to detect when the inner side of the join can only
> produce, at most, a single row for each outer row.
> https://commitfest.postgresql.org/9/129/
> 
> If you feel like compiling 9.6 devel from source and applying each of
> these patches independently and seeing if it helps...  Of course that does
> not solve your 9.4 production dilemma, but it may help evaluation of each
> of these two patches for 9.6 or beyond.
> 
> I wonder what the planner would do if you pulled out the join to ES08T. If
> that generates a better plan, then providing that es08tipdoc is the primary
> key of that table, then you could just put a subquery in the SELECT clause
> to lookup the es08desdoc.
> 
> 
> > 
> > QUERY PLAN
> > 
> > 
> > ---------------------------------------------------------------------------------- \
> > ------------------------------------------------------------------------------------
> >  Sort  (cost=289546.93..289546.94 rows=2 width=78) (actual
> > time=380405.796..380405.929 rows=2408 loops=1)
> > Sort Key: t1.es09numdoc, t1.es09tipdoc
> > Sort Method: quicksort  Memory: 435kB
> > Buffers: shared hit=82163
> > ->  Merge Left Join  (cost=47.09..289546.92 rows=2 width=78) (actual
> > time=1133.077..380398.160 rows=2408 loops=1)
> > Merge Cond: (t1.es09tipdoc = es09t1.es09tipdoc)
> > Join Filter: ((es09t1.es09codemp = t1.es09codemp) AND
> > (es09t1.es09numdoc = t1.es09numdoc))
> > Rows Removed by Join Filter: 992875295
> > Buffers: shared hit=82163
> > ->  Merge Left Join  (cost=46.53..49.29 rows=2 width=70) (actual
> > time=12.206..18.155 rows=2408 loops=1)
> > Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)
> > Buffers: shared hit=6821
> > ->  Sort  (cost=9.19..9.19 rows=2 width=44) (actual
> > time=11.611..12.248 rows=2408 loops=1)
> > Sort Key: t1.es09tipdoc
> > Sort Method: quicksort  Memory: 285kB
> > Buffers: shared hit=6814
> > ->  Nested Loop  (cost=1.11..9.18 rows=2 width=44)
> > (actual time=0.040..10.398 rows=2408 loops=1)
> > Buffers: shared hit=6814
> > ->  Index Scan using ad_es09t_1 on es09t t3
> > (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 rows=1212
> > loops=1)
> > Index Cond: ((es09codemp = 1) AND
> > (es09datreq >= '2016-02-02'::date) AND (es09datreq <= '2016-02-02'::date))
> > Filter: (es09usuari ~~
> > '%%%%%%%%%%%%%%%%%%%%'::text)
> > Buffers: shared hit=108
> > ->  Index Scan using es09t1_pkey on es09t1 t1
> > (cost=0.56..4.59 rows=1 width=19) (actual time=0.006..0.007 rows=2
> > loops=1212)
> > Index Cond: ((es09codemp = 1) AND
> > (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = t3.es09numdoc))
> > Filter: (es09tipdoc ~~ '%%%%%'::text)
> > Buffers: shared hit=6706
> > ->  Sort  (cost=37.35..38.71 rows=547 width=32) (actual
> > time=0.592..2.206 rows=2919 loops=1)
> > Sort Key: t2.es08tipdoc
> > Sort Method: quicksort  Memory: 67kB
> > Buffers: shared hit=7
> > ->  Seq Scan on es08t t2  (cost=0.00..12.47 rows=547
> > width=32) (actual time=0.003..0.126 rows=547 loops=1)
> > Buffers: shared hit=7
> > ->  Materialize  (cost=0.56..287644.85 rows=716126 width=23)
> > (actual time=0.027..68577.800 rows=993087854 loops=1)
> > Buffers: shared hit=75342
> > ->  GroupAggregate  (cost=0.56..278693.28 rows=716126
> > width=15) (actual time=0.025..4242.453 rows=3607573 loops=1)
> > Group Key: es09t1.es09codemp, es09t1.es09tipdoc,
> > es09t1.es09numdoc
> > Buffers: shared hit=75342
> > ->  Index Only Scan using es09t1_pkey on es09t1
> > (cost=0.56..199919.49 rows=7161253 width=15) (actual time=0.016..1625.031
> > rows=7160921 loops=1)
> > Index Cond: (es09codemp = 1)
> > Heap Fetches: 51499
> > Buffers: shared hit=75342
> > Planning time: 50.129 ms
> > Execution time: 380419.435 ms
> > 
> 
> 
> --
> David Rowley                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
> 


[Attachment #3 (text/html)]

<div dir="ltr">Hi David,<div><br></div><div>Thanks for the explanations. But I \
don&#39;t undestand why when I recreate the table, the planner choose a best mode for \
sometime...</div><div><br></div><div><div>&gt;I wonder what the planner would do if \
you pulled out the join to  ES08T. If that generates a better plan, then providing \
that  es08tipdoc is the primary key of that table, then you could just put a  \
</div><div>&gt; subquery in the SELECT clause to lookup the  \
es08desdoc.</div><div><div></div></div></div><div><br></div><div>We have a problem \
with this approach. Actually, this querys are generated by a framework and can&#39;t \
be &#39;rewrite&#39;.  </div><div>Can you think in another solution directly in DB \
(perhaps a partial index, table partitioning, etc) ???</div><div><br></div><div>Best \
regards,</div><div><br></div><div>Alexandre</div><div><br></div><div><br><div \
class="gmail_extra"><br><div class="gmail_quote">2016-04-01 10:17 GMT-03:00 David \
Rowley <span dir="ltr">&lt;<a href="mailto:david.rowley@2ndquadrant.com" \
target="_blank">david.rowley@2ndquadrant.com</a>&gt;</span>:<br><blockquote \
class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div \
dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On 1 April 2016 at \
15:44, Alexandre de Arruda Paes <span dir="ltr">&lt;<a \
href="mailto:adaldeia@gmail.com" target="_blank">adaldeia@gmail.com</a>&gt;</span> \
wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div \
dir="ltr"><div>In the query below, the planner choose an extreme slow mergejoin(380 \
seconds). &#39;Vacuum analyze&#39; can&#39;t \
help.<br></div></div></blockquote><div><br></div></span><div>Yeah, it appears that \
planner is estimating the WHERE clause on es09t quite badly, expecting just 1 row, \
but there&#39;s actually  1212 rows.   This seems to throw the whole plan decision \
out quite a bit, as, if you notice in the merge left join for t1.es09tipdoc = \
t2.es08tipdoc, it expect just 2 rows to be present, therefore most likely thinks that \
it&#39;s not worth sorting those results on t1.es09tipdoc,  t1.es09numdoc in order \
for it to match the known output order of Materialize node on the inner side of that \
join. Instead it assumes the Merge join will be good enough on just the es09tipdoc \
order, and just adds the other two columns as join \
filters....</div><div><br></div><div>... and this seems to be what&#39;s killing the \
performance. This Merge Join constantly has to perform a mark/restore on the \
Materialize node. This why you&#39;re getting the insanely high &quot;Rows Removed by \
Join Filter: 992875295&quot;, in other words the join filter throw away that many row \
combinations because they didn&#39;t match.</div><div><br></div><div>This \
mark/restore is basically the rewind process that the merge join algorithm needs to \
do to match many to many rows. In actual fact, this rewind is pretty useless in this \
case as the GROUP BY subquery ensures that no duplicate values will make it into the \
inner side of that merge join. The planner is not currently smart enough to detect \
this.</div><div><br></div><div>There are some patches currently pending for 9.6 which \
might help fix this problem in the future;</div><div><br></div><div>1. multivariate \
statistics; this might help the poor estimates on es09t. If this was available you \
could add statistics on es09codemp, es09datreq, which may well improve the estimate \
and cause the plan to change. <a href="https://commitfest.postgresql.org/9/450/" \
target="_blank">https://commitfest.postgresql.org/9/450/</a></div><div>2. Unique \
joins. This tackles the problem a different way and allows the Merge Join algorithm \
to skip the restore with some new smarts that are added to the planner to detect when \
the inner side of the join can only produce, at most, a single row for each outer \
row.  <a href="https://commitfest.postgresql.org/9/129/" \
target="_blank">https://commitfest.postgresql.org/9/129/</a></div><div><br></div><div>If \
you feel like compiling 9.6 devel from source and applying each of these patches \
independently and seeing if it helps...   Of course that does not solve your 9.4 \
production dilemma, but it may help evaluation of each of these two patches for 9.6 \
or beyond.</div><div><br></div><div>I wonder what the planner would do if you pulled \
out the join to  ES08T. If that generates a better plan, then providing that  \
es08tipdoc is the primary key of that table, then you could just put a subquery in \
the SELECT clause to lookup the  es08desdoc.</div><div><div><div>  </div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div \
dir="ltr"><div></div><div>                                                            \
QUERY PLAN                                                                            \
<br></div><div><div>------------------------------------------------------------------ \
----------------------------------------------------------------------------------------------------</div><div> \
Sort   (cost=289546.93..289546.94 rows=2 width=78) (actual \
time=380405.796..380405.929 rows=2408 loops=1)</div><div>     Sort Key: \
t1.es09numdoc, t1.es09tipdoc</div><div>     Sort Method: quicksort   Memory: \
435kB</div><div>     Buffers: shared hit=82163</div><div>     -&gt;   Merge Left Join \
(cost=47.09..289546.92 rows=2 width=78) (actual time=1133.077..380398.160 rows=2408 \
loops=1)</div><div>              Merge Cond: (t1.es09tipdoc = \
es09t1.es09tipdoc)</div><div>              Join Filter: ((es09t1.es09codemp = \
t1.es09codemp) AND (es09t1.es09numdoc = t1.es09numdoc))</div><div>              Rows \
Removed by Join Filter: 992875295</div><div>              Buffers: shared \
hit=82163</div><div>              -&gt;   Merge Left Join   (cost=46.53..49.29 rows=2 \
width=70) (actual time=12.206..18.155 rows=2408 loops=1)</div><div>                   \
Merge Cond: (t1.es09tipdoc = t2.es08tipdoc)</div><div>                       Buffers: \
shared hit=6821</div><div>                       -&gt;   Sort   (cost=9.19..9.19 \
rows=2 width=44) (actual time=11.611..12.248 rows=2408 loops=1)</div><div>            \
Sort Key: t1.es09tipdoc</div><div>                                Sort Method: \
quicksort   Memory: 285kB</div><div>                                Buffers: shared \
hit=6814</div><div>                                -&gt;   Nested Loop   \
(cost=1.11..9.18 rows=2 width=44) (actual time=0.040..10.398 rows=2408 \
loops=1)</div><div>                                         Buffers: shared \
hit=6814</div><div>                                         -&gt;   Index Scan using \
ad_es09t_1 on es09t t3   (cost=0.56..4.58 rows=1 width=42) (actual time=0.020..0.687 \
rows=1212 loops=1)</div><div>                                                  Index \
Cond: ((es09codemp = 1) AND (es09datreq &gt;= &#39;2016-02-02&#39;::date) AND \
(es09datreq &lt;= &#39;2016-02-02&#39;::date))</div><div>                             \
Filter: (es09usuari ~~ &#39;%%%%%%%%%%%%%%%%%%%%&#39;::text)</div><div>               \
Buffers: shared hit=108</div><div>                                         -&gt;   \
Index Scan using es09t1_pkey on es09t1 t1   (cost=0.56..4.59 rows=1 width=19) (actual \
time=0.006..0.007 rows=2 loops=1212)</div><div>                                       \
Index Cond: ((es09codemp = 1) AND (es09tipdoc = t3.es09tipdoc) AND (es09numdoc = \
t3.es09numdoc))</div><div>                                                  Filter: \
(es09tipdoc ~~ &#39;%%%%%&#39;::text)</div><div>                                      \
Buffers: shared hit=6706</div><div>                       -&gt;   Sort   \
(cost=37.35..38.71 rows=547 width=32) (actual time=0.592..2.206 rows=2919 \
loops=1)</div><div>                                Sort Key: t2.es08tipdoc</div><div> \
Sort Method: quicksort   Memory: 67kB</div><div>                                \
Buffers: shared hit=7</div><div>                                -&gt;   Seq Scan on \
es08t t2   (cost=0.00..12.47 rows=547 width=32) (actual time=0.003..0.126 rows=547 \
loops=1)</div><div>                                         Buffers: shared \
hit=7</div><div>              -&gt;   Materialize   (cost=0.56..287644.85 rows=716126 \
width=23) (actual time=0.027..68577.800 rows=993087854 loops=1)</div><div>            \
Buffers: shared hit=75342</div><div>                       -&gt;   GroupAggregate   \
(cost=0.56..278693.28 rows=716126 width=15) (actual time=0.025..4242.453 rows=3607573 \
loops=1)</div><div>                                Group Key: es09t1.es09codemp, \
es09t1.es09tipdoc, es09t1.es09numdoc</div><div>                                \
Buffers: shared hit=75342</div><div>                                -&gt;   Index \
Only Scan using es09t1_pkey on es09t1   (cost=0.56..199919.49 rows=7161253 width=15) \
(actual time=0.016..1625.031 rows=7160921 loops=1)</div><div>                         \
Index Cond: (es09codemp = 1)</div><div>                                         Heap \
Fetches: 51499</div><div>                                         Buffers: shared \
hit=75342</div><div>  Planning time: 50.129 ms</div><div>  Execution time: 380419.435 \
ms</div></div></div></blockquote></div></div></div><span><font color="#888888"><br \
clear="all"><div><br></div>-- <br><div><div dir="ltr"><div><div dir="ltr"><div><div \
dir="ltr"><div><div dir="ltr"><font color="#000000"><span style="font-size:12.8px">  \
David Rowley                             </span><a href="http://www.2ndQuadrant.com/" \
target="_blank">http://www.2ndQuadrant.com/</a><br></font><div \
style="font-size:12.8px"><div></div><div><font color="#000000">  PostgreSQL \
Development, 24x7 Support, Training &amp; \
Services</font></div></div></div></div></div></div></div></div></div></div> \
</font></span></div></div> </blockquote></div><br></div></div></div>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic