
List:       pgsql-performance
Subject:    Re: Please help! Query jumps from 1s -> 4m
From:       James Thompson <james () jthompson ! dev>
Date:       2020-05-13 14:17:03
Message-ID: CABoe=cTRMXFBTJ93nVk=JyPsFLTfxBZ+emGQU_wzPxiX40tMQA () mail ! gmail ! com

Just to follow up on this...
Tried increasing stats targets last week + re-analyzing but the query was
just as bad.
Ended up increasing prepareThreshold so the JDBC driver stops switching to
server-side prepares for now (and thus avoids the later generic plans). This
'fixed' the issue and had no noticeable negative effect on our workloads.
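For anyone hitting the same thing: the workaround amounts to raising (or zeroing) the driver's prepareThreshold, a standard pgjdbc connection parameter; 0 disables named server-side prepares entirely. A minimal sketch, with placeholder host/database/credentials:

```java
import java.util.Properties;

public class PrepareThresholdExample {
    // pgjdbc switches a statement to a named server-side prepare -- and
    // PostgreSQL to a possibly-generic plan -- once it has been executed
    // prepareThreshold times (default 5). Setting it to 0 disables
    // server-side prepares for the connection.
    static Properties connectionProps() {
        Properties props = new Properties();
        props.setProperty("user", "app");       // placeholder credentials
        props.setProperty("password", "secret");
        props.setProperty("prepareThreshold", "0");
        return props;
    }

    // The same setting can go straight into the JDBC URL instead.
    static String url() {
        return "jdbc:postgresql://db.example.com/mydb?prepareThreshold=0";
    }

    public static void main(String[] args) {
        System.out.println(url());
        // DriverManager.getConnection(url(), connectionProps()) would then
        // never use named server-side prepared statements.
    }
}
```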

I still don't understand why the plan being off makes the query so much
slower in this case: the plans I shared in the last email don't look too
different, and I don't see how the filter can add two minutes of execution
time to an index-only scan. If anyone does have thoughts on what could be
happening I would be very interested to hear them, but the main performance
problem is effectively solved.

Thanks all for the valuable help getting to the bottom of what was
happening.

On Tue, 5 May 2020 at 22:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> James Thompson <james@jthompson.dev> writes:
> > The slowness occurs when the prepared statement changes to a generic plan.
>
> > Initial plan:
> > ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2
> >       (cost=0.56..2549.70 rows=70 width=36) (actual time=1.901..45.256 rows=65000 loops=1)
> >     Output: table1alias2.uniqueid
> >     Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.col8 = ANY
> >       ('{c5986b02-3a02-4639-8147-f286972413ba,... 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
> >     Heap Fetches: 10
> >     Buffers: shared hit=5048
>
> > after 5 executions of the statement:
> > ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2
> >       (cost=0.56..17.23 rows=1 width=36) (actual time=125.344..126877.822 rows=65000 loops=1)
> >     Output: table1alias2.uniqueid
> >     Index Cond: (table1alias2.col20 = $1001)
> >     Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ..., $1000])::text[]))
> >     Rows Removed by Filter: 2670023
> >     Heap Fetches: 428
> >     Buffers: shared hit=45933 read=42060 dirtied=4
>
> Yeah, this is a dynamic we've seen before.  The rowcount estimate, and
> hence the cost estimate, for the plan with explicit parameter values is
> way off; but the estimate for the generic plan is even more way off,
> causing the system to falsely decide that the latter is cheaper.
>
> I've speculated about refusing to believe generic cost estimates if they are
> more than epsilon less than the concrete cost estimate, but it's not quite
> clear how that should work or whether it'd have its own failure modes.
>
> The one thing that is totally clear is that these rowcount estimates are
> crappy.  Can you improve them by increasing the stats target for that
> table?  Maybe with less-garbage-y inputs, the system would make the right
> plan choice here.
>
>                         regards, tom lane
>
