
List:       pgsql-performance
Subject:    Re: Underestimated number of output rows with an aggregate function
From:       Philippe BEAUDOIN <phb.emaj () free ! fr>
Date:       2023-10-16 16:52:11
Message-ID: 84cc7977-49ba-49b2-ab36-dea0ed14ebdc () free ! fr

On 15/10/2023 at 18:37, Tom Lane wrote:
> Philippe BEAUDOIN <phb.emaj@free.fr> writes:
>> During my analysis, I realized that the output rows estimate of the
>> second CTE is really bad, leading to a bad plan for the next CTE.
>> I reproduced the issue in a very small test case with a simplified
>> query. Attached is a shell script and its output.
> Yeah.  If you try it you'll see that the estimates for the
> "keys.c1 = tbl.c1" and "keys.seq = tbl.seq" clauses are spot-on
> individually.  The problem is that the planner assumes that they
> are independent clauses, so it multiplies those selectivities together.
> In reality, because seq is already unique, the condition on c1 adds
> no additional selectivity.
>
> If seq is guaranteed unique in your real application, you could just
> drop the condition on c1.  Otherwise I'm not sure about a good
> answer.  In principle creating extended stats on c1 and seq should
> help, but I think we don't yet apply those for join clauses.
>
> A partial answer could be to defeat application of the table's
> statistics by writing
>
>            JOIN keys ON (keys.c1 = tbl.c1+0 AND keys.seq = tbl.seq+0)
>
> For me this gives an output estimate of 3000 rows, which is still not
> great but should at least prevent choice of an insane plan at the
> next join level.  However, it pessimizes the plan for this query
> itself a little bit (about doubling the runtime).

Thanks for the trick (and the quick answer). In the test case, it indeed 
produces a pretty good plan.
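
For reference, the extended statistics Tom mentions would look roughly like 
this for the test case (table and column names come from the simplified test 
case, the statistics name is arbitrary; as he notes, these may not yet be 
applied to join clauses, so this is shown only for completeness):

    -- extended statistics covering the two correlated join columns
    CREATE STATISTICS tbl_c1_seq_stats (ndistinct, dependencies)
        ON c1, seq FROM tbl;
    -- re-analyze so the new statistics object gets populated
    ANALYZE tbl;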

Unfortunately, since these statements are generated and depend on the base 
table structure, the issue remains for some of them (but not all). So, for 
the moment at least, I am keeping the previous workaround (disabling 
nested loops).
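
(For the record, that workaround simply disables nested loop joins around the 
generated statement, roughly as sketched below; how exactly it is scoped in 
the real application may differ.)

    BEGIN;
    SET LOCAL enable_nestloop = off;  -- only affects the current transaction
    -- ... run the generated statement here ...
    COMMIT;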

>> For the hash join node, the plan shows an "Inner Unique: true" property.
>> I wonder if this is normal.
> Sure.  The output of the WITH is visibly unique on c1.
OK, I see.
> 			regards, tom lane

