
List:       postgresql-general
Subject:    Re[2]: Index-only scan not working when IN clause has 2 o
From:       Anna B. <terzi () bk ! ru>
Date:       2022-11-28 13:09:08
Message-ID: 1669640948.115398878 () f420 ! i ! mail ! ru



Hi Tom and community,
 
Thank you very much!
After digging into how the Postgres planner uses statistics, I increased the table statistics target from 100 to 1000. That was enough for the planner to use multiple scans of the index and then sort! (I have also added extended dependency statistics on the three columns, as you suggested.)
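For reference, the two changes described above can be written roughly as follows (a sketch only; the statistics object name is illustrative, and the column list follows this thread):

alter table transaction alter column client_id set statistics 1000;
alter table transaction alter column trans_dttm set statistics 1000;
alter table transaction alter column division_code set statistics 1000;

create statistics "stat-transaction-client-dttm-division" (dependencies)
    on client_id, trans_dttm, division_code
    from transaction;

analyze transaction;

(The per-column SET STATISTICS could also be replaced by raising default_statistics_target; ANALYZE is needed afterwards so the new targets take effect.)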
May I ask one more question? I am also testing the same index, but a covering version:
 
create index "ix-transaction-client-trans_dttm-include-division"
    on transaction (client_id,
                    trans_dttm desc)
    include (division_code);
 
Why don't the tuned statistics improve this one?
 
EXPLAIN (ANALYZE, BUFFERS)
select *
from transaction
where client_id = 123456
  and (trans_dttm between TO_DATE('01.01.2020', 'dd.mm.yyyy') and TO_DATE('31.12.2022', 'dd.mm.yyyy'))
  and (division_code in ('not_existing_code1', 'not_existing_code2'))
order by trans_dttm desc
 
Index Scan using "ix-transaction-client-trans_dttm-include-division" on transaction  (cost=0.57..8243559.04 rows=240 width=921) (actual time=23920.988..23920.989 rows=0 loops=1)
  Index Cond: ((client_id = '123456'::numeric) AND (trans_dttm >= to_date('01.01.2020'::text, 'dd.mm.yyyy'::text)) AND (trans_dttm <= to_date('31.12.2022'::text, 'dd.mm.yyyy'::text)))
  Filter: ((division_code)::text = ANY ('{not_existing_code1,not_existing_code2}'::text[]))
  Rows Removed by Filter: 10000000
  Buffers: shared hit=8021895 read=2038341
  I/O Timings: read=8902.706
Planning Time: 1.278 ms
Execution Time: 23921.026 ms
 
Yes, I have read about covering indexes in Postgres and why it has to check row visibility. But I do not understand why Postgres prefers to filter 10000000 table rows instead of filtering in the index and then using the visibility map. By the way, the visibility map is up to date:
relpages, reltuples, relallvisible
23478634, 210520464, 23478634
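(The three numbers above come from pg_class; for anyone reproducing this, a sketch of the query, assuming the table is visible on the current search_path:

select relpages, reltuples, relallvisible
from pg_class
where relname = 'transaction';

relallvisible equal to relpages means every page is marked all-visible in the visibility map.)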
 
Thank you in advance,
Dmitry
> Friday, 25 November 2022, 18:40 +03:00, from Tom Lane <tgl@sss.pgh.pa.us>:
> 
> Anna B. < terzi@bk.ru > writes:
> > create index "ix-transaction-client-trans_dttm-division"
> > on transaction (client_id,
> > trans_dttm desc,
> > division_code);
> > 
> > EXPLAIN (ANALYZE, BUFFERS)
> > select *
> > from transaction
> > where client_id = 123456
> > and (trans_dttm between TO_DATE('01.01.2020', 'dd.mm.yyyy') and TO_DATE('31.12.2022', 'dd.mm.yyyy')) and (division_code in
> > ('not_existing_code1', 'not_existing_code2'))
> > order by trans_dttm desc
> > limit 50 offset 0;
> 
> The reason you get a plan like this:
> 
> >   ->  Index Scan using "ix-transaction-client-trans_dttm-division" on transaction  (cost=0.57..8350814.66 rows=28072 width=2675) (actual time=703291.834..703291.835 rows=0 loops=1)
> >         Index Cond: ((client_id = '123456'::numeric) AND (trans_dttm >= to_date('01.01.2020'::text, 'dd.mm.yyyy'::text)) AND (trans_dttm <= to_date('31.12.2022'::text, 'dd.mm.yyyy'::text)))
> >         Filter: ((division_code)::text = ANY ('{not_existing_code1,not_existing_code2}'::text[]))
> 
> is that if the =ANY clause were an index condition, it would result
> in multiple scans of the index, therefore the output would (in all
> probability) not be sorted in index order. To produce the demanded
> result, the plan would have to read the entire index scan and sort
> its output. The planner estimates that that would be slower than
> what it has done here. In practice it looks like you're reading
> the whole scan output anyway because there are less than 50
> matching rows, but the planner didn't know that.
> 
> The problem with =ANY producing unordered output can be dodged if
> the =ANY is on the first index column; but I suppose that does not
> help you here, since making division_code the first index column
> would defeat getting output that's sorted by trans_dttm anyway.
> 
> You might try making extended stats on these three columns to see
> if that helps the planner to get a better rowcount estimate.
> If it understood that there were fewer than 50 matching rows,
> it might opt for the use-the-=ANY-and-sort plan type.
> 
> regards, tom lane 
 
 
 
 






