'Hash aggregate spilling (v13) / partitions & batches'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Hash aggregate spilling (v13) / partitions & batches
From:       talk to ben <blo.talkto () gmail ! com>
Date:       2020-11-23 16:36:19
Message-ID: CAPE8EZ7npc55G4PnH2sW75+a0wfjd8e6GT6z8o95gqUioh0ZXw () mail ! gmail ! com
[Download RAW message or body]

Hi,

I am testing things on hash aggregate spilling in version 13.1 and am
struggling to understand the partition thing in the two explains below.
My understanding is that a partition corresponds to a spill file which will
be treated in a later batch (which can re-spill in some cases).

Am I right to think that the second explain analyze says that PostgreSQL
was planning for 8 batches (there are 8 planned partitions) and that only
one was necessary (= no spill files) ?

regards
benoit

[local]:5433 postgres@postgres=# CREATE TABLE tableA(ac1 int, ac2 int);
CREATE TABLE
[local]:5433 postgres@postgres=# CREATE TABLE tableB(bc1 int, bc2 int);
CREATE TABLE

[local]:5433 postgres@postgres=# INSERT INTO tableA SELECT x, random()*100
FROM generate_series(1,1000000) AS F(x);
INSERT 0 1000000
[local]:5433 postgres@postgres=# INSERT INTO tableB SELECT mod(x,100000),
random()*100 FROM generate_series(1,1000000) AS F(x);
INSERT 0 1000000

[local]:5433 postgres@postgres=# SELECT name, setting, unit FROM
pg_settings WHERE name IN('work_mem', 'hash_mem_multiplier');
        name         | setting | unit
---------------------+---------+------
 hash_mem_multiplier | 1       | NULL
 work_mem            | 4096    | kB
(2 rows)
[local]:5433 postgres@postgres=# EXPLAIN (ANALYZE) SELECT ac1, count(ac2),
sum(bc2) FROM tableA INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost7290.50..157056.12 rows00000 width ) (actual
timew3.405..889.020 rows™999 loops=1)
   Group Key: tablea.ac1
   Planned Partitions: 32  Batches: 33  Memory Usage: 4369kB  Disk Usage:
30456kB
   ->  Hash Join  (cost0832.00..70728.00 rows00000 width) (actual
time8.774..583.031 rows™9990 loops=1)
         Hash Cond: (tableb.bc1 = tablea.ac1)
         ->  Seq Scan on tableb  (cost=0.00..14425.00 rows00000 width=8)
(actual time=0.023..77.297 rows00000 loops=1)
         ->  Hash  (cost425.00..14425.00 rows00000 width=8) (actual
time8.378..158.379 rows00000 loops=1)
               Buckets: 131072  Batches: 16  Memory Usage: 3471kB
               ->  Seq Scan on tablea  (cost=0.00..14425.00 rows00000
width=8) (actual time=0.010..53.476 rows00000 loops=1)
 Planning Time: 0.824 ms
 Execution Time: 895.251 ms
(11 rows)

[local]:5433 postgres@postgres=# SET hash_mem_multiplier TO 5;
SET
[local]:5433 postgres@postgres=# EXPLAIN (ANALYZE) SELECT ac1, count(ac2),
sum(bc2) FROM tableA INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost7290.50..157056.12 rows00000 width ) (actual
timei6.684..714.198 rows™999 loops=1)
   Group Key: tablea.ac1
   Planned Partitions: 8  Batches: 1  Memory Usage: 15633kB
   ->  Hash Join  (cost0832.00..70728.00 rows00000 width) (actual
time1.789..560.692 rows™9990 loops=1)
         Hash Cond: (tableb.bc1 = tablea.ac1)
         ->  Seq Scan on tableb  (cost=0.00..14425.00 rows00000 width=8)
(actual time=0.032..78.718 rows00000 loops=1)
         ->  Hash  (cost425.00..14425.00 rows00000 width=8) (actual
time8.592..168.593 rows00000 loops=1)
               Buckets: 524288  Batches: 4  Memory Usage: 13854kB
               ->  Seq Scan on tablea  (cost=0.00..14425.00 rows00000
width=8) (actual time=0.018..52.796 rows00000 loops=1)
 Planning Time: 0.242 ms
 Execution Time: 717.914 ms
(11 rows)


[Attachment #3 (text/html)]

<div dir="ltr"><div>Hi,</div><div><br></div><div>I am testing things on hash \
aggregate spilling in version 13.1 and am struggling to understand the partition \
thing in the two explains below.<br></div><div>My understanding is that a partition \
corresponds to a spill file which will be treated in a later batch (which can \
re-spill in some cases).</div><div><br></div><div>Am I right to think that the second \
explain analyze says that PostgreSQL was planning for 8 batches (there are 8 planned \
partitions) and that only one was necessary (= no spill files) \
?<br></div><div><br></div><div>regards</div><div>benoit<br></div><div><br></div><div>[local]:5433 \
postgres@postgres=# CREATE TABLE tableA(ac1 int, ac2 int);<br>CREATE \
TABLE<br>[local]:5433 postgres@postgres=# CREATE TABLE tableB(bc1 int, bc2 \
int);<br>CREATE TABLE<br><br>[local]:5433 postgres@postgres=# INSERT INTO tableA \
SELECT x, random()*100 FROM generate_series(1,1000000) AS F(x);<br>INSERT 0 \
1000000<br>[local]:5433 postgres@postgres=# INSERT INTO tableB SELECT mod(x,100000), \
random()*100 FROM generate_series(1,1000000) AS F(x);<br>INSERT 0 \
1000000</div><div><br></div><div>[local]:5433 postgres@postgres=# SELECT name, \
setting, unit FROM pg_settings WHERE name IN(&#39;work_mem&#39;, \
&#39;hash_mem_multiplier&#39;);<br>            name             | setting | \
unit<br>---------------------+---------+------<br>  hash_mem_multiplier | 1          \
| NULL<br>  work_mem                  | 4096      | kB<br>(2 rows)<br>[local]:5433 \
postgres@postgres=# EXPLAIN (ANALYZE) SELECT ac1, count(ac2), sum(bc2) FROM tableA \
INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;<br>                                      \
QUERY PLAN<br>-----------------------------------------------------------------------------------------------------------------------------------<br> \
HashAggregate   (cost=137290.50..157056.12 rows=1000000 width=20) (actual \
time=773.405..889.020 rows=99999 loops=1)<br>     Group Key: tablea.ac1<br>     \
Planned Partitions: 32   Batches: 33   Memory Usage: 4369kB   Disk Usage: 30456kB<br> \
-&gt;   Hash Join   (cost=30832.00..70728.00 rows=1000000 width=12) (actual \
time=158.774..583.031 rows=999990 loops=1)<br>              Hash Cond: (tableb.bc1 = \
tablea.ac1)<br>              -&gt;   Seq Scan on tableb   (cost=0.00..14425.00 \
rows=1000000 width=8) (actual time=0.023..77.297 rows=1000000 loops=1)<br>            \
-&gt;   Hash   (cost=14425.00..14425.00 rows=1000000 width=8) (actual \
time=158.378..158.379 rows=1000000 loops=1)<br>                       Buckets: 131072 \
Batches: 16   Memory Usage: 3471kB<br>                       -&gt;   Seq Scan on \
tablea   (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.010..53.476 \
rows=1000000 loops=1)<br>  Planning Time: 0.824 ms<br>  Execution Time: 895.251 \
ms<br>(11 rows)<br><br>[local]:5433 postgres@postgres=# SET hash_mem_multiplier TO \
5;<br>SET<br>[local]:5433 postgres@postgres=# EXPLAIN (ANALYZE) SELECT ac1, \
count(ac2), sum(bc2) FROM tableA INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;<br>     \
QUERY PLAN<br>-----------------------------------------------------------------------------------------------------------------------------------<br> \
HashAggregate   (cost=137290.50..157056.12 rows=1000000 width=20) (actual \
time=696.684..714.198 rows=99999 loops=1)<br>     Group Key: tablea.ac1<br>     \
Planned Partitions: 8   Batches: 1   Memory Usage: 15633kB<br>     -&gt;   Hash Join  \
(cost=30832.00..70728.00 rows=1000000 width=12) (actual time=171.789..560.692 \
rows=999990 loops=1)<br>              Hash Cond: (tableb.bc1 = tablea.ac1)<br>        \
-&gt;   Seq Scan on tableb   (cost=0.00..14425.00 rows=1000000 width=8) (actual \
time=0.032..78.718 rows=1000000 loops=1)<br>              -&gt;   Hash   \
(cost=14425.00..14425.00 rows=1000000 width=8) (actual time=168.592..168.593 \
rows=1000000 loops=1)<br>                       Buckets: 524288   Batches: 4   Memory \
Usage: 13854kB<br>                       -&gt;   Seq Scan on tablea   \
(cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.018..52.796 rows=1000000 \
loops=1)<br>  Planning Time: 0.242 ms<br>  Execution Time: 717.914 ms<br>(11 \
rows)</div><div><br></div></div>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic