[prev in list] [next in list] [prev in thread] [next in thread]
List: postgresql-general
Subject: Hash aggregate spilling (v13) / partitions & batches
From: talk to ben <blo.talkto () gmail ! com>
Date: 2020-11-23 16:36:19
Message-ID: CAPE8EZ7npc55G4PnH2sW75+a0wfjd8e6GT6z8o95gqUioh0ZXw () mail ! gmail ! com
[Download RAW message or body]
Hi,
I am testing things on hash aggregate spilling in version 13.1 and am
struggling to understand the partition thing in the two explains below.
My understanding is that a partition corresponds to a spill file which will
be treated in a later batch (which can re-spill in some cases).
Am I right to think that the second explain analyze says that PostgreSQL
was planning for 8 batches (there are 8 planned partitions) and that only
one was necessary (= no spill files) ?
regards
benoit
[local]:5433 postgres@postgres=# CREATE TABLE tableA(ac1 int, ac2 int);
CREATE TABLE
[local]:5433 postgres@postgres=# CREATE TABLE tableB(bc1 int, bc2 int);
CREATE TABLE
[local]:5433 postgres@postgres=# INSERT INTO tableA SELECT x, random()*100
FROM generate_series(1,1000000) AS F(x);
INSERT 0 1000000
[local]:5433 postgres@postgres=# INSERT INTO tableB SELECT mod(x,100000),
random()*100 FROM generate_series(1,1000000) AS F(x);
INSERT 0 1000000
[local]:5433 postgres@postgres=# SELECT name, setting, unit FROM
pg_settings WHERE name IN('work_mem', 'hash_mem_multiplier');
name | setting | unit
---------------------+---------+------
hash_mem_multiplier | 1 | NULL
work_mem | 4096 | kB
(2 rows)
[local]:5433 postgres@postgres=# EXPLAIN (ANALYZE) SELECT ac1, count(ac2),
sum(bc2) FROM tableA INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost7290.50..157056.12 rows00000 width ) (actual
timew3.405..889.020 rows™999 loops=1)
Group Key: tablea.ac1
Planned Partitions: 32 Batches: 33 Memory Usage: 4369kB Disk Usage:
30456kB
-> Hash Join (cost0832.00..70728.00 rows00000 width) (actual
time8.774..583.031 rows™9990 loops=1)
Hash Cond: (tableb.bc1 = tablea.ac1)
-> Seq Scan on tableb (cost=0.00..14425.00 rows00000 width=8)
(actual time=0.023..77.297 rows00000 loops=1)
-> Hash (cost425.00..14425.00 rows00000 width=8) (actual
time8.378..158.379 rows00000 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3471kB
-> Seq Scan on tablea (cost=0.00..14425.00 rows00000
width=8) (actual time=0.010..53.476 rows00000 loops=1)
Planning Time: 0.824 ms
Execution Time: 895.251 ms
(11 rows)
[local]:5433 postgres@postgres=# SET hash_mem_multiplier TO 5;
SET
[local]:5433 postgres@postgres=# EXPLAIN (ANALYZE) SELECT ac1, count(ac2),
sum(bc2) FROM tableA INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost7290.50..157056.12 rows00000 width ) (actual
timei6.684..714.198 rows™999 loops=1)
Group Key: tablea.ac1
Planned Partitions: 8 Batches: 1 Memory Usage: 15633kB
-> Hash Join (cost0832.00..70728.00 rows00000 width) (actual
time1.789..560.692 rows™9990 loops=1)
Hash Cond: (tableb.bc1 = tablea.ac1)
-> Seq Scan on tableb (cost=0.00..14425.00 rows00000 width=8)
(actual time=0.032..78.718 rows00000 loops=1)
-> Hash (cost425.00..14425.00 rows00000 width=8) (actual
time8.592..168.593 rows00000 loops=1)
Buckets: 524288 Batches: 4 Memory Usage: 13854kB
-> Seq Scan on tablea (cost=0.00..14425.00 rows00000
width=8) (actual time=0.018..52.796 rows00000 loops=1)
Planning Time: 0.242 ms
Execution Time: 717.914 ms
(11 rows)
[Attachment #3 (text/html)]
<div dir="ltr"><div>Hi,</div><div><br></div><div>I am testing things on hash \
aggregate spilling in version 13.1 and am struggling to understand the partition \
thing in the two explains below.<br></div><div>My understanding is that a partition \
corresponds to a spill file which will be treated in a later batch (which can \
re-spill in some cases).</div><div><br></div><div>Am I right to think that the second \
explain analyze says that PostgreSQL was planning for 8 batches (there are 8 planned \
partitions) and that only one was necessary (= no spill files) \
?<br></div><div><br></div><div>regards</div><div>benoit<br></div><div><br></div><div>[local]:5433 \
postgres@postgres=# CREATE TABLE tableA(ac1 int, ac2 int);<br>CREATE \
TABLE<br>[local]:5433 postgres@postgres=# CREATE TABLE tableB(bc1 int, bc2 \
int);<br>CREATE TABLE<br><br>[local]:5433 postgres@postgres=# INSERT INTO tableA \
SELECT x, random()*100 FROM generate_series(1,1000000) AS F(x);<br>INSERT 0 \
1000000<br>[local]:5433 postgres@postgres=# INSERT INTO tableB SELECT mod(x,100000), \
random()*100 FROM generate_series(1,1000000) AS F(x);<br>INSERT 0 \
1000000</div><div><br></div><div>[local]:5433 postgres@postgres=# SELECT name, \
setting, unit FROM pg_settings WHERE name IN('work_mem', \
'hash_mem_multiplier');<br> name | setting | \
unit<br>---------------------+---------+------<br> hash_mem_multiplier | 1 \
| NULL<br> work_mem | 4096 | kB<br>(2 rows)<br>[local]:5433 \
postgres@postgres=# EXPLAIN (ANALYZE) SELECT ac1, count(ac2), sum(bc2) FROM tableA \
INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;<br> \
QUERY PLAN<br>-----------------------------------------------------------------------------------------------------------------------------------<br> \
HashAggregate (cost=137290.50..157056.12 rows=1000000 width=20) (actual \
time=773.405..889.020 rows=99999 loops=1)<br> Group Key: tablea.ac1<br> \
Planned Partitions: 32 Batches: 33 Memory Usage: 4369kB Disk Usage: 30456kB<br> \
-> Hash Join (cost=30832.00..70728.00 rows=1000000 width=12) (actual \
time=158.774..583.031 rows=999990 loops=1)<br> Hash Cond: (tableb.bc1 = \
tablea.ac1)<br> -> Seq Scan on tableb (cost=0.00..14425.00 \
rows=1000000 width=8) (actual time=0.023..77.297 rows=1000000 loops=1)<br> \
-> Hash (cost=14425.00..14425.00 rows=1000000 width=8) (actual \
time=158.378..158.379 rows=1000000 loops=1)<br> Buckets: 131072 \
Batches: 16 Memory Usage: 3471kB<br> -> Seq Scan on \
tablea (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.010..53.476 \
rows=1000000 loops=1)<br> Planning Time: 0.824 ms<br> Execution Time: 895.251 \
ms<br>(11 rows)<br><br>[local]:5433 postgres@postgres=# SET hash_mem_multiplier TO \
5;<br>SET<br>[local]:5433 postgres@postgres=# EXPLAIN (ANALYZE) SELECT ac1, \
count(ac2), sum(bc2) FROM tableA INNER JOIN TABLEB ON ac1 = bc1 GROUP BY Ac1;<br> \
QUERY PLAN<br>-----------------------------------------------------------------------------------------------------------------------------------<br> \
HashAggregate (cost=137290.50..157056.12 rows=1000000 width=20) (actual \
time=696.684..714.198 rows=99999 loops=1)<br> Group Key: tablea.ac1<br> \
Planned Partitions: 8 Batches: 1 Memory Usage: 15633kB<br> -> Hash Join \
(cost=30832.00..70728.00 rows=1000000 width=12) (actual time=171.789..560.692 \
rows=999990 loops=1)<br> Hash Cond: (tableb.bc1 = tablea.ac1)<br> \
-> Seq Scan on tableb (cost=0.00..14425.00 rows=1000000 width=8) (actual \
time=0.032..78.718 rows=1000000 loops=1)<br> -> Hash \
(cost=14425.00..14425.00 rows=1000000 width=8) (actual time=168.592..168.593 \
rows=1000000 loops=1)<br> Buckets: 524288 Batches: 4 Memory \
Usage: 13854kB<br> -> Seq Scan on tablea \
(cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.018..52.796 rows=1000000 \
loops=1)<br> Planning Time: 0.242 ms<br> Execution Time: 717.914 ms<br>(11 \
rows)</div><div><br></div></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic