'Re: [HACKERS] Parallel Seq Scan'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: [HACKERS] Parallel Seq Scan
From:       Thom Brown <thom () linux ! com>
Date:       2014-12-31 16:16:20
Message-ID: CAA-aLv4CRgko6C_KaY1gazS1NwTHY=h-Rq8a-VteGHyDqKHRtg () mail ! gmail ! com
[Download RAW message or body]

On 31 December 2014 at 14:20, Thom Brown <thom@linux.com> wrote:

> On 18 December 2014 at 16:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>>
>>
>> On Thu, Dec 18, 2014 at 9:22 PM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> >
>> > On Mon, Dec 8, 2014 at 10:40 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > >
>> > > On Sat, Dec 6, 2014 at 5:37 PM, Stephen Frost <sfrost@snowman.net>
>> wrote:
>> > > >
>> > >
>> > > So to summarize my understanding, below are the set of things
>> > > which I should work on and in the order they are listed.
>> > >
>> > > 1. Push down qualification
>> > > 2. Performance Data
>> > > 3. Improve the way to push down the information related to worker.
>> > > 4. Dynamic allocation of work for workers.
>> > >
>> > >
>> >
>> > I have worked on the patch to accomplish above mentioned points
>> > 1, 2 and partly 3 and would like to share the progress with community.
>>
>> Sorry forgot to attach updated patch in last mail, attaching it now.
>>
>
> When attempting to recreate the plan in your example, I get an error:
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # create table t1(c1 int, c2 char(500)) with (fillfactor=10);
> CREATE TABLE
> Time: 13.653 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # insert into t1 values(generate_series(1,100),'amit');
> INSERT 0 100
> Time: 4.796 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # explain select c1 from t1;
> ERROR:  could not register background process
> HINT:  You may need to increase max_worker_processes.
> Time: 1.659 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # show max_worker_processes ;
>  max_worker_processes
> ----------------------
>  8
> (1 row)
>
> Time: 0.199 ms
>
> # show parallel_seqscan_degree ;
>  parallel_seqscan_degree
> -------------------------
>  10
> (1 row)
>
>
> Should I really need to increase max_worker_processes to >=
> parallel_seqscan_degree?  If so, shouldn't there be a hint here along with
> the error message pointing this out?  And should the error be produced when
> only a *plan* is being requested?
>
> Also, I noticed that where a table is partitioned, the plan isn't
> parallelised:
>
> # explain select distinct bid from pgbench_accounts;
>
>
>                                        QUERY
> PLAN
>
> ----------------------------------------------------------------------------------------
>  HashAggregate  (cost=1446639.00..1446643.99 rows=499 width=4)
>    Group Key: pgbench_accounts.bid
>    ->  Append  (cost=0.00..1321639.00 rows=50000001 width=4)
>          ->  Seq Scan on pgbench_accounts  (cost=0.00..0.00 rows=1 width=4)
>          ->  Seq Scan on pgbench_accounts_1  (cost=0.00..4279.00
> rows=100000 width=4)
>          ->  Seq Scan on pgbench_accounts_2  (cost=0.00..2640.00
> rows=100000 width=4)
>          ->  Seq Scan on pgbench_accounts_3  (cost=0.00..2640.00
> rows=100000 width=4)
>          ->  Seq Scan on pgbench_accounts_4  (cost=0.00..2640.00
> rows=100000 width=4)
>          ->  Seq Scan on pgbench_accounts_5  (cost=0.00..2640.00
> rows=100000 width=4)
>          ->  Seq Scan on pgbench_accounts_6  (cost=0.00..2640.00
> rows=100000 width=4)
>          ->  Seq Scan on pgbench_accounts_7  (cost=0.00..2640.00
> rows=100000 width=4)
> ...
>          ->  Seq Scan on pgbench_accounts_498  (cost=0.00..2640.00
> rows=100000 width=4)
>          ->  Seq Scan on pgbench_accounts_499  (cost=0.00..2640.00
> rows=100000 width=4)
>          ->  Seq Scan on pgbench_accounts_500  (cost=0.00..2640.00
> rows=100000 width=4)
> (504 rows)
>
> Is this expected?
>

Another issue (FYI, pgbench2 initialised with: pgbench -i -s 100 -F 10
pgbench2):

 ➤ psql://thom@[local]:5488/pgbench2

# explain select distinct bid from pgbench_accounts;
                                        QUERY
PLAN
-------------------------------------------------------------------------------------------
 HashAggregate  (cost=245833.38..245834.38 rows=100 width=4)
   Group Key: bid
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..220833.38
rows=10000000 width=4)
         Number of Workers: 8
         Number of Blocks Per Workers: 208333
(5 rows)

Time: 7.476 ms

 ➤ psql://thom@[local]:5488/pgbench2

# explain (analyse, buffers, verbose) select distinct bid from
pgbench_accounts;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 14897.991 ms

The logs say:

2014-12-31 15:21:42 GMT [9164]: [240-1] user=,db=,client= LOG:  registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [241-1] user=,db=,client= LOG:  registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [242-1] user=,db=,client= LOG:  registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [243-1] user=,db=,client= LOG:  registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [244-1] user=,db=,client= LOG:  registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [245-1] user=,db=,client= LOG:  registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [246-1] user=,db=,client= LOG:  registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [247-1] user=,db=,client= LOG:  registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [248-1] user=,db=,client= LOG:  starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [249-1] user=,db=,client= LOG:  starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [250-1] user=,db=,client= LOG:  starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [251-1] user=,db=,client= LOG:  starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [252-1] user=,db=,client= LOG:  starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [253-1] user=,db=,client= LOG:  starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [254-1] user=,db=,client= LOG:  starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [255-1] user=,db=,client= LOG:  starting
background worker process "backend_worker"
2014-12-31 15:21:46 GMT [9164]: [256-1] user=,db=,client= LOG:  worker
process: backend_worker (PID 10887) exited with exit code 1
2014-12-31 15:21:46 GMT [9164]: [257-1] user=,db=,client= LOG:
unregistering background worker "backend_worker"
2014-12-31 15:21:50 GMT [9164]: [258-1] user=,db=,client= LOG:  worker
process: backend_worker (PID 10888) exited with exit code 1
2014-12-31 15:21:50 GMT [9164]: [259-1] user=,db=,client= LOG:
unregistering background worker "backend_worker"
2014-12-31 15:21:57 GMT [9164]: [260-1] user=,db=,client= LOG:  server
process (PID 10869) was terminated by signal 9: Killed
2014-12-31 15:21:57 GMT [9164]: [261-1] user=,db=,client= DETAIL:  Failed
process was running: explain (analyse, buffers, verbose) select distinct
bid from pgbench_accounts;
2014-12-31 15:21:57 GMT [9164]: [262-1] user=,db=,client= LOG:  terminating
any other active server processes

Running it again, I get the same issue.  This is with
parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too.

This doesn't happen if I set the pgbench scale to 50.  I suspect this is a
OOM issue.  My laptop has 16GB RAM, the table is around 13GB at scale 100,
and I don't have swap enabled.  But I'm concerned it crashes the whole
instance.

I also notice that requesting BUFFERS in a parallel EXPLAIN output yields
no such information.  Is that not possible to report?
-- 
Thom

[Attachment #3 (text/html)]

<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 31 December 2014 \
at 14:20, Thom Brown <span dir="ltr">&lt;<a href="mailto:thom@linux.com" \
target="_blank">thom@linux.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div \
class="gmail_quote"><div><div class="h5">On 18 December 2014 at 16:03, Amit Kapila \
<span dir="ltr">&lt;<a href="mailto:amit.kapila16@gmail.com" \
target="_blank">amit.kapila16@gmail.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span><br><br>On Thu, Dec 18, 2014 \
at 9:22 PM, Amit Kapila &lt;<a href="mailto:amit.kapila16@gmail.com" \
target="_blank">amit.kapila16@gmail.com</a>&gt; wrote:<br>&gt;<br>&gt; On Mon, Dec 8, \
2014 at 10:40 AM, Amit Kapila &lt;<a href="mailto:amit.kapila16@gmail.com" \
target="_blank">amit.kapila16@gmail.com</a>&gt; wrote:<br>&gt; &gt;<br>&gt; &gt; On \
Sat, Dec 6, 2014 at 5:37 PM, Stephen Frost &lt;<a href="mailto:sfrost@snowman.net" \
target="_blank">sfrost@snowman.net</a>&gt; wrote:<br>&gt; &gt; &gt;<br>&gt; \
&gt;<br>&gt; &gt; So to summarize my understanding, below are the set of \
things<br>&gt; &gt; which I should work on and in the order they are listed.<br>&gt; \
&gt;<br>&gt; &gt; 1. Push down qualification<br>&gt; &gt; 2. Performance Data<br>&gt; \
&gt; 3. Improve the way to push down the information related to worker.<br>&gt; &gt; \
4. Dynamic allocation of work for workers.<br>&gt; &gt;<br>&gt; &gt;<br>&gt;<br>&gt; \
I have worked on the patch to accomplish above mentioned points<br>&gt; 1, 2 and \
partly 3 and would like to share the progress with community.<div><br><div \
class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" \
style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"> </blockquote></div></div></div></span><div>Sorry \
forgot to attach updated patch in last mail, attaching it \
now.</div></div></blockquote><div><br></div></div></div><div>When attempting to \
recreate the plan in your example, I get an error:<br><br>  ➤ \
psql://thom@[local]:5488/pgbench <br><span class=""><br># create table t1(c1 int, c2 \
char(500)) with (fillfactor=10);<br>CREATE TABLE<br></span>Time: 13.653 ms<br><br>  \
➤ psql://thom@[local]:5488/pgbench <br><span class=""><br># insert into t1 \
values(generate_series(1,100),&#39;amit&#39;);<br>INSERT 0 100<br></span>Time: 4.796 \
ms<br><br>  ➤ psql://thom@[local]:5488/pgbench <br><span class=""><br># explain \
select c1 from t1;<br></span>ERROR:   could not register background process<br>HINT:  \
You may need to increase max_worker_processes.<br>Time: 1.659 ms<br><br>  ➤ \
psql://thom@[local]:5488/pgbench <br><br># show max_worker_processes ;<br>  \
max_worker_processes <br>----------------------<br>  8<br>(1 row)<br><br>Time: 0.199 \
ms<br><br># show parallel_seqscan_degree ;<br>  parallel_seqscan_degree \
<br>-------------------------<br>  10<br>(1 row)<br><br></div><div><br>Should I \
really need to increase max_worker_processes to &gt;= parallel_seqscan_degree?   If \
so, shouldn&#39;t there be a hint here along with the error message pointing this \
out?   And should the error be produced when only a *plan* is being \
requested?<br><br></div><div>Also, I noticed that where a table is partitioned, the \
plan isn&#39;t parallelised:<br><br># explain select distinct bid from \
pgbench_accounts;<br><br><br>                                                         \
QUERY PLAN                                                                            \
<br>----------------------------------------------------------------------------------------<br> \
HashAggregate   (cost=1446639.00..1446643.99 rows=499 width=4)<br>     Group Key: \
pgbench_accounts.bid<br>     -&gt;   Append   (cost=0.00..1321639.00 rows=50000001 \
width=4)<br>                 -&gt;   Seq Scan on pgbench_accounts   (cost=0.00..0.00 \
rows=1 width=4)<br>                 -&gt;   Seq Scan on pgbench_accounts_1   \
(cost=0.00..4279.00 rows=100000 width=4)<br>                 -&gt;   Seq Scan on \
pgbench_accounts_2   (cost=0.00..2640.00 rows=100000 width=4)<br>                 \
-&gt;   Seq Scan on pgbench_accounts_3   (cost=0.00..2640.00 rows=100000 width=4)<br> \
-&gt;   Seq Scan on pgbench_accounts_4   (cost=0.00..2640.00 rows=100000 width=4)<br> \
-&gt;   Seq Scan on pgbench_accounts_5   (cost=0.00..2640.00 rows=100000 width=4)<br> \
-&gt;   Seq Scan on pgbench_accounts_6   (cost=0.00..2640.00 rows=100000 width=4)<br> \
-&gt;   Seq Scan on pgbench_accounts_7   (cost=0.00..2640.00 rows=100000 \
width=4)<br>...<br>                 -&gt;   Seq Scan on pgbench_accounts_498   \
(cost=0.00..2640.00 rows=100000 width=4)<br>                 -&gt;   Seq Scan on \
pgbench_accounts_499   (cost=0.00..2640.00 rows=100000 width=4)<br>                 \
-&gt;   Seq Scan on pgbench_accounts_500   (cost=0.00..2640.00 rows=100000 \
width=4)<br>(504 rows)<br><br></div><div>Is this expected?<span class=""><font \
color="#888888"><br></font></span></div></div></div></div></blockquote><div><br></div><div>Another \
issue (FYI, pgbench2 initialised with: pgbench -i -s 100 -F 10 pgbench2):<br><br>  \
➤ psql://thom@[local]:5488/pgbench2 <br><br># explain select distinct bid from \
pgbench_accounts;<br>                                                                 \
QUERY PLAN                                                                            \
<br>-------------------------------------------------------------------------------------------<br> \
HashAggregate   (cost=245833.38..245834.38 rows=100 width=4)<br>     Group Key: \
bid<br>     -&gt;   Parallel Seq Scan on pgbench_accounts   (cost=0.00..220833.38 \
rows=10000000 width=4)<br>                 Number of Workers: 8<br>                 \
Number of Blocks Per Workers: 208333<br>(5 rows)<br><br>Time: 7.476 ms<br><br>  ➤ \
psql://thom@[local]:5488/pgbench2 <br><br># explain (analyse, buffers, verbose) \
select distinct bid from pgbench_accounts;<br>server closed the connection \
unexpectedly<br>       This probably means the server terminated abnormally<br>       \
before or while processing the request.<br>The connection to the server was lost. \
Attempting reset: Failed.<br>Time: 14897.991 ms<br><br></div><div>The logs \
say:<br><br>2014-12-31 15:21:42 GMT [9164]: [240-1] user=,db=,client= LOG:   \
registering background worker &quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT \
[9164]: [241-1] user=,db=,client= LOG:   registering background worker \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [242-1] \
user=,db=,client= LOG:   registering background worker \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [243-1] \
user=,db=,client= LOG:   registering background worker \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [244-1] \
user=,db=,client= LOG:   registering background worker \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [245-1] \
user=,db=,client= LOG:   registering background worker \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [246-1] \
user=,db=,client= LOG:   registering background worker \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [247-1] \
user=,db=,client= LOG:   registering background worker \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [248-1] \
user=,db=,client= LOG:   starting background worker process \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [249-1] \
user=,db=,client= LOG:   starting background worker process \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [250-1] \
user=,db=,client= LOG:   starting background worker process \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [251-1] \
user=,db=,client= LOG:   starting background worker process \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [252-1] \
user=,db=,client= LOG:   starting background worker process \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [253-1] \
user=,db=,client= LOG:   starting background worker process \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [254-1] \
user=,db=,client= LOG:   starting background worker process \
&quot;backend_worker&quot;<br>2014-12-31 15:21:42 GMT [9164]: [255-1] \
user=,db=,client= LOG:   starting background worker process \
&quot;backend_worker&quot;<br>2014-12-31 15:21:46 GMT [9164]: [256-1] \
user=,db=,client= LOG:   worker process: backend_worker (PID 10887) exited with exit \
code 1<br>2014-12-31 15:21:46 GMT [9164]: [257-1] user=,db=,client= LOG:   \
unregistering background worker &quot;backend_worker&quot;<br>2014-12-31 15:21:50 GMT \
[9164]: [258-1] user=,db=,client= LOG:   worker process: backend_worker (PID 10888) \
exited with exit code 1<br>2014-12-31 15:21:50 GMT [9164]: [259-1] user=,db=,client= \
LOG:   unregistering background worker &quot;backend_worker&quot;<br>2014-12-31 \
15:21:57 GMT [9164]: [260-1] user=,db=,client= LOG:   server process (PID 10869) was \
terminated by signal 9: Killed<br>2014-12-31 15:21:57 GMT [9164]: [261-1] \
user=,db=,client= DETAIL:   Failed process was running: explain (analyse, buffers, \
verbose) select distinct bid from pgbench_accounts;<br>2014-12-31 15:21:57 GMT \
[9164]: [262-1] user=,db=,client= LOG:   terminating any other active server \
processes<br></div></div><br></div><div class="gmail_extra">Running it again, I get \
the same issue.   This is with parallel_seqscan_degree set to 8, and the crash occurs \
with 4 and 2 too.<br></div><div class="gmail_extra"><br></div><div \
class="gmail_extra">This doesn&#39;t happen if I set the pgbench scale to 50.   I \
suspect this is a OOM issue.   My laptop has 16GB RAM, the table is around 13GB at \
scale 100, and I don&#39;t have swap enabled.   But I&#39;m concerned it crashes the \
whole instance.<br><br></div><div class="gmail_extra">I also notice that requesting \
BUFFERS in a parallel EXPLAIN output yields no such information.   Is that not \
possible to report?<br></div><div class="gmail_extra">-- <br><div \
class="gmail_signature">Thom</div> </div></div>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic