[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: Avoid excessive inlining?
From:       "Joel Jacobson" <joel () compiler ! org>
Date:       2020-12-22 17:44:57
Message-ID: 84d1730f-8784-4e30-bba5-01a99f3dfcc9 () www ! fastmail ! com
[Download RAW message or body]

Thanks Tom,

this was exactly what I needed to hear.

I guess I recently have become too fond of all the nice new "recent" advanced SQL \
features, such as LATERAL and MATERIALIZED CTEs, now in my possession since I now \
only code on hobby projects, after all the years stuck in an old PostgreSQL version \
in my previous job, to realise that such SQL features are not always a good fit for \
the job at all times.

I rewrote all the slow pure SQL code in PL/pgSQL and got as 568% speed-up in the CBOR \
to JSON converter I'm working on. Thanks also for giving me inspiration on the \
wording for my own commit message:

https://github.com/truthly/pg-cbor/commit/7ea7640f699cdf271ffa9cfb5d059f8141f541ed

Author: Joel Jakobsson <joel@compiler.org>
Date:   Tue Dec 22 18:21:47 2020 +0100

    Optimize by rewriting declarative pure SQL code into imperative PL/pgSQL

    PostgreSQL's SQL language isn't terribly well suited to execute
    a fundamentally stepwise, imperative algorithm like CBOR.

    Rather than hacking up cute tricks with LATERAL, we should just use
    a language that *is* well suited, a PL, like PL/pgSQL.

    -- Pure SQL (before):
    select * from pg_stat_xact_user_functions ;
     schemaname |  funcname  | calls | total_time | self_time
    ------------+------------+-------+------------+-----------
     cbor       | next_item  |    14 |   48.91024 | 38.964918
     cbor       | next_array |     1 |   7.297435 |  1.816102
     cbor       | next_map   |     2 |  40.844352 |    7.8957
     cbor       | to_jsonb   |     1 |  50.222183 |  1.311943

    -- PL/pgSQL (after):
    select * from pg_stat_xact_user_functions ;
     schemaname |   funcname   | calls | total_time | self_time
    ------------+--------------+-------+------------+-----------
     cbor       | next_item    |    14 |   8.021371 |  3.358271
     cbor       | next_array   |     1 |   0.565398 |  0.353071
     cbor       | next_map     |     2 |   5.607702 |  1.324057
     cbor       | to_jsonb     |     1 |   8.823691 |   0.80232

FUNCTIONS/major_type_0.sql      |  23 +++++++++++++++++++++++
FUNCTIONS/major_type_1.sql      |  23 +++++++++++++++++++++++
FUNCTIONS/major_type_2.sql      |  23 +++++++++++++++++++++++
FUNCTIONS/major_type_3.sql      |  23 +++++++++++++++++++++++
FUNCTIONS/major_type_4.sql      |  23 +++++++++++++++++++++++
FUNCTIONS/major_type_5.sql      |  23 +++++++++++++++++++++++
FUNCTIONS/major_type_6.sql      |  40 ++++++++++++++++++++++++++++++++++++++++
FUNCTIONS/major_type_7.sql      |  43 +++++++++++++++++++++++++++++++++++++++++++
FUNCTIONS/next_item.sql         | 109 \
++++++++++++++++++++++++++++++++++++-------------------------------------------------------------------------
 Makefile                        |   8 ++++++++
expected/rfc7049_appendix_a.out |  52 \
+++++++++++++++++++++++++++++++++++++++------------- 11 files changed, 304 \
insertions(+), 86 deletions(-)

Best regards,

Joel

On Tue, Dec 22, 2020, at 17:32, Tom Lane wrote:
"Joel Jacobson" <joel@compiler.org> writes:
> I think I was a bit unclear about my problem, and might have used the wrong \
> terminology. In my LATERAL query, there are calculations in a certain order.
> For each step, "columns" are computed named e.g. "g", "c", "h", "i", etc.
> However, when looking at the query plan, these steps are gone, and instead there is \
> just one huge fully expanded expression, which doesn't look very efficient.

Yeah, this isn't really about function inlining, it's about subquery
flattening (which is similar in some ways, but not the same thing).

Unfortunately, subquery flattening happens early enough in the planner
that there's no chance of making any useful cost comparisons to decide
whether to do it or not.  So we just do it unconditionally.  I'm
not really sure that failing to do it would provide a better outcome
in this situation anyway --- sure, you'd save a few scalar calculations,
but the overhead of running additional plan nodes could outweigh that.

The long and the short of it is that SQL isn't terribly well suited to
execute a fundamentally stepwise, imperative algorithm like this one.
Rather than hacking up cute tricks with LATERAL, you should just use
a language that *is* well suited.  That's why we provide PLs.

FWIW, another trick for inserting optimization fences is WITH.
So you could do something like

WITH Q1(g,c) AS MATERIALIZED
  (SELECT year % 19, year / 100),
Q2(h) AS MATERIALIZED
  (SELECT (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30 FROM Q1),
...
SELECT make_date(year, easter_month, easter_day) FROM Q6;

But I'd bet lunch that that won't be faster for this example,
because there's a lot of overhead in CTEs.

regards, tom lane


Kind regards,

Joel


[Attachment #3 (text/html)]

<!DOCTYPE html><html><head><title></title><style \
type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div>Thanks \
Tom,<br></div><div><br></div><div>this was exactly what I needed to \
hear.<br></div><div><br></div><div>I guess I recently have become too fond of all the \
nice new "recent" advanced SQL features,<br></div><div>such as LATERAL and \
MATERIALIZED CTEs, now in my possession since I now only code on \
hobby<br></div><div>projects, after all the years stuck in an old PostgreSQL version \
in my previous job,<br></div><div>to realise that such SQL features are not always a \
good fit for the job at all times.<br></div><div><br></div><div>I rewrote all the \
slow pure SQL code in PL/pgSQL and got as 568% speed-up in the CBOR to JSON converter \
I'm working on. Thanks also for giving me inspiration on the wording for my own \
commit message:<br></div><div><br></div><div><div><a \
href="https://github.com/truthly/pg-cbor/commit/7ea7640f699cdf271ffa9cfb5d059f8141f541 \
ed">https://github.com/truthly/pg-cbor/commit/7ea7640f699cdf271ffa9cfb5d059f8141f541ed</a><br></div><div><br></div><div>Author: \
Joel Jakobsson &lt;<a \
href="mailto:joel@compiler.org">joel@compiler.org</a>&gt;<br></div></div><div>Date:&nbsp;&nbsp; \
Tue Dec 22 18:21:47 2020 +0100<br></div><div><br></div><div>&nbsp;&nbsp;&nbsp; \
Optimize by rewriting declarative pure SQL code into imperative \
PL/pgSQL<br></div><div><br></div><div>&nbsp;&nbsp;&nbsp; PostgreSQL's SQL language \
isn't terribly well suited to execute<br></div><div>&nbsp;&nbsp;&nbsp; a \
fundamentally stepwise, imperative algorithm like \
CBOR.<br></div><div><br></div><div>&nbsp;&nbsp;&nbsp; Rather than hacking up cute \
tricks with LATERAL, we should just use<br></div><div>&nbsp;&nbsp;&nbsp; a language \
that *is* well suited, a PL, like \
PL/pgSQL.<br></div><div><br></div><div>&nbsp;&nbsp;&nbsp; -- Pure SQL \
(before):<br></div><div>&nbsp;&nbsp;&nbsp; select * from pg_stat_xact_user_functions \
;<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; schemaname |&nbsp; funcname&nbsp; | calls | \
total_time | self_time<br></div><div>&nbsp;&nbsp;&nbsp; \
------------+------------+-------+------------+-----------<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; \
cbor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | next_item&nbsp; |&nbsp;&nbsp;&nbsp; 14 \
|&nbsp;&nbsp; 48.91024 | 38.964918<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; \
cbor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | next_array |&nbsp;&nbsp;&nbsp;&nbsp; 1 \
|&nbsp;&nbsp; 7.297435 |&nbsp; 1.816102<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; \
cbor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | next_map&nbsp;&nbsp; \
|&nbsp;&nbsp;&nbsp;&nbsp; 2 |&nbsp; 40.844352 |&nbsp;&nbsp;&nbsp; \
7.8957<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; \
cbor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | to_jsonb&nbsp;&nbsp; \
|&nbsp;&nbsp;&nbsp;&nbsp; 1 |&nbsp; 50.222183 |&nbsp; \
1.311943<br></div><div><br></div><div>&nbsp;&nbsp;&nbsp; -- PL/pgSQL \
(after):<br></div><div>&nbsp;&nbsp;&nbsp; select * from pg_stat_xact_user_functions \
;<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; schemaname |&nbsp;&nbsp; \
funcname&nbsp;&nbsp; | calls | total_time | \
self_time<br></div><div>&nbsp;&nbsp;&nbsp; \
------------+--------------+-------+------------+-----------<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; \
cbor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | next_item&nbsp;&nbsp;&nbsp; \
|&nbsp;&nbsp;&nbsp; 14 |&nbsp;&nbsp; 8.021371 |&nbsp; \
3.358271<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; \
cbor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | next_array&nbsp;&nbsp; \
|&nbsp;&nbsp;&nbsp;&nbsp; 1 |&nbsp;&nbsp; 0.565398 |&nbsp; \
0.353071<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; \
cbor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | next_map&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp;&nbsp;&nbsp;&nbsp; 2 |&nbsp;&nbsp; 5.607702 |&nbsp; \
1.324057<br></div><div>&nbsp;&nbsp;&nbsp;&nbsp; \
cbor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | to_jsonb&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp;&nbsp;&nbsp;&nbsp; 1 |&nbsp;&nbsp; 8.823691 |&nbsp;&nbsp; \
0.80232<br></div><div><br></div><div>FUNCTIONS/major_type_0.sql&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp; 23 +++++++++++++++++++++++<br></div><div>FUNCTIONS/major_type_1.sql&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp; 23 +++++++++++++++++++++++<br></div><div>FUNCTIONS/major_type_2.sql&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp; 23 +++++++++++++++++++++++<br></div><div>FUNCTIONS/major_type_3.sql&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp; 23 +++++++++++++++++++++++<br></div><div>FUNCTIONS/major_type_4.sql&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp; 23 +++++++++++++++++++++++<br></div><div>FUNCTIONS/major_type_5.sql&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp; 23 +++++++++++++++++++++++<br></div><div>FUNCTIONS/major_type_6.sql&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp; 40 ++++++++++++++++++++++++++++++++++++++++<br></div><div>FUNCTIONS/major_type_7.sql&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp; 43 +++++++++++++++++++++++++++++++++++++++++++<br></div><div>FUNCTIONS/next_item.sql&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
| 109 ++++++++++++++++++++++++++++++++++++-------------------------------------------- \
-----------------------------<br></div><div>Makefile&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbs \
p;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
|&nbsp;&nbsp; 8 ++++++++<br></div><div>expected/rfc7049_appendix_a.out |&nbsp; 52 \
+++++++++++++++++++++++++++++++++++++++-------------<br></div><div>11 files changed, \
304 insertions(+), 86 deletions(-)<br></div><div><br></div><div>Best \
regards,<br></div><div><br></div><div>Joel</div><div><br></div><div>On Tue, Dec 22, \
2020, at 17:32, Tom Lane wrote:<br></div><div>"Joel Jacobson" \
&lt;joel@compiler.org&gt; writes:<br></div><div>&gt; I think I was a bit unclear \
about my problem, and might have used the wrong terminology.<br></div><div>&gt; In my \
LATERAL query, there are calculations in a certain order.<br></div><div>&gt; For each \
step, "columns" are computed named e.g. "g", "c", "h", "i", etc.<br></div><div>&gt; \
However, when looking at the query plan, these steps are gone, and instead there is \
just one huge fully expanded expression, which doesn't look very \
efficient.<br></div><div><br></div><div>Yeah, this isn't really about function \
inlining, it's about subquery<br></div><div>flattening (which is similar in some \
ways, but not the same thing).<br></div><div><br></div><div>Unfortunately, subquery \
flattening happens early enough in the planner<br></div><div>that there's no chance \
of making any useful cost comparisons to decide<br></div><div>whether to do it or \
not.&nbsp; So we just do it unconditionally.&nbsp; I'm<br></div><div>not really sure \
that failing to do it would provide a better outcome<br></div><div>in this situation \
anyway --- sure, you'd save a few scalar calculations,<br></div><div>but the overhead \
of running additional plan nodes could outweigh \
that.<br></div><div><br></div><div>The long and the short of it is that SQL isn't \
terribly well suited to<br></div><div>execute a fundamentally stepwise, imperative \
algorithm like this one.<br></div><div>Rather than hacking up cute tricks with \
LATERAL, you should just use<br></div><div>a language that *is* well suited.&nbsp; \
That's why we provide PLs.<br></div><div><br></div><div>FWIW, another trick for \
inserting optimization fences is WITH.<br></div><div>So you could do something \
like<br></div><div><br></div><div>WITH Q1(g,c) AS MATERIALIZED<br></div><div>&nbsp; \
(SELECT year % 19, year / 100),<br></div><div>Q2(h) AS \
MATERIALIZED<br></div><div>&nbsp; (SELECT (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30 \
FROM Q1),<br></div><div>...<br></div><div>SELECT make_date(year, easter_month, \
easter_day) FROM Q6;<br></div><div><br></div><div>But I'd bet lunch that that won't \
be faster for this example,<br></div><div>because there's a lot of overhead in \
CTEs.<br></div><div><br></div><div>regards, tom \
lane<br></div><div><br></div><div><br></div><div>Kind \
regards,<br></div><div><br></div><div>Joel<br></div><div><br></div></body></html>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic