[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: [HACKERS] [PATCH] Incremental sort
From:       Alexander Korotkov <a.korotkov () postgrespro ! ru>
Date:       2017-09-30 20:20:19
Message-ID: CAPpHfdtOiP8tQ8xUxo+YZTSJR65JqFYo63nxDJBS7y_mpW9agw () mail ! gmail ! com
[Download RAW message or body]

On Sat, Sep 16, 2017 at 2:46 AM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:

> On Thu, Sep 14, 2017 at 2:48 AM, Alexander Korotkov <
> a.korotkov@postgrespro.ru> wrote:
>
>> Patch rebased to current master is attached.  I'm going to improve my
>> testing script and post new results.
>>
>
> New benchmarking script and results are attached.  There new dataset
> parameter is introduced: skew factor.  Skew factor defines skew in
> distribution of groups sizes.
> My idea of generating is just usage of power function where power is from
> 0 to 1.  Following formula is used to get group number for particular item
> number i.
> [((i / number_of_indexes) ^ power) * number_of_groups]
> For example, power = 1/6 gives following distribution of groups sizes:
> group number    group size
> 0               2
> 1               63
> 2               665
> 3               3367
> 4               11529
> 5               31031
> 6               70993
> 7               144495
> 8               269297
> 9               468558
>
> For convenience, instead of power itself, I use skew factor where power =
> 1.0 / (1.0 + skew).  Therefore, with skew = 0.0, distribution of groups
> sizes is uniform.  Larger skew gives more skewed distribution (and that
> seems to be quite intuitive).  For, negative skew, group sizes are mirrored
> as for corresponding positive skew.  For example, skew factor = -5.0 gives
> following groups sizes distribution:
> group number    group size
> 0               468558
> 1               269297
> 2               144495
> 3               70993
> 4               31031
> 5               11529
> 6               3367
> 7               665
> 8               63
> 9               2
>
> Results shows that between 2172 test cases, in 2113 incremental sort gives
> speedup while in 59 it causes slowdown.  The following 4 test cases show
> most significant slowdown (>10% of time).
>
> Table                   GroupedCols GroupCount Skew PreorderedFrac
> FullSortMedian IncSortMedian TimeChangePercent
> int4|int4|numeric       1                  100  -10              0
> 1.5688240528  2.0607631207             31.36
> text|int8|text|int4     1                    1    0              0
> 1.7785198689 <(778)%20519-8689>  2.1816160679             22.66
> int8|int8|int4          1                   10  -10              0
>  1.136412859  1.3166360855             15.86
> numeric|text|int4|int8  2                   10  -10              1
> 0.4403841496  0.5070910454             15.15
>
> As you can see, 3 of this 4 test cases have skewed distribution while one
> of them is related to costly location-aware comparison of text.  I've no
> particular idea of how to cope these slowdowns.  Probably, it's OK to have
> slowdown in some cases while have speedup in majority of cases (assuming
> there is an option to turn off new behavior).  Probably, we should teach
> optimizer more about skewed distributions of groups, but that doesn't seem
> feasible for me.
>
> Any thoughts?
>

BTW, replacement selection sort was removed by 8b304b8b.  I think it worth
to rerun benchmarks after that, because results might be changed.  Will do.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

[Attachment #3 (text/html)]

<div dir="ltr"><div class="gmail_extra"><div><div \
class="gmail-m_-976228356640248636gmail_signature">On Sat, Sep 16, 2017 at 2:46 AM, \
Alexander Korotkov <span dir="ltr">&lt;<a href="mailto:a.korotkov@postgrespro.ru" \
target="_blank">a.korotkov@postgrespro.ru</a>&gt;</span> wrote:<br></div></div><div \
class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div \
class="gmail_extra"><div class="gmail_quote"><span \
class="gmail-m_-976228356640248636gmail-">On Thu, Sep 14, 2017 at 2:48 AM, Alexander \
Korotkov <span dir="ltr">&lt;<a href="mailto:a.korotkov@postgrespro.ru" \
target="_blank">a.korotkov@postgrespro.ru</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div \
class="gmail_quote"><div>Patch rebased to current master is attached.   I&#39;m going \
to improve my testing script and post new results.  \
</div></div></div></div></blockquote><br \
class="gmail-m_-976228356640248636gmail-m_2358389376549298690gmail-m_-6733953979613587979gmail-Apple-interchange-newline"></span>New \
benchmarking script and results are attached.   There new dataset parameter is \
introduced: skew factor.   Skew factor defines skew in distribution of groups \
sizes.</div><div class="gmail_quote">My idea of generating is just usage of power \
function where power is from 0 to 1.   Following formula is used to get group number \
for particular item number i.</div><div class="gmail_quote">[((i / number_of_indexes) \
^ power) * number_of_groups]<br></div><div class="gmail_quote">For example, power = \
1/6 gives following distribution of groups sizes:</div><div class="gmail_quote"><div \
class="gmail_quote"><span style="font-family:monospace,monospace">group number      \
group size</span><br></div><div class="gmail_quote"><font face="monospace, \
monospace">0                      2</font></div><div class="gmail_quote"><font \
face="monospace, monospace">1</font><span style="font-family:monospace,monospace">  \
</span><span style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">63</span></div><div class="gmail_quote"><font \
face="monospace, monospace">2</font><span style="font-family:monospace,monospace">  \
</span><span style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">665</span></div><div \
class="gmail_quote"><font face="monospace, monospace">3</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">3367</span></div><div \
class="gmail_quote"><font face="monospace, monospace">4</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">11529</span></div><div \
class="gmail_quote"><font face="monospace, monospace">5</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">31031</span></div><div \
class="gmail_quote"><font face="monospace, monospace">6</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">70993</span></div><div \
class="gmail_quote"><font face="monospace, monospace">7</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">144495</span></div><div \
class="gmail_quote"><font face="monospace, monospace">8</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">269297</span></div><div \
class="gmail_quote"><font face="monospace, monospace">9</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">468558</span></div><div><br></div></div><div \
class="gmail_quote">For convenience, instead of power itself, I use skew factor where \
power = 1.0 / (1.0 + skew).   Therefore, with skew = 0.0, distribution of groups \
sizes is uniform.   Larger skew gives more skewed distribution (and that seems to be \
quite intuitive).   For, negative skew, group sizes are mirrored as for corresponding \
positive skew.   For example, skew factor = -5.0 gives following groups sizes \
distribution:</div><div class="gmail_quote"><div class="gmail_quote"><span \
style="font-family:monospace,monospace">group number      group \
size</span><br></div><div class="gmail_quote"><font face="monospace, \
monospace">0</font><span style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">468558</span></div><div \
class="gmail_quote"><font face="monospace, monospace">1</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">269297</span></div><div \
class="gmail_quote"><font face="monospace, monospace">2</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">144495</span></div><div \
class="gmail_quote"><font face="monospace, monospace">3</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">70993</span></div><div \
class="gmail_quote"><font face="monospace, monospace">4</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">31031</span></div><div \
class="gmail_quote"><font face="monospace, monospace">5</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">11529</span></div><div \
class="gmail_quote"><font face="monospace, monospace">6</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">3367</span></div><div \
class="gmail_quote"><font face="monospace, monospace">7</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">665</span></div><div \
class="gmail_quote"><font face="monospace, monospace">8</font><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">63</span></div><div class="gmail_quote"><font \
face="monospace, monospace">9</font><span style="font-family:monospace,monospace">  \
</span><span style="font-family:monospace,monospace">                    </span><span \
style="font-family:monospace,monospace">  </span><span \
style="font-family:monospace,monospace">2</span></div><div><font face="monospace, \
monospace"><br></font></div></div><div class="gmail_quote">Results shows that between \
2172 test cases, in 2113 incremental sort gives speedup while in 59 it causes \
slowdown.   The following 4 test cases show most significant slowdown (&gt;10% of \
time).</div><div class="gmail_quote"><br></div><div class="gmail_quote"><div \
class="gmail_quote"><font face="monospace, monospace">Table                           \
GroupedCols GroupCount Skew PreorderedFrac FullSortMedian IncSortMedian \
TimeChangePercent</font></div><div class="gmail_quote"><font face="monospace, \
monospace">int4|int4|numeric          1                           100   -10           \
0    1.5688240528   2.0607631207                   31.36</font></div><div \
class="gmail_quote"><font face="monospace, monospace">text|int8|text|int4       1     \
1      0                     0    <a href="tel:(778)%20519-8689" value="+17785198689" \
target="_blank">1.7785198689</a>   2.1816160679                   \
22.66</font></div><div class="gmail_quote"><font face="monospace, \
monospace">int8|int8|int4               1                            10   -10         \
0      1.136412859   1.3166360855                   15.86</font></div><div \
class="gmail_quote"><font face="monospace, monospace">numeric|text|int4|int8   2      \
10   -10                     1    0.4403841496   0.5070910454                   \
15.15</font></div><div><br></div></div><div class="gmail_quote">As you can see, 3 of \
this 4 test cases have skewed distribution while one of them is related to costly \
location-aware comparison of text.   I&#39;ve no particular idea of how to cope these \
slowdowns.   Probably, it&#39;s OK to have slowdown in some cases while have speedup \
in majority of cases (assuming there is an option to turn off new behavior).   \
Probably, we should teach optimizer more about skewed distributions of groups, but \
that doesn&#39;t seem feasible for me.</div><div class="gmail_quote"><br></div><div \
class="gmail_quote">Any \
thoughts?</div></div></div></blockquote><div><br></div><div>BTW, replacement \
selection sort was removed by  8b304b8b.   I think it worth to rerun benchmarks after \
that, because results might be changed.   Will do.</div><div><br></div><div><div \
class="gmail-m_-976228356640248636gmail_signature">------<br>Alexander \
Korotkov<br>Postgres Professional:  <a href="http://www.postgrespro.com/" \
target="_blank">http://www.<wbr>postgrespro.com</a><br>The Russian Postgres Company  \
</div></div></div></div></div>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic