[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb
From:       Merlin Moncure <mmoncure () gmail ! com>
Date:       2017-08-31 15:29:28
Message-ID: CAHyXU0zeziyYOMtgra3VH-P_m8NR+gAfW8LvQiM8cFxGE-BOOQ () mail ! gmail ! com
[Download RAW message or body]

On Wed, Aug 30, 2017 at 9:03 PM, =EC=9C=A0=EC=83=81=EC=A7=80 <y0212@naver.c=
om> wrote:

> I want to get help with Postgresql.
>
> I investigated that Postgresql could be rather fast in an environment
> using a secondary index. but It came up with different results on benckma=
rk.
>
> The database I compared was mariadb, and the benchmark tool was sysbench
> 1.0.8 with the postgresql driver.
>
> Server environment: vmware, Ubuntu 17.04, processor: 4, RAM: 4 GB,
> Harddisk: 40 GB, Mariadb (v10.3), PostgreSQL (v9.6.4)
>
mysql and other systems (for example sql server) use a technique where the
table is automatically clustered around an index-- generally the primary
key.  This technique has some tradeoffs; the main upside is that lookups on
the pkey are somewhat faster whereas lookups on any other index are
somewhat slower and insertions can be slower in certain cases (especially
if guids are the pkey).  I would call this technique 'index organized
table'.  The technique exploits the persistent organization so that the
index is implied and does not have to be kept separate from the heap (the
main table data storage).

postgres 'cluster' command currently is a one time pass over the table that
organizes the table physically in index order but does not maintain the
table in that order nor does it exploit the ordering to eliminate the
primary key index in the manner that other systems do.   From a postgres
point of view, the main advantage is that scans (not single record lookups)
over the key will be sequential physical reads and will tend to have to
read less physical pages since adjacent key records logically will also be
adjacent physically.

For my part, I generally prefer the postgres style of organization for most
workloads, particularly for the surrogate key pattern. I would definitely
like to have the option of having the indexed organized style however.
It's definitely possible to tease out the tradeoffs in synthetic
benchmarking but in the gross aggregate I suspect (but can't obviously
prove) the technique is a loser since as database models mature the kinds
of ways tables are indexed looked up and joined tends to proliferate.

merlin

[Attachment #3 (text/html)]

<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Aug 30, 2017 \
at 9:03 PM,  상지 <span dir="ltr">&lt;<a href="mailto:y0212@naver.com" \
target="_blank">y0212@naver.com</a>&gt;</span> wrote:<br><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"><div \
style="font-size:10pt;font-family:Gulim,sans-serif"><p><span style="font-size:10pt">I \
want to get help with Postgresql.</span><br></p><p>I investigated that Postgresql \
could be rather fast in an environment using a secondary index. but It came up with \
different results on benckmark.</p><p><span style="font-size:10pt">The database I \
compared was mariadb, and the benchmark tool was sysbench 1.0.8 with the postgresql \
driver.</span></p><p>Server environment: vmware, Ubuntu 17.04, processor: 4, RAM: 4 \
GB, Harddisk: 40 GB, Mariadb (v10.3), PostgreSQL \
(v9.6.4)</p></div></blockquote><div>mysql and other systems (for example sql server) \
use a technique where the table is automatically clustered around an index-- \
generally the primary key.   This technique has some tradeoffs; the main upside is \
that lookups on the pkey are somewhat faster whereas lookups on any other index are \
somewhat slower and insertions can be slower in certain cases (especially if guids \
are the pkey).   I would call this technique &#39;index organized table&#39;.   The \
technique exploits the persistent organization so that the index is implied and does \
not have to be kept separate from the heap (the main table data \
storage).</div><div><br></div><div>postgres &#39;cluster&#39; command currently is a \
one time pass over the table that organizes the table physically in index order but \
does not maintain the table in that order nor does it exploit the ordering to \
eliminate the primary key index in the manner that other systems do.    From a \
postgres point of view, the main advantage is that scans (not single record lookups) \
over the key will be sequential physical reads and will tend to have to read less \
physical pages since adjacent key records logically will also be adjacent \
physically.</div><div><br></div><div>For my part, I generally prefer the postgres \
style of organization for most workloads, particularly for the surrogate key pattern. \
I would definitely like to have the option of having the indexed organized style \
however.   It&#39;s definitely possible to tease out the tradeoffs in synthetic \
benchmarking but in the gross aggregate I suspect (but can&#39;t obviously prove) the \
technique is a loser since as database models mature the kinds of ways tables are \
indexed looked up and joined tends to proliferate.  \
</div><div><br></div><div>merlin</div><div>  </div></div></div></div>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic