List:       pgsql-performance
Subject:    Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows
From:       Lars Aksel Opsahl <Lars.Opsahl@nibio.no>
Date:       2016-10-25 20:52:05
Message-ID: 1511d5311a414c49b9cc061b84e04a3d@nibio.no


Hi

I have now tested both insert and update, and it works extremely well without doing any partitioning of the big table (4.288.128.392 rows) in this case, when we use a common index as Tom suggested.
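
In concrete terms the index is the one Tom describes in the quoted mail below; a minimal sketch, using the table and column names from the original question (the exact definition depends on the real schema):

  -- Composite index covering both join columns:
  CREATE INDEX nora_bc25_observation_point_epoch_idx
      ON nora_bc25_observation (point_uid_ref, epoch);

  -- or, as Tom suggests, a primary key / unique index if the pair
  -- really identifies a row:
  -- ALTER TABLE nora_bc25_observation
  --     ADD PRIMARY KEY (point_uid_ref, epoch);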

We are able to insert 172.000 rows per second.

The rate is computed from the total time from when we start to read the CSV files until the last file is done. We use GNU parallel and run 5 threads. The number of inserts is actually 172.000 * 2, because we first copy the rows into a temp table, prepare the data there, and then insert them into the common big main table. There are no errors in the log.
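
For each CSV file the load is basically a copy into a temp table, a prepare step, and an insert into the main table, with GNU parallel running five such jobs at a time, one per file. A minimal sketch of one batch (the temp table name, file path and prepare step are only illustrative):

  -- Staging table with the same shape as the main table:
  CREATE TEMP TABLE obs_tmp (LIKE nora_bc25_observation INCLUDING DEFAULTS);

  -- Bulk load one CSV file (path is illustrative):
  COPY obs_tmp FROM '/data/in/observations_0001.csv' WITH (FORMAT csv, HEADER true);

  -- "prepare the data" step, e.g. filling in derived columns, goes here.

  -- Move the prepared rows into the common big main table:
  INSERT INTO nora_bc25_observation SELECT * FROM obs_tmp;

  DROP TABLE obs_tmp;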

We are able to update 98.000 rows per second.

Since each update also means one insert, we are close to 200.000 inserts and updates per second. For the update we give a column that is NULL a value; that is done for all the 4.3 billion rows. We run 5 threads in parallel here also, and there are no errors and no deadlocks.
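
One simple way to run five such update sessions in parallel without deadlocks is to give each worker its own disjoint key range, so no two sessions ever touch the same rows. A sketch of one worker's statement (the column some_value and the range bounds are invented for illustration; the real split may be different):

  -- Worker 1 of 5; each worker gets its own point_uid_ref range.
  UPDATE nora_bc25_observation
     SET some_value = 0
   WHERE some_value IS NULL
     AND point_uid_ref >= 0
     AND point_uid_ref <  1000000000;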

The problem with duplication of indexes is solvable in this project, because we first add the data and then do the analyses; this means we can have one set of indexes when adding data and another when using it.
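
In practice that can be as simple as dropping the analysis indexes before a big load and recreating them afterwards; a sketch with invented index and column names (the spatial index assumes a PostGIS geometry column):

  -- Before a big load: drop the indexes only the analyses need.
  DROP INDEX IF EXISTS nora_bc25_observation_geom_idx;

  -- ... bulk load / update runs here ...

  -- Before the analyses: rebuild them.
  CREATE INDEX nora_bc25_observation_geom_idx
      ON nora_bc25_observation USING gist (geom);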

In this project we are going to add about 25 billion geo-located observations, which will be used for doing analyses. I suppose that at some level we will have to do partitioning, but so far Postgres has worked extremely well, even though it's based on MVCC.
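
If partitioning does become necessary, the inheritance-based partitioning available in Postgres at the time is one option; a minimal sketch, where the child table, the choice of epoch as partition key and the bounds are all assumptions:

  -- One child table per epoch range; inserts have to be routed to the
  -- right child (directly or via a trigger on the parent).
  CREATE TABLE nora_bc25_observation_y2016 (
      CHECK (epoch >= 1451606400 AND epoch < 1483228800)  -- 2016, if epoch is a unix time
  ) INHERITS (nora_bc25_observation);

  CREATE INDEX nora_bc25_observation_y2016_point_epoch_idx
      ON nora_bc25_observation_y2016 (point_uid_ref, epoch);

  -- With constraint_exclusion enabled the planner skips children whose
  -- CHECK constraint rules them out.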

The Postgres/PostGIS software and communities are for sure really fun to work with, and the Postgres/PostGIS open source software holds a very high quality.

Thanks.

Lars

________________________________________
From: pgsql-performance-owner@postgresql.org <pgsql-performance-owner@postgresql.org> on behalf of Tom Lane <tgl@sss.pgh.pa.us>
Sent: 24 October 2016 14:52
To: Lars Aksel Opsahl
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows

Lars Aksel Opsahl <Lars.Opsahl@nibio.no> writes:
> In this example I have two tables one with 4 billion rows and another with 50000
> rows and then I try to do a standard simple join between this two tables and this
> takes 397391 ms. with this SQL (the query plan is added is further down)

This particular query would work a lot better if you had an index on
nora_bc25_observation (point_uid_ref, epoch), ie both join columns
in one index.  I get the impression that that ought to be the primary
key of the table, which would be an even stronger reason to have a
unique index on it.

                        regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

