[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgis-users
Subject:    Re: [postgis-users] On UpdateGeometrySRID() performance
From:       Martin Davis <mtnclimb () gmail ! com>
Date:       2018-11-01 17:13:41
Message-ID: CAK2ens3EBM68D-tbYSJu+ifYvnmf+6ZHeaMwz3KLfn8OEyJw-Q () mail ! gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


A good entry for the FAQ?

On Thu, Nov 1, 2018 at 10:08 AM James Klassen <klassen.js@gmail.com> wrote:

> When I need to load and transform a large dataset before it lands in its
> final tables/rows, I generally use unlogged tables for the intermediate
> steps.  This saves a lot of disk writing (especially so if running
> PostgreSQL on a storage layer doing its own copy-on-write).  The downside
> is the unlogged tables won't be saved if for whatever reason PostgreSQL
> unexpectedly stops, but that generally doesn't matter in this case because
> you can just restart the load.
>
> On Thu, Nov 1, 2018, 11:54 Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>
>> Also, if you created an index during your import process, that'll also
>> make your update slower.
>> I usually skip bulk updates in favour of writing a fresh table with the
>> stuff I want..
>>
>> CREATE TABLE mynewtable AS SELECT ST_SetSRID(geom, 4326) AS geom, ...
>> FROM myoldtable
>>
>> That generally writes as fast as is possible though still only single
>> threaded.
>>
>> P
>>
>>
>> On Thu, Nov 1, 2018 at 9:51 AM Darafei "Komяpa" Praliaskouski <
>> me@komzpa.net> wrote:
>>
>>> Hi,
>>>
>>> Please feed in your WKT polygons in EWKT format:
>>>
>>> SRID=4326;POLYGON(...)
>>>
>>> Since every update in Postgres is essentially Delete+Insert, time of
>>> rewriting each and every row being equal to initial Insert time is expected
>>> thing.
>>>
>>> You can also update SRID in two columns in one go:
>>>
>>> update tablename set geom1 = ST_SetSRID(geom1, 4326), geom2 =
>>> ST_SetSRID(geom2, 4326);
>>>
>>> Out of curiosity, where did you learn about UpdateGeometrySRID before
>>> learning about ST_SetSRID?
>>>
>>> ср, 31 окт. 2018 г. в 21:57, jerry73204 <jerry73204@gmail.com>:
>>>
>>>> Hi all,
>>>>
>>>> I'm stuck in the low performance of UpdateGeometrySRID().
>>>>
>>>> I get started with a 50GB polygon dataset in CSV in EPSG:4326
>>>> coordinates. Since I find no way to `\copy` the csv while preserving
>>>> the
>>>> SRID, the data is imported with null SRID and then `SELECT
>>>> UpdateGeometrySRID('table', 'column', 4326)`.
>>>>
>>>> The `UpdateGeometrySRID()` takes as long time as that of `\copy`, which
>>>> turns out to be approx two hours. The dataset has two geometry columns
>>>> and thus I have to take triple time to finish this data.
>>>>
>>>> I profiled the postgresql daemon. The avg disk writing speed is 30MB/s,
>>>> while occasionally peaks to 100MB/s. The SSD, F2FS formatted disk is
>>>> capable of up to 150MB/s. The daemon does not utilize the 4-core
>>>> i5-7600k CPU. It seems to be a single process task with avg CPU load
>>>> 20%, while other workers are idle. I wonder if there's a room for
>>>> improving the performance. Also, I'm looking for if it's possible to
>>>> preserve SRID with `\copy`.
>>>>
>>>> Jerry Lin
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users@lists.osgeo.org
>>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>>
>>> --
>>> Darafei Praliaskouski
>>> Support me: http://patreon.com/komzpa
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users@lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users@lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users

[Attachment #5 (text/html)]

<div dir="ltr">A good entry for the FAQ?</div><br><div class="gmail_quote"><div \
dir="ltr">On Thu, Nov 1, 2018 at 10:08 AM James Klassen &lt;<a \
href="mailto:klassen.js@gmail.com">klassen.js@gmail.com</a>&gt; \
wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 \
.8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto">When I need to load \
and transform a large dataset before it lands in its final tables/rows, I generally \
use unlogged tables for the intermediate steps.   This saves a lot of disk writing \
(especially so if running PostgreSQL on a storage layer doing its own copy-on-write). \
The downside is the unlogged tables won&#39;t be saved if for whatever reason \
PostgreSQL unexpectedly stops, but that generally doesn&#39;t matter in this case \
because you can just restart the load.</div><br><div class="gmail_quote"><div \
dir="ltr">On Thu, Nov 1, 2018, 11:54 Paul Ramsey &lt;<a \
href="mailto:pramsey@cleverelephant.ca" \
target="_blank">pramsey@cleverelephant.ca</a>&gt; wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"><div dir="ltr">Also, if you created an index during your \
import process, that&#39;ll also make your update slower.<div>I usually skip bulk \
updates in favour of writing a fresh table with the stuff I \
want..</div><div><br></div><div>CREATE TABLE mynewtable AS SELECT ST_SetSRID(geom, \
4326) AS geom, ... FROM myoldtable</div><div><br></div><div>That generally writes as \
fast as is possible though still only single \
threaded.</div><div><br></div><div>P</div><div><br></div></div><br><div \
class="gmail_quote"><div dir="ltr">On Thu, Nov 1, 2018 at 9:51 AM Darafei \
&quot;Komяpa&quot; Praliaskouski &lt;<a href="mailto:me@komzpa.net" rel="noreferrer" \
target="_blank">me@komzpa.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" \
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div \
dir="ltr">Hi,<div><br></div><div>Please feed in your WKT polygons in EWKT \
format:</div><div><br></div><div>SRID=4326;POLYGON(...)</div><div><br></div><div>Since \
every update in Postgres is essentially Delete+Insert, time of rewriting each and \
every row being equal to initial Insert time is expected \
thing.</div><div><br></div><div>You can also update SRID in two columns in one \
go:</div><div><br></div><div>update tablename set geom1 = ST_SetSRID(geom1, 4326), \
geom2 = ST_SetSRID(geom2, 4326);</div><div><br></div><div>Out of curiosity, where did \
you learn about UpdateGeometrySRID before learning about \
ST_SetSRID?</div></div><br><div class="gmail_quote"><div dir="ltr">ср, 31 окт. \
2018 г. в 21:57, jerry73204 &lt;<a href="mailto:jerry73204@gmail.com" \
rel="noreferrer" target="_blank">jerry73204@gmail.com</a>&gt;:<br></div><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex">Hi all,<br> <br>
I&#39;m stuck in the low performance of UpdateGeometrySRID().<br>
<br>
I get started with a 50GB polygon dataset in CSV in EPSG:4326 <br>
coordinates. Since I find no way to `\copy` the csv while preserving the <br>
SRID, the data is imported with null SRID and then `SELECT <br>
UpdateGeometrySRID(&#39;table&#39;, &#39;column&#39;, 4326)`.<br>
<br>
The `UpdateGeometrySRID()` takes as long time as that of `\copy`, which <br>
turns out to be approx two hours. The dataset has two geometry columns <br>
and thus I have to take triple time to finish this data.<br>
<br>
I profiled the postgresql daemon. The avg disk writing speed is 30MB/s, <br>
while occasionally peaks to 100MB/s. The SSD, F2FS formatted disk is <br>
capable of up to 150MB/s. The daemon does not utilize the 4-core <br>
i5-7600k CPU. It seems to be a single process task with avg CPU load <br>
20%, while other workers are idle. I wonder if there&#39;s a room for <br>
improving the performance. Also, I&#39;m looking for if it&#39;s possible to <br>
preserve SRID with `\copy`.<br>
<br>
Jerry Lin<br>
<br>
_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" rel="noreferrer" \
target="_blank">postgis-users@lists.osgeo.org</a><br> <a \
href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer \
noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a></blockquote></div>-- \
<br><div dir="ltr" class="m_-4891228243511266869m_-3628794120223649882m_-8598942316708085255m_-1163194206989076059gmail_signature" \
data-smartmail="gmail_signature"><div dir="ltr">Darafei Praliaskouski<br>Support me: \
<a href="http://patreon.com/komzpa" rel="noreferrer" \
target="_blank">http://patreon.com/komzpa</a></div></div> \
_______________________________________________<br> postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" rel="noreferrer" \
target="_blank">postgis-users@lists.osgeo.org</a><br> <a \
href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer \
noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a></blockquote></div>
 _______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" rel="noreferrer" \
target="_blank">postgis-users@lists.osgeo.org</a><br> <a \
href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer \
noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a></blockquote></div>
 _______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" \
target="_blank">postgis-users@lists.osgeo.org</a><br> <a \
href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" \
target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a></blockquote></div>



[Attachment #6 (text/plain)]

_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic