[prev in list] [next in list] [prev in thread] [next in thread]
List: postgis-users
Subject: Re: [postgis-users] postgis-users Digest, Vol 201, Issue 1
From: Simon Norris <snorris () hillcrestgeo ! ca>
Date: 2018-11-01 21:11:36
Message-ID: B8EE29B5-3323-4AB1-ACF7-6B49D845EB1C () hillcrestgeo ! ca
[Download RAW message or body]
[Attachment #2 (multipart/alternative)]
You could also consider loading your csv with ogr2ogr, that should allow you to set \
the correct SRID on load. ogr should use the fast PG_USE_COPY option by default.
If that works well and your data allows it, you could try more optimization by \
splitting it up with ogr2ogr's -sql option - and run the load in parallel jobs, \
something like noted here: http://longwayaround.org.uk/notes/loading-postgis/ \
<http://longwayaround.org.uk/notes/loading-postgis/>. Syntax for `parallel` on the \
single file might be tricky, I'd probably use python's multiprocessing module.
>
>
> Message: 7
> Date: Thu, 1 Nov 2018 15:57:41 +0800
> From: jerry73204 <jerry73204@gmail.com>
> To: postgis-users@lists.osgeo.org
> Subject: [postgis-users] On UpdateGeometrySRID() performance
> Message-ID: <69efd569-bac0-d092-eb17-895dd02d20bc@gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Hi all,
>
> I'm stuck in the low performance of UpdateGeometrySRID().
>
> I get started with a 50GB polygon dataset in CSV in EPSG:4326
> coordinates. Since I find no way to `\copy` the csv while preserving the
> SRID, the data is imported with null SRID and then `SELECT
> UpdateGeometrySRID('table', 'column', 4326)`.
>
> The `UpdateGeometrySRID()` takes as long time as that of `\copy`, which
> turns out to be approx two hours. The dataset has two geometry columns
> and thus I have to take triple time to finish this data.
>
> I profiled the postgresql daemon. The avg disk writing speed is 30MB/s,
> while occasionally peaks to 100MB/s. The SSD, F2FS formatted disk is
> capable of up to 150MB/s. The daemon does not utilize the 4-core
> i5-7600k CPU. It seems to be a single process task with avg CPU load
> 20%, while other workers are idle. I wonder if there's a room for
> improving the performance. Also, I'm looking for if it's possible to
> preserve SRID with `\copy`.
>
> Jerry Lin
>
>
[Attachment #5 (unknown)]
<html><head><meta http-equiv="Content-Type" content="text/html; \
charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: \
space; line-break: after-white-space;" class="">You could also consider loading your \
csv with ogr2ogr, that should allow you to set the correct SRID on load. <div \
class="">ogr should use the fast PG_USE_COPY option by default.<br class="">If that \
works well and your data allows it, you could try more optimization by splitting it \
up with ogr2ogr's -sql option - and run the load in parallel jobs, something like \
noted here: <a href="http://longwayaround.org.uk/notes/loading-postgis/" \
class="">http://longwayaround.org.uk/notes/loading-postgis/</a>. Syntax for \
`parallel` on the single file might be tricky, I'd probably use python's \
multiprocessing module.<br class=""><br class=""><div><br class=""><blockquote \
type="cite" class=""><div class=""><br class=""></div><div class=""><div class=""><br \
class="">Message: 7<br class="">Date: Thu, 1 Nov 2018 15:57:41 +0800<br \
class="">From: jerry73204 <<a href="mailto:jerry73204@gmail.com" \
class="">jerry73204@gmail.com</a>><br class="">To: <a \
href="mailto:postgis-users@lists.osgeo.org" \
class="">postgis-users@lists.osgeo.org</a><br class="">Subject: [postgis-users] On \
UpdateGeometrySRID() performance<br class="">Message-ID: <<a \
href="mailto:69efd569-bac0-d092-eb17-895dd02d20bc@gmail.com" \
class="">69efd569-bac0-d092-eb17-895dd02d20bc@gmail.com</a>><br \
class="">Content-Type: text/plain; charset=utf-8; format=flowed<br class=""><br \
class="">Hi all,<br class=""><br class="">I'm stuck in the low performance of \
UpdateGeometrySRID().<br class=""><br class="">I get started with a 50GB polygon \
dataset in CSV in EPSG:4326 <br class="">coordinates. Since I find no way to `\copy` \
the csv while preserving the <br class="">SRID, the data is imported with null SRID \
and then `SELECT <br class="">UpdateGeometrySRID('table', 'column', 4326)`.<br \
class=""><br class="">The `UpdateGeometrySRID()` takes as long time as that of \
`\copy`, which <br class="">turns out to be approx two hours. The dataset has two \
geometry columns <br class="">and thus I have to take triple time to finish this \
data.<br class=""><br class="">I profiled the postgresql daemon. The avg disk writing \
speed is 30MB/s, <br class="">while occasionally peaks to 100MB/s. The SSD, F2FS \
formatted disk is <br class="">capable of up to 150MB/s. The daemon does not utilize \
the 4-core <br class="">i5-7600k CPU. It seems to be a single process task with avg \
CPU load <br class="">20%, while other workers are idle. I wonder if there's a room \
for <br class="">improving the performance. Also, I'm looking for if it's possible to \
<br class="">preserve SRID with `\copy`.<br class=""><br class="">Jerry Lin<br \
class=""><br class=""><br class=""></div></div></blockquote></div><br \
class=""></div></body></html>
[Attachment #6 (text/plain)]
_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic