[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgis-users
Subject:    Re: [postgis-users] postgis-users Digest, Vol 201, Issue 1
From:       Simon Norris <snorris () hillcrestgeo ! ca>
Date:       2018-11-01 21:11:36
Message-ID: B8EE29B5-3323-4AB1-ACF7-6B49D845EB1C () hillcrestgeo ! ca
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


You could also consider loading your csv with ogr2ogr, that should allow you to set \
the correct SRID on load.  ogr should use the fast PG_USE_COPY option by default.
If that works well and your data allows it, you could try more optimization by \
splitting it up with ogr2ogr's -sql option - and run the load in parallel jobs, \
something like noted here: http://longwayaround.org.uk/notes/loading-postgis/ \
<http://longwayaround.org.uk/notes/loading-postgis/>.  Syntax for `parallel` on the \
single file might be tricky, I'd probably use python's multiprocessing module.


> 
> 
> Message: 7
> Date: Thu, 1 Nov 2018 15:57:41 +0800
> From: jerry73204 <jerry73204@gmail.com>
> To: postgis-users@lists.osgeo.org
> Subject: [postgis-users] On UpdateGeometrySRID() performance
> Message-ID: <69efd569-bac0-d092-eb17-895dd02d20bc@gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
> 
> Hi all,
> 
> I'm stuck in the low performance of UpdateGeometrySRID().
> 
> I get started with a 50GB polygon dataset in CSV in EPSG:4326 
> coordinates. Since I find no way to `\copy` the csv while preserving the 
> SRID, the data is imported with null SRID and then `SELECT 
> UpdateGeometrySRID('table', 'column', 4326)`.
> 
> The `UpdateGeometrySRID()` takes as long time as that of `\copy`, which 
> turns out to be approx two hours. The dataset has two geometry columns 
> and thus I have to take triple time to finish this data.
> 
> I profiled the postgresql daemon. The avg disk writing speed is 30MB/s, 
> while occasionally peaks to 100MB/s. The SSD, F2FS formatted disk is 
> capable of up to 150MB/s. The daemon does not utilize the 4-core 
> i5-7600k CPU. It seems to be a single process task with avg CPU load 
> 20%, while other workers are idle. I wonder if there's a room for 
> improving the performance. Also, I'm looking for if it's possible to 
> preserve SRID with `\copy`.
> 
> Jerry Lin
> 
> 


[Attachment #5 (unknown)]

<html><head><meta http-equiv="Content-Type" content="text/html; \
charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: \
space; line-break: after-white-space;" class="">You could also consider loading your \
csv with ogr2ogr, that should allow you to set the correct SRID on load.&nbsp;<div \
class="">ogr should use the fast PG_USE_COPY option by default.<br class="">If that \
works well and your data allows it, you could try more optimization by splitting it \
up with ogr2ogr's -sql option - and run the load in parallel jobs, something like \
noted here: <a href="http://longwayaround.org.uk/notes/loading-postgis/" \
class="">http://longwayaround.org.uk/notes/loading-postgis/</a>. &nbsp;Syntax for \
`parallel` on the single file might be tricky, I'd probably use python's \
multiprocessing module.<br class=""><br class=""><div><br class=""><blockquote \
type="cite" class=""><div class=""><br class=""></div><div class=""><div class=""><br \
class="">Message: 7<br class="">Date: Thu, 1 Nov 2018 15:57:41 +0800<br \
class="">From: jerry73204 &lt;<a href="mailto:jerry73204@gmail.com" \
class="">jerry73204@gmail.com</a>&gt;<br class="">To: <a \
href="mailto:postgis-users@lists.osgeo.org" \
class="">postgis-users@lists.osgeo.org</a><br class="">Subject: [postgis-users] On \
UpdateGeometrySRID() performance<br class="">Message-ID: &lt;<a \
href="mailto:69efd569-bac0-d092-eb17-895dd02d20bc@gmail.com" \
class="">69efd569-bac0-d092-eb17-895dd02d20bc@gmail.com</a>&gt;<br \
class="">Content-Type: text/plain; charset=utf-8; format=flowed<br class=""><br \
class="">Hi all,<br class=""><br class="">I'm stuck in the low performance of \
UpdateGeometrySRID().<br class=""><br class="">I get started with a 50GB polygon \
dataset in CSV in EPSG:4326 <br class="">coordinates. Since I find no way to `\copy` \
the csv while preserving the <br class="">SRID, the data is imported with null SRID \
and then `SELECT <br class="">UpdateGeometrySRID('table', 'column', 4326)`.<br \
class=""><br class="">The `UpdateGeometrySRID()` takes as long time as that of \
`\copy`, which <br class="">turns out to be approx two hours. The dataset has two \
geometry columns <br class="">and thus I have to take triple time to finish this \
data.<br class=""><br class="">I profiled the postgresql daemon. The avg disk writing \
speed is 30MB/s, <br class="">while occasionally peaks to 100MB/s. The SSD, F2FS \
formatted disk is <br class="">capable of up to 150MB/s. The daemon does not utilize \
the 4-core <br class="">i5-7600k CPU. It seems to be a single process task with avg \
CPU load <br class="">20%, while other workers are idle. I wonder if there's a room \
for <br class="">improving the performance. Also, I'm looking for if it's possible to \
<br class="">preserve SRID with `\copy`.<br class=""><br class="">Jerry Lin<br \
class=""><br class=""><br class=""></div></div></blockquote></div><br \
class=""></div></body></html>


[Attachment #6 (text/plain)]

_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic