[prev in list] [next in list] [prev in thread] [next in thread]
List: postgis-users
Subject: Re: [postgis-users] postgis-3.1.0beta2 released
From: Marco Boeringa <marco () boeringa ! demon ! nl>
Date: 2020-12-11 20:31:05
Message-ID: f21a7ac5-6303-cbb1-242b-d53bc172c6a0 () boeringa ! demon ! nl
[Download RAW message or body]
[Attachment #2 (multipart/alternative)]
Hi Darafei,
Thanks for the added explanation, much appreciated.
Yes, it very much sounds like I ran into this particular issue. In my
specific case though, I was lucky enough to be able to workaround this
issue by modifying my SQL. The SQL used was a nested WITH structure that
creates a materialized view, and I had put the 'ORDER BY geom' clause on
the outer query. By moving the clause to the inner most CTE, which
directly referenced a table that did have a spatial index on the
geometry column, the processing time for the creation of the 370M
materialized view went down from an undetermined >30hours, to under 4
hours. This solved my particular issue.
I concur this sounds like needing a fix upstream.
Marco
Op 11-12-2020 om 20:53 schreef Darafei "Komяpa" Praliaskouski:
> Hello Marco,
>
> This is exactly the case. The way Postgres accelerates the sorting is
> by having a 8-byte "prefix" to sort on before sorting on the actual
> field content. There is a problem in parallel sort implementation in
> Postgres where this prefix (as which PostGIS uses the sortable hash
> which is Hilbert key currently) where this thing will get calculated
> in parallel workers, sorted, and then calculated again on the merge
> stage in main worker, and then sorted again. Since it's a relatively
> slow op to calculate it performance may degrade to
> single-thread-sort-like. To fight that, _ST_SortableHash is exposed
> for advanced users: it immutably emits the key as an integer and after
> the sort that integer can be passed from parallel workers to main
> worker to be used verbatim. Best practices around this are not
> established yet and your feedback is welcome.
>
> For what I've noticed for now there is a significant difference in
> speed if you do something along those lines:
> select _ST_SortableHash(geom), * from table order by 1;
> instead of
> select * from table order by _ST_SortableHash(geom);
> There may be other tricks to consider.
>
> Please note that the best way forward on this is to fix issue in
> upstream postgres instead of this workaround.
>
> On Fri, Dec 11, 2020 at 10:36 PM Marco Boeringa
> <marco@boeringa.demon.nl <mailto:marco@boeringa.demon.nl>> wrote:
>
> Hi Paul,
>
> Could you potentially eleborate a bit about #4805, '_ST_SortableHash'
> exposed?
>
> I looked up the bug tracker issue, and have followed the links,
> but the
> information provided there is more of a stub, and at least one
> reference
> suggests mainly a link with BRIN type spatial indexes. On the other
> hand, it suggests a generic performance issue with Postgres and
> sorting
> very large spatial tables on the geometry column ('ORDER BY geom').
>
> I must admit I actually ran into to something that sounds like the
> latter, where attempting to 'ORDER BY geom' on a non-indexed spatial
> column of a >370M table, resulted in a relatively short (few hours)
> parallel execution of the query with multiple 'parallel workers'
> visible
> in pgAdmin, which was subsequently followed by > 30 hours of the
> 'client
> backend' doing single threaded work, with virtually no CPU, RAM or
> disk
> write activity. Since the database runs on SSD, this was unexpected.
> Even a full sequential scan of the entire table should only have
> cost a
> fraction of that time. I finally killed the process, not wanting
> to wait
> any longer.
>
> Marco Boeringa
>
> Op 11-12-2020 om 19:09 schreef Paul Ramsey:
> > Since there were a few quick additions to beta1, I have dropped
> a beta2. The next release will be rc1 and then final, barring any
> emergency.
> >
> > * Enhancements *
> > - #4814, Do not drop empty geometry components when converting
> > to GEOS (Sandro Santilli)
> > - #4815, Fix GEOS conversion of POINT EMPTY to retain type
> > (Sandro Santilli)
> > - #4813, ST_MakeValid removing NaN coordinates (Sandro Santilli)
> >
> >
> >> On Dec 9, 2020, at 4:25 PM, Paul Ramsey
> <pramsey@cleverelephant.ca <mailto:pramsey@cleverelephant.ca>> wrote:
> >>
> >> This is a beta1 release, for testing and quality assurance, to
> be followed shortly by a final release. If you're interested in
> the stability and usability of PostGIS, please take a little time
> to ensure that you can build and use this release.
> >>
> >>
> https://download.osgeo.org/postgis/source/postgis-3.1.0beta1.tar.gz
> <https://download.osgeo.org/postgis/source/postgis-3.1.0beta1.tar.gz>
> >> https://postgis.net/2020/12/09/postgis-3.1.0beta1/
> <https://postgis.net/2020/12/09/postgis-3.1.0beta1/>
> >>
> >> There have been a few changes since the alpha3 release.
> >>
> >> * Breaking changes *
> >>
> >> - #4214, Deprecated ST_Count(tablename,...),
> ST_ApproxCount(tablename, ...)
> >> ST_SummaryStats(tablename, ..),
> >> ST_Histogram(tablename, ...),
> ST_ApproxHistogram(tablename, ...),
> >> ST_Quantile(tablename, ...),
> ST_ApproxQuantile(tablename, ...) removed.
> >> (Darafei Praliaskouski)
> >>
> >> * Enhancements *
> >>
> >> - #4801, ST_ClusterKMeans supports weights in POINT[Z]M
> geometries (Darafei Praliaskouski)
> >> - #4804, ST_ReducePrecision (GEOS 3.9+) allows valid
> precision reduction (Paul Ramsey)
> >> - #4805, _ST_SortableHash exposed to work around parallel
> soring performance issue
> >> in Postgres. If your table is huge, use ORDER BY
> _ST_SortableHash(geom)
> >> instead of ORDER BY geom to make parallel sort
> faster (Darafei Praliaskouski)
> >> - #4625, Correlation statistics now calculated.
> >> Run ANALYZE for BRIN indexes to start kicking in.
> >> (Darafei Praliaskouski)
> >> - Fix axis order issue with urn:ogc:def:crs:EPSG in
> ST_GeomFromGML()
> >> (Even Roualt)
> >>
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users@lists.osgeo.org <mailto:postgis-users@lists.osgeo.org>
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> _______________________________________________
> postgis-users mailing list
> postgis-users@lists.osgeo.org <mailto:postgis-users@lists.osgeo.org>
> https://lists.osgeo.org/mailman/listinfo/postgis-users
> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>
>
>
> --
> Darafei "Komяpa" Praliaskouski
> OSM BY Team - http://openstreetmap.by/ <http://openstreetmap.by/>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
[Attachment #5 (text/html)]
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Darafei,</p>
<p>Thanks for the added explanation, much appreciated.</p>
<p>Yes, it very much sounds like I ran into this particular issue.
In my specific case though, I was lucky enough to be able to
workaround this issue by modifying my SQL. The SQL used was a
nested WITH structure that creates a materialized view, and I had
put the 'ORDER BY geom' clause on the outer query. By moving the
clause to the inner most CTE, which directly referenced a table
that did have a spatial index on the geometry column, the
processing time for the creation of the 370M materialized view
went down from an undetermined >30hours, to under 4 hours. This
solved my particular issue.</p>
<p>I concur this sounds like needing a fix upstream.<br>
</p>
<p>Marco<br>
</p>
<div class="moz-cite-prefix">Op 11-12-2020 om 20:53 schreef Darafei
"Komяpa" Praliaskouski:<br>
</div>
<blockquote type="cite"
cite="mid:CAC8Q8tJ8aMbfLd9i8o=YpPh+EJ9kwc-e_iSpkT6EzxhYoZNJZg@mail.gmail.com">
<div dir="ltr">Hello Marco,
<div><br>
</div>
<div>This is exactly the case. The way Postgres accelerates the
sorting is by having a 8-byte "prefix" to sort on before
sorting on the actual field content. There is a problem in
parallel sort implementation in Postgres where this prefix (as
which PostGIS uses the sortable hash which is Hilbert key
currently) where this thing will get calculated in parallel
workers, sorted, and then calculated again on the merge stage
in main worker, and then sorted again. Since it's a relatively
slow op to calculate it performance may degrade to
single-thread-sort-like. To fight that, _ST_SortableHash is
exposed for advanced users: it immutably emits the key as an
integer and after the sort that integer can be passed from
parallel workers to main worker to be used verbatim. Best
practices around this are not established yet and your
feedback is welcome.<br>
<br>
For what I've noticed for now there is a significant
difference in speed if you do something along those lines:<br>
select _ST_SortableHash(geom), * from table order by 1;<br>
instead of <br>
select * from table order by _ST_SortableHash(geom);<br>
There may be other tricks to consider.<br>
<br>
Please note that the best way forward on this is to fix issue
in upstream postgres instead of this workaround.</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Dec 11, 2020 at 10:36
PM Marco Boeringa <<a \
href="mailto:marco@boeringa.demon.nl">marco@boeringa.demon.nl</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi
Paul,<br>
<br>
Could you potentially eleborate a bit about #4805,
'_ST_SortableHash' <br>
exposed?<br>
<br>
I looked up the bug tracker issue, and have followed the
links, but the <br>
information provided there is more of a stub, and at least one
reference <br>
suggests mainly a link with BRIN type spatial indexes. On the
other <br>
hand, it suggests a generic performance issue with Postgres
and sorting <br>
very large spatial tables on the geometry column ('ORDER BY
geom').<br>
<br>
I must admit I actually ran into to something that sounds like
the <br>
latter, where attempting to 'ORDER BY geom' on a non-indexed
spatial <br>
column of a >370M table, resulted in a relatively short
(few hours) <br>
parallel execution of the query with multiple 'parallel
workers' visible <br>
in pgAdmin, which was subsequently followed by > 30 hours
of the 'client <br>
backend' doing single threaded work, with virtually no CPU,
RAM or disk <br>
write activity. Since the database runs on SSD, this was
unexpected. <br>
Even a full sequential scan of the entire table should only
have cost a <br>
fraction of that time. I finally killed the process, not
wanting to wait <br>
any longer.<br>
<br>
Marco Boeringa<br>
<br>
Op 11-12-2020 om 19:09 schreef Paul Ramsey:<br>
> Since there were a few quick additions to beta1, I have
dropped a beta2. The next release will be rc1 and then final,
barring any emergency.<br>
><br>
> * Enhancements *<br>
> - #4814, Do not drop empty geometry components when
converting<br>
> to GEOS (Sandro Santilli)<br>
> - #4815, Fix GEOS conversion of POINT EMPTY to retain
type<br>
> (Sandro Santilli)<br>
> - #4813, ST_MakeValid removing NaN coordinates (Sandro
Santilli)<br>
><br>
><br>
>> On Dec 9, 2020, at 4:25 PM, Paul Ramsey <<a
href="mailto:pramsey@cleverelephant.ca" \
target="_blank">pramsey@cleverelephant.ca</a>> wrote:<br>
>><br>
>> This is a beta1 release, for testing and quality
assurance, to be followed shortly by a final release. If
you're interested in the stability and usability of PostGIS,
please take a little time to ensure that you can build and use
this release.<br>
>><br>
>> <a
href="https://download.osgeo.org/postgis/source/postgis-3.1.0beta1.tar.gz"
rel="noreferrer" \
target="_blank">https://download.osgeo.org/postgis/source/postgis-3.1.0beta1.tar.gz</a><br>
>> <a
href="https://postgis.net/2020/12/09/postgis-3.1.0beta1/"
rel="noreferrer" \
target="_blank">https://postgis.net/2020/12/09/postgis-3.1.0beta1/</a><br> \
>><br> >> There have been a few changes since the alpha3
release.<br>
>><br>
>> * Breaking changes *<br>
>><br>
>> - #4214, Deprecated ST_Count(tablename,...),
ST_ApproxCount(tablename, ...)<br>
>> ST_SummaryStats(tablename, ..),<br>
>> ST_Histogram(tablename, ...),
ST_ApproxHistogram(tablename, ...),<br>
>> ST_Quantile(tablename, ...),
ST_ApproxQuantile(tablename, ...) removed.<br>
>> (Darafei Praliaskouski)<br>
>><br>
>> * Enhancements *<br>
>><br>
>> - #4801, ST_ClusterKMeans supports weights in
POINT[Z]M geometries (Darafei Praliaskouski)<br>
>> - #4804, ST_ReducePrecision (GEOS 3.9+) allows
valid precision reduction (Paul Ramsey)<br>
>> - #4805, _ST_SortableHash exposed to work around
parallel soring performance issue<br>
>> in Postgres. If your table is huge, use
ORDER BY _ST_SortableHash(geom)<br>
>> instead of ORDER BY geom to make parallel
sort faster (Darafei Praliaskouski)<br>
>> - #4625, Correlation statistics now calculated.<br>
>> Run ANALYZE for BRIN indexes to start
kicking in.<br>
>> (Darafei Praliaskouski)<br>
>> - Fix axis order issue with urn:ogc:def:crs:EPSG in
ST_GeomFromGML()<br>
>> (Even Roualt)<br>
>><br>
> _______________________________________________<br>
> postgis-users mailing list<br>
> <a href="mailto:postgis-users@lists.osgeo.org"
target="_blank">postgis-users@lists.osgeo.org</a><br>
> <a
href="https://lists.osgeo.org/mailman/listinfo/postgis-users"
rel="noreferrer" \
target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br> \
_______________________________________________<br> postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" \
target="_blank">postgis-users@lists.osgeo.org</a><br> <a
href="https://lists.osgeo.org/mailman/listinfo/postgis-users"
rel="noreferrer" \
target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br> \
</blockquote> </div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">Darafei "Komяpa" Praliaskouski<br>
OSM BY Team - <a href="http://openstreetmap.by/"
target="_blank">http://openstreetmap.by/</a><br>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" \
wrap="">_______________________________________________ postgis-users mailing list
<a class="moz-txt-link-abbreviated" \
href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a> <a \
class="moz-txt-link-freetext" \
href="https://lists.osgeo.org/mailman/listinfo/postgis-users">https://lists.osgeo.org/mailman/listinfo/postgis-users</a>
</pre>
</blockquote>
</body>
</html>
_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic