'Re: [postgis-users] ST_Intersection very slow.'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgis-users
Subject:    Re: [postgis-users] ST_Intersection very slow.
From:       John Abraham <jea () hbaspecto ! com>
Date:       2015-02-24 19:36:19
Message-ID: BD32A86B-B3D4-4A8D-82D9-9D64B484B579 () hbaspecto ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]

Thanks Mark.  I will indeed do the st_dump, but I don't think it will help much (very \
few multis).  I think the tiling will help a lot.  What I wonder, though, is how long \
it will take to tile?  Afterall, it's still an st_intersection operation to tile each \
geometry against each tile.

Is there a quad tree type tiling algorithm in a function?  If I do 256 x 256 tiles, \
doing it all at once would be 65536 operations of st_intersection(complex_geom, \
square).  With a quad tree I'll only have 4 operations of \
st_intersection(complex_geom, square), and then 16 of (a_little_less_complex_geom, \
square), and then 64 of (even_less_complex_geom, square) and so on, for 8 nests.  The \
last one will still be 65536 operations, but the complex geoms should be a lot \
simpler by the time I get down to that level.  What do you think, is this worth \
trying?  Or is intersecting with a square fairly simple regardless of how complex the \
other geometry is?

(Your reply initially fell into my junk mail, so I missed it, and also haven't tried \
these things yet, partly because I missed your email.)

--
John Abraham
jea@hbaspecto.com
403-232-1060

On Feb 19, 2015, at 6:52 PM, Mark Wynter <mark@dimensionaledge.com> wrote:

> 
> > So I've was running this query for 866000 s (10 days) before I decided to kill \
> > it:
> 
> 
> > One potential thing I've realized is that a few of the geometries in tazjan2 are \
> > multipolygons, not single polygons.  But it's only a few.  There are a few very \
> > large and complex polygons in lancover_polygons_snap, but again, most of the \
> > 998031 are simple small polygons, about half would be ST_CoveredBy the polygons \
> > in tazjan2 and most of the rest would only overlap two or three of the polygons \
> > in tazjan2.
> 
> 
> A common offender - the multipolygons.
> 
> You can get an immediate performance improvement by ST_Dump(wkb_geometry).geom into \
> a new table, with a new serial id, but retaining the original poly_id so you can \
> don’t lose the multipolygon provenance of each polygon. Complex operations on the \
> multipolygon table, like ST_Intersection(), are extremely slow - Because the query \
> has to access and compute the operation on each of the geometry elements contained \
> within the multi. 
> Another massive performance gain can be achieved by “Tiling” your dumped polygon \
> table - which has the effect of breaking your geometries into smaller features, \
> which you can then “union” back together again, after running ST_Intersection on \
> your vector tiles.  Don’t try to tile your multi polygon table - you’ll still be \
> waiting days. 
> Take the coastline of Australia - a single multi polygon
> 
> SELECT Count(ogc_fid) as num_rows, SUM(ST_NumGeometries(wkb_geometry)) as \
> num_geoms, SUM(ST_NPoints(wkb_geometry)) as num_points FROM abs_aus11; num_rows | \
>                 num_geoms | num_points 
> -------------------+--------------
> 1 |      6718 |    1622598
> 
> If you need to query the intersection of a polygon with land, and the water, its \
> near impossible.  Many, many days. 
> Hence we need to take a different approach.
> 
> The first thing you can do is to dump the multi polygon table.
> Then create a regular vector grid.
> Then tile the both your Polygon tables using the vector grid.
> Put indexes on the resultant tiled tables.
> Then run your ST_Intersects.
> The result is many more geometries, and points, but now we get the benefit of \
> geometry indexes. 
> As a result of tiling Australia coastline, we now get
> 
> SELECT Count(tid) as num_rows, SUM(ST_NumGeometries(wkb_geometry)) as num_geoms, \
> SUM(ST_NPoints(wkb_geometry)) as num_points FROM abs_aus11_tiled; num_rows | \
>                 num_geoms | num_points 
> -------------------+--------------
> 17222 |     29020 |    6513770
> 
> Each geometry also has a tile_id, and the original poly_id.  The tile_id’s are \
> really useful for subsetting your queries, and for using tile-id’s as an additional \
> join condition.  At any time you can rapidly rebuild your original geometries doing \
> ST_Union() GROUP BY poly_id. 
> Using the approach of dumping and tiling, queries that once took days now takes \
> minutes at most.  And as Remi mentioned, you can parallelise the process.  The \
> concept of vector tiling in PostGIS is analogous to Map Reduce and assigning Key \
> Value Pairs.  Its about chunking your data, doing lots of fast operations on the \
> smaller bits, and then consolidating your outputs.  You don’t necessarily have to \
> write a SQL function to do the SQL parallelisation.   My preference is Bash and GNU \
> Parallel because you can write vanilla SQL and execute via psql in parallel.   For \
> me its now about the speed I can write the SQL. 
> So we’re now starting to apply Big Data concepts within PostGIS….    Holy smoke… \
> and you can apply the same concepts to a wide range of PostGIS operations - scale \
> up, scale out…. 
> I will write up a tutorial explaining the benefits of vector tiling in PostGIS, \
> with examples and parallelisation code patterns.  If not today, hopefully over the \
> next week.   
> Mark
> 
> 

[Attachment #5 (unknown)]

<html><head><meta http-equiv="Content-Type" content="text/html \
charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: \
space; -webkit-line-break: after-white-space;"><div>Thanks Mark. &nbsp;I will indeed \
do the st_dump, but I don't think it will help much (very few multis). &nbsp;I think \
the tiling will help a lot. &nbsp;What I wonder, though, is how long it will take to \
tile? &nbsp;Afterall, it's still an st_intersection operation to tile each geometry \
against each tile.</div><div><br></div><div>Is there a quad tree type tiling \
algorithm in a function? &nbsp;If I do 256 x 256 tiles, doing it all at once would be \
65536 operations of st_intersection(complex_geom, square). &nbsp;With a quad tree \
I'll only have 4 operations of st_intersection(complex_geom, square), and then 16 of \
(a_little_less_complex_geom, square), and then 64 of (even_less_complex_geom, square) \
and so on, for 8 nests. &nbsp;The last one will still be 65536 operations, but the \
complex geoms should be a lot simpler by the time I get down to that level. \
&nbsp;What do you think, is this worth trying? &nbsp;Or is intersecting with a square \
fairly simple regardless of how complex the other geometry \
is?</div><div><br></div><div>(Your reply initially fell into my junk mail, so I \
missed it, and also haven't tried these things yet, partly because I missed your \
email.)</div><div><br></div><div><div apple-content-edited="true"> <span \
class="Apple-style-span" style="border-collapse: separate; border-spacing: \
0px;"><div>--</div><div>John Abraham</div><div><a \
href="mailto:jea@hbaspecto.com">jea@hbaspecto.com</a></div><div>403-232-1060</div></span>

</div>
<br><div><div>On Feb 19, 2015, at 6:52 PM, Mark Wynter &lt;<a \
href="mailto:mark@dimensionaledge.com">mark@dimensionaledge.com</a>&gt; \
wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta \
http-equiv="Content-Type" content="text/html charset=windows-1252"><div \
style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: \
after-white-space;"><div><br><blockquote type="cite">So I've was running this query \
for 866000 s (10 days) before I decided to kill \
it:<br></blockquote><div><br></div><br><blockquote type="cite">One potential thing \
I've realized is that a few of the geometries in tazjan2 are multipolygons, not \
single polygons. &nbsp;But it's only a few. &nbsp;There are a few very large and \
complex polygons in lancover_polygons_snap, but again, most of the 998031 are simple \
small polygons, about half would be ST_CoveredBy the polygons in tazjan2 and most of \
the rest would only overlap two or three of the polygons in \
tazjan2.<br></blockquote><div><br></div><div><br></div><div>A common offender - the \
multipolygons.</div><div><br></div><div>You can get an immediate performance \
improvement by ST_Dump(wkb_geometry).geom into a new table, with a new serial id, but \
retaining the original poly_id so you can don’t lose the multipolygon provenance of \
each polygon.</div><div>Complex operations on the multipolygon table, like \
ST_Intersection(), are extremely slow - Because the query has to access and compute \
the operation on each of the geometry elements contained within the \
multi.</div><div><br></div><div>Another massive performance gain can be achieved by \
“Tiling” your dumped polygon table - which has the effect of breaking your geometries \
into smaller features, which you can then “union” back together again, after running \
ST_Intersection on your vector tiles. &nbsp;Don’t try to tile your multi polygon \
table - you’ll still be waiting days.</div><div><br></div><div>Take the coastline of \
Australia - a single multi polygon</div><div><div style="font-family: Menlo; \
font-size: 11px; margin: 0px;"><br></div><div style="font-family: Menlo; font-size: \
11px; margin: 0px;">SELECT Count(ogc_fid) as num_rows, \
SUM(ST_NumGeometries(wkb_geometry)) as num_geoms, SUM(ST_NPoints(wkb_geometry)) as \
num_points FROM abs_aus11;</div><div style="font-family: Menlo; font-size: 11px; \
margin: 0px;">&nbsp;num_rows | num_geoms | num_points&nbsp;</div><div \
style="font-family: Menlo; font-size: 11px; margin: \
0px;">-------------------+--------------</div><div style="font-family: Menlo; \
font-size: 11px; margin: 0px;">&nbsp; &nbsp; &nbsp; &nbsp; 1 | &nbsp; &nbsp; \
&nbsp;6718 | &nbsp; &nbsp;1622598</div></div><div><br></div><div>If you need to query \
the intersection of a polygon with land, and the water, its near impossible. \
&nbsp;Many, many days.</div><div><br></div><div>Hence we need to take a different \
approach.</div><div><br></div><div>The first thing you can do is to dump the multi \
polygon table.</div><div>Then create a regular vector grid.</div><div>Then tile the \
both your Polygon tables using the vector grid.</div><div>Put indexes on the \
resultant tiled tables.</div><div>Then run your ST_Intersects.</div><div>The result \
is many more geometries, and points, but now we get the benefit of geometry \
indexes.</div><div><div style="font-family: Menlo; font-size: 11px;"><br></div><div \
style="font-family: Menlo; font-size: 11px;"><div style="margin: 0px;">As a result of \
tiling Australia coastline, we now get</div><div style="margin: 0px;"><br></div><div \
style="margin: 0px;">SELECT Count(tid) as num_rows, \
SUM(ST_NumGeometries(wkb_geometry)) as num_geoms, SUM(ST_NPoints(wkb_geometry)) as \
num_points FROM abs_aus11_tiled;</div><div style="margin: 0px;">&nbsp;num_rows | \
num_geoms | num_points&nbsp;</div><div style="margin: \
0px;">-------------------+--------------</div><div style="margin: 0px;">&nbsp; &nbsp; \
17222 | &nbsp; &nbsp; 29020 | &nbsp; \
&nbsp;6513770</div></div></div><div><br></div><div>Each geometry also has a tile_id, \
and the original poly_id. &nbsp;The tile_id’s are really useful for subsetting your \
queries, and for using tile-id’s as an additional join condition. &nbsp;At any time \
you can rapidly rebuild your original geometries doing ST_Union() GROUP BY \
poly_id.</div><div><br></div><div>Using the approach of dumping and tiling, queries \
that once took days now takes minutes at most. &nbsp;And as Remi mentioned, you can \
parallelise the process. &nbsp;The concept of vector tiling in PostGIS is analogous \
to Map Reduce and assigning Key Value Pairs. &nbsp;Its about chunking your data, \
doing lots of fast operations on the smaller bits, and then consolidating your \
outputs. &nbsp;You don’t necessarily have to write a SQL function to do the SQL \
parallelisation. &nbsp; My preference is Bash and GNU Parallel because you can write \
vanilla SQL and execute via psql in parallel. &nbsp; For me its now about the speed I \
can write the SQL.</div><div><br></div><div>So we’re now starting to apply Big Data \
concepts within PostGIS…. &nbsp; &nbsp;Holy smoke… and you can apply the same \
concepts to a wide range of PostGIS operations - scale up, scale \
out….</div><div><br></div><div>I will write up a tutorial explaining the benefits of \
vector tiling in PostGIS, with examples and parallelisation code patterns. &nbsp;If \
not today, hopefully over the next week. \
&nbsp;</div><div><br></div><div>Mark</div><div><br></div></div><br></div></blockquote></div><br></div></body></html>

_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

[prev in list] [next in list] [prev in thread] [next in thread]