[prev in list] [next in list] [prev in thread] [next in thread]
List: postgis-users
Subject: Re: [postgis-users] Distance based clustering (like in Openlayers orLeaflet)
From: Rémi_Cura <remi.cura () gmail ! com>
Date: 2014-04-25 7:29:27
Message-ID: CAJvUf_uhOaCGmvrLeLS=Qj52cNwZKPOFE1EQUQSWkEVz9TrbLQ () mail ! gmail ! com
[Download RAW message or body]
[Attachment #2 (multipart/alternative)]
Hey,
I did the same thing using PLR and a minimal spanning tree.
It was fast, easy, and clustering method could be changed at will
(see all classical unsupervised learning).
Cheers,
Rémi-C
2014-04-24 12:53 GMT+02:00 Paragon Corporation <lr@pcorp.us>:
> Quick glance at your plpgsql function. It won't be able to use a
> spatial index. To utilize a spatial index, you'll need to change your
> construct
>
> ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40
>
> to:
> ST_DWithin(a.geom, b.geom, 4891.96981025 * 40)
>
> Also make sure you have spatial indexes on your points table.
>
> Hope that helps,
> Regina
> http://www.postgis.us
> http://postgis.net
>
>
>
> ------------------------------
> *From:* postgis-users-bounces@lists.osgeo.org [mailto:
> postgis-users-bounces@lists.osgeo.org] *On Behalf Of *Matthias Ludwig
> *Sent:* Wednesday, April 23, 2014 10:53 AM
> *To:* postgis-users@lists.osgeo.org
> *Subject:* [postgis-users] Distance based clustering (like in Openlayers
> orLeaflet)
>
> Hi,
> I try to implement a simple distance based clustering in PostgreSQL.
> It's working the following way:
> 1. loop as long as there are non clustered points
> 2. took one point that is not clustered
> 3. define this as a new cluster
> 4. get all surrounding points in a defined distance and add them into
> the cluster
> 5. calculate the centroid for each cluster and count cluster points
>
> The calculation speed depends mostly on the count of points and the scale
> used. In the moment i'am using a python script and it took around 1sec for
> 1000 points for TMS zoomlevel 5 (distance = 4891.96981025m * 40 pixel) and
> around 3sec at zoomlevel 15 (distance = 4.78m * 40 pixel). Implementing
> it with pgsql doesn't speeds it up.
> Is it possible to do the work completely with SQL and maybe with just one
> query? I'am not sure if I understood the WITH RECURSIVE concept
> completely...but shouldn't it by possible to replace the iteration(2.-4.)
> with a WITH RECURSIVE statement? Are there any other suggestion to speed it
> up or a better solution?
>
> Greetings
>
>
> CREATE TABLE points
> (
> id_point serial PRIMARY KEY,
> geom geometry(Point,3857),
> cluster_nr integer
> );
> INSERT INTO points (geom)
> SELECT ST_SetSrid(ST_MakePoint(12.5 + random()*5, 51 + random()*5), 3857)
> FROM generate_series(0, 1000);
>
> CREATE TABLE pg_clusters
> (
> gid integer,
> cluster_feature_ids integer[],
> count integer,
> geom geometry
> );
>
> -- select first point, which isn't clustered yet
> SELECT id_point
> FROM points
> WHERE cluster_nr IS NULL
> LIMIT 1;
> -- use first point and get all surrounding points in defined distance
> including the point itself
> -- use them as first cluster
> WITH first_cluster AS (
> SELECT a.id_point AS cluster_nr,
> b.id_point,
> b.geom AS geom
> FROM points a,
> points b
> WHERE ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40
> AND a.id_point = 185
> AND a.cluster_nr IS NULL
> )
> UPDATE points AS a
> SET cluster_nr = b.cluster_nr
> FROM first_cluster AS b
> WHERE a.id_point = b.id_point;
> --> choose next id
> --> calculate next cluster
> etc.
> -- calculate the centroid of a cluster and count cluster points
> DELETE FROM pg_clusters;
> WITH clusters AS (
> SELECT cluster_nr, array_agg(id_point) AS feature_ids,
> count(id_point), ST_Centroid(ST_Collect(geom)) AS geom
> FROM points
> GROUP BY cluster_nr
> )
> INSERT INTO pg_clusters(gid, cluster_feature_ids, count, geom)
> SELECT * FROM clusters;
>
> _______________________________________________
> postgis-users mailing list
> postgis-users@lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>
[Attachment #5 (text/html)]
<div dir="ltr"><div><div><div>Hey,<br>I did the same thing using PLR and a minimal \
spanning tree.<br></div>It was fast, easy, and clustering method could be changed at \
will <br></div>(see all classical unsupervised learning).<br> \
<br></div>Cheers,<br>Rémi-C<br></div><div class="gmail_extra"><br><br><div \
class="gmail_quote">2014-04-24 12:53 GMT+02:00 Paragon Corporation <span \
dir="ltr"><<a href="mailto:lr@pcorp.us" \
target="_blank">lr@pcorp.us</a>></span>:<br> <blockquote class="gmail_quote" \
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><u></u>
<div>
<div dir="ltr" align="left"><span><font color="#0000ff" face="Arial">Quick glance at \
your plpgsql function. It won't be able to use a spatial index. To utilize a \
spatial index, you'll need to change your construct</font></span></div><div \
class=""> <div dir="ltr" align="left"><span><font color="#0000ff" \
face="Arial"></font></span> </div> <div dir="ltr" align="left"><span>
ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40<br></span></div>
<div dir="ltr" align="left"><span><font color="#0000ff" face="Arial"></font></span> \
</div> </div><div dir="ltr" align="left"><span><font color="#0000ff" \
face="Arial">to:</font></span></div> <div dir="ltr" align="left"><span> \
ST_DWithin(a.geom, b.geom, 4891.96981025 * 40)</span></div>
<div><span></span> </div>
<div><span><font color="#0000ff" face="Arial">Also
make sure you have spatial indexes on your points table.</font></span></div>
<div><span><font color="#0000ff" face="Arial"></font></span> </div>
<div><span><font color="#0000ff" face="Arial">Hope
that helps,</font></span></div>
<div><span><font color="#0000ff" face="Arial">Regina</font></span></div>
<div><span><font color="#0000ff" face="Arial"><a href="http://www.postgis.us" \
target="_blank">http://www.postgis.us</a></font></span></div> <div><span><font \
color="#0000ff" face="Arial"><a href="http://postgis.net" \
target="_blank">http://postgis.net</a></font></span></div> <div><span><font \
color="#0000ff" face="Arial"></font> </span></div> <div dir="ltr" \
align="left"><br></div><br> <div dir="ltr" align="left" lang="en-us">
<hr>
<font face="Tahoma"><b>From:</b> <a \
href="mailto:postgis-users-bounces@lists.osgeo.org" \
target="_blank">postgis-users-bounces@lists.osgeo.org</a> [mailto:<a \
href="mailto:postgis-users-bounces@lists.osgeo.org" \
target="_blank">postgis-users-bounces@lists.osgeo.org</a>] <b>On Behalf Of \
</b>Matthias Ludwig<br><b>Sent:</b> Wednesday, April 23, 2014 10:53 AM<br><b>To:</b> \
<a href="mailto:postgis-users@lists.osgeo.org" \
target="_blank">postgis-users@lists.osgeo.org</a><br><b>Subject:</b> [postgis-users] \
Distance based clustering (like in Openlayers \
orLeaflet)<br></font><br></div><div><div class="h5"> <div></div>
<div style="FONT-SIZE:12px;FONT-FAMILY:Verdana">
<div>
<div>Hi,</div>
<div>I try to implement a simple distance based clustering in PostgreSQL.</div>
<div>It's working the following way:</div>
<div>1. loop as long as there are non clustered points</div>
<div> 2. took one point that is not clustered</div>
<div> 3. define this as a new cluster</div>
<div> 4. get all surrounding points in a defined distance and add
them into the cluster</div>
<div>5. calculate the centroid for each cluster and count cluster points</div>
<div> </div>
<div>The calculation speed depends mostly on the count of points and the scale
used. In the moment i'am using a python script and it took around 1sec for 1000
points for TMS zoomlevel 5 (distance = 4891.96981025m * 40 pixel) and around
3sec at zoomlevel 15 (distance = <span \
style="WHITE-SPACE:normal;TEXT-TRANSFORM:none;WORD-SPACING:0px;FLOAT:none;COLOR:rgb(42,42,42);FONT:13px/18px \
'Segoe UI','Lucida \
Grande',Verdana,Arial,Helvetica,sans-serif;DISPLAY:inline!important;LETTER-SPACING:normal;TEXT-INDENT:0px">4.78m</span> \
* 40 pixel). Implementing it with pgsql doesn't speeds it up.</div>
<div>Is it possible to do the work completely with SQL and maybe with just one
query? I'am not sure if I understood the WITH RECURSIVE concept completely...but
shouldn't it by possible to replace the iteration(2.-4.) with a WITH RECURSIVE
statement? Are there any other suggestion to speed it up or a better
solution?</div>
<div> </div>
<div>Greetings</div>
<div> </div>
<div> </div>
<div>CREATE TABLE points<br>(<br> id_point serial PRIMARY KEY,<br>
geom geometry(Point,3857),<br> cluster_nr integer<br>);<br>INSERT INTO
points (geom)<br>SELECT ST_SetSrid(ST_MakePoint(12.5 + random()*5, 51 +
random()*5), 3857)<br>FROM generate_series(0, 1000);</div>
<div> </div>
<div>CREATE TABLE pg_clusters<br>(<br> gid
integer,<br> cluster_feature_ids
integer[],<br> count integer,<br> geom
geometry<br>);</div>
<div> </div>
<div>-- select first point, which isn't clustered yet<br>SELECT
id_point<br>FROM points<br>WHERE cluster_nr IS
NULL<br>LIMIT 1;</div>
<div>-- use first point and get all surrounding points in defined distance
including the point itself</div>
<div>-- use them as first cluster<br>WITH first_cluster AS
(<br> SELECT a.id_point AS
cluster_nr,<br>
b.id_point,<br>
b.geom AS geom<br>
FROM points
a,<br>
points b<br> WHERE
ST_Distance(a.geom, b.geom) <= 4891.96981025 *
40<br>
AND a.id_point =
185<br>
AND a.cluster_nr IS NULL<br>)<br>UPDATE points AS
a<br>SET cluster_nr =
b.cluster_nr<br>FROM first_cluster AS b<br>WHERE
a.id_point = b.id_point;</div>
<div>--> choose next id<br>--> calculate next cluster<br>etc.</div>
<div>-- calculate the centroid of a cluster and count cluster points<br>DELETE
FROM pg_clusters;<br>WITH clusters AS
(<br> SELECT cluster_nr,
array_agg(id_point) AS feature_ids, count(id_point),
ST_Centroid(ST_Collect(geom)) AS
geom<br> FROM
points<br> GROUP BY
cluster_nr<br>)<br>INSERT INTO pg_clusters(gid, cluster_feature_ids, count,
geom)<br>SELECT * FROM clusters;</div></div></div></div></div></div>
<br>_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users" \
target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users</a><br></blockquote></div><br></div>
_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic