
List:       solr-user
Subject:    RE: Geospatial clustering + zoom in/out help
From:       "Smiley, David W." <dsmiley () mitre ! org>
Date:       2014-01-31 19:27:40
Message-ID: DA95D27D48C2AB4E815C79B1D473F20E2D8D0FE8 () IMCMBX04 ! MITRE ! ORG

Hi Bojan.

You've got some good ideas here, along the lines of what others have tried. Some
time ago I put together a page on the wiki about this subject that I'm sure you
will find interesting. It references a relevant Stack Overflow post and a
presentation at DrupalCon, which had a segment from someone using the same
approach you suggest here, involving field collapsing and/or the stats
component. The video shows it in action.

http://wiki.apache.org/solr/SpatialClustering

It would be helpful for everyone if you shared your experience with whichever
approach you choose, once you have given it a try.

~ David
________________________________________
From: Bojan Šmid [bosmid@gmail.com]
Sent: Thursday, January 30, 2014 1:15 PM
To: solr-user@lucene.apache.org
Subject: Geospatial clustering + zoom in/out help

Hi,

I have an index with 300K docs with lat,lon. I need to cluster the docs
based on lat,lon for display in the UI. The user then needs to be able to
click on any cluster and zoom in (up to 11 levels deep).

I'm using Solr 4.6 and I'm wondering how best to implement this efficiently?

More specific questions are below.

I need to:

1) cluster data points at different zoom levels

2) click on a specific cluster and zoom in

3) be able to select a region (bounding box or polygon) and show clusters
in the selected area

What's the best way to implement this so that queries are fast?

What I thought I would try, but maybe there are better ways:

* divide the world in NxM large squares and then each of these squares into
4 more squares, and so on - 11 levels deep

* at index time figure out all squares (at all 11 levels) each data point
belongs to and index that info into 11 different fields: e.g.
<id=1 name=foo lat=x lon=y zoom1=square1_62  zoom2=square1_62_47
zoom3=square1_62_47_33 ....>

* at search time, use field collapsing on zoomX field to get which docs
belong to which square on particular level

* calculate center point of each square (by calculating mean value of
positions for all points in that square) using StatsComponent (facet on
zoomX field, avg on lat and lon fields) - I would consider those squares as
separate clusters (one square is one cluster) and center points of those
squares as center points of clusters derived from them
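A minimal sketch of the steps above, under several assumptions: the top-level grid is 8x4 squares (the N and M are not given above), each square is split into four quadrants numbered 0 to 3, and the cell names (`sq_...`) are a hypothetical scheme rather than the `square1_62_47` naming used in the example. The query helper assumes numeric `lat` and `lon` fields and uses the StatsComponent `stats.facet` parameter; it does not handle boxes that cross the antimeridian.

```python
from urllib.parse import urlencode


def zoom_fields(lat, lon, levels=11, cols=8, rows=4):
    """Return {"zoom1": ..., "zoomN": ...} cell paths for one point.

    cols/rows (the top-level N x M grid) and the "sq_" naming are
    illustrative assumptions, not values from the original message.
    """
    # Normalise to [0, 1); clamp so lat=90 / lon=180 land in the last cell.
    x = min((lon + 180.0) / 360.0, 1.0 - 1e-12)
    y = min((lat + 90.0) / 180.0, 1.0 - 1e-12)

    # Level 1: which of the cols x rows top-level squares holds the point.
    fx, fy = x * cols, y * rows
    col, row = int(fx), int(fy)
    path = "sq_%d" % (row * cols + col)
    fields = {"zoom1": path}

    # Keep only the fractional position inside the current square.
    fx -= col
    fy -= row

    # Levels 2..N: split the current square into 4 quadrants each time.
    for level in range(2, levels + 1):
        fx *= 2
        fy *= 2
        qx, qy = int(fx), int(fy)
        fx -= qx
        fy -= qy
        path += "_%d" % (qy * 2 + qx)
        fields["zoom%d" % level] = path
    return fields


def cluster_query_params(level, bbox=None):
    """Build Solr request parameters for per-square lat/lon statistics.

    bbox is (min_lat, min_lon, max_lat, max_lon); field names are assumed.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),                       # only the statistics are needed
        ("stats", "true"),
        ("stats.field", "lat"),
        ("stats.field", "lon"),
        ("stats.facet", "zoom%d" % level),   # one stats block per square
        ("wt", "json"),
    ]
    if bbox is not None:
        min_lat, min_lon, max_lat, max_lon = bbox
        params.append(("fq", "lat:[%s TO %s]" % (min_lat, max_lat)))
        params.append(("fq", "lon:[%s TO %s]" % (min_lon, max_lon)))
    return params


if __name__ == "__main__":
    print(zoom_fields(48.85, 2.35, levels=4))
    print(urlencode(cluster_query_params(4, bbox=(40, -10, 55, 15))))
```

Because each deeper path starts with its parent's path, zooming into a clicked cluster could be done by filtering on the parent value (for example `fq=zoom3:<clicked cell>`) and faceting the statistics on the next level's field.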

I *think* the problem with this approach is that:

* there will be many unique field values at the deeper zoom levels, which
means field collapsing / StatsComponent may not work fast enough

* clusters will not look very natural: each zoom level would have many
clusters, and a "real" geographical cluster would in some cases be displayed
as several clusters because its points are dispersed across multiple
squares. But that may be OK

* a lot will depend on how the squares are calculated - linearly dividing
360 degrees by N to get "equal" size squares in degrees would produce
issues with "real" square sizes and counts of points in each of them
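To make the last concern concrete, here is a rough sketch of how the ground size of a square that is "equal" in degrees changes with latitude. It assumes a spherical Earth with a mean radius of 6371 km, which is an approximation; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


def cell_size_km(lat_deg, cell_deg):
    """Approximate (width, height) in km of a cell_deg x cell_deg square
    centred at latitude lat_deg."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    height = cell_deg * km_per_deg
    # East-west distance per degree of longitude shrinks with cos(latitude).
    width = cell_deg * km_per_deg * math.cos(math.radians(lat_deg))
    return width, height


if __name__ == "__main__":
    for lat in (0, 30, 60, 80):
        w, h = cell_size_km(lat, 1.0)
        print("lat %2d: %.1f km wide, %.1f km tall" % (lat, w, h))
```

At 60 degrees latitude a square is only about half as wide on the ground as the same square at the equator, so squares near the poles cover far less area and will tend to hold fewer points.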


So I'm wondering if there is a better way?

Thanks,


  Bojan