'Re: [postgis-devel] PointM(x,y,t) and gist_geometry_ops_nd'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgis-devel
Subject:    Re: [postgis-devel] PointM(x,y,t) and gist_geometry_ops_nd
From:       Paul Ramsey <pramsey () cleverelephant ! ca>
Date:       2022-05-20 0:25:16
Message-ID: DC8ECD38-4B15-4F25-A8CF-5ACC40952331 () cleverelephant ! ca
[Download RAW message or body]

Fun reads, but I didn't see anything that could be taken easily into the world of \
GIST...  It does seem like an index where the key is a ((x1,y1,x2,xy), (t1, t2)) \
(conceptually) with some kind of knowledge about the likely query pattern could build \
a tree that starts off spliting agressively on time then flips to splitting on space \
as the nodes get down to the expected query time window size. But it still seems like \
an open question whether that would ever be better than just a pair of independent \
indexes on time and space, supplemented by maybe some agressive partitioning on time. \
Very interesting reads! P

> On May 17, 2022, at 12:46 AM, Esteban Zimanyi <esteban.zimanyi@ulb.be> wrote:
> 
> Dear all
> 
> Regarding spatio-temporal indexing, seminal papers are the recurrent surveys from \
> Walid G. Aref 
> https://www.cs.purdue.edu/homes/aref/RecentPapers.html
> 
> Ahmed R. Mahmood, Sri Punni, Walid G. Aref:
> Spatio-temporal access methods: a survey (2010 - 2017). GeoInformatica 23(1): 1-36 \
> (2019) 
> Long-Van Nguyen-Dinh, Walid G. Aref, Mohamed F. Mokbel:
> Spatio-Temporal Access Methods: Part 2 (2003 - 2010). IEEE Data Eng. Bull. 33(2): \
> 46-55 (2010) 
> Mohamed F. Mokbel, Thanaa M. Ghanem, Walid G. Aref:
> Spatio-Temporal Access Methods. IEEE Data Eng. Bull. 26(2): 40-49 (2003)
> 
> A great introduction is Chapter 7 in 
> 
> Ralf Hartmut Güting, Markus Schneider
> Moving Objects Databases, Morgan Kaufmann, 2005
> 
> https://www.amazon.com/Objects-Databases-Kaufmann-Management-Systems/dp/0120887991
> 
> Regards
> 
> Esteban
> 
> 
> On Mon, May 16, 2022 at 5:54 PM Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Thank you Darafei for sharing your experience. I'm glad this is not a new and \
> exciting problem, but rather a well experienced one. 
> So, I did try out using Mercator instead of geographics to give the spatial \
> dimension some more weight, and it resulted in a better performing index (about \
> 700ms vs 1000ms before for the same query) but still way less good than using a \
> multi-key on gist. 
> I also timed out just using a timestamp index and a spatial index separately, and \
> letting the system bitmap them together, and: surprise! this ended up being the \
> actual fastest option of them all. So "do nothing special and just add single \
> indexes on the columns of interest" won the whole race. 
> This feels a little counter-intuitive, but maybe not. Maybe the gains available \
> from an explicit spatial-temporal approach have quite a high floor already. 
> Anyways, I was thinking about what such an explicit approach might look like.
> 
> * put the XYT into a fixed-length key to get rid of the expense of the varlena key
> * compute the penalty of cost proportionally instead of absolutely... so a given \
> new key will expand an existing node by N% in the time dimensin and M% in the \
> spatial dimensions. That stills leaves a magic number to balance time and space, \
> but it at least doesn't just wash out due to something arbitrary like the input \
> unit size 
> That still leaves splits to be calculated but maybe there's a similar proportional \
> approach that can be applied. 
> Anyone know any good papers on spatio-temporal indexing?
> 
> P
> 
> > On May 12, 2022, at 1:45 PM, Дорофей Пролесковский \
> > <me@komzpa.net> wrote: 
> > Hi,
> > 
> > Thanks for the summon. I think the question is much better analyzed in the \
> > MobilityDB working group, so Esteban please feel free to add the findings you \
> > have to the discussion :) 
> > I would like to see the EXPLAIN (ANALYZE, VERBOSE, BUFFERS) from your queries so \
> > that we're sure that there is indeed an over-scanning somewhere and not a runaway \
> > Bitmap scan. 
> > Indeed the first thing is that the penalty will be calculated in a \
> > non-proportional way. What you get is essentially a 2D index that occasionally \
> > gets indexed internally by Z too. In the past designs I've mitigated this by \
> > using SRID=3857 which is nice for that kind of data, since it looks good and also \
> > assumes things travel 1 meter per second which is not far from the truth. 
> > Second thing is our GiST rework adventure's finding by Aliaskandr Kalenik on how \
> > Postgres handles multi-column gist indexes page split: after a split on first \
> > column it checks if the resulting pages have tuples that can have penalty=0 when \
> > inserted into either page and re-runs the split for them based on second column. \
> > I suspect this will be happening a lot in AIS data. Nothing similar happens to \
> > dimensions in 2D or ND gist implementation and may be a way for improvement. 
> > Third one that comes to mind is that our ND implementation is not getting as much \
> > love as our 2D gist implementation: picksplit there is still not updated to \
> > Alexander Korotkov's method from 2012, there is no universal strategy for sorting \
> > build since it's shared between XYZ and lon-lat-ele projections, and it is \
> > generally a varlen thing with a lot of conditional loops on dimensions (which are \
> > still up to 4). For fair comparison, can you create (geom_nd, T) index on it with \
> > geometry being 3D with M forced to 0 and check what is the pure difference \
> > between 2D and ND ops? 
> > Fourth thing is that people with that kind of data don't want point-based \
> > indexing of movement tracks, if someone disappears for 10 minutes I still want to \
> > see where they were before and after that if I'm doing a slice. You want short \
> > tracks that stitch together. MobilityDB has that abstraction, and on that kind of \
> > data picksplit will work better and gist tree will be nicer, since you now can't \
> > just split at arbitrary Z/M and get nice separated buckets of data on first try. 
> > Also in real data there will be an ID of vehicle and that usually will also be in \
> > the indexing partition.  
> > Hope this helps.
> > 
> > Darafei.
> > 
> > чт, 12 мая 2022 г. в 22:54, Paul Ramsey <pramsey@cleverelephant.ca>:
> > All (but especially Darafei),
> > 
> > I grabbed some AIS data (7M ship movements) to do up a blog post on indexing \
> > spatio-temporal data and some of the quirks therein. 
> > The data were long/lat/timestamp, as one would expect. I built a simple 2D index \
> > and did some queries that included an ST_Intersects() and a time range, where the \
> > filter was such that the result set was quite selective (a few thousand points) \
> > and it ran (predictably) slowly, as it had to choose either a temporal or a \
> > spatial index, and then plow through the remaining results in brute force. 
> > The naive approach took about 4s.
> > 
> > Then I created a new table with XYT data, by doing PointM(long, lat, \
> > extract(epoch from ts)). Then I built an nd index on that table. 
> > Then I built a XYT query box that was the same as my original filter. And I ran \
> > it on the new table. 
> > It was faster than the naive approach, but only by about 4x. It took about 1s.
> > 
> > Then I added the btree_gist extension and built a multi-key index on timestamp \
> > and geometry using the 2d ops. Running the same query as the naive approach, I \
> > got a 40ms return time. So, definitely this is the winning approach. 
> > However, I am left with the nd XYT approach  being really slow, and wanting to \
> > mentally model why. 
> > Here's my guess:
> > 
> > In terms of data range, you have longitude, which in my data is no more than \
> > about 10 units wide. And similar for latitude. And then I have the time range, \
> > which is on a 1 second basis for a day so 86400 units. 
> > The nd penalty function looks at how the bounds volume and "edge" metrics are \
> > affected by adding new points. Given the huge difference in dimensional size, no \
> > matter what data is being added, the T dimension is always going to dominate. So \
> > the index will be made up of splits on T, over and over and over, until the \
> > variance in XY finally starts to get closer to the variance in T (when we're \
> > talking about time slices of a second or so).  
> > So the performance of queries on this index is going to be pretty bad for time \
> > ranges of more than a couple seconds, since the index will provide no spatial \
> > filtering at all on larger time ranges. 
> > Basically the moral of the story is "you cannot expect an ND index on non-spatial \
> > dimensions mixed with spatial dimensions to work unless you massage your \
> > dimensions to have proportions that match your expected query pattern". 
> > That's my story.
> > 
> > Does that sound right to you?
> > 
> > P
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel@lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel@lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
> 
> _______________________________________________
> postgis-devel mailing list
> postgis-devel@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
> _______________________________________________
> postgis-devel mailing list
> postgis-devel@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel

_______________________________________________
postgis-devel mailing list
postgis-devel@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-devel

[prev in list] [next in list] [prev in thread] [next in thread]