'Re: [boost] Seeking feedback on SegmentedTreeSeq (sequence data structure with efficient random acce'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       boost
Subject:    Re: [boost] Seeking feedback on SegmentedTreeSeq (sequence data structure with efficient random acce
From:       Chris Clearwater <chris () detrino ! org>
Date:       2015-11-09 23:01:47
Message-ID: 564125DB.3070809 () detrino ! org
[Download RAW message or body]

On 11/09/2015 10:14 AM, Phil Endecott wrote:
> I refer to the feedback that I gave last time you posted:
>
>   http://thread.gmane.org/gmane.comp.lib.boost.devel/263167

Thanks for your feedback Phil. Sorry I did not properly respond in the 
original thread.

> I think this could be most useful if it were decomposed into
> several layers:
>
> - An in-memory B+-Tree
> - An augmented tree
> - An order-statistics tree
>
> That way, we could have in-memory B+-trees the provide a std::map-like
> interface, which would surely be useful, and we could have O(log N)
> random access insert/erase on a binary tree where that is appropriate,
> and we could have other kinds of augmented trees such as interval trees.
>
> Working out exactly what these layers should look like is not easy,
> but I think it's worth doing in order to broaden the applicability. 

My library implements a counted B+Tree. I see this as the basis to which 
other indexes can be added. In particular I would like to add support 
for maps and sets as this would provide a high performance alternative 
to the standard containers for trivially-copyable keys. I see no reason 
to make the count index optional. The memory overhead of storing the 
count index for a set where the key is 64 bits is less than a tenth of a 
percent. This is because the counts stored in level 1 of the tree are 
used to encode the length of segments, which maps/sets need to know as 
well as sequences. I have also benchmarked the CPU overhead of 
maintaining the count index and found it to be negligible. On top of 
this, it would dramtically complicate implementation for the count index 
to be optional as I would still need to store the segment lengths 
somewhere and implement iterator operations less efficiently when it was 
not present. Keeping the index around would also mean that maps/sets 
would recieve efficient access by position for practically free.

That being said, I think the data structure is useful today and should 
be a part of Boost. Pouring over the mailing list I see that other 
people have proposed and even implemented sequences with logarithmic 
random access operations many times, so clearly there is a need for such 
a thing in the community. The reaon I am posting again to the list is 
because I have worked on reference documentation, boost documentation, 
testing, and polished the implementation. This addresses some of the 
comments from my last post so it's worth taking another look at.

Regards,  Chris.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
[prev in list] [next in list] [prev in thread] [next in thread]