List:       cassandra-dev
Subject:    Re: Implementing a secondary index
From:       Caleb Rackliffe <calebrackliffe () gmail ! com>
Date:       2021-11-18 16:42:06
Message-ID: CAHvM0ucVi9P-QbWpuvn__duJ+hbJUDS5raB7L-wOdF1LdOaGbA () mail ! gmail ! com


Hi Claude,

In code space, the best place to start would be the secondary index API and
the manager that maintains the indexes on a per-table basis:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/Index.java
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/SecondaryIndexManager.java


If you have any questions about either, feel free to reach out, either here
or in ASF Slack.

P.S. If you're interested in where secondary indexing in Cassandra is
headed, follow CEP-7: Storage Attached Index:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index

On Wed, Nov 17, 2021 at 4:34 AM DuyHai Doan <doanduyhai@gmail.com> wrote:

> Hello Claude
> 
> I wrote a blog post about secondary index architecture a long time ago,
> but most of the content should still be relevant and is worth checking:
> 
> https://www.doanduyhai.com/blog/?p=13191
> 
> Regards
> 
> Duy Hai DOAN
> 
> On Wed, Nov 17, 2021 at 10:17, Claude Warren <claude.warren@instaclustr.com>
> wrote:
> 
> > Greetings,
> > 
> > I am looking to implement a Multidimensional Bloom filter index [1] [2] on
> > a Cassandra table. OK, I know that is a lot to take in. What I need is
> > any documentation that explains the architecture of the index options, or
> > someone I can ask questions of -- a mentor, if you will.
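For readers new to the technique in [1]: as multidimensional Bloom filter indexes are commonly described, each indexed item is stored as a Bloom filter, and a search asks which stored filters have every bit of the query filter set. The Java sketch below is only a plain, single Bloom filter with that match test; it is not the multidimensional structure from [1] or the code in [3]. The class name, the parameters `m` and `k`, and the double-hashing scheme built on `String.hashCode()` are all assumptions for illustration.

```java
import java.util.BitSet;

// Hypothetical, minimal Bloom filter used only to illustrate the match test;
// it is not the multidimensional index from [1] or the code in [3].
public class BloomSketch {
    private final int m;        // number of bits in the filter
    private final int k;        // number of hash functions
    private final BitSet bits;

    public BloomSketch(int m, int k) {
        if (m <= 0 || k <= 0) {
            throw new IllegalArgumentException("m and k must be positive");
        }
        this.m = m;
        this.k = k;
        this.bits = new BitSet(m);
    }

    // Bit index for the i-th hash via double hashing: (h1 + i * h2) mod m.
    // Deriving h1 and h2 from String.hashCode() is an assumption for illustration.
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1; // forced odd, so never zero
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(String item) {
        for (int i = 0; i < k; i++) {
            bits.set(index(item, i));
        }
    }

    // May return a false positive, never a false negative.
    public boolean mightContain(String item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    // Match test: every bit set in the query filter is also set in this filter.
    public boolean matches(BloomSketch query) {
        if (query.m != m || query.k != k) {
            throw new IllegalArgumentException("filters must share m and k");
        }
        BitSet missing = (BitSet) query.bits.clone();
        missing.andNot(bits); // query bits that this filter lacks
        return missing.isEmpty();
    }
}
```

Checking a query against every stored filter this way is linear in the number of filters; avoiding that scan is presumably what the index structures described in [1] are for.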
> > 
> > I have a proof of concept for the index that works from the client side
> > [3].  What I want to do is move some of that processing to the server
> > side.
> > 
> > Basically, I think I need to do the following:
> > 
> > 1. On each partition, create an SST to store the index data. This table
> >    comprises two integer data points and the primary key for the data
> >    table.
> > 2. When the index cell gets updated in the original table (there will
> >    only be one column), update one or more rows in the SST table.
> > 3. When querying, perform multiple queries against the index data and
> >    return the primary key values (or the data associated with the primary
> >    keys -- I am unclear on this bit).
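The three steps above can be modelled in memory before touching Cassandra internals. The Java sketch below is a hypothetical illustration only: it does not use Cassandra's index API, nothing is persisted, and the class and method names are invented. `Point(position, value)` stands in for the "two integer data points" (those field names are assumptions), and treating a multi-point query as the intersection of the per-point lookups is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of steps 1-3; NOT Cassandra's Index API.
public class BloomIndexSketch {

    // The "two integer data points" of an index row; the names are assumptions.
    public record Point(int position, int value) {}

    // Step 1: the index "table", (position, value) -> primary keys of base-table rows.
    private final Map<Point, Set<String>> index = new HashMap<>();
    // Points currently indexed for each primary key, so an update can delete stale rows.
    private final Map<String, List<Point>> pointsByKey = new HashMap<>();

    // Step 2: the indexed cell for primaryKey changed; replace its index rows.
    public void update(String primaryKey, List<Point> points) {
        for (Point old : pointsByKey.getOrDefault(primaryKey, List.of())) {
            Set<String> keys = index.get(old);
            if (keys != null) {
                keys.remove(primaryKey);
                if (keys.isEmpty()) {
                    index.remove(old);
                }
            }
        }
        for (Point p : points) {
            index.computeIfAbsent(p, unused -> new HashSet<>()).add(primaryKey);
        }
        pointsByKey.put(primaryKey, new ArrayList<>(points));
    }

    // Step 3: one lookup per query point; return the keys that every lookup matched.
    // An empty query returns an empty set (an arbitrary choice for this sketch).
    public Set<String> query(List<Point> points) {
        Set<String> result = null;
        for (Point p : points) {
            Set<String> hits = index.getOrDefault(p, Set.of());
            if (result == null) {
                result = new HashSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        BloomIndexSketch idx = new BloomIndexSketch();
        idx.update("row-1", List.of(new Point(0, 5), new Point(3, 7)));
        idx.update("row-2", List.of(new Point(0, 5)));
        System.out.println(idx.query(List.of(new Point(0, 5), new Point(3, 7)))); // [row-1]
        idx.update("row-1", List.of(new Point(1, 1))); // cell rewritten: old rows removed
        System.out.println(idx.query(List.of(new Point(0, 5)))); // [row-2]
    }
}
```

The reverse map from primary key to its points is one way to support step 2 without scanning the index; whether a server-side index needs it depends on what the write path hands to the index on update.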
> > 
> > Any help or guidance would be appreciated,
> > Claude
> > 
> > [1] https://archive.org/details/arxiv-1501.01941/mode/2up
> > [2] https://archive.fosdem.org/2020/schedule/event/bloom_filters/
> > [3] https://github.com/Claude-at-Instaclustr/blooming_cassandra
> > 
> > --
> > 
> > *Claude Warren*
> > 
> > Principal Software Engineer
> > 
> > Instaclustr
> > 
> 


