
List:       cassandra-user
Subject:    Partitions
From:       "David G. Boney" <dboney1 () semanticartifacts ! com>
Date:       2010-12-24 19:07:15
Message-ID: 49DF653C-BB68-47EE-A00D-574F06D2609C () semanticartifacts ! com

I am using the Hadoop interface with Cassandra. Is it possible to line up
partitions or splits of two different column families to be on the same node?
I am doing this for data locality reasons. I want to read all the data from a
split of column family A and a split from column family B into memory to do
some processing.

Here is an example. Column family A has 1,000,000 rows and column family B has
50,000,000 rows. Let's say column family A has a split every 10,000 rows and
column family B has a split every 500,000 rows. I want the first split of A
and the first split of B on the same node, the second split of A and the
second split of B on the next node, and so on.
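
The arithmetic in this example can be sketched as follows. This is only an
illustration of how the splits would pair up by position; the function names
are hypothetical and nothing here is a Cassandra or Hadoop API, nor does it
say anything about whether such placement can be configured.

```python
def split_count(total_rows, rows_per_split):
    """Number of splits needed to cover total_rows (ceiling division)."""
    return -(-total_rows // rows_per_split)


def pair_splits(rows_a, per_split_a, rows_b, per_split_b):
    """Pair split i of A with split i of B, for as many splits as both have.

    Hypothetical helper: it only models the desired pairing, not placement.
    """
    n_a = split_count(rows_a, per_split_a)
    n_b = split_count(rows_b, per_split_b)
    return list(zip(range(n_a), range(n_b)))


# Numbers taken from the example above.
splits_a = split_count(1_000_000, 10_000)
splits_b = split_count(50_000_000, 500_000)
pairs = pair_splits(1_000_000, 10_000, 50_000_000, 500_000)
```

With these numbers both column families end up with the same number of
splits, so split i of A pairs with split i of B with none left over.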

A second scenario is that the two column families use the same key. Let's
assume the key is an integer in the range of 1 to 1,000,000. The two column
families have a different number of rows. I would like the splits to occur at
certain multiples of the key value, say every 10,000. The first split would
have keys in the range of 1 to 9,999. The second split would have keys in the
range of 10,000 to 19,999, and so on. I still want the first split of column
family A and the first split of column family B to be on the first node, and
so on. In this scenario a split could be empty or very small; that is OK.
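
A minimal sketch of the key-to-split mapping described in this second
scenario, assuming integer keys and a fixed split width of 10,000. The names
split_index and group_by_split are hypothetical; this models the desired
grouping only and is not a Cassandra partitioner or a Hadoop InputFormat.

```python
from collections import defaultdict

SPLIT_WIDTH = 10_000  # assumed split boundary: every multiple of 10,000


def split_index(key, width=SPLIT_WIDTH):
    """Zero-based split for an integer key.

    Keys 1..9,999 map to split 0, keys 10,000..19,999 to split 1, and so on.
    """
    return key // width


def group_by_split(keys, width=SPLIT_WIDTH):
    """Group keys by split index; splits with no keys are simply absent."""
    groups = defaultdict(list)
    for key in keys:
        groups[split_index(key, width)].append(key)
    return dict(groups)


# Two column families sharing the same key space but with different rows.
keys_a = [1, 9_999, 10_000, 25_000]
keys_b = [42, 19_999]
groups_a = group_by_split(keys_a)
groups_b = group_by_split(keys_b)
```

Because both column families use the same function of the key, rows with the
same split index in A and B are the ones that would need to share a node. A
split index that appears in one column family but not the other corresponds
to the empty split mentioned above.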
-------------
Sincerely,
David G. Boney
dboney1@semanticartifacts.com
http://www.semanticartifacts.com

