
List:       cassandra-user
Subject:    Re: Multi-type column values in single CF
From:       Silvère Lestang <silvere.lestang () gmail ! com>
Date:       2011-07-04 8:48:57
Message-ID: CAFg-H_KecjkTa4TErqujUExM1bLCcXV_u1CsnW8yJvMw7D2rXA () mail ! gmail ! com

We do pretty much the same thing here: dynamic columns with a timestamp as the
column name and a different value type for each row. We use the
serialization/deserialization classes provided with Hector and store the
type of the value in the row key. Example row keys:
"b6c8a1e7281761e62230ea76daa3d841#INT" => every value is an Integer
"7f30a6a2bbb1b921afc8216d8c5d9257#DOUBLE" => every value is a Double
....
If I had to do it again, I would try to use (Dynamic)CompositeType for the
value, or an equivalent mechanism as suggested by Roland.
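
To make the row-key convention concrete, here is a minimal sketch in plain Java. It is not our production code and it does not use the Hector serializers; the class name TypedRowKey, its methods, and the assumption that INT is a 4-byte and DOUBLE an 8-byte big-endian value are all hypothetical, chosen only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: names are illustrative and not part of Hector or Cassandra.
final class TypedRowKey {
    enum ValueType { INT, DOUBLE }

    // Build a row key such as "<id>#INT" from an id and a value type.
    static String make(String id, ValueType type) {
        return id + "#" + type.name();
    }

    // Read the value type from the suffix after the last '#'.
    static ValueType typeOf(String rowKey) {
        int sep = rowKey.lastIndexOf('#');
        if (sep < 0) {
            throw new IllegalArgumentException("no type suffix in row key: " + rowKey);
        }
        return ValueType.valueOf(rowKey.substring(sep + 1));
    }

    // Decode a raw column value according to the type carried by the row key.
    // Assumption: INT is stored as 4 bytes and DOUBLE as 8 bytes, big-endian.
    static Object decode(String rowKey, byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        switch (typeOf(rowKey)) {
            case INT:
                return buf.getInt();
            case DOUBLE:
                return buf.getDouble();
            default:
                throw new IllegalStateException("unhandled value type");
        }
    }
}
```

With this layout a reader only has to inspect the row key once to know how to decode every column value in that row, at the cost of fixing one value type per row.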

On 3 July 2011 15:07, Roland Gude <roland.gude@yoochoose.com> wrote:

> You could do the serialization for all your supported datatypes yourself
> (many serialization libraries are available, and a fairly thorough
> benchmark of them can be found here:
> https://github.com/eishay/jvm-serializers/wiki) and prefix the serialized
> bytes with an identifier for your datatype.
> This would not avoid casting, but it would still perform better
> than serializing to strings as is done in your example.
> Prefixing the values with the id seems better to me, because you can
> be sure that a new insertion to some field overwrites the correct column
> even if its type has changed.
>
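
A rough sketch of the type-tag idea Roland describes, using only the JDK. The class TaggedValue, the tag values 1 and 2, and the choice of a 4-byte int and UTF-8 strings are assumptions made for this example; a real setup would plug in whichever serialization library it already uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one leading byte identifies the type of the bytes that follow.
final class TaggedValue {
    static final byte TAG_INT = 1;     // assumed tag value
    static final byte TAG_STRING = 2;  // assumed tag value

    // Serialize a supported value and put its type tag in front.
    static byte[] encode(Object value) {
        if (value instanceof Integer) {
            return ByteBuffer.allocate(5).put(TAG_INT).putInt((Integer) value).array();
        }
        if (value instanceof String) {
            byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(1 + utf8.length).put(TAG_STRING).put(utf8).array();
        }
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    // Read the tag, then decode the remaining bytes accordingly.
    static Object decode(byte[] stored) {
        ByteBuffer buf = ByteBuffer.wrap(stored);
        byte tag = buf.get();
        switch (tag) {
            case TAG_INT:
                return buf.getInt();
            case TAG_STRING:
                return new String(stored, 1, stored.length - 1, StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("unknown type tag: " + tag);
        }
    }
}
```

Because the tag travels with the value, the column name stays the same when a field changes type, so a later write replaces the earlier column instead of creating a second one.
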
> -----Original Message-----
> From: osishkin osishkin [mailto:osishkin@gmail.com]
> Sent: Sunday, 3 July 2011 13:52
> To: user@cassandra.apache.org
> Subject: Multi-type column values in single CF
>
> Hi all,
>
> I need to store column values of various data types in a
> single column family, i.e. I have column values that are integers,
> others that are strings, and maybe more later. All column names are
> strings (no comparator problem for me).
> The thing is, I need to store unstructured data. I do not have fixed,
> known-in-advance column names, so I cannot use a fixed static map
> for casting the values back to their original type on retrieval from
> Cassandra.
>
> My immediate naive thought is to simply prefix every column name with
> the type the value needs to be cast back to.
> For example, I'll do the following conversion to the columns of some key:
> {'attr1': 'val1', 'attr2': 100}  ~> {'str_attr1': 'val1', 'int_attr2':
> '100'}
> and only then send it to Cassandra. This way I know what type to
> cast it back to.
>
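
For comparison, the column-name-prefix conversion from the example above could look roughly like this in Java. The class PrefixedColumns and its method names are hypothetical, and only the "str_" and "int_" prefixes from the example are handled; values are stored as strings, as in the original proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper mirroring {'attr2': 100} ~> {'int_attr2': '100'}.
final class PrefixedColumns {
    // Prefix each column name with its value type and stringify the value.
    static Map<String, String> toColumns(Map<String, Object> attrs) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : attrs.entrySet()) {
            Object v = e.getValue();
            String prefix;
            if (v instanceof Integer) {
                prefix = "int_";
            } else if (v instanceof String) {
                prefix = "str_";
            } else {
                throw new IllegalArgumentException("unsupported value: " + v);
            }
            out.put(prefix + e.getKey(), String.valueOf(v));
        }
        return out;
    }

    // Strip the prefix and cast each value back to its original type.
    static Map<String, Object> fromColumns(Map<String, String> columns) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            String name = e.getKey();
            if (name.startsWith("int_")) {
                out.put(name.substring(4), Integer.valueOf(e.getValue()));
            } else if (name.startsWith("str_")) {
                out.put(name.substring(4), e.getValue());
            } else {
                throw new IllegalArgumentException("no type prefix on column: " + name);
            }
        }
        return out;
    }
}
```

Note that the type is part of the column name here, so writing 'attr2' first as an integer and later as a string would leave both int_attr2 and str_attr2 in the row.
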
> But all this casting back and forth on the client side seems to me to
> be very bad for performance.
> Another option is to split the columns across dedicated column families
> with matching validation types: a column family for integer values,
> one for strings, one for timestamps, etc.
> But that does not seem very efficient either (and worse for any
> rollback mechanism), since I now have to perform several get calls on
> multiple CFs where before I needed only one.
>
> I thought perhaps someone has encountered a similar situation in the
> past, and can offer some advice on the best course of action.
>
> Thank you,
> Osi
>
>
>



