List: cassandra-user
Subject: Issues with MapReduce Job (using Brisk)
From: Silvère_Lestang <silvere.lestang () gmail ! com>
Date: 2011-05-31 11:00:47
Message-ID: BANLkTimPLpvd+O8-RsPBmr7x_kqDePBS_Q () mail ! gmail ! com
Hi,
I am trying to create a MapReduce job that calculates the average of values
stored in Cassandra and writes the result back to Cassandra (using
ColumnFamilyOutputFormat and ColumnFamilyInputFormat). I use the Brisk
distribution of Hadoop, but I don't know whether that is related.
My code is here: http://pastebin.com/8gd21VuP
As I understand it from the WordCount example, the first parameter of the map
method is the row key and the second is a map of <column name, column>.
I found confirmation of this in the ColumnFamilyRecordReader class.
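
To make the expected shape of the map inputs concrete, here is a minimal
sketch that uses only the Java standard library. It is not the code from the
pastebin link and it uses no Cassandra or Hadoop classes: MapInputSketch,
decode, average and sampleRow are hypothetical names, and a plain
SortedMap<ByteBuffer, ByteBuffer> stands in for the <column name, column> map.
The decode helper reads only the bytes between a buffer's position and limit,
because a ByteBuffer can be a view onto a larger backing array. Whether that
matters for the job described here is an assumption, not something established
in this message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in, not the poster's code and not a Cassandra class.
class MapInputSketch {

    // Decode only the bytes between position and limit. A ByteBuffer can be a
    // view onto a larger backing array, so array() may return more than the key.
    static String decode(ByteBuffer buf) {
        // duplicate() keeps the caller's position and limit unchanged.
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    // Average the one-byte column values of a single row (42 prints as '*').
    static double average(SortedMap<ByteBuffer, ByteBuffer> columns) {
        long sum = 0;
        for (ByteBuffer value : columns.values()) {
            sum += value.get(value.position()); // absolute read, buffer state untouched
        }
        return columns.isEmpty() ? 0.0 : (double) sum / columns.size();
    }

    // Build a row shaped like the test data: columnKey0..columnKey9, each 42.
    static SortedMap<ByteBuffer, ByteBuffer> sampleRow() {
        SortedMap<ByteBuffer, ByteBuffer> columns = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            ByteBuffer name = ByteBuffer.wrap(("columnKey" + i).getBytes(StandardCharsets.UTF_8));
            columns.put(name, ByteBuffer.wrap(new byte[] {42}));
        }
        return columns;
    }

    public static void main(String[] args) {
        // Assumed scenario: the key is a 4-byte slice of a larger buffer.
        byte[] backing = "get_range_slicesrow1".getBytes(StandardCharsets.UTF_8);
        ByteBuffer key = ByteBuffer.wrap(backing, 16, 4);

        System.out.println("key via decode(): " + decode(key));
        System.out.println("key via array():  " + new String(key.array(), StandardCharsets.UTF_8));
        System.out.println("average: " + average(sampleRow()));
    }
}
```

In a real job the map values would be Cassandra column objects rather than raw
ByteBuffers, so the value access in average would need to be adapted.
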
But my code doesn't work. I logged the row key and the column name and saw
that both seem to be get_range_slices objects, which is very unexpected.
Here is an example of the log (the CF contains 5 rows named row[0-4], each
with 10 columns named columnKey[0-9], each with value 42, which is '*' in
ASCII):
*MAP rowkey*:
get_range_slicesrow1columnKey0*`columnKey1*acolumnKey2*bcolumnKey3*ccolumnKey4*dcolumnKey5*ecolumnKey6*fcolumnKey7*gcolumnKey8*hcolumnKey9*Hrow2columnKey0*columnKey1*columnKey2*columnKey3*columnKey4*columnKey5*columnKey6*columnKey7*columnKey8*columnKey9*row4columnKey0*pcolumnKey1*qcolumnKey2*rcolumnKey3*XcolumnKey4*YcolumnKey5*ZcolumnKey6*[columnKey7*@columnKey8*AcolumnKey9*Brow3columnKey0*columnKey1*columnKey2*columnKey3*columnKey4*columnKey5*columnKey6*columnKey7*columnKey8*columnKey9*row0columnKey0*columnKey1*:columnKey2*:!columnKey3*:"columnKey4*:#columnKey5*>columnKey6*>columnKey7*>columnKey8*>columnKey9*>
*MAP columnKey*:
get_range_slicesrow1columnKey0*`columnKey1*acolumnKey2*bcolumnKey3*ccolumnKey4*dcolumnKey5*ecolumnKey6*fcolumnKey7*gcolumnKey8*hcolumnKey9*Hrow2columnKey0*columnKey1*columnKey2*columnKey3*columnKey4*columnKey5*columnKey6*columnKey7*columnKey8*columnKey9*row4columnKey0*pcolumnKey1*qcolumnKey2*rcolumnKey3*XcolumnKey4*YcolumnKey5*ZcolumnKey6*[columnKey7*@columnKey8*AcolumnKey9*Brow3columnKey0*columnKey1*columnKey2*columnKey3*columnKey4*columnKey5*columnKey6*columnKey7*columnKey8*columnKey9*row0columnKey0*columnKey1*:columnKey2*:!columnKey3*:"columnKey4*:#columnKey5*>columnKey6*>columnKey7*>columnKey8*>columnKey9*>
I can't understand why I get this instead of the row key and column key.
Does anybody have an idea?
Silvère