List: hadoop-user
Subject: Re: Is it possible to implement transpose with PigLatin/any other MR language?
From: Subir S <subir.sasikumar () gmail ! com>
Date: 2012-06-25 19:59:42
Message-ID: CA+m5d=z0FU+AQ-vEso=jTG=GT=qVEr6EBw76rF8vpYTpvYpUbw () mail ! gmail ! com
That is great, Simone! I have not tried your suggestion yet, but I will surely
try it.
@Robert, thank you; I will try that option too.
On Mon, Jun 25, 2012 at 3:12 PM, Simone Leo <simleo@crs4.it> wrote:
> Hello,
>
> we recently added a tool for solving relatively simple problems like this
> one to Pydoop. The tool is called Pydoop Script:
>
> http://pydoop.sourceforge.net/docs/pydoop_script.html#pydoop-script
>
> Using Pydoop Script, I implemented the transposer in 14 lines of code:
>
> import struct
>
> def mapper(key, value, writer):
>     value = value.split()
>     for i, a in enumerate(value):
>         writer.emit(struct.pack(">q", i), "%s\t%s" % (key, a))
>
> def reducer(key, ivalue, writer):
>     vector = []
>     for v in ivalue:
>         v = v.split("\t")
>         v[0] = struct.unpack(">q", v[0])[0]
>         vector.append(v)
>     vector.sort()
>     vector = [v[1] for v in vector]
>     writer.emit(struct.unpack(">q", key)[0], "\t".join(vector))
>
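To check the logic of the transposer without a cluster, the map, shuffle and reduce steps can be imitated in memory. This is only a minimal sketch and does not use Pydoop at all: `transpose_lines` is a hypothetical helper, the row key is a plain line number rather than the packed input key, and a dictionary stands in for the shuffle.

```python
from collections import defaultdict

def transpose_lines(lines):
    """Imitate map, shuffle and reduce in memory for a whitespace-separated matrix."""
    # "Map" + "shuffle": group every cell under its column index,
    # remembering which row it came from.
    groups = defaultdict(list)
    for row_idx, line in enumerate(lines):
        for col_idx, cell in enumerate(line.split()):
            groups[col_idx].append((row_idx, cell))
    # "Reduce": order each group by row index, then join the cells.
    out = []
    for col_idx in sorted(groups):
        cells = [cell for _, cell in sorted(groups[col_idx])]
        out.append("\t".join(cells))
    return out

# The matrix from the original question.
matrix = [
    "col1 col2 col3",
    "col4 col5 col6",
    "col7 col8 col9",
    "col10 col11 col12",
]
for row in transpose_lines(matrix):
    print(row)
```

Each output line is one column of the input, with cells in their original row order.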
> Here is the complete workflow:
>
> hadoop fs -put matrix.txt{,}
> pydoop script transpose.py matrix.txt t_matrix
> hadoop fs -get t_matrix{,}
> sort -mn -k1,1 -o t_matrix.txt t_matrix/part-0000*
>
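The `sort -mn -k1,1` step merges part files that are each already ordered by row index. A rough Python equivalent is sketched here, assuming each part is a list of lines of the form `index<TAB>row` (the tab separator is an assumption based on Hadoop's default text output) and using a hypothetical helper name, `merge_parts`.

```python
import heapq

def merge_parts(parts):
    """Merge already-sorted lists of 'index<TAB>row' lines by numeric index."""
    def index_of(line):
        # The leading field is the row index written by the reducer.
        return int(line.split("\t", 1)[0])
    # heapq.merge only interleaves its inputs; it relies on each input
    # being sorted already, like `sort -m`.
    return list(heapq.merge(*parts, key=index_of))

part_0 = ["0\tcol1\tcol4", "2\tcol3\tcol6"]
part_1 = ["1\tcol2\tcol5"]
for line in merge_parts([part_0, part_1]):
    print(line)
```

Like `sort -m`, this does not re-sort anything, so it stays cheap even when the individual parts are large.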
> The final t_matrix.txt actually contains an additional first column with
> row indexes that should be removed (but this can probably be avoided if the
> transposed matrix acts as input for another job). Although the above
> implementation can be improved in several ways, it took me just about 30
> minutes to write and test after seeing your message.
>
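Removing the extra first column afterwards is a small post-processing step. A minimal sketch, assuming the index and the row are separated by a tab (Hadoop's default for text output); `strip_index_column` is a hypothetical helper name:

```python
def strip_index_column(lines):
    """Drop the leading row-index field (up to the first tab) from each line."""
    stripped = []
    for line in lines:
        index, sep, row = line.rstrip("\n").partition("\t")
        # Leave the line untouched if it has no tab at all.
        stripped.append(row if sep else index)
    return stripped

sample = [
    "0\tcol1\tcol4\tcol7\tcol10\n",
    "1\tcol2\tcol5\tcol8\tcol11\n",
]
for row in strip_index_column(sample):
    print(row)
```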
> Cheers
>
> Simone
>
>
> On 06/21/2012 10:16 AM, Subir S wrote:
>
> > Hi,
> >
> > Is it possible to implement a transpose operation, turning rows into
> > columns and vice versa?
> >
> >
> > i.e.
> >
> > col1 col2 col3
> > col4 col5 col6
> > col7 col8 col9
> > col10 col11 col12
> >
> > can this be converted to
> >
> > col1 col4 col7 col10
> > col2 col5 col8 col11
> > col3 col6 col9 col12
> >
> > Is this even possible with map reduce? If yes, which language helps to
> > achieve this faster?
> >
> > Thanks
> >
> >
> --
> Simone Leo
> Data Fusion - Distributed Computing
> CRS4
> POLARIS - Building #1
> Piscina Manna
> I-09010 Pula (CA) - Italy
> e-mail: simone.leo@crs4.it
> http://www.crs4.it
>
>
>