
List:       hadoop-user
Subject:    Re: Is it possible to implement transpose with PigLatin/any other MR language?
From:       Subir S <subir.sasikumar () gmail ! com>
Date:       2012-06-25 19:59:42
Message-ID: CA+m5d=z0FU+AQ-vEso=jTG=GT=qVEr6EBw76rF8vpYTpvYpUbw () mail ! gmail ! com


That is great, Simone! I have not tried your suggestion yet, but I will
certainly try it.

@Robert, thank you. I will try that option too.

On Mon, Jun 25, 2012 at 3:12 PM, Simone Leo <simleo@crs4.it> wrote:

> Hello,
> 
> we recently added a tool for solving relatively simple problems like this
> one to Pydoop. The tool is called Pydoop Script:
> 
> http://pydoop.sourceforge.net/docs/pydoop_script.html#pydoop-script
>  
> Using Pydoop Script, I implemented the transposer in 14 lines of code:
> 
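> import struct
> 
> def mapper(key, value, writer):
>     # Emit each cell under its column index, tagged with the row key.
>     value = value.split()
>     for i, a in enumerate(value):
>         writer.emit(struct.pack(">q", i), "%s\t%s" % (key, a))
> 
> def reducer(key, ivalue, writer):
>     vector = []
>     for v in ivalue:
>         # Split on the last tab only: the packed key may itself contain a tab byte.
>         v = v.rsplit("\t", 1)
>         v[0] = struct.unpack(">q", v[0])[0]
>         vector.append(v)
>     # Sort by the original row key so cells keep their input order.
>     vector.sort()
>     vector = [v[1] for v in vector]
>     writer.emit(struct.unpack(">q", key)[0], "\t".join(vector))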
> 
> Here is the complete workflow:
> 
> hadoop fs -put matrix.txt{,}
> pydoop script transpose.py matrix.txt t_matrix
> hadoop fs -get t_matrix{,}
> sort -mn -k1,1 -o t_matrix.txt t_matrix/part-0000*
> 
> The final t_matrix.txt actually contains an additional first column with
> row indexes that should be removed (but this can probably be avoided if the
> transposed matrix acts as input for another job). Although the above
> implementation can be improved in several ways, it took me just about 30
> minutes to write and test after seeing your message.
> 
> Cheers
> 
> Simone
> 
> 
> On 06/21/2012 10:16 AM, Subir S wrote:
> 
> > Hi,
> > 
> > Is it possible to implement transpose operation of rows into columns and
> > vice versa...
> > 
> > 
> > i.e.
> > 
> > col1 col2 col3
> > col4 col5 col6
> > col7 col8 col9
> > col10 col11 col12
> > 
> > can this be converted to
> > 
> > col1 col4 col7 col10
> > col2 col5 col8 col11
> > col3 col6 col9 col12
> > 
> > Is this even possible with map reduce? If yes, which language helps to
> > achieve this faster?
> > 
> > Thanks
> > 
> > 
> --
> Simone Leo
> Data Fusion - Distributed Computing
> CRS4
> POLARIS - Building #1
> Piscina Manna
> I-09010 Pula (CA) - Italy
> e-mail: simone.leo@crs4.it
> http://www.crs4.it
> 
> 
> 
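
The mapper/reducer logic quoted above can be checked without a cluster. The
following is only a minimal sketch in plain Python 3, not Pydoop code: the
function name transpose_mapreduce is hypothetical, and it assumes the line
number can stand in for the byte-offset key that the real job receives. It
emulates the map, shuffle, and reduce steps on the 4x3 example from the
original question.

```python
from collections import defaultdict

def transpose_mapreduce(lines):
    """Emulate map -> shuffle -> reduce for a whitespace-separated matrix."""
    # Map + shuffle: group every cell under its column index,
    # remembering which input row it came from.
    shuffled = defaultdict(list)
    for row_index, line in enumerate(lines):
        for col_index, cell in enumerate(line.split()):
            shuffled[col_index].append((row_index, cell))

    # Reduce: within each column group, order cells by original row
    # index and join them into one output row.
    out = []
    for col_index in sorted(shuffled):
        cells = [cell for _, cell in sorted(shuffled[col_index])]
        out.append("\t".join(cells))
    return out

matrix = [
    "col1 col2 col3",
    "col4 col5 col6",
    "col7 col8 col9",
    "col10 col11 col12",
]
transposed = transpose_mapreduce(matrix)
for row in transposed:
    print(row)
```

Each printed row corresponds to one input column, which is the layout asked
for in the original question.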

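
The extra first column of row indexes mentioned above could be dropped with
a few lines of Python. This is a sketch under assumptions: it supposes the
index and the row are separated by a single tab (the default key/value
separator of Hadoop's text output), the helper name strip_index_column is
hypothetical, and the sample lines are made up to mimic the shape of
t_matrix.txt rather than taken from a real run.

```python
def strip_index_column(lines):
    """Drop everything up to and including the first tab on each line."""
    stripped = []
    for line in lines:
        # partition() splits on the first tab only, so tabs inside
        # the row itself are preserved.
        _index, _sep, rest = line.rstrip("\n").partition("\t")
        stripped.append(rest)
    return stripped

# Made-up sample shaped like "<row index>\t<tab-separated cells>".
sample = [
    "0\tcol1\tcol4\tcol7\tcol10\n",
    "1\tcol2\tcol5\tcol8\tcol11\n",
    "2\tcol3\tcol6\tcol9\tcol12\n",
]
cleaned = strip_index_column(sample)
for row in cleaned:
    print(row)
```

To apply it to a real file, the lines could be read with open("t_matrix.txt")
and the result written back out one row per line.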

