List:       avro-user
Subject:    Re: Reduce-side joins in Avro M/R
From:       Andrew Kenworthy <adwkenworthy () yahoo ! com>
Date:       2011-12-13 16:46:13
Message-ID: 1323794773.91669.YahooMailNeo () web120003 ! mail ! ne1 ! yahoo ! com

I'm currently using a union schema to map two different types of data (read from
two different input paths) to a common record in my reducer. This works fine,
but, if I have understood the mechanism correctly, it means that Avro has to
check each and every record against my union schema. With a "normal" reduce-side
join, I could use MultipleInputs to specify a mapper for each input, letting the
mappers run independently (since each mapper knows its input), presumably with
less overhead.
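
For readers following the thread, here is a minimal sketch of the pattern being described: each input gets its own mapper, each mapper tags its output with the branch it came from, and the reducer separates the branches and joins them. It is plain Python that simulates the map, shuffle and reduce steps in memory. It does not use the Avro or Hadoop APIs, and the record fields (user_id, name, item) and all function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mappers, one per input path. Each one already knows which
# kind of record it reads, so it can attach the branch tag directly.
def map_users(record):
    return record["user_id"], ("user", record)

def map_orders(record):
    return record["user_id"], ("order", record)

def shuffle(pairs):
    # Simulates the shuffle phase: group the tagged values by join key.
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups

def reduce_join(key, tagged_values):
    # The reducer sees a mix of both branches and splits them by tag,
    # then emits the inner join of the two sides for this key.
    users = [v for tag, v in tagged_values if tag == "user"]
    orders = [v for tag, v in tagged_values if tag == "order"]
    return [(key, u["name"], o["item"]) for u in users for o in orders]

def run_join(users, orders):
    pairs = [map_users(u) for u in users] + [map_orders(o) for o in orders]
    out = []
    for key, values in sorted(shuffle(pairs).items()):
        out.extend(reduce_join(key, values))
    return out
```

In this sketch the tag plays the role of the union branch: the reducer only inspects a small marker per value to decide which side of the join the value belongs to.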


Is it possible with Avro to avoid the overhead of checking each input row
against the union schema?

Thanks,

Andrew



> ________________________________
> From: Scott Carey <scottcarey@apache.org>
> To: "user@avro.apache.org" <user@avro.apache.org>; Andrew Kenworthy <adwkenworthy@yahoo.com>
> Sent: Wednesday, December 7, 2011 7:40 PM
> Subject: Re: Reduce-side joins in Avro M/R
> 
> 
> This should be conceptually the same as a normal map-reduce join of the same
> type. Avro handles the serialization, but not the map-reduce algorithm or
> strategy.
> 
> On 12/6/11 8:43 AM, "Andrew Kenworthy" <adwkenworthy@yahoo.com> wrote:
> 
> 
> > Hi,
> > 
> > I'd like to use reduce-side joins in an Avro M/R job, and am not sure how to
> > do it: are there any best-practice tips or outlines of what one would have
> > to implement in order to make this possible?
> > 
> > Thanks,
> > 
> > 
> > Andrew Kenworthy
> 
> 




