List: avro-user
Subject: Re: Reduce-side joins in Avro M/R
From: Andrew Kenworthy <adwkenworthy () yahoo ! com>
Date: 2011-12-13 16:46:13
Message-ID: 1323794773.91669.YahooMailNeo () web120003 ! mail ! ne1 ! yahoo ! com
I'm currently using a union schema to map two different types of data (read from two
different input paths) in my reducer to a common record. This works fine, but, if I
have understood the mechanism correctly, it means that Avro has to check each and
every record against my union schema. With a "normal" reduce-side join, I could use
MultipleInputs to specify a mapper for each input, letting them run independently
(since each mapper knows its input), presumably with less overhead.

Is it possible with Avro to avoid the overhead of checking each input row against the
union schema?
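
To make the setup concrete, here is a rough sketch of the kind of join described above: two inputs, one mapper per input, and a common tagged value that the reducer splits apart again. It is plain Python rather than the Avro M/R API, and every name in it (`map_users`, `map_orders`, `reduce_join`, `run_join`, the `user_id`/`name`/`item` fields) is hypothetical. The tag is only an analogy for the branch of the union that a record belongs to.

```python
from collections import defaultdict


def map_users(record):
    # Mapper for the first input path: it knows its input type,
    # so it can tag the value without inspecting it.
    return record["user_id"], ("user", record)


def map_orders(record):
    # Mapper for the second input path, tagged the same way.
    return record["user_id"], ("order", record)


def reduce_join(key, tagged_values):
    # Reducer: split the values for one key by tag, then pair them up
    # (an inner join; keys present on only one side produce nothing).
    users = [value for tag, value in tagged_values if tag == "user"]
    orders = [value for tag, value in tagged_values if tag == "order"]
    return [
        {"user_id": key, "name": user["name"], "item": order["item"]}
        for user in users
        for order in orders
    ]


def run_join(users, orders):
    # Stand-in for the shuffle phase: group the tagged values by key.
    groups = defaultdict(list)
    for record in users:
        key, value = map_users(record)
        groups[key].append(value)
    for record in orders:
        key, value = map_orders(record)
        groups[key].append(value)

    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined
```

In this sketch the reducer only looks at the tag to decide which side a value came from; whether Avro's handling of a union schema is comparably cheap is exactly what I am asking about.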
Thanks,
Andrew
> ________________________________
> From: Scott Carey <scottcarey@apache.org>
> To: "user@avro.apache.org" <user@avro.apache.org>; Andrew Kenworthy <adwkenworthy@yahoo.com>
> Sent: Wednesday, December 7, 2011 7:40 PM
> Subject: Re: Reduce-side joins in Avro M/R
>
> This should be conceptually the same as a normal map-reduce join of the same type.
> Avro handles the serialization, but not the map-reduce algorithm or strategy.
>
> On 12/6/11 8:43 AM, "Andrew Kenworthy" <adwkenworthy@yahoo.com> wrote:
>
> > Hi,
> >
> > I'd like to use reduce-side joins in an avro M/R job, and am not sure how to do
> > it: are there any best-practice tips or outlines of what one would have to
> > implement in order to make this possible?
> >
> > Thanks,
> >
> > Andrew Kenworthy