List: avro-user
Subject: Re: Mapreduce Strings from reader, when Avro is clearly Utf8
From: Marshall Bockrath-Vandegrift <llasram () gmail ! com>
Date: 2013-08-27 21:11:36
Message-ID: 878uzmsut3.fsf () zeno ! atl ! damballa
Anna Lahoud <annalahoud@gmail.com> writes:
> I am experiencing a problem and I found that another user wrote in
> about this same issue in March 2013 but there were no replies to his
> question. I am really hoping that there is someone who can explain
> this or offer suggestions. I cut and paste his message in since I
> could only find it in an archive.
>
> I have Avro files that clearly contain Utf8 and if I run
> non-mapreduce, I get Utf8 out. However, with the same files, I get
> String objects back from the mapper. Help!?!?!

There are some confusing differences between the now-named "data models"
used by the `mapred` vs `mapreduce` APIs.

The Generic{Data,Datum{Reader,Writer}} and Specific implementations
generate `Utf8` instances by default. The Reflect implementation
generates `String` instances only(?).

In 1.7.4 and earlier: The `mapred` API defaults to using the Specific
implementations (producing `Utf8`s), but may be configured to use the
Reflect implementations via the `...mapred.AvroJob.setReflect()` method.
The `mapreduce` API uses the Reflect implementations and cannot be
configured – and thus always produces `String` instances. So no dice.

In 1.7.5 (and I hope later): Both the APIs allow you to specify the data
model as a sub-class of `GenericData`. For example:

    import org.apache.avro.generic.GenericData;
    import org.apache.avro.mapreduce.AvroJob;
    ....
    AvroJob.setDataModelClass(job, GenericData.class);

Setting the job data model this way should yield the `Utf8` instances
you're hoping for.
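
If you are stuck on 1.7.4 or earlier, one possible workaround (a sketch,
not something Avro provides) is to write the mapper against
`CharSequence`, which both `Utf8` and `String` implement, and to
normalize with `toString()` before comparing values or using them as map
keys. The class and method names below are hypothetical, and a
`StringBuilder` stands in for `Utf8` so the example runs without Avro on
the classpath:

```java
public class CharSequenceFields {

    // Hypothetical helper: normalize a decoded string field to a
    // java.lang.String, whichever CharSequence type the data model produced.
    public static String asString(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof CharSequence) {
            return value.toString();
        }
        throw new IllegalArgumentException(
            "not a string-like value: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        // What a Reflect-based reader would hand the mapper.
        CharSequence fromReflect = "hello";
        // Assumption: StringBuilder is only a stand-in for Utf8 here.
        CharSequence fromGeneric = new StringBuilder("hello");

        // equals() across different CharSequence types is false, so
        // comparing the raw values directly is unreliable.
        System.out.println(fromReflect.equals(fromGeneric));

        // After normalizing, the two values compare as equal Strings.
        System.out.println(asString(fromReflect).equals(asString(fromGeneric)));
    }
}
```

The same caution applies to map lookups: a `Utf8` key and a `String` key
with the same characters are not equal to each other, so normalize
before calling `get()` or `containsKey()`.
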
HTH,
-Marshall