List:       avro-user
Subject:    Re: AvroKeyValueInputFormat/AvroKeyValueOutputFormat vs AvroSequenceFileInputFormat/AvroSequenceFile
From:       Martin Kleppmann <mkleppmann () linkedin ! com>
Date:       2014-05-23 8:24:07
Message-ID: 206685CE-B5E8-4D46-A9DE-6833028E49BE () linkedin ! com

In general, you're probably better off with
AvroKeyValueInputFormat/AvroKeyValueOutputFormat, since the output format writes
Avro data files, which you can read from other applications and other languages.
Hadoop sequence files aren't really supported by anything other than Hadoop.

If your data remains entirely within Hadoop, there are cases where you might want to
use sequence files. For example, they might be used for the transient files generated
during the shuffle (output of mappers being fed into reducers).
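
As a rough, untested sketch of how small the difference is in job setup: assuming the
org.apache.avro.mapreduce classes from avro-mapred and the Hadoop 2 Job API are on the
classpath, only the output format class changes. The class name, the `portable` flag,
the job name and the string/long schemas below are made up for illustration.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyValueOutputFormat;
import org.apache.avro.mapreduce.AvroSequenceFileOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class OutputFormatChoice {

    // Hypothetical helper: 'portable' is an illustrative flag, not part of any API.
    static Job configure(Configuration conf, boolean portable) throws IOException {
        Job job = Job.getInstance(conf, "avro-output-format-sketch");

        // Both formats work with AvroKey/AvroValue, so the schemas are
        // declared the same way regardless of which one is chosen.
        // The string/long schemas are placeholders for real record schemas.
        AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING));
        AvroJob.setOutputValueSchema(job, Schema.create(Schema.Type.LONG));

        if (portable) {
            // Writes Avro data files, readable outside Hadoop.
            job.setOutputFormatClass(AvroKeyValueOutputFormat.class);
        } else {
            // Writes Hadoop sequence files containing Avro-serialized data.
            job.setOutputFormatClass(AvroSequenceFileOutputFormat.class);
        }
        return job;
    }
}
```

The matching input side would be AvroKeyValueInputFormat or AvroSequenceFileInputFormat,
set with job.setInputFormatClass(...) in the job that reads the files back.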

Martin

On 20 May 2014, at 16:34, Jim Donofrio <donofrio111@gmail.com> wrote:
> What are the pros and cons of AvroKeyValueInputFormat/AvroKeyValueOutputFormat vs
> AvroSequenceFileInputFormat/AvroSequenceFileOutputFormat? Which is more commonly
> used?
> They both use AvroKey, AvroValue. The only difference seems to be that one serializes
> into Avro data files and the other into Hadoop sequence files.
> Thanks

