List:       hadoop-user
Subject:    Re: splitting of big files?
From:       Doug Cutting <cutting@apache.org>
Date:       2008-05-29 17:20:18
Message-ID: 483EE5D2.7090008@apache.org

Erik Paulson wrote:
> When reading from HDFS, how big are the network read requests, and what
> controls that? Or, more concretely, if I store files using 64 MB blocks
> in HDFS and run the simple word count example, and I get the default of
> one FileSplit/Map task per 64 MB block, how many bytes into the second
> 64 MB block will a mapper read before it first passes a buffer up to the
> record reader to see if it has found an end-of-line?

This is controlled by io.file.buffer.size, which is 4k by default.
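If you need larger reads, you can raise it per job through the Configuration 
API before the file is opened. A minimal sketch (the class name and file path 
below are just for illustration; only the io.file.buffer.size property itself 
comes from Hadoop):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BufferSizeExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default is 4096 bytes; raise it to 64 KB for larger sequential reads.
        conf.setInt("io.file.buffer.size", 64 * 1024);

        FileSystem fs = FileSystem.get(conf);
        // open(Path) picks up io.file.buffer.size from the configuration.
        FSDataInputStream in = fs.open(new Path(args[0]));
        byte[] buf = new byte[4096];
        int n = in.read(buf);
        System.out.println("read " + n + " bytes");
        in.close();
      }
    }

FileSystem#open(Path, int) also accepts an explicit buffer size if you only 
want to override it for a single stream.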

Doug