
List:       hadoop-dev
Subject:    [jira] Updated: (HADOOP-331) map outputs should be written to a
From:       "Devaraj Das (JIRA)" <jira () apache ! org>
Date:       2006-10-31 18:18:22
Message-ID: 25326315.1162318702278.JavaMail.root () brutus

     [ http://issues.apache.org/jira/browse/HADOOP-331?page=all ]

Devaraj Das updated HADOOP-331:
-------------------------------

    Attachment: 331-design.txt

Attaching a revised design spec.

> map outputs should be written to a single output file with an index
> -------------------------------------------------------------------
> 
> Key: HADOOP-331
> URL: http://issues.apache.org/jira/browse/HADOOP-331
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.3.2
> Reporter: eric baldeschwieler
> Assigned To: Devaraj Das
> Attachments: 331-design.txt, 331.txt
> 
> 
> The current strategy of writing a file per target map is consuming a lot of unused \
> buffer space (causing out-of-memory crashes) and puts a lot of burden on the FS \
> (many opens, inodes used, etc.). I propose that we write a single file containing \
> all output and also write an index file identifying which byte range in the file goes \
> to each reduce. This will remove the issue of buffer waste, address scaling issues \
> with the number of open files, and generally set us up better for scaling. It will \
> also have advantages with very small inputs, since the buffer cache will reduce the \
> number of seeks needed, and the data-serving node can open a single file and just \
> keep it open rather than needing to do directory and open ops on every request. The \
> only issue I see is that in cases where the task output is substantially larger \
> than its input, we may need to spill multiple times. In this case, we can do a \
> merge after all spills are complete (or during the final spill).
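
The proposed layout (one data file plus an index of byte ranges per reduce, with multiple spills merged at the end) can be sketched as follows. This is a minimal illustration under stated assumptions, not Hadoop's actual implementation: the index format (one fixed-size big-endian (offset, length) record per reduce) and the function names `write_map_output`, `read_partition` and `merge_spills` are hypothetical. The merge step simply concatenates each reduce's segments across spills; a real framework may instead need to merge records in key order.

```python
import io
import struct

# Hypothetical index format: one record per reduce partition holding
# (offset, length) as two big-endian signed 64-bit integers (16 bytes).
INDEX_RECORD = struct.Struct(">qq")


def write_map_output(partitions, data_out, index_out):
    """Write all partitions back to back into one data stream and append
    an (offset, length) index record for each one, in reduce order."""
    offset = 0
    for segment in partitions:  # partitions[r] holds the bytes for reduce r
        data_out.write(segment)
        index_out.write(INDEX_RECORD.pack(offset, len(segment)))
        offset += len(segment)


def read_partition(reduce_id, data_in, index_in):
    """Serve one reduce: fetch its fixed-size index record, then seek to
    and read exactly that byte range from the single data stream."""
    index_in.seek(reduce_id * INDEX_RECORD.size)
    offset, length = INDEX_RECORD.unpack(index_in.read(INDEX_RECORD.size))
    data_in.seek(offset)
    return data_in.read(length)


def merge_spills(spills, num_reduces, data_out, index_out):
    """Combine several (data, index) spill pairs into one data stream and
    one index. Simplification: each reduce's segments are concatenated in
    spill order rather than merged record by record."""
    offset = 0
    for r in range(num_reduces):
        start = offset
        for data_in, index_in in spills:
            segment = read_partition(r, data_in, index_in)
            data_out.write(segment)
            offset += len(segment)
        index_out.write(INDEX_RECORD.pack(start, offset - start))


if __name__ == "__main__":
    # Illustrative usage with in-memory streams standing in for files.
    data, index = io.BytesIO(), io.BytesIO()
    write_map_output([b"aa", b"", b"ccc"], data, index)
    print(read_partition(2, data, index))
```

Because every index record has the same size, the serving node can locate reduce r's byte range with a single seek to r × 16 in the index file and one seek in the data file, with no per-reduce directory lookups or file opens.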

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: \
                http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
