
List:       hadoop-user
Subject:    Re: Doubt in reducer
From:       Vladimir Klimontovich <klimontovich () gmail ! com>
Date:       2009-08-27 15:12:06
Message-ID: A01B01DA-6313-4140-BB14-285FEE0280FD () gmail ! com

But the reducer can do some preparation while the maps are still running:
map output can be distributed across the nodes that will work as reducers.

Copying and sorting map output is also a time-consuming process (maybe
more time-consuming than the reduce itself). For example, a piece of a job
run log on a 40-node cluster could look like this:

09/08/27 11:08:24 INFO job.JobRunningListener:  map 36% reduce 10%
09/08/27 11:08:28 INFO job.JobRunningListener:  map 37% reduce 10%
09/08/27 11:08:29 INFO job.JobRunningListener:  map 37% reduce 11%

But if you run the job on a single-node cluster, reduce will start only
after the map has finished.
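
To make the "reduce 11%" in a log like that less surprising, here is a rough sketch of how I understand the reduce percentage to be reported: a reduce task's progress is split into three equal phases (copy, sort, reduce), so the number can grow during the map phase even though reduce() itself has not been called yet. The class and method names are made up for illustration, and the 33% copied figure is an assumed value, not taken from a real job.

```java
public class ReduceProgressSketch {

    // Hypothetical helper (not a Hadoop API): combines the three reduce-side
    // phases into one reported figure, assuming each phase counts for one third.
    // Every argument is a fraction between 0.0 and 1.0.
    static double reportedReduceProgress(double copied, double sorted, double reduced) {
        return (copied + sorted + reduced) / 3.0;
    }

    public static void main(String[] args) {
        // Assumed state: a third of the map output is already copied to the
        // reducers, nothing is sorted yet, and reduce() has not started.
        double copied = 0.33;
        double progress = reportedReduceProgress(copied, 0.0, 0.0);
        System.out.printf("reduce %d%%%n", Math.round(progress * 100));
    }
}
```

Under that assumption, anything below about 33% in the reduce column means only copying is going on, which fits with reduce() having to wait for all the map output.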

On Aug 27, 2009, at 4:31 PM, Harish Mallipeddi wrote:

> On Thu, Aug 27, 2009 at 5:22 PM, Rakhi Khatwani  
> <rkhatwani@gmail.com> wrote:
>
>>
>> but I want my reduce to run, that is, if 25% of the map is done, then I
>> want the reduce to save that much data. Even if the 2nd map fails, I
>> don't lose data.
>> any pointers?
>> Regards,
>> Raakhi
>>
>
> What you're asking for will break the semantics of reduce(). Reduce can
> only proceed after receiving all the map-outputs.
>
> -- 
> Harish Mallipeddi
> http://blog.poundbang.in

---
Vladimir Klimontovich,
skype: klimontovich
GoogleTalk/Jabber: klimontovich@gmail.com
Cell phone: +7926 890 2349
