
List:       hadoop-user
Subject:    Re: Setting number of mappers according to number of TextInput lines
From:       Sachin Aggarwal <different.sachin () gmail ! com>
Date:       2012-06-21 5:17:42
Message-ID: CABaTa1SnM8Vw6md3Y2LtuG7TyYdGsqJf8wjUfehzReD0r4bgBQ () mail ! gmail ! com


Use it like this:

    FileInputFormat.setMaxInputSplitSize(job, 2097152);
    FileInputFormat.setMinInputSplitSize(job, 1048576);

The sizes are in bytes. Alternatively, you can write your own split function; search online for examples.
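
For reference, here is a minimal sketch of how the two settings interact. It assumes the split-size rule used by the new-API FileInputFormat, where the split size is max(minSize, min(maxSize, blockSize)). The class name SplitSizeSketch, the helper estimateSplits, and the 64 MB block size are illustrative assumptions, not part of Hadoop. The code is plain Java and does not call Hadoop itself.

```java
public class SplitSizeSketch {

    // Clamp the block size between the configured minimum and maximum,
    // following the rule described above (an assumption of this sketch).
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough number of splits for a file: ceiling of fileSize / splitSize.
    // Ignores the small slack Hadoop allows on the final split.
    public static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed HDFS block size
        long minSize = 1048576L;            // value passed to setMinInputSplitSize
        long maxSize = 2097152L;            // value passed to setMaxInputSplitSize

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        System.out.println("split size in bytes: " + splitSize);
        System.out.println("splits for a 50 kB file: " + estimateSplits(50L * 1024, splitSize));
        System.out.println("splits for a 10 MB file: " + estimateSplits(10L * 1024 * 1024, splitSize));
    }
}
```

Under these assumptions, a file of a few kB still ends up in a single split with the values above, so for very small inputs the maximum would probably have to be set much lower.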

On Sun, Jun 17, 2012 at 1:05 PM, Ondřej Klimpera <klimpond@fit.cvut.cz> wrote:

> Hi, I made some progress: the combination of NLineInputFormat and
> mapred.max.split.size seems to work, but it is hard to set the byte
> value exactly. Input lines range from approximately 64 to 1024 bytes.
>
> What I need is to have as many mappers as possible (to use the full potential
> of the cluster), where each receives N input lines.
>
>
>
> On 06/17/2012 05:02 AM, Harsh J wrote:
>
>> Ondřej,
>>
>> While NLineInputFormat will indeed give you N lines per task, it does
>> not guarantee that the N map tasks that come out for a file from it
>> will all be sent to different nodes. Which exactly do you need:
>> simply having N lines per map task, or N more widely distributed maps?
>>
>> On Sat, Jun 16, 2012 at 3:01 PM, Ondřej Klimpera <klimpond@fit.cvut.cz>
>> wrote:
>>
>>> I tried this approach, but the job is not distributed among 10 mapper
>>> nodes.
>>> It seems Hadoop ignores this property :(
>>>
>>> My first thought is that the small file size is the problem and Hadoop
>>> doesn't bother splitting it properly.
>>>
>>> Thanks for any ideas.
>>>
>>>
>>>
>>> On 06/16/2012 11:27 AM, Bejoy KS wrote:
>>>
>>>> Hi Ondrej
>>>>
>>>> You can use NLineInputFormat with n set to 10.
>>>>
>>>> ------Original Message------
>>>> From: Ondřej Klimpera
>>>> To: common-user@hadoop.apache.org
>>>> ReplyTo: common-user@hadoop.apache.org
>>>> Subject: Setting number of mappers according to number of TextInput
>>>> lines
>>>> Sent: Jun 16, 2012 14:31
>>>>
>>>> Hello,
>>>>
>>>> I have a very small input (on the order of kB), but processing it to
>>>> produce output takes several minutes. Is there a way to say: the file has
>>>> 100 lines, I need 10 mappers, and each mapper node has to process 10 lines
>>>> of the input file?
>>>>
>>>> Thanks for any advice.
>>>> Ondrej Klimpera
>>>>
>>>>
>>>> Regards
>>>> Bejoy KS
>>>>
>>>> Sent from handheld, please excuse typos.
>>>>
>>>>
>>
>>
>


-- 

Thanks & Regards

Sachin Aggarwal
7760502772

