
List:       hadoop-user
Subject:    Re: Threads per MapReduce job
From:       "Aaron Kimball" <aaron () cloudera ! com>
Date:       2008-12-31 10:11:12
Message-ID: d6d7c4410812310211i169107e1wd2044214f4e8f460 () mail ! gmail ! com


Michael,

Those two parameters control the number of concurrent tasks per node (they
actually run as separate child processes rather than threads). In your setup,
each tasktracker will execute at most one map task and one reduce task at any
given point in time. However, the job may still be split into a much larger
number of distinct work units, a.k.a. tasks, and that number is controlled by
the size of the dataset: a map task usually corresponds to one HDFS file
chunk -- 64 MB of data by default. You gave the job 17 chunks' worth of input
data, so it enqueued 17 map tasks. Of those, your cluster executed only one
map task per node at any point in time.
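As a back-of-the-envelope check (the ~1.1 GB input size below is an assumption
for illustration; only the 17-task count comes from your job log), the number
of map tasks is roughly the input size divided by the HDFS block size, rounded
up:

```python
import math

BLOCK_SIZE_MB = 64           # default HDFS block size
input_size_mb = 1.05 * 1024  # hypothetical input of roughly 1.1 GB

# Each HDFS chunk becomes (roughly) one map task.
map_tasks = math.ceil(input_size_mb / BLOCK_SIZE_MB)
print(map_tasks)  # -> 17
```

So 17 launched map tasks is consistent with an input a bit over a gigabyte.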

A couple of other notes:
* Don't change hadoop-default.xml directly; the preferred administration
mechanism is to override settings by putting them in hadoop-site.xml.
* If you change these parameters while a cluster is running, I believe you
have to restart the MapReduce daemons on the nodes for the change to take
effect.
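For reference, the override in hadoop-site.xml would look something like this
(a sketch -- adjust the values to whatever per-node concurrency you want):

```xml
<!-- conf/hadoop-site.xml: values here override hadoop-default.xml -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

Then restart the MapReduce daemons, e.g. bin/stop-mapred.sh followed by
bin/start-mapred.sh, assuming you use the standard control scripts shipped
with your release.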

Cheers,
- Aaron

On Fri, Dec 26, 2008 at 9:19 PM, Michael Miceli
<michael.miceli88@gmail.com> wrote:

> Hi everyone:
> How do I control the number of threads per mapreduce job?  I am using
> bin/hadoop jar to run the wordcount job, and even though I have found these
> settings in hadoop-default.xml and changed the values to 1:
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <name>mapred.tasktracker.reduce.tasks.maximum</name>
>
> The output of the job seems to indicate otherwise.
> 08/12/26 18:21:12 INFO mapred.JobClient:   Job Counters
> 08/12/26 18:21:12 INFO mapred.JobClient:     Launched reduce tasks=1
> 08/12/26 18:21:12 INFO mapred.JobClient:     Rack-local map tasks=12
> 08/12/26 18:21:12 INFO mapred.JobClient:     Launched map tasks=17
> 08/12/26 18:21:12 INFO mapred.JobClient:     Data-local map tasks=4
>
> I have 2 servers running the mapreduce process and the datanode process.
> Thanks,
> Michael
>

