List:       hadoop-user
Subject:    RE: Performance Tunning
From:       "GOEKE, MATTHEW (AG/1000)" <matthew.goeke () monsanto ! com>
Date:       2011-06-28 18:48:11
Message-ID: E236CD55EC618B4884CA7273B31AA0780731D4EE () stlwexmbxprd04 ! na ! ds ! monsanto ! com

Mike,

Somewhat of a tangent, but it is actually very informative to hear that you are
getting bound by I/O with a 2:1 core-to-disk ratio. Could you share what you used to
make those calls? We have been using both a local ganglia daemon and the Hadoop
ganglia daemon to get an overall look at the cluster. The items of interest, I would
assume, would be CPU wait I/O as well as the throughput of block operations.
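
For reference, one way to sample the "CPU wait I/O" figure directly on a Linux node is to diff two readings of the aggregate cpu line in /proc/stat. This is only a rough sketch and not how ganglia computes its metric; the function names and the sample counter values are made up for illustration.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer jiffy counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]


def iowait_fraction(before, after):
    """Fraction of elapsed CPU time spent in iowait between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Field order: user nice system idle iowait irq softirq steal [guest ...].
    # Only the first eight are summed, because guest time is already
    # included in user/nice.
    deltas = [y - x for x, y in zip(b[:8], a[:8])]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0


# Hypothetical samples taken a few seconds apart on one node.
sample_1 = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_2 = "cpu  200 0 100 1400 300 0 0 0 0 0"
print(iowait_fraction(sample_1, sample_2))
```

On a live node the two strings would come from reading the first line of /proc/stat twice with a pause in between.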

Obviously the disconnect on my side was that I didn't realize you were dedicating a
physical core per daemon. I am a little surprised that you found that necessary, but
then again, after seeing some of the metrics from my own stress testing, I am noticing
that we might be overextending with our config under heavy loads. Unfortunately I am
working with lower-specced hardware at the moment, so I don't have the headroom to
test that out.

Matt

-----Original Message-----
From: Michael Segel [mailto:michael_segel@hotmail.com] 
Sent: Tuesday, June 28, 2011 1:31 PM
To: common-user@hadoop.apache.org
Subject: RE: Performance Tunning



Matthew,

I understood that Juan was talking about a 2-socket quad-core box. We run boxes with
the E5500 (Xeon quad-core) chips. Linux sees these as 16 cores. Our data nodes have
32GB RAM with 4 x 2TB SATA. It's a pretty basic configuration.

What I was saying was that if you consider 1 core each for the TT (TaskTracker), DN
(DataNode) and RS (HBase RegionServer) daemons, that's 3 out of the 8 physical cores,
leaving you 5 cores or 10 'hyperthread cores'. So you could put up 10 m/r slots on
the machine. Note that for the main daemons (TT, DN, RS) I dedicate a physical core
each.
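
The arithmetic above, next to the per-thread calculation quoted further down the thread, can be sketched roughly as follows. The function names and defaults are illustrative assumptions, and either result is only a starting point for tuning.

```python
def slots_core_reserved(physical_cores, threads_per_core=2, daemons=3):
    """Reserve one whole physical core per daemon (TT, DN, RS),
    then allow one slot per hardware thread on the remaining cores."""
    free_cores = physical_cores - daemons
    if free_cores < 0:
        raise ValueError("more daemons than physical cores")
    return free_cores * threads_per_core


def slots_thread_reserved(physical_cores, threads_per_core=2, daemons=2):
    """Reserve only one hardware thread per daemon (the per-thread
    calculation from the earlier message in this thread)."""
    free_threads = physical_cores * threads_per_core - daemons
    if free_threads < 0:
        raise ValueError("more daemons than hardware threads")
    return free_threads


print(slots_core_reserved(8))    # 8 physical cores, 3 daemons -> 10
print(slots_thread_reserved(4))  # 4 physical cores, 2 daemons -> 6
```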

Of course your mileage may vary if you're doing anything non-standard. A good
starting point is 6 mappers and 4 reducers. And of course YMMV depending on whether
you're using MapR's release or Cloudera's, and whether you're running HBase or
something else on the cluster.
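
As an illustration only, a 6/4 split could be written out as a mapred-site.xml fragment using the two per-node slot properties quoted later in this thread. The helper name below is hypothetical, and the output is a sketch to adapt, not a recommended config.

```python
import xml.etree.ElementTree as ET


def tasktracker_slot_config(map_slots, reduce_slots):
    """Render the two per-node slot limits as a mapred-site.xml fragment."""
    root = ET.Element("configuration")
    settings = {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


xml_text = tasktracker_slot_config(map_slots=6, reduce_slots=4)
print(xml_text)
```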

From our experience... we end up getting disk I/O bound first, and then network or
memory becomes the next constraint. The Xeon chipsets really are good.

HTH

-Mike


> From: matthew.goeke@monsanto.com
> To: common-user@hadoop.apache.org
> Subject: RE: Performance Tunning
> Date: Tue, 28 Jun 2011 14:46:40 +0000
> 
> Mike,
> 
> I'm not really sure I have seen a community consensus around how to handle
> hyper-threading within Hadoop (although I have seen quite a few articles that
> discuss it). I was assuming that when Juan mentioned they were 4-core boxes that he
> meant 4 physical cores and not HT cores. I was more stating that the starting point
> should be 1 slot per thread (or hyper-threaded core) but obviously reviewing the
> results from ganglia, or any other monitoring solution, will help you come up with
> a more concrete configuration based on the load.
> 
> My brain might not be working this morning but how did you get the 10 slots again?
> That seems low for an 8 physical core box but somewhat overextending for a 4
> physical core box.
> 
> Matt
> 
> -----Original Message-----
> From: im_gumby@hotmail.com [mailto:im_gumby@hotmail.com] On Behalf Of Michel Segel
> Sent: Tuesday, June 28, 2011 7:39 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Performance Tunning
> 
> Matt,
> You have 2 threads per core, so your Linux box thinks an 8-core box has 16 cores. In
> my calcs, I tend to take a whole core each for TT, DN and RS and then a thread per
> slot, so you end up with 10 slots per node. Of course memory is also a factor.
> 
> Note this is only a starting point. You can always tune up.
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel
> 
> On Jun 27, 2011, at 11:11 PM, "GOEKE, MATTHEW (AG/1000)" <matthew.goeke@monsanto.com> wrote:
> 
> > Per node: 4 cores * 2 processes = 8 slots
> > Datanode: 1 slot
> > Tasktracker: 1 slot
> > 
> > Therefore max of 6 slots between mappers and reducers.
> > 
> > Below is part of our mapred-site.xml. The thing to keep in mind is that the number
> > of maps is defined by the number of input splits (which is defined by your data),
> > so you only need to worry about setting the maximum number of concurrent processes
> > per node. In this case the properties you want to home in on are
> > mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.
> > Keep in mind there are a LOT of other tuning improvements that can be made, but
> > they require a strong understanding of your job load.
> > 
> > <configuration>
> > <property>
> > <name>mapred.tasktracker.map.tasks.maximum</name>
> > <value>2</value>
> > </property>
> > 
> > <property>
> > <name>mapred.tasktracker.reduce.tasks.maximum</name>
> > <value>1</value>
> > </property>
> > 
> > <property>
> > <name>mapred.child.java.opts</name>
> > <value>-Xmx512m</value>
> > </property>
> > 
> > <property>
> > <name>mapred.compress.map.output</name>
> > <value>true</value>
> > </property>
> > 
> > <property>
> > <name>mapred.output.compress</name>
> > <value>true</value>
> > </property>
> > 
> > </configuration>
 		 	   		  

