List:       hadoop-user
Subject:    RE: question on HDFS block distribution
From:       "Hairong Kuang" <hairong@yahoo-inc.com>
Date:       2007-05-23 0:57:45
Message-ID: 007901c79cd5$60356e70$d09215ac@ds.corp.yahoo.com

This is done on purpose to improve write performance. In practice, we run
map/reduce jobs on the cluster, so every node in the cluster gets an equal
chance of writing. A single-node data upload, as described in your email, is
normally carried out from an off-cluster node. So an imbalanced data
distribution should not be a problem.
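For illustration, here is a minimal sketch of the off-cluster upload
described above, using the FileSystem API (the namenode hostname and the
paths are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OffClusterUpload {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical namenode address; points the client at the cluster.
            conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
            FileSystem fs = FileSystem.get(conf);
            // Run this from a machine that is not a datanode: with no local
            // datanode to favor, block replicas land on datanodes chosen by
            // the namenode instead of piling up on the writer's node.
            fs.copyFromLocalFile(new Path("/local/data/input.txt"),
                                 new Path("/user/data/input.txt"));
            fs.close();
        }
    }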

Hairong

-----Original Message-----
From: moonwatcher32329@yahoo.com [mailto:moonwatcher32329@yahoo.com] 
Sent: Tuesday, May 22, 2007 4:18 PM
To: hadoop-user@lucene.apache.org
Subject: question on HDFS block distribution


  hi guys, when a file is being copied to HDFS, it seems that HDFS always
writes the first copy of each block to the data node running on the machine
that invoked the copy, and the data nodes for the replicas are selected
evenly from the remaining data nodes. so, for example, on a 5-node cluster
with the replication factor set to 2, if i copy an N-byte file from node 1,
then node 1 will use up N bytes and nodes 2, 3, 4, 5 will use up N/4 bytes
each.
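  A quick way to see the arithmetic: a small sketch (hypothetical, not HDFS
code) that simulates the placement just described:

    public class PlacementSketch {
        public static void main(String[] args) {
            final int NODES = 5;    // cluster size
            final int BLOCKS = 20;  // the N-byte file split into 20 blocks
            long[] used = new long[NODES];
            for (int b = 0; b < BLOCKS; b++) {
                used[0]++;                    // first copy on the writing node
                used[1 + b % (NODES - 1)]++;  // replica on one of the others
            }
            // node 1 holds 20 blocks (N bytes); nodes 2-5 hold 5 each (N/4);
            // an even spread would be 2*20/5 = 8 blocks (2*N/5 bytes) apiece.
            for (int i = 0; i < NODES; i++)
                System.out.println("node " + (i + 1) + ": " + used[i] + " blocks");
        }
    }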
  is this a known issue, or is there any way to configure HDFS so that the
blocks are distributed evenly (so with each node using up 2*N/5 bytes in
this case)?
  thanks,