List:       hadoop-user
Subject:    Re: Is SAN storage is a good option for Hadoop ?
From:       Steve Loughran <stevel () apache ! org>
Date:       2011-09-29 13:06:59
Message-ID: 4E846D73.2010808 () apache ! org

On 29/09/11 13:28, Brian Bockelman wrote:
> 
> On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote:
> 
> > Hi,
> > 
> > I want to know whether we can use SAN storage for a Hadoop cluster setup.
> > If yes, what would be the best practices?
> > 
> > Is it a good approach, considering that "the underlying power of Hadoop
> > is co-locating the processing power (CPU) with the data storage and thus it
> > must be local storage to be effective"?
> > *But also, is it still right to say “local is better” in the situation where I
> > have a single local 5400 RPM IDE drive, which would be dramatically slower
> > than SAN storage striped across many drives spinning at 10k RPM and
> > accessed via Fibre Channel?*
> 
> Hi Praveenesh,
> 
> Two things:
> 1) If the option is a single 5400 RPM IDE drive (you can still buy those?) versus
> a high-end SAN, the high-end SAN is going to win. That's often a false comparison:
> the question is often "What can I buy for $50k?". In that case (setting aside
> organizational politics), you can buy more spindles in the "traditional" Hadoop
> setup than for the SAN.
> - Also, if you're latency limited, you're likely working against yourself. The
> best thing I ever did for my organization was make our software work just as well
> with 100ms latency as with 1ms latency.
> 2) As Paul pointed out, you have to ask yourself whether the SAN is shared or
> dedicated. Many SANs don't have the ability to strongly partition workloads
> between users.
> 
> Brian
> 

One more: a SAN is a SPOF. [Gray05] includes the impact of a SAN outage on
MS TerraServer, while [Jiang08] provides evidence that entry-level
Fibre Channel storage is less reliable than SATA due to interconnects.

Anyone who criticises the NameNode for being a SPOF and relies on a SAN 
instead is missing something obvious.
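
To put rough numbers on that, here is a minimal sketch of the arithmetic,
assuming the NameNode and the SAN fail independently and the cluster needs
both to be up. The availability figures are made-up placeholders, not
measurements from [Gray05] or [Jiang08], and series_availability is a
hypothetical helper name.

```python
def series_availability(*components):
    """Availability of a system that needs every component up at once.

    Assumes failures are independent, so the availabilities multiply.
    """
    total = 1.0
    for availability in components:
        total *= availability
    return total


# Hypothetical availabilities, purely illustrative (not measured values).
namenode = 0.999
san = 0.999

combined = series_availability(namenode, san)
print(f"NameNode alone: {namenode:.6f}")
print(f"NameNode + SAN: {combined:.6f}")
```

Whatever figures you plug in, putting a second must-be-up component behind
the first can only lower the combined availability, never raise it.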

[Gray05] Empirical Measurements of Disk Failure Rates and Error Rates
[Jiang08] Are disks the dominant contributor for storage failures?

