List: hadoop-user
Subject: Re: Is SAN storage is a good option for Hadoop ?
From: Steve Loughran <stevel () apache ! org>
Date: 2011-09-29 13:06:59
Message-ID: 4E846D73.2010808 () apache ! org
On 29/09/11 13:28, Brian Bockelman wrote:
>
> On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote:
>
> > Hi,
> >
> > I want to know whether we can use SAN storage for a Hadoop cluster setup.
> > If yes, what are the best practices?
> >
> > Is it a good approach, considering that "the underlying power of Hadoop
> > is co-locating the processing power (CPU) with the data storage and thus it
> > must be local storage to be effective"?
> > *But also, is it better to say “local is better” in the situation where I
> > have a single local 5400 RPM IDE drive, which would be dramatically slower
> > than SAN storage striped across many drives spinning at 10k RPM and
> > accessed via fiber channel ?*
>
> Hi Praveenesh,
>
> Two things:
> 1) If the option is a single 5400 RPM IDE drive (you can still buy those?) versus
> a high-end SAN, the high-end SAN is going to win. That's often a false comparison:
> the question is often "What can I buy for $50k?". In that case (setting aside
> organizational politics), you can buy more spindles in the "traditional" Hadoop
> setup than for the SAN.
> - Also, if you're latency limited, you're likely working against yourself. The
> best thing I ever did for my organization was make our software work just as well
> with 100ms latency as with 1ms latency.
> 2) As Paul pointed out, you have to ask yourself whether the SAN is shared or
> dedicated. Many SANs don't have the ability to strongly partition workloads
> between users.
> Brian
>
One more: SAN is a SPOF. [Gray05] includes the impact of a SAN outage on
MS TerraServer, while [Jiang08] provides evidence that entry level
FibreChannel storage is less reliable than SATA due to interconnects.
Anyone who criticises the NameNode for being a SPOF and relies on a SAN
instead is missing something obvious.

[Gray05] "Empirical Measurements of Disk Failure Rates and Error Rates"
[Jiang08] "Are Disks the Dominant Contributor for Storage Failures?"
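
A rough sketch of the SPOF arithmetic, for illustration only. The downtime fractions below are made-up assumptions, not measurements from [Gray05] or [Jiang08], and the function names are hypothetical. It also assumes DataNode failures are independent, which a shared rack switch or power feed would break. It compares a block held on three local-disk replicas against a block that lives behind one shared SAN.

```python
# All figures are hypothetical and chosen only to show the shape of the argument.

def replicated_block_unavailability(node_unavailability: float, replicas: int) -> float:
    """Probability that every replica of a block is offline at the same time,
    assuming node failures are independent (an optimistic simplification)."""
    return node_unavailability ** replicas


def san_backed_unavailability(san_unavailability: float) -> float:
    """With a single shared SAN, every block is offline whenever the SAN is."""
    return san_unavailability


node_down = 0.01   # assumed: each DataNode is offline 1% of the time
san_down = 0.001   # assumed: the SAN is offline 0.1% of the time
replicas = 3       # HDFS default replication factor

local = replicated_block_unavailability(node_down, replicas)
san = san_backed_unavailability(san_down)

print(f"3x replicated local disks: block unavailable {local:.1e} of the time")
print(f"single shared SAN:         block unavailable {san:.1e} of the time")
```

Even when the SAN is assumed to be ten times more available than any individual node, the replicated layout comes out ahead under these assumptions, because losing a block requires three simultaneous failures rather than one.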