
List:       zfs-discuss
Subject:    Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA
From:       Jeff Bacon <bacon () walleyesoftware ! com>
Date:       2012-04-09 13:23:18
Message-ID: B5AF220754B7E141B9DFAE8153874440966C4E () MN-MAILSTORE1 ! walleyetrading ! net

> Out of curiosity, are there any third-party hardware vendors
> that make server/storage chassis (Supermicro et al) who make
> SATA backplanes with the SAS interposers soldered on?

There doesn't seem to be much out there, though I haven't
looked very hard.

> Would that make sense, or be cheaper/more reliable than
> having extra junk between the disk and backplane connectors?
> (if I correctly understand what the talk is about? ;)

Honestly, probably not. Either the expander chip properly handles
SATA-tunneling, or it doesn't. Shoving an interposer in just 
throws a band-aid on the problem IMO. 
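
If you want to see what the HBA actually thinks it's talking to,
LSI's sas2ircu tool will tell you on the SAS2 boards (the SAS1-era
1068s have their own utilities); the DISPLAY output shows whether
each attached drive is SAS or SATA-behind-STP. Roughly:

    # list the LSI SAS2 controllers the tool can see
    sas2ircu LIST

    # dump controller 0: each attached device shows a Protocol
    # field (SAS vs SATA) plus enclosure/slot, model and firmware
    sas2ircu 0 DISPLAY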

> ZFS was very attractive at first because of the claim that
> "it returns Inexpensive into raId" and can do miracles
> with SATA disks. Reality has shown to many of us that
> many SATA implementations existing in the wild should
> be avoided... so we're back to good vendors' higher end
> expensive SATAs or better yet SAS drives. Not inexpensive
> anymore again :(

Many != all. Not that I've tried a whole bunch of them, mind
you. However, I've found all of the SuperMicro SAS1 backplanes
to be somewhat problematic with SATA drives, especially if
you use the 1068-based controllers. It was horrible with the
really old single-digit-Phase firmware. I find it...
acceptable... with 2008-based controllers.
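
If you're not sure what phase a controller is actually running,
sas2flash will tell you for the 2008-family boards; the 1068s use
the older sasflash/lsiutil tools. Something like:

    # firmware and BIOS versions for every LSI SAS2 HBA it finds
    sas2flash -listall

    # more detail on a single controller (0 = first one listed)
    sas2flash -c 0 -list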

I've finally settled on having one box based on 1068s (3081s),
and I think it's up to 4 or 5 expanders of 16 1TB drives each.
Basically, the box hangs when certain drives die in certain
ways - it eventually gets over it, mostly, but it can hang
for a bit. I might see the occasional "hang until you yank
the bad disk", but drives don't die THAT often - even
3yr-old-seagate-cudas. Granted, almost all of the firmware on
the 333ASes has finally been updated to the CC1H version.
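
Checking which drives are still on old firmware is easy enough
with smartmontools, and FMA/ZFS will tell you when a disk is
misbehaving. A sketch - the device name is made up, and SATA
drives behind a SAS HBA usually need the "-d sat" hint:

    # model, serial and firmware revision of one drive
    smartctl -i -d sat /dev/rdsk/c5t12d0s0

    # anything the fault manager has flagged, and any pool
    # that isn't currently healthy
    fmadm faulty
    zpool status -x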

I might note that that box represents most of the collection
of 1068+SAS1-based expanders that I have. It's an archival
system that doesn't do much at all (well, the CPU pounds
like hell running rtgpoll but that's a different matter
having nothing to do with the ZFS pools). I also have a
small pile of leftover 3081s and 3041s if anyone's
interested. :)

Now, I suspect that there is improved LSI firmware available
for the SAS1 expander chips. I could go chasing after it -
SMC doesn't have it public, but LSI probably has it somewhere
and I know an expert I could ask to go through and tweak my
controllers. However, I hadn't met him 3 years ago, and 
now it just isn't worth my time (or worth paying him to do it). 

On the other hand, I have two-hands-worth of CSE847-E26-RJBOD1s
stuffed with 'cuda 2T and 3T SATA drives, connected to 9211-8es
running the phase-10 firmware. One box is up to 170TB worth.
It's fine. Nary an issue. Granted, I'm not beating the arrays
to death - again, that's not what it's for, it's there to hang
onto a bunch of data. But it does get used, and I'm writing
200-300GB/day to it. I have another such JBOD attached to 
a box with a pile of constellations, and it causes no issues.
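
If you're putting together something similar, the layout doesn't
need to be exotic - a few raidz2 vdevs per shelf plus a spare or
two is plenty. A sketch, with invented device names:

    # one pool: two 8-wide raidz2 vdevs plus a hot spare
    zpool create archive \
        raidz2 c6t0d0 c6t1d0 c6t2d0 c6t3d0 \
               c6t4d0 c6t5d0 c6t6d0 c6t7d0 \
        raidz2 c6t8d0 c6t9d0 c6t10d0 c6t11d0 \
               c6t12d0 c6t13d0 c6t14d0 c6t15d0 \
        spare c6t16d0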

Frankly, I would say that yes, ZFS _does_ do miracles with
inexpensive disks. I can trust 100s of TB to it and not
worry about losing any of it. But it wasn't written by Jesus;
it can't turn water into wine, or turn all of the terrible
variations of Crap out there into enterprise-level replacements
for your EMC arrays. Nor can it cope with having Any Old
Random Crap you have lying around thrown at it - but it does
surprisingly well, IMO. My home box is actually just a bunch
of random drives tied onto 3 3041 controllers on an old
overclocked Q6600 on an ASUS board, and it's never had
a problem, not ever.
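
The boring mechanics behind that trust: end-to-end checksums plus
redundancy, and a scrub on a schedule so latent errors get found
and repaired before a second disk dies. Something like:

    # walk every block in the pool and repair anything whose
    # checksum doesn't match, using the redundant copies
    zpool scrub archive

    # watch progress, plus per-device read/write/checksum errors
    zpool status -v archive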


> So, is there really a fundamental requirement to avoid
> cheap hardware, and are there no good ways to work around
> its inherently higher instability and lack of dependability?
> 
> Or is it just a harder goal (indefinitely far away on the
> project roadmap)?

ZFS is no replacement for doing your research, homework,
and testing. God only knows I've gone through some crap -
I bought a 20pk of WD 2TB Blacks that turned out to work
for $*%$&. I suppose with enough patience and effort I
could have made them work, but Seagate's firmware has
simply been more reliable, and the contents of that box
have filtered their way into desktops. (Some of those
drives are in the home machine mentioned above - attached
directly to the controller, no problems at all.)
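
The homework part doesn't take much: a SMART long self-test and a
full destructive write/read pass will weed out a surprising number
of duds before they ever see a pool. A rough sketch - device name
made up, and the dd pass destroys whatever is on the disk:

    # drive's own long self-test, then check the results later
    smartctl -t long -d sat /dev/rdsk/c7t3d0s0
    smartctl -l selftest -d sat /dev/rdsk/c7t3d0s0

    # full-surface write (DESTROYS the disk contents), then read back
    dd if=/dev/zero of=/dev/rdsk/c7t3d0s0 bs=1024k
    dd if=/dev/rdsk/c7t3d0s0 of=/dev/null bs=1024k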

If you're going to do BYO for your enterprise needs, be
prepared to fork over the additional cash you'll need to
spend on test kit - defined both as kit to test on, and kit
you buy, test, and pitch because "it don't work like Vendor
said" or "A don't play nice with B". That's often still much
cheaper than fighting with Vendor A and Vendor B about why
they can't work together - not to mention the R&D time.

I don't avoid cheap hardware. Any number of my colleagues
would say I'm insane for running, on a handful of Solaris 10
fileservers built off raw SuperMicro boxes, what they insist
should be running on EMC or NetApp. But we can beat the
living * out of these boxes and they work just fine.

But there's cheap, and Cheap. ZFS can perform miracles,
but only so many - at least not without work. Unfortunately,
the work required to make it cope with Really Really Cheap
S*** is probably somewhat orthogonal to the desires of
paying users, for the perhaps obvious reason that it's often
cheaper to cough up the $ for the slightly-better hardware
than to pay for the people-time to write the solution.


Further, I might suggest that the solution already _has_
been written - in the form of meta-layers such as Ceph
or GlusterFS, where the fundamental instabilities are 
simply compensated for through duplication. 
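
With GlusterFS, for example, you just tell it to keep N copies of
everything across bricks on different machines, and a flaky disk
or node only costs you one replica. Hostnames and brick paths
here are invented:

    # a volume that keeps 3 copies of every file, one per node
    gluster volume create scratch replica 3 \
        node1:/bricks/b0 node2:/bricks/b0 node3:/bricks/b0
    gluster volume start scratch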


-bacon  
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss