List:       ceph-users
Subject:    [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs
From:       Anthony D'Atri <anthony.datri () gmail ! com>
Date:       2020-06-24 21:38:11
Message-ID: AADB578D-B268-4A47-B1FA-BD080FAB7DED () gmail ! com

The benefit of disabling on-drive cache may be at least partly dependent on
the HBA; I've tested one specific drive model and found no difference, while
someone else reported a measurable difference for the same model.
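
For anyone who wants to check their own drives and HBAs, below is a rough
sketch of the sort of A/B test I mean -- just a small Python wrapper around
hdparm and fio, not a polished tool.  It assumes a SATA drive (on SAS you'd
toggle WCE with sdparm instead), fio 3.x's JSON key layout, and that
/dev/sdX is a placeholder for a scratch device whose contents you can
destroy:

#!/usr/bin/env python3
# Rough sketch: compare mean sync 4k write latency with the drive's
# volatile write cache enabled vs. disabled.  Assumptions: SATA drive
# (hdparm -W), fio >= 3.x JSON output, /dev/sdX is a wipeable scratch disk.
import json
import subprocess

DEV = "/dev/sdX"   # placeholder -- point this at a drive you can destroy

def set_write_cache(enabled: bool) -> None:
    # hdparm -W1 enables the drive's volatile write cache, -W0 disables it
    subprocess.run(["hdparm", f"-W{1 if enabled else 0}", DEV], check=True)

def mean_sync_write_latency_us() -> float:
    # Short QD=1 sync random-write job; roughly the pattern behind the
    # commit latency that Ceph reports for an OSD
    out = subprocess.run(
        ["fio", "--name=synctest", f"--filename={DEV}", "--rw=randwrite",
         "--bs=4k", "--iodepth=1", "--direct=1", "--sync=1",
         "--runtime=60", "--time_based", "--output-format=json"],
        check=True, capture_output=True, text=True).stdout
    job = json.loads(out)["jobs"][0]
    return job["write"]["lat_ns"]["mean"] / 1000.0

for state in (True, False):
    set_write_cache(state)
    print(f"write cache {'on ' if state else 'off'}: "
          f"{mean_sync_write_latency_us():.0f} us mean sync write latency")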

> Good to know that we're not alone :) I also looked for a newer firmware,
> to no avail.

Dell sometimes publishes firmware blobs for drives that they resell, though
those seem to have customized inquiry strings baked in, and their firmware
won't apply to "generic" drives without questionable hackery with a hex
editor.

My experience with Toshiba has been that the only way to get firmware blobs
for generic drives is to persuade Toshiba themselves to give them to you, be
it through a rep or the CSO.

> 
> Mark Nelson wrote:
> > This isn't the first time I've seen drive cache cause problematic
> > latency issues, and not always from the same manufacturer.
> > Unfortunately it seems like you really have to test the drives you
> > want to use before deploying them to make sure you don't run into
> > issues.
> 
> That's very true! Data sheets and even public benchmarks can be quite
> deceiving, and two hard drives that seem to have similar performance profiles
> can perform very differently within a Ceph cluster. Lesson learned.

Benchmarks are often run in a context rather removed from what anyone would
deploy in production.

Notably, I've had at least two experiences with drives that passed both the
chassis vendor's and our in-house initial qualification.

The first was an HDD.  We had a mix of drives from vendor A and vendor B,
and found that vendor B's drives were throwing read errors at 30x the rate
of vendor A's.  After months of persisting through the layers I was finally
able to send drives to the vendor's engineers, who found at least one design
flaw that was tickled by the op pattern of a Filestore (XFS) OSD with a
colocated journal.  Firmware was not able to substantially fix the problem,
so they all had to be replaced with vendor A's.  Today BlueStore probably
would not trigger the same design flaw.


The second was an SSD that was marketed as "enterprise" but whose internal
housekeeping only ran properly when the drive was allowed long idle times.
In that case I was eventually able to work with the vendor on a firmware
fix.  Here BlueStore seemed to correlate with the behavior, as did a serial
number range.  This one didn't manifest until drives had been in production
for at least 90 days and the workload had increased.


The moral of the story is to stress-test every model of drive if you care
about data durability, availability, and performance.  Throw increasingly
busy workloads and deeper queue depths at the drives; the performance of
some will hit an abrupt cliff at a certain point.
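
As a concrete (if rough) illustration of such a sweep, something like the
Python sketch below does the job; again /dev/sdX is a placeholder for a
scratch device you can destroy, and the JSON keys assume fio 3.x:

#!/usr/bin/env python3
# Sketch of a queue-depth sweep: push increasingly deep 4k random writes
# at the drive and watch where latency falls off a cliff.  Assumes fio 3.x
# JSON key layout; /dev/sdX is a placeholder scratch device.
import json
import subprocess

DEV = "/dev/sdX"   # placeholder -- data on this device will be destroyed

for qd in (1, 2, 4, 8, 16, 32, 64, 128):
    out = subprocess.run(
        ["fio", "--name=qdsweep", f"--filename={DEV}", "--rw=randwrite",
         "--bs=4k", f"--iodepth={qd}", "--ioengine=libaio", "--direct=1",
         "--runtime=120", "--time_based", "--output-format=json"],
        check=True, capture_output=True, text=True).stdout
    wr = json.loads(out)["jobs"][0]["write"]
    # p99 completion latency is usually where the cliff shows up first
    p99_ms = wr["clat_ns"]["percentile"]["99.000000"] / 1e6
    print(f"iodepth {qd:3d}: {wr['iops']:7.0f} IOPS, p99 {p99_ms:.1f} ms")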



_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io

