
List:       ceph-users
Subject:    [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs
From:       Mark Nelson <mnelson@redhat.com>
Date:       2021-11-04 10:47:50
Message-ID: <156922e2-9d9c-8cda-f329-c27dc5c84979@redhat.com>

Hi Dan,


I can't speak for those specific Toshiba drives, but we have absolutely 
seen very strange behavior (sometimes with cache enabled and sometimes 
not) with different drives and firmwares over the years from various 
manufacturers.  There was one especially bad case from back in the 
Inktank days, but my memory is a bit fuzzy.  I think we were seeing 
weird periodic commit latency spikes that grew worse over time.  That 
one might have been cache related.  I believe we ended up doing a lot of 
tests with blktrace and iowatcher to show the manufacturer what we were 
seeing, but I don't recall if anything ever got fixed.
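
For anyone wanting to reproduce that kind of capture, a blktrace/iowatcher
run might look roughly like this (a sketch only; the device name, duration,
and output paths are placeholders, not from the original tests, and both
tools need root):

```shell
#!/bin/sh
# Hypothetical sketch: trace block-layer activity on the drive backing a
# slow OSD, then render it for comparison against a healthy drive.
# /dev/sdx is a placeholder device name.
DEV=/dev/sdx
OUT=/tmp/blktrace-$(basename "$DEV")
mkdir -p "$OUT"

if command -v blktrace >/dev/null 2>&1; then
    # Capture 60 seconds of I/O events (queue, dispatch, completion).
    blktrace -d "$DEV" -w 60 -D "$OUT"
    # iowatcher renders the trace as an SVG timeline of throughput/latency,
    # which makes periodic latency spikes easy to spot visually.
    iowatcher -t "$OUT/$(basename "$DEV")" -o "$OUT/trace.svg"
else
    echo "blktrace not installed; nothing captured for $DEV"
fi
```

Comparing a trace from a misbehaving drive side by side with one from a
known-good drive is usually the quickest way to show a manufacturer what
the host is actually seeing at the block layer.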


Mark


On 11/4/21 5:33 AM, Dan van der Ster wrote:
> Hello Benoît (and others in this great thread),
> 
> Apologies for replying to this ancient thread.
> 
> We have been debugging similar issues during an ongoing migration to
> new servers with TOSHIBA MG07ACA14TE HDDs.
> 
> We see a similar commit_latency_ms issue on the new drives (~60ms in
> our env vs ~20ms for some old 6TB Seagates).
> However, disabling the write cache (hdparm -W 0) made absolutely no
> difference for us.
> 
> So we're wondering:
> * Are we running the same firmware as you? (We have 0104). I wonder if
> Toshiba has changed the implementation of the cache in the meantime...
> * Is anyone aware of some HBA or other setting in the middle that
> might be masking this setting from reaching the drive?
> 
> Best Regards,
> 
> Dan
> 
> 
> 
> On Wed, Jun 24, 2020 at 9:44 AM Benoît Knecht <bknecht@protonmail.ch> wrote:
> > Hi,
> > 
> > We have a Nautilus (14.2.9) Ceph cluster with two types of HDDs:
> > 
> > - TOSHIBA MG07ACA14TE   [1]
> > - HGST HUH721212ALE604  [2]
> > 
> > They're all bluestore OSDs with no separate DB+WAL and part of the same pool.
> > 
> > We noticed that while the HGST OSDs have a commit latency of about 15ms, the
> > Toshiba OSDs hover around 150ms (these values come from the
> > `ceph_osd_commit_latency_ms` metric in Prometheus).
> > 
> > On paper, it seems like those drives have very similar specs, so it's not clear
> > to me why we're seeing such a large difference when it comes to commit latency.
> > 
> > Has anyone had any experience with those Toshiba drives? Or looking at the
> > specs, do you spot anything suspicious?
> > 
> > And if you're running a Ceph cluster with various disk brands/models, have you
> > ever noticed some of them standing out when looking at
> > `ceph_osd_commit_latency_ms`?
> > 
> > Thanks in advance for your feedback.
> > 
> > Cheers,
> > 
> > --
> > Ben
> > 
> > [1]: https://toshiba.semicon-storage.com/content/dam/toshiba-ss/asia-pacific/docs/product/storage/product-manual/eHDD-MG07ACA-Product-Manual.pdf
> > [2]: https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc500-series/data-sheet-ultrastar-dc-hc520.pdf
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-leave@ceph.io
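
As an aside, the per-OSD metric Benoît mentions can be pulled straight from
the Prometheus HTTP API for a quick side-by-side comparison; a rough sketch
(the Prometheus URL is a placeholder, and `jq` is assumed to be available):

```shell
#!/bin/sh
# Hypothetical sketch: query ceph_osd_commit_latency_ms from Prometheus and
# list the worst offenders, to see which OSDs (and hence which drive models)
# stand out.
PROM="http://prometheus.example.com:9090"   # placeholder address
QUERY='topk(10, ceph_osd_commit_latency_ms)'
# The /api/v1/query endpoint returns JSON; jq extracts one line per sample
# with the daemon name and the current value.
curl -sG "$PROM/api/v1/query" --data-urlencode "query=$QUERY" \
  | jq -r '.data.result[] | "\(.metric.ceph_daemon)\t\(.value[1])"'
```

Mapping the worst OSD IDs back to drive models (e.g. via `ceph osd metadata`)
then shows whether the outliers cluster on one model.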
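
On Dan's question about an HBA masking the setting: one sanity check is to
read the cache state back through both the ATA and SCSI paths and compare.
A sketch (the device glob is an assumption, and both tools need root):

```shell
#!/bin/sh
# Hypothetical sketch: report the volatile write-cache state of each drive
# via hdparm (ATA) and sdparm (SCSI Caching mode page). A discrepancy
# between the two views can point at an HBA translating or ignoring the
# setting before it reaches the drive.
checked=0
for dev in /dev/sd?; do
    [ -b "$dev" ] || continue
    echo "== $dev =="
    # hdparm -W with no value only reports the current state (it does not
    # change anything).
    hdparm -W "$dev" 2>/dev/null
    # WCE is the Write Cache Enable bit in the SCSI Caching mode page.
    sdparm --get=WCE "$dev" 2>/dev/null
    checked=$((checked+1))
done
echo "inspected $checked block devices"
```

If `hdparm -W 0` appears to succeed but the WCE bit reads back as enabled,
that would support the theory that something in the I/O path is swallowing
the setting.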


