List: ceph-users
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs
From: Igor Fedotov <ifedotov@suse.de>
Date: 2020-06-24 12:58:47
Message-ID: 207a0eec-7ece-ab24-6520-9659666c14a9@suse.de
Benoit, thanks for the update.
For the sake of completeness, one more experiment please, if possible:
turn off the write cache for the HGST drives and measure the commit latency once again.
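Something along these lines should work (the device name below is only an example; pick the actual HGST devices on your nodes):

```
# report the current write cache state; hdparm -W without a value only reads it
hdparm -W /dev/sda
# disable the volatile write cache on that drive
hdparm -W0 /dev/sda
```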
Kind regards,
Igor
On 6/24/2020 3:53 PM, Benoît Knecht wrote:
> Thank you all for your answers, this was really helpful!
>
> Stefan Priebe wrote:
> > Yes, we have the same issues and switched to Seagate for those reasons.
> > You can fix at least a big part of it by disabling the write cache of
> > those drives - generally speaking, the Toshiba firmware seems to be broken.
> > I was not able to find a newer one.
> Good to know that we're not alone :) I also looked for a newer firmware, to no
> avail.
>
> Igor Fedotov wrote:
> > Benoit, I'm wondering: what are the write cache settings in your case?
> >
> > And do you see any difference after disabling it, if any?
> Write cache is enabled on all our OSDs (including the HGST drives that don't
> have a latency issue).
>
> To see if disabling write cache on the Toshiba drives would help, I turned it
> off on all 12 drives in one of our OSD nodes:
>
> ```
> # disable the volatile write cache on all 12 drives
> for disk in /dev/sd{a..l}; do hdparm -W0 "$disk"; done
> ```
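> A quick way to double-check that the cache is actually off on each drive:
>
> ```
> # hdparm prints something like "write-caching =  0 (off)" once disabled
> for disk in /dev/sd{a..l}; do hdparm -W "$disk"; done
> ```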
>
> and left it on in the remaining nodes. I used `rados bench write` to create
> some load on the cluster, and looked at
>
> ```
> avg by (hostname) (ceph_osd_commit_latency_ms * on (ceph_daemon) group_left \
> (hostname) ceph_osd_metadata)
> ```
>
> in Prometheus. The hosts with write cache _enabled_ had a commit latency around
> 145ms, while the host with write cache _disabled_ had a commit latency around
> 25ms. So it definitely helps!
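> For reference, the benchmark invocation was along these lines (pool name, runtime and object size here are placeholders rather than the exact values used):
>
> ```
> # write 4 MiB objects for 60 seconds with 16 concurrent ops against a test pool
> rados bench -p testpool 60 write -b 4194304 -t 16 --no-cleanup
> # remove the benchmark objects afterwards
> rados -p testpool cleanup
> ```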
>
> Mark Nelson wrote:
> > This isn't the first time I've seen drive cache cause problematic
> > latency issues, and not always from the same manufacturer.
> > Unfortunately it seems like you really have to test the drives you
> > want to use before deploying them to make sure you don't run into
> > issues.
> That's very true! Data sheets and even public benchmarks can be quite
> deceiving, and two hard drives that seem to have similar performance profiles
> can perform very differently within a Ceph cluster. Lesson learned.
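> For anyone vetting drives up front, a quick single-disk sync-write test (just a sketch, not the methodology used here) tends to expose this kind of write-cache latency behaviour before the drives ever reach a cluster:
>
> ```
> # DESTRUCTIVE: writes directly to the raw device; /dev/sdX is a placeholder
> fio --name=synctest --filename=/dev/sdX --rw=write --bs=4k --iodepth=1 \
>     --sync=1 --direct=1 --runtime=60 --time_based --group_reporting
> ```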
>
> Cheers,
>
> --
> Ben
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io