List:       ceph-users
Subject:    [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs
From:       Frank Schilder <frans () dtu ! dk>
Date:       2020-06-25 7:10:26
Message-ID: 69d1c76b7de34d309530e8da2089de9f () dtu ! dk

Hi all,

> I did a quick test with wcache off [1] and have the impression that the
> simple rados bench of 2 minutes performed a bit worse on my slow HDDs.

This probably depends on whether or not the drive actually has a non-volatile write
cache. I have noticed that from many vendors you can buy the seemingly exact same drive
for a price difference of something like $20. My best guess is that the slightly more
expensive ones have functioning power-loss-protection hardware that passed the quality
test, while it is disabled in the cheaper drives (probably among other things). Always
going for the cheapest version can have its price.

For the disks we are using, my impression is that disabling the volatile write cache
actually adds the volatile cache capacity to the non-volatile write cache. The disks
start consuming more power, but also perform better with ceph.
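
For reference, this is roughly how one can check and toggle the cache setting per
drive; /dev/sdX is a placeholder, and the smartctl syntax is the same one Marc used
in his test below:

  # report the current volatile write cache state (works for SATA and SAS)
  smartctl -g wcache /dev/sdX
  # switch the volatile cache off; on drives that behave as described
  # above, this effectively switches them over to the non-volatile cache
  smartctl -s wcache,off /dev/sdX
  # SATA-only equivalent
  hdparm -W 0 /dev/sdX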

For our HDDs I have fortunately never seen a degradation - or one could say that maybe
they are so crappy that it couldn't get any worse :). In case our vendor reads this:
that was a practical joke :)

The main question here is: do you want to risk data loss on power loss? Ceph is
extremely sensitive to data that the firmware acknowledged as "on disk" disappearing
after a power outage. This is different from journaled file systems like ext4, which
manage to roll back to an earlier consistent version. One loses data, but the fs is
not damaged. XFS still has problems with that, though. With ceph you can lose entire
pools without a viable recovery option, as was described earlier in this thread.

> Couldn't we just set (uncomment)
> write_cache = off
> in /etc/hdparm.conf?

I was pondering that. The problem is that on CentOS systems it seems to be ignored,
that it generally does not apply to SAS drives, and that it has no working way of
configuring which drives to exclude.
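
For completeness, on systems where hdparm.conf is honoured (Debian and derivatives),
a per-device stanza would look roughly like the following; the by-id path is a
made-up example:

  /dev/disk/by-id/ata-TOSHIBA_MG07ACA14TE_XXXXXXXX {
      write_cache = off
  }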

For example, while for ceph data disks we have certain minimum requirements, like
functioning power loss protection, for an OS boot drive I really don't care. Power
outages on cheap drives that lose writes have not been a problem since ext4. A few
log entries or contents of swap - who cares. Here, performance is more important than
data security on power loss.

I would require a configurable option that works in the same way for all types of
protocols - SATA, SAS, NVMe, you name it. At the time of writing, I don't know of
any; the closest I can think of is sketched below.
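
The closest approximation would be a udev rule like the untested sketch below; it
covers SATA and SAS alike, since smartctl speaks both, and the serial number used
for the exclusion is a made-up example. NVMe would still need a separate call, e.g.
"nvme set-feature /dev/nvme0 -f 6 -v 0" to disable its volatile write cache:

  # /etc/udev/rules.d/99-wcache.rules - sketch, untested
  # For every whole SCSI/SATA disk except the excluded OS drive,
  # switch the volatile write cache off when the device appears.
  ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*[!0-9]", ENV{ID_SERIAL_SHORT}!="OS-DRIVE-SERIAL", RUN+="/usr/sbin/smartctl -s wcache,off /dev/%k"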

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Marc Roos <M.Roos@f1-outsourcing.eu>
Sent: 25 June 2020 00:01:51
To: paul.emmerich; vitalif
Cc: bknecht; ceph-users; s.priebe
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs

I did a quick test with wcache off [1] and have the impression that the
simple rados bench of 2 minutes performed a bit worse on my slow HDDs.

[1]
# For every mounted OSD: stop it, switch the drive's write cache off,
# start it again (sed assumes partition 1 and extracts the OSD id):
IFS=$'\n'; for line in $(mount | grep 'osd/ceph' | awk '{print $1" "$3}' \
    | sed -e 's/1 / /' -e 's#/var/lib/ceph/osd/ceph-##'); do
  IFS=' ' arr=($line)
  service ceph-osd@${arr[1]} stop && smartctl -s wcache,off ${arr[0]} \
    && service ceph-osd@${arr[1]} start
done
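
For reference, the two-minute bench mentioned above would be something like this;
the pool name is a placeholder:

  rados bench -p <testpool> 120 write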


-----Original Message-----
To: Paul Emmerich
Cc: Benoît Knecht; s.priebe@profihost.ag; ceph-users@ceph.io
Subject: [ceph-users] Re: High ceph_osd_commit_latency_ms on Toshiba
MG07ACA14TE HDDs

Hi, https://yourcmc.ru/wiki/Ceph_performance author here %)

Disabling write cache is REALLY bad for SSDs without capacitors
(consumer SSDs), and it is also bad for HDDs whose firmware does not
have this bug-o-feature. The bug is really common, though; I have no
idea where it comes from. When you "disable" the write cache you
actually "enable" the non-volatile write cache on those drives. Seagate
EXOS drives also behave like that... It seems most EXOS drives have an
SSD cache even though it's not mentioned in the specs, and it gets
enabled when you do hdparm -W 0. In theory, though, hdparm -W 0 may
hurt linear write performance even on those HDDs.
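
To see which behaviour a given drive has, one can compare single-threaded sync
writes with the cache on and off; the device path is a placeholder, and note that
writing to a raw device destroys its contents:

  hdparm -W 0 /dev/sdX   # or -W 1 for the comparison run
  fio --name=synctest --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --iodepth=1 --runtime=60 --time_based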

> Well, what I was saying was "does it hurt to unconditionally run
> hdparm -W 0 on all disks?"
> 
> Which disk would suffer from this? I haven't seen any disk where this
> would be a bad idea
> 
> Paul
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io

