List:       openldap-devel
Subject:    LMDB and fsync failures
From:       Howard Chu <hyc () symas ! com>
Date:       2024-02-09 10:09:04
Message-ID: 7cce887c-c1a1-ecf7-7508-c5abf4eec2a8 () symas ! com

If anyone remembers fsyncgate (https://danluu.com/fsyncgate/), which exposed a lot of vulnerabilities in
other popular DBMSs, some further research on the topic was published as well:
 https://www.usenix.org/conference/atc20/presentation/rebello

I originally discussed this on Twitter back in 2020, but wanted to summarize it again here.

As usual with these types of reports, there are a lot of flaws in their test methodology,
which invalidate some of their conclusions.

In particular, I question the validity of the failure scenarios their CuttleFS simulator produces.
Specifically, they claim that multiple systems exhibit False Failures: fsync reports a failure, but the
write actually (partially) succeeded. In the case of LMDB, where a 1-page synchronous write is involved,
this is just an invalid test.

They assume that the sector LMDB cares about is written successfully, but that an I/O error occurs
on some other sector in the same page. So while LMDB invalidates the commit in memory, once the cached
page is dropped, a subsequent page-in will read the updated sector. But in the real world, if there are
hard I/O errors on those other sectors, they will most likely also be unreadable, and the subsequent
page-in will fail too. So at least for LMDB, there would be no false failure.
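The commit rule above (a failed fsync invalidates the commit, no matter what might later be read back
from the page cache) can be sketched in Python. This is a minimal illustration under stated assumptions,
not LMDB's actual C code: the name commit_page and its injectable sync parameter are hypothetical, the
latter there only so the failure path can be exercised.

```python
import os

def commit_page(fd: int, offset: int, page: bytes, sync=os.fsync) -> bool:
    """Hypothetical helper: write one page synchronously, report durability.

    Any error from the write or the sync is treated as a failed commit.
    The caller must discard the in-memory commit and must not simply retry
    the sync: on Linux (as fsyncgate showed) a retried fsync can report
    success even though the earlier writeback failed.
    """
    try:
        written = os.pwrite(fd, page, offset)
        if written != len(page):
            return False  # short write: treat the commit as failed
        sync(fd)          # durability barrier; raises OSError on failure
    except OSError:
        return False      # write or sync failed: the commit is invalid
    return True
```

A caller would treat False as an aborted transaction and keep the previous committed state as current.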

The failure modes they're modeling don't reflect reality.

Leaving that issue aside, there's also the point that modern storage devices now use 4KB sectors
and still guarantee atomic sector writes. A typical 4KB page write then maps onto a single sector, so the
partial success scenario they describe can't even happen. This is a bunch of academic speculation, with a
total absence of real-world modeling to validate the failure scenarios they presented.
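The atomicity argument comes down to how many device sectors one page write touches. A small sketch,
assuming a 4096-byte page written at a page-aligned offset (sectors_touched is an illustrative name): with
legacy 512-byte sectors the write spans eight sectors, so some could land while others fail; with 4KB
sectors it spans exactly one, which the atomic-sector guarantee covers.

```python
def sectors_touched(offset: int, length: int, sector_size: int) -> range:
    """Return the indices of the device sectors covered by a write."""
    if length <= 0:
        return range(0)
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return range(first, last + 1)

PAGE = 4096
for sector_size in (512, 4096):
    # Write the fourth page of the file (offset 3 * PAGE, page-aligned).
    n = len(sectors_touched(3 * PAGE, PAGE, sector_size))
    print(f"{sector_size}-byte sectors: one page write spans {n} sector(s)")
# 512-byte sectors: one page write spans 8 sector(s)
# 4096-byte sectors: one page write spans 1 sector(s)
```

Alignment matters here: the same 4096-byte write at an unaligned offset would straddle two 4KB sectors.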

The other failures they report, on ext4 with data journaling (data=journal), are certainly disturbing.
But we always recommend turning data journaling off with LMDB; it's redundant with LMDB's own
copy-on-write (COW) strategy and hurts performance for no benefit.
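To check whether an existing ext4 mount uses data journaling, one can inspect its options in /proc/mounts
on Linux. A minimal sketch, assuming the kernel lists data=journal among the mount options when that mode
is active (the function name data_journaled_mounts is illustrative):

```python
def data_journaled_mounts(mounts_text: str) -> list[str]:
    """Return mount points of ext4 filesystems mounted with data=journal.

    Expects /proc/mounts format: device, mount point, fs type,
    comma-separated options, dump, pass (whitespace-separated).
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type == "ext4" and "data=journal" in options.split(","):
            result.append(mount_point)
    return result
```

Usage would be along the lines of data_journaled_mounts(open("/proc/mounts").read()); any mount point it
returns that holds an LMDB environment is a candidate for remounting with the default data=ordered mode.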

Of course, you don't even need to trust the filesystem, you can just use LMDB on a raw block device.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/