List: openldap-devel
Subject: LMDB and fsync failures
From: Howard Chu <hyc () symas ! com>
Date: 2024-02-09 10:09:04
Message-ID: 7cce887c-c1a1-ecf7-7508-c5abf4eec2a8 () symas ! com
If anyone remembers fsync-gate (https://danluu.com/fsyncgate/), which showed
a lot of vulnerabilities in other popular DBMSs, some other research was
published on the topic as well:
https://www.usenix.org/conference/atc20/presentation/rebello
I originally discussed this on Twitter back in 2020 but wanted to summarize
it again here.
As usual with these types of reports, there are a lot of flaws in their
test methodology, which invalidate some of their conclusions.
In particular, I question the validity of the failure scenarios their
CuttleFS simulator produces. Specifically, they claim that multiple systems
exhibit False Failures, where fsync reports a failure but the write
actually (partially) succeeded. In the case of LMDB, where a 1-page
synchronous write is involved, this is just an invalid test.
They assume that the relevant sector that LMDB cares about is successfully
written, but an I/O error occurs on some other sector in the page. And so
while LMDB invalidates the commit in memory, a cache flush and subsequent
page-in will read the updated sector. But in the real world, if there are
hard I/O errors on these other sectors, they will most likely also be
unreadable, and a subsequent page-in will also fail. So at least for LMDB,
there would be no false failure.
The failure modes they're modeling don't reflect reality.
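To make the difference between the two failure models concrete, here is a
toy sketch in Python. It is not CuttleFS and not LMDB code; every name in it
(write_page, page_in, observe) is hypothetical, and it assumes a 4KB page
made of eight 512-byte sectors, with sector 0 standing in for the sector
LMDB cares about and sector 5 for the one that hits the I/O error.

```python
SECTORS_PER_PAGE = 8  # assumption: 512-byte sectors in a 4KB page


class ReadError(Exception):
    """Raised when a page-in touches a sector that cannot be read."""


def write_page(disk, new_page, bad_sectors):
    """Write the page sector by sector.

    Returns False (as a failing fsync would report) if any sector failed;
    the good sectors are still updated on disk.
    """
    ok = True
    for i, data in enumerate(new_page):
        if i in bad_sectors:
            ok = False          # this sector keeps its old contents
        else:
            disk[i] = data
    return ok


def page_in(disk, bad_sectors, bad_sectors_readable):
    """Re-read the whole page after the cached copy was dropped."""
    if bad_sectors and not bad_sectors_readable:
        raise ReadError("hard I/O error on sectors %s" % sorted(bad_sectors))
    return list(disk)


def observe(bad_sectors_readable):
    """Return (sync_result, what a later reader sees in sector 0)."""
    disk = ["old"] * SECTORS_PER_PAGE
    bad = {5}
    synced = write_page(disk, ["new"] * SECTORS_PER_PAGE, bad)
    try:
        page = page_in(disk, bad, bad_sectors_readable)
    except ReadError:
        return synced, "read error"
    return synced, page[0]


# Simulator-style model: the failed sector stays readable.
print(observe(bad_sectors_readable=True))
# Hard-error model: the failed sector is unreadable too.
print(observe(bad_sectors_readable=False))
```

Under the first model the sync fails yet the reader later sees the new
data, which is the false failure. Under the second model the page-in fails
as well, so the updated sector is never observed.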
Leaving that issue aside, there's also the point that modern storage
devices are now using 4KB sectors, and still guarantee atomic sector
writes, so when a 4KB page occupies a single sector the partial success
scenario they describe can't even happen.
This is a bunch of academic speculation, with a total absence of real-world
modeling to validate the failure scenarios they presented.
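The sector-size argument is simple arithmetic, sketched here in Python. The
helper names are hypothetical, and the sketch assumes that the page is
aligned to a sector boundary and that the device really does write each
sector atomically.

```python
def sectors_per_page(page_size, sector_size):
    """Number of device sectors that one page spans."""
    if page_size % sector_size != 0:
        raise ValueError("page size must be a multiple of the sector size")
    return page_size // sector_size


def partial_page_write_possible(page_size, sector_size):
    """A page can only be partially written if it spans several sectors."""
    return sectors_per_page(page_size, sector_size) > 1


# 4KB page on a legacy 512-byte-sector device: 8 sectors, so a torn page
# is at least conceivable.
print(sectors_per_page(4096, 512), partial_page_write_possible(4096, 512))
# 4KB page on a 4KB-sector device: 1 sector, written atomically or not at all.
print(sectors_per_page(4096, 4096), partial_page_write_possible(4096, 4096))
```

On Linux, the sector size a device reports can usually be read from
/sys/block/<dev>/queue/physical_block_size.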
The other failures they report, on ext4 with journaled data, are certainly
disturbing. But we always recommend turning that journaling off with LMDB;
it's redundant with LMDB's own COW strategy and harms performance for no
benefit.
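As a rough way to spot the configuration in question, the Python sketch
below scans mount-table text in /proc/mounts format for ext4 filesystems
mounted with data=journal. The function name is hypothetical, and it
assumes the kernel lists data=journal among the mount options when that
mode is active; the other ext4 data modes are data=ordered and
data=writeback.

```python
def journaled_data_mounts(mounts_text):
    """Return mount points of ext4 filesystems mounted with data=journal.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype == "ext4" and "data=journal" in options.split(","):
            hits.append(mountpoint)
    return hits


# Made-up sample input, for illustration only.
sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /var/lib/ldap ext4 rw,relatime,data=journal 0 0\n"
    "tmpfs /run tmpfs rw,nosuid,nodev 0 0\n"
)
print(journaled_data_mounts(sample))
```

On a live system the same function could be fed the contents of
/proc/mounts.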
Of course, you don't even need to trust the filesystem; you can just use
LMDB on a raw block device.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/