
List:       linux-btrfs
Subject:    Re: Question / Idea regarding fragmentation caused by COW operations
From:       Philipp Gerlach <philipp.gerlach () gmx ! de>
Date:       2023-02-27 17:14:09
Message-ID: aeca7349-ce9f-7956-1dbd-2fce28edd85e () gmx ! de



On 27.02.2023 17:36, Hugo Mills wrote:
> On Mon, Feb 27, 2023 at 05:22:50PM +0100, Philipp Gerlach wrote:
>> Hello,
>>
>> I have a question, maybe an idea for a feature, regarding the way how
>> COW is causing fragmentation of files.
>>
>> As I understand it, COW in btrfs works roughly like this:
>>
>> A file which -- because of its size -- uses several extents is edited.
>> That means that the data within one of the extents will be changed. The
>> changes are not written to the original extent, but the changed data for
>> the extent is written to a new location. Afterwards, the reference to
>> the extent is updated and will then point to the new extent in the new
>> location, and pointing to the other unchanged extents in their original
>> location as before. This way the extents of the file are partly in their
>> original location and partly in a new location, i.e., the file becomes
>> fragmented.
>>
>> Do I understand this more or less correctly?
>>
>> My Idea is the following:
>> I assume that in many use cases the newest version of a file will be the
>> one which is most often read and most probably further edited. Older
>> versions are probably mostly kept for backup / snapshots / archive, so
>> one could assume that they are rarely read and even less edited.
>>
>> Therefore it may be beneficial to switch the way extents are written:
>> 1. Instead of writing the changed extent to a new location, copy the
>> original extent to a new location
>> 2. Update all existing references to that extent to point to the new
>> location
>> 3. Then write the changed extent in the original location
>> 4. Update the reference to the extent for the file which is currently
>> edited to point again in the original location
>>
>> This would mean that one has to pay a bit in terms of performance while
>> writing, especially for extents which are referenced a lot (for updating
>> the old references), but on the upside fragmentation would probably only
>> rarely be encountered while reading. Plus, if e.g. older versions are
>> indeed only used in snapshots for versioning purposes, the fragmented
>> files would be deleted over time when old snapshots are deleted regularly.
>>
>> Does this make sense?
>     Yes. It's not a new idea. In fact, it's an old one.
>
>     btrfs's CoW is technically RoW (redirect-on-write) -- the new
> extent is redirected to a different location. What you've described is
> actual CoW (copy-on-write), which is, for example, used in some
> database servers, going back decades (Oracle, for example).
>
>     I suspect you'd have to rewrite a large chunk of btrfs to make it
> work with CoW rather than RoW, and you'd end up with what's
> essentially a different filesystem at the end of it.
>
>     Hugo.
>

Thanks for the quick answer. I am still learning about all of this and
I'm grateful for background information since it's not always in the
docs. So I understand that this was a design decision for btrfs, and I
guess it was made in view of write performance.

I still think that it might be beneficial for some use cases to arrange
the extents in this way (actual CoW, as I have now learned). So, if btrfs
does it differently, would it be possible to have something like the
autodefrag process which would wait until the disk is idle, and then
just swap the extents in the described way, so that the newest version of
the file would become unfragmented and the older ones more fragmented? I
understand that the existing defragmentation uses different strategies,
which usually increase the used space.
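
To make it concrete what I mean by switching the extents, here is a toy
sketch in Python. It has nothing to do with the real btrfs code or on-disk
format: the "disk" is just a dict of block addresses, a file version is a
list of addresses, and all function names are made up for illustration. It
also ignores crash safety and the scratch space a real swap would need.

```python
# Toy model only, not btrfs code. All names here are hypothetical.

def fragments(extents):
    """Count contiguous runs of block addresses (1 means unfragmented)."""
    if not extents:
        return 0
    return 1 + sum(1 for a, b in zip(extents, extents[1:]) if b != a + 1)

def row_write(disk, version, index, data, free):
    """Redirect-on-write: put the new data in a free block and repoint the file."""
    disk[free] = data
    version[index] = free

def swap_blocks(disk, versions, a, b):
    """Idle-time pass: exchange the contents of blocks a and b, then fix up
    every reference so that each version still reads the same data."""
    disk[a], disk[b] = disk[b], disk[a]
    for version in versions:
        for i, addr in enumerate(version):
            if addr == a:
                version[i] = b
            elif addr == b:
                version[i] = a

# A four-extent file with one snapshot sharing all extents.
disk = {addr: f"v1-{addr}" for addr in range(4)}
snapshot = [0, 1, 2, 3]
live = list(snapshot)

# Editing extent 2 the redirect-on-write way fragments the live file.
row_write(disk, live, 2, "v2-2", free=100)
print(fragments(live), fragments(snapshot))  # 3 1

# The idle-time swap moves the fragmentation over to the snapshot.
swap_blocks(disk, [snapshot, live], 2, 100)
print(fragments(live), fragments(snapshot))  # 1 3
```

In this model both versions still read exactly the same data after the
swap; only the locations change, and no extra space is used. The cost is
the reference update loop, which grows with the number of versions that
share the extent.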

Don't get me wrong, I don't want to question the way btrfs is designed,
I'm just thinking about ways to deal with the fragmentation issue.

Philipp