[prev in list] [next in list] [prev in thread] [next in thread]
List: linux-btrfs
Subject: Re: [PATCH 2/2] btrfs: fix and document the zoned device choice in alloc_new_bio
From: David Sterba <dsterba () suse ! cz>
Date: 2022-03-30 15:10:16
Message-ID: 20220330151016.GG2237 () twin ! jikos ! cz
[Download RAW message or body]
On Mon, Mar 28, 2022 at 11:04:26PM +0000, Naohiro Aota wrote:
> On Mon, Mar 28, 2022 at 09:12:40PM +0200, David Sterba wrote:
> > On Fri, Mar 25, 2022 at 09:09:56AM +0000, Johannes Thumshirn wrote:
> > > On 24/03/2022 17:54, Christoph Hellwig wrote:
> > > > Zone Append bios only need a valid block device in struct bio, but
> > > > not the device in the btrfs_bio. Use the information from
> > > > btrfs_zoned_get_device to set up bi_bdev and fix zoned writes on
> > > > multi-device file system with non-homogeneous capabilities and remove
> > > > the pointless btrfs_bio.device assignment.
> > > >
> > > > Add big fat comments explaining what is going on here.
> > >
> > > Looks like the old code worked by sheer luck, as we had wbc set and thus
> > > always assigned fs_info->fs_devices->latest_dev->bdev to the bio. Which
> > > would obviously not work on a multi device FS.
> >
> > No, it worked fine because the real bio is set just before writing the
> > data somewhere deep in the io submit path in submit_stripe_bio().
> >
> > That it has to be set here is because of the cgroup implementation that
> > accesses it, see 429aebc0a9a0 ("btrfs: get bdev directly from fs_devices
> > in submit_extent_page").
> >
> > Which brings me to the question if Christoph's fix is correct because
> > the comment for the wbc + zoned append is assuming something that's not
> > true.
>
> While the real bio is setup in submit_stripe_bio(), we need to set the
Oh sorry I actually wanted to say that the real 'bdev' is set in
submit_stripe_bio (ie. the one where the write is going to be done).
> device destination for bio_add_zone_append_page() called in
> btrfs_bio_add_page(). The bio_add_zone_append_page() checks that the
> bio length is not exceeding max_zone_append_sectors() of the device,
> and checks other hardware restrictions.
Yeah, but can this still mean that it's checking potentially different
devices with different hw restrictions? In alloc_new_bio() it's one and in
submit_stripe_bio() it's a different one.
Before the cgroup writeback was added to bios, the only reason why
bio_set_bdev required the block device is to check if it's the same one
as before and drop some bit:
static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
{
bio_clear_flag(bio, BIO_REMAPPED);
if (bio->bi_bdev != bdev)
bio_clear_flag(bio, BIO_THROTTLED);
bio->bi_bdev = bdev;
bio_associate_blkg(bio); <-- this was not here
}
So the latest_dev was just a stub to satisfy the bio API requirements.
Please note that its existence spans a long time and things have
changed, I remember that Chris' answer to why we need the latest_dev was
"to put something to the bios". Ie. we don't need it because we have to
write same data to different block devices and distribute that in
submit_stripe_bio(), while the bios have to be set much earlier
expecting a block device.
I'm not sure we have a 1:1 match in what the APIs provide and expect and
what btrfs wants to do. At this point multi-device support for zoned
mode is not complete so we probably won't observe any problems with
hardware with different restrictions.
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic