List:       linux-btrfs
Subject:    Re: [PATCH 2/2] btrfs: fix and document the zoned device choice in alloc_new_bio
From:       David Sterba <dsterba () suse ! cz>
Date:       2022-03-30 15:10:16
Message-ID: 20220330151016.GG2237 () twin ! jikos ! cz

On Mon, Mar 28, 2022 at 11:04:26PM +0000, Naohiro Aota wrote:
> On Mon, Mar 28, 2022 at 09:12:40PM +0200, David Sterba wrote:
> > On Fri, Mar 25, 2022 at 09:09:56AM +0000, Johannes Thumshirn wrote:
> > > On 24/03/2022 17:54, Christoph Hellwig wrote:
> > > > Zone Append bios only need a valid block device in struct bio, but
> > > > not the device in the btrfs_bio.  Use the information from
> > > > btrfs_zoned_get_device to set up bi_bdev and fix zoned writes on
> > > > multi-device file system with non-homogeneous capabilities and remove
> > > > the pointless btrfs_bio.device assignment.
> > > > 
> > > > Add big fat comments explaining what is going on here.
> > > 
> > > Looks like the old code worked by sheer luck, as we had wbc set and thus
> > > always assigned fs_info->fs_devices->latest_dev->bdev to the bio. Which 
> > > would obviously not work on a multi device FS.
> > 
> > No, it worked fine because the real bio is set just before writing the
> > data somewhere deep in the io submit path in submit_stripe_bio().
> > 
> > That it has to be set here is because of the cgroup implementation that
> > accesses it, see 429aebc0a9a0 ("btrfs: get bdev directly from fs_devices
> > in submit_extent_page").
> > 
> > Which brings me to the question if Christoph's fix is correct because
> > the comment for the wbc + zoned append is assuming something that's not
> > true.
> 
> While the real bio is setup in submit_stripe_bio(), we need to set the

Oh sorry, I actually wanted to say that the real 'bdev' is set in
submit_stripe_bio() (i.e. the one where the write is going to be done).

> device destination for bio_add_zone_append_page() called in
> btrfs_bio_add_page(). The bio_add_zone_append_page() checks that the
> bio length is not exceeding max_zone_append_sectors() of the device,
> and checks other hardware restrictions.

Yeah, but can this still mean that it's checking potentially different
devices with different hw restrictions? In alloc_new_bio() it's one and in
submit_stripe_bio() it's a different one.
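
To make the concern concrete, here is a minimal userspace sketch, not kernel
code. All model_* names are made up for illustration; the only thing taken
from the discussion is that the zone append size is checked against a
per-device limit while the bio is being built. It shows a bio that passes the
check against the device it was built for but would be too large for a
second device with a smaller limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names exist in the kernel. */
struct model_dev {
	unsigned int max_zone_append_sectors;	/* per-device hardware limit */
};

struct model_bio {
	const struct model_dev *dev;	/* device the bio was built against */
	unsigned int sectors;		/* payload accumulated so far */
};

/* Grow the bio only while it stays within the limit of bio->dev. */
static bool model_add_zone_append(struct model_bio *bio, unsigned int sectors)
{
	assert(bio->dev);
	if (bio->sectors + sectors > bio->dev->max_zone_append_sectors)
		return false;
	bio->sectors += sectors;
	return true;
}

/* Would the finished bio still be valid on the device it is sent to? */
static bool model_fits(const struct model_bio *bio,
		       const struct model_dev *target)
{
	return bio->sectors <= target->max_zone_append_sectors;
}
```

With limits of 1024 and 256 sectors, a 512 sector bio built against the first
device is accepted at build time, but model_fits() reports that it does not
fit the second one.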

Before the cgroup writeback was added to bios, the only reason why
bio_set_dev required the block device was to check if it's the same one
as before and, if not, drop a flag:

static inline void bio_set_dev(struct bio *bio, struct block_device *bdev)
{
	bio_clear_flag(bio, BIO_REMAPPED);
	if (bio->bi_bdev != bdev)
		bio_clear_flag(bio, BIO_THROTTLED);
	bio->bi_bdev = bdev;
	bio_associate_blkg(bio);		<-- this was not here
}

So the latest_dev was just a stub to satisfy the bio API requirements.
Please note that it has existed for a long time and things have changed
since. I remember that Chris' answer to why we need the latest_dev was
"to put something to the bios". I.e. we don't need it for the write
itself: we have to write the same data to different block devices and
distribute that in submit_stripe_bio(), while the bios have to be set up
much earlier and expect a block device.
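
Roughly, the two stages look like the following userspace sketch. This is
only a model under my own assumptions: the stub_* names are invented, and the
real code clones and remaps bios in a more involved way. It shows just the
ordering, a placeholder device at allocation time and the real target device
assigned per stripe at submit time.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; the stub_* names are hypothetical. */
struct stub_bdev {
	int id;
};

struct stub_bio {
	const struct stub_bdev *bdev;
};

/* Early in the write path: any valid device will do as a placeholder. */
static void stub_alloc_bio(struct stub_bio *bio,
			   const struct stub_bdev *latest_dev)
{
	assert(latest_dev);
	bio->bdev = latest_dev;
}

/* Late: one copy per stripe, each pointed at the real target device. */
static void stub_submit_stripes(const struct stub_bio *orig,
				const struct stub_bdev *const *stripes,
				size_t nr, struct stub_bio *out)
{
	for (size_t i = 0; i < nr; i++) {
		out[i] = *orig;
		out[i].bdev = stripes[i];
	}
}
```

The placeholder never reaches the hardware in this model; whatever was
checked against it at allocation time was checked against a device that may
not be any of the final targets.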

I'm not sure we have a 1:1 match in what the APIs provide and expect and
what btrfs wants to do. At this point multi-device support for zoned
mode is not complete so we probably won't observe any problems with
hardware with different restrictions.