[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-raid
Subject:    Re: [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running.
From:       Shaohua Li <shli () kernel ! org>
Date:       2018-02-19 17:42:31
Message-ID: 20180219174231.mghxr6cjgjol3cym () kernel ! org
[Download RAW message or body]

On Mon, Feb 19, 2018 at 09:08:53AM +1100, Neil Brown wrote:
> On Sun, Feb 18 2018, Shaohua Li wrote:
> 
> > On Sat, Feb 03, 2018 at 09:19:30AM +1100, Neil Brown wrote:
> > > The locking protocols in md assume that a device will
> > > never be removed from an array during resync/recovery/reshape.
> > > When that isn't happening, rcu or reconfig_mutex is needed
> > > to protect an rdev pointer while taking a refcount.  When
> > > it is happening, that protection isn't needed.
> > > 
> > > Unfortunately there are cases were remove_and_add_spares() is
> > > called when recovery might be happening: is state_store(),
> > > slot_store() and hot_remove_disk().
> > > In each case, this is just an optimization, to try to expedite
> > > removal from the personality so the device can be removed from
> > > the array.  If resync etc is happening, we just have to wait
> > > for md_check_recover to find a suitable time to call
> > > remove_and_add_spares().
> > > 
> > > This optimization and not essential so it doesn't
> > > matter if it fails.
> > > So change remove_and_add_spares() to abort early if
> > > resync/recovery/reshape is happening, unless it is called
> > > from md_check_recovery() as part of a newly started recovery.
> > > The parameter "this" is only NULL when called from
> > > md_check_recovery() so when it is NULL, there is no need to abort.
> > > 
> > > As this can result in a NULL dereference, the fix is suitable
> > > for -stable.
> > > 
> > > cc: yuyufen <yuyufen@huawei.com>
> > > Cc: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
> > > Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to \
> > >                 remove it.")
> > > Cc: stable@ver.kernel.org (v4.8+)
> > > Signed-off-by: NeilBrown <neilb@suse.com>
> > > ---
> > > drivers/md/md.c    |    4 ++++
> > > drivers/md/raid5.c |    4 ++--
> > > 2 files changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index 4e4dee0ec2de..926542fbc892 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -8554,6 +8554,10 @@ static int remove_and_add_spares(struct mddev *mddev,
> > > 	int removed = 0;
> > > 	bool remove_some = false;
> > > 
> > > +	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> > > +		/* Mustn't remove devices when resync thread is running */
> > > +		return 0;
> > > +
> > > 	rdev_for_each(rdev, mddev) {
> > > 		if ((this == NULL || rdev == this) &&
> > > 		    rdev->raid_disk >= 0 &&
> > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > index 98ce4272ace9..3fa97dad3837 100644
> > > --- a/drivers/md/raid5.c
> > > +++ b/drivers/md/raid5.c
> > > @@ -4448,12 +4448,12 @@ static void analyse_stripe(struct stripe_head *sh, \
> > > struct stripe_head_state *s)  else if (is_bad) {
> > > 			/* also not in-sync */
> > > 			if (!test_bit(WriteErrorSeen, &rdev->flags) &&
> > > -			    test_bit(R5_UPTODATE, &dev->flags)) {
> > > +			    (test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_OVERWRITE, \
> > > &dev->flags))) {  /* treat as in-sync, but with a read error
> > > 				 * which we can now try to correct
> > > 				 */
> > > 				set_bit(R5_Insync, &dev->flags);
> > > -				set_bit(R5_ReadError, &dev->flags);
> > > +				//set_bit(R5_ReadError, &dev->flags);
> > 
> > Oh, what's this for?
> 
> That shouldn't have been there.
> I think that change was me experimenting with the problem that writing
> to a known bad block doesn't always mark the block as not bad and more.
> I still had that code in my tree when I made this patch, and didn't
> notice.
> 
> This patch should only have the four line addition to md.c, not the
> change to raid5.  Could you please edit it?

Ok, done.

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic