List:       linux-btrfs
Subject:    Re: how to replace a failed drive?
From:       Remi Gauvin <remi () georgianit ! com>
Date:       2021-09-02 0:15:41
Message-ID: da5c047c-971d-7dff-3bce-397d0432d49c () georgianit ! com

On 2021-09-01 6:07 p.m., Tomasz Chmielewski wrote:
> I'm trying to follow
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
> to replace a failed drive. But it seems to have been written by a person
> who never attempted to replace a failed drive in a btrfs filesystem, and who
> never used mdadm RAID (to see what a good RAID experience should look like).
> 
> What I have:
> 
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced by hot-swapping; the new drive registered itself as
> /dev/sde
> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of the other
> btrfs devices
> - because I couldn't remove the faulty device (btrfs wouldn't go below my
> current number of devices), I've added the new device to the btrfs filesystem:
> 
> btrfs device add /dev/sde2 /data/lxd
> 
> 
> Now, I wonder, how can I remove the disk which crashed?
> 
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
> 
> 
> # btrfs device remove /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
> 
> 
> # btrfs filesystem show /data/lxd
> Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>         Total devices 5 FS bytes used 2.84TiB
>         devid    1 size 1.73TiB used 1.60TiB path /dev/sda2
>         devid    3 size 1.73TiB used 1.60TiB path /dev/sdd2
>         devid    4 size 1.73TiB used 1.60TiB path /dev/sdc2
>         devid    6 size 1.73TiB used 0.00B path /dev/sde2
>         *** Some devices missing
> 
> 
> And, a gem:
> 
> # btrfs device delete missing /data/lxd
> ERROR: error removing device 'missing': no missing devices found to remove
> 
> 
> So according to "btrfs filesystem show /data/lxd" a device is missing, but
> according to "btrfs device delete missing /data/lxd" no device is
> missing. So confusing!
> 
> 
> At this point, btrfs keeps producing massive amounts of logs -
> gigabytes, like:
> 
> [39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298373, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298374, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298375, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298376, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298377, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298378, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298379, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298380, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.747082] BTRFS warning (device sda2): lost page write due to IO error on /dev/sdb2
> [39894585.747214] BTRFS error (device sda2): error writing primary super block to device 5
> 
> 
> 
> This is a REALLY, REALLY bad RAID experience.
> 
> How do I recover at this point?
> 
> 
> Tomasz Chmielewski
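
The mismatch in the quoted "btrfs filesystem show" output (5 total devices, only 4 listed with a path) can be checked mechanically. The following is only a sketch: summarize_devices is a hypothetical helper name, not part of btrfs-progs, and it assumes the output layout shown in the quoted message.

```shell
#!/bin/sh
# Hypothetical helper (not a btrfs-progs command): reads the text output of
# `btrfs filesystem show` on stdin and compares the "Total devices" count
# with the number of devid lines that are actually listed.
summarize_devices() {
    awk '
        /Total devices/ { total = $3 }                 # e.g. "Total devices 5 FS bytes used ..."
        $1 == "devid"   { listed++                     # e.g. "devid 1 size ... path /dev/sda2"
                          ids = ids (ids != "" ? " " : "") $2 }
        END { printf "total=%d listed=%d devids=%s\n", total, listed, ids }
    '
}

# Intended use (needs root and a mounted filesystem; path taken from the thread):
#   btrfs filesystem show /data/lxd | summarize_devices
```

For the output quoted above this reports 5 total devices but only devids 1, 3, 4 and 6 with a path; the kernel log in the quoted message names the failing one as device 5. "btrfs device remove" accepts a devid in place of a device path, though whether that succeeds while the kernel still holds the dead device is not something this message confirms.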
