List: linux-btrfs
Subject: Re: how to replace a failed drive?
From: Remi Gauvin <remi () georgianit ! com>
Date: 2021-09-02 0:15:41
Message-ID: da5c047c-971d-7dff-3bce-397d0432d49c () georgianit ! com
On 2021-09-01 6:07 p.m., Tomasz Chmielewski wrote:
> I'm trying to follow
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
> to replace a failed drive. But it seems to have been written by someone who
> never attempted to replace a failed drive in a btrfs filesystem, and who
> never used mdadm RAID (to see what a good RAID experience should look like).
>
> What I have:
>
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced by hot-swapping; the new drive registered itself as
> /dev/sde
> - I've partitioned /dev/sde so that /dev/sde2 matches the size of the other
> btrfs devices
> - because I couldn't remove the faulty device (btrfs wouldn't go below the
> current number of devices), I've added the new device to the btrfs filesystem:
>
> btrfs device add /dev/sde2 /data/lxd
>
>
> Now, how can I remove the disk that crashed?
>
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
>
>
> # btrfs device remove /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
>
>
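The "not a block device" error is consistent with the failed disk having dropped out of the OS: once the kernel no longer sees the disk, its /dev/sdb2 node presumably no longer exists, so there is no block device for btrfs-progs to resolve from that path. A minimal sketch of the same check in plain shell, assuming only the standard `test -b` operator (the function name `check_dev` is hypothetical, not part of btrfs-progs):

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): report whether a path is
# currently a block device. btrfs-progs rejects paths that are not, which
# is what happens once a failed disk's /dev node has disappeared.
check_dev() {
    if [ -b "$1" ]; then
        echo "$1: block device"
    elif [ -e "$1" ]; then
        echo "$1: exists, but not a block device"
    else
        echo "$1: does not exist"
    fi
}

check_dev /dev/sdb2
```

Running this before `btrfs device remove` shows whether a path argument can work at all; if the node is gone, the device has to be referred to some other way.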
> # btrfs filesystem show /data/lxd
> Label: 'lxd5' uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
> Total devices 5 FS bytes used 2.84TiB
> devid 1 size 1.73TiB used 1.60TiB path /dev/sda2
> devid 3 size 1.73TiB used 1.60TiB path /dev/sdd2
> devid 4 size 1.73TiB used 1.60TiB path /dev/sdc2
> devid 6 size 1.73TiB used 0.00B path /dev/sde2
> *** Some devices missing
>
>
> And, a gem:
>
> # btrfs device delete missing /data/lxd
> ERROR: error removing device 'missing': no missing devices found to remove
>
>
> So according to "btrfs filesystem show /data/lxd", a device is missing, but
> according to "btrfs device delete missing /data/lxd", no device is
> missing. So confusing!
>
>
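The `filesystem show` output itself records the discrepancy: "Total devices 5", but only four devid lines. A sketch that computes the difference with awk, assuming the output layout quoted above (`count_missing` is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs filesystem show` output for ONE
# filesystem on stdin and print "Total devices" minus the number of
# "devid" lines, i.e. how many devices were not listed.
count_missing() {
    awk '
        /Total devices/ {
            for (i = 1; i <= NF; i++)
                if ($i == "devices") total = $(i + 1)
        }
        $1 == "devid" { listed++ }
        END { print total - listed }
    '
}

# Sample input: the output quoted in the message above.
count_missing <<'EOF'
Label: 'lxd5' uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
Total devices 5 FS bytes used 2.84TiB
devid 1 size 1.73TiB used 1.60TiB path /dev/sda2
devid 3 size 1.73TiB used 1.60TiB path /dev/sdd2
devid 4 size 1.73TiB used 1.60TiB path /dev/sdc2
devid 6 size 1.73TiB used 0.00B path /dev/sde2
*** Some devices missing
EOF
```

For the quoted output this prints 1. It only counts absent devices; it cannot say which devid is absent, because devids are not necessarily contiguous.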
> Meanwhile, btrfs keeps producing massive amounts of logs (gigabytes),
> like:
>
> [39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298373, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298374, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298375, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298376, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298377, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298378, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298379, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298380, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.747082] BTRFS warning (device sda2): lost page write due to IO error on /dev/sdb2
> [39894585.747214] BTRFS error (device sda2): error writing primary super block to device 5
>
>
>
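The last log line names the failed disk by devid ("device 5") rather than by path, and a devid stays meaningful after /dev/sdb2 has disappeared. A sketch that pulls the devid and the highest write-error count per bdev out of such a log, assuming each message sits on one line in exactly the formats quoted above (`summarize_btrfs_errors` is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
#  - the highest "wr" error count seen for each bdev, and
#  - every devid named in "error writing primary super block" messages.
# Assumes one message per line, as dmesg prints them.
summarize_btrfs_errors() {
    awk '
        / bdev / && / errs: wr / {
            for (i = 1; i <= NF; i++) {
                if ($i == "bdev") dev = $(i + 1)
                if ($i == "wr") { n = $(i + 1); sub(/,$/, "", n) }
            }
            # Keep the largest write-error counter per device.
            if (n + 0 > max[dev] + 0) max[dev] = n
        }
        /error writing primary super block to device/ { devid[$NF] = 1 }
        END {
            for (d in max) print "bdev " d " max wr errors " max[d]
            for (id in devid) print "superblock write failed on devid " id
        }
    '
}

summarize_btrfs_errors <<'EOF'
[39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298374, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 60298380, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.747214] BTRFS error (device sda2): error writing primary super block to device 5
EOF
```

For this sample it reports /dev/sdb2 with 60298380 write errors and devid 5. The btrfs-device man page documents that `btrfs device remove` accepts a devid in place of a device path; whether removing devid 5 succeeds in this particular state has not been verified here.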
> This is a REALLY, REALLY bad RAID experience.
>
> How do I recover at this point?
>
>
> Tomasz Chmielewski