List: linux-poweredge
Subject: Re: [Linux-PowerEdge] 2 predicted failure disks and RAID5
From: Stephen Dowdy <sdowdy () ucar ! edu>
Date: 2017-11-14 19:17:09
Message-ID: 10d8a69e-1879-3886-4e3b-88a16a1a0419 () ucar ! edu
On 11/14/2017 11:52 AM, Grzegorz Bakalarski wrote:
> Thanks for valuable input.
> Regarding punctured block: from fwtermlog I got several (not much) lines of type:
>
> 11/13/17 3:24:45: EVT#08603-11/13/17 3:24:45: 97=Puncturing bad block on PD 02(e0x20/s2) at 9ecd
That's bad. You have a punctured stripe.
> T35: maintainPdFailHistory=0 disablePuncturing=0 zeroBasedEnclEnumeration=1 disableBootCLI=1
This is an informational line indicating that the controller doesn't have the disablePuncturing config option set.
> All the same PD, the same bad block (different time)
>
> Is my raid useless?
No, it's good enough to recover what data you can before you rebuild it. However, you can't trust the data that uses the bad block. You'll get a read error from any object that maps to it.
Here's a good doc Dell put out:
https://www.dell.com/support/article/us/en/4/438291#2
"...If the data within a punctured stripe is accessed, errors will continue to be reported against the affected bad LBAs with no possible correction available. Eventually (this could be minutes, days, weeks, months, etc.), the Bad Block Management (BBM) Table will fill up, causing one or more drives to become flagged as predictive failure..."
> BTW: why do you think RAID-level migration to RAID-6 with 2 additional disks would be better than with one disk? I would keep the VD size the same.
I'm not talking about a migration; I'm talking about a complete WIPE of what you have, and a recreation from scratch. At this point, you can recover what you can to a staging location, rebuild, then restore. Keep track of any data that produces I/O errors, because it's going to have a corrupted block at the punctured block address. This could (if you're lucky) be in unallocated space. It could also be in filesystem structures and lead to widescale corruption of the filesystem.
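One way to build that list of damaged data is to walk the (read-only-mounted) filesystem, read every file end to end, and record the ones that throw I/O errors. A minimal Python sketch, not from the original message; the /mnt/recover mount point is a placeholder:

```python
import os

def find_unreadable(root):
    """Walk 'root', read every regular file end to end, and return the
    paths that raise I/O errors -- candidates for data that maps onto
    the punctured LBA."""
    bad = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    while fh.read(1 << 20):  # read in 1 MiB chunks
                        pass
            except OSError as exc:
                bad.append((path, exc.strerror))
    return bad

if __name__ == "__main__":
    # Placeholder mount point for the read-only-mounted damaged volume.
    for path, err in find_unreadable("/mnt/recover"):
        print(f"{path}: {err}")
```

Each file is read exactly once (no retries), so this won't hammer a dying drive the way a retry-happy backup tool would.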
I would mount it all READ-ONLY and do a file-level dump (not a 'dd' or anything like that, which would migrate corrupted filesystem structures). I typically 'rsync' data to another machine. You don't want any backup tool that does infinite retries, as the repeated reads of the bad block will likely result in another disk failure.
> Anyway, will migration to RAID-6 fail with this "awful puncturing"???
RAID-6, with its 2 parity drives, is going to lessen the likelihood of a puncture. While you're rebuilding a RAID-5, any unrecoverable bad-block event on any of the "good" drives during the rebuild will result in a puncture; with RAID-6, you still have a second parity to cope with an uncorrectable error.
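To make the parity argument concrete, here is a toy illustration (my own sketch, not the controller's actual code) of RAID-5's single XOR parity: per stripe it can reconstruct exactly one missing chunk, so a second unreadable chunk during a rebuild leaves no equation to solve -- that's the puncture. RAID-6 adds a second, independent Reed-Solomon syndrome precisely so one more failure per stripe is survivable.

```python
from functools import reduce

def xor_parity(chunks):
    """RAID-5 stores a single XOR parity chunk per stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

def rebuild_one(lost_index, chunks, parity):
    """Recover one lost chunk by XOR-ing the parity with the survivors.
    If a *second* chunk in the same stripe is also unreadable (a UCE
    during rebuild), there is nothing left to XOR against -- punctured."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_index]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(parity, *survivors))

stripe = [b"AAAA", b"BBBB", b"CCCC"]  # three data chunks in one stripe
parity = xor_parity(stripe)

# One failed drive: chunk 1 is rebuilt perfectly from parity + survivors.
assert rebuild_one(1, stripe, parity) == b"BBBB"
```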
The above is especially true of some of the less reliable Seagate drives from past years. You can't count on them not throwing UCEs during a rebuild (or before you get the replacement drive installed), thereby puncturing the RAID. :-(
--stephen
--
Stephen Dowdy - Systems Administrator - NCAR/RAL
303.497.2869 - sdowdy@ucar.edu - http://www.ral.ucar.edu/~sdowdy/
_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge@dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge