
List:       evms-devel
Subject:    Re: [Evms-devel] Problems with RAID5 when switching between 1.2 and 2.0.1
From:       "Steve Dobbelstein" <steved () us ! ibm ! com>
Date:       2003-06-30 15:07:16


Pontus Lidman wrote:
> Hello,
>
> I've recently been upgrading to evms 2.0.1 from evms 1.2 and ran into
> some troubles. When 2.0.1 failed for various reasons, I wanted to
> revert back to 1.2. However, 1.2 has some problems discovering all the
> segments in my RAID5 array. I'm under the impression that switching
> back from 2.0.1 to 1.2 should be possible to do.
>
> My raid array consists of hdd1, hdb and sdb1. It doesn't want to use
> sdb1 for some reason; the array runs in degraded mode; it doesn't try
> to reconstruct anything on sdb1.
>
> When I run the evms command line tool, I see some errors:
> MDRaid5RegMgr: Region md/md0 object index 2 has incorrect major/minor.

I would expect the major/minor numbers to be incorrect.  EVMS 2 uses
device-mapper to create the devices for the partitions, and the
device-mapper major/minor numbers for those partitions are not the same as
the traditional partition major/minor numbers.  EVMS 2 updates the MD
superblocks to use the current (device-mapper) major/minor numbers for the
devices.  So, if you migrate back from EVMS 2 to EVMS 1, the MD
superblocks will contain device-mapper major/minor numbers, which EVMS 1
will not recognize since it doesn't use device-mapper.  EVMS 1 will then
complain that the major/minor numbers are incorrect.
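
If you want to see the mismatch for yourself before deciding anything,
something along these lines should show it (just a sketch -- mdadm and
dmsetup are not EVMS tools, so use whatever you have handy and adjust the
device names):

    mdadm --examine /dev/sdb1   # major/minor of each member as recorded in the MD superblock
    ls -l /dev/sdb1             # major/minor of the traditional partition node (normally 8, 17 for sdb1)
    dmsetup ls                  # device-mapper devices EVMS 2 created, with their major/minor numbers

If the numbers recorded in the superblock match the device-mapper devices
rather than the traditional partition numbers, that is the mismatch EVMS 1
is complaining about.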

> MDRaid5RegMgr: Region md/md0 object index 2 is greater than nr_disks.

This doesn't look good.  The number of disks should not have changed.

> MDRaid5RegMgr: Region md/md0 object index 2 is faulty. Array may be degraded.

This doesn't look good either.  None of the disks should be marked faulty,
unless they were already faulty under EVMS 2.

> MDRaid5RegMgr: Region md/md0 has disk counts that are not correct.

Again, not good.

> MDRaid5RegMgr: MD region md/md0 has inconsistent metadata.  If you elect not to fix the region at this time, you may do so later.  Changes will not be written to disk until you select to commit the changes.
>
> The tool offers to fix this problem, but I can't predict what it will
> do to my array. Is it safe to answer 'Fix'?

Judging from the track record of other users who have seen similar
messages, I would not select "Fix".  So far, "Fix" has only made things
worse, i.e., left the array unusable.

> Any advice is appreciated.

I have been wanting to debug the problem with "Fix" -- mainly why it thinks
there is a problem in the first place, and then why it hoses the array
rather than fixing it.  Others who have run into this problem have selected
"Fix" and ended up with unusable arrays.  Unfortunately, it is impossible
to debug what went wrong with "Fix" after the damage has been done.

Would you be willing to work with me and use your system for debugging?  It
would mainly involve applying patches for debug code, running the user
interface, and sending me the log.  If you are willing, you can start by
running the EVMS user interface with "-d everything" (e.g., "evmsgui -d
everything") to set the debug level to log everything, and then send me the
log.  The default log for EVMS 1 is /var/log/evmsEngine.log.  The default
log for EVMS 2 is /var/log/evms-engine.log.  Of course, this assumes that
you haven't "fixed" your array yet.
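
For example (just a sketch; the gzip step is optional, but the log can get
fairly large at that debug level):

    evmsgui -d everything
    gzip -c /var/log/evmsEngine.log > evmsEngine.log.gz      # EVMS 1
    gzip -c /var/log/evms-engine.log > evms-engine.log.gz    # EVMS 2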

> Regards,
>
> Pontus
>
> Relevant messages from my kernel log are below:
>
> Jun 27 23:05:53 h90 kernel: evms: md core: OUT OF DATE, freshest: hdd1
> Jun 27 23:05:53 h90 kernel: evms: md core: kicking non-fresh sdb1 from array!
> Jun 27 23:05:53 h90 kernel: evms: md core: kick_rdev_from_array: (sdb1)
> Jun 27 23:05:53 h90 kernel: evms: md core: evms_md_analyze_sbs: [md0] found former faulty device [number=2]
> Jun 27 23:05:53 h90 kernel: evms: md raid5: raid5_run: device hdd1 operational as raid disk 1
> Jun 27 23:05:53 h90 kernel: evms: md raid5: raid5_run: device hdb operational as raid disk 0
> Jun 27 23:05:53 h90 kernel: evms: md raid5:  md0, not all disks are operational -- trying to recover array
> Jun 27 23:05:53 h90 kernel: evms: md raid5: raid5_run: raid level 5 set md0 active with 2 out of 3 devices, algorithm 2
> Jun 27 23:05:53 h90 kernel: evms: md raid5: RAID5 conf printout:
> Jun 27 23:05:53 h90 kernel: evms: md raid5:  --- rd:3 wd:2 fd:1
> Jun 27 23:05:53 h90 kernel: evms: md raid5:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:hdb
> Jun 27 23:05:53 h90 kernel: evms: md raid5:  disk 1, s:0, o:1, n:1 rd:1 us:1 dev:hdd1
> Jun 27 23:05:53 h90 kernel: evms: md raid5:  disk 2, s:0, o:0, n:2 rd:2 us:1 dev:<EVMS_NODE_NO_NAME>
> Jun 27 23:05:53 h90 kernel: evms: md raid5: RAID5 conf printout:
> Jun 27 23:05:53 h90 kernel: evms: md raid5:  --- rd:3 wd:2 fd:1
> Jun 27 23:05:53 h90 kernel: evms: md raid5:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:hdb
> Jun 27 23:05:53 h90 kernel: evms: md raid5:  disk 1, s:0, o:1, n:1 rd:1 us:1 dev:hdd1
> Jun 27 23:05:53 h90 kernel: evms: md raid5:  disk 2, s:0, o:0, n:2 rd:2 us:1 dev:<EVMS_NODE_NO_NAME>
> Jun 27 23:05:53 h90 kernel: evms: md core: recovery thread got woken up ...
> Jun 27 23:05:53 h90 kernel: evms: md core:  [md0] no spare disk to reconstruct array! -- continuing in degraded mode
> Jun 27 23:05:53 h90 kernel: evms: md core: recovery thread finished ...

Steve D.




