'RE: [Evms-devel] Problem with degraded RAID-1 regions on a Cluster'


List:       evms-devel
Subject:    RE: [Evms-devel] Problem with degraded RAID-1 regions on a Cluster
From:       "Thomas Guyot-Sionnest" <Thomas () zango ! com>
Date:       2006-01-11 17:53:50
Message-ID: E345C809C68668438936E25DB7EBF7FFB23F73 () seaex01 ! 180solutions ! com

> -----Original Message-----
> From: Steve Dobbelstein [mailto:steved@us.ibm.com]
> Sent: Tuesday, January 10, 2006 5:01 PM
> To: Thomas Guyot-Sionnest
> Cc: evms-devel@lists.sourceforge.net
> Subject: RE: [Evms-devel] Problem with degraded RAID-1 regions on a
> Cluster
> 
> "Thomas Guyot-Sionnest" <Thomas@zango.com> wrote on 12/22/2005 11:43:42 AM:
> 
> > Some more details on this: my solution did not really fix the problem.
> > As soon as I exported and reimported the Cluster container, the added
> > drive got lost (EVMS still had it as a faulty object), even once the
> > array was rebuilt.  With mdadm, the second drive was marked as "Empty".
> 
> Let me make sure I understand the sequence of events.  Was the drive
> listed as a faulty object after you ran, for example,
> mdadm --manage /dev/md2 --add /dev/evms/.nodes/cluster1/sdf
> and then exported and reimported the container?

As far as I recall, the drive just disappeared. I could try again next week
if you really need that information.

> > In evmsn, I can remove the drive, but can't add it back, even after
> > saving changes.
> 
> Are you trying to add the drive back with "Add active" or "Add spare"?  To
> rebuild a degraded array (RAID1 or RAID 5) you add the drive back as a
> spare.  The MD kernel code will then sync the new spare drive into the
> array.

When one drive gets marked as bad by MD, all MD-related options disappear in
evms*, and the duplicate name disappears at the same time. That's why I add
it back with mdadm.

Once I add it with mdadm, the MD array rebuilds, but as soon as I fail over
the cluster container the drive disappears, just as if I had never added it.
Waiting until the array is completely rebuilt before failing over doesn't
help.
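
To make "completely rebuilt" checkable before a failover, something like the
following could be used. It is only a rough sketch: the function name
md_is_degraded is made up for illustration, and it assumes the usual
/proc/mdstat layout where the status line of an array carries a
"[configured/active]" counter such as [2/1].

```shell
# md_is_degraded NAME [FILE]
# Hypothetical helper (not part of mdadm or EVMS).
# Returns 0 if array NAME has fewer active than configured members,
# 1 if all members are active, 2 if NAME was not found in FILE.
md_is_degraded() {
    name=$1
    file=${2:-/proc/mdstat}
    awk -v name="$name" '
        # Header line of the array, e.g. "md2 : active raid1 sdf[1] sde[0]"
        $1 == name && $2 == ":" { in_array = 1; next }
        # First following line with a counter such as "[2/1]"
        in_array && /\[[0-9]+\/[0-9]+\]/ {
            match($0, /\[[0-9]+\/[0-9]+\]/)
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            found = 1
            degraded = (n[2] + 0 < n[1] + 0)
            exit
        }
        # A blank line ends the entry for this array
        in_array && NF == 0 { exit }
        END {
            if (!found) exit 2
            exit (degraded ? 0 : 1)
        }
    ' "$file"
}
```

For example, "md_is_degraded md2 && echo 'md2 is still degraded'" could be
run on the node before failing over the container.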

> Having the double name of the cluster container in the name of the MD
> region is a bug in the EVMS MD code.  There should only be one instance.
> 
> Looking at the MD code, I see that it prepends the cluster container name
> to the array name each time it adds an object to the array.  Since you
> have two objects in the array, the name is prepended twice.
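
The prepend-per-object behaviour described above can be illustrated with a
small shell sketch. This is not the EVMS MD plugin code (which is C); both
function names are hypothetical, and the "container/name" format is only
assumed from the region names shown in this thread.

```shell
# Behaviour as described: the container name is prepended on every call,
# so two member objects yield a doubled prefix.
name_prepend_always() {
    printf '%s/%s\n' "$1" "$2"
}

# Guarded variant: prepend only when the name does not already start
# with "<container>/", so repeated calls leave the name unchanged.
name_prepend_once() {
    case $2 in
        "$1"/*) printf '%s\n' "$2" ;;
        *)      printf '%s/%s\n' "$1" "$2" ;;
    esac
}
```

Calling name_prepend_once twice with cluster1 and md/md2 gives
cluster1/md/md2 both times, whereas name_prepend_always applied twice gives
cluster1/cluster1/md/md2.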

You should probably check what makes EVMS add the MD options, since I don't
get them when I lose one drive.

> > > I could see:
> > >
> > > cluster1/md/md2   621.0 GB      X   MDRaid1RegMgr
> > > cluster2/md/md0   621.0 GB      X   MDRaid1RegMgr
> 
> Since only one disk was found for the array, the cluster container name
> was prepended only once, as it should be.
> 
> > > And I had no MD-related options in the context-menu of these regions.
> 
> That is strange.  If the disk is back on-line it should either be marked
> faulty, in which case the "Remove a faulty object" function should be
> available, or it should no longer be a part of the array, in which case
> the "Add spare to fix degraded array" or "Add spare object" function
> should be available.

I have none of these options when I lose a drive.

> > > This
> > > time, however, I tried recovering them with mdadm and it worked.
> > >
> > > mdadm --manage /dev/md2 --add /dev/evms/.nodes/cluster1/sdf
> > > mdadm --manage /dev/md0 --add /dev/evms/.nodes/cluster2/sde
> > >
> > > After that I could see the duplicate cluster name and MD options were
> > > back.
> > > The RAID-0 region recovered by itself and I lost all objects sitting on
> > > the LVM region that were using this MD region (as expected).
> > >
> > > It looks like some portion of the code doesn't recognize a degraded MD
> > > RAID-1 region on a cluster container. Do you have any idea what's
> > > going on?
> 
> Not immediately.  I'd like to fix the bug about naming MD regions that are
> in a cluster container.  I wouldn't be surprised if there were other bugs
> in the handling of cluster container objects.
> 
> Steve D.


Thanks,

Thomas
