
List:       lustre-discuss
Subject:    [Lustre-discuss] Redhat cluster failover
From:       gmontagner () sorint ! it (Giacomo Montagner)
Date:       2009-06-26 13:24:35
Message-ID: 1246022675.3957.13.camel () ragnarok ! mydomain ! org

Hi Daire, 
it looks good. Have you tried it yet? You might also check for the 

/proc/fs/lustre/obdfilter/<OST> 

entry, to see whether the OST is mounted and working properly. 
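
A rough sketch of how the two checks discussed in this thread (the obdfilter entry and /proc/fs/lustre/health_check) might be combined in the "status" branch of a script resource. This is untested against a real Lustre server. The function name ost_status and the LUSTRE_PROC override are made up for illustration, and the sketch assumes health_check reads "healthy" when nothing is wrong.

```shell
#!/bin/sh
# Hypothetical status helper for an RHCS "script" resource (a sketch only).
# LUSTRE_PROC is an assumed override so the check can be pointed at a
# test directory instead of the real /proc tree.
LUSTRE_PROC="${LUSTRE_PROC:-/proc/fs/lustre}"

# ost_status TARGET
# Returns 0 if TARGET (e.g. delta-OST0000) looks mounted and healthy,
# non-zero otherwise.
ost_status() {
    target="$1"

    # The obdfilter entry should only be present while the OST is mounted.
    [ -d "$LUSTRE_PROC/obdfilter/$target" ] || return 1

    # Assumption: health_check starts with "healthy" when no device
    # on this server has reported a problem.
    [ -r "$LUSTRE_PROC/health_check" ] || return 1
    grep -q '^healthy' "$LUSTRE_PROC/health_check" || return 1

    return 0
}
```

With the symlink naming proposed below (delta-OST0000 -> lustre.init), the status branch could call something like ost_status "$(basename "$0")" and pass its return code back to RHCS. Note that health_check covers the whole server, not a single OST, so one unhealthy target would fail the status of every service on that node.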

Bye, 
Giacomo


On Wed, 2009-06-24 at 11:45 +0100, Daire.Byrne at framestore.com wrote:
> Giacomo,
> 
> I had not considered using RHCS's mount filesystem plugin "fs.sh". I was
> thinking of just using the "script" plugin with mount/umount commands in
> it. As far as I can tell the main advantage of this is that it is trivial
> to add checks to the "status" return to notify RHCS when an OST has had a
> failure (e.g. /proc/fs/lustre/health_check). I have included a quick proof
> of concept (untested). My idea is to create symlinks to this script named
> after the OST devices (e.g. delta-OST0000 -> lustre.init) and then add
> them as script services in RHCS. Are there more rigorous checks that
> people do to check the health of a lustre mount other than just checking
> /proc/fs/lustre/health_check ? 
> 
> Daire
> 
> ----- "Giacomo Montagner" <gmontagner at sorint.it> wrote:
> 
> > On Tue, 2009-06-23 at 12:52 +0100, Daire.Byrne at framestore.com wrote:
> > > Hi,
> > > 
> > > I know that heartbeat is the preferred failover application for
> > > Lustre but I want to evaluate Redhat's cluster suite again. It used to
> > > be pretty ropey in the RHEL4 days but I'm led to believe it is much
> > > improved in RHEL5. I was wondering if anyone is currently using this
> > > with Lustre and if so could you share your init.d script to help get
> > > me started? Any other advice or thoughts gratefully accepted.
> > > 
> > > Regards,
> > > 
> > > Daire 
> > 
> > Hi! 
> > I'm using RHCS on RHEL 5.3 in a test environment (VMware virtual 
> > machines, nothing special) to fail over an MGS, an MDT and four OSTs 
> > across two VMs. It works pretty well. I only needed to modify the
> > original fs.sh resource agent script and disable almost every check.
> > The only surviving check, for now, is "it's mounted/it's not mounted".
> > I would like to rewrite the RA script to make it work better (with some 
> > effective check to see whether a target is really working as it should),
> > but I haven't had time yet. I attach the RA script. It's ugly, and some 
> > comments may be complete nonsense or out of place. And perhaps my
> > English often gets funny (let's say funny). 
> > I'm using LVM-HA to ensure no device gets mounted twice, but I think it
> > would be an unbearable overhead in a true production environment.
> > Maybe the Lustre MMP is enough.
> > 
> > Bye!
> > Giacomo
> > 
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at lists.lustre.org
> > > http://lists.lustre.org/mailman/listinfo/lustre-discuss

