
List:       opengfs-users
Subject:    [ogfs-users]Problems with opengfs + opendlm on RHEL 3
From:       Marc Swanson <mswanson () sonitrol ! net>
Date:       2004-07-12 20:58:29
Message-ID: 1089665908.10550.714.camel () wsmis3 ! sonitrol ! net

Hello,

To be brief: I cannot make opendlm and opengfs play nicely together on RHEL 3,
no matter what combination of kernels or CVS/production code I try. I run
into the same issues as the person who wrote this post:
http://sourceforge.net/mailarchive/message.php?msg_id=7300387

which seems to have no clear resolution. Essentially, my two nodes sit in
the RC_INIT_WAIT state forever. This is on a plain vanilla 2.4.22 kernel
with the patches applied. Also, the 'dlmdu' process dies with this
message in the logs: "Admin client pid[9985] closed device!  Shut down."

On a previous setup, where I ran the stock RHEL kernel patched (with many
modifications to fix incompatibilities with redhatisms), the dlmdu processes
would not die, but I'd get a kernel panic as soon as I tried to mount the
filesystems.

I guess my real question is: is anyone successfully running opengfs and
opendlm on RHEL, or on any other kernel with NPTL (which I suspect is the
root of some of these problems), such as Fedora Core? Where am I going
wrong? My heartbeat implementation, perhaps? I used the ultramonkey RPMs
for the heartbeat stack first, then tried compiling from source, and saw
the same problem on both setups (stuck in the RC_INIT_WAIT state; and yes,
I tried 'export LD_ASSUME_KERNEL=2.4' per the docs).
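In case it helps anyone reproduce this, the NPTL workaround I tried amounts
to something like the following (the init-script path is an assumption on my
part; only the environment variable itself comes from the docs):

```shell
# Force glibc to fall back to the old LinuxThreads library instead of
# NPTL before starting the cluster daemons, per the opendlm docs.
export LD_ASSUME_KERNEL=2.4
# Then start heartbeat/dlmdu in that same environment, e.g. (path assumed):
# /etc/init.d/heartbeat start
```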
 

Further details: our cluster nodes are HP DL380 servers connected via
shared SCSI (MSA 500) to a storage enclosure.

The nodes are installed with White Box Enterprise Linux 3, which is
essentially the same as Red Hat Enterprise Linux 3. We have tried
patching the stock kernel, which took numerous modifications to the
kernel patches as well as to the opengfs source, mostly due to
redhatisms such as sigmask_lock vs. sighand->siglock, etc.
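To give a flavor of the mechanical renames involved, a rough sketch follows
(the file glob is an assumption; the real patches touched more than this):

```shell
# RHEL 3's 2.4 kernel backports the 2.5-style current->sighand->siglock,
# where vanilla 2.4 (and thus the opengfs source) uses
# current->sigmask_lock. Renaming the accesses is one of the needed mods.
sed -i 's/->sigmask_lock/->sighand->siglock/g' opengfs/src/fs/*.c
```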

I _HAVE_ had success with memexp, but under testing I ran into a scary,
unrecoverable scenario: if you kill memexpd on the lock storage server,
that server kernel panics and the filesystem becomes unmountable (even
after an ogfsck, and even with -o lockproto=nolock). I had to wipe the
filesystem and start over to get it working again.

Any help is GREATLY appreciated!!

Thanks!


-Marc Swanson-




_______________________________________________
Opengfs-users mailing list
Opengfs-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opengfs-users
