'Re: [Evms-devel] Little Advice please! (long sorry)'


List:       evms-devel
Subject:    Re: [Evms-devel] Little Advice please! (long sorry)
From:       Ram Pai <linuxram () us ! ibm ! com>
Date:       2003-04-09 19:26:03

On Tue, 8 Apr 2003 rschapel@edtech.mcc.edu wrote:

> Greetings:
>   I am new comer to the world of EVMS, and the first thing I would like
> to say is "Thanks" Already I can see what this approach is going to
> allow us to do. Its going to be awesome, but since I am a new comer I
> was hoping to snag a little advice from people who have been using this
> for a while longer, or just have a better picture of it in their heads.
> I am still a little wishy on the terminology, since I started at LVM,
> shifted to EVMS (Thanks!!) for better cluster support and then read the
> original white paper, which seems to use different
> terminology. That being said the documentation has been GREAT,
> especially the install/user guides (Mucho cudos). My goal is to setup a
> clustered set of servers for file service. Its going to be an
> active-active solution with each sharing a different portion of the
> shared disk. My equipment:

Don't set it up as active-active unless the file-system on top of it
is cluster aware.

Currently there are hardly any cluster-aware file systems; OpenGFS is
working on this. So your best bet for now is to create volumes on
private containers (not on shared containers). I mean: create a private
container with a bunch of shared disks in it, and create volumes using
space from that private container.


> 
> 2 x Dell 2650's with Perc 3 DC's
>   1 set Mirrored 18 gig drives hooked up to a PERC 3 DI
> 1 x PowerVault 220 with 13 x 72Gig drives (hooked to both Perc 3 DC's)
> 
> I was thinking I am going to split the disk up into 3 main parts (1
> container) 200G, 200G, 400G (freespace) and then use the freespace
> expand the main partitions as the need arises.  I would then have
> fail-over for whatever time I needed to fix a down server,  I mean slow
> file service is better than no file service.

If it's a 2-node configuration, I suggest a STONITH device.

> 
> I currently have on order a set of external power switches (well soon
> anyways)
> 
> The PERC 3 DC's support a cluster mode in which they should both be able
> to access the disk at the same time, the 13 gig drives are going to be
> in a raid 5 configuration. I haven't done this yet, but I understand
> they can do this (anybody ever use this equipment?)

I assume you are going to use hardware RAID and not software RAID,
because the software RAID driver (md) is not cluster aware.

Also I assume you are going to split your RAID-5 disk into 3 disks,
and create 3 private containers with one disk each. This should work.


> 
> I downloaded and installed HA-Linux and fellow packages (rpm's) for my
> redhat 7.3 install (I have updated/kernel the kernel as required, again
> GREAT docs) They seemed to install correctly. Currently I just have it
> taking over the IP address of the other machine. I am currently in
> "test" mode becoming comfortable with EVMS/HA, so while there is a HB
> line between the two (2 NIC's and cross over) I am not sharing the disk
> yet. When I try to start EVMS I am receiving the following error:

From the description, I get the feeling that Linux-HA is not installed
properly, because it is failing to spawn the daemons.


I assume you have the following settings in the /etc/ha.d/ha.cf file on
both nodes, in the correct order:

respawn hacluster /usr/lib/heartbeat/ccm
respawn root /sbin/evmsd



> 
> Engine: The plug-in linuxha in module /lib/evms/ha-1.0.0.so failed to
> load.  The plug-in's setup_evms_plugin() function failed with error code
> 19: No such device
> 
> I ran the log in detail mode and notice this entry
> 
> {misc stuff}
> Apr 08 15:32:19 eslewis.mcc.edu Engine: load_plugins: Module to load is
> /lib/evms/csm-1.0.0.so Apr 08 15:32:19 eslewis.mcc.edu Engine:
> load_module_plugins: Loaded from /lib/evms/csm-1.0.0.so. Apr 08 15:32:19
> eslewis.mcc.edu Engine: load_module_plugins:   short name:  CSM Apr 08
> 15:32:19 eslewis.mcc.edu Engine: load_module_plugins:   long name:
> Cluster Segment Manager Apr 08 15:32:19 eslewis.mcc.edu Engine:
> load_module_plugins:   version:     1.0.0 {Misc stuff}
> Apr 08 15:32:19 eslewis.mcc.edu Engine: load_plugins: Module to load is
> /lib/evms/ha-1.0.0.so Apr 08 15:32:19 eslewis.mcc.edu Engine:
> load_module_plugins: Loaded from /lib/evms/ha-1.0.0.so. Apr 08 15:32:19
> eslewis.mcc.edu Engine: load_module_plugins:   short name:  linuxha Apr
> 08 15:32:19 eslewis.mcc.edu Engine: load_module_plugins:   long name:
> Cluster Manager LinuxHA Apr 08 15:32:19 eslewis.mcc.edu Engine:
> load_module_plugins:   version:     1.0.0 {misc stuff}
> 
> Apr 08 15:32:19 eslewis.mcc.edu linuxha: sm_cnct: Error: Daemon may not
> be running Apr 08 15:32:19 eslewis.mcc.edu linuxha: ece_init: Error

This indicates that the evmsd daemon is not running.


> connecting to ECE Daemon Apr 08 15:32:19 eslewis.mcc.edu Engine:
> engine_user_message: Message is: Engine: The plug-in linuxha in module
> /lib/evms/ha-1.0.0.so failed to load.  The plug-in's
> setup_evms_plugin() function failed with error code 19: No such device.
> 
> Does HA have to be actually running to start EVMS with cluster mode
> support? I guess this would make sense, I checked the ha.log and saw this:

HA needs to be running. This means that both the HA daemon and the CCM
daemon need to be running. To check whether the HA daemon's services are
available, you can run
/usr/lib/heartbeat/api_test
and to check whether the CCM daemon's services are available, you can run
/usr/lib/heartbeat/ccm_testclient

Without these services the evmsd daemon fails.
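
If you want to run both checks in one go, something like the following
could work. It is a rough sketch: the probe() helper and the timeout are
my own additions, and it only reports what each test client did (missing,
exit status, or still running). It does not try to interpret the result,
since I am not assuming anything about the clients' exit codes or about
whether they exit on their own.

```python
import os
import subprocess

# Paths exactly as given above; the labels are only for display.
TEST_CLIENTS = {
    "heartbeat API": "/usr/lib/heartbeat/api_test",
    "CCM": "/usr/lib/heartbeat/ccm_testclient",
}


def probe(path, timeout=10):
    """Run one test client and return a short description of what happened."""
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        return "not installed or not executable"
    try:
        result = subprocess.run(
            [path],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The client was killed by the timeout; report that and nothing more.
        return "still running after %d s" % timeout
    return "exited with rc %d" % result.returncode


if __name__ == "__main__":
    for label, path in TEST_CLIENTS.items():
        print("%-14s %s" % (label + ":", probe(path)))
```

Run it as root or as hacluster on each node while heartbeat is up.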


> 
> heartbeat: 2003/04/08_13:33:47 info: Link eslewis.mcc.edu:eth1 up.
> heartbeat: 2003/04/08_13:33:49 info: Local status now set to: 'active'
> heartbeat: 2003/04/08_13:33:49 info: Starting child client
> /usr/lib/heartbeat/ccm (504,65)
> heartbeat: 2003/04/08_13:33:49 info: Starting child client /sbin/evmsd (0,0)
> heartbeat: 2003/04/08_13:33:49 info: Starting /usr/lib/heartbeat/ccm as
> uid 504  gid 65 (pid 18305)
> heartbeat: 2003/04/08_13:33:49 info: Link esdev2:eth1 up.
> heartbeat: 2003/04/08_13:33:49 info: Starting /sbin/evmsd as uid 0  gid 0
> (pid 18306)
> heartbeat: 2003/04/08_13:33:49 info: Status update for node esdev2: status up
> heartbeat: 2003/04/08_13:33:49 WARN: Exiting /usr/lib/heartbeat/ccm
> process 18305 returned rc 1.
> heartbeat: 2003/04/08_13:33:49 info: Respawning client
> /usr/lib/heartbeat/ccm:
> heartbeat: 2003/04/08_13:33:49 info: Starting child client
> /usr/lib/heartbeat/ccm (504,65)
> heartbeat: 2003/04/08_13:33:49 info: Status update for node esdev2: status
> active
> heartbeat: 2003/04/08_13:33:49 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2003/04/08_13:33:49 WARN: Exiting /sbin/evmsd process 18306
> returned rc 19.
> 
> This keep repeating:
> heartbeat: 2003/04/08_13:33:52 info: Starting child client /sbin/evmsd (0,0)
> heartbeat: 2003/04/08_13:33:53 info: Starting /usr/lib/heartbeat/ccm as
> uid 504  gid 65 (pid 18464)
> heartbeat: 2003/04/08_13:33:53 WARN: Exiting /usr/lib/heartbeat/ccm
> process 18464 returned rc 1.
> heartbeat: 2003/04/08_13:33:53 info: Respawning client
> /usr/lib/heartbeat/ccm:
> heartbeat: 2003/04/08_13:33:53 info: Starting child client
> /usr/lib/heartbeat/ccm (504,65)
> heartbeat: 2003/04/08_13:33:53 info: Starting /sbin/evmsd as uid 0  gid 0
> (pid 18466)
> heartbeat: 2003/04/08_13:33:53 WARN: Exiting /sbin/evmsd process 18466
> returned rc 19.
> heartbeat: 2003/04/08_13:33:53 info: Respawning client /sbin/evmsd:
> 
> Finally I started getting errors on the respawn rate:
> heartbeat: 2003/04/08_13:33:59 WARN: Exiting /usr/lib/heartbeat/ccm
> process 18484 returned rc 1.
> heartbeat: 2003/04/08_13:33:59 ERROR: Client /usr/lib/heartbeat/ccm
> respawning too fast
> heartbeat: 2003/04/08_13:33:59 info: Starting /sbin/evmsd as uid 0  gid 0
> (pid 18486)
> heartbeat: 2003/04/08_13:33:59 WARN: Exiting /sbin/evmsd process 18486
> returned rc 19.
> heartbeat: 2003/04/08_13:33:59 ERROR: Client /sbin/evmsd respawning too fast

It's hard to tell why heartbeat's ccm daemon is shutting down.
Did you see any log messages from the ccm daemon in heartbeat's log?
Those would be a good indication of why the ccm daemon is failing.
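
To get an overview of which client is exiting with which return code, the
"Exiting ... returned rc N" warnings in ha.log can be tallied. This is a
minimal sketch based only on the log format quoted above; the function
name is made up, and it also strips "> " quote markers and rejoins
wrapped lines so it works on text pasted from a mail.

```python
import re
from collections import Counter

# Matches the heartbeat warning format seen in the quoted log.
EXIT_RE = re.compile(r"WARN: Exiting (\S+) process (\d+) returned rc (\d+)")


def summarize_exits(log_text):
    """Count (client path, return code) pairs from heartbeat exit warnings."""
    # Remove mail quote markers, then collapse all whitespace so that
    # entries wrapped across lines still match the pattern.
    lines = [re.sub(r"^>\s?", "", line) for line in log_text.splitlines()]
    flat = " ".join(" ".join(lines).split())
    return Counter((m.group(1), int(m.group(3))) for m in EXIT_RE.finditer(flat))
```

For the log in this thread it would show ccm exiting with rc 1 and evmsd
exiting with rc 19 each time, the same 19 ("No such device") that the
engine reports for the linuxha plug-in.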

Change the line in /etc/ha.d/ha.cf from

respawn hacluster /usr/lib/heartbeat/ccm
	to
respawn hacluster /usr/lib/heartbeat/ccm -dv

and restart the heartbeat daemon. Then see if you find any ccm-related
log messages.

If you don't, the approach I would recommend is:

1. Shut down heartbeat on both nodes.
2. Comment out the respawn lines in the /etc/ha.d/ha.cf file on both nodes.
	I mean:
#respawn hacluster /usr/lib/heartbeat/ccm
#respawn root /sbin/evmsd

3. Start heartbeat on both nodes.

4. su - hacluster
	and hand-start the ccm daemon:

	/usr/lib/heartbeat/ccm -dv

	See what the messages are.
	They should indicate the reason for the failure.
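
Step 2 can be scripted if you prefer not to edit the file by hand on both
nodes. The sketch below is an illustration only: the function name is
made up, it works on the text of ha.cf rather than on the file itself,
and writing the result back (after keeping a copy of the original) is
left to you.

```python
def comment_out_respawn(ha_cf_text):
    """Return ha.cf text with every active 'respawn' directive commented out."""
    out = []
    for line in ha_cf_text.splitlines():
        # Only lines whose first word is exactly 'respawn' are touched;
        # lines that are already commented out start with '#respawn'.
        if line.split()[:1] == ["respawn"]:
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"
```

Running it a second time on its own output changes nothing, so it is safe
to apply again if you are not sure whether a node was already done.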

> 
> 
> Well I am going to mess around with the other node, make sure I am running
> the right version of HA, and double check some things, any advice is
> welcome!

Great! Let us know what you find. 
Ram Pai
-- 
Ram Pai
linuxram@us.ibm.com
503-5783752
EVMS: http://www.sf.net/projects/evms
----------------------------------



_______________________________________________
Evms-devel mailing list
Evms-devel@lists.sourceforge.net
To subscribe/unsubscribe, please visit:
https://lists.sourceforge.net/lists/listinfo/evms-devel