
List:       lustre-discuss
Subject:    [Lustre-discuss] mds and oss failover
From:       cliffw () clusterfs ! com (cliff white)
Date:       2006-05-19 7:36:50
Message-ID: 445A5AE7.2030704 () clusterfs ! com

Anselm Strauss wrote:
> hi.
> 
> i've read about the configuration where mds and oss server are on the 
> same host. i was wondering if it is also possible to do failover in such 
> a scenario, especially if it's possible to:
> 
> 1) automatically/manually failover only one service (results in a 
> load-balanced scenario),
> 2) achieve a clean failover if a host crashes and both services must 
> failover at the same time?
> 
> has anyone tried it?

Hi Anselm,

Before 1.4.4 this could deadlock because OSTs had to be up before
MDSs, but those deadlocks were fixed in 1.4.6.  There could also be
problems at extreme loads if an MDS and an OSS are running on the same
system.

But we think it is safe to try. To answer your questions, let's call
your services 'mds-foo' and 'oss-foo'.

(1) You can fail over a single service in either of two ways:

Option 1 - copy your XML file to /etc/lustre/config.xml and create a
symlink, named after the service, that points to /etc/init.d/lustre.
Then you can run
# ./mds-foo stop     (or start)
or
# ./oss-foo stop     (or start)
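
A rough, untested sketch of the Option 1 setup follows. The names
link_service, INIT_SCRIPT and LINK_DIR are made up for illustration;
only the /etc/init.d/lustre path and the service names come from the
text above.

```shell
# Sketch only: INIT_SCRIPT and LINK_DIR are illustrative assumptions.
INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/lustre}
LINK_DIR=${LINK_DIR:-.}

link_service() {
    # Create LINK_DIR/<service> as a symlink to the init script, so the
    # script can be invoked under the service's name.
    ln -s "$INIT_SCRIPT" "$LINK_DIR/$1"
}
```

After copying your XML file to /etc/lustre/config.xml, something like
"link_service mds-foo" followed by "./mds-foo stop" would then
correspond to the commands shown above.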

Option 2 - if you look inside /etc/init.d/lustre, the current commands
look like this (assuming mds-foo):
(start) # lconf --service mds-foo <XML FILE>
(stop)  # lconf --service mds-foo --failover --cleanup <XML FILE>

which expand to these options:
(start) # lconf --group mds-foo --select mds-foo=HOSTNAME <XML FILE>
(stop)  # lconf --group mds-foo --select mds-foo=HOSTNAME --failover --cleanup <XML FILE>
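
If you want to script Option 2, a minimal sketch could look like the
following. It only prints the command line for a given service and
action, using the --service form shown above; the function name
lconf_cmd and the XML_FILE variable are assumptions for illustration,
not part of the Lustre tools.

```shell
# Sketch: print (do not run) the lconf command line for one service.
# XML_FILE defaults to the path mentioned for Option 1; adjust as needed.
XML_FILE=${XML_FILE:-/etc/lustre/config.xml}

lconf_cmd() {
    service=$1
    action=$2
    case "$action" in
        start) echo "lconf --service $service $XML_FILE" ;;
        stop)  echo "lconf --service $service --failover --cleanup $XML_FILE" ;;
        *)     echo "usage: lconf_cmd SERVICE start|stop" >&2
               return 1 ;;
    esac
}
```

For example, "lconf_cmd oss-foo stop" prints the stop command for
oss-foo, which you can inspect before running it by hand.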

(2) Both services can fail over together.

On the primary node, given the /etc/lustre/config.xml above,
'/etc/init.d/lustre' will stop and start all the services on the node.
It does this by matching the node's hostname against the service
descriptions.
On the secondary node you would have to start and stop both services
separately, because that implied match won't happen.

With most failover software I would always treat the MDS and OSS as
separate resources. That makes it simple to cover both the automatic
and manual cases with one set of scripts.
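
As a hedged illustration of handling each service separately on the
secondary node, here is a small sketch. The helper names (run,
start_all, stop_all) and the variables XML_FILE, SERVICES and DRY_RUN
are invented for this example; by default it only prints the lconf
commands instead of running them. Listing oss-foo first is just a
conservative choice given the old ordering issue mentioned earlier.

```shell
# Sketch of a takeover helper; names and defaults are assumptions.
XML_FILE=${XML_FILE:-/etc/lustre/config.xml}
SERVICES=${SERVICES:-"oss-foo mds-foo"}

run() {
    # With DRY_RUN=1 (the default here) only print the command.
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_all() {
    # Start each service on its own, since the hostname match won't apply.
    for svc in $SERVICES; do
        run lconf --service "$svc" "$XML_FILE" || return 1
    done
}

stop_all() {
    # Stop each service on its own, using the failover cleanup options.
    for svc in $SERVICES; do
        run lconf --service "$svc" --failover --cleanup "$XML_FILE" || return 1
    done
}
```

A failover framework could call the per-service commands individually
for a manual move of one service, or start_all/stop_all when the whole
node has to be taken over.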

It is worth noting that the MDS generally does quite a bit less work
than the OSS for most workloads. If the OSS is consuming your server,
moving the MDS off that server will in most cases reduce your load very
little, if at all, because the application typically does very few
metadata operations relative to data transactions. Thus the
'load-balancing' part of your scenario may not be very useful.

I think that CFS believes this works, and it is just because we haven't
had time to test this extensively that we do not officially "support"
it.


- Peter & Cliff -
> 
> sincerely,
> anselm strauss

