'Re: [Linux-ha-dev] [PATCH] change timeouts, startup behaviour ocf:heartbeat:ManageVE (OpenVZ VE clus'


List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] [PATCH] change timeouts, startup behaviour ocf:heartbeat:ManageVE (OpenVZ VE clus
From:       Dejan Muhamedagic <dejan () suse ! de>
Date:       2013-04-03 15:52:11
Message-ID: 20130403155210.GA3757 () squib

Hi,

On Thu, Mar 21, 2013 at 02:59:17PM +0000, Tim Small wrote:
> On 13/03/13 16:18, Dejan Muhamedagic wrote:
> > On Tue, Mar 12, 2013 at 12:58:44PM +0000, Tim Small wrote:
> > 
> > > The attached patch changes the behaviour of the OpenVZ virtual machine
> > > cluster resource agent, so that:
> > > 
> > > 1. The default resource stop timeout is greater than the hardcoded
> > > 
> > Just for the record: where is this hardcoded actually? Is it
> > also documented?
> > 
> 
> Defined here:
> 
> http://git.openvz.org/?p=vzctl;a=blob;f=include/env.h#l26
> 
> /** Shutdown timeout.
> */
> #define MAX_SHTD_TM             120
> 
> 
> 
> Used by env_stop() here:
> 
> http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c#l821
> <http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c;h=2da848d87904d9e572b7da5c0e7dc5d93217ae5b;hb=HEAD#l818>
>  
> 
> 
> for (i = 0; i < MAX_SHTD_TM; i++) {
>     sleep(1);
>     if (!vps_is_run(h, veid)) {
>         ret = 0;
>         goto out;
>     }
> }
> 
> kill_vps:
>     logger(0, 0, "Killing container ...");
> 
> 
> 
> Perhaps something based on wall time would be more consistent, and I can
> think of cases where users might want it to be a bit higher, or a bit
> lower, but currently it's just fixed at 120s.
> 
> 
> I can't find the timeout documented anywhere.

That makes it hard to reference in other software products. But
we can increase the advised timeout in the metadata anyway.
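
Purely as an illustration of the wall-time idea mentioned above (this is
not vzctl code and not part of the patch), a wait loop bounded by elapsed
time rather than by an iteration count might look like the sketch below.
The names wait_for_stop, still_running_fn and countdown_still_running are
hypothetical; in vzctl the check is done by vps_is_run(h, veid). The
sketch assumes a POSIX system and uses CLOCK_MONOTONIC so that clock
adjustments do not affect the measured interval.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns true while the container is still running. */
typedef bool (*still_running_fn)(void *ctx);

/* Seconds elapsed on the monotonic clock since `start`. */
static double elapsed_since(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - start->tv_sec)
         + (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/* Poll every `poll_ms` milliseconds until the container stops or
 * `timeout_s` seconds have elapsed. Returns 0 if it stopped in time,
 * -1 if the caller should fall back to killing it. */
int wait_for_stop(still_running_fn still_running, void *ctx,
                  double timeout_s, long poll_ms)
{
    struct timespec start, pause;

    assert(still_running != NULL);
    pause.tv_sec = poll_ms / 1000;
    pause.tv_nsec = (poll_ms % 1000) * 1000000L;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        if (!still_running(ctx))
            return 0;
        nanosleep(&pause, NULL);
    } while (elapsed_since(&start) < timeout_s);

    /* One last check after the deadline has passed. */
    return still_running(ctx) ? -1 : 0;
}

/* Demo stand-in for a real check: reports "running" for the number of
 * polls stored in *ctx, then reports "stopped". */
bool countdown_still_running(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return true;
    }
    return false;
}
```

With a loop of this shape the limit is a number of seconds passed in by
the caller, so it could be made configurable instead of being fixed at
120 iterations of sleep(1).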

> > > 2. The start operation now waits for resource startup to complete i.e.
> > > for the VE to "boot up" (so that the cluster manager can detect VEs
> > > which are hanging on startup, and also throttle simultaneous startups,
> > > so as not to overburden the node in question).  Since the start
> > > operation now does a lot more, the default start operation timeout has
> > > been increased.
> > > 
> > I'm not sure if we can introduce this just like that. It
> > significantly changes the agent's behaviour.
> > 
> 
> Yes.  I think it probably makes the agent's behaviour a bit more correct,
> but that depends on what your definition of a VE resource having "started"
> is, I suppose.  Currently the agent says that it has started
> as soon as it has begun the boot process, whereas with the proposed
> change, it would mean that it has started when it has booted up (which
> should imply "is operational").
> 
> Although my personal reason for the change was so that I had a
> reasonable way to avoid booting tens of VEs on the host machine at the
> same time, I can think of other benefits - such as making other
> resources depend on the fully-booted VE, or detecting the case where a
> faulty VE host node causes the VE to hang during start-up.
> 
> 
> I suppose other options are:
> 
> 1. Make start --wait the default, but make starting without waiting
> selectable using a RA parameter.
> 
> 2. Make start without waiting the default, but make --wait selectable
> using a RA parameter.
> 
> 
> I suppose that the change will break configurations where the
> administrator has hard-coded a short timeout, and this change is
> introduced as part of an upgrade, which I suppose is a bad thing...

Yes, it could be so. I think that we should go for option 2.

> > BTW, how does vzctl know when the VE is started?
> > 
> 
> The vzctl manual page says that 'vzctl start --wait' will "attempt to
> wait till the default runlevel is reached" within the container.

OK. Though that may mean different things depending on which
init system is running.

> > If the description above matches
> > the code modifications, then there should be three instead of
> > one patch.
> > 
> 
> Fair enough - I was being lazy!

Cheers,

Dejan

> 
> Tim.
> 
> -- 
> South East Open Source Solutions Limited
> Registered in England and Wales with company number 06134732.  
> Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
> VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309
> 

> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/

