
List:       vdsm-devel
Subject:    Re: [ovirt-devel] incoming migration semaphore (and possible SLA consequences)
From:       Martin Betak <mbetak () redhat ! com>
Date:       2015-09-30 11:49:03
Message-ID: 1130471755.60347620.1443613743584.JavaMail.zimbra () redhat ! com

+1

----- Original Message -----
From: "Tomas Jelinek" <tjelinek@redhat.com>
To: "Martin Sivak" <msivak@redhat.com>
Cc: "engine-devel@ovirt.org" <devel@ovirt.org>, "Martin Betak" <mbetak@redhat.com>,
    "Francesco Romani" <fromani@redhat.com>, "Martin Polednik" <mpolednik@redhat.com>,
    "Roy Golan" <rgolan@redhat.com>
Sent: Wednesday, September 30, 2015 1:15:29 PM
Subject: Re: incoming migration semaphore (and possible SLA consequences)



----- Original Message -----
> From: "Martin Sivak" <msivak@redhat.com>
> To: "Tomas Jelinek" <tjelinek@redhat.com>
> Cc: "engine-devel@ovirt.org" <devel@ovirt.org>, "Martin Betak" <mbetak@redhat.com>,
>     "Francesco Romani" <fromani@redhat.com>, "Martin Polednik" <mpolednik@redhat.com>,
>     "Roy Golan" <rgolan@redhat.com>
> Sent: Wednesday, September 30, 2015 10:41:42 AM
> Subject: Re: incoming migration semaphore (and possible SLA consequences)
> 
> > Please note that we would also like to enrich the scheduler to be aware of
> > max incoming migrations limit thus preventing the storms, but it is a
> > separate topic (no patches around yet).
> 
> This is easy to do, but please make sure you distinguish migration
> from a VM start.
> 
> It might also be better to only use the number of ongoing migrations
> for penalizing the host. In that case the storm would be smaller and
> an overloaded host would still be a possible migration destination
> when there is no better host to use (because of other constraints for
> example).
> 
> There is also the consideration of what happens when a Maintenance
> mode is triggered. The user might want the "storm" to happen to be
> able to save VMs from a compromised host before it fails completely.
> This might work fine when the scoring approach is used.

The combination of re-try on VDSM and a scoring approach on the engine sounds
good to me: the storms should not normally happen, and when they do, they are
intentional, so VDSM should perform the migration as requested by the engine.
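
To make the scoring idea concrete, here is a minimal sketch of penalizing a
host by its number of ongoing incoming migrations instead of filtering it out.
It is only an illustration: the Host fields, the penalty weight and the
"lower score is better" convention are assumptions for this example, not the
actual engine scheduler code. The penalty is applied only to migrations, not
to plain VM starts, as suggested above.

```python
from dataclasses import dataclass

# Hypothetical weight; a real scheduler policy would make this configurable.
PENALTY_PER_MIGRATION = 10


@dataclass
class Host:
    name: str
    base_score: int           # assumed convention: lower is better
    incoming_migrations: int  # ongoing incoming migrations on this host


def score(host, is_migration):
    """Return the host score; only migrations are penalized, not VM starts."""
    penalty = PENALTY_PER_MIGRATION * host.incoming_migrations if is_migration else 0
    return host.base_score + penalty


def pick_destination(hosts, is_migration=True):
    """Pick the best-scoring host; a busy host is penalized, never excluded."""
    if not hosts:
        raise ValueError("no candidate hosts")
    return min(hosts, key=lambda h: score(h, is_migration))
```

Because the busy host is only penalized, it remains a valid destination when
no better host exists, which is what keeps an intentional "storm" (for
example during Maintenance) possible.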

> 
> Martin
> 
> 
> On Tue, Sep 29, 2015 at 3:35 PM, Tomas Jelinek <tjelinek@redhat.com> wrote:
> > Hi all,
> > 
> > as part of the effort to enhance the migration convergence [1] we are
> > proposing a semaphore for incoming migrations [2] (similar to outgoing).
> > Its purpose is to protect the destination host from migration storms where
> > too many migrations are coming to it from different sources.
> > 
> > There are basically 3 ways how to do it (with pros/cons):
> > 
> > 1: when the destination host refuses the migration, the source host tries
> > it again later (since no migration takes forever, the migration will
> > eventually be able to start)
> > (+) pros:
> > (+) if the engine wants to migrate to a specific host (and only to that
> > host because the user picked it), then it only sends the command and the
> > migration will happen (now or later)
> > (+) will not interfere with engine re-runs since the migration will fail
> > only when there is a real issue
> > (+) will be consistent with the current outgoing semaphore (since the
> > outgoing semaphore also waits until it has capacity and then starts the
> > migration)
> > (+) VDSM is more autonomous because after the engine sends the command,
> > VDSM will do it even if the engine disappears at that moment
> > (-) cons:
> > (-) re-try on VDSM is not common
> > (-) if the user does not pick a specific destination and he just wants
> > to migrate the machine out of the source, waiting on the destination to
> > have capacity can be wasteful since failing the migration and picking a
> > different host could lead to better results
> > 
> > 2: when the destination host refuses the migration, the source host returns
> > to engine "migration failed" and the engine will have to handle it somehow
> > (+) pros:
> > (+) simpler vdsm (try to migrate, if the destination does not have
> > capacity, fail)
> > (+) lets the engine pick a different destination host
> > (-) cons:
> > (-) not consistent with the outgoing migration semaphore (since if there
> > are more VMs waiting for outgoing migrations semaphore, the migration
> > does not fail but waits)
> > (-) engine would have to handle different kinds of migration failure
> > reasons
> > (-) VDSM is not autonomous - if the engine disappears the migration will
> > not be started
> > (-) Here I'm not sure about the consequences to scheduler but I think it
> > would have to be reworked to accommodate the different kinds of re-run.
> > Any ideas from someone more familiar with this? Roy, Martin?
> > 
> > 3: (hybrid) - if the user picks a specific host, VDSM will use the first
> > option; if the user does not pick a specific host, VDSM will use the
> > second option
> > (+) pros:
> > (+) works well with both cases when the intention is to migrate the
> > machine TO A SPECIFIC host and when the intention is just to migrate
> > the VM out to ANY host
> > (-) cons:
> > (-) more complicated VDSM
> > (-) still will interfere with engine scheduling
> > (-) not consistent with current VDSM's outgoing semaphore
> > 
> > The currently proposed patch [2] is the first option.
> > 
> > Please note that we would also like to enrich the scheduler to be aware of
> > max incoming migrations limit thus preventing the storms, but it is a
> > separate topic (no patches around yet).
> > 
> > The question here is how VDSM should protect itself when a storm does
> > happen.
> > 
> > Any ideas?
> > 
> > Thank you,
> > Tomas
> > 
> > [1]: www.ovirt.org/Features/Migration_Enhancements
> > [2]: https://gerrit.ovirt.org/#/c/45954/
> > 
> > 
> 
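
For reference, the first option from the quoted proposal (the destination
refuses when it is at capacity, the source tries again later) could be
sketched roughly as below. This is not the code from patch [2]; the class
and function names, the retry delay and the optional attempt limit are all
assumptions made for illustration.

```python
import threading
import time


class DestinationHost:
    """Hypothetical destination side: caps concurrent incoming migrations."""

    def __init__(self, max_incoming):
        self._slots = threading.BoundedSemaphore(max_incoming)

    def try_accept(self):
        # Non-blocking: returns False (refuses) when no slot is free.
        return self._slots.acquire(blocking=False)

    def finish(self):
        # Free the slot once the incoming migration has ended.
        self._slots.release()


def migrate_with_retry(dest, do_migrate, retry_delay=0.01,
                       max_attempts=None, sleep=time.sleep):
    """Hypothetical source side: retry until the destination has capacity.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        if dest.try_accept():
            try:
                do_migrate()
            finally:
                dest.finish()
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise RuntimeError("destination still busy, giving up")
        sleep(retry_delay)
```

With max_attempts left as None the source waits as long as needed, which
mirrors how the outgoing semaphore behaves; setting a limit would turn a
long wait into a real failure that the engine can handle by re-running.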
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
