List:       mesos-dev
Subject:    Re: New scheduler API proposal: unsuppress and clear_filter
From:       Meng Zhu <mzhu () mesosphere ! com>
Date:       2018-12-11 7:22:03
Message-ID: CAAr6wux2XxVMwkT17XMZjLLeBwEW=qjh8tpZ+Ccr0PY6zKcv0A () mail ! gmail ! com


Thanks Ben. Some thoughts below:

> From a scheduler's perspective the difference between the two models is:
>
> (1) expressing "how much more" you need
> (2) expressing an offer "matcher"
>
> So:
>
> (1) covers the middle part of the demand quantity spectrum we currently
> have: unsuppressed -> infinite additional demand, suppressed -> 0
> additional demand, and now also unsuppressed w/ request of X -> X
> additional demand
>

I am not quite sure the middle ground (expressing "how much more") is
needed. Even with matchers, a framework may still find itself cycling
through several offers before finding the right resources. Setting an
"effective limit" will surely prolong this process. I guess the motivation
here is to avoid, e.g., sending too many resources to a just-unsuppressed
framework that only wants to launch a small task. I would say the
inefficiency of flooding the framework with offers is tolerable if the
framework rejects most offers in time, as we are still making progress.
Even in cases where such limiting is desired (e.g. the number of frameworks
is too large), I think it is more appropriate to rely on operators to
configure cluster priority, e.g. by setting limits, than to expect
individual frameworks to altruistically limit their own offers (while
still having pending work).
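
For illustration only, the three points on the demand spectrum quoted above
(suppressed, unsuppressed, unsuppressed with a request of X) could be
sketched as follows. This is not Mesos code: `RoleDemand` and
`additional_demand` are hypothetical names, and demand is reduced to a
single cpu quantity for brevity.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoleDemand:
    """Hypothetical per-role demand state of a framework (not a Mesos type)."""
    suppressed: bool = False
    # Hypothetical "how much more" request; None means no explicit request.
    requested_cpus: Optional[float] = None


def additional_demand(demand: RoleDemand) -> float:
    """Map the state onto the additional demand the allocator would assume."""
    if demand.suppressed:
        return 0.0                 # suppressed -> 0 additional demand
    if demand.requested_cpus is None:
        return math.inf            # unsuppressed -> infinite additional demand
    return demand.requested_cpus   # unsuppressed w/ request of X -> X
```

The "effective limit" concern above corresponds to the last branch: once X
is satisfied, the framework would stop receiving offers even if none of
them turned out to be usable.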


> (2) is a global filtering mechanism to avoid getting offers in an unusable
> shape
>

Yeah, as you mentioned, I think we all agree that adding global matchers to
filter out undesired resources is a good direction, which I think is what
matters most here. The small difference lies in how the framework should
communicate the information: via a more declarative approach, or by
exposing the global matchers to frameworks directly.
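
To make the matcher idea concrete, here is a minimal sketch of the condition
quoted further down in this thread,
(requests.matches(resources) && !filters.filtered(resources)), for the
simple case where each matcher is a resource quantity and any match
suffices. It is not Mesos code: the function names and the dict-based
resource representation are hypothetical, filters are simplified to
previously refused resource bundles, and treating an empty matcher list as
"match everything" is an assumption.

```python
from typing import Dict, List

# Hypothetical representation: scalar resource name -> amount (mem in GB).
Resources = Dict[str, float]


def contains(bigger: Resources, smaller: Resources) -> bool:
    """True if `bigger` has at least the amount of every resource in `smaller`."""
    return all(bigger.get(name, 0.0) >= amount
               for name, amount in smaller.items())


def should_offer(offer: Resources,
                 matchers: List[Resources],
                 refused: List[Resources]) -> bool:
    """Allocate only if some matcher matches and no filter applies."""
    # Assumption: no matchers means the framework expressed no request,
    # so every offer matches.
    matches = not matchers or any(contains(offer, m) for m in matchers)
    # Simplified filter: the offer is filtered if it fits entirely inside
    # a resource bundle the framework refused earlier.
    filtered = any(contains(r, offer) for r in refused)
    return matches and not filtered


# The matcher from the example in this thread: [1cpu, 10GB mem].
matchers = [{"cpus": 1.0, "mem": 10.0}]
usable = should_offer({"cpus": 4.0, "mem": 16.0}, matchers, [])
too_small = should_offer({"cpus": 0.5, "mem": 32.0}, matchers, [])
```

Under these assumptions `usable` is true and `too_small` is false, since the
second offer lacks the full cpu the matcher asks for.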


> They both solve inefficiencies we have, and they're complementary: a
> "request" could actually consist of (1) and (2), e.g. "I need an additional
> 10 cpus, 100GB mem, and I want offers to contain [1cpu, 10GB mem]".
>
> I'll schedule a meeting to discuss further. We should also make sure we
> come back to the original problem in this thread around REVIVE retries.
>
> On Mon, Dec 10, 2018 at 11:58 AM Benjamin Bannier <
> benjamin.bannier@mesosphere.io> wrote:
>
> > Hi Ben et al.,
> >
> > I'd expect frameworks to *always* know how to accept or decline offers in
> > general. More involved frameworks might know how to suppress offers. I
> > don't expect that any framework models filters and their associated
> > durations in detail (that's why I called them a Mesos implementation
> > detail) since there is not much benefit to a framework's primary goal of
> > running tasks as quickly as possible.
> >
> > > I couldn't quite tell how you were imagining this would work, but let
> > > me spell out the two models that I've been considering, and you can
> > > tell me if one of these matches what you had in mind or if you had a
> > > different model in mind:
> >
> > > (1) "Effective limit" or "give me this much more" ...
> >
> > This sounds more like an operator-type than a framework-type API to me.
> > I'd assume that frameworks would not worry about their total limit the
> > way an operator would, but instead care about getting resources to run a
> > certain task at a point in time. I could also imagine this being easy to
> > use incorrectly as frameworks would likely need to understand their total
> > limit when issuing the call which could require state or coordination
> > among internal framework components (think: multi-purpose frameworks like
> > Marathon or Aurora).
> >
> > > (2) "Matchers" or "give me things that look like this": when a
> > > scheduler expresses its "request" for a role, it would act as a
> > > "matcher" (opposite of filter). When mesos is allocating resources, it
> > > only proceeds if
> > > (requests.matches(resources) && !filters.filtered(resources)). The
> > > open ended aspect here is what a matcher would consist of. Consider a
> > > case where a matcher is a resource quantity and multiple are allowed;
> > > if any matcher matches, the result is a match. This would be
> > > equivalent to letting frameworks specify their own
> > > --min_allocatable_resources for a role (which is something that has
> > > been considered). The "matchers" could be more sophisticated: full
> > > resource objects just like filters (but global), full resource objects
> > > but with quantities for non-scalar resources like ports, etc.
> >
> > I was thinking in this direction, but what you described is more involved
> > than what I had in mind as a possible first attempt. I'd expect that
> > frameworks currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not
> > as a way to manage their filter state tracked in the allocator. Assuming
> > we have some way to express resource quantities (i.e., MESOS-9314), we
> > should be able to improve on `REVIVE` by providing a `REQUEST_RESOURCES`
> > which clears all filters for resources containing the requested resources
> > (or all filters if no explicit resource request). Even if that led to
> > more offers than needed it would likely still perform better than
> > `REVIVE` (or `CLEAR_FILTERS` which has similar semantics). If we keep the
> > scope of these calls narrow and clear we have freedom to be smarter in
> > the future internally.
> >
> > This should not only be pretty straight-forward to implement in Mesos,
> > but I'd imagine also map pretty well onto framework use cases (i.e., I
> > assume frameworks are interested in controlling the resources they are
> > offered, not in managing filters we maintain for them).
> >
> > > With regard to incentives, the incentive today for adhering to
> > > suppress is that your framework will be doing less processing of
> > > offers when it has no work to do and that other instances of your own
> > > framework as well as other frameworks would get resources faster. The
> > > second aspect is indeed indirect. The incentive structure with
> > > "request" / "demand" does indeed seem to be more direct (while still
> > > having the indirect benefit on other frameworks / roles): "I'll tell
> > > you what to show me so that I get it faster".
> >
> > Additionally, by potentially explicitly introducing filters as a
> > framework API concept, we ask the majority of framework authors to reason
> > about an aspect they didn't have to worry about up until then
> > (previously: "if work arrives, revive, and decline until an offer can be
> > accepted, then suppress"). If we provided them something which fits their
> > *current mental model* while also giving them more control, we have a
> > higher chance of it being globally useful and adopted than if we'd add an
> > expert-level knob.
> >
> > > However, as far as performance is concerned, we still need suppress
> > > adoption and not just request adoption. Suppress is actually the
> > > bigger performance win at the current time, unless we think that
> > > frameworks with no work would "effectively suppress" via requests
> > > (e.g. "no work? set a 0 request so nothing matches"). Note though,
> > > that "effectively suppressing" via requests has the same incentive
> > > structure as suppress itself, right?
> >
> > I was also wondering about how what I suggested would fit here as we have
> > two concepts controlling if and which offers a framework gets (a single
> > global flag for suppress, and a zoo of many fine-grained filters).
> > Currently we only expose `SUPPRESS`, `DECLINE`, and `REVIVE`. It seems
> > that explicitly adding framework control over filters to that might
> > restrict what we can do internally in the future. Right now the API gives
> > us some freedom in how we interpret declines: we could, e.g., merge
> > filters which expire at the same time, or even interpret filters on all
> > cluster resources interchangeably with a suppressed state (the API would
> > actually allow us to put a framework into suppressed state -- maybe for
> > some time -- even before it has seen all resources). If we exposed
> > filters we would lose some of that implementation freedom, and we should
> > make sure it is worth it.
> >
> > As for incentives, if we finally added `REQUEST_RESOURCES` we'd allow
> > frameworks to make their interaction with Mesos more declarative yet
> > conceptually not much harder. Even if we (Mesos) wouldn't be able to
> > implement optimal handling right away, it could already be useful
> > with an MVP implementation on the Mesos side. Also, it would open up
> > potential for future optimizations with frameworks already "speaking the
> > right protocol".
> >
> >
> >
> > Cheers,
> >
> > Benjamin
> >
> >
>

