[prev in list] [next in list] [prev in thread] [next in thread] 

List:       mesos-dev
Subject:    Re: Question about the deprecated policy after 1.0
From:       Yan Xu <xujyan () apple ! com>
Date:       2016-09-22 16:26:13
Message-ID: E3F1AB67-FD4B-4550-8020-899F15FB8714 () apple ! com
[Download RAW message or body]


Thanks @haosdent. Sorry there's been offline chats and I was waiting for that to \
circle back to the list.

Comments inlined. Besides this particular issue I think we all agree that we can \
improve the versioning doc to avoid confusion in the future. I'll send a RR for that \
but let's focus on this particular issue first.

> On Sep 14, 2016, at 7:42 PM, haosdent <haosdent@gmail.com> wrote:
> 
> Thx @yan's update!
> 
> > The remaining issue we'd like to discuss with the community is, of the things in \
> > we plan to support in v1 API and hope to deprecate in Mesos 2.0, should we slate \
> > them for deprecation before we have defined their replacement?
> 
> IMO, if we sure a feature would be deprecated in Mesos 2.0, we should deprecate it \
> immediately although could not give a clear replacement at that time. Then users \
> would think that feature is not recommended to use and avoid to use it.

There has been misunderstandings about whether the v1 HTTP health check API is \
supported but I thought that the eventual conclusion was that we are supporting it in \
1.x. See http://mesos.slackarchive.io/health-check/-/1472748463.000051/1473696580.000102/1473235632000088/ \
<http://mesos.slackarchive.io/health-check/-/1472748463.000051/1473696580.000102/1473235632000088/>


But this is not even about it.

If the API is totally not supposed to be used, no need to wait until 2.0, remove it \
today. If the API is supported in v1 and we are not sunsetting this feature, we need \
to provide an "upgrade path" whenever we plan to replace it with something else (in \
addition to conforming to the process here \
<http://mesos.apache.org/documentation/latest/versioning/>)

In either case, "deprecate this API in 2.0 without an alternative" doesn't make \
sense.

This is not a hypothetical issue - there are people who depend on this field.

> 
> Otherwise, we would encounter the same problem again:
> After few releases, we finish the better replacement of that old feature. \
> Unfortunately, we find it has been used for a lot of users because we didn' mark it \
> deprecated before. Then we have to deprecate it in Mesos 3.0. A worse case is we \
> forget and ignore that TODO item as time flows, and left this tech debt to other \
> guys. This is which we should avoid now if we could.

Think about the users who depend on this who find out that alternative API doesn't \
make into the v2 but the old API is removed from it in v2. What should they do?

I don't think we should ever do this.

Like I said, it's perfectly fine if we want to deprecate this in 2.0. To do that, \
let's make sure the alternative API makes into 2.0 first. If we fail to achieve that, \
why shouldn't we postpone the deprecation further?

> 
> Bases on the above two points, I think we should slate the deprecated feature \
> although we don't have defined their replacement. 
> On Thu, Sep 15, 2016 at 4:31 AM, Yan Xu <xujyan@apple.com \
> <mailto:xujyan@apple.com>> wrote: To follow up on this, after discussions we \
> (contributors and reviewers of MESOS-6110) have agreed to support the existing HTTP \
> health check API in Mesos 1.x. (See https://reviews.apache.org/r/51803 \
> <https://reviews.apache.org/r/51803>) 
> The remaining issue we'd like to discuss with the community is, of the things in we \
> plan to support in v1 API and hope to deprecate in Mesos 2.0, should we slate them \
> for deprecation before we have defined their replacement? 
> Two contrasting examples In /r/51803:
> `HTTPCheckInfo::type`:
> - v1 API: setting `type` is not required.
> - Proposal: Deprecate the above in 2.0, i.e., requiring it to be set.
> - This is clearly OK because it's clear to the users how to migrate to the new API \
> -> just set the `type`. 
> `HTTPCheckInfo::statuses`:
> - v1 API: This field can be set, even though not acted on by Mesos default \
>                 executors, they can be used by custom executors.
> - Proposal: Deprecate the above in 2.0, i.e., removing it.
> - IMO we cannot slate this API for deprecation in 2.0 right now because there's no \
> replacement defined yet. I think ultimately we don't want to be in the situation \
> where we remove `statuses` without replacing it with something else. If that's the \
> case, I don't think we need to rush to decide that it has to be deprecated in 2.0. \
> What if the replacement doesn't make it into 2.0? 
> Feel free to provide your feedback here or on https://reviews.apache.org/r/51803 \
> <https://reviews.apache.org/r/51803> where more context may help clarify things. 
> 
> > On Sep 6, 2016, at 12:05 PM, Yan Xu <xujyan@apple.com <mailto:xujyan@apple.com>> \
> > wrote: 
> > Should we think of this not as "whether this change should be subject to the \
> > current policy" but rather "whether this change presents an exception case that \
> > shouldn't be subject to the current policy"? 
> > We shouldn't make exceptions liberally but given that the community as a whole is \
> > new to the 1.0 world, we should allow the policy to evolve into something that \
> > works for both the developers and the frameworks writers/users. If we discover \
> > exceptions we should amend the policy document. 
> > The current policy on this is very clear on this (as Silas has cited): "The \
> > deprecation clock for vN-1 API will start as soon as we release "N.0.0" version \
> > of Mesos. We will strive to give enough time (e.g., 6 months) for \
> > frameworks/operators to upgrade to vN API before we stop supporting vN-1 API." 
> > In this case it translates to "Start the deprecation for this change with Mesos \
> > 2.0 and thrive to not break it until 2.0.0 release date + 6 months". 
> > Can we list the technical challenges that would make adhering to this policy not \
> > worthwhile? If this is the case, how should we change policy document \
> > accordingly? 
> > Separately from the details of this change but motivated by it, I think we need \
> > to improve the policy document on: Clearly state what in the versioned protobufs \
> > that don't constitute as the official API. IMO we should give the developers the \
> > flexibility to add to the API before the feature is fully developed (and they are \
> > not subject to the policy) but in the policy doc we should point out the words to \
> > look for for such exceptions, e.g., "Unimplemented. DO NOT USE." or \
> > "Experimental. Unstable API." Clearly state what to do with deprecations by \
> > developers and users. People will likely appreciate early heads-ups so perhaps \
> > the developer should be able to start warning people as soon as the change is \
> > made. e.g., print a warning message that references a future release: "This \
> > API/flag will be deprecated in Mesos 2.0... " To help folks discover deprecations \
> > we can have a live document that lists the deprecated features by version. \
> > Currently the CHANGELOG file only lists deprecation in the next release so \
> > there's not a place to put the deprecations for 2.0 when we are only at 1.1.0 \
> > (WIP). Thoughts?
> > 
> > Jiang Yan Xu 
> > 
> > On Tue, Sep 6, 2016 at 6:40 AM, Silas Snider <swsnider@apple.com \
> > <mailto:swsnider@apple.com> <mailto:swsnider@apple.com \
> > <mailto:swsnider@apple.com>>> wrote: Responses inline
> > 
> > > On Sep 6, 2016, at 1:33 AM, haosdent <haosdent@gmail.com \
> > > <mailto:haosdent@gmail.com> <mailto:haosdent@gmail.com \
> > > <mailto:haosdent@gmail.com>>> wrote: 
> > > Hi, Silas. Thanks a lot to help test the health check changes recently.
> > > 
> > > According to my understanding about your email, you mentioned two problems:
> > > 
> > > 1. The bug that broken exists HTTP/command health check caused by r50812 \
> > > <https://reviews.apache.org/r/50812 \
> > > <https://reviews.apache.org/r/50812><https://reviews.apache.org/r/50812 \
> > > <https://reviews.apache.org/r/50812>>> and r50996 \
> > > <https://reviews.apache.org/r/50996 <https://reviews.apache.org/r/50996> \
> > > <https://reviews.apache.org/r/50996 <https://reviews.apache.org/r/50996>>> 
> > > > It is now true that even with the proposed change (51560), we will still get \
> > > > tasks rejected with TASK_ERROR in 1.1.0, despite the same exact code working \
> > > > in 1.0.0. Even in the case of the command health checks, which are once again \
> > > > supported in 51560, we now get deprecation warnings, suggesting that mesos \
> > > > will again break us in 1.4.
> > > 
> > > As you mentioned, this is a bug and we definitely should fix before release \
> > > 1.1.0. I have updated r51560 <https://reviews.apache.org/r/51560 \
> > > <https://reviews.apache.org/r/51560> <https://reviews.apache.org/r/51560 \
> > > <https://reviews.apache.org/r/51560>>> yesterday and verify it fix the problem \
> > > via r51635. As you see in the r51560 <https://reviews.apache.org/r/51560 \
> > > <https://reviews.apache.org/r/51560> <https://reviews.apache.org/r/51560 \
> > > <https://reviews.apache.org/r/51560>>>, we make sure the protobuf compatible \
> > > again and didn't lose any fields. Would you help to double check if it fixes \
> > > your problem when you free? It would be highly appreciated that if you could \
> > > help to verify it. 
> > > After this bug fix, we could ensure all tasks with HTTP/command health check \
> > > are not when upgrading to 1.1.0. 
> > 
> > I see those changes now (I'm very very bad at the review board UI, so I'm sorry \
> > if it was always there and I missed it somehow). 
> > > 2. Should we make the `HealthCheck::type` required after v2 ?
> > > 
> > > To be honest, I think 6 months should be enough and it also should be changed \
> > > in v1 because it is a minor change and we didn't make it `required` in protobuf
> > > message level. We still keeping it `option` in protobuf message definition and
> > > add a check about it in Mesos code.
> > > But your concerns make sense as well, so let's see what other users/developers \
> > > say to see if we could make an agreement on this.
> > > 
> > 
> > This is an important point. It doesn't make sense to me that the compatibility \
> > policy is talking about only whether a protobuf field is optional or required — \
> > it seems to me that any change that takes a protocol exchange that did not result \
> > in a TASK_ERROR before, and changes it to cause a TASK_ERROR now, *is* making \
> > that protobuf field semantically required, whether or not the protobuf def says \
> > so. 
> > I'll also point out that there is no definition of ‘minor' change in the \
> > compatibility document, and therefore, whether or not a change appears to be \
> > ‘minor' under some rubric (and I agree that this change could seem minor), if \
> > it's part of the v1 mesos.proto, it affects downstream users (such as me, a \
> > writer of schedulers). 
> > > On Tue, Sep 6, 2016 at 1:18 PM, Silas Snider <swsnider@apple.com \
> > > <mailto:swsnider@apple.com> <mailto:swsnider@apple.com \
> > > <mailto:swsnider@apple.com>> <mailto:swsnider@apple.com \
> > > <mailto:swsnider@apple.com><mailto:swsnider@apple.com \
> > > <mailto:swsnider@apple.com>>>> wrote: There's a little history to this:
> > > 
> > > In https://reviews.apache.org/r/50812/ <https://reviews.apache.org/r/50812/> \
> > > <https://reviews.apache.org/r/50812/ <https://reviews.apache.org/r/50812/>> \
> > > <https://reviews.apache.org/r/50812/ <https://reviews.apache.org/r/50812/> \
> > > <https://reviews.apache.org/r/50812/ <https://reviews.apache.org/r/50812/>>> \
> > > <https://reviews.apache.org/r/50812/ <https://reviews.apache.org/r/50812/> \
> > > <https://reviews.apache.org/r/50812/ <https://reviews.apache.org/r/50812/>> \
> > > <https://reviews.apache.org/r/50812/ \
> > > <https://reviews.apache.org/r/50812/><https://reviews.apache.org/r/50812/ \
> > > <https://reviews.apache.org/r/50812/>>>>, on the 8th of August, the HTTP health \
> > > check message was changed to be entirely incompatible with the previous HTTP \
> > > health check message. Not only was its name changed (breaking compatibility \
> > > with anyone using the feature with libmesos), but the field tags were \
> > > rearranged, making it truly wire-format incompatible. This change also \
> > > introduced a ‘type' field to the HealthCheck message as an optional enum. 
> > > Next, in https://reviews.apache.org/r/50996/ \
> > > <https://reviews.apache.org/r/50996/> <https://reviews.apache.org/r/50996/ \
> > > <https://reviews.apache.org/r/50996/>> <https://reviews.apache.org/r/50996/ \
> > > <https://reviews.apache.org/r/50996/><https://reviews.apache.org/r/50996/ \
> > > <https://reviews.apache.org/r/50996/>>> <https://reviews.apache.org/r/50996/ \
> > > <https://reviews.apache.org/r/50996/> <https://reviews.apache.org/r/50996/ \
> > > <https://reviews.apache.org/r/50996/>> <https://reviews.apache.org/r/50996/ \
> > > <https://reviews.apache.org/r/50996/> <https://reviews.apache.org/r/50996/ \
> > > <https://reviews.apache.org/r/50996/>>>>, on the 13th of August, the health \
> > > checking code was changed to make the new ‘type' field mandatory — if the \
> > > protobuf field is not present, the mesos master rejects your task with \
> > > TASK_ERROR. 
> > > A colleague of mine was testing our internal scheduler against HEAD of mesos, \
> > > and discovered that any task they submitted was being rejected as TASK_ERROR, \
> > > since we were setting health checks, but not sending type. I filed MESOS-6110, \
> > > on the 30th of August, and haosdent huang has kindly created \
> > > https://reviews.apache.org/r/51560/ <https://reviews.apache.org/r/51560/> \
> > > <https://reviews.apache.org/r/51560/ <https://reviews.apache.org/r/51560/>> \
> > > <https://reviews.apache.org/r/51560/ \
> > > <https://reviews.apache.org/r/51560/><https://reviews.apache.org/r/51560/ \
> > > <https://reviews.apache.org/r/51560/>>> <https://reviews.apache.org/r/51560/ \
> > > <https://reviews.apache.org/r/51560/> <https://reviews.apache.org/r/51560/ \
> > > <https://reviews.apache.org/r/51560/>> <https://reviews.apache.org/r/51560/ \
> > > <https://reviews.apache.org/r/51560/> <https://reviews.apache.org/r/51560/ \
> > > <https://reviews.apache.org/r/51560/>>>> to try to fix this. 
> > > In the course of reviewing that fix, I noticed that it only addresses the case \
> > > of a command health check, and does not continue to support HTTP health checks \
> > > in the way they were in 1.0.0. This is a problem for our scheduler, as we have \
> > > ~always (before mesos actually added support) passed our HTTP health checks in \
> > > the message, depending on our custom executor to actually perform the check. It \
> > > is now true that even with the proposed change (51560), we will still get tasks \
> > > rejected with TASK_ERROR in 1.1.0, despite the same exact code working in \
> > > 1.0.0. 
> > > Even in the case of the command health checks, which are once again supported \
> > > in 51560, we now get deprecation warnings, suggesting that mesos will again \
> > > break us in 1.4. 
> > > It is my team's belief that the mesos compatibility guarantee, as documented on \
> > > this page: http://mesos.apache.org/documentation/latest/versioning/ \
> > > <http://mesos.apache.org/documentation/latest/versioning/> \
> > > <http://mesos.apache.org/documentation/latest/versioning/ \
> > > <http://mesos.apache.org/documentation/latest/versioning/>> \
> > > <http://mesos.apache.org/documentation/latest/versioning/ \
> > > <http://mesos.apache.org/documentation/latest/versioning/><http://mesos.apache.org/documentation/latest/versioning/ \
> > > <http://mesos.apache.org/documentation/latest/versioning/>>><http://mesos.apache.org/documentation/latest/versioning/ \
> > > <http://mesos.apache.org/documentation/latest/versioning/> \
> > > <http://mesos.apache.org/documentation/latest/versioning/ \
> > > <http://mesos.apache.org/documentation/latest/versioning/>> \
> > > <http://mesos.apache.org/documentation/latest/versioning/ \
> > > <http://mesos.apache.org/documentation/latest/versioning/> \
> > > <http://mesos.apache.org/documentation/latest/versioning/ \
> > > <http://mesos.apache.org/documentation/latest/versioning/>>>> would prohibit \
> > > this sort of change from occurring. Specifically, the ‘API Versioning' \
> > > section says "The API version is only bumped if we need to make a backwards \
> > > incompatible API change. We will strive to support a given API version for at \
> > > least a year." and under the ‘API compatibility' the change is considered to \
> > > be breaking if it would involve "Adding new required fields to existing \
> > > requests to "/scheduler"." 
> > > The proposed change does indeed add a new required field — ‘type' to the v1 \
> > > api, in the case of command health checks in 6 months, in the case of http \
> > > health checks, immediately. Therefore, it seems clear that this constitutes a \
> > > new ‘v2' api, and it's very clear that 6 months is too short, especially as \
> > > another part of the 'API Versioning' section says "The deprecation clock for \
> > > vN-1 API will start as soon as we release "N.0.0" version of Mesos. […]" 
> > > Please believe me, I understand the need to be able to change broken api and \
> > > implementation quickly, without spending years maintaining technical debt. This \
> > > is why I believe the mesos project decided to move to a model where the \
> > > internal protobufs are separate from the v1/v2/etc. protobufs, and \
> > > evolvers/devolvers are proposed. It seems clear that the right way of doing \
> > > this is to modify the internal protobuf to look the way you'd like (better \
> > > message name, clearer field order, etc.) and write an evolver from the v1 api \
> > > to the internal api. 
> > > Also, I think it's important to note that the compatibility guarantees I'm \
> > > citing are exactly the things that make it possible at all to write a scheduler \
> > > against mesos and actually use it in production. Deciding that this case is too \
> > > insignificant to really bother with the compatibility guarantees means that \
> > > you've just pushed the tech debt issue one level higher to the scheduler \
> > > writers. 
> > > I'm sorry this email ended up so long, but thank you for taking some time to \
> > > read it — I believe that this issue is critical to the ongoing health of the \
> > > mesos project. 
> > > 
> > > > On Sep 5, 2016, at 11:14 AM, haosdent <haosdent@gmail.com \
> > > > <mailto:haosdent@gmail.com> <mailto:haosdent@gmail.com \
> > > > <mailto:haosdent@gmail.com>> <mailto:haosdent@gmail.com \
> > > > <mailto:haosdent@gmail.com><mailto:haosdent@gmail.com \
> > > > <mailto:haosdent@gmail.com>>>> wrote: 
> > > > Hi, folks. As I mentioned in the previous email \
> > > > http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1 \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1> \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1 \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1>> \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1 \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1> \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1 \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1>>> \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1 \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1> \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1 \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1>> \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1 \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1><http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1 \
> > > > <http://search-hadoop.com/m/0Vlr6Ma9DWqzG3M1>>>>. We have added `type` in the \
> > > > `HealthCheck` protobuf definition in 1.1.0 and health checks without `type` \
> > > > specified will be deprecated since 1.1.0. 
> > > > For backwards compatibility, we still support the command health check if the
> > > > type is not specified for now. But we plan to make `type` become a required \
> > > > field and return `TASK_ERROR` if the type is not specified after 6 months. \
> > > > The question is if this meets the deprecated policy since 1.0 ? If 6 months \
> > > > is too short and we have to deprecate it after 2.0 ?
> > > > 
> > > > Looking forward the answers. Any concerns and questions are appreciated, \
> > > > thanks a lot! 
> > > > --
> > > > Best Regards,
> > > > Haosdent Huang
> > > 
> > > 
> > > 
> > > 
> > > --
> > > Best Regards,
> > > Haosdent Huang
> > 
> > 
> 
> 
> 
> 
> -- 
> Best Regards,
> Haosdent Huang



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic