
List:       mesos-user
Subject:    Re: Apache Mesos 0.19.0 Released
From:       Dick Davies <dick () hellooperator ! net>
Date:       2014-06-14 9:37:04
Message-ID: CAK5eLPSwOMhK2y4ocojCOoj70OQCaqPm8+eV14oHXSN8qDYV2w () mail ! gmail ! com

Thanks Ben, that's very useful.

My main reason for using ZooKeeper was simplicity (one less component
to worry about in the stack).
From what you've said, that was fool's gold, so I can see why you've
dropped that option.


On 13 June 2014 20:15, Benjamin Mahler <benjamin.mahler@gmail.com> wrote:
> Dick: Excellent question, the zookeeper backed registry was dropped for a
> few reasons:
> 
> (1) Znodes by default have a size limit of 1MB. This means if your cluster
> grows organically and the set of slaves surpasses 1MB, all subsequent
> storage operations will fail. You would not be able to add slaves to your
> cluster past this point. Compression helps, but does not solve it.
> 
> (2) To implement a scalable ZooKeeper backed storage layer, we need to be
> able to partition our data across znodes and perform atomic writes.
> (a) Partitioning is non-trivial and we don't know of any C++ libraries
> that do this already.
> (b) To my knowledge, before 3.4.x transactional support was missing and
> applications had to implement two-phase commit [1]. Complex! Even in 3.4.x
> the transactional support seems to limit total transaction data to 1MB, from
> the NOTE in [2].
> 
> (3) Alternatively, one can live with a simple, but operationally unfortunate
> implementation outlined in (1). But that means we would at least need to
> provide some tooling to make moving between state backends simple. Doable,
> but implies more work and support.
> 
> (4) ZooKeeper is currently the largest source of disruptions to our system
> availability, so becoming more reliant on it as a permanent storage backend
> was a bit worrisome. At Twitter we have had a lot more operational
> experience and confidence with the replicated log as a permanent storage
> backend.
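
To make point (1) concrete, here is a rough sketch of how a registry stored in a single znode runs into the size limit as the cluster grows. Everything in it is an assumption for illustration: the JSON record layout, the host names, and the helper names are made up (Mesos itself stores protobufs), and the limit is taken as roughly 1 MB.

```python
import json
import zlib

# Approximate default znode data limit (about 1 MB); treated here as an assumption.
ZNODE_LIMIT_BYTES = 1024 * 1024


def fake_registry(num_slaves: int) -> bytes:
    """Build a made-up registry payload. Hypothetical layout, not the Mesos format."""
    slaves = [
        {
            "id": f"20140613-000000-{i:010d}",
            "hostname": f"slave-{i}.example.com",
            "port": 5051,
        }
        for i in range(num_slaves)
    ]
    return json.dumps(slaves).encode("utf-8")


def fits_in_znode(payload: bytes, limit: int = ZNODE_LIMIT_BYTES) -> bool:
    """Return True if the payload is small enough to store in one znode."""
    return len(payload) < limit


if __name__ == "__main__":
    for n in (1_000, 20_000):
        raw = fake_registry(n)
        packed = zlib.compress(raw)
        print(f"slaves={n} raw={len(raw)}B fits={fits_in_znode(raw)} "
              f"compressed={len(packed)}B fits={fits_in_znode(packed)}")
```

Compression shrinks the payload, but the compressed size still grows with the number of slaves, so it only moves the point at which writes start failing.
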
> 
> To be clear, there's nothing stopping anyone from wiring up the existing
> ZooKeeper storage implementation in Mesos and providing it as an alternative
> to the replicated log. As soon as we provide two we should have tooling to
> allow people to move between them.
> 
> I hope this clarifies things!
> 
> [1]
> http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_twoPhasedCommit
> 
> [2]
> http://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi%28java.lang.Iterable%29
>  
> 
> On Fri, Jun 13, 2014 at 11:04 AM, Jie Yu <yujie.jay@gmail.com> wrote:
> > > 
> > > Largely because of a requirement to bring everything back up in a certain
> > > order
> > 
> > 
> > I don't think they need to be brought back up in a certain order. You just
> > need to restart all of them. The only requirement is that all masters should
> > be running at 0.19.0.
> > 
> > > I'd also be very interested in a zookeeper implementation
> > 
> > 
> > I think there is an issue with ZK impl. Ben Mahler probably can expand
> > here.
> > 
> > - Jie
> > 
> > 
> > On Fri, Jun 13, 2014 at 12:32 AM, Tom Arnfeld <tom@duedil.com> wrote:
> > > 
> > > Hey Dave (and the group),
> > > 
> > > I have to say for me it was a little fiddly to upgrade a 0.18.2
> > > cluster to 0.19.0. Largely because of a requirement to bring
> > > everything back up in a certain order (I had to lower the quorum count
> > > to 1) otherwise mesos failed to get a majority vote to initialise the
> > > log (I had 3 masters).
> > > 
> > > I'd also be very interested in a zookeeper implementation - and
> > > perhaps some improved documentation around the log.
> > > 
> > > Cheers,
> > > 
> > > Tom.
> > > 
> > > > On 13 Jun 2014, at 08:17, Dick Davies <dick@hellooperator.net> wrote:
> > > > 
> > > > I thought I read that there was going to be a registry implementation
> > > > backed by zookeeper;
> > > > does anyone know why that was dropped?
> > > > 
> > > > Really excited to see the containerizer features rolling in, but the
> > > > quorum looks at first glance
> > > > to make Mesos a little harder to operate
> > > > ("This means adding or removing masters must be done carefully! ") - I
> > > > understand the
> > > > benefits but was hoping we could get by with the zookeeper registry.
> > > > 
> > > > 
> > > > > On 13 June 2014 03:49, Dave Lester <davelester@gmail.com> wrote:
> > > > > Hi All,
> > > > > 
> > > > > Below is a blog post that Ben Mahler wrote as release manager for
> > > > > Mesos
> > > > > 0.19.0; it was published on the Mesos site today.
> > > > > 
> > > > > I know that not everyone follows @ApacheMesos Twitter (even though you
> > > > > > should!), so I wanted to make sure it was also shared on the user@ list.
> > > > > 
> > > > > Cheers,
> > > > > Dave
> > > > > 
> > > > > 
> > > > > Apache Mesos 0.19.0 Released
> > > > > 
> > > > > > The latest Mesos release, 0.19.0, is now available for download. This
> > > > > new
> > > > > version includes the following features and improvements:
> > > > > 
> > > > > > - The master now persists the list of registered slaves in a durable
> > > > > >   replicated manner using the Registrar and the replicated log.
> > > > > > - Alpha support for custom container technologies has been added with
> > > > > >   the ExternalContainerizer.
> > > > > > - Metrics reporting has been overhauled and is now exposed on
> > > > > >   <ip:port>/metrics/snapshot.
> > > > > > - Slave Authentication: optionally, only authenticated slaves can
> > > > > >   register with the master.
> > > > > > - Numerous bug fixes and stability improvements.
> > > > > 
> > > > > Full release notes are available on JIRA.
> > > > > 
> > > > > Registrar
> > > > > 
> > > > > Mesos 0.19.0 introduces the "Registrar": the master now persists the
> > > > > list of
> > > > > registered slaves in a durable replicated manner. The previous lack of
> > > > > durable state was an intentional design decision that simplified
> > > > > failover
> > > > > and allowed masters to be run and migrated with ease. However, the
> > > > > stateless
> > > > > design had issues:
> > > > > 
> > > > > > - In the event of a dual failure (slave fails while master is down), no
> > > > > >   lost task notifications are sent. This leads to a task running
> > > > > >   according to the framework but unknown to Mesos.
> > > > > > - When a new master is elected, we may allow rogue slaves to re-register
> > > > > >   with the master. This leads to tasks running on the slave that are not
> > > > > >   known to the framework.
> > > > > 
> > > > > Persisting the list of registered slaves allows failed over masters to
> > > > > detect slaves that do not re-register, and notify frameworks
> > > > > accordingly. It
> > > > > > also allows us to prevent rogue slaves from re-registering,
> > > > > terminating the
> > > > > rogue tasks in the process.
> > > > > 
> > > > > The state is persisted using the replicated log (available since
> > > > > 0.9.0).
> > > > > 
> > > > > External Containerization
> > > > > 
> > > > > As alluded to during the containerization / isolation refactor in
> > > > > 0.18.0,
> > > > > the ExternalContainerizer has landed in this release. This provides
> > > > > alpha
> > > > > level support for custom containerization.
> > > > > 
> > > > > Developers can implement their own external containerizers to provide
> > > > > support for custom container technologies. Initial Docker support is
> > > > > now
> > > > > available through some community driven external containerizers:
> > > > > Docker
> > > > > Containerizer for Mesos by Tom Arnfeld and Deimos by Jason Dusek.
> > > > > Please
> > > > > reach out on the mailing lists with questions!
> > > > > 
> > > > > Metrics
> > > > > 
> > > > > Previously, Mesos components had to use custom metrics code and custom
> > > > > HTTP
> > > > > endpoints for exposing metrics. This made it difficult to expose
> > > > > additional
> > > > > system metrics and often required having an endpoint for each
> > > > > libprocess
> > > > > Process (Actor) for which metrics were desired. Having metrics spread
> > > > > across
> > > > > endpoints was operationally complex.
> > > > > 
> > > > > We needed a consistent, simple, and global way to expose metrics,
> > > > > which led
> > > > > to the creation of a metrics library within libprocess. All metrics
> > > > > are now
> > > > > exposed via /metrics/snapshot. The /stats.json endpoint remains for
> > > > > backwards compatibility.
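
As a rough illustration of the single endpoint described above, the sketch below builds the /metrics/snapshot URL and parses a response. It assumes the endpoint returns a flat JSON object mapping metric names to numbers; the function names and the sample metric names and values are illustrative, not taken from the release notes.

```python
import json
from urllib.request import urlopen


def snapshot_url(host: str, port: int) -> str:
    """Build the snapshot URL for a master or slave at host:port."""
    return f"http://{host}:{port}/metrics/snapshot"


def parse_snapshot(body: str) -> dict:
    """Parse a snapshot body, assumed to be a flat JSON object of name -> value."""
    return json.loads(body)


def fetch_snapshot(host: str, port: int, timeout: float = 5.0) -> dict:
    """Fetch and parse the snapshot from a running Mesos process."""
    with urlopen(snapshot_url(host, port), timeout=timeout) as resp:
        return parse_snapshot(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up sample body so the sketch runs without a cluster.
    sample = '{"master/uptime_secs": 123.4, "master/slaves_active": 3}'
    for name, value in sorted(parse_snapshot(sample).items()):
        print(name, value)
```

Against a live cluster, `fetch_snapshot("localhost", 5050)` would query a master listening on port 5050; adjust the host and port to match your deployment.
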
> > > > > 
> > > > > Upgrading
> > > > > 
> > > > > For backwards compatibility, the "Registrar" will be enabled in a
> > > > > phased
> > > > > manner. By default, the "Registrar" is write-only in 0.19.0 and will
> > > > > be
> > > > > read/write in 0.20.0.
> > > > > 
> > > > > If running in high-availability mode with ZooKeeper, operators must
> > > > > now
> > > > > specify the --work_dir for the master, along with the --quorum size of
> > > > > the
> > > > > ensemble of masters. This means adding or removing masters must be
> > > > > done
> > > > > carefully! The best practice is to only ever add or remove a single
> > > > > master
> > > > > at a time and to allow a small amount of time for the replicated log
> > > > > to
> > > > > catch up on the new master. Maintenance documentation will be added to
> > > > > reflect this.
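
A small sketch of the quorum arithmetic, assuming the quorum should be a strict majority of the masters (consistent with the majority vote mentioned elsewhere in this thread for a 3-master cluster). The helper names are hypothetical; --work_dir and --quorum come from the text above, while --zk is the master's ZooKeeper flag and the paths and URL are example values.

```python
from typing import List


def majority_quorum(num_masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def master_flags(num_masters: int, work_dir: str, zk_url: str) -> List[str]:
    """Assemble the flags a master would be started with (illustrative only)."""
    return [
        f"--zk={zk_url}",
        f"--work_dir={work_dir}",
        f"--quorum={majority_quorum(num_masters)}",
    ]


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, "masters -> quorum", majority_quorum(n))
    print(" ".join(master_flags(3, "/var/lib/mesos", "zk://zk1:2181/mesos")))
```

Because the quorum is tied to the number of masters, adding or removing a master changes the required value, which is one reason to change the ensemble one master at a time.
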
> > > > > 
> > > > > Please refer to the upgrades document, which details how to perform an
> > > > > upgrade from 0.18.x.
> > > > > 
> > > > > Future Work
> > > > > 
> > > > > Thanks to the Registrar, reconciliation primitives can now be provided
> > > > > to
> > > > > ensure that the state of tasks between Mesos and frameworks is kept
> > > > > consistent. This will remove the need for frameworks to implement
> > > > > out-of-band task reconciliation to inspect the state of slaves.
> > > > > Reconciliation work is being tracked at MESOS-1407.
> > > > > 
> > > > > The addition of state through the Registrar opens up a rich set of
> > > > > possible
> > > > > features that were previously not possible due to the lack of
> > > > > persistent
> > > > > state in the master. These include:
> > > > > 
> > > > > Cluster maintenance primitives (MESOS-1474)
> > > > > Repair automation (MESOS-695)
> > > > > Global resource reservations
> > > > > 
> > > > > Getting Involved
> > > > > 
> > > > > We encourage you to try out this release, and let us know what you
> > > > > think and
> > > > > if you hit any issues on the user mailing list. You can also get in
> > > > > touch
> > > > > with us via @ApacheMesos or via mailing lists and IRC.
> > > > > 
> > > > > Thanks
> > > > > 
> > > > > Thanks to the 32 contributors who made 0.19.0 possible:
> > > > > 
> > > > > Ashutosh Jain, Adam B, Alexandra Sava, Anton Lindström, Archana
> > > > > kumari,
> > > > > Benjamin Hindman, Benjamin Mahler, Bernardo Gomez Palacio, Bernd
> > > > > Mathiske,
> > > > > Charlie Carson, Chengwei Yang, Chi Zhang, Dave Lester, Dominic Hamon,
> > > > > Ian
> > > > > > Downes, Isabel Jimenez, Jake Farrell, Jameel Al-Aziz, Jiang Yan Xu,
> > > > > Jie Yu,
> > > > > Nikita Vetoshkin, Niklas Q. Nielsen, Ritwik Yadav, Sam Taha, Steven
> > > > > Phung,
> > > > > Till Toenshoff, Timothy St. Clair, Tobi Knaup, Tom Arnfeld, Tom
> > > > > Galloway,
> > > > > Vinod Kone, Vinson Lee
> > 
> > 
> 

