
List:       cassandra-dev
Subject:    Re: [DISCUSS] A point of view on Testing Cassandra
From:       Benedict Elliott Smith <benedict@apache.org>
Date:       2020-07-16 9:26:30
Message-ID: B270CE47-C796-4E4B-8D67-1D8C00DD815B@apache.org

Thanks for getting the ball rolling.  I think we need to be a lot more specific,
though, and it may take some time to hash it all out.

For starters we need to distinguish between types of "done" - are we discussing:
 - Release
 - New Feature
 - New Functionality (for an existing feature)
 - Performance Improvement
 - Minor refactor
 - Bug fix

?  All of these (perhaps more) require unique criteria in my opinion.

For example:
 - New features should be required to include randomised integration tests that
   exercise all of the functions of the feature in random combinations and verify
   that the behaviour is consistent with expectation (a rough sketch of what I mean
   follows after this list).  New functionality for an existing feature should
   augment any existing such tests to include the new functionality in its random
   exploration of behaviour.
 - Releases are more suitable for many of your cluster-level tests, IMO, particularly
   if we get regular performance regression tests running against trunk (something
   for a shared roadmap)
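
To make the randomised integration test idea concrete, here is a minimal sketch of
the shape I have in mind.  The "feature" is a toy key/value store and the operations
are invented purely for illustration; the points that matter are the logged seed (so
any failure is reproducible), the random combination of operations, and the
cross-check against a trivially correct model:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;
    import java.util.TreeMap;
    import java.util.concurrent.ThreadLocalRandom;

    import org.junit.Assert;
    import org.junit.Test;

    public class RandomisedFeatureTest
    {
        enum Op { PUT, DELETE, GET }

        @Test
        public void randomisedOperations()
        {
            long seed = ThreadLocalRandom.current().nextLong();
            System.out.println("seed=" + seed);             // log the seed so failures can be replayed
            Random random = new Random(seed);

            Map<Integer, Integer> model  = new HashMap<>(); // trivially correct reference model
            Map<Integer, Integer> system = new TreeMap<>(); // stand-in for the feature under test

            for (int i = 0; i < 10_000; i++)
            {
                int key = random.nextInt(100);
                switch (Op.values()[random.nextInt(Op.values().length)])
                {
                    case PUT:
                    {
                        int value = random.nextInt();
                        model.put(key, value);
                        system.put(key, value);
                        break;
                    }
                    case DELETE:
                        model.remove(key);
                        system.remove(key);
                        break;
                    case GET:
                        Assert.assertEquals("seed=" + seed, model.get(key), system.get(key));
                        break;
                }
            }
            Assert.assertEquals("seed=" + seed, model, system);
        }
    }

The same structure scales to a real feature by replacing the stand-in with the system
under test, and the reference model with a simple statement of the feature's contract.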

Then, there are various things that need specifying more clearly, e.g.:

> Minimum 75% code coverage on non-boilerplate code
Coverage by what? In my model, randomised integration tests of the relevant feature,
but we need to agree specifically. Some thoughts:
 - The value of code coverage measures is unclear, but 75% is perhaps an acceptable
   arbitrary number if we want a lower bound
 - A more pertinent measure is options and behaviours (a sketch of what I mean follows
   after this list):
    - For a given system/feature/function, we should run with _every_ user option and
      every feature behaviour at least once;
    - Where tractable, exhaustive coverage (every combination of option, with every
      logical behaviour);
    - Where not possible, random combinations of options and behaviours.
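
To illustrate "exhaustive where tractable, random otherwise" mechanically, a small
sketch follows (the option names and the tractability cut-off are invented for the
example; a real harness would enumerate the feature's actual knobs):

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    public class OptionCoverage
    {
        // Hypothetical user options for some feature, purely for illustration.
        static final Map<String, List<String>> OPTIONS = new LinkedHashMap<>();
        static
        {
            OPTIONS.put("compression",    List.of("none", "lz4", "zstd"));
            OPTIONS.put("compaction",     List.of("stcs", "lcs", "twcs"));
            OPTIONS.put("durable_writes", List.of("true", "false"));
        }

        static final int EXHAUSTIVE_LIMIT = 1024; // arbitrary "tractable" cut-off
        static final int RANDOM_SAMPLES   = 256;

        // Every combination if the product of option values is small enough; otherwise a
        // fixed number of random combinations drawn from a seed that should be logged.
        public static List<Map<String, String>> combinationsToTest(long seed)
        {
            long product = 1;
            for (List<String> values : OPTIONS.values())
                product *= values.size();
            return product <= EXHAUSTIVE_LIMIT ? exhaustive() : randomSample(new Random(seed));
        }

        private static List<Map<String, String>> exhaustive()
        {
            List<Map<String, String>> combos = new ArrayList<>();
            combos.add(new LinkedHashMap<>());
            for (Map.Entry<String, List<String>> option : OPTIONS.entrySet())
            {
                List<Map<String, String>> next = new ArrayList<>();
                for (Map<String, String> partial : combos)
                    for (String value : option.getValue())
                    {
                        Map<String, String> combo = new LinkedHashMap<>(partial);
                        combo.put(option.getKey(), value);
                        next.add(combo);
                    }
                combos = next;
            }
            return combos;
        }

        private static List<Map<String, String>> randomSample(Random random)
        {
            List<Map<String, String>> combos = new ArrayList<>();
            for (int i = 0; i < RANDOM_SAMPLES; i++)
            {
                Map<String, String> combo = new LinkedHashMap<>();
                for (Map.Entry<String, List<String>> option : OPTIONS.entrySet())
                    combo.put(option.getKey(), option.getValue().get(random.nextInt(option.getValue().size())));
                combos.add(combo);
            }
            return combos;
        }
    }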

> - Some form of the above in mixed-version clusters
I think we need to include mixed-schema and modified-schema clusters as well, as this
is a significant source of bugs.

> aggressively adversarial scenarios
As far as chaos is concerned, I hope to bring an addition to in-jvm dtests soon that
should facilitate this for more targeted correctness tests, so problems can be
surfaced more rapidly and repeatably.  Also with much less hardware :)
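
For anyone who hasn't looked at the in-jvm dtests, a very rough sketch of the kind of
targeted test they enable is below.  It uses the existing
org.apache.cassandra.distributed API from memory, so treat the signatures as
approximate; the chaos additions I mention would build on this style of test by
injecting faults while the same assertions continue to hold:

    import org.apache.cassandra.distributed.Cluster;
    import org.apache.cassandra.distributed.api.ConsistencyLevel;
    import org.junit.Assert;
    import org.junit.Test;

    public class TargetedFaultTest
    {
        @Test
        public void writeSurvivesDroppedReplica() throws Throwable
        {
            // Three nodes in a single JVM; no external hardware required.
            try (Cluster cluster = Cluster.build(3).start())
            {
                cluster.schemaChange("CREATE KEYSPACE ks WITH replication = " +
                                     "{'class': 'SimpleStrategy', 'replication_factor': 3}");
                cluster.schemaChange("CREATE TABLE ks.tbl (pk int PRIMARY KEY, v int)");

                // Simulate a partition: drop every message from node 1 to node 3 during the write.
                cluster.filters().allVerbs().from(1).to(3).drop();
                cluster.coordinator(1).execute("INSERT INTO ks.tbl (pk, v) VALUES (1, 1)",
                                               ConsistencyLevel.QUORUM);

                // Heal the partition and check the row is visible at QUORUM from another coordinator.
                cluster.filters().reset();
                Object[][] rows = cluster.coordinator(2).execute("SELECT v FROM ks.tbl WHERE pk = 1",
                                                                 ConsistencyLevel.QUORUM);
                Assert.assertEquals(1, rows.length);
                Assert.assertEquals(1, rows[0][0]);
            }
        }
    }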


On 15/07/2020, 22:35, "Joshua McKenzie" <jmckenzie@apache.org> wrote:

    I like that the "we need a Definition of Done" seems to be surfacing. No
    directed intent from opening this thread but it seems a serendipitous
    outcome. And to reiterate - I didn't open this thread with the hope or
    intent of getting all of us to agree on anything or explore what we should
    or shouldn't agree on. That's not my place nor is it historically how we
    seem to operate. :) Just looking to share a PoV so other project
    participants know about some work coming down the pipe and can engage if
    they're interested.

    Brainstorming here to get discussion started, which we could drop in a doc
    and riff on, or discuss high-bandwidth w/collaborators interested in the topic:

       - Tested on clusters with N nodes (10? 50? 3?) <- I'd start at proposing
       min maybe 25
       - Tested on data set sizes >= <M>TB (Maybe 30 given the 25 node count
       w/current density)
       - Soak tested in aggressively adversarial scenarios w/proven correctness
       for 72 hours (fallout w/nodes down, up, bounce, GC pausing, major
       compaction, major repair, packet loss, bootstrapping, etc. We could come up
       with a list)
       - Some form of the above in mixed-version clusters
       - Minimum 75% code coverage on non-boilerplate code
       - Where possible (i.e. not a brand new semantic / feature), diff-tested
       against existing schemas making use of APIs in mixed version clusters as
       well as on new-version only clusters (in case of refactor / internal black
       box rewrite)

    Some discrete bars like the above for a definition of done may make sense.
    Any other ideas to add or differing points of view on what the #'s above
    should be? Or disagreement on the items in the list above?

    I hold all the above loosely, so don't hesitate to respond, disagree, or
    totally shoot down. Or propose an entirely different approach to
    determining a Definition of Done we could engage with.

    Last but not least, we'd have to make infrastructure like this available to
    the project at large for usage and validation on testing features, or this
    exercise will simply serve to deter engagement with the project outside a
    small subset of the population with resources to dedicate to this type of
    testing, which I think we don't want.

    On Wed, Jul 15, 2020 at 11:53 AM Benedict Elliott Smith <benedict@apache.org>
    wrote:

    > Perhaps you could clarify what you personally hope we _should_ agree as a
    > project, and what you want us to _not_ agree (blossom in infinite variety)?
    >
    > My view: We need to agree a shared framework for quality going forwards.
    > This will raise the bar to contributions, including above many that already
    > exist.  So, we then need a roadmap to meeting the framework's requirements
    > for past and future contributions, so that feature development does not
    > suffer too greatly from the extra expectations imposed upon them.  I hope
    > the framework and roadmap will be very specific and prescriptive in setting
    > their minimum standards, which can of course be further augmented as any
    > contributor desires.
    >
    > This seems to be the only way to come to an agreement about the point of
    > contention you raise: some people perceive an insufficient concern about
    > quality, others perceive a surplus of concern about quality.  Until we
    > agree quite specifically what we mean, this tension will persist.  I also
    > think it's a great way to improve project efficiency, if a contributor so
    > cares: resources can be focused on the shared requirements first, since
    > they're the "table stakes".
    >
    > Could you elaborate what you would prefer to leave out of this in your
    > "Definition of Done"?
    >
    >
    > On 15/07/2020, 16:28, "Joshua McKenzie" <jmckenzie@apache.org> wrote:
    >
    >     >
    >     > This section reads as very anti-adding tests to test/unit; I am 100%
    > in
    >     > favor of improving/creating our smoke, integration, regression,
    >     > performance, E2E, etc. testing, but don't think I am as negative to
    >     > test/unit, these tests are still valuable and more are welcome.
    >
    >     I am a strong proponent of unit tests; upon re-reading the document I
    > don't
    >     draw the same conclusion you do about the implications of the
    >     verbiage, however it's completely reasonable to have a point of view
    > that's
    >     skeptical of people on this project's dedication to rigor and quality.
    > :) I
    >     think it's critical to "name and tame" the current architectural
    >     constraints that undermine our ability to thoroughly unit test, as
    > well as
    >     understand and mitigate the weaknesses of our current unit testing
    >     capabilities. A discrete example - attempting to "unit test" anything
    > in
    >     the CommitLog largely leads to the entire CommitLog package spinning
    > up,
    >     which drags in other packages, and before you know it you have multiple
    >     modules up and running thanks to the dependency tree. This is something
    >     myself, Jason, Stupp, Branimir, and others have all repeatedly burned
    > time
    >     on trying to delicately walk through re: test spin up and tear down.
    > This
    >     has ramifications far beyond just the time lost by engineers; the
    >     opportunity cost of that combined with the fragility of systems means
    > that
    >     what testing we *do* perform is going to be constrained in scope
    > relative
    >     to a traditional battery against a stand-alone, modularized artifact.
    >
    >     Any and all contribution to *any* testing is strongly welcomed by all
    > of us
    >     on the project. In terms of "where I and a few others are going to
    > choose
    >     to invest our efforts" right now, accepting the current shortcomings
    > of the
    >     system to make as much headway on the urgent + important is where we're
    >     headed.
    >
    >     I think it's more important that we set a standard for the project
    > (e.g.,
    >     > fundamental conformance to properties of the database) rather than
    >     > attempting to measure quality relative to other DBs
    >
    >     I'm sympathetic to this, but then the pragmatist in me hammers me down. In
    >     general, the adage "Software is never done; it is only released"
    > resonates
    >     for me as the core of what we have to navigate here. We will never be
    > able
    >     to state with 100% certainty that there is fundamental conformance to
    > the
    >     availability and correctness properties of the database; this
    > dissatisfying
    >     reality is why you have multiple teams implementing the software for
    >     spacecraft and then redundancies within redundancies in each system for
    >     unexpected failure scenarios and the unknown-unknown. In my opinion, we
    >     need a very clear articulation of our Definition of Done when it comes
    > to
    >     correctness guarantees (yes Ariel, you were right) as well as a more
    >     skillfully and deliberately articulated and implemented "failsafe" for
    > catching
    >     things and/or surfacing adverse conditions within the system upon
    > failure.
    >
    >     It's tricky because in the past (in my opinion) we've been pretty
    > remiss as
    >     a project when it comes to a devotion to correctness and rigor. The
    > danger
    >     I'm anecdotally seeing is that if we let that pendulum swing too far
    > in the
    >     other direction without successfully clearly defining what "Done" looks
    >     like from a quality perspective, that's an Everest we can all climb
    > and die
    >     on as a project.
    >
    >     On Wed, Jul 15, 2020 at 12:42 AM Scott Andreas <scott@paradoxica.net>
    > wrote:
    >
    >     > Thanks for starting discussion!
    >     >
    >     > Replying to the thread with what I would have left as comments.
    >     >
    >     > ––––––
    >     >
    >     > > As yet, we lack empirical evidence to quantify the relative
    > stability or
    >     > instability of our project compared to a peer cohort
    >     >
    >     > I think it's more important that we set a standard for the project
    > (e.g.,
    >     > fundamental conformance to properties of the database) rather than
    >     > attempting to measure quality relative to other DBs. That might be a
    > useful
    >     > measure, but I don't think it's the most important one. With regard
    > to
    >     > measuring against a common standard in the project, this is roughly
    > what I
    >     > had in mind when proposing "Release Quality Metrics" on the list in
    > 2018. I
    >     > still think making progress on something like this is essential
    > toward
    >     > defining a quantitative bar for release:
    >     > https://www.mail-archive.com/dev@cassandra.apache.org/msg13154.html
    >     >
    >     > > Conversely, the ability to repeatedly and thoroughly validate the
    >     > correctness of both new and existing functionality in the system is
    > vital
    >     > to the speed with which we can evolve its form and function.
    >     >
    >     > Strongly agreed.
    >     >
    >     > > Utopia (and following section)
    >     >
    >     > Some nods to great potential refactors to consider post-4.0 here. ^
    >     >
    >     > > We should productize a kubernetes-centric, infra agnostic tool
    > that has
    >     > the following available testing paradigms:
    >     >
    >     > This would be an excellent set of capabilities to have.
    >     >
    >     > > We need to empower our user community to participate in the testing
    >     > process...
    >     >
    >     > I really like this point. I took as a thought experiment "what would
    > feel
    >     > great to be able to say" if one were to write a product announcement
    > for
    >     > 4.0 and landed on something like "Users of Apache Cassandra can
    > preflight
    >     > their 4.0 upgrade by running $tool to clone, upgrade, and compare
    > their
    >     > clusters, ensuring that the upgrade will complete smoothly and
    > correctly."
    >     >
    >     > > The less friction and less investment we can require from ecosystem
    >     > participants, the more we can expect them to engage in desired
    > behavior.
    >     >
    >     > +1
    >     >
    >     > ––––––
    >     >
    >     > I like the document and there's a lot that has me nodding. Toward the
    >     > opening statement on "empirical evidence to quantify relative
    > stability,"
    >     > I'd love to revisit discussion on quantifying attributes like these
    > here:
    >     >
    > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=93324430
    >     >
    >     > – Scott
    >     >
    >     > ________________________________________
    >     > From: David Capwell <dcapwell@gmail.com>
    >     > Sent: Tuesday, July 14, 2020 6:23 PM
    >     > To: dev@cassandra.apache.org
    >     > Subject: Re: [DISCUSS] A point of view on Testing Cassandra
    >     >
    >     > I am also not fully clear on the motives, but welcome anything which
    > helps
    >     > bring in better and more robust testing; thanks for starting this.
    >     >
    >     > Since I can not comment in the doc I have to copy/paste and put
    > here... =(
    >     >
    >     > Reality
    >     > > ...
    >     > > investing in improving our smoke and integration testing as much
    > as is
    >     > > possible with our current constraints seems prudent.
    >     >
    >     >
    >     > This section reads as very anti-adding tests to test/unit; I am 100%
    > in
    >     > favor of improving/creating our smoke, integration, regression,
    >     > performance, E2E, etc. testing, but don't think I am as negative to
    >     > test/unit, these tests are still valuable and more are welcome.
    >     >
    >     > To enumerate a punch list of traits we as engineers need from a
    > testing
    >     > > suite
    >     >
    >     >
    >     > Would be good to speak about portability, accessibility, and version
    >     > independence.  If a new contributor wants to add tests to this suite
    > they
    >     > need to be able to run it, and it should run within a "reasonable"
    > time
    >     > frame; one of the big issues with python dtests is that it takes
    > 14+ hours
    >     > to run, this makes it no longer accessible to new contributors.
    >     >
    >     >
    >     > On Tue, Jul 14, 2020 at 11:47 AM Joshua McKenzie <
    > jmckenzie@apache.org>
    >     > wrote:
    >     >
    >     > > The purpose is purely to signal a point of view on the state of
    > testing
    >     > in
    >     > > the codebase, some shortcomings of the architecture, and what a
    > few of us
    >     > > are doing and further planning to do about it. Kind of a "prompt
    >     > discussion
    >     > > if anyone has a wild allergic reaction to it, or encourage
    > collaboration
    >     > if
    >     > > they have a wild positive reaction" sort of thing. Maybe a
    > spiritual
    >     > > "CEP-lite". :)
    >     > >
    >     > > I would advocate that we be very selective about the topics on
    > which we
    >     > > strive for a consistent shared point of view as a project. There
    > are a
    >     > lot
    >     > > of us and we all have different experiences and different points
    > of view
    >     > > that lead to different perspectives and value systems. Agreeing on
    >     > discrete
    >     > > definitions of done, 100% - that's table stakes. But agreeing on
    > how we
    >     > get
    >     > > there, my personal take is we'd all be well served to spend our
    > energy
    >     > > Doing the Work and expressing these complementary positions rather
    > than
    >     > > trying to bend everyone to one consistent point of view.
    >     > >
    >     > > Let a thousand flowers bloom, as someone wise recently told me. :)
    >     > >
    >     > > That said, this work will be happening in an open source repo with
    > a
    >     > > permissive license (almost certainly ASLv2), likely using github
    > issues,
    >     > so
    >     > > anyone that wants to collaborate on it would be most welcome. I
    > can make
    >     > > sure Gianluca, Charles, Berenguer, and others bring that to this ML
    >     > thread
    >     > > once we've started open-sourcing things.
    >     > >
    >     > > On Tue, Jul 14, 2020 at 4:25 AM Benedict Elliott Smith <
    >     > > benedict@apache.org>
    >     > > wrote:
    >     > >
    >     > > > It does raise the bar to critiquing the document though, but
    > perhaps
    >     > > > that's also a feature.
    >     > > >
    >     > > > Perhaps we can first discuss the purpose of the document? It
    > seems to
    >     > be
    >     > > a
    >     > > > mix of mission statement for the project, as well as your own
    > near term
    >     > > > roadmap?  Should we interpret it only as an advertisement of
    > your own
    >     > > view
    >     > > > of the problems the project faces, as a start to dialogue, or is
    > the
    >     > > > purpose to solicit feedback?
    >     > > >
    >     > > > Would it be helpful to work towards a similar document the whole
    >     > > community
    >     > > > endorses, with a shared mission statement, and a (perhaps loosely
    >     > > defined)
    >     > > > shared roadmap?
    >     > > >
    >     > > > I'd like to call out some specific things in the document that I
    > am
    >     > > > personally excited by: the project has long lacked a coherent,
    >     > repeatable
    >     > > > approach to performance testing and regressions; combined with
    > easy
    >     > > > visualisation tools this would be a huge win.  The FQL sampling
    > with
    >     > data
    >     > > > distribution inference is also something that has been discussed
    >     > > privately
    >     > > > elsewhere, and would be hugely advantageous to the former, so
    > that we
    >     > can
    >     > > > discover representative workloads.
    >     > > >
    >     > > > Thanks for taking the time to put this together, and start this
    >     > dialogue.
    >     > > >
    >     > > >
    >     > > > On 13/07/2020, 23:41, "Joshua McKenzie" <jmckenzie@apache.org>
    > wrote:
    >     > > >
    >     > > >     >
    >     > > >     > Can you please allow comments on the doc so we can leave
    >     > feedback.
    >     > > >     >
    >     > > >
    >     > > >
    >     > > >     > Doc is view only; figured we could keep this to the ML.
    >     > > >     >
    >     > > >     That's a feature, not a bug.
    >     > > >
    >     > > >     Happy to chat here or on slack w/anyone. This is a complex
    > topic so
    >     > > >     long-form or high bandwidth communication is a better fit
    > than gdoc
    >     > > >     comments. They rapidly become unwieldy.
    >     > > >
    >     > > >     On Mon, Jul 13, 2020 at 6:17 PM sankalp kohli <
    >     > > kohlisankalp@gmail.com>
    >     > > >     wrote:
    >     > > >
    >     > > >     > Can you please allow comments on the doc so we can leave
    >     > feedback.
    >     > > >     >
    >     > > >     > On Mon, Jul 13, 2020 at 2:16 PM Joshua McKenzie <
    >     > > > jmckenzie@apache.org>
    >     > > >     > wrote:
    >     > > >     >
    >     > > >     > > Link:
    >     > > >     > >
    >     > > >     > >
    >     > > >     >
    >     > > >
    >     > >
    >     >
    > https://docs.google.com/document/d/1ktuBWpD2NLurB9PUvmbwGgrXsgnyU58koOseZAfaFBQ/edit#
  >     > > >     > >
    >     > > >     > >
    >     > > >     > > Myself and a few other contributors are working with
    > this point
    >     > > of
    >     > > > view
    >     > > >     > as
    >     > > >     > > our frame of where we're going to work on improving
    > testing on
    >     > > the
    >     > > >     > project.
    >     > > >     > > I figured it might be useful to foster collaboration more
    >     > broadly
    >     > > > in the
    >     > > >     > > community as well as provide people with the opportunity
    > to
    >     > > > discuss work
    >     > > >     > > they're doing they may not yet have had a chance to
    > bring up or
    >     > > > open
    >     > > >     > > source. While fallout is already open-sourced, expect the
    >     > schema
    >     > > >     > anonymizer
    >     > > >     > > and some of the cassandra-diff + nosqlbench framework
    > effort to
    >     > > be
    >     > > >     > > open-sourced / openly worked on soon. Anyone that's
    > interested
    >     > in
    >     > > >     > > collaborating, that would be highly welcome.
    >     > > >     > >
    >     > > >     > > Doc is view only; figured we could keep this to the ML.
    >     > > >     > >
    >     > > >     > > Thanks.
    >     > > >     > >
    >     > > >     > > ~Josh
    >     > > >     > >
    >     > > >     >
    >     > > >
    >     > > >
    >     > > >
    >     > > >
    >     > > >
    >     > > >
    >     > >
    >     >
    >     >
    >     >
    >
    >
    >
    >
    >



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

