List:       cassandra-dev
Subject:    Re: [DISCUSSION] Next release roadmap
From:       Benedict Elliott Smith <benedict () apache ! org>
Date:       2021-04-26 17:14:38
Message-ID: AECE111D-AED6-4918-B3CE-B0B6D200DFC6 () apache ! org

I think my earlier response vanished into the moderator queue. Just a few comments:

1) I think the Paxos latency (and correctness) improvements should land in
   4.0.x, as we have introduced a fairly significant regression and this work
   mostly resolves outstanding issues with LWTs today.

2) If we aim to deliver multi-partition LWTs in 4.x/5.0, we will likely want
   to pair this with work to reduce latency further, beyond the above work, as
   contention will become a more significant problem. Should I be involved in
   delivering multi-partition LWTs, I will also be aiming to deliver even
   lower latencies for the release they land in.

3) To support all of the above work, I also aim to deliver a Simulator
   facility for deterministically executing cluster workloads under
   adversarial scheduling (i.e. one that intercepts all message and thread
   events and evaluates them sequentially, in pseudorandom order), alongside
   linearizability verification built upon this. This work will include (or
   have as a prerequisite) significant clean-ups to internal functionality
   like executors, use of futures and other concurrency primitives, and
   mocking out of time and the filesystem.


On 23/04/2021, 14:46, "Benjamin Lerer" <b.lerer@gmail.com> wrote:

    Hi everybody,

    Thanks for all the responses. I went through the emails and aggregated the
    proposals to give us an idea of where we stand at this point.

    I only included the improvements in the list and left the bug fixes
    aside.
    Regarding bug fixes, I wonder whether we should hold a discussion every
    month to decide which important issues should be fixed first.
    I feel that we sometimes tend to forget old issues even if they are more
    important than some new ones.

    Do not hesitate to tell me if I missed something or misinterpreted some
    proposal.

    *Query side improvements:*

      * Storage Attached Index or SAI. The CEP can be found at
    https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
      * Add support for OR predicates in the CQL where clause
      * Allow aggregating by time intervals (CASSANDRA-11871) and allow UDFs
    in the GROUP BY clause
      * Ability to read the TTL and WRITETIME of an element in a collection
    (CASSANDRA-8877)
      * Multi-Partition LWTs
      * Materialized views hardening: Addressing the different Materialized
    Views issues (see CASSANDRA-15921 and [1] for some of the work involved)

    *Security improvements:*

      * SSTables encryption (CASSANDRA-9633)
      * Add support for Dynamic Data Masking (CEP pending)
      * Allow the creation of roles that have the ability to assign arbitrary
    privileges, or scoped privileges without also granting those roles access
    to database objects.
      * Filter rows from system and system_schema based on the user's
    permissions (CASSANDRA-15871)

    *Performance improvements:*

      * Trie-based index format (CEP pending)
      * Trie-based memtables (CEP pending)
      * Paxos improvements: Paxos / LWT implementation that would enable the
    database to serve serial writes with two round-trips and serial reads with
    one round-trip in the uncontended case

    *Safety/Usability improvements:*

      * Guardrails. The CEP can be found at
    https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
      * Add ability to track state in repair (CASSANDRA-15399)
      * Repair coordinator improvements (CASSANDRA-15399)
      * Make incremental backup configurable per keyspace and table
    (CASSANDRA-15402)
      * Add ability to blacklist a CQL partition so all requests are ignored
    (CASSANDRA-12106)
      * Add default and required keyspace replication options (CASSANDRA-14557)
      * Transactional Cluster Metadata: Use of transactions to propagate
    cluster metadata
      * Downgrade-ability: Ability to downgrade in the event that a serious
    issue has been identified

    *Pluggability improvements:*

      * Pluggable schema manager (CEP pending)
      * Pluggable filesystem (CEP pending)
      * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can be
    found at
    https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
      * Memtable API (CEP pending). The goal being to allow improvements such
    as CASSANDRA-13981 to be easily plugged into Cassandra

    *Memtable pluggable implementation:*

      * Enable Cassandra for Persistent Memory (CASSANDRA-13981)

    *Other tools:*

      * CQL compatibility test suite (CEP pending)

    On Thu, Apr 22, 2021 at 16:11, Benjamin Lerer <b.lerer@gmail.com> wrote:

    >> Finally, I think it's important we work to maintain trunk in a shippable
    >> state.
    >
    >
    > I am +100 on this. Bringing Cassandra to such a state was a huge effort
    > and keeping it that way will help us to ensure the quality of the
    > releases.
    >
    > On Thu, Apr 15, 2021 at 17:30, Scott Andreas <scott@paradoxica.net>
    > wrote:
    >
    >> Thanks for starting this discussion, Benjamin!
    >>
    >> I share others' enthusiasm on this thread for improvements to secondary
    >> indexes, trie-based partition indexes, guardrails, and encryption at rest.
    >>
    >> Here are some other post-4.0 areas for investment that have been on my
    >> mind:
    >>
    >> – Transactional Cluster Metadata
    >> Migrating from optimistic modification and propagation of cluster
    >> metadata via gossip to a transactional implementation opens a lot of
    >> possibilities. Token movements and instance replacements get safer and
    >> faster. Schema changes can be made atomic, enabling users to execute DDL
    >> rapidly without waiting for convergence. Operations like expansions and
    >> shrinks become easier to automate with less care and feeding.
    >>
    >> – Paxos improvements
    >> During discussion on C-12126, Benedict expressed interest in post-4.0
    >> improvements that can be made to Cassandra's Paxos / LWT implementation
    >> that would enable the database to serve serial writes with two round-trips
    >> and serial reads with one round-trip in the uncontended case. For many
    >> cross-WAN serial use cases, this may halve the latency of CAS queries.
    >>
    >> – Multi-Partition LWTs
    >> LWT is a great primitive, but modeling applications with the constraint
    >> of single-key CAS can be a game of Twister. Extending the Paxos
    >> improvements discussed above to enable multi-partition CAS would enable
    >> users of Apache Cassandra to perform serial operations across partition
    >> boundaries.
    >>
    >> – Downgrade-ability
    >> I also see "downgradeability" as important to future new release
    >> adoption. Taking file format changes as an example, it's currently not
    >> possible to downgrade in the event that a serious issue has been identified
    >> – unless you're able to host-replace yourself out after upgrading one
    >> replica, or revert to a pre-upgrade snapshot and accept data loss. It would
    >> be excellent if it were possible for v.next to continue writing the
    >> previous SSTable/commitlog/hint/etc. format until a switch is flipped to
    >> opt into new file formats. Apache HDFS takes a similar approach, enabling
    >> downgrade until NameNode metadata is finalized [1]. This would be an
    >> excellent capability to have in Apache Cassandra, and would dramatically
    >> lower the stakes for new release adoption.
    >>
    >> On pluggability / disaggregation:
    >> I agree that these are important themes. We'll want to bring a lot of
    >> care and attention to this work. Disaggregation can open a lot of
    >> possibilities - with the drawback of future changes being restricted to the
    >> defined interface and an inability to optimize across interface boundaries.
    >> We can probably hit a sweet spot, though.
    >>
    >> Toolchains to validate implementations of pluggable components will
    >> become very important. It would be bad for the project's users if bundled
    >> implementations were of uneven quality or supported subsets of
    >> functionality. Converging on a common validation toolchain for pluggable
    >> subsystems can help us ensure that quality while minimizing the effort
    >> required to test new implementations.
    >>
    >> Finally, I think it's important we work to maintain trunk in a shippable
    >> state. This might look like major changes and new features hiding behind
    >> feature flags that enable users to selectively enable them as development
    >> and validation proceeds, with new code executed regardless of the flag held
    >> to a higher standard.
    >>
    >> Cheers,
    >>
    >> – Scott
    >>
    >> [1]
    >> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
    >>
    >>
    >> ________________________________________
    >> From: guo Maxwell <cclive1601@gmail.com>
    >> Sent: Wednesday, April 14, 2021 10:25 PM
    >> To: dev@cassandra.apache.org
    >> Subject: Re: [DISCUSSION] Next release roadmap
    >>
    >> +1
    >>
    >> Brandon Williams <driftx@gmail.com> wrote on Thu, Apr 15, 2021 at 4:48 AM:
    >>
    >> > Agreed.  Everyone just please keep in mind this thread is for roadmap
    >> > contributions you plan to make, not contributions you would like to
    >> > see.
    >> >
    >> > On Wed, Apr 14, 2021 at 3:45 PM Nate McCall <zznate.m@gmail.com> wrote:
    >> > >
    >> > > Agree with Stefan 100% on this. We need to move towards pluggability.
    >> > > Our users are asking for it, it makes sense architecturally, and people
    >> > > are doing it anyway.
    >> > >
    >> > >
    >> > > ...
    >> > > > for me definitely
    >> > > > https://issues.apache.org/jira/browse/CASSANDRA-9633
    >> > > >
    >> > > > I am surprised nobody mentioned this in the previous answers, there
    >> > > > is ~50 people waiting for it to happen and multiple people working on
    >> > > > it seriously and wanting that feature to be there for so so long.
    >> > > > ...
    >> >
    >> > ---------------------------------------------------------------------
    >> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
    >> > For additional commands, e-mail: dev-help@cassandra.apache.org
    >> >
    >> >
    >>
    >> --
    >> you are the apple of my eye !
    >>
    >>
    >>




