
List:       cassandra-dev
Subject:    Re: [DISCUSS] CEP-19: Trie memtable implementation
From:       Branimir Lambov <blambov () apache ! org>
Date:       2022-01-18 10:13:29
Message-ID: CABY-YwxO1wzt5F2R8nVkR5ZL9cBjZ8n9g7UEGTc4A1DznARbrA () mail ! gmail ! com

The memtable pluggability API (CEP-11) is per-table to enable memtable
selection that suits specific workflows. It also makes full sense to permit
per-node configuration, both to adapt the configuration to heterogeneous
deployments and to test changes for improvements such as this one.
Recognizing this, the patch comes with a modification to the API
<https://github.com/blambov/cassandra/commit/24b558ba2f71a2f040804e28993cc914b31298f5>
that defines memtable templates in cassandra.yaml (i.e. per node) and
allows the schema to select a template (in addition to being able to
specify the full memtable configuration). One could use this e.g. by adding:

memtable_templates:
    trie:
        class: TrieMemtable
        shards: 16
    skiplist:
        class: SkipListMemtable
memtable:
    template: skiplist

(which defines two templates and specifies the default memtable
implementation to use) to cassandra.yaml and specifying WITH memtable =
{'template' : 'trie'} in the table schema.
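
To make the lookup concrete, here is a minimal Python sketch of how a table's
memtable setting might be resolved against the templates defined on a node. It
is an illustrative model only, not the code in the patch: the function name
resolve_memtable and the dictionary layout are assumptions chosen to mirror the
YAML above.

```python
# Hypothetical model of the template lookup; names are illustrative and
# do not correspond to Cassandra's actual classes or methods.
NODE_CONFIG = {
    "memtable_templates": {
        "trie": {"class": "TrieMemtable", "shards": 16},
        "skiplist": {"class": "SkipListMemtable"},
    },
    # Default used when a table does not specify anything.
    "memtable": {"template": "skiplist"},
}


def resolve_memtable(node_config, table_params=None):
    """Return the effective memtable configuration for one table on one node."""
    templates = node_config.get("memtable_templates", {})
    # No table-level setting: fall back to the node's default entry.
    params = table_params if table_params else node_config.get("memtable", {})
    if "template" in params:
        name = params["template"]
        if name not in templates:
            raise KeyError(f"unknown memtable template: {name}")
        # Copy so callers cannot mutate the node's template definition.
        return dict(templates[name])
    # Otherwise the table carries the full configuration itself.
    return dict(params)
```

Because the templates live in cassandra.yaml, two nodes can map the same
template name to different settings, which is what allows a change to be
tried on a single node before it is rolled out more widely.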

I intend to commit this modification with the memtable API
(CASSANDRA-17034/CEP-11).

Performance comparisons will be published soon.

Regards,
Branimir

On Fri, Jan 14, 2022 at 4:15 PM Jeff Jirsa <jjirsa@gmail.com> wrote:

> Sounds like a great addition
>
> Can you share some of the details around gc and latency improvements
> you've observed with the list?
>
> Any specific reason the configuration is through schema vs yaml? Presumably
> it's so a user can test per table, but this changes every host in a
> cluster, so the impact of a bug/regression is much higher.
>
>
> On Jan 10, 2022, at 1:30 AM, Branimir Lambov <blambov@apache.org> wrote:
>
> 
> We would like to contribute our TrieMemtable to Cassandra.
>
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-19%3A+Trie+memtable+implementation
>
> This is a new memtable solution aimed to replace the legacy
> implementation, developed with the following objectives:
> - lowering the on-heap complexity and the ability to store memtable
> indexing structures off-heap,
> - leveraging byte order and a trie structure to lower the memory footprint
> and improve mutation and lookup performance.
>
> The new memtable relies on CASSANDRA-6936 to translate to and from
> byte-ordered representations of types, and CASSANDRA-17034 / CEP-11 to plug
> into Cassandra. The memtable is built on multiple shards of custom
> in-memory single-writer multiple-reader tries, whose implementation uses a
> combination of state-of-the-art and novel features for greater efficiency.
>
> The CEP's JIRA ticket (
> https://issues.apache.org/jira/browse/CASSANDRA-17240) contains the
> initial version of the implementation. In its current form it achieves much
> better garbage collection latency, significantly bigger data sizes between
> flushes for the same memory allocation, as well as drastically increased
> write throughput, and we expect the memory and garbage collection
> improvements to go much further with upcoming improvements to the solution.
>
> I am interested in hearing your thoughts on the proposal.
>
> Regards,
> Branimir
>
>



