
List:       cassandra-dev
Subject:    Re: [DISCUSS] CEP-25: Trie-indexed SSTable format
From:       Benedict <benedict () apache ! org>
Date:       2022-11-22 10:25:23
Message-ID: CC3237A4-D81C-4FDD-9DE6-1470604EA952 () apache ! org

I don't think there's any requirement to run general testing with every storage
variant, except perhaps pre-release. The idea is to look for regressions in the
modified areas of the codebase, and if the storage layer hasn't been changed it
doesn't make sense to confuse or slow down testing IMO.

I'd also like to reiterate my desire to deprecate python dtests.

Regarding testing this feature itself, that's another question IMO. Harry and the
Simulator would be nice to utilise - even if only in their current guises. But it
would be nice to introduce some dedicated storage layer tests that could be run more
efficiently.

> On 22 Nov 2022, at 10:17, Josh McKenzie <jmckenzie@apache.org> wrote:
> 
> 
> Strong +1 for the proposal here.
> 
> > One of the questions that we want to ask is whether anyone objects to maintaining
> > full compatibility with existing files created by DataStax Enterprise.
> No concerns here. So long as it's clear in the implementation what it is and why
> it's there I don't see a problem; I think encouraging this kind of bridge work can
> only benefit the project as it'll encourage more upstreaming from forks where more
> aggressive innovation might be taking place.
>
> Regarding testing, I'd recommend we start with enabling the index + memtables
> together in the utests-trie target rather than trying to multiplex everything
> together.
> 
> > On Tue, Nov 22, 2022, at 4:18 AM, Jacek Lewandowski wrote:
> > +1 for the proposal!
> >
> > btw. regarding tests - perhaps we will have to let Python DTests run with either
> > new or old format
> > thanks
> > - - -- --- ----- -------- -------------
> > Jacek Lewandowski
> > 
> > 
> > On Mon, Nov 21, 2022 at 3:06 PM Benedict <benedict@apache.org> wrote:
> > 
> > Yes of course, this was absolutely just a query and not a precondition for this
> > work. It stands on its own in my view, and I'm already ready to +1 the proposal.
> > 
> > > > On 21 Nov 2022, at 13:55, Branimir Lambov <blambov@apache.org> wrote:
> > > 
> > > I see. This does make a lot of sense for full row indexing, and also if one can
> > > specify sub-KB granularity (at the current default we just won't have an index
> > > in these cases). How does opening a ticket to do these two* after the current
> > > code is committed sound?
> > >
> > > * embedded index for sub-X-byte partitions + granularity in bytes
> > > 
> > > On Mon, Nov 21, 2022 at 3:38 PM Benedict <benedict@apache.org> wrote:
> > > 
> > > Buffering on write up to at most one page seems fine? Once you are past a
> > > single page it's fine to write either to the end of the partition or to a
> > > separate file, there's nothing much to be gained, but esp. for small partitions
> > > there's likely significant value in prepending it?
> > >
> > > It might be preferable to retain the separate index for those that overflow
> > > this buffer, and simply encode in the partition index whether the row index is
> > > inline or in the separate file.
> > > 
> > > > > On 21 Nov 2022, at 13:29, Branimir Lambov <blambov@apache.org> wrote:
> > > > 
> > > > There is no intention to introduce any new versions of the format
> > > > specifically for DSE. If there are any further changes to the format, they
> > > > will be OSS-first. In other words this support only extends to preexisting
> > > > versions of the format.
> > > >
> > > > Inline row index in the data file is not something we have implemented, and
> > > > it's currently not in any plans. I personally am not sure how it can be done
> > > > to provide a benefit: if we place it at the end of a partition, it does not
> > > > help much compared to a separate file; if we place it in front, we have to
> > > > buffer the partition content, which will affect write performance. In either
> > > > case it may be harder to cache. Do you have something different in mind?
> > > >
> > > > Regards,
> > > > Branimir
> > > > 
> > > > On Mon, Nov 21, 2022 at 3:01 PM Benedict <benedict@apache.org> wrote:
> > > > 
> > > > Personally very pleased to see this proposal, and I'm not opposed to easing
> > > > your migration by maintaining some light support for internal file versions -
> > > > though would prefer the support have some version limit where it can be
> > > > excised (maybe for one minor version bump?)
> > > >
> > > > One implementation question: are there any plans to support inline row index
> > > > in the big sstable format files? Is this something DSE supports, and on the
> > > > roadmap just not for initial work, or currently not envisioned?
> > > >
> > > > I would anticipate significant advantage to this for many workloads, and no
> > > > downside (except for streaming - which could be resolved fairly easily by
> > > > skipping over these sections when streaming to an old node, but since we
> > > > don't generally stream between versions I don't see any major issue anyway).
> > > > 
> > > > 
> > > > > > On 21 Nov 2022, at 12:43, Branimir Lambov <blambov@apache.org> wrote:
> > > > > 
> > > > > Hi everyone,
> > > > > 
> > > > > We would like to put CEP-25 up for discussion.
> > > > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-25%3A+Trie-indexed+SSTable+format
> > > > >
> > > > > The proposal describes DSE's Big Trie-indexed SSTable format, which
> > > > > replaces the primary index with on-disk tries to improve lookup performance
> > > > > and index size, better handle wide partitions, and remove the need to
> > > > > manage key caching and index summaries.
> > > > >
> > > > > We would like to discuss this proposal with you.
> > > > >
> > > > > One of the questions that we want to ask is whether anyone objects to
> > > > > maintaining full compatibility with existing files created by DataStax
> > > > > Enterprise.
> > > > >
> > > > > Regards,
> > > > > Branimir
> > > > 
> > > > 
> > > > 
> > > > 
> 
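
A minimal sketch of the inline-versus-separate row index idea raised in the thread above: buffer at most one page on write, prepend the row index when the whole partition fits in that buffer, and otherwise keep the row index in the separate file while recording the choice in the partition index. Every name here (`SketchWriter`, `PartitionEntry`, `PAGE_SIZE`) and the byte layout are hypothetical and chosen only for illustration; this is not Cassandra or DSE code, the 4096-byte page is an assumption, and the thread itself leaves the design to a possible follow-up ticket.

```python
import io
import struct
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed buffer limit; the thread only says "one page"


@dataclass
class PartitionEntry:
    """Hypothetical partition-index entry recording where the row index lives."""
    data_offset: int        # start of the partition in the data file
    row_index_inline: bool  # True if the row index is prepended in the data file
    row_index_offset: int   # offset in the data file if inline, else in the index file


class SketchWriter:
    def __init__(self):
        self.data = io.BytesIO()       # stands in for the data file
        self.row_index = io.BytesIO()  # stands in for the separate row-index file
        self.partition_index = {}      # partition key -> PartitionEntry

    def write_partition(self, key, rows):
        """Write one partition; `rows` is an iterable of serialized rows (bytes)."""
        start = self.data.tell()
        buffered, buffered_size = [], 0  # rows held back while they may fit in a page
        offsets, position = [], 0        # row offsets relative to the row data start
        overflowed = False
        for row in rows:
            offsets.append(position)
            position += len(row)
            if not overflowed and buffered_size + len(row) <= PAGE_SIZE:
                buffered.append(row)
                buffered_size += len(row)
                continue
            if not overflowed:
                # Past one page: give up on prepending, flush the buffer and stream.
                overflowed = True
                self.data.write(b"".join(buffered))
                buffered = []
            self.data.write(row)
        # Toy row index: a count followed by one 32-bit offset per row.
        index_bytes = struct.pack(f">I{len(offsets)}I", len(offsets), *offsets)
        if overflowed:
            entry = PartitionEntry(start, False, self.row_index.tell())
            self.row_index.write(index_bytes)
        else:
            entry = PartitionEntry(start, True, start)
            self.data.write(index_bytes)  # index goes in front of the row data
            self.data.write(b"".join(buffered))
        self.partition_index[key] = entry
        return entry
```

For simplicity the sketch counts only row bytes against the page limit, not the index bytes; a real format would need to decide that, along with how the inline sections are skipped when streaming to a node that does not understand them.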




