
List:       cassandra-user
Subject:    RE:  Best compaction strategy for rarely used data
From:       "onmstester onmstester via user" <user () cassandra ! apache ! org>
Date:       2023-01-07 6:12:51
Message-ID: 1858ad1521c.b5b003bd9157.2001233668416302877 () zoho ! com

Another solution: distribute the data across more tables. For example, you could
create multiple tables based on the value (or a hash bucket) of one of the columns;
this divides the current data volume and compaction overhead among the underlying
tables. Note, though, that Cassandra has a practical limit on the number of tables
(a few hundred).
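A minimal sketch of the table-splitting idea above, assuming the application routes each write (and read) to one of N tables by hashing a column value. The bucket count, the `events_N` table names and the `(id, payload)` columns are hypothetical, chosen only for illustration.

```python
import hashlib

# Hypothetical settings for illustration only: the bucket count and the
# "events" base table name are not from the thread. Keep the number of
# tables well below Cassandra's practical limit (a few hundred).
NUM_BUCKETS = 8
TABLE_PREFIX = "events"


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a column value to a stable bucket in [0, num_buckets).

    A digest from hashlib is used instead of the built-in hash(), because
    hash() on str is salted per process and would route the same value to
    different tables on different clients.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def table_for(key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Name of the hypothetical bucket table holding rows for this value."""
    return f"{TABLE_PREFIX}_{bucket_for(key, num_buckets)}"


def insert_cql(key: str) -> str:
    """CQL text for a write; the (id, payload) columns are assumed."""
    return f"INSERT INTO {table_for(key)} (id, payload) VALUES (?, ?)"


if __name__ == "__main__":
    for value in ("sensor-1", "sensor-2", "sensor-3"):
        print(value, "->", table_for(value))
```

Reads have to compute the same bucket, so the bucket count cannot change later without migrating data, and a query that does not know the column value has to fan out to all N tables.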

I wish STCS simply had a maximum sstable size, so that sstables bigger than this
limit would not be compacted at all; that would have solved most problems like
this one?!
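For context on the wish above: STCS groups sstables of similar size into buckets and only compacts a bucket once it holds at least `min_threshold` sstables (4 by default), so four similar ~1 TB sstables would eventually be merged into a single ~4 TB one. The sketch below is a simplified model of that bucketing, not Cassandra's actual code, using the default values of STCS's `bucket_low`, `bucket_high`, `min_threshold` and `min_sstable_size` options. The `max_sstable_size` parameter is hypothetical and models the wished-for cap; STCS does not offer it.

```python
from typing import List, Optional

MIB = 1024 * 1024
GIB = 1024 * MIB


def stcs_buckets(
    sizes: List[int],
    bucket_low: float = 0.5,
    bucket_high: float = 1.5,
    min_sstable_size: int = 50 * MIB,
    max_sstable_size: Optional[int] = None,  # HYPOTHETICAL: not a real STCS option
) -> List[List[int]]:
    """Group sstable sizes (bytes) into buckets of similar size, STCS-style."""
    # The wished-for cap: sstables above it never become compaction candidates.
    candidates = sorted(
        s for s in sizes if max_sstable_size is None or s <= max_sstable_size
    )
    buckets: List[List[int]] = []
    for size in candidates:
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket.append(size)
                break
        else:
            # No bucket has a similar average size: start a new one.
            buckets.append([size])
    return buckets


def compactable(buckets: List[List[int]], min_threshold: int = 4) -> List[List[int]]:
    """Only buckets holding at least min_threshold sstables get compacted."""
    return [b for b in buckets if len(b) >= min_threshold]


if __name__ == "__main__":
    # Four ~1 TB sstables are similar enough to be merged into one ~4 TB file...
    big = [900 * GIB, 1000 * GIB, 1024 * GIB, 1100 * GIB]
    print(len(compactable(stcs_buckets(big))))  # 1
    # ...unless the hypothetical cap keeps them out of compaction entirely.
    print(len(compactable(stcs_buckets(big, max_sstable_size=500 * GIB))))  # 0
```

The same model also shows why a lone large sstable just sits there: with sizes of 4 x 80 GiB, 300 GiB and 1024 GiB, only the 80 GiB bucket reaches `min_threshold`.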











---- On Fri, 30 Dec 2022 21:43:27 +0330 Durity, Sean R via user <user@cassandra.apache.org> wrote ---




Yes, cleanup will reduce the disk space used on the existing nodes by rewriting
into new sstables only the data that each node still owns.
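Conceptually, `nodetool cleanup` rewrites each sstable and drops the partitions whose tokens fall outside the ranges the node still owns; until it runs, the old node keeps its copy of the data it handed over to the new node. The toy model below is a hypothetical illustration, not Cassandra's internals: integer tokens, one row per token, and no token-ring wraparound. It shows an sstable holding 1,000 GiB shrinking to about 750 GiB when a new node takes over a quarter of its range.

```python
from typing import List, Tuple

GIB = 1024 ** 3

# Toy model (hypothetical, not Cassandra's internals): tokens are plain ints,
# a row is (token, size_in_bytes), and ownership is a list of (start, end]
# ranges. Token-ring wraparound is ignored for simplicity.
Row = Tuple[int, int]
TokenRange = Tuple[int, int]


def owns(token: int, ranges: List[TokenRange]) -> bool:
    """True if the token falls in one of the (start, end] ranges."""
    return any(start < token <= end for start, end in ranges)


def cleanup(sstable: List[Row], owned: List[TokenRange]) -> List[Row]:
    """Rewrite an sstable, keeping only rows whose tokens this node still owns."""
    return [row for row in sstable if owns(row[0], owned)]


def total_size(sstable: List[Row]) -> int:
    """Sum of row sizes in bytes."""
    return sum(size for _, size in sstable)


if __name__ == "__main__":
    # 1,000 rows of 1 GiB each on tokens 0..999, all owned by this node.
    sstable = [(token, GIB) for token in range(1000)]
    # A new node takes over tokens 750..999, so this node keeps (-1, 749].
    after = cleanup(sstable, [(-1, 749)])
    print(total_size(sstable) // GIB, "GiB ->", total_size(after) // GIB, "GiB")
```

Running it prints `1000 GiB -> 750 GiB`.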

  

  

Sean R. Durity

DB Solutions

Staff Systems Engineer – Cassandra

  

From: Lapo Luchini <lapo@lapo.it>
Sent: Friday, December 30, 2022 4:12 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data

  





  



On 2022-12-29 21:54, Durity, Sean R via user wrote:

> At some point you will end up with large sstables (like 1 TB) that won't
> compact because there are not 4 similar-sized ones able to be compacted

  

Yes, that's exactly what's happening.

  

I may see just one more compaction, since the biggest sstable is already
larger than 20% of the remaining free space.

  

> For me, the backup strategy shouldn't drive the rest.

  

Mhh, yes, that makes sense.

  

> And if your data is ever-growing
> and never deleted, you will be adding nodes to handle the extra data as
> time goes by (and running clean-up on the existing nodes).

  

What will happen when adding new nodes, as you say, though?
If I have a 1 TB sstable with 250 GB of data that will no longer be useful
(as a new node will be its new owner), will that sstable be reduced to
750 GB by "cleanup", or will it retain the old data?

  

Thanks,

  

-- 

Lapo Luchini

lapo@lapo.it

  





