List: cassandra-user
Subject: RE: Best compaction strategy for rarely used data
From: "onmstester onmstester via user" <user () cassandra ! apache ! org>
Date: 2023-01-07 6:12:51
Message-ID: 1858ad1521c.b5b003bd9157.2001233668416302877 () zoho ! com
Another solution: distribute the data across more tables. For example, you could
create multiple tables based on the value or a hash bucket of one of the columns.
The current data volume and compaction overhead would then be divided by the number
of underlying tables. Note, however, that Cassandra has a practical limit on the
number of tables (a few hundred).
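To make the bucketing idea concrete, here is a minimal Python sketch of how a client could pick the target table for a row. The table name `events`, the bucket count of 16 and the helper `bucket_table` are all assumptions made for illustration; nothing here is a Cassandra API.

```python
import hashlib

NUM_BUCKETS = 16        # assumed value; keep it well below the table-count limit
BASE_TABLE = "events"   # hypothetical base table name


def bucket_table(key: str, num_buckets: int = NUM_BUCKETS, base: str = BASE_TABLE) -> str:
    """Return the name of the table that should hold rows for `key`."""
    # Use a stable hash: Python's built-in hash() is salted per process,
    # so it would route the same key to different tables after a restart.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_buckets
    return f"{base}_{bucket:02d}"
```

Reads for a known key go to a single table. Queries that do not know the bucketing column have to fan out over all tables, which is the price of this approach.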
I wish STCS simply had a maximum sstable size limit, so that sstables bigger than
this limit would not be compacted at all. That would solve most problems of this
kind.
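As a rough illustration of what I mean, here is a simplified Python sketch of size-tiered bucketing. The parameters `bucket_low`, `bucket_high` and `min_threshold` mirror the STCS options of the same names (defaults 0.5, 1.5 and 4), but the logic is a simplification, not Cassandra's actual code. The `max_sstable_size` parameter is hypothetical: it is the limit I wish existed, not an option STCS offers.

```python
def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of similar size (simplified STCS)."""
    buckets = []  # each bucket is a list of sstable sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # An sstable joins a bucket if it is close to the bucket's average size.
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4, max_sstable_size=None):
    """Return the buckets that hold enough sstables to be compacted.

    max_sstable_size is HYPOTHETICAL: sstables above it are never considered.
    """
    if max_sstable_size is not None:
        sizes = [s for s in sizes if s <= max_sstable_size]
    return [b for b in stcs_buckets(sizes) if len(b) >= min_threshold]
```

With sizes in GB, `compaction_candidates([1000, 900, 1100, 10, 11, 9, 12])` selects only the four small sstables, because the three large ones never reach `min_threshold`. Once a fourth large sstable appears they would be merged into one even larger file, unless a cap such as `max_sstable_size=500` kept them out of consideration.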
---- On Fri, 30 Dec 2022 21:43:27 +0330 Durity, Sean R via user <user@cassandra.apache.org> wrote ---
Yes, clean-up will reduce the disk space on the existing nodes by re-writing only the
data that the node now owns into new sstables.
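The effect can be sketched in a few lines of Python. This is a toy model, not how `nodetool cleanup` is implemented: an sstable is represented as a list of `(token, size_in_gb)` pairs, ownership as simple `(start, end]` token ranges without wrap-around, and the `cleanup` function and the sample numbers are assumptions for illustration.

```python
def cleanup(sstable, owned_ranges):
    """Return a new 'sstable' holding only partitions the node still owns.

    sstable:      list of (token, size_in_gb) pairs
    owned_ranges: list of (start, end] token ranges, no wrap-around handling
    """
    def owned(token):
        return any(start < token <= end for start, end in owned_ranges)

    # The old sstable is not edited in place; a new one is written.
    return [(token, size) for token, size in sstable if owned(token)]


# A 1000 GB sstable made of four 250 GB partitions.
sstable = [(10, 250), (30, 250), (60, 250), (90, 250)]
# After a new node joins, this node owns (0, 75] instead of (0, 100].
kept = cleanup(sstable, [(0, 75)])
print(sum(size for _, size in kept), "GB kept")
```

Because the surviving data is written to a new sstable before the old one is dropped, the node needs enough free disk to hold both for a while.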
Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra
From: Lapo Luchini <mailto:lapo@lapo.it>
Sent: Friday, December 30, 2022 4:12 AM
To: mailto:user@cassandra.apache.org
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data
On 2022-12-29 21:54, Durity, Sean R via user wrote:
> At some point you will end up with large sstables (like 1 TB) that won't
> compact because there are not 4 similar-sized ones able to be compacted
Yes, that's exactly what's happening.
I'll see maybe just one more compaction, since the biggest sstable is
already more than 20% of residual free space.
> For me, the backup strategy shouldn't drive the rest.
Mhh, yes, that makes sense.
> And if your data is ever-growing
> and never deleted, you will be adding nodes to handle the extra data as
> time goes by (and running clean-up on the existing nodes).
What will happen when adding new nodes, as you say, though?
If I have a 1TB sstable with 250GB of data that will no longer be useful
(as a new node will be the new owner), will that sstable be reduced to
750GB by "cleanup" or will it retain the old data?
Thanks,
--
Lapo Luchini
mailto:lapo@lapo.it