
List:       cassandra-user
Subject:    RE:  Best compaction strategy for rarely used data
From:       "Durity, Sean R via user" <user () cassandra ! apache ! org>
Date:       2022-12-29 20:54:08
Message-ID: DM8P108MB0229FB389541B8AFDE94853CBCF39 () DM8P108MB0229 ! NAMP108 ! PROD ! OUTLOOK ! COM

If there isn't a TTL and a timestamp on the data, I'm not sure what benefit TWCS
brings for this use case. I would stick with size-tiered (STCS). At some point you
will end up with large SSTables (around 1 TB) that won't compact because there are
not 4 similar-sized ones available to compact together (assuming the default STCS
parameters). And if your data is ever-growing and never deleted, you will be adding
nodes to handle the extra data as time goes by (and running cleanup on the existing
nodes). For me, the backup strategy shouldn't drive the rest.
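
To illustrate why a lone 1 TB SSTable stops compacting under STCS, here is a rough
Python sketch of size-tiered bucketing. It is a simplification, not Cassandra's
actual code: it assumes the default min_threshold (4), bucket_low (0.5) and
bucket_high (1.5), ignores min_sstable_size and max_threshold, and the function
names and sizes (in GB) are made up for the example.

```python
# Simplified model of STCS bucketing (not Cassandra's real implementation).
# Assumed defaults: min_threshold=4, bucket_low=0.5, bucket_high=1.5.
MIN_THRESHOLD = 4
BUCKET_LOW = 0.5
BUCKET_HIGH = 1.5


def bucket_sstables(sizes_gb):
    """Group SSTable sizes into buckets of 'similar' size."""
    buckets = []
    for size in sorted(sizes_gb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # An SSTable joins a bucket if it is within 0.5x..1.5x of the average.
            if avg * BUCKET_LOW < size < avg * BUCKET_HIGH:
                bucket.append(size)
                break
        else:
            # No bucket was close enough, so start a new one.
            buckets.append([size])
    return buckets


def compactable_buckets(sizes_gb):
    """Return only the buckets that hold enough SSTables to be compacted."""
    return [b for b in bucket_sstables(sizes_gb) if len(b) >= MIN_THRESHOLD]


if __name__ == "__main__":
    # Hypothetical node: one 1 TB SSTable plus a handful of smaller ones.
    print(compactable_buckets([1000, 300, 80, 20, 20, 20, 20]))
    # Once the small ones are merged, nothing is left to pair with the big one.
    print(compactable_buckets([1000, 300, 80, 80]))
```

In this model the 1 TB file only becomes a compaction candidate again once three
more SSTables of roughly the same size exist on the node.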


Sean R. Durity

From: Paul Chandler <paul@redshots.com>
Sent: Thursday, December 29, 2022 4:51 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data

Hi Lapo

Take a look at TWCS, I think that could help your use case:
https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html


Regards

Paul Chandler
Sent from my iPhone

On 29 Dec 2022, at 08:55, Lapo Luchini <lapo@lapo.it> wrote:
Hi, I have a table which gets (a lot of) data that is written once, very rarely
read (it holds data that must be kept for regulatory reasons), and almost never
deleted.

I'm using the default STCS because at the time I didn't know any better, but the
SSTables are getting huge. That is a problem both because they are approaching the
size of the available disk and because I'm using a snapshot-based system to back up
the node (so compacting a huge SSTable into an even bigger one generates a lot of
traffic for mostly-old data).

I'm thinking about switching to LCS (mainly to solve the size issue), but I read that
it is "optimized for read heavy workloads […] not a good choice for immutable time
series data". Given that I don't really care about write or read speed, but would
like SSTable size to have an upper limit, would this strategy still be the best?
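
For a sense of what the LCS size cap means in practice, here is a back-of-the-envelope
Python sketch. It assumes the default sstable_size_in_mb (160) and a level fanout of
10, ignores L0, and fills levels from L1 upward, which is an idealized layout rather
than what a real node would show. The function name and the 1 TB figure are only
illustrative.

```python
import math

# Assumed LCS defaults: 160 MB target SSTable size, each level 10x the previous.
SSTABLE_SIZE_MB = 160
FANOUT = 10


def lcs_layout(total_data_gb):
    """Return (level, sstable_count) pairs for an idealized LCS layout."""
    remaining_mb = total_data_gb * 1024
    layout = []
    level = 1
    while remaining_mb > 0:
        # Level N is sized to hold FANOUT**N SSTables of the target size.
        capacity_mb = SSTABLE_SIZE_MB * FANOUT ** level
        used_mb = min(remaining_mb, capacity_mb)
        layout.append((level, math.ceil(used_mb / SSTABLE_SIZE_MB)))
        remaining_mb -= used_mb
        level += 1
    return layout


if __name__ == "__main__":
    # Hypothetical table holding about 1 TB on one node.
    for level, count in lcs_layout(1024):
        print(f"L{level}: {count} SSTables of up to {SSTABLE_SIZE_MB} MB")
```

Under these assumptions each file stays small, at the cost of thousands of SSTables
per node and the extra rewriting LCS does to keep the levels in order.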

PS: While Googling around, a strategy called "incremental compaction" (ICS) keeps
coming up in the results, but that's only available in ScyllaDB, right?

--
Lapo Luchini
lapo@lapo.it






