
List:       cassandra-user
Subject:    Re: Repair on a slow node (or is it?)
From:       Kane Wilson <k () raft ! so>
Date:       2021-03-29 10:32:20
Message-ID: CALc=uxTu=SWcDr827q3+YRUoPC3pMwwub5+1_pkZAVKixnS4hg () mail ! gmail ! com

Check what your compaction throughput is set to, as it throttles the
validation compactions. Also, what kind of disks does the DR node have? The
validation compaction sizes are likely fine; I'm not sure of the exact
details, but very large validations are normal to expect.
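
As a quick sketch (assuming nodetool is available on the DR node; the
64 MB/s figure is only an illustrative number, not a recommendation):

    # show the current compaction throughput cap in MB/s (0 means unthrottled)
    nodetool getcompactionthroughput

    # temporarily raise the cap while the repair runs, e.g. to 64 MB/s
    nodetool setcompactionthroughput 64

    # this does not survive a restart; adjust compaction_throughput_mb_per_sec
    # (compaction_throughput on newer versions) in cassandra.yaml to make it stick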

Rebuilding would not be an ideal mechanism for repairing: it would likely
be slower and chew up a lot of disk space. It's also not guaranteed to give
you data that is consistent with the other DC, as each token range will
only be streamed from a single source node.
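
For reference, if you did go down that route, a rebuild on the DR node
would look roughly like this (the DC name below is a placeholder for your
actual production DC name):

    # run on the DR node; streams each token range from one replica in the named DC
    nodetool rebuild -- production_dc

    # monitor the incoming streams
    nodetool netstats -H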

I think you're better off looking at setting up regular backups and, if you
really need it, commitlog backups. The storage would be cheaper and more
reliable, and less impactful on your production DC. Restoring would also be
a lot easier and faster, as restoring from a single-node DC will be
network bottlenecked. There are various tools around that do this for you,
such as Medusa or tablesnap; a rough sketch follows below.
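
As a rough illustration only (the snapshot tag, archive path, and schedule
are placeholders; check each tool's documentation for the real workflow):

    # per-node snapshot with a dated tag, e.g. driven by cron
    nodetool snapshot -t nightly_$(date +%F)

    # optional commitlog archiving, enabled in commitlog_archiving.properties, e.g.:
    # archive_command=/bin/cp %path /backups/commitlog/%name

    # with Medusa installed and configured, a backup is roughly:
    medusa backup --backup-name nightly_$(date +%F)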


raft.so - Cassandra consulting, support, managed services

On Mon., 29 Mar. 2021, 20:47 Lapo Luchini, <lapo@lapo.it> wrote:

> Hi all,
>      I have a 6-node production cluster with a 1.5 TiB load (RF=3) and a
> single-node DC dedicated as a "remote disaster recovery copy" with 2.7 TiB.
>
> Doing repairs only on the production cluster takes a semi-decent time
> (24h for the biggest keyspace, which takes 90% of the space), but doing
> repair across the two DCs takes forever, and segments often fail even
> though I increased the Reaper segment time limit to 2h.
>
> In trying to debug the issue, I noticed that "compactionstats -H" on the
> DR node shows huge (and very very slow) validations:
>
> compaction  completed   total     unit   progress
> Validation  2.78 GiB    8.11 GiB  bytes  34.33%
> Validation  0 bytes     2.67 TiB  bytes  0.00%
> Validation  1.7 TiB     2.43 TiB  bytes  69.75%
> Validation  124.26 GiB  2.67 TiB  bytes  4.55%
> Validation  536.67 GiB  2.67 TiB  bytes  19.63%
>
> Such validations take a few hours to complete, and as far as I
> understand, segment repair always fails on the first try due to those,
> and only succeeds after a few retries, once the validation started by
> the first attempt has ended.
>
> My question is this: is it normal to have to validate all of the
> keyspace content on each segment's validation?
> Is the DB in a "strange" state?
> Would it be useful to issue a "rebuild" on that node, in order to send
> all the missing data anyway and thus skip the lengthy validations?
>
> thanks!
>
> --
> Lapo Luchini
> lapo@lapo.it
>
>
