List:       cassandra-user
Subject:    Re: Repair on a slow node (or is it?)
From:       Lapo Luchini <lapo () lapo ! it>
Date:       2021-03-31 14:11:52
Message-ID: s41vva$9o5$1 () ciao ! gmane ! io

Thanks for all your suggestions!

I'm looking into it, and so far it seems to be mainly a problem of disk 
I/O: the host runs on spinning disks, and being the DR for an entire 
cluster means it has a lot of changes to keep up with.
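
To confirm that, I'm keeping an eye on the pool and on the validation 
compactions while repair runs; a rough sketch, assuming the pool is 
called "tank":

    # per-vdev read/write load on the ZFS pool, refreshed every 5 seconds
    zpool iostat -v tank 5

    # validation compactions currently in flight and their progress
    nodetool compactionstats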

The first (easy) thing to try will be adding an SSD as a ZFS cache 
(ZIL + L2ARC). That alone should already make a huge difference.
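
Roughly what I have in mind, assuming the pool is called "tank" and the 
SSD is split into two partitions (da1p1 and da1p2 are just placeholders):

    # small partition as a separate intent log (ZIL / SLOG)
    zpool add tank log da1p1

    # remaining space as a second-level read cache (L2ARC)
    zpool add tank cache da1p2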

Later on I will also look into Medusa/tablesnap, thanks.
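
From a first look, the Medusa side seems to boil down to something like 
this (untested on my side; storage and bucket settings live in its 
medusa.ini config, and the backup name below is just an example):

    # snapshot the node and upload it to the configured bucket
    medusa backup --backup-name=daily-2021-03-31

    # list the backups available for restore
    medusa list-backups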

cheers,
Lapo

On 2021-03-29 12:32, Kane Wilson wrote:
> Check what your compactionthroughput is set to, as it will impact the 
> validation compactions. Also, what kind of disks does the DR node have? 
> The validation compaction sizes are likely fine; I'm not sure of the 
> exact details, but it's normal to expect very large validations.
> 
> Rebuilding would not be an ideal mechanism for repairing, and would 
> likely be slower and chew up a lot of disk space. It's also not 
> guaranteed to give you data that will be consistent with the other DC, 
> as replicas will only be streamed from one node.
> 
> I think you're better off looking at setting up regular backups and, if 
> you really need it, commitlog backups. The storage would be cheaper and 
> more reliable, plus less impactful on your production DC. Restoring will 
> also be a lot easier and faster, as restoring from a single-node DC 
> would be network bottlenecked. There are various tools around that do 
> this for you, such as Medusa or tablesnap.
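
(For reference, the throttle mentioned above can be checked and changed 
at runtime with nodetool; the value is in MB/s, 0 removes the limit, and 
the 64 below is just an example figure:)

    # show the current compaction throughput throttle
    nodetool getcompactionthroughput

    # raise it while the repair runs, then set it back afterwards
    nodetool setcompactionthroughput 64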
