
List:       cassandra-dev
Subject:    Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions
From:       Jeff Jirsa <jjirsa () gmail ! com>
Date:       2017-08-10 3:44:14
Message-ID: 7468E2DA-699E-4598-BBB6-14F53BF46C24 () gmail ! com


Looks a lot like read repair but impossible to tell for sure


-- 
Jeff Jirsa


> On Aug 9, 2017, at 4:34 PM, Sumanth Pasupuleti <sumanth.pasupuleti.is@gmail.com> wrote:
> 
> My final try on pushing the attachment over.
> <SSTableSlicer_output.png>
> 
> 
> > On Wed, Aug 9, 2017 at 4:01 PM, Sumanth Pasupuleti <sumanth.pasupuleti.is@gmail.com> wrote:
> > 
> > Thanks for the insights Jeff! I did go through the tickets around dropping
> > expired sstables that have overlaps - based on what I understand, the only
> > undesirable impact of that would be possible data resurrection.
> > 
> > I have now attached the output of sstableslicer with the mail. Will submit a
> > patch for review.
> > Thanks,
> > Sumanth
> > 
> > > On Tue, Aug 8, 2017 at 9:49 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
> > > The most likely cause is read repairs due to consistency level repairs
> > > (digest mismatch). The only way to actually eliminate read repair is to
> > > read with CL:ONE, which almost nobody does (at least in time series use
> > > cases, because it implies you probably write with ALL, or run repair which
> > > - as you've noted - often isn't necessary in ttl-only use cases).
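
To make the digest-mismatch path concrete, a minimal sketch with the DataStax
Python driver (keyspace, table, and key are hypothetical): a CL ONE read touches
a single replica and compares no digests, so it cannot trigger a blocking read
repair, while a read above CL ONE can.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Hypothetical keyspace/table names, for illustration only.
    session = Cluster(['127.0.0.1']).connect('metrics')

    # CL ONE: a single replica answers, no digests are compared,
    # so no consistency-level read repair can fire.
    read_one = SimpleStatement("SELECT * FROM events WHERE id = %s",
                               consistency_level=ConsistencyLevel.ONE)

    # CL QUORUM: replicas return digests; on a mismatch the coordinator
    # reconciles the replies and writes the repaired data back to the
    # out-of-date replicas, which can land old timestamps in new sstables.
    read_quorum = SimpleStatement("SELECT * FROM events WHERE id = %s",
                                  consistency_level=ConsistencyLevel.QUORUM)

    session.execute(read_one, ['some-key'])
    session.execute(read_quorum, ['some-key'])
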
> > > 
> > > I can't see the image, but more tools for understanding sstable state are
> > > never a bad thing (as long as they're generally useful and maintainable).
> > > 
> > > For what it's worth, there are tickets in flight for being more aggressive
> > > about dropping overlaps, but there are companies that use tools that stop the
> > > cluster, use sstablemetadata to identify sstables known to be fully
> > > expired, and manually remove them (/bin/rm) before starting cassandra
> > > again. It works reasonably well IF (and only if) you write all data with
> > > TTLs and can identify fully expired sstables based on maximum
> > > timestamps.
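
A rough sketch of that offline workflow (not an official tool; the data path and
the "Maximum timestamp" line, in microseconds, are assumptions about 2.1-era
sstablemetadata output). It is only safe when every cell is written with a TTL
and the node is stopped before anything is removed:

    import glob
    import re
    import subprocess
    import time

    TTL_SECONDS = 15 * 24 * 3600        # the 15-day TTL used in this thread
    GC_GRACE_SECONDS = 3600             # 1 hour, as configured here
    cutoff_us = (time.time() - TTL_SECONDS - GC_GRACE_SECONDS) * 1_000_000

    # Hypothetical data path; adjust to the actual keyspace/table directories.
    for data_file in glob.glob('/var/lib/cassandra/data/ks/cf/*-Data.db'):
        out = subprocess.check_output(['sstablemetadata', data_file], text=True)
        match = re.search(r'Maximum timestamp:\s*(\d+)', out)
        if match and int(match.group(1)) < cutoff_us:
            # Candidate for manual removal (/bin/rm) -- only with the node
            # stopped, and only if *all* data in the table carries a TTL.
            print('fully expired candidate:', data_file)
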
> > > 
> > > 
> > > 
> > > 
> > > On Tue, Aug 8, 2017 at 8:51 PM, Sumanth Pasupuleti <
> > > sumanth.pasupuleti.is@gmail.com> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > We use TWCS in a few of the column families that have TTL-based
> > > > time-series data, and no explicit deletes are issued. Over time, we
> > > > observed that disk usage has been increasing beyond the expected levels.
> > > > 
> > > > The data directory on a particular node shows SSTables that are more than
> > > > 16 days old, while the bucket size is configured at 12 hours, the TTL at
> > > > 15 days, and GC grace at 1 hour.
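
For reference, a sketch of table options matching those numbers (keyspace/table
names are hypothetical; the compaction class and option names assume the
externally packaged TWCS commonly used with 2.1, since upstream TWCS only ships
with 3.0.8+):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()

    # Hypothetical table; 12-hour buckets, 15-day TTL, 1-hour gc_grace,
    # mirroring the configuration described above.
    session.execute("""
        ALTER TABLE metrics.events
        WITH compaction = {
            'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
            'compaction_window_unit': 'HOURS',
            'compaction_window_size': '12'}
        AND default_time_to_live = 1296000  -- 15 days
        AND gc_grace_seconds = 3600         -- 1 hour
    """)
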
> > > > Upon using sstableexpiredblockers, we got quite a few sets of blocking
> > > > and blocked SSTables. The SSTable metadata shown in the output indicates
> > > > an overlap in the min/max timestamp range between the blocking SSTable
> > > > and the blocked SSTables, which is preventing the older SSTables from
> > > > being dropped/deleted.
> > > > 
> > > > Following are the possible root causes we considered:
> > > > 
> > > > 1. Hints - hints with old data getting replayed from the coordinator node.
> > > > We ruled this out since hints live for no more than 1 day based on our
> > > > configuration.
> > > > 2. External compactions - no external compactions were run that could
> > > > compact SSTables across the TWCS buckets.
> > > > 3. Read repairs - this is ruled out as well, since we never ran
> > > > external repairs, and read repair chance on the TWCS column families has
> > > > been set to 0 (see the settings sketch after this list).
> > > > 4. Application team writing data with older timestamps (in newer
> > > > SSTables).
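
A sketch of the table-level settings mentioned in item 3 (hypothetical table
name). These knobs only control the probabilistic read repair paths; a digest
mismatch on a read above CL ONE still triggers a blocking read repair
regardless of these settings:

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()

    # Hypothetical table. Zeroing both chances disables only the
    # probabilistic read repair; consistency-level (digest mismatch)
    # read repair still fires on reads above CL ONE.
    session.execute("""
        ALTER TABLE metrics.events
        WITH read_repair_chance = 0
        AND dclocal_read_repair_chance = 0
    """)
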
> > > > 
> > > > Here is what we have tried so far:
> > > > 
> > > > 1. We wanted to identify the specific row keys with older timestamps
> > > > in the blocking SSTable that could be causing this issue. We considered
> > > > sstablekeys/sstable2json; however, since both tools output the entire
> > > > contents/keys of the SSTable in key order, they were not helpful in this
> > > > case.
> > > > 2. Since we wanted data on the few oldest cells and their timestamps, we
> > > > created a tool, mostly based on sstable2json, called sstableslicer, which
> > > > outputs the top/bottom 'n' cells in an SSTable ordered by either writetime
> > > > or localDeletionTime (a rough sketch of the idea follows this list). This
> > > > helped us identify the specific cells in new SSTables with older timestamps,
> > > > which further helped debugging on the application end. From the application
> > > > team's perspective, however, writing data with an old timestamp is not a
> > > > possible scenario.
> > > > 3. Below is a sample output of sstableslicer:
> > > > [image: Inline image 2]
> > > > 
> > > > 
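
As a rough illustration of the idea in item 2 above (this is not the actual
sstableslicer code): a post-processing pass over sstable2json output that
surfaces the n cells with the oldest write timestamps, assuming the 2.1-style
cell layout of [name, value, timestamp, ...] with timestamps in microseconds.

    import heapq
    import json
    import subprocess
    import sys

    def oldest_cells(sstable_path, n=10):
        """Return the n (timestamp, row key, cell name) tuples with the
        smallest write timestamps. Assumes 2.1-style sstable2json output:
        a JSON list of rows, each with a "key" and a "cells" list whose
        entries look like [name, value, timestamp, ...]."""
        raw = subprocess.check_output(['sstable2json', sstable_path], text=True)
        rows = json.loads(raw)
        cells = ((cell[2], row['key'], cell[0])
                 for row in rows
                 for cell in row.get('cells', []))
        return heapq.nsmallest(n, cells)

    if __name__ == '__main__':
        for ts_us, key, name in oldest_cells(sys.argv[1]):
            print(ts_us, key, name)
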
> > > > Looking for suggestions, especially around the following two things:
> > > > 
> > > > 1. Did we miss any other case in TWCS that could be causing such an
> > > > overlap?
> > > > 2. Does sstableslicer seem valuable enough to be included in Apache C*? If
> > > > yes, I shall create a JIRA and submit a PR/patch for review.
> > > > 
> > > > The C* version we use is 2.1.17.
> > > > 
> > > > Thanks,
> > > > Sumanth
> > > > 
> > 
> 


