
List:       cassandra-commits
Subject:    [jira] [Commented] (CASSANDRA-8399) Reference Counter exception when dropping user type
From:       "Joshua McKenzie (JIRA)" <jira () apache ! org>
Date:       2014-12-31 23:49:13
Message-ID: JIRA.12758721.1417460297000.123513.1420069753888 () Atlassian ! JIRA


    [ https://issues.apache.org/jira/browse/CASSANDRA-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262498#comment-14262498 ]

Joshua McKenzie commented on CASSANDRA-8399:
--------------------------------------------

While I agree that the Right Thing here seems to be to protect the entire compaction
operation by holding a reference, I'm not sure the 2.X line is the appropriate place
for this change at the DataTracker level.  Acquiring and releasing within a single
SSTableScanner is a cleanly paired, RAII-style operation that should be an
"invisible" change from a logical flow / API perspective.  Pushing that operation into
markCompacting and unmarkCompacting, by contrast, means that over 10 upstream users of
those methods have an assumption (and contract) changed on them: namely, that if
references cannot be acquired on the SSTables in question, markCompacting will return
false.  Correct me if I'm wrong on that, or if there's a more appropriate place to
make this change than the DataTracker (I haven't worked much in this section of the
code base).
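
To illustrate what "cleanly tied together" means here, the following is a minimal
sketch of a scanner that takes one reference when it is opened and gives it back
exactly once when it is closed. Table and Scanner are hypothetical stand-ins written
for this example; they are not Cassandra's SSTableReader or SSTableScanner, and the
real classes may differ in every detail.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ScannerRefSketch {
    /** Hypothetical stand-in for a reference-counted sstable (not SSTableReader). */
    public static final class Table {
        // Starts at 1: the reference held by the owner while the table is live.
        private final AtomicInteger refs = new AtomicInteger(1);

        /** Tries to take a reference; fails once the count has dropped to zero. */
        public boolean acquire() {
            while (true) {
                int n = refs.get();
                if (n <= 0) return false;
                if (refs.compareAndSet(n, n + 1)) return true;
            }
        }

        /** Gives a reference back; an unbalanced release trips the assertion. */
        public void release() {
            int n = refs.decrementAndGet();
            if (n < 0) throw new AssertionError("Reference counter " + n);
        }

        public int count() { return refs.get(); }
    }

    /** Hypothetical scanner that owns exactly one reference for its lifetime. */
    public static final class Scanner implements AutoCloseable {
        private final Table table;
        private boolean closed;

        private Scanner(Table table) { this.table = table; }

        /** Returns null if the reference could not be acquired. */
        public static Scanner open(Table table) {
            return table.acquire() ? new Scanner(table) : null;
        }

        @Override
        public void close() {
            if (closed) return;   // closing twice must not release twice
            closed = true;
            table.release();
        }
    }

    public static void main(String[] args) {
        Table t = new Table();
        try (Scanner s = Scanner.open(t)) {
            System.out.println("during scan: " + t.count()); // prints "during scan: 2"
        }
        System.out.println("after scan: " + t.count());      // prints "after scan: 1"
    }
}
```

Because acquire and release both live inside the scanner, callers of the scanner see
no change in contract, which is the property the paragraph above relies on.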

A naive change in DataTracker.markCompacting leads to infinite loops (apparently from
multiple insertion points), so we'd need to go upstream and adjust the various
marking operations to accommodate entries in the SSTableReader collections being
"unmarkable".  My preference here would be to go with _v2, which resolves the
ordering problems introduced in CASSANDRA-7932 without introducing a ref count on the
read path, and to create a separate ticket for 3.0 to pursue the more invasive change
of reference counting all compacting sstables.
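
For the 3.0 ticket, the contract change could look roughly like the sketch below:
marking becomes all-or-nothing and can fail because a reference is unavailable, not
only because a table is already compacting. Tracker and Table are hypothetical names
for this example and do not reflect the actual DataTracker code; the point is only
that callers must treat a false return as possibly permanent instead of retrying
with the same set of tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MarkCompactingSketch {
    /** Hypothetical reference-counted table (not SSTableReader). */
    public static final class Table {
        private int refs = 1; // owner reference while the table is live

        public synchronized boolean acquire() {
            if (refs <= 0) return false;
            refs++;
            return true;
        }

        public synchronized void release() {
            if (--refs < 0) throw new AssertionError("Reference counter " + refs);
        }

        public synchronized int count() { return refs; }
    }

    /** Hypothetical tracker whose marking also holds a reference per table. */
    public static final class Tracker {
        private final Set<Table> compacting = new HashSet<>();

        /** Marks all tables or none; false if any is already marked or unreferenceable. */
        public synchronized boolean markCompacting(List<Table> tables) {
            for (Table t : tables)
                if (compacting.contains(t)) return false;

            List<Table> acquired = new ArrayList<>();
            for (Table t : tables) {
                if (!t.acquire()) {
                    // Roll back so a failed attempt leaves no references behind.
                    for (Table a : acquired) a.release();
                    return false;
                }
                acquired.add(t);
            }
            compacting.addAll(acquired);
            return true;
        }

        /** Unmarks the tables and returns the references taken by markCompacting. */
        public synchronized void unmarkCompacting(List<Table> tables) {
            for (Table t : tables)
                if (compacting.remove(t)) t.release();
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        Table live = new Table();
        Table dead = new Table();
        dead.release(); // owner reference dropped: this table is now "unmarkable"

        // A caller that loops until markCompacting succeeds would spin forever here.
        System.out.println(tracker.markCompacting(Arrays.asList(live, dead))); // prints "false"
        System.out.println(tracker.markCompacting(Arrays.asList(live)));       // prints "true"
        tracker.unmarkCompacting(Arrays.asList(live));
    }
}
```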

As you've mentioned several times, reference counting is tricky to get right.  The
idea of promoting it up to the abstraction of the data tracker for compaction marking
strikes me as a risky change when we already have quite a few failing unit tests on
2.X and bugs to resolve.  I definitely think it's the right thing long-term.

> Reference Counter exception when dropping user type
> ---------------------------------------------------
> 
> Key: CASSANDRA-8399
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8399
> Project: Cassandra
> Issue Type: Bug
> Reporter: Philip Thompson
> Assignee: Joshua McKenzie
> Fix For: 2.1.3
> 
> Attachments: 8399_fix_empty_results.txt, 8399_v2.txt, node2.log, ubuntu-8399.log
> 
> 
> When running the dtest
> {{user_types_test.py:TestUserTypes.test_type_keyspace_permission_isolation}} with
> the current 2.1-HEAD code, very frequently, but not always, when dropping a type,
> the following exception is seen:
> {code}
> ERROR [MigrationStage:1] 2014-12-01 13:54:54,824 CassandraDaemon.java:170 - Exception in thread Thread[MigrationStage:1,5,main]
> java.lang.AssertionError: Reference counter -1 for /var/folders/v3/z4wf_34n1q506_xjdy49gb780000gn/T/dtest-eW2RXj/test/node2/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-14-Data.db
>     at org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1662) ~[main/:na]
>     at org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:164) ~[main/:na]
>     at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:62) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore$8.close(ColumnFamilyStore.java:1943) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:2116) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2029) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1963) ~[main/:na]
>     at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:744) ~[main/:na]
>     at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:731) ~[main/:na]
>     at org.apache.cassandra.config.Schema.updateVersion(Schema.java:374) ~[main/:na]
>     at org.apache.cassandra.config.Schema.updateVersionAndAnnounce(Schema.java:399) ~[main/:na]
>     at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:167) ~[main/:na]
>     at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49) ~[main/:na]
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_67]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_67]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_67]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
> {code}
> Log of the node with the error is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

