
List:       cassandra-commits
Subject:    [jira] [Commented] (CASSANDRA-5210) DB is randomly and undetectably corrupted during high traffic co
From:       "Jonathan Ellis (JIRA)" <jira () apache ! org>
Date:       2013-01-31 23:43:12
Message-ID: JIRA.12630217.1359667010980.226316.1359675792908 () arcas


    [ https://issues.apache.org/jira/browse/CASSANDRA-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568245#comment-13568245 ]

Jonathan Ellis commented on CASSANDRA-5210:
-------------------------------------------

This sounds a lot like a custom comparator that doesn't actually impose a total ordering of its data.

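A minimal, self-contained Java sketch of the failure mode the comment describes, using no Cassandra classes. BROKEN is a hypothetical comparator that forms a cycle (a < b < c < a), so it is not transitive and cannot impose a total ordering. read is also hypothetical: it assumes a simplified read path in which requested names arrive in comparator order and each binary search resumes just past the previous hit. It illustrates the principle only and is not Cassandra's actual slice code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NonTotalComparatorDemo {
    // Hypothetical broken comparator: a "rock-paper-scissors" cycle a < b < c < a.
    // It is not transitive, so it does not impose a total ordering.
    // Only meaningful for the three names in CYCLE.
    static final List<String> CYCLE = Arrays.asList("a", "b", "c");
    static final Comparator<String> BROKEN = (x, y) -> {
        if (x.equals(y)) return 0;
        int diff = (CYCLE.indexOf(y) - CYCLE.indexOf(x) + 3) % 3;
        return diff == 1 ? -1 : 1; // x < y when y is one step "after" x in the cycle
    };

    // Binary search for key in sorted[from, to); returns its index or -1.
    static int search(List<String> sorted, int from, int to, String key, Comparator<String> cmp) {
        int low = from, high = to - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int c = cmp.compare(sorted.get(mid), key);
            if (c < 0) low = mid + 1;
            else if (c > 0) high = mid - 1;
            else return mid;
        }
        return -1;
    }

    // Hypothetical read path: requested names are assumed to be in comparator order,
    // so each search resumes just past the previous hit instead of restarting at 0.
    static List<String> read(List<String> stored, List<String> requested, Comparator<String> cmp) {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (String name : requested) {
            int idx = search(stored, start, stored.size(), name, cmp);
            if (idx >= 0) {
                result.add(stored.get(idx));
                start = idx + 1;
            } else {
                result.add(null); // reported as missing
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Every adjacent pair is ascending under BROKEN, so the data "looks" sorted.
        List<String> stored = Arrays.asList("a", "b", "c");
        System.out.println(read(stored, Arrays.asList("a"), BROKEN));      // [a]
        System.out.println(read(stored, Arrays.asList("a", "b"), BROKEN)); // [a, b]
        // c < a under BROKEN, so this request order counts as "sorted".
        System.out.println(read(stored, Arrays.asList("c", "a"), BROKEN)); // [c, null]
        // A genuine total order returns every stored name.
        System.out.println(read(stored, Arrays.asList("a", "c"), Comparator.naturalOrder())); // [a, c]
    }
}
```

Under BROKEN, a is found on its own and alongside b, but goes missing when requested together with c, although the stored data never changed. That query-dependent behavior is what an ordering violation can look like from the outside.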
> DB is randomly and undetectably corrupted during high traffic column family flushes
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-5210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5210
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.5, 0.8.6, 0.8.7, 0.8.8, 0.8.9, 0.8.10, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8, 1.1.9, 1.2.0, 1.2.1
> Environment: Cassandra 0.8+, OS/X, java version "1.6.0_37" 
> Reporter: Elden Bishop
> 
> Writes during high-traffic column family flushes corrupt the DB and make slice queries return incorrect data. Any multi-column write on any version of Cassandra can put the DB in a state where some columns cannot be read alongside other columns, e.g.:
> {{
> // *** for any NON-NULL column (e.g. col_a=>AAA)
> cqlsh> SELECT 'col_a' FROM test WHERE KEY='row_a';
> returns:     'AAA'
> // *** it can disappear when queried alongside another column
> cqlsh> SELECT 'col_a', 'col_b' FROM test WHERE KEY='row_a';
> returns:      null,   'BBB' // *** col_a is MISSING
> // *** but it depends on the other columns
> cqlsh> SELECT 'col_a', 'col_b', 'col_c' FROM test WHERE KEY='row_a';
> returns:     'AAA',   'BBB',   'CCC' // *** col_a is BACK
> }}
> Once in this state, the database is corrupt and essentially returns random data depending on which columns you query. Single-column queries always return correct results, so there is no way to verify the data. No errors are logged during corruption, and it is impossible to detect without querying all combinations of all columns. To reproduce:
> 1. Unzip a distribution of Cassandra and create a test.test column family.
> 2. In a loop, alternate between updating either row 'a' or a random row. Write a random value to four random columns (out of 10000). Keep track of all columns set in row 'a'.
> 3. Each pass through the loop, query four random columns (out of 10000) from row 'a'. If a column that is known to be set is null, print out the columns that were requested during the query.
> 4. The DB is now corrupt and will return the column if queried by itself, but will return null if queried alongside the columns that triggered the error. This is a permanent condition.
> Observations: This bug only manifests directly after a high-traffic column family flush appears in the log. This is a correlation based on simply watching the log. There are no errors or warnings of any kind.
> Workaround: Any multi-column read is potentially invalid, and corruption is virtually undetectable. The only workaround is never writing or reading more than a single column in a query. I have a simple Groovy script that can trigger the error. I have verified the behavior on Cassandra versions as old as 0.8.1.
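The reproduction steps in the quoted report amount to a write/read loop with a consistency check. Below is a minimal Java sketch of steps 2 and 3, assuming a hypothetical ColumnStore interface in place of a real Cassandra client (the reporter's Groovy script is not part of this message). InMemoryStore is likewise hypothetical and exists only so the sketch runs without a cluster; being a plain map, it should never report a miss.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class ReproLoop {
    // Hypothetical client abstraction; a real run would back this with a client
    // writing to and reading from the test.test column family.
    interface ColumnStore {
        void write(String row, Map<String, String> columns);
        Map<String, String> read(String row, List<String> names); // missing names map to null
    }

    // In-memory stand-in so the loop can run without a cluster.
    static class InMemoryStore implements ColumnStore {
        private final Map<String, Map<String, String>> rows = new HashMap<>();

        public void write(String row, Map<String, String> columns) {
            rows.computeIfAbsent(row, k -> new HashMap<>()).putAll(columns);
        }

        public Map<String, String> read(String row, List<String> names) {
            Map<String, String> stored = rows.getOrDefault(row, new HashMap<>());
            Map<String, String> out = new HashMap<>();
            for (String n : names) out.put(n, stored.get(n));
            return out;
        }
    }

    static final int NUM_COLUMNS = 10000;
    static final int COLUMNS_PER_OP = 4;

    // Picks four random column names (with replacement) out of 10000.
    static List<String> randomColumns(Random rnd) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < COLUMNS_PER_OP; i++) names.add("col_" + rnd.nextInt(NUM_COLUMNS));
        return names;
    }

    // Runs steps 2-3 and returns every query that came back with a null
    // for a column already known to be set in row "a".
    static List<List<String>> run(ColumnStore store, int iterations, long seed) {
        Random rnd = new Random(seed);
        Set<String> setInRowA = new HashSet<>();
        List<List<String>> badQueries = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            // Step 2: alternate between row "a" and a random row.
            String row = (i % 2 == 0) ? "a" : "row_" + rnd.nextInt(1_000_000);
            Map<String, String> values = new HashMap<>();
            for (String col : randomColumns(rnd)) values.put(col, Long.toHexString(rnd.nextLong()));
            store.write(row, values);
            if (row.equals("a")) setInRowA.addAll(values.keySet());

            // Step 3: query four random columns from row "a" and check known-set ones.
            List<String> requested = randomColumns(rnd);
            Map<String, String> result = store.read("a", requested);
            for (String col : requested) {
                if (setInRowA.contains(col) && result.get(col) == null) {
                    System.out.println("missing " + col + " when querying " + requested);
                    badQueries.add(requested);
                    break;
                }
            }
        }
        return badQueries;
    }

    public static void main(String[] args) {
        List<List<String>> bad = run(new InMemoryStore(), 100_000, 42L);
        System.out.println(bad.size() + " suspicious queries");
    }
}
```

The sketch only fixes the shape of the loop and of the check (a null for a column already written to row 'a'); plugging in a real client behind ColumnStore is left open.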

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

