[prev in list] [next in list] [prev in thread] [next in thread] 

List:       cassandra-commits
Subject:    [jira] Commented: (CASSANDRA-1752) repair leaving FDs unclosed
From:       "Jonathan Ellis (JIRA)" <jira () apache ! org>
Date:       2010-11-30 18:52:13
Message-ID: 11276708.29291291143133850.JavaMail.jira () thor
[Download RAW message or body]


    [ https://issues.apache.org/jira/browse/CASSANDRA-1752?page=com.atlassian.jira.plu \
gin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965339#action_12965339 \
] 

Jonathan Ellis commented on CASSANDRA-1752:
-------------------------------------------

can't we leave the timing and logging code inside the Helper compaction method to \
reduce duplication in its callers?

> repair leaving FDs unclosed
> ---------------------------
> 
> Key: CASSANDRA-1752
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1752
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Tyler Hobbs
> Fix For: 0.6.9
> 
> Attachments: 1752-0.6.txt
> 
> 
> "We noticed that after a `nodetool repair` was ran, several of our nodes reported \
> high disk usage; -- even one node hit 100% disk usage. After a restart of that \
> node, disk usage drop instantly by 80 gigabytes -- well that was confusing, but we \
> quickly formed the theory that Cassandra must of been holding open references to \
> deleted file descriptors. "Later, i found this node as an example, it is using \
> about 8-10 gigabytes more than it should be -- 118 gigabytes reported by df, yet du \
> reports only 106 gigabytes in the cassandra directory (nothing else on the \
> mahcine). As you can see from the lsof listing, it is holding open FDs to files \
> that no longer exist on the filesystem, and there are no open streams or as far as \
> I can tell other reasons for the deleted sstable to be open. "This seems to be \
> related to running a repair, as we haven't seen it in any other situations before." \
> A quick check of FileStreamTask shows that the obvious base is covered: {code}
> finally
> {
> try
> {
> raf.close();
> }
> catch (IOException e)
> {
> throw new AssertionError(e);
> }
> }
> {code}
> So it seems that either the transfer loop is never finishing to get to that finally \
> block (in which case why isn't it showing up in outbound streams?) or something \
> else is the problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic