

List:       cassandra-user
Subject:    Re: Question about node failure...
From:       Tatu Saloranta <tsaloranta () gmail ! com>
Date:       2010-03-30 0:42:12
Message-ID: 5f7770581003291742s21abcf52y7ae1f4f9a55e33df () mail ! gmail ! com

On Mon, Mar 29, 2010 at 10:40 AM, Ned Wolpert <ned.wolpert@imemories.com> wrote:
> So,  what does "anti-entropy repair" do then?

Presumably it fixes discrepancies between live nodes (caused by
transient failures)?
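
To illustrate what "fixing discrepancies between live nodes" could mean,
here is a toy sketch in Python. It is not Cassandra's actual code: as far
as I understand, the real implementation compares Merkle trees over key
ranges, whereas this compares one digest per replica. The names digest()
and repair(), and the key -> (timestamp, value) layout, are assumptions
made up for the example.

```python
# Toy model of anti-entropy between two live replicas (hypothetical names,
# not Cassandra's real implementation). A replica maps key -> (timestamp, value).
import hashlib


def digest(replica):
    """Hash the sorted contents so two replicas can be compared cheaply."""
    h = hashlib.sha256()
    for key in sorted(replica):
        ts, value = replica[key]
        h.update(repr((key, ts, value)).encode("utf-8"))
    return h.hexdigest()


def repair(a, b):
    """If the digests differ, give both replicas the newest version of each key.

    Returns the number of entries that had to be written.
    """
    if digest(a) == digest(b):
        return 0  # replicas already agree, nothing to do
    fixed = 0
    for key in set(a) | set(b):
        # Pick the version with the highest timestamp among replicas holding the key.
        newest = max((r[key] for r in (a, b) if key in r), key=lambda v: v[0])
        for r in (a, b):
            if r.get(key) != newest:
                r[key] = newest
                fixed += 1
    return fixed
```

Note that both replicas have to be up for this to work, which is why it
would not by itself restore the data of a node that is gone for good.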

> Sounds like you have to 'decommission' the dead node, then I thought run
> 'nodeprobe repair' to get the data adjusted back to a replication factor of
> 3, right?
>
> Also, what is the method to decommission a dead node? pass in the IP address
> of the dead node to nodeprobe on a member of the cluster? I've only used
> 'decommission' to remove the node I ran it on from the cluster... not a
> different node.
>
> It seems like if you decommission a node it should fix the replication
> factor for data that was on that node in this case...

Perhaps it would be good to have a convenience workflow for replacing a
broken host ("squashing lemons")? I would assume that the most common use
case is to replace a host that can't be repaired (or perhaps replacement
is sometimes the best approach anyway), by a combination of removing the
failed host and bringing in a new one. Handling this as a single
high-level logical operation could be more efficient than doing it
step by step.

-+ Tatu +-
