'(Artemis) Lost Messages with Colocated Backup Scale-down Failback'


List:       activemq-users
Subject:    (Artemis) Lost Messages with Colocated Backup Scale-down Failback
From:       Jason Pyle <seth.pyle86 () gmail ! com>
Date:       2019-01-25 16:39:46
Message-ID: CAJX9eLB1f89FyebPYTWZcDJhR7ZUFQbwJUnW0sb1Xbt9A8B+LQ () mail ! gmail ! com


I sent this last week via Nabble, but I don't think it was mailed out to
anybody. If it was, I apologize for the spam.

We're developing a strategy for backups and HA. Ideally we'd like to use
colocated backups to ensure data integrity and availability with scale-down
configured from the slave to the master host.

We ran into an issue when bringing a server back up. Consider this
situation:

Servers 1 and 2 are brought up and create colocated backups 1b and 2b,
with 1b living on server 2 and 2b living on server 1. If I bring server 2
offline, 2b comes online and then scales down into server 1 as intended.
When I bring server 2 back up, 2b does not fail back. This leads to server 2
starting an infinite vote loop to find another server to create a backup
for it. Since server 1 already holds backup 2b and is only configured for 1
backup, it replies indefinitely that it does not have space for another
backup.
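
The sequence above can be modelled with a toy Python sketch. It is not
Artemis code: the Broker class, request_backup and find_backup_host are
hypothetical names invented for illustration, and the retry limit exists
only so the loop terminates. It assumes, as described above, that the slot
on server 1 stays occupied by 2b after scale-down.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy stand-in for a broker that can host a limited number of backups."""
    name: str
    max_backups: int = 1
    backups: list = field(default_factory=list)  # names of backups hosted here

    def request_backup(self, backup_name: str) -> bool:
        """Accept a backup only if a slot is still free."""
        if len(self.backups) >= self.max_backups:
            return False  # "no space for another backup"
        self.backups.append(backup_name)
        return True


def find_backup_host(peers, backup_name, max_attempts):
    """Ask every peer in turn; return the attempt number that succeeded, or None."""
    for attempt in range(1, max_attempts + 1):
        for peer in peers:
            if peer.request_backup(backup_name):
                return attempt
    return None


# Assumed state after server 2 went down: server 1 still holds 2b's slot.
server1 = Broker("server1", max_backups=1, backups=["2b"])

# Server 2 restarts and asks for a new backup; every attempt is refused.
stuck = find_backup_host([server1], "2b-new", max_attempts=5)

# If the stale slot were released (what a failback would do), the request succeeds.
server1.backups.remove("2b")
recovered = find_backup_host([server1], "2b-new", max_attempts=5)
```

In this model server 2 has no backup for as long as the stale slot is held,
which is the window in which a crash would lose messages.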

In this state, if more messages are sent to server 2 and server 2 then
crashes, those messages are lost.

I've created an example of this problem, based on one of the examples in
the Artemis source, here:
https://github.com/SethPyle376/colocated-scaledown-problem

I've tested this situation with both replication and shared-store, and the
problem persists. Any help would be great; we need colocated scale-down
failback working correctly.

