List:       activemq-users
Subject:    Re: Artemis not honoring quorum-vote-wait setting
From:       Lewis Gardner <lewisgardner () gmail ! com>
Date:       2021-07-12 21:54:01
Message-ID: CAOwgqJDpGKFLoO2RnPN1+EBPb-cW_c8oSRU2GmcyNKRyp689Cg () mail ! gmail ! com

Hi Domenico,

Thanks for the explanation, but if that is correct then the documentation at
https://activemq.apache.org/components/artemis/documentation/latest/network-isolation.html
is horribly wrong, as it *explicitly* states that this option applies to both
master and slave:

>>>

Quorum voting is used by both the live and the backup to decide what to do
if a replication connection is disconnected. Basically the server will
request each live server in the cluster to vote as to whether it thinks the
server it is replicating to or from is still alive. You can also configure
the time for which the quorum manager will wait for the quorum vote
response. The default time is 30 seconds you can configure like so for
master and also for the slave:

<ha-policy>
  <replication>
    <master>
      <quorum-vote-wait>12</quorum-vote-wait>
    </master>
  </replication>
</ha-policy>
<<<
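
For comparison, if the setting is only honored by backups, the slave-side
counterpart would presumably look like the sketch below. It assumes that
`quorum-vote-wait` is accepted under `<slave>` exactly as the quoted
documentation says; that has not been verified anywhere in this thread.

```xml
<!-- Sketch only: slave-side variant of the documented example. -->
<ha-policy>
  <replication>
    <slave>
      <!-- seconds to wait for quorum vote responses (value from the docs example) -->
      <quorum-vote-wait>12</quorum-vote-wait>
    </slave>
  </replication>
</ha-policy>
```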

In the meantime, I have fixed the situation by explicitly setting
quorum-size to "2" (which I believed would be the automatic setting
for a 3-pair cluster).
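
A sketch of that workaround, assuming `quorum-size` goes in the same
`<master>` block as the other voting options. The remaining elements are
copied from the Active-3 configuration quoted below; the exact placement
used in the real configuration is not shown in this thread.

```xml
<!-- Sketch only: Active-3 master block with an explicit quorum size. -->
<ha-policy>
  <replication>
    <master>
      <vote-on-replication-failure>true</vote-on-replication-failure>
      <!-- explicit quorum size instead of relying on the automatic value -->
      <quorum-size>2</quorum-size>
      <check-for-live-server>true</check-for-live-server>
      <group-name>server3</group-name>
    </master>
  </replication>
</ha-policy>
```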

regards,

Lewis


On Mon, 12 Jul 2021 at 21:57, Domenico Francesco Bruscino <
bruscinodf@gmail.com> wrote:

> Hi Lewis,
>
> the `quorum-vote-wait` parameter only affects nodes that are acting as
> backup. It defines how long the backup nodes wait for quorum vote
> responses, not how long they wait before sending a quorum vote request.
> So this parameter will not help Backup-1 participate in the quorum vote.
>
> Anyway, to avoid split-brain I would not keep Active-3 live without a
> quorum. ARTEMIS-2716[1] should address your use case.
>
> [1] https://issues.apache.org/jira/browse/ARTEMIS-2716
>
> Regards,
> Domenico
>
> On Mon, 12 Jul 2021 at 06:10, Lewis Gardner <lewisgardner@gmail.com>
> wrote:
>
> > I have a 3-active/backup pair HA setup with each pair on a separate
> > network segment.
> >
> > Seg 1: Active-1 and Backup-3 (backup for Active-3)
> > Seg 2: Active-2 and Backup-1 (backup for Active-1)
> > Seg 3: Active-3 and Backup-2 (backup for Active-2)
> >
> > I am using the "vote-on-replication-failure = true" option to
> > automatically shut down active nodes which have been network isolated.
> >
> > If I disconnect network segment 1, Backup-1 on segment 2 properly
> > announces itself as Live. Active-3, however, attempts to get quorum
> > votes from both Active-1 and Active-2, does not receive a reply from
> > Active-1 (as that one is on the same failed network segment as
> > Backup-3) and shuts itself down after 5 seconds with "Timeout waiting
> > for quorum vote responses".
> >
> > I have tried increasing the timeout to allow Backup-1 to finish becoming
> > Live and participate in Active-3's quorum request, but Active-3 always
> > prints "Waiting 5 seconds for quorum vote results", regardless of the
> > value I specify in the "quorum-vote-wait" option.
> >
> > The Active-3 configuration is shown below:
> >
> > <connectors>
> >         <connector name="netty-active-1">tcp://192.168.2.20:61616?sslEnabled=true</connector>
> >         <connector name="netty-active-2">tcp://192.168.2.21:61616?sslEnabled=true</connector>
> >         <connector name="netty-active-3">tcp://192.168.2.22:61616?sslEnabled=true</connector>
> >         <connector name="netty-backup-1">tcp://192.168.2.20:61716?sslEnabled=true</connector>
> >         <connector name="netty-backup-2">tcp://192.168.2.21:61716?sslEnabled=true</connector>
> >         <connector name="netty-backup-3">tcp://192.168.2.22:61716?sslEnabled=true</connector>
> > </connectors>
> >
> > <cluster-connections>
> >         <cluster-connection name="my-cluster">
> >                 <connector-ref>netty-active-3</connector-ref>
> >                 <check-period>1000</check-period>
> >                 <connection-ttl>5000</connection-ttl>
> >                 <call-timeout>5000</call-timeout>
> >                 <retry-interval>500</retry-interval>
> >                 <retry-interval-multiplier>1.0</retry-interval-multiplier>
> >                 <max-retry-interval>5000</max-retry-interval>
> >                 <initial-connect-attempts>-1</initial-connect-attempts>
> >                 <reconnect-attempts>-1</reconnect-attempts>
> >                 <use-duplicate-detection>true</use-duplicate-detection>
> >                 <message-load-balancing>ON_DEMAND</message-load-balancing>
> >                 <max-hops>1</max-hops>
> >                 <notification-interval>1000</notification-interval>
> >                 <notification-attempts>2</notification-attempts>
> >                 <static-connectors>
> >                         <connector-ref>netty-active-2</connector-ref>
> >                         <connector-ref>netty-active-3</connector-ref>
> >                         <connector-ref>netty-backup-1</connector-ref>
> >                         <connector-ref>netty-backup-2</connector-ref>
> >                         <connector-ref>netty-backup-3</connector-ref>
> >                 </static-connectors>
> >         </cluster-connection>
> > </cluster-connections>
> >
> > <ha-policy>
> >         <replication>
> >                 <master>
> >                         <vote-on-replication-failure>true</vote-on-replication-failure>
> >                         <quorum-vote-wait>12</quorum-vote-wait>
> >                         <check-for-live-server>true</check-for-live-server>
> >                         <group-name>server3</group-name>
> >                 </master>
> >         </replication>
> > </ha-policy>
> >
> > How can I make Active-3 wait for Backup-1 to become live before shutting
> > down?
> >
> > regards,
> > Lewis
> >
>

