List:       keepalived-devel
Subject:    Re: [Keepalived-devel] VRRP: Question about missing GARPs when
From:       Francis Joanis <francis.joanis () gmail ! com>
Date:       2009-02-27 18:19:23
Message-ID: c14fc6470902271019k690a18e5l78023b0d3a1610eb () mail ! gmail ! com

Hi again,

Please note that there's an error in the broadcast address in
my configurations (it should read 10.0.1.255, not 10.0.1.95).
However, I redid my tests with the corrected address and got the same results.

Regards,
Francis

On Fri, Feb 27, 2009 at 11:29 AM, Francis Joanis
<francis.joanis@gmail.com> wrote:
> Hi,
>
> I am trying to use keepalived to implement HA between two servers
> using a single VRRP virtual IP address.
>
> I've found an issue that seems to be present in keepalived 1.1.11
> through 1.1.15 (I didn't try anything prior to 1.1.11; I also tried
> 1.1.16, but see the comment at the end of this email). I am using
> CentOS 4.7 i386.
>
> Here's the scenario:
>
> Configuration of primary server:
>
> vrrp_instance VI_1 {
>     state MASTER
>     interface eth0
>     virtual_router_id 12
>     priority 100
>     authentication {
>         auth_type PASS
>         auth_pass test123
>     }
>     virtual_ipaddress {
>         10.0.1.150 brd 10.0.1.95 dev eth0
>     }
> }
>
> Configuration of the other server:
>
> vrrp_instance VI_1 {
>     state BACKUP
>     interface eth0
>     virtual_router_id 12
>     priority 50
>     authentication {
>         auth_type PASS
>         auth_pass test123
>     }
>     virtual_ipaddress {
>         10.0.1.150 brd 10.0.1.95 dev eth0
>     }
> }
>
> 1. Have both servers ready (the primary one owning the VIP).
> 2. On the primary server, run tethereal to show only the VRRP
>    advertisements and the ARPs related to the VIP:
>    tethereal -R "vrrp or (arp and arp.dst.proto_ipv4 == 10.0.1.150)"
> 3. On the primary server as well, run the following iptables commands:
>    iptables -A OUTPUT -d 224.0.0.18 -j DROP; iptables -A INPUT -d 224.0.0.18 -j DROP; sleep 4; iptables -F
> 4. Repeat step 3 and watch the tethereal (Wireshark) output.
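Since the problem only shows up about 1 time in 10, it may help to script step 3 so it can be repeated many times. The following is a minimal sketch, assuming Linux, root privileges and the iptables binary on PATH; the function names are hypothetical. Unlike the one-liner above, it removes exactly the two rules it added (iptables -D) instead of running iptables -F, which flushes every rule in the filter table. With dry_run=True (the default) it only returns the commands it would run.

```python
import subprocess
import time

VRRP_GROUP = "224.0.0.18"  # IPv4 multicast group used for VRRP advertisements


def blackout_commands(group=VRRP_GROUP):
    """Return (block, unblock) iptables argv lists for one VRRP blackout."""
    rules = [
        ["OUTPUT", "-d", group, "-j", "DROP"],  # stop our own adverts leaving
        ["INPUT", "-d", group, "-j", "DROP"],   # stop the peer's adverts arriving
    ]
    block = [["iptables", "-A"] + rule for rule in rules]
    # Delete only the rules added above instead of flushing the whole table.
    unblock = [["iptables", "-D"] + rule for rule in rules]
    return block, unblock


def run_blackout(seconds=4.0, group=VRRP_GROUP, dry_run=True):
    """Drop VRRP traffic for `seconds`, then restore it; returns the argv lists."""
    block, unblock = blackout_commands(group)
    executed = []
    for argv in block:
        executed.append(argv)
        if not dry_run:
            subprocess.run(argv, check=True)
    if not dry_run:
        time.sleep(seconds)
    for argv in unblock:
        executed.append(argv)
        if not dry_run:
            subprocess.run(argv, check=True)
    return executed


if __name__ == "__main__":
    # Dry run: only print what would be executed.
    for argv in run_blackout():
        print(" ".join(argv))
```

For example, `for _ in range(20): run_blackout(dry_run=False); time.sleep(15)` would run 20 blackouts with an (arbitrary) 15-second pause so the nodes can settle between runs.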
>
> The goal of step 3 is to make both nodes believe that they can no
> longer talk to each other, and then to re-establish their
> "connection" after 4 seconds.
>
> Note that the primary server's MAC is Vmware_3f:39:0f and its IP is
> 10.0.1.194; the backup server's are Vmware_f8:92:11 and 10.0.1.193.
>
> Normally, this is what happens:
>
> 0.000000   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 0.999963   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 2.001293   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 3.002984   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 6.808902   10.0.1.193 -> 224.0.0.18   VRRP Announcement (v2)
> 7.811786 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 7.811871 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 7.812184 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 7.812190 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 7.812195 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 7.812258   10.0.1.193 -> 224.0.0.18   VRRP Announcement (v2)
> 7.812695   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 7.812904 Vmware_3f:39:0f -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request) (duplicate use of 10.0.1.150 detected!)
> 7.813046 Vmware_3f:39:0f -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request) (duplicate use of 10.0.1.150 detected!)
> 7.813144 Vmware_3f:39:0f -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request) (duplicate use of 10.0.1.150 detected!)
> 7.813227 Vmware_3f:39:0f -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request) (duplicate use of 10.0.1.150 detected!)
> 7.813309 Vmware_3f:39:0f -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request) (duplicate use of 10.0.1.150 detected!)
> 8.815182   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 9.815507   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 10.816186   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 11.816746   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 12.817276   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
>
> As expected, the primary server, upon receiving the lower-priority
> advertisement from 10.0.1.193, re-elects itself as the owner of the
> VIP and sends GARPs.
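Why a 4-second blackout is enough to make the backup take over: in VRRPv2 (RFC 3768) a backup declares the master down after Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, where Skew_Time = (256 - Priority) / 256 seconds. The sketch below assumes keepalived follows this formula and that advert_int is at its default of 1 second (the configurations above don't set it, and the traces show advertisements about 1 s apart). Under those assumptions the backup (priority 50) should take over about 3.80 s after the last advertisement it heard, which is consistent with the gap of about 3.81 s between 10.0.1.194's last advertisement and 10.0.1.193's first one in the trace above (3.002984 to 6.808902), and with the same gap in the failure trace (64.784799 to 68.591456).

```python
def skew_time(priority: int) -> float:
    """RFC 3768 Skew_Time in seconds: (256 - Priority) / 256."""
    if not 1 <= priority <= 255:
        raise ValueError("VRRP priority must be in 1..255")
    return (256 - priority) / 256.0


def master_down_interval(priority: int, advert_int: float = 1.0) -> float:
    """RFC 3768 Master_Down_Interval: 3 * Advertisement_Interval + Skew_Time."""
    return 3 * advert_int + skew_time(priority)


if __name__ == "__main__":
    for name, prio in (("primary", 100), ("backup", 50)):
        print(f"{name} (priority {prio}): {master_down_interval(prio):.4f} s")
    # Observed gaps: last advert from 10.0.1.194 -> first advert from 10.0.1.193
    print(f"normal trace gap:  {6.808902 - 3.002984:.4f} s")
    print(f"failure trace gap: {68.591456 - 64.784799:.4f} s")
```

The small extra delay in the traces (about 1 ms) is plausibly just timer and transmission latency.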
>
> However, maybe 1 out of 10 times, the following happens:
>
> 60.779839   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 61.781103   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 62.781640   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 63.783367   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 64.784799   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 68.591456   10.0.1.193 -> 224.0.0.18   VRRP Announcement (v2)
> 69.592976 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 69.593290 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 69.593296 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 69.593471 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 69.593476 Vmware_f8:92:11 -> Broadcast    ARP Gratuitous ARP for 10.0.1.150 (Request)
> 69.593652   10.0.1.193 -> 224.0.0.18   VRRP Announcement (v2)
> 69.789341   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 70.789822   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 71.790390   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
> 72.791922   10.0.1.194 -> 224.0.0.18   VRRP Announcement (v2)
>
> In this case, no GARPs are sent after the advertisement from 10.0.1.193.
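For reference, each "Gratuitous ARP for 10.0.1.150 (Request)" line in the traces is a broadcast ARP request whose sender and target IP addresses are both the VIP; Wireshark flags an ARP packet as gratuitous when those two addresses match. A minimal sketch of building such a frame, assuming a standard 42-byte Ethernet/ARP request with a zeroed target hardware address (implementations differ on that field). build_garp_frame is a hypothetical helper, and the full MAC 00:0c:29:3f:39:0f is an assumption: the traces only show Wireshark's Vmware_3f:39:0f shorthand.

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"bad MAC address: {mac!r}")
    return raw


def build_garp_frame(src_mac: str, vip: str) -> bytes:
    """Build a gratuitous ARP request announcing that `vip` is at `src_mac`."""
    sha = mac_bytes(src_mac)
    spa = socket.inet_aton(vip)
    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    eth_header = struct.pack("!6s6sH", b"\xff" * 6, sha, 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # HTYPE: Ethernet
        0x0800,       # PTYPE: IPv4
        6,            # HLEN: MAC length
        4,            # PLEN: IPv4 length
        1,            # OPER: request
        sha,          # sender hardware address
        spa,          # sender protocol address = VIP
        b"\x00" * 6,  # target hardware address (zeroed here, an assumption)
        spa,          # target protocol address = VIP, which makes it "gratuitous"
    )
    return eth_header + arp_payload


if __name__ == "__main__":
    frame = build_garp_frame("00:0c:29:3f:39:0f", "10.0.1.150")  # assumed full MAC
    print(len(frame), frame.hex())
```

Actually transmitting the frame would need a Linux AF_PACKET raw socket and root privileges, which this sketch deliberately leaves out.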
>
> Moreover, when this happens, the keepalived log doesn't even show these messages:
>
> Feb 27 11:00:01 localhost Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
> Feb 27 11:00:01 localhost Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 10.0.1.150
>
> It is as if keepalived was never notified of the advertisement from
> the other server, as if the IP stack never delivered that message to
> keepalived's socket.
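To make the expected behaviour concrete, here is roughly the decision a master makes when an advertisement does reach it. The sketch combines RFC 3768's master-state rules (priority 0 means the sender is resigning; a higher priority, or an equal priority from a higher primary IP address, means stepping down) with the reaction to a lower-priority advertisement that the two keepalived log lines above show. RFC 3768 itself says a master simply discards a lower-priority advertisement, so that branch reflects keepalived's logged behaviour, not the RFC. It is a model, not keepalived's code, and all names are hypothetical. Every branch, including the one that sends GARPs, runs only after the packet has been handed to the VRRP socket, which is consistent with seeing neither the log lines nor the GARPs.

```python
import ipaddress
from enum import Enum


class Action(Enum):
    SEND_ADVERT = "send advertisement, reset Adver_Timer"
    BECOME_BACKUP = "transition to BACKUP"
    FORCE_ELECTION = "send advertisement and gratuitous ARPs"
    DISCARD = "discard advertisement"


def master_on_advert(local_prio: int, local_ip: str,
                     advert_prio: int, sender_ip: str) -> Action:
    """Model of what a MASTER does with a received VRRP advertisement."""
    if advert_prio == 0:
        # Sender is resigning mastership: re-assert quickly (RFC 3768).
        return Action.SEND_ADVERT
    sender_wins = advert_prio > local_prio or (
        advert_prio == local_prio
        and ipaddress.ip_address(sender_ip) > ipaddress.ip_address(local_ip)
    )
    if sender_wins:
        return Action.BECOME_BACKUP  # RFC 3768: the sender becomes master
    if advert_prio < local_prio:
        # Matches the keepalived log: "Received lower prio advert, forcing
        # new election" followed by "Sending gratuitous ARPs".
        return Action.FORCE_ELECTION
    # Equal priority, lower sender IP: RFC 3768 says discard (assumed here).
    return Action.DISCARD
```

In the scenario above, once the iptables rules are flushed, the primary (priority 100) receiving 10.0.1.193's advertisement should hit FORCE_ELECTION, while 10.0.1.193 (priority 50) receiving 10.0.1.194's should hit BECOME_BACKUP.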
>
> Are you guys aware of this issue?
>
> This also raises another question: what if the two servers are
> connected over a "complex" WAN configuration and the links between
> them go down? Once the links are re-established, there could be cases
> where the advertisement from the backup server is never seen by the
> primary server, meaning that the primary would have no way of knowing
> that the backup ever owned the VIP.
>
> For example, if the one-way path from the primary to the backup is
> re-established before the path from the backup to the primary, the
> backup server would receive the primary's advertisements and stop
> sending its own before the path to the primary was truly usable.
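The race described above can be shown with a toy timeline. All assumptions here are hypothetical: both nodes advertise every second, the backup (currently acting as master) at whole seconds and the primary half a second later; the backup steps down the moment a primary advertisement reaches it; and each one-way path comes back up at a fixed time. The function reports whether any advertisement from the backup reaches the primary before the backup goes quiet.

```python
import math


def primary_sees_backup_advert(p2b_up: float, b2p_up: float,
                               interval: float = 1.0,
                               primary_offset: float = 0.5) -> bool:
    """Toy model: does the primary ever hear the backup while it is master?

    p2b_up: time the primary -> backup path comes back up.
    b2p_up: time the backup -> primary path comes back up.
    """
    # First primary advert (sent at primary_offset + k*interval) that can
    # reach the backup; the backup stops advertising at that instant.
    k = max(0, math.ceil((p2b_up - primary_offset) / interval))
    step_down = primary_offset + k * interval
    # First backup advert (sent at j*interval) that can reach the primary.
    first_delivered = max(0, math.ceil(b2p_up / interval)) * interval
    return first_delivered < step_down
```

For example, with the primary-to-backup path back at t = 10 s and the backup-to-primary path at t = 12 s, the backup falls silent at 10.5 s and the primary never learns that the backup was ever master; swap the two times and it does.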
>
> In that case, a possible (but perhaps crude) fix would be to have the
> server currently elected as master periodically send GARPs (say, every
> 30 seconds, or at a configurable interval) to ensure that the network
> stays correctly updated. If it makes sense, I could even try patching
> keepalived to test it out and share the patch with you.
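A minimal sketch of the proposed periodic GARP refresh, assuming a hypothetical configurable interval (30 s in the example) and a stand-in send_garps callback for whatever actually sends the GARPs on eth0 for the VIP; none of these names exist in keepalived 1.1.x. The timer is armed when the instance becomes MASTER and disarmed when it leaves that state, so only the current master refreshes.

```python
from typing import Callable, Optional


class GarpRefresher:
    """Re-send gratuitous ARPs every `interval` seconds while MASTER (sketch)."""

    def __init__(self, interval: float, send_garps: Callable[[], None]) -> None:
        self.interval = interval
        self.send_garps = send_garps
        self.next_due: Optional[float] = None  # None means "not MASTER"

    def on_become_master(self, now: float) -> None:
        self.send_garps()  # the usual GARPs sent on the transition to MASTER
        self.next_due = now + self.interval

    def on_leave_master(self) -> None:
        self.next_due = None

    def tick(self, now: float) -> None:
        """Call from the main loop; sends GARPs once a refresh is due."""
        if self.next_due is not None and now >= self.next_due:
            self.send_garps()
            # Re-arm relative to `now`, so a late tick yields one refresh, not a burst.
            self.next_due = now + self.interval
```

Re-arming from `now` rather than from the previous due time trades exact spacing for never sending several rounds of GARPs back to back after a stall.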
>
> Note about 1.1.16: I did try it on CentOS 4.7, but saw strange
> behavior. Without warning, the servers would simply stop sending
> their advertisements, and then nothing would work normally. I might
> give it a shot under CentOS 5.x and let you know.
>
> Please let me know if you need more information,
>
> Thanks,
> Francis
>

_______________________________________________
Keepalived-devel mailing list
Keepalived-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/keepalived-devel
