
List:       xen-users
Subject:    [Xen-users] Xen, NIC bonding, ARP problem
From:       Dominik Klein <dk () in-telegence ! net>
Date:       2007-03-30 8:05:45
Message-ID: 460CC4D9.3000209 () in-telegence ! net

Hi xen-users

Xen seems to have problems with NIC bonding and the ARP protocol. I am 
seeing exactly the behaviour described here:
http://arcknowledge.com/gmane.comp.emulators.xen.user/2005-10/msg00154.html

In short:
ping from domU to a host on the same network only starts working after 
about 10 seconds (the delay varies, but it reliably works after 10 
seconds).
tcpdump shows ARP replies from the target, but arp -a in domU shows that 
they do not reach domU immediately. Replies need to be sent several 
times before they get through to domU.
tcpdump also shows that each ARP request is sent twice.

Hard-coding the MAC addresses in the ARP table in domU works around this 
problem, but that does not seem like a good solution.
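
For reference, a minimal sketch of what such a hard-coded entry could look 
like with iproute2. The helper name static_arp_cmd, the IP address, and 
the MAC address are made up for illustration; the function only prints 
the command instead of running it, so it can be tried without root.

```shell
# Hypothetical helper: print (do not run) the iproute2 command that
# pins one permanent neighbour (ARP) entry.
# Arguments: target IP, target MAC, local interface.
static_arp_cmd() {
    ip_addr=$1
    mac=$2
    dev=$3
    printf 'ip neigh replace %s lladdr %s dev %s nud permanent\n' \
        "$ip_addr" "$mac" "$dev"
}

# Example values only; substitute a real host on your network.
static_arp_cmd 192.0.2.1 00:16:3e:00:00:01 eth0
```

Running the printed command as root inside domU would add the entry. It 
has to be repeated for every host domU talks to, which is why it is only 
a workaround.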

Just to be sure we are talking about the same thing: I am talking about 
"active-backup" bonding.
(see /usr/src/linux/Documentation/networking/bonding.txt
or eg http://www.mjmwired.net/kernel/Documentation/networking/bonding.txt)

The network-script used to configure Xen for NIC bonding is listed here:
http://lists.xensource.com/archives/html/xen-users/2006-04/msg00186.html

I think I have found the reason why the described problems happen:

In a normal xen setup, network looks like this:
domU (say ID=1) sees eth0
this eth0 is represented as vif1.0 in dom0 and connected to xenbr0, 
which goes "out" through peth0

brctl show
bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.feffffffffff       no              vif0.0
                                                         peth0
                                                         vif1.0

Neither xenbr0 nor vif1.0 nor peth0 reply to ARP requests. ARP is 
completely handled by domU.

excerpt from ip addr list
xenbr0: <BROADCAST,NOARP,UP>
vif1.0: <BROADCAST,NOARP,UP>
peth0: <BROADCAST,NOARP,UP>

Furthermore, none of these interfaces has an IP-address assigned (in dom0).

In an active-backup NIC bonding setup, both NICs used for the bonding 
device are configured with the same MAC address, but only the currently 
active one has the ARP flag set (ip link set $dev arp on).
In my case, bond0 is made of eth0 and eth2. eth2 is the currently active 
NIC.
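
To check which slave is active, the bonding driver's status file can be 
read. A small sketch, assuming the status text is available on stdin; 
the function name active_slave and the sample text are illustrative, 
not output copied from my machine.

```shell
# Read bonding status text on stdin and print the active slave name.
active_slave() {
    awk -F': ' '/^Currently Active Slave:/ { print $2 }'
}

# Hypothetical sample in the format of /proc/net/bonding/bond0.
sample='Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth2
MII Status: up'

printf '%s\n' "$sample" | active_slave
```

On a real dom0 this would be called as 
active_slave < /proc/net/bonding/bond0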

So the network looks like this in a bonding setup with xen:
domU (again, ID=1) sees eth0
this eth0 is represented as vif1.0 in dom0 and connected to xenbr0, 
which goes "out" through bond0

brctl show xenbr0
bridge name     bridge id               STP enabled     interfaces
xenbr0          8000.000423c0b33c       no              vif0.0
                                                         bond0
                                                         vif1.0

Now comes the tricky part. ARP is *not* deactivated for xenbr0 and bond0.

excerpt from ip addr list
xenbr0: <BROADCAST,MULTICAST,UP>
vif1.0: <BROADCAST,NOARP,UP>
bond0: <BROADCAST,MULTICAST,MASTER,UP>
eth0: <BROADCAST,MULTICAST,NOARP,SLAVE,UP>
eth2: <BROADCAST,MULTICAST,SLAVE,UP>
remember: eth2 is the currently active card in bond0!
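
A quick way to compare the two setups is to list the interfaces that do 
not carry the NOARP flag. This is only a sketch: the function name 
arp_enabled_ifaces is made up, and it expects the trimmed 
"name: <FLAGS>" form used in the excerpts in this mail, not the full 
output of ip addr list.

```shell
# Print the interfaces whose flag list does not contain NOARP.
# Input: lines of the form "name: <FLAG,FLAG,...>" on stdin.
arp_enabled_ifaces() {
    awk -F': ' '$2 ~ /^</ && $2 !~ /NOARP/ { print $1 }'
}

# Flags from the bonding setup excerpt above.
flags='xenbr0: <BROADCAST,MULTICAST,UP>
vif1.0: <BROADCAST,NOARP,UP>
bond0: <BROADCAST,MULTICAST,MASTER,UP>
eth0: <BROADCAST,MULTICAST,NOARP,SLAVE,UP>
eth2: <BROADCAST,MULTICAST,SLAVE,UP>'

printf '%s\n' "$flags" | arp_enabled_ifaces
```

For the bonding excerpt this lists xenbr0, bond0 and eth2. Fed with the 
excerpt from the normal setup it lists nothing, since all three 
interfaces there are NOARP.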

Also, bond0 AND xenbr0 both have IP addresses assigned. This seemed odd 
to me from the start, but I did not know what to do about it.
Another odd thing compared to a "normal xen network setup" is that the 
network-bridge-bonding script does not create peth[02], but keeps using 
bond0 "as is".

So here's what I already tried to solve this problem:

ip link set bond0 arp off
no difference

ip link set xenbr0 arp off
no difference in domU
dom0 can no longer talk to unknown hosts, as it does not do any ARP any more

ip addr purge bond0
no difference

ip addr purge xenbr0
no difference in domU
dom0 can no longer do any IP networking, as xenbr0 is the device the 
routes are set on.
Adding appropriate routes for bond0 (which still has the addresses) does 
not solve this problem.

So if anybody has an idea how to get NIC bonding to work together with 
Xen, please let me know. If you need any more information, just ask. I 
know this is a fairly complex situation, but I would really appreciate 
some help here.

For completeness: I am using openSuSE 10.2 with Xen 3.0.4 and kernel 
2.6.16.33-xen in dom0 and domU.

Regards
Dominik

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users