
List:       openvswitch-discuss
Subject:    [ovs-discuss] Openvswitch network disconnect
From:       alexw@nicira.com (Alex Wang)
Date:       2015-06-26 17:37:59
Message-ID: CAArS4XVxsqd46npiyFNvuW_O=NNdChqkZoqvJQ81PpDP3ncqQg@mail.gmail.com

Hey Chris,

Sorry for the very delayed reply.

I just checked the core file; it was very useful.

It seems like there is a long-running thread holding the 'pthread_rwlock_rdlock'
(possibly thread 6?) and starving the other threads (including the main
'ovs-vswitchd' thread, which may explain why your ovs-appctl commands hang)...
To investigate further, we need the debug info package installed, as warned in
the gdb trace:
"""
Missing separate debuginfos, use: debuginfo-install
openvswitch-2.3.0-1.x86_64
"""

I tried checking out OVS v2.3.0, compiling it on my local machine, gdb'ing into
it, and running 'info symbol <address in the gdb backtrace>'...  it pointed me
to a totally unrelated function, so we need the debug info package from the
same build.

So, is the OVS package you used publicly downloadable?  If so, could you point
me to it so that I can investigate myself?  If not, could you translate this
trace for me using `info symbol <address>`?  (See the sketch after the trace.)
"""

Thread 6 (Thread 0x7f76affff700 (LWP 2226)):
#0  0x00007f76d201505e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f76d1f9a16b in _L_lock_9503 () from /lib64/libc.so.6
#2  0x00007f76d1f976a6 in malloc () from /lib64/libc.so.6
#3  0x00000000004b07a5 in ?? ()
#4  0x00000000004b18b1 in ?? ()
#5  0x0000000000431235 in ?? ()
#6  0x0000000000432195 in ?? ()
#7  0x0000000000433be0 in ?? ()
#8  0x00000000004340fe in ?? ()
#9  0x000000000042c6fa in ?? ()
#10 0x000000000042e9be in ?? ()
#11 0x000000000042ecba in ?? ()
#12 0x0000000000494661 in ?? ()
#13 0x00007f76d1d079d1 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f76d20058fd in clone () from /lib64/libc.so.6

"""


Also, could you grep the /var/log/openvswitch/ovs-vswitchd.log* for the
following string:
"""
Unreasonably long
"""
and post the output?  If a thread holds the rwlock for a very long time, there
will be such a log entry in ovs-vswitchd.log.
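
For example, something along these lines should work:

"""
# -H prints the file name along with each matching line
grep -H "Unreasonably long" /var/log/openvswitch/ovs-vswitchd.log*
"""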

Thanks,
Alex Wang,

On Tue, Jun 23, 2015 at 8:51 PM, Chris <contact at progbau.de> wrote:

> Hello Alex,
>
>
>
> It happened again and I got some gdb traces; please see the output in the
> attachment.
>
> During the gdb session Open vSwitch started working again, so I'm not
> sure whether I captured the right moment.
>
>
>
> Please have a look and let me know if you need additional information.
>
> Thanks in advance!
>
>
>
> Cheers
>
> Chris
>
>
>
> *From:* Alex Wang [mailto:alexw at nicira.com]
> *Sent:* Thursday, June 04, 2015 14:00
>
> *To:* Chris
> *Cc:* discuss at openvswitch.org; openstack at lists.openstack.org; Soputhi Sea
> *Subject:* Re: [ovs-discuss] Openvswitch network disconnect
>
>
>
> Thanks for the info, I'll try to reproduce it on my local setup using an
> active-backup bond and use it as the management interface.
>
>
>
> Will update if I run into the same issue~
>
>
>
> Thanks,
>
> Alex Wang,
>
>
>
> On Wed, Jun 3, 2015 at 11:52 PM, Chris <contact at progbau.de> wrote:
>
> Hello Alex,
>
>
>
> I will do the gdb debug when it happens again.
>
>
>
> Here is the output; in this case there are 4 VMs running on the host. The
> physical interfaces eth2 & eth3 are bonded to bond0 in active-passive mode.
> mgmt0 is used for management of the host system. Everything else is created
> by OpenStack.
>
> The failure happens independently of the number of VMs running on the host:
>
>
>
> *ovs-vsctl show*
>
> fbbaf640-ed82-4735-99d2-fbe09f4041f1
>
>     Bridge "br-bond0"
>
>         Port "mgmt0"
>
>             Interface "mgmt0"
>
>                 type: internal
>
>         Port "br-bond0"
>
>             Interface "br-bond0"
>
>                 type: internal
>
>         Port "phy-br-bond0"
>
>             Interface "phy-br-bond0"
>
>         Port "bond0"
>
>             Interface "eth2"
>
>             Interface "eth3"
>
>     Bridge br-int
>
>         fail_mode: secure
>
>         Port br-int
>
>             Interface br-int
>
>                 type: internal
>
>         Port "qvo4166dc0a-69"
>
>             tag: 1
>
>             Interface "qvo4166dc0a-69"
>
>         Port "qvo6d8b70de-9c"
>
>             tag: 1
>
>             Interface "qvo6d8b70de-9c"
>
>         Port "qvoe75237cc-7f"
>
>             tag: 1
>
>             Interface "qvoe75237cc-7f"
>
>         Port "qvoe3b9e1fc-a5"
>
>             tag: 1
>
>             Interface "qvoe3b9e1fc-a5"
>
>         Port "int-br-bond0"
>
>             Interface "int-br-bond0"
>
>     ovs_version: "2.3.0"
>
>
>
> *ip a*
>
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>
>     inet 127.0.0.1/8 scope host lo
>
> 2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>
>     link/ether 9c:b6:54:b3:67:34 brd ff:ff:ff:ff:ff:ff
>
> 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>
>     link/ether 9c:b6:54:b3:67:35 brd ff:ff:ff:ff:ff:ff
>
> 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen
> 1000
>
>     link/ether 64:51:06:f0:85:98 brd ff:ff:ff:ff:ff:ff
>
> 5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen
> 1000
>
>     link/ether 64:51:06:f0:85:9c brd ff:ff:ff:ff:ff:ff
>
> 6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>
>     link/ether de:76:ca:6d:57:48 brd ff:ff:ff:ff:ff:ff
>
> 7: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>
>     link/ether ee:f8:2f:ef:94:42 brd ff:ff:ff:ff:ff:ff
>
> 8: br-bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>
>     link/ether 64:51:06:f0:85:98 brd ff:ff:ff:ff:ff:ff
>
> 15: mgmt0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
> UNKNOWN
>
>     link/ether fa:e1:51:33:5d:dc brd ff:ff:ff:ff:ff:ff
>
>     inet 10.201.195.75/24 brd 10.201.195.255 scope global mgmt0
>
> 16: qbr4166dc0a-69: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> noqueue state UNKNOWN
>
>     link/ether be:9a:00:23:b1:f3 brd ff:ff:ff:ff:ff:ff
>
> 17: qvo4166dc0a-69: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
> qdisc pfifo_fast state UP qlen 1000
>
>     link/ether ae:27:12:55:09:0c brd ff:ff:ff:ff:ff:ff
>
> 18: qvb4166dc0a-69: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
> qdisc pfifo_fast state UP qlen 1000
>
>     link/ether be:9a:00:23:b1:f3 brd ff:ff:ff:ff:ff:ff
>
> 19: tap4166dc0a-69: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> pfifo_fast state UNKNOWN qlen 500
>
>     link/ether fe:16:3e:d6:34:82 brd ff:ff:ff:ff:ff:ff
>
> 20: qbre75237cc-7f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> noqueue state UNKNOWN
>
>     link/ether c2:f2:1a:30:a2:84 brd ff:ff:ff:ff:ff:ff
>
> 21: qvoe75237cc-7f: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
> qdisc pfifo_fast state UP qlen 1000
>
>     link/ether 1e:7f:f4:57:aa:82 brd ff:ff:ff:ff:ff:ff
>
> 22: qvbe75237cc-7f: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
> qdisc pfifo_fast state UP qlen 1000
>
>     link/ether c2:f2:1a:30:a2:84 brd ff:ff:ff:ff:ff:ff
>
> 23: tape75237cc-7f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> pfifo_fast state UNKNOWN qlen 500
>
>     link/ether fe:16:3e:02:a4:d7 brd ff:ff:ff:ff:ff:ff
>
> 24: qbre3b9e1fc-a5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> noqueue state UNKNOWN
>
>     link/ether ce:c7:ec:59:fd:1a brd ff:ff:ff:ff:ff:ff
>
> 25: qvoe3b9e1fc-a5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
> qdisc pfifo_fast state UP qlen 1000
>
>     link/ether c6:5f:19:25:7c:be brd ff:ff:ff:ff:ff:ff
>
> 26: qvbe3b9e1fc-a5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
> qdisc pfifo_fast state UP qlen 1000
>
>     link/ether ce:c7:ec:59:fd:1a brd ff:ff:ff:ff:ff:ff
>
> 27: tape3b9e1fc-a5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> pfifo_fast state UNKNOWN qlen 500
>
>     link/ether fe:16:3e:bb:a7:19 brd ff:ff:ff:ff:ff:ff
>
> 28: qbr6d8b70de-9c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> noqueue state UNKNOWN
>
>     link/ether c2:45:27:31:f8:d1 brd ff:ff:ff:ff:ff:ff
>
> 29: qvo6d8b70de-9c: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
> qdisc pfifo_fast state UP qlen 1000
>
>     link/ether 46:23:9e:4b:2a:fd brd ff:ff:ff:ff:ff:ff
>
> 30: qvb6d8b70de-9c: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
> qdisc pfifo_fast state UP qlen 1000
>
>     link/ether c2:45:27:31:f8:d1 brd ff:ff:ff:ff:ff:ff
>
> 31: tap6d8b70de-9c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> pfifo_fast state UNKNOWN qlen 500
>
>     link/ether fe:16:3e:40:9e:44 brd ff:ff:ff:ff:ff:ff
>
> 32: bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
>
>     link/ether f6:83:30:bc:4c:7c brd ff:ff:ff:ff:ff:ff
>
> 35: phy-br-bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> pfifo_fast state UP qlen 1000
>
>     link/ether 76:51:9d:bd:24:83 brd ff:ff:ff:ff:ff:ff
>
> 36: int-br-bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> pfifo_fast state UP qlen 1000
>
>     link/ether 5e:fa:05:2a:1a:ef brd ff:ff:ff:ff:ff:ff
>
>
>
>
>
> Cheers,
>
> Chris
>
>
>
>
>
> *From:* Alex Wang [mailto:alexw at nicira.com]
> *Sent:* Thursday, June 04, 2015 13:29
> *To:* Chris
> *Cc:* discuss at openvswitch.org; openstack at lists.openstack.org; Soputhi Sea
> *Subject:* Re: [ovs-discuss] Openvswitch network disconnect
>
>
>
>
>
>
>
> On Wed, Jun 3, 2015 at 11:16 PM, Chris <contact at progbau.de> wrote:
>
> Hello,
>
> We are using Openvswitch in our Openstack setup.
>
> ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.3.0
> Compiled Oct 28 2014 17:48:05
> OpenFlow versions 0x1:0x1
>
> We experience Open vSwitch failures from time to time. They seem to happen at
> random; there are no network traffic spikes, for example.
> The VM ports and the Open vSwitch port for the host management just stop
> working, but the Open vSwitch services (ovsdb-server / ovs-vswitchd) are
> still running.
>
> For debugging purposes, the following commands have been executed:
>
> - "ovs-vsctl show": the interfaces are still listed.
> The ones below didn't show any result and just hung after execution:
> - ovs-appctl bond/show bond0
> - ovs-appctl vlog/list
> - ovs-ofctl dump-flows br-bond0
>
>
>
>
>
> This seems to indicate that the ovs-vswitchd process is deadlocked...
>
> I could not find a commit in branch-2.3 that relates to a deadlock.  It would
> be helpful if you could gdb into the running ovs-vswitchd process and provide
> the backtraces when there is a failure.
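>
> For example, a minimal sketch (assuming gdb and pidof are available on the
> host; it attaches briefly, dumps every thread's backtrace, then detaches):
>
> """
> # Sketch only: this pauses ovs-vswitchd for a moment while gdb collects the backtraces.
> gdb -batch -ex 'thread apply all bt full' -p $(pidof ovs-vswitchd)
> """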
>
>
>
> Also, could you show me what your setup looks like (ovs-vsctl show output)?
>
>
>
> Thanks,
>
> Alex Wang,
>
>
>
>
>
>
>
> A "service openvswitch restart" fix it, the connection from the VMs and the
> host are back immediately.
>
> Any help appreciated!
>
> Cheers,
> Chris
>
>
>
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> http://openvswitch.org/mailman/listinfo/discuss
>
>
>
>
>

