
List:       openvswitch-discuss
Subject:    [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll interval (365882ms user, 464906ms system)
From:       yangwanli () jd ! com (yangwanli)
Date:       2015-05-28 2:58:38
Message-ID: 20150528105838154660248 () jd ! com

Thank you very much!
I upgraded 200 servers to the latest kernel, 2.6.32-504.16.2.el6.x86_64, and so
far this problem (the ovs-vswitchd hang) has not happened again. It seems to be
related to this bug (http://bugs.centos.org/view.php?id=8703), but I'm not sure!


From: Joe Stringer
Date: 2015-05-28 09:55
To: yangwanli
CC: bugs
Subject: Re: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll interval (365882ms user, 464906ms system)
> Hmm. I have a few different paths that might help:
 
> 1) Can you get dmesg and/or syslog around the time of the event?
  I did not find any errors or warnings about ovs-vswitchd in dmesg or syslog.
> 2) Does ovs-vswitchd crash or hang?
 I think it hangs. When I execute netstat -anp, I find some unix sockets in the
CONNECTING state, and ovs-ofctl dump-flows br-int does not always respond.
Recently, the ovs-vswitchd process on one server returned to normal by itself
after about 20 minutes, without me doing anything, and ovs-ofctl dump-flows
br-int was OK again!
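
 (For reference, the checks described above amount to roughly the following -- a
 sketch, assuming a stock 2.3.1 install with the default socket paths:)

  # look for stuck unix-socket connections to the OVS daemons
  netstat -anp | grep ovs
  # quick liveness test of ovs-vswitchd's control socket
  ovs-appctl version
  # and of its OpenFlow handling
  ovs-ofctl dump-flows br-int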
> A backtrace of the code that ovs-vswitchd is executing would be really
> useful. If it crashes, can you get a core dump and use gdb to get a
> backtrace? If it hangs, perhaps you could attach gdb and print a
> backtrace and post that?
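
 For reference, one way to grab such a backtrace from a hung ovs-vswitchd (a
 sketch, assuming gdb and matching debug symbols are installed) would be:

  # dump backtraces of all ovs-vswitchd threads
  gdb -batch -ex "thread apply all bt" -p $(pidof ovs-vswitchd)

 Attaching pauses the process briefly, so this is best run while the hang is
 actually in progress.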
 
> 3) If you're already compiling your own OVS-2.3, you could try
> downloading the latest branch-2.3 from here:
> https://github.com/openvswitch/ovs/tree/branch-2.3 
> There are a few bugfixes that have not made it into a new release yet.
 OK! I will try it. 
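
 (A sketch of the usual source build, assuming the standard build dependencies
 such as autoconf, automake, libtool and kernel-devel are already installed:)

  git clone https://github.com/openvswitch/ovs.git
  cd ovs
  git checkout branch-2.3
  ./boot.sh
  ./configure --with-linux=/lib/modules/$(uname -r)/build
  make && make install && make modules_install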
> 4) Is OVS running on the bare metal or is it running within a VM?
 bare metal 
> 5) Lastly, it's interesting that there is a delay of around ~2.5 hours
> (or 10,000 seconds) between the "unreasonably long poll interval"
> messages. Maybe this means something, maybe it's irrelevant.
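
 (For what it's worth, the warning timestamps can be pulled straight from the
 log to check that spacing -- a quick sketch:)

  grep 'Unreasonably long' /var/log/openvswitch/ovs-vswitchd.log | cut -d'|' -f1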
 
On 26 May 2015 at 18:46, yangwanli <yangwanli at jd.com> wrote:
> I don't enable STP.
> 
> #ovs-vsctl show
> 350424e8-f03c-4f5c-aaa9-fd78ec78f97b
> Bridge br-int
> fail_mode: secure
> Port "int-br-eth1"
> Interface "int-br-eth1"
> type: patch
> options: {peer="phy-br-eth1"}
> Port "tap5c0d4002-26"
> tag: 1
> Interface "tap5c0d4002-26"
> Port "tap4954ce53-06"
> tag: 3
> Interface "tap4954ce53-06"
> Port "tap6631c08d-91"
> tag: 3
> Interface "tap6631c08d-91"
> Port "tapdfd3962f-3a"
> tag: 2
> Interface "tapdfd3962f-3a"
> Port br-int
> Interface br-int
> type: internal
> Port "tapb2cd1333-03"
> tag: 1
> Interface "tapb2cd1333-03"
> Port "tap0c1153d7-bb"
> tag: 1
> Interface "tap0c1153d7-bb"
> Bridge "br-eth1"
> Port "br-eth1"
> Interface "br-eth1"
> type: internal
> Port "eth3"
> Interface "eth3"
> Port "phy-br-eth1"
> Interface "phy-br-eth1"
> type: patch
> options: {peer="int-br-eth1"}
> ovs_version: "2.3.1"
> 
> 
> eth3 is physical NIC.
> The bidge of OVS is created by:
> ovs-vsctl add-br br-int
> ovs-vsctl add-br br-eth1
> ovs-vsctl --timeout=10 -- set-fail-mode br-int secure
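
 (The patch ports between br-int and br-eth1 shown above are created
 separately, in this setup presumably by the management layer such as the
 Neutron OVS agent; a hand-rolled equivalent would look roughly like:)

  ovs-vsctl add-port br-int int-br-eth1 -- set Interface int-br-eth1 type=patch options:peer=phy-br-eth1
  ovs-vsctl add-port br-eth1 phy-br-eth1 -- set Interface phy-br-eth1 type=patch options:peer=int-br-eth1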
> 
> ________________________________
> 
> WanLi  Yang
> 
> JD.COM
> 
> *****************************************************************
> 
> 
> From: Joe Stringer
> Date: 2015-05-27 05:05
> To: yangwanli
> CC: bugs
> Subject: Re: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll
> interval (365882ms user, 464906ms system)
> Are you using STP?
> 
> On 22 May 2015 at 21:54, yangwanli <yangwanli at jd.com> wrote:
> > Thank you for your response!
> > 
> > My servers all run CentOS 6.6, kernel version 2.6.32-504.el6.x86_64, with
> > 32 logical CPUs and 256 GB of RAM.
> > #lscpu
> > Architecture:          x86_64
> > CPU op-mode(s):        32-bit, 64-bit
> > Byte Order:            Little Endian
> > CPU(s):                32
> > On-line CPU(s) list:   0-31
> > Thread(s) per core:    2
> > Core(s) per socket:    8
> > Socket(s):             2
> > NUMA node(s):          2
> > Vendor ID:             GenuineIntel
> > CPU family:            6
> > Model:                 63
> > Stepping:              2
> > CPU MHz:               2599.971
> > BogoMIPS:              5199.24
> > Virtualization:        VT-x
> > L1d cache:             32K
> > L1i cache:             32K
> > L2 cache:              256K
> > L3 cache:              20480K
> > NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
> > NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
> > 
> > The Open vSwitch kernel module (openvswitch.ko) and the user-space utilities
> > are all built from the unmodified Open vSwitch 2.3.1 source
> > (http://openvswitch.org/releases/openvswitch-2.3.1.tar.gz)
> > #modinfo openvswitch
> > /lib/modules/2.6.32-504.el6.x86_64/kernel/net/openvswitch/openvswitch.ko
> > version:        2.3.1
> > license:        GPL
> > description:    Open vSwitch switching datapath
> > srcversion:     4D7CD38A83A9A4A782F73A1
> > depends:        libcrc32c
> > vermagic:       2.6.32-504.el6.x86_64 SMP mod_unload modversions
> > parm:           vlan_tso:Enable TSO for VLAN packets (int)
> > 
> > I have 300 servers, all with the same hardware, OS, and Open vSwitch 2.3.1.
> > Since May 15, 2015, 6 of these servers have shown this problem: when I
> > execute ovs-ofctl dump-flows br-int, it does not always respond, the OVS
> > ports stop transmitting data, and as a result the virtual machines cannot
> > be accessed.
> > When the problem appears, the server load is not high, and when I execute
> > netstat -anp I find some unix sockets in the CONNECTING state.
> > After I restart ovs-vswitchd and ovsdb-server, everything becomes normal again.
> > I suspect a thread deadlock in ovs-vswitchd.
> > 
> > All servers have the following log info in
> > /var/log/openvswitch/ovs-vswitchd.log:
> > 2015-05-18T12:13:04.491Z|00138|coverage|INFO|Event coverage, avg rate over
> > last: 5 seconds, last minute, last hour,  hash=9e13ffc8:
> > 2015-05-18T12:13:04.491Z|00139|coverage|INFO|nln_changed
> > 0.0/sec     0.000/sec        0.0000/sec   total: 21
> > 2015-05-18T12:13:04.491Z|00140|coverage|INFO|netlink_received
> > 59.4/sec    53.850/sec       53.6072/sec   total: 1159232
> > 2015-05-18T12:13:04.491Z|00141|coverage|INFO|netlink_recv_jumbo
> > 3.8/sec     2.767/sec        2.6536/sec   total: 53239
> > 2015-05-18T12:13:04.491Z|00142|coverage|INFO|netlink_sent
> > 68.0/sec    62.783/sec       62.9394/sec   total: 1353375
> > 2015-05-18T12:13:04.491Z|00143|coverage|INFO|netdev_set_policing
> > 0.0/sec     0.000/sec        0.0000/sec   total: 12
> > 2015-05-18T12:13:04.491Z|00144|coverage|INFO|netdev_get_ifindex
> > 0.0/sec     0.000/sec        0.0000/sec   total: 6
> > 2015-05-18T12:13:04.491Z|00145|coverage|INFO|netdev_get_hwaddr
> > 0.0/sec     0.000/sec        0.0000/sec   total: 6
> > 2015-05-18T12:13:04.491Z|00146|coverage|INFO|netdev_set_hwaddr
> > 0.0/sec     0.000/sec        0.0000/sec   total: 3
> > 2015-05-18T12:13:04.491Z|00147|coverage|INFO|netdev_get_ethtool
> > 0.0/sec     0.000/sec        0.0000/sec   total: 22
> > 2015-05-18T12:13:04.491Z|00148|coverage|INFO|netdev_set_ethtool
> > 0.0/sec     0.000/sec        0.0000/sec   total: 4
> > 2015-05-18T12:13:04.491Z|00149|coverage|INFO|vconn_received
> > 2.4/sec     2.000/sec        1.9067/sec   total: 41642
> > 2015-05-18T12:13:04.491Z|00150|coverage|INFO|vconn_sent
> > 1.8/sec     1.500/sec        1.4300/sec   total: 31228
> > 2015-05-18T12:13:04.491Z|00151|coverage|INFO|util_xalloc
> > 1987.2/sec  1867.150/sec     1817.8008/sec   total: 37808073
> > 2015-05-18T12:13:04.491Z|00152|coverage|INFO|pstream_open
> > 0.0/sec     0.000/sec        0.0000/sec   total: 5
> > 2015-05-18T12:13:04.491Z|00153|coverage|INFO|stream_open
> > 0.0/sec     0.000/sec        0.0000/sec   total: 1
> > 2015-05-18T12:13:04.491Z|00154|coverage|INFO|rconn_queued
> > 1.2/sec     1.000/sec        0.9533/sec   total: 20815
> > 2015-05-18T12:13:04.491Z|00155|coverage|INFO|rconn_sent
> > 1.2/sec     1.000/sec        0.9533/sec   total: 20815
> > 2015-05-18T12:13:04.491Z|00156|coverage|INFO|poll_fd_wait
> > 537.2/sec   481.217/sec      462.1683/sec   total: 9762985
> > 2015-05-18T12:13:04.491Z|00157|coverage|INFO|poll_zero_timeout
> > 5.2/sec     4.333/sec        3.8581/sec   total: 83823
> > 2015-05-18T12:13:04.491Z|00158|coverage|INFO|txn_unchanged
> > 0.0/sec     0.000/sec        0.0000/sec   total: 11
> > 2015-05-18T12:13:04.491Z|00159|coverage|INFO|txn_incomplete
> > 8.8/sec     8.450/sec        8.3044/sec   total: 170811
> > 2015-05-18T12:13:04.491Z|00160|coverage|INFO|txn_success
> > 0.2/sec     0.200/sec        0.1989/sec   total: 4276
> > 2015-05-18T12:13:04.491Z|00161|coverage|INFO|netdev_get_stats
> > 1.6/sec     1.600/sec        1.5911/sec   total: 32833
> > 2015-05-18T12:13:04.491Z|00162|coverage|INFO|mac_learning_learned
> > 0.2/sec     0.133/sec        0.0686/sec   total: 1595
> > 2015-05-18T12:13:04.491Z|00163|coverage|INFO|mac_learning_expired
> > 0.4/sec     0.150/sec        0.0689/sec   total: 1518
> > 2015-05-18T12:13:04.491Z|00164|coverage|INFO|hmap_pathological
> > 0.0/sec     0.000/sec        0.0000/sec   total: 27
> > 2015-05-18T12:13:04.491Z|00165|coverage|INFO|hmap_expand
> > 102.4/sec    92.300/sec       88.6772/sec   total: 1853939
> > 2015-05-18T12:13:04.491Z|00166|coverage|INFO|flow_extract
> > 23.0/sec    19.883/sec       19.7067/sec   total: 416747
> > 2015-05-18T12:13:04.491Z|00167|coverage|INFO|dpif_port_add
> > 0.0/sec     0.000/sec        0.0000/sec   total: 6
> > 2015-05-18T12:13:04.491Z|00168|coverage|INFO|dpif_flow_flush
> > 0.0/sec     0.000/sec        0.0000/sec   total: 4
> > 2015-05-18T12:13:04.491Z|00169|coverage|INFO|dpif_flow_put
> > 0.0/sec     0.000/sec        0.0000/sec   total: 2
> > 2015-05-18T12:13:04.491Z|00170|coverage|INFO|dpif_flow_del
> > 0.0/sec     0.000/sec        0.0000/sec   total: 1
> > 2015-05-18T12:13:04.491Z|00171|coverage|INFO|dpif_execute
> > 0.0/sec     0.000/sec        0.0000/sec   total: 1
> > 2015-05-18T12:13:04.491Z|00172|coverage|INFO|xlate_actions
> > 38.8/sec    27.317/sec       22.9731/sec   total: 507271
> > 2015-05-18T12:13:04.491Z|00173|coverage|INFO|rev_reconfigure
> > 0.0/sec     0.000/sec        0.0000/sec   total: 20
> > 2015-05-18T12:13:04.491Z|00174|coverage|INFO|rev_port_toggled
> > 0.0/sec     0.000/sec        0.0000/sec   total: 3
> > 2015-05-18T12:13:04.491Z|00175|coverage|INFO|rev_flow_table
> > 0.0/sec     0.000/sec        0.0000/sec   total: 11
> > 2015-05-18T12:13:04.491Z|00176|coverage|INFO|rev_mac_learning
> > 0.6/sec     0.283/sec        0.1286/sec   total: 2813
> > 2015-05-18T12:13:04.491Z|00177|coverage|INFO|ofproto_flush
> > 0.0/sec     0.000/sec        0.0000/sec   total: 3
> > 2015-05-18T12:13:04.491Z|00178|coverage|INFO|ofproto_recv_openflow
> > 1.8/sec     1.500/sec        1.4300/sec   total: 31229
> > 2015-05-18T12:13:04.491Z|00179|coverage|INFO|ofproto_update_port
> > 0.0/sec     0.000/sec        0.0000/sec   total: 34
> > 2015-05-18T12:13:04.491Z|00180|coverage|INFO|bridge_reconfigure
> > 0.0/sec     0.000/sec        0.0000/sec   total: 15
> > 2015-05-18T12:13:04.491Z|00181|coverage|INFO|49 events never hit
> > 2015-05-18T14:59:05.014Z|00182|timeval|WARN|Unreasonably long 2019ms poll
> > interval (43448ms user, 50465ms system)
> > 2015-05-18T14:59:05.014Z|00183|timeval|WARN|faults: 840 minor, 0 major
> > 2015-05-18T14:59:05.014Z|00184|timeval|WARN|disk: 0 reads, 120 writes
> > 2015-05-18T14:59:05.014Z|00185|timeval|WARN|context switches: 2167382
> > voluntary, 529 involuntary
> > 2015-05-18T14:59:05.014Z|00186|coverage|INFO|Skipping details of duplicate
> > event coverage for hash=9e13ffc8
> > 2015-05-18T17:46:50.578Z|00187|timeval|WARN|Unreasonably long 2018ms poll
> > interval (56724ms user, 65745ms system)
> > 2015-05-18T17:46:50.578Z|00188|timeval|WARN|faults: 841 minor, 0 major
> > 2015-05-18T17:46:50.578Z|00189|timeval|WARN|disk: 0 reads, 128 writes
> > 2015-05-18T17:46:50.578Z|00190|timeval|WARN|context switches: 2853218
> > voluntary, 538 involuntary
> > 2015-05-18T17:46:50.578Z|00191|coverage|INFO|Skipping details of duplicate
> > event coverage for hash=9e13ffc8
> > 2015-05-18T20:29:36.092Z|00192|timeval|WARN|Unreasonably long 2018ms poll
> > interval (69561ms user, 80775ms system)
> > 2015-05-18T20:29:36.092Z|00193|timeval|WARN|faults: 841 minor, 0 major
> > 2015-05-18T20:29:36.092Z|00194|timeval|WARN|disk: 0 reads, 136 writes
> > 2015-05-18T20:29:36.092Z|00195|timeval|WARN|context switches: 3519924
> > voluntary, 544 involuntary
> > 2015-05-18T20:29:36.092Z|00196|coverage|INFO|Skipping details of duplicate
> > event coverage for hash=9e13ffc8
> > 2015-05-18T23:21:51.729Z|00197|timeval|WARN|Unreasonably long 2021ms poll
> > interval (83029ms user, 96728ms system)
> > 2015-05-18T23:21:51.729Z|00198|timeval|WARN|faults: 841 minor, 0 major
> > 2015-05-18T23:21:51.729Z|00199|timeval|WARN|disk: 0 reads, 144 writes
> > 2015-05-18T23:21:51.729Z|00200|timeval|WARN|context switches: 4224452
> > voluntary, 553 involuntary
> > 2015-05-18T23:21:51.729Z|00201|coverage|INFO|Skipping details of duplicate
> > event coverage for hash=9e13ffc8
> > 
> > ________________________________
> > 
> > WanLi Yang
> > 
> > JD.COM
> > 
> > *****************************************************************
> > 
> > 
> > From: Joe Stringer
> > Date: 2015-05-23 05:36
> > To: yangwanli
> > CC: bugs
> > Subject: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll
> > interval (365882ms user, 464906ms system)
> > On 19 May 2015 at 03:32, yangwanli <yangwanli at jd.com> wrote:
> > > Hi,
> > > I'm a cloud computing R&D engineer. I use OVS as a virtual switch; all
> > > virtual machines and containers communicate through OVS, using OVS VLANs.
> > > Open vSwitch version 2.3.1 (unmodified)
> > > CentOS 6.6
> > > While using it, I have encountered the following problems.
> > > I need your help!
> > > 
> > > 1. ovs-vswitchd.log constantly reports warnings:
> > > 
> > > 2015-05-17T00:52:50.463Z|00575|timeval|WARN|Unreasonably long 2020ms poll
> > > interval (365882ms user, 464906ms system)
> > > 2015-05-17T00:52:50.463Z|00576|timeval|WARN|faults: 708 minor, 0 major
> > > 2015-05-17T00:52:50.463Z|00577|timeval|WARN|disk: 0 reads, 392 writes
> > > 2015-05-17T00:52:50.463Z|00578|timeval|WARN|context switches: 19571142
> > > voluntary, 2080 involuntary
> > > 2015-05-17T00:52:50.463Z|00579|coverage|INFO|Skipping details of
> > > duplicate
> > > event coverage for hash=35d02982
> > > 2015-05-17T03:26:36.336Z|00580|timeval|WARN|Unreasonably long 2020ms poll
> > > interval (380306ms user, 483259ms system)
> > > 2015-05-17T03:26:36.336Z|00581|timeval|WARN|faults: 708 minor, 0 major
> > > 2015-05-17T03:26:36.336Z|00582|timeval|WARN|disk: 0 reads, 400 writes
> > > 2015-05-17T03:26:36.336Z|00583|timeval|WARN|context switches: 20299734
> > > voluntary, 2174 involuntary
> > > 2015-05-17T03:26:36.336Z|00584|coverage|INFO|Skipping details of
> > > duplicate
> > > event coverage for hash=35d02982
> > 
> > The main loop of ovs-vswitchd usually expects to cycle at least once
> > per second. This message appears when this takes longer than usual,
> > which may indicate that ovs-vswitchd is under heavy load, or some sort
> > of bug.
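
 (The counters behind these periodic dumps can also be read on demand, assuming
 the default control socket, with:)

  ovs-appctl coverage/show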
> > 
> > > 
> > > 2. I also encounter this problem: when I execute
> > > ovs-ofctl dump-flows br-int
> > > it does not always respond, but ovs-vsctl show is OK!
> > 
> > The main thread of ovs-vswitchd assembles responses to OpenFlow
> > requests, and given the above messages it sounds like it's busy doing
> > other things. Ovs-vsctl on the other hand talks to ovsdb-server, which
> > looks like it is OK in this scenario.
> > 
> > These both seem like symptoms, but it's not obvious to me what the
> > underlying cause is. Is there anything else in the logs? Is
> > ovs-vswitchd using a lot of CPU?
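
 (A rough sketch of how to check that per thread:)

  # live per-thread CPU usage of ovs-vswitchd
  top -H -p $(pidof ovs-vswitchd)
  # or a one-off listing of its threads
  ps -eLf | grep ovs-vswitchd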
> > 
> > Are you using the OVS kernel module from CentOS or one distributed
> > with the openvswitch package? ('modinfo openvswitch' should give a
> > hint here)
> > 