List:       e1000-devel
Subject:    Re: [E1000-devel] igb transmit queue timeout
From:       Alexander Duyck <alexander.duyck () gmail ! com>
Date:       2018-01-25 16:14:05
Message-ID: CAKgT0Ue1Mn5RqQNHa5GnEMxSimMMMVLy_FH56Luui-BK6vAduA () mail ! gmail ! com

I agree with Todd, we need way more information on this.

For example, if we had the dmesg we could tell whether the Tx hang message
is being reported or not. If it is not, that might point to a problem with
the interrupts on the device. If I recall correctly, the igb driver should
be generating an interrupt every 2 seconds on each of its TxRx interrupt
vectors. If you were to run

    watch -d "grep enp1s0f0-TxRx /proc/interrupts"

what you should see is all of the interrupt vectors incrementing by at
least 1 every 2 seconds. If you don't see that, it could be a sign of an
issue in the interrupt handling logic of the kernel, which is something we
have seen with Xen in the past.
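
As a rough sketch of the same check in script form, something like the
following could take two snapshots and list any vectors whose count did
not move. The function names irq_totals and stalled_vectors are made up
for illustration, and it assumes the usual /proc/interrupts layout where
the per-CPU counts follow the IRQ number and the vector name is the last
field:

```shell
#!/bin/sh
# Hypothetical helpers, not part of any driver tooling.

# Print "<vector-name> <total>" for each matching line of a
# /proc/interrupts-style file, where <total> is the sum over all CPUs.
# $1 = pattern to match (e.g. enp1s0f0-TxRx)
# $2 = file to read (defaults to /proc/interrupts)
irq_totals() {
    awk -v pat="$1" '
        $0 ~ pat {
            total = 0
            # Field 1 is the IRQ number; the numeric fields that follow
            # are the per-CPU counts. Stop at the first non-numeric field.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
            print $NF, total
        }' "${2:-/proc/interrupts}"
}

# Compare two files produced by irq_totals and print the names of the
# vectors whose total did not increase between them.
# $1 = earlier snapshot, $2 = later snapshot
stalled_vectors() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && ($2 + 0) <= (before[$1] + 0) { print $1 }' "$1" "$2"
}
```

Sourcing that and running 'irq_totals enp1s0f0-TxRx > /tmp/before; sleep 4;
irq_totals enp1s0f0-TxRx > /tmp/after; stalled_vectors /tmp/before /tmp/after'
should print nothing if every vector is firing. Any vector name it does
print went the whole 4 seconds without an interrupt.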

Thanks.

- Alex

On Wed, Jan 24, 2018 at 2:08 PM, Fujinaka, Todd <todd.fujinaka@intel.com> wrote:
> There's really not enough information here. Ideally you would send us the
> dmesg from when it fails, and a register dump before and after.
>
> I would suggest opening a bug on sourceforge and attaching the dmesg and
> register dumps to the bug. Don't just copy them into the bug because that's
> much harder to read.
>
> We haven't heard of many issues with the 82576 like this, so you may also
> want to ask Supermicro for help, but it also looks like your hardware is EOL.
>
> Todd Fujinaka
> Software Application Engineer
> Datacenter Engineering Group
> Intel Corporation
> todd.fujinaka@intel.com
> 
> 
> -----Original Message-----
> From: Kojedzinszky Richárd [mailto:kojedzinszky.richard@euronetrt.hu]
> Sent: Wednesday, January 24, 2018 1:44 AM
> To: e1000-devel@lists.sourceforge.net
> Subject: [E1000-devel] igb transmit queue timeout
> 
> Dear maintainers,
> 
> We have a Xen virtualization environment with 6 nearly identical nodes,
> Supermicro X8DTU boards.
>
> We run Debian stretch on them; the Xen hypervisor and Linux kernel are from
> Debian stretch, the latest at the time of writing.
>
> Unfortunately, we are facing an issue where our igb devices randomly stop
> working, with the error message:
>
> NETDEV WATCHDOG: enp1s0f0 (igb): transmit queue 0 timed out
>
> The driver tries to recover/reset the adapter, but it does not succeed.
> Even shutting down the interface and bringing it back up does not help; a
> reboot is required to restore normal operation.
>
> The servers are connected to our switch with two interfaces, and the problem
> happens randomly on either one.
>
> We have tried disabling MSI interrupts, but that did not help.
>
> Unfortunately, we cannot reproduce the problem: it happens randomly and
> frequently, but we cannot explicitly trigger it. It has happened on nearly
> all of our nodes, so I assume it is not a hardware problem.
>
> Our kernel/xen versions:
> 
> # uname -a
> Linux node-3.cloud-b.dravanet.net 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux
>
> # xl info
> host                   : x
> release                : 4.9.0-5-amd64
> version                : #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
> machine                : x86_64
> nr_cpus                : 8
> max_cpu_id             : 23
> nr_nodes               : 2
> cores_per_socket       : 4
> threads_per_core       : 1
> cpu_mhz                : 3066
> hw_caps                : b7ebfbff:029ee3ff:2c100800:00000001:00000000:00000000:00000000:00000100
> virt_caps              : hvm hvm_directio
> total_memory           : 196599
> free_memory            : 94364
> sharing_freed_memory   : 0
> sharing_used_memory    : 0
> outstanding_claims     : 0
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 8
> xen_extra              : .3-pre
> xen_version            : 4.8.3-pre
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          :
> xen_commandline        : placeholder dom0_mem=4096M gnttab_max_frames=256
> cc_compiler            : gcc (Debian 6.3.0-18) 6.3.0 20170516
> cc_compile_by          : ijackson
> cc_compile_domain      : chiark.greenend.org.uk
> cc_compile_date        : Sat Nov 25 11:30:34 UTC 2017
> build_id               : 23ac95af74d2e3f84c90068ae674c34e764649e7
> xend_config_format     : 4
> 
> What else could we try to resolve this issue?
> 
> Thanks in advance,
> 
> Kojedzinszky Richárd
> Euronet Magyarorszag Informatika Zrt.
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel® Ethernet, visit
> http://communities.intel.com/community/wired
