
List:       e1000-devel
Subject:    Re: [E1000-devel] [e1000e] transmit unit hangs...
From:       Jesse Brandeburg <jesse.brandeburg () intel ! com>
Date:       2009-09-27 19:03:41
Message-ID: 1254078221.20405.43.camel () jbrandeb-mobl2

Hi Daniel, sorry to hear about your issue, and thanks for the report.

On Sun, 2009-09-27 at 06:24 -0700, Daniel J Blueman wrote:
> I'm seeing transmit unit hangs [1] every couple of weeks, with a
> particular activity level on my 82566DC with mainline 2.6.31 x86-64.
> Since this may be a known issue, I'm emailing the e1000e maintainers
> and group directly, rather than the linux-kernel and linux-net mailing
> lists.
> 
> What is perhaps different from most setups is that I'm running a VLAN [2],
> and I've seen the problem twice now when the activity level is higher,
> over the eth0.30 VLAN (but not over the normal eth0 LAN).
> 

In this case I think that the VLAN isn't related to the issue.  See
below.

> What information is needed to progress this?
> 
> Many thanks!
>   Daniel
> 
> --- [1]
> 
> $ dmesg
> [    0.516710] 0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:03:2d:0b:9d:73
> [    0.516747] 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
> [    0.516797] 0000:00:19.0: eth0: MAC: 6, PHY: 6, PBA No: ffffff-0ff
> [    4.450508] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> ...
> [67918.704063] 0000:00:19.0: eth0: Detected Tx Unit Hang:
> [67918.704066]   TDH                  <79>
> [67918.704068]   TDT                  <d8>
> [67920.704425] 0000:00:19.0: eth0: Detected Tx Unit Hang:
> [67920.704428]   TDH                  <26>
> [67920.704430]   TDT                  <aa>
> [67922.704196] 0000:00:19.0: eth0: Detected Tx Unit Hang:
> [67922.704199]   TDH                  <b>
> [67922.704201]   TDT                  <b1>

The above leads me to believe that you're getting false hangs.  TDH is
still moving forward and the adapter is eventually making progress
transmitting.
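To see why these reports point to progress rather than a true hang, a minimal Python sketch can apply ring arithmetic to the TDH/TDT values quoted above.  It assumes a 256-descriptor transmit ring (an assumption here; the actual size can be checked with `ethtool -g eth0`) and treats TDH and TDT as indices that wrap modulo the ring size.  The helper names are hypothetical.

```python
RING_SIZE = 256  # assumption: 256 Tx descriptors; check with `ethtool -g eth0`

# (TDH, TDT) pairs from the three consecutive "Detected Tx Unit Hang" reports
samples = [(0x79, 0xD8), (0x26, 0xAA), (0x0B, 0xB1)]

def ring_distance(start: int, end: int, size: int = RING_SIZE) -> int:
    """Descriptors from index `start` forward to index `end` on a circular ring."""
    return (end - start) % size

def head_advances(samples, size=RING_SIZE):
    """Descriptors the hardware consumed between consecutive reports (a lower bound)."""
    heads = [tdh for tdh, _ in samples]
    return [ring_distance(a, b, size) for a, b in zip(heads, heads[1:])]

def pending(samples, size=RING_SIZE):
    """Descriptors queued but not yet processed at each report (TDT minus TDH)."""
    return [ring_distance(tdh, tdt, size) for tdh, tdt in samples]

if __name__ == "__main__":
    print("head advanced by:", head_advances(samples))
    print("pending at each report:", pending(samples))
```

A nonzero advance means the hardware consumed descriptors between reports.  Because the arithmetic is modulo the ring size, the figure is only a lower bound: the head may have wrapped more than once in two seconds.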

I think I see the reason below, or at least a contributor.

<snip>


> # ethtool -S eth0
> NIC statistics:
>      rx_packets: 7550711
>      tx_packets: 7445387
>      rx_bytes: 6806633577
>      tx_bytes: 6164100259
>      rx_broadcast: 31
>      tx_broadcast: 8599
>      rx_multicast: 151
>      tx_multicast: 41
>      multicast: 151
>      rx_missed_errors: 60846
>      tx_deferred_ok: 7919
>      tx_restart_queue: 41121
>      tx_tcp_seg_good: 690431
>      rx_flow_control_xon: 8010
>      rx_flow_control_xoff: 69056

This indicates that whatever you're connected to is too busy to receive
packets: rx_flow_control_xoff counts XOFF pause frames received from the
link partner, each one asking us to stop transmitting for a while.  If
you're connected to a switch, the switch is likely overloaded and
probably misconfigured, as most switch vendors do not turn on *sending*
flow control from the switch.
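The counters worth comparing can be pulled out of `ethtool -S` output with a small Python sketch (the function names are hypothetical).  With the numbers in this thread, the link partner sent roughly fourteen times as many XOFF frames as this NIC did (69056 versus 4855), which is what points at the far end being congested.

```python
# Excerpt of the `ethtool -S eth0` output quoted in this thread.
SAMPLE = """\
NIC statistics:
     rx_flow_control_xon: 8010
     rx_flow_control_xoff: 69056
     tx_timeout_count: 0
     tx_flow_control_xon: 4846
     tx_flow_control_xoff: 4855
"""

def parse_ethtool_stats(text: str) -> dict:
    """Turn 'name: value' lines into a dict, skipping headers like 'NIC statistics:'."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats

def pause_summary(stats: dict) -> dict:
    return {
        # XOFF frames from the link partner: it asked us to stop sending.
        "xoff_received": stats.get("rx_flow_control_xoff", 0),
        # XOFF frames we sent: we asked the link partner to stop sending.
        "xoff_sent": stats.get("tx_flow_control_xoff", 0),
        # Hardware resets caused by a transmit timeout.
        "tx_resets": stats.get("tx_timeout_count", 0),
    }

summary = pause_summary(parse_ethtool_stats(SAMPLE))
print(summary)
```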

>      tx_timeout_count: 0

This indicates that you're not actually experiencing any hardware resets
due to a tx hang, which is good.


>      tx_flow_control_xon: 4846
>      tx_flow_control_xoff: 4855
>      rx_long_byte_count: 6806633577
>      rx_csum_offload_good: 7537182
>      rx_csum_offload_errors: 4

In this case the driver is really just reporting that flow control is
holding off transmit completions for more than two seconds.  The lack of
an eventual reset means that all frames completed before 5 seconds
elapsed.
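The two time limits described above can be summarized with a deliberately simplified Python model.  The 2-second and 5-second values come from this message; the function name and the exact boundary comparisons are assumptions, and this is not the driver's code.

```python
HANG_REPORT_S = 2.0  # from this message: hang message after ~2 s without completion
RESET_S = 5.0        # from this message: reset if frames are still stuck at ~5 s

def classify_stall(seconds: float) -> str:
    """Map how long transmits were held off to the driver's visible reaction (model only)."""
    if seconds >= RESET_S:
        return "reset"          # would show up as a tx_timeout_count increment
    if seconds > HANG_REPORT_S:
        return "hang reported"  # "Detected Tx Unit Hang" in dmesg, but no reset
    return "ok"
```

The stalls in this report fall in the middle band: hang messages were logged, yet tx_timeout_count stayed at 0.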

I think the solution is to figure out why your link partner is sending
flow control to you.  If your link partner is another computer, the
bidirectional flow control is probably normal.  We likely need another
way in the driver to detect that transmits are paused due to flow
control; however, that is a difficult problem to solve in our driver.
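One hypothetical way to tell a flow-control pause from a real hang is to compare a snapshot of the XOFF-received counter taken when the head last moved with one taken when the stall is noticed.  The Python sketch below assumes such snapshots are available; the class, field, and function names are illustrative only and not part of e1000e.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    time_s: float       # when the snapshot was taken
    tdh: int            # transmit descriptor head at that time
    xoff_received: int  # e.g. rx_flow_control_xoff at that time

def explain_stall(before: Snapshot, after: Snapshot, threshold_s: float = 2.0) -> str:
    """Hypothetical heuristic: blame the link partner if XOFF frames arrived during a stall."""
    stalled = after.tdh == before.tdh and (after.time_s - before.time_s) > threshold_s
    if not stalled:
        return "no hang"
    if after.xoff_received > before.xoff_received:
        return "paused by link partner"
    return "possible hang"
```

This also shows why the problem is hard: a rising XOFF counter proves that pauses happened, not that they covered the whole stall, so a heuristic like this could mask a genuine hang.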

One thing you could potentially do is increase the tx_timeout_factor for
the 1 gigabit case, but this would require a slight code modification.
If you would like assistance to do this, let me know.
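As a rough illustration of what raising tx_timeout_factor would do, here is a hypothetical Python model in which the hang-detect threshold is the 2-second base multiplied by a per-speed factor.  The factor table and the way it is applied are assumptions for illustration, not the e1000e source; the real change would be made in the driver's C code.

```python
BASE_DETECT_S = 2.0  # base hang-detect time mentioned in this thread

# Hypothetical factor table: only the 1 Gb/s entry matters for this report.
DEFAULT_FACTORS = {1000: 1}

def detect_threshold(speed_mbps: int, factors: dict = DEFAULT_FACTORS) -> float:
    """Seconds without Tx completion before a hang would be reported (model only)."""
    return BASE_DETECT_S * factors[speed_mbps]

def would_report(stall_s: float, speed_mbps: int, factors: dict = DEFAULT_FACTORS) -> bool:
    return stall_s > detect_threshold(speed_mbps, factors)

# A 3-second flow-control pause at 1 Gb/s:
print(would_report(3.0, 1000))             # True: reported with the default factor
print(would_report(3.0, 1000, {1000: 2}))  # False: suppressed once the factor is doubled
```

The trade-off is that a genuine hang at that speed would also be reported later.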

At some point we should be able to get better hang detection running,
one that can tell when progress is being made, but that code isn't
written yet.
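A hang check that notices progress could look like the following Python sketch.  It is hypothetical (the class name, the 256-entry ring, and the 2-second threshold are assumptions) and only illustrates the idea: restart the timer whenever the head index moves, and report a hang only when work is pending and the head has stayed put for longer than the threshold.

```python
class ProgressAwareHangDetector:
    """Hypothetical detector: flags a hang only if TDH stops moving while work is queued."""

    def __init__(self, ring_size: int = 256, threshold_s: float = 2.0):
        self.ring_size = ring_size
        self.threshold_s = threshold_s
        self.last_head = None
        self.last_progress_s = None

    def check(self, now_s: float, tdh: int, tdt: int) -> bool:
        pending = (tdt - tdh) % self.ring_size
        if pending == 0 or tdh != self.last_head:
            # Ring is empty, or the hardware moved the head: restart the clock.
            self.last_head = tdh
            self.last_progress_s = now_s
            return False
        # Work is queued and the head has not moved since last_progress_s.
        return now_s - self.last_progress_s > self.threshold_s
```

Fed the three (TDH, TDT) pairs from the log at 2-second intervals, this detector never reports a hang, because the head moves every time.  It shares the modulo limitation of any ring-index check: a head that lapped the ring exactly back to the same index would look stuck, so it relies on being polled often.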

Jesse

-- 
Jesse Brandeburg
This email sent via Evolution, powered by Linux


_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel