
List:       e1000-devel
Subject:    Re: [E1000-devel] sf.net bug
From:       "Duyck, Alexander H" <alexander.h.duyck () intel ! com>
Date:       2010-02-12 15:59:06
Message-ID: 80769D7B14936844A23C0C43D9FBCF0F12D814E129 () orsmsx501 ! amr ! corp ! intel ! com

Покотиленко Костик wrote:
> On Tue, 09/02/2010 at 23:34 +0200, Покотиленко Костик wrote:
> 
> > > > Also if ACPI is having an effect on the issue one other thing you
> > > > might try changing in the BIOS would be to disable all CPU
> > > > C-states. The system will consume more power as a result, but the
> > > > CPU also ends up usually being much more responsive as a result,
> > > > and we have seen in the past that this can sometimes resolve
> > > > performance issues. 
> > > 
> > > I'll turn those off:
> > > 
> > > CPU C State=1               ;Options: 1=Enabled: 0=Disabled
> > > C1E=1                       ;Options: 1=Enabled: 0=Disabled
> > 
> > Turned off "CPU C State" and "Spread spectrum", C1E turned off
> > automatically. 
> 
> With "CPU C State" and "Spread spectrum" turned off after 47 hours I
> got:
> 
> NETDEV WATCHDOG: eth1 (igb): transmit timed out
> Modules linked in: ...
> Call Trace:
> ...
> 
> Let me summarize:
> 
> - None of the kernel (29, 30) and driver combinations solved the problem
> - None of the BIOS options helped
> - I've figured out that when a TX Unit Hang occurs on the 2 configured
> ports, the loopback test also fails on the 2 unconfigured/unused ports
> - When the NIC stops working, the rest of the system feels OK
> 
> So the problem is localized a bit, but its source is still not clear.
> Is it hardware related or software...
> 
> Also, the system is in use by ~300 customers, so more downtime than we
> already have is not desirable.
> 
> The server has 2 onboard NICs, with one of which we have had a similar
> problem, and a PCI-e quad-port NIC.
> 
> We can still live with 2 NICs, so one of the options for further
> testing I see is to go back to using the onboard NICs and put the PCI-e
> quad-port NIC into another server I support and do a loopback
> (Port1<->Port2, Port3<->Port4) stress test, but that server runs a
> 2.6.26 kernel (changing it is not an option).
> 
> Let me know what you think and what other options there are for
> further testing. I'm going to try 2.6.32 before moving the NIC to
> another server. I did not do this before because there were issues
> backporting it to Lenny.

At this point it feels like we have pretty much eliminated the drivers as
the cause, since the unused pair of ports is affected by whatever is
causing the first pair to fail.  The issue most likely resides somewhere
in the path between the adapter's on-board PCIe bridge and the PCIe root
complex on the system.

I think testing the NIC in another system would be our best option for
now.  This will help to determine if the problem is something in the PCIe
bridge on the adapter, or a problem in the root complex of the server.  If
the issue follows the adapter you will likely need to get it replaced, but
if the issue disappears we will need to start investigating all BIOS
options on the system related to PCIe.
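
To make the comparison between the two systems easier, something like the
following could tally transmit timeouts per port from the kernel log while
the loopback test runs.  This is only a rough sketch: the pattern is
modelled on the single "NETDEV WATCHDOG" line quoted above, so other
kernels may word the message differently, and the function name
count_tx_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Assumption: the message looks like the one quoted in this thread, e.g.
#   NETDEV WATCHDOG: eth1 (igb): transmit timed out
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): transmit timed out"
)


def count_tx_timeouts(lines):
    """Count watchdog transmit timeouts per (interface, driver) pair.

    `lines` is any iterable of kernel log lines (hypothetical helper,
    not part of the igb driver or any existing tool).
    """
    hits = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits[(match.group("iface"), match.group("driver"))] += 1
    return hits


# Small demonstration on made-up log lines.
sample_log = [
    "NETDEV WATCHDOG: eth1 (igb): transmit timed out",
    "some unrelated kernel message",
    "[12345.678] NETDEV WATCHDOG: eth2 (igb): transmit timed out",
    "NETDEV WATCHDOG: eth1 (igb): transmit timed out",
]

for (iface, driver), count in sorted(count_tx_timeouts(sample_log).items()):
    print(f"{iface} ({driver}): {count} transmit timeout(s)")
```

The same function can be fed the lines of a saved dmesg or syslog capture
from each server, so the counts for the adapter in the old and the new
system can be compared side by side.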

Thanks,

Alex 