
List:       e1000-devel
Subject:    Re: [E1000-devel] E1000 crashes on PowerPC
From:       "Hommel, Thomas (GE Indust, GE Fanuc)" <Thomas.Hommel () gefanuc ! com>
Date:       2007-06-29 7:48:54
Message-ID: 62DDBB9E5E23CC4A929EE46F9427CEAF14B1AF () BUDMLVEM04 ! e2k ! ad ! ge ! com

Hello,
Here are two oops messages I captured, but the actual error varies each
time I run the test: segmentation faults, machine check exceptions,
failed paging requests, and others. It looks like corrupted memory
to me.
I was also able to reproduce the error on a completely different system
(kernel 2.6.12, MPC7447A processor with Marvell MV64360 chipset) with
the same NIC.

Thomas


Linux 2.6.21.5 (192.168.1.151) (10:58 on Friday, 22 June 2007)

login: root
Last login: Fri Jun 22 10:58:12 on console
bash-3.00# cat /proc/cpuinfo
processor       : 0
cpu             : 7448, altivec supported
clock           : 1000.000000MHz
revision        : 0.2 (pvr 8004 0202)
bogomips        : 199.68

processor       : 1
cpu             : 7448, altivec supported
clock           : 1000.000000MHz
revision        : 0.2 (pvr 8004 0202)
bogomips        : 199.68

total bogomips  : 399.36
timebase        : 100000000
platform        : SBS CM6
Vendor          : Freescale Semiconductor
Machine         : SBS_CM6
SVR             : 0x80900120
Memory          : 512 MB
bash-3.00# cat /proc/interrupts
           CPU0       CPU1
 18:          0          0   MPIC      Level     mpc86xx_ecc, mpc86xx_ecc
 42:        342          0   MPIC      Level     serial
 43:         35          0   MPIC      Level     i2c-mpc, i2c-mpc
 64:      10421          0   MPIC      Level     eth0
 65:     182043          0   MPIC      Level     eth1
 66:          0          0   MPIC      Level     stk17ta8
251:          0         77   MPIC      Edge      IPI0 (call function)
252:        690       1200   MPIC      Edge      IPI1 (reschedule)
253:          0          0   MPIC      Edge      IPI2 (unused)
254:          0          0   MPIC      Edge      IPI3 (debugger break)
BAD:          0
bash-3.00#

------------------------------ Crash #1 -----------------------------------

bash-3.00# ./rep_netio.sh -n10 192.168.2.254

NETIO - Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel

TCP connection established.
Packet size  1k bytes:  114724 KByte/s Tx,  114814 KByte/s Rx.
Packet size  2k bytes:  114654 KByte/s Tx,  114839 KByte/s Rx.
Packet size  4k bytes:  114569 KByte/s Tx,  114853 KByte/s Rx.
Packet size  8k bytes:  114517 KByte/s Tx,  114886 KByte/s Rx.
Packet size 16k bytes:  114616 KByte/s Tx,  114866 KByte/s Rx.
Packet size 32k bytes:  114737 KByte/s Tx, Machine check in kernel mode.
Caused by (from SRR1=149030): Transfer error ack signal
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=2
NIP: C01A20D0 LR: C0199024 CTR: C002C320
REGS: c0371c70 TRAP: 0200   Not tainted  (2.6.21.5)
MSR: 00149030 <EE,ME,IR,DR>  CR: 44002028  XER: 00000000
TASK = c034e140[0] 'swapper' THREAD: c0370000 CPU: 0
GPR00: C0199024 C0371D20 C034E140 C08CD548 0EDF33FA FFFF1DE1 0EDF33FA 33689206
GPR08: C08CD398 C0371D98 E1080000 00000080 06ACFC00 1001BA3C C0380000 FFFFFFFF
GPR16: 00000001 00000000 C0371ED0 00000005 A3273126 C08C2520 C08CD548 C0371D98
GPR24: C0370000 00000000 C0386EAC C0370000 00200200 C0198FFC C08CD548 C08CD380
NIP [C01A20D0] e1000_check_for_link+0x20/0x4f8
LR [C0199024] e1000_watchdog+0x28/0x744
Call Trace:
[C0371D20] [00000001] 0x1 (unreliable)
[C0371D40] [C0199024] e1000_watchdog+0x28/0x744
[C0371D90] [C002C434] run_timer_softirq+0x114/0x1e4
[C0371DD0] [C0027924] __do_softirq+0xa0/0x13c
[C0371E10] [C00064EC] do_softirq+0x64/0x68
[C0371E20] [C0027318] irq_exit+0x54/0x64
[C0371E30] [C000DEC8] timer_interrupt+0x2b4/0x680
[C0371EC0] [C0011810] ret_from_except+0x0/0x14
--- Exception: 901 at cpu_idle+0xb0/0xec
    LR = cpu_idle+0xa4/0xec
[C0371F80] [C0009578] cpu_idle+0xdc/0xec (unreliable)
[C0371FA0] [C0003B14] rest_init+0x2c/0x3c
[C0371FB0] [C03207BC] start_kernel+0x2bc/0x388
[C0371FF0] [000037B4] 0x37b4
Instruction dump:
901f0028 4bffff44 907f0028 4bffff60 7c0802a6 9421ffe0 bfc10018 7c7e1b78
90010024 81430000 7c0004ac 7d60542c <0c0b0000> 4c00012c 380a0008 7c0004ac
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 180 seconds..
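
For anyone reading the instruction dump above by hand, here is a minimal Python sketch that splits a few of the dumped words into their PowerPC instruction fields. The helper names (decode_fields, describe, KNOWN) are made up for this illustration, and the lookup table is deliberately tiny: it only knows the handful of encodings that appear around the bracketed word.

```python
def decode_fields(word):
    """Split a 32-bit PowerPC instruction word into its common fields."""
    return {
        "opcd": (word >> 26) & 0x3F,   # primary opcode
        "rt": (word >> 21) & 0x1F,     # target register (or TO field for traps)
        "ra": (word >> 16) & 0x1F,
        "rb": (word >> 11) & 0x1F,
        "xo": (word >> 1) & 0x3FF,     # extended opcode (X-form / XL-form)
    }

# Deliberately incomplete: only the encodings needed for this dump.
KNOWN = {(31, 534): "lwbrx", (31, 598): "sync", (19, 150): "isync"}

def describe(word):
    f = decode_fields(word)
    if f["opcd"] == 3:                 # twi: D-form, signed 16-bit immediate
        si = word & 0xFFFF
        if si >= 0x8000:
            si -= 0x10000
        return "twi %d,r%d,%d" % (f["rt"], f["ra"], si)
    name = KNOWN.get((f["opcd"], f["xo"]))
    if name is None:
        return "unknown (opcd=%d, xo=%d)" % (f["opcd"], f["xo"])
    if name == "lwbrx":
        base = "0" if f["ra"] == 0 else "r%d" % f["ra"]
        return "lwbrx r%d,%s,r%d" % (f["rt"], base, f["rb"])
    return name

for w in (0x7c0004ac, 0x7d60542c, 0x0c0b0000, 0x4c00012c):
    print("%08x  %s" % (w, describe(w)))
```

With this, the words around the bracketed one decode as sync, lwbrx r11,0,r10, twi 0,r11,0, isync. GPR10 in the register dump is E1080000, so the load is a byte-reversed read from that address. Whether that address is a mapped NIC register is an assumption and is not confirmed anywhere in this thread.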


------------------------------ Crash #2 -----------------------------------

bash-3.00# ifconfig eth1 192.168.2.151 up
bash-3.00# ./rep_netio.sh -n10 192.168.2.254

NETIO - Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel

TCP connection established.
Packet size  1k bytes:  114735 KByte/s Tx,  114844 KByte/s Rx.
Packet size  2k bytes:  114642 KByte/s Tx,  114838 KByte/s Rx.
Packet size  4k bytes:  114586 KByte/s Tx,  114835 KByte/s Rx.
Packet size  8k bytes:  114517 KByte/s Tx,  114874 KByte/s Rx.
Packet size 16k bytes:  114626 KByte/s Tx,  114863 KByte/s Rx.
Packet size 32k bytes:  114748 KByte/s Tx,  4504 KByte/s Rx.
Done.


NETIO - Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel

TCP connection established.
Packet size  1k bytes: Unable to handle kernel paging request for data at address 0xed97c9da
Faulting instruction address: 0xc0016290
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2
NIP: C0016290 LR: C0196F3C CTR: 00000008
REGS: c03338a0 TRAP: 0300   Not tainted  (2.6.21.5)
MSR: 00009032 <EE,ME,IR,DR>  CR: 28228448  XER: 20000000
DAR: ED97C9DA, DSISR: 40000000
TASK = c0bccd90[1011] 'netio' THREAD: c0332000 CPU: 0
GPR00: 00000000 C0333950 C0BCCD90 DFE47010 ED97C9D6 00000044 DFE4700C 00000008
GPR08: 00000001 DFE47012 DF241960 DFE47010 28228448 1001BA3C 00000000 00000001
GPR16: 00000042 00000001 E101C290 00000001 C08EE000 CFE11290 C08EE380 C08EA5A0
GPR24: 00000029 E101C280 00000023 CFE11280 DF241960 00000042 00000044 DFF10AC0
NIP [C0016290] memcpy+0x1c/0x9c
LR [C0196F3C] e1000_clean_rx_irq+0x390/0x4d8
Call Trace:
[C0333950] [C0196F0C] e1000_clean_rx_irq+0x360/0x4d8 (unreliable)
[C03339A0] [C0198A10] e1000_intr+0x19c/0x57c
[C0333A00] [C004367C] handle_IRQ_event+0x5c/0xb0
[C0333A20] [C0045550] handle_fasteoi_irq+0xac/0x17c
[C0333A40] [C00065B8] do_IRQ+0xc8/0x100
[C0333A60] [C0011810] ret_from_except+0x0/0x14
--- Exception: 501 at __do_softirq+0x74/0x13c
    LR = do_softirq+0x64/0x68
[C0333B20] [000000E0] 0xe0 (unreliable)
[C0333B60] [C00064EC] do_softirq+0x64/0x68
[C0333B70] [C0027318] irq_exit+0x54/0x64
[C0333B80] [C00065BC] do_IRQ+0xcc/0x100
[C0333BA0] [C0011810] ret_from_except+0x0/0x14
--- Exception: 501 at tcp_set_skb_tso_segs+0x40/0x74
    LR = tso_fragment+0x170/0x25c
[C0333C60] [C022EABC] tso_fragment+0x150/0x25c (unreliable)
[C0333C90] [C022ECDC] tcp_push_one+0x114/0x164
[C0333CB0] [C0223ADC] tcp_sendmsg+0x5d8/0xc44
[C0333D10] [C0241FC4] inet_sendmsg+0x50/0x78
[C0333D30] [C01F0C24] sock_sendmsg+0xac/0xf4
[C0333E20] [C01F0FA8] sys_sendto+0xcc/0x108
[C0333F00] [C01F1634] sys_socketcall+0x1c4/0x1d8
[C0333F40] [C0011164] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xffd99ec
    LR = 0x100014ac
Instruction dump:
38c60001 4200fff0 4e800020 7c032040 418100a0 54a7e8ff 38c3fffc 3884fffc
41820028 70c00003 7ce903a6 40820054 <80e40004> 85040008 90e60004 95060008
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 180 seconds..<0>------------[ cut here ]------------
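
Since the failure looks different on each run, it can help to compare many captured oopses mechanically. The following is a rough Python sketch for pulling the symbol, offset and function size out of "Call Trace" lines in the format shown above. The names parse_frame and driver_frames are hypothetical helpers for this example, not part of any kernel tool.

```python
import re

# Matches lines such as: [C0371D40] [C0199024] e1000_watchdog+0x28/0x744
FRAME_RE = re.compile(
    r"\[([0-9A-Fa-f]{8})\] \[([0-9A-Fa-f]{8})\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)

def parse_frame(line):
    """Return a dict for a symbolized call-trace line, or None otherwise."""
    m = FRAME_RE.search(line)
    if m is None:
        return None  # e.g. "[C0371D20] [00000001] 0x1 (unreliable)"
    sp, addr, symbol, offset, size = m.groups()
    addr, offset = int(addr, 16), int(offset, 16)
    return {
        "sp": int(sp, 16),
        "addr": addr,
        "symbol": symbol,
        "offset": offset,
        "size": int(size, 16),
        "func_start": addr - offset,  # start address of the function
    }

def driver_frames(lines, prefix="e1000_"):
    """List the symbols from a trace that belong to the given driver prefix."""
    frames = (parse_frame(line) for line in lines)
    return [f["symbol"] for f in frames if f and f["symbol"].startswith(prefix)]

trace = [
    "[C0333950] [C0196F0C] e1000_clean_rx_irq+0x360/0x4d8 (unreliable)",
    "[C03339A0] [C0198A10] e1000_intr+0x19c/0x57c",
    "[C0333A00] [C004367C] handle_IRQ_event+0x5c/0xb0",
]
print(driver_frames(trace))
```

Run over a set of saved console logs, this makes it easy to see which driver functions show up in every crash and which ones vary.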





-----Original Message-----
From: Kok, Auke [mailto:auke-jan.h.kok@intel.com] 
Sent: Thursday, June 28, 2007 17:27
To: Hommel, Thomas (GE Indust, GE Fanuc)
Cc: e1000-devel@lists.sourceforge.net
Subject: Re: [E1000-devel] E1000 crashes on PowerPC

Hommel, Thomas (GE Indust, GE Fanuc) wrote:
> Hello,
> I am trying to use an Intel 82546EB based NIC in a PowerPC system with
> MPC8641D CPU. The NIC seems to work at first glance, but when putting 
> load on the network interfaces (especially when both interfaces run 
> under load), the system shows instabilities and eventually crashes.
> I am using a 2.6.21.5 kernel with additional patches and the 
> integrated driver (7.3.20-k2), but I've also tried the newest 7.5.5 
> driver with the same result.
> I have done extensive testing with various NICs at different PCI bus 
> speeds and bus widths, also with other NICs that use the e1000 driver 
> (e.g. 82544EI), but all showed the same failures. NICs with other 
> drivers worked OK.

Can you possibly try to get a trace of the crash? This would be most
useful, even a screenshot would be nice (remember to set your font to 60
lines for the screen to catch as much as you can in that case).

Auke
