List: linux-smp
Subject: PROBLEM: repeated oops+panic on test11 SMP intel on SCSI
From: Aron Rosenberg <amr42 () cornell ! edu>
Date: 2000-12-08 5:19:01
Hello all, this is my first bug report, so bear with me (I'm trying to
follow the directions). This is being sent to both the smp and scsi lists
because it might involve both.
Alright, I hope this helps everybody!
Aron Rosenberg
amr42@cornell.edu
Video Conferencing for Linux
http://cu30.sourceforge.net
<begin Bug report>
[1] SMP machine keeps oopsing and crashing under heavy SCSI disk access. This
can be reproduced at low, medium and high CPU usage. File system is ext2.
[2] The kernel will oops and panic when a user in a shell runs cksum or
md5sum on a large file on an SMP test11 machine. This is a repeatable
problem on an SMP machine (Intel, Dell).
[3] Keywords: smp kernel
[4] Version 2.4.0-test11
[5] Oops message and decoded stack trace (via ksymoops)
Unable to handle Kernel NULL pointer dereference at virtual address 0000003d
*pde = 00000000
Eip: 0010:[<c01331bc>]
Using defaults from ksymoops -t elf32-i386 -a i386
Eflags: 00010007
eax: 00000001 ebx: c122d628 ecx: 00000046 edx: 00000001
esi: c832fc20 edi: 00000202 ebp: c832fc68 esp: c166de50
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c166d000)
Stack: c832fc20 d7eb20ac d7eb2000 00000050 c019a52d c832fc20 00000001 d7eb2000
d7eb2000 00000000 c02db174 c019a829 d7eb2000 00000001 000000f8 00000001
00000001 00000001 d7eb2000 000000f8 c166df08 c02db178 c02db17c 0000001f
Call Trace: [<c019a52d>] [<c019a829>] [<c01b8b9c>] [<c0199b1a>] [<c01abae0>]
[<c01a0524>] [<c01abda1>]
[<c010be61>] [<c010c056>] [<c0108934>] [<c0108934>] [<c010a7d8>]
[<c018934>] [<c0108934>] [<c0100018>]
[<c0108960>] [<c01089c2>] [<c01c6ea5>] [<c0171572>]
Code: 81 78 3c 48 31 13 c0 75 06 f6 40 18 04 75 5d 8b 40 28 39 f0
>>EIP; c01331bc <end_buffer_io_async+74/f4> <=====
Trace; c019a52d <__scsi_end_request+99/144>
Trace; c019a829 <scsi_io_completion+169/34c>
Trace; c01b8b9c <rw_intr+154/15c>
Trace; c0199b1a <scsi_old_done+5a6/5c4>
Trace; c01abae0 <aic7xxx_isr+d8/310>
Trace; c01a0524 <aic7xxx_done_cmds_complete+28/38>
Trace; c01abda1 <do_aic7xxx_isr+89/ac>
Trace; c010be61 <handle_IRQ_event+51/7c>
Trace; c010c056 <do_IRQ+9a/ec>
Trace; c0108934 <default_idle+0/34>
Trace; c0108934 <default_idle+0/34>
Trace; c010a7d8 <ret_from_intr+0/20>
Trace; 0c018934 Before first symbol
Trace; c0108934 <default_idle+0/34>
Trace; c0100018 <startup_32+18/cc>
Trace; c0108960 <default_idle+2c/34>
Trace; c01089c2 <cpu_idle+3a/50>
Trace; c01c6ea5 <vgacon_cursor+1e9/1f4>
Trace; c0171572 <set_cursor+6e/84>
Code; c01331bc <end_buffer_io_async+74/f4>
0000000000000000 <_EIP>:
Code; c01331bc <end_buffer_io_async+74/f4> <=====
0: 81 78 3c 48 31 13 c0 cmpl $0xc0133148,0x3c(%eax) <=====
Code; c01331c3 <end_buffer_io_async+7b/f4>
7: 75 06 jne f <_EIP+0xf> c01331cb
<end_buffer_io_async+83/f4>
Code; c01331c5 <end_buffer_io_async+7d/f4>
9: f6 40 18 04 testb $0x4,0x18(%eax)
Code; c01331c9 <end_buffer_io_async+81/f4>
d: 75 5d jne 6c <_EIP+0x6c> c0133228
<end_buffer_io_async+e0/f4>
Code; c01331cb <end_buffer_io_async+83/f4>
f: 8b 40 28 mov 0x28(%eax),%eax
Code; c01331ce <end_buffer_io_async+86/f4>
12: 39 f0 cmp %esi,%eax
Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
1 warning issued. Results may not be reliable.
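Since the trace above was copied down by hand, a quick sanity check on the addresses may be useful. The following is only a sketch in Python and was not part of the original report (the helper name `find_malformed` is made up for illustration). It flags any call-trace entry that is not exactly eight hex digits, which is the width of a kernel text address on this 32-bit machine. Run against the trace above it singles out `c018934`, the same entry ksymoops lists as "Before first symbol". That entry may be a mis-copied `c0108934`, but that is a guess.

```python
import re

# Call trace exactly as transcribed in the report above.
CALL_TRACE = """
[<c019a52d>] [<c019a829>] [<c01b8b9c>] [<c0199b1a>] [<c01abae0>]
[<c01a0524>] [<c01abda1>]
[<c010be61>] [<c010c056>] [<c0108934>] [<c0108934>] [<c010a7d8>]
[<c018934>] [<c0108934>] [<c0100018>]
[<c0108960>] [<c01089c2>] [<c01c6ea5>] [<c0171572>]
"""

def find_malformed(trace):
    """Return trace entries that are not exactly 8 hex digits long."""
    entries = re.findall(r"\[<([0-9a-fA-F]+)>\]", trace)
    return [entry for entry in entries if len(entry) != 8]

print(find_malformed(CALL_TRACE))
```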
-------------------------------------------------------------
I had to copy this down by hand, so I hope everything is correct.
Repeat of the oops on an idle machine (no CPU load).
Unable to handle Kernel NULL pointer dereference at virtual address 0000003d
*pde = 00000000
Eip: 0010:[<c01331bc>]
Using defaults from ksymoops -t elf32-i386 -a i386
Eflags: 00010007
eax: 00000001 ebx: c12aefe4 ecx: 00000046 edx: 00000001
esi: ca1abd40 edi: 00000202 ebp: ca1abd88 esp: c166de50
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c166d000)
Stack: c832fc20 d7eb20ac d7eb2000 00000050 c019a52d c832fc20 00000001 d7eb2000
d7eb2000 00000000 c02db174 c019a829 d7eb2000 00000001 000000f8 00000001
00000001 00000001 d7eb2000 000000f8 c166df08 c02db178 c02db17c 0000001f
Call Trace: [<c019a52d>] [<c019a829>] [<c01b8b9c>] [<c0199b1a>] [<c01abae0>]
[<c01a0524>] [<c01abda1>]
[<c010be61>] [<c010c056>] [<c0108934>] [<c0108934>] [<c010a7d8>]
[<c018934>] [<c0108934>] [<c0100018>]
[<c0108960>] [<c01089c2>] [<c01c6ea5>] [<c0171572>]
Code: 81 78 3c 48 31 13 c0 75 06 f6 40 18 04 75 5d 8b 40 28 39 f0
>>EIP; c01331bc <end_buffer_io_async+74/f4> <=====
Trace; c019a52d <__scsi_end_request+99/144>
Trace; c019a829 <scsi_io_completion+169/34c>
Trace; c01b8b9c <rw_intr+154/15c>
Trace; c0199b1a <scsi_old_done+5a6/5c4>
Trace; c01abae0 <aic7xxx_isr+d8/310>
Trace; c01a0524 <aic7xxx_done_cmds_complete+28/38>
Trace; c01abda1 <do_aic7xxx_isr+89/ac>
Trace; c010be61 <handle_IRQ_event+51/7c>
Trace; c010c056 <do_IRQ+9a/ec>
Trace; c0108934 <default_idle+0/34>
Trace; c0108934 <default_idle+0/34>
Trace; c010a7d8 <ret_from_intr+0/20>
Trace; 0c018934 Before first symbol
Trace; c0108934 <default_idle+0/34>
Trace; c0100018 <startup_32+18/cc>
Trace; c0108960 <default_idle+2c/34>
Trace; c01089c2 <cpu_idle+3a/50>
Trace; c01c6ea5 <vgacon_cursor+1e9/1f4>
Trace; c0171572 <set_cursor+6e/84>
Code; c01331bc <end_buffer_io_async+74/f4>
0000000000000000 <_EIP>:
Code; c01331bc <end_buffer_io_async+74/f4> <=====
0: 81 78 3c 48 31 13 c0 cmpl $0xc0133148,0x3c(%eax) <=====
Code; c01331c3 <end_buffer_io_async+7b/f4>
7: 75 06 jne f <_EIP+0xf> c01331cb
<end_buffer_io_async+83/f4>
Code; c01331c5 <end_buffer_io_async+7d/f4>
9: f6 40 18 04 testb $0x4,0x18(%eax)
Code; c01331c9 <end_buffer_io_async+81/f4>
d: 75 5d jne 6c <_EIP+0x6c> c0133228
<end_buffer_io_async+e0/f4>
Code; c01331cb <end_buffer_io_async+83/f4>
f: 8b 40 28 mov 0x28(%eax),%eax
Code; c01331ce <end_buffer_io_async+86/f4>
12: 39 f0 cmp %esi,%eax
Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
1 warning issued. Results may not be reliable.
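The numbers in the oops can be cross-checked against each other. The faulting instruction is `cmpl $0xc0133148,0x3c(%eax)` and the register dump shows eax = 1, so the access lands at 1 + 0x3c = 0x3d, which matches the reported virtual address 0000003d. Likewise, EIP c01331bc minus the offset 0x74 gives c0133148, so the immediate being compared is the start address of end_buffer_io_async itself. A small Python sketch of that arithmetic (added for illustration only; the variable names are not from the report):

```python
# All values below are taken from the oops output above.
eax = 0x00000001              # register dump
displacement = 0x3c           # from: cmpl $0xc0133148,0x3c(%eax)
reported_fault = 0x0000003d   # "virtual address 0000003d"

eip = 0xc01331bc              # >>EIP; c01331bc <end_buffer_io_async+74/f4>
offset_in_function = 0x74
immediate = 0xc0133148        # constant operand of the cmpl

# Address the faulting instruction tried to read.
effective_address = eax + displacement
# Start of end_buffer_io_async, derived from EIP and the symbol offset.
function_start = eip - offset_in_function

print(f"{effective_address:08x}")
print(f"{function_start:08x}")
```

This suggests that eax held the small integer 1 rather than a valid pointer when the handler ran. The report alone does not show where that value came from.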
[6] How to reproduce
This all happens under heavy disk usage on the SCSI system. If I try to compile
the test11 kernel on a test11 boot with make -j3 or -j4, things
segfault, and once it oopsed. This particular oops happened while
untarring/verifying a 358 meg file on the SCSI drive, with low to
medium processor usage at the time. I can reproduce this on an idle machine
too. The file is a 358 meg script file/tarball, and running cksum or md5sum
on it to verify it causes the kernel to oops.
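For anyone who wants to generate a similar load without the original tarball, here is a rough stand-in for the md5sum run, written in Python. It is a sketch under assumptions, not the command actually used: the function name `md5_of_file` and the 1 MB chunk size are arbitrary choices, and the path is whatever large file sits on the SCSI disk. It reads the file sequentially from start to end, which is roughly the access pattern md5sum produces.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Read the file sequentially in chunks and return its MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example (hypothetical path):
#   print(md5_of_file("/mnt/scsi/large-file.tar"))
```

The digest should agree with what md5sum prints for the same file, so a mismatch between runs would itself be a sign of bad reads.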
[7.0] Machine type
Dell Precision Workstation 610 MT
Bios: A09
Processors: 2 Pentium II Xeon 400's
Memory 396megs
Disks: 1 IDE 2gig main boot, 1 SCSI-2 attached host 9gig.
[7.1] Linux amr42a 2.4.0-test11 #1 SMP Wed Dec 6 19:33:04 EST 2000 i686 unknown
Kernel modules 2.3.11
Gnu C egcs-2.91.66
Gnu Make 3.77
Binutils 2.10.0.24
Linux C Library 2.1.3
Dynamic linker ldd: version 1.9.9
Procps 2.0.7
Mount 2.9v
Net-tools 1.55
Kbd command
Sh-utils 2.0
Modules Loaded
[7.2] /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 398.000780
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
features : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr
bogomips : 796.26
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 398.000780
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
features : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr
bogomips : 796.26
[7.3] No modules loaded.
[7.4] /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
02f8-02ff : serial(auto)
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0800-083f : Intel Corporation 82371AB PIIX4 ACPI
0840-085f : Intel Corporation 82371AB PIIX4 ACPI
0cf8-0cff : PCI conf1
dc00-dc7f : 3Com Corporation 3c905B 100BaseTX [Cyclone]
dc00-dc7f : eth0
dce0-dcff : Intel Corporation 82371AB PIIX4 USB
e000-efff : PCI Bus #02
e800-e8ff : Adaptec AIC-7880U
e800-e8fe : aic7xxx
ec00-ecff : Adaptec AHA-2940U2/W / 7890
ec00-ecfe : aic7xxx
ffa0-ffaf : Intel Corporation 82371AB PIIX4 IDE
ffa0-ffa7 : ide0
/proc/iomem
00000000-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000c8000-000cc7ff : Extension ROM
000cc800-000ccfff : Extension ROM
000cd000-000cffff : Extension ROM
000f0000-000fffff : System ROM
00100000-17ffdfff : System RAM
00100000-0025193f : Kernel code
00251940-0026777f : Kernel data
17ffe000-17ffffff : reserved
f0000000-f3ffffff : Intel Corporation 440GX - 82443GX Host bridge
f5000000-f5ffffff : PCI Bus #02
f6000000-f6ffffff : PCI Bus #01
f9000000-faffffff : PCI Bus #02
f9ffe000-f9ffefff : Adaptec AIC-7880U
f9fff000-f9ffffff : Adaptec AHA-2940U2/W / 7890
fb000000-fdffffff : PCI Bus #01
fb800000-fbffffff : Texas Instruments TVP4020 [Permedia 2]
fc000000-fc7fffff : Texas Instruments TVP4020 [Permedia 2]
fcfe0000-fcffffff : Texas Instruments TVP4020 [Permedia 2]
fe000000-fe00007f : 3Com Corporation 3c905B 100BaseTX [Cyclone]
fec00000-fec0ffff : reserved
fee00000-fee0ffff : reserved
ffe00000-ffffffff : reserved
[7.5] PCI information (verbose lspci output)
00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
Subsystem: Dell Computer Corporation: Unknown device 4087
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort+ >SERR- <PERR-
Latency: 64
Region 0: Memory at f0000000 (32-bit, prefetchable) [size=64M]
Capabilities: [a0] AGP version 1.0
Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge (prog-if
00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fb000000-fdffffff
Prefetchable memory behind bridge: f6000000-f6ffffff
BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B+
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 0
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
(prog-if 80 [Master])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64
Region 4: I/O ports at ffa0 [size=16]
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
(prog-if 00 [UHCI])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64
Interrupt: pin D routed to IRQ 19
Region 4: I/O ports at dce0 [size=32]
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
00:11.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
(rev 24)
Subsystem: Dell Computer Corporation 3C905B Fast Etherlink XL 10/100
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64 (2500ns min, 2500ns max), cache line size 08
Interrupt: pin A routed to IRQ 17
Region 0: I/O ports at dc00 [size=128]
Region 1: Memory at fe000000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at f8000000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00:13.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
(prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64, cache line size 08
Bus: primary=00, secondary=02, subordinate=02, sec-latency=64
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: f9000000-faffffff
Prefetchable memory behind bridge: 00000000f5000000-00000000f5f00000
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=220mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Bridge: PM- B3+
01:00.0 VGA compatible controller: Texas Instruments TVP4020 [Permedia 2]
(rev 01) (prog-if 00 [VGA])
Subsystem: Diamond Multimedia Systems FIRE GL 1000 PRO
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fcfe0000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at fc000000 (32-bit, non-prefetchable) [size=8M]
Region 2: Memory at fb800000 (32-bit, non-prefetchable) [size=8M]
Expansion ROM at 80000000 [disabled] [size=64K]
Capabilities: [40] AGP version 1.0
Status: RQ=31 SBA+ 64bit- FW- Rate=x1
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
02:0a.0 SCSI storage controller: Adaptec AHA-2940U2/W / 7890
Subsystem: Dell Computer Corporation: Unknown device 0087
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64 (9750ns min, 6250ns max), cache line size 08
Interrupt: pin A routed to IRQ 18
BIST result: 00
Region 0: I/O ports at ec00 [size=256]
Region 1: Memory at f9fff000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at fa000000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
02:0e.0 SCSI storage controller: Adaptec AIC-7880U (rev 01)
Subsystem: Adaptec AIC-7880P Ultra/Ultra Wide SCSI Chipset
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 64 (2000ns min, 2000ns max), cache line size 08
Interrupt: pin A routed to IRQ 18
Region 0: I/O ports at e800 [size=256]
Region 1: Memory at f9ffe000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at fa000000 [disabled] [size=64K]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
[7.6] /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: QUANTUM Model: VIKING II 9.1WLS Rev: 3506
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 05 Lun: 00
Vendor: NEC Model: CD-ROM DRIVE:465 Rev: 1.03
Type: CD-ROM ANSI SCSI revision: 02
[7.7] /proc/interrupts
CPU0 CPU1
0: 40942 31267 IO-APIC-edge timer
1: 2 0 IO-APIC-edge keyboard
2: 0 0 XT-PIC cascade
14: 130167 103226 IO-APIC-edge ide0
17: 1626 1404 IO-APIC-level eth0
18: 3210 3195 IO-APIC-level aic7xxx, aic7xxx
NMI: 72143 72143
LOC: 72125 72124
ERR: 0
<end bug report>