
List:       linux-smp
Subject:    OOPS after upgrading CPU's ...
From:       Matthias Weidle <matt () box ! li>
Date:       2000-10-04 22:19:53

hi there!

there is some strange stuff going on here and after checking all sources of 
information (without success) i hope that one of you may have the answer 
... :)

ok, here is the problem:

i'm running an smp server machine (mostly doing file server stuff) which was 
running pretty stable with 2 celeron-400 cpus. i got about 60 days of uptime 
without problems - even under heavy load!
a few weeks ago i decided to upgrade the celeron cpus to some older p3s 
(those with 512kb cache, not coppermine) and did not expect any 
complications with that upgrade. but since then i can't keep the machine up 
for more than a couple of days (depending on the load). sooner or later it 
locks up with the following kernel oops message:

ksymoops 2.3.4 on i686 2.2.15pre19ext3.  Options used
     -v /usr/src/linux/vmlinux (specified)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.2.15pre19ext3/ (default)
     -m /usr/src/linux/System.map (default)

Warning (compare_ksyms_lsmod): module i2c-isa is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module i2c-piix4 is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module nfsd is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module w83781d is in lsmod but not in ksyms, probably no symbols exported
Unable to handle kernel NULL pointer dereference at virtual address 00000013
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[<c01100c5>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010006
eax: 00000013   ebx: 00000260   ecx: cc100480   edx: cbffa000
esi: cbffa000   edi: 00000013   ebp: cbffbf74   esp: cbffbf4c
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, process nr: 1, stackpage=cbffb000)
Stack: 00000013 c01104c1 00000013 cbffbf7c cbffa000 c0226020 c010b850 00000013
       cbffbf7c cbffa000 00000000 c010a328 cbffa000 cbffa000 cbffa000 cbffa000
       c0226020 00000000 00000080 00000018 cbff0018 ffffff13 c0107b15 00000010
Call Trace: [<c01104c1>] [<c010b850>] [<c010a328>] [<c0107b15>] [<c019c875>] [<c01166b7>]
Code: e0 28 21 c0 8b 04 85 e4 28 21 c0 83 f8 ff 74 53 bf 00 e0 ff

>>EIP; c01100c5 <mask_IO_APIC_irq+d/84>   <=====
Trace; c01104c1 <do_level_ioapic_IRQ+21/98>
Trace; c010b850 <do_IRQ+38/58>
Trace; c010a328 <common_interrupt+18/20>
Trace; c0107b15 <cpu_idle+3d/50>
Trace; c019c875 <vt_console_print+2fd/314>
Trace; c01166b7 <printk+177/184>
Code;  c01100c5 <mask_IO_APIC_irq+d/84>
00000000 <_EIP>:
Code;  c01100c5 <mask_IO_APIC_irq+d/84>   <=====
   0:   e0 28                     loopne 2a <_EIP+0x2a> c01100ef <mask_IO_APIC_irq+37/84> <=====
Code;  c01100c7 <mask_IO_APIC_irq+f/84>
   2:   21 c0                     andl   %eax,%eax
Code;  c01100c9 <mask_IO_APIC_irq+11/84>
   4:   8b 04 85 e4 28 21 c0      movl   0xc02128e4(,%eax,4),%eax
Code;  c01100d0 <mask_IO_APIC_irq+18/84>
   b:   83 f8 ff                  cmpl   $0xffffffff,%eax
Code;  c01100d3 <mask_IO_APIC_irq+1b/84>
   e:   74 53                     je     63 <_EIP+0x63> c0110128 <mask_IO_APIC_irq+70/84>
Code;  c01100d5 <mask_IO_APIC_irq+1d/84>
  10:   bf 00 e0 ff 00            movl   $0xffe000,%edi

Kernel panic: Attempted to kill the idle task!

4 warnings issued.  Results may not be reliable. 
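
for reference, the symbol+offset annotations in the trace above come from looking up each address in the kernel symbol table. below is a minimal python sketch of that lookup, not ksymoops' actual code: the start addresses are simply back-computed from the trace (address minus offset), and `SYMBOLS` and `resolve` are hypothetical names made up for this illustration.

```python
import bisect

# Start addresses back-computed from the trace above (address - offset).
# A real lookup would read these from System.map instead.
SYMBOLS = sorted([
    (0xc0107ad8, "cpu_idle"),             # c0107b15 - 0x3d
    (0xc010a310, "common_interrupt"),     # c010a328 - 0x18
    (0xc010b818, "do_IRQ"),               # c010b850 - 0x38
    (0xc01100b8, "mask_IO_APIC_irq"),     # c01100c5 - 0x0d
    (0xc01104a0, "do_level_ioapic_IRQ"),  # c01104c1 - 0x21
])
STARTS = [start for start, _ in SYMBOLS]


def resolve(addr):
    """Return 'symbol+offset' for the nearest symbol at or below addr.

    This sketch ignores symbol sizes, so an address past the end of the
    last known function is still attributed to that function.
    """
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return hex(addr)  # below every known symbol
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start:x}"


if __name__ == "__main__":
    for addr in (0xc01100c5, 0xc01104c1, 0xc010b850, 0xc010a328, 0xc0107b15):
        print(f"{addr:08x} {resolve(addr)}")
```

with this reduced table the faulting eip c01100c5 comes out as mask_IO_APIC_irq+d, the same as in the ksymoops output.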



there have been 4-5 lockups since the upgrade and it was always the same 
oops message.
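
in case it helps with reading the dump: the "Oops: 0002" value is the i386 page-fault error code, and its low three bits say what kind of access faulted. the following is a small sketch of that decoding (the function name `decode_oops_code` is made up for this example). note also that the faulting address 00000013 is the same value that sits in eax and edi.

```python
def decode_oops_code(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault in kernel mode, 1 = fault in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    # value taken from the "Oops: 0002" line above
    for key, value in decode_oops_code(0x0002).items():
        print(f"{key}: {value}")
```

for 0002 this gives a write to a not-present page while in kernel mode, which fits the "NULL pointer dereference" line at the top of the oops.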


for the record some additional data about the server box:

soltek sl-68a dual slot1 motherboard (with latest h4 bios)
2 p3-550 with 512kb cache
promise udma66 controller
intel etherexpress nic
64 + 128 mb ram (pc100)
6 hdds (maxtor and ibm drives)

kernel: 2.2.15pre20 (that's pretty much 2.2.16 i guess)
+ ide patch
+ ext3 patch
+ ppdd patch


if you need any additional data please don't hesitate to ask!

is it really possible to break the stability of a box by simply upgrading 
to a better cpu? my first idea was bad ram ... because it is running at 100 
mhz now (it ran at 66 mhz with the celerons). but then i realized that this 
is pretty unlikely, since the oops message is exactly the same every time.

is there somebody out there who can help me?


best regards,
matt.



