List:       linux-smp
Subject:    Re: Kernel Bug at spinlock.h ?!
From:       ChristopherHuhn <c.huhn () gsi ! de>
Date:       2003-03-10 8:52:04

Zwane Mwaikambo wrote:

>On Thu, 6 Mar 2003, ChristopherHuhn wrote:
>
>>Hi again,
>>
>>>It looks like a possible race with rpc_execute and possibly the timer, 
>>>although i can't be certain where the other cpus are. Do the other oopses 
>>>look somewhat similar? Could you supply them?
>>>
>>Below are some oopses I gathered yesterday and today, all on different
>>machines.
>>I'd like to remark that we are experiencing massive NFS problems at the
>>moment that seem to be caused by our mixed potato 2.2 / woody 2.4
>>environment, i.e. linking apps on a woody system with the sources mounted
>>via NFS from a potato box leads to obscure I/O failures like "no space
>>left on device" (this never happens with woody only). So this might be a
>>clue here as well.
>>
>>The oopses were all written down from the screen; I hopefully made few
>>"transmission" errors.
>>
>
>Some of these are a bit worrying, seeing as they are bit flips. Also, they
>all appear to come from a UP machine(?), which would change things with
>respect to my previous comment about races. Regarding the weird I/O
>failures, are you mounting with the 'soft' option?
>
>	Zwane
>
The machines are all DP Xeons. Our SP machines run the same kernel, but
these oopses only occur on DP machines under heavy load.
The machines are recognized as SMP:
# uname -a
Linux lxb000 2.4.20 #2 SMP Tue Dec 17 10:43:29 CET 2002 i686 unknown

but the E7500 chipset does not seem to be fully supported:

Jan 27 15:26:34 lxb000 kernel: found SMP MP-table at 000f6710
Jan 27 15:26:34 lxb000 kernel: hm, page 000f6000 reserved twice.
Jan 27 15:26:34 lxb000 kernel: hm, page 000f7000 reserved twice.
Jan 27 15:26:34 lxb000 kernel: hm, page 0009f000 reserved twice.
Jan 27 15:26:34 lxb000 kernel: hm, page 000a0000 reserved twice.
Jan 27 15:26:34 lxb000 kernel: On node 0 totalpages: 262016
Jan 27 15:26:34 lxb000 kernel: zone(0): 4096 pages.
Jan 27 15:26:34 lxb000 kernel: zone(1): 225280 pages.
Jan 27 15:26:34 lxb000 kernel: zone(2): 32640 pages.
Jan 27 15:26:34 lxb000 kernel: ACPI: Searched entire block, no RSDP was found.
Jan 27 15:26:34 lxb000 kernel: ACPI: Searched entire block, no RSDP was found.
Jan 27 15:26:34 lxb000 kernel: ACPI: System description tables not found
Jan 27 15:26:34 lxb000 kernel: Intel MultiProcessor Specification v1.4
Jan 27 15:26:34 lxb000 kernel:     Virtual Wire compatibility mode.
Jan 27 15:26:34 lxb000 kernel: OEM ID:   Product ID: Kings Canyon APIC at: 0xFEE00000
Jan 27 15:26:34 lxb000 kernel: Processor #0 Pentium 4(tm) XEON(tm) APIC version 20
Jan 27 15:26:34 lxb000 kernel: Processor #6 Pentium 4(tm) XEON(tm) APIC version 20
Jan 27 15:26:34 lxb000 kernel: Processor #1 Pentium 4(tm) XEON(tm) APIC version 20
Jan 27 15:26:34 lxb000 kernel: Processor #7 Pentium 4(tm) XEON(tm) APIC version 20
Jan 27 15:26:34 lxb000 kernel: I/O APIC #2 Version 32 at 0xFEC00000.
Jan 27 15:26:34 lxb000 kernel: I/O APIC #3 Version 32 at 0xFEC80000.
Jan 27 15:26:34 lxb000 kernel: I/O APIC #4 Version 32 at 0xFEC80400.
Jan 27 15:26:34 lxb000 kernel: I/O APIC #5 Version 32 at 0xFEC81000.
Jan 27 15:26:34 lxb000 kernel: I/O APIC #8 Version 32 at 0xFEC81400.
Jan 27 15:26:34 lxb000 kernel: Processors: 4
...
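
For anyone comparing boot logs across several boxes, the processor entries
above can be pulled out mechanically. The following is only a rough sketch:
the function name summarize_mp_table and the two regular expressions are my
own assumptions, matched against the syslog-prefixed lines shown in this
message, not against any documented kernel log format.

```python
import re

# Sample lines copied from the boot log in this message (syslog prefix kept).
BOOT_LOG = """\
Jan 27 15:26:34 lxb000 kernel: Processor #0 Pentium 4(tm) XEON(tm) APIC version 20
Jan 27 15:26:34 lxb000 kernel: Processor #6 Pentium 4(tm) XEON(tm) APIC version 20
Jan 27 15:26:34 lxb000 kernel: Processor #1 Pentium 4(tm) XEON(tm) APIC version 20
Jan 27 15:26:34 lxb000 kernel: Processor #7 Pentium 4(tm) XEON(tm) APIC version 20
Jan 27 15:26:34 lxb000 kernel: I/O APIC #2 Version 32 at 0xFEC00000.
Jan 27 15:26:34 lxb000 kernel: Processors: 4
"""

# Hypothetical patterns, written to fit the lines above only.
PROCESSOR_RE = re.compile(r"kernel: Processor #(\d+) ")
TOTAL_RE = re.compile(r"kernel: Processors: (\d+)")


def summarize_mp_table(log_text):
    """Return (list of APIC IDs in log order, reported processor count or None)."""
    apic_ids = [int(m) for m in PROCESSOR_RE.findall(log_text)]
    total = TOTAL_RE.search(log_text)
    reported = int(total.group(1)) if total else None
    return apic_ids, reported


apic_ids, reported = summarize_mp_table(BOOT_LOG)
print("APIC IDs:", apic_ids)
print("Reported processors:", reported)
```

If the number of "Processor #" lines and the "Processors:" total disagree on
some machine, that machine's log would be worth a closer look.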

There might be (are) severe flaws in our NFS configuration and network 
performance, but that should not crash the box, should it?

BTW: I just received a link to a bug report, including a fix, that sounds
similar to our problem:
http://marc.theaimsgroup.com/?l=linux-nfs&m=104716581307294&w=2

With kind regards,

Christopher


-
To unsubscribe from this list: send the line "unsubscribe linux-smp" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html