
List:       linux-smp
Subject:    Re: IO-APIC errors using 2.4.0-prerelease & 2.4.0-test12
From:       Robert Redelmeier <redelm () ev1 ! net>
Date:       2001-01-06 2:56:57

Linus Torvalds wrote in part:
> 
> ABIT BP6 motherboards have some nasty signal propagation problems
> (apparently the way they routed the APIC bus is highly illegal).

Do you have a reference for this allegation?  I can't find any.
IMHO, the Intel GTL+ APIC bus is poor from a hardware viewpoint:
open-drain wired-OR that can only run at BCLK/4.  It's really a lot
like Ethernet, and collisions are to be expected.  Of course Abit
didn't help things by locating the PIIX4 southbridge with the IO-APIC
_so_ far away (20 cm) from the CPUs.  This adds parasitic trace
capacitance to a bus that cannot tolerate much.  Did Abit really
do something illegal like running the APIC bus at 2.0V?

> On most machines the above messages are harmless. But in theory you
> might get bogus APIC-bus messaging that wouldn't trigger the error
> checks, and you could be royally screwed.

If they are so harmless, why don't you drop their printk priority
from KERN_EMERG to something lower that doesn't clutter the
console with common `syslogd` configs?  

As for the uncaught (clean CRC) double APIC bus error, a lot would
depend on how well the ISR is coded :)   With good coding, the ISR
should realize it has nothing to do and behave gracefully.  A problem
might occur if an IRQ doesn't get acknowledged somehow or the EOI 
gets mangled.  The APIC bus might lock solid with retransmits.

> The reason you see more of them is probably simply because the VM
> scanner is a bit more aggressive, which in turn causes somewhat more
> inter-cpu calls.
> 
> Nothing to be done about the problem that I know about, I'm afraid.

Well, you might consider using the HLT instruction to synchronize
IPIs.  The APIC bus is slow.  Some time ago, I developed `burnAPIC`
(not released) to see if the APIC was causing trouble.  I wasn't 
able to generate more than 300k INTs/second due to the long APIC 
messaging times. There's probably an IPI latency of at least 300ns 
or ~200 CPU clocks before the other CPU can even recognize an IPI.  
Another should not be generated within this window unless you don't 
mind losing it and the ISR is utterly re-entrant.
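
As a rough sanity check on the figures above, the short Python sketch
below converts the quoted rates into intervals.  The 300k INTs/second,
300 ns, and ~200 clock numbers come from this message; the function
names are illustrative only, and the implied clock speed is just what
those two estimates work out to, not a measured value.

```python
def interval_ns(rate_per_sec):
    """Average time between events, in nanoseconds, for a given rate."""
    return 1e9 / rate_per_sec


def implied_clock_mhz(clocks, latency_ns):
    """CPU clock (MHz) at which `clocks` cycles take `latency_ns` ns."""
    return clocks * 1000.0 / latency_ns


if __name__ == "__main__":
    # Figures quoted in the message; nothing here is measured.
    print("interval at 300k INTs/s: %.0f ns" % interval_ns(300_000))
    print("clock implied by 200 clocks in 300 ns: %.0f MHz"
          % implied_clock_mhz(200, 300))
```

At the observed ceiling, interrupts arrive roughly every 3.3 us, about
eleven times the estimated 300 ns latency window.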

But frankly, I don't think this is a big problem for Linux.  With
full-duplex flood pings to two other machines, I was able to
generate the following 2.4.0-test8 /proc/interrupts (HZ=1000?):

-- Robert  author `cpuburn`  http://users.ev1.net/~redelm


           CPU0       CPU1
  0:    2012553    2009127    IO-APIC-edge  timer
  1:       1134       1088    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          2          0    IO-APIC-edge  rtc
 12:         71         64    IO-APIC-edge  PS/2 Mouse
 13:          0          0          XT-PIC  fpu
 14:   10426716   10363420    IO-APIC-edge  ide0
 15:          3         20    IO-APIC-edge  ide1
 16:  410504021  401748440   IO-APIC-level  eth1
 17:  407350472  415248915   IO-APIC-level  eth0
NMI:    4021613    4021613 
LOC:    4021789    4021787 
ERR:    faked 0
eth0      Link encap:Ethernet  HWaddr 00:80:C8:47:73:71  
          inet addr:10.10.123.123  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:488018230 errors:0 dropped:1050 overruns:0 frame:0
          TX packets:487995596 errors:22891 dropped:0 overruns:0 carrier:22891
          collisions:0 txqueuelen:100 
          Interrupt:17 Base address:0xc400 

eth1      Link encap:Ethernet  HWaddr 00:80:C8:E9:07:47  
          inet addr:1.1.1.1  Bcast:1.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:468040272 errors:0 dropped:2611 overruns:0 frame:0
          TX packets:468040348 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          Interrupt:16 Base address:0xc800 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16192  Metric:1
          RX packets:46 errors:0 dropped:0 overruns:0 frame:0
          TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
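
For anyone who wants to check how evenly the IO-APIC spread the NIC
interrupts across the two CPUs, here is a minimal Python sketch that
parses /proc/interrupts-style text.  The sample rows are copied from
the output above; the function names are made up for illustration, and
the parser only assumes the column layout shown here, which may differ
on other kernels.

```python
SAMPLE = """\
           CPU0       CPU1
  0:    2012553    2009127    IO-APIC-edge  timer
 16:  410504021  401748440   IO-APIC-level  eth1
 17:  407350472  415248915   IO-APIC-level  eth0
NMI:    4021613    4021613
ERR:    faked 0
"""


def parse_interrupts(text):
    """Return {label: [count per CPU]} for rows with numeric counts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    counts = {}
    for ln in lines[1:]:
        if ":" not in ln:
            continue
        label, rest = ln.split(":", 1)
        fields = rest.split()[:ncpu]
        # Skip rows such as "ERR: faked 0" that lack per-CPU numbers.
        if len(fields) == ncpu and all(f.isdigit() for f in fields):
            counts[label.strip()] = [int(f) for f in fields]
    return counts


def cpu0_share(per_cpu):
    """Fraction of an IRQ's interrupts that were handled by CPU0."""
    return per_cpu[0] / sum(per_cpu)


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    for irq in ("16", "17"):
        print("IRQ %s: total %d, CPU0 share %.1f%%"
              % (irq, sum(table[irq]), 100 * cpu0_share(table[irq])))
```

To run it against a live system, pass the contents of /proc/interrupts
to parse_interrupts() instead of SAMPLE.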
