List:       freebsd-smp
Subject:    SMP kernel hangs with latest MegaRAID firmware
From:       "Marc G. Fournier" <scrappy () hub ! org>
Date:       2002-11-25 5:26:37


Hi Eric ...

	First and foremost, I don't believe it's the CAM support directly
that is breaking things, as you may have seen in my other posts ... but I
do believe it's related to the MegaRAID controller.

	When we started this, the problem was that an Oct 28th kernel
would work, but an Oct 29th kernel would hang while booting ... Oct 29th
was when you updated the AMR driver code, and, as I recall, the changes
touched enough files that it wasn't just camifying the code ...

	Your suggestion was to upgrade the firmware on the controller
itself, which made sense, so we scheduled it ... now, while waiting for
that, I downgraded the server to RELENG_4_7, negating any work you did on
the AMR driver, figuring it would give me some stability while waiting for
the firmware upgrade ...

	On Friday, as scheduled, Rackspace upgraded the firmware on the
card, at which point, all hell broke loose ... the RELENG_4_7 kernel could
no longer boot up, they had to bring it up on a GENERIC kernel ...

	After futzing around for a period of time with the kernel configs
(namely, what was different between a GENERIC kernel and my kernel), we
determined that if we disable the SMP code, everything boots up great ...
as soon as we enable the two options required for SMP, it hangs ... so, I
added -v to /boot.config, figuring I should be able to get some
better information around the hang ... after scanning through the
output a few times, I finally stumbled upon something that should have
been more (or less) obvious:

IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
SMP: CPU0 apic_initialize():
     lint0: 0x00000700 lint1: 0x00010400 TPR: 0x00000010 SVR: 0x000001ff
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  4, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  5, version: 0x000f0011, at 0xfec01000
bios32: Found BIOS32 Service Directory header at 0xc00fdb90
bios32: Entry = 0xfdba0 (c00fdba0)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xdbc1
pnpbios: Found PnP BIOS data at 0xc00f4c50
pnpbios: Entry = f0000:3954  Rev = 1.0
Other BIOS signatures found:
ACPI: 00000000

	cpu1 is missing from that list, which is why it's hanging while
trying to start up CPU #1 ... so, went back to Rackspace to have them take
a look at the server, make sure that both CPUs are actually in the machine
... sure enough, they are, and the BIOS recognizes both ... but, just in
case, they swap'd both CPUs out ... mptable shows:

Processors:     APIC ID Version State           Family  Model   Step    Flags
                 0       0x11    BSP, usable     6       11      1       0x383fbff
                 1       0x11    AP, usable      6       11      1       0x383fbff

	So, the machine has two CPUs in it that worked under RELENG_4_7
*before* the firmware upgrade, but fails to work after the firmware
upgrade ... the operating system sees that there are, in fact, two CPUs in
the machine ...
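
	For anyone who wants to repeat that comparison on their own box,
here is a rough sketch in Python ... it assumes the verbose boot output
and the mptable output have been saved as text, and the function names
(cpus_in_boot_log, cpus_in_mptable) are made up for illustration, they
aren't part of any FreeBSD tool ... the sample text is just the lines
quoted above:

```python
import re

# Sample text copied from the boot output and mptable listing quoted above.
BOOT_LOG = """\
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  4, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  5, version: 0x000f0011, at 0xfec01000
"""

MPTABLE = """\
Processors:     APIC ID Version State           Family  Model   Step    Flags
                 0       0x11    BSP, usable     6       11      1       0x383fbff
                 1       0x11    AP, usable      6       11      1       0x383fbff
"""


def cpus_in_boot_log(text):
    """Return the APIC IDs of the cpuN entries announced at boot."""
    pattern = re.compile(
        r"^\s*cpu\d+ \((?:BSP|AP)\): apic id:\s*(\d+)", re.MULTILINE
    )
    return {int(m.group(1)) for m in pattern.finditer(text)}


def cpus_in_mptable(text):
    """Return the APIC IDs of processors that mptable marks as usable."""
    pattern = re.compile(
        r"^\s*(\d+)\s+0x[0-9a-fA-F]+\s+(?:BSP|AP), usable", re.MULTILINE
    )
    return {int(m.group(1)) for m in pattern.finditer(text)}


# CPUs the MP table advertises but the kernel never listed at boot.
missing = sorted(cpus_in_mptable(MPTABLE) - cpus_in_boot_log(BOOT_LOG))
print("In mptable but not listed at boot, APIC IDs:", missing)
```

	The regular expressions only match the two line formats shown in
this message, so output from other kernel versions may need adjusting.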

	So, we have two changes to the MegaRAID card/driver that have
succeeded in crippling SMP ... the motherboard is a Tyan LE-T with a 1.06
BIOS on it ... there are 7 18gig drives in a RAID5 configuration on the
MegaRAID card ... the server was originally set up with (and ran with) a
300W power supply, which has since been upgraded to 400W ...

	One person email'd me and suggested that they've seen similar with
an Adaptec RAID controller when a drive was bad, but as part of the
firmware upgrade, Rackspace ran a consistency check, which I would assume
would pick that up ...

	The key thing right now, to note, is that since the firmware
upgrade, neither a pre- nor a post-Oct 29th SMP kernel will work, while
both pre- and post-Oct 29th non-SMP kernels do ...

	Right now, I'm stump'd, so if anyone else has any ideas, I'm all
ears ...

Thanks ...

