
List:       freebsd-hardware
Subject:    Re: MCA error, possible causes?
From:       Ultima <ultima1252 () gmail ! com>
Date:       2016-02-24 20:51:01
Message-ID: CANJ8om4PWNtP2jK5=RE_9w5qhn6EJGASoSoft5ZHzKnHpso+GA () mail ! gmail ! com

 Hi John,

 Thanks for the explanation. I ran some tests, and the cause turned out to be
a power-saving mode (aka unstable mode?). Disabling this feature put an end to
the freezes. I came to this conclusion by stress testing the box for 3 days
with no issues at all. Then I stopped the stress test, and about 15-30 minutes
later it froze. The freezes seemed to occur only during periods of low load. I
have not received any of these errors since turning off this power-saving
mode.

On Wed, Feb 24, 2016 at 3:14 PM, John Baldwin <jhb@freebsd.org> wrote:

> On Friday, February 12, 2016 08:11:37 PM Ultima wrote:
> >  Recently installed some CPUs and received two MCA errors. Using mcelog, I
> > found that the version in ports is about 5 years out of date and doesn't
> > support my CPU. I decided to update it to the newest version (will post on
> > bugzilla shortly) to pull some more info. Going to post the original and
> > decoded mcelog output.
> >
> >
> > Raw:
> > MCA: Bank 20, Status 0xc800084000310e0f
> > MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> > MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 0
> > MCA: CPU 0 COR (33) OVER BUSLG ??? ERR Other
> > MCA: Misc 0x1df87b000d9eff
> > MCA: Bank 5, Status 0xc800008000310e0f
> > MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> > MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 42
> > MCA: CPU 34 COR (2) OVER BUSLG ??? ERR Other
> > MCA: Misc 0xdf87b008d9eff
> >
> > mcelog v131:
> > Hardware event. This is not a software error.
> > CPU 0 BANK 20
> > MISC 1df87b000d9eff
> > MCG status:
> > QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> > STATUS c800084000310e0f MCGSTATUS 0
> > MCGCAP 7000c16 APICID 0 SOCKETID 0
> > CPUID Vendor Intel Family 6 Model 63
> > Hardware event. This is not a software error.
> > CPU 34 BANK 5
> > MISC df87b008d9eff
> > MCG status:
> > QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> > STATUS c800008000310e0f MCGSTATUS 0
> > MCGCAP 7000c16 APICID 2a SOCKETID 0
> > CPUID Vendor Intel Family 6 Model 63
> >
> >  After receiving this error, the system was in a frozen state. Any ideas
> > what may cause this?
>
> Well, hardware causes it.  QPI is the interconnect bus between your
> CPUs and RAM.  "Rx detected CRC error" implies that a CPU detected a
> corrupted message on that bus, but when it requested a resend the
> resent message was ok.  Normally corrected errors shouldn't hang your
> machine, but perhaps your machine had another hardware error after this
> that broke it too badly to report and/or log the subsequent error.
>
> --
> John Baldwin
>
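
For anyone reading the raw "MCA: Bank N, Status 0x..." lines quoted above, here
is a minimal sketch of how the status word splits into fields. It assumes the
architectural IA32_MCi_STATUS layout described in the Intel SDM; the function
names are hypothetical, and the corrected-error count field is only meaningful
when bit 10 of the Global Cap value is set (it is in 0x7000c16). This is not
what mcelog or the kernel runs, just an illustration of the bit positions.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit IA32_MCi_STATUS value into its architectural fields."""
    return {
        "valid": bool((status >> 63) & 1),            # VAL
        "overflow": bool((status >> 62) & 1),         # OVER
        "uncorrected": bool((status >> 61) & 1),      # UC (0 means corrected)
        "enabled": bool((status >> 60) & 1),          # EN
        "misc_valid": bool((status >> 59) & 1),       # MISCV
        "addr_valid": bool((status >> 58) & 1),       # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


def is_bus_interconnect_error(mca_error_code: int) -> bool:
    """True if the code matches the pattern 000F 1PPT RRRR IILL."""
    return (mca_error_code & 0xE800) == 0x0800


if __name__ == "__main__":
    # The two status words from the raw log in this thread.
    for raw in (0xC800084000310E0F, 0xC800008000310E0F):
        fields = decode_mci_status(raw)
        print(hex(raw), fields["corrected_count"],
              hex(fields["mca_error_code"]),
              is_bus_interconnect_error(fields["mca_error_code"]))
```

Both values decode as valid, corrected (UC clear), overflowed bus/interconnect
errors with counts of 33 and 2, which matches the "COR (33) OVER BUSLG" and
"COR (2) OVER BUSLG" lines in the raw output.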
_______________________________________________
freebsd-hardware@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hardware
To unsubscribe, send any mail to "freebsd-hardware-unsubscribe@freebsd.org"