List: opensolaris-networking-discuss
Subject: Re: [networking-discuss] e1000g on snv_95 flakiness
From: Stephen Lau <stevel () opensolaris ! org>
Date: 2008-08-19 21:17:50
Message-ID: 48AB387E.7020607 () opensolaris ! org
Just to follow up: I happened to be on the machine's console when the
most recent network outage occurred. When it started working again, I
immediately saw a console message:
SUNW-MSG-ID: SUNOS-8000-1L, TYPE: Defect, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Aug 19 14:15:00 PDT 2008
PLATFORM: PDSMi, CSN: 0123456789, HOSTNAME: grommit
SOURCE: eft, REV: 1.16
EVENT-ID: ca6dfb00-511d-6729-b83a-ef594726fc65
DESC: The EFT Diagnosis Engine encountered telemetry for which it is
unable to produce a diagnosis. Refer to
http://sun.com/msg/SUNOS-8000-1L for more information.
AUTO-RESPONSE: Error reports from the component will be logged for
examination by Sun.
IMPACT: Automated diagnosis and response for these events will not occur.
REC-ACTION: Run pkgchk -n SUNWfmd to ensure that fault management
software is installed properly. Contact Sun for support.
This seems to match the ereport.io.device.stall and
ereport.io.service.restored events reported by fmdump.
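
To line these events up against the ping graphs, something like the
following could pair each stall with the next restored event. This is only
a rough sketch: it assumes fmdump -e prints one "timestamp class" pair per
line, as in the two lines quoted further down in this thread, and the names
parse_events, pair_stalls and the year argument are made up for
illustration (the fmdump -e timestamps carry no year).

```python
import re
from datetime import datetime

# Matches lines such as "Aug 18 12:24:57.2612 ereport.io.device.stall".
# Assumption: one "timestamp class" pair per line, as quoted in this thread.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s+(ereport\.\S+)$"
)


def parse_events(text, year=2008):
    """Return (datetime, class) tuples; the year is supplied by the caller."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip the header line and anything unrecognised
        stamp = " ".join(m.group(1).split())  # collapse runs of whitespace
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")
        events.append((when, m.group(2)))
    return events


def pair_stalls(events):
    """Pair each device.stall with the next service.restored event."""
    pairs = []
    pending = None
    for when, cls in events:
        if cls == "ereport.io.device.stall":
            pending = when
        elif cls == "ereport.io.service.restored" and pending is not None:
            gap = (when - pending).total_seconds()
            pairs.append((pending, when, gap))
            pending = None
    return pairs


# The two events quoted in this thread, with a header line to be skipped.
SAMPLE = """\
TIME                 CLASS
Aug 18 12:24:57.2612 ereport.io.device.stall
Aug 18 12:24:57.5985 ereport.io.service.restored
"""

if __name__ == "__main__":
    for stall, restored, gap in pair_stalls(parse_events(SAMPLE)):
        print(f"{stall:%b %d %H:%M:%S} stall, restored {gap:.4f}s later")
```

In practice the text would come from saving the output of fmdump -e to a
file and reading it in, rather than from the SAMPLE string.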
cheers,
steve
Stephen Lau wrote:
> Hi Ted,
> I'm quite sure I didn't observe any of these timeout/errors in
> snv_75. Looking at the network pings from our network monitor graphs,
> all the outages occur after I liveupgraded to snv_95. It's a fairly
> typical SAMP system... I'm not running anything specific to trigger the
> network outages. Some of the apps that are running are: Apache2, PHP 5,
> MySQL, Mailman, Dovecot, Postfix, etc. The systemboard is a Supermicro
> 5015M-MT+ with the dual Intel e1000g NICs.
>
> cheers,
> steve
>
> Ted You wrote:
>
>> Hi Stephen,
>>
>> From the ereports, it seems that the device hung and was reset by
>> the driver automatically. The e1000g driver has supported FMA since
>> snv_77, which is why we see the FMA ereports now. The problem might
>> have existed in snv_75.
>>
>> We have not seen this problem before. The recent bug fixes in snv_95
>> have nothing to do with this problem. I need to try to reproduce the
>> problem on our local systems. Could you please let me know what
>> applications or tests you have been running on your system?
>>
>> Thanks,
>> Ted
>>
>>
>> Stephen Lau wrote:
>>
>>> I'm seeing flaky outages on my snv_95 system (just recently LU'd from
>>> snv_75 where I didn't have any issues).
>>>
>>> I seem to get outages of a few minutes (5-6) throughout the day, with
>>> no rhyme or reason as to why.
>>>
>>> fmdump -e shows repeated occurrences of:
>>> Aug 18 12:24:57.2612 ereport.io.device.stall
>>> Aug 18 12:24:57.5985 ereport.io.service.restored
>>>
>>> /usr/X11/bin/scanpci shows my e1000g devices to be:
>>> pci bus 0x000d cardnum 0x00 function 0x00: vendor 0x8086 device 0x108c
>>> Intel Corporation 82573E Gigabit Ethernet Controller (Copper)
>>>
>>> pci bus 0x000e cardnum 0x00 function 0x00: vendor 0x8086 device 0x109a
>>> Intel Corporation 82573L Gigabit Ethernet Controller
>>>
>>>
>>> From my reading it looks like there were 1 or 2 e1000g issues that
>>> were purportedly fixed in snv_95, but I'm still seeing these
>>> problems. Anyone have any idea as to what might be the issue, and
>>> whether or not there is a workaround? I'm debating about pulling
>>> back the e1000g driver from my snv_75 LU slice and seeing if that
>>> works, but I'm hoping someone has a workaround for now. :)
>>>
>>> This message posted from opensolaris.org
>>> _______________________________________________
>>> networking-discuss mailing list
>>> networking-discuss@opensolaris.org
>>>
>
>
>
--
stephen lau | stevel@opensolaris.org | www.whacked.net