
List:       opensolaris-networking-discuss
Subject:    Re: [networking-discuss] e1000g on snv_95 flakiness
From:       Stephen Lau <stevel () opensolaris ! org>
Date:       2008-08-19 21:17:50
Message-ID: 48AB387E.7020607 () opensolaris ! org

Just to follow up: I happened to be on the machine's console when the
most recent network outage occurred. When it started working again, I
immediately saw a console message:

SUNW-MSG-ID: SUNOS-8000-1L, TYPE: Defect, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Aug 19 14:15:00 PDT 2008
PLATFORM: PDSMi, CSN: 0123456789, HOSTNAME: grommit
SOURCE: eft, REV: 1.16
EVENT-ID: ca6dfb00-511d-6729-b83a-ef594726fc65
DESC: The EFT Diagnosis Engine encountered telemetry for which it is 
unable to produce a diagnosis.  Refer to 
http://sun.com/msg/SUNOS-8000-1L for more information.
AUTO-RESPONSE: Error reports from the component will be logged for 
examination by Sun.
IMPACT: Automated diagnosis and response for these events will not occur.
REC-ACTION: Run pkgchk -n SUNWfmd to ensure that fault management 
software is installed properly. Contact Sun for support.

This seems to match up with the ereport.io.device.stall and
ereport.io.service.restored events reported by fmdump -e.
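For lining up outages against the ereports, here is a minimal sketch that pairs each stall ereport with the next restored ereport and reports how long apart they are. It assumes the `fmdump -e` line format quoted further down in this thread ("Aug 18 12:24:57.2612 ereport.io.device.stall"); since that output omits the year, the year is passed in as a parameter. The function names `parse_events` and `pair_stalls` are hypothetical helpers, not part of FMA or any Solaris tool.

```python
import re
from datetime import datetime

# Assumed line format, modeled on the fmdump -e output quoted in this thread:
#   "Aug 18 12:24:57.2612 ereport.io.device.stall"
# Lines that don't match (e.g. the TIME/CLASS header) are ignored.
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d+)\s+(\S+)\s*$")

STALL = "ereport.io.device.stall"
RESTORED = "ereport.io.service.restored"


def parse_events(lines, year=2008):
    """Yield (timestamp, ereport_class) for each line matching LINE_RE.

    fmdump -e does not print the year, so it is supplied by the caller.
    """
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip headers and unrelated output
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S.%f")
        yield stamp, m.group(2)


def pair_stalls(lines, year=2008):
    """Pair each stall ereport with the next restored ereport.

    Returns a list of (stall_time, restored_time, seconds_between); the last
    two fields are None for a stall with no matching restored ereport.
    """
    pairs = []
    pending = None  # time of the most recent unmatched stall
    for stamp, cls in parse_events(lines, year):
        if cls == STALL:
            if pending is not None:
                pairs.append((pending, None, None))  # earlier stall never restored
            pending = stamp
        elif cls == RESTORED and pending is not None:
            pairs.append((pending, stamp, (stamp - pending).total_seconds()))
            pending = None
    if pending is not None:
        pairs.append((pending, None, None))
    return pairs
```

Fed the two fmdump lines from the original report, this yields a single pair about 0.34 seconds apart; the pairs can then be compared against the outage times from the network monitor graphs.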

cheers,
steve

Stephen Lau wrote:
> Hi Ted,
>       I'm quite sure I didn't observe any of these timeouts/errors in
> snv_75.  Looking at the network pings from our network monitor graphs,
> all the outages occur after I Live Upgraded to snv_95.  It's a fairly
> typical SAMP system; I'm not running anything specific to trigger the
> network outages.  Some of the apps that are running are: Apache2, PHP 5,
> MySQL, Mailman, Dovecot, Postfix, etc.  The systemboard is a Supermicro
> 5015M-MT+ with the dual Intel e1000g NICs.
>
> cheers,
> steve
>
> Ted You wrote:
>    
>> Hi Stephen,
>>
>> From the ereports, it seems that the device hung and was reset
>> automatically by the driver. Starting with snv_77, the e1000g driver
>> supports FMA, which is why we now see the FMA ereports. The problem
>> might already have existed in snv_75.
>>
>> We have not seen this problem before. The recent bug fixes in snv_95
>> have nothing to do with this problem. I need to try to reproduce the
>> problem on our local systems. Could you please let me know what
>> applications or tests you have been running on your system?
>>
>> Thanks,
>> Ted
>>
>>
>> Stephen Lau :
>>      
>>> I'm seeing flaky outages on my snv_95 system (just recently LU'd from
>>> snv_75 where I didn't have any issues).
>>>
>>> I seem to get outages of a few minutes (5-6) throughout the day, with
>>> no rhyme or reason as to why.
>>>
>>> fmdump -e shows repeated occurrences of:
>>> Aug 18 12:24:57.2612 ereport.io.device.stall
>>> Aug 18 12:24:57.5985 ereport.io.service.restored
>>>
>>> /usr/X11/bin/scanpci shows my e1000g devices to be:
>>> pci bus 0x000d cardnum 0x00 function 0x00: vendor 0x8086 device 0x108c
>>>   Intel Corporation 82573E Gigabit Ethernet Controller (Copper)
>>>
>>> pci bus 0x000e cardnum 0x00 function 0x00: vendor 0x8086 device 0x109a
>>>   Intel Corporation 82573L Gigabit Ethernet Controller
>>>
>>>        
>>>> From my reading, it looks like there were one or two e1000g issues
>>>> that were purportedly fixed in snv_95, but I'm still seeing these
>>>> problems.  Does anyone have any idea what the issue might be, and
>>>> whether there is a workaround?  I'm debating pulling back the
>>>> e1000g driver from my snv_75 LU slice to see if that works, but
>>>> I'm hoping someone has a workaround for now. :)
>>>>          
>>> This message posted from opensolaris.org
>>> _______________________________________________
>>> networking-discuss mailing list
>>> networking-discuss@opensolaris.org
>>>        
>
>
>    


-- 
stephen lau | stevel@opensolaris.org | www.whacked.net
