
List:       opensolaris-driver-discuss
Subject:    Re: [driver-discuss] Help with a pci-e problem
From:       Ragnar Sundblad <ragge@csc.kth.se>
Date:       2010-04-08 9:09:35
Message-ID: 1BEF02BC-5B9D-49F4-A464-F97642F09EF1@csc.kth.se


Thanks Peng!

I believe it is zfs that tries to get/set the cache status in
this case.
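
For reference, the CDB in the uderr ereports quoted further down,
cdb = 0x1a 0x0 0x8 0x0 0x18 0x0, decodes as a plain MODE SENSE(6) asking for
the Caching mode page, which is where the write cache enable bit lives.
A minimal sketch in ordinary user-space C (not the sd or aac driver code)
that just prints those fields:

#include <stdio.h>
#include <stdint.h>

/* Decode the 6-byte MODE SENSE CDB seen in the uderr ereport below. */
int main(void)
{
    const uint8_t cdb[6] = { 0x1a, 0x00, 0x08, 0x00, 0x18, 0x00 };

    printf("opcode       = 0x%02x (MODE SENSE(6))\n", (unsigned)cdb[0]);
    printf("DBD          = %u\n", (unsigned)((cdb[1] >> 3) & 1));
    printf("page control = %u (0 = current values)\n",
        (unsigned)((cdb[2] >> 6) & 3));
    printf("page code    = 0x%02x (0x08 = Caching mode page)\n",
        (unsigned)(cdb[2] & 0x3f));
    printf("alloc length = %u bytes\n", (unsigned)cdb[4]);
    return 0;
}

If I read Peng's explanation below correctly, aac answers that request with
something other than page 0x08, which is presumably why sd logs the
"caching page code mismatch" message.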

I have filed the bugs: CR 6941996 (and also CR 6942004).

You don't happen to have any more information on the PCI bridge
error (ereport.io.pci.fabric)? After my tests, with two different
SUN-STK-INT cards in two different slots, I believe it is actually
related to the SUN-STK-INT card.
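
For what it's worth, I tried to make some sense of the raw values in the
fabric ereports quoted below: pcie_ue_status and pcie_ce_status are both
zero, and pci_status = 0x10 appears to have only the Capabilities List bit
set, so no error bits seem to be latched in the event itself. Here is a
minimal sketch (plain C, only my own guess at the decoding, nothing taken
from fabric-xlate) of the bit check I used:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Error-related bits of the legacy PCI status register, per the PCI spec. */
static const struct { uint16_t mask; const char *name; } pci_err_bits[] = {
    { 1u << 15, "Detected Parity Error" },
    { 1u << 14, "Signaled System Error" },
    { 1u << 13, "Received Master Abort" },
    { 1u << 12, "Received Target Abort" },
    { 1u << 11, "Signaled Target Abort" },
    { 1u << 8,  "Master Data Parity Error" },
};

int main(void)
{
    uint16_t pci_status = 0x10;   /* value from the ereport below */
    int set = 0;

    for (size_t i = 0; i < sizeof (pci_err_bits) / sizeof (pci_err_bits[0]); i++) {
        if (pci_status & pci_err_bits[i].mask) {
            printf("  %s\n", pci_err_bits[i].name);
            set++;
        }
    }
    if (set == 0)
        printf("no PCI error bits set (0x10 is just the Capabilities List bit)\n");
    return 0;
}

If that reading is correct, whatever the bridge is complaining about must
come from how the fabric module interprets the event rather than from
latched error bits, but I may well be missing something.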

/ragge

On 8 apr 2010, at 08.44, Peng Liu wrote:

> On 2010/4/8 0:07, Ragnar Sundblad wrote:
> > On 6 apr 2010, at 18.51, Ragnar Sundblad wrote:
> > 
> > 
> > > On 5 apr 2010, at 11.55, Ragnar Sundblad wrote:
> > > 
> > > 
> > > > On 5 apr 2010, at 06.41, pavan chandrashekar wrote:
> > > > 
> > > > 
> > > > > Ragnar Sundblad wrote:
> > > > > 
> > > > > > Hello,
> > > > > > I wonder if anyone could help me with a pci-e problem.
> > > > > > I have a X4150 running snv_134. It was shipped with a "STK RAID INT"
> > > > > > adaptec/intel/storagetek/sun SAS HBA. The machine also has a
> > > > > > LSI SAS card in another slot, though I don't know if that is
> > > > > > significant in any way.
> > > > > > 
> > > > > It might help troubleshooting.
> > > > > 
> > > > > You can try putting the disks behind the LSI SAS HBA and see if you
> > > > > still get errors. That will at least tell you whether the two errors
> > > > > are manifestations of the same problem or separate issues.
> > > > > 
> > > > > You might still have issues with the fabric. You can then take out the
> > > > > HBA that is throwing errors (STK RAID), put the LSI SAS HBA in the slot
> > > > > the STK RAID occupied earlier, and check the behaviour. Maybe this will
> > > > > point at the culprit. If the fabric errors continue with whatever card
> > > > > is in the currently suspect slot, it is more probable that the issue
> > > > > is with the fabric.
> > > > Thanks! The only problem right now, and for the last few days, is that
> > > > the machine is at my workplace, some 10 kilometers away, and we have the
> > > > Easter holiday right now. I was hoping to use those days off to have it
> > > > run tests all by itself, but have instead been chasing hidden Easter
> > > > eggs inside an Intel design.
> > > > 
> > > > I have now discovered that the ereport.io.pci.fabric events started when
> > > > I upgraded from snv_128 to 134; I totally missed that relation before.
> > > > There have been some changes in the PCI code around that time that may
> > > > or may not be related, for example:
> > > > <http://src.opensolaris.org/source/history/onnv/onnv-gate/usr/src/cmd/fm/modules/common/fabric-xlate/fabric-xlate.c>
> > > > Whether that means this is a driver glitch or a hardware problem that
> > > > has now become visible, and whether it can be ignored or not, is still
> > > > far beyond my knowledge.
> > > > 
> > > > But I will follow your advice and move the cards around and see what
> > > > happens!
> > > > 
> > > I have now swapped the cards. The problem seems to remain almost identical
> > > to before, but if I understand this correctly it is now on another PCI
> > > bridge (I suppose, judging from pci8086,25e2@2; maybe I should check the
> > > chipset documentation).
> > > 
> > > Can someone please tell me how I can decode the ereport information so
> > > that I can understand what the PCI bridge complains about?
> > > 
> > I have now also tried with another SUN_STK_INT controller (with
> > older firmware, as shipped from Sun), including the riser board from
> > another X4150, and it gets the same ereports.
> > 
> > I have tried removing the LSI board, and it still behaves the same.
> > 
> > Is there anyone else out there with a Sun X4xxx running snv_134 with
> > a SUN_STK_INT raid controller who sees, or doesn't see, this?
> > 
> > For the record, the ereport.io.pci.fabric events appear every
> > 4 minutes and 4 seconds, give or take half a second or so.
> > 
> > Thanks!
> > 
> > /ragge
> > 
> > 
> Hi Ragnar,
> 
> The fma message about "sd_get_write_cache_enabled: Mode Sense caching page
> code mismatch 0" appears because the aac driver does not support the MODE
> SENSE command with the Caching mode page. Some userland program wanted to
> know a disk's write-cache status via the sd driver, so sd requested the
> Caching mode page from aac. When that failed, sd reported it via fma, and
> that is what got logged. Please file an aac driver bug and I'll fix it.
> Thanks,
> Peng
> 
> > 
> > > Thanks!
> > > 
> > > /ragge
> > > 
> > > Apr 06 2010 18:40:34.965687100 ereport.io.pci.fabric
> > > nvlist version: 0
> > > class = ereport.io.pci.fabric
> > > ena = 0x28d9c49528201801
> > > detector = (embedded nvlist)
> > > nvlist version: 0
> > > version = 0x0
> > > scheme = dev
> > > device-path = /pci@0,0/pci8086,25e2@2
> > > (end detector)
> > > 
> > > bdf = 0x10
> > > device_id = 0x25e2
> > > vendor_id = 0x8086
> > > rev_id = 0xb1
> > > dev_type = 0x40
> > > pcie_off = 0x6c
> > > pcix_off = 0x0
> > > aer_off = 0x100
> > > ecc_ver = 0x0
> > > pci_status = 0x10
> > > pci_command = 0x147
> > > pci_bdg_sec_status = 0x0
> > > pci_bdg_ctrl = 0x3
> > > pcie_status = 0x0
> > > pcie_command = 0x2027
> > > pcie_dev_cap = 0xfc1
> > > pcie_adv_ctl = 0x0
> > > pcie_ue_status = 0x0
> > > pcie_ue_mask = 0x100000
> > > pcie_ue_sev = 0x62031
> > > pcie_ue_hdr0 = 0x0
> > > pcie_ue_hdr1 = 0x0
> > > pcie_ue_hdr2 = 0x0
> > > pcie_ue_hdr3 = 0x0
> > > pcie_ce_status = 0x0
> > > pcie_ce_mask = 0x0
> > > pcie_rp_status = 0x0
> > > pcie_rp_control = 0x7
> > > pcie_adv_rp_status = 0x0
> > > pcie_adv_rp_command = 0x7
> > > pcie_adv_rp_ce_src_id = 0x0
> > > pcie_adv_rp_ue_src_id = 0x0
> > > remainder = 0x0
> > > severity = 0x1
> > > __ttl = 0x1
> > > __tod = 0x4bbb6402 0x398f373c
> > > 
> > > 
> > > 
> > > 
> > > > /ragge
> > > > 
> > > > 
> > > > > Pavan
> > > > > 
> > > > > 
> > > > > > It logs some errors, as shown with "fmdump -e(V)".
> > > > > > It is most often a pci bridge error (I think), about five to ten
> > > > > > times an hour, and occasionally a problem with accessing a
> > > > > > mode page on the disks behind the STK raid controller for
> > > > > > enabling/disabling the disks' write caches, one error for each disk,
> > > > > > about every three hours. I don't believe the two have to be related.
> > > > > > I am especially interested in understanding the ereport.io.pci.fabric
> > > > > > report.
> > > > > > I haven't seen this problem on other more or less identical
> > > > > > machines running sol10.
> > > > > > Is this a known software problem, or do I have faulty hardware?
> > > > > > Thanks!
> > > > > > /ragge
> > > > > > --------------
> > > > > > % fmdump -e
> > > > > > ...
> > > > > > Apr 04 01:21:53.2244 ereport.io.pci.fabric
> > > > > > Apr 04 01:30:00.6999 ereport.io.pci.fabric
> > > > > > Apr 04 01:30:23.4647 ereport.io.scsi.cmd.disk.dev.uderr
> > > > > > Apr 04 01:30:23.4651 ereport.io.scsi.cmd.disk.dev.uderr
> > > > > > ...
> > > > > > % fmdump -eV
> > > > > > Apr 04 2010 01:21:53.224492765 ereport.io.pci.fabric
> > > > > > nvlist version: 0
> > > > > > class = ereport.io.pci.fabric
> > > > > > ena = 0xd6a00a43be800c01
> > > > > > detector = (embedded nvlist)
> > > > > > nvlist version: 0
> > > > > > version = 0x0
> > > > > > scheme = dev
> > > > > > device-path = /pci@0,0/pci8086,25f8@4
> > > > > > (end detector)
> > > > > > bdf = 0x20
> > > > > > device_id = 0x25f8
> > > > > > vendor_id = 0x8086
> > > > > > rev_id = 0xb1
> > > > > > dev_type = 0x40
> > > > > > pcie_off = 0x6c
> > > > > > pcix_off = 0x0
> > > > > > aer_off = 0x100
> > > > > > ecc_ver = 0x0
> > > > > > pci_status = 0x10
> > > > > > pci_command = 0x147
> > > > > > pci_bdg_sec_status = 0x0
> > > > > > pci_bdg_ctrl = 0x3
> > > > > > pcie_status = 0x0
> > > > > > pcie_command = 0x2027
> > > > > > pcie_dev_cap = 0xfc1
> > > > > > pcie_adv_ctl = 0x0
> > > > > > pcie_ue_status = 0x0
> > > > > > pcie_ue_mask = 0x100000
> > > > > > pcie_ue_sev = 0x62031
> > > > > > pcie_ue_hdr0 = 0x0
> > > > > > pcie_ue_hdr1 = 0x0
> > > > > > pcie_ue_hdr2 = 0x0
> > > > > > pcie_ue_hdr3 = 0x0
> > > > > > pcie_ce_status = 0x0
> > > > > > pcie_ce_mask = 0x0
> > > > > > pcie_rp_status = 0x0
> > > > > > pcie_rp_control = 0x7
> > > > > > pcie_adv_rp_status = 0x0
> > > > > > pcie_adv_rp_command = 0x7
> > > > > > pcie_adv_rp_ce_src_id = 0x0
> > > > > > pcie_adv_rp_ue_src_id = 0x0
> > > > > > remainder = 0x0
> > > > > > severity = 0x1
> > > > > > __ttl = 0x1
> > > > > > __tod = 0x4bb7cd91 0xd617cdd
> > > > > > ...
> > > > > > Apr 04 2010 01:30:23.464768275 ereport.io.scsi.cmd.disk.dev.uderr
> > > > > > nvlist version: 0
> > > > > > class = ereport.io.scsi.cmd.disk.dev.uderr
> > > > > > ena = 0xde0cd54f84201c01
> > > > > > detector = (embedded nvlist)
> > > > > > nvlist version: 0
> > > > > > version = 0x0
> > > > > > scheme = dev
> > > > > > device-path = /pci@0,0/pci8086,25f8@4/pci108e,286@0/disk@5,0
> > > > > > devid = id1,sd@TSun_____STK_RAID_INT____EA4B6F24
> > > > > > (end detector)
> > > > > > driver-assessment = fail
> > > > > > op-code = 0x1a
> > > > > > cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
> > > > > > pkt-reason = 0x0
> > > > > > pkt-state = 0x1f
> > > > > > pkt-stats = 0x0
> > > > > > stat-code = 0x0
> > > > > > un-decode-info = sd_get_write_cache_enabled: Mode Sense caching page code mismatch 0
> > > > > > un-decode-value =
> > > > > > __ttl = 0x1
> > > > > > __tod = 0x4bb7cf8f 0x1bb3cd13
> > > > > > ...
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 

_______________________________________________
driver-discuss mailing list
driver-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/driver-discuss

