
List:       zfs-discuss
Subject:    Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
From:       Ross Smith <myxiplx () hotmail ! com>
Date:       2008-07-31 12:28:37
Message-ID: BAY101-W37AF517D03CD17E4728FC8AE7C0 () phx ! gbl

[Attachment #2 (multipart/alternative)]


I'm not sure you're actually seeing the same problem there, Richard.  It seems that for you I/O stops when the device is removed, whereas for me I/O continues for some considerable time.  You are also able to get a result from "zpool status", whereas that completely hangs for me.

To illustrate the difference, this is what I saw today in snv_94, with a pool created from a single external USB hard drive.

1. As before, I started a copy of a directory using Solaris' file manager.  About 1/3 of the way through, I pulled the plug on the drive.
2. The file manager continued to copy a further 30MB+ of files across.  Checking the properties of the copy shows it contains 71.1MB of data and 19,160 files, despite my pulling the drive at around 8,000 files.

3. 8:24am  I ran "zpool status":
# zpool status rc-usb
  pool: rc-usb
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested

That is as far as it gets.  It never gives me any further information.  I left it for two hours, and it still had not displayed the status of the drive in the pool.  I also ran "zfs list", and that now hangs too, although I'm pretty sure that if you run "zfs list" before "zpool status" it works fine.

As you can see from /var/adm/messages, I am getting nothing at all from FMA:
Jul 31 08:16:46 unknown usba: [ID 912658 kern.info] USB 2.0 device (usbd49,7350) operating at hi speed (USB 2.x) on USB 2.0 root hub: storage@3, scsa2usb0 at bus address 2
Jul 31 08:16:46 unknown usba: [ID 349649 kern.info]  Maxtor   OneTouch         2HAP70DZ
Jul 31 08:16:46 unknown genunix: [ID 936769 kern.info] scsa2usb0 is /pci@0,0/pci15d9,a011@2,1/storage@3
Jul 31 08:16:46 unknown genunix: [ID 408114 kern.info] /pci@0,0/pci15d9,a011@2,1/storage@3 (scsa2usb0) online
Jul 31 08:16:46 unknown scsi: [ID 193665 kern.info] sd17 at scsa2usb0: target 0 lun 0
Jul 31 08:16:46 unknown genunix: [ID 936769 kern.info] sd17 is /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0
Jul 31 08:16:46 unknown genunix: [ID 340201 kern.warning] WARNING: Page83 data not standards compliant Maxtor   OneTouch         0125
Jul 31 08:16:46 unknown genunix: [ID 408114 kern.info] /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17) online
Jul 31 08:16:49 unknown pcplusmp: [ID 444295 kern.info] pcplusmp: ide (ata) instance #1 vector 0xf ioapic 0x4 intin 0xf is bound to cpu 3
Jul 31 08:16:49 unknown scsi: [ID 193665 kern.info] sd14 at marvell88sx1: target 7 lun 0
Jul 31 08:16:49 unknown genunix: [ID 936769 kern.info] sd14 is /pci@1,0/pci1022,7458@2/pci11ab,11ab@1/disk@7,0
Jul 31 08:16:49 unknown genunix: [ID 408114 kern.info] /pci@1,0/pci1022,7458@2/pci11ab,11ab@1/disk@7,0 (sd14) online
Jul 31 08:21:35 unknown usba: [ID 691482 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3 (scsa2usb0): Disconnected device was busy, please reconnect.
Jul 31 08:21:38 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:21:38 unknown  Command failed to complete...Device is gone
Jul 31 08:21:38 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:21:38 unknown  Command failed to complete...Device is gone
Jul 31 08:21:38 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:21:38 unknown  Command failed to complete...Device is gone
Jul 31 08:21:38 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:21:38 unknown  Command failed to complete...Device is gone
Jul 31 08:21:38 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:21:38 unknown  Command failed to complete...Device is gone
Jul 31 08:21:38 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:21:38 unknown  Command failed to complete...Device is gone
Jul 31 08:21:38 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:21:38 unknown  Command failed to complete...Device is gone
Jul 31 08:21:38 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:21:38 unknown  Command failed to complete...Device is gone
Jul 31 08:24:26 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:24:26 unknown  Command failed to complete...Device is gone
Jul 31 08:24:26 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:24:26 unknown  Command failed to complete...Device is gone
Jul 31 08:24:26 unknown scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3/disk@0,0 (sd17):
Jul 31 08:24:26 unknown  drive offline
Jul 31 08:27:43 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 08:39:43 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 08:44:50 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
Jul 31 08:44:58 unknown last message repeated 3 times
Jul 31 08:45:06 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
Jul 31 08:45:06 unknown last message repeated 1 time
Jul 31 08:51:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 09:03:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 09:13:51 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
Jul 31 09:14:09 unknown last message repeated 5 times
Jul 31 09:15:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 09:27:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 09:27:55 unknown pcplusmp: [ID 444295 kern.info] pcplusmp: ide (ata) instance #1 vector 0xf ioapic 0x4 intin 0xf is bound to cpu 3

cfgadm reports that the port is empty but still configured:
# cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
usb1/3                         unknown      empty        configured   unusable

4. 9:32am  I then tried writing more data to the pool, to see if I could trigger the I/O error you are seeing.  I tried making a second copy of the files on the USB drive in the Solaris file manager, but that attempt simply hung the copy dialog.  I'm still seeing nothing else that appears relevant in /var/adm/messages.

5. 10:08am  While checking free space, I found that although df works, "df -kh" hangs, apparently when it tries to query any ZFS pool:
# df
/                  (/dev/dsk/c1t0d0s0 ): 2504586 blocks   656867 files
/devices           (/devices          ):       0 blocks        0 files
/dev               (/dev              ):       0 blocks        0 files
/system/contract   (ctfs              ):       0 blocks 2147483609 files
/proc              (proc              ):       0 blocks    29902 files
/etc/mnttab        (mnttab            ):       0 blocks        0 files
/etc/svc/volatile  (swap              ): 9850928 blocks  1180374 files
/system/object     (objfs             ):       0 blocks 2147483409 files
/etc/dfs/sharetab  (sharefs           ):       0 blocks 2147483646 files
/lib/libc.so.1     (/usr/lib/libc/libc_hwcap2.so.1): 2504586 blocks   656867 files
/dev/fd            (fd                ):       0 blocks        0 files
/tmp               (swap              ): 9850928 blocks  1180374 files
/var/run           (swap              ): 9850928 blocks  1180374 files
/export/home       (/dev/dsk/c1t0d0s7 ):881398942 blocks 53621232 files
/rc-pool           (rc-pool           ):4344346098 blocks 4344346098 files
/rc-pool/admin     (rc-pool/admin     ):4344346098 blocks 4344346098 files
/rc-pool/ross-home (rc-pool/ross-home ):4344346098 blocks 4344346098 files
/rc-pool/vmware    (rc-pool/vmware    ):4344346098 blocks 4344346098 files
/rc-usb            (rc-usb            ):153725153 blocks 153725153 files
# df -kh
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c1t0d0s0      7.2G   6.0G   1.1G    85%    /
/devices                 0K     0K     0K     0%    /devices
/dev                     0K     0K     0K     0%    /dev
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   4.7G   1.1M   4.7G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap2.so.1
                       7.2G   6.0G   1.1G    85%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                   4.7G    48K   4.7G     1%    /tmp
swap                   4.7G    76K   4.7G     1%    /var/run
/dev/dsk/c1t0d0s7      425G   4.8G   416G     2%    /export/home

6. 10:35am  It's now been two hours, and neither "zpool status" nor "zfs list" has ever finished.  The file copy attempt has also been hung for over an hour (although that's not unexpected with 'wait' as the failmode).

Richard, you say ZFS is not silently failing, but for me it appears that it is.  I can't see any warnings from ZFS, and I can't get any status information.  I see no way to find out which files are going to be lost on this server.

Yes, I'm now aware that the pool has hung, since file operations are hanging.  However, had that been my first indication of a problem, I would have been left unable to find out either the cause or the files affected.  I don't believe I have any way to find out which operations completed without error but are not yet committed to disk.  I certainly don't get the status message you do saying permanent errors have been found in files.

I have now plugged the USB drive back in.  Solaris detected it OK, but ZFS is still hung.  The rest of /var/adm/messages is:
Jul 31 09:39:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 09:45:22 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
Jul 31 09:45:38 unknown last message repeated 5 times
Jul 31 09:51:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 10:03:44 unknown last message repeated 2 times
Jul 31 10:14:27 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
Jul 31 10:14:45 unknown last message repeated 5 times
Jul 31 10:15:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 10:27:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 10:36:25 unknown usba: [ID 691482 kern.warning] WARNING: /pci@0,0/pci15d9,a011@2,1/storage@3 (scsa2usb0): Reinserted device is accessible again.
Jul 31 10:39:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet
Jul 31 10:45:53 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored
Jul 31 10:46:09 unknown last message repeated 5 times
Jul 31 10:51:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet

7. 10:55am  Gave up on ZFS ever recovering.  A shutdown attempt hung, as expected, so I hard-reset the computer.

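A check for the pattern described above (disk-driver errors in /var/adm/messages with no FMA diagnosis) can be sketched in Python. This is a rough heuristic, not an FMA tool: the helper names are hypothetical, and the matched strings are copied from the two logs in this thread ("Command failed to complete...Device is gone", "drive offline", and the fmd "SUNW-MSG-ID:" line in Richard's log), so other Solaris builds may word these messages differently.

```python
import re
from dataclasses import dataclass
from typing import Iterable

# Message text copied from the /var/adm/messages excerpts in this thread.
# Other Solaris builds may word these messages differently.
DEVICE_GONE = re.compile(r"Command failed to complete\.\.\.Device is gone")
DRIVE_OFFLINE = re.compile(r"\bdrive offline\b")
FMD_DIAGNOSIS = re.compile(r"\bfmd: .*SUNW-MSG-ID:")


@dataclass
class LogSummary:
    device_gone: int = 0     # sd "Device is gone" warnings
    drive_offline: int = 0   # sd "drive offline" warnings
    fmd_diagnoses: int = 0   # fmd fault messages carrying a SUNW-MSG-ID

    @property
    def silent_failure(self) -> bool:
        """True if the disk driver reported errors but fmd logged no diagnosis."""
        driver_errors = self.device_gone + self.drive_offline
        return driver_errors > 0 and self.fmd_diagnoses == 0


def summarize(lines: Iterable[str]) -> LogSummary:
    """Count driver errors and fmd diagnoses in an iterable of syslog lines."""
    summary = LogSummary()
    for line in lines:
        if DEVICE_GONE.search(line):
            summary.device_gone += 1
        elif DRIVE_OFFLINE.search(line):
            summary.drive_offline += 1
        elif FMD_DIAGNOSIS.search(line):
            summary.fmd_diagnoses += 1
    return summary
```

For example, `summarize(open("/var/adm/messages", errors="replace"))` over the excerpt above would count the sd17 errors, find no fmd lines, and so report `silent_failure` as true; run over Richard's log it would find his ZFS-8000-FD diagnosis instead.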
Ross
 
 
> Date: Wed, 30 Jul 2008 11:17:08 -0700
> From: Richard.Elling@Sun.COM
> Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
> To: myxiplx@hotmail.com
> CC: zfs-discuss@opensolaris.org
>
> I was able to reproduce this in b93, but might have a different
> interpretation of the conditions. More below...
>
> Ross Smith wrote:
> > A little more information today. I had a feeling that ZFS would
> > continue quite some time before giving an error, and today I've shown
> > that you can carry on working with the filesystem for at least half an
> > hour with the disk removed.
> >
> > I suspect on a system with little load you could carry on working for
> > several hours without any indication that there is a problem. It
> > looks to me like ZFS is caching reads & writes, and that provided
> > requests can be fulfilled from the cache, it doesn't care whether the
> > disk is present or not.
>
> In my USB-flash-disk-sudden-removal-while-writing-big-file-test,
> 1. I/O to the missing device stopped (as I expected)
> 2. FMA kicked in, as expected.
> 3. /var/adm/messages recorded "Command failed to complete... device gone."
> 4. After exactly 9 minutes, 17,951 e-reports had been processed and the
> diagnosis was complete. FMA logged the following to /var/adm/messages
>
> Jul 30 10:33:44 grond scsi: [ID 107833 kern.warning] WARNING:
> /pci@0,0/pci1458,5004@b,1/storage@8/disk@0,0 (sd1):
> Jul 30 10:33:44 grond Command failed to complete...Device is gone
> Jul 30 10:42:31 grond fmd: [ID 441519 daemon.error] SUNW-MSG-ID:
> ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
> Jul 30 10:42:31 grond EVENT-TIME: Wed Jul 30 10:42:30 PDT 2008
> Jul 30 10:42:31 grond PLATFORM: , CSN: , HOSTNAME: grond
> Jul 30 10:42:31 grond SOURCE: zfs-diagnosis, REV: 1.0
> Jul 30 10:42:31 grond EVENT-ID: d99769aa-28e8-cf16-d181-945592130525
> Jul 30 10:42:31 grond DESC: The number of I/O errors associated with a
> ZFS device exceeded
> Jul 30 10:42:31 grond acceptable levels. Refer to
> http://sun.com/msg/ZFS-8000-FD for more information.
> Jul 30 10:42:31 grond AUTO-RESPONSE: The device has been offlined and
> marked as faulted. An attempt
> Jul 30 10:42:31 grond will be made to activate a hot spare if
> available.
> Jul 30 10:42:31 grond IMPACT: Fault tolerance of the pool may be
> compromised.
> Jul 30 10:42:31 grond REC-ACTION: Run 'zpool status -x' and replace
> the bad device.
>
> The above URL shows what you expect, but more (and better) info
> is available from zpool status -xv
>
>   pool: rmtestpool
>  state: UNAVAIL
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run 'zpool
>         clear'.
>    see: http://www.sun.com/msg/ZFS-8000-HC
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         rmtestpool  UNAVAIL      0 15.7K     0  insufficient replicas
>           c2t0d0p0  FAULTED      0 15.7K     0  experienced I/O failures
>
> errors: Permanent errors have been detected in the following files:
>
>         /rmtestpool/random.data
>
>
> If you surf to http://www.sun.com/msg/ZFS-8000-HC you'll
> see words to the effect that,
> The pool has experienced I/O failures. Since the ZFS pool property
> 'failmode' is set to 'wait', all I/Os (reads and writes) are
> blocked. See the zpool(1M) manpage for more information on the
> 'failmode' property. Manual intervention is required for I/Os to
> be serviced.
>
> >
> > I would guess that ZFS is attempting to write to the disk in the
> > background, and that this is silently failing.
>
> It is clearly not silently failing.
>
> However, the default failmode property is set to "wait" which will patiently
> wait forever. If you would rather have the I/O fail, then you should change
> the failmode to "continue". I would not normally recommend a failmode of
> "panic".
>
> Now to figure out how to recover gracefully... zpool clear isn't happy...
>
> [sidebar]
> while performing this experiment, I noticed that fmd was checkpointing
> the diagnosis engine to disk in the /var/fm/fmd/ckpt/zfs-diagnosis
> directory.
> If this had been the boot disk, with failmode=wait, I'm not convinced
> that we'd get a complete diagnosis... I'll explore that later.
> [/sidebar]
>
> -- richard
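Richard's suggestion of changing the failmode property can be scripted, and so can a guard against the hung `zpool` commands described in this thread. Below is a minimal Python sketch, assuming a Python 3 interpreter on the host; the helper names are hypothetical, and the accepted failmode values (wait, continue, panic) are taken from the zpool(1M) description referenced above. It builds the `zpool get`/`zpool set` commands and runs a command under a timeout, so a wedged `zpool` shows up as "no answer" instead of blocking the caller.

```python
import subprocess
from typing import List, Optional, Sequence

# Values of the pool 'failmode' property as described in zpool(1M).
FAILMODES = ("wait", "continue", "panic")


def failmode_argv(pool: str, mode: Optional[str] = None) -> List[str]:
    """Build 'zpool get failmode POOL', or 'zpool set failmode=MODE POOL'."""
    if mode is None:
        return ["zpool", "get", "failmode", pool]
    if mode not in FAILMODES:
        raise ValueError(f"unknown failmode {mode!r}; expected one of {FAILMODES}")
    return ["zpool", "set", f"failmode={mode}", pool]


def run_with_timeout(argv: Sequence[str], timeout: float) -> Optional[str]:
    """Return the command's stdout, or None if it did not finish within timeout.

    On timeout the child is sent SIGKILL but is deliberately not waited for:
    a process blocked inside the kernel (possibly what happened to the hung
    zpool commands in this thread) may not exit until that call returns,
    and waiting for it would hang the caller as well.
    """
    proc = subprocess.Popen(
        list(argv), stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return out
    except subprocess.TimeoutExpired:
        proc.kill()  # best effort; no wait() afterwards, see docstring
        if proc.stdout is not None:
            proc.stdout.close()
        return None
```

For example, `run_with_timeout(failmode_argv("rc-usb"), 30)` returning None would hint that the pool is already wedged, while `failmode_argv("rc-usb", "continue")` gives the command for the setting Richard suggests. Whether failmode=continue would have avoided the `zpool status` hang described above is not established in this thread.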


[Attachment #5 (text/html)]

<html>
<head>
<style>
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
FONT-SIZE: 10pt;
FONT-FAMILY:Tahoma
}
</style>
</head>
<body class='hmmessage'>
5. 10:08am&nbsp;While checking free space, I found that although df works, "df -kh" \
hangs, apparently when it tries to query any zfs pool:<BR> <FONT face="Courier New, \
Courier, Monospace" size=1># \
df<BR>/&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
(/dev/dsk/c1t0d0s0 ): 2504586 blocks&nbsp;&nbsp; 656867 \
files<BR>/devices&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
(/devices&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 \
blocks&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 \
files<BR>/dev&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
(/dev&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 \
blocks&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 \
files<BR>/system/contract&nbsp;&nbsp; \
(ctfs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 blocks 2147483609 \
files<BR>/proc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
(proc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 blocks&nbsp;&nbsp;&nbsp; 29902 \
files<BR>/etc/mnttab&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
(mnttab&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
):&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 \
blocks&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 files<BR>/etc/svc/volatile&nbsp; \
(swap              ): 9850928 blocks  1180374 files<BR>
/system/object     (objfs             ):       0 blocks 2147483409 files<BR>
/etc/dfs/sharetab  (sharefs           ):       0 blocks 2147483646 files<BR>
/lib/libc.so.1     (/usr/lib/libc/libc_hwcap2.so.1): 2504586 blocks   656867 files<BR>
/dev/fd            (fd                ):       0 blocks        0 files<BR>
/tmp               (swap              ): 9850928 blocks  1180374 files<BR>
/var/run           (swap              ): 9850928 blocks  1180374 files<BR>
/export/home       (/dev/dsk/c1t0d0s7 ):881398942 blocks 53621232 files<BR>
/rc-pool           (rc-pool           ):4344346098 blocks 4344346098 files<BR>
/rc-pool/admin     (rc-pool/admin     ):4344346098 blocks 4344346098 files<BR>
/rc-pool/ross-home (rc-pool/ross-home ):4344346098 blocks 4344346098 files<BR>
/rc-pool/vmware    (rc-pool/vmware    ):4344346098 blocks 4344346098 files<BR>
/rc-usb            (rc-usb            ):153725153 blocks 153725153 files<BR>
# df -kh<BR>
Filesystem             size   used  avail capacity  Mounted on<BR>
/dev/dsk/c1t0d0s0      7.2G   6.0G   1.1G    85%    /<BR>
/devices                 0K     0K     0K     0%    /devices<BR>
/dev                     0K     0K     0K     0%    /dev<BR>
ctfs                     0K     0K     0K     0%    /system/contract<BR>
proc                     0K     0K     0K     0%    /proc<BR>
mnttab                   0K     0K     0K     0%    /etc/mnttab<BR>
swap                   4.7G   1.1M   4.7G     1%    /etc/svc/volatile<BR>
objfs                    0K     0K     0K     0%    /system/object<BR>
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab<BR>
/usr/lib/libc/libc_hwcap2.so.1<BR>
                       7.2G   6.0G   1.1G    85%    /lib/libc.so.1<BR>
fd                       0K     0K     0K     0%    /dev/fd<BR>
swap                   4.7G    48K   4.7G     1%    /tmp<BR>
swap                   4.7G    76K   4.7G     1%    /var/run<BR>
/dev/dsk/c1t0d0s7      425G   4.8G   416G     2%    /export/home</FONT><BR> &nbsp;<BR>
6. 10:35am  It's now been two hours, and neither "zpool status" nor "zfs list" has finished. The file copy has also been hung for over an hour (although that's not unexpected with 'wait' as the failmode).<BR> &nbsp;<BR>
Richard, you say ZFS is not silently failing, but for me it appears that it is. I can't see any warnings from ZFS, and I can't get any status information. I see no way to find out which files are going to be lost on this server.<BR> &nbsp;<BR>
Yes, I'm now aware that the pool has hung, since file operations are hanging. However, had that been my first indication of a problem, I would now be in a position where I could find out neither the cause nor the files affected. I don't believe I have any way to find out which operations completed without error but have not yet been committed to disk. I certainly don't get the status message you do saying that permanent errors have been found in files.<BR> &nbsp;<BR>
I have now plugged the USB drive back in. Solaris detected it OK, but ZFS is still hung. The rest of /var/adm/messages is:<BR> <FONT face="Courier New, Courier, Monospace" size=1>
Jul 31 09:39:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet<BR>
Jul 31 09:45:22 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored<BR>
Jul 31 09:45:38 unknown last message repeated 5 times<BR>
Jul 31 09:51:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet<BR>
Jul 31 10:03:44 unknown last message repeated 2 times<BR>
Jul 31 10:14:27 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored<BR>
Jul 31 10:14:45 unknown last message repeated 5 times<BR>
Jul 31 10:15:44 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet<BR>
Jul 31 10:27:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet</FONT><BR> <FONT face="Courier New, Courier, Monospace" size=1>
Jul 31 10:36:25 unknown usba: [ID 691482 kern.warning] WARNING: </FONT><A><FONT face="Courier New, Courier, Monospace" size=1>/pci@0,0/pci15d9,a011@2,1/storage@3</FONT></A><FONT face="Courier New, Courier, Monospace" size=1> (scsa2usb0): Reinserted device is accessible again.<BR>
Jul 31 10:39:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet<BR>
Jul 31 10:45:53 unknown /sbin/dhcpagent[95]: [ID 732317 daemon.warning] accept_v4_acknak: ACK packet on nge0 missing mandatory lease option, ignored<BR>
Jul 31 10:46:09 unknown last message repeated 5 times<BR>
Jul 31 10:51:45 unknown smbd[603]: [ID 766186 daemon.error] NbtDatagramDecode[11]: too small packet</FONT><BR> &nbsp;<BR>
7. 10:55am  Gave up on ZFS ever recovering. A shutdown attempt hung as expected, so I hard-reset the computer.<BR> &nbsp;<BR>
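Failmode=wait is documented to produce exactly the kind of hung file copy seen in step 6, and Richard's reply discusses that property. The following is a rough illustration only: a toy Python model of the three settings. FailMode, outcome() and the operation names are hypothetical and are not ZFS code. The behaviour encoded is an assumption that paraphrases the zpool(1M) description of failmode:

- wait blocks all I/O until the device is reconnected and the errors are cleared.
- continue returns EIO to new writes but still allows reads from remaining healthy devices, while writes not yet committed to disk stay blocked.
- panic prints a console message and generates a crash dump.

```python
"""Toy model of the ZFS pool 'failmode' property (illustration only, not ZFS code).

The behaviours paraphrase the zpool(1M) description of failmode; the names
FailMode, outcome() and the operation strings are hypothetical.
"""
from enum import Enum


class FailMode(Enum):
    WAIT = "wait"          # default: block all I/O until the device returns and errors are cleared
    CONTINUE = "continue"  # EIO for new writes; reads from remaining healthy devices still allowed
    PANIC = "panic"        # print a console message and generate a crash dump


# 'pending_write' stands for a write that was accepted but not yet committed to disk.
OPERATIONS = ("read", "new_write", "pending_write")


def outcome(mode: FailMode, op: str, healthy_copy: bool = False) -> str:
    """Return what happens to one I/O request after a catastrophic pool failure.

    healthy_copy: True if a read can be served from a device that is still healthy.
    """
    if op not in OPERATIONS:
        raise ValueError(f"unknown operation: {op!r}")
    if mode is FailMode.PANIC:
        return "panic"
    if mode is FailMode.WAIT:
        return "blocked"
    # From here on the mode is FailMode.CONTINUE.
    if op == "new_write":
        return "EIO"
    if op == "pending_write":
        return "blocked"
    # Assumption: a read with no healthy device left to serve it fails with EIO.
    return "served" if healthy_copy else "EIO"


if __name__ == "__main__":
    # A single-disk pool has no healthy device once its only disk is pulled.
    for mode in FailMode:
        results = {op: outcome(mode, op) for op in OPERATIONS}
        print(mode.value, results)
```

Under this model, a single-disk pool like rc-usb has nothing left to serve reads once the drive is pulled. With the default wait, every request blocks. Switching to continue (for example with `zpool set failmode=continue rc-usb`) would turn new writes into EIO errors rather than hangs, but it would not make the missing data readable.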
Ross<BR>
&nbsp;<BR>
&nbsp;<BR>
<BR><BR>&gt; Date: Wed, 30 Jul 2008 11:17:08 -0700<BR>
&gt; From: Richard.Elling@Sun.COM<BR>
&gt; Subject: Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed<BR>
&gt; To: myxiplx@hotmail.com<BR>
&gt; CC: zfs-discuss@opensolaris.org<BR>
&gt; <BR>
&gt; I was able to reproduce this in b93, but might have a different<BR>
&gt; interpretation of the conditions. More below...<BR>
&gt; <BR>
&gt; Ross Smith wrote:<BR>
&gt; &gt; A little more information today. I had a feeling that ZFS would <BR>
&gt; &gt; continue quite some time before giving an error, and today I've shown <BR>
&gt; &gt; that you can carry on working with the filesystem for at least half an <BR>
&gt; &gt; hour with the disk removed.<BR>
&gt; &gt; <BR>
&gt; &gt; I suspect on a system with little load you could carry on working for <BR>
&gt; &gt; several hours without any indication that there is a problem. It <BR>
&gt; &gt; looks to me like ZFS is caching reads &amp; writes, and that provided <BR>
&gt; &gt; requests can be fulfilled from the cache, it doesn't care whether the <BR>
&gt; &gt; disk is present or not.<BR>
&gt; <BR>
&gt; In my USB-flash-disk-sudden-removal-while-writing-big-file-test,<BR>
&gt; 1. I/O to the missing device stopped (as I expected)<BR>
&gt; 2. FMA kicked in, as expected.<BR>
&gt; 3. /var/adm/messages recorded "Command failed to complete... device gone."<BR>
&gt; 4. After exactly 9 minutes, 17,951 e-reports had been processed and the<BR>
&gt; diagnosis was complete. FMA logged the following to /var/adm/messages<BR>
&gt; <BR>
&gt; Jul 30 10:33:44 grond scsi: [ID 107833 kern.warning] WARNING: <BR>
&gt; /pci@0,0/pci1458,5004@b,1/storage@8/disk@0,0 (sd1):<BR>
&gt; Jul 30 10:33:44 grond Command failed to complete...Device is gone<BR>
&gt; Jul 30 10:42:31 grond fmd: [ID 441519 daemon.error] SUNW-MSG-ID: <BR>
&gt; ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major<BR>
&gt; Jul 30 10:42:31 grond EVENT-TIME: Wed Jul 30 10:42:30 PDT 2008<BR>
&gt; Jul 30 10:42:31 grond PLATFORM: , CSN: , HOSTNAME: grond<BR>
&gt; Jul 30 10:42:31 grond SOURCE: zfs-diagnosis, REV: 1.0<BR>
&gt; Jul 30 10:42:31 grond EVENT-ID: d99769aa-28e8-cf16-d181-945592130525<BR>
&gt; Jul 30 10:42:31 grond DESC: The number of I/O errors associated with a <BR>
&gt; ZFS device exceeded<BR>
&gt; Jul 30 10:42:31 grond acceptable levels. Refer to <BR>
&gt; http://sun.com/msg/ZFS-8000-FD for more information.<BR>
&gt; Jul 30 10:42:31 grond AUTO-RESPONSE: The device has been offlined and <BR>
&gt; marked as faulted. An attempt<BR>
&gt; Jul 30 10:42:31 grond will be made to activate a hot spare if <BR>
&gt; available.<BR>
&gt; Jul 30 10:42:31 grond IMPACT: Fault tolerance of the pool may be <BR>
&gt; compromised.<BR>
&gt; Jul 30 10:42:31 grond REC-ACTION: Run 'zpool status -x' and replace <BR>
&gt; the bad device.<BR>
&gt; <BR>
&gt; The above URL shows what you expect, but more (and better) info<BR>
&gt; is available from zpool status -xv<BR>
&gt; <BR>
&gt;   pool: rmtestpool<BR>
&gt;  state: UNAVAIL<BR>
&gt; status: One or more devices are faultd in response to IO failures.<BR>
&gt; action: Make sure the affected devices are connected, then run 'zpool <BR>
&gt; clear'.<BR>
&gt;    see: http://www.sun.com/msg/ZFS-8000-HC<BR>
&gt;  scrub: none requested<BR>
&gt; config:<BR>
&gt; <BR>
&gt;         NAME        STATE     READ WRITE CKSUM<BR>
&gt;         rmtestpool  UNAVAIL      0 15.7K     0  insufficient replicas<BR>
&gt;           c2t0d0p0  FAULTED      0 15.7K     0  experienced I/O failures<BR>
&gt; <BR>
&gt; errors: Permanent errors have been detected in the following files:<BR>
&gt; <BR>
&gt;         /rmtestpool/random.data<BR>
&gt; <BR>
&gt; <BR>
&gt; If you surf to http://www.sun.com/msg/ZFS-8000-HC you'll<BR>
&gt; see words to the effect that,<BR>
&gt; The pool has experienced I/O failures. Since the ZFS pool property<BR>
&gt; 'failmode' is set to 'wait', all I/Os (reads and writes) are<BR>
&gt; blocked. See the zpool(1M) manpage for more information on the<BR>
&gt; 'failmode' property. Manual intervention is required for I/Os to<BR>
&gt; be serviced.<BR>
&gt; <BR>
&gt; &gt; <BR>
&gt; &gt; I would guess that ZFS is attempting to write to the disk in the <BR>
&gt; &gt; background, and that this is silently failing.<BR>
&gt; <BR>
&gt; It is clearly not silently failing.<BR>
&gt; <BR>
&gt; However, the default failmode property is set to "wait" which will patiently<BR>
&gt; wait forever. If you would rather have the I/O fail, then you should change<BR>
&gt; the failmode to "continue" I would not normally recommend a failmode of<BR>
&gt; "panic"<BR>
&gt; <BR>
&gt; Now to figure out how to recover gracefully... zpool clear isn't happy...<BR>
&gt; <BR>
&gt; [sidebar]<BR>
&gt; while performing this experiment, I noticed that fmd was checkpointing<BR>
&gt; the diagnosis engine to disk in the /var/fm/fmd/ckpt/zfs-diagnosis <BR>
&gt; directory.<BR>
&gt; If this had been the boot disk, with failmode=wait, I'm not convinced<BR>
&gt; that we'd get a complete diagnosis... I'll explore that later.<BR>
&gt; [/sidebar]<BR>
&gt; <BR>
&gt; -- 
richard<BR>&gt; <BR><BR></body> </html>



_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

