
List:       freebsd-scsi
Subject:    Re: I/O to pool appears to be hung, panic !
From:       Ben RUBSON <ben.rubson@gmail.com>
Date:       2017-07-03 15:40:46
Message-ID: <D9AE9CA6-D05C-4909-A56E-1CE6D149E71D@gmail.com>

> On 03 Jul 2017, at 17:27, Edward Napierala <trasz@freebsd.org> wrote:
> 
> 2017-07-03 14:36 GMT+01:00 Ben RUBSON <ben.rubson@gmail.com>:
> > On 03 Jul 2017, at 13:10, Edward Napierala <trasz@freebsd.org> wrote:
> > 2017-07-03 10:07 GMT+01:00 Ben RUBSON <ben.rubson@gmail.com>:
> > > On 29 Jun 2017, at 15:36, Ben RUBSON <ben.rubson@gmail.com> wrote:
> > > > On 29 Jun 2017, at 14:43, Fabian Keil <freebsd-listen@fabiankeil.de> wrote:
> > > 
> > > Thank you for your feedback Fabian.
> > > 
> > > > Ben RUBSON <ben.rubson@gmail.com> wrote:
> > > > > One of my servers had a kernel panic last night, giving the following message:
> > > > > panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
> > > > [...]
> > > > > Here are some numbers regarding this disk, taken from the server hosting the pool
> > > > > (unfortunately not from the iSCSI target server):
> > > > > https://s23.postimg.org/zd8jy9xaj/busydisk.png
> > > > > We can clearly see that the disk suddenly became 100% busy while the CPU was almost idle.
> > 
> > We can also clearly see that 5 minutes later (02:09) the disk seemed to come back,
> > but became 100% busy again, and that 16 minutes later (the default
> > vfs.zfs.deadman_synctime_ms), the panic occurred.
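> > (The default vfs.zfs.deadman_synctime_ms is 1000000 ms, i.e. roughly 16.7 minutes,
> > which matches the delay visible on the graph.)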
> > > > > No error message at all on both servers.
> > > > [...]
> > > > > The only log I have is the following stacktrace, taken from the server console:
> > > > > panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
> > > > > cpuid = 0
> > > > > KDB: stack backtrace:
> > > > > #0 0xffffffff80b240f7 at kdb_backtrace+0x67
> > > > > #1 0xffffffff80ad9462 at vpanic+0x182
> > > > > #2 0xffffffff80ad92d3 at panic+0x43
> > > > > #3 0xffffffff82238fa7 at vdev_deadman+0x127
> > > > > #4 0xffffffff82238ec0 at vdev_deadman+0x40
> > > > > #5 0xffffffff82238ec0 at vdev_deadman+0x40
> > > > > #6 0xffffffff8222d0a6 at spa_deadman+0x86
> > > > > #7 0xffffffff80af32da at softclock_call_cc+0x18a
> > > > > #8 0xffffffff80af3854 at softclock+0x94
> > > > > #9 0xffffffff80a9348f at intr_event_execute_handlers+0x20f
> > > > > #10 0xffffffff80a936f6 at ithread_loop+0xc6
> > > > > #11 0xffffffff80a900d5 at fork_exit+0x85
> > > > > #12 0xffffffff80f846fe at fork_trampoline+0xe
> > > > > Uptime: 92d2h47m6s
> > > > > 
> > > > > I would have been pleased to make a dump available.
> > > > > However, despite my (correct?) configuration, the server did not dump:
> > > > > (nevertheless, "sysctl debug.kdb.panic=1" does make it dump)
> > > > > # grep ^dump /boot/loader.conf /etc/rc.conf
> > > > > /boot/loader.conf:dumpdev="/dev/mirror/swap"
> > > > > /etc/rc.conf:dumpdev="AUTO"
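> > > > > (For reference, the dump device actually in use should also be visible at
> > > > > runtime via "sysctl kern.shutdown.dumpdevname", if I am not mistaken.)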
> > > > 
> > > > You may want to look at the NOTES section in gmirror(8).
> > > 
> > > Yes, I should already be OK (prefer algorithm set).
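> > > (That is, with the swap mirror from my configuration above:
> > > # gmirror configure -b prefer swap
> > > so the dump is written to, and read back from, a single component,
> > > as the gmirror(8) NOTES suggest.)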
> > > 
> > > > > I use the default kernel, with a rebuilt ZFS module:
> > > > > # uname -v
> > > > > FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
> > > > > I use the following iSCSI configuration, which disconnects the disks "as soon as" they are unavailable:
> > > > > kern.iscsi.ping_timeout=5
> > > > > kern.iscsi.fail_on_disconnection=1
> > > > > kern.iscsi.iscsid_timeout=5
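> > > > > (These are runtime sysctls; I set them in /etc/sysctl.conf so they persist
> > > > > across reboots.)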
> > > > > 
> > > > > I therefore think the disk was at least correctly reachable during these 20 busy minutes.
> > > > > So, any idea why I could have faced this issue?
> > > > 
> > > > Is it possible that the system was under memory pressure?
> > > 
> > > No, I don't think it was:
> > > https://s1.postimg.org/uvsebpyyn/busydisk2.png
> > > More than 2GB of available memory. Swap not used (624kB).
> > > ARC behaviour seems correct (anon increases because ZFS can't actually write, I think).
> > > Regarding the pool itself, it was receiving data at 6MB/s, sending around 30kB blocks to disks.
> > > When the disk went busy, throughput fell to a few kB/s, with 128kB blocks.
> > > > geli's use of malloc() is known to cause deadlocks under memory pressure:
> > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759
> > > > Given that gmirror uses malloc() as well it probably has the same issue.
> > > 
> > > I don't use geli, so I should not face this issue.
> > > 
> > > > > I would have thought ZFS would have taken the busy device offline instead of raising a panic.
> > > > > Perhaps it is already possible to make ZFS behave like this?
> > > > 
> > > > There's a tunable for this: vfs.zfs.deadman_enabled.
> > > > If the panic is just a symptom of the deadlock it's unlikely
> > > > to help though.
> > > 
> > > I think disabling this tunable would have prevented the server from raising a panic:
> > > # sysctl -d vfs.zfs.deadman_enabled
> > > vfs.zfs.deadman_enabled: Kernel panic on stalled ZFS I/O
> > > # sysctl vfs.zfs.deadman_enabled
> > > vfs.zfs.deadman_enabled: 1
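> > > (Disabling it should simply be:
> > > # sysctl vfs.zfs.deadman_enabled=0
> > > or vfs.zfs.deadman_enabled=0 in /etc/sysctl.conf to make it persistent.)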
> > > 
> > > But not sure how it would have behaved then...
> > > (busy disk miraculously back to normal status, memory pressure due to anon increasing...)
> > 
> > I then think it would be nice, once vfs.zfs.deadman_synctime_ms has expired,
> > to be able to take the busy device offline instead of raising a panic.
> > Currently, disabling deadman will avoid the panic but will let the device slow down the pool.
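> > (In the meantime, a manual workaround might be "zpool offline home label/G23iscsi"
> > once the device hangs, though if I/O is truly stuck that command may block as well.)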
> > I still have not found the root cause of this issue, and I am not sure I will;
> > it is quite difficult with only a stacktrace and some performance graphs :/
> > 
> > What exactly is the disk doing when that happens?  What does "gstat" say?
> > If the iSCSI target is also FreeBSD, what does ctlstat say?
> 
> As shown on this graph, made with gstat numbers from the initiator:
> https://s23.postimg.org/zd8jy9xaj/busydisk.png
> The disk was continuously writing 3 MBps before the issue happened.
> When it occurred, response time increased to around 30 seconds (100% busy),
> and consequently disk throughput dropped down to a few kBps.
> The CPU stayed almost fully idle.
> 
> As shown here, there was no memory pressure:
> https://s1.postimg.org/uvsebpyyn/busydisk2.png
> At the end of the graphs' lines, the panic is raised.
> 
> The iSCSI target is also FreeBSD; unfortunately ctlstat was not running when the
> issue occurred, so its numbers are only averages since system startup (102 days ago).
> I also do not have gstat numbers from this disk on the target side
> (to help find out whether it's a hardware issue, an iSCSI issue, or something else).
> I will think about collecting these numbers in case the issue ever occurs again.
> 
> It's kind of hard to say something definitive at this point, but I suspect it's a
> problem at the target side.  I got a report about something quite similar some two
> years ago, and it turned out to be a problem with a disk controller on the target.

Thank you for your feedback.
I then:
- enabled gstat collection on the target, to also have numbers there, not only on the initiator (see the sketch below);
- enabled controller logging (dev.mps.0.debug_level=0x1B);
- disabled deadman.
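
The gstat collection is just a small loop, roughly like the following
("gstat -b" prints one batch and exits after the interval given to -I;
the log path is only an example):

# while :; do date; gstat -b -I 10s; done >> /var/log/gstat-target.log

On the target I will also run ctlstat the same way, so that next time we have
numbers covering the incident itself rather than averages since boot.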

We should be able to investigate further in case the issue occurs again.

Of course, feel free to notify me if you have other ideas!

Thank you again,

Ben


