
List:       zfs-discuss
Subject:    [zfs-discuss] Deadlock with snv_80+
From:       "Mike Gerdts" <mgerdts () gmail ! com>
Date:       2007-12-28 6:00:35
Message-ID: 65f8f3ad0712272200j3c571ffak50d37e91eeefb34e () mail ! gmail ! com

I've hit a hang on an snv_80+ system: I cannot re-establish a lost VNC
session, ssh in as myself, or log in as root on the console.  CTEact
tells me:

Searching for mutex deadlocks:                                  ### MUTEX ###


thread 2a102765ca0 is waiting for mutex 0x3000466e5c8
mutex is owned by thread 2a10282dca0
stack of thread 2a102765ca0

last ran: 2 mins 43 secs ago,  1 min 30 secs before panic
stack trace is:

unix: swtch ()
genunix: turnstile_block+0x5a4 (0,0,0x3000466e5c8,mutex_sobj_ops,0,0)
unix: mutex_vector_enter+0x528 (0x3000466e5c8)
unix: mutex_enter (0x3000466e5c8)
zfs: vdev_queue_io+0x7c (0x3004f73dd20)
zfs: vdev_disk_io_start+0x158 (0x3004f73dd20)
zfs: zio_vdev_io_start (0x3004f73dd20)
zfs: zio_execute+0xf4 (0x3004f73dd20)
zfs: zio_nowait (?)
zfs: vdev_mirror_io_start+0x1e4 (0x3004f754700)
zfs: zio_vdev_io_start (0x3004f754700)
zfs: zio_execute+0xf4 (0x3004f754700)
zfs: zio_nowait (?)
zfs: vdev_mirror_io_start+0x1e4 (0x3004e468828)
zfs: zio_vdev_io_start (0x3004e468828)
zfs: zio_execute+0xf4 (0x3004e468828)
genunix: taskq_thread+0x1f0 (0x3000506d3c0)
unix: thread_start+4 ()


stack of thread 2a10282dca0

last ran: 1 min 13 secs ago, 17 ticks before panic
stack trace is:

unix: swtch ()
genunix: cv_wait+0x5c (0x2a10282de46,0x2a10282de48)
genunix: delay+0x84 (0x19)
unix: page_resv+0x78 (3,0)
unix: segkmem_xalloc+0xcc (0x3000000a000,0,0x6000,0,0,segkmem_page_create,kvp)
unix: segkmem_alloc_vn+0xac (0x3000000a000,0x6000,0,kvp)
unix: segkmem_alloc (*0x3000001a060,0x6000,?)
genunix: vmem_xalloc+0x6dc (0x3000001a000,0x6000,0x2000,0,0,0,0,0)
genunix: vmem_alloc+0x210 (0x3000001a000,0x6000,0)
genunix: kmem_slab_create+0x44 (0x300060a9908,0)
genunix: kmem_slab_alloc+0x5c (0x300060a9908,0)
genunix: kmem_cache_alloc+0x144 (0x300060a9908,0)
zfs: zio_buf_alloc (0x5e00)
zfs: vdev_queue_io_to_issue+0x168 (0x3000466e528,0x23)
zfs: vdev_queue_io_done+0x54 (0x3005df0fbc8)
zfs: vdev_disk_io_done+4 (0x3005df0fbc8)
zfs: zio_vdev_io_done (0x3005df0fbc8)
zfs: zio_execute+0xf4 (0x3005df0fbc8)
genunix: taskq_thread+0x1f0 (0x3000506d4b0)
unix: thread_start+4 ()

There are 30 other threads; some have stacks like the one blocked in
turnstile_block() above, and others look more like:

thread 2a10281dca0 is waiting for mutex 0x3000466e5c8
mutex is owned by thread 2a10282dca0
stack of thread 2a10281dca0

last ran: 2 mins 43 secs ago,  1 min 30 secs before panic
stack trace is:

unix: swtch ()
genunix: turnstile_block+0x5a4 (0x30006630790,0,0x3000466e5c8,mutex_sobj_ops,0,0)
unix: mutex_vector_enter+0x528 (0x3000466e5c8)
unix: mutex_enter (0x3000466e5c8)
zfs: vdev_queue_io_done+0x9c (0x3004f49e5c0)
zfs: vdev_disk_io_done+4 (0x3004f49e5c0)
zfs: zio_vdev_io_done (0x3004f49e5c0)
zfs: zio_execute+0xf4 (0x3004f49e5c0)
genunix: taskq_thread+0x1f0 (0x3000506d4b0)
unix: thread_start+4 ()


This kernel is one that I built from bits a little newer than snv_80.
The tip of my repository looks like:

changeset:   5732:d351713150c2
tag:         tip
user:        randyf
date:        Thu Dec 20 16:51:30 2007 -0800
summary:     6631164 AD_FORCE_SUSPEND_TO_RAM test case actually powers off, when it should just return

While all of this was happening, I was patiently waiting for this hg
clone to finish.  Normally it would complete in a few minutes; this
time I let it run overnight.



***
process id 104159 is
/usr/bin/python /usr/bin/hg clone ssh://anon@hg.opensolaris.org/hg/onnv/onnv-ga
, parent process is 104133

uid is 0x3e8 0t1000, gid is 0x3e8 0t1000

thread addr 3005131ed60, proc addr 300096658a8, lwp addr 3004754c0e8

t_state is 0x1 - TS_SLEEP

Scheduling info:
t_pri is 0x3b, t_epri is 0, t_cid is 0x1
scheduling class is: TS
t_disp_time: is 0xd0fd1c, 0t13696284
last ran: 10 hours 23 mins 0 secs ago,  10 hours 21 mins 47 secs before panic
 on cpu 0

pc is 1104ba8, sp is 2a102ec9440, t_stk 2a102ec9ae0

stack trace is:

unix: swtch ()
genunix: cv_wait+0x5c (0x300492dde18,0x300492dde10)
zfs: zio_wait+0x54 (0x300492ddb88)
zfs: dmu_buf_hold_array_by_dnode+0x1c0 (0x3004a16b7f8,0,0x154,1,0x7ae76a35,0x2a102ec97fc,0x2a102ec97f0)
zfs: dmu_buf_hold_array+0x60 (0x300082ed2f0,?,0,0x154,1,0x7ae76a35,0x2a102ec97fc,0x2a102ec97f0)
zfs: dmu_read_uio+0x30 (0x300082ed2f0,*0x30049cc2070,0x2a102ec9a10,0x154)
zfs: zfs_read+0x1e8 (0x30023cecf80,0x2a102ec9a10,?,?,?)
genunix: fop_read+0x48 (0x30023cecf80,0x2a102ec9a10,0,0x300039600e8,0)
genunix: read+0x1fc (3,?,0x200?)
unix: syscall_trap32+0x1e8 ()

The system is a dual-proc Ultra 2 with 768 MB RAM.  I've done very
similar things on snv_76 and earlier.  The key difference is that I
had previously been running an unmirrored zpool.  Now it looks like...

# zpool status
  pool: pool0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        pool0         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t1d0s7  ONLINE       0     0     0
            c0t0d0s7  ONLINE       0     0     0

errors: No known data errors

I'll keep the crash dump around for a while in case anyone is
interested in digging into it further.
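
For anyone who wants to poke at the dump themselves, the blocking
chain should be visible with something along these lines (assuming
your mdb has the ::mutex dcmd; the thread and mutex addresses are the
ones from the CTEact output above):

    # mdb -k unix.0 vmcore.0
    > 0x3000466e5c8::mutex
    > 2a10282dca0::findstack -v
    > 2a102765ca0::findstack -v
    > ::threadlist -v

::mutex should report thread 2a10282dca0 as the owner, ::findstack -v
on that thread should show it parked in page_resv() under
zio_buf_alloc(), and ::threadlist -v lets you count the other threads
queued up in turnstile_block() on the same mutex.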

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss