List: zfs-discuss
Subject: [zfs-discuss] Deadlock with snv_80+
From: "Mike Gerdts" <mgerdts@gmail.com>
Date: 2007-12-28 6:00:35
Message-ID: 65f8f3ad0712272200j3c571ffak50d37e91eeefb34e@mail.gmail.com
I've experienced a hang on an snv_80+ system (I cannot re-establish a
lost VNC session, ssh in as myself, or log in as root on the console).
CTEact tells me:
Searching for mutex deadlocks:
### MUTEX ###
thread 2a102765ca0 is waiting for mutex 0x3000466e5c8
mutex is owned by thread 2a10282dca0
stack of thread 2a102765ca0
last ran: 2 mins 43 secs ago, 1 min 30 secs before panic
stack trace is:
unix: swtch ()
genunix: turnstile_block+0x5a4 (0,0,0x3000466e5c8,mutex_sobj_ops,0,0)
unix: mutex_vector_enter+0x528 (0x3000466e5c8)
unix: mutex_enter (0x3000466e5c8)
zfs: vdev_queue_io+0x7c (0x3004f73dd20)
zfs: vdev_disk_io_start+0x158 (0x3004f73dd20)
zfs: zio_vdev_io_start (0x3004f73dd20)
zfs: zio_execute+0xf4 (0x3004f73dd20)
zfs: zio_nowait (?)
zfs: vdev_mirror_io_start+0x1e4 (0x3004f754700)
zfs: zio_vdev_io_start (0x3004f754700)
zfs: zio_execute+0xf4 (0x3004f754700)
zfs: zio_nowait (?)
zfs: vdev_mirror_io_start+0x1e4 (0x3004e468828)
zfs: zio_vdev_io_start (0x3004e468828)
zfs: zio_execute+0xf4 (0x3004e468828)
genunix: taskq_thread+0x1f0 (0x3000506d3c0)
unix: thread_start+4 ()
stack of thread 2a10282dca0
last ran: 1 min 13 secs ago, 17 ticks before panic
stack trace is:
unix: swtch ()
genunix: cv_wait+0x5c (0x2a10282de46,0x2a10282de48)
genunix: delay+0x84 (0x19)
unix: page_resv+0x78 (3,0)
unix: segkmem_xalloc+0xcc (0x3000000a000,0,0x6000,0,0,segkmem_page_create,kvp)
unix: segkmem_alloc_vn+0xac (0x3000000a000,0x6000,0,kvp)
unix: segkmem_alloc (*0x3000001a060,0x6000,?)
genunix: vmem_xalloc+0x6dc (0x3000001a000,0x6000,0x2000,0,0,0,0,0)
genunix: vmem_alloc+0x210 (0x3000001a000,0x6000,0)
genunix: kmem_slab_create+0x44 (0x300060a9908,0)
genunix: kmem_slab_alloc+0x5c (0x300060a9908,0)
genunix: kmem_cache_alloc+0x144 (0x300060a9908,0)
zfs: zio_buf_alloc (0x5e00)
zfs: vdev_queue_io_to_issue+0x168 (0x3000466e528,0x23)
zfs: vdev_queue_io_done+0x54 (0x3005df0fbc8)
zfs: vdev_disk_io_done+4 (0x3005df0fbc8)
zfs: zio_vdev_io_done (0x3005df0fbc8)
zfs: zio_execute+0xf4 (0x3005df0fbc8)
genunix: taskq_thread+0x1f0 (0x3000506d4b0)
unix: thread_start+4 ()
There are 30 other threads. Some have stacks like the one blocked in
turnstile_block() above, and others look more like:
thread 2a10281dca0 is waiting for mutex 0x3000466e5c8
mutex is owned by thread 2a10282dca0
stack of thread 2a10281dca0
last ran: 2 mins 43 secs ago, 1 min 30 secs before panic
stack trace is:
unix: swtch ()
genunix: turnstile_block+0x5a4 (0x30006630790,0,0x3000466e5c8,mutex_sobj_ops,0,0)
unix: mutex_vector_enter+0x528 (0x3000466e5c8)
unix: mutex_enter (0x3000466e5c8)
zfs: vdev_queue_io_done+0x9c (0x3004f49e5c0)
zfs: vdev_disk_io_done+4 (0x3004f49e5c0)
zfs: zio_vdev_io_done (0x3004f49e5c0)
zfs: zio_execute+0xf4 (0x3004f49e5c0)
genunix: taskq_thread+0x1f0 (0x3000506d4b0)
unix: thread_start+4 ()
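To restate the traces in user-space terms: thread 2a10282dca0 took the
vdev queue mutex in vdev_queue_io_done() and then blocked inside a kmem
allocation waiting for pages, while thread 2a102765ca0 (and the others)
block on that same mutex in vdev_queue_io(), so no I/O can complete to
free memory. This is a minimal, hypothetical analogue of that shape
using Python's threading primitives (the names `vdev_queue_lock` and
`memory_available` are purely illustrative, not ZFS symbols):

```python
import threading
import time

vdev_queue_lock = threading.Lock()    # stands in for the vq_lock mutex
memory_available = threading.Event()  # stands in for page_resv() succeeding

def io_done_thread():
    # Like thread 2a10282dca0: holds the queue lock while waiting
    # for memory that will never arrive.
    with vdev_queue_lock:
        memory_available.wait(timeout=0.5)  # blocks with the lock held

def io_start_thread(results):
    time.sleep(0.1)  # let the other thread grab the lock first
    # Like thread 2a102765ca0: needs the same lock to issue I/O.
    acquired = vdev_queue_lock.acquire(timeout=0.2)
    results.append(acquired)
    if acquired:
        vdev_queue_lock.release()

results = []
t1 = threading.Thread(target=io_done_thread)
t2 = threading.Thread(target=io_start_thread, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(results[0])  # False: the I/O-issuing thread never gets the lock
```

In the real kernel there is no timeout, so the waiters simply stay
parked on the turnstile, which is exactly what CTEact is reporting.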
This kernel is one that I built from bits a little newer than snv_80.
The tip of my repository looks like:
changeset: 5732:d351713150c2
tag: tip
user: randyf
date: Thu Dec 20 16:51:30 2007 -0800
summary: 6631164 AD_FORCE_SUSPEND_TO_RAM test case actually powers
off, when it should just return
While all of this was happening, I was patiently waiting for this hg
clone to finish. Normally it would finish in a few minutes; this time I
let it run overnight.
***
process id 104159 is
/usr/bin/python /usr/bin/hg clone ssh://anon@hg.opensolaris.org/hg/onnv/onnv-ga
, parent process is 104133
uid is 0x3e8 0t1000, gid is 0x3e8 0t1000
thread addr 3005131ed60, proc addr 300096658a8, lwp addr 3004754c0e8
t_state is 0x1 - TS_SLEEP
Scheduling info:
t_pri is 0x3b, t_epri is 0, t_cid is 0x1
scheduling class is: TS
t_disp_time: is 0xd0fd1c, 0t13696284
last ran: 10 hours 23 mins 0 secs ago, 10 hours 21 mins 47 secs before panic
on cpu 0
pc is 1104ba8, sp is 2a102ec9440, t_stk 2a102ec9ae0
stack trace is:
unix: swtch ()
genunix: cv_wait+0x5c (0x300492dde18,0x300492dde10)
zfs: zio_wait+0x54 (0x300492ddb88)
zfs: dmu_buf_hold_array_by_dnode+0x1c0 (0x3004a16b7f8,0,0x154,1,0x7ae76a35,0x2a102ec97fc,0x2a102ec97f0)
zfs: dmu_buf_hold_array+0x60 (0x300082ed2f0,?,0,0x154,1,0x7ae76a35,0x2a102ec97fc,0x2a102ec97f0)
zfs: dmu_read_uio+0x30 (0x300082ed2f0,*0x30049cc2070,0x2a102ec9a10,0x154)
zfs: zfs_read+0x1e8 (0x30023cecf80,0x2a102ec9a10,?,?,?)
genunix: fop_read+0x48 (0x30023cecf80,0x2a102ec9a10,0,0x300039600e8,0)
genunix: read+0x1fc (3,?,0x200?)
unix: syscall_trap32+0x1e8 ()
The system is a dual-processor Ultra 2 with 768 MB RAM. I've done very
similar things on snv_76 and earlier. The key difference is that I had
previously been running an unmirrored zpool. Now it looks like this:
# zpool status
pool: pool0
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
pool0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c0t1d0s7 ONLINE 0 0 0
c0t0d0s7 ONLINE 0 0 0
errors: No known data errors
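For context, the usual way to turn a single-disk pool like the old one
into the two-way mirror shown above is `zpool attach`. This is a sketch
assuming the pool originally lived on c0t0d0s7 alone (the exact history
of this pool isn't stated above):

```shell
# Attach a second slice to the existing single-disk vdev, converting it
# into a two-way mirror; pool and device names are from the status output.
zpool attach pool0 c0t0d0s7 c0t1d0s7

# Let the resilver finish before counting on the redundancy.
zpool status pool0
```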
I'll keep the crash dump around for a while in the event that someone
has interest in digging into it more.
--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss