[prev in list] [next in list] [prev in thread] [next in thread] 

List:       freebsd-fs
Subject:    zfs: repeatable assertion failure in zap_deref_leaf on 14-CURRENT
From:       Lawrence Stewart <lstewart () freebsd ! org>
Date:       2021-10-04 4:06:44
Message-ID: 91c2e8a5-3e43-e295-b388-a21e58105bf2 () freebsd ! org
[Download RAW message or body]

Greetings all,

I'm able to reliably trigger a panic with ZFS implicated in the backtrace. This is on \
a laptop with a very simple zpool built on top of a single GELI gpart partition on a \
single nvme disk. A scrub has not found anything of concern.

I encounter the panic when I start Chromium and it begins restoring lots of previous \
windows/tabs. I can also trigger by simply running an ls or find on \
~/.cache/chromium/Cache so presumably something about the contents of that directory \
have been corrupted in an interesting way which is upsetting ZFS. Moving the \
directory out of the way allows me to start Chromium without issue.

I'm currently running world + kernel built a few days ago from Git main 335c4f8edb3a \
kernconf=GENERIC amd64 with a fully up to date package set and drm-kmod-current built \
from ports against my kernel.

I captured a core earlier today and have it around for any spelunking that would be \
helpful to anyone who may be interested. Ignoring the drm-related parts of the \
backtrace which happens after the ZFS panic, frame 36 looks like it's the cause of \
the crash - namely this assertion in zap.c:

        ASSERT(err ||
            ZAP_HASH_IDX(h, zap_leaf_phys(*lp)->l_hdr.lh_prefix_len) ==
            zap_leaf_phys(*lp)->l_hdr.lh_prefix);

Below I've included some output from a kgdb session against the core I captured which \
I hope may be a useful starting point. As noted earlier I can easily move the Cache \
dir out of the way so this is not a blocking issue, but I've kept the Cache dir in \
case anyone is interested in debugging.

Cheers,
Lawrence


Full backtrace:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=175153152) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff804cfece in db_fncall_generic (addr=<optimized out>, rv=<optimized \
out>, nargs=0, args=<optimized out>)  at /usr/src/sys/ddb/db_command.c:610
#3  db_fncall (dummy1=<optimized out>, dummy2=<optimized out>, dummy3=<optimized \
out>, dummy4=<optimized out>)  at /usr/src/sys/ddb/db_command.c:658
#4  0xffffffff804cfad2 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized \
out>, dopager=dopager@entry=1)  at /usr/src/sys/ddb/db_command.c:482
#5  0xffffffff804cf74d in db_command_loop () at /usr/src/sys/ddb/db_command.c:535
#6  0xffffffff804d2ec6 in db_trap (type=<optimized out>, code=<optimized out>) at \
/usr/src/sys/ddb/db_main.c:270 #7  0xffffffff80c75487 in kdb_trap (type=type@entry=3, \
code=code@entry=0, tf=tf@entry=0xfffffe01531a6f80)  at \
/usr/src/sys/kern/subr_kdb.c:733 #8  0xffffffff810eedd3 in trap \
(frame=0xfffffe01531a6f80) at /usr/src/sys/amd64/amd64/trap.c:609 #9  <signal handler \
called> #10 kdb_enter (why=0xffffffff812e9008 "panic", msg=<optimized out>) at \
/usr/src/sys/kern/subr_kdb.c:506 #11 0xffffffff80c277e8 in vpanic \
(fmt=0xffffffff811f49e6 "running but not TDS_RUNNING", ap=<optimized out>,   \
ap@entry=0xfffffe01531a70f0) at /usr/src/sys/kern/kern_shutdown.c:908 #12 \
0xffffffff80c27573 in panic (fmt=0xffffffff81e9e1e0 <cnputs_mtx> \
"\270\\*\201\377\377\377\377")  at /usr/src/sys/kern/kern_shutdown.c:844
#13 0xffffffff80c856f7 in sleepq_switch (wchan=wchan@entry=0xfffff800063b0888, \
pri=pri@entry=0)  at /usr/src/sys/kern/subr_sleepqueue.c:609
#14 0xffffffff80c85b2b in sleepq_timedwait (wchan=wchan@entry=0xfffff800063b0888, \
pri=pri@entry=0)  at /usr/src/sys/kern/subr_sleepqueue.c:690
#15 0xffffffff80e9ae5f in linux_add_to_sleepqueue (wchan=0xfffff800063b0888, \
task=task@entry=0xfffff800063b0888,   wmesg=<optimized out>, \
timeout=timeout@entry=11, state=state@entry=2)  at \
/usr/src/sys/compat/linuxkpi/common/src/linux_schedule.c:70 #16 0xffffffff80e9acb6 in \
linux_wait_event_common (wqh=<optimized out>, wq=<optimized out>, timeout=11, \
state=2, lock=0x0)  at /usr/src/sys/compat/linuxkpi/common/src/linux_schedule.c:281
#17 0xffffffff839b4960 in intel_dp_aux_xfer () from /boot/modules/i915kms.ko
#18 0xffffffff839b4269 in intel_dp_aux_transfer () from /boot/modules/i915kms.ko
#19 0xffffffff83a74b79 in drm_dp_dpcd_access () from /boot/modules/drm.ko
#20 0xffffffff83a74c27 in drm_dp_dpcd_write () from /boot/modules/drm.ko
#21 0xffffffff839acb62 in intel_dp_sink_dpms () from /boot/modules/i915kms.ko
#22 0xffffffff839a8d76 in intel_ddi_post_disable () from /boot/modules/i915kms.ko
#23 0xffffffff8393fe7b in haswell_crtc_disable () from /boot/modules/i915kms.ko
#24 0xffffffff83955cdc in intel_old_crtc_state_disables () from \
/boot/modules/i915kms.ko #25 0xffffffff839543c5 in intel_atomic_commit_tail () from \
/boot/modules/i915kms.ko #26 0xffffffff839507da in intel_atomic_commit () from \
/boot/modules/i915kms.ko #27 0xffffffff83a6d378 in drm_client_modeset_commit_atomic \
() from /boot/modules/drm.ko #28 0xffffffff83a6d0d6 in \
drm_client_modeset_commit_force () from /boot/modules/drm.ko #29 0xffffffff83aae50a \
in drm_fb_helper_restore_fbdev_mode_unlocked () from /boot/modules/drm.ko #30 \
0xffffffff83aa8397 in vt_kms_postswitch () from /boot/modules/drm.ko #31 \
0xffffffff80a79f8f in vt_window_switch (vw=0xfffffe014eee7928, \
vw@entry=0xffffffff81aea0f8 <vt_conswindow>)  at /usr/src/sys/dev/vt/vt_core.c:620
#32 0xffffffff80a773af in vtterm_cngrab (tm=<optimized out>) at \
/usr/src/sys/dev/vt/vt_core.c:1832 #33 0xffffffff80bbba56 in cngrab () at \
/usr/src/sys/kern/kern_cons.c:383 #34 0xffffffff80c27720 in vpanic \
(fmt=0xffffffff82835173 "%s", ap=ap@entry=0xfffffe01531a7930)  at \
/usr/src/sys/kern/kern_shutdown.c:892 #35 0xffffffff8258361a in spl_panic \
(file=<optimized out>, func=<optimized out>, line=-2127879961,   \
fmt=0xffffffff81e9e1e0 <cnputs_mtx> "\270\\*\201\377\377\377\377")  at \
/usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_misc.c:107 #36 \
0xffffffff82749312 in zap_deref_leaf (zap=zap@entry=0xfffff8035a34d600, h=<optimized \
out>, tx=tx@entry=0x0,   lt=lt@entry=RW_READER, lp=lp@entry=0xfffffe01531a7ab0) at \
/usr/src/sys/contrib/openzfs/module/zfs/zap.c:623 #37 0xffffffff8274b9a7 in \
fzap_cursor_retrieve (zap=0xffffffff81e9e1e0 <cnputs_mtx>, \
zc=zc@entry=0xfffffe01531a7aa0,   za=za@entry=0xfffffe01531a7b78) at \
/usr/src/sys/contrib/openzfs/module/zfs/zap.c:1249 #38 0xffffffff82753fde in \
zap_cursor_retrieve (zc=zc@entry=0xfffffe01531a7aa0, za=za@entry=0xfffffe01531a7b78)  \
at /usr/src/sys/contrib/openzfs/module/zfs/zap_micro.c:1583 #39 0xffffffff825a8a43 in \
zfs_readdir (vp=<optimized out>, uio=0xfffffe01531a7b60, cr=<optimized out>,   \
eofp=0xfffffe01531a7d04, ncookies=<optimized out>, cookies=0x0)  at \
/usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:1814 #40 \
zfs_freebsd_readdir (ap=<optimized out>) at \
/usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:4729 #41 \
0xffffffff811b8004 in VOP_READDIR_APV (vop=0xffffffff828f3360 <zfs_vnodeops>, \
a=a@entry=0xfffffe01531a7d08)  at vnode_if.c:1939
#42 0xffffffff80d2a919 in VOP_READDIR (vp=0xfffff8035a1d5540, uio=0xfffffe01531a7d48, \
cred=<optimized out>,   eofflag=0xfffffe01531a7d04, ncookies=0x0, cookies=0x0) at \
./vnode_if.h:985 #43 kern_getdirentries (td=<optimized out>, fd=<optimized out>, 
    buf=0x814a05000 <error: Cannot access memory at address 0x814a05000>, count=4096, \
basep=basep@entry=0xfffffe01531a7df0,   residp=residp@entry=0x0, \
bufseg=UIO_USERSPACE) at /usr/src/sys/kern/vfs_syscalls.c:4193 #44 0xffffffff80d2ab19 \
                in sys_getdirentries (td=0xffffffff81e9e1e0 <cnputs_mtx>, \
                uap=0xfffffe0153346b30)
--Type <RET> for more, q to quit, c to continue without paging--
    at /usr/src/sys/kern/vfs_syscalls.c:4140
#45 0xffffffff810efbae in syscallenter (td=<optimized out>) at \
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189 #46 amd64_syscall \
(td=0xfffffe0153346740, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1191 #47 <signal \
handler called> #48 0x000000080f1d79fa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfdfc418

(kgdb) frame 36
#36 0xffffffff82749312 in zap_deref_leaf (zap=zap@entry=0xfffff8035a34d600, \
h=<optimized out>, tx=tx@entry=0x0,   lt=lt@entry=RW_READER, \
lp=lp@entry=0xfffffe01531a7ab0) at /usr/src/sys/contrib/openzfs/module/zfs/zap.c:623 \
623		ASSERT(err ||

(kgdb) p err
$10 = <optimized out>

(kgdb) p h
$11 = <optimized out>

(kgdb) p **lp
$13 = {l_dbu = {dbu_tqent = {tqent_task = {ta_link = {stqe_next = 0x0}, ta_pending = \
                0, ta_priority = 0 '\000', 
        ta_flags = 0 '\000', ta_func = 0x0, ta_context = 0x0}, tqent_timeout_task = \
                {q = 0x0, t = {ta_link = {
            stqe_next = 0x0}, ta_pending = 0, ta_priority = 0 '\000', ta_flags = 0 \
                '\000', ta_func = 0x0, 
          ta_context = 0x0}, c = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle \
                = {sle_next = 0x0}, tqe = {
              tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_precision = 0, c_arg = \
                0x0, c_func = 0x0, c_lock = 0x0, 
          c_flags = 0, c_iflags = 0, c_cpu = 0}, f = 0}, tqent_func = 0x0, tqent_arg \
                = 0x0, tqent_id = 0, tqent_hash = {
        cle_next = 0x0, cle_prev = 0x0}, tqent_type = 0 '\000', tqent_registered = 0 \
                '\000', tqent_cancelled = 0 '\000', 
      tqent_rc = 0}, dbu_evict_func_sync = 0xffffffff8274c5d0 <zap_leaf_evict_sync>, \
dbu_evict_func_async = 0x0,   dbu_clear_on_evict_dbufp = 0xfffff8035d29ca08}, \
                l_rwlock = {lock_object = {
      lo_name = 0xffffffff82865b9f <.L.str.25+1> "l->l_rwlock", lo_flags = 40960000, \
lo_data = 0, lo_witness = 0x0},   sx_lock = 33}, l_blkid = 182, l_bs = 14, l_dbuf = \
0xfffff8035d291b28}

(kgdb) info register
rax            0x12                18
rbx            0x0                 0
rcx            0x1                 1
rdx            0xffffffff812b20e7  -2127879961
rsi            0x80                128
rdi            0xffffffff81e9e1e0  -2115378720
rbp            0xfffffe01531a79a0  0xfffffe01531a79a0
rsp            0xfffffe01531a7960  0xfffffe01531a7960
r8             0x0                 0
r9             0x1c200001          471859201
r10            0xffffffff81aea0f8  -2119261960
r11            0x20                32
r12            0x2                 2
r13            0x0                 0
r14            0xfffff8035a34d600  -8781694708224
r15            0x42                66
rip            0xffffffff82749312  0xffffffff82749312 <zap_deref_leaf+242>
eflags         0x86                [ PF SF ]
cs             0x20                32
ss             0x28                40
ds             <unavailable>
es             <unavailable>
fs             <unavailable>
gs             <unavailable>
fs_base        <unavailable>
gs_base        <unavailable>

(kgdb) disassemble zap_deref_leaf
Dump of assembler code for function zap_deref_leaf:
   0xffffffff82749220 <+0>:	push   %rbp
   0xffffffff82749221 <+1>:	mov    %rsp,%rbp
   0xffffffff82749224 <+4>:	push   %r15
   0xffffffff82749226 <+6>:	push   %r14
   0xffffffff82749228 <+8>:	push   %r13
   0xffffffff8274922a <+10>:	push   %r12
   0xffffffff8274922c <+12>:	push   %rbx
   0xffffffff8274922d <+13>:	sub    $0x18,%rsp
   0xffffffff82749231 <+17>:	mov    %ecx,%r12d
   0xffffffff82749234 <+20>:	mov    0xe8(%rdi),%rax
   0xffffffff8274923b <+27>:	mov    0x18(%rax),%rax
   0xffffffff8274923f <+31>:	movabs $0x8000000000000001,%rcx
   0xffffffff82749249 <+41>:	cmp    %rcx,(%rax)
   0xffffffff8274924c <+44>:	jg     0xffffffff82749314 <zap_deref_leaf+244>
   0xffffffff82749252 <+50>:	movabs $0x2f52ab2ab,%rcx
   0xffffffff8274925c <+60>:	cmp    %rcx,0x8(%rax)
   0xffffffff82749260 <+64>:	jne    0xffffffff82749314 <zap_deref_leaf+244>
   0xffffffff82749266 <+70>:	mov    %rdx,%r13
   0xffffffff82749269 <+73>:	mov    %rsi,%r15
   0xffffffff8274926c <+76>:	mov    %rdi,%r14
   0xffffffff8274926f <+79>:	mov    %r8,-0x30(%rbp)
   0xffffffff82749273 <+83>:	mov    0x20(%rax),%rax
   0xffffffff82749277 <+87>:	mov    %eax,%ecx
   0xffffffff82749279 <+89>:	neg    %cl
   0xffffffff8274927b <+91>:	shr    %cl,%rsi
   0xffffffff8274927e <+94>:	test   %rax,%rax
   0xffffffff82749281 <+97>:	cmove  %rax,%rsi
   0xffffffff82749285 <+101>:	lea    -0x38(%rbp),%rdx
   0xffffffff82749289 <+105>:	call   0xffffffff8274aae0 <zap_idx_to_blk>
   0xffffffff8274928e <+110>:	mov    %eax,%ebx
   0xffffffff82749290 <+112>:	test   %eax,%eax
   0xffffffff82749292 <+114>:	jne    0xffffffff8274933a <zap_deref_leaf+282>
   0xffffffff82749298 <+120>:	mov    -0x38(%rbp),%rsi
   0xffffffff8274929c <+124>:	mov    %r14,%rdi
   0xffffffff8274929f <+127>:	mov    %r13,%rdx
   0xffffffff827492a2 <+130>:	mov    %r12d,%ecx
   0xffffffff827492a5 <+133>:	mov    -0x30(%rbp),%r8
   0xffffffff827492a9 <+137>:	call   0xffffffff8274bf60 <zap_get_leaf_byblk>
   0xffffffff827492ae <+142>:	mov    %eax,%ebx
   0xffffffff827492b0 <+144>:	test   %eax,%eax
   0xffffffff827492b2 <+146>:	jne    0xffffffff8274933a <zap_deref_leaf+282>
   0xffffffff827492b8 <+152>:	mov    -0x30(%rbp),%rax
   0xffffffff827492bc <+156>:	mov    (%rax),%rax
   0xffffffff827492bf <+159>:	mov    0x108(%rax),%rax
   0xffffffff827492c6 <+166>:	mov    0x18(%rax),%rax
   0xffffffff827492ca <+170>:	mov    0x20(%rax),%edx
   0xffffffff827492cd <+173>:	mov    %edx,%ecx
   0xffffffff827492cf <+175>:	neg    %cl
   0xffffffff827492d1 <+177>:	shr    %cl,%r15
   0xffffffff827492d4 <+180>:	xor    %ebx,%ebx
   0xffffffff827492d6 <+182>:	test   %dx,%dx
   0xffffffff827492d9 <+185>:	mov    $0x0,%ecx
   0xffffffff827492de <+190>:	cmovne %r15,%rcx
   0xffffffff827492e2 <+194>:	cmp    0x10(%rax),%rcx
   0xffffffff827492e6 <+198>:	je     0xffffffff8274933a <zap_deref_leaf+282>
   0xffffffff827492e8 <+200>:	xor    %ebx,%ebx
   0xffffffff827492ea <+202>:	mov    $0xffffffff82869886,%rdi
   0xffffffff827492f1 <+209>:	mov    $0xffffffff8282e438,%rsi
   0xffffffff827492f8 <+216>:	mov    $0x271,%edx
   0xffffffff827492fd <+221>:	mov    $0xffffffff82835173,%rcx
   0xffffffff82749304 <+228>:	mov    $0xffffffff8283164f,%r8
   0xffffffff8274930b <+235>:	xor    %eax,%eax
   0xffffffff8274930d <+237>:	call   0xffffffff825835e0 <spl_panic>
=> 0xffffffff82749312 <+242>:	jmp    0xffffffff8274933a <zap_deref_leaf+282>
   0xffffffff82749314 <+244>:	mov    0x1a5f4d(%rip),%rdi        # 0xffffffff828ef268 \
<sdt_sdt___set__error+72>  0xffffffff8274931b <+251>:	mov    $0x5,%ebx
   0xffffffff82749320 <+256>:	test   %rdi,%rdi
   0xffffffff82749323 <+259>:	je     0xffffffff8274933a <zap_deref_leaf+282>
--Type <RET> for more, q to quit, c to continue without paging--
   0xffffffff82749325 <+261>:	mov    $0x5,%esi
   0xffffffff8274932a <+266>:	xor    %edx,%edx
   0xffffffff8274932c <+268>:	xor    %ecx,%ecx
   0xffffffff8274932e <+270>:	xor    %r8d,%r8d
   0xffffffff82749331 <+273>:	xor    %r9d,%r9d
   0xffffffff82749334 <+276>:	call   *-0xc364d2(%rip)        # 0xffffffff81b12e68 \
<sdt_probe_func>  0xffffffff8274933a <+282>:	mov    %ebx,%eax
   0xffffffff8274933c <+284>:	add    $0x18,%rsp
   0xffffffff82749340 <+288>:	pop    %rbx
   0xffffffff82749341 <+289>:	pop    %r12
   0xffffffff82749343 <+291>:	pop    %r13
   0xffffffff82749345 <+293>:	pop    %r14
   0xffffffff82749347 <+295>:	pop    %r15
   0xffffffff82749349 <+297>:	pop    %rbp
   0xffffffff8274934a <+298>:	ret    
End of assembler dump.


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic