
List:       freebsd-arm
Subject:    Re: I got a panic for "nvme0: cpl does not map to outstanding cmd" on a MACHIATObin Double Shot
From:       Mark Millard via freebsd-current <freebsd-current () freebsd ! org>
Date:       2021-05-29 20:25:04
Message-ID: 4515C859-98C7-4FE2-AFBA-08CFF28D0774 () yahoo ! com

On 2021-May-29, at 01:15, Mark Millard <marklmi at yahoo.com> wrote:

> On 2021-May-23, at 00:46, Mark Millard via freebsd-current <freebsd-current at freebsd.org> wrote:
> 
> > On 2021-May-23, at 00:08, Mark Millard via freebsd-current <freebsd-current at freebsd.org> wrote:
> > > On 2021-May-22, at 22:16, Warner Losh <imp at bsdimp.com> wrote:
> > > 
> > > > On Sat, May 22, 2021 at 10:44 PM Mark Millard via freebsd-arm <freebsd-arm@freebsd.org> wrote:
> > > > # mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
> > > > # diff -r /usr/ports/ /mnt/ | more
> > > > nvme0: cpl does not map to outstanding cmd
> > > > cdw0:00000000 sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
> > > > panic: received completion for unknown cmd
> > > > 
> > > > cid 0x7e has no currently active command. The cid is used by the driver
> > > > to map completions back to requests.
> > > > 
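[Aside, for my own reading of the driver: a minimal sketch of the lookup I
understand this to describe. It is a paraphrase, not a verbatim quote of
nvme_qpair.c, but it matches the KASSERT message seen in the panic:]

	/* The completion's cid indexes the qpair's active-tracker array. */
	struct nvme_tracker *tr = qpair->act_tr[cpl.cid];

	if (tr == NULL) {
		/* No outstanding command for this cid: the case hit here. */
		KASSERT(0, ("received completion for unknown cmd"));
	}
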
> > > > So, there's usually 3 possibilities that I've seen this with.
> > > > 
> > > > (1) There's a missing cache flush so you get a bogus cpl back because something stale was read. It's unlikely to be this one because the rest of this looks like a successful command completed: sc = 0 is successful completion and sct is a generic command queued.
> > > > (2) We're looking at the completion record twice because we failed to properly update the head pointer and we've already completed the command. I've only ever seen this in a panic situation where we interrupt the completion routine because something else paniced.
> > > > 
> > > > (3) There's something that's corrupting the act_tr array in the qpair. I've not seen this, but if something else smashes that area (zeroing it in this case), then that could cause an error like this.
> > > 
> > > Of note may be that I buildworld and buildkernel with extra
> > > tuning enabled, targeting the cortex-a72. In one past example
> > > this led to finding a missing synchronization related to XHCI
> > > handling that was fixed. (The fix was not aarch64 specific at
> > > all.) For that: A cortex-a53 did not show the problem with or
> > > without that tuning. A cortex-a72 showed the problem only with
> > > the cortex-a72 tuning, not with targeting a cortex-a53 tuning
> > > or generic armv7, for example.
> > > 
> > > Not that I've any evidence specifically suggesting such would
> > > be involved here. But it might be good to keep in mind as a
> > > possibility.
> > > 
> > > > Or it could be something new I've not seen nor thought about before.
> > > > 
> > > > cpuid = 3
> > > > time = 1621743752
> > > > KDB: stack backtrace:
> > > > db_trace_self() at db_trace_self
> > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> > > > vpanic() at vpanic+0x188
> > > > panic() at panic+0x44
> > > > nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
> > > > nvme_timeout() at nvme_timeout+0x3c
> > > > softclock_call_cc() at softclock_call_cc+0x124
> > > > softclock() at softclock+0x60
> > > > ithread_loop() at ithread_loop+0x2a8
> > > > fork_exit() at fork_exit+0x74
> > > > fork_trampoline() at fork_trampoline+0x14
> > > > KDB: enter: panic
> > > > [ thread pid 12 tid 100028 ]
> > > > Stopped at      kdb_enter+0x48: undefined       f904411f
> > > > db> 
> > > > 
> > > > Based on the "nvme" references, I expect this is tied to
> > > > handling the Optane 480 GiByte that is in the PCIe slot
> > > > and is the boot/only media for the machine doing the diff.
> > > > 
> > > > "db> dump" seems to have worked.
> > > > 
> > > > After the reboot, zpool scrub found no errors.
> > > > 
> > > > For reference:
> > > > 
> > > > # uname -apKU
> > > > FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013
> > > > 
> > > > If you have the dump, I suggest starting to make sure that the act_tr array looks sane. Make sure all the live pointers point to a sane looking tr. Make sure that tr is on the active list, etc
> > > > It will take a fair amount of driver reading, though, to see how we got here. I'd also check to make sure that qpair->num_entries > cpl.cid (0x3e in this case).
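
[Aside: a hypothetical kgdb session for those checks, after an "up" to the
nvme_qpair_process_completions frame as shown further down; whether each
print is meaningful depends on what the dump preserved:]

	(kgdb) print qpair->num_entries
	(kgdb) print/x cpl.cid
	(kgdb) print qpair->act_tr[cpl.cid]
	(kgdb) print *qpair->act_tr[cpl.cid]
	(kgdb) print qpair->outstanding_tr.tqh_first
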
> > > 
> > > Okay. I got this while trying to test an odd diff -r over NFS
> > > issue with the more recent software. So the two will potentially
> > > compete for time.
> > > 
> > > As the investigation will be exploratory for me (this code
> > > is not familiar to me), I'll probably publish periodic notes
> > > on things as I go along looking at stuff.
> > > 
> > > My first is that the /var/crash/core.txt.0 has a gdb backtrace:
> > > 
> > > . . .
> > > #10 0xffff00000047900c in panic (
> > > fmt=0x12 <error: Cannot access memory at address 0x12>)
> > > at /usr/main-src/sys/kern/kern_shutdown.c:843
> > > #11 0xffff0000002226b4 in nvme_qpair_process_completions (
> > > qpair=qpair@entry=0xffffa00008724300)
> > > at /usr/main-src/sys/dev/nvme/nvme_qpair.c:617
> > > #12 0xffff000000223354 in nvme_timeout (arg=arg@entry=0xffffa0000b053980)
> > > at /usr/main-src/sys/dev/nvme/nvme_qpair.c:938
> > > #13 0xffff000000495bf8 in softclock_call_cc (c=0xffffa0000b0539a0, 
> > > cc=cc@entry=0xffff000000de3500 <cc_cpu+768>, direct=0)
> > > at /usr/main-src/sys/kern/kern_timeout.c:696
> > > #14 0xffff000000495fb0 in softclock (arg=0xffff000000de3500 <cc_cpu+768>)
> > > at /usr/main-src/sys/kern/kern_timeout.c:816
> > > #15 0xffff0000004356dc in intr_event_execute_handlers (p=<optimized out>, 
> > > ie=0xffffa000058bc700) at /usr/main-src/sys/kern/kern_intr.c:1168
> > > #16 ithread_execute_handlers (p=<optimized out>, ie=0xffffa000058bc700)
> > > at /usr/main-src/sys/kern/kern_intr.c:1181
> > > #17 ithread_loop (arg=<optimized out>, arg@entry=0xffffa000058aef60)
> > > at /usr/main-src/sys/kern/kern_intr.c:1269
> > > #18 0xffff000000431f6c in fork_exit (
> > > callout=0xffff000000435430 <ithread_loop>, arg=0xffffa000058aef60, 
> > > frame=0xffff0000eb7cc990) at /usr/main-src/sys/kern/kern_fork.c:1083
> > > #19 <signal handler called>
> > > 
> > > So via kgdb . . .
> > > 
> > > (kgdb) up 11
> > > #11 0xffff0000002226b4 in nvme_qpair_process_completions (
> > >     qpair=qpair@entry=0xffffa00008724300)
> > >     at /usr/main-src/sys/dev/nvme/nvme_qpair.c:617
> > > 617			KASSERT(0, ("received completion for unknown cmd"));
> > > (kgdb) print/x cpl.cid
> > > $4 = 0x7e
> > > (kgdb) print/x qpair->num_entries
> > > $5 = 0x100
> > > 
> > > Based on also seeing the code:
> > > 
> > > 	qpair->act_tr = malloc_domainset(sizeof(struct nvme_tracker *) *
> > > 	    qpair->num_entries, M_NVME, DOMAINSET_PREF(qpair->domain),
> > > 	    M_ZERO | M_WAITOK);
> > > 
> > > (kgdb) print qpair->act_tr
> > > $6 = (struct nvme_tracker **) 0xffffa00008725800
> > > (kgdb) x/256g 0xffffa00008725800
> > > 0xffffa00008725800:	0x0000000000000000	0x0000000000000000
> > > 0xffffa00008725810:	0x0000000000000000	0x0000000000000000
> > > . . .
> > > 0xffffa00008725fe0:	0x0000000000000000	0x0000000000000000
> > > 0xffffa00008725ff0:	0x0000000000000000	0x0000000000000000
> > > 
> > > It was all zeros (null pointers). No "live" pointers and, so,
> > > no tr's to inspect.
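
[Aside: for context on why an all-NULL act_tr is ambiguous on its own. As
I understand the driver (paraphrasing, not quoting it), submission fills
a slot and completion clears it, so a fully drained queue also looks
all-NULL:]

	/* Paraphrase of the tracker lifecycle as I understand it: */

	/* on submit: */
	qpair->act_tr[tr->cid] = tr;	/* slot is live while the cmd is outstanding */

	/* on completion (or abort): */
	qpair->act_tr[tr->cid] = NULL;	/* slot is free again */

[So an all-NULL array alone does not distinguish "smashed/zeroed"
(possibility (3) above) from "nothing outstanding"; the outstanding_tr
list in the *qpair print below is the other place to look.]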
> > > 
> > > As none of this is familiar context beyond general programming
> > > concepts, it may be some time before I find anything else
> > > potentially of interest to report. If you have other specific
> > > things you would like me to look at, let me know.
> > > 
> > 
> > A fairly obvious thing I should have provided:
> > 
> > (kgdb) print/x *qpair
> > $15 = {ctrlr = 0xffff0000fe154000, id = 0x3, domain = 0x0, cpu = 0x2, vector = 0x3,
> >   rid = 0x4, res = 0xffffa000086ded80, tag = 0xffffa0000877b780, num_entries = 0x100,
> >   num_trackers = 0x80, sq_tdbl_off = 0x1018, cq_hdbl_off = 0x101c, phase = 0x1,
> >   sq_head = 0x1f, sq_tail = 0x20, cq_head = 0x20, num_cmds = 0x420,
> >   num_intr_handler_calls = 0xe66c, num_retries = 0x0, num_failures = 0x0,
> >   cmd = 0xffff000100ebb000, cpl = 0xffff000100ebf000, dma_tag = 0xffffa0000b093e00,
> >   dma_tag_payload = 0xffffa000059ef000, queuemem_map = 0xffffa00005a07700,
> >   cmd_bus_addr = 0xacbb000, cpl_bus_addr = 0xacbf000,
> >   free_tr = {tqh_first = 0xffffa0000b053a80, tqh_last = 0xffffa0000869da80},
> >   outstanding_tr = {tqh_first = 0xffffa0000b053980, tqh_last = 0xffffa0000b053980},
> >   queued_req = {stqh_first = 0x0, stqh_last = 0xffffa000087243c8},
> >   act_tr = 0xffffa00008725800, is_enabled = 0x1,
> >   lock = {lock_object = {lo_name = 0xffff00000090321f, lo_flags = 0x1030000,
> >     lo_data = 0x0, lo_witness = 0xffffa0043fd96080}, mtx_lock = 0x0}}
> > 
> > Looks like I need to boot into the non-debug builds for the
> > other problem I'm testing for repeatability after a commit.
> 
> 
> I've not figured out anything interesting so far.
> 
> But I've run into something that looks odd to me
> (not that I've any evidence it is related to the
> panic, more likely my ignorance):
> 
> There is a use of atomic_store_rel_int(&qpair->cq_head, 0)
> for which I do not find any matching atomic_load_acq_int
> use (or other explicit _acq), so there is no "synchronizes
> with" relationship in the code that I've found to establish
> an ordering across threads involving that atomic_store_rel_int,
> just implicit/default relaxed load operations. A grep
> for "cq_head" under dev/nvme/ shows only:
> 
> ./dev/nvme/nvme_private.h:      uint32_t                cq_head;
> ./dev/nvme/nvme_sysctl.c:       SYSCTL_ADD_UINT(ctrlr_ctx, que_list, OID_AUTO, "cq_head",
> ./dev/nvme/nvme_sysctl.c:           CTLFLAG_RD, &qpair->cq_head, 0,
> ./dev/nvme/nvme_qpair.c:         * below, but before we can reset cq_head to zero at 2. Also cope with
> ./dev/nvme/nvme_qpair.c:                if (qpair->cq_head == qpair->num_entries) {
> ./dev/nvme/nvme_qpair.c:                         * Here we know that we need to zero cq_head and then negate
> ./dev/nvme/nvme_qpair.c:                         * the phase, which hasn't been assigned if cq_head isn't
> ./dev/nvme/nvme_qpair.c:                        qpair->cq_head = 0;
> ./dev/nvme/nvme_qpair.c:                } else if (qpair->cq_head == 0) {
> ./dev/nvme/nvme_qpair.c:                cpl = qpair->cpl[qpair->cq_head];
> ./dev/nvme/nvme_qpair.c:                         * qpair->cq_head at 1 below.  Later, we re-enter this
> ./dev/nvme/nvme_qpair.c:                         * won't have updated cq_head. Rather than panic again,
> ./dev/nvme/nvme_qpair.c:                        nvme_dump_completion(&qpair->cpl[qpair->cq_head]);
> ./dev/nvme/nvme_qpair.c:                if (++qpair->cq_head == qpair->num_entries) {           /* 1 */
> ./dev/nvme/nvme_qpair.c:                        atomic_store_rel_int(&qpair->cq_head, 0);       /* 2 */
> ./dev/nvme/nvme_qpair.c:                    qpair->cq_hdbl_off, qpair->cq_head);
> ./dev/nvme/nvme_qpair.c:        qpair->sq_head = qpair->sq_tail = qpair->cq_head = 0;
> (The line two above the last is the atomic_store_rel_int use.)
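
[Reassembling those grep hits, the wraparound step reads roughly like the
following. This is from memory of nvme_qpair.c, not a verbatim quote; the
/* 1 */ and /* 2 */ markers are the ones visible in the grep output, and
the phase handling is only approximate:]

	cpl = qpair->cpl[qpair->cq_head];
	/* ... process the completion ... */
	if (++qpair->cq_head == qpair->num_entries) {		/* 1 */
		atomic_store_rel_int(&qpair->cq_head, 0);	/* 2 */
		qpair->phase = !qpair->phase;			/* approximate */
	}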
> 
> An atomic_thread_fence_rel use would get "synchronizes
> with" status from ordinary loads reading something stored
> after the atomic_thread_fence_rel. Such is documented
> in "man atomic". But that is not what this code is doing.
> "man atomic" does not mention ordinary loads getting such
> a status by reading what an atomic_store_rel_int wrote;
> it only references that status for ordinary loads in the
> atomic_thread_fence_rel case.
> 
> So I'm clueless about what is intended to be going on
> relative to that "atomic_store_rel_int(&qpair->cq_head, 0)".
> Overall, the code does not appear to me to match up with the
> aarch64, powerpc64, or powerpc code generation requirements
> to have any matching "synchronizes with" relationships.
> (I'll not list machine instruction sequences here. I'd have
> to look up the detailed sequences. But, as I remember, more
> than a load is involved in the code sequences for the
> acquire side on these types of processors. Nothing in the
> source indicates that the additional code should be
> generated, as far as I can tell.)

Adding notes about the code generation, using fairly
weak sequences (these are for C++/C11 semantics as
I understand them):

(Picking prefix-form for powerpc:)
powerpc Load  Relaxed vs. Acquire: ld vs. ld;cmp;bc;isync
powerpc Fence:            Acquire: lwsync
powerpc Store Relaxed vs. Release: st vs. "Fence: Release";st
powerpc Fence:            Release: lwsync

Note: I am unsure how the above mixes with FreeBSD's
rules for combining plain loads/stores with
acquire/release fences, for example. I've never seen
an instruction-level breakdown of what implements
FreeBSD's specific rules for powerpc64.

aarch64 Load  Relaxed vs. Acquire: LDR vs. LDAR
aarch64 Fence:            Acquire: DMB ISH LD
aarch64 Store Relaxed vs. Release: STR vs. STLR
aarch64 Fence:            Release: DMB ISH

Note: Again I am unsure of the plain-load vs.
acquire-fence rule for FreeBSD. And, here,
similarly for the plain-store vs. release-fence
rule. I've never seen an instruction-level breakdown
of what implements FreeBSD's specific rules for
aarch64.

armv7   Load  Relaxed vs. Acquire: ldr vs. ldr;"Fence: Acquire" (or ldr;teq;beq;isb)
armv7   Fence:            Acquire: dmb ish
armv7   Store Relaxed vs. Release: str vs. "Fence: Release";str
armv7   Fence:            Release: dmb ish

Note: The above fits with FreeBSD's plain-load vs.
acquire-fence and plain-store vs. release-fence rules
because the fences are used to form Store-Release and
Load-Acquire.

(If I gather correctly, FreeBSD does not support aarch32 where it
differs from armv7, for example aarch32's alternative SeqCst
mapping, which does not mix well with the armv7 mapping.
Still, I list aarch32 for reference.)

aarch32 Load  Relaxed vs. Acquire: LDR vs. LDA
aarch32 Fence:            Acquire: DMB ISH LD
aarch32 Store Relaxed vs. Release: STR vs. STL
aarch32 Fence:            Release: DMB ISH

Note: Again I would be unsure of the plain-load
vs. acquire-fence rule for FreeBSD. Similarly
for the plain-store vs. release-fence rule.

I got these code sequences from:

https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

I was unable to use the code sequences to figure out
the intended operation of the:

atomic_store_rel_int(&qpair->cq_head, 0)

absent any atomic_load_acq_int to match it, with everything
else being based on just plain loads/stores of cq_head.
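
For illustration only, the kind of pairing I would expect to establish
"synchronizes with" per my reading of atomic(9). This is a hypothetical
producer/consumer, not code from dev/nvme; "payload", "flag", "compute",
and "use" are invented names:

	/* writer */
	payload = compute();			/* plain store */
	atomic_store_rel_int(&flag, 1);		/* release: orders the plain store before it */

	/* reader */
	while (atomic_load_acq_int(&flag) == 0)
		;				/* acquire load pairs with the release store */
	use(payload);				/* should now observe compute()'s result */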

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

