List: openbsd-tech
Subject: ahci ncq error recovery
From: Jonathan Matthew <jonathan () d14n ! org>
Date: 2017-02-27 23:16:34
Message-ID: 20170227231634.GA1218 () mild ! embarrassm ! net
Various people have reported kernel diagnostic assertion
"ccb->ccb_xa.state == ATA_S_ONCHIP" panics with ahci. In short, this happens
when a queued command fails, we ask the device which command failed, and it
gives us the wrong answer. The ccb_xa.state assertion then fails because the
slot the device named does not hold an active command.
For non-queued commands, we handle this by failing all active commands (since
r1.157 in 2010), and every other driver I've looked at does this too for both
queued and non-queued commands, so I think it makes sense to handle queued
command errors the same way.
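
To make the policy concrete, here is a minimal standalone sketch of the decision described above. It is not the driver code: struct port_model, the slot_state values, fail_all() and handle_ncq_error() are hypothetical names made up for illustration, and the "trusted" branch is simplified to failing only the named slot.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS 32	/* one bit per command slot in the active mask */

/* Hypothetical stand-ins for the ccb states; not the driver's types. */
enum slot_state { SLOT_IDLE, SLOT_ONCHIP, SLOT_ERROR };

struct port_model {
	enum slot_state state[NSLOTS];
};

/* Fail every command that is set in `active` and is on the chip. */
static int
fail_all(struct port_model *p, uint32_t active)
{
	int slot, failed = 0;

	for (slot = 0; slot < NSLOTS; slot++) {
		if ((active & (1U << slot)) &&
		    p->state[slot] == SLOT_ONCHIP) {
			p->state[slot] = SLOT_ERROR;
			failed++;
		}
	}
	return failed;
}

/*
 * Handle a queued command error.  `err_slot` is the slot the device
 * reported; `active` is the saved mask of issued commands.
 * Returns the number of commands that were failed.
 */
static int
handle_ncq_error(struct port_model *p, uint32_t active, int err_slot)
{
	/* The device named a slot that is not active: do not trust it. */
	if (err_slot < 0 || err_slot >= NSLOTS ||
	    p->state[err_slot] != SLOT_ONCHIP)
		return fail_all(p, active);

	/* This is the condition the diagnostic assertion checks. */
	assert(p->state[err_slot] == SLOT_ONCHIP);
	p->state[err_slot] = SLOT_ERROR;
	return 1;
}
```

With slots 1 and 3 active, a reported error slot of 5 fails both commands, while a reported error slot of 3 fails only slot 3 and never trips the assertion.
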
This diff came out of the most recent thread about the panic on bugs@, where
it seems to have made a slight improvement. It has been in snaps for over a
week, so I think it should go in.
ok?
Index: ahci.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/ahci.c,v
retrieving revision 1.28
diff -u -p -u -p -r1.28 ahci.c
--- ahci.c 2 Oct 2016 18:56:05 -0000 1.28
+++ ahci.c 27 Feb 2017 07:10:40 -0000
@@ -2158,6 +2158,12 @@ ahci_port_intr(struct ahci_port *ap, u_i
 			    PORTNAME(ap), err_slot);
 			ccb = &ap->ap_ccbs[err_slot];
+			if (ccb->ccb_xa.state != ATA_S_ONCHIP) {
+				printf("%s: NCQ errored slot %d is idle"
+				    " (%08x active)\n", PORTNAME(ap), err_slot,
+				    ci_saved);
+				goto failall;
+			}
 		} else {
 			/* Didn't reset, could gather extended info from log. */
 		}