[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-scsi
Subject:    Re: question about DID_RESET handling
From:       Peter Chang <dpf () google ! com>
Date:       2010-10-28 22:57:44
Message-ID: AANLkTi=BWdTMGEuMk-u5mQP7VpBd3k2PLKrdq==A_unw () mail ! gmail ! com
[Download RAW message or body]

Le 28 octobre 2010 13:22, weasel@meer.net <weasel@meer.net> a =E9crit :
> scsi_decide_disposition() looking at a command w/ DID_RESET returns
> SUCCESS and then scsi_io_completion() unconditionally retries the
> command. w/ our injector setup to corrupt alternating data frames i can
> induce an 'infinite loop' in the case where the controller's reset works
> and then the scsi READ(10) fails.
>
> anyway, i'd like to avoid this loop regardless of 'odd' controller
> behavior so am asking about the two possible fixes i see and to see if
> there's some other way.
>
> - make scsi_decide_disposition()'s handling of DID_RESET just use the
> =A0maybe_retry logic.
>
> =A0however, (being new to this area) scsi_io_completion() calls
> =A0scsi_end_request() in case some data actually got read. i don't see th=
at
> =A0a request w/ DID_RESET set can have data, but if it really can, then..=
.
>
> - copy retry logic to scsi_io_completion() in the DID_RESET case. using
> =A0cmd->retries seems safe here since it won't ahve gotten updated in
> =A0scsi_decide_disposition(), but maybe a comment there might be nice.

in the hopes that i can get a fix pushed upstream of us, i've attached
patches for my two alternatives listed above.

i realize that my error is now artificially induced, but it did
startoff as a real error where something as simple as 'mount /dev/sdb1
/mnt/foo' went off into the weeds. yes, the controller can do better,
but i think we already have the infrastructure in place to avoid this.

\p

["scsi_decide_disposition.diff" (application/octet-stream)]

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 7ad53fa..449ccc9 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1446,7 +1446,7 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 			return FAILED;
 		}
 	case DID_RESET:
-		return SUCCESS;
+		goto maybe_retry;
 	default:
 		return FAILED;
 	}

["scsi_io_completion.diff" (application/octet-stream)]

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1646fe7..b99c87c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -799,10 +799,13 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 
 	if (host_byte(result) == DID_RESET) {
 		/* Third party bus reset or reset for error recovery
-		 * reasons.  Just retry the command and see what
-		 * happens.
+		 * reasons. scsi_decide_disposition() doesn't handle
+		 * the retry logic because (in theory) we might have
+		 * had something for scsi_end_request() above.
 		 */
-		action = ACTION_RETRY;
+		action = (++cmd->retries) <= cmd->allowed
+			? ACTION_RETRY
+			: ACTION_FAIL;
 	} else if (sense_valid && !sense_deferred) {
 		switch (sshdr.sense_key) {
 		case UNIT_ATTENTION:

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic