From: Steffen Maier <maier@linux.vnet.ibm.com>
Subject: scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
Patch-mainline: v4.14-rc1
Git-commit: 71b8e45da51a7b64a23378221c0a5868bd79da4f
References: bnc#1066983, LTC#158493

Description:  zfcp: fix queuecommand for scsi_eh commands when DIX enabled
Symptom:      Prerequisites: zfcp.dif=1 and a T10-DIF SCSI disk for which the
              SCSI disk driver (sd) enabled DIX. The same single SCSI command
              (READ or WRITE) must run into two timeouts without any successful
              command response in between.
              This triggers SCSI error handling (scsi_eh). Each test unit ready
              (TUR) SCSI command issued as part of scsi_eh fails in zfcp,
              causing a QDIO problem with kernel message:
              "zfcp.e78dec: <FCP_device_bus_ID>: A QDIO problem occurred"
              (zfcpdbf REC trace tag "qdires1").
              As a result, scsi_eh unnecessarily escalates from a successful
              LUN reset (zfcpdbf SCSI trace tag "lr_okay") to a successful
              target reset (zfcpdbf SCSI trace tag "tr_okay") to a successful
              host reset (zfcpdbf SCSI trace tag "schrh_1"), and finally gives
              up by setting affected SCSI devices offline with kernel message:
              "sd H:0:T:L: Device offlined - not ready after error recovery"
Problem:      scsi_eh re-uses regular SCSI commands in scsi_send_eh_cmnd().
              Such a command can have DIX protection data. Since commit
              db007fc5e20c ("[SCSI] Command protection operation"),
              scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it
              to SCSI_PROT_NORMAL. A re-used command can still have
              (scsi_prot_sg_count() != 0), so zfcp sends down bogus requests
              to the FCP channel hardware, making the TUR scsi_eh command fail.
              This causes scsi_eh_test_devices() to have (finish_cmds == 0)
              [finish_cmds is nonzero only if the SCSI device is not online or
              scsi_eh_tur() did not fail]. So the regular SCSI commands that
              caused or were affected by scsi_eh are moved to work_q and
              scsi_eh_test_devices() itself returns false. This escalates
              scsi_eh up to a final failure in scsi_eh_ready_devs(), which
              calls scsi_eh_offline_sdevs().
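Sketch:       The decision described above can be modelled outside the kernel
              as follows. This is a simplified illustration, not the kernel
              code: the function names and boolean parameters are
              hypothetical stand-ins for scsi_device_online() and a failed
              scsi_eh_tur(), and other paths inside scsi_eh_test_devices()
              are ignored.

```c
#include <stdbool.h>

/*
 * Simplified model of the finish_cmds decision; not the kernel code.
 * device_online stands in for scsi_device_online(), tur_failed for a
 * failed scsi_eh_tur(). Both are hypothetical inputs.
 */
bool eh_finish_cmds(bool device_online, bool tur_failed)
{
	/* nonzero only if the device is not online or the TUR did not fail */
	return !device_online || !tur_failed;
}

/*
 * Returns true if error handling can stop at the current level.
 * Returns false if the commands stay on work_q, which in the scenario
 * above makes scsi_eh escalate to the next recovery step.
 */
bool eh_level_done(bool device_online, bool tur_failed)
{
	return eh_finish_cmds(device_online, tur_failed);
}
```

              With the device online and the TUR failing, eh_level_done()
              returns false, which corresponds to the escalation seen in
              the symptom.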
Solution:     Other FCP LLDDs such as qla2xxx and lpfc shield their
              queuecommand() to only access any of scsi_prot_sg...() if
              (scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).
              Do the same thing for zfcp, which introduced DIX support with
              commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support
              for DIF/DIX").
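Sketch:       The effect of the additional check can be illustrated with a
              small userspace model. All types and names below are
              hypothetical stand-ins (they are not the kernel definitions);
              the model only assumes what is stated above, namely that
              scsi_eh_prep_cmnd() resets prot_op while the protection
              scatter-gather count of the re-used command stays non-zero.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel definitions; values are arbitrary. */
enum model_prot_op { MODEL_PROT_NORMAL = 0, MODEL_PROT_DIX = 1 };

struct model_cmnd {
	enum model_prot_op prot_op;	/* models scsi_get_prot_op(cmd) */
	unsigned int prot_sg_count;	/* models scsi_prot_sg_count(cmd) */
};

/* Models the described part of scsi_eh_prep_cmnd(): save prot_op, reset it. */
enum model_prot_op model_eh_prep(struct model_cmnd *cmd)
{
	enum model_prot_op saved = cmd->prot_op;

	cmd->prot_op = MODEL_PROT_NORMAL;	/* prot_sg_count stays untouched */
	return saved;
}

/* Check before the fix: protection data is used whenever an sg list exists. */
bool uses_prot_data_old(const struct model_cmnd *cmd)
{
	return cmd->prot_sg_count != 0;
}

/* Check with the fix: also require a non-normal protection operation. */
bool uses_prot_data_new(const struct model_cmnd *cmd)
{
	return cmd->prot_op != MODEL_PROT_NORMAL && cmd->prot_sg_count != 0;
}
```

              For a command that went through model_eh_prep(), the old check
              still reports protection data while the new check does not, so
              the scsi_eh TUR would be sent without the stale DIX setup.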
Reproduction: With zfcp.dif=1 and a T10-DIF SCSI disk for which the SCSI disk
              driver (sd) enabled DIX, trigger two timeouts in a row for the
              same single SCSI command (READ or WRITE).
              To manually create a similar situation: Stop multipathd so we
              don't get additional path checker TURs. Enable RSCN suppression
              on the SAN switch port beyond the first link, i.e. towards the
              storage target. Disable that switch port. Send one SCSI command
              in the background (because it will block for a while) e.g. via
              "dd if=/dev/mapper/... of=/dev/null count=1 &". After
              <SCSI command timeout> seconds, the command runs into the timeout
              for the first time, gets aborted, and then a retry is submitted.
              The retry is also lost because the switch port is still disabled.
              After 1.5 * <SCSI command timeout> seconds, enable that switch
              port again. After 2 * <SCSI command timeout> seconds, the command
              runs into the timeout for the second time and triggers scsi_eh.
              As a first step, scsi_eh sends a LUN reset which should get a
              successful response from the storage target. The subsequent
              scsi_eh TUR is only successful with this fix.

Upstream-Description:

              scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled

              Since commit db007fc5e20c ("[SCSI] Command protection operation"),
              scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to
              SCSI_PROT_NORMAL.
              Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand()
              to only access any of scsi_prot_sg...() if
              (scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).

              Do the same thing for zfcp, which introduced DIX support with
              commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for
              DIF/DIX").

              Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp,
              because the regular SCSI command with DIX protection data, that scsi_eh
              re-uses in scsi_send_eh_cmnd(), of course still has
              (scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the
              FCP channel hardware.

              This causes scsi_eh_test_devices() to have (finish_cmds == 0)
              [not SCSI device is online or not scsi_eh_tur() failed]
              so regular SCSI commands, that caused / were affected by scsi_eh,
              are moved to work_q and scsi_eh_test_devices() itself returns false.
              In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs()
              beyond host reset to finally scsi_eh_offline_sdevs()
              which sets affected SCSI devices offline with the following kernel message:

              "kernel: sd H:0:T:L: Device offlined - not ready after error recovery"

              Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
              Fixes: ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX")
              Cc: <stable@vger.kernel.org> #2.6.36+
              Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
              Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
              Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Acked-by: Hannes Reinecke <hare@suse.com>
---
 drivers/s390/scsi/zfcp_fsf.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -2258,7 +2258,8 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *
 	fcp_cmnd = (struct fcp_cmnd *) &req->qtcb->bottom.io.fcp_cmnd;
 	zfcp_fc_scsi_to_fcp(fcp_cmnd, scsi_cmnd, 0);
 
-	if (scsi_prot_sg_count(scsi_cmnd)) {
+	if ((scsi_get_prot_op(scsi_cmnd) != SCSI_PROT_NORMAL) &&
+	    scsi_prot_sg_count(scsi_cmnd)) {
 		zfcp_qdio_set_data_div(qdio, &req->qdio_req,
 				       scsi_prot_sg_count(scsi_cmnd));
 		retval = zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req,