|
Hannes Reinecke |
d4c391 |
From: Steffen Maier <maier@linux.vnet.ibm.com>
Subject: scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
Patch-mainline: v4.14-rc1
Git-commit: 71b8e45da51a7b64a23378221c0a5868bd79da4f
References: bnc#1066983, LTC#158493

Description:  zfcp: fix queuecommand for scsi_eh commands when DIX enabled
Symptom:      Prerequisites: zfcp.dif=1 and a T10-DIF SCSI disk for which the
              SCSI disk driver (sd) enabled DIX. The same single SCSI command
              (READ or WRITE) must run into two timeouts without any successful
              command response in between.
              This triggers SCSI error handling (scsi_eh). Each test unit ready
              (TUR) SCSI command as part of scsi_eh fails in zfcp, causing a
              QDIO problem with kernel message:
              "zfcp.e78dec: <FCP_device_bus_ID>: A QDIO problem occurred"
              (zfcpdbf REC trace tag "qdires1").
              As a result, scsi_eh unnecessarily escalates successful LUN reset
              (zfcpdbf SCSI trace tag "lr_okay") to successful target reset
              (zfcpdbf SCSI trace tag "tr_okay") to successful host reset
              (zfcpdbf SCSI trace tag "schrh_1") which finally gives up by
              setting affected SCSI devices offline with kernel message:
              "sd H:0:T:L: Device offlined - not ready after error recovery"
Problem:      Scsi_eh re-uses regular SCSI commands in scsi_send_eh_cmnd().
              Such a command can have DIX protection data. Since commit
              db007fc5e20c ("[SCSI] Command protection operation"),
              scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it
              to SCSI_PROT_NORMAL. A re-used command can still have
              (scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests
              to the FCP channel hardware, making the TUR scsi_eh command fail.
              This causes scsi_eh_test_devices() to have (finish_cmds == 0)
              [finish_cmds := SCSI device is not online or scsi_eh_tur() did
              not fail]. So regular SCSI commands that caused or were affected
              by scsi_eh are moved to work_q and scsi_eh_test_devices() itself
              returns false. This escalates scsi_eh including a final fail in
              scsi_eh_ready_devs() causing scsi_eh_offline_sdevs().
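The decision described in the Problem paragraph can be modeled in a few
lines of C. This is a simplified sketch and not the kernel source:
model_finish_cmds() and model_test_devices_done() are hypothetical names,
the inputs are plain booleans instead of SCSI structures, and any
start-unit handling of the real function is left out.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the per-command decision: the command counts as
 * finished if the device is not online or the TUR did not fail.
 */
static bool model_finish_cmds(bool device_online, bool tur_failed)
{
	return !device_online || !tur_failed;
}

/*
 * Hypothetical model of the outcome for one command: an unfinished
 * command is moved to work_q, so work_q is no longer empty and the
 * caller has to escalate. Returns true if nothing was left on work_q.
 */
static bool model_test_devices_done(bool device_online, bool tur_failed)
{
	bool work_q_empty = true;

	if (!model_finish_cmds(device_online, tur_failed))
		work_q_empty = false;	/* command moved to work_q */

	return work_q_empty;
}
```

With an online device and a failing TUR, which is the case this patch is
about, model_test_devices_done(true, true) returns false and error
handling moves on to the next escalation level.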
Solution:     Other FCP LLDDs such as qla2xxx and lpfc shield their
              queuecommand() to only access any of scsi_prot_sg...() if
              (scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).
              Do the same thing for zfcp, which introduced DIX support with
              commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support
              for DIF/DIX").
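The difference between the old and the new check can be sketched with a
small stand-alone model. Everything named model_* or MODEL_* below is a
hypothetical stand-in invented for illustration; the real code uses
struct scsi_cmnd, scsi_get_prot_op() and scsi_prot_sg_count().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the protection operation values. */
enum model_prot_op { MODEL_PROT_NORMAL = 0, MODEL_PROT_DIX = 1 };

/* Hypothetical stand-in for the two command fields that matter here. */
struct model_cmnd {
	enum model_prot_op prot_op;
	unsigned int prot_sg_count;	/* protection scatter-gather entries */
};

/*
 * Models what the description says about scsi_eh_prep_cmnd(): prot_op is
 * saved and reset, while the protection scatter-gather count stays as is.
 */
static enum model_prot_op model_eh_prep(struct model_cmnd *cmd)
{
	enum model_prot_op saved = cmd->prot_op;

	cmd->prot_op = MODEL_PROT_NORMAL;
	return saved;
}

/* Check before the fix: looks at the scatter-gather count only. */
static bool model_wants_dix_old(const struct model_cmnd *cmd)
{
	return cmd->prot_sg_count != 0;
}

/* Check after the fix: the protection operation must be set as well. */
static bool model_wants_dix_fixed(const struct model_cmnd *cmd)
{
	return cmd->prot_op != MODEL_PROT_NORMAL && cmd->prot_sg_count != 0;
}
```

For a regular DIX command both checks agree. After model_eh_prep() the
old check still reports protection data for the re-used command, while
the fixed check does not, so no DIX setup would be attempted for the TUR.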
Reproduction: With zfcp.dif=1 and a T10-DIF SCSI disk for which the SCSI disk
              driver (sd) enabled DIX, trigger two timeouts in a row for the
              same single SCSI command (READ or WRITE).
              To manually create a similar situation: Stop multipathd so we
              don't get additional path checker TURs. Enable RSCN suppression
              on the SAN switch port beyond the first link, i.e. towards the
              storage target. Disable that switch port. Send one SCSI command
              in the background (because it will block for a while) e.g. via
              "dd if=/dev/mapper/... of=/dev/null count=1 &". After
              <SCSI command timeout> seconds, the command runs into the timeout
              for the first time, gets aborted, and then a retry is submitted.
              The retry is also lost because the switch port is still disabled.
              After 1.5 * <SCSI command timeout> seconds, enable that switch
              port again. After 2 * <SCSI command timeout> seconds, the command
              runs into the timeout for the second time and triggers scsi_eh.
              As a first step, scsi_eh sends a LUN reset which should get a
              successful response from the storage target. The subsequent
              scsi_eh TUR is only successful with this fix.

Upstream-Description:

scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled

Since commit db007fc5e20c ("[SCSI] Command protection operation"),
scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to
SCSI_PROT_NORMAL.
Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand()
to only access any of scsi_prot_sg...() if
(scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).

Do the same thing for zfcp, which introduced DIX support with
commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for
DIF/DIX").

Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp,
because the regular SCSI command with DIX protection data, that scsi_eh
re-uses in scsi_send_eh_cmnd(), of course still has
(scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the
FCP channel hardware.

This causes scsi_eh_test_devices() to have (finish_cmds == 0)
[finish_cmds := SCSI device is not online or scsi_eh_tur() did not fail]
so regular SCSI commands that caused or were affected by scsi_eh
are moved to work_q and scsi_eh_test_devices() itself returns false.
In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs()
beyond host reset to finally scsi_eh_offline_sdevs()
which sets affected SCSI devices offline with the following kernel message:

"kernel: sd H:0:T:L: Device offlined - not ready after error recovery"

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX")
Cc: <stable@vger.kernel.org> #2.6.36+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Acked-by: Hannes Reinecke <hare@suse.com>
---
 drivers/s390/scsi/zfcp_fsf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -2258,7 +2258,8 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *
 	fcp_cmnd = (struct fcp_cmnd *) &req->qtcb->bottom.io.fcp_cmnd;
 	zfcp_fc_scsi_to_fcp(fcp_cmnd, scsi_cmnd, 0);
 
-	if (scsi_prot_sg_count(scsi_cmnd)) {
+	if ((scsi_get_prot_op(scsi_cmnd) != SCSI_PROT_NORMAL) &&
+	    scsi_prot_sg_count(scsi_cmnd)) {
 		zfcp_qdio_set_data_div(qdio, &req->qdio_req,
 				       scsi_prot_sg_count(scsi_cmnd));
 		retval = zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req,