From: Steffen Maier <maier@linux.vnet.ibm.com>
Subject: scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
Patch-mainline: v4.14-rc1
Git-commit: fdb7cee3b9e3c561502e58137a837341f10cbf8b
References: bnc#1066983, LTC#158494

Description:  zfcp: fix tracing regressions, part 2
Symptom:      On unsuccessful fabric nameserver responses during zfcp
              automatic port scan, zfcp wastes 63 trace records of size 256
              bytes each in the PAY trace area.

              SCSI trace records on task management functions cannot be
              correlated to HBA trace records because all fields related to
              the FSF request are empty (zero).

              In the following two rather rare error cases during task
              management function (TMF) handling as part of the scsi_eh
              callbacks for LUN reset or target reset, zfcp fails to write
              trace records, making such situations difficult to debug:
              fc_block_scsi_eh() returning != 0, i.e. FAST_IO_FAIL, and
              not having gotten an FSF request after the maximum number of
              retry attempts, so that no TMF could be issued and FAILED has
              to be returned.

              SCSI trace records do not show FCP_RSP_INFO in case of TMFs,
              and would only show the first 96 bytes of FCP_SNS_INFO if a
              target sent more sense data.

              The root cause cannot always be determined for FSF requests
              that:
              failed by running into an ERP timeout, or got dismissed
              because a higher level recovery was triggered
              [trace tags "erscf_1" or "erscf_2"], or
              failed by running into an FSF request timeout [trace tag
              "fsrth_1"], or
              failed with a deferred error when using the FCP hardware data
              router.
Problem:      Since v4.9 commit aceeffbb59bb ("zfcp: trace full payload of
              all SAN records (req,resp,iels)"), zfcp only checked for
              FC_NS_FID_LAST being set in fp_flags to cap an automatic port
              scan SAN response. This flag likely does not occur in an
              unsuccessful response.

              v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement of
              zfcp debug features") introduced trace records for TMFs but
              hard-coded NULL for a possibly existing TMF FSF request. The
              scsi_cmnd scribble is also zero or unrelated for the TMF
              request, so a suitable FSF request could not be looked up
              from there either.

              The following two upstream commits introduced new early
              returns without adding a trace record:
              v2.6.35 commit a1dbfddd02d2 ("[SCSI] zfcp: Pass return code
              from fc_block_scsi_eh to scsi eh"),
              v2.6.30 commit 63caf367e1c9 ("[SCSI] zfcp: Improve
              reliability of SCSI eh handlers in zfcp").

              Zfcp only traced capped sense data.

              Zfcp only used protocol status and FSF status to decide on an
              unsuccessful response. However, this is only one of multiple
              possible sources determining a failed FSF request.
Solution:     Cap unsuccessful responses to at least the actual basic CT_IU
              response plus whatever fits the SAN trace record built-in
              "payload" buffer just in case there's trailing information of
              which we would at least see the existence and its beginning.
              In order not to erroneously cap successful responses, we need
              to swap calling the trace function and setting the CT / ELS
              status to success (0).
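
              As an illustration of the cap, here is a minimal hedged
              sketch; the constant PAYLOAD_BUF_LEN is made up and merely
              stands in for the SAN trace record's built-in payload size
              defined in zfcp_dbf.h:

              /* sketch: payload length to trace for a SAN response */
              static size_t san_trace_len(bool successful, size_t resp_len)
              {
                      if (successful)
                              return resp_len;        /* trace full payload */
                      /* unsuccessful: basic CT_IU plus whatever still fits
                       * the record's built-in payload buffer
                       */
                      return min(resp_len, (size_t)PAYLOAD_BUF_LEN);
              }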

              Pass fsf_req to SCSI trace on TMF.
              There are some true cases where we really do not have an FSF
              request:
              "rsl_fai" from zfcp_dbf_scsi_fail_send() called for
              early returns / completions in zfcp_scsi_queuecommand();
              "abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from
              zfcp_scsi_eh_abort_handler() where we did not get as far;
              "lr_nres", "tr_nres" from zfcp_task_mgmt_function() where
              we're successful and do not need to do anything because the
              adapter stopped. For these cases it's correct to pass NULL
              for fsf_req to _zfcp_dbf_scsi().
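
              As a sketch of this call-site change (hedged; the exact
              prototype in the driver may differ in detail), the TMF trace
              helper now receives the FSF request and forwards it where
              previously NULL was hard-coded:

              /* sketch: fsf_req may legitimately be NULL, e.g. for the
               * "lr_nres" / "tr_nres" cases named above where no FSF
               * request was obtained
               */
              void zfcp_dbf_scsi_devreset(char *tag, struct scsi_cmnd *sc,
                                          u8 flag,
                                          struct zfcp_fsf_req *fsf_req)
              {
                      char tmp_tag[ZFCP_DBF_TAG_LEN];

                      if (flag == FCP_TMF_TGT_RESET)
                              memcpy(tmp_tag, "tr_", 3);
                      else
                              memcpy(tmp_tag, "lr_", 3);
                      memcpy(&tmp_tag[3], tag, 4);
                      _zfcp_dbf_scsi(tmp_tag, 1, sc, fsf_req); /* was: NULL */
              }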

              Add missing trace record statements.

              Instead of just capped sense data, trace the full FCP_RSP IU
              which includes any sense data if available.
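
              The traced FCP_RSP IU length can be derived from the response
              flags as sketched below (hedged; based on the FCP_RSP layout
              in include/scsi/fc/fc_fcp.h, the helper name is made up):

              /* sketch: total FCP_RSP IU length = fixed part + extension,
               * plus response info and/or sense data when the corresponding
               * validity flags are set
               */
              static u32 fcp_rsp_iu_len(const struct fcp_resp_with_ext *rsp)
              {
                      u32 len = sizeof(*rsp); /* fcp_resp + fcp_resp_ext */

                      if (rsp->resp.fr_flags & FCP_RSP_LEN_VAL)
                              len += be32_to_cpu(rsp->ext.fr_rsp_len);
                      if (rsp->resp.fr_flags & FCP_SNS_LEN_VAL)
                              len += be32_to_cpu(rsp->ext.fr_sns_len);
                      return len;
              }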

              Check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED or
              ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr"
              at the default trace level.
Reproduction: We typically see such a case for an initiator WWPN which is
              not in any zone.

              Trigger two timeouts in a row for the same single SCSI command
              (READ or WRITE).
              To manually create a similar situation: Stop multipathd so we
              don't get additional path checker TURs. Enable RSCN suppression
              on the SAN switch port beyond the first link, i.e. towards the
              storage target. Disable that switch port. Send one SCSI command
              in the background (because it will block for a while) e.g. via
              "dd if=/dev/mapper/... of=/dev/null count=1 &". After
              <SCSI command timeout> seconds, the command runs into the timeout
              for the first time, gets aborted, and then a retry is submitted.
              The retry is also lost because the switch port is still disabled.
              After 1.5 * <SCSI command timeout> seconds, enable that switch
              port again. After 2 * <SCSI command timeout> seconds, the command
              runs into the timeout for the second time and triggers scsi_eh.
              As first step, scsi_eh sends a LUN reset which should get a
              successful response from the storage target. This response
              causes a SCSI trace record for the TMF.

              Similar to the above paragraph to trigger a LUN reset, but
              triggering the specific case is hardly possible.

              Similar to two paragraphs above for TMF trace records.
              During LUN attachment (automatic or manual) or after
              re-plugging fibres, we typically get some expected sense data.

              No known reproduction for ERP timeout, FSF request timeout,
              or erroneous FSF request using FCP hardware data router.

Upstream-Description:

              scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response

              At the default trace level, we only trace unsuccessful events including
              FSF responses.

              zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to
              decide on an unsuccessful response. However, this is only one of multiple
              possible sources determining a failed struct zfcp_fsf_req.

              An FSF request can also "fail" if its response runs into an ERP timeout
              or if it gets dismissed because a higher level recovery was triggered
              [trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()].
              FSF requests with ERP timeout are:
              FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA,
              FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or
              FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports,
              FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN.
              One example is slow queue processing which can cause follow-on errors,
              e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out.
              In order to see the root cause, we need to see late responses even if the
              channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.
              Example trace records formatted with zfcpdbf from the s390-tools package:

              Timestamp      : ...
              Area           : REC
              Subarea        : 00
              Level          : 1
              Exception      : -
              CPU ID         : ..
              Caller         : ...
              Record ID      : 1
              Tag            : fcegpf1
              LUN            : 0xffffffffffffffff
              WWPN           : 0x<WWPN>
              D_ID           : 0x00<D_ID>
              Adapter status : 0x5400050b
              Port status    : 0x41200000
              LUN status     : 0x00000000
              Ready count    : 0x00000001
              Running count  : 0x...
              ERP want       : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
              ERP need       : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
              |
              Timestamp      : ...				30 seconds later
              Area           : REC
              Subarea        : 00
              Level          : 1
              Exception      : -
              CPU ID         : ..
              Caller         : ...
              Record ID      : 2
              Tag            : erscf_2
              LUN            : 0xffffffffffffffff
              WWPN           : 0x<WWPN>
              D_ID           : 0x00<D_ID>
              Adapter status : 0x5400050b
              Port status    : 0x41200000
              LUN status     : 0x00000000
              Request ID     : 0x<request_ID>
              ERP status     : 0x10000000			ZFCP_STATUS_ERP_TIMEDOUT
              ERP step       : 0x0800				ZFCP_ERP_STEP_PORT_OPENING
              ERP action     : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
              ERP count      : 0x00
              |
              Timestamp      : ...				later than previous record
              Area           : HBA
              Subarea        : 00
              Level          : 5	> default level		=> 3	<= default level
              Exception      : -
              CPU ID         : 00
              Caller         : ...
              Record ID      : 1
              Tag            : fs_qtcb			=> fs_rerr
              Request ID     : 0x<request_ID>
              Request status : 0x00001010			ZFCP_STATUS_FSFREQ_DISMISSED
              						| ZFCP_STATUS_FSFREQ_CLEANUP
              FSF cmnd       : 0x00000005
              FSF sequence no: 0x...
              FSF issued     : ...				> 30 seconds ago
              FSF stat       : 0x00000000			FSF_GOOD
              FSF stat qual  : 00000000 00000000 00000000 00000000
              Prot stat      : 0x00000001			FSF_PROT_GOOD
              Prot stat qual : 00000000 00000000 00000000 00000000
              Port handle    : 0x...
              LUN handle     : 0x00000000
              QTCB log length: ...
              QTCB log info  : ...

              In case of problems detecting that new responses are waiting on the input
              queue, we sooner or later trigger adapter recovery due to an FSF request
              timeout (trace tag "fsrth_1").
              FSF requests with FSF request timeout are:
              typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also
              FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs,
              FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports,
              FSF_QTCB_FCP_CMND for task management function (LUN / target reset).
              One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
              because the channel filled in the response via DMA into the request's QTCB.

              In a theoretical case, injection code can create an erroneous
              FSF request on purpose. If the data router is enabled, it uses
              deferred error reporting.
              A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and
              SAM_STAT_GOOD. But on writing the read data to host memory via DMA,
              it can still fail, e.g. if an intentionally wrong scatter list does not
              provide enough space. Rather than getting an unsuccessful response,
              we get a QDIO activate check which in turn triggers adapter recovery.
              One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
              because the channel filled in the response via DMA into the request's QTCB.
              Example trace records formatted with zfcpdbf from the s390-tools package:

              Timestamp      : ...
              Area           : HBA
              Subarea        : 00
              Level          : 6	> default level		=> 3	<= default level
              Exception      : -
              CPU ID         : ..
              Caller         : ...
              Record ID      : 1
              Tag            : fs_norm			=> fs_rerr
              Request ID     : 0x<request_ID2>
              Request status : 0x00001010			ZFCP_STATUS_FSFREQ_DISMISSED
              						| ZFCP_STATUS_FSFREQ_CLEANUP
              FSF cmnd       : 0x00000001
              FSF sequence no: 0x...
              FSF issued     : ...
              FSF stat       : 0x00000000			FSF_GOOD
              FSF stat qual  : 00000000 00000000 00000000 00000000
              Prot stat      : 0x00000001			FSF_PROT_GOOD
              Prot stat qual : ........ ........ 00000000 00000000
              Port handle    : 0x...
              LUN handle     : 0x...
              |
              Timestamp      : ...
              Area           : SCSI
              Subarea        : 00
              Level          : 3
              Exception      : -
              CPU ID         : ..
              Caller         : ...
              Record ID      : 1
              Tag            : rsl_err
              Request ID     : 0x<request_ID2>
              SCSI ID        : 0x...
              SCSI LUN       : 0x...
              SCSI result    : 0x000e0000			DID_TRANSPORT_DISRUPTED
              SCSI retries   : 0x00
              SCSI allowed   : 0x05
              SCSI scribble  : 0x<request_ID2>
              SCSI opcode    : 28...				Read(10)
              FCP rsp inf cod: 0x00
              FCP rsp IU     : 00000000 00000000 00000000 00000000
                                                       ^^	SAM_STAT_GOOD
                               00000000 00000000

              Only with luck in both above cases, we could see a follow-on trace record
              of an unsuccessful event following a successful but late FSF response with
              FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests
              resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED
              [On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets
              ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure].
              However, the reason for this follow-on trace was invisible because the
              corresponding HBA trace record was missing at the default trace level
              (by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open").

              On adapter recovery, after we had shut down the QDIO queues, we perform
              unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED
              for each pending FSF request in zfcp_fsf_req_dismiss_all().
              In order to find the root cause, we need to see all pseudo responses even
              if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.

              Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED
              or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr".

              It does not matter that there are numerous places which set
              ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response
              early. These cases are based on protocol status != FSF_PROT_GOOD or
              == FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default
              as trace tag "fs_perr" or "fs_ferr" respectively.

              NB: The trace record with tag "fssrh_1" for status read buffers on dismiss
              all remains. zfcp_fsf_req_complete() handles this and returns early.
              All other FSF request types are handled separately and as described above.

              Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
              Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features")
              Fixes: 2e261af84cdb ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels")
              Cc: <stable@vger.kernel.org> #2.6.38+
              Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
              Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
              Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Acked-by: Hannes Reinecke <hare@suse.com>
---
 drivers/s390/scsi/zfcp_dbf.h |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -323,7 +323,11 @@ void zfcp_dbf_hba_fsf_response(struct zf
 {
 	struct fsf_qtcb *qtcb = req->qtcb;
 
-	if ((qtcb->prefix.prot_status != FSF_PROT_GOOD) &&
+	if (unlikely(req->status & (ZFCP_STATUS_FSFREQ_DISMISSED |
+				    ZFCP_STATUS_FSFREQ_ERROR))) {
+		zfcp_dbf_hba_fsf_resp("fs_rerr", 3, req);
+
+	} else if ((qtcb->prefix.prot_status != FSF_PROT_GOOD) &&
 	    (qtcb->prefix.prot_status != FSF_PROT_FSF_STATUS_PRESENTED)) {
 		zfcp_dbf_hba_fsf_resp("fs_perr", 1, req);