|
Hannes Reinecke |
ad5c1b |
From: James Smart <jsmart2021@gmail.com>
|
|
Hannes Reinecke |
ad5c1b |
Date: Thu, 21 Nov 2019 09:59:37 -0800
|
|
Hannes Reinecke |
ad5c1b |
Subject: [PATCH] nvme-fc: fix double-free scenarios on hw queues
|
|
Hannes Reinecke |
ad5c1b |
Git-commit: c869e494ef8b5846d9ba91f1e922c23cd444f0c1
|
|
Hannes Reinecke |
ad5c1b |
Patch-mainline: v5.5-rc2
|
|
Hannes Reinecke |
ad5c1b |
References: bsc#1169045
|
|
Hannes Reinecke |
ad5c1b |
|
|
Hannes Reinecke |
ad5c1b |
If an error occurs on one of the ios used for creating an
|
|
Hannes Reinecke |
ad5c1b |
association, the creating routine has error paths that are
|
|
Hannes Reinecke |
ad5c1b |
invoked by the command failure and the error paths will free
|
|
Hannes Reinecke |
ad5c1b |
up the controller resources created to that point.
|
|
Hannes Reinecke |
ad5c1b |
|
|
Hannes Reinecke |
ad5c1b |
But... the io was ultimately determined by an asynchronous
|
|
Hannes Reinecke |
ad5c1b |
completion routine that detected the error and which
|
|
Hannes Reinecke |
ad5c1b |
unconditionally invokes the error_recovery path which calls
|
|
Hannes Reinecke |
ad5c1b |
delete_association. Delete association deletes all outstanding
|
|
Hannes Reinecke |
ad5c1b |
io then tears down the controller resources. So the
|
|
Hannes Reinecke |
ad5c1b |
create_association thread can be running in parallel with
|
|
Hannes Reinecke |
ad5c1b |
the error_recovery thread. What was seen was the LLDD received
|
|
Hannes Reinecke |
ad5c1b |
a call to delete a queue, causing the LLDD to do a free of a
|
|
Hannes Reinecke |
ad5c1b |
resource, then the transport called the delete queue again
|
|
Hannes Reinecke |
ad5c1b |
causing the driver to repeat the free call. The second free
|
|
Hannes Reinecke |
ad5c1b |
routine corrupted the allocator. The transport shouldn't be
|
|
Hannes Reinecke |
ad5c1b |
making the duplicate call, and the delete queue is just one
|
|
Hannes Reinecke |
ad5c1b |
of the resources being freed.
|
|
Hannes Reinecke |
ad5c1b |
|
|
Hannes Reinecke |
ad5c1b |
To fix, it is realized that the create_association path is
|
|
Hannes Reinecke |
ad5c1b |
completely serialized with one command at a time. So the
|
|
Hannes Reinecke |
ad5c1b |
failed io completion will always be seen by the create_association
|
|
Hannes Reinecke |
ad5c1b |
path and as of the failure, there are no ios to terminate and there
|
|
Hannes Reinecke |
ad5c1b |
is no reason to be manipulating queue freeze states, etc.
|
|
Hannes Reinecke |
ad5c1b |
The serialized condition stays true until the controller is
|
|
Hannes Reinecke |
ad5c1b |
transitioned to the LIVE state. Thus the fix is to change the
|
|
Hannes Reinecke |
ad5c1b |
error recovery path to check the controller state and only
|
|
Hannes Reinecke |
ad5c1b |
invoke the teardown path if not already in the CONNECTING state.
|
|
Hannes Reinecke |
ad5c1b |
|
|
Hannes Reinecke |
ad5c1b |
Reviewed-by: Himanshu Madhani <hmadhani@marvell.com>
|
|
Hannes Reinecke |
ad5c1b |
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
|
|
Hannes Reinecke |
ad5c1b |
Signed-off-by: James Smart <jsmart2021@gmail.com>
|
|
Hannes Reinecke |
ad5c1b |
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Hannes Reinecke |
ad5c1b |
Acked-by: Hannes Reinecke <hare@suse.com>
|
|
Hannes Reinecke |
ad5c1b |
---
|
|
Hannes Reinecke |
ad5c1b |
drivers/nvme/host/fc.c | 18 +++++++++++++++---
|
|
Hannes Reinecke |
ad5c1b |
1 file changed, 15 insertions(+), 3 deletions(-)
|
|
Hannes Reinecke |
ad5c1b |
|
|
Hannes Reinecke |
ad5c1b |
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
|
|
Hannes Reinecke |
ad5c1b |
index d61439f8f5a9..5a70ac395d53 100644
|
|
Hannes Reinecke |
ad5c1b |
--- a/drivers/nvme/host/fc.c
|
|
Hannes Reinecke |
ad5c1b |
+++ b/drivers/nvme/host/fc.c
|
|
Hannes Reinecke |
ad5c1b |
@@ -2913,10 +2913,22 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
|
|
Hannes Reinecke |
ad5c1b |
static void
|
|
Hannes Reinecke |
ad5c1b |
__nvme_fc_terminate_io(struct nvme_fc_ctrl *ctrl)
|
|
Hannes Reinecke |
ad5c1b |
{
|
|
Hannes Reinecke |
ad5c1b |
- nvme_stop_keep_alive(&ctrl->ctrl);
|
|
Hannes Reinecke |
ad5c1b |
+ /*
|
|
Hannes Reinecke |
ad5c1b |
+ * if state is connecting - the error occurred as part of a
|
|
Hannes Reinecke |
ad5c1b |
+ * reconnect attempt. The create_association error paths will
|
|
Hannes Reinecke |
ad5c1b |
+ * clean up any outstanding io.
|
|
Hannes Reinecke |
ad5c1b |
+ *
|
|
Hannes Reinecke |
ad5c1b |
+ * if it's a different state - ensure all pending io is
|
|
Hannes Reinecke |
ad5c1b |
+ * terminated. Given this can delay while waiting for the
|
|
Hannes Reinecke |
ad5c1b |
+ * aborted io to return, we recheck adapter state below
|
|
Hannes Reinecke |
ad5c1b |
+ * before changing state.
|
|
Hannes Reinecke |
ad5c1b |
+ */
|
|
Hannes Reinecke |
ad5c1b |
+ if (ctrl->ctrl.state != NVME_CTRL_CONNECTING) {
|
|
Hannes Reinecke |
ad5c1b |
+ nvme_stop_keep_alive(&ctrl->ctrl);
|
|
Hannes Reinecke |
ad5c1b |
|
|
Hannes Reinecke |
ad5c1b |
- /* will block will waiting for io to terminate */
|
|
Hannes Reinecke |
ad5c1b |
- nvme_fc_delete_association(ctrl);
|
|
Hannes Reinecke |
ad5c1b |
+ /* will block will waiting for io to terminate */
|
|
Hannes Reinecke |
ad5c1b |
+ nvme_fc_delete_association(ctrl);
|
|
Hannes Reinecke |
ad5c1b |
+ }
|
|
Hannes Reinecke |
ad5c1b |
|
|
Hannes Reinecke |
ad5c1b |
if (ctrl->ctrl.state != NVME_CTRL_CONNECTING &&
|
|
Hannes Reinecke |
ad5c1b |
!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING))
|
|
Hannes Reinecke |
ad5c1b |
--
|
|
Hannes Reinecke |
ad5c1b |
2.16.4
|
|
Hannes Reinecke |
ad5c1b |
|