Blob Blame History Raw
From: Sagi Grimberg <sagi@grimberg.me>
Date: Fri, 30 Aug 2019 11:00:59 -0700
Subject: [PATCH] nvme: fix ns removal hang when failing to revalidate due to a
 transient error
References: bsc#1156419,jsc#SLE-8281
Git-commit: 205da24343013e0bd62475800df79cd053f22326
Patch-mainline: v5.4-rc1

If a controller reset is racing with a namespace revalidation, the
revalidation (admin) I/O will surely fail, but we should not remove the
namespace as we will execute the I/O when the controller is back up.
Same for spurious allocation errors (return -ENOMEM).

Fix this by checking the specific error code in nvme_revalidate_disk and
if it is a transient error (for example non DNR nvme statuses or
a negative ENOMEM as allocation failure), do not remove the namespace as
it will either recover when the controller is back up and schedule
a subsequent scan, or the controller is going away and the namespaces
will be removed anyways.

This fixes a hang namespace scanning racing with a controller reset and
also sporious I/O errors in path failover coditions where the
controller reset is racing with the namespace scan work with multipath
enabled.

Reported-by: Hannes Reinecke  <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index efbe2f360be6..206fcb132689 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1848,7 +1848,13 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 free_id:
 	kfree(id);
 out:
-	if (ret > 0)
+	/*
+	 * Only fail the function if we got a fatal error back from the
+	 * device, otherwise ignore the error and just move on.
+	 */
+	if (ret == -ENOMEM || (ret > 0 && !(ret & NVME_SC_DNR)))
+		ret = 0;
+	else if (ret > 0)
 		ret = blk_status_to_errno(nvme_error_status(ret));
 	return ret;
 }
-- 
2.16.4