From: Steve Wise <swise@opengridcomputing.com>
Date: Wed, 26 May 2021 13:25:00 +0200
Subject: RDMA/addr: create addr_wq with WQ_MEM_RECLAIM flag
Patch-mainline: Never, no fix yet available upstream, revert offending change
References: bsc#1183346

While running NVMe/oF wire unplug tests, we hit this warning in
kernel/workqueue.c:check_flush_dependency():

WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
		      (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
	  "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
	  worker->current_pwq->wq->name, worker->current_func,
	  target_wq->name, target_func);

I think this means we're flushing a workqueue that doesn't have
WQ_MEM_RECLAIM set from a workqueue context that does have it set.

rdma_addr_cancel(), which does the flushing, flushes addr_wq, which
doesn't have WQ_MEM_RECLAIM set.  Yet rdma_addr_cancel()
is called from the nvme host connection timeout/reconnect work, whose
workqueue does have WQ_MEM_RECLAIM set.

So set WQ_MEM_RECLAIM on the addr_wq workqueue.

This only silences the warning; it does not fix the underlying
problem. Upstream is aware of the problem, but no fix is available
yet. To avoid a flood of support requests, undo the offending commit.

Link: https://patchwork.kernel.org/project/linux-rdma/patch/5f5a1e4e90f3625cea57ffa79fc0e5bcb7efe09d.1548963371.git.swise@opengridcomputing.com/
Fixes: 39baf10310e6 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
[dwagner: Updated commit message]
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/infiniband/core/addr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 0abce004a959..c92d25c97ba1 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -871,7 +871,7 @@ static struct notifier_block nb = {
 
 int addr_init(void)
 {
-	addr_wq = alloc_ordered_workqueue("ib_addr", 0);
+	addr_wq = alloc_ordered_workqueue("ib_addr", WQ_MEM_RECLAIM);
 	if (!addr_wq)
 		return -ENOMEM;
 
-- 
2.29.2