Blob Blame History Raw
From 2cbb95883c990d0002a77e13d3278913ab26ad79 Mon Sep 17 00:00:00 2001
From: Jie Hai <haijie1@huawei.com>
Date: Tue, 30 Aug 2022 14:22:47 +0800
Subject: [PATCH] dmaengine: hisilicon: Add multi-thread support for a DMA channel
Git-commit: 2cbb95883c990d0002a77e13d3278913ab26ad79
Patch-mainline: v6.1-rc1
References: git-fixes

When we get a DMA channel and try to use it in multiple threads it
will cause oops and hanging the system.

% echo 100 > /sys/module/dmatest/parameters/threads_per_chan
% echo 100 > /sys/module/dmatest/parameters/iterations
% echo 1 > /sys/module/dmatest/parameters/run
[383493.327077] Unable to handle kernel paging request at virtual
		address dead000000000108
[383493.335103] Mem abort info:
[383493.335103]   ESR = 0x96000044
[383493.335105]   EC = 0x25: DABT (current EL), IL = 32 bits
[383493.335107]   SET = 0, FnV = 0
[383493.335108]   EA = 0, S1PTW = 0
[383493.335109]   FSC = 0x04: level 0 translation fault
[383493.335110] Data abort info:
[383493.335111]   ISV = 0, ISS = 0x00000044
[383493.364739]   CM = 0, WnR = 1
[383493.367793] [dead000000000108] address between user and kernel
		address ranges
[383493.375021] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[383493.437574] CPU: 63 PID: 27895 Comm: dma0chan0-copy2 Kdump:
		loaded Tainted: GO 5.17.0-rc4+ #2
[383493.457851] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT
		-SSBS BTYPE=--)
[383493.465331] pc : vchan_tx_submit+0x64/0xa0
[383493.469957] lr : vchan_tx_submit+0x34/0xa0

This occurs because the transmission timed out, and that's due
to data race. Each thread rewrite channels's descriptor as soon as
device_issue_pending is called. It leads to the situation that
the driver thinks that it uses the right descriptor in interrupt
handler while channels's descriptor has been changed by other
thread. The descriptor which in fact reported interrupt will not
be handled any more, as well as its tx->callback.
That's why timeout reports.

With current fixes channels' descriptor changes it's value only
when it has been used. A new descriptor is acquired from
vc->desc_issued queue that is already filled with descriptors
that are ready to be sent. Threads have no direct access to DMA
channel descriptor. In case of channel's descriptor is busy, try
to submit to HW again when a descriptor is completed. In this case,
vc->desc_issued may be empty when hisi_dma_start_transfer is called,
so delete error reporting on this. Now it is just possible to queue
a descriptor for further processing.

Fixes: e9f08b65250d ("dmaengine: hisilicon: Add Kunpeng DMA engine support")
Signed-off-by: Jie Hai <haijie1@huawei.com>
Acked-by: Zhou Wang <wangzhou1@hisilicon.com>
Link: https://lore.kernel.org/r/20220830062251.52993-4-haijie1@huawei.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Takashi Iwai <tiwai@suse.de>

---
 drivers/dma/hisi_dma.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/hisi_dma.c b/drivers/dma/hisi_dma.c
index 837f7e4adfa6..0233b42143c7 100644
--- a/drivers/dma/hisi_dma.c
+++ b/drivers/dma/hisi_dma.c
@@ -271,7 +271,6 @@ static void hisi_dma_start_transfer(struct hisi_dma_chan *chan)
 
 	vd = vchan_next_desc(&chan->vc);
 	if (!vd) {
-		dev_err(&hdma_dev->pdev->dev, "no issued task!\n");
 		chan->desc = NULL;
 		return;
 	}
@@ -303,7 +302,7 @@ static void hisi_dma_issue_pending(struct dma_chan *c)
 
 	spin_lock_irqsave(&chan->vc.lock, flags);
 
-	if (vchan_issue_pending(&chan->vc))
+	if (vchan_issue_pending(&chan->vc) && !chan->desc)
 		hisi_dma_start_transfer(chan);
 
 	spin_unlock_irqrestore(&chan->vc.lock, flags);
@@ -441,11 +440,10 @@ static irqreturn_t hisi_dma_irq(int irq, void *data)
 				    chan->qp_num, chan->cq_head);
 		if (FIELD_GET(STATUS_MASK, cqe->w0) == STATUS_SUCC) {
 			vchan_cookie_complete(&desc->vd);
+			hisi_dma_start_transfer(chan);
 		} else {
 			dev_err(&hdma_dev->pdev->dev, "task error!\n");
 		}
-
-		chan->desc = NULL;
 	}
 
 	spin_unlock(&chan->vc.lock);
-- 
2.35.3