From: YuBiao Wang <YuBiao.Wang@amd.com>
Date: Thu, 16 Mar 2023 11:30:32 +0800
Subject: [PATCH] drm/amdgpu: Force signal hw_fences that are embedded in
 non-sched jobs
References: bsc#1012628
Patch-mainline: 6.2.12
Git-commit: 033c56474acf567a450f8bafca50e0b610f2b716

[ Upstream commit 033c56474acf567a450f8bafca50e0b610f2b716 ]

[Why]
For engines that do not support soft reset, e.g. VCN, an ib test fails
before the mode 1 reset during asic reset. The fences in this case are
never signaled, and the next time we try to free the sa_bo, the kernel
hangs.

[How]
During pre_asic_reset, the driver clears the job fences, after which
each fence's refcount drops to 1. For drm_sched_jobs, the fence is
released in job_free_cb; for non-sched jobs such as ib_test, it is meant
to be released in sa_bo_free, but only once the fence has signaled. So
we have to force signal the non-sched bad job's fence during
pre_asic_reset, or the clear is incomplete.

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
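Note (illustration only, not part of the change): the refcount and
signaling behaviour described in the commit message can be sketched in
userspace C. Every mock_* name below is hypothetical and only stands in
for the kernel's dma_fence, amdgpu_job and sa_bo code; the sketch also
assumes every slot holds a job fence, so it omits the check against
amdgpu_job_fence_ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence: a refcount and a flag. */
struct mock_fence {
	int refcount;
	bool signaled;
};

/* Hypothetical stand-in for struct amdgpu_job with its embedded hw_fence. */
struct mock_job {
	bool has_sched_fence;	/* models job->base.s_fence != NULL */
	struct mock_fence hw_fence;
};

/* Recover the enclosing job from a pointer to its embedded fence. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Drop one reference; the mock never frees memory. */
void mock_fence_put(struct mock_fence *f)
{
	if (f->refcount > 0)
		f->refcount--;
}

/*
 * Models one iteration of the clear loop. With force_signal == false it
 * behaves like the code before this patch; with force_signal == true it
 * signals unsignaled fences of jobs that have no scheduler fence.
 */
void mock_clear_job_fence(struct mock_fence **slot, bool force_signal)
{
	struct mock_fence *old = *slot;
	struct mock_job *job;

	if (!old)
		return;

	job = mock_container_of(old, struct mock_job, hw_fence);
	if (force_signal && !job->has_sched_fence && !old->signaled)
		old->signaled = true;

	*slot = NULL;
	mock_fence_put(old);
}

/*
 * Models the sa_bo free path for non-sched jobs: the last reference is
 * dropped only once the fence has signaled. Returns true on release.
 */
bool mock_sa_bo_try_free(struct mock_fence *f)
{
	if (!f->signaled)
		return false;
	mock_fence_put(f);
	return true;
}
```

With force_signal set, a non-sched job's fence leaves the clear step
signaled with one reference left, so mock_sa_bo_try_free() can drop it.
Without it, the fence stays unsignaled and the last reference is never
released, which mirrors the hang described under [Why].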
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index faff4a3f..f52d0ba9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -678,6 +678,15 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
 		ptr = &ring->fence_drv.fences[i];
 		old = rcu_dereference_protected(*ptr, 1);
 		if (old && old->ops == &amdgpu_job_fence_ops) {
+			struct amdgpu_job *job;
+
+			/* For non-scheduler bad job, i.e. failed ib test, we need to signal
+			 * it right here or we won't be able to track them in fence_drv
+			 * and they will remain unsignaled during sa_bo free.
+			 */
+			job = container_of(old, struct amdgpu_job, hw_fence);
+			if (!job->base.s_fence && !dma_fence_is_signaled(old))
+				dma_fence_signal(old);
 			RCU_INIT_POINTER(*ptr, NULL);
 			dma_fence_put(old);
 		}
--
2.35.3