From e6fd675e470d33da2538c899e487054ac5a75b6e Mon Sep 17 00:00:00 2001
From: Xiaogang Chen <xiaogang.chen@amd.com>
Date: Wed, 20 Sep 2023 11:02:51 -0500
Subject: drm/amdkfd: fix some race conditions in vram buffer alloc/free of svm
code
Git-commit: 7bfaa160caed8192f8262c4638f552cad94bcf5a
Patch-mainline: v6.7-rc1
References: jsc#PED-3527 jsc#PED-5475 jsc#PED-6068 jsc#PED-6070 jsc#PED-6116 jsc#PED-6120 jsc#PED-5065 jsc#PED-5477 jsc#PED-5511 jsc#PED-6041 jsc#PED-6069 jsc#PED-6071
This patch fixes:
1: The reference count of a prange's svm_bo can be decreased by an async call
from hmm. When waiting for the prange's svm_bo to be released, we should also
wait for prange->svm_bo to become NULL; otherwise prange->svm_bo may be set to
NULL after a new vram buffer has been allocated.
2: While waiting in a loop for the prange's svm_bo to be released, we should
reschedule the current task to give other tasks an opportunity to run,
especially the workqueue task that handles the svm_bo ref release; otherwise
we may hit a soft lockup.
Signed-off-by: Xiaogang.Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Patrik Jakobsson <pjakobsson@suse.de>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
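Note (illustration only, not part of the change): the two fixes described in
the commit message can be sketched in user space. This is a minimal analogue,
not the kernel code. struct fake_range, release_worker(), wait_for_release()
and run_demo() are hypothetical names, sched_yield() stands in for
cond_resched(), and a pthread stands in for the workqueue task. It assumes the
release side unlinks the range first and clears the pointer afterwards, as the
commit message implies.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct svm_range: only the two pieces of
 * state that the wait loop inspects. */
struct fake_range {
	atomic_int on_bo_list;   /* nonzero while linked on the old BO's range list */
	_Atomic(void *) svm_bo;  /* old buffer pointer, cleared last by the release side */
};

/* Stand-in for the asynchronous release work: unlink first, then clear
 * the pointer. The yield widens the window between the two steps. */
static void *release_worker(void *arg)
{
	struct fake_range *r = arg;

	atomic_store(&r->on_bo_list, 0);
	sched_yield();
	atomic_store(&r->svm_bo, NULL);
	return NULL;
}

/* Waiter: checks BOTH conditions (fix 1) and gives up the CPU on every
 * iteration so the releasing thread can make progress (fix 2). */
static void wait_for_release(struct fake_range *r)
{
	while (atomic_load(&r->on_bo_list) || atomic_load(&r->svm_bo) != NULL)
		sched_yield();
}

/* Returns 0 if, once the wait loop exits, the range is unlinked and the
 * old pointer is NULL; -1 if the worker thread could not be started. */
int run_demo(void)
{
	static int old_bo;       /* dummy object playing the old buffer */
	struct fake_range r;
	pthread_t tid;
	int ok;

	atomic_init(&r.on_bo_list, 1);
	atomic_init(&r.svm_bo, (void *)&old_bo);

	if (pthread_create(&tid, NULL, release_worker, &r) != 0)
		return -1;

	wait_for_release(&r);
	ok = atomic_load(&r.on_bo_list) == 0 && atomic_load(&r.svm_bo) == NULL;

	pthread_join(tid, NULL);
	return ok ? 0 : 1;
}
```

Calling run_demo() from a main() and building with -pthread should be enough
to try it. If the waiter checked only on_bo_list, it could return while svm_bo
still held the old pointer, which is the window the first fix closes.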
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 136a7a969d48..56a2afdfd3d4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -504,11 +504,11 @@ svm_range_validate_svm_bo(struct kfd_node *node, struct svm_range *prange)
/* We need a new svm_bo. Spin-loop to wait for concurrent
* svm_range_bo_release to finish removing this range from
- * its range list. After this, it is safe to reuse the
- * svm_bo pointer and svm_bo_list head.
+ * its range list and set prange->svm_bo to null. After this,
+ * it is safe to reuse the svm_bo pointer and svm_bo_list head.
*/
- while (!list_empty_careful(&prange->svm_bo_list))
- ;
+ while (!list_empty_careful(&prange->svm_bo_list) || prange->svm_bo)
+ cond_resched();
return false;
}
--
2.43.0