From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Date: Tue, 4 Apr 2023 23:47:16 +0800
Subject: [PATCH] mm/swap: fix swap_info_struct race between swapoff and
 get_swap_pages()
References: bsc#1012628
Patch-mainline: 6.2.11
Git-commit: 6fe7d6b992113719e96744d974212df3fcddc76c

commit 6fe7d6b992113719e96744d974212df3fcddc76c upstream.

The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption.  The only place we have found where this
happens is in the swapoff path.  This case can be described as below:

core 0                       core 1
swapoff

del_from_avail_list(si)      waiting

try lock si->lock            acquire swap_avail_lock
                             and re-add si into
                             swap_avail_head

acquire si->lock, unaware that si has already been re-added, and
continue to clear SWP_WRITEOK, etc.
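
The interleaving above can be replayed deterministically in a small
user-space model.  This is a hedged sketch, not kernel code: struct
model_si, model_readd() and replay_buggy_interleaving() are hypothetical
names, locks are not modelled, and it assumes (as the diagram implies)
that the re-adding path only re-inserts si while SWP_WRITEOK is set.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of a swap device: "writeok" stands in for
 * SWP_WRITEOK and "on_avail_list" for membership in swap_avail_heads[].
 * Locks are not modelled; the interleaving is replayed by hand.
 */
struct model_si {
	bool writeok;
	bool on_avail_list;
};

/* Core 1: re-add si only if it is still writable. */
static void model_readd(struct model_si *si)
{
	if (si->writeok)
		si->on_avail_list = true;
}

/*
 * Replays the buggy order from the diagram: core 0 deletes si from the
 * available list before taking si->lock, core 1 slips in and re-adds it,
 * then core 0 clears writeok.  Returns true if si is left on the list.
 */
static bool replay_buggy_interleaving(void)
{
	struct model_si si = { .writeok = true, .on_avail_list = true };

	si.on_avail_list = false;	/* core 0: del_from_avail_list(si) */
	model_readd(&si);		/* core 1: re-add while writeok is set */
	si.writeok = false;		/* core 0: clear SWP_WRITEOK under si->lock */

	return si.on_avail_list;	/* stale entry for a swapped-off device */
}
```

replay_buggy_interleaving() returns true: si stays on the available list
even though SWP_WRITEOK has been cleared.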

A flood of warning messages can easily be triggered inside
get_swap_pages() in certain cases, for example by calling
madvise(MADV_PAGEOUT) concurrently on blocks of touched memory while
running many swapon/swapoff operations (e.g. stress-ng-swap).

In the worst case, however, the scenario above can cause a panic.
swapoff() may leave the memory used by si in swap_info[] after the swap
has been turned off, so the memory corruption does not show up
immediately, only once that si is reused and reset for a new swap in the
swapon path.  The resulting panic message (with CONFIG_PLIST_DEBUG
enabled):

------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 plist_check_prev_next_node+0x50/0x70
 plist_check_head+0x80/0xf0
 plist_add+0x28/0x140
 add_to_avail_list+0x9c/0xf0
 _enable_swap_info+0x78/0xb4
 __do_sys_swapon+0x918/0xa10
 __arm64_sys_swapon+0x20/0x30
 el0_svc_common+0x8c/0x220
 do_el0_svc+0x2c/0x90
 el0_svc+0x1c/0x30
 el0_sync_handler+0xa8/0xb0
 el0_sync+0x148/0x180
irq event stamp: 2082270
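
The plist debug check reported above at lib/plist.c:60 verifies that
adjacent nodes point at each other.  One plausible reading of the
'next:' line, whose n and p both equal the node itself, is a node that
was re-initialised by swapon while still linked into the list.  The
sketch below is a hypothetical user-space model of that using a
kernel-style circular list; node, node_init(), node_add_tail(),
links_consistent() and reinit_while_linked_is_consistent() are
illustrative names, not kernel APIs, and the checked invariant
(n->prev == p and p->next == n) is assumed to match the kernel check.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct node {
	struct node *next, *prev;
};

static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

static void node_add_tail(struct node *n, struct node *head)
{
	struct node *last = head->prev;

	n->prev = last;
	n->next = head;
	last->next = n;
	head->prev = n;
}

/* Assumed invariant of the plist debug check: p and n point at each other. */
static bool links_consistent(const struct node *p, const struct node *n)
{
	return n->prev == p && p->next == n;
}

/*
 * Hypothetical replay: a node that was never unlinked (the stale si) is
 * re-initialised, as a fresh swapon would do, while still on the list.
 * Returns whether the head <-> node link is still consistent afterwards.
 */
static bool reinit_while_linked_is_consistent(void)
{
	struct node head, stale;

	node_init(&head);
	node_add_tail(&stale, &head);
	assert(links_consistent(&head, &stale));

	node_init(&stale);	/* swapon re-initialises the reused entry */

	/* head.next still points at stale, but stale.prev now points at itself */
	return links_consistent(&head, &stale);
}
```

reinit_while_linked_is_consistent() returns false: the head still links
to the stale node, whose next and prev both point at itself.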

Now, si->lock is taken before calling 'del_from_avail_list()', so that
other threads see the si deleted and SWP_WRITEOK cleared together and
will not reinsert it.  __del_from_avail_list() now also asserts that
p->lock is held.
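
The effect of the fix can be sketched with a hypothetical user-space
model (not kernel code): writeok stands in for SWP_WRITEOK,
on_avail_list for membership in swap_avail_heads[], and each function
body represents one critical section under si->lock.  Because si->lock
serialises the two sections, only two orders are possible, and the
sketch checks both.  All names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: writeok ~ SWP_WRITEOK, on_avail_list ~ list membership. */
struct model_si {
	bool writeok;
	bool on_avail_list;
};

/* Core 0, fixed swapoff: delete and clear writeok inside one si->lock section. */
static void swapoff_locked_section(struct model_si *si)
{
	si->on_avail_list = false;	/* del_from_avail_list(si) */
	si->writeok = false;		/* clear SWP_WRITEOK */
}

/* Core 1: re-add only if still writable, also inside an si->lock section. */
static void readd_locked_section(struct model_si *si)
{
	if (si->writeok)
		si->on_avail_list = true;
}

/*
 * Both sections run under si->lock, so they cannot interleave; one runs
 * entirely before the other.  Returns true if, for both possible orders,
 * si ends up off the available list.
 */
static bool fixed_order_never_leaves_stale_entry(void)
{
	struct model_si a = { .writeok = true, .on_avail_list = true };
	struct model_si b = { .writeok = true, .on_avail_list = true };

	/* Order 1: swapoff's section first, re-add attempt second. */
	swapoff_locked_section(&a);
	readd_locked_section(&a);

	/* Order 2: re-add attempt first, swapoff's section second. */
	readd_locked_section(&b);
	swapoff_locked_section(&b);

	return !a.on_avail_list && !b.on_avail_list;
}
```

fixed_order_never_leaves_stale_entry() returns true: whichever section
runs first, si ends up off the available list.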

This problem exists in versions after stable 5.10.y.

Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 mm/swapfile.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index eb9b0bf1..36899c42 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -679,6 +679,7 @@ static void __del_from_avail_list(struct swap_info_struct *p)
 {
 	int nid;
 
+	assert_spin_locked(&p->lock);
 	for_each_node(nid)
 		plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]);
 }
@@ -2435,8 +2436,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		spin_unlock(&swap_lock);
 		goto out_dput;
 	}
-	del_from_avail_list(p);
 	spin_lock(&p->lock);
+	del_from_avail_list(p);
 	if (p->prio < 0) {
 		struct swap_info_struct *si = p;
 		int nid;
-- 
2.35.3