From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Date: Tue, 4 Apr 2023 23:47:16 +0800
Subject: [PATCH] mm/swap: fix swap_info_struct race between swapoff and
 get_swap_pages()
References: bsc#1012628
Patch-mainline: 6.2.11
Git-commit: 6fe7d6b992113719e96744d974212df3fcddc76c

commit 6fe7d6b992113719e96744d974212df3fcddc76c upstream.
|
The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption. The only place we have found where this
happens is in the swapoff path. This case can be described as below:
|
  core 0                       core 1
  swapoff

  del_from_avail_list(si)      waiting

  try lock si->lock            acquire swap_avail_lock
                               and re-add si into
                               swap_avail_head

  acquire si->lock without noticing that si has already been re-added,
  and continue to clear SWP_WRITEOK, etc.
|
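The interleaving above can be replayed deterministically in userspace. This is a simplified model, not kernel code: struct model_si, model_readd() and replay_old_swapoff() are hypothetical names, avail-list membership is reduced to a flag, and core 1 is assumed to re-add si only while SWP_WRITEOK is still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one swap_info_struct (not the kernel struct). */
struct model_si {
	bool on_avail_list;	/* stands in for the per-node plist links */
	bool writeok;		/* stands in for SWP_WRITEOK */
};

/* core 1 (assumed behaviour): under si->lock, re-add si if still writable. */
static void model_readd(struct model_si *si)
{
	if (si->writeok && !si->on_avail_list)
		si->on_avail_list = true;
}

/*
 * Replay the pre-fix swapoff order against one re-add attempt.
 * readd_slot picks when core 1 runs: 0 = before del_from_avail_list(),
 * 1 = between del_from_avail_list() and clearing SWP_WRITEOK,
 * 2 = after both steps.
 */
static struct model_si replay_old_swapoff(int readd_slot)
{
	struct model_si si = { .on_avail_list = true, .writeok = true };

	if (readd_slot == 0)
		model_readd(&si);
	si.on_avail_list = false;	/* del_from_avail_list(si), no si->lock */
	if (readd_slot == 1)
		model_readd(&si);
	si.writeok = false;		/* si->lock taken, SWP_WRITEOK cleared */
	if (readd_slot == 2)
		model_readd(&si);
	return si;
}
```

Only slot 1 ends with si on the list while SWP_WRITEOK is clear; that window exists because the deletion happens outside si->lock.
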
It is easy to trigger a flood of warning messages inside
get_swap_pages() in some special cases, for example by calling
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently while
running many swapon/swapoff operations (e.g. stress-ng-swap).
|
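The madvise() half of that workload might look roughly like the helper below. It is a hypothetical sketch, not the reporters' actual test: touch_and_pageout() maps anonymous memory, touches every page and then calls madvise(MADV_PAGEOUT). It would be run from several threads at once while swap devices are repeatedly enabled and disabled (e.g. by stress-ng); that swapon/swapoff loop needs root and a swap file and is not shown. The fallback value 21 for MADV_PAGEOUT is the generic uapi value and is an assumption for headers older than Linux 5.4.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed generic uapi value, only used if libc headers lack it. */
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/*
 * Hypothetical helper: map len bytes of anonymous memory, fault in every
 * page, then ask the kernel to reclaim them with MADV_PAGEOUT.
 * Returns 0 on success or a negative errno value (e.g. -EINVAL on
 * kernels without MADV_PAGEOUT).
 */
static int touch_and_pageout(size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	char *buf;
	int ret = 0;

	if (page <= 0 || len == 0)
		return -EINVAL;
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -errno;
	for (size_t off = 0; off < len; off += (size_t)page)
		buf[off] = 1;	/* touch the page so it is populated */
	if (madvise(buf, len, MADV_PAGEOUT) != 0)
		ret = -errno;	/* capture errno before munmap() */
	munmap(buf, len);
	return ret;
}
```
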
However, in the worst case, the above scenario can cause a panic. In
swapoff(), the memory used by si may be kept in swap_info[] after the
swap is turned off. This means the memory corruption does not happen
immediately, but only once that memory is allocated and reset for a new
swap in the swapon path. A resulting panic message (with
CONFIG_PLIST_DEBUG enabled):
|
------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
plist_check_prev_next_node+0x50/0x70
plist_check_head+0x80/0xf0
plist_add+0x28/0x140
add_to_avail_list+0x9c/0xf0
_enable_swap_info+0x78/0xb4
__do_sys_swapon+0x918/0xa10
__arm64_sys_swapon+0x20/0x30
el0_svc_common+0x8c/0x220
do_el0_svc+0x2c/0x90
el0_svc+0x1c/0x30
el0_sync_handler+0xa8/0xb0
el0_sync+0x148/0x180
irq event stamp: 2082270
|
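The corruption path described above can be illustrated with a simplified circular doubly linked list. This is a stand-in for the links inside a plist, not the kernel's list API: struct node, node_add_after(), list_is_consistent() and replay_stale_reinit() are hypothetical. It assumes that, after the race, si's node is still linked on the avail list when swapon reuses the same memory and re-initialises the node in place. The check follows the same idea as the CONFIG_PLIST_DEBUG prev/next check: each node's neighbours must point back at it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the links inside a plist node. */
struct node {
	struct node *next, *prev;
};

/* An unlinked node points at itself, like an empty list head. */
static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

/* Insert n right after pos. */
static void node_add_after(struct node *pos, struct node *n)
{
	n->next = pos->next;
	n->prev = pos;
	pos->next->prev = n;
	pos->next = n;
}

/*
 * Walk the list from head and check that every neighbour points back.
 * max_nodes bounds the walk in case corruption created a cycle that
 * never returns to head.
 */
static bool list_is_consistent(const struct node *head, size_t max_nodes)
{
	const struct node *n = head;
	size_t seen = 0;

	do {
		if (n->next->prev != n || n->prev->next != n)
			return false;
		n = n->next;
		if (++seen > max_nodes)
			return false;
	} while (n != head);
	return true;
}

/* Stale node still linked, then re-initialised in place on reuse. */
static bool replay_stale_reinit(void)
{
	struct node head, stale, other;

	node_init(&head);
	node_add_after(&head, &other);
	node_add_after(&head, &stale);	/* head -> stale -> other */
	node_init(&stale);		/* reuse resets the still-linked node */
	return list_is_consistent(&head, 8);
}
```

After the reset, the stale node's next and prev both point to itself while its neighbours still point at it, which is consistent with the "next:" line in the log above, where n and p equal the node's own address.
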
Now, si->lock is taken before calling del_from_avail_list(), so that
other threads see si deleted from the list and SWP_WRITEOK cleared
together, and will not re-insert it.
|
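The effect of the fix, together with the assert_spin_locked() added to __del_from_avail_list() in the diff below, can be sketched with a toy spinlock built on C11 atomics. Everything here (struct toy_si, toy_lock(), toy_swapoff(), toy_try_readd()) is hypothetical, and the re-add side is assumed to re-insert only while SWP_WRITEOK is set. toy_is_locked() only reports whether the lock is held by anyone, which is also all that assert_spin_locked() checks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock; toy_is_locked() plays the role of spin_is_locked(). */
struct toy_spinlock {
	atomic_bool locked;
};

static void toy_lock(struct toy_spinlock *l)
{
	bool expected = false;

	/* Spin until we flip locked from false to true. */
	while (!atomic_compare_exchange_weak(&l->locked, &expected, true))
		expected = false;
}

static void toy_unlock(struct toy_spinlock *l)
{
	atomic_store(&l->locked, false);
}

static bool toy_is_locked(struct toy_spinlock *l)
{
	return atomic_load(&l->locked);
}

/* Hypothetical stand-in for swap_info_struct. */
struct toy_si {
	struct toy_spinlock lock;
	bool on_avail_list;
	bool writeok;		/* stands in for SWP_WRITEOK */
};

static void toy_si_init(struct toy_si *si)
{
	atomic_init(&si->lock.locked, false);
	si->on_avail_list = true;
	si->writeok = true;
}

static void toy_del_from_avail_list(struct toy_si *si)
{
	assert(toy_is_locked(&si->lock));	/* like assert_spin_locked() */
	si->on_avail_list = false;
}

/* swapoff side after the fix: both steps in one critical section. */
static void toy_swapoff(struct toy_si *si)
{
	toy_lock(&si->lock);
	toy_del_from_avail_list(si);
	si->writeok = false;
	toy_unlock(&si->lock);
}

/* Re-add side: re-inserts only while writeok is still set. */
static bool toy_try_readd(struct toy_si *si)
{
	bool added = false;

	toy_lock(&si->lock);
	if (si->writeok && !si->on_avail_list) {
		si->on_avail_list = true;
		added = true;
	}
	toy_unlock(&si->lock);
	return added;
}
```

Because the deletion and the clearing of writeok now share one critical section, a re-add attempt sees either both changes or neither, and calling toy_del_from_avail_list() without the lock trips the assert().
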
This problem exists in versions after stable 5.10.y.
|
Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 mm/swapfile.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
|
diff --git a/mm/swapfile.c b/mm/swapfile.c
index eb9b0bf1..36899c42 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -679,6 +679,7 @@ static void __del_from_avail_list(struct swap_info_struct *p)
 {
 	int nid;
 
+	assert_spin_locked(&p->lock);
 	for_each_node(nid)
 		plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]);
 }
@@ -2435,8 +2436,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		spin_unlock(&swap_lock);
 		goto out_dput;
 	}
-	del_from_avail_list(p);
 	spin_lock(&p->lock);
+	del_from_avail_list(p);
 	if (p->prio < 0) {
 		struct swap_info_struct *si = p;
 		int nid;
-- 
2.35.3
|