Vlastimil Babka a2a4df
From: Laurent Dufour <ldufour@linux.ibm.com>
Date: Fri, 13 Nov 2020 22:51:53 -0800
Subject: mm/slub: fix panic in slab_alloc_node()
Git-commit: 22e4663e916321b72972c69ca0c6b962f529bd78
Patch-mainline: v5.10-rc4
References: bsc#1208023

While doing a memory hot-unplug operation on a PowerPC VM running 1024 CPUs
with 11TB of RAM, I hit the following panic:

    BUG: Kernel NULL pointer dereference on read at 0x00000007
    Faulting instruction address: 0xc000000000456048
    Oops: Kernel access of bad area, sig: 11 [#2]
    LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS= 2048 NUMA pSeries
    Modules linked in: rpadlpar_io rpaphp
    CPU: 160 PID: 1 Comm: systemd Tainted: G      D           5.9.0 #1
    NIP:  c000000000456048 LR: c000000000455fd4 CTR: c00000000047b350
    REGS: c00006028d1b77a0 TRAP: 0300   Tainted: G      D            (5.9.0)
    MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004228  XER: 00000000
    CFAR: c00000000000f1b0 DAR: 0000000000000007 DSISR: 40000000 IRQMASK: 0
    GPR00: c000000000455fd4 c00006028d1b7a30 c000000001bec800 0000000000000000
    GPR04: 0000000000000dc0 0000000000000000 00000000000374ef c00007c53df99320
    GPR08: 000007c53c980000 0000000000000000 000007c53c980000 0000000000000000
    GPR12: 0000000000004400 c00000001e8e4400 0000000000000000 0000000000000f6a
    GPR16: 0000000000000000 c000000001c25930 c000000001d62528 00000000000000c1
    GPR20: c000000001d62538 c00006be469e9000 0000000fffffffe0 c0000000003c0ff8
    GPR24: 0000000000000018 0000000000000000 0000000000000dc0 0000000000000000
    GPR28: c00007c513755700 c000000001c236a4 c00007bc4001f800 0000000000000001
    NIP [c000000000456048] __kmalloc_node+0x108/0x790
    LR [c000000000455fd4] __kmalloc_node+0x94/0x790
    Call Trace:
      kvmalloc_node+0x58/0x110
      mem_cgroup_css_online+0x10c/0x270
      online_css+0x48/0xd0
      cgroup_apply_control_enable+0x2c4/0x470
      cgroup_mkdir+0x408/0x5f0
      kernfs_iop_mkdir+0x90/0x100
      vfs_mkdir+0x138/0x250
      do_mkdirat+0x154/0x1c0
      system_call_exception+0xf8/0x200
      system_call_common+0xf0/0x27c
    Instruction dump:
    e93e0000 e90d0030 39290008 7cc9402a e94d0030 e93e0000 7ce95214 7f89502a
    2fbc0000 419e0018 41920230 e9270010 <89290007> 7f994800 419e0220 7ee6bb78

This points to the following code:

    mm/slub.c:2851
            if (unlikely(!object || !node_match(page, node))) {
    c000000000456038:       00 00 bc 2f     cmpdi   cr7,r28,0
    c00000000045603c:       18 00 9e 41     beq     cr7,c000000000456054 <__kmalloc_node+0x114>
    node_match():
    mm/slub.c:2491
            if (node != NUMA_NO_NODE && page_to_nid(page) != node)
    c000000000456040:       30 02 92 41     beq     cr4,c000000000456270 <__kmalloc_node+0x330>
    page_to_nid():
    include/linux/mm.h:1294
    c000000000456044:       10 00 27 e9     ld      r9,16(r7)
    c000000000456048:       07 00 29 89     lbz     r9,7(r9)	<<<< r9 = NULL
    node_match():
    mm/slub.c:2491
    c00000000045604c:       00 48 99 7f     cmpw    cr7,r25,r9
    c000000000456050:       20 02 9e 41     beq     cr7,c000000000456270 <__kmalloc_node+0x330>

The panic occurred in slab_alloc_node() when checking for the page's node:

	object = c->freelist;
	page = c->page;
	if (unlikely(!object || !node_match(page, node))) {
		object = __slab_alloc(s, gfpflags, node, addr, c);
		stat(s, ALLOC_SLOWPATH);

The issue is that object is not NULL while page is NULL, which is odd but
may happen if the cache flush happens after loading object but before
loading page.  Thus checking the page pointer is required too.
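
As a userspace illustration only (not kernel code), the corrected
fast-path test can be sketched as below.  struct fake_page and
needs_slow_path() are hypothetical names introduced for this sketch;
node_match() here mirrors only the shape of the kernel helper after
commit 6159d0f5c03e, that is, without a NULL check on page.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for struct page: only the node id matters here. */
struct fake_page {
	int nid;
};

/*
 * Same shape as node_match() once the NULL check was dropped:
 * page is dereferenced whenever a specific node is requested.
 */
static bool node_match(const struct fake_page *page, int node)
{
	if (node != NUMA_NO_NODE && page->nid != node)
		return false;
	return true;
}

/*
 * Returns true when the allocation has to take the slow path.
 * The !page test short-circuits before node_match() can dereference
 * a NULL page, which is the case hit in the panic above.
 */
static bool needs_slow_path(const void *object,
			    const struct fake_page *page, int node)
{
	return !object || !page || !node_match(page, node);
}
```

With a non-NULL object, a NULL page and a specific node, the sketch
reports that the slow path is needed instead of dereferencing page.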

The cache flush is done through an inter-processor interrupt when a
piece of memory is off-lined.  That interrupt is triggered when a memory
hot-unplug operation is initiated and offline_pages() calls SLUB's
MEM_GOING_OFFLINE callback, slab_mem_going_offline_callback(), which
calls flush_cpu_slab().  If that interrupt arrives between the reading
of c->freelist and the reading of c->page, object can be non-NULL while
page is NULL.  That situation is expected, and the later call to
this_cpu_cmpxchg_double() will detect the change to c->freelist and redo
the whole operation.
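
The redo behaviour can be modelled in userspace with a single
compare-and-exchange.  This is a rough sketch under stated assumptions:
the kernel compares c->freelist together with a per-cpu transaction id
using this_cpu_cmpxchg_double(), whereas here a hypothetical 32-bit
free-slot count and a 32-bit transaction id are packed into one 64-bit
C11 atomic.  struct cpu_state, pack(), flush() and alloc_fast() are
invented names, and the interrupt is simulated by a callback.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed state: low 32 bits = free slots, high 32 bits = tid. */
static inline uint64_t pack(uint32_t freelist, uint32_t tid)
{
	return ((uint64_t)tid << 32) | freelist;
}

struct cpu_state {
	_Atomic uint64_t state;
};

/* Simulated flush: empties the freelist and bumps the transaction id. */
static void flush(struct cpu_state *c)
{
	uint64_t old = atomic_load(&c->state);
	uint32_t tid = (uint32_t)(old >> 32);

	atomic_store(&c->state, pack(0, tid + 1));
}

/*
 * Try to take one slot on the fast path.  'interrupt' (may be NULL) runs
 * once between the snapshot and the commit, modelling the flush IPI.
 * Returns the free-slot count seen at commit time, or 0 when the
 * freelist is empty and a slow path would be needed.  *retries counts
 * how many times the commit failed and the operation was redone.
 */
static uint32_t alloc_fast(struct cpu_state *c,
			   void (*interrupt)(struct cpu_state *),
			   int *retries)
{
	for (;;) {
		uint64_t snap = atomic_load(&c->state);
		uint32_t freelist = (uint32_t)snap;
		uint32_t tid = (uint32_t)(snap >> 32);

		if (interrupt) {
			interrupt(c);
			interrupt = NULL;	/* fire only once */
		}

		if (freelist == 0)
			return 0;

		/* Commit only if nothing changed since the snapshot. */
		if (atomic_compare_exchange_strong(&c->state, &snap,
						   pack(freelist - 1, tid + 1)))
			return freelist;

		(*retries)++;	/* stale snapshot: redo the whole operation */
	}
}
```

In this model a flush that lands after the snapshot makes the commit
fail, so the stale values are never used for the allocation itself.
They can still be inspected before the commit, which is why the page
pointer has to be checked before node_match() is called.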

In commit 6159d0f5c03e ("mm/slub.c: page is always non-NULL in
node_match()") the check on the page pointer was removed on the
assumption that page is always valid when node_match() is called.  That
assumption does not hold in this particular case, so check for page
before calling node_match() here.

Fixes: 6159d0f5c03e ("mm/slub.c: page is always non-NULL in node_match()")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201027190406.33283-1-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/slub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2802,7 +2802,7 @@ redo:
 
 	object = c->freelist;
 	page = c->page;
-	if (unlikely(!object || !node_match(page, node))) {
+	if (unlikely(!object || !page || !node_match(page, node))) {
 		object = __slab_alloc(s, gfpflags, node, addr, c);
 		stat(s, ALLOC_SLOWPATH);
 	} else {