Michal Koutný cedbb1
From: Chunguang Xu <brookxu@tencent.com>
Date: Sat, 21 Mar 2020 18:22:10 -0700
Subject: memcg: fix NULL pointer dereference in
 __mem_cgroup_usage_unregister_event
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Git-commit: 7d36665a5886c27ca4c4d0afd3ecc50b400f3587
Patch-mainline: v5.6-rc7
References: bsc#1177703

An eventfd monitors multiple memory thresholds of a cgroup.  When the
eventfd is closed, the kernel deletes all events related to it.  If,
before all of those events have been deleted, another eventfd starts
monitoring a memory threshold of the same cgroup, the kernel crashes:

  BUG: kernel NULL pointer dereference, address: 0000000000000004
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
  Oops: 0002 [#1] SMP PTI
  CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
  Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
  Workqueue: events memcg_event_remove
  RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
  RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
  RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
  RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
  R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
  R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
  Call Trace:
    memcg_event_remove+0x32/0x90
    process_one_work+0x172/0x380
    worker_thread+0x49/0x3f0
    kthread+0xf8/0x130
    ret_from_fork+0x35/0x40
  CR2: 0000000000000004

This problem can be reproduced as follows:

1. Create a new cgroup subdirectory and a new eventfd, then use this
   eventfd to monitor multiple memory thresholds of the cgroup.

2. Close the eventfd.  __mem_cgroup_usage_unregister_event() is then
   called multiple times, once for each registered threshold, to delete
   all events related to this eventfd.

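Step 1 can be sketched in userspace C roughly as follows.  This is a
hypothetical helper, not part of the patch: it assumes a cgroup v1 memory
controller and follows the threshold interface described in the cgroup-v1
memory documentation, where writing "<event_fd> <fd of
memory.usage_in_bytes> <threshold>" to cgroup.event_control registers one
threshold.  The names format_event_control() and register_thresholds()
and the cgroup directory argument are illustrative assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Build one cgroup v1 threshold registration line:
 * "<event_fd> <fd of memory.usage_in_bytes> <threshold>".
 * Returns the string length, or -1 if buf is too small.
 */
int format_event_control(char *buf, size_t len, int efd, int ufd,
			 unsigned long long threshold)
{
	int n = snprintf(buf, len, "%d %d %llu", efd, ufd, threshold);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Hypothetical reproducer step 1: register `count` thresholds for the
 * cgroup directory `cgdir` on one new eventfd.  Returns the eventfd
 * (close it to perform step 2) or -1 on error.
 */
int register_thresholds(const char *cgdir, const unsigned long long *thr,
			int count)
{
	char path[4096], line[64];
	int efd, ufd, cfd, i;

	assert(cgdir != NULL && thr != NULL && count > 0);

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgdir);
	ufd = open(path, O_RDONLY);
	snprintf(path, sizeof(path), "%s/cgroup.event_control", cgdir);
	cfd = open(path, O_WRONLY);
	if (ufd < 0 || cfd < 0)
		goto fail;

	/* Each write registers a separate event bound to the same eventfd. */
	for (i = 0; i < count; i++) {
		int n = format_event_control(line, sizeof(line), efd, ufd, thr[i]);

		if (n < 0 || write(cfd, line, (size_t)n) != n)
			goto fail;
	}

	close(ufd);
	close(cfd);
	return efd;

fail:
	if (ufd >= 0)
		close(ufd);
	if (cfd >= 0)
		close(cfd);
	close(efd);
	return -1;
}
```

Calling close() on the returned eventfd performs step 2.  Actually hitting
the crash additionally requires registering a threshold on a second
eventfd while the per-event removals are still pending, so this sketch
only sets up the preconditions.
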
The first time __mem_cgroup_usage_unregister_event() is called, the
kernel clears all entries related to this eventfd from
thresholds->primary.

Since there is currently only one eventfd, thresholds->primary becomes
empty, so the kernel sets both thresholds->primary and thresholds->spare
to NULL.  If the user now creates a new eventfd and uses it to monitor a
memory threshold of this cgroup, the kernel re-initializes
thresholds->primary, while thresholds->spare remains NULL.

Then, when __mem_cgroup_usage_unregister_event() is called for the
second time, thresholds->primary is not empty, so the kernel accesses
thresholds->spare.  Since thresholds->spare is NULL, this triggers the
crash.

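The sequence above can be replayed with a small userspace model.  This is
a simplified sketch, not kernel code: an int stands in for struct
eventfd_ctx *, every array gets a fixed capacity (MODEL_CAP, an assumption
that keeps the model memory-safe, whereas the kernel sizes each array
exactly), and RCU, locking and the current_threshold search are omitted.
model_register() and model_unregister() are hypothetical names loosely
following the register path and the unpatched unregister path.  The
latter reports failure where the kernel writes new->size through the NULL
spare pointer, which is consistent with the faulting write to address
0000000000000004 in the trace above, since size follows the int
current_threshold field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_CAP 16 /* assumption: fixed capacity instead of exact sizing */

struct threshold {
	int eventfd;              /* stands in for struct eventfd_ctx * */
	unsigned long threshold;
};

struct threshold_ary {
	int current_threshold;    /* kept for layout; not computed here */
	unsigned int size;
	struct threshold entries[MODEL_CAP];
};

struct thresholds {
	struct threshold_ary *primary;
	struct threshold_ary *spare;
};

/* Register: new primary = old primary plus one entry; old primary becomes spare. */
void model_register(struct thresholds *t, int efd, unsigned long thr)
{
	unsigned int size = t->primary ? t->primary->size + 1 : 1;
	struct threshold_ary *new = malloc(sizeof(*new));

	assert(new != NULL && size <= MODEL_CAP);
	if (t->primary)
		memcpy(new->entries, t->primary->entries,
		       (size - 1) * sizeof(new->entries[0]));
	new->entries[size - 1] = (struct threshold){ efd, thr };
	new->size = size;
	new->current_threshold = -1;

	free(t->spare);
	t->spare = t->primary;
	t->primary = new;
}

/* Unpatched unregister.  Returns false where the kernel would dereference NULL. */
bool model_unregister(struct thresholds *t, int efd)
{
	struct threshold_ary *new;
	unsigned int i, j, size = 0;

	if (!t->primary)
		return true;                  /* nothing registered: unlock */

	for (i = 0; i < t->primary->size; i++)
		if (t->primary->entries[i].eventfd != efd)
			size++;

	new = t->spare;
	if (!size) {
		free(new);                    /* no thresholds left */
		new = NULL;
	} else {
		if (!new)
			return false;         /* kernel: new->size = size; oops */
		new->size = size;
		for (i = 0, j = 0; i < t->primary->size; i++)
			if (t->primary->entries[i].eventfd != efd)
				new->entries[j++] = t->primary->entries[i];
	}

	/* Swap primary and spare arrays. */
	t->spare = t->primary;
	t->primary = new;
	if (!new) {                           /* all events unregistered */
		free(t->spare);
		t->spare = NULL;
	}
	return true;
}
```

For example, registering two thresholds on eventfd 1, unregistering
eventfd 1 once (both arrays become NULL), registering one threshold on
eventfd 2 and then unregistering eventfd 1 again makes model_unregister()
return false.
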
In general, the longer it takes to delete all events related to this
eventfd, the easier it is to trigger this problem.

The solution is to check, when deleting an event, whether the thresholds
associated with the eventfd have already been cleared.  If so, we do
nothing.

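The patched control flow can be summarized with a hypothetical helper
that mirrors the new counting loop: size counts the entries in
thresholds->primary that survive, and entries counts those belonging to
the eventfd being removed.  An int array of per-entry eventfds stands in
for thresholds->primary (an empty array models a NULL primary), and
classify_unregister() and the enum names are illustrative, not kernel
identifiers.

```c
#include <assert.h>
#include <stddef.h>

enum unregister_action {
	UNREG_NOTHING,   /* patched: no entries for eventfd, goto unlock */
	UNREG_CLEAR_ALL, /* no thresholds remain: free arrays, primary = NULL */
	UNREG_REBUILD,   /* copy the surviving entries into the spare array */
};

/*
 * Hypothetical model of the patched loop: eventfds[i] is the eventfd of
 * primary entry i (an int stands in for struct eventfd_ctx *).
 */
enum unregister_action classify_unregister(const int *eventfds, int n,
					   int eventfd)
{
	int i, size = 0, entries = 0;

	assert(n == 0 || eventfds != NULL);

	for (i = 0; i < n; i++) {
		if (eventfds[i] != eventfd)
			size++;
		else
			entries++;
	}

	/*
	 * The fix: an earlier call already removed every entry for this
	 * eventfd, so the possibly NULL spare array is never touched.
	 */
	if (!entries)
		return UNREG_NOTHING;
	if (!size)
		return UNREG_CLEAR_ALL;
	return UNREG_REBUILD;
}
```

In the crash scenario, primary holds only the threshold registered on the
new eventfd, so the second call for the closed eventfd yields
UNREG_NOTHING instead of reaching the spare array.
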
[akpm@linux-foundation.org: fix comment, per Kirill]
Fixes: 907860ed381a ("cgroups: make cftype.unregister_event() void-returning")
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Michal Koutný <mkoutny@suse.com>
---
 mm/memcontrol.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2058b8da18db..50492aa9d61b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4027,7 +4027,7 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
 	struct mem_cgroup_thresholds *thresholds;
 	struct mem_cgroup_threshold_ary *new;
 	unsigned long usage;
-	int i, j, size;
+	int i, j, size, entries;
 
 	mutex_lock(&memcg->thresholds_lock);
 
@@ -4047,14 +4047,20 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
 	__mem_cgroup_threshold(memcg, type == _MEMSWAP);
 
 	/* Calculate new number of threshold */
-	size = 0;
+	size = entries = 0;
 	for (i = 0; i < thresholds->primary->size; i++) {
 		if (thresholds->primary->entries[i].eventfd != eventfd)
 			size++;
+		else
+			entries++;
 	}
 
 	new = thresholds->spare;
 
+	/* If no items related to eventfd have been cleared, nothing to do */
+	if (!entries)
+		goto unlock;
+
 	/* Set thresholds array to NULL if we don't have thresholds */
 	if (!size) {
 		kfree(new);