Blob Blame History Raw
From 58056f77502f3567b760c9a8fc8d2e9081515b2d Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeelb@google.com>
Date: Fri, 5 Nov 2021 13:37:44 -0700
Subject: [PATCH] memcg, kmem: further deprecate kmem.limit_in_bytes
Git-commit: 58056f77502f3567b760c9a8fc8d2e9081515b2d
Patch-mainline: v5.16-rc1
References: bsc#1206896

mhocko@suse.com:
I am referencing bsc#1206896 although this is not a fix for the reported
CVE. Kmem accounting has never been supported so this patch disables the
feature altogether which also removes the only (remotely) potential
attack vector.

The deprecation process of kmem.limit_in_bytes started with the commit
0158115f702 ("memcg, kmem: deprecate kmem.limit_in_bytes") which also
explains in detail the motivation behind the deprecation.  To summarize,
it is the unexpected behavior on hitting the kmem limit.  This patch
moves the deprecation process to the next stage by disallowing to set
the kmem limit.  In future we might just remove the kmem.limit_in_bytes
file completely.

[akpm@linux-foundation.org: s/ENOTSUPP/EOPNOTSUPP/]
[arnd@arndb.de: mark cancel_charge() inline]
  Link: https://lkml.kernel.org/r/20211022070542.679839-1-arnd@kernel.org

Link: https://lkml.kernel.org/r/20211019153408.2916808-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

---
 Documentation/cgroup-v1/memory.txt |    8 ++------
 mm/memcontrol.c                    |    3 ++-
 2 files changed, 4 insertions(+), 7 deletions(-)

--- a/Documentation/cgroup-v1/memory.txt
+++ b/Documentation/cgroup-v1/memory.txt
@@ -78,7 +78,8 @@ Brief summary of control files.
  memory.oom_control		 # set/show oom controls.
  memory.numa_stat		 # show the number of memory usage per numa node
 
- memory.kmem.limit_in_bytes      # set/show hard limit for kernel memory
+ memory.kmem.limit_in_bytes          This knob is deprecated and writing to
+                                     it will return -ENOTSUPP.
  memory.kmem.usage_in_bytes      # show current kernel memory allocation
  memory.kmem.failcnt             # show the number of kernel memory usage hits limits
  memory.kmem.max_usage_in_bytes  # show max kernel memory usage recorded
@@ -462,11 +463,6 @@ About use_hierarchy, see Section 6.
   Because rmdir() moves all pages to parent, some out-of-use page caches can be
   moved to the parent. If you want to avoid that, force_empty will be useful.
 
-  Also, note that when memory.kmem.limit_in_bytes is set the charges due to
-  kernel pages will still be seen. This is not considered a failure and the
-  write will still return success. In this case, it is expected that
-  memory.kmem.usage_in_bytes == memory.usage_in_bytes.
-
   About use_hierarchy, see Section 6.
 
 5.2 stat file
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3041,7 +3041,8 @@ static ssize_t mem_cgroup_write(struct k
 			ret = mem_cgroup_resize_memsw_limit(memcg, nr_pages);
 			break;
 		case _KMEM:
-			ret = memcg_update_kmem_limit(memcg, nr_pages);
+			/* kmem.limit_in_bytes is deprecated. */
+			ret = -EOPNOTSUPP;
 			break;
 		case _TCP:
 			ret = memcg_update_tcp_limit(memcg, nr_pages);