Michal Koutný a95360
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Wed, 7 Dec 2022 14:00:39 +0100
Subject: mm: memcontrol: deprecate charge moving
Git-commit: da34a8484d162585e22ed8c1e4114aa2f60e3567
Patch-mainline: v6.3-rc1
References: bsc#1209801

Charge moving mode in cgroup1 allows memory to follow tasks as they
migrate between cgroups.  This is, and always has been, a questionable
thing to do - for several reasons.

First, it's expensive.  Pages need to be identified, locked and isolated
from various MM operations, and reassigned, one by one.

Second, it's unreliable.  Once pages are charged to a cgroup, there isn't
always a clear owner task anymore.  Cache isn't moved at all, for example.
Mapped memory is moved - but if trylocking or isolating a page fails,
it's arbitrarily left behind.  Frequent moving between domains may leave a
task's memory scattered all over the place.

Third, it isn't really needed.  Launcher tasks can kick off workload tasks
directly in their target cgroup.  Using dedicated per-workload groups
allows fine-grained policy adjustments - no need to move tasks and their
physical pages between control domains.  The feature was never
forward-ported to cgroup2, and it hasn't been missed.

Despite it being a niche usecase, the maintenance overhead of supporting
it is enormous.  Because pages are moved while they are live and subject
to various MM operations, the synchronization rules are complicated.
There are lock_page_memcg() in MM and FS code, which non-cgroup people
don't understand.  In some cases we've been able to shift code and cgroup
API calls around such that we can rely on native locking as much as
possible.  But that's fragile, and sometimes we need to hold MM locks for
longer than we otherwise would (pte lock e.g.).

Mark the feature deprecated. Hopefully we can remove it soon.

And backport into -stable kernels so that people who develop against
earlier kernels are warned about this deprecation as early as possible.

[akpm@linux-foundation.org: fix memory.rst underlining]
Link: https://lkml.kernel.org/r/Y5COd+qXwk/S+n8N@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Koutný <mkoutny@suse.com>
---
 Documentation/admin-guide/cgroup-v1/memory.rst | 13 +++++++++++--
 mm/memcontrol.c                                |  4 ++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 60370f2c67b9..258e45cc3b2d 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -86,6 +86,8 @@ Brief summary of control files.
  memory.swappiness		     set/show swappiness parameter of vmscan
 				     (See sysctl's vm.swappiness)
  memory.move_charge_at_immigrate     set/show controls of moving charges
+                                     This knob is deprecated and shouldn't be
+                                     used.
  memory.oom_control		     set/show oom controls.
  memory.numa_stat		     show the number of memory usage per numa
 				     node
@@ -717,8 +719,15 @@ NOTE2:
        It is recommended to set the soft limit always below the hard limit,
        otherwise the hard limit will take precedence.
 
-8. Move charges at task migration
-=================================
+8. Move charges at task migration (DEPRECATED!)
+===============================================
+
+THIS IS DEPRECATED!
+
+It's expensive and unreliable! It's better practice to launch workload
+tasks directly from inside their target cgroup. Use dedicated workload
+cgroups to allow fine-grained policy adjustments without having to
+move physical pages between control domains.
 
 Users can move charges associated with a task along with task migration, that
 is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a698a2b6523b..49f67176a1a2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3919,6 +3919,10 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
+	pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
+		     "Please report your usecase to linux-mm@kvack.org if you "
+		     "depend on this functionality.\n");
+
 	if (val & ~MOVE_MASK)
 		return -EINVAL;
 