|
Michal Koutný |
a95360 |
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Wed, 7 Dec 2022 14:00:39 +0100
Subject: mm: memcontrol: deprecate charge moving
Git-commit: da34a8484d162585e22ed8c1e4114aa2f60e3567
Patch-mainline: v6.3-rc1
References: bsc#1209801

Charge moving mode in cgroup1 allows memory to follow tasks as they
migrate between cgroups. This is, and always has been, a questionable
thing to do - for several reasons.

First, it's expensive. Pages need to be identified, locked and isolated
from various MM operations, and reassigned, one by one.

Second, it's unreliable. Once pages are charged to a cgroup, there isn't
always a clear owner task anymore. Cache isn't moved at all, for example.
Mapped memory is moved - but if trylocking or isolating a page fails,
it's arbitrarily left behind. Frequent moving between domains may leave a
task's memory scattered all over the place.

Third, it isn't really needed. Launcher tasks can kick off workload tasks
directly in their target cgroup. Using dedicated per-workload groups
allows fine-grained policy adjustments - no need to move tasks and their
physical pages between control domains. The feature was never
forward-ported to cgroup2, and it hasn't been missed.

Despite it being a niche usecase, the maintenance overhead of supporting
it is enormous. Because pages are moved while they are live and subject
to various MM operations, the synchronization rules are complicated.
There are lock_page_memcg() in MM and FS code, which non-cgroup people
don't understand. In some cases we've been able to shift code and cgroup
API calls around such that we can rely on native locking as much as
possible. But that's fragile, and sometimes we need to hold MM locks for
longer than we otherwise would (pte lock e.g.).

Mark the feature deprecated. Hopefully we can remove it soon.

And backport into -stable kernels so that people who develop against
earlier kernels are warned about this deprecation as early as possible.

[akpm@linux-foundation.org: fix memory.rst underlining]
Link: https://lkml.kernel.org/r/Y5COd+qXwk/S+n8N@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Koutný <mkoutny@suse.com>
---
 Documentation/admin-guide/cgroup-v1/memory.rst | 13 +++++++++++--
 mm/memcontrol.c                                |  4 ++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 60370f2c67b9..258e45cc3b2d 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -86,6 +86,8 @@ Brief summary of control files.
  memory.swappiness                   set/show swappiness parameter of vmscan
                                      (See sysctl's vm.swappiness)
  memory.move_charge_at_immigrate     set/show controls of moving charges
+                                     This knob is deprecated and shouldn't be
+                                     used.
  memory.oom_control                  set/show oom controls.
  memory.numa_stat                    show the number of memory usage per numa
                                      node
@@ -717,8 +719,15 @@ NOTE2:
 It is recommended to set the soft limit always below the hard limit,
 otherwise the hard limit will take precedence.
 
-8. Move charges at task migration
-=================================
+8. Move charges at task migration (DEPRECATED!)
+===============================================
+
+THIS IS DEPRECATED!
+
+It's expensive and unreliable! It's better practice to launch workload
+tasks directly from inside their target cgroup. Use dedicated workload
+cgroups to allow fine-grained policy adjustments without having to
+move physical pages between control domains.
 
 Users can move charges associated with a task along with task migration, that
 is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a698a2b6523b..49f67176a1a2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3919,6 +3919,10 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
+	pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
+		     "Please report your usecase to linux-mm@kvack.org if you "
+		     "depend on this functionality.\n");
+
 	if (val & ~MOVE_MASK)
 		return -EINVAL;
 