From 4b57dc65c10476507fa72443e8853146ea5b1cf6 Mon Sep 17 00:00:00 2001
From: Oscar Salvador
Date: Mar 28 2023 08:24:25 +0000
Subject: Merge remote-tracking branch 'origin/users/mkoutny/SLE15-SP5-GA/memcg-immigrate-depr' into SLE15-SP5-GA

Pull memcg fixes from Michal Koutny

---

diff --git a/patches.suse/mm-memcontrol-deprecate-charge-moving.patch b/patches.suse/mm-memcontrol-deprecate-charge-moving.patch
new file mode 100644
index 0000000..6d69304
--- /dev/null
+++ b/patches.suse/mm-memcontrol-deprecate-charge-moving.patch
@@ -0,0 +1,103 @@
+From: Johannes Weiner
+Date: Wed, 7 Dec 2022 14:00:39 +0100
+Subject: mm: memcontrol: deprecate charge moving
+Git-commit: da34a8484d162585e22ed8c1e4114aa2f60e3567
+Patch-mainline: v6.3-rc1
+References: bsc#1209801
+
+Charge moving mode in cgroup1 allows memory to follow tasks as they
+migrate between cgroups. This is, and always has been, a questionable
+thing to do - for several reasons.
+
+First, it's expensive. Pages need to be identified, locked and isolated
+from various MM operations, and reassigned, one by one.
+
+Second, it's unreliable. Once pages are charged to a cgroup, there isn't
+always a clear owner task anymore. Cache isn't moved at all, for example.
+Mapped memory is moved - but if trylocking or isolating a page fails,
+it's arbitrarily left behind. Frequent moving between domains may leave a
+task's memory scattered all over the place.
+
+Third, it isn't really needed. Launcher tasks can kick off workload tasks
+directly in their target cgroup. Using dedicated per-workload groups
+allows fine-grained policy adjustments - no need to move tasks and their
+physical pages between control domains. The feature was never
+forward-ported to cgroup2, and it hasn't been missed.
+
+Despite it being a niche usecase, the maintenance overhead of supporting
+it is enormous. Because pages are moved while they are live and subject
+to various MM operations, the synchronization rules are complicated.
+There are lock_page_memcg() in MM and FS code, which non-cgroup people
+don't understand. In some cases we've been able to shift code and cgroup
+API calls around such that we can rely on native locking as much as
+possible. But that's fragile, and sometimes we need to hold MM locks for
+longer than we otherwise would (pte lock e.g.).
+
+Mark the feature deprecated. Hopefully we can remove it soon.
+
+And backport into -stable kernels so that people who develop against
+earlier kernels are warned about this deprecation as early as possible.
+
+[akpm@linux-foundation.org: fix memory.rst underlining]
+Link: https://lkml.kernel.org/r/Y5COd+qXwk/S+n8N@cmpxchg.org
+Signed-off-by: Johannes Weiner
+Acked-by: Shakeel Butt
+Acked-by: Hugh Dickins
+Acked-by: Michal Hocko
+Cc: Muchun Song
+Cc: Roman Gushchin
+Cc:
+Signed-off-by: Andrew Morton
+Acked-by: Michal Koutný
+---
+ Documentation/admin-guide/cgroup-v1/memory.rst | 13 +++++++++++--
+ mm/memcontrol.c                                |  4 ++++
+ 2 files changed, 15 insertions(+), 2 deletions(-)
+
+diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
+index 60370f2c67b9..258e45cc3b2d 100644
+--- a/Documentation/admin-guide/cgroup-v1/memory.rst
++++ b/Documentation/admin-guide/cgroup-v1/memory.rst
+@@ -86,6 +86,8 @@ Brief summary of control files.
+  memory.swappiness                   set/show swappiness parameter of vmscan
+                                      (See sysctl's vm.swappiness)
+  memory.move_charge_at_immigrate     set/show controls of moving charges
++                                     This knob is deprecated and shouldn't be
++                                     used.
+  memory.oom_control                  set/show oom controls.
+  memory.numa_stat                    show the number of memory usage per numa
+                                      node
+@@ -717,8 +719,15 @@ NOTE2:
+    It is recommended to set the soft limit always below the hard limit,
+    otherwise the hard limit will take precedence.
+ 
+-8. Move charges at task migration
+-=================================
++8. Move charges at task migration (DEPRECATED!)
++===============================================
++
++THIS IS DEPRECATED!
++
++It's expensive and unreliable! It's better practice to launch workload
++tasks directly from inside their target cgroup. Use dedicated workload
++cgroups to allow fine-grained policy adjustments without having to
++move physical pages between control domains.
+ 
+ Users can move charges associated with a task along with task migration, that
+ is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
+diff --git a/mm/memcontrol.c b/mm/memcontrol.c
+index a698a2b6523b..49f67176a1a2 100644
+--- a/mm/memcontrol.c
++++ b/mm/memcontrol.c
+@@ -3919,6 +3919,10 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
+ {
+ 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+ 
++	pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
++		     "Please report your usecase to linux-mm@kvack.org if you "
++		     "depend on this functionality.\n");
++
+ 	if (val & ~MOVE_MASK)
+ 		return -EINVAL;
+ 
+
diff --git a/series.conf b/series.conf
index 7634909..c926590 100644
--- a/series.conf
+++ b/series.conf
@@ -36898,6 +36898,7 @@
 	patches.suse/ipmi_ssif-Rename-idle-state-and-check.patch
 	patches.suse/ipmi-ssif-Remove-rtc_us_timer.patch
 	patches.suse/ipmi-ssif-Add-a-timer-between-request-retries.patch
+	patches.suse/mm-memcontrol-deprecate-charge-moving.patch
 	patches.suse/ibmvnic-Assign-XPS-map-to-correct-queue-index.patch
 	patches.suse/bnxt_en-Avoid-order-5-memory-allocation-for-TPA-data.patch