Vlastimil Babka c2d3b6
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Date: Thu, 18 Aug 2022 18:40:33 +0530
Subject: mm/demotion: add support for explicit memory tiers
Git-commit: 992bf77591cb7e696fcc59aa7e64d1200b673513
Patch-mainline: v6.1-rc1
References: jsc#PED-1248

Patch series "mm/demotion: Memory tiers and demotion", v15.

The current kernel has basic memory tiering support: Inactive pages on a
higher tier NUMA node can be migrated (demoted) to a lower tier NUMA node
to make room for new allocations on the higher tier NUMA node.  Frequently
accessed pages on a lower tier NUMA node can be migrated (promoted) to a
higher tier NUMA node to improve performance.

In the current kernel, memory tiers are defined implicitly via a demotion
path relationship between NUMA nodes, which is created during kernel
initialization and updated when a NUMA node is hot-added or hot-removed.
The current implementation puts all nodes with CPUs into the highest tier,
and builds the tier hierarchy tier-by-tier by establishing the per-node
demotion targets based on the distances between nodes.

The current memory tier implementation needs to be improved for several
important use cases:

* The current tier initialization code always initializes each
  memory-only NUMA node into a lower tier.  But a memory-only NUMA node
  may have a high performance memory device (e.g. a DRAM-backed
  memory-only node on a virtual machine) that should be put into a
  higher tier.

* The current tier hierarchy always puts CPU nodes into the top tier.
  But on a system with HBM (e.g. GPU memory) devices, these memory-only
  HBM NUMA nodes should be in the top tier, and DRAM nodes with CPUs are
  better placed in the next lower tier.

* Also because the current tier hierarchy always puts CPU nodes into the
  top tier, when a CPU is hot-added (or hot-removed) and turns a memory
  node from a CPU-less node into a CPU node (or vice versa), the memory
  tier hierarchy changes, even though no memory node is added or removed.
  This can make the tier hierarchy unstable and make it difficult to
  support tier-based memory accounting.

* Pages on a higher tier node can only be demoted to the nodes with the
  shortest distance on the next lower tier as defined by the demotion
  path, not to any other node from any lower tier.  This strict demotion
  order does not work in all use cases (e.g. some use cases may want to
  allow cross-socket demotion to another node in the same demotion tier
  as a fallback when the preferred demotion node is out of space), and
  has resulted in the feature request for an interface to override the
  system-wide, per-node demotion order from userspace.  This demotion
  order is also inconsistent with the page allocation fallback order
  when all the nodes in a higher tier are out of space: The page
  allocation can fall back to any node from any lower tier, whereas the
  demotion order doesn't allow that.

This patch series makes the creation of memory tiers explicit, under the
control of device drivers.

Memory Tier Initialization
==========================

The Linux kernel presents memory devices as NUMA nodes and each memory
device is of a specific type.  The memory type of a device is represented
by its abstract distance.  A memory tier corresponds to a range of
abstract distances.  This allows for classifying memory devices with a
specific performance range into a memory tier.

By default, all memory nodes are assigned to the default tier, whose
abstract distance range starts at 512.

A device driver can move its memory nodes from the default tier.  For
example, PMEM can move its memory nodes below the default tier, whereas
GPU can move its memory nodes above the default tier.

The kernel initialization code decides which exact tier a memory node
should be assigned to, based on the requests from the device drivers as
well as the memory device hardware information provided by the firmware.

Hot-adding/removing CPUs doesn't affect the memory tier hierarchy.


This patch (of 10):

In the current kernel, memory tiers are defined implicitly via a demotion
path relationship between NUMA nodes, which is created during kernel
initialization and updated when a NUMA node is hot-added or hot-removed.
The current implementation puts all nodes with CPUs into the highest tier,
and builds the tier hierarchy by establishing the per-node demotion
targets based on the distances between nodes.

The current memory tier implementation needs to be improved for several
important use cases:

The current tier initialization code always initializes each memory-only
NUMA node into a lower tier.  But a memory-only NUMA node may have a high
performance memory device (e.g. a DRAM-backed memory-only node on a
virtual machine) that should be put into a higher tier.

The current tier hierarchy always puts CPU nodes into the top tier.  But
on a system with HBM or GPU devices, the memory-only NUMA nodes mapping
these devices should be in the top tier, and DRAM nodes with CPUs are
better placed in the next lower tier.

With the current kernel, pages on a higher tier node can only be demoted
to the nodes with the shortest distance on the next lower tier as defined
by the demotion path, not to any other node from any lower tier.  This
strict demotion order does not work in all use cases (e.g. some use cases
may want to allow cross-socket demotion to another node in the same
demotion tier as a fallback when the preferred demotion node is out of
space).  This demotion order is also inconsistent with the page
allocation fallback order when all the nodes in a higher tier are out of
space: The page allocation can fall back to any node from any lower tier,
whereas the demotion order doesn't allow that.

This patch series addresses the above by defining memory tiers explicitly.

The Linux kernel presents memory devices as NUMA nodes and each memory
device is of a specific type.  The memory type of a device is represented
by its abstract distance.  A memory tier corresponds to a range of
abstract distances.  This allows for classifying memory devices with a
specific performance range into a memory tier.

This patch configures the range/chunk size to be 128.  The default DRAM
abstract distance is 576 (4 * 128 + 64), which falls into the tier
covering 512 - 639.  We can have 4 memory tiers with a smaller abstract
distance than the default DRAM tier, with abstract distance ranges
0 - 127, 128 - 255, 256 - 383 and 384 - 511.  Faster memory devices can be
placed in these faster (higher) memory tiers.  Slower memory devices like
persistent memory will have an abstract distance higher than the default
DRAM level.
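
The chunk arithmetic can be checked with a small userspace sketch.  The
three macros are copied from include/linux/memory-tiers.h as added by this
patch; tier_start() and tier_end() are hypothetical helpers that exist only
for illustration and are not part of the patch.  tier_start() assumes a
power-of-two chunk size and non-negative input, in which case it gives the
same result as the round_down() call in find_create_memory_tier().

```c
#include <assert.h>

/* Values copied from include/linux/memory-tiers.h in this patch. */
#define MEMTIER_CHUNK_BITS	7
#define MEMTIER_CHUNK_SIZE	(1 << MEMTIER_CHUNK_BITS)
#define MEMTIER_ADISTANCE_DRAM	((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1))

/*
 * Hypothetical helper (illustration only): first abstract distance of the
 * tier that would hold @adistance.  Masking off the low bits rounds down
 * to a multiple of the power-of-two chunk size.
 */
static int tier_start(int adistance)
{
	assert(adistance >= 0);
	return adistance & ~(MEMTIER_CHUNK_SIZE - 1);
}

/* Hypothetical helper: last abstract distance covered by the same tier. */
static int tier_end(int adistance)
{
	return tier_start(adistance) + MEMTIER_CHUNK_SIZE - 1;
}
```

With these values MEMTIER_ADISTANCE_DRAM evaluates to 576, so
tier_start(MEMTIER_ADISTANCE_DRAM) is 512 and tier_end() is 639.  An
abstract distance of 300 would land in the tier that starts at 256.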

[akpm@linux-foundation.org: fix comment, per Aneesh]
Link: https://lkml.kernel.org/r/20220818131042.113280-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20220818131042.113280-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Wei Xu <weixugc@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hesham Almatary <hesham.almatary@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Jagdish Gediya <jvgediya.oss@gmail.com>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/memory-tiers.h |   18 ++++++
 mm/Makefile                  |    1 
 mm/memory-tiers.c            |  129 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 148 insertions(+)

--- /dev/null
+++ b/include/linux/memory-tiers.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MEMORY_TIERS_H
+#define _LINUX_MEMORY_TIERS_H
+
+/*
+ * Each tier covers an abstract distance chunk size of 128
+ */
+#define MEMTIER_CHUNK_BITS	7
+#define MEMTIER_CHUNK_SIZE	(1 << MEMTIER_CHUNK_BITS)
+/*
+ * Smaller abstract distance values imply faster (higher) memory tiers. Offset
+ * the DRAM adistance so that we can accommodate devices with a slightly lower
+ * adistance value (slightly faster) than default DRAM adistance to be part of
+ * the same memory tier.
+ */
+#define MEMTIER_ADISTANCE_DRAM	((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1))
+
+#endif  /* _LINUX_MEMORY_TIERS_H */
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -90,6 +90,7 @@ obj-$(CONFIG_KFENCE) += kfence/
 obj-$(CONFIG_FAILSLAB) += failslab.o
 obj-$(CONFIG_MEMTEST)		+= memtest.o
 obj-$(CONFIG_MIGRATION) += migrate.o
+obj-$(CONFIG_NUMA) += memory-tiers.o
 obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
--- /dev/null
+++ b/mm/memory-tiers.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/types.h>
+#include <linux/nodemask.h>
+#include <linux/slab.h>
+#include <linux/lockdep.h>
+#include <linux/memory-tiers.h>
+
+struct memory_tier {
+	/* hierarchy of memory tiers */
+	struct list_head list;
+	/* list of all memory types part of this tier */
+	struct list_head memory_types;
+	/*
+	 * start value of abstract distance. memory tier maps
+	 * an abstract distance  range,
+	 * adistance_start .. adistance_start + MEMTIER_CHUNK_SIZE
+	 */
+	int adistance_start;
+};
+
+struct memory_dev_type {
+	/* list of memory types that are part of same tier as this type */
+	struct list_head tier_sibiling;
+	/* abstract distance for this specific memory type */
+	int adistance;
+	/* Nodes of same abstract distance */
+	nodemask_t nodes;
+	struct memory_tier *memtier;
+};
+
+static DEFINE_MUTEX(memory_tier_lock);
+static LIST_HEAD(memory_tiers);
+static struct memory_dev_type *node_memory_types[MAX_NUMNODES];
+/*
+ * For now we can have 4 faster memory tiers with smaller adistance
+ * than default DRAM tier.
+ */
+static struct memory_dev_type default_dram_type  = {
+	.adistance = MEMTIER_ADISTANCE_DRAM,
+	.tier_sibiling = LIST_HEAD_INIT(default_dram_type.tier_sibiling),
+};
+
+static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memtype)
+{
+	bool found_slot = false;
+	struct memory_tier *memtier, *new_memtier;
+	int adistance = memtype->adistance;
+	unsigned int memtier_adistance_chunk_size = MEMTIER_CHUNK_SIZE;
+
+	lockdep_assert_held_once(&memory_tier_lock);
+
+	/*
+	 * If the memtype is already part of a memory tier,
+	 * just return that.
+	 */
+	if (memtype->memtier)
+		return memtype->memtier;
+
+	adistance = round_down(adistance, memtier_adistance_chunk_size);
+	list_for_each_entry(memtier, &memory_tiers, list) {
+		if (adistance == memtier->adistance_start) {
+			memtype->memtier = memtier;
+			list_add(&memtype->tier_sibiling, &memtier->memory_types);
+			return memtier;
+		} else if (adistance < memtier->adistance_start) {
+			found_slot = true;
+			break;
+		}
+	}
+
+	new_memtier = kmalloc(sizeof(struct memory_tier), GFP_KERNEL);
+	if (!new_memtier)
+		return ERR_PTR(-ENOMEM);
+
+	new_memtier->adistance_start = adistance;
+	INIT_LIST_HEAD(&new_memtier->list);
+	INIT_LIST_HEAD(&new_memtier->memory_types);
+	if (found_slot)
+		list_add_tail(&new_memtier->list, &memtier->list);
+	else
+		list_add_tail(&new_memtier->list, &memory_tiers);
+	memtype->memtier = new_memtier;
+	list_add(&memtype->tier_sibiling, &new_memtier->memory_types);
+	return new_memtier;
+}
+
+static struct memory_tier *set_node_memory_tier(int node)
+{
+	struct memory_tier *memtier;
+	struct memory_dev_type *memtype;
+
+	lockdep_assert_held_once(&memory_tier_lock);
+
+	if (!node_state(node, N_MEMORY))
+		return ERR_PTR(-EINVAL);
+
+	if (!node_memory_types[node])
+		node_memory_types[node] = &default_dram_type;
+
+	memtype = node_memory_types[node];
+	node_set(node, memtype->nodes);
+	memtier = find_create_memory_tier(memtype);
+	return memtier;
+}
+
+static int __init memory_tier_init(void)
+{
+	int node;
+	struct memory_tier *memtier;
+
+	mutex_lock(&memory_tier_lock);
+	/*
+	 * Look at all the existing N_MEMORY nodes and add them to
+	 * default memory tier or to a tier if we already have memory
+	 * types assigned.
+	 */
+	for_each_node_state(node, N_MEMORY) {
+		memtier = set_node_memory_tier(node);
+		if (IS_ERR(memtier))
+			/*
+			 * Continue with memtiers we are able to setup
+			 */
+			break;
+	}
+	mutex_unlock(&memory_tier_lock);
+
+	return 0;
+}
+subsys_initcall(memory_tier_init);