From 67df77845c181166d4bc324cbb0382f7e81c7631 Mon Sep 17 00:00:00 2001
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Date: Mon, 17 Aug 2020 11:22:57 +0530
Subject: [PATCH] powerpc/numa: Restrict possible nodes based on platform

References: bsc#1209999 ltc#202140 bsc#1142685 ltc#179509 FATE#327775 git-fixes
Patch-mainline: v5.10-rc1
Git-commit: 67df77845c181166d4bc324cbb0382f7e81c7631

As per draft LoPAPR (Revision 2.9_pre7), section B.5.3 "Run Time
Abstraction Services (RTAS) Node" available at:
  https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf

... there are 2 device tree properties:

  "ibm,max-associativity-domains"
   which defines the maximum number of domains that the firmware, i.e.
   PowerVM, can support.

and:

  "ibm,current-associativity-domains"
   which defines the maximum number of domains that the current
   platform can support.

The value of "ibm,max-associativity-domains" is always greater than or
equal to that of the "ibm,current-associativity-domains" property. If
the latter property is not available, use
"ibm,max-associativity-domains" as a fallback. In this yet to be
released LoPAPR, "ibm,current-associativity-domains" is mentioned on
page 833 / B.5.3, which is covered under the "Appendix B. System
Binding" section.

Currently powerpc uses the "ibm,max-associativity-domains" property
while setting the possible number of nodes. This is currently set at
32. However the possible number of nodes for a platform may be
significantly less. Hence set the possible number of nodes based on
the "ibm,current-associativity-domains" property.
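
The lookup order described above can be sketched in plain user-space C.
This is only an illustration under stated assumptions, not the kernel
code: "struct domains", read_cell() and pick_numnodes() are hypothetical
names, and the cell arrays stand in for already decoded device tree
properties.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a decoded property; cells == NULL means absent. */
struct domains {
	const uint32_t *cells;
	size_t ncells;
};

/*
 * Read the cell at 'index'. Returns 0 on success, -1 if the property is
 * absent or has fewer than index + 1 cells.
 */
static int read_cell(const struct domains *prop, size_t index, uint32_t *out)
{
	if (!prop || !prop->cells || index >= prop->ncells)
		return -1;
	*out = prop->cells[index];
	return 0;
}

/*
 * Prefer the "current" property and fall back on "max" only when the
 * "current" lookup fails. Returns -1 when neither lookup succeeds.
 */
static int pick_numnodes(const struct domains *cur, const struct domains *max,
			 size_t depth, uint32_t *numnodes)
{
	if (read_cell(cur, depth, numnodes) == 0)
		return 0;
	return read_cell(max, depth, numnodes);
}
```

When neither property can be read, the caller is expected to leave the
set of possible nodes untouched.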

Nathan Lynch had raised a valid concern that post LPM (Live Partition
Migration), a user could DLPAR add processors and memory after LPM
with "new" associativity properties:
  https://lore.kernel.org/linuxppc-dev/871rljfet9.fsf@linux.ibm.com/t/#u

He also pointed out that "ibm,max-associativity-domains" has the same
contents on all currently available PowerVM systems, unlike
"ibm,current-associativity-domains" and hence may be better able to
handle the new NUMA associativity properties.

However with the recent commit dbce45628085 ("powerpc/numa: Limit
possible nodes to within num_possible_nodes"), all new NUMA
associativity properties are capped to the initially set nr_node_ids.
Hence this commit should be safe with any new DLPAR add post LPM.

  $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
  /proc/device-tree/rtas/ibm,current-associativity-domains
  		 00000005 00000001 00000002 00000002 00000002 00000010
  /proc/device-tree/rtas/ibm,max-associativity-domains
  		 00000005 00000001 00000008 00000020 00000020 00000100

  $ cat /sys/devices/system/node/possible ##Before patch
  0-31

  $ cat /sys/devices/system/node/possible ##After patch
  0-1

Note that this platform can support at most 2 nodes, but without the
patch the number of possible nodes is set to 32.
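
To show where 32 and 2 come from, the sketch below decodes the two
property values from the lsprop output above. Each property is an array
of big-endian 32-bit cells, and the kernel reads the cell at index
min_common_depth. The value of min_common_depth is not shown in this
log; 4 is an assumption that happens to be consistent with the
before/after output. The helper name cell_at() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw property bytes, copied from the lsprop output (big-endian cells). */
static const unsigned char current_domains[] = {
	0, 0, 0, 0x05,  0, 0, 0, 0x01,  0, 0, 0, 0x02,
	0, 0, 0, 0x02,  0, 0, 0, 0x02,  0, 0, 0, 0x10,
};
static const unsigned char max_domains[] = {
	0, 0, 0, 0x05,  0, 0, 0, 0x01,  0, 0, 0, 0x08,
	0, 0, 0, 0x20,  0, 0, 0, 0x20,  0, 0, 0x01, 0x00,
};

/*
 * Decode the big-endian 32-bit cell at 'index'. Returns 0 on success,
 * -1 if the buffer does not contain that cell.
 */
static int cell_at(const unsigned char *buf, size_t len, size_t index,
		   uint32_t *out)
{
	size_t off;

	if (len < 4 || index > (len - 4) / 4)
		return -1;
	off = index * 4;
	*out = ((uint32_t)buf[off] << 24) | ((uint32_t)buf[off + 1] << 16) |
	       ((uint32_t)buf[off + 2] << 8) | (uint32_t)buf[off + 3];
	return 0;
}
```

With the assumed index 4, the "max" property yields 0x20 = 32 (nodes
0-31) and the "current" property yields 2 (nodes 0-1), matching the two
listings of /sys/devices/system/node/possible.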

This is important because a lot of kernel and user space code allocates
structures for all possible nodes, leading to a lot of memory that is
allocated but not used.

I ran a simple experiment to create and destroy 100 memory cgroups on
boot on an 8 node machine (Power8 Alpine).

Before patch:
  free -k at boot
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4106816   518820608       22272      570752   516606720
  Swap:       4194240           0     4194240

  free -k after creating 100 memory cgroups
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4628416   518246464       22336      623296   516058688
  Swap:       4194240           0     4194240

  free -k after destroying 100 memory cgroups
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4697408   518173760       22400      627008   515987904
  Swap:       4194240           0     4194240

After patch:
  free -k at boot
                total        used        free      shared  buff/cache   available
  Mem:      523498176     3969472   518933888       22272      594816   516731776
  Swap:       4194240           0     4194240

  free -k after creating 100 memory cgroups
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4181888   518676096       22208      640192   516496448
  Swap:       4194240           0     4194240

  free -k after destroying 100 memory cgroups
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4232320   518619904       22272      645952   516443264
  Swap:       4194240           0     4194240

Observations:
  Fixed kernel takes 137344 kB (4106816-3969472) less to boot.
  Fixed kernel takes 309184 kB (4628416-4181888-137344) less to create 100 memcgs.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[mpe: Reformat change log a bit for readability]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200817055257.110873-1-srikar@linux.vnet.ibm.com
Acked-by: Michal Suchanek <msuchanek@suse.de>
---
 arch/powerpc/mm/numa.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 1f61fa2148b5..5ddc83ba20f4 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -900,10 +900,19 @@ static void __init find_possible_nodes(void)
 	if (!rtas)
 		return;
 
-	if (of_property_read_u32_index(rtas,
-				"ibm,max-associativity-domains",
+	if (of_property_read_u32_index(rtas, "ibm,current-associativity-domains",
+				min_common_depth, &numnodes)) {
+		/*
+		 * ibm,current-associativity-domains is a fairly recent
+		 * property. If it doesn't exist, then fallback on
+		 * ibm,max-associativity-domains. Current denotes what the
+		 * platform can support compared to max which denotes what the
+		 * Hypervisor can support.
+		 */
+		if (of_property_read_u32_index(rtas, "ibm,max-associativity-domains",
 				min_common_depth, &numnodes))
-		goto out;
+			goto out;
+	}
 
 	for (i = 0; i < numnodes; i++) {
 		if (!node_possible(i))
-- 
2.40.0