Blob Blame History Raw
From a9b39a7419c43d789fe30364f9fb322215f1e60b Mon Sep 17 00:00:00 2001
From: Hillf Danton <hdanton@sina.com>
Date: Mon, 23 Sep 2019 15:37:26 -0700
Subject: [PATCH] mm, reclaim: make should_continue_reclaim perform dryrun
 detection

References: bnc#1155780 (VM/FS functional and performance backports)
Patch-mainline: v5.4-rc1
Git-commit: 1c6c15971e4709953f75082a5d44212536b1c2b7

Patch series "address hugetlb page allocation stalls", v2.

Allocation of hugetlb pages via sysctl or procfs can stall for minutes or
hours.  A simple example on a two node system with 8GB of memory is as
follows:

echo 4096 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
echo 4096 > /proc/sys/vm/nr_hugepages

Obviously, both allocation attempts will fall short of their 8GB goal.
However, one or both of these commands may stall and not be interruptible.
The issues were initially discussed in mail thread [1] and RFC code at
[2].

This series addresses the issues causing the stalls.  There are two
distinct fixes, a cleanup, and an optimization.  The reclaim patch by
Hillf and compaction patch by Vlasitmil address corner cases in their
respective areas.  hugetlb page allocation could stall due to either of
these issues.  Vlasitmil added a cleanup patch after Hillf's
modifications.  The hugetlb patch by Mike is an optimization suggested
during the debug and development process.

[1] http://lkml.kernel.org/r/d38a095e-dc39-7e82-bb76-2c9247929f07@oracle.com
[2] http://lkml.kernel.org/r/20190724175014.9935-1-mike.kravetz@oracle.com

This patch (of 4):

Address the issue of should_continue_reclaim returning true too often for
__GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.  This was
observed during hugetlb page allocation causing stalls for minutes or
hours.

We can stop reclaiming pages if compaction reports it can make a progress.
There might be side-effects for other high-order allocations that would
potentially benefit from reclaiming more before compaction so that they
would be faster and less likely to stall.  However, the consequences of
premature/over-reclaim are considered worse.

We can also bail out of reclaiming pages if we know that there are not
enough inactive lru pages left to satisfy the costly allocation.

We can give up reclaiming pages too if we see dryrun occur, with the
certainty of plenty of inactive pages.  IOW with dryrun detected, we are
sure we have reclaimed as many pages as we could.

Link: http://lkml.kernel.org/r/20190806014744.15446-2-mike.kravetz@oracle.com
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3944acd94764..c4be05cc681d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2621,18 +2621,6 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 			return false;
 	}
 
-	/*
-	 * If we have not reclaimed enough pages for compaction and the
-	 * inactive lists are large enough, continue reclaiming
-	 */
-	pages_for_compaction = compact_gap(sc->order);
-	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
-		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
-	if (sc->nr_reclaimed < pages_for_compaction &&
-			inactive_lru_pages > pages_for_compaction)
-		return true;
-
 	/* If compaction would go ahead or the allocation would succeed, stop */
 	for (z = 0; z <= sc->reclaim_idx; z++) {
 		struct zone *zone = &pgdat->node_zones[z];
@@ -2648,7 +2636,21 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 			;
 		}
 	}
-	return true;
+
+	/*
+	 * If we have not reclaimed enough pages for compaction and the
+	 * inactive lists are large enough, continue reclaiming
+	 */
+	pages_for_compaction = compact_gap(sc->order);
+	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
+	if (get_nr_swap_pages() > 0)
+		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+	return inactive_lru_pages > pages_for_compaction &&
+		/*
+		 * avoid dryrun with plenty of inactive pages
+		 */
+		nr_scanned && nr_reclaimed;
 }
 
 static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)