Blob Blame History Raw
From: Michal Hocko <mhocko@suse.cz>
Subject: pagecache_limit: batch large nr_to_scan targets
Patch-mainline: never, SUSE specific
References: bnc#895221

Although pagecache_limit is expected to be set before the load is started there
seem to be a user which sets the limit after the machine is short on memory and
has a large amount of the page cache already. Although such a usage is dubious
at best we still shouldn't fall flat under such conditions.

We had a report where a machine with 512GB of RAM crashed as a result of hard
lockup detector (which is on by default) when the limit was set to 1G because
the page cache reclaimer got a target of 22M pages and isolate_lru_pages simply
starved other spinners for too long.

This patch batches the scan target by SWAP_CLUSTER_MAX chunks to prevent
from such issues. It should be still noted that setting limit to a
very small value wrt. an already large page cache target is dangerous and can
lead to big reclaim storms and page cache over-reclaim.

Signed-off-by: Michal Hocko <mhocko@suse.cz>

---
 mm/vmscan.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3766,9 +3766,6 @@ static int shrink_all_nodes(unsigned lon
 				unsigned long scanned = sc->nr_scanned;
 				struct reclaim_state *old_rs = current->reclaim_state;
 
-				/* shrink_list takes lru_lock with IRQ off so we
-				 * should be careful about really huge nr_to_scan
-				 */
 				nr_to_scan = min(nr_pages, lru_pages);
 
 				/*
@@ -3780,10 +3777,18 @@ static int shrink_all_nodes(unsigned lon
 				 *
 				 * Let's stick with this for bug-to-bug compatibility
 				 */
-				if (shrink_node_per_memcg(pgdat, lru,
-					nr_to_scan, nr_pages, &nr_reclaimed, sc)) {
-					pagecache_reclaim_unlock_node(pgdat);
-					goto out_wakeup;
+				while (nr_to_scan > 0) {
+					/* shrink_list takes lru_lock with IRQ off so we
+					 * should be careful about really huge nr_to_scan
+					 */
+					unsigned long batch = min_t(unsigned long, nr_to_scan, SWAP_CLUSTER_MAX);
+
+					if (shrink_node_per_memcg(pgdat, lru,
+						batch, nr_pages, &nr_reclaimed, sc)) {
+						pagecache_reclaim_unlock_node(pgdat);
+						goto out_wakeup;
+					}
+					nr_to_scan -= batch;
 				}
 
 				current->reclaim_state = &reclaim_state;