From bc5fa217709124797c98bd602b55de7fec34d150 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 7 Nov 2018 10:26:07 +0100
Subject: [PATCH 5/5] mm, memory_hotplug: be more verbose for memory offline
 failures
Git-commit: 2932c8b05056d4ba702f70f4deebe1c97600e62b
Patch-mainline: v5.0-rc1
References: generic hotplug debugability

There is only very limited information printed when memory offlining
fails:
[ 1984.506184] rac1 kernel: memory offlining [mem 0x82600000000-0x8267fffffff] failed due to signal backoff

This tells us that the failure was triggered by userspace intervention,
but it doesn't tell us much about the underlying reason. It might be
that page migration fails repeatedly until the userspace timeout expires
and sends a signal, or it might be that one of the earlier steps
(isolation, memory notifier) takes too long.

If the migration fails then it would be really helpful to see which
page that is and its state. The same applies to the isolation phase. If
we fail to isolate a page from the allocator then knowing the state of
the page would be helpful as well.

Dump the state of the page that fails to get isolated or migrated. This
will tell us more about the failure and what to focus on during
debugging.
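
For illustration only, the single-exit reporting pattern used in
has_unmovable_pages() can be sketched in userspace as below. This is a
mock under stated assumptions, not kernel code: struct mock_page,
mock_dump_page(), mock_dump_count and mock_has_unmovable_pages() are
hypothetical stand-ins for struct page, dump_page() and
has_unmovable_pages(), and the "movable" flag replaces the real checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct mock_page {
	unsigned long pfn;
	int refcount;
	bool movable;	/* assumption: replaces the real movability checks */
};

/* Number of dumps so far, so a caller can check a failure was reported. */
static int mock_dump_count;

/* Stand-in for dump_page(): print the page state and the reason. */
static void mock_dump_page(const struct mock_page *page, const char *reason)
{
	mock_dump_count++;
	fprintf(stderr, "page dumped because: %s (pfn %lx refcount %d)\n",
		reason, page->pfn, page->refcount);
}

/*
 * Return true if any page in the range is unmovable. Every failing
 * path jumps to one label, so the offending page is always dumped
 * before the function returns.
 */
static bool mock_has_unmovable_pages(const struct mock_page *pages, size_t nr)
{
	size_t iter;

	assert(pages != NULL || nr == 0);

	for (iter = 0; iter < nr; iter++) {
		if (!pages[iter].movable)
			goto unmovable;
	}
	return false;
unmovable:
	mock_dump_page(&pages[iter], "unmovable page");
	return true;
}
```

Routing every "return true" through one label keeps the dump in a single
place, so a failure path added later cannot forget to report the page.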

Signed-off-by: Michal Hocko <mhocko@suse.com>

---
 mm/memory_hotplug.c |   12 ++++++++----
 mm/page_alloc.c     |    7 +++++--
 2 files changed, 13 insertions(+), 6 deletions(-)

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1522,10 +1522,8 @@ do_migrate_range(unsigned long start_pfn
 						    page_is_file_cache(page));
 
 		} else {
-#ifdef CONFIG_DEBUG_VM
-			pr_alert("failed to isolate pfn %lx\n", pfn);
+			pr_warn("failed to isolate pfn %lx\n", pfn);
 			dump_page(page, "isolation failed");
-#endif
 			put_page(page);
 			/* Because we don't have big zone->lock. we should
 			   check this again here. */
@@ -1545,8 +1543,14 @@ do_migrate_range(unsigned long start_pfn
 		/* Allocate a new page from the nearest neighbor node */
 		ret = migrate_pages(&source, new_node_page, NULL, 0,
 					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
-		if (ret)
+		if (ret) {
+			list_for_each_entry(page, &source, lru) {
+				pr_warn("migrating pfn %lx failed ret:%d ",
+				       page_to_pfn(page), ret);
+				dump_page(page, "migration failure");
+			}
 			putback_movable_pages(&source);
+		}
 	}
 out:
 	return ret;
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7426,7 +7426,7 @@ bool has_unmovable_pages(struct zone *zo
 			unsigned int skip_pages;
 
 			if (!hugepage_migration_supported(page_hstate(head)))
-				return true;
+				goto unmovable;
 
 			skip_pages = (1 << compound_order(head)) - (page - head);
 			iter += skip_pages - 1;
@@ -7471,9 +7471,12 @@ bool has_unmovable_pages(struct zone *zo
 		 * page at boot.
 		 */
 		if (found > count)
-			return true;
+			goto unmovable;
 	}
 	return false;
+unmovable:
+	dump_page(pfn_to_page(pfn+iter), "unmovable page");
+	return true;
 }
 
 bool is_pageblock_removable_nolock(struct page *page)