| From 3f8dec116210ca649163574ed5f8df1e3b837d07 Mon Sep 17 00:00:00 2001 |
| From: Darren Hart <darren@os.amperecomputing.com> |
| Date: Tue, 8 Mar 2022 10:50:48 -0800 |
| Subject: [PATCH] ACPI/APEI: Limit printable size of BERT table data |
| Git-commit: 3f8dec116210ca649163574ed5f8df1e3b837d07 |
| Patch-mainline: v5.18-rc1 |
| References: git-fixes |
| |
| Platforms with large BERT table data can trigger soft lockup errors |
| while attempting to print the entire BERT table data to the console at |
| boot: |
| |
| watchdog: BUG: soft lockup - CPU#160 stuck for 23s! [swapper/0:1] |
| |
| Observed on Ampere Altra systems with a single BERT record of ~250KB. |
| |
| The original bert driver appears to have assumed relatively small table |
| data. Since it is impractical to reassemble large table data from |
| interwoven console messages, and the table data is available in |
| |
| /sys/firmware/acpi/tables/data/BERT |
| |
| limit the size for tables printed to the console to 1024 (for no reason |
| other than it seemed like a good place to kick off the discussion, would |
| appreciate feedback from existing users in terms of what size would |
| maintain their current usage model). |
| |
| Alternatively, we could make printing a CONFIG option, use the |
| bert_disable boot arg (or something similar), or use a debug log level. |
| However, all those solutions require extra steps or change the existing |
| behavior for small table data. Limiting the size preserves existing |
| behavior on existing platforms with small table data, and eliminates the |
| soft lockups for platforms with large table data, while still making it |
| available. |
| |
| Signed-off-by: Darren Hart <darren@os.amperecomputing.com> |
| Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
| Acked-by: Takashi Iwai <tiwai@suse.de> |
| |
| --- |
| drivers/acpi/apei/bert.c | 8 ++++++-- |
| 1 file changed, 6 insertions(+), 2 deletions(-) |
| |
| diff --git a/drivers/acpi/apei/bert.c b/drivers/acpi/apei/bert.c |
| index 86211422f4ee..598fd19b65fa 100644 |
| --- a/drivers/acpi/apei/bert.c |
| +++ b/drivers/acpi/apei/bert.c |
| @@ -29,6 +29,7 @@ |
| |
| #undef pr_fmt |
| #define pr_fmt(fmt) "BERT: " fmt |
| +#define ACPI_BERT_PRINT_MAX_LEN 1024 |
| |
| static int bert_disable; |
| |
| @@ -58,8 +59,11 @@ static void __init bert_print_all(struct acpi_bert_region *region, |
| } |
| |
| pr_info_once("Error records from previous boot:\n"); |
| - |
| - cper_estatus_print(KERN_INFO HW_ERR, estatus); |
| + if (region_len < ACPI_BERT_PRINT_MAX_LEN) |
| + cper_estatus_print(KERN_INFO HW_ERR, estatus); |
| + else |
| + pr_info_once("Max print length exceeded, table data is available at:\n" |
| + "/sys/firmware/acpi/tables/data/BERT"); |
| |
| /* |
| * Because the boot error source is "one-time polled" type, |
| -- |
| 2.31.1 |
| |
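The size gate added to bert_print_all() can be tried outside the kernel with a minimal userspace sketch. It assumes only the limit and the strict `<` comparison shown in the hunk above. The helper names `bert_would_print` and `bert_console_action` are hypothetical and do not exist in drivers/acpi/apei/bert.c.

```c
#include <stddef.h>

/* Same limit as the patch defines. */
#define ACPI_BERT_PRINT_MAX_LEN 1024

/*
 * Hypothetical helper (not kernel code): mirrors the condition
 * "region_len < ACPI_BERT_PRINT_MAX_LEN" from the patch.
 * Returns 1 when the full record would be printed to the console,
 * 0 when only the pointer to the sysfs file would be printed.
 */
static int bert_would_print(size_t region_len)
{
	return region_len < ACPI_BERT_PRINT_MAX_LEN;
}

/*
 * Hypothetical helper: describes what the console would show
 * for a region of the given size.
 */
static const char *bert_console_action(size_t region_len)
{
	if (bert_would_print(region_len))
		return "print full error record";
	return "print pointer to /sys/firmware/acpi/tables/data/BERT";
}
```

Because the comparison is strict, a region of exactly 1024 bytes takes the else branch and is not printed. A region of about 250KB, as in the Ampere Altra report, is likewise skipped.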