Blob Blame History Raw
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Mon, 22 Jun 2020 19:35:23 +0800
Subject: PCI/ERR: Clear PCIe Device Status errors only if OS owns AER
Git-commit: 068c29a248b6ddbfdf7bb150b547569759620d36
Patch-mainline: v5.9-rc1
References: bsc#1174426

pcie_clear_device_status() resets the error bits in the PCIe Device Status
Register (PCI_EXP_DEVSTA).

Previously we did this unconditionally, but on ACPI systems, the _OSC AER
bit negotiates control of the AER capability.  Per sec 4.5.1 of the System
Firmware Intermediary _OSC and DPC Updates ECN [1], this bit also covers
other error enable/status bits including the following:

  Correctable Error Reporting Enable
  Non-Fatal Error Reporting Enable
  Fatal Error Reporting Enable
  Unsupported Request Reporting Enable

These bits are all in the PCIe Device Control register (the ECN omitted
"Reporting", but I think that's a typo), so by implication the _OSC AER bit
also applies to the error status bits in the PCIe Device Status register:

  Correctable Error Detected
  Non-Fatal Error Detected
  Fatal Error Detected
  Unsupported Request Detected

Clear the PCIe Device Status error bits only when the OS controls the AER
capability and related error enable/status bits.  If platform firmware
controls the AER capability, firmware is responsible for clearing these
bits.

One call path leading here is:

  ghes_do_proc
    ghes_handle_aer
      aer_recover_queue
        schedule_work(&aer_recover_work)
  ...
  aer_recover_work_func
    pcie_do_recovery
      pcie_clear_device_status

[1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
    2020, affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/14076
[bhelgaas: commit log, move test from pcie_clear_device_status() to callers]
Link: https://lore.kernel.org/r/20200622113523.891666-1-Jonathan.Cameron@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/pci/pcie/aer.c |    3 ++-
 drivers/pci/pcie/err.c |    3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -882,7 +882,8 @@ static void handle_error_source(struct p
 		if (aer)
 			pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
 					info->status);
-		pci_aer_clear_device_status(dev);
+		if (pcie_aer_is_native(dev))
+			pci_aer_clear_device_status(dev);
 	} else if (info->severity == AER_NONFATAL)
 		pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
 	else if (info->severity == AER_FATAL)
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -193,7 +193,8 @@ pci_ers_result_t pcie_do_recovery(struct
 	pci_dbg(dev, "broadcast resume message\n");
 	pci_walk_bus(bus, report_resume, &status);
 
-	pci_aer_clear_device_status(dev);
+	if (pcie_aer_is_native(dev))
+		pci_aer_clear_device_status(dev);
 	pci_aer_clear_nonfatal_status(dev);
 	pci_info(dev, "AER: Device recovery successful\n");
 	return status;