From: Yazen Ghannam <yazen.ghannam@amd.com>
Date: Mon, 22 Jan 2024 22:14:01 -0600
Subject: Documentation: RAS: Add index and address translation section
Git-commit: 1289c431641f8beacc47db506210154dcea2492a
Patch-mainline: v6.9-rc1
References: jsc#PED-7618

There are a lot of RAS topics to document, and there are a lot of details
for each topic.

Prep for this by adding an index for the RAS directory. This will
provide a top-level document and table of contents. It also provides the
option to build the RAS directory individually using "make SPHINXDIRS=".

Start a section on address translation. This will be expanded with
details for future translation methods and how they're used in the
kernel.

Move the error decoding topic to its own section. Links to other error
decoding kernel docs will be added.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240123041401.79812-4-yazen.ghannam@amd.com

Acked-by: Nikolay Borisov <nik.borisov@suse.com>
---
 Documentation/RAS/address-translation.rst |   24 ++++++++++++++++++++++++
 Documentation/RAS/error-decoding.rst      |   21 +++++++++++++++++++++
 Documentation/RAS/index.rst               |   14 ++++++++++++++
 Documentation/RAS/ras.rst                 |   26 --------------------------
 Documentation/index.rst                   |    2 +-
 MAINTAINERS                               |    1 +
 6 files changed, 61 insertions(+), 27 deletions(-)

--- /dev/null
+++ b/Documentation/RAS/address-translation.rst
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Address translation
+===================
+
+x86 AMD
+-------
+
+Zen-based AMD systems include a Data Fabric that manages the layout of
+physical memory. Devices attached to the Fabric, like memory controllers,
+I/O, etc., may not have a complete view of the system physical memory map.
+These devices may provide a "normalized", i.e. device physical, address
+when reporting memory errors. Normalized addresses must be translated to
+a system physical address for the kernel to act on the memory.
+
+AMD Address Translation Library (CONFIG_AMD_ATL) provides translation for
+this case.
+
+Glossary of acronyms used in address translation for Zen-based systems
+
+* CCM               = Cache Coherent Moderator
+* COD               = Cluster-on-Die
+* COH_ST            = Coherent Station
+* DF                = Data Fabric
--- /dev/null
+++ b/Documentation/RAS/error-decoding.rst
@@ -0,0 +1,21 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Error decoding
+==============
+
+x86
+---
+
+Error decoding on AMD systems should be done using the rasdaemon tool:
+https://github.com/mchehab/rasdaemon/
+
+While the daemon is running, it automatically logs and decodes
+errors. Otherwise, one can still decode such errors by supplying the
+hardware information from the error::
+
+        $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca
+
+Also, the user can pass a particular family and model to decode the error
+string::
+
+        $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM>
--- /dev/null
+++ b/Documentation/RAS/index.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================================
+Reliability, Availability and Serviceability (RAS) features
+===========================================================
+
+This documents different aspects of the RAS functionality present in the
+kernel.
+
+.. toctree::
+   :maxdepth: 2
+
+   error-decoding
+   address-translation
--- a/Documentation/RAS/ras.rst
+++ /dev/null
@@ -1,26 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-Reliability, Availability and Serviceability features
-=====================================================
-
-This documents different aspects of the RAS functionality present in the
-kernel.
-
-Error decoding
----------------
-
-* x86
-
-Error decoding on AMD systems should be done using the rasdaemon tool:
-https://github.com/mchehab/rasdaemon/
-
-While the daemon is running, it would automatically log and decode
-errors. If not, one can still decode such errors by supplying the
-hardware information from the error::
-
-        $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca
-
-Also, the user can pass particular family and model to decode the error
-string::
-
-        $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM>
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -165,7 +165,7 @@ to ReStructured Text format, or are simp
    :maxdepth: 2
 
    staging/index
-   RAS/ras
+   RAS/index
 
 
 Translations
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15702,6 +15702,7 @@ M:	Tony Luck <tony.luck@intel.com>
 M:	Borislav Petkov <bp@alien8.de>
 L:	linux-edac@vger.kernel.org
 S:	Maintained
+F:	Documentation/RAS/
 F:	Documentation/admin-guide/ras.rst
 F:	drivers/ras/
 F:	include/linux/ras.h