Blob Blame History Raw
From: Dust Li <dust.li@linux.alibaba.com>
Date: Mon, 27 Dec 2021 20:38:06 +0800
Subject: RDMA/mlx5: Print wc status on CQE error and dump needed
Patch-mainline: v5.17-rc1
Git-commit: a7ad9ddeb528b91de03cedeef34532dc0ba77bfd
References: jsc#PED-1552

mlx5_handle_error_cqe() only dump the content of the CQE which is raw hex
data, and not straighforward for debug.  Print WC status message when we
got CQE error and dump is need.

Here is an example of how the dmesg log looks like with this:

 infiniband mlx5_0: mlx5_handle_error_cqe:333:(pid 0): WC error: 10, message: remote access error
 infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe
 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00000030: 00 00 00 00 00 00 88 13 08 03 61 b3 1e a1 42 d3

Link: https://lore.kernel.org/r/20211227123806.47530-1-dust.li@linux.alibaba.com
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
---
 drivers/infiniband/hw/mlx5/cq.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -328,8 +328,11 @@ static void mlx5_handle_error_cqe(struct
 	}
 
 	wc->vendor_err = cqe->vendor_err_synd;
-	if (dump)
+	if (dump) {
+		mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
+			     ib_wc_status_msg(wc->status));
 		dump_cqe(dev, cqe);
+	}
 }
 
 static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,