Thomas Bogendoerfer 62d4fc
From: Manish Chopra <manishc@marvell.com>
Date: Tue, 26 Apr 2022 08:39:13 -0700
Subject: bnx2x: fix napi API usage sequence
Patch-mainline: v5.18-rc5
Git-commit: af68656d66eda219b7f55ce8313a1da0312c79e1
References: bsc#1198217

While handling PCI errors (AER flow), the driver tries to
disable NAPI [napi_disable()] after NAPI has been deleted
[__netif_napi_del()], which causes an unexpected system
hang/crash.

System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000
[ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown
[ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected
[ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports: 'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected
[ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports: 'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log
[ 3222.583481] EEH: of node=0384:80:00.0
[ 3222.583519] EEH: PCI device/vendor: 168e14e4
[ 3222.583557] EEH: PCI cmd/status register: 00100140
[ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82
[ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000
[ 3222.583893] EEH: PCI-E 20: 00000000
[ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030
[ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000
[ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000
[ 3222.584416] EEH: of node=0384:80:00.1
[ 3222.584454] EEH: PCI device/vendor: 168e14e4
[ 3222.584491] EEH: PCI cmd/status register: 00100140
[ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82
[ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000
[ 3222.584826] EEH: PCI-E 20: 00000000
[ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030
[ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000
[ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000
[ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2
[ 3222.586873] EEH: Reset without hotplug activity
[ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142)
[ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset --> driver unload

Uninterruptible tasks
=====================
crash> ps | grep UN
     213      2  11  c000000004c89e00  UN   0.0       0      0  [eehd]
     215      2   0  c000000004c80000  UN   0.0       0      0  [kworker/0:2]
    2196      1  28  c000000004504f00  UN   0.1   15936  11136  wickedd
    4287      1   9  c00000020d076800  UN   0.0    4032   3008  agetty
    4289      1  20  c00000020d056680  UN   0.0    7232   3840  agetty
   32423      2  26  c00000020038c580  UN   0.0       0      0  [kworker/26:3]
   32871   4241  27  c0000002609ddd00  UN   0.1   18624  11648  sshd
   32920  10130  16  c00000027284a100  UN   0.1   48512  12608  sendmail
   33092  32987   0  c000000205218b00  UN   0.1   48512  12608  sendmail
   33154   4567  16  c000000260e51780  UN   0.1   48832  12864  pickup
   33209   4241  36  c000000270cb6500  UN   0.1   18624  11712  sshd
   33473  33283   0  c000000205211480  UN   0.1   48512  12672  sendmail
   33531   4241  37  c00000023c902780  UN   0.1   18624  11648  sshd

EEH handler hung while bnx2x is sleeping and holding the RTNL lock
==================================================================
crash> bt 213
PID: 213    TASK: c000000004c89e00  CPU: 11  COMMAND: "eehd"
  #0 [c000000004d477e0] __schedule at c000000000c70808
  #1 [c000000004d478b0] schedule at c000000000c70ee0
  #2 [c000000004d478e0] schedule_timeout at c000000000c76dec
  #3 [c000000004d479c0] msleep at c0000000002120cc
  #4 [c000000004d479f0] napi_disable at c000000000a06448
                                        ^^^^^^^^^^^^^^^^
  #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
  #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
  #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
  #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
  #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64

And the source code where it sleeps
===================================
crash> dis -ls c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702

   6697  {
   6698          might_sleep();
   6699          set_bit(NAPI_STATE_DISABLE, &n->state);
   6700
   6701          while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702                  msleep(1);
   6703          while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
   6704                  msleep(1);
   6705
   6706          hrtimer_cancel(&n->timer);
   6707
   6708          clear_bit(NAPI_STATE_DISABLE, &n->state);
   6709  }

Based on the system log above, EEH calls into bnx2x twice, first
through bnx2x_io_error_detected() and then through
bnx2x_io_slot_reset(), executing the following call chains:

bnx2x_io_error_detected()
  +-> bnx2x_eeh_nic_unload()
       +-> bnx2x_del_all_napi()
            +-> __netif_napi_del()

bnx2x_io_slot_reset()
  +-> bnx2x_netif_stop()
       +-> bnx2x_napi_disable()
            +-> napi_disable()

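For illustration only, the wait loop in napi_disable() quoted above can
be modeled in userspace. This is a minimal sketch, not kernel code:
struct napi_model, the MODEL_* bits, and the max_tries bound are
hypothetical names introduced here. It assumes that the SCHED bit of
the already-deleted NAPI instance stays set and that nothing will ever
clear it, which matches the backtrace but is not proven by the log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NAPI_STATE_SCHED and NAPI_STATE_NPSVC. */
enum { MODEL_SCHED = 1u << 0, MODEL_NPSVC = 1u << 1 };

struct napi_model {
	atomic_uint state;
};

/* Userspace analogue of test_and_set_bit(): returns the previous bit value. */
static bool model_test_and_set(struct napi_model *n, unsigned int bit)
{
	return (atomic_fetch_or(&n->state, bit) & bit) != 0;
}

/*
 * Bounded model of the napi_disable() wait loop. The kernel loop has no
 * retry limit and calls msleep(1) between attempts; this model gives up
 * after max_tries so that it terminates. Returns true once SCHED was
 * taken, false if SCHED never became free.
 */
static bool model_napi_disable(struct napi_model *n, int max_tries)
{
	assert(n);
	for (int i = 0; i < max_tries; i++) {
		if (!model_test_and_set(n, MODEL_SCHED)) {
			/* Simplification: NPSVC is assumed to be free. */
			model_test_and_set(n, MODEL_NPSVC);
			return true;
		}
	}
	return false;	/* the real loop would still be sleeping here */
}
```

Calling model_napi_disable() on an instance whose state is 0 succeeds
on the first attempt, while an instance that starts with MODEL_SCHED
set exhausts every retry, which corresponds to eehd sleeping in
msleep() indefinitely.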

Fix this by correcting the sequence of NAPI API usage, that is,
delete the NAPI after disabling it.

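The ordering rule can be sketched as a small lifecycle check. This is a
hedged illustration, not driver code: enum napi_phase, struct
napi_lifecycle, and the teardown_*() helpers are hypothetical, and the
model simply treats "disable after delete" as a failure instead of
reproducing the hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle phases; the kernel tracks this with state bits. */
enum napi_phase { PHASE_ENABLED, PHASE_DISABLED, PHASE_DELETED };

struct napi_lifecycle {
	enum napi_phase phase;
};

/* Disabling is only meaningful while the instance still exists. */
static bool lifecycle_disable(struct napi_lifecycle *n)
{
	assert(n);
	if (n->phase == PHASE_DELETED)
		return false;	/* pre-fix ordering ends up here */
	n->phase = PHASE_DISABLED;
	return true;
}

/* Model assumption: deletion itself performs no ordering check. */
static bool lifecycle_del(struct napi_lifecycle *n)
{
	assert(n);
	n->phase = PHASE_DELETED;
	return true;
}

/* Post-fix ordering: disable (bnx2x_netif_stop) first, then delete. */
static bool teardown_fixed(struct napi_lifecycle *n)
{
	return lifecycle_disable(n) && lifecycle_del(n);
}

/* Pre-fix ordering: delete in error_detected, disable in slot_reset. */
static bool teardown_broken(struct napi_lifecycle *n)
{
	return lifecycle_del(n) && lifecycle_disable(n);
}
```

With an instance that starts in PHASE_ENABLED, teardown_fixed() returns
true and leaves it in PHASE_DELETED, whereas teardown_broken() returns
false because the disable step arrives after the delete.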

Fixes: 7fa6f34081f1 ("bnx2x: AER revised")
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -14195,10 +14195,6 @@ static int bnx2x_eeh_nic_unload(struct b
 
 	/* Stop Tx */
 	bnx2x_tx_disable(bp);
-	/* Delete all NAPI objects */
-	bnx2x_del_all_napi(bp);
-	if (CNIC_LOADED(bp))
-		bnx2x_del_all_napi_cnic(bp);
 	netdev_reset_tc(bp->dev);
 
 	del_timer_sync(&bp->timer);
@@ -14303,6 +14299,11 @@ static pci_ers_result_t bnx2x_io_slot_re
 		bnx2x_drain_tx_queues(bp);
 		bnx2x_send_unload_req(bp, UNLOAD_RECOVERY);
 		bnx2x_netif_stop(bp, 1);
+		bnx2x_del_all_napi(bp);
+
+		if (CNIC_LOADED(bp))
+			bnx2x_del_all_napi_cnic(bp);
+
 		bnx2x_free_irq(bp);
 
 		/* Report UNLOAD_DONE to MCP */