|
Thomas Bogendoerfer |
62d4fc |
From: Manish Chopra <manishc@marvell.com>
|
|
Thomas Bogendoerfer |
62d4fc |
Date: Tue, 26 Apr 2022 08:39:13 -0700
|
|
Thomas Bogendoerfer |
62d4fc |
Subject: bnx2x: fix napi API usage sequence
|
|
Thomas Bogendoerfer |
62d4fc |
Patch-mainline: v5.18-rc5
|
|
Thomas Bogendoerfer |
62d4fc |
Git-commit: af68656d66eda219b7f55ce8313a1da0312c79e1
|
|
Thomas Bogendoerfer |
62d4fc |
References: bsc#1198217
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
While handling PCI errors (AER flow) driver tries to
|
|
Thomas Bogendoerfer |
62d4fc |
disable NAPI [napi_disable()] after NAPI is deleted
|
|
Thomas Bogendoerfer |
62d4fc |
[__netif_napi_del()] which causes unexpected system
|
|
Thomas Bogendoerfer |
62d4fc |
hang/crash.
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
System message log shows the following:
|
|
Thomas Bogendoerfer |
62d4fc |
=======================================
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x->error_detected(IO frozen)
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports:
|
|
Thomas Bogendoerfer |
62d4fc |
'need reset'
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x->error_detected(IO frozen)
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports:
|
|
Thomas Bogendoerfer |
62d4fc |
'need reset'
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor: 168e14e4 [ 3222.583557] EEH: PCI cmd/status register: 00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow:
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20: 00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows:
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor: 168e14e4 [ 3222.584491] EEH: PCI cmd/status register: 00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow:
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20: 00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows:
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset'
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x->slot_reset()
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
|
|
Thomas Bogendoerfer |
62d4fc |
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset
|
|
Thomas Bogendoerfer |
62d4fc |
--> driver unload
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
Uninterruptible tasks
|
|
Thomas Bogendoerfer |
62d4fc |
=====================
|
|
Thomas Bogendoerfer |
62d4fc |
crash> ps | grep UN
|
|
Thomas Bogendoerfer |
62d4fc |
213 2 11 c000000004c89e00 UN 0.0 0 0 [eehd]
|
|
Thomas Bogendoerfer |
62d4fc |
215 2 0 c000000004c80000 UN 0.0 0 0
|
|
Thomas Bogendoerfer |
62d4fc |
[kworker/0:2]
|
|
Thomas Bogendoerfer |
62d4fc |
2196 1 28 c000000004504f00 UN 0.1 15936 11136 wickedd
|
|
Thomas Bogendoerfer |
62d4fc |
4287 1 9 c00000020d076800 UN 0.0 4032 3008 agetty
|
|
Thomas Bogendoerfer |
62d4fc |
4289 1 20 c00000020d056680 UN 0.0 7232 3840 agetty
|
|
Thomas Bogendoerfer |
62d4fc |
32423 2 26 c00000020038c580 UN 0.0 0 0
|
|
Thomas Bogendoerfer |
62d4fc |
[kworker/26:3]
|
|
Thomas Bogendoerfer |
62d4fc |
32871 4241 27 c0000002609ddd00 UN 0.1 18624 11648 sshd
|
|
Thomas Bogendoerfer |
62d4fc |
32920 10130 16 c00000027284a100 UN 0.1 48512 12608 sendmail
|
|
Thomas Bogendoerfer |
62d4fc |
33092 32987 0 c000000205218b00 UN 0.1 48512 12608 sendmail
|
|
Thomas Bogendoerfer |
62d4fc |
33154 4567 16 c000000260e51780 UN 0.1 48832 12864 pickup
|
|
Thomas Bogendoerfer |
62d4fc |
33209 4241 36 c000000270cb6500 UN 0.1 18624 11712 sshd
|
|
Thomas Bogendoerfer |
62d4fc |
33473 33283 0 c000000205211480 UN 0.1 48512 12672 sendmail
|
|
Thomas Bogendoerfer |
62d4fc |
33531 4241 37 c00000023c902780 UN 0.1 18624 11648 sshd
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
EEH handler hung while bnx2x sleeping and holding RTNL lock
|
|
Thomas Bogendoerfer |
62d4fc |
===========================================================
|
|
Thomas Bogendoerfer |
62d4fc |
crash> bt 213
|
|
Thomas Bogendoerfer |
62d4fc |
PID: 213 TASK: c000000004c89e00 CPU: 11 COMMAND: "eehd"
|
|
Thomas Bogendoerfer |
62d4fc |
#0 [c000000004d477e0] __schedule at c000000000c70808
|
|
Thomas Bogendoerfer |
62d4fc |
#1 [c000000004d478b0] schedule at c000000000c70ee0
|
|
Thomas Bogendoerfer |
62d4fc |
#2 [c000000004d478e0] schedule_timeout at c000000000c76dec
|
|
Thomas Bogendoerfer |
62d4fc |
#3 [c000000004d479c0] msleep at c0000000002120cc
|
|
Thomas Bogendoerfer |
62d4fc |
#4 [c000000004d479f0] napi_disable at c000000000a06448
|
|
Thomas Bogendoerfer |
62d4fc |
^^^^^^^^^^^^^^^^
|
|
Thomas Bogendoerfer |
62d4fc |
#5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
|
|
Thomas Bogendoerfer |
62d4fc |
#6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
|
|
Thomas Bogendoerfer |
62d4fc |
#7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
|
|
Thomas Bogendoerfer |
62d4fc |
#8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
|
|
Thomas Bogendoerfer |
62d4fc |
#9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
And the sleeping source code
|
|
Thomas Bogendoerfer |
62d4fc |
============================
|
|
Thomas Bogendoerfer |
62d4fc |
crash> dis -ls c000000000a06448
|
|
Thomas Bogendoerfer |
62d4fc |
FILE: ../net/core/dev.c
|
|
Thomas Bogendoerfer |
62d4fc |
LINE: 6702
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
6697 {
|
|
Thomas Bogendoerfer |
62d4fc |
6698 might_sleep();
|
|
Thomas Bogendoerfer |
62d4fc |
6699 set_bit(NAPI_STATE_DISABLE, &n->state);
|
|
Thomas Bogendoerfer |
62d4fc |
6700
|
|
Thomas Bogendoerfer |
62d4fc |
6701 while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
|
|
Thomas Bogendoerfer |
62d4fc |
* 6702 msleep(1);
|
|
Thomas Bogendoerfer |
62d4fc |
6703 while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
|
|
Thomas Bogendoerfer |
62d4fc |
6704 msleep(1);
|
|
Thomas Bogendoerfer |
62d4fc |
6705
|
|
Thomas Bogendoerfer |
62d4fc |
6706 hrtimer_cancel(&n->timer);
|
|
Thomas Bogendoerfer |
62d4fc |
6707
|
|
Thomas Bogendoerfer |
62d4fc |
6708 clear_bit(NAPI_STATE_DISABLE, &n->state);
|
|
Thomas Bogendoerfer |
62d4fc |
6709 }
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
EEH calls into bnx2x twice based on the system log above, first through
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
|
|
Thomas Bogendoerfer |
62d4fc |
the following call chains:
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x_io_error_detected()
|
|
Thomas Bogendoerfer |
62d4fc |
+-> bnx2x_eeh_nic_unload()
|
|
Thomas Bogendoerfer |
62d4fc |
+-> bnx2x_del_all_napi()
|
|
Thomas Bogendoerfer |
62d4fc |
+-> __netif_napi_del()
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x_io_slot_reset()
|
|
Thomas Bogendoerfer |
62d4fc |
+-> bnx2x_netif_stop()
|
|
Thomas Bogendoerfer |
62d4fc |
+-> bnx2x_napi_disable()
|
|
Thomas Bogendoerfer |
62d4fc |
+->napi_disable()
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
Fix this by correcting the sequence of NAPI APIs usage,
|
|
Thomas Bogendoerfer |
62d4fc |
that is delete the NAPI after disabling it.
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
Fixes: 7fa6f34081f1 ("bnx2x: AER revised")
|
|
Thomas Bogendoerfer |
62d4fc |
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
|
|
Thomas Bogendoerfer |
62d4fc |
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
|
|
Thomas Bogendoerfer |
62d4fc |
Signed-off-by: Manish Chopra <manishc@marvell.com>
|
|
Thomas Bogendoerfer |
62d4fc |
Signed-off-by: Ariel Elior <aelior@marvell.com>
|
|
Thomas Bogendoerfer |
62d4fc |
Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com
|
|
Thomas Bogendoerfer |
62d4fc |
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Thomas Bogendoerfer |
62d4fc |
Acked-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
|
|
Thomas Bogendoerfer |
62d4fc |
---
|
|
Thomas Bogendoerfer |
62d4fc |
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 9 +++++----
|
|
Thomas Bogendoerfer |
62d4fc |
1 file changed, 5 insertions(+), 4 deletions(-)
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
|
|
Thomas Bogendoerfer |
62d4fc |
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
|
|
Thomas Bogendoerfer |
62d4fc |
@@ -14195,10 +14195,6 @@ static int bnx2x_eeh_nic_unload(struct b
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
/* Stop Tx */
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x_tx_disable(bp);
|
|
Thomas Bogendoerfer |
62d4fc |
- /* Delete all NAPI objects */
|
|
Thomas Bogendoerfer |
62d4fc |
- bnx2x_del_all_napi(bp);
|
|
Thomas Bogendoerfer |
62d4fc |
- if (CNIC_LOADED(bp))
|
|
Thomas Bogendoerfer |
62d4fc |
- bnx2x_del_all_napi_cnic(bp);
|
|
Thomas Bogendoerfer |
62d4fc |
netdev_reset_tc(bp->dev);
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
del_timer_sync(&bp->timer);
|
|
Thomas Bogendoerfer |
62d4fc |
@@ -14303,6 +14299,11 @@ static pci_ers_result_t bnx2x_io_slot_re
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x_drain_tx_queues(bp);
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x_send_unload_req(bp, UNLOAD_RECOVERY);
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x_netif_stop(bp, 1);
|
|
Thomas Bogendoerfer |
62d4fc |
+ bnx2x_del_all_napi(bp);
|
|
Thomas Bogendoerfer |
62d4fc |
+
|
|
Thomas Bogendoerfer |
62d4fc |
+ if (CNIC_LOADED(bp))
|
|
Thomas Bogendoerfer |
62d4fc |
+ bnx2x_del_all_napi_cnic(bp);
|
|
Thomas Bogendoerfer |
62d4fc |
+
|
|
Thomas Bogendoerfer |
62d4fc |
bnx2x_free_irq(bp);
|
|
Thomas Bogendoerfer |
62d4fc |
|
|
Thomas Bogendoerfer |
62d4fc |
/* Report UNLOAD_DONE to MCP */
|