Jiri Slaby fd75dd
From: Zhang Yuchen <zhangyuchen.lcr@bytedance.com>
Date: Wed, 12 Apr 2023 15:49:07 +0800
Subject: ipmi: fix SSIF not responding under certain cond.
Git-commit: 6d2555cde2918409b0331560e66f84a0ad4849c6
Patch-mainline: v6.4-rc1
References: git-fixes

IPMI communication is not restored after a specific version of the BMC is
upgraded on our server. The ipmi driver stops responding after printing
the following log:

    ipmi_ssif: Invalid response getting flags: 1c 1

I found that after entering this branch, ssif_info->ssif_state always
holds SSIF_GETTING_FLAGS and never returns to IDLE.

As a result, the driver cannot be unloaded, because the driver state is
checked during the unload process and must be IDLE in shutdown_ssif():

        while (ssif_info->ssif_state != SSIF_IDLE)
                schedule_timeout(1);

The sequence that triggers this problem is:

1. One msg times out and the next msg starts sending, which calls
ssif_set_need_watch().

2. ssif_set_need_watch()->watch_timeout()->start_flag_fetch() changes
ssif_state to SSIF_GETTING_FLAGS.

3. In msg_done_handler(), with ssif_state == SSIF_GETTING_FLAGS, if an
error message is received, the second branch does not modify ssif_state.

4. All retry actions need IS_SSIF_IDLE() == True, including the retry
actions in watch_timeout() and msg_done_handler(). Sending a msg does not
work either, because SSIF_IDLE is also checked in start_next_msg().

5. The only thing that can still be triggered in the SSIF driver is
watch_timeout(); after destroy_user(), this timer stops too.

So, once this branch is entered, ssif_state remains SSIF_GETTING_FLAGS:
no msg can be sent, no timer is started, and the driver can't be unloaded.

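The sequence above can be illustrated with a small standalone model. This
is only a sketch, not the driver code: the model_* names, the two-state
enum and the bounded poll loop are hypothetical stand-ins for ssif_state,
start_flag_fetch(), the invalid-response branch of msg_done_handler() and
the wait in shutdown_ssif().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the SSIF state; not the driver code. */
enum model_state { MODEL_IDLE, MODEL_GETTING_FLAGS };

struct model_ssif {
	enum model_state state;
};

/* Models start_flag_fetch(): only an idle interface may start a fetch. */
static bool model_start_flag_fetch(struct model_ssif *s)
{
	assert(s != NULL);
	if (s->state != MODEL_IDLE)
		return false;
	s->state = MODEL_GETTING_FLAGS;
	return true;
}

/* Models the "Invalid response getting flags" branch of msg_done_handler(). */
static void model_invalid_flags_response(struct model_ssif *s, bool patched)
{
	assert(s != NULL);
	if (patched)
		s->state = MODEL_IDLE;	/* the fix: give up, return to idle */
	/* Unpatched: the state is left at MODEL_GETTING_FLAGS. */
}

/*
 * Models the wait loop in shutdown_ssif(). The real loop is unbounded;
 * max_polls is an assumption added so that the sketch always terminates.
 */
static bool model_shutdown_completes(const struct model_ssif *s, int max_polls)
{
	assert(s != NULL);
	for (int i = 0; i < max_polls; i++) {
		if (s->state == MODEL_IDLE)
			return true;
	}
	return false;
}
```

With patched == false, a fetch followed by an invalid response leaves the
model in MODEL_GETTING_FLAGS, so both model_start_flag_fetch() and
model_shutdown_completes() return false. With patched == true, both
succeed again.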
We ran a comparative test before and after applying this patch, and the
patch resolves the problem.

Fixes: 259307074bfc ("ipmi: Add SMBus interface driver (SSIF)")

[js] the constant is named SSIF_NORMAL in 4.*

Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yuchen <zhangyuchen.lcr@bytedance.com>
Message-Id: <20230412074907.80046-1-zhangyuchen.lcr@bytedance.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 drivers/char/ipmi/ipmi_ssif.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/char/ipmi/ipmi_ssif.c
+++ b/drivers/char/ipmi/ipmi_ssif.c
@@ -774,9 +774,9 @@ static void msg_done_handler(struct ssif
 		} else if (data[0] != (IPMI_NETFN_APP_REQUEST | 1) << 2
 			   || data[1] != IPMI_GET_MSG_FLAGS_CMD) {
 			/*
-			 * Don't abort here, maybe it was a queued
-			 * response to a previous command.
+			 * Recv error response, give up.
 			 */
+			ssif_info->ssif_state = SSIF_NORMAL;
 			ipmi_ssif_unlock_cond(ssif_info, flags);
 			pr_warn(PFX "Invalid response getting flags: %x %x\n",
 				data[0], data[1]);