Petr Tesarik 378329
From: Benjamin Block <bblock@linux.ibm.com>
Petr Tesarik 378329
Date: Wed, 16 Nov 2022 11:50:37 +0100
Petr Tesarik 378329
Subject: scsi: zfcp: Fix double free of FSF request when qdio send fails
Petr Tesarik 378329
Git-commit: 0954256e970ecf371b03a6c9af2cf91b9c4085ff
Petr Tesarik 378329
Patch-mainline: v6.1-rc6
Petr Tesarik 378329
References: git-fixes
Petr Tesarik 378329
Petr Tesarik 378329
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
Petr Tesarik 378329
the FSF request ID when sending a new FSF request. This is used in case the
Petr Tesarik 378329
sending fails and we need to remove the request from our internal hash
Petr Tesarik 378329
table again (so we don't keep an invalid reference and use it when we free
Petr Tesarik 378329
the request again).
Petr Tesarik 378329
Petr Tesarik 378329
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
Petr Tesarik 378329
bit wide), but the rest of the zfcp code (and the firmware specification)
Petr Tesarik 378329
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
Petr Tesarik 378329
ELF ABI]).  For one this has the obvious problem that when the ID grows
Petr Tesarik 378329
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
Petr Tesarik 378329
when storing it in the cache variable and so doesn't match the original ID
Petr Tesarik 378329
anymore.  The second less obvious problem is that even when the original ID
Petr Tesarik 378329
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
Petr Tesarik 378329
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
Petr Tesarik 378329
cast it back to 'unsigned long'. As the cached variable is of a signed
Petr Tesarik 378329
type, the compiler will choose a sign-extending instruction to load the 32
Petr Tesarik 378329
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
Petr Tesarik 378329
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
Petr Tesarik 378329
request again all the leading zeros will be flipped to ones to extend the
Petr Tesarik 378329
sign and won't match the original ID anymore (this has been observed in
Petr Tesarik 378329
practice).
Petr Tesarik 378329
Petr Tesarik 378329
If we can't successfully remove the request from the hash table again after
Petr Tesarik 378329
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
Petr Tesarik 378329
the adapter about new work because the adapter is already gone during
Petr Tesarik 378329
e.g. a ChpID toggle) we will end up with a double free.  We unconditionally
Petr Tesarik 378329
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
Petr Tesarik 378329
but because the request is still in the hash table we end up with a stale
Petr Tesarik 378329
memory reference, and once the zfcp adapter is either reset during recovery
Petr Tesarik 378329
or shutdown we end up freeing the same memory twice.
Petr Tesarik 378329
Petr Tesarik 378329
The resulting stack traces vary depending on the kernel and have no direct
Petr Tesarik 378329
correlation to the place where the bug occurs. Here are three examples that
Petr Tesarik 378329
have been seen in practice:
Petr Tesarik 378329
Petr Tesarik 378329
  list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
Petr Tesarik 378329
  ------------[ cut here ]------------
Petr Tesarik 378329
  kernel BUG at lib/list_debug.c:62!
Petr Tesarik 378329
  monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Petr Tesarik 378329
  Modules linked in: ...
Petr Tesarik 378329
  CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Petr Tesarik 378329
  Hardware name: ...
Petr Tesarik 378329
  Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
Petr Tesarik 378329
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Petr Tesarik 378329
  Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
Petr Tesarik 378329
             0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
Petr Tesarik 378329
             00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
Petr Tesarik 378329
             00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Petr Tesarik 378329
  Krnl Code: 00000003cbeea1e8: c020004f68a7        larl    %r2,00000003cc8d7336
Petr Tesarik 378329
             00000003cbeea1ee: c0e50027fd65        brasl   %r14,00000003cc3e9cb8
Petr Tesarik 378329
            #00000003cbeea1f4: af000000            mc      0,0
Petr Tesarik 378329
            >00000003cbeea1f8: c02000920440        larl    %r2,00000003cd12aa78
Petr Tesarik 378329
             00000003cbeea1fe: c0e500289c25        brasl   %r14,00000003cc3fda48
Petr Tesarik 378329
             00000003cbeea204: b9040043            lgr     %r4,%r3
Petr Tesarik 378329
             00000003cbeea208: b9040051            lgr     %r5,%r1
Petr Tesarik 378329
             00000003cbeea20c: b9040032            lgr     %r3,%r2
Petr Tesarik 378329
  Call Trace:
Petr Tesarik 378329
   [<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
Petr Tesarik 378329
  ([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
Petr Tesarik 378329
   [<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
Petr Tesarik 378329
   [<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
Petr Tesarik 378329
   [<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
Petr Tesarik 378329
   [<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
Petr Tesarik 378329
   [<00000003cb5eece8>] kthread+0x138/0x150
Petr Tesarik 378329
   [<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
Petr Tesarik 378329
   [<00000003cc4172ea>] ret_from_fork+0xa/0x40
Petr Tesarik 378329
  INFO: lockdep is turned off.
Petr Tesarik 378329
  Last Breaking-Event-Address:
Petr Tesarik 378329
   [<00000003cc3e9d04>] _printk+0x4c/0x58
Petr Tesarik 378329
  Kernel panic - not syncing: Fatal exception: panic_on_oops
Petr Tesarik 378329
Petr Tesarik 378329
or:
Petr Tesarik 378329
Petr Tesarik 378329
  Unable to handle kernel pointer dereference in virtual kernel address space
Petr Tesarik 378329
  Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Petr Tesarik 378329
  Fault in home space mode while using kernel ASCE.
Petr Tesarik 378329
  AS:0000000063b10007 R3:0000000000000024
Petr Tesarik 378329
  Oops: 0038 ilc:3 [#1] SMP
Petr Tesarik 378329
  Modules linked in: ...
Petr Tesarik 378329
  CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Petr Tesarik 378329
  Hardware name: ...
Petr Tesarik 378329
  Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
Petr Tesarik 378329
             R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Petr Tesarik 378329
  Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
Petr Tesarik 378329
             0000000000000000 0000000000000055 0000000000000000 00000000a8515800
Petr Tesarik 378329
             0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
Petr Tesarik 378329
             000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Petr Tesarik 378329
  Krnl Code: 000003ff7febaf7e: a7f4003d            brc     15,000003ff7febaff8
Petr Tesarik 378329
             000003ff7febaf82: e32020000004        lg      %r2,0(%r2)
Petr Tesarik 378329
            #000003ff7febaf88: ec2100388064        cgrj    %r2,%r1,8,000003ff7febaff8
Petr Tesarik 378329
            >000003ff7febaf8e: e3b020100020        cg      %r11,16(%r2)
Petr Tesarik 378329
             000003ff7febaf94: a774fff7            brc     7,000003ff7febaf82
Petr Tesarik 378329
             000003ff7febaf98: ec280030007c        cgij    %r2,0,8,000003ff7febaff8
Petr Tesarik 378329
             000003ff7febaf9e: e31020080004        lg      %r1,8(%r2)
Petr Tesarik 378329
             000003ff7febafa4: e33020000004        lg      %r3,0(%r2)
Petr Tesarik 378329
  Call Trace:
Petr Tesarik 378329
   [<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
Petr Tesarik 378329
   [<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
Petr Tesarik 378329
   [<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
Petr Tesarik 378329
   [<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
Petr Tesarik 378329
   [<000000006292f300>] __do_softirq+0x130/0x3c0
Petr Tesarik 378329
   [<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
Petr Tesarik 378329
   [<000000006291e818>] do_io_irq+0xc8/0x168
Petr Tesarik 378329
   [<000000006292d516>] io_int_handler+0xd6/0x110
Petr Tesarik 378329
   [<000000006292d596>] psw_idle_exit+0x0/0xa
Petr Tesarik 378329
  ([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
Petr Tesarik 378329
   [<000000006292ceea>] default_idle_call+0x52/0xf8
Petr Tesarik 378329
   [<0000000061de4fa4>] do_idle+0xd4/0x168
Petr Tesarik 378329
   [<0000000061de51fe>] cpu_startup_entry+0x36/0x40
Petr Tesarik 378329
   [<0000000061d4faac>] smp_start_secondary+0x12c/0x138
Petr Tesarik 378329
   [<000000006292d88e>] restart_int_handler+0x6e/0x90
Petr Tesarik 378329
  Last Breaking-Event-Address:
Petr Tesarik 378329
   [<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Petr Tesarik 378329
  Kernel panic - not syncing: Fatal exception in interrupt
Petr Tesarik 378329
Petr Tesarik 378329
or:
Petr Tesarik 378329
Petr Tesarik 378329
  Unable to handle kernel pointer dereference in virtual kernel address space
Petr Tesarik 378329
  Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Petr Tesarik 378329
  Fault in home space mode while using kernel ASCE.
Petr Tesarik 378329
  AS:0000000077c40007 R3:0000000000000024
Petr Tesarik 378329
  Oops: 0038 ilc:3 [#1] SMP
Petr Tesarik 378329
  Modules linked in: ...
Petr Tesarik 378329
  CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Petr Tesarik 378329
  Hardware name: ...
Petr Tesarik 378329
  Workqueue: kblockd blk_mq_run_work_fn
Petr Tesarik 378329
  Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
Petr Tesarik 378329
             R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Petr Tesarik 378329
  Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
Petr Tesarik 378329
             0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
Petr Tesarik 378329
             00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
Petr Tesarik 378329
             00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Petr Tesarik 378329
  Krnl Code: 0000000076fc0302: c004000000d4	brcl	0,76fc04aa
Petr Tesarik 378329
             0000000076fc0308: b904001b		lgr	%r1,%r11
Petr Tesarik 378329
            #0000000076fc030c: e3106020001a	algf	%r1,32(%r6)
Petr Tesarik 378329
            >0000000076fc0312: e31010000082	xg	%r1,0(%r1)
Petr Tesarik 378329
             0000000076fc0318: b9040001		lgr	%r0,%r1
Petr Tesarik 378329
             0000000076fc031c: e30061700082	xg	%r0,368(%r6)
Petr Tesarik 378329
             0000000076fc0322: ec59000100d9	aghik	%r5,%r9,1
Petr Tesarik 378329
             0000000076fc0328: e34003b80004	lg	%r4,952
Petr Tesarik 378329
  Call Trace:
Petr Tesarik 378329
   [<0000000076fc0312>] __kmalloc+0xd2/0x398
Petr Tesarik 378329
   [<0000000076f318f2>] mempool_alloc+0x72/0x1f8
Petr Tesarik 378329
   [<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
Petr Tesarik 378329
   [<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
Petr Tesarik 378329
   [<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
Petr Tesarik 378329
   [<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
Petr Tesarik 378329
   [<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
Petr Tesarik 378329
   [<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
Petr Tesarik 378329
   [<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
Petr Tesarik 378329
   [<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
Petr Tesarik 378329
   [<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
Petr Tesarik 378329
   [<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
Petr Tesarik 378329
   [<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
Petr Tesarik 378329
   [<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
Petr Tesarik 378329
   [<0000000076da6d74>] process_one_work+0x274/0x4d0
Petr Tesarik 378329
   [<0000000076da7018>] worker_thread+0x48/0x560
Petr Tesarik 378329
   [<0000000076daef18>] kthread+0x140/0x160
Petr Tesarik 378329
   [<000000007751d144>] ret_from_fork+0x28/0x30
Petr Tesarik 378329
  Last Breaking-Event-Address:
Petr Tesarik 378329
   [<0000000076fc0474>] __kmalloc+0x234/0x398
Petr Tesarik 378329
  Kernel panic - not syncing: Fatal exception: panic_on_oops
Petr Tesarik 378329
Petr Tesarik 378329
To fix this, simply change the type of the cache variable to 'unsigned
Petr Tesarik 378329
long', like the rest of zfcp and also the argument for
Petr Tesarik 378329
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension
Petr Tesarik 378329
and so can successfully remove the request from the hash table.
Petr Tesarik 378329
Petr Tesarik 378329
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Petr Tesarik 378329
Cc: <stable@vger.kernel.org> #v2.6.34+
Petr Tesarik 378329
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Petr Tesarik 378329
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.1668595505.git.bblock@linux.ibm.com
Petr Tesarik 378329
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Petr Tesarik 378329
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Petr Tesarik 378329
Acked-by: Petr Tesarik <ptesarik@suse.com>
Petr Tesarik 378329
---
Petr Tesarik 378329
 drivers/s390/scsi/zfcp_fsf.c |    2 +-
Petr Tesarik 378329
 1 file changed, 1 insertion(+), 1 deletion(-)
Petr Tesarik 378329
Petr Tesarik 378329
--- a/drivers/s390/scsi/zfcp_fsf.c
Petr Tesarik 378329
+++ b/drivers/s390/scsi/zfcp_fsf.c
Petr Tesarik 378329
@@ -884,7 +884,7 @@ static int zfcp_fsf_req_send(struct zfcp
Petr Tesarik 378329
 	const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
Petr Tesarik 378329
 	struct zfcp_adapter *adapter = req->adapter;
Petr Tesarik 378329
 	struct zfcp_qdio *qdio = adapter->qdio;
Petr Tesarik 378329
-	int req_id = req->req_id;
Petr Tesarik 378329
+	unsigned long req_id = req->req_id;
Petr Tesarik 378329
 
Petr Tesarik 378329
 	zfcp_reqlist_add(adapter->req_list, req);
Petr Tesarik 378329