Petr Tesarik 1acccf
From: Benjamin Block <bblock@linux.ibm.com>
Petr Tesarik 1acccf
Date: Wed, 16 Nov 2022 11:50:37 +0100
Petr Tesarik 1acccf
Subject: scsi: zfcp: Fix double free of FSF request when qdio send fails
Petr Tesarik 1acccf
Git-commit: 0954256e970ecf371b03a6c9af2cf91b9c4085ff
Petr Tesarik 1acccf
Patch-mainline: v6.1-rc6
Petr Tesarik 1acccf
References: git-fixes
Petr Tesarik 1acccf
Petr Tesarik 1acccf
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
Petr Tesarik 1acccf
the FSF request ID when sending a new FSF request. This is used in case the
Petr Tesarik 1acccf
sending fails and we need to remove the request from our internal hash
Petr Tesarik 1acccf
table again (so we don't keep an invalid reference and use it when we free
Petr Tesarik 1acccf
the request again).
Petr Tesarik 1acccf
Petr Tesarik 1acccf
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
Petr Tesarik 1acccf
bit wide), but the rest of the zfcp code (and the firmware specification)
Petr Tesarik 1acccf
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
Petr Tesarik 1acccf
ELF ABI]).  For one this has the obvious problem that when the ID grows
Petr Tesarik 1acccf
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
Petr Tesarik 1acccf
when storing it in the cache variable and so doesn't match the original ID
Petr Tesarik 1acccf
anymore.  The second less obvious problem is that even when the original ID
Petr Tesarik 1acccf
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
Petr Tesarik 1acccf
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
Petr Tesarik 1acccf
cast it back to 'unsigned long'. As the cached variable is of a signed
Petr Tesarik 1acccf
type, the compiler will choose a sign-extending instruction to load the 32
Petr Tesarik 1acccf
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
Petr Tesarik 1acccf
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
Petr Tesarik 1acccf
request again all the leading zeros will be flipped to ones to extend the
Petr Tesarik 1acccf
sign and won't match the original ID anymore (this has been observed in
Petr Tesarik 1acccf
practice).
Petr Tesarik 1acccf
Petr Tesarik 1acccf
If we can't successfully remove the request from the hash table again after
Petr Tesarik 1acccf
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
Petr Tesarik 1acccf
the adapter about new work because the adapter is already gone during
Petr Tesarik 1acccf
e.g. a ChpID toggle) we will end up with a double free.  We unconditionally
Petr Tesarik 1acccf
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
Petr Tesarik 1acccf
but because the request is still in the hash table we end up with a stale
Petr Tesarik 1acccf
memory reference, and once the zfcp adapter is either reset during recovery
Petr Tesarik 1acccf
or shutdown we end up freeing the same memory twice.
Petr Tesarik 1acccf
Petr Tesarik 1acccf
The resulting stack traces vary depending on the kernel and have no direct
Petr Tesarik 1acccf
correlation to the place where the bug occurs. Here are three examples that
Petr Tesarik 1acccf
have been seen in practice:
Petr Tesarik 1acccf
Petr Tesarik 1acccf
  list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
Petr Tesarik 1acccf
  ------------[ cut here ]------------
Petr Tesarik 1acccf
  kernel BUG at lib/list_debug.c:62!
Petr Tesarik 1acccf
  monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Petr Tesarik 1acccf
  Modules linked in: ...
Petr Tesarik 1acccf
  CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Petr Tesarik 1acccf
  Hardware name: ...
Petr Tesarik 1acccf
  Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
Petr Tesarik 1acccf
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Petr Tesarik 1acccf
  Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
Petr Tesarik 1acccf
             0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
Petr Tesarik 1acccf
             00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
Petr Tesarik 1acccf
             00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Petr Tesarik 1acccf
  Krnl Code: 00000003cbeea1e8: c020004f68a7        larl    %r2,00000003cc8d7336
Petr Tesarik 1acccf
             00000003cbeea1ee: c0e50027fd65        brasl   %r14,00000003cc3e9cb8
Petr Tesarik 1acccf
            #00000003cbeea1f4: af000000            mc      0,0
Petr Tesarik 1acccf
            >00000003cbeea1f8: c02000920440        larl    %r2,00000003cd12aa78
Petr Tesarik 1acccf
             00000003cbeea1fe: c0e500289c25        brasl   %r14,00000003cc3fda48
Petr Tesarik 1acccf
             00000003cbeea204: b9040043            lgr     %r4,%r3
Petr Tesarik 1acccf
             00000003cbeea208: b9040051            lgr     %r5,%r1
Petr Tesarik 1acccf
             00000003cbeea20c: b9040032            lgr     %r3,%r2
Petr Tesarik 1acccf
  Call Trace:
Petr Tesarik 1acccf
   [<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
Petr Tesarik 1acccf
  ([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
Petr Tesarik 1acccf
   [<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
Petr Tesarik 1acccf
   [<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
Petr Tesarik 1acccf
   [<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
Petr Tesarik 1acccf
   [<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
Petr Tesarik 1acccf
   [<00000003cb5eece8>] kthread+0x138/0x150
Petr Tesarik 1acccf
   [<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
Petr Tesarik 1acccf
   [<00000003cc4172ea>] ret_from_fork+0xa/0x40
Petr Tesarik 1acccf
  INFO: lockdep is turned off.
Petr Tesarik 1acccf
  Last Breaking-Event-Address:
Petr Tesarik 1acccf
   [<00000003cc3e9d04>] _printk+0x4c/0x58
Petr Tesarik 1acccf
  Kernel panic - not syncing: Fatal exception: panic_on_oops
Petr Tesarik 1acccf
Petr Tesarik 1acccf
or:
Petr Tesarik 1acccf
Petr Tesarik 1acccf
  Unable to handle kernel pointer dereference in virtual kernel address space
Petr Tesarik 1acccf
  Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Petr Tesarik 1acccf
  Fault in home space mode while using kernel ASCE.
Petr Tesarik 1acccf
  AS:0000000063b10007 R3:0000000000000024
Petr Tesarik 1acccf
  Oops: 0038 ilc:3 [#1] SMP
Petr Tesarik 1acccf
  Modules linked in: ...
Petr Tesarik 1acccf
  CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Petr Tesarik 1acccf
  Hardware name: ...
Petr Tesarik 1acccf
  Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
Petr Tesarik 1acccf
             R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Petr Tesarik 1acccf
  Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
Petr Tesarik 1acccf
             0000000000000000 0000000000000055 0000000000000000 00000000a8515800
Petr Tesarik 1acccf
             0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
Petr Tesarik 1acccf
             000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Petr Tesarik 1acccf
  Krnl Code: 000003ff7febaf7e: a7f4003d            brc     15,000003ff7febaff8
Petr Tesarik 1acccf
             000003ff7febaf82: e32020000004        lg      %r2,0(%r2)
Petr Tesarik 1acccf
            #000003ff7febaf88: ec2100388064        cgrj    %r2,%r1,8,000003ff7febaff8
Petr Tesarik 1acccf
            >000003ff7febaf8e: e3b020100020        cg      %r11,16(%r2)
Petr Tesarik 1acccf
             000003ff7febaf94: a774fff7            brc     7,000003ff7febaf82
Petr Tesarik 1acccf
             000003ff7febaf98: ec280030007c        cgij    %r2,0,8,000003ff7febaff8
Petr Tesarik 1acccf
             000003ff7febaf9e: e31020080004        lg      %r1,8(%r2)
Petr Tesarik 1acccf
             000003ff7febafa4: e33020000004        lg      %r3,0(%r2)
Petr Tesarik 1acccf
  Call Trace:
Petr Tesarik 1acccf
   [<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
Petr Tesarik 1acccf
   [<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
Petr Tesarik 1acccf
   [<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
Petr Tesarik 1acccf
   [<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
Petr Tesarik 1acccf
   [<000000006292f300>] __do_softirq+0x130/0x3c0
Petr Tesarik 1acccf
   [<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
Petr Tesarik 1acccf
   [<000000006291e818>] do_io_irq+0xc8/0x168
Petr Tesarik 1acccf
   [<000000006292d516>] io_int_handler+0xd6/0x110
Petr Tesarik 1acccf
   [<000000006292d596>] psw_idle_exit+0x0/0xa
Petr Tesarik 1acccf
  ([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
Petr Tesarik 1acccf
   [<000000006292ceea>] default_idle_call+0x52/0xf8
Petr Tesarik 1acccf
   [<0000000061de4fa4>] do_idle+0xd4/0x168
Petr Tesarik 1acccf
   [<0000000061de51fe>] cpu_startup_entry+0x36/0x40
Petr Tesarik 1acccf
   [<0000000061d4faac>] smp_start_secondary+0x12c/0x138
Petr Tesarik 1acccf
   [<000000006292d88e>] restart_int_handler+0x6e/0x90
Petr Tesarik 1acccf
  Last Breaking-Event-Address:
Petr Tesarik 1acccf
   [<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Petr Tesarik 1acccf
  Kernel panic - not syncing: Fatal exception in interrupt
Petr Tesarik 1acccf
Petr Tesarik 1acccf
or:
Petr Tesarik 1acccf
Petr Tesarik 1acccf
  Unable to handle kernel pointer dereference in virtual kernel address space
Petr Tesarik 1acccf
  Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Petr Tesarik 1acccf
  Fault in home space mode while using kernel ASCE.
Petr Tesarik 1acccf
  AS:0000000077c40007 R3:0000000000000024
Petr Tesarik 1acccf
  Oops: 0038 ilc:3 [#1] SMP
Petr Tesarik 1acccf
  Modules linked in: ...
Petr Tesarik 1acccf
  CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Petr Tesarik 1acccf
  Hardware name: ...
Petr Tesarik 1acccf
  Workqueue: kblockd blk_mq_run_work_fn
Petr Tesarik 1acccf
  Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
Petr Tesarik 1acccf
             R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Petr Tesarik 1acccf
  Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
Petr Tesarik 1acccf
             0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
Petr Tesarik 1acccf
             00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
Petr Tesarik 1acccf
             00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Petr Tesarik 1acccf
  Krnl Code: 0000000076fc0302: c004000000d4	brcl	0,76fc04aa
Petr Tesarik 1acccf
             0000000076fc0308: b904001b		lgr	%r1,%r11
Petr Tesarik 1acccf
            #0000000076fc030c: e3106020001a	algf	%r1,32(%r6)
Petr Tesarik 1acccf
            >0000000076fc0312: e31010000082	xg	%r1,0(%r1)
Petr Tesarik 1acccf
             0000000076fc0318: b9040001		lgr	%r0,%r1
Petr Tesarik 1acccf
             0000000076fc031c: e30061700082	xg	%r0,368(%r6)
Petr Tesarik 1acccf
             0000000076fc0322: ec59000100d9	aghik	%r5,%r9,1
Petr Tesarik 1acccf
             0000000076fc0328: e34003b80004	lg	%r4,952
Petr Tesarik 1acccf
  Call Trace:
Petr Tesarik 1acccf
   [<0000000076fc0312>] __kmalloc+0xd2/0x398
Petr Tesarik 1acccf
   [<0000000076f318f2>] mempool_alloc+0x72/0x1f8
Petr Tesarik 1acccf
   [<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
Petr Tesarik 1acccf
   [<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
Petr Tesarik 1acccf
   [<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
Petr Tesarik 1acccf
   [<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
Petr Tesarik 1acccf
   [<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
Petr Tesarik 1acccf
   [<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
Petr Tesarik 1acccf
   [<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
Petr Tesarik 1acccf
   [<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
Petr Tesarik 1acccf
   [<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
Petr Tesarik 1acccf
   [<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
Petr Tesarik 1acccf
   [<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
Petr Tesarik 1acccf
   [<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
Petr Tesarik 1acccf
   [<0000000076da6d74>] process_one_work+0x274/0x4d0
Petr Tesarik 1acccf
   [<0000000076da7018>] worker_thread+0x48/0x560
Petr Tesarik 1acccf
   [<0000000076daef18>] kthread+0x140/0x160
Petr Tesarik 1acccf
   [<000000007751d144>] ret_from_fork+0x28/0x30
Petr Tesarik 1acccf
  Last Breaking-Event-Address:
Petr Tesarik 1acccf
   [<0000000076fc0474>] __kmalloc+0x234/0x398
Petr Tesarik 1acccf
  Kernel panic - not syncing: Fatal exception: panic_on_oops
Petr Tesarik 1acccf
Petr Tesarik 1acccf
To fix this, simply change the type of the cache variable to 'unsigned
Petr Tesarik 1acccf
long', like the rest of zfcp and also the argument for
Petr Tesarik 1acccf
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension
Petr Tesarik 1acccf
and so can successfully remove the request from the hash table.
Petr Tesarik 1acccf
Petr Tesarik 1acccf
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Petr Tesarik 1acccf
Cc: <stable@vger.kernel.org> #v2.6.34+
Petr Tesarik 1acccf
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Petr Tesarik 1acccf
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.1668595505.git.bblock@linux.ibm.com
Petr Tesarik 1acccf
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Petr Tesarik 1acccf
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Petr Tesarik 1acccf
Acked-by: Petr Tesarik <ptesarik@suse.com>
Petr Tesarik 1acccf
---
Petr Tesarik 1acccf
 drivers/s390/scsi/zfcp_fsf.c |    2 +-
Petr Tesarik 1acccf
 1 file changed, 1 insertion(+), 1 deletion(-)
Petr Tesarik 1acccf
Petr Tesarik 1acccf
--- a/drivers/s390/scsi/zfcp_fsf.c
Petr Tesarik 1acccf
+++ b/drivers/s390/scsi/zfcp_fsf.c
Petr Tesarik 1acccf
@@ -751,7 +751,7 @@ static int zfcp_fsf_req_send(struct zfcp
Petr Tesarik 1acccf
 	struct zfcp_adapter *adapter = req->adapter;
Petr Tesarik 1acccf
 	struct zfcp_qdio *qdio = adapter->qdio;
Petr Tesarik 1acccf
 	int with_qtcb = (req->qtcb != NULL);
Petr Tesarik 1acccf
-	int req_id = req->req_id;
Petr Tesarik 1acccf
+	unsigned long req_id = req->req_id;
Petr Tesarik 1acccf
 
Petr Tesarik 1acccf
 	zfcp_reqlist_add(adapter->req_list, req);
Petr Tesarik 1acccf