Blob Blame History Raw
From f5449e74802c1112dea984aec8af7a33c4516af1 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@nvidia.com>
Date: Mon, 14 Sep 2020 08:59:56 -0300
Subject: [PATCH 1/1] RDMA/ucma: Rework ucma_migrate_id() to avoid races with
 destroy
Git-commit: f5449e74802c1112dea984aec8af7a33c4516af1
Patch-mainline: v5.10
References: bsc#1187050 CVE-2020-36385

ucma_destroy_id() assumes that all things accessing the ctx will do so via
the xarray. This assumption violated only in the case the FD is being
closed, then the ctx is reached via the ctx_list. Normally this is OK
since ucma_destroy_id() cannot run concurrenty with release(), however
with ucma_migrate_id() is involved this can violated as the close of the
2nd FD can run concurrently with destroy on the first:

                CPU0                      CPU1
        ucma_destroy_id(fda)
                                  ucma_migrate_id(fda -> fdb)
                                       ucma_get_ctx()
        xa_lock()
         _ucma_find_context()
         xa_erase()
        xa_unlock()
                                       xa_lock()
                                        ctx->file = new_file
                                        list_move()
                                       xa_unlock()
                                      ucma_put_ctx()

                                   ucma_close(fdb)
                                      _destroy_id()
                                      kfree(ctx)

        _destroy_id()
          wait_for_completion()
          // boom, ctx was freed

The ctx->file must be modified under the handler and xa_lock, and prior to
modification the ID must be rechecked that it is still reachable from
cur_file, ie there is no parallel destroy or migrate.

To make this work remove the double locking and streamline the control
flow. The double locking was obsoleted by the handler lock now directly
preventing new uevents from being created, and the ctx_list cannot be read
while holding fgets on both files. Removing the double locking also
removes the need to check for the same file.

Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
Link: https://lore.kernel.org/r/0-v1-05c5a4090305+3a872-ucma_syz_migrate_jgg@nvidia.com
Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
---
 drivers/infiniband/core/ucma.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 1257d3686603..f6b8fc2f63fb 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1610,6 +1610,13 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 	ucma_lock_files(cur_file, new_file);
 	mutex_lock(&mut);
 
+	if (_ucma_find_context(cmd.id, cur_file) != ctx) {
+		mutex_unlock(&mut);
+		ucma_unlock_files(cur_file, new_file);
+		ret = -ENOENT;
+		goto err_unlock;
+	}
+
 	list_move_tail(&ctx->list, &new_file->ctx_list);
 	ucma_move_events(ctx, new_file);
 	ctx->file = new_file;
@@ -1670,6 +1670,7 @@ response:
 			 &resp, sizeof(resp)))
 		ret = -EFAULT;
 
+err_unlock:
 	ucma_put_ctx(ctx);
 file_put:
 	fdput(f);
-- 
2.31.1.5.g533053588dc3