From: NeilBrown <neilb@suse.de>
Subject: VFS: Check rename_lock in lookup_fast()
References: bsc#1174734
Patch-mainline: never, work-around

A filesystem using DMAPI can request a service on a file while holding
i_rwsem on the parent of that file.  If the lookup of the file needs to
take i_rwsem on that parent, we can deadlock.

Normally the name will be found by the RCU lookup, so i_rwsem won't be
needed.  However, a rename elsewhere in any filesystem can move arbitrary
dentries from one hash chain to another.  If this happens while the
lookup done for the DMAPI daemon is inspecting such a dentry, the
RCU lookup can fail even though the name is in the cache.

This is not fatal with the current code, as it falls back to
lookup_slow() and checks again after taking i_rwsem, but that fallback
is exactly what deadlocks DMAPI.

This can be fixed by checking the rename_lock seqlock and retrying the
RCU lookup if it might have raced with a rename.
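
For illustration only, the retry pattern can be sketched in user space
as below.  This is a single-threaded simulation, not kernel code: every
demo_* name is hypothetical, the counter merely stands in for
rename_lock, and a real seqlock also needs memory barriers and handling
of an in-progress (odd) sequence count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rename_lock sequence counter. */
static unsigned int demo_rename_seq;

/* Counts lookup attempts so the retry is observable. */
static int demo_attempts;

/* Simulated rename: a completed write advances the counter by two. */
static void demo_rename(void)
{
	demo_rename_seq += 2;
}

/* Simplified reader side: sample the counter, later compare it. */
static unsigned int demo_read_seqbegin(void)
{
	return demo_rename_seq;
}

static bool demo_read_seqretry(unsigned int start)
{
	return demo_rename_seq != start;
}

/*
 * Simulated hash lookup.  The first attempt "races" with a rename and
 * misses; any later attempt finds the entry.
 */
static const char *demo_hash_lookup(void)
{
	if (demo_attempts++ == 0) {
		demo_rename();
		return NULL;
	}
	return "dentry";
}

/* Retry a failed lookup only if a rename may have interfered. */
static const char *demo_lookup_retry(void)
{
	const char *found;
	unsigned int seq;

	do {
		seq = demo_read_seqbegin();
		found = demo_hash_lookup();
		if (found)
			break;
	} while (demo_read_seqretry(seq));

	return found;
}
```

A miss is trusted only when the counter did not change during the
attempt; otherwise the lookup is repeated instead of falling back to
the locked path.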

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/namei.c |   22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1588,14 +1588,26 @@ static int lookup_fast(struct nameidata
 	int err;
 
 	/*
-	 * Rename seqlock is not required here because in the off chance
-	 * of a false negative due to a concurrent rename, the caller is
-	 * going to fall back to non-racy lookup.
+	 * Rename seqlock is used to work-around a problem with DMAPI.
+	 * The filesystem can call the DMAPI server while holding i_mutex
+	 * on a directory, and will ask it to examine a file in the directory.
+	 * If we then take i_mutex in lookup_slow(), we will deadlock.
+	 * The name *will* be in the cache, but renames elsewhere in
+	 * any filesystem can perturb the hash table and cause __d_lookup_rcu()
+	 * to fail.  Use of the seqlock means we will retry when that might
+	 * have happened.
 	 */
 	if (nd->flags & LOOKUP_RCU) {
-		unsigned seq;
+		unsigned seq, rename_seq;
 		bool negative;
-		dentry = __d_lookup_rcu(parent, &nd->last, &seq);
+
+		do {
+			rename_seq = read_seqbegin(&rename_lock);
+			dentry = __d_lookup_rcu(parent, &nd->last, &seq);
+			if (dentry)
+				break;
+		} while (read_seqretry(&rename_lock, rename_seq));
+
 		if (unlikely(!dentry)) {
 			if (unlazy_walk(nd))
 				return -ECHILD;