Blob Blame History Raw
From: NeilBrown <neilb@suse.com>
Subject: dm-snap: avoid deadock on s->lock when a read is split.
Patch-mainline: Never, not perfect and solution still under discussion.
References: bsc#939826

If a large read request is sent to a dm-snap snapshot device at much
the same time as a write is sent to the origin device there is the
chance of a deadlock.

The read is split, the first half is registered as a pending read and
generic_make_request queues it on the current->bio_list list.

Then the write completes and pending_complete() takes the ->lock and
waits for any over-lapping reads to complete - particularly the one on
current->bio_list of that other process.

Then second half of the read is processed which requires taking the same ->lock.

The reader cannot get the lock because the writer hold it, and the
write won't let go until the reader finishes and the current->bio_list is flushed.

We break that deadlock in the writer by dropping the lock while
waiting for the read to compelte.

This leaves the possibility of a live lock: if a reader repeatedly
reads the same sector using O_DIRECT, it could prevent a writer from
ever making progress.  While this a real issue it is likely less of an
issue than the very real possibility of a deadlock.

I'm working with upstream to try to find a better solution that is not
overly complicated.

Signed-off-by: NeilBrown <neilb@suse.com>

---
 drivers/md/dm-snap.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1473,6 +1473,7 @@ static void pending_complete(void *conte
 	}
 	*e = pe->e;
 
+retry:
 	down_write(&s->lock);
 	if (!s->valid) {
 		free_completed_exception(e);
@@ -1481,7 +1482,11 @@ static void pending_complete(void *conte
 	}
 
 	/* Check for conflicting reads */
-	__check_for_conflicting_io(s, pe->e.old_chunk);
+	if (__chunk_is_tracked(s, pe->e.old_chunk)) {
+		up_write(&s->lock);
+		msleep(1);
+		goto retry;
+	}
 
 	/*
 	 * Add a proper exception, and remove the