Blob Blame History Raw
From: Josef Bacik <josef@toxicpanda.com>
Date: Fri, 13 Mar 2020 15:58:08 -0400
Subject: btrfs: only check priority tickets for priority flushing
Patch-mainline: Submitted, https://lore.kernel.org/linux-btrfs/20200313195809.141753-1-josef@toxicpanda.com/
References: bsc#1165949

In debugging a generic/320 failure on ppc64, Nikolay noticed that
sometimes we'd ENOSPC out with plenty of space to reclaim if we had
committed the transaction.  He further discovered that this was because
there was a priority ticket that was small enough to fit in the free
space currently in the space_info.

Consider the following scenario.  There is no more space to reclaim in
the fs without committing the transaction.  Assume there's 1mib of space
free in the space info, but there are pending normal tickets with 2mib
reservations.

Now a priority ticket comes in with a .5mib reservation.  Because we
have normal tickets pending we add ourselves to the priority list,
despite the fact that we could satisfy this reservation.

The flushing machinery now gets to the point where it wants to commit
the transaction, but because there's a .5mib ticket on the priority list
and we have 1mib of free space we assume the ticket will be granted
soon, so we bail without committing the transaction.

Meanwhile the priority flushing does not commit the transaction, and
eventually fails with an ENOSPC.  Then all other tickets are failed with
ENOSPC because we were never able to actually commit the transaction.

The fix for this is we should have simply granted the priority flusher
his reservation, because there was space to make the reservation.
Priority flushers by definition take priority, so they are allowed to
make their reservations before any previous normal tickets.  By not
adding this priority ticket to the list the normal flushing mechanisms
will then commit the transaction and everything will continue normally.

We still need to serialize ourselves with other priority tickets, so if
there are any tickets on the priority list then we need to add ourselves
to that list in order to maintain the serialization between priority
tickets.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/space-info.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1030,6 +1030,17 @@ static int handle_reserve_ticket(struct
 	return ret;
 }

+
+/*
+ * This returns true if this flush state will go through the ordinary flushing
+ * code.
+ */
+static inline bool is_normal_flushing(enum btrfs_reserve_flush_enum flush)
+{
+	return (flush == BTRFS_RESERVE_FLUSH_ALL) ||
+		(flush == BTRFS_RESERVE_FLUSH_ALL_STEAL);
+}
+
 /**
  * reserve_metadata_bytes - try to reserve bytes from the block_rsv's space
  * @root - the root we're allocating for
@@ -1061,10 +1072,18 @@ static int __reserve_metadata_bytes(stru
 	spin_lock(&space_info->lock);
 	ret = -ENOSPC;
 	used = btrfs_space_info_used(space_info, true);
-	pending_tickets = !list_empty(&space_info->tickets) ||
-		!list_empty(&space_info->priority_tickets);

 	/*
+	 * We don't want NO_FLUSH allocations to jump everybody, they can
+	 * generally handle ENOSPC in a different way, so treat them the same as
+	 * normal flushers when it comes to skipping pending tickets.
+	 */
+	if (is_normal_flushing(flush) || (flush == BTRFS_RESERVE_NO_FLUSH))
+		pending_tickets = !list_empty(&space_info->tickets) ||
+			!list_empty(&space_info->priority_tickets);
+	else
+		pending_tickets = !list_empty(&space_info->priority_tickets);
+	/*
 	 * If we have enough space then hooray, make our reservation and carry
 	 * on.  If not see if we can overcommit, and if we can, hooray carry on.
 	 * If not things get more complicated.
@@ -1088,8 +1107,7 @@ static int __reserve_metadata_bytes(stru
 		ticket.error = 0;
 		init_waitqueue_head(&ticket.wait);
 		ticket.steal = (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL);
-		if (flush == BTRFS_RESERVE_FLUSH_ALL ||
-		    flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
+		if (is_normal_flushing(flush)) {
 			list_add_tail(&ticket.list, &space_info->tickets);
 			if (!space_info->flush) {
 				space_info->flush = 1;