Filipe Manana 903f5b
From: Filipe Manana <fdmanana@suse.com>
Date: Tue, 20 Apr 2021 10:55:44 +0100
Git-commit: f9690f426b2134cc3e74bfc5d9dfd6a4b2ca5281
Patch-mainline: v5.13-rc1
Subject: [PATCH] btrfs: fix race when picking most recent mod log operation
 for an old root
References: bsc#1186439

Commit dbcc7d57bffc0c ("btrfs: fix race when cloning extent buffer during
rewind of an old root") fixed a race when we need to rewind the extent
buffer of an old root. It was caused by picking a new mod log operation
for the extent buffer while getting a cloned extent buffer with an outdated
number of items (off by -1), because we cloned the extent buffer without
locking it first.

However, there is still another similar race, but in the opposite direction.
The cloned extent buffer has a number of items that does not match the
number of tree mod log operations that are going to be replayed. This is
because right after we got the last (most recent) tree mod log operation to
replay and before locking and cloning the extent buffer, another task adds
a new pointer to the extent buffer, which results in adding a new tree mod
log operation and incrementing the number of items in the extent buffer.
So after cloning we have a mismatch between the number of items in the
extent buffer and the number of mod log operations we are going to apply
to it. This results in hitting a BUG_ON() that produces the following stack
trace:

   ------------[ cut here ]------------
   kernel BUG at fs/btrfs/tree-mod-log.c:675!
   invalid opcode: 0000 [#1] SMP KASAN PTI
   CPU: 3 PID: 4811 Comm: crawl_1215 Tainted: G        W         5.12.0-7d1efdf501f8-misc-next+ #99
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
   RIP: 0010:tree_mod_log_rewind+0x3b1/0x3c0
   Code: 05 48 8d 74 10 (...)
   RSP: 0018:ffffc90001027090 EFLAGS: 00010293
   RAX: 0000000000000000 RBX: ffff8880a8514600 RCX: ffffffffaa9e59b6
   RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8880a851462c
   RBP: ffffc900010270e0 R08: 00000000000000c0 R09: ffffed1004333417
   R10: ffff88802199a0b7 R11: ffffed1004333416 R12: 000000000000000e
   R13: ffff888135af8748 R14: ffff88818766ff00 R15: ffff8880a851462c
   FS:  00007f29acf62700(0000) GS:ffff8881f2200000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007f0e6013f718 CR3: 000000010d42e003 CR4: 0000000000170ee0
   Call Trace:
    btrfs_get_old_root+0x16a/0x5c0
    ? lock_downgrade+0x400/0x400
    btrfs_search_old_slot+0x192/0x520
    ? btrfs_search_slot+0x1090/0x1090
    ? free_extent_buffer.part.61+0xd7/0x140
    ? free_extent_buffer+0x13/0x20
    resolve_indirect_refs+0x3e9/0xfc0
    ? lock_downgrade+0x400/0x400
    ? __kasan_check_read+0x11/0x20
    ? add_prelim_ref.part.11+0x150/0x150
    ? lock_downgrade+0x400/0x400
    ? __kasan_check_read+0x11/0x20
    ? lock_acquired+0xbb/0x620
    ? __kasan_check_write+0x14/0x20
    ? do_raw_spin_unlock+0xa8/0x140
    ? rb_insert_color+0x340/0x360
    ? prelim_ref_insert+0x12d/0x430
    find_parent_nodes+0x5c3/0x1830
    ? stack_trace_save+0x87/0xb0
    ? resolve_indirect_refs+0xfc0/0xfc0
    ? fs_reclaim_acquire+0x67/0xf0
    ? __kasan_check_read+0x11/0x20
    ? lockdep_hardirqs_on_prepare+0x210/0x210
    ? fs_reclaim_acquire+0x67/0xf0
    ? __kasan_check_read+0x11/0x20
    ? ___might_sleep+0x10f/0x1e0
    ? __kasan_kmalloc+0x9d/0xd0
    ? trace_hardirqs_on+0x55/0x120
    btrfs_find_all_roots_safe+0x142/0x1e0
    ? find_parent_nodes+0x1830/0x1830
    ? trace_hardirqs_on+0x55/0x120
    ? ulist_free+0x1f/0x30
    ? btrfs_inode_flags_to_xflags+0x50/0x50
    iterate_extent_inodes+0x20e/0x580
    ? tree_backref_for_extent+0x230/0x230
    ? release_extent_buffer+0x225/0x280
    ? read_extent_buffer+0xdd/0x110
    ? lock_downgrade+0x400/0x400
    ? __kasan_check_read+0x11/0x20
    ? lock_acquired+0xbb/0x620
    ? __kasan_check_write+0x14/0x20
    ? do_raw_spin_unlock+0xa8/0x140
    ? _raw_spin_unlock+0x22/0x30
    ? release_extent_buffer+0x225/0x280
    iterate_inodes_from_logical+0x129/0x170
    ? iterate_inodes_from_logical+0x129/0x170
    ? btrfs_inode_flags_to_xflags+0x50/0x50
    ? iterate_extent_inodes+0x580/0x580
    ? __vmalloc_node+0x92/0xb0
    ? init_data_container+0x34/0xb0
    ? init_data_container+0x34/0xb0
    ? kvmalloc_node+0x60/0x80
    btrfs_ioctl_logical_to_ino+0x158/0x230
    btrfs_ioctl+0x2038/0x4360
    ? __kasan_check_write+0x14/0x20
    ? mmput+0x3b/0x220
    ? btrfs_ioctl_get_supported_features+0x30/0x30
    ? __kasan_check_read+0x11/0x20
    ? __kasan_check_read+0x11/0x20
    ? lock_release+0xc8/0x650
    ? __might_fault+0x64/0xd0
    ? __kasan_check_read+0x11/0x20
    ? lock_downgrade+0x400/0x400
    ? lockdep_hardirqs_on_prepare+0x210/0x210
    ? lockdep_hardirqs_on_prepare+0x13/0x210
    ? _raw_spin_unlock_irqrestore+0x51/0x63
    ? __kasan_check_read+0x11/0x20
    ? do_vfs_ioctl+0xfc/0x9d0
    ? ioctl_file_clone+0xe0/0xe0
    ? lock_downgrade+0x400/0x400
    ? lockdep_hardirqs_on_prepare+0x210/0x210
    ? __kasan_check_read+0x11/0x20
    ? lock_release+0xc8/0x650
    ? __task_pid_nr_ns+0xd3/0x250
    ? __kasan_check_read+0x11/0x20
    ? __fget_files+0x160/0x230
    ? __fget_light+0xf2/0x110
    __x64_sys_ioctl+0xc3/0x100
    do_syscall_64+0x37/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
   RIP: 0033:0x7f29ae85b427
   Code: 00 00 90 48 8b (...)
   RSP: 002b:00007f29acf5fcf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   RAX: ffffffffffffffda RBX: 00007f29acf5ff40 RCX: 00007f29ae85b427
   RDX: 00007f29acf5ff48 RSI: 00000000c038943b RDI: 0000000000000003
   RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f29acf60120
   R10: 00005640d5fc7b00 R11: 0000000000000246 R12: 0000000000000003
   R13: 00007f29acf5ff48 R14: 00007f29acf5ff40 R15: 00007f29acf5fef8
   Modules linked in:
   ---[ end trace 85e5fce078dfbe04 ]---

  (gdb) l *(tree_mod_log_rewind+0x3b1)
  0xffffffff819e5b21 is in tree_mod_log_rewind (fs/btrfs/tree-mod-log.c:675).
  670                      * the modification. As we're going backwards, we do the
  671                      * opposite of each operation here.
  672                      */
  673                     switch (tm->op) {
  674                     case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
  675                             BUG_ON(tm->slot < n);
  676                             fallthrough;
  677                     case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING:
  678                     case BTRFS_MOD_LOG_KEY_REMOVE:
  679                             btrfs_set_node_key(eb, &tm->key, tm->slot);
  (gdb) quit

The following steps explain in more detail how it happens:

1) We have one tree mod log user (through fiemap or the logical ino ioctl),
   with a sequence number of 1, so we have fs_info->tree_mod_seq == 1.
   This is task A;

2) Another task is at ctree.c:balance_level() and we have eb X currently as
   the root of the tree, and we promote its single child, eb Y, as the new
   root.

   Then, at ctree.c:balance_level(), we call:

      ret = btrfs_tree_mod_log_insert_root(root->node, child, true);

3) At btrfs_tree_mod_log_insert_root() we create a tree mod log operation
   of type BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING, with a ->logical field
   pointing to ebX->start. We only have one item in eb X, so we create
   only one tree mod log operation, and store it in the "tm_list" array;

4) Then, still at btrfs_tree_mod_log_insert_root(), we create a tree mod
   log element of operation type BTRFS_MOD_LOG_ROOT_REPLACE, ->logical set
   to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level
   set to the level of eb X and ->generation set to the generation of eb X;

5) Then btrfs_tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
   "tm_list" as argument. After that, tree_mod_log_free_eb() calls
   tree_mod_log_insert(). This inserts the mod log operation of type
   BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING from step 3 into the rbtree
   with a sequence number of 2 (and fs_info->tree_mod_seq set to 2);

6) Then, after inserting the "tm_list" single element into the tree mod
   log rbtree, the BTRFS_MOD_LOG_ROOT_REPLACE element is inserted, which
   gets the sequence number 3 (and fs_info->tree_mod_seq set to 3);

7) Back to ctree.c:balance_level(), we free eb X by calling
   btrfs_free_tree_block() on it. Because eb X was created in the current
   transaction, has no other references and writeback did not happen for
   it, we add it back to the free space cache/tree;

8) Later some other task B allocates the metadata extent from eb X, since
   it is marked as free space in the space cache/tree, and uses it as a
   node for some other btree;

9) The tree mod log user task (task A) calls btrfs_search_old_slot(),
   which calls btrfs_get_old_root(), and finally that calls
   tree_mod_log_oldest_root() with time_seq == 1 and eb_root == eb Y;

10) The first iteration of the while loop finds the tree mod log element
    with sequence number 3, for the logical address of eb Y and of type
    BTRFS_MOD_LOG_ROOT_REPLACE;

11) Because the operation type is BTRFS_MOD_LOG_ROOT_REPLACE, we don't
    break out of the loop, and set root_logical to point to
    tm->old_root.logical, which corresponds to the logical address of
    eb X;

12) On the next iteration of the while loop, the call to
    tree_mod_log_search_oldest() returns the oldest tree mod log element
    for the logical address of eb X, which has a sequence number of 2, an
    operation type of BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and
    corresponds to the old slot 0 of eb X (eb X had only 1 item in it
    before being freed at step 7);

13) We then break out of the while loop and return the tree mod log
    operation of type BTRFS_MOD_LOG_ROOT_REPLACE (eb Y), and not the one
    for slot 0 of eb X, to btrfs_get_old_root();

14) At btrfs_get_old_root(), we process the BTRFS_MOD_LOG_ROOT_REPLACE
    operation and set "logical" to the logical address of eb X, which was
    the old root. We then call tree_mod_log_search() passing it the logical
    address of eb X and time_seq == 1;

15) But before task A calls tree_mod_log_search(), task B locks eb X, adds
    a key to eb X, which results in adding a tree mod log operation of
    type BTRFS_MOD_LOG_KEY_ADD, with a sequence number of 4, to the tree
    mod log, and increments the number of items in eb X from 0 to 1.
    Now fs_info->tree_mod_seq has a value of 4;

16) Task A then calls tree_mod_log_search(), which returns the most recent
    tree mod log operation for eb X, which is the one just added by task B
    at the previous step, with a sequence number of 4, a type of
    BTRFS_MOD_LOG_KEY_ADD and for slot 0;

17) Before task A locks and clones eb X, task B adds another key to eb X,
    which results in adding a new BTRFS_MOD_LOG_KEY_ADD mod log operation,
    with a sequence number of 5, for slot 1 of eb X, increments the
    number of items in eb X from 1 to 2, and unlocks eb X.
    Now fs_info->tree_mod_seq has a value of 5;

18) Task A then locks eb X and clones it. The clone has a value of 2 for
    the number of items and the pointer "tm" points to the tree mod log
    operation with sequence number 4, not the most recent one with a
    sequence number of 5, so there is a mismatch between the number of
    mod log operations that are going to be applied to the cloned version
    of eb X and the number of items in the clone;

19) Task A then calls tree_mod_log_rewind() with the clone of eb X, the
    tree mod log operation with sequence number 4 and a type of
    BTRFS_MOD_LOG_KEY_ADD, and time_seq == 1;

20) At tree_mod_log_rewind(), we set the local variable "n" with a value
    of 2, which is the number of items in the clone of eb X.

    Then in the first iteration of the while loop, we process the mod log
    operation with sequence number 4, which is targeted at slot 0 and has
    a type of BTRFS_MOD_LOG_KEY_ADD. This results in decrementing "n" from
    2 to 1.

    Then we pick the next tree mod log operation for eb X, which is the
    tree mod log operation with a sequence number of 2, a type of
    BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and for slot 0. It is the one
    added in step 5 to the tree mod log tree.

    We go back to the top of the loop to process this mod log operation,
    and because its slot is 0 and "n" has a value of 1, we hit the BUG_ON:

        (...)
        switch (tm->op) {
        case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
                BUG_ON(tm->slot < n);
                fallthrough;
        (...)
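
The counting in step 20 can be reproduced with a small userspace model.
This is only a sketch under simplifying assumptions, not the kernel code:
the names mod_op, eb_x_ops and rewind_items() are hypothetical, only the
two operation types involved in the steps above are modelled, and the
BUG_ON() is turned into a -1 return value so the outcome can be observed.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel's tree mod log types. */
enum op_type { OP_KEY_ADD, OP_KEY_REMOVE_WHILE_FREEING };

struct mod_op {
	unsigned long seq;	/* sequence number assigned at insertion */
	int slot;		/* slot of eb X the operation refers to */
	enum op_type op;
};

/* Operations logged for eb X in the steps above, oldest first. */
static const struct mod_op eb_x_ops[] = {
	{ 2, 0, OP_KEY_REMOVE_WHILE_FREEING },	/* step 5 */
	{ 4, 0, OP_KEY_ADD },			/* step 15 */
	{ 5, 1, OP_KEY_ADD },			/* step 17 */
};

/*
 * Replay ops[first], ops[first - 1], ... (newest to oldest) on a clone
 * that holds nritems items, skipping operations older than time_seq.
 * Returns the rewound item count, or -1 where the kernel would hit
 * BUG_ON(tm->slot < n).
 */
static int rewind_items(const struct mod_op *ops, int first,
			unsigned long time_seq, int nritems)
{
	int n = nritems;
	int i;

	for (i = first; i >= 0 && ops[i].seq >= time_seq; i--) {
		switch (ops[i].op) {
		case OP_KEY_REMOVE_WHILE_FREEING:
			if (ops[i].slot < n)
				return -1;
			n++;	/* undo a removal: the item comes back */
			break;
		case OP_KEY_ADD:
			n--;	/* undo an addition: the item goes away */
			break;
		}
	}
	return n;
}
```

With the clone holding 2 items, rewind_items(eb_x_ops, 1, 1, 2) starts at
the operation with sequence number 4 and returns -1, which corresponds to
the BUG_ON. Starting at the operation with sequence number 5 instead,
rewind_items(eb_x_ops, 2, 1, 2) returns 1, the number of items eb X had
before it was freed at step 7.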

Fix this by checking for a more recent tree mod log operation after locking
and cloning the extent buffer of the old root node, and use it as the first
operation to apply to the cloned extent buffer when rewinding it.
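
The lookup, lock and clone, then look up again pattern can be sketched in
userspace as follows. This is a minimal model under stated assumptions and
not the kernel implementation: mod_log, log_add(), log_newest(),
pick_replay_start() and replay_start_seq() are hypothetical names, the log
is a plain array instead of an rbtree, and locking is only indicated by a
comment. Only the check inside pick_replay_start() mirrors the condition
added by the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree mod log element. */
struct mod_elem {
	unsigned long seq;
};

/* Hypothetical log for one extent buffer, ordered by sequence number. */
struct mod_log {
	struct mod_elem elems[8];
	int count;
};

static void log_add(struct mod_log *log, unsigned long seq)
{
	assert(log->count < 8);
	log->elems[log->count++].seq = seq;
}

/* Newest element with seq >= time_seq, or NULL if there is none. */
static const struct mod_elem *log_newest(const struct mod_log *log,
					 unsigned long time_seq)
{
	if (log->count == 0 || log->elems[log->count - 1].seq < time_seq)
		return NULL;
	return &log->elems[log->count - 1];
}

/*
 * tm was found before the buffer was locked, tm2 after it was cloned.
 * Replay must start at tm2; an older or missing tm2 is an error.
 */
static const struct mod_elem *pick_replay_start(const struct mod_elem *tm,
						const struct mod_elem *tm2)
{
	if (!tm2 || tm2->seq < tm->seq)
		return NULL;
	return tm2;
}

/* Sequence number replay starts from when racing_adds (at most 6)
 * operations are logged between the first lookup and the clone. */
static unsigned long replay_start_seq(int racing_adds)
{
	struct mod_log log = { .count = 0 };
	const struct mod_elem *tm, *tm2;
	unsigned long seq = 4;
	int i;

	log_add(&log, 2);
	log_add(&log, seq);
	tm = log_newest(&log, 1);	/* first lookup, buffer not locked */

	for (i = 0; i < racing_adds; i++)
		log_add(&log, ++seq);	/* another task modifies the buffer */

	/* The buffer would be locked and cloned here. */
	tm2 = pick_replay_start(tm, log_newest(&log, 1));
	return tm2 ? tm2->seq : 0;
}
```

Without a racing modification, replay_start_seq(0) yields 4, the same
operation as the first lookup. With one racing modification, as in step
17, replay_start_seq(1) yields 5, so the number of replayed operations
matches the number of items in the clone.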

Stable backport notes: due to moved code and renames, in <= 5.11 the
change should be applied to ctree.c:get_old_root().

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/20210404040732.GZ32440@hungrycats.org/
Fixes: 834328a8493079 ("Btrfs: tree mod log's old roots could still be part of the tree")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 3c9da1aec..8780a7432 100755
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1384,10 +1384,30 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
 				   "failed to read tree block %llu from get_old_root",
 				   logical);
 		} else {
+			struct tree_mod_elem *tm2;
+
 			btrfs_tree_read_lock(old);
 			eb = btrfs_clone_extent_buffer(old);
+			/*
+			 * After the lookup for the most recent tree mod operation
+			 * above and before we locked and cloned the extent buffer
+			 * 'old', a new tree mod log operation may have been added.
+			 * So lookup for a more recent one to make sure the number
+			 * of mod log operations we replay is consistent with the
+			 * number of items we have in the cloned extent buffer,
+			 * otherwise we can hit a BUG_ON when rewinding the extent
+			 * buffer.
+			 */
+			tm2 = tree_mod_log_search(fs_info, logical, time_seq);
 			btrfs_tree_read_unlock(old);
 			free_extent_buffer(old);
+			ASSERT(tm2);
+			ASSERT(tm2 == tm || tm2->seq > tm->seq);
+			if (!tm2 || tm2->seq < tm->seq) {
+				free_extent_buffer(eb);
+				return NULL;
+			}
+			tm = tm2;
 		}
 	} else if (old_root) {
 		eb_root_owner = btrfs_header_owner(eb_root);
-- 
2.26.2