NeilBrown 839f5a
From: NeilBrown <neilb@suse.de>
NeilBrown 839f5a
Subject: NFS: Handle missing attributes in OPEN reply
NeilBrown 839f5a
Patch-mainline: Submitted, 04jan2023 linux-nfs@vger.kernel.org
NeilBrown 839f5a
References: bsc#1203740
NeilBrown 839f5a
NeilBrown 839f5a
If a NFSv4 OPEN reply reports that the file was successfully opened but
NeilBrown 839f5a
the subsequent GETATTR fails, Linux-NFS will attempt a stand-alone
NeilBrown 839f5a
GETATTR request.  If that also fails, handling of the reply is aborted
NeilBrown 839f5a
with error -EAGAIN and the open is attempted again from the start.
NeilBrown 839f5a
NeilBrown 839f5a
This leaves the server with an active state (because the OPEN operation
NeilBrown 839f5a
succeeded) which the client doesn't know about.  If the open-owner
NeilBrown 839f5a
(local user) did not have the file already open, this has minimal
NeilBrown 839f5a
consequences for the client and only causes the server to spend
NeilBrown 839f5a
resources on an open state that will never be used or explicitly closed.
NeilBrown 839f5a
NeilBrown 839f5a
If the open-owner DID already have the file open, then it will hold a
NeilBrown 839f5a
reference to the open-state for that file, but the seq-id in the
NeilBrown 839f5a
state-id will now be out-of-sync with the server.  The server will have
NeilBrown 839f5a
incremented the seq-id, but the client will not have noticed.  So when
NeilBrown 839f5a
the client next attempts to access the file using that state (READ,
NeilBrown 839f5a
WRITE, SETATTR), the attempt will fail with NFS4ERR_OLD_STATEID.
NeilBrown 839f5a
NeilBrown 839f5a
The Linux-client assumes this error is due to a race and simply retries
NeilBrown 839f5a
on the basis that the local state-id information should have been
NeilBrown 839f5a
updated by another thread.  This basis is invalid in this case and the
NeilBrown 839f5a
result is an infinite loop attempting IO and getting OLD_STATEID.
NeilBrown 839f5a
NeilBrown 839f5a
This has been observed with a NetApp Filer as the server (ONTAP 9.8 p5,
NeilBrown 839f5a
using NFSv4.0).  The client is creating, writing, and unlinking a
NeilBrown 839f5a
particular file from multiple clients (.bash_history).  If a new OPEN
NeilBrown 839f5a
from one client races with a REMOVE from another client while the first
NeilBrown 839f5a
client already has the file open, the Filer can report success for the
NeilBrown 839f5a
OPEN op, but NFS4ERR_STALE for the ACCESS and GETATTR ops in the OPEN
NeilBrown 839f5a
request.  This gets the seq-id out-of-sync and a subsequent write to the
NeilBrown 839f5a
other open on the first client causes the infinite loop to occur.
NeilBrown 839f5a
NeilBrown 839f5a
The reason that the client returns -EAGAIN is that it needs to find the
NeilBrown 839f5a
inode so it can find the associated state to update the seq-id, but the
NeilBrown 839f5a
inode lookup requires the file-id which is provided in the GETATTR
NeilBrown 839f5a
reply.  Without the file-id normal inode lookup cannot be used.
NeilBrown 839f5a
NeilBrown 839f5a
This patch changes the lookup so that when the file-id is not available
NeilBrown 839f5a
the list of states owned by the open-owner is examined to find the state
NeilBrown 839f5a
with the correct state-id (ignoring the seq-id part of the state-id).
NeilBrown 839f5a
If this is found it is used just as when a normal inode lookup finds an
NeilBrown 839f5a
inode.  If it isn't found, -EAGAIN is returned as before.
NeilBrown 839f5a
NeilBrown 839f5a
This bug can be demonstrated by modifying the Linux NFS server as
NeilBrown 839f5a
follows:
NeilBrown 839f5a
NeilBrown 839f5a
1/ The second time a file is opened, unlink it.  This simulates
NeilBrown 839f5a
   a race with another client, without needing to have a race:
NeilBrown 839f5a
Vlastimil Babka f07faa
    // in fs/nfsd/nfs4proc.c: @@ -594,6 +594,12 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
NeilBrown 839f5a
     	if (reclaim && !status)
NeilBrown 839f5a
     		nn->somebody_reclaimed = true;
NeilBrown 839f5a
     out:
NeilBrown 839f5a
    +	if (!status && open->op_stateid.si_generation > 1) {
NeilBrown 839f5a
    +		printk("Opening gen %d\n", (int)open->op_stateid.si_generation);
NeilBrown 839f5a
    +		vfs_unlink(mnt_user_ns(resfh->fh_export->ex_path.mnt),
NeilBrown 839f5a
    +			   resfh->fh_dentry->d_parent->d_inode,
NeilBrown 839f5a
    +			   resfh->fh_dentry, NULL);
NeilBrown 839f5a
    +	}
NeilBrown 839f5a
     	if (open->op_filp) {
NeilBrown 839f5a
     		fput(open->op_filp);
NeilBrown 839f5a
     		open->op_filp = NULL;
NeilBrown 839f5a
NeilBrown 839f5a
2/ When a GETATTR op is attempted on an unlinked file, return ESTALE
NeilBrown 839f5a
Vlastimil Babka f07faa
    // @@ -852,6 +858,11 @@ nfsd4_getattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
NeilBrown 839f5a
     	if (status)
NeilBrown 839f5a
     		return status;
NeilBrown 839f5a
NeilBrown 839f5a
    +	if (cstate->current_fh.fh_dentry->d_inode->i_nlink == 0) {
NeilBrown 839f5a
    +		printk("Return Estale for unlinked file\n");
NeilBrown 839f5a
    +		return nfserr_stale;
NeilBrown 839f5a
    +	}
NeilBrown 839f5a
    +
NeilBrown 839f5a
     	if (getattr->ga_bmval[1] & NFSD_WRITEONLY_ATTRS_WORD1)
NeilBrown 839f5a
     		return nfserr_inval;
NeilBrown 839f5a
NeilBrown 839f5a
Then mount the filesystem and
NeilBrown 839f5a
NeilBrown 839f5a
  Thread 1            Thread 2
NeilBrown 839f5a
  open a file
NeilBrown 839f5a
                      open the same file (will fail)
NeilBrown 839f5a
  write to that file
NeilBrown 839f5a
NeilBrown 839f5a
I use this shell fragment, using 'sleep' for synchronisation.
NeilBrown 839f5a
The use of /bin/echo ensures the write is flushed when /bin/echo closes
NeilBrown 839f5a
the fd on exit.
NeilBrown 839f5a
NeilBrown 839f5a
    (
NeilBrown 839f5a
        /bin/echo hello
NeilBrown 839f5a
        sleep 3
NeilBrown 839f5a
        /bin/echo there
NeilBrown 839f5a
    ) > /import/A/afile &
NeilBrown 839f5a
    sleep 3
NeilBrown 839f5a
    cat /import/A/afile
NeilBrown 839f5a
NeilBrown 839f5a
Probably when the OPEN succeeds, the GETATTR fails, and we don't already
NeilBrown 839f5a
have the state open, we should explicitly close the state.  Leaving it
NeilBrown 839f5a
open could cause problems if, for example, the server revoked it and
NeilBrown 839f5a
signalled the client that there was a revoked state.  The client would
NeilBrown 839f5a
not be able to find the state that needed to be relinquished.  I haven't
NeilBrown 839f5a
attempted to implement this.
NeilBrown 839f5a
NeilBrown 839f5a
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown 839f5a
Acked-by: NeilBrown <neilb@suse.com>
NeilBrown 839f5a
NeilBrown 839f5a
---
NeilBrown 839f5a
 fs/nfs/nfs4_fs.h   |    1 +
NeilBrown 839f5a
 fs/nfs/nfs4proc.c  |   18 ++++++++++++++----
NeilBrown 839f5a
 fs/nfs/nfs4state.c |   17 +++++++++++++++++
NeilBrown 839f5a
 3 files changed, 32 insertions(+), 4 deletions(-)
NeilBrown 839f5a
NeilBrown 839f5a
--- a/fs/nfs/nfs4_fs.h
NeilBrown 839f5a
+++ b/fs/nfs/nfs4_fs.h
NeilBrown 839f5a
@@ -467,6 +467,7 @@ extern void nfs4_put_state_owner(struct
NeilBrown 839f5a
 extern void nfs4_purge_state_owners(struct nfs_server *, struct list_head *);
NeilBrown 839f5a
 extern void nfs4_free_state_owners(struct list_head *head);
NeilBrown 839f5a
 extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *);
NeilBrown 839f5a
+extern struct inode *nfs4_get_inode_by_stateid(nfs4_stateid *stateid, struct nfs4_state_owner *owner);
NeilBrown 839f5a
 extern void nfs4_put_open_state(struct nfs4_state *);
NeilBrown 839f5a
 extern void nfs4_close_state(struct nfs4_state *, fmode_t);
NeilBrown 839f5a
 extern void nfs4_close_sync(struct nfs4_state *, fmode_t);
NeilBrown 839f5a
--- a/fs/nfs/nfs4proc.c
NeilBrown 839f5a
+++ b/fs/nfs/nfs4proc.c
NeilBrown 839f5a
@@ -1946,10 +1946,20 @@ nfs4_opendata_get_inode(struct nfs4_open
NeilBrown 839f5a
 	case NFS4_OPEN_CLAIM_NULL:
NeilBrown 839f5a
 	case NFS4_OPEN_CLAIM_DELEGATE_CUR:
NeilBrown 839f5a
 	case NFS4_OPEN_CLAIM_DELEGATE_PREV:
NeilBrown 839f5a
-		if (!(data->f_attr.valid & NFS_ATTR_FATTR))
NeilBrown 839f5a
-			return ERR_PTR(-EAGAIN);
NeilBrown 839f5a
-		inode = nfs_fhget(data->dir->d_sb, &data->o_res.fh,
NeilBrown 839f5a
-				&data->f_attr, data->f_label);
NeilBrown 839f5a
+		if (data->f_attr.valid & NFS_ATTR_FATTR) {
NeilBrown 839f5a
+			inode = nfs_fhget(data->dir->d_sb, &data->o_res.fh,
NeilBrown 839f5a
+					  &data->f_attr, data->f_label);
NeilBrown 839f5a
+		} else {
NeilBrown 839f5a
+			/* We don't have the fileid and so cannot do inode
NeilBrown 839f5a
+			 * lookup.  If we already have this state open we MUST
NeilBrown 839f5a
+			 * update the seqid to match the server, so we need to
NeilBrown 839f5a
+			 * find it if possible.
NeilBrown 839f5a
+			 */
NeilBrown 839f5a
+			inode = nfs4_get_inode_by_stateid(&data->o_res.stateid,
NeilBrown 839f5a
+							  data->owner);
NeilBrown 839f5a
+			if (!inode)
NeilBrown 839f5a
+				inode = ERR_PTR(-EAGAIN);
NeilBrown 839f5a
+		}
NeilBrown 839f5a
 		break;
NeilBrown 839f5a
 	default:
NeilBrown 839f5a
 		inode = d_inode(data->dentry);
NeilBrown 839f5a
--- a/fs/nfs/nfs4state.c
NeilBrown 839f5a
+++ b/fs/nfs/nfs4state.c
NeilBrown 839f5a
@@ -751,6 +751,23 @@ out:
NeilBrown 839f5a
 	return state;
NeilBrown 839f5a
 }
NeilBrown 839f5a
 
NeilBrown 839f5a
+struct inode *
NeilBrown 839f5a
+nfs4_get_inode_by_stateid(nfs4_stateid *stateid, struct nfs4_state_owner *owner)
NeilBrown 839f5a
+{
NeilBrown 839f5a
+	struct nfs4_state *state;
NeilBrown 839f5a
+	struct inode *inode = NULL;
NeilBrown 839f5a
+
NeilBrown 839f5a
+	spin_lock(&owner->so_lock);
NeilBrown 839f5a
+	list_for_each_entry(state, &owner->so_states, open_states)
NeilBrown 839f5a
+		if (nfs4_stateid_match_other(stateid, &state->open_stateid)) {
NeilBrown 839f5a
+			inode = state->inode;
NeilBrown 839f5a
+			ihold(inode);
NeilBrown 839f5a
+			break;
NeilBrown 839f5a
+		}
NeilBrown 839f5a
+	spin_unlock(&owner->so_lock);
NeilBrown 839f5a
+	return inode;
NeilBrown 839f5a
+}
NeilBrown 839f5a
+
NeilBrown 839f5a
 void nfs4_put_open_state(struct nfs4_state *state)
NeilBrown 839f5a
 {
NeilBrown 839f5a
 	struct inode *inode = state->inode;