Blob Blame History Raw
From: Xiubo Li <xiubli@redhat.com>
Date: Tue, 10 Dec 2019 20:29:40 -0500
Subject: ceph: check availability of mds cluster on mount after wait timeout
Git-commit: 97820058fb2831a4b203981fa2566ceaaa396103
Patch-mainline: v5.6-rc1
References: bsc#1205903

If all the MDS daemons are down for some reason, then the first mount
attempt will fail with EIO after the mount request times out.  A mount
attempt will also fail with EIO if all of the MDS's are laggy.

This patch changes the code to return -EHOSTUNREACH in these situations
and adds a pr_info error message to help the admin determine the cause.

Url: https://tracker.ceph.com/issues/4386
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Luis Henriques <lhenriques@suse.com>

---
 fs/ceph/mds_client.c |    3 +--
 fs/ceph/super.c      |    5 +++++
 2 files changed, 6 insertions(+), 2 deletions(-)

--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2493,8 +2493,7 @@ static void __do_request(struct ceph_mds
 		if (!(mdsc->fsc->mount_options->flags &
 		      CEPH_MOUNT_OPT_MOUNTWAIT) &&
 		    !ceph_mdsmap_is_cluster_available(mdsc->mdsmap)) {
-			err = -ENOENT;
-			pr_info("probably no mds server is up\n");
+			err = -EHOSTUNREACH;
 			goto finish;
 		}
 	}
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1106,6 +1106,11 @@ static struct dentry *ceph_mount(struct
 	return res;
 
 out_splat:
+	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
+		pr_info("No mds server is up or the cluster is laggy\n");
+		err = -EHOSTUNREACH;
+	}
+
 	ceph_mdsc_close_sessions(fsc->mdsc);
 	deactivate_locked_super(sb);
 	goto out_final;