Blob Blame History Raw
From da3627c30d229fea1e070e984366f80a1c4d9166 Mon Sep 17 00:00:00 2001
From: Gang He <ghe@suse.com>
Date: Tue, 29 May 2018 11:09:22 +0800
Subject: [PATCH 3/3] dlm: remove O_NONBLOCK flag in sctp_connect_to_sock
Git-commit: da3627c30d229fea1e070e984366f80a1c4d9166
Patch-mainline: v4.18-rc1
References: bsc#1080542

We should remove O_NONBLOCK flag when calling sock->ops->connect()
in sctp_connect_to_sock() function.
Why?
1. up to now, sctp socket connect() function ignores the flag argument,
that means O_NONBLOCK flag does not take effect, then we should remove
it to avoid the confusion (but is not urgent).
2. for the future, there will be a patch to fix this problem, then the flag
argument will take effect, the patch has been queued at https://git.kernel.o
rg/pub/scm/linux/kernel/git/davem/net.git/commit/net/sctp?id=644fbdeacf1d3ed
d366e44b8ba214de9d1dd66a9.
But, the O_NONBLOCK flag will make sock->ops->connect() directly return
without any wait time, then the connection will not be established, DLM kernel
module will call sock->ops->connect() again and again, the bad results are,
CPU usage is almost 100%, even trigger soft_lockup problem if the related
configurations are enabled,
DLM kernel module also prints lots of messages like,
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
[Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
The upper application (e.g. ocfs2 mount command) is hanged at new_lockspace(),
the whole backtrace is as below,
tb0307-nd2:~ # cat /proc/2935/stack
[<0>] new_lockspace+0x957/0xac0 [dlm]
[<0>] dlm_new_lockspace+0xae/0x140 [dlm]
[<0>] user_cluster_connect+0xc3/0x3a0 [ocfs2_stack_user]
[<0>] ocfs2_cluster_connect+0x144/0x220 [ocfs2_stackglue]
[<0>] ocfs2_dlm_init+0x215/0x440 [ocfs2]
[<0>] ocfs2_fill_super+0xcb0/0x1290 [ocfs2]
[<0>] mount_bdev+0x173/0x1b0
[<0>] mount_fs+0x35/0x150
[<0>] vfs_kern_mount.part.23+0x54/0x100
[<0>] do_mount+0x59a/0xc40
[<0>] SyS_mount+0x80/0xd0
[<0>] do_syscall_64+0x76/0x140
[<0>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[<0>] 0xffffffffffffffff

So, I think we should remove O_NONBLOCK flag here, since DLM kernel module can
not handle non-block sockect in connect() properly.

Signed-off-by: Gang He <ghe@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
---
 fs/dlm/lowcomms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index d31e9abfb9f1..a5e4a221435c 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1092,7 +1092,7 @@ static void sctp_connect_to_sock(struct connection *con)
 	kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
 			  sizeof(tv));
 	result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
-				   O_NONBLOCK);
+				   0);
 	memset(&tv, 0, sizeof(tv));
 	kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
 			  sizeof(tv));
-- 
2.16.2