Blob Blame History Raw
From: Firo Yang <firo.yang@suse.com>
Date: Mon, 20 Apr 2020 17:10:01 +0200
Subject: net: fix race condition in __inet_lookup_established()
Patch-mainline: Never, mainline solution (v5.5-rc3) too intrusive
References: bsc#1151794 bsc#1180624

Listening and established sockets share the same slab cache which has
SLAB_TYPESAFE_BY_RCU flag set but this only protects from a slab page being
freed and reused for a different slab cache (or other purpose), not from
being reused for a new object from the same slab cache. Therefore the loop
in __inet_lookup_established() can skip from an established socket to
a listening one.

Since commit 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under
synflood"), listener hashtable no longer uses "nulls" lists so that after
such switch, sk_nulls_for_each_rcu() loop in __inet_lookup_established()
would hit NULL as end marker which it would fail to recognize. Analogously,
__inet_lookup_listener() and inet_diag_dump_icsk() may hit an opposite
switch from a listener socket to an established one.

The upstream solution, commit 8dbd76e79a16 ("tcp/dccp: fix possible race
__inet_lookup_established()"), is rather intrusive and would break kABI
in a way which would be impossible to work around reliably. Therefore we
use a simpler patch which is safer at the expense of a minor performance
penalty.

For testing purpose, add a temporary debugging message whenever a race
resulting in a switch from an established socket to a listener or vice
versa is detected.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Firo Yang <firo.yang@suse.com>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 net/ipv4/inet_diag.c       |  5 +++++
 net/ipv4/inet_hashtables.c | 21 ++++++++++++++++++---
 2 files changed, 23 insertions(+), 3 deletions(-)

--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -884,6 +884,11 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
 			sk_for_each(sk, &ilb->head) {
 				struct inet_sock *inet = inet_sk(sk);
 
+				if (unlikely(is_a_nulls(&sk->sk_nulls_node))) {
+					pr_info("%s: bsc#1180624 race encountered\n",
+						__func__);
+					break;
+				}
 				if (!net_eq(sock_net(sk), net))
 					continue;
 
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -212,12 +212,23 @@ struct sock *__inet_lookup_listener(struct net *net,
 {
 	unsigned int hash = inet_lhashfn(net, hnum);
 	struct inet_listen_hashbucket *ilb = &hashinfo->listening_hash[hash];
-	int score, hiscore = 0, matches = 0, reuseport = 0;
+	int score, hiscore, matches, reuseport;
 	bool exact_dif = inet_exact_dif_match(net, skb);
-	struct sock *sk, *result = NULL;
-	u32 phash = 0;
+	struct sock *sk, *result;
+	u32 phash;
+
+begin:
+	hiscore = 0;
+	matches = 0;
+	reuseport = 0;
+	result = NULL;
+	phash = 0;
 
 	sk_for_each_rcu(sk, &ilb->head) {
+		if (unlikely(is_a_nulls(&sk->sk_nulls_node))) {
+			pr_info("%s: bsc#1180624 race encountered\n", __func__);
+			goto begin;
+		}
 		score = compute_score(sk, net, hnum, daddr, dif, exact_dif);
 		if (score > hiscore) {
 			reuseport = sk->sk_reuseport;
@@ -283,6 +294,10 @@ struct sock *__inet_lookup_established(struct net *net,
 
 begin:
 	sk_nulls_for_each_rcu(sk, node, &head->chain) {
+		if (unlikely(!node)) {
+			pr_info("%s: bsc#1151794 race encountered\n", __func__);
+			goto begin;
+		}
 		if (sk->sk_hash != hash)
 			continue;
 		if (likely(INET_MATCH(sk, net, acookie,