From: Firo Yang <firo.yang@suse.com>
Date: Mon, 20 Apr 2020 17:10:01 +0200
Subject: net: fix race condition in __inet_lookup_established()
Patch-mainline: Never, mainline solution (v5.5-rc3) too intrusive
References: bsc#1151794 bsc#1180624

Listening and established sockets share the same slab cache, which has the
SLAB_TYPESAFE_BY_RCU flag set. That flag only protects a slab page from
being freed and reused for a different slab cache (or another purpose); it
does not prevent a freed socket's memory from being reused for a new socket
from the same slab cache. Therefore the loop in __inet_lookup_established()
can skip from an established socket to a listening one.

Since commit 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under
synflood"), the listener hashtable no longer uses "nulls" lists. After such
a switch, the sk_nulls_for_each_rcu() loop in __inet_lookup_established()
would therefore hit NULL as the end marker, which it would fail to
recognize. Analogously, __inet_lookup_listener() and inet_diag_dump_icsk()
may hit the opposite switch, from a listener socket to an established one.
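
The end-marker mismatch described above can be illustrated with a minimal
userspace sketch. This is not kernel code: struct node, make_nulls(),
walk_nulls_chain() and self_check() are hypothetical names made up for the
illustration, and only the idea behind is_a_nulls() (an odd pointer value
marks the end of a "nulls" list, while a plain list ends in NULL) mirrors
the kernel helper. The sketch assumes that real nodes are at least 2-byte
aligned, so their addresses are always even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket linked into a hash chain. */
struct node {
	struct node *next;
	int key;
};

/* Build a "nulls" end marker: an odd value that no aligned node can have. */
static struct node *make_nulls(uintptr_t value)
{
	return (struct node *)((value << 1) | 1);
}

/* Nonzero if p is a "nulls" end marker rather than a real node. */
static int is_a_nulls(const struct node *p)
{
	return (int)((uintptr_t)p & 1);
}

/*
 * Walk a chain that is expected to end in a "nulls" marker.
 * Returns the number of nodes visited, or -1 if the walk ran into a
 * plain NULL terminator, i.e. it strayed onto a non-"nulls" chain.
 * Without the explicit NULL check the loop would dereference NULL,
 * because NULL is even and is_a_nulls(NULL) is 0.
 */
static int walk_nulls_chain(const struct node *first)
{
	const struct node *pos;
	int visited = 0;

	for (pos = first; !is_a_nulls(pos); pos = pos->next) {
		if (pos == NULL)
			return -1;
		visited++;
	}
	return visited;
}

/* Small self-check covering both the normal and the "switched" case. */
static void self_check(void)
{
	struct node b = { make_nulls(7), 2 };
	struct node a = { &b, 1 };

	/* Normal case: a -> b -> nulls marker. */
	assert(walk_nulls_chain(&a) == 2);

	/* b is "reused" on a NULL-terminated chain while we are walking. */
	b.next = NULL;
	assert(walk_nulls_chain(&a) == -1);
}
```

In this sketch the walker simply reports the -1 case to its caller; the
patch below handles the corresponding situation in
__inet_lookup_established() by restarting the lookup.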

The upstream solution, commit 8dbd76e79a16 ("tcp/dccp: fix possible race
__inet_lookup_established()"), is rather intrusive and would break kABI
in a way that would be impossible to work around reliably. Therefore we
use a simpler patch, which is safer at the expense of a minor performance
penalty.

For testing purposes, add a temporary debugging message that is printed
whenever a race resulting in a switch from an established socket to a
listener, or vice versa, is detected.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Firo Yang <firo.yang@suse.com>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
net/ipv4/inet_diag.c | 5 +++++
net/ipv4/inet_hashtables.c | 21 ++++++++++++++++++---
2 files changed, 23 insertions(+), 3 deletions(-)
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -884,6 +884,11 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
sk_for_each(sk, &ilb->head) {
struct inet_sock *inet = inet_sk(sk);
+ if (unlikely(is_a_nulls(&sk->sk_nulls_node))) {
+ pr_info("%s: bsc#1180624 race encountered\n",
+ __func__);
+ break;
+ }
if (!net_eq(sock_net(sk), net))
continue;
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -212,12 +212,23 @@ struct sock *__inet_lookup_listener(struct net *net,
{
unsigned int hash = inet_lhashfn(net, hnum);
struct inet_listen_hashbucket *ilb = &hashinfo->listening_hash[hash];
- int score, hiscore = 0, matches = 0, reuseport = 0;
+ int score, hiscore, matches, reuseport;
bool exact_dif = inet_exact_dif_match(net, skb);
- struct sock *sk, *result = NULL;
- u32 phash = 0;
+ struct sock *sk, *result;
+ u32 phash;
+
+begin:
+ hiscore = 0;
+ matches = 0;
+ reuseport = 0;
+ result = NULL;
+ phash = 0;
sk_for_each_rcu(sk, &ilb->head) {
+ if (unlikely(is_a_nulls(&sk->sk_nulls_node))) {
+ pr_info("%s: bsc#1180624 race encountered\n", __func__);
+ goto begin;
+ }
score = compute_score(sk, net, hnum, daddr, dif, exact_dif);
if (score > hiscore) {
reuseport = sk->sk_reuseport;
@@ -283,6 +294,10 @@ struct sock *__inet_lookup_established(struct net *net,
begin:
sk_nulls_for_each_rcu(sk, node, &head->chain) {
+ if (unlikely(!node)) {
+ pr_info("%s: bsc#1151794 race encountered\n", __func__);
+ goto begin;
+ }
if (sk->sk_hash != hash)
continue;
if (likely(INET_MATCH(sk, net, acookie,