|
Michal Kubecek |
7838ee |
From: Neal Cardwell <ncardwell@google.com>
|
|
Michal Kubecek |
7838ee |
Date: Thu, 3 Aug 2017 09:19:54 -0400
|
|
Michal Kubecek |
7838ee |
Subject: tcp: fix xmit timer to only be reset if data ACKed/SACKed
|
|
Michal Kubecek |
7838ee |
Patch-mainline: v4.13-rc5
|
|
Michal Kubecek |
7838ee |
Git-commit: df92c8394e6ea0469e8056946ef8add740ab8046
|
|
Michal Kubecek |
7838ee |
References: bsc#1076830
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
Fix a TCP loss recovery performance bug raised recently on the netdev
|
|
Michal Kubecek |
7838ee |
list, in two threads:
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
(i) July 26, 2017: netdev thread "TCP fast retransmit issues"
|
|
Michal Kubecek |
7838ee |
(ii) July 26, 2017: netdev thread:
|
|
Michal Kubecek |
7838ee |
"[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
|
|
Michal Kubecek |
7838ee |
outstanding TLP retransmission"
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
The basic problem is that incoming TCP packets that did not indicate
|
|
Michal Kubecek |
7838ee |
forward progress could cause the xmit timer (TLP or RTO) to be rearmed
|
|
Michal Kubecek |
7838ee |
and pushed back in time. In certain corner cases this could result in
|
|
Michal Kubecek |
7838ee |
the following problems noted in these threads:
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
- Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
|
|
Michal Kubecek |
7838ee |
could cause TCP to repeatedly schedule TLPs forever. We kept
|
|
Michal Kubecek |
7838ee |
sending TLPs after every ~200ms, which elicited bogus SACKs, which
|
|
Michal Kubecek |
7838ee |
caused more TLPs, ad infinitum; we never fired an RTO to fill in
|
|
Michal Kubecek |
7838ee |
the holes.
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
- Incoming data segments could, in some cases, cause us to reschedule
|
|
Michal Kubecek |
7838ee |
our RTO or TLP timer further out in time, for no good reason. This
|
|
Michal Kubecek |
7838ee |
could cause repeated inbound data to result in stalls in outbound
|
|
Michal Kubecek |
7838ee |
data, in the presence of packet loss.
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
This commit fixes these bugs by changing the TLP and RTO ACK
|
|
Michal Kubecek |
7838ee |
processing to:
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
(a) Only reschedule the xmit timer once per ACK.
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
(b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
|
|
Michal Kubecek |
7838ee |
ACK indicates sufficient forward progress (a packet was
|
|
Michal Kubecek |
7838ee |
cumulatively ACKed, or we got a SACK for a packet that was sent
|
|
Michal Kubecek |
7838ee |
before the most recent retransmit of the write queue head).
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
This brings us back into closer compliance with the RFCs, since, as
|
|
Michal Kubecek |
7838ee |
the comment for tcp_rearm_rto() notes, we should only restart the RTO
|
|
Michal Kubecek |
7838ee |
timer after forward progress on the connection. Previously we were
|
|
Michal Kubecek |
7838ee |
restarting the xmit timer even in these cases where there was no
|
|
Michal Kubecek |
7838ee |
forward progress.
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
As a side benefit, this commit simplifies and speeds up the TCP timer
|
|
Michal Kubecek |
7838ee |
arming logic. We had been calling inet_csk_reset_xmit_timer() three
|
|
Michal Kubecek |
7838ee |
times on normal ACKs that cumulatively acknowledged some data:
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
|
|
Michal Kubecek |
7838ee |
if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
|
|
Michal Kubecek |
7838ee |
tcp_rearm_rto(sk);
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
2) Once in tcp_clean_rtx_queue(), to update the RTO:
|
|
Michal Kubecek |
7838ee |
if (flag & FLAG_ACKED) {
|
|
Michal Kubecek |
7838ee |
tcp_rearm_rto(sk);
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
|
|
Michal Kubecek |
7838ee |
to TLP:
|
|
Michal Kubecek |
7838ee |
if (icsk->icsk_pending == ICSK_TIME_RETRANS)
|
|
Michal Kubecek |
7838ee |
tcp_schedule_loss_probe(sk);
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
This commit, by only rescheduling the xmit timer once per ACK,
|
|
Michal Kubecek |
7838ee |
simplifies the code and reduces CPU overhead.
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
This commit was tested in an A/B test with Google web server
|
|
Michal Kubecek |
7838ee |
traffic. SNMP stats and request latency metrics were within noise
|
|
Michal Kubecek |
7838ee |
levels, substantiating that for normal web traffic patterns this is a
|
|
Michal Kubecek |
7838ee |
rare issue. This commit was also tested with packetdrill tests to
|
|
Michal Kubecek |
7838ee |
verify that it fixes the timer behavior in the corner cases discussed
|
|
Michal Kubecek |
7838ee |
in the netdev threads mentioned above.
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
This patch is a bug fix patch intended to be queued for -stable
|
|
Michal Kubecek |
7838ee |
relases.
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
|
|
Michal Kubecek |
7838ee |
Reported-by: Klavs Klavsen <kl@vsen.dk>
|
|
Michal Kubecek |
7838ee |
Reported-by: Mao Wenan <maowenan@huawei.com>
|
|
Michal Kubecek |
7838ee |
Signed-off-by: Neal Cardwell <ncardwell@google.com>
|
|
Michal Kubecek |
7838ee |
Signed-off-by: Yuchung Cheng <ycheng@google.com>
|
|
Michal Kubecek |
7838ee |
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
|
|
Michal Kubecek |
7838ee |
Acked-by: Eric Dumazet <edumazet@google.com>
|
|
Michal Kubecek |
7838ee |
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michal Kubecek |
7838ee |
Acked-by: Michal Kubecek <mkubecek@suse.cz>
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
---
|
|
Michal Kubecek |
7838ee |
net/ipv4/tcp_input.c | 25 ++++++++++++++++---------
|
|
Michal Kubecek |
7838ee |
net/ipv4/tcp_output.c | 9 ---------
|
|
Michal Kubecek |
7838ee |
2 files changed, 16 insertions(+), 18 deletions(-)
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
|
|
Michal Kubecek |
7838ee |
index 932c532c7a34..48141dec4e2d 100644
|
|
Michal Kubecek |
7838ee |
--- a/net/ipv4/tcp_input.c
|
|
Michal Kubecek |
7838ee |
+++ b/net/ipv4/tcp_input.c
|
|
Michal Kubecek |
7838ee |
@@ -110,6 +110,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
|
|
Michal Kubecek |
7838ee |
#define FLAG_ORIG_SACK_ACKED 0x200 /* Never retransmitted data are (s)acked */
|
|
Michal Kubecek |
7838ee |
#define FLAG_SND_UNA_ADVANCED 0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
|
|
Michal Kubecek |
7838ee |
#define FLAG_DSACKING_ACK 0x800 /* SACK blocks contained D-SACK info */
|
|
Michal Kubecek |
7838ee |
+#define FLAG_SET_XMIT_TIMER 0x1000 /* Set TLP or RTO timer */
|
|
Michal Kubecek |
7838ee |
#define FLAG_SACK_RENEGING 0x2000 /* snd_una advanced to a sacked seq */
|
|
Michal Kubecek |
7838ee |
#define FLAG_UPDATE_TS_RECENT 0x4000 /* tcp_replace_ts_recent() */
|
|
Michal Kubecek |
7838ee |
#define FLAG_NO_CHALLENGE_ACK 0x8000 /* do not call tcp_send_challenge_ack() */
|
|
Michal Kubecek |
7838ee |
@@ -3015,6 +3016,13 @@ void tcp_rearm_rto(struct sock *sk)
|
|
Michal Kubecek |
7838ee |
}
|
|
Michal Kubecek |
7838ee |
}
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
+/* Try to schedule a loss probe; if that doesn't work, then schedule an RTO. */
|
|
Michal Kubecek |
7838ee |
+static void tcp_set_xmit_timer(struct sock *sk)
|
|
Michal Kubecek |
7838ee |
+{
|
|
Michal Kubecek |
7838ee |
+ if (!tcp_schedule_loss_probe(sk))
|
|
Michal Kubecek |
7838ee |
+ tcp_rearm_rto(sk);
|
|
Michal Kubecek |
7838ee |
+}
|
|
Michal Kubecek |
7838ee |
+
|
|
Michal Kubecek |
7838ee |
/* If we get here, the whole TSO packet has not been acked. */
|
|
Michal Kubecek |
7838ee |
static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
|
|
Michal Kubecek |
7838ee |
{
|
|
Michal Kubecek |
7838ee |
@@ -3177,7 +3185,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
|
|
Michal Kubecek |
7838ee |
ca_rtt_us);
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
if (flag & FLAG_ACKED) {
|
|
Michal Kubecek |
7838ee |
- tcp_rearm_rto(sk);
|
|
Michal Kubecek |
7838ee |
+ flag |= FLAG_SET_XMIT_TIMER; /* set TLP or RTO timer */
|
|
Michal Kubecek |
7838ee |
if (unlikely(icsk->icsk_mtup.probe_size &&
|
|
Michal Kubecek |
7838ee |
!after(tp->mtu_probe.probe_seq_end, tp->snd_una))) {
|
|
Michal Kubecek |
7838ee |
tcp_mtup_probe_success(sk);
|
|
Michal Kubecek |
7838ee |
@@ -3205,7 +3213,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
|
|
Michal Kubecek |
7838ee |
* after when the head was last (re)transmitted. Otherwise the
|
|
Michal Kubecek |
7838ee |
* timeout may continue to extend in loss recovery.
|
|
Michal Kubecek |
7838ee |
*/
|
|
Michal Kubecek |
7838ee |
- tcp_rearm_rto(sk);
|
|
Michal Kubecek |
7838ee |
+ flag |= FLAG_SET_XMIT_TIMER; /* set TLP or RTO timer */
|
|
Michal Kubecek |
7838ee |
}
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
if (icsk->icsk_ca_ops->pkts_acked) {
|
|
Michal Kubecek |
7838ee |
@@ -3577,9 +3585,6 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
|
|
Michal Kubecek |
7838ee |
if (after(ack, tp->snd_nxt))
|
|
Michal Kubecek |
7838ee |
goto invalid_ack;
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
- if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
|
|
Michal Kubecek |
7838ee |
- tcp_rearm_rto(sk);
|
|
Michal Kubecek |
7838ee |
-
|
|
Michal Kubecek |
7838ee |
if (after(ack, prior_snd_una)) {
|
|
Michal Kubecek |
7838ee |
flag |= FLAG_SND_UNA_ADVANCED;
|
|
Michal Kubecek |
7838ee |
icsk->icsk_retransmits = 0;
|
|
Michal Kubecek |
7838ee |
@@ -3644,18 +3649,20 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
|
|
Michal Kubecek |
7838ee |
flag |= tcp_clean_rtx_queue(sk, prior_fackets, prior_snd_una, &acked,
|
|
Michal Kubecek |
7838ee |
&sack_state);
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
+ if (tp->tlp_high_seq)
|
|
Michal Kubecek |
7838ee |
+ tcp_process_tlp_ack(sk, ack, flag);
|
|
Michal Kubecek |
7838ee |
+ /* If needed, reset TLP/RTO timer; RACK may later override this. */
|
|
Michal Kubecek |
7838ee |
+ if (flag & FLAG_SET_XMIT_TIMER)
|
|
Michal Kubecek |
7838ee |
+ tcp_set_xmit_timer(sk);
|
|
Michal Kubecek |
7838ee |
+
|
|
Michal Kubecek |
7838ee |
if (tcp_ack_is_dubious(sk, flag)) {
|
|
Michal Kubecek |
7838ee |
is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP));
|
|
Michal Kubecek |
7838ee |
tcp_fastretrans_alert(sk, acked, is_dupack, &flag, &rexmit);
|
|
Michal Kubecek |
7838ee |
}
|
|
Michal Kubecek |
7838ee |
- if (tp->tlp_high_seq)
|
|
Michal Kubecek |
7838ee |
- tcp_process_tlp_ack(sk, ack, flag);
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
|
|
Michal Kubecek |
7838ee |
sk_dst_confirm(sk);
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
- if (icsk->icsk_pending == ICSK_TIME_RETRANS)
|
|
Michal Kubecek |
7838ee |
- tcp_schedule_loss_probe(sk);
|
|
Michal Kubecek |
7838ee |
delivered = tp->delivered - delivered; /* freshly ACKed or SACKed */
|
|
Michal Kubecek |
7838ee |
lost = tp->lost - lost; /* freshly marked lost */
|
|
Michal Kubecek |
7838ee |
tcp_rate_gen(sk, delivered, lost, sack_state.rate);
|
|
Michal Kubecek |
7838ee |
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
|
|
Michal Kubecek |
7838ee |
index de3428c97751..c8cdfac1b3a9 100644
|
|
Michal Kubecek |
7838ee |
--- a/net/ipv4/tcp_output.c
|
|
Michal Kubecek |
7838ee |
+++ b/net/ipv4/tcp_output.c
|
|
Michal Kubecek |
7838ee |
@@ -2349,21 +2349,12 @@ bool tcp_schedule_loss_probe(struct sock *sk)
|
|
Michal Kubecek |
7838ee |
u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
|
|
Michal Kubecek |
7838ee |
u32 timeout, rto_delta_us;
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
- /* No consecutive loss probes. */
|
|
Michal Kubecek |
7838ee |
- if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
|
|
Michal Kubecek |
7838ee |
- tcp_rearm_rto(sk);
|
|
Michal Kubecek |
7838ee |
- return false;
|
|
Michal Kubecek |
7838ee |
- }
|
|
Michal Kubecek |
7838ee |
/* Don't do any loss probe on a Fast Open connection before 3WHS
|
|
Michal Kubecek |
7838ee |
* finishes.
|
|
Michal Kubecek |
7838ee |
*/
|
|
Michal Kubecek |
7838ee |
if (tp->fastopen_rsk)
|
|
Michal Kubecek |
7838ee |
return false;
|
|
Michal Kubecek |
7838ee |
|
|
Michal Kubecek |
7838ee |
- /* TLP is only scheduled when next timer event is RTO. */
|
|
Michal Kubecek |
7838ee |
- if (icsk->icsk_pending != ICSK_TIME_RETRANS)
|
|
Michal Kubecek |
7838ee |
- return false;
|
|
Michal Kubecek |
7838ee |
-
|
|
Michal Kubecek |
7838ee |
/* Schedule a loss probe in 2*RTT for SACK capable connections
|
|
Michal Kubecek |
7838ee |
* in Open state, that are either limited by cwnd or application.
|
|
Michal Kubecek |
7838ee |
*/
|
|
Michal Kubecek |
7838ee |
--
|
|
Michal Kubecek |
7838ee |
2.16.2
|
|
Michal Kubecek |
7838ee |
|