Date:      Wed, 9 Sep 2020 13:15:54 -0400
From:      Liang Tian <l.tian.email@gmail.com>
To:        "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
Cc:        FreeBSD Transport <freebsd-transport@freebsd.org>
Subject:   Re: Fast recovery ssthresh value
Message-ID:  <CAJhigrjdRzK5fKpE9jTQM5p-wzKUBALK7Cc34_Qbi7HCZ_NCXw@mail.gmail.com>
In-Reply-To: <SN4PR0601MB3728D1F8ABC9C86972B6C53886590@SN4PR0601MB3728.namprd06.prod.outlook.com>
References:  <CAJhigrhbguXQzeYGfMtPRK03fp6KR65q8gjB9e9L-5tGGsuyzQ@mail.gmail.com> <SN4PR0601MB3728D1F8ABC9C86972B6C53886590@SN4PR0601MB3728.namprd06.prod.outlook.com>

Hi Richard,

Thanks for the explanation and sorry for the late reply.
I've been investigating SACK loss recovery, and I think I'm seeing an
issue similar to the ABC L value issue that I reported previously
(https://reviews.freebsd.org/D26120). I believe it is a deviation from
RFC3517.
The issue occurs when a DupAck is received during SACK loss recovery
in the presence of ACK thinning or a receiver with LRO enabled. In that
case the SACK block edges can advance by more than 1 SMSS (we've seen
30*SMSS), i.e. a single DupAck can decrement `pipe` by more than 1
SMSS.
In RFC3517,
(C) If cwnd - pipe >= 1 SMSS, the sender SHOULD transmit one or more
segments...
        (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)
So according to the RFC, the sender should be able to send more segments
when such a DupAck is received, because of the large drop in `pipe`.
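
To make the RFC behavior concrete, here is a minimal sketch of the step (C)
loop. It is not FreeBSD code: the function name and its parameters are
hypothetical, and it assumes every transmitted segment is exactly 1 SMSS and
adds 1 SMSS to `pipe`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not FreeBSD code: count how many full-sized
 * segments the loop in step (C) of RFC3517 would allow once an ACK has
 * been processed and pipe has been updated.
 */
static uint32_t
rfc3517_segments_allowed(uint32_t cwnd, uint32_t pipe, uint32_t smss)
{
	uint32_t nsegs = 0;

	assert(smss > 0);
	/* (C) / (C.5): keep sending while cwnd - pipe >= 1 SMSS. */
	while (cwnd >= pipe && cwnd - pipe >= smss) {
		pipe += smss;	/* (C.4): pipe grows by the bytes just sent */
		nsegs++;
	}
	return (nsegs);
}
```

With SMSS = 1460 and cwnd = 40*SMSS, a single DupAck that lowers `pipe` from
40*SMSS to 10*SMSS would let this loop send 30 segments, not 1.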

In the current implementation, the cwin variable, which controls the
amount of data that can be transmitted based on the new information,
is dictated by snd_cwnd. snd_cwnd is incremented by 1 SMSS for
each DupAck received. I believe this effectively limits the
retransmission triggered by each DupAck to 1 SMSS, which is the
deviation.
        cwin =
            imax(min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt, 0);
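
A rough model of that limit, covering only the expression quoted above and
the 1 SMSS per DupAck increment described in this message. The function names
are hypothetical and everything else in the real output path is ignored, so
treat it as an illustration, not as the stack's actual logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: imax(min(snd_wnd, snd_cwnd) - sack_bytes_rxmt, 0) */
static int64_t
model_cwin(int64_t snd_wnd, int64_t snd_cwnd, int64_t sack_bytes_rxmt)
{
	int64_t win = snd_wnd < snd_cwnd ? snd_wnd : snd_cwnd;
	int64_t cwin = win - sack_bytes_rxmt;

	return (cwin > 0 ? cwin : 0);
}

/*
 * How much cwin grows when one DupAck bumps snd_cwnd by 1 SMSS.
 * The number of bytes newly SACKed by that DupAck is not an input,
 * so the gain can never exceed 1 SMSS in this model.
 */
static int64_t
model_cwin_gain_per_dupack(int64_t snd_wnd, int64_t snd_cwnd,
    int64_t sack_bytes_rxmt, int64_t smss)
{
	int64_t before, after;

	assert(smss > 0);
	before = model_cwin(snd_wnd, snd_cwnd, sack_bytes_rxmt);
	after = model_cwin(snd_wnd, snd_cwnd + smss, sack_bytes_rxmt);
	return (after - before);
}
```

In this model the gain is at most 1 SMSS whether the DupAck SACKed 1 SMSS or
30*SMSS, which is the limit described above.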

As a result, SACK is not doing enough recovery in this scenario and
loss has to be recovered by RTO.
Again, I'd appreciate feedback from the community.

Regards,
Liang Tian




On Sun, Aug 23, 2020 at 3:56 PM Scheffenegger, Richard
<Richard.Scheffenegger@netapp.com> wrote:
>
> Hi Liang,
>
> In SACK loss recovery, you can recover up to ssthresh (prior cwnd/2 [or
> 70% in case of cubic]) lost bytes - at least in theory.
>
> In comparison, (New)Reno can only recover one lost packet per window, and
> then keeps on transmitting new segments (ack + cwnd), even before the
> receipt of the retransmitted packet is acked.
>
> For historic reasons, the semantic of the variable cwnd is overloaded
> during loss recovery, and it doesn't "really" indicate cwnd, but rather
> indicates if/when retransmissions can happen.
>
>
> In both cases (also the simple one, with only one packet loss), cwnd
> should be equal (or near equal) to ssthresh by the time loss recovery is
> finished - but NOT before! While it may appear like slow-start, the value
> of the cwnd variable really increases by acked_bytes only per ACK (not
> acked_bytes + SMSS), since the left edge (snd_una) doesn't move right -
> unlike during slow-start. But numerically, these different phases
> (slow-start / sack loss-recovery) may appear very similar.
>
> You could check this using the (loadable) SIFTR module, which captures
> t_flags (indicating if cong/loss recovery is active), ssthresh, cwnd, and
> other parameters.
>
> That is at least how things are supposed to work; or have you
> investigated the timing and behavior of SACK loss recovery and found a
> deviation from RFC3517? Note that FBSD currently has not fully implemented
> RFC6675 support (which deviates slightly from 3517 under specific
> circumstances); I have a patch pending to implement 6675 rescue
> retransmissions, but haven't tweaked the other aspects of 6675 vs. 3517.
>
> BTW: While freebsd-net is not the wrong DL per se, TCP, UDP, SCTP
> specific questions can also be posted to freebsd-transport, which is more
> narrowly focused.
>
> Best regards,
>
> Richard Scheffenegger
>
> -----Original Message-----
> From: owner-freebsd-net@freebsd.org <owner-freebsd-net@freebsd.org> On Behalf Of Liang Tian
> Sent: Sunday, 23 August 2020 00:14
> To: freebsd-net <freebsd-net@freebsd.org>
> Subject: Fast recovery ssthresh value
>
> Hi all,
>
> When 3 dupacks are received and TCP enters fast recovery, if SACK is
> used, the CWND is set to maxseg:
>
> 2593                     if (tp->t_flags & TF_SACK_PERMIT) {
> 2594                         TCPSTAT_INC(
> 2595                             tcps_sack_recovery_episode);
> 2596                         tp->snd_recover = tp->snd_nxt;
> 2597                         tp->snd_cwnd = maxseg;
> 2598                         (void) tp->t_fb->tfb_tcp_output(tp);
> 2599                         goto drop;
> 2600                     }
>
> Otherwise (SACK is not in use), CWND is set to maxseg before
> tcp_output() and then set back to snd_ssthresh + inflation:
> 2601                     tp->snd_nxt = th->th_ack;
> 2602                     tp->snd_cwnd = maxseg;
> 2603                     (void) tp->t_fb->tfb_tcp_output(tp);
> 2604                     KASSERT(tp->snd_limited <= 2,
> 2605                         ("%s: tp->snd_limited too big",
> 2606                         __func__));
> 2607                     tp->snd_cwnd = tp->snd_ssthresh +
> 2608                          maxseg *
> 2609                          (tp->t_dupacks - tp->snd_limited);
> 2610                     if (SEQ_GT(onxt, tp->snd_nxt))
> 2611                         tp->snd_nxt = onxt;
> 2612                     goto drop;
>
> I'm wondering whether, in the SACK case, CWND should be set back to
> ssthresh (which has been slashed in cc_cong_signal() a few lines above)
> before line 2599, like the non-SACK case, instead of doing slow start from
> maxseg?
> I read rfc6675 and a few others, and it looks like that's the case. I
> appreciate your opinion, again.
>
> Thanks,
> Liang


