Date:      Thu, 10 Sep 2020 09:49:13 +0000
From:      "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To:        Liang Tian <l.tian.email@gmail.com>
Cc:        FreeBSD Transport <freebsd-transport@freebsd.org>
Subject:   RE: Fast recovery ssthresh value
Message-ID:  <SN4PR0601MB372817A4C0D80D981B1CE52586270@SN4PR0601MB3728.namprd06.prod.outlook.com>
In-Reply-To: <CAJhigrjdRzK5fKpE9jTQM5p-wzKUBALK7Cc34_Qbi7HCZ_NCXw@mail.gmail.com>
References:  <CAJhigrhbguXQzeYGfMtPRK03fp6KR65q8gjB9e9L-5tGGsuyzQ@mail.gmail.com> <SN4PR0601MB3728D1F8ABC9C86972B6C53886590@SN4PR0601MB3728.namprd06.prod.outlook.com> <CAJhigrjdRzK5fKpE9jTQM5p-wzKUBALK7Cc34_Qbi7HCZ_NCXw@mail.gmail.com>

Hi Liang,

Yes, you are absolutely correct about this observation. SACK loss recovery currently sends only one MSS per received ACK. When ACK thinning is present, it fails to recover all the missing packets in time, and eventually no more ACKs arrive to clock out further retransmissions.

I have a Diff in review, to implement Proportional Rate Reduction:

https://reviews.freebsd.org/D18892

This should address not only the ACK thinning issue, but also the issue that current SACK loss recovery has to wait until pipe drops below ssthresh before retransmissions are clocked out. Even then, they are clocked out at the same rate as the incoming ACKs. That is the same rate as when the overload happened (barring any ACK thinning), and as a secondary effect, this behavior has been observed to cause self-inflicted loss of retransmissions.
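
For reference, the per-ACK send budget of Proportional Rate Reduction can be sketched as below. This is a minimal sketch of the algorithm as published in RFC 6937 (the slow-start reduction bound variant), not the code in D18892. The struct, the function names, and the use of 64-bit signed byte counts are assumptions made for illustration, and recover_fs is assumed to be greater than zero.

```c
#include <stdint.h>

/* Hypothetical per-connection PRR state; all values are in bytes. */
struct prr_state {
    int64_t ssthresh;      /* target window at the end of recovery */
    int64_t recover_fs;    /* flight size when recovery started (> 0) */
    int64_t prr_delivered; /* bytes newly acked or SACKed during recovery */
    int64_t prr_out;       /* bytes (re)transmitted during recovery */
};

static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }
static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }

/*
 * Called for every ACK during recovery. `delivered` is the number of bytes
 * this ACK newly acknowledged or SACKed, `pipe` the current estimate of
 * bytes in flight. Returns how many bytes may be sent in response.
 */
int64_t prr_on_ack(struct prr_state *s, int64_t delivered, int64_t pipe,
                   int64_t mss)
{
    int64_t sndcnt;

    s->prr_delivered += delivered;
    if (pipe > s->ssthresh) {
        /* Proportional part: ceil(prr_delivered * ssthresh / RecoverFS). */
        int64_t target = (s->prr_delivered * s->ssthresh +
                          s->recover_fs - 1) / s->recover_fs;
        sndcnt = target - s->prr_out;
    } else {
        /* Slow-start reduction bound: catch up towards ssthresh. */
        int64_t limit = max64(s->prr_delivered - s->prr_out, delivered) + mss;
        sndcnt = min64(s->ssthresh - pipe, limit);
    }
    return max64(sndcnt, 0);
}

/* Called after any transmission or retransmission during recovery. */
void prr_on_send(struct prr_state *s, int64_t sent)
{
    s->prr_out += sent;
}
```

Because the budget is driven by the bytes an ACK delivers rather than by the number of ACKs, a single thinned ACK covering many segments yields a proportionally larger budget. With the made-up values ssthresh = 30000 and recover_fs = 60000, one ACK that delivers 30000 bytes while pipe is 40000 permits 15000 bytes to be sent.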

If you are able to patch your kernel with D18892 and observe how it behaves in your dramatic ACK thinning scenario, that would be good to know! The patch assumes that, as per TCP RFC requirements, there is one ACK for each received out-of-sequence data segment, and that ACK drops / thinning do not happen on such a massive scale as you describe.

Best regards,

Richard Scheffenegger

-----Original Message-----
From: owner-freebsd-transport@freebsd.org <owner-freebsd-transport@freebsd.org> On Behalf Of Liang Tian
Sent: Wednesday, 9 September 2020 19:16
To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
Cc: FreeBSD Transport <freebsd-transport@freebsd.org>
Subject: Re: Fast recovery ssthresh value

Hi Richard,

Thanks for the explanation and sorry for the late reply.
I've been investigating SACK loss recovery and I think I'm seeing an issue similar to the ABC L value issue that I reported previously (https://reviews.freebsd.org/D26120), and I do believe there is a deviation from RFC3517.
The issue happens when a DupAck is received during SACK loss recovery in the presence of ACK thinning or a receiver with LRO enabled, which means the SACK block edges can expand by more than 1 SMSS (we've seen 30*SMSS), i.e. a single DupAck can decrement `pipe` by more than 1 SMSS.
In RFC3517:
(C) If cwnd - pipe >= 1 SMSS, the sender SHOULD transmit one or more segments...
        (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)
So based on the RFC, the sender should be able to send more segments when such a DupAck is received, because of the large change to `pipe`.
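
The loop implied by step (C) can be sketched as follows. This is only an illustration of the rule quoted above, assuming every transmitted segment is exactly SMSS bytes. The function name and the segs_available parameter (how many segments the scoreboard could still offer) are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many full-sized segments step (C) would allow
 * after an ACK has been processed. All sizes are in bytes.
 */
uint32_t rfc3517_segments_allowed(uint32_t cwnd, uint32_t pipe, uint32_t smss,
                                  uint32_t segs_available)
{
    uint32_t sent = 0;

    /* Keep sending while there is data and cwnd - pipe >= 1 SMSS. */
    while (sent < segs_available && cwnd >= pipe && cwnd - pipe >= smss) {
        pipe += smss; /* pipe grows by the bytes just transmitted */
        sent++;
    }
    return sent;
}
```

For example, with smss = 1460 and cwnd = 58400, a DupAck that drops pipe to 14600 (30 segments newly SACKed) leaves room for 30 segments in one pass, not one.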

In the current implementation, the cwin variable, which controls the amount of data that can be transmitted based on the new information, is dictated by snd_cwnd. snd_cwnd is incremented by 1 SMSS for each DupAck received. I believe this effectively limits the retransmission triggered by each DupAck to 1 SMSS, which is the deviation.
        cwin =
            imax(min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt, 0);
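
The effect can be reproduced with a small model. This is a simplified sketch of the behavior described in this message, not the actual FreeBSD code path: it ignores the ssthresh clamp and partial ACKs, and it assumes there are always enough SACK holes to use the whole budget. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the three variables used in the cwin expression. */
struct rec_model {
    int32_t snd_wnd;         /* peer's advertised window, bytes */
    int32_t snd_cwnd;        /* overloaded cwnd during recovery, bytes */
    int32_t sack_bytes_rxmt; /* bytes retransmitted so far in recovery */
};

static int32_t imin32(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t imax32(int32_t a, int32_t b) { return a > b ? a : b; }

/*
 * Process one DupAck in the model: snd_cwnd grows by one maxseg, the
 * retransmission budget is computed as in the quoted expression, and the
 * whole budget is assumed to be used. Returns the budget in bytes.
 */
int32_t model_dupack(struct rec_model *m, int32_t maxseg)
{
    int32_t cwin;

    m->snd_cwnd += maxseg;
    cwin = imax32(imin32(m->snd_wnd, m->snd_cwnd) - m->sack_bytes_rxmt, 0);
    m->sack_bytes_rxmt += cwin;
    return cwin;
}
```

Starting from snd_cwnd = maxseg and sack_bytes_rxmt = maxseg (one segment already retransmitted), every call returns exactly maxseg. The model has no input for how many bytes the DupAck newly SACKed, so the budget cannot grow with it.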

As a result, SACK does not recover enough in this scenario, and the loss has to be recovered by RTO.
Again, I'd appreciate feedback from the community.

Regards,
Liang Tian




On Sun, Aug 23, 2020 at 3:56 PM Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
>
> Hi Liang,
>
> In SACK loss recovery, you can recover up to ssthresh (prior cwnd/2 [or 70% in case of cubic]) lost bytes - at least in theory.
>
> In comparison, (New)Reno can only recover one lost packet per window, and then keeps on transmitting new segments (ack + cwnd), even before the receipt of the retransmitted packet is acked.
>
> For historic reasons, the semantics of the variable cwnd are overloaded during loss recovery, and it doesn't "really" indicate cwnd, but rather indicates if/when retransmissions can happen.
>
>
> In both cases (also the simple one, with only one packet loss), cwnd should be equal (or nearly equal) to ssthresh by the time loss recovery is finished - but NOT before! While it may appear like slow-start, the value of the cwnd variable really increases by acked_bytes only per ACK (not acked_bytes + SMSS), since the left edge (snd_una) doesn't move right - unlike during slow-start. But numerically, these different phases (slow-start / SACK loss recovery) may appear very similar.
>
> You could check this using the (loadable) SIFTR module, which captures t_flags (indicating if cong/loss recovery is active), ssthresh, cwnd, and other parameters.
>
> That is at least how things are supposed to work; or have you investigated the timing and behavior of SACK loss recovery and found a deviation from RFC3517? Note that FBSD currently has not fully implemented RFC6675 support (which deviates slightly from 3517 under specific circumstances); I have a patch pending to implement 6675 rescue retransmissions, but haven't tweaked the other aspects of 6675 vs. 3517.
>
> BTW: While freebsd-net is not the wrong DL per se, TCP, UDP, SCTP specific questions can also be posted to freebsd-transport, which is more narrowly focused.
>
> Best regards,
>
> Richard Scheffenegger
>
> -----Original Message-----
> From: owner-freebsd-net@freebsd.org <owner-freebsd-net@freebsd.org> On Behalf Of Liang Tian
> Sent: Sunday, 23 August 2020 00:14
> To: freebsd-net <freebsd-net@freebsd.org>
> Subject: Fast recovery ssthresh value
>
> Hi all,
>
> When 3 dupacks are received and TCP enters fast recovery, if SACK is used, the CWND is set to maxseg:
>
> 2593                     if (tp->t_flags & TF_SACK_PERMIT) {
> 2594                         TCPSTAT_INC(
> 2595                             tcps_sack_recovery_episode);
> 2596                         tp->snd_recover = tp->snd_nxt;
> 2597                         tp->snd_cwnd = maxseg;
> 2598                         (void) tp->t_fb->tfb_tcp_output(tp);
> 2599                         goto drop;
> 2600                     }
>
> Otherwise (SACK is not in use), CWND is set to maxseg before tcp_output() and then set back to snd_ssthresh+inflation:
>
> 2601                     tp->snd_nxt = th->th_ack;
> 2602                     tp->snd_cwnd = maxseg;
> 2603                     (void) tp->t_fb->tfb_tcp_output(tp);
> 2604                     KASSERT(tp->snd_limited <= 2,
> 2605                         ("%s: tp->snd_limited too big",
> 2606                         __func__));
> 2607                     tp->snd_cwnd = tp->snd_ssthresh +
> 2608                          maxseg *
> 2609                          (tp->t_dupacks - tp->snd_limited);
> 2610                     if (SEQ_GT(onxt, tp->snd_nxt))
> 2611                         tp->snd_nxt = onxt;
> 2612                     goto drop;
>
> I'm wondering whether, in the SACK case, CWND should be set back to ssthresh (which has been slashed in cc_cong_signal() a few lines above) before line 2599, like the non-SACK case, instead of doing slow start from maxseg?
> I read rfc6675 and a few others, and it looks like that's the case. I appreciate your opinion, again.
>
> Thanks,
> Liang
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
_______________________________________________
freebsd-transport@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-transport
To unsubscribe, send any mail to "freebsd-transport-unsubscribe@freebsd.org"


