Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 16 Nov 2015 12:41:45 -0800
From:      hiren panchasara <hiren@strugglingcoder.info>
To:        Jonathan Looney <jlooney@juniper.net>
Cc:        FreeBSD Transports <transport@FreeBSD.org>, Randall Stewart <rrs@netflix.com>
Subject:   Re: Maintaining dupack counter per hole (was: The trouble with sack..)
Message-ID:  <20151116204145.GX29829@strugglingcoder.info>
In-Reply-To: <D26F88C5.4B837%jlooney@juniper.net>
References:  <DA8A5844-8F11-42D5-B923-3F329203B867@netflix.com> <20151030062423.GB5261@strugglingcoder.info> <D258E1AC.4A234%jlooney@juniper.net> <20151116180644.GU29829@strugglingcoder.info> <D26F88C5.4B837%jlooney@juniper.net>

next in thread | previous in thread | raw e-mail | index | archive | help

--Cou6PmgoyP0+llr2
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 11/16/15 at 07:19P, Jonathan Looney wrote:
> On 11/16/15, 1:06 PM, "hiren panchasara" <hiren@strugglingcoder.info>
> wrote:
>=20
> >Getting back to this old thread...
> >
> >On 10/30/15 at 01:09P, Jonathan Looney wrote:
> >>The SACK hole-tracking code is already quite complex. If we're going to
> >> make a fundamental change, perhaps it is time to consider a rewrite,
> >> rather than a smaller patch? Maybe this is the best code we can write.
> >>Or,
> >> maybe it is time for a re-coding to make it more easily accessible.
> >
> >As Randall said (in response to this), that rewrite is not necessary. And
> >I agree to that. I don't see tcp_sack.c being in that bad shape that it
> >demands a rewrite. But ofc, if you have something really cleaner/faster
> >in mind, I'm all ears. :-)
>=20
> I don't have anything better in mind. :-)
>=20
> In fact, I think the code performs pretty well. And, that may be because
> it prioritizes performance over readability. But, since I think we've all
> stared at this code for a long time trying to figure out how it works
> (note the repetitive comments about obscure code), I think it is worth
> considering whether it is time to try to rewrite it to be more readable.
>=20
> Having said that, I don't see any evidence that now is the time. I just
> wanted to prompt the discussion. :-)

Fair enough. And we should look out for such opportunities whenever
possible :-)
>=20
>=20
> >>=20
> >> Do you charge all three holes with the duplicate ACKs? Do you copy the
> >> counter to the holes?
> >>=20
> >> Or, is the fact that the ACK is slightly different enough to reset the
> >> counter?
> >
> >Basically, whenever a hole gets broken up, subholes carry-forward the
> >dupack strike counter.
>=20
> This somewhat makes sense. And, it looks like you are charging all
> existing holes for each ACK that doesn't cover them (regardless of the
> fact that it isn't really a "duplicate ACK" in this case). That also
> generally makes sense.

Why do you say those are not duplicate acks? I know this goes into now
the definition of a duplicate ack what what we mean by that but to me,
this is *the* benefit we get from SACK that we can identify these guys
as duplicate acks.

>=20
> But, I think we could change this to help cases where packets are
> re-ordered.

I disagree with the proposal with reasons in-line below.=20
>
> >> If you reset the counter anytime the hole is broken up, it will take a
> >> while to get to three in a really out-of-order network scenario. On the
> >> other hand, if you don't reset the counter, you may retransmit too fas=
t.
> >
> >By 'too fast', I think you mean spurious retransmissions. If so, can you
> >explain a bit more?
>=20
> Sure.
>=20
> Imagine that my network is load-balancing over a 4xGE bundle, but the
> bundle links have different loads and slightly re-order the packets.
>=20
> The packets arrive at the remote side out of order, like this:
>=20
> 4500:5999
>=20
> 1500:2999
>=20
> 3000:4499
>=20
> 0:1499
>=20
> The remote side ACKs them in that order and we receive the ACKs in that
> order.
>=20
> After the third ACK, we will view the 0:1499 block as having three strikes
> and will retransmit it (assuming that is our threshold), even though the
> ACK will arrive momentarily.
>=20
> Of course, I think that problem already exists with our code today. But,
> my question is how the per-hole counter improves this.
I don't have an articulated answer for this right now. One clear issue
with the way we do retransmit is that we wait for 1/2 the inflight to
drain. (Look at 'Compute the amount of data in flight....' comment and
nearby code)

> I think the bottom
> line is that our code today assumes that we need to do a SACK retransmit
> (for all outstanding SACK holes) as soon as the first outstanding byte is
> not ACKd for three ACKs in a row.

But getting up to this dupack thresh "three" is also broken right now. A
slightly different but related problem. (As explained by Randall in 2)
of 'The trouble with sack..' email on this list.)
>=20
> If we instead reset the counter for a hole anytime the hole was partially
> ACKd (including being split), we would retransmit less aggressively. In
> this example, that would probably be good.
>=20
> A middle-ground is to charge all holes every time we receive an ACK that
> does not partially ACK the hole. If a hole is partially ACKd, neither
> charge the ACK with a strike nor reset the counter.

This is the idea I don't agree to. SACK which partially acks does tell
us something important and IMO it'd be bad if don't consider that
information and strike the counter.
>=20
> In any case, I think a rule that any particular byte must be ACKd in one
> of the next three packets after a higher byte is ACKd is potentially too
> rigid given the myriad ways packets can be re-ordered.

So, coming back to your example, it's a valid problem. But this is
not a valid solution for it. When you know that you are introducing some=20
degree of reordering as a sender (as your example shows), the dupack
threshold should be bumped up to that degree.

Now, linux does this with some smart heuristics. It tracks per
connection degree of reordering that dupack threshold is always set to
that for any point in time. I am wondering it something like that'd
help. OR at least, a sockopt that anyone can set (much simpler than
coming up with heuristics logic for the first pass) for their dupack
threshold. This would probably help your case very much.

Apologies if I am all over the place in this response.

Cheers,
Hiren

--Cou6PmgoyP0+llr2
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQF8BAABCgBmBQJWSj+FXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRBNEUyMEZBMUQ4Nzg4RjNGMTdFNjZGMDI4
QjkyNTBFMTU2M0VERkU1AAoJEIuSUOFWPt/lzYkH/0lEHuG/XbqeK+HNuumRxToe
tFrP+Xit+mA3Gf4cOhExj8f5ti93CdfVLK07Hg/oRpN4GaVUupVQHu2HoSiSXvBL
hOi10NIC7l2Mlf9agm5H1kiTlzIoWlWqaSOsviJjTzpm+r5aFyE62VZvkXL7BqWz
YfSbJBLwg3ObZ7zDiPwlcRVtuUO9iEmGhwdTnyssrzXDsdeb4KJwZO0wgaAEsxY/
hWyFz8R/kM9Wdapy1t9lfTTxqifchv9/lbIsfAnkoBA1S5M5pu120rI5a1dbP/Jm
fK/30qOyuil1tKLBRukQrrRjAwOaMiUZ7+98B73VDEpCUq+Rfbwjehryoo8+5VE=
=uUsr
-----END PGP SIGNATURE-----

--Cou6PmgoyP0+llr2--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20151116204145.GX29829>