From owner-freebsd-transport@freebsd.org Wed Oct 28 23:27:38 2015
Date: Wed, 28 Oct 2015 16:27:37 -0700
From: hiren panchasara
To: transport@FreeBSD.org
Subject: Re: Correct inflight/pipe calculation
Message-ID: <20151028232737.GG5261@strugglingcoder.info>
References: <20151007172610.GA42742@strugglingcoder.info> <20151021232210.GL28288@strugglingcoder.info>
In-Reply-To: <20151021232210.GL28288@strugglingcoder.info>
List-Id: Discussions of transport level network protocols in FreeBSD

On 10/21/15 at 04:22P, hiren panchasara wrote:
> On 10/07/15 at 10:26P, hiren panchasara wrote:
> > Randall and I have been poking at different ways to improve FreeBSD
> > tcp's reaction to loss. One of the major issues we found is that we do
> > not use information provided by SACK efficiently even though we do keep
> > the SACK scoreboard well in shape. Knowing the amount of data in flight
> > can be really crucial and can help us use available capacity of the path
> > more efficiently. We currently do not have an accurate way of knowing
> > this information.
> >
> > For example, inside tcp_do_segment(), while processing duplicate acks,
> > we try to compute amount of data inflight with:
> >   awnd = (tp->snd_nxt - tp->snd_fack) +
> >       tp->sackhint.sack_bytes_rexmit;
> >
> > Which is incorrect as it doesn't take into account what's been already
> > sacked by the receiver.
> > There are definitely other places in the stack where we do this
> > incorrectly.
> >
> > RFC 6675 provides guidance on how to implement calculations for
> > bytes in flight at any point in time. Randall and I came to the
> > conclusion that the following can provide us inflight information
> > almost(!) accurately with the least amount of code changes:
> >
> >   pipe = snd_max - snd_una - sackhint.sacked_bytes + sackhint.sack_bytes_rexmit;
> >
> > here,
> > snd_max: highest sequence number sent
> > snd_una: lowest sequence number sent but not yet cumulatively acked
> > sacked_bytes: total bytes sacked by receiver reported via SACK holes
> > sack_bytes_rexmit: total bytes retransmitted from holes in this recovery
> > period
> >
> > The only missing piece in FreeBSD is sacked_bytes. This is basically total
> > bytes sacked by the receiver and it can be extracted from SACK holes
> > reported by the receiver. The approach we've decided to take is pretty
> > simple: we already process each ACK with sack holes in
> > tcp_sack_doack() and extract sack blocks out of it. We'd now also track
> > this new variable there which keeps track of total sacked bytes
> > reported.
> >
> > The downside with this approach is:
> > There is no persistent information about sacked bytes. We recalculate
> > it every time we get an ACK with sack holes in it. So if, for any
> > reason, the receiver decides to drop sack info then we get an incorrect
> > value for inflight. This may be also true when there are more holes but
> > the receiver can only report 3 at a time.
> >
> > I have actual code that I've been testing and if people see no major problem
> > with this approach, I can put it up for review in phabricator.
>
> Well, I didn't receive any replies so here it is:
> https://reviews.freebsd.org/D3971
>
> Please take a look at that.

Closing the loop. This has been committed:
https://svnweb.freebsd.org/changeset/base/290122

Cheers,
Hiren

From owner-freebsd-transport@freebsd.org Fri Oct 30 06:24:25 2015
Date: Thu, 29 Oct 2015 23:24:23 -0700
From: hiren panchasara
To: Randall Stewart
Cc: FreeBSD Transports
Subject: Maintaining dupack counter per hole (was: The trouble with sack..)
Message-ID: <20151030062423.GB5261@strugglingcoder.info>

(Something Randall and I discussed today)

On 10/07/15 at 12:17P, Randall Stewart via freebsd-transport wrote:
>
> 3) When we have more than one hole the goal of SACK was to retransmit
>    every time that a hole had 3 dup-acks so that one could recover
>    multiple blocks that were lost. We just plain don't track dup-acks
>    per hole. We do continue to count, but we will wait to retransmit
>    anything until after we have drained 1/2 the data in flight from the
>    network at a minimum. And only then do we start incrementing cwnd
>    (remember we crashed it to 1 MTU) so that we can retransmit. There
>    may be some other twists in the code that we are missing but this is
>    what we believe (this code could probably win the C obfuscation
>    contest if someone were willing to enter it :-D)

Wondering if we can add this dupack counter in struct sackhole {} and
every time we process acks with sack in tcp_sack_doack(), we increment
this counter if the same hole appears again. And retransmit it on (or
after?) 3rd dupack.

Cheers,
Hiren

From owner-freebsd-transport@freebsd.org Fri Oct 30 14:41:47 2015
From: Jonathan Looney
To: hiren panchasara, Randall Stewart
CC: FreeBSD Transports
Subject: Re: Maintaining dupack counter per hole (was: The trouble with sack..)
Date: Fri, 30 Oct 2015 13:09:23 +0000
References: <20151030062423.GB5261@strugglingcoder.info>
In-Reply-To: <20151030062423.GB5261@strugglingcoder.info>

On 10/30/15, 2:24 AM, "hiren panchasara" wrote:

>(Something Randall and I discussed today)
>
>On 10/07/15 at 12:17P, Randall Stewart via freebsd-transport wrote:
>>
>> 3) When we have more than one hole the goal of SACK was to retransmit
>>    every time that a hole had 3 dup-acks so that one could recover
>>    multiple blocks that were lost. We just plain don't track dup-acks
>>    per hole. We do continue to count, but we will wait to retransmit
>>    anything until after we have drained 1/2 the data in flight from the
>>    network at a minimum. And only then do we start incrementing cwnd
>>    (remember we crashed it to 1 MTU) so that we can retransmit. There
>>    may be some other twists in the code that we are missing but this is
>>    what we believe (this code could probably win the C obfuscation
>>    contest if someone were willing to enter it :-D)
>
>Wondering if we can add this dupack counter in struct sackhole {} and
>every time we process acks with sack in tcp_sack_doack(), we increment
>this counter if the same hole appears again. And retransmit it on (or
>after?) 3rd dupack.

The SACK hole-tracking code is already quite complex. If we're going to
make a fundamental change, perhaps it is time to consider a rewrite,
rather than a smaller patch? Maybe this is the best code we can write. Or,
maybe it is time for a re-coding to make it more easily accessible.

In any case, how do you propose tracking holes that are carved up by later
packets?

E.g.

Hole is 1:1500.

Then, you receive a packet with 500:750, leaving two holes.

Then, you receive a packet with 1000:1250, leaving three holes.

Do you charge all three holes with the duplicate ACKs? Do you copy the
counter to the holes?

Or, is the fact that the ACK is slightly different enough to reset the
counter?

If you reset the counter anytime the hole is broken up, it will take a
while to get to three in a really out-of-order network scenario. On the
other hand, if you don't reset the counter, you may retransmit too fast.

Just my initial reaction...

Jonathan

From owner-freebsd-transport@freebsd.org Fri Oct 30 16:50:54 2015
Subject: Re: Maintaining dupack counter per hole (was: The trouble with sack..)
From: Randall Stewart
Date: Fri, 30 Oct 2015 09:50:56 -0700
Cc: hiren panchasara, FreeBSD Transports
Message-Id: <4C66DD2E-DFD0-4334-B14F-16289DC82A41@netflix.com>
References: <20151030062423.GB5261@strugglingcoder.info>
To: Jonathan Looney

I don't think you would reset the counter just because a hole was broken
up. You are either getting re-ordering in combination with TSO or possibly
a retransmission which fills in some but not all of the hole..

And as to the "rewrite" I am not sure it's needed, it's complex code I will
admit but it does seem to work. I would say only rewrite if it's needed to
make the strike per hole work.

Of course we need to think about this a bit more. Generally we know what
the "new data" is in a SACK, so that's probably key. Hiren also pointed out
a recent Google draft to me:

https://tools.ietf.org/html/draft-cheng-tcpm-rack-00

It seems to me that this is a very good thing :-)

Of course there are lots of other things that need to be implemented with
it.. reorder detection being one.

I would imagine this is a big source of some of the gains that Quic gets in
recovery.. if you note Jana is noted in the Ack's section for his
implementation in Quic..
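
To make the counter-per-hole idea and the splitting question concrete, here
is a minimal user-space sketch in C. It is not the FreeBSD code: struct hole,
hole_strike() and hole_split() are hypothetical names, the struct is a
cut-down stand-in for struct sackhole, and plain integer comparisons are used
where the kernel would need wraparound-safe sequence comparisons. It assumes
the policy suggested above, that both pieces of a split hole inherit the
parent's count rather than resetting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DUPTHRESH 3	/* retransmit a hole on its third duplicate ACK */

/* Hypothetical, cut-down stand-in for the kernel's struct sackhole. */
struct hole {
	uint32_t start;		/* first missing sequence number */
	uint32_t end;		/* one past the last missing sequence number */
	int dupacks;		/* duplicate ACKs charged to this hole */
};

/*
 * Charge one duplicate ACK to a hole.  Returns true only on the ACK
 * that brings the count to DUPTHRESH, so the caller retransmits once.
 */
static bool
hole_strike(struct hole *h)
{
	return (++h->dupacks == DUPTHRESH);
}

/*
 * A SACK block [sack_start, sack_end) landed strictly inside *h.
 * Shrink *h to the left remainder and fill *right with the right
 * remainder.  Both pieces keep the parent's count (assumed policy).
 */
static void
hole_split(struct hole *h, uint32_t sack_start, uint32_t sack_end,
    struct hole *right)
{
	assert(h->start < sack_start && sack_start < sack_end &&
	    sack_end < h->end);
	right->start = sack_end;
	right->end = h->end;
	right->dupacks = h->dupacks;
	h->end = sack_start;
}
```

With Jonathan's example, a hole 1:1500 that has taken two strikes and is then
carved up by 500:750 and 1000:1250 leaves three holes that each carry a count
of two, so the next duplicate ACK charged to any of them triggers its
retransmission. Whether that is too aggressive under heavy reordering is
exactly the open question in this thread.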

R

On Oct 30, 2015, at 6:09 AM, Jonathan Looney wrote:

> On 10/30/15, 2:24 AM, "hiren panchasara" <hiren@strugglingcoder.info> wrote:
>
>> (Something Randall and I discussed today)
>>
>> On 10/07/15 at 12:17P, Randall Stewart via freebsd-transport wrote:
>>>
>>> 3) When we have more than one hole the goal of SACK was to retransmit
>>>    every time that a hole had 3 dup-acks so that one could recover
>>>    multiple blocks that were lost. We just plain don't track dup-acks
>>>    per hole. We do continue to count, but we will wait to retransmit
>>>    anything until after we have drained 1/2 the data in flight from the
>>>    network at a minimum. And only then do we start incrementing cwnd
>>>    (remember we crashed it to 1 MTU) so that we can retransmit. There
>>>    may be some other twists in the code that we are missing but this is
>>>    what we believe (this code could probably win the C obfuscation
>>>    contest if someone were willing to enter it :-D)
>>
>> Wondering if we can add this dupack counter in struct sackhole {} and
>> every time we process acks with sack in tcp_sack_doack(), we increment
>> this counter if the same hole appears again. And retransmit it on (or
>> after?) 3rd dupack.
>
> The SACK hole-tracking code is already quite complex. If we're going to
> make a fundamental change, perhaps it is time to consider a rewrite,
> rather than a smaller patch? Maybe this is the best code we can write. Or,
> maybe it is time for a re-coding to make it more easily accessible.
>
> In any case, how do you propose tracking holes that are carved up by later
> packets?
>
> E.g.
>
> Hole is 1:1500.
>
> Then, you receive a packet with 500:750, leaving two holes.
>
> Then, you receive a packet with 1000:1250, leaving three holes.
>
> Do you charge all three holes with the duplicate ACKs? Do you copy the
> counter to the holes?
>
> Or, is the fact that the ACK is slightly different enough to reset the
> counter?
>
> If you reset the counter anytime the hole is broken up, it will take a
> while to get to three in a really out-of-order network scenario. On the
> other hand, if you don't reset the counter, you may retransmit too fast.
>
> Just my initial reaction...
>
> Jonathan

--------
Randall Stewart
rrs@netflix.com
803-317-4952

From owner-freebsd-transport@freebsd.org Sat Oct 31 00:33:25 2015
Date: Fri, 30 Oct 2015 17:33:24 -0700
From: hiren panchasara
To: transport@FreeBSD.org
Subject: Re: Setting congestion window on loss detection
Message-ID: <20151031003324.GI5261@strugglingcoder.info>
References: <20151007195445.GC42742@strugglingcoder.info> <20151012171927.GB92230@strugglingcoder.info>
In-Reply-To: <20151012171927.GB92230@strugglingcoder.info>

On 10/12/15 at 10:19P, hiren panchasara wrote:
> On 10/07/15 at 12:54P, hiren panchasara wrote:
> > Found this issue about a month ago and started a discussion on -net:
> > https://lists.freebsd.org/pipermail/freebsd-net/2015-September/043249.html
> >
> > I feel this forum is a better place to discuss this further now.
> >
> > Problem: We set cwnd to 1mss when we detect loss via arrivals of 3 dupacks.
> > That is wrong as we are severely underutilizing network capacity by doing
> > so.
> >
> > Next question is, what should we set cwnd to?
> >
> > RFC6675 (TCP SACK) suggests the following on detecting loss:
> >   ssthresh = cwnd = (FlightSize / 2)
> >
> > RFC5681 (TCP Congestion control) suggests:
> >   ssthresh = max (FlightSize / 2, 2*SMSS)
> >   cwnd = (ssthresh + 3*SMSS)
> >
> > (Here, FlightSize is bytes in flight.)
> >
> > OR should we let whatever congestion control (CC) algo is in control decide
> > that value?
>
> I also tried to look at what Linux does. It has PRR (Proportional Rate
> Reduction) RFC 6937 (something I plan to work on after these initial
> needed fixes/improvements) in place. Looking back at pre-PRR code, linux
> seems to be doing the following:
>
>   cwnd = min(cwnd, FlightSize)
>
> Here, cwnd in the equation is adjusted as per rate-halving
> (draft-mathis-tcp-ratehalving-00) which says "the window is reduced by
> sending one data segment for each two segments which are acknowledged".
>
> (I am not very familiar with linux code so please correct me if that's
> not the case.)
>
> Basically, I think any of these approaches is better than what we have in
> the tree right now.

As suggested by Randall on the transport group call, I'd go ahead and
see if following RFC6675 gives us any better results with different loss
scenarios.

Cheers,
Hiren

From owner-freebsd-transport@freebsd.org Sat Oct 31 23:44:24 2015
From: Randall Stewart
Subject: Modular TCP .. or the start..
Message-Id: <1180181A-03CE-48FB-BC52-1606AF1EFBD3@netflix.com>
Date: Sat, 31 Oct 2015 16:44:20 -0700
To: FreeBSD Transports

All:

As promised I managed to get working this afternoon a modular transport much
on the lines of what we have talked about. The review of it is here:

https://reviews.freebsd.org/D4055

There are still things left un-done .. but it's a first step that covers
quite a bit and I also supply a KLM called fastpath that shows how it can be
used and adds two different fast-path approaches we have tried at Netflix (we
may be moving completely to the fastack version).

There is also a utility included in usr.sbin that allows you to list and
manipulate what's in the system (i.e. see the default, set the default, show
all entries). An individual socket can also set its own TCP function set at
open (but not after it has moved to some other state).

There are a few things missing that Jonathan wanted; in particular I did
*not* do

tcp_input() or
pru_usrreq()

These would involve a lot more work in the in_pcb and I did not want to
tackle that yet... it can be something we add later.. there is enough here
that I think a lot can be done to play with new tcp stacks and alternate
designs :-)

Please look and make comments, the usual suspects on the transport review
are named as reviewers.

Best wishes

R

--------
Randall Stewart
rrs@netflix.com
803-317-4952