Date:      Mon, 23 Nov 2015 11:23:04 -0800
From:      Matthew Macy <mmacy@nextbsd.org>
To:        "Mark Felder" <feld@FreeBSD.org>
Cc:        "<transport@freebsd.org>" <transport@freebsd.org>
Subject:   Re: Should delayed ACKs be enabled by default?
Message-ID:  <15135ccada3.10df21d5156685.7375508093788694995@nextbsd.org>
In-Reply-To: <1448296698.1772293.447796761.5B2239BD@webmail.messagingengine.com>
References:  <1448296698.1772293.447796761.5B2239BD@webmail.messagingengine.com>

The fixed delay is a known problem in the congestion control (CC) community, if not within FreeBSD, and it is particularly excessive on FreeBSD (100ms vs. 40ms on Linux). It hasn't been discussed much because it is a minor annoyance vis-à-vis other shortfalls in TCP. Part of my incast mitigation changes is capping delack at 40ms and scaling it down dynamically to 1/2 RTO. With minRTO set to 3 ticks, that could bring it down to 1ms in the data center.

Glenn Judd at Morgan Stanley wrote a good paper about their experiences, "Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter" [https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-judd.pdf]. Morgan Stanley found that eliminating delayed ACKs entirely increased CPU overhead, and that they were best served by setting the delack timeout to 1ms. I would wager that once the RTO (and thus the delack timeout) gets well below 1ms, the extra interrupt overhead would outweigh any work-coalescing benefits, but that is something we'll need to measure at that point.



-M

 ---- On Mon, 23 Nov 2015 08:38:18 -0800 Mark Felder <feld@FreeBSD.org> wrote ---- 
 > John Nagle recently commented on a HN post[1] regarding the issues
 > delayed acks cause. Has this been studied in depth on FreeBSD?
 > 
 > His post:
 > 
 > > That still irks me. The real problem is not tinygram prevention. It's
 > > ACK delays, and that stupid fixed timer. They both went into TCP around
 > > the same time, but independently. I did tinygram prevention (the Nagle
 > > algorithm) and Berkeley did delayed ACKs, both in the early 1980s. The
 > > combination of the two is awful. Unfortunately by the time I found out about
 > > delayed ACKs, I had changed jobs, was out of networking, and doing a
 > > product for Autodesk on non-networked PCs.
 > >
 > > Delayed ACKs are a win only in certain circumstances - mostly character
 > > echo for Telnet. (When Berkeley installed delayed ACKs, they were doing
 > > a lot of Telnet from terminal concentrators in student terminal rooms to
 > > host VAX machines doing the work. For that particular situation, it made
 > > sense.) The delayed ACK timer is scaled to expected human response time.
 > > A delayed ACK is a bet that the other end will reply to what you just
 > > sent almost immediately. Except for some RPC protocols, this is
 > > unlikely. So the ACK delay mechanism loses the bet, over and over,
 > > delaying the ACK, waiting for a packet on which the ACK can be
 > > piggybacked, not getting it, and then sending the ACK, delayed. There's
 > > nothing in TCP to automatically turn this off. However, Linux (and I
 > > think Windows) now have a TCP_QUICKACK socket option. Turn that on
 > > unless you have a very unusual application.
 > >
 > > Turning on TCP_NODELAY has similar effects, but can make throughput
 > > worse for small writes. If you write a loop which sends just a few bytes
 > > (worst case, one byte) to a socket with "write()", and the Nagle
 > > algorithm is disabled with TCP_NODELAY, each write becomes one IP
 > > packet. This increases traffic by a factor of 40, with IP and TCP
 > > headers for each payload. Tinygram prevention won't let you send a
 > > second packet if you have one in flight, unless you have enough data to
 > > fill the maximum sized packet. It accumulates bytes for one round trip
 > > time, then sends everything in the queue. That's almost always what you
 > > want. If you have TCP_NODELAY set, you need to be much more aware of
 > > buffering and flushing issues.
 > >
 > > None of this matters for bulk one-way transfers, which is most HTTP
 > > today. (I've never looked at the impact of this on the SSL handshake,
 > > where it might matter.)
 > >
 > > Short version: set TCP_QUICKACK. If you find a case where that makes
 > > things worse, let me know.
 > >
 > > John Nagle
 > 
 > 
 > [1] https://news.ycombinator.com/item?id=10608356
 > 
 > 
 > 
 > -- 
 >   Mark Felder
 >   ports-secteam member
 >   feld@FreeBSD.org
 > _______________________________________________
 > freebsd-transport@freebsd.org mailing list
 > https://lists.freebsd.org/mailman/listinfo/freebsd-transport
 > To unsubscribe, send any mail to "freebsd-transport-unsubscribe@freebsd.org"
 > 



