Date:      Tue, 27 Aug 1996 16:04:48 +0200 (MET DST)
From:      Luigi Rizzo <luigi@labinfo.iet.unipi.it>
To:        hackers@freebsd.org
Cc:        luigi@labinfo.iet.unipi.it (Luigi Rizzo), olah@freefall.freebsd.org, phk@freebsd.org, jkh@freebsd.org
Subject:   SACK and other TCP modifications available
Message-ID:  <199608271404.QAA06501@labinfo.iet.unipi.it>

It would be nice if someone could try some of the changes below
and send some feedback, and possibly arrange to include this in
-current sources.

	Luigi
-----------

The file "sack.diffs", available from

	http://www.iet.unipi.it/~luigi/research.html

includes a number of modifications to TCP designed to improve
performance in presence of losses, namely:
- MODIFIED FAST RETRANSMIT
- NEWRENO
- SACK (Selective Acknowledgements)
- TSACK (Selective Acknowledgements in RFC1323 timestamps)

The software is at alpha stage. Intermediate versions have been
running for a couple of weeks, and the current version has been
running on a couple of our systems since Aug. 26, 1996.

MODIFIED FAST RETRANSMIT is really helpful on lossy links, and does
not need modifications at the receive side. Same for NEWRENO.

SACK (and/or TSACK), especially when combined with MODIFIED FAST
RETRANSMIT, can improve throughput dramatically.

The diffs are against FreeBSD 2.1R, although they should be easily
ported to other BSD-derived systems. Most options must be enabled in
the kernel config file via

	option	SPECIFICOPTIONNAME

and must then be switched on at run time through a sysctl variable
before they take effect.

Since this code is evolving rapidly, please check the above URL to see
if there is a newer version. In particular, this code still has some
diagnostic output which goes to /var/log/messages.

Bugs, fixes and suggestions can be reported to me at

	luigi@iet.unipi.it

---------------------------
A brief description of all changes follows:

CLEANUP OF BSD CODE

+ BSD code has some strange ways of updating the count of duplicate
  acks. The count gets reset by some unexpected events (window
  updates, old segments) and does not get checked/reset properly in
  the header prediction code. A number of small fixes try to count
  dupacks more consistently.

+ added a flag, TF_FAST_RXMT, to indicate that we are in
  fast retransmit/fast recovery. This is needed to support a different fast
  retransmit policy, and makes the code somewhat easier to read.

+ the count of retransmitted and dup bytes is now accumulated per
  connection as well as globally. This is useful for statistical
  purposes, and can be used later to determine if a connection is
  experiencing losses or duplicate data.

+ additional variables are added to tcpstat, to count various
  events.

MODIFIED FAST RETRANSMIT

+ BSD enters fast retransmit when there are 3 consecutive duplicate
  acks; the number 3 was chosen to reduce the chance that a reordering
  of packets in the net is seen as a segment loss.
  However, in presence of large losses, or when the amount of
  outstanding data is small, or the window is narrow, there are so few
  packets in transit that 3 dupacks cannot happen, and the chance of a
  reordering is low. In these situations, 1 or 2 dupacks almost certainly
  mean that a segment has been lost. Instead of waiting for a timeout,
  fast retransmit can be started earlier. This code identifies these
  cases, and lowers the threshold for fast retransmit to 2 or 1 dup.

  Note 1: in many cases (e.g. telnet, http), there are still a lot
  of timeouts which occur after 0 dupacks, because there is often
  only one segment in flight. We cannot do much about this.

  Note 2: since the tcp control block accumulates statistics on the
  amount of dup/retransmitted data, perhaps this behaviour can be made
  more adaptive if the connection shows a significant reordering of
  segments.
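
  As a rough illustration of the idea (not the code in sack.diffs),
  the threshold could be derived from the number of segments in
  flight: if one of n outstanding segments is lost, at most n-1
  dupacks can come back. The function name, its parameters and the
  exact rule below are assumptions made for this sketch; it looks
  only at the amount of outstanding data and ignores the other
  conditions mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_DUPACK_THRESH 3	/* standard BSD fast retransmit threshold */

/*
 * Hypothetical helper: choose how many duplicate acks should trigger
 * fast retransmit, given the bytes in flight (snd_max - snd_una) and
 * the maximum segment size.
 */
static int
dupack_threshold(uint32_t in_flight, uint32_t mss)
{
	uint32_t segs;

	assert(mss > 0);
	/* number of segments in flight, rounded up */
	segs = in_flight / mss + (in_flight % mss != 0);

	if (segs > DEFAULT_DUPACK_THRESH)
		return DEFAULT_DUPACK_THRESH;	/* enough packets for 3 dupacks */
	if (segs <= 2)
		return 1;			/* at most 1 dupack can arrive */
	return (int)(segs - 1);			/* 3 segments: at most 2 dupacks */
}
```

  With a single segment in flight no dupack can arrive at all, which
  is the case described in Note 1.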

NEWRENO (following a suggestion by J. Hoe)

+ In Reno, after a fast retransmit, a non-dup ack causes exit from
  fast recovery.  However, in case of multiple losses in the same
  window, three more dupacks might be needed to detect the next loss,
  and a subsequent fast retransmit would shrink the window even
  further. We save the value of snd_max in snd_max_rxmt at the time
  of the fast retransmit; then, if a non-dup ack arrives but snd_una
  has not advanced to snd_max_rxmt, the segment at snd_una has been
  lost and can be retransmitted immediately.
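
  The check described above can be sketched as follows. This is a
  simplified model, not the patch itself: struct conn, the
  in_fast_rxmt field (standing in for the TF_FAST_RXMT flag), the
  enum and both function names are invented for illustration, and
  window adjustments are left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Sequence-number comparisons modulo 2^32, in the style of BSD's SEQ_LT. */
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)
#define SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)

struct conn {
	uint32_t snd_una;	/* oldest unacknowledged byte */
	uint32_t snd_max;	/* highest sequence number sent */
	uint32_t snd_max_rxmt;	/* snd_max saved at fast retransmit */
	int	 in_fast_rxmt;	/* stands in for TF_FAST_RXMT */
};

enum ack_action { ACK_NORMAL, ACK_RETRANSMIT_UNA, ACK_EXIT_RECOVERY };

/* Called when fast retransmit starts: remember how far we had sent. */
static void
enter_fast_rxmt(struct conn *c)
{
	c->snd_max_rxmt = c->snd_max;
	c->in_fast_rxmt = 1;
}

/* Called for an ack that advances snd_una (dupacks are handled elsewhere). */
static enum ack_action
on_new_ack(struct conn *c, uint32_t ack)
{
	assert(SEQ_GT(ack, c->snd_una));
	c->snd_una = ack;

	if (!c->in_fast_rxmt)
		return ACK_NORMAL;
	if (SEQ_LT(c->snd_una, c->snd_max_rxmt))
		return ACK_RETRANSMIT_UNA;	/* another loss in the same window */
	c->in_fast_rxmt = 0;
	return ACK_EXIT_RECOVERY;		/* everything up to snd_max_rxmt acked */
}
```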

SACK

+ This is an implementation of the SACK options as described in the
  recent internet draft, with which it is fully compliant. The maximum
  lifetime of SACK can be set to 0 or more timeouts. The retransmission
  strategy, during fast recovery, is as follows: if new data can
  be sent within snd_wnd and snd_cwnd, then do it. Otherwise, old
  blocks (up to, but not beyond, the last SACKed block) are sent
  again. There is currently no provision to resend the block at
  snd_una if it has been lost twice (a solution is in the works).

TSACK

+ This is a simplified version of SACK, which carries SACK information
  embedded in slightly modified RFC1323 timestamps. There are some
  tradeoffs in using TSACKs (almost no need for receiver support, less
  precise SACKs) instead of SACKs, but TSACKs have some advantage over
  SACKs in some cases.

ARTIFICIAL LOSSES

+ in order to test the behaviour of the above code, there is a new
  function, tcp_dropit(), which allows some incoming data and ack
  packets to be dropped. Currently the drop rate is 10% for data
  segments and 5% for pure acks. Segments are dropped using a
  repetitive pattern of 499 segments, in order to make results a bit
  more reproducible (they aren't fully reproducible anyway, because
  the actual generation of ACKs depends on the behaviour of the
  receiver process and there is some interaction with timeouts).
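
  One way to get a deterministic, repeating drop pattern of this kind
  is sketched below. It is not the body of tcp_dropit(), whose
  internals are not described here: the struct, the function name,
  the use of one counter per packet class and the "every 10th / every
  20th position" rule are all assumptions chosen to approximate the
  10% and 5% rates over a 499-packet cycle.

```c
#include <assert.h>
#include <stddef.h>

#define DROP_PERIOD	499	/* length of the repeating pattern */
#define DROP_DATA_EVERY	10	/* about 10% of data segments */
#define DROP_ACK_EVERY	20	/* about 5% of pure acks */

/* Hypothetical per-class position inside the repeating pattern. */
struct drop_state {
	unsigned data_pos;
	unsigned ack_pos;
};

/* Returns nonzero if the incoming packet should be discarded. */
static int
should_drop(struct drop_state *st, int is_pure_ack)
{
	unsigned *pos, every, cur;

	assert(st != NULL);
	pos = is_pure_ack ? &st->ack_pos : &st->data_pos;
	every = is_pure_ack ? DROP_ACK_EVERY : DROP_DATA_EVERY;

	cur = *pos;
	*pos = (cur + 1) % DROP_PERIOD;	/* wrap so the pattern repeats */
	return cur % every == 0;
}
```

  Because the pattern depends only on a counter, two runs that see
  the same packet sequence drop the same packets.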

All the above mechanisms can be enabled by setting the variable

	net.inet.tcp.sack

as follows:

SACK lifetime	0..15	(0 and 1 are equivalent)
SACK		0x10	enables sack negotiation and processing
TSACK		0x20	enables TSACK generation
MODIFIED_FR	0x40	enables modified fast retransmit
NEWRENO		0x80	enables newreno
LOSSY		0x100	enables dropping incoming data/acks
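
Reading the table as a bit mask (lifetime in the low four bits, one
bit per feature above it), a value for the variable could be built as
in the sketch below. The macro and function names are made up for
this example and are not taken from the patch; the bit-mask reading
itself is an inference from the table.

```c
#include <assert.h>

/* Hypothetical names; the values are those listed in the table. */
#define SACKF_LIFETIME_MASK	0x00f	/* SACK lifetime, 0..15 */
#define SACKF_SACK		0x010	/* sack negotiation and processing */
#define SACKF_TSACK		0x020	/* TSACK generation */
#define SACKF_MODIFIED_FR	0x040	/* modified fast retransmit */
#define SACKF_NEWRENO		0x080	/* newreno */
#define SACKF_LOSSY		0x100	/* drop incoming data/acks */

/* Combine a lifetime and a set of feature flags into one integer. */
static unsigned
sack_sysctl_value(unsigned lifetime, unsigned flags)
{
	assert(lifetime <= SACKF_LIFETIME_MASK);
	assert((flags & SACKF_LIFETIME_MASK) == 0);
	return lifetime | flags;
}
```

For example, SACK, modified fast retransmit and newreno with a
lifetime of 1 give 0xd1 (209 decimal).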

The following kernel options are needed:

option	TSACK		enables TSACK generation
option	SACK		enables SACK code, TSACK processing, LOSSY

Newreno and modified fast retransmit are compiled in by default.

You might also need the following changes to sysctl and netstat. The former
needs to be recompiled with the new tcp_var.h. The sysctl patch just
allows you to enter values as hex numbers instead of decimal ones.
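
The effect of switching from atoi() to the "%i" conversion can be
seen in isolation with a small helper like the one below. The helper
name is invented for this example and, unlike the one-line change in
the patch, it checks the return value of sscanf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse an integer the way "%i" does, i.e.
 * "0x" prefix means hex, leading "0" means octal, otherwise decimal.
 * Returns 1 on success, 0 if no number could be parsed.
 */
static int
parse_int_auto(const char *s, int *out)
{
	assert(s != NULL && out != NULL);
	return sscanf(s, "%i", out) == 1;
}
```

One side effect worth knowing: with "%i" a leading zero selects
octal, so "010" is read as 8, whereas atoi() would return 10.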

The patch to netstat (which also needs to be recompiled) is there to
allow you to see the additional statistics variables in the tcpstat
structure. Since these variables are allocated at the end of the
structure, an older netstat will still work; it just won't print all
the available information.


diff -cbwr /usr.sbin/sysctl/sysctl.c ./sysctl.c
*** /cdrom/usr/src/usr.sbin/sysctl/sysctl.c     Sun Jun 11 06:32:58 1995
--- ./sysctl.c  Mon Aug 19 16:28:31 1996
***************
*** 342,348 ****
        if (newsize > 0) {
                switch (type) {
                case CTLTYPE_INT:
!                       intval = atoi(newval);
                        newval = &intval;
                        newsize = sizeof intval;
                        break;
--- 342,349 ----
        if (newsize > 0) {
                switch (type) {
                case CTLTYPE_INT:
!                       sscanf(newval, "%i", &intval); /* XXX */
!                       /* intval = atoi(newval); */
                        newval = &intval;
                        newsize = sizeof intval;
                        break;

diff -cbwr netstat/inet.c /usr/src/usr.bin/netstat/inet.c
*** netstat/inet.c      Sat Jul 29 11:42:54 1995
--- /usr/src/usr.bin/netstat/inet.c     Fri Aug 23 17:02:49 1996
***************
*** 227,233 ****
--- 227,243 ----
        p(tcps_conndrops, "\t%d embryonic connection%s dropped\n");
        p2(tcps_rttupdated, tcps_segstimed,
                "\t%d segment%s updated rtt (of %d attempt%s)\n");
+       p(tcps_zerodupw, "\t%d invalid dupack reset on window update\n");
        p(tcps_rexmttimeo, "\t%d retransmit timeout%s\n");
+       p(tcps_rexmt[0], "\t\t%d retransmit timeout with 0 dup acks\n");
+       p(tcps_rexmt[1], "\t\t%d retransmit timeout with 1 dup acks\n");
+       p(tcps_rexmt[2], "\t\t%d retransmit timeout with 2 dup acks\n");
+       p(tcps_fastretransmit, "\t%d fast retransmit%s\n");
+       p(tcps_fastrexmt[0], "\t\t%d with 1 dup ack\n");
+       p(tcps_fastrexmt[1], "\t\t%d with 2 dup ack\n");
+       p(tcps_fastrexmt[2], "\t\t%d with 3 dup ack\n");
+       p(tcps_newreno, "\t%d newreno retrans\n");
+       p(tcps_fastrecovery, "\t%d fast recovery\n");
        p(tcps_timeoutdrop, "\t\t%d connection%s dropped by rexmit timeout\n");
        p(tcps_persisttimeo, "\t%d persist timeout%s\n");
        p(tcps_persistdrop, "\t\t%d connection%s dropped by persist timeout\n");



