Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 1 Dec 1995 14:02:03 +0100 (MET)
From:      Luigi Rizzo <luigi@labinfo.iet.unipi.it>
To:        hackers@freebsd.org
Subject:   A proposal for selective acks
Message-ID:  <199512011302.OAA09790@labinfo.iet.unipi.it>

next in thread | raw e-mail | index | archive | help
I posted this recently on comp.protocols.tcp.ip

I believe this is something that could be implemented without much
difficulty in FreeBSD (possibly in -current), and almost without
side effects to the rest of the kernel. I would appreciate receiving
opinions from you guys.

It appears that selective acks (SACKs) were a popular subject 5-10
years ago, to the point that NETBLT [RFC998] included them, and
there was even a proposal for TCP [RFC1072].

I have a proposal for an efficient encoding of the SACK option
described in RFC1072.

In the following I assume that the reader has read the section of
RFC1072 dealing with selective ACKS (which is a nice reading, as
most papers from Van Jacobson).

	Comments are solicited and welcome.
	Luigi
-----------

The proposal for SACKs by Van Jacobson [RFC1072] is very clear and
detailed about a possible implementation, to the point that one
wonders why they haven't been actually included in RFC1323 and
implemented.

A possible explaination is that the SACK option, as specified in
RFC1072, might suffer from the limited amount of space available
for TCP options. This only enables to selectively ack a limited
number of out-of-sequence segments, thus reducing the effectiveness
of the mechanism. The problem comes from the need of passing both
the relative origin and the size of each segment, totalling 4 bytes
per segment.

However, consider the typical case in which SACKs would be useful,
i.e. the transfer of large blocks of data. In these conditions,
the sender should mostly send maximum-sized segments.  Now look at
the reassembly queue at the receiving station:  starting from the
first unacked byte, we have a sequence of holes (H) and segments
(S), starting with a hole and ending with a segment, as below.

	HHH SSSS HH SS HH SS HHHH SSSS HH SSS

What I expect to see (have not verified, though) is a sequence of
Holes-Segments whose size is generally a multiple of the MSS in
use for the connection. The only exceptions could possibly be first
hole (because of some odd-length communication before the large
block of data begins), and the last segment (which may contain less
than an MSS because no more data are available).

If the above is true, a SACK option could comprise the following fields:

  h1	the size of the first hole. The hole starts at the
	point indicated by the ACK field in the header.

  s1	the size of the last segment

  ss	the GCD of all the remaining segments/holes. This
	should usually represent the MSS in use.

  b[]	a bitmap indicating which segments (of size ss) are 
	present or not after the first hole
  
  lb	the number of elements in the bitmap b[]. Actually,
	only 3 bits are needed for lb because the option length
	could be used to derive the size of b[] in bytes.

Following the description of the SACK option in RFC1072, we could
represent h1, s1, ss as 16-bit integers, and use 2 bytes for option
kind+len. This would leave up to (44-8)=36 bytes for b[]+lb, i.e.
room for over 250 segments. Even if some room is taken by other
options (e.g. timestamps), there is still sufficient room to deal
with large windows.

Note that only ones in b[] are meaningful, so the receiver can deal
with the following situations by sending several SACK with different
values for h1:
  * the send window is very large, and the bitmap is not large enough;
  * for some reason (the sender changes to a new MSS which is prime
    wrt the previous one; or the sender is sending very small segments,
    etc.) ss becomes very small.
As a matter of fact, h1 could be thought of as a "skip" field, thus
it might be necessary to encode it with a larger number of bits.

Maybe the data in b[] can be compressed further, especially if the
losses are not very large (in this case runs of '1' could be encoded
efficiently, as in FAX documenta). If the losses become large,
though, I suspect that there is little chance for compression.

---------

====================================================================
Luigi Rizzo                     Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it       Universita' di Pisa
tel: +39-50-568533              via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522              http://www.iet.unipi.it/~luigi/
====================================================================






Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199512011302.OAA09790>