Date:      Thu, 01 Feb 2007 17:23:08 -0600
From:      Dave Baukus <david.baukus@us.fujitsu.com>
To:        freebsd-net@freebsd.org
Cc:        "Baukus, David" <David.Baukus@us.fujitsu.com>
Subject:   ETIMEDOUT bug
Message-ID:  <45C2765C.7010708@us.fujitsu.com>

There is a bug in tcp_output(), in at least FreeBSD 6.1,
that causes a perfectly good TCP connection to be dropped by its
retransmit timer; the application receives ETIMEDOUT.

Consider a TCP that never transmits data (the receiving end of the ttcp
utility is an example). While that TCP is established,
snd_max == snd_una == snd_nxt == (iss + 1), and the retransmit
timer should never be started. If the retransmit timer is started
anyway, it is never stopped by tcp_input()/tcp_output(), because
snd_max == snd_una == snd_nxt always holds and no ACK ever advances
snd_una. Once started, the timer keeps backing off until
tp->t_rxtshift == 12, and the connection that never transmitted
is falsely killed.
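
The sequence described above can be sketched as a small userland model.
This is not kernel code: the struct, the model_* names and the constant
MODEL_MAXRXTSHIFT are hypothetical stand-ins for tcpcb, tcp_input(),
tcp_timer_rexmt() and TCP_MAXRXTSHIFT, and the ACK handling is reduced
to the one rule that matters here (the timer is stopped only by an ACK
that acknowledges new data).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAXRXTSHIFT 12    /* assumed to mirror TCP_MAXRXTSHIFT */

/* Hypothetical, heavily reduced stand-in for struct tcpcb. */
struct model_tcpcb {
    uint32_t snd_una;       /* oldest unacknowledged sequence number */
    uint32_t snd_max;       /* highest sequence number sent */
    int      t_rxtshift;    /* retransmit backoff counter */
    bool     rexmt_active;  /* is the retransmit timer running? */
    int      so_error;      /* error reported to the application */
};

/* Simplified ACK processing: only an ACK for new data touches the timer. */
static void model_ack(struct model_tcpcb *tp, uint32_t th_ack)
{
    if ((int32_t)(th_ack - tp->snd_una) > 0 &&
        (int32_t)(th_ack - tp->snd_max) <= 0) {
        tp->snd_una = th_ack;
        tp->t_rxtshift = 0;
        if (tp->snd_una == tp->snd_max)
            tp->rexmt_active = false;
    }
}

/* Simplified retransmit timer expiry: back off, then give up. */
static void model_rexmt_expire(struct model_tcpcb *tp)
{
    if (!tp->rexmt_active)
        return;
    if (++tp->t_rxtshift > MODEL_MAXRXTSHIFT) {
        tp->t_rxtshift = MODEL_MAXRXTSHIFT;
        tp->rexmt_active = false;
        tp->so_error = ETIMEDOUT;   /* connection dropped */
    }
    /* otherwise the timer is re-armed and stays active */
}

/*
 * A pure receiver whose retransmit timer was started by mistake.
 * Returns how many timer expiries pass before the connection is dropped.
 */
static int model_expiries_until_drop(uint32_t iss)
{
    struct model_tcpcb tp = { iss + 1, iss + 1, 0, true, 0 };
    int expiries = 0;

    while (tp.so_error == 0) {
        /* Incoming data segments carry th_ack == snd_una: nothing new. */
        model_ack(&tp, tp.snd_una);
        model_rexmt_expire(&tp);
        expiries++;
    }
    return expiries;
}
```

In this model no incoming segment can ever stop the timer, so
model_expiries_until_drop() always terminates with ETIMEDOUT set, no
matter how healthy the receive side of the connection is.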

The bug is that tcp_output() blindly relies on the return value of
ip_output(). If ip_output() returns ENOBUFS, the retransmit timer is
activated, regardless of whether the segment carried any data:

From the end of tcp_output():

out:
        SOCKBUF_UNLOCK_ASSERT(&so->so_snd);     /* Check gotos. */
        if (error == ENOBUFS) {
                if (!callout_active(tp->tt_rexmt) &&
                    !callout_active(tp->tt_persist))
                        callout_reset(tp->tt_rexmt, tp->t_rxtcur,
                            tcp_timer_rexmt, tp);
                tp->snd_cwnd = tp->t_maxseg;
                return (0);
        }

My simple-minded fix would be to not start the retransmit timer here;
if tcp_output() had wanted to time this transmission, it would already
have started the timer earlier in the function.
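
As a rough illustration of that proposal, here is a userland sketch
comparing the two ENOBUFS paths. It is not a patch: the struct and the
sketch_* functions are hypothetical, the timers are reduced to boolean
flags, and keeping the congestion window collapse in the proposed
variant is an assumption, since the proposal only mentions the timer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the fields used on the error path. */
struct sketch_tcpcb {
    bool     rexmt_active;    /* retransmit timer running? */
    bool     persist_active;  /* persist timer running? */
    uint32_t snd_cwnd;        /* congestion window */
    uint32_t t_maxseg;        /* maximum segment size */
};

/* Models the 6.1 behaviour: ENOBUFS arms the retransmit timer. */
static int sketch_enobufs_original(struct sketch_tcpcb *tp, int error)
{
    if (error == ENOBUFS) {
        if (!tp->rexmt_active && !tp->persist_active)
            tp->rexmt_active = true;    /* stands in for callout_reset() */
        tp->snd_cwnd = tp->t_maxseg;
        return 0;
    }
    return error;
}

/* Models the proposal: leave the timers exactly as they were. */
static int sketch_enobufs_proposed(struct sketch_tcpcb *tp, int error)
{
    if (error == ENOBUFS) {
        /* Assumption: the window collapse is kept as in the original. */
        tp->snd_cwnd = tp->t_maxseg;
        return 0;
    }
    return error;
}
```

For a pure receiver with no timers running, the original path leaves
rexmt_active set after an ENOBUFS on a bare ACK, while the proposed path
leaves both timers untouched and only shrinks the window.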

This ETIMEDOUT problem is easily recreated on any old machine
using a single slow Ethernet device and the ttcp test utility.
First, fire up a couple of ttcp receivers. Second, flood the same
interface with enough ttcp transmitters to cause the driver's transmit
ring and interface queue to back up. Eventually, one of the ttcp
receivers will get ENOBUFS from ip_output() and the retransmit
timer will be wrongly activated for a pure ACK segment.

I was able to do it with the following on FreeBSD 6.1:

box1:
ttcp -s -l 16384 -p 9444 -v -b 128000 -r
ttcp -s -l 16384 -p 9445 -v -b 128000 -r
ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9447 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9448 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9449 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9450 -v -b 128000 -t 192.168.222.13

box2:
ttcp -s -n 6553600 -l 8192 -p 9444 -v -b 128000 -t  192.168.222.222
ttcp -s -n 9999999 -l 128  -p 9445 -v -b 128000  -t  192.168.222.222
ttcp -s -l 16384 -p 9446 -v -b 128000 -r
ttcp -s -l 16384 -p 9447 -v -b 128000 -r
ttcp -s -l 16384 -p 9448 -v -b 128000 -r
ttcp -s -l 16384 -p 9449 -v -b 128000 -r
ttcp -s -l 16384 -p 9450 -v -b 128000 -r

-- 
Dave Baukus
    david.baukus@us.fujitsu.com

    Fujitsu Network Communications
          Richardson, Texas
                  USA


