Date:      Wed, 23 Apr 2008 01:47:13 +0200
From:      Andre Oppermann <andre@freebsd.org>
To:        Mark Hills <mark@pogo.org.uk>
Cc:        Peter Jeremy <peterjeremy@optushome.com.au>, freebsd-net@freebsd.org
Subject:   Re: read() returns ETIMEDOUT on steady TCP connection
Message-ID:  <480E7901.5000804@freebsd.org>
In-Reply-To: <480C9AC6.8090802@freebsd.org>
References:  <alpine.BSO.1.10.0804191437400.21362@zrgural.vwaro.pbz>	<20080420025010.GJ73016@server.vk2pj.dyndns.org>	<alpine.BSO.1.10.0804201238480.31900@zrgural.vwaro.pbz>	<480BBD7E.8010700@freebsd.org>	<alpine.BSO.1.10.0804210740100.1745@zrgural.vwaro.pbz> <480C9AC6.8090802@freebsd.org>

Andre Oppermann wrote:
> Mark Hills wrote:
>> On Mon, 21 Apr 2008, Andre Oppermann wrote:
>>
>>> Mark Hills wrote:
>>>> On Sun, 20 Apr 2008, Peter Jeremy wrote:
>>
>>>>> I can't explain the problem but it definitely looks like a resource
>>>>> starvation issue within the kernel.
>>>>
>>>> I've traced the source of the ETIMEDOUT within the kernel to 
>>>> tcp_timer_rexmt() in tcp_timer.c:
>>>>
>>>>   if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
>>>>           tp->t_rxtshift = TCP_MAXRXTSHIFT;
>>>>           tcpstat.tcps_timeoutdrop++;
>>>>           tp = tcp_drop(tp, tp->t_softerror ?
>>>>                         tp->t_softerror : ETIMEDOUT);
>>>>           goto out;
>>>>   }
>>>
>>> Yes, this is related to either a lack of mbufs to create a segment
>>> or a problem in sending it.  That may be a full interface queue, a
>>> bandwidth manager (dummynet) or some firewall internally rejecting
>>> the segment (ipfw, pf).  Do you run any firewall in stateful mode?
>>
>> There's no firewall running.
>>
>>>> I'm new to FreeBSD, but it seems to imply that it's reaching a 
>>>> limit on the number of retransmits when sending ACKs on the TCP 
>>>> connection receiving the inbound data? But I checked this using 
>>>> tcpdump on the server and could see no retransmissions.
>>>
>>> When you have internal problems, the segment never makes it to the
>>> wire and thus you won't see it in tcpdump.
>>>
>>> Please report the output of 'netstat -s -p tcp' and 'netstat -m'.
>>
>> Posted below. You can see it in there: "131 connections dropped by 
>> rexmit timeout"
>>
>>>> As a test, I ran a simulation with the necessary changes to increase 
>>>> TCP_MAXRXTSHIFT (including adding appropriate entries to 
>>>> tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to 
>>>> reduce the frequency of the problem occurring, but not to a usable 
>>>> level.
>>>
>>> Possible causes are timers that fire too early, resource starvation
>>> (you are doing a lot of traffic), or of course some bug in the code.
>>
>> As I said in my original email, the data transfer doesn't stop or 
>> splutter, it's simply cut mid-flow. Sounds like something happening 
>> prematurely.
>>
>> Thanks for the help,
> 
> The output doesn't show any obvious problems.  I have to write some
> debug code to run on your system.  I'll do that later today if time
> permits.  Otherwise tomorrow.

  http://people.freebsd.org/~andre/tcp_output-error-log.diff

Please apply this patch, enable the sysctl net.inet.tcp.log_debug=1,
and report any output.  You will likely get some (normal) noise from
syncache.  What we are looking for are reports from tcp_output.

-- 
Andre
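
For readers following the thread, here is a rough userland sketch of the drop path quoted from tcp_timer_rexmt() above. It is not the kernel code: the sketch_* names and the struct are hypothetical, and the limit of 12 is assumed to match TCP_MAXRXTSHIFT in FreeBSD's tcp_timer.h. It only illustrates why a connection is dropped with ETIMEDOUT once the retransmit timer keeps firing without the shift counter being reset, which can happen even when nothing shows up in tcpdump because the segment never left the host.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror TCP_MAXRXTSHIFT; treat the value as illustrative. */
#define SKETCH_MAXRXTSHIFT 12

/* Hypothetical stand-in for the few tcpcb fields the quoted code touches. */
struct sketch_conn {
    int  rxtshift;    /* consecutive retransmit-timer expiries */
    int  softerror;   /* last soft error seen, 0 if none */
    bool dropped;     /* set once the connection has been dropped */
    int  drop_errno;  /* error handed to the application on drop */
};

/*
 * Model one retransmit-timer expiry.  Returns true if the connection
 * is (or already was) dropped.
 */
static bool sketch_rexmt_timeout(struct sketch_conn *c)
{
    assert(c != NULL);
    if (c->dropped)
        return true;
    if (++c->rxtshift > SKETCH_MAXRXTSHIFT) {
        c->rxtshift = SKETCH_MAXRXTSHIFT;
        c->dropped = true;
        /* Same choice as the quoted code: soft error if set, else ETIMEDOUT. */
        c->drop_errno = c->softerror ? c->softerror : ETIMEDOUT;
        return true;
    }
    return false;
}

/* Simplification: an ACK for new data resets the expiry counter. */
static void sketch_ack_received(struct sketch_conn *c)
{
    assert(c != NULL);
    c->rxtshift = 0;
}

/* Count how many back-to-back expiries it takes to drop the connection. */
static int sketch_expiries_until_drop(struct sketch_conn *c)
{
    int n = 0;

    assert(c != NULL);
    while (!c->dropped) {
        sketch_rexmt_timeout(c);
        n++;
    }
    return n;
}
```

In this model a connection survives any number of expiries as long as sketch_ack_received() runs in between; it is only an uninterrupted run of expiries that ends in ETIMEDOUT, which is the behaviour the debug patch is meant to explain.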


