Date:      Wed, 4 Mar 2015 11:44:31 -0800
From:      Rumen Telbizov <telbizov@gmail.com>
To:        "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   Re: Stale TIME_WAIT tcp connections
Message-ID:  <CAENR%2B_W-TwKp_ibOrzuriYBiL0Rvz6-uEtXVDad5P8CFmGaOCA@mail.gmail.com>
In-Reply-To: <op.xux9mtx6g7njmm@michael-think.fritz.box>
References:  <CAENR%2B_U2H9Vf1xNjOGEsc0BuLhpTNL0iz81p5qDUxS_kdvfX5w@mail.gmail.com> <op.xux9mtx6g7njmm@michael-think.fritz.box>

Hello again,

Thank you for the responses.
No, I don't have any IPSEC in the kernel. Further observations overnight
revealed the following:

a) Those "stale" TIME_WAIT sockets do expire eventually: one that I was
watching seemed to stay around for hours, but by morning it was gone.

b) Both kinds of sockets seem to get registered and stuck there: those
that never get established (only a SYN sent) as well as fully established
and properly closed ones. Here are a couple of examples:

Monitoring the traffic from a specific client host (server IP obfuscated to
1.2.3.4, client IP to 5.6.7.8):

IP 5.6.7.8.43440 > 1.2.3.4.5666: Flags [S], seq 4056322107, win 5840,
options [mss 1460,sackOK,TS val 729030596 ecr 0,nop,wscale 7], length 0
IP 5.6.7.8.43437 > 1.2.3.4.5666: Flags [S], seq 3979308195, win 5840,
options [mss 1460,sackOK,TS val 729031604 ecr 0,nop,wscale 7], length 0

Those are connections that never got established. I picked one of those
SYN-only tuples and watched it, and it does seem to allocate and consume
a connection:

# date ; sockstat | grep 5.6.7.8:43440
Wed Mar  4 19:02:24 UTC 2015
?        ?          ?     ?  tcp4   1.2.3.4:5666     5.6.7.8:43440
# date ; sockstat | grep 5.6.7.8:43440
Wed Mar  4 19:10:11 UTC 2015
?        ?          ?     ?  tcp4   1.2.3.4:5666     5.6.7.8:43440
# date ; netstat -na | grep 5.6.7.8.43440
Wed Mar  4 19:38:56 UTC 2015
tcp4       0      0 1.2.3.4.5666      5.6.7.8.43440     TIME_WAIT
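
A side note on the grep patterns above: the unescaped dots match any
character and the pattern is not anchored, so 5.6.7.8.43440 would also match
a longer port such as 434401. A minimal sketch of a stricter check, assuming
the FreeBSD `netstat -na` column layout shown above (foreign address in
column 5, state in column 6); `tuple_state` is a hypothetical helper name,
not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -na` style output on stdin and print
# "local foreign state" for TCP lines whose foreign address (column 5)
# is exactly equal to $1. No regex is involved, so dots are literal.
tuple_state() {
    awk -v peer="$1" '$1 ~ /^tcp/ && $5 == peer { print $4, $5, $6 }'
}

# Example polling loop, left commented out so that sourcing this file does
# not start it. It prints a UTC timestamp followed by the tuple's state
# once a minute:
# while :; do
#     date -u
#     netstat -na | tuple_state 5.6.7.8.43440
#     sleep 60
# done
```

Note that sockstat prints the port after a colon (5.6.7.8:43440), so this
helper as written only applies to netstat output.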


And here's a properly established and closed TCP socket between the same
client and server:

19:14:47.827359 IP 5.6.7.8.33877 > 1.2.3.4.5666: Flags [S], seq 3819001779,
win 5840, options [mss 1460,sackOK,TS val 729095309 ecr 0,nop,wscale 7],
length 0
19:14:47.827390 IP 1.2.3.4.5666 > 5.6.7.8.33877: Flags [S.], seq
2990857548, ack 3819001780, win 65535, options [mss 1436,nop,wscale
6,sackOK,TS val 2460189516 ecr 729095309], length 0
19:14:47.979287 IP 5.6.7.8.33877 > 1.2.3.4.5666: Flags [.], ack 1, win 46,
options [nop,nop,TS val 729095347 ecr 2460189516], length 0
19:14:47.979408 IP 5.6.7.8.33877 > 1.2.3.4.5666: Flags [P.], seq 1:1041,
ack 1, win 46, options [nop,nop,TS val 729095347 ecr 2460189516], length
1040
19:14:47.980136 IP 1.2.3.4.5666 > 5.6.7.8.33877: Flags [F.], seq 1, ack
1041, win 1045, options [nop,nop,TS val 2460189668 ecr 729095347], length 0
19:14:48.132156 IP 5.6.7.8.33877 > 1.2.3.4.5666: Flags [F.], seq 1041, ack
2, win 46, options [nop,nop,TS val 729095386 ecr 2460189668], length 0
19:14:48.132173 IP 1.2.3.4.5666 > 5.6.7.8.33877: Flags [.], ack 1042, win
1045, options [nop,nop,TS val 2460189821 ecr 729095386], length 0


It also gets stuck there for quite a while:

# sockstat | grep 5.6.7.8:33877
?        ?          ?     ?  tcp4   1.2.3.4:5666     5.6.7.8:33877
# date ; netstat -na | grep 5.6.7.8.33877
Wed Mar  4 19:16:09 UTC 2015
tcp4       0      0 1.2.3.4.5666      5.6.7.8.33877     TIME_WAIT
# date ; netstat -na | grep 5.6.7.8.33877
Wed Mar  4 19:31:31 UTC 2015
tcp4       0      0 1.2.3.4.5666      5.6.7.8.33877     TIME_WAIT
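
To put numbers on "quite a while", the timestamps above can be subtracted
directly. This is only a rough sketch: `elapsed` is a hypothetical helper,
it assumes both times fall on the same day, and it assumes the tcpdump
clock and `date` use the same timezone. TIME_WAIT nominally lasts 2*MSL, so
the figures below are lower bounds on how far past that these sockets
linger.

```shell
#!/bin/sh
# Hypothetical helper: seconds between two HH:MM:SS times on the same day.
elapsed() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        print (b[1] * 3600 + b[2] * 60 + b[3]) - (a[1] * 3600 + a[2] * 60 + a[3])
    }'
}

# Final ACK of the closed connection -> last netstat check still showing it.
elapsed 19:14:48 19:31:31   # prints 1003

# First and last sighting of the SYN-only tuple (port 43440).
elapsed 19:02:24 19:38:56   # prints 2192
```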

So naturally the server never manages to get "on top of things", because it
does not discard those sockets in time.

Any other ideas and suggestions?

Regards,
Rumen Telbizov

On Tue, Mar 3, 2015 at 5:41 PM, Michael Ross <gmx@ross.cx> wrote:

> On Wed, 04 Mar 2015 01:36:18 +0100, Rumen Telbizov <telbizov@gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> We have a server running 9.3-RELEASE which is exhibiting a high number of
>> TIME_WAIT tcp connections which are NOT being recycled. That is, netstat
>> reports them over and over again, no matter how long we wait for them to be
>> flushed out. Currently this server has been out of rotation for a couple of
>> hours and I still see the same tcp sockets there. Overall we have:
>>
>> # netstat -na | grep TIME_WAIT | wc -l
>>    *30066*
>>
>> Tracking one particular TCP socket in TIME_WAIT proves that it stays there
>> all the time.
>>
>> Another observation is that pfctl shows a very large number of state
>> entries, even after pfctl -F all, or disable/enable sequence.
>>
>> # pfctl -si
>> State Table                          Total             Rate
>>   current entries                    *59280*
>>
>> At the same time though:
>>
>> # pfctl -ss | wc -l
>>       18
>>
>> After the problem was discovered we tried tweaking the following settings
>> without any luck:
>>
>> net.inet.tcp.fast_finwait2_recycle=1
>> net.inet.tcp.finwait2_timeout=5000
>> net.inet.tcp.maxtcptw=50000
>> net.inet.tcp.msl=100
>>
>> So it seems like this system is "stuck" and doesn't recycle those TCP
>> sockets. Again, the machine is out of rotation and not actively accepting
>> any traffic. I will keep it like that in case further investigation is
>> required. Please do let me know if there's anything else you'd like to know
>> from the state of the machine or something I could try.
>>
>> Regards,
>>
>
> Are you using any IPSEC?
> I observed something similar a while back, haven't checked again since i
> reported this.
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194690
> Affected 9.2, too.
>
> Michael
>



-- 
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>


