Date:      Tue, 13 May 2008 19:38:16 -0400
From:      Mark Saad <msaad@datapipe.com>
To:        "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>, <freebsd-hackers@freebsd.org>
Subject:   Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)
Message-ID:  <482A2668.8070209@datapipe.com>

Hello all,

This issue goes back some time, but I do not see a solution. Sorry about the
cross-post; I am not sure where this belongs. Here is an overview of my issue,
which is similar, and I hope someone can point me in the direction of a
solution.

I have been experiencing an odd socket-related issue on a few servers I
manage. They are fairly large FTP servers for a popular North American news
agency and handle thousands of FTP transactions per hour. They currently run
FreeBSD 6.3-RELEASE-p1. I have verified that this happens on FreeBSD
6.1-RELEASE, 6.1-STABLE, 6.2-RELEASE, 6.2-STABLE and 7.0-RELEASE, all 32-bit
installs, with both SMP and UP kernels. Oddly, the issue did not happen
on FreeBSD 4.x; I have a similar setup with 1400+ days of uptime
running FreeBSD 4.x-RELEASE.

The issue is that after 7 to 14 days the servers lock up and will not create
any new TCP sockets. The systems use ProFTPD with MySQL for authentication of
the FTP accounts. They also run Apache 2.2.x, Postfix, Cyrus, ClamAV,
Diablo JDK 1.5 for the Resin app server, and daemontools.

The only sysctls that seem to help are kern.ipc.maxsockets and
kern.maxusers, currently set to 65535 and 1024 respectively.
Changing kern.ipc.maxsockbuf had no effect; I tried bumping it
up to 2 MB.

In any case, I started logging everything we could think of to see what
was happening.

I logged the values of kern.ipc.numopensockets and noticed
that something is leaking sockets. Here is a sample of the log:

2008-04-29--15:04.10 ____ kern.ipc.numopensockets: 1501
2008-04-29--16:04.01 ____ kern.ipc.numopensockets: 1535
2008-04-29--17:04.00 ____ kern.ipc.numopensockets: 1617
2008-04-29--18:04.00 ____ kern.ipc.numopensockets: 1710

This continues until kern.ipc.maxsockets is reached or the box is
rebooted.
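
To put a number on the growth in the log sample above, the leak rate and
the time left before kern.ipc.maxsockets is hit can be estimated along these
lines. This is only a minimal Python sketch: the function names are
hypothetical, the timestamp format is inferred from the four sample lines,
and a constant leak rate is assumed, which may not hold under real load.

```python
from datetime import datetime

# Timestamp layout inferred from the sample log lines (an assumption).
STAMP_FORMAT = "%Y-%m-%d--%H:%M.%S"


def parse_line(line):
    """Return (timestamp, socket_count) for one log line."""
    stamp, _, rest = line.partition(" ")
    count = int(rest.rsplit(":", 1)[1])
    return datetime.strptime(stamp, STAMP_FORMAT), count


def leak_rate_per_hour(lines):
    """Average growth in open sockets per hour between first and last sample."""
    samples = [parse_line(line) for line in lines if line.strip()]
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600.0
    return (n1 - n0) / hours


def hours_until_limit(current, limit, rate):
    """Hours left before `limit` is reached at a constant `rate`.

    Returns None when the count is not growing.
    """
    if rate <= 0:
        return None
    return (limit - current) / rate


# The four samples quoted in the message.
LOG = """\
2008-04-29--15:04.10 ____ kern.ipc.numopensockets: 1501
2008-04-29--16:04.01 ____ kern.ipc.numopensockets: 1535
2008-04-29--17:04.00 ____ kern.ipc.numopensockets: 1617
2008-04-29--18:04.00 ____ kern.ipc.numopensockets: 1710
""".splitlines()

rate = leak_rate_per_hour(LOG)
# 65535 is the kern.ipc.maxsockets value quoted in the message.
remaining = hours_until_limit(1710, 65535, rate)
print(f"{rate:.1f} sockets/hour, about {remaining / 24:.0f} days to the limit")
```

On these four samples the rate works out to roughly 70 sockets per hour.
That is a three-hour window only, so the projection is a rough guide rather
than a prediction.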

The other thing we looked at was the output of vmstat -z.
The first thing we noticed was the high number of malloc 128 bucket failures:

128 Bucket:    524,        0,     2489,       80,     8364, 23055239
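
For logging purposes, a zone line like the one above can be split into named
fields. The following is a minimal Python sketch; the function name is
hypothetical, and the column names are an assumption based on the usual
vmstat -z header (SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES), which should
be checked against the header printed on the system in question.

```python
# Column names are an ASSUMPTION about the vmstat -z header; verify them
# against the header line printed by vmstat -z on the actual system.
FIELDS = ("size", "limit", "used", "free", "requests", "failures")


def parse_zone_line(line):
    """Split one 'NAME: n, n, ...' zone line into (name, {field: value})."""
    name, _, values = line.partition(":")
    numbers = [int(value) for value in values.split(",")]
    return name.strip(), dict(zip(FIELDS, numbers))


zone, stats = parse_zone_line(
    "128 Bucket:    524,        0,     2489,       80,     8364, 23055239"
)
print(zone, stats)
```

Logging the parsed last column alongside kern.ipc.numopensockets would show
whether the two grow together.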

I also logged the mbuf clusters; we never reached the maximum number of mbuf clusters.

It is almost as if there are stale sockets. Here is a snapshot of the server now:

ewr# sockstat -4u |wc -l
     139
ewr# sysctl kern.ipc.numopensockets
kern.ipc.numopensockets: 13935

ewr# uptime
7:30PM  up 6 days, 26 mins, 3 users, load averages: 0.18, 0.25, 0.17
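
The gap in this snapshot can be made explicit with a little arithmetic. The
Python sketch below uses hypothetical function names and two assumptions:
that one line of the sockstat output is a header, and that the socket count
started near zero at boot. Since sockstat -4u lists only IPv4 and UNIX domain
sockets while kern.ipc.numopensockets counts every socket, the comparison is
approximate.

```python
def unaccounted_sockets(kernel_count, sockstat_lines, header_lines=1):
    """Sockets the kernel counts that sockstat does not attribute to a process.

    Assumes `header_lines` lines of the sockstat output are column headers.
    """
    visible = max(sockstat_lines - header_lines, 0)
    return kernel_count - visible


def average_growth_per_hour(kernel_count, days, minutes):
    """Average sockets gained per hour of uptime, assuming a start near zero."""
    hours = days * 24 + minutes / 60.0
    return kernel_count / hours


# Figures taken from the snapshot: 139 sockstat lines, 13935 open sockets,
# uptime of 6 days and 26 minutes.
missing = unaccounted_sockets(13935, 139)
per_hour = average_growth_per_hour(13935, days=6, minutes=26)
print(f"{missing} sockets unaccounted for, about {per_hour:.0f} per hour since boot")
```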


My questions:

1. If I cannot identify who or what is consuming all my TCP sockets,
what will happen if I double or triple the value of kern.ipc.maxsockets?

2. Could this be an issue with a low kern.maxusers? It is currently set
to 1024. Also, at times when I cannot create a new socket, I am not
pinned on mbuf clusters; I was able to verify this in the past.

3. I installed a debugging kernel, which I built on the server, and was able
to get a core dump of the server at the point in time we last had an issue.
I am not sure what I can do with it, though; kernel debugging is way
beyond what I am capable of doing. Do I even want to pursue this?

4. Does anyone have any system tunings to recommend for a high-volume
FTP site? What does ftp.freebsd.org use?


-- 
Mark Saad
msaad@datapipe.com





