From: Mark Saad
Date: Tue, 13 May 2008 19:37:29 -0400
To: "freebsd-stable@freebsd.org"
Subject: Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)
Reply-To: msaad@datapipe.com
List-Id: Production branch of FreeBSD source code

Hello All,

This issue goes back some time, but I do not see a solution. Sorry about the cross-post; I am not sure where this belongs. Here is an overview of my issue, which is similar, and I hope someone can point me in the direction of a solution.

I have been experiencing an odd socket-related issue on a few servers I manage. They are fairly large FTP servers for a popular North American news agency. They handle thousands of FTP transactions per hour. Currently they are running FreeBSD 6.3-Release-p1.
I have verified that this happens on FreeBSD 6.1-Release, 6.1-Stable, 6.2-Release, 6.2-Stable and 7.0-Release, all 32-bit installs, with both SMP and UP kernels. Oddly, this issue did not happen on FreeBSD 4.x. I have a similar setup running FreeBSD 4.x-Release that has an uptime of 1400+ days.

The issue is that after 7 to 14 days the servers lock up and will not create any new TCP sockets. The system uses proftp with MySQL for authentication of the FTP accounts. The system is also running Apache 2.2.x, Postfix, Cyrus, clam-av, Diablo JDK 1.5 for the Resin appserver, and daemontools.

The only sysctls that seem to help are kern.ipc.maxsockets and kern.maxusers. Currently they are set to 65535 and 1024. Changing kern.ipc.maxsockbuf did not have any effect; I tried bumping it up to 2 MB.

In any case, I started logging everything we could think of to see what was happening. I started logging the value of kern.ipc.numopensockets and noticed that something is leaking sockets. Here is a sample of the log:

2008-04-29--15:04.10 ____ kern.ipc.numopensockets: 1501
2008-04-29--16:04.01 ____ kern.ipc.numopensockets: 1535
2008-04-29--17:04.00 ____ kern.ipc.numopensockets: 1617
2008-04-29--18:04.00 ____ kern.ipc.numopensockets: 1710

This continues until kern.ipc.maxsockets is reached or the box is rebooted.

The other thing we looked at was the output from vmstat -z. The first thing that stood out was the high number of malloc 128 bucket failures:

128 Bucket: 524, 0, 2489, 80, 8364, 23055239

I also logged the mbuf clusters; we never reached the maximum number of mbuf clusters. It is almost as if there are stale sockets. Here is a snapshot of the server now:

ewr# sockstat -4u |wc -l
139
ewr# sysctl kern.ipc.numopensockets
kern.ipc.numopensockets: 13935
ewr# uptime
7:30PM up 6 days, 26 mins, 3 users, load averages: 0.18, 0.25, 0.17

My questions:

1. If I cannot identify who or what is consuming all my TCP sockets, what will happen if I double or triple the value of kern.ipc.maxsockets?

2.
Could this be an issue with a low kern.maxusers? It is currently set to 1024. Also, at times when I cannot create a new socket I am not pinned on mbuf clusters; I was able to verify this in the past.

3. I installed a debugging kernel, which I built on the server. I was able to get a core dump of the server at the point in time we last had an issue, but I am not sure what I can do with it; kernel debugging is way beyond what I am capable of doing. Do I even want to pursue this?

4. Does anyone have any system tunings they could recommend for a high-volume FTP site? What does ftp.freebsd.org have?

--
Mark Saad
msaad@datapipe.com
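
The hourly kern.ipc.numopensockets log quoted above can be turned into a rough leak-rate estimate. The following is a minimal Python sketch, not something from the original message: the function names are hypothetical, the timestamp layout is assumed to be YYYY-MM-DD--HH:MM.SS, and the projection assumes the count keeps growing linearly, which the real leak may not do.

```python
from datetime import datetime

# Sample lines copied from the log in the message.
LOG = """\
2008-04-29--15:04.10 ____ kern.ipc.numopensockets: 1501
2008-04-29--16:04.01 ____ kern.ipc.numopensockets: 1535
2008-04-29--17:04.00 ____ kern.ipc.numopensockets: 1617
2008-04-29--18:04.00 ____ kern.ipc.numopensockets: 1710
"""

MAXSOCKETS = 65535  # kern.ipc.maxsockets value quoted in the message


def parse_log(text):
    """Return a list of (datetime, open_socket_count) pairs."""
    samples = []
    for line in text.splitlines():
        if not line.strip():
            continue
        stamp, _, rest = line.partition(" ____ ")
        count = int(rest.rsplit(":", 1)[1])
        # Assumption: the timestamp layout is YYYY-MM-DD--HH:MM.SS.
        when = datetime.strptime(stamp, "%Y-%m-%d--%H:%M.%S")
        samples.append((when, count))
    return samples


def growth_per_hour(samples):
    """Average growth between the first and last sample, in sockets/hour."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600.0
    return (n1 - n0) / hours


def hours_until_limit(samples, limit):
    """Hours from the last sample until the count reaches the limit,
    assuming growth stays linear (a simplification)."""
    rate = growth_per_hour(samples)
    if rate <= 0:
        return float("inf")
    return (limit - samples[-1][1]) / rate


samples = parse_log(LOG)
print(f"growth: {growth_per_hour(samples):.1f} sockets/hour")
print(f"projected days to limit: {hours_until_limit(samples, MAXSOCKETS) / 24:.1f}")
```

Running it over the full log file rather than these four sample lines would give a more useful number. Comparing the projected time with the observed 7 to 14 days between lockups is one way to check whether the leak is steady or arrives in bursts.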