Date: Wed, 1 Oct 2008 04:50:46 -0700
From: Jeremy Chadwick
To: Stephen Clark
Cc: FreeBSD Stable
Subject: Re: resource leak
Message-ID: <20081001115046.GA20384@icarus.home.lan>
In-Reply-To: <48E36204.5090108@earthlink.net>

On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
> Hello List,
>
> I am running into a strange problem that points to a resource leak.
> The problem manifests itself after one of our remote systems has been up
> around 100 days.
> The symptom is that it appears no new processes can be spawned. If I try to
> ssh to the unit, I can see the 3-way TCP handshake and then no more traffic.
> Examining log files (cron, etc.) shows that when this happens no more entries
> are written into the cron log. The unit is acting as a firewall, router,
> and VPN appliance; these functions continue to work. We have a C
> application that is periodically started out of a shell script and
> reports various information about the system. It stops reporting, while
> VPNs, OSPF routing, and ipfilter firewalling continue to work and write
> into their logfiles.
>
> My question is how do I monitor the various resources in the system that could
> prevent the spawning of a new process?

Periodically logging "ps -auxw" output to a file would be useful: if
processes are accumulating, you'd see the list grow longer and longer over
time. It's possible you have many zombie processes as a result of a parent
which is not reaping its children (that is, not calling waitpid(2) or one
of its friends).

Other things that might come in useful are "fstat" and "vmstat -s".

It sounds like your C program relies heavily on system() or execl() and
fork(), which is why it's affected, while the other programs are likely
kernel-level or long-running daemons that never need to spawn a new process.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
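
For illustration, here is a minimal C sketch of what "reaping its children"
with waitpid(2) looks like in a parent process. It is not taken from the
application discussed in this thread; the helper names spawn_child() and
reap_all_children() are hypothetical, and the children here exit right away
instead of doing real work.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start one child that exits immediately with `code`.
 * Returns the child's pid in the parent, or -1 if fork(2) failed. */
static pid_t spawn_child(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);    /* child: leave at once, skipping atexit handlers */
    return pid;         /* parent: child's pid, or -1 on error */
}

/* Hypothetical helper: block until every child of this process has been
 * collected. Until a child is waited for, it stays in the process table
 * as a zombie. Returns the number of children reaped. */
static int reap_all_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);   /* -1: wait for any child */

        if (pid > 0) {
            reaped++;           /* one process table slot released */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        break;                  /* ECHILD (no children left) or other error */
    }
    return reaped;
}
```

A parent that calls spawn_child() repeatedly and never calls something like
reap_all_children() leaves one zombie behind per child for as long as the
parent keeps running, which can add up to the kind of slow leak described
above. A long-running daemon would more likely reap from a SIGCHLD handler
or pass WNOHANG to waitpid(2) so that it never blocks.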