Date:      Wed, 1 Oct 2008 05:49:55 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Stephen Clark <sclark46@earthlink.net>
Cc:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: resource leak
Message-ID:  <20081001124955.GA21577@icarus.home.lan>
In-Reply-To: <48E36D62.6090001@earthlink.net>
References:  <48E36204.5090108@earthlink.net> <20081001115046.GA20384@icarus.home.lan> <48E36D62.6090001@earthlink.net>

On Wed, Oct 01, 2008 at 08:30:26AM -0400, Stephen Clark wrote:
> Jeremy Chadwick wrote:
>> On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
>>> Hello List,
>>>
>>> I am running into a strange problem that points to a resource leak.
>>> The problem manifests itself after one of our remote systems has been
>>> up around 100 days.
>>> The symptom is that it appears that no new processes can be spawned.
>>> If I try to ssh to the unit, I can see the 3-way TCP handshake and
>>> then no more traffic. Log files such as the cron log show that once
>>> this happens, no more entries are written. The unit acts as a
>>> firewall, router, and VPN appliance, and these functions continue to
>>> work. We have a C application, started periodically out of a shell
>>> script, that reports various information about the system. It stops
>>> reporting, while VPNs, OSPF routing, and ipfilter firewalling
>>> continue to work and write into their logfiles.
>>>
>>> My question is how do I monitor the various resources in the system that could
>>> prevent the spawning of a new process?
>>
>> Periodically logging "ps -auxw" output to a file would be useful, as
>> ideally you'd gradually see the list get longer and longer over time;
>> it's possible you have many zombie processes as a result of a parent
>> which is not reaping its children (calling waitpid(2) or its friends).
>>
>> Other things that might come in useful are "fstat" and "vmstat -s".
>>
>> It sounds like your C program relies heavily on system() or execl() and
>> fork(), which is why it's affected -- while the other programs are
>> likely kernel-level.
>>
> Thanks Jeremy,
>
> I have added those commands to a periodic daily script.
>
> Another thing I have noticed is that quite often the problem seems to
> start at 2am, right when the periodic daily script runs.
>
> But I think it is coincidence: we have reached the edge of the
> resource limit, and all the jobs spawned by the periodic daily
> scripts push us over it.
>
> The other thing is that, having logged into some of the systems that
> have been up in the 80 day range, I don't see many zombies, if any. I
> just wonder if it is an fd leak; fstat should point that out.
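
If fstat does point at a single long-running daemon, and you have its
source, one way to confirm the leak from the inside is to have the
daemon log its own descriptor count now and then. A minimal sketch,
assuming a POSIX system; count_open_fds() is a made-up name, not an
existing library call, and the 65536 cap is an arbitrary choice to keep
the loop short:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * count_open_fds() is a hypothetical helper, not part of any library.
 * It probes each possible descriptor number with fcntl(F_GETFD) and
 * counts the ones that are open in the calling process.
 */
int
count_open_fds(void)
{
	long max = sysconf(_SC_OPEN_MAX);
	long fd;
	int count = 0;

	/* Assumed fallback/cap: keep the scan bounded if the limit is
	 * indeterminate or very large. Descriptors above it are missed. */
	if (max < 0 || max > 65536)
		max = 65536;

	for (fd = 0; fd < max; fd++) {
		/* F_GETFD fails with EBADF when fd is not an open descriptor. */
		if (fcntl((int)fd, F_GETFD) != -1 || errno != EBADF)
			count++;
	}
	return count;
}
```

Logging that number (via syslog(3), say) every few minutes and comparing
it across days should show a steady climb if descriptors are leaking.
From outside the process, "fstat -p <pid> | wc -l" gives a roughly
comparable figure.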

You might find the below thread beneficial -- an individual came to the
lists stating that they were running out of fds as a result of some
Java software running amok on their systems.

http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/thread.html#45383
http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045383.html
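
As for the zombie theory from earlier in the thread: if your own code
calls fork(), the usual cure is a non-blocking waitpid(2) loop. A rough
sketch, not taken from your application; reap_children() is a
hypothetical name:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * reap_children() is a hypothetical helper: it collects every child
 * that has already exited, without blocking, and returns how many it
 * reaped.
 */
int
reap_children(void)
{
	int reaped = 0;
	int status;
	pid_t pid;

	for (;;) {
		pid = waitpid(-1, &status, WNOHANG);
		if (pid > 0) {
			reaped++;	/* one zombie removed from the process table */
			continue;
		}
		if (pid == -1 && errno == EINTR)
			continue;	/* interrupted by a signal, try again */
		break;	/* 0: children still running; ECHILD: no children left */
	}
	return reaped;
}
```

Calling this from the parent's main loop, or after a SIGCHLD has been
noted, should keep exited children from piling up in the process table.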

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



