Date:      Wed, 29 Mar 2000 04:20:24 -0600
From:      Gerd Knops <gerti@bitart.com>
To:        Alfred Perlstein <bright@wintelcom.net>
Cc:        gerti-freebsds@bitart.com, freebsd-stable@FreeBSD.ORG
Subject:   Re: Random signal 9 (SIGKILL), please help!
Message-ID:  <20000329102024.3950.qmail@camelot.bitart.com>
In-Reply-To: <20000328213754.L21029@fw.wintelcom.net>
References:  <20000329041104.3028.qmail@camelot.bitart.com> <20000328204948.K21029@fw.wintelcom.net> <20000329043747.3094.qmail@camelot.bitart.com> <20000328213754.L21029@fw.wintelcom.net>

Alfred Perlstein wrote:
> * Gerd Knops <gerti@bitart.com> [000328 21:03] wrote:
> > Alfred Perlstein wrote:
> > > * Gerd Knops <gerti@bitart.com> [000328 20:36] wrote:
> > > > Only on the FreeBSD systems I see that child processes occasionally
> > > > get killed by a signal 9, and I just can't figure out why.
> > > >
> > > > Syslog does not give any indication. The machines do not swap (I
> > > > know processes may get killed when the systems run out of swap
> > > > space). The times at which the processes are killed seem to
> > > > be random, meaning it does not seem to be some housekeeping code
> > > > that causes it.
> > > >
> > > > The processes are spawned from various daemons, and are killed
> > > > at different points in their existence, even when just barely
> > > > started and no resources to mention are consumed yet.
> > > >
> > > > All processes run as root, so 'limit' should not be the cause.
> > > >
> > > > Is there anything else but the swapper that can trigger a 'signal
> > > > 9' to be sent to processes?
> > > >
> > > > The systems in question run a variety of versions, starting from
> > > > 3.2 Release to a fairly recent (4 weeks) 3.4 stable.
> > > >
> > > This is on all the FreeBSD systems?  This is really confusing; I've
> > > _never_ heard of this happening. Do you have any machines built
> > > with the same _exact_ hardware exhibiting the same problems or not?
> > >
> > Nope, different hardware, all Intel CPUs, some Pentium Pro, some
> > Pentium II, ASUS and Gigabyte motherboards.
> >
> > > Have you tried 4.0?  Without some sample code this is going to
> > > be very hard to reproduce.
> > >
> > The code is >50k lines of perl... No I have not tried 4.0 yet. And
> > I can not reproduce the problem either, it just randomly appears
> > at a very low rate. 23 machines running FreeBSD, and I see about 1
> > to 3 of those a day.
> >
> > > Are you sure you aren't running out of process slots?  What is
> > > maxusers set to in the kernel?
> >
> > 64.
>
> Try maybe 128?
>
> >
> > > How many processes typically run at the same time?
> > >
> > Varying, the busiest machine peaks at about 100 processes, but I
> > have seen it on machines running only 50 processes.
> >
> > Thanks for responding!
>
> I've never heard of signal 9 "by accident" and since this problem
> happens on a variety of 3.x systems 3.2-3.4 (3.4-stable also?)
> it seems really weird.
>
Yes, 3.4 stable as well.

> I don't think I can be of very much help without access to the
> code and the machines running it, as well as how it is being
> run, apache+cgi?
>
No, daemons started via a startup script which in turn is started
via rc.local.

I think I found a correlation between PID rollover (from 99999
to 0) and the spurious signals. Some program seems to keep
tabs on PIDs that already went away, and when they 'come back' they
are killed again. I am suspicious of syslogd at the moment (I pipe
syslog output through a filter); it is one of the very few programs in
the base system that run on those systems and use SIGKILL.
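
To make the suspected failure mode concrete, here is a minimal sketch in Python (not the Perl or C actually involved, and not syslogd's real logic). The names ChildTable, spawn, reap, kill and demo are hypothetical. The idea is that a daemon which remembers a child's PID after the child has exited can later send SIGKILL to an unrelated process that received the same number after rollover; forgetting the PID as soon as the child is reaped avoids that.

```python
import subprocess
import sys


class ChildTable:
    """Hypothetical tracker: forgets each child PID as soon as it is reaped."""

    def __init__(self):
        self.children = {}  # pid -> subprocess.Popen

    def spawn(self, argv):
        proc = subprocess.Popen(argv)
        self.children[proc.pid] = proc
        return proc.pid

    def reap(self):
        """Drop every child that has already exited; return their PIDs."""
        gone = [pid for pid, p in self.children.items() if p.poll() is not None]
        for pid in gone:
            del self.children[pid]
        return gone

    def kill(self, pid):
        """Kill pid only if it is still a live child that we started."""
        self.reap()
        proc = self.children.get(pid)
        if proc is None:
            # Stale PID: after rollover it may belong to someone else,
            # so signalling it here would hit an innocent process.
            return False
        proc.kill()   # SIGKILL on POSIX systems
        proc.wait()   # reap immediately so the PID is not left tracked
        del self.children[pid]
        return True


def demo():
    """Return (result for an exited child, result for a live child)."""
    table = ChildTable()
    short = table.spawn([sys.executable, "-c", "pass"])
    table.children[short].wait()          # child has exited by now
    stale = table.kill(short)             # refused: PID is no longer ours
    sleeper = table.spawn([sys.executable, "-c", "import time; time.sleep(30)"])
    live = table.kill(sleeper)            # allowed: child is still running
    return stale, live
```

A program that instead keeps the bare PID number around and calls kill(2) on it later has no such guard, which would match the pattern of victims being killed at arbitrary points in their lifetime.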

However it will probably take some time before I can wrap my head
around that code, it's not exactly heavily commented... If anyone
with more intimate knowledge could have a look I'd appreciate that.

Gerd


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message



