Date:      Tue, 20 Jul 1999 16:59:37 -0400 (EDT)
From:      Snob Art Genre <ben@narcissus.net>
To:        hackers@freebsd.org
Subject:   amandad zombies (fwd)
Message-ID:  <Pine.BSF.3.96.990720165529.41250C-100000@narcissus.net>

Hi there,
  I've been beating my head against a mysterious problem for some
time now, and I'm hoping that you folks can help me out.  When I run
amanda, seven out of my 16 hosts don't respond.  Of these, some are
Solaris and some are FreeBSD 3.1-RELEASE, but it's the FreeBSD ones I'm
concerned with at the moment.  I'm using Amanda 2.4.1.  (Note that the
symptoms on the Solaris machines are different, which is why I'm
posting this to -hackers.)

From my experiments with amcheck and snoop, it looks like the amandad on
the affected clients handles one connection fine, but then the
amandad process sticks around as a zombie, and apparently inetd won't
spawn a new one until the old one dies.  The only way I've found to get
rid of the zombie (besides rebooting, of course) is to kill inetd.

(on my backup host)
bash-2.02$ amcheck -c general

(on a malfunctioning client)
cache2# inetd -d
ADD : amanda proto=udp accept=0 max=1 user=backup group=(null)
class=daemon builtin=0x0 server=/usr/local/amanda/libexec/amandad
inetd: enabling amanda, fd 4
inetd: registered /usr/local/amanda/libexec/amandad on 4
inetd: someone wants amanda
inetd: inetd: disabling amanda, fd 4+ closing from 4

inetd: 6692 execl /usr/local/amanda/libexec/amandad

Looks fine.  But that's with a freshly started inetd -- the second time
I run amcheck, I get no output from inetd.

Now let's look at the truss output of inetd while I do an amcheck.  This
is with a fresh inetd.

cache2# truss -p `cat /var/run/inetd.pid`
syscall (null)()
        returns 1 (0x1)
syscall sigprocmask(0x1,0x82001)
        returns 0 (0x0)
syscall gettimeofday(0x80580ac,0x0)
        returns 0 (0x0)
syscall fork()
        returns 6717 (0x1a3d)
syscall sigprocmask(0x3,0x0)
        returns 532481 (0x82001)
syscall sigprocmask(0x1,0x82001)
        returns 0 (0x0)
SIGNAL 20
SIGNAL 20
SIGNAL 20
syscall sigsuspend(0x0)
        errno 4 'Interrupted system call'
syscall write(5,0xefbfda3b,1)
        returns 1 (0x1)
syscall sigreturn(0xefbfda64)
        errno 4 'Interrupted system call'

And here it stays ... while I do another amcheck, even.  It looks like
SIGCHLD is being delivered, but I don't see any wait-type syscalls.
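
For reference, the kind of wait-type call I would expect to see after a
SIGCHLD looks roughly like the sketch below.  This is not inetd's actual
code, just a minimal illustration in Python of a non-blocking reap loop;
the function name reap_children is made up for the example.  A child that
has exited stays a zombie until its parent collects it with a call like
this.

```python
import os


def reap_children():
    """Collect every exited child without blocking.

    Returns a list of (pid, code) tuples, where code is the exit status,
    or the negated signal number if the child was killed by a signal.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return at once.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)
        reaped.append((pid, code))
    return reaped
```

A parent would normally call something like this from its main loop after
being told that SIGCHLD arrived.  Looping until nothing is left matters
because several child exits can be collapsed into a single signal
delivery.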

Any thoughts, anyone?

--
 Ben

UNIX Systems Engineer, Skunk Group
StarMedia Network, Inc.






