Date:      Sat, 20 Dec 1997 08:02:01 -0500
From:      Gary Palmer <gjp@erols.com>
To:        current@freebsd.org
Subject:   Re: crash (in networking code?)
Message-ID:  <349BC1C9.2781E494@erols.com>
References:  <349B1918.794BDF32@erols.com>

I wrote:
> 
> Hi,
> 
> We have a weird proxying system here running 100% custom code.
> As a test we put a new version on a FreeBSD snap release just
> to see how well it lasts compared to the Sun and Linux boxes we
> had been testing (and using previously). I used the latest
> ``stable'' snap:

Some more info.

When I killed the proxy process, the process table showed:

    0 22318     1   0 -14  0   172  344 ckvnlk D     ??    0:00.01 ./epproxy
    0 25110   187   0 -14  0   172   52 ckvnlk D     ??    0:00.00 sleep 5
    0   187     1   0 -14  0   444  284 ckvnlk D     p0-   0:00.02 /bin/csh -f ./runit.sh

The sleep 5 is part of runit.sh, which is meant to restart the
proxy in case it dies. Interestingly, the epproxy process
does *not* ever detach from the controlling terminal, and
yet the ps output above shows no controlling terminal for it.
My current theory is that these panics are a result of poor FD
management in the proxy code (i.e. stdin, stdout and stderr are
never closed, and hence are shared amongst all the children).
The backtrace below seems to indicate a vnode problem of some
sort, which sort of supports my theory. The rest of the theory
is based on the fact that I'm not seeing this on other boxes,
and the only obvious difference I can see is that the daemons
on the other boxes truly detach and close std(in|out|err).
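For comparison, the kind of detaching the other boxes' daemons do
can be sketched in C. This is a minimal sketch under stated
assumptions, not the epproxy code: the names daemonize(),
redirect_std_fds_to_devnull() and std_fds_detached() are
hypothetical. It forks, lets the parent exit, calls setsid() to drop
the controlling terminal, and points fds 0-2 at /dev/null so no
descriptors inherited from the shell stay shared. FreeBSD's libc also
provides daemon(3), which performs much the same steps.

```c
/*
 * Hypothetical sketch of a daemon fully detaching from the shell that
 * started it. None of these names come from epproxy or FreeBSD.
 */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Point fds 0, 1 and 2 at /dev/null so no inherited descriptor is kept. */
int redirect_std_fds_to_devnull(void)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;
    for (int target = STDIN_FILENO; target <= STDERR_FILENO; target++) {
        /* If open() itself returned one of 0..2, leave that slot as is. */
        if (fd != target && dup2(fd, target) < 0) {
            if (fd > STDERR_FILENO)
                close(fd);
            return -1;
        }
    }
    /* Close the spare descriptor only if it is not one of 0..2. */
    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}

/* Return 1 if fds 0..2 are all open and none of them is a terminal. */
int std_fds_detached(void)
{
    for (int fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++) {
        if (fcntl(fd, F_GETFD) < 0)   /* descriptor is not open */
            return 0;
        if (isatty(fd))               /* still attached to a tty */
            return 0;
    }
    return 1;
}

/* Classic detach: fork, let the parent exit, start a new session. */
int daemonize(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);                     /* parent hands control back */
    if (setsid() < 0)                 /* new session, no controlling tty */
        return -1;
    if (chdir("/") < 0)               /* don't pin the startup directory */
        return -1;
    return redirect_std_fds_to_devnull();
}
```

If the proxy stays under a restart wrapper like runit.sh, calling
daemonize() would make the wrapper see an immediate exit, so in that
setup calling only redirect_std_fds_to_devnull() is probably the
better fit.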

When I went to reboot the system to try to clear out the stuck
processes, I got a double panic (both were page faults). `bt' on
the core file (no debugging info, sorry) shows:

root@pproxy6:/usr/crash> gdb -k kernel.0 vmcore.0 
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-unknown-freebsd), 
Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
IdlePTD 1cd000
current pcb at 1a9cb0
panic: page fault
#0  0xf01116bf in boot ()
(kgdb) bt
#0  0xf01116bf in boot ()
#1  0xf011198e in panic ()
#2  0xf017e861 in trap_fatal ()
#3  0xf017e2f4 in trap_pfault ()
#4  0xf017df5f in trap ()
#5  0xf010d4a0 in lockstatus ()
#6  0xf012d949 in vop_stdislocked ()
#7  0xf0165761 in ufs_vnoperate ()
#8  0xf0130774 in vfs_msync ()
#9  0xf0131250 in sync ()
#10 0xf011158b in boot ()
#11 0xf011198e in panic ()
#12 0xf017e861 in trap_fatal ()
#13 0xf017e2f4 in trap_pfault ()
#14 0xf017df5f in trap ()
#15 0xf010d4a0 in lockstatus ()
#16 0xf012d949 in vop_stdislocked ()
#17 0xf0165761 in ufs_vnoperate ()
#18 0xf0130774 in vfs_msync ()
#19 0xf0131250 in sync ()
#20 0xf012b661 in vfs_update ()
#21 0xf01052e2 in kproc_start ()
#22 0xf01754f3 in fork_trampoline ()
Cannot access memory at address 0x7fff004.

I have a kernel with full debugging support running now, and will
likely get a core dump tonight :-( Interestingly, when I tried
reproducing this on another machine (with a slightly older kernel)
I couldn't, but the other machine was under simulated load, not
real load.

Gary


