Date:      Thu, 18 Aug 2011 20:04:05 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Doug Barton <dougb@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org, "Vogel, Jack" <jack.vogel@intel.com>
Subject:   Re: crash on 8.2-RELEASE amd64, high-traffic squid server
Message-ID:  <20110819030405.GA83032@icarus.home.lan>
In-Reply-To: <alpine.BSF.2.00.1108181931070.77926@172-17-198-245.tybonyfhvgr.arg>
References:  <alpine.BSF.2.00.1108181931070.77926@172-17-198-245.tybonyfhvgr.arg>

On Thu, Aug 18, 2011 at 07:36:50PM -0700, Doug Barton wrote:
> Howdy,
> 
> I have some high-traffic squid servers, most of which are running a
> flavor of RELENG_7 very successfully, but one that I've been
> evaluating 8.x on has had a lot of problems. Most recently we had
> the crash below twice in the last 2 weeks. Same exact backtrace. Any
> suggestions on where to look would be appreciated.
> 
> 
> Thanks,
> 
> Doug
> 
> #0  doadump () at pcpu.h:224
> 224	pcpu.h: No such file or directory.
> 	in pcpu.h
> (kgdb) #0  doadump () at pcpu.h:224
> #1  0xffffffff803ec4be in boot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:419
> #2  0xffffffff803ec8f1 in panic (fmt=Variable "fmt" is not available.
> )
>     at /usr/src/sys/kern/kern_shutdown.c:592
> #3  0xffffffff8069a4d0 in trap_fatal (frame=0x1c, eva=Variable "eva" is not available.
> )
>     at /usr/src/sys/amd64/amd64/trap.c:783
> #4  0xffffffff8069aab9 in trap (frame=0xffffff800012f650)
>     at /usr/src/sys/amd64/amd64/trap.c:592
> #5  0xffffffff80682e84 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:224
> #6  0xffffffff80698896 in bcopy ()
>     at /usr/src/sys/amd64/amd64/support.S:124
> #7  0xffffffff8044df61 in sbcompress (sb=0xffffff01d98945e0,
>     m=0xffffff010b815300, n=0xffffff006baa3700)
>     at /usr/src/sys/kern/uipc_sockbuf.c:779
> #8  0xffffffff8044e1e6 in sbappendstream_locked (sb=0xffffff01d98945e0,
>     m=0xffffff010b815300) at /usr/src/sys/kern/uipc_sockbuf.c:534
> #9  0xffffffff80527530 in tcp_do_segment (m=0xffffff010b815300, th=Variable "th" is not available.
> )
>     at /usr/src/sys/netinet/tcp_input.c:2588
> #10 0xffffffff80528b4b in tcp_input (m=0xffffff010b815300, off0=Variable "off0" is not available.
> )
>     at /usr/src/sys/netinet/tcp_input.c:1029
> #11 0xffffffff804c3b2c in ip_input (m=0xffffff010b815300)
>     at /usr/src/sys/netinet/ip_input.c:787
> #12 0xffffffff804a631e in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
> )
>     at /usr/src/sys/net/netisr.c:917
> #13 0xffffffff8049d73d in ether_demux (ifp=0xffffff0002d30000,
>     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:894
> #14 0xffffffff8049db2d in ether_input (ifp=0xffffff0002d30000,
>     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:753
> #15 0xffffffff8027c18a in em_rxeof (rxr=0xffffff0002d7c600, count=98,
>     done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4293
> #16 0xffffffff8027c5a8 in em_handle_que (context=Variable "context" is not available.
> )
>     at /usr/src/sys/dev/e1000/if_em.c:1482
> #17 0xffffffff80429ab5 in taskqueue_run_locked (queue=0xffffff0002d8d800)
>     at /usr/src/sys/kern/subr_taskqueue.c:250
> #18 0xffffffff80429c4e in taskqueue_thread_loop (arg=Variable "arg" is not available.
> )
>     at /usr/src/sys/kern/subr_taskqueue.c:387
> #19 0xffffffff803c30f8 in fork_exit (
>     callout=0xffffffff80429c00 <taskqueue_thread_loop>,
>     arg=0xffffff80005a8748, frame=0xffffff800012fc40)
>     at /usr/src/sys/kern/kern_fork.c:845
> #20 0xffffffff8068334e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:565
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> #34 0x0000000000000000 in ?? ()
> #35 0x0000000000000000 in ?? ()
> #36 0x0000000000000000 in ?? ()
> #37 0x0000000000000000 in ?? ()
> #38 0x0000000000000000 in ?? ()
> #39 0x0000000000000000 in ?? ()
> #40 0x0000000000000000 in ?? ()
> #41 0x0000000000000000 in ?? ()
> #42 0x0000000000000000 in ?? ()
> #43 0x0000000000000000 in ?? ()
> #44 0x0000000000000000 in ?? ()
> #45 0xffffffff8095ac00 in affinity ()
> #46 0x0000000000000000 in ?? ()
> #47 0x0000000000000000 in ?? ()
> #48 0xffffff0002d2d8c0 in ?? ()
> #49 0xffffff800012f320 in ?? ()
> #50 0xffffff800012f2c8 in ?? ()
> #51 0xffffff0002c59000 in ?? ()
> #52 0xffffffff80411db9 in sched_switch (td=0xffffffff80429c00,
>     newtd=0xffffff80005a8748, flags=Variable "flags" is not available.
> )
>     at /usr/src/sys/kern/sched_ule.c:1852
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)

CC'ing Jack Vogel here, since I see em(4) is involved.  Jack will
probably want this data from the system:

# uname -a       (hostname can be XXX'd out)
# dmesg          (particularly the emX entries and driver version)
# pciconf -lvbc  (specifically the emX entries and related data)
# ifconfig -a    (IPs and MACs can be X'd out; mainly interested in
                  options and other pieces)
# netstat -m     (if possible from a system which has been up a while
                  and is a likely crash candidate)
# vmstat -i      (same condition as netstat -m)
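
The commands listed above can be gathered in one pass. What follows is only a minimal sketch, assuming a POSIX sh; the report path and the collect() helper are hypothetical names chosen for illustration, not part of any FreeBSD tool. It runs each command, labels its output, and notes any command that is not installed instead of stopping.

```sh
#!/bin/sh
# Collect the diagnostics requested above into one report file.
# NOTE: the report path and collect() are illustrative assumptions.

report="${1:-/tmp/em-crash-report.txt}"

collect() {
    # Print a header, then the command's output (stdout and stderr).
    # A missing command is noted in the report rather than aborting the run.
    printf '===== %s =====\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf '(%s not found on this system)\n' "$1"
    fi
    printf '\n'
}

{
    collect uname -a
    collect dmesg
    collect pciconf -lvbc
    collect ifconfig -a
    collect netstat -m
    collect vmstat -i
} > "$report"

printf 'Wrote %s; redact hostnames, IPs and MACs before posting.\n' "$report"
```

The script does no redaction itself, so the hostname, IPs and MACs still need to be X'd out by hand, and netstat -m and vmstat -i are only useful when taken from a box that has been up a while.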

There isn't enough data above for me to determine what's going on, but
from the stack trace (the fault is in bcopy(), called from sbcompress())
it looks like sbcompress() may have been handed data which is null or
inaccessible.  The sbcompress() source hasn't been touched directly in a
while.  The TCP stack, however, has been changed (since 8.2-RELEASE for
sure), and I think em(4) has as well.  This may end up being a case
where running RELENG_8 is the fix, but I'd love to be able to say that
for certain.

"bt full" would be helpful, but the output above suggests the kernel
might not have debugging symbols included in it.  That said, I've seen
this kind of output even on a system with "makeoptions DEBUG=-g" in its
kernel config, and I was never sure how to deal with that problem.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
