Date:      Sun, 28 Oct 2007 14:52:16 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Chris Chou <m2chrischou@gmail.com>
Cc:        darrenr@FreeBSD.org, freebsd-stable@freebsd.org
Subject:   Re: FreeBSD 7.0 crashed when running super-smack upon PostgreSQL
Message-ID:  <20071028145040.C32129@fledge.watson.org>
In-Reply-To: <4724752D.9080605@GMail.com>
References:  <47243533.2040107@GMail.com> <20071028113153.S32129@fledge.watson.org> <4724752D.9080605@GMail.com>

On Sun, 28 Oct 2007, Chris Chou wrote:

> Hi Robert,
>
> This is the backtrace output in the culprit thread (100100):

<snip>

> #0  doadump () at pcpu.h:195
> 195        __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> (kgdb) thread
> [Current thread is 134 (Thread 100123)]
> (kgdb) thread 133
> [Switching to thread 133 (Thread 100100)]#0  sched_switch (td=0xc46d6210,
>   newtd=Variable "newtd" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1908
> 1908            cpuid = PCPU_GET(cpuid);
> (kgdb) backtrace
> #0  sched_switch (td=0xc46d6210, newtd=Variable "newtd" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1908
> #1  0xc061a1b3 in mi_switch (flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/kern_synch.c:442
> #2  0xc063f027 in sleepq_switch (wchan=Variable "wchan" is not available.
> )
>   at /usr/src/sys/kern/subr_sleepqueue.c:459
> #3  0xc063f7d6 in sleepq_wait (wchan=0xc093cae0)
>   at /usr/src/sys/kern/subr_sleepqueue.c:542
> #4  0xc0619a46 in _sx_xlock_hard (sx=0xc093cae0, tid=3295502864, opts=0,
>   file=0x0, line=0) at /usr/src/sys/kern/kern_sx.c:555
> #5  0xc04a8ea7 in fr_checknatout (fin=0xe69009b4, passp=0xe69009b0) at sx.h:153
> #6  0xc049a7d0 in fr_check (ip=0xc4244a44, hlen=20, ifp=0xc40dc400, out=1,
>   mp=0xe6900a98) at /usr/src/sys/contrib/ipfilter/netinet/fil.c:2602
> #7  0xc049e01f in fr_check_wrapper (arg=0x0, mp=0xe6900a98, ifp=0xc40dc400,
>   dir=2) at /usr/src/sys/contrib/ipfilter/netinet/ip_fil_freebsd.c:178
> #8  0xc06b33d8 in pfil_run_hooks (ph=0xc0960b20, mp=0xe6900b24,
>   ifp=0xc40dc400, dir=2, inp=0xc494be70) at /usr/src/sys/net/pfil.c:78
> #9  0xc06eaea2 in ip_output (m=0xc4244a00, opt=0x0, ro=0xe6900af8, flags=0,
>   imo=0x0, inp=0xc494be70) at /usr/src/sys/netinet/ip_output.c:438
> #10 0xc074841c in tcp_output (tp=0xc541a000)
>   at /usr/src/sys/netinet/tcp_output.c:1127
> #11 0xc0753061 in tcp_usr_connect (so=0xc5204dec, nam=0xc42869a0,
>   td=0xc46d6210) at /usr/src/sys/netinet/tcp_usrreq.c:479
> #12 0xc06606d2 in soconnect (so=0xc5204dec, nam=0xc42869a0, td=0xc46d6210)
>   at /usr/src/sys/kern/uipc_socket.c:765
> #13 0xc06670bc in kern_connect (td=0xc46d6210, fd=4, sa=0xc42869a0)
>   at /usr/src/sys/kern/uipc_syscalls.c:558
> #14 0xc0667246 in connect (td=0xc46d6210, uap=0xe6900cfc)
>   at /usr/src/sys/kern/uipc_syscalls.c:526
> #15 0xc0873c35 in syscall (frame=0xe6900d38)
>   at /usr/src/sys/i386/i386/trap.c:1008
> #16 0xc085deb0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
> #17 0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
>
> The crash occurred while I was running "super-smack -d pg select-key.smack
> 35 10000", for which super-smack started 35 processes.
>
> PostgreSQL and super-smack both run inside a jail with
> jail_sysvipc_allow="YES".
>
> The firewall package is ipfilter. I enable it in /etc/rc.conf as follows:
> ipfilter_enable="YES"
> ipfilter_rules="/etc/ipf.rules"
> ipmon_enable="YES"
> ipmon_flags="-Ds"
> ipnat_enable="YES"
> ipnat_rules="/etc/ipnat.rules"
>
> Do you mean you want to get the core file and the debug kernel? I can put 
> them somewhere you can reach.

Actually, this is sufficient to track down the problem: it is caused by the 
use of the sx(9) lock primitive in ipfilter in RELENG_7 (frames #4 and #5 
above show the thread sleeping in _sx_xlock_hard() under fr_checknatout()). 
Darren has fixed this in HEAD but not MFC'd it yet.  I've CC'd Darren to 
remind him that he needs to MFC the conversion to rwlock(9) as soon as 
possible so that users of 7.x betas (and eventually the release) don't see 
this panic.
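
For readers unfamiliar with the rule behind "Sleeping thread ... owns a 
non-sleepable lock", here is a minimal userspace sketch in Python.  It is a toy 
model, not FreeBSD code: the Lock and Thread classes, check_order(), and the 
lock names are all hypothetical.  The only assumptions taken from the thread 
are that acquiring an sx(9) lock may sleep, that mutexes and rwlock(9) locks 
are non-sleepable, and that the sleeping thread owned a mutex taken in the TCP 
code.

```python
# Toy userspace model of the "don't sleep while holding a mutex" invariant.
# Not FreeBSD code; every name below is illustrative.

SLEEPABLE = "sleepable"          # sx(9)-style: acquiring it may sleep
NON_SLEEPABLE = "non-sleepable"  # mutex(9)/rwlock(9)-style: owner must not sleep


class Lock:
    def __init__(self, name, kind):
        self.name = name
        self.kind = kind


class Thread:
    def __init__(self, tid):
        self.tid = tid
        self.held = []  # locks currently owned, in acquisition order

    def acquire(self, lock):
        """Take `lock`, refusing a sleepable lock under a non-sleepable one."""
        if lock.kind == SLEEPABLE:
            blockers = [l.name for l in self.held if l.kind == NON_SLEEPABLE]
            if blockers:
                raise RuntimeError(
                    "tid %d may sleep on %s while owning %s"
                    % (self.tid, lock.name, ", ".join(blockers))
                )
        self.held.append(lock)


def check_order(tid, locks):
    """Return None if the acquisition order is legal, else the error text."""
    td = Thread(tid)
    try:
        for lock in locks:
            td.acquire(lock)
    except RuntimeError as exc:
        return str(exc)
    return None


tcp_mtx = Lock("TCP mutex", NON_SLEEPABLE)
nat_sx = Lock("ipfilter NAT lock (sx)", SLEEPABLE)
nat_rw = Lock("ipfilter NAT lock (rwlock)", NON_SLEEPABLE)

# The pattern in the backtrace: mutex held, then an sx lock is requested.
print(check_order(100100, [tcp_mtx, nat_sx]))
# The same path after a conversion to rwlock(9) is accepted by the model.
print(check_order(100100, [tcp_mtx, nat_rw]))  # None
```

In this model the first ordering is rejected and the second is accepted, which 
is the effect the rwlock(9) conversion is meant to have on the ipfilter path.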

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> Best regards,
>
> Chris
>
> Robert Watson wrote:
>> On Sun, 28 Oct 2007, Chris Chou wrote:
>> 
>>> Dear all,
>>> 
>>> FreeBSD RELENG_7.0 crashed while running the super-smack benchmark 
>>> against a jailed PostgreSQL.
>> 
>> Chris,
>> 
>> Thanks for this report.  Sounds like a bug somewhere :-).  Is there any 
>> chance I can get remote access to a box holding the synchronized kernel, 
>> kernel debugging symbols, source code, and core dump?  Could I ask you also 
>> to be careful not to delete the kernel with symbols or change the source 
>> code on the box so that the core and associated parts all remain in sync? 
>> I don't strictly need access to the core directly, since we may well be 
>> able to reasonably debug this without that, so here are some directions to 
>> try and see how far we get:
>> 
>> It sounds like TCP in a user thread is stumbling over a violation of system 
>> invariants (don't sleep while holding a mutex) performed by another thread. 
>> We need to track down the original thread and figure out why it's sleeping 
>> while holding that lock -- perhaps it's a user thread performing a 
>> copyin/copyout holding the lock, or perhaps an ithread or other software 
>> interrupt thread acquiring a lock of an inappropriate type (such as an sx 
>> lock) while holding the lock.
>> 
>> The debugging output, 'tid 100100, pid 9284', tells us which thread/process 
>> the culprit may be.  We need to get a kernel stack trace of that thread and 
>> that should shed some light on matters.  "info threads" and "thread" can be 
>> used in kgdb to list threads and select a thread.  kgdb has its own thread 
>> ID scheme so you'll need to use info threads to track down the right kgdb 
>> thread id before you can select the identified kernel thread/process.  Once 
>> it's selected, "bt" should print the backtrace of that thread, and that may 
>> give us a good starting point.
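
The two numbering schemes appear side by side in kgdb's bracketed status 
lines, for example "[Switching to thread 133 (Thread 100100)]" in the 
transcript elsewhere in this thread.  As a rough sketch, the mapping can be 
pulled out of a saved transcript with a few lines of Python.  The function 
name kgdb_thread_map and the regular expression are assumptions based only on 
the two line shapes quoted in this thread, not on a documented kgdb format.

```python
import re

# Matches the bracketed status lines quoted in this thread, e.g.
# "[Current thread is 134 (Thread 100123)]" and
# "[Switching to thread 133 (Thread 100100)]".
_THREAD_LINE = re.compile(
    r"\[(?:Current thread is|Switching to thread) (\d+) \(Thread (\d+)\)\]"
)


def kgdb_thread_map(transcript):
    """Map kernel thread IDs (tid) to kgdb thread numbers found in `transcript`."""
    return {int(tid): int(num) for num, tid in _THREAD_LINE.findall(transcript)}


sample = """\
(kgdb) thread
[Current thread is 134 (Thread 100123)]
(kgdb) thread 133
[Switching to thread 133 (Thread 100100)]#0  sched_switch (td=0xc46d6210,
"""

mapping = kgdb_thread_map(sample)
print(mapping[100100])  # 133
```

The number printed is the one to pass to the kgdb "thread" command before 
running "bt".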
>> 
>> Could you let me know what, if any, firewall packages you are using?
>> 
>> Robert N M Watson
>> Computer Laboratory
>> University of Cambridge
>> 
>>> 
>>> uname -a
>>> FreeBSD mercury 7.0-BETA1 FreeBSD 7.0-BETA1 #4: Fri Oct 26 23:49:24 CST 2007 chris@mercury:/usr/obj/usr/src/sys/MERCURY  i386
>>> 
>>> I have got a dumped core, and the following backtrace:
>>> 
>>> kgdb: kvm_nlist(_stopped_cpus):
>>> kgdb: kvm_nlist(_stoppcbs):
>>> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
>>> GNU gdb 6.1.1 [FreeBSD]
>>> Copyright 2004 Free Software Foundation, Inc.
>>> GDB is free software, covered by the GNU General Public License, and you are
>>> welcome to change it and/or distribute copies of it under certain conditions.
>>> Type "show copying" to see the conditions.
>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>> This GDB was configured as "i386-marcel-freebsd".
>>> 
>>> Unread portion of the kernel message buffer:
>>> Sleeping thread (tid 100100, pid 9284) owns a non-sleepable lock
>>> panic: sleeping thread
>>> Uptime: 15h3m6s
>>> Physical memory: 1011 MB
>>> Dumping 223 MB: 208 192 176 160 144 128 112 96 80 64 48 32 16
>>> 
>>> #0  doadump () at pcpu.h:195
>>> 195        __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>>> (kgdb) backtrace
>>> #0  doadump () at pcpu.h:195
>>> #1  0xc0612624 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
>>> #2  0xc0612824 in panic (fmt=Variable "fmt" is not available.
>>> ) at /usr/src/sys/kern/kern_shutdown.c:563
>>> #3  0xc064131b in propagate_priority (td=0xc46d6210)
>>>   at /usr/src/sys/kern/subr_turnstile.c:222
>>> #4  0xc0641cf8 in turnstile_wait (ts=0xc40b5b90, owner=0xc46d6210, 
>>> queue=Variable "queue" is not available.
>>> )
>>>   at /usr/src/sys/kern/subr_turnstile.c:739
>>> #5  0xc0606dcd in _mtx_lock_sleep (m=0xc09617ac, tid=3295513696, opts=0,
>>>   file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:394
>>> #6  0xc0750c64 in tcp_usr_attach (so=0xc49014a4, proto=0, td=0xc46d8c60)
>>>   at /usr/src/sys/netinet/tcp_usrreq.c:1421
>>> #7  0xc0661da4 in socreate (dom=2, aso=0xe6924c70, type=1, proto=0,
>>>   cred=0xc4971900, td=0xc46d8c60) at /usr/src/sys/kern/uipc_socket.c:376
>>> #8  0xc0667b1b in socket (td=0xc46d8c60, uap=0xe6924cfc)
>>>   at /usr/src/sys/kern/uipc_syscalls.c:178
>>> #9  0xc0873c35 in syscall (frame=0xe6924d38)
>>>   at /usr/src/sys/i386/i386/trap.c:1008
>>> #10 0xc085deb0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
>>> #11 0x00000033 in ?? ()
>>> Previous frame inner to this frame (corrupt stack?)
>>> (kgdb)
>>> 
>>> super-smack is compiled from the benchmarks/super-smack port with the 
>>> WITH_POSTGRESQL option enabled.
>>> 
>>> PostgreSQL runs in a jail; the packages are:
>>> postgresql-client-8.2.4
>>> postgresql-server-8.2.4_1
>>> 
>>> The crash occurred only once, and I cannot reproduce it after rebooting.
>>> 
>>> Best regards,
>>> 
>>> Chris Chou
>>> _______________________________________________
>>> freebsd-stable@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>>> 
>> 
>


