From owner-freebsd-current@FreeBSD.ORG Tue Aug 31 04:27:08 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1A9AE16A4CE for ; Tue, 31 Aug 2004 04:27:08 +0000 (GMT) Received: from mx2.synetsystems.com (mx2.synetsystems.com [216.226.140.79]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6551443D1D for ; Tue, 31 Aug 2004 04:27:07 +0000 (GMT) (envelope-from rmtodd@servalan.servalan.com) Received: by mx2.synetsystems.com (Postfix, from userid 66) id 8B01A3C9; Tue, 31 Aug 2004 00:27:06 -0400 (EDT) Received: from rmtodd by servalan.servalan.com with local (Exim 4.41 (FreeBSD)) id 1C1zlS-00039w-D6; Mon, 30 Aug 2004 22:56:58 -0500 To: freebsd-current@freebsd.org Orig-To: Richard Todd References: <20040820211618.62A1C7E4@mx2.synetsystems.com> Message-Id: From: Richard Todd Date: Mon, 30 Aug 2004 22:56:58 -0500 Subject: Re: Panic 'kernel trap doesn't have ucred' in last night's -current X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 31 Aug 2004 04:27:08 -0000 In the freebsd-current mailing list I wrote last week: >Hi. Upgraded to -current last night and got the following panic. The panic >seems to be fairly repeatable and is triggered by a minute or so's worth of >database activity with mysqld. The version of mysql is 3.23.58_1 from ports, >linked against libpthread; using libmap.conf to force usage of libc_r avoids >the panic, so it's definitely a thread/KSE issue. (Also note the oddness >of the backtrace in the frames between doreti_ast and sched_switch -- is this >normal, is the stack mangled, or is gdb just hallucinating?) My kernel config I've had a chance to do a bit more followup on this, and have found the following: 1) The bug is still there in -current as of last night. 2) I can fairly reliably reproduce the bug by starting mysqld (with it using libpthread.so) and start several mysqldumps of the same database as near simultaneously as possible. Got a coredump from this most recent attempt, about which more below. 3) Yeah, apparently gdb was a little confused w.r.t. the stack backtrace, as a backtrace from DDB on the console at the time of the panic looks different, and more plausible: the DDB backtrace shows that thread_alloc_spare() called crhold() (which it does), whereas the gdb backtrace is under the delusion that thread_alloc_spare() calls thr_suspend(). 4) Looking at the (sane) DDB backtrace as well as poking around in upper frames, the final sequence of events leading to trouble is now clearer: 1. ast calls thread_user_enter(). The thread passed to thread_user_enter, thread 0xc28c39a0, has a NULL td_ucred pointer. (Why? Good question.) 2. thread_user_enter() calls thread_alloc_spare(). 3. thread_alloc_spare() does its business, creating another thread, and at the end tries to give the new thread a reference to the orig. thread's cred by calling crhold(). Unfortunately, the orig. thread's td_ucred is NULL. 4. crhold() cheerfully takes the NULL pointer offered to it and tries to lock the mutex pointed to by the cred structure at 0x00000000. This causes an immediate segfault, hence the trap 12. 5. trap() goes about its business, notices the current thread has no td_ucred, and panics. So the real question is how come that thread had a NULL td_ucred when it got there? Wish I knew. If it'll help anyone figure this out, I had ktr(4) enabled and logging a bunch of stuff (KTR_GEN|KTR_LOCK|KTR_PROC) and can supply a ktrdump from that core file; I didn't include it here because it's rather long. Script started on Mon Aug 30 22:01:46 2004 ichotolot# gdb6 -k kernel.debug /var/crash/vmcore.106 GNU gdb 20040803 [GDB v6.x for FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-portbld-freebsd6.0"... panic: kernel trap doesn't have ucred panic messages: --- panic: kernel trap doesn't have ucred cpuid = 0 KDB: stack backtrace: kdb_backtrace(100,c28c36e0,c,c28c36e0,0) at kdb_backtrace+0x29 panic(c083022d,6c,0,0,df512c9c) at panic+0x114 trap(c0600018,c0970010,10,c28c36e0,c28c36e0) at trap+0x2f0 calltrap() at calltrap+0x5 --- trap 0xc, eip = 0xc060e8e3, esp = 0xdf512cbc, ebp = 0xdf512ccc --- crhold(0,c28c39e4,c4,c2886e40,c28c36e0) at crhold+0x13 thread_alloc_spare(c28c36e0) at thread_alloc_spare+0x3b thread_user_enter(c25e2c40,c28c36e0) at thread_user_enter+0x63 ast(df512d48) at ast+0x94 doreti_ast() at doreti_ast+0x17 boot() called on cpu#0 Uptime: 1h22m58s Dumping 638 MB [CTRL-C to abort] 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 --- #0 doadump () at pcpu.h:159 159 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); doadump () at pcpu.h:159 159 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); (kgdb) bt #0 doadump () at pcpu.h:159 #1 0xc06115dc in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:384 #2 0xc06118f5 in panic (fmt=0xc083022d "kernel trap doesn't have ucred") at /usr/src/sys/kern/kern_shutdown.c:540 #3 0xc07a187c in trap (frame= {tf_fs = -1067450344, tf_es = -1063845872, tf_ds = 16, tf_edi = -1030998304, tf_esi = -1030998304, tf_ebp = -548328244, tf_isp = -548328280, tf_ebx = 0, tf_edx = 4, tf_ecx = 0, tf_eax = -1034015680, tf_trapno = 12, tf_err = 0, tf_eip = -1067390749, tf_cs = 8, tf_eflags = 66118, tf_esp = 0, tf_ss = -1065293200}) at /usr/src/sys/i386/i386/trap.c:413 #4 0xc0790bda in calltrap () at /usr/src/sys/i386/i386/exception.s:140 #5 0xc0600018 in thr_suspend (td=0xc28c36e0, uap=0x0) at /usr/src/sys/kern/kern_thr.c:282 #6 0xc060189f in thread_alloc_spare (td=0x6cb) at /usr/src/sys/kern/kern_kse.c:1021 #7 0xc0601bbf in thread_user_enter (p=0x0, td=0xc28c39a0) at /usr/src/sys/kern/kern_kse.c:1170 #8 0xc062f640 in ast (framep=0xdf512d48) at /usr/src/sys/kern/subr_trap.c:166 #9 0xc079154d in doreti_ast () at /usr/src/sys/i386/i386/exception.s:294 #10 0xdf512d48 in ?? () #11 0x0000002f in ?? () #12 0x0000002f in ?? () #13 0x0000002f in ?? () #14 0x0b13f400 in ?? () #15 0x08340000 in ?? () #16 0xbfbfe3d8 in ?? () #17 0xdf512d74 in ?? () #18 0x2033d81c in ?? () #19 0x0b108a00 in ?? () #20 0x0833a900 in ?? () #21 0x00000000 in ?? () #22 0x00000016 in ?? () #23 0x00000002 in ?? () #24 0x20339d2f in ?? () #25 0x0000001f in ?? () #26 0x00000202 in ?? () #27 0xbfbfe3ac in ?? () #28 0x0000002f in ?? () #29 0x00000000 in ?? () #30 0x00000000 in ?? () #31 0x00000000 in ?? () #32 0x00000000 in ?? () #33 0x1ef8e000 in ?? () #34 0xc25e4c60 in ?? () #35 0xc1ca05f0 in ?? () #36 0xdf512c9c in ?? () #37 0xdf512c84 in ?? () #38 0xc28c36e0 in ?? () #39 0xc062141b in sched_switch (td=0x2033d81c, newtd=0xb13f400) ---Type to continue, or q to quit---q at /usr/src/sys/Quit (kgdb) fr 7 #7 0xc0601bbf in thread_user_enter (p=0x0, td=0xc28c39a0) at /usr/src/sys/kern/kern_kse.c:1170 1170 thread_alloc_spare(td); (kgdb) p td $1 = (struct thread *) 0xc28c39a0 (kgdb) p *td $2 = {td_proc = 0xc25e2c40, td_ksegrp = 0x0, td_plist = { tqe_next = 0xc2778420, tqe_prev = 0xc28c32c8}, td_kglist = { tqe_next = 0x0, tqe_prev = 0xc28c32d0}, td_slpq = {tqe_next = 0xc200ab00, tqe_prev = 0xc200a178}, td_lockq = {tqe_next = 0x0, tqe_prev = 0x0}, td_runq = {tqe_next = 0x0, tqe_prev = 0xc08726cc}, td_selq = { tqh_first = 0x0, tqh_last = 0xc28c39d0}, td_sleepqueue = 0xc1fbace0, td_turnstile = 0xc23c0bc0, td_tid = 100205, td_flags = 0, td_inhibitors = 0, td_pflags = 0, td_last_kse = 0x0, td_kse = 0x0, td_dupfd = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 0 '\0', td_oncpu = 0 '\0', td_locks = 0, td_blocked = 0x0, td_ithd = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_mailbox = 0x0, td_ucred = 0x0, td_standin = 0x0, td_prticks = 0, td_upcall = 0x0, td_sticks = 0, td_uuticks = 0, td_usticks = 0, td_intrval = 0, td_oldsigmask = {__bits = { 0, 0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_siglist = { __bits = {0, 0, 0, 0}}, td_waitset = 0x0, td_umtx = {tqe_next = 0x0, tqe_prev = 0x0}, td_generation = 0, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0}, td_kflags = 0, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_base_pri = 160 ' ', td_priority = 160 ' ', td_pcb = 0xdf518da0, td_state = TDS_INACTIVE, td_retval = {1, 137600624}, td_slpcallout = {c_links = {sle = {sle_next = 0xc28c33dc}, tqe = { tqe_next = 0xc28c33dc, tqe_prev = 0xce9ed370}}, c_time = 500970, c_arg = 0xc28c39a0, c_func = 0, c_flags = 8}, td_frame = 0xdf518d48, td_kstack_obj = 0xc2895948, td_kstack = 3746656256, td_kstack_pages = 2, td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0, td_critnest = 1, td_md = {md_savecrit = 582}, td_sched = 0xc28c3afc} (kgdb) p td->td_ucred $3 = (struct ucred *) 0x0 (kgdb) q ichotolot# exit exit Script done on Mon Aug 30 22:04:29 2004