Date:      Tue, 17 Jan 2012 21:57:35 +0400
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        mdf@FreeBSD.org
Cc:        current@FreeBSD.org
Subject:   Re: new panic in cpu_reset() with WITNESS
Message-ID:  <20120117175735.GJ12760@FreeBSD.org>
In-Reply-To: <CAMBSHm_iuFwV5Hm7ArzzQbyf1mKgjmC=FGkhqGMdYgCzuJcZZg@mail.gmail.com>
References:  <20120117110242.GD12760@glebius.int.ru> <CAMBSHm_iuFwV5Hm7ArzzQbyf1mKgjmC=FGkhqGMdYgCzuJcZZg@mail.gmail.com>

On Tue, Jan 17, 2012 at 07:34:23AM -0800, mdf@freebsd.org wrote:
m> 2012/1/17 Gleb Smirnoff <glebius@freebsd.org>:
m> > A new panic has been introduced somewhere between
m> > r229851 and r229932; it happens on shutdown if the
m> > kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.
m> >
m> > Uptime: 1h0m17s
m> > Rebooting...
m> > panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500
m> > cpuid = 0
m> > KDB: enter: panic
m> > [ thread pid 1 tid 100001 ]
m> > Stopped at      kdb_enter+0x3b: movq    $0,0x514d32(%rip)
m> > db>
m> > db> bt
m> > Tracing pid 1 tid 100001 td 0xfffffe0001d5e000
m> > kdb_enter() at kdb_enter+0x3b
m> > panic() at panic+0x1c7
m> > _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
m> > cnputs() at cnputs+0x7a
m> > putchar() at putchar+0x11f
m> > kvprintf() at kvprintf+0x83
m> > vprintf() at vprintf+0x85
m> > printf() at printf+0x67
m> > witness_checkorder() at witness_checkorder+0x773
m> > _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
m> > uart_cnputc() at uart_cnputc+0x3e
m> > cnputc() at cnputc+0x4c
m> > cnputs() at cnputs+0x26
m> > putchar() at putchar+0x11f
m> > kvprintf() at kvprintf+0x83
m> > vprintf() at vprintf+0x85
m> > printf() at printf+0x67
m> > cpu_reset() at cpu_reset+0x81
m> > kern_reboot() at kern_reboot+0x3a5
m> > sys_reboot() at sys_reboot+0x42
m> > amd64_syscall() at amd64_syscall+0x39e
m> > Xfast_syscall() at Xfast_syscall+0xf7
m> > --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffffffd6d8, rbp = 0x49 ---
m> > db>
m> > db> show locks
m> > exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff809bc560) locked @ /usr/src/head/sys/kern/kern_module.c:101
m> > exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 (0xffffffff80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542
m> > db>
m> >
m> > So the problem is that we are holding the smp rendezvous mutex during cpu_reset().
m> > No mutexes should be obtained after it. However, since cpu_reset() does printing,
m> > we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in
m> > subr_witness.c as a mutex to obtain before smp rendezvous, so this triggers
m> > yet another printf from witness, which finally panics due to recursing on
m> > cnputs_mtx.
m> 
m> At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't
m> seem possible to fit it into the hierarchy in any sane way, since a
m> print can come from basically anywhere.
m> 
m> If anyone has a better fix, that'd be great, but I haven't been able
m> to think of one.

Setting NO_WITNESS on cnputs_mtx won't help for the above problem, IMHO.
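
To illustrate the call chain in the backtrace, here is a rough userland
sketch in C. It is not kernel code: struct toy_mtx, toy_lock(),
toy_checkorder(), toy_printf() and simulate_reset() are all hypothetical
names made up for this example, and the "witness" flag only loosely models
marking a mutex as NO_WITNESS. The one ordering rule it encodes (uart_hwmtx
before smp rendezvous) is taken from the description above. Where the real
kernel panics on recursion, the toy just counts it.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy, single-threaded stand-in for a non-recursive spin mutex. */
struct toy_mtx {
	const char *name;
	bool held;
	bool witness;	/* false loosely models a NO_WITNESS mutex */
};

static struct toy_mtx smp_rendezvous = { "smp rendezvous", false, true };
static struct toy_mtx uart_hwmtx = { "uart_hwmtx", false, true };
static struct toy_mtx cnputs_mtx = { "cnputs_mtx", false, true };

static int recursions;		/* re-acquisitions of a held mutex */
static int order_warnings;	/* order violations reported */

static void toy_printf(const char *msg);

/* Assumed rule: uart_hwmtx must be obtained before smp rendezvous. */
static void
toy_checkorder(struct toy_mtx *m)
{
	if (!m->witness)
		return;
	if (m == &uart_hwmtx && smp_rendezvous.held) {
		order_warnings++;
		/* The checker reports via printf, re-entering the console. */
		toy_printf("lock order reversal\n");
	}
}

static bool
toy_lock(struct toy_mtx *m)
{
	toy_checkorder(m);
	if (m->held) {
		recursions++;	/* the real kernel panics at this point */
		return (false);
	}
	m->held = true;
	return (true);
}

static void
toy_unlock(struct toy_mtx *m)
{
	m->held = false;
}

/* Console path: cnputs_mtx first, then uart_hwmtx, as in the trace. */
static void
toy_printf(const char *msg)
{
	if (!toy_lock(&cnputs_mtx))
		return;
	if (toy_lock(&uart_hwmtx)) {
		fputs(msg, stdout);
		toy_unlock(&uart_hwmtx);
	}
	toy_unlock(&cnputs_mtx);
}

/* Print once while holding smp rendezvous; return recursion count. */
int
simulate_reset(bool cnputs_witness)
{
	recursions = 0;
	order_warnings = 0;
	cnputs_mtx.witness = cnputs_witness;
	toy_lock(&smp_rendezvous);
	toy_printf("cpu_reset: message\n");
	toy_unlock(&smp_rendezvous);
	return (recursions);
}
```

In this toy, simulate_reset(true) and simulate_reset(false) both count one
recursion on cnputs_mtx, because the order check fires when uart_hwmtx is
taken, not when cnputs_mtx is. That is only one reading of the trace, under
the assumptions stated above.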

-- 
Totus tuus, Glebius.


