Date:      Mon, 30 Dec 2013 18:34:12 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: smp_rendezvous_cpus() deadlock
Message-ID:  <20131230163412.GA59496@kib.kiev.ua>
In-Reply-To: <20131229213618.GA4990@charmander.home>
References:  <20131229213618.GA4990@charmander.home>

On Sun, Dec 29, 2013 at 04:36:18PM -0500, Mark Johnston wrote:
> Hello,
>
> While experimenting with some userland DTrace scripts, I seem to
> be consistently able to trigger a deadlock between smp_rendezvous_cpus()
> (called periodically by DTrace) and smp_targeted_tlb_shootdown():
>
> spin lock 0xffffffff80fe0620 (smp rendezvous) held by 0xfffff8000753b490 (tid 100059) too long
> panic: spin lock held too long
> [...]
> (gdb) bt
> #0  doadump (textdump=1) at pcpu.h:219
> #1  0xffffffff806387c7 in kern_reboot (howto=260) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:452
> #2  0xffffffff80638cd5 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:759
> #3  0xffffffff80638d23 in panic (fmt=<value optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:688
> #4  0xffffffff80624b68 in _mtx_lock_spin_cookie (c=<value optimized out>, tid=<value optimized out>, opts=<value optimized out>, file=<value optimized out>, line=<value optimized out>)
>     at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:551
> #5  0xffffffff80624878 in __mtx_lock_spin_flags (c=<value optimized out>, opts=0, file=0xffffffff80a1ca28 "/usr/home/markj/src/freebsd/sys/kern/subr_smp.c", line=498) at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:279
> #6  0xffffffff8067eba3 in smp_rendezvous_cpus (setup_func=0xffffffff8067eae0 <smp_no_rendevous_barrier>, action_func=0xffffffff814e2d00 <dtrace_sync_func>, teardown_func=0xffffffff8067eae0 <smp_no_rendevous_barrier>,
>         arg=0x0) at /usr/home/markj/src/freebsd/sys/kern/subr_smp.c:498
> #7  0xffffffff814d5743 in dtrace_state_deadman (arg=0xfffff80007ee5c00) at /usr/home/markj/src/freebsd/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:13144
> #8  0xffffffff8064cf38 in softclock_call_cc (c=0xfffff80007ee5d40, cc=0xffffffff80fda080, direct=0) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:681
> #9  0xffffffff8064d2b7 in softclock (arg=<value optimized out>) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:809
> #10 0xffffffff8060a053 in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff80002958d00) at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1263
> #11 0xffffffff8060aa26 in ithread_loop (arg=0xfffff80002999f60) at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1276
> #12 0xffffffff806071a4 in fork_exit (callout=0xffffffff8060a980 <ithread_loop>, arg=0xfffff80002999f60, frame=0xfffffe0113b99ac0) at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
> #13 0xffffffff808d7fce in fork_trampoline () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605
>
> (kgdb) tid 100059
> [Switching to thread 67 (Thread 100059)]#0  0xffffffff808e1f08 in cpustop_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
> 1432            savectx(&stoppcbs[cpu]);
> (kgdb) bt
> #0  0xffffffff808e1f08 in cpustop_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
> #1  0xffffffff808e1ecf in ipi_nmi_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1417
> #2  0xffffffff808f1e02 in trap (frame=0xfffffe0113b68f30) at /usr/home/markj/src/freebsd/sys/amd64/amd64/trap.c:208
> #3  0xffffffff808d7ed3 in nmi_calltrap () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:504
> #4  0xffffffff808e1b39 in smp_targeted_tlb_shootdown (mask={__bits = {0}}, vector=<value optimized out>, pmap=<value optimized out>, addr1=<value optimized out>, addr2=<value optimized out>)
>     at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1204
> #5  0xffffffff808e2f25 in pmap_invalidate_page (pmap=<value optimized out>, va=<value optimized out>) at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:1375
> #6  0xffffffff808ec3d5 in pmap_ts_referenced (m=0xfffff800bcfc78b8) at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:5743
> #7  0xffffffff808c8953 in vm_pageout () at /usr/home/markj/src/freebsd/sys/vm/vm_pageout.c:1366
> #8  0xffffffff806071a4 in fork_exit (callout=0xffffffff808c7930 <vm_pageout>, arg=0x0, frame=0xfffffe011bfabac0) at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
> #9  0xffffffff808d7fce in fork_trampoline () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605
>
> Indeed, there is a comment above the definition of smp_ipi_mtx in
> subr_smp.c to the effect that a deadlock can occur if, say, the target
> CPU of smp_targeted_tlb_shootdown() is spinning on smp_ipi_mtx. Is there
> any reason that this deadlock doesn't happen more often in practice? Is
> it possible to spin on smp_ipi_mtx without disabling interrupts, as that
> doesn't seem to be necessary in this case?

IMO, what is wrong here is that smp_rendezvous_cpus() is called from the
wrong context.  As you noted yourself, interrupts are disabled in the
caller, and doing this operation in interrupt context is not correct.

Note that smp_tlb_shootdown() and smp_targeted_tlb_shootdown() both
assert that interrupts are enabled.  IMO a similar assert would be useful
for mtx_lock_spin(&smp_ipi_mtx), but adding it in a non-ugly way does not
seem to be trivial.  Perhaps a flag for mtx_init() that forces the check
for a given mutex would do, but again, there is no MI (machine-independent)
primitive to assert that local interrupts are enabled on the CPU.
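
To make the per-mutex flag idea concrete, here is a rough userland sketch, not kernel code and not a proposed patch.  Everything in it is made up for illustration: SIM_MTX_NEED_INTR stands in for the hypothetical mtx_init() flag, sim_intr_enabled stands in for the missing MI "are local interrupts enabled" query, and the sim_mtx_* functions only model the check.  A real implementation would presumably KASSERT or panic instead of returning false.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative option bits; SIM_MTX_NEED_INTR is hypothetical. */
#define SIM_MTX_SPIN      0x01	/* models a spin mutex */
#define SIM_MTX_NEED_INTR 0x02	/* require interrupts enabled at lock time */

struct sim_mtx {
	const char *name;
	int         opts;
	bool        held;
	bool        saved_intr;	/* interrupt state to restore on unlock */
};

/* Stand-in for the per-CPU interrupt state; no MI query exists for it. */
static bool sim_intr_enabled = true;

static void
sim_mtx_init(struct sim_mtx *m, const char *name, int opts)
{
	m->name = name;
	m->opts = opts;
	m->held = false;
	m->saved_intr = true;
}

/*
 * Returns true if the lock was taken.  Returns false where the kernel
 * version would fire its assertion: the mutex was flagged and the caller
 * entered with interrupts already disabled.
 */
static bool
sim_mtx_lock_spin(struct sim_mtx *m)
{
	if ((m->opts & SIM_MTX_NEED_INTR) != 0 && !sim_intr_enabled) {
		fprintf(stderr, "%s: lock attempted with interrupts disabled\n",
		    m->name);
		return (false);
	}
	/* A held spin mutex keeps local interrupts disabled. */
	m->saved_intr = sim_intr_enabled;
	sim_intr_enabled = false;
	m->held = true;
	return (true);
}

static void
sim_mtx_unlock_spin(struct sim_mtx *m)
{
	m->held = false;
	sim_intr_enabled = m->saved_intr;
}
```

With a mutex initialized as sim_mtx_init(&m, "smp rendezvous", SIM_MTX_SPIN | SIM_MTX_NEED_INTR), a lock attempt made while sim_intr_enabled is false is rejected up front, which is the situation the assertion is meant to catch before it can turn into a silent spin.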
