Date: Mon, 30 Dec 2013 18:34:12 +0200
From: Konstantin Belousov
To: Mark Johnston
Cc: freebsd-current@freebsd.org
Subject: Re: smp_rendezvous_cpus() deadlock
Message-ID: <20131230163412.GA59496@kib.kiev.ua>
In-Reply-To: <20131229213618.GA4990@charmander.home>

On Sun, Dec 29, 2013 at 04:36:18PM -0500, Mark Johnston wrote:
> Hello,
>
> While experimenting with some userland DTrace scripts, I seem to
> be consistently able to trigger a deadlock between smp_rendezvous_cpus()
> (called periodically by DTrace) and smp_targeted_tlb_shootdown():
>
> spin lock 0xffffffff80fe0620 (smp rendezvous) held by 0xfffff8000753b490 (tid 100059) too long
> panic: spin lock held too long
> [...]
> (gdb) bt
> #0 doadump (textdump=1) at pcpu.h:219
> #1 0xffffffff806387c7 in kern_reboot (howto=260) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:452
> #2 0xffffffff80638cd5 in vpanic (fmt=, ap=) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:759
> #3 0xffffffff80638d23 in panic (fmt=) at /usr/home/markj/src/freebsd/sys/kern/kern_shutdown.c:688
> #4 0xffffffff80624b68 in _mtx_lock_spin_cookie (c=, tid=, opts=, file=, line=)
>     at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:551
> #5 0xffffffff80624878 in __mtx_lock_spin_flags (c=, opts=0, file=0xffffffff80a1ca28 "/usr/home/markj/src/freebsd/sys/kern/subr_smp.c", line=498) at /usr/home/markj/src/freebsd/sys/kern/kern_mutex.c:279
> #6 0xffffffff8067eba3 in smp_rendezvous_cpus (setup_func=0xffffffff8067eae0, action_func=0xffffffff814e2d00, teardown_func=0xffffffff8067eae0,
>     arg=0x0) at /usr/home/markj/src/freebsd/sys/kern/subr_smp.c:498
> #7 0xffffffff814d5743 in dtrace_state_deadman (arg=0xfffff80007ee5c00) at /usr/home/markj/src/freebsd/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:13144
> #8 0xffffffff8064cf38 in softclock_call_cc (c=0xfffff80007ee5d40, cc=0xffffffff80fda080, direct=0) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:681
> #9 0xffffffff8064d2b7 in softclock (arg=) at /usr/home/markj/src/freebsd/sys/kern/kern_timeout.c:809
> #10 0xffffffff8060a053 in intr_event_execute_handlers (p=, ie=0xfffff80002958d00) at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1263
> #11 0xffffffff8060aa26 in ithread_loop (arg=0xfffff80002999f60) at /usr/home/markj/src/freebsd/sys/kern/kern_intr.c:1276
> #12 0xffffffff806071a4 in fork_exit (callout=0xffffffff8060a980, arg=0xfffff80002999f60, frame=0xfffffe0113b99ac0) at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
> #13 0xffffffff808d7fce in fork_trampoline () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605
>
> (kgdb) tid 100059
> [Switching to thread 67 (Thread 100059)]#0 0xffffffff808e1f08 in cpustop_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
> 1432 savectx(&stoppcbs[cpu]);
> (kgdb) bt
> #0 0xffffffff808e1f08 in cpustop_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1432
> #1 0xffffffff808e1ecf in ipi_nmi_handler () at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1417
> #2 0xffffffff808f1e02 in trap (frame=0xfffffe0113b68f30) at /usr/home/markj/src/freebsd/sys/amd64/amd64/trap.c:208
> #3 0xffffffff808d7ed3 in nmi_calltrap () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:504
> #4 0xffffffff808e1b39 in smp_targeted_tlb_shootdown (mask={__bits = {0}}, vector=, pmap=, addr1=, addr2=)
>     at /usr/home/markj/src/freebsd/sys/amd64/amd64/mp_machdep.c:1204
> #5 0xffffffff808e2f25 in pmap_invalidate_page (pmap=, va=) at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:1375
> #6 0xffffffff808ec3d5 in pmap_ts_referenced (m=0xfffff800bcfc78b8) at /usr/home/markj/src/freebsd/sys/amd64/amd64/pmap.c:5743
> #7 0xffffffff808c8953 in vm_pageout () at /usr/home/markj/src/freebsd/sys/vm/vm_pageout.c:1366
> #8 0xffffffff806071a4 in fork_exit (callout=0xffffffff808c7930, arg=0x0, frame=0xfffffe011bfabac0) at /usr/home/markj/src/freebsd/sys/kern/kern_fork.c:977
> #9 0xffffffff808d7fce in fork_trampoline () at /usr/home/markj/src/freebsd/sys/amd64/amd64/exception.S:605
>
> Indeed, there is a comment above the definition of smp_ipi_mtx in
> subr_smp.c to the effect that a deadlock can occur if, say, the target
> CPU of smp_targeted_tlb_shootdown() is spinning on smp_ipi_mtx. Is there
> any reason that this deadlock doesn't happen more often in practice? Is
> it possible to spin on smp_ipi_mtx without disabling interrupts, as that
> doesn't seem to be necessary in this case?

IMO, what is wrong here is that smp_rendezvous_cpus() is called from the
wrong context. As you noted yourself, interrupts are disabled in the
caller, and doing this operation in interrupt context is not correct.
Note that smp_tlb_shootdown() and smp_targeted_tlb_shootdown() both
assert that interrupts are enabled.

IMO a similar assert would be useful for mtx_lock_spin(&smp_ipi_mtx),
but adding it in a non-ugly way does not seem to be trivial. Perhaps a
flag for mtx_init() could force the check for a given mutex, but again,
there is no MI primitive to assert that local interrupts are enabled on
the CPU.
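
To make the per-mutex check idea above concrete, here is a minimal userland sketch in C. It is only a toy model under stated assumptions, not kernel code: TOY_MTX_NEEDINTR, struct toy_mtx, the toy_mtx_*() functions and the toy_intr_enabled variable are all hypothetical names invented for illustration, and none of them exist in FreeBSD. The boolean stands in for the machine-dependent "are local interrupts enabled" query that, as noted above, has no MI equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option bit; NOT a real mtx_init(9) flag. */
#define TOY_MTX_NEEDINTR 0x1

/*
 * Stand-in for the local CPU's interrupt-enable state.  A kernel would
 * need an MD query here, since no MI primitive exists for this.
 */
bool toy_intr_enabled = true;

struct toy_mtx {
	const char *name;
	int opts;		/* TOY_MTX_* option bits */
	bool held;
	bool saved_intr;	/* interrupt state to restore on unlock */
};

void
toy_mtx_init(struct toy_mtx *m, const char *name, int opts)
{
	assert(m != NULL);
	m->name = name;
	m->opts = opts;
	m->held = false;
	m->saved_intr = false;
}

/*
 * Returns true if the lock was taken.  Returns false when the mutex was
 * initialized with TOY_MTX_NEEDINTR and the caller already runs with
 * interrupts disabled; a kernel would presumably panic at this point.
 */
bool
toy_mtx_lock_spin(struct toy_mtx *m)
{
	assert(m != NULL && !m->held);
	if ((m->opts & TOY_MTX_NEEDINTR) != 0 && !toy_intr_enabled)
		return (false);
	/* Model a spin mutex: interrupts stay disabled while it is held. */
	m->saved_intr = toy_intr_enabled;
	toy_intr_enabled = false;
	m->held = true;
	return (true);
}

void
toy_mtx_unlock_spin(struct toy_mtx *m)
{
	assert(m != NULL && m->held);
	m->held = false;
	toy_intr_enabled = m->saved_intr;
}
```

In this model, a caller that enters toy_mtx_lock_spin() with toy_intr_enabled already false is rejected up front, instead of spinning forever while another CPU waits for it to acknowledge an IPI. Mutexes initialized without the flag keep the usual behavior, so the check would be opt-in per mutex.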