Date: Thu, 16 Sep 2010 22:10:57 +0300
From: Kostik Belousov
To: Matthew Jacob
Message-ID: <20100916191057.GF2389@deviant.kiev.zoral.com.ua>
References: <4C92694D.1070705@feral.com>
In-Reply-To: <4C92694D.1070705@feral.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: race conditions for destroying and opening a dev

On Thu, Sep 16, 2010 at 12:00:29PM -0700, Matthew Jacob wrote:
>
> Has anyone seen this scenario before? I am seeing it in RELENG_7, but
> the code in question exists through to head.
>
> Thread 1:
>
> (kgdb) where
> #0 sched_switch (td=0xffffff003a04ea80, newtd=0xffffff00210b4000, flags=Variable "flags" is not available.
> ) at ../../../kern/sched_ule.c:1944
> #1 0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450
> #2 0xffffffff80402399 in sleepq_switch (wchan=0xffffff8413b50b60) at ../../../kern/subr_sleepqueue.c:497
> #3 0xffffffff80402e8c in sleepq_timedwait (wchan=0xffffff8413b50b60) at ../../../kern/subr_sleepqueue.c:615
> #4 0xffffffff803b682d in _sleep (ident=0xffffff8413b50b60, lock=0xffffffff80b0ee00, priority=76, wmesg=0xffffffff806583bb "devdrn", timo=100) at ../../../kern/kern_synch.c:228
> #5 0xffffffff8037640c in destroy_devl (dev=0xffffff003aaf0000) at ../../../kern/kern_conf.c:874
> #6 0xffffffff80376759 in destroy_dev (dev=0xffffff003aaf0000) at ../../../kern/kern_conf.c:916
> #7 0xffffffff8034c939 in g_dev_orphan (cp=0xffffff003a544800) at ../../../geom/geom_dev.c:438
> #8 0xffffffff803506a0 in g_run_events () at ../../../geom/geom_event.c:164
> #9 0xffffffff80351f1c in g_event_procbody () at ../../../geom/geom_kern.c:141
> #10 0xffffffff8038a73a in fork_exit (callout=0xffffffff80351eb0, arg=0x0, frame=0xffffff8413b50c80) at ../../../kern/kern_fork.c:829
> #11 0xffffffff805a747e in fork_trampoline () at ../../../amd64/amd64/exception.S:564
> #12 0x0000000000000000 in ?? ()
>
> This thread is waiting on the threadcount to go away, i.e., the last
> close of the device to occur ("da16" in this case).
>
> Thread 2:
>
> (kgdb) where
> #0 sched_switch (td=0xffffff009bb4ca80, newtd=0xffffff003af43380, flags=Variable "flags" is not available.
> ) at ../../../kern/sched_ule.c:1944
> #1 0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450
> #2 0xffffffff80402399 in sleepq_switch (wchan=0xffffffff80b0e040) at ../../../kern/subr_sleepqueue.c:497
> #3 0xffffffff80402f84 in sleepq_wait (wchan=0xffffffff80b0e040) at ../../../kern/subr_sleepqueue.c:580
> #4 0xffffffff803b5385 in _sx_xlock_hard (sx=0xffffffff80b0e040, tid=18446742976810240640, opts=Variable "opts" is not available.
> ) at ../../../kern/kern_sx.c:562
> #5 0xffffffff803b5731 in _sx_xlock (sx=0xffffffff80b0e040, opts=0, file=0xffffffff80652d27 "../../../geom/geom_dev.c", line=196) at sx.h:154
> #6 0xffffffff8034d1bc in g_dev_open (dev=0xffffff003aaf0000, flags=1, fmt=Variable "fmt" is not available.
> ) at ../../../geom/geom_dev.c:196
> #7 0xffffffff80333741 in devfs_open (ap=0xffffff841dea88b0) at ../../../fs/devfs/devfs_vnops.c:902
> #8 0xffffffff80601daf in VOP_OPEN_APV (vop=0xffffffff8089fb80, a=0xffffff841dea88b0) at vnode_if.c:371
> #9 0xffffffff80467246 in vn_open_cred (ndp=0xffffff841dea8a00, flagp=0xffffff841dea894c, cmode=Variable "cmode" is not available.
> ) at vnode_if.h:199
> #10 0xffffffff80463770 in kern_open (td=0xffffff009bb4ca80, path=0x5114a0, pathseg=Variable "pathseg" is not available.
> ) at ../../../kern/vfs_syscalls.c:1054
> #11 0xffffffff805c599e in syscall (frame=0xffffff841dea8c80) at ../../../amd64/amd64/trap.c:911
> #12 0xffffffff805a723b in Xfast_syscall () at ../../../amd64/amd64/exception.S:349
> #13 0x00000008009a219c in ?? ()
>
> This thread was opening the device, bumped the refcount, but then wedged
> on the geom topology lock .....
>
> the refcount field is protected under devmtx....
>
> Anyone seen this?
>
> I'm half inclined to either add in CDP_SCHED_DTR when one calls
> destroy_dev, or make dev_refthread look at CDP_ACTIVE, leaning more
> toward the latter.
>
> Any thoughts on this?

And who owns the topology lock? Is it thread 1?

destroy_devl() clears si_devsw for the departing cdev, and *refthread()
checks si_devsw against NULL as an indicator of device destruction in
progress.

I think that this situation is what destroy_dev_sched(9) was created for.
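
The si_devsw check described above can be illustrated with a small
userland model. This is only a sketch under stated assumptions: every
toy_* name is hypothetical and not a kernel symbol, the real code holds
devmtx around these fields, and the model is single-threaded, so it shows
the ordering of events rather than the actual logic in kern_conf.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct cdevsw and struct cdev. */
struct toy_cdevsw {
	const char *name;
};

struct toy_cdev {
	struct toy_cdevsw *si_devsw;	/* NULL once destruction has begun */
	unsigned long si_threadcount;	/* references held by threads */
};

/*
 * Models *refthread(): hand out a reference only while si_devsw is
 * non-NULL. A NULL return means "destruction in progress".
 */
struct toy_cdevsw *
toy_refthread(struct toy_cdev *dev)
{
	struct toy_cdevsw *csw = dev->si_devsw;

	if (csw != NULL)
		dev->si_threadcount++;
	return (csw);
}

/* Drops a reference previously taken with toy_refthread(). */
void
toy_relthread(struct toy_cdev *dev)
{
	assert(dev->si_threadcount > 0);
	dev->si_threadcount--;
}

/* Models the step where the destroying side clears si_devsw. */
void
toy_begin_destroy(struct toy_cdev *dev)
{
	dev->si_devsw = NULL;
}

/*
 * Returns 1 when the destroying side may stop waiting: si_devsw is
 * cleared and no thread still holds a reference.
 */
int
toy_can_finish_destroy(const struct toy_cdev *dev)
{
	return (dev->si_devsw == NULL && dev->si_threadcount == 0);
}
```

In terms of the backtraces, thread 2 corresponds to a reference taken
before toy_begin_destroy(): later opens are refused, but the destroying
thread has to keep waiting (the "devdrn" sleep) until that earlier
reference is released.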