Date:      Thu, 16 Sep 2010 22:10:57 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Matthew Jacob <mj@feral.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: race conditions for destroying and opening a dev
Message-ID:  <20100916191057.GF2389@deviant.kiev.zoral.com.ua>
In-Reply-To: <4C92694D.1070705@feral.com>
References:  <4C92694D.1070705@feral.com>


On Thu, Sep 16, 2010 at 12:00:29PM -0700, Matthew Jacob wrote:
>
> Has anyone seen this scenario before? I am seeing it in RELENG_7, but
> the code in question exists through to head.
>
> Thread 1:
>
> (kgdb) where
> #0  sched_switch (td=0xffffff003a04ea80, newtd=0xffffff00210b4000, flags=Variable "flags" is not available.
> ) at ../../../kern/sched_ule.c:1944
> #1  0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450
> #2  0xffffffff80402399 in sleepq_switch (wchan=0xffffff8413b50b60) at ../../../kern/subr_sleepqueue.c:497
> #3  0xffffffff80402e8c in sleepq_timedwait (wchan=0xffffff8413b50b60) at ../../../kern/subr_sleepqueue.c:615
> #4  0xffffffff803b682d in _sleep (ident=0xffffff8413b50b60, lock=0xffffffff80b0ee00, priority=76, wmesg=0xffffffff806583bb "devdrn", timo=100) at ../../../kern/kern_synch.c:228
> #5  0xffffffff8037640c in destroy_devl (dev=0xffffff003aaf0000) at ../../../kern/kern_conf.c:874
> #6  0xffffffff80376759 in destroy_dev (dev=0xffffff003aaf0000) at ../../../kern/kern_conf.c:916
> #7  0xffffffff8034c939 in g_dev_orphan (cp=0xffffff003a544800) at ../../../geom/geom_dev.c:438
> #8  0xffffffff803506a0 in g_run_events () at ../../../geom/geom_event.c:164
> #9  0xffffffff80351f1c in g_event_procbody () at ../../../geom/geom_kern.c:141
> #10 0xffffffff8038a73a in fork_exit (callout=0xffffffff80351eb0 <g_event_procbody at ../../../geom/geom_kern.c:132>, arg=0x0, frame=0xffffff8413b50c80) at ../../../kern/kern_fork.c:829
> #11 0xffffffff805a747e in fork_trampoline () at ../../../amd64/amd64/exception.S:564
> #12 0x0000000000000000 in ?? ()
>
> This thread is waiting for the thread count to drain, i.e., for the last
> close of the device ("da16" in this case) to occur.
>
> Thread 2:
>
> (kgdb) where
> #0  sched_switch (td=0xffffff009bb4ca80, newtd=0xffffff003af43380, flags=Variable "flags" is not available.
> ) at ../../../kern/sched_ule.c:1944
> #1  0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450
> #2  0xffffffff80402399 in sleepq_switch (wchan=0xffffffff80b0e040) at ../../../kern/subr_sleepqueue.c:497
> #3  0xffffffff80402f84 in sleepq_wait (wchan=0xffffffff80b0e040) at ../../../kern/subr_sleepqueue.c:580
> #4  0xffffffff803b5385 in _sx_xlock_hard (sx=0xffffffff80b0e040, tid=18446742976810240640, opts=Variable "opts" is not available.
> ) at ../../../kern/kern_sx.c:562
> #5  0xffffffff803b5731 in _sx_xlock (sx=0xffffffff80b0e040, opts=0, file=0xffffffff80652d27 "../../../geom/geom_dev.c", line=196) at sx.h:154
> #6  0xffffffff8034d1bc in g_dev_open (dev=0xffffff003aaf0000, flags=1, fmt=Variable "fmt" is not available.
> ) at ../../../geom/geom_dev.c:196
> #7  0xffffffff80333741 in devfs_open (ap=0xffffff841dea88b0) at ../../../fs/devfs/devfs_vnops.c:902
> #8  0xffffffff80601daf in VOP_OPEN_APV (vop=0xffffffff8089fb80, a=0xffffff841dea88b0) at vnode_if.c:371
> #9  0xffffffff80467246 in vn_open_cred (ndp=0xffffff841dea8a00, flagp=0xffffff841dea894c, cmode=Variable "cmode" is not available.
> ) at vnode_if.h:199
> #10 0xffffffff80463770 in kern_open (td=0xffffff009bb4ca80, path=0x5114a0 <Address 0x5114a0 out of bounds>, pathseg=Variable "pathseg" is not available.
> ) at ../../../kern/vfs_syscalls.c:1054
> #11 0xffffffff805c599e in syscall (frame=0xffffff841dea8c80) at ../../../amd64/amd64/trap.c:911
> #12 0xffffffff805a723b in Xfast_syscall () at ../../../amd64/amd64/exception.S:349
> #13 0x00000008009a219c in ?? ()
>
> This thread was opening the device and bumped the refcount, but then
> wedged on the GEOM topology lock.
>
> The refcount field is protected by devmtx.
>
> Anyone seen this?
>
> I'm half inclined to either add in CDP_SCHED_DTR when one calls
> destroy_dev, or make dev_refthread look at CDP_ACTIVE, leaning more
> toward the latter.
>
> Any thoughts on this?

And who owns the topology lock? Is it thread 1?
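
If thread 1 does hold the topology lock (it is running g_dev_orphan() from the GEOM event thread), the two backtraces form a wait cycle: thread 1 sleeps until thread 2 drops its thread reference, and thread 2 sleeps on the topology lock before it can drop that reference. The sketch below is a hypothetical, single-threaded user-space model of that cycle, not kernel code; the wf_* names and the "each thread waits on at most one other thread" simplification are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical wait-for model of the two backtraces; none of these
 * names exist in the FreeBSD kernel.  Each thread waits on at most one
 * other thread, so a deadlock shows up as a cycle in waits_on[].
 */
enum { WF_NOBODY = -1, WF_EVENT_TD = 0, WF_OPEN_TD = 1, WF_NTHREADS = 2 };

struct wf_state {
	int topology_owner;	/* thread holding the topology lock */
	int cdev_ref_holder;	/* thread holding a thread reference on da16 */
	int waits_on[WF_NTHREADS];
};

/* Thread 1: destroy_dev() sleeps ("devdrn") until the reference holder leaves. */
static void
wf_destroy_waits(struct wf_state *s, int td)
{
	assert(td >= 0 && td < WF_NTHREADS);
	s->waits_on[td] = s->cdev_ref_holder;
}

/*
 * Thread 2: g_dev_open() sleeps on the topology lock while holding its
 * reference (WF_NOBODY means the lock was free and it would proceed).
 */
static void
wf_open_waits(struct wf_state *s, int td)
{
	assert(td >= 0 && td < WF_NTHREADS);
	s->waits_on[td] = s->topology_owner;
}

/* Follow waits_on[] from td; getting back to td means a deadlock. */
static bool
wf_deadlocked(const struct wf_state *s, int td)
{
	int cur = s->waits_on[td];

	for (int steps = 0; cur != WF_NOBODY && steps < WF_NTHREADS; steps++) {
		if (cur == td)
			return (true);
		cur = s->waits_on[cur];
	}
	return (false);
}

/* Replay the interleaving from the report. */
static bool
wf_replay(bool event_td_holds_topology)
{
	struct wf_state s = {
		.topology_owner = event_td_holds_topology ? WF_EVENT_TD : WF_NOBODY,
		.cdev_ref_holder = WF_OPEN_TD,	/* the open already bumped the count */
		.waits_on = { WF_NOBODY, WF_NOBODY },
	};

	wf_open_waits(&s, WF_OPEN_TD);
	wf_destroy_waits(&s, WF_EVENT_TD);
	return (wf_deadlocked(&s, WF_EVENT_TD));
}
```

wf_replay(true) reports a cycle and wf_replay(false) does not, which is why the answer to the ownership question decides whether this is a deadlock or just a slow drain.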

destroy_devl() clears si_devsw for the departing cdev, and the
*refthread() functions check si_devsw against NULL as an indicator that
device destruction is in progress.
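
The si_devsw check only turns away callers that arrive after destruction has started; a thread that took its reference earlier, like thread 2, is still counted and must leave on its own. The sketch below is a hypothetical single-threaded model that loosely mirrors the struct cdev fields named above; the model_* names are invented, and the devmtx locking around each step is assumed rather than shown.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model loosely mirroring struct cdev; it is
 * single-threaded, so the devmtx locking around each step is omitted.
 */
struct model_cdevsw {
	const char *d_name;
};

struct model_cdev {
	struct model_cdevsw *si_devsw;	/* NULL once destruction has begun */
	int si_threadcount;		/* threads currently inside the driver */
};

/* Like dev_refthread(): admit a caller only while si_devsw is set. */
static struct model_cdevsw *
model_refthread(struct model_cdev *dev)
{
	struct model_cdevsw *csw = dev->si_devsw;

	if (csw != NULL)
		dev->si_threadcount++;
	return (csw);
}

/* Like dev_relthread(): drop the reference taken above. */
static void
model_relthread(struct model_cdev *dev)
{
	assert(dev->si_threadcount > 0);
	dev->si_threadcount--;
}

/*
 * First step of a destroy_devl()-like teardown: clear si_devsw so new
 * callers are refused, then report how many threads are still inside.
 * A nonzero result is what thread 1 sleeps on.
 */
static int
model_begin_destroy(struct model_cdev *dev)
{
	dev->si_devsw = NULL;
	return (dev->si_threadcount);
}
```

In this model, an opener that calls model_refthread() before model_begin_destroy() keeps si_threadcount at 1 even though every later model_refthread() returns NULL, which matches the state the two backtraces show.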

I think that this situation is what destroy_dev_sched(9) was created
for.
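
The idea behind destroy_dev_sched(9) is that the caller only schedules the destroy_dev() work and returns, so nothing sleeps on the thread count while the topology lock is held. The sketch below is a hypothetical user-space model of that deferral; the dtr_* names, the fixed-size queue, and polling instead of sleeping are all illustrative assumptions. It also assumes, in line with the CDP_SCHED_DTR suggestion quoted above, that a scheduled destroy refuses new references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DTR_QMAX 8

/* Hypothetical model; dtr_* names are illustrative, not kernel APIs. */
struct dtr_cdev {
	bool sched_dtr;		/* like CDP_SCHED_DTR: refuse new references */
	bool destroyed;
	int threadcount;
};

struct dtr_queue {
	struct dtr_cdev *q[DTR_QMAX];
	size_t n;
};

static bool
dtr_refthread(struct dtr_cdev *dev)
{
	if (dev->sched_dtr || dev->destroyed)
		return (false);
	dev->threadcount++;
	return (true);
}

static void
dtr_relthread(struct dtr_cdev *dev)
{
	assert(dev->threadcount > 0);
	dev->threadcount--;
}

/* Safe to call with the topology lock held: it only marks and queues. */
static void
dtr_destroy_sched(struct dtr_queue *dq, struct dtr_cdev *dev)
{
	assert(dq->n < DTR_QMAX);
	dev->sched_dtr = true;
	dq->q[dq->n++] = dev;
}

/*
 * Runs later in a context holding no GEOM locks.  The real destroy_dev()
 * would sleep until the thread count drains; this model polls instead and
 * keeps busy entries queued.  Returns how many entries are still pending.
 */
static size_t
dtr_run_queue(struct dtr_queue *dq)
{
	size_t kept = 0;

	for (size_t i = 0; i < dq->n; i++) {
		struct dtr_cdev *dev = dq->q[i];

		if (dev->threadcount == 0)
			dev->destroyed = true;
		else
			dq->q[kept++] = dev;
	}
	dq->n = kept;
	return (kept);
}
```

Replaying the report against this model: the opener takes a reference, the orphan path calls dtr_destroy_sched() and returns at once, so the event thread can release the topology lock; thread 2 then finishes its open and calls dtr_relthread(), and the next dtr_run_queue() pass completes the destroy.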
