Date:      Sat, 30 Jan 2010 15:51:26 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Pawel Jakub Dawidek <pjd@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, Alexander Motin <mav@freebsd.org>, FreeBSD-Current <freebsd-current@freebsd.org>, freebsd-geom@freebsd.org
Subject:   Re: Deadlock between GEOM and devfs device destroy and process exit.
Message-ID:  <20100130135126.GV3877@deviant.kiev.zoral.com.ua>
In-Reply-To: <20100130114451.GB1660@garage.freebsd.pl>
References:  <4B636812.8060403@FreeBSD.org> <20100130112749.GA1660@garage.freebsd.pl> <20100130114451.GB1660@garage.freebsd.pl>


On Sat, Jan 30, 2010 at 12:44:51PM +0100, Pawel Jakub Dawidek wrote:
> On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote:
> > On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote:
> > > Hi.
> > >
> > > Experimenting with SATA hot-plug I've found quite repeatable deadlock
> > > case. Problem observed when several SATA devices, opened via devfs,
> > > disappear at exactly same time. In my case, at time of unplugging SATA
> > > Port Multiplier with several disks beyond it. All I have to do is to run
> > > several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug
> > > multiplier. That causes predictable I/O errors and devices destruction.
> > > But with high probability several dd processes getting stuck in kernel.
> > [...]
> >
> > I observed the same thing yesterday while stress-testing HAST:
> >=20
> >  3659  2504  3659     0  DE+     GEOM top 0x8079a348 dd
> >  3658  2102  2102     0  DE+     GEOM top 0x8079a348 hastd
> >     2     0     0     0  DL      devdrn   0x85b1bc68 [g_event]
> >
> > Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path,
> > which is already held by the g_event thread.
>
> Maybe I'll add how I understand what's going on:
>
> GEOM calls destroy_dev() while holding the topology lock.
>
> Destroy_dev() wants to destroy device, but can't because there are
> threads that still have it open.
>
> The threads can't close it, because to close it they need the topology
> lock.
>
> The deadlock is quite obvious, IMHO.
>
> I believe the problem could be solved by dropping the topology lock in
> g_dev_orphan() when calling destroy_dev(dev), but it is hard to say if
> it is safe to drop the topology lock there. Maybe Poul-Henning could
> take a look.

As I already said, if you cannot drop the lock, destroy_dev_sched() is
designed to handle this. You should be careful not to allow any further
activity on the device scheduled for destruction.
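
To illustrate the idea, here is a small user-space model of deferred
destruction. It is only a sketch: every model_* name, the single boolean
standing in for the topology lock, and the fixed-size pending queue are
assumptions made for the example, not kernel code. In the kernel the
caller would simply use destroy_dev_sched(dev) in place of
destroy_dev(dev) and let devfs do the actual destroy later from another
context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dev {
    int  open_count;  /* openers that still hold the device */
    bool dying;       /* destruction has been scheduled */
    bool destroyed;   /* deferred destroy has completed */
};

static bool topology_locked;          /* stands in for the topology lock */
static struct model_dev *pending[8];  /* scheduled destructions */
static size_t npending;

/* Analogue of destroy_dev_sched(): safe with the lock held, never waits. */
void model_destroy_sched(struct model_dev *d)
{
    assert(npending < 8);
    d->dying = true;            /* refuse further activity from now on */
    pending[npending++] = d;
}

/* Opening a device that is scheduled for destruction is refused. */
bool model_open(struct model_dev *d)
{
    if (d->dying)
        return false;
    d->open_count++;
    return true;
}

/* Close needs the lock; false means "would block" or "nothing to close". */
bool model_close(struct model_dev *d)
{
    if (topology_locked || d->open_count == 0)
        return false;
    d->open_count--;
    return true;
}

/* Deferred worker, run without the lock: destroys devices with no openers. */
int model_run_deferred(void)
{
    int done = 0;
    size_t keep = 0;

    for (size_t i = 0; i < npending; i++) {
        struct model_dev *d = pending[i];
        if (d->open_count == 0) {
            d->destroyed = true;
            done++;
        } else {
            pending[keep++] = d;    /* still open, retry later */
        }
    }
    npending = keep;
    return done;
}

/* Replays the scenario from this thread; true if the device gets destroyed. */
bool model_scenario(void)
{
    struct model_dev d = { .open_count = 0 };

    npending = 0;
    topology_locked = false;
    if (!model_open(&d))                /* dd has the device open */
        return false;
    topology_locked = true;             /* orphan path holds the lock */
    model_destroy_sched(&d);            /* returns at once, no deadlock */
    bool blocked = !model_close(&d);    /* closer cannot proceed yet */
    topology_locked = false;            /* lock released as usual */
    bool reopened = model_open(&d);     /* must fail: device is dying */
    if (model_run_deferred() != 0)      /* still open, nothing destroyed */
        return false;
    if (!model_close(&d))               /* closer now gets the lock */
        return false;
    return blocked && !reopened && model_run_deferred() == 1 && d.destroyed;
}
```

The point of the model is that the scheduling call never waits for the
openers, so the lock holder can finish and release the lock on its own,
while the dying flag covers the second half of the advice: no new
activity is admitted on a device once it is scheduled for destruction.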
