Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 21 Mar 2014 15:00:13 +0100
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-fs@FreeBSD.org, Andreas Longwitz <longwitz@incore.de>, Andriy Gapon <avg@FreeBSD.org>, freebsd-geom@FreeBSD.org
Subject:   Re: g_mirror_access() dropping geom topology_lock [Was: Kernel crash trying to import a ZFS pool with log device]
Message-ID:  <20140321140012.GB1656@garage.freebsd.pl>
In-Reply-To: <532C17DD.9030704@FreeBSD.org>
References:  <532B5A0C.1010008@incore.de> <532C085D.3020201@FreeBSD.org> <532C17DD.9030704@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--61jdw2sOBCFtR2d/
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Mar 21, 2014 at 12:43:41PM +0200, Alexander Motin wrote:
> On 21.03.2014 11:37, Andriy Gapon wrote:
> > Boom!
> >
> > I see two issues here.
> > First, the ZFS tasting code could be made more robust.  If it never tri=
ed to
> > re-use the consumer and always created a new one, then most likely this=
 crash
> > could be avoided.  But there is no bug in the code.  The code is correc=
t and it
> > it uses GEOM topology lock to avoid any concurrency issues.
> >
> > But GEOM mirror code breaks a contract on which the ZFS code relies.
> > g_access() must be called with the topology lock hold.
> > I extend this requirement to a requirement that access method of any GE=
OM
> > provider must operate under the topology lock and must never drop it.
> > In other words, if a caller must acquire g_topology_lock before calling
> > g_access, then in return it must have a guarantee that the GEOM topolog=
y stays
> > unchanged across the call to g_access().
> > g_mirror_access() breaks the above contract.
> >
> > So, the code in vdev_geom_attach() obtains g_topology_lock, then it fin=
ds an
> > existing valid consumer and calls g_access() on it.  It reasonably expe=
cts that
> > the consumer remains valid, but because g_mirror_access() drops and req=
uires the
> > topology lock, there is a chance that the topology can change and the c=
onsumer
> > may become invalid.
> >
> > I am not very familiar with gmirror code, so I am not sure how to fix t=
he
> > problem from that end.
>=20
> I can confirm this. I know about this problem for some time already. The=
=20
> same issue as shown in GMIRROR is also present in GRAID. AFAIR the=20
> problem is in keeping lock order between GEOM topology lock and class'=20
> own lock.
>=20
> The only "excuse" is that it is not very reasonable to have ZFS on top=20
> of GMIRROR or GRAID.

In my opinion we should stop pretending that we can do without dropping
the topology lock in the access method, accept that fact and act
accordingly in other GEOM classes (like ZFS::VDEV).

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://mobter.com

--61jdw2sOBCFtR2d/
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iEYEARECAAYFAlMsRewACgkQForvXbEpPzTfVQCfc7YI5qqBOJYWU+TFgk5nMvZa
oFkAoLNRBKH+RCATCfkhJlLucOTzxHzu
=BHMw
-----END PGP SIGNATURE-----

--61jdw2sOBCFtR2d/--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140321140012.GB1656>