Date:      Tue, 1 Jan 2008 18:57:48 +0100
From:      Peter Schuller <peter.schuller@infidyne.com>
To:        freebsd-current@freebsd.org
Cc:        Pawel Jakub Dawidek <pjd@freebsd.org>, current@freebsd.org
Subject:   Re: (ZFS?): panic: lockmgr: locking against myself
Message-ID:  <200801011857.57757.peter.schuller@infidyne.com>
In-Reply-To: <200707310126.06923.peter.schuller@infidyne.com>
References:  <200707282028.37102.peter.schuller@infidyne.com> <200707292157.09742.peter.schuller@infidyne.com> <200707310126.06923.peter.schuller@infidyne.com>


(quoting last post for convenience; more history at
http://www.usenetarticles.com/thread/952336.html)

> > vnode 0xffffff00037473e0: tag devfs, type VDIR
> >   usecount 0, writecount 0, refcount 1 mountedhere 0xffffff0003745ca0
> >   flags (VV_ROOT)
> >     lock type devfs: EXCL (count 1) by thread 0xffffff00010e6680 (pid 1)
>
> Some additional facts:
>
> Looking at the printouts, there is always a sequence of three or more
> (three at least twice; more than three at least once) vrele():s of the same
> vnode, in both the successful case and the panicking case. There are no
> vrele():s of any other vnodes in either case.
>
> Inserting enter/exit debug printouts in mountcheckdirs() confirms that all
> calls occur within the bounds of a single call to mountcheckdirs(). Does
> not this imply there is some locking mismatch in the non-ZFS specific code?
> I must admit I find the locking confusing; with several locking/unlocking
> functions/macros intermixed at different levels in the callstack. My
> (incorrect) reading was that this panic should always be happening, which
> is obviously not the case.
>
> Running with vfs.zfs.debug=1 confirms that vdev_geom open/attach/detach is
> happening prior to any vrele() even in the panicking case (i.e., zfs pool
> discovery seems to complete).
>
> In the case of an expected provider not being found, vd->vdev_devid is NULL
> in vdev_geom_open(), based on the "provider not found" debug printout
> (perhaps normal?).

I *think* I just experienced the same problem on 7.0-BETA3, except the kernel
does not have WITNESS/INVARIANTS so I just get a hang instead of a panic. I
wanted to post with the information I have for completeness; I realize what
follows is a bunch of anecdotal mumbo-jumbo.

The boot-up process hangs right before the would-be 'trying to mount root
from....', after all the glabel tasting has completed.

This was on a completely different system than the one in the original post,
but it also has root-on-zfs (this time on a 5 disk raidz2). It's a dual core
amd64 machine with a low-end mobo and low-end SATA controllers (SiI and some
built-in nVidia chipset).

It all started when I was booting back into FreeBSD after having Windows
booted for a while. It wouldn't boot. I fiddled some with vfs.zfs.debug=1 and
tried removing a cd from the drive (in case it affected timing), but it did
not help. I did not try the boot-7-live cd trick this time as I did originally
on the other machine.

I looked carefully to make sure all drives were detected, including geom
tasting on all but one of them that are in the zfs pool. The I/O indicator
LEDs on the respective drives that are part of the zfs pool did not indicate
any I/O after the hang. I waited 5+ minutes at least once in the hope that it
was a drive timing out.

After several attempts I turned off the machine and let it do a cold boot - at
this point the system booted fine.

This is different from before, in that previously the behavior was seemingly
triggered by changes in system configuration (loss of a drive, etc). This
time it was just a reboot. I *did* touch a bunch of cables in between, and
blew some air on components (for reasons not relating to this) which I
originally figured could explain the problem.

Before this incident, the system had booted with root-on-zfs several times (at
least 25, probably more like 50+) without any kind of problem, ever.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org




