Date:      Mon, 21 Jan 2008 19:22:20 +0100
From:      Peter Schuller <peter.schuller@infidyne.com>
To:        freebsd-current@freebsd.org
Cc:        Pawel Jakub Dawidek <pjd@freebsd.org>, current@freebsd.org
Subject:   Re: (ZFS?): panic: lockmgr: locking against myself
Message-ID:  <200801211922.29463.peter.schuller@infidyne.com>
In-Reply-To: <200801011857.57757.peter.schuller@infidyne.com>
References:  <200707282028.37102.peter.schuller@infidyne.com> <200707310126.06923.peter.schuller@infidyne.com> <200801011857.57757.peter.schuller@infidyne.com>

> I *think* I just experienced the same problem on 7.0-BETA3, except the
> kernel does not have WITNESS/INVARIANTS so I just get a hang instead of a
> panic. I wanted to post with the information I have for completeness; I
> realize what follows is a bunch of anecdotal mumbo-jumbo.

So I can now confirm this problem on 7.0-RC1 on the machine where I originally
saw this happen.

If I could trigger this in a debuggable environment I would try to get some
much more interesting information, but as this was during time-limited access
to the machine in a noisy colocation facility, with people waiting on the
machine to come back up, I was not in a position to do very much. Instead I
will again, as an added data point, provide an approximate timeline below. As
previously observed, it seems to be triggered by changes in the availability
of disks and/or zpool configuration, with cold reboots somehow mitigating the
problem.

Note that the drives are likely to have been moved around a bit logically (but
not physically), due to the level of indirection and drive number allocation
resulting from the single-disk raid0 virtual hardware raid device.

Timeline:

* Machine running 7-CURRENT from the October/September era.

* One disk in a three-way zfs mirror (tank, on which root fs is) gets kicked
out.

* For probably unrelated reasons, the machine crashes with a kmem_alloc error
(this was the first time ever on this machine). Don't have details; this was
observed by colocation personnel.

* Machine is rebooted and panics as described in this thread.

* I arrive on-site and reboot again just for kicks. Same problem.

* I physically remove the broken disk and replace it with the new one, and add
the virtual disk in the RAID controller BIOS (recap: this is a Dell 2950).

* Now it boots again.

* I zpool replace tank label/tank3 label/tank3r1 (after various
disklabel/glabel action).

* make installkernel (7.0-RC1)

* Reboot with resilvering/replacement still in progress.

* Panic on boot.

* Tried cold reboot (turn off, turn on) -> it now boots again without a panic.

* make installworld.

* At this point I no longer remember whether it booted again or whether I had
to do another cold reboot.

* Machine has not been rebooted again since resilvering completed.
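
For reference, the disk replacement step in the timeline corresponds roughly
to the dry-run sketch below. It only prints the commands and does not run
them. The pool name and the two label names come from the timeline; the
device node /dev/mfid2 and the exact glabel invocation are assumptions made
for illustration, since the actual disklabel/glabel steps are not given above.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for replacing a mirror member,
# do not execute them.

POOL=tank                 # pool name from the timeline
OLD_MEMBER=label/tank3    # failed mirror member, from the timeline
NEW_LABEL=tank3r1         # new label name, from the timeline
NEW_DEV=/dev/mfid2        # ASSUMPTION: hypothetical device node of the new virtual disk

plan_replace() {
    # Label the new virtual disk so it shows up as /dev/label/$NEW_LABEL.
    echo "glabel label ${NEW_LABEL} ${NEW_DEV}"
    # Swap the failed member for the newly labelled provider.
    echo "zpool replace ${POOL} ${OLD_MEMBER} label/${NEW_LABEL}"
    # Check resilver progress before any further reboot.
    echo "zpool status ${POOL}"
}

plan_replace
```

Printing rather than executing keeps the sketch safe to run anywhere; the
printed lines would need to be checked against the real device names before
use.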

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org




