From: Peter Schuller
To: freebsd-current@freebsd.org
Cc: Pawel Jakub Dawidek, current@freebsd.org
Date: Tue, 1 Jan 2008 18:57:48 +0100
Subject: Re: (ZFS?): panic: lockmgr: locking against myself
Message-Id: <200801011857.57757.peter.schuller@infidyne.com>
In-Reply-To: <200707310126.06923.peter.schuller@infidyne.com>
References: <200707282028.37102.peter.schuller@infidyne.com> <200707292157.09742.peter.schuller@infidyne.com> <200707310126.06923.peter.schuller@infidyne.com>

(quoting last post for convenience; more history at
http://www.usenetarticles.com/thread/952336.html)

> > vnode 0xffffff00037473e0: tag devfs, type VDIR
> > usecount 0, writecount 0, refcount 1 mountedhere 0xffffff0003745ca0
> > flags (VV_ROOT)
> > lock type devfs: EXCL (count 1) by thread 0xffffff00010e6680 (pid 1)
>
> Some additional facts:
>
> Looking at the printouts, there is always a sequence of three or more
> (three at least twice; more than three at least once) vrele():s of the same
> vnode, in both the successful case and the panicing case. There are no
> vrele():s of any other vnodes in either case.
>
> Inserting enter/exit debug printouts in mountcheckdirs() confirms that all
> calls occur within the bounds of a single call to mountcheckdirs(). Does
> not this imply there is some locking mismatch in the non-ZFS specific code?
> I must admit I find the locking confusing; with several locking/unlocking
> functions/macros intermixed at different levels in the callstack. My
> (incorrect) reading was that this panic should always be happening, which
> is obviously not the case.
>
> Running with vfs.zfs.debug=1 confirms that vdev_geom open/attach/detach is
> happening prior to any vrele() even in the panicing case (i.e., zfs pool
> discovery seems to complete).
>
> In the case of an expected provider not being found, vd->vdev_devid is NULL
> in vdev_geom_open(), based on the "provider not found" debug printout
> (perhaps normal?).

I *think* I just experienced the same problem on 7.0-BETA3, except the kernel
does not have WITNESS/INVARIANTS so I just get a hang instead of a panic. I
wanted to post the information I have for completeness; I realize what
follows is a bunch of anecdotal mumbo-jumbo.

The boot-up process hangs right before the would-be "trying to mount root
from....", after all the glabel tasting has completed.

This was on a completely different system than the one in the original post,
but it also has root-on-zfs (this time on a 5 disk raidz2). It's a dual core
amd64 machine with a low-end mobo and low-end SATA controllers (SiI and some
built-in nVidia chipset).

It all started when I was booting back into FreeBSD after having Windows
booted for a while. It wouldn't boot. I fiddled some with vfs.zfs.debug=1
and removed a CD in the drive (in case it affected timing), but it did not
help. I did not try the boot-7-live cd trick this time as I did originally on
the other machine.

I looked carefully to make sure all drives were detected, including geom
tasting on all but one of them that are in the zfs pool. The I/O indicator
LEDs on the respective drives that are part of the zfs pool did not indicate
any I/O after the hang. I waited 5+ minutes at least once in the hope that it
was a drive timing out.

After several attempts I turned off the machine and let it do a cold boot; at
this point the system booted fine.

This is different from before, in that previously the behavior was seemingly
triggered by changes in system configuration (loss of a drive, etc). This
time it was just a reboot. I *did* touch a bunch of cables in between, and
blew some air on components (for reasons not relating to this), which I
originally figured could explain the problem.

Before this incident, the system had booted with root-on-zfs several times (at
least 25, probably more like 50+) without any kind of problem, ever.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org