From owner-freebsd-fs@FreeBSD.ORG  Sun Feb 17 00:34:10 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 1001CE9A
 for <fs@freebsd.org>; Sun, 17 Feb 2013 00:34:10 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from fallbackmx08.syd.optusnet.com.au
 (fallbackmx08.syd.optusnet.com.au [211.29.132.10])
 by mx1.freebsd.org (Postfix) with ESMTP id 805AB1D6
 for <fs@freebsd.org>; Sun, 17 Feb 2013 00:34:09 +0000 (UTC)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
 [211.29.132.185])
 by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
 r1H0Y8kH002802 for <fs@freebsd.org>; Sun, 17 Feb 2013 11:34:08 +1100
Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au
 (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106])
 by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r1H0XwuD020783
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
 for <fs@freebsd.org>; Sun, 17 Feb 2013 11:34:00 +1100
Date: Sun, 17 Feb 2013 11:33:58 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: fs@freebsd.org
Subject: cleaning files beyond EOF
Message-ID: <20130217113031.N9271@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=MscKcBme c=1 sm=1 a=kj9zAlcOel0A:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=MxLOMYGytlMA:10
 a=Yc0UDWaBctfB_lDtZ-YA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 17 Feb 2013 00:34:10 -0000

I have a (possibly damaged) ffs data block with nonzero data beyond
EOF.  Is anything responsible for clearing this data when the file
is mmapped()?

At least old versions of gcc mmap() the file and have a bug checking
for EOF.  They read the garbage beyond the end and get confused.
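
A minimal sketch of how the symptom could be checked from userland, assuming a POSIX system (POSIX specifies that the partial page at the end of a mapped object is zero-filled). The function name count_nonzero_past_eof() is hypothetical and not part of any FreeBSD API. It maps a file and counts nonzero bytes between EOF and the end of the last page; a nonzero result would suggest that garbage beyond EOF is leaking into the mapping.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map the file at `path` read-only and count the nonzero bytes between
 * EOF and the end of the last mapped page.  Returns -1 on error or for
 * an empty file.  (Hypothetical helper, for illustration only.)
 */
long
count_nonzero_past_eof(const char *path)
{
	struct stat sb;
	size_t pagesz, maplen, i;
	unsigned char *p;
	long nonzero;
	int fd;

	pagesz = (size_t)sysconf(_SC_PAGESIZE);
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	if (fstat(fd, &sb) == -1 || sb.st_size == 0) {
		close(fd);
		return (-1);
	}
	/* Round the mapping length up to a whole number of pages. */
	maplen = ((size_t)sb.st_size + pagesz - 1) / pagesz * pagesz;
	p = mmap(NULL, maplen, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return (-1);
	/* Everything from EOF to the end of the last page should be zero. */
	nonzero = 0;
	for (i = (size_t)sb.st_size; i < maplen; i++)
		if (p[i] != 0)
			nonzero++;
	munmap(p, maplen);
	return (nonzero);
}
```

Calling count_nonzero_past_eof() on a suspect file and getting a value other than 0 would point at the mapping path rather than at the application.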

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun Feb 17 05:55:39 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 4A43FC29
 for <fs@freebsd.org>; Sun, 17 Feb 2013 05:55:39 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 by mx1.freebsd.org (Postfix) with ESMTP id A7216AA0
 for <fs@freebsd.org>; Sun, 17 Feb 2013 05:55:38 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1H5tSKK017200;
 Sun, 17 Feb 2013 07:55:28 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1H5tSKK017200
Received: (from kostik@localhost)
 by tom.home (8.14.6/8.14.6/Submit) id r1H5tSGT017199;
 Sun, 17 Feb 2013 07:55:28 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sun, 17 Feb 2013 07:55:28 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: cleaning files beyond EOF
Message-ID: <20130217055528.GB2522@kib.kiev.ua>
References: <20130217113031.N9271@besplex.bde.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="lnVtgFsQ/+aGFUA8"
Content-Disposition: inline
In-Reply-To: <20130217113031.N9271@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 17 Feb 2013 05:55:39 -0000


--lnVtgFsQ/+aGFUA8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
> I have a (possibly damaged) ffs data block with nonzero data beyond
> EOF.  Is anything responsible for clearing this data when the file
> is mmapped()?
>=20
> At least old versions of gcc mmap() the file and have a bug checking
> for EOF.  They read the garbage beyond the end and get confused.

Does the 'damaged' status of the data block mean that it contains
garbage after EOF on disk?

UFS uses a small wrapper around vnode_pager_generic_getpages() as its
VOP_GETPAGES(); the wrapping code can be ignored for the current
purpose.

vnode_pager_generic_getpages() iterates over the pages after the
bstrategy() and marks the part of the page after EOF valid and zeroes
it, using vm_page_set_valid_range().

--lnVtgFsQ/+aGFUA8
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iQIcBAEBAgAGBQJRIHDQAAoJEJDCuSvBvK1BVvIQAII9S5yh7gY5/sDH53QfKv7T
6r1ekPRmUgcSMGcFMwH6N88uVOckdXFcROIJKmgKDvjQzzdvif3SPzBFwShm55rg
zsqs5IuEi+xJtn2N6TQOdgTiV+//GLonIKXMjrQluF0qm9+BhP9bUmSrkIzHGV5P
Lf20n3EMok3hF03cxdJrDI0jHJ+wUpZTee1SELq/fcMHK04R0BCtsVgulHwRnrr9
N+fdhpB5Eh85LH5OEALgiF2x5deK/khhauPyFRymg+s2N57w7EsxxQHAyYnuz/KN
sDY0FOlvNLqwzBGtomrJnRWY+d6RQiKlcxvWkBUP9leOFnuGnRVc1MjYHhNP/Htz
GzAeQKLLuDIroSZy5IbcGrLzc2XQIKix1ILPtfffV/2Vn/bQCXHC1EGqPfQl2UqK
8K90QjVR0z7mBvbaiW8QgoN/2Dy0UvqYwOTBrAzAET6tRmH/nOuCAqhzsBAvssY1
tjRS5nqeF3S/AbalZtDNM7TmNZOlWTlLYUrvDuct/C68CkWGejYYFGMoGOuL2/vC
hGb5pSH9IDNG5NP3mtoXylukmplgUnhPOiXyxLLGAoFnVTL5nGJaxlUwu8XElOxq
+3xNgLt2mkR4er8mx9WaB3moistCIU1LEI4M/kQEOQLQfTWxdc7hC2P8hvm6IN0V
TCvwmJrtvt+it4vdaTpg
=lL16
-----END PGP SIGNATURE-----

--lnVtgFsQ/+aGFUA8--

From owner-freebsd-fs@FreeBSD.ORG  Sun Feb 17 07:02:04 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id A007D368
 for <fs@freebsd.org>; Sun, 17 Feb 2013 07:02:04 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail12.syd.optusnet.com.au (mail12.syd.optusnet.com.au
 [211.29.132.193]) by mx1.freebsd.org (Postfix) with ESMTP id 9793EC2F
 for <fs@freebsd.org>; Sun, 17 Feb 2013 07:02:02 +0000 (UTC)
Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au
 (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106])
 by mail12.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r1H71osh019511
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Sun, 17 Feb 2013 18:01:52 +1100
Date: Sun, 17 Feb 2013 18:01:50 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: cleaning files beyond EOF
In-Reply-To: <20130217055528.GB2522@kib.kiev.ua>
Message-ID: <20130217172928.C1900@besplex.bde.org>
References: <20130217113031.N9271@besplex.bde.org>
 <20130217055528.GB2522@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=RbTIkCRv c=1 sm=1 a=xK1pj5J4f3QA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=GlckP5_kgdUA:10
 a=9bBWqGSFoObTARNqQagA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117
Cc: fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 17 Feb 2013 07:02:04 -0000

On Sun, 17 Feb 2013, Konstantin Belousov wrote:

> On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
>> I have a (possibly damaged) ffs data block with nonzero data beyond
>> EOF.  Is anything responsible for clearing this data when the file
>> is mmapped()?
>>
>> At least old versions of gcc mmap() the file and have a bug checking
>> for EOF.  They read the garbage beyond the end and get confused.
>
> Does the 'damaged' status of the data block mean that it contain the
> garbage after EOF on disk ?

Yes, it's at most software damage.  I used a broken version of
vfs_bio_clrbuf() for a long time and it probably left some unusual
blocks.  This matters surprisingly rarely.

I forgot to mention that this is with an old version of FreeBSD,
where I changed vfs_bio.c a lot but barely touched vm.

> UFS uses a small wrapper around vnode_generic_getpages() as the
> VOP_GETPAGES(), the wrapping code can be ignored for the current
> purpose.
>
> vnode_generic_getpages() iterates over the pages after the bstrategy()
> and marks the part of the page after EOF valid and zeroes it, using
> vm_page_set_valid_range().

The old version has a large non-wrapper in ffs, and
vnode_pager_generic_getpages() uses vm_page_set_validclean().  Maybe the
bug is just in the old ffs_getpages().  It seems to zero only at
DEV_BSIZE granularity.  It
begins with the same "We have to zero that data" code that forms most
of the wrapper in the current version.  It normally only returns
vnode_pager_generic_getpages() after that if bsize < PAGE_SIZE.
However, my version has a variable which I had forgotten about to
control this, and the forgotten setting of this variable results in
always using vnode_pager_generic_getpages(), as in -current.  I probably
copied some fixes in -current for this.  So the bug can't be just in
ffs_getpages().
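
To illustrate why zeroing at DEV_BSIZE granularity may not be enough, here is a userland simulation, not kernel code. All names (sim_zero_invalid() and so on) are hypothetical, and the page size of 4096 and DEV_BSIZE of 512 are assumptions. A page has one valid bit per DEV_BSIZE chunk; zeroing only the invalid chunks leaves any garbage that shares a valid chunk with the last bytes of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE	4096	/* assumed page size */
#define SIM_DEV_BSIZE	512	/* assumed device block size */

/*
 * Zero every DEV_BSIZE chunk whose bit is clear in `valid`.  This is
 * roughly what zeroing at DEV_BSIZE granularity amounts to.
 */
void
sim_zero_invalid(unsigned char *page, unsigned valid)
{
	int i;

	for (i = 0; i < SIM_PAGE_SIZE / SIM_DEV_BSIZE; i++)
		if ((valid & (1u << i)) == 0)
			memset(page + i * SIM_DEV_BSIZE, 0, SIM_DEV_BSIZE);
}

/* Zero from the EOF offset within the page to the end of the page. */
void
sim_zero_past_eof(unsigned char *page, size_t eof_off)
{
	assert(eof_off <= SIM_PAGE_SIZE);
	memset(page + eof_off, 0, SIM_PAGE_SIZE - eof_off);
}

/* Count the nonzero bytes in [from, SIM_PAGE_SIZE). */
size_t
sim_count_nonzero(const unsigned char *page, size_t from)
{
	size_t i, n;

	n = 0;
	for (i = from; i < SIM_PAGE_SIZE; i++)
		if (page[i] != 0)
			n++;
	return (n);
}
```

With a 2K frag read into the page (valid bits 0x0f) and EOF at offset 16, sim_zero_invalid() clears only the upper half of the page; the bytes from 16 up to 2047 keep whatever was on disk until something like sim_zero_past_eof() clears them.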

The "damaged" block is at the end of vfs_default.c.  The file size is
25 * PAGE_SIZE + 16.  It is in 7 16K blocks, 2 full 2K frags, and 1 frag
with 16 bytes valid in it.
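
The layout arithmetic can be sketched as follows. This is only an illustration: struct ffs_layout and layout_for_size() are hypothetical names, and the sketch assumes PAGE_SIZE is 4096, the block size is 16K, the frag size is 2K, and the file is small enough to end in frags. Under those assumptions, a size of 25 * 4096 + 16 works out to 6 full blocks, 2 full frags and 16 tail bytes, while 7 full blocks would correspond to 29 * 4096 + 16, so one of the figures quoted above may be slightly off.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of how a file's bytes split up on disk. */
struct ffs_layout {
	uint64_t full_blocks;	/* completely filled blocks */
	uint64_t full_frags;	/* completely filled frags after them */
	uint64_t tail_bytes;	/* valid bytes in the last, partial frag */
};

/*
 * Split `size` bytes into full blocks, full frags and a partial tail.
 * Simplified: ignores indirect blocks and allocation policy.
 */
struct ffs_layout
layout_for_size(uint64_t size, uint64_t bsize, uint64_t fsize)
{
	struct ffs_layout l;
	uint64_t rem;

	l.full_blocks = size / bsize;
	rem = size % bsize;
	l.full_frags = rem / fsize;
	l.tail_bytes = rem % fsize;
	return (l);
}
```

The tail_bytes value is the interesting one here: everything in the last frag beyond it is on disk but lies past EOF.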

I have another problem that is apparently with
vnode_pager_generic_getpages() and now affects -current from about a
year ago in an identical way with the old version: mmap() is very slow
in msdosfs.  cmp uses mmap() too much, and reading files sequentially
using mmap() is 3.4 times slower than reading them using read() on my
DVD media/drive.  The i/o seems to be correctly clustered for both,
with average transaction sizes over 50K but tps much lower for mmap().
Similarly on a (faster) hard disk except the slowness is not as noticeable
(drive buffering might hide it completely).  However, for ffs files on
the hard disk, mmap() is as fast as read().
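
A rough sketch of the two access patterns being compared, assuming a POSIX system. The helper names sum_with_read() and sum_with_mmap() are hypothetical, and the 64K buffer size is an arbitrary choice. Both helpers read a file sequentially and sum its bytes, so each can be timed separately (for example with clock_gettime()) on the same file after flushing caches. The sketch does not reproduce the measurement above; it only shows the kind of test that could be run.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum all bytes of the file using read() with a 64K buffer. */
int
sum_with_read(const char *path, uint64_t *sum)
{
	static unsigned char buf[65536];
	ssize_t n, i;
	int fd;

	*sum = 0;
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		for (i = 0; i < n; i++)
			*sum += buf[i];
	close(fd);
	return (n == 0 ? 0 : -1);
}

/* Sum all bytes of the file by mapping it and touching every byte. */
int
sum_with_mmap(const char *path, uint64_t *sum)
{
	struct stat sb;
	unsigned char *p;
	off_t i;
	int fd;

	*sum = 0;
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	if (fstat(fd, &sb) == -1) {
		close(fd);
		return (-1);
	}
	if (sb.st_size == 0) {
		close(fd);
		return (0);
	}
	p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return (-1);
	for (i = 0; i < sb.st_size; i++)
		*sum += p[i];
	munmap(p, (size_t)sb.st_size);
	return (0);
}
```

Both helpers should return the same sum for the same file; only the elapsed time is expected to differ between filesystems.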

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun Feb 17 07:48:38 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id CA14491C
 for <fs@freebsd.org>; Sun, 17 Feb 2013 07:48:38 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 by mx1.freebsd.org (Postfix) with ESMTP id 0CB47D76
 for <fs@freebsd.org>; Sun, 17 Feb 2013 07:48:37 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1H7mWvY008623;
 Sun, 17 Feb 2013 09:48:32 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1H7mWvY008623
Received: (from kostik@localhost)
 by tom.home (8.14.6/8.14.6/Submit) id r1H7mWI0008622;
 Sun, 17 Feb 2013 09:48:32 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sun, 17 Feb 2013 09:48:32 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: cleaning files beyond EOF
Message-ID: <20130217074832.GA2598@kib.kiev.ua>
References: <20130217113031.N9271@besplex.bde.org>
 <20130217055528.GB2522@kib.kiev.ua>
 <20130217172928.C1900@besplex.bde.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="mP3DRpeJDSE+ciuQ"
Content-Disposition: inline
In-Reply-To: <20130217172928.C1900@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 17 Feb 2013 07:48:38 -0000


--mP3DRpeJDSE+ciuQ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Feb 17, 2013 at 06:01:50PM +1100, Bruce Evans wrote:
> On Sun, 17 Feb 2013, Konstantin Belousov wrote:
>=20
> > On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
> >> I have a (possibly damaged) ffs data block with nonzero data beyond
> >> EOF.  Is anything responsible for clearing this data when the file
> >> is mmapped()?
> >>
> >> At least old versions of gcc mmap() the file and have a bug checking
> >> for EOF.  They read the garbage beyond the end and get confused.
> >
> > Does the 'damaged' status of the data block mean that it contain the
> > garbage after EOF on disk ?
>=20
> Yes, it's at most software damage.  I used a broken version of
> vfs_bio_clrbuf() for a long time and it probably left some unusual
> blocks.  This matters surprisingly rarely.
I recently had to modify vfs_bio_clrbuf().  For me, a bug in the
function mattered a lot, because the function is used, in particular,
to clear indirect blocks.  The bug caused quite random filesystem
failures until I figured it out.  My version of vfs_bio_clrbuf() is
at the end of this message; it avoids accessing b_data.

>=20
> I forgot to mention that this is with an old version of FreeBSD,
> where I changed vfs_bio.c a lot but barely touched vm.
>=20
> > UFS uses a small wrapper around vnode_generic_getpages() as the
> > VOP_GETPAGES(), the wrapping code can be ignored for the current
> > purpose.
> >
> > vnode_generic_getpages() iterates over the pages after the bstrategy()
> > and marks the part of the page after EOF valid and zeroes it, using
> > vm_page_set_valid_range().
>=20
> The old version has a large non-wrapper in ffs, and vnode_generic_getpage=
s()
> uses vm_page_set_validclean().  Maybe the bug is just in the old
> ffs_getpages().  It seems to do only DEV_BSIZE'ed zeroing stuff.  It
> begins with the same "We have to zero that data" code that forms most
> of the wrapper in the current version.  It normally only returns
> vnode_pager_generic_getpages() after that if bsize < PAGE_SIZE.
> However, my version has a variable which I had forgotten about to
> control this, and the forgotten setting of this variable results in
> always using vnode_pager_generic_getpages(), as in -current.  I probably
> copied some fixes in -current for this.  So the bug can't be just in
> ffs_getpages().
>=20
> The "damaged" block is at the end of vfs_default.c.  The file size is
> 25 * PAGE_SIZE + 16.  It is in 7 16K blocks, 2 full 2K frags, and 1 frag
> with 16 bytes valid in it.
But ffs_getpages() might indeed be the culprit.  It calls
vm_page_zero_invalid(), which only has DEV_BSIZE granularity.  I think
that ffs_getpages() should also zero the after-EOF part of the last page
of the file to fix your damage, since a device read cannot read less
than DEV_BSIZE.

diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
index ef6194c..4240b78 100644
--- a/sys/ufs/ffs/ffs_vnops.c
+++ b/sys/ufs/ffs/ffs_vnops.c
@@ -844,9 +844,9 @@ static int
 ffs_getpages(ap)
 	struct vop_getpages_args *ap;
 {
-	int i;
 	vm_page_t mreq;
-	int pcount;
+	uint64_t size;
+	int i, pcount;
=20
 	pcount =3D round_page(ap->a_count) / PAGE_SIZE;
 	mreq =3D ap->a_m[ap->a_reqpage];
@@ -861,6 +861,10 @@ ffs_getpages(ap)
 	if (mreq->valid) {
 		if (mreq->valid !=3D VM_PAGE_BITS_ALL)
 			vm_page_zero_invalid(mreq, TRUE);
+		size =3D VTOI(ap->a_vp)->i_size;
+		if (mreq->pindex =3D=3D OFF_TO_IDX(size))
+			pmap_zero_page_area(mreq, size & PAGE_MASK,
+			    PAGE_SIZE - (size & PAGE_MASK));
 		for (i =3D 0; i < pcount; i++) {
 			if (i !=3D ap->a_reqpage) {
 				vm_page_lock(ap->a_m[i]);

On the other hand, it is not clear whether we should indeed protect
against such a case, or just declare the disk data broken.

>=20
> I have another problem that is apparently with
> vnode_pager_generic_getpages() and now affects -current from about a
> year ago in an identical way with the old version: mmap() is very slow
> in msdosfs.  cmp uses mmap() too much, and reading files sequentially
> using mmap() is 3.4 times slower than reading them using read() on my
> DVD media/drive.  The i/o seems to be correctly clustered for both.
> with average transaction sizes over 50K but tps much lower for mmap().
> Similarly on a (faster) hard disk except the slowness is not as noticeable
> (drive buffering might hide it completely).  However, for ffs files on
> the hard disk, mmap() is as fast as read().

diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
index 6393399..83d3609 100644
--- a/sys/kern/vfs_bio.c
+++ b/sys/kern/vfs_bio.c
@@ -3704,8 +4070,7 @@ vfs_bio_set_valid(struct buf *bp, int base, int size)
 void
 vfs_bio_clrbuf(struct buf *bp)=20
 {
-	int i, j, mask;
-	caddr_t sa, ea;
+	int i, j, mask, sa, ea, slide;
=20
 	if ((bp->b_flags & (B_VMIO | B_MALLOC)) !=3D B_VMIO) {
 		clrbuf(bp);
@@ -3723,39 +4088,69 @@ vfs_bio_clrbuf(struct buf *bp)
 		if ((bp->b_pages[0]->valid & mask) =3D=3D mask)
 			goto unlock;
 		if ((bp->b_pages[0]->valid & mask) =3D=3D 0) {
-			bzero(bp->b_data, bp->b_bufsize);
+			pmap_zero_page_area(bp->b_pages[0], 0, bp->b_bufsize);
 			bp->b_pages[0]->valid |=3D mask;
 			goto unlock;
 		}
 	}
-	ea =3D sa =3D bp->b_data;
-	for(i =3D 0; i < bp->b_npages; i++, sa =3D ea) {
-		ea =3D (caddr_t)trunc_page((vm_offset_t)sa + PAGE_SIZE);
-		ea =3D (caddr_t)(vm_offset_t)ulmin(
-		    (u_long)(vm_offset_t)ea,
-		    (u_long)(vm_offset_t)bp->b_data + bp->b_bufsize);
+	sa =3D bp->b_offset & PAGE_MASK;
+	slide =3D 0;
+	for (i =3D 0; i < bp->b_npages; i++) {
+		slide =3D imin(slide + PAGE_SIZE, bp->b_bufsize + sa);
+		ea =3D slide & PAGE_MASK;
+		if (ea =3D=3D 0)
+			ea =3D PAGE_SIZE;
 		if (bp->b_pages[i] =3D=3D bogus_page)
 			continue;
-		j =3D ((vm_offset_t)sa & PAGE_MASK) / DEV_BSIZE;
+		j =3D sa / DEV_BSIZE;
 		mask =3D ((1 << ((ea - sa) / DEV_BSIZE)) - 1) << j;
 		VM_OBJECT_LOCK_ASSERT(bp->b_pages[i]->object, MA_OWNED);
 		if ((bp->b_pages[i]->valid & mask) =3D=3D mask)
 			continue;
 		if ((bp->b_pages[i]->valid & mask) =3D=3D 0)
-			bzero(sa, ea - sa);
+			pmap_zero_page_area(bp->b_pages[i], sa, ea - sa);
 		else {
 			for (; sa < ea; sa +=3D DEV_BSIZE, j++) {
-				if ((bp->b_pages[i]->valid & (1 << j)) =3D=3D 0)
-					bzero(sa, DEV_BSIZE);
+				if ((bp->b_pages[i]->valid & (1 << j)) =3D=3D 0) {
+					pmap_zero_page_area(bp->b_pages[i],
+					    sa, DEV_BSIZE);
+				}
 			}
 		}
 		bp->b_pages[i]->valid |=3D mask;
+		sa =3D 0;
 	}
 unlock:
 	VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
 	bp->b_resid =3D 0;
 }
=20
+void
+vfs_bio_bzero_buf(struct buf *bp, int base, int size)
+{
+	vm_page_t m;
+	int i, n;
+
+	if ((bp->b_flags & B_UNMAPPED) =3D=3D 0) {
+		BUF_CHECK_MAPPED(bp);
+		bzero(bp->b_data + base, size);
+	} else {
+		BUF_CHECK_UNMAPPED(bp);
+		n =3D PAGE_SIZE - (base & PAGE_MASK);
+		VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
+		for (i =3D base / PAGE_SIZE; size > 0 && i < bp->b_npages; ++i) {
+			m =3D bp->b_pages[i];
+			if (n > size)
+				n =3D size;
+			pmap_zero_page_area(m, base & PAGE_MASK, n);
+			base +=3D n;
+			size -=3D n;
+			n =3D PAGE_SIZE;
+		}
+		VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
+	}
+}
+
 /*
  * vm_hold_load_pages and vm_hold_free_pages get pages into
  * a buffers address space.  The pages are anonymous and are

--mP3DRpeJDSE+ciuQ
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iQIcBAEBAgAGBQJRIItPAAoJEJDCuSvBvK1BLxoP/RBTTxKS5x4P4Y8eh72d80PC
Q7FmAKW62SjIFapsHMYVoIdsfi7QYBWW8RGLI9LSjHfz5ouwb4p7LwR5NjcFCjw3
2G/wv9cTgkHv7/+QVqWNXBi0FquLVPY9oPwe2reFz9MZpHJeN0j/5rYKzCWGLxkC
EMog2CdI5UBleLRTHCeXvHiSss75W39GAvefijo1C0/soAESCfSBLiITLZu3ZRjE
+8sLS4mJ7pGHJ2AQdbVKGb1dkt6XHoZ2T8Jc3C/GwyoYE4CnUXe/2mssSCjePEng
ho5eElyqW+3bwtMJKSQAuS+rQDhUvTLQFHHLlkZLyaDHbIVB0mkC03WQGRkdLi34
0Y+bzfNjcncyM197xUaG828DrpwKQZdsfkG685VzWBFWtzqz5epNVjjLWPBt+ReO
HBB5FNAU9FXa0u8/amkpnexYTbfgnRx0h8mtd3B7eviKeZ7o2XrMwUAWVihYvZQN
gAl/np41x8WXuIZ0BO3itc4LFFlSYj84AloyP3Tw9LT1zuk8bCcl3Vz0Xp6WAce/
tKfskZIMcgUn02wO3XOkHviwfWsi1IJ+eZhgqg7KWefj80WwtWTM8WR83Q0yhcBO
rtTfsgFYPJdhF3dAK7EZBcASSs/xYOL74/qcyUVQwUZ4Wg2AA2ew1VdvSPm10qUu
Rsd8TZkYmljlzb1BKkAK
=Zg5p
-----END PGP SIGNATURE-----

--mP3DRpeJDSE+ciuQ--

From owner-freebsd-fs@FreeBSD.ORG  Sun Feb 17 14:46:57 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 8201D7C3;
 Sun, 17 Feb 2013 14:46:57 +0000 (UTC)
 (envelope-from konstantin.kuklin@gmail.com)
Received: from mail-qc0-f170.google.com (mail-qc0-f170.google.com
 [209.85.216.170])
 by mx1.freebsd.org (Postfix) with ESMTP id 01B2DA69;
 Sun, 17 Feb 2013 14:46:56 +0000 (UTC)
Received: by mail-qc0-f170.google.com with SMTP id d42so1784742qca.29
 for <multiple recipients>; Sun, 17 Feb 2013 06:46:50 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:x-received:date:message-id:subject:from:to
 :content-type; bh=YJgMsTnDELuLnMjp4ab6RPnA5/bvID2AmVen7Oc9bqY=;
 b=Ct18D0IecE5HBoQkAf6tnWJM2HcdTVlwt9zL9zXCqo7Jr2AqE3dzBRkt6MNdKWfOjC
 +Dqk6oy/vEhGf9PQ+fuS6l/i6P9d7MJCl3uKm/19qEbvqNdDUW5V7LboGYBOeq2FT1av
 KDyHfvc+26pt6Ey7CNakG8drYR3LbIcrErw8RtKz4fRudA5XZ6FJuRkx9IlB/oO0qr2R
 WIEnobaLFNrLcHkO79hVvzH6RHhZGv4Jh7XQul/gl+VE3kv/Te4EbA5bAAnsz3BoJI6H
 HfEMiWW+3XvRqG3GDDTMxYLZ+0wWl6ZkMIxj0wuvva7hySiGeiBbVEVLV2/mb9gB2Av8
 sEEA==
MIME-Version: 1.0
X-Received: by 10.224.186.83 with SMTP id cr19mr4106277qab.51.1361112410258;
 Sun, 17 Feb 2013 06:46:50 -0800 (PST)
Received: by 10.49.98.130 with HTTP; Sun, 17 Feb 2013 06:46:50 -0800 (PST)
Date: Sun, 17 Feb 2013 18:46:50 +0400
Message-ID: <CAOrTs_aRL1cO3Jm1YrXoQRGoUX4vVqYKvFcTXbzyijYAhDsCVA@mail.gmail.com>
Subject: zfs raid1 error resilvering and mount
From: Konstantin Kuklin <konstantin.kuklin@gmail.com>
To: freebsd-fs@freebsd.org, pjd@freebsd.org, mm@freebsd.org, 
 zfs-discuss@opensolaris.org
X-Mailman-Approved-At: Sun, 17 Feb 2013 15:41:55 +0000
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 17 Feb 2013 14:46:57 -0000

Hi, I have a raid1 ZFS pool with 2 devices.
The first device died, and booting from the second one does not work.

I booted from an mfsBSD (http://mfsbsd.vx.sk/) flash drive and ran
zpool import:
http://puu.sh/2402E

When I load zfs.ko and opensolaris.ko, I see this message:
Solaris: WARNING: Can't open objset for zroot/var/crash
Solaris: WARNING: Can't open objset for zroot/var/crash

zpool status:
http://puu.sh/2405f

Resilvering freezes with:
zpool status -v
        .............
        zroot/usr:<0x28ff>
        zroot/usr:<0x29ff>
        zroot/usr:<0x2aff>
        zroot/var/crash:<0x0>
 root@Flash:/root #

How can I delete or drop the zroot/var/crash filesystem (1-10 MB in
size, I don't remember exactly) and mount the other ZFS filesystems
with my data?
--=20
Best regards,
Konstantin Kuklin.

From owner-freebsd-fs@FreeBSD.ORG  Sun Feb 17 15:49:21 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 3B81CDA8
 for <freebsd-fs@freebsd.org>; Sun, 17 Feb 2013 15:49:21 +0000 (UTC)
 (envelope-from ml@my.gd)
Received: from mail-wg0-f47.google.com (mail-wg0-f47.google.com [74.125.82.47])
 by mx1.freebsd.org (Postfix) with ESMTP id B3817CBB
 for <freebsd-fs@freebsd.org>; Sun, 17 Feb 2013 15:49:20 +0000 (UTC)
Received: by mail-wg0-f47.google.com with SMTP id dr13so3892638wgb.2
 for <freebsd-fs@freebsd.org>; Sun, 17 Feb 2013 07:49:19 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=x-received:content-type:mime-version:subject:from:in-reply-to:date
 :cc:content-transfer-encoding:message-id:references:to:x-mailer
 :x-gm-message-state;
 bh=/E1oXavxbSDixeGVxlfjfS/UGeorjyrFZrcpbJjMODA=;
 b=WPIp681J8azoVLfdfy8rOyN3zQBOqnMx8PCUqG0h5rOsHKWOyxf7B7LitTEEtoIoJo
 bx8hBcJXVnbHxoU5lSjBXTcse1K2p4oZWLAQqc7Bj05+TRlL3/0lSmBZBtItQ6g8SKi8
 OsG9MobHQ2b11PK4SXZ910XBBmY+JyTrD+ERa6dAwUSFrxjzdKobrrcasn+NX6vr9wB+
 ucu/8TYwW4uZ78C7RTQWAjhbnuXZV8vHOh1XBIYrVAuEY1jvbDDqT0QgbhhiACBcHviA
 eIv6n5C3oAi4VUQwEnIClPMmbi65X6KB34+Ha5Xr7Ik7NEIjxVlgj/WOnVAwGDzlIMHj
 q/7w==
X-Received: by 10.194.76.7 with SMTP id g7mr14164727wjw.50.1361116159500;
 Sun, 17 Feb 2013 07:49:19 -0800 (PST)
Received: from [192.168.0.13] (did75-17-88-165-130-96.fbx.proxad.net.
 [88.165.130.96])
 by mx.google.com with ESMTPS id n2sm15221335wiy.6.2013.02.17.07.49.16
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Sun, 17 Feb 2013 07:49:18 -0800 (PST)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: zfs raid1 error resilvering and mount
From: Fleuriot Damien <ml@my.gd>
In-Reply-To: <CAOrTs_aRL1cO3Jm1YrXoQRGoUX4vVqYKvFcTXbzyijYAhDsCVA@mail.gmail.com>
Date: Sun, 17 Feb 2013 16:49:16 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>
References: <CAOrTs_aRL1cO3Jm1YrXoQRGoUX4vVqYKvFcTXbzyijYAhDsCVA@mail.gmail.com>
To: Konstantin Kuklin <konstantin.kuklin@gmail.com>
X-Mailer: Apple Mail (2.1499)
X-Gm-Message-State: ALoCoQlrKOlQ3DxEH5EJQg53TWZS6xhL9Q0GwMjZbCtFVzx4d18znNvbEVhXIRO4EyVqf2TDvWuD
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 17 Feb 2013 15:49:21 -0000

Hmmm, zfs destroy -f zroot/var/crash ?

Then you can try to zfs mount -a


Removing pjd and mm from cc, if they want to read your message they're =
old enough to check their ML subscription.


On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin =
<konstantin.kuklin@gmail.com> wrote:

> hi, i have raid1 on zfs with 2 device on pool
> first device died and boot from second not working...
>=20
> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool =
import
> http://puu.sh/2402E
>=20
> when  i load zfs.ko and opensolaris.ko i see this message:
> Solaris: WARNING: Can't open objset for zroot/var/crash
> Solaris: WARNING: Can't open objset for zroot/var/crash
>=20
> zpool status:
> http://puu.sh/2405f
>=20
> resilvering freeze with:
> zpool status -v
>        .............
>        zroot/usr:<0x28ff>
>        zroot/usr:<0x29ff>
>        zroot/usr:<0x2aff>
>        zroot/var/crash:<0x0>
> root@Flash:/root #
>=20
> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn`t
> remember) and mount other zfs points with my data
> --=20
> Best regards,
> Konstantin Kuklin.


From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 07:48:37 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 1B0AE285;
 Mon, 18 Feb 2013 07:48:37 +0000 (UTC)
 (envelope-from konstantin.kuklin@gmail.com)
Received: from mail-qa0-f42.google.com (mail-qa0-f42.google.com
 [209.85.216.42]) by mx1.freebsd.org (Postfix) with ESMTP id B4BB6180;
 Mon, 18 Feb 2013 07:48:36 +0000 (UTC)
Received: by mail-qa0-f42.google.com with SMTP id cr7so1142172qab.15
 for <multiple recipients>; Sun, 17 Feb 2013 23:48:29 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:x-received:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type:content-transfer-encoding;
 bh=cT4axFSL92SPKDjGTdoRlDwT+fTQwVAfQ1UkfHo2XXQ=;
 b=HCuvugKmIja6AXqf4LaG9YZnlMlSrHyPCzFnGCTpyfkImax6iNpb30wjyHS+RII7Fu
 0MOXiI33Cy9hQcyfAjNNx5CZ+vX1HoLdrpodzT7xnH6SuznOmzWYjy29hmk35xE9iFGj
 0foZXtDRznBmfHU4EHATHMHt0SdecHMO1z2fQwwskRhe4ZZ/fVq1kNEI09N5fvEN+e8p
 q9U9CjiJJj5uO8T7+wYbY0GUolD5ZvkinYvgolYReMl4jEErVVibQbhggiWqK0sdpUwl
 vj80IrnKt+S9uITDPMj6hawuIH7lmyrr9v98Cu2v61na5K/449dyUyuA2ZNyi0fM0Dod
 yGjA==
MIME-Version: 1.0
X-Received: by 10.49.2.35 with SMTP id 3mr4510000qer.36.1361173709882; Sun, 17
 Feb 2013 23:48:29 -0800 (PST)
Received: by 10.49.98.130 with HTTP; Sun, 17 Feb 2013 23:48:29 -0800 (PST)
In-Reply-To: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>
References: <CAOrTs_aRL1cO3Jm1YrXoQRGoUX4vVqYKvFcTXbzyijYAhDsCVA@mail.gmail.com>
 <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>
Date: Mon, 18 Feb 2013 11:48:29 +0400
Message-ID: <CAOrTs_a1CX2tiBe+1zzsSMw+TTqvfFJD=Dr-dUURa_4s4aSYCA@mail.gmail.com>
Subject: Re: zfs raid1 error resilvering and mount
From: Konstantin Kuklin <konstantin.kuklin@gmail.com>
To: Fleuriot Damien <ml@my.gd>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 07:48:37 -0000

I can't do that: the resilver is still in progress (frozen at 0.1%) and
zfs list returns nothing.

2013/2/17 Fleuriot Damien <ml@my.gd>:
> Hmmm, zfs destroy -f zroot/var/crash ?
>
> Then you can try to zfs mount -a
>
>
>
> Removing pjd and mm from cc, if they want to read your message they're old enough to check their ML subscription.
>
>
> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:
>
>> hi, i have raid1 on zfs with 2 device on pool
>> first device died and boot from second not working...
>>
>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool import
>> http://puu.sh/2402E
>>
>> when  i load zfs.ko and opensolaris.ko i see this message:
>> Solaris: WARNING: Can't open objset for zroot/var/crash
>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>
>> zpool status:
>> http://puu.sh/2405f
>>
>> resilvering freeze with:
>> zpool status -v
>>        .............
>>        zroot/usr:<0x28ff>
>>        zroot/usr:<0x29ff>
>>        zroot/usr:<0x2aff>
>>        zroot/var/crash:<0x0>
>> root@Flash:/root #
>>
>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn`t
>> remember) and mount other zfs points with my data
>> --
>> Best regards,
>> Konstantin Kuklin.
>
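The error list quoted above from zpool status -v uses entries of the form dataset:<0xobject>, which is how ZFS reports a damaged object when it cannot resolve a path for it. As a rough illustration only, the Python sketch below groups such entries by dataset to show which filesystems are affected. It assumes nothing beyond that line shape; the function name group_errors is made up for this example and is not part of any ZFS tool.

```python
import re
from collections import defaultdict

# One error entry as printed by "zpool status -v" when no path is available,
# e.g. "       zroot/usr:<0x28ff>". Assumed shape: dataset, colon, <0xHEX>.
ENTRY_RE = re.compile(r"^\s*(?P<dataset>[^\s:]+):<0x(?P<obj>[0-9a-fA-F]+)>\s*$")


def group_errors(status_text):
    """Return {dataset: [object ids]} for every dataset:<0x...> line found."""
    by_dataset = defaultdict(list)
    for line in status_text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            by_dataset[m["dataset"]].append(int(m["obj"], 16))
    return dict(by_dataset)


# The tail of the output quoted in this thread.
sample = """\
       .............
       zroot/usr:<0x28ff>
       zroot/usr:<0x29ff>
       zroot/usr:<0x2aff>
       zroot/var/crash:<0x0>
root@Flash:/root #
"""

if __name__ == "__main__":
    for dataset, objects in group_errors(sample).items():
        print(dataset, [hex(o) for o in objects])
```

On the quoted output this reports three damaged objects in zroot/usr and object 0x0 in zroot/var/crash, the dataset named in the "Can't open objset" warning.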


--
Best regards,
Konstantin Kuklin.

From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 09:20:37 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 26AAC6BA
 for <freebsd-fs@freebsd.org>; Mon, 18 Feb 2013 09:20:37 +0000 (UTC)
 (envelope-from ml@my.gd)
Received: from mail-we0-x233.google.com (mail-we0-x233.google.com
 [IPv6:2a00:1450:400c:c03::233])
 by mx1.freebsd.org (Postfix) with ESMTP id B7DE2738
 for <freebsd-fs@freebsd.org>; Mon, 18 Feb 2013 09:20:36 +0000 (UTC)
Received: by mail-we0-f179.google.com with SMTP id p43so2805207wea.10
 for <freebsd-fs@freebsd.org>; Mon, 18 Feb 2013 01:20:36 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=x-received:content-type:mime-version:subject:from:in-reply-to:date
 :cc:content-transfer-encoding:message-id:references:to:x-mailer
 :x-gm-message-state;
 bh=vmIm9LMYXGEY/CeIYC6vUXKe92lPLhLQI79IByJimGQ=;
 b=XxpkQkBr1x/SB3qm/KmXUckb/mNzNxML2wp9cnucNutA/ETP0MAdBEP8Zi2OZ2viY4
 mS0/qhbkRQtQ/4OCP+PhmuroTWlroGgpqpHZpMsvPDhLfB8e4HmJcsHG0JWvcwsFsPqd
 XPj0se8+Ttp+7QBYK6lNpLJXsG0BCXjQq4KdVU8QcMwtW+KxK6tFxj7ahG4a2A2m36rY
 3WxZmw6GU5lQeF0J1WUQbOi5Kp+Ty+t5/vozEa8eydkQuYeNZ53K7l7g4yfU81xXODWi
 27YjYFpGO6lAjb4So1wXYvmyT1pkmoBRlG+90fB1G3TjRnpHbI86wMntkcyk1WCcMdf/
 svCQ==
X-Received: by 10.180.108.3 with SMTP id hg3mr16623336wib.33.1361179235927;
 Mon, 18 Feb 2013 01:20:35 -0800 (PST)
Received: from [10.75.0.66] ([83.167.62.196])
 by mx.google.com with ESMTPS id j4sm16786695wiz.10.2013.02.18.01.20.25
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 18 Feb 2013 01:20:34 -0800 (PST)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: zfs raid1 error resilvering and mount
From: Fleuriot Damien <ml@my.gd>
In-Reply-To: <CAOrTs_a1CX2tiBe+1zzsSMw+TTqvfFJD=Dr-dUURa_4s4aSYCA@mail.gmail.com>
Date: Mon, 18 Feb 2013 10:20:26 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <B8CACAE9-64E5-40EE-821D-079524579C4F@my.gd>
References: <CAOrTs_aRL1cO3Jm1YrXoQRGoUX4vVqYKvFcTXbzyijYAhDsCVA@mail.gmail.com>
 <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>
 <CAOrTs_a1CX2tiBe+1zzsSMw+TTqvfFJD=Dr-dUURa_4s4aSYCA@mail.gmail.com>
To: Konstantin Kuklin <konstantin.kuklin@gmail.com>
X-Mailer: Apple Mail (2.1499)
X-Gm-Message-State: ALoCoQkGhu5zMU5dEYlGnnIQyupXcawdrhwa7Tas2mDHxCEKGqosY8hYij0ehZM3SSDCj9aExj5p
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 09:20:37 -0000

Reassure me here: you replaced your failed vdev before trying to resilver, right?

Your zpool status suggests otherwise, so I just want to make sure this status is from before you replaced the drive.


On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:

> i can`t do it, because resilvering in progress(freeze on 0.1%) and zfs
> list empty
>
> 2013/2/17 Fleuriot Damien <ml@my.gd>:
>> Hmmm, zfs destroy -f zroot/var/crash ?
>>
>> Then you can try to zfs mount -a
>>
>>
>>
>> Removing pjd and mm from cc, if they want to read your message they're old enough to check their ML subscription.
>>
>>
>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:
>>
>>> hi, i have raid1 on zfs with 2 device on pool
>>> first device died and boot from second not working...
>>>
>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool import
>>> http://puu.sh/2402E
>>>
>>> when  i load zfs.ko and opensolaris.ko i see this message:
>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>
>>> zpool status:
>>> http://puu.sh/2405f
>>>
>>> resilvering freeze with:
>>> zpool status -v
>>>       .............
>>>       zroot/usr:<0x28ff>
>>>       zroot/usr:<0x29ff>
>>>       zroot/usr:<0x2aff>
>>>       zroot/var/crash:<0x0>
>>> root@Flash:/root #
>>>
>>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn`t
>>> remember) and mount other zfs points with my data
>>> --
>>> Best regards,
>>> Konstantin Kuklin.
>>
>
>
>
> --
> Best regards,
> Konstantin Kuklin.


From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 11:06:44 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 149CD1C4
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 11:06:44 +0000 (UTC)
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 by mx1.freebsd.org (Postfix) with ESMTP id ECC42E1F
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 11:06:43 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1IB6h2t061518
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 11:06:43 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
 by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1IB6hFQ061516
 for freebsd-fs@FreeBSD.org; Mon, 18 Feb 2013 11:06:43 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 18 Feb 2013 11:06:43 GMT
Message-Id: <201302181106.r1IB6hFQ061516@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
 owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@freebsd.org>
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 11:06:44 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/176179  fs         [nfs] nfs client KASSERT: panic: attempt to set TDF_SB
o kern/176141  fs         [zfs] sharesmb=on makes errors for sharenfs, and still
o kern/175950  fs         [zfs] Possible deadlock in zfs after long uptime
o kern/175897  fs         [zfs] operations on readonly zpool hang
o kern/175179  fs         [zfs] ZFS may attach wrong device on move
o kern/175071  fs         [ufs] [panic] softdep_deallocate_dependencies: unrecov
o kern/174372  fs         [zfs] Pagefault appears to be related to ZFS
o kern/174315  fs         [zfs] chflags uchg not supported
o kern/174310  fs         [zfs] root point mounting broken on CURRENT with multi
o kern/174279  fs         [ufs] UFS2-SU+J journal and filesystem corruption
o kern/174060  fs         [ext2fs] Ext2FS system crashes (buffer overflow?)
o kern/173830  fs         [zfs] Brain-dead simple change to ZFS error descriptio
o kern/173718  fs         [zfs] phantom directory in zraid2 pool
f kern/173657  fs         [nfs] strange UID map with nfsuserd
o kern/173363  fs         [zfs] [panic] Panic on 'zpool replace' on readonly poo
o kern/173136  fs         [unionfs] mounting above the NFS read-only share panic
o kern/172348  fs         [unionfs] umount -f of filesystem in use with readonly
o kern/172334  fs         [unionfs] unionfs permits recursive union mounts; caus
o kern/171626  fs         [tmpfs] tmpfs should be noisier when the requested siz
o kern/171415  fs         [zfs] zfs recv fails with "cannot receive incremental 
o kern/170945  fs         [gpt] disk layout not portable between direct connect 
o bin/170778   fs         [zfs] [panic] FreeBSD panics randomly
o kern/170680  fs         [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA
o kern/170497  fs         [xfs][panic] kernel will panic whenever I ls a mounted
o kern/169945  fs         [zfs] [panic] Kernel panic while importing zpool (afte
o kern/169480  fs         [zfs] ZFS stalls on heavy I/O
o kern/169398  fs         [zfs] Can't remove file with permanent error
o kern/169339  fs         panic while " : > /etc/123"
o kern/169319  fs         [zfs] zfs resilver can't complete
o kern/168947  fs         [nfs] [zfs] .zfs/snapshot directory is messed up when 
o kern/168942  fs         [nfs] [hang] nfsd hangs after being restarted (not -HU
o kern/168158  fs         [zfs] incorrect parsing of sharenfs options in zfs (fs
o kern/167979  fs         [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste
o kern/167977  fs         [smbfs] mount_smbfs results are differ when utf-8 or U
o kern/167688  fs         [fusefs] Incorrect signal handling with direct_io
o kern/167685  fs         [zfs] ZFS on USB drive prevents shutdown / reboot
o kern/167612  fs         [portalfs] The portal file system gets stuck inside po
o kern/167272  fs         [zfs] ZFS Disks reordering causes ZFS to pick the wron
o kern/167260  fs         [msdosfs] msdosfs disk was mounted the second time whe
o kern/167109  fs         [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene
o kern/167105  fs         [nfs] mount_nfs can not handle source exports wiht mor
o kern/167067  fs         [zfs] [panic] ZFS panics the server
o kern/167065  fs         [zfs] boot fails when a spare is the boot disk
o kern/167048  fs         [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF
o kern/166912  fs         [ufs] [panic] Panic after converting Softupdates to jo
o kern/166851  fs         [zfs] [hang] Copying directory from the mounted UFS di
o kern/166477  fs         [nfs] NFS data corruption.
o kern/165950  fs         [ffs] SU+J and fsck problem
o kern/165923  fs         [nfs] Writing to NFS-backed mmapped files fails if flu
o kern/165521  fs         [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31
o kern/165392  fs         Multiple mkdir/rmdir fails with errno 31
o kern/165087  fs         [unionfs] lock violation in unionfs
o kern/164472  fs         [ufs] fsck -B panics on particular data inconsistency
o kern/164370  fs         [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261  fs         [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256  fs         [zfs] device entry for volume is not created after zfs
o kern/164184  fs         [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801  fs         [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770  fs         [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501  fs         [nfs] NFS exporting a dir and a subdir in that dir to 
o kern/162944  fs         [coda] Coda file system module looks broken in 9.0
o kern/162860  fs         [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751  fs         [zfs] [panic] kernel panics during file operations
o kern/162591  fs         [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519  fs         [zfs] "zpool import" relies on buggy realpath() behavi
o kern/162362  fs         [snapshots] [panic] ufs with snapshot(s) panics when g
o kern/161968  fs         [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161864  fs         [ufs] removing journaling from UFS partition fails on 
o bin/161807   fs         [patch] add option for explicitly specifying metadata 
o kern/161579  fs         [smbfs] FreeBSD sometimes panics when an smb share is 
o kern/161533  fs         [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161438  fs         [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424  fs         [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280  fs         [zfs] Stack overflow in gptzfsboot
o kern/161205  fs         [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169  fs         [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112  fs         [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893  fs         [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860  fs         [ufs] Random UFS root filesystem corruption with SU+J 
o kern/160801  fs         [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790  fs         [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777  fs         [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706  fs         [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591  fs         [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410  fs         [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283  fs         [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930  fs         [ufs] [panic] kernel core
o kern/159402  fs         [zfs][loader] symlinks cause I/O errors
o kern/159357  fs         [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs         [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs         [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs         [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077  fs         [zfs] Can't cd .. with latest zfs version
o kern/159048  fs         [smbfs] smb mount corrupts large files
o kern/159045  fs         [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs         [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs         amd(8) ICMP storm and unkillable process.
o kern/158231  fs         [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs         [nfs] NFS slow read
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and 
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
p kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No 
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector 
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation' 
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o bin/153142   fs         [zfs] ls -l outputs `ls: ./.zfs: Operation not support
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name 
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a 
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate 
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris <sys/nvpair.h> installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities 
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt 
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an 
f bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on 
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
p kern/133174  fs         [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: 
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o kern/118318  fs         [nfs] NFS server hangs under special circumstances
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs         [nfs] [patch] Poor NFS server write performance
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with 
o kern/116583  fs         [ffs] [hang] System freezes for short time when using 
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created 
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear 
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time 
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits 
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

300 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 19:05:59 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 5A78243E;
 Mon, 18 Feb 2013 19:05:59 +0000 (UTC)
 (envelope-from eadler@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 by mx1.freebsd.org (Postfix) with ESMTP id 3603EFD3;
 Mon, 18 Feb 2013 19:05:59 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1IJ5xFD056138;
 Mon, 18 Feb 2013 19:05:59 GMT
 (envelope-from eadler@freefall.freebsd.org)
Received: (from eadler@localhost)
 by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1IJ5xID056134;
 Mon, 18 Feb 2013 19:05:59 GMT (envelope-from eadler)
Date: Mon, 18 Feb 2013 19:05:59 GMT
Message-Id: <201302181905.r1IJ5xID056134@freefall.freebsd.org>
To: eadler@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: eadler@FreeBSD.org
Subject: Re: bin/176253: zfs pool indentation is misleading/wrong
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 19:05:59 -0000

Synopsis: zfs pool indentation is misleading/wrong

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: eadler
Responsible-Changed-When: Mon Feb 18 19:05:40 UTC 2013
Responsible-Changed-Why: 
over to appropriate list

http://www.freebsd.org/cgi/query-pr.cgi?pr=176253

From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 19:10:01 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 59CD85DE
 for <freebsd-fs@smarthost.ysv.freebsd.org>;
 Mon, 18 Feb 2013 19:10:01 +0000 (UTC)
 (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 by mx1.freebsd.org (Postfix) with ESMTP id 470FDAD
 for <freebsd-fs@smarthost.ysv.freebsd.org>;
 Mon, 18 Feb 2013 19:10:01 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1IJA0vk056295
 for <freebsd-fs@freefall.freebsd.org>; Mon, 18 Feb 2013 19:10:01 GMT
 (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
 by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1IJA0ES056291;
 Mon, 18 Feb 2013 19:10:00 GMT (envelope-from gnats)
Date: Mon, 18 Feb 2013 19:10:00 GMT
Message-Id: <201302181910.r1IJA0ES056291@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
Cc: 
From: Nathan Rich <n4th4nr1ch@gmail.com>
Subject: Re: misc/176253: zfs pool indentation is misleading/wrong
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: Nathan Rich <n4th4nr1ch@gmail.com>
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 19:10:01 -0000

The following reply was made to PR bin/176253; it has been noted by GNATS.

From: Nathan Rich <n4th4nr1ch@gmail.com>
To: bug-followup@freebsd.org, Nathan.Rich@dynastysystems.com
Cc:  
Subject: Re: misc/176253: zfs pool indentation is misleading/wrong
Date: Mon, 18 Feb 2013 12:03:28 -0700

 --e89a8f921a1ea22c3d04d60462c0
 Content-Type: text/plain; charset=ISO-8859-1
 
 Indentation doesn't show up in the PR, so here is clarification:
 the cache section is presented as a top-level item, as if it were a
 different zpool.
 
 --e89a8f921a1ea22c3d04d60462c0--

From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 19:12:49 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id EE689677
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 19:12:49 +0000 (UTC)
 (envelope-from jmg@h2.funkthat.com)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 by mx1.freebsd.org (Postfix) with ESMTP id ACD39C9
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 19:12:49 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id r1IJChGE024579
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 11:12:43 -0800 (PST)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id r1IJChgf024578
 for freebsd-fs@FreeBSD.org; Mon, 18 Feb 2013 11:12:43 -0800 (PST)
 (envelope-from jmg)
Date: Mon, 18 Feb 2013 11:12:42 -0800
From: John-Mark Gurney <jmg@funkthat.com>
To: freebsd-fs@FreeBSD.org
Subject: ZFS on 9.1 doesn't see errors on geli volumes...
Message-ID: <20130218191242.GI55866@funkthat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Mon, 18 Feb 2013 11:12:43 -0800 (PST)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 19:12:50 -0000

I'm running 9.1:
FreeBSD gold.funkthat.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #26 r241041M: Wed Dec 12 23:02:31 PST 2012     jmg@gold.funkthat.com:/usr/src.9stable/sys/amd64/compile/gold  amd64

The modifications are limited to improving AES-NI performance.

This is on a box where I decided to go full ZFS w/ geli encrypted volumes
(including root fs)...  One of the hard drives started going bad, so I
started seeing:
hpt27xx: Device error information 0x1000000
hpt27xx: Task file error, StatusReg=0x51, ErrReg=0x40, LBA[0-3]=0xf495e928,LBA[4-7]=0x0.
(da3:hpt27xx0:0:3:0): READ(10). CDB: 28 0 f4 95 e8 f8 0 0 80 0 
(da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed
(da3:hpt27xx0:0:3:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed label/toby.eli[READ(offset=2100974186496, length=90112)]

and:
(da3:hpt27xx0:0:3:0): WRITE(10). CDB: 2a 0 ef cc 10 90 0 0 8 0 
(da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed
(da3:hpt27xx0:0:3:0): Error 5, Unretryable error
GEOM_ELI: Crypto WRITE request failed (error=5). label/toby.eli[WRITE(offset=2059841654784, length=4096)]
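
For reference, the CDB bytes in the CAM lines above can be decoded by hand.
The following is a minimal sketch, assuming the standard 10-byte
READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2..5,
big-endian block count in bytes 7..8) and a 512-byte sector size; the names
cdb10, cdb10_decode and lba_to_offset are made up for illustration and are
not CAM functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a 10-byte READ(10)/WRITE(10) CDB as printed by CAM. */
struct cdb10 {
	uint8_t  opcode;   /* 0x28 = READ(10), 0x2a = WRITE(10) */
	uint32_t lba;      /* bytes 2..5, big-endian */
	uint16_t nblocks;  /* bytes 7..8, big-endian */
};

/* Hypothetical helper: pull opcode, LBA and block count out of a CDB. */
static struct cdb10
cdb10_decode(const uint8_t cdb[10])
{
	struct cdb10 d;

	assert(cdb != NULL);
	d.opcode = cdb[0];
	d.lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	    ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	d.nblocks = (uint16_t)(((uint16_t)cdb[7] << 8) | cdb[8]);
	return (d);
}

/* Byte offset of an LBA; the sector size is an assumption (512 above). */
static uint64_t
lba_to_offset(uint32_t lba, uint32_t sector_size)
{
	return ((uint64_t)lba * sector_size);
}
```

Under those assumptions, the READ CDB "28 0 f4 95 e8 f8 0 0 80 0" decodes to
LBA 0xf495e8f8 with 0x80 blocks, and 0xf495e8f8 * 512 is 2100974186496, the
same number as the offset in the GEOM_ELI READ line. The LBA reported in the
task file error, 0xf495e928, is 0x30 sectors past that start.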

So we can see that geli is failing, but zpool status command doesn't show
any errors at all...  The READ and WRITE columns both show 0 for the device..

Now I do know that the WRITEs are not retried, because if I do a scrub
afterward, it detects cksum errors and properly increases the
count in the CKSUM column...

Now if I pull a device, it will see that the device is lost, but no
matter how many read or write errors get returned by geli, zfs doesn't
seem to count them...

Has anyone else seen this w/ ZFS?  Is it possible that it's a problem w/
geli, and not ZFS?

I haven't tried to run a test w/ gnop to fail some read/writes on -current..

P.S. Please keep me cc'd, as I'm not on the list.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 20:01:21 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id EEA29538;
 Mon, 18 Feb 2013 20:01:21 +0000 (UTC)
 (envelope-from jmg@h2.funkthat.com)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 by mx1.freebsd.org (Postfix) with ESMTP id B04B52AD;
 Mon, 18 Feb 2013 20:01:21 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id r1IK1LTl025235
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Mon, 18 Feb 2013 12:01:21 -0800 (PST)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id r1IK1L1W025234;
 Mon, 18 Feb 2013 12:01:21 -0800 (PST) (envelope-from jmg)
Date: Mon, 18 Feb 2013 12:01:21 -0800
From: John-Mark Gurney <jmg@funkthat.com>
To: freebsd-fs@FreeBSD.org
Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes...
Message-ID: <20130218200121.GJ55866@funkthat.com>
References: <20130218191242.GI55866@funkthat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130218191242.GI55866@funkthat.com>
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Mon, 18 Feb 2013 12:01:21 -0800 (PST)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 20:01:22 -0000

John-Mark Gurney wrote this message on Mon, Feb 18, 2013 at 11:12 -0800:
> I'm running 9.1:
> FreeBSD gold.funkthat.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #26 r241041M: Wed Dec 12 23:02:31 PST 2012     jmg@gold.funkthat.com:/usr/src.9stable/sys/amd64/compile/gold  amd64
> 
> The modifications are limited to improving AES-NI performance.
> 
> On a box, and decided to go full ZFS w/ geli encrypted volumes (including
> root fs)...  One of the hard drives started going bad, so I started
> seeing:
> hpt27xx: Device error information 0x1000000
> hpt27xx: Task file error, StatusReg=0x51, ErrReg=0x40, LBA[0-3]=0xf495e928,LBA[4-7]=0x0.
> (da3:hpt27xx0:0:3:0): READ(10). CDB: 28 0 f4 95 e8 f8 0 0 80 0 
> (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed
> (da3:hpt27xx0:0:3:0): Error 5, Unretryable error
> GEOM_ELI: g_eli_read_done() failed label/toby.eli[READ(offset=2100974186496, length=90112)]
> 
> and:
> (da3:hpt27xx0:0:3:0): WRITE(10). CDB: 2a 0 ef cc 10 90 0 0 8 0 
> (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed
> (da3:hpt27xx0:0:3:0): Error 5, Unretryable error
> GEOM_ELI: Crypto WRITE request failed (error=5). label/toby.eli[WRITE(offset=2059841654784, length=4096)]
> 
> So we can see that geli is failing, but zpool status command doesn't show
> any errors at all...  The READ and WRITE columns both show 0 for the device..
> 
> Now I do know that the WRITEs are not retried, because if I do a scrub
> afterward, it detects cksum errors and properly increases the
> count in the CKSUM column...
> 
> Now if I pull a device, it will see that the device is lost, but no
> matter how many read or write errors get returned by geli, zfs doesn't
> seem to count them...
> 
> Has anyone else seen this w/ ZFS?  Is it possible that it's a problem w/
> geli, and not ZFS?
> 
> I haven't tried to run a test w/ gnop to fail some read/writes on -current..
> 
> P.S. Please keep me cc'd, as I'm not on the list.

Well, after some digging w/ help from smh@, it looks like the write
case in geli is broken...  in g_eli_write_done, we have the code:
        if (pbp->bio_error != 0) {
                G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
                    pbp->bio_error);
                pbp->bio_completed = 0;
        }
        /*
         * Write is finished, send it up.
         */
        pbp->bio_completed = pbp->bio_length;
        sc = pbp->bio_to->geom->softc;
        g_io_deliver(pbp, pbp->bio_error);
        atomic_subtract_int(&sc->sc_inflight, 1);

so, we just end up overwriting bio_completed with bio_length, even in
the error case...

pjd, should we just put the bio_completed = line under an else?

something like:
	if (pbp->bio_error != 0) {
		G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
		    pbp->bio_error);
		pbp->bio_completed = 0;
	} else
		pbp->bio_completed = pbp->bio_length;

	/* Write is finished, send it up. */
	g_io_deliver(pbp, pbp->bio_error);
	sc = pbp->bio_to->geom->softc;
	atomic_subtract_int(&sc->sc_inflight, 1);
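
To see the difference outside the kernel, here is a minimal userland sketch.
It is a toy model, not the geli code: struct toy_bio and both functions are
hypothetical stand-ins that keep only the three fields involved. Note that
the error code itself is still handed to g_io_deliver() in both versions
above; what changes is the value left in bio_completed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the struct bio fields that matter here. */
struct toy_bio {
	int  bio_error;      /* 0 on success, errno value on failure */
	long bio_length;     /* bytes requested */
	long bio_completed;  /* bytes reported as done */
};

/* Current behaviour: the unconditional assignment always wins. */
static void
toy_write_done_current(struct toy_bio *pbp)
{
	assert(pbp != NULL);
	if (pbp->bio_error != 0)
		pbp->bio_completed = 0;
	pbp->bio_completed = pbp->bio_length;
}

/* Proposed behaviour: bio_completed stays 0 when the write failed. */
static void
toy_write_done_proposed(struct toy_bio *pbp)
{
	assert(pbp != NULL);
	if (pbp->bio_error != 0)
		pbp->bio_completed = 0;
	else
		pbp->bio_completed = pbp->bio_length;
}
```

With bio_error set to 5 and bio_length set to 4096, the first function leaves
bio_completed at 4096 and the second leaves it at 0; with bio_error 0 both
report 4096.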

But that doesn't explain why reads aren't being counted, though...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 21:14:32 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 89B4C969
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 21:14:32 +0000 (UTC)
 (envelope-from smh@freebsd.org)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
 by mx1.freebsd.org (Postfix) with ESMTP id 0715C80F
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 21:14:31 +0000 (UTC)
X-Spam-Processed: mail1.multiplay.co.uk, Mon, 18 Feb 2013 21:14:22 +0000
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
 mail1.multiplay.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST
 shortcircuit=ham autolearn=disabled version=3.2.5
Received: from r2d2 ([188.220.16.49])
 by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
 (MDaemon PRO v10.0.4) with ESMTP id md50002285358.msg
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 21:14:22 +0000
X-MDRemoteIP: 188.220.16.49
X-Return-Path: smh@freebsd.org
X-Envelope-From: smh@freebsd.org
X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.org
Message-ID: <EF8E2A888F8D44FBA7362610150DAEA1@multiplay.co.uk>
From: "Steven Hartland" <smh@freebsd.org>
To: "John-Mark Gurney" <jmg@funkthat.com>,
	<freebsd-fs@FreeBSD.org>
References: <20130218191242.GI55866@funkthat.com>
 <20130218200121.GJ55866@funkthat.com>
Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes...
Date: Mon, 18 Feb 2013 21:14:35 -0000
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="----=_NextPart_000_0B02_01CE0E1C.F42EBD20"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 21:14:32 -0000

This is a multi-part message in MIME format.

------=_NextPart_000_0B02_01CE0E1C.F42EBD20
Content-Type: text/plain;
	format=flowed;
	charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit


----- Original Message ----- 
From: "John-Mark Gurney" <jmg@funkthat.com>
To: <freebsd-fs@FreeBSD.org>
Sent: Monday, February 18, 2013 8:01 PM
Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes...


> John-Mark Gurney wrote this message on Mon, Feb 18, 2013 at 11:12 -0800:
>> I'm running 9.1:
>> FreeBSD gold.funkthat.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #26 r241041M: Wed Dec 12 23:02:31 PST 2012 
>> jmg@gold.funkthat.com:/usr/src.9stable/sys/amd64/compile/gold  amd64
>>
>> The modifications are limited to improving AES-NI performance.
>>
>> On a box, and decided to go full ZFS w/ geli encrypted volumes (including
>> root fs)...  One of the hard drives started going bad, so I started
>> seeing:
>> hpt27xx: Device error information 0x1000000
>> hpt27xx: Task file error, StatusReg=0x51, ErrReg=0x40, LBA[0-3]=0xf495e928,LBA[4-7]=0x0.
>> (da3:hpt27xx0:0:3:0): READ(10). CDB: 28 0 f4 95 e8 f8 0 0 80 0
>> (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed
>> (da3:hpt27xx0:0:3:0): Error 5, Unretryable error
>> GEOM_ELI: g_eli_read_done() failed label/toby.eli[READ(offset=2100974186496, length=90112)]
>>
>> and:
>> (da3:hpt27xx0:0:3:0): WRITE(10). CDB: 2a 0 ef cc 10 90 0 0 8 0
>> (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed
>> (da3:hpt27xx0:0:3:0): Error 5, Unretryable error
>> GEOM_ELI: Crypto WRITE request failed (error=5). label/toby.eli[WRITE(offset=2059841654784, length=4096)]
>>
>> So we can see that geli is failing, but zpool status command doesn't show
>> any errors at all...  The READ and WRITE columns both show 0 for the device..
>>
>> Now I do know that the WRITEs are not retried, because if I do a scrub
>> afterward, it detects cksum errors and properly increases the
>> count in the CKSUM column...
>>
>> Now if I pull a device, it will see that the device is lost, but no
>> matter how many read or write errors get returned by geli, zfs doesn't
>> seem to count them...
>>
>> Has anyone else seen this w/ ZFS?  Is it possible that it's a problem w/
>> geli, and not ZFS?
>>
>> I haven't tried to run a test w/ gnop to fail some read/writes on -current..
>>
>> P.S. Please keep me cc'd, as I'm not on the list.
>
> Well, after some digging w/ help from smh@, it looks like the write
> case in geli is broken...  in g_eli_write_done, we have the code:
>        if (pbp->bio_error != 0) {
>                G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
>                    pbp->bio_error);
>                pbp->bio_completed = 0;
>        }
>        /*
>         * Write is finished, send it up.
>         */
>        pbp->bio_completed = pbp->bio_length;
>        sc = pbp->bio_to->geom->softc;
>        g_io_deliver(pbp, pbp->bio_error);
>        atomic_subtract_int(&sc->sc_inflight, 1);
>
> so, we just end up overwriting the bio_completed error...
>
> pjd, should we just put the bio_completed = line under an else?
>
> something like:
> 	if (pbp->bio_error != 0) {
> 		G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
> 		    pbp->bio_error);
> 		pbp->bio_completed = 0;
> 	} else
> 		pbp->bio_completed = pbp->bio_length;
>
> 	/* Write is finished, send it up. */
> 	g_io_deliver(pbp, pbp->bio_error);
> 	sc = pbp->bio_to->geom->softc;
> 	atomic_subtract_int(&sc->sc_inflight, 1);
>
> But that doesn't explain why reads aren't being counted, though...

Looks like the read case will lose the error if it's not the last
bio in the sector group.

The attached should fix both cases.
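
The rule the patch uses for collecting child errors (keep the first non-zero
error on the parent, and only finish once every child has come back) can be
modelled in a few lines. This is a sketch only: struct toy_parent and
toy_child_done are hypothetical, the field names merely mirror struct bio,
and nothing here is taken from g_eli.c beyond the shape of that one test.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the parent request that several child requests report to. */
struct toy_parent {
	int bio_error;     /* first non-zero child error, else 0 */
	int bio_children;  /* number of child requests issued */
	int bio_inbed;     /* number of child requests finished */
};

/*
 * Called once per finished child. Returns 1 when the last child has
 * arrived and the parent can be delivered, 0 otherwise.
 */
static int
toy_child_done(struct toy_parent *pbp, int child_error)
{
	assert(pbp != NULL);
	/* Only a real error may be recorded, and only the first one. */
	if (pbp->bio_error == 0 && child_error != 0)
		pbp->bio_error = child_error;
	pbp->bio_inbed++;
	return (pbp->bio_inbed >= pbp->bio_children);
}
```

For a parent with three children where only the middle one fails with error
5, the parent still carries error 5 when the third (successful) child
completes it.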

A question for someone familiar with geom: why is bio_completed
not set to bio_length in the read success case? Is this correct
or is this another little bug?

On a related note, if anyone's got some pointers to docs about
the internals of geom, I'd be interested :)

    Regards
    Steve

------=_NextPart_000_0B02_01CE0E1C.F42EBD20
Content-Type: application/octet-stream;
	name="g_eli-error-loss.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="g_eli-error-loss.patch"

--- sys/geom/eli/g_eli.c.orig	2013-02-18 20:50:53.838663732 +0000=0A=
+++ sys/geom/eli/g_eli.c	2013-02-18 21:04:59.429602837 +0000=0A=
@@ -158,7 +158,7 @@=0A=
 =0A=
 	G_ELI_LOGREQ(2, bp, "Request done.");=0A=
 	pbp =3D bp->bio_parent;=0A=
-	if (pbp->bio_error =3D=3D 0)=0A=
+	if (pbp->bio_error =3D=3D 0 && bp->bio_error !=3D 0)=0A=
 		pbp->bio_error =3D bp->bio_error;=0A=
 	g_destroy_bio(bp);=0A=
 	/*=0A=
@@ -169,7 +169,8 @@=0A=
 		return;=0A=
 	sc =3D pbp->bio_to->geom->softc;=0A=
 	if (pbp->bio_error !=3D 0) {=0A=
-		G_ELI_LOGREQ(0, pbp, "%s() failed", __func__);=0A=
+		G_ELI_LOGREQ(0, pbp, "%s() failed (error=3D%d)", __func__,=0A=
+		    pbp->bio_error);=0A=
 		pbp->bio_completed =3D 0;=0A=
 		if (pbp->bio_driver2 !=3D NULL) {=0A=
 			free(pbp->bio_driver2, M_ELI);=0A=
@@ -198,10 +199,8 @@=0A=
 =0A=
 	G_ELI_LOGREQ(2, bp, "Request done.");=0A=
 	pbp =3D bp->bio_parent;=0A=
-	if (pbp->bio_error =3D=3D 0) {=0A=
-		if (bp->bio_error !=3D 0)=0A=
-			pbp->bio_error =3D bp->bio_error;=0A=
-	}=0A=
+	if (pbp->bio_error =3D=3D 0 && bp->bio_error !=3D 0)=0A=
+		pbp->bio_error =3D bp->bio_error;=0A=
 	g_destroy_bio(bp);=0A=
 	/*=0A=
 	 * Do we have all sectors already?=0A=
@@ -212,14 +211,15 @@=0A=
 	free(pbp->bio_driver2, M_ELI);=0A=
 	pbp->bio_driver2 =3D NULL;=0A=
 	if (pbp->bio_error !=3D 0) {=0A=
-		G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=3D%d).",=0A=
+		G_ELI_LOGREQ(0, pbp, "%s() failed (error=3D%d)", __func__,=0A=
 		    pbp->bio_error);=0A=
 		pbp->bio_completed =3D 0;=0A=
-	}=0A=
+	} else=0A=
+		pbp->bio_completed =3D pbp->bio_length;=0A=
+=0A=
 	/*=0A=
 	 * Write is finished, send it up.=0A=
 	 */=0A=
-	pbp->bio_completed =3D pbp->bio_length;=0A=
 	sc =3D pbp->bio_to->geom->softc;=0A=
 	g_io_deliver(pbp, pbp->bio_error);=0A=
 	atomic_subtract_int(&sc->sc_inflight, 1);=0A=

------=_NextPart_000_0B02_01CE0E1C.F42EBD20--


From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 22:38:18 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id C2F2D10C
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 22:38:18 +0000 (UTC)
 (envelope-from pawel@dawidek.net)
Received: from mail.dawidek.net (garage.dawidek.net [91.121.88.72])
 by mx1.freebsd.org (Postfix) with ESMTP id 5F07AAFC
 for <freebsd-fs@FreeBSD.org>; Mon, 18 Feb 2013 22:38:18 +0000 (UTC)
Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149])
 by mail.dawidek.net (Postfix) with ESMTPSA id F03E6408;
 Mon, 18 Feb 2013 23:35:22 +0100 (CET)
Date: Mon, 18 Feb 2013 23:39:23 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: John-Mark Gurney <jmg@funkthat.com>
Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes...
Message-ID: <20130218223923.GB1375@garage.freebsd.pl>
References: <20130218191242.GI55866@funkthat.com>
 <20130218200121.GJ55866@funkthat.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="rS8CxjVDS/+yyDmU"
Content-Disposition: inline
In-Reply-To: <20130218200121.GJ55866@funkthat.com>
X-OS: FreeBSD 10.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 22:38:18 -0000


--rS8CxjVDS/+yyDmU
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Feb 18, 2013 at 12:01:21PM -0800, John-Mark Gurney wrote:
> John-Mark Gurney wrote this message on Mon, Feb 18, 2013 at 11:12 -0800:
> > I'm running 9.1:
> > FreeBSD gold.funkthat.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #26 r24=
1041M: Wed Dec 12 23:02:31 PST 2012     jmg@gold.funkthat.com:/usr/src.9sta=
ble/sys/amd64/compile/gold  amd64
> >=20
> > The modifications are limited to improving AES-NI performance.
> >=20
> > On a box, and decided to go full ZFS w/ geli encrypted volumes (includi=
ng
> > root fs)...  One of the hard drives started going bad, so I started
> > seeing:
> > hpt27xx: Device error information 0x1000000
> > hpt27xx: Task file error, StatusReg=3D0x51, ErrReg=3D0x40, LBA[0-3]=3D0=
xf495e928,LBA[4-7]=3D0x0.
> > (da3:hpt27xx0:0:3:0): READ(10). CDB: 28 0 f4 95 e8 f8 0 0 80 0=20
> > (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed
> > (da3:hpt27xx0:0:3:0): Error 5, Unretryable error
> > GEOM_ELI: g_eli_read_done() failed label/toby.eli[READ(offset=3D2100974=
186496, length=3D90112)]
> >=20
> > and:
> > (da3:hpt27xx0:0:3:0): WRITE(10). CDB: 2a 0 ef cc 10 90 0 0 8 0=20
> > (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed
> > (da3:hpt27xx0:0:3:0): Error 5, Unretryable error
> > GEOM_ELI: Crypto WRITE request failed (error=3D5). label/toby.eli[WRITE=
(offset=3D2059841654784, length=3D4096)]
> >=20
> > So we can see that geli is failing, but zpool status command doesn't sh=
ow
> > any errors at all...  The READ and WRITE columns both show 0 for the de=
vice..
> >=20
> > Now I do know that the WRITEs are not retried, because if I do a scrub
> > afterward, it detects cksum errors, and does properly increases the
> > count in the CKSUM column...
> >=20
> > Now if I pull a device, it will see that the device is lost, but no
> > matter how many read or write errors get returned by geli, zfs doesn't
> > seem to count them...
> >=20
> > Has anyone else seen this w/ ZFS?  Is it possible that it's a problem w/
> > geli, and not ZFS?
> >=20
> > I haven't tried to run a test w/ gnop to fail some read/writes on -curr=
ent..
> >=20
> > P.S. Please keep me cc'd, as I'm not on the list.
>=20
> Well, after some digging w/ help from smh@, it looks like the write
> case in geli is broken...  in g_eli_write_done, we have the code:
>         if (pbp->bio_error !=3D 0) {
>                 G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=
=3D%d).",
>                     pbp->bio_error);
>                 pbp->bio_completed =3D 0;
>         }
>         /*
>          * Write is finished, send it up.
>          */
>         pbp->bio_completed =3D pbp->bio_length;
>         sc =3D pbp->bio_to->geom->softc;
>         g_io_deliver(pbp, pbp->bio_error);
>         atomic_subtract_int(&sc->sc_inflight, 1);
>=20
> so, we just end up overwriting the bio_completed error...
>=20
> pjd, should we just put the bio_completed =3D line under an else?
>=20
> something like:
> 	if (pbp->bio_error !=3D 0) {
> 		G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=3D%d).",
> 		    pbp->bio_error);
> 		pbp->bio_completed =3D 0;
> 	} else
> 		pbp->bio_completed =3D pbp->bio_length;
>=20
> 	/* Write is finished, send it up. */
> 	g_io_deliver(pbp, pbp->bio_error);
> 	sc =3D pbp->bio_to->geom->softc;
> 	atomic_subtract_int(&sc->sc_inflight, 1);
>=20
> But doesn't explain why read's aren't being counted though...

Your patch looks correct (but add { } around else content before
committing).

The logic in vdev_geom.c should also be modified to treat other errors
just like EIOs.

This all still doesn't explain what you are seeing, as you did have
EIOs. Experimenting with gnop may provide more info.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl

--rS8CxjVDS/+yyDmU
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlEirZsACgkQForvXbEpPzT9/ACgz5v6IEAct2VE77NxS6TBo+YP
IWoAn3EAyT2rnsSKSIGacppcyFc193iM
=oeJ/
-----END PGP SIGNATURE-----

--rS8CxjVDS/+yyDmU--

From owner-freebsd-fs@FreeBSD.ORG  Mon Feb 18 23:57:00 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 71C2D275
 for <freebsd-fs@freebsd.org>; Mon, 18 Feb 2013 23:57:00 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 1C936E64
 for <freebsd-fs@freebsd.org>; Mon, 18 Feb 2013 23:56:59 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqEEANC+IlGDaFvO/2dsb2JhbABEhkm5W4Ebc4IfAQEBAwEBAQEgKyALBRYYAgINGQIpAQkmBggHBAEcBIdrBgyueJI2gSOMOhAGBIEDNAeCLYETA4hniw2COIEdjzuDJU99CBce
X-IronPort-AV: E=Sophos;i="4.84,691,1355115600"; d="scan'208";a="14644626"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-annu.net.uoguelph.ca with ESMTP; 18 Feb 2013 18:56:58 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id F2B75B3F45;
 Mon, 18 Feb 2013 18:56:58 -0500 (EST)
Date: Mon, 18 Feb 2013 18:56:58 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Momchil Ivanov <momchil@xaxo.eu>
Message-ID: <1794994447.3103158.1361231818953.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <86bobtmvb0.wl%momchil@xaxo.eu>
Subject: Re: NFS + Kerberos
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Feb 2013 23:57:00 -0000

Momchil Ivanov wrote:
> Hello,
> 
> I have been trying to follow this guide [1] to get NFS with Kerberos
> working on FreeBSD, but I have some trouble. I hope somebody has the
> time and desire to help me...
> 
> I am using FreeBSD 9.1 as NFS server with the following configuration
> on the server:
> 
> file /etc/krb5.conf:
> 
> [libdefaults]
> default_realm = EXAMPLE.LOCAL
> default_etypes = des-cbc-crc
> default_etypes_des = des-cbc-crc
> allow_weak_crypto = true
> [realms]
> EXAMPLE.LOCAL = {
> kdc = kerberos.example.local
> admin_server = kerberos.example.local
> }
> [domain_realm]
> .example.local = EXAMPLE.LOCAL
> 
> file /etc/exports:
> 
> V4: / -sec=krb5i:krb5p
> /tank/storage -sec=krb5i:krb5p
> 
> file /etc/rc.conf:
> 
> ## nfsv4
> nfs_server_enable="YES"
> nfsv4_server_enable="YES"
> nfsuserd_enable="YES"
> mountd_enable="YES"
> mountd_flags="-r -n"
> 
> # for kerberos
> gssd_enable="YES"
> 
> kerberos seems to be working:
> 
> root@srv:/root # kinit -k nfs/srv.example.local
> root@srv:/root # klist
> Credentials cache: FILE:/tmp/krb5cc_0
> Principal: nfs/srv.example.local@EXAMPLE.LOCAL
> 
> Issued Expires Principal
> Feb 2 21:04:02 Feb 3 07:04:02 krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> root@srv:/root # kdestroy
> root@srv:/root # ktutil list
> FILE:/etc/krb5.keytab:
> 
> Vno Type Principal
> 1 des-cbc-crc nfs/srv.example.local@EXAMPLE.LOCAL
> 
> krb4:/etc/srvtab:
> 
> Vno Type Principal
> 
> the client is FreeBSD 8.2 with the following configuration:
> 
> file /etc/krb5.conf:
> 
> [libdefaults]
> default_realm = EXAMPLE.LOCAL
> default_etypes = des-cbc-crc
> default_etypes_des = des-cbc-crc
> allow_weak_crypto = true
> [realms]
> EXAMPLE.LOCAL = {
> kdc = kerberos.example.local
> admin_server = kerberos.example.local
> }
> [domain_realm]
> .example.local = EXAMPLE.LOCAL
> 
> file /etc/rc.conf:
> 
> ## NFS v4
> nfsuserd_enable="YES"
> nfscbd_enable="YES"
> # kerberos
> gssd_enable="YES"
> 
> file /etc/sysctl.conf:
> # Allow normal users to mount filesystems.
> vfs.usermount=1
> 
> here is the output from the client:
> 
> $ klist
> klist: No ticket file: /tmp/krb5cc_1001
> 
> $ mount -t nfs -o nfsv4,soft,sec=krb5i srv.example.local:/tank/storage
> /mnt/srv
> mount_nfs: can't update /var/db/mounttab for
> srv.example.local:/tank/storage
> nfsv4 err=10016
> mount_nfs: /mnt/srv, : Input/output error
> 
> then I do:
> 
> $ kinit user
> $ klist
> Credentials cache: FILE:/tmp/krb5cc_1001
> Principal: user@EXAMPLE.LOCAL
> 
> Issued Expires Principal
> Feb 2 21:15:36 Feb 3 07:15:33 krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> 
> $ mount -t nfs -o nfsv4,soft,sec=krb5i srv.example.local:/tank/storage
> /mnt/srv
> mount_nfs: can't update /var/db/mounttab for
> srv.example.local:/tank/storage
> nfsv4 err=10016
> mount_nfs: /mnt/srv, : Input/output error
> 
> $ klist
> Credentials cache: FILE:/tmp/krb5cc_1001
> Principal: user@EXAMPLE.LOCAL
> 
> Issued Expires Principal
> Feb 2 21:15:36 Feb 3 07:15:33 krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> Feb 2 21:15:43 Feb 3 07:15:33 nfs/srv.example.local@EXAMPLE.LOCAL
> 
> Note: the mount works without Kerberos if I add "sys" to the "sec"
> option on both lines of /etc/exports, ownership works too, therefore I
> think that nfsv4 works, nfsv3 works too. However I have no idea why
> they don't work with Kerberos.
> 
> Note: With and without a kerberos ticket, the result when using nfsv3
> is:
> 
> $ mount -t nfs -o nfsv3,soft,sec=krb5i srv.example.local:/tank/storage
> /mnt/srv
> mount_nfs: can't update /var/db/mounttab for
> srv.example.local:/tank/storage
> 
> $ ls /mnt/srv
> ls: /mnt/srv: Permission denied
> 
> Is there an easy way to get it working? Am I doing something wrong?
> 
Thanks to Elias's hard work, a bug has just been isolated in the
Kerberos library that causes gssd to fail to translate a principal
to a uid. The fix is to increase the size of the buffer passed to
getpwnam_r(). See this thread:
http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw

I haven't run into this bug, so I don't know what systems are affected,
but it would explain why you can't get it working.

I'd suggest you apply the patch in the email (increase buf to 1024) and
then try again with libraries built with the patch.
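
For anyone unfamiliar with this failure mode, here is a minimal sketch of
a getpwnam_r() caller that does not depend on one fixed buffer size and
instead retries when the call reports ERANGE. The helper name lookup_uid,
the 128-byte starting size and the 1 MB cap are made up for illustration.
This is not the patch from the thread above, which simply enlarges the
buffer to 1024.

```c
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not taken from the Kerberos library): map a user
 * name to a uid, growing the scratch buffer whenever getpwnam_r()
 * reports ERANGE rather than failing on a buffer that is too small.
 * Returns 0 on success, ENOENT if no such user exists, else an errno.
 */
int
lookup_uid(const char *name, uid_t *uid)
{
	struct passwd pwd, *result = NULL;
	size_t buflen = 128;	/* deliberately small start (assumption) */
	char *buf = NULL, *nbuf;
	int error;

	for (;;) {
		nbuf = realloc(buf, buflen);
		if (nbuf == NULL) {
			free(buf);
			return (ENOMEM);
		}
		buf = nbuf;
		error = getpwnam_r(name, &pwd, buf, buflen, &result);
		if (error != ERANGE || buflen >= 1024 * 1024)
			break;	/* done, or give up at an arbitrary 1 MB cap */
		buflen *= 2;	/* entry did not fit; retry with more room */
	}
	if (error == 0 && result == NULL)
		error = ENOENT;	/* lookup ran, but found no matching entry */
	if (error == 0)
		*uid = pwd.pw_uid;
	free(buf);
	return (error);
}
```

A caller that treats any non-zero return as "no mapping" would hit the
symptom described above whenever the buffer is too small for the entry.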

rick

> PS: Please CC me, since I am not subscribed.
> 
> 1: http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
> 
> Regards,
> Momchil
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 07:08:36 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id CCF6D7D4;
 Tue, 19 Feb 2013 07:08:36 +0000 (UTC)
 (envelope-from alfred@ixsystems.com)
Received: from mail.iXsystems.com (newknight.ixsystems.com [206.40.55.70])
 by mx1.freebsd.org (Postfix) with ESMTP id 8E9DF99;
 Tue, 19 Feb 2013 07:08:36 +0000 (UTC)
Received: from localhost (mail.ixsystems.com [10.2.55.1])
 by mail.iXsystems.com (Postfix) with ESMTP id DB4BF80FF;
 Mon, 18 Feb 2013 23:08:35 -0800 (PST)
Received: from mail.iXsystems.com ([10.2.55.1])
 by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP
 id 90308-08; Mon, 18 Feb 2013 23:08:35 -0800 (PST)
Received: from Alfreds-MacBook-Pro-9.local (unknown [10.8.0.26])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mail.iXsystems.com (Postfix) with ESMTPSA id 617B280EC;
 Mon, 18 Feb 2013 23:08:35 -0800 (PST)
Message-ID: <512324F2.4060707@ixsystems.com>
Date: Mon, 18 Feb 2013 23:08:34 -0800
From: Alfred Perlstein <alfred@ixsystems.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7;
 rv:17.0) Gecko/20130107 Thunderbird/17.0.2
MIME-Version: 1.0
To: Konstantin Belousov <kib@FreeBSD.org>, Doug Rabson <dfr@rabson.org>, 
 Xin Li <delphij@delphij.net>, fs@freebsd.org
Subject: Advisory lock crashes.
Content-Type: multipart/mixed; boundary="------------080502030802070307030108"
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 07:08:36 -0000

This is a multi-part message in MIME format.
--------------080502030802070307030108
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hello Konstantin & Doug,

We're getting a few crashes in what looks to be kern_lockf.c:

fault address here is 0x360 which appears to mean that the "sx" owner 
thread is NULL
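
A rough sketch of that arithmetic, for reference. The flag values are
written from memory of sys/sx.h and should be checked against the tree in
use. The struct is a stand-in for struct thread, and the 0x360 offset of
td_state is inferred from the fault address rather than read from the 9.0
headers. The helper name owner_deref_addr is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits as recalled from sys/sx.h; verify against your tree. */
#define SX_LOCK_SHARED			0x01
#define SX_LOCK_SHARED_WAITERS		0x02
#define SX_LOCK_EXCLUSIVE_WAITERS	0x04
#define SX_LOCK_RECURSED		0x08
#define SX_LOCK_FLAGMASK						\
	(SX_LOCK_SHARED | SX_LOCK_SHARED_WAITERS |			\
	SX_LOCK_EXCLUSIVE_WAITERS | SX_LOCK_RECURSED)
#define SX_OWNER(x)		((x) & ~(uint64_t)SX_LOCK_FLAGMASK)
#define SX_LOCK_DESTROYED						\
	(SX_LOCK_SHARED_WAITERS | SX_LOCK_EXCLUSIVE_WAITERS)

/* Stand-in for struct thread; only the offset of td_state matters. */
struct fake_thread {
	char	pad[0x360];	/* assumed offset, taken from the fault */
	int	td_state;
};

/*
 * Mimic the adaptive-spin path in _sx_xlock_hard(): if the lock word is
 * not shared, the owner is extracted and its td_state is read.  Returns
 * 1 and stores the address that would be read, or 0 if the word is
 * shared and the owner is never dereferenced on this path.
 */
int
owner_deref_addr(uint64_t lockword, uint64_t *addr)
{
	if ((lockword & SX_LOCK_SHARED) != 0)
		return (0);
	*addr = SX_OWNER(lockword) +
	    offsetof(struct fake_thread, td_state);
	return (1);
}
```

Under these assumptions both a zeroed lock word and SX_LOCK_DESTROYED
give a NULL owner and a read at 0x360, which would be consistent with an
sx that was destroyed or whose memory was reused, although the trace
alone does not prove that.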

db>  bt
Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0
_sx_xlock_hard() at _sx_xlock_hard+0xb3
lf_advlockasync() at lf_advlockasync+0x5d7
lf_advlock() at lf_advlock+0x47
vop_stdadvlock() at vop_stdadvlock+0xb3
VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a
closef() at closef+0x352
kern_close() at kern_close+0x172
amd64_syscall() at amd64_syscall+0x58a
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8011651fc, rsp = 0x7fffffbfdd58, rbp = 0x807c3d6c0 ---

(kgdb) list *(_sx_xlock_hard+0xb3)
0xffffffff806242c3 is in _sx_xlock_hard 
(/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514).
509                     x = sx->sx_lock;
510                     if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) 
== 0) {
511                             if ((x & SX_LOCK_SHARED) == 0) {
512                                     x = SX_OWNER(x);
513                                     owner = (struct thread *)x;
514                                     if (TD_IS_RUNNING(owner)) {
515                                             if 
(LOCK_LOG_TEST(&sx->lock_object, 0))
516 CTR3(KTR_LOCK,
517                                                 "%s: spinning on %p 
held by %p",
518 __func__, sx, owner);


Another panic, for which we have less information, is attached as an image.

We're looking at using some INVARIANTS and WITNESS kernels, but were 
wondering whether you have any other suggestions, please?

thank you,
-Alfred

--------------080502030802070307030108--

From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 07:33:12 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 23271B3D
 for <fs@freebsd.org>; Tue, 19 Feb 2013 07:33:12 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 by mx1.freebsd.org (Postfix) with ESMTP id 8A7EC173
 for <fs@freebsd.org>; Tue, 19 Feb 2013 07:33:11 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1J7Wu4b048370;
 Tue, 19 Feb 2013 09:32:56 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1J7Wu4b048370
Received: (from kostik@localhost)
 by tom.home (8.14.6/8.14.6/Submit) id r1J7Wuoi048369;
 Tue, 19 Feb 2013 09:32:56 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Tue, 19 Feb 2013 09:32:56 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Alfred Perlstein <alfred@ixsystems.com>
Subject: Re: Advisory lock crashes.
Message-ID: <20130219073256.GV2598@kib.kiev.ua>
References: <512324F2.4060707@ixsystems.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="sr8SBrQ3fbgntwtR"
Content-Disposition: inline
In-Reply-To: <512324F2.4060707@ixsystems.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: Xin Li <delphij@delphij.net>, fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 07:33:12 -0000


--sr8SBrQ3fbgntwtR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Feb 18, 2013 at 11:08:34PM -0800, Alfred Perlstein wrote:
> Hello Konstantin & Doug,
>=20
> We're getting a few crashes in what looks to be kern_lockf.c:
>=20
> fault address here is 0x360 which appears to mean that the "sx" owner=20
> thread is NULL
What is the version of FreeBSD ?
What is the filesystem owning the file which was advlocked ?
Show the line number for lf_advlockasync+0x5d7.

No, I have not seen anything similar in the last 3 years.
>=20
> db>  bt
> Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0
> _sx_xlock_hard() at _sx_xlock_hard+0xb3
> lf_advlockasync() at lf_advlockasync+0x5d7
> lf_advlock() at lf_advlock+0x47
> vop_stdadvlock() at vop_stdadvlock+0xb3
> VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a
> closef() at closef+0x352
> kern_close() at kern_close+0x172
> amd64_syscall() at amd64_syscall+0x58a
> Xfast_syscall() at Xfast_syscall+0xf7
> --- syscall (6, FreeBSD ELF64, sys_close), rip =3D 0x8011651fc, rsp =3D 0=
x7fffffbfdd58, rbp =3D 0x807c3d6c0 ---
>=20
> (kgdb) list *(_sx_xlock_hard+0xb3)
> 0xffffffff806242c3 is in _sx_xlock_hard=20
> (/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514).
> 509                     x =3D sx->sx_lock;
> 510                     if ((sx->lock_object.lo_flags & SX_NOADAPTIVE)=20
> =3D=3D 0) {
> 511                             if ((x & SX_LOCK_SHARED) =3D=3D 0) {
> 512                                     x =3D SX_OWNER(x);
> 513                                     owner =3D (struct thread *)x;
> 514                                     if (TD_IS_RUNNING(owner)) {
> 515                                             if=20
> (LOCK_LOG_TEST(&sx->lock_object, 0))
> 516 CTR3(KTR_LOCK,
> 517                                                 "%s: spinning on %p=
=20
> held by %p",
> 518 __func__, sx, owner);
>=20
>=20
> Another panic here, which we have less information is attached as an imag=
e.
>=20
> We're looking at using some INVARIANTS and WITNESS kernels, but was=20
> wondering if y'all had any other suggestions to use please?
>=20
> thank you,
> -Alfred


--sr8SBrQ3fbgntwtR
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iQIcBAEBAgAGBQJRIyqnAAoJEJDCuSvBvK1B0egP/A9aHJw0KZcC+gz05cmIDwyd
3A4I4+wCOdvBEJbJOU08sYJdbWrNPCuMzAaovTLQ8P7a/IO667p6/UHpK4UqtLhX
5F3euYJ8F7Rac+AQ321txEQAGN4dQaFcUezaekU7H6kX0CN5n0d0JJyd/GwMDNK6
764Y8pKm3AWBXTw2qVWKbXjE+FH5kdq9sxiGq8y6noCSXMJY5kbA1XrlQ5f3EvrP
aHzs4uL42XBjIPVbFwyV7Z4KNUWN5RwSlqoQlHpbW9jJVaSpPge+LpMDihft4LED
gR3fzsFh0Q0s+a9we1TGggnyQp8ukffqmYmES56I1gOEiu14z1cUGsBZyJYEjm5y
DPmIc/MJhnmXTbSZgDw5EWas3keXt4AwPi+pcGaaRPlpyxZ6jPApxe4XGm3Q8060
eEkoKLvvvBRzPPwgy9zc2MRheN0RtipW+58ZHBmJAnFvLJOgGl/YiSFcGTJK1M2R
X19kWAQfTVqkq1SpGTakfvED1Rg2lBwXNzsrWSqq28KcMYK1+PnvGaNptr+ApUXg
+gHgr1FWw10ka3yzMPUz2CvDtFUnIMz1/VWoAl8+KMZqjvxUc04pH/N8M/mz7XnZ
I/dpEmdD7Lctiw0+UNo9pbs1369St01DzFiGkbXnDOPJva5PNFGBkweth3ENnOZk
p9UK4wA+oEqR0mBnJBz2
=pOEM
-----END PGP SIGNATURE-----

--sr8SBrQ3fbgntwtR--

From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 08:07:17 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 604A38A8
 for <fs@freebsd.org>; Tue, 19 Feb 2013 08:07:17 +0000 (UTC)
 (envelope-from alfred@ixsystems.com)
Received: from mail.iXsystems.com (newknight.ixsystems.com [206.40.55.70])
 by mx1.freebsd.org (Postfix) with ESMTP id 494132DD
 for <fs@freebsd.org>; Tue, 19 Feb 2013 08:07:16 +0000 (UTC)
Received: from localhost (mail.ixsystems.com [10.2.55.1])
 by mail.iXsystems.com (Postfix) with ESMTP id BBEFD82F9;
 Tue, 19 Feb 2013 00:07:16 -0800 (PST)
Received: from mail.iXsystems.com ([10.2.55.1])
 by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP
 id 97714-01; Tue, 19 Feb 2013 00:07:16 -0800 (PST)
Received: from Alfreds-MacBook-Pro-9.local (unknown [10.8.0.26])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mail.iXsystems.com (Postfix) with ESMTPSA id 1006082F0;
 Tue, 19 Feb 2013 00:07:16 -0800 (PST)
Message-ID: <512332B3.10400@ixsystems.com>
Date: Tue, 19 Feb 2013 00:07:15 -0800
From: Alfred Perlstein <alfred@ixsystems.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7;
 rv:17.0) Gecko/20130107 Thunderbird/17.0.2
MIME-Version: 1.0
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Advisory lock crashes.
References: <512324F2.4060707@ixsystems.com>
 <20130219073256.GV2598@kib.kiev.ua>
In-Reply-To: <20130219073256.GV2598@kib.kiev.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Xin Li <delphij@delphij.net>, fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 08:07:17 -0000

On 2/18/13 11:32 PM, Konstantin Belousov wrote:
> On Mon, Feb 18, 2013 at 11:08:34PM -0800, Alfred Perlstein wrote:
>> Hello Konstantin & Doug,
>>
>> We're getting a few crashes in what looks to be kern_lockf.c:
>>
>> fault address here is 0x360 which appears to mean that the "sx" owner
>> thread is NULL
> What is the version of FreeBSD ?
This is a releng 9.0 system.  (note, we have the most up to date version 
of this file with the exception of a cosmetic diff for MALLOC defines).

> What is the filesystem owning the file which was advlocked ?
I'm pretty sure that is going to be ZFS.

> Show the line number for lf_advlockasync+0x5d7.

> (kgdb) list *(lf_advlockasync+0x5d7)
> 0xffffffff80604fc7 is in lf_advlockasync (sx.h:152).
> 147     {
> 148             uintptr_t tid = (uintptr_t)td;
> 149             int error = 0;
> 150
> 151             if (!atomic_cmpset_acq_ptr(&sx->sx_lock, 
> SX_LOCK_UNLOCKED, tid))
> 152                     error = _sx_xlock_hard(sx, tid, opts, file, line);
> 153             else
> 154 LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(LS_SX_XLOCK_ACQUIRE,
> 155                         sx, 0, 0, file, line);
> 156
That may not be helpful so I've included this:
/usr/home/alfred # bc
ibase=16
5D7
1495

(kgdb) disasse lf_advlockasync
Dump of assembler code for function lf_advlockasync:
0xffffffff806049f0 <lf_advlockasync+0>: push   %rbp
0xffffffff806049f1 <lf_advlockasync+1>: mov    %rdx,%rcx
> 0xffffffff80604f70 <lf_advlockasync+1408>:      mov    -0x80(%rbp),%rdi
> 0xffffffff80604f74 <lf_advlockasync+1412>:      xor %ecx,%ecx
> 0xffffffff80604f76 <lf_advlockasync+1414>:      xor %edx,%edx
> 0xffffffff80604f78 <lf_advlockasync+1416>:      mov %rbx,%rsi
> 0xffffffff80604f7b <lf_advlockasync+1419>:      callq 
> 0xffffffff806246d0 <_sx_xunlock_hard>
> 0xffffffff80604f80 <lf_advlockasync+1424>:      jmpq 
> 0xffffffff80604c53 <lf_advlockasync+611>
> 0xffffffff80604f85 <lf_advlockasync+1429>:      mov -0x58(%rbp),%rcx
> 0xffffffff80604f89 <lf_advlockasync+1433>:      xor %r12d,%r12d
> 0xffffffff80604f8c <lf_advlockasync+1436>:      mov 0x18(%rcx),%edi
> 0xffffffff80604f8f <lf_advlockasync+1439>:      callq 
> 0xffffffff80603b90 <lf_clearremotesys>
> 0xffffffff80604f94 <lf_advlockasync+1444>:      jmpq 
> 0xffffffff80604c70 <lf_advlockasync+640>
> 0xffffffff80604f99 <lf_advlockasync+1449>:      lea 0xc8(%r13),%rdi
> 0xffffffff80604fa0 <lf_advlockasync+1456>:      xor %r8d,%r8d
> 0xffffffff80604fa3 <lf_advlockasync+1459>:      xor %ecx,%ecx
> 0xffffffff80604fa5 <lf_advlockasync+1461>:      xor %edx,%edx
> 0xffffffff80604fa7 <lf_advlockasync+1463>:      mov %rbx,%rsi
> 0xffffffff80604faa <lf_advlockasync+1466>:      callq 
> 0xffffffff8060a1f0 <_mtx_lock_sleep>
> 0xffffffff80604faf <lf_advlockasync+1471>:      jmpq 
> 0xffffffff80604f2e <lf_advlockasync+1342>
> 0xffffffff80604fb4 <lf_advlockasync+1476>:      mov -0x80(%rbp),%rdi
> 0xffffffff80604fb8 <lf_advlockasync+1480>:      xor %r8d,%r8d
> 0xffffffff80604fbb <lf_advlockasync+1483>:      xor %ecx,%ecx
> 0xffffffff80604fbd <lf_advlockasync+1485>:      xor %edx,%edx
> 0xffffffff80604fbf <lf_advlockasync+1487>:      mov %rbx,%rsi
> 0xffffffff80604fc2 <lf_advlockasync+1490>:      callq 
> 0xffffffff80624210 <_sx_xlock_hard>
> 0xffffffff80604fc7 <lf_advlockasync+1495>:      jmpq 
> 0xffffffff80604f15 <lf_advlockasync+1317>
> 0xffffffff80604fcc <lf_advlockasync+1500>:      lea 0xc8(%r13),%rdi
> 0xffffffff80604fd3 <lf_advlockasync+1507>:      xor %ecx,%ecx
> 0xffffffff80604fd5 <lf_advlockasync+1509>:      xor %edx,%edx
> 0xffffffff80604fd7 <lf_advlockasync+1511>:      xor %esi,%esi
> 0xffffffff80604fd9 <lf_advlockasync+1513>:      callq 
> 0xffffffff8060a040 <_mtx_unlock_sleep>
> 0xffffffff80604fde <lf_advlockasync+1518>:      jmpq 
> 0xffffffff80604f5c <lf_advlockasync+1388>
> 0xffffffff80604fe3 <lf_advlockasync+1523>:      mov %r15,(%rcx)
> 0xffffffff80604fe6 <lf_advlockasync+1526>:      mov %r15,%r14
> 0xffffffff80604fe9 <lf_advlockasync+1529>:      mov %gs:0x0,%rax
> 0xffffffff80604ff2 <lf_advlockasync+1538>:      lock cmpxchg 
> %rbx,0xe0(%r13)
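
As a cross-check of the numbers above: the offset in the backtrace is a
return address, so it names the instruction after the call rather than
the call itself. A small sketch of the arithmetic, using only addresses
copied from the kgdb output; the macro and function names are made up
for illustration.

```c
#include <stdint.h>

/* Values copied from the kgdb output above. */
#define LF_ADVLOCKASYNC_BASE	0xffffffff806049f0ULL	/* <+0> */
#define BT_OFFSET		0x5d7ULL		/* from "bt" */
#define CALL_SX_XLOCK_HARD	0xffffffff80604fc2ULL	/* <+1490> */
#define CALL_LEN		5ULL	/* gap to the next insn in the dump */

/* Address named by "symbol+offset" in the backtrace. */
uint64_t
frame_addr(uint64_t base, uint64_t off)
{
	return (base + off);
}

/* Return address pushed by a call instruction of the given length. */
uint64_t
return_addr(uint64_t call, uint64_t len)
{
	return (call + len);
}
```

Both computations give 0xffffffff80604fc7, i.e. <lf_advlockasync+1495>,
the jmpq that follows the call to _sx_xlock_hard at <+1490>.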


>
> No, I never saw nothing similar in last 3 years.

Yes, I'd suspect we'd all see more things here.  We're very much capable 
of adding instrumentation to the OS/kernel to help track this down if 
you have ideas.

-Alfred


>> db>  bt
>> Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0
>> _sx_xlock_hard() at _sx_xlock_hard+0xb3
>> lf_advlockasync() at lf_advlockasync+0x5d7
>> lf_advlock() at lf_advlock+0x47
>> vop_stdadvlock() at vop_stdadvlock+0xb3
>> VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a
>> closef() at closef+0x352
>> kern_close() at kern_close+0x172
>> amd64_syscall() at amd64_syscall+0x58a
>> Xfast_syscall() at Xfast_syscall+0xf7
>> --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8011651fc, rsp = 0x7fffffbfdd58, rbp = 0x807c3d6c0 ---
>>
>> (kgdb) list *(_sx_xlock_hard+0xb3)
>> 0xffffffff806242c3 is in _sx_xlock_hard
>> (/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514).
>> 509                     x = sx->sx_lock;
>> 510                     if ((sx->lock_object.lo_flags & SX_NOADAPTIVE)
>> == 0) {
>> 511                             if ((x & SX_LOCK_SHARED) == 0) {
>> 512                                     x = SX_OWNER(x);
>> 513                                     owner = (struct thread *)x;
>> 514                                     if (TD_IS_RUNNING(owner)) {
>> 515                                             if
>> (LOCK_LOG_TEST(&sx->lock_object, 0))
>> 516 CTR3(KTR_LOCK,
>> 517                                                 "%s: spinning on %p
>> held by %p",
>> 518 __func__, sx, owner);
>>
>>
>> Another panic here, which we have less information is attached as an image.
>>
>> We're looking at using some INVARIANTS and WITNESS kernels, but was
>> wondering if y'all had any other suggestions to use please?
>>
>> thank you,
>> -Alfred
>


From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 08:20:40 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id A0A2C30B
 for <fs@freebsd.org>; Tue, 19 Feb 2013 08:20:40 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 by mx1.freebsd.org (Postfix) with ESMTP id DAFAD634
 for <fs@freebsd.org>; Tue, 19 Feb 2013 08:20:39 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1J8KQ07054222;
 Tue, 19 Feb 2013 10:20:26 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1J8KQ07054222
Received: (from kostik@localhost)
 by tom.home (8.14.6/8.14.6/Submit) id r1J8KQAP054221;
 Tue, 19 Feb 2013 10:20:26 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Tue, 19 Feb 2013 10:20:26 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Alfred Perlstein <alfred@ixsystems.com>
Subject: Re: Advisory lock crashes.
Message-ID: <20130219082026.GY2598@kib.kiev.ua>
References: <512324F2.4060707@ixsystems.com>
 <20130219073256.GV2598@kib.kiev.ua> <512332B3.10400@ixsystems.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="DejVYFcqCV4p9T4J"
Content-Disposition: inline
In-Reply-To: <512332B3.10400@ixsystems.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: Xin Li <delphij@delphij.net>, fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 08:20:40 -0000


--DejVYFcqCV4p9T4J
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Feb 19, 2013 at 12:07:15AM -0800, Alfred Perlstein wrote:
> On 2/18/13 11:32 PM, Konstantin Belousov wrote:
> > On Mon, Feb 18, 2013 at 11:08:34PM -0800, Alfred Perlstein wrote:
> >> Hello Konstantin & Doug,
> >>
> >> We're getting a few crashes in what looks to be kern_lockf.c:
> >>
> >> fault address here is 0x360 which appears to mean that the "sx" owner
> >> thread is NULL
> > What is the version of FreeBSD ?
> This is a releng 9.0 system.  (note, we have the most up to date version=
=20
> of this file with the exception of a cosmetic diff for MALLOC defines).
My suspicion is that the issue is not in kern_lockf.c at all;
rather, it is a bug in the vnode lifetime management in the filesystem
code.  If true, the absence of the changes in kern_lockf.c does
not matter, but the changes in ZFS do.

AFAIR, there were a lot of fixes in this area for ZFS, done by avg.

>=20
> > What is the filesystem owning the file which was advlocked ?
> I'm pretty sure that is going to be ZFS.
>=20
> > Show the line number for lf_advlockasync+0x5d7.
>=20
> > (kgdb) list *(lf_advlockasync+0x5d7)
> > 0xffffffff80604fc7 is in lf_advlockasync (sx.h:152).
> > 147     {
> > 148             uintptr_t tid =3D (uintptr_t)td;
> > 149             int error =3D 0;
> > 150
> > 151             if (!atomic_cmpset_acq_ptr(&sx->sx_lock,=20
> > SX_LOCK_UNLOCKED, tid))
> > 152                     error =3D _sx_xlock_hard(sx, tid, opts, file, l=
ine);
> > 153             else
> > 154 LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(LS_SX_XLOCK_ACQUIRE,
> > 155                         sx, 0, 0, file, line);
> > 156
> That may not be helpful so I've included this:
> /usr/home/alfred # bc
> ibase=3D16
> 5D7
> 1495
>=20
> (kgdb) disasse lf_advlockasync
> Dump of assembler code for function lf_advlockasync:
> 0xffffffff806049f0 <lf_advlockasync+0>: push   %rbp
> 0xffffffff806049f1 <lf_advlockasync+1>: mov    %rdx,%rcx
> > 0xffffffff80604f70 <lf_advlockasync+1408>:      mov    -0x80(%rbp),%rdi
> > 0xffffffff80604f74 <lf_advlockasync+1412>:      xor %ecx,%ecx
> > 0xffffffff80604f76 <lf_advlockasync+1414>:      xor %edx,%edx
> > 0xffffffff80604f78 <lf_advlockasync+1416>:      mov %rbx,%rsi
> > 0xffffffff80604f7b <lf_advlockasync+1419>:      callq=20
> > 0xffffffff806246d0 <_sx_xunlock_hard>
> > 0xffffffff80604f80 <lf_advlockasync+1424>:      jmpq=20
> > 0xffffffff80604c53 <lf_advlockasync+611>
> > 0xffffffff80604f85 <lf_advlockasync+1429>:      mov -0x58(%rbp),%rcx
> > 0xffffffff80604f89 <lf_advlockasync+1433>:      xor %r12d,%r12d
> > 0xffffffff80604f8c <lf_advlockasync+1436>:      mov 0x18(%rcx),%edi
> > 0xffffffff80604f8f <lf_advlockasync+1439>:      callq=20
> > 0xffffffff80603b90 <lf_clearremotesys>
> > 0xffffffff80604f94 <lf_advlockasync+1444>:      jmpq=20
> > 0xffffffff80604c70 <lf_advlockasync+640>
> > 0xffffffff80604f99 <lf_advlockasync+1449>:      lea 0xc8(%r13),%rdi
> > 0xffffffff80604fa0 <lf_advlockasync+1456>:      xor %r8d,%r8d
> > 0xffffffff80604fa3 <lf_advlockasync+1459>:      xor %ecx,%ecx
> > 0xffffffff80604fa5 <lf_advlockasync+1461>:      xor %edx,%edx
> > 0xffffffff80604fa7 <lf_advlockasync+1463>:      mov %rbx,%rsi
> > 0xffffffff80604faa <lf_advlockasync+1466>:      callq=20
> > 0xffffffff8060a1f0 <_mtx_lock_sleep>
> > 0xffffffff80604faf <lf_advlockasync+1471>:      jmpq=20
> > 0xffffffff80604f2e <lf_advlockasync+1342>
> > 0xffffffff80604fb4 <lf_advlockasync+1476>:      mov -0x80(%rbp),%rdi
> > 0xffffffff80604fb8 <lf_advlockasync+1480>:      xor %r8d,%r8d
> > 0xffffffff80604fbb <lf_advlockasync+1483>:      xor %ecx,%ecx
> > 0xffffffff80604fbd <lf_advlockasync+1485>:      xor %edx,%edx
> > 0xffffffff80604fbf <lf_advlockasync+1487>:      mov %rbx,%rsi
> > 0xffffffff80604fc2 <lf_advlockasync+1490>:      callq=20
> > 0xffffffff80624210 <_sx_xlock_hard>
> > 0xffffffff80604fc7 <lf_advlockasync+1495>:      jmpq=20
> > 0xffffffff80604f15 <lf_advlockasync+1317>
> > 0xffffffff80604fcc <lf_advlockasync+1500>:      lea 0xc8(%r13),%rdi
> > 0xffffffff80604fd3 <lf_advlockasync+1507>:      xor %ecx,%ecx
> > 0xffffffff80604fd5 <lf_advlockasync+1509>:      xor %edx,%edx
> > 0xffffffff80604fd7 <lf_advlockasync+1511>:      xor %esi,%esi
> > 0xffffffff80604fd9 <lf_advlockasync+1513>:      callq=20
> > 0xffffffff8060a040 <_mtx_unlock_sleep>
> > 0xffffffff80604fde <lf_advlockasync+1518>:      jmpq=20
> > 0xffffffff80604f5c <lf_advlockasync+1388>
> > 0xffffffff80604fe3 <lf_advlockasync+1523>:      mov %r15,(%rcx)
> > 0xffffffff80604fe6 <lf_advlockasync+1526>:      mov %r15,%r14
> > 0xffffffff80604fe9 <lf_advlockasync+1529>:      mov %gs:0x0,%rax
> > 0xffffffff80604ff2 <lf_advlockasync+1538>:      lock cmpxchg=20
> > %rbx,0xe0(%r13)
This is not helpful either; you showed the inlined part of sx_lock().
I need to understand which sx caused the issue: state->ls_lock (in which
case it is related to the vnode lifetime) or lf_lock_states_lock.

Either analyze the logic of the assembly to work out which lock
it is, or list more lines around the reported address to see which
sx_xlock() line is there.

>=20
>=20
> >
> > No, I never saw nothing similar in last 3 years.
>=20
> Yes, I'd suspect we'd all see more things here.  We're very much capable=
=20
> of adding instrumentation to the OS/kernel to help track this down if=20
> you have ideas.
Build with INVARIANTS, DIAGNOSTIC and DEBUG_VFS_LOCKS.

What is needed is the printout of *vp involved in the panic.

>=20
> -Alfred
>=20
>=20
> >> db>  bt
> >> Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0
> >> _sx_xlock_hard() at _sx_xlock_hard+0xb3
> >> lf_advlockasync() at lf_advlockasync+0x5d7
> >> lf_advlock() at lf_advlock+0x47
> >> vop_stdadvlock() at vop_stdadvlock+0xb3
> >> VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a
> >> closef() at closef+0x352
> >> kern_close() at kern_close+0x172
> >> amd64_syscall() at amd64_syscall+0x58a
> >> Xfast_syscall() at Xfast_syscall+0xf7
> >> --- syscall (6, FreeBSD ELF64, sys_close), rip =3D 0x8011651fc, rsp =
=3D 0x7fffffbfdd58, rbp =3D 0x807c3d6c0 ---
> >>
> >> (kgdb) list *(_sx_xlock_hard+0xb3)
> >> 0xffffffff806242c3 is in _sx_xlock_hard
> >> (/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514).
> >> 509                     x = sx->sx_lock;
> >> 510                     if ((sx->lock_object.lo_flags & SX_NOADAPTIVE)
> >> == 0) {
> >> 511                             if ((x & SX_LOCK_SHARED) == 0) {
> >> 512                                     x = SX_OWNER(x);
> >> 513                                     owner = (struct thread *)x;
> >> 515                                             if
> >> (LOCK_LOG_TEST(&sx->lock_object, 0))
> >> 516 CTR3(KTR_LOCK,
> >> 517                                                 "%s: spinning on %p
> >> held by %p",
> >> 518 __func__, sx, owner);
> >>
> >>
> >> Another panic here, which we have less information is attached as an image.
> >>
> >> We're looking at using some INVARIANTS and WITNESS kernels, but was
> >> wondering if y'all had any other suggestions to use please?
> >>
> >> thank you,
> >> -Alfred
> >

--DejVYFcqCV4p9T4J
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iQIcBAEBAgAGBQJRIzXJAAoJEJDCuSvBvK1B7jUQAJQadXQ7Z6dMDtZ/zEnFv0kJ
3r9OGt5zg3vX71XPug/4FqjBjkbGh6d2IeT/how1u/OL37iRjdc7tLKIkjM/VEJp
XaMOIvG2k7MtUOPF9jd2g74DdSdB6zA56I0tdVpKbEQ1ea0t3/Zwxhz4ERBPGIVH
VBVlblLV5kAlTivC2EeoZfc390sCFRY3TINBuTPYQkuqvHgI2YIoMH9MSdi4Yenr
pkJKTaHL0zTYDnybcgMcqdb7GoNjHDiqMamXgdKdgvfKYT7qgMwte0yUHoUGk994
jBgaa1KOYJCCm1cbpzp0FowMs9b6rQ6aWIF0ZOdV7B0IRgPWvxs3lu6okUU29YZF
cdvCpRLyJPxx+47zgZrhrlxlsHLj/09SvYkyB12iW7BVgf03jIbpHm7+dBBjGxvV
ovHuv5/hwYNAbZuteE0nctQAp8Qdfd0UcknCe1IL6/S4BWFO4ftIw+MIk+NAS/dA
ihN4XNBOe9+3DkSPQydJ6efwDiGlo9W+S1r0P/8rbBSuadk5NCG8Y3/7EV4rIIVw
NHu7I0MtHpHSPxGkM/NvaawHl9QjqVx+uCitlzEUex5uGJep2ujifEvTB5Fkd+sr
pxcYVEOEdoOKVI3LMqAOe7Rz7W//vplxFZs7O3Nakn/xGtecftusXoDwPjFVsZX/
c1clMBwyUukYuiGP3z6U
=HbNy
-----END PGP SIGNATURE-----

--DejVYFcqCV4p9T4J--

From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 08:36:13 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 1532D86B
 for <fs@freebsd.org>; Tue, 19 Feb 2013 08:36:13 +0000 (UTC)
 (envelope-from alfred@ixsystems.com)
Received: from mail.iXsystems.com (newknight.ixsystems.com [206.40.55.70])
 by mx1.freebsd.org (Postfix) with ESMTP id E306271D
 for <fs@freebsd.org>; Tue, 19 Feb 2013 08:36:12 +0000 (UTC)
Received: from localhost (mail.ixsystems.com [10.2.55.1])
 by mail.iXsystems.com (Postfix) with ESMTP id 5830D843B;
 Tue, 19 Feb 2013 00:36:12 -0800 (PST)
Received: from mail.iXsystems.com ([10.2.55.1])
 by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP
 id 98169-06; Tue, 19 Feb 2013 00:36:12 -0800 (PST)
Received: from Alfreds-MacBook-Pro-9.local (unknown [10.8.0.26])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mail.iXsystems.com (Postfix) with ESMTPSA id 060798434;
 Tue, 19 Feb 2013 00:36:11 -0800 (PST)
Message-ID: <5123397B.8030807@ixsystems.com>
Date: Tue, 19 Feb 2013 00:36:11 -0800
From: Alfred Perlstein <alfred@ixsystems.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7;
 rv:17.0) Gecko/20130107 Thunderbird/17.0.2
MIME-Version: 1.0
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Advisory lock crashes.
References: <512324F2.4060707@ixsystems.com>
 <20130219073256.GV2598@kib.kiev.ua> <512332B3.10400@ixsystems.com>
 <20130219082026.GY2598@kib.kiev.ua>
In-Reply-To: <20130219082026.GY2598@kib.kiev.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Xin Li <delphij@delphij.net>, fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 08:36:13 -0000

On 2/19/13 12:20 AM, Konstantin Belousov wrote:
> On Tue, Feb 19, 2013 at 12:07:15AM -0800, Alfred Perlstein wrote:
>> On 2/18/13 11:32 PM, Konstantin Belousov wrote:
>>> On Mon, Feb 18, 2013 at 11:08:34PM -0800, Alfred Perlstein wrote:
>>>> Hello Konstantin & Doug,
>>>>
>>>> We're getting a few crashes in what looks to be kern_lockf.c:
>>>>
>>>> fault address here is 0x360 which appears to mean that the "sx" owner
>>>> thread is NULL
>>> What is the version of FreeBSD ?
>> This is a releng 9.0 system.  (note, we have the most up to date version
>> of this file with the exception of a cosmetic diff for MALLOC defines).
> My suspicion is that the issue is not in the kern_lockf.c at all,
> rather it is a bug in the vnode lifetime management in the filesystem
> code.  If true, the absence of the changes in the kern_lockf.c does
> not matter, but the changes in ZFS do.
>
> AFAIR, there were a lot of fixes in this area for ZFS, done by avg.

That would make sense.  It appears as if the lockf data structures are
being free()'d out from under us.

Maybe there are some asserts we can put in place to catch this under
DEBUG_VFS or something similar?

Meanwhile we'll try to catch up with the ZFS fixes in head.

-Alfred

>
>>> What is the filesystem owning the file which was advlocked ?
>> I'm pretty sure that is going to be ZFS.
>>
>>> Show the line number for lf_advlockasync+0x5d7.
>>> (kgdb) list *(lf_advlockasync+0x5d7)
>>> 0xffffffff80604fc7 is in lf_advlockasync (sx.h:152).
>>> 147     {
>>> 148             uintptr_t tid = (uintptr_t)td;
>>> 149             int error = 0;
>>> 150
>>> 151             if (!atomic_cmpset_acq_ptr(&sx->sx_lock,
>>> SX_LOCK_UNLOCKED, tid))
>>> 152                     error = _sx_xlock_hard(sx, tid, opts, file, line);
>>> 153             else
>>> 154 LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(LS_SX_XLOCK_ACQUIRE,
>>> 155                         sx, 0, 0, file, line);
>>> 156
>> That may not be helpful so I've included this:
>> /usr/home/alfred # bc
>> ibase=16
>> 5D7
>> 1495
>>
>> (kgdb) disasse lf_advlockasync
>> Dump of assembler code for function lf_advlockasync:
>> 0xffffffff806049f0 <lf_advlockasync+0>: push   %rbp
>> 0xffffffff806049f1 <lf_advlockasync+1>: mov    %rdx,%rcx
>>> 0xffffffff80604f70 <lf_advlockasync+1408>:      mov    -0x80(%rbp),%rdi
>>> 0xffffffff80604f74 <lf_advlockasync+1412>:      xor %ecx,%ecx
>>> 0xffffffff80604f76 <lf_advlockasync+1414>:      xor %edx,%edx
>>> 0xffffffff80604f78 <lf_advlockasync+1416>:      mov %rbx,%rsi
>>> 0xffffffff80604f7b <lf_advlockasync+1419>:      callq
>>> 0xffffffff806246d0 <_sx_xunlock_hard>
>>> 0xffffffff80604f80 <lf_advlockasync+1424>:      jmpq
>>> 0xffffffff80604c53 <lf_advlockasync+611>
>>> 0xffffffff80604f85 <lf_advlockasync+1429>:      mov -0x58(%rbp),%rcx
>>> 0xffffffff80604f89 <lf_advlockasync+1433>:      xor %r12d,%r12d
>>> 0xffffffff80604f8c <lf_advlockasync+1436>:      mov 0x18(%rcx),%edi
>>> 0xffffffff80604f8f <lf_advlockasync+1439>:      callq
>>> 0xffffffff80603b90 <lf_clearremotesys>
>>> 0xffffffff80604f94 <lf_advlockasync+1444>:      jmpq
>>> 0xffffffff80604c70 <lf_advlockasync+640>
>>> 0xffffffff80604f99 <lf_advlockasync+1449>:      lea 0xc8(%r13),%rdi
>>> 0xffffffff80604fa0 <lf_advlockasync+1456>:      xor %r8d,%r8d
>>> 0xffffffff80604fa3 <lf_advlockasync+1459>:      xor %ecx,%ecx
>>> 0xffffffff80604fa5 <lf_advlockasync+1461>:      xor %edx,%edx
>>> 0xffffffff80604fa7 <lf_advlockasync+1463>:      mov %rbx,%rsi
>>> 0xffffffff80604faa <lf_advlockasync+1466>:      callq
>>> 0xffffffff8060a1f0 <_mtx_lock_sleep>
>>> 0xffffffff80604faf <lf_advlockasync+1471>:      jmpq
>>> 0xffffffff80604f2e <lf_advlockasync+1342>
>>> 0xffffffff80604fb4 <lf_advlockasync+1476>:      mov -0x80(%rbp),%rdi
>>> 0xffffffff80604fb8 <lf_advlockasync+1480>:      xor %r8d,%r8d
>>> 0xffffffff80604fbb <lf_advlockasync+1483>:      xor %ecx,%ecx
>>> 0xffffffff80604fbd <lf_advlockasync+1485>:      xor %edx,%edx
>>> 0xffffffff80604fbf <lf_advlockasync+1487>:      mov %rbx,%rsi
>>> 0xffffffff80604fc2 <lf_advlockasync+1490>:      callq
>>> 0xffffffff80624210 <_sx_xlock_hard>
>>> 0xffffffff80604fc7 <lf_advlockasync+1495>:      jmpq
>>> 0xffffffff80604f15 <lf_advlockasync+1317>
>>> 0xffffffff80604fcc <lf_advlockasync+1500>:      lea 0xc8(%r13),%rdi
>>> 0xffffffff80604fd3 <lf_advlockasync+1507>:      xor %ecx,%ecx
>>> 0xffffffff80604fd5 <lf_advlockasync+1509>:      xor %edx,%edx
>>> 0xffffffff80604fd7 <lf_advlockasync+1511>:      xor %esi,%esi
>>> 0xffffffff80604fd9 <lf_advlockasync+1513>:      callq
>>> 0xffffffff8060a040 <_mtx_unlock_sleep>
>>> 0xffffffff80604fde <lf_advlockasync+1518>:      jmpq
>>> 0xffffffff80604f5c <lf_advlockasync+1388>
>>> 0xffffffff80604fe3 <lf_advlockasync+1523>:      mov %r15,(%rcx)
>>> 0xffffffff80604fe6 <lf_advlockasync+1526>:      mov %r15,%r14
>>> 0xffffffff80604fe9 <lf_advlockasync+1529>:      mov %gs:0x0,%rax
>>> 0xffffffff80604ff2 <lf_advlockasync+1538>:      lock cmpxchg
>>> %rbx,0xe0(%r13)
> This is not helpful too, you demonstrated the inlined part of the sx_lock().
> I need to understand which sx caused the issue, state->ls_lock (and
> then it is related to the vnode life), or lf_lock_states_lock.
>
> Either the logic of the assembler should be analyzed to decipher which
> lock is it, or try to list more lines around the reported address, to
> see which sx_xlock() line is there.
>
>>
>>> No, I never saw nothing similar in last 3 years.
>> Yes, I'd suspect we'd all see more things here.  We're very much capable
>> of adding instrumentation to the OS/kernel to help track this down if
>> you have ideas.
> INVARIANTS, DIAGNOSTIC, DEBUG_VFS_LOCK.
>
> What is needed is the printout of *vp involved in the panic.
>
>> -Alfred
>>
>>
>>>> db>  bt
>>>> Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0
>>>> _sx_xlock_hard() at _sx_xlock_hard+0xb3
>>>> lf_advlockasync() at lf_advlockasync+0x5d7
>>>> lf_advlock() at lf_advlock+0x47
>>>> vop_stdadvlock() at vop_stdadvlock+0xb3
>>>> VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a
>>>> closef() at closef+0x352
>>>> kern_close() at kern_close+0x172
>>>> amd64_syscall() at amd64_syscall+0x58a
>>>> Xfast_syscall() at Xfast_syscall+0xf7
>>>> --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8011651fc, rsp = 0x7fffffbfdd58, rbp = 0x807c3d6c0 ---
>>>>
>>>> (kgdb) list *(_sx_xlock_hard+0xb3)
>>>> 0xffffffff806242c3 is in _sx_xlock_hard
>>>> (/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514).
>>>> 509                     x = sx->sx_lock;
>>>> 510                     if ((sx->lock_object.lo_flags & SX_NOADAPTIVE)
>>>> == 0) {
>>>> 511                             if ((x & SX_LOCK_SHARED) == 0) {
>>>> 512                                     x = SX_OWNER(x);
>>>> 513                                     owner = (struct thread *)x;
>>>> 514                                     if (TD_IS_RUNNING(owner)) {
>>>> 515                                             if
>>>> (LOCK_LOG_TEST(&sx->lock_object, 0))
>>>> 516 CTR3(KTR_LOCK,
>>>> 517                                                 "%s: spinning on %p
>>>> held by %p",
>>>> 518 __func__, sx, owner);
>>>>
>>>>
>>>> Another panic here, which we have less information is attached as an image.
>>>>
>>>> We're looking at using some INVARIANTS and WITNESS kernels, but was
>>>> wondering if y'all had any other suggestions to use please?
>>>>
>>>> thank you,
>>>> -Alfred
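
The reading of the 0x360 fault address given at the start of this thread
can be illustrated with a small sketch.  It assumes that the low four bits
of the sx lock word are flag bits, with bit 0 marking a shared lock (as the
SX_LOCK_SHARED test and SX_OWNER() in the quoted kern_sx.c listing
suggest), and that the field TD_IS_RUNNING() reads lies 0x360 bytes into
struct thread.  Neither value is confirmed in the thread, and the function
names are hypothetical.

```python
# Sketch only: the constants below are assumptions, not values confirmed
# anywhere in this thread.
SX_LOCK_SHARED = 0x01      # assumed: bit 0 set means the lock is share-locked
SX_LOCK_FLAGMASK = 0x0f    # assumed: low four bits of the lock word are flags
TD_STATE_OFFSET = 0x360    # assumed: offset of the field TD_IS_RUNNING() reads


def sx_owner(lock_word):
    """Owner thread pointer: the lock word with its flag bits cleared."""
    return lock_word & ~SX_LOCK_FLAGMASK


def spin_check_address(lock_word):
    """Address the adaptive-spin path would read, or None if it is skipped."""
    if lock_word & SX_LOCK_SHARED:
        return None                       # shared lock: no single owner to test
    return sx_owner(lock_word) + TD_STATE_OFFSET


# A lock word of zero (for example, memory that was cleared or reused) has the
# shared bit clear, so the owner comes out as NULL and the read lands at 0x360.
print(hex(spin_check_address(0x0)))

# An exclusively held word: a pointer value chosen only for illustration,
# with one flag bit set.
held = 0xfffffe005d54e8c0 | 0x04
print(hex(spin_check_address(held)))
```

Under these assumptions a zero lock word reproduces the reported fault
address, which would be consistent with the lock's memory having been
freed or reused, though the thread does not establish that.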


From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 08:50:11 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id E5AE9996
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 08:50:11 +0000 (UTC)
 (envelope-from momchil@xaxo.eu)
Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66])
 by mx1.freebsd.org (Postfix) with ESMTP id 65BBA772
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 08:50:10 +0000 (UTC)
Received: from vps2.xaxo.eu (localhost [127.0.0.1])
 by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1J8labT076840;
 Tue, 19 Feb 2013 09:47:36 +0100 (CET) (envelope-from momchil@xaxo.eu)
Received: (from www@localhost)
 by vps2.xaxo.eu (8.14.4/8.14.4/Submit) id r1J8laNk076834;
 Tue, 19 Feb 2013 09:47:36 +0100 (CET) (envelope-from momchil@xaxo.eu)
X-Authentication-Warning: vps2.xaxo.eu: www set sender to momchil@xaxo.eu
 using -f
Received: from 139.18.9.22 (SquirrelMail authenticated user space)
 by webmail.xaxo.eu with HTTP; Tue, 19 Feb 2013 09:47:36 +0100
Message-ID: <86a88ac8bb038ec5d8034724dcf80924.squirrel@webmail.xaxo.eu>
In-Reply-To: <1794994447.3103158.1361231818953.JavaMail.root@erie.cs.uoguelph.ca>
References: <1794994447.3103158.1361231818953.JavaMail.root@erie.cs.uoguelph.ca>
Date: Tue, 19 Feb 2013 09:47:36 +0100
Subject: Re: NFS + Kerberos
From: "Momchil Ivanov" <momchil@xaxo.eu>
To: "Rick Macklem" <rmacklem@uoguelph.ca>
User-Agent: SquirrelMail/1.4.21
MIME-Version: 1.0
Content-Type: text/plain;charset=utf-8
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
Cc: freebsd-fs@freebsd.org, Momchil Ivanov <momchil@xaxo.eu>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 08:50:12 -0000

On Tue, February 19, 2013 12:56 am, Rick Macklem wrote:
> Thanks to Elias's hard work, a bug/fix has just been isolated in the
> Kerberos library that causes the gssd to fail to translate a principal
> to a uid. The fix is to increase the size of the buffer passed to
> getpwnam_r(). See this thread:
> http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw
>
> I haven't run into this bug, so I don't know what systems are affected,
> but it would explain why you can't get it working.
>
> I'd suggest you apply the patch in the email (increase buf to 1024) and
> then try again with libraries built with the patch.

Do I have to apply the patch to the server only and then rebuild world, or
do I have to do the same on the client too? And do I need to rebuild
heimdal on both machines?

btw, I checked the logs of the kdc and could not see any trace of the nfs
server trying to validate the client's ticket... Frankly, I don't know
what I should expect there; I haven't used kerberos before, so I have no
idea if it's related to the bug. Here is part of the log:

AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for
krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
No preauth found, returning PREAUTH-REQUIRED -- user@EXAMPLE.LOCAL
sending 407 bytes to IPv4:X.X.X.X
AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for
krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
Client sent patypes: encrypted-timestamp
Looking for PKINIT pa-data -- user@EXAMPLE.LOCAL
Looking for ENC-TS pa-data -- user@EXAMPLE.LOCAL
ENC-TS Pre-authentication succeeded -- user@EXAMPLE.LOCAL using des-cbc-crc
Client supported enctypes: des-cbc-crc
Using des-cbc-crc/aes256-cts-hmac-sha1-96
AS-REQ authtime: 2013-02-11T23:45:44 starttime: unset endtime:
2013-02-12T09:45:39 renew till: unset
sending 552 bytes to IPv4:X.X.X.X

Thank you,
Momchil

From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 11:45:38 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 2567758E
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 11:45:38 +0000 (UTC)
 (envelope-from ml@my.gd)
Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com
 [209.85.212.181]) by mx1.freebsd.org (Postfix) with ESMTP id 86E1D7A2
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 11:45:37 +0000 (UTC)
Received: by mail-wi0-f181.google.com with SMTP id hm6so4658532wib.14
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 03:45:36 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=x-received:content-type:mime-version:subject:from:in-reply-to:date
 :cc:content-transfer-encoding:message-id:references:to:x-mailer
 :x-gm-message-state;
 bh=Gdeehuk4X6YWGlAhwgKuY4Ts3TRnoOI8+5be7RrUdLE=;
 b=KZcZv4+TZnqczqfduxhQ9cyCYOOYB6QxSD2Xt0blltJ3bTEa/3rbzurlnAF730wr1v
 QBGHBl45a8sNyfnzjhRNlN7SLuR4508Ff1PHBNSE2MgpRKqoFO9ykSP9Z7iqsgDGXlX3
 Jt5E+3T5Qrk+RuTDV1wE5IR1ayOBarWOLvGilRs+IGk/pQg6X/Rvxxvlm0Fkw7tG6dmv
 WLTLnMuV/TKezWdYaKbrpObszc9Lgtu4RIYv3vkkC1eaAzeoNwdFxNTrTY+qeNCiceHC
 sHppVor1jpaHdXetfQ2DRQagcda5mSN35rTh8peyW//Ad1VJrx6kjAchJSAJJpUs71qD
 Sy1w==
X-Received: by 10.180.24.229 with SMTP id x5mr24848880wif.17.1361274336363;
 Tue, 19 Feb 2013 03:45:36 -0800 (PST)
Received: from dfleuriot-at-hi-media.com ([83.167.62.196])
 by mx.google.com with ESMTPS id eo10sm27248507wib.9.2013.02.19.03.45.35
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Tue, 19 Feb 2013 03:45:35 -0800 (PST)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: zfs raid1 error resilvering and mount
From: Fleuriot Damien <ml@my.gd>
In-Reply-To: <CAOrTs_bb26hP92jNJh7CEpbRF8uHaqAeCPF06-rwwE0KoMH3Fw@mail.gmail.com>
Date: Tue, 19 Feb 2013 12:45:34 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <0ED6EC22-9875-45FF-ADFC-BB23C2C94FC0@my.gd>
References: <CAOrTs_aRL1cO3Jm1YrXoQRGoUX4vVqYKvFcTXbzyijYAhDsCVA@mail.gmail.com>
 <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>
 <CAOrTs_a1CX2tiBe+1zzsSMw+TTqvfFJD=Dr-dUURa_4s4aSYCA@mail.gmail.com>
 <B8CACAE9-64E5-40EE-821D-079524579C4F@my.gd>
 <CAOrTs_bb26hP92jNJh7CEpbRF8uHaqAeCPF06-rwwE0KoMH3Fw@mail.gmail.com>
To: Konstantin Kuklin <konstantin.kuklin@gmail.com>
X-Mailer: Apple Mail (2.1499)
X-Gm-Message-State: ALoCoQmesuEMPc1ZMmjUufFbOOYtA8FFYhxa/jAfNnhQapL4/koedl/vYXttxCpUpeB+VXWDcMyl
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 11:45:38 -0000

If I understand you correctly, you have:
- booted another system from flash
- NOT replaced the failed device
- under this booted system, resilvering takes place automatically


While I cannot tell why ZFS tries to resilver without a new, proper
device, I think it will only work once you've replaced the failed
device.

Could you try replacing the failed drive?


On Feb 19, 2013, at 12:39 PM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:

> i did`t replace disk, after reboot system not started (zfs installed
> as default root system) and i boot from another system(from flash) and
> resilvering has auto start and show me warnings with freeze
> progress(dead on checking zroot/var/crash )
> replacing dead disk healing var/crash with <0x0> adress?
>
> 2013/2/18 Fleuriot Damien <ml@my.gd>:
>> Reassure me here, you've replaced your failed vdev before trying to resilver right ?
>>
>> Your zpool status suggests otherwise, so I only want to make sure this is a status from before replacing your drive.
>>
>>
>> On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:
>>
>>> i can`t do it, because resilvering in progress(freeze on 0.1%) and zfs
>>> list empty
>>>
>>> 2013/2/17 Fleuriot Damien <ml@my.gd>:
>>>> Hmmm, zfs destroy -f zroot/var/crash ?
>>>>
>>>> Then you can try to zfs mount -a
>>>>
>>>>
>>>>
>>>> Removing pjd and mm from cc, if they want to read your message they're old enough to check their ML subscription.
>>>>
>>>>
>>>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:
>>>>
>>>>> hi, i have raid1 on zfs with 2 device on pool
>>>>> first device died and boot from second not working...
>>>>>
>>>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool import
>>>>> http://puu.sh/2402E
>>>>>
>>>>> when  i load zfs.ko and opensolaris.ko i see this message:
>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>>
>>>>> zpool status:
>>>>> http://puu.sh/2405f
>>>>>
>>>>> resilvering freeze with:
>>>>> zpool status -v
>>>>>      .............
>>>>>      zroot/usr:<0x28ff>
>>>>>      zroot/usr:<0x29ff>
>>>>>      zroot/usr:<0x2aff>
>>>>>      zroot/var/crash:<0x0>
>>>>> root@Flash:/root #
>>>>>
>>>>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn`t
>>>>> remember) and mount other zfs points with my data
>>>>> --
>>>>> Best regards,
>>>>> Kuklin Konstantin.
>>>>> _______________________________________________
>>>>> freebsd-fs@freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Kuklin Konstantin.
>>
>
>
>
> --
> Best regards,
> Kuklin Konstantin.


From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 11:46:50 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 7BBAA60F;
 Tue, 19 Feb 2013 11:46:50 +0000 (UTC)
 (envelope-from konstantin.kuklin@gmail.com)
Received: from mail-qa0-f53.google.com (mail-qa0-f53.google.com
 [209.85.216.53]) by mx1.freebsd.org (Postfix) with ESMTP id 158657B0;
 Tue, 19 Feb 2013 11:46:49 +0000 (UTC)
Received: by mail-qa0-f53.google.com with SMTP id z4so1760792qan.12
 for <multiple recipients>; Tue, 19 Feb 2013 03:46:49 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:x-received:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type:content-transfer-encoding;
 bh=C70p9JL3bdcMt+Xio/yJrStsHXOTL/ICUr1GTzlOg68=;
 b=Ox3Mu/ORfvWAU9LUAf1WIjX9tfTtXQGA0GhJK9/JmXvG74bO6DWwf1eYSd+wQuzynK
 hTzMCeDkOwWN9tcVu/SBvRe19IwH7Ydp3ic9oGRylJUn5TiwGi18VGJlO5evc8ky0NUJ
 b+icAnUTXSR8YrHTMLIxNBo5dcTAhRe3fCQZubb5w0xQf9Hw8dvfyKrfCnSDyJ5BYXXk
 QXzKuWUO9a2SfbF5H8PrZqfa0qelVlrLYTqq0iT/EmX37dX+BpyrtBeHj1F7UnJK0R22
 vCG0UlBp8jZR38B7znNNLHTbbe33qOP7xobwBcDT13cdUjma7GPEpcKm69dzax7mHvR6
 PgYA==
MIME-Version: 1.0
X-Received: by 10.224.60.6 with SMTP id n6mr6640429qah.16.1361273974315; Tue,
 19 Feb 2013 03:39:34 -0800 (PST)
Received: by 10.49.98.130 with HTTP; Tue, 19 Feb 2013 03:39:34 -0800 (PST)
In-Reply-To: <B8CACAE9-64E5-40EE-821D-079524579C4F@my.gd>
References: <CAOrTs_aRL1cO3Jm1YrXoQRGoUX4vVqYKvFcTXbzyijYAhDsCVA@mail.gmail.com>
 <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>
 <CAOrTs_a1CX2tiBe+1zzsSMw+TTqvfFJD=Dr-dUURa_4s4aSYCA@mail.gmail.com>
 <B8CACAE9-64E5-40EE-821D-079524579C4F@my.gd>
Date: Tue, 19 Feb 2013 15:39:34 +0400
Message-ID: <CAOrTs_bb26hP92jNJh7CEpbRF8uHaqAeCPF06-rwwE0KoMH3Fw@mail.gmail.com>
Subject: Re: zfs raid1 error resilvering and mount
From: Konstantin Kuklin <konstantin.kuklin@gmail.com>
To: Fleuriot Damien <ml@my.gd>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 11:46:50 -0000

I didn't replace the disk. After the reboot the system did not start (ZFS
is installed as the default root filesystem), so I booted another system
from flash. Resilvering started automatically and showed me warnings, and
its progress froze (stuck on checking zroot/var/crash).
Would replacing the dead disk heal var/crash with the <0x0> address?

2013/2/18 Fleuriot Damien <ml@my.gd>:
> Reassure me here, you've replaced your failed vdev before trying to resilver right ?
>
> Your zpool status suggests otherwise, so I only want to make sure this is a status from before replacing your drive.
>
>
> On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:
>
>> i can`t do it, because resilvering in progress(freeze on 0.1%) and zfs
>> list empty
>>
>> 2013/2/17 Fleuriot Damien <ml@my.gd>:
>>> Hmmm, zfs destroy -f zroot/var/crash ?
>>>
>>> Then you can try to zfs mount -a
>>>
>>>
>>>
>>> Removing pjd and mm from cc, if they want to read your message they're old enough to check their ML subscription.
>>>
>>>
>>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:
>>>
>>>> hi, i have raid1 on zfs with 2 device on pool
>>>> first device died and boot from second not working...
>>>>
>>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool import
>>>> http://puu.sh/2402E
>>>>
>>>> when  i load zfs.ko and opensolaris.ko i see this message:
>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>
>>>> zpool status:
>>>> http://puu.sh/2405f
>>>>
>>>> resilvering freeze with:
>>>> zpool status -v
>>>>       .............
>>>>       zroot/usr:<0x28ff>
>>>>       zroot/usr:<0x29ff>
>>>>       zroot/usr:<0x2aff>
>>>>       zroot/var/crash:<0x0>
>>>> root@Flash:/root #
>>>>
>>>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn`t
>>>> remember) and mount other zfs points with my data
>>>> --
>>>> Best regards,
>>>> Kuklin Konstantin.
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>
>>
>>
>>
>> --
>> Best regards,
>> Kuklin Konstantin.
>


--
Best regards,
Kuklin Konstantin.

From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 13:27:04 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 2644B3DF
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 13:27:04 +0000 (UTC)
 (envelope-from ml@my.gd)
Received: from mail-wi0-f182.google.com (mail-wi0-f182.google.com
 [209.85.212.182]) by mx1.freebsd.org (Postfix) with ESMTP id B1427D22
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 13:27:03 +0000 (UTC)
Received: by mail-wi0-f182.google.com with SMTP id hi18so4768574wib.3
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 05:26:56 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=x-received:content-type:mime-version:subject:from:in-reply-to:date
 :cc:content-transfer-encoding:message-id:references:to:x-mailer
 :x-gm-message-state;
 bh=hDyDQ/BdgeQGgw+6VZTeqj9wcZVR7AXl4Ev8z6bZs0k=;
 b=S5+5Wvb/+0nCcLJ2bSaLY2PUdJyunvCBZeOayOOFvwcThs0rZr3FF3OSbsN0B9YL1r
 5MZunOG/zucsYZk+yWIuiD9s94qSLVju17jxL9GYXuddlbiKSL9YdGK88oQhsF9yZfNs
 865x13GsLAPKU+cGC2rKYZFqQ+pAnXH0TJ9l0NCfOqaxWpQ50PlTnseTG4CX63ER6jG2
 X8zQpEdUaRY03D6C4+eHjNlWO6LBK2qGITa0lFvhASoC0W0PUIrR3zlV5sJqW/mWCICg
 UA5KNxCL1E6b9mK5JvDM41gMlPt/NfA3bjzQxDoNGLle56KbEN6i2vPjDztjeaGnfr0C
 Gm9w==
X-Received: by 10.180.24.229 with SMTP id x5mr25538857wif.17.1361280416728;
 Tue, 19 Feb 2013 05:26:56 -0800 (PST)
Received: from dfleuriot-at-hi-media.com ([83.167.62.196])
 by mx.google.com with ESMTPS id s8sm25560861wif.9.2013.02.19.05.26.54
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Tue, 19 Feb 2013 05:26:55 -0800 (PST)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: zfs raid1 error resilvering and mount
From: Fleuriot Damien <ml@my.gd>
In-Reply-To: <CAOrTs_YeuYE4=+6AJse5TizALKobLzuqWPwdiK8YHAvr4gjbfg@mail.gmail.com>
Date: Tue, 19 Feb 2013 14:26:54 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <2FC474DC-1905-4BEC-BFA6-037054B5437B@my.gd>
References: <CAOrTs_aRL1cO3Jm1YrXoQRGoUX4vVqYKvFcTXbzyijYAhDsCVA@mail.gmail.com>
 <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>
 <CAOrTs_a1CX2tiBe+1zzsSMw+TTqvfFJD=Dr-dUURa_4s4aSYCA@mail.gmail.com>
 <B8CACAE9-64E5-40EE-821D-079524579C4F@my.gd>
 <CAOrTs_bb26hP92jNJh7CEpbRF8uHaqAeCPF06-rwwE0KoMH3Fw@mail.gmail.com>
 <0ED6EC22-9875-45FF-ADFC-BB23C2C94FC0@my.gd>
 <CAOrTs_asx4N9Pt-W3ahg=KY0Bwc+PQ-oXxjAHt8qdn08XQd9yA@mail.gmail.com>
 <8506B305-D696-4213-BA59-929E4886B10C@my.gd>
 <CAOrTs_YeuYE4=+6AJse5TizALKobLzuqWPwdiK8YHAvr4gjbfg@mail.gmail.com>
To: Konstantin Kuklin <konstantin.kuklin@gmail.com>
X-Mailer: Apple Mail (2.1499)
X-Gm-Message-State: ALoCoQmH3wfsydfdmsoI8C1Mv5fS9ai4n0AAiicrlz5Fl2gZqrLKJV5Td7udGd8wVeh4Ej7GTITt
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 13:27:04 -0000

Well, I can't see anything else to help you, except trying to replace
your failed vdev and resilver from there...


On Feb 19, 2013, at 2:24 PM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:

> zfs set canmount=off zroot/var/crash
>
> i can`t do this, because zfs list empty
>
> 2013/2/19 Fleuriot Damien <ml@my.gd>:
>> The thing is, perhaps you have corrupted blocks that weren't caught either by ZFS or your drives' firmware, preventing the pool's operation.
>>
>> Seeing zroot/var/crash is the problem, could you try:
>>
>> 1/ booting from a live CD or flash
>> 2/ NOT start a resilver
>> 3/ run the command:
>> zfs set canmount=off zroot/var/crash
>>
>>
>> This should prevent /var/crash from trying to be mounted from the ZFS pool.
>>
>> Perhaps this'll allow you to get further through the boot process and perhaps even start your ZFS pool correctly.
>>
>>
>>
>> On Feb 19, 2013, at 12:52 PM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:
>>
>>> you understand me right, but my problem not in dead device... raid1
>>> must work correctly with 1 device and command to replace or something
>>> else not work, just freeze
>>> i have only 2 warning about crash fs zroot/var/crash and thats all
>>> have any idea, how i can repair it without default zfs tools like zfs, zpool?
>>>
>>>
>>> 2013/2/19 Fleuriot Damien <ml@my.gd>:
>>>> If I understand you correctly, you have:
>>>> - booted another system from flash
>>>> - NOT replaced the failed device
>>>> - under this booted system, resilvering takes place automatically
>>>>
>>>>
>>>> While I cannot tell why ZFS tries to resilver without a new, proper device, I think it will only work once you've replaced the failed device.
>>>>
>>>> Could you try replacing the failed drive ?
>>>>
>>>>
>>>> On Feb 19, 2013, at 12:39 PM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:
>>>>
>>>>> i did`t replace disk, after reboot system not started (zfs installed
>>>>> as default root system) and i boot from another system(from flash) and
>>>>> resilvering has auto start and show me warnings with freeze
>>>>> progress(dead on checking zroot/var/crash )
>>>>> replacing dead disk healing var/crash with <0x0> adress?
>>>>>
>>>>> 2013/2/18 Fleuriot Damien <ml@my.gd>:
>>>>>> Reassure me here, you've replaced your failed vdev before trying to resilver right ?
>>>>>>
>>>>>> Your zpool status suggests otherwise, so I only want to make sure this is a status from before replacing your drive.
>>>>>>
>>>>>>
>>>>>> On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin <konstantin.kuklin@gmail.com> wrote:
>>>>>>
>>>>>>> i can`t do it, because resilvering in progress(freeze on 0.1%) and zfs
>>>>>>> list empty
>>>>>>>=20
>>>>>>> 2013/2/17 Fleuriot Damien <ml@my.gd>:
>>>>>>>> Hmmm, zfs destroy -f zroot/var/crash ?
>>>>>>>>=20
>>>>>>>> Then you can try to zfs mount -a
>>>>>>>>=20
>>>>>>>>=20
>>>>>>>>=20
>>>>>>>> Removing pjd and mm from cc, if they want to read your message =
they're old enough to check their ML subscription.
>>>>>>>>=20
>>>>>>>>=20
>>>>>>>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin =
<konstantin.kuklin@gmail.com> wrote:
>>>>>>>>=20
>>>>>>>>> hi, i have raid1 on zfs with 2 device on pool
>>>>>>>>> first device died and boot from second not working...
>>>>>>>>>=20
>>>>>>>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with =
zpool import
>>>>>>>>> http://puu.sh/2402E
>>>>>>>>>=20
>>>>>>>>> when  i load zfs.ko and opensolaris.ko i see this message:
>>>>>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>>>>>>=20
>>>>>>>>> zpool status:
>>>>>>>>> http://puu.sh/2405f
>>>>>>>>>=20
>>>>>>>>> resilvering freeze with:
>>>>>>>>> zpool status -v
>>>>>>>>>    .............
>>>>>>>>>    zroot/usr:<0x28ff>
>>>>>>>>>    zroot/usr:<0x29ff>
>>>>>>>>>    zroot/usr:<0x2aff>
>>>>>>>>>    zroot/var/crash:<0x0>
>>>>>>>>> root@Flash:/root #
>>>>>>>>>=20
>>>>>>>>> how i can delete or drop it fs zroot/var/crash (1m-10m size i =
didn`t
>>>>>>>>> remember) and mount other zfs points with my data
>>>>>>>>> --
>>>>>>>>> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC
>>>>>>>>> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=
=82=D0=B0=D0=BD=D1=82=D0=B8=D0=BD.
>>>>>>>>> _______________________________________________
>>>>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>>>> To unsubscribe, send any mail to =
"freebsd-fs-unsubscribe@freebsd.org"
>>>>>>>>=20
>>>>>>>=20
>>>>>>>=20
>>>>>>>=20
>>>>>>> --
>>>>>>> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC
>>>>>>> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82=
=D0=B0=D0=BD=D1=82=D0=B8=D0=BD.
>>>>>>=20
>>>>>=20
>>>>>=20
>>>>>=20
>>>>> --
>>>>> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC
>>>>> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82=D0=
=B0=D0=BD=D1=82=D0=B8=D0=BD.
>>>>=20
>>>=20
>>>=20
>>>=20
>>> --
>>> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC
>>> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82=D0=
=B0=D0=BD=D1=82=D0=B8=D0=BD.
>>=20
>=20
>=20
>=20
> --
> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC
> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82=D0=B0=
=D0=BD=D1=82=D0=B8=D0=BD.


From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 13:32:19 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id D97768EC;
 Tue, 19 Feb 2013 13:32:19 +0000 (UTC)
 (envelope-from konstantin.kuklin@gmail.com)
Received: from mail-qe0-f42.google.com (mail-qe0-f42.google.com
 [209.85.128.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7B63FD84;
 Tue, 19 Feb 2013 13:32:19 +0000 (UTC)
Received: by mail-qe0-f42.google.com with SMTP id 2so3044900qeb.15
 for <multiple recipients>; Tue, 19 Feb 2013 05:32:13 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:x-received:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type:content-transfer-encoding;
 bh=NmMeb08ghdhT2tPcuhh3sqWDEQrepRCVqabzsNkQE8g=;
 b=OV5j/4E34d5WAPTrsOrCVJdcyNbaYnp1TKh8NoFTCVGFU2k309KF2sCAzqWZkxdDkC
 8bmo4Utw3K4vAjxYiXX5tAb/+XUQRTp5UF4rLpk0LwIwOZVchdu64SJ+fKfI1cuO7KUH
 6dFIQAk2jNTUhOQl1g4r10aC+8Mng+0G9bR+Y5bZ79i6+xLuzCtmiow8BizKHtWQkRdI
 E+hNLAxD6SvQQyEE5YF58Q4waSNVb1BRuDUETEZvAEuKygWUNiwVI9878y+f+YMHWv4t
 KG4SB/IHJrOToLyceJ27ngk9dxbjjy3oX9feETvPuNqrgt8FcqYm6q2g/XYvB6q3VXt9
 Bqrg==
MIME-Version: 1.0
X-Received: by 10.224.33.14 with SMTP id f14mr7097690qad.69.1361280278069;
 Tue, 19 Feb 2013 05:24:38 -0800 (PST)
Received: by 10.49.98.130 with HTTP; Tue, 19 Feb 2013 05:24:37 -0800 (PST)
In-Reply-To: <8506B305-D696-4213-BA59-929E4886B10C@my.gd>
References: <CAOrTs_aRL1cO3Jm1YrXoQRGoUX4vVqYKvFcTXbzyijYAhDsCVA@mail.gmail.com>
 <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>
 <CAOrTs_a1CX2tiBe+1zzsSMw+TTqvfFJD=Dr-dUURa_4s4aSYCA@mail.gmail.com>
 <B8CACAE9-64E5-40EE-821D-079524579C4F@my.gd>
 <CAOrTs_bb26hP92jNJh7CEpbRF8uHaqAeCPF06-rwwE0KoMH3Fw@mail.gmail.com>
 <0ED6EC22-9875-45FF-ADFC-BB23C2C94FC0@my.gd>
 <CAOrTs_asx4N9Pt-W3ahg=KY0Bwc+PQ-oXxjAHt8qdn08XQd9yA@mail.gmail.com>
 <8506B305-D696-4213-BA59-929E4886B10C@my.gd>
Date: Tue, 19 Feb 2013 17:24:37 +0400
Message-ID: <CAOrTs_YeuYE4=+6AJse5TizALKobLzuqWPwdiK8YHAvr4gjbfg@mail.gmail.com>
Subject: Re: zfs raid1 error resilvering and mount
From: Konstantin Kuklin <konstantin.kuklin@gmail.com>
To: Fleuriot Damien <ml@my.gd>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: quoted-printable
X-Mailman-Approved-At: Tue, 19 Feb 2013 13:55:25 +0000
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 13:32:19 -0000

zfs set canmount=3Doff zroot/var/crash

I can't do this, because zfs list is empty

2013/2/19 Fleuriot Damien <ml@my.gd>:
> The thing is, perhaps you have corrupted blocks that weren't caught eithe=
r by ZFS or your drives' firmware, preventing the pool's operation.
>
> Seeing zroot/var/crash is the problem, could you try:
>
> 1/ booting from a live CD or flash
> 2/ NOT start a resilver
> 3/ run the command:
> zfs set canmount=3Doff zroot/var/crash
>
>
> This should prevent /var/crash from trying to be mounted from the ZFS poo=
l.
>
> Perhaps this'll allow you to get further through the boot process and per=
haps even start your ZFS pool correctly.
>
>
>
> On Feb 19, 2013, at 12:52 PM, Konstantin Kuklin <konstantin.kuklin@gmail.=
com> wrote:
>
>> you understand me right, but my problem not in dead device... raid1
>> must work correctly with 1 device and command to replace or something
>> else not work, just freeze
>> i have only 2 warning about crash fs zroot/var/crash and thats all
>> have any idea, how i can repair it without default zfs tools like zfs, z=
pool?
>>
>>
>> 2013/2/19 Fleuriot Damien <ml@my.gd>:
>>> If I understand you correctly, you have:
>>> - booted another system from flash
>>> - NOT replaced the failed device
>>> - under this booted system, resilvering takes place automatically
>>>
>>>
>>> While I cannot tell why ZFS tries to resilver without a new, proper dev=
ice, I think it will only work once you've replaced the failed device.
>>>
>>> Could you try replacing the failed drive ?
>>>
>>>
>>> On Feb 19, 2013, at 12:39 PM, Konstantin Kuklin <konstantin.kuklin@gmai=
l.com> wrote:
>>>
>>>> i did`t replace disk, after reboot system not started (zfs installed
>>>> as default root system) and i boot from another system(from flash) and
>>>> resilvering has auto start and show me warnings with freeze
>>>> progress(dead on checking zroot/var/crash )
>>>> replacing dead disk healing var/crash with <0x0> adress?
>>>>
>>>> 2013/2/18 Fleuriot Damien <ml@my.gd>:
>>>>> Reassure me here, you've replaced your failed vdev before trying to r=
esilver right ?
>>>>>
>>>>> Your zpool status suggests otherwise, so I only want to make sure thi=
s is a status from before replacing your drive.
>>>>>
>>>>>
>>>>> On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin <konstantin.kuklin@gma=
il.com> wrote:
>>>>>
>>>>>> i can`t do it, because resilvering in progress(freeze on 0.1%) and z=
fs
>>>>>> list empty
>>>>>>
>>>>>> 2013/2/17 Fleuriot Damien <ml@my.gd>:
>>>>>>> Hmmm, zfs destroy -f zroot/var/crash ?
>>>>>>>
>>>>>>> Then you can try to zfs mount -a
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Removing pjd and mm from cc, if they want to read your message they=
're old enough to check their ML subscription.
>>>>>>>
>>>>>>>
>>>>>>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin <konstantin.kuklin@g=
mail.com> wrote:
>>>>>>>
>>>>>>>> hi, i have raid1 on zfs with 2 device on pool
>>>>>>>> first device died and boot from second not working...
>>>>>>>>
>>>>>>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpoo=
l import
>>>>>>>> http://puu.sh/2402E
>>>>>>>>
>>>>>>>> when  i load zfs.ko and opensolaris.ko i see this message:
>>>>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>>>>>
>>>>>>>> zpool status:
>>>>>>>> http://puu.sh/2405f
>>>>>>>>
>>>>>>>> resilvering freeze with:
>>>>>>>> zpool status -v
>>>>>>>>     .............
>>>>>>>>     zroot/usr:<0x28ff>
>>>>>>>>     zroot/usr:<0x29ff>
>>>>>>>>     zroot/usr:<0x2aff>
>>>>>>>>     zroot/var/crash:<0x0>
>>>>>>>> root@Flash:/root #
>>>>>>>>
>>>>>>>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn=
`t
>>>>>>>> remember) and mount other zfs points with my data
>>>>>>>> --
>>>>>>>> =F3 =D5=D7=C1=D6=C5=CE=C9=C5=CD
>>>>>>>> =EB=D5=CB=CC=C9=CE =EB=CF=CE=D3=D4=C1=CE=D4=C9=CE.
>>>>>>>> _______________________________________________
>>>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.o=
rg"
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> =F3 =D5=D7=C1=D6=C5=CE=C9=C5=CD
>>>>>> =EB=D5=CB=CC=C9=CE =EB=CF=CE=D3=D4=C1=CE=D4=C9=CE.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> =F3 =D5=D7=C1=D6=C5=CE=C9=C5=CD
>>>> =EB=D5=CB=CC=C9=CE =EB=CF=CE=D3=D4=C1=CE=D4=C9=CE.
>>>
>>
>>
>>
>> --
>> =F3 =D5=D7=C1=D6=C5=CE=C9=C5=CD
>> =EB=D5=CB=CC=C9=CE =EB=CF=CE=D3=D4=C1=CE=D4=C9=CE.
>


--
Best regards,
Konstantin Kuklin.

From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 14:58:32 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 56CB1C4;
 Tue, 19 Feb 2013 14:58:31 +0000 (UTC)
 (envelope-from freebsd-listen@fabiankeil.de)
Received: from smtprelay02.ispgateway.de (smtprelay02.ispgateway.de
 [80.67.31.36]) by mx1.freebsd.org (Postfix) with ESMTP id 65668B34;
 Tue, 19 Feb 2013 14:58:31 +0000 (UTC)
Received: from [78.35.187.42] (helo=fabiankeil.de)
 by smtprelay02.ispgateway.de with esmtpsa (SSLv3:AES128-SHA:128)
 (Exim 4.68) (envelope-from <freebsd-listen@fabiankeil.de>)
 id 1U7oKU-0005s9-Iy; Tue, 19 Feb 2013 15:37:58 +0100
Date: Tue, 19 Feb 2013 15:35:51 +0100
From: Fabian Keil <freebsd-listen@fabiankeil.de>
To: "Steven Hartland" <smh@freebsd.org>
Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes...
Message-ID: <20130219153551.175ad31f@fabiankeil.de>
In-Reply-To: <EF8E2A888F8D44FBA7362610150DAEA1@multiplay.co.uk>
References: <20130218191242.GI55866@funkthat.com>
 <20130218200121.GJ55866@funkthat.com>
 <EF8E2A888F8D44FBA7362610150DAEA1@multiplay.co.uk>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1;
 boundary="Sig_/=MYq4uS8VhumqmOEv5iB7oq"; protocol="application/pgp-signature"
X-Df-Sender: Nzc1MDY3
Cc: freebsd-fs@FreeBSD.org, John-Mark Gurney <jmg@funkthat.com>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 14:58:32 -0000

--Sig_/=MYq4uS8VhumqmOEv5iB7oq
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

"Steven Hartland" <smh@freebsd.org> wrote:

> From: "John-Mark Gurney" <jmg@funkthat.com>

> > so, we just end up overwriting the bio_completed error...
> >
> > pjd, should we just put the bio_completed =3D line under an else?
> >
> > something like:
> > if (pbp->bio_error !=3D 0) {
> > G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=3D%d).",
> >     pbp->bio_error);
> > pbp->bio_completed =3D 0;
> > } else
> > pbp->bio_completed =3D pbp->bio_length;
> >
> > /* Write is finished, send it up. */
> > g_io_deliver(pbp, pbp->bio_error);
> > sc =3D pbp->bio_to->geom->softc;
> > atomic_subtract_int(&sc->sc_inflight, 1);
> >
> > But doesn't explain why read's aren't being counted though...
>=20
> Looks like the read case will loose the error if its not the last
> bio in sector group.
>=20
> The attached should fix both cases.

Works for me on 10-CURRENT, thanks.

> A question for someone familiar with geom: why is bio_completed
> not set to bio_length in the read success case? Is this correct
> or is this another little bug?

No idea, but I reported "another little bug" a while ago:
http://www.freebsd.org/cgi/query-pr.cgi?pr=3Dkern/162036
and while testing how the patch affects it, I discovered
that the panic can be prevented with:

diff --git a/sys/geom/eli/g_eli.c b/sys/geom/eli/g_eli.c
index 4e35297..24969b0 100644
--- a/sys/geom/eli/g_eli.c
+++ b/sys/geom/eli/g_eli.c
@@ -183,7 +183,8 @@ g_eli_read_done(struct bio *bp)
                        pbp->bio_driver2 =3D NULL;
                }
                g_io_deliver(pbp, pbp->bio_error);
-               atomic_subtract_int(&sc->sc_inflight, 1);
+               if (sc !=3D NULL)
+                       atomic_subtract_int(&sc->sc_inflight, 1);
                return;
        }
        mtx_lock(&sc->sc_queue_mtx);

atomic_*_int(&sc->sc_inflight, 1) seems to be used without
checking that sc isn't NULL pretty much everywhere in g_eli.c,
though, and it's not clear to me when it's safe and when it isn't.

> On a related note, if anyone's got some pointers to docs about
> the internals of geom, I'd be interested :)

Every time I looked for internal geom documentation in the
past I came up empty.

Fabian

--Sig_/=MYq4uS8VhumqmOEv5iB7oq
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlEjjckACgkQBYqIVf93VJ23AQCfUjev5ucpAJ+pZy4TlhrecbwO
7XQAoLzeQ0t0lRZqyYGHB7xGpEfHuDSA
=Gebs
-----END PGP SIGNATURE-----

--Sig_/=MYq4uS8VhumqmOEv5iB7oq--

From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 18:24:20 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 30A1793E;
 Tue, 19 Feb 2013 18:24:20 +0000 (UTC) (envelope-from jhb@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 by mx1.freebsd.org (Postfix) with ESMTP id 08F9EBED;
 Tue, 19 Feb 2013 18:24:20 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1JIOJBc033727;
 Tue, 19 Feb 2013 18:24:19 GMT
 (envelope-from jhb@freefall.freebsd.org)
Received: (from jhb@localhost)
 by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1JIOJP3033723;
 Tue, 19 Feb 2013 18:24:19 GMT (envelope-from jhb)
Date: Tue, 19 Feb 2013 18:24:19 GMT
Message-Id: <201302191824.r1JIOJP3033723@freefall.freebsd.org>
To: jhb@FreeBSD.org, freebsd-fs@FreeBSD.org, jhb@FreeBSD.org
From: jhb@FreeBSD.org
Subject: Re: kern/176179: [nfs] nfs client KASSERT: panic: attempt to set
 TDF_SBDRY recursively
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 18:24:20 -0000

Synopsis: [nfs] nfs client KASSERT: panic: attempt to set TDF_SBDRY recursively

Responsible-Changed-From-To: freebsd-fs->jhb
Responsible-Changed-By: jhb
Responsible-Changed-When: Tue Feb 19 18:23:50 UTC 2013
Responsible-Changed-Why: 
This is due to one of my changes.  I am reworking it and the reworked
version should fix this.

http://www.freebsd.org/cgi/query-pr.cgi?pr=176179

From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 20:11:03 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id C5BDB915
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 20:11:03 +0000 (UTC)
 (envelope-from toasty@dragondata.com)
Received: from mail-ia0-x22a.google.com (ia-in-x022a.1e100.net
 [IPv6:2607:f8b0:4001:c02::22a])
 by mx1.freebsd.org (Postfix) with ESMTP id 8EBD33A6
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 20:11:03 +0000 (UTC)
Received: by mail-ia0-f170.google.com with SMTP id k20so6532308iak.1
 for <freebsd-fs@freebsd.org>; Tue, 19 Feb 2013 12:11:03 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=dragondata.com; s=google;
 h=x-received:content-type:mime-version:subject:from:in-reply-to:date
 :cc:content-transfer-encoding:message-id:references:to:x-mailer;
 bh=UPWosEXkGGWzgnU5mq7j1BCBDV6yLJmc4XG/ifsHxns=;
 b=jTlrkjDXCD1SqoNQ1ErD4Dkn3S8Ik8GaRBTU8//zyRMpbnNKuy4yfs4wxl8gs8ziE/
 Tq02ZxZHGXP3+N/fHRtlWMUBmE3iZRvTFvewkNQbv1R2T5RO8mozbuXzPAa7dqWF4+/u
 WfAIt+DES0Qu4HSbSFsk/CiuWpMAkHNGEA/fw=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=x-received:content-type:mime-version:subject:from:in-reply-to:date
 :cc:content-transfer-encoding:message-id:references:to:x-mailer
 :x-gm-message-state;
 bh=UPWosEXkGGWzgnU5mq7j1BCBDV6yLJmc4XG/ifsHxns=;
 b=HgSrQameUHyhRey89xAkrEFisg/inK+++toH/2qEmdAUCiZXDWCiP7b2e9myhyF4XS
 oi7IWRhFz1N6aN+XqyQcRuWffK4Kr3GxjnQzeFz9MdQHz6914XZgHPfnSyEULTbiBUDM
 pdV2JgHIZAML0lclHMuzTEPULzXvzZUKWOMhoMvdHZx702UEn2hMQBhjeZjwrBBP1PPG
 7rDV5zLUJS8wbRXhwusEObOo4ZUG+q7OYVowe21S13Xx+kiz//lbvmWC8wj1owuaNW/b
 Y9FdEl23iENArnGrWs+FLs3qTaxYZQn+LvFQ033yilrPJfmT6P2QnsEyy54jgtejbXTa
 gjqw==
X-Received: by 10.50.196.165 with SMTP id in5mr9855938igc.99.1361304653771;
 Tue, 19 Feb 2013 12:10:53 -0800 (PST)
Received: from vpn132.rw1.your.org (vpn132.rw1.your.org. [204.9.51.132])
 by mx.google.com with ESMTPS id ip8sm4053976igc.4.2013.02.19.12.10.50
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Tue, 19 Feb 2013 12:10:52 -0800 (PST)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: Improving ZFS performance for large directories
From: Kevin Day <toasty@dragondata.com>
In-Reply-To: <20130201192416.GA76461@server.rulingia.com>
Date: Tue, 19 Feb 2013 14:10:47 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com>
 <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>
 <F4420A8C-FB92-4771-B261-6C47A736CF7F@dragondata.com>
 <20130201192416.GA76461@server.rulingia.com>
To: Peter Jeremy <peter@rulingia.com>
X-Mailer: Apple Mail (2.1499)
X-Gm-Message-State: ALoCoQkU2VCvT+jkKziLCBqbIYtrE5fbjJqPe+T3S++ADtUBZdXWe0nfe9VdbxmF1I4DTgoPuVRq
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 20:11:03 -0000

Sorry for the late follow-up; I've been doing some testing with an L2ARC
device.


>> Doing it twice back-to-back makes a bit of difference but it's still =
slow either way.
>=20
> ZFS can very conservative about caching data and twice might not be =
enough.
> I suggest you try 8-10 times, or until the time stops reducing.
>=20

Timing an "ls" in large directories 20 times in a row, the first run is
the slowest and all subsequent listings take roughly the same time.
There doesn't appear to be any further gain, even after 20 repetitions.


>> I think some of the issue is that nothing is being allowed to stay =
cached long.
>=20
> Well ZFS doesn't do any time-based eviction so if things aren't
> staying in the cache, it's because they are being evicted by things
> that ZFS considers more deserving.
>=20
> Looking at the zfs-stats you posted, it looks like your workload has
> very low locality of reference (the data hitrate is very) low.  If
> this is not what you expect then you need more RAM.  OTOH, your
> vfs.zfs.arc_meta_used being above vfs.zfs.arc_meta_limit suggests that
> ZFS really wants to cache more metadata (by default ZFS has a 25%
> metadata, 75% data split in ARC to prevent metadata caching starving
> data caching).  I would go even further than the 50:50 split suggested
> later and try 75:25 (ie, triple the current vfs.zfs.arc_meta_limit).
>=20
> Note that if there is basically no locality of reference in your
> workload (as I suspect), you can even turn off data caching for
> specific filesystems with zfs set primarycache=3Dmetadata tank/foo
> (note that you still need to increase vfs.zfs.arc_meta_limit to
> allow ZFS to use the the ARC to cache metadata).

Now that I've got an L2ARC device (250GB), I've been experimenting.
With the defaults (primarycache and secondarycache both set to all), I
really didn't see much improvement. The SSD filled up pretty quickly,
but its hit rate was around 1%, even after 48 hours.

Thinking that making the primary cache metadata-only and leaving the
secondary cache at "all" would improve things, I wiped the device (SATA
secure erase, to make sure) and tried again. This was much worse. I'm
guessing that because some amount of real file data was being read
frequently, the SSD was getting hammered with reads at 100%
utilization, and things were far slower.

I wiped the SSD and tried again with primarycache=3Dall, =
secondarycache=3Dmetadata and things have improved. Even with boosting =
up vfs.zfs.l2arc_write_max, it took quite a while before things =
stabilized. I'm guessing there isn't a huge amount of data, but there's =
such poor locality and sweeping the entire filesystem takes so long that =
it's going to take a while before it decides what's worth being cached. =
After about 20 hours in this configuration, it's a HUGE difference on =
directory speeds though. Before adding the SSD, an "ls" in a directory =
with 65k files would take 10-30 seconds, it's now down to about 0.2 =
seconds.=20

So I'm guessing the theory was right, there was more metadata than would =
fit in ARC so it was constantly churning. I'm a bit surprised that =
continually doing an ls in a big directory didn't make it stick better, =
but these filesystems are HUGE so there may be some inefficiencies =
happening here. There are roughly 29M files, growing at about 50k =
files/day. We recently upgraded, and are now at 96 3TB drives in the =
pool.
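
As a rough cross-check of the metadata theory: the tunables in the
report further down show vfs.zfs.arc_meta_limit at exactly half of
vfs.zfs.arc_max, with vfs.zfs.arc_meta_used less than 40 KB below that
limit. A minimal Python sketch of the arithmetic follows. The helper
names are made up for illustration, and the 75:25 figure is only the
split suggested earlier in the thread, not a tested setting.

```python
# Values copied from the "ZFS Tunables (sysctl)" report in this message.
ARC_MAX = 32796319744          # vfs.zfs.arc_max, bytes
ARC_META_LIMIT = 16398159872   # vfs.zfs.arc_meta_limit, bytes
ARC_META_USED = 16398120264    # vfs.zfs.arc_meta_used, bytes


def meta_share(limit: int, arc_max: int) -> float:
    """Fraction of the ARC that metadata is allowed to occupy."""
    return limit / arc_max


def meta_limit_for(share_percent: int, arc_max: int) -> int:
    """Hypothetical helper: arc_meta_limit in bytes for an integer
    metadata share given in percent."""
    return arc_max * share_percent // 100


current_share = meta_share(ARC_META_LIMIT, ARC_MAX)
headroom = ARC_META_LIMIT - ARC_META_USED
proposed_limit = meta_limit_for(75, ARC_MAX)

print(f"metadata share of ARC: {current_share:.0%}")
print(f"headroom below the limit: {headroom} bytes")
print(f"arc_meta_limit for a 75:25 split: {proposed_limit}")
```

With metadata usage sitting this close to the limit, the numbers at
least do not contradict the idea that metadata was churning in ARC.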

What I also find surprising is this:

L2 ARC Size: (Adaptive)				22.70	GiB
	Header Size:			0.31%	71.49	MiB

L2 ARC Breakdown:				23.77m
	Hit Ratio:			34.26%	8.14m
	Miss Ratio:			65.74%	15.62m
	Feeds:					63.28k

It's a 250G drive, only 22G is being used, and there's still a ~66%
miss rate. Is there any way to tell why more metadata isn't being
pushed to the L2ARC? I see pretty high counts for "Passed Headroom" and
"Tried Lock Failures", but I'm not sure if that's normal. I'm including
the lengthy output of zfs-stats below in case anyone sees something
that stands out as unusual.
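
The ratios quoted above can be recomputed from the raw counters. The
sketch below is only illustrative: it assumes "250G" means 250 * 10**9
bytes (the exact capacity is not stated), and it assumes nothing
written to the L2ARC has been evicted or overwritten yet, so the
per-write average is a rough estimate at best. All variable names are
hypothetical.

```python
GIB = 2**30

# Figures copied from the zfs-stats excerpt and report in this message.
L2_HITS = 8.14e6              # count next to "Hit Ratio"
L2_MISSES = 15.62e6           # count next to "Miss Ratio"
L2_SIZE_BYTES = 22.70 * GIB   # "L2 ARC Size"
L2_WRITES_SENT = 29.89e3      # "Writes Sent"
L2ARC_WRITE_MAX = 26214400    # vfs.zfs.l2arc_write_max, bytes

# Assumption: "250G" is 250 * 10**9 bytes.
SSD_BYTES = 250 * 10**9


def ratio(part: float, whole: float) -> float:
    """Plain fraction part / whole."""
    return part / whole


hit_ratio = ratio(L2_HITS, L2_HITS + L2_MISSES)
fill = ratio(L2_SIZE_BYTES, SSD_BYTES)

# Assumption: no evictions so far, so size / writes approximates the
# average payload of a single L2ARC write.
avg_write = L2_SIZE_BYTES / L2_WRITES_SENT
write_max_used = ratio(avg_write, L2ARC_WRITE_MAX)

print(f"L2ARC hit ratio: {hit_ratio:.2%}")
print(f"device fill: {fill:.1%}")
print(f"average write vs l2arc_write_max: {write_max_used:.1%}")
```

Under those assumptions the device is a bit under 10% full and the
average write is only a few percent of l2arc_write_max, which might
suggest the feed thread finds little eligible data per pass rather than
being held back by the write limit.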

------------------------------------------------------------------------
ZFS Subsystem Report				Tue Feb 19 20:08:19 2013
------------------------------------------------------------------------

System Information:

	Kernel Version:				901000 (osreldate)
	Hardware Platform:			amd64
	Processor Architecture:			amd64

	ZFS Storage pool Version:		28
	ZFS Filesystem Version:			5

FreeBSD 9.1-RC2 #1: Tue Oct 30 20:37:38 UTC 2012 root
 8:08PM  up 20:40, 3 users, load averages: 0.47, 0.50, 0.52

------------------------------------------------------------------------

System Memory:

	8.41%	5.22	GiB Active,	10.18%	6.32	GiB Inact
	77.39%	48.05	GiB Wired,	1.52%	966.99	MiB Cache
	2.50%	1.55	GiB Free,	0.00%	888.00	KiB Gap

	Real Installed:				64.00	GiB
	Real Available:			99.97%	63.98	GiB
	Real Managed:			97.04%	62.08	GiB

	Logical Total:				64.00	GiB
	Logical Used:			86.22%	55.18	GiB
	Logical Free:			13.78%	8.82	GiB

Kernel Memory:					23.18	GiB
	Data:				99.91%	23.16	GiB
	Text:				0.09%	21.27	MiB

Kernel Memory Map:				52.10	GiB
	Size:				35.21%	18.35	GiB
	Free:				64.79%	33.75	GiB

------------------------------------------------------------------------

ARC Summary: (HEALTHY)
	Memory Throttle Count:			0

ARC Misc:
	Deleted:				10.24m
	Recycle Misses:				3.48m
	Mutex Misses:				24.85k
	Evict Skips:				12.79m

ARC Size:				92.50%	28.25	GiB
	Target Size: (Adaptive)		92.50%	28.25	GiB
	Min Size (Hard Limit):		25.00%	7.64	GiB
	Max Size (High Water):		4:1	30.54	GiB

ARC Size Breakdown:
	Recently Used Cache Size:	62.35%	17.62	GiB
	Frequently Used Cache Size:	37.65%	10.64	GiB

ARC Hash Breakdown:
	Elements Max:				1.99m
	Elements Current:		99.16%	1.98m
	Collisions:				8.97m
	Chain Max:				14
	Chains:					586.97k

------------------------------------------------------------------------

ARC Efficiency:					1.15b
	Cache Hit Ratio:		97.66%	1.12b
	Cache Miss Ratio:		2.34%	26.80m
	Actual Hit Ratio:		72.75%	833.30m

	Data Demand Efficiency:		98.39%	33.94m
	Data Prefetch Efficiency:	8.11%	7.60m

	CACHE HITS BY CACHE LIST:
	  Anonymously Used:		23.88%	267.15m
	  Most Recently Used:		4.70%	52.60m
	  Most Frequently Used:		69.79%	780.70m
	  Most Recently Used Ghost:	0.64%	7.13m
	  Most Frequently Used Ghost:	0.98%	10.99m

	CACHE HITS BY DATA TYPE:
	  Demand Data:			2.99%	33.40m
	  Prefetch Data:		0.06%	616.42k
	  Demand Metadata:		71.38%	798.44m
	  Prefetch Metadata:		25.58%	286.13m

	CACHE MISSES BY DATA TYPE:
	  Demand Data:			2.04%	546.67k
	  Prefetch Data:		26.07%	6.99m
	  Demand Metadata:		37.96%	10.18m
	  Prefetch Metadata:		33.93%	9.09m

------------------------------------------------------------------------

L2 ARC Summary: (HEALTHY)
	Passed Headroom:			3.62m
	Tried Lock Failures:			3.17m
	IO In Progress:				21.18k
	Low Memory Aborts:			20
	Free on Write:				7.07k
	Writes While Full:			134
	R/W Clashes:				1.63k
	Bad Checksums:				0
	IO Errors:				0
	SPA Mismatch:				0

L2 ARC Size: (Adaptive)				22.70	GiB
	Header Size:			0.31%	71.02	MiB

L2 ARC Breakdown:				23.78m
	Hit Ratio:			34.25%	8.15m
	Miss Ratio:			65.75%	15.64m
	Feeds:					63.47k

L2 ARC Buffer:
	Bytes Scanned:				65.51	TiB
	Buffer Iterations:			63.47k
	List Iterations:			4.06m
	NULL List Iterations:			64.89k

L2 ARC Writes:
	Writes Sent:			100.00%	29.89k

------------------------------------------------------------------------

File-Level Prefetch: (HEALTHY)

DMU Efficiency:					1.24b
	Hit Ratio:			64.29%	798.62m
	Miss Ratio:			35.71%	443.54m

	Colinear:				443.54m
	  Hit Ratio:			0.00%	20.45k
	  Miss Ratio:			100.00%	443.52m

	Stride:					772.29m
	  Hit Ratio:			99.99%	772.21m
	  Miss Ratio:			0.01%	81.30k

DMU Misc:
	Reclaim:				443.52m
	  Successes:			0.05%	220.47k
	  Failures:			99.95%	443.30m

	Streams:				26.42m
	  +Resets:			0.05%	12.73k
	  -Resets:			99.95%	26.41m
	  Bogus:				0

------------------------------------------------------------------------

VDEV cache is disabled

------------------------------------------------------------------------

ZFS Tunables (sysctl):
	kern.maxusers                           384
	vm.kmem_size                            66662760448
	vm.kmem_size_scale                      1
	vm.kmem_size_min                        0
	vm.kmem_size_max                        329853485875
	vfs.zfs.l2c_only_size                   5242113536
	vfs.zfs.mfu_ghost_data_lsize            178520064
	vfs.zfs.mfu_ghost_metadata_lsize        6486959104
	vfs.zfs.mfu_ghost_size                  6665479168
	vfs.zfs.mfu_data_lsize                  11863127552
	vfs.zfs.mfu_metadata_lsize              123386368
	vfs.zfs.mfu_size                        12432947200
	vfs.zfs.mru_ghost_data_lsize            14095171584
	vfs.zfs.mru_ghost_metadata_lsize        8351076864
	vfs.zfs.mru_ghost_size                  22446248448
	vfs.zfs.mru_data_lsize                  2076449280
	vfs.zfs.mru_metadata_lsize              4655490560
	vfs.zfs.mru_size                        7074721792
	vfs.zfs.anon_data_lsize                 0
	vfs.zfs.anon_metadata_lsize             0
	vfs.zfs.anon_size                       1605632
	vfs.zfs.l2arc_norw                      1
	vfs.zfs.l2arc_feed_again                1
	vfs.zfs.l2arc_noprefetch                1
	vfs.zfs.l2arc_feed_min_ms               200
	vfs.zfs.l2arc_feed_secs                 1
	vfs.zfs.l2arc_headroom                  2
	vfs.zfs.l2arc_write_boost               52428800
	vfs.zfs.l2arc_write_max                 26214400
	vfs.zfs.arc_meta_limit                  16398159872
	vfs.zfs.arc_meta_used                   16398120264
	vfs.zfs.arc_min                         8199079936
	vfs.zfs.arc_max                         32796319744
	vfs.zfs.dedup.prefetch                  1
	vfs.zfs.mdcomp_disable                  0
	vfs.zfs.write_limit_override            0
	vfs.zfs.write_limit_inflated            206088929280
	vfs.zfs.write_limit_max                 8587038720
	vfs.zfs.write_limit_min                 33554432
	vfs.zfs.write_limit_shift               3
	vfs.zfs.no_write_throttle               0
	vfs.zfs.zfetch.array_rd_sz              1048576
	vfs.zfs.zfetch.block_cap                256
	vfs.zfs.zfetch.min_sec_reap             2
	vfs.zfs.zfetch.max_streams              8
	vfs.zfs.prefetch_disable                0
	vfs.zfs.mg_alloc_failures               12
	vfs.zfs.check_hostid                    1
	vfs.zfs.recover                         0
	vfs.zfs.txg.synctime_ms                 1000
	vfs.zfs.txg.timeout                     5
	vfs.zfs.vdev.cache.bshift               16
	vfs.zfs.vdev.cache.size                 0
	vfs.zfs.vdev.cache.max                  16384
	vfs.zfs.vdev.write_gap_limit            4096
	vfs.zfs.vdev.read_gap_limit             32768
	vfs.zfs.vdev.aggregation_limit          131072
	vfs.zfs.vdev.ramp_rate                  2
	vfs.zfs.vdev.time_shift                 6
	vfs.zfs.vdev.min_pending                4
	vfs.zfs.vdev.max_pending                128
	vfs.zfs.vdev.bio_flush_disable          0
	vfs.zfs.cache_flush_disable             0
	vfs.zfs.zil_replay_disable              0
	vfs.zfs.zio.use_uma                     0
	vfs.zfs.snapshot_list_prefetch          0
	vfs.zfs.version.zpl                     5
	vfs.zfs.version.spa                     28
	vfs.zfs.version.acl                     1
	vfs.zfs.debug                           0
	vfs.zfs.super_owner                     0


From owner-freebsd-fs@FreeBSD.ORG  Tue Feb 19 21:11:44 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id DBA3D8BA;
 Tue, 19 Feb 2013 21:11:44 +0000 (UTC)
 (envelope-from tomek.cedro@gmail.com)
Received: from mail-qc0-f170.google.com (mail-qc0-f170.google.com
 [209.85.216.170])
 by mx1.freebsd.org (Postfix) with ESMTP id 704237F4;
 Tue, 19 Feb 2013 21:11:44 +0000 (UTC)
Received: by mail-qc0-f170.google.com with SMTP id d42so2801484qca.29
 for <multiple recipients>; Tue, 19 Feb 2013 13:11:37 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:x-received:sender:date:x-google-sender-auth:message-id
 :subject:from:to:content-type;
 bh=1yBb3dK8gbWXGlNZwKmtrGe7pRbf1HX71uH8gx53GEg=;
 b=ZfGkqVJcWLEJGQXvaqENiPmQPNQYc9zpfKuIyHQ8zmfzLuQsex6hmKYvRg2yD6S/6h
 IP1a9rmqMqDIApHeDk8ZrE8SiLq2IP/gk8lneJneJFQuODjbq7Lxt+amUu1r3Abuu1+L
 xnnBWlMHiz872lLV5L64LyJ7AitNdd2FOT2cWDNerNZUy0w2RCgA1TFXgbJ6oY+6eajm
 HjQ5EYI/xH5UCXjrKfit1PUY/cRs4VfVUmvyjly9xFgC3MhK2zrcxT6HOdpPwrA9mwQs
 Ei89GGSQoVKnFoWTy5fZ0QPtF9Dcr9Ygb1hExTiYFeCKMPcH0Ga9XtZIC0/Phgp1y5H0
 6nUw==
MIME-Version: 1.0
X-Received: by 10.224.209.193 with SMTP id gh1mr8319809qab.86.1361308297623;
 Tue, 19 Feb 2013 13:11:37 -0800 (PST)
Sender: tomek.cedro@gmail.com
Received: by 10.49.71.204 with HTTP; Tue, 19 Feb 2013 13:11:37 -0800 (PST)
Date: Tue, 19 Feb 2013 22:11:37 +0100
X-Google-Sender-Auth: OqEMZuDv2ClLuLd4D5EO-APjo0k
Message-ID: <CAFYkXj=6frxqujVkJ532oXEtJdGnMeS_DNdqprF4xC2YKrza0A@mail.gmail.com>
Subject: bluray recorder
From: CeDeROM <cederom@tlen.pl>
To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Feb 2013 21:11:44 -0000

Hello :-)

I have just bought a Pioneer 15x BluRay recorder. I see the messages
below in dmesg, and I cannot play video with VLC. Should I worry
about that? I guess recording files can be done with growisofs, just
as for DVD? :-)

(cd2:ata0:0:1:0): READ DVD STRUCTURE. CDB: ad 0 0 0 0 0 0 1 0 8 0 0
(cd2:ata0:0:1:0): CAM status: SCSI Status Error
(cd2:ata0:0:1:0): SCSI status: Check Condition
(cd2:ata0:0:1:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(cd2:ata0:0:1:0): Error 22, Unretryable error
(cd2:ata0:0:1:0): READ DVD STRUCTURE. CDB: ad 0 0 0 0 0 0 1 0 8 0 0
(cd2:ata0:0:1:0): CAM status: SCSI Status Error
(cd2:ata0:0:1:0): SCSI status: Check Condition
(cd2:ata0:0:1:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(cd2:ata0:0:1:0): Error 22, Unretryable error
(cd2:ata0:0:1:0): READ DVD STRUCTURE. CDB: ad 0 0 0 0 0 0 1 0 8 0 0
(cd2:ata0:0:1:0): CAM status: SCSI Status Error
(cd2:ata0:0:1:0): SCSI status: Check Condition
(cd2:ata0:0:1:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(cd2:ata0:0:1:0): Error 22, Unretryable error
(cd2:ata0:0:1:0): READ DVD STRUCTURE. CDB: ad 0 0 0 0 0 0 1 0 8 0 0
(cd2:ata0:0:1:0): CAM status: SCSI Status Error
(cd2:ata0:0:1:0): SCSI status: Check Condition
(cd2:ata0:0:1:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(cd2:ata0:0:1:0): Error 22, Unretryable error

Any hints welcome :-)
Tomek

-- 
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 00:42:52 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id ADEAB300
 for <freebsd-fs@FreeBSD.org>; Wed, 20 Feb 2013 00:42:52 +0000 (UTC)
 (envelope-from jmg@h2.funkthat.com)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 by mx1.freebsd.org (Postfix) with ESMTP id 5812536E
 for <freebsd-fs@FreeBSD.org>; Wed, 20 Feb 2013 00:42:52 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id r1K0gkk5048810
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
 for <freebsd-fs@FreeBSD.org>; Tue, 19 Feb 2013 16:42:46 -0800 (PST)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id r1K0gklU048809
 for freebsd-fs@FreeBSD.org; Tue, 19 Feb 2013 16:42:46 -0800 (PST)
 (envelope-from jmg)
Date: Tue, 19 Feb 2013 16:42:46 -0800
From: John-Mark Gurney <jmg@funkthat.com>
To: freebsd-fs@FreeBSD.org
Subject: on zfs, read errors are considered write errors?
Message-ID: <20130220004246.GL55866@funkthat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Tue, 19 Feb 2013 16:42:46 -0800 (PST)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 00:42:52 -0000

So, I've been trying to track down how ZFS handles errors and stuff
to make sure it's sane before I try to fix geli, but I've been getting
some weird results...  Apparently, zfs thinks that read errors are
WRITE errors, or even CKSUM errors (this is understandable, as invalid
data would cause a cksum error)...  I don't know where in the zfs code
that error accounting is happening, but here is my test:
touch /root/disk{1,2}
mdconfig -a -t vnode -f /root/disk1 -s 96m
mdconfig -a -t vnode -f /root/disk2 -s 96m
gnop create md0 
gnop create md1 
zpool create ztest mirror md0.nop md1.nop
cd /ztest
for i in `jot 1000 1`; do echo $i > $i; done
cd /
zpool export ztest
gnop configure -r 0 md0.nop
zpool import ztest
zpool status
gnop configure -r 30 md0.nop
cat /ztest/*
zpool status
zpool scrub ztest
zpool status

And I get results like:
[root@carbon /]# zpool status
  pool: ztest
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 239K in 0h0m with 0 errors on Tue Feb 19 16:36:37 2013
config:

        NAME         STATE     READ WRITE CKSUM
        ztest        ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            md0.nop  ONLINE       5   277   422
            md1.nop  ONLINE       0     0     0

errors: No known data errors

I have patches that change gnop to log the errors at debug level 1 instead
of 2 (which also logs all requests), and the logs verify that only errors
to READ requests are returned...

Any clues why read errors would cause write errors?

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 02:00:50 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 1996E646
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 02:00:50 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id D067091C
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 02:00:49 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqAEACEtJFGDaFvO/2dsb2JhbABFhkm6AYEkc4IfAQEEASMEUgUWGBEZAgRVBogfBgytaoJAkCGNPRqBAxkbB4ItgRMDiGaGMYcUgR2PO4MlgU0HFwYY
X-IronPort-AV: E=Sophos;i="4.84,698,1355115600"; d="c'?scan'208";a="14840163"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-annu.net.uoguelph.ca with ESMTP; 19 Feb 2013 21:00:42 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B40F3B3F4E;
 Tue, 19 Feb 2013 21:00:42 -0500 (EST)
Date: Tue, 19 Feb 2013 21:00:42 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Momchil Ivanov <momchil@xaxo.eu>
Message-ID: <992481316.3137385.1361325642681.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <86a88ac8bb038ec5d8034724dcf80924.squirrel@webmail.xaxo.eu>
Subject: Re: NFS + Kerberos
MIME-Version: 1.0
Content-Type: multipart/mixed; 
 boundary="----=_Part_3137384_634441493.1361325642679"
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 02:00:50 -0000

------=_Part_3137384_634441493.1361325642679
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Momchil Ivanov wrote:
> On Tue, February 19, 2013 12:56 am, Rick Macklem wrote:
> > Thanks to Elias's hard work, a bug/fix has just been isolated in the
> > Kerberos library that causes the gssd to fail to translate a
> > principal
> > to a uid. The fix is to increase the size of the buffer passed to
> > getpwnam_r(). See this thread:
> > http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw
> >
> > I haven't run into this bug, so I don't know what systems are
> > affected,
> > but it would explain why you can't get it working.
> >
> > I'd suggest you apply the patch in the email (increase buf to 1024)
> > and
> > then try again with libraries built with the patch.
> 
> Do I have to apply the patch to the server only and then rebuild world
> or
> do I have to do the same on the client too? And do I need to rebuild
> heimdal on both machines?
> 
The bug should only affect the server, since the client never translates
between principal_name<->uid. (The client does a rather cheesy trick of
using the uid to select the correct credential cache file.)

> btw, I checked the logs of the kdc and could not see any trace of the
> nfs
> server trying to validate the client's ticket... Frankly, I don't know
> what I should expect there, I haven't used kerberos before, so I have
> no
> idea if it's related to the bug. Here is part of the log:
> 
> AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for
> krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> No preauth found, returning PREAUTH-REQUIRED -- user@EXAMPLE.LOCAL
> sending 407 bytes to IPv4:X.X.X.X
> AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for
> krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> Client sent patypes: encrypted-timestamp
> Looking for PKINIT pa-data -- user@EXAMPLE.LOCAL
> Looking for ENC-TS pa-data -- user@EXAMPLE.LOCAL
> ENC-TS Pre-authentication succeeded -- user@EXAMPLE.LOCAL using
> des-cbc-crc
> Client supported enctypes: des-cbc-crc
> Using des-cbc-crc/aes256-cts-hmac-sha1-96
> AS-REQ authtime: 2013-02-11T23:45:44 starttime: unset endtime:
> 2013-02-12T09:45:39 renew till: unset
> sending 552 bytes to IPv4:X.X.X.X
> 
Hmm, that sounds like you are never getting as far as sending the
ticket to the server, but I'm not at home, so I can't look and see
exactly what gets logged. (Also, I use a MIT KDC, so what gets logged
might be different.)

I've attached a trivial program that you can compile/run as root
on the NFS server to see if 128 bytes is a big enough buffer for your setup.
If it can print out the uid for the usernames you test as arguments,
the patch isn't needed for your environment.
(Oh, and it has a typo bug in the errx() arguments, but it works ok
 for testing.)
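
The attachment itself is not preserved in this archive. As a rough
stand-in (not the original program), a minimal sketch of such a check
might look like the following. The helper name lookup_uid() and its
return convention are assumptions made for illustration; only
getpwnam_r() is the real libc interface. Passing 128 as buflen models
the unpatched buffer, and 1024 models the patched one.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: look up `name` using a scratch buffer of
 * `buflen` bytes. Returns 0 and stores the uid on success, ENOENT if
 * no such user was found, or the error reported by getpwnam_r()
 * (ERANGE means the scratch buffer was too small for this entry).
 */
int
lookup_uid(const char *name, size_t buflen, uid_t *uidp)
{
	struct passwd pwd, *res = NULL;
	char *buf;
	int error;

	assert(name != NULL && uidp != NULL);

	buf = malloc(buflen);
	if (buf == NULL)
		return (ENOMEM);

	error = getpwnam_r(name, &pwd, buf, buflen, &res);
	if (error == 0 && res == NULL)
		error = ENOENT;		/* lookup worked, but no match */
	if (error == 0)
		*uidp = pwd.pw_uid;

	free(buf);
	return (error);
}
```

Calling lookup_uid(name, 128, &uid) for each username of interest and
watching for ERANGE would show whether the smaller buffer is enough in
a given environment.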

Good luck with it, rick

> Thank you,
> Momchil

------=_Part_3137384_634441493.1361325642679--

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 03:59:18 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 2B80E6B7
 for <fs@FreeBSD.org>; Wed, 20 Feb 2013 03:59:18 +0000 (UTC)
 (envelope-from jamie@FreeBSD.org)
Received: from m2.gritton.org (gritton.org [199.192.164.235])
 by mx1.freebsd.org (Postfix) with ESMTP id E6FDBDEA
 for <fs@FreeBSD.org>; Wed, 20 Feb 2013 03:59:17 +0000 (UTC)
Received: from glorfindel.gritton.org (c-174-52-130-157.hsd1.ut.comcast.net
 [174.52.130.157]) (authenticated bits=0)
 by m2.gritton.org (8.14.5/8.14.5) with ESMTP id r1K3xG4Z019369
 for <fs@freebsd.org>; Tue, 19 Feb 2013 20:59:16 -0700 (MST)
 (envelope-from jamie@FreeBSD.org)
Message-ID: <51244A13.8030907@FreeBSD.org>
Date: Tue, 19 Feb 2013 20:59:15 -0700
From: Jamie Gritton <jamie@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
 rv:1.9.2.24) Gecko/20120129 Thunderbird/3.1.16
MIME-Version: 1.0
To: fs@FreeBSD.org
Subject: mount/kldload race
Content-Type: multipart/mixed; boundary="------------080501010405030304090106"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 03:59:18 -0000

This is a multi-part message in MIME format.
--------------080501010405030304090106
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Perhaps most people don't try to mount a bunch of filesystems at the 
same time, at least not those that depend on kernel modules. But it 
turns out that's going to be a pretty common situation with jails and 
nullfs. And I found that attempting such a feat will cause most of 
these simultaneous mounts to fail with ENODEV.

It turns out that the problem is a race in vfs_byname_kld(). First it'll 
see if the fstype is loaded, and if it isn't then it will load the 
module. But if the module is loaded by a different process between those 
two points, the resulting EEXIST from kern_kldload() will make 
vfs_byname_kld() error out.

The fix is pretty simple: don't treat EEXIST as an error. By going on, 
and rechecking for the fstype, the filesystem can be mounted while still 
allowing any "real" error to be caught. I'm including a small patch that 
will accomplish this, and I'd appreciate a quick look by anyone who's 
familiar with this part of things before I commit it.

- Jamie

--------------080501010405030304090106
Content-Type: text/plain;
 name="vfs_init.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="vfs_init.diff"

Index: sys/kern/vfs_init.c
===================================================================
--- sys/kern/vfs_init.c	(revision 247000)
+++ sys/kern/vfs_init.c	(working copy)
@@ -130,13 +130,18 @@
 
 	/* Try to load the respective module. */
 	*error = kern_kldload(td, fstype, &fileid);
+	if (*error == EEXIST) {
+		*error = 0;
+		fileid = 0;
+	}
 	if (*error)
 		return (NULL);
 
 	/* Look up again to see if the VFS was loaded. */
 	vfsp = vfs_byname(fstype);
 	if (vfsp == NULL) {
-		(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
+		if (fileid != 0)
+			(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
 		*error = ENODEV;
 		return (NULL);
 	}

--------------080501010405030304090106--

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 05:43:14 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id EF7A9BF1;
 Wed, 20 Feb 2013 05:43:14 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 by mx1.freebsd.org (Postfix) with ESMTP id 4E5D9211;
 Wed, 20 Feb 2013 05:43:14 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1K5h9wg001927;
 Wed, 20 Feb 2013 07:43:09 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1K5h9wg001927
Received: (from kostik@localhost)
 by tom.home (8.14.6/8.14.6/Submit) id r1K5h90j001926;
 Wed, 20 Feb 2013 07:43:09 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 20 Feb 2013 07:43:09 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Jamie Gritton <jamie@FreeBSD.org>
Subject: Re: mount/kldload race
Message-ID: <20130220054309.GD2598@kib.kiev.ua>
References: <51244A13.8030907@FreeBSD.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="7HhoQoqNsng1reXT"
Content-Disposition: inline
In-Reply-To: <51244A13.8030907@FreeBSD.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 05:43:15 -0000


--7HhoQoqNsng1reXT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

On Tue, Feb 19, 2013 at 08:59:15PM -0700, Jamie Gritton wrote:
> Perhaps most people don't try to mount a bunch of filesystems at the
> same time, at least not those that depend on kernel modules. But it
> turns out that's going to be a pretty common situation with jails and
> nullfs. And I found that attempting such a feat will cause most of
> these simultaneous mounts to fail with ENODEV.
>
> It turns out that the problem is a race in vfs_byname_kld(). First it'll
> see if the fstype is loaded, and if it isn't then it will load the
> module. But if the module is loaded by a different process between those
> two points, the resulting EEXIST from kern_kldload() will make
> vfs_byname_kld() error out.
>
> The fix is pretty simple: don't treat EEXIST as an error. By going on,
> and rechecking for the fstype, the filesystem can be mounted while still
> allowing any "real" error to be caught. I'm including a small patch that
> will accomplish this, and I'd appreciate a quick look by anyone who's
> familiar with this part of things before I commit it.
>
> - Jamie

> Index: sys/kern/vfs_init.c
> ===================================================================
> --- sys/kern/vfs_init.c	(revision 247000)
> +++ sys/kern/vfs_init.c	(working copy)
> @@ -130,13 +130,18 @@
>  
>  	/* Try to load the respective module. */
>  	*error = kern_kldload(td, fstype, &fileid);
> +	if (*error == EEXIST) {
> +		*error = 0;
> +		fileid = 0;
Why do you clear fileid? Is this to prevent an attempt to kldunload()
a module that was not loaded by the current thread?

If so, I would suggest using a separate flag to track this, cleared on
the EEXIST error. IMHO it is cleaner and less puzzling.
> +	}
>  	if (*error)
>  		return (NULL);
>  
>  	/* Look up again to see if the VFS was loaded. */
>  	vfsp = vfs_byname(fstype);
>  	if (vfsp == NULL) {
> -		(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
> +		if (fileid != 0)
> +			(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
>  		*error = ENODEV;
>  		return (NULL);
>  	}
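
To make the flag suggestion concrete, here is a rough userland sketch
of the control flow. It is not the committed code and not the kernel
API: struct kld_model, model_kldload(), model_kldunload() and
byname_kld_model() are hypothetical stand-ins for kern_kldload(),
kern_kldunload() and vfs_byname_kld(), so that the logic can be
compiled and exercised outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct vfsconf. */
struct vfsconf {
	const char *vfc_name;
};

/* Hypothetical model of what the kernel calls would do in one scenario. */
struct kld_model {
	int load_result;		/* value the modelled kldload returns */
	struct vfsconf *visible;	/* what the second lookup would find */
	int unload_calls;		/* how often the modelled unload ran */
};

static int
model_kldload(struct kld_model *m, int *fileid)
{
	if (m->load_result == 0)
		*fileid = 1;		/* pretend a file id was assigned */
	return (m->load_result);
}

static void
model_kldunload(struct kld_model *m, int fileid)
{
	(void)fileid;
	m->unload_calls++;
}

struct vfsconf *
byname_kld_model(struct kld_model *m, int *error)
{
	struct vfsconf *vfsp;
	int fileid = 0;
	bool loaded = true;	/* did this thread load the module itself? */

	assert(m != NULL && error != NULL);

	*error = model_kldload(m, &fileid);
	if (*error == EEXIST) {
		/* Someone else won the race: not an error, not ours to unload. */
		*error = 0;
		loaded = false;
	}
	if (*error != 0)
		return (NULL);

	/* Look up again to see if the VFS is now registered. */
	vfsp = m->visible;
	if (vfsp == NULL) {
		if (loaded)
			model_kldunload(m, fileid);
		*error = ENODEV;
		return (NULL);
	}
	return (vfsp);
}
```

The behaviour is meant to match the patch as posted; the only
difference is that the "did we load it" state lives in an explicit
boolean instead of being encoded as fileid == 0.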

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


--7HhoQoqNsng1reXT
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iQIcBAEBAgAGBQJRJGJtAAoJEJDCuSvBvK1BcN0P/23vCOiUpRsEV5ODpT1qdD9E
DsdB8/WAIc7xL1OgSUL5M4kORDpH6i7gqnDLlYJRtPvc6fikvrEGR9MpC4fFMHJ3
58LaKdhWTvl2YRKyLxcpO4rC9yx86DVLbA57ya7EH+P+9Ij/Ehh0NM8yaVO3xCQE
0/aP0MwPHxfAbkm8ybB8MEVegLWzBEMGds8Yvybxm0fIRuCNZKprhVd+XpcM73Dp
LoTeRm0ho+ggzlJv4NfwPsCoYgMHLML0wLibqXbUsQgJVKvNZ06sYZVFvT6ywc+n
b1L2y/Qxibmxww/lUy61AgxXg+/h7vIme21IsWn905K3ka0IWhuOud3Mn2X51N4b
cSqXC0FWhDcXT77iPOp/aVlOKrPqtarqX3WqFdM6AiGkZ/pciis4YtXvn1q2midV
UbBB2rdORUuSZXs1yJNLuOn5UKLFCrZKSnxScZxSaEhE3U7OwCXRjfxYFbTvlovr
Xl5dQqpLirrZlhjg2gYcftl964BhmIIEj0a3QrWFfZZ4cfE9JOmQ0n5JpWfNopir
FNmLS6nsNwG8sJN0cRKloC1cB5yELHn7ZeFvGKD2ttQRCpFH1ou7Oc3cnV5Xv07m
UXBO5K3ztmyxnS0TSuNyuIN01YFN6cxH3+dU0ssMl05+UpYsf6SL7Kpz/DGK3TTT
Em0OAxk446w7bjtf56UX
=u3qn
-----END PGP SIGNATURE-----

--7HhoQoqNsng1reXT--

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 06:21:00 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id E0AB2135
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 06:21:00 +0000 (UTC)
 (envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
 by mx1.freebsd.org (Postfix) with ESMTP id 4328032B
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 06:20:59 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
 (authenticated bits=0)
 by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id r1K6KkqH077549
 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 08:20:46 +0200 (EET)
 (envelope-from daniel@digsys.bg)
Message-ID: <51246B3E.1030604@digsys.bg>
Date: Wed, 20 Feb 2013 08:20:46 +0200
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:10.0.12) Gecko/20130125 Thunderbird/10.0.12
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Subject: Re: Improving ZFS performance for large directories
References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com>
 <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>
 <F4420A8C-FB92-4771-B261-6C47A736CF7F@dragondata.com>
 <20130201192416.GA76461@server.rulingia.com>
 <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
In-Reply-To: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 06:21:00 -0000


On 19.02.13 22:10, Kevin Day wrote:
> Thinking that making the primary cache metadata-only and the secondary cache "all" would improve things, I wiped the device (SATA secure erase to make sure) and tried again. This was much worse, I'm guessing because there was some amount of real file data being looked at frequently; the SSD was basically getting hammered for read access with 100% utilization, and things were far slower.

This sounds weird. What kind of device is your L2ARC, what is its 
performance, and how is it connected? Today's typical SSDs have read 
performance of over 500 MB/s if you connect them at SATA3. You could 
double that with two drives etc. For L2ARC you don't really need a 
write-optimized SSD, because ZFS rate-limits the writes to L2ARC. It is 
best to connect these on the motherboard's SATA ports.

Is the SSD used only for L2ARC? If it is being written to too much, that 
might make it slow under intensive usage, especially if it is not a 
write-optimised model (typically sold as "pro" or "enterprise"). Also, 
you may wish to experiment with the sector size (alignment) when you add 
it to the pool. The ashift parameter is per-vdev in ZFS, and cache and 
log devices are separate vdevs. Therefore, using gnop to make it appear 
as a 4K or 8K sector drive might improve things. You have to experiment 
here...


>
> ARC Size:				92.50%	28.25	GiB
> 	Target Size: (Adaptive)		92.50%	28.25	GiB
> 	Min Size (Hard Limit):		25.00%	7.64	GiB
> 	Max Size (High Water):		4:1	30.54	GiB

But this looks strange. Have you increased vfs.zfs.arc_max and 
vfs.zfs.arc_meta_limit?
For a 72GB system, I have this in /boot/loader.conf:

vfs.zfs.arc_max=64424509440
vfs.zfs.arc_meta_limit=51539607552
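
For reference, the two values above work out to 60 GiB and 48 GiB. A
small helper (the name gib_to_bytes() is only illustrative, not part
of any ZFS tooling) shows the arithmetic, which may be handy when
picking values for a different amount of RAM:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a size in GiB to bytes (1 GiB = 2^30 bytes). */
uint64_t
gib_to_bytes(uint64_t gib)
{
	/* Guard against overflowing 64 bits when shifting. */
	assert(gib <= (UINT64_MAX >> 30));
	return (gib << 30);
}
```

Here gib_to_bytes(60) gives the arc_max value and gib_to_bytes(48) the
arc_meta_limit value shown above.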

I found out that increasing vfs.zfs.arc_meta_limit helped most (my 
issues were with huge deduped datasets with dedup ratio of around 10 and 
many snapshots). Even if you intend to keep ARC small (bad idea, as it 
is being used to track L2ARC as well), you need to increase 
vfs.zfs.arc_meta_limit, perhaps up to vfs.zfs.arc_max. If you do that, 
then perhaps primarycache=metadata might even work better.

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 07:20:02 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 56096B7A
 for <freebsd-fs@smarthost.ysv.freebsd.org>;
 Wed, 20 Feb 2013 07:20:02 +0000 (UTC)
 (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 by mx1.freebsd.org (Postfix) with ESMTP id 35F6B73B
 for <freebsd-fs@smarthost.ysv.freebsd.org>;
 Wed, 20 Feb 2013 07:20:02 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1K7K1vq003697
 for <freebsd-fs@freefall.freebsd.org>; Wed, 20 Feb 2013 07:20:01 GMT
 (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
 by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1K7K12t003696;
 Wed, 20 Feb 2013 07:20:01 GMT (envelope-from gnats)
Date: Wed, 20 Feb 2013 07:20:01 GMT
Message-Id: <201302200720.r1K7K12t003696@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
Cc: 
From: "Ganael LAPLANCHE" <ganael.laplanche@martymac.org>
Subject: Re: kern/112658: [smbfs] [patch] smbfs and caching problems (resolves
 bin/111004)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: Ganael LAPLANCHE <ganael.laplanche@martymac.org>
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 07:20:02 -0000

The following reply was made to PR kern/112658; it has been noted by GNATS.

From: "Ganael LAPLANCHE" <ganael.laplanche@martymac.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/112658: [smbfs] [patch] smbfs and caching problems (resolves bin/111004)
Date: Wed, 20 Feb 2013 07:16:27 +0000 (UTC)

 This is a multi-part message in MIME format.
 
 ------=OPENWEBMAIL_ATT_0.350054851341159
 Content-Type: text/plain; charset=iso-8859-15
 
 Here is an updated version of the patch.
 It applies to svn revision 246938.
 
 --
 Ganael LAPLANCHE <ganael.laplanche@martymac.org>
 http://www.martymac.org | http://contribs.martymac.org
 FreeBSD: martymac <martymac@FreeBSD.org>, http://www.FreeBSD.org
 
 ------=OPENWEBMAIL_ATT_0.350054851341159
 Content-Type: text/plain;
 	name="patch-smbfs-svnrev-246938.txt"
 Content-Disposition: attachment; filename="patch-smbfs-svnrev-246938.txt"
 Content-Transfer-Encoding: base64
 
 ZGlmZiAtYXVyTiBzeXMvZnMvc21iZnMub3JpZy9zbWJmc19ub2RlLmMgc3lzL2ZzL3NtYmZzL3Nt
 YmZzX25vZGUuYwotLS0gc3lzL2ZzL3NtYmZzLm9yaWcvc21iZnNfbm9kZS5jCTIwMTItMTItMTQg
 MDk6NTk6MTEuNjgyNzQxMDAwICswMTAwCisrKyBzeXMvZnMvc21iZnMvc21iZnNfbm9kZS5jCTIw
 MTMtMDItMjAgMDY6MDA6MzcuNjUyNzk2MzA1ICswMTAwCkBAIC02NCw3ICs2NCw3IEBACiAJcmV0
 dXJuIChmbnZfMzJfYnVmKG5hbWUsIG5tbGVuLCBGTlYxXzMyX0lOSVQpKTsgCiB9CiAKLXN0YXRp
 YyBjaGFyICoKK2NoYXIgKgogc21iZnNfbmFtZV9hbGxvYyhjb25zdCB1X2NoYXIgKm5hbWUsIGlu
 dCBubWxlbikKIHsKIAl1X2NoYXIgKmNwOwpAQCAtNzYsNyArNzYsNyBAQAogCXJldHVybiBjcDsK
 IH0KIAotc3RhdGljIHZvaWQKK3ZvaWQKIHNtYmZzX25hbWVfZnJlZSh1X2NoYXIgKm5hbWUpCiB7
 CiAKZGlmZiAtYXVyTiBzeXMvZnMvc21iZnMub3JpZy9zbWJmc19ub2RlLmggc3lzL2ZzL3NtYmZz
 L3NtYmZzX25vZGUuaAotLS0gc3lzL2ZzL3NtYmZzLm9yaWcvc21iZnNfbm9kZS5oCTIwMTItMTIt
 MTQgMDk6NTk6MTEuNjc2NzQwMDAwICswMTAwCisrKyBzeXMvZnMvc21iZnMvc21iZnNfbm9kZS5o
 CTIwMTMtMDItMjAgMDY6MDA6MzcuNjU4ODE5MjA3ICswMTAwCkBAIC05MSw2ICs5MSw5IEBACiAJ
 c3RydWN0IHNtYmZhdHRyICpmYXAsIHN0cnVjdCB2bm9kZSAqKnZwcCk7CiB1X2ludDMyX3Qgc21i
 ZnNfaGFzaChjb25zdCB1X2NoYXIgKm5hbWUsIGludCBubWxlbik7CiAKK2NoYXIgICpzbWJmc19u
 YW1lX2FsbG9jKGNvbnN0IHVfY2hhciAqbmFtZSwgaW50IG5tbGVuKTsKK3ZvaWQgICBzbWJmc19u
 YW1lX2ZyZWUodV9jaGFyICpuYW1lKTsKKwogaW50ICBzbWJmc19nZXRwYWdlcyhzdHJ1Y3Qgdm9w
 X2dldHBhZ2VzX2FyZ3MgKik7CiBpbnQgIHNtYmZzX3B1dHBhZ2VzKHN0cnVjdCB2b3BfcHV0cGFn
 ZXNfYXJncyAqKTsKIGludCAgc21iZnNfcmVhZHZub2RlKHN0cnVjdCB2bm9kZSAqdnAsIHN0cnVj
 dCB1aW8gKnVpb3AsIHN0cnVjdCB1Y3JlZCAqY3JlZCk7CmRpZmYgLWF1ck4gc3lzL2ZzL3NtYmZz
 Lm9yaWcvc21iZnNfc21iLmMgc3lzL2ZzL3NtYmZzL3NtYmZzX3NtYi5jCi0tLSBzeXMvZnMvc21i
 ZnMub3JpZy9zbWJmc19zbWIuYwkyMDEyLTEyLTE0IDA5OjU5OjExLjY4MDc0MDAwMCArMDEwMAor
 Kysgc3lzL2ZzL3NtYmZzL3NtYmZzX3NtYi5jCTIwMTMtMDItMjAgMDY6MDA6MzcuNjY3ODE0ODMz
 ICswMTAwCkBAIC0xNDQzLDM3ICsxNDQzLDUwIEBACiB9CiAKIGludAotc21iZnNfc21iX2xvb2t1
 cChzdHJ1Y3Qgc21ibm9kZSAqZG5wLCBjb25zdCBjaGFyICpuYW1lLCBpbnQgbm1sZW4sCitzbWJm
 c19zbWJfbG9va3VwKHN0cnVjdCBzbWJub2RlICpkbnAsIGNoYXIgKipuYW1lcCwgaW50ICpubWxl
 bnAsCiAJc3RydWN0IHNtYmZhdHRyICpmYXAsIHN0cnVjdCBzbWJfY3JlZCAqc2NyZWQpCiB7CiAJ
 c3RydWN0IHNtYmZzX2ZjdHggKmN0eDsKIAlpbnQgZXJyb3I7CiAKLQlpZiAoZG5wID09IE5VTEwg
 fHwgKGRucC0+bl9pbm8gPT0gMiAmJiBuYW1lID09IE5VTEwpKSB7CisJaWYgKGRucCA9PSBOVUxM
 IHx8CisJCShkbnAtPm5faW5vID09IDIgJiYgKG5hbWVwID09IE5VTEwgfHwgKm5hbWVwID09IE5V
 TEwpKSkgewogCQliemVybyhmYXAsIHNpemVvZigqZmFwKSk7CiAJCWZhcC0+ZmFfYXR0ciA9IFNN
 Ql9GQV9ESVI7CiAJCWZhcC0+ZmFfaW5vID0gMjsKIAkJcmV0dXJuIDA7CiAJfQotCWlmIChubWxl
 biA9PSAxICYmIG5hbWVbMF0gPT0gJy4nKSB7Ci0JCWVycm9yID0gc21iZnNfc21iX2xvb2t1cChk
 bnAsIE5VTEwsIDAsIGZhcCwgc2NyZWQpOworCWlmIChubWxlbnAgJiYgKm5tbGVucCA9PSAxICYm
 IG5hbWVwICYmICgqbmFtZXApWzBdID09ICcuJykgeworCQllcnJvciA9IHNtYmZzX3NtYl9sb29r
 dXAoZG5wLCBOVUxMLCBOVUxMLCBmYXAsIHNjcmVkKTsKIAkJcmV0dXJuIGVycm9yOwotCX0gZWxz
 ZSBpZiAobm1sZW4gPT0gMiAmJiBuYW1lWzBdID09ICcuJyAmJiBuYW1lWzFdID09ICcuJykgewot
 CQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAoVlRPU01CKGRucC0+bl9wYXJlbnQpLCBOVUxMLCAw
 LCBmYXAsCi0JCSAgICBzY3JlZCk7CisJfSBlbHNlIGlmIChubWxlbnAgJiYgKm5tbGVucCA9PSAy
 ICYmIG5hbWVwICYmICgqbmFtZXApWzBdID09ICcuJyAmJgorCQkoKm5hbWVwKVsxXSA9PSAnLicp
 IHsKKwkJZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKFZUT1NNQihkbnAtPm5fcGFyZW50KSwgTlVM
 TCwgTlVMTCwKKwkJCWZhcCwgc2NyZWQpOwogCQlwcmludGYoIiVzOiBrbm93cyBOT1RISU5HIGFi
 b3V0ICcuLidcbiIsIF9fZnVuY19fKTsKIAkJcmV0dXJuIGVycm9yOwogCX0KLQllcnJvciA9IHNt
 YmZzX2ZpbmRvcGVuKGRucCwgbmFtZSwgbm1sZW4sCi0JICAgIFNNQl9GQV9TWVNURU0gfCBTTUJf
 RkFfSElEREVOIHwgU01CX0ZBX0RJUiwgc2NyZWQsICZjdHgpOworCWVycm9yID0gc21iZnNfZmlu
 ZG9wZW4oZG5wLCBuYW1lcCA/ICpuYW1lcCA6IE5VTEwsIG5tbGVucCA/ICpubWxlbnAgOiAwLAor
 CQlTTUJfRkFfU1lTVEVNIHwgU01CX0ZBX0hJRERFTiB8IFNNQl9GQV9ESVIsIHNjcmVkLCAmY3R4
 KTsKIAlpZiAoZXJyb3IpCiAJCXJldHVybiBlcnJvcjsKIAljdHgtPmZfZmxhZ3MgfD0gU01CRlNf
 UkREX0ZJTkRTSU5HTEU7CiAJZXJyb3IgPSBzbWJmc19maW5kbmV4dChjdHgsIDEsIHNjcmVkKTsK
 IAlpZiAoZXJyb3IgPT0gMCkgewogCQkqZmFwID0gY3R4LT5mX2F0dHI7Ci0JCWlmIChuYW1lID09
 IE5VTEwpCisJCWlmIChuYW1lcCA9PSBOVUxMIHx8ICpuYW1lcCA9PSBOVUxMKQogCQkJZmFwLT5m
 YV9pbm8gPSBkbnAtPm5faW5vOworCQlpZiAobmFtZXAgJiYgKm5hbWVwICYmIG5tbGVucCAmJiAq
 bm1sZW5wKSB7CisJCQkvKiBSZXR1cm4gdGhlICpyZWFsKiBuYW1lIGFuZCBsZW5ndGggb2YgdGhl
 IGZpbGUgCisJCQkgKiBmb3VuZCBvbiB0aGUgc2VydmVyIGlmIG5lY2Vzc2FyeS4gSWYgYSBuZXcg
 YWxsb2NhdGlvbgorCQkJICogaXMgZG9uZSBoZXJlLCBtZW1vcnkgd2lsbCBiZSBmcmVlZCBsYXRl
 ciAqLworCQkJaWYoKGN0eC0+Zl9ubWxlbiAhPSAqbm1sZW5wKSB8fAorCQkJCShiY21wKGN0eC0+
 Zl9uYW1lLCAqbmFtZXAsICpubWxlbnApICE9IDApKSB7CisJCQkJU01CVkRFQlVHKCJsb29rdXBl
 ZCBmaWxlbmFtZSBhbmQgc2VydmVyJ3MgZmlsZW5hbWUgZGlmZmVyXG4iKTsKKwkJCQkqbmFtZXAg
 PSBzbWJmc19uYW1lX2FsbG9jKCh1X2NoYXIgKikoY3R4LT5mX25hbWUpLCBjdHgtPmZfbm1sZW4p
 OworCQkJCSpubWxlbnAgPSBjdHgtPmZfbm1sZW47CisJCQl9CisJCX0KIAl9CiAJc21iZnNfZmlu
 ZGNsb3NlKGN0eCwgc2NyZWQpOwogCXJldHVybiBlcnJvcjsKZGlmZiAtYXVyTiBzeXMvZnMvc21i
 ZnMub3JpZy9zbWJmc19zdWJyLmggc3lzL2ZzL3NtYmZzL3NtYmZzX3N1YnIuaAotLS0gc3lzL2Zz
 L3NtYmZzLm9yaWcvc21iZnNfc3Vici5oCTIwMTItMTItMTQgMDk6NTk6MTEuNjc4NzQyMDAwICsw
 MTAwCisrKyBzeXMvZnMvc21iZnMvc21iZnNfc3Vici5oCTIwMTMtMDItMjAgMDY6MDA6MzcuNjcz
 Nzk5MDQ0ICswMTAwCkBAIC0xNjYsNyArMTY2LDcgQEAKIGludCAgc21iZnNfZmluZGNsb3NlKHN0
 cnVjdCBzbWJmc19mY3R4ICpjdHgsIHN0cnVjdCBzbWJfY3JlZCAqc2NyZWQpOwogaW50ICBzbWJm
 c19mdWxscGF0aChzdHJ1Y3QgbWJjaGFpbiAqbWJwLCBzdHJ1Y3Qgc21iX3ZjICp2Y3AsCiAJc3Ry
 dWN0IHNtYm5vZGUgKmRucCwgY29uc3QgY2hhciAqbmFtZSwgaW50IG5tbGVuKTsKLWludCAgc21i
 ZnNfc21iX2xvb2t1cChzdHJ1Y3Qgc21ibm9kZSAqZG5wLCBjb25zdCBjaGFyICpuYW1lLCBpbnQg
 bm1sZW4sCitpbnQgIHNtYmZzX3NtYl9sb29rdXAoc3RydWN0IHNtYm5vZGUgKmRucCwgY2hhciAq
 Km5hbWVwLCBpbnQgKm5tbGVucCwKIAlzdHJ1Y3Qgc21iZmF0dHIgKmZhcCwgc3RydWN0IHNtYl9j
 cmVkICpzY3JlZCk7CiAKIGludCAgc21iZnNfZm5hbWVfdG9sb2NhbChzdHJ1Y3Qgc21iX3ZjICp2
 Y3AsIGNoYXIgKm5hbWUsIGludCAqbm1sZW4sIGludCBjYXNlb3B0KTsKZGlmZiAtYXVyTiBzeXMv
 ZnMvc21iZnMub3JpZy9zbWJmc192ZnNvcHMuYyBzeXMvZnMvc21iZnMvc21iZnNfdmZzb3BzLmMK
 LS0tIHN5cy9mcy9zbWJmcy5vcmlnL3NtYmZzX3Zmc29wcy5jCTIwMTItMTItMTQgMDk6NTk6MTEu
 Njc5NzQxMDAwICswMTAwCisrKyBzeXMvZnMvc21iZnMvc21iZnNfdmZzb3BzLmMJMjAxMy0wMi0y
 MCAwNjowMDozNy42Nzk4MjE5NDYgKzAxMDAKQEAgLTMxNiw3ICszMTYsNyBAQAogCX0KIAlzY3Jl
 ZCA9IHNtYmZzX21hbGxvY19zY3JlZCgpOwogCXNtYl9tYWtlc2NyZWQoc2NyZWQsIHRkLCBjcmVk
 KTsKLQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAoTlVMTCwgTlVMTCwgMCwgJmZhdHRyLCBzY3Jl
 ZCk7CisJZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKE5VTEwsIE5VTEwsIE5VTEwsICZmYXR0ciwg
 c2NyZWQpOwogCWlmIChlcnJvcikKIAkJZ290byBvdXQ7CiAJZXJyb3IgPSBzbWJmc19uZ2V0KG1w
 LCBOVUxMLCBOVUxMLCAwLCAmZmF0dHIsICZ2cCk7CmRpZmYgLWF1ck4gc3lzL2ZzL3NtYmZzLm9y
 aWcvc21iZnNfdm5vcHMuYyBzeXMvZnMvc21iZnMvc21iZnNfdm5vcHMuYwotLS0gc3lzL2ZzL3Nt
 YmZzLm9yaWcvc21iZnNfdm5vcHMuYwkyMDEyLTEyLTE0IDA5OjU5OjExLjY4Mzc0MDAwMCArMDEw
 MAorKysgc3lzL2ZzL3NtYmZzL3NtYmZzX3Zub3BzLmMJMjAxMy0wMi0yMCAwNjowOTo1Mi44MTM4
 OTE3MDkgKzAxMDAKQEAgLTI3MCw3ICsyNzAsNyBAQAogCXNjcmVkID0gc21iZnNfbWFsbG9jX3Nj
 cmVkKCk7CiAJc21iX21ha2VzY3JlZChzY3JlZCwgY3VydGhyZWFkLCBhcC0+YV9jcmVkKTsKIAlv
 bGRzaXplID0gbnAtPm5fc2l6ZTsKLQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAobnAsIE5VTEws
 IDAsICZmYXR0ciwgc2NyZWQpOworCWVycm9yID0gc21iZnNfc21iX2xvb2t1cChucCwgTlVMTCwg
 TlVMTCwgJmZhdHRyLCBzY3JlZCk7CiAJaWYgKGVycm9yKSB7CiAJCVNNQlZERUJVRygiZXJyb3Ig
 JWRcbiIsIGVycm9yKTsKIAkJc21iZnNfZnJlZV9zY3JlZChzY3JlZCk7CkBAIC01MTQsNyArNTE0
 LDcgQEAKIAllcnJvciA9IHNtYmZzX3NtYl9jcmVhdGUoZG5wLCBuYW1lLCBubWxlbiwgc2NyZWQp
 OwogCWlmIChlcnJvcikKIAkJZ290byBvdXQ7Ci0JZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKGRu
 cCwgbmFtZSwgbm1sZW4sICZmYXR0ciwgc2NyZWQpOworCWVycm9yID0gc21iZnNfc21iX2xvb2t1
 cChkbnAsICZuYW1lLCAmbm1sZW4sICZmYXR0ciwgc2NyZWQpOwogCWlmIChlcnJvcikKIAkJZ290
 byBvdXQ7CiAJZXJyb3IgPSBzbWJmc19uZ2V0KFZUT1ZGUyhkdnApLCBkdnAsIG5hbWUsIG5tbGVu
 LCAmZmF0dHIsICZ2cCk7CkBAIC01MjQsNiArNTI0LDggQEAKIAlpZiAoY25wLT5jbl9mbGFncyAm
 IE1BS0VFTlRSWSkKIAkJY2FjaGVfZW50ZXIoZHZwLCB2cCwgY25wKTsKIG91dDoKKwlpZiAobmFt
 ZSAhPSBjbnAtPmNuX25hbWVwdHIpCisJCXNtYmZzX25hbWVfZnJlZSgodV9jaGFyICopbmFtZSk7
 CiAJc21iZnNfZnJlZV9zY3JlZChzY3JlZCk7CiAJcmV0dXJuIGVycm9yOwogfQpAQCAtNzIxLDE2
 ICs3MjMsMTkgQEAKIAllcnJvciA9IHNtYmZzX3NtYl9ta2RpcihkbnAsIG5hbWUsIGxlbiwgc2Ny
 ZWQpOwogCWlmIChlcnJvcikKIAkJZ290byBvdXQ7Ci0JZXJyb3IgPSBzbWJmc19zbWJfbG9va3Vw
 KGRucCwgbmFtZSwgbGVuLCAmZmF0dHIsIHNjcmVkKTsKKwllcnJvciA9IHNtYmZzX3NtYl9sb29r
 dXAoZG5wLCAmbmFtZSwgJmxlbiwgJmZhdHRyLCBzY3JlZCk7CiAJaWYgKGVycm9yKQogCQlnb3Rv
 IG91dDsKIAllcnJvciA9IHNtYmZzX25nZXQoVlRPVkZTKGR2cCksIGR2cCwgbmFtZSwgbGVuLCAm
 ZmF0dHIsICZ2cCk7CiAJaWYgKGVycm9yKQogCQlnb3RvIG91dDsKIAkqYXAtPmFfdnBwID0gdnA7
 CisJZXJyb3IgPSAwOwogb3V0OgorCWlmIChuYW1lICE9IGNucC0+Y25fbmFtZXB0cikKKwkJc21i
 ZnNfbmFtZV9mcmVlKCh1X2NoYXIgKiluYW1lKTsKIAlzbWJmc19mcmVlX3NjcmVkKHNjcmVkKTsK
 LQlyZXR1cm4gMDsKKwlyZXR1cm4gZXJyb3I7CiB9CiAKIC8qCkBAIC0xMTUwLDcgKzExNTUsNyBA
 QAogCQlyZXR1cm4gRU5PRU5UOwogCiAJZXJyb3IgPSBjYWNoZV9sb29rdXAoZHZwLCB2cHAsIGNu
 cCwgTlVMTCwgTlVMTCk7Ci0JU01CVkRFQlVHKCJjYWNoZV9sb29rdXAgcmV0dXJuZWQgJWRcbiIs
 IGVycm9yKTsKKwlTTUJWREVCVUcoImNhY2hlX2xvb2t1cCBmb3IgJyVzJyByZXR1cm5lZCAlZFxu
 IiwgY25wLT5jbl9uYW1lcHRyLCBlcnJvcik7CiAJaWYgKGVycm9yID4gMCkKIAkJcmV0dXJuIGVy
 cm9yOwogCWlmIChlcnJvcikgewkJLyogbmFtZSB3YXMgZm91bmQgKi8KQEAgLTExOTUsNyArMTIw
 MCw4IEBACiAJCSp2cHAgPSBOVUxMVlA7CiAJfQogCS8qIAotCSAqIGVudHJ5IGlzIG5vdCBpbiB0
 aGUgY2FjaGUgb3IgaGFzIGJlZW4gZXhwaXJlZAorCSAqIGVudHJ5IGlzIG5vdCBpbiB0aGUgY2Fj
 aGUsIGhhcyBiZWVuIGV4cGlyZWQKKwkgKiBvciBlbnRyeSBpbiB0aGUgY2FjaGUgZGlkIG5vdCBt
 YXRjaCBpbnB1dCBmaWxlbmFtZSdzIGNhc2UKIAkgKi8KIAllcnJvciA9IDA7CiAJKnZwcCA9IE5V
 TExWUDsKQEAgLTEyMDMsMTggKzEyMDksMTkgQEAKIAlzbWJfbWFrZXNjcmVkKHNjcmVkLCB0ZCwg
 Y25wLT5jbl9jcmVkKTsKIAlmYXAgPSAmZmF0dHI7CiAJaWYgKGZsYWdzICYgSVNET1RET1QpIHsK
 LQkJZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKFZUT1NNQihkbnAtPm5fcGFyZW50KSwgTlVMTCwg
 MCwgZmFwLAorCQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAoVlRPU01CKGRucC0+bl9wYXJlbnQp
 LCBOVUxMLCBOVUxMLCBmYXAsCiAJCSAgICBzY3JlZCk7Ci0JCVNNQlZERUJVRygicmVzdWx0IG9m
 IGRvdGRvdCBsb29rdXA6ICVkXG4iLCBlcnJvcik7CisJCVNNQlZERUJVRygicmVzdWx0IG9mIGRv
 dGRvdCBzbWJmc19zbWJfbG9va3VwOiAlZFxuIiwgZXJyb3IpOwogCX0gZWxzZSB7Ci0JCWZhcCA9
 ICZmYXR0cjsKLQkJZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKGRucCwgbmFtZSwgbm1sZW4sIGZh
 cCwgc2NyZWQpOworCQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAoZG5wLCAmbmFtZSwgJm5tbGVu
 LCBmYXAsIHNjcmVkKTsKIC8qCQlpZiAoY25wLT5jbl9uYW1lbGVuID09IDEgJiYgY25wLT5jbl9u
 YW1lcHRyWzBdID09ICcuJykqLwogCQlTTUJWREVCVUcoInJlc3VsdCBvZiBzbWJmc19zbWJfbG9v
 a3VwOiAlZFxuIiwgZXJyb3IpOwogCX0KIAlpZiAoZXJyb3IgJiYgZXJyb3IgIT0gRU5PRU5UKQog
 CQlnb3RvIG91dDsKIAlpZiAoZXJyb3IpIHsJCQkvKiBlbnRyeSBub3QgZm91bmQgKi8KKwkJU01C
 VkRFQlVHKCJlbnRyeSBub3QgZm91bmQgb24gc2VydmVyXG4iKTsKKwogCQkvKgogCQkgKiBIYW5k
 bGUgUkVOQU1FIG9yIENSRUFURSBjYXNlLi4uCiAJCSAqLwpAQCAtMTIyOCw5ICsxMjM1LDExIEBA
 CiAJCX0KIAkJZXJyb3IgPSBFTk9FTlQ7CiAJCWdvdG8gb3V0OwotCX0vKiBlbHNlIHsKLQkJU01C
 VkRFQlVHKCJGb3VuZCBlbnRyeSAlcyB3aXRoIGlkPSVkXG4iLCBmYXAtPmVudHJ5TmFtZSwgZmFw
 LT5kaXJFbnROdW0pOwotCX0qLworCX0KKworCS8qIGVudHJ5IGZvdW5kICovCisJU01CVkRFQlVH
 KCJlbnRyeSBmb3VuZCBvbiBzZXJ2ZXI6ICclcydcbiIsIG5hbWUpOworCiAJLyoKIAkgKiBoYW5k
 bGUgREVMRVRFIGNhc2UgLi4uCiAJICovCkBAIC0xMjUxLDYgKzEyNjAsMTMgQEAKIAkJZ290byBv
 dXQ7CiAJfQogCWlmIChuYW1laW9wID09IFJFTkFNRSAmJiBpc2xhc3RjbikgeworCQlpZiAobmFt
 ZSAhPSBjbnAtPmNuX25hbWVwdHIpIHsKKwkJCS8qIFRhcmdldCBoYXMgYmVlbiBmb3VuZCBvbiB0
 aGUgc2VydmVyLiBKdXN0IHJldHVybiBoZXJlCisJCQkqIHRvIGF2b2lyIGZhbGxpbmcgdG8gdGhl
 IHNvdXJjZSB2bm9kZSwgd2hpY2ggd291bGQgbGVhZAorCQkJKiB0byBOT1QgY2FsbCB0aGUgcmVu
 YW1lIHN5c2NhbGwgKi8KKwkJCWVycm9yID0gRUpVU1RSRVRVUk47CisJCQlnb3RvIG91dDsKKwkJ
 fQogCQllcnJvciA9IFZPUF9BQ0NFU1MoZHZwLCBWV1JJVEUsIGNucC0+Y25fY3JlZCwgdGQpOwog
 CQlpZiAoZXJyb3IpCiAJCQlnb3RvIG91dDsKQEAgLTEyNzQsMTEgKzEyOTAsMTQgQEAKIAkJCWVy
 cm9yID0gdmZzX2J1c3kobXAsIDApOwogCQkJdm5fbG9jayhkdnAsIExLX0VYQ0xVU0lWRSB8IExL
 X1JFVFJZKTsKIAkJCXZmc19yZWwobXApOwotCQkJaWYgKGVycm9yKQotCQkJCXJldHVybiAoRU5P
 RU5UKTsKKwkJCWlmIChlcnJvcikgeworCQkJCWVycm9yID0gRU5PRU5UOworCQkJCWdvdG8gb3V0
 OworCQkJfQogCQkJaWYgKChkdnAtPnZfaWZsYWcgJiBWSV9ET09NRUQpICE9IDApIHsKIAkJCQl2
 ZnNfdW5idXN5KG1wKTsKLQkJCQlyZXR1cm4gKEVOT0VOVCk7CQorCQkJCWVycm9yID0gRU5PRU5U
 OworCQkJCWdvdG8gb3V0OwogCQkJfQogCQl9CQogCQlWT1BfVU5MT0NLKGR2cCwgMCk7CkBAIC0x
 MzA4LDYgKzEzMjcsOCBAQAogCQljYWNoZV9lbnRlcihkdnAsICp2cHAsIGNucCk7CiAJfQogb3V0
 OgorCWlmIChuYW1lICE9IGNucC0+Y25fbmFtZXB0cikKKwkJc21iZnNfbmFtZV9mcmVlKCh1X2No
 YXIgKiluYW1lKTsKIAlzbWJmc19mcmVlX3NjcmVkKHNjcmVkKTsKIAlyZXR1cm4gKGVycm9yKTsK
 IH0K
 
 ------=OPENWEBMAIL_ATT_0.350054851341159--

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 08:28:47 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 91B6C342
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 08:28:47 +0000 (UTC)
 (envelope-from peter@rulingia.com)
Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au
 [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 20AC9AC6
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 08:28:46 +0000 (UTC)
Received: from server.rulingia.com
 (c220-239-237-213.belrs5.nsw.optusnet.com.au [220.239.237.213])
 by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id r1K8SX8C070885
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Wed, 20 Feb 2013 19:28:33 +1100 (EST)
 (envelope-from peter@rulingia.com)
X-Bogosity: Ham, spamicity=0.000000
Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1])
 by server.rulingia.com (8.14.5/8.14.5) with ESMTP id r1K8SSHv003583
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Wed, 20 Feb 2013 19:28:28 +1100 (EST)
 (envelope-from peter@server.rulingia.com)
Received: (from peter@localhost)
 by server.rulingia.com (8.14.5/8.14.5/Submit) id r1K8SSBc003581;
 Wed, 20 Feb 2013 19:28:28 +1100 (EST) (envelope-from peter)
Date: Wed, 20 Feb 2013 19:28:28 +1100
From: Peter Jeremy <peter@rulingia.com>
To: Kevin Day <toasty@dragondata.com>
Subject: Re: Improving ZFS performance for large directories
Message-ID: <20130220082828.GA44920@server.rulingia.com>
References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com>
 <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>
 <F4420A8C-FB92-4771-B261-6C47A736CF7F@dragondata.com>
 <20130201192416.GA76461@server.rulingia.com>
 <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="1yeeQ81UyVL57Vl7"
Content-Disposition: inline
In-Reply-To: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
X-PGP-Key: http://www.rulingia.com/keys/peter.pgp
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 08:28:47 -0000


--1yeeQ81UyVL57Vl7
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2013-Feb-19 14:10:47 -0600, Kevin Day <toasty@dragondata.com> wrote:
>Timing doing an "ls" in large directories 20 times, the first is the
>slowest, then all subsequent listings are roughly the same.

OK.  My testing was on large files rather than large amounts of metadata.

>Thinking I'd make the primary cache metadata only, and the secondary
>cache "all" would improve things,

This won't work as expected.  L2ARC only caches data coming out of ARC
so by setting ARC to cache metadata only, there's never any "data" in
ARC and hence never any evicted from ARC to L2ARC.

> I wiped the device (SATA secure erase to make sure)

That's not necessary.  L2ARC doesn't survive reboots because all the
L2ARC "metadata" is in ARC only.  This does mean that it takes quite
a while for L2ARC to warm up following a reboot.

>Before adding the SSD, an "ls" in a directory with 65k files would
>take 10-30 seconds, it's now down to about 0.2 seconds.

That sounds quite good.

> There are roughly 29M files, growing at about 50k files/day. We
>recently upgraded, and are now at 96 3TB drives in the pool.=20

That number of files isn't really excessive but it sounds like your
workload has very low locality.  At this stage, my suggestions are:
1) Disable atime if you don't need it & haven't already.
   Otherwise file accesses are triggering metadata updates.
2) Increase vfs.zfs.arc_meta_limit
   You're still getting more metadata misses than data misses
3) Increase your ARC size (more RAM)
   Your pool is quite large compared to your RAM.

>It's a 250G drive, and only 22G is being used, and there's still a
>~66% miss rate.

That's 66% of the requests that missed in ARC.

> Is there any way to tell why more metadata isn't
>being pushed to the L2ARC?

ZFS treats writing to L2ARC very much as an afterthought.  L2ARC writes
are rate limited by vfs.zfs.l2arc_write_{boost,max} and will be aborted
if they might interfere with a read.  I'm not sure how to improve it.

Since this is all generic ZFS, you might like to try asking on
zfs@lists.illumos.org as well.  Some of the experts there might have
some ideas.

--=20
Peter Jeremy

--1yeeQ81UyVL57Vl7
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlEkiSwACgkQ/opHv/APuIdsKQCgq90SUs/wm9rYE5moVPpIXBHu
PCcAn38hMTi+YFknk64N3ro4mR/dSKsk
=Sl9j
-----END PGP SIGNATURE-----

--1yeeQ81UyVL57Vl7--

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 10:59:00 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id A09CDC1D
 for <freebsd-fs@FreeBSD.org>; Wed, 20 Feb 2013 10:59:00 +0000 (UTC)
 (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id E7CA16FB
 for <freebsd-fs@FreeBSD.org>; Wed, 20 Feb 2013 10:58:59 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
 [212.40.38.101])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA24598;
 Wed, 20 Feb 2013 12:58:49 +0200 (EET) (envelope-from avg@FreeBSD.org)
Message-ID: <5124AC69.6010709@FreeBSD.org>
Date: Wed, 20 Feb 2013 12:58:49 +0200
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:17.0) Gecko/20130206 Thunderbird/17.0.2
MIME-Version: 1.0
To: Kevin Day <toasty@dragondata.com>
Subject: Re: Improving ZFS performance for large directories
References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com>
 <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>
 <F4420A8C-FB92-4771-B261-6C47A736CF7F@dragondata.com>
 <20130201192416.GA76461@server.rulingia.com>
 <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
In-Reply-To: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
X-Enigmail-Version: 1.4.6
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: FreeBSD Filesystems <freebsd-fs@FreeBSD.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 10:59:00 -0000

on 19/02/2013 22:10 Kevin Day said the following:
> Timing doing an "ls" in large directories 20 times, the first is the slowest,
> then all subsequent listings are roughly the same. There doesn't appear to be any
> gain after 20 repetitions

I think that the above could be related to the below

> 	vfs.zfs.arc_meta_limit                  16398159872
> 	vfs.zfs.arc_meta_used                   16398120264


-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 11:16:59 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 2F26C597
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 11:16:59 +0000 (UTC)
 (envelope-from borjam@sarenet.es)
Received: from proxypop04.sare.net (proxypop04.sare.net [194.30.0.65])
 by mx1.freebsd.org (Postfix) with ESMTP id EB058844
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 11:16:58 +0000 (UTC)
Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11])
 by proxypop04.sare.net (Postfix) with ESMTPSA id E74859DF0E2
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 12:16:56 +0100 (CET)
From: Borja Marcos <borjam@sarenet.es>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: ZFS, lies and statistics
Date: Wed, 20 Feb 2013 12:16:53 +0100
Message-Id: <E22FF1DE-EC87-43E7-9216-4F36C7307AAE@sarenet.es>
To: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Mime-Version: 1.0 (Apple Message framework v1085)
X-Mailer: Apple Mail (2.1085)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 11:16:59 -0000

Hi :)

Still working on polishing devilator to graph some meaningful ZFS =
statistics.

I have been peeking at the ZFS statistics. In RELENG-9.2 it=B4s a bit =
confusing, as there are two different sets of statistics:

Some are under vfs.zfs (I think the following ones are relevant):

vfs.zfs.l2c_only_size: 36603866112
vfs.zfs.mfu_ghost_data_lsize: 730375168
vfs.zfs.mfu_ghost_metadata_lsize: 1075240960
vfs.zfs.mfu_ghost_size: 1805616128
vfs.zfs.mfu_data_lsize: 1785933312
vfs.zfs.mfu_metadata_lsize: 373835264
vfs.zfs.mfu_size: 2198305792
vfs.zfs.mru_ghost_data_lsize: 550150656
vfs.zfs.mru_ghost_metadata_lsize: 1859320320
vfs.zfs.mru_ghost_size: 2409470976
vfs.zfs.mru_data_lsize: 503643648
vfs.zfs.mru_metadata_lsize: 105602560
vfs.zfs.mru_size: 800031744
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 673792

vfs.zfs.arc_meta_used: 2087455400

and of course the new kstat tree.


But there is a discrepancy between the information provided by both, or =
I am missing something.

Having a look at these samples (this is updated from my test system, so =
anyone interested can have a look at the progress), I have tried to =
obtain a good graph of the ARC breakdown using two different approaches =
(one borrowed from the Solaris version of arcstats.pl, which makes this =
calculation,=20

    mru_size =3D ARCSTATS_P;
    if ( ARCSTATS_SIZE > ARCSTATS_C )
        mfu_size =3D ARCSTATS_SIZE - mru_size;
    else
        mfu_size =3D ARCSTATS_C - mru_size;
    add_output_u64("zfs_mfu_size", mfu_size);
    add_output_u64("zfs_mru_size", mru_size);


and, on the other hand, I'm using the sizes directly provided by =
vfs.zfs, which turn out to be different.

Which one would be the best?

Thanks,


Borja.


From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 11:21:30 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 4E5A5E3F
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 11:21:30 +0000 (UTC)
 (envelope-from borjam@sarenet.es)
Received: from proxypop04.sare.net (proxypop04.sare.net [194.30.0.65])
 by mx1.freebsd.org (Postfix) with ESMTP id 143F88EB
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 11:21:29 +0000 (UTC)
Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11])
 by proxypop04.sare.net (Postfix) with ESMTPSA id AC4B89DF051;
 Wed, 20 Feb 2013 12:21:28 +0100 (CET)
Subject: Re: ZFS, lies and statistics
Mime-Version: 1.0 (Apple Message framework v1085)
Content-Type: text/plain; charset=us-ascii
From: Borja Marcos <borjam@sarenet.es>
In-Reply-To: <E22FF1DE-EC87-43E7-9216-4F36C7307AAE@sarenet.es>
Date: Wed, 20 Feb 2013 12:21:27 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <8198B045-C5FD-48EA-B17C-52F2386FD727@sarenet.es>
References: <E22FF1DE-EC87-43E7-9216-4F36C7307AAE@sarenet.es>
To: Borja Marcos <borjam@sarenet.es>
X-Mailer: Apple Mail (2.1085)
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 11:21:30 -0000


On Feb 20, 2013, at 12:16 PM, Borja Marcos wrote:

> Still working on polishing devilator to graph some meaningful ZFS =
statistics.


Sorry, I forgot to include the URL.

http://devilator.frobula.com/


Borja.


From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 14:53:45 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id C229D86C
 for <fs@FreeBSD.org>; Wed, 20 Feb 2013 14:53:45 +0000 (UTC)
 (envelope-from jamie@FreeBSD.org)
Received: from m2.gritton.org (gritton.org [199.192.164.235])
 by mx1.freebsd.org (Postfix) with ESMTP id 751BB92E
 for <fs@FreeBSD.org>; Wed, 20 Feb 2013 14:53:44 +0000 (UTC)
Received: from guppy.corp.verio.net (fw.oremut02.us.wh.verio.net
 [198.65.168.24]) (authenticated bits=0)
 by m2.gritton.org (8.14.5/8.14.5) with ESMTP id r1KErhdq029382;
 Wed, 20 Feb 2013 07:53:43 -0700 (MST)
 (envelope-from jamie@FreeBSD.org)
Message-ID: <5124E372.1000009@FreeBSD.org>
Date: Wed, 20 Feb 2013 07:53:38 -0700
From: Jamie Gritton <jamie@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:9.0) Gecko/20120126 Thunderbird/9.0
MIME-Version: 1.0
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: mount/kldload race
References: <51244A13.8030907@FreeBSD.org> <20130220054309.GD2598@kib.kiev.ua>
In-Reply-To: <20130220054309.GD2598@kib.kiev.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 14:53:45 -0000

On 02/19/13 22:43, Konstantin Belousov wrote:
> On Tue, Feb 19, 2013 at 08:59:15PM -0700, Jamie Gritton wrote:
>> Perhaps most people don't try to mount a bunch of filesystems at the
>> same time, at least not those that depend on kernel modules. But it
>> turns out that's going to be a pretty common situation with jails and
>> nullfs. And I found that attempting such a feat will cause most of
>> these simultaneous mounts to fail with ENODEV.
>>
>> It turns out that the problem is a race in vfs_byname_kld(). First it'll
>> see if the fstype is loaded, and if it isn't then it will load the
>> module. But if the module is loaded by a different process between those
>> two points, the resulting EEXIST from kern_kldload() will make
>> vfs_byname_kld() error out.
>>
>> The fix is pretty simple: don't treat EEXIST as an error. By going on,
>> and rechecking for the fstype, the filesystem can be mounted while still
>> allowing any "real" error to be caught. I'm including a small patch that
>> will accomplish this, and I'd appreciate a quick look by anyone who's
>> familiar with this part of things before I commit it.
>>
>> - Jamie
>
>> Index: sys/kern/vfs_init.c
>> ===================================================================
>> --- sys/kern/vfs_init.c	(revision 247000)
>> +++ sys/kern/vfs_init.c	(working copy)
>> @@ -130,13 +130,18 @@
>>
>>   	/* Try to load the respective module. */
>>   	*error = kern_kldload(td, fstype,&fileid);
>> +	if (*error == EEXIST) {
>> +		*error = 0;
>> +		fileid = 0;
> Why do you clear fileid ? Is this to prevent an attempt to kldunload()
> the module which was not loaded by the current thread ?
>
> If yes, I would suggest to use the separate flag to track this,
> which is cleared on EEXIST error. IMHO it is cleaner and less puzzling.

Yes, that's why.  As a side note, I clear *error ostensibly for the sake 
of the callers, but it turns out none of the callers actually look at 
the returned error.

Here's a new patch with an added flag:


Index: sys/kern/vfs_init.c
===================================================================
--- sys/kern/vfs_init.c	(revision 247000)
+++ sys/kern/vfs_init.c	(working copy)
@@ -122,7 +122,7 @@
  vfs_byname_kld(const char *fstype, struct thread *td, int *error)
  {
  	struct vfsconf *vfsp;
-	int fileid;
+	int fileid, loaded;

  	vfsp = vfs_byname(fstype);
  	if (vfsp != NULL)
@@ -130,13 +130,17 @@

  	/* Try to load the respective module. */
  	*error = kern_kldload(td, fstype, &fileid);
+	loaded = (*error == 0);
+	if (*error == EEXIST)
+		*error = 0;
  	if (*error)
  		return (NULL);

  	/* Look up again to see if the VFS was loaded. */
  	vfsp = vfs_byname(fstype);
  	if (vfsp == NULL) {
-		(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
+		if (loaded)
+			(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
  		*error = ENODEV;
  		return (NULL);
  	}

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 15:37:14 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 04CFEB41;
 Wed, 20 Feb 2013 15:37:14 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 by mx1.freebsd.org (Postfix) with ESMTP id 57222C29;
 Wed, 20 Feb 2013 15:37:13 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1KFb5Uo069015;
 Wed, 20 Feb 2013 17:37:05 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1KFb5Uo069015
Received: (from kostik@localhost)
 by tom.home (8.14.6/8.14.6/Submit) id r1KFb5ZH069014;
 Wed, 20 Feb 2013 17:37:05 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 20 Feb 2013 17:37:05 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Jamie Gritton <jamie@FreeBSD.org>
Subject: Re: mount/kldload race
Message-ID: <20130220153705.GE2598@kib.kiev.ua>
References: <51244A13.8030907@FreeBSD.org> <20130220054309.GD2598@kib.kiev.ua>
 <5124E372.1000009@FreeBSD.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="UOi+gfmBpEZPw9cU"
Content-Disposition: inline
In-Reply-To: <5124E372.1000009@FreeBSD.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 15:37:14 -0000


--UOi+gfmBpEZPw9cU
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Feb 20, 2013 at 07:53:38AM -0700, Jamie Gritton wrote:
> On 02/19/13 22:43, Konstantin Belousov wrote:
> > On Tue, Feb 19, 2013 at 08:59:15PM -0700, Jamie Gritton wrote:
> >> Perhaps most people don't try to mount a bunch of filesystems at the
> >> same time, at least not those that depend on kernel modules. But it
> >> turns out that's going to be a pretty common situation with jails and
> >> nullfs. And I found that attempting such a feat will cause most of
> >> these simultaneous mounts to fail with ENODEV.
> >>
> >> It turns out that the problem is a race in vfs_byname_kld(). First it'll
> >> see if the fstype is loaded, and if it isn't then it will load the
> >> module. But if the module is loaded by a different process between those
> >> two points, the resulting EEXIST from kern_kldload() will make
> >> vfs_byname_kld() error out.
> >>
> >> The fix is pretty simple: don't treat EEXIST as an error. By going on,
> >> and rechecking for the fstype, the filesystem can be mounted while still
> >> allowing any "real" error to be caught. I'm including a small patch that
> >> will accomplish this, and I'd appreciate a quick look by anyone who's
> >> familiar with this part of things before I commit it.
> >>
> >> - Jamie
> >
> >> Index: sys/kern/vfs_init.c
> >> ===================================================================
> >> --- sys/kern/vfs_init.c	(revision 247000)
> >> +++ sys/kern/vfs_init.c	(working copy)
> >> @@ -130,13 +130,18 @@
> >>
> >>   	/* Try to load the respective module. */
> >>   	*error = kern_kldload(td, fstype,&fileid);
> >> +	if (*error == EEXIST) {
> >> +		*error = 0;
> >> +		fileid = 0;
> > Why do you clear fileid ? Is this to prevent an attempt to kldunload()
> > the module which was not loaded by the current thread ?
> >
> > If yes, I would suggest to use the separate flag to track this,
> > which is cleared on EEXIST error. IMHO it is cleaner and less puzzling.
>
> Yes, that's why.  As a side note, I clear *error ostensibly for the sake
> of the callers, but it turns out none of the callers actually look at
> the returned error.
>
> Here's a new patch with an added flag:
I have no further comments, looks good.

>
>
> Index: sys/kern/vfs_init.c
> ===================================================================
> --- sys/kern/vfs_init.c	(revision 247000)
> +++ sys/kern/vfs_init.c	(working copy)
> @@ -122,7 +122,7 @@
>   vfs_byname_kld(const char *fstype, struct thread *td, int *error)
>   {
>   	struct vfsconf *vfsp;
> -	int fileid;
> +	int fileid, loaded;
>
>   	vfsp = vfs_byname(fstype);
>   	if (vfsp != NULL)
> @@ -130,13 +130,17 @@
>
>   	/* Try to load the respective module. */
>   	*error = kern_kldload(td, fstype, &fileid);
> +	loaded = (*error == 0);
> +	if (*error == EEXIST)
> +		*error = 0;
>   	if (*error)
>   		return (NULL);
>
>   	/* Look up again to see if the VFS was loaded. */
>   	vfsp = vfs_byname(fstype);
>   	if (vfsp == NULL) {
> -		(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
> +		if (loaded)
> +			(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
>   		*error = ENODEV;
>   		return (NULL);
>   	}

--UOi+gfmBpEZPw9cU--

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 16:07:16 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 678B69F8
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 16:07:16 +0000 (UTC)
 (envelope-from toasty@dragondata.com)
Received: from mail-ia0-x22e.google.com (ia-in-x022e.1e100.net
 [IPv6:2607:f8b0:4001:c02::22e])
 by mx1.freebsd.org (Postfix) with ESMTP id E6C0CE75
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 16:07:15 +0000 (UTC)
Received: by mail-ia0-f174.google.com with SMTP id u20so3104499iag.5
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 08:07:15 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=dragondata.com; s=google;
 h=x-received:content-type:mime-version:subject:from:in-reply-to:date
 :cc:content-transfer-encoding:message-id:references:to:x-mailer;
 bh=di9AS21CdIBzZAWj2A8TRmDTg3dF/VnKOSDwhlacPU8=;
 b=phfaBN8o5G261w5EbNaWYFV2Z4P5c0ol0meJbvSO17qrsNZIH/CfK+b9IvdLFcrFnE
 14lYZucXNYbX0wBlLm5DdGSCra1fdh6HMxE5Y3gefix72VHORWYQf3wc/xYAGmIxb4bJ
 Dc8j1/xEXMNptMeRR7X7nYDcFVvzhkOs5cM+I=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=x-received:content-type:mime-version:subject:from:in-reply-to:date
 :cc:content-transfer-encoding:message-id:references:to:x-mailer
 :x-gm-message-state;
 bh=di9AS21CdIBzZAWj2A8TRmDTg3dF/VnKOSDwhlacPU8=;
 b=k3+bfI6lXqXAY/GFPBKP/FtYGS7hBg2PV2ncjW2aOapRvQ01IlF5kW5bm3XsX/nhuj
 jZ9mG0P6fO66tRKJp7Sf0SDO0WJFP/BrxtcW1E/AMvY3LKiLHSaJosuxCfAYG1kxgMq5
 /xcRvh1BBaFI+ZMv3aufPYwsxgd+k/SCqI69AM1ipYE8PFSFEhj28Il5G/O4OeGpLM+5
 EPlg7gjw1LI/GwTQxuAB7cu9JYiFszn+Q53drXgLS/c0z2RSXnm7OgrmQRcmsWoEpodS
 8y6RsNcu9KGuL+eNufDnuXbR79xv9B0xUNJcu+R8HYBkoHkWRBc1aQPi1ppwIalm336f
 oa/Q==
X-Received: by 10.42.95.146 with SMTP id f18mr9641819icn.9.1361376435436;
 Wed, 20 Feb 2013 08:07:15 -0800 (PST)
Received: from vpn132.rw1.your.org (vpn132.rw1.your.org. [204.9.51.132])
 by mx.google.com with ESMTPS id s8sm766074igs.0.2013.02.20.08.07.13
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Wed, 20 Feb 2013 08:07:14 -0800 (PST)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: Improving ZFS performance for large directories
From: Kevin Day <toasty@dragondata.com>
In-Reply-To: <20130220082828.GA44920@server.rulingia.com>
Date: Wed, 20 Feb 2013 10:07:11 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <2F90562A-7F98-49A5-8431-4313961EFA70@dragondata.com>
References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com>
 <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>
 <F4420A8C-FB92-4771-B261-6C47A736CF7F@dragondata.com>
 <20130201192416.GA76461@server.rulingia.com>
 <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
 <20130220082828.GA44920@server.rulingia.com>
To: Peter Jeremy <peter@rulingia.com>
X-Mailer: Apple Mail (2.1499)
X-Gm-Message-State: ALoCoQl1AObRP2c0/VpjQFroRFEvU3zm14dOacSgmElQHoY5m5tRgz5g59GyZkUDONvGkQ/Oxge2
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 16:07:16 -0000


On Feb 20, 2013, at 2:28 AM, Peter Jeremy <peter@rulingia.com> wrote:
>> Thinking I'd make the primary cache metadata only, and the secondary
>> cache "all" would improve things,
>
> This won't work as expected.  L2ARC only caches data coming out of ARC
> so by setting ARC to cache metadata only, there's never any "data" in
> ARC and hence never any evicted from ARC to L2ARC.
>

That makes sense. I wasn't sure whether it was smart enough to realize
this was happening, but I guess it won't work.


>> I wiped the device (SATA secure erase to make sure)
>
> That's not necessary.  L2ARC doesn't survive reboots because all the
> L2ARC "metadata" is in ARC only.  This does mean that it takes quite
> a while for L2ARC to warm up following a reboot.
>

I was more concerned with the SSD's performance than ZFS caring what was
there. A few cases completely filled the SSD, which can slow things down
(there are no free blocks for it to use). Secure Erase will reset it so
the drive's controller knows EVERYTHING is really free. We have one
model of SSD here that will drop to about 5% of its original
performance after every block on the drive has been written to once.
We're not using that model anymore, but I still like to be sure. :)

>> There are roughly 29M files, growing at about 50k files/day. We
>> recently upgraded, and are now at 96 3TB drives in the pool.
>
> That number of files isn't really excessive but it sounds like your
> workload has very low locality.  At this stage, my suggestions are:
> 1) Disable atime if you don't need it & haven't already.
>   Otherwise file accesses are triggering metadata updates.
> 2) Increase vfs.zfs.arc_meta_limit
>   You're still getting more metadata misses than data misses
> 3) Increase your ARC size (more RAM)
>   Your pool is quite large compared to your RAM.
>

Yeah, I think the locality is basically zero. It's multiple rsyncs
running across the entire filesystem repeatedly. Each directory is only
going to be touched once per pass through, so that isn't really going to
benefit much from cache unless we get lucky and two rsyncs come in
back-to-back where one is chasing another.

Atime is already off globally - nothing we use needs it. We are at the
limit for RAM for this motherboard, so any further increases are going
to be quite expensive.

>
>> Is there any way to tell why more metadata isn't
>> being pushed to the L2ARC?
>
> ZFS treats writing to L2ARC very much as an afterthought.  L2ARC writes
> are rate limited by vfs.zfs.l2arc_write_{boost,max} and will be aborted
> if they might interfere with a read.  I'm not sure how to improve it.
>

At this stage there are just zero writes being done, so perhaps the
problem is that with so much pressure on the arc metadata, nothing is
getting a chance to get pushed into the L2ARC. I'm going to try to
increase the meta limit on ARC, but there's not a great deal more I can
do.

> Since this is all generic ZFS, you might like to try asking on
> zfs@lists.illumos.org as well.  Some of the experts there might have
> some ideas.

I will try that, thanks!

-- Kevin


From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 19:27:06 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 584F02CF
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 19:27:06 +0000 (UTC)
 (envelope-from radiomlodychbandytow@o2.pl)
Received: from moh2-ve1.go2.pl (moh2-ve1.go2.pl [193.17.41.186])
 by mx1.freebsd.org (Postfix) with ESMTP id E4A25E47
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 19:27:05 +0000 (UTC)
Received: from moh2-ve1.go2.pl (unknown [10.0.0.186])
 by moh2-ve1.go2.pl (Postfix) with ESMTP id 31D0244D54F
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 20:26:43 +0100 (CET)
Received: from unknown (unknown [10.0.0.74])
 by moh2-ve1.go2.pl (Postfix) with SMTP
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 20:26:43 +0100 (CET)
Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id nQYMSp;
 Wed, 20 Feb 2013 20:26:43 +0100
Message-ID: <51252372.1040001@o2.pl>
Date: Wed, 20 Feb 2013 20:26:42 +0100
From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= <radiomlodychbandytow@o2.pl>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:17.0) Gecko/20130201 Thunderbird/17.0.2
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Subject: Some filesystem thoughts
References: <mailman.15.1361361601.62143.freebsd-fs@freebsd.org>
In-Reply-To: <mailman.15.1361361601.62143.freebsd-fs@freebsd.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-O2-Trust: 1, 31
X-O2-SPF: neutral
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 19:27:06 -0000

Hello,
I'm a pretty fresh Unix user, suffering from productivity loss caused by 
changing OS. I dearly miss a couple of facilities that I had implemented 
in my file manager (Total Commander) and sought a file manager that 
could replace them. I'm pretty sure there's none. I found that Unix does 
some of them at the OS level and that's a superior way. But with others 
it doesn't; some file managers implement them by themselves, much like 
TC (not well enough, but that's another rant), but I think that for 
them, the OS is the right place too, and that's what I'd like to talk about.
I come with a free idea that I think would be awesome to have
implemented, though I'm not sure it can even be implemented sensibly
within Unix. Maybe I'm missing something? Maybe the idea isn't good? Maybe
there are things that do the job well enough already and I just overlooked
them?

Anyway, here's the story:

Total Commander's filesystem plugins are awesome. They enable users to 
manage remote / virtual resources just like remote filesystems. FTP, 
websites, process list, calendar; the variety is rich.
In Unix, there are equivalents for some of them; the ones that mattered
the most to me can usually be simulated by mount.
And that's a better way because, once mounted, they can be used by any
program, not just a file manager. I'm sure that all people here are used
to enjoying the benefits of this approach, though for me they are novel.

The other thing - packer plugins. They allow treating archives like 
directories. Again, there are many useful ones, some obvious (like 
zips), some not so much. I treated my executables as directories, which 
enabled me to easily manipulate resources stored inside. Especially 
useful when hacking closed-source Delphi programs as they contain lots 
of GUI code stored directly (The name 'TNASTYNAGSCREEN' will stay in my 
mind for long). Or extracting icons. Or doing many other things that are 
necessary to play with closed source code, but less relevant in Unix.
There was a steganography plugin storing data inside images. A plugin 
for generation and browsing of file lists. A Java decompiler. And a 
great variety of others.

Unix file managers offer similar, though not so rich options. Yet I 
think it's not their job. Like with mounting, there's great benefit from 
being able to use standard tools with them.
Some write things like zipfs, but I think it's wrong.
First, typing a command is cumbersome. Second, even if it were automated,
mounting needs a mount point. The only good one is the file itself;
working with a dozen (or a thousand) archives in a single directory is
the norm for me. Switching dirs back and forth would be very disruptive.
It also breaks relative paths. And so on.

The way I see it is not to treat files as streams of bytes. That's not 
what they are, files have meanings and there are tools that bring them 
out. A picture is a stored emotion. OK, there are no tools for that yet. 
But it is also an array of pixels. And a container with exif data. And 
may be a container with an encrypted archive. And, a stream of bytes too.
They have multiple facets.
I think that it would be useful to somehow expose them to applications.
Wouldn't it be useful to be able to grep through pdfs in your email 
attachments?
Mass-edit music tags with sed? Manually edit with your favourite text 
editor instead of the sucky one-liner provided by your favourite music 
player?
How about video players being able to play videos by reading them in 
decoded form directly from the filesystem instead of having to integrate 
a significant number of complex libraries to provide sufficient format 
coverage?
-- 
Twoje radio

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 19:37:23 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 876ED807
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 19:37:23 +0000 (UTC)
 (envelope-from momchil@xaxo.eu)
Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66])
 by mx1.freebsd.org (Postfix) with ESMTP id 13A4CEE4
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 19:37:22 +0000 (UTC)
Received: from t61.xaxo.eu ([10.75.23.6])
 by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1KJbE4D087018;
 Wed, 20 Feb 2013 20:37:14 +0100 (CET) (envelope-from momchil@xaxo.eu)
Date: Wed, 20 Feb 2013 20:37:07 +0100
Message-ID: <86621m4w0s.wl%momchil@xaxo.eu>
From: Momchil Ivanov <momchil@xaxo.eu>
To: Rick Macklem <rmacklem@uoguelph.ca>
Subject: Re: NFS + Kerberos
In-Reply-To: <992481316.3137385.1361325642681.JavaMail.root@erie.cs.uoguelph.ca>
References: <86a88ac8bb038ec5d8034724dcf80924.squirrel@webmail.xaxo.eu>
 <992481316.3137385.1361325642681.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
Cc: freebsd-fs@freebsd.org, Momchil Ivanov <momchil@xaxo.eu>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 19:37:23 -0000

At Tue, 19 Feb 2013 21:00:42 -0500 (EST),
Rick Macklem wrote:
> 
> Momchil Ivanov wrote:
> > On Tue, February 19, 2013 12:56 am, Rick Macklem wrote:
> > > Thanks to Elias's hard work, a bug/fix has just been isolated in the
> > > Kerberos library that causes the gssd to fail to translate a
> > > principal
> > > to a uid. The fix is to increase the size of the buffer passed to
> > > getpwnam_r(). See this thread:
> > > http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw
> > >
> > > I haven't run into this bug, so I don't know what systems are
> > > affected,
> > > but it would explain why you can't get it working.
> > >
> > > I'd suggest you apply the patch in the email (increase buf to 1024)
> > > and
> > > then try again with libraries built with the patch.
> > 
> > Do I have to apply the patch to the server only and then rebuild world
> > or
> > do I have to do the same on the client too? And do I need to rebuild
> > heimdal on both machines?
> > 
> The bug should only affect the server, since the client never translates
> between principal_name<->uid. (The client does a rather cheesy trick of
> using the uid to select the correct credential cache file.)
> 
> > btw, I checked the logs of the kdc and could not see any trace of the
> > nfs
> > server trying to validate the client's ticket... Frankly, I don't know
> > what I should expect there, I haven't used kerberos before, so I have
> > no
> > idea if it's related to the bug. Here is part of the log:
> > 
> > AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for
> > krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> > No preauth found, returning PREAUTH-REQUIRED -- user@EXAMPLE.LOCAL
> > sending 407 bytes to IPv4:X.X.X.X
> > AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for
> > krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> > Client sent patypes: encrypted-timestamp
> > Looking for PKINIT pa-data -- user@EXAMPLE.LOCAL
> > Looking for ENC-TS pa-data -- user@EXAMPLE.LOCAL
> > ENC-TS Pre-authentication succeeded -- user@EXAMPLE.LOCAL using
> > des-cbc-crc
> > Client supported enctypes: des-cbc-crc
> > Using des-cbc-crc/aes256-cts-hmac-sha1-96
> > AS-REQ authtime: 2013-02-11T23:45:44 starttime: unset endtime:
> > 2013-02-12T09:45:39 renew till: unset
> > sending 552 bytes to IPv4:X.X.X.X
> > 
> Hmm, that sounds like you are never getting as far as sending the
> ticket to the server, but I'm not at home, so I can't look and see
> exactly what gets logged. (Also, I use a MIT KDC, so what gets logged
> might be different.)
> 
> I've attached a trivial program that you can compile/run as root
> on the NFS server to see if 128 bytes is a big enough buffer for your setup.
> If it can print out the uid for the usernames you test as arguments,
> the patch isn't needed for your environment.
> (Oh, and it has a typo bug in the errx() arguments, but it works ok
>  for testing.)
> 
> Good luck with it, rick

Your test program works with a regular user, but fails with root,
indeed.

I will try the patch. Do I need to rebuild only world or do I have to
rebuild heimdal too?

Thank you,
Momchil

From owner-freebsd-fs@FreeBSD.ORG  Wed Feb 20 23:10:52 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 107ECAD3
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 23:10:52 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id CF7BE15F
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 23:10:51 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqAEAIVWJVGDaFvO/2dsb2JhbABFhkm6GoEZc4IfAQEEASMEUgUWGAICDRkCWQaIHwYMrgWSPoEjjBoagQM0B4ItgRMDiGaNRoEdjz6DJYFNBxcGGA
X-IronPort-AV: E=Sophos;i="4.84,705,1355115600"; d="scan'208";a="15020446"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-annu.net.uoguelph.ca with ESMTP; 20 Feb 2013 18:10:48 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 57C07B3FAC;
 Wed, 20 Feb 2013 18:10:48 -0500 (EST)
Date: Wed, 20 Feb 2013 18:10:48 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Momchil Ivanov <momchil@xaxo.eu>
Message-ID: <222730394.3167100.1361401848290.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <86621m4w0s.wl%momchil@xaxo.eu>
Subject: Re: NFS + Kerberos
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Feb 2013 23:10:52 -0000

Momchil Ivanov wrote:
> At Tue, 19 Feb 2013 21:00:42 -0500 (EST),
> Rick Macklem wrote:
> >
> > Momchil Ivanov wrote:
> > > On Tue, February 19, 2013 12:56 am, Rick Macklem wrote:
> > > > Thanks to Elias's hard work, a bug/fix has just been isolated in
> > > > the
> > > > Kerberos library that causes the gssd to fail to translate a
> > > > principal
> > > > to a uid. The fix is to increase the size of the buffer passed
> > > > to
> > > > getpwnam_r(). See this thread:
> > > > http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw
> > > >
> > > > I haven't run into this bug, so I don't know what systems are
> > > > affected,
> > > > but it would explain why you can't get it working.
> > > >
> > > > I'd suggest you apply the patch in the email (increase buf to
> > > > 1024)
> > > > and
> > > > then try again with libraries built with the patch.
> > >
> > > > Do I have to apply the patch to the server only and then rebuild
> > > world
> > > or
> > > do I have to do the same on the client too? And do I need to
> > > rebuild
> > > heimdal on both machines?
> > >
> > The bug should only affect the server, since the client never
> > translates
> > > between principal_name<->uid. (The client does a rather cheesy
> > trick of
> > using the uid to select the correct credential cache file.)
> >
> > > btw, I checked the logs of the kdc and could not see any trace of
> > > the
> > > nfs
> > > server trying to validate the client's ticket... Frankly, I don't
> > > know
> > > > what I should expect there, I haven't used kerberos before, so I
> > > have
> > > no
> > > idea if it's related to the bug. Here is part of the log:
> > >
> > > AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for
> > > krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> > > No preauth found, returning PREAUTH-REQUIRED -- user@EXAMPLE.LOCAL
> > > sending 407 bytes to IPv4:X.X.X.X
> > > AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for
> > > krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> > > Client sent patypes: encrypted-timestamp
> > > Looking for PKINIT pa-data -- user@EXAMPLE.LOCAL
> > > Looking for ENC-TS pa-data -- user@EXAMPLE.LOCAL
> > > ENC-TS Pre-authentication succeeded -- user@EXAMPLE.LOCAL using
> > > des-cbc-crc
> > > Client supported enctypes: des-cbc-crc
> > > Using des-cbc-crc/aes256-cts-hmac-sha1-96
> > > AS-REQ authtime: 2013-02-11T23:45:44 starttime: unset endtime:
> > > 2013-02-12T09:45:39 renew till: unset
> > > sending 552 bytes to IPv4:X.X.X.X
> > >
> > Hmm, that sounds like you are never getting as far as sending the
> > ticket to the server, but I'm not at home, so I can't look and see
> > exactly what gets logged. (Also, I use a MIT KDC, so what gets
> > logged
> > might be different.)
> >
> > I've attached a trivial program that you can compile/run as root
> > on the NFS server to see if 128 bytes is a big enough buffer for
> > your setup.
> > If it can print out the uid for the usernames you test as arguments,
> > the patch isn't needed for your environment.
> > (Oh, and it has a typo bug in the errx() arguments, but it works ok
> >  for testing.)
> >
> > Good luck with it, rick
> 
> Your test program works with a regular user, but fails with root,
> indeed.
> 
> I will try the patch. Do I need to rebuild only world or do I have to
> rebuild heimdal too?
> 
I would have thought Kerberos gets rebuilt by make buildworld. If you
use heimdal from somewhere else (ports or their distro), I don't think
that needs to be rebuilt, since I don't think the ..pname_to_uid()
function is a part of a generic heimdal distribution, but I am not
sure.

Be sure to change buf[128] --> buf[1024] in both:
kerberos5/lib/libgssapi_krb5/pname_to_uid.c
usr.sbin/gssd/gssd.c

(Or paths close to that. I might not have remembered them quite correctly;-)
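
To see why a fixed 128-byte buffer can fall over, here is a minimal
sketch of a getpwnam_r() lookup that reports ERANGE separately and
retries with a larger buffer. It is not the test program attached
earlier in the thread and not the committed fix; uid_with_buflen() and
uid_with_growing_buf() are names invented for this example, and the
1 MiB retry cap is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Look up a user with a caller-chosen scratch buffer size.
 * Returns 0 and stores the uid on success, ENOENT if there is no such
 * user, or the getpwnam_r() error (ERANGE when the buffer is too small).
 */
int
uid_with_buflen(const char *name, size_t buflen, uid_t *uid)
{
	struct passwd pwd, *result = NULL;
	char *buf;
	int rc;

	assert(name != NULL && uid != NULL && buflen > 0);
	buf = malloc(buflen);
	if (buf == NULL)
		return (ENOMEM);
	rc = getpwnam_r(name, &pwd, buf, buflen, &result);
	if (rc == 0 && result == NULL)
		rc = ENOENT;		/* lookup worked, user does not exist */
	if (rc == 0)
		*uid = result->pw_uid;
	free(buf);
	return (rc);
}

/* Retry with a doubled buffer on ERANGE instead of hard-coding a size. */
int
uid_with_growing_buf(const char *name, uid_t *uid)
{
	size_t buflen;
	int rc = ERANGE;

	for (buflen = 128; buflen <= ((size_t)1 << 20); buflen *= 2) {
		rc = uid_with_buflen(name, buflen, uid);
		if (rc != ERANGE)
			break;
	}
	return (rc);
}

#ifdef UID_LOOKUP_DEMO
int
main(int argc, char **argv)
{
	uid_t uid;
	int i, rc;

	for (i = 1; i < argc; i++) {
		/* First show what a fixed 128-byte buffer does. */
		rc = uid_with_buflen(argv[i], 128, &uid);
		printf("%s: 128-byte buffer: %s\n", argv[i],
		    rc == 0 ? "ok" : strerror(rc));
		rc = uid_with_growing_buf(argv[i], &uid);
		if (rc == 0)
			printf("%s: uid %lu\n", argv[i], (unsigned long)uid);
		else
			printf("%s: %s\n", argv[i], strerror(rc));
	}
	return (0);
}
#endif
```

Built with -DUID_LOOKUP_DEMO and run with usernames as arguments, it
shows whether 128 bytes is enough for each name on that machine.
Growing the buffer on ERANGE avoids guessing a size at all, at the cost
of a few extra lookups for long entries.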

rick

> Thank you,
> Momchil

From owner-freebsd-fs@FreeBSD.ORG  Thu Feb 21 00:19:22 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id C0C1E1E8
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 00:19:22 +0000 (UTC)
 (envelope-from grarpamp@gmail.com)
Received: from mail-ve0-f169.google.com (mail-ve0-f169.google.com
 [209.85.128.169]) by mx1.freebsd.org (Postfix) with ESMTP id 86D7378D
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 00:19:22 +0000 (UTC)
Received: by mail-ve0-f169.google.com with SMTP id 15so7638032vea.0
 for <freebsd-fs@freebsd.org>; Wed, 20 Feb 2013 16:19:16 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:x-received:in-reply-to:references:date:message-id
 :subject:from:to:content-type;
 bh=dUfPOYubUiyl+ilqHJYQ9xGlvBUm11Qw0LWBIYxIjv4=;
 b=XsD1Ru1fW6PHkJ7axwLHNZu+lF1wpfJ7F+jX7nyXtMtOThGU45UsE8g6a58RgtrVqV
 sEzR1ASKak1azmbQ9hsVVQ9DDF25aoWG4GwP298aL0hChaFLjvWe6PzCQs5m9Rv6Mqg6
 GBPO7P5AcOXxs0l+3NJyndb6xHrxuLnOuBUi2FywGL7KPHKm2+l2EZf0DB2VsPql5d6E
 xW6zh18qAM+nMBJ9scs7XR1awRyKySTEXImHiK84qyenpSwFA7MubWd+PxxnTJDXTVW8
 bw1Yji9U5dVBmMhGygvBPFgecKfqC/F9fJqtTje8OWiyUg0TeYcVpdsAxRu3BDXPqciH
 pg5g==
MIME-Version: 1.0
X-Received: by 10.52.22.194 with SMTP id g2mr25371345vdf.91.1361405956612;
 Wed, 20 Feb 2013 16:19:16 -0800 (PST)
Received: by 10.220.219.79 with HTTP; Wed, 20 Feb 2013 16:19:16 -0800 (PST)
In-Reply-To: <20130215171144.710bf9af@fabiankeil.de>
References: <CAD2Ti2_9i3rj5763UCjzxRw_7+qDky1MNRJzdvOZnmdYpLUfYQ@mail.gmail.com>
 <CA+tpaK3wA4kNVzLfE9EDaR1SGLN7_t-N-9Sw+vcsdeUKX4EFoA@mail.gmail.com>
 <CAD2Ti28ZKhrf3Yo06ooL9NXChtSEgvE-Rv_FngY1nfnVwUZ3YQ@mail.gmail.com>
 <20130215171144.710bf9af@fabiankeil.de>
Date: Wed, 20 Feb 2013 19:19:16 -0500
Message-ID: <CAD2Ti294qNyveJUgt=KLV9Acq+PGGK6LAtmNYfSAq2oCrkj6Rw@mail.gmail.com>
Subject: Re: Crazy ZFS ZIL options: md(4) umass(4)
From: grarpamp <grarpamp@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Feb 2013 00:19:22 -0000

Still digesting this thread in free time.
There are some articles too...

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
http://www.slideshare.net/relling/zfs-tutorial-lisa-2011
http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained
http://dtrace.org/blogs/brendan/2009/06/26/slog-screenshots/
https://espix.net/~wildcat/txt/zfs-fragmentation.txt
http://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/
http://www.techforce.com.br/news/layout/set/print/linux_blog/zfs_part_4_sustained_random_small_files_sync_write_iops

Whatever happened to the old ISA ExpandedDRAM drives?
Today, bus based internal boards given mobo support of lots of ram
don't seem to make too much sense. But there has to be a cheap
SATA interface version of these things... a drive tray where you can
just stuff it with DIMMs and a battery.
Cheap as in, am I missing an entire class of $20-$50 devices
here? That's all they should cost in parts (minus ram), yet all I
see are $multikilo 'enterprise' stuff. If that's really the case one
could make them from China.

I can't see burning up an SSD (cost) for non-enterprise use.
I'll test with USB to expose failure modes. Will probably end up with
RAMZIL/syncdisable or adding a 10k spindle pair.

From owner-freebsd-fs@FreeBSD.ORG  Thu Feb 21 01:42:08 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 99DDE6A6
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 01:42:08 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 62FE9A5F
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 01:42:07 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqIEADN7JVGDaFvO/2dsb2JhbAArGhaGM7oQgRtzgh8BAQEDAQEBASArIAsFFhgCAg0ZAikBCSYGCAcEARkDBIdrBgwtrUySN4EjjBQWgQ00B4ItgRMDiGaLDoI4gR2PPoMlT4EFNQ
X-IronPort-AV: E=Sophos;i="4.84,705,1355115600"; d="scan'208";a="17555604"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-jnhn.mail.uoguelph.ca with ESMTP; 20 Feb 2013 20:42:01 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id DB320B4032;
 Wed, 20 Feb 2013 20:42:01 -0500 (EST)
Date: Wed, 20 Feb 2013 20:42:01 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: grarpamp <grarpamp@gmail.com>
Message-ID: <127974626.3169918.1361410921874.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CAD2Ti294qNyveJUgt=KLV9Acq+PGGK6LAtmNYfSAq2oCrkj6Rw@mail.gmail.com>
Subject: Re: Crazy ZFS ZIL options: md(4) umass(4)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Feb 2013 01:42:08 -0000

grarpamp wrote:
> Still digesting this thread in free time.
> There are some articles too...
> 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> http://www.slideshare.net/relling/zfs-tutorial-lisa-2011
> http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained
> http://dtrace.org/blogs/brendan/2009/06/26/slog-screenshots/
> https://espix.net/~wildcat/txt/zfs-fragmentation.txt
> http://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/
> http://www.techforce.com.br/news/layout/set/print/linux_blog/zfs_part_4_sustained_random_small_files_sync_write_iops
> 
> Whatever happened to the old ISA ExpandedDRAM drives?
> Today, bus based internal boards given mobo support of lots of ram
> don't seem to make too much sense. But there has to be a cheap
> SATA interface version of these things... a drive tray where you can
> just stuff it with DIMMs and a battery.
> Cheap as in, am I missing an entire class of $20-$50 devices
> here? That's all they should cost in parts (minus ram), yet all I
> see are $multikilo 'enterprise' stuff. If that's really the case one
> could make them from China.
> 
Someone posted mentioning this one. ($337 isn't $20-$50, but...):
http://www.acard.com/english/fb01-product.jsp?idno_no=382&prod_no=ANS-9010BA&type1_idno=5&ino=28

> I can't see burning up an SSD (cost) for non-enterprise use.
> I'll test with USB to expose failure modes. Will probably end up with
> RAMZIL/syncdisable or adding a 10k spindle pair.
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG  Thu Feb 21 12:11:13 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id CD66112B6
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 12:11:13 +0000 (UTC)
 (envelope-from grarpamp@gmail.com)
Received: from mail-oa0-f44.google.com (mail-oa0-f44.google.com
 [209.85.219.44]) by mx1.freebsd.org (Postfix) with ESMTP id A2BAF26B
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 12:11:13 +0000 (UTC)
Received: by mail-oa0-f44.google.com with SMTP id h1so9168215oag.17
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 04:11:07 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:x-received:date:message-id:subject:from:to
 :content-type; bh=yLnbQxW8i3pwtNfamuDq6MxUQj/wYfP7gBt3Pxh56Y0=;
 b=HFD2br+lZ0KZl8brxvYUj1KGdrnTLNfrUukrY2w6dzU+JOIEs7hBS5cncH0eoSTqbz
 Fv3Uw1l7ioxwhos/+0ONhvQ3+xTFbeBH0JNCOn49FM3I5E/2iRNcgYBDV7u4nv6CDIQl
 S8mKJItA+lBKkGAtkYsvBiXgqnV8y2Vc1x1T1GSsh1nO7TtnDzZXuXTHQe6Nhgb8MH9V
 Ggyja8zPkgFSIaaGRHpubddLJag4A4Oy2Kp4Cvad49Jclyhqu3xF1xIcVrnIIkh2PUJK
 pSUSWnkiNx00/3BAQkF9OAh67vdwT+qRaWPnXjuwXn4lxrZe6EDVdlhXoAr7H/DlX29Y
 A6Qw==
MIME-Version: 1.0
X-Received: by 10.60.11.35 with SMTP id n3mr7035118oeb.90.1361442112804; Thu,
 21 Feb 2013 02:21:52 -0800 (PST)
Received: by 10.60.146.203 with HTTP; Thu, 21 Feb 2013 02:21:52 -0800 (PST)
Date: Thu, 21 Feb 2013 05:21:52 -0500
Message-ID: <CAD2Ti2_1eKYVy-hqq8MgpHUwybu_dyh=i+hEbJ7bMH5ORaUJmA@mail.gmail.com>
Subject: Crazy ZFS ZIL options: md(4) umass(4) NAND SATA PCI
From: grarpamp <grarpamp@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Feb 2013 12:11:13 -0000

> Someone posted mentioning this one. ($337 isn't $20-$50, but...):
> http://www.acard.com/english/fb01-product.jsp?idno_no=382&prod_no=ANS-9010BA&type1_idno=5&ino=28

It's 32GiB, 20K iops (8KiB?), 14 minute CF dump, 4hr life, but only specs at
200MiB/s over sata2 (a few drives in jbod can match that sustained xfer rate).
They make a dual port version at 16GiB and 200MiB/s per port (400MiB
host striped).
Its DDR2 RAM is nearly obsolete and at $80/4GiB is almost 4x more than DDR3.
Optional 64GiB CF card $85 (or use non-ecc ram emulate and a 32GiB CF card $40).
Optional whatever 12vdc source you want to rig to it.
So to fill it out you're looking at $295 dev + $640 ram ...  $1100 stoked.
And the ANS-9010B is 48GiB for $250.

There's old Gigabyte gc-ramdisk i-ram, 4GiB, sata1, DDR1, battery ... ~$100

STEC ZeusRAM 8GiB, DDR3, SLC, SAS2, $2500-$3000
It's about as hot as the price.

Bus based...

You can get a PCI-e 4GiB, 60 second SLC dump, supercaps, from ddrdrive.com
for $2000, and they might even let you write a FreeBSD driver for it.
Its price, size and flexibility are not that hot, performance maybe
40K iops (4KiB), on par with the above.

www.fusionio.com/products/iodrive-octal/

NAND...
SLC 20GiB ish ... $150
MLC 60GiB ish ... $75

There's still room for a cheap DDR3+SATA3 unit.
Or just fill out your motherboard slots and add a UPS :)

From owner-freebsd-fs@FreeBSD.ORG  Thu Feb 21 12:33:16 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 582393BA
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 12:33:16 +0000 (UTC)
 (envelope-from ronald-freebsd8@klop.yi.org)
Received: from cpsmtpb-ews06.kpnxchange.com (cpsmtpb-ews06.kpnxchange.com
 [213.75.39.9]) by mx1.freebsd.org (Postfix) with ESMTP id BCC8CAAA
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 12:33:15 +0000 (UTC)
Received: from cpsps-ews21.kpnxchange.com ([10.94.84.187]) by
 cpsmtpb-ews06.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); 
 Thu, 21 Feb 2013 12:38:46 +0100
Received: from CPSMTPM-TLF103.kpnxchange.com ([195.121.3.6]) by
 cpsps-ews21.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); 
 Thu, 21 Feb 2013 12:38:46 +0100
Received: from sjakie.klop.ws ([212.182.167.131]) by
 CPSMTPM-TLF103.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); 
 Thu, 21 Feb 2013 12:40:07 +0100
Received: from 212-182-167-131.ip.telfort.nl (localhost [127.0.0.1])
 by sjakie.klop.ws (Postfix) with ESMTP id 7001EA988
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 12:40:07 +0100 (CET)
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
To: freebsd-fs@freebsd.org
Subject: Re: Some filesystem thoughts
References: <mailman.15.1361361601.62143.freebsd-fs@freebsd.org>
 <51252372.1040001@o2.pl>
Date: Thu, 21 Feb 2013 12:40:07 +0100
MIME-Version: 1.0
Content-Transfer-Encoding: Quoted-Printable
From: "Ronald Klop" <ronald-freebsd8@klop.yi.org>
Message-ID: <op.wsutc5a68527sy@212-182-167-131.ip.telfort.nl>
In-Reply-To: <51252372.1040001@o2.pl>
User-Agent: Opera Mail/12.14 (FreeBSD)
X-OriginalArrivalTime: 21 Feb 2013 11:40:07.0783 (UTC)
 FILETIME=[32D43F70:01CE1028]
X-RcptDomain: freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Feb 2013 12:33:16 -0000

On Wed, 20 Feb 2013 20:26:42 +0100, Radio młodych bandytów
<radiomlodychbandytow@o2.pl> wrote:

> Hello,
> I'm a pretty fresh Unix user, suffering from productivity loss caused by
> changing OS. I dearly miss a couple of facilities that I had implemented
> in my file manager (Total Commander) and sought a file manager that
> could replace them. I'm pretty sure there's none. I found that Unix does
> some of them at the OS level and that's a superior way. But with others
> it doesn't; some file managers implement them by themselves, much like
> TC (not well enough, but that's another rant), but I think that for
> them, OS is the right place too and that's what I'd like to talk about.
> I come with a free idea that I think would be awesome to have
> implemented, while not being sure it even can be implemented sensibly
> within Unix. Maybe I miss something? Maybe the idea ain't good? Maybe
> there are things that do the job well enough already and I just miss
> them?
>
> Anyway, here's the story:
>
> Total Commander's filesystem plugins are awesome. They enable users to
> manage remote / virtual resources just like remote filesystems. FTP,
> websites, process list, calendar; the variety is rich.
> In Unix, there are equivalents for some of them; the ones that mattered
> the most for me can be usually simulated by mount.
> And that's a better way because when mounted, they can be used by any
> program, not just file manager. I'm sure that all people here are used
> to enjoying the benefits of this approach, though for me they are novel.
>
> The other thing - packer plugins. They allow treating archives like
> directories. Again, there are many useful ones, some obvious (like
> zips), some not so much. I treated my executables as directories, which
> enabled me to easily manipulate resources stored inside. Especially
> useful when hacking closed-source Delphi programs as they contain lots
> of GUI code stored directly (The name 'TNASTYNAGSCREEN' will stay in my
> mind for long). Or extracting icons. Or doing many other things that are
> necessary to play with closed source code, but less relevant in Unix.
> There was a steganography plugin storing data inside images. A plugin
> for generation and browsing of file lists. A Java decompiler. And a
> great variety of others.
>
> Unix file managers offer similar, though not so rich options. Yet I
> think it's not their job. Like with mounting, there's great benefit from
> being able to use standard tools with them.
> Some write things like zipfs, but I think it's wrong.
> First, typing a command is cumbersome. Second, even if it was automated,
> mounting needs a mount point. The only good one is the file itself;
> working with a dozen (or thousand) of archives in a single directory is
> a norm for me. Switching dirs back and forth would be very disruptive.
> Breaks relative paths. And so on.
>
> The way I see it is not to treat files as streams of bytes. That's not
> what they are, files have meanings and there are tools that bring them
> out. A picture is a stored emotion. OK, there are no tools for that yet.
> But it is also an array of pixels. And a container with exif data. And
> may be a container with an encrypted archive. And, a stream of bytes too.
> They have multiple facets.
> I think that it would be useful to somehow expose them to applications.
> Wouldn't it be useful to be able to grep through pdfs in your email
> attachments?
> Mass-edit music tags with sed? Manually edit with your favourite text
> editor instead of the sucky one-liner provided by your favourite music
> player?
> How about video players being able to play videos by reading them in
> decoded form directly from the filesystem instead of having to integrate
> a significant number of complex libraries to provide sufficient format
> coverage?

Creative ideas.
Part of what you want is in fusefs (mounting of files to edit their
content). And part is implemented in e.g. KDE (integrated support for
various file types in fulltext search and tagging of files/metadata, etc.).
The chances of having all these complex libraries integrated in the
FreeBSD OS are close to zero I presume. But I am not in a position to
decide about that.
I think you can't expect the OS to serve everybody's detailed wishes. The
OS serves files and user programs know what to do with them.

Ronald.

From owner-freebsd-fs@FreeBSD.ORG  Thu Feb 21 16:18:56 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 593EFD3
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 16:18:56 +0000 (UTC)
 (envelope-from momchil@xaxo.eu)
Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66])
 by mx1.freebsd.org (Postfix) with ESMTP id DB75422B
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 16:18:55 +0000 (UTC)
Received: from vps2.xaxo.eu (localhost [127.0.0.1])
 by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1LGIroG093454;
 Thu, 21 Feb 2013 17:18:53 +0100 (CET) (envelope-from momchil@xaxo.eu)
Received: (from www@localhost)
 by vps2.xaxo.eu (8.14.4/8.14.4/Submit) id r1LGIr6f093453;
 Thu, 21 Feb 2013 17:18:53 +0100 (CET) (envelope-from momchil@xaxo.eu)
X-Authentication-Warning: vps2.xaxo.eu: www set sender to momchil@xaxo.eu
 using -f
Received: from 139.18.9.22 (SquirrelMail authenticated user space)
 by webmail.xaxo.eu with HTTP; Thu, 21 Feb 2013 17:18:53 +0100
Message-ID: <d112e84c5a294f5e009e8eac4eb0cf19.squirrel@webmail.xaxo.eu>
Date: Thu, 21 Feb 2013 17:18:53 +0100
Subject: Re: NFS + Kerberos
From: "Momchil Ivanov" <momchil@xaxo.eu>
To: "Rick Macklem" <rmacklem@uoguelph.ca>
User-Agent: SquirrelMail/1.4.21
MIME-Version: 1.0
Content-Type: text/plain;charset=utf-8
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Feb 2013 16:18:56 -0000

On Thu, February 21, 2013 12:10 am, Rick Macklem wrote:
> I would have thought kerberos was rebuilt for make buildworld. If you
> use heimdal from somewhere else (ports or their distro), I don't think
> that needs to be rebuilt, since I don't think the ..pname_to_uid()
> function is a part of a generic heimdal distribution, but I am not sure.
>
> Be sure to change buf[128] --> buf[1024] in both:
> kerberos5/lib/libgssapi_krb5/pname_to_uid.c
> usr.sbin/gssd/gssd.c
>
> (Or paths close to that. I might not have remembered them quite
> correctly;-)

this change allows for yet another entry in the kdc log:

2013-02-21T17:03:43 TGS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for nfs/srv.example.local@EXAMPLE.LOCAL
2013-02-21T17:03:44 TGS-REQ authtime: 2013-02-21T17:02:03 starttime: 2013-02-21T17:03:43 endtime: 2013-02-22T03:02:00 renew till: unset
2013-02-21T17:03:44 sending 612 bytes to IPv4:X.X.X.X

which seems promising, but I still get:

$ mount -t nfs -o nfsv4,sec=krb5i srv.example.local:/ /mnt/srv
  mount_nfs: can't update /var/db/mounttab for srv.example.local:/ nfsv4
err=10016
  mount_nfs: /mnt/srv, : Input/output error

do you happen to have any other ideas?

Thank you,
Momchil


From owner-freebsd-fs@FreeBSD.ORG  Thu Feb 21 19:07:12 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 55A00347;
 Thu, 21 Feb 2013 19:07:12 +0000 (UTC)
 (envelope-from mavbsd@gmail.com)
Received: from mail-bk0-f53.google.com (mail-bk0-f53.google.com
 [209.85.214.53]) by mx1.freebsd.org (Postfix) with ESMTP id B8FF4220;
 Thu, 21 Feb 2013 19:07:11 +0000 (UTC)
Received: by mail-bk0-f53.google.com with SMTP id j10so4240626bkw.40
 for <multiple recipients>; Thu, 21 Feb 2013 11:07:05 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=x-received:sender:message-id:date:from:user-agent:mime-version:to
 :cc:subject:references:in-reply-to:content-type
 :content-transfer-encoding;
 bh=bEA4zuirgVUuOV99wU7y7zIaXL8UT8/6iXIjaQFp9Yk=;
 b=Bzeamx+Pl52jSjLjffP2m7YluD3Tx7vBrEX2PDU9grJtqpT3sofIAaQC3dVOsYqyPr
 6wNudlyyA61KRlYUw3jJ1TytnVOTfuaSxs4G9MaLsts3aYcTNgIrcVOaz0ludoAL5bT/
 yCyk5/2WSTPEB/1KzbkNBWJ+UAlAgL0cWWiQyTUbTQ2vRRwJ2fY95YLFfMrCH7aI389O
 AWU0MQc36Ktsaqwy2v9jjtQklJktKszrobSNsaMLv0vYmbziaI6yhvqT9m3/A/LmsOm+
 aZJa/bhw28kod6Mqq8DdFLp86fQM9HGu67F6Wqlnn/ZfveYy53NgOVxla4O3sHwPzSfC
 TOlg==
X-Received: by 10.204.8.16 with SMTP id f16mr11376400bkf.81.1361473625153;
 Thu, 21 Feb 2013 11:07:05 -0800 (PST)
Received: from mavbook.mavhome.dp.ua (mavhome.mavhome.dp.ua. [213.227.240.37])
 by mx.google.com with ESMTPS id
 go8sm24841835bkc.20.2013.02.21.11.07.02
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Thu, 21 Feb 2013 11:07:03 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <51267055.3040500@FreeBSD.org>
Date: Thu, 21 Feb 2013 21:07:01 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:17.0) Gecko/20130125 Thunderbird/17.0.2
MIME-Version: 1.0
To: Jeremy Chadwick <jdc@koitsu.org>
Subject: Re: disk "flipped" - a known problem?
References: <20130121221617.GA23909@icarus.home.lan>
 <50FED818.7070704@FreeBSD.org> <20130125083619.GA51096@icarus.home.lan>
 <20130125211232.GA3037@icarus.home.lan>
 <20130125212559.GA1772@icarus.home.lan>
 <20130125213209.GA1858@icarus.home.lan>
 <20130126011754.GA1806@icarus.home.lan>
In-Reply-To: <20130126011754.GA1806@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Feb 2013 19:07:12 -0000

On 26.01.2013 03:17, Jeremy Chadwick wrote:
> Okay, I've figured out the exact, 100% reproducible condition that
> causes the situation.  It took me a lot of tries and a digital pocket
> recorder to take verbal notes (there are just too many things to look at
> simultaneously), but I've figured it out.
> 
> I'm sorry for the verbosity, but it's necessary.
> 
> Assume the disk we're talking about is /dev/ada5.
> 
> 1. Prior to any issues, we have this:
> 
> root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> crw-r-----  1 root  operator  0x8c Jan 25 16:41 /dev/ada5
> crw-------  1 root  operator  0x75 Jan 25 16:35 /dev/pass5
> crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> 
> 2. ada5 begins experiencing issues -- ATA commands (CDBs) submitted do not
> get a response (not going to discuss how/why that can happen).
> 
> 3. These types of messages are seen on console (naturally the CDB and
> request type will vary -- in this case it was because I was doing the dd
> zero'ing, thus tickling the bad sector/naughty firmware on the drive):
> 
> Jan 25 16:29:28 icarus kernel: ahcich5: Timeout on slot 0 port 0
> Jan 25 16:29:28 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017
> Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset...
> Jan 25 16:29:28 icarus kernel: ahcich5: SATA connect time=1000us status=00000113
> Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset: device found
> Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED.  ACB: 61 80 80 77 01 40 00 00 00 00 00 00
> Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
> Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command
> 
> 4. Any I/O submitted to ada5 during this time blocks (this is normal).
> 
> 5. **While this situation is happening**, something using xpt(4)
> attempts to submit a CDB to the disk (ex. smartctl -a /dev/ada5).
> This request also blocks (again, normal).
> 
> 6. Physical device falls off bus, or CAM kicks the disk off the bus.
> Doesn't matter which.  We see messages resembling this (boy am I tired
> of this interspersed output problem):
> 
> Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): lost device
> Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): lost device
> Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry
> Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): passdevgonecb: devfs entry is gone
> 
> 7. Standard I/O requests fail with errno=6 "Device not configured".
> xpt(4) requests also fail with the same errno.
> 
> 8. Device-wise, at this stage all we have is:
> 
> root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> 
> 9. Device comes back online for whatever reason.  FreeBSD sees the disk,
> blah blah blah:
> 
> Jan 25 16:30:16 icarus kernel: GEOM: new disk ada5
> Jan 25 16:30:16 icarus kernel: ada5: <WDC WD1500ADFD-00NLR4 21.07QR4> ATA-7 SATA 1.x device
> Jan 25 16:30:16 icarus kernel: ada5: Serial Number WD-WMAP41573589
> Jan 25 16:30:16 icarus kernel: ada5: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> Jan 25 16:30:16 icarus kernel: ada5: Command Queueing enabled
> Jan 25 16:30:16 icarus kernel: ada5: 143089MB (293046768 512 byte sectors: 16H 63S/T 16383C)
> Jan 25 16:30:16 icarus kernel: ada5: Previously was known as ad14
> 
> ...um, where's pass5?
> 
> 10. /dev/pass5 is now completely (permanently) missing:
> 
> root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> crw-r-----  1 root  operator  0x99 Jan 25 16:42 /dev/ada5
> crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> 
> 11. Any further attempts to communicate via xpt(4) with ada5 fail.
> Detaching and reattaching the disk does not fix the issue; the only fix
> is to reboot the system.
> 
> 12. "camcontrol debug -IPXp scbus5" results in tons and tons of output
> all pertaining to xpt(4).  It looks like xpt(4) is in some kind of
> loop.
> 
> Below is my verbose boot (with non-kernel things removed), which
> also includes "camcontrol debug" output once things are in a bad state:
> 
> http://jdc.koitsu.org/freebsd/xpt_oddity.log
> 
> In this log you'll see that after 1 CAM timeout I yanked the drive, then
> roughly 30 seconds later reinserted it.
> 
> If you need me to turn on CAM debugging *prior* to the above, I can do
> that, just let me know.
> 
> The important step is #5.  Without that, the problem shown in #9/10/11
> does not happen.
> 
> It's a good thing I don't run smartd(8) -- most users I see using that
> software set the interval to something like 180s or 60s.  Imagine this
> frustration: "okay so the disk fell off the bus, but what, now I can't
> talk to it with SMART?  Uhhh... <reboots>  Err, works now?  Whatever".

I think the problem may already be fixed in HEAD by ken@'s r244014.
I've just merged it to 9-STABLE at r247115. So if it is still possible
to reproduce the situation, it would be good to retest with that change.
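
For retesting, a rough way to spot the bad state from step 10 of the report
is to look for a disk that has lost its passN peripheral. The sketch below
is only an illustration in Python: it assumes the bad state shows up in
"camcontrol devlist" output as an entry such as "(ada5)" with no passN next
to it, which this thread does not confirm. The function name is made up for
the example.

```python
import re

# Matches the trailing "(ada5,pass5)" peripheral group of a devlist line.
PERIPHS = re.compile(r"\(([^()]*)\)\s*$")


def devices_missing_pass(devlist_output):
    """Return the peripheral-name lists of entries that have no passN device.

    Assumption: each line ends with a parenthesised, comma-separated list of
    peripheral names, as `camcontrol devlist` normally prints.
    """
    missing = []
    for line in devlist_output.splitlines():
        match = PERIPHS.search(line)
        if match is None:
            continue  # not a device line
        names = [n.strip() for n in match.group(1).split(",") if n.strip()]
        if names and not any(n.startswith("pass") for n in names):
            missing.append(names)
    return missing
```

Feed it the text captured from "camcontrol devlist" after the drive has been
pulled and reinserted; an empty list means every listed device still has a
pass peripheral.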

-- 
Alexander Motin

From owner-freebsd-fs@FreeBSD.ORG  Thu Feb 21 23:18:04 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 57307BFE
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 23:18:04 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 14CAA3D6
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 23:18:03 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqAEANepJlGDaFvO/2dsb2JhbABFhk63bYJYgRxzgh8BAQQBIwRSBRYYAgINGQJZBogfBq0ckhKBI403NAeCLYETA4hpjU2QXoMlggk
X-IronPort-AV: E=Sophos;i="4.84,711,1355115600"; d="scan'208";a="17707622"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-jnhn.mail.uoguelph.ca with ESMTP; 21 Feb 2013 18:17:56 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 9D232B3F0D;
 Thu, 21 Feb 2013 18:17:56 -0500 (EST)
Date: Thu, 21 Feb 2013 18:17:56 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Momchil Ivanov <momchil@xaxo.eu>
Message-ID: <496437657.3199038.1361488676628.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <d112e84c5a294f5e009e8eac4eb0cf19.squirrel@webmail.xaxo.eu>
Subject: Re: NFS + Kerberos
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Feb 2013 23:18:04 -0000

Momchil Ivanov wrote:
> On Thu, February 21, 2013 12:10 am, Rick Macklem wrote:
> > I would have thought kerberos was rebuilt for make buildworld. If you
> > use heimdal from somewhere else (ports or their distro), I don't think
> > that needs to be rebuilt, since I don't think the ..pname_to_uid()
> > function is a part of a generic heimdal distribution, but I am not sure.
> >
> > Be sure to change buf[128] --> buf[1024] in both:
> > kerberos5/lib/libgssapi_krb5/pname_to_uid.c
> > usr.sbin/gssd/gssd.c
> >
> > (Or paths close to that. I might not have remembered them quite
> > correctly;-)
> 
> this change allows for yet another entry in the kdc log:
> 
> 2013-02-21T17:03:43 TGS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for nfs/srv.example.local@EXAMPLE.LOCAL
> 2013-02-21T17:03:44 TGS-REQ authtime: 2013-02-21T17:02:03 starttime: 2013-02-21T17:03:43 endtime: 2013-02-22T03:02:00 renew till: unset
> 2013-02-21T17:03:44 sending 612 bytes to IPv4:X.X.X.X
> 
> which seems promising, but I still get:
> 
> $ mount -t nfs -o nfsv4,sec=krb5i srv.example.local:/ /mnt/srv
> mount_nfs: can't update /var/db/mounttab for srv.example.local:/ nfsv4
> err=10016
> mount_nfs: /mnt/srv, : Input/output error
> 
Error 10016 is NFS4ERR_WRONGSEC. This means that the server expects a
different security flavour (sys maybe) at some point in the mount.

I can't remember if you posted your /etc/exports file before, but
I suspect the file system referred to by the root specified in the V4:
line isn't allowing krb5i. For example, if you wanted to mount the
file system rooted at /home by the above, you would need the following
2 lines in /etc/exports.

/home -sec=krb5i <host-or-network>
V4: /home -sec=krb5i

You can list other security flavours for -sec, but krb5i needs to be
one of them.

rick
ps: Don't worry about the "can't update /var/db/mounttab". It is
    basically harmless and can be fixed by allowing the user doing
    the mount write access to it. If you don't do that, then the
    mount will still work ok, it will just generate the message.

> do you happen to have any other ideas?
> 
> Thank you,
> Momchil

From owner-freebsd-fs@FreeBSD.ORG  Thu Feb 21 23:36:11 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id EB0DE34D
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 23:36:11 +0000 (UTC)
 (envelope-from jdc@koitsu.org)
Received: from qmta14.emeryville.ca.mail.comcast.net
 (qmta14.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:44:76:96:27:212])
 by mx1.freebsd.org (Postfix) with ESMTP id CF45D6D6
 for <freebsd-fs@freebsd.org>; Thu, 21 Feb 2013 23:36:11 +0000 (UTC)
Received: from omta03.emeryville.ca.mail.comcast.net ([76.96.30.27])
 by qmta14.emeryville.ca.mail.comcast.net with comcast
 id 39gl1l0050b6N64AEBcBSU; Thu, 21 Feb 2013 23:36:11 +0000
Received: from koitsu.strangled.net ([67.180.84.87])
 by omta03.emeryville.ca.mail.comcast.net with comcast
 id 3Bc91l00a1t3BNj8PBc90d; Thu, 21 Feb 2013 23:36:11 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
 id 90BB373A1C; Thu, 21 Feb 2013 15:36:09 -0800 (PST)
Date: Thu, 21 Feb 2013 15:36:09 -0800
From: Jeremy Chadwick <jdc@koitsu.org>
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: disk "flipped" - a known problem?
Message-ID: <20130221233609.GA92249@icarus.home.lan>
References: <20130121221617.GA23909@icarus.home.lan>
 <50FED818.7070704@FreeBSD.org>
 <20130125083619.GA51096@icarus.home.lan>
 <20130125211232.GA3037@icarus.home.lan>
 <20130125212559.GA1772@icarus.home.lan>
 <20130125213209.GA1858@icarus.home.lan>
 <20130126011754.GA1806@icarus.home.lan>
 <51267055.3040500@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <51267055.3040500@FreeBSD.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
 s=q20121106; t=1361489771;
 bh=/tdPUkBiQoJZ+fmmIV8Uv6IkEU4GUkzKoe4gUAwxsKo=;
 h=Received:Received:Received:Date:From:To:Subject:Message-ID:
 MIME-Version:Content-Type;
 b=h3ygpcrcN6qvQmmVCiUrQt1zBZFS72vqx958L1Kyoo9kAadASwosUPyG7Gt9GNj4L
 HU3eFXFLwR19lkKTBjKmRGNmbJuXmfzo+GIb2QBFsiDV3Ie8VVgIj+KvBzbVidNMZ9
 F5sBAIMI+xw7I3fh0Xi2yOv7osibwJX+5MmMBa8CFICIsLimPx53BxTQj/A66upPLM
 mEvJ2h2w3ByEW9GBlsqaC9tWw9QbP+SPVZv7tBWZz0Do8jIRSAP8WEq6noZnDQKVlT
 YmniiTkWZzE3eJdrIizTXlt8mB0RpDz1UnbJQsBaW+2Op1NAt2wdg04bVgnmDbUPao
 S9WWHUW27J4QA==
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Feb 2013 23:36:12 -0000

On Thu, Feb 21, 2013 at 09:07:01PM +0200, Alexander Motin wrote:
> On 26.01.2013 03:17, Jeremy Chadwick wrote:
> > Okay, I've figured out the exact, 100% reproducible condition that
> > causes the situation.  It took me a lot of tries and a digital pocket
> > recorder to take verbal notes (there are just too many things to look at
> > simultaneously), but I've figured it out.
> > 
> > I'm sorry for the verbosity, but it's necessary.
> > 
> > Assume the disk we're talking about is /dev/ada5.
> > 
> > 1. Prior to any issues, we have this:
> > 
> > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> > crw-r-----  1 root  operator  0x8c Jan 25 16:41 /dev/ada5
> > crw-------  1 root  operator  0x75 Jan 25 16:35 /dev/pass5
> > crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> > 
> > 2. ada5 begins experiencing issues -- ATA commands (CDBs) submitted do not
> > get a response (not going to discuss how/why that can happen).
> > 
> > 3. These types of messages are seen on console (naturally the CDB and
> > request type will vary -- in this case it was because I was doing the dd
> > zero'ing, thus tickling the bad sector/naughty firmware on the drive):
> > 
> > Jan 25 16:29:28 icarus kernel: ahcich5: Timeout on slot 0 port 0
> > Jan 25 16:29:28 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017
> > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset...
> > Jan 25 16:29:28 icarus kernel: ahcich5: SATA connect time=1000us status=00000113
> > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset: device found
> > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED.  ACB: 61 80 80 77 01 40 00 00 00 00 00 00
> > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
> > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command
> > 
> > 4. Any I/O submitted to ada5 during this time blocks (this is normal).
> > 
> > 5. **While this situation is happening**, something using xpt(4)
> > attempts to submit a CDB to the disk (ex. smartctl -a /dev/ada5).
> > This request also blocks (again, normal).
> > 
> > 6. Physical device falls off bus, or CAM kicks the disk off the bus.
> > Doesn't matter which.  We see messages resembling this (boy am I tired
> > of this interspersed output problem):
> > 
> > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): lost device
> > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): lost device
> > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry
> > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): passdevgonecb: devfs entry is gone
> > 
> > 7. Standard I/O requests fail with errno=6 "Device not configured".
> > xpt(4) requests also fail with the same errno.
> > 
> > 8. Device-wise, at this stage all we have is:
> > 
> > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> > crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> > 
> > 9. Device comes back online for whatever reason.  FreeBSD sees the disk,
> > blah blah blah:
> > 
> > Jan 25 16:30:16 icarus kernel: GEOM: new disk ada5
> > Jan 25 16:30:16 icarus kernel: ada5: <WDC WD1500ADFD-00NLR4 21.07QR4> ATA-7 SATA 1.x device
> > Jan 25 16:30:16 icarus kernel: ada5: Serial Number WD-WMAP41573589
> > Jan 25 16:30:16 icarus kernel: ada5: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> > Jan 25 16:30:16 icarus kernel: ada5: Command Queueing enabled
> > Jan 25 16:30:16 icarus kernel: ada5: 143089MB (293046768 512 byte sectors: 16H 63S/T 16383C)
> > Jan 25 16:30:16 icarus kernel: ada5: Previously was known as ad14
> > 
> > ...um, where's pass5?
> > 
> > 10. /dev/pass5 is now completely (permanently) missing:
> > 
> > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> > crw-r-----  1 root  operator  0x99 Jan 25 16:42 /dev/ada5
> > crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> > 
> > 11. Any further attempts to communicate via xpt(4) with ada5 fail.
> > Detaching and reattaching the disk does not fix the issue; the only fix
> > is to reboot the system.
> > 
> > 12. "camcontrol debug -IPXp scbus5" results in tons and tons of output
> > all pertaining to xpt(4).  It looks like xpt(4) is in some kind of
> > loop.
> > 
> > Below is my verbose boot (with non-kernel things removed), which
> > also includes "camcontrol debug" output once things are in a bad state:
> > 
> > http://jdc.koitsu.org/freebsd/xpt_oddity.log
> > 
> > In this log you'll see that after 1 CAM timeout I yanked the drive, then
> > roughly 30 seconds later reinserted it.
> > 
> > If you need me to turn on CAM debugging *prior* to the above, I can do
> > that, just let me know.
> > 
> > The important step is #5.  Without that, the problem shown in #9/10/11
> > does not happen.
> > 
> > It's a good thing I don't run smartd(8) -- most users I see using that
> > software set the interval to something like 180s or 60s.  Imagine this
> > frustration: "okay so the disk fell off the bus, but what, now I can't
> > talk to it with SMART?  Uhhh... <reboots>  Err, works now?  Whatever".
> 
> I think, the problem may already be fixed in HEAD by r244014 by ken@.
> I've just merged it to 9-STABLE at r247115. So if it is still possible
> to reproduce the situation, it would be good to try.

Yep, I saw the commit per svn-src-stable-9@freebsd.org, along with
a bunch of others; I wasn't sure if r247114 or r247115 fixed it, so I was
waiting for a follow-up from you.  :-)

I'll rebuild world/kernel and try it out + report back.  Thank you (and
ken@ too!) for the work on this.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG  Fri Feb 22 01:03:11 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 8E2F98E6
 for <freebsd-fs@freebsd.org>; Fri, 22 Feb 2013 01:03:11 +0000 (UTC)
 (envelope-from momchil@xaxo.eu)
Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66])
 by mx1.freebsd.org (Postfix) with ESMTP id 2A3E8A3D
 for <freebsd-fs@freebsd.org>; Fri, 22 Feb 2013 01:03:10 +0000 (UTC)
Received: from t61.xaxo.eu ([10.75.23.6])
 by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1M132BT098425;
 Fri, 22 Feb 2013 02:03:02 +0100 (CET) (envelope-from momchil@xaxo.eu)
Date: Fri, 22 Feb 2013 02:02:53 +0100
Message-ID: <86ip5lkvnm.wl%momchil@xaxo.eu>
From: Momchil Ivanov <momchil@xaxo.eu>
To: Rick Macklem <rmacklem@uoguelph.ca>
Subject: Re: NFS + Kerberos
In-Reply-To: <496437657.3199038.1361488676628.JavaMail.root@erie.cs.uoguelph.ca>
References: <d112e84c5a294f5e009e8eac4eb0cf19.squirrel@webmail.xaxo.eu>
 <496437657.3199038.1361488676628.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
Cc: freebsd-fs@freebsd.org, Momchil Ivanov <momchil@xaxo.eu>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Feb 2013 01:03:11 -0000

At Thu, 21 Feb 2013 18:17:56 -0500 (EST),
Rick Macklem wrote:
> Error 10016 is NFS4ERR_WRONGSEC. This means that the server expects a
> different security flavour (sys maybe) at some point in the mount.

btw you have a typo, it's NFSERR_WRONGSEC. The problem is that I think
it would be hard for me to find the piece of code that issues it in my
case, so that I can understand why. Unfortunately, I am not familiar
with NFS and the kernel internals... and since there are a number of
places where it can be generated [1] and the machine that I am using
as a NFS server, is rather slow in compiling world... it would be hard
for me to instrument the code...

> I can't remember if you posted your /etc/exports file before, but
> I suspect the file system referred to by the root specified in the V4:
> line isn't allowing krb5i. For example, if you wanted to mount the
> file system rooted at /home by the above, you would need the following
> 2 lines in /etc/exports.
> 
> /home -sec=krb5i <host-or-network>
> V4: /home -sec=krb5i

here is my /etc/exports:

V4: /tank/storage -sec=krb5i:krb5p
/tank/storage -sec=krb5i:krb5p

> You can list other security flavours for -sec, but krb5i needs to be
> one of them.
> 
> rick
> ps: Don't worry about the "can't update /var/db/mounttab". It is
>     basically harmless and can be fixed by allowing the user doing
>     the mount write access to it. If you don't do that, then the
>     mount will still work ok, it will just generate the message.

I know this :)

btw I have Kerberos working with sshd on the same machine, so I think
I have managed to set it up correctly... but the NFS server doesn't
want to work with Kerberos.. the changes you suggested were in the
right direction, since I can now see TGS-REQ lines in the KDC log, but
there might still be some bugs here, or I am doing something wrong...

Ideas are welcomed :) I would be happy to get it working.

1: http://fxr.watson.org/fxr/ident?v=FREEBSD9;i=NFSERR_WRONGSEC

Thank you,
Momchil

From owner-freebsd-fs@FreeBSD.ORG  Fri Feb 22 02:46:01 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 4D798EAE
 for <freebsd-fs@freebsd.org>; Fri, 22 Feb 2013 02:46:01 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id EC1BFEA2
 for <freebsd-fs@freebsd.org>; Fri, 22 Feb 2013 02:46:00 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqAEAIXbJlGDaFvO/2dsb2JhbABFhk66S4Efc4IfAQEEASNWBRYYAgINBQETAlkGiB8GDK0YkhuBI4wwgQc0BxIBghqBEwOIaY1NkF6DJYFMAQcXHg
X-IronPort-AV: E=Sophos;i="4.84,713,1355115600"; d="scan'208";a="17724606"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-jnhn.mail.uoguelph.ca with ESMTP; 21 Feb 2013 21:45:59 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 9378DB3F13;
 Thu, 21 Feb 2013 21:45:59 -0500 (EST)
Date: Thu, 21 Feb 2013 21:45:59 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Momchil Ivanov <momchil@xaxo.eu>
Message-ID: <1845485841.3202259.1361501159585.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <86ip5lkvnm.wl%momchil@xaxo.eu>
Subject: Re: NFS + Kerberos
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Feb 2013 02:46:01 -0000

Momchil Ivanov wrote:
> At Thu, 21 Feb 2013 18:17:56 -0500 (EST),
> Rick Macklem wrote:
> > Error 10016 is NFS4ERR_WRONGSEC. This means that the server expects
> > a
> > different security flavour (sys maybe) at some point in the mount.
> 
> btw you have a typo, it's NFSERR_WRONGSEC.
Actually, it's called NFS4ERR_WRONGSEC in the RFC and NFSERR_WRONGSEC in
the NFS sources, just to try and confuse you;-)
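
To make the numbering concrete, here is a minimal sketch of a lookup
between the numeric status and its RFC name. The codes are the ones
assigned in RFC 3530 and only a handful are listed. The struct and
function names (nfs4_status_name, nfs4_status_to_name,
nfs4_name_to_status) are made up for illustration; they are not part
of the FreeBSD NFS sources, which spell the same errors with the
NFSERR_ prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One numeric NFSv4 status and its RFC 3530 name (hypothetical type). */
struct nfs4_status_name {
	int code;
	const char *name;
};

/* Small subset of the RFC 3530 status codes, enough for the example. */
static const struct nfs4_status_name nfs4_status_names[] = {
	{ 0,     "NFS4_OK" },
	{ 13,    "NFS4ERR_ACCESS" },
	{ 10008, "NFS4ERR_DELAY" },
	{ 10013, "NFS4ERR_GRACE" },
	{ 10016, "NFS4ERR_WRONGSEC" },
};

#define NFS4_NNAMES \
	(sizeof(nfs4_status_names) / sizeof(nfs4_status_names[0]))

/* Return the RFC name for a status code, or NULL if it is not listed. */
const char *
nfs4_status_to_name(int code)
{
	size_t i;

	for (i = 0; i < NFS4_NNAMES; i++)
		if (nfs4_status_names[i].code == code)
			return (nfs4_status_names[i].name);
	return (NULL);
}

/* Return the status code for an RFC name, or -1 if it is not listed. */
int
nfs4_name_to_status(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < NFS4_NNAMES; i++)
		if (strcmp(nfs4_status_names[i].name, name) == 0)
			return (nfs4_status_names[i].code);
	return (-1);
}
```

With this table, the err=10016 printed by mount_nfs resolves to
NFS4ERR_WRONGSEC, while the source spelling NFSERR_WRONGSEC is
deliberately not found, since the table holds only the RFC names.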

> The problem is that I think
> it would be hard for me to find the piece of code that issues it in my
> case, so that I can understand why. Unfortunately, I am not familiar
> with NFS and the kernel internals... and since there are a number of
> places where it can be generated [1] and the machine that I am using
> as a NFS server, is rather slow in compiling world... it would be hard
> for me to instrument the code...
> 
> > I can't remember if you posted your /etc/exports file before, but
> > I suspect the file system referred to by the root specified in the V4:
> > line isn't allowing krb5i. For example, if you wanted to mount the
> > file system rooted at /home by the above, you would need the
> > following
> > 2 lines in /etc/exports.
> >
> > /home -sec=krb5i <host-or-network>
> > V4: /home -sec=krb5i
> 
> here is my /etc/exports:
> 
> V4: /tank/storage -sec=krb5i:krb5p
> /tank/storage -sec=krb5i:krb5p
> 
Just as an experiment, you could try adding "sys" to the -sec list
for both lines. If the mount works then, it would tell you that the
client isn't successfully getting a Kerberos credential and is
falling back to using "sys" (called AUTH_SYS in the RFCs, just for
further confusion;-).

> > You can list other security flavours for -sec, but krb5i needs to be
> > one of them.
> >
> > rick
> > ps: Don't worry about the "can't update /var/db/mounttab". It is
> >     basically harmless and can be fixed by allowing the user doing
> >     the mount write access to it. If you don't do that, then the
> >     mount will still work ok, it will just generate the message.
> 
> I know this :)
> 
> btw I have Kerberos working with sshd on the same machine, so I think
> I have managed to set it up correctly... but the NFS server doesn't
> want to work with Kerberos.. the changes you suggested were in the
> right direction, since I can now see TGS-REQ lines in the KDC log, but
> there might still be some bugs here, or I am doing something wrong...
> 
> Ideas are welcomed :) I would be happy to get it working.
> 
Check to see what the user's credential cache file is called.
If you "ls -l /tmp" you should be able to find it.

If it isn't called /tmp/krb5cc_<uid>, where <uid> is the uid for
the user, then you will need the recent patch applied to the gssd.c
that adds a "-s" option to search for the credential cache file in a list of
directories. This patch is in head as r244604 and stable/9 as
r245089, but not in any release. (Some sshds generate separate
credential cache files for each login session, although not the
default one in the system, as far as I understand.)
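
If you would rather check the name programmatically than eyeball
"ls -l /tmp", something like the sketch below would do. It only builds
the /tmp/krb5cc_<uid> name mentioned above and tests whether a regular
file owned by that uid exists there. The function names
(krb5cc_default_path, krb5cc_present) are invented for this example,
and the ownership test is my own assumption about what a sensible
check looks like; it is not a copy of what gssd does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Build "/tmp/krb5cc_<uid>" into buf.
 * Returns 0 on success, -1 if buf is too small or formatting failed.
 */
int
krb5cc_default_path(char *buf, size_t len, uid_t uid)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/tmp/krb5cc_%lu", (unsigned long)uid);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (0);
}

/*
 * Return 1 if path names a regular file owned by uid, 0 otherwise.
 * (The ownership test is an assumption made for this sketch.)
 */
int
krb5cc_present(const char *path, uid_t uid)
{
	struct stat sb;

	assert(path != NULL);
	if (stat(path, &sb) != 0)
		return (0);
	return (S_ISREG(sb.st_mode) && sb.st_uid == uid);
}
```

Call krb5cc_default_path() with getuid() for the user doing the mount,
then krb5cc_present() on the result. A 0 from the second call suggests
the cache lives under another name and the "-s" patch is needed.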

rick

> 1: http://fxr.watson.org/fxr/ident?v=FREEBSD9;i=NFSERR_WRONGSEC
> 
> Thank you,
> Momchil

From owner-freebsd-fs@FreeBSD.ORG  Fri Feb 22 02:47:26 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 0A192F3C
 for <freebsd-fs@freebsd.org>; Fri, 22 Feb 2013 02:47:26 +0000 (UTC)
 (envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
 by mx1.freebsd.org (Postfix) with ESMTP id C3601EBB
 for <freebsd-fs@freebsd.org>; Fri, 22 Feb 2013 02:47:25 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
 [65.66.246.65])
 by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id r1M2lHQu010153;
 Thu, 21 Feb 2013 20:47:17 -0600 (CST)
Date: Thu, 21 Feb 2013 20:47:17 -0600 (CST)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Kevin Day <toasty@dragondata.com>
Subject: Re: Improving ZFS performance for large directories
In-Reply-To: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
Message-ID: <alpine.GSO.2.01.1302212046040.11141@freddy.simplesystems.org>
References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com>
 <CAJjvXiE+8OMu_yvdRAsWugH7W=fhFW7bicOLLyjEn8YrgvCwiw@mail.gmail.com>
 <F4420A8C-FB92-4771-B261-6C47A736CF7F@dragondata.com>
 <20130201192416.GA76461@server.rulingia.com>
 <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
 (blade.simplesystems.org [65.66.246.90]);
 Thu, 21 Feb 2013 20:47:17 -0600 (CST)
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Feb 2013 02:47:26 -0000

On Tue, 19 Feb 2013, Kevin Day wrote:

> Sorry for the late followup, I've been doing some testing with an L2ARC device.
>
>>> Doing it twice back-to-back makes a bit of difference but it's still slow either way.
>>
>> ZFS can be very conservative about caching data and twice might not be enough.
>> I suggest you try 8-10 times, or until the time stops reducing.
>
> Timing doing an "ls" in large directories 20 times, the first is the slowest, then all subsequent listings are roughly the same. There doesn't appear to be any gain after 20 repetitions

You might consider that the bottleneck might be in 'ls' or something
outside of zfs.  Make sure that you are doing 'ls -f' or else you are
just measuring its sorting performance.

On a Solaris 10 system in a zfs directory with a million files:

% time ls -f |wc -l
  1000002
/bin/ls -F -f  0.76s user 0.93s system 89% cpu 1.897 total
wc -l  0.08s user 0.02s system 5% cpu 1.697 total

% time ls |wc -l
  1000000
/bin/ls -F  4.32s user 8.10s system 97% cpu 12.682 total
wc -l  0.08s user 0.02s system 0% cpu 12.432 total
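
To take ls out of the picture entirely, the two cases can be sketched
directly against the directory API. The first function walks the
directory in whatever order the file system returns entries, which is
roughly what 'ls -f' does (and, like 'ls -f', it counts "." and "..",
hence the 1000002 above). The second reads every name and sorts it
before counting, which is the extra work a plain 'ls' has to do. The
function names are mine, and this is only an illustration of the
difference, not a benchmark.

```c
#define _POSIX_C_SOURCE 200809L	/* for scandir() and alphasort() */

#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

/* Count entries in directory order, without sorting (like 'ls -f'). */
long
count_entries_unsorted(const char *path)
{
	DIR *d;
	long n = 0;

	assert(path != NULL);
	d = opendir(path);
	if (d == NULL)
		return (-1);
	while (readdir(d) != NULL)
		n++;
	closedir(d);
	return (n);
}

/* Read all names, sort them, then count: the extra cost of sorting. */
long
count_entries_sorted(const char *path)
{
	struct dirent **names;
	int i, n;

	assert(path != NULL);
	n = scandir(path, &names, NULL, alphasort);
	if (n < 0)
		return (-1);
	for (i = 0; i < n; i++)
		free(names[i]);
	free(names);
	return (n);
}
```

Timing each function on the large directory should show whether the
time goes into reading the directory or into sorting the names.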

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Fri Feb 22 08:38:39 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 27471153
 for <fs@freebsd.org>; Fri, 22 Feb 2013 08:38:39 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 by mx1.freebsd.org (Postfix) with ESMTP id AB456F01
 for <fs@freebsd.org>; Fri, 22 Feb 2013 08:38:38 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1M8cV8b062737;
 Fri, 22 Feb 2013 10:38:31 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1M8cV8b062737
Received: (from kostik@localhost)
 by tom.home (8.14.6/8.14.6/Submit) id r1M8cVrg062736;
 Fri, 22 Feb 2013 10:38:31 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Fri, 22 Feb 2013 10:38:31 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: cleaning files beyond EOF
Message-ID: <20130222083831.GK2598@kib.kiev.ua>
References: <20130217113031.N9271@besplex.bde.org>
 <20130217055528.GB2522@kib.kiev.ua>
 <20130217172928.C1900@besplex.bde.org>
 <20130217074832.GA2598@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="qr7nXUVd9Lj/wfVJ"
Content-Disposition: inline
In-Reply-To: <20130217074832.GA2598@kib.kiev.ua>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Feb 2013 08:38:39 -0000


--qr7nXUVd9Lj/wfVJ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Feb 17, 2013 at 09:48:32AM +0200, Konstantin Belousov wrote:
> But the ffs_getpages() might be indeed the culprit. It calls
> vm_page_zero_invalid(), which only has DEV_BSIZE granularity. I think
> that ffs_getpages() also should zero the after eof part of the last page
> of the file to fix your damage, since device read cannot read less than
> DEV_BSIZE.
>

Here is the updated patch. It fixes the bug that miscalculated the
size passed to pmap_zero_page_area().

diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
index 5c99d5b..08508a4 100644
--- a/sys/ufs/ffs/ffs_vnops.c
+++ b/sys/ufs/ffs/ffs_vnops.c
@@ -829,9 +829,9 @@ static int
 ffs_getpages(ap)
 	struct vop_getpages_args *ap;
 {
-	int i;
 	vm_page_t mreq;
-	int pcount;
+	uint64_t size;
+	int i, isize, pcount;
 
 	pcount = round_page(ap->a_count) / PAGE_SIZE;
 	mreq = ap->a_m[ap->a_reqpage];
@@ -846,6 +846,11 @@ ffs_getpages(ap)
 	if (mreq->valid) {
 		if (mreq->valid != VM_PAGE_BITS_ALL)
 			vm_page_zero_invalid(mreq, TRUE);
+		size = VTOI(ap->a_vp)->i_size;
+		if (mreq->pindex == OFF_TO_IDX(size)) {
+			isize = size & PAGE_MASK;
+			pmap_zero_page_area(mreq, isize, PAGE_SIZE - isize);
+		}
 		for (i = 0; i < pcount; i++) {
 			if (i != ap->a_reqpage) {
 				vm_page_lock(ap->a_m[i]);
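
The arithmetic in the added hunk can be tried out in userland with the
sketch below. It is not kernel code: the page is a plain byte buffer,
memset() stands in for pmap_zero_page_area(), the page size is assumed
to be 4096, and all sketch_* names are invented for the illustration.
The point is only to show which bytes get cleared for a given i_size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u			/* assumed page size */
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1)

/* Index of the page containing byte offset off (mirrors OFF_TO_IDX). */
uint64_t
sketch_off_to_idx(uint64_t off)
{
	return (off / SKETCH_PAGE_SIZE);
}

/*
 * If page number pindex is the one that contains EOF, zero everything
 * in it from the EOF offset to the end of the page.
 * Returns the number of bytes zeroed (0 if the page does not hold EOF).
 */
size_t
sketch_zero_after_eof(unsigned char *page, uint64_t pindex,
    uint64_t file_size)
{
	size_t isize;

	assert(page != NULL);
	if (pindex != sketch_off_to_idx(file_size))
		return (0);
	isize = (size_t)(file_size & SKETCH_PAGE_MASK);
	memset(page + isize, 0, SKETCH_PAGE_SIZE - isize);
	return (SKETCH_PAGE_SIZE - isize);
}
```

For a file of 2 * 4096 + 100 bytes, page 2 keeps its first 100 bytes
and has the remaining 3996 cleared, while pages 0 and 1 are untouched.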

--qr7nXUVd9Lj/wfVJ
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iQIcBAEBAgAGBQJRJy6GAAoJEJDCuSvBvK1BniAP/0OEx4cd92cKW7Q7yEvco7tZ
2TIgSAHHh9WHH2z1R3dWhhL0PleHd45JLEja5dVJ+NvTqcN8yrDGPwocHYIMSDaY
1ZsdQ47WI/fGar7z50j3CjG6lmLf3vlQunrY6sDPK4CNYgVzL/Zgvl8Mh+3kBNwF
OWyR9sXFdaAZlB3vhStpNmAR95HrQgwyop6BlYOwgKvl3y7Lk9w5vwNbOdJiA37t
aZm8ehSV/DMCFmot4N/Bo5iqRuX6Af7Jz4XsOuZ6IylAY29wAgbgzCGJ+ZkMFKYk
SqhNs5UfPpTawY6YPRCeUchqXh+uFZoGSIBheNx061jUvMeMnf2DvoQSdHPjh93t
+bXHvfXC95d0orAf+y3TsUamL/iMx8k3HnlKf4QYP7j2hDiRqIxtAV8+ueukkwlW
WVenDp2fIe9MH+EMkgetOjjZlopKNU5sfaeJDEaDo5ybFKm6EZad9YEapqGHSpeI
TgWtgOX3ETkc4Cn+U+xtBoUNEQv1YQD6TQ7gfMHsI7Y1rYyplX0PgZr0Sj7cRvJc
vKYtfWNTqiwlun02a63FEChiMdOwmZVan5XoaScjV4tcORef3+173R4HE41JRcPH
b8HNFWGdWTyp/9Jd7iw3FeaqJ8ojcsDQXSo+tbDGyIr5Q6VAm1abE1UfEVNl8Wyr
oQnnJR6DU8tNeEi59Urx
=q/e/
-----END PGP SIGNATURE-----

--qr7nXUVd9Lj/wfVJ--

From owner-freebsd-fs@FreeBSD.ORG  Fri Feb 22 18:43:51 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 0F0E54BD
 for <freebsd-fs@freebsd.org>; Fri, 22 Feb 2013 18:43:51 +0000 (UTC)
 (envelope-from momchil@xaxo.eu)
Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66])
 by mx1.freebsd.org (Postfix) with ESMTP id 79D641FB
 for <freebsd-fs@freebsd.org>; Fri, 22 Feb 2013 18:43:49 +0000 (UTC)
Received: from t61.xaxo.eu ([10.75.23.6])
 by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1MIhlvj001957;
 Fri, 22 Feb 2013 19:43:48 +0100 (CET) (envelope-from momchil@xaxo.eu)
Date: Fri, 22 Feb 2013 19:43:39 +0100
Message-ID: <86txp4gpes.wl%momchil@xaxo.eu>
From: Momchil Ivanov <momchil@xaxo.eu>
To: Rick Macklem <rmacklem@uoguelph.ca>
Subject: Re: NFS + Kerberos
In-Reply-To: <1845485841.3202259.1361501159585.JavaMail.root@erie.cs.uoguelph.ca>
References: <86ip5lkvnm.wl%momchil@xaxo.eu>
 <1845485841.3202259.1361501159585.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
Cc: freebsd-fs@freebsd.org, Momchil Ivanov <momchil@xaxo.eu>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Feb 2013 18:43:51 -0000

At Thu, 21 Feb 2013 21:45:59 -0500 (EST),
Rick Macklem wrote:
> 
> Momchil Ivanov wrote:
> > At Thu, 21 Feb 2013 18:17:56 -0500 (EST),
> > Rick Macklem wrote:
> > > Error 10016 is NFS4ERR_WRONGSEC. This means that the server expects
> > > a
> > > different security flavour (sys maybe) at some point in the mount.
> > 
> > btw you have a typo, it's NFSERR_WRONGSEC.
> Actually, it's called NFS4ERR_WRONGSEC in the RFC and NFSERR_WRONGSEC in
> the NFS sources, just to try and confuse you;-)

ok :)

> Just as an experiment, you could try adding "sys" to the -sec list
> for both lines. If the mount works then, it would tell you that the
> client isn't successfully getting a Kerberos credential and is
> falling back to using "sys" (called AUTH_SYS in the RFCs, just for
> further confusion;-).

I can mount with the following /etc/exports file:

V4: /tank/storage -sec=sys:krb5i:krb5p
/tank/storage -sec=sys:krb5i:krb5p

and the command:

mount -t nfs -o nfsv4,sec=sys srv.example.local:/ /mnt/srv

and without a kerberos ticket I can also mount with:

mount -t nfs -o nfsv4,sec=krb5i srv.example.local:/ /mnt/srv
mount -t nfs -o nfsv4,sec=krb5p srv.example.local:/ /mnt/srv

so it falls back to sys...

...

> Check to see what the user's credential cache file is called.
> If you "ls -l /tmp" you should be able to find it.
> 
> If it isn't called /tmp/krb5cc_<uid>, where <uid> is the uid for
> the user, then you will need the recent patch applied to the gssd.c
> that adds a "-s" option to search for the credential cache file in a list of
> directories. This patch is in head as r244604 and stable/9 as
> r245089, but not in any release. (Some sshds generate separate
> credential cache files for each login session, although not the
> default one in the system, as far as I understand.)

on the client machine with FreeBSD 8.2-STABLE as of around Dec 2011,
the file exists and is /tmp/krb5cc_1001, where 1001 is the uid of the
user that I am using to mount the nfs file system.

I have also tried to mount the file system from the server (FreeBSD
9.1) on the server itself using the same commands, I do get the
nfs/srv.example.local@EXAMPLE.LOCAL ticket, but it dies with the same
error:

nfsv4 err=10016
mount_nfs: /mnt/srv, : Input/output error

is there some way I can get verbose output from nfsd or gssd that
tells me why it is failing, or do you have any other ideas :) ?

Thank you,
Momchil

From owner-freebsd-fs@FreeBSD.ORG  Sat Feb 23 00:04:24 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id E2BE4500
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 00:04:24 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id ADAFD2FF
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 00:04:24 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqEEAOUFKFGDaFvO/2dsb2JhbABEhk64FYJagSJzgh8BAQQBI1YFFhgCAg0FARMCWQaIHwase5IggSOMMASBAzQHEgGCGoETA4hojVKQY4MlgUwBBxce
X-IronPort-AV: E=Sophos;i="4.84,719,1355115600"; d="scan'208";a="15393134"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-annu.net.uoguelph.ca with ESMTP; 22 Feb 2013 19:04:23 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 352CBB41DC;
 Fri, 22 Feb 2013 19:04:23 -0500 (EST)
Date: Fri, 22 Feb 2013 19:04:23 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Momchil Ivanov <momchil@xaxo.eu>
Message-ID: <1103491143.3229700.1361577863159.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <86txp4gpes.wl%momchil@xaxo.eu>
Subject: Re: NFS + Kerberos
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, Elias Martenson <lokedhs@gmail.com>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Feb 2013 00:04:24 -0000

Momchil Ivanov wrote:
> At Thu, 21 Feb 2013 21:45:59 -0500 (EST),
> Rick Macklem wrote:
> >
> > Momchil Ivanov wrote:
> > > At Thu, 21 Feb 2013 18:17:56 -0500 (EST),
> > > Rick Macklem wrote:
> > > > Error 10016 is NFS4ERR_WRONGSEC. This means that the server
> > > > expects
> > > > a
> > > > different security flavour (sys maybe) at some point in the
> > > > mount.
> > >
> > > btw you have a typo, it's NFSERR_WRONGSEC.
> > Actually, it's called NFS4ERR_WRONGSEC in the RFC and
> > NFSERR_WRONGSEC in
> > the NFS sources, just to try and confuse you;-)
> 
> ok :)
> 
> > Just as an experiment, you could try adding "sys" to the -sec list
> > for both lines. If the mount works then, it would tell you that the
> > client isn't successfully getting a Kerberos credential and is
> > falling back to using "sys" (called AUTH_SYS in the RFCs, just for
> > further confusion;-).
> 
> I can mount with the following /etc/exports file:
> 
> V4: /tank/storage -sec=sys:krb5i:krb5p
> /tank/storage -sec=sys:krb5i:krb5p
> 
> and the command:
> 
> mount -t nfs -o nfsv4,sec=sys srv.example.local:/ /mnt/srv
> 
> and without a kerberos ticket I can also mount with:
> 
> mount -t nfs -o nfsv4,sec=krb5i srv.example.local:/ /mnt/srv
> mount -t nfs -o nfsv4,sec=krb5p srv.example.local:/ /mnt/srv
> 
> so it falls back to sys...
> 
> ...
> 
> > Check to see what the user's credential cache file is called.
> > If you "ls -l /tmp" you should be able to find it.
> >
> > If it isn't called /tmp/krb5cc_<uid>, where <uid> is the uid for
> > the user, then you will need the recent patch applied to the gssd.c
> > that adds a "-s" option to search for the credential cache file in a
> > list of
> > directories. This patch is in head as r244604 and stable/9 as
> > r245089, but not in any release. (Some sshds generate separate
> > credential cache files for each login session, although not the
> > default one in the system, as far as I understand.)
> 
> on the client machine with FreeBSD 8.2-STABLE as of around Dec 2011,
> the file exists and is /tmp/krb5cc_1001, where 1001 is the uid of the
> user that I am using to mount the nfs file system.
> 
Ok, so you don't need the "-s" option for the gssd.
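
A quick way to repeat that check for any uid, as a sketch only: cc_path
and cc_check are hypothetical helper names, and the only thing assumed is
the default /tmp/krb5cc_<uid> naming discussed above.

```sh
#!/bin/sh
# Hypothetical helper: print the default credential cache path for a uid.
cc_path() {
    printf '/tmp/krb5cc_%s\n' "$1"
}

# Hypothetical helper: report whether that cache file exists.
# Returns 0 if the file is present, 1 if it is missing.
cc_check() {
    path=$(cc_path "$1")
    if [ -f "$path" ]; then
        echo "found $path"
    else
        echo "missing $path"
        return 1
    fi
}

# Check the cache of the user running the script.
cc_check "$(id -u)" || true
```

If the file is reported missing even though the user holds a ticket, the
cache is probably stored under a different name, which is the case the
"-s" option is meant for.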

> I have also tried to mount the file system from the server (FreeBSD
> 9.1) on the server itself using the same commands, I do get the
> nfs/srv.example.local@EXAMPLE.LOCAL ticket, but it dies with the same
> error:
> 
> nfsv4 err=10016
> mount_nfs: /mnt/srv, : Input/output error
> 
> is there some way I can get verbose output from nfsd or gssd that
> tells me why it is failing, or do you have any other ideas :) ?
> 
You can run "gssd -d -d" and it will run in foreground and print
out messages related to resource allocation. This isn't much use,
except to tell you that it is doing something. (Adding a "verbose"
option is on my "to do" list, but I don't have any code at this time.
If someone wants to do this, I think it would be great.)

If you do this, don't have it started at boot (gssd_enable="NO" in
/etc/rc.conf) and then do the above command as root in a window
before attempting the mount command.

Beyond that, you could add printfs to gssd.c. The main client side
function is gssd_init_sec_context(), which should get the Kerberos
ticket for a user via their TGT.

I've added Elias to the cc list, since he just went through this
and might be able to help.

rick

> Thank you,
> Momchil

From owner-freebsd-fs@FreeBSD.ORG  Sat Feb 23 03:57:10 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id A65CB649
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 03:57:10 +0000 (UTC)
 (envelope-from jdc@koitsu.org)
Received: from qmta03.emeryville.ca.mail.comcast.net
 (qmta03.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:32])
 by mx1.freebsd.org (Postfix) with ESMTP id 6D85FA29
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 03:57:10 +0000 (UTC)
Received: from omta21.emeryville.ca.mail.comcast.net ([76.96.30.88])
 by qmta03.emeryville.ca.mail.comcast.net with comcast
 id 3cG11l0021u4NiLA3fx9F3; Sat, 23 Feb 2013 03:57:09 +0000
Received: from koitsu.strangled.net ([67.180.84.87])
 by omta21.emeryville.ca.mail.comcast.net with comcast
 id 3fx81l00c1t3BNj8hfx9RQ; Sat, 23 Feb 2013 03:57:09 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
 id D8FB673A31; Fri, 22 Feb 2013 19:57:08 -0800 (PST)
Date: Fri, 22 Feb 2013 19:57:08 -0800
From: Jeremy Chadwick <jdc@koitsu.org>
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: disk "flipped" - a known problem?
Message-ID: <20130223035708.GA23614@icarus.home.lan>
References: <20130121221617.GA23909@icarus.home.lan>
 <50FED818.7070704@FreeBSD.org>
 <20130125083619.GA51096@icarus.home.lan>
 <20130125211232.GA3037@icarus.home.lan>
 <20130125212559.GA1772@icarus.home.lan>
 <20130125213209.GA1858@icarus.home.lan>
 <20130126011754.GA1806@icarus.home.lan>
 <51267055.3040500@FreeBSD.org>
 <20130221233609.GA92249@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130221233609.GA92249@icarus.home.lan>
User-Agent: Mutt/1.5.21 (2010-09-15)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
 s=q20121106; t=1361591829;
 bh=CXCEcgFQ1KOgXnPMZpxlAAIvGfe8VMQgvzd8wdtnrXU=;
 h=Received:Received:Received:Date:From:To:Subject:Message-ID:
 MIME-Version:Content-Type;
 b=MDvDZxv6U98jkGZVE5C70i74I4EH5tS9nUwGmzmYDv00uJ9vqcVpgkPon5H/BbF/X
 QwvLPL1anb6N6WrJGPi05CYRNMmEQCg8Glij3qmQBMGOIvAGDTLTqFszh7t29uXT47
 tzP9u0Ry3DOxvNuS5A3ejtk3sWLrKBkK2Ng/va7G+xTzGOkX8Y0leR6CJMY4UtdM4x
 UjC24+YBglX+cBkmBCWkq7CXWFzY29OTAm4zvub3eLEU9pI9ZuBhtnnFKfbagFNE6b
 TSIucXBNNaTn9VzpXLUjz6cFYkxARczAJTOIR+7rQBNJnrHLL1C1g0w0kmueZd/acI
 c9bYAb/da5iRA==
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Feb 2013 03:57:10 -0000

On Thu, Feb 21, 2013 at 03:36:09PM -0800, Jeremy Chadwick wrote:
> On Thu, Feb 21, 2013 at 09:07:01PM +0200, Alexander Motin wrote:
> > On 26.01.2013 03:17, Jeremy Chadwick wrote:
> > > Okay, I've figured out the exact, 100% reproducible condition that
> > > causes the situation.  It took me a lot of tries and a digital pocket
> > > recorder to take verbal notes (there are just too many things to look at
> > > simultaneously), but I've figured it out.
> > > 
> > > I'm sorry for the verbosity, but it's necessary.
> > > 
> > > Assume the disk we're talking about is /dev/ada5.
> > > 
> > > 1. Prior to any issues, we have this:
> > > 
> > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> > > crw-r-----  1 root  operator  0x8c Jan 25 16:41 /dev/ada5
> > > crw-------  1 root  operator  0x75 Jan 25 16:35 /dev/pass5
> > > crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> > > 
> > > 2. ada5 begins experiencing issues -- ATA commands (CDBs) submitted do
> > > not get a response (not going to discuss how/why that can happen).
> > > 
> > > 3. These types of messages are seen on console (naturally the CDB and
> > > request type will vary -- in this case it was because I was doing the dd
> > > zero'ing, thus tickling the bad sector/naughty firmware on the drive):
> > > 
> > > Jan 25 16:29:28 icarus kernel: ahcich5: Timeout on slot 0 port 0
> > > Jan 25 16:29:28 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017
> > > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset...
> > > Jan 25 16:29:28 icarus kernel: ahcich5: SATA connect time=1000us status=00000113
> > > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset: device found
> > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED.  ACB: 61 80 80 77 01 40 00 00 00 00 00 00
> > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
> > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command
> > > 
> > > 4. Any I/O submitted to ada5 during this time blocks (this is normal).
> > > 
> > > 5. **While this situation is happening**, something using xpt(4)
> > > attempts to submit a CDB to the disk (ex. smartctl -a /dev/ada5).
> > > This request also blocks (again, normal).
> > > 
> > > 6. Physical device falls off bus, or CAM kicks the disk off the bus.
> > > Doesn't matter which.  We see messages resembling this (boy am I tired
> > > of this interspersed output problem):
> > > 
> > > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): lost device
> > > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): lost device
> > > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry
> > > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): passdevgonecb: devfs entry is gone
> > > 
> > > 7. Standard I/O requests fail with errno=6 "Device not configured".
> > > xpt(4) requests also fail with the same errno.
> > > 
> > > 8. Device-wise, at this stage all we have is:
> > > 
> > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> > > crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> > > 
> > > 9. Device comes back online for whatever reason.  FreeBSD sees the disk,
> > > blah blah blah:
> > > 
> > > Jan 25 16:30:16 icarus kernel: GEOM: new disk ada5
> > > Jan 25 16:30:16 icarus kernel: ada5: <WDC WD1500ADFD-00NLR4 21.07QR4> ATA-7 SATA 1.x device
> > > Jan 25 16:30:16 icarus kernel: ada5: Serial Number WD-WMAP41573589
> > > Jan 25 16:30:16 icarus kernel: ada5: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> > > Jan 25 16:30:16 icarus kernel: ada5: Command Queueing enabled
> > > Jan 25 16:30:16 icarus kernel: ada5: 143089MB (293046768 512 byte sectors: 16H 63S/T 16383C)
> > > Jan 25 16:30:16 icarus kernel: ada5: Previously was known as ad14
> > > 
> > > ...um, where's pass5?
> > > 
> > > 10. /dev/pass5 is now completely (permanently) missing:
> > > 
> > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> > > crw-r-----  1 root  operator  0x99 Jan 25 16:42 /dev/ada5
> > > crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> > > 
> > > 11. Any further attempts to communicate via xpt(4) with ada5 fail.
> > > Detaching and reattaching the disk does not fix the issue; the only fix
> > > is to reboot the system.
> > > 
> > > 12. "camcontrol debug -IPXp scbus5" results in tons and tons of output
> > > all pertaining to xpt(4).  It looks like xpt(4) is in some kind of
> > > loop.
> > > 
> > > Below is my verbose boot (with non-kernel things removed), which
> > > also includes "camcontrol debug" output once things are in a bad state:
> > > 
> > > http://jdc.koitsu.org/freebsd/xpt_oddity.log
> > > 
> > > In this log you'll see that after 1 CAM timeout I yanked the drive, then
> > > roughly 30 seconds later reinserted it.
> > > 
> > > If you need me to turn on CAM debugging *prior* to the above, I can do
> > > that, just let me know.
> > > 
> > > The important step is #5.  Without that, the problem shown in #9/10/11
> > > does not happen.
> > > 
> > > It's a good thing I don't run smartd(8) -- most users I see using that
> > > software set the interval to something like 180s or 60s.  Imagine this
> > > frustration: "okay so the disk fell off the bus, but what, now I can't
> > > talk to it with SMART?  Uhhh... <reboots>  Err, works now?  Whatever".
> > 
> > I think, the problem may already be fixed in HEAD by r244014 by ken@.
> > I've just merged it to 9-STABLE at r247115. So if it is still possible
> > to reproduce the situation, it would be good to try.
> 
> Yep, I saw the commit per svn-src-stable-9@freebsd.org, along with
> a bunch of others; I wasn't sure if r247114 or r247115 fixed it, so I was
> waiting for a follow-up from you.  :-)
> 
> I'll rebuild world/kernel and try it out + report back.  Thank you (and
> ken@ too!) for the work on this.

Got around to this today -- I can confirm as of r247132 on stable/9 the
above problem is gone.  Verification details below, for those who care:

Initial attachment:

Feb 22 19:40:32 icarus kernel: ada5 at ahcich5 bus 0 scbus5 target 0 lun 0
Feb 22 19:40:32 icarus kernel: ada5: <ST3750630AS HP24> ATA-8 SATA 2.x device
Feb 22 19:40:32 icarus kernel: ada5: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
Feb 22 19:40:32 icarus kernel: ada5: Command Queueing enabled
Feb 22 19:40:32 icarus kernel: ada5: 715404MB (1465149168 512 byte sectors: 16H 63S/T 16383C)
Feb 22 19:40:32 icarus kernel: ada5: Previously was known as ad14

Ran dd if=/dev/zero of=/dev/ada5 bs=64k.  Timeouts occurred due to
physical issues with the disk itself:

Feb 22 19:44:01 icarus kernel: ahcich5: Timeout on slot 0 port 0
Feb 22 19:44:01 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017
Feb 22 19:44:01 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 00 0c 00 40 00 00 00 00 00 00
Feb 22 19:44:01 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
Feb 22 19:44:01 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command
Feb 22 19:44:33 icarus kernel: ahcich5: Timeout on slot 0 port 0
Feb 22 19:44:33 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017
Feb 22 19:44:33 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 00 18 00 40 00 00 00 00 00 00
Feb 22 19:44:33 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
Feb 22 19:44:33 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command

Initiated smartctl -a /dev/ada5, which blocked as expected.  Timeouts
still happening, and to speed up the process I yanked the disk:

Feb 22 19:45:32 icarus kernel: ahcich5: Timeout on slot 0 port 0
Feb 22 19:45:32 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017
Feb 22 19:45:32 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 80 25 00 40 00 00 00 00 00 00
Feb 22 19:45:32 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
Feb 22 19:45:32 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command
Feb 22 19:45:55 icarus kernel: (ada5:ahcich5:0:0:0): lost device
Feb 22 19:45:55 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry
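
To see at a glance which peripherals the kernel reported as lost, the log
can be filtered with a small sed sketch. The function name lost_devices is
made up, and the pattern assumes the "(name:bus...): lost device" message
layout shown in this thread; the sample input is the Jan 25 output quoted
earlier.

```sh
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the name
# of every peripheral that was reported with "lost device".
lost_devices() {
    sed -n 's/^.*kernel: (\([a-z]*[0-9]*\):[^)]*): lost device$/\1/p'
}

lost_devices <<'EOF'
Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): lost device
Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): lost device
Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry
EOF
# prints ada5 and pass5, one per line
```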

After yanking:

root@icarus:~ # smartctl -a /dev/ada5
smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

(pass5:ahcich5:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 01 00
(pass5:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
smartctl: cam_send_ccb: Device not configured

Checking devices:

root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
crw-------  1 root  operator  0x52 Feb 21 21:37 /dev/xpt0

Reinserted disk:

Feb 22 19:47:53 icarus kernel: ada5 at ahcich5 bus 0 scbus5 target 0 lun 0
Feb 22 19:47:53 icarus kernel: ada5: <ST3750630AS HP24> ATA-8 SATA 2.x device
Feb 22 19:47:53 icarus kernel: ada5: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
Feb 22 19:47:53 icarus kernel: ada5: Command Queueing enabled
Feb 22 19:47:53 icarus kernel: ada5: 715404MB (1465149168 512 byte sectors: 16H 63S/T 16383C)
Feb 22 19:47:53 icarus kernel: ada5: Previously was known as ad14

Devices look good:

root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
crw-r-----  1 root  operator  0x98 Feb 22 19:47 /dev/ada5
crw-------  1 root  operator  0x96 Feb 22 19:47 /dev/pass5
crw-------  1 root  operator  0x52 Feb 21 21:37 /dev/xpt0

And smartctl works fine.  :-)
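
The before/after comparison of device nodes can also be scripted. This is
only a sketch: check_nodes is a hypothetical helper, and the directory is
passed as an argument so it can be pointed at something other than /dev.

```sh
#!/bin/sh
# Hypothetical helper: report which of the named nodes exist in a directory.
# Usage: check_nodes DIR NAME...
# Returns 0 only if every named node is present.
check_nodes() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ -e "$dir/$name" ]; then
            echo "present $dir/$name"
        else
            echo "missing $dir/$name"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# The three nodes looked at in this thread.
check_nodes /dev ada5 pass5 xpt0 || echo "at least one node is missing"
```

With the old behaviour, pass5 would show up as missing after the disk came
back; with the fix it is reported as present again.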

(Footnote for readers: The previous WD disk I was testing with went
belly up in an even worse way so I couldn't use it, but thankfully I've
lots of bad disks that exhibit repeated timeouts during I/O.  *pats his
angry ST3750630AS*)

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG  Sat Feb 23 09:39:44 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 47488FCF;
 Sat, 23 Feb 2013 09:39:44 +0000 (UTC)
 (envelope-from utisoft@gmail.com)
Received: from mail-ia0-x232.google.com (mail-ia0-x232.google.com
 [IPv6:2607:f8b0:4001:c02::232])
 by mx1.freebsd.org (Postfix) with ESMTP id 07F25322;
 Sat, 23 Feb 2013 09:39:43 +0000 (UTC)
Received: by mail-ia0-f178.google.com with SMTP id y26so1218457iab.37
 for <multiple recipients>; Sat, 23 Feb 2013 01:39:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:x-received:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type;
 bh=nTYqMs4HMt4lkQehenOajxB6rIn7pOhOmw3BPqClYrA=;
 b=tgjUtVTtMg5SmOu3V76xx38zFVeTx/EuGxBYZbet/FuxTl06pgY0PS4FqG6eZy9OFm
 P8HAyl8uxQODJoUoZA+P6SN+DecezZia2fu93TrmFGOxKIr2ZaU4p6KVugL4/U0y8Y3I
 HT4XkyPu6X997C+9KuFQxl6mErck9bdnpX+Nn/8jfT+Y/sRSfl+VYOxfgIPlUrwlLgOf
 QKXJMRTyh60dtmnsx/f35oVOEptvyRmkK2/0pZA76x/K95vBH13bIUI0ORsqdiYPjeRw
 /GU4jGarLxoX9O9cnv6cNJkJzu7Bh0fOMZI3/kVf12M4FlXi+CvYYzKVSGx7jAc5Huwa
 T1xg==
MIME-Version: 1.0
X-Received: by 10.50.152.169 with SMTP id uz9mr677475igb.15.1361612383645;
 Sat, 23 Feb 2013 01:39:43 -0800 (PST)
Received: by 10.64.63.12 with HTTP; Sat, 23 Feb 2013 01:39:43 -0800 (PST)
In-Reply-To: <51267055.3040500@FreeBSD.org>
References: <20130121221617.GA23909@icarus.home.lan>
 <50FED818.7070704@FreeBSD.org>
 <20130125083619.GA51096@icarus.home.lan>
 <20130125211232.GA3037@icarus.home.lan>
 <20130125212559.GA1772@icarus.home.lan>
 <20130125213209.GA1858@icarus.home.lan>
 <20130126011754.GA1806@icarus.home.lan>
 <51267055.3040500@FreeBSD.org>
Date: Sat, 23 Feb 2013 09:39:43 +0000
Message-ID: <CADLo839OaZ-HfXW9HjKd0pTR66Sd4zzf8Y7ervZrb0-Oem4+cQ@mail.gmail.com>
Subject: Re: disk "flipped" - a known problem?
From: Chris Rees <utisoft@gmail.com>
To: mav@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: Jeremy Chadwick <jdc@koitsu.org>, freebsd-fs@freebsd.org,
 Andriy Gapon <avg@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Feb 2013 09:39:44 -0000

On 21 Feb 2013 19:07, "Alexander Motin" <mav@freebsd.org> wrote:
>
> On 26.01.2013 03:17, Jeremy Chadwick wrote:
> > Okay, I've figured out the exact, 100% reproducible condition that
> > causes the situation.  It took me a lot of tries and a digital pocket
> > recorder to take verbal notes (there are just too many things to look at
> > simultaneously), but I've figured it out.
> >
> > I'm sorry for the verbosity, but it's necessary.
> >
> > Assume the disk we're talking about is /dev/ada5.
> >
> > 1. Prior to any issues, we have this:
> >
> > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> > crw-r-----  1 root  operator  0x8c Jan 25 16:41 /dev/ada5
> > crw-------  1 root  operator  0x75 Jan 25 16:35 /dev/pass5
> > crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> >
> > 2. ada5 begins experiencing issues -- ATA commands (CDBs) submitted do
> > not get a response (not going to discuss how/why that can happen).
> >
> > 3. These types of messages are seen on console (naturally the CDB and
> > request type will vary -- in this case it was because I was doing the dd
> > zero'ing, thus tickling the bad sector/naughty firmware on the drive):
> >
> > Jan 25 16:29:28 icarus kernel: ahcich5: Timeout on slot 0 port 0
> > Jan 25 16:29:28 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017
> > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset...
> > Jan 25 16:29:28 icarus kernel: ahcich5: SATA connect time=1000us status=00000113
> > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset: device found
> > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED.  ACB: 61 80 80 77 01 40 00 00 00 00 00 00
> > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
> > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command
> >
> > 4. Any I/O submitted to ada5 during this time blocks (this is normal).
> >
> > 5. **While this situation is happening**, something using xpt(4)
> > attempts to submit a CDB to the disk (ex. smartctl -a /dev/ada5).
> > This request also blocks (again, normal).
> >
> > 6. Physical device falls off bus, or CAM kicks the disk off the bus.
> > Doesn't matter which.  We see messages resembling this (boy am I tired
> > of this interspersed output problem):
> >
> > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): lost device
> > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): lost device
> > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry
> > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): passdevgonecb: devfs entry is gone
> >
> > 7. Standard I/O requests fail with errno=6 "Device not configured".
> > xpt(4) requests also fail with the same errno.
> >
> > 8. Device-wise, at this stage all we have is:
> >
> > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> > crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> >
> > 9. Device comes back online for whatever reason.  FreeBSD sees the disk,
> > blah blah blah:
> >
> > Jan 25 16:30:16 icarus kernel: GEOM: new disk ada5
> > Jan 25 16:30:16 icarus kernel: ada5: <WDC WD1500ADFD-00NLR4 21.07QR4> ATA-7 SATA 1.x device
> > Jan 25 16:30:16 icarus kernel: ada5: Serial Number WD-WMAP41573589
> > Jan 25 16:30:16 icarus kernel: ada5: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> > Jan 25 16:30:16 icarus kernel: ada5: Command Queueing enabled
> > Jan 25 16:30:16 icarus kernel: ada5: 143089MB (293046768 512 byte sectors: 16H 63S/T 16383C)
> > Jan 25 16:30:16 icarus kernel: ada5: Previously was known as ad14
> >
> > ...um, where's pass5?
> >
> > 10. /dev/pass5 is now completely (permanently) missing:
> >
> > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5*
> > crw-r-----  1 root  operator  0x99 Jan 25 16:42 /dev/ada5
> > crw-------  1 root  operator  0x51 Jan 25 16:35 /dev/xpt0
> >
> > 11. Any further attempts to communicate via xpt(4) with ada5 fail.
> > Detaching and reattaching the disk does not fix the issue; the only fix
> > is to reboot the system.
> >
> > 12. "camcontrol debug -IPXp scbus5" results in tons and tons of output
> > all pertaining to xpt(4).  It looks like xpt(4) is in some kind of
> > loop.
> >
> > Below is my verbose boot (with non-kernel things removed), which
> > also includes "camcontrol debug" output once things are in a bad state:
> >
> > http://jdc.koitsu.org/freebsd/xpt_oddity.log
> >
> > In this log you'll see that after 1 CAM timeout I yanked the drive, then
> > roughly 30 seconds later reinserted it.
> >
> > If you need me to turn on CAM debugging *prior* to the above, I can do
> > that, just let me know.
> >
> > The important step is #5.  Without that, the problem shown in #9/10/11
> > does not happen.
> >
> > It's a good thing I don't run smartd(8) -- most users I see using that
> > software set the interval to something like 180s or 60s.  Imagine this
> > frustration: "okay so the disk fell off the bus, but what, now I can't
> > talk to it with SMART?  Uhhh... <reboots>  Err, works now?  Whatever".
>
> I think, the problem may already be fixed in HEAD by r244014 by ken@.
> I've just merged it to 9-STABLE at r247115. So if it is still possible
> to reproduce the situation, it would be good to try.

I think I've been having the same troubles since upgrading from 9.0, so I'm
going to try applying that to 9.1-R and I'll also give feedback.

Chris

From owner-freebsd-fs@FreeBSD.ORG  Sat Feb 23 12:00:20 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id BC43F6EE
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 12:00:20 +0000 (UTC)
 (envelope-from momchil@xaxo.eu)
Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66])
 by mx1.freebsd.org (Postfix) with ESMTP id 3970BA84
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 12:00:19 +0000 (UTC)
Received: from t61.xaxo.eu ([10.75.23.6])
 by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1NC0CaJ017602;
 Sat, 23 Feb 2013 13:00:12 +0100 (CET) (envelope-from momchil@xaxo.eu)
Date: Sat, 23 Feb 2013 13:00:03 +0100
Message-ID: <86hal3kzp8.wl%momchil@xaxo.eu>
From: Momchil Ivanov <momchil@xaxo.eu>
To: Rick Macklem <rmacklem@uoguelph.ca>
Subject: Re: NFS + Kerberos
In-Reply-To: <1103491143.3229700.1361577863159.JavaMail.root@erie.cs.uoguelph.ca>
References: <86txp4gpes.wl%momchil@xaxo.eu>	<1103491143.3229700.1361577863159.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
Cc: freebsd-fs@freebsd.org, Elias Martenson <lokedhs@gmail.com>,
 Momchil Ivanov <momchil@xaxo.eu>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Feb 2013 12:00:20 -0000

At Fri, 22 Feb 2013 19:04:23 -0500 (EST),
Rick Macklem wrote:
> You can run "gssd -d -d" and it will run in foreground and print
> out messages related to resource allocation. This isn't much use,
> except to tell you that it is doing something. (Adding a "verbose"
> option is on my "to do" list, but I don't have any code at this time.
> If someone wants to do this, I think it would be great.)
> 
> If you do this, don't have it started at boot (gssd_enable="NO" in
> /etc/rc.conf) and then do the above command as root in a window
> before attempting the mount command.
> 
> Beyond that, you could add printfs to gssd.c. The main client side
> function is gssd_init_sec_context(), which should get the Kerberos
> ticket for a user via their TGT.

Well, the server doesn't seem to start gssd at boot even with
gssd_enable="YES". I don't know why, but I cannot stop/restart nfsd
until I start gssd manually :) The client does start it at boot, though.

Note: I can ssh into the server even when gssd is not running; I don't
know if this is expected.

"gssd -d -d" prints things like this on the client and the server:

1 resources allocated
2 resources allocated
1 resources allocated
0 resources allocated
1 resources allocated
2 resources allocated
1 resources allocated
0 resources allocated
1 resources allocated
2 resources allocated
1 resources allocated
0 resources allocated

which doesn't tell me anything :) So I added some debug output to gssd.c.
Here is what happens on the client without a Kerberos ticket:

1 resources allocated
/usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> init_sec_context_args
uid:       1001
cred:      0
ctx:       0
name:      5848115787646107649
req_flags: 5848115787646107650
> gss_resources
i=0
gr_id  :5848115787646107649
gr_res :0x28203060
/usr/src/usr.sbin/gssd/gssd.c:307 argp->name
/usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
/usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
0 resources allocated
1 resources allocated
/usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> init_sec_context_args
uid:       1001
cred:      0
ctx:       0
name:      5848115787646107650
req_flags: 5848115787646107650
> gss_resources
i=0
gr_id  :5848115787646107650
gr_res :0x28203060
/usr/src/usr.sbin/gssd/gssd.c:307 argp->name
/usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
/usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
0 resources allocated
1 resources allocated
/usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> init_sec_context_args
uid:       1001
cred:      0
ctx:       0
name:      5848115787646107651
req_flags: 5848115787646107650
> gss_resources
i=0
gr_id  :5848115787646107651
gr_res :0x28203060
/usr/src/usr.sbin/gssd/gssd.c:307 argp->name
/usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
/usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
0 resources allocated

here is what happens with a kerberos ticket:

1 resources allocated
/usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> init_sec_context_args
uid:       1001
cred:      0
ctx:       0
name:      5848116041049178113
req_flags: 5848116041049178114
> gss_resources
i=0
gr_id  :5848116041049178113
gr_res :0x28203060
/usr/src/usr.sbin/gssd/gssd.c:307 argp->name
/usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
/usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
2 resources allocated
/usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED
1 resources allocated
0 resources allocated
1 resources allocated
/usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> init_sec_context_args
uid:       1001
cred:      0
ctx:       0
name:      5848116041049178115
req_flags: 5848116041049178114
> gss_resources
i=0
gr_id  :5848116041049178115
gr_res :0x28203060
/usr/src/usr.sbin/gssd/gssd.c:307 argp->name
/usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
/usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
2 resources allocated
/usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED
1 resources allocated
0 resources allocated
1 resources allocated
/usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> init_sec_context_args
uid:       1001
cred:      0
ctx:       0
name:      5848116041049178117
req_flags: 5848116041049178114
> gss_resources
i=0
gr_id  :5848116041049178117
gr_res :0x28203060
/usr/src/usr.sbin/gssd/gssd.c:307 argp->name
/usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
/usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
2 resources allocated
/usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED
1 resources allocated
0 resources allocated
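
A note on reading the ids in the output above: they look like 64-bit
values with a timestamp in the upper 32 bits and a small counter in the
lower 32 bits. That packing is only an assumption drawn from the numbers
themselves, not something confirmed from gssd.c, and the helper names
below are made up for illustration. A minimal sketch for decoding them:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumption: a resource id packs a 32-bit start time (seconds since
 * the epoch) into its upper half and a running counter into its lower
 * half.  The helper names are hypothetical.
 */
static uint32_t
id_start_time(uint64_t id)
{
	return ((uint32_t)(id >> 32));
}

static uint32_t
id_counter(uint64_t id)
{
	return ((uint32_t)(id & 0xffffffffu));
}

/* Print one id from the debug output in decoded form. */
static void
print_decoded_id(uint64_t id)
{
	printf("start=%lu counter=%lu\n",
	    (unsigned long)id_start_time(id),
	    (unsigned long)id_counter(id));
}
```

Under that assumption 5848115787646107649 decodes to start time
1361620563 and counter 1, the run with a ticket started 59 seconds
later, and the name ids in that run carry counters 1, 3 and 5.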

here is what I have changed:

--- gssd.c.orig	2013-02-23 11:13:20.000000000 +0100
+++ gssd.c	2013-02-23 12:34:33.000000000 +0100
@@ -238,6 +238,33 @@
 	return (TRUE);
 }
 
+static void
+dump_resources(FILE *s)
+{
+	struct gss_resource *gr;
+	int i;
+
+	fprintf(s, "> gss_resources\n");
+
+	i = 0;
+	LIST_FOREACH(gr, &gss_resources, gr_link) {
+		fprintf(s, "i=%d\n", i++);
+		fprintf(s, "gr_id  :%llu\n", (unsigned long long)gr->gr_id);
+		fprintf(s, "gr_res :%p\n", gr->gr_res);
+	}
+}
+
+void
+dump_init_sec_context_args(FILE *s, init_sec_context_args *p)
+{
+	fprintf(s, "> init_sec_context_args\n");
+	fprintf(s, "uid:       %d\n", (int)p->uid);
+	fprintf(s, "cred:      %llu\n", (unsigned long long)p->cred);
+	fprintf(s, "ctx:       %llu\n", (unsigned long long)p->ctx);
+	fprintf(s, "name:      %llu\n", (unsigned long long)p->name);
+	fprintf(s, "req_flags: %llu\n", (unsigned long long)p->req_flags);
+}
+
 bool_t
 gssd_init_sec_context_1_svc(init_sec_context_args *argp, init_sec_context_res *result, struct svc_req *rqstp)
 {
@@ -248,27 +275,42 @@
 
 	snprintf(ccname, sizeof(ccname), "FILE:/tmp/krb5cc_%d",
 	    (int) argp->uid);
+
+	printf("%s:%d %s\n", __FILE__, __LINE__, ccname);
+	dump_init_sec_context_args(stdout, argp);
+	dump_resources(stdout);
+
 	setenv("KRB5CCNAME", ccname, TRUE);
 
 	memset(result, 0, sizeof(*result));
 	if (argp->cred) {
+		printf("%s:%d argp->cred\n", __FILE__, __LINE__);
 		cred = gssd_find_resource(argp->cred);
+		printf("%s:%d cred=%p\n", __FILE__, __LINE__, (void *)cred);
 		if (!cred) {
 			result->major_status = GSS_S_CREDENTIALS_EXPIRED;
+			printf("%s:%d GSS_S_CREDENTIALS_EXPIRED\n", __FILE__, __LINE__);
 			return (TRUE);
 		}
 	}
 	if (argp->ctx) {
+		printf("%s:%d argp->ctx\n", __FILE__, __LINE__);
 		ctx = gssd_find_resource(argp->ctx);
+		printf("%s:%d ctx=%p\n", __FILE__, __LINE__, (void *)ctx);
 		if (!ctx) {
 			result->major_status = GSS_S_CONTEXT_EXPIRED;
+			printf("%s:%d GSS_S_CONTEXT_EXPIRED\n", __FILE__, __LINE__);
 			return (TRUE);
 		}
 	}
 	if (argp->name) {
+		printf("%s:%d argp->name\n", __FILE__, __LINE__);
 		name = gssd_find_resource(argp->name);
+		printf("%s:%d name=%llu\n", __FILE__, __LINE__, (unsigned long long)(uintptr_t)name);
+		printf("%s:%d name=%p\n", __FILE__, __LINE__, (void *)name);
 		if (!name) {
 			result->major_status = GSS_S_BAD_NAME;
+			printf("%s:%d GSS_S_BAD_NAME\n", __FILE__, __LINE__);
 			return (TRUE);
 		}
 	}
@@ -286,6 +328,11 @@
 			result->ctx = argp->ctx;
 		else
 			result->ctx = gssd_make_resource(ctx);
+
+		if (result->major_status == GSS_S_COMPLETE)
+			printf("%s:%d GSS_S_COMPLETE\n", __FILE__, __LINE__);
+		else
+			printf("%s:%d GSS_S_CONTINUE_NEEDED\n", __FILE__, __LINE__);
 	}
 
 	return (TRUE);

Ideas?

Thank you,
Momchil

From owner-freebsd-fs@FreeBSD.ORG  Sat Feb 23 15:35:48 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 4715A221
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 15:35:48 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id D792E1F7
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 15:35:47 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqAEABPhKFGDaFvO/2dsb2JhbABFhk+4KYJagR9zgh8BAQQBIwRSBRYYAgINGQJZBoggBq0ZhBCNcIEjjDSBAzQHgi2BEwOIaY1UkGWDJYIJ
X-IronPort-AV: E=Sophos;i="4.84,721,1355115600"; d="scan'208";a="17925705"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-jnhn.mail.uoguelph.ca with ESMTP; 23 Feb 2013 10:35:47 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 1288CB3EE4;
 Sat, 23 Feb 2013 10:35:47 -0500 (EST)
Date: Sat, 23 Feb 2013 10:35:47 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Momchil Ivanov <momchil@xaxo.eu>
Message-ID: <508324799.3234256.1361633747016.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <86hal3kzp8.wl%momchil@xaxo.eu>
Subject: Re: NFS + Kerberos
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, Elias Martenson <lokedhs@gmail.com>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Feb 2013 15:35:48 -0000

Momchil Ivanov wrote:
> At Fri, 22 Feb 2013 19:04:23 -0500 (EST),
> Rick Macklem wrote:
> > You can run "gssd -d -d" and it will run in foreground and print
> > out messages related to resource allocation. This isn't much use,
> > except to tell you that it is doing something. (Adding a "verbose"
> > option is on my "to do" list, but I don't have any code at this
> > time.
> > If someone wants to do this, I think it would be great.)
> >
> > If you do this, don't have it started at boot (gssd_enable="NO" in
> > /etc/rc.conf) and then do the above command as root in a window
> > before attempting the mount command.
> >
> > Beyond that, you could add printfs to gssd.c. The main client side
> > function is gssd_init_sec_context(), which should get the Kerberos
> > ticket for a user via their TGT.
> 
> well, the server doesn't seem to start it at boot with
> gssd_enable="YES", I don't know why, but I cannot stop/restart nfsd
> until I manually start gssd :) the client starts it at boot, though
> 
> note: I can ssh into the server even when gssd is not running, I don't
> know if this is expected.
> 
Yes. The gssd only handles upcalls from the kernel and only NFS does
those at this time.

> "gssd -d -d" prints things like this on the client and the server:
> 
> 1 resources allocated
> 2 resources allocated
> 1 resources allocated
> 0 resources allocated
> 1 resources allocated
> 2 resources allocated
> 1 resources allocated
> 0 resources allocated
> 1 resources allocated
> 2 resources allocated
> 1 resources allocated
> 0 resources allocated
> 
> which doesn't tell me anything :) so here is what happens on the
> client without a kerberos ticket:
> 
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848115787646107649
> req_flags: 5848115787646107650
> > gss_resources
> i=0
> gr_id :5848115787646107649
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 0 resources allocated
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848115787646107650
> req_flags: 5848115787646107650
> > gss_resources
> i=0
> gr_id :5848115787646107650
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 0 resources allocated
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848115787646107651
> req_flags: 5848115787646107650
> > gss_resources
> i=0
> gr_id :5848115787646107651
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 0 resources allocated
> 
> here is what happens with a kerberos ticket:
> 
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848116041049178113
> req_flags: 5848116041049178114
> > gss_resources
> i=0
> gr_id :5848116041049178113
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 2 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED
> 1 resources allocated
> 0 resources allocated
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848116041049178115
> req_flags: 5848116041049178114
> > gss_resources
> i=0
> gr_id :5848116041049178115
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 2 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED
> 1 resources allocated
> 0 resources allocated
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848116041049178117
> req_flags: 5848116041049178114
> > gss_resources
> i=0
> gr_id :5848116041049178117
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 2 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED
> 1 resources allocated
> 0 resources allocated
> 
The above looks reasonable. Once the gssd_init_sec_context()
replies GSS_S_CONTINUE_NEEDED to the kernel with the token
that has the session ticket in it, then...
- The NFS client sends a Null RPC to the server with an
  authenticator of type RPCSEC_GSS with
  version: 1
  RPCSEC_GSS_INIT
  and the token as data on the Null RPC (a Null RPC wouldn't
  normally have any data)
- the server processes this via an upcall to its gssd for
  gssd_accept_sec_context()
  - this would get a reply of GSS_S_COMPLETE I think (although
    there may be cases where a GSS_S_CONTINUE_NEEDED occurs
    and there is another cycle of Null RPC token passing)
- this would result in a reply to the Null RPC with an
  RPCSEC_GSS authenticator with roughly:
  - a credential handle (shorthand bits for the user principal)
  - GSS_S_COMPLETE status
- it has been a while, so I can't remember for sure, but I think
  a successful reply includes a token that is passed up to the
  gssd via gssd_init_sec_context() and then it would return
  GSS_S_COMPLETE when the token is processed.
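
The exchange outlined above can be modelled as a small decision
function. This is only a sketch under stated assumptions: the status
values are local copies of the RFC 2744 numbers, the procedure numbers
are those from RFC 2203, and the function and enum names are made up
for illustration, not taken from the kernel or gssd.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the GSS-API major status values (RFC 2744). */
#define GSS_S_COMPLETE		0u
#define GSS_S_CONTINUE_NEEDED	1u

/* RPCSEC_GSS control procedures (RFC 2203). */
enum rpc_gss_proc {
	RPCSEC_GSS_DATA = 0,
	RPCSEC_GSS_INIT = 1,
	RPCSEC_GSS_CONTINUE_INIT = 2,
	RPCSEC_GSS_DESTROY = 3
};

/* Hypothetical names for what the client does next. */
enum client_action { SEND_NULL_RPC, CONTEXT_READY, CONTEXT_FAILED };

/* The first token travels with INIT, later ones with CONTINUE_INIT. */
static enum rpc_gss_proc
handshake_proc(int first_round)
{
	return (first_round ? RPCSEC_GSS_INIT : RPCSEC_GSS_CONTINUE_INIT);
}

/*
 * Decide the next step after one init_sec_context upcall.  have_token
 * says whether the upcall produced an output token for the server.
 */
static enum client_action
next_client_action(uint32_t major, int have_token)
{
	if (major == GSS_S_CONTINUE_NEEDED)
		return (have_token ? SEND_NULL_RPC : CONTEXT_FAILED);
	if (major == GSS_S_COMPLETE)
		return (have_token ? SEND_NULL_RPC : CONTEXT_READY);
	return (CONTEXT_FAILED);
}
```

In the debug output quoted above the client reaches the first branch
(GSS_S_CONTINUE_NEEDED with a token), so the next thing on the wire
should be a Null RPC carrying RPCSEC_GSS_INIT.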

My guess is that it is the server that is replying with some
failure status GSS_S_xxx.

At this point, you can either try and look at the Null RPC
in wireshark or add printfs to the server's gssd.c for
gssd_accept_sec_context() and gssd_acquire_cred().
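
If you go the printf route, a helper that turns the major status into
a name makes the output easier to read. A rough sketch follows; the
numbering is the routine-error table from RFC 2744, while the function
name is hypothetical and calling-error bits are ignored. In a real
daemon gss_display_status() would be the more complete choice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Routine errors live in bits 16..23 of the major status (RFC 2744). */
#define ROUTINE_ERROR(major)	(((major) >> 16) & 0xffu)

/* Hypothetical helper: name of the routine error in a major status. */
static const char *
gss_major_name(uint32_t major)
{
	static const char *const names[] = {
		NULL,
		"GSS_S_BAD_MECH",
		"GSS_S_BAD_NAME",
		"GSS_S_BAD_NAMETYPE",
		"GSS_S_BAD_BINDINGS",
		"GSS_S_BAD_STATUS",
		"GSS_S_BAD_SIG",
		"GSS_S_NO_CRED",
		"GSS_S_NO_CONTEXT",
		"GSS_S_DEFECTIVE_TOKEN",
		"GSS_S_DEFECTIVE_CREDENTIAL",
		"GSS_S_CREDENTIALS_EXPIRED",
		"GSS_S_CONTEXT_EXPIRED",
		"GSS_S_FAILURE",
		"GSS_S_BAD_QOP",
		"GSS_S_UNAUTHORIZED",
		"GSS_S_UNAVAILABLE",
		"GSS_S_DUPLICATE_ELEMENT",
		"GSS_S_NAME_NOT_MN"
	};
	uint32_t code;

	code = ROUTINE_ERROR(major);
	if (code == 0) {
		/* No routine error: bit 0 is the "continue needed" flag. */
		return ((major & 1u) ? "GSS_S_CONTINUE_NEEDED" :
		    "GSS_S_COMPLETE");
	}
	if (code >= sizeof(names) / sizeof(names[0]))
		return ("unknown routine error");
	return (names[code]);
}
```

A call such as printf("%s:%d %s\n", __FILE__, __LINE__,
gss_major_name(result->major_status)) would then replace the
hand-written status strings.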

Basically, the above looks ok, but the rest of the handshake
that generates a credential handle via the Kerberos session
ticket hasn't happened.

Good luck with it, rick


From owner-freebsd-fs@FreeBSD.ORG  Sat Feb 23 15:57:16 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 94AB583D
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 15:57:16 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 49AE92A0
 for <freebsd-fs@freebsd.org>; Sat, 23 Feb 2013 15:57:15 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqAEAP7kKFGDaFvO/2dsb2JhbABFhk+4KYJagR9zgh8BAQQBIwRSBRYYAgINGQJZBoggBq0hhBCNboEjjDSBAzQHgi2BEwOIaY1UkGWDJYIJ
X-IronPort-AV: E=Sophos;i="4.84,721,1355115600"; d="scan'208";a="15447359"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-annu.net.uoguelph.ca with ESMTP; 23 Feb 2013 10:57:09 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4B7E3B3F1B;
 Sat, 23 Feb 2013 10:57:09 -0500 (EST)
Date: Sat, 23 Feb 2013 10:57:09 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Momchil Ivanov <momchil@xaxo.eu>
Message-ID: <448938470.3234495.1361635029255.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <86hal3kzp8.wl%momchil@xaxo.eu>
Subject: Re: NFS + Kerberos
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, Elias Martenson <lokedhs@gmail.com>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Feb 2013 15:57:16 -0000

Momchil Ivanov wrote:
> At Fri, 22 Feb 2013 19:04:23 -0500 (EST),
> Rick Macklem wrote:
> > You can run "gssd -d -d" and it will run in foreground and print
> > out messages related to resource allocation. This isn't much use,
> > except to tell you that it is doing something. (Adding a "verbose"
> > option is on my "to do" list, but I don't have any code at this
> > time.
> > If someone wants to do this, I think it would be great.)
> >
> > If you do this, don't have it started at boot (gssd_enable="NO" in
> > /etc/rc.conf) and then do the above command as root in a window
> > before attempting the mount command.
> >
> > Beyond that, you could add printfs to gssd.c. The main client side
> > function is gssd_init_sec_context(), which should get the Kerberos
> > ticket for a user via their TGT.
> 
> well, the server doesn't seem to start it at boot with
> gssd_enable="YES", I don't know why, but I cannot stop/restart nfsd
> until I manually start gssd :) the client starts it at boot, though
> 
> note: I can ssh into the server even when gssd is not running, I don't
> know if this is expected.
> 
> "gssd -d -d" prints things like this on the client and the server:
> 
> 1 resources allocated
> 2 resources allocated
> 1 resources allocated
> 0 resources allocated
> 1 resources allocated
> 2 resources allocated
> 1 resources allocated
> 0 resources allocated
> 1 resources allocated
> 2 resources allocated
> 1 resources allocated
> 0 resources allocated
> 
> which doesn't tell me anything :) so here is what happens on the
> client without a kerberos ticket:
> 
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848115787646107649
> req_flags: 5848115787646107650
> > gss_resources
> i=0
> gr_id :5848115787646107649
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 0 resources allocated
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848115787646107650
> req_flags: 5848115787646107650
> > gss_resources
> i=0
> gr_id :5848115787646107650
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 0 resources allocated
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848115787646107651
> req_flags: 5848115787646107650
> > gss_resources
> i=0
> gr_id :5848115787646107651
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 0 resources allocated
> 
> here is what happens with a kerberos ticket:
> 
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848116041049178113
> req_flags: 5848116041049178114
> > gss_resources
> i=0
> gr_id :5848116041049178113
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 2 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED
> 1 resources allocated
> 0 resources allocated
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848116041049178115
> req_flags: 5848116041049178114
> > gss_resources
> i=0
> gr_id :5848116041049178115
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 2 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED
> 1 resources allocated
> 0 resources allocated
> 1 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001
> > init_sec_context_args
> uid: 1001
> cred: 0
> ctx: 0
> name: 5848116041049178117
> req_flags: 5848116041049178114
> > gss_resources
> i=0
> gr_id :5848116041049178117
> gr_res :0x28203060
> /usr/src/usr.sbin/gssd/gssd.c:307 argp->name
> /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176
> /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060
> 2 resources allocated
> /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED
> 1 resources allocated
> 0 resources allocated
> 
In the last post, I forgot to mention...

RFC 2203 describes what the stuff in the Null RPCs looks like
and it isn't a particularly large or hard to read RFC, so
you might want to take a look at it.
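
As a rough guide to what to look for in a capture, here is a sketch of
the version 1 credential body as RFC 2203 lays it out in XDR. The C
struct and the helper name are illustrative only, not taken from any
FreeBSD header.

```c
#include <assert.h>
#include <stdint.h>

#define RPCSEC_GSS		6	/* RPC auth flavor number */
#define RPCSEC_GSS_VERS_1	1

/* Security services a credential can request (RFC 2203). */
enum rpc_gss_service {
	rpc_gss_svc_none = 1,
	rpc_gss_svc_integrity = 2,
	rpc_gss_svc_privacy = 3
};

/* In-memory picture of the version 1 credential body. */
struct rpc_gss_cred_v1 {
	uint32_t	gss_proc;	/* DATA, INIT, CONTINUE_INIT, DESTROY */
	uint32_t	seq_num;
	uint32_t	service;	/* enum rpc_gss_service */
	uint32_t	handle_len;	/* 0 while the context is being set up */
	const uint8_t	*handle;	/* opaque context handle from the server */
};

/*
 * Encoded size of the credential body: four bytes each for version,
 * gss_proc, seq_num, service and the handle length, followed by the
 * handle padded to a multiple of four bytes.
 */
static uint32_t
cred_body_len(uint32_t handle_len)
{
	return (20u + ((handle_len + 3u) & ~3u));
}
```

For the first Null RPC of the handshake the handle is empty, so the
credential body should be 20 bytes long.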

rick


From owner-freebsd-fs@FreeBSD.ORG  Sat Feb 23 23:54:47 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id A9A58F40
 for <fs@FreeBSD.org>; Sat, 23 Feb 2013 23:54:47 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from fallbackmx07.syd.optusnet.com.au
 (fallbackmx07.syd.optusnet.com.au [211.29.132.9])
 by mx1.freebsd.org (Postfix) with ESMTP id 21E9CA9F
 for <fs@FreeBSD.org>; Sat, 23 Feb 2013 23:54:46 +0000 (UTC)
Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au
 [211.29.132.194])
 by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
 r1NNsdGe013894 for <fs@FreeBSD.org>; Sun, 24 Feb 2013 10:54:39 +1100
Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au
 (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106])
 by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r1NNsULp026737
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Sun, 24 Feb 2013 10:54:31 +1100
Date: Sun, 24 Feb 2013 10:54:30 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: cleaning files beyond EOF
In-Reply-To: <20130217074832.GA2598@kib.kiev.ua>
Message-ID: <20130224093909.Y920@besplex.bde.org>
References: <20130217113031.N9271@besplex.bde.org>
 <20130217055528.GB2522@kib.kiev.ua>
 <20130217172928.C1900@besplex.bde.org> <20130217074832.GA2598@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=Auu2R5BP c=1 sm=1 a=xK1pj5J4f3QA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=GlckP5_kgdUA:10
 a=s1h70ublos-9_-jcX2sA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117
Cc: fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Feb 2013 23:54:47 -0000

On Sun, 17 Feb 2013, Konstantin Belousov wrote:

> On Sun, Feb 17, 2013 at 06:01:50PM +1100, Bruce Evans wrote:
>> On Sun, 17 Feb 2013, Konstantin Belousov wrote:
>>
>>> On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
>>>> I have a (possibly damaged) ffs data block with nonzero data beyond
>>>> EOF.  Is anything responsible for clearing this data when the file
>>>> is mmapped()?
>>>>
>>>> At least old versions of gcc mmap() the file and have a bug checking
>>>> for EOF.  They read the garbage beyond the end and get confused.
>>>
>>> Does the 'damaged' status of the data block mean that it contains
>>> garbage after EOF on disk?
>>
>> Yes, it's at most software damage.  I used a broken version of
>> vfs_bio_clrbuf() for a long time and it probably left some unusual
>> blocks.  This matters surprisingly rarely.
> I recently had to modify the vfs_bio_clrbuf().  For me, a bug in the
> function did matter a lot, because the function is used, in particular,
> to clear the indirect blocks.  The bug caused quite random filesystem
> failures until I figured it out.  My version of vfs_bio_clrbuf() is
> at the end of the message, it avoids accessing b_data.

This will take me a long time to understand.  Indirect blocks seemed to
be broken to me too.  clrbuf() in 4.4BSD was just a simple bzero() of
the entire buffer followed by an (IMO bogus) setting of b_resid to 0.
It was used mainly to allocate blocks, including indirect blocks in ffs.
Now most file systems use vfs_bio_clrbuf() instead.  hpfs, ntfs and udf
still use clrbuf(), since they apparently didn't understand FreeBSD APIs
when they were implemented and they are too new to have been changed by
the global sweep to change to vfs_bio_clrbuf().  But vfs_bio_clrbuf()
has obscure semantics which seem to be different from those of clrbuf().
It reduces to clrbuf() in the non-VMIO case.  In the VMIO case it only
clears the previously "invalid" portions of the buffer.  I don't see
why "valid" implies zero, and testing showed that it didn't.  Cases where
the buffer ended up not all zero were rare and seemed to be only for
indirect blocks in ffs.  Also, the complexity of vfs_bio_clrbuf() is
bogus IMO.  The allocation will be followed by a physical write, so
avoiding re-zeroing parts of the buffer is an insignificant optimization
even if these parts are most of the buffer.
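
The difference between the two clearing strategies can be shown with a
userland model.  This is only a sketch: one page-sized buffer, one valid
bit per DEV_BSIZE chunk, and all names hypothetical.  It is not the
kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE	4096u
#define MODEL_DEV_BSIZE	512u
#define MODEL_NCHUNKS	(MODEL_PAGE_SIZE / MODEL_DEV_BSIZE)

/* 4.4BSD-style clearing: zero the whole buffer. */
static void
model_clrbuf(uint8_t *buf)
{
	memset(buf, 0, MODEL_PAGE_SIZE);
}

/* VMIO-style clearing: zero only chunks whose valid bit is clear. */
static void
model_clr_invalid(uint8_t *buf, uint8_t valid)
{
	unsigned int i;

	for (i = 0; i < MODEL_NCHUNKS; i++) {
		if ((valid & (1u << i)) == 0)
			memset(buf + i * MODEL_DEV_BSIZE, 0, MODEL_DEV_BSIZE);
	}
}

/*
 * Fill a buffer with junk, clear it one way or the other, and count
 * the bytes that are still nonzero afterwards.
 */
static unsigned int
model_stale_bytes(uint8_t valid, int whole_buffer)
{
	uint8_t buf[MODEL_PAGE_SIZE];
	unsigned int i, stale;

	memset(buf, 0xa5, sizeof(buf));
	if (whole_buffer)
		model_clrbuf(buf);
	else
		model_clr_invalid(buf, valid);
	stale = 0;
	for (i = 0; i < MODEL_PAGE_SIZE; i++) {
		if (buf[i] != 0)
			stale++;
	}
	return (stale);
}
```

With only the first chunk marked valid, the second strategy leaves 512
junk bytes behind, which is the sense in which "valid" does not imply
zero.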

>> I forgot to mention that this is with an old version of FreeBSD,
>> where I changed vfs_bio.c a lot but barely touched vm.
>>
>>> UFS uses a small wrapper around vnode_generic_getpages() as the
>>> VOP_GETPAGES(), the wrapping code can be ignored for the current
>>> purpose.
>>>
>>> vnode_generic_getpages() iterates over the pages after the bstrategy()
>>> and marks the part of the page after EOF valid and zeroes it, using
>>> vm_page_set_valid_range().
>>
>> The old version has a large non-wrapper in ffs, and vnode_generic_getpages()
>> uses vm_page_set_validclean().  Maybe the bug is just in the old
>> ffs_getpages().  It seems to do only DEV_BSIZE'ed zeroing stuff.  It
>> begins with the same "We have to zero that data" code that forms most
>> of the wrapper in the current version.  It normally only returns
>> vnode_pager_generic_getpages() after that if bsize < PAGE_SIZE.
>> However, my version has a variable which I had forgotten about to
>> control this, and the forgotten setting of this variable results in
>> always using vnode_pager_generic_getpages(), as in -current.  I probably
>> copied some fixes in -current for this.  So the bug can't be just in
>> ffs_getpages().
>>
>> The "damaged" block is at the end of vfs_default.c.  The file size is
>> 25 * PAGE_SIZE + 16.  It is in 7 16K blocks, 2 full 2K frags, and 1 frag
>> with 16 bytes valid in it.
> But ffs_getpages() might indeed be the culprit. It calls
> vm_page_zero_invalid(), which only has DEV_BSIZE granularity. I think
> that ffs_getpages() should also zero the part of the last page of the
> file that lies after EOF to fix your damage, since a device read cannot
> read less than DEV_BSIZE.

Is that for the old ffs_getpages()?  I think it clears the DEV_BSIZE
sub-blocks in the page starting at the first "invalid" one.  For my
file's layout, that is starting at offset DEV_BSIZE, since the sub-block
at offset 0 has 16 bytes valid in it and of course the whole sub-block
is valid at the VMIO level.  I tried to verify this by checking that
the unzeroed bytes were from offset 16 to 511, but unfortunately the
problem went away soon after I wrote the first mail about this :-).
It was very reproducible until then.  I checked using fsdb that the
data block is still unzeroed.
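
The arithmetic behind the expected 16..511 range, as a small sketch.
It assumes 4K pages and a 512-byte DEV_BSIZE, and the helper names are
made up.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE	4096u
#define MODEL_PAGE_MASK	(MODEL_PAGE_SIZE - 1u)
#define MODEL_DEV_BSIZE	512u

/* Offset within the last page of the first byte beyond EOF. */
static uint32_t
eof_offset_in_page(uint64_t size)
{
	return ((uint32_t)(size & MODEL_PAGE_MASK));
}

/*
 * First offset that DEV_BSIZE-granular zeroing can reach: the start of
 * the first sub-block that lies wholly beyond EOF.
 */
static uint32_t
first_zeroed_offset(uint64_t size)
{
	uint32_t off;

	off = eof_offset_in_page(size);
	return ((off + MODEL_DEV_BSIZE - 1u) & ~(MODEL_DEV_BSIZE - 1u));
}

/* Bytes beyond EOF that keep whatever was read from the disk. */
static uint32_t
unzeroed_bytes(uint64_t size)
{
	return (first_zeroed_offset(size) - eof_offset_in_page(size));
}
```

For a size of 25 * PAGE_SIZE + 16 this gives an EOF offset of 16, a
first zeroed offset of 512, and 496 bytes (16 to 511) left alone.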

> diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
> index ef6194c..4240b78 100644
> --- a/sys/ufs/ffs/ffs_vnops.c
> +++ b/sys/ufs/ffs/ffs_vnops.c
> @@ -844,9 +844,9 @@ static int
> ffs_getpages(ap)
> 	struct vop_getpages_args *ap;
> {
> -	int i;
> 	vm_page_t mreq;
> -	int pcount;
> +	uint64_t size;
> +	int i, pcount;
>
> 	pcount = round_page(ap->a_count) / PAGE_SIZE;
> 	mreq = ap->a_m[ap->a_reqpage];
> @@ -861,6 +861,9 @@ ffs_getpages(ap)
> 	if (mreq->valid) {
> 		if (mreq->valid != VM_PAGE_BITS_ALL)
> 			vm_page_zero_invalid(mreq, TRUE);
> +		size = VTOI(ap->a_vp)->i_size;
> +		if (mreq->pindex == OFF_TO_IDX(size))
> +			pmap_zero_page_area(mreq, size & PAGE_MASK, PAGE_SIZE);
> 		for (i = 0; i < pcount; i++) {
> 			if (i != ap->a_reqpage) {
> 				vm_page_lock(ap->a_m[i]);

I saw your later mail with this fix corrected (clearing PAGE_SIZE is too much).
I think I would write it without the pindex check:

 		off = VTOI(ap->a_vp)->i_size & PAGE_MASK;
 		if (off != 0)
 			pmap_zero_page_area(mreq, off, PAGE_SIZE - off);
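
A userland model of the same idea, to check the arithmetic.  It is a
sketch only: memset() stands in for pmap_zero_page_area(), the page is a
plain array, the names are hypothetical, and the caller is assumed to
pass the page that contains EOF.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE	4096u
#define MODEL_PAGE_MASK	(MODEL_PAGE_SIZE - 1u)

/* Zero everything in the last page from EOF to the end of the page. */
static void
model_zero_tail(uint8_t *page, uint64_t file_size)
{
	uint32_t off;

	off = (uint32_t)(file_size & MODEL_PAGE_MASK);
	if (off != 0)
		memset(page + off, 0, MODEL_PAGE_SIZE - off);
}

/*
 * Fill a page with junk, zero its tail for the given file size, and
 * count the bytes that are still nonzero.
 */
static unsigned int
model_nonzero_after_tail_zero(uint64_t file_size)
{
	uint8_t page[MODEL_PAGE_SIZE];
	unsigned int i, left;

	memset(page, 0xa5, sizeof(page));
	model_zero_tail(page, file_size);
	left = 0;
	for (i = 0; i < MODEL_PAGE_SIZE; i++) {
		if (page[i] != 0)
			left++;
	}
	return (left);
}
```

For the 25 * PAGE_SIZE + 16 file only the 16 valid bytes survive, and a
file that ends exactly on a page boundary is left untouched.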

>
> On the other hand, it is not clear whether we should indeed protect
> against such a case, or just declare the disk data broken.

Then file systems written by different, less strict implementations of
ffs would not work.  It is unclear if ffs is specified to clear bytes
beyond the end when writing files.  But I think it should clear up to
the next fragment boundary.  After that, it is vm's responsibility to
clear.

Bruce