From owner-freebsd-fs@FreeBSD.ORG Sun Feb 17 00:34:10 2013
Date: Sun, 17 Feb 2013 11:33:58 +1100 (EST)
From: Bruce Evans
To: fs@freebsd.org
Subject: cleaning files beyond EOF
Message-ID: <20130217113031.N9271@besplex.bde.org>
X-BeenThere: freebsd-fs@freebsd.org
List-Id: Filesystems

I have a (possibly damaged) ffs data block with nonzero data beyond
EOF.  Is anything responsible for clearing this data when the file
is mmapped()?

At least old versions of gcc mmap() the file and have a bug checking
for EOF.  They read the garbage beyond the end and get confused.
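For anyone who wants to check this from user space, here is a minimal
sketch.  POSIX says the partial page at the end of a mapped object is
zero-filled, so on an undamaged file the function below should return 0.
The helper name count_nonzero_past_eof() and the scratch path under /tmp
are assumptions made for illustration only.

```c
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a scratch file of file_size nonzero bytes, map it in whole
 * pages, and count the nonzero bytes between EOF and the end of the
 * last mapped page.  Returns the count, or -1 on error.
 */
long
count_nonzero_past_eof(size_t file_size)
{
	char path[] = "/tmp/eofdemoXXXXXX";	/* hypothetical scratch file */
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t map_len, i;
	unsigned char *p;
	char *buf;
	long count = 0;
	int fd;

	if (file_size == 0 || pagesz <= 0)
		return (-1);
	if ((fd = mkstemp(path)) == -1)
		return (-1);
	(void)unlink(path);

	/* Fill the file with nonzero bytes up to EOF. */
	if ((buf = malloc(file_size)) == NULL) {
		close(fd);
		return (-1);
	}
	memset(buf, 'x', file_size);
	if (write(fd, buf, file_size) != (ssize_t)file_size) {
		free(buf);
		close(fd);
		return (-1);
	}
	free(buf);

	/* Map whole pages, so the mapping extends past EOF. */
	map_len = (file_size + (size_t)pagesz - 1) / (size_t)pagesz *
	    (size_t)pagesz;
	p = mmap(NULL, map_len, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return (-1);
	}
	for (i = file_size; i < map_len; i++)
		if (p[i] != 0)
			count++;
	munmap(p, map_len);
	close(fd);
	return (count);
}
```

A nonzero result for a freshly written file would point at the kernel;
running the same loop over the suspect file itself would show whether
the on-disk garbage is reaching the mapping.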
Bruce

From owner-freebsd-fs@FreeBSD.ORG Sun Feb 17 05:55:39 2013
Date: Sun, 17 Feb 2013 07:55:28 +0200
From: Konstantin Belousov
To: Bruce Evans
Cc: fs@freebsd.org
Subject: Re: cleaning files beyond EOF
Message-ID: <20130217055528.GB2522@kib.kiev.ua>
References: <20130217113031.N9271@besplex.bde.org>
In-Reply-To: <20130217113031.N9271@besplex.bde.org>

On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
> I have a (possibly damaged) ffs data block with nonzero data beyond
> EOF.  Is anything responsible for clearing this data when the file
> is mmapped()?
>
> At least old versions of gcc mmap() the file and have a bug checking
> for EOF.  They read the garbage beyond the end and get confused.

Does the 'damaged' status of the data block mean that it contains
garbage after EOF on disk?

UFS uses a small wrapper around vnode_pager_generic_getpages() as its
VOP_GETPAGES(); the wrapping code can be ignored for the current
purpose.

vnode_pager_generic_getpages() iterates over the pages after the
bstrategy(), marks the part of the page after EOF valid and zeroes it,
using vm_page_set_valid_range().

From owner-freebsd-fs@FreeBSD.ORG Sun Feb 17 07:02:04 2013
Date: Sun, 17 Feb 2013 18:01:50 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
Cc: fs@freebsd.org
Subject: Re: cleaning files beyond EOF
Message-ID: <20130217172928.C1900@besplex.bde.org>
In-Reply-To: <20130217055528.GB2522@kib.kiev.ua>
References: <20130217113031.N9271@besplex.bde.org> <20130217055528.GB2522@kib.kiev.ua>

On Sun, 17 Feb 2013, Konstantin Belousov wrote:

> On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
>> I have a (possibly damaged) ffs data block with nonzero data beyond
>> EOF.  Is anything responsible for clearing this data when the file
>> is mmapped()?
>>
>> At least old versions of gcc mmap() the file and have a bug checking
>> for EOF.  They read the garbage beyond the end and get confused.
>
> Does the 'damaged' status of the data block mean that it contains
> garbage after EOF on disk?

Yes, it's at most software damage.
I used a broken version of
vfs_bio_clrbuf() for a long time and it probably left some unusual
blocks.  This matters surprisingly rarely.

I forgot to mention that this is with an old version of FreeBSD,
where I changed vfs_bio.c a lot but barely touched vm.

> UFS uses a small wrapper around vnode_pager_generic_getpages() as its
> VOP_GETPAGES(); the wrapping code can be ignored for the current
> purpose.
>
> vnode_pager_generic_getpages() iterates over the pages after the
> bstrategy(), marks the part of the page after EOF valid and zeroes it,
> using vm_page_set_valid_range().

The old version has a large non-wrapper in ffs, and
vnode_pager_generic_getpages() uses vm_page_set_validclean().  Maybe the
bug is just in the old ffs_getpages().  It seems to do only DEV_BSIZE'ed
zeroing stuff.  It begins with the same "We have to zero that data" code
that forms most of the wrapper in the current version.  It normally only
returns vnode_pager_generic_getpages() after that if bsize < PAGE_SIZE.
However, my version has a variable which I had forgotten about to
control this, and the forgotten setting of this variable results in
always using vnode_pager_generic_getpages(), as in -current.  I probably
copied some fixes in -current for this.  So the bug can't be just in
ffs_getpages().

The "damaged" block is at the end of vfs_default.c.  The file size is
25 * PAGE_SIZE + 16.  It is in 7 16K blocks, 2 full 2K frags, and 1 frag
with 16 bytes valid in it.

I have another problem that is apparently with
vnode_pager_generic_getpages() and now affects -current from about a
year ago in an identical way with the old version: mmap() is very slow
in msdosfs.  cmp uses mmap() too much, and reading files sequentially
using mmap() is 3.4 times slower than reading them using read() on my
DVD media/drive.  The i/o seems to be correctly clustered for both,
with average transaction sizes over 50K but tps much lower for mmap().
Similarly on a (faster) hard disk except the slowness is not as noticeable
(drive buffering might hide it completely).  However, for ffs files on
the hard disk, mmap() is as fast as read().

Bruce

From owner-freebsd-fs@FreeBSD.ORG Sun Feb 17 07:48:38 2013
Date: Sun, 17 Feb 2013 09:48:32 +0200
From: Konstantin Belousov
To: Bruce Evans
Cc: fs@freebsd.org
Subject: Re: cleaning files beyond EOF
Message-ID: <20130217074832.GA2598@kib.kiev.ua>
References: <20130217113031.N9271@besplex.bde.org> <20130217055528.GB2522@kib.kiev.ua> <20130217172928.C1900@besplex.bde.org>
In-Reply-To: <20130217172928.C1900@besplex.bde.org>

On Sun, Feb 17, 2013 at 06:01:50PM +1100, Bruce Evans wrote:
> On Sun, 17 Feb 2013, Konstantin Belousov wrote:
>
> > On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
> >> I have a (possibly damaged) ffs data block with nonzero data beyond
> >> EOF.  Is anything responsible for clearing this data when the file
> >> is mmapped()?
> >>
> >> At least old versions of gcc mmap() the file and have a bug checking
> >> for EOF.  They read the garbage beyond the end and get confused.
> >
> > Does the 'damaged' status of the data block mean that it contains
> > garbage after EOF on disk?
>
> Yes, it's at most software damage.  I used a broken version of
> vfs_bio_clrbuf() for a long time and it probably left some unusual
> blocks.  This matters surprisingly rarely.

I recently had to modify vfs_bio_clrbuf().  For me, a bug in the
function did matter a lot, because the function is used, in particular,
to clear the indirect blocks.  The bug caused quite random filesystem
failures until I figured it out.  My version of vfs_bio_clrbuf() is at
the end of the message; it avoids accessing b_data.

>
> I forgot to mention that this is with an old version of FreeBSD,
> where I changed vfs_bio.c a lot but barely touched vm.
>
> > UFS uses a small wrapper around vnode_pager_generic_getpages() as its
> > VOP_GETPAGES(); the wrapping code can be ignored for the current
> > purpose.
> >
> > vnode_pager_generic_getpages() iterates over the pages after the
> > bstrategy(), marks the part of the page after EOF valid and zeroes it,
> > using vm_page_set_valid_range().
>
> The old version has a large non-wrapper in ffs, and
> vnode_pager_generic_getpages() uses vm_page_set_validclean().  Maybe the
> bug is just in the old ffs_getpages().  It seems to do only DEV_BSIZE'ed
> zeroing stuff.  It begins with the same "We have to zero that data" code
> that forms most of the wrapper in the current version.  It normally only
> returns vnode_pager_generic_getpages() after that if bsize < PAGE_SIZE.
> However, my version has a variable which I had forgotten about to
> control this, and the forgotten setting of this variable results in
> always using vnode_pager_generic_getpages(), as in -current.  I probably
> copied some fixes in -current for this.  So the bug can't be just in
> ffs_getpages().
>
> The "damaged" block is at the end of vfs_default.c.  The file size is
> 25 * PAGE_SIZE + 16.  It is in 7 16K blocks, 2 full 2K frags, and 1 frag
> with 16 bytes valid in it.

But ffs_getpages() might indeed be the culprit.  It calls
vm_page_zero_invalid(), which only has DEV_BSIZE granularity.  I think
that ffs_getpages() should also zero the part of the last page of the
file that lies after EOF to fix your damage, since a device read cannot
read less than DEV_BSIZE.
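
The range that has to be cleared can be sketched in user space as
follows.  This is only an illustration of the arithmetic, not kernel
code: the 4096-byte page size and the names DEMO_PAGE_SIZE,
DEMO_PAGE_MASK and zero_page_tail() are assumptions, and memset() stands
in for whatever primitive the kernel would use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	DEMO_PAGE_SIZE	4096u			/* assumed page size */
#define	DEMO_PAGE_MASK	(DEMO_PAGE_SIZE - 1)

/*
 * Zero the part of the page holding EOF that lies beyond file_size.
 * page points at the page contents, pindex is the page's index in the
 * file.  Returns the number of bytes zeroed.
 */
size_t
zero_page_tail(unsigned char *page, uint64_t pindex, uint64_t file_size)
{
	size_t off = (size_t)(file_size & DEMO_PAGE_MASK);

	/*
	 * Only the page containing EOF has a tail, and only when EOF
	 * is not page-aligned.
	 */
	if (off == 0 || pindex != file_size / DEMO_PAGE_SIZE)
		return (0);

	/* The length is the page size minus the in-page offset. */
	memset(page + off, 0, DEMO_PAGE_SIZE - off);
	return (DEMO_PAGE_SIZE - off);
}
```

For a file of 25 * PAGE_SIZE + 16 bytes, only page index 25 is touched,
and the zeroed range starts at byte 16 of that page, so the write stays
inside the page.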

diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
index ef6194c..4240b78 100644
--- a/sys/ufs/ffs/ffs_vnops.c
+++ b/sys/ufs/ffs/ffs_vnops.c
@@ -844,9 +844,9 @@ static int
 ffs_getpages(ap)
 	struct vop_getpages_args *ap;
 {
-	int i;
 	vm_page_t mreq;
-	int pcount;
+	uint64_t size;
+	int i, pcount;
 
 	pcount = round_page(ap->a_count) / PAGE_SIZE;
 	mreq = ap->a_m[ap->a_reqpage];
@@ -861,6 +861,9 @@ ffs_getpages(ap)
 	if (mreq->valid) {
 		if (mreq->valid != VM_PAGE_BITS_ALL)
 			vm_page_zero_invalid(mreq, TRUE);
+		size = VTOI(ap->a_vp)->i_size;
+		if (mreq->pindex == OFF_TO_IDX(size))
+			pmap_zero_page_area(mreq, size & PAGE_MASK, PAGE_SIZE);
 		for (i = 0; i < pcount; i++) {
 			if (i != ap->a_reqpage) {
 				vm_page_lock(ap->a_m[i]);

On the other hand, it is not clear whether we should indeed protect
against such a case, or just declare the disk data broken.

>
> I have another problem that is apparently with
> vnode_pager_generic_getpages() and now affects -current from about a
> year ago in an identical way with the old version: mmap() is very slow
> in msdosfs.  cmp uses mmap() too much, and reading files sequentially
> using mmap() is 3.4 times slower than reading them using read() on my
> DVD media/drive.  The i/o seems to be correctly clustered for both.
> with average transaction sizes over 50K but tps much lower for mmap().
> Similarly on a (faster) hard disk except the slowness is not as noticeable
> (drive buffering might hide it completely).  However, for ffs files on
> the hard disk, mmap() is as fast as read().
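
As background for the vfs_bio_clrbuf() changes: a page's valid bits
each cover one DEV_BSIZE-sized chunk, and the mask for a byte range
[sa, ea) within a page is built from the chunk count and the starting
chunk.  The stand-alone sketch below mirrors that expression; the
512-byte chunk size and the names DEMO_DEV_BSIZE and valid_mask() are
assumptions for illustration.

```c
#define	DEMO_DEV_BSIZE	512		/* assumed device block size */

/*
 * Build the valid-bit mask for bytes [sa, ea) of one page, one bit per
 * DEMO_DEV_BSIZE chunk.  Both offsets are taken relative to the start
 * of the page.
 */
unsigned int
valid_mask(int sa, int ea)
{
	int j = sa / DEMO_DEV_BSIZE;	/* index of the first chunk */

	return (((1u << ((ea - sa) / DEMO_DEV_BSIZE)) - 1) << j);
}
```

A range shorter than one chunk, such as the 16 valid bytes at the end
of the file discussed above, yields an empty mask in this sketch, which
is one way to see why chunk-granular zeroing cannot clear the bytes
just past EOF.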

diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
index 6393399..83d3609 100644
--- a/sys/kern/vfs_bio.c
+++ b/sys/kern/vfs_bio.c
@@ -3704,8 +4070,7 @@ vfs_bio_set_valid(struct buf *bp, int base, int size)
 void
 vfs_bio_clrbuf(struct buf *bp)
 {
-	int i, j, mask;
-	caddr_t sa, ea;
+	int i, j, mask, sa, ea, slide;
 
 	if ((bp->b_flags & (B_VMIO | B_MALLOC)) != B_VMIO) {
 		clrbuf(bp);
@@ -3723,39 +4088,69 @@ vfs_bio_clrbuf(struct buf *bp)
 		if ((bp->b_pages[0]->valid & mask) == mask)
 			goto unlock;
 		if ((bp->b_pages[0]->valid & mask) == 0) {
-			bzero(bp->b_data, bp->b_bufsize);
+			pmap_zero_page_area(bp->b_pages[0], 0, bp->b_bufsize);
 			bp->b_pages[0]->valid |= mask;
 			goto unlock;
 		}
 	}
-	ea = sa = bp->b_data;
-	for(i = 0; i < bp->b_npages; i++, sa = ea) {
-		ea = (caddr_t)trunc_page((vm_offset_t)sa + PAGE_SIZE);
-		ea = (caddr_t)(vm_offset_t)ulmin(
-		    (u_long)(vm_offset_t)ea,
-		    (u_long)(vm_offset_t)bp->b_data + bp->b_bufsize);
+	sa = bp->b_offset & PAGE_MASK;
+	slide = 0;
+	for (i = 0; i < bp->b_npages; i++) {
+		slide = imin(slide + PAGE_SIZE, bp->b_bufsize + sa);
+		ea = slide & PAGE_MASK;
+		if (ea == 0)
+			ea = PAGE_SIZE;
 		if (bp->b_pages[i] == bogus_page)
 			continue;
-		j = ((vm_offset_t)sa & PAGE_MASK) / DEV_BSIZE;
+		j = sa / DEV_BSIZE;
 		mask = ((1 << ((ea - sa) / DEV_BSIZE)) - 1) << j;
 		VM_OBJECT_LOCK_ASSERT(bp->b_pages[i]->object, MA_OWNED);
 		if ((bp->b_pages[i]->valid & mask) == mask)
 			continue;
 		if ((bp->b_pages[i]->valid & mask) == 0)
-			bzero(sa, ea - sa);
+			pmap_zero_page_area(bp->b_pages[i], sa, ea - sa);
 		else {
 			for (; sa < ea; sa += DEV_BSIZE, j++) {
-				if ((bp->b_pages[i]->valid & (1 << j)) == 0)
-					bzero(sa, DEV_BSIZE);
+				if ((bp->b_pages[i]->valid & (1 << j)) == 0) {
+					pmap_zero_page_area(bp->b_pages[i],
+					    sa, DEV_BSIZE);
+				}
 			}
 		}
 		bp->b_pages[i]->valid |= mask;
+		sa = 0;
 	}
 unlock:
 	VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
 	bp->b_resid = 0;
 }
 
+void
+vfs_bio_bzero_buf(struct buf *bp, int base, int size)
+{
+	vm_page_t m;
+	int i, n;
+
+	if ((bp->b_flags & B_UNMAPPED) == 0) {
+		BUF_CHECK_MAPPED(bp);
+		bzero(bp->b_data + base, size);
+	} else {
+		BUF_CHECK_UNMAPPED(bp);
+		n = PAGE_SIZE - (base & PAGE_MASK);
+		VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
+		for (i = base / PAGE_SIZE; size > 0 && i < bp->b_npages; ++i) {
+			m = bp->b_pages[i];
+			if (n > size)
+				n = size;
+			pmap_zero_page_area(m, base & PAGE_MASK, n);
+			base += n;
+			size -= n;
+			n = PAGE_SIZE;
+		}
+		VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
+	}
+}
+
 /*
  * vm_hold_load_pages and vm_hold_free_pages get pages into
  * a buffers address space.  The pages are anonymous and are

From owner-freebsd-fs@FreeBSD.ORG Sun Feb 17 14:46:57 2013
Date: Sun, 17 Feb 2013 18:46:50 +0400
From: Konstantin Kuklin
To: freebsd-fs@freebsd.org, pjd@freebsd.org, mm@freebsd.org, zfs-discuss@opensolaris.org
Subject: zfs raid1 error resilvering and mount

Hi, I have a raid1 (mirror) on zfs with 2 devices in the pool.
The first device died, and booting from the second is not working...

I tried booting from an mfsBSD (http://mfsbsd.vx.sk/) flash drive and
running zpool import:
http://puu.sh/2402E

When I load zfs.ko and opensolaris.ko I see this message:
Solaris: WARNING: Can't open objset for zroot/var/crash
Solaris: WARNING: Can't open objset for zroot/var/crash

zpool status:
http://puu.sh/2405f

The resilvering freezes with:
zpool status -v
.............
zroot/usr:<0x28ff>
zroot/usr:<0x29ff>
zroot/usr:<0x2aff>
zroot/var/crash:<0x0>
root@Flash:/root #

How can I delete or drop the fs zroot/var/crash (1m-10m in size, I don't
remember exactly) and mount the other zfs mount points with my data?
-- 
Best regards,
Konstantin Kuklin.

From owner-freebsd-fs@FreeBSD.ORG Sun Feb 17 15:49:21 2013
Date: Sun, 17 Feb 2013 16:49:16 +0100
From: Fleuriot Damien
To: Konstantin Kuklin
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
Subject: Re: zfs raid1 error resilvering and mount
Message-Id: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>

Hmmm, zfs destroy -f zroot/var/crash ?

Then you can try to zfs mount -a

Removing pjd and mm from cc, if they want to read your message they're
old enough to check their ML subscription.

On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin wrote:

> hi, i have raid1 on zfs with 2 device on pool
> first device died and boot from second not working...
>
> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool import
> http://puu.sh/2402E
>
> when i load zfs.ko and opensolaris.ko i see this message:
> Solaris: WARNING: Can't open objset for zroot/var/crash
> Solaris: WARNING: Can't open objset for zroot/var/crash
>
> zpool status:
> http://puu.sh/2405f
>
> resilvering freeze with:
> zpool status -v
> .............
> zroot/usr:<0x28ff>
> zroot/usr:<0x29ff>
> zroot/usr:<0x2aff>
> zroot/var/crash:<0x0>
> root@Flash:/root #
>
> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn`t
> remember) and mount other zfs points with my data
> --
> Best regards,
> Konstantin Kuklin.
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 07:48:37 2013
Date: Mon, 18 Feb 2013 11:48:29 +0400
From: Konstantin Kuklin
To: Fleuriot Damien
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
Subject: Re: zfs raid1 error resilvering and mount
In-Reply-To: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>
References: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>

I can't do it, because resilvering is in progress (frozen at 0.1%) and
zfs list is empty.

2013/2/17 Fleuriot Damien :
> Hmmm, zfs destroy -f zroot/var/crash ?
>
> Then you can try to zfs mount -a
>
> Removing pjd and mm from cc, if they want to read your message they're
> old enough to check their ML subscription.
>
> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin wrote:
>
>> hi, i have raid1 on zfs with 2 device on pool
>> first device died and boot from second not working...
>>
>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool import
>> http://puu.sh/2402E
>>
>> when i load zfs.ko and opensolaris.ko i see this message:
>> Solaris: WARNING: Can't open objset for zroot/var/crash
>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>
>> zpool status:
>> http://puu.sh/2405f
>>
>> resilvering freeze with:
>> zpool status -v
>> .............
>> zroot/usr:<0x28ff>
>> zroot/usr:<0x29ff>
>> zroot/usr:<0x2aff>
>> zroot/var/crash:<0x0>
>> root@Flash:/root #
>>
>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn`t
>> remember) and mount other zfs points with my data
>> --
>> Best regards,
>> Konstantin Kuklin.
>

-- 
Best regards,
Konstantin Kuklin.

From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 09:20:37 2013
Date: Mon, 18 Feb 2013 10:20:26 +0100
From: Fleuriot Damien
To: Konstantin Kuklin
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
Subject: Re: zfs raid1 error resilvering and mount
References: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd>

Reassure me here, you've replaced your failed vdev before trying to
resilver right ?

Your zpool status suggests otherwise, so I only want to make sure this
is a status from before replacing your drive.

On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin wrote:

> i can`t do it, because resilvering in progress(freeze on 0.1%) and zfs
> list empty
>
> 2013/2/17 Fleuriot Damien :
>> Hmmm, zfs destroy -f zroot/var/crash ?
>>
>> Then you can try to zfs mount -a
>>
>> Removing pjd and mm from cc, if they want to read your message
>> they're old enough to check their ML subscription.
>>
>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin wrote:
>>
>>> hi, i have raid1 on zfs with 2 device on pool
>>> first device died and boot from second not working...
>>>
>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool import
>>> http://puu.sh/2402E
>>>
>>> when i load zfs.ko and opensolaris.ko i see this message:
>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>
>>> zpool status:
>>> http://puu.sh/2405f
>>>
>>> resilvering freeze with:
>>> zpool status -v
>>> .............
>>> zroot/usr:<0x28ff> >>> zroot/usr:<0x29ff> >>> zroot/usr:<0x2aff> >>> zroot/var/crash:<0x0> >>> root@Flash:/root # >>>=20 >>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn`t >>> remember) and mount other zfs points with my data >>> -- >>> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC >>> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82=D0= =B0=D0=BD=D1=82=D0=B8=D0=BD. >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to = "freebsd-fs-unsubscribe@freebsd.org" >>=20 >=20 >=20 >=20 > -- > =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC > =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82=D0=B0= =D0=BD=D1=82=D0=B8=D0=BD. From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 11:06:44 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 149CD1C4 for ; Mon, 18 Feb 2013 11:06:44 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id ECC42E1F for ; Mon, 18 Feb 2013 11:06:43 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1IB6h2t061518 for ; Mon, 18 Feb 2013 11:06:43 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1IB6hFQ061516 for freebsd-fs@FreeBSD.org; Mon, 18 Feb 2013 11:06:43 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 18 Feb 2013 11:06:43 GMT Message-Id: <201302181106.r1IB6hFQ061516@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: 
freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2013 11:06:44 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/176179 fs [nfs] nfs client KASSERT: panic: attempt to set TDF_SB o kern/176141 fs [zfs] sharesmb=on makes errors for sharenfs, and still o kern/175950 fs [zfs] Possible deadlock in zfs after long uptime o kern/175897 fs [zfs] operations on readonly zpool hang o kern/175179 fs [zfs] ZFS may attach wrong device on move o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov o kern/174372 fs [zfs] Pagefault appears to be related to ZFS o kern/174315 fs [zfs] chflags uchg not supported o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption o kern/174060 fs [ext2fs] Ext2FS system crashes (buffer overflow?) 
o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio o kern/173718 fs [zfs] phantom directory in zraid2 pool f kern/173657 fs [nfs] strange UID map with nfsuserd o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo o kern/173136 fs [unionfs] mounting above the NFS read-only share panic o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o kern/167105 fs [nfs] mount_nfs can not handle source 
exports wiht mor o kern/167067 fs [zfs] [panic] ZFS panics the server o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. o kern/165950 fs [ffs] SU+J and fsck problem o kern/165923 fs [nfs] Writing to NFS-backed mmapped files fails if flu o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/162362 fs [snapshots] [panic] ufs with snapshot(s) panics when g o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161864 fs [ufs] removing journaling from UFS partition fails on o bin/161807 fs [patch] add option for explicitly specifying metadata o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] 
zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic o kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o 
kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two 
drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... 
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style 
changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 
fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 300 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 19:05:59 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5A78243E; Mon, 18 Feb 2013 19:05:59 +0000 (UTC) (envelope-from eadler@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 3603EFD3; Mon, 18 Feb 2013 19:05:59 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1IJ5xFD056138; Mon, 18 Feb 2013 19:05:59 GMT (envelope-from eadler@freefall.freebsd.org) Received: (from eadler@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1IJ5xID056134; Mon, 18 Feb 2013 19:05:59 GMT (envelope-from eadler) Date: Mon, 18 Feb 2013 19:05:59 GMT Message-Id: <201302181905.r1IJ5xID056134@freefall.freebsd.org> To: eadler@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: eadler@FreeBSD.org Subject: Re: bin/176253: zfs pool indentation is misleading/wrong X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2013 19:05:59 -0000 Synopsis: zfs pool indentation is misleading/wrong Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: eadler Responsible-Changed-When: Mon Feb 18 19:05:40 UTC 2013 Responsible-Changed-Why: over to appropriate list http://www.freebsd.org/cgi/query-pr.cgi?pr=176253 From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 19:10:01 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 59CD85DE for ; Mon, 18 Feb 2013 19:10:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from 
freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 470FDAD for ; Mon, 18 Feb 2013 19:10:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1IJA0vk056295 for ; Mon, 18 Feb 2013 19:10:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1IJA0ES056291; Mon, 18 Feb 2013 19:10:00 GMT (envelope-from gnats) Date: Mon, 18 Feb 2013 19:10:00 GMT Message-Id: <201302181910.r1IJA0ES056291@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Nathan Rich Subject: Re: misc/176253: zfs pool indentation is misleading/wrong X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Nathan Rich List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2013 19:10:01 -0000 The following reply was made to PR bin/176253; it has been noted by GNATS. From: Nathan Rich To: bug-followup@freebsd.org, Nathan.Rich@dynastysystems.com Cc: Subject: Re: misc/176253: zfs pool indentation is misleading/wrong Date: Mon, 18 Feb 2013 12:03:28 -0700 --e89a8f921a1ea22c3d04d60462c0 Content-Type: text/plain; charset=ISO-8859-1 Indentation doesn't show up in the PR, so here is clarification: the cache section is presented as a top-level item, as if it were a different zpool.
--e89a8f921a1ea22c3d04d60462c0-- From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 19:12:49 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id EE689677 for ; Mon, 18 Feb 2013 19:12:49 +0000 (UTC) (envelope-from jmg@h2.funkthat.com) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) by mx1.freebsd.org (Postfix) with ESMTP id ACD39C9 for ; Mon, 18 Feb 2013 19:12:49 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id r1IJChGE024579 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 18 Feb 2013 11:12:43 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id r1IJChgf024578 for freebsd-fs@FreeBSD.org; Mon, 18 Feb 2013 11:12:43 -0800 (PST) (envelope-from jmg) Date: Mon, 18 Feb 2013 11:12:42 -0800 From: John-Mark Gurney To: freebsd-fs@FreeBSD.org Subject: ZFS on 9.1 doesn't see errors on geli volumes... Message-ID: <20130218191242.GI55866@funkthat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? 
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Mon, 18 Feb 2013 11:12:43 -0800 (PST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2013 19:12:50 -0000 I'm running 9.1: FreeBSD gold.funkthat.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #26 r241041M: Wed Dec 12 23:02:31 PST 2012 jmg@gold.funkthat.com:/usr/src.9stable/sys/amd64/compile/gold amd64 The modifications are limited to improving AES-NI performance. On a box, and decided to go full ZFS w/ geli encrypted volumes (including root fs)... One of the hard drives started going bad, so I started seeing: hpt27xx: Device error information 0x1000000 hpt27xx: Task file error, StatusReg=0x51, ErrReg=0x40, LBA[0-3]=0xf495e928,LBA[4-7]=0x0. (da3:hpt27xx0:0:3:0): READ(10). CDB: 28 0 f4 95 e8 f8 0 0 80 0 (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed (da3:hpt27xx0:0:3:0): Error 5, Unretryable error GEOM_ELI: g_eli_read_done() failed label/toby.eli[READ(offset=2100974186496, length=90112)] and: (da3:hpt27xx0:0:3:0): WRITE(10). CDB: 2a 0 ef cc 10 90 0 0 8 0 (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed (da3:hpt27xx0:0:3:0): Error 5, Unretryable error GEOM_ELI: Crypto WRITE request failed (error=5). label/toby.eli[WRITE(offset=2059841654784, length=4096)] So we can see that geli is failing, but zpool status command doesn't show any errors at all... The READ and WRITE columns both show 0 for the device.. Now I do know that the WRITEs are not retried, because if I do a scrub afterward, it detects cksum errors, and does properly increases the count in the CKSUM column... Now if I pull a device, it will see that the device is lost, but no matter how many read or write errors get returned by geli, zfs doesn't seem to count them... Has anyone else seen this w/ ZFS? 
Is it possible that it's a problem w/ geli, and not ZFS? I haven't tried to run a test w/ gnop to fail some read/writes on -current.. P.S. Please keep me cc'd, as I'm not on the list. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 20:01:21 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EEA29538; Mon, 18 Feb 2013 20:01:21 +0000 (UTC) (envelope-from jmg@h2.funkthat.com) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) by mx1.freebsd.org (Postfix) with ESMTP id B04B52AD; Mon, 18 Feb 2013 20:01:21 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id r1IK1LTl025235 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 18 Feb 2013 12:01:21 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id r1IK1L1W025234; Mon, 18 Feb 2013 12:01:21 -0800 (PST) (envelope-from jmg) Date: Mon, 18 Feb 2013 12:01:21 -0800 From: John-Mark Gurney To: freebsd-fs@FreeBSD.org Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes... Message-ID: <20130218200121.GJ55866@funkthat.com> References: <20130218191242.GI55866@funkthat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130218191242.GI55866@funkthat.com> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? 
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Mon, 18 Feb 2013 12:01:21 -0800 (PST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2013 20:01:22 -0000 John-Mark Gurney wrote this message on Mon, Feb 18, 2013 at 11:12 -0800: > I'm running 9.1: > FreeBSD gold.funkthat.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #26 r241041M: Wed Dec 12 23:02:31 PST 2012 jmg@gold.funkthat.com:/usr/src.9stable/sys/amd64/compile/gold amd64 > > The modifications are limited to improving AES-NI performance. > > On a box, and decided to go full ZFS w/ geli encrypted volumes (including > root fs)... One of the hard drives started going bad, so I started > seeing: > hpt27xx: Device error information 0x1000000 > hpt27xx: Task file error, StatusReg=0x51, ErrReg=0x40, LBA[0-3]=0xf495e928,LBA[4-7]=0x0. > (da3:hpt27xx0:0:3:0): READ(10). CDB: 28 0 f4 95 e8 f8 0 0 80 0 > (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed > (da3:hpt27xx0:0:3:0): Error 5, Unretryable error > GEOM_ELI: g_eli_read_done() failed label/toby.eli[READ(offset=2100974186496, length=90112)] > > and: > (da3:hpt27xx0:0:3:0): WRITE(10). CDB: 2a 0 ef cc 10 90 0 0 8 0 > (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed > (da3:hpt27xx0:0:3:0): Error 5, Unretryable error > GEOM_ELI: Crypto WRITE request failed (error=5). label/toby.eli[WRITE(offset=2059841654784, length=4096)] > > So we can see that geli is failing, but zpool status command doesn't show > any errors at all... The READ and WRITE columns both show 0 for the device.. > > Now I do know that the WRITEs are not retried, because if I do a scrub > afterward, it detects cksum errors, and does properly increases the > count in the CKSUM column... 
>
> Now if I pull a device, it will see that the device is lost, but no
> matter how many read or write errors get returned by geli, zfs doesn't
> seem to count them...
>
> Has anyone else seen this w/ ZFS? Is it possible that it's a problem w/
> geli, and not ZFS?
>
> I haven't tried to run a test w/ gnop to fail some read/writes on -current..
>
> P.S. Please keep me cc'd, as I'm not on the list.

Well, after some digging w/ help from smh@, it looks like the write
case in geli is broken... In g_eli_write_done, we have the code:

	if (pbp->bio_error != 0) {
		G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
		    pbp->bio_error);
		pbp->bio_completed = 0;
	}
	/*
	 * Write is finished, send it up.
	 */
	pbp->bio_completed = pbp->bio_length;
	sc = pbp->bio_to->geom->softc;
	g_io_deliver(pbp, pbp->bio_error);
	atomic_subtract_int(&sc->sc_inflight, 1);

So the bio_completed = 0 set in the error path is always overwritten
with bio_length right afterwards...

pjd, should we just put the bio_completed = line under an else?

Something like:

	if (pbp->bio_error != 0) {
		G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
		    pbp->bio_error);
		pbp->bio_completed = 0;
	} else
		pbp->bio_completed = pbp->bio_length;

	/* Write is finished, send it up. */
	g_io_deliver(pbp, pbp->bio_error);
	sc = pbp->bio_to->geom->softc;
	atomic_subtract_int(&sc->sc_inflight, 1);

But that doesn't explain why reads aren't being counted, though...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
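The overwrite described in the message above can be reproduced outside the kernel. What follows is only a minimal userland sketch, assuming nothing beyond the three fields named in the quoted code; struct fake_bio and the function names are hypothetical stand-ins, not the real struct bio or geli code.

```c
#include <errno.h>

/* Hypothetical stand-in for the three struct bio fields discussed above. */
struct fake_bio {
	int	bio_error;	/* 0 on success, an errno value on failure */
	long	bio_length;	/* number of bytes requested */
	long	bio_completed;	/* number of bytes reported as done */
};

/* Mirrors the quoted code: the unconditional assignment always wins. */
void
finish_write_buggy(struct fake_bio *pbp)
{
	if (pbp->bio_error != 0)
		pbp->bio_completed = 0;
	pbp->bio_completed = pbp->bio_length;
}

/* Mirrors the proposed change: bio_length is reported only on success. */
void
finish_write_fixed(struct fake_bio *pbp)
{
	if (pbp->bio_error != 0) {
		pbp->bio_completed = 0;
	} else {
		pbp->bio_completed = pbp->bio_length;
	}
}

/* Run one failed 4096-byte write through the given finish routine. */
long
completed_after_failure(void (*finish)(struct fake_bio *))
{
	struct fake_bio b = {
		.bio_error = EIO,
		.bio_length = 4096,
		.bio_completed = 0
	};

	finish(&b);
	return (b.bio_completed);
}
```

In this model a failed write still reports the full length as completed in the first variant and reports zero in the second. The error code itself is left untouched in both variants, so the sketch says nothing about how a consumer such as ZFS interprets the two fields.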
From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 21:14:32 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 89B4C969 for ; Mon, 18 Feb 2013 21:14:32 +0000 (UTC) (envelope-from smh@freebsd.org) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 0715C80F for ; Mon, 18 Feb 2013 21:14:31 +0000 (UTC) X-Spam-Processed: mail1.multiplay.co.uk, Mon, 18 Feb 2013 21:14:22 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002285358.msg for ; Mon, 18 Feb 2013 21:14:22 +0000 X-MDRemoteIP: 188.220.16.49 X-Return-Path: smh@freebsd.org X-Envelope-From: smh@freebsd.org X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.org Message-ID: From: "Steven Hartland" To: "John-Mark Gurney" , References: <20130218191242.GI55866@funkthat.com> <20130218200121.GJ55866@funkthat.com> Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes... Date: Mon, 18 Feb 2013 21:14:35 -0000 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0B02_01CE0E1C.F42EBD20" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2013 21:14:32 -0000 This is a multi-part message in MIME format. 
------=_NextPart_000_0B02_01CE0E1C.F42EBD20 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit ----- Original Message ----- From: "John-Mark Gurney" To: Sent: Monday, February 18, 2013 8:01 PM Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes... > John-Mark Gurney wrote this message on Mon, Feb 18, 2013 at 11:12 -0800: >> I'm running 9.1: >> FreeBSD gold.funkthat.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #26 r241041M: Wed Dec 12 23:02:31 PST 2012 >> jmg@gold.funkthat.com:/usr/src.9stable/sys/amd64/compile/gold amd64 >> >> The modifications are limited to improving AES-NI performance. >> >> On a box, and decided to go full ZFS w/ geli encrypted volumes (including >> root fs)... One of the hard drives started going bad, so I started >> seeing: >> hpt27xx: Device error information 0x1000000 >> hpt27xx: Task file error, StatusReg=0x51, ErrReg=0x40, LBA[0-3]=0xf495e928,LBA[4-7]=0x0. >> (da3:hpt27xx0:0:3:0): READ(10). CDB: 28 0 f4 95 e8 f8 0 0 80 0 >> (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed >> (da3:hpt27xx0:0:3:0): Error 5, Unretryable error >> GEOM_ELI: g_eli_read_done() failed label/toby.eli[READ(offset=2100974186496, length=90112)] >> >> and: >> (da3:hpt27xx0:0:3:0): WRITE(10). CDB: 2a 0 ef cc 10 90 0 0 8 0 >> (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed >> (da3:hpt27xx0:0:3:0): Error 5, Unretryable error >> GEOM_ELI: Crypto WRITE request failed (error=5). label/toby.eli[WRITE(offset=2059841654784, length=4096)] >> >> So we can see that geli is failing, but zpool status command doesn't show >> any errors at all... The READ and WRITE columns both show 0 for the device.. >> >> Now I do know that the WRITEs are not retried, because if I do a scrub >> afterward, it detects cksum errors, and does properly increases the >> count in the CKSUM column... 
>>
>> Now if I pull a device, it will see that the device is lost, but no
>> matter how many read or write errors get returned by geli, zfs doesn't
>> seem to count them...
>>
>> Has anyone else seen this w/ ZFS? Is it possible that it's a problem w/
>> geli, and not ZFS?
>>
>> I haven't tried to run a test w/ gnop to fail some read/writes on -current..
>>
>> P.S. Please keep me cc'd, as I'm not on the list.
>
> Well, after some digging w/ help from smh@, it looks like the write
> case in geli is broken... in g_eli_write_done, we have the code:
>         if (pbp->bio_error != 0) {
>                 G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
>                     pbp->bio_error);
>                 pbp->bio_completed = 0;
>         }
>         /*
>          * Write is finished, send it up.
>          */
>         pbp->bio_completed = pbp->bio_length;
>         sc = pbp->bio_to->geom->softc;
>         g_io_deliver(pbp, pbp->bio_error);
>         atomic_subtract_int(&sc->sc_inflight, 1);
>
> so, we just end up overwriting the bio_completed error...
>
> pjd, should we just put the bio_completed = line under an else?
>
> something like:
>         if (pbp->bio_error != 0) {
>                 G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
>                     pbp->bio_error);
>                 pbp->bio_completed = 0;
>         } else
>                 pbp->bio_completed = pbp->bio_length;
>
>         /* Write is finished, send it up. */
>         g_io_deliver(pbp, pbp->bio_error);
>         sc = pbp->bio_to->geom->softc;
>         atomic_subtract_int(&sc->sc_inflight, 1);
>
> But doesn't explain why read's aren't being counted though...

It looks like the read case will lose the error if it's not the last bio
in the sector group.

The attached patch should fix both cases.

A question for someone familiar with geom: why is bio_completed not set to
bio_length in the read success case? Is this correct, or is this another
little bug?

On a related note, if anyone's got some pointers to docs about the internals
of geom, I'd be interested :)

    Regards
    Steve
------=_NextPart_000_0B02_01CE0E1C.F42EBD20
Content-Type: application/octet-stream; name="g_eli-error-loss.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="g_eli-error-loss.patch"

--- sys/geom/eli/g_eli.c.orig	2013-02-18 20:50:53.838663732 +0000
+++ sys/geom/eli/g_eli.c	2013-02-18 21:04:59.429602837 +0000
@@ -158,7 +158,7 @@
 
 	G_ELI_LOGREQ(2, bp, "Request done.");
 	pbp = bp->bio_parent;
-	if (pbp->bio_error == 0)
+	if (pbp->bio_error == 0 && bp->bio_error != 0)
 		pbp->bio_error = bp->bio_error;
 	g_destroy_bio(bp);
 	/*
@@ -169,7 +169,8 @@
 		return;
 	sc = pbp->bio_to->geom->softc;
 	if (pbp->bio_error != 0) {
-		G_ELI_LOGREQ(0, pbp, "%s() failed", __func__);
+		G_ELI_LOGREQ(0, pbp, "%s() failed (error=%d)", __func__,
+		    pbp->bio_error);
 		pbp->bio_completed = 0;
 		if (pbp->bio_driver2 != NULL) {
 			free(pbp->bio_driver2, M_ELI);
@@ -198,10 +199,8 @@
 
 	G_ELI_LOGREQ(2, bp, "Request done.");
 	pbp = bp->bio_parent;
-	if (pbp->bio_error == 0) {
-		if (bp->bio_error != 0)
-			pbp->bio_error = bp->bio_error;
-	}
+	if (pbp->bio_error == 0 && bp->bio_error != 0)
+		pbp->bio_error = bp->bio_error;
 	g_destroy_bio(bp);
 	/*
 	 * Do we have all sectors already?
@@ -212,14 +211,15 @@
 	free(pbp->bio_driver2, M_ELI);
 	pbp->bio_driver2 = NULL;
 	if (pbp->bio_error != 0) {
-		G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
+		G_ELI_LOGREQ(0, pbp, "%s() failed (error=%d)", __func__,
 		    pbp->bio_error);
 		pbp->bio_completed = 0;
-	}
+	} else
+		pbp->bio_completed = pbp->bio_length;
+
 	/*
 	 * Write is finished, send it up.
 	 */
-	pbp->bio_completed = pbp->bio_length;
 	sc = pbp->bio_to->geom->softc;
 	g_io_deliver(pbp, pbp->bio_error);
 	atomic_subtract_int(&sc->sc_inflight, 1);
------=_NextPart_000_0B02_01CE0E1C.F42EBD20--

From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 22:38:18 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C2F2D10C for ; Mon, 18 Feb 2013 22:38:18 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (garage.dawidek.net [91.121.88.72]) by mx1.freebsd.org (Postfix) with ESMTP id 5F07AAFC for ; Mon, 18 Feb 2013 22:38:18 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id F03E6408; Mon, 18 Feb 2013 23:35:22 +0100 (CET) Date: Mon, 18 Feb 2013 23:39:23 +0100 From: Pawel Jakub Dawidek To: John-Mark Gurney Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes... Message-ID: <20130218223923.GB1375@garage.freebsd.pl> References: <20130218191242.GI55866@funkthat.com> <20130218200121.GJ55866@funkthat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="rS8CxjVDS/+yyDmU" Content-Disposition: inline In-Reply-To: <20130218200121.GJ55866@funkthat.com> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2013 22:38:18 -0000 --rS8CxjVDS/+yyDmU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Feb 18, 2013 at 12:01:21PM -0800, John-Mark Gurney wrote: > John-Mark Gurney wrote this message on Mon, Feb 18, 2013 at 11:12 -0800: > > I'm running 9.1: > > FreeBSD gold.funkthat.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #26 r24= 
1041M: Wed Dec 12 23:02:31 PST 2012 jmg@gold.funkthat.com:/usr/src.9sta= ble/sys/amd64/compile/gold amd64 > >=20 > > The modifications are limited to improving AES-NI performance. > >=20 > > On a box, and decided to go full ZFS w/ geli encrypted volumes (includi= ng > > root fs)... One of the hard drives started going bad, so I started > > seeing: > > hpt27xx: Device error information 0x1000000 > > hpt27xx: Task file error, StatusReg=3D0x51, ErrReg=3D0x40, LBA[0-3]=3D0= xf495e928,LBA[4-7]=3D0x0. > > (da3:hpt27xx0:0:3:0): READ(10). CDB: 28 0 f4 95 e8 f8 0 0 80 0=20 > > (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed > > (da3:hpt27xx0:0:3:0): Error 5, Unretryable error > > GEOM_ELI: g_eli_read_done() failed label/toby.eli[READ(offset=3D2100974= 186496, length=3D90112)] > >=20 > > and: > > (da3:hpt27xx0:0:3:0): WRITE(10). CDB: 2a 0 ef cc 10 90 0 0 8 0=20 > > (da3:hpt27xx0:0:3:0): CAM status: Auto-Sense Retrieval Failed > > (da3:hpt27xx0:0:3:0): Error 5, Unretryable error > > GEOM_ELI: Crypto WRITE request failed (error=3D5). label/toby.eli[WRITE= (offset=3D2059841654784, length=3D4096)] > >=20 > > So we can see that geli is failing, but zpool status command doesn't sh= ow > > any errors at all... The READ and WRITE columns both show 0 for the de= vice.. > >=20 > > Now I do know that the WRITEs are not retried, because if I do a scrub > > afterward, it detects cksum errors, and does properly increases the > > count in the CKSUM column... > >=20 > > Now if I pull a device, it will see that the device is lost, but no > > matter how many read or write errors get returned by geli, zfs doesn't > > seem to count them... > >=20 > > Has anyone else seen this w/ ZFS? Is it possible that it's a problem w/ > > geli, and not ZFS? > >=20 > > I haven't tried to run a test w/ gnop to fail some read/writes on -curr= ent.. > >=20 > > P.S. Please keep me cc'd, as I'm not on the list. 
>=20 > Well, after some digging w/ help from smh@, it looks like the write > case in geli is broken... in g_eli_write_done, we have the code: > if (pbp->bio_error !=3D 0) { > G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error= =3D%d).", > pbp->bio_error); > pbp->bio_completed =3D 0; > } > /* > * Write is finished, send it up. > */ > pbp->bio_completed =3D pbp->bio_length; > sc =3D pbp->bio_to->geom->softc; > g_io_deliver(pbp, pbp->bio_error); > atomic_subtract_int(&sc->sc_inflight, 1); >=20 > so, we just end up overwriting the bio_completed error... >=20 > pjd, should we just put the bio_completed =3D line under an else? >=20 > something like: > if (pbp->bio_error !=3D 0) { > G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=3D%d).", > pbp->bio_error); > pbp->bio_completed =3D 0; > } else > pbp->bio_completed =3D pbp->bio_length; >=20 > /* Write is finished, send it up. */ > g_io_deliver(pbp, pbp->bio_error); > sc =3D pbp->bio_to->geom->softc; > atomic_subtract_int(&sc->sc_inflight, 1); >=20 > But doesn't explain why read's aren't being counted though... Your patch looks correct (but add { } around else content before committing). The logic in vdev_geom.c should also be modified to treat other errors just like EIOs. This all still doesn't explain what you are seeing, as you did have EIOs. Experimenting with gnop may provide more info. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! 
http://tupytaj.pl --rS8CxjVDS/+yyDmU Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlEirZsACgkQForvXbEpPzT9/ACgz5v6IEAct2VE77NxS6TBo+YP IWoAn3EAyT2rnsSKSIGacppcyFc193iM =oeJ/ -----END PGP SIGNATURE----- --rS8CxjVDS/+yyDmU-- From owner-freebsd-fs@FreeBSD.ORG Mon Feb 18 23:57:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 71C2D275 for ; Mon, 18 Feb 2013 23:57:00 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 1C936E64 for ; Mon, 18 Feb 2013 23:56:59 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEANC+IlGDaFvO/2dsb2JhbABEhkm5W4Ebc4IfAQEBAwEBAQEgKyALBRYYAgINGQIpAQkmBggHBAEcBIdrBgyueJI2gSOMOhAGBIEDNAeCLYETA4hniw2COIEdjzuDJU99CBce X-IronPort-AV: E=Sophos;i="4.84,691,1355115600"; d="scan'208";a="14644626" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 18 Feb 2013 18:56:58 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id F2B75B3F45; Mon, 18 Feb 2013 18:56:58 -0500 (EST) Date: Mon, 18 Feb 2013 18:56:58 -0500 (EST) From: Rick Macklem To: Momchil Ivanov Message-ID: <1794994447.3103158.1361231818953.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <86bobtmvb0.wl%momchil@xaxo.eu> Subject: Re: NFS + Kerberos MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Mon, 18 Feb 2013 23:57:00 -0000 Monchil Ivanov wrote: > Hello, > > I have been trying to follow this guide [1] to get NFS with Kerberos > working on FreeBSD, but I have some trouble. I hope somebody has the > time and desire to help me... > > I am using FreeBSD 9.1 as NFS server with the following configuration > on the server: > > file /etc/krb5.conf: > > [libdefaults] > default_realm = EXAMPLE.LOCAL > default_etypes = des-cbc-crc > default_etypes_des = des-cbc-crc > allow_weak_crypto = true > [realms] > EXAMPLE.LOCAL = { > kdc = kerberos.example.local > admin_server = kerberos.example.local > } > [domain_realm] > .example.local = EXAMPLE.LOCAL > > file /etc/exports: > > V4: / -sec=krb5i:krb5p > /tank/storage -sec=krb5i:krb5p > > file /etc/rc.conf: > > ## nfsv4 > nfs_server_enable="YES" > nfsv4_server_enable="YES" > nfsuserd_enable="YES" > mountd_enable="YES" > mountd_flags="-r -n" > > # for kerberos > gssd_enable="YES" > > kerberos seems to be working: > > root@srv:/root # kinit -k nfs/srv.example.local > root@srv:/root # klist > Credentials cache: FILE:/tmp/krb5cc_0 > Principal: nfs/srv.example.local@EXAMPLE.LOCAL > > Issued Expires Principal > Feb 2 21:04:02 Feb 3 07:04:02 krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL > root@srv:/root # kdestroy > root@srv:/root # ktutil list > FILE:/etc/krb5.keytab: > > Vno Type Principal > 1 des-cbc-crc nfs/srv.example.local@EXAMPLE.LOCAL > > krb4:/etc/srvtab: > > Vno Type Principal > > the client is FreeBSD 8.2 with the following configuration: > > file /etc/krb5.conf: > > [libdefaults] > default_realm = EXAMPLE.LOCAL > default_etypes = des-cbc-crc > default_etypes_des = des-cbc-crc > allow_weak_crypto = true > [realms] > EXAMPLE.LOCAL = { > kdc = kerberos.example.local > admin_server = kerberos.example.local > } > [domain_realm] > .example.local = EXAMPLE.LOCAL > > file /etc/rc.conf: > > ## NFS v4 > nfsuserd_enable="YES" > nfscbd_enable="YES" > # kerberos > gssd_enable="YES" > > file 
/etc/sysctl.conf: > # Allow normal users to mount filesystems. > vfs.usermount=1 > > here is the output from the client: > > $ klist > klist: No ticket file: /tmp/krb5cc_1001 > > $ mount -t nfs -o nfsv4,soft,sec=krb5i srv.example.local:/tank/storage > /mnt/srv > mount_nfs: can't update /var/db/mounttab for > srv.example.local:/tank/storage > nfsv4 err=10016 > mount_nfs: /mnt/srv, : Input/output error > > then I do: > > $ kinit user > $ klist > Credentials cache: FILE:/tmp/krb5cc_1001 > Principal: user@EXAMPLE.LOCAL > > Issued Expires Principal > Feb 2 21:15:36 Feb 3 07:15:33 krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL > > $ mount -t nfs -o nfsv4,soft,sec=krb5i srv.example.local:/tank/storage > /mnt/srv > mount_nfs: can't update /var/db/mounttab for > srv.example.local:/tank/storage > nfsv4 err=10016 > mount_nfs: /mnt/srv, : Input/output error > > $ klist > Credentials cache: FILE:/tmp/krb5cc_1001 > Principal: user@EXAMPLE.LOCAL > > Issued Expires Principal > Feb 2 21:15:36 Feb 3 07:15:33 krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL > Feb 2 21:15:43 Feb 3 07:15:33 nfs/srv.example.local@EXAMPLE.LOCAL > > Note: the mount works without Kerberos if I add "sys" to the "sec" > option on both lines of /etc/exports, ownership works too, therefore I > think that nfsv4 works, nfsv3 works too. However I have no idea why > they don't work with Kerberos. > > Note: With and without a kerberos ticket, the result when using nfsv3 > is: > > $ mount -t nfs -o nfsv3,soft,sec=krb5i srv.example.local:/tank/storage > /mnt/srv > mount_nfs: can't update /var/db/mounttab for > srv.example.local:/tank/storage > > $ ls /mnt/srv > ls: /mnt/srv: Permission denied > > Is there an easy way to get it working? Am I doing something wrong? > Thanks to Elias's hard work, a bug/fix has just been isolated in the Kerberos library that causes the gssd to fail to translate a principal to a uid. The fix is to increase the size of the buffer passed to getpwnam_r(). 
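
As an aside on why a fixed-size buffer can break such a lookup: getpwnam_r() reports ERANGE when the caller's buffer is too small for the passwd entry. The following is a minimal sketch of the usual retry pattern, not the patch from the thread mentioned here; the helper name name_to_uid, the 128-byte starting size and the 1 MB cap are all assumptions made for illustration.

```c
#include <sys/types.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>

/*
 * Hypothetical helper: look up a user name and store its uid in *uidp.
 * Returns 0 on success, ENOENT if there is no such user, or another
 * errno value on failure. *uidp is only written on success.
 */
int
name_to_uid(const char *name, uid_t *uidp)
{
	struct passwd pwd, *result;
	size_t buflen = 128;	/* assumed, deliberately small start */
	char *buf = NULL, *nbuf;
	int error;

	for (;;) {
		nbuf = realloc(buf, buflen);
		if (nbuf == NULL) {
			free(buf);
			return (ENOMEM);
		}
		buf = nbuf;
		result = NULL;
		error = getpwnam_r(name, &pwd, buf, buflen, &result);
		/* ERANGE means the entry did not fit; grow and retry. */
		if (error != ERANGE || buflen >= 1024 * 1024)
			break;
		buflen *= 2;
	}
	/* A zero return with a NULL result means "no such user". */
	if (error == 0 && result == NULL)
		error = ENOENT;
	if (error == 0)
		*uidp = result->pw_uid;
	free(buf);
	return (error);
}
```

Growing the buffer on ERANGE avoids having to guess one size that fits every passwd entry, at the cost of a few extra lookups for unusually long entries.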
See this thread: http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw I haven't run into this bug, so I don't know what systems are affected, but it would explain why you can't get it working. I'd suggest you apply the patch in the email (increase buf to 1024) and then try again with libraries built with the patch. rick > PS: Please CC me, since I am not subscribed. > > 1: http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup > > Regards, > Momchil > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 07:08:36 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CCF6D7D4; Tue, 19 Feb 2013 07:08:36 +0000 (UTC) (envelope-from alfred@ixsystems.com) Received: from mail.iXsystems.com (newknight.ixsystems.com [206.40.55.70]) by mx1.freebsd.org (Postfix) with ESMTP id 8E9DF99; Tue, 19 Feb 2013 07:08:36 +0000 (UTC) Received: from localhost (mail.ixsystems.com [10.2.55.1]) by mail.iXsystems.com (Postfix) with ESMTP id DB4BF80FF; Mon, 18 Feb 2013 23:08:35 -0800 (PST) Received: from mail.iXsystems.com ([10.2.55.1]) by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP id 90308-08; Mon, 18 Feb 2013 23:08:35 -0800 (PST) Received: from Alfreds-MacBook-Pro-9.local (unknown [10.8.0.26]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.iXsystems.com (Postfix) with ESMTPSA id 617B280EC; Mon, 18 Feb 2013 23:08:35 -0800 (PST) Message-ID: <512324F2.4060707@ixsystems.com> Date: Mon, 18 Feb 2013 23:08:34 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130107 Thunderbird/17.0.2 MIME-Version: 1.0 To: 
Konstantin Belousov , Doug Rabson , Xin Li , fs@freebsd.org Subject: Advisory lock crashes. Content-Type: multipart/mixed; boundary="------------080502030802070307030108" X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 07:08:36 -0000 This is a multi-part message in MIME format. --------------080502030802070307030108 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hello Konstantin & Doug, We're getting a few crashes in what looks to be kern_lockf.c: fault address here is 0x360 which appears to mean that the "sx" owner thread is NULL db> bt Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0 _sx_xlock_hard() at _sx_xlock_hard+0xb3 lf_advlockasync() at lf_advlockasync+0x5d7 lf_advlock() at lf_advlock+0x47 vop_stdadvlock() at vop_stdadvlock+0xb3 VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a closef() at closef+0x352 kern_close() at kern_close+0x172 amd64_syscall() at amd64_syscall+0x58a Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8011651fc, rsp = 0x7fffffbfdd58, rbp = 0x807c3d6c0 --- (kgdb) list *(_sx_xlock_hard+0xb3) 0xffffffff806242c3 is in _sx_xlock_hard (/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514). 509 x = sx->sx_lock; 510 if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0) { 511 if ((x & SX_LOCK_SHARED) == 0) { 512 x = SX_OWNER(x); 513 owner = (struct thread *)x; 514 if (TD_IS_RUNNING(owner)) { 515 if (LOCK_LOG_TEST(&sx->lock_object, 0)) 516 CTR3(KTR_LOCK, 517 "%s: spinning on %p held by %p", 518 __func__, sx, owner); Another panic here, which we have less information is attached as an image. We're looking at using some INVARIANTS and WITNESS kernels, but was wondering if y'all had any other suggestions to use please? 
thank you, -Alfred --------------080502030802070307030108-- From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 07:33:12 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 23271B3D for ; Tue, 19 Feb 2013 07:33:12 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 8A7EC173 for ; Tue, 19 Feb 2013 07:33:11 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1J7Wu4b048370; Tue, 19 Feb 2013 09:32:56 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1J7Wu4b048370 Received: (from kostik@localhost) by tom.home (8.14.6/8.14.6/Submit) id r1J7Wuoi048369; Tue, 19 Feb 2013 09:32:56 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 19 Feb 2013 09:32:56 +0200 From: Konstantin Belousov To: Alfred Perlstein Subject: Re: Advisory lock crashes. 
Message-ID: <20130219073256.GV2598@kib.kiev.ua> References: <512324F2.4060707@ixsystems.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="sr8SBrQ3fbgntwtR" Content-Disposition: inline In-Reply-To: <512324F2.4060707@ixsystems.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: Xin Li , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 07:33:12 -0000 --sr8SBrQ3fbgntwtR Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Mon, Feb 18, 2013 at 11:08:34PM -0800, Alfred Perlstein wrote:
> Hello Konstantin & Doug,
>
> We're getting a few crashes in what looks to be kern_lockf.c:
>
> fault address here is 0x360 which appears to mean that the "sx" owner
> thread is NULL
What is the version of FreeBSD?
What is the filesystem owning the file which was advlocked?
Show the line number for lf_advlockasync+0x5d7.

No, I have not seen anything similar in the last 3 years.

>
> db> bt
> Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0
> _sx_xlock_hard() at _sx_xlock_hard+0xb3
> lf_advlockasync() at lf_advlockasync+0x5d7
> lf_advlock() at lf_advlock+0x47
> vop_stdadvlock() at vop_stdadvlock+0xb3
> VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a
> closef() at closef+0x352
> kern_close() at kern_close+0x172
> amd64_syscall() at amd64_syscall+0x58a
> Xfast_syscall() at Xfast_syscall+0xf7
> --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8011651fc, rsp = 0x7fffffbfdd58, rbp = 0x807c3d6c0 ---
>
> (kgdb) list *(_sx_xlock_hard+0xb3)
> 0xffffffff806242c3 is in _sx_xlock_hard (/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514).
> 509                     x = sx->sx_lock;
> 510                     if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0) {
> 511                             if ((x & SX_LOCK_SHARED) == 0) {
> 512                                     x = SX_OWNER(x);
> 513                                     owner = (struct thread *)x;
> 514                                     if (TD_IS_RUNNING(owner)) {
> 515                                             if (LOCK_LOG_TEST(&sx->lock_object, 0))
> 516                                                     CTR3(KTR_LOCK,
> 517                                                         "%s: spinning on %p held by %p",
> 518                                                         __func__, sx, owner);
>
>
> Another panic here, which we have less information is attached as an image.
>
> We're looking at using some INVARIANTS and WITNESS kernels, but was
> wondering if y'all had any other suggestions to use please?
>=20 > thank you, > -Alfred --sr8SBrQ3fbgntwtR Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJRIyqnAAoJEJDCuSvBvK1B0egP/A9aHJw0KZcC+gz05cmIDwyd 3A4I4+wCOdvBEJbJOU08sYJdbWrNPCuMzAaovTLQ8P7a/IO667p6/UHpK4UqtLhX 5F3euYJ8F7Rac+AQ321txEQAGN4dQaFcUezaekU7H6kX0CN5n0d0JJyd/GwMDNK6 764Y8pKm3AWBXTw2qVWKbXjE+FH5kdq9sxiGq8y6noCSXMJY5kbA1XrlQ5f3EvrP aHzs4uL42XBjIPVbFwyV7Z4KNUWN5RwSlqoQlHpbW9jJVaSpPge+LpMDihft4LED gR3fzsFh0Q0s+a9we1TGggnyQp8ukffqmYmES56I1gOEiu14z1cUGsBZyJYEjm5y DPmIc/MJhnmXTbSZgDw5EWas3keXt4AwPi+pcGaaRPlpyxZ6jPApxe4XGm3Q8060 eEkoKLvvvBRzPPwgy9zc2MRheN0RtipW+58ZHBmJAnFvLJOgGl/YiSFcGTJK1M2R X19kWAQfTVqkq1SpGTakfvED1Rg2lBwXNzsrWSqq28KcMYK1+PnvGaNptr+ApUXg +gHgr1FWw10ka3yzMPUz2CvDtFUnIMz1/VWoAl8+KMZqjvxUc04pH/N8M/mz7XnZ I/dpEmdD7Lctiw0+UNo9pbs1369St01DzFiGkbXnDOPJva5PNFGBkweth3ENnOZk p9UK4wA+oEqR0mBnJBz2 =pOEM -----END PGP SIGNATURE----- --sr8SBrQ3fbgntwtR-- From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 08:07:17 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 604A38A8 for ; Tue, 19 Feb 2013 08:07:17 +0000 (UTC) (envelope-from alfred@ixsystems.com) Received: from mail.iXsystems.com (newknight.ixsystems.com [206.40.55.70]) by mx1.freebsd.org (Postfix) with ESMTP id 494132DD for ; Tue, 19 Feb 2013 08:07:16 +0000 (UTC) Received: from localhost (mail.ixsystems.com [10.2.55.1]) by mail.iXsystems.com (Postfix) with ESMTP id BBEFD82F9; Tue, 19 Feb 2013 00:07:16 -0800 (PST) Received: from mail.iXsystems.com ([10.2.55.1]) by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP id 97714-01; Tue, 19 Feb 2013 00:07:16 -0800 (PST) Received: from Alfreds-MacBook-Pro-9.local (unknown [10.8.0.26]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.iXsystems.com (Postfix) with ESMTPSA id 1006082F0; Tue, 19 Feb 2013 00:07:16 -0800 (PST) 
Message-ID: <512332B3.10400@ixsystems.com> Date: Tue, 19 Feb 2013 00:07:15 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130107 Thunderbird/17.0.2 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: Advisory lock crashes. References: <512324F2.4060707@ixsystems.com> <20130219073256.GV2598@kib.kiev.ua> In-Reply-To: <20130219073256.GV2598@kib.kiev.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Xin Li , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 08:07:17 -0000 On 2/18/13 11:32 PM, Konstantin Belousov wrote: > On Mon, Feb 18, 2013 at 11:08:34PM -0800, Alfred Perlstein wrote: >> Hello Konstantin & Doug, >> >> We're getting a few crashes in what looks to be kern_lockf.c: >> >> fault address here is 0x360 which appears to mean that the "sx" owner >> thread is NULL > What is the version of FreeBSD ? This is a releng 9.0 system. (note, we have the most up to date version of this file with the exception of a cosmetic diff for MALLOC defines). > What is the filesystem owning the file which was advlocked ? I'm pretty sure that is going to be ZFS. > Show the line number for lf_advlockasync+0x5d7. > (kgdb) list *(lf_advlockasync+0x5d7) > 0xffffffff80604fc7 is in lf_advlockasync (sx.h:152). 
> 147 { > 148 uintptr_t tid = (uintptr_t)td; > 149 int error = 0; > 150 > 151 if (!atomic_cmpset_acq_ptr(&sx->sx_lock, > SX_LOCK_UNLOCKED, tid)) > 152 error = _sx_xlock_hard(sx, tid, opts, file, line); > 153 else > 154 LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(LS_SX_XLOCK_ACQUIRE, > 155 sx, 0, 0, file, line); > 156 That may not be helpful so I've included this: /usr/home/alfred # bc ibase=16 5D7 1495 (kgdb) disasse lf_advlockasync Dump of assembler code for function lf_advlockasync: 0xffffffff806049f0 : push %rbp 0xffffffff806049f1 : mov %rdx,%rcx > 0xffffffff80604f70 : mov -0x80(%rbp),%rdi > 0xffffffff80604f74 : xor %ecx,%ecx > 0xffffffff80604f76 : xor %edx,%edx > 0xffffffff80604f78 : mov %rbx,%rsi > 0xffffffff80604f7b : callq > 0xffffffff806246d0 <_sx_xunlock_hard> > 0xffffffff80604f80 : jmpq > 0xffffffff80604c53 > 0xffffffff80604f85 : mov -0x58(%rbp),%rcx > 0xffffffff80604f89 : xor %r12d,%r12d > 0xffffffff80604f8c : mov 0x18(%rcx),%edi > 0xffffffff80604f8f : callq > 0xffffffff80603b90 > 0xffffffff80604f94 : jmpq > 0xffffffff80604c70 > 0xffffffff80604f99 : lea 0xc8(%r13),%rdi > 0xffffffff80604fa0 : xor %r8d,%r8d > 0xffffffff80604fa3 : xor %ecx,%ecx > 0xffffffff80604fa5 : xor %edx,%edx > 0xffffffff80604fa7 : mov %rbx,%rsi > 0xffffffff80604faa : callq > 0xffffffff8060a1f0 <_mtx_lock_sleep> > 0xffffffff80604faf : jmpq > 0xffffffff80604f2e > 0xffffffff80604fb4 : mov -0x80(%rbp),%rdi > 0xffffffff80604fb8 : xor %r8d,%r8d > 0xffffffff80604fbb : xor %ecx,%ecx > 0xffffffff80604fbd : xor %edx,%edx > 0xffffffff80604fbf : mov %rbx,%rsi > 0xffffffff80604fc2 : callq > 0xffffffff80624210 <_sx_xlock_hard> > 0xffffffff80604fc7 : jmpq > 0xffffffff80604f15 > 0xffffffff80604fcc : lea 0xc8(%r13),%rdi > 0xffffffff80604fd3 : xor %ecx,%ecx > 0xffffffff80604fd5 : xor %edx,%edx > 0xffffffff80604fd7 : xor %esi,%esi > 0xffffffff80604fd9 : callq > 0xffffffff8060a040 <_mtx_unlock_sleep> > 0xffffffff80604fde : jmpq > 0xffffffff80604f5c > 0xffffffff80604fe3 : mov %r15,(%rcx) > 0xffffffff80604fe6 : 
mov %r15,%r14 > 0xffffffff80604fe9 : mov %gs:0x0,%rax > 0xffffffff80604ff2 : lock cmpxchg > %rbx,0xe0(%r13) > > No, I never saw nothing similar in last 3 years. Yes, I'd suspect we'd all see more things here. We're very much capable of adding instrumentation to the OS/kernel to help track this down if you have ideas. -Alfred >> db> bt >> Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0 >> _sx_xlock_hard() at _sx_xlock_hard+0xb3 >> lf_advlockasync() at lf_advlockasync+0x5d7 >> lf_advlock() at lf_advlock+0x47 >> vop_stdadvlock() at vop_stdadvlock+0xb3 >> VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a >> closef() at closef+0x352 >> kern_close() at kern_close+0x172 >> amd64_syscall() at amd64_syscall+0x58a >> Xfast_syscall() at Xfast_syscall+0xf7 >> --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8011651fc, rsp = 0x7fffffbfdd58, rbp = 0x807c3d6c0 --- >> >> (kgdb) list *(_sx_xlock_hard+0xb3) >> 0xffffffff806242c3 is in _sx_xlock_hard >> (/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514). >> 509 x = sx->sx_lock; >> 510 if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) >> == 0) { >> 511 if ((x & SX_LOCK_SHARED) == 0) { >> 512 x = SX_OWNER(x); >> 513 owner = (struct thread *)x; >> 514 if (TD_IS_RUNNING(owner)) { >> 515 if >> (LOCK_LOG_TEST(&sx->lock_object, 0)) >> 516 CTR3(KTR_LOCK, >> 517 "%s: spinning on %p >> held by %p", >> 518 __func__, sx, owner); >> >> >> Another panic here, which we have less information is attached as an image. >> >> We're looking at using some INVARIANTS and WITNESS kernels, but was >> wondering if y'all had any other suggestions to use please? 
>> >> thank you, >> -Alfred > From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 08:20:40 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A0A2C30B for ; Tue, 19 Feb 2013 08:20:40 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id DAFAD634 for ; Tue, 19 Feb 2013 08:20:39 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1J8KQ07054222; Tue, 19 Feb 2013 10:20:26 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1J8KQ07054222 Received: (from kostik@localhost) by tom.home (8.14.6/8.14.6/Submit) id r1J8KQAP054221; Tue, 19 Feb 2013 10:20:26 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 19 Feb 2013 10:20:26 +0200 From: Konstantin Belousov To: Alfred Perlstein Subject: Re: Advisory lock crashes. 
Message-ID: <20130219082026.GY2598@kib.kiev.ua> References: <512324F2.4060707@ixsystems.com> <20130219073256.GV2598@kib.kiev.ua> <512332B3.10400@ixsystems.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DejVYFcqCV4p9T4J" Content-Disposition: inline In-Reply-To: <512332B3.10400@ixsystems.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: Xin Li , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 08:20:40 -0000

--DejVYFcqCV4p9T4J
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Feb 19, 2013 at 12:07:15AM -0800, Alfred Perlstein wrote:
> On 2/18/13 11:32 PM, Konstantin Belousov wrote:
> > On Mon, Feb 18, 2013 at 11:08:34PM -0800, Alfred Perlstein wrote:
> >> Hello Konstantin & Doug,
> >>
> >> We're getting a few crashes in what looks to be kern_lockf.c:
> >>
> >> fault address here is 0x360 which appears to mean that the "sx" owner
> >> thread is NULL
> > What is the version of FreeBSD ?
> This is a releng 9.0 system. (note, we have the most up to date version
> of this file with the exception of a cosmetic diff for MALLOC defines).

My suspicion is that the issue is not in the kern_lockf.c at all,
rather it is a bug in the vnode lifetime management in the filesystem
code. If true, the absence of the changes in the kern_lockf.c does
not matter, but the changes in ZFS do.

AFAIR, there were a lot of fixes in this area for ZFS, done by avg.
>
> > What is the filesystem owning the file which was advlocked ?
> I'm pretty sure that is going to be ZFS.
>
> > Show the line number for lf_advlockasync+0x5d7.
>
> > (kgdb) list *(lf_advlockasync+0x5d7)
> > 0xffffffff80604fc7 is in lf_advlockasync (sx.h:152).
> > 147 {
> > 148 uintptr_t tid = (uintptr_t)td;
> > 149 int error = 0;
> > 150
> > 151 if (!atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED, tid))
> > 152 error = _sx_xlock_hard(sx, tid, opts, file, line);
> > 153 else
> > 154 LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(LS_SX_XLOCK_ACQUIRE,
> > 155 sx, 0, 0, file, line);
> > 156
> That may not be helpful so I've included this:
> /usr/home/alfred # bc
> ibase=16
> 5D7
> 1495
>
> (kgdb) disasse lf_advlockasync
> Dump of assembler code for function lf_advlockasync:
> 0xffffffff806049f0 : push %rbp
> 0xffffffff806049f1 : mov %rdx,%rcx
> > 0xffffffff80604f70 : mov -0x80(%rbp),%rdi
> > 0xffffffff80604f74 : xor %ecx,%ecx
> > 0xffffffff80604f76 : xor %edx,%edx
> > 0xffffffff80604f78 : mov %rbx,%rsi
> > 0xffffffff80604f7b : callq 0xffffffff806246d0 <_sx_xunlock_hard>
> > 0xffffffff80604f80 : jmpq 0xffffffff80604c53
> > 0xffffffff80604f85 : mov -0x58(%rbp),%rcx
> > 0xffffffff80604f89 : xor %r12d,%r12d
> > 0xffffffff80604f8c : mov 0x18(%rcx),%edi
> > 0xffffffff80604f8f : callq 0xffffffff80603b90
> > 0xffffffff80604f94 : jmpq 0xffffffff80604c70
> > 0xffffffff80604f99 : lea 0xc8(%r13),%rdi
> > 0xffffffff80604fa0 : xor %r8d,%r8d
> > 0xffffffff80604fa3 : xor %ecx,%ecx
> > 0xffffffff80604fa5 : xor %edx,%edx
> > 0xffffffff80604fa7 : mov %rbx,%rsi
> > 0xffffffff80604faa : callq 0xffffffff8060a1f0 <_mtx_lock_sleep>
> > 0xffffffff80604faf : jmpq 0xffffffff80604f2e
> > 0xffffffff80604fb4 : mov -0x80(%rbp),%rdi
> > 0xffffffff80604fb8 : xor %r8d,%r8d
> > 0xffffffff80604fbb : xor %ecx,%ecx
> > 0xffffffff80604fbd : xor %edx,%edx
> > 0xffffffff80604fbf : mov %rbx,%rsi
> > 0xffffffff80604fc2 : callq 0xffffffff80624210 <_sx_xlock_hard>
> > 0xffffffff80604fc7 : jmpq 0xffffffff80604f15
> > 0xffffffff80604fcc : lea 0xc8(%r13),%rdi
> > 0xffffffff80604fd3 : xor %ecx,%ecx
> > 0xffffffff80604fd5 : xor %edx,%edx
> > 0xffffffff80604fd7 : xor %esi,%esi
> > 0xffffffff80604fd9 : callq 0xffffffff8060a040 <_mtx_unlock_sleep>
> > 0xffffffff80604fde : jmpq 0xffffffff80604f5c
> > 0xffffffff80604fe3 : mov %r15,(%rcx)
> > 0xffffffff80604fe6 : mov %r15,%r14
> > 0xffffffff80604fe9 : mov %gs:0x0,%rax
> > 0xffffffff80604ff2 : lock cmpxchg %rbx,0xe0(%r13)

This is not helpful either; you demonstrated the inlined part of the sx_lock().
I need to understand which sx caused the issue: state->ls_lock (and
then it is related to the vnode life), or lf_lock_states_lock.

Either the logic of the assembler should be analyzed to decipher which
lock it is, or try to list more lines around the reported address, to
see which sx_xlock() line is there.
>
>
> >
> > No, I never saw nothing similar in last 3 years.
>
> Yes, I'd suspect we'd all see more things here. We're very much capable
> of adding instrumentation to the OS/kernel to help track this down if
> you have ideas.

INVARIANTS, DIAGNOSTIC, DEBUG_VFS_LOCK.

What is needed is the printout of *vp involved in the panic.
>
> -Alfred
>
>
> >> db> bt
> >> Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0
> >> _sx_xlock_hard() at _sx_xlock_hard+0xb3
> >> lf_advlockasync() at lf_advlockasync+0x5d7
> >> lf_advlock() at lf_advlock+0x47
> >> vop_stdadvlock() at vop_stdadvlock+0xb3
> >> VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a
> >> closef() at closef+0x352
> >> kern_close() at kern_close+0x172
> >> amd64_syscall() at amd64_syscall+0x58a
> >> Xfast_syscall() at Xfast_syscall+0xf7
> >> --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8011651fc, rsp = 0x7fffffbfdd58, rbp = 0x807c3d6c0 ---
> >>
> >> (kgdb) list *(_sx_xlock_hard+0xb3)
> >> 0xffffffff806242c3 is in _sx_xlock_hard (/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514).
> >> 509 x =3D sx->sx_lock; > >> 510 if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) > >> =3D=3D 0) { > >> 511 if ((x & SX_LOCK_SHARED) =3D=3D 0) { > >> 512 x =3D SX_OWNER(x); > >> 513 owner =3D (struct thread *)x; > >> 514 if (TD_IS_RUNNING(owner)) { > >> 515 if > >> (LOCK_LOG_TEST(&sx->lock_object, 0)) > >> 516 CTR3(KTR_LOCK, > >> 517 "%s: spinning on %p > >> held by %p", > >> 518 __func__, sx, owner); > >> > >> > >> Another panic here, which we have less information is attached as an i= mage. > >> > >> We're looking at using some INVARIANTS and WITNESS kernels, but was > >> wondering if y'all had any other suggestions to use please? > >> > >> thank you, > >> -Alfred > > --DejVYFcqCV4p9T4J Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJRIzXJAAoJEJDCuSvBvK1B7jUQAJQadXQ7Z6dMDtZ/zEnFv0kJ 3r9OGt5zg3vX71XPug/4FqjBjkbGh6d2IeT/how1u/OL37iRjdc7tLKIkjM/VEJp XaMOIvG2k7MtUOPF9jd2g74DdSdB6zA56I0tdVpKbEQ1ea0t3/Zwxhz4ERBPGIVH VBVlblLV5kAlTivC2EeoZfc390sCFRY3TINBuTPYQkuqvHgI2YIoMH9MSdi4Yenr pkJKTaHL0zTYDnybcgMcqdb7GoNjHDiqMamXgdKdgvfKYT7qgMwte0yUHoUGk994 jBgaa1KOYJCCm1cbpzp0FowMs9b6rQ6aWIF0ZOdV7B0IRgPWvxs3lu6okUU29YZF cdvCpRLyJPxx+47zgZrhrlxlsHLj/09SvYkyB12iW7BVgf03jIbpHm7+dBBjGxvV ovHuv5/hwYNAbZuteE0nctQAp8Qdfd0UcknCe1IL6/S4BWFO4ftIw+MIk+NAS/dA ihN4XNBOe9+3DkSPQydJ6efwDiGlo9W+S1r0P/8rbBSuadk5NCG8Y3/7EV4rIIVw NHu7I0MtHpHSPxGkM/NvaawHl9QjqVx+uCitlzEUex5uGJep2ujifEvTB5Fkd+sr pxcYVEOEdoOKVI3LMqAOe7Rz7W//vplxFZs7O3Nakn/xGtecftusXoDwPjFVsZX/ c1clMBwyUukYuiGP3z6U =HbNy -----END PGP SIGNATURE----- --DejVYFcqCV4p9T4J-- From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 08:36:13 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1532D86B for ; Tue, 19 Feb 2013 08:36:13 +0000 (UTC) (envelope-from alfred@ixsystems.com) Received: from mail.iXsystems.com (newknight.ixsystems.com [206.40.55.70]) by 
mx1.freebsd.org (Postfix) with ESMTP id E306271D for ; Tue, 19 Feb 2013 08:36:12 +0000 (UTC) Received: from localhost (mail.ixsystems.com [10.2.55.1]) by mail.iXsystems.com (Postfix) with ESMTP id 5830D843B; Tue, 19 Feb 2013 00:36:12 -0800 (PST) Received: from mail.iXsystems.com ([10.2.55.1]) by localhost (mail.ixsystems.com [10.2.55.1]) (maiad, port 10024) with ESMTP id 98169-06; Tue, 19 Feb 2013 00:36:12 -0800 (PST) Received: from Alfreds-MacBook-Pro-9.local (unknown [10.8.0.26]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.iXsystems.com (Postfix) with ESMTPSA id 060798434; Tue, 19 Feb 2013 00:36:11 -0800 (PST) Message-ID: <5123397B.8030807@ixsystems.com> Date: Tue, 19 Feb 2013 00:36:11 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130107 Thunderbird/17.0.2 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: Advisory lock crashes. References: <512324F2.4060707@ixsystems.com> <20130219073256.GV2598@kib.kiev.ua> <512332B3.10400@ixsystems.com> <20130219082026.GY2598@kib.kiev.ua> In-Reply-To: <20130219082026.GY2598@kib.kiev.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Xin Li , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 08:36:13 -0000 On 2/19/13 12:20 AM, Konstantin Belousov wrote: > On Tue, Feb 19, 2013 at 12:07:15AM -0800, Alfred Perlstein wrote: >> On 2/18/13 11:32 PM, Konstantin Belousov wrote: >>> On Mon, Feb 18, 2013 at 11:08:34PM -0800, Alfred Perlstein wrote: >>>> Hello Konstantin & Doug, >>>> >>>> We're getting a few crashes in what looks to be kern_lockf.c: >>>> >>>> fault address here is 0x360 which appears to mean that the "sx" owner >>>> thread is NULL >>> What is the version of FreeBSD ? 
>> This is a releng 9.0 system. (note, we have the most up to date version >> of this file with the exception of a cosmetic diff for MALLOC defines). > My suspicion is that the issue is not in the kern_lockf.c at all, > rather it is a bug in the vnode lifetime management in the filesystem > code. If true, the absense of the changes in the kern_lockf.c does > not matter, but the changes in ZFS do. > > AFAIR, there were a lot of fixes in this area for ZFS, done by avg. That would make sense. It appears as if the lockf data structures are being free()'d out from under us. Maybe there are some asserts we can put in place to catch this under DEBUG_VFS? or something? Meanwhile we'll try to catchup with zfs fixes in head. -Alfred > >>> What is the filesystem owning the file which was advlocked ? >> I'm pretty sure that is going to be ZFS. >> >>> Show the line number for lf_advlockasync+0x5d7. >>> (kgdb) list *(lf_advlockasync+0x5d7) >>> 0xffffffff80604fc7 is in lf_advlockasync (sx.h:152). >>> 147 { >>> 148 uintptr_t tid = (uintptr_t)td; >>> 149 int error = 0; >>> 150 >>> 151 if (!atomic_cmpset_acq_ptr(&sx->sx_lock, >>> SX_LOCK_UNLOCKED, tid)) >>> 152 error = _sx_xlock_hard(sx, tid, opts, file, line); >>> 153 else >>> 154 LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(LS_SX_XLOCK_ACQUIRE, >>> 155 sx, 0, 0, file, line); >>> 156 >> That may not be helpful so I've included this: >> /usr/home/alfred # bc >> ibase=16 >> 5D7 >> 1495 >> >> (kgdb) disasse lf_advlockasync >> Dump of assembler code for function lf_advlockasync: >> 0xffffffff806049f0 : push %rbp >> 0xffffffff806049f1 : mov %rdx,%rcx >>> 0xffffffff80604f70 : mov -0x80(%rbp),%rdi >>> 0xffffffff80604f74 : xor %ecx,%ecx >>> 0xffffffff80604f76 : xor %edx,%edx >>> 0xffffffff80604f78 : mov %rbx,%rsi >>> 0xffffffff80604f7b : callq >>> 0xffffffff806246d0 <_sx_xunlock_hard> >>> 0xffffffff80604f80 : jmpq >>> 0xffffffff80604c53 >>> 0xffffffff80604f85 : mov -0x58(%rbp),%rcx >>> 0xffffffff80604f89 : xor %r12d,%r12d >>> 0xffffffff80604f8c : 
mov 0x18(%rcx),%edi >>> 0xffffffff80604f8f : callq >>> 0xffffffff80603b90 >>> 0xffffffff80604f94 : jmpq >>> 0xffffffff80604c70 >>> 0xffffffff80604f99 : lea 0xc8(%r13),%rdi >>> 0xffffffff80604fa0 : xor %r8d,%r8d >>> 0xffffffff80604fa3 : xor %ecx,%ecx >>> 0xffffffff80604fa5 : xor %edx,%edx >>> 0xffffffff80604fa7 : mov %rbx,%rsi >>> 0xffffffff80604faa : callq >>> 0xffffffff8060a1f0 <_mtx_lock_sleep> >>> 0xffffffff80604faf : jmpq >>> 0xffffffff80604f2e >>> 0xffffffff80604fb4 : mov -0x80(%rbp),%rdi >>> 0xffffffff80604fb8 : xor %r8d,%r8d >>> 0xffffffff80604fbb : xor %ecx,%ecx >>> 0xffffffff80604fbd : xor %edx,%edx >>> 0xffffffff80604fbf : mov %rbx,%rsi >>> 0xffffffff80604fc2 : callq >>> 0xffffffff80624210 <_sx_xlock_hard> >>> 0xffffffff80604fc7 : jmpq >>> 0xffffffff80604f15 >>> 0xffffffff80604fcc : lea 0xc8(%r13),%rdi >>> 0xffffffff80604fd3 : xor %ecx,%ecx >>> 0xffffffff80604fd5 : xor %edx,%edx >>> 0xffffffff80604fd7 : xor %esi,%esi >>> 0xffffffff80604fd9 : callq >>> 0xffffffff8060a040 <_mtx_unlock_sleep> >>> 0xffffffff80604fde : jmpq >>> 0xffffffff80604f5c >>> 0xffffffff80604fe3 : mov %r15,(%rcx) >>> 0xffffffff80604fe6 : mov %r15,%r14 >>> 0xffffffff80604fe9 : mov %gs:0x0,%rax >>> 0xffffffff80604ff2 : lock cmpxchg >>> %rbx,0xe0(%r13) > This is not helpful too, you demonstrated the inlined part of the sx_lock(). > I need to understand which sx caused the issue, state->ls_lock (and > then it is related to the vnode life), or lf_lock_states_lock. > > Either the logic of the assembler should be analyzed to decipher which > lock is it, or try to list more lines around the reported address, to > see which sx_xlock() line is there. > >> >>> No, I never saw nothing similar in last 3 years. >> Yes, I'd suspect we'd all see more things here. We're very much capable >> of adding instrumentation to the OS/kernel to help track this down if >> you have ideas. > INVARIANTS, DIAGNOSTIC, DEBUG_VFS_LOCK. > > What is needed is the printout of *vp involved in the panic. 
> >> -Alfred >> >> >>>> db> bt >>>> Tracing pid 5099 tid 101614 td 0xfffffe005d54e8c0 >>>> _sx_xlock_hard() at _sx_xlock_hard+0xb3 >>>> lf_advlockasync() at lf_advlockasync+0x5d7 >>>> lf_advlock() at lf_advlock+0x47 >>>> vop_stdadvlock() at vop_stdadvlock+0xb3 >>>> VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x4a >>>> closef() at closef+0x352 >>>> kern_close() at kern_close+0x172 >>>> amd64_syscall() at amd64_syscall+0x58a >>>> Xfast_syscall() at Xfast_syscall+0xf7 >>>> --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8011651fc, rsp = 0x7fffffbfdd58, rbp = 0x807c3d6c0 --- >>>> >>>> (kgdb) list *(_sx_xlock_hard+0xb3) >>>> 0xffffffff806242c3 is in _sx_xlock_hard >>>> (/usr/home/jpaetzel/9.0.6-RELEASE-p1/FreeBSD/src/sys/kern/kern_sx.c:514). >>>> 509 x = sx->sx_lock; >>>> 510 if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) >>>> == 0) { >>>> 511 if ((x & SX_LOCK_SHARED) == 0) { >>>> 512 x = SX_OWNER(x); >>>> 513 owner = (struct thread *)x; >>>> 514 if (TD_IS_RUNNING(owner)) { >>>> 515 if >>>> (LOCK_LOG_TEST(&sx->lock_object, 0)) >>>> 516 CTR3(KTR_LOCK, >>>> 517 "%s: spinning on %p >>>> held by %p", >>>> 518 __func__, sx, owner); >>>> >>>> >>>> Another panic here, which we have less information is attached as an image. >>>> >>>> We're looking at using some INVARIANTS and WITNESS kernels, but was >>>> wondering if y'all had any other suggestions to use please? 
>>>> >>>> thank you, >>>> -Alfred From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 08:50:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E5AE9996 for ; Tue, 19 Feb 2013 08:50:11 +0000 (UTC) (envelope-from momchil@xaxo.eu) Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66]) by mx1.freebsd.org (Postfix) with ESMTP id 65BBA772 for ; Tue, 19 Feb 2013 08:50:10 +0000 (UTC) Received: from vps2.xaxo.eu (localhost [127.0.0.1]) by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1J8labT076840; Tue, 19 Feb 2013 09:47:36 +0100 (CET) (envelope-from momchil@xaxo.eu) Received: (from www@localhost) by vps2.xaxo.eu (8.14.4/8.14.4/Submit) id r1J8laNk076834; Tue, 19 Feb 2013 09:47:36 +0100 (CET) (envelope-from momchil@xaxo.eu) X-Authentication-Warning: vps2.xaxo.eu: www set sender to momchil@xaxo.eu using -f Received: from 139.18.9.22 (SquirrelMail authenticated user space) by webmail.xaxo.eu with HTTP; Tue, 19 Feb 2013 09:47:36 +0100 Message-ID: <86a88ac8bb038ec5d8034724dcf80924.squirrel@webmail.xaxo.eu> In-Reply-To: <1794994447.3103158.1361231818953.JavaMail.root@erie.cs.uoguelph.ca> References: <1794994447.3103158.1361231818953.JavaMail.root@erie.cs.uoguelph.ca> Date: Tue, 19 Feb 2013 09:47:36 +0100 Subject: Re: NFS + Kerberos From: "Momchil Ivanov" To: "Rick Macklem" User-Agent: SquirrelMail/1.4.21 MIME-Version: 1.0 Content-Type: text/plain;charset=utf-8 Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) Importance: Normal Cc: freebsd-fs@freebsd.org, Momchil Ivanov X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 08:50:12 -0000 On Tue, February 19, 2013 12:56 am, Rick Macklem wrote: > Thanks to Elias's hard work, a bug/fix has just been isolated in the > Kerberos library that causes 
the gssd to fail to translate a principal
> to a uid. The fix is to increase the size of the buffer passed to
> getpwnam_r(). See this thread:
> http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw
>
> I haven't run into this bug, so I don't know what systems are affected,
> but it would explain why you can't get it working.
>
> I'd suggest you apply the patch in the email (increase buf to 1024) and
> then try again with libraries built with the patch.

Do I have to apply the patch to the server only and then rebuild world, or do I have to do the same on the client too? And do I need to rebuild heimdal on both machines?

btw, I checked the logs of the kdc and could not see any trace of the nfs server trying to validate the client's ticket... Frankly, I don't know what I should expect there; I haven't used kerberos before, so I have no idea if it's related to the bug. Here is part of the log:

AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
No preauth found, returning PREAUTH-REQUIRED -- user@EXAMPLE.LOCAL
sending 407 bytes to IPv4:X.X.X.X
AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
Client sent patypes: encrypted-timestamp
Looking for PKINIT pa-data -- user@EXAMPLE.LOCAL
Looking for ENC-TS pa-data -- user@EXAMPLE.LOCAL
ENC-TS Pre-authentication succeeded -- user@EXAMPLE.LOCAL using des-cbc-crc
Client supported enctypes: des-cbc-crc
Using des-cbc-crc/aes256-cts-hmac-sha1-96
AS-REQ authtime: 2013-02-11T23:45:44 starttime: unset endtime: 2013-02-12T09:45:39 renew till: unset
sending 552 bytes to IPv4:X.X.X.X

Thank you,
Momchil

From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 11:45:38 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2567758E for ; Tue, 19 Feb 2013 11:45:38 +0000 (UTC) (envelope-from ml@my.gd) Received: from
mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181]) by mx1.freebsd.org (Postfix) with ESMTP id 86E1D7A2 for ; Tue, 19 Feb 2013 11:45:37 +0000 (UTC) Received: by mail-wi0-f181.google.com with SMTP id hm6so4658532wib.14 for ; Tue, 19 Feb 2013 03:45:36 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=Gdeehuk4X6YWGlAhwgKuY4Ts3TRnoOI8+5be7RrUdLE=; b=KZcZv4+TZnqczqfduxhQ9cyCYOOYB6QxSD2Xt0blltJ3bTEa/3rbzurlnAF730wr1v QBGHBl45a8sNyfnzjhRNlN7SLuR4508Ff1PHBNSE2MgpRKqoFO9ykSP9Z7iqsgDGXlX3 Jt5E+3T5Qrk+RuTDV1wE5IR1ayOBarWOLvGilRs+IGk/pQg6X/Rvxxvlm0Fkw7tG6dmv WLTLnMuV/TKezWdYaKbrpObszc9Lgtu4RIYv3vkkC1eaAzeoNwdFxNTrTY+qeNCiceHC sHppVor1jpaHdXetfQ2DRQagcda5mSN35rTh8peyW//Ad1VJrx6kjAchJSAJJpUs71qD Sy1w== X-Received: by 10.180.24.229 with SMTP id x5mr24848880wif.17.1361274336363; Tue, 19 Feb 2013 03:45:36 -0800 (PST) Received: from dfleuriot-at-hi-media.com ([83.167.62.196]) by mx.google.com with ESMTPS id eo10sm27248507wib.9.2013.02.19.03.45.35 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 19 Feb 2013 03:45:35 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: zfs raid1 error resilvering and mount From: Fleuriot Damien In-Reply-To: Date: Tue, 19 Feb 2013 12:45:34 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <0ED6EC22-9875-45FF-ADFC-BB23C2C94FC0@my.gd> References: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd> To: Konstantin Kuklin X-Mailer: Apple Mail (2.1499) X-Gm-Message-State: ALoCoQmesuEMPc1ZMmjUufFbOOYtA8FFYhxa/jAfNnhQapL4/koedl/vYXttxCpUpeB+VXWDcMyl Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 11:45:38 -0000

If I understand you correctly, you have:
- booted another system from flash
- NOT replaced the failed device
- under this booted system, resilvering takes place automatically

While I cannot tell why ZFS tries to resilver without a new, proper device, I think it will only work once you've replaced the failed device.

Could you try replacing the failed drive ?

On Feb 19, 2013, at 12:39 PM, Konstantin Kuklin wrote:

> i didn't replace disk, after reboot system not started (zfs installed
> as default root system) and i boot from another system(from flash) and
> resilvering has auto start and show me warnings with freeze
> progress(dead on checking zroot/var/crash )
> replacing dead disk healing var/crash with <0x0> address?
>
> 2013/2/18 Fleuriot Damien :
>> Reassure me here, you've replaced your failed vdev before trying to resilver right ?
>>
>> Your zpool status suggests otherwise, so I only want to make sure this is a status from before replacing your drive.
>>
>>
>> On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin wrote:
>>
>>> i can't do it, because resilvering in progress(freeze on 0.1%) and zfs
>>> list empty
>>>
>>> 2013/2/17 Fleuriot Damien :
>>>> Hmmm, zfs destroy -f zroot/var/crash ?
>>>>
>>>> Then you can try to zfs mount -a
>>>>
>>>>
>>>>
>>>> Removing pjd and mm from cc, if they want to read your message they're old enough to check their ML subscription.
>>>>
>>>>
>>>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin wrote:
>>>>
>>>>> hi, i have raid1 on zfs with 2 device on pool
>>>>> first device died and boot from second not working...
>>>>>
>>>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool import
>>>>> http://puu.sh/2402E
>>>>>
>>>>> when i load zfs.ko and opensolaris.ko i see this message:
>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>>
>>>>> zpool status:
>>>>> http://puu.sh/2405f
>>>>>
>>>>> resilvering freeze with:
>>>>> zpool status -v
>>>>> .............
>>>>> zroot/usr:<0x28ff>
>>>>> zroot/usr:<0x29ff>
>>>>> zroot/usr:<0x2aff>
>>>>> zroot/var/crash:<0x0>
>>>>> root@Flash:/root #
>>>>>
>>>>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn't
>>>>> remember) and mount other zfs points with my data
>>>>> --
>>>>> Best regards,
>>>>> Konstantin Kuklin.
>>>>> _______________________________________________
>>>>> freebsd-fs@freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Konstantin Kuklin.
>>
>
>
>
> --
> Best regards,
> Konstantin Kuklin.
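
For reference, a rough sketch of what replacing the failed drive, as suggested at the top of this message, could look like. This is an assumption about the procedure and has not been tried on this pool: the pool name zroot comes from the thread, while the device names ada0p3 and ada1p3 are made-up placeholders that would have to be taken from the real zpool status output. The script is a dry run that only prints the commands, so they can be reviewed first:

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Device names used below are hypothetical placeholders, not taken
# from the thread; read the real ones from "zpool status".

plan_replace() {
    pool=$1      # pool name, e.g. zroot as in this thread
    old_dev=$2   # failed device, as listed by zpool status
    new_dev=$3   # replacement device
    echo "zpool status -v $pool"                  # check the current state
    echo "zpool replace $pool $old_dev $new_dev"  # start the resilver onto the new device
    echo "zpool status -v $pool"                  # watch the resilver progress
}

plan_replace zroot ada0p3 ada1p3
```

Whether the resilver then gets past zroot/var/crash is exactly the open question here, so the printed commands are a starting point rather than a known fix.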
From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 11:46:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7BBAA60F; Tue, 19 Feb 2013 11:46:50 +0000 (UTC) (envelope-from konstantin.kuklin@gmail.com) Received: from mail-qa0-f53.google.com (mail-qa0-f53.google.com [209.85.216.53]) by mx1.freebsd.org (Postfix) with ESMTP id 158657B0; Tue, 19 Feb 2013 11:46:49 +0000 (UTC) Received: by mail-qa0-f53.google.com with SMTP id z4so1760792qan.12 for ; Tue, 19 Feb 2013 03:46:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; bh=C70p9JL3bdcMt+Xio/yJrStsHXOTL/ICUr1GTzlOg68=; b=Ox3Mu/ORfvWAU9LUAf1WIjX9tfTtXQGA0GhJK9/JmXvG74bO6DWwf1eYSd+wQuzynK hTzMCeDkOwWN9tcVu/SBvRe19IwH7Ydp3ic9oGRylJUn5TiwGi18VGJlO5evc8ky0NUJ b+icAnUTXSR8YrHTMLIxNBo5dcTAhRe3fCQZubb5w0xQf9Hw8dvfyKrfCnSDyJ5BYXXk QXzKuWUO9a2SfbF5H8PrZqfa0qelVlrLYTqq0iT/EmX37dX+BpyrtBeHj1F7UnJK0R22 vCG0UlBp8jZR38B7znNNLHTbbe33qOP7xobwBcDT13cdUjma7GPEpcKm69dzax7mHvR6 PgYA== MIME-Version: 1.0 X-Received: by 10.224.60.6 with SMTP id n6mr6640429qah.16.1361273974315; Tue, 19 Feb 2013 03:39:34 -0800 (PST) Received: by 10.49.98.130 with HTTP; Tue, 19 Feb 2013 03:39:34 -0800 (PST) In-Reply-To: References: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd> Date: Tue, 19 Feb 2013 15:39:34 +0400 Message-ID: Subject: Re: zfs raid1 error resilvering and mount From: Konstantin Kuklin To: Fleuriot Damien Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 11:46:50 -0000 i did`t replace 
disk, after reboot system not started (zfs installed
as default root system) and i boot from another system(from flash) and
resilvering has auto start and show me warnings with freeze
progress(dead on checking zroot/var/crash )
replacing dead disk healing var/crash with <0x0> address?

2013/2/18 Fleuriot Damien :
> Reassure me here, you've replaced your failed vdev before trying to resilver right ?
>
> Your zpool status suggests otherwise, so I only want to make sure this is a status from before replacing your drive.
>
>
> On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin wrote:
>
>> i can't do it, because resilvering in progress(freeze on 0.1%) and zfs
>> list empty
>>
>> 2013/2/17 Fleuriot Damien :
>>> Hmmm, zfs destroy -f zroot/var/crash ?
>>>
>>> Then you can try to zfs mount -a
>>>
>>>
>>>
>>> Removing pjd and mm from cc, if they want to read your message they're old enough to check their ML subscription.
>>>
>>>
>>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin wrote:
>>>
>>>> hi, i have raid1 on zfs with 2 device on pool
>>>> first device died and boot from second not working...
>>>>
>>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpool import
>>>> http://puu.sh/2402E
>>>>
>>>> when i load zfs.ko and opensolaris.ko i see this message:
>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>> Solaris: WARNING: Can't open objset for zroot/var/crash
>>>>
>>>> zpool status:
>>>> http://puu.sh/2405f
>>>>
>>>> resilvering freeze with:
>>>> zpool status -v
>>>> .............
>>>> zroot/usr:<0x28ff>
>>>> zroot/usr:<0x29ff>
>>>> zroot/usr:<0x2aff>
>>>> zroot/var/crash:<0x0>
>>>> root@Flash:/root #
>>>>
>>>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn't
>>>> remember) and mount other zfs points with my data
>>>> --
>>>> Best regards,
>>>> Konstantin Kuklin.
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>
>>
>>
>>
>> --
>> Best regards,
>> Konstantin Kuklin.
>

--
Best regards,
Konstantin Kuklin.

From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 13:27:04 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2644B3DF for ; Tue, 19 Feb 2013 13:27:04 +0000 (UTC) (envelope-from ml@my.gd) Received: from mail-wi0-f182.google.com (mail-wi0-f182.google.com [209.85.212.182]) by mx1.freebsd.org (Postfix) with ESMTP id B1427D22 for ; Tue, 19 Feb 2013 13:27:03 +0000 (UTC) Received: by mail-wi0-f182.google.com with SMTP id hi18so4768574wib.3 for ; Tue, 19 Feb 2013 05:26:56 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=hDyDQ/BdgeQGgw+6VZTeqj9wcZVR7AXl4Ev8z6bZs0k=; b=S5+5Wvb/+0nCcLJ2bSaLY2PUdJyunvCBZeOayOOFvwcThs0rZr3FF3OSbsN0B9YL1r 5MZunOG/zucsYZk+yWIuiD9s94qSLVju17jxL9GYXuddlbiKSL9YdGK88oQhsF9yZfNs 865x13GsLAPKU+cGC2rKYZFqQ+pAnXH0TJ9l0NCfOqaxWpQ50PlTnseTG4CX63ER6jG2 X8zQpEdUaRY03D6C4+eHjNlWO6LBK2qGITa0lFvhASoC0W0PUIrR3zlV5sJqW/mWCICg UA5KNxCL1E6b9mK5JvDM41gMlPt/NfA3bjzQxDoNGLle56KbEN6i2vPjDztjeaGnfr0C Gm9w== X-Received: by 10.180.24.229 with SMTP id x5mr25538857wif.17.1361280416728; Tue, 19 Feb 2013 05:26:56 -0800 (PST) Received: from dfleuriot-at-hi-media.com ([83.167.62.196]) by mx.google.com with ESMTPS id s8sm25560861wif.9.2013.02.19.05.26.54 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 19 Feb 2013 05:26:55 -0800 (PST)
Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: zfs raid1 error resilvering and mount From: Fleuriot Damien In-Reply-To: Date: Tue, 19 Feb 2013 14:26:54 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <2FC474DC-1905-4BEC-BFA6-037054B5437B@my.gd> References: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd> <0ED6EC22-9875-45FF-ADFC-BB23C2C94FC0@my.gd> <8506B305-D696-4213-BA59-929E4886B10C@my.gd> To: Konstantin Kuklin X-Mailer: Apple Mail (2.1499) X-Gm-Message-State: ALoCoQmH3wfsydfdmsoI8C1Mv5fS9ai4n0AAiicrlz5Fl2gZqrLKJV5Td7udGd8wVeh4Ej7GTITt Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 13:27:04 -0000 Well I can't see anything else to help you, except trying to replace = your failed vdev and resilver from there=E2=80=A6 On Feb 19, 2013, at 2:24 PM, Konstantin Kuklin = wrote: > zfs set canmount=3Doff zroot/var/crash >=20 > i can`t do this, because zfs list empty >=20 > 2013/2/19 Fleuriot Damien : >> The thing is, perhaps you have corrupted blocks that weren't caught = either by ZFS or your drives' firmware, preventing the pool's operation. >>=20 >> Seeing zroot/var/crash is the problem, could you try: >>=20 >> 1/ booting from a live CD or flash >> 2/ NOT start a resilver >> 3/ run the command: >> zfs set canmount=3Doff zroot/var/crash >>=20 >>=20 >> This should prevent /var/crash from trying to be mounted from the ZFS = pool. >>=20 >> Perhaps this'll allow you to get further through the boot process and = perhaps even start your ZFS pool correctly. >>=20 >>=20 >>=20 >> On Feb 19, 2013, at 12:52 PM, Konstantin Kuklin = wrote: >>=20 >>> you understand me right, but my problem not in dead device... 
raid1 >>> must work correctly with 1 device and command to replace or = something >>> else not work, just freeze >>> i have only 2 warning about crash fs zroot/var/crash and thats all >>> have any idea, how i can repair it without default zfs tools like = zfs, zpool? >>>=20 >>>=20 >>> 2013/2/19 Fleuriot Damien : >>>> If I understand you correctly, you have: >>>> - booted another system from flash >>>> - NOT replaced the failed device >>>> - under this booted system, resilvering takes place automatically >>>>=20 >>>>=20 >>>> While I cannot tell why ZFS tries to resilver without a new, proper = device, I think it will only work once you've replaced the failed = device. >>>>=20 >>>> Could you try replacing the failed drive ? >>>>=20 >>>>=20 >>>> On Feb 19, 2013, at 12:39 PM, Konstantin Kuklin = wrote: >>>>=20 >>>>> i did`t replace disk, after reboot system not started (zfs = installed >>>>> as default root system) and i boot from another system(from flash) = and >>>>> resilvering has auto start and show me warnings with freeze >>>>> progress(dead on checking zroot/var/crash ) >>>>> replacing dead disk healing var/crash with <0x0> adress? >>>>>=20 >>>>> 2013/2/18 Fleuriot Damien : >>>>>> Reassure me here, you've replaced your failed vdev before trying = to resilver right ? >>>>>>=20 >>>>>> Your zpool status suggests otherwise, so I only want to make sure = this is a status from before replacing your drive. >>>>>>=20 >>>>>>=20 >>>>>> On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin = wrote: >>>>>>=20 >>>>>>> i can`t do it, because resilvering in progress(freeze on 0.1%) = and zfs >>>>>>> list empty >>>>>>>=20 >>>>>>> 2013/2/17 Fleuriot Damien : >>>>>>>> Hmmm, zfs destroy -f zroot/var/crash ? >>>>>>>>=20 >>>>>>>> Then you can try to zfs mount -a >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> Removing pjd and mm from cc, if they want to read your message = they're old enough to check their ML subscription. 
>>>>>>>>=20 >>>>>>>>=20 >>>>>>>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin = wrote: >>>>>>>>=20 >>>>>>>>> hi, i have raid1 on zfs with 2 device on pool >>>>>>>>> first device died and boot from second not working... >>>>>>>>>=20 >>>>>>>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with = zpool import >>>>>>>>> http://puu.sh/2402E >>>>>>>>>=20 >>>>>>>>> when i load zfs.ko and opensolaris.ko i see this message: >>>>>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash >>>>>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash >>>>>>>>>=20 >>>>>>>>> zpool status: >>>>>>>>> http://puu.sh/2405f >>>>>>>>>=20 >>>>>>>>> resilvering freeze with: >>>>>>>>> zpool status -v >>>>>>>>> ............. >>>>>>>>> zroot/usr:<0x28ff> >>>>>>>>> zroot/usr:<0x29ff> >>>>>>>>> zroot/usr:<0x2aff> >>>>>>>>> zroot/var/crash:<0x0> >>>>>>>>> root@Flash:/root # >>>>>>>>>=20 >>>>>>>>> how i can delete or drop it fs zroot/var/crash (1m-10m size i = didn`t >>>>>>>>> remember) and mount other zfs points with my data >>>>>>>>> -- >>>>>>>>> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC >>>>>>>>> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1= =82=D0=B0=D0=BD=D1=82=D0=B8=D0=BD. >>>>>>>>> _______________________________________________ >>>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>>> To unsubscribe, send any mail to = "freebsd-fs-unsubscribe@freebsd.org" >>>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>> -- >>>>>>> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC >>>>>>> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82= =D0=B0=D0=BD=D1=82=D0=B8=D0=BD. >>>>>>=20 >>>>>=20 >>>>>=20 >>>>>=20 >>>>> -- >>>>> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC >>>>> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82=D0= =B0=D0=BD=D1=82=D0=B8=D0=BD. 
>>>>=20 >>>=20 >>>=20 >>>=20 >>> -- >>> =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC >>> =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82=D0= =B0=D0=BD=D1=82=D0=B8=D0=BD. >>=20 >=20 >=20 >=20 > -- > =D0=A1 =D1=83=D0=B2=D0=B0=D0=B6=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC > =D0=9A=D1=83=D0=BA=D0=BB=D0=B8=D0=BD =D0=9A=D0=BE=D0=BD=D1=81=D1=82=D0=B0= =D0=BD=D1=82=D0=B8=D0=BD. From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 13:32:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D97768EC; Tue, 19 Feb 2013 13:32:19 +0000 (UTC) (envelope-from konstantin.kuklin@gmail.com) Received: from mail-qe0-f42.google.com (mail-qe0-f42.google.com [209.85.128.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7B63FD84; Tue, 19 Feb 2013 13:32:19 +0000 (UTC) Received: by mail-qe0-f42.google.com with SMTP id 2so3044900qeb.15 for ; Tue, 19 Feb 2013 05:32:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; bh=NmMeb08ghdhT2tPcuhh3sqWDEQrepRCVqabzsNkQE8g=; b=OV5j/4E34d5WAPTrsOrCVJdcyNbaYnp1TKh8NoFTCVGFU2k309KF2sCAzqWZkxdDkC 8bmo4Utw3K4vAjxYiXX5tAb/+XUQRTp5UF4rLpk0LwIwOZVchdu64SJ+fKfI1cuO7KUH 6dFIQAk2jNTUhOQl1g4r10aC+8Mng+0G9bR+Y5bZ79i6+xLuzCtmiow8BizKHtWQkRdI E+hNLAxD6SvQQyEE5YF58Q4waSNVb1BRuDUETEZvAEuKygWUNiwVI9878y+f+YMHWv4t KG4SB/IHJrOToLyceJ27ngk9dxbjjy3oX9feETvPuNqrgt8FcqYm6q2g/XYvB6q3VXt9 Bqrg== MIME-Version: 1.0 X-Received: by 10.224.33.14 with SMTP id f14mr7097690qad.69.1361280278069; Tue, 19 Feb 2013 05:24:38 -0800 (PST) Received: by 10.49.98.130 with HTTP; Tue, 19 Feb 2013 05:24:37 -0800 (PST) In-Reply-To: <8506B305-D696-4213-BA59-929E4886B10C@my.gd> References: <5D97BF07-ECF4-45B2-91AC-3431A75ECDB3@my.gd> <0ED6EC22-9875-45FF-ADFC-BB23C2C94FC0@my.gd> 
<8506B305-D696-4213-BA59-929E4886B10C@my.gd> Date: Tue, 19 Feb 2013 17:24:37 +0400 Message-ID: Subject: Re: zfs raid1 error resilvering and mount From: Konstantin Kuklin To: Fleuriot Damien Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable X-Mailman-Approved-At: Tue, 19 Feb 2013 13:55:25 +0000 Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 13:32:19 -0000 zfs set canmount=3Doff zroot/var/crash i can`t do this, because zfs list empty 2013/2/19 Fleuriot Damien : > The thing is, perhaps you have corrupted blocks that weren't caught eithe= r by ZFS or your drives' firmware, preventing the pool's operation. > > Seeing zroot/var/crash is the problem, could you try: > > 1/ booting from a live CD or flash > 2/ NOT start a resilver > 3/ run the command: > zfs set canmount=3Doff zroot/var/crash > > > This should prevent /var/crash from trying to be mounted from the ZFS poo= l. > > Perhaps this'll allow you to get further through the boot process and per= haps even start your ZFS pool correctly. > > > > On Feb 19, 2013, at 12:52 PM, Konstantin Kuklin wrote: > >> you understand me right, but my problem not in dead device... raid1 >> must work correctly with 1 device and command to replace or something >> else not work, just freeze >> i have only 2 warning about crash fs zroot/var/crash and thats all >> have any idea, how i can repair it without default zfs tools like zfs, z= pool? 
>> >> >> 2013/2/19 Fleuriot Damien : >>> If I understand you correctly, you have: >>> - booted another system from flash >>> - NOT replaced the failed device >>> - under this booted system, resilvering takes place automatically >>> >>> >>> While I cannot tell why ZFS tries to resilver without a new, proper dev= ice, I think it will only work once you've replaced the failed device. >>> >>> Could you try replacing the failed drive ? >>> >>> >>> On Feb 19, 2013, at 12:39 PM, Konstantin Kuklin wrote: >>> >>>> i did`t replace disk, after reboot system not started (zfs installed >>>> as default root system) and i boot from another system(from flash) and >>>> resilvering has auto start and show me warnings with freeze >>>> progress(dead on checking zroot/var/crash ) >>>> replacing dead disk healing var/crash with <0x0> adress? >>>> >>>> 2013/2/18 Fleuriot Damien : >>>>> Reassure me here, you've replaced your failed vdev before trying to r= esilver right ? >>>>> >>>>> Your zpool status suggests otherwise, so I only want to make sure thi= s is a status from before replacing your drive. >>>>> >>>>> >>>>> On Feb 18, 2013, at 8:48 AM, Konstantin Kuklin wrote: >>>>> >>>>>> i can`t do it, because resilvering in progress(freeze on 0.1%) and z= fs >>>>>> list empty >>>>>> >>>>>> 2013/2/17 Fleuriot Damien : >>>>>>> Hmmm, zfs destroy -f zroot/var/crash ? >>>>>>> >>>>>>> Then you can try to zfs mount -a >>>>>>> >>>>>>> >>>>>>> >>>>>>> Removing pjd and mm from cc, if they want to read your message they= 're old enough to check their ML subscription. >>>>>>> >>>>>>> >>>>>>> On Feb 17, 2013, at 3:46 PM, Konstantin Kuklin wrote: >>>>>>> >>>>>>>> hi, i have raid1 on zfs with 2 device on pool >>>>>>>> first device died and boot from second not working... 
>>>>>>>> >>>>>>>> i try to get http://mfsbsd.vx.sk/ flash and load from it with zpoo= l import >>>>>>>> http://puu.sh/2402E >>>>>>>> >>>>>>>> when i load zfs.ko and opensolaris.ko i see this message: >>>>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash >>>>>>>> Solaris: WARNING: Can't open objset for zroot/var/crash >>>>>>>> >>>>>>>> zpool status: >>>>>>>> http://puu.sh/2405f >>>>>>>> >>>>>>>> resilvering freeze with: >>>>>>>> zpool status -v >>>>>>>> ............. >>>>>>>> zroot/usr:<0x28ff> >>>>>>>> zroot/usr:<0x29ff> >>>>>>>> zroot/usr:<0x2aff> >>>>>>>> zroot/var/crash:<0x0> >>>>>>>> root@Flash:/root # >>>>>>>> >>>>>>>> how i can delete or drop it fs zroot/var/crash (1m-10m size i didn= `t >>>>>>>> remember) and mount other zfs points with my data >>>>>>>> -- >>>>>>>> =F3 =D5=D7=C1=D6=C5=CE=C9=C5=CD >>>>>>>> =EB=D5=CB=CC=C9=CE =EB=CF=CE=D3=D4=C1=CE=D4=C9=CE. >>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.o= rg" >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> =F3 =D5=D7=C1=D6=C5=CE=C9=C5=CD >>>>>> =EB=D5=CB=CC=C9=CE =EB=CF=CE=D3=D4=C1=CE=D4=C9=CE. >>>>> >>>> >>>> >>>> >>>> -- >>>> =F3 =D5=D7=C1=D6=C5=CE=C9=C5=CD >>>> =EB=D5=CB=CC=C9=CE =EB=CF=CE=D3=D4=C1=CE=D4=C9=CE. >>> >> >> >> >> -- >> =F3 =D5=D7=C1=D6=C5=CE=C9=C5=CD >> =EB=D5=CB=CC=C9=CE =EB=CF=CE=D3=D4=C1=CE=D4=C9=CE. > -- =F3 =D5=D7=C1=D6=C5=CE=C9=C5=CD =EB=D5=CB=CC=C9=CE =EB=CF=CE=D3=D4=C1=CE=D4=C9=CE. 
From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 14:58:32 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 56CB1C4; Tue, 19 Feb 2013 14:58:31 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay02.ispgateway.de (smtprelay02.ispgateway.de [80.67.31.36]) by mx1.freebsd.org (Postfix) with ESMTP id 65668B34; Tue, 19 Feb 2013 14:58:31 +0000 (UTC) Received: from [78.35.187.42] (helo=fabiankeil.de) by smtprelay02.ispgateway.de with esmtpsa (SSLv3:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1U7oKU-0005s9-Iy; Tue, 19 Feb 2013 15:37:58 +0100 Date: Tue, 19 Feb 2013 15:35:51 +0100 From: Fabian Keil To: "Steven Hartland" Subject: Re: ZFS on 9.1 doesn't see errors on geli volumes... Message-ID: <20130219153551.175ad31f@fabiankeil.de> In-Reply-To: References: <20130218191242.GI55866@funkthat.com> <20130218200121.GJ55866@funkthat.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/=MYq4uS8VhumqmOEv5iB7oq"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Cc: freebsd-fs@FreeBSD.org, John-Mark Gurney X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 14:58:32 -0000 --Sig_/=MYq4uS8VhumqmOEv5iB7oq Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable

"Steven Hartland" wrote:

> From: "John-Mark Gurney"
> > so, we just end up overwriting the bio_completed error...
> >
> > pjd, should we just put the bio_completed = line under an else?
> >
> > something like:
> >         if (pbp->bio_error != 0) {
> >                 G_ELI_LOGREQ(0, pbp, "Crypto WRITE request failed (error=%d).",
> >                     pbp->bio_error);
> >                 pbp->bio_completed = 0;
> >         } else
> >                 pbp->bio_completed = pbp->bio_length;
> >
> >         /* Write is finished, send it up. */
> >         g_io_deliver(pbp, pbp->bio_error);
> >         sc = pbp->bio_to->geom->softc;
> >         atomic_subtract_int(&sc->sc_inflight, 1);
> >
> > But doesn't explain why reads aren't being counted though...
>
> Looks like the read case will lose the error if it's not the last
> bio in sector group.
>
> The attached should fix both cases.

Works for me on 10-CURRENT, thanks.

> A question for someone familiar with geom: why is bio_completed
> not set to bio_length in the read success case? Is this correct
> or is this another little bug?

No idea, but I reported "another little bug" a while ago:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
and while testing how the patch affects it, I discovered that
the panic can be prevented with:

diff --git a/sys/geom/eli/g_eli.c b/sys/geom/eli/g_eli.c
index 4e35297..24969b0 100644
--- a/sys/geom/eli/g_eli.c
+++ b/sys/geom/eli/g_eli.c
@@ -183,7 +183,8 @@ g_eli_read_done(struct bio *bp)
                        pbp->bio_driver2 = NULL;
                }
                g_io_deliver(pbp, pbp->bio_error);
-               atomic_subtract_int(&sc->sc_inflight, 1);
+               if (sc != NULL)
+                       atomic_subtract_int(&sc->sc_inflight, 1);
                return;
        }
        mtx_lock(&sc->sc_queue_mtx);

atomic_*_int(&sc->sc_inflight, 1) seems to be used without checking
that sc isn't NULL pretty much everywhere in g_eli.c, though, and it's
not clear to me when it's safe and when it isn't.

> On a related note, if anyone's got some pointers to docs about
> the internals of geom, I'd be interested :)

Every time I looked for internal geom documentation in the past
I came up empty.
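[Archive note] The concern discussed above, that an error from one child bio can be overwritten when a later child succeeds, can be illustrated outside the kernel. What follows is only a minimal user-space sketch in C, not the g_eli.c code: struct mock_bio and mock_child_done() are hypothetical names made up for illustration, and the model assumes a single parent request that counts its outstanding children. It keeps the first error seen and reports zero bytes completed if any child failed, which is roughly what the quoted "else" suggestion is aiming for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct bio. */
struct mock_bio {
	int    error;      /* 0 on success, otherwise an errno-style code */
	size_t length;     /* bytes requested */
	size_t completed;  /* bytes reported as transferred */
	int    inflight;   /* child requests still outstanding */
};

/*
 * Record completion of one child request against its parent.
 * Returns 1 when the parent is finished and may be delivered upward,
 * 0 while children are still outstanding.
 */
int
mock_child_done(struct mock_bio *parent, int child_error)
{
	assert(parent != NULL && parent->inflight > 0);

	/* Keep the first error; a later successful child must not clear it. */
	if (parent->error == 0 && child_error != 0)
		parent->error = child_error;

	if (--parent->inflight > 0)
		return (0);

	/* Report the full length only if every child succeeded. */
	parent->completed = (parent->error != 0) ? 0 : parent->length;
	return (1);
}
```

In this model, a parent with two children where the first fails with error 5 and the second succeeds finishes with error 5 and completed set to 0. Whether the real driver should behave exactly this way is a question for someone who knows geom.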
Fabian --Sig_/=MYq4uS8VhumqmOEv5iB7oq Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlEjjckACgkQBYqIVf93VJ23AQCfUjev5ucpAJ+pZy4TlhrecbwO 7XQAoLzeQ0t0lRZqyYGHB7xGpEfHuDSA =Gebs -----END PGP SIGNATURE----- --Sig_/=MYq4uS8VhumqmOEv5iB7oq-- From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 18:24:20 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 30A1793E; Tue, 19 Feb 2013 18:24:20 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 08F9EBED; Tue, 19 Feb 2013 18:24:20 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1JIOJBc033727; Tue, 19 Feb 2013 18:24:19 GMT (envelope-from jhb@freefall.freebsd.org) Received: (from jhb@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1JIOJP3033723; Tue, 19 Feb 2013 18:24:19 GMT (envelope-from jhb) Date: Tue, 19 Feb 2013 18:24:19 GMT Message-Id: <201302191824.r1JIOJP3033723@freefall.freebsd.org> To: jhb@FreeBSD.org, freebsd-fs@FreeBSD.org, jhb@FreeBSD.org From: jhb@FreeBSD.org Subject: Re: kern/176179: [nfs] nfs client KASSERT: panic: attempt to set TDF_SBDRY recursively X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 18:24:20 -0000 Synopsis: [nfs] nfs client KASSERT: panic: attempt to set TDF_SBDRY recursively Responsible-Changed-From-To: freebsd-fs->jhb Responsible-Changed-By: jhb Responsible-Changed-When: Tue Feb 19 18:23:50 UTC 2013 Responsible-Changed-Why: This is due to one 
of my changes. I am reworking it and the reworked version should fix this. http://www.freebsd.org/cgi/query-pr.cgi?pr=176179 From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 20:11:03 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C5BDB915 for ; Tue, 19 Feb 2013 20:11:03 +0000 (UTC) (envelope-from toasty@dragondata.com) Received: from mail-ia0-x22a.google.com (ia-in-x022a.1e100.net [IPv6:2607:f8b0:4001:c02::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 8EBD33A6 for ; Tue, 19 Feb 2013 20:11:03 +0000 (UTC) Received: by mail-ia0-f170.google.com with SMTP id k20so6532308iak.1 for ; Tue, 19 Feb 2013 12:11:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dragondata.com; s=google; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer; bh=UPWosEXkGGWzgnU5mq7j1BCBDV6yLJmc4XG/ifsHxns=; b=jTlrkjDXCD1SqoNQ1ErD4Dkn3S8Ik8GaRBTU8//zyRMpbnNKuy4yfs4wxl8gs8ziE/ Tq02ZxZHGXP3+N/fHRtlWMUBmE3iZRvTFvewkNQbv1R2T5RO8mozbuXzPAa7dqWF4+/u WfAIt+DES0Qu4HSbSFsk/CiuWpMAkHNGEA/fw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=UPWosEXkGGWzgnU5mq7j1BCBDV6yLJmc4XG/ifsHxns=; b=HgSrQameUHyhRey89xAkrEFisg/inK+++toH/2qEmdAUCiZXDWCiP7b2e9myhyF4XS oi7IWRhFz1N6aN+XqyQcRuWffK4Kr3GxjnQzeFz9MdQHz6914XZgHPfnSyEULTbiBUDM pdV2JgHIZAML0lclHMuzTEPULzXvzZUKWOMhoMvdHZx702UEn2hMQBhjeZjwrBBP1PPG 7rDV5zLUJS8wbRXhwusEObOo4ZUG+q7OYVowe21S13Xx+kiz//lbvmWC8wj1owuaNW/b Y9FdEl23iENArnGrWs+FLs3qTaxYZQn+LvFQ033yilrPJfmT6P2QnsEyy54jgtejbXTa gjqw== X-Received: by 10.50.196.165 with SMTP id in5mr9855938igc.99.1361304653771; Tue, 19 Feb 2013 12:10:53 -0800 (PST) Received: from vpn132.rw1.your.org 
(vpn132.rw1.your.org. [204.9.51.132]) by mx.google.com with ESMTPS id ip8sm4053976igc.4.2013.02.19.12.10.50 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 19 Feb 2013 12:10:52 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Improving ZFS performance for large directories From: Kevin Day In-Reply-To: <20130201192416.GA76461@server.rulingia.com> Date: Tue, 19 Feb 2013 14:10:47 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com> <20130201192416.GA76461@server.rulingia.com> To: Peter Jeremy X-Mailer: Apple Mail (2.1499) X-Gm-Message-State: ALoCoQkU2VCvT+jkKziLCBqbIYtrE5fbjJqPe+T3S++ADtUBZdXWe0nfe9VdbxmF1I4DTgoPuVRq Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 20:11:03 -0000

Sorry for the late followup, I've been doing some testing with an L2ARC device.

>> Doing it twice back-to-back makes a bit of difference but it's still slow either way.
>
> ZFS can be very conservative about caching data and twice might not be enough.
> I suggest you try 8-10 times, or until the time stops reducing.
>

Timing an "ls" in large directories 20 times, the first is the slowest, then all subsequent listings are roughly the same. There doesn't appear to be any gain after 20 repetitions.

>> I think some of the issue is that nothing is being allowed to stay cached long.
>
> Well ZFS doesn't do any time-based eviction so if things aren't
> staying in the cache, it's because they are being evicted by things
> that ZFS considers more deserving.
>
> Looking at the zfs-stats you posted, it looks like your workload has
> very low locality of reference (the data hit rate is very low). If
> this is not what you expect then you need more RAM. OTOH, your
> vfs.zfs.arc_meta_used being above vfs.zfs.arc_meta_limit suggests that
> ZFS really wants to cache more metadata (by default ZFS has a 25%
> metadata, 75% data split in ARC to prevent metadata caching starving
> data caching). I would go even further than the 50:50 split suggested
> later and try 75:25 (ie, triple the current vfs.zfs.arc_meta_limit).
>
> Note that if there is basically no locality of reference in your
> workload (as I suspect), you can even turn off data caching for
> specific filesystems with zfs set primarycache=metadata tank/foo
> (note that you still need to increase vfs.zfs.arc_meta_limit to
> allow ZFS to use the ARC to cache metadata).

Now that I've got an L2ARC device (250GB), I've been doing some playing. With the defaults (primarycache and secondarycache set to all), I really didn't see much improvement. The SSD filled itself pretty quickly, but its hit rate was around 1%, even after 48 hours.

Thinking that making the primary cache metadata only and the secondary cache "all" would improve things, I wiped the device (SATA secure erase to make sure) and tried again. This was much worse. I'm guessing that because some amount of real file data was being looked at frequently, the SSD was basically getting hammered for read access with 100% utilization, and things were far slower.

I wiped the SSD and tried again with primarycache=all, secondarycache=metadata and things have improved. Even with boosting up vfs.zfs.l2arc_write_max, it took quite a while before things stabilized. I'm guessing there isn't a huge amount of data, but there's such poor locality and sweeping the entire filesystem takes so long that it's going to take a while before it decides what's worth being cached.
After about 20 hours in this configuration, it's a HUGE difference on directory speeds though. Before adding the SSD, an "ls" in a directory with 65k files would take 10-30 seconds, it's now down to about 0.2 seconds.

So I'm guessing the theory was right, there was more metadata than would fit in ARC so it was constantly churning. I'm a bit surprised that continually doing an ls in a big directory didn't make it stick better, but these filesystems are HUGE so there may be some inefficiencies happening here. There are roughly 29M files, growing at about 50k files/day. We recently upgraded, and are now at 96 3TB drives in the pool.

What I also find surprising is this:

L2 ARC Size: (Adaptive)                         22.70   GiB
        Header Size:                    0.31%   71.49   MiB

L2 ARC Breakdown:                               23.77m
        Hit Ratio:                      34.26%  8.14m
        Miss Ratio:                     65.74%  15.62m
        Feeds:                                  63.28k

It's a 250G drive, and only 22G is being used, and there's still a ~66% miss rate. Is there any way to tell why more metadata isn't being pushed to the L2ARC? I see a pretty high count for "Passed Headroom" and "Tried Lock Failures", but I'm not sure if that's normal.
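[Archive note] The hit and miss percentages in the L2 ARC breakdown above are simply hits (or misses) divided by total lookups. As a small worked example, and only as a sketch, the C function below recomputes the ratio from the rounded counts shown in the report (8.14m hits, 15.62m misses). The name hit_ratio_percent() is made up for illustration and is not part of zfs-stats or of any ZFS interface.

```c
#include <stdint.h>

/*
 * Percentage of lookups that were hits.
 * Returns 0.0 when there were no lookups at all, to avoid dividing by zero.
 */
double
hit_ratio_percent(uint64_t hits, uint64_t misses)
{
	uint64_t total = hits + misses;

	if (total == 0)
		return (0.0);
	return (100.0 * (double)hits / (double)total);
}
```

Calling hit_ratio_percent(8140000, 15620000) gives a value a little above 34%, in line with the reported 34.26%. The inputs here are the rounded figures from the report, so small differences from what zfs-stats prints are to be expected.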
Including the lengthy output of zfs-stat below in case anyone sees something that stands out as being unusual.

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Feb 19 20:08:19 2013
------------------------------------------------------------------------

System Information:

        Kernel Version:                         901000 (osreldate)
        Hardware Platform:                      amd64
        Processor Architecture:                 amd64

        ZFS Storage pool Version:               28
        ZFS Filesystem Version:                 5

FreeBSD 9.1-RC2 #1: Tue Oct 30 20:37:38 UTC 2012 root
 8:08PM  up 20:40, 3 users, load averages: 0.47, 0.50, 0.52

------------------------------------------------------------------------

System Memory:

        8.41%   5.22    GiB Active,     10.18%  6.32    GiB Inact
        77.39%  48.05   GiB Wired,      1.52%   966.99  MiB Cache
        2.50%   1.55    GiB Free,       0.00%   888.00  KiB Gap

        Real Installed:                         64.00   GiB
        Real Available:                 99.97%  63.98   GiB
        Real Managed:                   97.04%  62.08   GiB

        Logical Total:                          64.00   GiB
        Logical Used:                   86.22%  55.18   GiB
        Logical Free:                   13.78%  8.82    GiB

Kernel Memory:                                  23.18   GiB
        Data:                           99.91%  23.16   GiB
        Text:                           0.09%   21.27   MiB

Kernel Memory Map:                              52.10   GiB
        Size:                           35.21%  18.35   GiB
        Free:                           64.79%  33.75   GiB

------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                10.24m
        Recycle Misses:                         3.48m
        Mutex Misses:                           24.85k
        Evict Skips:                            12.79m

ARC Size:                               92.50%  28.25   GiB
        Target Size: (Adaptive)         92.50%  28.25   GiB
        Min Size (Hard Limit):          25.00%  7.64    GiB
        Max Size (High Water):          4:1     30.54   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       62.35%  17.62   GiB
        Frequently Used Cache Size:     37.65%  10.64   GiB

ARC Hash Breakdown:
        Elements Max:                           1.99m
        Elements Current:               99.16%  1.98m
        Collisions:                             8.97m
        Chain Max:                              14
        Chains:                                 586.97k

------------------------------------------------------------------------

ARC Efficiency:                                 1.15b
        Cache Hit Ratio:                97.66%  1.12b
        Cache Miss Ratio:               2.34%   26.80m
        Actual Hit Ratio:               72.75%  833.30m

        Data Demand Efficiency:         98.39%  33.94m
        Data Prefetch Efficiency:       8.11%   7.60m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             23.88%  267.15m
          Most Recently Used:           4.70%   52.60m
          Most Frequently Used:         69.79%  780.70m
          Most Recently Used Ghost:     0.64%   7.13m
          Most Frequently Used Ghost:   0.98%   10.99m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  2.99%   33.40m
          Prefetch Data:                0.06%   616.42k
          Demand Metadata:              71.38%  798.44m
          Prefetch Metadata:            25.58%  286.13m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  2.04%   546.67k
          Prefetch Data:                26.07%  6.99m
          Demand Metadata:              37.96%  10.18m
          Prefetch Metadata:            33.93%  9.09m

------------------------------------------------------------------------

L2 ARC Summary: (HEALTHY)
        Passed Headroom:                        3.62m
        Tried Lock Failures:                    3.17m
        IO In Progress:                         21.18k
        Low Memory Aborts:                      20
        Free on Write:                          7.07k
        Writes While Full:                      134
        R/W Clashes:                            1.63k
        Bad Checksums:                          0
        IO Errors:                              0
        SPA Mismatch:                           0

L2 ARC Size: (Adaptive)                         22.70   GiB
        Header Size:                    0.31%   71.02   MiB

L2 ARC Breakdown:                               23.78m
        Hit Ratio:                      34.25%  8.15m
        Miss Ratio:                     65.75%  15.64m
        Feeds:                                  63.47k

L2 ARC Buffer:
        Bytes Scanned:                          65.51   TiB
        Buffer Iterations:                      63.47k
        List Iterations:                        4.06m
        NULL List Iterations:                   64.89k

L2 ARC Writes:
        Writes Sent:                    100.00% 29.89k

------------------------------------------------------------------------

File-Level Prefetch: (HEALTHY)

DMU Efficiency:                                 1.24b
        Hit Ratio:                      64.29%  798.62m
        Miss Ratio:                     35.71%  443.54m

        Colinear:                               443.54m
          Hit Ratio:                    0.00%   20.45k
          Miss Ratio:                   100.00% 443.52m

        Stride:                                 772.29m
          Hit Ratio:                    99.99%  772.21m
          Miss Ratio:                   0.01%   81.30k

DMU Misc:
        Reclaim:                                443.52m
          Successes:                    0.05%   220.47k
          Failures:                     99.95%  443.30m

        Streams:                                26.42m
          +Resets:                      0.05%   12.73k
          -Resets:                      99.95%  26.41m
          Bogus:                                0

------------------------------------------------------------------------

VDEV cache is disabled

------------------------------------------------------------------------

ZFS Tunables (sysctl):
        kern.maxusers                           384
        vm.kmem_size                            66662760448
        vm.kmem_size_scale                      1
        vm.kmem_size_min                        0
        vm.kmem_size_max                        329853485875
        vfs.zfs.l2c_only_size                   5242113536
        vfs.zfs.mfu_ghost_data_lsize            178520064
        vfs.zfs.mfu_ghost_metadata_lsize        6486959104
        vfs.zfs.mfu_ghost_size                  6665479168
        vfs.zfs.mfu_data_lsize                  11863127552
        vfs.zfs.mfu_metadata_lsize              123386368
        vfs.zfs.mfu_size                        12432947200
        vfs.zfs.mru_ghost_data_lsize            14095171584
        vfs.zfs.mru_ghost_metadata_lsize        8351076864
        vfs.zfs.mru_ghost_size                  22446248448
        vfs.zfs.mru_data_lsize                  2076449280
        vfs.zfs.mru_metadata_lsize              4655490560
        vfs.zfs.mru_size                        7074721792
        vfs.zfs.anon_data_lsize                 0
        vfs.zfs.anon_metadata_lsize             0
        vfs.zfs.anon_size                       1605632
        vfs.zfs.l2arc_norw                      1
        vfs.zfs.l2arc_feed_again                1
        vfs.zfs.l2arc_noprefetch                1
        vfs.zfs.l2arc_feed_min_ms               200
        vfs.zfs.l2arc_feed_secs                 1
        vfs.zfs.l2arc_headroom                  2
        vfs.zfs.l2arc_write_boost               52428800
        vfs.zfs.l2arc_write_max                 26214400
        vfs.zfs.arc_meta_limit                  16398159872
        vfs.zfs.arc_meta_used                   16398120264
        vfs.zfs.arc_min                         8199079936
        vfs.zfs.arc_max                         32796319744
        vfs.zfs.dedup.prefetch                  1
        vfs.zfs.mdcomp_disable                  0
        vfs.zfs.write_limit_override            0
        vfs.zfs.write_limit_inflated            206088929280
        vfs.zfs.write_limit_max                 8587038720
        vfs.zfs.write_limit_min                 33554432
        vfs.zfs.write_limit_shift               3
        vfs.zfs.no_write_throttle               0
        vfs.zfs.zfetch.array_rd_sz              1048576
        vfs.zfs.zfetch.block_cap                256
        vfs.zfs.zfetch.min_sec_reap             2
        vfs.zfs.zfetch.max_streams              8
        vfs.zfs.prefetch_disable                0
        vfs.zfs.mg_alloc_failures               12
        vfs.zfs.check_hostid                    1
        vfs.zfs.recover                         0
        vfs.zfs.txg.synctime_ms                 1000
        vfs.zfs.txg.timeout                     5
        vfs.zfs.vdev.cache.bshift               16
        vfs.zfs.vdev.cache.size                 0
        vfs.zfs.vdev.cache.max                  16384
        vfs.zfs.vdev.write_gap_limit            4096
        vfs.zfs.vdev.read_gap_limit             32768
        vfs.zfs.vdev.aggregation_limit          131072
        vfs.zfs.vdev.ramp_rate                  2
        vfs.zfs.vdev.time_shift                 6
        vfs.zfs.vdev.min_pending                4
        vfs.zfs.vdev.max_pending                128
        vfs.zfs.vdev.bio_flush_disable          0
        vfs.zfs.cache_flush_disable             0
        vfs.zfs.zil_replay_disable              0
        vfs.zfs.zio.use_uma                     0
        vfs.zfs.snapshot_list_prefetch          0
        vfs.zfs.version.zpl                     5
        vfs.zfs.version.spa                     28
        vfs.zfs.version.acl                     1
        vfs.zfs.debug                           0
        vfs.zfs.super_owner                     0

From owner-freebsd-fs@FreeBSD.ORG Tue Feb 19 21:11:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org
(mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DBA3D8BA; Tue, 19 Feb 2013 21:11:44 +0000 (UTC) (envelope-from tomek.cedro@gmail.com) Received: from mail-qc0-f170.google.com (mail-qc0-f170.google.com [209.85.216.170]) by mx1.freebsd.org (Postfix) with ESMTP id 704237F4; Tue, 19 Feb 2013 21:11:44 +0000 (UTC) Received: by mail-qc0-f170.google.com with SMTP id d42so2801484qca.29 for ; Tue, 19 Feb 2013 13:11:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:sender:date:x-google-sender-auth:message-id :subject:from:to:content-type; bh=1yBb3dK8gbWXGlNZwKmtrGe7pRbf1HX71uH8gx53GEg=; b=ZfGkqVJcWLEJGQXvaqENiPmQPNQYc9zpfKuIyHQ8zmfzLuQsex6hmKYvRg2yD6S/6h IP1a9rmqMqDIApHeDk8ZrE8SiLq2IP/gk8lneJneJFQuODjbq7Lxt+amUu1r3Abuu1+L xnnBWlMHiz872lLV5L64LyJ7AitNdd2FOT2cWDNerNZUy0w2RCgA1TFXgbJ6oY+6eajm HjQ5EYI/xH5UCXjrKfit1PUY/cRs4VfVUmvyjly9xFgC3MhK2zrcxT6HOdpPwrA9mwQs Ei89GGSQoVKnFoWTy5fZ0QPtF9Dcr9Ygb1hExTiYFeCKMPcH0Ga9XtZIC0/Phgp1y5H0 6nUw== MIME-Version: 1.0 X-Received: by 10.224.209.193 with SMTP id gh1mr8319809qab.86.1361308297623; Tue, 19 Feb 2013 13:11:37 -0800 (PST) Sender: tomek.cedro@gmail.com Received: by 10.49.71.204 with HTTP; Tue, 19 Feb 2013 13:11:37 -0800 (PST) Date: Tue, 19 Feb 2013 22:11:37 +0100 X-Google-Sender-Auth: OqEMZuDv2ClLuLd4D5EO-APjo0k Message-ID: Subject: bluray recorder From: CeDeROM To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Feb 2013 21:11:44 -0000 Hello :-) I have just bought a Pioneer 15x BluRay recorder. I saw something like below in the dmesg, I cannot access video with VLC, should I worry about that? I guess recording files can be done just as for DVD with growisofs? 
:-)

(cd2:ata0:0:1:0): READ DVD STRUCTURE. CDB: ad 0 0 0 0 0 0 1 0 8 0 0
(cd2:ata0:0:1:0): CAM status: SCSI Status Error
(cd2:ata0:0:1:0): SCSI status: Check Condition
(cd2:ata0:0:1:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(cd2:ata0:0:1:0): Error 22, Unretryable error
(cd2:ata0:0:1:0): READ DVD STRUCTURE. CDB: ad 0 0 0 0 0 0 1 0 8 0 0
(cd2:ata0:0:1:0): CAM status: SCSI Status Error
(cd2:ata0:0:1:0): SCSI status: Check Condition
(cd2:ata0:0:1:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(cd2:ata0:0:1:0): Error 22, Unretryable error
(cd2:ata0:0:1:0): READ DVD STRUCTURE. CDB: ad 0 0 0 0 0 0 1 0 8 0 0
(cd2:ata0:0:1:0): CAM status: SCSI Status Error
(cd2:ata0:0:1:0): SCSI status: Check Condition
(cd2:ata0:0:1:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(cd2:ata0:0:1:0): Error 22, Unretryable error
(cd2:ata0:0:1:0): READ DVD STRUCTURE. CDB: ad 0 0 0 0 0 0 1 0 8 0 0
(cd2:ata0:0:1:0): CAM status: SCSI Status Error
(cd2:ata0:0:1:0): SCSI status: Check Condition
(cd2:ata0:0:1:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(cd2:ata0:0:1:0): Error 22, Unretryable error

Any hints welcome :-)
Tomek

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info

From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 00:42:52 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id ADEAB300 for ; Wed, 20 Feb 2013 00:42:52 +0000 (UTC) (envelope-from jmg@h2.funkthat.com) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) by mx1.freebsd.org (Postfix) with ESMTP id 5812536E for ; Wed, 20 Feb 2013 00:42:52 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id r1K0gkk5048810 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 19 Feb 2013 16:42:46 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from
jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id r1K0gklU048809 for freebsd-fs@FreeBSD.org; Tue, 19 Feb 2013 16:42:46 -0800 (PST) (envelope-from jmg) Date: Tue, 19 Feb 2013 16:42:46 -0800 From: John-Mark Gurney To: freebsd-fs@FreeBSD.org Subject: on zfs, read errors are considered write errors? Message-ID: <20130220004246.GL55866@funkthat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Tue, 19 Feb 2013 16:42:46 -0800 (PST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 00:42:52 -0000

So, I've been trying to track down how ZFS handles errors and stuff to make sure it's sane before I try to fix geli, but I've been getting some weird results...

Apparently, zfs thinks that read errors are WRITE errors, or even CKSUM errors (this is understandable, as invalid data would cause a cksum error)...

I don't know where in the zfs code that error accounting is happening, but here is my test:

touch /root/disk{1,2}
mdconfig -a -t vnode -f /root/disk1 -s 96m
mdconfig -a -t vnode -f /root/disk2 -s 96m
gnop create md0
gnop create md1
zpool create ztest mirror md0.nop md1.nop
cd /ztest
for i in `jot 1000 1`; do echo $i > $i; done
cd /
zpool export ztest
gnop configure -r 0 md0.nop
zpool import ztest
zpool status
gnop configure -r 30 md0.nop
cat /ztest/*
zpool status
zpool scrub
zpool status

And I get results like:

[root@carbon /]# zpool status
  pool: ztest
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 239K in 0h0m with 0 errors on Tue Feb 19 16:36:37 2013
config:

        NAME         STATE     READ WRITE CKSUM
        ztest        ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            md0.nop  ONLINE       5   277   422
            md1.nop  ONLINE       0     0     0

errors: No known data errors

I have patches that change gnop to log the errors at a debug level of 1 instead of 2 (which also logs all requests), and the logs verify that only errors to READ requests are returned...

Any clues why read errors would cause write errors?

--
  John-Mark Gurney  Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 02:00:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1996E646 for ; Wed, 20 Feb 2013 02:00:50 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id D067091C for ; Wed, 20 Feb 2013 02:00:49 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEACEtJFGDaFvO/2dsb2JhbABFhkm6AYEkc4IfAQEEASMEUgUWGBEZAgRVBogfBgytaoJAkCGNPRqBAxkbB4ItgRMDiGaGMYcUgR2PO4MlgU0HFwYY X-IronPort-AV: E=Sophos;i="4.84,698,1355115600"; d="c'?scan'208";a="14840163" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 19 Feb 2013 21:00:42 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B40F3B3F4E; Tue, 19 Feb 2013 21:00:42 -0500 (EST) Date: Tue, 19 Feb 2013 21:00:42 -0500 (EST) From: Rick Macklem To: Momchil Ivanov Message-ID: <992481316.3137385.1361325642681.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <86a88ac8bb038ec5d8034724dcf80924.squirrel@webmail.xaxo.eu> Subject: Re: NFS + Kerberos MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_3137384_634441493.1361325642679" X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 02:00:50 -0000 ------=_Part_3137384_634441493.1361325642679 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Momchil Ivanov 
wrote:
> On Tue, February 19, 2013 12:56 am, Rick Macklem wrote:
> > Thanks to Elias's hard work, a bug/fix has just been isolated in the
> > Kerberos library that causes the gssd to fail to translate a principal
> > to a uid. The fix is to increase the size of the buffer passed to
> > getpwnam_r(). See this thread:
> > http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw
> >
> > I haven't run into this bug, so I don't know what systems are affected,
> > but it would explain why you can't get it working.
> >
> > I'd suggest you apply the patch in the email (increase buf to 1024) and
> > then try again with libraries built with the patch.
>
> Do I have to apply the patch to the server only and then rebuild world or
> do I have to do the same on the client too? And do I need to rebuild
> heimdal on both machines?
>
The bug should only affect the server, since the client never translates between principal_name<->uid. (The client does a rather cheesy trick of using the uid to select the correct credential cache file.)

> btw, I checked the logs of the kdc and could not see any trace of the nfs
> server trying to validate the client's ticket... Frankly, I don't know
> what I should expect there, I haven't used kerberos before, so I have no
> idea if it's related to the bug. Here is part of the log:
>
> AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> No preauth found, returning PREAUTH-REQUIRED -- user@EXAMPLE.LOCAL
> sending 407 bytes to IPv4:X.X.X.X
> AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL
> Client sent patypes: encrypted-timestamp
> Looking for PKINIT pa-data -- user@EXAMPLE.LOCAL
> Looking for ENC-TS pa-data -- user@EXAMPLE.LOCAL
> ENC-TS Pre-authentication succeeded -- user@EXAMPLE.LOCAL using des-cbc-crc
> Client supported enctypes: des-cbc-crc
> Using des-cbc-crc/aes256-cts-hmac-sha1-96
> AS-REQ authtime: 2013-02-11T23:45:44 starttime: unset endtime: 2013-02-12T09:45:39 renew till: unset
> sending 552 bytes to IPv4:X.X.X.X
>
Hmm, that sounds like you are never getting as far as sending the ticket to the server, but I'm not at home, so I can't look and see exactly what gets logged. (Also, I use an MIT KDC, so what gets logged might be different.)

I've attached a trivial program that you can compile/run as root on the NFS server to see if 128 bytes is a big enough buffer for your setup. If it can print out the uid for the usernames you test as arguments, the patch isn't needed for your environment. (Oh, and it has a typo bug in the errx() arguments, but it works ok for testing.)
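
The attached program does not appear in this archive. A minimal sketch of what such a test could look like, assuming it does no more than call getpwnam_r() with a 128-byte buffer for each user name given; lookup_uid(), check_users() and TEST_BUFSZ are illustrative names and are not taken from the original program:

```c
#include <sys/types.h>
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>

/* Buffer size under test; 128 is the size mentioned in the message. */
#define TEST_BUFSZ 128

/*
 * Look up one user name, letting getpwnam_r() use only the first bufsz
 * bytes of a local buffer.  Returns 0 and stores the uid on success,
 * ENOENT if no such user was found, EINVAL if bufsz exceeds the local
 * buffer, or the error returned by getpwnam_r() (ERANGE means the
 * buffer was too small for the entry).
 */
static int
lookup_uid(const char *name, size_t bufsz, uid_t *uidp)
{
	char buf[1024];
	struct passwd pwd, *res = NULL;
	int error;

	if (bufsz > sizeof(buf))
		return (EINVAL);
	error = getpwnam_r(name, &pwd, buf, bufsz, &res);
	if (error != 0)
		return (error);
	if (res == NULL)
		return (ENOENT);
	*uidp = res->pw_uid;
	return (0);
}

/*
 * Run the lookup for each of the n names and print the uid or the
 * error.  Returns the number of names that could not be translated.
 */
static int
check_users(int n, char **names)
{
	uid_t uid;
	int error, failed = 0, i;

	for (i = 0; i < n; i++) {
		error = lookup_uid(names[i], TEST_BUFSZ, &uid);
		if (error == 0) {
			printf("%s: uid %lu\n", names[i], (unsigned long)uid);
		} else {
			printf("%s: lookup failed: %s\n", names[i],
			    strerror(error));
			failed++;
		}
	}
	return (failed);
}
```

Calling check_users(argc - 1, argv + 1) from main() and running the result as root on the NFS server gives the kind of check described above: an ERANGE failure for a user that does exist would suggest that a 128-byte buffer is too small for that setup.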
Good luck with it, rick

> Thank you,
> Momchil

------=_Part_3137384_634441493.1361325642679--

From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 03:59:18 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2B80E6B7 for ; Wed, 20 Feb 2013 03:59:18 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from m2.gritton.org (gritton.org [199.192.164.235]) by mx1.freebsd.org (Postfix) with ESMTP id E6FDBDEA for ; Wed, 20 Feb 2013 03:59:17 +0000 (UTC) Received: from glorfindel.gritton.org (c-174-52-130-157.hsd1.ut.comcast.net [174.52.130.157]) (authenticated bits=0) by m2.gritton.org (8.14.5/8.14.5) with ESMTP id r1K3xG4Z019369 for ; Tue, 19 Feb 2013 20:59:16 -0700 (MST) (envelope-from jamie@FreeBSD.org) Message-ID: <51244A13.8030907@FreeBSD.org> Date: Tue, 19 Feb 2013 20:59:15 -0700 From: Jamie Gritton User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.24) Gecko/20120129 Thunderbird/3.1.16 MIME-Version: 1.0 To: fs@FreeBSD.org Subject: mount/kldload race Content-Type: multipart/mixed; boundary="------------080501010405030304090106" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 03:59:18 -0000

This is a multi-part message in MIME format.
--------------080501010405030304090106 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit

Perhaps most people don't try to mount a bunch of filesystems at the same time, at least not those that depend on kernel modules. But it turns out that's going to be a pretty common situation with jails and nullfs. And I found that attempting such a feat will cause most of these simultaneous mounts to fail with ENODEV.

It turns out that the problem is a race in vfs_byname_kld(). First it'll see if the fstype is loaded, and if it isn't then it will load the module. But if the module is loaded by a different process between those two points, the resulting EEXIST from kern_kldload() will make vfs_byname_kld() error out.

The fix is pretty simple: don't treat EEXIST as an error. By going on, and rechecking for the fstype, the filesystem can be mounted while still allowing any "real" error to be caught. I'm including a small patch that will accomplish this, and I'd appreciate a quick look by anyone who's familiar with this part of things before I commit it.

- Jamie

--------------080501010405030304090106 Content-Type: text/plain; name="vfs_init.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="vfs_init.diff"

Index: sys/kern/vfs_init.c
===================================================================
--- sys/kern/vfs_init.c	(revision 247000)
+++ sys/kern/vfs_init.c	(working copy)
@@ -130,13 +130,18 @@
 
 	/* Try to load the respective module. */
 	*error = kern_kldload(td, fstype, &fileid);
+	if (*error == EEXIST) {
+		*error = 0;
+		fileid = 0;
+	}
 	if (*error)
 		return (NULL);
 
 	/* Look up again to see if the VFS was loaded. */
 	vfsp = vfs_byname(fstype);
 	if (vfsp == NULL) {
-		(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
+		if (fileid != 0)
+			(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
 		*error = ENODEV;
 		return (NULL);
 	}

--------------080501010405030304090106--

From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 05:43:14 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EF7A9BF1; Wed, 20 Feb 2013 05:43:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 4E5D9211; Wed, 20 Feb 2013 05:43:14 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1K5h9wg001927; Wed, 20 Feb 2013 07:43:09 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1K5h9wg001927 Received: (from kostik@localhost) by tom.home (8.14.6/8.14.6/Submit) id r1K5h90j001926; Wed, 20 Feb 2013 07:43:09 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 20 Feb 2013 07:43:09 +0200 From: Konstantin Belousov To: Jamie Gritton Subject: Re: mount/kldload race Message-ID: <20130220054309.GD2598@kib.kiev.ua> References: <51244A13.8030907@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="7HhoQoqNsng1reXT" Content-Disposition: inline In-Reply-To: <51244A13.8030907@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list
List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 05:43:15 -0000

--7HhoQoqNsng1reXT Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Tue, Feb 19, 2013 at 08:59:15PM -0700, Jamie Gritton wrote:
> Perhaps most people don't try to mount a bunch of filesystems at the
> same time, at least not those that depend on kernel modules. But it
> turns out that's going to be a pretty common situation with jails and
> nullfs. And I found that attempting such a feat will cause most of
> these simultaneous mounts to fail with ENODEV.
>
> It turns out that the problem is a race in vfs_byname_kld(). First it'll
> see if the fstype is loaded, and if it isn't then it will load the
> module. But if the module is loaded by a different process between those
> two points, the resulting EEXIST from kern_kldload() will make
> vfs_byname_kld() error out.
>
> The fix is pretty simple: don't treat EEXIST as an error. By going on,
> and rechecking for the fstype, the filesystem can be mounted while still
> allowing any "real" error to be caught. I'm including a small patch that
> will accomplish this, and I'd appreciate a quick look by anyone who's
> familiar with this part of things before I commit it.
>
> - Jamie

> Index: sys/kern/vfs_init.c
> ===================================================================
> --- sys/kern/vfs_init.c	(revision 247000)
> +++ sys/kern/vfs_init.c	(working copy)
> @@ -130,13 +130,18 @@
>
>  	/* Try to load the respective module. */
>  	*error = kern_kldload(td, fstype, &fileid);
> +	if (*error == EEXIST) {
> +		*error = 0;
> +		fileid = 0;
Why do you clear fileid? Is this to prevent an attempt to kldunload() a module which was not loaded by the current thread?

If yes, I would suggest using a separate flag to track this, which is cleared on the EEXIST error. IMHO it is cleaner and less puzzling.

> +	}
>  	if (*error)
>  		return (NULL);
>
>  	/* Look up again to see if the VFS was loaded. */
>  	vfsp = vfs_byname(fstype);
>  	if (vfsp == NULL) {
> -		(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
> +		if (fileid != 0)
> +			(void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE);
>  		*error = ENODEV;
>  		return (NULL);
>  	}

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

--7HhoQoqNsng1reXT Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iQIcBAEBAgAGBQJRJGJtAAoJEJDCuSvBvK1BcN0P/23vCOiUpRsEV5ODpT1qdD9E
DsdB8/WAIc7xL1OgSUL5M4kORDpH6i7gqnDLlYJRtPvc6fikvrEGR9MpC4fFMHJ3
58LaKdhWTvl2YRKyLxcpO4rC9yx86DVLbA57ya7EH+P+9Ij/Ehh0NM8yaVO3xCQE
0/aP0MwPHxfAbkm8ybB8MEVegLWzBEMGds8Yvybxm0fIRuCNZKprhVd+XpcM73Dp
LoTeRm0ho+ggzlJv4NfwPsCoYgMHLML0wLibqXbUsQgJVKvNZ06sYZVFvT6ywc+n
b1L2y/Qxibmxww/lUy61AgxXg+/h7vIme21IsWn905K3ka0IWhuOud3Mn2X51N4b
cSqXC0FWhDcXT77iPOp/aVlOKrPqtarqX3WqFdM6AiGkZ/pciis4YtXvn1q2midV
UbBB2rdORUuSZXs1yJNLuOn5UKLFCrZKSnxScZxSaEhE3U7OwCXRjfxYFbTvlovr
Xl5dQqpLirrZlhjg2gYcftl964BhmIIEj0a3QrWFfZZ4cfE9JOmQ0n5JpWfNopir
FNmLS6nsNwG8sJN0cRKloC1cB5yELHn7ZeFvGKD2ttQRCpFH1ou7Oc3cnV5Xv07m
UXBO5K3ztmyxnS0TSuNyuIN01YFN6cxH3+dU0ssMl05+UpYsf6SL7Kpz/DGK3TTT
Em0OAxk446w7bjtf56UX
=u3qn
-----END PGP SIGNATURE-----

--7HhoQoqNsng1reXT--

From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 06:21:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E0AB2135 for ; Wed, 20 Feb 2013 06:21:00 +0000 (UTC)
(envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id 4328032B for ; Wed, 20 Feb 2013 06:20:59 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id r1K6KkqH077549 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 20 Feb 2013 08:20:46 +0200 (EET) (envelope-from daniel@digsys.bg) Message-ID: <51246B3E.1030604@digsys.bg> Date: Wed, 20 Feb 2013 08:20:46 +0200 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0.12) Gecko/20130125 Thunderbird/10.0.12 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Improving ZFS performance for large directories References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com> <20130201192416.GA76461@server.rulingia.com> <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> In-Reply-To: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 06:21:00 -0000

On 19.02.13 22:10, Kevin Day wrote:
> Thinking I'd make the primary cache metadata only, and the secondary cache "all" would improve things, I wiped the device (SATA secure erase to make sure) and tried again. This was much worse, I'm guessing because there was some amount of real file data being looked at frequently, the SSD was basically getting hammered for read access with 100% utilization, and things were far slower.

This sounds weird. What kind of device is your L2ARC, what is its performance, and how is it connected? Typical SSDs today have read performance of over 500 MB/s if connected at SATA3. You could double that with two drives, etc. For L2ARC you don't really need a write-optimized SSD, because ZFS rate-limits the writes to L2ARC. It is best to connect these to the motherboard's SATA ports.

Is the SSD used only for L2ARC? If it is writing too much, that might make it slow under intensive usage, especially if it is not a write-optimised model (typically "pro" or "enterprise").

Also, you may wish to experiment with the sector size (alignment) when you add it to the pool. The ashift parameter is per-vdev in ZFS, and cache and log devices are separate vdevs. Therefore, using gnop to make it appear as a 4K or 8K sector drive might improve things. You have to experiment here...

>
> ARC Size:                               92.50%  28.25   GiB
>         Target Size: (Adaptive)         92.50%  28.25   GiB
>         Min Size (Hard Limit):          25.00%  7.64    GiB
>         Max Size (High Water):          4:1     30.54   GiB

But this looks strange. Have you increased vfs.zfs.arc_max and vfs.zfs.arc_meta_limit?

For a 72GB system, I have this in /boot/loader.conf:

vfs.zfs.arc_max=64424509440
vfs.zfs.arc_meta_limit=51539607552

I found out that increasing vfs.zfs.arc_meta_limit helped most (my issues were with huge deduped datasets with a dedup ratio of around 10 and many snapshots). Even if you intend to keep ARC small (a bad idea, as it is used to track L2ARC as well), you need to increase vfs.zfs.arc_meta_limit, perhaps up to vfs.zfs.arc_max. If you do that, then perhaps primarycache=metadata might even work better.
Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 07:20:02 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 56096B7A for ; Wed, 20 Feb 2013 07:20:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 35F6B73B for ; Wed, 20 Feb 2013 07:20:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r1K7K1vq003697 for ; Wed, 20 Feb 2013 07:20:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r1K7K12t003696; Wed, 20 Feb 2013 07:20:01 GMT (envelope-from gnats) Date: Wed, 20 Feb 2013 07:20:01 GMT Message-Id: <201302200720.r1K7K12t003696@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: "Ganael LAPLANCHE" Subject: Re: kern/112658: [smbfs] [patch] smbfs and caching problems (resolves bin/111004) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Ganael LAPLANCHE List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 07:20:02 -0000 The following reply was made to PR kern/112658; it has been noted by GNATS. From: "Ganael LAPLANCHE" To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/112658: [smbfs] [patch] smbfs and caching problems (resolves bin/111004) Date: Wed, 20 Feb 2013 07:16:27 +0000 (UTC) This is a multi-part message in MIME format. ------=OPENWEBMAIL_ATT_0.350054851341159 Content-Type: text/plain; charset=iso-8859-15 Here is an updated version of the patch. It applies to svn revision 246938. 
-- Ganael LAPLANCHE http://www.martymac.org | http://contribs.martymac.org FreeBSD: martymac , http://www.FreeBSD.org ------=OPENWEBMAIL_ATT_0.350054851341159 Content-Type: text/plain; name="patch-smbfs-svnrev-246938.txt" Content-Disposition: attachment; filename="patch-smbfs-svnrev-246938.txt" Content-Transfer-Encoding: base64 ZGlmZiAtYXVyTiBzeXMvZnMvc21iZnMub3JpZy9zbWJmc19ub2RlLmMgc3lzL2ZzL3NtYmZzL3Nt YmZzX25vZGUuYwotLS0gc3lzL2ZzL3NtYmZzLm9yaWcvc21iZnNfbm9kZS5jCTIwMTItMTItMTQg MDk6NTk6MTEuNjgyNzQxMDAwICswMTAwCisrKyBzeXMvZnMvc21iZnMvc21iZnNfbm9kZS5jCTIw MTMtMDItMjAgMDY6MDA6MzcuNjUyNzk2MzA1ICswMTAwCkBAIC02NCw3ICs2NCw3IEBACiAJcmV0 dXJuIChmbnZfMzJfYnVmKG5hbWUsIG5tbGVuLCBGTlYxXzMyX0lOSVQpKTsgCiB9CiAKLXN0YXRp YyBjaGFyICoKK2NoYXIgKgogc21iZnNfbmFtZV9hbGxvYyhjb25zdCB1X2NoYXIgKm5hbWUsIGlu dCBubWxlbikKIHsKIAl1X2NoYXIgKmNwOwpAQCAtNzYsNyArNzYsNyBAQAogCXJldHVybiBjcDsK IH0KIAotc3RhdGljIHZvaWQKK3ZvaWQKIHNtYmZzX25hbWVfZnJlZSh1X2NoYXIgKm5hbWUpCiB7 CiAKZGlmZiAtYXVyTiBzeXMvZnMvc21iZnMub3JpZy9zbWJmc19ub2RlLmggc3lzL2ZzL3NtYmZz L3NtYmZzX25vZGUuaAotLS0gc3lzL2ZzL3NtYmZzLm9yaWcvc21iZnNfbm9kZS5oCTIwMTItMTIt MTQgMDk6NTk6MTEuNjc2NzQwMDAwICswMTAwCisrKyBzeXMvZnMvc21iZnMvc21iZnNfbm9kZS5o CTIwMTMtMDItMjAgMDY6MDA6MzcuNjU4ODE5MjA3ICswMTAwCkBAIC05MSw2ICs5MSw5IEBACiAJ c3RydWN0IHNtYmZhdHRyICpmYXAsIHN0cnVjdCB2bm9kZSAqKnZwcCk7CiB1X2ludDMyX3Qgc21i ZnNfaGFzaChjb25zdCB1X2NoYXIgKm5hbWUsIGludCBubWxlbik7CiAKK2NoYXIgICpzbWJmc19u YW1lX2FsbG9jKGNvbnN0IHVfY2hhciAqbmFtZSwgaW50IG5tbGVuKTsKK3ZvaWQgICBzbWJmc19u YW1lX2ZyZWUodV9jaGFyICpuYW1lKTsKKwogaW50ICBzbWJmc19nZXRwYWdlcyhzdHJ1Y3Qgdm9w X2dldHBhZ2VzX2FyZ3MgKik7CiBpbnQgIHNtYmZzX3B1dHBhZ2VzKHN0cnVjdCB2b3BfcHV0cGFn ZXNfYXJncyAqKTsKIGludCAgc21iZnNfcmVhZHZub2RlKHN0cnVjdCB2bm9kZSAqdnAsIHN0cnVj dCB1aW8gKnVpb3AsIHN0cnVjdCB1Y3JlZCAqY3JlZCk7CmRpZmYgLWF1ck4gc3lzL2ZzL3NtYmZz Lm9yaWcvc21iZnNfc21iLmMgc3lzL2ZzL3NtYmZzL3NtYmZzX3NtYi5jCi0tLSBzeXMvZnMvc21i ZnMub3JpZy9zbWJmc19zbWIuYwkyMDEyLTEyLTE0IDA5OjU5OjExLjY4MDc0MDAwMCArMDEwMAor 
Kysgc3lzL2ZzL3NtYmZzL3NtYmZzX3NtYi5jCTIwMTMtMDItMjAgMDY6MDA6MzcuNjY3ODE0ODMz ICswMTAwCkBAIC0xNDQzLDM3ICsxNDQzLDUwIEBACiB9CiAKIGludAotc21iZnNfc21iX2xvb2t1 cChzdHJ1Y3Qgc21ibm9kZSAqZG5wLCBjb25zdCBjaGFyICpuYW1lLCBpbnQgbm1sZW4sCitzbWJm c19zbWJfbG9va3VwKHN0cnVjdCBzbWJub2RlICpkbnAsIGNoYXIgKipuYW1lcCwgaW50ICpubWxl bnAsCiAJc3RydWN0IHNtYmZhdHRyICpmYXAsIHN0cnVjdCBzbWJfY3JlZCAqc2NyZWQpCiB7CiAJ c3RydWN0IHNtYmZzX2ZjdHggKmN0eDsKIAlpbnQgZXJyb3I7CiAKLQlpZiAoZG5wID09IE5VTEwg fHwgKGRucC0+bl9pbm8gPT0gMiAmJiBuYW1lID09IE5VTEwpKSB7CisJaWYgKGRucCA9PSBOVUxM IHx8CisJCShkbnAtPm5faW5vID09IDIgJiYgKG5hbWVwID09IE5VTEwgfHwgKm5hbWVwID09IE5V TEwpKSkgewogCQliemVybyhmYXAsIHNpemVvZigqZmFwKSk7CiAJCWZhcC0+ZmFfYXR0ciA9IFNN Ql9GQV9ESVI7CiAJCWZhcC0+ZmFfaW5vID0gMjsKIAkJcmV0dXJuIDA7CiAJfQotCWlmIChubWxl biA9PSAxICYmIG5hbWVbMF0gPT0gJy4nKSB7Ci0JCWVycm9yID0gc21iZnNfc21iX2xvb2t1cChk bnAsIE5VTEwsIDAsIGZhcCwgc2NyZWQpOworCWlmIChubWxlbnAgJiYgKm5tbGVucCA9PSAxICYm IG5hbWVwICYmICgqbmFtZXApWzBdID09ICcuJykgeworCQllcnJvciA9IHNtYmZzX3NtYl9sb29r dXAoZG5wLCBOVUxMLCBOVUxMLCBmYXAsIHNjcmVkKTsKIAkJcmV0dXJuIGVycm9yOwotCX0gZWxz ZSBpZiAobm1sZW4gPT0gMiAmJiBuYW1lWzBdID09ICcuJyAmJiBuYW1lWzFdID09ICcuJykgewot CQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAoVlRPU01CKGRucC0+bl9wYXJlbnQpLCBOVUxMLCAw LCBmYXAsCi0JCSAgICBzY3JlZCk7CisJfSBlbHNlIGlmIChubWxlbnAgJiYgKm5tbGVucCA9PSAy ICYmIG5hbWVwICYmICgqbmFtZXApWzBdID09ICcuJyAmJgorCQkoKm5hbWVwKVsxXSA9PSAnLicp IHsKKwkJZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKFZUT1NNQihkbnAtPm5fcGFyZW50KSwgTlVM TCwgTlVMTCwKKwkJCWZhcCwgc2NyZWQpOwogCQlwcmludGYoIiVzOiBrbm93cyBOT1RISU5HIGFi b3V0ICcuLidcbiIsIF9fZnVuY19fKTsKIAkJcmV0dXJuIGVycm9yOwogCX0KLQllcnJvciA9IHNt YmZzX2ZpbmRvcGVuKGRucCwgbmFtZSwgbm1sZW4sCi0JICAgIFNNQl9GQV9TWVNURU0gfCBTTUJf RkFfSElEREVOIHwgU01CX0ZBX0RJUiwgc2NyZWQsICZjdHgpOworCWVycm9yID0gc21iZnNfZmlu ZG9wZW4oZG5wLCBuYW1lcCA/ICpuYW1lcCA6IE5VTEwsIG5tbGVucCA/ICpubWxlbnAgOiAwLAor CQlTTUJfRkFfU1lTVEVNIHwgU01CX0ZBX0hJRERFTiB8IFNNQl9GQV9ESVIsIHNjcmVkLCAmY3R4 
KTsKIAlpZiAoZXJyb3IpCiAJCXJldHVybiBlcnJvcjsKIAljdHgtPmZfZmxhZ3MgfD0gU01CRlNf UkREX0ZJTkRTSU5HTEU7CiAJZXJyb3IgPSBzbWJmc19maW5kbmV4dChjdHgsIDEsIHNjcmVkKTsK IAlpZiAoZXJyb3IgPT0gMCkgewogCQkqZmFwID0gY3R4LT5mX2F0dHI7Ci0JCWlmIChuYW1lID09 IE5VTEwpCisJCWlmIChuYW1lcCA9PSBOVUxMIHx8ICpuYW1lcCA9PSBOVUxMKQogCQkJZmFwLT5m YV9pbm8gPSBkbnAtPm5faW5vOworCQlpZiAobmFtZXAgJiYgKm5hbWVwICYmIG5tbGVucCAmJiAq bm1sZW5wKSB7CisJCQkvKiBSZXR1cm4gdGhlICpyZWFsKiBuYW1lIGFuZCBsZW5ndGggb2YgdGhl IGZpbGUgCisJCQkgKiBmb3VuZCBvbiB0aGUgc2VydmVyIGlmIG5lY2Vzc2FyeS4gSWYgYSBuZXcg YWxsb2NhdGlvbgorCQkJICogaXMgZG9uZSBoZXJlLCBtZW1vcnkgd2lsbCBiZSBmcmVlZCBsYXRl ciAqLworCQkJaWYoKGN0eC0+Zl9ubWxlbiAhPSAqbm1sZW5wKSB8fAorCQkJCShiY21wKGN0eC0+ Zl9uYW1lLCAqbmFtZXAsICpubWxlbnApICE9IDApKSB7CisJCQkJU01CVkRFQlVHKCJsb29rdXBl ZCBmaWxlbmFtZSBhbmQgc2VydmVyJ3MgZmlsZW5hbWUgZGlmZmVyXG4iKTsKKwkJCQkqbmFtZXAg PSBzbWJmc19uYW1lX2FsbG9jKCh1X2NoYXIgKikoY3R4LT5mX25hbWUpLCBjdHgtPmZfbm1sZW4p OworCQkJCSpubWxlbnAgPSBjdHgtPmZfbm1sZW47CisJCQl9CisJCX0KIAl9CiAJc21iZnNfZmlu ZGNsb3NlKGN0eCwgc2NyZWQpOwogCXJldHVybiBlcnJvcjsKZGlmZiAtYXVyTiBzeXMvZnMvc21i ZnMub3JpZy9zbWJmc19zdWJyLmggc3lzL2ZzL3NtYmZzL3NtYmZzX3N1YnIuaAotLS0gc3lzL2Zz L3NtYmZzLm9yaWcvc21iZnNfc3Vici5oCTIwMTItMTItMTQgMDk6NTk6MTEuNjc4NzQyMDAwICsw MTAwCisrKyBzeXMvZnMvc21iZnMvc21iZnNfc3Vici5oCTIwMTMtMDItMjAgMDY6MDA6MzcuNjcz Nzk5MDQ0ICswMTAwCkBAIC0xNjYsNyArMTY2LDcgQEAKIGludCAgc21iZnNfZmluZGNsb3NlKHN0 cnVjdCBzbWJmc19mY3R4ICpjdHgsIHN0cnVjdCBzbWJfY3JlZCAqc2NyZWQpOwogaW50ICBzbWJm c19mdWxscGF0aChzdHJ1Y3QgbWJjaGFpbiAqbWJwLCBzdHJ1Y3Qgc21iX3ZjICp2Y3AsCiAJc3Ry dWN0IHNtYm5vZGUgKmRucCwgY29uc3QgY2hhciAqbmFtZSwgaW50IG5tbGVuKTsKLWludCAgc21i ZnNfc21iX2xvb2t1cChzdHJ1Y3Qgc21ibm9kZSAqZG5wLCBjb25zdCBjaGFyICpuYW1lLCBpbnQg bm1sZW4sCitpbnQgIHNtYmZzX3NtYl9sb29rdXAoc3RydWN0IHNtYm5vZGUgKmRucCwgY2hhciAq Km5hbWVwLCBpbnQgKm5tbGVucCwKIAlzdHJ1Y3Qgc21iZmF0dHIgKmZhcCwgc3RydWN0IHNtYl9j cmVkICpzY3JlZCk7CiAKIGludCAgc21iZnNfZm5hbWVfdG9sb2NhbChzdHJ1Y3Qgc21iX3ZjICp2 
Y3AsIGNoYXIgKm5hbWUsIGludCAqbm1sZW4sIGludCBjYXNlb3B0KTsKZGlmZiAtYXVyTiBzeXMv ZnMvc21iZnMub3JpZy9zbWJmc192ZnNvcHMuYyBzeXMvZnMvc21iZnMvc21iZnNfdmZzb3BzLmMK LS0tIHN5cy9mcy9zbWJmcy5vcmlnL3NtYmZzX3Zmc29wcy5jCTIwMTItMTItMTQgMDk6NTk6MTEu Njc5NzQxMDAwICswMTAwCisrKyBzeXMvZnMvc21iZnMvc21iZnNfdmZzb3BzLmMJMjAxMy0wMi0y MCAwNjowMDozNy42Nzk4MjE5NDYgKzAxMDAKQEAgLTMxNiw3ICszMTYsNyBAQAogCX0KIAlzY3Jl ZCA9IHNtYmZzX21hbGxvY19zY3JlZCgpOwogCXNtYl9tYWtlc2NyZWQoc2NyZWQsIHRkLCBjcmVk KTsKLQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAoTlVMTCwgTlVMTCwgMCwgJmZhdHRyLCBzY3Jl ZCk7CisJZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKE5VTEwsIE5VTEwsIE5VTEwsICZmYXR0ciwg c2NyZWQpOwogCWlmIChlcnJvcikKIAkJZ290byBvdXQ7CiAJZXJyb3IgPSBzbWJmc19uZ2V0KG1w LCBOVUxMLCBOVUxMLCAwLCAmZmF0dHIsICZ2cCk7CmRpZmYgLWF1ck4gc3lzL2ZzL3NtYmZzLm9y aWcvc21iZnNfdm5vcHMuYyBzeXMvZnMvc21iZnMvc21iZnNfdm5vcHMuYwotLS0gc3lzL2ZzL3Nt YmZzLm9yaWcvc21iZnNfdm5vcHMuYwkyMDEyLTEyLTE0IDA5OjU5OjExLjY4Mzc0MDAwMCArMDEw MAorKysgc3lzL2ZzL3NtYmZzL3NtYmZzX3Zub3BzLmMJMjAxMy0wMi0yMCAwNjowOTo1Mi44MTM4 OTE3MDkgKzAxMDAKQEAgLTI3MCw3ICsyNzAsNyBAQAogCXNjcmVkID0gc21iZnNfbWFsbG9jX3Nj cmVkKCk7CiAJc21iX21ha2VzY3JlZChzY3JlZCwgY3VydGhyZWFkLCBhcC0+YV9jcmVkKTsKIAlv bGRzaXplID0gbnAtPm5fc2l6ZTsKLQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAobnAsIE5VTEws IDAsICZmYXR0ciwgc2NyZWQpOworCWVycm9yID0gc21iZnNfc21iX2xvb2t1cChucCwgTlVMTCwg TlVMTCwgJmZhdHRyLCBzY3JlZCk7CiAJaWYgKGVycm9yKSB7CiAJCVNNQlZERUJVRygiZXJyb3Ig JWRcbiIsIGVycm9yKTsKIAkJc21iZnNfZnJlZV9zY3JlZChzY3JlZCk7CkBAIC01MTQsNyArNTE0 LDcgQEAKIAllcnJvciA9IHNtYmZzX3NtYl9jcmVhdGUoZG5wLCBuYW1lLCBubWxlbiwgc2NyZWQp OwogCWlmIChlcnJvcikKIAkJZ290byBvdXQ7Ci0JZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKGRu cCwgbmFtZSwgbm1sZW4sICZmYXR0ciwgc2NyZWQpOworCWVycm9yID0gc21iZnNfc21iX2xvb2t1 cChkbnAsICZuYW1lLCAmbm1sZW4sICZmYXR0ciwgc2NyZWQpOwogCWlmIChlcnJvcikKIAkJZ290 byBvdXQ7CiAJZXJyb3IgPSBzbWJmc19uZ2V0KFZUT1ZGUyhkdnApLCBkdnAsIG5hbWUsIG5tbGVu LCAmZmF0dHIsICZ2cCk7CkBAIC01MjQsNiArNTI0LDggQEAKIAlpZiAoY25wLT5jbl9mbGFncyAm 
IE1BS0VFTlRSWSkKIAkJY2FjaGVfZW50ZXIoZHZwLCB2cCwgY25wKTsKIG91dDoKKwlpZiAobmFt ZSAhPSBjbnAtPmNuX25hbWVwdHIpCisJCXNtYmZzX25hbWVfZnJlZSgodV9jaGFyICopbmFtZSk7 CiAJc21iZnNfZnJlZV9zY3JlZChzY3JlZCk7CiAJcmV0dXJuIGVycm9yOwogfQpAQCAtNzIxLDE2 ICs3MjMsMTkgQEAKIAllcnJvciA9IHNtYmZzX3NtYl9ta2RpcihkbnAsIG5hbWUsIGxlbiwgc2Ny ZWQpOwogCWlmIChlcnJvcikKIAkJZ290byBvdXQ7Ci0JZXJyb3IgPSBzbWJmc19zbWJfbG9va3Vw KGRucCwgbmFtZSwgbGVuLCAmZmF0dHIsIHNjcmVkKTsKKwllcnJvciA9IHNtYmZzX3NtYl9sb29r dXAoZG5wLCAmbmFtZSwgJmxlbiwgJmZhdHRyLCBzY3JlZCk7CiAJaWYgKGVycm9yKQogCQlnb3Rv IG91dDsKIAllcnJvciA9IHNtYmZzX25nZXQoVlRPVkZTKGR2cCksIGR2cCwgbmFtZSwgbGVuLCAm ZmF0dHIsICZ2cCk7CiAJaWYgKGVycm9yKQogCQlnb3RvIG91dDsKIAkqYXAtPmFfdnBwID0gdnA7 CisJZXJyb3IgPSAwOwogb3V0OgorCWlmIChuYW1lICE9IGNucC0+Y25fbmFtZXB0cikKKwkJc21i ZnNfbmFtZV9mcmVlKCh1X2NoYXIgKiluYW1lKTsKIAlzbWJmc19mcmVlX3NjcmVkKHNjcmVkKTsK LQlyZXR1cm4gMDsKKwlyZXR1cm4gZXJyb3I7CiB9CiAKIC8qCkBAIC0xMTUwLDcgKzExNTUsNyBA QAogCQlyZXR1cm4gRU5PRU5UOwogCiAJZXJyb3IgPSBjYWNoZV9sb29rdXAoZHZwLCB2cHAsIGNu cCwgTlVMTCwgTlVMTCk7Ci0JU01CVkRFQlVHKCJjYWNoZV9sb29rdXAgcmV0dXJuZWQgJWRcbiIs IGVycm9yKTsKKwlTTUJWREVCVUcoImNhY2hlX2xvb2t1cCBmb3IgJyVzJyByZXR1cm5lZCAlZFxu IiwgY25wLT5jbl9uYW1lcHRyLCBlcnJvcik7CiAJaWYgKGVycm9yID4gMCkKIAkJcmV0dXJuIGVy cm9yOwogCWlmIChlcnJvcikgewkJLyogbmFtZSB3YXMgZm91bmQgKi8KQEAgLTExOTUsNyArMTIw MCw4IEBACiAJCSp2cHAgPSBOVUxMVlA7CiAJfQogCS8qIAotCSAqIGVudHJ5IGlzIG5vdCBpbiB0 aGUgY2FjaGUgb3IgaGFzIGJlZW4gZXhwaXJlZAorCSAqIGVudHJ5IGlzIG5vdCBpbiB0aGUgY2Fj aGUsIGhhcyBiZWVuIGV4cGlyZWQKKwkgKiBvciBlbnRyeSBpbiB0aGUgY2FjaGUgZGlkIG5vdCBt YXRjaCBpbnB1dCBmaWxlbmFtZSdzIGNhc2UKIAkgKi8KIAllcnJvciA9IDA7CiAJKnZwcCA9IE5V TExWUDsKQEAgLTEyMDMsMTggKzEyMDksMTkgQEAKIAlzbWJfbWFrZXNjcmVkKHNjcmVkLCB0ZCwg Y25wLT5jbl9jcmVkKTsKIAlmYXAgPSAmZmF0dHI7CiAJaWYgKGZsYWdzICYgSVNET1RET1QpIHsK LQkJZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKFZUT1NNQihkbnAtPm5fcGFyZW50KSwgTlVMTCwg MCwgZmFwLAorCQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAoVlRPU01CKGRucC0+bl9wYXJlbnQp 
LCBOVUxMLCBOVUxMLCBmYXAsCiAJCSAgICBzY3JlZCk7Ci0JCVNNQlZERUJVRygicmVzdWx0IG9m IGRvdGRvdCBsb29rdXA6ICVkXG4iLCBlcnJvcik7CisJCVNNQlZERUJVRygicmVzdWx0IG9mIGRv dGRvdCBzbWJmc19zbWJfbG9va3VwOiAlZFxuIiwgZXJyb3IpOwogCX0gZWxzZSB7Ci0JCWZhcCA9 ICZmYXR0cjsKLQkJZXJyb3IgPSBzbWJmc19zbWJfbG9va3VwKGRucCwgbmFtZSwgbm1sZW4sIGZh cCwgc2NyZWQpOworCQllcnJvciA9IHNtYmZzX3NtYl9sb29rdXAoZG5wLCAmbmFtZSwgJm5tbGVu LCBmYXAsIHNjcmVkKTsKIC8qCQlpZiAoY25wLT5jbl9uYW1lbGVuID09IDEgJiYgY25wLT5jbl9u YW1lcHRyWzBdID09ICcuJykqLwogCQlTTUJWREVCVUcoInJlc3VsdCBvZiBzbWJmc19zbWJfbG9v a3VwOiAlZFxuIiwgZXJyb3IpOwogCX0KIAlpZiAoZXJyb3IgJiYgZXJyb3IgIT0gRU5PRU5UKQog CQlnb3RvIG91dDsKIAlpZiAoZXJyb3IpIHsJCQkvKiBlbnRyeSBub3QgZm91bmQgKi8KKwkJU01C VkRFQlVHKCJlbnRyeSBub3QgZm91bmQgb24gc2VydmVyXG4iKTsKKwogCQkvKgogCQkgKiBIYW5k bGUgUkVOQU1FIG9yIENSRUFURSBjYXNlLi4uCiAJCSAqLwpAQCAtMTIyOCw5ICsxMjM1LDExIEBA CiAJCX0KIAkJZXJyb3IgPSBFTk9FTlQ7CiAJCWdvdG8gb3V0OwotCX0vKiBlbHNlIHsKLQkJU01C VkRFQlVHKCJGb3VuZCBlbnRyeSAlcyB3aXRoIGlkPSVkXG4iLCBmYXAtPmVudHJ5TmFtZSwgZmFw LT5kaXJFbnROdW0pOwotCX0qLworCX0KKworCS8qIGVudHJ5IGZvdW5kICovCisJU01CVkRFQlVH KCJlbnRyeSBmb3VuZCBvbiBzZXJ2ZXI6ICclcydcbiIsIG5hbWUpOworCiAJLyoKIAkgKiBoYW5k bGUgREVMRVRFIGNhc2UgLi4uCiAJICovCkBAIC0xMjUxLDYgKzEyNjAsMTMgQEAKIAkJZ290byBv dXQ7CiAJfQogCWlmIChuYW1laW9wID09IFJFTkFNRSAmJiBpc2xhc3RjbikgeworCQlpZiAobmFt ZSAhPSBjbnAtPmNuX25hbWVwdHIpIHsKKwkJCS8qIFRhcmdldCBoYXMgYmVlbiBmb3VuZCBvbiB0 aGUgc2VydmVyLiBKdXN0IHJldHVybiBoZXJlCisJCQkqIHRvIGF2b2lyIGZhbGxpbmcgdG8gdGhl IHNvdXJjZSB2bm9kZSwgd2hpY2ggd291bGQgbGVhZAorCQkJKiB0byBOT1QgY2FsbCB0aGUgcmVu YW1lIHN5c2NhbGwgKi8KKwkJCWVycm9yID0gRUpVU1RSRVRVUk47CisJCQlnb3RvIG91dDsKKwkJ fQogCQllcnJvciA9IFZPUF9BQ0NFU1MoZHZwLCBWV1JJVEUsIGNucC0+Y25fY3JlZCwgdGQpOwog CQlpZiAoZXJyb3IpCiAJCQlnb3RvIG91dDsKQEAgLTEyNzQsMTEgKzEyOTAsMTQgQEAKIAkJCWVy cm9yID0gdmZzX2J1c3kobXAsIDApOwogCQkJdm5fbG9jayhkdnAsIExLX0VYQ0xVU0lWRSB8IExL X1JFVFJZKTsKIAkJCXZmc19yZWwobXApOwotCQkJaWYgKGVycm9yKQotCQkJCXJldHVybiAoRU5P 
RU5UKTsKKwkJCWlmIChlcnJvcikgeworCQkJCWVycm9yID0gRU5PRU5UOworCQkJCWdvdG8gb3V0 OworCQkJfQogCQkJaWYgKChkdnAtPnZfaWZsYWcgJiBWSV9ET09NRUQpICE9IDApIHsKIAkJCQl2 ZnNfdW5idXN5KG1wKTsKLQkJCQlyZXR1cm4gKEVOT0VOVCk7CQorCQkJCWVycm9yID0gRU5PRU5U OworCQkJCWdvdG8gb3V0OwogCQkJfQogCQl9CQogCQlWT1BfVU5MT0NLKGR2cCwgMCk7CkBAIC0x MzA4LDYgKzEzMjcsOCBAQAogCQljYWNoZV9lbnRlcihkdnAsICp2cHAsIGNucCk7CiAJfQogb3V0 OgorCWlmIChuYW1lICE9IGNucC0+Y25fbmFtZXB0cikKKwkJc21iZnNfbmFtZV9mcmVlKCh1X2No YXIgKiluYW1lKTsKIAlzbWJmc19mcmVlX3NjcmVkKHNjcmVkKTsKIAlyZXR1cm4gKGVycm9yKTsK IH0K ------=OPENWEBMAIL_ATT_0.350054851341159-- From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 08:28:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 91B6C342 for ; Wed, 20 Feb 2013 08:28:47 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 20AC9AC6 for ; Wed, 20 Feb 2013 08:28:46 +0000 (UTC) Received: from server.rulingia.com (c220-239-237-213.belrs5.nsw.optusnet.com.au [220.239.237.213]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id r1K8SX8C070885 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 20 Feb 2013 19:28:33 +1100 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id r1K8SSHv003583 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 20 Feb 2013 19:28:28 +1100 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id r1K8SSBc003581; Wed, 20 Feb 2013 19:28:28 +1100 (EST) (envelope-from peter) Date: Wed, 20 Feb 2013 19:28:28 +1100 From: Peter Jeremy To: Kevin Day Subject: Re: Improving ZFS performance for large directories 
Message-ID: <20130220082828.GA44920@server.rulingia.com> References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com> <20130201192416.GA76461@server.rulingia.com> <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="1yeeQ81UyVL57Vl7" Content-Disposition: inline In-Reply-To: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 08:28:47 -0000 --1yeeQ81UyVL57Vl7 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On 2013-Feb-19 14:10:47 -0600, Kevin Day wrote:
>Timing doing an "ls" in large directories 20 times, the first is the
>slowest, then all subsequent listings are roughly the same.

OK. My testing was on large files rather than large amounts of metadata.

>Thinking I'd make the primary cache metadata only, and the secondary
>cache "all" would improve things,

This won't work as expected. L2ARC only caches data coming out of ARC
so by setting ARC to cache metadata only, there's never any "data" in
ARC and hence never any evicted from ARC to L2ARC.

> I wiped the device (SATA secure erase to make sure)

That's not necessary. L2ARC doesn't survive reboots because all the
L2ARC "metadata" is in ARC only. This does mean that it takes quite
a while for L2ARC to warm up following a reboot.

>Before adding the SSD, an "ls" in a directory with 65k files would
>take 10-30 seconds, it's now down to about 0.2 seconds.

That sounds quite good.

> There are roughly 29M files, growing at about 50k files/day.
We >recently upgraded, and are now at 96 3TB drives in the pool.

That number of files isn't really excessive but it sounds like your
workload has very low locality. At this stage, my suggestions are:
1) Disable atime if you don't need it & haven't already.
   Otherwise file accesses are triggering metadata updates.
2) Increase vfs.zfs.arc_meta_limit
   You're still getting more metadata misses than data misses
3) Increase your ARC size (more RAM)
   Your pool is quite large compared to your RAM.

>It's a 250G drive, and only 22G is being used, and there's still a
>~66% miss rate.

That's 66% of the requests that missed in ARC.

> Is there any way to tell why more metadata isn't
>being pushed to the L2ARC?

ZFS treats writing to L2ARC very much as an afterthought. L2ARC writes
are rate limited by vfs.zfs.l2arc_write_{boost,max} and will be aborted
if they might interfere with a read. I'm not sure how to improve it.

Since this is all generic ZFS, you might like to try asking on
zfs@lists.illumos.org as well. Some of the experts there might have
some ideas.
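To make the remark about the 66% figure concrete: the L2ARC miss rate is counted only over requests that already missed in ARC, so the share of all requests that end up on the pool disks is the product of the two miss ratios. The following is a minimal C sketch of that arithmetic; the function name `disk_fraction` is made up for illustration, and no ARC miss ratio was given in this thread, so any value passed for it is an assumption.

```c
#include <assert.h>

/*
 * Fraction of all read requests that miss both caches.
 * arc_miss_ratio:   misses in ARC / all requests          (0.0 .. 1.0)
 * l2arc_miss_ratio: misses in L2ARC / requests that
 *                   missed in ARC                         (0.0 .. 1.0)
 * Both parameters are hypothetical inputs, not kstat names.
 */
static double
disk_fraction(double arc_miss_ratio, double l2arc_miss_ratio)
{
    assert(arc_miss_ratio >= 0.0 && arc_miss_ratio <= 1.0);
    assert(l2arc_miss_ratio >= 0.0 && l2arc_miss_ratio <= 1.0);

    /* Only ARC misses are ever looked up in L2ARC. */
    return arc_miss_ratio * l2arc_miss_ratio;
}
```

With a purely hypothetical ARC miss ratio of 0.5, disk_fraction(0.5, 0.66) comes to roughly a third of all requests going to disk, which is why a 66% L2ARC miss rate on its own says little about overall cache effectiveness.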
--=20 Peter Jeremy --1yeeQ81UyVL57Vl7 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlEkiSwACgkQ/opHv/APuIdsKQCgq90SUs/wm9rYE5moVPpIXBHu PCcAn38hMTi+YFknk64N3ro4mR/dSKsk =Sl9j -----END PGP SIGNATURE----- --1yeeQ81UyVL57Vl7-- From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 10:59:00 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A09CDC1D for ; Wed, 20 Feb 2013 10:59:00 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E7CA16FB for ; Wed, 20 Feb 2013 10:58:59 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA24598; Wed, 20 Feb 2013 12:58:49 +0200 (EET) (envelope-from avg@FreeBSD.org) Message-ID: <5124AC69.6010709@FreeBSD.org> Date: Wed, 20 Feb 2013 12:58:49 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130206 Thunderbird/17.0.2 MIME-Version: 1.0 To: Kevin Day Subject: Re: Improving ZFS performance for large directories References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com> <20130201192416.GA76461@server.rulingia.com> <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> In-Reply-To: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> X-Enigmail-Version: 1.4.6 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 10:59:00 -0000 on 19/02/2013 22:10 Kevin Day said the following: > Timing doing an "ls" in large directories 20 times, the first is the slowest, then all 
subsequent listings are roughly the same. There doesn't appear to be any gain after 20 repetitions

I think that the above could be related to the below

> vfs.zfs.arc_meta_limit 16398159872
> vfs.zfs.arc_meta_used 16398120264

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 11:16:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2F26C597 for ; Wed, 20 Feb 2013 11:16:59 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from proxypop04.sare.net (proxypop04.sare.net [194.30.0.65]) by mx1.freebsd.org (Postfix) with ESMTP id EB058844 for ; Wed, 20 Feb 2013 11:16:58 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop04.sare.net (Postfix) with ESMTPSA id E74859DF0E2 for ; Wed, 20 Feb 2013 12:16:56 +0100 (CET) From: Borja Marcos Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Subject: ZFS, lies and statistics Date: Wed, 20 Feb 2013 12:16:53 +0100 Message-Id: To: FreeBSD Filesystems Mime-Version: 1.0 (Apple Message framework v1085) X-Mailer: Apple Mail (2.1085) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 11:16:59 -0000

Hi :)

Still working on polishing devilator to graph some meaningful ZFS statistics.

I have been peeking at the ZFS statistics.
In RELENG-9.2 it's a bit confusing, as there are two different sets of statistics.

Some are under vfs.zfs (I think the following ones are relevant):

vfs.zfs.l2c_only_size: 36603866112
vfs.zfs.mfu_ghost_data_lsize: 730375168
vfs.zfs.mfu_ghost_metadata_lsize: 1075240960
vfs.zfs.mfu_ghost_size: 1805616128
vfs.zfs.mfu_data_lsize: 1785933312
vfs.zfs.mfu_metadata_lsize: 373835264
vfs.zfs.mfu_size: 2198305792
vfs.zfs.mru_ghost_data_lsize: 550150656
vfs.zfs.mru_ghost_metadata_lsize: 1859320320
vfs.zfs.mru_ghost_size: 2409470976
vfs.zfs.mru_data_lsize: 503643648
vfs.zfs.mru_metadata_lsize: 105602560
vfs.zfs.mru_size: 800031744
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 673792
vfs.zfs.arc_meta_used: 2087455400

and of course the new kstat tree.

But there is a discrepancy between the information provided by both, or I am missing something.

Having a look at these samples (this is updated from my test system, so anyone interested can have a look at the progress), I have tried to obtain a good graph of the ARC breakdown using two different approaches. One is borrowed from the Solaris version of arcstats.pl, which makes this calculation:

    mru_size = ARCSTATS_P;
    if ( ARCSTATS_SIZE > ARCSTATS_C )
        mfu_size = ARCSTATS_SIZE - mru_size;
    else
        mfu_size = ARCSTATS_C - mru_size;
    add_output_u64("zfs_mfu_size", mfu_size);
    add_output_u64("zfs_mru_size", mru_size);

On the other hand, I'm using the sizes directly provided by vfs.zfs, which turn out to be different.

Which one would be the best?

Thanks,

Borja.
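For reference, the arcstats.pl-style calculation quoted in the message above can be written as a small self-contained C helper. This is only a sketch: the names `arc_breakdown` and `arc_split` are invented for illustration, and the three arguments stand for the kstat values the snippet calls ARCSTATS_P, ARCSTATS_C and ARCSTATS_SIZE. Note that in the ARC code `p` is the target size for the MRU list rather than a measured list size, which is one plausible reason this estimate and the vfs.zfs.*_size values disagree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two derived values. */
struct arc_breakdown {
    uint64_t mru_size;
    uint64_t mfu_size;
};

/*
 * Estimate the MRU/MFU split the way the quoted snippet does:
 * MRU is taken to be p, and MFU is whatever remains of the
 * larger of size and c.
 */
static struct arc_breakdown
arc_split(uint64_t p, uint64_t c, uint64_t size)
{
    struct arc_breakdown b;
    uint64_t total = (size > c) ? size : c;

    /* Guard against unsigned wrap-around if p exceeds the total. */
    assert(p <= total);

    b.mru_size = p;
    b.mfu_size = total - p;
    return b;
}
```

Because both outputs are derived from the same three counters, they always sum to max(size, c); the per-list sysctl values carry no such guarantee.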
From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 11:21:30 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4E5A5E3F for ; Wed, 20 Feb 2013 11:21:30 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from proxypop04.sare.net (proxypop04.sare.net [194.30.0.65]) by mx1.freebsd.org (Postfix) with ESMTP id 143F88EB for ; Wed, 20 Feb 2013 11:21:29 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop04.sare.net (Postfix) with ESMTPSA id AC4B89DF051; Wed, 20 Feb 2013 12:21:28 +0100 (CET) Subject: Re: ZFS, lies and statistics Mime-Version: 1.0 (Apple Message framework v1085) Content-Type: text/plain; charset=us-ascii From: Borja Marcos In-Reply-To: Date: Wed, 20 Feb 2013 12:21:27 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <8198B045-C5FD-48EA-B17C-52F2386FD727@sarenet.es> References: To: Borja Marcos X-Mailer: Apple Mail (2.1085) Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 11:21:30 -0000

On Feb 20, 2013, at 12:16 PM, Borja Marcos wrote:

> Still working on polishing devilator to graph some meaningful ZFS statistics.

Sorry, I forgot to include the URL.

http://devilator.frobula.com/

Borja.
From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 14:53:45 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C229D86C for ; Wed, 20 Feb 2013 14:53:45 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from m2.gritton.org (gritton.org [199.192.164.235]) by mx1.freebsd.org (Postfix) with ESMTP id 751BB92E for ; Wed, 20 Feb 2013 14:53:44 +0000 (UTC) Received: from guppy.corp.verio.net (fw.oremut02.us.wh.verio.net [198.65.168.24]) (authenticated bits=0) by m2.gritton.org (8.14.5/8.14.5) with ESMTP id r1KErhdq029382; Wed, 20 Feb 2013 07:53:43 -0700 (MST) (envelope-from jamie@FreeBSD.org) Message-ID: <5124E372.1000009@FreeBSD.org> Date: Wed, 20 Feb 2013 07:53:38 -0700 From: Jamie Gritton User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120126 Thunderbird/9.0 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: mount/kldload race References: <51244A13.8030907@FreeBSD.org> <20130220054309.GD2598@kib.kiev.ua> In-Reply-To: <20130220054309.GD2598@kib.kiev.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 14:53:45 -0000 On 02/19/13 22:43, Konstantin Belousov wrote: > On Tue, Feb 19, 2013 at 08:59:15PM -0700, Jamie Gritton wrote: >> Perhaps most people don't try to mount a bunch of filesystems at the >> same time, at least not those that depend on kernel modules. But it >> turns out that's going to be a pretty common situation with jails and >> nullfs. And I found that when attempting such a feat will cause most of >> these simultaneous mounts to fail with ENODEV. >> >> It turns out that the problem is a race in vfs_byname_kld(). 
First it'll >> see if the fstype is loaded, and if it isn't then it will load the >> module. But if the module is loaded by a different process between those >> two points, the resulting EEXIST from kern_kldload() will make >> vfs_byname_kld() error out. >> >> The fix is pretty simple: don't treat EEXIST as an error. By going on, >> and rechecking for the fstype, the filesystem can be mounted while still >> allowing any "real" error to be caught. I'm including a small patch that >> will accomplish this, and I'd appreciate a quick look by anyone who's >> familiar with this part of things before I commit it. >> >> - Jamie > >> Index: sys/kern/vfs_init.c >> =================================================================== >> --- sys/kern/vfs_init.c (revision 247000) >> +++ sys/kern/vfs_init.c (working copy) >> @@ -130,13 +130,18 @@ >> >> /* Try to load the respective module. */ >> *error = kern_kldload(td, fstype,&fileid); >> + if (*error == EEXIST) { >> + *error = 0; >> + fileid = 0; > Why do you clear fileid ? Is this to prevent an attempt to kldunload() > the module which was not loaded by the current thread ? > > If yes, I would suggest to use the separate flag to track this, > which is cleared on EEXIST error. IMHO it is cleaner and less puzzling. Yes, that's why. As a side note, I clear *error ostensibly for the sake of the callers, but it turns out none of the callers actually look at the returned error. Here's a new patch with an added flag: Index: sys/kern/vfs_init.c =================================================================== --- sys/kern/vfs_init.c (revision 247000) +++ sys/kern/vfs_init.c (working copy) @@ -122,7 +122,7 @@ vfs_byname_kld(const char *fstype, struct thread *td, int *error) { struct vfsconf *vfsp; - int fileid; + int fileid, loaded; vfsp = vfs_byname(fstype); if (vfsp != NULL) @@ -130,13 +130,17 @@ /* Try to load the respective module. 
*/ *error = kern_kldload(td, fstype, &fileid); + loaded = (*error == 0); + if (*error == EEXIST) + *error = 0; if (*error) return (NULL); /* Look up again to see if the VFS was loaded. */ vfsp = vfs_byname(fstype); if (vfsp == NULL) { - (void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE); + if (loaded) + (void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE); *error = ENODEV; return (NULL); } From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 15:37:14 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 04CFEB41; Wed, 20 Feb 2013 15:37:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 57222C29; Wed, 20 Feb 2013 15:37:13 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1KFb5Uo069015; Wed, 20 Feb 2013 17:37:05 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1KFb5Uo069015 Received: (from kostik@localhost) by tom.home (8.14.6/8.14.6/Submit) id r1KFb5ZH069014; Wed, 20 Feb 2013 17:37:05 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 20 Feb 2013 17:37:05 +0200 From: Konstantin Belousov To: Jamie Gritton Subject: Re: mount/kldload race Message-ID: <20130220153705.GE2598@kib.kiev.ua> References: <51244A13.8030907@FreeBSD.org> <20130220054309.GD2598@kib.kiev.ua> <5124E372.1000009@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="UOi+gfmBpEZPw9cU" Content-Disposition: inline In-Reply-To: <5124E372.1000009@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, 
DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 15:37:14 -0000 --UOi+gfmBpEZPw9cU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Wed, Feb 20, 2013 at 07:53:38AM -0700, Jamie Gritton wrote:
> On 02/19/13 22:43, Konstantin Belousov wrote:
> > On Tue, Feb 19, 2013 at 08:59:15PM -0700, Jamie Gritton wrote:
> >> Perhaps most people don't try to mount a bunch of filesystems at the
> >> same time, at least not those that depend on kernel modules. But it
> >> turns out that's going to be a pretty common situation with jails and
> >> nullfs. And I found that when attempting such a feat will cause most of
> >> these simultaneous mounts to fail with ENODEV.
> >>
> >> It turns out that the problem is a race in vfs_byname_kld(). First it'll
> >> see if the fstype is loaded, and if it isn't then it will load the
> >> module. But if the module is loaded by a different process between those
> >> two points, the resulting EEXIST from kern_kldload() will make
> >> vfs_byname_kld() error out.
> >>
> >> The fix is pretty simple: don't treat EEXIST as an error. By going on,
> >> and rechecking for the fstype, the filesystem can be mounted while still
> >> allowing any "real" error to be caught. I'm including a small patch that
> >> will accomplish this, and I'd appreciate a quick look by anyone who's
> >> familiar with this part of things before I commit it.
> >>
> >> - Jamie
> >
> >> Index: sys/kern/vfs_init.c
> >> ===================================================================
> >> --- sys/kern/vfs_init.c (revision 247000)
> >> +++ sys/kern/vfs_init.c (working copy)
> >> @@ -130,13 +130,18 @@
> >>
> >>          /* Try to load the respective module. */
> >>          *error = kern_kldload(td, fstype,&fileid);
> >> +        if (*error == EEXIST) {
> >> +                *error = 0;
> >> +                fileid = 0;
> > Why do you clear fileid ? Is this to prevent an attempt to kldunload()
> > the module which was not loaded by the current thread ?
> >
> > If yes, I would suggest to use the separate flag to track this,
> > which is cleared on EEXIST error. IMHO it is cleaner and less puzzling.
>
> Yes, that's why. As a side note, I clear *error ostensibly for the sake
> of the callers, but it turns out none of the callers actually look at
> the returned error.
>
> Here's a new patch with an added flag:

I have no further comments, looks good.

>
>
> Index: sys/kern/vfs_init.c
> ===================================================================
> --- sys/kern/vfs_init.c (revision 247000)
> +++ sys/kern/vfs_init.c (working copy)
> @@ -122,7 +122,7 @@
>  vfs_byname_kld(const char *fstype, struct thread *td, int *error)
>  {
>          struct vfsconf *vfsp;
> -        int fileid;
> +        int fileid, loaded;
>
>          vfsp = vfs_byname(fstype);
>          if (vfsp != NULL)
> @@ -130,13 +130,17 @@
>
>          /* Try to load the respective module. */
>          *error = kern_kldload(td, fstype, &fileid);
> +        loaded = (*error == 0);
> +        if (*error == EEXIST)
> +                *error = 0;
>          if (*error)
>                  return (NULL);
>
>          /* Look up again to see if the VFS was loaded.
*/ > vfsp =3D vfs_byname(fstype); > if (vfsp =3D=3D NULL) { > - (void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE); > + if (loaded) > + (void)kern_kldunload(td, fileid, LINKER_UNLOAD_FORCE); > *error =3D ENODEV; > return (NULL); > } --UOi+gfmBpEZPw9cU Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJRJO2gAAoJEJDCuSvBvK1Bj44P/3XSMKbo4w/4W3Gf7MnCcn01 NOkmXv36TthOxkXUEOWldzKUQzYuvG4lbQ8m8w+nrlP5C5wVFY2a7jWAZBHLEs96 m/H35R5Dke3a4UEd1Q7lb47LhIhR65hmzyBYCP2ylwLKdAI4kuWxmoFPV62QFIqt csMomk2N7FtMB4F4ryMxHTA6ERzbjh5JE+kdR2KtDFXOdWeDa7no4NYnXjAcxACK +ikfCHrvYfgsq3goUg12CSFbQth9naKSbRtSm5qXF07akjnwNc+mtrni4mxN/+YM 8kgehvojPIFkDzOw+tTK7EnRlTovXroV0VcDm0r0rAZFo/3eJxvh5nIqbyBpM4oO tqTs1qGk1o4XA8lytAfP2UFUb9LOD2CIarjcre6Mj/tF5t0CThZLJEZvLRjxNZ0L Cp46C/vf6m348UsP06gu74WibtGwVBlJthr+IXwLUZnNvsKrDZLrq4EXl7n1gxJ9 VYkeIYStpKklwj/6h0GHBAZhF9sbDUsqUPQ6VFnKz6EOQIxsZTIQtytWImLOTArZ DHZRKta7AdtylDk8YHH83p90AMEduCBFWse/ZGv7+kcUbyehtfMm1QmTkdBwPZd4 WYKd/Gy5y35xgD+MWuTvly7aqss3dwHk581CQyfAjFYEkvcW8VZtLYDrgbrk9rfu /vE7ZFtHooVRyr6mlsV2 =PTS+ -----END PGP SIGNATURE----- --UOi+gfmBpEZPw9cU-- From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 16:07:16 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 678B69F8 for ; Wed, 20 Feb 2013 16:07:16 +0000 (UTC) (envelope-from toasty@dragondata.com) Received: from mail-ia0-x22e.google.com (ia-in-x022e.1e100.net [IPv6:2607:f8b0:4001:c02::22e]) by mx1.freebsd.org (Postfix) with ESMTP id E6C0CE75 for ; Wed, 20 Feb 2013 16:07:15 +0000 (UTC) Received: by mail-ia0-f174.google.com with SMTP id u20so3104499iag.5 for ; Wed, 20 Feb 2013 08:07:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dragondata.com; s=google; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer; 
bh=di9AS21CdIBzZAWj2A8TRmDTg3dF/VnKOSDwhlacPU8=; b=phfaBN8o5G261w5EbNaWYFV2Z4P5c0ol0meJbvSO17qrsNZIH/CfK+b9IvdLFcrFnE 14lYZucXNYbX0wBlLm5DdGSCra1fdh6HMxE5Y3gefix72VHORWYQf3wc/xYAGmIxb4bJ Dc8j1/xEXMNptMeRR7X7nYDcFVvzhkOs5cM+I= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :cc:content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=di9AS21CdIBzZAWj2A8TRmDTg3dF/VnKOSDwhlacPU8=; b=k3+bfI6lXqXAY/GFPBKP/FtYGS7hBg2PV2ncjW2aOapRvQ01IlF5kW5bm3XsX/nhuj jZ9mG0P6fO66tRKJp7Sf0SDO0WJFP/BrxtcW1E/AMvY3LKiLHSaJosuxCfAYG1kxgMq5 /xcRvh1BBaFI+ZMv3aufPYwsxgd+k/SCqI69AM1ipYE8PFSFEhj28Il5G/O4OeGpLM+5 EPlg7gjw1LI/GwTQxuAB7cu9JYiFszn+Q53drXgLS/c0z2RSXnm7OgrmQRcmsWoEpodS 8y6RsNcu9KGuL+eNufDnuXbR79xv9B0xUNJcu+R8HYBkoHkWRBc1aQPi1ppwIalm336f oa/Q== X-Received: by 10.42.95.146 with SMTP id f18mr9641819icn.9.1361376435436; Wed, 20 Feb 2013 08:07:15 -0800 (PST) Received: from vpn132.rw1.your.org (vpn132.rw1.your.org. 
[204.9.51.132]) by mx.google.com with ESMTPS id s8sm766074igs.0.2013.02.20.08.07.13 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 20 Feb 2013 08:07:14 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Improving ZFS performance for large directories From: Kevin Day In-Reply-To: <20130220082828.GA44920@server.rulingia.com> Date: Wed, 20 Feb 2013 10:07:11 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: <2F90562A-7F98-49A5-8431-4313961EFA70@dragondata.com> References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com> <20130201192416.GA76461@server.rulingia.com> <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> <20130220082828.GA44920@server.rulingia.com> To: Peter Jeremy X-Mailer: Apple Mail (2.1499) X-Gm-Message-State: ALoCoQl1AObRP2c0/VpjQFroRFEvU3zm14dOacSgmElQHoY5m5tRgz5g59GyZkUDONvGkQ/Oxge2 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 16:07:16 -0000 On Feb 20, 2013, at 2:28 AM, Peter Jeremy wrote: >> Thinking I'd make the primary cache metadata only, and the secondary >> cache "all" would improve things, >=20 > This won't work as expected. L2ARC only caches data coming out of ARC > so by setting ARC to cache metadata only, there's never any "data" in > ARC and hence never any evicted from ARC to L2ARC. >=20 That makes sense, I wasn't sure if it was smart enough to realize this = happening or not, but I guess it won't work. >> I wiped the device (SATA secure erase to make sure) >=20 > That's not necessary. L2ARC doesn't survive reboots because all teh > L2ARC "metadata" is in ARC only. This does mean that it takes quite > a while for L2ARC to warm up following a reboot. >=20 I was more concerned with the SSD's performance than ZFS caring what was = there. 
A few cases completely filled the SSD, which can slow things down (there are no free blocks for it to use). Secure Erase will reset it so the drive's controller knows EVERYTHING is really free. We have one model of SSD here that will drop to about 5% of its original performance after every block on the drive has been written to once. We're not using that model anymore, but I still like to be sure. :)

>> There are roughly 29M files, growing at about 50k files/day. We
>> recently upgraded, and are now at 96 3TB drives in the pool.
>
> That number of files isn't really excessive but it sounds like your
> workload has very low locality. At this stage, my suggestions are:
> 1) Disable atime if you don't need it & haven't already.
>    Otherwise file accesses are triggering metadata updates.
> 2) Increase vfs.zfs.arc_meta_limit
>    You're still getting more metadata misses than data misses
> 3) Increase your ARC size (more RAM)
>    Your pool is quite large compared to your RAM.
>

Yeah, I think the locality is basically zero. It's multiple rsyncs running across the entire filesystem repeatedly. Each directory is only going to be touched once per pass through, so that isn't really going to benefit much from cache unless we get lucky and two rsyncs come in back-to-back where one is chasing another.

Atime is already off globally - nothing we use needs it. We are at the limit for RAM for this motherboard, so any further increases are going to be quite expensive.

>
>> Is there any way to tell why more metadata isn't
>> being pushed to the L2ARC?
>
> ZFS treats writing to L2ARC very much as an afterthought. L2ARC writes
> are rate limited by vfs.zfs.l2arc_write_{boost,max} and will be aborted
> if they might interfere with a read. I'm not sure how to improve it.
>

At this stage there are just zero writes being done, so perhaps the problem is that with so much pressure on the arc metadata, nothing is getting a chance to get pushed into the L2ARC. I'm going to try to increase the meta limit on ARC, but there's not a great deal more I can do.

> Since this is all generic ZFS, you might like to try asking on
> zfs@lists.illumos.org as well. Some of the experts there might have
> some ideas.

I will try that, thanks!

-- Kevin

From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 19:27:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 584F02CF for ; Wed, 20 Feb 2013 19:27:06 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from moh2-ve1.go2.pl (moh2-ve1.go2.pl [193.17.41.186]) by mx1.freebsd.org (Postfix) with ESMTP id E4A25E47 for ; Wed, 20 Feb 2013 19:27:05 +0000 (UTC) Received: from moh2-ve1.go2.pl (unknown [10.0.0.186]) by moh2-ve1.go2.pl (Postfix) with ESMTP id 31D0244D54F for ; Wed, 20 Feb 2013 20:26:43 +0100 (CET) Received: from unknown (unknown [10.0.0.74]) by moh2-ve1.go2.pl (Postfix) with SMTP for ; Wed, 20 Feb 2013 20:26:43 +0100 (CET) Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id nQYMSp; Wed, 20 Feb 2013 20:26:43 +0100 Message-ID: <51252372.1040001@o2.pl> Date: Wed, 20 Feb 2013 20:26:42 +0100 From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130201 Thunderbird/17.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Some filesystem thoughts References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-O2-Trust: 1, 31 X-O2-SPF: neutral X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013
19:27:06 -0000 Hello, I'm a pretty fresh Unix user, suffering from productivity loss caused by changing OS. I dearly miss a couple of facilities that I had implemented in my file manager (Total Commander) and sought a file manager that could replace them. I'm pretty sure there's none. I found that Unix does some of them at the OS level and that's a superior way. But with others it doesn't; some file managers implement them by themselves, much like TC (not well enough, but that's another rant), but I think that for them, OS is the right place too and that's what I'd like to talk about. I come with a free idea that I think would be awesome to have implemented, while not being sure it even can be implemented sensibly within Unix. Maybe I miss something? Maybe the idea ain't good? Maybe there are things that do the job well enough already and I just miss them? Anyway, here's the story: Total Commander's filesystem plugins are awesome. They enable users to manage remote / virtual resources just like remote filesystems. FTP, websites, process list, calendar; the variety is rich. In Unix, there are equivalents for some of them; the ones that mattered the most for me can be usually simulated by mount. And that's a better way because when mounted, they can be used by any program, not just file manager. I'm sure that all people here are used to enjoying the benefits of this approach, though for me they are novel. The other thing - packer plugins. They allow treating archives like directories. Again, there are many useful ones, some obvious (like zips), some not so much. I treated my executables as directories, which enabled me to easily manipulate resources stored inside. Especially useful when hacking closed-source Delphi programs as they contain lots of GUI code stored directly (The name 'TNASTYNAGSCREEN' will stay in my mind for long). Or extracting icons. Or doing many other things that are necessary to play with closed source code, but less relevant in Unix. 
There was a steganography plugin storing data inside images. A plugin for generation and browsing of file lists. A Java decompiler. And a great variety of others. Unix file managers offer similar, though not so rich options. Yet I think it's not their job. Like with mounting, there's great benefit from being able to use standard tools with them. Some write things like zipfs, but I think it's wrong. First, typing a command is cumbersome. Second, even if it was automated, mounting needs a mount point. The only good one is the file itself; working with a dozen (or thousand) of archives in a single directory is a norm for me. Switching dirs back and forth would be very disruptive. Breaks relative paths. And so on. The way I see it is not to treat files as streams of bytes. That's not what they are, files have meanings and there are tools that bring them out. A picture is a stored emotion. OK, there are no tools for that yet. But it is also an array of pixels. And a container with exif data. And may be a container with an encrypted archive. And, a stream of bytes too. They have multiple facets. I think that it would be useful to somehow expose them to applications. Wouldn't it be useful to be able to grep through pdfs in your email attachments? Mass-edit music tags with sed? Manually edit with your favourite text editor instead of the sucky one-liner provided by your favourite music player? How about video players being able to play videos by reading them in decoded form directly from the filesystem instead of having to integrate a significant number of complex libraries to provide sufficient format coverage? 
-- Twoje radio From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 19:37:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 876ED807 for ; Wed, 20 Feb 2013 19:37:23 +0000 (UTC) (envelope-from momchil@xaxo.eu) Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66]) by mx1.freebsd.org (Postfix) with ESMTP id 13A4CEE4 for ; Wed, 20 Feb 2013 19:37:22 +0000 (UTC) Received: from t61.xaxo.eu ([10.75.23.6]) by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1KJbE4D087018; Wed, 20 Feb 2013 20:37:14 +0100 (CET) (envelope-from momchil@xaxo.eu) Date: Wed, 20 Feb 2013 20:37:07 +0100 Message-ID: <86621m4w0s.wl%momchil@xaxo.eu> From: Momchil Ivanov To: Rick Macklem Subject: Re: NFS + Kerberos In-Reply-To: <992481316.3137385.1361325642681.JavaMail.root@erie.cs.uoguelph.ca> References: <86a88ac8bb038ec5d8034724dcf80924.squirrel@webmail.xaxo.eu> <992481316.3137385.1361325642681.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: freebsd-fs@freebsd.org, Momchil Ivanov X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 19:37:23 -0000 At Tue, 19 Feb 2013 21:00:42 -0500 (EST), Rick Macklem wrote: > > Momchil Ivanov wrote: > > On Tue, February 19, 2013 12:56 am, Rick Macklem wrote: > > > Thanks to Elias's hard work, a bug/fix has just been isolated in the > > > Kerberos library that causes the gssd to fail to translate a > > > principal > > > to a uid. The fix is to increase the size of the buffer passed to > > > getpwnam_r(). 
See this thread: > > > http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw > > > > > > I haven't run into this bug, so I don't know what systems are > > > affected, > > > but it would explain why you can't get it working. > > > > > > I'd suggest you apply the patch in the email (increase buf to 1024) > > > and > > > then try again with libraries built with the patch. > > > > Do I have to apply the patch to the server only and then rebuild world > > or > > do I have to do the same on the client too? And do I need to rebuild > > heimdal on both machines? > > > The bug should only affect the server, since the client never translates > between principal_name<->uid. (The client does a rather cheesy trick of > using the uid to select the correct credential cache file.) > > > btw, I checked the logs of the kdc and could not see any trace of the > > nfs > > server trying to validate the client's ticket... Frankly, I don't know > > what I should expect there, I haven't used kerberos before, so I have > > no > > idea if it's related to the bug.
Here is part of the log: > > > > AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for > > krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL > > No preauth found, returning PREAUTH-REQUIRED -- user@EXAMPLE.LOCAL > > sending 407 bytes to IPv4:X.X.X.X > > AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for > > krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL > > Client sent patypes: encrypted-timestamp > > Looking for PKINIT pa-data -- user@EXAMPLE.LOCAL > > Looking for ENC-TS pa-data -- user@EXAMPLE.LOCAL > > ENC-TS Pre-authentication succeeded -- user@EXAMPLE.LOCAL using > > des-cbc-crc > > Client supported enctypes: des-cbc-crc > > Using des-cbc-crc/aes256-cts-hmac-sha1-96 > > AS-REQ authtime: 2013-02-11T23:45:44 starttime: unset endtime: > > 2013-02-12T09:45:39 renew till: unset > > sending 552 bytes to IPv4:X.X.X.X > > > Hmm, that sounds like you are never getting as far as sending the > ticket to the server, but I'm not at home, so I can't look and see > exactly what gets logged. (Also, I use a MIT KDC, so what gets logged > might be different.) > > I've attached a trivial program that you can compile/run as root > on the NFS server to see if 128 bytes is a big enough buffer for your setup. > If it can print out the uid for the usernames you test as arguments, > the patch isn't needed for your environment. > (Oh, and it has a typo bug in the errx() arguments, but it works ok > for testing.) > > Good luck with it, rick Your test program works with a regular user, but fails with root, indeed. I will try the patch. Do I need to rebuild only world or do I have to rebuild heimdal too? 
Thanks you, Momchil From owner-freebsd-fs@FreeBSD.ORG Wed Feb 20 23:10:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 107ECAD3 for ; Wed, 20 Feb 2013 23:10:52 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id CF7BE15F for ; Wed, 20 Feb 2013 23:10:51 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAIVWJVGDaFvO/2dsb2JhbABFhkm6GoEZc4IfAQEEASMEUgUWGAICDRkCWQaIHwYMrgWSPoEjjBoagQM0B4ItgRMDiGaNRoEdjz6DJYFNBxcGGA X-IronPort-AV: E=Sophos;i="4.84,705,1355115600"; d="scan'208";a="15020446" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 20 Feb 2013 18:10:48 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 57C07B3FAC; Wed, 20 Feb 2013 18:10:48 -0500 (EST) Date: Wed, 20 Feb 2013 18:10:48 -0500 (EST) From: Rick Macklem To: Momchil Ivanov Message-ID: <222730394.3167100.1361401848290.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <86621m4w0s.wl%momchil@xaxo.eu> Subject: Re: NFS + Kerberos MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 23:10:52 -0000 Momchil Ivanov wrote: > At Tue, 19 Feb 2013 21:00:42 -0500 (EST), > Rick Macklem wrote: > > > > Momchil Ivanov wrote: > > > On Tue, February 19, 2013 12:56 am, Rick Macklem wrote: > > > > Thanks to Elias's hard work, a 
bug/fix has just been isolated in > > > > the > > > > Kerberos library that causes the gssd to fail to translate a > > > > principal > > > > to a uid. The fix is to increase the size of the buffer passed > > > > to > > > > getpwnam_r(). See this thread: > > > > http://docs.FreeBSD.org/cgi/mid.cgi?CADtN0WKVzbKxhaLQw8y2KLhhRJC9n4ht9wyPmGQ+pHqSjQkVNw > > > > > > > > I haven't run into this bug, so I don't know what systems are > > > > affected, > > > > but it would explain why you can't get it working. > > > > > > > > I'd suggest you apply the patch in the email (increase buf to > > > > 1024) > > > > and > > > > then try again with libraries built with the patch. > > > > > > Do I have to apply the patch to the server only and then rebuild > > > world > > > or > > > do I have to do the same on the client too? And do I need to > > > rebuild > > > heimdal on both machines? > > > > > The bug should only affect the server, since the client never > > translates > > between principal_name<->uid. (The client does a rather cheesy > > trick of > > using the uid to select the correct credential cache file.) > > > > > btw, I checked the logs of the kdc and could not see any trace of > > > the > > > nfs > > > server trying to validate the client's ticket... Frankly, I don't > > > know > > > what I should expect there, I haven't used kerberos before, so I > > > have > > > no > > > idea if it's related to the bug.
Here is part of the log: > > > > > > AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for > > > krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL > > > No preauth found, returning PREAUTH-REQUIRED -- user@EXAMPLE.LOCAL > > > sending 407 bytes to IPv4:X.X.X.X > > > AS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for > > > krbtgt/EXAMPLE.LOCAL@EXAMPLE.LOCAL > > > Client sent patypes: encrypted-timestamp > > > Looking for PKINIT pa-data -- user@EXAMPLE.LOCAL > > > Looking for ENC-TS pa-data -- user@EXAMPLE.LOCAL > > > ENC-TS Pre-authentication succeeded -- user@EXAMPLE.LOCAL using > > > des-cbc-crc > > > Client supported enctypes: des-cbc-crc > > > Using des-cbc-crc/aes256-cts-hmac-sha1-96 > > > AS-REQ authtime: 2013-02-11T23:45:44 starttime: unset endtime: > > > 2013-02-12T09:45:39 renew till: unset > > > sending 552 bytes to IPv4:X.X.X.X > > > > > Hmm, that sounds like you are never getting as far as sending the > > ticket to the server, but I'm not at home, so I can't look and see > > exactly what gets logged. (Also, I use a MIT KDC, so what gets > > logged > > might be different.) > > > > I've attached a trivial program that you can compile/run as root > > on the NFS server to see if 128 bytes is a big enough buffer for > > your setup. > > If it can print out the uid for the usernames you test as arguments, > > the patch isn't needed for your environment. > > (Oh, and it has a typo bug in the errx() arguments, but it works ok > > for testing.) > > > > Good luck with it, rick > > Your test program works with a regular user, but fails with root, > indeed. > > I will try the patch. Do I need to rebuild only world or do I have to > rebuild heimdal too? > I would have thought kerberos was rebuilt for make buildworld. If you use heimdal from somewhere else (ports or their distro), I don't think that needs to be rebuilt, since I don't think the ..pname_to_uid() function is a part of a generic heimdal distribution, but I am not sure. 
Be sure to change buf[128] --> buf[1024] in both: kerberos5/lib/libgssapi_krb5/pname_to_uid.c usr.sbin/gssd/gssd.c (Or paths close to that. I might not have remembered them quite correctly;-) rick > Thanks you, > Momchil From owner-freebsd-fs@FreeBSD.ORG Thu Feb 21 00:19:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C0C1E1E8 for ; Thu, 21 Feb 2013 00:19:22 +0000 (UTC) (envelope-from grarpamp@gmail.com) Received: from mail-ve0-f169.google.com (mail-ve0-f169.google.com [209.85.128.169]) by mx1.freebsd.org (Postfix) with ESMTP id 86D7378D for ; Thu, 21 Feb 2013 00:19:22 +0000 (UTC) Received: by mail-ve0-f169.google.com with SMTP id 15so7638032vea.0 for ; Wed, 20 Feb 2013 16:19:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=dUfPOYubUiyl+ilqHJYQ9xGlvBUm11Qw0LWBIYxIjv4=; b=XsD1Ru1fW6PHkJ7axwLHNZu+lF1wpfJ7F+jX7nyXtMtOThGU45UsE8g6a58RgtrVqV sEzR1ASKak1azmbQ9hsVVQ9DDF25aoWG4GwP298aL0hChaFLjvWe6PzCQs5m9Rv6Mqg6 GBPO7P5AcOXxs0l+3NJyndb6xHrxuLnOuBUi2FywGL7KPHKm2+l2EZf0DB2VsPql5d6E xW6zh18qAM+nMBJ9scs7XR1awRyKySTEXImHiK84qyenpSwFA7MubWd+PxxnTJDXTVW8 bw1Yji9U5dVBmMhGygvBPFgecKfqC/F9fJqtTje8OWiyUg0TeYcVpdsAxRu3BDXPqciH pg5g== MIME-Version: 1.0 X-Received: by 10.52.22.194 with SMTP id g2mr25371345vdf.91.1361405956612; Wed, 20 Feb 2013 16:19:16 -0800 (PST) Received: by 10.220.219.79 with HTTP; Wed, 20 Feb 2013 16:19:16 -0800 (PST) In-Reply-To: <20130215171144.710bf9af@fabiankeil.de> References: <20130215171144.710bf9af@fabiankeil.de> Date: Wed, 20 Feb 2013 19:19:16 -0500 Message-ID: Subject: Re: Crazy ZFS ZIL options: md(4) umass(4) From: grarpamp To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2013 00:19:22 -0000 Still digesting this thread in free time. There are some articles too... http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide http://www.slideshare.net/relling/zfs-tutorial-lisa-2011 http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained http://dtrace.org/blogs/brendan/2009/06/26/slog-screenshots/ https://espix.net/~wildcat/txt/zfs-fragmentation.txt http://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/ http://www.techforce.com.br/news/layout/set/print/linux_blog/zfs_part_4_sustained_random_small_files_sync_write_iops Whatever happened to the old ISA ExpandedDRAM drives? Today, bus based internal boards given mobo support of lots of ram don't seem to make too much sense. But there has to be a cheap SATA interface version of these things... a drive tray where you can just stuff it with DIMMs and a battery. Cheap as in, am I missing an entire class of $20-$50 devices here? That's all they should cost in parts (minus ram), yet all I see are $multikilo 'enterprise' stuff. If that's really the case one could make them from China. I can't see burning up an SSD (cost) for non-enterprise use. I'll test with USB to expose failure modes. Will probably end up with RAMZIL/syncdisable or adding a 10k spindle pair. 
From owner-freebsd-fs@FreeBSD.ORG Thu Feb 21 01:42:08 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 99DDE6A6 for ; Thu, 21 Feb 2013 01:42:08 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 62FE9A5F for ; Thu, 21 Feb 2013 01:42:07 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqIEADN7JVGDaFvO/2dsb2JhbAArGhaGM7oQgRtzgh8BAQEDAQEBASArIAsFFhgCAg0ZAikBCSYGCAcEARkDBIdrBgwtrUySN4EjjBQWgQ00B4ItgRMDiGaLDoI4gR2PPoMlT4EFNQ X-IronPort-AV: E=Sophos;i="4.84,705,1355115600"; d="scan'208";a="17555604" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 20 Feb 2013 20:42:01 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id DB320B4032; Wed, 20 Feb 2013 20:42:01 -0500 (EST) Date: Wed, 20 Feb 2013 20:42:01 -0500 (EST) From: Rick Macklem To: grarpamp Message-ID: <127974626.3169918.1361410921874.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: Crazy ZFS ZIL options: md(4) umass(4) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2013 01:42:08 -0000 grarpamp wrote: > Still digesting this thread in free time. > There are some articles too... 
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide > http://www.slideshare.net/relling/zfs-tutorial-lisa-2011 > http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained > http://dtrace.org/blogs/brendan/2009/06/26/slog-screenshots/ > https://espix.net/~wildcat/txt/zfs-fragmentation.txt > http://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/ > http://www.techforce.com.br/news/layout/set/print/linux_blog/zfs_part_4_sustained_random_small_files_sync_write_iops > > Whatever happened to the old ISA ExpandedDRAM drives? > Today, bus based internal boards given mobo support of lots of ram > don't seem to make too much sense. But there has to be a cheap > SATA interface version of these things... a drive tray where you can > just stuff it with DIMMs and a battery. > Cheap as in, am I missing an entire class of $20-$50 devices > here? That's all they should cost in parts (minus ram), yet all I > see are $multikilo 'enterprise' stuff. If that's really the case one > could make them from China. > Someone posted mentioning this one. ($337 isn't $20-$50, but...): http://www.acard.com/english/fb01-product.jsp?idno_no=382&prod_no=ANS-9010BA&type1_idno=5&ino=28 > I can't see burning up an SSD (cost) for non-enterprise use. > I'll test with USB to expose failure modes. Will probably end up with > RAMZIL/syncdisable or adding a 10k spindle pair.
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Feb 21 12:11:13 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CD66112B6 for ; Thu, 21 Feb 2013 12:11:13 +0000 (UTC) (envelope-from grarpamp@gmail.com) Received: from mail-oa0-f44.google.com (mail-oa0-f44.google.com [209.85.219.44]) by mx1.freebsd.org (Postfix) with ESMTP id A2BAF26B for ; Thu, 21 Feb 2013 12:11:13 +0000 (UTC) Received: by mail-oa0-f44.google.com with SMTP id h1so9168215oag.17 for ; Thu, 21 Feb 2013 04:11:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:date:message-id:subject:from:to :content-type; bh=yLnbQxW8i3pwtNfamuDq6MxUQj/wYfP7gBt3Pxh56Y0=; b=HFD2br+lZ0KZl8brxvYUj1KGdrnTLNfrUukrY2w6dzU+JOIEs7hBS5cncH0eoSTqbz Fv3Uw1l7ioxwhos/+0ONhvQ3+xTFbeBH0JNCOn49FM3I5E/2iRNcgYBDV7u4nv6CDIQl S8mKJItA+lBKkGAtkYsvBiXgqnV8y2Vc1x1T1GSsh1nO7TtnDzZXuXTHQe6Nhgb8MH9V Ggyja8zPkgFSIaaGRHpubddLJag4A4Oy2Kp4Cvad49Jclyhqu3xF1xIcVrnIIkh2PUJK pSUSWnkiNx00/3BAQkF9OAh67vdwT+qRaWPnXjuwXn4lxrZe6EDVdlhXoAr7H/DlX29Y A6Qw== MIME-Version: 1.0 X-Received: by 10.60.11.35 with SMTP id n3mr7035118oeb.90.1361442112804; Thu, 21 Feb 2013 02:21:52 -0800 (PST) Received: by 10.60.146.203 with HTTP; Thu, 21 Feb 2013 02:21:52 -0800 (PST) Date: Thu, 21 Feb 2013 05:21:52 -0500 Message-ID: Subject: Crazy ZFS ZIL options: md(4) umass(4) NAND SATA PCI From: grarpamp To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2013 12:11:13 -0000 > Someone 
posted mentioning this one. ($337 isn't $20-$50, but...): > http://www.acard.com/english/fb01-product.jsp?idno_no=382&prod_no=ANS-9010BA&type1_idno=5&ino=28 It's 32GiB, 20K iops (8KiB?), 14 minute CF dump, 4hr life, but only specs at 200MiB/s over sata2 (a few drives in jbod can match that sustained xfer rate). They make a dual port version at 16GiB and 200MiB/s per port (400MiB/s host striped). Its ddr2 ram is nearly obsolete and at $80/4GiB is almost 4x more than ddr3. Optional 64GiB CF card $85 (or use non-ecc ram emulate and a 32GiB CF card $40). Optional whatever 12vdc source you want to rig to it. So to fill it out you're looking at $295 dev + $640 ram ... $1100 stoked. And the ANS-9010B is 48GiB for $250. There's old Gigabyte gc-ramdisk i-ram, 4GiB, sata1, DDR1, battery ... ~$100 STEC ZeusRAM 8GiB, DDR3, SLC, SAS2, $2500-$3000 It's about as hot as the price. Bus based... You can get a PCI-e 4GiB, 60 second SLC dump, supercaps, from ddrdrive.com for $2000, and they might even let you write a FreeBSD driver for it. Its price, size and flexibility are not that hot, performance maybe 40K iops (4KiB), on par with the above. www.fusionio.com/products/iodrive-octal/ NAND... SLC 20GiB ish ... $150 MLC 60GiB ish ... $75 There's still room for a cheap DDR3+SATA3 unit.
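As a rough check of the fill-out figure above, the itemized prices can simply be summed. This is only a sketch: the names are made up for illustration, the prices are the ones quoted in this post, and the 12vdc source is left out because no price is given for it.

```python
# Prices as quoted in the post (USD). All names are illustrative only.
DEVICE_USD = 295          # the bare unit, no RAM
RAM_USD_PER_4GIB = 80     # DDR2 price quoted per 4 GiB
CAPACITY_GIB = 32         # capacity of the single-port unit
CF_CARD_64GIB_USD = 85    # optional CF card for the dump

def ram_cost(capacity_gib, usd_per_4gib=RAM_USD_PER_4GIB):
    """Cost of populating the unit, buying RAM in 4 GiB steps."""
    return (capacity_gib // 4) * usd_per_4gib

def filled_cost(capacity_gib=CAPACITY_GIB, with_cf=True):
    """Device plus RAM, optionally plus the CF card (no power source)."""
    total = DEVICE_USD + ram_cost(capacity_gib)
    if with_cf:
        total += CF_CARD_64GIB_USD
    return total

print(ram_cost(CAPACITY_GIB))        # RAM alone
print(filled_cost(with_cf=False))    # device + RAM
print(filled_cost())                 # device + RAM + CF card
```

The itemized parts come to $1020 with the CF card; the $1100 above is a round number that presumably also leaves room for the unpriced 12vdc source.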
Or just fill out your motherboard slots and add a UPS :) From owner-freebsd-fs@FreeBSD.ORG Thu Feb 21 12:33:16 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 582393BA for ; Thu, 21 Feb 2013 12:33:16 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from cpsmtpb-ews06.kpnxchange.com (cpsmtpb-ews06.kpnxchange.com [213.75.39.9]) by mx1.freebsd.org (Postfix) with ESMTP id BCC8CAAA for ; Thu, 21 Feb 2013 12:33:15 +0000 (UTC) Received: from cpsps-ews21.kpnxchange.com ([10.94.84.187]) by cpsmtpb-ews06.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Thu, 21 Feb 2013 12:38:46 +0100 Received: from CPSMTPM-TLF103.kpnxchange.com ([195.121.3.6]) by cpsps-ews21.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Thu, 21 Feb 2013 12:38:46 +0100 Received: from sjakie.klop.ws ([212.182.167.131]) by CPSMTPM-TLF103.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Thu, 21 Feb 2013 12:40:07 +0100 Received: from 212-182-167-131.ip.telfort.nl (localhost [127.0.0.1]) by sjakie.klop.ws (Postfix) with ESMTP id 7001EA988 for ; Thu, 21 Feb 2013 12:40:07 +0100 (CET) Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Subject: Re: Some filesystem thoughts References: <51252372.1040001@o2.pl> Date: Thu, 21 Feb 2013 12:40:07 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: Quoted-Printable From: "Ronald Klop" Message-ID: In-Reply-To: <51252372.1040001@o2.pl> User-Agent: Opera Mail/12.14 (FreeBSD) X-OriginalArrivalTime: 21 Feb 2013 11:40:07.0783 (UTC) FILETIME=[32D43F70:01CE1028] X-RcptDomain: freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2013 12:33:16 -0000
On Wed, 20 Feb 2013 20:26:42 +0100, Radio młodych bandytów wrote: > Hello, > I'm a pretty fresh Unix user, suffering from productivity loss caused by > changing OS. I dearly miss a couple of facilities that I had implemented > in my file manager (Total Commander) and sought a file manager that > could replace them. I'm pretty sure there's none. I found that Unix does > some of them at the OS level and that's a superior way. But with others > it doesn't; some file managers implement them by themselves, much like > TC (not well enough, but that's another rant), but I think that for > them, OS is the right place too and that's what I'd like to talk about. > I come with a free idea that I think would be awesome to have > implemented, while not being sure it even can be implemented sensibly > within Unix. Maybe I miss something? Maybe the idea ain't good? Maybe > there are things that do the job well enough already and I just miss > them? > > Anyway, here's the story: > > Total Commander's filesystem plugins are awesome. They enable users to > manage remote / virtual resources just like remote filesystems. FTP, > websites, process list, calendar; the variety is rich. > In Unix, there are equivalents for some of them; the ones that mattered > the most for me can be usually simulated by mount. > And that's a better way because when mounted, they can be used by any > program, not just file manager. I'm sure that all people here are used > to enjoying the benefits of this approach, though for me they are novel. > > The other thing - packer plugins. They allow treating archives like > directories. Again, there are many useful ones, some obvious (like > zips), some not so much. I treated my executables as directories, which > enabled me to easily manipulate resources stored inside.
Especially > useful when hacking closed-source Delphi programs as they contain lots > of GUI code stored directly (The name 'TNASTYNAGSCREEN' will stay in my > mind for long). Or extracting icons. Or doing many other things that are > necessary to play with closed source code, but less relevant in Unix. > There was a steganography plugin storing data inside images. A plugin > for generation and browsing of file lists. A Java decompiler. And a > great variety of others. > > Unix file managers offer similar, though not so rich options. Yet I > think it's not their job. Like with mounting, there's great benefit from > being able to use standard tools with them. > Some write things like zipfs, but I think it's wrong. > First, typing a command is cumbersome. Second, even if it was automated, > mounting needs a mount point. The only good one is the file itself; > working with a dozen (or thousand) of archives in a single directory is > a norm for me. Switching dirs back and forth would be very disruptive. > Breaks relative paths. And so on. > > The way I see it is not to treat files as streams of bytes. That's not > what they are, files have meanings and there are tools that bring them > out. A picture is a stored emotion. OK, there are no tools for that yet. > But it is also an array of pixels. And a container with exif data. And > may be a container with an encrypted archive. And, a stream of bytes too. > They have multiple facets. > I think that it would be useful to somehow expose them to applications. > Wouldn't it be useful to be able to grep through pdfs in your email > attachments? > Mass-edit music tags with sed? Manually edit with your favourite text > editor instead of the sucky one-liner provided by your favourite music > player?
> How about video players being able to play videos by reading them in > decoded form directly from the filesystem instead of having to integrate > a significant number of complex libraries to provide sufficient format > coverage? Creative ideas. Part of what you want is in fusefs (mounting of files to edit their content). And part is implemented in e.g. KDE (integrated support for various file types in fulltext search and tagging of files/metadata, etc.). The chances of having all these complex libraries integrated in the FreeBSD OS are close to zero I presume. But I am not in a position to decide about that. I think you can't expect the OS to serve everybody's detailed wishes. The OS serves files and user programs know what to do with them. Ronald. From owner-freebsd-fs@FreeBSD.ORG Thu Feb 21 16:18:56 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 593EFD3 for ; Thu, 21 Feb 2013 16:18:56 +0000 (UTC) (envelope-from momchil@xaxo.eu) Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66]) by mx1.freebsd.org (Postfix) with ESMTP id DB75422B for ; Thu, 21 Feb 2013 16:18:55 +0000 (UTC) Received: from vps2.xaxo.eu (localhost [127.0.0.1]) by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1LGIroG093454; Thu, 21 Feb 2013 17:18:53 +0100 (CET) (envelope-from momchil@xaxo.eu) Received: (from www@localhost) by vps2.xaxo.eu (8.14.4/8.14.4/Submit) id r1LGIr6f093453; Thu, 21 Feb 2013 17:18:53 +0100 (CET) (envelope-from momchil@xaxo.eu) X-Authentication-Warning: vps2.xaxo.eu: www set sender to momchil@xaxo.eu using -f Received: from 139.18.9.22 (SquirrelMail authenticated user space) by webmail.xaxo.eu with HTTP; Thu, 21 Feb 2013 17:18:53 +0100 Message-ID: Date: Thu, 21 Feb 2013 17:18:53 +0100 Subject: Re: NFS + Kerberos From: "Momchil Ivanov" To: "Rick Macklem" User-Agent: SquirrelMail/1.4.21 MIME-Version: 1.0 Content-Type:
text/plain;charset=utf-8 Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) Importance: Normal Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2013 16:18:56 -0000 On Thu, February 21, 2013 12:10 am, Rick Macklem wrote: > I would have thought kerberos was rebuilt for make buildworld. If you use heimdal from somewhere else (ports or their distro), I don't think that needs to be rebuilt, since I don't think the ..pname_to_uid() function is a part of a generic heimdal distribution, but I am not sure. > > Be sure to change buf[128] --> buf[1024] in both: > kerberos5/lib/libgssapi_krb5/pname_to_uid.c > usr.sbin/gssd/gssd.c > > (Or paths close to that. I might not have remembered them quite > correctly;-) this change allows for yet another entry in the kdc log: 2013-02-21T17:03:43 TGS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for nfs/srv.example.local@EXAMPLE.LOCAL 2013-02-21T17:03:44 TGS-REQ authtime: 2013-02-21T17:02:03 starttime: 2013-02-21T17:03:43 endtime: 2013-02-22T03:02:00 renew till: unset 2013-02-21T17:03:44 sending 612 bytes to IPv4:X.X.X.X which seems promising, but I still get: $ mount -t nfs -o nfsv4,sec=krb5i srv.example.local:/ /mnt/srv mount_nfs: can't update /var/db/mounttab for srv.example.local:/ nfsv4 err=10016 mount_nfs: /mnt/srv, : Input/output error do you happen to have any other ideas? 
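For reference, the number in "nfsv4 err=10016" can be looked up in the NFSv4 status code list of RFC 3530, where 10016 is NFS4ERR_WRONGSEC. A minimal lookup sketch follows; the table is deliberately partial, and the helper name and the regular expression are made up for illustration, not taken from mount_nfs.

```python
import re

# Partial table of NFSv4 status codes as defined in RFC 3530.
# Only a handful are listed; anything else is reported as "unknown".
NFS4_STATUS = {
    0: "NFS4_OK",
    1: "NFS4ERR_PERM",
    13: "NFS4ERR_ACCESS",
    70: "NFS4ERR_STALE",
    10001: "NFS4ERR_BADHANDLE",
    10008: "NFS4ERR_DELAY",
    10016: "NFS4ERR_WRONGSEC",
    10039: "NFS4ERR_BADOWNER",
}

def decode_nfsv4_err(message):
    """Hypothetical helper: pull the code out of an 'nfsv4 err=N' message.

    Returns (code, name), or None when the message has no such field.
    """
    match = re.search(r"nfsv4 err=(\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, NFS4_STATUS.get(code, "unknown")

print(decode_nfsv4_err("nfsv4 err=10016"))
```

NFS4ERR_WRONGSEC means the server did not accept the security flavor used for the request, so the sec= settings on both sides may be worth comparing, though that is only a reading of the status code and not a diagnosis.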
Thank you, Momchil From owner-freebsd-fs@FreeBSD.ORG Thu Feb 21 19:07:12 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 55A00347; Thu, 21 Feb 2013 19:07:12 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-bk0-f53.google.com (mail-bk0-f53.google.com [209.85.214.53]) by mx1.freebsd.org (Postfix) with ESMTP id B8FF4220; Thu, 21 Feb 2013 19:07:11 +0000 (UTC) Received: by mail-bk0-f53.google.com with SMTP id j10so4240626bkw.40 for ; Thu, 21 Feb 2013 11:07:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:sender:message-id:date:from:user-agent:mime-version:to :cc:subject:references:in-reply-to:content-type :content-transfer-encoding; bh=bEA4zuirgVUuOV99wU7y7zIaXL8UT8/6iXIjaQFp9Yk=; b=Bzeamx+Pl52jSjLjffP2m7YluD3Tx7vBrEX2PDU9grJtqpT3sofIAaQC3dVOsYqyPr 6wNudlyyA61KRlYUw3jJ1TytnVOTfuaSxs4G9MaLsts3aYcTNgIrcVOaz0ludoAL5bT/ yCyk5/2WSTPEB/1KzbkNBWJ+UAlAgL0cWWiQyTUbTQ2vRRwJ2fY95YLFfMrCH7aI389O AWU0MQc36Ktsaqwy2v9jjtQklJktKszrobSNsaMLv0vYmbziaI6yhvqT9m3/A/LmsOm+ aZJa/bhw28kod6Mqq8DdFLp86fQM9HGu67F6Wqlnn/ZfveYy53NgOVxla4O3sHwPzSfC TOlg== X-Received: by 10.204.8.16 with SMTP id f16mr11376400bkf.81.1361473625153; Thu, 21 Feb 2013 11:07:05 -0800 (PST) Received: from mavbook.mavhome.dp.ua (mavhome.mavhome.dp.ua. [213.227.240.37]) by mx.google.com with ESMTPS id go8sm24841835bkc.20.2013.02.21.11.07.02 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 21 Feb 2013 11:07:03 -0800 (PST) Sender: Alexander Motin Message-ID: <51267055.3040500@FreeBSD.org> Date: Thu, 21 Feb 2013 21:07:01 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130125 Thunderbird/17.0.2 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: disk "flipped" - a known problem? 
References: <20130121221617.GA23909@icarus.home.lan> <50FED818.7070704@FreeBSD.org> <20130125083619.GA51096@icarus.home.lan> <20130125211232.GA3037@icarus.home.lan> <20130125212559.GA1772@icarus.home.lan> <20130125213209.GA1858@icarus.home.lan> <20130126011754.GA1806@icarus.home.lan> In-Reply-To: <20130126011754.GA1806@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2013 19:07:12 -0000 On 26.01.2013 03:17, Jeremy Chadwick wrote: > Okay, I've figured out the exact, 100% reproducible condition that > causes the situation. It took me a lot of tries and a digital pocket > recorder to take verbal notes (there are just too many things to look at > simultaneously), but I've figured it out. > > I'm sorry for the verbosity, but it's necessary. > > Assume the disk we're talking about is /dev/ada5. > > 1. Prior to any issues, we have this: > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > crw-r----- 1 root operator 0x8c Jan 25 16:41 /dev/ada5 > crw------- 1 root operator 0x75 Jan 25 16:35 /dev/pass5 > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > 2. ada5 begins experiencing issues -- ATA commands (CDBs) submit do not > get a response (not going to discuss how/why that can happen). > > 3. These types of messages are seen on console (naturally the CDB and > request type will vary -- in this case it was because I was doing the dd > zero'ing, thus tickling the bad sector/naughty firmware on the drive): > > Jan 25 16:29:28 icarus kernel: ahcich5: Timeout on slot 0 port 0 > Jan 25 16:29:28 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017 > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset... 
> Jan 25 16:29:28 icarus kernel: ahcich5: SATA connect time=1000us status=00000113 > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset: device found > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 80 77 01 40 00 00 00 00 00 00 > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command > > 4. Any I/O submit to ada5 during this time blocks (this is normal). > > 5. **While this situation is happening**, something using xpt(4) > attempts to submit a CDB to the disk (ex. smartctl -a /dev/ada5). > This request also blocks (again, normal). > > 6. Physical device falls off bus, or CAM kicks the disk off the bus. > Doesn't matter which. We see messages resembling this (boy am I tired > of this interspersed output problem): > > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): lost device > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): lost device > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): passdevgonecb: devfs entry is gone > > 7. Standard I/O requests fail with errno=6 "Device not configured". > xpt(4) requests also fail with the same errno. > > 8. Device-wise, at this stage all we have is: > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > 9. Device comes back online for whatever reason. 
FreeBSD sees the disk, > blah blah blah: > > Jan 25 16:30:16 icarus kernel: GEOM: new disk ada5 > Jan 25 16:30:16 icarus kernel: ada5: ATA-7 SATA 1.x device > Jan 25 16:30:16 icarus kernel: ada5: Serial Number WD-WMAP41573589 > Jan 25 16:30:16 icarus kernel: ada5: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes) > Jan 25 16:30:16 icarus kernel: ada5: Command Queueing enabled > Jan 25 16:30:16 icarus kernel: ada5: 143089MB (293046768 512 byte sectors: 16H 63S/T 16383C) > Jan 25 16:30:16 icarus kernel: ada5: Previously was known as ad14 > > ...um, where's pass5? > > 10. /dev/pass5 is now completely (permanently) missing: > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > crw-r----- 1 root operator 0x99 Jan 25 16:42 /dev/ada5 > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > 11. Any further attempts to communicate via xpt(4) with ada5 fail. > Detaching and reattaching the disk does not fix the issue; the only fix > is to reboot the system. > > 12. "camcontrol debug -IPXp scbus5" results in tons and tons of output > all pertaining to xpt(4). It looks like xpt(4) is in some kind of > loop. > > Below is my verbose boot (with non-kernel things removed), which > also includes "camcontrol debug" output once things are in a bad state: > > http://jdc.koitsu.org/freebsd/xpt_oddity.log > > In this log you'll see that after 1 CAM timeout I yanked the drive, then > roughly 30 seconds later reinserted it. > > If you need me to turn on CAM debugging *prior* to the above, I can do > that, just let me know. > > The important step is #5. Without that, the problem shown in #9/10/11 > does not happen. > > It's a good thing I don't run smartd(8) -- most users I see using that > software set the interval to something like 180s or 60s. Imagine this > frustration: "okay so the disk fell off the bus, but what, now I can't > talk to it with SMART? Uhhh... Err, works now? Whatever". I think, the problem may already be fixed in HEAD by r244014 by ken@. 
I've just merged it to 9-STABLE at r247115. So if it is still possible to reproduce the situation, it would be good to try. -- Alexander Motin From owner-freebsd-fs@FreeBSD.ORG Thu Feb 21 23:18:04 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 57307BFE for ; Thu, 21 Feb 2013 23:18:04 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 14CAA3D6 for ; Thu, 21 Feb 2013 23:18:03 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEANepJlGDaFvO/2dsb2JhbABFhk63bYJYgRxzgh8BAQQBIwRSBRYYAgINGQJZBogfBq0ckhKBI403NAeCLYETA4hpjU2QXoMlggk X-IronPort-AV: E=Sophos;i="4.84,711,1355115600"; d="scan'208";a="17707622" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 21 Feb 2013 18:17:56 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 9D232B3F0D; Thu, 21 Feb 2013 18:17:56 -0500 (EST) Date: Thu, 21 Feb 2013 18:17:56 -0500 (EST) From: Rick Macklem To: Momchil Ivanov Message-ID: <496437657.3199038.1361488676628.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: NFS + Kerberos MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2013 23:18:04 -0000 Momchil Ivanov wrote: > On Thu, February 21, 2013 12:10 am, Rick Macklem wrote: > > I would have thought kerberos was rebuilt for make 
buildworld. If > > you > use heimdal from somewhere else (ports or their distro), I don't think > that needs to be rebuilt, since I don't think the ..pname_to_uid() > function is a part of a generic heimdal distribution, but I am not > sure. > > > > Be sure to change buf[128] --> buf[1024] in both: > > kerberos5/lib/libgssapi_krb5/pname_to_uid.c > > usr.sbin/gssd/gssd.c > > > > (Or paths close to that. I might not have remembered them quite > > correctly;-) > > this change allows for yet another entry in the kdc log: > > 2013-02-21T17:03:43 TGS-REQ user@EXAMPLE.LOCAL from IPv4:X.X.X.X for > nfs/srv.example.local@EXAMPLE.LOCAL > 2013-02-21T17:03:44 TGS-REQ authtime: 2013-02-21T17:02:03 starttime: > 2013-02-21T17:03:43 endtime: 2013-02-22T03:02:00 renew till: unset > 2013-02-21T17:03:44 sending 612 bytes to IPv4:X.X.X.X > > which seems promising, but I still get: > > $ mount -t nfs -o nfsv4,sec=krb5i srv.example.local:/ /mnt/srv > mount_nfs: can't update /var/db/mounttab for srv.example.local:/ nfsv4 > err=10016 > mount_nfs: /mnt/srv, : Input/output error >

Error 10016 is NFS4ERR_WRONGSEC. This means that the server expects a different security flavour (sys maybe) at some point in the mount.

I can't remember if you posted your /etc/exports file before, but I suspect the file system referred to by the root specified in the V4: line isn't allowing krb5i. For example, if you wanted to mount the file system rooted at /home by the above, you would need the following 2 lines in /etc/exports.

/home -sec=krb5i
V4: /home -sec=krb5i

You can list other security flavours for -sec, but krb5i needs to be one of them.

rick
ps: Don't worry about the "can't update /var/db/mounttab". It is basically harmless and can be fixed by allowing the user doing the mount write access to it. If you don't do that, then the mount will still work ok, it will just generate the message.

> do you happen to have any other ideas?
> > Thank you, > Momchil From owner-freebsd-fs@FreeBSD.ORG Thu Feb 21 23:36:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EB0DE34D for ; Thu, 21 Feb 2013 23:36:11 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta14.emeryville.ca.mail.comcast.net (qmta14.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:44:76:96:27:212]) by mx1.freebsd.org (Postfix) with ESMTP id CF45D6D6 for ; Thu, 21 Feb 2013 23:36:11 +0000 (UTC) Received: from omta03.emeryville.ca.mail.comcast.net ([76.96.30.27]) by qmta14.emeryville.ca.mail.comcast.net with comcast id 39gl1l0050b6N64AEBcBSU; Thu, 21 Feb 2013 23:36:11 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta03.emeryville.ca.mail.comcast.net with comcast id 3Bc91l00a1t3BNj8PBc90d; Thu, 21 Feb 2013 23:36:11 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 90BB373A1C; Thu, 21 Feb 2013 15:36:09 -0800 (PST) Date: Thu, 21 Feb 2013 15:36:09 -0800 From: Jeremy Chadwick To: Alexander Motin Subject: Re: disk "flipped" - a known problem? 
Message-ID: <20130221233609.GA92249@icarus.home.lan> References: <20130121221617.GA23909@icarus.home.lan> <50FED818.7070704@FreeBSD.org> <20130125083619.GA51096@icarus.home.lan> <20130125211232.GA3037@icarus.home.lan> <20130125212559.GA1772@icarus.home.lan> <20130125213209.GA1858@icarus.home.lan> <20130126011754.GA1806@icarus.home.lan> <51267055.3040500@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51267055.3040500@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1361489771; bh=/tdPUkBiQoJZ+fmmIV8Uv6IkEU4GUkzKoe4gUAwxsKo=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=h3ygpcrcN6qvQmmVCiUrQt1zBZFS72vqx958L1Kyoo9kAadASwosUPyG7Gt9GNj4L HU3eFXFLwR19lkKTBjKmRGNmbJuXmfzo+GIb2QBFsiDV3Ie8VVgIj+KvBzbVidNMZ9 F5sBAIMI+xw7I3fh0Xi2yOv7osibwJX+5MmMBa8CFICIsLimPx53BxTQj/A66upPLM mEvJ2h2w3ByEW9GBlsqaC9tWw9QbP+SPVZv7tBWZz0Do8jIRSAP8WEq6noZnDQKVlT YmniiTkWZzE3eJdrIizTXlt8mB0RpDz1UnbJQsBaW+2Op1NAt2wdg04bVgnmDbUPao S9WWHUW27J4QA== Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Feb 2013 23:36:12 -0000 On Thu, Feb 21, 2013 at 09:07:01PM +0200, Alexander Motin wrote: > On 26.01.2013 03:17, Jeremy Chadwick wrote: > > Okay, I've figured out the exact, 100% reproducible condition that > > causes the situation. It took me a lot of tries and a digital pocket > > recorder to take verbal notes (there are just too many things to look at > > simultaneously), but I've figured it out. > > > > I'm sorry for the verbosity, but it's necessary. > > > > Assume the disk we're talking about is /dev/ada5. > > > > 1. 
Prior to any issues, we have this: > > > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > > crw-r----- 1 root operator 0x8c Jan 25 16:41 /dev/ada5 > > crw------- 1 root operator 0x75 Jan 25 16:35 /dev/pass5 > > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > > > 2. ada5 begins experiencing issues -- ATA commands (CDBs) submit do not > > get a response (not going to discuss how/why that can happen). > > > > 3. These types of messages are seen on console (naturally the CDB and > > request type will vary -- in this case it was because I was doing the dd > > zero'ing, thus tickling the bad sector/naughty firmware on the drive): > > > > Jan 25 16:29:28 icarus kernel: ahcich5: Timeout on slot 0 port 0 > > Jan 25 16:29:28 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017 > > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset... > > Jan 25 16:29:28 icarus kernel: ahcich5: SATA connect time=1000us status=00000113 > > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset: device found > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 80 77 01 40 00 00 00 00 00 00 > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command > > > > 4. Any I/O submit to ada5 during this time blocks (this is normal). > > > > 5. **While this situation is happening**, something using xpt(4) > > attempts to submit a CDB to the disk (ex. smartctl -a /dev/ada5). > > This request also blocks (again, normal). > > > > 6. Physical device falls off bus, or CAM kicks the disk off the bus. > > Doesn't matter which. 
We see messages resembling this (boy am I tired > > of this interspersed output problem): > > > > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): lost device > > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): lost device > > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry > > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): passdevgonecb: devfs entry is gone > > > > 7. Standard I/O requests fail with errno=6 "Device not configured". > > xpt(4) requests also fail with the same errno. > > > > 8. Device-wise, at this stage all we have is: > > > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > > > 9. Device comes back online for whatever reason. FreeBSD sees the disk, > > blah blah blah: > > > > Jan 25 16:30:16 icarus kernel: GEOM: new disk ada5 > > Jan 25 16:30:16 icarus kernel: ada5: ATA-7 SATA 1.x device > > Jan 25 16:30:16 icarus kernel: ada5: Serial Number WD-WMAP41573589 > > Jan 25 16:30:16 icarus kernel: ada5: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes) > > Jan 25 16:30:16 icarus kernel: ada5: Command Queueing enabled > > Jan 25 16:30:16 icarus kernel: ada5: 143089MB (293046768 512 byte sectors: 16H 63S/T 16383C) > > Jan 25 16:30:16 icarus kernel: ada5: Previously was known as ad14 > > > > ...um, where's pass5? > > > > 10. /dev/pass5 is now completely (permanently) missing: > > > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > > crw-r----- 1 root operator 0x99 Jan 25 16:42 /dev/ada5 > > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > > > 11. Any further attempts to communicate via xpt(4) with ada5 fail. > > Detaching and reattaching the disk does not fix the issue; the only fix > > is to reboot the system. > > > > 12. "camcontrol debug -IPXp scbus5" results in tons and tons of output > > all pertaining to xpt(4). It looks like xpt(4) is in some kind of > > loop. 
> > > > Below is my verbose boot (with non-kernel things removed), which > > also includes "camcontrol debug" output once things are in a bad state: > > > > http://jdc.koitsu.org/freebsd/xpt_oddity.log > > > > In this log you'll see that after 1 CAM timeout I yanked the drive, then > > roughly 30 seconds later reinserted it. > > > > If you need me to turn on CAM debugging *prior* to the above, I can do > > that, just let me know. > > > > The important step is #5. Without that, the problem shown in #9/10/11 > > does not happen. > > > > It's a good thing I don't run smartd(8) -- most users I see using that > > software set the interval to something like 180s or 60s. Imagine this > > frustration: "okay so the disk fell off the bus, but what, now I can't > > talk to it with SMART? Uhhh... Err, works now? Whatever". > > I think, the problem may already be fixed in HEAD by r244014 by ken@. > I've just merged it to 9-STABLE at r247115. So if it is still possible > to reproduce the situation, it would be good to try. Yep, I saw the commit per svn-src-stable-9@freebsd.org, along with a bunch of others; I wasn't sure if r247114 or r247115 fixed it, so I was waiting for a follow-up from you. :-) I'll rebuild world/kernel and try it out + report back. Thank you (and ken@ too!) for the work on this. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Fri Feb 22 01:03:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8E2F98E6 for ; Fri, 22 Feb 2013 01:03:11 +0000 (UTC) (envelope-from momchil@xaxo.eu) Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66]) by mx1.freebsd.org (Postfix) with ESMTP id 2A3E8A3D for ; Fri, 22 Feb 2013 01:03:10 +0000 (UTC) Received: from t61.xaxo.eu ([10.75.23.6]) by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1M132BT098425; Fri, 22 Feb 2013 02:03:02 +0100 (CET) (envelope-from momchil@xaxo.eu) Date: Fri, 22 Feb 2013 02:02:53 +0100 Message-ID: <86ip5lkvnm.wl%momchil@xaxo.eu> From: Momchil Ivanov To: Rick Macklem Subject: Re: NFS + Kerberos In-Reply-To: <496437657.3199038.1361488676628.JavaMail.root@erie.cs.uoguelph.ca> References: <496437657.3199038.1361488676628.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: freebsd-fs@freebsd.org, Momchil Ivanov X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2013 01:03:11 -0000 At Thu, 21 Feb 2013 18:17:56 -0500 (EST), Rick Macklem wrote: > Error 10016 is NFS4ERR_WRONGSEC. This means that the server expects a > different security flavour (sys maybe) at some point in the mount. btw you have a typo, it's NFSERR_WRONGSEC. The problem is that I think it would be hard for me to find the piece of code that issues it in my case, so that I can understand why. Unfortunately, I am not familiar with NFS and the kernel internals... and since there are a number of places where it can be generated [1] and the machine that I am using as a NFS server, is rather slow in compiling world... it would be hard for me to instrument the code... 
> I can't remember if you posted your /etc/exports file before, but
> I suspect the file system referred by the root sepcified in the V4:
> line isn't allowing krb5i. For example, if you wanted to mount the
> file system rooted at /home by the above, you would need the following
> 2 lines in /etc/exports.
>
> /home -sec=krb5i
> V4: /home -sec=krb5i

here is my /etc/exports:

V4: /tank/storage -sec=krb5i:krb5p
/tank/storage -sec=krb5i:krb5p

> You can list other security flavours for -sec, but krb5i needs to be
> one of them.
>
> rick
> ps: Don't worry about the "can't update /var/db/mounttab". It is
> basically harmless and can be fixed by allowing the user doing
> the mount write access to it. If you don't do that, then the
> mount will still work ok, it will just generate the message.

I know this :)

btw I have Kerberos working with sshd on the same machine, so I think I have managed to set it up correctly... but the NFS server doesn't want to work with Kerberos.. the changes you suggested were in the right direction, since I can now see TGS-REQ lines in the KDC log, but there might still be some bugs here, or I am doing something wrong...

Ideas are welcome :) I would be happy to get it working.
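The /etc/exports advice quoted above (krb5i has to appear in the -sec list of both the V4: line and the export line) can be checked mechanically. The following is only a minimal sketch under stated assumptions: the helper names sec_flavours and lines_missing_flavour are hypothetical, and the parser handles only the simple one-option-per-line form shown in this thread, not the full exports(5) syntax.

```python
def sec_flavours(line):
    """Return the flavours listed in a -sec= option on one exports line."""
    for field in line.split():
        if field.startswith("-sec="):
            # Flavours are colon-separated, e.g. -sec=krb5i:krb5p
            return field[len("-sec="):].split(":")
    return []  # no -sec option found on this line


def lines_missing_flavour(exports_text, flavour):
    """Return the non-empty, non-comment lines whose -sec list lacks flavour."""
    missing = []
    for raw in exports_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if flavour not in sec_flavours(line):
            missing.append(line)
    return missing


# The two lines posted in this message (hypothetical variable name).
exports = """\
V4: /tank/storage -sec=krb5i:krb5p
/tank/storage -sec=krb5i:krb5p
"""

# Both lines list krb5i, so nothing is reported as missing.
print(lines_missing_flavour(exports, "krb5i"))
```

For the file posted here the check finds no line lacking krb5i, so by this simplified test the exports file already follows the advice and the cause of err=10016 has to be looked for elsewhere.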
1: http://fxr.watson.org/fxr/ident?v=FREEBSD9;i=NFSERR_WRONGSEC Thank you, Momchil From owner-freebsd-fs@FreeBSD.ORG Fri Feb 22 02:46:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4D798EAE for ; Fri, 22 Feb 2013 02:46:01 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id EC1BFEA2 for ; Fri, 22 Feb 2013 02:46:00 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAIXbJlGDaFvO/2dsb2JhbABFhk66S4Efc4IfAQEEASNWBRYYAgINBQETAlkGiB8GDK0YkhuBI4wwgQc0BxIBghqBEwOIaY1NkF6DJYFMAQcXHg X-IronPort-AV: E=Sophos;i="4.84,713,1355115600"; d="scan'208";a="17724606" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 21 Feb 2013 21:45:59 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 9378DB3F13; Thu, 21 Feb 2013 21:45:59 -0500 (EST) Date: Thu, 21 Feb 2013 21:45:59 -0500 (EST) From: Rick Macklem To: Momchil Ivanov Message-ID: <1845485841.3202259.1361501159585.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <86ip5lkvnm.wl%momchil@xaxo.eu> Subject: Re: NFS + Kerberos MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2013 02:46:01 -0000 Momchil Ivanov wrote: > At Thu, 21 Feb 2013 18:17:56 -0500 (EST), > Rick Macklem wrote: > > Error 10016 is NFS4ERR_WRONGSEC. 
This means that the server expects > > a > > different security flavour (sys maybe) at some point in the mount. > > btw you have a typo, it's NFSERR_WRONGSEC. Actually, it's called NFS4ERR_WRONGSEC in the RFC and NFSERR_WRONGSEC in the NFS sources, just to try and confuse you;-) > The problem is that I think > it would be hard for me to find the piece of code that issues it in my > case, so that I can understand why. Unfortunately, I am not familiar > with NFS and the kernel internals... and since there are a number of > places where it can be generated [1] and the machine that I am using > as a NFS server, is rather slow in compiling world... it would be hard > for me to instrument the code... > > > I can't remember if you posted your /etc/exports file before, but > > I suspect the file system referred by the root sepcified in the V4: > > line isn't allowing krb5i. For example, if you wanted to mount the > > file system rooted at /home by the above, you would need the > > following > > 2 lines in /etc/exports. > > > > /home -sec=krb5i > > V4: /home -sec=krb5i > > here is my /etc/exports: > > V4: /tank/storage -sec=krb5i:krb5p > /tank/storage -sec=krb5i:krb5p > Just as an experiment, you could try adding "sys" to the -sec list for both lines. If the mount works then, it would tell you that the client isn't successfully getting a Kerberos credential and is falling back to using "sys" (called AUTH_SYS in the RFCs, just for further confusion;-). > > You can list other security flavours for -sec, but krb5i needs to be > > one of them. > > > > rick > > ps: Don't worry about the "can't update /var/db/mounttab". It is > > basically harmless and can be fixed by allowing the user doing > > the mount write access to it. If you don't do that, then the > > mount will still work ok, it will just generate the message. > > I know this :) > > btw I have Kerberos working with sshd on the same machine, so I think > I have managed to set it up correctly... 
but the NFS server doesn't > want to work with Kerberos.. the changes you suggested were in the > right direction, since I can now see TGS-REQ lines in the KDC log, but > there might still be some bugs here, or I am doing something wrong... > > Ideas are welcomed :) I would be happy to get it working. >

Check to see what the user's credential cache file is called. If you "ls -l /tmp" you should be able to find it.

If it isn't called /tmp/krb5cc_UID, where UID is the uid for the user, then you will need the recent patch applied to gssd.c that adds a "-s" option to search for the credential cache file in a list of directories. This patch is in head as r244604 and stable/9 as r245089, but not in any release. (Some sshds generate separate credential cache files for each login session, although not the default one in the system, as far as I understand.)

rick

> 1: http://fxr.watson.org/fxr/ident?v=FREEBSD9;i=NFSERR_WRONGSEC
>
> Thank you,
> Momchil

From owner-freebsd-fs@FreeBSD.ORG Fri Feb 22 02:47:26 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0A192F3C for ; Fri, 22 Feb 2013 02:47:26 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id C3601EBB for ; Fri, 22 Feb 2013 02:47:25 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id r1M2lHQu010153; Thu, 21 Feb 2013 20:47:17 -0600 (CST) Date: Thu, 21 Feb 2013 20:47:17 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Kevin Day Subject: Re: Improving ZFS performance for large directories In-Reply-To: <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> Message-ID: References: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com> 
<20130201192416.GA76461@server.rulingia.com> <19E0C908-79F1-43F8-899C-6B60F998D4A5@dragondata.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Thu, 21 Feb 2013 20:47:17 -0600 (CST) Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2013 02:47:26 -0000 On Tue, 19 Feb 2013, Kevin Day wrote: > Sorry for the late followup, I've been doing some testing with an L2ARC device. > >>> Doing it twice back-to-back makes a bit of difference but it's still slow either way. >> >> ZFS can very conservative about caching data and twice might not be enough. >> I suggest you try 8-10 times, or until the time stops reducing. > > Timing doing an "ls" in large directories 20 times, the first is the slowest, then all subsequent listings are roughly the same. There doesn't appear to be any gain after 20 repetitions You might consider that the bottleneck might be in 'ls' or something outside of zfs. Make sure that you are doing 'ls -f' or else you are just measuring its sorting performance. 
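The point about separating the directory read from the sort can be illustrated outside of ls. This is a rough sketch only, assuming a small temporary directory (2000 files, a hypothetical size) and the hypothetical helper time_listing; it times the read and the sort but not the per-file work that an aliased `ls -F` may also do.

```python
import os
import tempfile
import time


def time_listing(path, sort):
    """Read one directory and optionally sort the names; return (count, seconds)."""
    start = time.perf_counter()
    names = os.listdir(path)  # unsorted, similar to `ls -f` but without . and ..
    if sort:
        names.sort()  # the extra work a plain `ls` does before printing
    return len(names), time.perf_counter() - start


with tempfile.TemporaryDirectory() as d:
    # Populate the directory with empty files.
    for i in range(2000):
        open(os.path.join(d, f"f{i:06d}"), "w").close()
    n_unsorted, t_unsorted = time_listing(d, sort=False)
    n_sorted, t_sorted = time_listing(d, sort=True)

print(f"unsorted: {n_unsorted} names in {t_unsorted:.6f}s")
print(f"sorted:   {n_sorted} names in {t_sorted:.6f}s")
```

On a directory this small the difference is likely to be negligible; the sketch only shows how to measure the two steps separately before blaming the file system.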
On a Solaris 10 system in a zfs directory with a million files:

% time ls -f |wc -l
1000002
/bin/ls -F -f 0.76s user 0.93s system 89% cpu 1.897 total
wc -l 0.08s user 0.02s system 5% cpu 1.697 total
% time ls |wc -l
1000000
/bin/ls -F 4.32s user 8.10s system 97% cpu 12.682 total
wc -l 0.08s user 0.02s system 0% cpu 12.432 total

Bob
--
Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG Fri Feb 22 08:38:39 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 27471153 for ; Fri, 22 Feb 2013 08:38:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id AB456F01 for ; Fri, 22 Feb 2013 08:38:38 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r1M8cV8b062737; Fri, 22 Feb 2013 10:38:31 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.7.4 kib.kiev.ua r1M8cV8b062737 Received: (from kostik@localhost) by tom.home (8.14.6/8.14.6/Submit) id r1M8cVrg062736; Fri, 22 Feb 2013 10:38:31 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 22 Feb 2013 10:38:31 +0200 From: Konstantin Belousov To: Bruce Evans Subject: Re: cleaning files beyond EOF Message-ID: <20130222083831.GK2598@kib.kiev.ua> References: <20130217113031.N9271@besplex.bde.org> <20130217055528.GB2522@kib.kiev.ua> <20130217172928.C1900@besplex.bde.org> <20130217074832.GA2598@kib.kiev.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="qr7nXUVd9Lj/wfVJ" Content-Disposition: inline In-Reply-To: 
<20130217074832.GA2598@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2013 08:38:39 -0000 --qr7nXUVd9Lj/wfVJ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sun, Feb 17, 2013 at 09:48:32AM +0200, Konstantin Belousov wrote:
> But the ffs_getpages() might be indeed the culprit. It calls
> vm_page_zero_invalid(), which only has DEV_BSIZE granularity. I think
> that ffs_getpages() also should zero the after eof part of the last page
> of the file to fix your damage, since device read cannot read less than
> DEV_BSIZE.
>

Here is the updated patch, which fixes the bug that miscalculated the size passed to pmap_zero_page_area().
diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
index 5c99d5b..08508a4 100644
--- a/sys/ufs/ffs/ffs_vnops.c
+++ b/sys/ufs/ffs/ffs_vnops.c
@@ -829,9 +829,9 @@ static int
 ffs_getpages(ap)
 	struct vop_getpages_args *ap;
 {
-	int i;
 	vm_page_t mreq;
-	int pcount;
+	uint64_t size;
+	int i, isize, pcount;
 
 	pcount = round_page(ap->a_count) / PAGE_SIZE;
 	mreq = ap->a_m[ap->a_reqpage];
@@ -846,6 +846,11 @@ ffs_getpages(ap)
 	if (mreq->valid) {
 		if (mreq->valid != VM_PAGE_BITS_ALL)
 			vm_page_zero_invalid(mreq, TRUE);
+		size = VTOI(ap->a_vp)->i_size;
+		if (mreq->pindex == OFF_TO_IDX(size)) {
+			isize = size & PAGE_MASK;
+			pmap_zero_page_area(mreq, isize, PAGE_SIZE - isize);
+		}
 		for (i = 0; i < pcount; i++) {
 			if (i != ap->a_reqpage) {
 				vm_page_lock(ap->a_m[i]);

--qr7nXUVd9Lj/wfVJ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJRJy6GAAoJEJDCuSvBvK1BniAP/0OEx4cd92cKW7Q7yEvco7tZ 2TIgSAHHh9WHH2z1R3dWhhL0PleHd45JLEja5dVJ+NvTqcN8yrDGPwocHYIMSDaY 1ZsdQ47WI/fGar7z50j3CjG6lmLf3vlQunrY6sDPK4CNYgVzL/Zgvl8Mh+3kBNwF OWyR9sXFdaAZlB3vhStpNmAR95HrQgwyop6BlYOwgKvl3y7Lk9w5vwNbOdJiA37t aZm8ehSV/DMCFmot4N/Bo5iqRuX6Af7Jz4XsOuZ6IylAY29wAgbgzCGJ+ZkMFKYk SqhNs5UfPpTawY6YPRCeUchqXh+uFZoGSIBheNx061jUvMeMnf2DvoQSdHPjh93t +bXHvfXC95d0orAf+y3TsUamL/iMx8k3HnlKf4QYP7j2hDiRqIxtAV8+ueukkwlW WVenDp2fIe9MH+EMkgetOjjZlopKNU5sfaeJDEaDo5ybFKm6EZad9YEapqGHSpeI TgWtgOX3ETkc4Cn+U+xtBoUNEQv1YQD6TQ7gfMHsI7Y1rYyplX0PgZr0Sj7cRvJc vKYtfWNTqiwlun02a63FEChiMdOwmZVan5XoaScjV4tcORef3+173R4HE41JRcPH b8HNFWGdWTyp/9Jd7iw3FeaqJ8ojcsDQXSo+tbDGyIr5Q6VAm1abE1UfEVNl8Wyr oQnnJR6DU8tNeEi59Urx =q/e/ -----END PGP SIGNATURE----- --qr7nXUVd9Lj/wfVJ--

From owner-freebsd-fs@FreeBSD.ORG Fri Feb 22 18:43:51 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0F0E54BD for ; Fri, 22 Feb 2013 18:43:51 +0000 (UTC) (envelope-from 
momchil@xaxo.eu) Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66]) by mx1.freebsd.org (Postfix) with ESMTP id 79D641FB for ; Fri, 22 Feb 2013 18:43:49 +0000 (UTC) Received: from t61.xaxo.eu ([10.75.23.6]) by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1MIhlvj001957; Fri, 22 Feb 2013 19:43:48 +0100 (CET) (envelope-from momchil@xaxo.eu) Date: Fri, 22 Feb 2013 19:43:39 +0100 Message-ID: <86txp4gpes.wl%momchil@xaxo.eu> From: Momchil Ivanov To: Rick Macklem Subject: Re: NFS + Kerberos In-Reply-To: <1845485841.3202259.1361501159585.JavaMail.root@erie.cs.uoguelph.ca> References: <86ip5lkvnm.wl%momchil@xaxo.eu> <1845485841.3202259.1361501159585.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: freebsd-fs@freebsd.org, Momchil Ivanov X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Feb 2013 18:43:51 -0000 At Thu, 21 Feb 2013 21:45:59 -0500 (EST), Rick Macklem wrote: > > Momchil Ivanov wrote: > > At Thu, 21 Feb 2013 18:17:56 -0500 (EST), > > Rick Macklem wrote: > > > Error 10016 is NFS4ERR_WRONGSEC. This means that the server expects > > > a > > > different security flavour (sys maybe) at some point in the mount. > > > > btw you have a typo, it's NFSERR_WRONGSEC. > Actually, it's called NFS4ERR_WRONGSEC in the RFC and NFSERR_WRONGSEC in > the NFS sources, just to try and confuse you;-) ok :) > Just as an experiment, you could try adding "sys" to the -sec list > for both lines. If the mount works then, it would tell you that the > client isn't successfully getting a Kerberos credential and is > falling back to using "sys" (called AUTH_SYS in the RFCs, just for > further confusion;-). 
I can mount with the following /etc/exports file: V4: /tank/storage -sec=sys:krb5i:krb5p /tank/storage -sec=sys:krb5i:krb5p and the command: mount -t nfs -o nfsv4,sec=sys srv.example.local:/ /mnt/srv and without a kerberos ticket I can also mount with: mount -t nfs -o nfsv4,sec=krb5i srv.example.local:/ /mnt/srv mount -t nfs -o nfsv4,sec=krb5p srv.example.local:/ /mnt/srv so it falls back to sys... ... > Check to see what the user's credential cache file is called. > If you "ls -l /tmp" you should be able to find it. > > If it isn't called /tmp/krb5cc_, where is the uid for > the user, then you will need the recent patch applied to the gssd.c > that adds a "-s" option to search for the credential cache file in a list of > directories. This patch is in head as r244604 and stable/9 as > r245089, but not in any release. (Some sshds generate separate > credential cache files for each login session, although not the > default one in the system, as far as I understand.) on the client machine with FreeBSD 8.2-STABLE as of around Dec 2011, the file exists and is /tmp/krb5cc_1001, where 1001 is the uid of the user that I am using to mount the nfs file system. I have also tried to mount the file system from the server (FreeBSD 9.1) on the server itself using the same commands, I do get the nfs/srv.example.local@EXAMPLE.LOCAL ticket, but it dies with the same error: nfsv4 err=10016 mount_nfs: /mnt/srv, : Input/output error is there some way I can get verbose output from nfsd or gssd that tells me why it is failing, or do you have any other ideas :) ? 
Thank you, Momchil From owner-freebsd-fs@FreeBSD.ORG Sat Feb 23 00:04:24 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E2BE4500 for ; Sat, 23 Feb 2013 00:04:24 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id ADAFD2FF for ; Sat, 23 Feb 2013 00:04:24 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAOUFKFGDaFvO/2dsb2JhbABEhk64FYJagSJzgh8BAQQBI1YFFhgCAg0FARMCWQaIHwase5IggSOMMASBAzQHEgGCGoETA4hojVKQY4MlgUwBBxce X-IronPort-AV: E=Sophos;i="4.84,719,1355115600"; d="scan'208";a="15393134" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 22 Feb 2013 19:04:23 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 352CBB41DC; Fri, 22 Feb 2013 19:04:23 -0500 (EST) Date: Fri, 22 Feb 2013 19:04:23 -0500 (EST) From: Rick Macklem To: Momchil Ivanov Message-ID: <1103491143.3229700.1361577863159.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <86txp4gpes.wl%momchil@xaxo.eu> Subject: Re: NFS + Kerberos MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Elias Martenson X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2013 00:04:24 -0000 Momchil Ivanov wrote: > At Thu, 21 Feb 2013 21:45:59 -0500 (EST), > Rick Macklem wrote: > > > > Momchil Ivanov wrote: > > > At Thu, 21 Feb 2013 18:17:56 -0500 (EST), > > > Rick Macklem wrote: > > > > Error 
10016 is NFS4ERR_WRONGSEC. This means that the server > > > > expects > > > > a > > > > different security flavour (sys maybe) at some point in the > > > > mount. > > > > > > btw you have a typo, it's NFSERR_WRONGSEC. > > Actually, it's called NFS4ERR_WRONGSEC in the RFC and > > NFSERR_WRONGSEC in > > the NFS sources, just to try and confuse you;-) > > ok :) > > > Just as an experiment, you could try adding "sys" to the -sec list > > for both lines. If the mount works then, it would tell you that the > > client isn't successfully getting a Kerberos credential and is > > falling back to using "sys" (called AUTH_SYS in the RFCs, just for > > further confusion;-). > > I can mount with the following /etc/exports file: > > V4: /tank/storage -sec=sys:krb5i:krb5p > /tank/storage -sec=sys:krb5i:krb5p > > and the command: > > mount -t nfs -o nfsv4,sec=sys srv.example.local:/ /mnt/srv > > and without a kerberos ticket I can also mount with: > > mount -t nfs -o nfsv4,sec=krb5i srv.example.local:/ /mnt/srv > mount -t nfs -o nfsv4,sec=krb5p srv.example.local:/ /mnt/srv > > so it falls back to sys... > > ... > > > Check to see what the user's credential cache file is called. > > If you "ls -l /tmp" you should be able to find it. > > > > If it isn't called /tmp/krb5cc_, where is the uid for > > the user, then you will need the recent patch applied to the gssd.c > > that adds a "-s" option to search for the credential cache file in a > > list of > > directories. This patch is in head as r244604 and stable/9 as > > r245089, but not in any release. (Some sshds generate separate > > credential cache files for each login session, although not the > > default one in the system, as far as I understand.) > > on the client machine with FreeBSD 8.2-STABLE as of around Dec 2011, > the file exists and is /tmp/krb5cc_1001, where 1001 is the uid of the > user that I am using to mount the nfs file system. > Ok, so you don't need the "-s" option for the gssd. 
> I have also tried to mount the file system from the server (FreeBSD > 9.1) on the server itself using the same commands, I do get the > nfs/srv.example.local@EXAMPLE.LOCAL ticket, but it dies with the same > error: > > nfsv4 err=10016 > mount_nfs: /mnt/srv, : Input/output error > > is there some way I can get verbose output from nfsd or gssd that > tells me why it is failing, or do you have any other ideas :) ? > You can run "gssd -d -d" and it will run in foreground and print out messages related to resource allocation. This isn't much use, except to tell you that it is doing something. (Adding a "verbose" option is on my "to do" list, but I don't have any code at this time. If someone wants to do this, I think it would be great.) If you do this, don't have it started at boot (gssd_enable="NO" in /etc/rc.conf) and then do the above command as root in a window before attempting the mount command. Beyond that, you could add printfs to gssd.c. The main client side function is gssd_init_sec_context(), which should get the Kerberos ticket for a user via their TGT. I've added Elias to the cc list, since he just went through this and might be able to help. 
rick > Thank you, > Momchil From owner-freebsd-fs@FreeBSD.ORG Sat Feb 23 03:57:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A65CB649 for ; Sat, 23 Feb 2013 03:57:10 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:32]) by mx1.freebsd.org (Postfix) with ESMTP id 6D85FA29 for ; Sat, 23 Feb 2013 03:57:10 +0000 (UTC) Received: from omta21.emeryville.ca.mail.comcast.net ([76.96.30.88]) by qmta03.emeryville.ca.mail.comcast.net with comcast id 3cG11l0021u4NiLA3fx9F3; Sat, 23 Feb 2013 03:57:09 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta21.emeryville.ca.mail.comcast.net with comcast id 3fx81l00c1t3BNj8hfx9RQ; Sat, 23 Feb 2013 03:57:09 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id D8FB673A31; Fri, 22 Feb 2013 19:57:08 -0800 (PST) Date: Fri, 22 Feb 2013 19:57:08 -0800 From: Jeremy Chadwick To: Alexander Motin Subject: Re: disk "flipped" - a known problem? 
Message-ID: <20130223035708.GA23614@icarus.home.lan> References: <20130121221617.GA23909@icarus.home.lan> <50FED818.7070704@FreeBSD.org> <20130125083619.GA51096@icarus.home.lan> <20130125211232.GA3037@icarus.home.lan> <20130125212559.GA1772@icarus.home.lan> <20130125213209.GA1858@icarus.home.lan> <20130126011754.GA1806@icarus.home.lan> <51267055.3040500@FreeBSD.org> <20130221233609.GA92249@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130221233609.GA92249@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1361591829; bh=CXCEcgFQ1KOgXnPMZpxlAAIvGfe8VMQgvzd8wdtnrXU=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=MDvDZxv6U98jkGZVE5C70i74I4EH5tS9nUwGmzmYDv00uJ9vqcVpgkPon5H/BbF/X QwvLPL1anb6N6WrJGPi05CYRNMmEQCg8Glij3qmQBMGOIvAGDTLTqFszh7t29uXT47 tzP9u0Ry3DOxvNuS5A3ejtk3sWLrKBkK2Ng/va7G+xTzGOkX8Y0leR6CJMY4UtdM4x UjC24+YBglX+cBkmBCWkq7CXWFzY29OTAm4zvub3eLEU9pI9ZuBhtnnFKfbagFNE6b TSIucXBNNaTn9VzpXLUjz6cFYkxARczAJTOIR+7rQBNJnrHLL1C1g0w0kmueZd/acI c9bYAb/da5iRA== Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2013 03:57:10 -0000 On Thu, Feb 21, 2013 at 03:36:09PM -0800, Jeremy Chadwick wrote: > On Thu, Feb 21, 2013 at 09:07:01PM +0200, Alexander Motin wrote: > > On 26.01.2013 03:17, Jeremy Chadwick wrote: > > > Okay, I've figured out the exact, 100% reproducible condition that > > > causes the situation. It took me a lot of tries and a digital pocket > > > recorder to take verbal notes (there are just too many things to look at > > > simultaneously), but I've figured it out. > > > > > > I'm sorry for the verbosity, but it's necessary. 
> > > > > > Assume the disk we're talking about is /dev/ada5. > > > > > > 1. Prior to any issues, we have this: > > > > > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > > > crw-r----- 1 root operator 0x8c Jan 25 16:41 /dev/ada5 > > > crw------- 1 root operator 0x75 Jan 25 16:35 /dev/pass5 > > > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > > > > > 2. ada5 begins experiencing issues -- ATA commands (CDBs) submit do not > > > get a response (not going to discuss how/why that can happen). > > > > > > 3. These types of messages are seen on console (naturally the CDB and > > > request type will vary -- in this case it was because I was doing the dd > > > zero'ing, thus tickling the bad sector/naughty firmware on the drive): > > > > > > Jan 25 16:29:28 icarus kernel: ahcich5: Timeout on slot 0 port 0 > > > Jan 25 16:29:28 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017 > > > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset... > > > Jan 25 16:29:28 icarus kernel: ahcich5: SATA connect time=1000us status=00000113 > > > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset: device found > > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 80 77 01 40 00 00 00 00 00 00 > > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout > > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command > > > > > > 4. Any I/O submit to ada5 during this time blocks (this is normal). > > > > > > 5. **While this situation is happening**, something using xpt(4) > > > attempts to submit a CDB to the disk (ex. smartctl -a /dev/ada5). > > > This request also blocks (again, normal). > > > > > > 6. Physical device falls off bus, or CAM kicks the disk off the bus. > > > Doesn't matter which. 
We see messages resembling this (boy am I tired > > > of this interspersed output problem): > > > > > > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): lost device > > > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): lost device > > > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry > > > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): passdevgonecb: devfs entry is gone > > > > > > 7. Standard I/O requests fail with errno=6 "Device not configured". > > > xpt(4) requests also fail with the same errno. > > > > > > 8. Device-wise, at this stage all we have is: > > > > > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > > > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > > > > > 9. Device comes back online for whatever reason. FreeBSD sees the disk, > > > blah blah blah: > > > > > > Jan 25 16:30:16 icarus kernel: GEOM: new disk ada5 > > > Jan 25 16:30:16 icarus kernel: ada5: ATA-7 SATA 1.x device > > > Jan 25 16:30:16 icarus kernel: ada5: Serial Number WD-WMAP41573589 > > > Jan 25 16:30:16 icarus kernel: ada5: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes) > > > Jan 25 16:30:16 icarus kernel: ada5: Command Queueing enabled > > > Jan 25 16:30:16 icarus kernel: ada5: 143089MB (293046768 512 byte sectors: 16H 63S/T 16383C) > > > Jan 25 16:30:16 icarus kernel: ada5: Previously was known as ad14 > > > > > > ...um, where's pass5? > > > > > > 10. /dev/pass5 is now completely (permanently) missing: > > > > > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > > > crw-r----- 1 root operator 0x99 Jan 25 16:42 /dev/ada5 > > > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > > > > > 11. Any further attempts to communicate via xpt(4) with ada5 fail. > > > Detaching and reattaching the disk does not fix the issue; the only fix > > > is to reboot the system. > > > > > > 12. "camcontrol debug -IPXp scbus5" results in tons and tons of output > > > all pertaining to xpt(4). 
It looks like xpt(4) is in some kind of > > > loop. > > > > > > Below is my verbose boot (with non-kernel things removed), which > > > also includes "camcontrol debug" output once things are in a bad state: > > > > > > http://jdc.koitsu.org/freebsd/xpt_oddity.log > > > > > > In this log you'll see that after 1 CAM timeout I yanked the drive, then > > > roughly 30 seconds later reinserted it. > > > > > > If you need me to turn on CAM debugging *prior* to the above, I can do > > > that, just let me know. > > > > > > The important step is #5. Without that, the problem shown in #9/10/11 > > > does not happen. > > > > > > It's a good thing I don't run smartd(8) -- most users I see using that > > > software set the interval to something like 180s or 60s. Imagine this > > > frustration: "okay so the disk fell off the bus, but what, now I can't > > > talk to it with SMART? Uhhh... Err, works now? Whatever". > > > > I think, the problem may already be fixed in HEAD by r244014 by ken@. > > I've just merged it to 9-STABLE at r247115. So if it is still possible > > to reproduce the situation, it would be good to try. > > Yep, I saw the commit per svn-src-stable-9@freebsd.org, along with > a bunch of others; I wasn't sure if r247114 or r247115 fixed it, so was > waiting for a follow-up from you. :-) > > I'll rebuild world/kernel and try it out + report back. Thank you (and > ken@ too!) for the work on this. Got around to this today -- I can confirm as of r247132 on stable/9 the above problem is gone.
Verification details below, for those who care: Initial attachment: Feb 22 19:40:32 icarus kernel: ada5 at ahcich5 bus 0 scbus5 target 0 lun 0 Feb 22 19:40:32 icarus kernel: ada5: ATA-8 SATA 2.x device Feb 22 19:40:32 icarus kernel: ada5: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes) Feb 22 19:40:32 icarus kernel: ada5: Command Queueing enabled Feb 22 19:40:32 icarus kernel: ada5: 715404MB (1465149168 512 byte sectors: 16H 63S/T 16383C) Feb 22 19:40:32 icarus kernel: ada5: Previously was known as ad14 Ran dd if=/dev/zero of=/dev/ada5 bs=64k. Timeouts occurring due to physical issues with the disk itself: Feb 22 19:44:01 icarus kernel: ahcich5: Timeout on slot 0 port 0 Feb 22 19:44:01 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017 Feb 22 19:44:01 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 00 0c 00 40 00 00 00 00 00 00 Feb 22 19:44:01 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout Feb 22 19:44:01 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command Feb 22 19:44:33 icarus kernel: ahcich5: Timeout on slot 0 port 0 Feb 22 19:44:33 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017 Feb 22 19:44:33 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 00 18 00 40 00 00 00 00 00 00 Feb 22 19:44:33 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout Feb 22 19:44:33 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command Initiated smartctl -a /dev/ada5, which blocked as expected. Timeouts still happening, and to speed up the process I yanked the disk: Feb 22 19:45:32 icarus kernel: ahcich5: Timeout on slot 0 port 0 Feb 22 19:45:32 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017 Feb 22 19:45:32 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. 
ACB: 61 80 80 25 00 40 00 00 00 00 00 00 Feb 22 19:45:32 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout Feb 22 19:45:32 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command Feb 22 19:45:55 icarus kernel: (ada5:ahcich5:0:0:0): lost device Feb 22 19:45:55 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry After yanking: root@icarus:~ # smartctl -a /dev/ada5 smartctl 6.0 2012-10-10 r3643 [FreeBSD 9.1-STABLE amd64] (local build) Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org (pass5:ahcich5:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 01 00 (pass5:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request smartctl: cam_send_ccb: Device not configured Checking devices: root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* crw------- 1 root operator 0x52 Feb 21 21:37 /dev/xpt0 Reinserted disk: Feb 22 19:47:53 icarus kernel: ada5 at ahcich5 bus 0 scbus5 target 0 lun 0 Feb 22 19:47:53 icarus kernel: ada5: ATA-8 SATA 2.x device Feb 22 19:47:53 icarus kernel: ada5: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes) Feb 22 19:47:53 icarus kernel: ada5: Command Queueing enabled Feb 22 19:47:53 icarus kernel: ada5: 715404MB (1465149168 512 byte sectors: 16H 63S/T 16383C) Feb 22 19:47:53 icarus kernel: ada5: Previously was known as ad14 Devices look good: root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* crw-r----- 1 root operator 0x98 Feb 22 19:47 /dev/ada5 crw------- 1 root operator 0x96 Feb 22 19:47 /dev/pass5 crw------- 1 root operator 0x52 Feb 21 21:37 /dev/xpt0 And smartctl works fine. :-) (Footnote for readers: The previous WD disk I was testing with went belly up in an even worse way so I couldn't use it, but thankfully I've lots of bad disks that exhibit repeated timeouts during I/O. *pats his angry ST3750630AS*) -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sat Feb 23 09:39:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 47488FCF; Sat, 23 Feb 2013 09:39:44 +0000 (UTC) (envelope-from utisoft@gmail.com) Received: from mail-ia0-x232.google.com (mail-ia0-x232.google.com [IPv6:2607:f8b0:4001:c02::232]) by mx1.freebsd.org (Postfix) with ESMTP id 07F25322; Sat, 23 Feb 2013 09:39:43 +0000 (UTC) Received: by mail-ia0-f178.google.com with SMTP id y26so1218457iab.37 for ; Sat, 23 Feb 2013 01:39:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=nTYqMs4HMt4lkQehenOajxB6rIn7pOhOmw3BPqClYrA=; b=tgjUtVTtMg5SmOu3V76xx38zFVeTx/EuGxBYZbet/FuxTl06pgY0PS4FqG6eZy9OFm P8HAyl8uxQODJoUoZA+P6SN+DecezZia2fu93TrmFGOxKIr2ZaU4p6KVugL4/U0y8Y3I HT4XkyPu6X997C+9KuFQxl6mErck9bdnpX+Nn/8jfT+Y/sRSfl+VYOxfgIPlUrwlLgOf QKXJMRTyh60dtmnsx/f35oVOEptvyRmkK2/0pZA76x/K95vBH13bIUI0ORsqdiYPjeRw /GU4jGarLxoX9O9cnv6cNJkJzu7Bh0fOMZI3/kVf12M4FlXi+CvYYzKVSGx7jAc5Huwa T1xg== MIME-Version: 1.0 X-Received: by 10.50.152.169 with SMTP id uz9mr677475igb.15.1361612383645; Sat, 23 Feb 2013 01:39:43 -0800 (PST) Received: by 10.64.63.12 with HTTP; Sat, 23 Feb 2013 01:39:43 -0800 (PST) Received: by 10.64.63.12 with HTTP; Sat, 23 Feb 2013 01:39:43 -0800 (PST) In-Reply-To: <51267055.3040500@FreeBSD.org> References: <20130121221617.GA23909@icarus.home.lan> <50FED818.7070704@FreeBSD.org> <20130125083619.GA51096@icarus.home.lan> <20130125211232.GA3037@icarus.home.lan> <20130125212559.GA1772@icarus.home.lan> <20130125213209.GA1858@icarus.home.lan> <20130126011754.GA1806@icarus.home.lan> <51267055.3040500@FreeBSD.org> Date: Sat, 23 Feb 2013 09:39:43 +0000 Message-ID: Subject: Re: disk "flipped" - a known problem? 
From: Chris Rees To: mav@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Jeremy Chadwick , freebsd-fs@freebsd.org, Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2013 09:39:44 -0000 On 21 Feb 2013 19:07, "Alexander Motin" wrote: > > On 26.01.2013 03:17, Jeremy Chadwick wrote: > > Okay, I've figured out the exact, 100% reproducible condition that > > causes the situation. It took me a lot of tries and a digital pocket > > recorder to take verbal notes (there are just too many things to look at > > simultaneously), but I've figured it out. > > > > I'm sorry for the verbosity, but it's necessary. > > > > Assume the disk we're talking about is /dev/ada5. > > > > 1. Prior to any issues, we have this: > > > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > > crw-r----- 1 root operator 0x8c Jan 25 16:41 /dev/ada5 > > crw------- 1 root operator 0x75 Jan 25 16:35 /dev/pass5 > > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > > > 2. ada5 begins experiencing issues -- ATA commands (CDBs) submit do not > > get a response (not going to discuss how/why that can happen). > > > > 3. These types of messages are seen on console (naturally the CDB and > > request type will vary -- in this case it was because I was doing the dd > > zero'ing, thus tickling the bad sector/naughty firmware on the drive): > > > > Jan 25 16:29:28 icarus kernel: ahcich5: Timeout on slot 0 port 0 > > Jan 25 16:29:28 icarus kernel: ahcich5: is 00000000 cs 00000000 ss 00000001 rs 00000001 tfd 40 serr 00000000 cmd 0004c017 > > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset... 
> > Jan 25 16:29:28 icarus kernel: ahcich5: SATA connect time=1000us status=00000113 > > Jan 25 16:29:28 icarus kernel: ahcich5: AHCI reset: device found > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 80 80 77 01 40 00 00 00 00 00 00 > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout > > Jan 25 16:29:28 icarus kernel: (ada5:ahcich5:0:0:0): Retrying command > > > > 4. Any I/O submit to ada5 during this time blocks (this is normal). > > > > 5. **While this situation is happening**, something using xpt(4) > > attempts to submit a CDB to the disk (ex. smartctl -a /dev/ada5). > > This request also blocks (again, normal). > > > > 6. Physical device falls off bus, or CAM kicks the disk off the bus. > > Doesn't matter which. We see messages resembling this (boy am I tired > > of this interspersed output problem): > > > > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): lost device > > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): lost device > > Jan 25 16:29:32 icarus kernel: (ada5:ahcich5:0:0:0): removing device entry > > Jan 25 16:29:32 icarus kernel: (pass5:ahcich5:0:0:0): passdevgonecb: devfs entry is gone > > > > 7. Standard I/O requests fail with errno=6 "Device not configured". > > xpt(4) requests also fail with the same errno. > > > > 8. Device-wise, at this stage all we have is: > > > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > > > 9. Device comes back online for whatever reason. 
FreeBSD sees the disk, > > blah blah blah: > > > > Jan 25 16:30:16 icarus kernel: GEOM: new disk ada5 > > Jan 25 16:30:16 icarus kernel: ada5: ATA-7 SATA 1.x device > > Jan 25 16:30:16 icarus kernel: ada5: Serial Number WD-WMAP41573589 > > Jan 25 16:30:16 icarus kernel: ada5: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes) > > Jan 25 16:30:16 icarus kernel: ada5: Command Queueing enabled > > Jan 25 16:30:16 icarus kernel: ada5: 143089MB (293046768 512 byte sectors: 16H 63S/T 16383C) > > Jan 25 16:30:16 icarus kernel: ada5: Previously was known as ad14 > > > > ...um, where's pass5? > > > > 10. /dev/pass5 is now completely (permanently) missing: > > > > root@icarus:~ # ls -l /dev/ada5* /dev/xpt* /dev/pass5* > > crw-r----- 1 root operator 0x99 Jan 25 16:42 /dev/ada5 > > crw------- 1 root operator 0x51 Jan 25 16:35 /dev/xpt0 > > > > 11. Any further attempts to communicate via xpt(4) with ada5 fail. > > Detaching and reattaching the disk does not fix the issue; the only fix > > is to reboot the system. > > > > 12. "camcontrol debug -IPXp scbus5" results in tons and tons of output > > all pertaining to xpt(4). It looks like xpt(4) is in some kind of > > loop. > > > > Below is my verbose boot (with non-kernel things removed), which > > also includes "camcontrol debug" output once things are in a bad state: > > > > http://jdc.koitsu.org/freebsd/xpt_oddity.log > > > > In this log you'll see that after 1 CAM timeout I yanked the drive, then > > roughly 30 seconds later reinserted it. > > > > If you need me to turn on CAM debugging *prior* to the above, I can do > > that, just let me know. > > > > The important step is #5. Without that, the problem shown in #9/10/11 > > does not happen. > > > > It's a good thing I don't run smartd(8) -- most users I see using that > > software set the interval to something like 180s or 60s. Imagine this > > frustration: "okay so the disk fell off the bus, but what, now I can't > > talk to it with SMART? Uhhh... Err, works now? 
Whatever". > > I think, the problem may already be fixed in HEAD by r244014 by ken@. > I've just merged it to 9-STABLE at r247115. So if it is still possible > to reproduce the situation, it would be good to try. I think I've been having the same troubles since upgrading from 9.0, so I'm going to try applying that to 9.1-R and I'll also give feedback. Chris From owner-freebsd-fs@FreeBSD.ORG Sat Feb 23 12:00:20 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BC43F6EE for ; Sat, 23 Feb 2013 12:00:20 +0000 (UTC) (envelope-from momchil@xaxo.eu) Received: from vps2.xaxo.eu (vps2.xaxo.eu [78.47.156.66]) by mx1.freebsd.org (Postfix) with ESMTP id 3970BA84 for ; Sat, 23 Feb 2013 12:00:19 +0000 (UTC) Received: from t61.xaxo.eu ([10.75.23.6]) by vps2.xaxo.eu (8.14.4/8.14.4) with ESMTP id r1NC0CaJ017602; Sat, 23 Feb 2013 13:00:12 +0100 (CET) (envelope-from momchil@xaxo.eu) Date: Sat, 23 Feb 2013 13:00:03 +0100 Message-ID: <86hal3kzp8.wl%momchil@xaxo.eu> From: Momchil Ivanov To: Rick Macklem Subject: Re: NFS + Kerberos In-Reply-To: <1103491143.3229700.1361577863159.JavaMail.root@erie.cs.uoguelph.ca> References: <86txp4gpes.wl%momchil@xaxo.eu> <1103491143.3229700.1361577863159.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: freebsd-fs@freebsd.org, Elias Martenson , Momchil Ivanov X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2013 12:00:20 -0000 At Fri, 22 Feb 2013 19:04:23 -0500 (EST), Rick Macklem wrote: > You can run "gssd -d -d" and it will run in foreground and print > out messages related to resource allocation. This isn't much use, > except to tell you that it is doing something. 
(Adding a "verbose" > option is on my "to do" list, but I don't have any code at this time. > If someone wants to do this, I think it would be great.) > > If you do this, don't have it started at boot (gssd_enable="NO" in > /etc/rc.conf) and then do the above command as root in a window > before attempting the mount command. > > Beyond that, you could add printfs to gssd.c. The main client side > function is gssd_init_sec_context(), which should get the Kerberos > ticket for a user via their TGT. well, the server doesn't seem to start it at boot with gssd_enable="YES", I don't know why, but I cannot stop/restart nfsd until I manually start gssd :) the client starts it at boot, though note: I can ssh into the server even when gssd is not running, I don't know if this is expected. "gssd -d -d" prints things like this on the client and the server: 1 resources allocated 2 resources allocated 1 resources allocated 0 resources allocated 1 resources allocated 2 resources allocated 1 resources allocated 0 resources allocated 1 resources allocated 2 resources allocated 1 resources allocated 0 resources allocated which doesn't tell me anything :) so here is what happens on the client without a kerberos ticket: 1 resources allocated /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > init_sec_context_args uid: 1001 cred: 0 ctx: 0 name: 5848115787646107649 req_flags: 5848115787646107650 > gss_resources i=0 gr_id :5848115787646107649 gr_res :0x28203060 /usr/src/usr.sbin/gssd/gssd.c:307 argp->name /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 0 resources allocated 1 resources allocated /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > init_sec_context_args uid: 1001 cred: 0 ctx: 0 name: 5848115787646107650 req_flags: 5848115787646107650 > gss_resources i=0 gr_id :5848115787646107650 gr_res :0x28203060 /usr/src/usr.sbin/gssd/gssd.c:307 argp->name /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 
/usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 0 resources allocated 1 resources allocated /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > init_sec_context_args uid: 1001 cred: 0 ctx: 0 name: 5848115787646107651 req_flags: 5848115787646107650 > gss_resources i=0 gr_id :5848115787646107651 gr_res :0x28203060 /usr/src/usr.sbin/gssd/gssd.c:307 argp->name /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 0 resources allocated here is what happens with a kerberos ticket: 1 resources allocated /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > init_sec_context_args uid: 1001 cred: 0 ctx: 0 name: 5848116041049178113 req_flags: 5848116041049178114 > gss_resources i=0 gr_id :5848116041049178113 gr_res :0x28203060 /usr/src/usr.sbin/gssd/gssd.c:307 argp->name /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 2 resources allocated /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED 1 resources allocated 0 resources allocated 1 resources allocated /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > init_sec_context_args uid: 1001 cred: 0 ctx: 0 name: 5848116041049178115 req_flags: 5848116041049178114 > gss_resources i=0 gr_id :5848116041049178115 gr_res :0x28203060 /usr/src/usr.sbin/gssd/gssd.c:307 argp->name /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 2 resources allocated /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED 1 resources allocated 0 resources allocated 1 resources allocated /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > init_sec_context_args uid: 1001 cred: 0 ctx: 0 name: 5848116041049178117 req_flags: 5848116041049178114 > gss_resources i=0 gr_id :5848116041049178117 gr_res :0x28203060 /usr/src/usr.sbin/gssd/gssd.c:307 argp->name /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 2 resources allocated 
/usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED 1 resources allocated 0 resources allocated here is what I have changed: --- gssd.c.orig 2013-02-23 11:13:20.000000000 +0100 +++ gssd.c 2013-02-23 12:34:33.000000000 +0100 @@ -238,6 +238,33 @@ return (TRUE); } +static void +dump_resources(FILE *s) +{ + struct gss_resource *gr; + int i; + + fprintf(s, "> gss_resources\n"); + + i = 0; + LIST_FOREACH(gr, &gss_resources, gr_link) { + fprintf(s, "i=%d\n", i); + fprintf(s, "gr_id :%llu\n", gr->gr_id); + fprintf(s, "gr_res :%p\n", gr->gr_res); + } +} + +void +dump_init_sec_context_args(FILE *s, init_sec_context_args *p) +{ + fprintf(s, "> init_sec_context_args\n"); + fprintf(s, "uid: %d\n", p->uid); + fprintf(s, "cred: %llu\n", p->cred); + fprintf(s, "ctx: %llu\n", p->ctx); + fprintf(s, "name: %llu\n", p->name); + fprintf(s, "req_flags: %llu\n", p->req_flags); +} + bool_t gssd_init_sec_context_1_svc(init_sec_context_args *argp, init_sec_context_res *result, struct svc_req *rqstp) { @@ -248,27 +275,42 @@ snprintf(ccname, sizeof(ccname), "FILE:/tmp/krb5cc_%d", (int) argp->uid); + + printf("%s:%d %s\n", __FILE__, __LINE__, ccname); + dump_init_sec_context_args(stdout, argp); + dump_resources(stdout); + setenv("KRB5CCNAME", ccname, TRUE); memset(result, 0, sizeof(*result)); if (argp->cred) { + printf("%s:%d argp->cred\n", __FILE__, __LINE__); cred = gssd_find_resource(argp->cred); + printf("%s:%d cred=%llu\n", __FILE__, __LINE__, cred); if (!cred) { result->major_status = GSS_S_CREDENTIALS_EXPIRED; + printf("%s:%d GSS_S_CREDENTIALS_EXPIRED\n", __FILE__, __LINE__); return (TRUE); } } if (argp->ctx) { + printf("%s:%d argp->ctx\n", __FILE__, __LINE__); ctx = gssd_find_resource(argp->ctx); + printf("%s:%d ctx=%llu\n", __FILE__, __LINE__, ctx); if (!ctx) { result->major_status = GSS_S_CONTEXT_EXPIRED; + printf("%s:%d GSS_S_CONTEXT_EXPIRED\n", __FILE__, __LINE__); return (TRUE); } } if (argp->name) { + printf("%s:%d argp->name\n", __FILE__, __LINE__); name = 
gssd_find_resource(argp->name); + printf("%s:%d name=%llu\n", __FILE__, __LINE__, name); + printf("%s:%d name=%p\n", __FILE__, __LINE__, name); if (!name) { result->major_status = GSS_S_BAD_NAME; + printf("%s:%d GSS_S_BAD_NAME\n", __FILE__, __LINE__); return (TRUE); } } @@ -286,6 +328,11 @@ result->ctx = argp->ctx; else result->ctx = gssd_make_resource(ctx); + + if (result->major_status == GSS_S_COMPLETE) + printf("%s:%d GSS_S_COMPLETE\n", __FILE__, __LINE__); + else + printf("%s:%d GSS_S_CONTINUE_NEEDED\n", __FILE__, __LINE__); } return (TRUE); Ideas? Thank you, Momchil From owner-freebsd-fs@FreeBSD.ORG Sat Feb 23 15:35:48 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4715A221 for ; Sat, 23 Feb 2013 15:35:48 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id D792E1F7 for ; Sat, 23 Feb 2013 15:35:47 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEABPhKFGDaFvO/2dsb2JhbABFhk+4KYJagR9zgh8BAQQBIwRSBRYYAgINGQJZBoggBq0ZhBCNcIEjjDSBAzQHgi2BEwOIaY1UkGWDJYIJ X-IronPort-AV: E=Sophos;i="4.84,721,1355115600"; d="scan'208";a="17925705" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 23 Feb 2013 10:35:47 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 1288CB3EE4; Sat, 23 Feb 2013 10:35:47 -0500 (EST) Date: Sat, 23 Feb 2013 10:35:47 -0500 (EST) From: Rick Macklem To: Momchil Ivanov Message-ID: <508324799.3234256.1361633747016.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <86hal3kzp8.wl%momchil@xaxo.eu> Subject: Re: NFS + Kerberos MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: 
[172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Elias Martenson X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2013 15:35:48 -0000 Momchil Ivanov wrote: > At Fri, 22 Feb 2013 19:04:23 -0500 (EST), > Rick Macklem wrote: > > You can run "gssd -d -d" and it will run in foreground and print > > out messages related to resource allocation. This isn't much use, > > except to tell you that it is doing something. (Adding a "verbose" > > option is on my "to do" list, but I don't have any code at this > > time. > > If someone wants to do this, I think it would be great.) > > > > If you do this, don't have it started at boot (gssd_enable="NO" in > > /etc/rc.conf) and then do the above command as root in a window > > before attempting the mount command. > > > > Beyond that, you could add printfs to gssd.c. The main client side > > function is gssd_init_sec_context(), which should get the Kerberos > > ticket for a user via their TGT. > > well, the server doesn't seem to start it at boot with > gssd_enable="YES", I don't know why, but I cannot stop/restart nfsd > until I manually start gssd :) the client starts it at boot, though > > note: I can ssh into the server even when gssd is not running, I don't > know if this is expected. > Yes. The gssd only handles upcalls from the kernel and only NFS does those at this time. 
> "gssd -d -d" prints things like this on the client and the server: > > 1 resources allocated > 2 resources allocated > 1 resources allocated > 0 resources allocated > 1 resources allocated > 2 resources allocated > 1 resources allocated > 0 resources allocated > 1 resources allocated > 2 resources allocated > 1 resources allocated > 0 resources allocated > > which doesn't tell me anything :) so here is what happens on the > client without a kerberos ticket: > > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848115787646107649 > req_flags: 5848115787646107650 > > gss_resources > i=0 > gr_id :5848115787646107649 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 0 resources allocated > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848115787646107650 > req_flags: 5848115787646107650 > > gss_resources > i=0 > gr_id :5848115787646107650 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 0 resources allocated > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848115787646107651 > req_flags: 5848115787646107650 > > gss_resources > i=0 > gr_id :5848115787646107651 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 0 resources allocated > > here is what happens with a kerberos ticket: > > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 
> name: 5848116041049178113 > req_flags: 5848116041049178114 > > gss_resources > i=0 > gr_id :5848116041049178113 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 2 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED > 1 resources allocated > 0 resources allocated > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848116041049178115 > req_flags: 5848116041049178114 > > gss_resources > i=0 > gr_id :5848116041049178115 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 2 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED > 1 resources allocated > 0 resources allocated > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848116041049178117 > req_flags: 5848116041049178114 > > gss_resources > i=0 > gr_id :5848116041049178117 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 2 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED > 1 resources allocated > 0 resources allocated > The above looks reasonable. Once the gssd_init_sec_context() replies GSS_S_CONTINUE_NEEDED to the kernel with the token that has the session ticket in it, then... 
- The NFS client sends a Null RPC to the server with an authenticator
  of type RPCSEC_GSS with version: 1, RPCSEC_GSS_INIT and the token as
  data on the Null RPC (a Null RPC wouldn't normally have any data)
- the server processes this via an upcall to its gssd for
  gssd_accept_sec_context()
  - this would get a reply of GSS_S_COMPLETE I think (although there
    may be cases where a GSS_S_CONTINUE_NEEDED occurs and there is
    another cycle of Null RPC token passing)
- this would result in a reply to the Null RPC with an RPCSEC_GSS
  authenticator with roughly:
  - a credential handle (shorthand bits for the user principal)
  - GSS_S_COMPLETE status
  - it has been a while, so I can't remember for sure, but I think a
    successful reply includes a token that is passed up to the gssd
    via gssd_init_sec_context() and then it would return GSS_S_COMPLETE
    when the token is processed.

My guess is that it is the server that is replying with some failure
status GSS_S_xxx. At this point, you can either try and look at the
Null RPC in wireshark or add printfs to the server's gssd.c for
gssd_accept_sec_context() and gssd_acquire_cred().

Basically, the above looks ok, but the rest of the handshake that
generates a credential handle via the Kerberos session ticket hasn't
happened.
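To make the token exchange described above easier to follow, here is a
rough userland sketch of the control flow only. It is not gssd code: the
sketch_* names, the status enum and the boolean "token" are all made up
for illustration, and the real upcalls carry opaque GSS tokens rather
than a flag. It assumes the simple case of one Null RPC round trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; not the real GSS-API values. */
enum sketch_status { SK_COMPLETE, SK_CONTINUE_NEEDED, SK_FAILURE };

/* Stand-in for an opaque GSS token; only its presence is modelled. */
struct sketch_token { bool present; };

/* Stand-in for the client-side gssd_init_sec_context() upcall. */
static enum sketch_status
sketch_client_init(const struct sketch_token *in, struct sketch_token *out)
{
    assert(in != NULL && out != NULL);
    if (!in->present) {
        /* First call: emit a token carrying the session ticket. */
        out->present = true;
        return (SK_CONTINUE_NEEDED);
    }
    /* Later call: the server's reply token completes the context. */
    out->present = false;
    return (SK_COMPLETE);
}

/* Stand-in for the server-side gssd_accept_sec_context() upcall. */
static enum sketch_status
sketch_server_accept(const struct sketch_token *in, struct sketch_token *out,
    bool server_ok)
{
    assert(in != NULL && out != NULL);
    if (!in->present || !server_ok) {
        out->present = false;
        return (SK_FAILURE);    /* some GSS_S_xxx failure */
    }
    out->present = true;        /* reply token for the client */
    return (SK_COMPLETE);
}

/*
 * Drive the exchange. Returns the number of Null RPC round trips used,
 * or -1 if either side reports a failure.
 */
static int
sketch_handshake(bool server_ok)
{
    struct sketch_token to_client = { false };
    struct sketch_token to_server = { false };
    int round_trips = 0;

    for (;;) {
        enum sketch_status cs;
        enum sketch_status ss;

        cs = sketch_client_init(&to_client, &to_server);
        if (cs == SK_COMPLETE)
            return (round_trips);
        if (cs != SK_CONTINUE_NEEDED)
            return (-1);

        /* Null RPC with to_server as the RPCSEC_GSS_INIT data. */
        ss = sketch_server_accept(&to_server, &to_client, server_ok);
        round_trips++;
        if (ss == SK_FAILURE)
            return (-1);
    }
}
```

In this model, a client log that shows only GSS_S_CONTINUE_NEEDED and
never GSS_S_COMPLETE corresponds to sketch_handshake() returning -1,
i.e. the server side never produced a usable reply token.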
Good luck with it, rick > here is what I have changed: > > --- gssd.c.orig 2013-02-23 11:13:20.000000000 +0100 > +++ gssd.c 2013-02-23 12:34:33.000000000 +0100 > @@ -238,6 +238,33 @@ > return (TRUE); > } > > +static void > +dump_resources(FILE *s) > +{ > + struct gss_resource *gr; > + int i; > + > + fprintf(s, "> gss_resources\n"); > + > + i = 0; > + LIST_FOREACH(gr, &gss_resources, gr_link) { > + fprintf(s, "i=%d\n", i); > + fprintf(s, "gr_id :%llu\n", gr->gr_id); > + fprintf(s, "gr_res :%p\n", gr->gr_res); > + } > +} > + > +void > +dump_init_sec_context_args(FILE *s, init_sec_context_args *p) > +{ > + fprintf(s, "> init_sec_context_args\n"); > + fprintf(s, "uid: %d\n", p->uid); > + fprintf(s, "cred: %llu\n", p->cred); > + fprintf(s, "ctx: %llu\n", p->ctx); > + fprintf(s, "name: %llu\n", p->name); > + fprintf(s, "req_flags: %llu\n", p->req_flags); > +} > + > bool_t > gssd_init_sec_context_1_svc(init_sec_context_args *argp, > init_sec_context_res *result, struct svc_req *rqstp) > { > @@ -248,27 +275,42 @@ > > snprintf(ccname, sizeof(ccname), "FILE:/tmp/krb5cc_%d", > (int) argp->uid); > + > + printf("%s:%d %s\n", __FILE__, __LINE__, ccname); > + dump_init_sec_context_args(stdout, argp); > + dump_resources(stdout); > + > setenv("KRB5CCNAME", ccname, TRUE); > > memset(result, 0, sizeof(*result)); > if (argp->cred) { > + printf("%s:%d argp->cred\n", __FILE__, __LINE__); > cred = gssd_find_resource(argp->cred); > + printf("%s:%d cred=%llu\n", __FILE__, __LINE__, cred); > if (!cred) { > result->major_status = GSS_S_CREDENTIALS_EXPIRED; > + printf("%s:%d GSS_S_CREDENTIALS_EXPIRED\n", __FILE__, __LINE__); > return (TRUE); > } > } > if (argp->ctx) { > + printf("%s:%d argp->ctx\n", __FILE__, __LINE__); > ctx = gssd_find_resource(argp->ctx); > + printf("%s:%d ctx=%llu\n", __FILE__, __LINE__, ctx); > if (!ctx) { > result->major_status = GSS_S_CONTEXT_EXPIRED; > + printf("%s:%d GSS_S_CONTEXT_EXPIRED\n", __FILE__, __LINE__); > return (TRUE); > } > } > if (argp->name) { > + 
printf("%s:%d argp->name\n", __FILE__, __LINE__); > name = gssd_find_resource(argp->name); > + printf("%s:%d name=%llu\n", __FILE__, __LINE__, name); > + printf("%s:%d name=%p\n", __FILE__, __LINE__, name); > if (!name) { > result->major_status = GSS_S_BAD_NAME; > + printf("%s:%d GSS_S_BAD_NAME\n", __FILE__, __LINE__); > return (TRUE); > } > } > @@ -286,6 +328,11 @@ > result->ctx = argp->ctx; > else > result->ctx = gssd_make_resource(ctx); > + > + if (result->major_status == GSS_S_COMPLETE) > + printf("%s:%d GSS_S_COMPLETE\n", __FILE__, __LINE__); > + else > + printf("%s:%d GSS_S_CONTINUE_NEEDED\n", __FILE__, __LINE__); > } > > return (TRUE); > > Ideas? > > Thank you, > Momchil From owner-freebsd-fs@FreeBSD.ORG Sat Feb 23 15:57:16 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 94AB583D for ; Sat, 23 Feb 2013 15:57:16 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 49AE92A0 for ; Sat, 23 Feb 2013 15:57:15 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAP7kKFGDaFvO/2dsb2JhbABFhk+4KYJagR9zgh8BAQQBIwRSBRYYAgINGQJZBoggBq0hhBCNboEjjDSBAzQHgi2BEwOIaY1UkGWDJYIJ X-IronPort-AV: E=Sophos;i="4.84,721,1355115600"; d="scan'208";a="15447359" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 23 Feb 2013 10:57:09 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4B7E3B3F1B; Sat, 23 Feb 2013 10:57:09 -0500 (EST) Date: Sat, 23 Feb 2013 10:57:09 -0500 (EST) From: Rick Macklem To: Momchil Ivanov Message-ID: <448938470.3234495.1361635029255.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <86hal3kzp8.wl%momchil@xaxo.eu> Subject: Re: NFS + Kerberos MIME-Version: 1.0 
Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Elias Martenson X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2013 15:57:16 -0000 Momchil Ivanov wrote: > At Fri, 22 Feb 2013 19:04:23 -0500 (EST), > Rick Macklem wrote: > > You can run "gssd -d -d" and it will run in foreground and print > > out messages related to resource allocation. This isn't much use, > > except to tell you that it is doing something. (Adding a "verbose" > > option is on my "to do" list, but I don't have any code at this > > time. > > If someone wants to do this, I think it would be great.) > > > > If you do this, don't have it started at boot (gssd_enable="NO" in > > /etc/rc.conf) and then do the above command as root in a window > > before attempting the mount command. > > > > Beyond that, you could add printfs to gssd.c. The main client side > > function is gssd_init_sec_context(), which should get the Kerberos > > ticket for a user via their TGT. > > well, the server doesn't seem to start it at boot with > gssd_enable="YES", I don't know why, but I cannot stop/restart nfsd > until I manually start gssd :) the client starts it at boot, though > > note: I can ssh into the server even when gssd is not running, I don't > know if this is expected. 
> > "gssd -d -d" prints things like this on the client and the server: > > 1 resources allocated > 2 resources allocated > 1 resources allocated > 0 resources allocated > 1 resources allocated > 2 resources allocated > 1 resources allocated > 0 resources allocated > 1 resources allocated > 2 resources allocated > 1 resources allocated > 0 resources allocated > > which doesn't tell me anything :) so here is what happens on the > client without a kerberos ticket: > > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848115787646107649 > req_flags: 5848115787646107650 > > gss_resources > i=0 > gr_id :5848115787646107649 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 0 resources allocated > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848115787646107650 > req_flags: 5848115787646107650 > > gss_resources > i=0 > gr_id :5848115787646107650 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 0 resources allocated > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848115787646107651 > req_flags: 5848115787646107650 > > gss_resources > i=0 > gr_id :5848115787646107651 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 0 resources allocated > > here is what happens with a kerberos ticket: > > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 
0 > name: 5848116041049178113 > req_flags: 5848116041049178114 > > gss_resources > i=0 > gr_id :5848116041049178113 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 2 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED > 1 resources allocated > 0 resources allocated > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848116041049178115 > req_flags: 5848116041049178114 > > gss_resources > i=0 > gr_id :5848116041049178115 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 2 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED > 1 resources allocated > 0 resources allocated > 1 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:279 FILE:/tmp/krb5cc_1001 > > init_sec_context_args > uid: 1001 > cred: 0 > ctx: 0 > name: 5848116041049178117 > req_flags: 5848116041049178114 > > gss_resources > i=0 > gr_id :5848116041049178117 > gr_res :0x28203060 > /usr/src/usr.sbin/gssd/gssd.c:307 argp->name > /usr/src/usr.sbin/gssd/gssd.c:309 name=673198176 > /usr/src/usr.sbin/gssd/gssd.c:310 name=0x28203060 > 2 resources allocated > /usr/src/usr.sbin/gssd/gssd.c:335 GSS_S_CONTINUE_NEEDED > 1 resources allocated > 0 resources allocated > In the last post, I forgot to mention... RFC2203 describes what the stuff in the Null RPCs looks like and it isn't a particularly large or hard to read RFC, so you might want to take a look at it.
rick > here is what I have changed: > > --- gssd.c.orig 2013-02-23 11:13:20.000000000 +0100 > +++ gssd.c 2013-02-23 12:34:33.000000000 +0100 > @@ -238,6 +238,33 @@ > return (TRUE); > } > > +static void > +dump_resources(FILE *s) > +{ > + struct gss_resource *gr; > + int i; > + > + fprintf(s, "> gss_resources\n"); > + > + i = 0; > + LIST_FOREACH(gr, &gss_resources, gr_link) { > + fprintf(s, "i=%d\n", i); > + fprintf(s, "gr_id :%llu\n", gr->gr_id); > + fprintf(s, "gr_res :%p\n", gr->gr_res); > + } > +} > + > +void > +dump_init_sec_context_args(FILE *s, init_sec_context_args *p) > +{ > + fprintf(s, "> init_sec_context_args\n"); > + fprintf(s, "uid: %d\n", p->uid); > + fprintf(s, "cred: %llu\n", p->cred); > + fprintf(s, "ctx: %llu\n", p->ctx); > + fprintf(s, "name: %llu\n", p->name); > + fprintf(s, "req_flags: %llu\n", p->req_flags); > +} > + > bool_t > gssd_init_sec_context_1_svc(init_sec_context_args *argp, > init_sec_context_res *result, struct svc_req *rqstp) > { > @@ -248,27 +275,42 @@ > > snprintf(ccname, sizeof(ccname), "FILE:/tmp/krb5cc_%d", > (int) argp->uid); > + > + printf("%s:%d %s\n", __FILE__, __LINE__, ccname); > + dump_init_sec_context_args(stdout, argp); > + dump_resources(stdout); > + > setenv("KRB5CCNAME", ccname, TRUE); > > memset(result, 0, sizeof(*result)); > if (argp->cred) { > + printf("%s:%d argp->cred\n", __FILE__, __LINE__); > cred = gssd_find_resource(argp->cred); > + printf("%s:%d cred=%llu\n", __FILE__, __LINE__, cred); > if (!cred) { > result->major_status = GSS_S_CREDENTIALS_EXPIRED; > + printf("%s:%d GSS_S_CREDENTIALS_EXPIRED\n", __FILE__, __LINE__); > return (TRUE); > } > } > if (argp->ctx) { > + printf("%s:%d argp->ctx\n", __FILE__, __LINE__); > ctx = gssd_find_resource(argp->ctx); > + printf("%s:%d ctx=%llu\n", __FILE__, __LINE__, ctx); > if (!ctx) { > result->major_status = GSS_S_CONTEXT_EXPIRED; > + printf("%s:%d GSS_S_CONTEXT_EXPIRED\n", __FILE__, __LINE__); > return (TRUE); > } > } > if (argp->name) { > + printf("%s:%d 
argp->name\n", __FILE__, __LINE__); > name = gssd_find_resource(argp->name); > + printf("%s:%d name=%llu\n", __FILE__, __LINE__, name); > + printf("%s:%d name=%p\n", __FILE__, __LINE__, name); > if (!name) { > result->major_status = GSS_S_BAD_NAME; > + printf("%s:%d GSS_S_BAD_NAME\n", __FILE__, __LINE__); > return (TRUE); > } > } > @@ -286,6 +328,11 @@ > result->ctx = argp->ctx; > else > result->ctx = gssd_make_resource(ctx); > + > + if (result->major_status == GSS_S_COMPLETE) > + printf("%s:%d GSS_S_COMPLETE\n", __FILE__, __LINE__); > + else > + printf("%s:%d GSS_S_CONTINUE_NEEDED\n", __FILE__, __LINE__); > } > > return (TRUE); > > Ideas? > > Thank you, > Momchil From owner-freebsd-fs@FreeBSD.ORG Sat Feb 23 23:54:47 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A9A58F40 for ; Sat, 23 Feb 2013 23:54:47 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 21E9CA9F for ; Sat, 23 Feb 2013 23:54:46 +0000 (UTC) Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au [211.29.132.194]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r1NNsdGe013894 for ; Sun, 24 Feb 2013 10:54:39 +1100 Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r1NNsULp026737 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 24 Feb 2013 10:54:31 +1100 Date: Sun, 24 Feb 2013 10:54:30 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov Subject: Re: cleaning files beyond EOF In-Reply-To: <20130217074832.GA2598@kib.kiev.ua> Message-ID: <20130224093909.Y920@besplex.bde.org> References: <20130217113031.N9271@besplex.bde.org> 
<20130217055528.GB2522@kib.kiev.ua> <20130217172928.C1900@besplex.bde.org> <20130217074832.GA2598@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Auu2R5BP c=1 sm=1 a=xK1pj5J4f3QA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=GlckP5_kgdUA:10 a=s1h70ublos-9_-jcX2sA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Feb 2013 23:54:47 -0000

On Sun, 17 Feb 2013, Konstantin Belousov wrote:

> On Sun, Feb 17, 2013 at 06:01:50PM +1100, Bruce Evans wrote:
>> On Sun, 17 Feb 2013, Konstantin Belousov wrote:
>>
>>> On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
>>>> I have a (possibly damaged) ffs data block with nonzero data beyond
>>>> EOF. Is anything responsible for clearing this data when the file
>>>> is mmapped()?
>>>>
>>>> At least old versions of gcc mmap() the file and have a bug checking
>>>> for EOF. They read the garbage beyond the end and get confused.
>>>
>>> Does the 'damaged' status of the data block mean that it contains the
>>> garbage after EOF on disk?
>>
>> Yes, it's at most software damage. I used a broken version of
>> vfs_bio_clrbuf() for a long time and it probably left some unusual
>> blocks. This matters surprisingly rarely.
> I recently had to modify the vfs_bio_clrbuf(). For me, a bug in the
> function did matter a lot, because the function is used, in particular,
> to clear the indirect blocks. The bug caused quite random filesystem
> failures until I figured it out. My version of vfs_bio_clrbuf() is
> at the end of the message, it avoids accessing b_data.

This will take me a long time to understand. Indirect blocks seemed to be
broken to me too.
clrbuf() in 4.4BSD was just a simple bzero() of the entire buffer
followed by an (IMO bogus) setting of b_resid to 0. It was used mainly
to allocate blocks, including indirect blocks in ffs. Now most file
systems use vfs_bio_clrbuf() instead. hpfs, ntfs and udf still use
clrbuf(), since they apparently didn't understand FreeBSD APIs when
they were implemented and they are too new to have been changed by the
global sweep to change to vfs_bio_clrbuf(). But vfs_bio_clrbuf() has
obscure semantics which seem to be different from those of clrbuf().
It reduces to clrbuf() in the non-VMIO case. In the VMIO case it only
clears the previously "invalid" portions of the buffer. I don't see
why "valid" implies zero, and testing showed that it didn't. Cases
where the buffer ended up not all zero were rare and seemed to be only
for indirect blocks in ffs.

Also, the complexity of vfs_bio_clrbuf() is bogus IMO. The allocation
will be followed by a physical write, so avoiding re-zeroing parts of
the buffer is an insignificant optimization even if these parts are
most of the buffer.

>> I forgot to mention that this is with an old version of FreeBSD,
>> where I changed vfs_bio.c a lot but barely touched vm.
>>
>>> UFS uses a small wrapper around vnode_generic_getpages() as the
>>> VOP_GETPAGES(), the wrapping code can be ignored for the current
>>> purpose.
>>>
>>> vnode_generic_getpages() iterates over the pages after the bstrategy()
>>> and marks the part of the page after EOF valid and zeroes it, using
>>> vm_page_set_valid_range().
>>
>> The old version has a large non-wrapper in ffs, and vnode_generic_getpages()
>> uses vm_page_set_validclean(). Maybe the bug is just in the old
>> ffs_getpages(). It seems to do only DEV_BSIZE'ed zeroing stuff. It
>> begins with the same "We have to zero that data" code that forms most
>> of the wrapper in the current version. It normally only returns
>> vnode_pager_generic_getpages() after that if bsize < PAGE_SIZE.
>> However, my version has a variable which I had forgotten about to
>> control this, and the forgotten setting of this variable results in
>> always using vnode_pager_generic_getpages(), as in -current. I probably
>> copied some fixes in -current for this. So the bug can't be just in
>> ffs_getpages().
>>
>> The "damaged" block is at the end of vfs_default.c. The file size is
>> 25 * PAGE_SIZE + 16. It is in 7 16K blocks, 2 full 2K frags, and 1 frag
>> with 16 bytes valid in it.
> But the ffs_getpages() might be indeed the culprit. It calls
> vm_page_zero_invalid(), which only has DEV_BSIZE granularity. I think
> that ffs_getpages() also should zero the after eof part of the last page
> of the file to fix your damage, since device read cannot read less than
> DEV_BSIZE.

Is that for the old ffs_getpages()? I think it clears the DEV_BSIZE
sub-blocks in the page starting at the first "invalid" one. For my
file's layout, that is starting at offset DEV_BSIZE, since the sub-block
at offset 0 has 16 bytes valid in it and of course the whole sub-block
is valid at the VMIO level.

I tried to verify this by checking that the unzeroed bytes were from
offset 16 to 511, but unfortunately the problem went away soon after I
wrote the first mail about this :-). It was very reproducible until
then. I checked using fsdb that the data block is still unzeroed.
> diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
> index ef6194c..4240b78 100644
> --- a/sys/ufs/ffs/ffs_vnops.c
> +++ b/sys/ufs/ffs/ffs_vnops.c
> @@ -844,9 +844,9 @@ static int
>  ffs_getpages(ap)
>  	struct vop_getpages_args *ap;
>  {
> -	int i;
>  	vm_page_t mreq;
> -	int pcount;
> +	uint64_t size;
> +	int i, pcount;
>
>  	pcount = round_page(ap->a_count) / PAGE_SIZE;
>  	mreq = ap->a_m[ap->a_reqpage];
> @@ -861,6 +861,9 @@ ffs_getpages(ap)
>  	if (mreq->valid) {
>  		if (mreq->valid != VM_PAGE_BITS_ALL)
>  			vm_page_zero_invalid(mreq, TRUE);
> +		size = VTOI(ap->a_vp)->i_size;
> +		if (mreq->pindex == OFF_TO_IDX(size))
> +			pmap_zero_page_area(mreq, size & PAGE_MASK, PAGE_SIZE);
>  		for (i = 0; i < pcount; i++) {
>  			if (i != ap->a_reqpage) {
>  				vm_page_lock(ap->a_m[i]);

I saw your later mail with this fix fixed (clearing PAGE_SIZE is too
much). I think I would write it without the pindex check:

 	off = VTOI(ap->a_vp)->i_size & PAGE_MASK;
 	if (off != 0)
 		pmap_zero_page_area(mreq, off, PAGE_SIZE - off);

>
> On the other hand, it is not clear should we indeed protect against such
> case, or just declare the disk data broken.

Then file systems written by different, less strict implementations of
ffs would not work. It is unclear if ffs is specified to clear bytes
beyond the end when writing files. But I think it should clear up to
the next fragment boundary. After that, it is vm's responsibility to
clear.

Bruce
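For anyone who wants to check the offset arithmetic outside the kernel,
the following is a small userland sketch of the tail-zeroing idea
discussed in this thread. It is only a model: sketch_zero_page_tail(),
SKETCH_PAGE_SIZE and the plain byte array standing in for a vm page are
assumptions made for illustration, and memset() stands in for
pmap_zero_page_area(). The sketch keeps a page index check as a
conservative choice, so that only the page containing EOF is touched,
and it clears PAGE_SIZE - off bytes rather than PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u                    /* assumed page size */
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1u)

/*
 * Zero the part of a page that lies beyond EOF.
 *
 * page      - SKETCH_PAGE_SIZE bytes standing in for the page contents
 * pindex    - index of this page within the file
 * file_size - file length in bytes
 *
 * Returns the number of bytes cleared (0 if this is not the page that
 * contains EOF, or if EOF falls exactly on a page boundary).
 */
static size_t
sketch_zero_page_tail(unsigned char *page, uint64_t pindex, uint64_t file_size)
{
    size_t off;

    assert(page != NULL);

    /* Only the page containing EOF has bytes beyond EOF to clear. */
    if (pindex != file_size / SKETCH_PAGE_SIZE)
        return (0);

    off = (size_t)(file_size & SKETCH_PAGE_MASK);
    if (off == 0)
        return (0);

    /* Clear from the first byte past EOF up to the end of the page. */
    memset(page + off, 0, SKETCH_PAGE_SIZE - off);
    return (SKETCH_PAGE_SIZE - off);
}
```

With the file size from this thread (25 * PAGE_SIZE + 16) and the
assumed 4096-byte page, the page at index 25 keeps its first 16 bytes
and has the remaining 4080 bytes cleared; every other page is left
alone.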