From owner-freebsd-fs@FreeBSD.ORG Sun Sep 30 11:06:52 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 73387106564A for ; Sun, 30 Sep 2012 11:06:52 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org)
Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.78]) by mx1.freebsd.org (Postfix) with ESMTP id ED0B18FC0A for ; Sun, 30 Sep 2012 11:06:51 +0000 (UTC)
Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1TIHMC-0008DV-IN for freebsd-fs@freebsd.org; Sun, 30 Sep 2012 13:06:44 +0200
Received: from dhcp-077-251-055-099.chello.nl ([77.251.55.99] helo=pinky) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1TIHMB-0006Tj-Ol for freebsd-fs@freebsd.org; Sun, 30 Sep 2012 13:06:43 +0200
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
To: freebsd-fs@freebsd.org
References:
Date: Sun, 30 Sep 2012 13:06:43 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: "Ronald Klop"
Message-ID:
In-Reply-To:
User-Agent: Opera Mail/12.02 (Win32)
X-Virus-Scanned: by clamav at smarthost1.samage.net
X-Spam-Level: /
X-Spam-Score: 0.0
X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=disabled version=3.2.5
X-Scan-Signature: 98cd051f671ee36aeaf0d6c34a549736
Subject: Re: Can't remove zil / separate log device from root pool
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 30 Sep 2012 11:06:52 -0000

On Fri, 28 Sep 2012 17:43:04 +0200, Olivier Smedts wrote:

> Hello,
>
> Some time ago I added a separate log device to my root pool (with the
> "bootfs" unset hack) but now I can't remove it from my pool :
>
> # uname -a
> FreeBSD zozo.afpicl.lan 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r241027:
> Fri Sep 28 14:43:13 CEST 2012
> root@zozo.afpicl.lan:/usr/obj/usr/src/sys/CORE amd64
> root@zozo:/root# zpool status
>   pool: tank
>  state: ONLINE
> status: The pool is formatted using a legacy on-disk format. The pool
>         can still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>         pool will no longer be accessible on software that does not
>         support feature flags.
>
>         NAME           STATE    READ WRITE CKSUM
>         tank           ONLINE      0     0     0
>           mirror-0     ONLINE      0     0     0
>             gpt/disk1  ONLINE      0     0     0
>             gpt/disk0  ONLINE      0     0     0
>         logs
>           gpt/zil      ONLINE      0     0     0
>
> errors: No known data errors
> root@zozo:/root# time zpool remove tank gpt/zil && echo "ok !"
> 0.000u 0.008s 0:02.61 0.0% 0+0k 0+1io 0pf+0w
> ok !
> root@zozo:/root# zpool status
>   pool: tank
>  state: ONLINE
> status: The pool is formatted using a legacy on-disk format. The pool
>         can still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>         pool will no longer be accessible on software that does not
>         support feature flags.
>   scan: resilvered 7,27M in 0h0m with 0 errors on Mon Sep 17 12:43:16
> 2012
> config:
>
>         NAME           STATE    READ WRITE CKSUM
>         tank           ONLINE      0     0     0
>           mirror-0     ONLINE      0     0     0
>             gpt/disk1  ONLINE      0     0     0
>             gpt/disk0  ONLINE      0     0     0
>         logs
>           gpt/zil      ONLINE      0     0     0
>
> errors: No known data errors
>
> I tried unsetting bootfs before removing the log device. Not better.
> I tried importing my pool with a 9.1-RC1 liveCD, removing the log
> device. The same thing happens.
> I tried by physically detaching the log device from my computer. The
> system can't mount root. With the 9.1-RC1 liveCD, I can import the
> pool with "-m" but then I still can't remove the log device from the
> pool.
>
> Any idea on what I could do?
>
> Thanks

What version is your zpool/zfs? Mind the message about upgrading in your
output above. Not all versions of ZFS support removing a log or cache
device. Run 'zpool upgrade' and 'zfs upgrade'; without arguments they only
print the current version, and they need extra options to do the actual
upgrade.

Ronald.

From owner-freebsd-fs@FreeBSD.ORG Sun Sep 30 12:24:09 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 096AF106564A; Sun, 30 Sep 2012 12:24:09 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 679878FC0A; Sun, 30 Sep 2012 12:24:07 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q8UCOFeD081382; Sun, 30 Sep 2012 15:24:15 +0300 (EEST) (envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q8UCO3JS029631; Sun, 30 Sep 2012 15:24:03 +0300 (EEST) (envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q8UCO3a7029630; Sun, 30 Sep 2012 15:24:03 +0300 (EEST) (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f
Date: Sun, 30 Sep 2012 15:24:03 +0300
From: Konstantin Belousov
To: Pawel Jakub Dawidek
Message-ID: <20120930122403.GB35915@deviant.kiev.zoral.com.ua>
References: <505DB4E6.8030407@smeets.im> <20120924224606.GE79077@ithaqua.etoilebsd.net> <20120925090840.GD35915@deviant.kiev.zoral.com.ua> <20120929154101.GK1402@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="N+dhEFW7Y2Uiel/w"
Content-Disposition: inline
In-Reply-To: <20120929154101.GK1402@garage.freebsd.pl>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
Subject: Re: panic: _sx_xlock_hard: recursed on non-recursive sx zfsvfs->z_hold_mtx[i] @ ...cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 30 Sep 2012 12:24:09 -0000

--N+dhEFW7Y2Uiel/w
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Sep 29, 2012 at 05:41:02PM +0200, Pawel Jakub Dawidek wrote:
> On Tue, Sep 25, 2012 at 12:08:40PM +0300, Konstantin Belousov wrote:
> > On Tue, Sep 25, 2012 at 12:46:07AM +0200, Baptiste Daroussin wrote:
> > > Hi,
> > >
> > > I have the exact same problem, making tinderbox and poudriere highly
> > > unusable.
> > >
> > > This is really problematic because pointyhat also relies on nullfs and
> > > zfs, which means we can't upgrade the building nodes if we need to, for
> > > example.
> > >
> > > regards, Bapt
> >
> > This is a zfs bug. Filesystems shall not call getnewvnode() while holding
> > internal locks. At least not the locks which are needed during reclaim.
> > Nullfs changes amplified the probability of the problematic situation,
> > since now nullfs vnodes are indeed cached instead of being recreated
> > on each access, so the overall count of used vnodes could be twice as
> > high.
> >
> > You might try to increase kern.maxvnodes to reduce the probability of
> > the recursive calls into vnlru() from getnewvnode(). But for real, the
> > bug needs to be fixed in zfs.
>
> With all FreeBSD's VFS constraints, it is really hard to breathe,
> especially within a file system that was not designed with our VFS
> complexity in mind.
>
> For example it would be nice of VFS to not reclaim vnodes from
> getnewvnode() and leave this task entirely to the vnlru process.
> It is a pretty obvious layering violation to me - file system code needs
> a new vnode, it calls the VFS routine to allocate one, which then calls
> the file system again to reclaim one of its vnodes.

Postponing the reclaim to the vnlru process when the vnode reserve goes
low does not solve anything, since it only changes the recursion into a
deadlock.

I discussed an approach for this issue with avg. The basic idea is
presented in the untested patch below. You can specify that some count
of free vnodes must be present for some dynamic scope, started by the
getnewvnode_reserve() function. While staying inside the reserved pool,
getnewvnode() calls would not recurse into vnlru(). The scope is
finished with getnewvnode_drop_reserve(). The getnewvnode_reserve()
shall be called while no locks are held.

What do you think?

> It would also be nice to handle EAGAIN from VOP_RECLAIM(). Currently we
> panic on error. This would be useful to return if some of the locks
> cannot be acquired immediately. ZFS reclaim already discovers potential
> deadlocks and defers some reclamation work to a separate thread.

diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 24960fd..66e3201 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -154,6 +154,8 @@ userret(struct thread *td, struct trapframe *frame)
 	    ("userret: Returning with sleep disabled"));
 	KASSERT(td->td_pinned == 0,
 	    ("userret: Returning with with pinned thread"));
+	KASSERT(td->td_vp_reserv == 0,
+	    ("userret: Returning while holding vnode reservation"));
 #ifdef VIMAGE
 	/* Unfortunately td_vnet_lpush needs VNET_DEBUG. */
 	VNET_ASSERT(curvnet == NULL,
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 5c781c2..0de53f0 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -935,34 +935,22 @@ vtryrecycle(struct vnode *vp)
 }
 
 /*
- * Return the next vnode from the free list.
+ * Wait for available vnodes.
  */
-int
-getnewvnode(const char *tag, struct mount *mp, struct vop_vector *vops,
-    struct vnode **vpp)
+static int
+getnewvnode_wait(int suspended)
 {
-	struct vnode *vp = NULL;
-	struct bufobj *bo;
 
-	CTR3(KTR_VFS, "%s: mp %p with tag %s", __func__, mp, tag);
-	mtx_lock(&vnode_free_list_mtx);
-	/*
-	 * Lend our context to reclaim vnodes if they've exceeded the max.
-	 */
-	if (freevnodes > wantfreevnodes)
-		vnlru_free(1);
-	/*
-	 * Wait for available vnodes.
-	 */
+	mtx_assert(&vnode_free_list_mtx, MA_OWNED);
 	if (numvnodes > desiredvnodes) {
-		if (mp != NULL && (mp->mnt_kern_flag & MNTK_SUSPEND)) {
+		if (suspended) {
 			/*
 			 * File system is beeing suspended, we cannot risk a
 			 * deadlock here, so allocate new vnode anyway.
 			 */
 			if (freevnodes > wantfreevnodes)
 				vnlru_free(freevnodes - wantfreevnodes);
-			goto alloc;
+			return (0);
 		}
 		if (vnlruproc_sig == 0) {
 			vnlruproc_sig = 1;	/* avoid unnecessary wakeups */
@@ -970,16 +958,76 @@ getnewvnode(const char *tag, struct mount *mp, struct vop_vector *vops,
 		}
 		msleep(&vnlruproc_sig, &vnode_free_list_mtx, PVFS,
 		    "vlruwk", hz);
-#if 0	/* XXX Not all VFS_VGET/ffs_vget callers check returns. */
-		if (numvnodes > desiredvnodes) {
-			mtx_unlock(&vnode_free_list_mtx);
-			return (ENFILE);
+	}
+	return (numvnodes > desiredvnodes ? ENFILE : 0);
+}
+
+void
+getnewvnode_reserve(u_int count)
+{
+	struct thread *td;
+
+	td = curthread;
+	mtx_lock(&vnode_free_list_mtx);
+	while (count > 0) {
+		if (getnewvnode_wait(0) == 0) {
+			count--;
+			td->td_vp_reserv++;
+			numvnodes++;
 		}
-#endif
 	}
-alloc:
+	mtx_unlock(&vnode_free_list_mtx);
+}
+
+void
+getnewvnode_drop_reserve(void)
+{
+	struct thread *td;
+
+	td = curthread;
+	mtx_lock(&vnode_free_list_mtx);
+	KASSERT(numvnodes >= td->td_vp_reserv, ("reserve too large"));
+	numvnodes -= td->td_vp_reserv;
+	mtx_unlock(&vnode_free_list_mtx);
+	td->td_vp_reserv = 0;
+}
+
+/*
+ * Return the next vnode from the free list.
+ */
+int
+getnewvnode(const char *tag, struct mount *mp, struct vop_vector *vops,
+    struct vnode **vpp)
+{
+	struct vnode *vp;
+	struct bufobj *bo;
+	struct thread *td;
+	int error;
+
+	CTR3(KTR_VFS, "%s: mp %p with tag %s", __func__, mp, tag);
+	vp = NULL;
+	td = curthread;
+	if (td->td_vp_reserv > 0) {
+		td->td_vp_reserv -= 1;
+		goto alloc;
+	}
+	mtx_lock(&vnode_free_list_mtx);
+	/*
+	 * Lend our context to reclaim vnodes if they've exceeded the max.
+	 */
+	if (freevnodes > wantfreevnodes)
+		vnlru_free(1);
+	error = getnewvnode_wait(mp != NULL && (mp->mnt_kern_flag &
+	    MNTK_SUSPEND));
+#if 0	/* XXX Not all VFS_VGET/ffs_vget callers check returns. */
+	if (error != 0) {
+		mtx_unlock(&vnode_free_list_mtx);
+		return (error);
+	}
+#endif
 	numvnodes++;
 	mtx_unlock(&vnode_free_list_mtx);
+alloc:
 	vp = (struct vnode *) uma_zalloc(vnode_zone, M_WAITOK|M_ZERO);
 	/*
 	 * Setup locks.
diff --git a/sys/sys/proc.h b/sys/sys/proc.h
index 4c4aa2f..0aae0ce 100644
--- a/sys/sys/proc.h
+++ b/sys/sys/proc.h
@@ -272,6 +272,7 @@ struct thread {
 	struct osd	td_osd;		/* (k) Object specific data. */
 	struct vm_map_entry *td_map_def_user; /* (k) Deferred entries. */
 	pid_t		td_dbg_forked;	/* (c) Child pid for debugger. */
+	u_int		td_vp_reserv;	/* (k) Count of reserved vnodes. */
 #define	td_endzero td_sigmask
 
 /* Copied during fork1() or create_thread(). */
diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h
index 1bde8b9..029458f 100644
--- a/sys/sys/vnode.h
+++ b/sys/sys/vnode.h
@@ -600,6 +600,8 @@ void	cvtstat(struct stat *st, struct ostat *ost);
 void	cvtnstat(struct stat *sb, struct nstat *nsb);
 int	getnewvnode(const char *tag, struct mount *mp, struct vop_vector *vops,
 	    struct vnode **vpp);
+void	getnewvnode_reserve(u_int count);
+void	getnewvnode_drop_reserve(void);
 int	insmntque1(struct vnode *vp, struct mount *mp,
 	    void (*dtr)(struct vnode *, void *), void *dtr_arg);
 int	insmntque(struct vnode *vp, struct mount *mp);

--N+dhEFW7Y2Uiel/w
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlBoOeMACgkQC3+MBN1Mb4hk4QCglh9bUwmlet/PEwgkmQyuGjbN
9PoAn2VmBWDLTwMwOBYYwdo+IYDI+tDp
=QYIV
-----END PGP SIGNATURE-----

--N+dhEFW7Y2Uiel/w--

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 1 00:09:18 2012
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CF4B1106564A; Mon, 1 Oct 2012 00:09:18 +0000 (UTC) (envelope-from prvs=16217704b1=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 3B4CF8FC0C; Mon, 1 Oct 2012 00:09:14 +0000 (UTC)
Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50000298546.msg; Mon, 01 Oct 2012 01:08:29 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Mon, 01 Oct 2012 01:08:29 +0100 (not processed: message from trusted or authenticated source)
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=16217704b1=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <206FFD6BAB774B1196F4434F2661438B@multiplay.co.uk>
From: "Steven Hartland"
To:
Date: Mon, 1 Oct 2012 01:08:35 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
Cc: freebsd-hackers@freebsd.org
Subject: Looking for testers / feedback for ZFS receive properties options
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 01 Oct 2012 00:09:18 -0000

We encountered a problem receiving a full ZFS stream from a disk we had
backed up. The problem was that the receive was aborting due to the
quota being exceeded, so I did some digging around and found that Oracle
ZFS now has -x and -o options, as documented here:-
http://docs.oracle.com/cd/E23824_01/html/821-1462/zfs-1m.html

Seems this has been raised as a feature request upstream:
https://www.illumos.org/issues/2745

Anyway, being stuck with a backup we couldn't restore, I had a play at
implementing these options and have a prototype up and running, which
I'd like feedback on.

This patch also adds a -l option, which allows the streams received to be
limited to those specified; another option which I think would be useful
and seemed relatively painless to add.
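To illustrate, a restore with the proposed options might look like this
(a sketch only: the -x/-o semantics follow the Oracle manual page linked
above, and the pool/dataset names are hypothetical):

# Receive a full replication stream, dropping the quota property carried
# in the stream so the receive is not aborted, and forcing a local
# mountpoint instead of the one recorded in the stream:
zfs send -R tank/data@backup | \
    zfs receive -x quota -o mountpoint=/restore tankcopy/data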
The initial version of the patch which is based off 8.3-RELEASE can be found here: http://blog.multiplay.co.uk/dropzone/freebsd/zfs-recv-properties.patch Any feedback appreciated Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Mon Oct 1 11:07:16 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CCA3D106566B for ; Mon, 1 Oct 2012 11:07:16 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id B58718FC16 for ; Mon, 1 Oct 2012 11:07:16 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q91B7G6I024956 for ; Mon, 1 Oct 2012 11:07:16 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q91B7Gm1024952 for freebsd-fs@FreeBSD.org; Mon, 1 Oct 2012 11:07:16 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 1 Oct 2012 11:07:16 GMT Message-Id: <201210011107.q91B7Gm1024952@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Oct 2012 11:07:16 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o kern/170914 fs [zfs] [patch] Import patchs related with issues 3090 a o kern/170912 fs [zfs] [patch] unnecessarily setting DS_FLAG_INCONSISTE o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/170238 fs [zfs] [panic] Panic when deleting data o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor o kern/167067 fs [zfs] [panic] ZFS panics the server o kern/167066 fs [zfs] ZVOLs not appearing in /dev/zvol o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. 
o kern/165950 fs [ffs] SU+J and fsck problem o kern/165923 fs [nfs] Writing to NFS-backed mmapped files fails if flu o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/162362 fs [snapshots] [panic] ufs with snapshot(s) panics when g o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo p kern/161897 fs [zfs] [patch] zfs partition probing causing long delay o kern/161864 fs [ufs] removing journaling from UFS partition fails on o bin/161807 fs [patch] add option for explicitly specifying metadata o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic o kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/151111 fs [zfs] vnodes leakage during zfs unmount o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 
fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " p kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem 
locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 289 problems total. From owner-freebsd-fs@FreeBSD.ORG Mon Oct 1 13:58:03 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 25CED106564A for ; Mon, 1 Oct 2012 13:58:03 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) by mx1.freebsd.org (Postfix) with ESMTP id D5D018FC14 for ; Mon, 1 Oct 2012 13:58:02 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1TIgVR-0008AB-KK for freebsd-fs@freebsd.org; Mon, 01 Oct 2012 15:57:57 +0200 Received: from dyn1212-46.wlan.ic.ac.uk ([129.31.212.46]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 01 Oct 2012 15:57:57 +0200 Received: from johannes by dyn1212-46.wlan.ic.ac.uk with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 01 Oct 2012 15:57:57 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Johannes Totz Date: Mon, 01 Oct 2012 14:57:28 +0100 Lines: 32 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: dyn1212-46.wlan.ic.ac.uk User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120907 Thunderbird/15.0.1 Subject: zfs estimated size off X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Oct 2012 13:58:03 -0000 Hi, The estimated size by zfs send is always far off (for me). For example: #zfs list -t all -r panzer/home/jo/hgrepos NAME USED REFER panzer/home/jo/hgrepos 31.2G 29.4G [...] panzer/home/jo/hgrepos@fredsync-120914 202M 29.3G panzer/home/jo/hgrepos@fredsync-121001 0 29.4G #zfs send -Rv -I @fredsync-120914 panzer/home/jo/hgrepos@fredsync-121001 send from @fredsync-120914 to panzer/home/jo/hgrepos@fredsync-121001 estimated size is 494M total estimated size is 494M TIME SENT SNAPSHOT 14:31:36 2.68M panzer/home/jo/hgrepos@fredsync-121001 [...] 14:36:56 791M panzer/home/jo/hgrepos@fredsync-121001 This is on: 9-stable, r239951. I don't use that estimate so it's not a problem for me. Just thought this is weird. 
Johannes

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 1 17:28:15 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C3E89106566B for ; Mon, 1 Oct 2012 17:28:15 +0000 (UTC) (envelope-from fjwcash@gmail.com)
Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 49CB98FC14 for ; Mon, 1 Oct 2012 17:28:15 +0000 (UTC)
Received: by mail-lb0-f182.google.com with SMTP id b5so5325351lbd.13 for ; Mon, 01 Oct 2012 10:28:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=dNTfLvKI+oS9SuOCqY2IreJnBwUyIxY/NQFAVnD8/NM=; b=c5NdrJzB9i/yR/mVOi0xDQ7DqrRxOh223NxeiLDzIVzoHq18q1Rm/Wlo22qsh5T6hl nKcn1OjlViN804oBqd3Eh8jTJAmzzimHsmpBk9KxuWPWY/C1V0SVo13I81jqTBrYDZG/ a5mHSgAsGCg3ZXKA8EhgzXWIID+7FGLLmIKzlmUmQlIG9mn5A8vJGpVyEp9f0+BC0ERC WTYqg3uijIECrkPGCECAJH68mLdz7SCioOEG+AQNuUm79/t6GoYbnHddJouprsmQ082P tQmN42Nz1Lp9eP6ulgf5KYIR1kAsTBr/3Jqd/x/oOBoRlONktHhe1vHqH0zYby0NhJLn izSg==
MIME-Version: 1.0
Received: by 10.152.106.237 with SMTP id gx13mr12161003lab.46.1349112494910; Mon, 01 Oct 2012 10:28:14 -0700 (PDT)
Received: by 10.114.23.230 with HTTP; Mon, 1 Oct 2012 10:28:14 -0700 (PDT)
Date: Mon, 1 Oct 2012 10:28:14 -0700
Message-ID:
From: Freddie Cash
To: FreeBSD Filesystems
Content-Type: text/plain; charset=UTF-8
Subject: ZFS background/async destroy feature: MFC timeframe?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 01 Oct 2012 17:28:15 -0000

Good morning,

Just wondering when the zfs background/async destroy feature is planned
to be MFC'd? Is the 9.1 release process holding it up? Is there any way
to manually patch it into 9-STABLE?

We have a system that is trying to destroy a temp filesystem on import
of the pool, which is taking way too long to do (over 3 days so far).
Was hoping to test out the background destroy to see if it would work in
this situation.
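For anyone wanting to check whether a pool can use it once the code is in
place, a sketch (assuming the feature-flags naming used upstream, where
async destroy is gated by a pool feature):

# Show whether the pool knows about / has enabled the feature:
zpool get all tank | grep feature@async_destroy
# Enable it (a one-way operation) on a feature-flags pool:
zpool set feature@async_destroy=enabled tank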
--
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 2 00:28:55 2012
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8012F106566C; Tue, 2 Oct 2012 00:28:55 +0000 (UTC) (envelope-from prvs=162274225f=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id E147C8FC0A; Tue, 2 Oct 2012 00:28:54 +0000 (UTC)
Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50000313689.msg; Tue, 02 Oct 2012 01:28:51 +0100
X-Spam-Processed: mail1.multiplay.co.uk, Tue, 02 Oct 2012 01:28:51 +0100 (not processed: message from trusted or authenticated source)
X-MDRemoteIP: 188.220.16.49
X-Return-Path: prvs=162274225f=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
Message-ID: <2B9160DABE154535AACA7731E012B456@multiplay.co.uk>
From: "Steven Hartland"
To:
Date: Tue, 2 Oct 2012 01:29:01 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
Cc: freebsd-stable@freebsd.org
Subject: ZFS fails to receive valid snapshots (patch included)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 02 Oct 2012 00:28:55 -0000

Just a quick heads up about an issue we've been seeing here recently
where zfs receive fails on valid replication streams.

Problem:-
When using zfs receive to restore data saved with zfs send, there is the
possibility that the receive will fail because the stream's snapshot
information is processed in a random order. In addition, when receiving
snapshots whose parent 'from' snapshot has been replaced by another,
needless failures occur; these can be seen clearly in verbose mode.

More info and a fix can be found in the PR
http://www.freebsd.org/cgi/query-pr.cgi?pr=172259

Any feedback welcome

   Regards
   Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
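For context, the failure described above shows up in the standard
replication-stream round trip, which looks roughly like the following
(hypothetical dataset names):

# Replicate a dataset tree with all of its snapshots and properties:
zfs snapshot -r tank/data@tuesday
zfs send -R tank/data@tuesday | zfs receive -dvF backup

# Later, send only the snapshots taken between two points in time:
zfs send -R -I @monday tank/data@tuesday | zfs receive -dv backup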
From owner-freebsd-fs@FreeBSD.ORG Tue Oct 2 07:47:23 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 27D09106566B for ; Tue, 2 Oct 2012 07:47:23 +0000 (UTC) (envelope-from ulysse31@gmail.com)
Received: from mail-ob0-f182.google.com (mail-ob0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id D98F48FC16 for ; Tue, 2 Oct 2012 07:47:22 +0000 (UTC)
Received: by obbwc20 with SMTP id wc20so6334377obb.13 for ; Tue, 02 Oct 2012 00:47:22 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=v44DeOhO/dSduYdX04sZ2bx5P0TB25lNbixUfcDTNws=; b=KLEtOmbF/pRjMtmkE7TEAIQiTsJ/lQ8jDro8vyZvAoiwy0lVikQDwx/By2UwkMbftb WLD8SXU3tGld9gHMkDuUAAZDSFRIYdlfZi3OiH2BHMC1Bqi7+iAGyhr/hgsXsS3IvJbj oRHIbIsOOlULNXyHtLEMYGmMfN/OjADzmNRNMsfcgXbrCsdICk8oxtso4utH3sCXWDfk 1DwJpqaKQ8e9FNgSH9IlQ/JO38iWya0m7mCYyhCkNNHECGFFlPvS7//c7bBFm8osgDEX 9hevoHAotyNMccbjZgtqNbsuPwZsLVaMxcGKPD1oiJcPE5Qq+mdITtVp/sWAjSTIwm0K ecnQ==
MIME-Version: 1.0
Received: by 10.60.26.133 with SMTP id l5mr13919839oeg.60.1349164042057; Tue, 02 Oct 2012 00:47:22 -0700 (PDT)
Received: by 10.182.80.200 with HTTP; Tue, 2 Oct 2012 00:47:21 -0700 (PDT)
In-Reply-To: <933684392.1422908.1348879198470.JavaMail.root@erie.cs.uoguelph.ca>
References: <933684392.1422908.1348879198470.JavaMail.root@erie.cs.uoguelph.ca>
Date: Tue, 2 Oct 2012 09:47:21 +0200
Message-ID:
From: Ulysse 31
To: Rick Macklem
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-fs@freebsd.org
Subject: Re: nfsv4 kerberized and gssname=root and allgsname
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 02 Oct 2012 07:47:23 -0000

2012/9/29 Rick Macklem :
> Ulysse 31 wrote:
>> Hi all,
>>
>> I am currently working on a FreeBSD 9 backup server.
>> This server would back up the production server via kerberized nfs4
>> (since the old backup server, a Linux one, was doing so).
>> We used on the old backup server a root/ kerberos identity,
>> which allows the backup server to access all the data.
>> I have followed the documentation found at :
>>
>> http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
>>
>> done :
>> - added to kernel :
>>
>> options KGSSAPI
>> device crypto
>>
>> - added to rc.conf :
>>
>> nfs_client_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> rpcbind_enable="YES"
>> devfs_enable="YES"
>> gssd_enable="YES"
>>
>> - have done sysctl vfs.rpcsec.keytab_enctype=1 and added it to
>> /etc/sysctl.conf
>>
>> We used the MIT kerberos implementation, since it is the one used on all
>> our servers (mostly Linux), and we have created an /etc/krb5.keytab
>> containing the following keys :
>> host/
>> nfs/
>> root/
>>
>> and, of course, I have used the available patch at :
>> http://people.freebsd.org/~rmacklem/rpcsec_gss-9.patch
>>
>> When I try to mount with the (B) method (the one of the google wiki),
>> it works as expected; I mean, with a correct user credential, I can
>> access the user data.
>> But, when I try to access via the (C) method (the one that I need in
>> order to do a full backup of the production storage server) I get a
>> systematic kernel panic when launching the mount command.
>> The mount command looks something like : mount -t nfs -o
>> nfsv4,sec=krb5i,gssname=root,allgssname <server fqdn>:<path> <mountpoint>
>> I have activated the kernel debugging stuff to get some info; here is
>> the message :
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address = 0x368
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0xffffffff80866ab7
>> stack pointer = 0x28:0xffffff804aa39ce0
>> frame pointer = 0x28:0xffffff804aa39d30
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 701 (mount_nfs)
>> trap number = 12
>> panic: page fault
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0xffffffff808ae486 at kdb_backtrace+0x66
>> #1 0xffffffff8087885e at panic+0x1ce
>> #2 0xffffffff80b82380 at trap_fatal+0x290
>> #3 0xffffffff80b826b8 at trap_pfault+0x1e8
>> #4 0xffffffff80b82cbe at trap+0x3be
>> #5 0xffffffff80b6c57f at calltrap+0x8
>> #6 0xffffffff80a78eda at rpc_gss_init+0x72a
>> #7 0xffffffff80a79cd6 at rpc_gss_refresh_auth+0x46
>> #8 0xffffffff807a5a53 at newnfs_request+0x163
>> #9 0xffffffff807bf0f7 at nfsrpc_getattrnovp+0xd7
>> #10 0xffffffff807d9b29 at mountnfs+0x4e9
>> #11 0xffffffff807db60a at nfs_mount+0x13ba
>> #12 0xffffffff809068fb at vfs_donmount+0x100b
>> #13 0xffffffff80907086 at sys_nmount+0x66
>> #14 0xffffffff80b81c60 at amd64_syscall+0x540
>> #15 0xffffffff80b6c867 at Xfast_syscall+0xf7
>> Uptime: 2m31s
>> Dumping 97 out of 1002 MB:..17%..33%..50%..66%..83%..99%
>>
>> ------------------------------------------------------------------------
>>
>> Has anyone experienced something similar? Is there a way to
>> correct that?
>> Thanks for the help.
>>
> Well, you're probably the first person to try doing this in years. I did
> have it working about 4-5 years ago. Welcome to the bleeding edge;-)
>
> Could you do the following w.r.t. above kernel:
> # cd /boot/kernel (or wherever the kernel lives)
> # nm kernel | grep rpc_gss_init
> - add the offset 0x72a to the address for rpc_gss_init
> # addr2line -e kernel.symbols
> 0xXXX - the hex number above (address of rpc_gss_init+0x72a)
> - email me what it prints out, so I know where the crash is occurring
>
> You could also run the following command on the Linux server to capture
> packets during the mount attempt, then email me the xxx.pcap file so I
> can look at it in wireshark, to see what is happening before the crash.
> (I'm guessing nr_auth is somehow bogus, but that's just a guess.:-)
> # tcpdump -s 0 -w xxx.pcap host

Hi, sorry for the delay, I was travelling with no working network
connection. Back online for the rest of the week ^^.
Thanks for your help, here is what it prints out :

root@bsdenc:/boot/kernel # nm kernel | grep rpc_gss_init
ffffffff80df07b0 r __set_sysinit_set_sym_svc_rpc_gss_init_sys_init
ffffffff80a787b0 t rpc_gss_init
ffffffff80a7a580 t svc_rpc_gss_init
ffffffff81127530 d svc_rpc_gss_init_sys_init
ffffffff80a7a3b0 T xdr_rpc_gss_init_res
root@bsdenc:/boot/kernel # addr2line -e kernel.symbols 0xffffffff80a78eda
/usr/src/sys/rpc/rpcsec_gss/rpcsec_gss.c:772

For the tcpdump from the Linux server, I think you are referring to the
production NFS server? If yes, unfortunately it is not Linux, it is a
NetApp filer, so no "real" root access on it (and no tcpdump available :s).
If you were mentioning the old backup server (which is Linux, but an NFS client), I cannot do an unmount/mount on it since it's production (the mountpoint is always busy), but I can set up a quick VM/test machine that acts like the Linux backup server and do a tcpdump from it. Just let me know. Thanks again. -- Ulysse31 > > rick > >> -- >> Ulysse31 >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 2 10:26:42 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 421DA10657AC for ; Tue, 2 Oct 2012 10:26:42 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id B05878FC1C for ; Tue, 2 Oct 2012 10:26:41 +0000 (UTC) Received: by weyx43 with SMTP id x43so4129410wey.13 for ; Tue, 02 Oct 2012 03:26:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:content-type:date:subject:to:message-id:mime-version:x-mailer; bh=2mTea3Pr3WQ3nGmHPK5viRi6VkZex1MOw1gVS+/A7/E=; b=It77GCUMDC3r6WWP4Zt47FdVXqvvfWgyfRJR7u5FbGiWuP0kFCjvZYdxLy4a66wezd 2NQX4HGnwpCeJxxl0885N0MyQhtZSTzsS625Xh7AkPOk5Iz9XD/O4CZVXC6qqryLBDys ucMV/3bceLzAJafbculb9XAFs/B9TADH2ggNnZUujQ22g5qEeqppB6q1RQIJqVX7fpTg FI6zoyQioawRwzGxjWIAlxi13l1wOLWfeOdrxoM/rOMAx/j398LxVwNLjZWHfSqG36Vk 8ytIPdYPTUAqq8MQsTsol3wciekmioOmDdp0yCsSeOdE6dCDIXGbZytIBv6yEwGMYVDh fykQ== Received: by 10.180.80.33 with SMTP id o1mr17716714wix.14.1349173600387; Tue, 02 Oct 2012 03:26:40 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. [217.18.249.148]) by mx.google.com with ESMTPS id k20sm20963599wiv.11.2012.10.02.03.26.38 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 02 Oct 2012 03:26:39 -0700 (PDT) From: Nikolay Denev Date: Tue, 2 Oct 2012 13:26:37 +0300 To: "" Message-Id: <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) X-Mailer: Apple Mail (2.1498) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: nfs + zfs hangs on RELENG_9 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Oct 2012 10:26:42 -0000

Hi, I'm experimenting on a machine with both CTL and NFS exports to a Linux host to run an Oracle DB instance. And now I'm seeing, for the second time, a ZFS IO hang after some activity over the NFS mount. Now all IO to the pool where the NFS share is hangs; for example I did:

root@goliath:/home/ndenev # du -ach /tank

And it just hangs in there. procstat -kk output of the du command:

root@goliath:/home/ndenev # procstat -kk 38266
  PID    TID COMM             TDNAME           KSTACK
38266 101997 du               -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 dmu_tx_assign+0x170 zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x8d vputx+0x2d8 sys_fchdir+0x35a amd64_syscall+0x546 Xfast_syscall+0xf7

The nfsd process is also in D state:

root 1420 0.0 0.0 9912  584 ?? Ss Thu08PM    0:00.04 nfsd: master (nfsd)
root 1422 0.0 0.0 9912 2332 ?? D  Thu08PM 5347:34.58 nfsd: server (nfsd)

root@goliath:/home/ndenev # procstat -kk 1420
  PID    TID COMM             TDNAME           KSTACK
 1420 100740 nfsd             -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0xc _cv_wait_sig+0x12e seltdwait+0x110 kern_select+0x6ed sys_select+0x5d amd64_syscall+0x546 Xfast_syscall+0xf7

root@goliath:/home/ndenev # procstat -kk 1422
  PID    TID COMM             TDNAME           KSTACK
 1422 101178 nfsd             nfsd: master     mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x428 _sx_xlock+0x51 arc_buf_evict+0x84 dbuf_clear+0x91 dbuf_free_range+0x469 dnode_free_range+0x2c4 dmu_free_long_range_impl+0x13d dmu_free_long_range+0x4c zfs_rmnode+0x69 zfs_inactive+0x66 zfs_freebsd_inactive+0x1a vinactive+0x8d vputx+0x2d8 zfs_freebsd_rename+0xe1 VOP_RENAME_APV+0x46 nfsvno_rename+0x2cc

[... showing only the master thread, as I'm running with maxthreads 196 and the output is too verbose; I can provide it if needed ...]

There are these two kernel threads that also look to be waiting on some lock:

    0 100423 kernel           zio_write_intr_6 mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x428 _sx_xlock+0x51 buf_hash_insert+0x56 arc_write_done+0x8e zio_done+0x2ee zio_execute+0xc3 zio_done+0x367 zio_execute+0xc3 zio_done+0x367 zio_execute+0xc3 zio_done+0x367 zio_execute+0xc3 zio_done+0x367 zio_execute+0xc3 zio_done+0x367 zio_execute+0xc3
    0 101699 kernel           zio_write_intr_7 mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x428 _sx_xlock+0x51 buf_hash_insert+0x56 arc_write_done+0x8e zio_done+0x2ee zio_execute+0xc3 zio_done+0x367 zio_execute+0xc3 zio_done+0x367 zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x46 fork_exit+0x11f fork_trampoline+0xe

And these threads are from the zfskern kernel process:

root@goliath:/home/ndenev # procstat -kk 7
  PID    TID COMM             TDNAME           KSTACK
    7 100192 zfskern          arc_reclaim_thre mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x428 _sx_xlock+0x51 arc_buf_remove_ref+0x8a dbuf_rele_and_unlock+0x132 dbuf_evict+0x11 dbuf_do_evict+0x53 arc_do_user_evicts+0xb4 arc_reclaim_thread+0x263 fork_exit+0x11f fork_trampoline+0xe
    7 100193 zfskern          l2arc_feed_threa mi_switch+0x186 sleepq_timedwait+0x42 _cv_timedwait+0x13c l2arc_feed_thread+0x1a3 fork_exit+0x11f fork_trampoline+0xe
    7 100536 zfskern          txg_thread_enter mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x11f fork_trampoline+0xe
    7 100537 zfskern          txg_thread_enter mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336 txg_sync_thread+0x136 fork_exit+0x11f fork_trampoline+0xe
    7 101811 zfskern          txg_thread_enter mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_quiesce_thread+0x1fb fork_exit+0x11f fork_trampoline+0xe
    7 101812 zfskern          txg_thread_enter mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 zio_wait+0x61 dsl_pool_sync+0x2c3 spa_sync+0x336 txg_sync_thread+0x136 fork_exit+0x11f fork_trampoline+0xe
    7 101813 zfskern          zvol tank/oracle mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 dmu_tx_assign+0x170 zvol_strategy+0x27a zvol_geom_worker+0xf9 fork_exit+0x11f fork_trampoline+0xe
    7 101817 zfskern          zvol tank/oracle mi_switch+0x186 sleepq_wait+0x42 _sleep+0x390 zvol_geom_worker+0xec fork_exit+0x11f fork_trampoline+0xe
    7 101822 zfskern          zvol tank/oracle mi_switch+0x186 sleepq_wait+0x42 _sleep+0x390 zvol_geom_worker+0xec fork_exit+0x11f fork_trampoline+0xe
    7 101824 zfskern          zvol tank/oracle mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_open+0x85 dmu_tx_assign+0x170 zvol_strategy+0x27a zvol_geom_worker+0xf9 fork_exit+0x11f fork_trampoline+0xe

I'm running with the following NFS settings:

nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
rpc_statd_enable="YES"
rpc_lockd_enable="YES"
vfs.nfsd.minthreads=32
vfs.nfsd.maxthreads=196

root@goliath:/home/ndenev # cat /etc/exports
V4: /tank -sec=sys -network 10.0.0.0/8
/tank/oracle_db2 -maproot=root -alldirs -network 10.XX.XX.XX -mask 255.255.255.255

And the mount on the Linux machine is:

goliath.neterra.moneybookers.net:/tank/oracle_db2 /mnt/goliath2 nfs rw,bg,hard,nointr,tcp,actimeo=0,vers=3,timeo=600,rsize=32768,wsize=32768 0 0

Any idea how this IO hang can be debugged further? Maybe more information (or less :)) is needed?

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 2 15:00:10 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A0D111065670; Tue, 2 Oct 2012 15:00:10 +0000 (UTC) (envelope-from boris.astardzhiev@gmail.com) Received: from mail-la0-f54.google.com (mail-la0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id C25338FC08; Tue, 2 Oct 2012 15:00:08 +0000 (UTC) Received: by lage12 with SMTP id e12so2979715lag.13 for ; Tue, 02 Oct 2012 08:00:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:cc:content-type; bh=PO6JIXQv49d6l/BL+RRjZ6x1QLyiwlaEknHFgZkiAuI=; b=0Ekbt+DjUMyy7IG9RnW8NsUDSawDt6aJLyB7p9FG2rZuAb/YyFmlP4jAcGSpNBuA0L sxQYKQQMK9gskIUGNUSXjYLx7maBhbegph0XuKzMKn3PvAQQjdk5SnsmhnlL+p/NwGgB aonxnukmDFDLtKk3D3bV3oFcurw01zxwPmKHUHjMjx+ZYMTyojMZQaB9v2ZdhI5xOF2u ZD8YYbL1s/ZWrxXtBAitJijd55SGp8ibJR/38wtXt9T7ZE8lV5GhMjeztWIqpx5hr/4O FSOeuzUD0WHVaKz25LwYKLuAgIJBGjhxiblhobzhDo9rG1aLm2XtKbYvGa3+Swic3OUv r/BA== MIME-Version: 1.0 Received: by 10.112.43.98 with SMTP id v2mr678720lbl.1.1349190007550; Tue, 02 Oct 2012 08:00:07 -0700 (PDT) Received: by 10.112.108.1 with HTTP; Tue, 2 Oct 2012 08:00:06 -0700 (PDT) Date: Tue, 2 Oct 2012 18:00:06 +0300 Message-ID: From: Boris Astardzhiev To: freebsd-fs@freebsd.org Content-Type: multipart/mixed; boundary=e0cb4efa6e7e2274bc04cb14c789 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: gjb@semihalf.com, Grzegorz Bernacki , stanislav_galabov@smartcom.bg Subject: libstand's NANDFS superblock detection fix X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Oct 2012 15:00:10 -0000

--e0cb4efa6e7e2274bc04cb14c789 Content-Type: text/plain; charset=ISO-8859-1

Hello, On behalf of Smartcom Bulgaria AD I would like to contribute a patch for libstand's NANDFS support in FreeBSD. It is related to the correct detection of a superblock when accessing the filesystem. It has also been noticed that the selection of a superblock differs between the kernel and libstand with respect to the checkpoint number. The patch is attached. Comments will be appreciated.
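[For readers who would rather not decode the base64 attachment: the heart of the change, reconstructed as a sketch from the description above and from the follow-ups later in this thread (not the verbatim diff), is to start scanning at a computed index s rather than advancing the sb buffer pointer past the fsdata area, and to elect the superblock with the highest checkpoint number, as the kernel does, instead of the one with the newest write time:

	for (j = s; j < n; j++) {
		/* skip entries whose CRC does not verify */
		if (!nandfs_check_superblock_crc(fs->nf_fsdata, &sb[j]))
			continue;
		/* prefer the highest s_last_cno (the kernel's criterion), not s_wtime */
		if (sb[j].s_last_cno > fs->nf_sb->s_last_cno)
			memcpy(fs->nf_sb, &sb[j], sizeof(*fs->nf_sb));
	}
]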
Greetings, Boris Astardzhiev / Smartcom Bulgaria AD --e0cb4efa6e7e2274bc04cb14c789 Content-Type: application/octet-stream; name="nand-contrib.diff" Content-Disposition: attachment; filename="nand-contrib.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_h7sz8fxe0 ZGlmZiAtLWdpdCBhL2xpYi9saWJzdGFuZC9uYW5kZnMuYyBiL2xpYi9saWJzdGFuZC9uYW5kZnMu YwppbmRleCA2N2UyZmVhLi5lZjZlNmZjIDEwMDY0NAotLS0gYS9saWIvbGlic3RhbmQvbmFuZGZz LmMKKysrIGIvbGliL2xpYnN0YW5kL25hbmRmcy5jCkBAIC0xNzUsNyArMTc1LDcgQEAgc3RhdGlj IGludAogbmFuZGZzX2ZpbmRfc3VwZXJfYmxvY2soc3RydWN0IG5hbmRmcyAqZnMsIHN0cnVjdCBv cGVuX2ZpbGUgKmYpCiB7CiAJc3RydWN0IG5hbmRmc19zdXBlcl9ibG9jayAqc2I7Ci0JaW50IGks IGosIG47CisJaW50IGksIGosIG4sIHM7CiAJaW50IHNlY3RvcnNfdG9fcmVhZCwgZXJyb3I7CiAK IAlzYiA9IG1hbGxvYyhmcy0+bmZfc2VjdG9yc2l6ZSk7CkBAIC0xOTYsMjMgKzE5NiwyOCBAQCBu YW5kZnNfZmluZF9zdXBlcl9ibG9jayhzdHJ1Y3QgbmFuZGZzICpmcywgc3RydWN0IG9wZW5fZmls ZSAqZikKIAkJCWNvbnRpbnVlOwogCQl9CiAJCW4gPSBmcy0+bmZfc2VjdG9yc2l6ZSAvIHNpemVv ZihzdHJ1Y3QgbmFuZGZzX3N1cGVyX2Jsb2NrKTsKKwkJcyA9IDA7CiAJCWlmICgoaSAqIGZzLT5u Zl9zZWN0b3JzaXplKSAlIGZzLT5uZl9mc2RhdGEtPmZfZXJhc2VzaXplID09IDApIHsKIAkJCWlm IChmcy0+bmZfc2VjdG9yc2l6ZSA9PSBzaXplb2Yoc3RydWN0IG5hbmRmc19mc2RhdGEpKQogCQkJ CWNvbnRpbnVlOwogCQkJZWxzZSB7CisJCQkJcyArPSAoc2l6ZW9mKHN0cnVjdCBuYW5kZnNfZnNk YXRhKSAvCisJCQkJICAgIHNpemVvZihzdHJ1Y3QgbmFuZGZzX3N1cGVyX2Jsb2NrKSk7CisjaWYg MAogCQkJCXNiICs9IChzaXplb2Yoc3RydWN0IG5hbmRmc19mc2RhdGEpIC8KIAkJCQkgICAgc2l6 ZW9mKHN0cnVjdCBuYW5kZnNfc3VwZXJfYmxvY2spKTsKIAkJCQluIC09IChzaXplb2Yoc3RydWN0 IG5hbmRmc19mc2RhdGEpIC8KIAkJCQkgICAgc2l6ZW9mKHN0cnVjdCBuYW5kZnNfc3VwZXJfYmxv Y2spKTsKKyNlbmRpZgogCQkJfQogCQl9CiAKLQkJZm9yIChqID0gMDsgaiA8IG47IGorKykgewor CQlmb3IgKGogPSBzOyBqIDwgbjsgaisrKSB7CiAJCQlpZiAoIW5hbmRmc19jaGVja19zdXBlcmJs b2NrX2NyYyhmcy0+bmZfZnNkYXRhLCAmc2Jbal0pKQogCQkJCWNvbnRpbnVlOwotCQkJTkFOREZT X0RFQlVHKCJtYWdpYyAleCB3dGltZSAlamRcbiIsIHNiLT5zX21hZ2ljLAotCQkJICAgIHNiLT5z X3d0aW1lKTsKLQkJCWlmIChzYltqXS5zX3d0aW1lID4gZnMtPm5mX3NiLT5zX3d0aW1lKQorCQkJ TkFOREZTX0RFQlVHKCJtYWdpYyAleCB3dGltZSAlamQsIGxhc3RjcCAweCVqeFxuIiwKKwkJCSAg ICBzYltqXS5zX21hZ2ljLCBzYltqXS5zX3d0aW1lLCBzYltqXS5zX2xhc3RfY25vKTsKKwkJCWlm IChzYltqXS5zX2xhc3RfY25vID4gZnMtPm5mX3NiLT5zX2xhc3RfY25vKQogCQkJCW1lbWNweShm cy0+bmZfc2IsICZzYltqXSwgc2l6ZW9mKCpmcy0+bmZfc2IpKTsKIAkJfQogCX0K --e0cb4efa6e7e2274bc04cb14c789-- From owner-freebsd-fs@FreeBSD.ORG Tue Oct 2 15:07:10 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3BCA9106566C for ; Tue, 2 Oct 2012 15:07:10 +0000 (UTC) (envelope-from boris.astardzhiev@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 9AF988FC08 for ; Tue, 2 Oct 2012 15:07:09 +0000 (UTC) Received: by lbdb5 with SMTP id b5so6454156lbd.13 for ; Tue, 02 Oct 2012 08:07:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:cc:content-type; bh=BzcGuuxEVCMfTq8F3b8IYlKCFdJjqeup8U4hf5MHu9Q=; b=ne1ddnW9+KU34vlzE5n+gxSDfmFsnClkQCn5J6oq9Jc1WKDkjA0j+6fKzOVBJOb6p9 3tyeQet0AB3GSBsTM8UyhAEigZTqtWRJUvIGO9ux5MseL8HckynV7M3hKodWcQzU5LyQ F+z0bKFreeTv8r6Ze+NQ0zpXmlAr3q4+STSTYTjinxTwqnst+IGC2wk5zS2mgIbqsEJL 7cMMbN7TuJ2PHqJiMfDmPxeG0nvJWk1g7LVtzX2FF1+a628lFsLyb7Ji5ybz54HMNVbb ymU0EVyR/835EPJrp5D95PWE1xQGnAZBWat5/Xhd5+wCbL0Vki487nXlHaJlnZBtVDoL iySg== MIME-Version: 1.0 Received: by 10.112.26.10 with SMTP id h10mr682211lbg.4.1349190428214; Tue, 02 Oct 2012 08:07:08 -0700 
(PDT) Received: by 10.112.108.1 with HTTP; Tue, 2 Oct 2012 08:07:07 -0700 (PDT) Date: Tue, 2 Oct 2012 18:07:07 +0300 Message-ID: From: Boris Astardzhiev To: freebsd-fs@freebsd.org Content-Type: multipart/mixed; boundary=bcaec553ffc03546c104cb14e079 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: stanislav_galabov@smartcom.bg Subject: CRC32 feature in FreeBSD's boot loader X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Oct 2012 15:07:10 -0000

--bcaec553ffc03546c104cb14e079 Content-Type: text/plain; charset=ISO-8859-1

Hello, I'm not sure that this is the right mailing list, but I couldn't find a dedicated one. I would like to contribute a new feature for the FreeBSD boot loader - a command that calculates the CRC32 of a specified file. It uses libz's CRC32 implementation. While attempting to make libstand's nandfs work adequately I experienced NAND flash page/block misreadings from U-Boot's API side. I therefore needed a tool that could prove that a file had been read correctly at this very stage of the FreeBSD boot process. So here it is. In addition to the CRC calculation, the size of the specified file is printed as well. Any comments will be appreciated.

Greetings, Boris Astardzhiev / Smartcom Bulgaria AD

--bcaec553ffc03546c104cb14e079 Content-Type: application/octet-stream; name="loader-crc32.diff" Content-Disposition: attachment; filename="loader-crc32.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_h7t4ky6b2
ZGlmZiAtLWdpdCBhL2xpYi9saWJzdGFuZC9NYWtlZmlsZSBiL2xpYi9saWJzdGFuZC9NYWtlZmls
ZQppbmRleCBhMjZjNTFkLi4wOTVmOTYyIDEwMDY0NAotLS0gYS9saWIvbGlic3RhbmQvTWFrZWZp
bGUKKysrIGIvbGliL2xpYnN0YW5kL01ha2VmaWxlCkBAIC0zOSw5ICszOSw5IEBAIENGTEFHUys9
CS1tc29mdC1mbG9hdCAtRF9TVEFOREFMT05FCiAuZW5kaWYKIAogIyBzdGFuZGFsb25lIGNvbXBv
bmVudHMgYW5kIHN0dWZmIHdlIGhhdmUgbW9kaWZpZWQgbG9jYWxseQotU1JDUys9CXp1dGlsLmgg
X19tYWluLmMgYXNzZXJ0LmMgYmNkLmMgYnN3YXAuYyBlbnZpcm9ubWVudC5jIGdldG9wdC5jIGdl
dHMuYyBcCi0JZ2xvYmFscy5jIHBhZ2VyLmMgcHJpbnRmLmMgc3RyZHVwLmMgc3RyZXJyb3IuYyBz
dHJ0b2wuYyByYW5kb20uYyBcCi0Jc2Jyay5jIHR3aWRkbGUuYyB6YWxsb2MuYyB6YWxsb2NfbWFs
bG9jLmMKK1NSQ1MrPQl6dXRpbC5oIF9fbWFpbi5jIGFzc2VydC5jIGJjZC5jIGJzd2FwLmMgY3Jj
LmMgZW52aXJvbm1lbnQuYyBnZXRvcHQuYyBcCisJZ2V0cy5jIGdsb2JhbHMuYyBwYWdlci5jIHBy
aW50Zi5jIHN0cmR1cC5jIHN0cmVycm9yLmMgc3RydG9sLmMgXAorCXJhbmRvbS5jIHNicmsuYyB0
d2lkZGxlLmMgemFsbG9jLmMgemFsbG9jX21hbGxvYy5jCiAKICMgcHJpdmF0ZSAocHJ1bmVkKSB2
ZXJzaW9ucyBvZiBsaWJjIHN0cmluZyBmdW5jdGlvbnMKIFNSQ1MrPQlzdHJjYXNlY21wLmMKZGlm
ZiAtLWdpdCBhL2xpYi9saWJzdGFuZC9jcmMuYyBiL2xpYi9saWJzdGFuZC9jcmMuYwpuZXcgZmls
ZSBtb2RlIDEwMDY0NAppbmRleCAwMDAwMDAwLi4zMzFkMzkwCi0tLSAvZGV2L251bGwKKysrIGIv
bGliL2xpYnN0YW5kL2NyYy5jCkBAIC0wLDAgKzEsMTI1IEBACisvKi0KKyAqIENvcHlyaWdodCAo
YykgMjAxMiBCb3JpcyBBc3RhcmR6aGlldiAvIFNtYXJ0Y29tIEJ1bGdhcmlhIEFECisgKiBBbGwg
cmlnaHRzIHJlc2VydmVkLgorICoKKyAqIFJlZGlzdHJpYnV0aW9uIGFuZCB1c2UgaW4gc291cmNl
IGFuZCBiaW5hcnkgZm9ybXMsIHdpdGggb3Igd2l0aG91dAorICogbW9kaWZpY2F0aW9uLCBhcmUg
cGVybWl0dGVkIHByb3ZpZGVkIHRoYXQgdGhlIGZvbGxvd2luZyBjb25kaXRpb25zCisgKiBhcmUg
bWV0OgorICogMS4gUmVkaXN0cmlidXRpb25zIG9mIHNvdXJjZSBjb2RlIG11c3QgcmV0YWluIHRo
ZSBhYm92ZSBjb3B5cmlnaHQKKyAqICAgIG5vdGljZSwgdGhpcyBsaXN0IG9mIGNvbmRpdGlvbnMg
YW5kIHRoZSBmb2xsb3dpbmcgZGlzY2xhaW1lci4KKyAqIDIuIFJlZGlzdHJpYnV0aW9ucyBpbiBi
aW5hcnkgZm9ybSBtdXN0IHJlcHJvZHVjZSB0aGUgYWJvdmUgY29weXJpZ2h0CisgKiAgICBub3Rp
Y2UsIHRoaXMgbGlzdCBvZiBjb25kaXRpb25zIGFuZCB0aGUgZm9sbG93aW5nIGRpc2NsYWltZXIg
aW4gdGhlCisgKiAgICBkb2N1bWVudGF0aW9uIGFuZC9vciBvdGhlciBtYXRlcmlhbHMgcHJvdmlk ZWQgd2l0aCB0aGUgZGlzdHJpYnV0aW9uLgorICoKKyAqIFRISVMgU09GVFdBUkUgSVMgUFJPVklE RUQgQlkgVEhFIEFVVEhPUiBBTkQgQ09OVFJJQlVUT1JTIGBgQVMgSVMnJyBBTkQKKyAqIEFOWSBF WFBSRVNTIE9SIElNUExJRUQgV0FSUkFOVElFUywgSU5DTFVESU5HLCBCVVQgTk9UIExJTUlURUQg VE8sIFRIRQorICogSU1QTElFRCBXQVJSQU5USUVTIE9GIE1FUkNIQU5UQUJJTElUWSBBTkQgRklU TkVTUyBGT1IgQSBQQVJUSUNVTEFSIFBVUlBPU0UKKyAqIEFSRSBESVNDTEFJTUVELiAgSU4gTk8g RVZFTlQgU0hBTEwgVEhFIEFVVEhPUiBPUiBDT05UUklCVVRPUlMgQkUgTElBQkxFCisgKiBGT1Ig QU5ZIERJUkVDVCwgSU5ESVJFQ1QsIElOQ0lERU5UQUwsIFNQRUNJQUwsIEVYRU1QTEFSWSwgT1Ig Q09OU0VRVUVOVElBTAorICogREFNQUdFUyAoSU5DTFVESU5HLCBCVVQgTk9UIExJTUlURUQgVE8s IFBST0NVUkVNRU5UIE9GIFNVQlNUSVRVVEUgR09PRFMKKyAqIE9SIFNFUlZJQ0VTOyBMT1NTIE9G IFVTRSwgREFUQSwgT1IgUFJPRklUUzsgT1IgQlVTSU5FU1MgSU5URVJSVVBUSU9OKQorICogSE9X RVZFUiBDQVVTRUQgQU5EIE9OIEFOWSBUSEVPUlkgT0YgTElBQklMSVRZLCBXSEVUSEVSIElOIENP TlRSQUNULCBTVFJJQ1QKKyAqIExJQUJJTElUWSwgT1IgVE9SVCAoSU5DTFVESU5HIE5FR0xJR0VO Q0UgT1IgT1RIRVJXSVNFKSBBUklTSU5HIElOIEFOWSBXQVkKKyAqIE9VVCBPRiBUSEUgVVNFIE9G IFRISVMgU09GVFdBUkUsIEVWRU4gSUYgQURWSVNFRCBPRiBUSEUgUE9TU0lCSUxJVFkgT0YKKyAq IFNVQ0ggREFNQUdFLgorICovCisKKy8qCisgKiBTaW1wbGUgQ1JDIGNhbGN1bGF0aW9uIG9mIGEg ZmlsZS4KKyAqLworCisjaW5jbHVkZSA8c3lzL2NkZWZzLmg+CisjaW5jbHVkZSAic3RhbmQuaCIK KyNpbmNsdWRlIDxzdHJpbmcuaD4KKyNpbmNsdWRlICIuLi9saWJ6L3psaWIuaCIKKworLyoKKyAq IERpc3BsYXkgY2hlY2tzdW0uCisgKi8KKworaW50CitjcmMzMl9maWxlKGNvbnN0IGNoYXIgKmZu YW1lKQoreworCXVuc2lnbmVkIGxvbmcgCWNyYyA9IDBVTDsKKwljaGFyIAkJYnVmWzgwXTsKKwlz aXplX3QgCQlieXRlczsKKwlpbnQJCWZkOworCWludAkJcmVzOworCW9mZl90IAkJZW5kX29mZjsK KwlzdHJ1Y3QJc3RhdCAJc3Q7CisKKwlyZXMgPSAwOworCWZkID0gb3BlbihmbmFtZSwgT19SRE9O TFkpOworCWlmIChmZCA9PSAtMSkgeworCQlwcmludGYoImNhbid0IG9wZW4gJyVzJzogJXNcbiIs IGZuYW1lLCBzdHJlcnJvcihlcnJubykpOworCQlyZXMgPSAtMTsKKwkJcmV0dXJuIChyZXMpOwor CX0KKwkKKwkvKgorCSAqIENoZWNrIGlmIGl0IGlzIGEgcmVndWxhciBmaWxlLgorCSAqLworCW1l bXNldCgmc3QsIDAsIHNpemVvZihzdCkpOworCWlmIChmc3RhdChmZCwgJnN0KSA9PSAtMSkgewor CQlwcmludGYoImNhbid0IGdldCBzdGF0aXN0aWNzIG9mICclcyc6ICVzXG4iLCBmbmFtZSwKKwkJ ICAgIHN0cmVycm9yKGVycm5vKSk7CisJCWNsb3NlKGZkKTsKKwkJcmVzID0gLTE7CisJCXJldHVy biAocmVzKTsKKwl9CisKKwlpZiAoIShTX0lTUkVHKHN0LnN0X21vZGUpKSkgeworCQlwcmludGYo IiclcycgaXMgbm90IGEgcmVndWxhciBmaWxlXG4iLCBmbmFtZSk7CisJCWNsb3NlKGZkKTsKKwkJ cmVzID0gLTE7CisJCXJldHVybiAocmVzKTsKKwl9CisKKwkvKiAKKwkgKiBHcmFiIGZpbGUgc2l6 ZS4gCisJICovCQorCWVuZF9vZmYgPSBsc2VlayhmZCwgMCwgU0VFS19FTkQpOworCWlmIChlbmRf b2ZmID09IC0xKSB7CisJCXByaW50ZigiY2FuJ3QgZ2V0IGVuZCBvZiAnJXMnOiAlc1xuIiwgZm5h bWUsIHN0cmVycm9yKGVycm5vKSk7CisJCWNsb3NlKGZkKTsKKwkJcmVzID0gLTE7CisJCXJldHVy biAocmVzKTsKKwl9CisJCisJaWYgKGxzZWVrKGZkLCAwLCBTRUVLX1NFVCkgPT0gLTEpIHsKKwkJ cHJpbnRmKCJjYW4ndCBzZXQgb2Zmc2V0IHRvICclcyc6ICVzXG4iLCBmbmFtZSwKKwkJICAgIHN0 cmVycm9yKGVycm5vKSk7CisJCWNsb3NlKGZkKTsKKwkJcmVzID0gLTE7CisJCXJldHVybiAocmVz KTsKKwl9CisKKwkvKiAKKwkgKiBDYWxjdWxhdGUgY2hlY2tzdW0uCisJICovCisJY3JjID0gY3Jj MzIoY3JjLCAwLCAwKTsKKwlmb3IgKDs7KSB7CisJCWJ5dGVzID0gcmVhZChmZCwgYnVmLCBzaXpl b2YoYnVmKSk7CisJCWlmIChieXRlcyA8IDApIHsKKwkJCXJlcyA9IC0xOworCQkJYnJlYWs7CisJ CX0KKworCQlpZiAoIWJ5dGVzKQorCQkJYnJlYWs7CisKKwkJY3JjID0gY3JjMzIoY3JjLCBidWYs IGJ5dGVzKTsKKwl9CisKKwlpZiAoIXJlcykgeworCQlwcmludGYoImZpbGU6ICAgJXNcbiIsIGZu YW1lKTsKKwkJcHJpbnRmKCJcdCBzaXplOiAlbGx1XG4iLCBlbmRfb2ZmKTsKKwkJcHJpbnRmKCJc dGNyYzMyOiAweCUwOGx4XG4iLCBjcmMpOworCX0gZWxzZQorCQlwcmludGYoImNhbid0IGNhbGN1 bGF0ZSBjcmMzMiBvZiAnJXMnXG4iLCBmbmFtZSk7CisKKwljbG9zZShmZCk7CisKKwlyZXR1cm4g KHJlcyk7Cit9CisKZGlmZiAtLWdpdCBhL2xpYi9saWJzdGFuZC9zdGFuZC5oIGIvbGliL2xpYnN0 
YW5kL3N0YW5kLmgKaW5kZXggMjBiNzE3ZS4uMTE5YzY5NiAxMDA2NDQKLS0tIGEvbGliL2xpYnN0 YW5kL3N0YW5kLmgKKysrIGIvbGliL2xpYnN0YW5kL3N0YW5kLmgKQEAgLTI2Nyw2ICsyNjcsOSBA QCBleHRlcm4gY2hhcgkqb3B0YXJnOwkJCS8qIGdldG9wdCgzKSBleHRlcm5hbCB2YXJpYWJsZXMg Ki8KIGV4dGVybiBpbnQJb3B0aW5kLCBvcHRlcnIsIG9wdG9wdCwgb3B0cmVzZXQ7CiBleHRlcm4g aW50CWdldG9wdChpbnQsIGNoYXIgKiBjb25zdCBbXSwgY29uc3QgY2hhciAqKTsKIAorLyogY3Jj LmMgKi8KK2V4dGVybiBpbnQJY3JjMzJfZmlsZShjb25zdCBjaGFyICpmbmFtZSk7CisKIC8qIHBh Z2VyLmMgKi8KIGV4dGVybiB2b2lkCXBhZ2VyX29wZW4odm9pZCk7CiBleHRlcm4gdm9pZAlwYWdl cl9jbG9zZSh2b2lkKTsKZGlmZiAtLWdpdCBhL3N5cy9ib290L2NvbW1vbi9jb21tYW5kcy5jIGIv c3lzL2Jvb3QvY29tbW9uL2NvbW1hbmRzLmMKaW5kZXggYjRmZTExOC4uZmJiZWM2NiAxMDA2NDQK LS0tIGEvc3lzL2Jvb3QvY29tbW9uL2NvbW1hbmRzLmMKKysrIGIvc3lzL2Jvb3QvY29tbW9uL2Nv bW1hbmRzLmMKQEAgLTQ5NiwzICs0OTYsMjAgQEAgY29tbWFuZF9sc2RldihpbnQgYXJnYywgY2hh ciAqYXJndltdKQogICAgIHJldHVybihDTURfT0spOwogfQogCisvKgorICogQ2FsY3VsYXRlIENS QzMyIG9mIGEgZmlsZS4KKyAqLworQ09NTUFORF9TRVQoY3JjMzIsICJjcmMzMiIsICJjYWxjdWxh dGUgY3JjMzIgb2YgYSBmaWxlIiwgY29tbWFuZF9jcmMzMik7CisKK3N0YXRpYyBpbnQKK2NvbW1h bmRfY3JjMzIoaW50IGFyZ2MsIGNoYXIgKmFyZ3ZbXSkKK3sKKyAgICBpbnQgaSwgcmVzOworCisg ICAgcmVzID0gMDsKKyAgICBmb3IgKGkgPSAxOyAoaSA8IGFyZ2MpICYmICFyZXM7IGkrKykKKyAg ICAJcmVzIHw9IGNyYzMyX2ZpbGUoYXJndltpXSk7CisKKyAgICByZXR1cm4gKCFyZXMgPyAoQ01E X09LKSA6IChDTURfRVJST1IpKTsKK30KKwpkaWZmIC0tZ2l0IGEvc3lzL2Jvb3QvY29tbW9uL2xv YWRlci44IGIvc3lzL2Jvb3QvY29tbW9uL2xvYWRlci44CmluZGV4IDI1ZDIyNzAuLjk1Y2M0N2Ig MTAwNjQ0Ci0tLSBhL3N5cy9ib290L2NvbW1vbi9sb2FkZXIuOAorKysgYi9zeXMvYm9vdC9jb21t b24vbG9hZGVyLjgKQEAgLTE3NCw2ICsxNzQsOCBAQCBUaGUgYmVoYXZpb3Igb2YgdGhpcyBidWls dGluIGlzIGNoYW5nZWQgaWYKIC5YciBsb2FkZXIuNHRoIDgKIGlzIGxvYWRlZC4KIC5QcAorLkl0 IEljIGNyYzMyCitDYWxjdWxhdGUgQ1JDMzIgb2YgYSBzcGVjaWZpZWQgZmlsZS4KIC5JdCBJYyBl Y2hvIFhvCiAuT3AgRmwgbgogLk9wIEFxIG1lc3NhZ2UK --bcaec553ffc03546c104cb14e079-- From owner-freebsd-fs@FreeBSD.ORG Tue Oct 2 17:48:04 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7610E1065670 for ; Tue, 2 Oct 2012 17:48:04 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id E964D8FC17 for ; Tue, 2 Oct 2012 17:48:03 +0000 (UTC) Received: by lbdb5 with SMTP id b5so6659802lbd.13 for ; Tue, 02 Oct 2012 10:48:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=23RlEHIbbdYiM+g4XE4uo7avb+PbMQBo5f+kym5Ndco=; b=dZEk12y2bszvufc8TtsWvVcPJXYIuC57clamegnfOFuP6ibRnzz8a9mnFyhMRT9HdN WzpAY3dfDUFvebkFcIrcKQXlIeSU5uE16UKBgCcQYX3exhSR/K8sSiJivtFauryJr8ke SJ0RsbVRmCfmU7RdTeNO2dyqvcW2g1ZEELTJXzdZgOeLSKEFea50XbNYa+yLIoNSuONA OTCcIDNOWXsMniMkEBaeuZ8ACBK2mVUCk9DKWA2F/He/v9unEwKjOsKEcbGBjBquYtTZ +WT7XiJb4A7KWXoHFmteDvH46+R7t4ULZUzWt9NFTiRF0izzNnbEU0iZgLF5FUcjTyWk 9TWw== MIME-Version: 1.0 Received: by 10.112.37.7 with SMTP id u7mr830321lbj.30.1349200082463; Tue, 02 Oct 2012 10:48:02 -0700 (PDT) Received: by 10.114.23.230 with HTTP; Tue, 2 Oct 2012 10:48:02 -0700 (PDT) In-Reply-To: References: Date: Tue, 2 Oct 2012 10:48:02 -0700 Message-ID: From: Freddie Cash To: FreeBSD Filesystems Content-Type: text/plain; charset=UTF-8 Subject: Re: ZFS background/async destroy feature: MFC timeframe? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Oct 2012 17:48:04 -0000

On Mon, Oct 1, 2012 at 10:28 AM, Freddie Cash wrote:
> Just wondering when the zfs background/async destroy feature is
> planned to be MFC'd? Is the 9.1 release process holding it up?
>
> Is there any way to manually patch it into 9-STABLE?
>
> We have a system that is trying to destroy a 1 TB temp filesystem on import
> of the pool, which is taking way too long to do (over 3 days so far).
> Was hoping to test out the background destroy to see if it would work
> in this situation.

The commits in question are the following:

[1] r236884: zfs feature support, June 11, MFC after 1 month
[2] r238422: fixes for defer-destroy, July 13, MFC after 1 week
[3] r238926: fixes for zpool feature support, July 30, MFC after 2 weeks

There may be others, but those appear to be the main ones for enabling async snapshot destroy. The mailing list thread (ZFS new SPA) includes a link [4] to a directory containing a file called 9-stable-zfs-features.patch.gz, which appears to be what I want to test. Wondering if anyone has tested it yet.

[1] http://lists.freebsd.org/pipermail/svn-src-head/2012-June/037825.html
[2] http://lists.freebsd.org/pipermail/svn-src-head/2012-July/038766.html
[3] http://lists.freebsd.org/pipermail/svn-src-head/2012-July/039172.html
[4] http://people.freebsd.org/~mm/patches/zfs/

-- Freddie Cash fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 2 19:48:30 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AABB81065670 for ; Tue, 2 Oct 2012 19:48:30 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from mail.vx.sk (mail.vx.sk [176.9.45.25]) by mx1.freebsd.org (Postfix) with ESMTP id 64F148FC08 for ; Tue, 2 Oct 2012 19:48:28 +0000 (UTC) Received: from core.vx.sk (localhost [127.0.0.2]) by mail.vx.sk (Postfix) with ESMTP id 73ED349F7B for ; Tue, 2 Oct 2012 21:48:27 +0200 (CEST) X-Virus-Scanned: amavisd-new at mail.vx.sk Received: from mail.vx.sk by core.vx.sk (amavisd-new, unix socket) with LMTP id rjawus0BhSqF for ; Tue, 2 Oct 2012 21:48:22 +0200 (CEST) Received: from [10.9.8.1] (188-167-78-15.dynamic.chello.sk [188.167.78.15]) by mail.vx.sk (Postfix) with ESMTPSA id 8BB1349F69 for ; Tue, 2 Oct 2012 21:48:22 +0200 (CEST) Message-ID: <506B4508.3030406@FreeBSD.org> Date: Tue, 02 Oct 2012 21:48:24 +0200 From: Martin Matuska User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120907 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org X-Enigmail-Version: 1.4.4 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Subject: [CFT] ZFS feature flag support for 9-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Oct 2012 19:48:30 -0000

Hello all, ZFS feature flag support is ready to be merged to 9-STABLE. The scheduled merge date is shortly after 9.1-RELEASE.
Early adopters can test new features by applying the following patch (stable/9 r241135): http://people.freebsd.org/~mm/patches/zfs/9-stable-zfs-features.patch.gz Steps to apply to a clean checked-out source: cd /path/to/src patch -p0 < /path/to/9-stable-zfs-features.patch Alternatively you can download pre-compiled mfsBSD images for testing: Standard edition (amd64): http://mfsbsd.vx.sk/files/testing/9-stable-zfs-features.iso Special edition with installation file (amd64): http://mfsbsd.vx.sk/files/testing/9-stable-se-zfs-features.iso Feedback and suggestions are welcome. -- Martin Matuska FreeBSD committer http://blog.vx.sk From owner-freebsd-fs@FreeBSD.ORG Tue Oct 2 20:11:11 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D12E3106564A; Tue, 2 Oct 2012 20:11:11 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-la0-f54.google.com (mail-la0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 239718FC08; Tue, 2 Oct 2012 20:11:10 +0000 (UTC) Received: by lage12 with SMTP id e12so3175742lag.13 for ; Tue, 02 Oct 2012 13:11:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=QH980adADfi4U0RhM6LYKZQDSkB4PABZq3LI/hrlzeY=; b=bjKpR4O3rfh55LhoMfzU/AWfngOJQa3ETBu94lUgH9OCeWST536MFEP6cH7QpmMnSU +h0f1k/7gTav+Tozy7Fl8AEQtYbExz37luw9e/ydzKfX19A9ocQFZub/+SbbmISBIuIS aOGaT8C985z0TJzSNtjN6XFAXC7Vs7K4TqEL87Ax2pjXRNm1phKCPku436wy+1lnHxE0 Exc8EvKy54MxFeQ49jhjSCJCkMvi1tLN47aBABRPunUE/PnSvlwzmGVz4Aw3rpDENIUq 9XdFHXKjwFQfYdpoP9P2jijbuIME11YhmrZIzWUvKSJ6UBQ/3+lzS55Quj6sHyXtBvzZ mXCw== MIME-Version: 1.0 Received: by 10.112.27.228 with SMTP id w4mr936175lbg.118.1349208663876; Tue, 02 Oct 2012 13:11:03 -0700 (PDT) Received: by 10.114.23.230 with HTTP; Tue, 2 Oct 2012 13:11:03 -0700 (PDT) In-Reply-To: <506B4508.3030406@FreeBSD.org> References: <506B4508.3030406@FreeBSD.org> Date: Tue, 2 Oct 2012 13:11:03 -0700 Message-ID: From: Freddie Cash To: Martin Matuska Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: [CFT] ZFS feature flag support for 9-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Oct 2012 20:11:11 -0000 On Tue, Oct 2, 2012 at 12:48 PM, Martin Matuska wrote: > ZFS feature flag support is ready to be merged to 9-STABLE. > The scheduled merge date is short after 9.1-RELEASE. > > Early adopters can test new features by applying the following patch > (stable/9 r241135): > http://people.freebsd.org/~mm/patches/zfs/9-stable-zfs-features.patch.gz > > Steps to apply to a clean checked-out source: > cd /path/to/src > patch -p0 < /path/to/9-stable-zfs-features.patch > > Alternatively you can download pre-compiled mfsBSD images for testing: > > Standard edition (amd64): > http://mfsbsd.vx.sk/files/testing/9-stable-zfs-features.iso > > Special edition with installation file (amd64): > http://mfsbsd.vx.sk/files/testing/9-stable-se-zfs-features.iso > > Feedback and suggestions are welcome. THANK YOU!! :) Patch applied cleanly to source tree updated a few minutes ago. Buildworld in progress. I'll let you know if we run into any issues. 
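[A quick way for testers to confirm that a patched world really has feature-flag support -- standard zpool commands, shown here as a hedged example; the pool name "tank" is illustrative:

# zpool upgrade -v      # a feature-flag-aware build lists named features, not only legacy versions
# zpool upgrade tank    # one-way: the pool stops being readable by pre-feature-flag software
# zpool get all tank | grep feature@
tank  feature@async_destroy  enabled  local
]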
-- Freddie Cash fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 03:27:38 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 821) id 9E254106566B; Wed, 3 Oct 2012 03:27:38 +0000 (UTC) Date: Wed, 3 Oct 2012 03:27:38 +0000 From: John To: FreeBSD-FS Message-ID: <20121003032738.GA42140@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i Subject: ZFS/istgt lockup X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 03:27:38 -0000

Hi Folks, I've been chasing a problem that I'm not quite sure originates on the BSD side, but the system shouldn't lock up and require a power cycle to reboot.

The config: I have a BSD system running 9.1RC handing out a 36TB volume to a Linux RHEL 6.1 system. The RHEL 6.1 system is doing heavy I/O & number crunching. Many hours into the job stream the kernel becomes quite unhappy:

kernel: __ratelimit: 27665 callbacks suppressed
kernel: swapper: page allocation failure. order:1, mode:0x4020
kernel: Pid: 0, comm: swapper Tainted: G ---------------- T 2.6.32-131.0.15.el6.x86_64 #1
kernel: Call Trace:
kernel: [] ? __alloc_pages_nodemask+0x716/0x8b0
kernel: [] ? alloc_pages_current+0xaa/0x110
kernel: [] ? refill_fl+0x3d5/0x4a0 [cxgb3]
kernel: [] ? napi_frags_finish+0x6d/0xb0
kernel: [] ? process_responses+0x653/0x1450 [cxgb3]
kernel: [] ? ring_buffer_lock_reserve+0xa2/0x160
kernel: [] ? napi_rx_handler+0x3c/0x90 [cxgb3]
kernel: [] ? net_rx_action+0x103/0x2f0
kernel: [] ? __do_softirq+0xb7/0x1e0
kernel: [] ? handle_IRQ_event+0xf6/0x170
kernel: [] ? call_softirq+0x1c/0x30
kernel: [] ? do_softirq+0x65/0xa0
kernel: [] ? irq_exit+0x85/0x90
kernel: [] ? do_IRQ+0x75/0xf0
kernel: [] ? ret_from_intr+0x0/0x11
kernel: [] ? native_safe_halt+0xb/0x10
kernel: [] ? ftrace_raw_event_power_start+0x16/0x20
kernel: [] ? default_idle+0x4d/0xb0
kernel: [] ? cpu_idle+0xb6/0x110
kernel: [] ? start_secondary+0x202/0x245

On the BSD side, the istgt daemon appears to see that one of the connection threads is down and attempts to restart it. At this point, the istgt process size starts to grow:

USER PID %CPU %MEM     VSZ    RSS TT  STAT STARTED    TIME COMMAND
root 1224  0.0  0.4 8041092 405472 v0- DL   4:59PM 15:28.72 /usr/local/bin/istgt
root 1224  0.0  0.4 8041092 405472 v0- IL   4:59PM 63:18.34 /usr/local/bin/istgt
root 1224  0.0  0.4 8041092 405472 v0- IL   4:59PM 61:13.80 /usr/local/bin/istgt
root 1224  0.0  0.4 8041092 405472 v0- IL   4:59PM  0:00.00 /usr/local/bin/istgt

There are more than 1400 threads reported. Also of interest, netstat shows:

tcp4 0 0 10.59.6.12.5010 10.59.25.113.54076 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.33345 CLOSED
tcp4 0 0 10.59.6.12.5010 10.59.25.113.54074 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.33343 CLOSED
tcp4 0 0 10.59.6.12.5010 10.59.25.113.54072 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.33341 CLOSED
tcp4 0 0 10.60.6.12.5010 10.60.25.113.33339 CLOSED
tcp4 0 0 10.59.6.12.5010 10.59.25.113.54070 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.53806 CLOSE_WAIT

There are more than 1400 sockets in the CLOSE* state. What would prevent these sockets from cleaning up in a reasonable timeframe? Both sides of the mpio connection appear to be attempting reconnects. An attempt to gracefully kill istgt fails. A kill -9 does not clean things up either.
A procstat -kk 1224 after the kill -9 shows:

  PID    TID COMM             TDNAME           KSTACK
 1224 100959 istgt            sigthread        mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 zio_wait+0x61 dbuf_read+0x5e5 dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e zfs_dirent_lock+0x4ff zfs_dirlook+0x69 zfs_lookup+0x26b zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 vn_open_cred+0x3cb
 1224 100960 istgt            luthread #1      mi_switch+0x186 sleepq_wait+0x42 _sleep+0x376 bwait+0x64 physio+0x246 devfs_write_f+0x8d dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
 1224 103533 istgt            sendthread #1493 mi_switch+0x186 thread_suspend_switch+0xc9 thread_single+0x1b2 exit1+0x72 sigexit+0x7c postsig+0x3a4 ast+0x26c doreti_ast+0x1f

An attempt to forcefully export the pool hangs also. A procstat shows:

  PID    TID COMM             TDNAME           KSTACK
 4427 100991 zpool            -                mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 dbuf_read+0x30b dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e dsl_dir_open_spa+0x121 dsl_dataset_hold+0x3b dmu_objset_hold+0x23 zfs_ioc_objset_stats+0x2b zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x546 Xfast_syscall+0xf7

If anyone has any ideas, please let me know. I know I've left a lot of config information out in an attempt to keep the email shorter.

Random comments: This happens with or without multipathd enabled on the Linux client. If I catch the istgt daemon while it's creating threads and kill it, the system will not lock up. I see no errors in the istgt log file. One of my next things to try is to enable all debugging... The amount of debugging data captured is quite large :-( I am using Chelsio 10G cards on both client/server, which have been rock solid in all other cases.

Thoughts welcome!
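[One hedged suggestion, not advice from the original thread: sockets in CLOSE_WAIT are waiting for the local application (here the wedged istgt) to close them, so they never time out on their own. FreeBSD's tcpdrop(8) can discard such connections by the four-tuple that netstat prints, although with istgt unkillable the daemon may simply be left holding dead descriptors again:

# usage: tcpdrop local-addr local-port foreign-addr foreign-port
tcpdrop 10.59.6.12 5010 10.59.25.113 54076
]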
Thanks, John

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 04:03:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5B2DE106564A; Wed, 3 Oct 2012 04:03:17 +0000 (UTC) (envelope-from wollman@hergotha.csail.mit.edu) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) by mx1.freebsd.org (Postfix) with ESMTP id DF28F8FC08; Wed, 3 Oct 2012 04:03:16 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.5/8.14.5) with ESMTP id q9343FEk078081; Wed, 3 Oct 2012 00:03:15 -0400 (EDT) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.5/8.14.4/Submit) id q9343FuC078078; Wed, 3 Oct 2012 00:03:15 -0400 (EDT) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <20587.47363.504969.926603@hergotha.csail.mit.edu> Date: Wed, 3 Oct 2012 00:03:15 -0400 From: Garrett Wollman To: Rick Macklem In-Reply-To: <499414315.1544891.1349180909058.JavaMail.root@erie.cs.uoguelph.ca> References: <20586.27582.478147.838896@hergotha.csail.mit.edu> <499414315.1544891.1349180909058.JavaMail.root@erie.cs.uoguelph.ca> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (hergotha.csail.mit.edu [127.0.0.1]); Wed, 03 Oct 2012 00:03:15 -0400 (EDT) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org Subject: Re: NFS server bottlenecks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 04:03:17 -0000

[Adding freebsd-fs@ to the Cc list, which I neglected the first time around...]

< said:
> I can't remember (I am early retired now;-) if I mentioned this patch before:
> http://people.freebsd.org/~rmacklem/drc.patch
> It adds tunables vfs.nfsd.tcphighwater and vfs.nfsd.udphighwater that can
> be twiddled so that the drc is trimmed less frequently. By making these
> values larger, the trim will only happen once/sec until the high water
> mark is reached, instead of on every RPC. The tradeoff is that the DRC will
> become larger, but given memory sizes these days, that may be fine for you.

It will be a while before I have another server that isn't in production (it's on my deployment plan, but getting the production servers going is taking first priority). The approaches that I was going to look at:

Simplest: only do the cache trim once every N requests (for some reasonable value of N, e.g., 1000). Maybe keep track of the number of entries in each hash bucket and ignore those buckets that only have one entry even if it is stale.

Simple: just use a separate mutex for each list that a cache entry is on, rather than a global lock for everything. This would reduce the mutex contention, but I'm not sure how significantly since I don't have the means to measure it yet.

Moderately complicated: figure out if a different synchronization type can safely be used (e.g., rmlock instead of mutex) and do so.
More complicated: move all cache trimming to a separate thread and just have the rest of the code wake it up when the cache is getting too big (or just once a second, since that's easy to implement). Maybe just move all cache processing to a separate thread.

It's pretty clear from the profile that the cache mutex is heavily contended, so anything that reduces the length of time it's held is probably a win.

That URL again, for the benefit of people on freebsd-fs who didn't see it on hackers, is: >> . (This graph is slightly modified from my previous post, as I removed some spurious edges to make the formatting look better. Still looking for a way to get a profile that includes all kernel modules with the kernel.)

-GAWollman

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 05:20:16 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DB438106566B for ; Wed, 3 Oct 2012 05:20:16 +0000 (UTC) (envelope-from unlateral@gmail.com) Received: from mail-ob0-f182.google.com (mail-ob0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 9CCFF8FC0C for ; Wed, 3 Oct 2012 05:20:16 +0000 (UTC) Received: by obbwc20 with SMTP id wc20so7696259obb.13 for ; Tue, 02 Oct 2012 22:20:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=zMCH5RcL/HUkMfYFqyVYEhTaeWTSxdLabN5Sim62BPE=; b=Ehe4q7LYE9TJPHhoRoTy3uHeC3bLchkr5JMCE6fDpNVd6kQdeeuqZdPzC3ugZ/RfJE WelcWQ+Ni9OzGKezXAJjnzmo8RcFU1lJGxi1//xxT+auvd2NS/O0MT4+PBdP8DlNV6vq 1fYEsjp9eOWeYOKdFmlzkQ3l+sW8R6kj3uivDy3uNvJVuQzqgLTPXBsaK2F82et4SzDi SDltHXvTpzhXOG7meSa3PlpfeN+zbGdzJjeYPoa7WB+6gfS0tCZSNlX7QalETAxDGMkQ fRxQesEGPd0En4YXbwwYHun5Zl5x/c2Mk2R/EHPWLgsEtQuHM8mXHEcAKtDSsPsJAgLR 671w== MIME-Version: 1.0 Received: by 10.182.51.65 with SMTP id i1mr670205obo.45.1349241614929; Tue, 02 Oct 2012 22:20:14 -0700 (PDT) Received: by 10.76.168.40 with HTTP; Tue, 2 Oct 2012 22:20:14 -0700 (PDT) Date: Wed, 3 Oct 2012 01:20:14 -0400 Message-ID: From: Alan Gerber To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: Please help: trying to determine how to resurrect a ZFS pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 05:20:17 -0000

All, Apologies if I've sent this to the wrong list. I had a kernel panic take down a machine earlier this evening that has been running a ZFS pool stably since the feature first became available back in the 7.x days. Today, that system is running 8.3. I'm hoping for a pointer that will help me recover this pool, or at least some of the data from it. I'd certainly like to hear something other than "your pool is hosed!" ;)

Anyway, once the system came back online after the panic, ZFS showed that it had lost a number of devices:

hss01fs# zpool status
  pool: storage
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   UNAVAIL      0     0     0
          raidz1-0                ONLINE       0     0     0
            ad18                  ONLINE       0     0     0
            ad14                  ONLINE       0     0     0
            ad16                  ONLINE       0     0     0
          raidz1-2                UNAVAIL      0     0     0
            13538029832220131655  UNAVAIL      0     0     0  was /dev/da4
            7801765878003193608   UNAVAIL      0     0     0  was /dev/da6
            8205912151490430094   UNAVAIL      0     0     0  was /dev/da5
          raidz1-3                DEGRADED     0     0     0
            da0                   ONLINE       0     0     0
            da1                   ONLINE       0     0     0
            9503593162443292907   UNAVAIL      0     0     0  was /dev/da2

As you can see, the big problem is the loss of the raidz1-2 vdev. The catch is that all of the missing devices are in fact present on the system and fully operational. I've tried moving these devices to different physical drive slots, reseating the drives, zpool import -F, and everything else I can think of to make the four missing devices show up again. Inspecting the labels on the various devices shows what I would expect to see:

hss01fs# zdb -l /dev/da4
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 28
    name: 'storage'
    state: 0
    txg: 14350975
    pool_guid: 14645280560957485120
    hostid: 666199208
    hostname: 'hss01fs'
    top_guid: 11177432030203081903
    guid: 17379190273116326394
    hole_array[0]: 1
    vdev_children: 4
    vdev_tree:
        type: 'raidz'
        id: 2
        guid: 11177432030203081903
        nparity: 1
        metaslab_array: 4097
        metaslab_shift: 32
        ashift: 9
        asize: 750163329024
        is_log: 0
        create_txg: 11918593
        children[0]:
            type: 'disk'
            id: 0
            guid: 4427378272884026385
            path: '/dev/da7'
            phys_path: '/dev/da7'
            whole_disk: 1
            DTL: 4104
            create_txg: 11918593
        children[1]:
            type: 'disk'
            id: 1
            guid: 17379190273116326394
            path: '/dev/da6'
            phys_path: '/dev/da6'
            whole_disk: 1
            DTL: 4107
            create_txg: 11918593
        children[2]:
            type: 'disk'
            id: 2
            guid: 6091017181957750886
            path: '/dev/da3'
            phys_path: '/dev/da3'
            whole_disk: 1
            DTL: 4101
            create_txg: 11918593

[labels 1-3 with identical output values snipped]

If I look at one of the operational drives that ZFS recognizes, such as /dev/ad18, I see the same transaction group value present. I've done enough digging to realize that at this point the problem is likely that the GUID entries for each disk are not matching up with what ZFS is expecting from the disk. But I'm not sure what to do about it. If one of you fine folks could please point me in the direction of recovering this pool, I'd greatly appreciate it!
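[A hedged, general-purpose avenue -- an editorial sketch, not advice given in the thread: when the devices are present but the cached GUID-to-device mapping is stale, it sometimes helps to make ZFS re-taste every device node rather than trust /boot/zfs/zpool.cache:

hss01fs# zpool export storage        # may fail if the pool never imported; harmless
hss01fs# zpool import -d /dev        # scan every node in /dev and report what is found
hss01fs# zpool import -d /dev -f -F storage

Whether this helps here depends on why the labels' GUIDs no longer match what the pool expects.]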
-- Alan Gerber

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 07:43:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 59C29106564A for ; Wed, 3 Oct 2012 07:43:00 +0000 (UTC) (envelope-from gber@freebsd.org) Received: from smtp.semihalf.com (smtp.semihalf.com [213.17.239.109]) by mx1.freebsd.org (Postfix) with ESMTP id 0A9A18FC08 for ; Wed, 3 Oct 2012 07:42:59 +0000 (UTC) Received: from localhost (unknown [213.17.239.109]) by smtp.semihalf.com (Postfix) with ESMTP id 957A7119C6D; Wed, 3 Oct 2012 09:42:58 +0200 (CEST) X-Virus-Scanned: by amavisd-new at semihalf.com Received: from smtp.semihalf.com ([213.17.239.109]) by localhost (smtp.semihalf.com [213.17.239.109]) (amavisd-new, port 10024) with ESMTP id wWT9QSrWf4QB; Wed, 3 Oct 2012 09:42:58 +0200 (CEST) Received: from [10.0.0.93] (cardhu.semihalf.com [213.17.239.108]) by smtp.semihalf.com (Postfix) with ESMTPSA id F0511119C4F; Wed, 3 Oct 2012 09:42:57 +0200 (CEST) Message-ID: <506C082B.8000207@freebsd.org> Date: Wed, 03 Oct 2012 11:40:59 +0200 From: Grzegorz Bernacki User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.24) Gecko/20120127 Thunderbird/3.1.16 MIME-Version: 1.0 To: Boris Astardzhiev References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, stanislav_galabov@smartcom.bg Subject: Re: libstand's NANDFS superblock detection fix X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 07:43:00 -0000

On 10/02/12 17:00, Boris Astardzhiev wrote:
>> Hello,
>>
>> On behalf of Smartcom Bulgaria AD I would like to contribute a patch for
>> libstand's NANDFS support in FreeBSD.
>> It is related to the correct detection of a superblock when accessing
>> the filesystem. It has also been noticed that
>> the selection of a superblock differs between the kernel and libstand
>> with respect to the checkpoint number.
>> The patch is attached.
>>
>> Comments will be appreciated.
>>
>> Greetings,
>> Boris Astardzhiev / Smartcom Bulgaria AD
>>
>

Hello Boris,

The patch looks fine. Just remove the code under #if 0 and it can be submitted. Please let me know if you want me to do it, or you can do it yourself. Give me some more time and I will check the checkpoint number difference you mentioned.
Thanks a lot for your contribution, grzesiek

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 08:12:40 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CA939106564A for ; Wed, 3 Oct 2012 08:12:40 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 1EFB78FC08 for ; Wed, 3 Oct 2012 08:12:39 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA02515; Wed, 03 Oct 2012 11:12:37 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TJK4K-000GRP-VC; Wed, 03 Oct 2012 11:12:37 +0300 Message-ID: <506BF372.1090208@FreeBSD.org> Date: Wed, 03 Oct 2012 11:12:34 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: Nikolay Denev References: <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> In-Reply-To: <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Cc: "" Subject: Re: nfs + zfs hangs on RELENG_9 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 08:12:40 -0000

on 02/10/2012 13:26 Nikolay Denev said the following:
> 7 100537 zfskern txg_thread_enter mi_switch+0x186 sleepq_wait+0x42
> _cv_wait+0x121 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336
> txg_sync_thread+0x136 fork_exit+0x11f fork_trampoline+0xe

From my past experience, threads stuck in zio_wait always meant an I/O operation stuck in a storage controller driver, controller firmware, etc. Not necessarily the case here, but a possibility.

Perhaps try 'camcontrol tags <device> -v' to see the state of the disk queues.

P.S. It would be nice if for debugging purposes we had some place in zio to record the bio that it depends upon. E.g. something like:

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
index 80d9336..75b2fcf 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
@@ -432,6 +432,7 @@ struct zio {
 #ifdef _KERNEL
 	/* FreeBSD only. */
 	struct ostask io_task;
+	void *io_bio;
 #endif
 };

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
index 7d146ff..36bb5ad 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
@@ -684,6 +684,7 @@ vdev_geom_io_intr(struct bio *bp)
 			vd->vdev_delayed_close = B_TRUE;
 		}
 	}
+	zio->io_bio = NULL;
 	g_destroy_bio(bp);
 	zio_interrupt(zio);
 }
@@ -732,6 +733,7 @@ sendreq:
 	}
 	bp = g_alloc_bio();
 	bp->bio_caller1 = zio;
+	zio->io_bio = bp;
 	switch (zio->io_type) {
 	case ZIO_TYPE_READ:
 	case ZIO_TYPE_WRITE:

Then, in a situation like yours, you could use kgdb, switch to the thread in zio_wait, go to the zio_wait frame and get the bio pointer from the zio. From there you could try to deduce what is going on with the I/O request.
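[With the io_bio field above in place, the kgdb inspection could look roughly like this -- a sketch only; the thread and frame numbers are illustrative, not taken from this thread:

(kgdb) tid 100537                      # switch to the thread stuck in zio_wait
(kgdb) bt                              # locate the zio_wait frame
(kgdb) frame 4                         # assuming zio_wait is frame 4
(kgdb) p zio->io_bio
(kgdb) p *(struct bio *)zio->io_bio    # inspect bio_cmd, bio_error, bio_driver1, ...
]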
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 08:42:07 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 31B72106566B; Wed, 3 Oct 2012 08:42:07 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E03888FC12; Wed, 3 Oct 2012 08:42:05 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA02709; Wed, 03 Oct 2012 11:42:04 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TJKWq-000GSv-3Y; Wed, 03 Oct 2012 11:42:04 +0300 Message-ID: <506BFA5B.9060103@FreeBSD.org> Date: Wed, 03 Oct 2012 11:42:03 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org References: <505DB4E6.8030407@smeets.im> <20120924224606.GE79077@ithaqua.etoilebsd.net> <20120925090840.GD35915@deviant.kiev.zoral.com.ua> <20120929154101.GK1402@garage.freebsd.pl> <20120930122403.GB35915@deviant.kiev.zoral.com.ua> In-Reply-To: <20120930122403.GB35915@deviant.kiev.zoral.com.ua> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Baptiste Daroussin , Florian Smeets , Pawel Jakub Dawidek Subject: Re: panic: _sx_xlock_hard: recursed on non-recursive sx zfsvfs->z_hold_mtx[i] @ ...cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 08:42:07 -0000 on 30/09/2012 15:24 Konstantin Belousov said the following: > The postponing of the reclaim when vnode reserve goes low to the vnlru > process does not solve anything, since you only change the recursion into > the deadlock. > > I discussed an approach for this issue with avg. Basic idea is presented in > the untested patch below. You can specify that some count of the free > vnodes must be present for some dynamic scope, started by > getnewvnode_reserve() function. While staying inside the reserved pool, > getnewvnode() calls would not recurse into vnlru(). The scope is finished > with getnewvnode_drop_reserve(). > > The getnewvnode_reserve() shall be called while no locks are held. > > What do you think ? 
Here is a patch that makes use of the getnewvnode_reserve API in ZFS: http://people.freebsd.org/~avg/zfs-getnewvnode.diff

-- Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 08:43:30 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BE67F1065670; Wed, 3 Oct 2012 08:43:30 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178]) by mx1.freebsd.org (Postfix) with ESMTP id 1F63B8FC15; Wed, 3 Oct 2012 08:43:29 +0000 (UTC) Received: by wibhr7 with SMTP id hr7so1606578wib.13 for ; Wed, 03 Oct 2012 01:43:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=UMBk0qjlOrS/0ppX0f0P6COS3GVDpop/STEVp/uqlgQ=; b=q0PCAoJ8AnYyK4jgz7mFKHttlJ206pDLsyMdeVDQkX624ofSunr/JpaxKstJpUQGtA mheHuJj0gFnl5URus2bm9YPRJxc9F6qXYCVOXanv9//BnGfEVSO3x5NxpwG9wVTYQYwk 57rBUZgUpFOL/RfmnVPBP4MADsrT8fWzzwxvg1fKiKnaIE+RWR64mbSwGh8bznCp5KHY pfxC6JOHi5Y38HDIyPPXAdM1OgpFjB7LsAOWbWG9RKHUGw7nG1jkZuzp72+auHOZCMPF zhrD58Ggn0VmA9L0UrKrsovSc4LCd2roAFEaGpgkPsD3M1qBYzOFKYotFgal8OAT8tUd Ttww== Received: by 10.180.79.100 with SMTP id i4mr27999207wix.12.1349253808228; Wed, 03 Oct 2012 01:43:28 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. [217.18.249.148]) by mx.google.com with ESMTPS id gg4sm7248461wib.6.2012.10.03.01.43.26 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 03 Oct 2012 01:43:26 -0700 (PDT) Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=windows-1252 From: Nikolay Denev In-Reply-To: <506BF372.1090208@FreeBSD.org> Date: Wed, 3 Oct 2012 11:43:23 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> <506BF372.1090208@FreeBSD.org> To: Andriy Gapon X-Mailer: Apple Mail (2.1498) Cc: "" Subject: Re: nfs + zfs hangs on RELENG_9 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 08:43:30 -0000

On Oct 3, 2012, at 11:12 AM, Andriy Gapon wrote:

> on 02/10/2012 13:26 Nikolay Denev said the following:
>> 7 100537 zfskern txg_thread_enter mi_switch+0x186 sleepq_wait+0x42
>> _cv_wait+0x121 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336
>> txg_sync_thread+0x136 fork_exit+0x11f fork_trampoline+0xe
>
> From my past experience, threads stuck in zio_wait always meant an I/O
> operation stuck in a storage controller driver, controller firmware, etc.
> Not necessarily the case here, but a possibility.
>
> Perhaps try 'camcontrol tags <device> -v' to see the state of the disk queues.
>

I'm using the mfi(4) driver, which does not seem to be under CAM, but I'm also running it with the following loader tunable: hw.mfi.max_cmds=254, which is an increase over the standard 128 tags, and maybe this could be the problem. I'll revert it now and retest.

> P.S.
> It would be nice if for debugging purposes we had some place in zio to record
> the bio that it depends upon.
> E.g. something like:
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> index 80d9336..75b2fcf 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
> @@ -432,6 +432,7 @@ struct zio {
> #ifdef _KERNEL
> 	/* FreeBSD only. */
> 	struct ostask io_task;
> +	void *io_bio;
> #endif
> };
>
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> index 7d146ff..36bb5ad 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -684,6 +684,7 @@ vdev_geom_io_intr(struct bio *bp)
> 			vd->vdev_delayed_close = B_TRUE;
> 		}
> 	}
> +	zio->io_bio = NULL;
> 	g_destroy_bio(bp);
> 	zio_interrupt(zio);
> }
> @@ -732,6 +733,7 @@ sendreq:
> 	}
> 	bp = g_alloc_bio();
> 	bp->bio_caller1 = zio;
> +	zio->io_bio = bp;
> 	switch (zio->io_type) {
> 	case ZIO_TYPE_READ:
> 	case ZIO_TYPE_WRITE:
>
> Then, in a situation like yours, you could use kgdb, switch to the thread in
> zio_wait, go to the zio_wait frame and get the bio pointer from the zio. From
> there you could try to deduce what is going on with the I/O request.

I'm rebuilding now with these patches and DDB/KDB enabled, and will try to get this information if it happens again.

> --
> Andriy Gapon

Thanks! Regards, Nikolay

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 08:49:35 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A7256106566B; Wed, 3 Oct 2012 08:49:35 +0000 (UTC) (envelope-from boris.astardzhiev@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id EC16C8FC1B; Wed, 3 Oct 2012 08:49:34 +0000 (UTC) Received: by lbdb5 with SMTP id b5so7372944lbd.13 for ; Wed, 03 Oct 2012 01:49:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=dv6trAOHhdtxQ/Y5n9mmRErosRosYtRb6xQSPiAkNjM=; b=axB+tdbQGUdRogFDQkrJ2Q4IgLyV7pSmMeswFCuopr2xwBLSRHO3RMdPodv8jP0VXA tIQ8NP8Fm3qEDYzIbrOw8Nb0JB2sg/TbjsssQfGfXNkQwRL+PeYO2k3ooyY4LOs7LpdU +ci1139ERYUBOBzpw0m2T3q3QQbPWowDD68ssR71uylxyy1al5oDo+w4DrzpIOFqpEsi 7viUVQdJUxRnQICLt7nCZXcadB6KOvfHj6w+koT7uN8Abll3B4yKk+DpQVAQ/Jp/NJnM SkFRI+ZYxddr8HGNwDT8Y03hClLqZANvyBhgXUofjlpT+8SIhc6snIO7O4qxUb/0IsE0 /wCQ== MIME-Version: 1.0 Received: by 10.152.113.165 with SMTP id iz5mr1087516lab.48.1349254173861; Wed, 03 Oct 2012 01:49:33 -0700 (PDT) Received: by 10.112.108.1 with HTTP; Wed, 3 Oct 2012 01:49:33 -0700 (PDT) In-Reply-To: <506C082B.8000207@freebsd.org> References: <506C082B.8000207@freebsd.org> Date: Wed, 3 Oct 2012 11:49:33 +0300 Message-ID: From: Boris Astardzhiev To: Grzegorz Bernacki Content-Type: multipart/mixed; boundary=f46d0408393bbea7c604cb23b7bd X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org, stanislav_galabov@smartcom.bg Subject: Re: libstand's NANDFS superblock detection fix X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 08:49:35 -0000

--f46d0408393bbea7c604cb23b7bd Content-Type: text/plain; charset=ISO-8859-1
Hello, About the checkpoint numbers: we have noticed that libstand's nandfs judges by sb->s_wtime, whereas the kernel does it by sb->s_last_cno (sys/fs/nandfs/nandfs_vfsops.c:1590). The patch is attached without the #if 0 part and is ready to be submitted. I don't have commit privileges, and I'm not sure what the right procedure is from here. Do I need to send a PR, or something else? I look forward to your reply.

Greetings, Boris

On Wed, Oct 3, 2012 at 12:40 PM, Grzegorz Bernacki wrote:
> On 10/02/12 17:00, Boris Astardzhiev wrote:
>
>> Hello,
>>
>> On behalf of Smartcom Bulgaria AD I would like to contribute a patch for
>> libstand's NANDFS support in FreeBSD.
>> It is related to the correct detection of a superblock when accessing
>> the filesystem. It has also been noticed that
>> the selection of a superblock differs between the kernel and libstand
>> with respect to the checkpoint number.
>> The patch is attached.
>>
>> Comments will be appreciated.
>>
>> Greetings,
>> Boris Astardzhiev / Smartcom Bulgaria AD
>
> Hello Boris,
>
> The patch looks fine. Just remove the code under #if 0 and it can be submitted.
> Please let me know if you want me to do it, or you can do it yourself.
> Give me some more time and I will check the checkpoint number difference you
> mentioned.
>
> Thanks a lot for your contribution,
> grzesiek

--f46d0408393bbea7c604cb23b7bd Content-Type: application/octet-stream; name="nand-sb.diff" Content-Disposition: attachment; filename="nand-sb.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_h7u62st40
ZGlmZiAtLWdpdCBhL2xpYi9saWJzdGFuZC9uYW5kZnMuYyBiL2xpYi9saWJzdGFuZC9uYW5kZnMu
YwppbmRleCA2N2UyZmVhLi5kNWZjYjlkIDEwMDY0NAotLS0gYS9saWIvbGlic3RhbmQvbmFuZGZz
LmMKKysrIGIvbGliL2xpYnN0YW5kL25hbmRmcy5jCkBAIC0xNzUsNyArMTc1LDcgQEAgc3RhdGlj
IGludAogbmFuZGZzX2ZpbmRfc3VwZXJfYmxvY2soc3RydWN0IG5hbmRmcyAqZnMsIHN0cnVjdCBv
cGVuX2ZpbGUgKmYpCiB7CiAJc3RydWN0IG5hbmRmc19zdXBlcl9ibG9jayAqc2I7Ci0JaW50IGks
IGosIG47CisJaW50IGksIGosIG4sIHM7CiAJaW50IHNlY3RvcnNfdG9fcmVhZCwgZXJyb3I7CiAK
IAlzYiA9IG1hbGxvYyhmcy0+bmZfc2VjdG9yc2l6ZSk7CkBAIC0xOTYsMjMgKzE5NiwyMiBAQCBu
YW5kZnNfZmluZF9zdXBlcl9ibG9jayhzdHJ1Y3QgbmFuZGZzICpmcywgc3RydWN0IG9wZW5fZmls
ZSAqZikKIAkJCWNvbnRpbnVlOwogCQl9CiAJCW4gPSBmcy0+bmZfc2VjdG9yc2l6ZSAvIHNpemVv
ZihzdHJ1Y3QgbmFuZGZzX3N1cGVyX2Jsb2NrKTsKKwkJcyA9IDA7CiAJCWlmICgoaSAqIGZzLT5u
Zl9zZWN0b3JzaXplKSAlIGZzLT5uZl9mc2RhdGEtPmZfZXJhc2VzaXplID09IDApIHsKIAkJCWlm
IChmcy0+bmZfc2VjdG9yc2l6ZSA9PSBzaXplb2Yoc3RydWN0IG5hbmRmc19mc2RhdGEpKQogCQkJ
CWNvbnRpbnVlOwogCQkJZWxzZSB7Ci0JCQkJc2IgKz0gKHNpemVvZihzdHJ1Y3QgbmFuZGZzX2Zz
ZGF0YSkgLwotCQkJCSAgICBzaXplb2Yoc3RydWN0IG5hbmRmc19zdXBlcl9ibG9jaykpOwotCQkJ
CW4gLT0gKHNpemVvZihzdHJ1Y3QgbmFuZGZzX2ZzZGF0YSkgLworCQkJCXMgKz0gKHNpemVvZihz
dHJ1Y3QgbmFuZGZzX2ZzZGF0YSkgLwogCQkJCSAgICBzaXplb2Yoc3RydWN0IG5hbmRmc19zdXBl
cl9ibG9jaykpOwogCQkJfQogCQl9CiAKLQkJZm9yIChqID0gMDsgaiA8IG47IGorKykgeworCQlm
b3IgKGogPSBzOyBqIDwgbjsgaisrKSB7CiAJCQlpZiAoIW5hbmRmc19jaGVja19zdXBlcmJsb2Nr
X2NyYyhmcy0+bmZfZnNkYXRhLCAmc2Jbal0pKQogCQkJCWNvbnRpbnVlOwotCQkJTkFOREZTX0RF
QlVHKCJtYWdpYyAleCB3dGltZSAlamRcbiIsIHNiLT5zX21hZ2ljLAotCQkJICAgIHNiLT5zX3d0
aW1lKTsKLQkJCWlmIChzYltqXS5zX3d0aW1lID4gZnMtPm5mX3NiLT5zX3d0aW1lKQorCQkJTkFO
REZTX0RFQlVHKCJtYWdpYyAleCB3dGltZSAlamQsIGxhc3RjcCAweCVqeFxuIiwKKwkJCSAgICBz
YltqXS5zX21hZ2ljLCBzYltqXS5zX3d0aW1lLCBzYltqXS5zX2xhc3RfY25vKTsKKwkJCWlmIChz
YltqXS5zX2xhc3RfY25vID4gZnMtPm5mX3NiLT5zX2xhc3RfY25vKQogCQkJCW1lbWNweShmcy0+
bmZfc2IsICZzYltqXSwgc2l6ZW9mKCpmcy0+bmZfc2IpKTsKIAkJfQogCX0K
--f46d0408393bbea7c604cb23b7bd--

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 08:53:03 2012
08:53:03 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AC450106564A; Wed, 3 Oct 2012 08:53:03 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (garage.dawidek.net [91.121.88.72]) by mx1.freebsd.org (Postfix) with ESMTP id 6F5E68FC0C; Wed, 3 Oct 2012 08:53:02 +0000 (UTC) Received: from localhost (dls125.neoplus.adsl.tpnet.pl [83.24.48.125]) by mail.dawidek.net (Postfix) with ESMTPSA id 8F1E8C31; Wed, 3 Oct 2012 10:51:50 +0200 (CEST) Date: Wed, 3 Oct 2012 10:53:27 +0200 From: Pawel Jakub Dawidek To: "Justin T. Gibbs" Message-ID: <20121003085326.GC1386@garage.freebsd.pl> References: <505DE715.8020806@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="s9fJI615cBHmzTOP" Content-Disposition: inline In-Reply-To: X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Andriy Gapon Subject: Re: zfs: allow to mount root from a pool not in zpool.cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 08:53:03 -0000 --s9fJI615cBHmzTOP Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sat, Sep 22, 2012 at 10:59:56PM -0600, Justin T. Gibbs wrote:
> On Sep 22, 2012, at 10:28 AM, Andriy Gapon wrote:
>
> > Currently FreeBSD ZFS kernel code doesn't allow to mount the root
> > filesystem on a pool that is not listed in zpool.cache, as only pools
> > from the cache are known to ZFS at that time.
>
> I've for some time been of the opinion that FreeBSD should only use
> the cache file for ZFS pools created from non-GEOM objects (i.e.
> files). GEOM tasting should be used to make the kernel aware of
> all pools, whether they be imported on the system, partial, or
> foreign. Even for pools created by files, the user land utilities
> should do nothing more than ask the kernel to "taste them". This
> would remove code duplicated in user land for this task (code that
> must be re-executed in kernel space for validation reasons anyway)
> and also help solve problems we've encountered at Spectra with races
> in fault event processing, spare management, and device arrivals and
> departures.
>
> So I'm excited by your work in this area and would encourage you
> to "think larger" than just trying to integrate root pool discovery
> with GEOM. Spectra may even be able to help in this work sometime
> in the near future.

GEOM tasting would most likely require rewriting the code heavily.

Also note that you can have pools in your system that do match your
hostid but that the user decided to keep exported, and such pools should
not be configured automatically. Not a huge problem probably, as there is
pool status somewhere in the metadata that we can use to see if the pool
is exported or not.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!
http://tupytaj.pl

--s9fJI615cBHmzTOP Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlBr/QYACgkQForvXbEpPzS7WgCg785gwKM6zy+TKHbwMj8QA2Dw
CcYAnRR76i8cc1QL5UsIItu6PH7qelx1
=qv5g
-----END PGP SIGNATURE-----

--s9fJI615cBHmzTOP--

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 10:33:21 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6D515106564A; Wed, 3 Oct 2012 10:33:21 +0000 (UTC) (envelope-from bapt@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3176B8FC16; Wed, 3 Oct 2012 10:33:21 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q93AXLS0064549; Wed, 3 Oct 2012 10:33:21 GMT (envelope-from bapt@FreeBSD.org) Received: (from bapt@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q93AXKRD064546; Wed, 3 Oct 2012 10:33:20 GMT (envelope-from bapt@FreeBSD.org) X-Authentication-Warning: freefall.freebsd.org: bapt set sender to bapt@FreeBSD.org using -f Date: Wed, 3 Oct 2012 12:33:17 +0200 From: Baptiste Daroussin To: Andriy Gapon Message-ID: <20121003103317.GB6377@ithaqua.etoilebsd.net> References: <505DB4E6.8030407@smeets.im> <20120924224606.GE79077@ithaqua.etoilebsd.net> <20120925090840.GD35915@deviant.kiev.zoral.com.ua> <20120929154101.GK1402@garage.freebsd.pl> <20120930122403.GB35915@deviant.kiev.zoral.com.ua> <506BFA5B.9060103@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="kORqDWCi7qDJ0mEj" Content-Disposition: inline In-Reply-To: <506BFA5B.9060103@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.org, Florian Smeets , Pawel Jakub Dawidek Subject: Re: panic: _sx_xlock_hard: recursed on non-recursive sx zfsvfs->z_hold_mtx[i] @ ...cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 10:33:21 -0000 --kORqDWCi7qDJ0mEj Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Wed, Oct 03, 2012 at 11:42:03AM +0300, Andriy Gapon wrote:
> on 30/09/2012 15:24 Konstantin Belousov said the following:
> > The postponing of the reclaim, when the vnode reserve goes low, to the
> > vnlru process does not solve anything, since you only change the
> > recursion into the deadlock.
> >
> > I discussed an approach for this issue with avg. The basic idea is
> > presented in the untested patch below. You can specify that some count
> > of free vnodes must be present for some dynamic scope, started by the
> > getnewvnode_reserve() function. While staying inside the reserved pool,
> > getnewvnode() calls would not recurse into vnlru(). The scope is
> > finished with getnewvnode_drop_reserve().
> >
> > The getnewvnode_reserve() shall be called while no locks are held.
> >
> > What do you think ?
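A minimal caller-side sketch of the reservation scope just described, assuming
the getnewvnode_reserve(count)/getnewvnode_drop_reserve() interface from the
patch; the surrounding function and its allocator are invented for
illustration:

#include <sys/param.h>
#include <sys/mount.h>
#include <sys/vnode.h>

/*
 * Invented stand-in for an allocator that ends up calling getnewvnode()
 * while holding a lock (the way ZFS allocates znodes under z_hold_mtx).
 */
int my_alloc_vnode_locked(struct mount *mp, struct vnode **vpp);

int
my_get_vnode(struct mount *mp, struct vnode **vpp)
{
	int error;

	/*
	 * Entered with no locks held: pre-reserve one free vnode so that
	 * the getnewvnode() call inside the locked region below cannot
	 * recurse into vnlru() while our lock is held.
	 */
	getnewvnode_reserve(1);
	error = my_alloc_vnode_locked(mp, vpp);
	getnewvnode_drop_reserve();	/* end of the reserved scope */
	return (error);
}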
>
> Here is a patch that makes use of the getnewvnode_reserve API in ZFS:
> http://people.freebsd.org/~avg/zfs-getnewvnode.diff
>

I can confirm that this patch solves the situation for me. I have been
heavily stressing it on my buildbox all night using poudriere, and
everything worked where it previously failed quite quickly.

regards,
Bapt

--kORqDWCi7qDJ0mEj Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlBsFG0ACgkQ8kTtMUmk6EycwQCgj8wMjdgQYY6PHC0xF+DLJUPU
fuMAnjAnnFog/EwZzbwhRXJhnKX7RepZ
=hXmh
-----END PGP SIGNATURE-----

--kORqDWCi7qDJ0mEj--

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 11:44:08 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8025B1065670 for ; Wed, 3 Oct 2012 11:44:08 +0000 (UTC) (envelope-from ramquick@gmail.com) Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42]) by mx1.freebsd.org (Postfix) with ESMTP id B17588FC08 for ; Wed, 3 Oct 2012 11:44:05 +0000 (UTC) Received: by mail-wg0-f42.google.com with SMTP id fm10so1163013wgb.1 for ; Wed, 03 Oct 2012 04:43:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:date:message-id:subject:from:to:content-type; bh=T0GwnBODSdKNZXuWIU+MI/T502B3Gsi4DKDE1u4P8+8=; b=el4SdgBMJ6HZ5ebdtaabtn7h721/yq1hdEJB9eZCrF1aowfxPIzTqQb2dzSQ/WBnOe p29U6xfp3WFjHL2dHSOtPb2CtT3tyvZAsCFi2AyydJSeIjvw8E4duRiPAivQQhv2GQuW J5aKhOohqQY5bVNlhgOsFe7NtyJ7H4buuIxRH8N6Tnkd8tK+TK1ih2FCuz4UXnUFuzje PheFWb1Tfoqfbs0+ycvTEVLN0uTK8VOTVjF6cDHjiv+XdH6LL1A5+mYCTdJGcESTgkd6 HGjbqjR7UbiTeIkc5TCNITthRzFOuY5lqznggN6H3FEkwixkt0CP8Nwj52ArVNOzL7W3 YlvQ== MIME-Version: 1.0 Received: by 10.216.29.10 with SMTP id h10mr1091186wea.126.1349264639134; Wed, 03 Oct 2012 04:43:59 -0700 (PDT) Received: by 10.217.2.74 with HTTP; Wed, 3 Oct 2012 04:43:59 -0700 (PDT) Date: Wed, 3 Oct 2012 17:13:59 +0530 Message-ID: From: Ram Chander To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Zfs import issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ram_chander250@yahoo.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 11:44:08 -0000

Hi,

I am importing a zfs snapshot to freebsd-9 from another host running
freebsd-9. When the import happens, it locks the filesystem: "df" hangs
and I am unable to use the filesystem. Once the import completes, the
filesystem is back to normal and read/write works fine. The same doesn't
happen in Solaris/OpenIndiana.

# uname -an
FreeBSD hostname 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30
UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64

Zfs ver: 28

Any inputs would be helpful. Is there any way to overcome this freeze ?
Regards,
Ram

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 12:36:23 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A346106566C for ; Wed, 3 Oct 2012 12:36:23 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.171]) by mx1.freebsd.org (Postfix) with ESMTP id A8AE98FC17 for ; Wed, 3 Oct 2012 12:36:22 +0000 (UTC) Received: from [192.168.179.201] (hmbg-4d06dfd3.pool.mediaWays.net [77.6.223.211]) by mrelayeu.kundenserver.de (node=mreu4) with ESMTP (Nemesis) id 0MEP3U-1TCkMW48kK-00FR5g; Wed, 03 Oct 2012 14:36:21 +0200 Message-ID: <506C3143.6060000@brockmann-consult.de> Date: Wed, 03 Oct 2012 14:36:19 +0200 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120825 Thunderbird/15.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <1363900011.1436778.1348962614353.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <1363900011.1436778.1348962614353.JavaMail.root@erie.cs.uoguelph.ca> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:ZfMSL66Z8OmznM7gIU78BJtCXtXyohP4QN8Y4f9datC EDAbazPhknj9032fN/eFyXzrPvIJHJ1uprS4w3njIZXkPJIjga 9sOM8piYx7bALI65M/Q+usjbg5BxhY/GgfYmqkcWy/yld5mHh6 n4FhQE6ovs2kxlehYEi7q2WGzlXUJ3wb63ZP/QiOGaKLhJkEVA C0eM6kjPyRKNQfpqVkU7JnIf1NwxpERa/NMCCnZtBcPgm6lf/B mLtAGBrQ10v1D5IiYjl4DQgyVpcf0CdEgORrJ4Yp47lcTg9Bbq 9Bs+CjFB0SSkVYNb3ht0bqa4pv9u2yQNkHyX8gMtuZK7NTM6jH UBbf7A0jPyAXOWBhEOeASI2tBThJ20AQuF3U++j36P6BPQ5BQ9 6eKUiqI6pu/sg== Subject: Re: NFS Performance Help X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 12:36:23 -0000

On 09/30/2012 01:50 AM, Rick Macklem wrote:
> Wayne Hotmail wrote:
>> Like others I am having issues getting any decent performance out of
>> my NFS clients on FreeBSD. I have tried 8.3 and 9.1 beta on stand alone
>> servers or as vmware clients. Used 1 Gig connections or a 10 Gig
>> connection. Tried mounting using Version 3 and Version 4. I have tried
>> the noatime, sync, and tcp options; nothing seems to help. I am
>> connecting to an IceWeb NAS. My performance with DD is 60 meg a second
>> at best when writing to the server. If I load a Redhat Linux server on
>> the same hardware using the same connection my write performance is
>> about 340 meg a second.
>> It really falls apart when I run a test script where I create 100
>> folders, then create 100 files in the folders and append to these
>> files 5 times using 5 random files. I am trying to simulate an IMAP
>> email server. If I run the script on my local mirror drives it takes
>> about one minute and thirty seconds to complete. If I run the script
>> on the NFS mounted drives it takes hours to complete. With my Linux
>> install on the same hardware this NFS mounted script takes about 4
>> minutes.
>> Google is tired of me asking the same question over and over. So if
>> anyone would be so kind as to point out some kernel or system tweaks
>> to get me past my NFS issues, that would be greatly appreciated.
>> Wayne
>>
> You could try a smaller rsize,wsize by setting the command line args
> for the mount.
> In general, larger rsize,wsize should perform better,
> but if a large write generates a burst of traffic that overloads
> some part of the network fabric or server, such that packets get
> dropped, performance will be hit big time.
>
> Other than that, if you capture packets and look at them in
> wireshark, you might be able to spot where packets are getting lost
> and retransmitted. (If packets are getting dropped, then the fun
> part is figuring out why and coming up with a workaround.)
>
> Hopefully others will have more/better suggestions, rick
>
My only suggestion is to try (but not necessarily in production) the
changes suggested in the thread "NFSv3, ZFS, 10GE performance" started
by Sven Brandenburg. It didn't do much for my testing, but he says it
does a bunch. However, he is using a Linux client and a RAM-based ZIL
(using ZFS).

Other than that, I can only say that I observed the same thing as you
(testing both FreeBSD and Linux clients), but I always tested with ZFS.
And I found that with FreeBSD it was putting high load on the ZIL,
meaning FreeBSD was using sync writes, but Linux was not. ESXi did the
same thing as a client. So with a cheap SSD as a ZIL, ESXi and FreeBSD
were writing at around 40-70 MB/s and Linux was writing at 600. The same
test using a virtual machine disk mounted over NFS shows how extreme the
problem can be, and was instead 7 MB/s with FreeBSD and ESXi, and around
90-200 with Linux. (And to compare 10Gbps performance with other non-NFS
tests, I could get something like 600 MB/s with a simple netcat from
local RAM to remote /dev/null, and 800-900 with more threads or NICs, I
don't remember exactly.)

I couldn't figure it out for sure, but I couldn't cause any corruption
in my testing, so I just assume Linux is only running "sync" calls when
creating files, write barriers to virtual disks, etc., like it does with
local file systems, instead of doing every single write synchronously.
Peter

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 13:00:42 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 19B0D106566B for ; Wed, 3 Oct 2012 13:00:42 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id C76888FC12 for ; Wed, 3 Oct 2012 13:00:41 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAMg1bFCDaFvO/2dsb2JhbABFFoV3uW6CIAEBAQMBAQEBICsgCwUWGAICDRkCKQEJJgYIBwQBHASHXgYLpSmSaYEhigIahQ6BEgOTPIItgRWPFoMJgUc0 X-IronPort-AV: E=Sophos;i="4.80,528,1344225600"; d="scan'208";a="184416957" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 03 Oct 2012 09:00:35 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4EC22B4066; Wed, 3 Oct 2012 09:00:35 -0400 (EDT) Date: Wed, 3 Oct 2012 09:00:35 -0400 (EDT) From: Rick Macklem To: Peter Maloney Message-ID: <507136025.1628950.1349269235306.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <506C3143.6060000@brockmann-consult.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: NFS Performance Help X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 13:00:42 -0000

Peter Maloney wrote:
> On 09/30/2012 01:50 AM, Rick Macklem wrote:
> > Wayne Hotmail wrote:
> >> Like others I am having issues getting any decent performance out
> >> of my NFS clients on FreeBSD. I have tried 8.3 and 9.1 beta on stand
> >> alone servers or as vmware clients. Used 1 Gig connections or a 10
> >> Gig connection. Tried mounting using Version 3 and Version 4. I have
> >> tried the noatime, sync, and tcp options; nothing seems to help. I
> >> am connecting to an IceWeb NAS. My performance with DD is 60 meg a
> >> second at best when writing to the server. If I load a Redhat Linux
> >> server on the same hardware using the same connection my write
> >> performance is about 340 meg a second.
> >> It really falls apart when I run a test script where I create 100
> >> folders, then create 100 files in the folders and append to these
> >> files 5 times using 5 random files. I am trying to simulate an IMAP
> >> email server. If I run the script on my local mirror drives it takes
> >> about one minute and thirty seconds to complete. If I run the script
> >> on the NFS mounted drives it takes hours to complete. With my Linux
> >> install on the same hardware this NFS mounted script takes about 4
> >> minutes.
> >> Google is tired of me asking the same question over and over. So if
> >> anyone would be so kind as to point out some kernel or system tweaks
> >> to get me past my NFS issues, that would be greatly appreciated.
> >> Wayne
> >>
> > You could try a smaller rsize,wsize by setting the command line args
> > for the mount.
> > In general, larger rsize,wsize should perform better,
> > but if a large write generates a burst of traffic that overloads
> > some part of the network fabric or server, such that packets get
> > dropped, performance will be hit big time.
> >
> > Other than that, if you capture packets and look at them in
> > wireshark, you might be able to spot where packets are getting lost
> > and retransmitted. (If packets are getting dropped, then the fun
> > part is figuring out why and coming up with a workaround.)
> >
> > Hopefully others will have more/better suggestions, rick
> >
> My only suggestion is to try (but not necessarily in production) the
> changes suggested in the thread "NFSv3, ZFS, 10GE performance" started
> by Sven Brandenburg. It didn't do much for my testing, but he says it
> does a bunch. However, he is using a Linux client and a RAM-based ZIL
> (using ZFS).
>
> Other than that, I can only say that I observed the same thing as you
> (testing both FreeBSD and Linux clients), but I always tested with ZFS.
> And I found that with FreeBSD it was putting high load on the ZIL,
> meaning FreeBSD was using sync writes, but Linux was not. ESXi did the
> same thing as a client. So with a cheap SSD as a ZIL, ESXi and FreeBSD
> were writing at around 40-70 MB/s and Linux was writing at 600. The
> same test using a virtual machine disk mounted over NFS shows how
> extreme the problem can be, and was instead 7 MB/s with FreeBSD and
> ESXi, and around 90-200 with Linux.
>
> I couldn't figure it out for sure, but I couldn't cause any corruption
> in my testing, so I just assume Linux is only running "sync" calls when
> creating files, write barriers to virtual disks, etc., like it does
> with local file systems, instead of doing every single write
> synchronously.
>
I would not recommend that this go in a production system at this time,
but you could try the following patch on a test system to see if it
alleviates the problem. (If it never gets tested, we'll never know if it
works well and should be considered for a commit to head.:-)

http://people.freebsd.org/~rmacklem/dirtybuflist.patch

The FreeBSD NFS clients (the new one just cloned the code in the old
one) keep track of what part of a buffer cache block has been written
to, instead of always pre-reading in the block and then writing the
whole block back to the server. This provides more correct behaviour if
multiple clients are writing non-overlapping areas of the same block
concurrently, and provides better performance in some situations.
Unfortunately, the current code only handles a single dirty byte region
in the block. As such, when a write arrives for an area not contiguous
with the byte region that is already dirty/modified, the code writes the
old byte region to the server as a single synchronous write. The above
patch changes the code so that it maintains a list of dirty/modified
byte regions and avoids this problem.

jhb@ also had a simpler patch which avoided the synchronous write, but
it didn't preserve correct behaviour when multiple clients are
concurrently writing to non-overlapping areas of the same block. (I
don't think I have his patch handy at the moment, but maybe he does?)
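To make the byte-region idea concrete, here is a toy userland model of such a
dirty-region list, merging regions that overlap or abut. Every name here is
invented for illustration; this is not the code from dirtybuflist.patch:

#include <sys/types.h>
#include <sys/queue.h>
#include <stdlib.h>

struct dregion {
	off_t start;			/* first dirty byte */
	off_t end;			/* one past the last dirty byte */
	LIST_ENTRY(dregion) link;
};
LIST_HEAD(dreglist, dregion);

/*
 * Record [start, end) as dirty.  Any existing region that overlaps or
 * abuts the new write is absorbed, so the list stays a set of disjoint
 * regions and no synchronous "flush the old region" write is needed.
 */
static void
dregion_add(struct dreglist *head, off_t start, off_t end)
{
	struct dregion *dr, *tmp;

	LIST_FOREACH_SAFE(dr, head, link, tmp) {
		if (start <= dr->end && end >= dr->start) {
			/* Overlapping or adjacent: absorb it. */
			if (dr->start < start)
				start = dr->start;
			if (dr->end > end)
				end = dr->end;
			LIST_REMOVE(dr, link);
			free(dr);
		}
	}
	dr = malloc(sizeof(*dr));	/* toy code: no NULL check */
	dr->start = start;
	dr->end = end;
	LIST_INSERT_HEAD(head, dr, link);
}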
rick

> Peter
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 13:21:08 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 83CA2106566B; Wed, 3 Oct 2012 13:21:08 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id E31C48FC14; Wed, 3 Oct 2012 13:21:07 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAHk6bFCDaFvO/2dsb2JhbABFFoV3uW6CIAEBAQMBAQEBICsgCwUWGAICDRkCKQEJJgYIBwQBHAEDh14GC6UnkmSBIYoCGoUOgRIDkzyCLYEVjxaDCYFHNA X-IronPort-AV: E=Sophos;i="4.80,528,1344225600"; d="scan'208";a="181737325" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 03 Oct 2012 09:21:06 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 85A10B4032; Wed, 3 Oct 2012 09:21:06 -0400 (EDT) Date: Wed, 3 Oct 2012 09:21:06 -0400 (EDT) From: Rick Macklem To: Garrett Wollman Message-ID: <1571646304.1630985.1349270466529.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20587.47363.504969.926603@hergotha.csail.mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org Subject: Re: NFS server bottlenecks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 13:21:08 -0000

Garrett Wollman wrote:
> [Adding freebsd-fs@ to the Cc list, which I neglected the first time
> around...]
>
> < said:
>
> > I can't remember (I am early retired now;-) if I mentioned this
> > patch before:
> > http://people.freebsd.org/~rmacklem/drc.patch
> > It adds tunables vfs.nfsd.tcphighwater and vfs.nfsd.udphighwater that
> > can be twiddled so that the drc is trimmed less frequently. By making
> > these values larger, the trim will only happen once/sec until the
> > high water mark is reached, instead of on every RPC. The tradeoff is
> > that the DRC will become larger, but given memory sizes these days,
> > that may be fine for you.
>
> It will be a while before I have another server that isn't in
> production (it's on my deployment plan, but getting the production
> servers going is taking first priority).
>
> The approaches that I was going to look at:
>
> Simplest: only do the cache trim once every N requests (for some
> reasonable value of N, e.g., 1000). Maybe keep track of the number of
> entries in each hash bucket and ignore those buckets that only have
> one entry even if it is stale.
>
Well, the patch I have does it when it gets "too big". This made sense
to me, since the cache is trimmed to keep it from getting too large. It
also does the trim at least once/sec, so that really stale entries are
removed.

> Simple: just use a separate mutex for each list that a cache entry
> is on, rather than a global lock for everything.
> This would reduce the mutex contention, but I'm not sure how
> significantly since I don't have the means to measure it yet.
>
Well, since the cache trimming is removing entries from the lists, I
don't see how that can be done with a global lock for list updates?

A mutex in each element could be used for changes (not insertion/removal)
to an individual element. However, the current code manipulates the lists
and makes minimal changes to the individual elements, so I'm not sure if
a mutex in each element would be useful or not, but it wouldn't help for
the trimming case, imho.

I modified the patch slightly, so it doesn't bother to acquire the mutex
when it is checking if it should trim now. I think this results in a
slight risk that the test will use an "out of date" cached copy of one of
the global vars, but since the code isn't modifying them, I don't think
it matters. This modified patch is attached and is also here:
http://people.freebsd.org/~rmacklem/drc2.patch

> Moderately complicated: figure out if a different synchronization type
> can safely be used (e.g., rmlock instead of mutex) and do so.
>
> More complicated: move all cache trimming to a separate thread and
> just have the rest of the code wake it up when the cache is getting
> too big (or just once a second since that's easy to implement). Maybe
> just move all cache processing to a separate thread.
>
Only doing it once/sec would result in a very large cache when bursts of
traffic arrive. The above patch does it when it is "too big" or at least
once/sec.

I'm not sure I see why doing it as a separate thread will improve things.
There are N nfsd threads already (N can be bumped up to 256 if you wish)
and having a bunch more "cache trimming threads" would just increase
contention, wouldn't it? The only negative effect I can think of w.r.t.
having the nfsd threads doing it would be a (I believe negligible)
increase in RPC response times (the time the nfsd thread spends trimming
the cache). As noted, I think this time would be negligible compared to
disk I/O and network transit times in the total RPC response time?

Isilon did use separate threads (I never saw their code, so I am going by
what they told me), but it sounded to me like they were trimming the
cache too aggressively to be effective for TCP mounts. (i.e., it sounded
to me like they had broken the algorithm to achieve better perf.)

Remember that the DRC is weird, in that it is a cache to improve
correctness at the expense of overhead. It never improves performance.
On the other hand, turn it off or throw away entries too aggressively
and data corruption, due to retries of non-idempotent operations, can be
the outcome.

Good luck with whatever you choose, rick

> It's pretty clear from the profile that the cache mutex is heavily
> contended, so anything that reduces the length of time it's held is
> probably a win.
>
> That URL again, for the benefit of people on freebsd-fs who didn't see
> it on hackers, is:
>
> >> .
>
> (This graph is slightly modified from my previous post as I removed
> some spurious edges to make the formatting look better. Still looking
> for a way to get a profile that includes all kernel modules with the
> kernel.)
>
> -GAWollman
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 13:34:58 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 03EF7106564A for ; Wed, 3 Oct 2012 13:34:58 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 531DC8FC18 for ; Wed, 3 Oct 2012 13:34:56 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA06424; Wed, 03 Oct 2012 16:34:53 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <506C3EFC.2060602@FreeBSD.org> Date: Wed, 03 Oct 2012 16:34:52 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: ram_chander250@yahoo.com References: In-Reply-To: X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: Zfs import issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 13:34:58 -0000

on 03/10/2012 14:43 Ram Chander said the following:
> Hi,
>
> I am importing a zfs snapshot to freebsd-9 from another host running
> freebsd-9. When the import happens, it locks the filesystem: "df" hangs
> and I am unable to use the filesystem. Once the import completes, the
> filesystem is back to normal and read/write works fine. The same
> doesn't happen in Solaris/OpenIndiana.
>
> # uname -an
> FreeBSD hostname 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30
> UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>
> Zfs ver: 28
>
> Any inputs would be helpful. Is there any way to overcome this freeze ?

What if you add the -n option to df?
--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 14:03:27 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B9C351065672 for ; Wed, 3 Oct 2012 14:03:27 +0000 (UTC) (envelope-from olivier@gid0.org) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 335D38FC08 for ; Wed, 3 Oct 2012 14:03:26 +0000 (UTC) Received: by lbdb5 with SMTP id b5so7684963lbd.13 for ; Wed, 03 Oct 2012 07:03:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=urWdLwfVVjFl/V865MubPGm/qtp/DbjSdgtVWij9RBs=; b=iYW8ceZZVa+L+MElO9ySMKn/nB/+QyfWx8EOnYZT379cAfm9vCKHczSVVPwpOxmdAZ JkVnBt+lPeVU7ezhYapXsT8qB8gEU7ZD9r/F3mFIrC6dVlBAavkUfViWLqZCjKZc/U5m m6Iy/TDY93sDhPwFNBILd3KOIsSYe/alfUHuRk0nBzEPw8mem+7NC3aW9ONlLRB3c7wk rH2nvz9niqrCT7dapq4oBo5dtQFOY5IU31QlA1CWwNteF9GiViplnAbahyGc+2HdPc0d sNyfemGBBDzPewy+fzp8MronxiwfETJagYeJ8XpbMWez90Y7l27Y1rR6guUtrKFkeXX5 PVig== MIME-Version: 1.0 Received: by 10.112.45.231 with SMTP id q7mr1761792lbm.133.1349273005467; Wed, 03 Oct 2012 07:03:25 -0700 (PDT) Received: by 10.112.42.231 with HTTP; Wed, 3 Oct 2012 07:03:25 -0700 (PDT) In-Reply-To: References: Date: Wed, 3 Oct 2012 16:03:25 +0200 Message-ID: From: Olivier Smedts To: Ronald Klop Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQl7OSgwLPQOrA2k8OcbDe6WVYu4+D5b34duSiBDlNRT2SQ1qqPepEaiOJuyRg7NmKA/Bred Cc: freebsd-fs@freebsd.org Subject: Re: Can't remove zil / separate log device from root pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 14:03:27 -0000

2012/9/30 Ronald Klop :
> What version is your zpool/zfs? Mind the message about upgrading in your
> output above. Not all versions of ZFS support removing a log or cache
> disk.
>
> Run 'zpool upgrade' and 'zfs upgrade'. They will only print the current
> version. They need more options for real upgrading.
>
> Ronald.

It's a v28 pool. The separate log device should be removable. The warning
is there because I'm running 10-CURRENT, which now supports "v5000"
(features).

Thanks for the reply.

--
Olivier Smedts                                            _
                                   ASCII ribbon campaign ( )
e-mail: olivier@gid0.org       - against HTML email & vCards  X
www: http://www.gid0.org  - against proprietary attachments  / \

"There are only 10 kinds of people in the world: those who understand
binary, and those who don't."
From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 14:36:03 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B7598106564A; Wed, 3 Oct 2012 14:36:03 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id D1B6B8FC0C; Wed, 3 Oct 2012 14:36:02 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA07164; Wed, 03 Oct 2012 17:35:59 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <506C4D4F.2090909@FreeBSD.org> Date: Wed, 03 Oct 2012 17:35:59 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Steven Hartland References: <505DF1A3.1020809@FreeBSD.org> <80F518854AE34A759D9441AE1A60D2DC@multiplay.co.uk> In-Reply-To: <80F518854AE34A759D9441AE1A60D2DC@multiplay.co.uk> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-geom@FreeBSD.org Subject: Re: zfs zvol: set geom mediasize right at creation time X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 14:36:03 -0000

on 23/09/2012 04:04 Steven Hartland said the following:
> Do you know what effect the volblocksize change
> will have on a volume whose disk block size changes?
> e.g. via a quirk for a 4k disk being added

I am not sure that I got your question... My patch affects neither the
volblocksize value nor the disk block size [geom property]. It changes
only the stripe size geom property.

> I ask as we've been testing a patch here which changes ashift to
> be based on stripesize instead of sectorsize, but in its
> current form it has some odd side effects on pools which
> are boot pools.
>
> Said patch is attached for reference.

I think that the patch makes sense and would be curious to learn more
about the side-effects.

--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 14:51:35 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 12654106566C for ; Wed, 3 Oct 2012 14:51:35 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 603678FC08 for ; Wed, 3 Oct 2012 14:51:34 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA07308; Wed, 03 Oct 2012 17:51:29 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <506C50F1.40805@FreeBSD.org> Date: Wed, 03 Oct 2012 17:51:29 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: "Justin T. Gibbs"
References: <505DE715.8020806@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs: allow to mount root from a pool not in zpool.cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 14:51:35 -0000

on 23/09/2012 07:59 Justin T. Gibbs said the following:
> On Sep 22, 2012, at 10:28 AM, Andriy Gapon wrote:
>
>> Currently FreeBSD ZFS kernel code doesn't allow to mount the root
>> filesystem on a pool that is not listed in zpool.cache, as only pools
>> from the cache are known to ZFS at that time.
>
> I've for some time been of the opinion that FreeBSD should only use
> the cache file for ZFS pools created from non-GEOM objects (i.e.
> files). GEOM tasting should be used to make the kernel aware of
> all pools, whether they be imported on the system, partial, or
> foreign. Even for pools created by files, the user land utilities
> should do nothing more than ask the kernel to "taste them". This
> would remove code duplicated in user land for this task (code that
> must be re-executed in kernel space for validation reasons anyway)
> and also help solve problems we've encountered at Spectra with races
> in fault event processing, spare management, and device arrivals and
> departures.
>
> So I'm excited by your work in this area and would encourage you
> to "think larger" than just trying to integrate root pool discovery
> with GEOM. Spectra may even be able to help in this work sometime
> in the near future.

For the moment I am trying to think "narrower" to fix the problem at
hand :-) But I see what you say. It doesn't make sense that:
- zfsboot tastes all BIOS-visible disks for pools
- zfsloader tastes all BIOS-visible disks for pools [duplicated effort
  detected]
- but the kernel puts all its trust in some cache file

I am not sure what performance impact tasting all GEOM providers would
have, but I've got this idea. geom_vdev geoms should taste all providers
(like e.g. geom part or label do) and attach (but not g_access) to any
that have valid zfs labels. They should cache things like pool guids,
vdev guids, txgs, etc., so that information is readily available for any
queries. That way we easily know what pools we have in a system, what
devices from those pools are available, and so on. When we want to
import a pool we just start using the corresponding geom_vdev geoms
(g_access them). This would also remove the need for the disk tasting
currently done from userland (which is weird on FreeBSD).

I think that the zfs+geom part is not too much work.
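For illustration only, the skeleton of such a tasting routine might look like
the sketch below. The g_* calls are the real GEOM API; everything named
zvdev_* is invented, error handling is minimal, and the actual ZFS label
parsing and caching are elided:

#include <sys/param.h>
#include <geom/geom.h>

/* Hypothetical: non-zero if the buffer holds a valid ZFS vdev label. */
int zvdev_label_valid(const void *buf, off_t size);

static struct g_geom *
g_zvdev_taste(struct g_class *mp, struct g_provider *pp, int flags __unused)
{
	struct g_geom *gp;
	struct g_consumer *cp;
	void *buf;
	int error, valid;

	g_topology_assert();

	gp = g_new_geomf(mp, "zvdev.%s", pp->name);
	cp = g_new_consumer(gp);
	if (g_attach(cp, pp) != 0 || g_access(cp, 1, 0, 0) != 0)
		goto fail;
	/* Peek at the start of the provider, where the first label lives. */
	buf = g_read_data(cp, 0, pp->sectorsize, &error);
	g_access(cp, -1, 0, 0);		/* close again; stay attached */
	if (buf == NULL)
		goto fail;
	valid = zvdev_label_valid(buf, pp->sectorsize);
	/* A real version would cache pool/vdev guids, txg, ... here. */
	g_free(buf);
	if (!valid)
		goto fail;
	/* Remain attached (but closed) so the pool list can be queried. */
	return (gp);
fail:
	if (cp->provider != NULL)
		g_detach(cp);
	g_destroy_consumer(cp);
	g_destroy_geom(gp);
	return (NULL);
}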
The userland reduction part looks scarier to me :-)

--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 14:59:16 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6A542106564A; Wed, 3 Oct 2012 14:59:16 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 880E58FC08; Wed, 3 Oct 2012 14:59:14 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA07394; Wed, 03 Oct 2012 17:59:13 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <506C52C1.10405@FreeBSD.org> Date: Wed, 03 Oct 2012 17:59:13 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <505DE715.8020806@FreeBSD.org> <20121003085326.GC1386@garage.freebsd.pl> In-Reply-To: <20121003085326.GC1386@garage.freebsd.pl> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: "Justin T. Gibbs" , freebsd-fs@FreeBSD.org Subject: Re: zfs: allow to mount root from a pool not in zpool.cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 14:59:16 -0000

on 03/10/2012 11:53 Pawel Jakub Dawidek said the following:
> GEOM tasting would most likely require rewriting the code heavily.

Almost all lines are already there (for the device search by guid). Only
some reshuffling would be needed :-)

> Also note that you can have pools in your system that do match your
> hostid but that the user decided to keep exported, and such pools
> should not be configured automatically. Not a huge problem probably,
> as there is pool status somewhere in the metadata that we can use to
> see if the pool is exported or not.

Yeah, either using the host id and pool status, or keeping zpool.cache
as just a list of pools to auto-import. Or some combination.

--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 17:13:20 2012 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6120C106567B; Wed, 3 Oct 2012 17:13:20 +0000 (UTC) (envelope-from dg@pki2.com) Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id 10FB08FC22; Wed, 3 Oct 2012 17:13:20 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q93HDDoM011556; Wed, 3 Oct 2012 10:13:13 -0700 (PDT) (envelope-from dg@pki2.com) From: Dennis Glatting To: Pawel Jakub Dawidek In-Reply-To: <20120929144145.GI1402@garage.freebsd.pl> References: <20120929144145.GI1402@garage.freebsd.pl> Content-Type: text/plain; charset="ISO-8859-1" Date: Wed, 03 Oct 2012 10:13:13 -0700 Message-ID: <1349284393.14318.25.camel@btw.pki2.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-yoursite-MailScanner-Information: Dennis Glatting X-yoursite-MailScanner-ID: q93HDDoM011556 X-yoursite-MailScanner: Found to be clean X-MailScanner-From: dg@pki2.com Cc: fs@FreeBSD.org Subject: Re: How to recover from this ZFS error?
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 17:13:20 -0000

On Sat, 2012-09-29 at 16:41 +0200, Pawel Jakub Dawidek wrote:
> On Wed, Sep 19, 2012 at 11:35:15AM -0700, Dennis Glatting wrote:
> >
> > One of my pools (disk-1) with 12T of data is reporting this error
> > after a scrub. Is there a way to fix this error without backing up
> > and restoring 12T of data?
> >
> >
> > errors: Permanent errors have been detected in the following files:
> >
> >         :<0x0>
> >         disk-1:<0x0>
>
> Can you paste the entire 'zpool status -v' output after the scrub?
>

This is the output from a new scrub. It is the second scrub against that
data set. The errors that remained after the first scrub (above) have
vanished. I'm a little confused, although I often run multiple fsck
passes against volumes before entering multi-user mode.

That said, during this scrub the system froze twice, requiring a reboot.
This is now a common problem across my four AMD systems: one eight-core
1x 8150, one 16-core 1x 6272, one 16-core 2x 6274, and one 16-core 4x
6274 (3x r241015M and 1x r241040).

bd3# zpool status -v disk-1
  pool: disk-1
 state: ONLINE
  scan: scrub repaired 0 in 30h18m with 0 errors on Wed Oct 3 09:44:29 2012
config:

	NAME             STATE     READ WRITE CKSUM
	disk-1           ONLINE       0     0     0
	  raidz2-0       ONLINE       0     0     0
	    da0          ONLINE       0     0     0
	    da1          ONLINE       0     0     0
	    da13         ONLINE       0     0     0
	    da2          ONLINE       0     0     0
	    da3          ONLINE       0     0     0
	    da4          ONLINE       0     0     0
	    da5          ONLINE       0     0     0
	    da8          ONLINE       0     0     0
	    da9          ONLINE       0     0     0
	    da10         ONLINE       0     0     0
	    da11         ONLINE       0     0     0
	logs
	  gpt/zil-disk1  ONLINE       0     0     0
	cache
	  ada0           ONLINE       0     0     0

errors: No known data errors

--
Dennis Glatting

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 20:59:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D5891106564A; Wed, 3 Oct 2012 20:59:17 +0000 (UTC) (envelope-from wollman@hergotha.csail.mit.edu) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) by mx1.freebsd.org (Postfix) with ESMTP id 802858FC1C; Wed, 3 Oct 2012 20:59:17 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.5/8.14.5) with ESMTP id q93KxGTY061139; Wed, 3 Oct 2012 16:59:16 -0400 (EDT) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.5/8.14.4/Submit) id q93KxG4D061136; Wed, 3 Oct 2012 16:59:16 -0400 (EDT) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <20588.42788.103863.179701@hergotha.csail.mit.edu> Date: Wed, 3 Oct 2012 16:59:16 -0400 From: Garrett Wollman To: Rick Macklem In-Reply-To: <1571646304.1630985.1349270466529.JavaMail.root@erie.cs.uoguelph.ca> References: <20587.47363.504969.926603@hergotha.csail.mit.edu> <1571646304.1630985.1349270466529.JavaMail.root@erie.cs.uoguelph.ca> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (hergotha.csail.mit.edu [127.0.0.1]); Wed, 03 Oct 2012 16:59:16 -0400 (EDT) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org Subject: Re: NFS server
bottlenecks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 20:59:18 -0000

< said:

>> Simple: just use a separate mutex for each list that a cache entry
>> is on, rather than a global lock for everything. This would reduce
>> the mutex contention, but I'm not sure how significantly since I
>> don't have the means to measure it yet.
>>
> Well, since the cache trimming is removing entries from the lists, I
> don't see how that can be done with a global lock for list updates?

Well, the global lock is what we have now, but the cache trimming
process only looks at one list at a time, so not locking the list that
isn't being iterated over probably wouldn't hurt, unless there's some
mechanism (that I didn't see) for entries to move from one list to
another. Note that I'm considering each hash bucket a separate
"list". (One issue to worry about in that case would be cache-line
contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE
ought to be increased to reduce that.)

> Only doing it once/sec would result in a very large cache when bursts
> of traffic arrive.

My servers have 96 GB of memory so that's not a big deal for me.

> I'm not sure I see why doing it as a separate thread will improve
> things. There are N nfsd threads already (N can be bumped up to 256 if
> you wish) and having a bunch more "cache trimming threads" would just
> increase contention, wouldn't it?

Only one cache-trimming thread. The cache trim holds the (global)
mutex for much longer than any individual nfsd service thread has any
need to, and having N threads doing that in parallel is why it's so
heavily contended. If there's only one thread doing the trim, then
the nfsd service threads aren't spending time contending on the
mutex (it will be held less frequently and for shorter periods).

> The only negative effect I can think of w.r.t. having the nfsd
> threads doing it would be a (I believe negligible) increase in RPC
> response times (the time the nfsd thread spends trimming the cache).
> As noted, I think this time would be negligible compared to disk I/O
> and network transit times in the total RPC response time?

With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G
network connectivity, spinning on a contended mutex takes a
significant amount of CPU time. (For the current design of the NFS
server, it may actually be a win to turn off adaptive mutexes -- I
should give that a try once I'm able to do more testing.)
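The per-bucket locking shape being discussed would look roughly like the
sketch below; all names are invented, a real cache entry carries much more
state, and a real lookup would keep the bucket locked while the entry is in
use:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

#define	DRC_HASHSIZE	500	/* bigger than today's table, per the above */

struct drc_entry {
	uint32_t		de_xid;		/* RPC transaction id */
	LIST_ENTRY(drc_entry)	de_hash;
};

static struct drc_bucket {
	struct mtx		db_lock;	/* one mutex per bucket */
	LIST_HEAD(, drc_entry)	db_head;
} drc_tbl[DRC_HASHSIZE];

static struct drc_entry *
drc_lookup(uint32_t xid)
{
	struct drc_bucket *db = &drc_tbl[xid % DRC_HASHSIZE];
	struct drc_entry *de;

	/* Contention is now spread over DRC_HASHSIZE mutexes, not one. */
	mtx_lock(&db->db_lock);
	LIST_FOREACH(de, &db->db_head, de_hash)
		if (de->de_xid == xid)
			break;
	mtx_unlock(&db->db_lock);
	return (de);
}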
-GAWollman

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 21:37:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 22F30106564A; Wed, 3 Oct 2012 21:37:17 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 727FE8FC0A; Wed, 3 Oct 2012 21:37:15 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAOCvbFCDaFvO/2dsb2JhbABFhg+5d4IgAQEBAwEBAQEgKyALGxgCAg0ZAikBCSYGCAcEARwBA4deBgulMZJhgSGKAhqFDoESA5M8gi2BFY8WgwmBRzQ X-IronPort-AV: E=Sophos;i="4.80,530,1344225600"; d="scan'208";a="181843957" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 03 Oct 2012 17:36:59 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 34DE3B403A; Wed, 3 Oct 2012 17:36:59 -0400 (EDT) Date: Wed, 3 Oct 2012 17:36:59 -0400 (EDT) From: Rick Macklem To: Garrett Wollman Message-ID: <1666343702.1682678.1349300219198.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20588.42788.103863.179701@hergotha.csail.mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org Subject: Re: NFS server bottlenecks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 21:37:17 -0000

Garrett Wollman wrote:
> < said:
>
> >> Simple: just use a separate mutex for each list that a cache entry
> >> is on, rather than a global lock for everything. This would reduce
> >> the mutex contention, but I'm not sure how significantly since I
> >> don't have the means to measure it yet.
> >>
> > Well, since the cache trimming is removing entries from the lists, I
> > don't see how that can be done with a global lock for list updates?
>
> Well, the global lock is what we have now, but the cache trimming
> process only looks at one list at a time, so not locking the list that
> isn't being iterated over probably wouldn't hurt, unless there's some
> mechanism (that I didn't see) for entries to move from one list to
> another. Note that I'm considering each hash bucket a separate
> "list". (One issue to worry about in that case would be cache-line
> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE
> ought to be increased to reduce that.)
>
Yea, a separate mutex for each hash list might help. There is also the
LRU list that all entries end up on, that gets used by the trimming
code. (I think? I wrote this stuff about 8 years ago, so I haven't
looked at it in a while.)

Also, increasing the hash table size is probably a good idea, especially
if you reduce how aggressively the cache is trimmed.

> > Only doing it once/sec would result in a very large cache when
> > bursts of traffic arrive.
>
> My servers have 96 GB of memory so that's not a big deal for me.
>
This code was originally "production tested" on a server with 1Gbyte,
so times have changed a bit;-)

> > I'm not sure I see why doing it as a separate thread will improve
> > things.
> > There are N nfsd threads already (N can be bumped up to 256 if you
> > wish) and having a bunch more "cache trimming threads" would just
> > increase contention, wouldn't it?
>
> Only one cache-trimming thread. The cache trim holds the (global)
> mutex for much longer than any individual nfsd service thread has any
> need to, and having N threads doing that in parallel is why it's so
> heavily contended. If there's only one thread doing the trim, then
> the nfsd service threads aren't spending time contending on the
> mutex (it will be held less frequently and for shorter periods).
>
I think the little drc2.patch, which will keep the nfsd threads from
acquiring the mutex and doing the trimming most of the time, might be
sufficient. I still don't see why a separate trimming thread will be an
advantage. I'd also be worried that the one cache trimming thread won't
get the job done soon enough.

When I did production testing on a 1Gbyte server that saw a peak load of
about 100 RPCs/sec, it was necessary to trim aggressively. (Although I'd
be tempted to say that a server with 1Gbyte is no longer relevant, I
recall someone recently trying to run FreeBSD on an i486, although I
doubt they wanted to run the nfsd on it.)

> > The only negative effect I can think of w.r.t. having the nfsd
> > threads doing it would be a (I believe negligible) increase in RPC
> > response times (the time the nfsd thread spends trimming the cache).
> > As noted, I think this time would be negligible compared to disk I/O
> > and network transit times in the total RPC response time?
>
> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G
> network connectivity, spinning on a contended mutex takes a
> significant amount of CPU time. (For the current design of the NFS
> server, it may actually be a win to turn off adaptive mutexes -- I
> should give that a try once I'm able to do more testing.)
>
Have fun with it. Let me know when you have what you think is a good
patch.
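The gist of the unlocked pre-check described above, as a rough sketch;
every variable name here is invented, and drc2.patch itself is the
authoritative version:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/time.h>

static struct mtx drc_lock;		/* the existing global cache mutex */
static int drc_size;			/* current entry count */
static int drc_highwater = 4096;	/* tunable threshold */
static time_t drc_lasttrim;		/* uptime second of the last trim */

static void
drc_maybe_trim(void)
{
	/*
	 * Unlocked reads: the values may be slightly stale, which only
	 * means a trim happens one call earlier or later than strictly
	 * necessary; nothing is modified without the mutex held.
	 */
	if (drc_size <= drc_highwater && time_uptime <= drc_lasttrim)
		return;
	mtx_lock(&drc_lock);
	drc_lasttrim = time_uptime;
	/* ... walk the LRU list and evict stale entries here ... */
	mtx_unlock(&drc_lock);
}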
rick

> -GAWollman
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 3 22:35:48 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C2AA1106566B for ; Wed, 3 Oct 2012 22:35:48 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 4FF948FC0C for ; Wed, 3 Oct 2012 22:35:48 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEALy8bFCDaFvO/2dsb2JhbAA8CRaFebl3giABAQEDAQEBASAEJyALGxgRGQIEJQEJJgYIBwQBHASFcIFuBgulQJJZiyUPBAUGhQiBEgOOboNKgQSCLYEVjxaDCYE/CDQ X-IronPort-AV: E=Sophos;i="4.80,530,1344225600"; d="scan'208";a="181851197" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 03 Oct 2012 18:35:41 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4FE94B4032; Wed, 3 Oct 2012 18:35:41 -0400 (EDT) Date: Wed, 3 Oct 2012 18:35:41 -0400 (EDT) From: Rick Macklem To: Ulysse 31 Message-ID: <1483416316.1685354.1349303741302.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_1685353_717231508.1349303741299" X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: nfsv4 kerberized and gssname=root and allgsname X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Oct 2012 22:35:49 -0000 ------=_Part_1685353_717231508.1349303741299 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit

Ulysse 31 wrote:
> 2012/9/29 Rick Macklem :
> > Ulysse 31 wrote:
> >> Hi all,
> >>
> >> I am currently working on a FreeBSD 9 backup server.
> >> This server would back up the production server via kerberized nfs4
> >> (since the old backup server, a Linux one, was doing so).
> >> We used on the old backup server a root/ kerberos identity,
> >> which allowed the backup server to access all the data.
> >> I have followed the documentation found at : > >> > >> http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup > >> > >> done : > >> - added to kernel : > >> > >> options KGSSAPI > >> device crypto > >> > >> - added to rc.conf : > >> > >> nfs_client_enable="YES" > >> rpc_lockd_enable="YES" > >> rpc_statd_enable="YES" > >> rpcbind_enable="YES" > >> devfs_enable="YES" > >> gssd_enable="YES" > >> > >> - have done sysctl vfs.rpcsec.keytab_enctype=1 and added it to > >> /etc/sysctl.conf > >> > >> We used MIT kerberos implementation, since it is the one used on > >> all > >> our servers (mostly linux), and we have created and > >> /etc/krb5.keytab > >> containing the following keys : > >> host/ > >> nfs/ > >> root/ > >> > >> and, of course, i have used the available patch at : > >> http://people.freebsd.org/~rmacklem/rpcsec_gss-9.patch > >> > >> When i try to mount with the (B) method (the one of the google > >> wiki), > >> it works as expected, i mean, with a correct user credential, i can > >> access to the user data. > >> But, when i try to access via the (C) method (the one that i need > >> in > >> order to do a full backup of the production storage server) i get a > >> systematic kernel panic when launch the mount command. > >> The mount command looks to something like : mount -t nfs -o > >> nfsv4,sec=krb5i,gssname=root,allgssname >> fqdn>: Just to confirm it, you are saying that exactly the same mount command, except without the "allgssname" option, doesn't crash? That is weird, since when I look at the code, there shouldn't be any difference between the two mounts, up to the point where it crashes. The crash seems to indicate that nr_auth is bogus, but I can't see how/why that would happen. I have attached a patch which changes the way nr_auth is set and "might" help, although I doubt it. (It is untested, but if you want to try it, good luck with it.) 
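For anyone who doesn't want to decode the base64 attachment below, the decoded diff amounts to roughly the following excerpt from newnfs_request() (lightly reflowed and normalized; the attachment is the authoritative version). It takes the nfssockreq lock around the nr_auth initialization so two racing requests can't both install an AUTH handle, destroys the loser's handle, and (not shown here) switches the AUTH_DESTROY() call sites from checking usegssname to a new destroy_auth flag:

		/* Inside newnfs_request(); nrp, secflavour, clnt_principal,
		 * srv_principal, authcred and auth are existing locals. */
		NFSLOCKSOCKREQ(nrp);
		if (nrp->nr_auth == NULL) {
			NFSUNLOCKSOCKREQ(nrp);
			auth = nfs_getauth(nrp, secflavour,
			    clnt_principal, srv_principal, NULL, authcred);
			NFSLOCKSOCKREQ(nrp);
			if (nrp->nr_auth == NULL) {
				/* We won the race; install our handle. */
				nrp->nr_auth = auth;
				NFSUNLOCKSOCKREQ(nrp);
			} else {
				/* Lost the race; destroy our handle. */
				savauth = auth;
				auth = nrp->nr_auth;
				NFSUNLOCKSOCKREQ(nrp);
				if (savauth != NULL)
					AUTH_DESTROY(savauth);
			}
		} else {
			auth = nrp->nr_auth;
			NFSUNLOCKSOCKREQ(nrp);
		}
		if (auth != NULL)
			rpc_gss_refresh_auth_call(auth);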
I'll email again if I get something more solid figured out, rick > >> I have activated the kernel debugging stuff to get some infos, here > >> is > >> the message : > >> > >> > >> Fatal trap 12: page fault while in kernel mode > >> cpuid = 0; apic id = 00 > >> fault virtual address = 0x368 > >> fault code = supervisor read data, page not present > >> instruction pointer = 0x20:0xffffffff80866ab7 > >> stack pointer = 0x28:0xffffff804aa39ce0 > >> frame pointer = 0x28:0xffffff804aa39d30 > >> code segment = base 0x0, limit 0xfffff, type 0x1b > >> = DPL 0, pres 1, long 1, def32 0, gran 1 > >> processor eflags = interrupt enabled, resume, IOPL = 0 > >> current process = 701 (mount_nfs) > >> trap number = 12 > >> panic: page fault > >> cpuid = 0 > >> KDB: stack backtrace: > >> #0 0xffffffff808ae486 at kdb_backtrace+0x66 > >> #1 0xffffffff8087885e at panic+0x1ce > >> #2 0xffffffff80b82380 at trap_fatal+0x290 > >> #3 0xffffffff80b826b8 at trap_pfault+0x1e8 > >> #4 0xffffffff80b82cbe at trap+0x3be > >> #5 0xffffffff80b6c57f at calltrap+0x8 > >> #6 0xffffffff80a78eda at rpc_gss_init+0x72a > >> #7 0xffffffff80a79cd6 at rpc_gss_refresh_auth+0x46 > >> #8 0xffffffff807a5a53 at newnfs_request+0x163 > >> #9 0xffffffff807bf0f7 at nfsrpc_getattrnovp+0xd7 > >> #10 0xffffffff807d9b29 at mountnfs+0x4e9 > >> #11 0xffffffff807db60a at nfs_mount+0x13ba > >> #12 0xffffffff809068fb at vfs_donmount+0x100b > >> #13 0xffffffff80907086 at sys_nmount+0x66 > >> #14 0xffffffff80b81c60 at amd64_syscall+0x540 > >> #15 0xffffffff80b6c867 at Xfast_syscall+0xf7 > >> Uptime: 2m31s > >> Dumping 97 out of 1002 MB:..17%..33%..50%..66%..83%..99% > >> > >> ------------------------------------------------------------------------ > >> > >> Does anyone as experience something similar ? is their a way to > >> correct that ? > >> Thanks for the help. > >> > > Well, you're probably the first person to try doing this in years. I > > did > > have it working about 4-5years ago. Welcome to the bleeding edge;-) > > > > Could you do the following w.r.t. above kernel: > > # cd /boot/nkernel (or wherever the kernel lives) > > # nm kernel | grep rpc_gss_init > > - add the offset 0x72a to the address for rpc_gss_init > > # addr2line -e kernel.symbols > > 0xXXX - the hex number above (address of rpc_gss_init+0x72a) > > - email me what it prints out, so I know where the crash is > > occurring > > > > You could also run the following command on the Linux server to > > capture > > packets during the mount attempt, then email me the xxx.pcap file so > > I > > can look at it in wireshark, to see what is happening before the > > crash. > > (I'm guessing nr_auth is somehow bogus, but that's just a guess.:-) > > # tcpdump -s 0 -w xxx.pcap host > > Hi, > > Sorry for the delay i was on travel and no working network connection. > Back online for the rest of the week ^^. > Thanks for your help, here is what it prints out : > > root@bsdenc:/boot/kernel # nm kernel | grep rpc_gss_init > ffffffff80df07b0 r __set_sysinit_set_sym_svc_rpc_gss_init_sys_init > ffffffff80a787b0 t rpc_gss_init > ffffffff80a7a580 t svc_rpc_gss_init > ffffffff81127530 d svc_rpc_gss_init_sys_init > ffffffff80a7a3b0 T xdr_rpc_gss_init_res > root@bsdenc:/boot/kernel # addr2line -e kernel.symbols > 0xffffffff80a78eda > /usr/src/sys/rpc/rpcsec_gss/rpcsec_gss.c:772 > > > for the tcpdump from the linux server, i think you may are doing > reference to the production nfs server ? 
> if yes, unfortunately it is not linux, it is a netapp filer, so no > "real" root access on it (so no tcpdump available :s ). > if you were mentioning the old backup server (which is linux but nfs > client), i cannot do unmount/mount on it since its production > (mountpoint always busy), but i can made a quick VM/testmachine that > acts like the linux backup server and do a tcpdump from it. > Just let me know. Thanks again. > > -- > Ulysse31 > > > > > rick > > > >> -- > >> Ulysse31 > >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> To unsubscribe, send any mail to > >> "freebsd-fs-unsubscribe@freebsd.org" ------=_Part_1685353_717231508.1349303741299 Content-Type: text/x-patch; name=rpcsec-crash.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=rpcsec-crash.patch LS0tIGZzL25mcy9uZnNfY29tbW9ua3JwYy5jLnNhdgkyMDEyLTEwLTAzIDE3OjU4OjE1LjAwMDAw MDAwMCAtMDQwMAorKysgZnMvbmZzL25mc19jb21tb25rcnBjLmMJMjAxMi0xMC0wMyAxODoyMjow MS4wMDAwMDAwMDAgLTA0MDAKQEAgLTQ1OCwxMyArNDU4LDE0IEBAIG5ld25mc19yZXF1ZXN0KHN0 cnVjdCBuZnNydl9kZXNjcmlwdCAqbmQKIAl1X2ludCB0cnlsYXRlcl9kZWxheSA9IDE7CiAJc3Ry dWN0IG5mc19mZWVkYmFja19hcmcgbmY7CiAJc3RydWN0IHRpbWV2YWwgdGltbywgbm93OwotCUFV VEggKmF1dGg7CisJQVVUSCAqYXV0aCwgKnNhdmF1dGg7CiAJc3RydWN0IHJwY19jYWxsZXh0cmEg ZXh0OwogCWVudW0gY2xudF9zdGF0IHN0YXQ7CiAJc3RydWN0IG5mc3JlcSAqcmVwID0gTlVMTDsK IAljaGFyICpzcnZfcHJpbmNpcGFsID0gTlVMTCwgKmNsbnRfcHJpbmNpcGFsID0gTlVMTDsKIAlz aWdzZXRfdCBvbGRzZXQ7CiAJc3RydWN0IHVjcmVkICphdXRoY3JlZDsKKwlpbnQgZGVzdHJveV9h dXRoID0gMTsKIAogCWlmICh4aWRwICE9IE5VTEwpCiAJCSp4aWRwID0gMDsKQEAgLTU4NywxMiAr NTg4LDI5IEBAIG5ld25mc19yZXF1ZXN0KHN0cnVjdCBuZnNydl9kZXNjcmlwdCAqbmQKIAkJICog bmZzc29ja3JlcSBzdHJ1Y3R1cmUsIHNvIGRvbid0IHJlbGVhc2UgdGhlIHJlZmVyZW5jZSBjb3Vu dAogCQkgKiBoZWxkIG9uIGl0LiAtLT4gRG9uJ3QgQVVUSF9ERVNUUk9ZKCkgaXQgaW4gdGhpcyBm dW5jdGlvbi4KIAkJICovCi0JCWlmIChucnAtPm5yX2F1dGggPT0gTlVMTCkKLQkJCW5ycC0+bnJf YXV0aCA9IG5mc19nZXRhdXRoKG5ycCwgc2VjZmxhdm91ciwKKwkJZGVzdHJveV9hdXRoID0gMDsK KwkJTkZTTE9DS1NPQ0tSRVEobnJwKTsKKwkJaWYgKG5ycC0+bnJfYXV0aCA9PSBOVUxMKSB7CisJ CQlORlNVTkxPQ0tTT0NLUkVRKG5ycCk7CisJCQlhdXRoID0gbmZzX2dldGF1dGgobnJwLCBzZWNm bGF2b3VyLAogCQkJICAgIGNsbnRfcHJpbmNpcGFsLCBzcnZfcHJpbmNpcGFsLCBOVUxMLCBhdXRo Y3JlZCk7Ci0JCWVsc2UKLQkJCXJwY19nc3NfcmVmcmVzaF9hdXRoX2NhbGwobnJwLT5ucl9hdXRo KTsKLQkJYXV0aCA9IG5ycC0+bnJfYXV0aDsKKwkJCU5GU0xPQ0tTT0NLUkVRKG5ycCk7CisJCQlp ZiAobnJwLT5ucl9hdXRoID09IE5VTEwpIHsKKwkJCQlucnAtPm5yX2F1dGggPSBhdXRoOworCQkJ CU5GU1VOTE9DS1NPQ0tSRVEobnJwKTsKKwkJCX0gZWxzZSB7CisJCQkJc2F2YXV0aCA9IGF1dGg7 CisJCQkJYXV0aCA9IG5ycC0+bnJfYXV0aDsKKwkJCQlORlNVTkxPQ0tTT0NLUkVRKG5ycCk7CisJ CQkJaWYgKHNhdmF1dGggIT0gTlVMTCkKKwkJCQkJREVTVFJPWV9BVVRIKHNhdmF1dGgpOworCQkJ fQorCQl9IGVsc2UgeworCQkJYXV0aCA9IG5ycC0+bnJfYXV0aDsKKwkJCU5GU1VOTE9DS1NPQ0tS RVEobnJwKTsKKwkJfQorCQlpZiAoYXV0aCAhPSBOVUxMKQorCQkJcnBjX2dzc19yZWZyZXNoX2F1 dGhfY2FsbChhdXRoKTsKIAl9IGVsc2UKIAkJYXV0aCA9IG5mc19nZXRhdXRoKG5ycCwgc2VjZmxh dm91ciwgTlVMTCwKIAkJICAgIHNydl9wcmluY2lwYWwsIE5VTEwsIGF1dGhjcmVkKTsKQEAgLTc0 MSw3ICs3NTksNyBAQCB0cnlhZ2FpbjoKIAl9CiAJaWYgKGVycm9yKSB7CiAJCW1fZnJlZW0obmQt Pm5kX21yZXEpOwotCQlpZiAodXNlZ3NzbmFtZSA9PSAwKQorCQlpZiAoZGVzdHJveV9hdXRoICE9 IDApCiAJCQlBVVRIX0RFU1RST1koYXV0aCk7CiAJCWlmIChyZXAgIT0gTlVMTCkKIAkJCUZSRUUo KGNhZGRyX3QpcmVwLCBNX05GU0RSRVEpOwpAQCAtODk5LDcgKzkxNyw3IEBAIHRyeWFnYWluOgog I2VuZGlmCiAKIAltX2ZyZWVtKG5kLT5uZF9tcmVxKTsKLQlpZiAodXNlZ3NzbmFtZSA9PSAwKQor CWlmIChkZXN0cm95X2F1dGggIT0gMCkKIAkJQVVUSF9ERVNUUk9ZKGF1dGgpOwogCWlmIChyZXAg 
IT0gTlVMTCkKIAkJRlJFRSgoY2FkZHJfdClyZXAsIE1fTkZTRFJFUSk7CkBAIC05MDksNyArOTI3 LDcgQEAgdHJ5YWdhaW46CiBuZnNtb3V0OgogCW1idWZfZnJlZW0obmQtPm5kX21yZXApOwogCW1i dWZfZnJlZW0obmQtPm5kX21yZXEpOwotCWlmICh1c2Vnc3NuYW1lID09IDApCisJaWYgKGRlc3Ry b3lfYXV0aCAhPSAwKQogCQlBVVRIX0RFU1RST1koYXV0aCk7CiAJaWYgKHJlcCAhPSBOVUxMKQog CQlGUkVFKChjYWRkcl90KXJlcCwgTV9ORlNEUkVRKTsK ------=_Part_1685353_717231508.1349303741299-- From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 05:53:08 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2FEA11065674 for ; Thu, 4 Oct 2012 05:53:08 +0000 (UTC) (envelope-from ulysse31@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 885148FC38 for ; Thu, 4 Oct 2012 05:53:07 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id 16so111508wgi.31 for ; Wed, 03 Oct 2012 22:53:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=references:in-reply-to:mime-version:content-transfer-encoding :content-type:message-id:cc:x-mailer:from:subject:date:to; bh=iEfj8U61a7Sz480Vl+tt4TpXrw8ccJ+WRAlfilI/te4=; b=LATZU8FIceJllMBkkgPzBAAOxQPIX/Xvj8ZrK/1JoBjRXFK0ET2Rb87CcnflKVf8+n TC36lhbUy6h/83rx84yyYM4POQizeuE7eLJDgqa1sm7ZeuvzOQHtd0Tbj01u7xPNKctv hyTggshzFGgcUn4QYFE/yuhlc5y3bmemlc0yOfN5MZ4Id+a+TStwrRktB6pQzhE/k9wf ZL0YSKT3Fa8IyrSrk9OML0+/odtvSs6NjQcr3xx7OqVMv7c9YutapkfVy5QyJOMhRT2y SYe7HmjaYV5VxFKnfpO5dTW6SvNX4foN98EMe5Hgq3RPuFDK1Sm7N4jU/VDO0AHaFFaT SfbQ== Received: by 10.180.91.71 with SMTP id cc7mr10114819wib.2.1349329986056; Wed, 03 Oct 2012 22:53:06 -0700 (PDT) Received: from [10.73.66.227] (37-8-170-82.coucou-networks.fr. [37.8.170.82]) by mx.google.com with ESMTPS id gm7sm12669922wib.10.2012.10.03.22.53.03 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 03 Oct 2012 22:53:04 -0700 (PDT) References: <1483416316.1685354.1349303741302.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <1483416316.1685354.1349303741302.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 (1.0) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Message-Id: <836B0731-DC60-40DF-8D9E-ADB9D3FD5AB5@gmail.com> X-Mailer: iPhone Mail (9A405) From: Gomes do Vale Victor Date: Thu, 4 Oct 2012 07:52:59 +0200 To: Rick Macklem Cc: "freebsd-fs@freebsd.org" Subject: Re: nfsv4 kerberized and gssname=root and allgsname X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 05:53:08 -0000 Le 4 oct. 2012 =C3=A0 00:35, Rick Macklem a =C3=A9cri= t : > Ulysse 31 wrote: >> 2012/9/29 Rick Macklem : >>> Ulysse 31 wrote: >>>> Hi all, >>>>=20 >>>> I am actually working on a freebsd 9 backup server. >>>> this server would backup the production server via kerberized nfs4 >>>> (since the old backup server, a linux one, was doing so). >>>> we used on the old backup server a root/ kerberos identity, >>>> which allows the backup server to access all the data. 
>>>> I have followed the documentation found at :
>>>>
>>>> http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
>>>>
>>>> done :
>>>> - added to kernel :
>>>>
>>>> options KGSSAPI
>>>> device crypto
>>>>
>>>> - added to rc.conf :
>>>>
>>>> nfs_client_enable="YES"
>>>> rpc_lockd_enable="YES"
>>>> rpc_statd_enable="YES"
>>>> rpcbind_enable="YES"
>>>> devfs_enable="YES"
>>>> gssd_enable="YES"
>>>>
>>>> - have done sysctl vfs.rpcsec.keytab_enctype=1 and added it to
>>>> /etc/sysctl.conf
>>>>
>>>> We used MIT kerberos implementation, since it is the one used on all
>>>> our servers (mostly linux), and we have created and /etc/krb5.keytab
>>>> containing the following keys :
>>>> host/
>>>> nfs/
>>>> root/
>>>>
>>>> and, of course, i have used the available patch at :
>>>> http://people.freebsd.org/~rmacklem/rpcsec_gss-9.patch
>>>>
>>>> When i try to mount with the (B) method (the one of the google wiki),
>>>> it works as expected, i mean, with a correct user credential, i can
>>>> access to the user data.
>>>> But, when i try to access via the (C) method (the one that i need in
>>>> order to do a full backup of the production storage server) i get a
>>>> systematic kernel panic when launch the mount command.
>>>> The mount command looks to something like : mount -t nfs -o
>>>> nfsv4,sec=krb5i,gssname=root,allgssname >>>> fqdn>:
> Just to confirm it, you are saying that exactly the same mount command,
> except without the "allgssname" option, doesn't crash?

No, in fact it's the same command with gssname=nfs instead of gssname=root that does not crash. When I specify gssname=root it panics.
The same command with gssname=nfs and allgssname together "works", or I should say it mounts and doesn't crash, but it does not allow accessing the nfs share as root, since the netapp expects a root/fqdn key to be used for that.
Don't know if this would give you a hint; I'm gonna test this patch. Tell me if you have other ideas.
For now we decided to disable kerberized nfs on the new FreeBSD backup server in order to go to production with it without getting late.
Thanks for the help.

> That is weird, since when I look at the code, there shouldn't be any
> difference between the two mounts, up to the point where it crashes.
>
> The crash seems to indicate that nr_auth is bogus, but I can't see
> how/why that would happen.
>
> I have attached a patch which changes the way nr_auth is set and "might"
> help, although I doubt it. (It is untested, but if you want to try it,
> good luck with it.)
>
> I'll email again if I get something more solid figured out, rick
>
>>>> I have activated the kernel debugging stuff to get some infos, here is
>>>> the message :
>>>>
>>>>
>>>> Fatal trap 12: page fault while in kernel mode
>>>> cpuid = 0; apic id = 00
>>>> fault virtual address = 0x368
>>>> fault code = supervisor read data, page not present
>>>> instruction pointer = 0x20:0xffffffff80866ab7
>>>> stack pointer = 0x28:0xffffff804aa39ce0
>>>> frame pointer = 0x28:0xffffff804aa39d30
>>>> code segment = base 0x0, limit 0xfffff, type 0x1b
>>>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>>> processor eflags = interrupt enabled, resume, IOPL = 0
>>>> current process = 701 (mount_nfs)
>>>> trap number = 12
>>>> panic: page fault
>>>> cpuid = 0
>>>> KDB: stack backtrace:
>>>> #0 0xffffffff808ae486 at kdb_backtrace+0x66
>>>> #1 0xffffffff8087885e at panic+0x1ce
>>>> #2 0xffffffff80b82380 at trap_fatal+0x290
>>>> #3 0xffffffff80b826b8 at trap_pfault+0x1e8
>>>> #4 0xffffffff80b82cbe at trap+0x3be
>>>> #5 0xffffffff80b6c57f at calltrap+0x8
>>>> #6 0xffffffff80a78eda at rpc_gss_init+0x72a
>>>> #7 0xffffffff80a79cd6 at rpc_gss_refresh_auth+0x46
>>>> #8 0xffffffff807a5a53 at newnfs_request+0x163
>>>> #9 0xffffffff807bf0f7 at nfsrpc_getattrnovp+0xd7
>>>> #10 0xffffffff807d9b29 at mountnfs+0x4e9
>>>> #11 0xffffffff807db60a at nfs_mount+0x13ba
>>>> #12 0xffffffff809068fb at vfs_donmount+0x100b
>>>> #13 0xffffffff80907086 at sys_nmount+0x66
>>>> #14 0xffffffff80b81c60 at amd64_syscall+0x540
>>>> #15 0xffffffff80b6c867 at Xfast_syscall+0xf7
>>>> Uptime: 2m31s
>>>> Dumping 97 out of 1002 MB:..17%..33%..50%..66%..83%..99%
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> Does anyone as experience something similar ? is their a way to
>>>> correct that ?
>>>> Thanks for the help.
>>>>
>>> Well, you're probably the first person to try doing this in years. I did
>>> have it working about 4-5years ago. Welcome to the bleeding edge;-)
>>>
>>> Could you do the following w.r.t. above kernel:
>>> # cd /boot/nkernel (or wherever the kernel lives)
>>> # nm kernel | grep rpc_gss_init
>>> - add the offset 0x72a to the address for rpc_gss_init
>>> # addr2line -e kernel.symbols
>>> 0xXXX - the hex number above (address of rpc_gss_init+0x72a)
>>> - email me what it prints out, so I know where the crash is occurring
>>>
>>> You could also run the following command on the Linux server to capture
>>> packets during the mount attempt, then email me the xxx.pcap file so I
>>> can look at it in wireshark, to see what is happening before the crash.
>>> (I'm guessing nr_auth is somehow bogus, but that's just a guess.:-)
>>> # tcpdump -s 0 -w xxx.pcap host
>>
>> Hi,
>>
>> Sorry for the delay i was on travel and no working network connection.
>> Back online for the rest of the week ^^.
>> Thanks for your help, here is what it prints out :
>>
>> root@bsdenc:/boot/kernel # nm kernel | grep rpc_gss_init
>> ffffffff80df07b0 r __set_sysinit_set_sym_svc_rpc_gss_init_sys_init
>> ffffffff80a787b0 t rpc_gss_init
>> ffffffff80a7a580 t svc_rpc_gss_init
>> ffffffff81127530 d svc_rpc_gss_init_sys_init
>> ffffffff80a7a3b0 T xdr_rpc_gss_init_res
>> root@bsdenc:/boot/kernel # addr2line -e kernel.symbols
>> 0xffffffff80a78eda
>> /usr/src/sys/rpc/rpcsec_gss/rpcsec_gss.c:772
>>
>>
>> for the tcpdump from the linux server, i think you may are doing
>> reference to the production nfs server ?
>> if yes, unfortunately it is not linux, it is a netapp filer, so no
>> "real" root access on it (so no tcpdump available :s ).
>> if you were mentioning the old backup server (which is linux but nfs
>> client), i cannot do unmount/mount on it since its production
>> (mountpoint always busy), but i can made a quick VM/testmachine that
>> acts like the linux backup server and do a tcpdump from it.
>> Just let me know. Thanks again.
>>
>> --
>> Ulysse31
>>
>>>
>>> rick
>>>
>>>> --
>>>> Ulysse31
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to
>>>> "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 09:24:15 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 22BE2106564A for ; Thu, 4 Oct 2012 09:24:15 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.17.9]) by mx1.freebsd.org (Postfix) with ESMTP id 8E9618FC19 for ; Thu, 4 Oct 2012 09:24:14 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mreu0) with ESMTP (Nemesis) id 0MRyX8-1SrdDY3vAo-00Sipn; Thu, 04 Oct 2012 11:24:07 +0200 Message-ID: <506D55B5.70403@brockmann-consult.de> Date: Thu, 04 Oct 2012 11:24:05 +0200 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120713 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <506C3EFC.2060602@FreeBSD.org> In-Reply-To: <506C3EFC.2060602@FreeBSD.org> X-Enigmail-Version: 1.4.4 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:Ty0sWQg5pexN5tEqqm1jqDY1Xw4OrIKKKsTSqfT7b0H uo+CTZquo2Gdn/rMcCR2YDEnF43AyVQSsO6V/njhTvZN6FX47T 3ZG8RvfDYaKCuFszw68HgK4ngGLqIxGviRBQqmXDkVRkTQmPq7 de5oiVzocfBrbfcWBQyFP8/2iHpbR6LDF/1Vomjx3ilM0SHdHM nhMQzNmrN8wHxPb8tDIcfN9y0ihERM7emao8OmRMB4YQnWI7Si j/XmhfvDNtfZuX0PBIkdSdiHclKMEV2zKY58v0yS9IvHlFU/8Y +fXFHthtrmBqCI0k9HWsM2qiR8yR9mb8VL4g9OuP8PZDDZveUV BbbF8xxVT3Yz+IU9R0aZDlay/vlbhMnRbU6/rMxSc Subject: Re: Zfs import issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 09:24:15 -0000

I find this sort of thing to be common, but not exactly as you describe. I don't know if I tried df, but "zfs list" hangs (as well as any other zfs-related command, maybe even zdb). And I don't know what you mean by "importing zfs snapshot", so I'm guessing you mean zfs recv.

e.g.
zfs send somedataset@somesnapshot | ....... (leave it running in background)
zfs list (works fine; I guess it works because send is read-only)
zfs destroy somedataset@someothersnapshot (hang; I guess because this is a write operation, so it needs to wait for the read lock on zfs send to finish the transaction)
zfs list (hang)

I'm not sure if df hangs too. At this point, using kill -9 doesn't solve anything, and if you kill the zfs send, it's possible that every zfs command and df will hang. And I don't know what it is, but I'm mostly sure there is something I can run that will make even "ls" hang after this point.
On 10/03/2012 03:34 PM, Andriy Gapon wrote:
> on 03/10/2012 14:43 Ram Chander said the following:
>> Hi,
>>
>> I am importing a zfs snapshot to freebsd-9 from another host running
>> freebsd-9. When the import happens, it locks the filesystem; "df" hangs
>> and the filesystem cannot be used. Once the import completes, the filesystem
>> is back to normal and read/write works fine. The same doesn't happen in
>> Solaris/OpenIndiana.
>>
>> # uname -an
>> FreeBSD hostname 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30
>> UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>>
>> Zfs ver: 28
>>
>>
>> Any inputs would be helpful. Is there any way to overcome this freeze ?
> What if you add -n option to df?
>

--
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------

From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 11:20:25 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E651310656C7 for ; Thu, 4 Oct 2012 11:20:24 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.78]) by mx1.freebsd.org (Postfix) with ESMTP id 6E3578FC12 for ; Thu, 4 Oct 2012 11:20:24 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1TJjTU-0001LI-5a for freebsd-fs@freebsd.org; Thu, 04 Oct 2012 13:20:16 +0200 Received: from [81.21.138.17] (helo=ronaldradial.versatec.local) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1TJjTT-0007jd-Aw for freebsd-fs@freebsd.org; Thu, 04 Oct 2012 13:20:15 +0200 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: <20120929144145.GI1402@garage.freebsd.pl> <1349284393.14318.25.camel@btw.pki2.com> Date: Thu, 04 Oct 2012 13:20:15 +0200 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: <1349284393.14318.25.camel@btw.pki2.com> User-Agent: Opera Mail/12.02 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: / X-Spam-Score: 0.0 X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=disabled version=3.2.5 X-Scan-Signature: f0d5e446bfc5bbd6ce781899a390d841 Subject: Re: How to recover from theis ZFS error? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 11:20:25 -0000

On Wed, 03 Oct 2012 19:13:13 +0200, Dennis Glatting wrote:
> On Sat, 2012-09-29 at 16:41 +0200, Pawel Jakub Dawidek wrote:
>> On Wed, Sep 19, 2012 at 11:35:15AM -0700, Dennis Glatting wrote:
>> >
>> > One of my pools (disk-1) with 12T of data is reporting this error after a
>> > scrub. Is there a way to fix this error without backing up and restoring
>> > 12T of data?
>> >
>> >
>> > errors: Permanent errors have been detected in the following files:
>> >
>> > :<0x0>
>> > disk-1:<0x0>
>>
>> Can you paste entire 'zpool status -v' output after scrub?
>>
>
> This is the output from a new scrub.
It is the second scrub against that > data set. The errors that remained after the first scrub (above) have > vanished. I'm a little confused although I often run multiple fsck > against volumes before entering multi-user mode. > > That said, during this scrub the system froze twice requiring a reboot. > This is now a common problem across my four AMD systems: one eight core > x1 8150, one 16 core x1 6272, one 16 core x2 6274, and one 16 core x4 > 6274. (3x r241015M and 1x r241040). > > > > > bd3# zpool status -v disk-1 > pool: disk-1 > state: ONLINE > scan: scrub repaired 0 in 30h18m with 0 errors on Wed Oct 3 09:44:29 > 2012 > config: > > NAME STATE READ WRITE CKSUM > disk-1 ONLINE 0 0 0 > raidz2-0 ONLINE 0 0 0 > da0 ONLINE 0 0 0 > da1 ONLINE 0 0 0 > da13 ONLINE 0 0 0 > da2 ONLINE 0 0 0 > da3 ONLINE 0 0 0 > da4 ONLINE 0 0 0 > da5 ONLINE 0 0 0 > da8 ONLINE 0 0 0 > da9 ONLINE 0 0 0 > da10 ONLINE 0 0 0 > da11 ONLINE 0 0 0 > logs > gpt/zil-disk1 ONLINE 0 0 0 > cache > ada0 ONLINE 0 0 0 > > errors: No known data errors > > > It is a guess, but maybe you deleted the last snapshot which referenced the blocks with the errors, so now the FS is clean as far as scrub knows. Or rereading the blocks did not give errors this time for some reason. NB: I see similar errors sometimes on my 320GB external USB-disk after I accidentally disconnected the USB cable while running. ;-) Ronald. From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 12:31:40 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1642E106566C; Thu, 4 Oct 2012 12:31:40 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 024BA8FC0C; Thu, 4 Oct 2012 12:31:38 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA17803; Thu, 04 Oct 2012 15:31:36 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <506D81A7.8030506@FreeBSD.org> Date: Thu, 04 Oct 2012 15:31:35 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Nikolay Denev References: <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> <506BF372.1090208@FreeBSD.org> <506C4049.4040100@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs , Pawel Jakub Dawidek Subject: Re: nfs + zfs hangs on RELENG_9 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 12:31:40 -0000 [restoring cc to fs@] on 04/10/2012 14:32 Nikolay Denev said the following: > I have procstat only for the nfsd threads from the moment of the IO hang. > And this is the only one with "arc" : > > 1422 138630 nfsd nfsd: service mi_switch+0x186 > sleepq_wait+0x42 _sleep+0x390 arc_lowmem+0x77 kmem_malloc+0xc1 > uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5 arc_read_nolock+0x1ec > arc_read+0x93 dbuf_read+0x452 dmu_buf_hold_array_by_dnode+0x16b > dmu_buf_hold_array+0x67 dmu_read_uio+0x3f zfs_freebsd_read+0x3e8 > nfsvno_read+0x2e5 nfsrvd_read+0x3ff nfsrvd_dorpc+0x3c0 Oh, very important stack trace. 
Earlier Nikolay Denev said the following:
>> PID TID COMM TDNAME KSTACK
>> 7 100192 zfskern arc_reclaim_thre mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x428
>> _sx_xlock+0x51 arc_buf_remove_ref+0x8a dbuf_rele_and_unlock+0x132 dbuf_evict+0x11
>> dbuf_do_evict+0x53 arc_do_user_evicts+0xb4 arc_reclaim_thread+0x263 fork_exit+0x11f
>> fork_trampoline+0xe

To me this looks like a deadlock caused by a FreeBSD add-on to ZFS: the arc_lowmem handler. I think that this is what happens:

The nfsd thread does a read; arc_read_nolock finds a buffer in a ghost cache and calls arc_get_data_buf while holding a hash_lock (one of the buffer hash locks). arc_get_data_buf needs to allocate some memory and, as luck would have it, there is a memory shortage. Low memory handlers are invoked (directly) and one of them is arc_lowmem. arc_lowmem simply kicks arc_reclaim_thread to do its job and then loops sleep-waiting until the memory shortage is less severe. arc_reclaim_thread tries to evict some buffers and, as luck would have it again, it attempts to evict either the same buffer or, most likely, a different buffer that hashes to the same lock. So arc_reclaim_thread is blocked on the arc buffer lock, while the nfsd thread holds the lock but waits in arc_lowmem for arc_reclaim_thread to make progress.

Eventually the held lock stalls other threads that attempt to grab it, the stall propagates to the txg_sync_thread threads, and all ZFS I/O stops.

--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 13:26:52 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 72CC7106564A for ; Thu, 4 Oct 2012 13:26:52 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id EE7DA8FC12 for ; Thu, 4 Oct 2012 13:26:51 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEADaNbVCDaFvO/2dsb2JhbAA8CRaFeboDgiABAQEDAQEBASArIAsFFhgCAg0ZAikBCSYGCAcEARwEhXCBbgYLpg2SdoEhigIBAQ8EBQaEaoESA5I4gQSCLYEVjxaDCYE/CDQ X-IronPort-AV: E=Sophos;i="4.80,536,1344225600"; d="scan'208";a="184673123" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 04 Oct 2012 09:26:44 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 70516B4037; Thu, 4 Oct 2012 09:26:44 -0400 (EDT) Date: Thu, 4 Oct 2012 09:26:44 -0400 (EDT) From: Rick Macklem To: Gomes do Vale Victor Message-ID: <1625458573.1710053.1349357204427.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <836B0731-DC60-40DF-8D9E-ADB9D3FD5AB5@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: nfsv4 kerberized and gssname=root and allgsname X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 13:26:52 -0000

Gomes do Vale Victor wrote:
> On 4 Oct 2012, at 00:35, Rick Macklem wrote:
>
> > Ulysse 31 wrote:
> >> 2012/9/29 Rick Macklem :
> >>> Ulysse 31 wrote:
> >>>> Hi all,
> >>>>
> >>>> I am actually working on a freebsd 9 backup server.
> >>>> this server would backup the production server via kerberized nfs4
> >>>> (since the old backup server, a linux one, was doing so).
> >>>> we used on the old backup server a root/ kerberos identity,
> >>>> which allows the backup server to access all the data.
> >>>> I have followed the documentation found at :
> >>>>
> >>>> http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
> >>>>
> >>>> done :
> >>>> - added to kernel :
> >>>>
> >>>> options KGSSAPI
> >>>> device crypto
> >>>>
> >>>> - added to rc.conf :
> >>>>
> >>>> nfs_client_enable="YES"
> >>>> rpc_lockd_enable="YES"
> >>>> rpc_statd_enable="YES"
> >>>> rpcbind_enable="YES"
> >>>> devfs_enable="YES"
> >>>> gssd_enable="YES"
> >>>>
> >>>> - have done sysctl vfs.rpcsec.keytab_enctype=1 and added it to
> >>>> /etc/sysctl.conf
> >>>>
> >>>> We used MIT kerberos implementation, since it is the one used on all
> >>>> our servers (mostly linux), and we have created and /etc/krb5.keytab
> >>>> containing the following keys :
> >>>> host/
> >>>> nfs/
> >>>> root/
> >>>>
> >>>> and, of course, i have used the available patch at :
> >>>> http://people.freebsd.org/~rmacklem/rpcsec_gss-9.patch
> >>>>
> >>>> When i try to mount with the (B) method (the one of the google wiki),
> >>>> it works as expected, i mean, with a correct user credential, i can
> >>>> access to the user data.
> >>>> But, when i try to access via the (C) method (the one that i need in
> >>>> order to do a full backup of the production storage server) i get a
> >>>> systematic kernel panic when launch the mount command.
> >>>> The mount command looks to something like : mount -t nfs -o
> >>>> nfsv4,sec=krb5i,gssname=root,allgssname >>>> fqdn>:
> > Just to confirm it, you are saying that exactly the same mount command,
> > except without the "allgssname" option, doesn't crash?
>
> No, in fact it's the same command with gssname=nfs instead of gssname=root
> that does not crash. When I specify gssname=root it panics.
> The same command with gssname=nfs and allgssname together "works", or I
> should say it mounts and doesn't crash, but it does not allow accessing
> the nfs share as root, since the netapp expects a root/fqdn key to be
> used for that.
> Don't know if this would give you a hint; I'm gonna test this patch.
> Tell me if you have other ideas.
Well, although it doesn't "fix" whatever the bug is, you could try a
/etc/krb5.keytab file with only the "root/fqdn@realm" entry in it.
(That's the way I used to create them.)

> For now we decided to disable kerberized nfs on the new FreeBSD backup
> server in order to go to production with it without getting late.
> Thanks for the help.
>
> > That is weird, since when I look at the code, there shouldn't be any
> > difference between the two mounts, up to the point where it crashes.
> >
> > The crash seems to indicate that nr_auth is bogus, but I can't see
> > how/why that would happen.
> >
> > I have attached a patch which changes the way nr_auth is set and "might"
> > help, although I doubt it. (It is untested, but if you want to try it,
> > good luck with it.)
> >
> > I'll email again if I get something more solid figured out, rick
> >
> >>>> I have activated the kernel debugging stuff to get some infos, here is
> >>>> the message :
> >>>>
> >>>>
> >>>> Fatal trap 12: page fault while in kernel mode
> >>>> cpuid = 0; apic id = 00
> >>>> fault virtual address = 0x368
> >>>> fault code = supervisor read data, page not present
> >>>> instruction pointer = 0x20:0xffffffff80866ab7
> >>>> stack pointer = 0x28:0xffffff804aa39ce0
> >>>> frame pointer = 0x28:0xffffff804aa39d30
> >>>> code segment = base 0x0, limit 0xfffff, type 0x1b
> >>>> = DPL 0, pres 1, long 1, def32 0, gran 1
> >>>> processor eflags = interrupt enabled, resume, IOPL = 0
> >>>> current process = 701 (mount_nfs)
> >>>> trap number = 12
> >>>> panic: page fault
> >>>> cpuid = 0
> >>>> KDB: stack backtrace:
> >>>> #0 0xffffffff808ae486 at kdb_backtrace+0x66
> >>>> #1 0xffffffff8087885e at panic+0x1ce
> >>>> #2 0xffffffff80b82380 at trap_fatal+0x290
> >>>> #3 0xffffffff80b826b8 at trap_pfault+0x1e8
> >>>> #4 0xffffffff80b82cbe at trap+0x3be
> >>>> #5 0xffffffff80b6c57f at calltrap+0x8
> >>>> #6 0xffffffff80a78eda at rpc_gss_init+0x72a
> >>>> #7 0xffffffff80a79cd6 at rpc_gss_refresh_auth+0x46
> >>>> #8 0xffffffff807a5a53 at newnfs_request+0x163
> >>>> #9 0xffffffff807bf0f7 at nfsrpc_getattrnovp+0xd7
> >>>> #10 0xffffffff807d9b29 at mountnfs+0x4e9
> >>>> #11 0xffffffff807db60a at nfs_mount+0x13ba
> >>>> #12 0xffffffff809068fb at vfs_donmount+0x100b
> >>>> #13 0xffffffff80907086 at sys_nmount+0x66
> >>>> #14 0xffffffff80b81c60 at amd64_syscall+0x540
> >>>> #15 0xffffffff80b6c867 at Xfast_syscall+0xf7
> >>>> Uptime: 2m31s
> >>>> Dumping 97 out of 1002 MB:..17%..33%..50%..66%..83%..99%
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>> Does anyone as experience something similar ? is their a way to
> >>>> correct that ?
> >>>> Thanks for the help.
> >>>>
> >>> Well, you're probably the first person to try doing this in years. I did
> >>> have it working about 4-5years ago. Welcome to the bleeding edge;-)
> >>>
> >>> Could you do the following w.r.t. above kernel:
> >>> # cd /boot/nkernel (or wherever the kernel lives)
> >>> # nm kernel | grep rpc_gss_init
> >>> - add the offset 0x72a to the address for rpc_gss_init
> >>> # addr2line -e kernel.symbols
> >>> 0xXXX - the hex number above (address of rpc_gss_init+0x72a)
> >>> - email me what it prints out, so I know where the crash is occurring
> >>>
> >>> You could also run the following command on the Linux server to capture
> >>> packets during the mount attempt, then email me the xxx.pcap file so I
> >>> can look at it in wireshark, to see what is happening before the crash.
> >>> (I'm guessing nr_auth is somehow bogus, but that's just a guess.:-)
> >>> # tcpdump -s 0 -w xxx.pcap host
> >>
> >> Hi,
> >>
> >> Sorry for the delay i was on travel and no working network connection.
> >> Back online for the rest of the week ^^.
> >> Thanks for your help, here is what it prints out : > >> > >> root@bsdenc:/boot/kernel # nm kernel | grep rpc_gss_init > >> ffffffff80df07b0 r __set_sysinit_set_sym_svc_rpc_gss_init_sys_init > >> ffffffff80a787b0 t rpc_gss_init > >> ffffffff80a7a580 t svc_rpc_gss_init > >> ffffffff81127530 d svc_rpc_gss_init_sys_init > >> ffffffff80a7a3b0 T xdr_rpc_gss_init_res > >> root@bsdenc:/boot/kernel # addr2line -e kernel.symbols > >> 0xffffffff80a78eda > >> /usr/src/sys/rpc/rpcsec_gss/rpcsec_gss.c:772 > >> > >> > >> for the tcpdump from the linux server, i think you may are doing > >> reference to the production nfs server ? > >> if yes, unfortunately it is not linux, it is a netapp filer, so no > >> "real" root access on it (so no tcpdump available :s ). > >> if you were mentioning the old backup server (which is linux but > >> nfs > >> client), i cannot do unmount/mount on it since its production > >> (mountpoint always busy), but i can made a quick VM/testmachine > >> that > >> acts like the linux backup server and do a tcpdump from it. > >> Just let me know. Thanks again. > >> > >> -- > >> Ulysse31 > >> > >>> > >>> rick > >>> > >>>> -- > >>>> Ulysse31 > >>>> _______________________________________________ > >>>> freebsd-fs@freebsd.org mailing list > >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>>> To unsubscribe, send any mail to > >>>> "freebsd-fs-unsubscribe@freebsd.org" > > From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 16:14:30 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 53232106566B; Thu, 4 Oct 2012 16:14:30 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 7020D8FC08; Thu, 4 Oct 2012 16:14:28 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA19407; Thu, 04 Oct 2012 19:14:19 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <506DB5DB.7080302@FreeBSD.org> Date: Thu, 04 Oct 2012 19:14:19 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Nikolay Denev , freebsd-fs , Pawel Jakub Dawidek References: <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> <506BF372.1090208@FreeBSD.org> <506C4049.4040100@FreeBSD.org> <506D81A7.8030506@FreeBSD.org> In-Reply-To: <506D81A7.8030506@FreeBSD.org> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: Subject: Re: nfs + zfs hangs on RELENG_9 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 16:14:30 -0000 on 04/10/2012 15:31 Andriy Gapon said the following: > > [restoring cc to fs@] > > on 04/10/2012 14:32 Nikolay Denev said the following: >> I have procstat only for the nfsd threads from the moment of the IO hang. 
>> And this is the only one with "arc" :
>>
>> 1422 138630 nfsd nfsd: service mi_switch+0x186
>> sleepq_wait+0x42 _sleep+0x390 arc_lowmem+0x77 kmem_malloc+0xc1
>> uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5 arc_read_nolock+0x1ec
>> arc_read+0x93 dbuf_read+0x452 dmu_buf_hold_array_by_dnode+0x16b
>> dmu_buf_hold_array+0x67 dmu_read_uio+0x3f zfs_freebsd_read+0x3e8
>> nfsvno_read+0x2e5 nfsrvd_read+0x3ff nfsrvd_dorpc+0x3c0
>
> Oh, very important stack trace.
>
> Earlier Nikolay Denev said the following:
>> PID TID COMM TDNAME KSTACK
>> 7 100192 zfskern arc_reclaim_thre mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x428
>> _sx_xlock+0x51 arc_buf_remove_ref+0x8a dbuf_rele_and_unlock+0x132 dbuf_evict+0x11
>> dbuf_do_evict+0x53 arc_do_user_evicts+0xb4 arc_reclaim_thread+0x263 fork_exit+0x11f
>> fork_trampoline+0xe
>
> To me this looks like a deadlock caused by a FreeBSD add-on to ZFS: the
> arc_lowmem handler. I think that this is what happens:
> The nfsd thread does a read; arc_read_nolock finds a buffer in a ghost cache
> and calls arc_get_data_buf while holding a hash_lock (one of the buffer hash
> locks). arc_get_data_buf needs to allocate some memory and, as luck would
> have it, there is a memory shortage. Low memory handlers are invoked
> (directly) and one of them is arc_lowmem. arc_lowmem simply kicks
> arc_reclaim_thread to do its job and then loops sleep-waiting until the
> memory shortage is less severe. arc_reclaim_thread tries to evict some
> buffers and, as luck would have it again, it attempts to evict either the
> same buffer or, most likely, a different buffer that hashes to the same
> lock.
> So arc_reclaim_thread is blocked on the arc buffer lock, while the nfsd
> thread holds the lock but waits in arc_lowmem for arc_reclaim_thread to
> make progress.
>
> Eventually the held lock stalls other threads that attempt to grab it, the
> stall propagates to the txg_sync_thread threads, and all ZFS I/O stops.
>

BTW, one thing to note here is that the lowmem hook was invoked because of KVA space shortage, not because of page shortage.

From a practical point of view this may mean that having a sufficient KVA size may help avoid running into this deadlock.

From a programming point of view I am tempted to let arc_lowmem block only if curproc == pageproc. That should both handle the case where blocking is most needed and prevent the deadlock described above.
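As a concrete illustration of that idea, the guard could look roughly like this; the function body is paraphrased from the FreeBSD arc_lowmem() eventhandler in arc.c, and the lock, cv, and flag names (arc_reclaim_thr_lock, arc_reclaim_thr_cv, needfree) are reproduced from memory, so treat it as a sketch of the proposal rather than a reviewed patch:

static void
arc_lowmem(void *arg __unused, int howto __unused)
{

	mutex_enter(&arc_reclaim_thr_lock);
	needfree = 1;
	cv_signal(&arc_reclaim_thr_cv);

	/*
	 * Only the page daemon waits for the reclaim thread to make
	 * progress.  Any other thread (e.g. an nfsd thread allocating
	 * in arc_get_data_buf() while holding a buffer hash lock)
	 * returns immediately, so it can no longer deadlock against
	 * arc_reclaim_thread blocking on that same hash lock.
	 */
	if (curproc == pageproc) {
		while (needfree)
			msleep(&needfree, &arc_reclaim_thr_lock, 0,
			    "zfs:lowmem", 0);
	}
	mutex_exit(&arc_reclaim_thr_lock);
}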
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 18:23:12 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2EED9106564A; Thu, 4 Oct 2012 18:23:12 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 6DF238FC17; Thu, 4 Oct 2012 18:23:10 +0000 (UTC) Received: by mail-lb0-f182.google.com with SMTP id b5so764537lbd.13 for ; Thu, 04 Oct 2012 11:23:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=I0vbwIvLkoJBdX0HnaMH4rx1ZgMBrXS0Wrhwtke3e74=; b=ytPsvSnknSDo/rloPNMc9KdhEtwYHHpOe/WgJKxfUtY/uEupKLmtDef65o49rb1zut eGszLjNq+qFtIczwkeqDI4mDfJFlW/neF0KX2clhvbgxbdOTHc4IvX2R7aCXbzuvithd a9A3iUbh3v2OuT2idJjrm9xTVAek418C1xRDepxoW15F2RaeBHWrVfYO3siRapsU97V0 0hEH3y6wu5fW6G1JsO7J6XJ8LVquHE7Zv15iK/fvJesQznvEm2yYBu4eHo22H7PNdP9Q /xhaPiShtQz1BUAnzFomu8Kfdg0B+4UFXG2gfqsD5XQCEWoGFkTIPHJUruO6+N+AvT/m UJSw== MIME-Version: 1.0 Received: by 10.152.111.227 with SMTP id il3mr4867080lab.23.1349374989672; Thu, 04 Oct 2012 11:23:09 -0700 (PDT) Received: by 10.114.23.230 with HTTP; Thu, 4 Oct 2012 11:23:09 -0700 (PDT) In-Reply-To: References: <506B4508.3030406@FreeBSD.org> Date: Thu, 4 Oct 2012 11:23:09 -0700 Message-ID: From: Freddie Cash To: Martin Matuska Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: [CFT] ZFS feature flag support for 9-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 18:23:12 -0000 On Tue, Oct 2, 2012 at 1:11 PM, Freddie Cash wrote: > On Tue, Oct 2, 2012 at 12:48 PM, Martin Matuska wrote: >> ZFS feature flag support is ready to be merged to 9-STABLE. >> The scheduled merge date is short after 9.1-RELEASE. >> >> Early adopters can test new features by applying the following patch >> (stable/9 r241135): >> http://people.freebsd.org/~mm/patches/zfs/9-stable-zfs-features.patch.gz >> >> Steps to apply to a clean checked-out source: >> cd /path/to/src >> patch -p0 < /path/to/9-stable-zfs-features.patch >> >> Alternatively you can download pre-compiled mfsBSD images for testing: >> >> Standard edition (amd64): >> http://mfsbsd.vx.sk/files/testing/9-stable-zfs-features.iso >> >> Special edition with installation file (amd64): >> http://mfsbsd.vx.sk/files/testing/9-stable-se-zfs-features.iso >> >> Feedback and suggestions are welcome. > > THANK YOU!! :) > > Patch applied cleanly to source tree updated a few minutes ago. > Buildworld in progress. I'll let you know if we run into any issues. Everything compiled and installed correctly. System booted correctly. Pool upgraded correctly. Was quite interesting to watch a "zfs destroy -v" in one terminal with the output of "zfs list -t all -r" of the same filesystem. Destroy output was always at least 15-20 snapshots ahead of the list. Was able to import the pool successfully, whereas before I could not as it would sit there destroying the filesystem in the foreground. Mucho Gracias!!! This fixes the last major "issue" we had with ZFS on our storage boxes. 
--
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 19:34:09 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 90A2D1065673 for ; Thu, 4 Oct 2012 19:34:09 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 156198FC18 for ; Thu, 4 Oct 2012 19:34:08 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id x43so698372wey.13 for ; Thu, 04 Oct 2012 12:34:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:content-type:date:subject:to:message-id:mime-version:x-mailer; bh=HohGR9Xk6iohlM6MXJUMZ/6uIhWbIEEkN7eI6AWCV6U=; b=SrA6D7iK0qQeWdKl/EyeTsmFWanvBBdDvVeyKZ4XYywBTNjhZPtZEcrAmvaGXmEOdh DvIxkmd306AdZzynTnCmwoFi0xEwe6HHHF/9PoO2VZPlddQWc6P6J93lrTx2YylmZcEI i3McqwJUCKc6kSugMliid98SwkJ6mREu7WLHMWm86AlE/YccmIGTBGVB3nF1XBuOvePx 3XCTecQ9UNn+WC5Pm86uw9r+ZtwSjUEE1tnFBcmW56j6MNu5WafsjhkpHrs5C+hWDIND mQYQgdtADVGNAKi+JN7Di7waezQ+3vM35LdEUoEmX6JZUL9fZxqUAV9d8uJGeK1VKnU0 BnKw== Received: by 10.180.94.226 with SMTP id df2mr40302378wib.11.1349379241797; Thu, 04 Oct 2012 12:34:01 -0700 (PDT) Received: from [10.0.0.86] ([93.152.184.10]) by mx.google.com with ESMTPS id k2sm16383706wiz.7.2012.10.04.12.34.00 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 04 Oct 2012 12:34:00 -0700 (PDT) From: Nikolay Denev Date: Thu, 4 Oct 2012 22:33:59 +0300 To: "" Message-Id: <5A5FE35F-7D68-4E83-A88D-3002B51F2E00@gmail.com> Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) X-Mailer: Apple Mail (2.1498) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: zpool scrub on pool from geli devices offlines the pool? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 19:34:09 -0000

Hi,

I have a zfs pool from 24 disks encrypted with geli.

I just did a zpool scrub tank, and that probably reopened all of the devices,
but this caused geli "detach on last close" to kick in,
which resulted in an offline pool of UNAVAILABLE devices.

  pool: tank
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-HC
  scan: scrub in progress since Thu Oct 4 21:19:15 2012
        1 scanned out of 8.29T at 1/s, (scan is slow, no estimated time)
        0 repaired, 0.00% done
config:

        NAME                      STATE    READ WRITE CKSUM
        tank                      UNAVAIL     0     0     0
          raidz2-0                UNAVAIL     0     0     0
            4340223731536330140   UNAVAIL     0     0     0  was /dev/mfid1.eli
            5260313034754791769   UNAVAIL     0     0     0  was /dev/mfid2.eli
            3388275563832205054   UNAVAIL     0     0     0  was /dev/mfid3.eli
            4279885200356306835   UNAVAIL     0     0     0  was /dev/mfid4.eli
            17520568003934998783  UNAVAIL     0     0     0  was /dev/mfid5.eli
            14683427064986614232  UNAVAIL     0     0     0  was /dev/mfid6.eli
            5604251825626821      UNAVAIL     0     0     0  was /dev/mfid7.eli
            2878395114688866721   UNAVAIL     0     0     0  was /dev/mfid8.eli
          raidz2-1                UNAVAIL     0     0     0
            1560240233906009318   UNAVAIL     0     0     0  was /dev/mfid9.eli
            17390515268955717943  UNAVAIL     0     0     0  was /dev/mfid10.eli
            16346219034888442254  UNAVAIL     0     0     0  was /dev/mfid11.eli
            16181936453927970171  UNAVAIL     0     0     0  was /dev/mfid12.eli
            13672668419715232053  UNAVAIL     0     0     0  was /dev/mfid13.eli
            8576569675278017750   UNAVAIL     0     0     0  was /dev/mfid14.eli
            7122599902867613575   UNAVAIL     0     0     0  was /dev/mfid15.eli
            6165832151020850637   UNAVAIL     0     0     0  was /dev/mfid16.eli
          raidz2-2                UNAVAIL     0     0     0
            2529143736541278973   UNAVAIL     0     0     0  was /dev/mfid17.eli
            5815783978070201610   UNAVAIL     0     0     0  was /dev/mfid18.eli
            10521963168174464672  UNAVAIL     0     0     0  was /dev/mfid19.eli
            17880694802593963336  UNAVAIL     0     0     0  was /dev/mfid20.eli
            2868521416175385324   UNAVAIL     0     0     0  was /dev/mfid21.eli
            16369604825508697024  UNAVAIL     0     0     0  was /dev/mfid22.eli
            10849928960759331453  UNAVAIL     0     0     0  was /dev/mfid23.eli
            7128010358193490217   UNAVAIL     0     0     0  was /dev/mfid24.eli

errors: 1 data errors, use '-v' for a list

Dmesg shows :

GEOM_ELI: Detached mfid1.eli on last close.
...
GEOM_ELI: Detached mfid24.eli on last close.

I then did /etc/rc.d/geli restart and zpool clear tank, and it is back online, but shows permanent metadata errors...

Any ideas why this happened from a simple zpool scrub, and how it can be prevented? Just disable "detach on last close" for the geli devices?

From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 20:26:07 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 864BC106566B for ; Thu, 4 Oct 2012 20:26:07 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay02.ispgateway.de (smtprelay02.ispgateway.de [80.67.31.36]) by mx1.freebsd.org (Postfix) with ESMTP id 190228FC0C for ; Thu, 4 Oct 2012 20:26:06 +0000 (UTC) Received: from [78.35.176.207] (helo=fabiankeil.de) by smtprelay02.ispgateway.de with esmtpsa (SSLv3:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1TJrzD-0002AL-1o for freebsd-fs@freebsd.org; Thu, 04 Oct 2012 22:25:35 +0200 Date: Thu, 4 Oct 2012 22:24:22 +0200 From: Fabian Keil To: freebsd-fs@freebsd.org Message-ID: <20121004222422.68d176ec@fabiankeil.de> In-Reply-To: <5A5FE35F-7D68-4E83-A88D-3002B51F2E00@gmail.com> References: <5A5FE35F-7D68-4E83-A88D-3002B51F2E00@gmail.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/eLUGNlqK8CO8QdU4knNukxc"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Subject: Re: zpool scrub on pool from geli devices offlines the pool?
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 20:26:07 -0000 --Sig_/eLUGNlqK8CO8QdU4knNukxc Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable

Nikolay Denev wrote:

> I have a zfs pool from 24 disks encrypted with geli.
>
> I just did a zpool scrub tank, and that probably reopened all of the devices,
> but this caused geli "detach on last close" to kick in,
> which resulted in an offline pool of UNAVAILABLE devices.

This is a known issue:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/117158

The fact that the system didn't panic seems like an improvement,
although this might be the result of the different pool layout.

> pool: tank
> state: UNAVAIL
> status: One or more devices are faulted in response to IO failures.
> action: Make sure the affected devices are connected, then run 'zpool clear'.
> see: http://illumos.org/msg/ZFS-8000-HC
> scan: scrub in progress since Thu Oct 4 21:19:15 2012
> 1 scanned out of 8.29T at 1/s, (scan is slow, no estimated time)
> 0 repaired, 0.00% done
> config:
>
> NAME STATE READ WRITE CKSUM
> tank UNAVAIL 0 0 0
[...]
>
> errors: 1 data errors, use '-v' for a list
>
> Dmesg shows :
>
> GEOM_ELI: Detached mfid1.eli on last close.
> ...
> GEOM_ELI: Detached mfid24.eli on last close.
>
> I then did /etc/rc.d/geli restart and zpool clear tank, and it is back online,
> but shows permanent metadata errors...

I'd expect the "permanent" metadata errors to be gone after
the scrubbing is completed.

> Any ideas why this happened from a simple zpool scrub, and how it can be prevented?
> Just disable "detach on last close" for the geli devices?
At least that was Pawel's recommendation in 2007: http://lists.freebsd.org/pipermail/freebsd-current/2007-October/078107.html Fabian --Sig_/eLUGNlqK8CO8QdU4knNukxc Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlBt8HkACgkQBYqIVf93VJ2o/gCggQ3hKU4zXUoA7D+K3HOwzqzv tBoAn01iVH146hTIljOdnlDK216bfvKm =mtka -----END PGP SIGNATURE----- --Sig_/eLUGNlqK8CO8QdU4knNukxc-- From owner-freebsd-fs@FreeBSD.ORG Thu Oct 4 20:59:14 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 019A11065672 for ; Thu, 4 Oct 2012 20:59:14 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 832448FC08 for ; Thu, 4 Oct 2012 20:59:12 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id 16so797930wgi.31 for ; Thu, 04 Oct 2012 13:59:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=rI0bpUvstetRIOAl72uzXQ3QHp3cxC0Z4qZIZZ+Th+U=; b=ywsKjJTSJFTir1UhipGalqi5YnSQnz4WsCn/5awKB9hkYWHGZ7N1Z6+/pf+OLjO9TS 2P7HKS4V0KUEevNXe/2z433sUqSPVqgkdzY59XVda6YL/oSzQeKloIberVH3/6nrGk0u I/oOYOULor4c0bK0C+oCYoJ7OQWssLsT6whzEnprJaItRW0j5+56OyWgD1pSX1YdrT/U JDYNXKwzjs9XvxtmPquPKTNPZ7xF+3nIvmZQu7e1FzxYTiisCdFngVf3B7XXOtva1P8Z n5mIqtQ/PJbOlu2eQyq9qO2lAPg6Nmi4Eee8bgQgA7L6IFfurVh9f9AxtsY9dhHyoi8d 5oaw== Received: by 10.216.140.73 with SMTP id d51mr3914906wej.217.1349384351948; Thu, 04 Oct 2012 13:59:11 -0700 (PDT) Received: from [10.0.0.86] ([93.152.184.10]) by mx.google.com with ESMTPS id j8sm6881714wiy.9.2012.10.04.13.59.06 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 04 Oct 2012 13:59:10 -0700 (PDT) Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=windows-1252 From: Nikolay Denev In-Reply-To: <20121004222422.68d176ec@fabiankeil.de> Date: Thu, 4 Oct 2012 23:59:04 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <5A5FE35F-7D68-4E83-A88D-3002B51F2E00@gmail.com> <20121004222422.68d176ec@fabiankeil.de> To: Fabian Keil X-Mailer: Apple Mail (2.1498) Cc: freebsd-fs@freebsd.org Subject: Re: zpool scrub on pool from geli devices offlines the pool? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Oct 2012 20:59:14 -0000 On Oct 4, 2012, at 11:24 PM, Fabian Keil = wrote: > Nikolay Denev wrote: >=20 >> I have a zfs pool from 24 disks encrypted with geli. >>=20 >> I just did a zpool scrub tank, and that probably reopened all of the = devices, >> but this caused geli "detach on last close" to kick in=20 >> which resulted in offline pool from UNAVAILABLE devices.=20 >=20 > This is a known issue: > http://www.freebsd.org/cgi/query-pr.cgi?pr=3Dkern/117158 >=20 > The fact that the system didn't panic seems like an improvement, > although this might be the result of the different pool layout. >=20 >> pool: tank >> state: UNAVAIL >> status: One or more devices are faulted in response to IO failures. >> action: Make sure the affected devices are connected, then run 'zpool = clear'. 
>> see: http://illumos.org/msg/ZFS-8000-HC
>> scan: scrub in progress since Thu Oct 4 21:19:15 2012
>> 1 scanned out of 8.29T at 1/s, (scan is slow, no estimated time)
>> 0 repaired, 0.00% done
>> config:
>>
>> NAME STATE READ WRITE CKSUM
>> tank UNAVAIL 0 0 0
> [...]
>>
>> errors: 1 data errors, use '-v' for a list
>>
>> Dmesg shows:
>>
>> GEOM_ELI: Detached mfid1.eli on last close.
>> ...
>> GEOM_ELI: Detached mfid24.eli on last close.
>>
>> I then did /etc/rc.d/geli restart and zpool clear tank, and it is back online,
>> but shows permanent metadata errors...
>
> I'd expect the "permanent" metadata errors to be gone after
> the scrubbing is completed.
>
>> Any ideas why this happened from a simple zpool scrub, and how it can be prevented?
>> Just disable "detach on last close" for the geli devices?
>
> At least that was Pawel's recommendation in 2007:
> http://lists.freebsd.org/pipermail/freebsd-current/2007-October/078107.html
>
> Fabian

Thanks for the information, I had missed that.
And yep, the pool reports as ONLINE without errors after the reboot.
I'll add geli_autodetach="NO" to rc.conf.

Regards,
Nikolay
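[For readers hitting the same problem: the workaround Nikolay settles on is a one-line rc.conf change. A minimal sketch; geli_autodetach defaults to YES in /etc/defaults/rc.conf, and turning it off makes the rc.d/geli script attach providers without geli's -d ("detach on last close") flag:

    # /etc/rc.conf
    geli_autodetach="NO"    # keep .eli providers attached after last close

With this set, a scrub's close-and-reopen cycle no longer tears the .eli devices out from under the pool.]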
From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 01:15:06 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 731EC106566B; Fri, 5 Oct 2012 01:15:06 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 45EB88FC0A; Fri, 5 Oct 2012 01:15:06 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q951F6fD050532; Fri, 5 Oct 2012 01:15:06 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q951F527050526; Fri, 5 Oct 2012 01:15:05 GMT (envelope-from linimon) Date: Fri, 5 Oct 2012 01:15:05 GMT Message-Id: <201210050115.q951F527050526@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/172259: [zfs] [patch] ZFS fails to receive valid snapshots (patch included) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 01:15:06 -0000

Old Synopsis: ZFS fails to receive valid snapshots (patch included)
New Synopsis: [zfs] [patch] ZFS fails to receive valid snapshots (patch included)

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Fri Oct 5 01:14:41 UTC 2012
Responsible-Changed-Why: Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=172259

From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 02:14:33 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EF880106564A; Fri, 5 Oct 2012 02:14:32 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C17468FC08; Fri, 5 Oct 2012 02:14:32 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q952EWdY058825; Fri, 5 Oct 2012 02:14:32 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q952EWRT058813; Fri, 5 Oct 2012 02:14:32 GMT (envelope-from linimon) Date: Fri, 5 Oct 2012 02:14:32 GMT Message-Id: <201210050214.q952EWRT058813@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/172334: [unionfs] unionfs permits recursive union mounts; causes panic quickly X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 02:14:33 -0000

Old Synopsis: unionfs permits recursive union mounts; causes panic quickly
New Synopsis: [unionfs] unionfs permits recursive union mounts; causes panic quickly

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Fri Oct 5 02:14:13 UTC 2012
Responsible-Changed-Why: Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=172334

From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 03:15:14 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 521B6106566C; Fri, 5 Oct 2012 03:15:14 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 240708FC19; Fri, 5 Oct 2012 03:15:14 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q953FEdO066137; Fri, 5 Oct 2012 03:15:14 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q953FDdN066133; Fri, 5 Oct 2012 03:15:13 GMT (envelope-from linimon) Date: Fri, 5 Oct 2012 03:15:13 GMT Message-Id: <201210050315.q953FDdN066133@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/172348: [unionfs] umount -f of filesystem in use with readonly backed filesystem results in panic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 03:15:14 -0000

Synopsis: [unionfs] umount -f of filesystem in use with readonly backed filesystem results in panic

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Fri Oct 5 03:15:04 UTC 2012
Responsible-Changed-Why: Over to maintainer(s).
http://www.freebsd.org/cgi/query-pr.cgi?pr=172348

From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 06:38:20 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 12D5E106566C; Fri, 5 Oct 2012 06:38:20 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (garage.dawidek.net [91.121.88.72]) by mx1.freebsd.org (Postfix) with ESMTP id 9E01C8FC17; Fri, 5 Oct 2012 06:38:19 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id 5F90B358; Fri, 5 Oct 2012 08:37:11 +0200 (CEST) Date: Fri, 5 Oct 2012 08:38:49 +0200 From: Pawel Jakub Dawidek To: Andriy Gapon Message-ID: <20121005063848.GC1389@garage.freebsd.pl> References: <505DE715.8020806@FreeBSD.org> <506C50F1.40805@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="CblX+4bnyfN0pR09" Content-Disposition: inline In-Reply-To: <506C50F1.40805@FreeBSD.org> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: "Justin T. Gibbs" , freebsd-fs@FreeBSD.org Subject: Re: zfs: allow to mount root from a pool not in zpool.cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 06:38:20 -0000 --CblX+4bnyfN0pR09 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Wed, Oct 03, 2012 at 05:51:29PM +0300, Andriy Gapon wrote:
> on 23/09/2012 07:59 Justin T. Gibbs said the following:
> > On Sep 22, 2012, at 10:28 AM, Andriy Gapon wrote:
> >
> >> Currently FreeBSD ZFS kernel code doesn't allow mounting the root filesystem on a
> >> pool that is not listed in zpool.cache, as only pools from the cache are known to
> >> ZFS at that time.
> >
> > I've for some time been of the opinion that FreeBSD should only use
> > the cache file for ZFS pools created from non-GEOM objects (i.e.
> > files). GEOM tasting should be used to make the kernel aware of
> > all pools, whether they be imported on the system, partial, or
> > foreign. Even for pools created by files, the user land utilities
> > should do nothing more than ask the kernel to "taste them". This
> > would remove code duplicated in user land for this task (code that
> > must be re-executed in kernel space for validation reasons anyway)
> > and also help solve problems we've encountered at Spectra with races
> > in fault event processing, spare management, and device arrivals and
> > departures.
> >
> > So I'm excited by your work in this area and would encourage you
> > to "think larger" than just trying to integrate root pool discovery
> > with GEOM. Spectra may even be able to help in this work sometime
> > in the near future.
>
> For the moment I am trying to think "narrower" to fix the problem at hand :-)
>
> But I see what you say.
> It doesn't make sense that
> - zfsboot tastes all BIOS-visible disks for pools
> - zfsloader tastes all BIOS-visible disks for pools [duplicated effort detected]
> - but the kernel puts all its trust in some cache file
>
> I am not sure what performance impact tasting all GEOM providers would have,
> but I've got this idea. geom_vdev geoms should taste all providers (like e.g.
> geom part or label do) and attach (but not g_access) to any that have valid zfs
> labels. They should cache things like pool guids, vdev guids, txgs, etc. So that
> that information is readily available for any queries. So we easily know what
> pools we have in a system, what devices from those pools are available, etc. When
> we want to import a pool we just start using the corresponding geom_vdev geoms
> (g_access them).
>
> This will also remove the need for disk tasting done from userland (which is weird
> on FreeBSD).
>
> I think that the zfs+geom part is not too much work. The userland reduction part
> looks scarier to me :-)

The original idea behind zpool.cache on Solaris was to reduce boot time
and not to taste every single disk/partition in the system if you have a
few dozen or even a few hundred drives in the system. This argument
doesn't apply to FreeBSD, as we do the tasting anyway in GEOM. We could
eventually try to make it parallel, but this was never a big issue for
FreeBSD.

In my opinion requiring no zpool.cache to import the root pool and mount
the root file system is a great idea and we should definitely do it. It
will heavily simplify ZFS configuration from various recovery media, etc.
The user already makes his decision by either placing the dataset name
into /etc/fstab or by defining the vfs.root.mountfrom tunable. There is
no need to require anything else from him. He told us what he wants and
we should just do it - import the pool even if it is in the exported
state and even if it is not listed in zpool.cache. We already ignore
hostid, because it is not available during root mount. I'm all for it.

As for the other pools, I'm also in favour of autodetecting them. It
will be useful if root is read-only and /boot/zfs/ is read-only as well.
But here we need to be more careful. We should only import pools that
are in the imported state and for which the system's hostid matches.

--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl
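[Concretely, the "decision" Pawel refers to is expressed through standard boot-time knobs; a minimal sketch, with the dataset name purely illustrative:

    # /boot/loader.conf
    zfs_load="YES"                        # load zfs.ko from the loader
    vfs.root.mountfrom="zfs:tank/root"    # dataset to mount as /

    # alternatively, the same choice can live in /etc/fstab:
    # tank/root   /   zfs   rw   0   0

With the change discussed in this thread, those hints alone would let the kernel locate and import the root pool, with no zpool.cache needed.]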
--CblX+4bnyfN0pR09 Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlBugHgACgkQForvXbEpPzR0rACg4vGbuUatO15KmPihWW9Qqc3s
hgsAoPMLGPfHgl0mWsM8oUbvUQpapjQS
=EHEJ
-----END PGP SIGNATURE-----

--CblX+4bnyfN0pR09--

From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 08:46:24 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 71F9A106566C; Fri, 5 Oct 2012 08:46:24 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) by mx1.freebsd.org (Postfix) with ESMTP id 3C4508FC0A; Fri, 5 Oct 2012 08:46:24 +0000 (UTC) Received: by mail-pa0-f54.google.com with SMTP id bi1so1675125pad.13 for ; Fri, 05 Oct 2012 01:46:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=4zkgTdPrpsZN0HUitzeZ4l9lKGnFT23T3sJcwTWz9dU=; b=idRcCuEaenufKdTa95UzQKEuveDi+24nGNod/q162Q8ndA7oo8rdhy0yA9VqHoi71Z yGdpNnQLSJC5VK95aeY2DYQCXv+rETaPUenmLrFREDINFjgLqvrjI8Z2dqnbr2q6SScz B9rOlEvuOJeKOzqcd6PVHSzWoXGT/YUzOAQyytkgoOOfUofb59BFx+lChYHz8HNSkKOm WdLmK5npBq/PdH5MARIGqRsimZHr2v3eUmlF1qn2mINKxtiIWVamAlrBuyVzQ3GpcM2K AoyJWhr9MTr7bsbhfOiliRk/hFNp9FGheKwGHqLXL7EColh9f39RtarlxHji162bjttQ 5ZvA== Received: by 10.68.233.97 with SMTP id tv1mr28645323pbc.96.1349426777532; Fri, 05 Oct 2012 01:46:17 -0700 (PDT) Received: from [192.168.1.128] (mau.donbass.com. [92.242.127.250]) by mx.google.com with ESMTPS id pw9sm5744432pbb.42.2012.10.05.01.46.13 (version=SSLv3 cipher=OTHER); Fri, 05 Oct 2012 01:46:16 -0700 (PDT) Message-ID: <506E9E10.1010600@gmail.com> Date: Fri, 05 Oct 2012 11:45:04 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120924 Thunderbird/15.0.1 MIME-Version: 1.0 To: Martin Matuska References: <506B4508.3030406@FreeBSD.org> In-Reply-To: <506B4508.3030406@FreeBSD.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: [CFT] ZFS feature flag support for 9-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 08:46:24 -0000

02.10.2012 22:48, Martin Matuska wrote:
> ZFS feature flag support is ready to be merged to 9-STABLE.
> The scheduled merge date is short after 9.1-RELEASE.
> Feedback and suggestions are welcome.

+1 from me: works here like a charm.

I have one more question about snapshot removal behavior. I already
noticed that destroying a snapshot on a deduped filesystem can eat a lot
of RAM to complete. Actually my experience was:

1. One of the snapshots was discarded (14G) and the system tried to
allocate more RAM than was available. The machine stalled.

2. After reboot the snapshot destruction was automatically restarted.
And the machine stalled again.

3. I booted in single user mode and tried a manual pool import. RAM was
exhausted again. I assumed that adding swap wouldn't help either.

4. I raised memory on the machine from 4G to 8G. After boot the snapshot
destruction completed, topping wired at 6G.

As I understand it, snapshot destruction goes into one transaction,
meaning that all changed structures have to be changed in memory and
written to disk.

Would this code have the same issue?

--
Sphinx of black quartz, judge my vow.
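[A rough way to gauge exposure before destroying snapshots on a deduped pool is to inspect the dedup-table statistics up front. A sketch using standard tooling, with the pool name illustrative; each DDT entry a destroy has to visit costs on the order of a few hundred bytes of RAM:

    # summary, including DDT entry counts and in-core/on-disk sizes
    zpool status -D tank

    # full DDT histogram
    zdb -DD tank

If the entry count times the per-entry size approaches physical memory, a large destroy can be expected to stall the machine in exactly the way described above.]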
From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 10:19:48 2012 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5C57C106564A; Fri, 5 Oct 2012 10:19:48 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 39A2E8FC0A; Fri, 5 Oct 2012 10:19:46 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA28237; Fri, 05 Oct 2012 13:19:41 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TK50O-000OxA-HA; Fri, 05 Oct 2012 13:19:40 +0300 Message-ID: <506EB43B.8050204@FreeBSD.org> Date: Fri, 05 Oct 2012 13:19:39 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: "Justin T. Gibbs" References: <76CBA055-021F-458D-8978-E9A973D9B783@scsiguy.com> In-Reply-To: <76CBA055-021F-458D-8978-E9A973D9B783@scsiguy.com> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Pawel Jakub Dawidek , fs@FreeBSD.org Subject: Re: ZFS: Deadlock during vnode recycling X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 10:19:48 -0000

on 18/09/2012 18:14 Justin T. Gibbs said the following:
> One of our systems became unresponsive due to an inability to recycle
> vnodes. We tracked this down to a deadlock in zfs_zget(). I've attached
> the stack trace from the vnlru process to the end of this email.
>
> We are currently testing the following patch. Since this issue is hard to
> replicate I would appreciate review and feedback before I commit it to
> FreeBSD.

Since the vnode's exclusive lock is held by curthread during reclaiming,
the following check in zfs_zget should be sufficient, IMO:

if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE) ...

What do you think?

> Patch
> ===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<
> Change 635310 by justing@justing_ns1_spectrabsd on 2012/09/17 15:30:14
>
> For most vnode consumers of ZFS, the appropriate behavior
> when encountering a vnode that is in the process of being
> reclaimed is to wait for that process to complete and then
> allocate a new vnode. This behavior is enforced in zfs_zget()
> by checking for the VI_DOOMED vnode flag. In the case of
> the thread actually reclaiming the vnode, zfs_zget() must
> return the current vnode, otherwise a deadlock will occur.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h:
> Create a virtual znode field, z_reclaim_td, which is
> implemented as a macro that redirects to z_task.ta_context.
>
> z_task is only used by the reclaim code to perform the
> final cleanup of a znode in a secondary thread. Since
> this can only occur after any calls to zfs_zget(), it
> is safe to reuse the ta_context field.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:
> In zfs_freebsd_reclaim(), record curthread in the
> znode being reclaimed.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:
> o Null out z_reclaim_td when znode_t's are constructed.
> > o In zfs_zget(), return a "doomed vnode" if the current > thread is actively reclaiming this object. > > Affected files ... > > ... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h#2 edit > ... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c#3 edit > ... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#2 edit > > Differences ... > > ==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h#2 (text) ==== > > @@ -241,6 +241,7 @@ > struct task z_task; > } znode_t; > > +#define z_reclaim_td z_task.ta_context > > /* > * Convert between znode pointers and vnode pointers > > ==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c#3 (text) ==== > > @@ -6083,6 +6083,13 @@ > > ASSERT(zp != NULL); > > + /* > + * Mark the znode so that operations that typically block > + * waiting for reclamation to complete will return the current, > + * "doomed vnode", for this thread. > + */ > + zp->z_reclaim_td = curthread; > + > /* > * Destroy the vm object and flush associated pages. > */ > > ==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#2 (text) ==== > > @@ -158,6 +158,7 @@ > zp->z_dirlocks = NULL; > zp->z_acl_cached = NULL; > zp->z_moved = 0; > + zp->z_reclaim_td = NULL; > return (0); > } > > @@ -1192,7 +1193,8 @@ > dying = 1; > else { > VN_HOLD(vp); > - if ((vp->v_iflag & VI_DOOMED) != 0) { > + if ((vp->v_iflag & VI_DOOMED) != 0 && > + zp->z_reclaim_td != curthread) { > dying = 1; > /* > * Don't VN_RELE() vnode here, because > > vnlru_proc debug session > ===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8< > #0 sched_switch (td=0xfffffe000f87b470, newtd=0xfffffe000d36c8e0, flags=Variable "flags" is not available. > ) at /usr/src/sys/kern/sched_ule.c:1927 > #1 0xffffffff8057f2b6 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485 > #2 0xffffffff805b8982 in sleepq_timedwait (wchan=0xfffffe05c7515640, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:658 > #3 0xffffffff8057f89f in _sleep (ident=0xfffffe05c7515640, lock=0x0, priority=Variable "priority" is not available. > ) at /usr/src/sys/kern/kern_synch.c:246 > #4 0xffffffff81093035 in zfs_zget (zfsvfs=0xfffffe001de4c000, obj_num=81963, zpp=0xffffff8c60dc51b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1224 > #5 0xffffffff810bec9a in zfs_get_data (arg=0xfffffe001de4c000, lr=0xffffff820f5330b8, buf=0x0, zio=0xfffffe0584625000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1142 > #6 0xffffffff81096891 in zil_commit (zilog=0xfffffe001c382800, foid=Variable "foid" is not available. > ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1048 > #7 0xffffffff810bceb0 in zfs_freebsd_write (ap=Variable "ap" is not available. > ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1083 > #8 0xffffffff8081f112 in VOP_WRITE_APV (vop=0xffffffff8112cf40, a=0xffffff8c60dc5680) at vnode_if.c:951 > #9 0xffffffff807b1a6b in vnode_pager_generic_putpages (vp=0xfffffe05c76171e0, ma=0xffffff8c60dc5890, bytecount=Variable "bytecount" is not available. 
> ) at vnode_if.h:413 > #10 0xffffffff807b1749 in vnode_pager_putpages (object=0xfffffe05e9ee9bc8, m=0xffffff8c60dc5890, count=61440, sync=1, rtvals=0xffffff8c60dc57a0) at vnode_if.h:1189 > #11 0xffffffff807aaee0 in vm_pageout_flush (mc=0xffffff8c60dc5890, count=15, flags=1, mreq=0, prunlen=0xffffff8c60dc594c, eio=0xffffff8c60dc59c0) at vm_pager.h:145 > #12 0xffffffff807a3da3 in vm_object_page_collect_flush (object=Variable "object" is not available. > ) at /usr/src/sys/vm/vm_object.c:936 > #13 0xffffffff807a3f23 in vm_object_page_clean (object=0xfffffe05e9ee9bc8, start=Variable "start" is not available. > ) at /usr/src/sys/vm/vm_object.c:861 > #14 0xffffffff807a42d4 in vm_object_terminate (object=0xfffffe05e9ee9bc8) at /usr/src/sys/vm/vm_object.c:706 > #15 0xffffffff807b241e in vnode_destroy_vobject (vp=0xfffffe05c76171e0) at /usr/src/sys/vm/vnode_pager.c:167 > #16 0xffffffff810beec7 in zfs_freebsd_reclaim (ap=Variable "ap" is not available. > ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6146 > #17 0xffffffff806101e1 in vgonel (vp=0xfffffe05c76171e0) at vnode_if.h:830 > #18 0xffffffff80616379 in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:734 > > (kgdb) frame 4 > #4 0xffffffff81093035 in zfs_zget (zfsvfs=0xfffffe001de4c000, obj_num=81963, zpp=0xffffff8c60dc51b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1224 > 1224 tsleep(zp, 0, "zcollide", 1); > (kgdb) l > 1219 sa_buf_rele(db, NULL); > 1220 mutex_exit(&zp->z_lock); > 1221 ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num); > 1222 if (vp != NULL) > 1223 VN_RELE(vp); > 1224 tsleep(zp, 0, "zcollide", 1); > 1225 goto again; > 1226 } > 1227 *zpp = zp; > 1228 err = 0; > (kgdb) > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 10:50:54 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C6CF6106566B; Fri, 5 Oct 2012 10:50:54 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id DFC2C8FC0C; Fri, 5 Oct 2012 10:50:53 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA28434; Fri, 05 Oct 2012 13:50:52 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TK5Ua-000Ozt-3F; Fri, 05 Oct 2012 13:50:52 +0300 Message-ID: <506EBB8B.2000800@FreeBSD.org> Date: Fri, 05 Oct 2012 13:50:51 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org, Pawel Jakub Dawidek References: <505DB4E6.8030407@smeets.im> <20120924224606.GE79077@ithaqua.etoilebsd.net> <20120925090840.GD35915@deviant.kiev.zoral.com.ua> <20120929154101.GK1402@garage.freebsd.pl> <20120930122403.GB35915@deviant.kiev.zoral.com.ua> <506BFA5B.9060103@FreeBSD.org> In-Reply-To: <506BFA5B.9060103@FreeBSD.org> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Subject: Re: panic: _sx_xlock_hard: recursed on non-recursive sx 
zfsvfs->z_hold_mtx[i] @ ...cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 10:50:54 -0000 on 03/10/2012 11:42 Andriy Gapon said the following: > on 30/09/2012 15:24 Konstantin Belousov said the following: >> The postponing of the reclaim when vnode reserve goes low to the vnlru >> process does not solve anything, since you only change the recursion into >> the deadlock. >> >> I discussed an approach for this issue with avg. Basic idea is presented in >> the untested patch below. You can specify that some count of the free >> vnodes must be present for some dynamic scope, started by >> getnewvnode_reserve() function. While staying inside the reserved pool, >> getnewvnode() calls would not recurse into vnlru(). The scope is finished >> with getnewvnode_drop_reserve(). >> >> The getnewvnode_reserve() shall be called while no locks are held. >> >> What do you think ? > > Here is a patch that makes use of the getnewvnode_reserve API in ZFS: > http://people.freebsd.org/~avg/zfs-getnewvnode.diff > BTW, my impression is that this problem is a comeback of the original zfs_reclaim problem, but now in zfs_inactive (thanks to help from nullfs). So, with the approach that Kostik designed and which fixes zfs_inactive, zfs_reclaim should now be safe and should no longer require the taskqueue hack. There should never be a recursion back into ZFS via getnewvnode. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 12:24:16 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D8360106566B; Fri, 5 Oct 2012 12:24:16 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EE6538FC0C; Fri, 5 Oct 2012 12:24:15 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA29073; Fri, 05 Oct 2012 15:24:13 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <506ED16C.7000207@FreeBSD.org> Date: Fri, 05 Oct 2012 15:24:12 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Nikolay Denev , Pawel Jakub Dawidek References: <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> <506BF372.1090208@FreeBSD.org> <506C4049.4040100@FreeBSD.org> <506D81A7.8030506@FreeBSD.org> <506DB5DB.7080302@FreeBSD.org> In-Reply-To: <506DB5DB.7080302@FreeBSD.org> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs Subject: Re: nfs + zfs hangs on RELENG_9 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 12:24:16 -0000 on 04/10/2012 19:14 Andriy Gapon said the following: > BTW, one thing to note here is that the lowmem hook was invoked because of KVA > space shortage, not because of page shortage. > > From practical point of view this may mean that having sufficient KVA size may > help to not run into this deadlock. 
> From programming point of view I am tempted to let arc_lowmem block only if
> curproc == pageproc. That should both handle the case where blocking is most
> needed and should prevent the deadlock described above.

A possible patch:

--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
@@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int howto __unused)
 	mutex_enter(&arc_reclaim_thr_lock);
 	needfree = 1;
 	cv_signal(&arc_reclaim_thr_cv);
-	while (needfree)
-		msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
+
+	/*
+	 * It is unsafe to block here in arbitrary threads, because we can come
+	 * here from ARC itself and may hold ARC locks and thus risk a deadlock
+	 * with ARC reclaim thread.
+	 */
+	if (curproc == pageproc) {
+		while (needfree)
+			msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
+	}
 	mutex_exit(&arc_reclaim_thr_lock);
 	mutex_exit(&arc_lowmem_lock);
 }

--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 13:13:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 821) id C387E106564A; Fri, 5 Oct 2012 13:13:56 +0000 (UTC) Date: Fri, 5 Oct 2012 13:13:56 +0000 From: John To: FreeBSD-FS Message-ID: <20121005131356.GA13888@FreeBSD.org> References: <20121003032738.GA42140@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20121003032738.GA42140@FreeBSD.org> User-Agent: Mutt/1.4.2.1i Cc: FreeBSD-SCSI Subject: Re: ZFS/istgt lockup X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 13:13:56 -0000

Copying this reply to -scsi. Not sure if it's more of a zfs issue or
istgt... more below...

----- John's Original Message -----
> Hi Folks,
>
> I've been chasing a problem that I'm not quite sure originates
> on the BSD side, but the system shouldn't lock up and require a power
> cycle to reboot.
>
> The config: I have a bsd system running 9.1RC handing out a
> 36TB volume to a Linux RHEL 6.1 system. The RHEL 6.1 system is
> doing heavy I/O & number crunching. Many hours into the job stream
> the kernel becomes quite unhappy:
>
> kernel: __ratelimit: 27665 callbacks suppressed
> kernel: swapper: page allocation failure. order:1, mode:0x4020
> kernel: Pid: 0, comm: swapper Tainted: G ---------------- T 2.6.32-131.0.15.el6.x86_64 #1
> kernel: Call Trace:
> kernel: [] ? __alloc_pages_nodemask+0x716/0x8b0
> kernel: [] ? alloc_pages_current+0xaa/0x110
> kernel: [] ? refill_fl+0x3d5/0x4a0 [cxgb3]
> kernel: [] ? napi_frags_finish+0x6d/0xb0
> kernel: [] ? process_responses+0x653/0x1450 [cxgb3]
> kernel: [] ? ring_buffer_lock_reserve+0xa2/0x160
> kernel: [] ? napi_rx_handler+0x3c/0x90 [cxgb3]
> kernel: [] ? net_rx_action+0x103/0x2f0
> kernel: [] ? __do_softirq+0xb7/0x1e0
> kernel: [] ? handle_IRQ_event+0xf6/0x170
> kernel: [] ? call_softirq+0x1c/0x30
> kernel: [] ? do_softirq+0x65/0xa0
> kernel: [] ? irq_exit+0x85/0x90
> kernel: [] ? do_IRQ+0x75/0xf0
> kernel: [] ? ret_from_intr+0x0/0x11
> kernel: [] ? native_safe_halt+0xb/0x10
> kernel: [] ? ftrace_raw_event_power_start+0x16/0x20
> kernel: [] ? default_idle+0x4d/0xb0
> kernel: [] ? cpu_idle+0xb6/0x110
> kernel: [] ? start_secondary+0x202/0x245
>
> On the bsd side, the istgt daemon appears to see that one of the
> connection threads is down and attempts to restart it.
At this point, > the istgt process size starts to grow. > > USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND > root 1224 0.0 0.4 8041092 405472 v0- DL 4:59PM 15:28.72 /usr/local/bin/istgt > root 1224 0.0 0.4 8041092 405472 v0- IL 4:59PM 63:18.34 /usr/local/bin/istgt > root 1224 0.0 0.4 8041092 405472 v0- IL 4:59PM 61:13.80 /usr/local/bin/istgt > root 1224 0.0 0.4 8041092 405472 v0- IL 4:59PM 0:00.00 /usr/local/bin/istgt > > There are more than 1400 threads reported. > > Also of interest, netstat shows: > > tcp4 0 0 10.59.6.12.5010 10.59.25.113.54076 CLOSE_WAIT > tcp4 0 0 10.60.6.12.5010 10.60.25.113.33345 CLOSED > tcp4 0 0 10.59.6.12.5010 10.59.25.113.54074 CLOSE_WAIT > tcp4 0 0 10.60.6.12.5010 10.60.25.113.33343 CLOSED > tcp4 0 0 10.59.6.12.5010 10.59.25.113.54072 CLOSE_WAIT > tcp4 0 0 10.60.6.12.5010 10.60.25.113.33341 CLOSED > tcp4 0 0 10.60.6.12.5010 10.60.25.113.33339 CLOSED > tcp4 0 0 10.59.6.12.5010 10.59.25.113.54070 CLOSE_WAIT > tcp4 0 0 10.60.6.12.5010 10.60.25.113.53806 CLOSE_WAIT > > There are more than 1400 sockets in the CLOSE* state. What would > prevent these sockets from cleaning up in a reasonable timeframe? > Both sides of the mpio connection appear to be attempting reconnects. > > An attempt to gracefully kill istgt fails. A kill -9 does not clean > things up either. > > A procstat -kk 1224 after the kill -9 shows: > > PID TID COMM TDNAME KSTACK > 1224 100959 istgt sigthread mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 zio_wait+0x61 dbuf_read+0x5e5 dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_ > lookup_norm+0x45 zap_lookup+0x2e zfs_dirent_lock+0x4ff zfs_dirlook+0x69 zfs_lookup+0x26b zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf8 VOP_LOOKUP_APV+0x40 lookup+0x > 464 namei+0x4e9 vn_open_cred+0x3cb > 1224 100960 istgt luthread #1 mi_switch+0x186 sleepq_wait+0x42 _sleep+0x376 bwait+0x64 physio+0x246 devfs_write_f+0x8d dofilewrite+0x8b kern_writev > +0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7 > 1224 103533 istgt sendthread #1493 mi_switch+0x186 thread_suspend_switch+0xc9 thread_single+0x1b2 exit1+0x72 sigexit+0x7c postsig+0x3a4 ast+0x26c doreti > _ast+0x1f > > > An attempt to forcefully export the pool hangs also. A procstat > shows: > > PID TID COMM TDNAME KSTACK > 4427 100991 zpool - mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 dbuf_read+0x30b dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e dsl_dir_open_spa+0x121 dsl_dataset_hold+0x3b dmu_objset_hold+0x23 zfs_ioc_objset_stats+0x2b zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x546 Xfast_syscall+0xf7 > > > > If anyone has any ideas, please let me know. I know I've left a lot > of config information out in an attempt to keep the email shorter. > > Random comments: > > This happens with or without multipathd enabled on the linux client. > > If I catch the istgt daemon while it's creating threads and kill it > the system will not lock up. > > I see no errors in the istgt log file. One of my next things to try > is to enable all debugging... The amount of debugging data captured > is quite large :-( > > I am using chelsio 10G cards on both client/server which have been > rock solid in all other cases. > > Thoughts welcome! > > Thanks, > John Hi Folks, I've managed to replicate this problem once. 
Basically, it appears the linux client sends an abort which is processed here:

istgt_iscsi_op_task:
        switch (function) {
        case ISCSI_TASK_FUNC_ABORT_TASK:
                ISTGT_LOG("ABORT_TASK\n");
                SESS_MTX_LOCK(conn);
                rc = istgt_lu_clear_task_ITLQ(conn, conn->sess->lu, lun,
                    ref_CmdSN);
                SESS_MTX_UNLOCK(conn);
                if (rc < 0) {
                        ISTGT_ERRLOG("LU reset failed\n");
                }
                istgt_clear_transfer_task(conn, ref_CmdSN);
                break;

At this point, the queue depth is 62. There appears to be one thread in
the zfs code performing a read. No other processing occurs after this
point. A zfs list hangs. The pool cannot be exported. The istgt daemon
cannot be fully killed. A reboot requires a power reset (i.e., reboot
hangs after flushing buffers).

The only thing that does appear to be happening is a growing list of
connections:

tcp4 0 0 10.60.6.12.5010 10.60.25.113.56577 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56576 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56575 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56574 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56573 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56572 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56571 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56570 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56569 CLOSE_WAIT

Currently, about 390 and slowly going up. This implies to me that there
is some sort of reconnect occurring that is failing.

On the client side, I think the problem is related to a Chelsio N320 10G
nic which is showing RX overflows. After showing about 40000 overflows
the ABORT was received on the server side. I've never seen a chelsio
card have overflow problems. The server is using the same model chelsio
card with no issues.

Again, any thoughts/comments are welcome!

Thanks,
John

From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 14:33:58 2012 Return-Path: Delivered-To: FS@FREEBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9C8711065672 for ; Fri, 5 Oct 2012 14:33:58 +0000 (UTC) (envelope-from freebsd@pki2.com) Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id 9F07F8FC12 for ; Fri, 5 Oct 2012 14:33:54 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q95EXdKL053295 for ; Fri, 5 Oct 2012 07:33:39 -0700 (PDT) (envelope-from freebsd@pki2.com) From: Dennis Glatting To: FS@FREEBSD.ORG Date: Fri, 05 Oct 2012 07:33:39 -0700 Message-ID: <1349447619.89356.13.camel@btw.pki2.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port X-yoursite-MailScanner-Information: Dennis Glatting X-yoursite-MailScanner-ID: q95EXdKL053295 X-yoursite-MailScanner: Found to be clean X-MailScanner-From: freebsd@pki2.com Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: under ZFS, I can reliably crash my systems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 14:33:58 -0000

I don't know if this will be useful but I have a crash dump caused by
executing a "panic" in an NMI with the kernel configured for DDB. I ran
a few commands under DDB, which looked interesting, but I haven't
figured out how to capture the output.
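[An aside for anyone in the same spot: FreeBSD's kernel debugger has a built-in output-capture facility, so a session like the one described can usually be preserved. A sketch, with exact knob names per ddb(4) and ddb(8), and the buffer size arbitrary:

    # before the crash, reserve a capture buffer
    sysctl debug.ddb.capture.bufsize=1048576

    # inside DDB, before running commands
    capture on

    # afterwards, on the live system or against a saved crash dump
    ddb capture print

A serial console with a logging terminal program is the other common way to keep a transcript of a DDB session.]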
The crash dump is here:

http://www.pki2.com:/ZFS.crash.04Oct2012

I have five ZFS systems and can reliably crash four of them -- just
place them under heavy load. The fifth system isn't used for the same
purposes. The four crashing systems are AMD systems whereas the fifth
(the one that doesn't give me grief) is Intel. On the system below I had
just rebuilt the disk-1 array a few hours earlier.

What does "crash" mean? Below is a "top" example where all of the
processes are stuck on I/O. The top was still running, so the network
was fine and the kernel alive. In other windows open to the server, the
network was fine but no command seemed to execute, such as "ps", "ls",
or "reboot" (i.e., I enter the command then nothing).

What do I need to provide you to debug this problem?

last pid: 16210; load averages: 0.02, 2.40, 17.18 up 0+00:59:57 21:51:27
85 processes: 1 running, 84 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 153M Active, 2641M Inact, 16G Wired, 26M Buf, 105G Free
ARC: 14G Total, 1940M MRU, 12G MFU, 58M Anon, 58M Header, 35M Other
Swap: 233G Total, 233G Free

 PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
2817 root 17 52 4 201M 112M tx->tx 34 68:06 0.00% pbzip2
2835 root 17 52 4 201M 111M tx->tx 39 67:18 0.00% pbzip2
2775 root 17 52 4 205M 114M tx->tx 8 66:51 0.00% pbzip2
2826 root 17 52 4 205M 115M tx->tx 40 66:20 0.00% pbzip2
2784 root 17 52 4 205M 113M tx->tx 38 66:17 0.00% pbzip2
2811 root 17 52 4 205M 116M tx->tx 19 65:56 0.00% pbzip2
2805 root 17 52 4 201M 114M tx->tx 24 65:55 0.00% pbzip2
2823 root 17 52 4 205M 118M tx->tx 57 65:55 0.00% pbzip2
2799 root 17 52 4 201M 114M tx->tx 40 65:23 0.00% pbzip2
2820 root 17 52 4 205M 116M tx->tx 55 65:19 0.00% pbzip2
2829 root 17 52 4 209M 122M tx->tx 10 65:06 0.00% pbzip2
2814 root 17 52 4 205M 114M tx->tx 40 64:54 0.00% pbzip2
2808 root 17 52 4 205M 117M tx->tx 14 64:52 0.00% pbzip2
2802 root 17 52 4 205M 116M tx->tx 13 64:51 0.00% pbzip2
2832 root 17 52 4 201M 115M tx->tx 51 64:49 0.00% pbzip2
2790 root 17 52 4 209M 119M tx->tx 53 64:38 0.00% pbzip2
2787 root 17 52 4 201M 112M tx->tx 28 64:00 0.00% pbzip2
2778 root 17 52 4 205M 117M tx->tx 50 63:51 0.00% pbzip2
2766 root 17 52 4 201M 111M tx->tx 4 63:29 0.00% pbzip2
2793 root 17 52 4 205M 118M tx->tx 6 63:24 0.00% pbzip2
2772 root 17 52 4 201M 115M tx->tx 44 62:57 0.00% pbzip2
2781 root 17 52 4 205M 116M tx->tx 55 62:13 0.00% pbzip2
2769 root 17 52 4 205M 111M tx->tx 41 62:00 0.00% pbzip2
2796 root 17 52 4 201M 118M tx->tx 46 61:51 0.00% pbzip2
2828 root 1 20 0 28648K 5444K pipewr 36 3:48 0.00% john
2777 root 1 20 0 28648K 5396K pipewr 26 3:00 0.00% john
2792 root 1 20 0 28648K 5412K pipewr 62 2:51 0.00% john
2795 root 1 20 0 28648K 5432K pipewr 44 2:40 0.00% john
2831 root 1 20 0 28648K 5440K pipewr 22 2:38 0.00% john
2807 root 1 20 0 28648K 5444K pipewr 23 2:32 0.00% john
2819 root 1 20 0 28648K 5440K pipewr 44 2:32 0.00% john
2804 root 1 20 0 28648K 5444K pipewr 46 2:31 0.00% john
2789 root 1 20 0 28648K 5392K pipewr 60 2:28 0.00% john
2786 root 1 20 0 28648K 5392K pipewr 60 2:26 0.00% john
2825 root 1 20 0 28648K 5444K pipewr 25 2:24 0.00% john

mc# uname -a
FreeBSD mc 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #3 r241040M: Tue Oct 2 22:51:12 PDT 2012 root@mc:/sys/amd64/compile/DTRACE amd64

mc# zpool status
 pool: disk-1
 state: ONLINE
 scan: none requested
config:

 NAME STATE READ WRITE CKSUM
 disk-1 ONLINE 0 0 0
   raidz2-0 ONLINE 0 0 0
     da5 ONLINE 0 0 0
     da6 ONLINE 0 0 0
     da7 ONLINE 0 0 0
     da2 ONLINE 0 0 0
     da3 ONLINE 0 0 0
     da4 ONLINE 0 0 0
 cache
   da0
ONLINE 0 0 0 errors: No known data errors pool: disk-2 state: ONLINE scan: scrub repaired 0 in 0h6m with 0 errors on Thu Oct 4 20:29:35 2012 config: NAME STATE READ WRITE CKSUM disk-2 ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 da9 ONLINE 0 0 0 da10 ONLINE 0 0 0 errors: No known data errors I do not normally get the swap space error below. I assume this is from the operating system when I rebooted from the panic. The probe errors showed up across the servers last summer after I csup them. I remember reading on one of the lists they are harmless, though it would be nice if they would go away. :) mc# dmesg Copyright (c) 1992-2012 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 9.1-PRERELEASE #3 r241040M: Tue Oct 2 22:51:12 PDT 2012 root@mc:/sys/amd64/compile/DTRACE amd64 CPU: AMD Opteron(TM) Processor 6274 (2200.07-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x600f12 Family = 0x15 Model = 0x1 Stepping = 2 Features=0x178bfbff Features2=0x1e98220b AMD Features=0x2e500800 AMD Features2=0x1c9bfff,> TSC: P-state invariant, performance statistics real memory = 137438953472 (131072 MB) avail memory = 132423389184 (126288 MB) Event timer "LAPIC" quality 400 ACPI APIC Table: <120911 APIC1027> FreeBSD/SMP: Multiprocessor System Detected: 64 CPUs FreeBSD/SMP: 4 package(s) x 16 core(s) cpu0 (BSP): APIC ID: 32 cpu1 (AP): APIC ID: 33 cpu2 (AP): APIC ID: 34 cpu3 (AP): APIC ID: 35 cpu4 (AP): APIC ID: 36 cpu5 (AP): APIC ID: 37 cpu6 (AP): APIC ID: 38 cpu7 (AP): APIC ID: 39 cpu8 (AP): APIC ID: 40 cpu9 (AP): APIC ID: 41 cpu10 (AP): APIC ID: 42 cpu11 (AP): APIC ID: 43 cpu12 (AP): APIC ID: 44 cpu13 (AP): APIC ID: 45 cpu14 (AP): APIC ID: 46 cpu15 (AP): APIC ID: 47 cpu16 (AP): APIC ID: 64 cpu17 (AP): APIC ID: 65 cpu18 (AP): APIC ID: 66 cpu19 (AP): APIC ID: 67 cpu20 (AP): APIC ID: 68 cpu21 (AP): APIC ID: 69 cpu22 (AP): APIC ID: 70 cpu23 (AP): APIC ID: 71 cpu24 (AP): APIC ID: 72 cpu25 (AP): APIC ID: 73 cpu26 (AP): APIC ID: 74 cpu27 (AP): APIC ID: 75 cpu28 (AP): APIC ID: 76 cpu29 (AP): APIC ID: 77 cpu30 (AP): APIC ID: 78 cpu31 (AP): APIC ID: 79 cpu32 (AP): APIC ID: 96 cpu33 (AP): APIC ID: 97 cpu34 (AP): APIC ID: 98 cpu35 (AP): APIC ID: 99 cpu36 (AP): APIC ID: 100 cpu37 (AP): APIC ID: 101 cpu38 (AP): APIC ID: 102 cpu39 (AP): APIC ID: 103 cpu40 (AP): APIC ID: 104 cpu41 (AP): APIC ID: 105 cpu42 (AP): APIC ID: 106 cpu43 (AP): APIC ID: 107 cpu44 (AP): APIC ID: 108 cpu45 (AP): APIC ID: 109 cpu46 (AP): APIC ID: 110 cpu47 (AP): APIC ID: 111 cpu48 (AP): APIC ID: 128 cpu49 (AP): APIC ID: 129 cpu50 (AP): APIC ID: 130 cpu51 (AP): APIC ID: 131 cpu52 (AP): APIC ID: 132 cpu53 (AP): APIC ID: 133 cpu54 (AP): APIC ID: 134 cpu55 (AP): APIC ID: 135 cpu56 (AP): APIC ID: 136 cpu57 (AP): APIC ID: 137 cpu58 (AP): APIC ID: 138 cpu59 (AP): APIC ID: 139 cpu60 (AP): APIC ID: 140 cpu61 (AP): APIC ID: 141 cpu62 (AP): APIC ID: 142 cpu63 (AP): APIC ID: 143 ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20110527/tbfadt-583) ioapic0 irqs 0-23 on motherboard ioapic1 irqs 24-55 on motherboard kbd1 at kbdmux0 module_register_init: MOD_LOAD (vesa, 0xffffffff80bcb500, 0) error 19 ctl: CAM Target Layer loaded cryptosoft0: on motherboard aesni0: on motherboard acpi0: <120911 XSDT1027> on motherboard acpi0: Power Button (fixed) acpi0: reservation of fec00000, 1000 (3) failed acpi0: reservation of fee00000, 1000 (3) failed acpi0: 
reservation of ffb80000, 80000 (3) failed acpi0: reservation of fec10000, 20 (3) failed acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, dff00000 (3) failed cpu0: on acpi0 cpu1: on acpi0 cpu2: on acpi0 cpu3: on acpi0 cpu4: on acpi0 cpu5: on acpi0 cpu6: on acpi0 cpu7: on acpi0 cpu8: on acpi0 cpu9: on acpi0 cpu10: on acpi0 cpu11: on acpi0 cpu12: on acpi0 cpu13: on acpi0 cpu14: on acpi0 cpu15: on acpi0 cpu16: on acpi0 cpu17: on acpi0 cpu18: on acpi0 cpu19: on acpi0 cpu20: on acpi0 cpu21: on acpi0 cpu22: on acpi0 cpu23: on acpi0 cpu24: on acpi0 cpu25: on acpi0 cpu26: on acpi0 cpu27: on acpi0 cpu28: on acpi0 cpu29: on acpi0 cpu30: on acpi0 cpu31: on acpi0 cpu32: on acpi0 cpu33: on acpi0 cpu34: on acpi0 cpu35: on acpi0 cpu36: on acpi0 cpu37: on acpi0 cpu38: on acpi0 cpu39: on acpi0 cpu40: on acpi0 cpu41: on acpi0 cpu42: on acpi0 cpu43: on acpi0 cpu44: on acpi0 cpu45: on acpi0 cpu46: on acpi0 cpu47: on acpi0 cpu48: on acpi0 cpu49: on acpi0 cpu50: on acpi0 cpu51: on acpi0 cpu52: on acpi0 cpu53: on acpi0 cpu54: on acpi0 cpu55: on acpi0 cpu56: on acpi0 cpu57: on acpi0 cpu58: on acpi0 cpu59: on acpi0 cpu60: on acpi0 cpu61: on acpi0 cpu62: on acpi0 cpu63: on acpi0 attimer0: port 0x40-0x43 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 atrtc0: port 0x70-0x71 irq 8 on acpi0 Event timer "RTC" frequency 32768 Hz quality 0 hpet0: iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Timecounter "ACPI-safe" frequency 3579545 Hz quality 850 acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pci0: at device 0.2 (no driver attached) pcib1: irq 52 at device 4.0 on pci0 pci5: on pcib1 igb0: port 0xe400-0xe41f mem 0xfeb60000-0xfeb7ffff,0xfeb40000-0xfeb5ffff,0xfeb1c000-0xfeb1ffff irq 44 at device 0.0 on pci5 igb0: Using MSIX interrupts with 9 vectors igb0: Ethernet address: 00:e0:81:c8:ee:8a igb0: Bound queue 0 to cpu 0 igb0: Bound queue 1 to cpu 1 igb0: Bound queue 2 to cpu 2 igb0: Bound queue 3 to cpu 3 igb0: Bound queue 4 to cpu 4 igb0: Bound queue 5 to cpu 5 igb0: Bound queue 6 to cpu 6 igb0: Bound queue 7 to cpu 7 igb1: port 0xe800-0xe81f mem 0xfebe0000-0xfebfffff,0xfebc0000-0xfebdffff,0xfeb9c000-0xfeb9ffff irq 45 at device 0.1 on pci5 igb1: Using MSIX interrupts with 9 vectors igb1: Ethernet address: 00:e0:81:c8:ee:8b igb1: Bound queue 0 to cpu 8 igb1: Bound queue 1 to cpu 9 igb1: Bound queue 2 to cpu 10 igb1: Bound queue 3 to cpu 11 igb1: Bound queue 4 to cpu 12 igb1: Bound queue 5 to cpu 13 igb1: Bound queue 6 to cpu 14 igb1: Bound queue 7 to cpu 15 pcib2: irq 53 at device 9.0 on pci0 pci4: on pcib2 em0: port 0xd800-0xd81f mem 0xfeae0000-0xfeafffff,0xfeadc000-0xfeadffff irq 48 at device 0.0 on pci4 em0: Using MSIX interrupts with 3 vectors em0: Ethernet address: 00:e0:81:c8:ee:ff pcib3: irq 54 at device 11.0 on pci0 pci3: on pcib3 mps0: port 0xc000-0xc0ff mem 0xfe93c000-0xfe93ffff,0xfe940000-0xfe97ffff irq 32 at device 0.0 on pci3 mps0: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd mps0: IOCCapabilities: 1285c pcib4: irq 54 at device 13.0 on pci0 pci2: on pcib4 mps1: port 0xb000-0xb0ff mem 0xfe83c000-0xfe83ffff,0xfe840000-0xfe87ffff irq 40 at device 0.0 on pci2 mps1: Firmware: 14.00.00.00, Driver: 14.00.00.01-fbsd mps1: IOCCapabilities: 185c ahci0: port 0x9000-0x9007,0x8000-0x8003,0x7000-0x7007,0x6000-0x6003,0x5000-0x500f mem 0xfdefe400-0xfdefe7ff irq 22 at device 17.0 on pci0 ahci0: AHCI v1.10 with 4 3Gbps 
ports, Port Multiplier supported ahcich0: at channel 0 on ahci0 ahcich1: at channel 1 on ahci0 ahcich2: at channel 2 on ahci0 ahcich3: at channel 3 on ahci0 ohci0: mem 0xfdefa000-0xfdefafff irq 16 at device 18.0 on pci0 usbus0 on ohci0 ohci1: mem 0xfdefb000-0xfdefbfff irq 16 at device 18.1 on pci0 usbus1 on ohci1 ehci0: mem 0xfdefe800-0xfdefe8ff irq 17 at device 18.2 on pci0 usbus2: EHCI version 1.0 usbus2 on ehci0 ohci2: mem 0xfdefc000-0xfdefcfff irq 18 at device 19.0 on pci0 usbus3 on ohci2 ohci3: mem 0xfdefd000-0xfdefdfff irq 18 at device 19.1 on pci0 usbus4 on ohci3 ehci1: mem 0xfdefec00-0xfdefecff irq 19 at device 19.2 on pci0 usbus5: EHCI version 1.0 usbus5 on ehci1 pci0: at device 20.0 (no driver attached) atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 ata0: at channel 0 on atapci0 ata1: at channel 1 on atapci0 isab0: at device 20.3 on pci0 isa0: on isab0 pcib5: at device 20.4 on pci0 pci1: on pcib5 vgapci0: port 0xa800-0xa87f mem 0xfe000000-0xfe7fffff,0xfdfe0000-0xfdffffff irq 23 at device 9.0 on pci1 ohci4: mem 0xfdeff000-0xfdefffff irq 18 at device 20.5 on pci0 usbus6 on ohci4 acpi_button0: on acpi0 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart1: port 0x2f8-0x2ff irq 3 on acpi0 ppc0: port 0x378-0x37f irq 7 on acpi0 ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode ppbus0: on ppc0 plip0: on ppbus0 lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 orm0: at iomem 0xc0000-0xc7fff on isa0 sc0: at flags 0x100 on isa0 sc0: CGA <16 virtual consoles, flags=0x300> vga0: at port 0x3d0-0x3db iomem 0xb8000-0xbffff on isa0 atkbdc0: at port 0x60,0x64 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] acpi_throttle0: on cpu0 Timecounters tick every 1.000 msec usbus0: 12Mbps Full Speed USB v1.0 usbus1: 12Mbps Full Speed USB v1.0 usbus2: 480Mbps High Speed USB v2.0 usbus3: 12Mbps Full Speed USB v1.0 usbus4: 12Mbps Full Speed USB v1.0 usbus5: 480Mbps High Speed USB v2.0 usbus6: 12Mbps Full Speed USB v1.0 ugen0.1: at usbus0 uhub0: on usbus0 ugen1.1: at usbus1 uhub1: on usbus1 ugen2.1: at usbus2 uhub2: on usbus2 ugen3.1: at usbus3 uhub3: on usbus3 ugen4.1: at usbus4 uhub4: on usbus4 ugen5.1: at usbus5 uhub5: on usbus5 ugen6.1: at usbus6 uhub6: on usbus6 uhub6: 2 ports with 2 removable, self powered uhub0: 3 ports with 3 removable, self powered uhub1: 3 ports with 3 removable, self powered uhub3: 3 ports with 3 removable, self powered uhub4: 3 ports with 3 removable, self powered uhub2: 6 ports with 6 removable, self powered uhub5: 6 ports with 6 removable, self powered ugen5.2: at usbus5 uhub7: on usbus5 (probe257:mps1:0:2:0): INQUIRY. CDB: 12 0 0 0 24 0 (probe257:mps1:0:2:0): CAM status: Invalid Target ID (probe257:mps1:0:2:0): Error 22, Unretryable error (probe258:mps1:0:3:0): INQUIRY. CDB: 12 0 0 0 24 0 (probe258:mps1:0:3:0): CAM status: Invalid Target ID (probe258:mps1:0:3:0): Error 22, Unretryable error (probe0:mps1:0:2:1): INQUIRY. CDB: 12 20 0 0 24 0 (probe0:mps1:0:2:1): CAM status: Invalid Target ID (probe0:mps1:0:2:1): Error 22, Unretryable error (probe1:mps1:0:3:1): INQUIRY. CDB: 12 20 0 0 24 0 (probe1:mps1:0:3:1): CAM status: Invalid Target ID (probe1:mps1:0:3:1): Error 22, Unretryable error (probe0:mps1:0:2:2): INQUIRY. CDB: 12 40 0 0 24 0 (probe0:mps1:0:2:2): CAM status: Invalid Target ID (probe0:mps1:0:2:2): Error 22, Unretryable error (probe1:mps1:0:3:2): INQUIRY. 
CDB: 12 40 0 0 24 0 (probe1:mps1:0:3:2): CAM status: Invalid Target ID (probe1:mps1:0:3:2): Error 22, Unretryable error (probe0:mps1:0:2:3): INQUIRY. CDB: 12 60 0 0 24 0 (probe0:mps1:0:2:3): CAM status: Invalid Target ID (probe0:mps1:0:2:3): Error 22, Unretryable error (probe1:mps1:0:3:3): INQUIRY. CDB: 12 60 0 0 24 0 (probe1:mps1:0:3:3): CAM status: Invalid Target ID (probe1:mps1:0:3:3): Error 22, Unretryable error (probe0:mps1:0:2:4): INQUIRY. CDB: 12 80 0 0 24 0 (probe0:mps1:0:2:4): CAM status: Invalid Target ID (probe0:mps1:0:2:4): Error 22, Unretryable error (probe1:mps1:0:3:4): INQUIRY. CDB: 12 80 0 0 24 0 (probe1:mps1:0:3:4): CAM status: Invalid Target ID (probe1:mps1:0:3:4): Error 22, Unretryable error (probe0:mps1:0:2:5): INQUIRY. CDB: 12 a0 0 0 24 0 (probe0:mps1:0:2:5): CAM status: Invalid Target ID (probe0:mps1:0:2:5): Error 22, Unretryable error (probe1:mps1:0:3:5): INQUIRY. CDB: 12 a0 0 0 24 0 (probe1:mps1:0:3:5): CAM status: Invalid Target ID (probe1:mps1:0:3:5): Error 22, Unretryable error (probe0:mps1:0:2:6): INQUIRY. CDB: 12 c0 0 0 24 0 (probe0:mps1:0:2:6): CAM status: Invalid Target ID (probe0:mps1:0:2:6): Error 22, Unretryable error (probe1:mps1:0:3:6): INQUIRY. CDB: 12 c0 0 0 24 0 (probe1:mps1:0:3:6): CAM status: Invalid Target ID (probe1:mps1:0:3:6): Error 22, Unretryable error (probe0:mps1:0:2:7): INQUIRY. CDB: 12 e0 0 0 24 0 (probe0:mps1:0:2:7): CAM status: Invalid Target ID (probe0:mps1:0:2:7): Error 22, Unretryable error (probe1:mps1:0:3:7): INQUIRY. CDB: 12 e0 0 0 24 0 (probe1:mps1:0:3:7): CAM status: Invalid Target ID (probe1:mps1:0:3:7): Error 22, Unretryable error (probe255:mps1:0:0:0): REPORT LUNS. CDB: a0 0 0 0 0 0 0 0 0 10 0 0 (probe255:mps1:0:0:0): CAM status: SCSI Status Error (probe255:mps1:0:0:0): SCSI status: Check Condition (probe255:mps1:0:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code) (probe255:mps1:0:0:0): Command Specific Info: 0x2c4803c0 (probe255:mps1:0:0:0): Error 22, Unretryable error (probe0:mps1:0:3:0): INQUIRY. CDB: 12 0 0 0 24 0 (probe0:mps1:0:3:0): CAM status: Invalid Target ID (probe0:mps1:0:3:0): Error 22, Unretryable error (probe1:mps1:0:2:0): INQUIRY. CDB: 12 0 0 0 24 0 (probe1:mps1:0:2:0): CAM status: Invalid Target ID (probe1:mps1:0:2:0): Error 22, Unretryable error (probe0:mps1:0:3:0): INQUIRY. CDB: 12 0 0 0 24 0 (probe0:mps1:0:3:0): CAM status: Invalid Target ID (probe0:mps1:0:3:0): Error 22, Unretryable error (probe1:mps1:0:2:0): INQUIRY. CDB: 12 0 0 0 24 0 (probe1:mps1:0:2:0): CAM status: Invalid Target ID (probe1:mps1:0:2:0): Error 22, Unretryable error (probe2:mps1:0:3:1): INQUIRY. CDB: 12 20 0 0 24 0 (probe2:mps1:0:3:1): CAM status: Invalid Target ID (probe2:mps1:0:3:1): Error 22, Unretryable error (probe0:mps1:0:2:1): INQUIRY. CDB: 12 20 0 0 24 0 (probe0:mps1:0:2:1): CAM status: Invalid Target ID (probe0:mps1:0:2:1): Error 22, Unretryable error (probe1:mps1:0:3:1): INQUIRY. CDB: 12 20 0 0 24 0 (probe1:mps1:0:3:1): CAM status: Invalid Target ID (probe1:mps1:0:3:1): Error 22, Unretryable error (probe0:mps1:0:2:1): INQUIRY. CDB: 12 20 0 0 24 0 (probe0:mps1:0:2:1): CAM status: Invalid Target ID (probe0:mps1:0:2:1): Error 22, Unretryable error (probe2:mps1:0:3:2): INQUIRY. CDB: 12 40 0 0 24 0 (probe2:mps1:0:3:2): CAM status: Invalid Target ID (probe2:mps1:0:3:2): Error 22, Unretryable error (probe1:mps1:0:2:2): INQUIRY. CDB: 12 40 0 0 24 0 (probe1:mps1:0:2:2): CAM status: Invalid Target ID (probe1:mps1:0:2:2): Error 22, Unretryable error (probe0:mps1:0:3:2): INQUIRY. 
CDB: 12 40 0 0 24 0 (probe0:mps1:0:3:2): CAM status: Invalid Target ID (probe0:mps1:0:3:2): Error 22, Unretryable error (probe1:mps1:0:2:2): INQUIRY. CDB: 12 40 0 0 24 0 (probe1:mps1:0:2:2): CAM status: Invalid Target ID (probe1:mps1:0:2:2): Error 22, Unretryable error (probe2:mps1:0:3:3): INQUIRY. CDB: 12 60 0 0 24 0 (probe2:mps1:0:3:3): CAM status: Invalid Target ID (probe2:mps1:0:3:3): Error 22, Unretryable error (probe0:mps1:0:2:3): INQUIRY. CDB: 12 60 0 0 24 0 (probe0:mps1:0:2:3): CAM status: Invalid Target ID (probe0:mps1:0:2:3): Error 22, Unretryable error (probe1:mps1:0:3:3): INQUIRY. CDB: 12 60 0 0 24 0 (probe1:mps1:0:3:3): CAM status: Invalid Target ID (probe1:mps1:0:3:3): Error 22, Unretryable error (probe0:mps1:0:2:3): INQUIRY. CDB: 12 60 0 0 24 0 (probe0:mps1:0:2:3): CAM status: Invalid Target ID (probe0:mps1:0:2:3): Error 22, Unretryable error (probe2:mps1:0:3:4): INQUIRY. CDB: 12 80 0 0 24 0 (probe2:mps1:0:3:4): CAM status: Invalid Target ID (probe2:mps1:0:3:4): Error 22, Unretryable error (probe1:mps1:0:2:4): INQUIRY. CDB: 12 80 0 0 24 0 (probe1:mps1:0:2:4): CAM status: Invalid Target ID (probe1:mps1:0:2:4): Error 22, Unretryable error (probe0:mps1:0:3:4): INQUIRY. CDB: 12 80 0 0 24 0 (probe0:mps1:0:3:4): CAM status: Invalid Target ID (probe0:mps1:0:3:4): Error 22, Unretryable error (probe1:mps1:0:2:4): INQUIRY. CDB: 12 80 0 0 24 0 (probe1:mps1:0:2:4): CAM status: Invalid Target ID (probe1:mps1:0:2:4): Error 22, Unretryable error (probe2:mps1:0:3:5): INQUIRY. CDB: 12 a0 0 0 24 0 (probe2:mps1:0:3:5): CAM status: Invalid Target ID (probe2:mps1:0:3:5): Error 22, Unretryable error (probe0:mps1:0:2:5): INQUIRY. CDB: 12 a0 0 0 24 0 (probe0:mps1:0:2:5): CAM status: Invalid Target ID (probe0:mps1:0:2:5): Error 22, Unretryable error (probe1:mps1:0:3:5): INQUIRY. CDB: 12 a0 0 0 24 0 (probe1:mps1:0:3:5): CAM status: Invalid Target ID (probe1:mps1:0:3:5): Error 22, Unretryable error (probe0:mps1:0:2:5): INQUIRY. CDB: 12 a0 0 0 24 0 (probe0:mps1:0:2:5): CAM status: Invalid Target ID (probe0:mps1:0:2:5): Error 22, Unretryable error (probe2:mps1:0:3:6): INQUIRY. CDB: 12 c0 0 0 24 0 (probe2:mps1:0:3:6): CAM status: Invalid Target ID (probe2:mps1:0:3:6): Error 22, Unretryable error (probe1:mps1:0:2:6): INQUIRY. CDB: 12 c0 0 0 24 0 (probe1:mps1:0:2:6): CAM status: Invalid Target ID (probe1:mps1:0:2:6): Error 22, Unretryable error (probe0:mps1:0:3:6): INQUIRY. CDB: 12 c0 0 0 24 0 (probe0:mps1:0:3:6): CAM status: Invalid Target ID (probe0:mps1:0:3:6): Error 22, Unretryable error (probe1:mps1:0:2:6): INQUIRY. CDB: 12 c0 0 0 24 0 (probe1:mps1:0:2:6): CAM status: Invalid Target ID (probe1:mps1:0:2:6): Error 22, Unretryable error (probe2:mps1:0:3:7): INQUIRY. CDB: 12 e0 0 0 24 0 (probe2:mps1:0:3:7): CAM status: Invalid Target ID (probe2:mps1:0:3:7): Error 22, Unretryable error (probe0:mps1:0:2:7): INQUIRY. CDB: 12 e0 0 0 24 0 (probe0:mps1:0:2:7): CAM status: Invalid Target ID (probe0:mps1:0:2:7): Error 22, Unretryable error (probe1:mps1:0:3:7): INQUIRY. CDB: 12 e0 0 0 24 0 (probe1:mps1:0:3:7): CAM status: Invalid Target ID (probe1:mps1:0:3:7): Error 22, Unretryable error (probe0:mps1:0:2:7): INQUIRY. CDB: 12 e0 0 0 24 0 (probe0:mps1:0:2:7): CAM status: Invalid Target ID (probe0:mps1:0:2:7): Error 22, Unretryable error uhub7: 3 ports with 3 removable, self powered (probe255:mps1:0:0:0): REPORT LUNS. 
CDB: a0 0 0 0 0 0 0 0 0 10 0 0 (probe255:mps1:0:0:0): CAM status: SCSI Status Error (probe255:mps1:0:0:0): SCSI status: Check Condition (probe255:mps1:0:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code) (probe255:mps1:0:0:0): Field Replaceable Unit: 56 (probe255:mps1:0:0:0): Command Specific Info: 0x33353839 (probe255:mps1:0:0:0): Error 22, Unretryable error da0 at mps0 bus 0 scbus0 target 3 lun 0 da0: Fixed Direct Access SCSI-6 device da0: 600.000MB/s transfers da0: Command Queueing enabled da0: 244198MB (500118192 512 byte sectors: 255H 63S/T 31130C) da1 at mps0 bus 0 scbus0 target 5 lun 0 da1: Fixed Direct Access SCSI-6 device da1: 600.000MB/s transfers da1: Command Queueing enabled da1: 238475MB (488397168 512 byte sectors: 255H 63S/T 30401C) da8 at mps1 bus 0 scbus1 target 0 lun 0 da8: Fixed Direct Access SCSI-6 device da8: 150.000MB/s transfers da8: Command Queueing enabled da8: 68664MB (140623872 512 byte sectors: 255H 63S/T 8753C) da5 at mps0 bus 0 scbus0 target 9 lun 0 da5: Fixed Direct Access SCSI-6 device da5: 600.000MB/s transfers da5: Command Queueing enabled da5: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C) da7 at mps0 bus 0 scbus0 target 11 lun 0 da7: Fixed Direct Access SCSI-6 device da7: 600.000MB/s transfers da7: Command Queueing enabled da7: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C) da6 at mps0 bus 0 scbus0 target 10 lun 0 da6: Fixed Direct Access SCSI-6 device da6: 600.000MB/s transfers da6: Command Queueing enabled da6: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C) SMP: AP CPU #1 Launched! SMP: AP CPU #47 Launched! cd0 at ahcich0 bus 0 scbus2 target 0 lun 0 SMP: AP CPU #44 Launched! cd0: Removable CD-ROM SCSI-0 device cd0: 150.000MB/s transfers (SATA 1.x, UDMA6, ATAPI 12bytes, PIO 8192bytes) cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed SMP: AP CPU #57 Launched! da3 at mps0 bus 0 scbus0 target 7 lun 0 da3: Fixed Direct Access SCSI-6 device da3: 600.000MB/s transfers SMP: AP CPU #56 Launched! da3: Command Queueing enabled da3: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C) SMP: AP CPU #39 Launched! da2 at mps0 bus 0 scbus0 target 6 lun 0 da2: Fixed Direct Access SCSI-6 device da2: 600.000MB/s transfers SMP: AP CPU #38 Launched! da2: Command Queueing enabled da2: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C) SMP: AP CPU #33 Launched! SMP: AP CPU #32 Launched! SMP: AP CPU #31 Launched! SMP: AP CPU #30 Launched! da4 at mps0 bus 0 scbus0 target 8 lun 0 da4: Fixed Direct Access SCSI-6 device SMP: AP CPU #35 Launched! da4: 600.000MB/s transfers da4: Command Queueing enabled da4: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C) SMP: AP CPU #34 Launched! SMP: AP CPU #36 Launched! SMP: AP CPU #41 Launched! SMP: AP CPU #42 Launched! SMP: AP CPU #37 Launched! SMP: AP CPU #25 Launched! da9 at mps1 bus 0 scbus1 target 9 lun 0 SMP: AP CPU #59 Launched! da9: Fixed Direct Access SCSI-6 device da9: 600.000MB/s transfers da9: Command Queueing enabled SMP: AP CPU #58 Launched! da9: 3815447MB (7814037168 512 byte sectors: 255H 63S/T 486401C) da10 at mps1 bus 0 scbus1 target 11 lun 0 da10: Fixed Direct Access SCSI-6 device SMP: AP CPU #24 Launched! da10: 600.000MB/s transfers da10: Command Queueing enabled da10: 3815447MB (7814037168 512 byte sectors: 255H 63S/T 486401C) SMP: AP CPU #2 Launched! SMP: AP CPU #29 Launched! SMP: AP CPU #26 Launched! SMP: AP CPU #63 Launched! SMP: AP CPU #28 Launched! SMP: AP CPU #60 Launched! 
SMP: AP CPU #27 Launched! SMP: AP CPU #5 Launched! SMP: AP CPU #6 Launched! SMP: AP CPU #8 Launched! SMP: AP CPU #7 Launched! SMP: AP CPU #13 Launched! SMP: AP CPU #52 Launched! SMP: AP CPU #49 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #50 Launched! SMP: AP CPU #48 Launched! SMP: AP CPU #4 Launched! SMP: AP CPU #61 Launched! SMP: AP CPU #51 Launched! SMP: AP CPU #15 Launched! SMP: AP CPU #10 Launched! SMP: AP CPU #17 Launched! SMP: AP CPU #18 Launched! SMP: AP CPU #19 Launched! SMP: AP CPU #62 Launched! SMP: AP CPU #11 Launched! SMP: AP CPU #53 Launched! SMP: AP CPU #20 Launched! SMP: AP CPU #55 Launched! SMP: AP CPU #23 Launched! SMP: AP CPU #16 Launched! SMP: AP CPU #9 Launched! SMP: AP CPU #14 Launched! SMP: AP CPU #54 Launched! SMP: AP CPU #22 Launched! SMP: AP CPU #12 Launched! SMP: AP CPU #21 Launched! SMP: AP CPU #45 Launched! SMP: AP CPU #40 Launched! SMP: AP CPU #46 Launched! SMP: AP CPU #43 Launched! Timecounter "TSC-low" frequency 8594016 Hz quality 800 ugen5.3: at usbus5 ukbd0: on usbus5 kbd2 at ukbd0 ums0: on usbus5 ums0: 3 buttons and [Z] coordinates ID=0 Trying to mount root from ufs:/dev/gpt/disk0 [rw]... WARNING: / was not properly dismounted ugen3.2: at usbus3 ukbd1: on usbus3 kbd3 at ukbd1 uhid0: on usbus3 ZFS filesystem version 5 ZFS storage pool version 28 swap_pager: out of swap space swap_pager_getswapspace(16): failed pid 1847 (fstat), uid 0, was killed: out of swap space From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 14:48:42 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 70304106564A for ; Fri, 5 Oct 2012 14:48:42 +0000 (UTC) (envelope-from freebsd@pki2.com) Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id 26E638FC0A for ; Fri, 5 Oct 2012 14:48:42 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q95EmXIH058668 for ; Fri, 5 Oct 2012 07:48:33 -0700 (PDT) (envelope-from freebsd@pki2.com) From: Dennis Glatting To: fs@freebsd.org Content-Type: text/plain; charset="ISO-8859-1" Date: Fri, 05 Oct 2012 07:48:33 -0700 Message-ID: <1349448513.89356.15.camel@btw.pki2.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-yoursite-MailScanner-Information: Dennis Glatting X-yoursite-MailScanner-ID: q95EmXIH058668 X-yoursite-MailScanner: Found to be clean X-MailScanner-From: freebsd@pki2.com Cc: Subject: Also: under ZFS, I can reliably crash my systems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 14:48:42 -0000 I forgot to mention an important point based on an email thread I read yesterday. My four AMD systems are NFS clients of the Intel system. In theory there should not be any NFS traffic for the tasks I am executing but operating systems do what operating systems do. 
:) From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 15:48:04 2012 Return-Path: Delivered-To: FS@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3A94E1065672 for ; Fri, 5 Oct 2012 15:48:04 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-oa0-f54.google.com (mail-oa0-f54.google.com [209.85.219.54]) by mx1.freebsd.org (Postfix) with ESMTP id E76B98FC08 for ; Fri, 5 Oct 2012 15:48:00 +0000 (UTC) Received: by mail-oa0-f54.google.com with SMTP id n9so2533779oag.13 for ; Fri, 05 Oct 2012 08:48:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=m582WqDcV1jrd5RRZ4TecZgWOsgGo+Ekgn7muGXfkzY=; b=YbZscpQ5UulHpXiHxLpWNldqj/HGopJqkAcS1qFcImtOC9KoUyO/1P5DmzEVwORbRk jIg4vHwgs5ibiSSiU5HvVp/B/+ybg0W9avBbFVi6P+m+7k+4VA6HWczzaeHzQwP28dYa XuTyvfAqQsZ+nNFGO+nFCSshmnkJVHLfs8UyjaQ51eswpgBdgZLRioB9GwUNjlFZic8S AvjsMTVPMYYcs3cn1rsVXIJ6APbaT8kUwhMMGNqOO3OKCl5GfhXl4oetsE3S3AM4+jn+ JuZkGsKdiironklAGlLXkwvaEQJgPRm7rewWa9QYq0FtcfKbczF238AQYDAigHW2C0Gy 7FJg== MIME-Version: 1.0 Received: by 10.182.131.37 with SMTP id oj5mr7296936obb.54.1349452080006; Fri, 05 Oct 2012 08:48:00 -0700 (PDT) Received: by 10.76.12.202 with HTTP; Fri, 5 Oct 2012 08:47:59 -0700 (PDT) In-Reply-To: <1349447619.89356.13.camel@btw.pki2.com> References: <1349447619.89356.13.camel@btw.pki2.com> Date: Fri, 5 Oct 2012 10:47:59 -0500 Message-ID: From: Adam Vande More To: Dennis Glatting Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: FS@freebsd.org Subject: Re: under ZFS, I can reliably crash my systems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 15:48:04 -0000 On Fri, Oct 5, 2012 at 9:33 AM, Dennis Glatting wrote: > I do not normally get the swap space error below. I assume this is from > the operating system when I rebooted from the panic. > > swap_pager: out of swap space > swap_pager_getswapspace(16): failed > pid 1847 (fstat), uid 0, was killed: out of swap space > What does swapinfo show? 
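For anyone following along, the checks behind that question are roughly these (a sketch using standard FreeBSD tools; output varies by machine):

  swapinfo -h        # swap devices, usage, and capacity, human-readable
  top -b -o res 10   # one batch-mode snapshot of the ten largest resident processes

swapinfo(8) showing swap nearly unused next to "out of swap space" kills usually points at a brief allocation spike or a runaway process rather than steady-state memory pressure.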
-- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 16:02:30 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 92DCE1065672; Fri, 5 Oct 2012 16:02:30 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-vb0-f54.google.com (mail-vb0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id 1AA148FC0A; Fri, 5 Oct 2012 16:02:28 +0000 (UTC) Received: by mail-vb0-f54.google.com with SMTP id v11so2635336vbm.13 for ; Fri, 05 Oct 2012 09:02:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=references:in-reply-to:mime-version:from:date:message-id:subject:to :cc:content-type; bh=D8ylqgSieQB4zVIYRx4Gk4fBmxvBVDxzcgfue3L+PWY=; b=iliS7tlgekd8tRqmUCf2lBiysgLc6PHLLd4T5surKYWzm8CSmGby6yMqvp+FTHBtp8 ozASn1kHXS3bd2HFj44/PJ+GgfBgtYxYcOu8PvAa9ziddjQnoffg9FiJqsqo2COXkZyB yE2a7dFHn/zx43YT3BiSNrH7hHAeo2ay5FT8upQ/uOl2EsAu3D+/4AzCgx+2mzGTQgHe gxyDmYLIUGll1dNSDMyYZrRPcs6LRNqBX4yb2LjXg5EinnEn3LdhpdI4Y4dRBXXFj8B0 vmO3D/ZgotmOtki2Qzt+P43J9ECUhkrya2hoV6LmX7xrqWjWOVvusoUgpiVvQ/gQpGXy qpRA== Received: by 10.58.144.232 with SMTP id sp8mr5624182veb.56.1349452947242; Fri, 05 Oct 2012 09:02:27 -0700 (PDT) References: <906543F2-96BD-4519-B693-FD5AFB646F87@gmail.com> <506BF372.1090208@FreeBSD.org> <506C4049.4040100@FreeBSD.org> <506D81A7.8030506@FreeBSD.org> <506DB5DB.7080302@FreeBSD.org> <506ED16C.7000207@FreeBSD.org> In-Reply-To: <506ED16C.7000207@FreeBSD.org> Mime-Version: 1.0 (1.0) From: Nikolay Denev Date: Fri, 5 Oct 2012 19:02:24 +0300 Message-ID: <7423226525986478629@unknownmsgid> To: Andriy Gapon Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs Subject: Re: nfs + zfs hangs on RELENG_9 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 16:02:30 -0000 Applied and running. Will report if there are issues. Sent from my iPhone On 05.10.2012, at 15:24, Andriy Gapon wrote: > on 04/10/2012 19:14 Andriy Gapon said the following: >> BTW, one thing to note here is that the lowmem hook was invoked because of KVA >> space shortage, not because of page shortage. >> >> From practical point of view this may mean that having sufficient KVA size may >> help to not run into this deadlock. >> >> From programming point of view I am tempted to let arc_lowmem block only if >> curproc == pageproc. That should both handle the case where blocking is most >> needed and should prevent the deadlock described above. > > A possible patch: > > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int howto __unused) > mutex_enter(&arc_reclaim_thr_lock); > needfree = 1; > cv_signal(&arc_reclaim_thr_cv); > - while (needfree) > - msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0); > + > + /* > + * It is unsafe to block here in arbitrary threads, because we can come > + * here from ARC itself and may hold ARC locks and thus risk a deadlock > + * with ARC reclaim thread. 
> + */ > + if (curproc == pageproc) { > + while (needfree) > + msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0); > + } > mutex_exit(&arc_reclaim_thr_lock); > mutex_exit(&arc_lowmem_lock); > } > > -- > Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 16:09:42 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E9AB61065670 for ; Fri, 5 Oct 2012 16:09:42 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 2DD248FC26 for ; Fri, 5 Oct 2012 16:09:41 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA00772; Fri, 05 Oct 2012 19:09:36 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <506F063F.8050408@FreeBSD.org> Date: Fri, 05 Oct 2012 19:09:35 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Dennis Glatting References: <1349447619.89356.13.camel@btw.pki2.com> In-Reply-To: <1349447619.89356.13.camel@btw.pki2.com> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs Subject: Re: under ZFS, I can reliably crash my systems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 16:09:43 -0000 on 05/10/2012 17:33 Dennis Glatting said the following: > swap_pager: out of swap space > swap_pager_getswapspace(16): failed > pid 1847 (fstat), uid 0, was killed: out of swap space One thing I can tell you, your kernel and userland are out of sync. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 17:42:31 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 69AE71065670; Fri, 5 Oct 2012 17:42:31 +0000 (UTC) (envelope-from fullermd@over-yonder.net) Received: from thyme.infocus-llc.com (server.infocus-llc.com [206.156.254.44]) by mx1.freebsd.org (Postfix) with ESMTP id 38B1D8FC12; Fri, 5 Oct 2012 17:42:30 +0000 (UTC) Received: from draco.over-yonder.net (c-174-50-4-38.hsd1.ms.comcast.net [174.50.4.38]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by thyme.infocus-llc.com (Postfix) with ESMTPSA id 94DFC37B4A6; Fri, 5 Oct 2012 12:42:24 -0500 (CDT) Received: by draco.over-yonder.net (Postfix, from userid 100) id 3XYJFW6CdnzJ9d; Fri, 5 Oct 2012 12:42:23 -0500 (CDT) Date: Fri, 5 Oct 2012 12:42:23 -0500 From: "Matthew D. Fuller" To: Pawel Jakub Dawidek Message-ID: <20121005174223.GU71113@over-yonder.net> References: <505DE715.8020806@FreeBSD.org> <506C50F1.40805@FreeBSD.org> <20121005063848.GC1389@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20121005063848.GC1389@garage.freebsd.pl> X-Editor: vi X-OS: FreeBSD User-Agent: Mutt/1.5.21-fullermd.4 (2010-09-15) X-Virus-Scanned: clamav-milter 0.97.6 at thyme.infocus-llc.com X-Virus-Status: Clean Cc: "Justin T. 
Gibbs" , freebsd-fs@FreeBSD.org, Andriy Gapon Subject: Re: zfs: allow to mount root from a pool not in zpool.cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 17:42:31 -0000 On Fri, Oct 05, 2012 at 08:38:49AM +0200 I heard the voice of Pawel Jakub Dawidek, and lo! it spake thus: > > In my opinion requiring no zpool.cache to import root pool and mount > root file system is great idea and we should definiately do it. As a user, I heartily agree. I've setup a number of ZFS systems now, generally by booting up a live CD/USB/existing install on an extra drive, setting up and installing files, then unplugging the temp drive and letting it come up. So far, I haven't yet _ever_ had it actually boot right the first time. It keeps getting up to the root mount and failing. I've always had to plug the other drive back in and "squirrel around" with zpool.cache until it works. The last time, I believe it was because during the setup I had one NIC plugged in, and the "real" system used the other, so the hostid wound up different? It's kinda like an adventure game, but the replay value is a bit low 8-} -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream. From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 17:43:52 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 78F90106566B for ; Fri, 5 Oct 2012 17:43:52 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id C26B58FC0C for ; Fri, 5 Oct 2012 17:43:51 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA01260; Fri, 05 Oct 2012 20:43:48 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <506F1C54.1060109@FreeBSD.org> Date: Fri, 05 Oct 2012 20:43:48 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: "Matthew D. Fuller" References: <505DE715.8020806@FreeBSD.org> <506C50F1.40805@FreeBSD.org> <20121005063848.GC1389@garage.freebsd.pl> <20121005174223.GU71113@over-yonder.net> In-Reply-To: <20121005174223.GU71113@over-yonder.net> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs: allow to mount root from a pool not in zpool.cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 17:43:52 -0000 on 05/10/2012 20:42 Matthew D. Fuller said the following: > On Fri, Oct 05, 2012 at 08:38:49AM +0200 I heard the voice of > Pawel Jakub Dawidek, and lo! it spake thus: >> >> In my opinion requiring no zpool.cache to import root pool and mount >> root file system is great idea and we should definiately do it. > > As a user, I heartily agree. I've setup a number of ZFS systems now, > generally by booting up a live CD/USB/existing install on an extra > drive, setting up and installing files, then unplugging the temp drive > and letting it come up. 
> > So far, I haven't yet _ever_ had it actually boot right the first > time. It keeps getting up to the root mount and failing. I've always > had to plug the other drive back in and "squirrel around" with > zpool.cache until it works. The last time, I believe it was because > during the setup I had one NIC plugged in, and the "real" system used > the other, so the hostid wound up different? It's kinda like an > adventure game, but the replay value is a bit low 8-} So try the patch from the start of this thread :-) It may help to get it committed. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 18:02:52 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2A659106566B; Fri, 5 Oct 2012 18:02:52 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (garage.dawidek.net [91.121.88.72]) by mx1.freebsd.org (Postfix) with ESMTP id DCD6D8FC08; Fri, 5 Oct 2012 18:02:51 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id 5281B53D; Fri, 5 Oct 2012 20:01:43 +0200 (CEST) Date: Fri, 5 Oct 2012 20:03:23 +0200 From: Pawel Jakub Dawidek To: "Matthew D. Fuller" Message-ID: <20121005180322.GA1684@garage.freebsd.pl> References: <505DE715.8020806@FreeBSD.org> <506C50F1.40805@FreeBSD.org> <20121005063848.GC1389@garage.freebsd.pl> <20121005174223.GU71113@over-yonder.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="qDbXVdCdHGoSgWSk" Content-Disposition: inline In-Reply-To: <20121005174223.GU71113@over-yonder.net> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: "Justin T. Gibbs" , freebsd-fs@FreeBSD.org, Andriy Gapon Subject: Re: zfs: allow to mount root from a pool not in zpool.cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 18:02:52 -0000 --qDbXVdCdHGoSgWSk Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Oct 05, 2012 at 12:42:23PM -0500, Matthew D. Fuller wrote: > On Fri, Oct 05, 2012 at 08:38:49AM +0200 I heard the voice of > Pawel Jakub Dawidek, and lo! it spake thus: > > > > In my opinion requiring no zpool.cache to import root pool and mount > > root file system is a great idea and we should definitely do it. > > As a user, I heartily agree. I've set up a number of ZFS systems now, > generally by booting up a live CD/USB/existing install on an extra > drive, setting up and installing files, then unplugging the temp drive > and letting it come up. > > So far, I haven't yet _ever_ had it actually boot right the first > time. It keeps getting up to the root mount and failing. I've always > had to plug the other drive back in and "squirrel around" with > zpool.cache until it works. The last time, I believe it was because > during the setup I had one NIC plugged in, and the "real" system used > the other, so the hostid wound up different? It's kinda like an > adventure game, but the replay value is a bit low 8-} Hostid is not used for the root pool, but I'm aware that the current way of setting up ZFS is far from being intuitive and requires some understanding of how it works internally. It shouldn't be that way and I'm glad it will change.
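For context, the zpool.cache shuffle being described usually amounts to something like this from rescue media (a sketch only; the pool name zroot and the /mnt and /var/tmp paths are assumptions, not taken from the thread):

  # Import the root pool under an altroot, writing a fresh cache file:
  zpool import -f -o altroot=/mnt -o cachefile=/var/tmp/zpool.cache zroot
  # Copy the cache into the pool's own boot area, where the loader expects it:
  cp /var/tmp/zpool.cache /mnt/boot/zfs/zpool.cache

With the change discussed in this thread, none of that bookkeeping would be needed for the root pool.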
-- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl --qDbXVdCdHGoSgWSk Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlBvIOoACgkQForvXbEpPzQzEwCfYqh9wF0tOqs8etwaCTbPuXmq 0ksAoLkk3odVPdzVJJ7skkVtkiMcmxMh =dyIR -----END PGP SIGNATURE----- --qDbXVdCdHGoSgWSk-- From owner-freebsd-fs@FreeBSD.ORG Fri Oct 5 19:49:55 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 13A371065670; Fri, 5 Oct 2012 19:49:55 +0000 (UTC) (envelope-from andrnils@gmail.com) Received: from mail-oa0-f54.google.com (mail-oa0-f54.google.com [209.85.219.54]) by mx1.freebsd.org (Postfix) with ESMTP id 9F5FF8FC0C; Fri, 5 Oct 2012 19:49:54 +0000 (UTC) Received: by mail-oa0-f54.google.com with SMTP id n9so2836202oag.13 for ; Fri, 05 Oct 2012 12:49:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=mapKNMti4+5Eki2TNEPQ/v2+bmFk6NjJ6tLPw7r46Ms=; b=yLzBqEcFViKYrpmvPHQtsvcoPLladIJw+TAlMzpE6ic0gMn+Uc9oeNI2AC6eJGH40R T30aXTayXff5uVGI/a3y+Q0TLDeL9lQkQog5HXimabZ6i8OMG9Wmfj7OJpCNYjl3plM6 pUeWGKokG3xLFsFdao6ciB84P3zbJiWQrIH63A4Z1T3qcrKT2CFo2ENjr5er3Y00C0tw 6lMQY/X+sd2GiIVN0y2ygqoVk0iUN4Vcu5w7x0//hWkrFdTZvHXEGRHMltzqYpjCh3qs 56ILrLyQEYi9OumqDQkblLyEJ/2l8Lzc4M1mqEvn2JTR9HFoty1tmpMqYzzlwNmw/fL9 7oHg== MIME-Version: 1.0 Received: by 10.182.39.105 with SMTP id o9mr913819obk.69.1349466588297; Fri, 05 Oct 2012 12:49:48 -0700 (PDT) Received: by 10.60.46.165 with HTTP; Fri, 5 Oct 2012 12:49:48 -0700 (PDT) In-Reply-To: <20121005180322.GA1684@garage.freebsd.pl> References: <505DE715.8020806@FreeBSD.org> <506C50F1.40805@FreeBSD.org> <20121005063848.GC1389@garage.freebsd.pl> <20121005174223.GU71113@over-yonder.net> <20121005180322.GA1684@garage.freebsd.pl> Date: Fri, 5 Oct 2012 19:49:48 +0000 Message-ID: From: Andreas Nilsson To: Pawel Jakub Dawidek Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "Justin T. Gibbs" , freebsd-fs@freebsd.org, Andriy Gapon , "Matthew D. Fuller" Subject: Re: zfs: allow to mount root from a pool not in zpool.cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2012 19:49:55 -0000 > > Hostid is not used for the root pool, but I'm aware that the current way of setting > up ZFS is far from being intuitive and requires some > understanding of how it works internally. It shouldn't be that way and > I'm glad it will change. > > I'm glad to see those plans coming to fruition! I'll try to find the time to test the patches during the weekend. My ultimate goal is being able to boot from a zfs snapshot.
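Booting a snapshot directly isn't supported, but a rough approximation (a sketch; the dataset names are invented) is to clone the snapshot into a writable dataset and point the pool's bootfs property at the clone:

  zfs clone zroot/ROOT@known-good zroot/ROOT-knowngood
  zpool set bootfs=zroot/ROOT-knowngood zroot
  # reboot; to back out, reset bootfs to the original dataset and destroy the clone
  # (mountpoint and loader configuration for the clone are omitted here)

Since the clone shares blocks with the snapshot, this costs almost no space.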
Best regards Andreas From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 03:04:03 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E9BE5106566C for ; Sat, 6 Oct 2012 03:04:03 +0000 (UTC) (envelope-from ramquick@gmail.com) Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178]) by mx1.freebsd.org (Postfix) with ESMTP id 738008FC0C for ; Sat, 6 Oct 2012 03:04:02 +0000 (UTC) Received: by mail-wi0-f178.google.com with SMTP id hr7so1173342wib.13 for ; Fri, 05 Oct 2012 20:04:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=DgoXftZd0xWdmWgt1sWY8QF18T1kQ4LMPteOZ/qIP3w=; b=UiqXNP4NE8wvAuwYcNiFTMAkJP5tma8VCjrj3Ep15idVw03NQJUng+HxphMAtSD6ta J0AZR3P9Xy3V8i4hPRvmNCYYWpxtJb+EkIE5fcB573Wyc9vtSZdjiIs6HwUvYg4qHIZV 26ALBCNBCue6dolFdn4aGx54o3uB4x/RHf3DOw1g4Q7VhhmnB8JwBp6HpOobx79KRbdv o/j+CkZj/h1afLgSmaAklsUaYtYDrImuuDtbczanqyBqS4zbtYgZyM0BI0AMInR+RnPF Qnv2Bs0Zm6oTx3WOPAUwfowCE0yqLrjsBSk8wtBkpnwJlfIXiYiNNoIJhuEWHg1f2pz1 Coow== MIME-Version: 1.0 Received: by 10.180.86.202 with SMTP id r10mr7169441wiz.12.1349492641846; Fri, 05 Oct 2012 20:04:01 -0700 (PDT) Received: by 10.217.2.74 with HTTP; Fri, 5 Oct 2012 20:04:01 -0700 (PDT) In-Reply-To: <506D55B5.70403@brockmann-consult.de> References: <506C3EFC.2060602@FreeBSD.org> <506D55B5.70403@brockmann-consult.de> Date: Sat, 6 Oct 2012 08:34:01 +0530 Message-ID: From: Ram Chander To: Peter Maloney , freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Re: Zfs import issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ram_chander250@yahoo.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Oct 2012 03:04:04 -0000 Yes, importing means "zfs recv"; "df" hangs, "cd " to the filesystem hangs. Basically the entire filesystem is inaccessible. Once "zfs recv" completes, all is normal. On Thu, Oct 4, 2012 at 2:54 PM, Peter Maloney < peter.maloney@brockmann-consult.de> wrote: > I find this sort of thing to be common, but not exactly as you describe. > I don't know if I tried df, but "zfs list" hangs (as well as any other > zfs related command, maybe even zdb). And I don't know what you mean > "importing zfs snapshot", so I'm guessing you mean zfs recv. > > eg. > > zfs send somedataset@somesnapshot | ....... > (leave it running in background) > > zfs list > (works fine; I guess it works because send is read-only) > > zfs destroy somedataset@someothersnapshot > (hang; I guess because this is a write operation, so it needs to wait > for the read lock on zfs send to finish the transaction) > > zfs list > (hang) > > I'm not sure if df hangs too. > > At this point, using kill -9 doesn't solve anything, and if you kill the > zfs send, it's possible that every zfs command and df will hang. > > And I don't know what, but I'm mostly sure there is something I can run > that will make even "ls" hang after this point. > > > On 10/03/2012 03:34 PM, Andriy Gapon wrote: > > on 03/10/2012 14:43 Ram Chander said the following: > >> Hi, > >> > >> I am importing zfs snapshot to freebsd-9 from another host running > >> freebsd-9. When the import happens, it locks the filesystem, "df" hangs > >> and unable to use the filesystem.
Once the import completes, the > filesystem >> is back to normal and read/write works fine. The same doesn't happen in > >> Solaris/OpenIndiana. > >> > >> # uname -an > >> FreeBSD hostname 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 > >> UTC 2012 root@farrell.cse.buffalo.edu:/ > >> usr/obj/usr/src/sys/GENERIC amd64 > >> > >> Zfs ver: 28 > >> > >> > >> Any inputs would be helpful. Is there any way to overcome this freeze ? > > What if you add -n option to df? > > > -- > > -------------------------------------------- > Peter Maloney > Brockmann Consult > Max-Planck-Str. 2 > 21502 Geesthacht > Germany > Tel: +49 4152 889 300 > Fax: +49 4152 889 333 > E-mail: peter.maloney@brockmann-consult.de > Internet: http://www.brockmann-consult.de > -------------------------------------------- > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 06:01:04 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A8216106566C for ; Sat, 6 Oct 2012 06:01:04 +0000 (UTC) (envelope-from david@wimsey.us) Received: from mail-gh0-f182.google.com (mail-gh0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 5E74E8FC08 for ; Sat, 6 Oct 2012 06:01:03 +0000 (UTC) Received: by mail-gh0-f182.google.com with SMTP id r20so745038ghr.13 for ; Fri, 05 Oct 2012 23:00:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=from:content-type:content-transfer-encoding:subject:message-id:date :to:mime-version:x-mailer:x-gm-message-state; bh=fU+gkagnoExUrLbO5wykhM+tw4HM9kpUW++5ueNCUoY=; b=lbLm8v8SHHDY6XrLKxH0b9AK2n5Jvq0Y6E2RwkzcGtmGGPU7zWS7TYuc5RoLbtLAxq oll0EjIEytzUcoNxlZ2DLcfSJ1bXEFQyrgSnp6+gqifFXB4MKHIYL+zGfxLLkxBaeR53 jDWAFAGbLpXzqFxbj57j6OQI0fUDbw0OPocKUQo0q0AbC8HSyp9+Ucsgg8Ikl5BQdwHx n88wkUKIDJnisGlxY90KNP23NmHpdAn1BVm2lcf2wz1gyuSdF56tZzo6jOzVfEvNGRf0 XUIx+bFJ5PPG+1ob/MlJAkAuZLwN0KKutWFu1IbdccVSXo+Tx7V4L5TF/cL7Qk2uOqRS LoYw== Received: by 10.101.103.3 with SMTP id f3mr3098879anm.37.1349503256839; Fri, 05 Oct 2012 23:00:56 -0700 (PDT) Received: from [10.27.1.242] (cpe-071-077-014-104.nc.res.rr.com. [71.77.14.104]) by mx.google.com with ESMTPS id l16sm11798572anm.6.2012.10.05.23.00.56 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 05 Oct 2012 23:00:56 -0700 (PDT) From: David Wimsey Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Message-Id: <074F3CC1-E29F-4552-840F-A38FDDCC7E76@wimsey.us> Date: Sat, 6 Oct 2012 02:00:55 -0400 To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) X-Mailer: Apple Mail (2.1498) X-Gm-Message-State: ALoCoQnQ6AdmbauLw2tIi0aE3Wg1/gBYXFT3WnZpj77+LpvwJ4YHAXOlFiJ40KsKil1p7v6iB/FG Subject: Deadlock on zfs import X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Oct 2012 06:01:04 -0000 I have a FreeBSD 9-RELEASE machine that deadlocks when importing one of the ZFS pools on it. When I run zpool import zfs01 (the offending pool) the data becomes available, and mounts show up as expected. Then it continues to chew away at the disks like it's doing a scrub, eventually deadlocking.
I've confirmed I can get to some of the data before the deadlock happens, so I can get data off of it, but it's a tedious process and doesn't help me long term. Here's how I got to this point: This machine is essentially a network file server; it serves NFS for a VMware ESXi machine, Samba for the family's Windows-based machines, and afpd for the Macs, as well as a couple of jails for subversion and tftp/netboot services. Other than home directories, all of the mount points on this machine are generally set read-only and are never writable over the network. If I need to add something to the server, it's dropped into my home directory and then moved to its final location from the command line on the server itself. Noticing the offending pool was at 94% capacity, I started rearranging things and cleaning up. I had multiple shells open copying to multiple different file systems on the same zfs pool. This normally works fine. This time it doesn't seem so normal. At some point while copying roughly 25GB between different filesystems on the same pool, the machine deadlocked. On reboot the machine reaches the 'mounting local filesystems' phase and then starts chugging away at the disks until it locks up again. The only way to get it to boot is to boot to single-user mode and then zpool export the offending pool. After doing so the machine works fine with the exception of bits that depend on file systems on the offending pool. The two other pools on the machine (zboot and zfs02) work perfectly. If I boot with zfs01 exported and then import it after boot, it chugs away at the disks for a long time and then deadlocks the machine eventually. Some filesystems have compression and/or dedup enabled, but I have been turning that off due to the machine only having 4GB of RAM. So, can someone point me in the direction of figuring out what's wrong and how to go about fixing it? How can I tell if it's memory exhaustion that's causing the problem? Is there a way to roll the pool back (without snapshots, which I had actually just deleted from the pool, heh) to maybe the last valid state on the pool? Summary of machine config (output of various commands shown at the bottom due to its size): 4GB of RAM 2 SSDs, 64GB each 4 standard drives, 500GB each (2 Western Digital, 2 Seagate) 3 ZFS pools zboot - Configured with one vdev, it is 2 slices from the SSDs as a mirror - This pool imports normally with no issues zfs02 - Configured with one vdev, it is 2 slices from the SSDs as a mirror - This pool imports normally with no issues zfs01 - This is the offending pool, and of course the only one with data that can't be replaced easily, if at all. 1 raid-z vdev consisting of 3 HDDs and one hot spare HDD. 1 mirrored vdev consisting of 2 slices from the SSDs for the ZIL 2 slices from the SSDs for L2ARC Drives are all SATA connections split between the motherboard SATA ports and a 4-port RocketPort PCI-e 'raid controller', no RAID configured, just used as additional SATA ports and providing fault tolerance if the onboard controller fails. Output of various commands: mayham# dmesg | head -n 15 Copyright (c) 1992-2012 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 CPU: AMD Phenom(tm) II X4 945 Processor (3013.28-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x100f42 Family = 10 Model = 4 Stepping = 2 Features=0x178bfbff Features2=0x802009 AMD Features=0xee500800 AMD Features2=0x37ff TSC: P-state invariant real memory = 4294967296 (4096 MB) avail memory = 4075692032 (3886 MB) mayham# dmesg | grep ada ada0 at ahcich1 bus 0 scbus1 target 0 lun 0 ada0: ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) ada0: Previously was known as ad6 ada1 at ahcich2 bus 0 scbus2 target 0 lun 0 ada1: ATA-8 SATA 2.x device ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) ada1: Previously was known as ad8 ada2 at ahcich3 bus 0 scbus3 target 0 lun 0 ada2: ATA-9 SATA 3.x device ada2: 600.000MB/s transfers (SATA 3.x, UDMA5, PIO 8192bytes) ada2: Command Queueing enabled ada2: 61057MB (125045424 512 byte sectors: 16H 63S/T 16383C) ada2: Previously was known as ad10 ada3 at ahcich4 bus 0 scbus5 target 0 lun 0 ada3: ATA-9 SATA 3.x device ada3: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes) ada3: Command Queueing enabled ada3: 61057MB (125045424 512 byte sectors: 16H 63S/T 16383C) ada3: Previously was known as ad14 ada4 at ahcich5 bus 0 scbus6 target 0 lun 0 ada4: ATA-8 SATA 2.x device ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada4: Command Queueing enabled ada4: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) ada4: Previously was known as ad16 ada5 at ahcich6 bus 0 scbus7 target 0 lun 0 ada5: ATA-8 SATA 2.x device ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada5: Command Queueing enabled ada5: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) ada5: Previously was known as ad18 mayham# dmesg | grep -i zfs ZFS filesystem version 5 ZFS storage pool version 28 Trying to mount root from zfs:zboot []...
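One detail worth flagging here: the zpool list output below shows dedup ratios above 1.0x on a 4GB machine, and the loader.conf that follows caps the ARC at 40M. The dedup table (DDT) has to be resident in memory for writes and frees to stay fast; a common way to gauge its size (a sketch, using the pool name from this post; the ~320 bytes per entry figure is the usual rule of thumb, not an exact number) is:

  zdb -DD zfs01   # prints DDT histograms; total entries x ~320 bytes
                  # approximates the RAM the dedup table wants

A DDT that does not fit in the (here, deliberately tiny) ARC is a classic cause of exactly this kind of import-time disk thrashing.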
mayham# cat /boot/loader.conf zfs_load="YES" vfs.root.mountfrom="zfs:zboot" splash_bmp_load="YES" vesa_load="YES" loader_logo="orb" loader_color="YES" bitmap_load="YES" if_vlan_load="YES" # Added after deadlock occurred vm.kmem_size="512M" vm.kmem_size_max="512M" vfs.zfs.arc_max="40M" vfs.zfs.vdev.cache.size="5M" vfs.zfs.prefetch_disable="1" mayham# zpool status pool: zboot state: ONLINE scan: scrub repaired 0 in 0h1m with 0 errors on Sun Aug 12 03:35:52 2012 config: NAME STATE READ WRITE CKSUM zboot ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 ada2p2 ONLINE 0 0 0 ada3p2 ONLINE 0 0 0 errors: No known data errors pool: zfs01 state: ONLINE scan: resilvered 144K in 0h0m with 0 errors on Thu Aug 30 02:35:33 2012 config: NAME STATE READ WRITE CKSUM zfs01 ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 ada1p3 ONLINE 0 0 0 ada0p3 ONLINE 0 0 0 ada5p3 ONLINE 0 0 0 logs ada2p4 ONLINE 0 0 0 ada3p4 ONLINE 0 0 0 cache ada2p5 ONLINE 0 0 0 ada3p5 ONLINE 0 0 0 spares gpt/disk3 AVAIL errors: No known data errors pool: zfs02 state: ONLINE scan: scrub repaired 0 in 0h1m with 0 errors on Fri Oct 5 04:42:19 2012 config: NAME STATE READ WRITE CKSUM zfs02 ONLINE 0 0 0 ada2p6 ONLINE 0 0 0 ada3p6 ONLINE 0 0 0 errors: No known data errors mayham# zpool list NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT zboot 3.97G 2.32G 1.65G 58% 1.12x ONLINE - zfs01 1.30T 1.24T 69.3G 94% 1.36x ONLINE - zfs02 41G 39.4G 1.63G 96% 1.21x ONLINE - From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 06:09:48 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 491B6106564A for ; Sat, 6 Oct 2012 06:09:48 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.17.9]) by mx1.freebsd.org (Postfix) with ESMTP id E32AC8FC08 for ; Sat, 6 Oct 2012 06:09:47 +0000 (UTC) Received: from [192.168.179.201] (hmbg-5f77232a.pool.mediaWays.net [95.119.35.42]) by mrelayeu.kundenserver.de (node=mrbap3) with ESMTP (Nemesis) id 0LsywU-1TQFO91EqY-012UU4; Sat, 06 Oct 2012 08:09:37 +0200 Message-ID: <506FCB19.8010703@brockmann-consult.de> Date: Sat, 06 Oct 2012 08:09:29 +0200 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120825 Thunderbird/15.0 MIME-Version: 1.0 To: ram_chander250@yahoo.com References: <506C3EFC.2060602@FreeBSD.org> <506D55B5.70403@brockmann-consult.de> In-Reply-To: X-Provags-ID: V02:K0:TTQGlbDrIIK+hH7AkXj2Qu/N3KKiHZaWTDQcnefOvCo EnhUcx0HvHLfgayjOP9ScfkQ6g5TcR1P86I+xrHJpdN1JVa+d+ ujdXXaM4/Ljz6Ig19ljz/SmnajUMQ+5yTEUM3qNdt7aoj4asoN MfNnW8Cd2EcVN/KFwKZ3NaX1Y8ZdNm0AoPhUOj6fMpVFUrG5QW +5UQplxMm4fZYE3FHw/89mC0A1LcQ7NWGzpWXzQSun5/BuY2pL xVAN48Xe19oQBKkjSD/3GCUnbvHKvOFffyRwxy2L2cmWGBDsYZ bVnXC41ydionOknzu6H8AAQALYvSruFWGWz27E/HAfqHc6ai8i Ao2FyexVBYat3vj5G4AQCQ/re4G//53J3vqm7zZ9SmCXGtcrfM OSj/QX62UX17Q== Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: Zfs import issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Oct 2012 06:09:48 -0000 Okay, then we have the same problem except I need one more zfs command after the send to hang the filesystem. In my case, this means I can simply never call zfs twice before the first is done. eg.
a file-based lock in my scripts; or wrap the zfs command with something that locks. Does this workaround solve your problem? I find (in 8.2-STABLE, 8.3-STABLE) the file system works 100% during a send unless there is something else with a "zfs" command that happens; only zfs commands will hang/block, not ls, df, etc.; in my example below, I destroyed a snapshot, and that command hung/blocked. And when the hung/blocked zfs command is a write command, such as destroy, then the filesystem is hung/blocked too, including ls, df, etc. And when a hung/blocked command is killed, it seems to be permanently hung (but could just be enormously slow). On 10/06/2012 05:04 AM, Ram Chander wrote: > Yes, importing means "zfs recv"; "df" hangs, "cd " to the filesystem > hangs. Basically the entire filesystem is inaccessible. Once "zfs recv" > completes, all is normal. > > On Thu, Oct 4, 2012 at 2:54 PM, Peter Maloney > > wrote: > > I find this sort of thing to be common, but not exactly as you > describe. > I don't know if I tried df, but "zfs list" hangs (as well as any other > zfs related command, maybe even zdb). And I don't know what you mean > "importing zfs snapshot", so I'm guessing you mean zfs recv. > > eg. > > zfs send somedataset@somesnapshot | ....... > (leave it running in background) > > zfs list > (works fine; I guess it works because send is read-only) > > zfs destroy somedataset@someothersnapshot > (hang; I guess because this is a write operation, so it needs to wait > for the read lock on zfs send to finish the transaction) > > zfs list > (hang) > > I'm not sure if df hangs too. > > At this point, using kill -9 doesn't solve anything, and if you > kill the > zfs send, it's possible that every zfs command and df will hang. > > And I don't know what, but I'm mostly sure there is something I > can run > that will make even "ls" hang after this point. > > > On 10/03/2012 03:34 PM, Andriy Gapon wrote: > > on 03/10/2012 14:43 Ram Chander said the following: > >> Hi, > >> > >> I am importing zfs snapshot to freebsd-9 from another host running > >> freebsd-9. When the import happens, it locks the filesystem, > "df" hangs > >> and unable to use the filesystem. Once the import completes, > the filesystem > >> is back to normal and read/write works fine. The same doesn't > happen in > >> Solaris/OpenIndiana. > >> > >> # uname -an > >> FreeBSD hostname 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 > 07:46:30 > >> UTC 2012 root@farrell.cse.buffalo.edu:/ > >> usr/obj/usr/src/sys/GENERIC amd64 > >> > >> Zfs ver: 28 > >> > >> > >> Any inputs would be helpful. Is there any way to overcome this > freeze ? > > What if you add -n option to df? > > > -- > > -------------------------------------------- > Peter Maloney > Brockmann Consult > Max-Planck-Str.
2 > 21502 Geesthacht > Germany > Tel: +49 4152 889 300 > Fax: +49 4152 889 333 > E-mail: peter.maloney@brockmann-consult.de > > Internet: http://www.brockmann-consult.de > -------------------------------------------- > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to > "freebsd-fs-unsubscribe@freebsd.org > " > > From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 11:20:18 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 21C7B106566B; Sat, 6 Oct 2012 11:20:18 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 1AFFC8FC0C; Sat, 6 Oct 2012 11:20:16 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id x43so1983541wey.13 for ; Sat, 06 Oct 2012 04:20:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=G5XdW5gh/GjbNrWuIQkDnuWwceaifLYUubBEB8ftJrE=; b=zVmv4edKO66V/djE0bNJuGYHbUZ6Yg0Bme3OjaQbUjSFXl7sUr2ui7ON97BaigI5aV ztbr+6uTI4ZzTjPaAqRyLVDxfXc1a33Igma6roIJ/RMmIvOkeLojfq+h+FSZlqelzwUc uJS8QLy4md7/Tr/QbWdQIJO0nnLK5+2RqOJFoaJVFUW7mprFuTKu2Cj1ILdhIMAcuyTI 5M11c6YuSOwttqHP1L2xOeHrGa8/BMs7Udfz0sRaqfrvzpt+4hKJdgm+k8t8HprHbpAD 4U/ztalBhrWjkLmax3VU4UWcqlnBX8nBp+ndA0BJOC8WWRuTSLdQN2auIA34BT6XOrdm RKGQ== Received: by 10.216.207.163 with SMTP id n35mr6807055weo.220.1349522415533; Sat, 06 Oct 2012 04:20:15 -0700 (PDT) Received: from [10.0.0.86] ([93.152.184.10]) by mx.google.com with ESMTPS id cl8sm7638401wib.10.2012.10.06.04.20.13 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 06 Oct 2012 04:20:14 -0700 (PDT) Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=windows-1252 From: Nikolay Denev In-Reply-To: <1666343702.1682678.1349300219198.JavaMail.root@erie.cs.uoguelph.ca> Date: Sat, 6 Oct 2012 14:20:11 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <3E7BCFB4-6EE6-48F5-ACA7-A615F3CE5BAC@gmail.com> References: <1666343702.1682678.1349300219198.JavaMail.root@erie.cs.uoguelph.ca> To: Rick Macklem X-Mailer: Apple Mail (2.1498) Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org, Garrett Wollman Subject: Re: NFS server bottlenecks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Oct 2012 11:20:18 -0000 On Oct 4, 2012, at 12:36 AM, Rick Macklem wrote: > Garrett Wollman wrote: >> <> said: >> >>>> Simple: just use a separate mutex for each list that a cache entry >>>> is on, rather than a global lock for everything. This would reduce >>>> the mutex contention, but I'm not sure how significantly since I >>>> don't have the means to measure it yet. >>>> >>> Well, since the cache trimming is removing entries from the lists, I >>> don't >>> see how that can be done with a global lock for list updates?
>> >> Well, the global lock is what we have now, but the cache trimming >> process only looks at one list at a time, so not locking the list that >> isn't being iterated over probably wouldn't hurt, unless there's some >> mechanism (that I didn't see) for entries to move from one list to >> another. Note that I'm considering each hash bucket a separate >> "list". (One issue to worry about in that case would be cache-line >> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE >> ought to be increased to reduce that.) >> > Yea, a separate mutex for each hash list might help. There is also the > LRU list that all entries end up on, that gets used by the trimming code. > (I think? I wrote this stuff about 8 years ago, so I haven't looked at > it in a while.) > > Also, increasing the hash table size is probably a good idea, especially > if you reduce how aggressively the cache is trimmed. > >>> Only doing it once/sec would result in a very large cache when >>> bursts of >>> traffic arrives. >> >> My servers have 96 GB of memory so that's not a big deal for me. >> > This code was originally "production tested" on a server with 1Gbyte, > so times have changed a bit;-) > >>> I'm not sure I see why doing it as a separate thread will improve >>> things. >>> There are N nfsd threads already (N can be bumped up to 256 if you >>> wish) >>> and having a bunch more "cache trimming threads" would just increase >>> contention, wouldn't it? >> >> Only one cache-trimming thread. The cache trim holds the (global) >> mutex for much longer than any individual nfsd service thread has any >> need to, and having N threads doing that in parallel is why it's so >> heavily contended. If there's only one thread doing the trim, then >> the nfsd service threads aren't spending time either contending on the >> mutex (it will be held less frequently and for shorter periods). >> > I think the little drc2.patch which will keep the nfsd threads from > acquiring the mutex and doing the trimming most of the time, might be > sufficient. I still don't see why a separate trimming thread will be > an advantage. I'd also be worried that the one cache trimming thread > won't get the job done soon enough. > > When I did production testing on a 1Gbyte server that saw a peak > load of about 100RPCs/sec, it was necessary to trim aggressively. > (Although I'd be tempted to say that a server with 1Gbyte is no > longer relevant, I recently recall someone trying to run FreeBSD > on a i486, although I doubt they wanted to run the nfsd on it.) > >>> The only negative effect I can think of w.r.t. having the nfsd >>> threads doing it would be a (I believe negligible) increase in RPC >>> response times (the time the nfsd thread spends trimming the cache). >>> As noted, I think this time would be negligible compared to disk I/O >>> and network transit times in the total RPC response time? >> >> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G >> network connectivity, spinning on a contended mutex takes a >> significant amount of CPU time. (For the current design of the NFS >> server, it may actually be a win to turn off adaptive mutexes -- I >> should give that a try once I'm able to do more testing.) >> > Have fun with it. Let me know when you have what you think is a good patch.
> > rick > >> -GAWollman >> _______________________________________________ >> freebsd-hackers@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >> To unsubscribe, send any mail to >> "freebsd-hackers-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" I was doing some NFS testing with a RELENG_9 machine and a Linux RHEL machine over a 10G network, and noticed the same nfsd threads issue. Previously I would read a 32G file locally on the FreeBSD ZFS/NFS server with "dd if=/tank/32G.bin of=/dev/null bs=1M" to cache it completely in ARC (machine has 196G RAM), then if I do this again locally I would get close to 4GB/sec read - completely from the cache... But if I try to read the file over NFS from the Linux machine I would only get about 100MB/sec speed, sometimes a bit more, and all of the nfsd threads are clearly visible in top. pmcstat also showed the same mutex contention as in the original post. I've now applied the drc2 patch, and rerunning the same test yields about 960MB/s transfer over NFS... quite an improvement! From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 14:33:34 2012 Return-Path: Delivered-To: FS@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C820C106566C for ; Sat, 6 Oct 2012 14:33:34 +0000 (UTC) (envelope-from dg17@penx.com) Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id 926FB8FC0C for ; Sat, 6 Oct 2012 14:33:34 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q96EXUxW048872; Sat, 6 Oct 2012 07:33:30 -0700 (PDT) (envelope-from dg17@penx.com) From: Dennis Glatting To: Adam Vande More In-Reply-To: References: <1349447619.89356.13.camel@btw.pki2.com> Content-Type: text/plain; charset="us-ascii" Date: Sat, 06 Oct 2012 07:33:30 -0700 Message-ID: <1349534010.45402.3.camel@btw.pki2.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-yoursite-MailScanner-Information: Dennis Glatting X-yoursite-MailScanner-ID: q96EXUxW048872 X-yoursite-MailScanner: Found to be clean X-MailScanner-From: dg17@penx.com Cc: FS@freebsd.org Subject: Re: under ZFS, I can reliably crash my systems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dg17@penx.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Oct 2012 14:33:34 -0000 On Fri, 2012-10-05 at 10:47 -0500, Adam Vande More wrote: > On Fri, Oct 5, 2012 at 9:33 AM, Dennis Glatting wrote: > > > I do not normally get the swap space error below. I assume this is from > > the operating system when I rebooted from the panic. > > > > > swap_pager: out of swap space > > swap_pager_getswapspace(16): failed > > pid 1847 (fstat), uid 0, was killed: out of swap space > > > > What does swapinfo show?
System was rebooted again, so this is of little value:

mc# swapinfo -h
Device              1K-blocks     Used    Avail Capacity
/dev/gpt/swap0      244198544       0B     232G       0%

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 14:38:02 2012
From: Dennis Glatting
To: Andriy Gapon
Cc: freebsd-fs
Reply-To: dg17@penx.com
Date: Sat, 06 Oct 2012 07:37:57 -0700
Message-ID: <1349534277.45402.7.camel@btw.pki2.com>
In-Reply-To: <506F063F.8050408@FreeBSD.org>
Subject: Re: under ZFS, I can reliably crash my systems

On Fri, 2012-10-05 at 19:09 +0300, Andriy Gapon wrote:
> on 05/10/2012 17:33 Dennis Glatting said the following:
> > swap_pager: out of swap space
> > swap_pager_getswapspace(16): failed
> > pid 1847 (fstat), uid 0, was killed: out of swap space
>
> One thing I can tell you, your kernel and userland are out of sync.
>

How so? I svn src and rebuild everything. Do you mean ports?

Typical src build is:

svn co svn://svn.pki2.com/base/stable/9/ /disk-1/src
cd /usr/src; make -j65 buildworld
make installworld
yes | make delete-old
yes | make delete-old-libs
mergemaster
cd /sys/amd64/conf/
./mkconfig.pl SMUNI.in
config SMUNI
cd ../compile/SMUNI
make cleandepend && make depend && make
make install

That's fairly straightforward.
From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 15:21:10 2012
From: Andriy Gapon
To: dg17@penx.com
Cc: freebsd-fs
Date: Sat, 06 Oct 2012 18:20:58 +0300
Message-ID: <50704C5A.2060902@FreeBSD.org>
In-Reply-To: <1349534277.45402.7.camel@btw.pki2.com>
Subject: Re: under ZFS, I can reliably crash my systems

on 06/10/2012 17:37 Dennis Glatting said the following:
> On Fri, 2012-10-05 at 19:09 +0300, Andriy Gapon wrote:
>> on 05/10/2012 17:33 Dennis Glatting said the following:
>>> swap_pager: out of swap space
>>> swap_pager_getswapspace(16): failed
>>> pid 1847 (fstat), uid 0, was killed: out of swap space
>>
>> One thing I can tell you, your kernel and userland are out of sync.
>>
>
> How so? I svn src and rebuild everything. Do you mean ports?
>
> Typical src build is:
>
> svn co svn://svn.pki2.com/base/stable/9/ /disk-1/src
> cd /usr/src; make -j65 buildworld
> make installworld
> yes | make delete-old
> yes | make delete-old-libs
> mergemaster
> cd /sys/amd64/conf/
> ./mkconfig.pl SMUNI.in
> config SMUNI
> cd ../compile/SMUNI
> make cleandepend && make depend && make
> make install
>
> That's fairly straightforward.

Why not use the buildkernel target with KERNCONF=SMUNI?

Anyway, well, maybe your kernel (the one that produced the crashdump) was
from before the upgrade. fstat trying to allocate insane amounts of memory
during vmcore processing is a sign that fstat and the kernel were compiled
using different versions of system headers.
--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 16:10:38 2012
From: Dennis Glatting
To: Andriy Gapon
Cc: freebsd-fs
Date: Sat, 06 Oct 2012 09:10:31 -0700
Message-ID: <1349539831.53407.0.camel@btw.pki2.com>
In-Reply-To: <50704C5A.2060902@FreeBSD.org>
Subject: Re: under ZFS, I can reliably crash my systems

On Sat, 2012-10-06 at 18:20 +0300, Andriy Gapon wrote:
> on 06/10/2012 17:37 Dennis Glatting said the following:
> > On Fri, 2012-10-05 at 19:09 +0300, Andriy Gapon wrote:
> >> on 05/10/2012 17:33 Dennis Glatting said the following:
> >>> swap_pager: out of swap space
> >>> swap_pager_getswapspace(16): failed
> >>> pid 1847 (fstat), uid 0, was killed: out of swap space
> >>
> >> One thing I can tell you, your kernel and userland are out of sync.
> >>
> >
> > How so? I svn src and rebuild everything. Do you mean ports?
> >
> > Typical src build is:
> >
> > svn co svn://svn.pki2.com/base/stable/9/ /disk-1/src
> > cd /usr/src; make -j65 buildworld
> > make installworld
> > yes | make delete-old
> > yes | make delete-old-libs
> > mergemaster
> > cd /sys/amd64/conf/
> > ./mkconfig.pl SMUNI.in
> > config SMUNI
> > cd ../compile/SMUNI
> > make cleandepend && make depend && make
> > make install
> >
> > That's fairly straightforward.
>
> Why not use the buildkernel target with KERNCONF=SMUNI?
>

Is there a difference in the processes?

> Anyway, well, maybe your kernel (the one that produced the crashdump) was
> from before the upgrade. fstat trying to allocate insane amounts of memory
> during vmcore processing is a sign that fstat and the kernel were compiled
> using different versions of system headers.
From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 16:24:15 2012
From: Andriy Gapon
To: Dennis Glatting
Cc: freebsd-fs
Date: Sat, 06 Oct 2012 19:24:08 +0300
Message-ID: <50705B28.8040407@FreeBSD.org>
In-Reply-To: <1349539831.53407.0.camel@btw.pki2.com>
Subject: Re: under ZFS, I can reliably crash my systems

on 06/10/2012 19:10 Dennis Glatting said the following:
> On Sat, 2012-10-06 at 18:20 +0300, Andriy Gapon wrote:
>> on 06/10/2012 17:37 Dennis Glatting said the following:
>>> On Fri, 2012-10-05 at 19:09 +0300, Andriy Gapon wrote:
>>>> on 05/10/2012 17:33 Dennis Glatting said the following:
>>>>> swap_pager: out of swap space
>>>>> swap_pager_getswapspace(16): failed
>>>>> pid 1847 (fstat), uid 0, was killed: out of swap space
>>>>
>>>> One thing I can tell you, your kernel and userland are out of sync.
>>>>
>>>
>>> How so? I svn src and rebuild everything. Do you mean ports?
>>>
>>> Typical src build is:
>>>
>>> svn co svn://svn.pki2.com/base/stable/9/ /disk-1/src
>>> cd /usr/src; make -j65 buildworld
>>> make installworld
>>> yes | make delete-old
>>> yes | make delete-old-libs
>>> mergemaster
>>> cd /sys/amd64/conf/
>>> ./mkconfig.pl SMUNI.in
>>> config SMUNI
>>> cd ../compile/SMUNI
>>> make cleandepend && make depend && make
>>> make install
>>>
>>> That's fairly straightforward.
>>
>> Why not use the buildkernel target with KERNCONF=SMUNI?
>>
>
> Is there a difference in the processes?

Most likely no. But buildkernel is the official way. If something changes
in the official procedure, then your manual procedure may miss it.

>> Anyway, well, maybe your kernel (the one that produced the crashdump) was
>> from before the upgrade. fstat trying to allocate insane amounts of memory
>> during vmcore processing is a sign that fstat and the kernel were compiled
>> using different versions of system headers.
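For reference, the officially supported equivalent keeps the kernel build
inside the same source tree and toolchain as the world build, which is
exactly the synchronization property under discussion. Assuming Dennis's
mkconfig.pl has already generated the SMUNI config in
/usr/src/sys/amd64/conf, the sequence would look roughly like:

cd /usr/src
make -j65 buildworld
make buildkernel KERNCONF=SMUNI
make installkernel KERNCONF=SMUNI
(reboot)
make installworld
mergemaster
yes | make delete-old

The -j65 and SMUNI values are taken from Dennis's procedure above; the
point is only the buildkernel/installkernel pair that Andriy recommends.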
--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 20:09:42 2012
From: Andriy Gapon
To: John Baldwin
Cc: freebsd-fs@FreeBSD.org
Date: Sat, 06 Oct 2012 23:09:31 +0300
Message-ID: <50708FFB.40007@FreeBSD.org>
In-Reply-To: <201209260941.29016.jhb@freebsd.org>
Subject: Re: zfsboot and zfsloader: normalization of filesystem names

on 26/09/2012 16:41 John Baldwin said the following:
> On Saturday, September 22, 2012 1:03:27 pm Andriy Gapon wrote:
>>
>> Currently zfsboot uses the following format to specify a ZFS filesystem
>> name in a full file path:
>> poolname:filesystem/name:/path/to/file
>> ZFS loader uses this format:
>> zfs:poolname/filesystemname:/path/to/file
>>
>> The following patchset:
>> http://people.freebsd.org/~avg/zfs-boot-naming.diff
>> unifies the naming.
>> zfsboot format will be: poolname/filesystemname:/path/to/file
>> Note that it is still different from zfsloader - the "zfs:" prefix is
>> missing. This is because, unlike the loader, zfsboot supports only the
>> ZFS filesystem, so the prefix is redundant. But I can still add support
>> for it if there is a popular request.
>
> I think this idea sounds sound. You could easily let zfsboot support both
> by just having it skip over a 'zfs:' prefix if it sees one.
>

OK, I implemented this suggestion and committed the code.
Thank you!
--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 20:44:53 2012
From: Andriy Gapon
To: Chuck Burns
Cc: freebsd-fs@FreeBSD.org
Date: Sat, 06 Oct 2012 23:44:47 +0300
Message-ID: <5070983F.7010803@FreeBSD.org>
In-Reply-To: <50632ED0.2050901@gmail.com>
Subject: Re: zfsboot and zfsloader: normalization of filesystem names

on 26/09/2012 19:35 Chuck Burns said the following:
> On 9/26/2012 8:41 AM, John Baldwin wrote:
>> On Saturday, September 22, 2012 1:03:27 pm Andriy Gapon wrote:
>>>
>>> Currently zfsboot uses the following format to specify a ZFS filesystem
>>> name in a full file path:
>>> poolname:filesystem/name:/path/to/file
>>> ZFS loader uses this format:
>>> zfs:poolname/filesystemname:/path/to/file
>>>
>>> The following patchset:
>>> http://people.freebsd.org/~avg/zfs-boot-naming.diff
>>> unifies the naming.
>>> zfsboot format will be: poolname/filesystemname:/path/to/file
>>> Note that it is still different from zfsloader - the "zfs:" prefix is
>>> missing. This is because, unlike the loader, zfsboot supports only the
>>> ZFS filesystem, so the prefix is redundant. But I can still add support
>>> for it if there is a popular request.
>>
>> I think this idea sounds sound. You could easily let zfsboot support both
>> by just having it skip over a 'zfs:' prefix if it sees one.
>>
> My $0.02 -- Keep "zfs:" in for consistency's sake.
>

I made it optionally supported.
Thank you.
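To make the unified syntax concrete, take a hypothetical pool named tank
whose root filesystem lives in the dataset tank/ROOT. After the change
discussed above, the kernel can be named the same way at both boot stages:

tank/ROOT:/boot/kernel/kernel        (zfsboot, prefix omitted)
zfs:tank/ROOT:/boot/kernel/kernel    (zfsloader syntax, now also accepted by zfsboot)

The pool and dataset names here are invented for illustration; only the
poolname/filesystemname:/path/to/file shape comes from the thread.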
--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 6 22:32:58 2012
From: Rick Macklem
To: Nikolay Denev
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org,
    Garrett Wollman
Date: Sat, 6 Oct 2012 18:32:56 -0400 (EDT)
Message-ID: <895825217.1831774.1349562776418.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <3E7BCFB4-6EE6-48F5-ACA7-A615F3CE5BAC@gmail.com>
Subject: Re: NFS server bottlenecks

Nikolay Denev wrote:
> On Oct 4, 2012, at 12:36 AM, Rick Macklem wrote:
>
> > Garrett Wollman wrote:
> >> < > said:
> >>
> >>>> Simple: just use a separate mutex for each list that a cache entry
> >>>> is on, rather than a global lock for everything. This would reduce
> >>>> the mutex contention, but I'm not sure how significantly since I
> >>>> don't have the means to measure it yet.
> >>>>
> >>> Well, since the cache trimming is removing entries from the lists, I
> >>> don't see how that can be done with a global lock for list updates?
> >>
> >> Well, the global lock is what we have now, but the cache trimming
> >> process only looks at one list at a time, so not locking the list that
> >> isn't being iterated over probably wouldn't hurt, unless there's some
> >> mechanism (that I didn't see) for entries to move from one list to
> >> another. Note that I'm considering each hash bucket a separate
> >> "list". (One issue to worry about in that case would be cache-line
> >> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE
> >> ought to be increased to reduce that.)
> >>
> > Yea, a separate mutex for each hash list might help. There is also the
> > LRU list that all entries end up on, that gets used by the trimming
> > code. (I think? I wrote this stuff about 8 years ago, so I haven't
> > looked at it in a while.)
> >
> > Also, increasing the hash table size is probably a good idea,
> > especially if you reduce how aggressively the cache is trimmed.
> >
> >>> Only doing it once/sec would result in a very large cache when
> >>> bursts of traffic arrive.
> >>
> >> My servers have 96 GB of memory so that's not a big deal for me.
> >>
> > This code was originally "production tested" on a server with 1Gbyte,
> > so times have changed a bit;-)
> >
> >>> I'm not sure I see why doing it as a separate thread will improve
> >>> things. There are N nfsd threads already (N can be bumped up to 256
> >>> if you wish) and having a bunch more "cache trimming threads" would
> >>> just increase contention, wouldn't it?
> >>
> >> Only one cache-trimming thread. The cache trim holds the (global)
> >> mutex for much longer than any individual nfsd service thread has any
> >> need to, and having N threads doing that in parallel is why it's so
> >> heavily contended. If there's only one thread doing the trim, then
> >> the nfsd service threads aren't spending time contending on the
> >> mutex (it will be held less frequently and for shorter periods).
> >>
> > I think the little drc2.patch, which will keep the nfsd threads from
> > acquiring the mutex and doing the trimming most of the time, might be
> > sufficient. I still don't see why a separate trimming thread will be
> > an advantage. I'd also be worried that the one cache trimming thread
> > won't get the job done soon enough.
> >
> > When I did production testing on a 1Gbyte server that saw a peak
> > load of about 100RPCs/sec, it was necessary to trim aggressively.
> > (Although I'd be tempted to say that a server with 1Gbyte is no
> > longer relevant, I recall someone recently trying to run FreeBSD
> > on an i486, although I doubt they wanted to run the nfsd on it.)
> >
> >>> The only negative effect I can think of w.r.t. having the nfsd
> >>> threads doing it would be a (I believe negligible) increase in RPC
> >>> response times (the time the nfsd thread spends trimming the cache).
> >>> As noted, I think this time would be negligible compared to disk I/O
> >>> and network transit times in the total RPC response time?
> >>
> >> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G
> >> network connectivity, spinning on a contended mutex takes a
> >> significant amount of CPU time. (For the current design of the NFS
> >> server, it may actually be a win to turn off adaptive mutexes -- I
> >> should give that a try once I'm able to do more testing.)
> >>
> > Have fun with it. Let me know when you have what you think is a good
> > patch.
> >
> > rick
> >
> >> -GAWollman
>
> I was doing some NFS testing with a RELENG_9 machine and
> a Linux RHEL machine over a 10G network, and noticed the same nfsd
> threads issue.
>
> Previously I would read a 32G file locally on the FreeBSD ZFS/NFS
> server with "dd if=/tank/32G.bin of=/dev/null bs=1M" to cache it
> completely in ARC (the machine has 196G RAM),
> then if I do this again locally I would get close to 4GB/sec read -
> completely from the cache...
>
> But if I try to read the file over NFS from the Linux machine I would
> only get about 100MB/sec speed, sometimes a bit more,
> and all of the nfsd threads are clearly visible in top. pmcstat also
> showed the same mutex contention as in the original post.
>
> I've now applied the drc2 patch, and rerunning the same test yields
> about 960MB/s transfer over NFS... quite an improvement!
>

Sounds good. Hopefully Garrett can test it too, and then it sounds like
it can be committed.

Someday I'll look at using separate mutexes for each of the hash
buckets, which should reduce contention for the mutex for TCP. For UDP,
there is one LRU list that all entries are on, so UDP is probably stuck
using one mutex for now. Since this would be a more involved and risky
patch, I think committing drc2.patch first and then doing this later
would make sense.

Thanks for testing it, rick
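The per-bucket locking rick mentions might be sketched as below. Again,
every name here (drc_bucket, drc_table, drc_lookup) is invented for
illustration; this is not a patch against the real nfsrvcache code, just
the shape of the technique being discussed:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

#define	DRC_HASHSIZE	256	/* a larger table also spreads contention */

struct drc_entry {
	LIST_ENTRY(drc_entry)	de_hash;
	uint32_t		de_xid;
	/* ... cached reply, timestamps, etc. ... */
};

/*
 * One mutex per hash chain: two nfsd threads contend only when their
 * RPCs hash to the same bucket.  The UDP LRU list would still need a
 * single lock, which is why this would help TCP only.  Padding each
 * bucket to a cache line would address the false-sharing concern
 * Garrett raises about the bucket array.
 */
static struct drc_bucket {
	struct mtx		db_lock;
	LIST_HEAD(, drc_entry)	db_head;
} drc_table[DRC_HASHSIZE];

static void
drc_init(void)
{
	int i;

	for (i = 0; i < DRC_HASHSIZE; i++) {
		mtx_init(&drc_table[i].db_lock, "drcbucket", NULL, MTX_DEF);
		LIST_INIT(&drc_table[i].db_head);
	}
}

static struct drc_entry *
drc_lookup(uint32_t xid)
{
	struct drc_bucket *b = &drc_table[xid % DRC_HASHSIZE];
	struct drc_entry *e;

	mtx_lock(&b->db_lock);
	LIST_FOREACH(e, &b->db_head, de_hash)
		if (e->de_xid == xid)
			break;
	/*
	 * A real implementation would keep the bucket locked (or take
	 * a reference) while the entry is used; dropped for brevity.
	 */
	mtx_unlock(&b->db_lock);
	return (e);
}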