From owner-freebsd-fs@freebsd.org Sun Jul 19 21:00:44 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3F2059A530F for ; Sun, 19 Jul 2015 21:00:44 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1A32A1F29 for ; Sun, 19 Jul 2015 21:00:44 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t6JL0hFJ012157 for ; Sun, 19 Jul 2015 21:00:43 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Message-Id: <201507192100.t6JL0hFJ012157@kenobi.freebsd.org> From: bugzilla-noreply@FreeBSD.org To: freebsd-fs@FreeBSD.org Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 Date: Sun, 19 Jul 2015 21:00:43 +0000 Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jul 2015 21:00:44 -0000 To view an individual PR, use: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id). The following is a listing of current problems submitted by FreeBSD users, which need special attention. These represent problem reports covering all versions including experimental development code and obsolete releases. 
Status | Bug Id | Description ------------+-----------+--------------------------------------------------- Open | 136470 | [nfs] Cannot mount / in read-only, over NFS Open | 139651 | [nfs] mount(8): read-only remount of NFS volume d Open | 144447 | [zfs] sharenfs fsunshare() & fsshare_main() non f 3 problems total for which you should take action. From owner-freebsd-fs@freebsd.org Mon Jul 20 00:19:54 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7A6099A6BAA for ; Mon, 20 Jul 2015 00:19:54 +0000 (UTC) (envelope-from dylan@techtangents.com) Received: from p3plsmtpa09-07.prod.phx3.secureserver.net (p3plsmtpa09-07.prod.phx3.secureserver.net [173.201.193.236]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (Client CN "Bizanga Labs SMTP Client Certificate", Issuer "Bizanga Labs CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 599A61027 for ; Mon, 20 Jul 2015 00:19:53 +0000 (UTC) (envelope-from dylan@techtangents.com) Received: from [192.168.12.129] ([118.208.95.204]) by p3plsmtpa09-07.prod.phx3.secureserver.net with id uQJE1q0024QaD9A01QJFAG; Sun, 19 Jul 2015 17:18:16 -0700 Message-ID: <55AC3E44.9080009@techtangents.com> Date: Mon, 20 Jul 2015 10:18:12 +1000 From: "dylan@techtangents.com" User-Agent: Postbox 3.0.11 (Macintosh/20140602) MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: NFS 4.1 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jul 2015 00:19:54 -0000 Hi, Does FreeBSD support NFS 4.1? Is this planned? Sorry if this has been asked before - I was unable to find any info on the web. 
Cheers, Dylan Just From owner-freebsd-fs@freebsd.org Mon Jul 20 00:52:15 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A1D529A6FED for ; Mon, 20 Jul 2015 00:52:15 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 5B0121D9A for ; Mon, 20 Jul 2015 00:52:14 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DiAgATRaxV/61jaINcg2dpBrtfCYFrCoUtSgKBVhQBAQEBAQEBgQqEIwEBAQMBAQEBICsgCwULAgEIGAICDRkCAicBCSYCDAcEARwEiAUIDa5IlUABAQEBBgEBAQEBARyBIooqhDQBAQUXNAcWglKBQwWUUoRvhF2EXpZ8AiZjgzUiMQeBBjqBBAEBAQ X-IronPort-AV: E=Sophos;i="5.15,505,1432612800"; d="scan'208";a="226821969" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-annu.net.uoguelph.ca with ESMTP; 19 Jul 2015 20:52:07 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id B0A0415F542; Sun, 19 Jul 2015 20:52:07 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id Nuy-JoBTO7AP; Sun, 19 Jul 2015 20:52:07 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 5E29F15F55D; Sun, 19 Jul 2015 20:52:07 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id w6U7ONTx1Uzk; Sun, 19 Jul 2015 20:52:07 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 457AE15F542; Sun, 19 Jul 2015 20:52:07 -0400 (EDT) Date: Sun, 19 Jul 
2015 20:52:07 -0400 (EDT) From: Rick Macklem To: dylan@techtangents.com Cc: freebsd-fs@freebsd.org Message-ID: <1422949714.104901.1437353527247.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <55AC3E44.9080009@techtangents.com> References: <55AC3E44.9080009@techtangents.com> Subject: Re: NFS 4.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.11] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: NFS 4.1 Thread-Index: nSZwtnu6BNfaKSYA+tqSTO530zj1pg== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jul 2015 00:52:15 -0000 Dylan Just wrote: > Hi, > > Does FreeBSD support NFS 4.1? Is this planned? Sorry if this has been > asked before - I was unable to find any info on the web. > Yes. FreeBSD10.x has both client and server. The client includes support for pNFS (file layout only). The server does not include pNFS support at this time. (The NFSv4.1 client is also in FreeBSD9.3.) 
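For anyone wanting to try this, here is a minimal sketch of how an NFSv4.1 mount might be requested from a FreeBSD 10.x client. It assumes the `nfsv4` and `minorversion=1` options documented in mount_nfs(8). The helper name `nfs41_mount_cmd`, the server name and the paths are hypothetical, and the helper only prints the command rather than running it.

```sh
#!/bin/sh
# Hypothetical helper: print (do not run) a mount command that asks the
# FreeBSD NFS client for NFSv4 with minor version 1, i.e. NFSv4.1.
#   $1 = server:/exported/path   $2 = local mount point
nfs41_mount_cmd() {
    printf 'mount -t nfs -o nfsv4,minorversion=1 %s %s\n' "$1" "$2"
}

# Example with made-up names; substitute your own server and paths.
nfs41_mount_cmd nfs-server.example.org:/export/data /mnt/data
```

Running the printed command as root should perform the mount. Adding `pnfs` to the option list is how the client's file-layout pNFS support would be requested, assuming the server offers it.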
rick > Cheers, > > Dylan Just > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Mon Jul 20 01:07:38 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 454369A5173 for ; Mon, 20 Jul 2015 01:07:38 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 076DE10D7 for ; Mon, 20 Jul 2015 01:07:37 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2CjBADYSKxV/61jaINchFbDVIFcEQEBAQEBAQGBCoQkAgQZClYSAQgaAg0ZAluIRa5blT8BCgEBAR6BIooqhFI0HYJSgUMFlFKlJgImY4M1IoF4gQQBAQE X-IronPort-AV: E=Sophos;i="5.15,505,1432612800"; d="scan'208";a="226823132" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-annu.net.uoguelph.ca with ESMTP; 19 Jul 2015 21:07:36 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id E2A2915F542; Sun, 19 Jul 2015 21:07:36 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id BS-JqWBHhB0W; Sun, 19 Jul 2015 21:07:36 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id A14DB15F55D; Sun, 19 Jul 2015 21:07:36 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id dS6jfkJOK_a8; Sun, 19 Jul 2015 21:07:36 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca 
(zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 88FE315F542; Sun, 19 Jul 2015 21:07:36 -0400 (EDT) Date: Sun, 19 Jul 2015 21:07:36 -0400 (EDT) From: Rick Macklem To: dylan@techtangents.com Cc: freebsd-fs@freebsd.org Message-ID: <54084976.106539.1437354456546.JavaMail.zimbra@uoguelph.ca> Subject: Re: NFS 4.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.10] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: NFS 4.1 Thread-Index: nwqXVGsWnyjg6inOlPn/1iCxTDe7SA== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jul 2015 01:07:38 -0000 Oops, my mistake. NFSv4.1 is not in FreeBSD9.3. It is in 10.x. rick From owner-freebsd-fs@freebsd.org Mon Jul 20 01:10:43 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3196D9A520C for ; Mon, 20 Jul 2015 01:10:43 +0000 (UTC) (envelope-from dylan@techtangents.com) Received: from p3plsmtpa12-08.prod.phx3.secureserver.net (p3plsmtpa12-08.prod.phx3.secureserver.net [68.178.252.237]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (Client CN "Bizanga Labs SMTP Client Certificate", Issuer "Bizanga Labs CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id F2E6E11F9 for ; Mon, 20 Jul 2015 01:10:42 +0000 (UTC) (envelope-from dylan@techtangents.com) Received: from [192.168.12.129] ([118.208.95.204]) by p3plsmtpa12-08.prod.phx3.secureserver.net with id uR931q00J4QaD9A01R94nY; Sun, 19 Jul 2015 18:09:06 -0700 Message-ID: <55AC4A2E.4090102@techtangents.com> Date: Mon, 20 Jul 2015 11:09:02 +1000 From: "dylan@techtangents.com" User-Agent: Postbox 3.0.11 (Macintosh/20140602) 
MIME-Version: 1.0 To: Rick Macklem CC: freebsd-fs@freebsd.org Subject: Re: NFS 4.1 References: <55AC3E44.9080009@techtangents.com> <1422949714.104901.1437353527247.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <1422949714.104901.1437353527247.JavaMail.zimbra@uoguelph.ca> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jul 2015 01:10:43 -0000 > Rick Macklem > 20 July 2015 10:52 am >> Hi, >> >> Does FreeBSD support NFS 4.1? Is this planned? Sorry if this has been >> asked before - I was unable to find any info on the web. >> > Yes. FreeBSD10.x has both client and server. The client includes support > for pNFS (file layout only). The server does not include pNFS support at > this time. (The NFSv4.1 client is also in FreeBSD9.3.) Fantastic! As a follow-up (and my real motivation) - anyone tried this with Vsphere 6? 
Thanks, Dylan From owner-freebsd-fs@freebsd.org Mon Jul 20 21:42:43 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CA5179A6CB1 for ; Mon, 20 Jul 2015 21:42:43 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B6C18169B for ; Mon, 20 Jul 2015 21:42:43 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t6KLgh6X057849 for ; Mon, 20 Jul 2015 21:42:43 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 201677] unionfs or tmpfs kernel panic Date: Mon, 20 Jul 2015 21:42:43 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jul 2015 21:42:43 -0000 
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201677 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@freebsd.org Tue Jul 21 01:29:02 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 919C99A7056 for ; Tue, 21 Jul 2015 01:29:02 +0000 (UTC) (envelope-from matthew@FreeBSD.org) Received: from smtp.infracaninophile.co.uk (smtp.infracaninophile.co.uk [IPv6:2001:8b0:151:1:3cd3:cd67:fafa:3d78]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.infracaninophile.co.uk", Issuer "infracaninophile.co.uk" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id F23FF1FD4 for ; Tue, 21 Jul 2015 01:29:01 +0000 (UTC) (envelope-from matthew@FreeBSD.org) Received: from zero-gravitas.local ([IPv6:2001:8b0:151:1:2ef0:eeff:fe24:fa38]) (authenticated bits=0) by smtp.infracaninophile.co.uk (8.15.2/8.15.2) with ESMTPSA id t6L1SgnB091737 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Tue, 21 Jul 2015 02:28:54 +0100 (BST) (envelope-from matthew@FreeBSD.org) Authentication-Results: smtp.infracaninophile.co.uk; dmarc=none header.from=FreeBSD.org DKIM-Filter: OpenDKIM Filter v2.9.2 smtp.infracaninophile.co.uk t6L1SgnB091737 Authentication-Results: smtp.infracaninophile.co.uk/t6L1SgnB091737; dkim=none reason="no signature"; dkim-adsp=none; dkim-atps=neutral X-Authentication-Warning: lucid-nonsense.infracaninophile.co.uk: Host [IPv6:2001:8b0:151:1:2ef0:eeff:fe24:fa38] claimed to be zero-gravitas.local Message-ID: <55ADA043.5060002@FreeBSD.org> Date: Tue, 21 Jul 2015 02:28:35 +0100 From: Matthew Seaman User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 
10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Attach a device to an unimported pool? Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="3KjvhK84Ps0fAnGSqbrKNUkopgiTGlaCg" X-Virus-Scanned: clamav-milter 0.98.7 at lucid-nonsense.infracaninophile.co.uk X-Virus-Status: Clean X-Spam-Status: No, score=-3.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on lucid-nonsense.infracaninophile.co.uk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 2015 01:29:02 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --3KjvhK84Ps0fAnGSqbrKNUkopgiTGlaCg Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable I have an unimportable zpool which is telling me to 'attach the missing devices and try again' The missing devices are a mirrored pair for logs -- they exist and should be readily attachable, but it seems I cannot attach devices to an un-imported pool, and I can't import the pool with out attaching the devices. Any clues as to how to unfuck this system most gratefully received. 
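One possible way out, sketched here under assumptions: zpool(8) documents an `-m` flag for `zpool import` that allows a pool to be imported when a log device is missing. Whether it applies to this particular pool is not known from the message. The pool name `tank` and the helper `import_missing_log_cmd` are hypothetical, and the helper only prints the command.

```sh
#!/bin/sh
# Hypothetical helper: print (do not run) an import command that tolerates
# missing log devices. Per zpool(8), recent transactions held only in the
# discarded log can be lost, so treat this as a last resort.
#   $1 = pool name
import_missing_log_cmd() {
    printf 'zpool import -m %s\n' "$1"
}

# Example with a made-up pool name.
import_missing_log_cmd tank
```

Running plain `zpool import` with no arguments first lists the pools available for import and the state of their devices, which may show why the log mirror is not being found.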
Matthew --3KjvhK84Ps0fAnGSqbrKNUkopgiTGlaCg Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQJ8BAEBCgBmBQJVraBJXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQxOUYxNTRFQ0JGMTEyRTUwNTQ0RTNGMzAw MDUxM0YxMEUwQTlFNEU3AAoJEABRPxDgqeTnBjgP/0c1iySUuflkviQiYWEHj6G+ Hzd3UYE0EwvVyBaF4KdQNtzpeKO+aUJI0nQk3pHaWZI0Wr+bOKg+x6vrUEkIOoB+ +rGIcp9H+8H4gH8y1sC0QyuOlBthNGd+26ZcqciCcTjeuyNTyf3OMey689hHuTg1 +1t0yyhnRhmm5WnuPOg0/9PurpYLgthEpT+aCEON/7/yGeA6rF3oT+Me39c4YOQs f1Gr/4JADqloTqYfGp3ftfdsgvheLVwuy6mM+SV8+V/QpulVsnq6kPsL3NuBh2ZN QUSuSKV4fiYnXblqIrts9AHV2hmlmYYQCguOvrDPexPTHdHdq+1v3m2551fkYaA1 jb2ePnGj9rupuyRkUdzwgoGfy22SgX+C8JFjUDkxT/LuxJ9E0cnJuDozVhmQzdqA JFEvnL4LEPFQThnVWsmtEFDtr6B/KjEgaskkjlhMywD3+a85QZMAitIavsqenV9p ycDKNstCALEBZhxtgqkI15wMkBgQhTkmFatxPJh4430stjQ9Q9kaN2RAjxjT2tNP aEKzD2kMba3kMs2JzsV1kdWs5+lrZ2KhrSgcFSG0yuzAG/2e06jO4NMbESE0F8/H v+CsCj5R/rDYFeaLayUSQVuG71ynm04A6ZGytB3te355+zuVB6nW5Xfxrdfk9dxe J0RMYpfEHhreyYzTT+/P =5Ntl -----END PGP SIGNATURE----- --3KjvhK84Ps0fAnGSqbrKNUkopgiTGlaCg-- From owner-freebsd-fs@freebsd.org Tue Jul 21 03:27:07 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1A9D99A6517 for ; Tue, 21 Jul 2015 03:27:07 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wi0-x229.google.com (mail-wi0-x229.google.com [IPv6:2a00:1450:400c:c05::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8DD5FBB1 for ; Tue, 21 Jul 2015 03:27:06 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: by wibud3 with 
SMTP id ud3so113985503wib.0 for ; Mon, 20 Jul 2015 20:27:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=jcKTzh/zum0eTfRQr8hNgUuavRVkAUHcmonJfmrIi8o=; b=S2Zi4yfI/PAuaUQMdhe2tdmqVoNDxtKEd9yOdyuxo9qinIalbTRHOWNiimP8xKQPPR U6rFgRxdPcbRPfOMx2PxLTa3HVMmqQl9kmXaluZbSdbwUS+a8NcHYWhlN0LZFNtZzY1G zGY+LhpFSuZ6A8Kr2FxruR7KBHa3PVQB+UPkXYvIoOocYCaTlCNskYrn48sccmNwWEds HXvq23k4U1vPWJa7/vi9Di9L++AfiFDWujzEeURPLlN5SJ4OFrqpMkpcGU03zTKZYS3m wNbZG4dyRuDY+xJC9lp0nJvZOjYXY2Zz9eQq2TNlXy8q6HwvAVipxyEAV8Of1/YYtgMA /GPg== X-Received: by 10.194.59.98 with SMTP id y2mr63409028wjq.42.1437449224879; Mon, 20 Jul 2015 20:27:04 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Mon, 20 Jul 2015 20:26:45 -0700 (PDT) In-Reply-To: <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <5594B008.10202@freebsd.org> <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> From: Ahmed Kamal Date: Tue, 21 Jul 2015 05:26:45 +0200 Message-ID: Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) To: Rick Macklem Cc: Graham Allan , Ahmed Kamal via freebsd-fs Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 2015 03:27:07 -0000 Hi folks, I've upgraded a test client to rhel6 today, and I'll keep an eye on it to see what happens. During the process, I made the (I guess mistake) of zfs send | recv to a locally attached usb disk for backup purposes .. 
long story short, sharenfs property on the received filesystem was causing some nfs/mountd errors in logs .. I wasn't too happy with what I got .. I destroyed the backup datasets and the whole pool eventually .. and then rebooted the whole nas box .. After reboot my logs are still flooded with Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session Jul 21 05:13:07 nas last message repeated 7536 times Jul 21 05:15:08 nas last message repeated 29664 times Not sure what that means .. or how it can be stopped .. Anyway, will keep you posted on progress. On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem wrote: > Graham Allan wrote: > > I'm curious how things are going for you with this? > > > > Reading your thread did pique my interest since we have a lot of > > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant > > to glance through our logs for signs of the same issue, but today I > > started investigating a machine which appeared to have hung processes, > > high rpciod load, and high traffic to the NFS server. Of course it is > > exactly this issue. > > > > The affected machine is running SL5 though most of our server nodes are > > now SL6. I can see errors from most of them but the SL6 systems appear > > less affected - I see a stream of the sequence-id errors in their logs > but > > things in general keep working. The one SL5 machine I'm looking at > > has a single sequence-id error in today's logs, but then goes into a > > stream of "state recovery failed" then "Lock reclaim failed". It's > > probably partly related to the particular workload on this machine. > > > > I would try switching our SL6 machines to NFS 4.1 to see if the > > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in > > 10.1?). > > > Btw, I've done some testing against a fairly recent Fedora and haven't seen > the problem. If either of you guys could load a recent Fedora on a test > client > box, it would be interesting to see if it suffers from this. 
(My > experience is > that the Fedora distros have more up to date Linux NFS clients.) > > rick > > > At the NFS servers, most of the sysctl settings are already tuned > > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300, > > 128-256 nfs kernel threads. > > > > Graham > > > > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs > wrote: > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming > > > reports from users about hung vnc sessions. So maybe just maybe, linux > > > clients are able to somehow recover from this bad sequence messages. I > > > could still see the bad sequence error message in logs though > > > > > > Why isn't the highwater tunable set to something better by default ? I > mean > > > this server is certainly not under a high or unusual load (it's only > 40 PCs > > > mounting from it) > > > > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal > > > > > > wrote: > > > > > > > Thanks all .. I understand now we're doing the "right thing" .. > Although > > > > if mounting keeps wedging, I will have to solve it somehow! Either > using > > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. > > > > > > > > Regarding Xin's patch, is it possible to build the patched nfsd > code, as > > > > a > > > > kernel module ? I'm looking to minimize my delta to upstream. > > > > > > > > Also would adopting Xin's patch and hiding it behind a > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the > > > > last > > > > person on earth to hit this) ? > > > > > > > > Thanks a lot for all the help! > > > > > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem > > > > wrote: > > > > > > > >> Ahmed Kamal wrote: > > > >> > Appreciating the fruitful discussion! Can someone please explain > to > > > >> > me, > > > >> > what would happen in the current situation (linux client doing > this > > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect of > > > >> > that? 
> > > >> Well, as you've seen, the Linux client doesn't function correctly > > > >> against > > > >> the FreeBSD server (and probably others that don't support this > > > >> "skip-by-1" > > > >> case). > > > >> > > > >> > What do users see? Any chances of data loss? > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the > > > >> Linux > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the > guy > > > >> observing > > > >> it. > > > >> > > > >> > > > > >> > Also, I find it strange that netapp have acknowledged this is a > bug on > > > >> > their side, which has been fixed since then! > > > >> Yea, I think Netapp screwed up. For some reason their server allowed > > > >> this, > > > >> then was fixed to not allow it and then someone decided that was > broken > > > >> and > > > >> reversed it. > > > >> > > > >> > I also find it strange that I'm the first to hit this :) Is no one > > > >> running > > > >> > nfs4 yet! > > > >> > > > > >> Well, it seems to be slowly catching on. I suspect that the Linux > client > > > >> mounting a Netapp is the most common use of it. Since it appears > that > > > >> they > > > >> flip flopped w.r.t. who's bug this is, it has probably persisted. > > > >> > > > >> It may turn out that the Linux client has been fixed or it may turn > out > > > >> that most servers allowed this "skip-by-1" even though David Noveck > (one > > > >> of the main authors of the protocol) seems to agree with me that it > > > >> should > > > >> not be allowed. > > > >> > > > >> It is possible that others have bumped into this, but it wasn't > isolated > > > >> (I wouldn't have guessed it, so it was good you pointed to the > RedHat > > > >> discussion) > > > >> and they worked around it by reverting to NFSv3 or similar. > > > >> The protocol is rather complex in this area and changed completely > for > > > >> NFSv4.1, > > > >> so many have also probably moved onto NFSv4.1 where this won't be an > > > >> issue. 
> > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and > doesn't > > > >> use > > > >> these seqid fields.) > > > >> > > > >> This is all just mho, rick > > > >> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < > rmacklem@uoguelph.ca> > > > >> wrote: > > > >> > > > > >> > > Julian Elischer wrote: > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say. > > > >> > > > > Please > > > >> > > > > let me know if Xin Li's patch resolves your problem, even > though > > > >> > > > > I > > > >> > > > > don't believe it is correct except for the UINT32_MAX case. > Good > > > >> > > > > luck with it, rick > > > >> > > > and please keep us all in the loop as to what they say! > > > >> > > > > > > >> > > > the general N+2 bit sounds like bullshit to me.. its always > N+1 in > > > >> > > > a > > > >> > > > number field that has a > > > >> > > > bit of slack at wrap time (probably due to some ambiguity in > the > > > >> > > > original spec). > > > >> > > > > > > >> > > Actually, since N is the lock op already done, N + 1 is the next > > > >> > > lock > > > >> > > operation in order. Since lock ops need to be strictly ordered, > > > >> allowing > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no > sense. > > > >> > > > > > >> > > I think the author of the RFC meant that N + 2 or greater > fails, but > > > >> it > > > >> > > was poorly worded. > > > >> > > > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. 
(There > is an > > > >> archive > > > >> > > of it somewhere, but I can't remember where.;-) > > > >> > > > > > >> > > rick > > > >> > > _______________________________________________ > > > >> > > freebsd-fs@freebsd.org mailing list > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >> > > To unsubscribe, send any mail to > > > >> > > "freebsd-fs-unsubscribe@freebsd.org" > > > >> > > > > > >> > > > > >> > > > > > > > > > > > _______________________________________________ > > > freebsd-fs@freebsd.org mailing list > > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > > -- > > ------------------------------------------------------------------------- > > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040 > > School of Physics and Astronomy - University of Minnesota > > ------------------------------------------------------------------------- > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Tue Jul 21 03:52:41 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6455F9A6A78 for ; Tue, 21 Jul 2015 03:52:41 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wg0-x236.google.com (mail-wg0-x236.google.com [IPv6:2a00:1450:400c:c00::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google 
Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DE6B71637 for ; Tue, 21 Jul 2015 03:52:40 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: by wgav7 with SMTP id v7so79797515wga.2 for ; Mon, 20 Jul 2015 20:52:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=bfsV2/Ww64o8f9nfYnLH9RCrSIQPm1jcwZkiPveEOKA=; b=R3FAYdVBj82/yQG+zXjlkNFZhFOBZwVBRZnDVVqgzKdrsl1KdIC9p4W9aCp80eRy+h jErfIGfmC8jv6w+e0g3bsgjA3Cp+Wnhmno92Km9EnCqU0B9ag67t3zWhVhBygBO1aIP3 1NEqUAW2WasZX6Hs67nBqWCsL6oUmTwWyEYkob3WJ0Q9bapm78dGog1+BkRKXzC58GOx tuKmdb70gPokdpYwvPIsA6w28txY3GDpY8cues40gTJ5TYvZM7MjCTeQed+z2fgqsff/ 3gWUhmgJGQNze9QKX2eL+mByh8ouIzZGXsP+jFRR8muy+0OEl+Z53J7RpqfQ4b/x5dUE koXA== X-Received: by 10.194.59.98 with SMTP id y2mr63570894wjq.42.1437450759215; Mon, 20 Jul 2015 20:52:39 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Mon, 20 Jul 2015 20:52:19 -0700 (PDT) In-Reply-To: References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <5594B008.10202@freebsd.org> <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> From: Ahmed Kamal Date: Tue, 21 Jul 2015 05:52:19 +0200 Message-ID: Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) To: Rick Macklem Cc: Graham Allan , Ahmed Kamal via freebsd-fs Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 2015 03:52:41 -0000 More info .. 
Just noticed nfsd is spinning the cpu at 500% :( I just did the dtrace with:

dtrace -n 'profile-1001 { @[stack()] = count(); }'

The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom) Since rebooting the nfs server didn't fix it .. I imagine I'd have to reboot all NFS clients .. This would be really sad .. Any advice is most appreciated .. Thanks On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal < email.ahmedkamal@googlemail.com> wrote: > Hi folks, > > I've upgraded a test client to rhel6 today, and I'll keep an eye on it to > see what happens. > > During the process, I made the (I guess mistake) of zfs send | recv to a > locally attached usb disk for backup purposes .. long story short, sharenfs > property on the received filesystem was causing some nfs/mountd errors in > logs .. I wasn't too happy with what I got .. I destroyed the backup > datasets and the whole pool eventually .. and then rebooted the whole nas > box .. After reboot my logs are still flooded with > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session > Jul 21 05:13:07 nas last message repeated 7536 times > Jul 21 05:15:08 nas last message repeated 29664 times > > Not sure what that means .. or how it can be stopped .. Anyway, will keep > you posted on progress. > > On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem > wrote: > >> Graham Allan wrote: >> > I'm curious how things are going for you with this? >> > >> > Reading your thread did pique my interest since we have a lot of >> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant >> > to glance through our logs for signs of the same issue, but today I >> > started investigating a machine which appeared to have hung processes, >> > high rpciod load, and high traffic to the NFS server. Of course it is >> > exactly this issue. >> > >> > The affected machine is running SL5 though most of our server nodes are >> > now SL6.
I can see errors from most of them but the SL6 systems appear >> > less affected - I see a stream of the sequence-id errors in their logs >> but >> > things in general keep working. The one SL5 machine I'm looking at >> > has a single sequence-id error in today's logs, but then goes into a >> > stream of "state recovery failed" then "Lock reclaim failed". It's >> > probably partly related to the particular workload on this machine. >> > >> > I would try switching our SL6 machines to NFS 4.1 to see if the >> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in >> > 10.1?). >> > >> Btw, I've done some testing against a fairly recent Fedora and haven't >> seen >> the problem. If either of you guys could load a recent Fedora on a test >> client >> box, it would be interesting to see if it suffers from this. (My >> experience is >> that the Fedora distros have more up to date Linux NFS clients.) >> >> rick >> >> > At the NFS servers, most of the sysctl settings are already tuned >> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300, >> > 128-256 nfs kernel threads. >> > >> > Graham >> > >> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs >> wrote: >> > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming >> > > reports from users about hung vnc sessions. So maybe just maybe, linux >> > > clients are able to somehow recover from this bad sequence messages. I >> > > could still see the bad sequence error message in logs though >> > > >> > > Why isn't the highwater tunable set to something better by default ? >> I mean >> > > this server is certainly not under a high or unusual load (it's only >> 40 PCs >> > > mounting from it) >> > > >> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal >> > > > > > > wrote: >> > > >> > > > Thanks all .. I understand now we're doing the "right thing" .. >> Although >> > > > if mounting keeps wedging, I will have to solve it somehow! 
Either >> using >> > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. >> > > > >> > > > Regarding Xin's patch, is it possible to build the patched nfsd >> code, as >> > > > a >> > > > kernel module ? I'm looking to minimize my delta to upstream. >> > > > >> > > > Also would adopting Xin's patch and hiding it behind a >> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not >> the >> > > > last >> > > > person on earth to hit this) ? >> > > > >> > > > Thanks a lot for all the help! >> > > > >> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem > > >> >> > > > wrote: >> > > > >> > > >> Ahmed Kamal wrote: >> > > >> > Appreciating the fruitful discussion! Can someone please explain >> to >> > > >> > me, >> > > >> > what would happen in the current situation (linux client doing >> this >> > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect >> of >> > > >> > that? >> > > >> Well, as you've seen, the Linux client doesn't function correctly >> > > >> against >> > > >> the FreeBSD server (and probably others that don't support this >> > > >> "skip-by-1" >> > > >> case). >> > > >> >> > > >> > What do users see? Any chances of data loss? >> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the >> > > >> Linux >> > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the >> guy >> > > >> observing >> > > >> it. >> > > >> >> > > >> > >> > > >> > Also, I find it strange that netapp have acknowledged this is a >> bug on >> > > >> > their side, which has been fixed since then! >> > > >> Yea, I think Netapp screwed up. For some reason their server >> allowed >> > > >> this, >> > > >> then was fixed to not allow it and then someone decided that was >> broken >> > > >> and >> > > >> reversed it. >> > > >> >> > > >> > I also find it strange that I'm the first to hit this :) Is no >> one >> > > >> running >> > > >> > nfs4 yet! >> > > >> > >> > > >> Well, it seems to be slowly catching on. 
I suspect that the Linux >> client >> > > >> mounting a Netapp is the most common use of it. Since it appears >> that >> > > >> they >> > > >> flip flopped w.r.t. who's bug this is, it has probably persisted. >> > > >> >> > > >> It may turn out that the Linux client has been fixed or it may >> turn out >> > > >> that most servers allowed this "skip-by-1" even though David >> Noveck (one >> > > >> of the main authors of the protocol) seems to agree with me that it >> > > >> should >> > > >> not be allowed. >> > > >> >> > > >> It is possible that others have bumped into this, but it wasn't >> isolated >> > > >> (I wouldn't have guessed it, so it was good you pointed to the >> RedHat >> > > >> discussion) >> > > >> and they worked around it by reverting to NFSv3 or similar. >> > > >> The protocol is rather complex in this area and changed completely >> for >> > > >> NFSv4.1, >> > > >> so many have also probably moved onto NFSv4.1 where this won't be >> an >> > > >> issue. >> > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and >> doesn't >> > > >> use >> > > >> these seqid fields.) >> > > >> >> > > >> This is all just mho, rick >> > > >> >> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < >> rmacklem@uoguelph.ca> >> > > >> wrote: >> > > >> > >> > > >> > > Julian Elischer wrote: >> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: >> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say. >> > > >> > > > > Please >> > > >> > > > > let me know if Xin Li's patch resolves your problem, even >> though >> > > >> > > > > I >> > > >> > > > > don't believe it is correct except for the UINT32_MAX >> case. Good >> > > >> > > > > luck with it, rick >> > > >> > > > and please keep us all in the loop as to what they say! >> > > >> > > > >> > > >> > > > the general N+2 bit sounds like bullshit to me.. 
its always >> N+1 in >> > > >> > > > a >> > > >> > > > number field that has a >> > > >> > > > bit of slack at wrap time (probably due to some ambiguity in >> the >> > > >> > > > original spec). >> > > >> > > > >> > > >> > > Actually, since N is the lock op already done, N + 1 is the >> next >> > > >> > > lock >> > > >> > > operation in order. Since lock ops need to be strictly ordered, >> > > >> allowing >> > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no >> sense. >> > > >> > > >> > > >> > > I think the author of the RFC meant that N + 2 or greater >> fails, but >> > > >> it >> > > >> > > was poorly worded. >> > > >> > > >> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There >> is an >> > > >> archive >> > > >> > > of it somewhere, but I can't remember where.;-) >> > > >> > > >> > > >> > > rick >> > > >> > > _______________________________________________ >> > > >> > > freebsd-fs@freebsd.org mailing list >> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> > > >> > > To unsubscribe, send any mail to >> > > >> > > "freebsd-fs-unsubscribe@freebsd.org" >> > > >> > > >> > > >> > >> > > >> >> > > > >> > > > >> > > _______________________________________________ >> > > freebsd-fs@freebsd.org mailing list >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > >> > -- >> > >> ------------------------------------------------------------------------- >> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040 >> > School of Physics and Astronomy - University of Minnesota >> > >> ------------------------------------------------------------------------- >> > _______________________________________________ >> > freebsd-fs@freebsd.org mailing list >> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > >> 
_______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > > From owner-freebsd-fs@freebsd.org Tue Jul 21 04:51:25 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9D7919A72E5 for ; Tue, 21 Jul 2015 04:51:25 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com [IPv6:2a00:1450:400c:c05::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1D7491AA7 for ; Tue, 21 Jul 2015 04:51:25 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: by wicgb10 with SMTP id gb10so43877841wic.1 for ; Mon, 20 Jul 2015 21:51:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=7QOnTTc4nGbGxfpBgNP6hi1kgFQiMY9F9FQBOE/p1IM=; b=oAgUiTrOJAj0Ie3ajoBZZm4IuNUsd9+7C4vXCbS1QpxQ1e+KpIFCSgshZmnW1YlYyQ FITt2ZSeiXV33BlL2BgEGGf0h5KM6rQPp+NXW1Mfg/JLVeoFCCzORnU4rWNOEb8Pjh3e E/TUc66Bp45r7jMUsv7aQL4Po2zJUelWGKG+3i4ya+sf9kCJ7cc5n2QHIdBtaWExBwUG aKVzxOEOIuSufioG5e+yOG/wnSb8vGSMDY1FDqklWHOwobLPDdyr0kUe63IBWOaan+48 uZ6RUq2bAasp3Rnhtuowpjk8H+S/9JUShHuS3z6YhpgB1L3AnPi8fFb+gtuMeYUnmcMG AWcQ== X-Received: by 10.194.59.98 with SMTP id y2mr63959633wjq.42.1437454283437; Mon, 20 Jul 2015 21:51:23 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Mon, 20 Jul 2015 21:51:03 -0700 (PDT) In-Reply-To: References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <5594B008.10202@freebsd.org> 
<1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> From: Ahmed Kamal Date: Tue, 21 Jul 2015 06:51:03 +0200 Message-ID: Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) To: Rick Macklem Cc: Graham Allan , Ahmed Kamal via freebsd-fs Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 2015 04:51:25 -0000 rhel6 servers logs were flooded with errors like: http://paste2.org/EwLGcGF6 The Freebsd box was being pounded with 40Mbps of nfs traffic .. probably Linux was retrying too hard ?! I had to reboot all PCs and after the last one, nfsd CPU usage dropped immediately to zero On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal < email.ahmedkamal@googlemail.com> wrote: > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just did > the dtrace with: > > dtrace -n profile-1001 { @[stack()] = count(); } > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom) > > Since rebooting the nfs server didn't fix it .. I imagine I'd have to > reboot all NFS clients .. This would be really sad .. Any advice is most > appreciated .. Thanks > > > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal < > email.ahmedkamal@googlemail.com> wrote: > >> Hi folks, >> >> I've upgraded a test client to rhel6 today, and I'll keep an eye on it to >> see what happens. >> >> During the process, I made the (I guess mistake) of zfs send | recv to a >> locally attached usb disk for backup purposes .. long story short, sharenfs >> property on the received filesystem was causing some nfs/mountd errors in >> logs .. I wasn't too happy with what I got .. 
I destroyed the backup >> datasets and the whole pool eventually .. and then rebooted the whole nas >> box .. After reboot my logs are still flooded with >> >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session >> Jul 21 05:13:07 nas last message repeated 7536 times >> Jul 21 05:15:08 nas last message repeated 29664 times >> >> Not sure what that means .. or how it can be stopped .. Anyway, will keep >> you posted on progress. >> >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem >> wrote: >> >>> Graham Allan wrote: >>> > I'm curious how things are going for you with this? >>> > >>> > Reading your thread did pique my interest since we have a lot of >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant >>> > to glance through our logs for signs of the same issue, but today I >>> > started investigating a machine which appeared to have hung processes, >>> > high rpciod load, and high traffic to the NFS server. Of course it is >>> > exactly this issue. >>> > >>> > The affected machine is running SL5 though most of our server nodes are >>> > now SL6. I can see errors from most of them but the SL6 systems appear >>> > less affected - I see a stream of the sequence-id errors in their logs >>> but >>> > things in general keep working. The one SL5 machine I'm looking at >>> > has a single sequence-id error in today's logs, but then goes into a >>> > stream of "state recovery failed" then "Lock reclaim failed". It's >>> > probably partly related to the particular workload on this machine. >>> > >>> > I would try switching our SL6 machines to NFS 4.1 to see if the >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in >>> > 10.1?). >>> > >>> Btw, I've done some testing against a fairly recent Fedora and haven't >>> seen >>> the problem. If either of you guys could load a recent Fedora on a test >>> client >>> box, it would be interesting to see if it suffers from this. 
(My >>> experience is >>> that the Fedora distros have more up to date Linux NFS clients.) >>> >>> rick >>> >>> > At the NFS servers, most of the sysctl settings are already tuned >>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300, >>> > 128-256 nfs kernel threads. >>> > >>> > Graham >>> > >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs >>> wrote: >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming >>> > > reports from users about hung vnc sessions. So maybe just maybe, >>> linux >>> > > clients are able to somehow recover from this bad sequence messages. >>> I >>> > > could still see the bad sequence error message in logs though >>> > > >>> > > Why isn't the highwater tunable set to something better by default ? >>> I mean >>> > > this server is certainly not under a high or unusual load (it's only >>> 40 PCs >>> > > mounting from it) >>> > > >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal >>> > > >> > > > wrote: >>> > > >>> > > > Thanks all .. I understand now we're doing the "right thing" .. >>> Although >>> > > > if mounting keeps wedging, I will have to solve it somehow! Either >>> using >>> > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. >>> > > > >>> > > > Regarding Xin's patch, is it possible to build the patched nfsd >>> code, as >>> > > > a >>> > > > kernel module ? I'm looking to minimize my delta to upstream. >>> > > > >>> > > > Also would adopting Xin's patch and hiding it behind a >>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not >>> the >>> > > > last >>> > > > person on earth to hit this) ? >>> > > > >>> > > > Thanks a lot for all the help! >>> > > > >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem < >>> rmacklem@uoguelph.ca> >>> >>> > > > wrote: >>> > > > >>> > > >> Ahmed Kamal wrote: >>> > > >> > Appreciating the fruitful discussion! 
Can someone please >>> explain to >>> > > >> > me, >>> > > >> > what would happen in the current situation (linux client doing >>> this >>> > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect >>> of >>> > > >> > that? >>> > > >> Well, as you've seen, the Linux client doesn't function correctly >>> > > >> against >>> > > >> the FreeBSD server (and probably others that don't support this >>> > > >> "skip-by-1" >>> > > >> case). >>> > > >> >>> > > >> > What do users see? Any chances of data loss? >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what >>> the >>> > > >> Linux >>> > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the >>> guy >>> > > >> observing >>> > > >> it. >>> > > >> >>> > > >> > >>> > > >> > Also, I find it strange that netapp have acknowledged this is a >>> bug on >>> > > >> > their side, which has been fixed since then! >>> > > >> Yea, I think Netapp screwed up. For some reason their server >>> allowed >>> > > >> this, >>> > > >> then was fixed to not allow it and then someone decided that was >>> broken >>> > > >> and >>> > > >> reversed it. >>> > > >> >>> > > >> > I also find it strange that I'm the first to hit this :) Is no >>> one >>> > > >> running >>> > > >> > nfs4 yet! >>> > > >> > >>> > > >> Well, it seems to be slowly catching on. I suspect that the Linux >>> client >>> > > >> mounting a Netapp is the most common use of it. Since it appears >>> that >>> > > >> they >>> > > >> flip flopped w.r.t. who's bug this is, it has probably persisted. >>> > > >> >>> > > >> It may turn out that the Linux client has been fixed or it may >>> turn out >>> > > >> that most servers allowed this "skip-by-1" even though David >>> Noveck (one >>> > > >> of the main authors of the protocol) seems to agree with me that >>> it >>> > > >> should >>> > > >> not be allowed. 
>>> > > >> >>> > > >> It is possible that others have bumped into this, but it wasn't >>> isolated >>> > > >> (I wouldn't have guessed it, so it was good you pointed to the >>> RedHat >>> > > >> discussion) >>> > > >> and they worked around it by reverting to NFSv3 or similar. >>> > > >> The protocol is rather complex in this area and changed >>> completely for >>> > > >> NFSv4.1, >>> > > >> so many have also probably moved onto NFSv4.1 where this won't be >>> an >>> > > >> issue. >>> > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and >>> doesn't >>> > > >> use >>> > > >> these seqid fields.) >>> > > >> >>> > > >> This is all just mho, rick >>> > > >> >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < >>> rmacklem@uoguelph.ca> >>> > > >> wrote: >>> > > >> > >>> > > >> > > Julian Elischer wrote: >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: >>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they >>> say. >>> > > >> > > > > Please >>> > > >> > > > > let me know if Xin Li's patch resolves your problem, even >>> though >>> > > >> > > > > I >>> > > >> > > > > don't believe it is correct except for the UINT32_MAX >>> case. Good >>> > > >> > > > > luck with it, rick >>> > > >> > > > and please keep us all in the loop as to what they say! >>> > > >> > > > >>> > > >> > > > the general N+2 bit sounds like bullshit to me.. its always >>> N+1 in >>> > > >> > > > a >>> > > >> > > > number field that has a >>> > > >> > > > bit of slack at wrap time (probably due to some ambiguity >>> in the >>> > > >> > > > original spec). >>> > > >> > > > >>> > > >> > > Actually, since N is the lock op already done, N + 1 is the >>> next >>> > > >> > > lock >>> > > >> > > operation in order. Since lock ops need to be strictly >>> ordered, >>> > > >> allowing >>> > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no >>> sense. 
>>> > > >> > > >>> > > >> > > I think the author of the RFC meant that N + 2 or greater >>> fails, but >>> > > >> it >>> > > >> > > was poorly worded. >>> > > >> > > >>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There >>> is an >>> > > >> archive >>> > > >> > > of it somewhere, but I can't remember where.;-) >>> > > >> > > >>> > > >> > > rick >>> > > >> > > _______________________________________________ >>> > > >> > > freebsd-fs@freebsd.org mailing list >>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> > > >> > > To unsubscribe, send any mail to >>> > > >> > > "freebsd-fs-unsubscribe@freebsd.org" >>> > > >> > > >>> > > >> > >>> > > >> >>> > > > >>> > > > >>> > > _______________________________________________ >>> > > freebsd-fs@freebsd.org mailing list >>> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org >>> " >>> > >>> > -- >>> > >>> ------------------------------------------------------------------------- >>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040 >>> > School of Physics and Astronomy - University of Minnesota >>> > >>> ------------------------------------------------------------------------- >>> > _______________________________________________ >>> > freebsd-fs@freebsd.org mailing list >>> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>> > >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>> >> >> > From owner-freebsd-fs@freebsd.org Tue Jul 21 04:52:28 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP 
id 1784A9A731A for ; Tue, 21 Jul 2015 04:52:28 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from smtp.infracaninophile.co.uk (smtp.infracaninophile.co.uk [IPv6:2001:8b0:151:1:3cd3:cd67:fafa:3d78]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.infracaninophile.co.uk", Issuer "infracaninophile.co.uk" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 995E21B9B for ; Tue, 21 Jul 2015 04:52:27 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from zero-gravitas.local ([IPv6:2001:8b0:151:1:2ef0:eeff:fe24:fa38]) (authenticated bits=0) by smtp.infracaninophile.co.uk (8.15.2/8.15.2) with ESMTPSA id t6L4qEpS096334 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Tue, 21 Jul 2015 05:52:16 +0100 (BST) (envelope-from m.seaman@infracaninophile.co.uk) Authentication-Results: smtp.infracaninophile.co.uk; dmarc=none header.from=infracaninophile.co.uk DKIM-Filter: OpenDKIM Filter v2.9.2 smtp.infracaninophile.co.uk t6L4qEpS096334 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=infracaninophile.co.uk; s=201001-infracaninophile; t=1437454336; bh=5q6+uIqqEcr6ZMG9tOp91XyV6g0o/LokOJut1of9FaU=; h=Date:From:To:Subject:References:In-Reply-To; z=Date:=20Tue,=2021=20Jul=202015=2005:52:13=20+0100|From:=20Matthew =20Seaman=20|To:=20freebsd-fs@fre ebsd.org|Subject:=20Re:=20Attach=20a=20device=20to=20an=20unimport ed=20pool?|References:=20<55ADA043.5060002@FreeBSD.org>|In-Reply-T o:=20<55ADA043.5060002@FreeBSD.org>; b=etMuF3vfG3UI0G0CyW/PBmIIz8+fiML1k6ljBvCzivWoDRXb32lJxZCu2tLhmUKjy K0r81GqHKANxMCmfxyLCbaIZ9Q8w3Ohv1u0G/hFSTbV0D2mT6b4p5hxhaeCM0ThbKq xrmL//LvvGJTpij6Qh9deHOcxiL9jzhWhbJt5Lxc= X-Authentication-Warning: lucid-nonsense.infracaninophile.co.uk: Host [IPv6:2001:8b0:151:1:2ef0:eeff:fe24:fa38] claimed to be zero-gravitas.local Message-ID: <55ADCFFD.8090000@infracaninophile.co.uk> Date: Tue, 21 Jul 2015 05:52:13 +0100 From: Matthew Seaman User-Agent: 
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Attach a device to an unimported pool? References: <55ADA043.5060002@FreeBSD.org> In-Reply-To: <55ADA043.5060002@FreeBSD.org> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="LjVkMiCH5ARqMhMVVoMtKWSfgKQ2mEiJK" X-Virus-Scanned: clamav-milter 0.98.7 at lucid-nonsense.infracaninophile.co.uk X-Virus-Status: Clean X-Spam-Status: No, score=-1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on lucid-nonsense.infracaninophile.co.uk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 2015 04:52:28 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --LjVkMiCH5ARqMhMVVoMtKWSfgKQ2mEiJK Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 2015/07/21 02:28, Matthew Seaman wrote: > I have an unimportable zpool which is telling me to 'attach the missing > devices and try again' > > The missing devices are a mirrored pair for logs -- they exist and > should be readily attachable, but it seems I cannot attach devices to an > un-imported pool, and I can't import the pool with out attaching the > devices. > > Any clues as to how to unfuck this system most gratefully received. Following myself up, because I missed out providing enough useful information. This is a 10.1-RELEASE machine -- I was upgrading it from p5 to p14 and changing the vfs.zfs.arc_max setting: nothing that should have been anything other than routine.
However the machine wouldn't reboot properly, and I have eventually ended up with it booted up from a USB stick trying to repair its boot pool. I've tried:

zpool import -R /mnt zroot
zpool import -f -R /mnt zroot
zpool import -f -F -R /mnt zroot
zpool import -f -m /mnt zroot

plus variations on zpool attach ... and zpool replace ... none of which has succeeded. Matthew --LjVkMiCH5ARqMhMVVoMtKWSfgKQ2mEiJK Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQJ8BAEBCgBmBQJVrc/9XxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQxOUYxNTRFQ0JGMTEyRTUwNTQ0RTNGMzAw MDUxM0YxMEUwQTlFNEU3AAoJEABRPxDgqeTniV8P/RscpCXVPwtU+jMkMcI9QwV3 wP+hgM5V4NoaWxt20/ma4eQIH1tqCQzAhn45Vbk3TCSGFnpAieQwcquyEOM8ugc8 z2bZ80t3Z4Ps/3/2SOW4D0s0k57p1sKn+ABmaypuku84AzHeOI+ZlZPddB4vZW8E PsefppJPqtxQML2lWinvcXkDgnfXaIkPguh39ASQWu26L7IT5gXULUa4Zq496CWH qGf6+RZoc51SH4/X5yEMHMjIIu3YivqxaEiXn/Y3ddXHZKwVsh7SW/VCGb8zue9h JqOd52JBtNPM0/UkUmpxZNWbzDEsBbCbRmGwjQIq9JL2w6s+wAhZsT2JKpWJ4aXL WSZabvmGBSjcUcLI7KcNgA8QOPY+7s9zERQRvL3Nh2fZ/IQSfU3kmo5ZqrbZu08O Grgcn2P4SqlSxBkgs/mmAOUbFShHSve0313db1wYJiiaX52rFoRcJYbv4M14N3CM AcvWExsMmtOY1nFXJpupsvw6Ms6PGbaQP1otT4s7apfs7BiCdLre4D57ieUdcU3T eN8EyibSpYfDvKYSzP5THHokAZhclRVlRXaY/TOk96fFHeb3hVdjKhKZGmCQGj2R L4lbY70T95S8Cpc5KUnNWqoyr1kOju9ODDi4NmWA8PpAqA3drMyEdSSjANpS/w21 QifYW5BEh4vDp4SQoMlT =cdU8 -----END PGP SIGNATURE----- --LjVkMiCH5ARqMhMVVoMtKWSfgKQ2mEiJK-- From owner-freebsd-fs@freebsd.org Tue Jul 21 07:54:59 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E4EE99A6F75 for ; Tue, 21 Jul 2015 07:54:59 +0000 (UTC) (envelope-from ronald-lists@klop.ws) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl
[195.190.28.81]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AA9F619DD for ; Tue, 21 Jul 2015 07:54:59 +0000 (UTC) (envelope-from ronald-lists@klop.ws) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.72) (envelope-from ) id 1ZHSO2-0000Vw-Op; Tue, 21 Jul 2015 09:54:56 +0200 Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: freebsd-fs@freebsd.org, "Matthew Seaman" Subject: Re: Attach a device to an unimported pool? References: <55ADA043.5060002@FreeBSD.org> <55ADCFFD.8090000@infracaninophile.co.uk> Date: Tue, 21 Jul 2015 09:54:49 +0200 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Ronald Klop" Message-ID: In-Reply-To: <55ADCFFD.8090000@infracaninophile.co.uk> User-Agent: Opera Mail/1.0 (Win32) X-Authenticated-As-Hash: 398f5522cb258ce43cb679602f8cfe8b62a256d1 X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: - X-Spam-Score: -1.0 X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED, BAYES_20, URIBL_BLOCKED autolearn=disabled version=3.3.1 X-Scan-Signature: 12f61b0c8dc8dcc8c992b8e1fde77987 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 2015 07:55:00 -0000 On Tue, 21 Jul 2015 06:52:13 +0200, Matthew Seaman wrote: > On 2015/07/21 02:28, Matthew Seaman wrote: >> I have an unimportable zpool which is telling me to 'attach the missing >> devices and try again' >> >> The missing devices are a mirrored pair for logs -- they exist and >> should be readily attachable, but it seems I cannot attach devices to an >> un-imported pool, and I can't import the pool with out attaching the >> devices. >> >> Any clues as to how to unfuck this system most gratefully received. 
> > Following myself up, because I missed out providing enough useful > information. > > This is a 10.1-RELEASE machine -- I was upgrading it from p5 to p14 and > changing the vfs.zfs.arc_max setting: nothing that should have been > anything other than routine. However the machine wouldn't reboot > properly, and I have eventually ended up with it booted up from a USB > stick trying to repair it's boot pool. > > I've tried: > > zpool import -R /mnt zroot > zpool import -f -R /mnt zroot > zpool import -f -F -R /mnt zroot > zpool import -f -m /mnt zroot > > plus variations on zpool attach ... and zpool replace ... none of which > has succeeded. > > Matthew > > I did not try, but can you import the pool readonly? Ronald. From owner-freebsd-fs@freebsd.org Tue Jul 21 08:48:56 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 598439A79FB for ; Tue, 21 Jul 2015 08:48:56 +0000 (UTC) (envelope-from matthew@freebsd.org) Received: from smtp.infracaninophile.co.uk (smtp.infracaninophile.co.uk [IPv6:2001:8b0:151:1:3cd3:cd67:fafa:3d78]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.infracaninophile.co.uk", Issuer "infracaninophile.co.uk" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id C0896119A for ; Tue, 21 Jul 2015 08:48:55 +0000 (UTC) (envelope-from matthew@freebsd.org) Received: from ox-dell39.ox.adestra.com (no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged)) (authenticated bits=0) by smtp.infracaninophile.co.uk (8.15.2/8.15.2) with ESMTPSA id t6L8mcBj001849 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jul 2015 09:48:45 +0100 (BST) (envelope-from matthew@freebsd.org) Authentication-Results: smtp.infracaninophile.co.uk; dmarc=none header.from=freebsd.org DKIM-Filter: OpenDKIM Filter v2.9.2 smtp.infracaninophile.co.uk 
t6L8mcBj001849 Authentication-Results: smtp.infracaninophile.co.uk/t6L8mcBj001849; dkim=none reason="no signature"; dkim-adsp=none; dkim-atps=neutral X-Authentication-Warning: lucid-nonsense.infracaninophile.co.uk: Host no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged) claimed to be ox-dell39.ox.adestra.com Message-ID: <55AE0765.1020004@freebsd.org> Date: Tue, 21 Jul 2015 09:48:37 +0100 From: Matthew Seaman User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Ronald Klop , freebsd-fs@freebsd.org Subject: Re: Attach a device to an unimported pool? References: <55ADA043.5060002@FreeBSD.org> <55ADCFFD.8090000@infracaninophile.co.uk> In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="wvMiv1faAQIGpTT3oLqM1bX829DH8gtCk" X-Virus-Scanned: clamav-milter 0.98.7 at lucid-nonsense.infracaninophile.co.uk X-Virus-Status: Clean X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on lucid-nonsense.infracaninophile.co.uk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 2015 08:48:56 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --wvMiv1faAQIGpTT3oLqM1bX829DH8gtCk Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 07/21/15 08:54, Ronald Klop wrote: > I did not try, but can you import the pool readonly? Neither did I unfortunately. In the end I had to blow everything away and rebuild from scratch as the box was needed in service pronto. 
Matthew --wvMiv1faAQIGpTT3oLqM1bX829DH8gtCk Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iQIcBAEBCgAGBQJVrgdlAAoJEABRPxDgqeTn/oUP/25hyT9SDJr3pYAVzfJJOVXg f9yoDK/MRukIQSwvSisl6yduVTKZ6fPjZ2w42uZ5Mrle7u5e4vAC9ife8XG6eVcH Zk3oT1xR5LRMt6frqOIBnJs1grDrENDcpsj9OCalysBHFgsQjTklE9WWb2lcCzbx WdwQPBwzbYs8XtAcdFddXcLqLWaJ1gbqqd0LQuE1cqkXsOsKOU5cqxDPR18DKAE7 9tPEptLVO8jejlE4A9T6Ku0Onhu86tBVAkpJq6g5/+Pi1112nmzHG82WqALscwSf z0eOMa/IexXJU4vAfBC1UR8azc8zZJC/jKNykM4lR8jZsk9Ue8fddU/BhtTMNPX/ i5o0M1COXkZDfqaukfsgvoBahkFH2nakGHLTCT+CTxpLWR5IGLBCfyzxlxVVNRll hXDMfVl2mETw0ruEtUreMt9vqBSaQWpryqLZg4DoRUs4TU1MTGpwEZTovpD5cBqT FgwEbKEYJ2BxxBl4HntVgCaH8QNi0nX5lOtG0Gu7c0OGZLVZG6Hma06fnmWLbknS TzNLUYCJDx1pCkfVIIqbqeaDERrKogWIeTQsp6V6zLqsiKxrAHjyyGGHMThYhY1Z VBEN9G1W3DKnnBvKNeLP+ZSMYi5VTfg7dnqbJ9aibpM5afCdH+ynnv6MBKkvWu1D RN/AUq87xzyGErKwRPQQ =rB8f -----END PGP SIGNATURE----- --wvMiv1faAQIGpTT3oLqM1bX829DH8gtCk-- From owner-freebsd-fs@freebsd.org Tue Jul 21 14:30:04 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 386F19A7212 for ; Tue, 21 Jul 2015 14:30:04 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wg0-x22c.google.com (mail-wg0-x22c.google.com [IPv6:2a00:1450:400c:c00::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D2AAE1115 for ; Tue, 21 Jul 2015 14:30:03 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: by wgbcc4 with SMTP id cc4so65044300wgb.3 for ; Tue, 21 Jul 2015 07:30:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; 
h=mime-version:from:date:message-id:subject:to:content-type; bh=AJ78TDKLUaUy+mP18D96yWAuvG2DkMdMkD0UldkCQWs=; b=zvOej42AJoiaD7T883LRRUtBAiAUjtawRHlUqKGrQHYH4DzAvIfGg2iKFDRXvVQwgh IwVlCsSsnpFtMGOcoHYdCQ12VxvMkocqQ4Igr5SyTAEiZ2RMzczTX+OZjfeuo8UZhrrK wRpc9SDMLx6pOacIB1Fx3QXYOhwL6zXEy3hQATI5fxMG1yqjpd7u9ysC61wbk12T+xR2 akTt1v4C6llrpX2D3seAYY3rtjMTDBIbhGnjCJ96Nc3qT8y+FEP6+NLRFpY45p0VGkhU w7bRG6Pl5MPv9AKYSvEsJ7ZqpKom9I2QrMiD4xxifQTXI5wlg2noJ/qvULUWj2x9g2ux pHMw== X-Received: by 10.180.21.244 with SMTP id y20mr32158899wie.65.1437489002183; Tue, 21 Jul 2015 07:30:02 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Tue, 21 Jul 2015 07:29:42 -0700 (PDT) From: Ahmed Kamal Date: Tue, 21 Jul 2015 16:29:42 +0200 Message-ID: Subject: Allow filtering of properties on zfs receive | is it merged ? To: freebsd-fs Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 2015 14:30:04 -0000 Hi folks, I am interested in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173234 I don't understand what Steven means by (I'll take it) .. is this now merged ? It's not showing up on my 10.1p14 Related work: https://www.illumos.org/issues/2745 https://github.com/zfsonlinux/zfs/issues/1350 http://www.listbox.com/member/archive/182191/2014/20140405192243:28750F74-BD19-11E3-A7AF-904883EF6A73/ Why I need this => I'd like to zfs send | recv from locally attached disks, while filtering properties like (mountpoint, and sharenfs) to avoid production server disruption. Thanks! 
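Until receive-side property filtering is available, one possible workaround is sketched below. It is only a sketch under assumptions: the dataset and snapshot names (tank/data@backup1, usbpool/data) and the backup_plan helper are hypothetical, and it relies on a plain zfs send (no -p or -R) not carrying properties in the stream, plus zfs receive -u leaving the received file system unmounted. The helper only prints the commands for review; it does not run them.

```shell
#!/bin/sh
# Sketch only: backup_plan and all dataset names are hypothetical.
# Prints (does not execute) a send/receive sequence intended to keep
# mountpoint/sharenfs from taking effect on the backup pool.
backup_plan() {
    src=$1    # source snapshot, e.g. tank/data@backup1 (hypothetical)
    dst=$2    # destination dataset, e.g. usbpool/data (hypothetical)

    # No -p/-R on send, so properties are not placed in the stream;
    # -u on receive, so the received file system is not mounted.
    echo "zfs send $src | zfs receive -u $dst"

    # Drop any locally set sharenfs so it falls back to the parent's value.
    echo "zfs inherit sharenfs $dst"

    # Keep the backup copy from being mounted automatically later.
    echo "zfs set canmount=noauto $dst"
}

backup_plan tank/data@backup1 usbpool/data
```

If the backup needs -R (child datasets and all snapshots), properties travel with the stream and this approach does not apply as written.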
From owner-freebsd-fs@freebsd.org Tue Jul 21 14:31:15 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A87C79A731D for ; Tue, 21 Jul 2015 14:31:15 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wg0-x236.google.com (mail-wg0-x236.google.com [IPv6:2a00:1450:400c:c00::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 42EB11236 for ; Tue, 21 Jul 2015 14:31:15 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: by wgav7 with SMTP id v7so92740198wga.2 for ; Tue, 21 Jul 2015 07:31:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=BSsUlg8l/HkvHlVsvHCjtLPvLObl+5MltOIcP91NVMc=; b=SGRqj1E7VkUlqYfT/DrfwV82h+GE7ucUPPLDKiMtUjRgOoaJvKWZLzCgDdcdDK7kKe kbnmgzwzCG9jvOt27/2HlyG1RUGVWTc4HVN/ZjTnBQSm/T7npnKPC/Dr3DGiP2WSe6H+ RPYH/z/tMKL7PcKGsdSTA+eoDpkVJUcgn69VfTKlQZeFrL9SvD8MSl0wiexTGUPZteTG P4egd5FgdrIq2tcK6zXrhUWLc1W2ns+HPn+usvEcddNiI0C2EMJw+wSRBbwL5q7fjjQN jw+4Oac4rr7gCISy4BZ8h1aqF2iYO50mIrk46Urc0LB7ce47BDqj1pWN3eGWZk6SBXin 0Cnw== X-Received: by 10.180.21.244 with SMTP id y20mr32169025wie.65.1437489073722; Tue, 21 Jul 2015 07:31:13 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Tue, 21 Jul 2015 07:30:54 -0700 (PDT) In-Reply-To: References: From: Ahmed Kamal Date: Tue, 21 Jul 2015 16:30:54 +0200 Message-ID: Subject: Re: Allow filtering of properties on zfs receive | is it merged ? 
To: freebsd-fs Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 2015 14:31:15 -0000 Just got a reply from Steven ... So FYI .. Wish someone can clean it up though, it looks quite useful .. Thanks! --- Comment #4 from Steven Hartland --- Afraid not, need to get some time to clean up based on feedback from illumos On Tue, Jul 21, 2015 at 4:29 PM, Ahmed Kamal < email.ahmedkamal@googlemail.com> wrote: > Hi folks, > > I am interested in > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173234 > I don't understand what Steven means by (I'll take it) .. is this now > merged ? It's not showing up on my 10.1p14 > > Related work: > https://www.illumos.org/issues/2745 > https://github.com/zfsonlinux/zfs/issues/1350 > > http://www.listbox.com/member/archive/182191/2014/20140405192243:28750F74-BD19-11E3-A7AF-904883EF6A73/ > > Why I need this => I'd like to zfs send | recv from locally attached > disks, while filtering properties like (mountpoint, and sharenfs) to avoid > production server disruption. > > Thanks! 
> From owner-freebsd-fs@freebsd.org Tue Jul 21 23:06:08 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0857B9A7723 for ; Tue, 21 Jul 2015 23:06:08 +0000 (UTC) (envelope-from javocado@gmail.com) Received: from mail-lb0-x22b.google.com (mail-lb0-x22b.google.com [IPv6:2a00:1450:4010:c04::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8FD5F11EC for ; Tue, 21 Jul 2015 23:06:07 +0000 (UTC) (envelope-from javocado@gmail.com) Received: by lbbqi7 with SMTP id qi7so45177672lbb.3 for ; Tue, 21 Jul 2015 16:06:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=ShrsgseL+luR528Hvl7oVBPMHbPtVuxQqnBYIiLM6gE=; b=SLi679BlSSXjHE7YTI5cS+F/kxSBG5pSlyRZGPcLSl0JE1k54ahITm+ZylN4OSlQGu AowEbHuXpMZDkFEERlzNQQpo/333poCVkZXV8MKnFHBxo0zR2PhTZnszrEz7Jq2bRQQt erHnw5uAm+hPvU7ZqtDdFTJyJ4dPVBZiatiOlFOIfHZy3uC1VNZAxs3mdGLhZl0PSfvj pL7Bqxqk3Q7STMUK046K3W6Zvuae6/BG35NK3BrSF6ZhQ8CILN4UXGApEokWiqZMH+2B 1wLs4yvz//k2swoXrZZ3CgsyvZe08B2ihVVhXQAsJmtg2JWWmYxuVNA35IW0hu3woxvr LcTA== MIME-Version: 1.0 X-Received: by 10.112.219.70 with SMTP id pm6mr34249137lbc.41.1437519965634; Tue, 21 Jul 2015 16:06:05 -0700 (PDT) Received: by 10.114.96.8 with HTTP; Tue, 21 Jul 2015 16:06:05 -0700 (PDT) Date: Tue, 21 Jul 2015 16:06:05 -0700 Message-ID: Subject: Prioritize resilvering priority From: javocado To: FreeBSD Filesystems Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jul 
2015 23:06:08 -0000 Hi, How does one go about prioritizing the resilvering process so it does not overwhelm normal disk I/O on a FreeBSD (8.3amd) system? Further, can this be altered in real time, or do the settings have to be in place prior to the resilvering even starting? Thanks! From owner-freebsd-fs@freebsd.org Wed Jul 22 00:25:27 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E1D5F9A7A99 for ; Wed, 22 Jul 2015 00:25:27 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 81A8911B0 for ; Wed, 22 Jul 2015 00:25:26 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DGBAAG4q5V/61jaINWAwMZgx8xYwYGs0iIU4FrCoU3SgKBeRMBAQEBAQEBgQqEJAEBBAEBASArIAsFCwIBCA4KAgINGQICJwEJJgIECAcEARwEh3gDEggFtgqQfg2FLgEBAQEBAQQBAQEBAQEBG4EiiiqCTQqBQwkQAgEFCAEOJBAHEYIcDC8SgTEFhw6NRYR1gmCBfoQaRoNWjAgJgz2DXwImgg0cgW8iMQd/QYEEAQEB X-IronPort-AV: E=Sophos;i="5.15,519,1432612800"; d="scan'208";a="227192858" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-annu.net.uoguelph.ca with ESMTP; 21 Jul 2015 20:25:19 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 65EF415F561; Tue, 21 Jul 2015 20:25:19 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id ICEjA7XoJXQq; Tue, 21 Jul 2015 20:25:18 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id EB5B415F565; Tue, 21 Jul 2015 20:25:17 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost 
(zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id lId1qOvI_YPt; Tue, 21 Jul 2015 20:25:17 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id CCF9015F561; Tue, 21 Jul 2015 20:25:17 -0400 (EDT) Date: Tue, 21 Jul 2015 20:25:17 -0400 (EDT) From: Rick Macklem To: Ahmed Kamal Cc: Graham Allan , Ahmed Kamal via freebsd-fs Message-ID: <1363104359.1192506.1437524717797.JavaMail.zimbra@uoguelph.ca> In-Reply-To: References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.10] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: Linux NFSv4 clients are getting (bad sequence-id error!) Thread-Index: 2FlGUqTWsnivxNnbgoS5zoMsK4Mqvw== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jul 2015 00:25:28 -0000 Ahmed Kamal wrote: > rhel6 servers logs were flooded with errors like: http://paste2.org/EwLGcGF6 > The Freebsd box was being pounded with 40Mbps of nfs traffic .. probably > Linux was retrying too hard ?! I had to reboot all PCs and after the last > one, nfsd CPU usage dropped immediately to zero > The error 10052 is NFS4ERR_BAD_SESSION. For the Destroy_Session operation it could be generated for several reasons, but the most likely is that the Destroy_session operation isn't the last one in the compound (as required by RFC-5661). 
Snippet of RFC-5661: If the COMPOUND request starts with SEQUENCE, and if the sessionids specified in SEQUENCE and DESTROY_SESSION are the same, then o DESTROY_SESSION MUST be the final operation in the COMPOUND request. If it happens again, capture packets for a short period of time (just need one of the RPC requests with Destroy_session in it). When you look at the RPC request in wireshark, if it isn't the last operation in the compound, then that's why this is happening (and broken w.r.t. the RFC, I suspect). rick ps: If you email me a small raw packet capture with this RPC in it, I can take a look at it. > On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal < > email.ahmedkamal@googlemail.com> wrote: > > > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just did > > the dtrace with: > > > > dtrace -n 'profile-1001 { @[stack()] = count(); }' > > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom) > > > > Since rebooting the nfs server didn't fix it .. I imagine I'd have to > > reboot all NFS clients .. This would be really sad .. Any advice is most > > appreciated .. Thanks > > > > > > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal < > > email.ahmedkamal@googlemail.com> wrote: > > > >> Hi folks, > >> > >> I've upgraded a test client to rhel6 today, and I'll keep an eye on it to > >> see what happens. > >> > >> During the process, I made the (I guess mistake) of zfs send | recv to a > >> locally attached usb disk for backup purposes .. long story short, > >> sharenfs > >> property on the received filesystem was causing some nfs/mountd errors in > >> logs .. I wasn't too happy with what I got .. I destroyed the backup > >> datasets and the whole pool eventually .. and then rebooted the whole nas > >> box ..
After reboot my logs are still flooded with > >> > >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session > >> Jul 21 05:13:07 nas last message repeated 7536 times > >> Jul 21 05:15:08 nas last message repeated 29664 times > >> > >> Not sure what that means .. or how it can be stopped .. Anyway, will keep > >> you posted on progress. > >> > >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem > >> wrote: > >> > >>> Graham Allan wrote: > >>> > I'm curious how things are going for you with this? > >>> > > >>> > Reading your thread did pique my interest since we have a lot of > >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant > >>> > to glance through our logs for signs of the same issue, but today I > >>> > started investigating a machine which appeared to have hung processes, > >>> > high rpciod load, and high traffic to the NFS server. Of course it is > >>> > exactly this issue. > >>> > > >>> > The affected machine is running SL5 though most of our server nodes are > >>> > now SL6. I can see errors from most of them but the SL6 systems appear > >>> > less affected - I see a stream of the sequence-id errors in their logs > >>> but > >>> > things in general keep working. The one SL5 machine I'm looking at > >>> > has a single sequence-id error in today's logs, but then goes into a > >>> > stream of "state recovery failed" then "Lock reclaim failed". It's > >>> > probably partly related to the particular workload on this machine. > >>> > > >>> > I would try switching our SL6 machines to NFS 4.1 to see if the > >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in > >>> > 10.1?). > >>> > > >>> Btw, I've done some testing against a fairly recent Fedora and haven't > >>> seen > >>> the problem. If either of you guys could load a recent Fedora on a test > >>> client > >>> box, it would be interesting to see if it suffers from this. 
(My > >>> experience is > >>> that the Fedora distros have more up to date Linux NFS clients.) > >>> > >>> rick > >>> > >>> > At the NFS servers, most of the sysctl settings are already tuned > >>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300, > >>> > 128-256 nfs kernel threads. > >>> > > >>> > Graham > >>> > > >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs > >>> wrote: > >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming > >>> > > reports from users about hung vnc sessions. So maybe just maybe, > >>> linux > >>> > > clients are able to somehow recover from this bad sequence messages. > >>> I > >>> > > could still see the bad sequence error message in logs though > >>> > > > >>> > > Why isn't the highwater tunable set to something better by default ? > >>> I mean > >>> > > this server is certainly not under a high or unusual load (it's only > >>> 40 PCs > >>> > > mounting from it) > >>> > > > >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal > >>> > > >>> > > > wrote: > >>> > > > >>> > > > Thanks all .. I understand now we're doing the "right thing" .. > >>> Although > >>> > > > if mounting keeps wedging, I will have to solve it somehow! Either > >>> using > >>> > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. > >>> > > > > >>> > > > Regarding Xin's patch, is it possible to build the patched nfsd > >>> code, as > >>> > > > a > >>> > > > kernel module ? I'm looking to minimize my delta to upstream. > >>> > > > > >>> > > > Also would adopting Xin's patch and hiding it behind a > >>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not > >>> the > >>> > > > last > >>> > > > person on earth to hit this) ? > >>> > > > > >>> > > > Thanks a lot for all the help! 
> >>> > > > > >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem < > >>> rmacklem@uoguelph.ca> > >>> > >>> > > > wrote: > >>> > > > > >>> > > >> Ahmed Kamal wrote: > >>> > > >> > Appreciating the fruitful discussion! Can someone please > >>> explain to > >>> > > >> > me, > >>> > > >> > what would happen in the current situation (linux client doing > >>> this > >>> > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect > >>> of > >>> > > >> > that? > >>> > > >> Well, as you've seen, the Linux client doesn't function correctly > >>> > > >> against > >>> > > >> the FreeBSD server (and probably others that don't support this > >>> > > >> "skip-by-1" > >>> > > >> case). > >>> > > >> > >>> > > >> > What do users see? Any chances of data loss? > >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what > >>> the > >>> > > >> Linux > >>> > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the > >>> guy > >>> > > >> observing > >>> > > >> it. > >>> > > >> > >>> > > >> > > >>> > > >> > Also, I find it strange that netapp have acknowledged this is a > >>> bug on > >>> > > >> > their side, which has been fixed since then! > >>> > > >> Yea, I think Netapp screwed up. For some reason their server > >>> allowed > >>> > > >> this, > >>> > > >> then was fixed to not allow it and then someone decided that was > >>> broken > >>> > > >> and > >>> > > >> reversed it. > >>> > > >> > >>> > > >> > I also find it strange that I'm the first to hit this :) Is no > >>> one > >>> > > >> running > >>> > > >> > nfs4 yet! > >>> > > >> > > >>> > > >> Well, it seems to be slowly catching on. I suspect that the Linux > >>> client > >>> > > >> mounting a Netapp is the most common use of it. Since it appears > >>> that > >>> > > >> they > >>> > > >> flip flopped w.r.t. who's bug this is, it has probably persisted. 
> >>> > > >> > >>> > > >> It may turn out that the Linux client has been fixed or it may > >>> turn out > >>> > > >> that most servers allowed this "skip-by-1" even though David > >>> Noveck (one > >>> > > >> of the main authors of the protocol) seems to agree with me that > >>> it > >>> > > >> should > >>> > > >> not be allowed. > >>> > > >> > >>> > > >> It is possible that others have bumped into this, but it wasn't > >>> isolated > >>> > > >> (I wouldn't have guessed it, so it was good you pointed to the > >>> RedHat > >>> > > >> discussion) > >>> > > >> and they worked around it by reverting to NFSv3 or similar. > >>> > > >> The protocol is rather complex in this area and changed > >>> completely for > >>> > > >> NFSv4.1, > >>> > > >> so many have also probably moved onto NFSv4.1 where this won't be > >>> an > >>> > > >> issue. > >>> > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and > >>> doesn't > >>> > > >> use > >>> > > >> these seqid fields.) > >>> > > >> > >>> > > >> This is all just mho, rick > >>> > > >> > >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < > >>> rmacklem@uoguelph.ca> > >>> > > >> wrote: > >>> > > >> > > >>> > > >> > > Julian Elischer wrote: > >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: > >>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they > >>> say. > >>> > > >> > > > > Please > >>> > > >> > > > > let me know if Xin Li's patch resolves your problem, even > >>> though > >>> > > >> > > > > I > >>> > > >> > > > > don't believe it is correct except for the UINT32_MAX > >>> case. Good > >>> > > >> > > > > luck with it, rick > >>> > > >> > > > and please keep us all in the loop as to what they say! > >>> > > >> > > > > >>> > > >> > > > the general N+2 bit sounds like bullshit to me.. 
its always > >>> N+1 in > >>> > > >> > > > a > >>> > > >> > > > number field that has a > >>> > > >> > > > bit of slack at wrap time (probably due to some ambiguity > >>> in the > >>> > > >> > > > original spec). > >>> > > >> > > > > >>> > > >> > > Actually, since N is the lock op already done, N + 1 is the > >>> next > >>> > > >> > > lock > >>> > > >> > > operation in order. Since lock ops need to be strictly > >>> ordered, > >>> > > >> allowing > >>> > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no > >>> sense. > >>> > > >> > > > >>> > > >> > > I think the author of the RFC meant that N + 2 or greater > >>> fails, but > >>> > > >> it > >>> > > >> > > was poorly worded. > >>> > > >> > > > >>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There > >>> is an > >>> > > >> archive > >>> > > >> > > of it somewhere, but I can't remember where.;-) > >>> > > >> > > > >>> > > >> > > rick > >>> > > >> > > _______________________________________________ > >>> > > >> > > freebsd-fs@freebsd.org mailing list > >>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > > >> > > To unsubscribe, send any mail to > >>> > > >> > > "freebsd-fs-unsubscribe@freebsd.org" > >>> > > >> > > > >>> > > >> > > >>> > > >> > >>> > > > > >>> > > > > >>> > > _______________________________________________ > >>> > > freebsd-fs@freebsd.org mailing list > >>> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > >>> " > >>> > > >>> > -- > >>> > > >>> ------------------------------------------------------------------------- > >>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040 > >>> > School of Physics and Astronomy - University of Minnesota > >>> > > >>> ------------------------------------------------------------------------- > >>> > _______________________________________________ > >>> > freebsd-fs@freebsd.org mailing list > >>> > 
http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >>> > > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing list > >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >>> > >> > >> > > > From owner-freebsd-fs@freebsd.org Wed Jul 22 00:32:20 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6104C9A7C95 for ; Wed, 22 Jul 2015 00:32:20 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from mail.in-addr.com (mail.in-addr.com [IPv6:2a01:4f8:191:61e8::2525:2525]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 296131895 for ; Wed, 22 Jul 2015 00:32:20 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from gjp by mail.in-addr.com with local (Exim 4.85 (FreeBSD)) (envelope-from ) id 1ZHhxK-000HPQ-4d; Wed, 22 Jul 2015 01:32:18 +0100 Date: Wed, 22 Jul 2015 01:32:18 +0100 From: Gary Palmer To: javocado Cc: FreeBSD Filesystems Subject: Re: Prioritize resilvering priority Message-ID: <20150722003218.GD41419@in-addr.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on mail.in-addr.com); SAEximRunCond expanded to false X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jul 2015 00:32:20 -0000 On Tue, Jul 21, 2015 at 04:06:05PM -0700, javocado wrote: > Hi, > > How does one go about prioritizing the resilvering process so 
it does not > overwhelm normal disk I/O on a FreeBSD (8.3amd) system? Further, can this > be altered in real time, or do the settings have to be in place prior to > the resilvering even starting? You don't state, but is this on ZFS? I assume they're on 8.3 also, but look at settings like vfs.zfs.resilver_min_time_ms and vfs.zfs.resilver_delay in sysctl. Searching for resilver_min_time_ms and resilver_delay may give some hints, e.g. http://broken.net/uncategorized/zfs-performance-tuning-for-scrubs-and-resilvers/ does the opposite to what you want, but it should give hints. Also https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-advanced.html I think they're dynamic. I tuned the scrub settings and they took effect immediately. Regards, Gary From owner-freebsd-fs@freebsd.org Wed Jul 22 00:46:22 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A2D969A7F46 for ; Wed, 22 Jul 2015 00:46:22 +0000 (UTC) (envelope-from michelle@sorbs.net) Received: from hades.sorbs.net (hades.sorbs.net [67.231.146.201]) by mx1.freebsd.org (Postfix) with ESMTP id 949B51D1D; Wed, 22 Jul 2015 00:46:22 +0000 (UTC) (envelope-from michelle@sorbs.net) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from isux.com (firewall.isux.com [213.165.190.213]) by hades.sorbs.net (Oracle Communications Messaging Server 7.0.5.29.0 64bit (built Jul 9 2013)) with ESMTPSA id <0NRV00B536F58K00@hades.sorbs.net>; Tue, 21 Jul 2015 17:52:19 -0700 (PDT) Message-id: <55AEE7DB.1030609@sorbs.net> Date: Wed, 22 Jul 2015 02:46:19 +0200 From: Michelle Sullivan User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.24) Gecko/20100301 SeaMonkey/1.1.19 To: Gary Palmer Cc: javocado , FreeBSD Filesystems Subject: Re: Prioritize resilvering priority References: <20150722003218.GD41419@in-addr.com> 
In-reply-to: <20150722003218.GD41419@in-addr.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jul 2015 00:46:22 -0000 Gary Palmer wrote: > On Tue, Jul 21, 2015 at 04:06:05PM -0700, javocado wrote: > >> Hi, >> >> How does one go about prioritizing the resilvering process so it does not >> overwhelm normal disk I/O on a FreeBSD (8.3amd) system? Further, can this >> be altered in real time, or do the settings have to be in place prior to >> the resilvering even starting? >> > > You don't state, but is this on ZFS? > > I assume they're on 8.3 also, but look at settings like > vfs.zfs.resilver_min_time_ms and vfs.zfs.resilver_delay in sysctl. > Searching for resilver_min_time_ms and resilver_delay may give > some hints, e.g. > > http://broken.net/uncategorized/zfs-performance-tuning-for-scrubs-and-resilvers/ > > does the opposite to what you want, but it should give hints. Also > > https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-advanced.html > > I think they're dynamic. I tuned the scrub settings and they took effect > immediately. > > Mostly the settings are dynamic (both of the ones you list above are) ... at worst some settings only take effect at boot time, so if you want to change those, reboot.
None of the settings have to be made before resilvering commences
(remember that when you reboot, the resilver stops and is then resumed
on startup, so any 'boot tunables' take effect).

--
Michelle Sullivan
http://www.mhix.org/

From owner-freebsd-fs@freebsd.org Wed Jul 22 18:52:57 2015
In-Reply-To: <20150722003218.GD41419@in-addr.com>
References: <20150722003218.GD41419@in-addr.com>
Date: Wed, 22 Jul 2015 11:52:54 -0700
Subject: Re: Prioritize resilvering priority
From: javocado
To: Gary Palmer, michelle@sorbs.net
Cc: FreeBSD Filesystems

Thanks for all that feedback. Yes, this is for ZFS.

It looks like some of those sysctls are not available in 8.3:

# sysctl vfs.zfs.scrub_delay
sysctl: unknown oid 'vfs.zfs.scrub_delay'
# sysctl vfs.zfs.resilver_delay
sysctl: unknown oid 'vfs.zfs.resilver_delay'
# sysctl vfs.zfs.zfs_top_maxinflight
sysctl: unknown oid 'vfs.zfs.zfs_top_maxinflight'

But I do have:

vfs.zfs.vdev.max_pending: 10 (dynamic)
vfs.zfs.scrub_limit: 10 (loader)

So, I think I would want to lower one or both of these to increase I/O
responsiveness on the system. Correct? How would the 2 play together in
terms of which to adjust to achieve the best system performance at the
expense of a longer resilver?

On Tue, Jul 21, 2015 at 5:32 PM, Gary Palmer wrote:

> On Tue, Jul 21, 2015 at 04:06:05PM -0700, javocado wrote:
> > Hi,
> >
> > How does one go about prioritizing the resilvering process so it does not
> > overwhelm normal disk I/O on a FreeBSD (8.3amd) system? Further, can this
> > be altered in real time, or do the settings have to be in place prior to
> > the resilvering even starting?
>
> You don't state, but is this on ZFS?
>
> I assume they're on 8.3 also, but look at settings like
> vfs.zfs.resilver_min_time_ms and vfs.zfs.resilver_delay in sysctl.
> Searching for resilver_min_time_ms and resilver_delay may give
> some hints, e.g.
>
> http://broken.net/uncategorized/zfs-performance-tuning-for-scrubs-and-resilvers/
>
> does the opposite to what you want, but it should give hints. Also
>
> https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-advanced.html
>
> I think they're dynamic. I tuned the scrub settings and they took effect
> immediately.
>
> Regards,
>
> Gary
>

From owner-freebsd-fs@freebsd.org Wed Jul 22 20:06:03 2015
Subject: Re: Prioritize resilvering priority
From: Paul Kraus
Date: Wed, 22 Jul 2015 16:00:06 -0400
Message-Id: <96FF6F66-06D3-4CAE-ABE5-C608A9A85F7A@kraus-haus.org>
References: <20150722003218.GD41419@in-addr.com>
To: FreeBSD Filesystems

On Jul 22, 2015, at 14:52, javocado wrote:

> But I do have:
> vfs.zfs.vdev.max_pending: 10 (dynamic)
> vfs.zfs.scrub_limit: 10 (loader)
>
> So, I think I would want to lower one or both of these to increase I/O
> responsiveness on the system. Correct? How would the 2 play together in
> terms of which to adjust to achieve the best system performance at the
> expense of a longer resilver?

vfs.zfs.vdev.max_pending is the limit on the number of disk I/Os that can
be outstanding for a drive (or, IIRC, in this case a given vdev). There
was great debate over tuning this one years ago on the zfs list.
The general consensus is that 10 is a good value for modern SATA drives.
When I was running 4 SATA drives behind a port multiplier (not a great
configuration) I tuned this down to 4 to keep from overwhelming the port
multiplier. Tuning it _down_ will reduce overall throughput to a drive.
It does not differentiate between production I/O and scrub / resilver
I/O.
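
A minimal sketch of checking, and then lowering, that per-vdev queue
depth at runtime, assuming the 8.3-era OID names quoted above. The helper
name show_tunable is made up for illustration, and the value 4 is only
the port-multiplier figure mentioned above, not a recommendation:

```sh
#!/bin/sh
# show_tunable NAME: print a sysctl's current value, or say that this
# kernel does not have it (hypothetical helper, not a FreeBSD tool).
show_tunable() {
    if value=$(sysctl -n "$1" 2>/dev/null); then
        echo "$1 = $value"
    else
        echo "$1: not available on this kernel"
    fi
}

# Both names come from the sysctl output quoted above.
show_tunable vfs.zfs.vdev.max_pending
show_tunable vfs.zfs.scrub_limit

# vfs.zfs.vdev.max_pending was reported as dynamic, so as root it could be
# lowered on a live system (uncomment to apply; 4 is just an example):
# sysctl vfs.zfs.vdev.max_pending=4
```

Since lowering it throttles all I/O to the vdev, not just resilver
traffic, it is probably worth watching the disks for a while after each
change before going any lower.
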
This post:
https://forums.freebsd.org/threads/how-to-limit-scrub-bandwidth-vfs-zfs-scrub_limit.31628/
implies that the vfs.zfs.scrub_limit parameter limits the number of
outstanding I/Os, but just for scrub / resilver operations. I would start
by tuning it down to 5 or so and watch carefully with iostat -x to see
the effect.

Note that newer ZFS code addresses the scrub operation starving the rest
of the system of I/O. I have not had a problem on either my FBSD 9 or
10 systems.

--
Paul Kraus
paul@kraus-haus.org

From owner-freebsd-fs@freebsd.org Wed Jul 22 21:08:38 2015
Date: Wed, 22 Jul 2015 14:08:37 -0700
Subject: Re: format/newfs larger external consumer drives
From: Dieter BSD
To: freebsd-hackers@freebsd.org, freebsd-fs@freebsd.org

Don whY asks:
> So, fsck's effort (and execution *time*) is based *mostly* on inodes?

I don't know about *mostly*, but reducing the number of inodes
significantly reduced fsck time for me.

> In my case, there isn't any real physical room for an internal disk.
> I'm using Dell FX160's -- they'll support a SATA laptop drive

Assuming that "SATA laptop drive" means 2.5" form factor, just add a
6 TB SSD. (available this month, but the price is top secret)

> The point is to get rid of the piles of CD/DVD media that I've
> accumulated over the years

Someone must make a jukebox for those.

>> I am very tired of having an entire machine panic just because
>> one disk decided to take a nap. This is not how you get 5 9s. :-(
>
> Or, power glitches, firmware bugs, etc.

A UPS eats power glitches for breakfast. Firmware is a problem.
More and more mainboards can have FLOSS firmware, but there is
also buggy firmware in disks and other devices.

Warren helpfully pointed us to
> http://sourceforge.net/projects/fuse-ufs2/

Thanks, Warren! Sourceforge was kaput when I tried to check it out.
I'm hoping that it is a high quality implementation, and, being fuse,
will solve the kernel panic problem. I plan to look at the filesystem
regression tests and see if I can think up any additional test cases.

Chris H typed:
> For the record, SourceForge hasn't been available for quite
> some time.

Slashdot reports:

The short version is that a storage fault led to significant
filesystem corruption, and we had to restore a massive amount of
data from backups. There's a post at the SourceForge blog going into
a bit more detail, and describing the steps our Siteops team took
(and is still taking) to restore service.
http://sourceforge.net/blog/sourceforge-infrastructure-and-service-restoration/

Last time I looked, some (not all) parts of fuse-ufs2 had recovered.

> I thought this (fuse-ufs2) might be a nice addition to the ports
> tree, and was hoping to put it there.

Sounds good.

From owner-freebsd-fs@freebsd.org Wed Jul 22 22:56:09 2015
From: Dan Langille
Subject: ZFS benchmarking project
Date: Wed, 22 Jul 2015 18:50:03 -0400
To: freebsd-fs@freebsd.org

I'm setting up a new server. I'd like to see some benchmarks on resilver
times for various configurations (mirror, raidzN) and numbers of drives.

I'm looking for input and participation. I've created a few starting
points at https://github.com/dlangille/zfs_benchmarks

Hope to see you there.

--
Dan Langille
http://langille.org/

From owner-freebsd-fs@freebsd.org Wed Jul 22 23:45:55 2015
Date: Thu, 23 Jul 2015 02:45:53 +0300
Subject: forgotten? Bug 183234 - newfs_msdos(8): [patch]
From: Andrey Fesenko
To: freebsd-fs@freebsd.org, "freebsd-hackers@freebsd.org"

For more than 2 years, a working patch that fixes Bug 183234 has not
been applied:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183234 - newfs_msdos(8):
[patch] can't boot BeagleBone Black when newfs_msdos is trimming to a
multiple of sectors per track

From owner-freebsd-fs@freebsd.org Thu Jul 23 05:46:03 2015
Date: Wed, 22 Jul 2015 22:46:01 -0700
Subject: Need help to understand inodes in FAT implementation
From: Simon Brugger
To: freebsd-fs@freebsd.org

The FAT filesystem is a design from Microsoft and the whole Windows
universe. As you probably know, FAT does not support inodes like ext3 or
other UNIX/Linux filesystems.

The FreeBSD kernel provides a FAT filesystem driver, but I have trouble
understanding the code and need help answering the following questions:

- Inodes must be generated on the fly by file access. When exactly does
  this happen?
- How does FreeBSD generate inodes?
- How does the kernel FAT driver manage the file descriptors?

Thank you very much for your help.

From owner-freebsd-fs@freebsd.org Thu Jul 23 06:41:44 2015
Date: Thu, 23 Jul 2015 09:41:39 +0300
From: Konstantin Belousov
To: Simon Brugger
Cc: freebsd-fs@freebsd.org
Subject: Re: Need help to understand inodes in FAT implementation
Message-ID: <20150723064139.GL2072@kib.kiev.ua>

On Wed, Jul 22, 2015 at 10:46:01PM -0700, Simon Brugger wrote:
> The FAT filesystem is a design from Microsoft and the whole Windows
> universe. As you probably know, FAT does not support Inodes like ext3 or
> other UNIX/LINUX filesystem.
>
> The FreeBSD kernel provide a FAT filesystem driver but I've problems to
> understand the code and need help for answering the following questions:
>
> - Inodes must be generated on the fly by file access. When exactly does
> this happened?

Inodes are not generated. The inode numbers are. Look at
sys/fs/msdosfs/msdosfs_vnops.c:msdosfs_getattr() to see how. The
struct vattr va_fileid field is returned from VOP_GETATTR() to userspace
as the struct stat st_ino.

> - How does FreeBSD generate Inodes?
> - How does the kernel FAT driver manage the file descriptors?

The kernel FAT driver does not manage file descriptors.

From owner-freebsd-fs@freebsd.org Thu Jul 23 10:41:10 2015
Date: Thu, 23 Jul 2015 06:40:59 -0400 (EDT)
From: Rick Macklem
To: Ahmed Kamal
Cc: Graham Allan, Ahmed Kamal via freebsd-fs
Message-ID: <1474771205.1788105.1437648059578.JavaMail.zimbra@uoguelph.ca>
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca>
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)

Ahmed Kamal wrote:
> rhel6 servers logs were flooded with errors like: http://paste2.org/EwLGcGF6
> The Freebsd box was being pounded with 40Mbps of nfs traffic .. probably
> Linux was retrying too hard ?! I had to reboot all PCs and after the last
> one, nfsd CPU usage dropped immediately to zero
>
Btw, it would be interesting to know what triggers these things (overload
of the nfs server resulting in very slow response or ???). Basically
Destroy_session isn't an operation that a client would normally do. I have
no idea why the Linux client would do it. (A session is what achieves the
"exactly once" semantics for the RPCs. It should really be in the RPC layer,
but the NFSv4 working group put it in NFSv4.1 because they didn't want to
replace Sun RPC. I can't think of a reason to destroy a session except on
dismount. Maybe if the client thinks the session is broken for some reason??)

Maybe something like "vmstat -m", "vmstat -z" and "nfsstat -s -e" running
repeatedly (once/sec with timestamps via "date" or similar) so that you can
see what was happening just before the meltdowns.

A raw packet trace of just when the meltdown starts would be useful, but I
can't think of how you'd get one of reasonable size. Maybe having
"tcpdump -s 0 -w .pcap " run for 1sec and then kill/restart it repeatedly
with different file names, so you might get a useful 1sec capture at the
critical time?

Anyhow, good luck with it, rick

> On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com> wrote:
>
> > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just did
> > the dtrace with:
> >
> > dtrace -n profile-1001 { @[stack()] = count(); }
> > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)
> >
> > Since rebooting the nfs server didn't fix it .. I imagine I'd have to
> > reboot all NFS clients .. This would be really sad .. Any advice is most
> > appreciated .. Thanks
> >
> >
> > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com> wrote:
> >
> >> Hi folks,
> >>
> >> I've upgraded a test client to rhel6 today, and I'll keep an eye on it to
> >> see what happens.
> >>
> >> During the process, I made the (I guess mistake) of zfs send | recv to a
> >> locally attached usb disk for backup purposes .. long story short, sharenfs
> >> property on the received filesystem was causing some nfs/mountd errors in
> >> logs .. I wasn't too happy with what I got .. I destroyed the backup
> >> datasets and the whole pool eventually .. and then rebooted the whole nas
> >> box .. After reboot my logs are still flooded with
> >>
> >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> >> Jul 21 05:13:07 nas last message repeated 7536 times
> >> Jul 21 05:15:08 nas last message repeated 29664 times
> >>
> >> Not sure what that means .. or how it can be stopped .. Anyway, will keep
> >> you posted on progress.
> >>
> >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem wrote:
> >>
> >>> Graham Allan wrote:
> >>> > I'm curious how things are going for you with this?
> >>> >
> >>> > Reading your thread did pique my interest since we have a lot of
> >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I meant
> >>> > to glance through our logs for signs of the same issue, but today I
> >>> > started investigating a machine which appeared to have hung processes,
> >>> > high rpciod load, and high traffic to the NFS server. Of course it is
> >>> > exactly this issue.
> >>> >
> >>> > The affected machine is running SL5 though most of our server nodes are
> >>> > now SL6. I can see errors from most of them but the SL6 systems appear
> >>> > less affected - I see a stream of the sequence-id errors in their logs but
> >>> > things in general keep working. The one SL5 machine I'm looking at
> >>> > has a single sequence-id error in today's logs, but then goes into a
> >>> > stream of "state recovery failed" then "Lock reclaim failed". It's
> >>> > probably partly related to the particular workload on this machine.
> >>> >
> >>> > I would try switching our SL6 machines to NFS 4.1 to see if the
> >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is it in
> >>> > 10.1?).
> >>> >
> >>> Btw, I've done some testing against a fairly recent Fedora and haven't seen
> >>> the problem. If either of you guys could load a recent Fedora on a test client
> >>> box, it would be interesting to see if it suffers from this. (My experience is
> >>> that the Fedora distros have more up to date Linux NFS clients.)
> >>>
> >>> rick
> >>>
> >>> > At the NFS servers, most of the sysctl settings are already tuned
> >>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300,
> >>> > 128-256 nfs kernel threads.
> >>> >
> >>> > Graham
> >>> >
> >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via freebsd-fs wrote:
> >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming
> >>> > > reports from users about hung vnc sessions. So maybe just maybe, linux
> >>> > > clients are able to somehow recover from this bad sequence messages. I
> >>> > > could still see the bad sequence error message in logs though
> >>> > >
> >>> > > Why isn't the highwater tunable set to something better by default ? I mean
> >>> > > this server is certainly not under a high or unusual load (it's only 40 PCs
> >>> > > mounting from it)
> >>> > >
> >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal wrote:
> >>> > >
> >>> > > > Thanks all .. I understand now we're doing the "right thing" .. Although
> >>> > > > if mounting keeps wedging, I will have to solve it somehow! Either using
> >>> > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1.
> >>> > > >
> >>> > > > Regarding Xin's patch, is it possible to build the patched nfsd code, as a
> >>> > > > kernel module ? I'm looking to minimize my delta to upstream.
> >>> > > >
> >>> > > > Also would adopting Xin's patch and hiding it behind a
> >>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the last
> >>> > > > person on earth to hit this) ?
> >>> > > >
> >>> > > > Thanks a lot for all the help!
> >>> > > >
> >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >>> > > >
> >>> > > >> Ahmed Kamal wrote:
> >>> > > >> > Appreciating the fruitful discussion! Can someone please explain to me,
> >>> > > >> > what would happen in the current situation (linux client doing this
> >>> > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect of that?
> >>> > > >> Well, as you've seen, the Linux client doesn't function correctly against
> >>> > > >> the FreeBSD server (and probably others that don't support this "skip-by-1"
> >>> > > >> case).
> >>> > > >>
> >>> > > >> > What do users see? Any chances of data loss?
> >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the Linux
> >>> > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the guy observing
> >>> > > >> it.
> >>> > > >>
> >>> > > >> >
> >>> > > >> > Also, I find it strange that netapp have acknowledged this is a bug on
> >>> > > >> > their side, which has been fixed since then!
> >>> > > >> Yea, I think Netapp screwed up. For some reason their server allowed this,
> >>> > > >> then was fixed to not allow it and then someone decided that was broken and
> >>> > > >> reversed it.
> >>> > > >>
> >>> > > >> > I also find it strange that I'm the first to hit this :) Is no one running
> >>> > > >> > nfs4 yet!
> >>> > > >> >
> >>> > > >> Well, it seems to be slowly catching on. I suspect that the Linux client
> >>> > > >> mounting a Netapp is the most common use of it. Since it appears that they
> >>> > > >> flip flopped w.r.t. who's bug this is, it has probably persisted.
> >>> > > >>
> >>> > > >> It may turn out that the Linux client has been fixed or it may turn out
> >>> > > >> that most servers allowed this "skip-by-1" even though David Noveck (one
> >>> > > >> of the main authors of the protocol) seems to agree with me that it should
> >>> > > >> not be allowed.
> >>> > > >>
> >>> > > >> It is possible that others have bumped into this, but it wasn't isolated
> >>> > > >> (I wouldn't have guessed it, so it was good you pointed to the RedHat
> >>> > > >> discussion)
> >>> > > >> and they worked around it by reverting to NFSv3 or similar.
> >>> > > >> The protocol is rather complex in this area and changed completely for
> >>> > > >> NFSv4.1,
> >>> > > >> so many have also probably moved onto NFSv4.1 where this won't be an issue.
> >>> > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and doesn't use
> >>> > > >> these seqid fields.)
> >>> > > >>
> >>> > > >> This is all just mho, rick
> >>> > > >>
> >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >>> > > >> >
> >>> > > >> > > Julian Elischer wrote:
> >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> >>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say. Please
> >>> > > >> > > > > let me know if Xin Li's patch resolves your problem, even though I
> >>> > > >> > > > > don't believe it is correct except for the UINT32_MAX case. Good
> >>> > > >> > > > > luck with it, rick
> >>> > > >> > > > and please keep us all in the loop as to what they say!
> >>> > > >> > > >
> >>> > > >> > > > the general N+2 bit sounds like bullshit to me.. its always N+1 in a
> >>> > > >> > > > number field that has a
> >>> > > >> > > > bit of slack at wrap time (probably due to some ambiguity in the
> >>> > > >> > > > original spec).
> >>> > > >> > > >
> >>> > > >> > > Actually, since N is the lock op already done, N + 1 is the next lock
> >>> > > >> > > operation in order. Since lock ops need to be strictly ordered, allowing
> >>> > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no sense.
> >>> > > >> > >
> >>> > > >> > > I think the author of the RFC meant that N + 2 or greater fails, but it
> >>> > > >> > > was poorly worded.
> >>> > > >> > >
> >>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There is an archive
> >>> > > >> > > of it somewhere, but I can't remember where.;-)
> >>> > > >> > >
> >>> > > >> > > rick
> >>> > > >> > > _______________________________________________
> >>> > > >> > > freebsd-fs@freebsd.org mailing list
> >>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>> > > >> > > To unsubscribe, send any mail to
> >>> > > >> > > "freebsd-fs-unsubscribe@freebsd.org"
> >>> > > >> > >
> >>> > > >> >
> >>> > > >>
> >>> > > >
> >>> > > >
> >>> >
> >>> > --
> >>> > -------------------------------------------------------------------------
> >>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) 624-5040
> >>> > School of Physics and Astronomy - University of Minnesota
> >>> > -------------------------------------------------------------------------
> >>> >
> >>>
> >>
> >>
> >
>

From owner-freebsd-fs@freebsd.org Thu Jul 23 15:27:12 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2C5C09A810C for ; Thu, 23 Jul 2015 15:27:12 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wi0-x236.google.com
(mail-wi0-x236.google.com [IPv6:2a00:1450:400c:c05::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 900261E99 for ; Thu, 23 Jul 2015 15:27:11 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: by wibud3 with SMTP id ud3so224572288wib.0 for ; Thu, 23 Jul 2015 08:27:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=hq8gj35xDIKhSmw2RZm41No0ZXdWodLndJgJS7oCf9A=; b=ozxt04nodg8TawXhw9pSkliGJY8lxJaXMxIywq/1DAoSsGWKWw4PYPFrkVVmrNCbgz vvAzZRMR5z0c5al7ydzfs6iSmyGYahda1Vk2zb0JdMbGCwLFHLV821guP87IRQyEZ44B 364IcnyKVIRRaEPnxlQhl5kP+nw7Bkv58TkxjuHzZsKp6fsAb4ys9AuHxc+L1/4iQ6yG ZU9ASWyCg9wX2xEuvD+G4uHS7XhdLTSLi4qtjAAbE8rYe70DNfQrRcOwD1tcqppr1KOn YYZuxlHXR5V/I26OXRF9Wyzsmk1VrVLnGv69kaEPQPlxhCws1QeKmcyH1zlyJTBSPo1O 0ARg== X-Received: by 10.194.192.33 with SMTP id hd1mr17663347wjc.96.1437665229769; Thu, 23 Jul 2015 08:27:09 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Thu, 23 Jul 2015 08:26:50 -0700 (PDT) In-Reply-To: <1474771205.1788105.1437648059578.JavaMail.zimbra@uoguelph.ca> References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> <1474771205.1788105.1437648059578.JavaMail.zimbra@uoguelph.ca> From: Ahmed Kamal Date: Thu, 23 Jul 2015 17:26:50 +0200 Message-ID: Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) 
To: Rick Macklem
Cc: Graham Allan, Ahmed Kamal via freebsd-fs
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 23 Jul 2015 15:27:12 -0000

Well .. the problem is now gone, so I guess I can't collect more data until
it happens again (or, hopefully, doesn't :). As I described, I had to
restart the FreeBSD NFS server box first .. maybe this caused the Linux
clients to give up after 5 mins and attempt to destroy the session? When
the NFS server came back up, it was bombarded with RPC traffic (50Mbps),
probably carrying this "destroy session" message.

What I don't understand, however, is why this doesn't end. What does FreeBSD
reply with? Shouldn't it say "Okay, I don't know anything about this
session, so consider it destroyed .. suit yourself, Linux"? Or does it
refuse to destroy the session, causing Linux to keep retrying like crazy?

On Thu, Jul 23, 2015 at 12:40 PM, Rick Macklem wrote:

> Ahmed Kamal wrote:
> > rhel6 servers logs were flooded with errors like:
> > http://paste2.org/EwLGcGF6
> > The Freebsd box was being pounded with 40Mbps of nfs traffic .. probably
> > Linux was retrying too hard ?! I had to reboot all PCs and after the last
> > one, nfsd CPU usage dropped immediately to zero
> >
> Btw, it would be interesting to know what triggers these things (overload of
> the nfs server resulting in very slow response or ???). Basically Destroy_session
> isn't an operation that a client would normally do. I have no idea why the Linux
> client would do it. (A session is what achieves the "exactly once" semantics for
> the RPCs. It should really be in the RPC layer, but the NFSv4 working group put
> it in NFSv4.1 because they didn't want to replace Sun RPC.
> I can't think of a reason
> to destroy a session except on dismount. Maybe if the client thinks the session is
> broken for some reason??)
>
> Maybe something like "vmstat -m", "vmstat -z" and "nfsstat -s -e" running repeatedly
> (once/sec with timestamps via "date" or similar) so that you can see what was happening just
> before the meltdowns.
>
> A raw packet trace of just when the meltdown starts would be useful, but I can't think
> of how you'd get one of reasonable size. Maybe having "tcpdump -s 0 -w .pcap "
> run for 1sec and then kill/restart it repeatedly with different file names, so you might get
> a useful 1sec capture at the critical time?
>
> Anyhow, good luck with it, rick
>
> > On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <
> > email.ahmedkamal@googlemail.com> wrote:
> >
> > > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just did
> > > the dtrace with:
> > >
> > > dtrace -n 'profile-1001 { @[stack()] = count(); }'
> > > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)
> > >
> > > Since rebooting the nfs server didn't fix it .. I imagine I'd have to
> > > reboot all NFS clients .. This would be really sad .. Any advice is most
> > > appreciated .. Thanks
> > >
> > >
> > > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <
> > > email.ahmedkamal@googlemail.com> wrote:
> > >
> > >> Hi folks,
> > >>
> > >> I've upgraded a test client to rhel6 today, and I'll keep an eye on it to
> > >> see what happens.
> > >>
> > >> During the process, I made the (I guess mistake) of zfs send | recv to a
> > >> locally attached usb disk for backup purposes .. long story short, sharenfs
> > >> property on the received filesystem was causing some nfs/mountd errors in
> > >> logs .. I wasn't too happy with what I got .. I destroyed the backup
> > >> datasets and the whole pool eventually .. and then rebooted the whole nas
> > >> box ..
After reboot my logs are still flooded with > > >> > > >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session > > >> Jul 21 05:13:07 nas last message repeated 7536 times > > >> Jul 21 05:15:08 nas last message repeated 29664 times > > >> > > >> Not sure what that means .. or how it can be stopped .. Anyway, will > keep > > >> you posted on progress. > > >> > > >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem > > >> wrote: > > >> > > >>> Graham Allan wrote: > > >>> > I'm curious how things are going for you with this? > > >>> > > > >>> > Reading your thread did pique my interest since we have a lot of > > >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I > meant > > >>> > to glance through our logs for signs of the same issue, but today I > > >>> > started investigating a machine which appeared to have hung > processes, > > >>> > high rpciod load, and high traffic to the NFS server. Of course it > is > > >>> > exactly this issue. > > >>> > > > >>> > The affected machine is running SL5 though most of our server > nodes are > > >>> > now SL6. I can see errors from most of them but the SL6 systems > appear > > >>> > less affected - I see a stream of the sequence-id errors in their > logs > > >>> but > > >>> > things in general keep working. The one SL5 machine I'm looking at > > >>> > has a single sequence-id error in today's logs, but then goes into > a > > >>> > stream of "state recovery failed" then "Lock reclaim failed". It's > > >>> > probably partly related to the particular workload on this machine. > > >>> > > > >>> > I would try switching our SL6 machines to NFS 4.1 to see if the > > >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is > it in > > >>> > 10.1?). > > >>> > > > >>> Btw, I've done some testing against a fairly recent Fedora and > haven't > > >>> seen > > >>> the problem. 
If either of you guys could load a recent Fedora on a > test > > >>> client > > >>> box, it would be interesting to see if it suffers from this. (My > > >>> experience is > > >>> that the Fedora distros have more up to date Linux NFS clients.) > > >>> > > >>> rick > > >>> > > >>> > At the NFS servers, most of the sysctl settings are already tuned > > >>> > from defaults. eg tcp.highwater=100000, vfs.nfsd.tcpcachetimeo=300, > > >>> > 128-256 nfs kernel threads. > > >>> > > > >>> > Graham > > >>> > > > >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via > freebsd-fs > > >>> wrote: > > >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any > screaming > > >>> > > reports from users about hung vnc sessions. So maybe just maybe, > > >>> linux > > >>> > > clients are able to somehow recover from this bad sequence > messages. > > >>> I > > >>> > > could still see the bad sequence error message in logs though > > >>> > > > > >>> > > Why isn't the highwater tunable set to something better by > default ? > > >>> I mean > > >>> > > this server is certainly not under a high or unusual load (it's > only > > >>> 40 PCs > > >>> > > mounting from it) > > >>> > > > > >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal > > >>> > > > >>> > > > wrote: > > >>> > > > > >>> > > > Thanks all .. I understand now we're doing the "right thing" .. > > >>> Although > > >>> > > > if mounting keeps wedging, I will have to solve it somehow! > Either > > >>> using > > >>> > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. > > >>> > > > > > >>> > > > Regarding Xin's patch, is it possible to build the patched nfsd > > >>> code, as > > >>> > > > a > > >>> > > > kernel module ? I'm looking to minimize my delta to upstream. > > >>> > > > > > >>> > > > Also would adopting Xin's patch and hiding it behind a > > >>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably > not > > >>> the > > >>> > > > last > > >>> > > > person on earth to hit this) ? 
> > >>> > > > > > >>> > > > Thanks a lot for all the help! > > >>> > > > > > >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem < > > >>> rmacklem@uoguelph.ca> > > >>> > > >>> > > > wrote: > > >>> > > > > > >>> > > >> Ahmed Kamal wrote: > > >>> > > >> > Appreciating the fruitful discussion! Can someone please > > >>> explain to > > >>> > > >> > me, > > >>> > > >> > what would happen in the current situation (linux client > doing > > >>> this > > >>> > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the > effect > > >>> of > > >>> > > >> > that? > > >>> > > >> Well, as you've seen, the Linux client doesn't function > correctly > > >>> > > >> against > > >>> > > >> the FreeBSD server (and probably others that don't support > this > > >>> > > >> "skip-by-1" > > >>> > > >> case). > > >>> > > >> > > >>> > > >> > What do users see? Any chances of data loss? > > >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess > what > > >>> the > > >>> > > >> Linux > > >>> > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're > the > > >>> guy > > >>> > > >> observing > > >>> > > >> it. > > >>> > > >> > > >>> > > >> > > > >>> > > >> > Also, I find it strange that netapp have acknowledged this > is a > > >>> bug on > > >>> > > >> > their side, which has been fixed since then! > > >>> > > >> Yea, I think Netapp screwed up. For some reason their server > > >>> allowed > > >>> > > >> this, > > >>> > > >> then was fixed to not allow it and then someone decided that > was > > >>> broken > > >>> > > >> and > > >>> > > >> reversed it. > > >>> > > >> > > >>> > > >> > I also find it strange that I'm the first to hit this :) Is > no > > >>> one > > >>> > > >> running > > >>> > > >> > nfs4 yet! > > >>> > > >> > > > >>> > > >> Well, it seems to be slowly catching on. I suspect that the > Linux > > >>> client > > >>> > > >> mounting a Netapp is the most common use of it. 
Since it > appears > > >>> that > > >>> > > >> they > > >>> > > >> flip flopped w.r.t. who's bug this is, it has probably > persisted. > > >>> > > >> > > >>> > > >> It may turn out that the Linux client has been fixed or it may > > >>> turn out > > >>> > > >> that most servers allowed this "skip-by-1" even though David > > >>> Noveck (one > > >>> > > >> of the main authors of the protocol) seems to agree with me > that > > >>> it > > >>> > > >> should > > >>> > > >> not be allowed. > > >>> > > >> > > >>> > > >> It is possible that others have bumped into this, but it > wasn't > > >>> isolated > > >>> > > >> (I wouldn't have guessed it, so it was good you pointed to the > > >>> RedHat > > >>> > > >> discussion) > > >>> > > >> and they worked around it by reverting to NFSv3 or similar. > > >>> > > >> The protocol is rather complex in this area and changed > > >>> completely for > > >>> > > >> NFSv4.1, > > >>> > > >> so many have also probably moved onto NFSv4.1 where this > won't be > > >>> an > > >>> > > >> issue. > > >>> > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics > and > > >>> doesn't > > >>> > > >> use > > >>> > > >> these seqid fields.) > > >>> > > >> > > >>> > > >> This is all just mho, rick > > >>> > > >> > > >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < > > >>> rmacklem@uoguelph.ca> > > >>> > > >> wrote: > > >>> > > >> > > > >>> > > >> > > Julian Elischer wrote: > > >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: > > >>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what they > > >>> say. > > >>> > > >> > > > > Please > > >>> > > >> > > > > let me know if Xin Li's patch resolves your problem, > even > > >>> though > > >>> > > >> > > > > I > > >>> > > >> > > > > don't believe it is correct except for the UINT32_MAX > > >>> case. Good > > >>> > > >> > > > > luck with it, rick > > >>> > > >> > > > and please keep us all in the loop as to what they say! 
> > >>> > > >> > > > > > >>> > > >> > > > the general N+2 bit sounds like bullshit to me.. its > always > > >>> N+1 in > > >>> > > >> > > > a > > >>> > > >> > > > number field that has a > > >>> > > >> > > > bit of slack at wrap time (probably due to some > ambiguity > > >>> in the > > >>> > > >> > > > original spec). > > >>> > > >> > > > > > >>> > > >> > > Actually, since N is the lock op already done, N + 1 is > the > > >>> next > > >>> > > >> > > lock > > >>> > > >> > > operation in order. Since lock ops need to be strictly > > >>> ordered, > > >>> > > >> allowing > > >>> > > >> > > N + 2 (which means N + 2 would be done before N + 1) > makes no > > >>> sense. > > >>> > > >> > > > > >>> > > >> > > I think the author of the RFC meant that N + 2 or greater > > >>> fails, but > > >>> > > >> it > > >>> > > >> > > was poorly worded. > > >>> > > >> > > > > >>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. > (There > > >>> is an > > >>> > > >> archive > > >>> > > >> > > of it somewhere, but I can't remember where.;-) > > >>> > > >> > > > > >>> > > >> > > rick > > >>> > > >> > > _______________________________________________ > > >>> > > >> > > freebsd-fs@freebsd.org mailing list > > >>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>> > > >> > > To unsubscribe, send any mail to > > >>> > > >> > > "freebsd-fs-unsubscribe@freebsd.org" > > >>> > > >> > > > > >>> > > >> > > > >>> > > >> > > >>> > > > > > >>> > > > > > >>> > > _______________________________________________ > > >>> > > freebsd-fs@freebsd.org mailing list > > >>> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>> > > To unsubscribe, send any mail to " > freebsd-fs-unsubscribe@freebsd.org > > >>> " > > >>> > > > >>> > -- > > >>> > > > >>> > ------------------------------------------------------------------------- > > >>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) > 624-5040 > > >>> > School of Physics and Astronomy - University of 
Minnesota > > >>> > > > >>> > ------------------------------------------------------------------------- > > >>> > _______________________________________________ > > >>> > freebsd-fs@freebsd.org mailing list > > >>> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>> > To unsubscribe, send any mail to " > freebsd-fs-unsubscribe@freebsd.org" > > >>> > > > >>> _______________________________________________ > > >>> freebsd-fs@freebsd.org mailing list > > >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org > " > > >>> > > >> > > >> > > > > > > From owner-freebsd-fs@freebsd.org Thu Jul 23 17:55:20 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C6C929A9D41 for ; Thu, 23 Jul 2015 17:55:20 +0000 (UTC) (envelope-from m.e.sanliturk@gmail.com) Received: from mail-ie0-x230.google.com (mail-ie0-x230.google.com [IPv6:2607:f8b0:4001:c03::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 853B9144C for ; Thu, 23 Jul 2015 17:55:20 +0000 (UTC) (envelope-from m.e.sanliturk@gmail.com) Received: by iebmu5 with SMTP id mu5so1685434ieb.1 for ; Thu, 23 Jul 2015 10:55:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=6Xk8v/BdK/twku5L9VL3qbIfKAW1ROk/8GCBvcU4oEI=; b=OQ9j7SAaBMHmiccicXMqgxj5IL3R403K90/u6I7iTy5zF4gSeMYANyVlmc2sl/uUWp aNdQjCHHMBHOHo72QnYZAAJD/eEqNMldGm+sPwJALYVZtZJdCHvJd1DhPlhNnW35nz5+ ZeSQlei/e1OJlyNxrQR7yBuDPdE+v+g+K4Dtunw+OV4GIjRy62SRwB33G0Aly5xxukh1 tLBPYtk5sAMCIj3pvWjWJpnjMNWWO+RkjSvGwi7AI6d0A/b6z/N0ObvB86ysZXYrY73L 
syHvWgrL54QqYoKbaZEtiGje9RnW0vuOSDs7/PBLZq/8PPzfOGu6zbTAS7z19gF5D+2H 05UQ==
MIME-Version: 1.0
X-Received: by 10.107.41.146 with SMTP id p140mr15286418iop.58.1437669439640; Thu, 23 Jul 2015 09:37:19 -0700 (PDT)
Received: by 10.65.15.33 with HTTP; Thu, 23 Jul 2015 09:37:19 -0700 (PDT)
In-Reply-To:
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> <1474771205.1788105.1437648059578.JavaMail.zimbra@uoguelph.ca>
Date: Thu, 23 Jul 2015 09:37:19 -0700
Message-ID:
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)
From: Mehmet Erol Sanliturk
To: Ahmed Kamal
Cc: Rick Macklem, Ahmed Kamal via freebsd-fs
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 23 Jul 2015 17:55:21 -0000

On Thu, Jul 23, 2015 at 8:26 AM, Ahmed Kamal via freebsd-fs <
freebsd-fs@freebsd.org> wrote:

> Well .. The problem is now gone, so I guess I can't collect more data till
> it happens (or hopefully doesn't :) happen again .. So as I described, I
> had to restart the FreeBSD NFS server box first .. maybe this caused linux
> clients to give up after 5 mins, and attempt to destroy the session ? When
> the NFS server was back up .. It was being bombarded (50Mbps traffic) with
> rpc traffic, probably saying this "destroy session" message.
>
> What I don't understand however is, why doesn't this end. What does FreeBSD
> reply with? Shouldn't it say, Okay, I don't know anything about this
> session, so consider it destroyed .. suit yourself linux .. or does it
> refuse to destroy, causing Linux to keep on retrying like crazy ?
>
>

My opinion is that there is a problem in the latest Linux NFS client: it
takes too long to communicate with the Linux server. The "fighting" with
the server is visible from the switch lights, and a response arrives only
after a long burst of activity; that much activity just to get one
response is pointless. For that reason, I have switched back to Fedora 18
as a client. The server is Fedora 19.

Mehmet Erol Sanliturk

> On Thu, Jul 23, 2015 at 12:40 PM, Rick Macklem wrote:
>
> > Ahmed Kamal wrote:
> > > rhel6 servers logs were flooded with errors like:
> > > http://paste2.org/EwLGcGF6
> > > The Freebsd box was being pounded with 40Mbps of nfs traffic .. probably
> > > Linux was retrying too hard ?! I had to reboot all PCs and after the last
> > > one, nfsd CPU usage dropped immediately to zero
> > >
> > Btw, it would be interesting to know what triggers these things (overload of
> > the nfs server resulting in very slow response or ???). Basically Destroy_session
> > isn't an operation that a client would normally do. I have no idea why the Linux
> > client would do it. (A session is what achieves the "exactly once" semantics for
> > the RPCs. It should really be in the RPC layer, but the NFSv4 working group put
> > it in NFSv4.1 because they didn't want to replace Sun RPC. I can't think of a reason
> > to destroy a session except on dismount. Maybe if the client thinks the session is
> > broken for some reason??)
> >
> > Maybe something like "vmstat -m", "vmstat -z" and "nfsstat -s -e" running repeatedly
> > (once/sec with timestamps via "date" or similar) so that you can see what was happening just
> > before the meltdowns.
> >
> > A raw packet trace of just when the meltdown starts would be useful, but I can't think
> > of how you'd get one of reasonable size.
> > Maybe having "tcpdump -s 0 -w .pcap "
> > run for 1sec and then kill/restart it repeatedly with different file names, so you might get
> > a useful 1sec capture at the critical time?
> >
> > Anyhow, good luck with it, rick
> >
> > > On Tue, Jul 21, 2015 at 5:52 AM, Ahmed Kamal <
> > > email.ahmedkamal@googlemail.com> wrote:
> > >
> > > > More info .. Just noticed nfsd is spinning the cpu at 500% :( I just did
> > > > the dtrace with:
> > > >
> > > > dtrace -n 'profile-1001 { @[stack()] = count(); }'
> > > > The result is at http://paste2.org/vb8ZdvF2 (scroll to bottom)
> > > >
> > > > Since rebooting the nfs server didn't fix it .. I imagine I'd have to
> > > > reboot all NFS clients .. This would be really sad .. Any advice is most
> > > > appreciated .. Thanks
> > > >
> > > >
> > > > On Tue, Jul 21, 2015 at 5:26 AM, Ahmed Kamal <
> > > > email.ahmedkamal@googlemail.com> wrote:
> > > >
> > > >> Hi folks,
> > > >>
> > > >> I've upgraded a test client to rhel6 today, and I'll keep an eye on it to
> > > >> see what happens.
> > > >>
> > > >> During the process, I made the (I guess mistake) of zfs send | recv to a
> > > >> locally attached usb disk for backup purposes .. long story short, sharenfs
> > > >> property on the received filesystem was causing some nfs/mountd errors in
> > > >> logs .. I wasn't too happy with what I got .. I destroyed the backup
> > > >> datasets and the whole pool eventually .. and then rebooted the whole nas
> > > >> box .. After reboot my logs are still flooded with
> > > >>
> > > >> Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session
> > > >> Jul 21 05:13:07 nas last message repeated 7536 times
> > > >> Jul 21 05:15:08 nas last message repeated 29664 times
> > > >>
> > > >> Not sure what that means .. or how it can be stopped .. Anyway, will keep
> > > >> you posted on progress.
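
The once-per-second sampling Rick describes above ("vmstat -m", "vmstat -z" and "nfsstat -s -e" with timestamps) could be sketched roughly as below. This is only a minimal sketch, assuming a POSIX sh on the NFS server; the function name sample_stats, the log path and the sample count are hypothetical and not from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): run each diagnostic command
# once per interval and prefix every sample with a timestamp, so the log
# can later be lined up with the moment a meltdown starts.
# Usage: sample_stats LOGFILE COUNT INTERVAL_SECONDS CMD...
sample_stats() {
    log=$1; count=$2; interval=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        for cmd in "$@"; do
            printf '=== %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$cmd" >> "$log"
            # Keep sampling even if one command fails or is missing.
            sh -c "$cmd" >> "$log" 2>&1 || true
        done
        i=$((i + 1))
        # Do not sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# On the FreeBSD NFS server, an hour of once-per-second samples might be
# collected like this (the log path is an assumption):
# sample_stats /var/tmp/nfs-stats.log 3600 1 "vmstat -m" "vmstat -z" "nfsstat -s -e"
```

Each sample starts with a "===" line carrying the time and the command, so searching the log for the timestamp just before nfsd started spinning shows what the counters looked like at that point.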
> > > >> > > > >> On Fri, Jul 17, 2015 at 9:31 PM, Rick Macklem > > > > >> wrote: > > > >> > > > >>> Graham Allan wrote: > > > >>> > I'm curious how things are going for you with this? > > > >>> > > > > >>> > Reading your thread did pique my interest since we have a lot of > > > >>> > Scientific Linux (RHEL clone) boxes with FreeBSD NFSv4 servers. I > > meant > > > >>> > to glance through our logs for signs of the same issue, but > today I > > > >>> > started investigating a machine which appeared to have hung > > processes, > > > >>> > high rpciod load, and high traffic to the NFS server. Of course > it > > is > > > >>> > exactly this issue. > > > >>> > > > > >>> > The affected machine is running SL5 though most of our server > > nodes are > > > >>> > now SL6. I can see errors from most of them but the SL6 systems > > appear > > > >>> > less affected - I see a stream of the sequence-id errors in their > > logs > > > >>> but > > > >>> > things in general keep working. The one SL5 machine I'm looking > at > > > >>> > has a single sequence-id error in today's logs, but then goes > into > > a > > > >>> > stream of "state recovery failed" then "Lock reclaim failed". > It's > > > >>> > probably partly related to the particular workload on this > machine. > > > >>> > > > > >>> > I would try switching our SL6 machines to NFS 4.1 to see if the > > > >>> > behaviour changes, but 4.1 isn't supported by our 9.3 servers (is > > it in > > > >>> > 10.1?). > > > >>> > > > > >>> Btw, I've done some testing against a fairly recent Fedora and > > haven't > > > >>> seen > > > >>> the problem. If either of you guys could load a recent Fedora on a > > test > > > >>> client > > > >>> box, it would be interesting to see if it suffers from this. (My > > > >>> experience is > > > >>> that the Fedora distros have more up to date Linux NFS clients.) > > > >>> > > > >>> rick > > > >>> > > > >>> > At the NFS servers, most of the sysctl settings are already tuned > > > >>> > from defaults. 
eg tcp.highwater=100000, > vfs.nfsd.tcpcachetimeo=300, > > > >>> > 128-256 nfs kernel threads. > > > >>> > > > > >>> > Graham > > > >>> > > > > >>> > On Fri, Jul 03, 2015 at 01:21:00AM +0200, Ahmed Kamal via > > freebsd-fs > > > >>> wrote: > > > >>> > > PS: Today (after adjusting tcp.highwater) I didn't get any > > screaming > > > >>> > > reports from users about hung vnc sessions. So maybe just > maybe, > > > >>> linux > > > >>> > > clients are able to somehow recover from this bad sequence > > messages. > > > >>> I > > > >>> > > could still see the bad sequence error message in logs though > > > >>> > > > > > >>> > > Why isn't the highwater tunable set to something better by > > default ? > > > >>> I mean > > > >>> > > this server is certainly not under a high or unusual load (it's > > only > > > >>> 40 PCs > > > >>> > > mounting from it) > > > >>> > > > > > >>> > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal > > > >>> > > > > >>> > > > wrote: > > > >>> > > > > > >>> > > > Thanks all .. I understand now we're doing the "right thing" > .. > > > >>> Although > > > >>> > > > if mounting keeps wedging, I will have to solve it somehow! > > Either > > > >>> using > > > >>> > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. > > > >>> > > > > > > >>> > > > Regarding Xin's patch, is it possible to build the patched > nfsd > > > >>> code, as > > > >>> > > > a > > > >>> > > > kernel module ? I'm looking to minimize my delta to upstream. > > > >>> > > > > > > >>> > > > Also would adopting Xin's patch and hiding it behind a > > > >>> > > > kern.nfs.allow_linux_broken_client be an option (I'm probably > > not > > > >>> the > > > >>> > > > last > > > >>> > > > person on earth to hit this) ? > > > >>> > > > > > > >>> > > > Thanks a lot for all the help! 
> > > >>> > > > > > > >>> > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem < > > > >>> rmacklem@uoguelph.ca> > > > >>> > > > >>> > > > wrote: > > > >>> > > > > > > >>> > > >> Ahmed Kamal wrote: > > > >>> > > >> > Appreciating the fruitful discussion! Can someone please > > > >>> explain to > > > >>> > > >> > me, > > > >>> > > >> > what would happen in the current situation (linux client > > doing > > > >>> this > > > >>> > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the > > effect > > > >>> of > > > >>> > > >> > that? > > > >>> > > >> Well, as you've seen, the Linux client doesn't function > > correctly > > > >>> > > >> against > > > >>> > > >> the FreeBSD server (and probably others that don't support > > this > > > >>> > > >> "skip-by-1" > > > >>> > > >> case). > > > >>> > > >> > > > >>> > > >> > What do users see? Any chances of data loss? > > > >>> > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess > > what > > > >>> the > > > >>> > > >> Linux > > > >>> > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. > You're > > the > > > >>> guy > > > >>> > > >> observing > > > >>> > > >> it. > > > >>> > > >> > > > >>> > > >> > > > > >>> > > >> > Also, I find it strange that netapp have acknowledged this > > is a > > > >>> bug on > > > >>> > > >> > their side, which has been fixed since then! > > > >>> > > >> Yea, I think Netapp screwed up. For some reason their server > > > >>> allowed > > > >>> > > >> this, > > > >>> > > >> then was fixed to not allow it and then someone decided that > > was > > > >>> broken > > > >>> > > >> and > > > >>> > > >> reversed it. > > > >>> > > >> > > > >>> > > >> > I also find it strange that I'm the first to hit this :) > Is > > no > > > >>> one > > > >>> > > >> running > > > >>> > > >> > nfs4 yet! > > > >>> > > >> > > > > >>> > > >> Well, it seems to be slowly catching on. I suspect that the > > Linux > > > >>> client > > > >>> > > >> mounting a Netapp is the most common use of it. 
Since it > > appears > > > >>> that > > > >>> > > >> they > > > >>> > > >> flip flopped w.r.t. who's bug this is, it has probably > > persisted. > > > >>> > > >> > > > >>> > > >> It may turn out that the Linux client has been fixed or it > may > > > >>> turn out > > > >>> > > >> that most servers allowed this "skip-by-1" even though David > > > >>> Noveck (one > > > >>> > > >> of the main authors of the protocol) seems to agree with me > > that > > > >>> it > > > >>> > > >> should > > > >>> > > >> not be allowed. > > > >>> > > >> > > > >>> > > >> It is possible that others have bumped into this, but it > > wasn't > > > >>> isolated > > > >>> > > >> (I wouldn't have guessed it, so it was good you pointed to > the > > > >>> RedHat > > > >>> > > >> discussion) > > > >>> > > >> and they worked around it by reverting to NFSv3 or similar. > > > >>> > > >> The protocol is rather complex in this area and changed > > > >>> completely for > > > >>> > > >> NFSv4.1, > > > >>> > > >> so many have also probably moved onto NFSv4.1 where this > > won't be > > > >>> an > > > >>> > > >> issue. > > > >>> > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics > > and > > > >>> doesn't > > > >>> > > >> use > > > >>> > > >> these seqid fields.) > > > >>> > > >> > > > >>> > > >> This is all just mho, rick > > > >>> > > >> > > > >>> > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < > > > >>> rmacklem@uoguelph.ca> > > > >>> > > >> wrote: > > > >>> > > >> > > > > >>> > > >> > > Julian Elischer wrote: > > > >>> > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: > > > >>> > > >> > > > > I am going to post to nfsv4@ietf.org to see what > they > > > >>> say. > > > >>> > > >> > > > > Please > > > >>> > > >> > > > > let me know if Xin Li's patch resolves your problem, > > even > > > >>> though > > > >>> > > >> > > > > I > > > >>> > > >> > > > > don't believe it is correct except for the > UINT32_MAX > > > >>> case. 
Good > > > >>> > > >> > > > > luck with it, rick > > > >>> > > >> > > > and please keep us all in the loop as to what they > say! > > > >>> > > >> > > > > > > >>> > > >> > > > the general N+2 bit sounds like bullshit to me.. its > > always > > > >>> N+1 in > > > >>> > > >> > > > a > > > >>> > > >> > > > number field that has a > > > >>> > > >> > > > bit of slack at wrap time (probably due to some > > ambiguity > > > >>> in the > > > >>> > > >> > > > original spec). > > > >>> > > >> > > > > > > >>> > > >> > > Actually, since N is the lock op already done, N + 1 is > > the > > > >>> next > > > >>> > > >> > > lock > > > >>> > > >> > > operation in order. Since lock ops need to be strictly > > > >>> ordered, > > > >>> > > >> allowing > > > >>> > > >> > > N + 2 (which means N + 2 would be done before N + 1) > > makes no > > > >>> sense. > > > >>> > > >> > > > > > >>> > > >> > > I think the author of the RFC meant that N + 2 or > greater > > > >>> fails, but > > > >>> > > >> it > > > >>> > > >> > > was poorly worded. > > > >>> > > >> > > > > > >>> > > >> > > I will pass along whatever I get from nfsv4@ietf.org. 
> > (There > > > >>> is an > > > >>> > > >> archive > > > >>> > > >> > > of it somewhere, but I can't remember where.;-) > > > >>> > > >> > > > > > >>> > > >> > > rick > > > >>> > > >> > > _______________________________________________ > > > >>> > > >> > > freebsd-fs@freebsd.org mailing list > > > >>> > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >>> > > >> > > To unsubscribe, send any mail to > > > >>> > > >> > > "freebsd-fs-unsubscribe@freebsd.org" > > > >>> > > >> > > > > > >>> > > >> > > > > >>> > > >> > > > >>> > > > > > > >>> > > > > > > >>> > > _______________________________________________ > > > >>> > > freebsd-fs@freebsd.org mailing list > > > >>> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >>> > > To unsubscribe, send any mail to " > > freebsd-fs-unsubscribe@freebsd.org > > > >>> " > > > >>> > > > > >>> > -- > > > >>> > > > > >>> > > ------------------------------------------------------------------------- > > > >>> > Graham Allan - allan@physics.umn.edu - gta@umn.edu - (612) > > 624-5040 > > > >>> > School of Physics and Astronomy - University of Minnesota > > > >>> > > > > >>> > > ------------------------------------------------------------------------- > > > >>> > _______________________________________________ > > > >>> > freebsd-fs@freebsd.org mailing list > > > >>> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >>> > To unsubscribe, send any mail to " > > freebsd-fs-unsubscribe@freebsd.org" > > > >>> > > > > >>> _______________________________________________ > > > >>> freebsd-fs@freebsd.org mailing list > > > >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >>> To unsubscribe, send any mail to " > freebsd-fs-unsubscribe@freebsd.org > > " > > > >>> > > > >> > > > >> > > > > > > > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to 
"freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Thu Jul 23 18:13:24 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7E2DC9A914A for ; Thu, 23 Jul 2015 18:13:24 +0000 (UTC) (envelope-from allan@physics.umn.edu) Received: from mail.physics.umn.edu (smtp.spa.umn.edu [128.101.220.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 59A101256 for ; Thu, 23 Jul 2015 18:13:22 +0000 (UTC) (envelope-from allan@physics.umn.edu) Received: from c-66-41-25-68.hsd1.mn.comcast.net ([66.41.25.68] helo=[192.168.0.2]) by mail.physics.umn.edu with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.77 (FreeBSD)) (envelope-from ) id 1ZIKzc-0006O5-5H; Thu, 23 Jul 2015 13:13:16 -0500 Message-ID: <55B12EB7.6030607@physics.umn.edu> Date: Thu, 23 Jul 2015 13:13:11 -0500 From: Graham Allan User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Ahmed Kamal , Rick Macklem CC: Ahmed Kamal via freebsd-fs Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) 
References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <5594B008.10202@freebsd.org> <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jul 2015 18:13:24 -0000 For our part, the user whose code triggered the pathological behaviour on SL5 reran it on SL6 without incident - I still see lots of sequence-id errors in the logs, but nothing bad happened. I'd still like to ask them to rerun again on SL5 to see if the "accept skipped seqid" patch had any effect, though I think we expect not. Maybe it would be nice if I could get set up to capture rolling tcpdumps of the nfs traffic before they run that though... Graham On 7/20/2015 10:26 PM, Ahmed Kamal wrote: > Hi folks, > > I've upgraded a test client to rhel6 today, and I'll keep an eye on it > to see what happens. > > During the process, I made the (I guess mistake) of zfs send | recv to a > locally attached usb disk for backup purposes .. long story short, > sharenfs property on the received filesystem was causing some nfs/mountd > errors in logs .. I wasn't too happy with what I got .. I destroyed the > backup datasets and the whole pool eventually .. and then rebooted the > whole nas box .. After reboot my logs are still flooded with > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session > Jul 21 05:13:07 nas last message repeated 7536 times > Jul 21 05:15:08 nas last message repeated 29664 times > > Not sure what that means .. or how it can be stopped .. Anyway, will > keep you posted on progress. 
-- ------------------------------------------------------------------------- Graham Allan - gta@umn.edu - allan@physics.umn.edu School of Physics and Astronomy - University of Minnesota ------------------------------------------------------------------------- From owner-freebsd-fs@freebsd.org Thu Jul 23 21:53:06 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7BB649A8C1C for ; Thu, 23 Jul 2015 21:53:06 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 26E001EBB for ; Thu, 23 Jul 2015 21:53:05 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DKBQALRbFV/61jaINbDguEOQaDHY4jsiUCghARAQEBAQEBAYEKhCQBAQQjVhACAQgYAgINGQICVwIEE4gutgCWFwEBAQEBAQQBAQEBAR2BIooqhBohCQ40B4JpgUMFhxKNTqVTAiaDP1oiMYEGQYEEAQEB X-IronPort-AV: E=Sophos;i="5.15,533,1432612800"; d="scan'208";a="227832744" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-annu.net.uoguelph.ca with ESMTP; 23 Jul 2015 17:53:04 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 4B56915F542; Thu, 23 Jul 2015 17:53:04 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id qA0w_Jp1aXN9; Thu, 23 Jul 2015 17:53:03 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id A41C115F55D; Thu, 23 Jul 2015 17:53:03 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id OpU7jTQRR4lP; Thu, 23 Jul 2015 17:53:03 
-0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 8602015F542; Thu, 23 Jul 2015 17:53:03 -0400 (EDT) Date: Thu, 23 Jul 2015 17:53:03 -0400 (EDT) From: Rick Macklem To: Graham Allan Cc: Ahmed Kamal , Ahmed Kamal via freebsd-fs Message-ID: <1935759160.2320694.1437688383362.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <55B12EB7.6030607@physics.umn.edu> References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> <55B12EB7.6030607@physics.umn.edu> Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.12] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: Linux NFSv4 clients are getting (bad sequence-id error!) Thread-Index: tjTKN2j4d0bPpf9zbbgJO12vPj66UQ== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jul 2015 21:53:06 -0000 Graham Allan wrote: > For our part, the user whose code triggered the pathological behaviour > on SL5 reran it on SL6 without incident - I still see lots of > sequence-id errors in the logs, but nothing bad happened. > > I'd still like to ask them to rerun again on SL5 to see if the "accept > skipped seqid" patch had any effect, though I think we expect not. Maybe > it would be nice if I could get set up to capture rolling tcpdumps of > the nfs traffic before they run that though... > > Graham > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote: > > Hi folks, > > > > I've upgraded a test client to rhel6 today, and I'll keep an eye on it > > to see what happens. 
> > > > During the process, I made the (I guess mistake) of zfs send | recv to a > > locally attached usb disk for backup purposes .. long story short, > > sharenfs property on the received filesystem was causing some nfs/mountd > > errors in logs .. I wasn't too happy with what I got .. I destroyed the > > backup datasets and the whole pool eventually .. and then rebooted the > > whole nas box .. After reboot my logs are still flooded with > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session > > Jul 21 05:13:07 nas last message repeated 7536 times > > Jul 21 05:15:08 nas last message repeated 29664 times > > > > Not sure what that means .. or how it can be stopped .. Anyway, will > > keep you posted on progress. > Oh, I didn't see the part about "reboot" before. Unfortunately, it sounds like the client isn't recovering after the session is lost. When the server reboots, the client(s) will get NFS4ERR_BAD_SESSION errors back because the server reboot has deleted all sessions. The NFS4ERR_BAD_SESSION should trigger state recovery on the client. (It doesn't sound like the clients went into recovery, starting with a Create_session operation, but without a packet trace, I can't be sure?) 
rick > > -- > ------------------------------------------------------------------------- > Graham Allan - gta@umn.edu - allan@physics.umn.edu > School of Physics and Astronomy - University of Minnesota > ------------------------------------------------------------------------- > > From owner-freebsd-fs@freebsd.org Thu Jul 23 21:55:21 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 56E199A8C9D for ; Thu, 23 Jul 2015 21:55:21 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wi0-x229.google.com (mail-wi0-x229.google.com [IPv6:2a00:1450:400c:c05::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E1F4D1F8D for ; Thu, 23 Jul 2015 21:55:20 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: by wicmv11 with SMTP id mv11so41598153wic.0 for ; Thu, 23 Jul 2015 14:55:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=XU8wq0SZg+PO6Y1V8wZhjDOJZiar8OwfF5IaG0XjEdQ=; b=IonJAiQgn06ywcW1bo+pRh+xBDfNR3W52iJZLgDgBpvU7/jYwgubEsBezXKiPo8qva SFQAfyiB0ALdq1SOQ/yB9ZNPpXM1sWYAgIIXTWyUusPYZS2z6C37FJ0+Y4DaBNQihxmA 6wnrhFoNvGD+xE2rNzF16alWPBREdX9h69Xg0PbmrRosLvWhyn5hM9aP1k6dRXjR87DN DoQDZNw8Hn9YRaVVeCakPIQ+j6IkElXSizSSx2elWbd5lrFhUITv78lVAo0TStSN4h+b uf5/jdvi/4jznyPtxbgwvyFAebWFkziBZYjPwBqVi5Zuu6sOK4e1tC1ULTJ+OPTE8CuL xoXg== X-Received: by 10.180.20.198 with SMTP id p6mr755888wie.38.1437688518033; Thu, 23 Jul 2015 14:55:18 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Thu, 23 Jul 2015 14:54:58 -0700 (PDT) In-Reply-To: <1935759160.2320694.1437688383362.JavaMail.zimbra@uoguelph.ca> References: 
<684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> <55B12EB7.6030607@physics.umn.edu> <1935759160.2320694.1437688383362.JavaMail.zimbra@uoguelph.ca> From: Ahmed Kamal Date: Thu, 23 Jul 2015 23:54:58 +0200 Message-ID: Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) To: Rick Macklem Cc: Graham Allan , Ahmed Kamal via freebsd-fs Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jul 2015 21:55:21 -0000 Can you please let me know the ultimate packet trace command I'd need to run in case of any nfs4 troubles .. I guess this should be comprehensive even at the expense of a larger output size (which we can trim later).. Thanks a lot for the help! On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem wrote: > Graham Allan wrote: > > For our part, the user whose code triggered the pathological behaviour > > on SL5 reran it on SL6 without incident - I still see lots of > > sequence-id errors in the logs, but nothing bad happened. > > > > I'd still like to ask them to rerun again on SL5 to see if the "accept > > skipped seqid" patch had any effect, though I think we expect not. Maybe > > it would be nice if I could get set up to capture rolling tcpdumps of > > the nfs traffic before they run that though... > > > > Graham > > > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote: > > > Hi folks, > > > > > > I've upgraded a test client to rhel6 today, and I'll keep an eye on it > > > to see what happens. > > > > > > During the process, I made the (I guess mistake) of zfs send | recv to > a > > > locally attached usb disk for backup purposes .. 
long story short, > > > sharenfs property on the received filesystem was causing some > nfs/mountd > > > errors in logs .. I wasn't too happy with what I got .. I destroyed the > > > backup datasets and the whole pool eventually .. and then rebooted the > > > whole nas box .. After reboot my logs are still flooded with > > > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session > > > Jul 21 05:13:07 nas last message repeated 7536 times > > > Jul 21 05:15:08 nas last message repeated 29664 times > > > > > > Not sure what that means .. or how it can be stopped .. Anyway, will > > > keep you posted on progress. > > > Oh, I didn't see the part about "reboot" before. Unfortunately, it sounds > like the > client isn't recovering after the session is lost. When the server > reboots, the > client(s) will get NFS4ERR_BAD_SESSION errors back because the server > reboot has > deleted all sessions. The NFS4ERR_BAD_SESSION should trigger state > recovery on the client. > (It doesn't sound like the clients went into recovery, starting with a > Create_session > operation, but without a packet trace, I can't be sure?) 
> > rick > > > > > -- > > ------------------------------------------------------------------------- > > Graham Allan - gta@umn.edu - allan@physics.umn.edu > > School of Physics and Astronomy - University of Minnesota > > ------------------------------------------------------------------------- > > > > > From owner-freebsd-fs@freebsd.org Thu Jul 23 21:59:19 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1C5AF9A8D9C for ; Thu, 23 Jul 2015 21:59:19 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id C5AB91246 for ; Thu, 23 Jul 2015 21:59:18 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DLBQBNPLFV/61jaINbGYQ5BoMdjiOyJQKCDBEBAQEBAQEBgQqEIwEBAQECASMEUgULAgEIDgoCAg0ZAgJXAgQTiCYItWiWFgEBAQEBAQQBAQEBAR2BIooqhBohCQ40B4JpgUMFhxKFLYghjX6EHZM4AiaEGSIxgQZBgQQBAQE X-IronPort-AV: E=Sophos;i="5.15,532,1432612800"; d="scan'208";a="226072488" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 23 Jul 2015 17:59:11 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 2442C15F542; Thu, 23 Jul 2015 17:59:11 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id TrkNQndjioBd; Thu, 23 Jul 2015 17:59:09 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 26D4C15F55D; Thu, 23 Jul 2015 17:59:09 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, 
port 10026) with ESMTP id 7jF37sCYS-hr; Thu, 23 Jul 2015 17:59:09 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 09AE715F542; Thu, 23 Jul 2015 17:59:09 -0400 (EDT) Date: Thu, 23 Jul 2015 17:59:09 -0400 (EDT) From: Rick Macklem To: Ahmed Kamal Cc: Graham Allan , Ahmed Kamal via freebsd-fs Message-ID: <576106597.2326662.1437688749018.JavaMail.zimbra@uoguelph.ca> In-Reply-To: References: <684628776.2772174.1435793776748.JavaMail.zimbra@uoguelph.ca> <20150716235022.GF32479@physics.umn.edu> <184170291.10949389.1437161519387.JavaMail.zimbra@uoguelph.ca> <55B12EB7.6030607@physics.umn.edu> <1935759160.2320694.1437688383362.JavaMail.zimbra@uoguelph.ca> Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.12] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: Linux NFSv4 clients are getting (bad sequence-id error!) Thread-Index: 9YgEn3j29XZHVRi6Xn/biaCGqqGDxA== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jul 2015 21:59:19 -0000

Ahmed Kamal wrote:
> Can you please let me know the ultimate packet trace command I'd need to
> run in case of any nfs4 troubles .. I guess this should be comprehensive
> even at the expense of a larger output size (which we can trim later)..
> Thanks a lot for the help!
>
tcpdump -s 0 -w <file>.pcap host <client-host>
(<file> refers to a file name you choose and <client-host> refers to the host name of a client generating traffic.)
--> But you won't be able to allow this to run for long during the storm or the file will be huge.

Then you look at <file>.pcap in wireshark, which knows NFS.
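
Because a single capture file grows quickly during a storm, one possible variation is to let tcpdump cycle through a fixed set of size-limited files, along the lines of the "rolling tcpdumps" mentioned earlier in the thread. The sketch below only prints such a command line for review. The function name nfs_capture_cmd, the host name, the output path, the sizes, and the filter on the standard NFS port 2049 are illustrative assumptions and do not come from this thread; -C and -W are tcpdump's file-size and file-count options.

```shell
#!/bin/sh
# Sketch: print a tcpdump command line that captures one client's NFS
# traffic into a ring of size-limited files.
# nfs_capture_cmd and every value below are hypothetical examples.
nfs_capture_cmd() {
    client="$1"            # host name or IP of the NFS client to watch
    prefix="$2"            # output file name prefix, e.g. /var/tmp/nfs4
    size_mb="${3:-100}"    # -C: start a new file after about this many million bytes
    count="${4:-10}"       # -W: keep at most this many files, then overwrite the oldest
    # -s 0 captures whole packets so wireshark can decode the NFSv4 compounds.
    # "port 2049" assumes the server uses the standard NFS port.
    echo "tcpdump -s 0 -C $size_mb -W $count -w $prefix.pcap host $client and port 2049"
}

# Print the command for review; run it as root once it looks right.
nfs_capture_cmd client.example.org /var/tmp/nfs4
```

With -C and -W together, tcpdump appends a number to each file name and reuses the oldest file once the limit is reached, so disk use stays bounded at roughly size times count. The trade-off is that older traffic is lost, so the capture should be stopped soon after the problem shows up.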
rick > On Thu, Jul 23, 2015 at 11:53 PM, Rick Macklem wrote: > > > Graham Allan wrote: > > > For our part, the user whose code triggered the pathological behaviour > > > on SL5 reran it on SL6 without incident - I still see lots of > > > sequence-id errors in the logs, but nothing bad happened. > > > > > > I'd still like to ask them to rerun again on SL5 to see if the "accept > > > skipped seqid" patch had any effect, though I think we expect not. Maybe > > > it would be nice if I could get set up to capture rolling tcpdumps of > > > the nfs traffic before they run that though... > > > > > > Graham > > > > > > On 7/20/2015 10:26 PM, Ahmed Kamal wrote: > > > > Hi folks, > > > > > > > > I've upgraded a test client to rhel6 today, and I'll keep an eye on it > > > > to see what happens. > > > > > > > > During the process, I made the (I guess mistake) of zfs send | recv to > > a > > > > locally attached usb disk for backup purposes .. long story short, > > > > sharenfs property on the received filesystem was causing some > > nfs/mountd > > > > errors in logs .. I wasn't too happy with what I got .. I destroyed the > > > > backup datasets and the whole pool eventually .. and then rebooted the > > > > whole nas box .. After reboot my logs are still flooded with > > > > > > > > Jul 21 05:12:36 nas kernel: nfsrv_cache_session: no session > > > > Jul 21 05:13:07 nas last message repeated 7536 times > > > > Jul 21 05:15:08 nas last message repeated 29664 times > > > > > > > > Not sure what that means .. or how it can be stopped .. Anyway, will > > > > keep you posted on progress. > > > > > Oh, I didn't see the part about "reboot" before. Unfortunately, it sounds > > like the > > client isn't recovering after the session is lost. When the server > > reboots, the > > client(s) will get NFS4ERR_BAD_SESSION errors back because the server > > reboot has > > deleted all sessions. The NFS4ERR_BAD_SESSION should trigger state > > recovery on the client. 
> > (It doesn't sound like the clients went into recovery, starting with a > > Create_session > > operation, but without a packet trace, I can't be sure?) > > > > rick > > > > > > > > -- > > > ------------------------------------------------------------------------- > > > Graham Allan - gta@umn.edu - allan@physics.umn.edu > > > School of Physics and Astronomy - University of Minnesota > > > ------------------------------------------------------------------------- > > > > > > > > > From owner-freebsd-fs@freebsd.org Sat Jul 25 11:28:42 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 859349AAA0F for ; Sat, 25 Jul 2015 11:28:42 +0000 (UTC) (envelope-from jjuanino@gmail.com) Received: from mail-oi0-x232.google.com (mail-oi0-x232.google.com [IPv6:2607:f8b0:4003:c06::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 478401AED for ; Sat, 25 Jul 2015 11:28:42 +0000 (UTC) (envelope-from jjuanino@gmail.com) Received: by oihq81 with SMTP id q81so31547853oih.2 for ; Sat, 25 Jul 2015 04:28:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=Akm/d1aF5FXzY3Yar4dem34KSkL2af8vCLh3/G8k65E=; b=ALhdQjHaPPoKYTN5KxyPI9crvlBkDRhiFbhFXgq01ajcIbgWtaJkIIMeVI4JjyMTlW ufhYRFsuSJ9ws7NVyd1vknYtAWSfv9/YoKGQmW9hUXu7WCnTsZuNPwZLTBi5VgZW5TuX cV2562wjETCvkcFUr4fFgX8mng5NN8uDA5xg9+u/H6Xl8dBMnB7L6FhApVlIAH/xmiKa uNfBnuyzXxTxMKujavVKd6S1MHzvgNoKEQbW276gtlqQIvX6iTJwq74gLl77izve+Mo8 LeNO04OEKnYy8jfL1kWC7+9AbzKLH8PUlAZCoLwVU6iL3vtSIY7GHdAxCF8HjTH0B5SK 6fTw== MIME-Version: 1.0 X-Received: by 10.202.136.139 with SMTP id k133mr18700783oid.7.1437823721363; Sat, 25 Jul 2015 04:28:41 -0700 (PDT) Received: by 
10.202.212.15 with HTTP; Sat, 25 Jul 2015 04:28:41 -0700 (PDT) Date: Sat, 25 Jul 2015 13:28:41 +0200 Message-ID: Subject: ZFS related panic: Memory modified after free From: =?UTF-8?B?Sm9zw6kgR2FyY8OtYSBKdWFuaW5v?= To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jul 2015 11:28:42 -0000

Hi FreeBSD-fs,

I have installed the latest development snapshot, FreeBSD-11.0-CURRENT-amd64-20150722-r285794, with the following settings:

* ZFS guided
* ZFS encrypted option enabled (GELI)

The only tunables I have added to /boot/loader.conf are:

i915kms_load="YES"
vfs.zfs.arc_max="1024M"

Yes, I have limited the ARC to 1G, because the system swaps heavily with the default setting.

With this setup, the panic is easy to reproduce when the system is stressed by running a buildworld alongside other activity, such as "du -hs /usr/ports" and browsing the web. The hardware is fairly old (Dell Latitude D630, 4G RAM). Full details here: http://pastebin.com/raw.php?i=vHk6GLy6

If needed, I can submit a bug report, but before doing that I want to make sure that the panic is not related to my specific hardware or my ARC setting.

Best regards.