From owner-freebsd-stable@FreeBSD.ORG Fri Oct 31 01:49:49 2014
Date: Thu, 30 Oct 2014 21:49:47 -0400 (EDT)
From: Rick Macklem
To: Garrett Wollman
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, freebsd-stable@freebsd.org
Message-ID: <928219131.2682604.1414720187244.JavaMail.root@uoguelph.ca>
In-Reply-To: <1902145956.2676513.1414719094052.JavaMail.root@uoguelph.ca>
Subject: Re: Definite NFS bug
List-Id: Production branch of FreeBSD source code

I
wrote:
> Garrett Wollman wrote:
> > Like many other users, I upgrade my FreeBSD servers by NFS-mounting
> > /usr/src and /usr/obj from a shared build server.[1]  Since I upgraded
> > the build server to 9.3, clients running 9.3 kernels have been
> > randomly erroring out during installkernel and installworld.  Today I
> > had some time to look more closely into this and found that the error
> > is definitely coming from the server: at some point, it just randomly
> > starts returning errors to client ACCESS and GETATTR operations.  The
> > errors are a mix of NFS3ERR_IO and NFS3ERR_ACCES, but there is nothing
> > on the server to indicate any kind of error, and restarting the
> > operation on the client causes it to fail in a different place.  With
> > enough patience and restarts, it's possible to complete the
> > installation in just four or five passes.
> >
> > Needless to say this is a bit worrying.  Strangely, 9.1 and 9.2
> > clients don't see this issue at all; it's only 9.3 clients that
> > break.
> >
> > It's easy to reproduce, just 'cd /usr/src && find . -type f >/dev/null'.
> > It does not seem to depend on the client NFS version (3 or 4) or
> > implementation ("old" or "new").  I haven't tried the "old" server yet
> > -- I'll need to figure out how to do that first.
> >

Oh, and it wasn't clear to me whether you are seeing this only against a
9.3 server. (If you get the same outcome testing against an older server,
then it seems to be a client side issue.)

If that is the case, I'd suggest you try a pre-r261056 (one of the
changes was r261056, not r261057) stable/9 kernel.

On a closer look, most of the kernel rpc changes are for the server side.
(Most of the client side commits just change the copyright, but there
are a couple of client side changes beyond that.)

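To help pin down where the failures start, the tree walk from the quoted reproduction could be scripted so that each failing path is logged together with its errno. What follows is only a rough sketch, not something from this thread: scan_tree and summarize are hypothetical helper names, and it assumes the client reports NFS3ERR_IO and NFS3ERR_ACCES to userland as EIO and EACCES.

```python
import errno
import os
from collections import Counter


def _code(exc):
    """Return the symbolic errno name (e.g. 'EIO') for an OSError."""
    return errno.errorcode.get(exc.errno, str(exc.errno))


def scan_tree(root):
    """Walk root and lstat every non-directory entry.

    Returns (checked, failures), where failures is a list of
    (path, errno_name) tuples in the order the errors were seen.
    """
    checked = 0
    failures = []

    def on_walk_error(exc):
        # os.walk calls this when a directory cannot be listed.
        failures.append((exc.filename, _code(exc)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Assumption: on an NFS mount this asks the client for the
                # file's attributes, much like find(1) does.
                os.lstat(path)
                checked += 1
            except OSError as exc:
                failures.append((path, _code(exc)))
    return checked, failures


def summarize(failures):
    """Count failures per errno name, e.g. {'EIO': 3, 'EACCES': 1}."""
    return dict(Counter(code for _path, code in failures))


# Example use (the path is the mount point mentioned in the thread):
#   checked, failures = scan_tree("/usr/src")
#   print(checked, summarize(failures), failures[:1])
```

Running this against the same mount from a 9.2 and a 9.3 client, or before and after swapping kernels, would give a comparable count and the first failing path for each run.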
> Well, I took a quick look and, if I got it correct, there is a single
> line change in the "old" client between 9.2 and 9.3, which defined
> an otherwise unused mount flag called NFSMNT_NONCONTIGWR. (It is
> only used by the new client when "nocontigwr" is specified.)
>
> However, there were some fairly extensive changes done (mostly by mav@)
> to the kernel rpc (sys/rpc), which is used by both clients and both
> servers.
> Most of these changes were committed to stable/9 as r261057, r261058.
> If you could build a kernel from stable/9 just prior to r261057 and see
> if that client runs into the problem, it could help determine if these
> changes are causing the problem.
> Alternately, running the 9.3 system with a 9.2 sys/rpc (if it
> links/runs) could also help show whether the kernel rpc is the culprit.
> (You can load the kernel rpc as a module, but it's linked into most
> kernels.)
>
> If it doesn't turn out to be in the kernel rpc, my next guess would
> be changes to the net device driver. (To check for this you could use
> a different type of hardware device or the 9.2 driver on the 9.3
> system, maybe?)
>
> The "new" client has some changes 9.2->9.3, but since nothing changed
> for the "old" client and you see the problem with the "old" one, I
> think the NFS client is not the culprit.
>
> rick
>
> > If anyone is willing to help debug this, I can share a packet trace,
> > but I don't think it's very informative.  Also, if anyone has a good
> > dtrace script that I could run on the server that would report what's
> > going on when that first NFS3ERR_IO is returned, that would be great.
> >
> > -GAWollman
> >
> > [1] I'd run my own freebsd-update server but unfortunately it is too
> > tied to building things that look like official FreeBSD security
> > updates, and isn't really designed for (e.g.) updating kernels when we
> > change a configuration option.
> > It also doesn't have any obvious knobs
> > for building with anything other than a default {make,src}.conf.
> > And with a pkg-able base just around the corner I don't really want to
> > put much effort into making freebsd-update do what I want.  NFS, on
> > the other hand, is a big deal and so I need to track down and fix
> > these bugs.
> >
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to
> "freebsd-stable-unsubscribe@freebsd.org"
>