From owner-freebsd-arm@freebsd.org Fri Mar 11 01:08:20 2016
Date: Thu, 10 Mar 2016 20:08:10 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Paul Mather
Cc: Ronald Klop, freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Message-ID: <2136530467.13386220.1457658490896.JavaMail.zimbra@uoguelph.ca>
References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca> <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu> <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca>
Subject: Re: Unstable NFS on recent CURRENT
List-Id: "Porting FreeBSD to ARM processors."
Paul Mather wrote:
> On Mar 9, 2016, at 8:59 PM, Rick Macklem wrote:
> > Paul Mather wrote:
> >> On Mar 8, 2016, at 7:49 PM, Rick Macklem wrote:
> >>> Paul Mather wrote:
> >>>> On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote:
> >>>>> Paul Mather (forwarded by Ronald Klop) wrote:
> >>>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather wrote:
> >>>>>>> On my BeagleBone Black running 11-CURRENT (r296162), I have lately
> >>>>>>> been having trouble with NFS. I have been doing a buildworld and
> >>>>>>> buildkernel with /usr/src and /usr/obj mounted via NFS. Recently,
> >>>>>>> this process has resulted in the buildworld failing at some point,
> >>>>>>> with a variety of errors (Segmentation fault; Permission denied;
> >>>>>>> etc.). Even an "ls -alR" of /usr/src doesn't manage to complete.
> >>>>>>> It errors out thus:
> >>>>>>>
> >>>>>>> =====
> >>>>>>> [[...]]
> >>>>>>> total 0
> >>>>>>> ls: ./.svn/pristine/fe: Permission denied
> >>>>>>>
> >>>>>>> ./.svn/pristine/ff:
> >>>>>>> total 0
> >>>>>>> ls: ./.svn/pristine/ff: Permission denied
> >>>>>>> ls: fts_read: Permission denied
> >>>>>>> =====
> >>>>>>>
> >>>>>>> On the console, I get the following:
> >>>>>>>
> >>>>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
> >>>>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER
> >>>>>>> OR MIDDLEWARE)
> >>>>>>>
> >>> Oh, I had forgotten this. Here's the comment related to this error
> >>> (about line #445 in sys/fs/nfsclient/nfs_clport.c):
> >>>
> >>> 446  * BROKEN NFS SERVER OR MIDDLEWARE
> >>> 447  *
> >>> 448  * Certain NFS servers (certain old proprietary filers ca.
> >>> 449  * 2006) or broken middleboxes (e.g. WAN accelerator products)
> >>> 450  * will respond to GETATTR requests with results for a
> >>> 451  * different fileid.
> >>> 452  *
> >>> 453  * The WAN accelerator we've observed not only serves stale
> >>> 454  * cache results for a given file, it also occasionally serves
> >>> 455  * results for wholly different files. This causes surprising
> >>> 456  * problems; for example the cached size attribute of a file
> >>> 457  * may truncate down and then back up, resulting in zero
> >>> 458  * regions in file contents read by applications. We observed
> >>> 459  * this reliably with Clang and .c files during parallel build.
> >>> 460  * A pcap revealed packet fragmentation and GETATTR RPC
> >>> 461  * responses with wholly wrong fileids.
> >>>
> >>> If you can connect the client->server with a simple switch (or just
> >>> an RJ45 cable), it might be worth testing that way. (I don't recall
> >>> the name of the middleware product, but I think it was shipped by one
> >>> of the major switch vendors. I also don't know if the product
> >>> supports NFSv4?)
> >>>
> >>> rick
> >>
> >> Currently, the client is connected to the server via a dumb gigabit
> >> switch, so it is already fairly direct.
> >>
> >> As for the above error, it appeared on the console only once. (Sorry
> >> if I made it sound like it appears every time.)
> >>
> >> I just tried another buildworld attempt via NFS and it failed again.
> >> This time, I get this on the BeagleBone Black console:
> >>
> >> nfs_getpages: error 13
> >> vm_fault: pager read error, pid 5401 (install)
> >>
> > 13 is EACCES and could be caused by what I mention below. (Any mount of
> > a file system on the server can cause this, unless "-S" is specified as
> > a flag for mountd.)
> >> The other thing I have noticed is that if I induce heavy load on the
> >> NFS server---e.g., by starting a Poudriere bulk build---then that
> >> provokes the client to crash much more readily. For example, I started
> >> an NFS buildworld on the BeagleBone Black, and it seemed to be chugging
> >> along nicely. The moment I kicked off a Poudriere build update of my
> >> packages on the NFS server, it crashed the buildworld on the NFS
> >> client.
> >>
> > Try adding "-S" to mountd_flags on the server. Any time file systems
> > are mounted (and Poudriere likes to do that, I am told), mount sends a
> > SIGHUP to mountd to reload /etc/exports. While /etc/exports is being
> > reloaded, there will be access errors for mounts (that are temporarily
> > not exported) unless you specify "-S" (which makes mountd suspend the
> > nfsd threads during the reload of /etc/exports).
> >
> > rick
>
> Bingo! I think we may have a winner. I added that flag to mountd_flags
> on the server and the "instability" appears to have gone away.
>
> It may be that all along the NFS problems on the client just coincided
> with Poudriere runs on the server. I build custom packages for my local
> machines using Poudriere, so I use it quite a lot. Maybe the Poudriere
> port should come with a warning at install time, for those using NFS,
> that it may provoke disruption, and suggest the addition of "-S"?
> (Alternatively, maybe "-S" could become a default for mountd_flags? Is
> there a downside to using it that makes it unsuitable as a default
> option?)

Well, the first time I proposed "-S", the collective felt it wasn't the
appropriate solution to the "export reload" problem. The second time, the
"collective" agreed that it was ok as a non-default option. (Part of this
story was an alternative to mountd called nfse, which did update exports
atomically, but it never made it into FreeBSD.)
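[The mechanism and the fix discussed above can be sketched with the stock
FreeBSD tools. This is an illustrative sketch, not a transcript from the
thread; it assumes the standard sysrc(8) and service(8) utilities and
mountd's usual /var/run/mountd.pid pid file.]

```shell
# Reproduce the trigger by hand: any mount(8)/umount(8) on the server
# sends SIGHUP to mountd, which re-reads /etc/exports. During that
# reload, file systems are briefly un-exported, and client RPCs can
# fail with EACCES (the "error 13" seen on the BeagleBone Black console).
kill -HUP "$(cat /var/run/mountd.pid)"

# The fix Paul applied: run mountd with -S, which suspends the nfsd
# threads while exports are reloaded, so client RPCs stall briefly
# instead of failing. Persist the flag (adds mountd_flags="-S" to
# /etc/rc.conf), then restart mountd to pick it up.
sysrc mountd_flags="-S"
service mountd restart

# Confirm the running daemon has the flag.
ps -ax -o command | grep '[m]ountd'
```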
The only downside to making it a default is that it does change behaviour,
and some might consider that a POLA violation. Others would consider it
just a bug fix. There was one report of long delays before exports got
updated on a very busy server. (I have a one-line patch that fixes this,
but it won't be committed into FreeBSD-current until April.)

Now that "-S" has been in FreeBSD for a couple of years, I am planning to
ask the "collective" (I usually post these kinds of things on freebsd-fs@)
to make it the default in FreeBSD-current, because this problem seems to
crop up fairly frequently. I will probably post about this in April, when
I can again do svn commits. I only recently found out that Poudriere does
mounts and causes this problem.

I may also commit a man page update (which can be MFC'd) mentioning that,
if you are using Poudriere, you want this flag. Having the same thing
mentioned in the Poudriere port install might be nice, too.

Thanks for testing this, rick

> Anyway, many, many thanks for all the help, Rick. I'll keep monitoring
> my BeagleBone Black, but it looks for now as though this has solved the
> NFS "instability."
>
> Cheers,
>
> Paul.