Date: Tue, 7 May 2013 09:12:20 +0200
From: Hartmut Brandt
To: Rick Macklem
Cc: current@freebsd.org, ato@iem.pw.edu.pl
Subject: Re: files disappearing from ls on NFS

On Mon, 6 May 2013, Rick Macklem wrote:

RM>Hartmut Brandt wrote:
RM>> Hi Rick,
RM>>
RM>> the patch doesn't help. So how can I help to fix that? Of course, I
RM>> can use the work-around with oldnfs, but ...
RM>>
RM>Well, I plan on going through the readdir code and seeing if I can spot
RM>a case that would break for small RPC replies.
RM>If I can find something, I'll email you a patch for testing. (I can't seem
RM>to reproduce the problem here.)
RM>
RM>The mysterious part for me is why it has shown up recently, because there
RM>hasn't been any recent change committed that seems like it could cause this.
RM>(Maybe it is just a coincidence that it showed up recently and the bug has
RM> been there all along?)
RM>
RM>I'll admit my worst fear is that it is somehow caused by the switch to clang
RM>for certain arches. If that is the case, it could take a long time to isolate.

I'm quite sure that I built the system in February with clang already.
But in March or so a new clang version was committed.

harti

RM>> -----Original Message-----
RM>> From: Rick Macklem [mailto:rmacklem@uoguelph.ca]
RM>> Sent: Saturday, May 04, 2013 11:33 PM
RM>> To: Brandt, Hartmut
RM>> Cc: current@freebsd.org; Andrzej Tobola
RM>> Subject: Re: files disappearing from ls on NFS
RM>>
RM>> Hartmut Brandt wrote:
RM>> > On Fri, 3 May 2013, Rick Macklem wrote:
RM>> >
RM>> > RM>Ok, if you succeed in isolating the commit, that would be great.
RM>> >
RM>> > Hmm. I'm somewhat stuck. clang from yesterday can't compile clang
RM>> > from a month ago...
RM>> >
RM>> > harti
RM>> >
RM>> Oh well. You could try this patch (which is the one to fix readdir for
RM>> union mounts), since I can see that VOP_VPTOCNP() will also be broken
RM>> without it. (I can't see how that would break "ls", but it breaks
RM>> __getcwd() and friends, so maybe it can affect "ls" somehow?)
RM>>
RM>> It's a cut/paste under windows, so I'm afraid the whitespace will be
RM>> messed up, but it's pretty simple to apply by hand.
RM>>
RM>> Index: nfs_clvnops.c
RM>> ===================================================================
RM>> --- nfs_clvnops.c (revision 249568)
RM>> +++ nfs_clvnops.c (working copy)
RM>> @@ -2221,6 +2221,7 @@
RM>>       !NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
RM>>           mtx_unlock(&np->n_mtx);
RM>>           NFSINCRGLOBAL(newnfsstats.direofcache_hits);
RM>> +         *ap->a_eofflag = 1;
RM>>           return (0);
RM>>       } else
RM>>           mtx_unlock(&np->n_mtx);
RM>> @@ -2233,8 +2234,10 @@
RM>>       tresid = uio->uio_resid;
RM>>       error = ncl_bioread(vp, uio, 0, ap->a_cred);
RM>>
RM>> -     if (!error && uio->uio_resid == tresid)
RM>> +     if (!error && uio->uio_resid == tresid) {
RM>>           NFSINCRGLOBAL(newnfsstats.direofcache_misses);
RM>> +         *ap->a_eofflag = 1;
RM>> +     }
RM>>       return (error);
RM>>   }
RM>>
RM>> I haven't yet succeeded in reproducing the problem, but will be poking
RM>> at it some more, rick
RM>>
RM>> > RM>
RM>> > RM>rick
RM>> > RM>
RM>> > RM>> harti
RM>> > RM>>
RM>> > RM>> On Fri, 3 May 2013, Rick Macklem wrote:
RM>> > RM>>
RM>> > RM>> RM>Hartmut Brandt wrote:
RM>> > RM>> RM>> Hi,
RM>> > RM>> RM>>
RM>> > RM>> RM>> I've updated one of my -current machines this week (previous update was in
RM>> > RM>> RM>> February). Now I see a strange effect (it seems only on NFS mounts): ls or
RM>> > RM>> RM>> even echo * will list only some files (strangely enough the first files from
RM>> > RM>> RM>> the normal, alphabetically ordered list). If I change something in the
RM>> > RM>> RM>> directory (delete a file or create a new one) for some time the complete
RM>> > RM>> RM>> listing will appear but after some time (seconds to a minute or so) again
RM>> > RM>> RM>> only part of the files is listed.
RM>> > RM>> RM>>
RM>> > RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that getdirentries is called
RM>> > RM>> RM>> only once (returning 4096). For a full listing getdirentries is called 5
RM>> > RM>> RM>> times with the last returning 0.
RM>> > RM>> RM>>
RM>> > RM>> RM>> I can still open files that are not listed if I know their name,
RM>> > RM>> RM>> though.
RM>> > RM>> RM>>
RM>> > RM>> RM>> The NFS server is a Windows 2008 server with an OpenText NFS Server which
RM>> > RM>> RM>> works without problems with all the other FreeBSD machines.
RM>> > RM>> RM>>
RM>> > RM>> RM>> So what could that be?
RM>> > RM>> RM>>
RM>> > RM>> RM>Someone else reported missing files returned via "ls" recently, when
RM>> > RM>> RM>they used a small readdirsize (below 8K). I haven't yet had a chance to try
RM>> > RM>> RM>and reproduce it or do any snooping around.
RM>> > RM>> RM>
RM>> > RM>> RM>There haven't been any recent changes to readdir in the NFS client,
RM>> > RM>> RM>except a trivial one that adds a check for vnode type being VDIR,
RM>> > RM>> RM>so I don't see that it can be a recent NFS change.
RM>> > RM>> RM>
RM>> > RM>> RM>If you can increase the readdirsize, try that to see if it avoids
RM>> > RM>> RM>the problem. "nfsstat -m" shows you what the mount options end up
RM>> > RM>> RM>being after doing the mount. The server might be limiting the readdirsize
RM>> > RM>> RM>to 4K, so you should check, even if you specify a large value for
RM>> > RM>> RM>the mount.
RM>> > RM>> RM>
RM>> > RM>> RM>rick
RM>> > RM>> RM>
RM>> > RM>> RM>> Regards,
RM>> > RM>> RM>> harti
RM>> > RM>> RM>> _______________________________________________
RM>> > RM>> RM>> freebsd-current@freebsd.org mailing list
RM>> > RM>> RM>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
RM>> > RM>> RM>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
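
The symptom in the original report (a file that can be opened by name but is missing from the listing) could also be checked from a small program instead of ls. What follows is only a minimal sketch in C, assuming plain POSIX opendir(3)/readdir(3), which on FreeBSD are built on getdirentries(2). The helper names count_listed, is_listed and exists_by_name are made up for illustration; they do not come from the thread or from any FreeBSD tool.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: count the entries readdir(3) returns for `dir`
 * (this normally includes "." and "..").  Returns -1 if the directory
 * cannot be opened.
 */
long
count_listed(const char *dir)
{
	DIR *d;
	long n = 0;

	assert(dir != NULL);
	if ((d = opendir(dir)) == NULL)
		return (-1);
	while (readdir(d) != NULL)
		n++;
	closedir(d);
	return (n);
}

/*
 * Hypothetical helper: does `name` appear in the listing of `dir`?
 * Returns 1 if listed, 0 if not listed, -1 if `dir` cannot be opened.
 */
int
is_listed(const char *dir, const char *name)
{
	DIR *d;
	struct dirent *de;
	int found = 0;

	assert(dir != NULL && name != NULL);
	if ((d = opendir(dir)) == NULL)
		return (-1);
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, name) == 0) {
			found = 1;
			break;
		}
	}
	closedir(d);
	return (found);
}

/* Hypothetical helper: 1 if `path` can be looked up by name, else 0. */
int
exists_by_name(const char *path)
{
	struct stat sb;

	assert(path != NULL);
	return (stat(path, &sb) == 0);
}
```

If exists_by_name() returns 1 for a file while is_listed() returns 0 for the same name in its directory, that would match the behaviour described above. Comparing count_listed() shortly after modifying the directory and again a minute later might show the listing shrinking without relying on ls.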