From: Hartmut Brandt
Date: Tue, 14 May 2013 09:14:41 +0200
To: Rick Macklem
Cc: current@freebsd.org
Subject: Re: files disappearing from ls on NFS
In-Reply-To: <970256490.323235.1368474673678.JavaMail.root@erie.cs.uoguelph.ca>

On Mon, 13 May 2013, Rick Macklem wrote:

RM>Hartmut Brandt wrote:
RM>> On Sun, 12 May 2013, Rick Macklem wrote:
RM>>
RM>> RM>Hartmut Brandt wrote:
RM>> RM>> Hi,
RM>> RM>>
RM>> RM>> I've updated one of my -current machines this week (previous update
RM>> RM>> was in February).
RM>> RM>> Now I see a strange effect (it seems only on NFS mounts): ls or
RM>> RM>> even echo * will list only some files (strangely enough, the first
RM>> RM>> files from the normal, alphabetically ordered list). If I change
RM>> RM>> something in the directory (delete a file or create a new one), the
RM>> RM>> complete listing will appear for some time, but after some time
RM>> RM>> (seconds to a minute or so) again only part of the files is listed.
RM>> RM>>
RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that getdirentries is
RM>> RM>> called only once (returning 4096). For a full listing getdirentries
RM>> RM>> is called 5 times with the last returning 0.
RM>> RM>>
RM>> RM>> I can still open files that are not listed if I know their name,
RM>> RM>> though.
RM>> RM>>
RM>> RM>> The NFS server is a Windows 2008 server with an OpenText NFS Server
RM>> RM>> which works without problems to all the other FreeBSD machines.
RM>> RM>>
RM>> RM>> So what could that be?
RM>> RM>>
RM>> RM>I've attached a patch that might be worth trying. It is a "shot in the dark",
RM>> RM>but brings the new NFS client's readdir closer to the old one (which you
RM>> RM>mentioned still works ok).
RM>> RM>
RM>> RM>Please let me know how it goes, if you have a chance to test it, rick
RM>>
RM>> Hi Rick,
RM>>
RM>> the patch doesn't help.
RM>>
RM>> I wrote a small test program, which opens a directory, calls getdents(2)
RM>> in a loop and dumps that. I figured out that the return of the system
RM>> call depends on the buffer size I pass to it. The directory has a block
RM>> size of 4k according to fstat(2). If I use that, I get some 300 of the
RM>> almost 500 directory entries. If I use 8k, I get just around 200 and if I
RM>> use 16k I get a handful.
RM>> If I dump the buffer in this case I see 0x200
RM>> bytes filled with directory entries, then a lot of zeros and starting from
RM>> 0x1000 again data. This is of course ignored because of the zeros
RM>> before.
RM>>
RM>And for this case getdents(2) returned 16K? It is normal for getdents(2)
RM>to return less than requested and when end of dir occurs, it should return 0.
RM>
RM>But if it returns 16K, there shouldn't be zeroed space in the middle of
RM>it.
RM>
RM>And this always occurs or only after you wait a while? (You noted in the
RM>above description that it would be ok for a little while after a directory
RM>change and then would break, which suggests some kind of caching problem.)

This morning everything was fine. After waiting 5 minutes, again only
partial directories. When I do a read with 8k buffer size, getdents(2)
returns 8k, but from 0x200 to 0x1000 the buffer is filled with zeros.
The entry just before the zeros ends exactly at 0x200 (that would be the
first byte of the next entry) and at 0x1000 a new entry starts. The rest of
the buffer is fine. The next read returns only 4k and seems to be fine,
although it contains some junk non-zero bytes in the padding.

Ten minutes later everything is fine again.

I tried to spy on the NFS communication with tcpdump, but it seems
unwilling to display anything useful about the NFS traffic. Is it able to
decode the readdir stuff?

harti
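
The truncated listings are consistent with how a directory buffer is usually
walked: each record's d_reclen gives the offset of the next record, so a
record length of zero ends the walk and anything after the zeroed region is
never seen. Below is a minimal sketch of that effect in Python. It assumes
the older FreeBSD dirent layout (32-bit d_fileno, 16-bit d_reclen, 8-bit
d_type, 8-bit d_namlen, then the name). The helpers pack_entry, make_block
and walk and the file names are hypothetical, and this is a model of a
typical consumer, not the actual libc or NFS client code.

```python
import struct

# Assumed record header, modelled on the older FreeBSD struct dirent:
# d_fileno (uint32), d_reclen (uint16), d_type (uint8), d_namlen (uint8).
HEADER = struct.Struct("<IHBB")
DT_REG = 8  # regular file


def pack_entry(fileno, name, reclen=None):
    """Pack one record; reclen defaults to the 4-byte aligned minimum."""
    raw = name.encode()
    minimum = (HEADER.size + len(raw) + 1 + 3) & ~3  # header + name + NUL
    if reclen is None:
        reclen = minimum
    body = HEADER.pack(fileno, reclen, DT_REG, len(raw)) + raw
    return body.ljust(reclen, b"\0")


def make_block(names, first_ino, size):
    """Pack names, stretching the last record so the block is `size` bytes."""
    out = b""
    for i, name in enumerate(names):
        last = i == len(names) - 1
        out += pack_entry(first_ino + i, name, size - len(out) if last else None)
    return out


def walk(buf):
    """Return the names found by following d_reclen from offset 0."""
    names, off = [], 0
    while off + HEADER.size <= len(buf):
        _fileno, reclen, _dtype, namlen = HEADER.unpack_from(buf, off)
        if reclen == 0:  # zeroed region: no way to find the next record
            break
        start = off + HEADER.size
        names.append(buf[start:start + namlen].decode())
        off += reclen
    return names


# Healthy 8k buffer: two 4k blocks, each fully covered by its records.
healthy = make_block(["a.c", "b.c"], 1, 0x1000) + make_block(["y.c", "z.c"], 100, 0x1000)

# Damaged 8k buffer as described above: entries end at 0x200,
# zeros up to 0x1000, then valid entries again.
damaged = (make_block(["a.c", "b.c"], 1, 0x200)
           + bytes(0x1000 - 0x200)
           + make_block(["y.c", "z.c"], 100, 0x1000))

print(walk(healthy))  # ['a.c', 'b.c', 'y.c', 'z.c']
print(walk(damaged))  # ['a.c', 'b.c']
```

In the damaged case the system call still reports 8k, but the walker stops
at offset 0x200, which matches the partial output of ls.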