From owner-freebsd-current@FreeBSD.ORG Mon May 4 16:10:41 2015 Return-Path: Delivered-To: current@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 756C62B9 for ; Mon, 4 May 2015 16:10:41 +0000 (UTC) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps1.elischer.org", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 331D511F3 for ; Mon, 4 May 2015 16:10:40 +0000 (UTC) Received: from Julian-MBP3.local (ppp121-45-241-118.lns20.per4.internode.on.net [121.45.241.118]) (authenticated bits=0) by vps1.elischer.org (8.14.9/8.14.9) with ESMTP id t44GAYte025881 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Mon, 4 May 2015 09:10:37 -0700 (PDT) (envelope-from julian@freebsd.org) Message-ID: <554799F4.9030309@freebsd.org> Date: Tue, 05 May 2015 00:10:28 +0800 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Konstantin Belousov CC: Jilles Tjoelker , "current@freebsd.org" Subject: Re: seekdir/readdir patch .. Call for Review. References: <55432593.6050709@freebsd.org> <20150501161742.GW2390@kib.kiev.ua> <20150503143345.GB70576@stack.nl> <554787BA.1020105@freebsd.org> <20150504160437.GI2390@kib.kiev.ua> In-Reply-To: <20150504160437.GI2390@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 May 2015 16:10:41 -0000 On 5/5/15 12:04 AM, Konstantin Belousov wrote: > On Mon, May 04, 2015 at 10:52:42PM +0800, Julian Elischer wrote: >> On 5/3/15 10:33 PM, Jilles Tjoelker wrote: >>> On Fri, May 01, 2015 at 07:17:42PM +0300, Konstantin Belousov wrote: >>>> On Fri, May 01, 2015 at 03:04:51PM +0800, Julian Elischer wrote: >>>>> if you are interested in readdir(3), seekdir(3) and telldir(3) then >>>>> you should look at >>>>> https://reviews.freebsd.org/D2410 >>>>> this patches around a problem in seekdir() that breaks Samba. >>>>> Seekdir(3) will not work as expected when files prior to the point of >>>>> interest in directory have been deleted since the directory was opened. >>>>> Windows clients using Samba cause both these things to happen, causing >>>>> the next readdir(3) after the bad seekdir(3) to skip some entries and >>>>> return the wrong file. >>>>> Samba only needs to step back a single directory entry in the case >>>>> where it reads an entry and then discovers it can't fit it into the >>>>> buffer it is sending to the windows client. It turns out we can >>>>> reliably cater to Samba's requirement because the "last returned >>>>> element" is always still in memory, so with a little care, we can >>>>> set our filepointer back to it safely. (once) >>>>> seekdir and readdir (and telldir()) need a complete rewrite along with >>>>> getdirentries() but that is more than a small edit like this. >>>> Can you explain your expectations from the whole readdir() vs. parallel >>>> directory modifications interaction ? From what I understood so far, >>>> there is unlocked modification of the container and parallel iterator >>>> over the same container. IMO, in such situation, whatever tweaks you >>>> apply to the iterator, it is still cannot be made reliable. >>>> Before making single-purpose changes to the libc readdir and seekdir >>>> code, or to the kernel code, it would be useful to state exact behaviour >>>> of the dirent machinery we want to see. No, 'make samba works in my >>>> situation' does not sound good enough. >>> Consider the subsequence of entries that existed at opendir() time and >>> were not removed until now. This subsequence is clearly defined and does >>> not have concurrency problems. The order of this subsequence must remain >>> unchanged and seekdir() must be correct with respect to this >>> subsequence. >>> >>> Additionally, two other kinds of entries may be returned. New entries >>> may be inserted anywhere in between the entries of the subsequence, and >>> removed entries may be returned as if they were still part of the >>> subsequence (so that not every readdir() needs a system call). >>> >>> A simple implementation for UFS-style directories is to store the offset >>> in the directory (all bits of it, not masking off the lower 9 bits). >>> This needs d_off or similar in struct dirent. The kernel getdirentries() >>> then needs a similar loop as the old libc seekdir() to go from the start >>> of the 512-byte directory block to the desired entry (since an entry may >>> not exist at the stored offset within the directory block). >>> >>> This means that a UFS-style directory cannot be compacted (existing >>> entries moved from higher to lower offsets to fill holes) while it is >>> open for reading. An NFS exported directory is always open for reading. >>> >>> This also means that duplicate entries can only be returned if that >>> particular filename was deleted and created again. >>> >>> Without kernel support, it is hard to get telldir/seekdir completely >>> reliable. The current libc implementation is wrong since the "holes" >>> within the block just disappear and change the offsets of the following >>> entries; the kernel cannot fix this using entries with d_fileno = 0 >>> since it cannot know, in the general case, how long the deleted entry >>> was in the filesystem-independent dirent format. My previous idea of >>> storing one d_fileno during telldir() is wrong since it will fail if >>> that entry is deleted. >>> >>> If you do not care about memory usage (which probably is already >>> excessive with the current libc implementation), you could store at >>> telldir() time the offset of the current block returned by >>> getdirentries() and the d_fileno of all entries already returned in the >>> current block. >>> >>> The D2410 patch can conceptually work for what Samba needs, stepping >>> back one directory entry. I will comment on it. >>> >> how long do I have to wait until I can commit this and was kib's >> comment a >> "do not commit"? > No, I only mean what I asked about. I do not have strong objections about > the patch, but whatever is done in this regard, should clearly explain the > case it handles and related limitations (IMO). ok I'll add some comments and add more in the commit message and man page. JUlian