From owner-freebsd-standards@FreeBSD.ORG Tue Jun 11 21:40:07 2013
Date: Tue, 11 Jun 2013 21:40:06 GMT
Message-Id: <201306112140.r5BLe6xb019920@freefall.freebsd.org>
To: freebsd-standards@FreeBSD.org
From: Jilles Tjoelker
Subject: Re: standards/179248: A return value of telldir(3) only seekable for once
List-Id: Standards compliance

The following reply was made to PR standards/179248; it has been noted by GNATS.

From: Jilles Tjoelker
To: Akinori MUSHA
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: standards/179248: A return value of telldir(3) only seekable for once
Date: Tue, 11 Jun 2013 23:29:53 +0200

On Mon, Jun 03, 2013 at 07:14:46AM +0000, Akinori MUSHA wrote:
> >Number: 179248
> >Category: standards
> >Synopsis: A return value of telldir(3) only seekable for once
> [snip]
> >Description:
> Our implementation of telldir(3)/seekdir(3) is not POSIX compliant in
> that a value obtained from telldir(3) is invalidated after calling
> seekdir(3) and then readdir(3).
> IEEE Std 1003.1, 2008/2013 says that only a call of rewinddir(3) may
> invalidate the location values returned by telldir(3):
>     If the value of loc was not obtained from an earlier call to
>     telldir(), or if a call to rewinddir() occurred between the call
>     to telldir() and the call to seekdir(), the results of subsequent
>     calls to readdir() are unspecified.

I think the problem is that telldir()/seekdir() want to return to the
same directory entry within the block, instead of to the beginning of
the block, while the required bits for the entry within the block are
not available in telldir()'s return value.

Some other platforms provide kernel support for this operation. The
struct dirent has a field d_off which is the file offset of the next
entry. It looks like the ino64 patches from Gleb Kurtsou add this
functionality to the FreeBSD kernel. With this, telldir() returns the
d_off value in the last dirent returned by readdir() (or 0) and
seekdir() simply calls lseek().

As a result, a telldir()/seekdir() sequence may set the directory
"backwards" a few entries even if it has been unmodified, because UFS
truncates the offset to a block boundary. This may require a network
filesystem to deny requests for a single directory entry at a time.
Alternatively, UFS may replace the truncated bits with the number of
directory entries to skip.
This takes advantage of d_off being more like a "cookie" than a true
file offset.

The kernel may have a similar "out of bits" problem when an application
with 32-bit long calls getdirentries(2) on an NFSv3 directory which
returns 64-bit cookies, and also with unionfs and mount -o union.

In the case of unionfs, the kernel appears to use some sort of state in
the unionfs vnode and assumes that the directory cookies are otherwise
unique enough. This likely causes problems if lseek() is used with a
non-zero offset/cookie.

In the case of mount -o union, the kernel "solves" the problem by
irreversibly modifying the open file description to refer to the lower
layer after the upper layer's entries have been read; the only way to
deal with this in userland is to read the entire directory on a
duplicate open file description (created with openat(fd, ".", ...)) on
opendir() and rewinddir() (bug: rewinddir() does not do this, violating
POSIX's requirement that rewinddir() pick up changes made to the
directory).

> >Fix:
> I don't have a quick fix for this, as it may need a revamp of how the
> location thing is defined.
> NetBSD seems to have a different implementation which doesn't have
> this problem.
> However, I'm not sure if theirs is flawless esp. wrt memory
> management.

NetBSD stores the (block, entry) pairs uniquely and for the life of the
DIR object (perhaps discarding them upon rewinddir() as well). This
means no memory is "leaked" per se but memory consumption on a DIR that
has many telldir() calls is proportional to the number of entries in
the directory.

Also, a "proper" solution is possible if you are willing to accept that
it does not work for all filesystems. Most filesystems leave some of
the bits zero (particularly if there are 64 of them) which can then be
used to store the entry number. However, a malloc-based solution is
then still necessary for filesystems that do need all the bits or very
large directories.

-- 
Jilles Tjoelker
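
For illustration only, the idea of storing the entry number in bits the
filesystem leaves zero could be sketched roughly as below. This is not
taken from any libc; the names loc_pack() and loc_unpack(), the
COOKIE_BITS/ENTRY_BITS constants and the 48/15 split are all
hypothetical, and the sketch assumes a 64-bit location value whose sign
bit is kept clear. When the cookie or the entry number does not fit,
loc_pack() reports failure, which is where the malloc-based fallback
would have to take over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split: low 48 bits for the block cookie, next 15 bits
 * for the entry number within the block, sign bit left clear. */
#define COOKIE_BITS 48
#define ENTRY_BITS  15
#define COOKIE_MAX  ((UINT64_C(1) << COOKIE_BITS) - 1)
#define ENTRY_MAX   ((UINT64_C(1) << ENTRY_BITS) - 1)

/*
 * Combine a block cookie and an entry number into one location value.
 * Returns false if either does not fit, in which case the caller would
 * need some other scheme (e.g. a malloc-based table).
 */
static bool
loc_pack(uint64_t cookie, uint64_t entry, int64_t *loc)
{
    if (cookie > COOKIE_MAX || entry > ENTRY_MAX)
        return (false);
    *loc = (int64_t)((entry << COOKIE_BITS) | cookie);
    return (true);
}

/* Split a value produced by loc_pack() back into its two parts. */
static void
loc_unpack(int64_t loc, uint64_t *cookie, uint64_t *entry)
{
    uint64_t v;

    assert(loc >= 0);    /* packed values never set the sign bit */
    v = (uint64_t)loc;
    *cookie = v & COOKIE_MAX;
    *entry = (v >> COOKIE_BITS) & ENTRY_MAX;
}
```

A seekdir() built on this would seek to the unpacked cookie and then
skip the unpacked number of entries. With a 32-bit long there are far
fewer spare bits, so the fallback path would be taken much more often.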