Date:      Sat, 25 Apr 2015 10:59:50 +0800
From:      Julian Elischer <julian@freebsd.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>, Jilles Tjoelker <jilles@stack.nl>
Cc:        freebsd-current@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject:   Re: readdir/telldir/seekdir problem (i think)
Message-ID:  <553B0326.1090306@freebsd.org>
In-Reply-To: <326462676.25571625.1429925971889.JavaMail.root@uoguelph.ca>
References:  <326462676.25571625.1429925971889.JavaMail.root@uoguelph.ca>

On 4/25/15 9:39 AM, Rick Macklem wrote:
> Jilles Tjoelker wrote:
>> On Fri, Apr 24, 2015 at 04:28:12PM -0400, John Baldwin wrote:
>>> Yes, this isn't at all safe.  There's no guarantee whatsoever that
>>> the offset on the directory fd that isn't something returned by
>>> getdirentries has any meaning.  In particular, the size of the
>>> directory entry in a random filesystem might be a different size
>>> than the structure returned by getdirentries (since it converts
>>> things into a FS-independent format).
>>> This might work for UFS by accident, but this is probably why ZFS
>>> doesn't work.
>>> However, this might be properly fixed by the thing that ino64 is
>>> doing where each directory entry returned by getdirentries gives
>>> you a seek offset that you _can_ directly seek to (as opposed to
>>> seeking to the start of the block and then walking forward N
>>> entries until you get an inter-block entry that is the same).
>> The ino64 branch only reserves space for d_off and does not use it in
>> any way. This is appropriate since actually using d_off is a major
>> feature addition.
>>
> Well, at some point ino64 will need to define a new getdirentries(2)
> syscall and I believe this new syscall can have different/additional
> arguments.
Yes, POSIX only specifies two mandatory fields (d_ino and d_name);
everything else is implementation dependent.
> I'd suggest that the new getdirentries(2) syscall should return a
> flag to indicate that the underlying file system is filling in d_off.
> Then the libc functions can use d_off if it is available.
> (They will still need to "work" at least as well as they do now if
>   the file system doesn't support d_off. The old getdirentries(2) syscall
>   will be returning the old/current "struct dirent" which doesn't have
>   the field anyhow.)
>
> Another bit of fun is that the argument for seekdir()/telldir() is a
> long and ends up 32bits for some arches. d_off is 64bits, since that
> is what some file systems require.
what does linux use?
------
        In glibc up to version 2.1.1, the return type of telldir() was
        off_t.  POSIX.1-2001 specifies long, and this is the type used
        since glibc 2.1.2.

Also from the Linux man page, and this is interesting:

--------
        In early filesystems, the value returned by telldir() was a simple
        file offset within a directory.  Modern filesystems use tree or
        hash structures, rather than flat tables, to represent
        directories.  On such filesystems, the value returned by telldir()
        (and used internally by readdir(3)) is a "cookie" that is used by
        the implementation to derive a position within a directory.
        Application programs should treat this strictly as an opaque
        value, making no assumptions about its contents.
------
but glibc itself uses the contents in a non-opaque (and possibly wrong)
way in seekdir (not following their own advice).


> Maybe the library code can only use d_off if it is a 64bit arch and
> the file system is filling it in. (Or maybe the library can keep track
> of 32<->64bit mappings for the offsets. I haven't looked at the libc
> functions for a while, so I can't remember what they keep track of.)

One supposes a 32-bit system would not have such large file systems on
it (maybe?).
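
For what it is worth, the "keep track of 32<->64bit mappings" option
could look roughly like the sketch below: the library hands out small
long tokens and remembers which 64-bit offset each one stands for. This
is not the current libc code; struct cookie_map and its functions are
hypothetical names, and a real version would need locking and a way to
drop entries at closedir().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table: the token is an index into offs[]. */
struct cookie_map {
	uint64_t *offs;		/* 64-bit directory offsets seen so far */
	long count;		/* tokens handed out */
	long cap;		/* allocated slots */
};

/* Return a token that fits in a long for off, or -1 if out of memory. */
long
cookie_map_put(struct cookie_map *m, uint64_t off)
{
	assert(m != NULL);
	for (long i = 0; i < m->count; i++)
		if (m->offs[i] == off)
			return (i);		/* reuse an existing token */
	if (m->count == m->cap) {
		long ncap = m->cap ? m->cap * 2 : 8;
		uint64_t *n = realloc(m->offs, (size_t)ncap * sizeof(*n));
		if (n == NULL)
			return (-1);
		m->offs = n;
		m->cap = ncap;
	}
	m->offs[m->count] = off;
	return (m->count++);
}

/* Translate a token back to its offset; 0 on success, -1 if unknown. */
int
cookie_map_get(const struct cookie_map *m, long token, uint64_t *off)
{
	assert(m != NULL && off != NULL);
	if (token < 0 || token >= m->count)
		return (-1);
	*off = m->offs[token];
	return (0);
}

void
cookie_map_free(struct cookie_map *m)
{
	free(m->offs);
	m->offs = NULL;
	m->count = m->cap = 0;
}
```

With something like this, telldir() would return the token and
seekdir() would look the real offset up, so the width of long stops
mattering at the cost of per-stream memory.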
>
> rick
>
>> A proper d_off would still be useful even if UFS's readdir keeps
>> masking off the offset so a directory read always starts at the
>> beginning of a 512-byte directory block, since this allows more
>> distinct offset values than safely using getdirentries()'s *basep.
>> With d_off, one outer loop must read at least one directory block to
>> avoid spinning indefinitely, while using getdirentries()'s *basep
>> requires reading the whole getdirentries() buffer.
>>
>> Some Linux filesystems go further and provide a unique d_off for each
>> entry.
>>
>> Another idea would be to store the last d_ino instead of dd_loc into
>> the struct ddloc. On seekdir(), this would seek to loc_seek as before
>> and skip entries until that d_ino is found, or to the start of the
>> buffer if not found (and possibly return some entries again that
>> should not be returned, but Samba copes with that).
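
The d_ino idea above can be sketched on a simplified model, where the
refilled getdirentries() buffer is reduced to an array of inode
numbers. The function name resume_index() is hypothetical, and the
sketch assumes "last d_ino" means the inode of the last entry already
returned, so reading resumes with the entry after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of seekdir() with a remembered d_ino: inos[] holds
 * the d_ino of each entry in the buffer re-read at loc_seek.  Returns
 * the index of the next entry to hand out: one past the remembered
 * inode, or 0 (start of the buffer) if that inode is no longer present.
 */
size_t
resume_index(const uint64_t *inos, size_t n, uint64_t last_ino)
{
	assert(inos != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (inos[i] == last_ino)
			return (i + 1);	/* skip everything up to and including it */
	return (0);			/* not found: replay the whole buffer */
}
```

The fallback to index 0 is what can return some entries a second time,
which is the case the caller is said to cope with.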
>>
>> --
>> Jilles Tjoelker
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to
>> "freebsd-current-unsubscribe@freebsd.org"
>>
>



