Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 25 Apr 2015 02:36:24 +0800
From:      Julian Elischer <julian@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: readdir/telldir/seekdir problem (i think)
Message-ID:  <553A8D28.7090901@freebsd.org>
In-Reply-To: <553A7DB0.8080308@freebsd.org>
References:  <55386505.70708@freebsd.org> <10872728.5fNYcpCvKN@ralph.baldwin.cx> <5539CE6F.5040802@freebsd.org> <3179863.LLgeEeSADf@ralph.baldwin.cx> <553A7DB0.8080308@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 4/25/15 1:30 AM, Julian Elischer wrote:
> On 4/24/15 10:59 PM, John Baldwin wrote:
>> On Friday, April 24, 2015 01:02:39 PM Julian Elischer wrote:
>>> On 4/23/15 9:54 PM, John Baldwin wrote:
>>>> On Thursday, April 23, 2015 05:02:08 PM Julian Elischer wrote:
>>>>> On 4/23/15 11:20 AM, Julian Elischer wrote:
>>>>>> I'm debugging a problem being seen with samba 3.6.
>>>>>>
>>>>>> basically  telldir/seekdir/readdir don't seem to work as 
>>>>>> advertised..
>>>>> ok so it looks like readdir() (and friends) is totally broken in 
>>>>> the face
>>>>> of deletes unless you read the entire directory at once or reset 
>>>>> to the
>>>>> the first file before the deletes, or earlier.
>>>> I'm not sure that Samba isn't assuming non-portable behavior.  
>>>> For example:
>>>>
>>>> >From 
>>>> http://pubs.opengroup.org/onlinepubs/009695399/functions/readdir_r.html 
>>>>
>>>>
>>>> If a file is removed from or added to the directory after the 
>>>> most recent call
>>>> to opendir() or rewinddir(), whether a subsequent call to 
>>>> readdir() returns an
>>>> entry for that file is unspecified.
>>>>
>>>> While this doesn't speak directly to your case, it does note that 
>>>> you will
>>>> get inconsistencies if you scan a directory concurrent with add 
>>>> and remove.
>>>>
>>>> UFS might kind of work actually since deletes do not compact the 
>>>> backing
>>>> directory, but I suspect NFS and ZFS would not work.  In 
>>>> addition, our
>>>> current NFS support for seekdir is pretty flaky and can't be 
>>>> fixed without
>>>> changes to return the seek offset for each directory entry (I 
>>>> believe that
>>>> the projects/ino64 patches include this since they are breaking 
>>>> the ABI of
>>>> the relevant structures already).  The ABI breakage makes this a 
>>>> very
>>>> non-trivial task.  However, even if you have that per-item 
>>>> cookie, it is
>>>> likely meaningless in the face of filesystems that use any sort 
>>>> of more
>>>> advanced structure than an array (such as trees, etc.) to store 
>>>> directory
>>>> entries.  POSIX specifically mentions this in the rationale for 
>>>> seekdir:
>>>>
>>>> One of the perceived problems of implementation is that returning 
>>>> to a given point in a directory is quite difficult to describe 
>>>> formally, in spite of its intuitive appeal, when systems that use 
>>>> B-trees, hashing functions, or other similar mechanisms to order 
>>>> their directories are considered. The definition of seekdir() and 
>>>> telldir() does not specify whether, when using these interfaces, 
>>>> a given directory entry will be seen at all, or more than once.
>>>>
>>>> In fact, given that quote, I would argue that what Samba is doing is
>>>> non-portable.  This would seem to indicate that a conforming 
>>>> seekdir could
>>>> just change readdir to immediately return EOF until you call 
>>>> rewinddir.
>>> how does readdir know that the directory has been changed? fstat?
>> Undefined.  FreeBSD's libc doesn't cache the entire directory 
>> (unless you
>> are using a union mount), instead it just caches the most recent 
>> call to
>> getdirentries().  It then serves up entries from that block until 
>> you hit
>> the end.  When you hit the end it reads the next block, etc. When you
>> call telldir(), the kernel saves the seek offset from the start of the
>> current block and the offset within that block and returns a cookie to
>> you.  seekdir() looks up the cookie to find the (seek offset, block 
>> offset)
>> tuple.  If the location matches the directory's current location it 
>> doesn't
>> do anything, otherwise it seeks to the seek offset and reads a new 
>> block via
>> getdirentries().  There is no check for seeing if a directory is
>> changed.  Instead, you can only be stale by one "block".  When you 
>> read
>> a new block it is relative to the directory's state at that time.
>>
>> Rick's suggestion of reusing the block for a seek within a block 
>> would be
>> fairly easy to implement, I think it would just be:
>>
>> Index: head/lib/libc/gen/telldir.c
>> ===================================================================
>> --- head/lib/libc/gen/telldir.c (revision 281929)
>> +++ head/lib/libc/gen/telldir.c (working copy)
>> @@ -101,8 +101,10 @@
>>                  return;
>>          if (lp->loc_loc == dirp->dd_loc && lp->loc_seek == 
>> dirp->dd_seek)
>>                  return;
>> -       (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
>> -       dirp->dd_seek = lp->loc_seek;
>> +       if (lp->loc_seek != dirp->dd_seek) {
>> +               (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, 
>> SEEK_SET);
>> +               dirp->dd_seek = lp->loc_seek;
>> +       }
>
> yes I did that yesterday but it still fails when you transition 
> blocks.. (badly).
>
> I also tried bigger blocks.. also fails (eventually)
>
> I did find a way to make it work...  you had to seek back
> to the first block you deleted on each set..
> then work forward from there again..  unfortunately since
> I'm trying to make a microsoft program not fail (via samba)
> I have no control over how it does things and seekdir doesn't
> know what was deleted anyway... (so the fix is fine for  the
> test program but not for real life)
>
> I think I can make the BSD one act like the linux one by changing 
> the lseek being done to use the offset (loc) plus the buffer seek 
> address of the target, instead of just going for the buffer base and 
> stepping forward through the entries..
>
> maybe tomorrow.
>

The following conditional code makes ours behave the same as the linux 
one.
it breaks several 'rules' but works where ours is clean but fails..
as Rick said..  "maybe that's what we should do too."


this is at the end of seekdir()


The new code does what linux does.. and shouldn't work.. but does
             // at least in the limited conditions I need it to.
             // We'll probably need to do this at work...:


The original code is what we have now, but gets mightily confused 
sometimes.
        // This is clean(er) but fails in specific situations(when 
doing commands
        // from Microft windows, via samba).


root@vps1:/tmp # diff -u dir.c.orig dir.c
--- dir.c.orig    2015-04-24 11:29:36.855317000 -0700
+++ dir.c    2015-04-24 11:15:49.058500000 -0700
@@ -1105,6 +1105,13 @@
          dirp->dd_loc = lp->loc_loc;
          return;
      }
+#ifdef GLIBC_SEEK
+    (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek + lp->loc_loc, 
SEEK_SET);
+    dirp->dd_seek = lp->loc_seek + lp->loc_loc;
+    dirp->dd_loc = 0;
+    lp->loc_seek = dirp->dd_seek;
+    lp->loc_loc = 0;
+#else
      (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
      dirp->dd_seek = lp->loc_seek;
      dirp->dd_loc = 0;
@@ -1114,6 +1121,7 @@
          if (dp == NULL)
              break;
      }
+#endif
  }
>
>
>
>
>
>>          dirp->dd_loc = 0;
>>          while (dirp->dd_loc < lp->loc_loc) {
>>                  dp = _readdir_unlocked(dirp, 0);
>>
>>
>> (You still want to re-parse the block I think since the telldir offset
>> might be from an earlier version of the block and not point to a valid
>> directory entry since they are variable sized.)
>>
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to 
> "freebsd-current-unsubscribe@freebsd.org"
>
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?553A8D28.7090901>