Date:      Mon, 27 Apr 2015 16:03:08 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Julian Elischer <julian@freebsd.org>
Cc:        freebsd-current@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject:   Re: readdir/telldir/seekdir problem (i think)
Message-ID:  <1101073752.26759547.1430164988301.JavaMail.root@uoguelph.ca>
In-Reply-To: <553E676D.1020902@freebsd.org>

Julian Elischer wrote:
> On 4/25/15 4:28 AM, John Baldwin wrote:
> > On Saturday, April 25, 2015 02:36:24 AM Julian Elischer wrote:
> >> On 4/25/15 1:30 AM, Julian Elischer wrote:
> >>> On 4/24/15 10:59 PM, John Baldwin wrote:
> >>>> Index: head/lib/libc/gen/telldir.c
> >>>> ===================================================================
> >>>> --- head/lib/libc/gen/telldir.c (revision 281929)
> >>>> +++ head/lib/libc/gen/telldir.c (working copy)
> >>>> @@ -101,8 +101,10 @@
> >>>>                   return;
> >>>>           if (lp->loc_loc == dirp->dd_loc && lp->loc_seek ==
> >>>> dirp->dd_seek)
> >>>>                   return;
> >>>> -       (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek,
> >>>> SEEK_SET);
> >>>> -       dirp->dd_seek = lp->loc_seek;
> >>>> +       if (lp->loc_seek != dirp->dd_seek) {
> >>>> +               (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek,
> >>>> SEEK_SET);
> >>>> +               dirp->dd_seek = lp->loc_seek;
> >>>> +       }
> >>> yes I did that yesterday but it still fails when you transition
> >>> blocks.. (badly).
> >>>
> >>> I also tried bigger blocks.. also fails (eventually)
> >>>
> >>> I did find a way to make it work...  you had to seek back
> >>> to the first block you deleted on each set..
> >>> then work forward from there again..  unfortunately since
> >>> I'm trying to make a microsoft program not fail (via samba)
> >>> I have no control over how it does things and seekdir doesn't
> >>> know what was deleted anyway... (so the fix is fine for  the
> >>> test program but not for real life)
> >>>
> >>> I think I can make the BSD one act like the linux one by changing
> >>> the lseek being done to use the offset (loc) plus the buffer seek
> >>> address of the target, instead of just going for the buffer base
> >>> and
> >>> stepping forward through the entries..
> >>>
> >>> maybe tomorrow.
> >>>
> >> The following conditional code makes ours behave the same as the
> >> linux
> >> one.
> >> it breaks several 'rules' but works where ours is clean but
> >> fails..
> >> as Rick said..  "maybe that's what we should do too."
> >>
> >>
> >> this is at the end of seekdir()
> >>
> >>
> >> The new code does what linux does.. and shouldn't work.. but does
> >>               // at least in the limited conditions I need it to.
> >>               // We'll probably need to do this at work...:
> >>
> >>
> >> The original code is what we have now, but gets mightily confused
> >> sometimes.
> >>          // This is clean(er) but fails in specific situations (when
> >>          // doing commands from Microsoft Windows, via samba).
> >>
> >>
> >> root@vps1:/tmp # diff -u dir.c.orig dir.c
> >> --- dir.c.orig    2015-04-24 11:29:36.855317000 -0700
> >> +++ dir.c    2015-04-24 11:15:49.058500000 -0700
> >> @@ -1105,6 +1105,13 @@
> >>            dirp->dd_loc = lp->loc_loc;
> >>            return;
> >>        }
> >> +#ifdef GLIBC_SEEK
> >> +    (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek + lp->loc_loc,
> >> SEEK_SET);
> >> +    dirp->dd_seek = lp->loc_seek + lp->loc_loc;
> >> +    dirp->dd_loc = 0;
> >> +    lp->loc_seek = dirp->dd_seek;
> >> +    lp->loc_loc = 0;
> >> +#else
> >>        (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
> >>        dirp->dd_seek = lp->loc_seek;
> >>        dirp->dd_loc = 0;
> >> @@ -1114,6 +1121,7 @@
> >>            if (dp == NULL)
> >>                break;
> >>        }
> >> +#endif
> >>    }
> > Yes, this isn't at all safe.  There's no guarantee whatsoever that
> > the offset on the directory fd that isn't something returned by
> > getdirentries has any meaning.  In particular, the size of the
> > directory entry in a random filesystem might be a different size
> > than the structure returned by getdirentries (since it converts
> > things into a FS-independent format).
> >
> > This might work for UFS by accident, but this is probably why ZFS
> > doesn't work.
> >
> > However, this might be properly fixed by the thing that ino64 is
> > doing where each directory entry returned by getdirentries gives
> > you a seek offset that you _can_ directly seek to (as opposed to
> > seeking to the start of the block and then walking forward N
> > entries until you get an inter-block entry that is the same).
> I just made the stunning discovery that our seekdir/readdir/telldir
> code in libc works with
> FreeBSD 8.0.
> so maybe the problem is that the kernel changed its behaviour, and
> no-one thought to fix libc..
> 
> (at least it works on one of our 8.0 base appliances.. I'll do more
> testing tomorrow.. it's past midnight.)
> 
I suspect that pre-r252438 systems work better for UFS than r252438
or later. That patch changed ufs_readdir() so that it no longer returned
the on-disk directory structure. (Among other things, it added code that
skipped over d_ino == 0 entries.)

As such, on r252438 and later systems, the "logical" offset of a UFS
directory entry returned by getdirentries() isn't the same as its
"physical" offset in the on-disk directory.
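
To make the logical/physical distinction concrete, here is a minimal sketch.
It is not the UFS code: struct disk_slot, copy_live() and logical_off() are
hypothetical names, and the slots are fixed size for simplicity (real UFS
entries are variable length). It only shows that once free slots are skipped
during the copy-out, an entry's offset in the returned buffer stops matching
its offset on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size on-disk slot; ino == 0 marks a free slot. */
struct disk_slot {
	uint32_t ino;
	char     name[12];
};

/*
 * Copy only the live slots to out[], the way a readdir that skips
 * ino == 0 entries would.  phys_off[i] records the on-disk byte offset
 * of the i-th entry that was returned.  Returns the number returned.
 */
size_t
copy_live(const struct disk_slot *disk, size_t nslots,
    struct disk_slot *out, size_t *phys_off)
{
	size_t i, n = 0;

	assert(out != NULL && phys_off != NULL);
	for (i = 0; i < nslots; i++) {
		if (disk[i].ino == 0)
			continue;	/* hole: not returned to the caller */
		out[n] = disk[i];
		phys_off[n] = i * sizeof(struct disk_slot);
		n++;
	}
	return (n);
}

/* Byte offset of the i-th returned entry inside the returned buffer. */
size_t
logical_off(size_t i)
{
	return (i * sizeof(struct disk_slot));
}
```

With three slots where the middle one is free, the second returned entry sits
one slot into the returned buffer but two slots into the on-disk block, so an
lseek() to "buffer base + offset within buffer" lands on the wrong place.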

Having said the above, I have two somewhat inconsistent thoughts:
1 - As jhb has explained, the libc functions aren't safe for telldir()/seekdir()
    when entries are added/deleted. It just happens that UFS might work
    ok (and is more likely to work ok when "logical offset" == "physical offset").
2 - I'm not sure r252438 was a good idea (at least the part that skips invalid
    d_ino == 0 entries) because I don't think making "logical offset" != "physical offset"
    is a good idea, if there isn't a good reason to need to do so.

I think it is hard to argue that r252438 broke the libc functions. It just
happens that cases that aren't guaranteed to work happen to work without r252438.

I also think that using d_off (or d_cookie, if you prefer that name), which
would be the "physical offset" of the next directory entry, is the best bet
for fixing this in general. (By "in general", I mean for all file systems.)
But this will require a new getdirentries(2) syscall and libc functions that
know how to use it.
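
A rough sketch of why a per-entry cookie helps, under heavy assumptions: the
directory is a small in-memory array, the cookie is just a slot index (a real
d_off would be an opaque, file-system-defined value), and struct slot,
struct ent, read_at() and read_nth() are made-up names, not a proposed API.
read_at() resumes from a cookie; read_nth() models the current "seek to the
start of the block and step forward N entries" approach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 4

struct slot { uint32_t ino; char name[8]; };	/* ino == 0: deleted */
struct ent  { char name[8]; size_t d_off; };	/* d_off: cookie of next entry */

/*
 * Return the first live entry at or after the position named by cookie.
 * Returns 1 and fills *e on success, 0 at end of directory.
 */
int
read_at(const struct slot *dir, size_t cookie, struct ent *e)
{
	size_t i;

	assert(e != NULL);
	for (i = cookie; i < NSLOTS; i++) {
		if (dir[i].ino != 0) {
			memcpy(e->name, dir[i].name, sizeof(e->name));
			e->d_off = i + 1;	/* where the next read resumes */
			return (1);
		}
	}
	return (0);
}

/*
 * Start from the beginning and skip n live entries, then return the
 * next one.  The result shifts if earlier entries have been deleted.
 */
int
read_nth(const struct slot *dir, size_t n, struct ent *e)
{
	size_t cookie = 0;

	do {
		if (!read_at(dir, cookie, e))
			return (0);
		cookie = e->d_off;
	} while (n-- > 0);
	return (1);
}
```

For a directory holding a, b, c, d: read a and b, remember both the cookie and
the count (2), then delete a. Resuming with read_at() from the saved cookie
returns c, while read_nth() with the saved count returns d and silently skips
c, which is the sort of failure seen when files are deleted during a scan.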

rick



