Date:      Thu, 13 Jan 2005 16:48:51 -0800
From:      Tim Kientzle <kientzle@freebsd.org>
To:        David Schultz <das@freebsd.org>
Cc:        Pawel Jakub Dawidek <pjd@freebsd.org>
Subject:   Re: fts improvements, alternatives
Message-ID:  <41E716F3.20504@freebsd.org>
In-Reply-To: <20050113072153.GA28485@VARK.MIT.EDU>
References:  <200501120735.j0C7ZABq048856@repoman.freebsd.org> <41E5ED66.4070902@freebsd.org> <20050113072153.GA28485@VARK.MIT.EDU>

[Moved to -current for further discussion.]

David Schultz wrote:
>Tim Kientzle wrote:
>>Pawel Jakub Dawidek wrote:
>>
>>> Introduce new field 'fts_bignum' ...
>>> This work is part of the BigDisk project:
>>>         http://www.FreeBSD.org/projects/bigdisk/
>>
>>Any plans to deal with other fts limits ... ?
> 
> Removing FTS_LOGICAL's path length limitation is nontrivial, but
> given your experience with bsdtar, I'd say you're an ideal person
> to do it.  ;-)

As it happens, I started tinkering with these ideas a
while ago but haven't found time to finish them.

Here's a snapshot of the current WIP:

http://people.freebsd.org/~kientzle/libarchive/src/tree.tgz

This includes a simple main() test function that
just does the rough equivalent of "find .".

> .. fts() effectively uses chdir("..") to
> climb up one level in the tree in chdir mode.  If it chdir'd
> across a symlink earlier, this would put it in the wrong place.
> Perhaps you have a better solution, but my best idea is to keep
> the parent directory open when chdiring ...

The tree package does exactly this.  It just keeps a
flag with each ancestor node indicating the type of traversal
that's needed.
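
A minimal sketch of the idea, for illustration only. This is not the
tree package's code: struct ancestor, descend() and ascend() are
hypothetical names, and the lstat()-then-chdir() sequence assumes nothing
changes the entry in between. Each ancestor records how to get back: a
plain chdir("..") for real directories, or fchdir() on a saved descriptor
when the descent crossed a symlink.

```c
#define _XOPEN_SOURCE 700   /* for lstat(), fchdir(), S_ISLNK() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* How to return to the directory we descended from. */
enum { CLIMB_DOTDOT, CLIMB_FCHDIR };

/* One record per ancestor directory (hypothetical layout). */
struct ancestor {
	struct ancestor *next;	/* next record up the stack */
	int how;		/* CLIMB_DOTDOT or CLIMB_FCHDIR */
	int fd;			/* open parent dir for CLIMB_FCHDIR, else -1 */
};

/* chdir into `name` and push a record of how to return; 0 on success. */
static int
descend(struct ancestor **stack, const char *name)
{
	struct stat st;
	struct ancestor *a;

	if (lstat(name, &st) != 0)
		return (-1);
	if ((a = malloc(sizeof(*a))) == NULL)
		return (-1);
	a->how = CLIMB_DOTDOT;
	a->fd = -1;
	if (S_ISLNK(st.st_mode)) {
		/* ".." from the link target may not lead back here,
		 * so remember the current directory by descriptor. */
		a->how = CLIMB_FCHDIR;
		if ((a->fd = open(".", O_RDONLY)) < 0) {
			free(a);
			return (-1);
		}
	}
	if (chdir(name) != 0) {
		if (a->fd >= 0)
			close(a->fd);
		free(a);
		return (-1);
	}
	a->next = *stack;
	*stack = a;
	return (0);
}

/* Pop one record and return to that ancestor; 0 on success. */
static int
ascend(struct ancestor **stack)
{
	struct ancestor *a = *stack;
	int r;

	assert(a != NULL);
	*stack = a->next;
	if (a->how == CLIMB_FCHDIR) {
		r = fchdir(a->fd);
		close(a->fd);
	} else
		r = chdir("..");
	free(a);
	return (r);
}
```

With this layout, only symlink crossings cost a file descriptor;
ordinary directory levels cost a small heap record and nothing else.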

> A more uniform approach would be to ... always keep the
> parent open when descending a level.  Unfortunately, this limits
> the traversal depth to the number of available file descriptors.

The tree package avoids this for exactly that reason; it holds a
parent directory open only when it descends across a symlink.
I have some ideas for handling the case where the number of symlinks
exceeds the available file descriptors, but that doesn't seem
particularly pressing at the moment.  <grin>

> (On the other hand, I would argue that anyone with a directory
> tree a few thousand levels deep is asking for trouble.)

I thought you *wanted* to support big disks! ;-)

I started this work partly because I wanted to really stress
the long pathname support in bsdtar.  I've archived
directory trees with megabyte pathnames (several thousand
directory levels crossing several hundred symlinks) in testing.
Of course, I can't yet *dearchive* such deep trees.
That seems to be a harder problem.

The tree package also keeps a *lot* less data in memory than fts.
It has no trouble with million-entry directories, for example.
In comparison, the current ls crashes on such large directories
in part because of the memory required for fts.

The tree package is quite a bit different in many respects:
  * The traversal is in a very different order, for instance.
  * It has a completely different API than fts.
    It's fully opaque (so should be easier to change in the future,
    unlike fts).
  * It takes a very different approach to determine
    when to visit a child.  In particular, instead of
    the client specifying a mode and optionally inhibiting the descent
    through a "prune" request, the tree package has the client
    specifically request descent.  If you request descent for
    *every* item, you'll end up with a logical traversal; if you
    request descent for every dir, you'll end up with a physical
    traversal.  (The tree package is smart enough to ignore any
    descent request that isn't for a dir or a link to a dir.)
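
To make the last point concrete, here is a rough sketch of a walker in
which the client asks for descent instead of pruning. It is not the tree
package's API: walk(), visit_fn, want_all() and want_dirs() are
hypothetical names. It is also path-based and recursive, so it does
nothing about long pathnames or descriptor limits, and it makes no
attempt to detect symlink loops.

```c
#define _XOPEN_SOURCE 700   /* for lstat() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical client callback: return nonzero to request descent.
 * `lst` is the lstat() result, so symlinks show up as symlinks. */
typedef int (*visit_fn)(const char *path, const struct stat *lst, void *ctx);

/* Visit each entry under `dir`, descending only where the client asks. */
static int
walk(const char *dir, visit_fn visit, void *ctx)
{
	DIR *d;
	struct dirent *de;
	struct stat lst, st;
	char *path;
	size_t len;

	assert(visit != NULL);
	if ((d = opendir(dir)) == NULL)
		return (-1);
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		len = strlen(dir) + 1 + strlen(de->d_name) + 1;
		if ((path = malloc(len)) == NULL)
			break;
		snprintf(path, len, "%s/%s", dir, de->d_name);
		/* Honor a descent request only for a dir or a link to a
		 * dir; stat() follows the link to find out which. */
		if (lstat(path, &lst) == 0 && visit(path, &lst, ctx) &&
		    stat(path, &st) == 0 && S_ISDIR(st.st_mode))
			walk(path, visit, ctx);
		free(path);
	}
	closedir(d);
	return (0);
}

/* Request descent for every item: behaves like a logical traversal. */
static int
want_all(const char *path, const struct stat *lst, void *ctx)
{
	(void)path;
	(void)lst;
	++*(int *)ctx;		/* count visited entries */
	return (1);
}

/* Request descent for real directories only: a physical traversal. */
static int
want_dirs(const char *path, const struct stat *lst, void *ctx)
{
	(void)path;
	++*(int *)ctx;		/* count visited entries */
	return (S_ISDIR(lst->st_mode));
}
```

Calling walk(".", want_dirs, &count) with an int count would visit a tree
without following symlinks, while want_all follows every link that leads
to a directory. The walker never needs a traversal mode of its own.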

Feedback, suggestions, etc. appreciated.

Tim


