Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 30 May 2001 02:34:54 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Andrew Reilly <areilly@bigpond.net.au>
Cc:        Terry Lambert <tlambert@primenet.com>, hackers@freebsd.org
Subject:   Re: technical comparison
Message-ID:  <3B14BEBE.A93704AD@mindspring.com>
References:  <200105252049.NAA13292@usr06.primenet.com> <20010526192516.A2573@gurney.reilly.home> <20010526193723.B2573@gurney.reilly.home>

next in thread | previous in thread | raw e-mail | index | archive | help
Andrew Reilly wrote:
> On Sat, May 26, 2001 at 07:25:16PM +1000, Andrew Reilly wrote:
> > One of my personal mail folders has 4400 messages in it, and
> > I've only been collecting that one for a few years.  It's not
> > millions, but its a few more than the "500" that I've seen some
> > discuss here as a reasonable limit (why is that reasonable?) and
> > it's many many more than the 72 or so limit available in ADFS.
> 
> I realised as soon as I pressed the send button that my current
> use of large directories for mail files doesn't actually involve
> any random access: the directory is read sequentially to build
> the header list.
> 
> It is quite concievable that a performance tweak to the IMAP
> server could involve a header cache in a relational database of
> some sort, and that would certainly contain references to the
> individual files, which would then be accessed randomly.
> 
> /usr/ports/distfiles on any of the mirrors probably contains
> upwards of 5000 files too, and there is a strong likelyhood that
> these will be accessed out-of-order by ports-makefile-driven
> fetch requests.

Cyrus IMAP uses a header cache in precisely this way.

And since the cache files are created very early on, they
are early in the directory, and so do not suffer a large
startup penalty.

The searches for specific files would indeed be linear,
but they would be O(1) linear for each file.


As I said before, I replaced the FFS directory code with
a "trie" structured directory structure.  Using these
n-ary structures, you could very quickly look up any
individual files, and a linear traversal of the directory
to iterate all files (an increasingly common thing for
visual file browsers to do) was still O(1) linear.

People didn't find the patches very useful, beyond them
being "an interesting curiousity", since in reality, the
problem of huge directories tends not to exist in nature,
where code was written to deal with the limitations of
S51K and similar FS's, and thus doesn't tend to do things
like dump all its large number of files into a single
directory.

A similar set of patches cause iterated filenames to have
their vnodes prefaulted, which helps immensely in AppleTalk
and SMB file serving, where the protocol demands stat data
back at the same time because of assumptions about the host
OS's files.  This effectively enters them into the directory
cache, and locality of reference keeps them there.  This is
a really simple hack.  Then all you have to do is up your
directory cache size to whatever your favorite unreasonable
limit happens to be (e.g. 70,000), and everything becomes a
cache hit, after the initial load-up.

The second is still a clever hack (IMO), since it's still
useful, but you'd want it to be a per FS option, or at
least minimally an ioctl() to set the option on a directory
fd after opening it, so that exported SMBFS share would have
the behaviour, but your news spool would not.

As I said before, however, the tradeoff of better performance
on really obscene directories was not really worth the binary
backward compatability problems that resulted when switching
a system over (in other words, it was an interesting research
topic, but little more than that).

I think the "benchmark" in question is pretty lame, and the
things which it is attempting to "prove" will not occur in
real systems, unless you are running pessimal code.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3B14BEBE.A93704AD>