From owner-freebsd-hackers  Thu Jun 22  8:34:28 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP
	id 998C037C31B; Thu, 22 Jun 2000 08:34:24 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.9.3/8.9.3) with SMTP id LAA43074;
	Thu, 22 Jun 2000 11:33:17 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Thu, 22 Jun 2000 11:33:17 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.ORG>
X-Sender: robert@fledge.watson.org
To: "Daniel O'Connor" <doconnor@gsoft.com.au>
Cc: Luigi Rizzo <luigi@info.iet.unipi.it>, hackers@FreeBSD.ORG,
	"Nicole Harrington." <nicole@unixgirl.com>, adrian@FreeBSD.ORG
Subject: Re: How many files can I put in one diretory?
In-Reply-To: <XFMail.000622171146.doconnor@gsoft.com.au>
Message-ID: <Pine.NEB.3.96L.1000622112834.42894A-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 22 Jun 2000, Daniel O'Connor wrote:

> On 22-Jun-00 Luigi Rizzo wrote:
> >  that sounds insane! Because a name is a name, why don't they call
> >  those files xx/yy/zz/tt.html and the like, to get down to a more
> >  reasonable # of files per directory.
> >  
> >  Or use a single file and a cgi which extracts things from the right place.
> >  In such a context, i assume that the best place to do the name lookup
> >  is in the app, not in the kernel.
> 
> Yeah.. This is why databases were invented :)
> 
> FYI 40000 in a directory really makes directory listings slow.. 2 million would
> suck :)

Actually, I'd choose a higher starting suck number.  If you're thinking
of ls, remember that ls reads all of the entries into memory and sorts
them before printing anything.  The directory listing becomes much faster
if you use ``-f'', which disables sorting of the output.  I have a Cyrus
server with easily 50,000 entries in many directories, and that has not
been a serious impediment to correct functioning, although no doubt it
carries a significant performance cost.
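
To make the ls point concrete, here is a rough sketch in Python (not what
ls itself does internally, just the same two strategies).  The function
names are made up for illustration.  Note that os.scandir() skips "." and
"..", whereas ``ls -f'' shows them.

```python
import os

def list_unsorted(path):
    """Yield names in on-disk directory order, one at a time.

    Comparable in spirit to ``ls -f'': nothing is sorted, and the full
    listing is never held in memory at once.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name

def list_sorted(path):
    """Read every name into memory, then sort.

    Comparable in spirit to plain ls: no output is available until the
    whole directory has been read and sorted.
    """
    return sorted(os.listdir(path))
```

With tens of thousands of entries, the unsorted version can start
producing names immediately, while the sorted one has to finish reading
the directory first.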

One possibility here, if the names of the files don't matter, is to make
use of Adrian Chadd's IFS, which avoids the issue by providing direct
inode # access to an FFS disk layout.  When opening a file, the inode
number is returned so that you can handle meta-data in your own database
(possibly on the same drive), which permits custom name mechanisms
optimized for seeks, etc.  This would be great, for example, for AFS and
Coda client caches and server storage, where the distributed file systems
provide their own storage for meta-data in internal databases (and in the
case of Coda, in a transactional database).  Name lookup against the IFS
space is O(1), since the inode number locates the file directly, with no
directory search.
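
The split between "store addressed by number" and "names kept in your own
database" might look roughly like the sketch below.  This is NOT the IFS
interface (that code is not committed and I am not describing its API
here): InodeStore is a hypothetical in-memory stand-in, and NameIndex is a
hypothetical application-side name database using SQLite.

```python
import sqlite3

class InodeStore:
    """Hypothetical stand-in for an IFS-like store (not the real API).

    Objects are addressed only by number; there are no names and no
    directories to search.
    """
    def __init__(self):
        self._data = {}
        self._next = 2  # arbitrary starting number for this sketch

    def create(self, content):
        """Store content and hand back the number it was stored under."""
        ino = self._next
        self._next += 1
        self._data[ino] = content
        return ino

    def read(self, ino):
        """Fetch content by number: one dict lookup, no name resolution."""
        return self._data[ino]

class NameIndex:
    """Application-side name database, kept separate from the store."""
    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE names (name TEXT PRIMARY KEY, ino INTEGER NOT NULL)")

    def bind(self, name, ino):
        """Record that this name refers to this inode number."""
        self._db.execute("INSERT INTO names VALUES (?, ?)", (name, ino))

    def lookup(self, name):
        """Return the inode number for a name, or None if unknown."""
        row = self._db.execute(
            "SELECT ino FROM names WHERE name = ?", (name,)).fetchone()
        return None if row is None else row[0]
```

Usage would be along the lines of: ino = store.create(data), then
index.bind("some/name", ino), and later store.read(index.lookup("some/name")).
The naming scheme is then entirely the application's choice.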

This code is not yet committed, but is definitely of interest.

  Robert N M Watson 

robert@fledge.watson.org              http://www.watson.org/~robert/
PGP key fingerprint: AF B5 5F FF A6 4A 79 37  ED 5F 55 E9 58 04 6A B1
TIS Labs at Network Associates, Safeport Network Services

