Date:      Tue, 18 Aug 1998 23:32:08 +0200
From:      Juergen Nickelsen <ni@tellique.de>
To:        questions@FreeBSD.ORG
Subject:   Re: Free BSD file system
Message-ID:  <35D9F2D8.5E638264@tellique.de>
References:  <35D9C4E6.2897@echidna.com> <19980818141707.54820@homenet>

Aaron Jeremias Luz wrote:

> > I have a situation that involves storing the better part of a
> > million small (700 bytes to 1.9 kbytes) files (don't ask!). From a
> > filesystem efficiency point of view, what is a practical maximum
> > number of files per directory? How many directories can you have
> > under one directory?
>
> The filesystem should remain efficient. However, applications which
> read the directory may be overwhelmed. For example, ls sorts the
> names of the files in a directory before outputting them. Running ls
> on a directory with many thousands of files in it could take a
> while.

While you can have as many files and directories in a directory as you
want (provided you have enough space and inodes(*) on the partition),
having too many entries in a directory slows down all directory
operations. This affects not only programs that sort the entries,
but also the kernel.

Every time a file in the directory is accessed, the file system has to
make a linear search through the directory to find the file's entry,
and every time a file is created, the directory is searched for a free
entry (or is extended if none is found). While the directory will
usually be held in the buffer cache to avoid frequent disk access, a
linear search through a million entries is still expensive.
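As a rough illustration, here is a toy model in Python (not the actual
UFS code, just the shape of the algorithm): both lookup and create walk
the directory entry by entry, so their cost grows linearly with the
number of entries.

```python
# Toy model of a directory as a flat list of entries. The kernel's
# real data structures differ, but the linear scan is the point.

def lookup(directory, name):
    """Linear search, as done for each path component on access."""
    for i, entry in enumerate(directory):
        if entry == name:
            return i
    return -1                      # not found

def create(directory, name):
    """Reuse the first free slot; extend the directory if none exists."""
    for i, entry in enumerate(directory):
        if entry is None:          # a free (deleted) entry
            directory[i] = name
            return i
    directory.append(name)         # no free entry: the directory grows
    return len(directory) - 1

d = []
for n in range(5):
    create(d, "file%d" % n)
assert lookup(d, "file3") == 3
d[1] = None                        # simulate deleting file1
assert create(d, "newfile") == 1   # the freed slot is reused
```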

Look at what Apache does in its proxy cache: in the cache directory it
creates many subdirectories, each with a one-character name, and in
each of these again subdirectories of the same kind. By default this
hierarchy is three levels deep. For file and directory names Apache
uses the 64 characters [a-zA-Z0-9@_]. With three levels of
directories you have 64**3 = 2**18 directories at level 3, and if you
put up to 64 files into each of these directories, you can store
2**24 == 16 million files.
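A sketch of such a scheme in Python. The 64-character alphabet is the
one described above; using MD5 to pick the directory characters is my
own illustration, not necessarily what Apache actually does.

```python
import hashlib

# 26 + 26 + 10 + 2 = 64 characters, as in the Apache cache naming.
ALPHABET = ("abcdefghijklmnopqrstuvwxyz"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "0123456789@_")
assert len(ALPHABET) == 64

def cache_path(name, levels=3):
    """Spread files over `levels` levels of one-character directories."""
    digest = hashlib.md5(name.encode()).digest()
    # One alphabet character per level, chosen from the hash bytes.
    parts = [ALPHABET[digest[i] % 64] for i in range(levels)]
    return "/".join(parts) + "/" + name

print(cache_path("www.freebsd.org"))
```

Because the hash spreads names evenly, each directory at each level
stays small, no matter how many files the cache holds in total.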

To access one of these files, you have to search four directories of
up to 64 entries each, which in the worst case is a search over 256
entries -- far fewer than a search through a single directory of a
million entries. You do have to open three additional directories
along the way, but that easily pays off.

(*) The number of inodes (an inode holds the information about a file;
a directory entry is just a reference to an inode) is definitely an
issue if we are talking about a million files. Today I made an 8.5 GB
file system on a new disk (under Solaris, though), and newfs created
1048060 inodes -- barely enough for your case. You can change that
number with an option to newfs (look for "number of bytes per inode"
in newfs(8)) to have enough inodes for your files and directories.
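A back-of-the-envelope check of that inode arithmetic. The 8.5 GB and
1048060-inode figures are from above; the target file count is an
assumed example, and the exact value you would pass to newfs -i
depends on your newfs version and layout.

```python
# Bytes-per-inode arithmetic for newfs's "number of bytes per inode".
fs_bytes = 8.5 * 2**30                       # the 8.5 GB file system above
default_inodes = 1048060                     # what newfs created by default
default_density = fs_bytes / default_inodes  # roughly 8700 bytes per inode

files_needed = 2 * 10**6                     # assumed: a million files, doubled
density = int(fs_bytes / files_needed)       # a candidate bytes-per-inode value
print(int(default_density), density)
```

With files averaging under 2 kbytes, a density well below the default
is clearly needed to get one inode per file with headroom to spare.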

Greetings, Juergen.

-- 
Juergen Nickelsen <ni@tellique.de>
Tellique Kommunikationstechnik GmbH
Gustav-Meyer-Allee 25, 13355 Berlin, Germany
Tel. +49 30 46307-552 / Fax +49 30 46307-579

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message


