FreeBSD Mail Archives

Date:      Sun, 11 Jan 2004 09:55:09 +1100
From:      Peter Jeremy <peterjeremy@optushome.com.au>
To:        Tom Arnold <xyzzy@sysabend.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Large Filesystem Woes
Message-ID:  <20040110225509.GA60996@server.vk2pj.dyndns.org>
In-Reply-To: <20040109193551.GD39751@moo.sysabend.org>
References:  <20040109193551.GD39751@moo.sysabend.org>

On Fri, Jan 09, 2004 at 11:35:51AM -0800, Tom Arnold wrote:
>Building a box that's going to house many billions of small files.  Think
>innd circa 1998 or someone trying to house AOL's mail system on cyrus or
>something.

This is probably going to stress any filesystem.  You might like to
consider an alternative approach to storing the files (e.g. some sort of
database).

>  To this end I've hung a 3.3TB hardware raid off a BSD box
>broken into 4 partitions.  3 1TB and 1 300GB.
>Originally this was on a 4.9 box.  da0s1 and da0s2 were formatted "stock"
>( -f 2048 -b 16384 -i 8192 ) da1s1 and s2 were both formatted -f 512 -b 4096
>-i 512.

I ran '-f 512 -b 4096' on a news server for a while but I found that
'-f 1024 -b 8192' significantly improved performance (at the cost of
a significant increase in disk space usage).

>Switched to 5.2.  Newfs'd the RAID for UFS2.  First issue, if the machine
>came up dirty, bgfsck seemed to do its thing and the machine was online and
>usable after about 20 minutes however after a few hours I get this error :
>
>fsck: /dev/da1s1e: CANNOT CREATE SNAPSHOT /export/database/.snap/fsck_snapshot: File too large
>fsck: /dev/da1s1e: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

I can't explain this.  This means that mount(2) returned EFBIG - which
isn't a documented error.  I had a quick look through the sources and
can't quickly see why EFBIG would get returned.

>And the second thing I've noticed is I have lost a lot of space.
>Under 4.9 with UFS da1s1e was approx 870gigs and s2e was around 180, now
>I see :
>Filesystem    Size   Used  Avail Capacity iused      ifree %iused  Mounted on
>/dev/da0s1e   992G   4.0K   912G     0%       2  134411260    0%   /export/logs1
>/dev/da0s2e   992G   4.0K   912G     0%       2  134411260    0%   /export/logs2
>/dev/da1s1e   510G   1.0K   469G     0%       2 2148661228    0%   /export/database
>/dev/da1s2e    94G   1.0K    86G     0%       2  395214332    0%   /export/spare

The size of a UFS1 inode is 128 bytes and a UFS2 inode is 256 bytes.
With '-i 512', UFS2 allocates about 1/2 of your disk space to inodes.
(And you have a further overhead of 8 bytes + name for each directory
entry).
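
As a rough sanity check of the arithmetic above, the sketch below
estimates how much space the inodes alone consume.  It assumes newfs
creates exactly one inode per '-i' bytes of disk (in practice the count
is rounded per cylinder group), and it takes the inode count for
/dev/da1s1e from the df output quoted above.  The function names are
made up for illustration.

```python
# Back-of-the-envelope estimate of space consumed by on-disk inodes.
UFS1_INODE_BYTES = 128
UFS2_INODE_BYTES = 256


def inode_fraction(inode_bytes, bytes_per_inode):
    """Approximate fraction of the disk taken by inodes.

    Assumes one inode is allocated per `bytes_per_inode` bytes of disk,
    which is what the newfs '-i' density asks for.
    """
    return inode_bytes / bytes_per_inode


def inode_space_gib(inode_count, inode_bytes):
    """Space occupied by `inode_count` inodes, in GiB."""
    return inode_count * inode_bytes / 2**30


# iused + ifree for /dev/da1s1e, taken from the quoted df output.
DA1S1E_INODES = 2 + 2148661228

if __name__ == "__main__":
    for name, size in (("UFS1", UFS1_INODE_BYTES), ("UFS2", UFS2_INODE_BYTES)):
        for density in (512, 8192):
            frac = inode_fraction(size, density)
            print(f"{name} -i {density}: {frac:.1%} of the disk is inodes")
    gib = inode_space_gib(DA1S1E_INODES, UFS2_INODE_BYTES)
    print(f"da1s1e: about {gib:.0f} GiB of inodes")
```

On these assumptions the inodes on da1s1e occupy roughly 512 GiB, which
is about the difference between the 1TB partition and the 510G that df
reports.  With the stock '-i 8192' the UFS2 inode overhead is only
about 3%.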

>I'm not certain if I've run into some kind of weird limit here or a bug or
>what and am looking for ideas to pursue before I'm stuck going to an OS with
>something journaled.

Inode numbers are supposed to be u_int32_t but it's possible that they
are being (incorrectly) treated as signed somewhere (and you have >2^31
inodes on da1s1e).
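
To illustrate the hypothesis (it is only a guess; I have not found such
a cast in the sources), the sketch below takes the inode count for
/dev/da1s1e from the df output above and shows what happens if that
value is reinterpreted as a signed 32-bit integer.  The helper name is
made up for illustration.

```python
# What an inode count above 2^31 looks like if it is mistakenly
# stored in a signed 32-bit integer instead of an unsigned one.
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1


def as_signed32(n):
    """Reinterpret the low 32 bits of n as a two's-complement signed value."""
    return (n + 2**31) % 2**32 - 2**31


# iused + ifree for /dev/da1s1e, taken from the quoted df output.
DA1S1E_INODES = 2 + 2148661228

if __name__ == "__main__":
    print("fits in u_int32_t:", DA1S1E_INODES <= UINT32_MAX)
    print("fits in int32_t:  ", DA1S1E_INODES <= INT32_MAX)
    print("value if treated as signed:", as_signed32(DA1S1E_INODES))
```

The count fits in an unsigned 32-bit field but not a signed one, so any
code path that treats it as signed would see a negative number.  The
other three filesystems in the df output are well below 2^31 inodes.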

Moving to a journalled filesystem won't necessarily help.  I use
DEC/Compaq/HP AdvFS at work - each file needs at least 282 bytes of
metadata (under some circumstances, it can require multiple 282 byte
metadata blocks) and from memory it is limited to 2^31 (or maybe 2^32)
files.  Our main fileserver has a filesystem with 2.7e6 files and we
are continually running into undocumented "features" (aka bugs) as a
result of the large number of files.  (OTOH, I have no problems with
1.9e6 files in a UFS1 partition on a FreeBSD box).

Peter

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040110225509.GA60996>
