Date: Sun, 11 Jan 2004 09:55:09 +1100
From: Peter Jeremy <peterjeremy@optushome.com.au>
To: Tom Arnold <xyzzy@sysabend.org>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Large Filesystem Woes
Message-ID: <20040110225509.GA60996@server.vk2pj.dyndns.org>
In-Reply-To: <20040109193551.GD39751@moo.sysabend.org>
References: <20040109193551.GD39751@moo.sysabend.org>

On Fri, Jan 09, 2004 at 11:35:51AM -0800, Tom Arnold wrote:
>Building a box thats going to house many billions of small files. Think
>innd circa 1998 or someone trying to house AOLs mail system on cyrus or
>something.

This is probably going to stress any filesystem.  You might like to
consider an alternative approach to storing the files (eg some sort
of database).

> To this end I've hung a 3.3TB hardware raid off a BSD box
>broken into 4 partitions. 3 1TB and 1 300GB.
>Originally this was on a 4.9 box. da0s1 and da0s2 were formatted "stock"
>( -f 2048 -b 16384 -i 8192 ) da1s1 and s2 were both formatted -f 512 -b 4096
>-i 512.

I ran '-f 512 -b 4096' on a news server for a while but I found that
'-f 1024 -b 8192' significantly improved performance (at the cost of
a significant increase in disk space usage).

>Switched to 5.2. Newfs'd the RAID for UFS2. First issue, if the machine
>came up dirty, bgfsck seemed to do its thing and the machine was online and
>usable after about 20 minutes however after a few hours I get this error :
>
>fsck: /dev/da1s1e: CANNOT CREATE SNAPSHOT /export/database/.snap/fsck_snapshot: File too large
>fsck: /dev/da1s1e: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

I can't explain this.  This means that mount(2) returned EFBIG - which
isn't a documented error.  I had a quick look through the sources and
can't quickly see why EFBIG would get returned.

>And the second thing I've noticed is I have lost a lot of space.
>Under 4.9 with UFS da1s1e was approx 870gigs and s2e was around 180, now
>I see :
>Filesystem    Size   Used  Avail Capacity  iused       ifree %iused  Mounted on
>/dev/da0s1e   992G   4.0K   912G     0%        2   134411260    0%   /export/logs1
>/dev/da0s2e   992G   4.0K   912G     0%        2   134411260    0%   /export/logs2
>/dev/da1s1e   510G   1.0K   469G     0%        2  2148661228    0%   /export/database
>/dev/da1s2e    94G   1.0K    86G     0%        2   395214332    0%   /export/spare

The size of a UFS1 inode is 128 bytes and a UFS2 inode is 256 bytes.
With '-i 512', UFS2 allocates about 1/2 of your disk space to inodes.
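
The inode arithmetic above can be sketched in a few lines. This is a rough illustration only, written in Python for convenience: the function names are made up for this example, the inode sizes and df figures are the ones quoted above, and the estimate ignores superblocks, cylinder-group bookkeeping and the minfree reserve.

```python
# Rough estimate only: assumes newfs honours the requested -i density and
# ignores superblocks, cylinder-group maps and the minfree reserve.
UFS1_INODE_BYTES = 128  # on-disk inode sizes quoted in the message
UFS2_INODE_BYTES = 256


def inode_fraction(inode_bytes, bytes_per_inode):
    """Approximate share of the partition taken by inodes when newfs
    creates one inode per `bytes_per_inode` bytes (the -i argument)."""
    return inode_bytes / bytes_per_inode


def inode_table_gib(inode_count, inode_bytes):
    """Space occupied by `inode_count` inodes, in GiB."""
    return inode_count * inode_bytes / 2**30


if __name__ == "__main__":
    for name, size in (("UFS1", UFS1_INODE_BYTES), ("UFS2", UFS2_INODE_BYTES)):
        for density in (8192, 512):
            share = inode_fraction(size, density)
            print(f"{name} -i {density}: {share:.1%} of the space is inodes")

    # da1s1e from the df output above: 2 used + 2148661228 free inodes.
    total_inodes = 2 + 2148661228
    gib = inode_table_gib(total_inodes, UFS2_INODE_BYTES)
    print(f"da1s1e inode tables: about {gib:.0f} GiB")
```

Under these assumptions the inode tables on da1s1e come to roughly the same size as the 510G that df reports as usable, which is consistent with the "about 1/2" estimate.
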
(And you have a further overhead of 8 bytes + name for each directory
entry).

>I'm not certain if I've run into some kind of weird limit here or a bug or
>what and am looking for ideas to persue before I'm stuck going to an OS with
>something journaled.

Inode numbers are supposed to be u_int32_t but it's possible that they
are being (incorrectly) treated as signed somewhere (and you have >2^31
inodes on da1s1e).

Moving to a journalled filesystem won't necessarily help.  I use
DEC/Compaq/HP AdvFS at work - each file needs at least 282 bytes of
metadata (under some circumstances, it can require multiple 282 byte
metadata blocks) and from memory it is limited to 2^31 (or maybe 2^32)
files.  Our main fileserver has a filesystem with 2.7e6 files and we
are continually running into undocumented "features" (aka bugs) as a
result of the large number of files.  (OTOH, I have no problems with
1.9e6 files in a UFS1 partition on a FreeBSD box).

Peter
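
To put numbers on the signed-inode suspicion above, here is a small sketch, again in Python purely for illustration. The helper name is hypothetical and nothing here is taken from the FreeBSD sources; it only shows what happens to the da1s1e inode count from the df output if a 32-bit value is reinterpreted as signed.

```python
def as_signed32(n):
    """Reinterpret the low 32 bits of n as a two's-complement signed value,
    which is what a mistaken cast from u_int32_t to a signed 32-bit int does."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


if __name__ == "__main__":
    # iused + ifree for each filesystem, from the df output quoted above.
    inode_counts = {
        "da0s1e": 2 + 134411260,
        "da1s1e": 2 + 2148661228,
        "da1s2e": 2 + 395214332,
    }
    for fs, count in inode_counts.items():
        over = count >= 2**31
        print(f"{fs}: {count} inodes, as signed 32-bit: {as_signed32(count)}"
              f"{'  <-- exceeds 2^31' if over else ''}")
```

Only da1s1e crosses 2^31, so it is the only one of the four filesystems where such a cast would turn the highest inode numbers negative. Whether any code path actually does this is not established here.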