Date: Sun, 26 Jul 2009 10:19:44 -0400
From: Maxim Khitrov <mkhitrov@gmail.com>
To: "b. f." <bf1783@googlemail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: UFS2 tuning for heterogeneous 4TB file system
Message-ID: <26ddd1750907260719x761a1c94r27c572ab1ff6a582@mail.gmail.com>
In-Reply-To: <d873d5be0907260056ib6906cbpae649f880ec7493f@mail.gmail.com>
References: <d873d5be0907260056ib6906cbpae649f880ec7493f@mail.gmail.com>
On Sun, Jul 26, 2009 at 3:56 AM, b. f. <bf1783@googlemail.com> wrote:
>> The file system in question will not have a common file size (which is
>> what, as I understand, bytes per inode should be tuned for). There
>> will be many small files (< 10 KB) and many large ones (> 500 MB). A
>> similar, in terms of content, 2TB ntfs file system on another server
>> has an average file size of about 26 MB with 59,246 files.
>
> Ordinarily, it may have a large variation in file sizes, but can you
> intervene, and segregate large and small files in separate
> filesystems, so that you can optimize the settings for each
> independently?

That's a good idea, but the problem is that this RAID array will grow
in the future as I add additional drives. As far as I know, a
partition can be expanded using growfs, but it cannot be moved to a
higher address (with any "standard" tools). So if I create two
separate partitions for different file types, the first partition will
have to remain a fixed size. That would be problematic, since I cannot
easily predict how much space it would need initially and for all
future purposes (enough to store all the files, yet not waste space
that could otherwise be used for the second partition).

>> Ideally, I would prefer that small files do not waste more than 4 KB
>> of space, which is what you have with ntfs. At the same time, having
>> fsck running for days after an unclean shutdown is also not a good
>> option (I always disable background checking). From what I've gathered
>> so far, the two requirements are at the opposite ends in terms of file
>> system optimization.
>
> I gather you are trying to be conservative, but have you considered
> using gjournal(8)? At least for the filesystems with many small
> files? In that way, you could safely avoid the need for most if not
> all use of fsck(8), and, as an adjunct benefit, you would be able to
> operate on the small files more quickly:
>
> http://lists.freebsd.org/pipermail/freebsd-current/2006-June/064043.html
> http://www.freebsd.org/doc/en_US.ISO8859-1/articles/gjournal-desktop/article.html
>
> gjournal has a lower overhead than ZFS, and has proven to be fairly
> reliable. Also, you can always unhook it and revert to plain UFS
> mounts easily.
>
> b.

Just fairly reliable? :) I've done a bit of reading on gjournal, and
the main thing preventing me from using it is how recently it was
implemented. I've had a number of FreeBSD servers go down in the past
due to power outages, and SoftUpdates with foreground fsck have never
failed me. I have never had a corrupt UFS2 partition, which is not
something I can say about a few Linux servers with ext3.

Have there been any serious studies of how gjournal and SoftUpdates
deal with power outages? By that I mean taking two identical machines,
issuing write operations, yanking the power cords, and then watching
both systems recover. I'm sure that gjournal will take less time to
come back up, but if this experiment were repeated a few hundred
times, I wonder what the corruption statistics would be. Is there ever
a case, for instance, where the journal itself becomes corrupt because
the power was pulled in the middle of a metadata flush?

Basically, I have no experience with gjournal, poor experience with
other journaled file systems, and no real comparison between the
reliability characteristics of gjournal and SoftUpdates, which have
served me very well in the past.

- Max
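[A rough sketch of the fragment-size trade-off behind the "no more than 4 KB wasted per small file" requirement. In UFS, a file's tail is allocated in whole fragments (the newfs -f value), so the worst-case waste per file is just under one fragment; the 2 KB and 8 KB fragment sizes below are illustrative choices, not measured defaults for this array.]

```shell
#!/bin/sh
# Worst-case tail waste for a file: allocation is rounded up to a
# whole number of fragments, so waste = (rounded-up size) - (file size).

waste() {
    filesize=$1
    frag=$2
    alloc=$(( (filesize + frag - 1) / frag * frag ))
    echo $(( alloc - filesize ))
}

# A 10000-byte "small" file on 2 KB fragments wastes only a few hundred bytes:
waste 10000 2048   # prints 240
# The same file on 8 KB fragments (a layout tuned for the > 500 MB files)
# wastes most of a fragment:
waste 10000 8192   # prints 6384
```

This is why a single filesystem serving both < 10 KB and > 500 MB files pulls the block/fragment tuning in opposite directions, as the thread notes.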