Date:      Sun, 11 Dec 2011 12:10:32 -0800 (PST)
From:      "Pedro F. Giffuni" <giffunip@tutopia.com>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        current@freebsd.org
Subject:   Re: calling all fs experts
Message-ID:  <1323634232.36004.YahooMailClassic@web113503.mail.gq1.yahoo.com>

--- On Sun, 11/12/11, Kostik Belousov <kostikbel@gmail.com> wrote:

> If you wanted to get responses from experts only, sorry in
> advance.

I am no fs expert but just thought I'd mention some things
based on my playing with the BSD ext2fs ...

> The fs (AKA UFS) uses clustering provided by the block cache.
> The clustering code, mainly located in kern/vfs_cluster.c,
> coalesces sequences of reads or writes that target consecutive
> blocks into a single physical read or write of at most MAXPHYS
> bytes. The current definition of MAXPHYS is 128KB.

The clustering code is really cool and the idea is that it
gives UFS the advantages of an extent-based fs.
I haven't seen benchmarks on UFS2 but on ext2 it didn't
seem to work as it should though.

One issue is that ext2 doesn't support fragments and as
a consequence ext2 will not use big block sizes. This is a
limitation in the ext2 design that UFS doesn't have, but
still Linux's ext2fs outperforms UFS in async mode (we do
shine in sync mode).

It was never clear exactly why this happens but it would
appear there is a bottleneck in geom that is not good at
writing many contiguous blocks.

> Clustering allows the filesystem to improve the layout of the
> files by calling VOP_REALLOCBLKS() to redo the allocation to
> make the written sequence of blocks sequential if it is not.
>
> Even if a file is not laid out ideally, or the i/o pattern is
> random, most writes scheduled are asynchronous, and for reads,
> the system tries to schedule read-aheads for some limited number
> of blocks. This allows the lower layers, i.e. geom and disk
> drivers, to optimize the i/o queue to coalesce requests that are
> consecutive on disk, but not on the queue.
>
> BTW, some time ago I was interested in the effect on
> fragmentation on UFS, due to some semi-abandoned patch, which
> could make the fragmentation worse. I wrote a tool that
> calculated the percentage of non-consecutive spots in the whole
> filesystem. Apparently, even under a hard load consisting of
> writing a lot of files under the megabytes in size, UFS managed
> to keep the number of spots under 2-3% on a sufficiently free
> volume.

Yes, the realloc_blk code is very efficient in that. In fact
it is so good it actually hides some inefficient operations
in UFS. Bruce had a patch for this that I cc'd to Kirk but
the difference was not big because the realloc_blk code does
its job in memory.

Zheng Liu did the reallocation thing for ext2fs and it gave
better results than preallocation but the results are not
as spectacular as in UFS (the UFS code takes advantage of
fragments there too). I do expect to commit it (kern/159233)
once my mentor reviews and approves it.

cheers,

Pedro.
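
For readers who want the two ideas from the quoted text in concrete form, here is a rough userland sketch in C. It is neither the code in kern/vfs_cluster.c nor the fragmentation tool mentioned above (that tool is not shown in this thread). The function names, the SKETCH_ constants and the 16KB block size are all hypothetical; only the 128KB MAXPHYS figure comes from the message. It works on one file's list of physical block numbers rather than on a whole filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128KB is the MAXPHYS value quoted in the thread; the 16KB block
 * size is an assumption made only for this illustration. */
#define SKETCH_MAXPHYS (128 * 1024)
#define SKETCH_BSIZE   (16 * 1024)

/*
 * blocks[] holds the physical block numbers of a file in logical order.
 * Returns how many physical transfers are needed when runs of
 * consecutive blocks are merged and each merged transfer is capped
 * at SKETCH_MAXPHYS bytes.
 */
static size_t
count_transfers(const uint64_t *blocks, size_t n)
{
	const size_t max_run = SKETCH_MAXPHYS / SKETCH_BSIZE;
	size_t transfers = 0, run = 0;

	assert(max_run > 0);
	for (size_t i = 0; i < n; i++) {
		int contiguous = (i > 0 && blocks[i] == blocks[i - 1] + 1);

		/* Start a new transfer on a gap or when the cap is hit. */
		if (!contiguous || run == max_run) {
			transfers++;
			run = 0;
		}
		run++;
	}
	return transfers;
}

/*
 * Percentage of "non-consecutive spots": transitions between logically
 * adjacent blocks that are not physically adjacent.
 */
static double
discontiguous_percent(const uint64_t *blocks, size_t n)
{
	size_t breaks = 0;

	if (n < 2)
		return 0.0;
	for (size_t i = 1; i < n; i++) {
		if (blocks[i] != blocks[i - 1] + 1)
			breaks++;
	}
	return 100.0 * (double)breaks / (double)(n - 1);
}
```

With these assumed sizes a transfer holds at most 8 blocks, so a file occupying blocks 100 through 108 followed by 200 and 201 needs three transfers and has one non-consecutive spot out of ten transitions.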



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1323634232.36004.YahooMailClassic>