From: "Pedro F. Giffuni"
Reply-To: giffunip@tutopia.com
To: Kostik Belousov
Cc: current@freebsd.org
Date: Sun, 11 Dec 2011 12:10:32 -0800 (PST)
Subject: Re: calling all fs experts
List-Id: Discussions about the use of FreeBSD-current

--- On Sun 11/12/11, Kostik Belousov wrote:

> If you wanted to get responses from experts only, sorry in
> advance.

I am no fs expert but just thought I'd mention some things
based on my playing with the BSD ext2fs ...

> The fs (AKA UFS) uses clustering provided by the block
> cache. The clustering code, mainly located in
> kern/vfs_cluster.c, coalesces sequences of reads or writes
> that target consecutive blocks into a single physical read
> or write of at most MAXPHYS bytes. The current definition
> of MAXPHYS is 128KB.

The clustering code is really cool and the idea is that it
gives UFS the advantages of an extent-based fs.
I haven't seen benchmarks on UFS2, but on ext2 it didn't
seem to work as it should.

One issue is that ext2 doesn't support fragments and as
a consequence ext2 will not use big block sizes.
This is a limitation in the ext2 design that UFS doesn't have, but
Linux's ext2fs still outperforms UFS in async mode (we do
shine in sync mode).

It was never clear exactly why this happens, but it would
appear there is a bottleneck in geom that does not cope well
with writing many contiguous blocks.

> Clustering allows the filesystem to improve the layout of
> files by calling VOP_REALLOCBLKS() to redo the allocation
> so that the blocks being written are sequential if they
> are not already.
>
> Even if a file is not laid out ideally, or the i/o pattern
> is random, most writes scheduled are asynchronous, and for
> reads, the system tries to schedule read-aheads for some
> limited number of blocks. This allows the lower layers,
> i.e. geom and disk drivers, to optimize the i/o queue to
> coalesce requests that are consecutive on disk, but not
> on the queue.
>
> BTW, some time ago I was interested in the effect on the
> fragmentation on UFS, due to some semi-abandoned patch,
> which could make the fragmentation worse. I wrote the tool
> that calculated the percentage of non-consecutive spots in
> the whole filesystem. Apparently, even under the hard load
> consisting of writing a lot of files under the megabytes
> in size, UFS managed to keep the number of spots under
> 2-3% on sufficiently free volume.

Yes, the realloc_blk code is very efficient in that. In fact
it is so good it actually hides some inefficient operations
in UFS. Bruce had a patch for this that I cc'd to Kirk but
the difference was not big because the realloc_blk code does
its job in memory.

Zheng Liu did the reallocation thing for ext2fs and it gave
better results than preallocation, but the results are not
as spectacular as in UFS (the UFS code takes advantage of
fragments there too).
I do expect to commit it (kern/159233)
once my mentor reviews and approves it.

cheers,

Pedro.
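
P.S. For readers who want something concrete, the two ideas quoted
above (merging consecutive blocks into one physical transfer of at
most MAXPHYS, and measuring the percentage of non-consecutive spots)
can be sketched roughly as below. This is only a toy model in Python
and an assumption on my part: it is not the code in
kern/vfs_cluster.c, and it is not Kostik's tool. The names coalesce()
and discontinuity_percent(), and the idea of feeding them a plain
list of physical block numbers, are made up for illustration; only
the 128KB MAXPHYS value comes from the thread.

```python
MAXPHYS = 128 * 1024  # bytes; the value quoted in the thread


def coalesce(blocks, block_size, max_io=MAXPHYS):
    """Group physical block numbers into (start, count) runs.

    Hypothetical helper: a run grows only while the next block is
    physically consecutive and the run stays within max_io bytes.
    """
    if block_size <= 0 or block_size > max_io:
        raise ValueError("block_size must be in 1..max_io")
    max_blocks = max_io // block_size  # blocks per physical transfer
    runs = []
    for blk in blocks:
        if runs:
            start, count = runs[-1]
            if blk == start + count and count < max_blocks:
                runs[-1] = (start, count + 1)  # extend the current run
                continue
        runs.append((blk, 1))  # gap on disk or size cap hit: new run
    return runs


def discontinuity_percent(blocks):
    """Percentage of logically adjacent blocks not adjacent on disk.

    Assumed metric; the real tool may have counted differently.
    """
    if len(blocks) < 2:
        return 0.0
    breaks = sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)
    return 100.0 * breaks / (len(blocks) - 1)
```

With 32KB blocks at most four blocks fit into one 128KB transfer, so
ten consecutive blocks would go out as three transfers in this model,
while a file whose blocks are scattered produces one transfer per
block no matter how large MAXPHYS is.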