From: Scott Long
Date: Tue, 14 Feb 2012 21:50:04 -0700
To: Peter Jeremy
Cc: freebsd-stable@freebsd.org
Subject: Re: disk devices speed is ugly
In-Reply-To: <20120214200258.GA29641@server.vk2pj.dyndns.org>

On Feb 14, 2012, at 1:02 PM, Peter Jeremy wrote:

> On 2012-Feb-13 08:28:21 -0500, Gary Palmer wrote:
>> The filesystem is the *BEST* place to do caching.  It knows what
>> metadata is most effective to cache and what other data (e.g. file
>> contents) doesn't need to be cached.
>
> Agreed.
>
>> Any attempt to do this in layers between the FS and the disk won't
>> achieve the same gains as a properly written filesystem.
>
> Agreed - but traditionally, Unix uses this approach via block devices.
> For various reasons, FreeBSD moved caching into UFS and removed block
> devices.  Unfortunately, this means that any FS that wants caching has
> to implement its own - and currently only UFS & ZFS do.
>
> What would be nice is a generic caching subsystem that any FS can use
> - similar to the old block devices but with hooks to allow the FS to
> request read-ahead, advise of unwanted blocks, and flush dirty blocks
> in a requested order with the equivalent of barriers (request Y will
> not occur until preceding request X has been committed to stable
> media).  This would allow filesystems to regain the benefits of block
> devices with minimal effort and then improve performance & cache
> efficiency with additional work.
>

Any filesystem that uses bread/bwrite/cluster_read is already using the
"generic caching subsystem" that you propose.  This includes UDF,
CD9660, MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local
storage filesystem in the tree except for ZFS.  Not all of them
implement VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations
for the vnode pager, not requirements for using buffer-cache services
on block devices.
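To make that concrete, the read path in all of those filesystems boils
down to something like the sketch below.  This is illustrative only --
the function and variable names are made up and I haven't compiled it;
the point is just that bread()/brelse() *is* the caching layer:

/*
 * Illustrative sketch: read one block of metadata through the shared
 * buffer cache.  A second bread() of the same block is satisfied from
 * the cache without touching the disk.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/buf.h>
#include <sys/ucred.h>
#include <sys/vnode.h>

static int
examplefs_read_meta(struct vnode *devvp, daddr_t blkno, int blksize,
    void *dst)
{
	struct buf *bp;
	int error;

	/* Returns a locked buffer for (devvp, blkno), cached or read in. */
	error = bread(devvp, blkno, blksize, NOCRED, &bp);
	if (error != 0) {
		brelse(bp);
		return (error);
	}
	bcopy(bp->b_data, dst, blksize);
	brelse(bp);	/* unlock; the data stays cached for the next caller */
	return (0);
}

Writes go back out the same way via bdwrite()/bwrite(), and
cluster_read() is the read-ahead-aware flavor of bread().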
As Kostik pointed out in a parallel email, the only thing that was
removed from FreeBSD was the userland interface to cached devices via
/dev nodes.  This has nothing to do with filesystems, though I suppose
that could maybe sorta kinda be an issue for FUSE.

ZFS isn't in this list because it implements its own private
buffer/cache (the ARC) that understands the special requirements of
ZFS.  There are good and bad aspects to this, noted below.

> One downside of the "each FS does its own caching" approach is that
> the caches are all separate and need careful integration into the VM
> subsystem to prevent starvation (e.g. past problems with UFS starving
> ZFS L2ARC).
>

I'm not sure what you mean here.  The ARC is limited by available wired
memory; attempts to allocate such memory will evict pages from the
buffer cache as necessary, until all available RAM is consumed.  If
anything, ZFS starves the rest of the system, not the other way around,
and that's simply because the ARC isn't integrated with the normal VM.
Such integration is extremely hard and has nothing to do with having a
generic caching subsystem.

Scott
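P.S.  If anyone wants to watch this behavior on a live system, both the
ARC size and the wired page count are exported via sysctl.  Here's a
quick, untested userland sketch -- the sysctl names are the usual ones
on a system with ZFS loaded, everything else is illustrative:

/* cc -o arcwatch arcwatch.c */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	uint64_t arc_size;	/* bytes */
	u_int wired_pages;	/* pages */
	size_t len;

	len = sizeof(arc_size);
	if (sysctlbyname("kstat.zfs.misc.arcstats.size", &arc_size, &len,
	    NULL, 0) == -1) {
		perror("kstat.zfs.misc.arcstats.size (is ZFS loaded?)");
		return (1);
	}
	len = sizeof(wired_pages);
	if (sysctlbyname("vm.stats.vm.v_wire_count", &wired_pages, &len,
	    NULL, 0) == -1) {
		perror("vm.stats.vm.v_wire_count");
		return (1);
	}
	printf("ARC: %ju MB, wired: %ju MB\n",
	    (uintmax_t)(arc_size >> 20),
	    (uintmax_t)((uint64_t)wired_pages * getpagesize() >> 20));
	return (0);
}

And vfs.zfs.arc_max in /boot/loader.conf remains the usual knob if the
ARC needs to be capped by hand.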