From: Scott Long
Date: Tue, 14 Feb 2012 21:50:04 -0700
To: Peter Jeremy
Cc: freebsd-stable@freebsd.org
Subject: Re: disk devices speed is ugly
In-Reply-To: <20120214200258.GA29641@server.vk2pj.dyndns.org>

On Feb 14, 2012, at 1:02 PM, Peter Jeremy wrote:

> On 2012-Feb-13 08:28:21 -0500, Gary Palmer wrote:
>> The filesystem is the *BEST* place to do caching.  It knows what
>> metadata is most effective to cache and what other data (e.g. file
>> contents) doesn't need to be cached.
>
> Agreed.
>
>> Any attempt to do this in layers between the FS and the disk won't
>> achieve the same gains as a properly written filesystem.
>
> Agreed - but traditionally, Unix uses this approach via block devices.
> For various reasons, FreeBSD moved caching into UFS and removed block
> devices.  Unfortunately, this means that any FS that wants caching has
> to implement its own - and currently only UFS & ZFS do.
>
> What would be nice is a generic caching subsystem that any FS can use
> - similar to the old block devices but with hooks to allow the FS to
> request read-ahead, advise of unwanted blocks, and flush dirty blocks
> in a requested order with the equivalent of barriers (request Y will
> not occur until preceding request X has been committed to stable
> media).  This would allow filesystems to regain the benefits of block
> devices with minimal effort and then improve performance & cache
> efficiency with additional work.
>

Any filesystem that uses bread/bwrite/cluster_read is already using the
"generic caching subsystem" that you propose.  This includes UDF,
CD9660, MSDOS, NTFS, XFS, ReiserFS, EXT2FS, and HPFS, i.e. every local
storage filesystem in the tree except for ZFS.  Not all of them
implement VOP_GETPAGES/VOP_PUTPAGES, but those are just optimizations
for the vnode pager, not requirements for using buffer-cache services
on block devices.
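To make that concrete, the read path in all of those filesystems boils
down to something like the sketch below.  This is illustrative only --
the function and variable names are made up and I haven't compiled it;
the point is just that bread()/brelse() *is* the caching layer:

/*
 * Illustrative sketch: read one block of metadata through the shared
 * buffer cache.  A second bread() of the same block is satisfied from
 * the cache without touching the disk.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/buf.h>
#include <sys/ucred.h>
#include <sys/vnode.h>

static int
examplefs_read_meta(struct vnode *devvp, daddr_t blkno, int blksize,
    void *dst)
{
	struct buf *bp;
	int error;

	/* Returns a locked buffer for (devvp, blkno), cached or read in. */
	error = bread(devvp, blkno, blksize, NOCRED, &bp);
	if (error != 0) {
		brelse(bp);
		return (error);
	}
	bcopy(bp->b_data, dst, blksize);
	brelse(bp);	/* unlock; the data stays cached for the next caller */
	return (0);
}

Writes go back out the same way via bdwrite()/bwrite(), and
cluster_read() is the read-ahead-aware flavor of bread().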
As Kostik pointed out in a parallel email, the only thing that was
removed from FreeBSD was the userland interface to cached devices via
/dev nodes.  This has nothing to do with filesystems, though I suppose
that could maybe sorta kinda be an issue for FUSE.

ZFS isn't in this list because it implements its own private
buffer/cache (the ARC) that understands the special requirements of
ZFS.  There are good and bad aspects to this, noted below.

> One downside of the "each FS does its own caching" approach is that
> the caches are all separate and need careful integration into the VM
> subsystem to prevent starvation (e.g. past problems with UFS starving
> ZFS L2ARC).
>

I'm not sure what you mean here.  The ARC is limited by available wired
memory; attempts to allocate such memory will evict pages from the
buffer cache as necessary, until all available RAM is consumed.  If
anything, ZFS starves the rest of the system, not the other way around,
and that's simply because the ARC isn't integrated with the normal VM.
Such integration is extremely hard and has nothing to do with having a
generic caching subsystem.

Scott
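P.S.  If anyone wants to watch this behavior on a live system, both the
ARC size and the wired page count are exported via sysctl.  Here's a
quick, untested userland sketch -- the sysctl names are the usual ones
on a system with ZFS loaded, everything else is illustrative:

/* cc -o arcwatch arcwatch.c */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	uint64_t arc_size;	/* bytes */
	u_int wired_pages;	/* pages */
	size_t len;

	len = sizeof(arc_size);
	if (sysctlbyname("kstat.zfs.misc.arcstats.size", &arc_size, &len,
	    NULL, 0) == -1) {
		perror("kstat.zfs.misc.arcstats.size (is ZFS loaded?)");
		return (1);
	}
	len = sizeof(wired_pages);
	if (sysctlbyname("vm.stats.vm.v_wire_count", &wired_pages, &len,
	    NULL, 0) == -1) {
		perror("vm.stats.vm.v_wire_count");
		return (1);
	}
	printf("ARC: %ju MB, wired: %ju MB\n",
	    (uintmax_t)(arc_size >> 20),
	    (uintmax_t)((uint64_t)wired_pages * getpagesize() >> 20));
	return (0);
}

And vfs.zfs.arc_max in /boot/loader.conf remains the usual knob if the
ARC needs to be capped by hand.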