From owner-freebsd-arch@FreeBSD.ORG Sun Feb 26 14:04:21 2012
Date: Sun, 26 Feb 2012 15:04:19 +0100
From: Attilio Rao
To: Pawel Jakub Dawidek
Cc: Konstantin Belousov, arch@freebsd.org
In-Reply-To: <20120225194630.GI1344@garage.freebsd.pl>
Subject: Re: Prefaulting for i/o buffers
List-Id: Discussion related to FreeBSD architecture
On 25 February 2012 at 20:46, Pawel Jakub Dawidek wrote:
> On Sat, Feb 25, 2012 at 06:45:00PM +0100, Attilio Rao wrote:
>> On 25 February 2012 at 16:13, Pawel Jakub Dawidek wrote:
>> > My personal opinion about rangelocks and many other VFS features we
>> > currently have is that they are a good idea in theory, but in
>> > practice they tend to overcomplicate VFS.
>> >
>> > I am of the opinion that we should move as much as we can into the
>> > individual file systems. We try to implement everything in VFS
>> > itself in the hope that this will simplify the file systems we
>> > have. It then turns out that only one file system really uses the
>> > feature (most of the time it is UFS), and it is a pain both for all
>> > the other file systems and for maintaining VFS. VFS has become so
>> > complicated over the years that maybe only a few people understand
>> > it, and every single change to VFS carries a huge risk of breaking
>> > some unrelated part.
>>
>> I think this is questionable for the following reasons:
>> - If the problem is filesystem writers having trouble understanding
>> the necessary locking, we should provide cleaner and more complete
>> documentation. One could say the same about our VM subsystem, but at
>> least there, plenty of comments help explain how to deal with
>> vm_object and vm_page locking over their lifetimes.
>
> Documentation is not the answer here. If the code is so complex that
> it is hard to learn, then no matter how good the documentation is,
> fewer people will be willing to learn it in the first place, and the
> code becomes more buggy, because there are more edge/special cases to
> forget about.
>
>> - Our primitives may be more complicated than the
>> 'all-in-the-filesystem' ones, but at least they offer a complete,
>> centralized view of the resources allocated in the whole system, and
>> they allow building better policies for managing them. One problem I
>> see here is that those policies are not fully implemented, are
>> untuned, or have simply become outdated, removing one of the biggest
>> benefits we get from making vnodes so generic.
>
> Again, this is only nice theory, far from reality. You will never be
> able to control all the resources allocated by file systems.
>
>> About the things I mentioned myself:
>> - As long as the same path now takes both the range lock and the
>> vnode lock, I don't see keeping them separate forever as a good
>> idea. Merging them seems an important evolution: it would not only
>> shrink the number of primitives but also reduce overhead and likely
>> improve vnode scalability (though this needs deeper investigation).
>> - About ZFS rangelocks absorbing the VFS ones: I think this is a
>> minor point, but still, if you think it can be done efficiently and
>> without losing performance, I don't see why not do it. You already
>> wrote rangelocks for ZFS, so you have earned a lot of experience in
>> this area and can comment on the fallout, etc., but I don't see a
>> good reason not to, unless it is just too difficult. This is not
>> about generalizing a new mechanism; it is about using a general
>> mechanism in a specific implementation, if possible.
>
> I did not implement rangelocking for ZFS. It came with ZFS when I
> ported it. As long as we want to merge changes from upstream (which
> is now IllumOS), we don't want to make huge changes just for the sake
> of proving that this is a general-purpose mechanism used by more than
> one file system.
>
> Attilio, don't get me wrong.
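[For readers following the thread: the byte-range locking being debated serializes writers against overlapping readers and writers per file range, instead of holding the whole vnode lock across the I/O. A minimal userland sketch of the idea, using pthreads; all names here (rl_lock, rl_acquire, etc.) are hypothetical illustrations, not the kernel or ZFS implementation:]

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

struct rl_entry {
	off_t start, end;	/* byte range [start, end) */
	bool write;		/* exclusive (writer) lock? */
	struct rl_entry *next;
};

struct rl_lock {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	struct rl_entry *head;	/* currently granted ranges */
};

static void
rl_init(struct rl_lock *rl)
{
	pthread_mutex_init(&rl->mtx, NULL);
	pthread_cond_init(&rl->cv, NULL);
	rl->head = NULL;
}

/*
 * A request conflicts with a granted range iff the ranges overlap and
 * at least one of the two sides is a writer.
 */
static bool
rl_conflicts(struct rl_lock *rl, off_t start, off_t end, bool write)
{
	for (struct rl_entry *e = rl->head; e != NULL; e = e->next)
		if (start < e->end && e->start < end && (write || e->write))
			return (true);
	return (false);
}

/* Sleep until the requested range is free of conflicts, then grant it. */
static struct rl_entry *
rl_acquire(struct rl_lock *rl, off_t start, off_t end, bool write)
{
	struct rl_entry *e = malloc(sizeof(*e));

	e->start = start;
	e->end = end;
	e->write = write;
	pthread_mutex_lock(&rl->mtx);
	while (rl_conflicts(rl, start, end, write))
		pthread_cond_wait(&rl->cv, &rl->mtx);
	e->next = rl->head;
	rl->head = e;
	pthread_mutex_unlock(&rl->mtx);
	return (e);
}

/* Drop a granted range and wake any waiters that may now proceed. */
static void
rl_release(struct rl_lock *rl, struct rl_entry *e)
{
	pthread_mutex_lock(&rl->mtx);
	for (struct rl_entry **p = &rl->head; *p != NULL; p = &(*p)->next)
		if (*p == e) {
			*p = e->next;
			break;
		}
	pthread_cond_broadcast(&rl->cv);
	pthread_mutex_unlock(&rl->mtx);
	free(e);
}
```

[The point of the merge proposal is that two writers to disjoint ranges of the same file would no longer serialize on the vnode lock; this sketch ignores fairness and starvation, which a real implementation has to address.]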
In 99% of cases it is good to make code more
> general, more universal and more reusable, but we can't ignore
> reality.
>
> There are reasons why file systems like XFS, ReiserFS and others were
> never fully ported. I'm not saying VFS complexity was the only
> reason, but I'm sure it was one of them.
>
> Our VFS is very UFS-centric. We make many assumptions that make sense
> only for UFS. I saw plenty of those while working on ZFS, like:
>
> - "Every file system needs a cache. Let's make it general, so that
>   all file systems can use it!" Well, for VFS each file system is a
>   separate entity, which is not the case for ZFS. ZFS can cache only
>   once a block that is used by one file system, 10 clones and 100
>   snapshots, all of which are separate mount points from the VFS
>   perspective. The same block would be cached 111 times by the buffer
>   cache.
>
> - "rmdir(2) on a mount point is a bad idea, let's deny it at the VFS
>   level." It is a bad idea, indeed, but in ZFS it is a nice way to
>   remove a snapshot, by rmdiring the .zfs/snapshot/ directory.
>
> - No one implemented rangelocking in VFS, so no file system can use
>   it, even if the given file system has all the code to do it.
>
> etc.
>
> I'm also sure it would be much easier for Jeff to make VFS MP-safe if
> it were less complex.
>
> Looking at the big picture, it would be nice to have all this general
> stuff like rangelocking, quota, the buffer cache, etc. as a kind of
> library for file systems to use, not something mandatory. If I
> develop a file system for FreeBSD only and don't want to reinvent the
> wheel, I can use those libraries. If I port a file system to FreeBSD,
> or develop one that doesn't really need those libraries, I'm not
> forced to use them.
>
> All this might make a good working-group subject at the BSDCan
> devsummit. We could cross swords there :)

Do you think you will be able to chair such a group?
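[Spelling out the 111-copies arithmetic from Pawel's buffer-cache example: a cache keyed per mount point stores one copy of the block for the file system, each of its 10 clones, and each of its 100 snapshots, while a cache keyed on the block's identity (as ZFS's ARC effectively is) stores it once. A toy sketch of the counting, with hypothetical names:]

```c
/* 1 file system + 10 clones + 100 snapshots, all sharing one block. */
#define NMOUNTS (1 + 10 + 100)

/* Cache keyed by (mount, block number): every mount caches its own copy. */
static int
per_mount_cache_entries(void)
{
	int entries = 0;

	for (int mount = 0; mount < NMOUNTS; mount++)
		entries++;	/* key (mount, blkno) is distinct per mount */
	return (entries);
}

/* Cache keyed by block identity: the shared block is cached only once. */
static int
shared_cache_entries(void)
{
	int entries = 0;
	int block_cached = 0;

	for (int mount = 0; mount < NMOUNTS; mount++)
		if (!block_cached) {
			entries++;	/* same key for every mount */
			block_cached = 1;
		}
	return (entries);
}
```

[Here per_mount_cache_entries() counts 111 copies and shared_cache_entries() counts one, which is the duplication the VFS-level buffer cache imposes in this scenario.]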
I'm not sure I will be able to make it to BSDCan, but it would be
valuable if you or someone else interested could get the ball rolling
on these topics.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein