From owner-freebsd-arch@FreeBSD.ORG Sun Feb 26 14:04:21 2012
Date: Sun, 26 Feb 2012 15:04:19 +0100
From: Attilio Rao
To: Pawel Jakub Dawidek
Cc: Konstantin Belousov, arch@freebsd.org
In-Reply-To: <20120225194630.GI1344@garage.freebsd.pl>
Subject: Re: Prefaulting for i/o buffers
List-Id: Discussion related to FreeBSD architecture
On 25 February 2012 at 20:46, Pawel Jakub Dawidek wrote:
> On Sat, Feb 25, 2012 at 06:45:00PM +0100, Attilio Rao wrote:
>> On 25 February 2012 at 16:13, Pawel Jakub Dawidek wrote:
>> > My personal opinion about rangelocks and many other VFS features we
>> > currently have is that they are a good idea in theory, but in
>> > practice they tend to overcomplicate VFS.
>> >
>> > I am of the opinion that we should move as much as we can into the
>> > individual file systems. We try to implement everything in VFS
>> > itself in the hope that this will simplify the file systems we
>> > have. It then turns out that only one file system really uses the
>> > feature (most of the time it is UFS), and it is a pain both for all
>> > the other file systems and for maintaining VFS. VFS has become so
>> > complicated over the years that maybe only a few people understand
>> > it, and every single change to VFS carries a huge risk of breaking
>> > some unrelated part.
>>
>> I think this is questionable for the following reasons:
>> - If the problem is filesystem writers having trouble understanding
>> the necessary locking, we should provide cleaner and more complete
>> documentation. One could say the same about our VM subsystem, but at
>> least there, plenty of comments help explain how to deal with
>> vm_object and vm_page locking over their lifetimes.
>
> Documentation is not the answer here. If the code is so complex that
> it is hard to learn, then no matter how good the documentation is,
> fewer people will be willing to learn it in the first place, and the
> code becomes more buggy, because there are more edge/special cases to
> forget about.
>
>> - Our primitives may be more complicated than the
>> 'all-in-the-filesystem' ones, but at least they offer a complete,
>> centralized view of the resources allocated in the whole system, and
>> they allow building better policies for managing them. One problem I
>> see here is that those policies are not fully implemented, are
>> untuned, or have simply become outdated, removing one of the biggest
>> benefits we get from making vnodes so generic.
>
> Again, this is only nice theory, far from reality. You will never be
> able to control all the resources allocated by file systems.
>
>> About the things I mentioned myself:
>> - As long as the same path now takes both the range lock and the
>> vnode lock, I don't see keeping them separate forever as a good
>> idea. Merging them seems an important evolution: it would not only
>> shrink the number of primitives but also reduce overhead and likely
>> improve vnode scalability (though this needs deeper investigation).
>> - About ZFS rangelocks absorbing the VFS ones: I think this is a
>> minor point, but still, if you think it can be done efficiently and
>> without losing performance, I don't see why not do it. You already
>> wrote rangelocks for ZFS, so you have earned a lot of experience in
>> this area and can comment on the fallout, etc., but I don't see a
>> good reason not to, unless it is just too difficult. This is not
>> about generalizing a new mechanism; it is about using a general
>> mechanism in a specific implementation, if possible.
>
> I did not implement rangelocking for ZFS. It came with ZFS when I
> ported it. As long as we want to merge changes from upstream (which
> is now IllumOS), we don't want to make huge changes just for the sake
> of proving that this is a general-purpose mechanism used by more than
> one file system.
>
> Attilio, don't get me wrong.
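[For readers following the thread: the byte-range locking being debated serializes writers against overlapping readers and writers per file range, instead of holding the whole vnode lock across the I/O. A minimal userland sketch of the idea, using pthreads; all names here (rl_lock, rl_acquire, etc.) are hypothetical illustrations, not the kernel or ZFS implementation:]

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

struct rl_entry {
	off_t start, end;	/* byte range [start, end) */
	bool write;		/* exclusive (writer) lock? */
	struct rl_entry *next;
};

struct rl_lock {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	struct rl_entry *head;	/* currently granted ranges */
};

static void
rl_init(struct rl_lock *rl)
{
	pthread_mutex_init(&rl->mtx, NULL);
	pthread_cond_init(&rl->cv, NULL);
	rl->head = NULL;
}

/*
 * A request conflicts with a granted range iff the ranges overlap and
 * at least one of the two sides is a writer.
 */
static bool
rl_conflicts(struct rl_lock *rl, off_t start, off_t end, bool write)
{
	for (struct rl_entry *e = rl->head; e != NULL; e = e->next)
		if (start < e->end && e->start < end && (write || e->write))
			return (true);
	return (false);
}

/* Sleep until the requested range is free of conflicts, then grant it. */
static struct rl_entry *
rl_acquire(struct rl_lock *rl, off_t start, off_t end, bool write)
{
	struct rl_entry *e = malloc(sizeof(*e));

	e->start = start;
	e->end = end;
	e->write = write;
	pthread_mutex_lock(&rl->mtx);
	while (rl_conflicts(rl, start, end, write))
		pthread_cond_wait(&rl->cv, &rl->mtx);
	e->next = rl->head;
	rl->head = e;
	pthread_mutex_unlock(&rl->mtx);
	return (e);
}

/* Drop a granted range and wake any waiters that may now proceed. */
static void
rl_release(struct rl_lock *rl, struct rl_entry *e)
{
	pthread_mutex_lock(&rl->mtx);
	for (struct rl_entry **p = &rl->head; *p != NULL; p = &(*p)->next)
		if (*p == e) {
			*p = e->next;
			break;
		}
	pthread_cond_broadcast(&rl->cv);
	pthread_mutex_unlock(&rl->mtx);
	free(e);
}
```

[The point of the merge proposal is that two writers to disjoint ranges of the same file would no longer serialize on the vnode lock; this sketch ignores fairness and starvation, which a real implementation has to address.]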
In 99% of cases it is good to make code more
> general, more universal and more reusable, but we can't ignore
> reality.
>
> There are reasons why file systems like XFS, ReiserFS and others were
> never fully ported. I'm not saying VFS complexity was the only
> reason, but I'm sure it was one of them.
>
> Our VFS is very UFS-centric. We make many assumptions that make sense
> only for UFS. I saw plenty of those while working on ZFS, like:
>
> - "Every file system needs a cache. Let's make it general, so that
>   all file systems can use it!" Well, for VFS each file system is a
>   separate entity, which is not the case for ZFS. ZFS can cache only
>   once a block that is used by one file system, 10 clones and 100
>   snapshots, all of which are separate mount points from the VFS
>   perspective. The same block would be cached 111 times by the buffer
>   cache.
>
> - "rmdir(2) on a mount point is a bad idea, let's deny it at the VFS
>   level." It is a bad idea, indeed, but in ZFS it is a nice way to
>   remove a snapshot, by rmdiring the .zfs/snapshot/ directory.
>
> - No one implemented rangelocking in VFS, so no file system can use
>   it, even if the given file system has all the code to do it.
>
> etc.
>
> I'm also sure it would be much easier for Jeff to make VFS MP-safe if
> it were less complex.
>
> Looking at the big picture, it would be nice to have all this general
> stuff like rangelocking, quota, the buffer cache, etc. as a kind of
> library for file systems to use, not something mandatory. If I
> develop a file system for FreeBSD only and don't want to reinvent the
> wheel, I can use those libraries. If I port a file system to FreeBSD,
> or develop one that doesn't really need those libraries, I'm not
> forced to use them.
>
> All this might make a good working-group subject at the BSDCan
> devsummit. We could cross swords there :)

Do you think you will be able to chair such a group?
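[Spelling out the 111-copies arithmetic from Pawel's buffer-cache example: a cache keyed per mount point stores one copy of the block for the file system, each of its 10 clones, and each of its 100 snapshots, while a cache keyed on the block's identity (as ZFS's ARC effectively is) stores it once. A toy sketch of the counting, with hypothetical names:]

```c
/* 1 file system + 10 clones + 100 snapshots, all sharing one block. */
#define NMOUNTS (1 + 10 + 100)

/* Cache keyed by (mount, block number): every mount caches its own copy. */
static int
per_mount_cache_entries(void)
{
	int entries = 0;

	for (int mount = 0; mount < NMOUNTS; mount++)
		entries++;	/* key (mount, blkno) is distinct per mount */
	return (entries);
}

/* Cache keyed by block identity: the shared block is cached only once. */
static int
shared_cache_entries(void)
{
	int entries = 0;
	int block_cached = 0;

	for (int mount = 0; mount < NMOUNTS; mount++)
		if (!block_cached) {
			entries++;	/* same key for every mount */
			block_cached = 1;
		}
	return (entries);
}
```

[Here per_mount_cache_entries() counts 111 copies and shared_cache_entries() counts one, which is the duplication the VFS-level buffer cache imposes in this scenario.]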
I'm not sure I will be able to make it to BSDCan, but it would be
valuable if you or someone else interested could get the ball rolling
on these topics.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein