From owner-freebsd-arch@FreeBSD.ORG Sat Feb 25 17:45:02 2012
Date: Sat, 25 Feb 2012 18:45:00 +0100
From: Attilio Rao
To: Pawel Jakub Dawidek
Cc: Konstantin Belousov, arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
In-Reply-To: <20120225151334.GH1344@garage.freebsd.pl>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
 <20120225151334.GH1344@garage.freebsd.pl>
List-Id: Discussion related to FreeBSD architecture
X-BeenThere: freebsd-arch@freebsd.org

On 25 February 2012 at 16:13, Pawel Jakub Dawidek wrote:
> On Sat, Feb 25, 2012 at
> 01:01:32PM +0000, Attilio Rao wrote:
>> On 3 February 2012 at 19:37, Konstantin Belousov wrote:
>> > The FreeBSD I/O infrastructure has a well-known deadlock caused
>> > by vnode lock order reversal when the buffers supplied to read(2) or
>> > write(2) syscalls are backed by an mmapped file.
>> >
>> > I previously published patches to convert the i/o path to use VMIO,
>> > based on the Jeff Roberson proposal, see
>> > http://wiki.freebsd.org/VM6. As a side effect, VM6 fixed the
>> > deadlock. Since that work is very intrusive and did not get any
>> > follow-up, it stalled.
>> >
>> > Below is a very lightweight patch whose only goal is to fix the
>> > deadlock in the least intrusive way. This became possible after
>> > FreeBSD got the vm_fault_quick_hold_pages(9) and
>> > vm_fault_disable_pagefaults(9) KPIs.
>> > http://people.freebsd.org/~kib/misc/vm1.3.patch
>>
>> Hi,
>> I was reviewing:
>> http://people.freebsd.org/~kib/misc/vm1.11.patch
>>
>> and I think it is great. It is simple enough and I don't have further
>> comments on it.
>>
>> However, as a side note, I was wondering whether we could one day get
>> to the point of integrating rangelocks directly into the vnode
>> lockmgr. It would be a huge patch, likely rewriting the locking of
>> several vnode members, but I think it would be worth it in terms of
>> cleanliness of the interface and lower overhead. Also, it would be
>> interesting to consider merging the rangelock implementation with
>> ZFS's one, at some point.
>
> My personal opinion about rangelocks and many other VFS features we
> currently have is that they are a good idea in theory, but in practice
> they tend to overcomplicate the VFS.
>
> I am of the opinion that we should move as much as we can into the
> individual file systems. We try to implement everything in the VFS
> itself in the hope that this will simplify the file systems we have.
> It then turns out that only one file system is really using this stuff
> (most of the time it is UFS), and this is a PITA for all the other
> file systems, as well as for maintaining the VFS. The VFS has become
> so complicated over the years that there are maybe only a few people
> who can understand it, and every single change to it carries a huge
> risk of breaking some unrelated part.

I think this is questionable, for the following reasons:

- If the problem is filesystem writers having trouble understanding the
necessary locking, we should really provide cleaner and more complete
documentation. One could say the same about our VM subsystem, but at
least there, there are plenty of comments that help in understanding
how to deal with vm_object and vm_page locking during their lifetimes.

- Our primitives may be more complicated than the
'all-in-the-filesystem' approach, but at least they offer a complete
and centralized view of the resources allocated in the whole system,
and they allow building better policies about how to manage them. One
problem I do see is that those policies are not fully implemented, not
well tuned, or simply outdated, which takes away one of the biggest
benefits of making vnodes so generic.

About the things I mentioned myself:

- As long as the same path now has both range-locking and vnode
locking, I don't think keeping them separate forever is a good idea.
Merging them seems to me an important evolution: it would not only
shrink the number of primitives, but also introduce less overhead and
likely revamped scalability for vnodes (though I think this needs deep
investigation).

- About the ZFS rangelocks being absorbed by the VFS ones: I think this
is a minor point, but still, if you think it can be done efficiently
and without losing performance, I don't see why not to do it.
You already wrote the rangelocks for ZFS, so you have earned a lot of
experience in this area and can comment on the fallout, etc., but I
don't see a good reason not to do it, unless it is just too difficult.
This is not about generalizing a new mechanism; it is about using a
general mechanism in a specific implementation, if possible.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein