From owner-freebsd-fs@FreeBSD.ORG Sun Apr 27 07:26:56 2008 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1C7011065670 for ; Sun, 27 Apr 2008 07:26:56 +0000 (UTC) (envelope-from randy@psg.com) Received: from rip.psg.com (rip.psg.com [IPv6:2001:418:1::39]) by mx1.freebsd.org (Postfix) with ESMTP id 07AEB8FC1C for ; Sun, 27 Apr 2008 07:26:56 +0000 (UTC) (envelope-from randy@psg.com) Received: from 50.216.138.210.bn.2iij.net ([210.138.216.50] helo=rmac.psg.com) by rip.psg.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Jq1Hj-0006zX-DA for freebsd-fs@FreeBSD.ORG; Sun, 27 Apr 2008 07:26:55 +0000 Message-ID: <48142ABE.4050107@psg.com> Date: Sun, 27 Apr 2008 16:26:54 +0900 From: Randy Bush User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: freebsd-fs@FreeBSD.ORG X-Enigmail-Version: 0.95.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Subject: zfs and vfs.zfs.prefetch_disable="1" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 27 Apr 2008 07:26:56 -0000 i have in my incantations for zfs on i386 to stick the following in /boot/loader.conf.local vm.kmem_size=600M vm.kmem_size_max=600M zfs_load=YES vfs.zfs.prefetch_disable=1 but i have no idea where that last one crept in. any clues? 
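[Editorial note: for readers puzzling over the "600M" shorthand above — loader.conf tunables accept K/M/G suffixes, conventionally binary (2^20 for M). A small sketch of the equivalent arithmetic; `to_bytes` is a made-up helper, not a FreeBSD tool:]

```shell
# Hypothetical helper: expand the K/M/G suffix shorthand used in
# loader.conf tunables such as vm.kmem_size=600M into plain bytes
# (assuming binary suffixes, i.e. M = 2^20).
to_bytes() {
    case "$1" in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

to_bytes 600M   # 629145600 bytes
```

At runtime the effective values can be read back with `sysctl vm.kmem_size vfs.zfs.prefetch_disable`.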
randy From owner-freebsd-fs@FreeBSD.ORG Sun Apr 27 12:01:42 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7DE5E1065679 for ; Sun, 27 Apr 2008 12:01:42 +0000 (UTC) (envelope-from engywook@gmail.com) Received: from wf-out-1314.google.com (wf-out-1314.google.com [209.85.200.172]) by mx1.freebsd.org (Postfix) with ESMTP id 52C6B8FC23 for ; Sun, 27 Apr 2008 12:01:42 +0000 (UTC) (envelope-from engywook@gmail.com) Received: by wf-out-1314.google.com with SMTP id 25so3522584wfa.7 for ; Sun, 27 Apr 2008 05:01:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=8Ap+hTVtDT3nzp3NnAKPupTCwz3uORQHQ30OXVhyyD4=; b=SGtoF4Fo8R8k9CWLK5+muP63u2ETEUEz3iKQ1u6oWcpoHMGJyovlDRr+dryg79/COuRm9ZcjdQUcdYvK5M3BU9kjj9/P71kGVOib8WvlR9pOuWiVx9l9IXyCn81q12qqueXVU3X7gUBPejiL847SJSvPZCrYTP6bwLUL+Oyz0lc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=o3acPjaBc/RpZ9yD84qtt+27eyYeRcRehtGdBsa+FgXiP5NUquPdC46GuY6MdzwPZNNlwpHc368lMhqHOHP7ZA1cBUKxBj14XKx0eYMS3DwGBeYKQzoqaE1uzC1GVgDhvNm0L35aoyYxmQ+XPNjedNETF5TpGpvGONkuPkZW4Hw= Received: by 10.142.104.9 with SMTP id b9mr1570188wfc.48.1209297702071; Sun, 27 Apr 2008 05:01:42 -0700 (PDT) Received: by 10.143.3.10 with HTTP; Sun, 27 Apr 2008 05:01:42 -0700 (PDT) Message-ID: <24adbbc00804270501t48b9a1c5le2f1d0bce18572cf@mail.gmail.com> Date: Sun, 27 Apr 2008 14:01:42 +0200 From: "Daniel Andersson" To: yalur@mail.ru In-Reply-To: <200804162212.32560.yalur@mail.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline 
References: <24adbbc00804151529m2a74085ds468eaac55ba94a32@mail.gmail.com> <200804162212.32560.yalur@mail.ru> Cc: freebsd-fs@freebsd.org Subject: Re: Choppy performance. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 27 Apr 2008 12:01:42 -0000 >How do you calculate totall memory use in top? Real memory use is present >in "RES" column but not in "SIZE" column. > >################################ >PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND >1215 engy 1 99 0 2085M 139M zfs:(& 49:32 >19.53% rtorrent >################################ Well, I checked memory usage in rtorrent and it said it was higher than 2 GB (the total physical RAM), then used top to read the swap line: Swap: 1024M Total, 39M Used, 985M Free, 3% Inuse But if it isn't really using that much memory, how come I get memory allocation errors in rtorrent if there's more memory available? 
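[Editorial note: the SIZE-vs-RES distinction in that top output — virtual address space reserved vs. pages actually resident — can be demonstrated directly. A minimal sketch; it reads Linux's /proc for the two counters purely for illustration (FreeBSD's top(1) reports the same two quantities as SIZE and RES):]

```python
import mmap
import os

def vsize_and_rss():
    """Virtual size and resident set of this process, in bytes (Linux /proc)."""
    with open("/proc/self/statm") as f:
        vpages, rpages = [int(x) for x in f.read().split()[:2]]
    page = os.sysconf("SC_PAGE_SIZE")
    return vpages * page, rpages * page

v0, r0 = vsize_and_rss()

# Reserve 256 MiB of anonymous memory without ever touching it:
# virtual size (SIZE) grows by the full amount, resident (RES) barely
# moves -- which is how a process can show 2085M SIZE but 139M RES.
region = mmap.mmap(-1, 256 * 1024 * 1024)

v1, r1 = vsize_and_rss()
size_grew = (v1 - v0) >= 200 * 1024 * 1024      # SIZE jumped by ~256 MiB
res_grew_little = (r1 - r0) < 16 * 1024 * 1024  # RES stayed small
```

Allocation failures can therefore occur from address-space exhaustion (or per-process limits) even while plenty of physical memory and swap remain free.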
Cheers, Daniel From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 05:35:33 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F0EF1065673; Mon, 28 Apr 2008 05:35:33 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.delphij.net (unknown [IPv6:2001:470:1f03:2c9::2]) by mx1.freebsd.org (Postfix) with ESMTP id 1468C8FC13; Mon, 28 Apr 2008 05:35:32 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.geekcn.org (tarsier.geekcn.org [202.108.54.204]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by tarsier.delphij.net (Postfix) with ESMTPS id 0372528449; Mon, 28 Apr 2008 13:35:28 +0800 (CST) Received: from localhost (tarsier.geekcn.org [202.108.54.204]) by tarsier.geekcn.org (Postfix) with ESMTP id B5B3BEB73F9; Mon, 28 Apr 2008 13:35:27 +0800 (CST) X-Virus-Scanned: amavisd-new at geekcn.org Received: from tarsier.geekcn.org ([202.108.54.204]) by localhost (mail.geekcn.org [202.108.54.204]) (amavisd-new, port 10024) with ESMTP id PG8oHbNk0H0K; Mon, 28 Apr 2008 13:35:18 +0800 (CST) Received: from charlie.delphij.net (c-69-181-135-56.hsd1.ca.comcast.net [69.181.135.56]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tarsier.geekcn.org (Postfix) with ESMTPSA id DE9CDEB73D7; Mon, 28 Apr 2008 13:35:15 +0800 (CST) DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns; h=message-id:date:from:reply-to:organization:user-agent: mime-version:to:cc:subject:x-enigmail-version:openpgp:content-type:content-transfer-encoding; b=Qwyxu08d6yEWoAuQm02dR7Zf9mTnFhKEq71MvXb36bNJqaI0OduipHwIyII+zix42 FoIiovsahlh1KJBpHAWTg== Message-ID: <4815620F.3090005@delphij.net> Date: Sun, 27 Apr 2008 22:35:11 -0700 From: Xin LI Organization: The FreeBSD Project User-Agent: Thunderbird 2.0.0.12 (X11/20080422) MIME-Version: 1.0 To: 
freebsd-fs@freebsd.org X-Enigmail-Version: 0.95.6 OpenPGP: id=18EDEBA0; url=http://www.delphij.net/delphij.asc Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Jeff Roberson , kib@FreeBSD.org Subject: [7.0-R] Possible ufs livelock during coredump path? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 05:35:33 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, It seems that we have a potential livelock during coredump on 7.0-R; the case was two processes trying to dump core at the same time (e.g. if I configure kern.corefile=/var/tmp/%N.core and a lot of instances dump core at the same time), perhaps when paging is involved. It would not recover but wait infinitely until a reboot. The box is running 7.0-R/i386, UP (Origin = "GenuineIntel" Id = 0xf34 Stepping = 4). Is this a known issue? This is my own server, but I do not have my hands on it because it is in China; however, I can provide some help if the experiment can be recovered with a power-cycle :) Cheers, - -- Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve! 
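[Editorial note: to make the collision scenario concrete — with a %N-only pattern, every instance of the same program expands to the same core path, so many simultaneous crashes all contend on one file. A toy model of the expansion; `expand_corefile` is a made-up helper, not the kernel's routine, though the %N (process name) and %P (pid) tokens follow core(5):]

```python
def expand_corefile(pattern: str, progname: str, pid: int) -> str:
    """Toy model of kern.corefile expansion: %N -> process name, %P -> pid."""
    return pattern.replace("%N", progname).replace("%P", str(pid))

# Three crashing instances of the same daemon all map to ONE path:
paths = {expand_corefile("/var/tmp/%N.core", "httpd", pid)
         for pid in (27221, 27222, 27223)}
# paths == {"/var/tmp/httpd.core"}

# A %P-qualified pattern gives each process its own dump file instead:
unique = {expand_corefile("/var/tmp/%N.%P.core", "httpd", pid)
          for pid in (27221, 27222, 27223)}
# len(unique) == 3
```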
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkgVYg8ACgkQi+vbBBjt66DTMQCfXQ4q327phAzDeEmUhtgUoJxS Ap8AniSdbCY0HN9m5wf9nAbKyLFifUQg =V94G -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 07:19:32 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2865F1065670; Mon, 28 Apr 2008 07:19:32 +0000 (UTC) (envelope-from peter@holm.cc) Received: from wbm3.pair.net (wbm3.pair.net [209.68.3.66]) by mx1.freebsd.org (Postfix) with ESMTP id 033F08FC12; Mon, 28 Apr 2008 07:19:31 +0000 (UTC) (envelope-from peter@holm.cc) Received: by wbm3.pair.net (Postfix, from userid 65534) id 1B9C76B178; Mon, 28 Apr 2008 03:00:28 -0400 (EDT) Received: from 193.234.247.50 ([193.234.247.50]) (SquirrelMail authenticated user holm@aedde.pair.com) by webmail3.pair.com with HTTP; Mon, 28 Apr 2008 09:00:28 +0200 (CEST) Message-ID: <64011.193.234.247.50.1209366028.squirrel@webmail3.pair.com> In-Reply-To: <4815620F.3090005@delphij.net> References: <4815620F.3090005@delphij.net> Date: Mon, 28 Apr 2008 09:00:28 +0200 (CEST) From: "Peter Holm" To: d@delphij.net User-Agent: SquirrelMail/1.4.5 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) Importance: Normal Cc: freebsd-fs@freebsd.org, Jeff Roberson , kib@freebsd.org Subject: Re: [7.0-R] Possible ufs livelock during coredump path? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 07:19:32 -0000 > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi, > > It seems that we have a potential livelock during coredump on 7.0-R, the > case was that two processes trying to coredump in the same time (e.g. 
if > I configure kern.corefile=/var/tmp/%N.core and a lot of instances > coredump in the same time), perhaps when paging involved with it. Upon > reboot, it would not recover but wait infinitely. The box is running > 7.0-R/i386, UP (Origin = "GenuineIntel" Id = 0xf34 Stepping = 4). > > Is this an known issue? This is my own server but I do not have my > hands on it because it is in China, however I can provide some help if > the experiment can be recovered with a power-cycle :) > AFAIK it is an old problem. I have some test where I had to disable core dumps for the same reason. I seem to remember that the problem is related to running out of VM? - Peter > Cheers, > - -- > Xin LI http://www.delphij.net/ > FreeBSD - The Power to Serve! > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.9 (FreeBSD) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEYEARECAAYFAkgVYg8ACgkQi+vbBBjt66DTMQCfXQ4q327phAzDeEmUhtgUoJxS > Ap8AniSdbCY0HN9m5wf9nAbKyLFifUQg > =V94G > -----END PGP SIGNATURE----- > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 07:46:44 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3F183106564A; Mon, 28 Apr 2008 07:46:44 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.delphij.net (unknown [IPv6:2001:470:1f03:2c9::2]) by mx1.freebsd.org (Postfix) with ESMTP id B33D18FC22; Mon, 28 Apr 2008 07:46:42 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from tarsier.geekcn.org (tarsier.geekcn.org [202.108.54.204]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by tarsier.delphij.net (Postfix) with ESMTPS id 9012D28449; Mon, 28 Apr 2008 
15:46:41 +0800 (CST) Received: from localhost (tarsier.geekcn.org [202.108.54.204]) by tarsier.geekcn.org (Postfix) with ESMTP id A2513EB77EF; Mon, 28 Apr 2008 15:46:39 +0800 (CST) X-Virus-Scanned: amavisd-new at geekcn.org Received: from tarsier.geekcn.org ([202.108.54.204]) by localhost (mail.geekcn.org [202.108.54.204]) (amavisd-new, port 10024) with ESMTP id YU4odJ+QeeBY; Mon, 28 Apr 2008 15:46:29 +0800 (CST) Received: from charlie.delphij.net (c-69-181-135-56.hsd1.ca.comcast.net [69.181.135.56]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tarsier.geekcn.org (Postfix) with ESMTPSA id B7C4CEB77DA; Mon, 28 Apr 2008 15:46:23 +0800 (CST) DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns; h=message-id:date:from:reply-to:organization:user-agent: mime-version:to:cc:subject:references:in-reply-to: x-enigmail-version:openpgp:content-type:content-transfer-encoding; b=jrdo0Zv6AtxMOfxXvznaj++Pcz3I8H3uUiOlClT4A2ysl2AoPcPtm0WOimZ3xUKAy PB2ThUdZW78s4oWa0f9bg== Message-ID: <481580CB.1000800@delphij.net> Date: Mon, 28 Apr 2008 00:46:19 -0700 From: Xin LI Organization: The FreeBSD Project User-Agent: Thunderbird 2.0.0.12 (X11/20080422) MIME-Version: 1.0 To: Peter Holm References: <4815620F.3090005@delphij.net> <64011.193.234.247.50.1209366028.squirrel@webmail3.pair.com> In-Reply-To: <64011.193.234.247.50.1209366028.squirrel@webmail3.pair.com> X-Enigmail-Version: 0.95.6 OpenPGP: id=18EDEBA0; url=http://www.delphij.net/delphij.asc Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, kib@freebsd.org, Jeff Roberson , d@delphij.net Subject: Re: [7.0-R] Possible ufs livelock during coredump path? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: d@delphij.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 07:46:44 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Peter Holm wrote: | Hi, | | It seems that we have a potential livelock during coredump on 7.0-R, the | case was that two processes trying to coredump in the same time (e.g. if | I configure kern.corefile=/var/tmp/%N.core and a lot of instances | coredump in the same time), perhaps when paging involved with it. Upon | reboot, it would not recover but wait infinitely. The box is running | 7.0-R/i386, UP (Origin = "GenuineIntel" Id = 0xf34 Stepping = 4). | | Is this an known issue? This is my own server but I do not have my | hands on it because it is in China, however I can provide some help if | the experiment can be recovered with a power-cycle :) | | |> AFAIK it is an old problem. I have some test where I had to disable core |> dumps for the same reason. I seem to remember that the problem is related |> to running out of VM? In my case it does not seem to be a matter of running out of VM (at least the system did not print out any messages: the log has a lot of "kernel: pid 27223 (httpd), uid 80: exited on signal 11 (core dumped)" but not the out-of-swap one). So, presumably we can reliably trigger this situation (or at least yours :)? Cheers, - -- Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve! 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.9 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkgVgMsACgkQi+vbBBjt66BnOwCeJLB5xoE27b3CN/x/VIL+0EAI +c8AoJyYiqCi7tBeqZBx6cj/+gzBLmFn =qZmb -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 08:10:31 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 69470106566B for ; Mon, 28 Apr 2008 08:10:31 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id 414298FC1E for ; Mon, 28 Apr 2008 08:10:31 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTP id 12A99125438; Mon, 28 Apr 2008 17:10:30 +0900 (JST) Message-ID: <48158675.1060809@freebsd.org> Date: Mon, 28 Apr 2008 17:10:29 +0900 From: Daichi GOTO User-Agent: Thunderbird 2.0.0.12 (X11/20080423) MIME-Version: 1.0 To: Kostik Belousov References: <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> In-Reply-To: <20080426100116.GL18958@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 08:10:31 -0000 Kostik Belousov wrote: > On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: >> Hi Konstantin :) >> >> To fix a unionfs issue of http://www.freebsd.org/cgi/query-pr.cgi?pr=109377, >> we need to add new functions >> >> void vkernrele(struct vnode *vp); >> void 
vkernref(struct vnode *vp); >> >> and one value >> >> int v_kernusecount; /* i ref count of kernel */ >> >> to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. >> >> Unionfs will be panic when lower fs layer is forced umounted by >> "umount -f". So to avoid this issue, we've added >> "v_kernusecount" value that means "a vnode count that kernel are >> using". vkernrele() and vkernref() are functions that manage >> "v_kernusecount" value. >> >> Please check those and give us an approve or some comments! > > There is already the vnode reference count. From your description, I cannot > understand how the kernusecount would prevent the panic when forced unmount > is performed. Could you, please, show the actual code ? PR mentioned > does not contain any patch. Oops, sorry. patch is follow: http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-3.diff > The problem you described is common for the kernel code, and right way > to handle it, for now, is to keep refcount _and_ check for the forced > reclaim. 
-- Daichi GOTO, http://people.freebsd.org/~daichi From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 08:29:45 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 58BD7106567D for ; Mon, 28 Apr 2008 08:29:45 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 1793F8FC13 for ; Mon, 28 Apr 2008 08:29:44 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1JqOk3-00020z-NQ for freebsd-fs@freebsd.org; Mon, 28 Apr 2008 08:29:43 +0000 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 28 Apr 2008 08:29:43 +0000 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 28 Apr 2008 08:29:43 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Mon, 28 Apr 2008 10:29:23 +0200 Lines: 30 Message-ID: References: <48142ABE.4050107@psg.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig54E08CDECC35E8AB718D4E48" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Thunderbird 2.0.0.12 (X11/20080227) In-Reply-To: <48142ABE.4050107@psg.com> X-Enigmail-Version: 0.95.0 Sender: news Subject: Re: zfs and vfs.zfs.prefetch_disable="1" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 08:29:45 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig54E08CDECC35E8AB718D4E48 Randy Bush wrote: > i have in my incantations for zfs on i386 to stick 
the following in > /boot/loader.conf.local > > vm.kmem_size=600M > vm.kmem_size_max=600M > zfs_load=YES > vfs.zfs.prefetch_disable=1 Cannot say for sure but AFAIK it was mentioned during the Great ZFS Flamewars as a possible way to reduce memory usage by ZFS, and also as a possible way of avoiding some deadlocks (possibly by Pawel). --------------enig54E08CDECC35E8AB718D4E48 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFIFYrwldnAQVacBcgRAlpAAJ9ZaJRh9JzqoOxyM0tNoGQepimwhgCgjzy4 /7yuWLYVYnAUezVaMhEXzfs= =lUQJ -----END PGP SIGNATURE----- --------------enig54E08CDECC35E8AB718D4E48-- From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 08:31:33 2008 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ECC69106566B; Mon, 28 Apr 2008 08:31:33 +0000 (UTC) (envelope-from jdc@parodius.com) Received: from mx01.sc1.parodius.com (mx01.sc1.parodius.com [72.20.106.3]) by mx1.freebsd.org (Postfix) with ESMTP id CA7EE8FC1B; Mon, 28 Apr 2008 08:31:33 +0000 (UTC) (envelope-from jdc@parodius.com) Received: by mx01.sc1.parodius.com (Postfix, from userid 1000) id B94AB1CC033; Mon, 28 Apr 2008 01:31:33 -0700 (PDT) Date: Mon, 28 Apr 2008 01:31:33 -0700 From: Jeremy Chadwick To: Randy Bush Message-ID: <20080428083133.GA81628@eos.sc1.parodius.com> References: <48142ABE.4050107@psg.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <48142ABE.4050107@psg.com> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: freebsd-fs@FreeBSD.ORG, ivoras@freebsd.org Subject: Re: zfs and vfs.zfs.prefetch_disable="1" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 08:31:34 -0000 On Sun, Apr 27, 2008 at 04:26:54PM +0900, Randy Bush wrote: > i have in my incantations for zfs on i386 to stick the following in > /boot/loader.conf.local > > vm.kmem_size=600M > vm.kmem_size_max=600M > zfs_load=YES > vfs.zfs.prefetch_disable=1 > > but i have no idea where that last one crept in. any clues? It probably came from the old version of the "ZFS Tuning Guide" section of the ZFS on FreeBSD Wiki. It was removed on August 30th 2007 by Ivan Voras. http://wiki.freebsd.org/ZFSTuningGuide?action=diff&rev2=12&rev1=11 http://wiki.freebsd.org/ZFSTuningGuide?action=recall&rev=12 http://wiki.freebsd.org/ZFSTuningGuide?action=recall&rev=11 I've CC'd him here. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 08:33:27 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4EE16106567A; Mon, 28 Apr 2008 08:33:27 +0000 (UTC) (envelope-from peter@holm.cc) Received: from wbm3.pair.net (wbm3.pair.net [209.68.3.66]) by mx1.freebsd.org (Postfix) with ESMTP id 299698FC30; Mon, 28 Apr 2008 08:33:26 +0000 (UTC) (envelope-from peter@holm.cc) Received: by wbm3.pair.net (Postfix, from userid 65534) id 007F86B179; Mon, 28 Apr 2008 04:33:23 -0400 (EDT) Received: from 193.234.247.50 ([193.234.247.50]) (SquirrelMail authenticated user holm@aedde.pair.com) by webmail3.pair.com with HTTP; Mon, 28 Apr 2008 10:33:23 +0200 (CEST) Message-ID: <35682.193.234.247.50.1209371603.squirrel@webmail3.pair.com> In-Reply-To: <481580CB.1000800@delphij.net> References: <4815620F.3090005@delphij.net> <64011.193.234.247.50.1209366028.squirrel@webmail3.pair.com> <481580CB.1000800@delphij.net> Date: Mon, 28 Apr 
2008 10:33:23 +0200 (CEST) From: "Peter Holm" To: d@delphij.net User-Agent: SquirrelMail/1.4.5 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) Importance: Normal Cc: freebsd-fs@freebsd.org, Jeff Roberson , d@delphij.net, kib@freebsd.org Subject: Re: [7.0-R] Possible ufs livelock during coredump path? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 08:33:27 -0000 > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Peter Holm wrote: > | Hi, > | > | It seems that we have a potential livelock during coredump on 7.0-R, the > | case was that two processes trying to coredump in the same time (e.g. if > | I configure kern.corefile=/var/tmp/%N.core and a lot of instances > | coredump in the same time), perhaps when paging involved with it. Upon > | reboot, it would not recover but wait infinitely. The box is running > | 7.0-R/i386, UP (Origin = "GenuineIntel" Id = 0xf34 Stepping = 4). > | > | Is this an known issue? This is my own server but I do not have my > | hands on it because it is in China, however I can provide some help if > | the experiment can be recovered with a power-cycle :) > | > | > |> AFAIK it is an old problem. I have some test where I had to disable > core > |> dumps for the same reason. I seem to remember that the problem is > related > |> to running out of VM? > > For my case it does not seem to be ran out of VM (at least the system > did not printed out any messages, the log has a lot of kernel: pid 27223 > (httpd), uid 80: exited on signal 11 (core dumped) but not the out of > swap one. > Nor did I, as I remember. > So, presumably we can reliably trigger this situation (or at least your > ones :)? > It's a long time since I looked at this problem, but this would seem to be a good excuse to look at it again. 
- Peter > Cheers, > - -- > Xin LI http://www.delphij.net/ > FreeBSD - The Power to Serve! > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.9 (FreeBSD) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEYEARECAAYFAkgVgMsACgkQi+vbBBjt66BnOwCeJLB5xoE27b3CN/x/VIL+0EAI > +c8AoJyYiqCi7tBeqZBx6cj/+gzBLmFn > =qZmb > -----END PGP SIGNATURE----- > From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 09:12:00 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5EEB21065670 for ; Mon, 28 Apr 2008 09:12:00 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.231]) by mx1.freebsd.org (Postfix) with ESMTP id 3764C8FC0C for ; Mon, 28 Apr 2008 09:11:59 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: by rv-out-0506.google.com with SMTP id b25so3144944rvf.43 for ; Mon, 28 Apr 2008 02:11:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; bh=2yIAgDssI4Sm+vBTRgfcYPI1CSMB6pV2G7M0ZbblSrs=; b=qAclsUdhRp1g/vdodbFisUOBIduUfmybPPczxjKknnkKV+x0E2BSxEG0zmfKwm3+sSvlifZRqXaqzQ8RsmdMZb+jKxD3Deec/NxXV+IwEfgosvY0QVcJTq2P/p+qOmIpxo374fVsZOo0LmS8YtG9Sc0L7POiPhKaHnzUXYu90f0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=CBOI8KnqmiYhnzLYde9kHA8ge5zSEMJLmezJcvAWMO0SiDUUANHQ63hOJRAmuwdY66MiU0C63xh8qz9vzjXKSSMz8FBsO3aE57f3dee1dUNUcIK5VKaP91T2+OguJuD/JFvOLftsfkiqKF4zTr5rB30H9uu6BAxpUFam0VDZk3k= Received: by 10.141.37.8 with SMTP id p8mr3133220rvj.53.1209372311759; Mon, 28 Apr 2008 01:45:11 
-0700 (PDT) Received: by 10.141.212.1 with HTTP; Mon, 28 Apr 2008 01:45:11 -0700 (PDT) Message-ID: <9bbcef730804280145x6961c43ekab916ec289396361@mail.gmail.com> Date: Mon, 28 Apr 2008 10:45:11 +0200 From: "Ivan Voras" Sender: ivoras@gmail.com To: "Jeremy Chadwick" In-Reply-To: <20080428083133.GA81628@eos.sc1.parodius.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <48142ABE.4050107@psg.com> <20080428083133.GA81628@eos.sc1.parodius.com> X-Google-Sender-Auth: f32a2f258c169cce Cc: Randy Bush , freebsd-fs@freebsd.org Subject: Re: zfs and vfs.zfs.prefetch_disable="1" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 09:12:00 -0000 2008/4/28 Jeremy Chadwick : > On Sun, Apr 27, 2008 at 04:26:54PM +0900, Randy Bush wrote: > > i have in my incantations for zfs on i386 to stick the following in > > /boot/loader.conf.local > > > > vm.kmem_size=600M > > vm.kmem_size_max=600M > > zfs_load=YES > > vfs.zfs.prefetch_disable=1 > > > > but i have no idea where that last one crept in. any clues? > > It probably came from the old version of the "ZFS Tuning Guide" section > of the ZFS on FreeBSD Wiki. It was removed on August 30th 2007 by Ivan > Voras. > > http://wiki.freebsd.org/ZFSTuningGuide?action=diff&rev2=12&rev1=11 > http://wiki.freebsd.org/ZFSTuningGuide?action=recall&rev=12 > http://wiki.freebsd.org/ZFSTuningGuide?action=recall&rev=11 > > I've CC'd him here. The change you reference is apparently about zil_disable, which wasn't removed, just moved. But the prefetch_disable setting was added and removed a couple of times to the page, the latest being that it was added since Pawel uses it in his post. 
From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 09:37:10 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C452106566C for ; Mon, 28 Apr 2008 09:37:10 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id 75C3B8FC0A for ; Mon, 28 Apr 2008 09:37:10 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTP id C3195125438; Mon, 28 Apr 2008 18:37:09 +0900 (JST) Message-ID: <48159AC5.3030000@freebsd.org> Date: Mon, 28 Apr 2008 18:37:09 +0900 From: Daichi GOTO User-Agent: Thunderbird 2.0.0.12 (X11/20080423) MIME-Version: 1.0 To: Kostik Belousov References: <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> In-Reply-To: <20080426100116.GL18958@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 09:37:10 -0000 Kostik Belousov wrote: > On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: >> Hi Konstantin :) >> >> To fix a unionfs issue of http://www.freebsd.org/cgi/query-pr.cgi?pr=109377, >> we need to add new functions >> >> void vkernrele(struct vnode *vp); >> void vkernref(struct vnode *vp); >> >> and one value >> >> int v_kernusecount; /* i ref count of kernel */ >> >> to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. >> >> Unionfs will be panic when lower fs layer is forced umounted by >> "umount -f". 
So to avoid this issue, we've added >> "v_kernusecount" value that means "a vnode count that kernel are >> using". vkernrele() and vkernref() are functions that manage >> "v_kernusecount" value. >> >> Please check those and give us an approve or some comments! > > There is already the vnode reference count. From your description, I cannot > understand how the kernusecount would prevent the panic when forced unmount > is performed. Could you, please, show the actual code ? PR mentioned > does not contain any patch. Our patch avoids the kernel panic by making the "umount -f" operation fail with EBUSY while the vnode is in kernel use. In the current implementation (without our patch), "umount -f" tries to release a vnode regardless of its reference count value; because of that, unionfs and nullfs access an invalid vnode, which leads to a kernel panic. To prevent this, we need some kind of mechanism to refuse the unmount in the invalid case (e.g. a filesystem inside a unionfs stack must be unmounted in the correct order), and we think the current vnode reference count is not enough to realize that. If you have any ideas to achieve the same with the current vnode reference count, would you please tell us your idea :) > The problem you described is common for the kernel code, and right way > to handle it, for now, is to keep refcount _and_ check for the forced > reclaim. 
-- Daichi GOTO, http://people.freebsd.org/~daichi From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 11:06:56 2008 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4D2D210656AA for ; Mon, 28 Apr 2008 11:06:56 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 42BC98FC32 for ; Mon, 28 Apr 2008 11:06:56 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m3SB6u6b056105 for ; Mon, 28 Apr 2008 11:06:56 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m3SB6tmj056101 for freebsd-fs@FreeBSD.org; Mon, 28 Apr 2008 11:06:55 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 28 Apr 2008 11:06:55 GMT Message-Id: <200804281106.m3SB6tmj056101@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 11:06:56 -0000 Current FreeBSD problem reports Critical problems Serious problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o kern/116170 fs [panic] Kernel panic when mounting /tmp o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o bin/122172 fs [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE 5 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o bin/118249 fs mv(1): moving a directory changes its mtime 6 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 13:24:25 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CA7971065677; Mon, 28 Apr 2008 13:24:25 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from anti-4.kiev.sovam.com (anti-4.kiev.sovam.com [62.64.120.202]) by mx1.freebsd.org (Postfix) with ESMTP id 625968FC1C; Mon, 28 Apr 2008 13:24:25 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=skuns.kiev.zoral.com.ua) by anti-4.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1JqTLD-0003Wq-8T; Mon, 28 Apr 2008 16:24:23 +0300 Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by skuns.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3SDOJZh040737 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 28 Apr 2008 16:24:19 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3SDODBx052241; Mon, 28 Apr 2008 16:24:13 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id m3SDODTT052240; Mon, 28 Apr 2008 16:24:13 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 28 Apr 2008 16:24:13 +0300 From: Kostik Belousov To: Daichi GOTO Message-ID: <20080428132413.GS18958@deviant.kiev.zoral.com.ua> References: <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> <48159AC5.3030000@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="A61Eau4L8twGtri1" Content-Disposition: inline In-Reply-To: <48159AC5.3030000@freebsd.org> 
User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: ClamAV version 0.91.2, clamav-milter version 0.91.2 on skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.4 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on skuns.kiev.zoral.com.ua X-Scanner-Signature: e7d6a74e6a41bfc4865bf8c76b21c35c X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 2733 [Apr 28 2008] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 13:24:25 -0000 --A61Eau4L8twGtri1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote: > Kostik Belousov wrote: > >On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: > >>Hi Konstantin :) > >> > >>To fix a unionfs issue of > >>http://www.freebsd.org/cgi/query-pr.cgi?pr=109377, > >>we need to add new functions > >> > >> void vkernrele(struct vnode *vp); > >> void vkernref(struct vnode *vp); > >> > >>and one value > >> > >> int v_kernusecount; /* i ref count of kernel */ > >> > >>to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. > >> > >>Unionfs will be panic when lower fs layer is forced umounted by > >>"umount -f".
So to avoid this issue, we've added > >>"v_kernusecount" value that means "a vnode count that kernel are > >>using". vkernrele() and vkernref() are functions that manage > >>"v_kernusecount" value. > >> > >>Please check those and give us an approve or some comments! > > > >There is already the vnode reference count. From your description, I cannot > >understand how the kernusecount would prevent the panic when forced unmount > >is performed. Could you, please, show the actual code ? PR mentioned > >does not contain any patch. > > Our patch realizes avoiding kernel panic by "umount -f" operation using with > EBUSY process. > > On current implementation (not applied our patch), "umount -f" tries to > release vnode at any vnode reference count value. Since that, unionfs > and nullfs access invalid vnode and lead kernel panic. To prevent this > issue, we need a some kind of not-umount-accept-mechanism in invalid case > (e.x. fs in unionfsed stack, it must be umounted in correct order) and > to realize that, current vnode reference count is not enough we are > thinking. > > If you have any ideas to realize the same solution with current vnode > reference, would you please tell us your idea :) > > >The problem you described is common for the kernel code, and right way > >to handle it, for now, is to keep refcount _and_ check for the forced > >reclaim. Your patch in essence disables the forced unmount. I would object against such decision. Even if taking this direction, I believe more cleaner solution would be to introduce a counter that disables the (forced) unmount into the struct mount, instead of the struct vnode. Having the counter in the vnode, the unmount -f behaviour is non-deterministic and depended on the presence of the cached vnodes of the upper layer. The mount counter would be incremented by unionfs cover mount. But, as I said above, this looks like a wrong solution.
The right way to handle the forced reclaim with the current VFS is to add the explicit checks for the reclaimed vnodes where it is needed. The vnode cannot be reclaimed while the vnode lock is held. When obtaining the vnode lock, the reclamation can be detected. For instance, the vget() without LK_RETRY shall be checked for ENOENT. You said that that nullfs is vulnerable to the problem. Could you, please, point me to the corresponding stack trace ? At least, the nullfs vop_lock() seems to carefully check the possible problems. --A61Eau4L8twGtri1 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEARECAAYFAkgVz/wACgkQC3+MBN1Mb4isvwCfbECmYEu6lJ2FXIqaU3zYPTZs 5I0AoNzrqhXvT5XHDQs+l65owxM8rfp3 =eTfF -----END PGP SIGNATURE----- --A61Eau4L8twGtri1-- From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 14:36:39 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 16D251065674 for ; Mon, 28 Apr 2008 14:36:39 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id BDCBD8FC0A for ; Mon, 28 Apr 2008 14:36:38 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTP id E5077125438; Mon, 28 Apr 2008 23:36:37 +0900 (JST) Message-ID: <4815E0F5.30706@freebsd.org> Date: Mon, 28 Apr 2008 23:36:37 +0900 From: Daichi GOTO User-Agent: Thunderbird 2.0.0.12 (X11/20080423) MIME-Version: 1.0 To: Kostik Belousov References: <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> <48159AC5.3030000@freebsd.org> <20080428132413.GS18958@deviant.kiev.zoral.com.ua> In-Reply-To: <20080428132413.GS18958@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=UTF-8; format=flowed 
Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 14:36:39 -0000 Kostik Belousov wrote: > On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote: >> Kostik Belousov wrote: >>> On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: >>>> Hi Konstantin :) >>>> >>>> To fix a unionfs issue of >>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=109377, >>>> we need to add new functions >>>> >>>> void vkernrele(struct vnode *vp); >>>> void vkernref(struct vnode *vp); >>>> >>>> and one value >>>> >>>> int v_kernusecount; /* i ref count of kernel */ >>>> >>>> to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. >>>> >>>> Unionfs will be panic when lower fs layer is forced umounted by >>>> "umount -f". So to avoid this issue, we've added >>>> "v_kernusecount" value that means "a vnode count that kernel are >>>> using". vkernrele() and vkernref() are functions that manage >>>> "v_kernusecount" value. >>>> >>>> Please check those and give us an approve or some comments! >>> There is already the vnode reference count. From your description, I cannot >>> understand how the kernusecount would prevent the panic when forced unmount >>> is performed. Could you, please, show the actual code ? PR mentioned >>> does not contain any patch. >> Our patch realizes avoiding kernel panic by "umount -f" operation using with >> EBUSY process. >> >> On current implementation (not applied our patch), "umount -f" tries to >> release vnode at any vnode reference count value. Since that, unionfs >> and nullfs access invalid vnode and lead kernel panic. To prevent this >> issue, we need a some kind of not-umount-accept-mechanism in invalid case >> (e.x. 
fs in unionfsed stack, it must be umounted in correct order) and >> to realize that, current vnode reference count is not enough we are >> thinking. >> >> If you have any ideas to realize the same solution with current vnode >> reference, would you please tell us your idea :) >> >>> The problem you described is common for the kernel code, and right way >>> to handle it, for now, is to keep refcount _and_ check for the forced >>> reclaim. > > Your patch in essence disables the forced unmount. I would object against > such decision. > > Even if taking this direction, I believe more cleaner solution would be > to introduce a counter that disables the (forced) unmount into the > struct mount, instead of the struct vnode. Having the counter in the > vnode, the unmount -f behaviour is non-deterministic and depended on > the presence of the cached vnodes of the upper layer. The mount counter > would be incremented by unionfs cover mount. But, as I said above, this > looks like a wrong solution. > > The right way to handle the forced reclaim with the current VFS is to > add the explicit checks for the reclaimed vnodes where it is needed. The > vnode cannot be reclaimed while the vnode lock is held. When obtaining > the vnode lock, the reclamation can be detected. For instance, the > vget() without LK_RETRY shall be checked for ENOENT. > > You said that that nullfs is vulnerable to the problem. Could you, > please, point me to the corresponding stack trace ? At least, the nullfs > vop_lock() seems to carefully check the possible problems. 
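[Editorial illustration] The vget()-without-LK_RETRY check recommended in the quoted text can be sketched roughly as follows. This is a non-compilable, kernel-style fragment against the FreeBSD 7-era VFS API; the helper name unionfs_use_lower() and the surrounding locking discipline are illustrative assumptions, not code from any patch in this thread:

```c
/*
 * Sketch only: take the lower vnode lock without LK_RETRY, so a vnode
 * reclaimed by a forced unmount is reported as an error instead of
 * being dereferenced.
 */
static int
unionfs_use_lower(struct vnode *lvp, struct thread *td)
{
	int error;

	vhold(lvp);				/* keep the vnode memory valid */
	error = vget(lvp, LK_EXCLUSIVE, td);	/* note: no LK_RETRY */
	vdrop(lvp);
	if (error != 0)
		return (error);			/* ENOENT: lvp was reclaimed,
						   e.g. by umount -f below us */

	/* While the vnode lock is held, lvp cannot be reclaimed,
	 * so it is safe to use it here. */

	vput(lvp);				/* unlock, drop vget's reference */
	return (0);
}
```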
-- Daichi GOTO, http://people.freebsd.org/~daichi From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 14:36:57 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 391391065687 for ; Mon, 28 Apr 2008 14:36:57 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id AC3648FC12 for ; Mon, 28 Apr 2008 14:36:56 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTP id CAAA7125438; Mon, 28 Apr 2008 23:36:55 +0900 (JST) Message-ID: <4815E107.9030902@freebsd.org> Date: Mon, 28 Apr 2008 23:36:55 +0900 From: Daichi GOTO User-Agent: Thunderbird 2.0.0.12 (X11/20080423) MIME-Version: 1.0 To: Kostik Belousov References: <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> <48159AC5.3030000@freebsd.org> <20080428132413.GS18958@deviant.kiev.zoral.com.ua> In-Reply-To: <20080428132413.GS18958@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 14:36:57 -0000 Thanks for your response and explanation :) Kostik Belousov wrote: > On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote: >> Kostik Belousov wrote: >>> On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: >>>> Hi Konstantin :) >>>> >>>> To fix a unionfs issue of >>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=109377, >>>> we need to add new functions >>>> >>>> void vkernrele(struct vnode 
*vp); >>>> void vkernref(struct vnode *vp); >>>> >>>> and one value >>>> >>>> int v_kernusecount; /* i ref count of kernel */ >>>> >>>> to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. >>>> >>>> Unionfs will be panic when lower fs layer is forced umounted by >>>> "umount -f". So to avoid this issue, we've added >>>> "v_kernusecount" value that means "a vnode count that kernel are >>>> using". vkernrele() and vkernref() are functions that manage >>>> "v_kernusecount" value. >>>> >>>> Please check those and give us an approve or some comments! >>> There is already the vnode reference count. From your description, I cannot >>> understand how the kernusecount would prevent the panic when forced unmount >>> is performed. Could you, please, show the actual code ? PR mentioned >>> does not contain any patch. >> Our patch realizes avoiding kernel panic by "umount -f" operation using with >> EBUSY process. >> >> On current implementation (not applied our patch), "umount -f" tries to >> release vnode at any vnode reference count value. Since that, unionfs >> and nullfs access invalid vnode and lead kernel panic. To prevent this >> issue, we need a some kind of not-umount-accept-mechanism in invalid case >> (e.x. fs in unionfsed stack, it must be umounted in correct order) and >> to realize that, current vnode reference count is not enough we are >> thinking. >> >> If you have any ideas to realize the same solution with current vnode >> reference, would you please tell us your idea :) >> >>> The problem you described is common for the kernel code, and right way >>> to handle it, for now, is to keep refcount _and_ check for the forced >>> reclaim. > > Your patch in essence disables the forced unmount. I would object against > such decision. Oooooo.... OK. We understand. > Even if taking this direction, I believe more cleaner solution would be > to introduce a counter that disables the (forced) unmount into the > struct mount, instead of the struct vnode. 
Having the counter in the > vnode, the unmount -f behaviour is non-deterministic and depended on > the presence of the cached vnodes of the upper layer. The mount counter > would be incremented by unionfs cover mount. But, as I said above, this > looks like a wrong solution. > > The right way to handle the forced reclaim with the current VFS is to > add the explicit checks for the reclaimed vnodes where it is needed. The > vnode cannot be reclaimed while the vnode lock is held. When obtaining > the vnode lock, the reclamation can be detected. For instance, the > vget() without LK_RETRY shall be checked for ENOENT. At last, we want to check that vnode is released or not where unionfs does not know. If we can do that check, our patch is not needed for solving that issue. Would you please give us the way to check that target vnode is released or not before accessing it. > You said that that nullfs is vulnerable to the problem. Could you, > please, point me to the corresponding stack trace ? At least, the nullfs > vop_lock() seems to carefully check the possible problems. 
-- Daichi GOTO, http://people.freebsd.org/~daichi From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 16:22:51 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEB14106567B; Mon, 28 Apr 2008 16:22:51 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from anti-4.kiev.sovam.com (anti-4.kiev.sovam.com [62.64.120.202]) by mx1.freebsd.org (Postfix) with ESMTP id 631588FC1B; Mon, 28 Apr 2008 16:22:51 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=skuns.kiev.zoral.com.ua) by anti-4.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1JqW7t-000C1E-Fe; Mon, 28 Apr 2008 19:22:50 +0300 Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by skuns.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3SGMiaC048965 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 28 Apr 2008 19:22:44 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3SGMdL0057121; Mon, 28 Apr 2008 19:22:39 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id m3SGMctY057120; Mon, 28 Apr 2008 19:22:38 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 28 Apr 2008 19:22:38 +0300 From: Kostik Belousov To: Daichi GOTO Message-ID: <20080428162238.GT18958@deviant.kiev.zoral.com.ua> References: <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> <48159AC5.3030000@freebsd.org> <20080428132413.GS18958@deviant.kiev.zoral.com.ua> <4815E107.9030902@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; 
protocol="application/pgp-signature"; boundary="FJ2D5YQYG6NL2pc1" Content-Disposition: inline In-Reply-To: <4815E107.9030902@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: ClamAV version 0.91.2, clamav-milter version 0.91.2 on skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.4 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on skuns.kiev.zoral.com.ua X-Scanner-Signature: 00e80b47c0af1091817fe870f3098f37 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 2737 [Apr 28 2008] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 16:22:51 -0000 --FJ2D5YQYG6NL2pc1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Apr 28, 2008 at 11:36:55PM +0900, Daichi GOTO wrote: > Thanks for your response and explanation :) > > Kostik Belousov wrote: > >On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote: > >>Kostik Belousov wrote: > >>>On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: > >>>>Hi Konstantin :) > >>>> > >>>>To fix a unionfs issue of > >>>>http://www.freebsd.org/cgi/query-pr.cgi?pr=109377, > >>>>we need to add new functions > >>>> > >>>> void vkernrele(struct vnode *vp); > >>>> void vkernref(struct vnode *vp); > >>>> >
>>>>and one value > >>>> > >>>> int v_kernusecount; /* i ref count of kernel */ > >>>> > >>>>to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. > >>>> > >>>>Unionfs will be panic when lower fs layer is forced umounted by > >>>>"umount -f". So to avoid this issue, we've added > >>>>"v_kernusecount" value that means "a vnode count that kernel are > >>>>using". vkernrele() and vkernref() are functions that manage > >>>>"v_kernusecount" value. > >>>> > >>>>Please check those and give us an approve or some comments! > >>>There is already the vnode reference count. From your description, I > >>>cannot > >>>understand how the kernusecount would prevent the panic when forced > >>>unmount > >>>is performed. Could you, please, show the actual code ? PR mentioned > >>>does not contain any patch. > >>Our patch realizes avoiding kernel panic by "umount -f" operation using > >>with > >>EBUSY process. > >> > >>On current implementation (not applied our patch), "umount -f" tries to > >>release vnode at any vnode reference count value. Since that, unionfs > >>and nullfs access invalid vnode and lead kernel panic. To prevent this > >>issue, we need a some kind of not-umount-accept-mechanism in invalid case > >>(e.x. fs in unionfsed stack, it must be umounted in correct order) and > >>to realize that, current vnode reference count is not enough we are > >>thinking. > >> > >>If you have any ideas to realize the same solution with current vnode > >>reference, would you please tell us your idea :) > >> > >>>The problem you described is common for the kernel code, and right way > >>>to handle it, for now, is to keep refcount _and_ check for the forced > >>>reclaim. > > > >Your patch in essence disables the forced unmount. I would object against > >such decision. > > Oooooo.... OK. We understand.
> > >Even if taking this direction, I believe more cleaner solution would be > >to introduce a counter that disables the (forced) unmount into the > >struct mount, instead of the struct vnode. Having the counter in the > >vnode, the unmount -f behaviour is non-deterministic and depended on > >the presence of the cached vnodes of the upper layer. The mount counter > >would be incremented by unionfs cover mount. But, as I said above, this > >looks like a wrong solution. > > > >The right way to handle the forced reclaim with the current VFS is to > >add the explicit checks for the reclaimed vnodes where it is needed. The > >vnode cannot be reclaimed while the vnode lock is held. When obtaining > >the vnode lock, the reclamation can be detected. For instance, the > >vget() without LK_RETRY shall be checked for ENOENT. > > At last, we want to check that vnode is released or not where > unionfs does not know. If we can do that check, our patch is > not needed for solving that issue. > > Would you please give us the way to check that target vnode is > released or not before accessing it. The basic rules of our VFS are: 1. You _must_ hold the vnode unless the vnode is locked. Hold count prevents the vnode memory from being reused and guarantees the validity of the counters, v_vnlock, v_mount and vop (but please note that validity != stability). E.g., v_mount may be NULLed and vop become the deadfs_vop due to reclamation. 2. The vnode lock is held when the vnode is vgone(9)'ed. In the other words, if you have a pointer to the non-reclaimed vnode that is locked, the vnode cannot be reclaimed until the lock is freed. 3. The verbs that lock a vnode (vget() and vn_lock(9)) have two mode of operations. - If you specify the LK_RETRY in the lock flags, you would get even the reclaimed vnode locked. - If you do not specified LK_RETRY, you would get ENOENT for the reclaimed vnode.
[See the #1 for the reason why you must have a vnode held while calling vget() or vn_lock()]. 4. The reclaimed vnode has the VI_DOOMED flag set; you must have vnode interlock locked to check the context of the v_iflag. Most filesystems, as opposed to the VFS, use the other technique to detect the reclaimed vnode, if needed. They clear the v_data in the vop_reclaim, and verification of the (v_data != NULL) is enough to check for reclamation. Very good example of the practical usage of the rules above are the nullfs routines null_reclaim(), null_lock() and null_nodeget(). > > > >You said that that nullfs is vulnerable to the problem. Could you, > >please, point me to the corresponding stack trace ? At least, the nullfs > >vop_lock() seems to carefully check the possible problems. --FJ2D5YQYG6NL2pc1 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEARECAAYFAkgV+c4ACgkQC3+MBN1Mb4haKACfXxdcHAicJTki0O0Iw60E3WmG 4y8An2qfC3GYLpvDljGmgrbxKqtJY8uS =2gda -----END PGP SIGNATURE----- --FJ2D5YQYG6NL2pc1-- From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 17:26:49 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E1B31065672 for ; Mon, 28 Apr 2008 17:26:49 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id 1AA318FC21 for ; Mon, 28 Apr 2008 17:26:49 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTP id 87A0D125438; Tue, 29 Apr 2008 02:26:48 +0900 (JST) Message-ID: <481608D8.1080308@freebsd.org> Date: Tue, 29 Apr 2008 02:26:48 +0900 From: Daichi GOTO User-Agent: Thunderbird 2.0.0.12 (X11/20080423) MIME-Version: 1.0 To: Kostik Belousov References:
<4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> <48159AC5.3030000@freebsd.org> <20080428132413.GS18958@deviant.kiev.zoral.com.ua> <4815E107.9030902@freebsd.org> <20080428162238.GT18958@deviant.kiev.zoral.com.ua> In-Reply-To: <20080428162238.GT18958@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 17:26:49 -0000 Kostik Belousov wrote: > On Mon, Apr 28, 2008 at 11:36:55PM +0900, Daichi GOTO wrote: >> Thanks for your response and explanation :) >> >> Kostik Belousov wrote: >>> On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote: >>>> Kostik Belousov wrote: >>>>> On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: >>>>>> Hi Konstantin :) >>>>>> >>>>>> To fix a unionfs issue of >>>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=109377, >>>>>> we need to add new functions >>>>>> >>>>>> void vkernrele(struct vnode *vp); >>>>>> void vkernref(struct vnode *vp); >>>>>> >>>>>> and one value >>>>>> >>>>>> int v_kernusecount; /* i ref count of kernel */ >>>>>> >>>>>> to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. >>>>>> >>>>>> Unionfs will be panic when lower fs layer is forced umounted by >>>>>> "umount -f". So to avoid this issue, we've added >>>>>> "v_kernusecount" value that means "a vnode count that kernel are >>>>>> using". vkernrele() and vkernref() are functions that manage >>>>>> "v_kernusecount" value. >>>>>> >>>>>> Please check those and give us an approve or some comments! >>>>> There is already the vnode reference count. 
From your description, I >>>>> cannot >>>>> understand how the kernusecount would prevent the panic when forced >>>>> unmount >>>>> is performed. Could you, please, show the actual code ? PR mentioned >>>>> does not contain any patch. >>>> Our patch realizes avoiding kernel panic by "umount -f" operation using >>>> with >>>> EBUSY process. >>>> >>>> On current implementation (not applied our patch), "umount -f" tries to >>>> release vnode at any vnode reference count value. Since that, unionfs >>>> and nullfs access invalid vnode and lead kernel panic. To prevent this >>>> issue, we need a some kind of not-umount-accept-mechanism in invalid case >>>> (e.x. fs in unionfsed stack, it must be umounted in correct order) and >>>> to realize that, current vnode reference count is not enough we are >>>> thinking. >>>> >>>> If you have any ideas to realize the same solution with current vnode >>>> reference, would you please tell us your idea :) >>>> >>>>> The problem you described is common for the kernel code, and right way >>>>> to handle it, for now, is to keep refcount _and_ check for the forced >>>>> reclaim. >>> Your patch in essence disables the forced unmount. I would object against >>> such decision. >> Oooooo.... OK. We understand. >> >>> Even if taking this direction, I believe more cleaner solution would be >>> to introduce a counter that disables the (forced) unmount into the >>> struct mount, instead of the struct vnode. Having the counter in the >>> vnode, the unmount -f behaviour is non-deterministic and depended on >>> the presence of the cached vnodes of the upper layer. The mount counter >>> would be incremented by unionfs cover mount. But, as I said above, this >>> looks like a wrong solution. >>> >>> The right way to handle the forced reclaim with the current VFS is to >>> add the explicit checks for the reclaimed vnodes where it is needed. The >>> vnode cannot be reclaimed while the vnode lock is held. 
When obtaining >>> the vnode lock, the reclamation can be detected. For instance, a >>> vget() without LK_RETRY shall be checked for ENOENT. >> Ultimately, we want to check whether a vnode has been released in >> places where unionfs does not know. If we can do that check, our patch is >> not needed to solve that issue. >> >> Would you please show us the way to check whether a target vnode has >> been released before accessing it. > > The basic rules of our VFS are: > 1. You _must_ hold the vnode unless the vnode is locked. The hold count > prevents the vnode memory from being reused and guarantees the > validity of the counters, v_vnlock, v_mount and vop (but please note > that validity != stability). E.g., v_mount may be NULLed and vop > become the deadfs_vop due to reclamation. > 2. The vnode lock is held when the vnode is vgone(9)'ed. In other > words, if you have a pointer to a non-reclaimed vnode that > is locked, the vnode cannot be reclaimed until the lock is freed. > 3. The verbs that lock a vnode (vget() and vn_lock(9)) have two modes > of operation. > - If you specify LK_RETRY in the lock flags, you would get > even a reclaimed vnode locked. > - If you do not specify LK_RETRY, you would get ENOENT for a > reclaimed vnode. > [See #1 for the reason why you must have a vnode held while > calling vget() or vn_lock()]. > 4. A reclaimed vnode has the VI_DOOMED flag set; you must have the vnode > interlock locked to check the contents of v_iflag. Most filesystems, > as opposed to the VFS, use another technique to detect a reclaimed > vnode, if needed. They clear v_data in vop_reclaim, and > verifying that (v_data != NULL) is enough to check for reclamation. > > A very good example of the practical usage of the rules above is the > nullfs routines null_reclaim(), null_lock() and null_nodeget(). Thanks for your explanation!
We'll try to research and find a new solution for this issue :) >>> You said that nullfs is vulnerable to the problem. Could you, >>> please, point me to the corresponding stack trace? At least, the nullfs >>> vop_lock() seems to carefully check the possible problems. -- Daichi GOTO, http://people.freebsd.org/~daichi From owner-freebsd-fs@FreeBSD.ORG Tue Apr 29 01:52:56 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C8EE106564A for ; Tue, 29 Apr 2008 01:52:56 +0000 (UTC) (envelope-from andrew@thefrog.net) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.175]) by mx1.freebsd.org (Postfix) with ESMTP id B81DD8FC15 for ; Tue, 29 Apr 2008 01:52:55 +0000 (UTC) (envelope-from andrew@thefrog.net) Received: by ug-out-1314.google.com with SMTP id y2so951853uge.37 for ; Mon, 28 Apr 2008 18:52:54 -0700 (PDT) Received: by 10.67.30.3 with SMTP id h3mr5654698ugj.35.1209432441311; Mon, 28 Apr 2008 18:27:21 -0700 (PDT) Received: by 10.86.36.4 with HTTP; Mon, 28 Apr 2008 18:27:21 -0700 (PDT) Message-ID: <16a6ef710804281827p4b6e1ef3sbec516163ba764a@mail.gmail.com> Date: Tue, 29 Apr 2008 11:27:21 +1000 From: "Andrew Hill" Sender: andrew@thefrog.net To: freebsd-fs@freebsd.org MIME-Version: 1.0 X-Google-Sender-Auth: 9c73f03254ec42d2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: ZFS docs / info X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 29 Apr 2008 01:52:56 -0000 Not sure if this is the right list for this (apologies if not) but here goes...
Over the last week I've spent a lot of time getting to know ZFS; starting with basically no knowledge of how the bits and pieces are structured, nor how to use it, and (with a lot of late nights) getting to the point where I feel comfortable using it for a ~2TB raidz server. I've been using FreeBSD for about 8 years, so I'm comfortable using the system; my learning curve was purely with ZFS. So at the suggestion of a friend I made a bunch of notes and wrote an intro to ZFS as I now see it, and made some specific notes on things that I didn't find obvious from the documentation (at least the docs I found... which were the ZFS Tuning Guide on wiki.freebsd.org, the Sun ZFS administrator's guide, the zfs/zpool man pages and a bunch of blogs). Mostly the structure of the different elements of ZFS (zpools, file systems, vdevs, zvols) and how they interact, but also a few limitations of how those can be configured. I figure what I've written may be (hopefully) useful to others with UNIX experience but brand new to ZFS, or, better still, if someone is writing a wiki or documentation for ZFS on BSD, I'm happy for any of what I've written to be used for that kind of thing.
post 1 - basic intro, overview of the structure of ZFS (zpools, zfs, vdevs, zvols and how they all interact) http://blog.thefrog.net/2008/04/zfs-on-freebsd.html post 2 - some notable limitations and features I didn't really get from my reading of the docs (and a bug that I've yet to reproduce in a debug kernel) http://blog.thefrog.net/2008/04/more-zfs-on-freebsd.html I'm providing links because there's a rather large amount of text, which will no doubt have the odd mistake to fix as it's pointed out to me. Anyway, hope it helps someone. From owner-freebsd-fs@FreeBSD.ORG Wed Apr 30 07:35:47 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EA9321065670 for ; Wed, 30 Apr 2008 07:35:47 +0000 (UTC) (envelope-from dudu@dudu.ro) Received: from fk-out-0910.google.com (fk-out-0910.google.com [209.85.128.189]) by mx1.freebsd.org (Postfix) with ESMTP id 920528FC1C for ; Wed, 30 Apr 2008 07:35:47 +0000 (UTC) (envelope-from dudu@dudu.ro) Received: by fk-out-0910.google.com with SMTP id k31so203867fkk.11 for ; Wed, 30 Apr 2008 00:35:46 -0700 (PDT) Received: by 10.82.159.15 with SMTP id h15mr28189bue.29.1209539294954; Wed, 30 Apr 2008 00:08:14 -0700 (PDT) Received: by 10.82.185.8 with HTTP; Wed, 30 Apr 2008 00:08:14 -0700 (PDT) Message-ID: Date: Wed, 30 Apr 2008 10:08:14 +0300 From: "Vlad GALU" To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline Subject: [FYI] Unionfs hosed by weekly cronjobs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 30 Apr 2008 07:35:48 -0000 I added the FYI tag in the subject of this message in order to let you know that apart from noticing the symptom, I don't have any other useful info.
The machine in question runs the latest RELENG_7 and used to have /usr/ports mounted "below" twice in two different jails. Other mount flags are rw and noatime. Whenever the weekly jobs start, the system freezes. Ping still works, however. I couldn't test each weekly script because I don't have physical access to this machine and am currently away from my office. Switching to nullfs for the aforementioned mountpoints worked around the issue, at the cost of eliminating the possibility of building the same port in different jails. When I get back to my office I'll try to reproduce the problem, but if anybody can do it in the meantime, even better. Thanks, Vlad. -- ~/.signature: no such file or directory From owner-freebsd-fs@FreeBSD.ORG Wed Apr 30 13:36:52 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 875B61065685 for ; Wed, 30 Apr 2008 13:36:52 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 3A7BF8FC19 for ; Wed, 30 Apr 2008 13:36:52 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1JrCUN-0001UC-Bb for freebsd-fs@freebsd.org; Wed, 30 Apr 2008 13:36:51 +0000 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 30 Apr 2008 13:36:51 +0000 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 30 Apr 2008 13:36:51 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Wed, 30 Apr 2008 15:36:41 +0200 Lines: 30 Message-ID: References: <16a6ef710804281827p4b6e1ef3sbec516163ba764a@mail.gmail.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; 
boundary="------------enigCF37EA0F7776B7D6006BFC9F" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Thunderbird 2.0.0.12 (X11/20080227) In-Reply-To: <16a6ef710804281827p4b6e1ef3sbec516163ba764a@mail.gmail.com> X-Enigmail-Version: 0.95.0 Sender: news Subject: Re: ZFS docs / info X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 30 Apr 2008 13:36:52 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigCF37EA0F7776B7D6006BFC9F Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Andrew Hill wrote: > Not sure if this is the right list for this (apologies if not) but here > goes... > > Over the last week I've spent a lot of time getting to know ZFS; Do you know about http://wiki.freebsd.org/ZFS ? --------------enigCF37EA0F7776B7D6006BFC9F Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFIGHXpldnAQVacBcgRAsbAAJ444uZsQfklgwglyjlMx1Hb4QBhcQCfc8In Zmm888YLi2Sc7Z9UoRPYYN8= =Padf -----END PGP SIGNATURE----- --------------enigCF37EA0F7776B7D6006BFC9F-- From owner-freebsd-fs@FreeBSD.ORG Thu May 1 08:54:40 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E4A501065670 for ; Thu, 1 May 2008 08:54:40 +0000 (UTC) (envelope-from andrew@thefrog.net) Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.224]) by mx1.freebsd.org (Postfix) with ESMTP id C40CC8FC0A for ; Thu, 1 May 2008 08:54:40 +0000 (UTC) (envelope-from andrew@thefrog.net) Received: by rv-out-0506.google.com with SMTP id
b25so540239rvf.43 for ; Thu, 01 May 2008 01:54:40 -0700 (PDT) Received: by 10.141.212.5 with SMTP id o5mr746197rvq.20.1209632080265; Thu, 01 May 2008 01:54:40 -0700 (PDT) Received: from pc-150.acfr.usyd.edu.au ( [129.78.210.150]) by mx.google.com with ESMTPS id g22sm2341477rvb.7.2008.05.01.01.54.37 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 01 May 2008 01:54:39 -0700 (PDT) Message-Id: From: Andrew Hill To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Thu, 1 May 2008 18:54:35 +1000 X-Mailer: Apple Mail (2.919.2) Sender: Andrew Hill Subject: ZFS docs / info X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 May 2008 08:54:41 -0000 Ivan Voras wrote: > Do you know about http://wiki.freebsd.org/ZFS ? Yes, that was my starting point as I learnt about ZFS. I simply wanted to offer documentation aimed at a different level of user. I found that the documentation on that wiki and the docs it links to tended to fit into one of three categories: 1. it provided a very high-level listing of features of the whole system, without talking about specific components, what each one is responsible for and how they fit together (e.g. is the zpool or the zfs responsible for checksumming, compression, redundancy, etc) - great for convincing people of the worth of ZFS 2. it assumed the reader had full knowledge of how the ZFS pieces fit together (i.e. they knew what they wanted to create and when) and was simply there to document the syntax of the zpool and zfs commands - a good quick-reference guide for those familiar with ZFS 3.
it provided very detailed information about commands, which must of course include how to use every single component available to ZFS, a lot of which is far beyond what a typical 'home' BSD user would want, and perhaps confusing due to the level of detail - but perfect for an engineer or administrator. Obviously the right documentation for a specific user really depends on their background knowledge, and I felt that the first category was great for convincing someone to use ZFS, but if they knew nothing of how the pieces fit together then 2 and 3 were a very deep pool to dive into. So I've tried to summarise the info I found from all three into a simpler document aimed somewhere in between high-level-overview and detailed-man-pages, containing what I found most useful from the documentation available. I don't imagine anyone who's actually bothered to sign up to freebsd-fs will want documentation at the level I've written it (they'll be going for #2 or 3 above), but I figured those trying to find out how it fits together might stumble across the archives, or maybe someone involved in documentation will see some utility (for new ZFS users) in what I've written.
Andrew From owner-freebsd-fs@FreeBSD.ORG Fri May 2 20:58:40 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0BB67106566C for ; Fri, 2 May 2008 20:58:40 +0000 (UTC) (envelope-from anderson@freebsd.org) Received: from ns.trinitel.com (186.161.36.72.static.reverse.ltdomains.com [72.36.161.186]) by mx1.freebsd.org (Postfix) with ESMTP id CF5A58FC0C for ; Fri, 2 May 2008 20:58:39 +0000 (UTC) (envelope-from anderson@freebsd.org) Received: from proton.storspeed.com (209-163-168-124.static.tenantsolutions.com [209.163.168.124] (may be forged)) (authenticated bits=0) by ns.trinitel.com (8.14.1/8.14.1) with ESMTP id m42KeCib098887; Fri, 2 May 2008 15:40:12 -0500 (CDT) (envelope-from anderson@freebsd.org) Message-Id: <4CA7BA82-E95C-45FF-9B94-8EF27B6DB024@freebsd.org> From: Eric Anderson To: Attila Nagy In-Reply-To: <48070DCF.9090902@fsn.hu> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Fri, 2 May 2008 15:40:11 -0500 References: <48070DCF.9090902@fsn.hu> X-Mailer: Apple Mail (2.919.2) X-Spam-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.1.8 X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on ns.trinitel.com Cc: freebsd-fs@freebsd.org Subject: Re: Consistent inodes between distinct machines X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 May 2008 20:58:40 -0000 On Apr 17, 2008, at 3:43 AM, Attila Nagy wrote: > Hello, > > I have several NFS servers, where the service must be available > 0-24. 
The servers are mounted read only on the clients and I've > solved the problem of maintaining consistent inodes between them by > rsyncing an UFS image and mounting it via md on the NFS servers. > The machines have a common IP address with CARP, so if one of them > falls out, the other(s) can take over. > > This works nice, but rsyncing multi gigabyte files are becoming more > and more annoying, so I've wondered whether it would be possible to > get constant inodes between machines via alternative ways. Why not avoid syncing multi-gigabyte files by splitting your huge FS image into many smaller say 512MB files, then use md and geom concat/ stripe/etc to make them all one image that you mount? Eric From owner-freebsd-fs@FreeBSD.ORG Sat May 3 12:51:05 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F270106564A for ; Sat, 3 May 2008 12:51:05 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 4BE898FC19 for ; Sat, 3 May 2008 12:51:04 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de ([10.1.1.7]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id m43Cp1q9011127 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sat, 3 May 2008 14:51:02 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14]) by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id m43Copdt001744 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 3 May 2008 14:50:52 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id m43CopgF043266; Sat, 3 May 2008 14:50:51 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: (from ticso@localhost) 
by cicely12.cicely.de (8.13.4/8.13.3/Submit) id m43Copwa043265; Sat, 3 May 2008 14:50:51 +0200 (CEST) (envelope-from ticso) Date: Sat, 3 May 2008 14:50:51 +0200 From: Bernd Walter To: Eric Anderson Message-ID: <20080503125050.GG40730@cicely12.cicely.de> References: <48070DCF.9090902@fsn.hu> <4CA7BA82-E95C-45FF-9B94-8EF27B6DB024@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CA7BA82-E95C-45FF-9B94-8EF27B6DB024@freebsd.org> X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha User-Agent: Mutt/1.5.9i X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, BAYES_00=-2.599 autolearn=ham version=3.2.3 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on cicely12.cicely.de Cc: freebsd-fs@freebsd.org Subject: Re: Consistent inodes between distinct machines X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 May 2008 12:51:05 -0000 On Fri, May 02, 2008 at 03:40:11PM -0500, Eric Anderson wrote: > On Apr 17, 2008, at 3:43 AM, Attila Nagy wrote: > > >Hello, > > > >I have several NFS servers, where the service must be available > >0-24. The servers are mounted read only on the clients and I've > >solved the problem of maintaining consistent inodes between them by > >rsyncing an UFS image and mounting it via md on the NFS servers. > >The machines have a common IP address with CARP, so if one of them > >falls out, the other(s) can take over. > > > >This works nice, but rsyncing multi gigabyte files are becoming more > >and more annoying, so I've wondered whether it would be possible to > >get constant inodes between machines via alternative ways. 
> > > Why not avoid syncing multi-gigabyte files by splitting your huge FS > image into many smaller say 512MB files, then use md and geom concat/ > stripe/etc to make them all one image that you mount? What would be the positive effect of doing this? FFS distributes data over the media, so all of the small files change in almost every case, and you have to checksum-compare the whole virtual disk anyway. With multiple files the syncing is more complex. For example, a normal rsync run can guarantee that you get a complete file synced or none at all, but this doesn't work out of the box with multiple files, so you risk half-updated data. Nevertheless I think that the UFS/NFS combo is not very good for this problem. With ZFS send/receive, however, inode numbers are consistent. Together with the differential stream creation it is quite efficient for syncing large volumes as well.
[75]cicely14# zfs send data/arm-elf@2008-05-03 | zfs receive -v data/test
receiving full stream of data/arm-elf@2008-05-03 into data/test@2008-05-03
received 126Mb stream in 28 seconds (4.50Mb/sec)
0.008u 5.046s 0:27.93 18.0% 53+2246k 0+0io 0pf+0w
[56]cicely14# ls -ali /usr/local/arm-elf/bin/
total 22585
147 drwxr-xr-x 2 root wheel 20 Mar 25 2006 .
3 drwxr-xr-x 11 root wheel 11 Dec 25 04:58 ..
154 -rwxr-xr-x 1 root wheel 1514107 Mar 25 2006 arm-elf-addr2line
150 -rwxr-xr-x 2 root wheel 1495219 Mar 25 2006 arm-elf-ar
159 -rwxr-xr-x 2 root wheel 2275463 Mar 25 2006 arm-elf-as
158 -rwxr-xr-x 1 root wheel 1481234 Mar 25 2006 arm-elf-c++filt
163 -rwxr-xr-x 1 root wheel 300233 Mar 25 2006 arm-elf-cpp
164 -rwxr-xr-x 2 root wheel 296938 Mar 25 2006 arm-elf-gcc
164 -rwxr-xr-x 2 root wheel 296938 Mar 25 2006 arm-elf-gcc-4.1.0
162 -rwxr-xr-x 1 root wheel 15949 Mar 25 2006 arm-elf-gccbug
161 -rwxr-xr-x 1 root wheel 126715 Mar 25 2006 arm-elf-gcov
160 -rwxr-xr-x 2 root wheel 2162285 Mar 25 2006 arm-elf-ld
156 -rwxr-xr-x 2 root wheel 1541809 Mar 25 2006 arm-elf-nm
153 -rwxr-xr-x 1 root wheel 1871104 Mar 25 2006 arm-elf-objcopy
149 -rwxr-xr-x 2 root wheel 2008424 Mar 25 2006 arm-elf-objdump
152 -rwxr-xr-x 2 root wheel 1495214 Mar 25 2006 arm-elf-ranlib
155 -rwxr-xr-x 1 root wheel 389000 Mar 25 2006 arm-elf-readelf
148 -rwxr-xr-x 1 root wheel 1430608 Mar 25 2006 arm-elf-size
151 -rwxr-xr-x 1 root wheel 1412788 Mar 25 2006 arm-elf-strings
157 -rwxr-xr-x 2 root wheel 1871103 Mar 25 2006 arm-elf-strip
[57]cicely14# ls -ali /data/test/bin/
total 22585
147 drwxr-xr-x 2 root wheel 20 Mar 25 2006 .
3 drwxr-xr-x 11 root wheel 11 Dec 25 04:58 ..
154 -rwxr-xr-x 1 root wheel 1514107 Mar 25 2006 arm-elf-addr2line
150 -rwxr-xr-x 2 root wheel 1495219 Mar 25 2006 arm-elf-ar
159 -rwxr-xr-x 2 root wheel 2275463 Mar 25 2006 arm-elf-as
158 -rwxr-xr-x 1 root wheel 1481234 Mar 25 2006 arm-elf-c++filt
163 -rwxr-xr-x 1 root wheel 300233 Mar 25 2006 arm-elf-cpp
164 -rwxr-xr-x 2 root wheel 296938 Mar 25 2006 arm-elf-gcc
164 -rwxr-xr-x 2 root wheel 296938 Mar 25 2006 arm-elf-gcc-4.1.0
162 -rwxr-xr-x 1 root wheel 15949 Mar 25 2006 arm-elf-gccbug
161 -rwxr-xr-x 1 root wheel 126715 Mar 25 2006 arm-elf-gcov
160 -rwxr-xr-x 2 root wheel 2162285 Mar 25 2006 arm-elf-ld
156 -rwxr-xr-x 2 root wheel 1541809 Mar 25 2006 arm-elf-nm
153 -rwxr-xr-x 1 root wheel 1871104 Mar 25 2006 arm-elf-objcopy
149 -rwxr-xr-x 2 root wheel 2008424 Mar 25 2006 arm-elf-objdump
152 -rwxr-xr-x 2 root wheel 1495214 Mar 25 2006 arm-elf-ranlib
155 -rwxr-xr-x 1 root wheel 389000 Mar 25 2006 arm-elf-readelf
148 -rwxr-xr-x 1 root wheel 1430608 Mar 25 2006 arm-elf-size
151 -rwxr-xr-x 1 root wheel 1412788 Mar 25 2006 arm-elf-strings
157 -rwxr-xr-x 2 root wheel 1871103 Mar 25 2006 arm-elf-strip
-- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines and more.
From owner-freebsd-fs@FreeBSD.ORG Sat May 3 15:55:45 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EDCFE106566C for ; Sat, 3 May 2008 15:55:45 +0000 (UTC) (envelope-from yalur@mail.ru) Received: from mx39.mail.ru (mx39.mail.ru [194.67.23.35]) by mx1.freebsd.org (Postfix) with ESMTP id A4A488FC18 for ; Sat, 3 May 2008 15:55:45 +0000 (UTC) (envelope-from yalur@mail.ru) Received: from [77.123.105.27] (port=51133 helo=reluctant-operater.volia.net) by mx39.mail.ru with asmtp id 1JsK5Q-000PRP-00; Sat, 03 May 2008 19:55:44 +0400 From: Ruslan Kovtun Organization: Home To: "Daniel Andersson" Date: Sat, 3 May 2008 18:55:43 +0300 User-Agent: KMail/1.9.7 References: <24adbbc00804151529m2a74085ds468eaac55ba94a32@mail.gmail.com> <200804162212.32560.yalur@mail.ru> <24adbbc00804270501t48b9a1c5le2f1d0bce18572cf@mail.gmail.com> In-Reply-To: <24adbbc00804270501t48b9a1c5le2f1d0bce18572cf@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="koi8-r" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200805031855.43218.yalur@mail.ru> X-Spam: Not detected Cc: freebsd-fs@freebsd.org Subject: Re: Choppy performance. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: yalur@mail.ru List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 May 2008 15:55:46 -0000 Sorry, maybe I miss something. What "memory allocation errors in rtorrent" do you mean? > But if it isn't really using that much memory how come I get > memory allocation errors in rtorrent if there's more memory > avaliable? 
A week ago I observed a problem with write speed on a ZFS pool with the following configuration on i386: vm.kmem_size_max="1073741824" vm.kmem_size="1073741824" KVA_PAGES=512 Write speed on 8 disks (raidz) is 40 MB/sec and very choppy. If I change to vm.kmem_size_max="999M", write speed increases 4 times (160 MB/sec). I think this is a bug. What is your configuration? -- ________________ Regards, Ruslan Kovtun mailto From owner-freebsd-fs@FreeBSD.ORG Sat May 3 18:09:34 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6FBA0106566B for ; Sat, 3 May 2008 18:09:34 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 5ADE78FC0A for ; Sat, 3 May 2008 18:09:32 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from [172.27.51.1] (fw.axelero.hu [195.228.243.120]) by people.fsn.hu (Postfix) with ESMTP id 769D8C7653; Sat, 3 May 2008 20:09:27 +0200 (CEST) Message-ID: <481CAA55.2030506@fsn.hu> Date: Sat, 03 May 2008 20:09:25 +0200 From: Attila Nagy User-Agent: Thunderbird 2.0.0.14 (Windows/20080421) MIME-Version: 1.0 To: ticso@cicely.de References: <48070DCF.9090902@fsn.hu> <4CA7BA82-E95C-45FF-9B94-8EF27B6DB024@freebsd.org> <20080503125050.GG40730@cicely12.cicely.de> In-Reply-To: <20080503125050.GG40730@cicely12.cicely.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Consistent inodes between distinct machines X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 May 2008 18:09:34 -0000 Hello, On 2008.05.03.
14:50, Bernd Walter wrote: > On Fri, May 02, 2008 at 03:40:11PM -0500, Eric Anderson wrote: > >> On Apr 17, 2008, at 3:43 AM, Attila Nagy wrote: >> >> >>> Hello, >>> >>> I have several NFS servers, where the service must be available >>> 0-24. The servers are mounted read only on the clients and I've >>> solved the problem of maintaining consistent inodes between them by >>> rsyncing an UFS image and mounting it via md on the NFS servers. >>> The machines have a common IP address with CARP, so if one of them >>> falls out, the other(s) can take over. >>> >>> This works nice, but rsyncing multi gigabyte files are becoming more >>> and more annoying, so I've wondered whether it would be possible to >>> get constant inodes between machines via alternative ways. >>> >> Why not avoid syncing multi-gigabyte files by splitting your huge FS >> image into many smaller say 512MB files, then use md and geom concat/ >> stripe/etc to make them all one image that you mount? >> > > Where would be the positive effect by doing this? > FFS distributes data over the media, so all the small files changes > in almost every case and you have to checksum-compare the whole virtual > disk anyway. > With multiple files the syncing is more complex. For example a normal > rsync run can garantie that you get a complete file synced or none > at all, but this doesn't work out of the box with multiple files, so > you risk half updated data. > I haven't got Eric's e-mail, but I agree with the above. > Nevertheless I think that the UFS/NFS combo is not very good for this > problem. > I don't think so. I need a stable system and UFS/NFS is in that state in FreeBSD. > With ZFS send/receive however inode numbers are consistent. > Yes, they are, but the filesystem IDs are not, so you cannot have CARP failover for the NFS servers, because all clients will have ESTALE errors on everything. 
I've already tried that, see my e-mails about this topic in the archives (it would be good if we could synchronize the filesystem IDs and therefore the filehandles too). > Together with the differential stream creation it is quite efficient > to sync large volumes as well. > [75]cicely14# zfs send data/arm-elf@2008-05-03 | zfs receive -v data/test > receiving full stream of data/arm-elf@2008-05-03 into data/test@2008-05-03 > received 126Mb stream in 28 seconds (4.50Mb/sec) > 0.008u 5.046s 0:27.93 18.0% 53+2246k 0+0io 0pf+0w > Yes, that's why I thought of this in the first place. But there is another problem, which hits us today (with the loopbacked image mount) as well: you have to unmount the image and restart the NFS server (it can panic the machine otherwise), so we have to flip the active state from one machine to the other during the sync. The exact process looks like this:
- rsync the image to the inactive server
- when it's done, remount the image and restart the nfsd
- flip CARP (this is when the new content will go into production)
- sync the image to the now inactive, previously active server
This is a painful, slow (because of the rsync) and fragile process. And if the active server crashes while the sync is running, you are left with a possibly non-working state. With ZFS, the sync time is much smaller, but you have to flip the active state and restart nfsd as well. Currently I'm experimenting with a silly kernel patch, which replaces the following arc4random()s with a constant value:
./ffs/ffs_alloc.c: ip->i_gen = arc4random() / 2 + 1;
./ffs/ffs_alloc.c: prefcg = arc4random() % fs->fs_ncg;
./ffs/ffs_alloc.c: dp2->di_gen = arc4random() / 2 + 1;
./ffs/ffs_vfsops.c: ip->i_gen = arc4random() / 2 + 1;
It seems that this works when I don't use soft updates on the volumes.
So what I have now:
- all of the machines have the above arc4random()s removed
- all machines run the data file system in async mode (for speed, and because soft updates seem to mess up the constant inodes)
- I have all the data in a Subversion repository (better than a plain "master image", because it's versioned, logged, etc)
- I do updates in this way on the machines: mount -o rw,async /data; svn up; mount -o ro /data
So far it seems to be OK, but I'm not yet finished with the testing. From owner-freebsd-fs@FreeBSD.ORG Sat May 3 18:52:07 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 05D72106567D; Sat, 3 May 2008 18:52:07 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 553558FC1C; Sat, 3 May 2008 18:52:06 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de ([10.1.1.7]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id m43Iq324022066 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sat, 3 May 2008 20:52:04 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14]) by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id m43IpuwU004876 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 3 May 2008 20:51:56 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id m43IptGn044067; Sat, 3 May 2008 20:51:56 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: (from ticso@localhost) by cicely12.cicely.de (8.13.4/8.13.3/Submit) id m43IptsD044066; Sat, 3 May 2008 20:51:55 +0200 (CEST) (envelope-from ticso) Date: Sat, 3 May 2008 20:51:55 +0200 From: Bernd Walter To: Attila Nagy Message-ID:
<20080503185155.GA44005@cicely12.cicely.de>
In-Reply-To: <481CAA55.2030506@fsn.hu>
Subject: Re: Consistent inodes between distinct machines

On Sat, May 03, 2008 at 08:09:25PM +0200, Attila Nagy wrote:
> Hello,
>
> On 2008.05.03. 14:50, Bernd Walter wrote:
> > On Fri, May 02, 2008 at 03:40:11PM -0500, Eric Anderson wrote:
> > > On Apr 17, 2008, at 3:43 AM, Attila Nagy wrote:
> > Nevertheless I think that the UFS/NFS combo is not very good for this
> > problem.
>
> I don't think so. I need a stable system, and UFS/NFS is in that state in
> FreeBSD.

ZFS is pretty stable as well, although it has some points you need to care about and tune.

> > With ZFS send/receive, however, inode numbers are consistent.
>
> Yes, they are, but the filesystem IDs are not, so you cannot have CARP
> failover for the NFS servers, because all clients will get ESTALE
> errors on everything.

Haven't thought about this. Of course this is a real problem. Have you tried the following: set up Server A with all required ZFS filesystems, then replicate everything to Server B using dd. Then the filesystem ID should be the same on both systems.
This will not work for newly created filesystems, however, and you may need to take extra care about not accidentally swapping disks between the machines, since they have the same disk IDs as well. I admit - not perfect :(

> I've already tried that, see my e-mails about this topic in the archives
> (it would be good if we could synchronize the filesystem IDs and
> therefore the filehandles too).
>
> > Together with the differential stream creation it is quite efficient
> > to sync large volumes as well.
> > [75]cicely14# zfs send data/arm-elf@2008-05-03 | zfs receive -v data/test
> > receiving full stream of data/arm-elf@2008-05-03 into data/test@2008-05-03
> > received 126Mb stream in 28 seconds (4.50Mb/sec)
> > 0.008u 5.046s 0:27.93 18.0% 53+2246k 0+0io 0pf+0w
>
> Yes, that's why I thought of this in the first place. But there is
> another problem, which hits us today (with the loopbacked image mount)
> as well: you have to unmount the image and restart the NFS server (it
> can panic the machine otherwise), so we have to flip the active state
> from one machine to the other during the sync.

Of course you have to do this - a readonly mount means no writing, but it doesn't mean the filesystem stops caching metadata or expects the underlying media to change contents, so to stay in sync you have to remount.

> The exact process looks like this:
> - rsync the image to the inactive server
> - when it's done, remount the image and restart the nfsd

You also have to sync the image to a different file, since you can't pollute the original file with new content while it is mounted. But with proper (IIRC default) options, rsync already writes a new file and then exchanges it with the old one.

> - flip CARP (this is when the new content will go into production)
> - sync the image to the now inactive, previously active server
>
> This is a painful, slow (because of the rsync) and fragile process.
> And if the active server crashes while the sync is going, you are there
> with a possibly non-working state.
>
> With ZFS, the sync time is much smaller, but you have to flip the active
> state and restart nfsd as well.

Sounds plausible to me.

> Currently I'm experimenting with a silly kernel patch, which replaces
> the following arc4random()s with a constant value:
> ./ffs/ffs_alloc.c: ip->i_gen = arc4random() / 2 + 1;
> ./ffs/ffs_alloc.c: prefcg = arc4random() % fs->fs_ncg;
> ./ffs/ffs_alloc.c: dp2->di_gen = arc4random() / 2 + 1;
> ./ffs/ffs_vfsops.c: ip->i_gen = arc4random() / 2 + 1;
>
> It seems that this works when I don't use soft updates on the volumes.

But it is very fragile, and the randomness is there for a good reason: namely to distribute the allocated inodes over the media, and since AFAIK at least small files have their data allocated near the inode, you influence data distribution as well. This will very likely lead to lower speed after some usage.

> So what I have now:
> - all of the machines have the above arc4random()s removed
> - all machines run the data file system in async mode (for speed and
> because soft updates seems to mess up the constant inodes)
> - I have all the data in a subversion repository (better than a plain
> "master image", because it's versioned, logged, etc)
> - I do updates in this way on the machines: mount -o rw,async /data; svn
> up; mount -o ro /data
>
> So far it seems to be OK, but I'm not yet finished with the testing.

Honestly said - I wouldn't trust that very much. Say you use two disk stations with fibre channel, which are connected to two hosts. Run the disk stations off different power supply rails. Then use a solidly constructed single server and have an identical machine as cold, or maybe already booted, standby. Use the disk stations to mirror - one half on each station. If the host dies you can easily take over the service to the other machine by just mounting the disks.
If you do this with ZFS, it even ensures that the original host will not automatically mount the pool, since the pool's host-id has been changed to that of the other host. It is not a hot standby like your solution, but talking about service failures, I would assume this will outperform any hackish solution. I see so many people trying to do freaky failover with additional complexity and additional failure points, instead of simply increasing the quality of their hardware.

--
B.Walter                http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and more.

From owner-freebsd-fs@FreeBSD.ORG Sat May 3 19:53:44 2008
Message-ID: <481CC2B8.5080205@fsn.hu>
Date: Sat, 03 May 2008 21:53:28 +0200
From: Attila Nagy
To: ticso@cicely.de
In-Reply-To: <20080503185155.GA44005@cicely12.cicely.de>
Subject: Re: Consistent inodes between distinct machines
On 2008.05.03. 20:51, Bernd Walter wrote:
> On Sat, May 03, 2008 at 08:09:25PM +0200, Attila Nagy wrote:
>> Hello,
>>
>> On 2008.05.03. 14:50, Bernd Walter wrote:
>>> On Fri, May 02, 2008 at 03:40:11PM -0500, Eric Anderson wrote:
>>>> On Apr 17, 2008, at 3:43 AM, Attila Nagy wrote:
>>> Nevertheless I think that the UFS/NFS combo is not very good for this
>>> problem.
>>
>> I don't think so. I need a stable system and UFS/NFS is in that state in
>> FreeBSD.
>
> ZFS is pretty stable as well, although it has some points you need
> to care about and tune.

I have (had; I switched one back to UFS) two machines with ZFS, one i386 and one amd64. Both kept crashing or freezing, so I don't consider ZFS pretty stable ATM. :(

> Haven't thought about this.
> Of course this is a real problem.
> Have you tried the following:
> Setup Server A with all required ZFS filesystems.
> Replicate everything to Server B using dd.
> Then the filesystem ID should be the same on both systems.
> This will not work for newly created filesystems, however, and you may
> need to take extra care about not accidentally swapping disks between
> the machines, since they have the same disk IDs as well.
> I admit - not perfect :(

Haven't tried that (though I thought of it), because I would need a bunch of new filesystems for snapshotting and synchronizing, and I would have to dd tens of gigabytes every time to all of the NFS servers over the network.

>> Yes, that's why I thought of this in the first place. But there is
>> another problem, which hits us today (with the loopbacked image mount)
>> as well: you have to unmount the image and restart the NFS server (it
>> can panic the machine otherwise), so we have to flip the active state
>> from one machine to the other during the sync.
> Of course you have to do this - readonly mounts mean not writing, but
> it doesn't mean not caching metadata and expecting the underlying media
> to change contents, so to stay in sync you have to remount.

I am very well aware of that. If it worked, I would choose a geom_gate solution with one RW machine and many RO ones, with a mirror formed from them. Of course that's still not perfect, so ZFS's mirroring would be a better fit (due to incremental updates). But sadly, it's not possible (AFAIK with "standard" methods) to run systems like that.

>> The exact process looks like this:
>> - rsync the image to the inactive server
>> - when it's done, remount the image and restart the nfsd
>
> You also have to sync the image to a different file, since you can't
> pollute the original file with new content while it is mounted.

I have been doing this for years without any ill effects. Of course I don't access the filesystem while it's being synced. I'm just too lazy to umount it, but you are right, that's the correct way.

> But with proper (IIRC default) options, rsync already writes a new
> file and then exchanges it with the old one.

Yes, I use in-place syncing, because I don't have that much space available.

>> Currently I'm experimenting with a silly kernel patch, which replaces
>> the following arc4random()s with a constant value:
>> ./ffs/ffs_alloc.c: ip->i_gen = arc4random() / 2 + 1;
>> ./ffs/ffs_alloc.c: prefcg = arc4random() % fs->fs_ncg;
>> ./ffs/ffs_alloc.c: dp2->di_gen = arc4random() / 2 + 1;
>> ./ffs/ffs_vfsops.c: ip->i_gen = arc4random() / 2 + 1;
>>
>> It seems that this works when I don't use soft updates on the volumes.
>
> But it is very fragile and it is there for a good reason.

For a normal filesystem, yes.

> Namely to distribute the allocated inodes over the media, and since
> AFAIK at least small files have their data allocated near the inode,
> you influence data distribution as well.
> This will very likely lead to lower speed after some usage.

Because these are mostly RO volumes (only RW while updating, which is a slow process anyway), used for serving NFS clients, I don't think it will matter that much. But I'll see. Currently this is the best I could come up with.

> Honestly said - I wouldn't trust that very much.
> Say you use two disk stations with fibre channel, which are connected to
> two hosts.
> Run the disk stations off different power supply rails.
> Then use a solidly constructed single server and have an identical machine
> as cold, or maybe already booted, standby.
> Use the disk stations to mirror - one half on each station.
> If the host dies you can easily take over the service to the other
> machine by just mounting the disks.
> If you do this with ZFS, it even ensures that the original host will
> not automatically mount the pool, since the pool's host-id has been
> changed to that of the other host.
> It is not a hot standby like your solution, but talking about service
> failures, I would assume this will outperform any hackish solution.
> I see so many people trying to do freaky failover with additional
> complexity and additional failure points, instead of simply increasing
> the quality of their hardware.

The servers above provide NFS to FreeBSD and Linux netboot clients (the clients are at many sites, running the real services behind load balancers, BGP anycast routing, whatever you like). The NFS servers here serve the purposes of rapid deployment (put some new machines into server pool X), centralised management (only have to make the configuration and OS changes in one place), etc. So I'm not trying to build a highly available general cluster (with NFS), but a highly available NFS server for netbooted clients. And commercial NASes aren't any better (at least this is what I've seen so far); most of them are not shared-nothing systems with affordable, reliable multisite replication capabilities.