From owner-freebsd-hackers@FreeBSD.ORG  Thu Dec  2 21:28:46 2004
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9D95716A4CF
	for <hackers@freebsd.org>; Thu,  2 Dec 2004 21:28:46 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id E7A2A43D53
	for <hackers@freebsd.org>; Thu,  2 Dec 2004 21:28:45 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 25544 invoked by uid 89); 2 Dec 2004 21:28:45 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 2 Dec 2004 21:28:45 -0000
Received: (qmail 21553 invoked by uid 89); 2 Dec 2004 21:27:19 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 2 Dec 2004 21:27:19 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id iB2LRI9k041977;
	Thu, 2 Dec 2004 16:27:18 -0500 (EST)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Andre Oppermann <andre@freebsd.org>
In-Reply-To: <41AF29AC.6030401@freebsd.org>
References: <41AE3F80.1000506@freebsd.org>  <41AF29AC.6030401@freebsd.org>
Content-Type: text/plain
Message-Id: <1102022838.11465.7735.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Thu, 02 Dec 2004 16:27:18 -0500
Content-Transfer-Encoding: 7bit
cc: hackers@freebsd.org
cc: Scott Long <scottl@freebsd.org>
cc: "current@freebsd.org" <current@freebsd.org>
Subject: Re: My project wish-list for the next 12 months
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Dec 2004 21:28:46 -0000

On Thu, 2004-12-02 at 09:41, Andre Oppermann wrote:
> Scott Long wrote:
> > 5.  Clustered FS support.  SANs are all the rage these days, and
> > clustered filesystems that allow data to be distributed across many
> > storage enpoints and accessed concurrently through the SAN are very
> > powerful.  RedHat recently bought Sistina and re-opened the GFS source
> > code, so exploring this would be very interesting.
> 
> There are certain steps that can be be taken one at a time.  For example
> it should be relatively easy to mount snapshots (ro) from more than one
> machine.  Next step would be to mount a full 'rw' filesystem as 'ro' on
> other boxes.  This would require cache and sector invalidation broadcasting
> from the 'rw' box to the 'ro' mounts.  

Mhhh .. if you plan to invalidate at the disk block cache layer then you
will run into race conditions with UFS/FFS (Especially with remove
operations)
I was once called in to evaluate such a multiple reader/single writer
system based on an UFS like file system and block layer invalidation and
had to convince management to kill it. (It appeared to work and actually
made it though internal and customer acceptance testing before failing
horrible in the field).

If you send me more details on your proposed cache and sector
invalidation/cluster design I will be happy to review it.


> The holy grail of course is to mount
> the same filesystem 'rw' on more than one box, preferrably more than two.
> This requires some more involved synchronization and locking on top of the
> cache invalidation.  And make sure that the multi-'rw' cluster stays alive
> if one of the participants freezes and doesn't respond anymore.
> 
> Scrolling through the UFS/FFS code I think the first one is 2-3 days of
> work.  The second 2-4 weeks and the third 2-3 month to get it right.
> If someone would throw up the money...