From: "Kamal R. Prasad" <kamalp@acm.org>
Date: Thu, 02 Dec 2004 23:32:54 +0530
To: Andre Oppermann
Cc: Sam, Scott Long, hackers@freebsd.org, current@freebsd.org
Subject: Re: My project wish-list for the next 12 months

Andre Oppermann wrote:
> Sam wrote:
>
>> On Thu, 2 Dec 2004, Andre Oppermann wrote:
>>
>>> Scott Long wrote:
>>>
>>>> 5. Clustered FS support.  SANs are all the rage these days, and
>>>> clustered filesystems that allow data to be distributed across many
>>>> storage endpoints and accessed concurrently through the SAN are very
>>>> powerful.  RedHat recently bought Sistina and re-opened the GFS source
>>>> code, so exploring this would be very interesting.
>>>
>>> There are certain steps that can be taken one at a time.  For example,
>>> it should be relatively easy to mount snapshots (ro) from more than one
>>> machine.  The next step would be to mount a full 'rw' filesystem as 'ro'
>>> on other boxes.  This would require cache and sector invalidation
>>> broadcasting from the 'rw' box to the 'ro' mounts.  The holy grail, of
>>> course, is to mount the same filesystem 'rw' on more than one box,
>>> preferably more than two.  This requires some more involved
>>> synchronization and locking on top of the cache invalidation, plus
>>> making sure that the multi-'rw' cluster stays alive if one of the
>>> participants freezes and stops responding.
>>>
>>> Scrolling through the UFS/FFS code, I think the first one is 2-3 days
>>> of work, the second 2-4 weeks, and the third 2-3 months to get it
>>> right.  If someone would throw up the money...
>>
>> You might also design in consideration for data redundancy.  Right now
>> GFS largely relies on the SAN box to export already-redundant RAID
>> disks.  GFS sits on a "cluster aware" LVM layer that is supposed to be
>> able to do mirroring and striping, but I'm told it's not stable enough
>> for "production" use.
>
> Data redundancy would require a UFS/FFS redesign.  I'm 'only' talking
> about enhancing UFS/FFS while keeping everything on-disk the same (plus
> some more elements).

If you add redundancy code into UFS/FFS, it will slow down performance
even for those not seeking redundancy.  A better way would be to have
another filesystem implementation, like VxFS (the Veritas filesystem).
I'm not sure whether they have published papers or put their techniques
into the public domain.

regards
-kamal
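
To make Andre's second step above a bit more concrete, here is a minimal
sketch in C of the kind of cache/sector invalidation record the 'rw' box
might broadcast to the 'ro' mounts after a write.  All of the structure,
field, and constant names here are made up for illustration; nothing in
this sketch exists in FreeBSD or UFS/FFS today.

    /*
     * Hypothetical sketch only: the kind of message an 'rw' node could
     * broadcast so that 'ro' peers drop stale cached buffers.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* What kind of cached object the peers must throw away. */
    enum inval_type {
            INVAL_SECTOR = 1,       /* a raw disk sector range  */
            INVAL_INODE  = 2,       /* an in-core inode         */
            INVAL_DIRBLK = 3        /* a cached directory block */
    };

    /* One invalidation record, carried over whatever transport the SAN offers. */
    struct inval_msg {
            uint32_t  im_magic;     /* sanity check                 */
            uint32_t  im_seq;       /* per-sender sequence number   */
            uint16_t  im_type;      /* one of enum inval_type       */
            uint16_t  im_pad;
            uint64_t  im_start;     /* first sector or inode number */
            uint64_t  im_count;     /* how many objects to drop     */
    };

    #define INVAL_MAGIC     0x494e564cU     /* "INVL" */

    /* Fill in a message for a sector range just rewritten by the 'rw' node. */
    static void
    inval_msg_init(struct inval_msg *m, uint32_t seq, uint64_t sector,
        uint64_t count)
    {
            memset(m, 0, sizeof(*m));
            m->im_magic = INVAL_MAGIC;
            m->im_seq = seq;
            m->im_type = INVAL_SECTOR;
            m->im_start = sector;
            m->im_count = count;
    }

    int
    main(void)
    {
            struct inval_msg m;

            /* The 'rw' node rewrote 16 sectors starting at 123456. */
            inval_msg_init(&m, 1, 123456, 16);
            printf("would broadcast: seq=%u type=%u start=%llu count=%llu\n",
                (unsigned)m.im_seq, (unsigned)m.im_type,
                (unsigned long long)m.im_start,
                (unsigned long long)m.im_count);
            return (0);
    }

Each 'ro' mount receiving such a record would invalidate or re-read the
affected buffers.  The multi-'rw' case Andre describes as the third step
would additionally need ordering guarantees and a distributed lock manager
on top of this basic invalidation traffic.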