From owner-freebsd-fs Mon Dec 17 14:51:15 2001 Delivered-To: freebsd-fs@freebsd.org Received: from priv-edtnes09-hme0.telusplanet.net (mtaout.telus.net [199.185.220.235]) by hub.freebsd.org (Postfix) with ESMTP id 1B65837B50B; Mon, 17 Dec 2001 14:51:06 -0800 (PST) Received: from fireball ([209.52.193.31]) by priv-edtnes09-hme0.telusplanet.net (InterMail vM.5.01.04.01 201-253-122-122-101-20011014) with SMTP id <20011217225103.FRAL28264.priv-edtnes09-hme0.telusplanet.net@fireball>; Mon, 17 Dec 2001 15:51:03 -0700 Message-ID: <001301c1874d$50ae0d20$02000003@tornado> From: "Dave Reyenga" To: , Cc: Subject: Instead of JFS, why not a whole new FS? Date: Mon, 17 Dec 2001 22:50:45 -0000 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org How about writing a new filesystem based on UFS? This would save all of the hassle that JFS would bring: licensing, porting time, etc. Of course, it would likely bust any compatibility desired. What I'm thinking is a filesystem that takes the current UFS and improves upon it. It could support larger partitions, more partitions in a slice, and perhaps a "Journal" partition (like the current "swap" partition) among other new features. What do others have to say about this? Are there any major flaws in my idea? It just seems to me that this would cut a lot of hassle. Those are just my $0.02. I know I've said it before, but I wasn't nearly as clear last time. -Craig To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Dec 17 15: 0:39 2001 Delivered-To: freebsd-fs@freebsd.org Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by hub.freebsd.org (Postfix) with ESMTP id EE01B37B632; Mon, 17 Dec 2001 15:00:16 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20011217230016.REBT10701.rwcrmhc53.attbi.com@InterJet.elischer.org>; Mon, 17 Dec 2001 23:00:16 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id OAA36312; Mon, 17 Dec 2001 14:55:18 -0800 (PST) Date: Mon, 17 Dec 2001 14:55:17 -0800 (PST) From: Julian Elischer To: Dave Reyenga Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org, hiten@uk.FreeBSD.org Subject: Re: Instead of JFS, why not a whole new FS? In-Reply-To: <001301c1874d$50ae0d20$02000003@tornado> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org It is possible that Kirk may be thinking about doing this. He mumbled something about a new FS a while ago but it wasn't clear whether he was thinking of doing it, or he was just saying "someone will eventually do it". On Mon, 17 Dec 2001, Dave Reyenga wrote: > How about writing a new filesystem based on UFS? This would save all of the > hassle that JFS would bring: licensing, porting time, etc. Of course, it > would likely bust any compatibility desired. 
> > What I'm thinking is a filesystem that takes the current UFS and improves > upon it. It could support larger partitions, more partitions in a slice, and > perhaps a "Journal" partition (like the current "swap" partition) among > other new features. > > What do others have to say about this? Are there any major flaws in my idea? > It just seems to me that this would cut a lot of hassle. > > Those are just my $0.02. I know I've said it before, but I wasn't nearly as > clear last time. > > -Craig > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-hackers" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Dec 17 15: 5: 5 2001 Delivered-To: freebsd-fs@freebsd.org Received: from web21107.mail.yahoo.com (web21107.mail.yahoo.com [216.136.227.109]) by hub.freebsd.org (Postfix) with SMTP id 7DF5C37B41E for ; Mon, 17 Dec 2001 15:04:19 -0800 (PST) Message-ID: <20011217230419.68884.qmail@web21107.mail.yahoo.com> Received: from [62.254.0.5] by web21107.mail.yahoo.com via HTTP; Mon, 17 Dec 2001 15:04:19 PST Date: Mon, 17 Dec 2001 15:04:19 -0800 (PST) From: Hiten Pandya Subject: Re: Instead of JFS, why not a whole new FS? To: dreyenga@telus.net Cc: freebsd-fs@freebsd.org, hackers@freebsd.org In-Reply-To: <001301c1874d$50ae0d20$02000003@tornado> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org --- Dave Reyenga wrote: > How about writing a new filesystem based on UFS? > This would save all of the > hassle that JFS would bring: licensing, porting > time, etc. Of course, it > would likely bust any compatibility desired. hi, first of all, a project called UFS2 has been started by Kirk McKusick on improving the existing UFS file system and improving 'softupdates' and other stuff in this file system. > What I'm thinking is a filesystem that takes the > current UFS and improves > upon it. It could support larger partitions, more > partitions in a slice, and > perhaps a "Journal" partition (like the current > "swap" partition) among > other new features. I dont know that this could be possible of having a 'Journal' partition, though I may be wrong. > What do others have to say about this? Are there any > major flaws in my idea? > It just seems to me that this would cut a lot of > hassle. One flaw in your idea is, that it would literally take longer to make this kind of file system on our current UFS source base. The reason is due to the code maturity level that UFS has reached of around 20 years. I think porting JFS will take less time than upgrading the current UFS, which as a matter of fact has already been started by Kirk McKusick himself. Regarding 'hassle', for me; nothing is a hassle as long as it can be acheived. If you are really interested in upgrading the current UFS, it would be good if you got in touch with Kirk McKusick himself. regards, =Hiten = ===== =Hiten = __________________________________________________ Do You Yahoo!? Check out Yahoo! Shopping and Yahoo! Auctions for all of your unique holiday gifts! 
Buy at http://shopping.yahoo.com or bid at http://auctions.yahoo.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Dec 17 16: 8:24 2001 Delivered-To: freebsd-fs@freebsd.org Received: from monorchid.lemis.com (monorchid.lemis.com [192.109.197.75]) by hub.freebsd.org (Postfix) with ESMTP id 69B6937B41A; Mon, 17 Dec 2001 16:08:11 -0800 (PST) Received: by monorchid.lemis.com (Postfix, from userid 1004) id 14C4A786E3; Tue, 18 Dec 2001 10:38:09 +1030 (CST) Date: Tue, 18 Dec 2001 10:38:09 +1030 From: Greg Lehey To: Dave Reyenga Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org, hiten@uk.FreeBSD.org Subject: Re: Instead of JFS, why not a whole new FS? Message-ID: <20011218103809.V14500@monorchid.lemis.com> References: <001301c1874d$50ae0d20$02000003@tornado> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <001301c1874d$50ae0d20$02000003@tornado> User-Agent: Mutt/1.3.23i Organization: The FreeBSD Project Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.FreeBSD.org/ X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Monday, 17 December 2001 at 22:50:45 -0000, Dave Reyenga wrote: > How about writing a new filesystem based on UFS? If it's based on UFS, it's not a new file system. > This would save all of the hassle that JFS would bring: licensing, > porting time, etc. There are no hassles with licensing. You'd be balancing porting time against writing time. Guess which would take longer. > What I'm thinking is a filesystem that takes the current UFS and > improves upon it. It could support larger partitions, That's relatively trivial. The big issue is compatibility. > more partitions in a slice, That's relatively trivial. The big issue is compatibility. > and perhaps a "Journal" partition (like the current "swap" > partition) Well, I don't think the journal would be like swap. > among other new features. That's pretty much what IBM did. They called the result JFS. Greg -- See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Dec 17 18:29:52 2001 Delivered-To: freebsd-fs@freebsd.org Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by hub.freebsd.org (Postfix) with ESMTP id 985C037B41A; Mon, 17 Dec 2001 18:29:47 -0800 (PST) Received: from pool0289.cvx40-bradley.dialup.earthlink.net ([216.244.43.34] helo=mindspring.com) by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16GA0n-0002CE-00; Mon, 17 Dec 2001 18:29:46 -0800 Message-ID: <3C1EAA1A.CA49932@mindspring.com> Date: Mon, 17 Dec 2001 18:29:46 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Dave Reyenga Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org, hiten@uk.FreeBSD.org Subject: Re: Instead of JFS, why not a whole new FS? 
References: <001301c1874d$50ae0d20$02000003@tornado> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Dave Reyenga wrote: > > How about writing a new filesystem based on UFS? This would save all of the > hassle that JFS would bring: licensing, porting time, etc. Of course, it > would likely bust any compatibility desired. > > What I'm thinking is a filesystem that takes the current UFS and improves > upon it. It could support larger partitions, more partitions in a slice, and > perhaps a "Journal" partition (like the current "swap" partition) among > other new features. > > What do others have to say about this? Are there any major flaws in my idea? > It just seems to me that this would cut a lot of hassle. Any FS that shares code with an existing FS will not flush out the full list of problems associated with writing a new FS in the context of a FreeBSD system. For that reason, any UFS based system, including but not limited to FFS, LFS, EXT2FS, etc., is probably not a good example to use for an educational project. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Dec 18 8:13:13 2001 Delivered-To: freebsd-fs@freebsd.org Received: from patan.sun.com (patan.Sun.COM [192.18.98.43]) by hub.freebsd.org (Postfix) with ESMTP id 5B3AB37B405; Tue, 18 Dec 2001 08:13:07 -0800 (PST) Received: from canadamail1.Canada.Sun.COM ([129.155.5.100]) by patan.sun.com (8.9.3+Sun/8.9.3) with ESMTP id JAA02709; Tue, 18 Dec 2001 09:12:48 -0700 (MST) Received: from opcom-mail.canada.sun.com (scot.Canada.Sun.COM [129.155.8.107]) by canadamail1.Canada.Sun.COM (8.9.3+Sun/8.9.3/ENSMAIL,v2.1p1) with ESMTP id LAA01699; Tue, 18 Dec 2001 11:13:05 -0500 (EST) Received: from zonzorp.canada.sun.com (zonzorp.Canada.Sun.COM [129.155.6.21]) by opcom-mail.canada.sun.com (8.9.1b+Sun/8.9.1) with ESMTP id LAA12127; Tue, 18 Dec 2001 11:12:40 -0500 (EST) Received: from zonzorp (oz@localhost) by zonzorp.canada.sun.com (8.9.3+Sun/8.9.3) with ESMTP id LAA26047; Tue, 18 Dec 2001 11:11:01 -0500 (EST) Message-Id: <200112181611.LAA26047@zonzorp.canada.sun.com> X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0 To: Terry Lambert Cc: freebsd-fs@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG Subject: Re: Instead of JFS, why not a whole new FS? In-Reply-To: Message from Terry Lambert of "Mon, 17 Dec 2001 18:29:46 PST." <3C1EAA1A.CA49932@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Tue, 18 Dec 2001 11:11:00 -0500 From: "ozan s. yigit" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org > Any FS that shares code with an existing FS will not flush out > the full list of problems associated with writing a new FS in > the context of a FreeBSD system. how about an implementation of plan9's kfs? it is fairly simple, with dentries similar to unix inodes, eg. typedef struct { char name[NAMELEN]; short uid; short gid; ushort mode; short wuid; Qid qid; long size; long dblock[NDBLOCK]; /* 6 */ long iblock; long diblock; long atime; long mtime; } Dentry; and perhaps would make a good educational implementation. sources for plan9's own is in plan9/sys/src/cmd/disk, if one needs to take a look. 
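As an aside on how a Dentry like this resolves a file offset to a disk block, here is a minimal, self-contained C sketch of the classic bmap-style lookup (this is not kfs code: the struct is pared down to the block fields quoted above, NDBLOCK is taken as 6 per the comment, the block size, the int32_t types and the read_block() helper are all made-up placeholders):

    #include <stdint.h>

    #define BLOCKSIZE 1024                          /* placeholder block size */
    #define NDBLOCK   6                             /* direct blocks, as in the Dentry above */
    #define NINDIR    (BLOCKSIZE / sizeof(int32_t)) /* block numbers per indirect block */

    /* Pared-down stand-in for the kfs Dentry: only the fields bmap needs. */
    struct dentry_blocks {
            int32_t dblock[NDBLOCK];        /* direct block addresses */
            int32_t iblock;                 /* single-indirect block address */
            int32_t diblock;                /* double-indirect block address */
    };

    /* Reads one block of block numbers from the disk; supplied by the caller. */
    extern int read_block(int32_t addr, int32_t *buf);

    /*
     * Translate a byte offset within a file into a disk block address.
     * Returns 0 for a hole, an error, or an offset past the double-indirect range.
     */
    int32_t
    bmap(const struct dentry_blocks *d, uint64_t offset)
    {
            int32_t buf[NINDIR];
            uint64_t lbn = offset / BLOCKSIZE;      /* logical block number */

            if (lbn < NDBLOCK)                      /* direct blocks */
                    return d->dblock[lbn];

            lbn -= NDBLOCK;
            if (lbn < NINDIR) {                     /* single indirect */
                    if (d->iblock == 0 || read_block(d->iblock, buf) != 0)
                            return 0;
                    return buf[lbn];
            }

            lbn -= NINDIR;
            if (lbn < (uint64_t)NINDIR * NINDIR) {  /* double indirect */
                    if (d->diblock == 0 || read_block(d->diblock, buf) != 0)
                            return 0;
                    if (buf[lbn / NINDIR] == 0 || read_block(buf[lbn / NINDIR], buf) != 0)
                            return 0;
                    return buf[lbn % NINDIR];
            }
            return 0;
    }

The simplicity of that mapping is much of what makes a kfs-style server attractive as a teaching filesystem.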
the document "the plan9 file server" by thompson gives some detail. oz --- ozan s. yigit staff engineer, sun microsystems/es http://www.cs.yorku.ca/~oz ozan.yigit@sun.com || +1 [905] 415 2878 --- narrowness of imagination leads to narrowness of experience. [corollary to rob] To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Dec 18 11:18:52 2001 Delivered-To: freebsd-fs@freebsd.org Received: from elvis.mu.org (elvis.mu.org [216.33.66.196]) by hub.freebsd.org (Postfix) with ESMTP id 5195D37B405; Tue, 18 Dec 2001 11:18:50 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id DA9FE81E0C; Tue, 18 Dec 2001 13:18:44 -0600 (CST) Date: Tue, 18 Dec 2001 13:18:44 -0600 From: Alfred Perlstein To: Kirk McKusick Cc: fs@freebsd.org Subject: fast fsck for snapshots Message-ID: <20011218131844.E59831@elvis.mu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org In theory if one were to periodically check a running filesystem's inodes for softdeps then update the superblock to point out the oldest file with pending softdeps at startup one would only have to scan all the inodes with mtimes > superblock update time. Then one should be able to free the blocks not claimed by those inodes. Wouldn't this signifigantly cut down on the amount of time required to fsck the snapshot? I think one of the problems is that inodes are "scrubbed" when flushed to disk as deleted files, one would have to write out the mtime so that fsck could pick up recently deleted files. Does FFS depend on the indirect blocks being "scrubbed" as well? Good idea, or am I just too cafinated at the moment? :) -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' http://www.morons.org/rants/gpl-harmful.php3 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Dec 18 19:28: 5 2001 Delivered-To: freebsd-fs@freebsd.org Received: from omta02.mta.everyone.net (sitemail2.everyone.net [216.200.145.36]) by hub.freebsd.org (Postfix) with ESMTP id 4E0F137B405 for ; Tue, 18 Dec 2001 19:27:59 -0800 (PST) Received: from sitemail.everyone.net (reports [216.200.145.62]) by omta02.mta.everyone.net (Postfix) with ESMTP id 3A05F1C4F15 for ; Tue, 18 Dec 2001 19:27:59 -0800 (PST) Received: by sitemail.everyone.net (Postfix, from userid 99) id 23F5136F9; Tue, 18 Dec 2001 19:27:59 -0800 (PST) Content-Type: text/plain Content-Disposition: inline Content-Transfer-Encoding: 7bit Mime-Version: 1.0 X-Mailer: MIME-tools 5.41 (Entity 5.404) Date: Tue, 18 Dec 2001 19:27:59 -0800 (PST) From: Rohit Grover To: freebsd-fs@freebsd.org Subject: upper limit on # of vnops? Reply-To: rohit@gojuryu.com X-Originating-Ip: [65.194.57.194] Message-Id: <20011219032759.23F5136F9@sitemail.everyone.net> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Hello, I am using Freebsd4.3-RELEASE and wish to add a few vnode ops. Is there an upper limit on the number of vnode ops supported by the VFS layer in Freebsd? 
I am having some trouble going beyond a certain small number of new operations. Any help would be appreciated. rohit. _____________________________________________________________ http://www.gojuryu.com . What Karate Do was meant to be. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Dec 18 22:23:42 2001 Delivered-To: freebsd-fs@freebsd.org Received: from falcon.prod.itd.earthlink.net (falcon.mail.pas.earthlink.net [207.217.120.74]) by hub.freebsd.org (Postfix) with ESMTP id 36D0C37B405 for ; Tue, 18 Dec 2001 22:23:38 -0800 (PST) Received: from pool0514.cvx21-bradley.dialup.earthlink.net ([209.179.194.4] helo=mindspring.com) by falcon.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16Ga8e-0005TP-00; Tue, 18 Dec 2001 22:23:36 -0800 Message-ID: <3C203267.43543107@mindspring.com> Date: Tue, 18 Dec 2001 22:23:35 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: rohit@gojuryu.com Cc: freebsd-fs@freebsd.org Subject: Re: upper limit on # of vnops? References: <20011219032759.23F5136F9@sitemail.everyone.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Rohit Grover wrote: > I am using Freebsd4.3-RELEASE and wish to add a few vnode ops. > Is there an upper limit on the number of vnode ops supported > by the VFS layer in Freebsd? No. But there are a number of artificial constraints on when and how they may be added. > I am having some trouble going beyond a certain small number of > new operations. Any help would be appreciated. Most likely, you do not need to add operations, and you should be hooking your changes into fcntl(), etc.. In the unlikely event that you need to add some ops, you should be aware of the artificial limitations: 1) When the vnode_if.h and vnode_if.c code is generated from /sys/kern/vnode_if.src by /sys/kern/vnode_if.pl, the number of NOPs permitted is fixed, by virtue of the fixed size VOP descriptor array. 2) When the VFS system is first initialized, it takes an existing filesystem instance, and refactors it in order to get the total number of VOPs. This is arguably more correct than what it did perviously (counted the VOPs in the FFS code, for a mandatory instance of FFS), but the limit is real, and can't be exceeded. Basically, this adds some recompilation requirements that are much less obvious than they should be, if you are using modification of /sys/kern/vnode_if.src to add the new VOPs. The best suggestion, if you are using this method, is to delete and recreate the compilation files, rather than expecting the dependencies to work if you add VOPs. 3) You can not add VOPs to the table at run time. The best you can currently do is to replace placeholder VOPs with new VOPs. If you have placeholder VOPs, and you do this (see the end of the VOP descriptor array in the generated vnode_if.c in the kernel compilation directory), you are limited to the number of placeholders that exist. If you look at the system call extension code, you will see that it has this same limitation. 4) If you want to correct this, you will need to refactor all existing FS instances when you add a VOP (or VOPs). 
To do this, you will need to recreate the instance structures for the existing FS instances, and you will need to replace/extend the existing VOP list, as it is in vnode_if.c. The vnode_if.h changes, which provide the wrappers, are less important (you can manually add those to only the code that uses them). The main thing you will have to do is ensure that all references to the generated list are by pointer, then reallocate and copy the list, and then add your VOPs to the end of the list, following extension. Since VOP calls are made through this list, you will need to take the FS instance structures, which are allocated at mount time, and reallocate them, copying the old contents in and maintaining defaults.

5) Because of PHK's "default vops" stuff, you will need to refactor the instances as well, rather than simply copying them, so that the correct defaults are maintained; in the original design there was no such thing as "default vops", and such refactoring would not have been necessary (though you would still have to reallocate and do the prefix copy of the previous VOPs if the VOP vector list changed; but the default of "not supported" would have been correct, particularly for intermediate stacking layers, where it would become a "pass through").

6) If you intend to support stacking, you will have to refactor the stacks as well. This may be tricky. The correct thing to do when creating a stack is to push all NOP layers down in the instance version, which would (effectively) cut the intermediate layer transitions out of the assembled call graph. Effectively, this means that when you add VOPs, particularly VOPs for which there are non-pass-through defaults (another thing that interferes with stacking, since in the original design all defaults were pass-through), you will need to reconstruct the list.

7) For most of the above reasons, when you are adding VOPs at runtime you will want to completely refactor all existing mount instances, such as they are. (I say it this way because, though it is unlikely, if you were using one of the proxy layers -- either network or user space -- that UCLA CS students built in John Heidemann's classes, you would find it impossible, since you cannot control the defaults on the other side of the proxy. Consider a proxy from a local consumer that knows about the new VOP, to a remote stacking layer that doesn't, back to a local media FS that does: you want the VOP to go all the way through and back without harm, but it is out of range of the descriptor list on the remote node because of the "default vops" handling.)

All in all, it would be much, much easier for you if you did one of:

A) Use fcntl() in the FS instead, and don't invent new VOPs.

OR:

B) Add the VOPs to /sys/kern/vnode_if.src and totally recreate the compilation directory, so that your VOPs are known a priori to the system.
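For route B, an entry in /sys/kern/vnode_if.src has roughly the shape below; the op name vop_myop and its argument list are invented purely for illustration, and the exact syntax of the #% locking-annotation comment should be checked against the real file:

    #
    #% myop         vp      L L L
    #
    vop_myop {
            IN struct vnode *vp;
            IN int flags;
            IN struct ucred *cred;
            IN struct proc *p;
    };

The generator then emits a VOP_MYOP() wrapper in vnode_if.h and a slot in the descriptor array in vnode_if.c, which is why blowing away and regenerating the compilation directory is the reliable way to get every filesystem's op vector rebuilt against the new table size.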
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Dec 19 11:45:36 2001 Delivered-To: freebsd-fs@freebsd.org Received: from repulse.cnchost.com (repulse.concentric.net [207.155.248.4]) by hub.freebsd.org (Postfix) with ESMTP id AE3A837B419; Wed, 19 Dec 2001 11:45:28 -0800 (PST) Received: from bitblocks.com (adsl-209-204-185-216.sonic.net [209.204.185.216]) by repulse.cnchost.com id OAA04975; Wed, 19 Dec 2001 14:45:20 -0500 (EST) [ConcentricHost SMTP Relay 1.14] Message-ID: <200112191945.OAA04975@repulse.cnchost.com> To: Terry Lambert Cc: Andrea Campi , freebsd-arch@FreeBSD.ORG, freebsd-fs@freebsd.org Reply-To: freebsd-fs@freebsd.org Subject: Re: Real world Root Resizing (was Re: Proposed auto-sizing patch ... In-reply-to: Your message of "Wed, 12 Dec 2001 10:36:19 PST." <3C17A3A3.A439BE21@mindspring.com> Date: Wed, 19 Dec 2001 11:45:21 -0800 From: Bakul Shah Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org [sorry, I should have sent my original message to -fs instead of -arch] Andrea Campi wrote: > #include > > I was able to simple boot to single user and growfs my / without any magic. > I *might* have changed it to read-only just for safety but I don't think so You are a smarter person than I! I believed the growfs man page (it only works on unmounted file system) but should've realized it would work on a readonly mount provided you reboot right after. But I admit, I didn't trust growfs to be bug free which is why I first made a mirror copy of the root partition. Terry Lambert writes: > You could imagine a brute force tool to do this: back up to tape, > newfs, and restore from tape. You can tar cf to another filesystem and tar xf for the special case of a small root filesystem. > A better tool would allow you to defragment an existing FS, or even > run in the background at boot, and defragment only if necessary (some > inequality threshold on per cylinder group fill amounts, perhaps). > > An even better tool might allow you to "defragment" a large disk, at > the same time declaring the end of that disk "off limits". Doing > that would let you actually free up cylinder groups at the end of a > disk -- and shrink partitions, as well as expand them. I wonder if one can devise a syscall interface to do this safely without requiring detailed knowledge of the FS layout and replicating a lot of FS code in user mode. * For shrinking a partition you need a syscall to limit disk block allocation. Something like int fs_alloc(const char* mountpoint, size_t offset, size_t limit); This would do all allocation the [offset..limit) range until the next call. Even if you grew a file outside this range, the new blocks will be allocated here. A filesystem that does not implement this functionality returns ENOSYS. offset and limit are in disk blocksize unit but may need to be rounded up to some FS specific parameter (such as cylinder group size for FFS). * For defragmenting you need a way to move file data. Something like int frealloc(fd, offset, count, addr) offset & count must be multiples of disk block size. addr is a hint as to where these blocks should be moved. The call fails if the suggested new blocks are in use. The FS code atomically (at syscall level) moves specified blocks to the new area. * You also need to be able to get to various freelists. 
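Written out as a header, the interface proposed above might look like the following. This is purely a sketch of the proposal: neither syscall exists anywhere, the argument types for frealloc() are guesses since the message leaves them implicit, and the error conventions are assumed.

    /* fs_resize.h -- sketch of the proposed allocation-control interface */
    #include <sys/types.h>

    /*
     * Restrict all future block allocation on the filesystem mounted at
     * `mountpoint' to the disk-block range [offset, limit).  The range may
     * be rounded up to an FS-specific boundary (e.g. a cylinder group for
     * FFS).  Returns 0 on success, -1 with errno set (ENOSYS if the
     * filesystem does not implement it).
     */
    int fs_alloc(const char *mountpoint, size_t offset, size_t limit);

    /*
     * Move `count' blocks of the file open on `fd', starting at file offset
     * `offset', to new disk blocks at (or near) disk address `addr'.
     * offset and count must be multiples of the disk block size; the call
     * fails if the suggested destination blocks are already in use, and the
     * move is atomic at the syscall level.
     */
    int frealloc(int fd, off_t offset, off_t count, daddr_t addr);

    /*
     * Example: evacuating the tail of a filesystem before shrinking it.
     *
     *   if (fs_alloc("/vol/spare", 0, new_size_in_blocks) == -1)
     *           err(1, "fs_alloc");
     *   ...walk the filesystem, and for each file with blocks past the
     *   cut-off call frealloc(fd, off, len, hint) to pull them forward...
     */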
I can't see how defragmentation can be done without some knowledge of FS layout, but perhaps most of the details can be abstracted out well enough that the same interface can be used for different FSes.

You would run this on a quiescent system, but there is no need to unmount the FS or even bring the system down to single user.

Placement of files can also be changed once you have this interface. One idea is to sample file access time. Files that get read frequently can be moved to reduce seek time. Files with similar access time can be clustered, and so on. What would be better than sampling atime is keeping read stats in each inode: each time a file is read and the atime is to be updated, increment a small counter (but make it `stick' when it reaches max). This counter is zeroed when the stats are gathered by a user program. I am not holding my breath though.

Comments?

-- bakul

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Wed Dec 19 14:26: 2 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from omta02.mta.everyone.net (sitemail2.everyone.net [216.200.145.36]) by hub.freebsd.org (Postfix) with ESMTP id 4158337B623 for ; Wed, 19 Dec 2001 14:25:39 -0800 (PST)
Received: from sitemail.everyone.net (reports [216.200.145.62]) by omta02.mta.everyone.net (Postfix) with ESMTP id C80B51C379C for ; Wed, 19 Dec 2001 14:25:38 -0800 (PST)
Received: by sitemail.everyone.net (Postfix, from userid 99) id ACBCD36F9; Wed, 19 Dec 2001 14:25:38 -0800 (PST)
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
X-Mailer: MIME-tools 5.41 (Entity 5.404)
Date: Wed, 19 Dec 2001 14:25:38 -0800 (PST)
From: Rohit Grover
To: freebsd-fs@freebsd.org
Subject: Re: upper limit on # of vnops?
Reply-To: rohit@gojuryu.com
X-Originating-Ip: [65.194.57.194]
Message-Id: <20011219222538.ACBCD36F9@sitemail.everyone.net>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

> 3) You can not add VOPs to the table at run time. The best you can currently do is to replace placeholder VOPs with new VOPs. If you have placeholder VOPs, and you do this (see the end of the VOP descriptor array in the generated vnode_if.c in the kernel compilation directory), you are limited to the number of placeholders that exist. If you look at the system call extension code, you will see that it has this same limitation.

I wasn't aware of this constraint until now. I was trying to add vnode_ops using a loadable module. You're right, FreeBSD 4.3-RELEASE doesn't support dynamic addition of vnode ops. The following code (taken from vfs_opv_recalc()) proves the point.

    ....
    for (i = 0; i < vnodeopv_num; i++) {
            opv = vnodeopv_descs[i];
            opv_desc_vector_p = opv->opv_desc_vector_p;
            if (*opv_desc_vector_p)
                    FREE(*opv_desc_vector_p, M_VNODE);
            MALLOC(*opv_desc_vector_p, vop_t **,
                vfs_opv_numops * sizeof(vop_t *), M_VNODE, M_WAITOK);
    ....

I also found out that the reason I was able to add a few vops until now was that the MALLOC (in vfs_opv_recalc() above) was reallocating the memory freed by FREE(). This was made possible by the fact that vfs_opv_numops was still under a power of two. As soon as I added the 64th vop_t, the vop vectors for all currently active vnodes were freed in vfs_opv_recalc() and the system panicked in a weird place.

thanks for your help, Terry.

rohit.
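A small aside on why the breakage only shows up when the vector crosses a size boundary: the old kernel malloc rounds small requests up to power-of-two buckets, so a FREE() followed by a slightly larger MALLOC() can hand back the very same memory until the rounded size finally grows, at which point the vector really moves and anything still holding the old pointer blows up. A userland toy that mimics that rounding (illustrative only; the real allocator's bucket sizes and reuse behaviour differ):

    #include <stdio.h>

    /* Round a request up to the next power-of-two "bucket", in the spirit
     * of the old kernel malloc's handling of small allocations. */
    static size_t
    bucket_size(size_t n)
    {
            size_t b = 16;                  /* assumed smallest bucket */

            while (b < n)
                    b <<= 1;
            return b;
    }

    int
    main(void)
    {
            size_t ptrsize = sizeof(void *);  /* 4 bytes on 2001-era i386 */
            size_t nops;

            /* The first vector size that needs a bigger bucket is where the
             * reallocated vector genuinely changes address -- and where any
             * vnode still pointing at the freed vector causes a panic. */
            for (nops = 60; nops <= 70; nops++)
                    printf("%zu ops -> %zu bytes -> bucket %zu\n",
                        nops, nops * ptrsize, bucket_size(nops * ptrsize));
            return 0;
    }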
_____________________________________________________________ http://www.gojuryu.com . What Karate Do was meant to be. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Dec 19 17:24:47 2001 Delivered-To: freebsd-fs@freebsd.org Received: from mail.cablespeed.com (mail.cablespeed.com [206.112.192.76]) by hub.freebsd.org (Postfix) with SMTP id D189837B417 for ; Wed, 19 Dec 2001 17:24:39 -0800 (PST) Received: (qmail 24330 invoked by uid 0); 20 Dec 2001 01:24:39 -0000 Received: from unknown (HELO cablespeed.com) (216.45.72.227) by mail.cablespeed.com with SMTP; 20 Dec 2001 01:24:39 -0000 Message-ID: <3C213DD6.3CAD0C3C@cablespeed.com> Date: Wed, 19 Dec 2001 20:24:38 -0500 From: Chuck McCrobie X-Mailer: Mozilla 4.72 [en] (X11; I; FreeBSD 4.4-STABLE i386) X-Accept-Language: en MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Real world Root Resizing (was Re: Proposed auto-sizing patch ... References: <200112191945.OAA04975@repulse.cnchost.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Bakul Shah wrote: > > I wonder if one can devise a syscall interface to do this > safely without requiring detailed knowledge of the FS layout > and replicating a lot of FS code in user mode. > > * For shrinking a partition you need a syscall to limit > disk block allocation. Something like > > int fs_alloc(const char* mountpoint, size_t offset, size_t limit); > > This would do all allocation the [offset..limit) range > until the next call. Even if you grew a file outside this > range, the new blocks will be allocated here. A filesystem > that does not implement this functionality returns ENOSYS. > offset and limit are in disk blocksize unit but may need to > be rounded up to some FS specific parameter (such as > cylinder group size for FFS). > > * For defragmenting you need a way to move file data. > Something like > > int frealloc(fd, offset, count, addr) > > offset & count must be multiples of disk block size. > addr is a hint as to where these blocks should be moved. > The call fails if the suggested new blocks are in use. > > The FS code atomically (at syscall level) moves specified > blocks to the new area. > Windows 2000 provides a "MOVE FILE DATA" IOCTL to the file system. The file system is supposed to move the referenced file data to the specified location. The location is specified by disk lbn. The "MOVE FILE DATA" may specify a location which is now occupied (but wasn't before). The file system is supposed to ignore the request in that case. > * You also need to be able to get to various freelists. > Windows 2000 also provides a "GET SPACE BITMAP" IOCTL to the file system. The file system is supposed to return an up-to-date bitmap describing the allocation of space in the partition. > I can't see how defragmentation can be done without some > knowledge of FS layout but perhaps most of the details can be > abstracted out well enough that the same interface can be > used for different FSes. > I guess making a file physically contiguous might be a good start. I think the FFS cluster code attempts to keep files contiguous... Perhaps extracting out or exposing generic logic for the FFS code would work. Would it be possible to also move around inodes? 
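For reference, the two Windows 2000 IOCTLs described above are reachable from user space roughly as follows. This is a from-memory sketch of the documented defragmentation interface; the control codes and structures exist, but field details should be checked against the Platform SDK headers before relying on them.

    #include <windows.h>
    #include <winioctl.h>

    /* Ask the volume for its cluster allocation bitmap ("GET SPACE BITMAP"). */
    static BOOL
    get_bitmap(HANDLE hVolume, STARTING_LCN_INPUT_BUFFER *in,
        VOLUME_BITMAP_BUFFER *out, DWORD outlen)
    {
            DWORD got;

            return DeviceIoControl(hVolume, FSCTL_GET_VOLUME_BITMAP,
                in, sizeof *in, out, outlen, &got, NULL);
    }

    /* Move `clusters' clusters of hFile, starting at virtual cluster `vcn',
     * to the free logical cluster `lcn' ("MOVE FILE DATA").  The filesystem
     * refuses the request if the destination is no longer free. */
    static BOOL
    move_extent(HANDLE hVolume, HANDLE hFile, LONGLONG vcn, LONGLONG lcn,
        DWORD clusters)
    {
            MOVE_FILE_DATA mfd;
            DWORD got;

            mfd.FileHandle = hFile;
            mfd.StartingVcn.QuadPart = vcn;
            mfd.StartingLcn.QuadPart = lcn;
            mfd.ClusterCount = clusters;
            return DeviceIoControl(hVolume, FSCTL_MOVE_FILE,
                &mfd, sizeof mfd, NULL, 0, &got, NULL);
    }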
My understanding of the idea behind "dir pref" is to keep inodes of files in the same directory contiguous. Do other pieces (NFS?) keep track of inodes by their location (or does inode number imply location?). That is, does moving a inode from one location to another break things higher up? > You would run this on a quiescent system but there is no need > to unmount the FS or even bring the system down to single > user. > > Placement of files can also be changed once you have this > interface. One idea is to sample file access time. Files > that gets read frequently can be moved to reduce seek time. > Files with similar access time can be clustered and so on. > What would be better than sampling atime is keeping read > stats in each inode: each time a file is read and the atime > is to be updated, increment a small counter (but make it > `stick' when it reaches max). This counter is zeroed when > the stats are gathered by a user program. I am not holding > my breath though. > > Comments? > > -- bakul > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message -- -- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Dec 22 3:33:14 2001 Delivered-To: freebsd-fs@freebsd.org Received: from elvis.mu.org (elvis.mu.org [216.33.66.196]) by hub.freebsd.org (Postfix) with ESMTP id DAE8537B417; Sat, 22 Dec 2001 03:33:11 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id 6BA5081E0C; Sat, 22 Dec 2001 05:33:06 -0600 (CST) Date: Sat, 22 Dec 2001 05:33:06 -0600 From: Alfred Perlstein To: mckusick@freebsd.org Cc: fs@freebsd.org Subject: fsck and predictive readahead? Message-ID: <20011222053306.Y48837@elvis.mu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org I'm wondering if fsck uses any sort of tricks to do read-ahead to prefect data for pass1 and pass2. If not does anyone thing it might speed things up? We could use a reasonably simple child process (or team of them) to read into anonymous mmap areas shared between the master and child to do this. Any ideas, any hints on where the code would fit best? -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' http://www.morons.org/rants/gpl-harmful.php3 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Dec 22 5:53:15 2001 Delivered-To: freebsd-fs@freebsd.org Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11]) by hub.freebsd.org (Postfix) with SMTP id C1B5537B405; Sat, 22 Dec 2001 05:53:12 -0800 (PST) Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP id ; 22 Dec 2001 13:53:11 +0000 (GMT) To: Alfred Perlstein Cc: mckusick@freebsd.org, fs@freebsd.org Subject: Re: fsck and predictive readahead? In-Reply-To: Your message of "Sat, 22 Dec 2001 05:33:06 CST." 
<20011222053306.Y48837@elvis.mu.org> Date: Sat, 22 Dec 2001 13:53:11 +0000 From: Ian Dowse Message-ID: <200112221353.aa41047@salmon.maths.tcd.ie> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org In message <20011222053306.Y48837@elvis.mu.org>, Alfred Perlstein writes: >I'm wondering if fsck uses any sort of tricks to do read-ahead >to prefect data for pass1 and pass2. > >If not does anyone thing it might speed things up? I've wondered about this also. Since fsck spends virtually all of its time waiting for disk reads, doing most kinds of speculative disk reads would only slow things down. However, there is some potential for re-ordering the reads to reduce seeking and to allow data to be read in larger chunks. Pass 1 involves quite a lot of disk seeking because it goes off and retrieves all indirection blocks (blocks of block numbers) for any inodes that have them. Otherwise pass 1 would be a simple linear scan through all inodes. It would be possible to defer the reading of indirection blocks and then read them in order (having 2nd- and 3rd-level indirection blocks complicates this). I think I tried a simple form of this a few years ago, but the speedup was only marginal. I believe I also tried changing fsck's bread() to read larger blocks when contiguous reads were detected, again with no significant improvements. For pass 2, the directories are sorted by the block number of their first block, so there is very little seeking. Some speed improvement might be possible by doing a larger read when a few directory blocks are close together on the disk. An interesting exercise would be to modify fsck to print out a list of the offset and length for every disk read it performs. Then sort that list, coalesce contiguous reads, and see how long it takes the disk to read the new list as compared to the original. Such perfect sorting is obviously not feasable in practice, but it would give some idea of the potential for improvements. Ian To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Dec 22 13: 8:26 2001 Delivered-To: freebsd-fs@freebsd.org Received: from elvis.mu.org (elvis.mu.org [216.33.66.196]) by hub.freebsd.org (Postfix) with ESMTP id 4E0A337B41A; Sat, 22 Dec 2001 13:08:20 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id C590B81E0C; Sat, 22 Dec 2001 15:08:14 -0600 (CST) Date: Sat, 22 Dec 2001 15:08:14 -0600 From: Alfred Perlstein To: Ian Dowse Cc: mckusick@freebsd.org, fs@freebsd.org Subject: Re: fsck and predictive readahead? Message-ID: <20011222150814.Z48837@elvis.mu.org> References: <20011222053306.Y48837@elvis.mu.org> <200112221353.aa41047@salmon.maths.tcd.ie> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200112221353.aa41047@salmon.maths.tcd.ie>; from iedowse@maths.tcd.ie on Sat, Dec 22, 2001 at 01:53:11PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org * Ian Dowse [011222 07:53] wrote: > In message <20011222053306.Y48837@elvis.mu.org>, Alfred Perlstein writes: > >I'm wondering if fsck uses any sort of tricks to do read-ahead > >to prefect data for pass1 and pass2. > > > >If not does anyone thing it might speed things up? 
> 
> I've wondered about this also. Since fsck spends virtually all of its time waiting for disk reads, doing most kinds of speculative disk reads would only slow things down. However, there is some potential for re-ordering the reads to reduce seeking and to allow data to be read in larger chunks.
> 
> Pass 1 involves quite a lot of disk seeking because it goes off and retrieves all indirection blocks (blocks of block numbers) for any inodes that have them. Otherwise pass 1 would be a simple linear scan through all inodes. It would be possible to defer the reading of indirection blocks and then read them in order (having 2nd- and 3rd-level indirection blocks complicates this). I think I tried a simple form of this a few years ago, but the speedup was only marginal. I believe I also tried changing fsck's bread() to read larger blocks when contiguous reads were detected, again with no significant improvements.
> 
> For pass 2, the directories are sorted by the block number of their first block, so there is very little seeking. Some speed improvement might be possible by doing a larger read when a few directory blocks are close together on the disk.
> 
> An interesting exercise would be to modify fsck to print out a list of the offset and length for every disk read it performs. Then sort that list, coalesce contiguous reads, and see how long it takes the disk to read the new list as compared to the original. Such perfect sorting is obviously not feasable in practice, but it would give some idea of the potential for improvements.

The problem you didn't address with all these changes was stalls due to disk IO.

/usr/src/sbin/fsck_ffs # time ./fsck_ffs -d -n /vol/spare
** /dev/ad0s1g (NO WRITE)
** Last Mounted on /vol/spare
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
303580 files, 12865338 used, 9183427 free (51827 frags, 1141450 blocks, 0.2% fragmentation)
./fsck_ffs -d -n /vol/spare  24.50s user 4.72s system 19% cpu 2:30.73 total

No matter how you order the IO, fsck is going to have to wait for read(2) to return. If we can offload that waiting to a child process we may be able to fix this.

Is there any detailed commenting on the sources available? They are quite readable, but still very terse. A more in-depth explanation of each function would really help. Do you know of a paper or manpage, or do you have the time to sprinkle some commentary into the code?

-- 
-Alfred Perlstein [alfred@freebsd.org]
'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' http://www.morons.org/rants/gpl-harmful.php3

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message
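Following up on the child-process idea above, a minimal sketch of the plumbing: a hypothetical standalone program, not actual fsck code, with one anonymous shared buffer, one outstanding request, and two pipes for signalling. A real version would keep several requests (or several children) in flight.

    #include <sys/mman.h>
    #include <sys/types.h>
    #include <err.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define PREFETCH_BUF (1024 * 1024)

    struct req {                    /* one read-ahead request */
            off_t   offset;
            size_t  len;            /* <= PREFETCH_BUF */
    };

    int
    main(int argc, char **argv)
    {
            int fd, req_pipe[2], ack_pipe[2];
            char *buf, done;
            struct req r;

            if (argc != 2)
                    errx(1, "usage: prefetch <device>");
            if ((fd = open(argv[1], O_RDONLY)) == -1)
                    err(1, "open");

            /* Anonymous shared mapping: the child's pread() lands the data
             * where the parent can use it directly. */
            buf = mmap(NULL, PREFETCH_BUF, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);
            if (buf == MAP_FAILED)
                    err(1, "mmap");
            if (pipe(req_pipe) == -1 || pipe(ack_pipe) == -1)
                    err(1, "pipe");

            switch (fork()) {
            case -1:
                    err(1, "fork");
            case 0:                                 /* child: the reader */
                    close(req_pipe[1]);
                    close(ack_pipe[0]);
                    while (read(req_pipe[0], &r, sizeof r) == sizeof r) {
                            if (pread(fd, buf, r.len, r.offset) == -1)
                                    _exit(1);
                            done = 1;
                            write(ack_pipe[1], &done, 1);
                    }
                    _exit(0);
            default:                                /* parent: fsck proper */
                    close(req_pipe[0]);
                    close(ack_pipe[1]);
                    /* Hand the child the next range we know we will need,
                     * go do CPU work on the previous chunk, then wait for
                     * the ack and consume buf. */
                    r.offset = 0;
                    r.len = 65536;
                    write(req_pipe[1], &r, sizeof r);
                    /* ... useful work here ... */
                    read(ack_pipe[0], &done, 1);
                    /* buf now holds the prefetched data */
                    break;
            }
            return 0;
    }

The point of the shared MAP_ANON mapping is that only the child ever blocks in pread(); the parent stalls just on the short ack read, and only once it has run out of work to overlap.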