From owner-freebsd-fs  Sun Jan 26  0:57:41 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 338A737B401
	for <freebsd-fs@freebsd.org>; Sun, 26 Jan 2003 00:57:39 -0800 (PST)
Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AB04143ED8
	for <freebsd-fs@freebsd.org>; Sun, 26 Jan 2003 00:57:38 -0800 (PST)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0074.cvx40-bradley.dialup.earthlink.net ([216.244.42.74] helo=mindspring.com)
	by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)
	id 18cibf-0000lK-00; Sun, 26 Jan 2003 00:57:36 -0800
Message-ID: <3E33A208.5ED9A35B@mindspring.com>
Date: Sun, 26 Jan 2003 00:53:28 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Craig Reyenga <creyenga@connectmail.carleton.ca>
Cc: freebsd-fs@freebsd.org
Subject: Re: What about a case insensitive Filesystem?
References: <001101c2c4e4$51686960$0200000a@sewer.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4a1abf569f4f6fb263c7073bf60832e4e350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Craig Reyenga wrote:
> <begin potentially dumb question>
> Is there any way, either now or in the future, for FreeBSD to be able to
> have a UFS-based case-insensitive filesystem? It would be great for many
> applications, such as Samba servers, web servers catered to the general
> public (angelfire, geocities) and places where the user just doesn't care.
> Is this at all possible?
> <end potentially dumb question>
> 
> (I'm not on the list, so CC'ing would be great)

Where do you mean case insensitive?  Storage?  Lookup?  Iteration?

In general, what people mean when they say this is "case sensitive
on storage and iteration (for display), but case insensitive on
lookup".

The short answer is "it's possible: go ahead and write the code".

The longer answer is "it's possible, but it's not something you
really want to do, unless you are willing to move globbing into
the kernel".  This is because, on lookup, you want to effectively
turn each character into a wildcard based on case insensitivity.
This is relatively easy for US ASCII, where if the character is
in the right range, you AND off a bit, and treat everything that
way.  Thus it's better if you do this as if you were doing a
globbing operation, rather than the way UNIX expects you to do it.
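
A rough sketch of the US ASCII case described above, assuming names are plain ASCII strings. The helper names `fold_ascii` and `same_name` are made up for illustration and do not correspond to any kernel routine:

```python
def fold_ascii(ch):
    """Fold one character for comparison: clear bit 0x20 on ASCII a-z only."""
    code = ord(ch)
    if ord("a") <= code <= ord("z"):   # "in the right range"
        return chr(code & ~0x20)       # AND off the case bit: 'a' (0x61) -> 'A' (0x41)
    return ch                          # digits, punctuation, non-ASCII: left untouched


def same_name(a, b):
    """Case-insensitive equality for US ASCII names, one character at a time."""
    return len(a) == len(b) and all(
        fold_ascii(x) == fold_ascii(y) for x, y in zip(a, b)
    )
```

The range check matters: clearing the bit unconditionally would also change characters such as '{' or '~', which are not letters.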

The other issue that wants globbing in the kernel is when you have
a lookup for a particular purpose; in general, there are three
types of lookup:

1)	Lookup of existing entry for file operation (stat, open,
	etc.).

2)	Lookup of existing entry for directory entry creation
	operation (create, link target, etc.).

3)	Lookup of existing entry for directory entry deletion
	operation (rename, unlink, etc.).

This doesn't seem important, until you have two files in a directory,
e.g. "start" and "Stop", and you try one of:

	mv Start stop
	mv Start start
	mv Stop Stop
	rm st*
	...

See the point?  The globbing, particularly in the "rm" case, has to
be moved into the kernel.
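
To make the "rm st*" case concrete, here is a small user-space sketch of the two expansions. It only models the matching, not any real shell or kernel code; `expand_sensitive`, `expand_insensitive` and `DIRECTORY` are hypothetical names, and `lower()` stands in for proper case folding:

```python
from fnmatch import fnmatchcase

# The two files from the example above.
DIRECTORY = ["start", "Stop"]


def expand_sensitive(pattern, names):
    """What a traditional shell does: match the pattern byte for byte."""
    return [n for n in names if fnmatchcase(n, pattern)]


def expand_insensitive(pattern, names):
    """What a case-insensitive lookup would need: fold both sides first."""
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]
```

With these two entries, the case-sensitive expansion of "st*" yields only "start", while the folded one yields both files, so the shell and the FS would disagree about what "rm st*" removes.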

The alternative to this is to add globbing to each and every shell
out there, and hope to God that it's implemented the same way in
all of them.  This is because in UNIX systems, the globbing is
expanded before being passed across the system call boundary.

The main problem with doing this is that it, effectively, then
assumes that the underlying FS *must* be case insensitive on
lookup.  Specifically:

	ls > Start
	ls > start

Can never end up with two files, because the shell would find the
first file when looking up the redirect target for the second
command, and dump to it anyway.
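
A toy model of that behaviour, assuming a directory that preserves case on storage but folds it on lookup. `FoldingDirectory` is an illustrative class, not a description of UFS or of any shell:

```python
class FoldingDirectory:
    """Case-preserving on storage, case-insensitive on lookup."""

    def __init__(self):
        self._entries = {}  # folded name -> (name as first stored, contents)

    def write(self, name, data):
        key = name.lower()
        # An existing entry keeps its original spelling; only the data changes.
        stored_name = self._entries[key][0] if key in self._entries else name
        self._entries[key] = (stored_name, data)

    def read(self, name):
        return self._entries[name.lower()][1]

    def listing(self):
        return sorted(stored for stored, _ in self._entries.values())
```

Writing to "Start" and then to "start" leaves a single entry named "Start" holding the second output, which is the "can never end up with two files" case.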

This also has a problem with files which *already* exist on an
FS, on which such a shell is then used, e.g.:

	ls -i
	136 Start
	137 start
	cat sTart
	...

Therefore doing this in the shell is unacceptable from many, many
perspectives, not the least of which is the fact that you can't
have case insensitivity be an attribute of the underlying FS, it
is instead an attribute of the shell.
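
The pre-existing-files problem can be sketched the same way. The inode numbers are the ones from the "ls -i" output above; `lookup_candidates` is a hypothetical helper showing what a folding shell would have to choose between:

```python
# Directory contents as shown by "ls -i" above: name -> inode number.
EXISTING = {"Start": 136, "start": 137}


def lookup_candidates(name, entries):
    """Return every stored name that matches once case is ignored."""
    return [stored for stored in entries if stored.lower() == name.lower()]
```

For "sTart" this returns both "Start" and "start", and nothing in the name itself says which inode "cat sTart" should open.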

FWIW, if this doesn't make a lot of sense to you, but you are
willing to hack up a shell to try it out, you will quickly see
what I mean.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Mon Jan 27 11: 1:36 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EEACE37B405
	for <fs@freebsd.org>; Mon, 27 Jan 2003 11:01:35 -0800 (PST)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7CDCF43F13
	for <fs@freebsd.org>; Mon, 27 Jan 2003 11:01:35 -0800 (PST)
	(envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1])
	by freefall.freebsd.org (8.12.6/8.12.6) with ESMTP id h0RJ1ZNS068943
	for <fs@freebsd.org>; Mon, 27 Jan 2003 11:01:35 -0800 (PST)
	(envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost)
	by freefall.freebsd.org (8.12.6/8.12.6/Submit) id h0RJ1ZCb068937
	for fs@freebsd.org; Mon, 27 Jan 2003 11:01:35 -0800 (PST)
Date: Mon, 27 Jan 2003 11:01:35 -0800 (PST)
Message-Id: <200301271901.h0RJ1ZCb068937@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster <bugmaster@freebsd.org>
To: fs@FreeBSD.org
Subject: Current problem reports assigned to you

Current FreeBSD problem reports
Critical problems
Serious problems
Non-critical problems

S  Submitted   Tracker     Resp.    Description
-------------------------------------------------------------------------------
a [2000/10/06] kern/21807  fs       [patches] Make System attribute correspon

1 problem total.




From owner-freebsd-fs  Mon Jan 27 16:33:48 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0F6C137B401; Mon, 27 Jan 2003 16:33:47 -0800 (PST)
Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id A7AFF43F79; Mon, 27 Jan 2003 16:33:46 -0800 (PST)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0350.cvx22-bradley.dialup.earthlink.net ([209.179.199.95] helo=mindspring.com)
	by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)
	id 18dJhA-0007mN-00; Mon, 27 Jan 2003 16:33:45 -0800
Message-ID: <3E35CF66.58143561@mindspring.com>
Date: Mon, 27 Jan 2003 16:31:34 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: FreeBSD bugmaster <bugmaster@freebsd.org>
Cc: fs@FreeBSD.org
Subject: Re: Current problem reports assigned to you
References: <200301271901.h0RJ1ZCb068937@freefall.freebsd.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a436657d9c72457ab537c8020cbff2b26e2601a10902912494350badd9bab72f9c350badd9bab72f9c

FreeBSD bugmaster wrote:
> Current FreeBSD problem reports
> Critical problems
> Serious problems
> Non-critical problems
> 
> S  Submitted   Tracker     Resp.    Description
> -------------------------------------------------------------------------------
> a [2000/10/06] kern/21807  fs       [patches] Make System attribute correspon
> 
> 1 problem total.


Could someone point this PR at someone who cares to try and fix
it (e.g. the original poster of the bug), instead of at the
FreeBSD-FS mailing list?

Brow-beating us with the "Open PR" cron job is not going to make
any of us on this list any more likely to care about "fixing" this
"problem" for you than we have been for the last 6 months.

Thanks.
-- Terry



From owner-freebsd-fs  Mon Jan 27 16:43: 4 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id DE71837B401; Mon, 27 Jan 2003 16:43:03 -0800 (PST)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 6839A43E4A; Mon, 27 Jan 2003 16:43:03 -0800 (PST)
	(envelope-from dougb@FreeBSD.org)
Received: from freefall.freebsd.org (dougb@localhost [127.0.0.1])
	by freefall.freebsd.org (8.12.6/8.12.6) with ESMTP id h0S0h3NS084725;
	Mon, 27 Jan 2003 16:43:03 -0800 (PST)
	(envelope-from dougb@freefall.freebsd.org)
Received: (from dougb@localhost)
	by freefall.freebsd.org (8.12.6/8.12.6/Submit) id h0S0h32g084721;
	Mon, 27 Jan 2003 16:43:03 -0800 (PST)
Date: Mon, 27 Jan 2003 16:43:03 -0800 (PST)
From: Doug Barton <dougb@FreeBSD.org>
Message-Id: <200301280043.h0S0h32g084721@freefall.freebsd.org>
To: dougb@FreeBSD.org, fs@FreeBSD.org, freebsd-bugs@FreeBSD.org
Subject: Re: kern/21807: [patches] Make System attribute correspond to SF_IMMUTABLE

Synopsis: [patches] Make System attribute correspond to SF_IMMUTABLE

Responsible-Changed-From-To: fs->freebsd-bugs
Responsible-Changed-By: dougb
Responsible-Changed-When: Mon Jan 27 16:41:38 PST 2003
Responsible-Changed-Why: 

The -fs list has not expressed any interest.

http://www.freebsd.org/cgi/query-pr.cgi?pr=21807



From owner-freebsd-fs  Mon Jan 27 16:43:38 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7753D37B401; Mon, 27 Jan 2003 16:43:37 -0800 (PST)
Received: from 12-234-22-23.client.attbi.com (12-234-22-23.client.attbi.com [12.234.22.23])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id B67DB43F85; Mon, 27 Jan 2003 16:43:36 -0800 (PST)
	(envelope-from DougB@FreeBSD.org)
Received: from slave.gorean.org (budeafy5ukh64snm@slave.gorean.org [10.0.0.1])
	by 12-234-22-23.client.attbi.com (8.12.6/8.12.6) with ESMTP id h0S0hZRJ008507;
	Mon, 27 Jan 2003 16:43:36 -0800 (PST)
	(envelope-from DougB@FreeBSD.org)
Date: Mon, 27 Jan 2003 16:43:35 -0800 (PST)
From: Doug Barton <DougB@FreeBSD.org>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: FreeBSD bugmaster <bugmaster@FreeBSD.org>, fs@FreeBSD.org
Subject: Re: Current problem reports assigned to you
In-Reply-To: <3E35CF66.58143561@mindspring.com>
Message-ID: <20030127164314.T1027@12-234-22-23.pyvrag.nggov.pbz>
References: <200301271901.h0RJ1ZCb068937@freefall.freebsd.org>
 <3E35CF66.58143561@mindspring.com>
Organization: http://www.FreeBSD.org/
X-message-flag: Outlook -- Not just for spreading viruses anymore!
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Mon, 27 Jan 2003, Terry Lambert wrote:

> FreeBSD bugmaster wrote:
> > Current FreeBSD problem reports
> > Critical problems
> > Serious problems
> > Non-critical problems
> >
> > S  Submitted   Tracker     Resp.    Description
> > -------------------------------------------------------------------------------
> > a [2000/10/06] kern/21807  fs       [patches] Make System attribute correspon
> >
> > 1 problem total.
>
>
> Could someone point this PR at someone who cares to try and fix
> it (e.g. the original poster of the bug), instead of at the
> FreeBSD-FS mailing list?

Done.

-- 

    If it's moving, encrypt it. If it's not moving, encrypt
      it till it moves, then encrypt it some more.



From owner-freebsd-fs  Mon Jan 27 16:47: 8 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id CE4A437B401; Mon, 27 Jan 2003 16:47:07 -0800 (PST)
Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 6F0CB43E4A; Mon, 27 Jan 2003 16:47:07 -0800 (PST)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0350.cvx22-bradley.dialup.earthlink.net ([209.179.199.95] helo=mindspring.com)
	by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)
	id 18dJu6-0001sh-00; Mon, 27 Jan 2003 16:47:07 -0800
Message-ID: <3E35D281.4F6EDDBE@mindspring.com>
Date: Mon, 27 Jan 2003 16:44:49 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Doug Barton <DougB@FreeBSD.org>
Cc: FreeBSD bugmaster <bugmaster@FreeBSD.org>, fs@FreeBSD.org
Subject: Re: Current problem reports assigned to you
References: <200301271901.h0RJ1ZCb068937@freefall.freebsd.org>
	 <3E35CF66.58143561@mindspring.com> <20030127164314.T1027@12-234-22-23.pyvrag.nggov.pbz>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a436657d9c72457ab56f58b639eae7d031350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c

Doug Barton wrote:
> > Could someone point this PR at someone who cares to try and fix
> > it (e.g. the original poster of the bug), instead of at the
> > FreeBSD-FS mailing list?
> 
> Done.

Thank you.  You are a god.

-- Terry



From owner-freebsd-fs  Mon Jan 27 17: 7: 7 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 37F3E37B401; Mon, 27 Jan 2003 17:07:06 -0800 (PST)
Received: from mailsrv.otenet.gr (mailsrv.otenet.gr [195.170.0.5])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 0D6E543E4A; Mon, 27 Jan 2003 17:07:05 -0800 (PST)
	(envelope-from keramida@freebsd.org)
Received: from gothmog.gr (patr530-b162.otenet.gr [212.205.244.170])
	by mailsrv.otenet.gr (8.12.6/8.12.6) with ESMTP id h0S16dBb023275;
	Tue, 28 Jan 2003 03:07:00 +0200 (EET)
Received: from gothmog.gr (gothmog [127.0.0.1])
	by gothmog.gr (8.12.6/8.12.6) with ESMTP id h0S16JVF003628;
	Tue, 28 Jan 2003 03:06:19 +0200 (EET)
	(envelope-from keramida@freebsd.org)
Received: (from giorgos@localhost)
	by gothmog.gr (8.12.6/8.12.6/Submit) id h0S16JOW003627;
	Tue, 28 Jan 2003 03:06:19 +0200 (EET)
	(envelope-from keramida@freebsd.org)
Date: Tue, 28 Jan 2003 03:06:19 +0200
From: Giorgos Keramidas <keramida@freebsd.org>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Doug Barton <DougB@freebsd.org>, fs@freebsd.org
Subject: Re: Current problem reports assigned to you
Message-ID: <20030128010618.GA3598@gothmog.gr>
References: <200301271901.h0RJ1ZCb068937@freefall.freebsd.org> <3E35CF66.58143561@mindspring.com> <20030127164314.T1027@12-234-22-23.pyvrag.nggov.pbz> <3E35D281.4F6EDDBE@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3E35D281.4F6EDDBE@mindspring.com>

On 2003-01-27 16:44, Terry Lambert <tlambert2@mindspring.com> wrote:
> Doug Barton wrote:
> > > Could someone point this PR at someone who cares to try and fix
> > > it (e.g. the original poster of the bug), instead of at the
> > > FreeBSD-FS mailing list?
> >
> > Done.
>
> Thank you.  You are a god.

A fast god too.  I had only just read the message and run query-pr,
only to find the PR reassigned already :)

Thanks Doug.




From owner-freebsd-fs  Mon Jan 27 20:35:13 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0AC7037B401; Mon, 27 Jan 2003 20:35:11 -0800 (PST)
Received: from obsecurity.dyndns.org (adsl-64-169-104-205.dsl.lsan03.pacbell.net [64.169.104.205])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 6BFD743F3F; Mon, 27 Jan 2003 20:35:04 -0800 (PST)
	(envelope-from kris@obsecurity.org)
Received: from rot13.obsecurity.org (rot13.obsecurity.org [10.0.0.5])
	by obsecurity.dyndns.org (Postfix) with ESMTP
	id 41AE267872; Mon, 27 Jan 2003 20:35:03 -0800 (PST)
Received: by rot13.obsecurity.org (Postfix, from userid 1000)
	id 35ED9171F; Mon, 27 Jan 2003 20:35:03 -0800 (PST)
Date: Mon, 27 Jan 2003 20:35:03 -0800
From: Kris Kennaway <kris@obsecurity.org>
To: Kris Kennaway <kris@obsecurity.org>
Cc: current@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: INVARIANTS-related fs panic on alpha
Message-ID: <20030128043503.GA902@rot13.obsecurity.org>
References: <20030125081234.GA11722@rot13.obsecurity.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="qDbXVdCdHGoSgWSk"
Content-Disposition: inline
In-Reply-To: <20030125081234.GA11722@rot13.obsecurity.org>
User-Agent: Mutt/1.4i


--qDbXVdCdHGoSgWSk
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Jan 25, 2003 at 12:12:34AM -0800, Kris Kennaway wrote:
> One of the alpha package clients panicked with this.  It was under
> very high load at the time (25 simultaneous package builds):
>
> fatal kernel trap:
>
>     trap entry     = 0x2 (memory management fault)
>     faulting va    = 0xdeadc0dedeadc0e6
>     type           = access violation
>     cause          = store instruction
>     pc             = 0xfffffc000053453c
>     ra             = 0xfffffc000053b2a8
>     sp             = 0xfffffe001da15b30
>     curthread      = 0xfffffc003e33b930
>         pid = 3, comm = g_up
>
> Stopped at      add_to_worklist+0xac:   stq     a0,0x8(t0) <0xdeadc0dedeadc0e6> <a0=0xfffffc0035deb200,t0=0xdeadc0dedeadc0de>
> db> trace
> add_to_worklist() at add_to_worklist+0xac
> handle_written_inodeblock() at handle_written_inodeblock+0x5e8
> softdep_disk_write_complete() at softdep_disk_write_complete+0xac
> bufdone() at bufdone+0x19c
> bufdonebio() at bufdonebio+0x1c
> biodone() at biodone+0x28
> g_dev_done() at g_dev_done+0xd8
> biodone() at biodone+0x28
> g_io_schedule_up() at g_io_schedule_up+0x4c
> g_up_procbody() at g_up_procbody+0x9c
> fork_exit() at fork_exit+0x100
> exception_return() at exception_return
> --- root of call graph ---
> db>

Here it is again:

fatal kernel trap:

    trap entry     = 0x4 (unaligned access fault)
    faulting va    = 0xdeadc0dedeadc0e6
    opcode         = 0x2d
    register       = 0x10
    pc             = 0xfffffc0000534540
    ra             = 0xfffffc000053b2a8
    sp             = 0xfffffe0006c0fb30
    curthread      = 0xfffffc0007ba7930
        pid = 3, comm = g_up

Stopped at      add_to_worklist+0xb0:   ldq     t0,0x7c60(gp) <0xfffffc00006581d0>      <t0=0xdeadc0dedeadc0de,gp=0xfffffc0000650570>
db> trace
add_to_worklist() at add_to_worklist+0xb0
handle_written_inodeblock() at handle_written_inodeblock+0x5e8
softdep_disk_write_complete() at softdep_disk_write_complete+0xac
bufdone() at bufdone+0x19c
bufdonebio() at bufdonebio+0x1c
biodone() at biodone+0x28
g_dev_done() at g_dev_done+0xd8
biodone() at biodone+0x28
g_io_schedule_up() at g_io_schedule_up+0x4c
g_up_procbody() at g_up_procbody+0x9c
fork_exit() at fork_exit+0x100
exception_return() at exception_return
--- root of call graph ---
db>
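
A quick arithmetic check on the traces above, as a sketch only. The first trap stopped at "stq a0,0x8(t0)", so the faulting address should be t0 plus 8. Reading t0 as a fill pattern left in freed memory is an assumption suggested by the value and the subject line; the traces themselves do not say so. `effective_address` is an illustrative helper:

```python
MASK64 = (1 << 64) - 1

# Value of t0 reported in both traces.
T0 = 0xdeadc0dedeadc0de


def effective_address(base, displacement):
    """Address touched by a base+displacement access, wrapped to 64 bits."""
    return (base + displacement) & MASK64


def is_repeated_word(value, word=0xdeadc0de):
    """True if the 64-bit value is the same 32-bit word twice."""
    return value == ((word << 32) | word)
```

The computed address matches the "faulting va" line in both traps, so whatever pointer add_to_worklist() loaded into t0 was already the 0xdeadc0de pattern before the store.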


--qDbXVdCdHGoSgWSk
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (FreeBSD)

iD8DBQE+Ngh2Wry0BWjoQKURAqiIAKCqGmPByHp3Dx2DyyjDGB/hQwUoAACggrtB
Nd8nsNkuPzG/fntL4bmpILg=
=uMly
-----END PGP SIGNATURE-----

--qDbXVdCdHGoSgWSk--



From owner-freebsd-fs  Tue Jan 28  3:21:51 2003
Delivered-To: freebsd-fs@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 931)
	id A7F2037B401; Tue, 28 Jan 2003 03:21:50 -0800 (PST)
Date: Tue, 28 Jan 2003 03:21:50 -0800
From: Juli Mallett <jmallett@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: Adrian Chadd <adrian@FreeBSD.org>
Subject: Filesystem names with non-alphanum characters?
Message-ID: <20030128032150.B45888@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
Organisation: The FreeBSD Project <http://FreeBSD.org>
X-Alternate-Addresses: <jmallett@NewGold.NET>, <jmallett@xMach.org>, <juli@jerkcity.com>, <flata@toxic.magnesium.net>, <jmallett@OpenDarwin.org>
X-Towel: Yes
X-LiveJournal: flata, jmallett
X-Negacore: Yes

Does anyone have a FreeBSD filesystem with a VFS type name which
contains a space?  Any other characters that are not alphanumeric?
fsck currently has a hack to look for fsck_foo_bar for the vfstype
"foo bar", but does not handle other things, and I am not sure if
there is a use for this special case at all, or, if there is, whether
we should handle a larger set of transpositions.  Even the one case I
thought maybe would exist ("4.2 ufs" or "4.2BSD ufs") does not, and
the analogue to it ("4.2bsd") has no space, and there is no fsck to
support that with a space ("fsck_4.2_bsd").
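
For reference, the two behaviours being weighed can be sketched like this. This is not the fsck source; both function names are hypothetical, and the "general" rule (any non-alphanumeric character becomes an underscore) is just one possible larger set of transpositions:

```python
import re


def fsck_name_spaces_only(vfstype):
    """The current special case: only spaces are turned into underscores."""
    return "fsck_" + vfstype.replace(" ", "_")


def fsck_name_general(vfstype):
    """A hypothetical broader rule: every non-alphanumeric becomes '_'."""
    return "fsck_" + re.sub(r"[^A-Za-z0-9]", "_", vfstype)
```

Note that the broader rule would rename the helper for "4.2bsd" (the '.' becomes '_'), so generalizing is not free: it changes the name looked up for types that work today.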

Anyone with input on this would be very welcome to speak up.  We've
had it since we got this stuff from NetBSD, sorta.  It appears to be
something Adrian added when converting it to our VFS system, so I'm
willing to write off that it was a "this is a good idea" change, since
our VFS system might not guarantee no spaces, but if it isn't something
useful, then it might be a good idea to remove it, as we certainly
don't actually try to make a name we can use, we just handle one small
case.

If nothing comes in regarding this, I may try to borrow phk's Danish
axe for application to this code; otherwise I will try to make it more
general purpose.

Thanx,
juli.
-- 
Juli Mallett <jmallett@FreeBSD.org>
AIM: BSDFlata -- IRC: juli on EFnet
OpenDarwin, Mono, FreeBSD Developer
ircd-hybrid Developer, EFnet addict
FreeBSD on MIPS-Anything on FreeBSD



From owner-freebsd-fs  Tue Jan 28 14: 6: 8 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7BA5B37B401
	for <fs@freebsd.org>; Tue, 28 Jan 2003 14:06:07 -0800 (PST)
Received: from tolkor.sgi.com (tolkor.sgi.com [198.149.18.6])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8F70F43E4A
	for <fs@freebsd.org>; Tue, 28 Jan 2003 14:06:06 -0800 (PST)
	(envelope-from cattelan@thebarn.com)
Received: from ledzep.americas.sgi.com (ledzep.americas.sgi.com [192.48.203.134])
	by tolkor.sgi.com (8.12.2/8.12.2/linux-outbound_gateway-1.2) with ESMTP id h0SKB5kq019782
	for <fs@freebsd.org>; Tue, 28 Jan 2003 14:11:05 -0600
Received: from daisy-e236.americas.sgi.com (daisy-e236.americas.sgi.com [128.162.236.214]) by ledzep.americas.sgi.com (SGI-8.9.3/americas-smart-nospam1.1) with ESMTP id OAA85232 for <fs@freebsd.org>; Tue, 28 Jan 2003 14:02:44 -0600 (CST)
Received: from [128.162.233.73] (naboo.americas.sgi.com [128.162.233.73]) by daisy-e236.americas.sgi.com (SGI-8.9.3/SGI-server-1.8) with ESMTP id OAA30079 for <fs@freebsd.org>; Tue, 28 Jan 2003 14:02:44 -0600 (CST)
Subject: restricted blocks?
From: Russell Cattelan <cattelan@thebarn.com>
To: fs@freebsd.org
Content-Type: text/plain
Organization: 
Message-Id: <1043784569.20928.58.camel@naboo.americas.sgi.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.0 
Date: 28 Jan 2003 14:09:30 -0600
Content-Transfer-Encoding: 7bit

Does anybody have a quick answer as to why block 1 isn't writable?

naboo[6:24pm]#dd if=/dev/zero of=/dev/da1s1d bs=512 oseek=0 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.000575 secs (890518 bytes/sec)
naboo[6:24pm]#dd if=/dev/zero of=/dev/da1s1d bs=512 oseek=1 count=1
dd: /dev/da1s1d: Operation not permitted
1+0 records in
0+0 records out
0 bytes transferred in 0.000819 secs (0 bytes/sec)
naboo[6:24pm]#dd if=/dev/zero of=/dev/da1s1d bs=512 oseek=2 count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.000507 secs (1009630 bytes/sec)

XFS uses this location for one of its metadata blocks.

-- 
Russell Cattelan <cattelan@thebarn.com>




From owner-freebsd-fs  Tue Jan 28 15:12:35 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7792837B401
	for <fs@freebsd.org>; Tue, 28 Jan 2003 15:12:34 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9333B43F43
	for <fs@freebsd.org>; Tue, 28 Jan 2003 15:12:33 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0SNCQZE000900;
	Wed, 29 Jan 2003 00:12:32 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: Russell Cattelan <cattelan@thebarn.com>
Cc: fs@freebsd.org
Subject: Re: restricted blocks? 
From: phk@freebsd.org
In-Reply-To: Your message of "28 Jan 2003 14:09:30 CST."
             <1043784569.20928.58.camel@naboo.americas.sgi.com> 
Date: Wed, 29 Jan 2003 00:12:26 +0100
Message-ID: <899.1043795546@critter.freebsd.dk>

In message <1043784569.20928.58.camel@naboo.americas.sgi.com>, Russell Cattelan
 writes:
>Does anybody have a quick answer as to why block 1 isn't writable?

In all likelihood your 'd' partition starts at offset zero and
therefore the second sector contains the disklabel, which the
kernel will not allow you to overwrite.

Use disklabel -e to reduce the size of the 'd' partition by
16 and set its offset to 16, and you should have no trouble.
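
A small sketch of why moving the partition helps, under the assumption stated above that the label lives in the second sector (sector 1) of the slice. `LABEL_SECTOR` and `hits_label` are illustrative names, and sector numbers are in 512-byte units as in the dd commands:

```python
LABEL_SECTOR = 1  # assumption: disklabel occupies the second sector of the slice


def hits_label(partition_offset, oseek, count=1):
    """True if a write of `count` sectors at `oseek` within a partition that
    starts `partition_offset` sectors into the slice covers the label sector."""
    first = partition_offset + oseek
    return first <= LABEL_SECTOR < first + count
```

With the partition at offset 0, only oseek=1 lands on the label, matching the one dd that failed. With the offset moved to 16, the same write lands on sector 17 of the slice and is clear of it.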

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.



From owner-freebsd-fs  Tue Jan 28 21: 4: 8 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 057FF37B401
	for <fs@FreeBSD.ORG>; Tue, 28 Jan 2003 21:04:07 -0800 (PST)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4A4AC43F9B
	for <fs@FreeBSD.ORG>; Tue, 28 Jan 2003 21:04:05 -0800 (PST)
	(envelope-from bde@zeta.org.au)
Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id QAA01383;
	Wed, 29 Jan 2003 16:03:51 +1100
Date: Wed, 29 Jan 2003 16:05:56 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Russell Cattelan <cattelan@thebarn.com>
Cc: fs@FreeBSD.ORG
Subject: Re: restricted blocks?
In-Reply-To: <1043784569.20928.58.camel@naboo.americas.sgi.com>
Message-ID: <20030129160134.O31111-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On 28 Jan 2003, Russell Cattelan wrote:

> Does anybody have quick answer as to why block 1 isn't writable?
>
> naboo[6:24pm]#dd if=/dev/zero of=/dev/da1s1d bs=512 oseek=0 count=1
> 1+0 records in
> 1+0 records out
> 512 bytes transferred in 0.000575 secs (890518 bytes/sec)
> naboo[6:24pm]#dd if=/dev/zero of=/dev/da1s1d bs=512 oseek=1 count=1
> dd: /dev/da1s1d: Operation not permitted
> 1+0 records in
> 0+0 records out
> 0 bytes transferred in 0.000819 secs (0 bytes/sec)
> naboo[6:24pm]#dd if=/dev/zero of=/dev/da1s1d bs=512 oseek=2 count=1
> 1+0 records in
> 1+0 records out
> 512 bytes transferred in 0.000507 secs (1009630 bytes/sec)
>
> XFS uses this location for one of its metadata blocks.

Most likely block 1 has a disk label on it.

The errno for this has apparently regressed from EROFS to EPERM.

Bruce




From owner-freebsd-fs  Fri Jan 31  8:30:51 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 96FF237B401
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 08:30:50 -0800 (PST)
Received: from mcomail02.maxtor.com (mcomail02.maxtor.com [134.6.76.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AF07943F43
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 08:30:49 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail02.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VGOOX00470;
	Fri, 31 Jan 2003 09:24:24 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRKRXR; Fri, 31 Jan 2003 09:30:47 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id LAA0000001462; Fri, 31 Jan 2003 11:30:20 -0500 (EST)
Date: Fri, 31 Jan 2003 11:30:18 -0500
Mime-Version: 1.0 (Apple Message framework v551)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Subject: DEV_B_SIZE
From: Steve Byan <stephen_byan@maxtor.com>
To: freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Content-Transfer-Encoding: 7bit
Message-Id: <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

There's a notion afoot in IDEMA to enlarge the underlying physical 
block size of disks to 4096 bytes while keeping a 512-byte logical 
block size for the interface. Unaligned accesses would involve either a 
read-modify-write or some proprietary mechanism that provides 
persistence without the latency cost of a read-modify-write.

Performance issues aside, it occurs to me that hiding the underlying 
physical block size may break many careful-write and 
transaction-logging mechanisms, which may depend on no more than one 
block being corrupted during a failure. In IDEMA's proposal, a power 
failure during a write of a single 512-byte logical block could result 
in the corruption of the full 4K block, i.e. reads of any of the 
512-byte logical blocks in that 4K physical block  would return an 
uncorrectable ECC error.
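
To make the concern concrete, here is a rough sketch of which logical
blocks share a physical block with a given write. It assumes that
physical blocks are aligned to logical block 0, which the proposal as
described here does not state; the function names are hypothetical.

```python
LOGICAL = 512                        # logical block size in bytes
PHYSICAL = 4096                      # physical block size in bytes
PER_PHYSICAL = PHYSICAL // LOGICAL   # 8 logical blocks per physical block


def blocks_at_risk(lba, count=1):
    """Logical blocks living in any physical block touched by the write."""
    first_phys = lba // PER_PHYSICAL
    last_phys = (lba + count - 1) // PER_PHYSICAL
    return range(first_phys * PER_PHYSICAL, (last_phys + 1) * PER_PHYSICAL)


def collateral(lba, count=1):
    """Blocks that were NOT written but could still become unreadable."""
    written = range(lba, lba + count)
    return [b for b in blocks_at_risk(lba, count) if b not in written]


print(collateral(10))      # single-block write inside one physical block
print(collateral(8, 8))    # aligned full-physical-block write
```

Under these assumptions a single-block write to logical block 10 puts
blocks 8, 9 and 11 through 15 at risk, while an aligned 8-block write
has no collateral blocks at all.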

I'd appreciate hearing examples where hiding the underlying physical 
block size would break a file system, database, transaction processing 
monitor, or whatever.  Please let me know if I may forward your reply 
to the committee. Thanks.

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414




From owner-freebsd-fs  Fri Jan 31  8:51: 0 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 55BE737B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 08:50:59 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 398EA43F43
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 08:50:58 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0VGor4W002640;
	Fri, 31 Jan 2003 17:50:54 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
From: phk@freebsd.org
In-Reply-To: Your message of "Fri, 31 Jan 2003 11:30:18 EST."
             <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com> 
Date: Fri, 31 Jan 2003 17:50:53 +0100
Message-ID: <2639.1044031853@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

In message <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>, Steve Byan writes
:

>I'd appreciate hearing examples where hiding the underlying physical 
>block size would break a file system, database, transaction processing 
>monitor, or whatever.  Please let me know if I may forward your reply 
>to the committee. Thanks.

If by "hide" you mean that there will be no way to discover the
smallest atomic unit of writes, then you are right: it would be bad.

Provided we can get the size of the smallest atomic unit of writes
in a standardized, documented, mandatory way, we will have no problem
coping with it: Using a 4k size is no problem for our current
filesystem technologies and device sizes.

It was my impression that already many drives write entire tracks
as atomic units, at least we have had plenty of anecdotal evidence
to this effect ?

Poul-Henning

(FreeBSD's disk-I/O wizard)

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.



From owner-freebsd-fs  Fri Jan 31  9: 4: 2 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id C6E5637B401; Fri, 31 Jan 2003 09:04:00 -0800 (PST)
Received: from mcomail01.maxtor.com (mcomail01.maxtor.com [134.6.76.15])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 47CD243E4A; Fri, 31 Jan 2003 09:03:59 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail01.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VGrUI31927;
	Fri, 31 Jan 2003 09:53:30 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRKTKZ; Fri, 31 Jan 2003 10:03:58 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id MAA0000001510; Fri, 31 Jan 2003 12:03:46 -0500 (EST)
Date: Fri, 31 Jan 2003 12:03:44 -0500
Subject: Re: DEV_B_SIZE 
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
To: phk@freebsd.org
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <2639.1044031853@critter.freebsd.dk>
Message-Id: <F4D99E08-353D-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 11:50  AM, phk@freebsd.org wrote:

> In message <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>, Steve 
> Byan writes
> :
>
>> I'd appreciate hearing examples where hiding the underlying physical
>> block size would break a file system, database, transaction processing
>> monitor, or whatever.  Please let me know if I may forward your reply
>> to the committee. Thanks.
>
> If by "hide" you mean that there will be no way to discover the
> smallest atomic unit of writes, then you are right: it would be bad.

The notion is that such a disk would be instantly-compatible with 
existing software, modulo performance issues. I suspect this is not the 
case, and am searching for expert opinions in this matter.

> Provided we can get the size of the smallest atomic unit of writes
> in a standardized, documented, mandatory way, we will have no problem
> coping with it: Using a 4k size is no problem for our current
> filesystem technologies and device sizes.

Yes, I understand recompiling the world for 4K is possible. My question 
is whether not doing so poses a data-integrity / fail-recovery risk.

> It was my impression that already many drives write entire tracks
> as atomic units, at least we have had plenty of anecdotal evidence
> to this effect ?

I'm not aware of any SCSI or ATA disks which do this; certainly no 
Maxtor disk does. Count-key-data mainframe disks can be formatted to do 
so, but such disks probably don't run Unix. Caching in ATA disks might 
lead one to believe that the disk could corrupt an entire track, in the 
sense that a panic (aka bluescreen) or a power-failure would cause all 
pending writes in its buffer to be lost, but even in ATA-land I don't 
believe a power failure would result in more than one disk block 
returning an uncorrectable read error.

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414




From owner-freebsd-fs  Fri Jan 31  9:18: 9 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 451C237B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 09:18:08 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 65E6043F79
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 09:18:07 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0VHI64W002904;
	Fri, 31 Jan 2003 18:18:06 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
From: phk@freebsd.org
In-Reply-To: Your message of "Fri, 31 Jan 2003 12:03:44 EST."
             <F4D99E08-353D-11D7-B26B-00306548867E@maxtor.com> 
Date: Fri, 31 Jan 2003 18:18:06 +0100
Message-ID: <2903.1044033486@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

In message <F4D99E08-353D-11D7-B26B-00306548867E@maxtor.com>, Steve Byan writes
:
>
>On Friday, January 31, 2003, at 11:50  AM, phk@freebsd.org wrote:
>
>> In message <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>, Steve 
>> Byan writes
>> :
>>
>>> I'd appreciate hearing examples where hiding the underlying physical
>>> block size would break a file system, database, transaction processing
>>> monitor, or whatever.  Please let me know if I may forward your reply
>>> to the committee. Thanks.
>>
>> If by "hide" you mean that there will be no way to discover the
>> smallest atomic unit of writes, then you are right: it would be bad.
>
>The notion is that such a disk would be instantly-compatible with 
>existing software, modulo performance issues. I suspect this is not the 
>case, and am searching for expert opinions in this matter.

I'm fine with that, as long as the disk somewhere in a data field
we can query (if need be with a new request) exposes the smallest
atomically writable unit.

The only thing that exposes us to risk is not knowing that the risk
exists, so as long as the fact that a 4k physical sector size is
used is not hidden from us, we can adapt.

>Yes, I understand recompiling the world for 4K is possible. My question 
>is whether not doing so poses a data-integrity / fail-recovery risk.

Nope.

>> It was my impression that already many drives write entire tracks
>> as atomic units, at least we have had plenty of anecdotal evidence
>> to this effect ?
>
>I'm not aware of any SCSI or ATA disks which do this; certainly no 
>Maxtor disk does.

Ok, that is nice to know.

And yes, we've had our trouble with write caches.

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.



From owner-freebsd-fs  Fri Jan 31  9:55:14 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6E2B437B401
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 09:55:13 -0800 (PST)
Received: from host213-122-85-204.in-addr.btopenworld.com (host213-122-85-204.in-addr.btopenworld.com [213.122.85.204])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A6C2443E4A
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 09:55:05 -0800 (PST)
	(envelope-from dsl@l8s.co.uk)
Received: (from dsl@localhost)
	by snowdrop.l8s.co.uk (8.11.6/8.11.6) id h0VHxI547889;
	Fri, 31 Jan 2003 17:59:18 GMT
Date: Fri, 31 Jan 2003 17:59:17 +0000
From: David Laight <david@l8s.co.uk>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
Message-ID: <20030131175917.E1487@snowdrop.l8s.co.uk>
References: <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>; from stephen_byan@maxtor.com on Fri, Jan 31, 2003 at 11:30:18AM -0500
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Fri, Jan 31, 2003 at 11:30:18AM -0500, Steve Byan wrote:
> There's a notion afoot in IDEMA to enlarge the underlying physical 
> block size of disks to 4096 bytes while keeping a 512-byte logical 
> block size for the interface. Unaligned accesses would involve either a 
> read-modify-write or some proprietary mechanism that provides 
> persistence without the latency cost of a read-modify-write.

There probably ought to be a way of making the larger physical
size visible to systems that are willing to support larger
block sizes.  That way misaligned transfers would be far less
likely.

One problem to consider is that disks are still partitioned
on cylinder boundaries.  This is largely historic and doesn't
actually make much sense, since the geometry almost certainly
varies across the disk and has to be faked to fit the ATA CHS
limits and (on PCs) the BIOS interface.

However, what it does mean is that a partition could easily
not start on a boundary of 8 (512-byte) sectors.
So misaligned transfers are likely even if the filesystem
itself is using 4k blocks.

On a PC the partitioning will typically have the first partition
starting at sector 63, and the others at multiples of 16065
sectors from the start of the disk.

This doesn't bode well for getting any aligned transfer
at all.
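
The alignment arithmetic can be checked with a short sketch. It takes
the start sectors mentioned above (63, then multiples of 16065) and
assumes 8 logical sectors per 4k physical block, with physical blocks
aligned to sector 0 of the disk.

```python
SECTORS_PER_PHYSICAL = 8  # 4096 / 512


def is_aligned(start_sector):
    """True if a partition starting here begins on a 4k boundary."""
    return start_sector % SECTORS_PER_PHYSICAL == 0


# First partition at sector 63, later ones at multiples of 16065.
starts = [63] + [n * 16065 for n in range(1, 9)]

for s in starts:
    print(s, s % SECTORS_PER_PHYSICAL, is_aligned(s))

aligned = [s for s in starts if is_aligned(s)]
print(aligned)
```

Since 16065 leaves a remainder of 1 when divided by 8, only every
eighth cylinder-group multiple (the first being 128520) falls on a 4k
boundary, and sector 63 never does.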

	David

-- 
David Laight: david@l8s.co.uk



From owner-freebsd-fs  Fri Jan 31 10:16:46 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 971FA37B405
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 10:16:44 -0800 (PST)
Received: from rwcrmhc52.attbi.com (rwcrmhc52.attbi.com [216.148.227.88])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B1BA443F79
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 10:16:43 -0800 (PST)
	(envelope-from julian@elischer.org)
Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4])
          by rwcrmhc52.attbi.com (rwcrmhc52) with ESMTP
          id <2003013118164305200du3dee>; Fri, 31 Jan 2003 18:16:43 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id KAA45382;
	Fri, 31 Jan 2003 10:16:42 -0800 (PST)
Date: Fri, 31 Jan 2003 10:16:41 -0800 (PST)
From: Julian Elischer <julian@elischer.org>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
In-Reply-To: <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>
Message-ID: <Pine.BSF.4.21.0301311002110.45015-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org



On Fri, 31 Jan 2003, Steve Byan wrote:

> There's a notion afoot in IDEMA to enlarge the underlying physical 
> block size of disks to 4096 bytes while keeping a 512-byte logical 
> block size for the interface. Unaligned accesses would involve either a 
> read-modify-write or some proprietary mechanism that provides 
> persistence without the latency cost of a read-modify-write.
> 
> Performance issues aside, it occurs to me that hiding the underlying 
> physical block size may break many careful-write and 
> transaction-logging mechanisms, which may depend on no more than one 
> block being corrupted during a failure. In IDEMA's proposal, a power 
> failure during a write of a single 512-byte logical block could result 
> in the corruption of the full 4K block, i.e. reads of any of the 
> 512-byte logical blocks in that 4K physical block  would return an 
> uncorrectable ECC error.
> 
> I'd appreciate hearing examples where hiding the underlying physical 
> block size would break a file system, database, transaction processing 
> monitor, or whatever.  Please let me know if I may forward your reply 
> to the committee. Thanks.

I presume that if such a drive were made, there would be some way to
identify it?

It would be very easy to configure a filesystem to have a minimum
writable unit size of 4k, and I assume that doing so would be
slightly advantageous (no read-modify-write). It would, however,
be good if we could easily identify when doing so was a good idea.

Another idea would be to have some way that you could specify a block
number and have the drive tell you the first block in the same group.
That would allow a filesystem to work out the alignment. It may not be
able to access absolute block numbers if it's going through some layers
of translation, so some way of asking "am I aligned?" might be useful.

One thing that does come to mind is that as you say, on power fail we
would now be liable to lose a group of 8 sectors (4k) instead of 1 x 512
byte sector.

Recovery algorithms might have to deal with this (should we actually
decide to write one.. :-).

This is particularly so if the block being written was the 1st, but the
other 7 blocks contain data that the OS has no way of knowing is in
jeopardy. In other words, I might know that block 1 is in danger and put
it in a write log (in a logging filesystem), but I have no way of
knowing that the other 7 are in danger, so they may not be in the write
log (assuming that the write log only holds the last N transactions).
I'd say that this means that the drive should hold the active 4k block
in nvram or something..
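
A small sketch of that exposure, assuming groups of 8 logical blocks
aligned to block 0 and a write log represented simply as a set of
logged block numbers. Both the representation and the name
unlogged_neighbours are hypothetical, for illustration only.

```python
def unlogged_neighbours(written_lba, logged, per_physical=8):
    """Blocks sharing a physical block with `written_lba` that the
    write log does not cover."""
    group_start = written_lba - written_lba % per_physical
    group = range(group_start, group_start + per_physical)
    return [b for b in group if b != written_lba and b not in logged]


# The log knows about the block being written, but not its neighbours.
print(unlogged_neighbours(16, {16}))

# If some neighbours happen to be in the log, fewer are exposed.
print(unlogged_neighbours(16, {16, 17, 18}))
```

In the first case all seven neighbours (17 through 23) are exposed:
a filesystem unaware of the 4k grouping has no reason to have logged
them.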

You seem to have considered this, but I'm in agreement that it could
prove "nasty" in exactly the cases that are most important.
People use write logging etc. in cases where they care about the data
and recovery time. These are exactly the people who are going to be the
most pissed off to lose their data.

If we can easily tell the system to use 4k frags or 4k block numbers
(i.e. we can elect to expose the real blocksize) then we are probably
in better shape.



From owner-freebsd-fs  Fri Jan 31 10:41:52 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3DC5A37B401; Fri, 31 Jan 2003 10:41:50 -0800 (PST)
Received: from mcomail02.maxtor.com (mcomail02.maxtor.com [134.6.76.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 683B243F93; Fri, 31 Jan 2003 10:41:49 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail02.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VIZQU16809;
	Fri, 31 Jan 2003 11:35:26 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRKX44; Fri, 31 Jan 2003 11:41:48 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id NAA0000002101; Fri, 31 Jan 2003 13:41:36 -0500 (EST)
Date: Fri, 31 Jan 2003 13:41:35 -0500
Subject: Re: DEV_B_SIZE 
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
To: phk@freebsd.org
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <2903.1044033486@critter.freebsd.dk>
Message-Id: <A02737C6-354B-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 12:18  PM, phk@freebsd.org wrote:

> In message <F4D99E08-353D-11D7-B26B-00306548867E@maxtor.com>, Steve 
> Byan writes
> :
>>
>> On Friday, January 31, 2003, at 11:50  AM, phk@freebsd.org wrote:
>>
>>> In message <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>, Steve
>>> Byan writes
>>> :
>>>
>>>> I'd appreciate hearing examples where hiding the underlying physical
>>>> block size would break a file system, database, transaction 
>>>> processing
>>>> monitor, or whatever.  Please let me know if I may forward your 
>>>> reply
>>>> to the committee. Thanks.
>>>
>>> If by "hide" you mean that there will be no way to discover the
>>> smallest atomic unit of writes, then you are right: it would be bad.
>>
>> The notion is that such a disk would be instantly-compatible with
>> existing software, modulo performance issues. I suspect this is not 
>> the
>> case, and am searching for expert opinions in this matter.
>
> I'm fine with that, as long as the disk somewhere in a data field
> we can query (if need be with a new request) exposes the smallest
> atomically writable unit.
>
> The only thing that exposes us to risk is not knowing that the risk
> exists, so as long as the fact that a 4k physical sector size is
> used is not hidden from us, we can adapt.

But would existing code be functionally broken (perhaps with respect to 
failure recovery) if it were to not be modified to adapt to a different 
physical block size?

>
>> Yes, I understand recompiling the world for 4K is possible. My 
>> question
>> is whether not doing so poses a data-integrity / fail-recovery risk.
>
> Nope.

Really? fsck can recover from losing 4K bytes surrounding the last 
metadata block written?

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414




From owner-freebsd-fs  Fri Jan 31 10:45:38 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D10B037B401
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 10:45:36 -0800 (PST)
Received: from mcomail01.maxtor.com (mcomail01.maxtor.com [134.6.76.15])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3869343F3F
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 10:45:36 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail01.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VIZ2F32011;
	Fri, 31 Jan 2003 11:35:02 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRKXY8; Fri, 31 Jan 2003 11:45:30 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id NAA0000001943; Fri, 31 Jan 2003 13:45:04 -0500 (EST)
Date: Fri, 31 Jan 2003 13:45:03 -0500
Subject: Re: DEV_B_SIZE
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
To: David Laight <david@l8s.co.uk>
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <20030131175917.E1487@snowdrop.l8s.co.uk>
Message-Id: <1BBFD4B2-354C-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 12:59  PM, David Laight wrote:

> On Fri, Jan 31, 2003 at 11:30:18AM -0500, Steve Byan wrote:
>> There's a notion afoot in IDEMA to enlarge the underlying physical
>> block size of disks to 4096 bytes while keeping a 512-byte logical
>> block size for the interface. Unaligned accesses would involve either 
>> a
>> read-modify-write or some proprietary mechanism that provides
>> persistence without the latency cost of a read-modify-write.
>
> There probably ought to be a way of making the larger physical
> size visible to systems that are willing to support larger
> block sizes.  That way misaligned transfers would be far less
> likely.

Yes, of course. But I asked with respect to an issue other than 
performance.
>
> One problem to consider is that disks are still partitioned
> on cylinder boundaries.  This is largely historic and doesn't
> actually make much sense, since the geometry almost certainly
> varies across the disk and has to be faked to fit the ATA CHS
> limits and (on PCs) the BIOS interface.
>
> However, what it does mean is that a partition could easily
> not start on a boundary of 8 (512-byte) sectors.
> So misaligned transfers are likely even if the filesystem
> itself is using 4k blocks.
>
> On a PC the partitioning will typically have the first partition
> starting at sector 63, and the others at multiples of 16065
> sectors from the start of the disk.
>
> This doesn't bode well for getting any aligned transfer
> at all.

We understand that problem. It's just a performance issue. My concern 
is that even if we handwave the performance issues, there's an 
underlying semantic that would not be satisfied if we were to run 
existing software, unmodified, on a disk with an underlying 4K sector 
size.

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414




From owner-freebsd-fs  Fri Jan 31 10:50:52 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 6841B37B401; Fri, 31 Jan 2003 10:50:51 -0800 (PST)
Received: from host213-122-194-66.in-addr.btopenworld.com (host213-122-194-66.in-addr.btopenworld.com [213.122.194.66])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id A647C43F43; Fri, 31 Jan 2003 10:50:48 -0800 (PST)
	(envelope-from dsl@l8s.co.uk)
Received: (from dsl@localhost)
	by snowdrop.l8s.co.uk (8.11.6/8.11.6) id h0VIt7o08765;
	Fri, 31 Jan 2003 18:55:07 GMT
Date: Fri, 31 Jan 2003 18:55:07 +0000
From: David Laight <david@l8s.co.uk>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: phk@freebsd.org, freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
Message-ID: <20030131185507.G1487@snowdrop.l8s.co.uk>
References: <2903.1044033486@critter.freebsd.dk> <A02737C6-354B-11D7-B26B-00306548867E@maxtor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <A02737C6-354B-11D7-B26B-00306548867E@maxtor.com>; from stephen_byan@maxtor.com on Fri, Jan 31, 2003 at 01:41:35PM -0500
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

> Really? fsck can recover from losing 4K bytes surrounding the last 
> metadata block written?

The only metadata that matter are the inodes and (for ffs) the
indirect blocks.  You really do want the latter to be single disk
blocks - many systems actually write them synchronously.

An inode is (probably) only 128 bytes, so losing an inode block
will also lose the other files whose inodes share that block.

A journaling filesystem probably already has ways around this...

	David

-- 
David Laight: david@l8s.co.uk



From owner-freebsd-fs  Fri Jan 31 10:51:27 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E2E2B37B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 10:51:25 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1511543F9B
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 10:51:25 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0VIpO4W019188;
	Fri, 31 Jan 2003 19:51:24 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
From: phk@freebsd.org
In-Reply-To: Your message of "Fri, 31 Jan 2003 13:41:35 EST."
             <A02737C6-354B-11D7-B26B-00306548867E@maxtor.com> 
Date: Fri, 31 Jan 2003 19:51:24 +0100
Message-ID: <19187.1044039084@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

In message <A02737C6-354B-11D7-B26B-00306548867E@maxtor.com>, Steve Byan writes:

>> The only thing that exposes us to risk is we don't know the risk
>> exists, so as long as the fact that a 4k physical sector size is
>> used is not hidden from us, we can adapt.
>
>But would existing code be functionally broken (perhaps with respect to 
>failure recovery) if it were to not be modified to adapt to a different 
>physical block size?

Not broken any worse than because of write-caching.

>> Nope.
>
>Really? fsck can recover from losing 4K bytes surrounding the last 
>metadata block written?

If the fragment size is 4k when the filesystem is created, and this
would happen automatically, then there is no window for lossage.

The thing we really need is working tagged-queuing...

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.



From owner-freebsd-fs  Fri Jan 31 10:56: 9 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 86A7237B401
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 10:56:06 -0800 (PST)
Received: from apollo.email.starband.net (smtp2.starband.net [148.78.247.23])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 81A8943F75
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 10:56:05 -0800 (PST)
	(envelope-from jkirby@storagecraft.com)
Received: from jkirbydesk (vsat-148-63-114-177.c002.t7.mrt.starband.net [148.63.114.177])
	(authenticated bits=0)
	by apollo.email.starband.net (8.12.4/8.12.4) with ESMTP id h0VItoH5024439;
	Fri, 31 Jan 2003 13:55:55 -0500
Reply-To: <jkirby@storagecraft.com>
From: "Jamey Kirby" <jkirby@storagecraft.com>
To: "'Steve Byan'" <stephen_byan@maxtor.com>,
	"'David Laight'" <david@l8s.co.uk>
Cc: <freebsd-fs@FreeBSD.ORG>, <tech-kern@netbsd.org>
Subject: RE: DEV_B_SIZE
Date: Fri, 31 Jan 2003 10:55:47 -0800
Organization: StorageCraft
Message-ID: <001601c2c95a$63d52d70$0300a8c0@jkirbydesk>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.4024
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
Importance: Normal
In-Reply-To: <1BBFD4B2-354C-11D7-B26B-00306548867E@maxtor.com>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

I have been a lurker for years and want to chime in.

Under Windows NT (all flavors), using a 4K sector size works fine. The
OS abstraction layers are very good at handling the alignment.

I wrote a virtual SCSI disk driver (ATA is presented as SCSI to the NT
OS kernel) and experimented with all sorts of sector sizes to see how
various software would handle it. I found no problems... However, I
myself have written test code in the past that assumes 512 byte sectors
rather than reading the sector size from the OS. Surely this code would
break.
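
A minimal sketch of the difference, assuming the sector size has already been obtained from the OS by some platform-specific call (not shown). The helper name sector_span is hypothetical; the point is only that the size is a parameter rather than a hard-coded 512.

```python
def sector_span(offset, length, sector_size):
    """Return (first_sector, sector_count) covering a byte range.

    sector_size is whatever the OS reports for the device; nothing
    here assumes 512.
    """
    if offset < 0 or length <= 0 or sector_size <= 0:
        raise ValueError("offset must be >= 0, length and sector_size > 0")
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return first, last - first + 1


# The same 1024-byte transfer at byte offset 4096:
print(sector_span(4096, 1024, 512))   # (8, 2)
print(sector_span(4096, 1024, 4096))  # (1, 1)
```

Code that bakes in 512 would ask a 4K-sector device for sectors 8 and 9, which on that device are different data entirely.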

Jamey Kirby
StorageCraft

-----Original Message-----
From: owner-freebsd-fs@FreeBSD.ORG [mailto:owner-freebsd-fs@FreeBSD.ORG]
On Behalf Of Steve Byan
Sent: Friday, January 31, 2003 10:45 AM
To: David Laight
Cc: freebsd-fs@FreeBSD.ORG; tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE


On Friday, January 31, 2003, at 12:59  PM, David Laight wrote:

> On Fri, Jan 31, 2003 at 11:30:18AM -0500, Steve Byan wrote:
>> There's a notion afoot in IDEMA to enlarge the underlying physical
>> block size of disks to 4096 bytes while keeping a 512-byte logical
>> block size for the interface. Unaligned accesses would involve either
>> a read-modify-write or some proprietary mechanism that provides
>> persistence without the latency cost of a read-modify-write.
>
> There probably ought to be a way of making the larger physical
> size visible to systems that are willing to support larger
> block sizes.  That way misaligned transfers would be far less
> likely.

Yes, of course. But I asked with respect to an issue other than 
performance.
>
> One problem to consider is that disks are still partitioned
> on cylinder boundaries.  This is largely historic and
> doesn't actually make much sense, since the geometry
> almost certainly varies across the disk and has to be faked
> to fit the ATA CHS limits and (on PCs) the BIOS interface.
>
> However what it does mean is that a partition could easily
> not start on an 8 (512 byte) sector boundary.
> So misaligned transfers are likely even if the filesystem
> itself is using 4k blocks.
>
> On a PC the partitioning will typically have the first one
> starting in sector 63, and the others at multiples of 16065
> sectors from the start of the disk.
>
> This doesn't bode well for getting any aligned transfer
> at all.

We understand that problem. It's just a performance issue. My concern 
is that even if we handwave the performance issues, there's an 
underlying semantic that would not be satisfied if we were to run 
existing software, unmodified, on a disk with an underlying 4K sector 
size.

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414




From owner-freebsd-fs  Fri Jan 31 10:56:47 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A98CA37B401
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 10:56:44 -0800 (PST)
Received: from mcomail02.maxtor.com (mcomail02.maxtor.com [134.6.76.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 29AA443F93
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 10:56:38 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail02.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VIoBf22209;
	Fri, 31 Jan 2003 11:50:11 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRKYFQ; Fri, 31 Jan 2003 11:56:33 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id NAA0000002033; Fri, 31 Jan 2003 13:56:11 -0500 (EST)
Date: Fri, 31 Jan 2003 13:56:09 -0500
Subject: Re: DEV_B_SIZE
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
To: Julian Elischer <julian@elischer.org>
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <Pine.BSF.4.21.0301311002110.45015-100000@InterJet.elischer.org>
Message-Id: <A91AD932-354D-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 01:16  PM, Julian Elischer wrote:

>
>
> On Fri, 31 Jan 2003, Steve Byan wrote:
>
>> There's a notion afoot in IDEMA to enlarge the underlying physical
>> block size of disks to 4096 bytes while keeping a 512-byte logical
>> block size for the interface. Unaligned accesses would involve either
>> a read-modify-write or some proprietary mechanism that provides
>> persistence without the latency cost of a read-modify-write.
>>
>> Performance issues aside, it occurs to me that hiding the underlying
>> physical block size may break many careful-write and
>> transaction-logging mechanisms, which may depend on no more than one
>> block being corrupted during a failure. In IDEMA's proposal, a power
>> failure during a write of a single 512-byte logical block could result
>> in the corruption of the full 4K block, i.e. reads of any of the
>> 512-byte logical blocks in that 4K physical block  would return an
>> uncorrectable ECC error.
>>
>> I'd appreciate hearing examples where hiding the underlying physical
>> block size would break a file system, database, transaction processing
>> monitor, or whatever.  Please let me know if I may forward your reply
>> to the committee. Thanks.
>
> I presume that if such a drive were made, there would be some way to
> identify it?

Yes, but my concern is that advocates claim existing software could 
work (albeit slowly) with such a drive. It's hard to retroactively 
modify binaries installed in the field to adapt to a larger block size 
:-)
>
> It would be very easy to configure a filesystem to have a minimum
> writable unit size of 4k, and I assume that doing so would be
> slightly advantageous (no read/modify/write). It would however
> be good if we could easily identify when doing so was a good idea.

Yes, I've built and run OSF/1 on a system with 4K sector size; this was 
essentially BSD4.3. Modifying DEV_B_SIZE and recompiling the world was 
sufficient (well, actually the boot loader had to know the block size, 
and I needed a way to format the disks to 4K, and ...).
>
> Another idea would be to have some way that you could specify a block
> number and have the drive tell you the first in the same group. That
> would allow a filesystem to work out the alignment. It may not be able
> to access absolute block numbers, if it's going through some layers of
> translation, and some way of saying "am I aligned?" might be useful.
>
> One thing that does come to mind is that as you say, on power fail we
> would now be liable to lose a group of 8 sectors (4k) instead of 1 x
> 512 byte sector.
>
> Recovery algorithms might have to deal with this (should we actually
> decide to write one.. :-).
>
> Particularly if the block being written was the 1st, but the other 7
> blocks contain data that the OS has no way of knowing is in
> jeopardy. In other words, I might know that block 1 is in danger and
> put it in a write log (in a logging filesystem), but I have no way of
> knowing that the other 7 are in danger, so they may not be in the write
> log (assuming that the write log only holds the last N transactions).
> I'd say that this means that the drive should hold the active 4k block
> in nvram or something..
>
> You seem to have considered this but I'm in agreement that it could
> prove "nasty" in exactly the cases that are most important..
> people use write logging etc. in cases where they care about the data
> and recovery time. these are exactly the people who are going to be the
> most pissed off to lose their data. ..
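
The write-log exposure described in the quoted text can be sketched as follows. This assumes the log knows the 8:1 ratio and that logical block 0 sits at the start of a physical sector, which is exactly the information a legacy system would not have; the function names are hypothetical.

```python
def blocks_at_risk(lba, ratio=8):
    """All logical blocks sharing a physical sector with lba."""
    base = lba - (lba % ratio)
    return list(range(base, base + ratio))


def unlogged_neighbours(lba, logged, ratio=8):
    """Blocks endangered by a write to lba that the log does not cover."""
    return [b for b in blocks_at_risk(lba, ratio)
            if b != lba and b not in logged]


# The log protects only the block actually being written.
print(unlogged_neighbours(8, logged={8}))  # [9, 10, 11, 12, 13, 14, 15]
```

A log that is unaware of the physical sector size covers one block out of eight, leaving the other seven unprotected.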

Thanks, may I forward your response on to the committee?
>
> If we can easily tell the system to use 4k frags or 4k blocknumbers
> (i.e. we can elect to expose the real blocksize) then we are probably
> in better shape.

I agree.

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414




From owner-freebsd-fs  Fri Jan 31 11: 1:18 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id D11C737B401; Fri, 31 Jan 2003 11:01:16 -0800 (PST)
Received: from mcomail01.maxtor.com (mcomail01.maxtor.com [134.6.76.15])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 3590F43F75; Fri, 31 Jan 2003 11:01:16 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail01.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VIofM04700;
	Fri, 31 Jan 2003 11:50:41 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRKYLV; Fri, 31 Jan 2003 12:01:08 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id OAA0000002209; Fri, 31 Jan 2003 14:00:57 -0500 (EST)
Date: Fri, 31 Jan 2003 14:00:55 -0500
Subject: Re: DEV_B_SIZE
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: phk@freebsd.org, freebsd-fs@freebsd.org, tech-kern@netbsd.org
To: David Laight <david@l8s.co.uk>
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <20030131185507.G1487@snowdrop.l8s.co.uk>
Message-Id: <538478DE-354E-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 01:55  PM, David Laight wrote:

>> Really? fsck can recover from losing 4K bytes surrounding the last
>> metadata block written?
>
> The only metadata that matter are the inodes and (for ffs) the
> indirect blocks.  You do really want the latter to be single disk
> blocks - many systems actually write them synchronously.

What could be the effect of losing surrounding blocks on the (failed) 
write of an indirect block? Can we guarantee that fsck can reconstruct 
the filesystem, modulo some recently-created or deleted files, or is 
there a possibility of losing the entire filesystem?

> The inode is (probably) only 128 bytes, losing an inode block
> will lose the other files.
>
> A journaling filesystem probably already has ways around this...

I think journaling filesystems need to know the atomic block size in 
order to structure their log in a fault-tolerant way; I'm hoping 
someone on these lists can provide some details.

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414




From owner-freebsd-fs  Fri Jan 31 11: 6:17 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id B1FEA37B401; Fri, 31 Jan 2003 11:06:15 -0800 (PST)
Received: from mcomail02.maxtor.com (mcomail02.maxtor.com [134.6.76.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 1190343F3F; Fri, 31 Jan 2003 11:06:15 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail02.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VIxqN25634;
	Fri, 31 Jan 2003 11:59:52 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRKYTB; Fri, 31 Jan 2003 12:06:14 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id OAA0000002262; Fri, 31 Jan 2003 14:06:13 -0500 (EST)
Date: Fri, 31 Jan 2003 14:06:11 -0500
Subject: Re: DEV_B_SIZE 
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
To: phk@freebsd.org
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <19187.1044039084@critter.freebsd.dk>
Message-Id: <1010FEB6-354F-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 01:51  PM, phk@freebsd.org wrote:

> In message <A02737C6-354B-11D7-B26B-00306548867E@maxtor.com>, Steve 
> Byan writes:
>
>>> The only thing that exposes us to risk is we don't know the risk
>>> exists, so as long as the fact that a 4k physical sector size is
>>> used is not hidden from us, we can adapt.
>>
>> But would existing code be functionally broken (perhaps with respect
>> to failure recovery) if it were to not be modified to adapt to a
>> different physical block size?
>
> Not broken any worse than because of write-caching.

Agreed, but IDEMA is proposing to do this to SCSI drives, too.

>
>>> Nope.
>>
>> Really? fsck can recover from losing 4K bytes surrounding the last
>> metadata block written?
>
> If the fragment size is 4k when the filesystem is created, and this
> would happen automatically, then there is no window for lossage.

But if someone were to plug a new 4K-block disk into a system compiled 
to use 512 byte block disks, and the SCSI interface were faked to make 
it appear that the disk could read and write 512-byte blocks, then what 
happens? IDEMA's notion is that faking 512-byte logical size is good 
enough to get new disks to work in systems running legacy code. My fear 
is that it is not so simple.
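
A toy model of the scenario above, as a sketch only. It assumes the failure mode described earlier in the thread (a torn rewrite leaves the whole 4K physical sector unreadable) and a plain read-modify-write emulation; the class EmulatedDisk and its methods are hypothetical and do not describe any real drive firmware.

```python
PHYSICAL = 4096                  # bytes per physical sector
LOGICAL = 512                    # bytes per emulated logical block
RATIO = PHYSICAL // LOGICAL


class EmulatedDisk:
    """Hypothetical 4K drive presenting 512-byte logical blocks."""

    def __init__(self, physical_sectors):
        # None marks a physical sector that now fails its ECC check.
        self.sectors = [bytes(PHYSICAL) for _ in range(physical_sectors)]

    def write_logical(self, lba, data, power_fails=False):
        if len(data) != LOGICAL:
            raise ValueError("logical writes are exactly 512 bytes")
        phys, slot = divmod(lba, RATIO)
        old = self.sectors[phys]
        if old is None:
            raise OSError("cannot read-modify-write an unreadable sector")
        if power_fails:
            # Assumed failure mode: the rewrite is torn, so the whole
            # physical sector becomes unreadable, not just one block.
            self.sectors[phys] = None
            return
        start = slot * LOGICAL
        self.sectors[phys] = old[:start] + data + old[start + LOGICAL:]

    def read_logical(self, lba):
        phys, slot = divmod(lba, RATIO)
        sector = self.sectors[phys]
        if sector is None:
            raise OSError("uncorrectable ECC error")
        return sector[slot * LOGICAL:(slot + 1) * LOGICAL]


def unreadable_blocks(disk, count):
    """Logical block numbers below count that can no longer be read."""
    lost = []
    for lba in range(count):
        try:
            disk.read_logical(lba)
        except OSError:
            lost.append(lba)
    return lost


disk = EmulatedDisk(physical_sectors=2)
for lba in range(16):
    disk.write_logical(lba, bytes([lba]) * LOGICAL)

# Legacy software rewrites one 512-byte block and the power fails mid-write.
disk.write_logical(3, b"\xff" * LOGICAL, power_fails=True)
lost = unreadable_blocks(disk, 16)
print(lost)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Software that believes it wrote only block 3 has no reason to expect blocks 0 through 7 to be gone after the crash.
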
>
> The thing we really need is working tagged-queuing...

Since I believe tagged-queuing works in SCSI, I assume you are asking 
for it in ATA? Or is there some feature missing from SCSI 
tagged-queuing that you'd like to see?

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414




From owner-freebsd-fs  Fri Jan 31 11: 9: 4 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 46A0B37B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:09:03 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id BD5C943F85
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:09:01 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0VJ8l4W022439;
	Fri, 31 Jan 2003 20:08:48 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: Steve Byan <stephen_byan@maxtor.com>
Cc: David Laight <david@l8s.co.uk>, freebsd-fs@freebsd.org,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
From: phk@freebsd.org
In-Reply-To: Your message of "Fri, 31 Jan 2003 14:00:55 EST."
             <538478DE-354E-11D7-B26B-00306548867E@maxtor.com> 
Date: Fri, 31 Jan 2003 20:08:47 +0100
Message-ID: <22438.1044040127@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

In message <538478DE-354E-11D7-B26B-00306548867E@maxtor.com>, Steve Byan writes:

>>> Really? fsck can recover from losing 4K bytes surrounding the last
>>> metadata block written?
>>
>> The only metadata that matter are the inodes and (for ffs) the
>> indirect blocks.  You do really want the latter to be single disk
>> blocks - many systems actually write them synchronously.
>
>What could be the effect of losing surrounding blocks on the (failed) 
>write of an indirect block? Can we guarantee that fsck can reconstruct 
>the filesystem, modulo some recently-created or deleted files, or is 
>there a possibility of losing the entire filesystem?

For inodes the situation is no different, only the exposure is greater:
instead of losing three neighbour inodes we lose 31 neighbour inodes.
(Or for ufs2: 1 vs 15 inodes).
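
The neighbour counts follow from simple division. The sketch below assumes 128-byte inodes for UFS1 (as suggested earlier in the thread) and 256-byte inodes for UFS2, which is the size implied by the 1 vs 15 figures; neighbours_lost is an illustrative name.

```python
def neighbours_lost(sector_size, inode_size):
    """Other inodes sharing an atomic sector with the one being written."""
    return sector_size // inode_size - 1


for name, inode_size in (("ufs1", 128), ("ufs2", 256)):
    print(name,
          neighbours_lost(512, inode_size),
          neighbours_lost(4096, inode_size))
# ufs1 3 31
# ufs2 1 15
```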

As long as I can ask the drive what the size of an atomic transfer
is it doesn't matter much to us if it is 512, 1k, 2k or 4k.  Going
above 4k would probably be a bit premature and therefore inconvenient.

If drives that come out with 4k sectors end up trashing too much
data for people, they will get a bad reputation rather fast and
I'm sure market mechanisms will take care of the issue.  If they
exhibit no worse losses than we already see due to write caching
and bugs in same, then the market won't react and you guys can
squeeze another N% more diskspace out of the same platter.

(I may be an anomaly in this, but I have actually worked on systems
which used 1k sectorsize on their 8" floppies when they made backup
copies to increase the capacity a small bit.)

I get the sense that you want us to say "NOOOO this is HORRIBLE!!!"
and you won't stop asking until we do ?

You won't have that from this bloke at least.

I don't know what agenda you are pushing, but I'm not pushing it
for you...

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.



From owner-freebsd-fs  Fri Jan 31 11:11:18 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 18EFF37B401
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 11:11:17 -0800 (PST)
Received: from mcomail02.maxtor.com (mcomail02.maxtor.com [134.6.76.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 590AE43F43
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 11:11:16 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail02.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VJ4oq27414;
	Fri, 31 Jan 2003 12:04:50 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRKYZ1; Fri, 31 Jan 2003 12:11:13 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id OAA0000002089; Fri, 31 Jan 2003 14:10:55 -0500 (EST)
Date: Fri, 31 Jan 2003 14:10:54 -0500
Subject: Re: DEV_B_SIZE
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: David Laight <david@l8s.co.uk>, freebsd-fs@FreeBSD.ORG,
	tech-kern@netbsd.org
To: Lord Isildur <mrfusion@uranium.vaxpower.org>
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <Pine.3.89.10301311357.A20439-0100000@uranium.vaxpower.org>
Message-Id: <B8AB7B3A-354F-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 01:51  PM, Lord Isildur wrote:

> to just get the performance of aligned accesses, we don't need to modify
> block sizes and such stuff. as an example, read the paper linked to
> from this: http://www.pdl.cmu.edu/PDL-FTP/stray/traxtent_abs.html
> (brought to you by the same folks who did soft updates and raidframe)

Thanks, I'm aware of the excellent CMU paper. In fact, if anyone wants 
a way to get the complete physical geometry of Maxtor SCSI disks just 
by reading mode-pages, email me and I can supply the details.

My concern is with the proposed backward-compatibility mode, which I 
fear subtly breaks the failure semantics which systems with persistent 
storage rely upon to recover.

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414




From owner-freebsd-fs  Fri Jan 31 11:11:31 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 923CE37B405; Fri, 31 Jan 2003 11:11:29 -0800 (PST)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 1615F43F9B; Fri, 31 Jan 2003 11:11:28 -0800 (PST)
	(envelope-from bright@elvis.mu.org)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id DEA1AAE1C1; Fri, 31 Jan 2003 11:11:27 -0800 (PST)
Date: Fri, 31 Jan 2003 11:11:27 -0800
From: Alfred Perlstein <bright@mu.org>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: David Laight <david@l8s.co.uk>, phk@freebsd.org,
	freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
Message-ID: <20030131191127.GS85104@elvis.mu.org>
References: <20030131185507.G1487@snowdrop.l8s.co.uk> <538478DE-354E-11D7-B26B-00306548867E@maxtor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <538478DE-354E-11D7-B26B-00306548867E@maxtor.com>
User-Agent: Mutt/1.4i
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

I hope I'm not mistaken here, but for FFS to work it needs the 512
byte ops to be atomic; making them non-atomic, or letting them possibly
obliterate surrounding blocks, doesn't sound like a good idea at all.

Shouldn't you guys be asking Dr McKusick?

I can forward this question on to some of the fs people at Apple
as well.

* Steve Byan <stephen_byan@maxtor.com> [030131 11:01] wrote:
> 
> On Friday, January 31, 2003, at 01:55  PM, David Laight wrote:
> 
> >>Really? fsck can recover from losing 4K bytes surrounding the last
> >>metadata block written?
> >
> >The only metadata that matter are the inodes and (for ffs) the
> >indirect blocks.  You do really want the latter to be single disk
> >blocks - many systems actually write them synchronously.
> 
> What could be the effect of losing surrounding blocks on the (failed) 
> write of an indirect block? Can we guarantee that fsck can reconstruct 
> the filesystem, modulo some recently-created or deleted files, or is 
> there a possibility of losing the entire filesystem?
> 
> >The inode is (probably) only 128 bytes, losing an inode block
> >will lose the other files.
> >
> >A journaling filesystem probably already has ways around this...
> 
> I think journaling filesystems need to know the atomic block size in 
> order to structure their log in a fault-tolerant way; I'm hoping 
> someone on these lists can provide some details.
> 
> Regards,
> -Steve
> --------
> Steve Byan <stephen_byan@maxtor.com>
> Design Engineer
> Maxtor Corp.
> MS 1-3/E23
> 333 South Street
> Shrewsbury, MA 01545
> (508) 770-3414
> 
> 

-- 
-Alfred Perlstein [alfred@freebsd.org]
'Instead of asking why a piece of software is using "1970s technology,"
 start asking why software is ignoring 30 years of accumulated wisdom.'



From owner-freebsd-fs  Fri Jan 31 11:21:20 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 90A3437B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:21:19 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A295543F9B
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:21:18 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0VJLG4W024732;
	Fri, 31 Jan 2003 20:21:17 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: Alfred Perlstein <bright@mu.org>
Cc: Steve Byan <stephen_byan@maxtor.com>,
	David Laight <david@l8s.co.uk>, freebsd-fs@freebsd.org,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
From: phk@freebsd.org
In-Reply-To: Your message of "Fri, 31 Jan 2003 11:11:27 PST."
             <20030131191127.GS85104@elvis.mu.org> 
Date: Fri, 31 Jan 2003 20:21:16 +0100
Message-ID: <24731.1044040876@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

In message <20030131191127.GS85104@elvis.mu.org>, Alfred Perlstein writes:

>I hope I'm not mistaken here, but for FFS to work it needs the 512
>byte ops to be atomic, making them not so, or possibly obliterate
>surrounding blocks doesn't sound like a good idea at all.

UFS/FFS is not bound to 512-byte sectors; it can work with other sector sizes.

The implication is that your fragment size may increase.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 11:22:13 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A553D37B401
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 11:22:11 -0800 (PST)
Received: from mcomail02.maxtor.com (mcomail02.maxtor.com [134.6.76.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E75FE43F43
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 11:22:10 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail02.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VJFkc31069;
	Fri, 31 Jan 2003 12:15:46 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRKZHS; Fri, 31 Jan 2003 12:22:09 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id OAA0000002265; Fri, 31 Jan 2003 14:21:51 -0500 (EST)
Date: Fri, 31 Jan 2003 14:21:49 -0500
Subject: Re: DEV_B_SIZE
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: <freebsd-fs@FreeBSD.ORG>, <tech-kern@netbsd.org>
To: <jkirby@storagecraft.com>
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <001601c2c95a$63d52d70$0300a8c0@jkirbydesk>
Message-Id: <3F18DF97-3551-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 01:55  PM, Jamey Kirby wrote:

> I have been a lurker for years and want to chime in.

Hi Jamey, recognize your name from the NTFS list.

> Under Windows NT (all flavors), using a 4K sector size works fine. The
> OS abstraction layers are very good and handling the alignment.

Yes, I've seen the code in the DDK and in the filesystem developers 
kit. NT's SCSI driver is already properly parameterized to use the 
block size returned by the device, as long as it is a power of 2 and 
greater than 512 bytes.

However, I wonder about the failure semantics assumed by NTFS's log - 
does it rely on the beginning and the ending of each log record being 
in different physical sectors? Does it rely on no more than one sector 
being lost at the end of the log (i.e. could wiping out 4K at the tail 
of the log wipe out enough state such that the recovery code couldn't 
roll-back/roll-forward to a consistent filesystem state)?

How about the ExchangeServer? Does its transaction mechanism depend on 
a specific block size?

How about SQLServer?

My concern is that a backwards-compatibility mechanism is being 
proposed that makes a device (even a SCSI device) with 4K physical 
blocks look like a 512-byte block device. I fear that since the failure 
semantics are subtly different, the careful-write and persistent 
logging strategies in current code will break, and no one will know 
until they experience the corner condition that results in their 
{filesystem | database | email server | transaction processing monitor} 
losing their data.

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 11:23:55 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2D45237B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:23:54 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4485F43F75
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:23:53 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0VJNq4W025201;
	Fri, 31 Jan 2003 20:23:52 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
From: phk@freebsd.org
In-Reply-To: Your message of "Fri, 31 Jan 2003 14:06:11 EST."
             <1010FEB6-354F-11D7-B26B-00306548867E@maxtor.com> 
Date: Fri, 31 Jan 2003 20:23:52 +0100
Message-ID: <25200.1044041032@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

In message <1010FEB6-354F-11D7-B26B-00306548867E@maxtor.com>, Steve Byan writes:

>> Not broken any worse than because of write-caching.
>
>Agreed, but IDEMA is proposing to do this to SCSI drives, too.

We've seen broken caching on SCSI as well, but not recently I think :-)

>But if someone were to plug a new 4K-block disk into a system compiled 
>to use 512 byte block disks, and the SCSI interface were faked to make 
>it appear that the disk could read and write 512-byte blocks, then what 
>happens? IDEMA's notion is that faking 512-byte logical size is good 
>enough to get new disks to work in systems running legacy code. My fear 
>is that it is not so simple.

If you plug a 4k-sector disk into a system which doesn't know how to
find out that the drive really has 4k sectors, then you will increase
the window for lossage.

>> The thing we really need is working tagged-queing...
>
>Since I believe tagged-queuing works in SCSI, I assume you are asking 
>for it in ATA? Or is there some feature missing from SCSI 
>tagged-queuing that you'd like to see?

Yes, I was talking ATA there.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 11:24:57 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id E915237B401; Fri, 31 Jan 2003 11:24:55 -0800 (PST)
Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 4ED2343F75; Fri, 31 Jan 2003 11:24:55 -0800 (PST)
	(envelope-from dschultz@uclink.Berkeley.EDU)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h0VJOrNt016232;
	Fri, 31 Jan 2003 11:24:53 -0800 (PST)
	(envelope-from dschultz@uclink.Berkeley.EDU)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h0VJOrC2016231;
	Fri, 31 Jan 2003 11:24:53 -0800 (PST)
	(envelope-from dschultz@uclink.Berkeley.EDU)
Date: Fri, 31 Jan 2003 11:24:52 -0800
From: David Schultz <dschultz@uclink.Berkeley.EDU>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: phk@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
Message-ID: <20030131192452.GA15985@HAL9000.homeunix.com>
Mail-Followup-To: Steve Byan <stephen_byan@maxtor.com>,
	phk@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
References: <2903.1044033486@critter.freebsd.dk> <A02737C6-354B-11D7-B26B-00306548867E@maxtor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <A02737C6-354B-11D7-B26B-00306548867E@maxtor.com>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Thus spake Steve Byan <stephen_byan@maxtor.com>:
> >The only thing that exposes us to risk is we don't know the risk
> >exists, so as long as the fact that a 4k physical sector size is
> >used is not hidden from us, we can adapt.
> 
> But would existing code be functionally broken (perhaps with respect to 
> failure recovery) if it were to not be modified to adapt to a different 
> physical block size?

If the disk corrupts a sector it was writing, that's already a
problem for us.  If the sector is 4K, that just makes it more of a
problem.  With FFS and soft updates, we assume that the disk can
atomically write 512 bytes, and we ensure filesystem consistency
by establishing a safe partial ordering for metadata updates.  We
expect that after a crash, either the old contents or the new
contents of the sector are there.  I think we would need to
implement journalling to ensure integrity if hard drives were
likely to corrupt sectors on power failure.  (How often do they do
this right now, and how often would they with 4K sectors?)

Inodes are 128 bytes (UFS1) or 256 bytes (UFS2), so a 4K sector
could contain metadata for a lot of files.  If an indirect block
is squished, that might be less of a problem because it
corresponds to only one file.  In one sense, 4K sectors save a
little bit of space, since directory entries are never split
across a sector boundary so that they can be updated in a single,
atomic write.  But large sectors are still worse from a
reliability point of view if it's possible to lose the entire
sector.
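
To put rough numbers on the inode point above, here is a small
illustrative calculation in Python. The function name is made up for
this sketch; it uses only the inode sizes quoted above and shows how
many inodes share one physical sector, i.e. how many could be affected
if a whole sector were lost.

```python
# Inode sizes quoted above, in bytes.
INODE_SIZE = {"UFS1": 128, "UFS2": 256}

def inodes_per_sector(sector_size, inode_size):
    """Number of whole on-disk inodes that fit in one physical sector."""
    return sector_size // inode_size

for fs, isize in INODE_SIZE.items():
    for sector in (512, 4096):
        count = inodes_per_sector(sector, isize)
        print(f"{fs}: a {sector}-byte sector holds {count} inodes")
```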

The LFS is probably in much better shape...

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 11:42:58 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 11D6037B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:42:57 -0800 (PST)
Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 77BB743F3F
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:42:56 -0800 (PST)
	(envelope-from julian@elischer.org)
Received: from interjet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4])
          by sccrmhc03.attbi.com (sccrmhc03) with ESMTP
          id <2003013119425500300jva6me>; Fri, 31 Jan 2003 19:42:55 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id LAA46067;
	Fri, 31 Jan 2003 11:42:53 -0800 (PST)
Date: Fri, 31 Jan 2003 11:42:53 -0800 (PST)
From: Julian Elischer <julian@elischer.org>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
In-Reply-To: <A91AD932-354D-11D7-B26B-00306548867E@maxtor.com>
Message-ID: <Pine.BSF.4.21.0301311142100.45015-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org



On Fri, 31 Jan 2003, Steve Byan wrote:

> 
> On Friday, January 31, 2003, at 01:16  PM, Julian Elischer wrote:
> 
> >
> > Recovery algorythms might have to deal with this (should we actually
> > decide to write one.. :-).
> >
> > Particularly if the block being written was the 1st, but the other 7
> > blocks contain data that the OS has no way of knowing that they are in
> > jeopardy. In other words, I might know that block 1 is in danger and 
> > put
> > it in a write log, (in a logging filesystem) but I have no way of
> > knowing that the other 7 are in danger, so they may not be in the write
> > log (assuming thAat the write log only holds the last N transactions.).
> > I'd say that this means that the drive should hold the active 4k block
> > in nvram or something..
> >
> > You seem to have considered this but I'm in agreement that it could
> > prove "nasty" in exactly the cases that are most important..
> > people use write logging etc. in cases where they care about the data
> > and recovery time. these are exactly the people who are going to be the
> > most pissed off to lose their data. ..
> 
> Thanks, may I forward your response on to the committee?

sure.. correct the spelling though :-)




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 11:49: 3 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 35F2037B401; Fri, 31 Jan 2003 11:49:02 -0800 (PST)
Received: from sccrmhc01.attbi.com (sccrmhc01.attbi.com [204.127.202.61])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id EAA5C43F3F; Fri, 31 Jan 2003 11:49:00 -0800 (PST)
	(envelope-from julian@elischer.org)
Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4])
          by sccrmhc01.attbi.com (sccrmhc01) with ESMTP
          id <200301311948590010087eaje>; Fri, 31 Jan 2003 19:49:00 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id LAA46106;
	Fri, 31 Jan 2003 11:48:58 -0800 (PST)
Date: Fri, 31 Jan 2003 11:48:56 -0800 (PST)
From: Julian Elischer <julian@elischer.org>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: phk@freebsd.org, freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
In-Reply-To: <1010FEB6-354F-11D7-B26B-00306548867E@maxtor.com>
Message-ID: <Pine.BSF.4.21.0301311144370.45015-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org



One thing I thought of is that it is not uncommon to 'dd' an entire
filesystem from one partition to another.
If we create a filesystem that is 'aligned' and we copy it to be
'unaligned', we'd have a sudden performance drop for no immediately
obvious reason. What was one write would become a 2-sector read,
modify and 2-sector write. Especially when copying from one failing
drive to another with slightly different characteristics.

The idea isn't bad but I think it should be sold as a 4k sector
drive, with small print saying it can handle 512byte IO
instead of the other way around.
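
A rough sketch of the arithmetic behind the misalignment case, in
Python. The function name is hypothetical, and the model is an
assumption of mine rather than anything from a drive spec: the drive
is taken to read-modify-write every 4k physical sector that a write
covers only partially.

```python
PHYS = 4096  # assumed physical sector size, in bytes

def rmw_sectors(byte_offset, length, phys=PHYS):
    """Return (sectors_touched, sectors_needing_read_modify_write).

    A physical sector needs a read-modify-write when the write covers
    only part of it.
    """
    if length <= 0:
        return (0, 0)
    first = byte_offset // phys
    last = (byte_offset + length - 1) // phys
    touched = last - first + 1
    partial = 0
    if byte_offset % phys != 0:
        partial += 1  # the head of the write lands mid-sector
    if (byte_offset + length) % phys != 0:
        # The tail ends mid-sector; do not count the same sector twice
        # when head and tail fall in one physical sector.
        if not (touched == 1 and partial == 1):
            partial += 1
    return (touched, partial)

# One 4k write on a 4k boundary, then the same write shifted by one
# 512-byte block, as after copying the filesystem to an unaligned partition.
print(rmw_sectors(0, 4096))
print(rmw_sectors(512, 4096))
```

Under that model the aligned write touches one sector with no pre-read,
while the shifted one touches two sectors and both need the
read-modify-write cycle.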





To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 11:50:44 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E8B4737B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:50:43 -0800 (PST)
Received: from quic.net (rrcs-central-24-123-205-180.biz.rr.com [24.123.205.180])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3406F43E4A
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:50:43 -0800 (PST)
	(envelope-from utsl@quic.net)
Received: from localhost (localhost [127.0.0.1])
  (uid 1032)
  by quic.net with local; Fri, 31 Jan 2003 14:50:42 -0500
Date: Fri, 31 Jan 2003 14:50:42 -0500
To: Steve Byan <stephen_byan@maxtor.com>, freebsd-fs@FreeBSD.ORG,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
Message-ID: <20030131195042.GD6243@quic.net>
References: <Pine.3.89.10301311357.A20439-0100000@uranium.vaxpower.org> <B8AB7B3A-354F-11D7-B26B-00306548867E@maxtor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <B8AB7B3A-354F-11D7-B26B-00306548867E@maxtor.com>
User-Agent: Mutt/1.3.28i
From: Nathan Hawkins <utsl@quic.net>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Fri, Jan 31, 2003 at 02:10:54PM -0500, Steve Byan wrote:
> Thanks, I'm aware of the excellent CMU paper. In fact, if anyone wants 
> a way to get the complete physical geometry of Maxtor SCSI disks just 
> by reading mode-pages, email me and I can supply the details.

I'd be interested in that. Are those published?

>  My concern is with the proposed backward-compatibility mode, which I 
> fear subtly breaks the failure semantics which systems with persistent 
> storage rely upon to recover.

You might want to talk with Veritas. I'm pretty sure their Volume
Manager's log subdisks assume 512-byte sectors.

More generally, what impact would this have on existing RAID
implementations, hardware or software? This is a potentially more
damaging impact than filesystem semantics.

	---Nathan

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 11:52:44 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C81F737B405
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:52:42 -0800 (PST)
Received: from hitl.washington.edu (hitl-new.hitl.washington.edu [128.95.73.60])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4F98C43E4A
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:52:42 -0800 (PST)
	(envelope-from perseant@hitl.washington.edu)
Received: from psychosis.hitl.washington.edu (psychosis.hitl.washington.edu [128.95.74.36])
	by hitl.washington.edu (8.11.6/8.9.3) with ESMTP id h0VJqeh13942;
	Fri, 31 Jan 2003 11:52:40 -0800 (PST)
Date: Fri, 31 Jan 2003 11:52:40 -0800 (PST)
From: Konrad Schroder <perseant@hitl.washington.edu>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
In-Reply-To: <538478DE-354E-11D7-B26B-00306548867E@maxtor.com>
Message-ID: <Pine.GSO.4.53.0301311107491.24006@psychosis.hitl.washington.edu>
References: <538478DE-354E-11D7-B26B-00306548867E@maxtor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

My $0.02 regarding FFS: since the default block size (including indirect
blocks etc.) is 8k the only common alignment issue would come from
(mis-)alignment of the partition as a whole.  If the drive were structured
so that it reported cylinders as multiples of 4k, (almost) no one would
ever have the type of problem you're describing with FFS.

On Fri, 31 Jan 2003, Steve Byan wrote:

> I think journaling filesystems need to know the atomic block size in
> order to structure their log in a fault-tolerant way; I'm hoping
> someone on these lists can provide some details.

I think LFS is mostly okay here, though there is a corner case in which
some data could be lost (possibly the filesystem corrupted) without the
user knowing about it.  Let me describe such a case.

Suppose that the cleaner were operating.  Every cleaner write is a
checkpoint, but following the cleaner write, the previous checkpoint is
invalidated---so it is possible that there is only one valid checkpoint on
disk, at all.  Now further suppose that the filesystem were created with
fragment size less than 4k, the cleaner has just cleaned segment n+1,
filling segment n with that data; and another write has occurred into
segment n+1, thereby invalidating the contents of segment n+1; and there
were a power outage while that first segment summary in segment n+1 were
being written.

Both the previous checkpoint state (including segment n+1) and the current
checkpoint state (including segment n) would be invalid in this case.

The worst part about it is that even if fsck_lfs could fix this problem,
no one would know to run it; LFS uses roll-forward as its default repair
mechanism, and roll-forward always starts from the last known-valid
checkpoint.

The solution, of course, is to

1) Identify the disk as a 4k-sector disk;
2) Partition the disk so that LFS partitions begin on 4k boundaries;
3) Create the LFS filesystems with 4k or greater fragment size;
4) Play happily with your 8k/1k FFSes and 8k/4k LFSes.

If you did that the 4k sector size would be truly invisible to you---and
in particular, you would *not* need to recompile the kernel for any of
that unless I'm misunderstanding what you're saying.
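
Steps 2 and 3 above can be sketched as a simple check. This is an
illustration in Python, not an existing tool: the function name and
parameters are made up, and the partition start is assumed to be given
in 512-byte units. Since fragment sizes are powers of two, "4k or
greater" is tested as "a multiple of the physical sector size".

```python
def layout_problems(phys_sector, part_start_lba, frag_size, lba_size=512):
    """Check partition alignment and fragment size against the sector size.

    part_start_lba is the partition start in lba_size-byte units.
    Returns a list of problems found; an empty list means both checks pass.
    """
    problems = []
    if (part_start_lba * lba_size) % phys_sector != 0:
        problems.append("partition start is not on a physical-sector boundary")
    if frag_size % phys_sector != 0:
        problems.append("fragment size is not a multiple of the physical sector size")
    return problems

# A partition starting at block 64 with 4k fragments passes both checks;
# one starting at block 63 with 1k fragments fails both.
print(layout_problems(4096, 64, 4096))
print(layout_problems(4096, 63, 1024))
```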

------------------------------------------------------------------------
Konrad Schroder          http://www.hitl.washington.edu/people/perseant/
Information Tech & Services   Box 352142 -or- 215 Fluke Hall, Mason Road
Human Interface Technology Lab                  University of Washington
Voice: +1.206.616.1478   Fax: +1.206.543.5380    Seattle, WA, 98195, USA

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 11:56:20 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9C9DA37B401; Fri, 31 Jan 2003 11:56:18 -0800 (PST)
Received: from apollo.email.starband.net (smtp2.starband.net [148.78.247.23])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id D9E5D43F43; Fri, 31 Jan 2003 11:56:17 -0800 (PST)
	(envelope-from jkirby@storagecraft.com)
Received: from jkirbydesk (vsat-148-63-114-177.c002.t7.mrt.starband.net [148.63.114.177])
	(authenticated bits=0)
	by apollo.email.starband.net (8.12.4/8.12.4) with ESMTP id h0VJtTH5005007;
	Fri, 31 Jan 2003 14:55:37 -0500
Reply-To: <jkirby@storagecraft.com>
From: "Jamey Kirby" <jkirby@storagecraft.com>
To: "'Julian Elischer'" <julian@elischer.org>,
	"'Steve Byan'" <stephen_byan@maxtor.com>
Cc: <phk@FreeBSD.ORG>, <freebsd-fs@FreeBSD.ORG>,
	<tech-kern@netbsd.org>
Subject: RE: DEV_B_SIZE 
Date: Fri, 31 Jan 2003 11:55:33 -0800
Organization: StorageCraft
Message-ID: <000601c2c962$c04aa2d0$0300a8c0@jkirbydesk>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.4024
In-Reply-To: <Pine.BSF.4.21.0301311144370.45015-100000@InterJet.elischer.org>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
Importance: Normal
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Who will do the translation?

Will there be a device driver that makes the 4K disk look like a 512
byte disk? If so, the device driver would have to pre-read the 4K,
modify the 512 byte section and re-write the entire 4K. This would kill
performance.

If this will be handled in the drive, the same sort of logic must be
employed and surely there will be a performance problem; unless the drive
will be able to write the 512 bytes without a pre-read.
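
As a minimal sketch of the translation described above, here is a toy
model in Python. The class and method names are made up for
illustration, and it models the media as a byte array in memory, not a
real driver or firmware. It counts the physical 4K operations needed
to satisfy one 512-byte logical write.

```python
class Emulated512:
    """Toy model of a 4K-sector disk presenting 512-byte logical blocks."""

    PHYS = 4096
    LOGICAL = 512

    def __init__(self, phys_sectors):
        self.media = bytearray(phys_sectors * self.PHYS)
        self.phys_reads = 0
        self.phys_writes = 0

    def _read_phys(self, n):
        self.phys_reads += 1
        return bytearray(self.media[n * self.PHYS:(n + 1) * self.PHYS])

    def _write_phys(self, n, data):
        assert len(data) == self.PHYS
        self.phys_writes += 1
        self.media[n * self.PHYS:(n + 1) * self.PHYS] = data

    def write_logical(self, lba, data):
        """Write one 512-byte logical block via read-modify-write."""
        assert len(data) == self.LOGICAL
        n, off = divmod(lba * self.LOGICAL, self.PHYS)
        sector = self._read_phys(n)             # pre-read the whole 4K
        sector[off:off + self.LOGICAL] = data   # modify the 512-byte slice
        self._write_phys(n, sector)             # re-write the whole 4K

    def read_logical(self, lba):
        """Read one 512-byte logical block out of its 4K physical sector."""
        n, off = divmod(lba * self.LOGICAL, self.PHYS)
        return bytes(self._read_phys(n)[off:off + self.LOGICAL])

disk = Emulated512(phys_sectors=2)
disk.write_logical(3, b"\xaa" * 512)
print(disk.phys_reads, disk.phys_writes)
```

In this model every 512-byte write costs one 4K read plus one 4K
write, which is where the performance concern comes from.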

How easy is it to change the firmware in the drive to make it a 4K block
drive? I would be willing to tinker with a 4K drive and provide some
feedback.

Jamey


-----Original Message-----
From: owner-freebsd-fs@FreeBSD.ORG [mailto:owner-freebsd-fs@FreeBSD.ORG]
On Behalf Of Julian Elischer
Sent: Friday, January 31, 2003 11:49 AM
To: Steve Byan
Cc: phk@FreeBSD.ORG; freebsd-fs@FreeBSD.ORG; tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 



One thing I thought of is that it is not uncommon to 'dd' an entire
filesystem from one partition to another.
If we create a filesystem that is 'aligned' and we copy it to be
'unaligned', we'd have a sudden performance drop for no immediately
obvious reason. What was one write would become a 2-sector read,
modify and 2-sector write. Especially when copying from one failing
drive to another with slightly different characteristics.

The idea isn't bad but I think it should be sold as a 4k sector
drive, with small print saying it can handle 512byte IO
instead of the other way around.






To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 11:58:39 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 120A537B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:58:38 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1774943F3F
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 11:58:37 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0VJwR4W031672;
	Fri, 31 Jan 2003 20:58:28 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: Julian Elischer <julian@elischer.org>
Cc: Steve Byan <stephen_byan@maxtor.com>, freebsd-fs@freebsd.org,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
From: phk@freebsd.org
In-Reply-To: Your message of "Fri, 31 Jan 2003 11:48:56 PST."
             <Pine.BSF.4.21.0301311144370.45015-100000@InterJet.elischer.org> 
Date: Fri, 31 Jan 2003 20:58:27 +0100
Message-ID: <31671.1044043107@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

In message <Pine.BSF.4.21.0301311144370.45015-100000@InterJet.elischer.org>, Julian Elischer writes:
>
>
>One thing I thought of is that it is not uncommon to 'dd' an entire
>filesystem from one partition to another.
>If we create a filesystem that is 'aligned' and we copy it to be
>'unaligned', we'd have a sudden performance drop for no immediately
>obvious reason. What was one write would become a 2-sector read,
>modify and 2-sector write. Especially when copying from one failing
>drive to another with slightly different characteristics.

If you run dd without bs=ALOT you deserve bad throughput.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 12:15:28 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8E64037B401; Fri, 31 Jan 2003 12:15:26 -0800 (PST)
Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id E2A7A43F43; Fri, 31 Jan 2003 12:15:25 -0800 (PST)
	(envelope-from julian@elischer.org)
Received: from interjet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4])
          by rwcrmhc53.attbi.com (rwcrmhc53) with ESMTP
          id <20030131201519053003msr6e>; Fri, 31 Jan 2003 20:15:20 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id MAA46356;
	Fri, 31 Jan 2003 12:15:19 -0800 (PST)
Date: Fri, 31 Jan 2003 12:15:17 -0800 (PST)
From: Julian Elischer <julian@elischer.org>
To: David Schultz <dschultz@uclink.Berkeley.EDU>
Cc: Steve Byan <stephen_byan@maxtor.com>, phk@FreeBSD.ORG,
	freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
In-Reply-To: <20030131192452.GA15985@HAL9000.homeunix.com>
Message-ID: <Pine.BSF.4.21.0301311214330.45015-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org



On Fri, 31 Jan 2003, David Schultz wrote:

> Thus spake Steve Byan <stephen_byan@maxtor.com>:
> > >The only thing that exposes us to risk is we don't know the risk
> > >exists, so as long as the fact that a 4k physical sector size is
> > >used is not hidden from us, we can adapt.
> > 
> > But would existing code be functionally broken (perhaps with respect to 
> > failure recovery) if it were to not be modified to adapt to a different 
> > physical block size?
> 
> If the disk corrupts a sector it was writing, that's already a
> problem for us.  If the sector is 4K, that just makes it more of a
> problem.  With FFS and soft updates, we assume that the disk can
> atomically write 512 bytes, and we ensure filesystem consistency
> by establishing a safe partial ordering for metadata updates.  We
> expect that after a crash, either the old contents or the new
> contents of the sector are there.  I think we would need to
> implement journalling to ensure integrity if hard drives were
> likely to corrupt sectors on power failure.  (How often do they do
> this right now, and how often would they with 4K sectors?)


In this case the journal would have to include not only the block being
written, but also the data on each side of it that may be in the same 4k.
That implies a read.


> 
> Inodes are 128 bytes (UFS1) or 256 bytes (UFS2), so a 4K sector
> could contain metadata for a lot of files.  If an indirect block
> is squished, that might be less of a problem because it
> corresponds to only one file.  In one sense, 4K sectors save a
> little bit of space, since directory entries are never split
> across a sector boundary so that they can be updated in a single,
> atomic write.  But large sectors are still worse from a
> reliability point of view if it's possible to lose the entire
> sector.
> 
> The LFS is probably in much better shape...
> 


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 12:16:52 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9924837B407; Fri, 31 Jan 2003 12:16:51 -0800 (PST)
Received: from mcomail02.maxtor.com (mcomail02.maxtor.com [134.6.76.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id B06A743F85; Fri, 31 Jan 2003 12:16:50 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail02.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VKAR817917;
	Fri, 31 Jan 2003 13:10:27 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRK62D; Fri, 31 Jan 2003 13:16:52 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id PAA0000002380; Fri, 31 Jan 2003 15:16:38 -0500 (EST)
Date: Fri, 31 Jan 2003 15:16:37 -0500
Subject: Re: DEV_B_SIZE 
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
To: phk@freebsd.org
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <22438.1044040127@critter.freebsd.dk>
Message-Id: <E6AEE678-3558-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 02:08  PM, phk@freebsd.org wrote:

> I get the sense that you want us to say "NOOOO this is HORRIBLE!!!"
> and you won't stop asking until we do ?
>
> You won't have that from this bloke at least.
>
> I don't know what the agenda you push are, but I'm not pushing it
> for you...

I keep getting a response that reads like "we'll detect the larger 
block size and run with it".  I'm concerned that I'm not being clear 
that IDEMA is thinking of proposing a backward-compatibility mode with 
the presumption that it will work fine (albeit slowly) with existing 
binaries, i.e. code that hasn't been modified to be aware of the larger 
block size.

If you think there are no functional problems with this 
backwards-compatibility scenario, including during recovery (fsck or 
journal roll-forward), I'd be happy to hear a clear "no problem".

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 12:40:34 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id E3E4B37B401; Fri, 31 Jan 2003 12:40:32 -0800 (PST)
Received: from sccrmhc03.attbi.com (sccrmhc03.attbi.com [204.127.202.63])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 44A3243FA7; Fri, 31 Jan 2003 12:40:32 -0800 (PST)
	(envelope-from julian@elischer.org)
Received: from interjet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4])
          by sccrmhc03.attbi.com (sccrmhc03) with ESMTP
          id <2003013120403000300jupd0e>; Fri, 31 Jan 2003 20:40:30 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id MAA46556;
	Fri, 31 Jan 2003 12:40:29 -0800 (PST)
Date: Fri, 31 Jan 2003 12:40:28 -0800 (PST)
From: Julian Elischer <julian@elischer.org>
To: phk@freebsd.org
Cc: Steve Byan <stephen_byan@maxtor.com>, freebsd-fs@freebsd.org,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
In-Reply-To: <31671.1044043107@critter.freebsd.dk>
Message-ID: <Pine.BSF.4.21.0301311240030.45015-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org



On Fri, 31 Jan 2003 phk@freebsd.org wrote:

> In message <Pine.BSF.4.21.0301311144370.45015-100000@InterJet.elischer.org>, Ju
> lian Elischer writes:
> >
> >
> >One thing I thought of is that it is not uncommon to 'dd' an entire
> >filesystem from one partition to another.
> >If we create a filesystem that is 'aligned' and we copy it to be
> >'unaligned', we'd have a sudden performance drop for no immediately
> >obvious reason. What was one write would become a 2-sector read,
> >modify and 2-sector write. Especially when copying from one failing
> >drive to another with slightly different characteristics.
> 
> If you run dd without bs=ALOT you deserve bad throughput.

I'm talking about the performance of the filesystem after it's been
moved.
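
A minimal sketch of the effect described above, assuming a 4096-byte
physical sector behind a 512-byte logical interface and a filesystem that
writes 4K blocks. The function name and the example partition offsets are
hypothetical; they only illustrate how shifting the same filesystem by one
logical block turns a clean one-sector write into a two-sector
read-modify-write.

```python
LOGICAL = 512                 # bytes per logical block seen by the host
PHYSICAL = 4096               # assumed bytes per physical sector
RATIO = PHYSICAL // LOGICAL   # 8 logical blocks per physical sector


def write_cost(partition_start, fs_block, blocks_per_fs_block=RATIO):
    """Return (physical_sectors_touched, needs_read_modify_write).

    partition_start: first logical block of the partition on the disk.
    fs_block: index of the filesystem block being written.
    """
    first = partition_start + fs_block * blocks_per_fs_block
    last = first + blocks_per_fs_block - 1
    touched = last // RATIO - first // RATIO + 1
    # A physical sector is only partly overwritten when the write does not
    # start and end on physical-sector boundaries.
    partial = first % RATIO != 0 or (last + 1) % RATIO != 0
    return touched, partial


print(write_cost(partition_start=0, fs_block=10))  # aligned original
print(write_cost(partition_start=1, fs_block=10))  # same filesystem after a shifted copy
```

The first call reports one physical sector and no read-modify-write; the
second reports two physical sectors, both only partly overwritten.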


> 
> -- 
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe    
> Never attribute to malice what can adequately be explained by incompetence.
> 


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 12:41:44 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EE61937B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 12:41:42 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1C49F43E4A
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 12:41:42 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h0VKfe4W039529;
	Fri, 31 Jan 2003 21:41:40 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
From: phk@freebsd.org
In-Reply-To: Your message of "Fri, 31 Jan 2003 15:16:37 EST."
             <E6AEE678-3558-11D7-B26B-00306548867E@maxtor.com> 
Date: Fri, 31 Jan 2003 21:41:40 +0100
Message-ID: <39528.1044045700@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

In message <E6AEE678-3558-11D7-B26B-00306548867E@maxtor.com>, Steve Byan writes
:
>
>On Friday, January 31, 2003, at 02:08  PM, phk@freebsd.org wrote:
>
>> I get the sense that you want us to say "NOOOO this is HORRIBLE!!!"
>> and you won't stop asking until we do ?
>>
>> You won't have that from this bloke at least.
>>
>> I don't know what the agenda you push are, but I'm not pushing it
>> for you...
>
>I keep getting a response that reads like "we'll detect the larger 
>block size and run with it".  I'm concerned that I'm not being clear 
>that IDEMA is thinking of proposing a backward-compatibility mode with 
>the presumption that it will work fine (albeit slowly) with existing 
>binaries, i.e. code that hasn't been modified to be aware of the larger 
>block size.
>
>If you think there are no functional problems with this 
>backwards-compatibility scenario, including during recovery (fsck or 
>journal roll-forward), I'd be happy to hear a clear "no problem".

Ok, to make it 100% clear:

1. We won't see any new problems.  The effect of the 3.5k around a
   sector we touched being corrupted is no different from any other
   3.5k developing a bad sector read-error.   (Hopefully the drive
   will flag it with a read-error when we come back so it won't
   look like random data corruption.)

2. Already existing issues will do greater damage.  This follows
   directly from the fact that increasing the sectorsize increases the
   amount of data lost when a sector is lost.  If the market place
   hates that, the new drives will not be popular there.

3. If the OS can detect the true sectorsize, some choices can
   be made intelligently and reduce the performance hit and
   some of the recovery issues.
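
To make the "3.5k around a sector we touched" in point 1 concrete, here is
a small sketch that lists the logical blocks sharing a physical sector with
the one being written. It assumes 512-byte logical blocks packed eight to a
4096-byte physical sector; the function name is hypothetical.

```python
LOGICAL = 512
PHYSICAL = 4096  # assumed physical sector size


def collateral_blocks(lba):
    """Logical blocks that share a physical sector with `lba`, excluding `lba`."""
    per_sector = PHYSICAL // LOGICAL
    base = (lba // per_sector) * per_sector
    return [b for b in range(base, base + per_sector) if b != lba]


neighbours = collateral_blocks(83)
print(neighbours)                 # [80, 81, 82, 84, 85, 86, 87]
print(len(neighbours) * LOGICAL)  # 3584 bytes, i.e. 3.5k
```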

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 13: 1:31 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 177AD37B401; Fri, 31 Jan 2003 13:01:30 -0800 (PST)
Received: from mcomail02.maxtor.com (mcomail02.maxtor.com [134.6.76.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 6EB2943F43; Fri, 31 Jan 2003 13:01:29 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail02.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VKt3u31068;
	Fri, 31 Jan 2003 13:55:03 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRK8AS; Fri, 31 Jan 2003 14:01:29 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id QAA0000002604; Fri, 31 Jan 2003 16:01:14 -0500 (EST)
Date: Fri, 31 Jan 2003 16:01:13 -0500
Subject: Re: DEV_B_SIZE
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: phk@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
To: David Schultz <dschultz@uclink.Berkeley.EDU>
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <20030131192452.GA15985@HAL9000.homeunix.com>
Message-Id: <21B8D16C-355F-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: quoted-printable
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 02:24  PM, David Schultz wrote:

> If the disk corrupts a sector it was writing, that's already a
> problem for us.

 From the Maxtor Atlas 10K III Product Spec:

Section 4.5.1 Power Sequencing

You may apply the power in any order or manner, or open either the
power or power return line with no loss of data or damage to the disk
drive. However, data may be lost in the sector being written at the
time of power loss. The drive can withstand transient voltages of +10%
to -100% from nominal while powering up or down.


> If the sector is 4K, that just makes it more of a
> problem.  With FFS and soft updates, we assume that the disk can
> atomically write 512 bytes, and we ensure filesystem consistency
> by establishing a safe partial ordering for metadata updates.  We
> expect that after a crash, either the old contents or the new
> contents of the sector are there.  I think we would need to
> implement journalling to ensure integrity if hard drives were
> likely to corrupt sectors on power failure.  (How often do they do
> this right now, and how often would they with 4K sectors?)

If you are doing nothing but continuously writing, the active data area
covers more than 50% of the track, so you'd have more than a 0.5
probability of experiencing a corrupt sector. Derate this by your seek
duty-cycle and your write disk utilization to arrive at the final
probability.
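
One way to read the derating above, as a rough sketch only: treat the seek
duty-cycle as the fraction of time the head is not over data, and the write
utilization as the fraction of time the disk is busy writing, and multiply
them through. That multiplicative model, the function name, and the example
workload numbers are assumptions for illustration, not figures from any
product spec.

```python
def torn_sector_probability(active_fraction, seek_duty_cycle, write_utilization):
    """Estimate the chance that a power loss lands on a sector being written.

    All three arguments are fractions between 0 and 1.
    """
    for value in (active_fraction, seek_duty_cycle, write_utilization):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    # Assumption: time spent seeking is time not writing, so it scales
    # the estimate by (1 - seek_duty_cycle).
    return active_fraction * (1.0 - seek_duty_cycle) * write_utilization


# Continuous writing with no seeks reproduces the 0.5 starting figure.
print(torn_sector_probability(0.5, 0.0, 1.0))
# Hypothetical workload: 30% of the time seeking, writing 10% of the time.
print(torn_sector_probability(0.5, 0.3, 0.1))
```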

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 13: 6:55 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id A90C637B401; Fri, 31 Jan 2003 13:06:53 -0800 (PST)
Received: from mcomail01.maxtor.com (mcomail01.maxtor.com [134.6.76.15])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id F15B943F79; Fri, 31 Jan 2003 13:06:52 -0800 (PST)
	(envelope-from stephen_byan@maxtor.com)
Received: from mcoexc03.mlm.maxtor.com (localhost.localdomain [127.0.0.1])
	by mcomail01.maxtor.com (8.11.6/8.11.6) with ESMTP id h0VKuNq15356;
	Fri, 31 Jan 2003 13:56:23 -0700
Received: from mmans02.mma.maxtor.com ([134.6.232.101]) by mcoexc03.mlm.maxtor.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id D4XRK8HA; Fri, 31 Jan 2003 14:06:53 -0700
Received: from maxtor.com by mmans02.mma.maxtor.com (8.8.8/1.1.22.3/08May01-0432PM)
	id QAA0000002539; Fri, 31 Jan 2003 16:06:44 -0500 (EST)
Date: Fri, 31 Jan 2003 16:06:43 -0500
Subject: Re: DEV_B_SIZE 
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v551)
Cc: "'Julian Elischer'" <julian@elischer.org>, <phk@FreeBSD.ORG>,
	<freebsd-fs@FreeBSD.ORG>, <tech-kern@netbsd.org>
To: <jkirby@storagecraft.com>
From: Steve Byan <stephen_byan@maxtor.com>
In-Reply-To: <000601c2c962$c04aa2d0$0300a8c0@jkirbydesk>
Message-Id: <E63D8B4F-355F-11D7-B26B-00306548867E@maxtor.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.551)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org


On Friday, January 31, 2003, at 02:55  PM, Jamey Kirby wrote:

> Who will do the translation?
>
> Will there be a device driver that makes the 4K disk look like a 512
> byte disk?

No.

> If so, the device driver would have to pre-read the 4K,
> modify the 512 byte section and re-write the entire 4K. This would kill
> performance.

Yes, it would.

> If this will be handled in the drive, the same sort of logic must be
> employed and surely there will be a performance problem, unless the
> drive will be able to write the 512 bytes without a pre-read.

Yes, there surely would be a performance problem if the I/O has to wait 
for a read-modify-write. There may be proprietary techniques for hiding 
the cost. The assumption is that this is purely a 
backward-compatibility case, and the performance hit would motivate 
folks to update their software to recognize the new larger block size.

Regards,
-Steve
--------
Steve Byan <stephen_byan@maxtor.com>
Design Engineer
Maxtor Corp.
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508) 770-3414


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 13:48:33 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D735937B401
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 13:48:31 -0800 (PST)
Received: from uranium.vaxpower.org (URANIUM.CLUB.CC.cmu.edu [128.2.4.153])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9114743F3F
	for <freebsd-fs@FreeBSD.ORG>; Fri, 31 Jan 2003 13:48:29 -0800 (PST)
	(envelope-from mrfusion@uranium.vaxpower.org)
Received: (from mrfusion@localhost)
	by uranium.vaxpower.org (8.9.1/5.5.1) id NAA20481;
	Fri, 31 Jan 2003 13:51:52 -0500
Date: Fri, 31 Jan 2003 13:51:52 -0500 (EST)
From: Lord Isildur <mrfusion@uranium.vaxpower.org>
Subject: Re: DEV_B_SIZE
To: Steve Byan <stephen_byan@maxtor.com>
Cc: David Laight <david@l8s.co.uk>, freebsd-fs@FreeBSD.ORG,
	tech-kern@netbsd.org
In-Reply-To: <1BBFD4B2-354C-11D7-B26B-00306548867E@maxtor.com>
Message-ID: <Pine.3.89.10301311357.A20439-0100000@uranium.vaxpower.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

to just get the performance of aligned accesses, we don't need to modify
block sizes and such stuff. as an example, read the paper linked to from
this: http://www.pdl.cmu.edu/PDL-FTP/stray/traxtent_abs.html
(brought to you by the same folks who did soft updates and raidframe)
happy hacking,
isildur

On Fri, 31 Jan 2003, Steve Byan wrote:

> 
> On Friday, January 31, 2003, at 12:59  PM, David Laight wrote:
> 
> > On Fri, Jan 31, 2003 at 11:30:18AM -0500, Steve Byan wrote:
> >> There's a notion afoot in IDEMA to enlarge the underlying physical
> >> block size of disks to 4096 bytes while keeping a 512-byte logical
> >> block size for the interface. Unaligned accesses would involve either
> >> a read-modify-write or some proprietary mechanism that provides
> >> persistence without the latency cost of a read-modify-write.
> >
> > There probably ought to be a way of making the larger physical
> > size visible to systems that are willing to support larger
> > block sizes.  That way misaligned transfers would be far less
> > likely.
> 
> Yes, of course. But I asked with respect to an issue other than 
> performance.
> >
> > One problem to consider is that disks are still partitioned
> > on cylinder boundaries.  This is largely historic and doesn't
> > actually make much sense, since the geometry
> > almost certainly varies across the disk and has to be faked
> > to fit the ATA CHS limits and (on PCs) the BIOS interface.
> >
> > However what it does mean is that a partition could easily
> > not start on an 8 (512 byte) sector boundary.
> > So misaligned transfers are likely even if the filesystem
> > itself is using 4k blocks.
> >
> > On a PC the partitioning will typically have the first one
> > starting in sector 63, and the others at multiples of 16065
> > sectors from the start of the disk.
> >
> > This doesn't bode well for getting any aligned transfer
> > at all.
> 
> We understand that problem. It's just a performance issue. My concern 
> is that even if we handwave the performance issues, there's an 
> underlying semantic that would not be satisfied if we were to run 
> existing software, unmodified, on a disk with an underlying 4K sector 
> size.
> 
> Regards,
> -Steve
> --------
> Steve Byan <stephen_byan@maxtor.com>
> Design Engineer
> Maxtor Corp.
> MS 1-3/E23
> 333 South Street
> Shrewsbury, MA 01545
> (508) 770-3414
> 
> 
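
The partition layout quoted above (first partition at sector 63, the others
at multiples of 16065 sectors) can be checked against an 8-sector boundary
with a few lines. This is only a sketch: it assumes a 4096-byte physical
sector, and the helper name is hypothetical. 16065 is 255 heads times 63
sectors, the usual faked PC geometry.

```python
SECTORS_PER_PHYSICAL = 8   # 4096 / 512, assuming a 4K physical sector
CYLINDER = 16065           # 255 heads * 63 sectors per track


def is_aligned(start_lba):
    """True when a partition starting at start_lba sits on a 4K boundary."""
    return start_lba % SECTORS_PER_PHYSICAL == 0


starts = [63] + [n * CYLINDER for n in range(1, 9)]
for start in starts:
    print(start, is_aligned(start))
```

Because 16065 leaves a remainder of 1 when divided by 8, only every eighth
cylinder boundary is aligned; of the starts listed here, only 128520 is.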

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 14:46:33 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7EF4937B401; Fri, 31 Jan 2003 14:46:32 -0800 (PST)
Received: from mail.allcaps.org (allcaps.org [216.240.173.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 1E18943F85; Fri, 31 Jan 2003 14:46:32 -0800 (PST)
	(envelope-from bsder@allcaps.org)
Received: from mail.allcaps.org (localhost [127.0.0.1])
	by mail.allcaps.org (Postfix) with ESMTP
	id EB39392FA9; Fri, 31 Jan 2003 17:46:27 -0500 (EST)
Received: from localhost (bsder@localhost)
	by mail.allcaps.org (8.12.5/8.12.5/Submit) with ESMTP id h0VMkR90000481;
	Fri, 31 Jan 2003 14:46:27 -0800
X-Authentication-Warning: mail.allcaps.org: bsder owned process doing -bs
Date: Fri, 31 Jan 2003 14:46:27 -0800 (PST)
From: "Andrew P. Lentvorski, Jr." <bsder@allcaps.org>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: phk@freebsd.org, <freebsd-fs@freebsd.org>, <tech-kern@netbsd.org>
Subject: Re: DEV_B_SIZE 
In-Reply-To: <E6AEE678-3558-11D7-B26B-00306548867E@maxtor.com>
Message-ID: <Pine.LNX.4.44.0301311424290.395-100000@mail.allcaps.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Fri, 31 Jan 2003, Steve Byan wrote:

> I keep getting a response that reads like "we'll detect the larger 
> block size and run with it".  I'm concerned that I'm not being clear 
> that IDEMA is thinking of proposing a backward-compatibility mode with 
> the presumption that it will work fine (albeit slowly) with existing 
> binaries, i.e. code that hasn't been modified to be aware of the larger 
> block size.

Is this the scenario you're worried about?

1) Plug a shiny new 4K type disk into, say, FreeBSD 4.7
2) FreeBSD 4.7 doesn't know about 4K disks, so uses 512 byte mode
3) System configures softupdates and does a newfs
4) ... time passes ...
5) Luser trips over power cord in middle of write and corrupts disk

Question: Does this work any differently given that the disk is 4K working
in 512 compatibility mode vs. a real 512 disk?

I think the answer depends upon the atomicity of the access.  If the drive
working in compatibility mode guarantees that only the new 512 bytes (out
of the total 4096) will be corrupt, things probably work.  If, however,
any of the 4096 bytes can be corrupted, it probably will not.

I assume that the whole reasoning behind moving to 4K size is to extend
the error coding to a larger chunk of bits for less overhead.  If that is
the case, a read-modify-write is likely to clobber any of the 4096 bytes,
and it is not likely to work transparently in compatibility mode under
failure conditions.
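
The two outcomes above can be sketched as a toy model. It is not a
description of how any real drive behaves: the disk is a plain dictionary,
`None` stands for an unreadable block, and the function name and the
`whole_sector_at_risk` flag are hypothetical.

```python
LOGICAL = 512
PHYSICAL = 4096                  # assumed physical sector size
PER_SECTOR = PHYSICAL // LOGICAL


def crash_during_write(disk, lba, whole_sector_at_risk):
    """Return a copy of `disk` after power fails while writing block `lba`.

    disk maps logical block number -> contents; None marks an unreadable block.
    """
    after = dict(disk)
    if whole_sector_at_risk:
        # The read-modify-write rewrites the full physical sector.
        base = (lba // PER_SECTOR) * PER_SECTOR
        victims = range(base, base + PER_SECTOR)
    else:
        # The drive guarantees that only the block being written can be torn.
        victims = [lba]
    for block in victims:
        after[block] = None
    return after


def lost(disk):
    """Sorted list of blocks that are unreadable."""
    return sorted(block for block, data in disk.items() if data is None)


disk = {block: f"old-{block}" for block in range(16)}
print(lost(crash_during_write(disk, 10, whole_sector_at_risk=False)))  # [10]
print(lost(crash_during_write(disk, 10, whole_sector_at_risk=True)))   # [8, 9, ..., 15]
```

In the second case, blocks 8, 9 and 11 through 15 hold neither their old nor
their new contents even though the host never wrote to them, which is the
case an unmodified filesystem does not expect.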

-a





To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 15:49:38 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 433D137B401; Fri, 31 Jan 2003 15:49:37 -0800 (PST)
Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id A0D6543FA3; Fri, 31 Jan 2003 15:49:36 -0800 (PST)
	(envelope-from dschultz@uclink.berkeley.edu)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h0VNnWNt016972;
	Fri, 31 Jan 2003 15:49:32 -0800 (PST)
	(envelope-from dschultz@uclink.berkeley.edu)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h0VNnWuL016971;
	Fri, 31 Jan 2003 15:49:32 -0800 (PST)
	(envelope-from dschultz@uclink.berkeley.edu)
Date: Fri, 31 Jan 2003 15:49:32 -0800
From: David Schultz <dschultz@uclink.berkeley.edu>
To: Julian Elischer <julian@elischer.org>
Cc: Steve Byan <stephen_byan@maxtor.com>, phk@FreeBSD.ORG,
	freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
Message-ID: <20030131234932.GA16959@HAL9000.homeunix.com>
Mail-Followup-To: Julian Elischer <julian@elischer.org>,
	Steve Byan <stephen_byan@maxtor.com>, phk@FreeBSD.ORG,
	freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
References: <20030131192452.GA15985@HAL9000.homeunix.com> <Pine.BSF.4.21.0301311214330.45015-100000@InterJet.elischer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.BSF.4.21.0301311214330.45015-100000@InterJet.elischer.org>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Thus spake Julian Elischer <julian@elischer.org>:
> > contents of the sector are there.  I think we would need to
> > implement journalling to ensure integrity if hard drives were
> > likely to corrupt sectors on power failure.  (How often do they do
> > this right now, and how often would they with 4K sectors?)
> 
> 
> In this case the journal would have to include not only the block being
> written, but also the data on each side of it that may be in the same 4k.
> That implies a read.

If you had to do that, then nearly every write would be a
read-modify-write cycle.  It would be far less painful
to use 4K blocks or larger and align filesystem blocks
to disk sectors.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 16:11:43 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id F0ABD37B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:11:41 -0800 (PST)
Received: from mail.netbsd.org (mail.netbsd.org [155.53.1.253])
	by mx1.FreeBSD.org (Postfix) with SMTP id 98A0443F79
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:11:41 -0800 (PST)
	(envelope-from wrstuden@netbsd.org)
Received: (qmail 15315 invoked by uid 1130); 1 Feb 2003 00:11:40 -0000
Date: Fri, 31 Jan 2003 16:11:29 -0800 (PST)
From: Bill Studenmund <wrstuden@netbsd.org>
X-X-Sender:  <wrstuden@vespasia.home-net.icnt.net>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: <freebsd-fs@freebsd.org>, <tech-kern@netbsd.org>
Subject: Re: DEV_B_SIZE 
In-Reply-To: <E6AEE678-3558-11D7-B26B-00306548867E@maxtor.com>
Message-ID: <Pine.NEB.4.33.0301311545180.4728-100000@vespasia.home-net.icnt.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Fri, 31 Jan 2003, Steve Byan wrote:

> I keep getting a response that reads like "we'll detect the larger
> block size and run with it".  I'm concerned that I'm not being clear
> that IDEMA is thinking of proposing a backward-compatibility mode with
> the presumption that it will work fine (albeit slowly) with existing
> binaries, i.e. code that hasn't been modified to be aware of the larger
> block size.
>
> If you think there are no functional problems with this
> backwards-compatibility scenario, including during recovery (fsck or
> journal roll-forward), I'd be happy to hear a clear "no problem".

I think Stephan Uphof hit on the main issues. I think there are functional
problems with this, but that it may be useful in some situations. It just
needs a BIG warning.

Note I am assuming that if there's an error writing a 512-byte sector the
full 4k sector will have issues. If that is avoided (say only the 512-byte
area actually has an issue) then things are fine.

I think the main place that problems will arise is that methods to reduce
error exposure won't necessarily work. Methods that try to resist single-
sector errors, say by making multiple copies of data, will need to know
that the single-sector error size (how much data goes away) is 4k, not
512 bytes. Exactly how many programs use these methods is not something I
know, so I can't tell you exactly what the exposure is.
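
As a rough illustration of that point (not taken from any real tool; the
function names and the 512/4096 sizes are assumptions drawn from this
thread), a program that keeps redundant copies could check whether two
copies would be lost together:

```python
LOGICAL = 512      # logical block size exposed by the interface (bytes)
PHYSICAL = 4096    # assumed physical sector size hidden behind it (bytes)

def physical_sector(lba, logical=LOGICAL, physical=PHYSICAL):
    """Return the index of the physical sector that holds logical block `lba`."""
    return (lba * logical) // physical

def copies_share_fate(lba_a, lba_b):
    """True if one failed physical-sector write could destroy both copies."""
    return physical_sector(lba_a) == physical_sector(lba_b)
```

With these assumed sizes, logical blocks 0 through 7 land in the same
physical sector, so copies placed at blocks 0 and 7 would fail together,
while copies at blocks 7 and 8 would not.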

The fact that the errors from a 4k re-write failing are not unheard of
isn't the issue. phk is right that that just looks like multiple sectors
dying. The problem is that we would have multiple-sector-death happening
with single-sector failure dynamics.

If you want this to not be an issue 100%, then just put a battery-backed
up cache on the device. Note I'm not saying back up the write cache, just
have a cache of the last area(s) being written. We're talking maybe 8k of
cache plus checksumming plus the logical block addresses. Shouldn't be
hard (read: should be cheap in mass quantities) to make a battery back up
something that small. Use a rechargeable battery, and just say that if you
lose power while writing, you should restore power within say a month or
a few months to let said cache drain.

With well-tuned CMOS, you might even be able to get away with just static
charge or a capacitor for power storage.

Take care,

Bill




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Fri Jan 31 16:13:25 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EC2E337B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:13:23 -0800 (PST)
Received: from sccrmhc01.attbi.com (sccrmhc01.attbi.com [204.127.202.61])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2F50243F79
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:13:23 -0800 (PST)
	(envelope-from julian@elischer.org)
Received: from InterJet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4])
          by sccrmhc01.attbi.com (sccrmhc01) with ESMTP
          id <20030201001316001008a0l0e>; Sat, 1 Feb 2003 00:13:17 +0000
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id QAA48166;
	Fri, 31 Jan 2003 16:13:15 -0800 (PST)
Date: Fri, 31 Jan 2003 16:13:13 -0800 (PST)
From: Julian Elischer <julian@elischer.org>
To: David Schultz <dschultz@uclink.berkeley.edu>
Cc: Steve Byan <stephen_byan@maxtor.com>, freebsd-fs@FreeBSD.ORG,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
In-Reply-To: <20030131234932.GA16959@HAL9000.homeunix.com>
Message-ID: <Pine.BSF.4.21.0301311611190.47169-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org



On Fri, 31 Jan 2003, David Schultz wrote:

> Thus spake Julian Elischer <julian@elischer.org>:
> > > contents of the sector are there.  I think we would need to
> > > implement journalling to ensure integrity if hard drives were
> > > likely to corrupt sectors on power failure.  (How often do they do
> > > this right now, and how often would they with 4K sectors?)
> > 
> > 
> > in this case the journal would have to not only include the block being
> > written, but data on each side of it that may be in the same 4k.
> > that implies a read..
> 
> If you had to do that, then nearly every write would be a
> read-modify-write cycle.  It would be far less painful
> to use 4K blocks or larger and align filesystem blocks
> to disk sectors.

exactly..

But this is a case where "a filesystem using 512 byte blocks
would behave significantly differently with one of these drives"

which is what he was asking.
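
To make the read-modify-write point concrete, here is a small sketch
(hypothetical helper names, a 4K physical sector assumed) that reports
whether a write needs a read first, and which byte range a journal would
have to cover to protect the neighbouring data:

```python
def needs_read_modify_write(offset, length, physical=4096):
    """True if a write of `length` bytes at byte `offset` only partly
    covers at least one physical sector (assumes length > 0)."""
    return offset % physical != 0 or length % physical != 0

def covering_range(offset, length, physical=4096):
    """Return (start, end) byte offsets of the whole physical sectors
    touched by the write; `end` is exclusive."""
    start = (offset // physical) * physical
    end = -(-(offset + length) // physical) * physical  # round up
    return start, end
```

Under these assumptions a 512-byte write at byte offset 512 needs a read
first and touches the range 0 to 4096, while an aligned 4096-byte write
touches exactly its own range and needs no read.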

> 




From owner-freebsd-fs  Fri Jan 31 16:34:42 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D49AE37B406
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:34:38 -0800 (PST)
Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188])
	by mx1.FreeBSD.org (Postfix) with ESMTP id B997F43F75
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:34:36 -0800 (PST)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0203.cvx21-bradley.dialup.earthlink.net ([209.179.192.203] helo=mindspring.com)
	by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)
	id 18elc7-00063u-00; Fri, 31 Jan 2003 16:34:32 -0800
Message-ID: <3E3B1582.39463573@mindspring.com>
Date: Fri, 31 Jan 2003 16:32:02 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
References: <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4fec0bdacb27578085064db9f0561ec03a2d4e88014a4647c350badd9bab72f9c350badd9bab72f9c
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Steve Byan wrote:
> There's a notion afoot in IDEMA to enlarge the underlying physical
> block size of disks to 4096 bytes while keeping a 512-byte logical
> block size for the interface. Unaligned accesses would involve either a
> read-modify-write or some proprietary mechanism that provides
> persistence without the latency cost of a read-modify-write.
> 
> Performance issues aside, it occurs to me that hiding the underlying
> physical block size may break many careful-write and
> transaction-logging mechanisms, which may depend on no more than one
> block being corrupted during a failure. In IDEMA's proposal, a power
> failure during a write of a single 512-byte logical block could result
> in the corruption of the full 4K block, i.e. reads of any of the
> 512-byte logical blocks in that 4K physical block  would return an
> uncorrectable ECC error.
> 
> I'd appreciate hearing examples where hiding the underlying physical
> block size would break a file system, database, transaction processing
> monitor, or whatever.  Please let me know if I may forward your reply
> to the committee. Thanks.

UFS directory operations are on the basis of physical disk blocks,
which are assumed to be DEV_BSIZE in size (512b).  At a minimum,
changing the atomic unit size to 4096 would break the I/O path.

The reason this would break is that the atomic write guarantee is
used to ensure that single-sector changes are recorded atomically.
This is important in rename operations from a short name to a longer
name, where the new name is allocated as a hard link in the new block;
the place this becomes problematic is where the new block and the old
block are the same block, unknown to the software.

The transaction in question is atomic file replacement; it involves:

	name	- name of the file
	name.1	- name of the file whose contents are to atomically
		  replace the contents of "name"
	name.2	- name of intermediate file for use in transaction
		  rollback/forward

The transaction is:

	---------------------------	-----------------------------
	files				view
	---------------------------	-----------------------------
	name				name
	+name.1				name	name.1
	explicit_sync(name.1)		name	name.1
	name	->	name.2		name	name.1	name.2
						name.1	name.2
	name	<-	name.1		name	name.1	name.2
					name		name.2
	-name.2				name
	---------------------------	-----------------------------

The failure recovery is:

	---------------------------	-----------------------------
	view				process
	---------------------------	-----------------------------
	name				[NULL]
	name	name.1			[ROLL BACK(partial file?)]
					-name.1
	name	name.1	name.2		[ROLL FORWARD]
					-name
					name	<-	name.1
					-name.2
	name		name.2		[ROLL FORWARD]
					-name.2
	---------------------------	-----------------------------
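
The recovery table can be read as a lookup from the set of names found
after a crash to the listed steps. A minimal sketch, using the table's own
notation for the steps; the function name is made up, and any view the
table does not list (for example name.1 and name.2 without name) is
returned as None rather than guessed at:

```python
def recovery_steps(present):
    """Map the names present after a crash to the steps from the table.

    Returns a list of steps in the table's notation, an empty list for
    the [NULL] case, or None if the view is not covered by the table.
    """
    table = {
        frozenset({"name"}): [],                                  # [NULL]
        frozenset({"name", "name.1"}): ["-name.1"],               # [ROLL BACK]
        frozenset({"name", "name.1", "name.2"}):                  # [ROLL FORWARD]
            ["-name", "name <- name.1", "-name.2"],
        frozenset({"name", "name.2"}): ["-name.2"],               # [ROLL FORWARD]
    }
    return table.get(frozenset(present))
```

For instance, finding name and name.1 after a crash yields the single
roll-back step "-name.1".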

Currently, UFS is subject to damage through corruption of data in a
pending transaction.  A corrupt sector destroys data.  But this is a
weakness of UFS, and is not a uniform weakness of all FS's that must
provide the same transactional guarantees to the applications, for
the purposes of recovery.

In a journalling or log structured FS, the failure of a write of a
sector of data -- or rather, an extent or log or journal line -- is
recoverable: you get the previous contents, because the journal line
has not been replaced with new contents with a newer date stamp.  The
result is that it backs the transaction out for you.  But this is still
potentially a partial back-out, which can leave us with any of the views
of the directory contents, which we need to use to discern our recovery
strategy ([NULL]/[ROLL BACK]/[ROLL FORWARD]).

The risk is much higher in this case, in that the logging extents may
in fact be adjacent, and span the 4K boundary, while only being self
protecting from spanning a 512b boundary.  The net effect of this is
that rather than guaranteeing to only damage a single extent, you may
damage two extents containing pre- and post-operation data.  Unless
the filesystem maintains extents two back, or goes out of its way to
ensure non-adjacency (can this be done, in the face of sector sparing?),
this type of failure is unrecoverable.

The main issue with this is that you can not ensure physical alignment
of the underlying logical device that is acting as a backing store for
the FS.  This was and is a common performance problem for demand paged
virtual memory using OS's: MSDOS FAT FS's on drives that claim an odd
numbered physical sector count per track result in the first partition
being on an odd 512b boundary.  The result is that physical pages in
memory are spanned by every third 1K FS block, because they are offset
by 512b from the start of the disk.
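
The alignment problem can be checked with simple arithmetic. A sketch,
assuming byte offsets, a 4K physical block, and hypothetical function
names; the 63-sector partition offset in the note below is only an
illustrative value:

```python
def straddles(block_index, block_size, partition_offset, physical=4096):
    """True if FS block `block_index` crosses a physical block boundary."""
    start = partition_offset + block_index * block_size
    end = start + block_size - 1          # last byte of the FS block
    return start // physical != end // physical

def straddling_fraction(block_size, partition_offset, physical=4096, sample=1024):
    """Fraction of the first `sample` FS blocks that cross a boundary."""
    hits = sum(straddles(i, block_size, partition_offset, physical)
               for i in range(sample))
    return hits / sample
```

With a partition starting 63 sectors (32256 bytes) into the disk,
straddling_fraction(4096, 63 * 512) comes out as 1.0, i.e. every 4K FS
block crosses a physical block boundary; with an offset of 0 it is 0.0.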

So even if you are not considering the single sector issue as a design
flaw in UFS, and even if requiring recompilation is acceptable (it is,
IMO), you can't necessarily avoid the failure case.

Note: This is not an exhaustive list, this is just off the top of my
head; I could probably come up with other scenarios, as well... e.g. at
the very least, for FAT, you would probably be screwed with a number
larger than 1K, even if you were careful to make sure that the sectors
per track was an even multiple of your physical block size, since the
FAT entry in FAT FS's *is* the inode.

-- Terry



From owner-freebsd-fs  Fri Jan 31 16:45:38 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 843F537B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:45:37 -0800 (PST)
Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D975343E4A
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 16:45:36 -0800 (PST)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0203.cvx21-bradley.dialup.earthlink.net ([209.179.192.203] helo=mindspring.com)
	by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)
	id 18elmm-00001U-00; Fri, 31 Jan 2003 16:45:33 -0800
Message-ID: <3E3B1857.2122B84F@mindspring.com>
Date: Fri, 31 Jan 2003 16:44:07 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Julian Elischer <julian@elischer.org>
Cc: Steve Byan <stephen_byan@maxtor.com>, freebsd-fs@FreeBSD.ORG,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
References: <Pine.BSF.4.21.0301311002110.45015-100000@InterJet.elischer.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4fa5be0a4ef0945269477e932dc92a1a8350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Julian Elischer wrote:
> I presume that if such a drive were made, there would be some way to
> identify it?
> 
> It would be very easy to configure a filesystem to have a minimum
> writable unit size of 4k, and I assume that doing so would be
> slightly advantageous. (no Read/modify/write). it would however
> be good if we could easily identify when doing so was a good idea.

Substantial modifications would be required to the UFS directory
management code to support both old and new disks in the same
machine with the same FS code.

Assuming that was addressed by making the DEV_BSIZE define into a
variable based on the underlying device, there's the problem of
device concatenation.  Your devices would have to be made up of
homogeneous components, too, so once you got them to coexist with
old disks, you would still not be able to get them to aggregate
with them, in, e.g., a RAID 0, and maybe not in any RAID set.


> I'd say that this means that the drive should hold the active 4k block
> in nvram or something..

This would be very useful, but unlikely in the extreme, I think,
because of the associated costs.  8-(.  But it would be very, very
useful.


-- Terry



From owner-freebsd-fs  Fri Jan 31 17:12:54 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DF5B937B401
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 17:12:52 -0800 (PST)
Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5F27A43F43
	for <freebsd-fs@freebsd.org>; Fri, 31 Jan 2003 17:12:52 -0800 (PST)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0052.cvx40-bradley.dialup.earthlink.net ([216.244.42.52] helo=mindspring.com)
	by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)
	id 18emD9-0001P6-00; Fri, 31 Jan 2003 17:12:48 -0800
Message-ID: <3E3B1E96.B76237AD@mindspring.com>
Date: Fri, 31 Jan 2003 17:10:46 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Nathan Hawkins <utsl@quic.net>
Cc: Steve Byan <stephen_byan@maxtor.com>, freebsd-fs@FreeBSD.ORG,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
References: <Pine.3.89.10301311357.A20439-0100000@uranium.vaxpower.org> <B8AB7B3A-354F-11D7-B26B-00306548867E@maxtor.com> <20030131195042.GD6243@quic.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4dd2b123101f54ac3b9604623d3bb7044350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

Nathan Hawkins wrote:
> You might want to talk with Veritas. I'm pretty sure their Volume
> Manager's log subdisks assume 512-byte sectors.

Yes, this is true.  It would cause a problem for VXFS, at least
the VXFS whose source code I disked around with for USL's use on
UnixWare; almost all the directory entry management code is
verbatim from the USL UFS sources.

I know that AIX *would not* have a problem on the old HPFS, but the
OS/2 HPFS might have a problem.  I think Solaris, and anyone else
using a UFS derived FS would probably have a problem with directory
entry management, and for those areas I've already noted.  I don't
know if the NXFS I wrote for Novell's NetWare for UNIX product is
still in use anywhere, or not, these days, but if it is, then it
would have a problem, too, both in directory ops, and in secondary
inode management for EA's and resource forks.

The SGI XFS people, Novell, and the GFS people would also be good
ones to ask for input.  Microsoft and Apple, too, if it weren't
obvious.  8-).


> More generally, what impact would this have on existing RAID
> implementations, hardware or software? This is a potentially more
> damaging impact than filesystem semantics.

The real question is sector sparing, when it comes to that, and
whether it's on 4K boundaries or not, etc..  For the most part,
RAID that does parity should not care, but RAID 0 and 1 may be
a problem during a power failure, unless PHK's issue about the
write caching, and the inability to disconnect the bus on the
data portion of the write, is fixed.

-- Terry



From owner-freebsd-fs  Fri Jan 31 17:22:16 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 20ADF37B401; Fri, 31 Jan 2003 17:22:15 -0800 (PST)
Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 388B243F93; Fri, 31 Jan 2003 17:22:14 -0800 (PST)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0030.cvx21-bradley.dialup.earthlink.net ([209.179.192.30] helo=mindspring.com)
	by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)
	id 18emMC-0002qL-00; Fri, 31 Jan 2003 17:22:08 -0800
Message-ID: <3E3B20BF.B6F0BC6E@mindspring.com>
Date: Fri, 31 Jan 2003 17:19:59 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: phk@freebsd.org
Cc: Julian Elischer <julian@elischer.org>,
	Steve Byan <stephen_byan@maxtor.com>, freebsd-fs@freebsd.org,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
References: <31671.1044043107@critter.freebsd.dk>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a42ee81c9e74eb11e50fc2b86576933bd5666fa475841a1c7a350badd9bab72f9c350badd9bab72f9c
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

phk@freebsd.org wrote:
> In message <Pine.BSF.4.21.0301311144370.45015-100000@InterJet.elischer.org>,
> Julian Elischer writes:
> >One thing I thought of is that it is not uncommon to 'dd' an entire
> >filesystem from one partition to another.
> >If we create a filesystem that is 'aligned' and we copy it to be
> >'unaligned', we'd have a sudden performance drop for no immediately
> >obvious reason. What was one write, would become a 2-sector read,
> >modify and 2-sector write. Especially when copying from one failing
> >drive to another with slightly different characteristics.
> 
> If you run dd without bs=ALOT you deserve bad throughput.

I think he means that the performance of the resulting FS, if
it had expectations of running on a 4K block size, and got a
512b one instead, would be unexpected (e.g. the only difference
between the disks is a "Q" or "R" at the end of the disk model
number, etc.).
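
Whether a dd copy keeps a filesystem aligned depends only on where the two
partitions start relative to the physical block size. A sketch under
assumed values (512-byte logical blocks, 4K physical blocks; the function
name is hypothetical):

```python
def copy_preserves_alignment(src_start_lba, dst_start_lba,
                             logical=512, physical=4096):
    """True if FS blocks keep the same position relative to physical
    block boundaries after being copied to the destination partition."""
    src_phase = (src_start_lba * logical) % physical
    dst_phase = (dst_start_lba * logical) % physical
    return src_phase == dst_phase
```

Under these assumptions, copying from a partition starting at logical
block 0 to one starting at block 8 keeps the alignment, while copying to
one starting at block 63 does not.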

The real answer, if that's what you mean, Julian, is that the
FS is not likely to be transportable between the devices, or,
minimally, from a 512b to a 4K, because of the existing data
not having taken the 4K alignment issues into account (e.g.
directories would be an even multiple of 512b in length, rather
than an even multiple of 4K in length).  From a 4K to a 512b,
there might also be an offset issue, if they were not treated
internally as if they were 512b on 4K systems, for data storage,
and only treated as 4K for atomicity.

My recommendation would be to indicate doing this is no longer
supported between drives of different physical block sizes.

FWIW, the original NEC PC98 disks were 1K physical block size
disks.  It might be worthwhile to ask the PC98 folks about
problems, but I'm going to guess that none of their fictitious
geometries, before they moved to using standard disks, was
ever an odd sector count per track.

-- Terry



From owner-freebsd-fs  Fri Jan 31 18:27:26 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2219F37B401; Fri, 31 Jan 2003 18:27:25 -0800 (PST)
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id C32D343F43; Fri, 31 Jan 2003 18:27:23 -0800 (PST)
	(envelope-from grog@lemis.com)
Received: by wantadilla.lemis.com (Postfix, from userid 1004)
	id 0D4F651987; Sat,  1 Feb 2003 12:57:17 +1030 (CST)
Date: Sat, 1 Feb 2003 12:57:16 +1030
From: Greg 'groggy' Lehey <grog@FreeBSD.org>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: phk@freebsd.org, freebsd-fs@freebsd.org, tech-kern@netbsd.org
Subject: Track buffering (was: DEV_B_SIZE)
Message-ID: <20030201022716.GO92530@wantadilla.lemis.com>
References: <2639.1044031853@critter.freebsd.dk> <F4D99E08-353D-11D7-B26B-00306548867E@maxtor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <F4D99E08-353D-11D7-B26B-00306548867E@maxtor.com>
User-Agent: Mutt/1.4i
Organization: The FreeBSD Project
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.FreeBSD.org/
X-PGP-Fingerprint: 9A1B 8202 BCCE B846 F92F  09AC 22E6 F290 507A 4223
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

On Friday, 31 January 2003 at 12:03:44 -0500, Steve Byan wrote:
>
> On Friday, January 31, 2003, at 11:50  AM, phk@freebsd.org wrote:
>> It was my impression that already many drives write entire tracks
>> as atomic units, at least we have had plenty of anecdotal evidence
>> to this effect ?
>
> I'm not aware of any SCSI or ATA disks which do this; certainly no
> Maxtor disk does. Count-key-data mainframe disks can be formatted to do
> so, but such disks probably don't run Unix. Caching in ATA disks might
> lead one to believe that the disk could corrupt an entire track, in the
> sense that a panic ( aka bluescreen) or a power-failure would cause all
> pending writes in its buffer to be lost, but even in ATA-land I don't
> believe a power failure would result in more than one disk block
> returning an uncorrectable read error.

A couple of years back I did some power fail testing on IBM IDE
drives.  On one occasion I managed to blow out a whole range of
sectors (about 80), which I attributed to trashing a track buffer.

Greg
--
See complete headers for address and phone numbers



From owner-freebsd-fs  Sat Feb  1  0:40:48 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 36B9137B422
	for <freebsd-fs@FreeBSD.ORG>; Sat,  1 Feb 2003 00:40:43 -0800 (PST)
Received: from host213-122-108-127.in-addr.btopenworld.com (host213-122-108-127.in-addr.btopenworld.com [213.122.108.127])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2552F43FA7
	for <freebsd-fs@FreeBSD.ORG>; Sat,  1 Feb 2003 00:40:36 -0800 (PST)
	(envelope-from dsl@l8s.co.uk)
Received: (from dsl@localhost)
	by snowdrop.l8s.co.uk (8.11.6/8.11.6) id h118ise01592;
	Sat, 1 Feb 2003 08:44:54 GMT
Date: Sat, 1 Feb 2003 08:44:54 +0000
From: David Laight <david@l8s.co.uk>
To: Steve Byan <stephen_byan@maxtor.com>
Cc: freebsd-fs@FreeBSD.ORG, tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
Message-ID: <20030201084454.A1388@snowdrop.l8s.co.uk>
References: <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>; from stephen_byan@maxtor.com on Fri, Jan 31, 2003 at 11:30:18AM -0500
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

The only reason I can see for supporting 512byte reads is to allow
them to be used as system disks without requiring a BIOS update.

I suspect that the only reason that the BSD systems don't support
sector sizes other than 512 is a lack of test media.
Indeed someone has recently gone through the netbsd code getting
it to work with (IIRC) 1k blocks for a specific disk.

With a test sample the ffs support would be fixed in a few days,
and probably backported to recent releases within a few weeks.

No one using windows will care :-) you could lock the ATA bus
a few times a day and they'd just reset and continue. :-)

	David

-- 
David Laight: david@l8s.co.uk



From owner-freebsd-fs  Sat Feb  1  1:59:10 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E573337B401
	for <freebsd-fs@FreeBSD.ORG>; Sat,  1 Feb 2003 01:59:09 -0800 (PST)
Received: from chylonia.3miasto.net (chylonia.3miasto.net [217.96.12.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AA46B43F3F
	for <freebsd-fs@FreeBSD.ORG>; Sat,  1 Feb 2003 01:59:03 -0800 (PST)
	(envelope-from wojtek@tensor.3miasto.net)
Received: from localhost (localhost [[UNIX: localhost]])
	by chylonia.3miasto.net (8.11.6/8.11.6) with ESMTP id h119wYh01207;
	Sat, 1 Feb 2003 10:58:34 +0100 (CET)
X-Authentication-Warning: chylonia.3miasto.net: wojtek owned process doing -bs
Date: Sat, 1 Feb 2003 10:58:34 +0100 (CET)
From: Wojciech Puchar <wojtek@tensor.3miasto.net>
X-X-Sender: wojtek@chylonia.3miasto.net
To: David Laight <david@l8s.co.uk>
Cc: Steve Byan <stephen_byan@maxtor.com>, freebsd-fs@FreeBSD.ORG,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE
In-Reply-To: <20030201084454.A1388@snowdrop.l8s.co.uk>
Message-ID: <Pine.NEB.4.51.0302011057270.1027@chylonia.3miasto.net>
References: <4912E0FE-3539-11D7-B26B-00306548867E@maxtor.com>
 <20030201084454.A1388@snowdrop.l8s.co.uk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

> them to be used as system disks without requiring a BIOS update.
>
> I suspect that the only reason that the BSD systems don't support
> sector sizes other than 512 is a lack of test media.

this is not true. older SCSI drives allow formatting with 1K sectors,
CDROM's are 2K and EMULATED usually as 512b by netbsd, magneto-opticals
are up to 4KB (and don't work in NetBSD because of that).

> Indeed someone has recently gone through the netbsd code getting
> it to work with (IIRC) 1k blocks for a specific disk.
>
> With a test sample the ffs support would be fixed in a few days,
> and probably backported to recent releases within a few weeks.
>
> No one using windows will care :-) you could lock the ATA bus
exactly



From owner-freebsd-fs  Sat Feb  1  5:18: 1 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 475B037B401
	for <freebsd-fs@freebsd.org>; Sat,  1 Feb 2003 05:18:00 -0800 (PST)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5D75043F43
	for <freebsd-fs@freebsd.org>; Sat,  1 Feb 2003 05:17:59 -0800 (PST)
	(envelope-from phk@freebsd.org)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h11DHq4W018172;
	Sat, 1 Feb 2003 14:17:57 +0100 (CET)
	(envelope-from phk@freebsd.org)
To: David Laight <david@l8s.co.uk>
Cc: Steve Byan <stephen_byan@maxtor.com>, freebsd-fs@freebsd.org,
	tech-kern@netbsd.org
Subject: Re: DEV_B_SIZE 
From: phk@freebsd.org
In-Reply-To: Your message of "Sat, 01 Feb 2003 08:44:54 GMT."
             <20030201084454.A1388@snowdrop.l8s.co.uk> 
Date: Sat, 01 Feb 2003 14:17:52 +0100
Message-ID: <18171.1044105472@critter.freebsd.dk>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-fs.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-fs>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-fs>
X-Loop: FreeBSD.org

In message <20030201084454.A1388@snowdrop.l8s.co.uk>, David Laight writes:
>The only reason I can see for supporting 512byte reads is to allow
>them to be used as system disks without requiring a BIOS update.
>
>I suspect that the only reason that the BSD systems don't support
>sector sizes other than 512 is a lack of test media.

What gave you the impression that we don't support anything but 512 bytes ?

I'm running a 2k sectorsize device right now.


-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
