From owner-freebsd-fs  Fri Oct  2 02:34:06 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id CAA23472
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 02:34:06 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from gatekeeper.tsc.tdk.com (gatekeeper.tsc.tdk.com [207.113.159.21])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id CAA23467
          for <freebsd-fs@FreeBSD.ORG>; Fri, 2 Oct 1998 02:34:03 -0700 (PDT)
          (envelope-from gdonl@tsc.tdk.com)
Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191])
	by gatekeeper.tsc.tdk.com (8.8.8/8.8.8) with ESMTP id CAA10336;
	Fri, 2 Oct 1998 02:33:45 -0700 (PDT)
	(envelope-from gdonl@tsc.tdk.com)
Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194])
	by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id CAA04444;
	Fri, 2 Oct 1998 02:33:44 -0700 (PDT)
Received: (from gdonl@localhost)
	by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id CAA15261;
	Fri, 2 Oct 1998 02:33:42 -0700 (PDT)
From: Don Lewis <Don.Lewis@tsc.tdk.com>
Message-Id: <199810020933.CAA15261@salsa.gv.tsc.tdk.com>
Date: Fri, 2 Oct 1998 02:33:42 -0700
In-Reply-To: Terry Lambert <tlambert@primenet.com>
       "Re: vm system interaction with nullfs" (Sep 13, 10:43pm)
X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95)
To: Terry Lambert <tlambert@primenet.com>, Don.Lewis@tsc.tdk.com (Don Lewis)
Subject: Re: vm system interaction with nullfs
Cc: freebsd-fs@FreeBSD.ORG
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sep 13, 10:43pm, Terry Lambert wrote:
} Subject: Re: vm system interaction with nullfs
} > Since the vm system keeps track of what it has in memory by (vnode, offset),
} > how is this supposed to work when stackable filesystems are in use which
} > create multiple vnodes for a single filesystem object, or is this broken?
} 
} Yes, this is still broken.
} 
} This was the primary reason for the migration of putpages/getpages
} into all "bottom-of-stack" FS's.
} 
} The general fix is to create a "getfinalvp".  This would allow you
} to page through an object, while allowing layers stacked on top to
} dictate layout/content.

I started down this path and created VOP_GETBACKINGVP.  This part was
pretty easy.  The problem is that there are a zillion references to
vp->v_object all over the place that would need to be fixed.
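
A minimal user-space sketch of the walk-down idea, assuming a simplified
vnode that records only the vnode of the layer below it and an optional
VM object. The struct layout and the name get_backing_vp() are
hypothetical; the actual VOP_GETBACKINGVP interface is not shown in this
thread. The point is that the VM system would index its pages by the
vnode the walk returns, so every layer of a nullfs stack resolves to one
object:

```c
#include <stddef.h>

/* Hypothetical, heavily simplified vnode; not the real struct vnode. */
struct vnode {
    struct vnode *lower;     /* vnode of the layer below; NULL at the bottom */
    void         *v_object;  /* VM object; set only on the bottom vnode here */
};

/*
 * Hypothetical helper in the spirit of VOP_GETBACKINGVP: walk down a
 * transparent (nullfs-like) stack to the bottom vnode, whose v_object
 * the VM system would use for its (vnode, offset) page lookups.
 */
static struct vnode *
get_backing_vp(struct vnode *vp)
{
    while (vp->lower != NULL)
        vp = vp->lower;
    return vp;
}
```

For two nullfs layers stacked over a UFS vnode, calling get_backing_vp()
on either nullfs vnode returns the UFS vnode, and with it the single VM
object that both layers should share.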

} Another part of the puzzle is that the vnode locking is overly
} complex (VOP_LOCK).

Yes.  This is another one of those places where I get lost in a maze of
twisty little passages ...

} > It would seem that in the case of nullfs and similar transparent filesystems,
} > the vm system should always use the lowest vnode, but this doesn't seem to
} > be implemented (though I could just be getting lost in the maze of twisty
} > little passages).  It's even messier if the layer isn't transparent,
} > like an encryption layer.
} 
} Not in all cases, actually.  For a cryptographic FS (such as the
} one written by one of John Heidemann's students, which John sent me),
} you will want a backing object for the unencrypted data, separate from
} the backing object of the on-disk data.
} 
} There are also cases where the in-core data and the backing data
} aren't the same size.  For some of these (like a compression layer),
} you would implement this via the compression layer's vp's get/putpages,
} and not operate on the backing store directly.

It looks like some clues to how to properly implement this are in the
paper you mention.
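
Terry's crypto and compression examples suggest the walk cannot always
go all the way to the bottom. One hypothetical way to express that is a
per-layer flag saying the layer keeps its own VM object; the walk then
stops at the first such layer. The own_object flag and get_object_vp()
are illustrative assumptions, not anything in the FreeBSD tree:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified vnode; field names are illustrative only. */
struct vnode {
    struct vnode *lower;       /* vnode of the layer below; NULL at bottom */
    bool          own_object;  /* layer caches its own view of the data */
};

/*
 * Variant of the walk-down: skip transparent layers (nullfs), but stop
 * at the first layer that keeps its own VM object.  A crypto layer wants
 * pages of plaintext separate from the ciphertext below it; a
 * compression layer's in-core size differs from its backing store.
 */
static struct vnode *
get_object_vp(struct vnode *vp)
{
    while (!vp->own_object && vp->lower != NULL)
        vp = vp->lower;
    return vp;
}
```

With nullfs stacked on a crypto layer stacked on UFS, the walk from the
nullfs vnode ends at the crypto vnode, so cached pages hold plaintext
rather than the on-disk ciphertext.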

} Really, you should go to ftp.cs.ucla.edu, and read up on the
} stacking architecture.  The documents in the "ficus" directory
} are the actual design documents for the 4.4BSD stacking
} architecture.

Got it.  I thought I'd read this paper before (a year or so ago), but
apparently I had snagged an earlier version.  The latest one,
UCLA-CSD-950032, is much more complete.  Unfortunately, I only
managed to read the first quarter before being distracted by other
fires.

My immediate needs don't require the full functionality of stackable
filesystems, or even all that nullfs is capable of.  I really need to
finish reading this paper so I can find out if featherweight layering
meets my requirements, since that might be easier to get working.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs  Fri Oct  2 02:52:56 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id CAA25512
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 02:52:56 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from gatekeeper.tsc.tdk.com (gatekeeper.tsc.tdk.com [207.113.159.21])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id CAA25486;
          Fri, 2 Oct 1998 02:52:54 -0700 (PDT)
          (envelope-from gdonl@tsc.tdk.com)
Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191])
	by gatekeeper.tsc.tdk.com (8.8.8/8.8.8) with ESMTP id CAA10467;
	Fri, 2 Oct 1998 02:52:37 -0700 (PDT)
	(envelope-from gdonl@tsc.tdk.com)
Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194])
	by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id CAA04737;
	Fri, 2 Oct 1998 02:52:35 -0700 (PDT)
Received: (from gdonl@localhost)
	by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id CAA15297;
	Fri, 2 Oct 1998 02:52:34 -0700 (PDT)
Date: Fri, 2 Oct 1998 02:52:34 -0700 (PDT)
From: Don Lewis <Don.Lewis@tsc.tdk.com>
Message-Id: <199810020952.CAA15297@salsa.gv.tsc.tdk.com>
To: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: filesystem safety and SCSI disk write caching
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


I was doing some torture testing of softupdates, CAM, and fsck by
hitting the reset button on a running system and noticed that fsck
would sometimes encounter inconsistencies in the filesystem that
should not be happening, such as directory entries that pointed to
unallocated inodes.  I tracked the problem down to write caching
being enabled on the machine's SCSI disk.  After using camcontrol to
disable write caching, this problem went away.

It would be nice if folks in this situation would get a warning that
their filesystems could get trashed because the disk has write caching
enabled.  I think the best situation would be to issue this warning at
filesystem mount time (though folks who use async mounts shouldn't get
an extra warning about write caching, since their filesystems may get trashed
anyway).  This would require a communications channel between the
filesystem and CAM.  Another possibility would be to just issue a
brief warning when the device is probed at boot time.  Even a warning
in the documentation would be helpful, at least for those who bothered
to read it.

BTW, with softupdates and tagged command queuing enabled in CAM, there
is not much of a performance hit from turning off write caching.  I
saw "make buildworld" increase from about 2 hours to 2 hours 5 minutes,
and "make -j6 buildworld" increase from about 1 hour 30 minutes to
1 hour 35 minutes.
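
Put in relative terms, and taking the rounded times above at face value
(2 hours = 120 minutes, 1 hour 30 minutes = 90 minutes), disabling the
write cache cost roughly 4 to 6 percent in these two runs:

```latex
\frac{125 - 120}{120} \approx 4.2\%, \qquad \frac{95 - 90}{90} \approx 5.6\%
```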

From owner-freebsd-fs  Fri Oct  2 04:12:27 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id EAA07156
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 04:12:27 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from ns1.yes.no (ns1.yes.no [195.204.136.10])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id EAA07040
          for <freebsd-fs@FreeBSD.ORG>; Fri, 2 Oct 1998 04:12:15 -0700 (PDT)
          (envelope-from eivind@bitbox.follo.net)
Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218])
	by ns1.yes.no (8.9.1a/8.9.1) with ESMTP id NAA28899;
	Fri, 2 Oct 1998 13:11:53 +0200 (CEST)
Received: (from eivind@localhost)
	by bitbox.follo.net (8.8.8/8.8.6) id NAA11739;
	Fri, 2 Oct 1998 13:11:52 +0200 (MET DST)
Message-ID: <19981002131152.26322@follo.net>
Date: Fri, 2 Oct 1998 13:11:52 +0200
From: Eivind Eklund <eivind@yes.no>
To: Don Lewis <Don.Lewis@tsc.tdk.com>, Terry Lambert <tlambert@primenet.com>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: vm system interaction with nullfs
References: <tlambert@primenet.com> <199810020933.CAA15261@salsa.gv.tsc.tdk.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.89.1i
In-Reply-To: <199810020933.CAA15261@salsa.gv.tsc.tdk.com>; from Don Lewis on Fri, Oct 02, 1998 at 02:33:42AM -0700
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Fri, Oct 02, 1998 at 02:33:42AM -0700, Don Lewis wrote:
> On Sep 13, 10:43pm, Terry Lambert wrote:
> } Subject: Re: vm system interaction with nullfs
> } > Since the vm system keeps track of what it has in memory by (vnode, offset),
> } > how is this supposed to work when stackable filesystems are in use which
> } > create multiple vnodes for a single filesystem object, or is this broken?
> } 
> } Yes, this is still broken.
> } 
> } This was the primary reason for the migration of putpages/getpages
> } into all "bottom-of-stack" FS's.
> } 
> } The general fix is to create a "getfinalvp".  This would allow you
> } to page through an object, while allowing layers stacked on top to
> } dictate layout/content.
> 
> I started down this path and created VOP_GETBACKINGVP.  This part was
> pretty easy.  The problem is that there are a zillion references to
> vp->v_object all over the place that would need to be fixed.

"A zillion" == 63.  Interesting.  I've always wondered what exact
number it represents :-)

Fixing these is trivial.  I can do that in (much) less than an hour,
if there is general agreement that this is the way to go (I think it
sounds reasonable).

This time, though, I want bde's word that he won't veto it just as I'm
about to commit it.

Eivind.

From owner-freebsd-fs  Fri Oct  2 06:26:56 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id GAA24761
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 06:26:56 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from uni4nn.gn.iaf.nl (osmium.gn.iaf.nl [193.67.144.12])
          by hub.freebsd.org (8.8.8/8.8.8) with SMTP id GAA24732;
          Fri, 2 Oct 1998 06:26:39 -0700 (PDT)
          (envelope-from wilko@yedi.iaf.nl)
Received: by uni4nn.gn.iaf.nl with UUCP id AA12369
  (5.67b/IDA-1.5); Fri, 2 Oct 1998 14:59:34 +0200
Received: (from wilko@localhost) by yedi.iaf.nl (8.8.8/8.6.12) id OAA29167; Fri, 2 Oct 1998 14:06:50 +0200 (CEST)
From: Wilko Bulte <wilko@yedi.iaf.nl>
Message-Id: <199810021206.OAA29167@yedi.iaf.nl>
Subject: Re: filesystem safety and SCSI disk write caching
In-Reply-To: <199810020952.CAA15297@salsa.gv.tsc.tdk.com> from Don Lewis at "Oct 2, 98 02:52:34 am"
To: Don.Lewis@tsc.tdk.com (Don Lewis)
Date: Fri, 2 Oct 1998 14:06:50 +0200 (CEST)
Cc: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
X-Organisation: Private FreeBSD site - Arnhem, The Netherlands
X-Pgp-Info: PGP public key at 'finger wilko@freefall.freebsd.org'
X-Mailer: ELM [version 2.4ME+ PL38 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

As Don Lewis wrote...
> 
> I was doing some torture testing of softupdates, CAM, and fsck by
> hitting the reset button on a running system and noticed that fsck
> would sometimes encounter inconsistencies in the filesystem that
> should not be happening, such as directory entries that pointed to
> unallocated inodes.  I tracked the problem down to write caching
> being enabled on the machine's SCSI disk.  After using camcontrol to
> disable write caching, this problem went away.

Yuck. Write caching on disks is evil. I've discussed this kind of
thing at great length with the disk gurus at work. There is consensus:
write caching is not to be trusted, there are even firmware incarnations
out there that get confused by a SCSI bus reset and lose track of what
they have cached. Admittedly this is junk firmware, but it seems to
happen.

This is the kind of stuff that makes the Oracles of this world extremely
nervous ;-)

> It would be nice if folks in this situation would get a warning that
> their filesystems could get trashed because the disk has write caching
> enabled.  I think the best situation would be to issue this warning at
> filesystem mount time (though folks who use async mounts shouldn't get
> an extra warning about write caching, since their filesystems may get trashed
> anyway).  This would require a communications channel between the
> filesystem and CAM.  Another possibility would be to just issue a
> brief warning when the device is probed at boot time.  Even a warning
> in the documentation would be helpful, at least for those who bothered
> to read it.

You'd have to dig into the mode pages of the drives, which makes for a
somewhat kludgy interface from the SCSI subsystem all the way up to 'mount'.

I'd rather put it into the device probe section for the da devices.
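
A probe-time check mostly comes down to one bit. As we read the SCSI-2
specification, the Caching mode page (page code 08h) carries WCE (write
cache enable) in bit 2 of byte 2. The sketch below assumes the page
bytes have already been fetched with MODE SENSE and that the buffer
starts at the page itself, after the mode parameter header and any block
descriptors; write_cache_enabled() and warn_if_write_cache() are
hypothetical names, not existing CAM or da routines:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define CACHING_PAGE_CODE 0x08  /* SCSI Caching mode page */
#define CACHING_WCE       0x04  /* byte 2, bit 2: write cache enable */

/*
 * Returns true if the Caching mode page says write caching is on.
 * 'page' points at the page itself; the low 6 bits of byte 0 hold the
 * page code (bit 7 is the PS bit and is ignored here).
 */
static bool
write_cache_enabled(const unsigned char *page, size_t len)
{
    if (len < 3 || (page[0] & 0x3f) != CACHING_PAGE_CODE)
        return false;               /* not a Caching page: no verdict */
    return (page[2] & CACHING_WCE) != 0;
}

/* Hypothetical probe-time hook: one warning line per affected disk. */
void
warn_if_write_cache(const char *dev, const unsigned char *page, size_t len)
{
    if (write_cache_enabled(page, len))
        printf("%s: write cache enabled; filesystems on this disk may be "
               "inconsistent after an unclean shutdown\n", dev);
}
```

Because this runs at probe time, it cannot know whether the disk will
later carry an async mount, which is the trade-off against the
mount-time warning Don proposed.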

> BTW, with softupdates and tagged command queuing enabled in CAM, there
> is not much of a performance hit from turning off write caching.  I
> saw "make buildworld" increase from about 2 hours to 2 hours 5 minutes,
> and "make -j6 buildworld" increase from about 1 hour 30 minutes to
> 1 hour 35 minutes.

Negligible difference in my book.

Wilko
_     ______________________________________________________________________
 |   / o / /  _  Bulte 				  email: wilko@yedi.iaf.nl 
 |/|/ / / /( (_) Arnhem, The Netherlands          WWW  : http://www.tcja.nl
______________________________________________ Powered by FreeBSD __________

From owner-freebsd-fs  Fri Oct  2 09:46:52 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id JAA27222
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 09:46:52 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from pluto.plutotech.com (mail.plutotech.com [206.168.67.137])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id JAA27200;
          Fri, 2 Oct 1998 09:46:40 -0700 (PDT)
          (envelope-from gibbs@plutotech.com)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130])
	by pluto.plutotech.com (8.8.7/8.8.5) with ESMTP id KAA21621;
	Fri, 2 Oct 1998 10:46:20 -0600 (MDT)
Message-Id: <199810021646.KAA21621@pluto.plutotech.com>
X-Mailer: exmh version 2.0.2 2/24/98
To: Don Lewis <Don.Lewis@tsc.tdk.com>
cc: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching 
In-reply-to: Your message of "Fri, 02 Oct 1998 02:52:34 PDT."
             <199810020952.CAA15297@salsa.gv.tsc.tdk.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 02 Oct 1998 10:39:50 -0600
From: "Justin T. Gibbs" <gibbs@plutotech.com>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>
>I was doing some torture testing of softupdates, CAM, and fsck by
>hitting the reset button on a running system and noticed that fsck
>would sometimes encounter inconsistencies in the filesystem that
>should not be happening, such as directory entries that pointed to
>unallocated inodes.  I tracked the problem down to write caching
>being enabled on the machine's SCSI disk.  After using camcontrol to
>disable write caching, this problem went away.

This is an inconclusive result.  By disabling the cache, you have
effectively reduced the concurrent transaction count which may mask bugs
elsewhere in the system.  So long as you do not lose power to your SCSI
disk (which the reset button should not cause to occur), the cache should
have no impact on the results of your test.

>BTW, with softupdates and tagged command queuing enabled in CAM, there
>is not much of a performance hit from turning off write caching.  I
>saw "make buildworld" increase from about 2 hours to 2 hours 5 minutes,
>and "make -j6 buildworld" increase from about 1 hour 30 minutes to
>1 hour 35 minutes.

In this particular benchmark, perhaps not, but make buildworld is not
indicative of most I/O loads.

--
Justin

From owner-freebsd-fs  Fri Oct  2 12:00:38 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id MAA18200
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 12:00:38 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA18175;
          Fri, 2 Oct 1998 12:00:33 -0700 (PDT)
          (envelope-from julian@whistle.com)
Received: (from daemon@localhost)
	by alpo.whistle.com (8.8.5/8.8.5) id LAA02176;
	Fri, 2 Oct 1998 11:59:40 -0700 (PDT)
Received: from current1.whistle.com(207.76.205.22)
 via SMTP by alpo.whistle.com, id smtpdTh2170; Fri Oct  2 18:59:30 1998
Date: Fri, 2 Oct 1998 11:59:26 -0700 (PDT)
From: Julian Elischer <julian@whistle.com>
To: Don Lewis <Don.Lewis@tsc.tdk.com>
cc: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching
In-Reply-To: <199810020952.CAA15297@salsa.gv.tsc.tdk.com>
Message-ID: <Pine.BSF.3.95.981002115603.15828A-100000@current1.whistle.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

you are correct.

write caching can screw soft updates if there is any
major re-ordering of the data written.

With tags it doesn't matter if they are re-ordered, as long
as they are not acknowledged until they are on the platter.
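
A toy model may make this disagreement concrete. It assumes a disk where
"complete" means "in cache" when the write cache is on (as Justin
explains later in the thread) and "on the platter" when it is off;
soft-updates-style ordering, where the directory block is issued only
after the inode block is acknowledged; and the worst case for an enabled
cache: the drive destages the directory block first and then loses the
rest of its cache, whether through power loss or through firmware that
drops its cache on a bus reset, as Wilko describes. All names are
hypothetical; this is not CAM or softupdates code:

```c
#include <stdbool.h>

enum { INODE_BLK, DIR_BLK, NBLK };

struct disk {
    bool write_cache;       /* WCE: acknowledge once data is in cache */
    bool on_media[NBLK];    /* what survives losing the cache */
    bool in_cache[NBLK];    /* acknowledged but not yet on media */
};

/* Returns once the write is "complete" in the drive's sense. */
static void
disk_write(struct disk *d, int blk)
{
    if (d->write_cache)
        d->in_cache[blk] = true;  /* ack now, destage later */
    else
        d->on_media[blk] = true;  /* ack only once on the platter */
}

/* Destage one cached block, then lose every other cached block. */
static void
destage_one_then_lose_cache(struct disk *d, int blk)
{
    if (d->in_cache[blk])
        d->on_media[blk] = true;
    for (int i = 0; i < NBLK; i++)
        d->in_cache[i] = false;
}

/* fsck's view: no directory entry may point at an unwritten inode. */
static bool
consistent_after_crash(bool write_cache)
{
    struct disk d = { .write_cache = write_cache };

    disk_write(&d, INODE_BLK);   /* inode block first ...            */
    disk_write(&d, DIR_BLK);     /* ... directory block after the ack */
    destage_one_then_lose_cache(&d, DIR_BLK);
    return !(d.on_media[DIR_BLK] && !d.on_media[INODE_BLK]);
}
```

With the cache disabled the dependency holds; with it enabled, this
worst-case schedule leaves a directory entry pointing at an unallocated
inode, which is the symptom Don reported.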

As you noted, softupdates itself does reduce the value of 
drive write caching, and in fact I'm glad that your numbers agree
with my expectations.

julian

From owner-freebsd-fs  Fri Oct  2 12:30:33 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id MAA21755
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 12:30:33 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from pluto.plutotech.com (mail.plutotech.com [206.168.67.137])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA21737;
          Fri, 2 Oct 1998 12:30:21 -0700 (PDT)
          (envelope-from gibbs@plutotech.com)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130])
	by pluto.plutotech.com (8.8.7/8.8.5) with ESMTP id NAA11650;
	Fri, 2 Oct 1998 13:29:57 -0600 (MDT)
Message-Id: <199810021929.NAA11650@pluto.plutotech.com>
X-Mailer: exmh version 2.0.2 2/24/98
To: Wilko Bulte <wilko@yedi.iaf.nl>
cc: Don.Lewis@tsc.tdk.com (Don Lewis), freebsd-fs@FreeBSD.ORG,
        freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching 
In-reply-to: Your message of "Fri, 02 Oct 1998 14:06:50 +0200."
             <199810021206.OAA29167@yedi.iaf.nl> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 02 Oct 1998 13:23:27 -0600
From: "Justin T. Gibbs" <gibbs@plutotech.com>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>Yuck. Write caching on disks is evil. I've discussed this kind of
>thing at great length with the disk gurus at work. There is consensus:
>write caching is not to be trusted, there are even firmware incarnations
>out there that get confused by a SCSI bus reset and lose track of what
>they have cached. Admittedly this is junk firmware, but it seems to
>happen.

Your statement doesn't seem to be "write caching is inherently evil",
but "there are many drives with bogus firmware where write caching is
evil".  There is a big difference.  If you have a sane device and a
UPS, write caching is not evil at all.

--
Justin

From owner-freebsd-fs  Fri Oct  2 12:53:38 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id MAA25037
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 12:53:38 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from pluto.plutotech.com (mail.plutotech.com [206.168.67.137])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA25020;
          Fri, 2 Oct 1998 12:53:34 -0700 (PDT)
          (envelope-from gibbs@plutotech.com)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130])
	by pluto.plutotech.com (8.8.7/8.8.5) with ESMTP id NAA12817;
	Fri, 2 Oct 1998 13:53:13 -0600 (MDT)
Message-Id: <199810021953.NAA12817@pluto.plutotech.com>
X-Mailer: exmh version 2.0.2 2/24/98
To: Julian Elischer <julian@whistle.com>
cc: Don Lewis <Don.Lewis@tsc.tdk.com>, freebsd-fs@FreeBSD.ORG,
        freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching 
In-reply-to: Your message of "Fri, 02 Oct 1998 11:59:26 PDT."
             <Pine.BSF.3.95.981002115603.15828A-100000@current1.whistle.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 02 Oct 1998 13:46:43 -0600
From: "Justin T. Gibbs" <gibbs@plutotech.com>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>write caching can screw soft updates if there is any
>major re-ordering of the data written.

Only if you lose power or have a buggy device.  Go read the SCSI spec on
write caching.

>With tags it doesn't matter if they are re-ordered, as long
>as they are not acknowledged until they are on the platter.

Tagged transactions may "complete" in a non-FIFO order. "Complete" either
means data transferred into the cache or data safely on the media, depending
on whether the cache is enabled.  Re-ordered writes are allowed, but only
in ways that maintain read/write coherency.  This is with the restrictive
ordering semantics that drives usually ship with by default. You can turn
on "re-order at will" through a mode page.
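
As far as we can tell from the SCSI-2 specification, the ordering
behavior described here is controlled by the Queue Algorithm Modifier
(QAM) field, bits 7-4 of byte 3 of the Control mode page (page code
0Ah): 0h is restricted reordering, 1h permits unrestricted reordering,
and the remaining values are reserved or vendor specific. The page is
not named above, so treating QAM as the "re-order at will" switch is our
reading. The decoder below is a hypothetical sketch that assumes the
page has already been fetched with MODE SENSE:

```c
#include <stddef.h>

#define CONTROL_PAGE_CODE 0x0a  /* SCSI Control mode page */

enum qam {
    QAM_RESTRICTED,    /* 0h: reordering must preserve data integrity */
    QAM_UNRESTRICTED,  /* 1h: drive may re-order at will */
    QAM_OTHER,         /* reserved or vendor-specific values */
    QAM_INVALID        /* buffer is not a Control mode page */
};

/*
 * Decode the Queue Algorithm Modifier (byte 3, bits 7-4).  'page'
 * points at the page itself, after the mode parameter header and any
 * block descriptors returned by MODE SENSE.
 */
static enum qam
queue_algorithm(const unsigned char *page, size_t len)
{
    if (len < 4 || (page[0] & 0x3f) != CONTROL_PAGE_CODE)
        return QAM_INVALID;
    switch (page[3] >> 4) {
    case 0x0: return QAM_RESTRICTED;
    case 0x1: return QAM_UNRESTRICTED;
    default:  return QAM_OTHER;
    }
}
```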

Waiting for Terry's long-winded response to this thread,
Justin

From owner-freebsd-fs  Fri Oct  2 15:04:13 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id PAA12706
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 15:04:13 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id PAA12595;
          Fri, 2 Oct 1998 15:04:01 -0700 (PDT)
          (envelope-from tlambert@usr08.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.8.8/8.8.8) id PAA12256;
	Fri, 2 Oct 1998 15:03:42 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp02.primenet.com, id smtpd012230; Fri Oct  2 15:03:41 1998
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id PAA21941;
	Fri, 2 Oct 1998 15:03:35 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199810022203.PAA21941@usr08.primenet.com>
Subject: Re: filesystem safety and SCSI disk write caching
To: gibbs@plutotech.com (Justin T. Gibbs)
Date: Fri, 2 Oct 1998 22:03:35 +0000 (GMT)
Cc: julian@whistle.com, Don.Lewis@tsc.tdk.com, freebsd-fs@FreeBSD.ORG,
        freebsd-scsi@FreeBSD.ORG
In-Reply-To: <199810021953.NAA12817@pluto.plutotech.com> from "Justin T. Gibbs" at Oct 2, 98 01:46:43 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> >write caching can screw soft updates if there is any
> >major re-ordering of the data written.
> 
> Only if you lose power or have a buggy device.  Go read the SCSI spec on
> write caching.
> 
> >With tags it doesn't matter if they are re-ordered, as long
> >as they are not acknowledged until they are on the platter.
> 
> Tagged transactions may "complete" in a non-FIFO order. "Complete" either
> means data transferred into the cache or data safely on the media, depending
> on whether the cache is enabled.  Re-ordered writes are allowed, but only
> in ways that maintain read/write coherency.  This is with the restrictive
> ordering semantics that drives usually ship with by default. You can turn
> on "re-order at will" through a mode page.
> 
> Waiting for Terry's long-winded response to this thread,

I told you so.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-fs  Fri Oct  2 15:07:33 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id PAA13404
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 15:07:33 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from pluto.plutotech.com (mail.plutotech.com [206.168.67.137])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id PAA13369;
          Fri, 2 Oct 1998 15:07:19 -0700 (PDT)
          (envelope-from gibbs@plutotech.com)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130])
	by pluto.plutotech.com (8.8.7/8.8.5) with ESMTP id QAA02024;
	Fri, 2 Oct 1998 16:06:56 -0600 (MDT)
Message-Id: <199810022206.QAA02024@pluto.plutotech.com>
X-Mailer: exmh version 2.0.2 2/24/98
To: Terry Lambert <tlambert@primenet.com>
cc: gibbs@plutotech.com (Justin T. Gibbs), julian@whistle.com,
        Don.Lewis@tsc.tdk.com, freebsd-fs@FreeBSD.ORG,
        freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching 
In-reply-to: Your message of "Fri, 02 Oct 1998 22:03:35 -0000."
             <199810022203.PAA21941@usr08.primenet.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 02 Oct 1998 16:00:26 -0600
From: "Justin T. Gibbs" <gibbs@plutotech.com>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>I told you so.

You told me some things that were incorrect and some things that
I already knew.  Par for the course.

--
Justin

From owner-freebsd-fs  Fri Oct  2 16:37:09 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id QAA23582
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 16:37:09 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from ns1.yes.no (ns1.yes.no [195.204.136.10])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id QAA23574
          for <freebsd-fs@FreeBSD.ORG>; Fri, 2 Oct 1998 16:37:04 -0700 (PDT)
          (envelope-from eivind@bitbox.follo.net)
Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218])
	by ns1.yes.no (8.9.1a/8.9.1) with ESMTP id BAA08785;
	Sat, 3 Oct 1998 01:36:44 +0200 (CEST)
Received: (from eivind@localhost)
	by bitbox.follo.net (8.8.8/8.8.6) id BAA22519;
	Sat, 3 Oct 1998 01:36:43 +0200 (MET DST)
Message-ID: <19981003013642.65347@follo.net>
Date: Sat, 3 Oct 1998 01:36:42 +0200
From: Eivind Eklund <eivind@yes.no>
To: Don Lewis <Don.Lewis@tsc.tdk.com>, Terry Lambert <tlambert@primenet.com>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: vm system interaction with nullfs
References: <tlambert@primenet.com> <199810020933.CAA15261@salsa.gv.tsc.tdk.com> <19981002131152.26322@follo.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.89.1i
In-Reply-To: <19981002131152.26322@follo.net>; from Eivind Eklund on Fri, Oct 02, 1998 at 01:11:52PM +0200
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Fri, Oct 02, 1998 at 01:11:52PM +0200, Eivind Eklund wrote:
> On Fri, Oct 02, 1998 at 02:33:42AM -0700, Don Lewis wrote:
> > } The general fix is to create a "getfinalvp".  This would allow you
> > } to page through an object, while allowing layers stacked on top to
> > } dictate layout/content.
> > 
> > I started down this path and created VOP_GETBACKINGVP.  This part was
> > pretty easy.  The problem is that there are a zillion references to
> > vp->v_object all over the place that would need to be fixed.
> 
> "A zillion" == 63.  Interesting.  I've always wondered what exact
> number it represents :-)
> 
> Fixing these is trivial.  I can do that in (much) less than an hour,
> if there is general agreement that this is the way to go (I think it
> sounds reasonable).

OK, it took a couple of hours.  There is a set of patches to introduce
VOP_GETBACKINGOBJECT() at http://www.freebsd.org/~eivind/ - these are
completely untested (i.e., they compile, but I have not even tried
booting the kernel they produce).  They are there only in case
somebody wants to see the scope of the changes required to do this.

I'm not sure how much better off we are with this, however; there
seem to be some cases that might still be impossible to get right,
e.g. if we want to split a vm object over two vnodes (to have a
file-by-file RAID-1).

I'll have to think a bit more before I know which direction to pull.

Eivind.

From owner-freebsd-fs  Fri Oct  2 18:09:16 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id SAA05141
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 18:09:16 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id SAA05128;
          Fri, 2 Oct 1998 18:09:14 -0700 (PDT)
          (envelope-from tlambert@usr06.primenet.com)
Received: (from daemon@localhost)
	by smtp03.primenet.com (8.8.8/8.8.8) id SAA02504;
	Fri, 2 Oct 1998 18:08:56 -0700 (MST)
Received: from usr06.primenet.com(206.165.6.206)
 via SMTP by smtp03.primenet.com, id smtpd002479; Fri Oct  2 18:08:54 1998
Received: (from tlambert@localhost)
	by usr06.primenet.com (8.8.5/8.8.5) id SAA02581;
	Fri, 2 Oct 1998 18:08:48 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199810030108.SAA02581@usr06.primenet.com>
Subject: Re: filesystem safety and SCSI disk write caching
To: gibbs@plutotech.com (Justin T. Gibbs)
Date: Sat, 3 Oct 1998 01:08:48 +0000 (GMT)
Cc: tlambert@primenet.com, gibbs@plutotech.com, julian@whistle.com,
        Don.Lewis@tsc.tdk.com, freebsd-fs@FreeBSD.ORG,
        freebsd-scsi@FreeBSD.ORG
In-Reply-To: <199810022206.QAA02024@pluto.plutotech.com> from "Justin T. Gibbs" at Oct 2, 98 04:00:26 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> >I told you so.
> 
> You told me some things that were in-correct and some things that
> I already knew.  Par for the course.

Feel free to make his setup work with SCSI write caching enabled.

When you do, I will eat crow.

Right now, the crow is in your court.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-fs  Fri Oct  2 18:25:45 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id SAA07494
          for freebsd-fs-outgoing; Fri, 2 Oct 1998 18:25:45 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.15.68.22])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id SAA07479;
          Fri, 2 Oct 1998 18:25:22 -0700 (PDT)
          (envelope-from bde@godzilla.zeta.org.au)
Received: (from bde@localhost)
	by godzilla.zeta.org.au (8.8.7/8.8.7) id LAA18635;
	Sat, 3 Oct 1998 11:24:50 +1000
Date: Sat, 3 Oct 1998 11:24:50 +1000
From: Bruce Evans <bde@zeta.org.au>
Message-Id: <199810030124.LAA18635@godzilla.zeta.org.au>
To: Don.Lewis@tsc.tdk.com, gibbs@plutotech.com
Subject: Re: filesystem safety and SCSI disk write caching
Cc: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>>BTW, with softupdates, and tagged command queuing enabled in CAM, there
>>is not much of a performance hit from turning off write caching.  I
>>saw "make buildworld" increase from about 2 hours to 2 hours 5 minutes,
>>and "make -j6 buildworld" increase from about 1 hour 30 minutes to
>>1 hour 35 minutes.
>
>In this particular benchmark, perhaps not, but make buildworld is not
>indicative of most I/O loads.

I think it can be interpreted as showing that the performance hit is
very large.  `make world' is mostly cpu-bound, and most of its i/o's
are reads (60% here).  I guess it spends less than 5 minutes of its time
writing (27000 block output operations here).  An increase of 5 minutes
is very large.

Bruce

From owner-freebsd-fs  Sat Oct  3 00:36:25 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id AAA11358
          for freebsd-fs-outgoing; Sat, 3 Oct 1998 00:36:25 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from uni4nn.gn.iaf.nl (osmium.gn.iaf.nl [193.67.144.12])
          by hub.freebsd.org (8.8.8/8.8.8) with SMTP id AAA11340;
          Sat, 3 Oct 1998 00:36:20 -0700 (PDT)
          (envelope-from wilko@yedi.iaf.nl)
Received: by uni4nn.gn.iaf.nl with UUCP id AA00139
  (5.67b/IDA-1.5); Sat, 3 Oct 1998 09:35:00 +0200
Received: (from wilko@localhost) by yedi.iaf.nl (8.8.8/8.6.12) id AAA04758; Sat, 3 Oct 1998 00:54:44 +0200 (CEST)
From: Wilko Bulte <wilko@yedi.iaf.nl>
Message-Id: <199810022254.AAA04758@yedi.iaf.nl>
Subject: Re: filesystem safety and SCSI disk write caching
In-Reply-To: <199810021929.NAA11650@pluto.plutotech.com> from "Justin T. Gibbs" at "Oct 2, 98 01:23:27 pm"
To: gibbs@plutotech.com (Justin T. Gibbs)
Date: Sat, 3 Oct 1998 00:54:44 +0200 (CEST)
Cc: Don.Lewis@tsc.tdk.com, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
X-Organisation: Private FreeBSD site - Arnhem, The Netherlands
X-Pgp-Info: PGP public key at 'finger wilko@freefall.freebsd.org'
X-Mailer: ELM [version 2.4ME+ PL38 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

As Justin T. Gibbs wrote...
> >Yuck. Write caching on disks is evil. I've discussed this kind of
> >thing at great length with the disk gurus at work. There is consensus:
> >write caching is not to be trusted, there are even firmware incarnations
> >out there that get confused by a SCSI bus reset and lose track of what
> >they have cached. Admittedly this is junk firmware, but it seems to
> >happen.
> 
> Your statement doesn't seem to be "write caching is inherently evil",
> but "there are many drives with bogus firmware where write caching is
> evil".  There is a big difference.  If you have a sane device and a

True, there is a big difference all right. But the problem is that it is
not easy for an average user to find out how well behaved a disk's firmware
actually is.

> UPS, write caching is not evil at all.

Right. I for one opt for security over a (small ?) performance gain.

Wilko
_     ______________________________________________________________________
 |   / o / /  _  Bulte 				  email: wilko@yedi.iaf.nl 
 |/|/ / / /( (_) Arnhem, The Netherlands          WWW  : http://www.tcja.nl
______________________________________________ Powered by FreeBSD __________

From owner-freebsd-fs  Sat Oct  3 18:26:53 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id SAA03839
          for freebsd-fs-outgoing; Sat, 3 Oct 1998 18:26:53 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id SAA03811;
          Sat, 3 Oct 1998 18:26:40 -0700 (PDT)
          (envelope-from tlambert@usr06.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.8.8/8.8.8) id SAA06542;
	Sat, 3 Oct 1998 18:26:19 -0700 (MST)
Received: from usr06.primenet.com(206.165.6.206)
 via SMTP by smtp02.primenet.com, id smtpd006510; Sat Oct  3 18:26:10 1998
Received: (from tlambert@localhost)
	by usr06.primenet.com (8.8.5/8.8.5) id SAA16639;
	Sat, 3 Oct 1998 18:26:03 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199810040126.SAA16639@usr06.primenet.com>
Subject: Re: filesystem safety and SCSI disk write caching
To: bde@zeta.org.au (Bruce Evans)
Date: Sun, 4 Oct 1998 01:26:03 +0000 (GMT)
Cc: Don.Lewis@tsc.tdk.com, gibbs@plutotech.com, freebsd-fs@FreeBSD.ORG,
        freebsd-scsi@FreeBSD.ORG
In-Reply-To: <199810030124.LAA18635@godzilla.zeta.org.au> from "Bruce Evans" at Oct 3, 98 11:24:50 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> >>BTW, with softupdates, and tagged command queuing enabled in CAM, there
> >>is not much of a performance hit from turning off write caching.  I
> >>saw "make buildworld" increase from about 2 hours to 2 hours 5 minutes,
> >>and "make -j6 buildworld" increase from about 1 hour 30 minutes to
> >>1 hour 35 minutes.
> >
> >In this particular benchmark, perhaps not, but make buildworld is not
> >indicative of most I/O loads.
> 
> I think it can be interpreted as showing that the performance hit is
> very large.  `make world' is mostly cpu-bound, and most of its i/o's
> are reads (60% here).  I guess it spends less than 5 minutes of its time
> writing (27000 block output operations here).  An increase of 5 minutes
> is very large.

This is without "noatime".  Every inode that is read is also written
(to update its access time), every directory inode is written multiple
times, and all object files and executables, as well as some generated
sources, are written.

If you read the Ganger/Patt paper, you will see that soft updates
is within 5% of memory speed for most uses.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-fs  Sat Oct  3 18:38:01 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id SAA06080
          for freebsd-fs-outgoing; Sat, 3 Oct 1998 18:38:01 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.15.68.22])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id SAA06048;
          Sat, 3 Oct 1998 18:37:50 -0700 (PDT)
          (envelope-from bde@godzilla.zeta.org.au)
Received: (from bde@localhost)
	by godzilla.zeta.org.au (8.8.7/8.8.7) id LAA14507;
	Sun, 4 Oct 1998 11:37:22 +1000
Date: Sun, 4 Oct 1998 11:37:22 +1000
From: Bruce Evans <bde@zeta.org.au>
Message-Id: <199810040137.LAA14507@godzilla.zeta.org.au>
To: bde@zeta.org.au, tlambert@primenet.com
Subject: Re: filesystem safety and SCSI disk write caching
Cc: Don.Lewis@tsc.tdk.com, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG,
        gibbs@plutotech.com
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>> I think it can be interpreted as showing that the performance hit is
>> very large.  `make world' is mostly cpu-bound, and most of its i/o's
>> are reads (60% here).  I guess it spends less than 5 minutes of its time
>> writing (27000 block output operations here).  An increase of 5 minutes
>> is very large.
>
>This is without "noatime".

Actually, 27000 is with "noatime" on all file systems, and with "async"
on all file systems that were written to by my `make world' (/tmp, /var/tmp,
MAKEOBJDIRPREFIX = /c/obj and DESTDIR = /c/root).

>Every inode read, is written, and every
>directory inode is written multiple times, and all object files and
>executables, as well as some generated sources, are written.

Yes, the default configuration may be much slower than mine.

Bruce

From owner-freebsd-fs  Sat Oct  3 19:17:15 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id TAA12563
          for freebsd-fs-outgoing; Sat, 3 Oct 1998 19:17:15 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id TAA12494;
          Sat, 3 Oct 1998 19:16:55 -0700 (PDT)
          (envelope-from tlambert@usr06.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.8.8/8.8.8) id TAA15523;
	Sat, 3 Oct 1998 19:16:30 -0700 (MST)
Received: from usr06.primenet.com(206.165.6.206)
 via SMTP by smtp02.primenet.com, id smtpd015497; Sat Oct  3 19:16:20 1998
Received: (from tlambert@localhost)
	by usr06.primenet.com (8.8.5/8.8.5) id TAA19402;
	Sat, 3 Oct 1998 19:16:15 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199810040216.TAA19402@usr06.primenet.com>
Subject: Re: filesystem safety and SCSI disk write caching
To: bde@zeta.org.au (Bruce Evans)
Date: Sun, 4 Oct 1998 02:16:15 +0000 (GMT)
Cc: bde@zeta.org.au, tlambert@primenet.com, Don.Lewis@tsc.tdk.com,
        freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG, gibbs@plutotech.com
In-Reply-To: <199810040137.LAA14507@godzilla.zeta.org.au> from "Bruce Evans" at Oct 4, 98 11:37:22 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> >> I think it can be interpreted as showing that the performance hit is
> >> very large.  `make world' is mostly cpu-bound, and most of its i/o's
> >> are reads (60% here).  I guess it spends less than 5 minutes of its time
> >> writing (27000 block output operations here).  An increase of 5 minutes
> >> is very large.
> >
> >This is without "noatime".
> 
> Actually, 27000 is with "noatime" on all file systems, and with "async"
> on all file systems that were written to by my `make world' (/tmp, /var/tmp,
> MAKEOBJDIRPREFIX = /c/obj and DESTDIR = /c/root).

Even better.

We are talking about an increase from one hour 30 minutes to one hour
35 minutes, in trade for soft updates without atime and without SCSI
write caching enabled.

This is from 90 to 95 minutes, or 95/90 = 1.0556 (rounded).

This is a difference of 5.6% to go from zero reliability to 100%
reliability, barring hardware failure.

In my book, this is overhead *well spent*.


I can post (once again) the results of a Novell study on server usage
patterns.  The 30,000-foot view for a typical server breaks down to:

	75%	reads
	15%	writes
	8%	directory search operations
	2%	other


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs  Sat Oct  3 23:00:38 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id XAA16422
          for freebsd-fs-outgoing; Sat, 3 Oct 1998 23:00:38 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from dingo.cdrom.com (castles144.castles.com [208.214.165.144])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id XAA16397;
          Sat, 3 Oct 1998 23:00:23 -0700 (PDT)
          (envelope-from mike@dingo.cdrom.com)
Received: from dingo.cdrom.com (localhost [127.0.0.1])
	by dingo.cdrom.com (8.9.1/8.8.8) with ESMTP id XAA02017;
	Sat, 3 Oct 1998 23:05:00 -0700 (PDT)
	(envelope-from mike@dingo.cdrom.com)
Message-Id: <199810040605.XAA02017@dingo.cdrom.com>
X-Mailer: exmh version 2.0.2 2/24/98
To: Bruce Evans <bde@zeta.org.au>
cc: tlambert@primenet.com, Don.Lewis@tsc.tdk.com, freebsd-fs@FreeBSD.ORG,
        freebsd-scsi@FreeBSD.ORG, gibbs@plutotech.com
Subject: Re: filesystem safety and SCSI disk write caching 
In-reply-to: Your message of "Sun, 04 Oct 1998 11:37:22 +1000."
             <199810040137.LAA14507@godzilla.zeta.org.au> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 03 Oct 1998 23:04:58 -0700
From: Mike Smith <mike@smith.net.au>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> >> I think it can be interpreted as showing that the performance hit is
> >> very large.  `make world' is mostly cpu-bound, and most of its i/o's
> >> are reads (60% here).  I guess it spends less than 5 minutes of its time
> >> writing (27000 block output operations here).  An increase of 5 minutes
> >> is very large.
> >
> >This is without "noatime".
> 
> Actually, 27000 is with "noatime" on all file systems, and with "async"
> on all file systems that were written to by my `make world' (/tmp, /var/tmp,
> MAKEOBJDIRPREFIX = /c/obj and DESTDIR = /c/root).
> 
> >Every inode read is written, and every
> >directory inode is written multiple times, and all object files and
> >executables, as well as some generated sources, are written.
> 
> Yes, the default configuration may be much slower than mine.

I can definitely back up your basic point ('make world' is CPU bound).
On a 4-way Xeon system with slow disks we were still able to get it
down to around 40 minutes.

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com


From owner-freebsd-fs  Sat Oct  3 23:16:59 1998
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id XAA18294
          for freebsd-fs-outgoing; Sat, 3 Oct 1998 23:16:59 -0700 (PDT)
          (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.15.68.22])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id XAA18279;
          Sat, 3 Oct 1998 23:16:54 -0700 (PDT)
          (envelope-from bde@godzilla.zeta.org.au)
Received: (from bde@localhost)
	by godzilla.zeta.org.au (8.8.7/8.8.7) id QAA26536;
	Sun, 4 Oct 1998 16:16:24 +1000
Date: Sun, 4 Oct 1998 16:16:24 +1000
From: Bruce Evans <bde@zeta.org.au>
Message-Id: <199810040616.QAA26536@godzilla.zeta.org.au>
To: bde@zeta.org.au, mike@smith.net.au
Subject: Re: filesystem safety and SCSI disk write caching
Cc: Don.Lewis@tsc.tdk.com, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG,
        gibbs@plutotech.com, tlambert@primenet.com
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>> Yes, the default configuration may be much slower than mine.
>
>I can definitely back up your basic point ('make world' is CPU bound).
>On a 4-way Xeon system with slow disks we were still able to get it
>down to around 40 minutes.

Er, that shows that it is i/o bound on systems with so much CPU.  I
got it down to 75 minutes on a 1-way K6-233 with one IDE disk before it
was bloated by perl5 and the transition to ELF.

Bruce
