From owner-freebsd-fs@FreeBSD.ORG  Sun Apr 13 10:55:10 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4FD9837B405; Sun, 13 Apr 2003 10:55:09 -0700 (PDT)
Received: from chez.McKusick.COM (chez.mckusick.com [209.31.233.177])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id CBF0643F93; Sun, 13 Apr 2003 10:55:07 -0700 (PDT)
	(envelope-from mckusick@mckusick.com)
Received: from beastie.mckusick.com (localhost [127.0.0.1])
	by beastie.mckusick.com (8.12.8/8.12.3) with ESMTP id h3D04Vb5006635;
	Sat, 12 Apr 2003 17:04:32 -0700 (PDT)
	(envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200304130004.h3D04Vb5006635@beastie.mckusick.com>
To: Marko Zec <zec@tel.fer.hr>
In-Reply-To: Your message of "Sat, 12 Apr 2003 03:41:17 +0200."
             <3E976EBD.C3E66EF8@tel.fer.hr> 
Date: Sat, 12 Apr 2003 17:04:31 -0700
From: Kirk McKusick <mckusick@mckusick.com>
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 13 Apr 2003 17:55:10 -0000

I am of the opinion that fsync should work. Applications like
`vi' use fsync to ensure that the write of the new file is on
stable store before removing the old copy. If that semantic
is broken, it would be possible to have neither the old nor
the new copy of your file after a crash. I do not consider
that acceptable behavior. Further, the fsync call is used
to ensure that link/unlink/rename have been completed. So
more than just fsync is being affected by your change. Lastly,
I often write out a file when I am about to suspend my laptop
(for low battery or other reasons) and I really want that file
on the disk now. I do not want to have to wait for it to decide
at some future time to spin up the disk.
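
The ordering that fsync is relied on for here can be sketched as follows.
This is a minimal illustration, not the code `vi' actually uses: the
function name replace_file_safely() and the caller-supplied temporary
path are hypothetical, and the sketch replaces the old copy with rename()
rather than removing it.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes to a temporary file, force them to stable storage
 * with fsync(), and only then rename the temporary over `path`.
 * `tmp_path` is a caller-chosen name on the same filesystem as `path`.
 * Returns 0 on success, -1 on failure.
 */
int
replace_file_safely(const char *path, const char *tmp_path,
    const void *data, size_t len)
{
    const char *p = data;
    size_t done = 0;
    int fd;

    assert(path != NULL && tmp_path != NULL);

    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return (-1);

    /* write() may be partial, so loop until everything is written. */
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return (-1);
        }
        done += (size_t)n;
    }

    /* The new copy must be on disk before the old one goes away. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return (-1);
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return (-1);
    }

    /* Only now is it safe to let the new copy replace the old one. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return (-1);
    }
    return (0);
}
```

If fsync() returns without writing anything, the rename can reach the
disk before the data does, which is the "neither copy survives" case
described above.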

I suggest that you make the disabling of fsync a separate
option from the rest of your change so that people can
decide for themselves whether they want partial savings
with working semantics, or greater savings with broken
semantics. I am also intrigued by the changes proposed by
Ian Dowse that may better accomplish the same goals with
less breakage.

	Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 14 03:19:39 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id B81F937B404; Mon, 14 Apr 2003 03:19:39 -0700 (PDT)
Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com
	[12.233.57.131])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 9E3C543F75; Mon, 14 Apr 2003 03:19:38 -0700 (PDT)
	(envelope-from das@FreeBSD.ORG)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3EAJaN7018721;
	Mon, 14 Apr 2003 03:19:36 -0700 (PDT)
	(envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3EAJZZI018720;
	Mon, 14 Apr 2003 03:19:35 -0700 (PDT)
	(envelope-from das@FreeBSD.ORG)
Date: Mon, 14 Apr 2003 03:19:35 -0700
From: David Schultz <das@FreeBSD.ORG>
To: Marko Zec <zec@tel.fer.hr>
Message-ID: <20030414101935.GB18110@HAL9000.homeunix.com>
Mail-Followup-To: Marko Zec <zec@tel.fer.hr>, freebsd-fs@freebsd.org,
	freebsd-stable@freebsd.org, mckusick@McKusick.COM
References: <3E976EBD.C3E66EF8@tel.fer.hr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3E976EBD.C3E66EF8@tel.fer.hr>
cc: freebsd-fs@FreeBSD.ORG
cc: mckusick@McKusick.COM
cc: freebsd-stable@FreeBSD.ORG
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
X-List-Received-Date: Mon, 14 Apr 2003 10:19:40 -0000

On Sat, Apr 12, 2003, Marko Zec wrote:
> Here's a patch against 4.8-RELEASE kernel that allows disk writes on
> softupdates-enabled filesystems to be delayed for (theoretically)
> arbitrarily long periods of time. The motivation for such updating
> policy is surprisingly not purely suicidal - it can allow disks on
> laptops to spin down immediately after I/O operations and stay idle for
> longer periods of time, thus saving considerable amount of battery
> power.

Very nice!  I have been thinking about doing something like this
for a long time, but I never managed to find the time.  Some
comments:

- As others have mentioned, the fsync-disabling feature is questionable
  and ought to be separate.  You can make it somewhat more useful by at
  least guaranteeing transactional consistency, i.e. by treating every
  fsync() call as a write barrier.  You would need to ensure this for
  both data and metadata, which I expect would be devilishly hard to do
  within the softupdates framework.  However, you might be able to
  accomplish it at the disk buffer level.  For instance, you could
  have fsync() push the appropriate dirty buffers out to a separate
  cache, then commit the contents of the cache in the order of the
  fsyncs when the disk is next active.

- The fiddling with rushjob seems rather arbitrary.  You can probably
  just let the existing code increment it as necessary and force a sync
  if the value gets too high.

- Patches against -CURRENT would be nice.  (Sorry, that will be a doozy.)

- It looks like you have a few separate changes in there, such as
	+	TUNABLE_INT_FETCH("kern.maxvnodes", &desiredvnodes);
  and
	-	long starttime;
	+	time_t starttime;
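
The write-barrier idea from the first comment above can be modelled in
user space roughly as follows. This is a toy sketch under stated
assumptions, not softupdates or buffer-cache code: struct barrier_cache
and the bc_*() functions are hypothetical names, block numbers stand in
for dirty buffers, and every fsync() is treated as a barrier across all
deferred writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DEFERRED 64

struct deferred_write {
    unsigned epoch;  /* number of fsync() barriers seen before this write */
    long blkno;      /* hypothetical block number standing in for a buffer */
};

struct barrier_cache {
    struct deferred_write w[MAX_DEFERRED];
    size_t count;
    unsigned epoch;
};

void
bc_init(struct barrier_cache *bc)
{
    bc->count = 0;
    bc->epoch = 0;
}

/* Defer a write; it belongs to the current barrier epoch. */
int
bc_write(struct barrier_cache *bc, long blkno)
{
    if (bc->count == MAX_DEFERRED)
        return (-1);
    bc->w[bc->count].epoch = bc->epoch;
    bc->w[bc->count].blkno = blkno;
    bc->count++;
    return (0);
}

/* A lazy fsync(): nothing is written, but later writes must stay behind. */
void
bc_fsync(struct barrier_cache *bc)
{
    bc->epoch++;
}

/* Order by epoch first; within an epoch, sort by block number. */
static int
bc_cmp(const void *a, const void *b)
{
    const struct deferred_write *x = a, *y = b;

    if (x->epoch != y->epoch)
        return ((x->epoch > y->epoch) - (x->epoch < y->epoch));
    return ((x->blkno > y->blkno) - (x->blkno < y->blkno));
}

/*
 * "The disk is active again": store the commit order in `out`, which
 * must hold at least MAX_DEFERRED entries, and empty the cache.
 * Returns the number of writes committed.
 */
size_t
bc_commit(struct barrier_cache *bc, long *out)
{
    size_t i, n = bc->count;

    qsort(bc->w, n, sizeof(bc->w[0]), bc_cmp);
    for (i = 0; i < n; i++)
        out[i] = bc->w[i].blkno;
    bc->count = 0;
    return (n);
}
```

Writes may be reordered freely inside an epoch (here they are sorted by
block number), but no write issued after an fsync() is committed ahead
of one issued before it.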

From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 14 16:47:31 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B01E437B401
	for <fs@freebsd.org>; Mon, 14 Apr 2003 16:47:31 -0700 (PDT)
Received: from mail-out2.apple.com (mail-out2.apple.com [17.254.0.51])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 171A843F85
	for <fs@freebsd.org>; Mon, 14 Apr 2003 16:47:31 -0700 (PDT)
	(envelope-from mday@apple.com)
Received: from mailgate1.apple.com (A17-128-100-225.apple.com
	[17.128.100.225])
	by mail-out2.apple.com (8.12.9/8.12.9) with ESMTP id h3ENlVQd008164
	for <fs@freebsd.org>; Mon, 14 Apr 2003 16:47:31 -0700 (PDT)
Received: from scv1.apple.com (scv1.apple.com) by mailgate1.apple.com
	<T61997b8740118064e13f4@mailgate1.apple.com>;
	Mon, 14 Apr 2003 16:47:17 -0700
Received: from apple.com (daylight.apple.com [17.202.44.244])
	by scv1.apple.com (8.12.9/8.12.9) with ESMTP id h3ENlIVX016100;
	Mon, 14 Apr 2003 16:47:18 -0700 (PDT)
Date: Mon, 14 Apr 2003 16:46:59 -0700
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v552)
To: mistral@imasy.or.jp (Yoshihiko Sarumaru)
From: Mark Day <mday@apple.com>
In-Reply-To: <030413020639.M0101472@mistral.imasy.or.jp>
Message-Id: <627913C9-6ED3-11D7-A790-00039354009A@apple.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.552)
cc: fs@freebsd.org
Subject: Re: time stamp on msdosfs could not be set by general user
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
X-List-Received-Date: Mon, 14 Apr 2003 23:47:32 -0000

On Saturday, April 12, 2003, at 10:06  AM, Yoshihiko Sarumaru wrote:

> mistral% cp -p somefile /dos/
> cp: utimes: /dos/somefile: Operation not permitted
> cp: chmod: /dos/somefile: Operation not permitted
>
> I can understand errors about chmod, but I can not understand errors
> about utimes and modified time could not be set at all.

This is a consequence of the user and group IDs not being settable 
per-file on DOS volumes.  In effect, the user and group IDs are being 
changed behind your back.

> Below patch ignores unmatching of user and file owner

Which means that the user who did the "cp" is not the same as the user 
associated with the volume (the one who owns everything on that volume 
-- which is settable via a mount option).

But since you were able to create the file in the first place, the user 
doing the cp must have had write access (as part of the group, or 
world).

Workarounds would be to do the cp as root, or mount the volume as owned 
by the same user as the one doing the cp.

> Any objection ?

Hard to say.  It violates the documented behavior of utimes -- that 
only the owner should be able to modify the times.  But if the volume 
properly stored user and group IDs, you would have been the owner of 
the file, and the utimes would have worked in this case.

Your change would allow utimes to work even for a file you didn't just 
create, as long as you had write access.  That's potentially a security 
problem, but msdosfs doesn't really have security to begin with.

For comparison, Darwin and Mac OS X generally avoid the problem by the 
way they manage the user ID.  By default, everything on a msdosfs 
volume is owned by a special user ID that gets mapped dynamically to 
whoever is logged in at the console.  For the one-user-at-a-time case, 
this works well.  But a different user would get the same behavior you 
see.

-Mark

From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 15 07:12:52 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2737337B401
	for <fs@freebsd.org>; Tue, 15 Apr 2003 07:12:52 -0700 (PDT)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E643643F3F
	for <fs@freebsd.org>; Tue, 15 Apr 2003 07:12:50 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id AAA30745;
	Wed, 16 Apr 2003 00:12:37 +1000
Date: Wed, 16 Apr 2003 00:12:37 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Mark Day <mday@apple.com>
In-Reply-To: <627913C9-6ED3-11D7-A790-00039354009A@apple.com>
Message-ID: <20030415233658.E1376@gamplex.bde.org>
References: <627913C9-6ED3-11D7-A790-00039354009A@apple.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: Yoshihiko Sarumaru <mistral@imasy.or.jp>
cc: fs@freebsd.org
Subject: Re: time stamp on msdosfs could not be set by general user
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
X-List-Received-Date: Tue, 15 Apr 2003 14:12:52 -0000

On Mon, 14 Apr 2003, Mark Day wrote:

> On Saturday, April 12, 2003, at 10:06  AM, Yoshihiko Sarumaru wrote:
>
> > mistral% cp -p somefile /dos/
> > cp: utimes: /dos/somefile: Operation not permitted
> > cp: chmod: /dos/somefile: Operation not permitted
> >
> > I can understand errors about chmod, but I can not understand errors
> > about utimes and modified time could not be set at all.
>
> This is a consequence of the user and group IDs not being settable
> per-file on DOS volumes.  In effect, the user and group IDs are being
> changed behind your back.

Not really behind one's back.  They are set to constants determined at
mount time, and whoever had mount permission usually has permission to
decide them.

> > Below patch ignores unmatching of user and file owner
>
> Which means that the user who did the "cp" is not the same as the user
> associated with the volume (the one who owns everything on that volume
> -- which is settable via a mount option).
>
> But since you were able to create the file in the first place, the user
> doing the cp must have had write access (as part of the group, or
> world).

It is also settable via chown on the mount point (before mounting).
I use root:msdosfs and am in group msdosfs so that I can access them
like I want except for this problem.

> Workarounds would be to do the cp as root, or mount the volume as owned
> by the same user as the one doing the cp.

Neither is what I like.  I normally do
"cp -p somefile /dospartition/somewhere", then say "@&*@^" and switch to
another terminal running a root shell and repeat the copy, except when
copying a lot of files I use root too much.

> > Any objection ?
>
> Hard to say.  It violates the documented behavior of utimes -- that
> only the owner should be able to modify the times.  But if the volume
> properly stored user and group IDs, you would have been the owner of
> the file, and the utimes would have worked in this case.
>
> Your change would allow utimes to work even for a file you didn't just
> create, as long as you had write access.  That's potentially a security
> problem, but msdosfs doesn't really have security to begin with.

I don't like ignoring the ownerships completely.  Perhaps relaxing the
ownership check to a group membership check would be acceptable.
msdosfs honors the ownerships for everything now, so it is no more
insecure than the configured ownerships permit.

Note that utimes with a null arg works now, since that only requires
write permission.  So we can change the timestamps to "now" by using
utimes().  This is OK (not just for msdosfs) since it is nothing more
than we could do using write()+truncate() and read().
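
The difference between the two forms of utimes() can be shown with a
small sketch. The helper names set_times(), touch_now() and demo_touch()
are hypothetical; the sketch assumes a filesystem where the caller owns
the file it creates, so both forms succeed, whereas on the msdosfs mount
discussed here only the NULL form would be expected to work for a
non-owner with write access.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Set both timestamps to an explicit time; this is the owner-only form. */
int
set_times(const char *path, time_t when)
{
    struct timeval tv[2];

    tv[0].tv_sec = when;   /* access time */
    tv[0].tv_usec = 0;
    tv[1] = tv[0];         /* modification time */
    return (utimes(path, tv));
}

/* Set both timestamps to "now"; a NULL argument needs only write access. */
int
touch_now(const char *path)
{
    return (utimes(path, NULL));
}

/*
 * Create a file, backdate it, then touch it with a NULL times argument.
 * Returns 0 if the modification time moved from the backdated value to
 * a recent one, -1 otherwise.  The file is removed before returning.
 */
int
demo_touch(const char *path)
{
    const time_t old = 1000000000;  /* arbitrary date in 2001 */
    struct stat st;
    time_t before;
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return (-1);
    close(fd);

    if (set_times(path, old) != 0 || stat(path, &st) != 0 ||
        st.st_mtime != old)
        goto fail;

    before = time(NULL);
    if (touch_now(path) != 0 || stat(path, &st) != 0)
        goto fail;
    unlink(path);
    /* Allow a little slack for clock granularity. */
    return (st.st_mtime >= before - 2 ? 0 : -1);
fail:
    unlink(path);
    return (-1);
}
```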

> For comparison, Darwin and Mac OS X generally avoid the problem by the
> way they manage the user ID.  By default, everything on a msdosfs
> volume is owned by a special user ID that gets mapped dynamically to
> whoever is logged in at the console.  For the one-user-at-a-time case,
> this works well.  But a different user would get the same behavior you
> see.

FreeBSD only has non-dynamic mapping via /etc/fbtab.  This isn't quite
enough even for a one-user system since everything normally gets mounted
before anyone can log in.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 15 11:25:28 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8B4A237B401; Tue, 15 Apr 2003 11:25:28 -0700 (PDT)
Received: from mail.tel.fer.hr (zg03-108.dialin.iskon.hr [213.191.135.109])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id B466343FA3; Tue, 15 Apr 2003 11:25:24 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from tel.fer.hr (marko-tp.katoda.net [192.168.201.109])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3FINQxK000657;
	Tue, 15 Apr 2003 20:23:36 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
Message-ID: <3E9C4E85.F1F578B6@tel.fer.hr>
Date: Tue, 15 Apr 2003 20:25:09 +0200
From: Marko Zec <zec@tel.fer.hr>
X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Content-Type: multipart/mixed;
 boundary="------------CB13BF8AD3C84FDA09AF0A11"
Subject: UPDATE: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
X-List-Received-Date: Tue, 15 Apr 2003 18:25:29 -0000

This is a multi-part message in MIME format.
--------------CB13BF8AD3C84FDA09AF0A11
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Attached are updated patches (against both 4.8 and 5.0) for delaying
disk buffer synching on softupdates-enabled filesystems. The original
patch started a rather lengthy debate about whether fsync() processing
should also be delayed while disk updates are being delayed. As Kirk
McKusick already summarized, some people will prefer partial battery
power savings with working fsync() semantics, while others will desire
greater savings with broken semantics. Therefore, as suggested, the
updated patch introduces an additional sysctl tunable,
vfs.ena_lazy_fsync, which controls whether fsync() calls will be ignored
or not. Note that when vfs.sync_extdelay is set to 0, vfs.ena_lazy_fsync
has no effect, i.e. fsync() always works with standard semantics.

Cheers,

Marko


--------------CB13BF8AD3C84FDA09AF0A11
Content-Type: text/plain; charset=us-ascii;
 name="syncdelay-4.8.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="syncdelay-4.8.diff"

--- /usr/src/sys.org/dev/ata/ata-disk.c	Thu Jan 30 08:19:59 2003
+++ dev/ata/ata-disk.c	Sat Apr 12 00:31:26 2003
@@ -294,6 +294,7 @@ adstrategy(struct buf *bp)
     struct ad_softc *adp = bp->b_dev->si_drv1;
     int s;
 
+    stratcalls++;
     if (adp->device->flags & ATA_D_DETACHING) {
 	bp->b_error = ENXIO;
 	bp->b_flags |= B_ERROR;
--- /usr/src/sys.org/kern/vfs_subr.c	Sun Oct 13 18:19:12 2002
+++ kern/vfs_subr.c	Mon Apr 14 23:27:52 2003
@@ -116,6 +116,13 @@ SYSCTL_INT(_vfs, OID_AUTO, reassignbufme
 static int nameileafonly = 0;
 SYSCTL_INT(_vfs, OID_AUTO, nameileafonly, CTLFLAG_RW, &nameileafonly, 0, "");
 
+int stratcalls = 0;
+int sync_extdelay = 0;
+SYSCTL_INT(_vfs, OID_AUTO, sync_extdelay, CTLFLAG_RW, &sync_extdelay, 0, "");
+
+int ena_lazy_fsync = 0;
+SYSCTL_INT(_vfs, OID_AUTO, ena_lazy_fsync, CTLFLAG_RW, &ena_lazy_fsync, 0, "");
+
 #ifdef ENABLE_VFS_IOOPT
 int vfs_ioopt = 0;
 SYSCTL_INT(_vfs, OID_AUTO, ioopt, CTLFLAG_RW, &vfs_ioopt, 0, "");
@@ -137,7 +144,7 @@ static vm_zone_t vnode_zone;
  * The workitem queue.
  */
 #define SYNCER_MAXDELAY		32
-static int syncer_maxdelay = SYNCER_MAXDELAY;	/* maximum delay time */
+int syncer_maxdelay = SYNCER_MAXDELAY;	/* maximum delay time */
 time_t syncdelay = 30;		/* max time to delay syncing data */
 time_t filedelay = 30;		/* time to delay syncing files */
 SYSCTL_INT(_kern, OID_AUTO, filedelay, CTLFLAG_RW, &filedelay, 0, "");
@@ -145,7 +152,7 @@ time_t dirdelay = 29;		/* time to delay 
 SYSCTL_INT(_kern, OID_AUTO, dirdelay, CTLFLAG_RW, &dirdelay, 0, "");
 time_t metadelay = 28;		/* time to delay syncing metadata */
 SYSCTL_INT(_kern, OID_AUTO, metadelay, CTLFLAG_RW, &metadelay, 0, "");
-static int rushjob;			/* number of slots to run ASAP */
+int rushjob;			/* number of slots to run ASAP */
 static int stat_rush_requests;	/* number of times I/O speeded up */
 SYSCTL_INT(_debug, OID_AUTO, rush_requests, CTLFLAG_RW, &stat_rush_requests, 0, "");
 
@@ -1119,7 +1127,7 @@ sched_sync(void)
 {
 	struct synclist *slp;
 	struct vnode *vp;
-	long starttime;
+	time_t starttime;
 	int s;
 	struct proc *p = updateproc;
 
@@ -1127,8 +1135,6 @@ sched_sync(void)
 	    SHUTDOWN_PRI_LAST);   
 
 	for (;;) {
-		kproc_suspend_loop(p);
-
 		starttime = time_second;
 
 		/*
@@ -1198,8 +1204,25 @@ sched_sync(void)
 		 * matter as we are just trying to generally pace the
 		 * filesystem activity.
 		 */
-		if (time_second == starttime)
+		if (time_second != starttime)
+			continue;
+
+		if (sync_extdelay >= syncer_maxdelay)
+			while (syncer_delayno == 0 && rushjob == 0 &&
+	    		    abs(time_second - starttime) < sync_extdelay) {
+				stratcalls = 0;
 			tsleep(&lbolt, PPAUSE, "syncer", 0);
+				kproc_suspend_loop(p);
+				if (stratcalls != 0 && syncer_maxdelay <
+				    abs(time_second - starttime)) {
+					rushjob = syncer_maxdelay;
+					break;
+				}
+			}
+		else {
+			tsleep(&lbolt, PPAUSE, "syncer", 0);
+			kproc_suspend_loop(p);
+		}
 	}
 }
 
--- /usr/src/sys.org/kern/vfs_syscalls.c	Thu Jan  2 18:26:18 2003
+++ kern/vfs_syscalls.c	Tue Apr 15 13:42:01 2003
@@ -563,6 +563,9 @@ sync(p, uap)
 	register struct mount *mp, *nmp;
 	int asyncflag;
 
+	/* Notify sched_sync() to try flushing syncer_workitem_pending[*] */
+	rushjob += syncer_maxdelay; 
+
 	simple_lock(&mountlist_slock);
 	for (mp = TAILQ_FIRST(&mountlist); mp != NULL; mp = nmp) {
 		if (vfs_busy(mp, LK_NOWAIT, &mountlist_slock, p)) {
@@ -2627,6 +2630,10 @@ fsync(p, uap)
 	struct file *fp;
 	vm_object_t obj;
 	int error;
+
+	/* Just return if we are artificially delaying disk syncs */
+	if (sync_extdelay && ena_lazy_fsync)
+		return (0);
 
 	if ((error = getvnode(p->p_fd, SCARG(uap, fd), &fp)) != 0)
 		return (error);
--- /usr/src/sys.org/ufs/ffs/ffs_alloc.c	Fri Sep 21 21:15:21 2001
+++ ufs/ffs/ffs_alloc.c	Sat Apr 12 00:06:20 2003
@@ -125,6 +125,10 @@ ffs_alloc(ip, lbn, bpref, size, cred, bn
 #endif /* DIAGNOSTIC */
 	if (size == fs->fs_bsize && fs->fs_cstotal.cs_nbfree == 0)
 		goto nospace;
+	/* Speedup flushing of syncer_workitem_pending[*] if low on freespace */
+	if (rushjob == 0 &&
+	    freespace(fs, fs->fs_minfree + 2) - numfrags(fs, size) < 0)
+		rushjob = syncer_maxdelay;
 	if (cred->cr_uid != 0 &&
 	    freespace(fs, fs->fs_minfree) - numfrags(fs, size) < 0)
 		goto nospace;
@@ -195,6 +199,10 @@ ffs_realloccg(ip, lbprev, bpref, osize, 
 	if (cred == NOCRED)
 		panic("ffs_realloccg: missing credential");
 #endif /* DIAGNOSTIC */
+	/* Speedup flushing of syncer_workitem_pending[*] if low on freespace */
+	if (rushjob == 0 &&
+	    freespace(fs, fs->fs_minfree + 2) - numfrags(fs, nsize - osize) < 0)
+		rushjob = syncer_maxdelay;
 	if (cred->cr_uid != 0 &&
 	    freespace(fs, fs->fs_minfree) -  numfrags(fs, nsize - osize) < 0)
 		goto nospace;
--- /usr/src/sys.org/sys/buf.h	Sat Jan 25 20:02:23 2003
+++ sys/buf.h	Sat Apr 12 00:30:48 2003
@@ -478,6 +478,7 @@ extern char	*buffers;		/* The buffer con
 extern int	bufpages;		/* Number of memory pages in the buffer pool. */
 extern struct	buf *swbuf;		/* Swap I/O buffer headers. */
 extern int	nswbuf;			/* Number of swap I/O buffer headers. */
+extern int	stratcalls;		/* I/O ops since last buffer sync */
 extern TAILQ_HEAD(swqueue, buf) bswlist;
 extern TAILQ_HEAD(bqueues, buf) bufqueues[BUFFER_QUEUES];
 
--- /usr/src/sys.org/sys/vnode.h	Sun Dec 29 19:19:53 2002
+++ sys/vnode.h	Mon Apr 14 23:28:36 2003
@@ -294,6 +294,10 @@ extern	struct vm_zone *namei_zone;
 extern	int prtactive;			/* nonzero to call vprint() */
 extern	struct vattr va_null;		/* predefined null vattr structure */
 extern	int vfs_ioopt;
+extern	int rushjob;
+extern	int syncer_maxdelay;
+extern	int sync_extdelay;
+extern	int ena_lazy_fsync;
 
 /*
  * Macro/function to check for client cache inconsistency w.r.t. leasing.

--------------CB13BF8AD3C84FDA09AF0A11
Content-Type: text/plain; charset=us-ascii;
 name="syncdelay-5.0.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="syncdelay-5.0.diff"

--- /usr/src/sys.org/dev/ata/ata-disk.c	Sat Nov 16 09:07:36 2002
+++ dev/ata/ata-disk.c	Tue Apr 15 15:23:37 2003
@@ -289,6 +289,7 @@ adstrategy(struct bio *bp)
     struct ad_softc *adp = bp->bio_dev->si_drv1;
     int s;
 
+    stratcalls++;
     if (adp->device->flags & ATA_D_DETACHING) {
 	biofinish(bp, NULL, ENXIO);
 	return;
--- /usr/src/sys.org/kern/vfs_subr.c	Sat Nov 16 09:08:02 2002
+++ kern/vfs_subr.c	Tue Apr 15 15:34:19 2003
@@ -73,6 +73,8 @@
 #include <vm/vm_page.h>
 #include <vm/uma.h>
 
+#define abs(x)                      (((x) < 0) ? -(x) : (x))
+
 static MALLOC_DEFINE(M_NETADDR, "Export Host", "Export host address structure");
 
 static void	addalias(struct vnode *vp, dev_t nvp_rdev);
@@ -130,6 +132,13 @@ SYSCTL_INT(_vfs, OID_AUTO, reassignbufca
 static int nameileafonly;
 SYSCTL_INT(_vfs, OID_AUTO, nameileafonly, CTLFLAG_RW, &nameileafonly, 0, "");
 
+int stratcalls = 0;
+int sync_extdelay = 0;
+SYSCTL_INT(_vfs, OID_AUTO, sync_extdelay, CTLFLAG_RW, &sync_extdelay, 0, "");
+
+int ena_lazy_fsync = 0;
+SYSCTL_INT(_vfs, OID_AUTO, ena_lazy_fsync, CTLFLAG_RW, &ena_lazy_fsync, 0, "");
+
 #ifdef ENABLE_VFS_IOOPT
 /* See NOTES for a description of this setting. */
 int vfs_ioopt;
@@ -208,7 +217,7 @@ static struct synclist *syncer_workitem_
 static struct mtx sync_mtx;
 
 #define SYNCER_MAXDELAY		32
-static int syncer_maxdelay = SYNCER_MAXDELAY;	/* maximum delay time */
+int syncer_maxdelay = SYNCER_MAXDELAY;	/* maximum delay time */
 static int syncdelay = 30;		/* max time to delay syncing data */
 static int filedelay = 30;		/* time to delay syncing files */
 SYSCTL_INT(_kern, OID_AUTO, filedelay, CTLFLAG_RW, &filedelay, 0, "");
@@ -216,7 +225,7 @@ static int dirdelay = 29;		/* time to de
 SYSCTL_INT(_kern, OID_AUTO, dirdelay, CTLFLAG_RW, &dirdelay, 0, "");
 static int metadelay = 28;		/* time to delay syncing metadata */
 SYSCTL_INT(_kern, OID_AUTO, metadelay, CTLFLAG_RW, &metadelay, 0, "");
-static int rushjob;		/* number of slots to run ASAP */
+int rushjob;			/* number of slots to run ASAP */
 static int stat_rush_requests;	/* number of times I/O speeded up */
 SYSCTL_INT(_debug, OID_AUTO, rush_requests, CTLFLAG_RW, &stat_rush_requests, 0, "");
 
@@ -1669,7 +1678,7 @@ sched_sync(void)
 	struct synclist *slp;
 	struct vnode *vp;
 	struct mount *mp;
-	long starttime;
+	time_t starttime;
 	int s;
 	struct thread *td = FIRST_THREAD_IN_PROC(updateproc);  /* XXXKSE */
 
@@ -1679,8 +1688,6 @@ sched_sync(void)
 	    SHUTDOWN_PRI_LAST);
 
 	for (;;) {
-		kthread_suspend_check(td->td_proc);
-
 		starttime = time_second;
 
 		/*
@@ -1765,8 +1772,25 @@ sched_sync(void)
 		 * matter as we are just trying to generally pace the
 		 * filesystem activity.
 		 */
-		if (time_second == starttime)
+		if (time_second != starttime)
+			continue;
+
+		if (sync_extdelay >= syncer_maxdelay)
+			while (syncer_delayno == 0 && rushjob == 0 &&
+	    		    abs(time_second - starttime) < sync_extdelay) {
+				stratcalls = 0;
 			tsleep(&lbolt, PPAUSE, "syncer", 0);
+				kthread_suspend_check(td->td_proc);
+				if (stratcalls != 0 && syncer_maxdelay <
+				    abs(time_second - starttime)) {
+					rushjob = syncer_maxdelay;
+					break;
+				}
+			}
+		else {
+			tsleep(&lbolt, PPAUSE, "syncer", 0);
+			kthread_suspend_check(td->td_proc);
+		}
 	}
 }
 
--- /usr/src/sys.org/kern/vfs_syscalls.c	Sat Nov 16 09:08:02 2002
+++ kern/vfs_syscalls.c	Tue Apr 15 17:38:55 2003
@@ -123,6 +123,9 @@ sync(td, uap)
 	struct mount *mp, *nmp;
 	int asyncflag;
 
+	/* Notify sched_sync to try flushing dirty buffers */
+	rushjob += syncer_maxdelay;
+
 	mtx_lock(&mountlist_mtx);
 	for (mp = TAILQ_FIRST(&mountlist); mp != NULL; mp = nmp) {
 		if (vfs_busy(mp, LK_NOWAIT, &mountlist_mtx, td)) {
@@ -2704,6 +2707,10 @@ fsync(td, uap)
 	struct file *fp;
 	vm_object_t obj;
 	int error;
+
+	/* Just return if we are artificially delaying disk synchs */
+	if (sync_extdelay && ena_lazy_fsync)
+		return (0);
 
 	GIANT_REQUIRED;
 
--- /usr/src/sys.org/sys/bio.h	Sat Nov 16 09:08:19 2002
+++ sys/bio.h	Tue Apr 15 15:24:20 2003
@@ -134,6 +134,8 @@ bioq_first(struct bio_queue_head *head)
 	return (TAILQ_FIRST(&head->queue));
 }
 
+extern	int	stratcalls;
+
 void biodone(struct bio *bp);
 void biofinish(struct bio *bp, struct devstat *stat, int error);
 int biowait(struct bio *bp, const char *wchan);
--- /usr/src/sys.org/sys/vnode.h	Sat Nov 16 09:08:21 2002
+++ sys/vnode.h	Tue Apr 15 15:23:38 2003
@@ -361,6 +361,10 @@ extern	struct uma_zone *namei_zone;
 extern	int prtactive;			/* nonzero to call vprint() */
 extern	struct vattr va_null;		/* predefined null vattr structure */
 extern	int vfs_ioopt;
+extern	int rushjob;
+extern	int syncer_maxdelay;
+extern	int sync_extdelay;
+extern	int ena_lazy_fsync;
 
 /*
  * Macro/function to check for client cache inconsistency w.r.t. leasing.
--- /usr/src/sys.org/ufs/ffs/ffs_alloc.c	Sat Nov 16 09:08:21 2002
+++ ufs/ffs/ffs_alloc.c	Tue Apr 15 15:26:37 2003
@@ -139,6 +139,10 @@ ffs_alloc(ip, lbn, bpref, size, cred, bn
 #endif /* DIAGNOSTIC */
 	reclaimed = 0;
 retry:
+	/* Speedup flushing of dirty buffers in sched_sync */
+	if (rushjob == 0 &&
+	    freespace(fs, fs->fs_minfree + 2) - numfrags(fs, size) < 0)
+		rushjob = syncer_maxdelay;
 	if (size == fs->fs_bsize && fs->fs_cstotal.cs_nbfree == 0)
 		goto nospace;
 	if (suser_cred(cred, PRISON_ROOT) &&
@@ -222,6 +226,10 @@ ffs_realloccg(ip, lbprev, bprev, bpref, 
 #endif /* DIAGNOSTIC */
 	reclaimed = 0;
 retry:
+	/* Speedup flushing of dirty buffers in sched_sync */
+	if (rushjob == 0 &&
+	    freespace(fs, fs->fs_minfree + 2) - numfrags(fs, nsize - osize) < 0)
+		rushjob = syncer_maxdelay;
 	if (suser_cred(cred, PRISON_ROOT) &&
 	    freespace(fs, fs->fs_minfree) -  numfrags(fs, nsize - osize) < 0)
 		goto nospace;

--------------CB13BF8AD3C84FDA09AF0A11--
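
The extended wait that both patches add to the end of sched_sync() can
be followed more easily as a user-space simulation. This is a sketch
under stated assumptions, not the kernel loop itself: each iteration
stands for one tsleep() of about a second, syncer_delayno is assumed to
stay at 0, rushjob is assumed to start at 0, and a single disk access at
second `io_second` (or -1 for none) stands in for the stratcalls
counter. The names simulate_extended_wait() and struct wait_result are
hypothetical.

```c
#include <assert.h>

struct wait_result {
    int slept;    /* seconds spent waiting before the next syncer pass */
    int rushjob;  /* value left in rushjob when the wait ends */
};

struct wait_result
simulate_extended_wait(int sync_extdelay, int syncer_maxdelay, int io_second)
{
    struct wait_result r = { 0, 0 };
    int elapsed = 0;

    assert(syncer_maxdelay > 0);

    if (sync_extdelay < syncer_maxdelay) {
        /* Extended delay is off: keep the usual one-second pace. */
        r.slept = 1;
        return (r);
    }

    while (r.rushjob == 0 && elapsed < sync_extdelay) {
        int stratcalls = 0;      /* reset before every sleep */

        elapsed++;               /* one tsleep() of about a second */
        if (elapsed == io_second)
            stratcalls = 1;      /* the disk was used during the sleep */

        /*
         * The disk is active anyway and more than syncer_maxdelay
         * seconds have passed: ask the syncer to flush now.
         */
        if (stratcalls != 0 && syncer_maxdelay < elapsed) {
            r.rushjob = syncer_maxdelay;
            break;
        }
    }
    r.slept = elapsed;
    return (r);
}
```

For example, with sync_extdelay = 300 and syncer_maxdelay = 32, a disk
access at second 100 ends the wait with rushjob = 32, while an access at
second 10 is too early to trigger a flush and the wait runs the full 300
seconds.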


From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 15 11:38:06 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9778537B401; Tue, 15 Apr 2003 11:38:06 -0700 (PDT)
Received: from mail.tel.fer.hr (zg03-155.dialin.iskon.hr [213.191.135.156])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id E9E3543F85; Tue, 15 Apr 2003 11:38:04 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from tel.fer.hr (marko-tp.katoda.net [192.168.201.109])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3FIa3xK000661;
	Tue, 15 Apr 2003 20:36:08 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
Message-ID: <3E9C517B.6039679A@tel.fer.hr>
Date: Tue, 15 Apr 2003 20:37:47 +0200
From: Marko Zec <zec@tel.fer.hr>
X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Kirk McKusick <mckusick@beastie.mckusick.com>
References: <200304130004.h3D04Vb5006635@beastie.mckusick.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
X-List-Received-Date: Tue, 15 Apr 2003 18:38:07 -0000

Kirk McKusick wrote:

> I am of the opinion that fsync should work. Applications like
> `vi' use fsync to ensure that the write of the new file is on
> stable store before removing the old copy. If that semantic
> is broken, it would be possible to have neither the old nor
> the new copy of your file after a crash. I do not consider
> that acceptable behavior. Further, the fsync call is used
> to ensure that link/unlink/rename have been completed. So
> more than just fsync is being affected by your change. Lastly,
> I often write out a file when I am about to suspend my laptop
> (for low battery or other reasons) and I really want that file
> on the disk now. I do not want to have to wait for it to decide
> at some future time to spin up the disk.
>
> I suggest that you make the disabling of fsync a separate
> option from the rest of your change so that people can
> decide for themselves whether they want partial savings
> with working semantics, or greater savings with broken
> semantics. I am also intrigued by the changes proposed by
> Ian Dowse that may better accomplish the same goals with
> less breakage.
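
For reference, the save sequence Kirk describes can be sketched in user
space roughly as follows. This is only a minimal illustration, not vi's
actual code: the helper name save_file_safely() and the ".tmp" suffix are
made up here, and it uses rename() to drop the old copy where an editor
might unlink it explicitly. The point is that the old copy is only
replaced after fsync() has returned, which is exactly the guarantee that
is lost if fsync() is silently ignored.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread or from vi): replace `path`
 * with `len` bytes from `buf` so that a crash leaves either the old or
 * the new copy on disk.  Returns 0 on success, -1 on error.
 */
int
save_file_safely(const char *path, const char *buf, size_t len)
{
	char tmp[1024];
	int fd, n;

	n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (n < 0 || n >= (int)sizeof(tmp))
		return (-1);
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);
	/*
	 * Write the new contents and force them to stable storage.
	 * A short write is treated as an error to keep the sketch small.
	 */
	if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
		close(fd);
		unlink(tmp);
		return (-1);
	}
	if (close(fd) != 0) {
		unlink(tmp);
		return (-1);
	}
	/* Only now is it safe to replace the old copy. */
	if (rename(tmp, path) != 0) {
		unlink(tmp);
		return (-1);
	}
	return (0);
}
```

If fsync() returned without doing anything, the rename could reach the
disk before the data does, and a crash in between would leave neither
copy intact.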

Prompted by the strong opposition to the concept of (optionally) ignoring
fsync() calls when running on battery power, I wonder what effect the
unconditional delaying of _all_ disk updates by ATA-disk firmware has on
FS consistency after a system crash or power failure. I do not want to
imply that such a concept is a priori bad, but I fail to see its
advantages over OS-controlled delaying of disk syncing.

Marko


From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 15 12:12:01 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0291337B401; Tue, 15 Apr 2003 12:12:01 -0700 (PDT)
Received: from mail.tel.fer.hr (zg07-145.dialin.iskon.hr [213.191.150.146])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 4130743FA3; Tue, 15 Apr 2003 12:11:59 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from tel.fer.hr (marko-tp.katoda.net [192.168.201.109])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3FJA6xK000670;
	Tue, 15 Apr 2003 21:10:11 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
Message-ID: <3E9C5975.43755858@tel.fer.hr>
Date: Tue, 15 Apr 2003 21:11:50 +0200
From: Marko Zec <zec@tel.fer.hr>
X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: David Schultz <das@FreeBSD.ORG>
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<20030414101935.GB18110@HAL9000.homeunix.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@FreeBSD.ORG
cc: mckusick@McKusick.COM
cc: freebsd-stable@FreeBSD.ORG
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Apr 2003 19:12:01 -0000

David Schultz wrote:

>   For instance, you could
>   have fsync() push the appropriate dirty buffers out to a separate
>   cache, then commit the contents of the cache in the order of the
>   fsyncs when the disk is next active.

Huh... such a concept would still break fsync() semantics. Note that the
original patch also ensures dirty buffers get flushed if / when the disk spins
up, even before the delay timer expires.
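
To make the objection concrete, David's suggestion could be modelled
roughly like this. It is a toy user-space sketch, not kernel code; the
names fsync_queue, fsq_defer() and fsq_drain() and the capacity FSQ_MAX
are invented for illustration. fsync requests are recorded in arrival
order and only committed, in that same order, once the disk is active
again. Note that fsq_defer() returns before anything is on stable
storage, which is why the semantics would still be broken.

```c
#include <stddef.h>
#include <string.h>

#define FSQ_MAX 64	/* hypothetical capacity of the side cache */

struct fsync_queue {
	int	ids[FSQ_MAX];	/* file identifiers, oldest first */
	size_t	count;		/* number of pending requests */
};

/*
 * Record an fsync request while the disk is spun down.
 * Returns 0, or -1 if the queue is full (a real implementation
 * would presumably have to spin the disk up at that point).
 */
int
fsq_defer(struct fsync_queue *q, int file_id)
{
	if (q->count == FSQ_MAX)
		return (-1);
	q->ids[q->count++] = file_id;
	return (0);
}

/*
 * The disk became active: hand back up to `outlen` pending requests
 * in the order the fsyncs were issued.  Returns how many were copied
 * to `out`; any remainder stays queued, still in order.
 */
size_t
fsq_drain(struct fsync_queue *q, int *out, size_t outlen)
{
	size_t n = q->count < outlen ? q->count : outlen;

	memcpy(out, q->ids, n * sizeof(q->ids[0]));
	memmove(q->ids, q->ids + n, (q->count - n) * sizeof(q->ids[0]));
	q->count -= n;
	return (n);
}
```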

> - The fiddling with rushjob seems rather arbitrary.  You can probably
>   just let the existing code increment it as necessary and force a sync
>   if the value gets too high.

If rushjob were not used to force prompt syncing, the original code could not
guarantee that the sync occurs immediately. Instead, the syncing could be
delayed for up to a further 30 seconds, which is not desirable if our major
design goal is to do as much disk I/O as possible in a short time interval and
leave the disk idle otherwise.
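
The effect can be illustrated with a toy model of the syncer wheel. This
is a simplification based on my reading of the syncer loop, not the
kernel's code: it assumes the syncer handles one worklist slot per
second, that a non-zero rushjob lets it move on to the next slot without
sleeping, and that the wheel has 32 slots. The function name
seconds_until_synced() is made up for this sketch.

```c
#define SYNCER_MAXDELAY 32	/* slots in the wheel; assumed for this model */

/*
 * Return how many one-second sleeps pass before slot `target` is
 * processed, given that the syncer is about to process slot `cur`
 * and `rushjob` extra slots may be processed without sleeping.
 * Returns -1 for an out-of-range slot number.
 */
int
seconds_until_synced(int cur, int target, int rushjob)
{
	int seconds = 0;

	if (cur < 0 || cur >= SYNCER_MAXDELAY ||
	    target < 0 || target >= SYNCER_MAXDELAY)
		return (-1);
	for (;;) {
		int slot = cur;

		cur = (cur + 1) % SYNCER_MAXDELAY;
		if (slot == target)
			return (seconds);
		if (rushjob > 0) {
			rushjob--;	/* skip the sleep, take the next slot */
			continue;
		}
		seconds++;		/* normal pace: one slot per second */
	}
}
```

In this model a buffer queued 30 slots ahead waits 30 seconds with
rushjob == 0, but is reached without any sleep once rushjob is set to
the full wheel size, which is what the patch does with
rushjob = syncer_maxdelay.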

Marko


From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 15 14:54:53 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 276C437B401; Tue, 15 Apr 2003 14:54:53 -0700 (PDT)
Received: from testmail.wolves.k12.mo.us (testmail.wolves.k12.mo.us
	[207.160.214.10])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 4059043FA3; Tue, 15 Apr 2003 14:54:52 -0700 (PDT)
	(envelope-from cdillon@wolves.k12.mo.us)
Received: by testmail.wolves.k12.mo.us (Postfix, from userid 1001)
	id 1D957CD61; Tue, 15 Apr 2003 16:54:51 -0500 (CDT)
Received: from localhost (localhost [127.0.0.1])
	by testmail.wolves.k12.mo.us (Postfix) with ESMTP
	id 1A2C2CD19; Tue, 15 Apr 2003 16:54:51 -0500 (CDT)
Date: Tue, 15 Apr 2003 16:54:51 -0500 (CDT)
From: Chris Dillon <cdillon@wolves.k12.mo.us>
To: Marko Zec <zec@tel.fer.hr>
In-Reply-To: <3E9C5975.43755858@tel.fer.hr>
Message-ID: <20030415160925.U86854@duey.wolves.k12.mo.us>
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<20030414101935.GB18110@HAL9000.homeunix.com>
	<3E9C5975.43755858@tel.fer.hr>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-fs@freebsd.org
cc: mckusick@McKusick.COM
cc: David Schultz <das@freebsd.org>
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Apr 2003 21:54:53 -0000

On Tue, 15 Apr 2003, Marko Zec wrote:

> Huh... such a concept would still break fsync() semantics. Note that
> the original patch also ensures dirty buffers get flushed if / when
> the disk spins up, even before the delay timer gets expired.

Sorry to butt in on this thread... :-)  It just occurred to me that
the ability to delay all writes given an arbitrary time period would
be good for more than just laptops.  It would be great for
non-volatile flash filesystems which have a limited write life.  The
only thing you would have to change for that case is make the "flush
on read" optional, since the purpose would be to minimize writes, not
minimize disk spin-ups which don't exist on flash parts.  This would
only be advantageous if delaying the writes will actually cause fewer
writes to be made to the flash part than would have been made without
the delay, i.e. via normal soft-updates optimizations (a file created
and removed within the delay period never gets written, or delaying
atime updates of oft-read files), which I'm guessing would be the case
most of the time.
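
A rough way to see where the savings would come from is a tiny
write-back model. This is only an illustrative sketch; the structure
wb_cache, its functions, and the 16-block device are all hypothetical.
Repeated updates of one block within the delay period collapse into a
single flash write, and a block that is discarded before the flush (a
file created and removed again) is never written at all.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16	/* hypothetical tiny flash device */

struct wb_cache {
	bool	dirty[NBLOCKS];	/* block modified since the last flush */
	size_t	flash_writes;	/* writes that actually reached flash */
};

/* Modify a block in memory only; nothing goes to flash yet. */
void
wb_write(struct wb_cache *c, size_t blk)
{
	if (blk < NBLOCKS)
		c->dirty[blk] = true;
}

/* The block's contents became irrelevant before they were flushed. */
void
wb_discard(struct wb_cache *c, size_t blk)
{
	if (blk < NBLOCKS)
		c->dirty[blk] = false;
}

/* Delay expired: write each dirty block once.  Returns the count. */
size_t
wb_flush(struct wb_cache *c)
{
	size_t blk, n = 0;

	for (blk = 0; blk < NBLOCKS; blk++) {
		if (c->dirty[blk]) {
			c->dirty[blk] = false;
			n++;
		}
	}
	c->flash_writes += n;
	return (n);
}
```

For example, five updates of one block plus one create-and-remove of
another would be six writes if each went to flash immediately, but only
one write after a delayed flush in this model.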

For example, on a small flash-based firewall I currently use at home,
I would use a delay time of 60 minutes or more.  That would correspond
to how I currently save the important dynamic information kept on a
memory filesystem, such as DHCP leases: every 60 minutes I mount a
small filesystem read-write on the flash part, tar up the dynamic
data, and then umount the filesystem.  I then have to un-tar that data
onto the memory filesystem during boot.  Being able to keep all of
that information directly on a read-write filesystem on the flash part
but delay writes for a relatively long period of time would eliminate
the need for all of that.

If the "clean" bit is set on the FS during that long delay that would
be even slicker (does it do that already?), since if the filesystem is
consistent thanks to softupdates it shouldn't need to be fsck'd at all
on boot.


-- 
 Chris Dillon - cdillon(at)wolves.k12.mo.us
 FreeBSD: The fastest and most stable server OS on the planet
 - Available for IA32 (Intel x86) and Alpha architectures
 - IA64, PowerPC, UltraSPARC, ARM, and S/390 under development
 - http://www.freebsd.org

No trees were harmed in the composition of this message, although some
electrons were mildly inconvenienced.

From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 15 16:27:46 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3B57637B401; Tue, 15 Apr 2003 16:27:46 -0700 (PDT)
Received: from mail.tel.fer.hr (zg06-140.dialin.iskon.hr [213.191.148.141])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 0B29643F75; Tue, 15 Apr 2003 16:27:44 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from tel.fer.hr ([192.168.202.105])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3FNPpxK000691;
	Wed, 16 Apr 2003 01:25:55 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
Message-ID: <3E9C9566.8603E312@tel.fer.hr>
Date: Wed, 16 Apr 2003 01:27:34 +0200
From: Marko Zec <zec@tel.fer.hr>
X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Chris Dillon <cdillon@wolves.k12.mo.us>
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<20030414101935.GB18110@HAL9000.homeunix.com>
	<20030415160925.U86854@duey.wolves.k12.mo.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Apr 2003 23:27:46 -0000

Chris Dillon wrote:

> On Tue, 15 Apr 2003, Marko Zec wrote:
>
> > Huh... such a concept would still break fsync() semantics. Note that
> > the original patch also ensures dirty buffers get flushed if / when
> > the disk spins up, even before the delay timer gets expired.
>
> Sorry to butt in on this thread... :-)  It just occurred to me that
> the ability to delay all writes given an arbitrary time period would
> be good for more than just laptops.  It would be great for
> non-volatile flash filesystems which have a limited write life.  The
> only thing you would have to change for that case is make the "flush
> on read" optional, since the purpose would be to minimize writes, not
> minimize disk spin-ups which don't exist on flash parts.  This would
> only be advantageous if delaying the writes will actually cause fewer
> writes to be made to the flash part than would have been made without
> the delay, i.e. via normal soft-updates optimizations (a file created
> and removed within the delay period never gets written, or delaying
> atime updates of oft-read files), which I'm guessing would be the case
> most of the time.

To achieve such functionality, simply remove or comment out the
stratcalls++ line in /sys/dev/ata/ata-disk.c. A cleaner method would of
course be adding another tunable knob, which would also be a trivial
thing to...

Cheers,

Marko

> For example, on a small flash-based firewall I currently use at home,
> I would use a delay time of 60 minutes or more.  That would correspond
> to how I currently handle saving the important dynamic information
> kept on a memory filesystem, such as DHCP leases, which is every 60
> minutes mount a small filesystem read-write on the flash part, tar up
> the dynamic data, and then umount the filesystem.  I then have to
> un-tar that data onto the memory filesystem during boot.  Being able
> to keep all of that information directly on a read-write filesystem on
> the flash part but delay writes for a relatively long period of time
> would alleviate all of that.
>
> If the "clean" bit is set on the FS during that long delay that would
> be even slicker (does it do that already?), since if the filesystem is
> consistent thanks to softupdates it shouldn't need to be fsck'd at all
> on boot.


From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 15 20:30:30 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 255A137B407
	for <freebsd-fs@freebsd.org>; Tue, 15 Apr 2003 20:30:30 -0700 (PDT)
Received: from tango.chessclub.com (tango.chessclub.com [204.178.125.70])
	by mx1.FreeBSD.org (Postfix) with SMTP id ED5CD43FB1
	for <freebsd-fs@freebsd.org>; Tue, 15 Apr 2003 20:30:27 -0700 (PDT)
	(envelope-from sleator@tango.chessclub.com)
Received: (qmail 81144 invoked by uid 1000); 16 Apr 2003 03:19:17 -0000
Date: 16 Apr 2003 03:19:17 -0000
Message-ID: <20030416031917.81143.qmail@tango.chessclub.com>
From: Danny Sleator <sleator@cs.cmu.edu>
To: freebsd-fs@freebsd.org
Subject: better ways to get the news
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Apr 2003 03:30:30 -0000

I'm alternately outraged and depressed by what's happening
in the world.  We now have the most powerful, deceitful,
arrogant, and belligerent administration in US history.  And
almost everything they're doing is wrong.

Here's one example to illustrate the power of Emperor Bush.
He can start a unilateral, preemptive, unprecedented war
costing hundreds of billions of dollars.  His justification
for it constantly changes, and is buttressed by a stream of
lies.  Simultaneously he can demand and get from congress a
huge tax cut for the rich, despite the fact that we're in a
recession and there's a huge budget deficit.  And while
doing all this outrageous stuff, he remains extremely
popular.

I'm thinking about what I, an average Joe, can do to slow
down this juggernaut.  One thing I did was put up this
lighted sign outside of my house:

   http://www.cs.cmu.edu/~sleator/pictures/no-war.jpg

But I think the real problem, and the reason for Bush's
popularity, is that the American people basically don't have
a clue about what's really happening.  The mainstream media
are not communicating it.  Here are four examples to
illustrate this point.

1. Remember the huge crowd of Iraqis cheering and pulling
   down a statue of Saddam?  It turns out that the crowd was
   very small and some (all?)  of the jubilant members of
   the crowd were actors.

   http://www.informationclearinghouse.info/article2842.htm

2. Remember the rampant looting of Baghdad?  Perhaps you
   knew that the US didn't lift a finger to stop it.  But
   did you know that it was encouraged by US troops as a
   photo op?

   http://truthout.org/docs_03/041603D.shtml

3. Did you know that Richard Perle (a key author of the US's
   current Iraq policy) worked to undermine the Camp David
   accords in the summer of 2000?

   http://www.guardian.co.uk/israel/Story/0,2763,342857,00.html

4. There's an outrageous, little-known part of NAFTA called
   chapter 11, which foreign corporate investors are using
   to challenge laws designed to protect public health,
   environmental regulations, and jury verdicts.  The cases
   are heard before a secret international trade tribunal.

   http://www.citizen.org/publications/release.cfm?ID=7076

These are just a tiny sample to illustrate the problems of
missing and/or misleading stories in the media.  This
situation goes a long way toward explaining why the war is
so much more popular in the US than it is everywhere else.
So I'm suggesting (to all the addresses in my inbox over the
last few years) some good alternative sources of information
that I've found.

A good place to start is http://www.truthout.org
They collect stories from reputable sources all over the
world.  You can sign up for a daily mailing of their
suggested stories.  I've included one below.  Sign up
for their mailings at: http://216.25.72.229/membership/sub_mgmt.php

http://www.fair.org is a media watchdog group.  They
maintain a web site, and they let you sign up for sporadic
mailings about media deceptions and bias.  They often have
action alerts about specific outrages in the media.

Another very good organization is http://www.moveon.org
They email reminders when congress is considering important
issues.  They make it easy to contact your congress person to
voice your opinion.  They also run ads in mainstream
publications and on TV.

I also highly recommend the book "What Liberal Media?" by
Eric Alterman.  He explains in great detail all the ways in
which the media system is broken, and how it got this way.

Here are some other great sites to take a look at:

    http://www.consortiumnews.com
    http://www.copvcia.com
    http://www.democraticunderground.com
    http://www.informationclearinghouse.info
    http://www.tompaine.com
    http://www.zmag.org/weluser.htm

I hope you find this mailing useful, and I apologize if you
got this more than once.  Feel free to distribute this
further.  One warning: If you keep up with these sites, your
world view will start to diverge from the "standard"
(i.e. false) world view.  You risk being viewed as a
conspiracy theorist or a nut.

          Danny Sleator
          Professor of Computer Science
          Carnegie Mellon University
          Email: sleator@cmu.edu


t r u t h o u t | 04.16

Eagleburger: Bush Should be Impeached if He Attacks Syria
<a href=" http://truthout.org/docs_03/041603A.shtml ">GO</a>

Echoes of Empires Past
<a href=" http://truthout.org/docs_03/041603B.shtml ">GO</a>

Bomb Before You Buy
<a href=" http://truthout.org/docs_03/041603C.shtml ">GO</a>

US Troops Encouraged Ransacking
<a href=" http://truthout.org/docs_03/041603D.shtml ">GO</a>

Reflections on the Battle of Baghdad
<a href=" http://truthout.org/docs_03/041603E.shtml ">GO</a>

Bush-Hitler Remark Sinks Movie Exec
<a href=" http://truthout.org/docs_03/041603F.shtml ">GO</a>

'Fearless' Dean Wins Converts
<a href=" http://truthout.org/docs_03/041603G.shtml ">GO</a>

What About Private Lori?
<a href=" http://truthout.org/docs_03/041603H.shtml ">GO</a>

t r u t h o u t - Newsletter Sign-up (Free) :
<a href=" https://www.truthout.org/membership/membership.htm ">GO</a>
Problems with the links? Go direct to our HomePage : http://www.truthout.org

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

t r u t h o u t | 04.15

William Rivers Pitt | How America Lost the War
<a href=" http://truthout.org/docs_03/041503A.shtml ">GO</a>

Rout Proves Anti-War Point
<a href=" http://truthout.org/docs_03/041503B.shtml ">GO</a>

Aftermath: The Bush Doctrine
<a href=" http://truthout.org/docs_03/041503C.shtml ">GO</a>

Baghdad Seeths With Anger Toward U.S.
<a href=" http://truthout.org/docs_03/041503D.shtml ">GO</a>

Syria Could Be Next, Warns Washington
<a href=" http://truthout.org/docs_03/041503E.shtml ">GO</a>

America Targeted 14,000 Sites. So Where Are The WMDs?
<a href=" http://truthout.org/docs_03/041503F.shtml ">GO</a>

Scandal-Hit US Firm Wins Key Contracts
<a href=" http://truthout.org/docs_03/041503G.shtml ">GO</a>

Civilisation Torn To Pieces
<a href=" http://truthout.org/docs_03/041503H.shtml ">GO</a>

Mesopotamia. Babylon. The Tigris and Euphrates
<a href=" http://truthout.org/docs_03/041503I.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

t r u t h o u t | 04.14

War and Peace: Anarchy in the Streets
<a href=" http://truthout.org/docs_03/041403A.shtml ">GO</a>

U.S. Marines Exchange Heavy Fire in Central Baghdad
<a href=" http://truthout.org/docs_03/041403B.shtml ">GO</a>

Pillagers Strip Iraqi Museum of Its Treasure
<a href=" http://truthout.org/docs_03/041403C.shtml ">GO</a>

Crime Against Humanity
<a href=" http://truthout.org/docs_03/041403D.shtml ">GO</a>

Garner Waiting For "Last Shot" To Rule Baghdad
<a href=" http://truthout.org/docs_03/041403E.shtml ">GO</a>

Vanishing Liberties -- Where's the Press?
<a href=" http://truthout.org/docs_03/041403F.shtml ">GO</a>

Anthrax Source Probably Domestic
<a href=" http://truthout.org/docs_03/041403G.shtml ">GO</a>

India Mulls 'Pre-Emptive' Pakistan Strike, Cites Iraq War Precedent
<a href=" http://truthout.org/docs_03/041403H.shtml ">GO</a>

Outspoken Yellowstone Ranger Loses Job
<a href=" http://truthout.org/docs_03/041403I.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

t r u t h o u t | 04.13

Congressman Questions Iraq Work Given To Halliburton Subsidiary Without
Competition
<a href=" http://truthout.org/docs_03/041303A.shtml ">GO</a>

US Arms Group Heads for Lisbon
<a href=" http://truthout.org/docs_03/041303B.shtml ">GO</a>

US Show of Force Galls Arab World
<a href=" http://truthout.org/docs_03/041303C.shtml ">GO</a>

U.S. Govt Accused of War Crimes against Journalists
<a href=" http://truthout.org/docs_03/041303D.shtml ">GO</a>

Ordinary People Fear Their Nation Could Be Next Target of 'Regime Change'
<a href=" http://truthout.org/docs_03/041303E.shtml ">GO</a>

Northern Iraq Falls, Mobs Run Riot in Baghdad
<a href=" http://truthout.org/docs_03/041303F.shtml ">GO</a>

The Future of Iraq's Oil
<a href=" http://truthout.org/docs_03/041303G.shtml ">GO</a>

War Within A War A Real Possibility
<a href=" http://truthout.org/docs_03/041303H.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

t r u t h o u t | 04.12

Hans Blix: War Planned 'Long in Advance'
<a href=" http://truthout.org/docs_03/041203A.shtml ">GO</a>

The Press and the War
<a href=" http://truthout.org/docs_03/041203B.shtml ">GO</a>

Spoils of War
<a href=" http://truthout.org/docs_03/041203C.shtml ">GO</a>

Suicide Bomber In Baghdad Injures Four Marines
<a href=" http://truthout.org/docs_03/041203D.shtml ">GO</a>

Security Council Balks at Postwar Plans
<a href=" http://truthout.org/docs_03/041203E.shtml ">GO</a>

Murdoch Adds to Empire With Control of DirecTV
<a href=" http://truthout.org/docs_03/041203F.shtml ">GO</a>

Bush Offers Crooks And Warmongers To Lead Iraq
<a href=" http://truthout.org/docs_03/041203G.shtml ">GO</a>

House Revives ANWAR Again
<a href=" http://truthout.org/docs_03/041203H.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

t r u t h o u t | 04.11

Despite Cheering Crowds, Army Unit Sees Urban Combat in Baghdad
<a href=" http://truthout.org/docs_03/041103A.shtml ">GO</a>

In Search of Horror Weapons
<a href=" http://truthout.org/docs_03/041103B.shtml ">GO</a>

Syria Now Top US Target for 'Regime Change'
<a href=" http://truthout.org/docs_03/041103C.shtml ">GO</a>

Descent Into a Charnel-House Hospital Hell
<a href=" http://truthout.org/docs_03/041103D.shtml ">GO</a>

Republicans Want Patriot Act Made Permanent
<a href=" http://truthout.org/docs_03/041103E.shtml ">GO</a>

The Pentagon's 'Trainee,' Ahmad Chalabi
<a href=" http://truthout.org/docs_03/041103F.shtml ">GO</a>

UNICEF Warns Of Worsening Situation For Children In Iraq
<a href=" http://truthout.org/docs_03/041103G.shtml ">GO</a>

House Democrats Want Halliburton Probe
<a href=" http://truthout.org/docs_03/041103H.shtml ">GO</a>

CPJ Condemns Journalists' Deaths In Iraq
<a href=" http://truthout.org/docs_03/041103I.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

t r u t h o u t | 04.10

William Rivers Pitt | The Longest Winter
<a href=" http://truthout.org/docs_03/041003A.shtml ">GO</a>

Dark Day for Journalists in Iraq
<a href=" http://truthout.org/docs_03/041003B.shtml ">GO</a>

Wailing Children, the Wounded, the Dead
<a href=" http://truthout.org/docs_03/041003C.shtml ">GO</a>

The Taliban are Back in Southeast Afghanistan
<a href=" http://truthout.org/docs_03/041003D.shtml ">GO</a>

War Out of Compassion
<a href=" http://truthout.org/docs_03/041003E.shtml ">GO</a>

Iraqis In Basra Weigh Freedom's Cost
<a href=" http://truthout.org/docs_03/041003F.shtml ">GO</a>

Oakland Cops Defend Use of Force Against Protesters
<a href=" http://truthout.org/docs_03/041003G.shtml ">GO</a>

Coleman Apologizes For Remark About Wellstone
<a href=" http://truthout.org/docs_03/041003H.shtml ">GO</a>

Saddam Hussein, "Chemical Ali" Apparently Survive Attacks
<a href=" http://truthout.org/docs_03/041003I.shtml ">GO</a>

Economy on the Edge
<a href=" http://truthout.org/docs_03/041003J.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/


t r u t h o u t | 04.09

Oakland Police Open Fire At Anti-War Protest
<a href=" http://truthout.org/docs_03/040903A.shtml ">GO</a>

Iraqis Launch Urban Fightback in Baghdad
<a href=" http://truthout.org/docs_03/040903B.shtml ">GO</a>

Simpson: 'This Is Like A Scene From Hell.'
<a href=" http://truthout.org/docs_03/040903C.shtml ">GO</a>

Baghdad Hospitals Overwhelmed, No Longer Counting Casualties
<a href=" http://truthout.org/docs_03/040903D.shtml ">GO</a>

"Smoking Gun" WMD Site in Iraq Turns Out to Contain Pesticide
<a href=" http://truthout.org/docs_03/040903E.shtml ">GO</a>

'I Love My Country, But.'
<a href=" http://truthout.org/docs_03/040903F.shtml ">GO</a>

Cronies Set To Make A Killing
<a href=" http://truthout.org/docs_03/040903G.shtml ">GO</a>

William Rivers Pitt's New Book Now Available
<a href=" http://truthout.org/docs_03/040903H.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

t r u t h o u t | 04.08

Surgeon Describes "Horrific Injuries," Sanitized War
<a href=" http://truthout.org/docs_03/040803A.shtml ">GO</a>

Up to 3,000 Iraqi Fighters Dead in Show of Force
<a href=" http://truthout.org/docs_03/040803B.shtml ">GO</a>

Red Cross: Iraq Wounded Too High to Count
<a href=" http://truthout.org/docs_03/040803C.shtml ">GO</a>

U.S. Finds No Weapons of Mass Destruction in Iraq
<a href=" http://truthout.org/docs_03/040803D.shtml ">GO</a>

Little Hope for Post-War Boom in US Economy
<a href=" http://truthout.org/docs_03/040803E.shtml ">GO</a>

Carlyle Group Heads for Lisbon
<a href=" http://truthout.org/docs_03/040803F.shtml ">GO</a>

Army Chaplain Offers Baptisms, Baths
<a href=" http://truthout.org/docs_03/040803G.shtml ">GO</a>

Irish Anti-War Marchers to Confront Bush
<a href=" http://truthout.org/docs_03/040803H.shtml ">GO</a>

Disarmament In Tatters
<a href=" http://truthout.org/docs_03/040803I.shtml ">GO</a>

7-Year-Old Kurd: 'I Like War'
<a href=" http://truthout.org/docs_03/040803J.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

t r u t h o u t | 04.07

Thirsty Iraqis Must Be Baptized to Get Water
<a href=" http://truthout.org/docs_03/040703A.shtml ">GO</a>

Near Baghdad, U.S. Troops Encounter 'Remarkable' Foe
<a href=" http://truthout.org/docs_03/040703B.shtml ">GO</a>

Britain Admits There May Be No WMD's in Iraq
<a href=" http://truthout.org/docs_03/040703C.shtml ">GO</a>

Forecasters Underrating Weakness of US Economy
<a href=" http://truthout.org/docs_03/040703D.shtml ">GO</a>

US Marines Kill Seven Iraqis After Truck Fails to Stop (Again)
<a href=" http://truthout.org/docs_03/040703E.shtml ">GO</a>

Baghdad Hospitals Stretched to their Limits
<a href=" http://truthout.org/docs_03/040703F.shtml ">GO</a>

American Portrayal of War of Liberation Faltering Across Arab World
<a href=" http://truthout.org/docs_03/040703G.shtml ">GO</a>

Blair and Friends Staring Into War's Political Abyss
<a href=" http://truthout.org/docs_03/040703H.shtml ">GO</a>

Turf War Rages in Washington Over Who Will Rule Iraq
<a href=" http://truthout.org/docs_03/040703I.shtml ">GO</a>

To Activists, Real Battles Are on Home Front
<a href=" http://truthout.org/docs_03/040703J.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

t r u t h o u t | 04.06

Red Cross Horrified by Number of Dead Civilians
<a href=" http://truthout.org/docs_03/040603A.shtml ">GO</a>

Samar's Story
<a href=" http://truthout.org/docs_03/040603B.shtml ">GO</a>

At Umm Qasr, the "Secured" Port, "It's Chaos"
<a href=" http://truthout.org/docs_03/040603C.shtml ">GO</a>

How the Dissidents Fooled the Washington Hawks
<a href=" http://truthout.org/docs_03/040603D.shtml ">GO</a>

US Military Admits 'Suspicious' Powder is Explosive
<a href=" http://truthout.org/docs_03/040603E.shtml ">GO</a>

Kerry Lashes Out at Republican Criticisms
<a href=" http://truthout.org/docs_03/040603F.shtml ">GO</a>

Saddam Was Not Always Washington's 'Demon'
<a href=" http://truthout.org/docs_03/040603G.shtml ">GO</a>

The War's Dirty Secret: It's About Changing United States, Not Iraq
<a href=" http://truthout.org/docs_03/040603H.shtml ">GO</a>

Jobs Show Worse-Than-Expected Drop
<a href=" http://truthout.org/docs_03/040603I.shtml ">GO</a>

Senate Won't Debate Alaska Oil Drilling
<a href=" http://truthout.org/docs_03/040603J.shtml ">GO</a>


_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/


To JOIN the TO list: http://www.truthout.org/membership/sub_mgmt.php

From owner-freebsd-fs@FreeBSD.ORG  Wed Apr 16 01:35:45 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id E087E37B401; Wed, 16 Apr 2003 01:35:45 -0700 (PDT)
Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net
	[207.217.120.189])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 369BA43F93; Wed, 16 Apr 2003 01:35:45 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0135.cvx40-bradley.dialup.earthlink.net ([216.244.42.135]
	helo=mindspring.com)
	by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 195iOD-0004QY-00; Wed, 16 Apr 2003 01:35:34 -0700
Message-ID: <3E9D157E.96FD09AE@mindspring.com>
Date: Wed, 16 Apr 2003 01:34:06 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Chris Dillon <cdillon@wolves.k12.mo.us>
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<20030414101935.GB18110@HAL9000.homeunix.com>
	<20030415160925.U86854@duey.wolves.k12.mo.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a48a278b88c0ad456bc35dc084ece7a78e548b785378294e88350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@freebsd.org
cc: mckusick@McKusick.COM
cc: freebsd-stable@freebsd.org
cc: David Schultz <das@freebsd.org>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Apr 2003 08:35:46 -0000

Chris Dillon wrote:
> On Tue, 15 Apr 2003, Marko Zec wrote:
> > Huh... such a concept would still break fsync() semantics. Note that
> > the original patch also ensures dirty buffers get flushed if / when
> > the disk spins up, even before the delay timer gets expired.
> 
> Sorry to butt in on this thread... :-)  It just occurred to me that
> the ability to delay all writes given an arbitrary time period would
> be good for more than just laptops.  It would be great for
> non-volatile flash filesystems which have a limited write life.

The life expectancy of these devices is really, really
underestimated.  In practice, I've seen two million
write cycles from some of these in lab machines which
get rewritten pretty often.

You are actually better off with a "noatime" option, to
avoid cron beating the same set of bits once a second,
or even a read-only mount for most/all of your FS's to
avoid having to worry about writes at all.


> If the "clean" bit is set on the FS during that long delay that would
> be even slicker (does it do that already?), since if the filesystem is
> consistent thanks to softupdates it shouldn't need to be fsck'd at all
> on boot.

That's called "soft read-only".  Kirk implemented that
for the BSDI version, but not for FreeBSD or OpenBSD.  We
discussed it when he was doing the FreeBSD work on contract
for Whistle.  It's actually not that hard to do, I think,
but it's probably evil to not update access times on an FS
that's *technically* mounted read/write, if you're expecting
those semantics.

Practically, you can't really trust the BG fsck when it
comes to real disks, because you can lose whole tracks,
and if you ever do end up with an inconsistency, you are
pretty screwed if it results in a panic.  For something
that's solid state, that's less of a problem.  8-).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Wed Apr 16 03:11:37 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9E9F837B401; Wed, 16 Apr 2003 03:11:37 -0700 (PDT)
Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com
	[12.233.57.131])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 013C643FAF; Wed, 16 Apr 2003 03:11:37 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3GABa9E001264;
	Wed, 16 Apr 2003 03:11:36 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3GABaTM001263;
	Wed, 16 Apr 2003 03:11:36 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Date: Wed, 16 Apr 2003 03:11:36 -0700
From: David Schultz <das@FreeBSD.org>
To: Marko Zec <zec@tel.fer.hr>
Message-ID: <20030416101136.GA868@HAL9000.homeunix.com>
Mail-Followup-To: Marko Zec <zec@tel.fer.hr>, freebsd-fs@FreeBSD.org,
	freebsd-stable@FreeBSD.org, mckusick@McKusick.COM
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<20030414101935.GB18110@HAL9000.homeunix.com> <3E9C5975.43755858@tel.fer.hr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3E9C5975.43755858@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: mckusick@McKusick.COM
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Apr 2003 10:11:38 -0000

On Tue, Apr 15, 2003, Marko Zec wrote:
> David Schultz wrote:
> 
> >   For instance, you could
> >   have fsync() push the appropriate dirty buffers out to a separate
> >   cache, then commit the contents of the cache in the order of the
> >   fsyncs when the disk is next active.
> 
> Huh... such a concept would still break fsync() semantics. Note that the
> original patch also ensures dirty buffers get flushed if / when the disk spins
> up, even before the delay timer gets expired.

I didn't say it wouldn't still break fsync() semantics; it
does.  However, you could guarantee that data are in a
consistent state with my proposal.  On the other hand, the more I
think about the details, the more I think this could be more of a
pain than it's worth.
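
To make the ordering idea concrete, here is a toy sketch in C of a
queue that remembers buffers in the order they were fsync'd and hands
them back oldest first once the disk is active again.  Everything in
it (struct defer_queue, defer_push, defer_commit, DEFER_MAX) is
hypothetical and exists only for illustration; it is not taken from
the patch or from the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define DEFER_MAX 64    /* arbitrary capacity chosen for the sketch */

/* Hypothetical FIFO of buffer identifiers awaiting commit. */
struct defer_queue {
    int    ids[DEFER_MAX];
    size_t head;     /* index of the oldest queued entry */
    size_t count;    /* number of queued entries */
};

/* Queue a buffer in fsync order; returns -1 when full (caller must flush). */
int
defer_push(struct defer_queue *q, int id)
{
    assert(q != NULL);
    if (q->count == DEFER_MAX)
        return -1;
    q->ids[(q->head + q->count) % DEFER_MAX] = id;
    q->count++;
    return 0;
}

/* Drain up to `max` queued ids into `out`, oldest first. */
size_t
defer_commit(struct defer_queue *q, int *out, size_t max)
{
    size_t n = 0;

    assert(q != NULL && out != NULL);
    while (q->count > 0 && n < max) {
        out[n++] = q->ids[q->head];
        q->head = (q->head + 1) % DEFER_MAX;
        q->count--;
    }
    return n;
}
```

Pushing ids 7, 3 and 9 and then committing returns them in that same
order, which is the property that keeps the on-disk state consistent
even though the individual fsync() calls returned early.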


> 
> > - The fiddling with rushjob seems rather arbitrary.  You can probably
> >   just let the existing code increment it as necessary and force a sync
> >   if the value gets too high.
> 
> If rushjob were not used for forcing prompt synching, the original code
> could not guarantee the sync to occur immediately. Instead, the synching could
> be further delayed for up to 30 seconds, which is not desirable if our major
> design goal is to do as much disk I/O as possible in a small time interval and
> leave the disk idle otherwise.

I was referring to all the places where rushjob is set to or
incremented by syncer_maxdelay.  AFAIK, it should never be that
large.  I don't think you want to overload a low memory handling
mechanism with the task of syncing the disk.

From owner-freebsd-fs@FreeBSD.ORG  Wed Apr 16 03:28:53 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 34B2C37B401; Wed, 16 Apr 2003 03:28:53 -0700 (PDT)
Received: from mail.r.caley.org.uk
	(82-41-209-16.cable.ubr12.edin.blueyonder.co.uk [82.41.209.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 6C32643F75; Wed, 16 Apr 2003 03:28:51 -0700 (PDT)
	(envelope-from rjc@caley.org.uk)
Received: from pele.r.caley.org.uk (pele.r.caley.org.uk [10.0.0.12])
	by mail.r.caley.org.uk (8.12.6/8.12.6) with ESMTP id h3GASnXj093442;
	Wed, 16 Apr 2003 11:28:49 +0100 (BST)
	(envelope-from rjc@bast.r.caley.org.uk)
Received: from pele.r.caley.org.uk (localhost [127.0.0.1])
	by pele.r.caley.org.uk (8.12.6/8.12.6) with ESMTP id h3GASnFl051393;
	Wed, 16 Apr 2003 11:28:49 +0100 (BST)
	(envelope-from rjc@bast.r.caley.org.uk)
Received: (from rjc@localhost)
	by pele.r.caley.org.uk (8.12.6/8.12.6/Submit) id h3GASnQQ051390;
	Wed, 16 Apr 2003 11:28:49 +0100 (BST)
	(envelope-from rjc@bast.r.caley.org.uk)
X-Authentication-Warning: pele.r.caley.org.uk: rjc set sender to
	rjc@bast.r.caley.org.uk using -f
Sender: rjc@caley.org.uk
To: Marko Zec <zec@tel.fer.hr>
References: <200304121438.h3CEct41030991@lurza.secnetix.de>
	<3E9840B8.F00E018F@tel.fer.hr>
From: Richard Caley <rjc@caley.org.uk>
In-Reply-To: <3E9840B8.F00E018F@tel.fer.hr>
Date: 16 Apr 2003 11:28:49 +0100
Message-ID: <87smsiohwe.fsf@pele.r.caley.org.uk>
Lines: 26
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.1 (Cuyahoga Valley)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Apr 2003 10:28:53 -0000

In article <3E9840B8.F00E018F@tel.fer.hr>, Marko Zec (mz) writes:


mz> I agree that an additional tunable for controlling fsync() behavior couldn't hurt,
mz> however as explained in a previous note I see fsync() as the most common
mz> initiator of disk spin-ups, so a method for suppressing it must be made
mz> available, otherwise the whole patch wouldn't make much sense...

Would it make sense to make the fsync behaviour a per-process choice?
That way certain system processes could, if this delay behaviour is
enabled, use the null fsync. For instance, if syslog is one of the
things causing annoying spin-ups, then the user could tell syslog not
to really fsync, trading forensic information in the event of a crash
for battery life. 

Additionally there could be a really_really_fysnc call to be used to
make certain programs delay-aware. Eg, it might be acceptable for my
emacs checkpointing not to fsync, again I'm trading losing a little
more work in the event of a crash for battery life, but when I
explicitly save, I am saying I want that stuff on disk and stable NOW,
and damn battery.
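
The checkpoint-versus-explicit-save distinction can be sketched at the
application level in C.  This is only an illustration under stated
assumptions: save_file() and its `durable' flag are hypothetical names,
and the temporary file is simply `path' with ".tmp" appended.  The
system calls used (open, write, fsync, close, rename, unlink) are the
ordinary ones.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write `len' bytes to a temporary file next to `path', optionally
 * fsync(2) it, then rename(2) it over `path'.  `durable' is the
 * hypothetical per-call policy: false for checkpoints, true for
 * explicit saves.  Returns 0 on success, -1 on error.
 */
int
save_file(const char *path, const void *buf, size_t len, bool durable)
{
    char tmp[1024];
    const char *p = buf;
    int fd, n;

    assert(path != NULL && buf != NULL);
    n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w <= 0) {           /* no retry on EINTR; enough for a sketch */
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    /* Only the explicit-save path insists on stable storage. */
    if (durable && fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A checkpoint would call save_file(path, buf, len, false) and an
explicit save would pass true; the trade of crash safety for battery
life is then made by the program itself rather than by the kernel.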

-- 
Mail me as MYFIRSTNAME@MYLASTNAME.org.uk        _O_
                                                 |<

From owner-freebsd-fs@FreeBSD.ORG  Wed Apr 16 06:39:03 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2A28C37B404
	for <freebsd-fs@freebsd.org>; Wed, 16 Apr 2003 06:39:03 -0700 (PDT)
Received: from laptop.tenebras.com (laptop.tenebras.com [66.92.188.18])
	by mx1.FreeBSD.org (Postfix) with SMTP id 8EEEB43FB1
	for <freebsd-fs@freebsd.org>; Wed, 16 Apr 2003 06:39:00 -0700 (PDT)
	(envelope-from kudzu@tenebras.com)
Received: (qmail 12497 invoked from network); 16 Apr 2003 13:38:58 -0000
Received: from queequeg.tenebras.com (HELO tenebras.com) (192.168.188.241)
  by 0 with SMTP; 16 Apr 2003 13:38:58 -0000
Message-ID: <3E9D5CF2.7090606@tenebras.com>
Date: Wed, 16 Apr 2003 06:38:58 -0700
From: Michael Sierchio <kudzu@tenebras.com>
User-Agent: Mozilla/5.0 (X11; U; Linux i386; en-US; rv:1.3) Gecko/20030312
X-Accept-Language: en-us, en, zh-cn, zh-tw
MIME-Version: 1.0
To: Richard Caley <rjc@caley.org.uk>
References: <200304121438.h3CEct41030991@lurza.secnetix.de>
	<3E9840B8.F00E018F@tel.fer.hr> <87smsiohwe.fsf@pele.r.caley.org.uk>
In-Reply-To: <87smsiohwe.fsf@pele.r.caley.org.uk>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Apr 2003 13:39:03 -0000

Richard Caley wrote:

> Additionally there could be a really_really_fysnc call ...

There is.  It is used in hundreds of programs.  It is called fsync (2).

From owner-freebsd-fs@FreeBSD.ORG  Wed Apr 16 09:25:05 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 43C3037B401; Wed, 16 Apr 2003 09:25:05 -0700 (PDT)
Received: from testmail.wolves.k12.mo.us (testmail.wolves.k12.mo.us
	[207.160.214.10])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 068C443FD7; Wed, 16 Apr 2003 09:25:04 -0700 (PDT)
	(envelope-from cdillon@wolves.k12.mo.us)
Received: by testmail.wolves.k12.mo.us (Postfix, from userid 1001)
	id DA7C0CD7C; Wed, 16 Apr 2003 11:25:02 -0500 (CDT)
Received: from localhost (localhost [127.0.0.1])
	by testmail.wolves.k12.mo.us (Postfix) with ESMTP
	id D8EACCD19; Wed, 16 Apr 2003 11:25:02 -0500 (CDT)
Date: Wed, 16 Apr 2003 11:25:02 -0500 (CDT)
From: Chris Dillon <cdillon@wolves.k12.mo.us>
To: Terry Lambert <tlambert2@mindspring.com>
In-Reply-To: <3E9D157E.96FD09AE@mindspring.com>
Message-ID: <20030416100921.U91118@duey.wolves.k12.mo.us>
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<20030414101935.GB18110@HAL9000.homeunix.com>
	<20030415160925.U86854@duey.wolves.k12.mo.us>
	<3E9D157E.96FD09AE@mindspring.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-fs@freebsd.org
cc: mckusick@McKusick.COM
cc: freebsd-stable@freebsd.org
cc: David Schultz <das@freebsd.org>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Apr 2003 16:25:05 -0000

On Wed, 16 Apr 2003, Terry Lambert wrote:

> Chris Dillon wrote:
> > On Tue, 15 Apr 2003, Marko Zec wrote:
> > > Huh... such a concept would still break fsync() semantics. Note
> > > that the original patch also ensures dirty buffers get flushed
> > > if / when the disk spins up, even before the delay timer gets
> > > expired.
> >
> > Sorry to butt in on this thread... :-)  It just occurred to me
> > that the ability to delay all writes given an arbitrary time
> > period would be good for more than just laptops.  It would be
> > great for non-volatile flash filesystems which have a limited
> > write life.
>
> The life expectancy of these devices is really, really
> underestimated.  In practice, I've seen two million write cycles
> from some of these in lab machines which get rewritten pretty often.

I realize they have what looks like a really big number of writes on a
human scale, but to a computer which does things methodically day in
and day out without stopping, those writes can add up relatively
quickly.  Even with a life of two million write cycles, the
"occasional" 30-second round of updates that happen to write the same
bits over and over will give your flash part a life of only 1.9 years
(2000000 writes * 30 seconds apart = 60000000 seconds to failure).
Also, I doubt you'll actually get 2 million writes out of the average
consumer flash part.  A little USB key drive I have here is only rated
at 1 million writes, so it would likely last less than a year under
the above conditions.
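
For anyone wanting to redo that arithmetic with other ratings, here is
a minimal sketch in C.  It assumes, as above, that every update
rewrites the same cells and that nothing spreads the wear around; the
function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds until a cell rated for `cycles' writes wears out when it is
 * rewritten once every `interval_s' seconds.
 */
uint64_t
flash_life_seconds(uint64_t cycles, uint64_t interval_s)
{
    assert(interval_s > 0);
    return cycles * interval_s;
}

/* Convert seconds to years, using 365-day years. */
double
seconds_to_years(uint64_t seconds)
{
    return (double)seconds / (365.0 * 24.0 * 60.0 * 60.0);
}
```

With 2000000 cycles and a 30-second interval this gives 60000000
seconds, about 1.9 years; with 1000000 cycles it comes to roughly 0.95
years.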

> You are actually better off with a "noatime" option, to avoid cron
> beating the same set of bits once a second, or even a read-only
> mount for most/all of your FS's to avoid having to worry about
> writes at all.

Yeah, I already do that in the stuff I've built, I'm just saying it
would be advantageous not to have to do that in certain cases.

> > If the "clean" bit is set on the FS during that long delay that
> > would be even slicker (does it do that already?), since if the
> > filesystem is consistent thanks to softupdates it shouldn't need
> > to be fsck'd at all on boot.
>
> That's called "soft read-only".  Kirk implemented that for the BSDI
> version, but not for FreeBSD or OpenBSD.  We discussed it when he
> was doing the FreeBSD work on contract for Whistle.  It's actually
> not that hard to do, I think, but it's probably evil to not update
> access times on an FS that's *technically* mounted read/write, if
> you're expecting those semantics.

I've seen some versions of Windows do the soft-read-only thing with
FAT filesystems.  I also recall surprising a FreeBSD box with a reset
button and seeing a few RW-mounted filesystems go by marked "clean"
during boot, but if we don't have soft-read-only I was probably just
imagining it, or something else was at play.

As for atimes, if you're expecting all writes to be delayed, and you
still want atimes to be updated, you'll surely take into account that
the atime updates will be delayed as well.  This is all purely
optional behaviour, remember, so you should understand which bits of
your foot you're likely to shoot off when you turn it on.  It's not
really foot-shooting in that case, either, as long as you're not
relying on your atimes for anything important.

> Practically, you can't really trust the BG fsck when it comes to
> real disks, because you can lose whole tracks, and if you ever do
> end up with an inconsistency, you are pretty screwed if it results
> in a panic.  For something that's solid state, that's less of a
> problem.  8-).

Yes, definitely.  Soft-read-only combined with regular foreground
fsck's would be the way to go with the current crop of drives.

-- 
 Chris Dillon - cdillon(at)wolves.k12.mo.us
 FreeBSD: The fastest and most stable server OS on the planet
 - Available for IA32 (Intel x86) and Alpha architectures
 - IA64, PowerPC, UltraSPARC, ARM, and S/390 under development
 - http://www.freebsd.org

No trees were harmed in the composition of this message, although some
electrons were mildly inconvenienced.

From owner-freebsd-fs@FreeBSD.ORG  Wed Apr 16 15:10:09 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0330937B404; Wed, 16 Apr 2003 15:10:09 -0700 (PDT)
Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11])
	by mx1.FreeBSD.org (Postfix) with SMTP
	id C2BE943F75; Wed, 16 Apr 2003 15:10:07 -0700 (PDT)
	(envelope-from iedowse@maths.tcd.ie)
Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP
          id <aa96829@salmon>; 16 Apr 2003 23:10:07 +0100 (BST)
To: Marko Zec <zec@tel.fer.hr>
In-Reply-To: Your message of "Tue, 15 Apr 2003 20:37:47 +0200."
             <3E9C517B.6039679A@tel.fer.hr> 
Date: Wed, 16 Apr 2003 23:10:06 +0100
From: Ian Dowse <iedowse@maths.tcd.ie>
Message-ID: <200304162310.aa96829@salmon.maths.tcd.ie>
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
cc: Kirk McKusick <mckusick@beastie.mckusick.com>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Apr 2003 22:10:09 -0000

In message <3E9C517B.6039679A@tel.fer.hr>, Marko Zec writes:
>Tempted by a lot of opposition to the concept of (optionally) ignoring
>fsync() calls when running on battery power, I wonder what effect the
>concept of unconditional delaying of _all_ disk updates by ATA-disk
>firmware will make on FS consistency in case of system crash or power
>failure? I do not want to imply such a concept is a priori bad, however
>I fail to realize its advantages over OS-controlled delaying of disk
>synching.

Note that the ATA "delayed write" mechanism only delays writes while
the disk is spun down; at other times there is no change in behaviour.
Since the disk only spins down after it has been idle for a time,
it is very unlikely that the disk is left in an inconsistent state
while it is stopped.

Just after the disk spins up there is a small window where the
cached writes get written out in a burst. Due to the amount of
cached data and the probable re-ordering of writes, the disk is
quite likely to be in an inconsistent state during this flurry of
writes, but the window is short so it is probably not a big issue
in practice.

The main advantage of using the ATA delayed write mechanism is that
the disk itself can take advantage of knowing whether or not it is
spinning, whereas the OS does not have that information. The downside
is that it is not guaranteed that fsync'd data gets written to disk
immediately, though in practice the disk tends to be spinning when
the fsync is performed due to the previous accesses. I've been using
ATA delayed writes on a few laptops for over a year and it has never
caused me any problems - it generally works just right in the sense
that the disk remains spun down when the machine is mostly idle,
and spins up when you save files from an editor etc.

Doing the write delaying in the OS is always going to be a tradeoff
between excessively delaying writes when the machine is busy and
maximising the time between spin-ups when idle, though obviously
there is more control possible over which writes get delayed and
which don't.

Ian

From owner-freebsd-fs@FreeBSD.ORG  Wed Apr 16 19:26:09 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 78E1137B401; Wed, 16 Apr 2003 19:26:09 -0700 (PDT)
Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net
	[207.217.120.188])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 5894943FCB; Wed, 16 Apr 2003 19:26:08 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0250.cvx40-bradley.dialup.earthlink.net ([216.244.42.250]
	helo=mindspring.com)
	by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 195z66-0005pC-00; Wed, 16 Apr 2003 19:26:00 -0700
Message-ID: <3E9E1063.C7D29C29@mindspring.com>
Date: Wed, 16 Apr 2003 19:24:35 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Chris Dillon <cdillon@wolves.k12.mo.us>
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<20030414101935.GB18110@HAL9000.homeunix.com>
	<20030415160925.U86854@duey.wolves.k12.mo.us>
	<20030416100921.U91118@duey.wolves.k12.mo.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a47ab6d6c875cbb8072b3f8575e5e62c02a8438e0f32a48e08350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@freebsd.org
cc: mckusick@McKusick.COM
cc: freebsd-stable@freebsd.org
cc: David Schultz <das@freebsd.org>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 02:26:09 -0000

Chris Dillon wrote:
> As for atimes, if you're expecting all writes to be delayed, and you
> still want atimes to be updated, you'll surely take into account that
> the atime updates will be delayed as well.  This is all purely
> optional behaviour, remember, so you should understand which bits of
> your foot you're likely to shoot off when you turn it on.  It's not
> really foot-shooting in that case, either, as long as you're not
> relying on your atimes for anything important.

POSIX sometimes says "SHALL be updated"; but mostly, it says
"SHALL be marked for update".  Probably you can delay those
indefinitely, as long as the timestamp is set at the time you
mark, so it matches what would have been there.  It's probably
OK to coalesce them to the most recent one, as well.
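
A rough sketch of that idea in C, assuming a made-up in-memory
structure (struct lazy_inode and both function names are hypothetical,
not taken from any kernel): the timestamp is captured when the access
is marked, later marks overwrite earlier ones, and the actual write
happens whenever the flush finally runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical in-memory inode fragment; not a real kernel structure. */
struct lazy_inode {
    time_t atime_on_disk;   /* value last written out */
    time_t atime_pending;   /* timestamp captured at mark time */
    bool   atime_marked;    /* marked for update, not yet written */
};

/* Record the access time now; a later mark replaces an earlier one. */
void
mark_atime(struct lazy_inode *ip, time_t now)
{
    assert(ip != NULL);
    ip->atime_pending = now;
    ip->atime_marked = true;
}

/* Write out the coalesced timestamp; returns true if a write was needed. */
bool
flush_atime(struct lazy_inode *ip)
{
    assert(ip != NULL);
    if (!ip->atime_marked)
        return false;
    ip->atime_on_disk = ip->atime_pending;
    ip->atime_marked = false;
    return true;
}
```

However long the flush is delayed, the value that reaches the disk is
the time of the most recent access, not the time of the flush.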

The atime is actually one of the things I had to "POSIX lawyer"
in a project back around 1994.  8-).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 02:49:02 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 849F437B401
	for <freebsd-fs@freebsd.org>; Thu, 17 Apr 2003 02:49:02 -0700 (PDT)
Received: from mailbox.univie.ac.at (mailbox.univie.ac.at [131.130.1.27])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7F3C543F3F
	for <freebsd-fs@freebsd.org>; Thu, 17 Apr 2003 02:49:01 -0700 (PDT)
	(envelope-from l.ertl@univie.ac.at)
Received: from pcle2.cc.univie.ac.at (pcle2.cc.univie.ac.at [131.130.2.177])
	by mailbox.univie.ac.at (8.12.2/8.12.2) with ESMTP id h3H9mnvN029940
	for <freebsd-fs@freebsd.org>; Thu, 17 Apr 2003 11:48:55 +0200
Date: Thu, 17 Apr 2003 11:48:49 +0200 (CEST)
From: Lukas Ertl <l.ertl@univie.ac.at>
To: freebsd-fs@freebsd.org
Message-ID: <20030417114652.A11713@pcle2.cc.univie.ac.at>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
X-DCC-ZID-Univie-mailbox-Metrics: mailbox 4251; Body=1 Fuz1=1 Fuz2=1
Subject: growing filesystems in 5-current
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 09:49:02 -0000

Hi!

(I've sent the following mail to -hackers, but haven't received a reply
yet, so I'm trying here - thanks.)

Since growfs currently is not able to grow filesystems on vinum volumes in
5-current, I started playing around with it and hacked up the following patch.
On first look it seems to work, but there is still a problem I can't
explain.

Consider a simple vinum volume with a concat plex, containing a 32 MB
subdisk. I newfs this volume like that:

---8<---
# newfs -O2 /dev/vinum/mytest
/dev/vinum/mytest: 32.0MB (65536 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 8.02MB, 513 blks, 1088 inodes.
super-block backups (for fsck -b #) at:
 160, 16576, 32992, 49408
---8<---

So far, so good. Then I attach another 32 MB subdisk to the plex and try
my hacked growfs on it and I get this:

---8<---
# growfs /dev/vinum/mytest
We strongly recommend you to make a backup before growing the Filesystem

 Did you backup your data (Yes/No) ? Yes
new file systemsize is: 32768 frags
Warning: 16160 sector(s) cannot be allocated.
growfs: 56.1MB (114912 sectors) block size 16384, fragment size 2048
        using 7 cylinder groups of 8.02MB, 513 blks, 1088 inodes.
super-block backups (for fsck -b #) at:
 65824, 82240, 98656
---8<---

Why do I lose so many sectors there? Can you help me find the bug?
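
One way to sanity-check the numbers in that output is the small C
sketch below.  It assumes 512-byte sectors (32.0MB is reported as
65536 sectors) and a grown volume of 131072 sectors (two 32 MB
subdisks, matching the 32768 frags of 2048 bytes reported).  The
function names are illustrative and do not come from growfs.c.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u    /* assumed sector size */

/* Sectors in one full cylinder group: blocks per group times sectors per block. */
uint32_t
cg_sectors(uint32_t blks_per_cg, uint32_t block_size)
{
    assert(block_size % SECTOR_SIZE == 0);
    return blks_per_cg * (block_size / SECTOR_SIZE);
}

/* Sectors left over after carving the device into full cylinder groups. */
uint32_t
leftover_sectors(uint32_t dev_sectors, uint32_t per_cg)
{
    assert(per_cg > 0);
    return dev_sectors % per_cg;
}
```

With 513 blocks of 16384 bytes a cylinder group is 16416 sectors, so
131072 sectors hold seven full groups (114912 sectors) with 16160
sectors left over.  The unallocated sectors are therefore exactly the
partial eighth cylinder group, which would be consistent with the
size test on the last group rejecting it rather than with the device
size being read wrongly.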

At first I suspected sblock.fs_fpg, since a debug printf after:

---8<---
    if (sblock.fs_size % sblock.fs_fpg != 0 &&
        sblock.fs_size % sblock.fs_fpg < cgdmin(&sblock, sblock.fs_ncg)) {
---8<---

said that sblock.fs_fpg is 0 - a debug printf before that if statement
told me a more likely number.

Apart from that: am I going the wrong way with this patch? Is there a
better way to fit growfs to the new vinum/geom stuff?

Here's the patch:

---8<---
Index: growfs.c
===================================================================
RCS file: /u/cvs/cvs/src/sbin/growfs/growfs.c,v
retrieving revision 1.13
diff -u -r1.13 growfs.c
--- growfs.c	30 Dec 2002 21:18:05 -0000	1.13
+++ growfs.c	16 Apr 2003 17:51:02 -0000
@@ -56,6 +56,7 @@
 #include <sys/disklabel.h>
 #include <sys/ioctl.h>
 #include <sys/stat.h>
+#include <sys/disk.h>

 #include <stdio.h>
 #include <paths.h>
@@ -111,6 +112,8 @@
 static char		inobuf[MAXBSIZE];	/* inode block */
 static int		maxino;			/* last valid inode */

+static int  unlabeled;
+
 /*
  * An  array of elements of type struct gfs_bpp describes all blocks  to
  * be relocated in order to free the space needed for the cylinder group
@@ -148,6 +151,7 @@
 static void	updrefs(int, ino_t, struct gfs_bpp *, int, int, unsigned int);
 static void	indirchk(ufs_lbn_t, ufs_lbn_t, ufs2_daddr_t, ufs_lbn_t,
 		    struct gfs_bpp *, int, int, unsigned int);
+static void get_dev_size(int, int *);

 /* ************************************************************ growfs ***** */
 /*
@@ -1884,6 +1888,21 @@
 	return columns;
 }

+static void
+get_dev_size(int fd, int *size)
+{
+	int sectorsize;
+	off_t mediasize;
+
+	ioctl(fd, DIOCGSECTORSIZE, &sectorsize);
+	ioctl(fd, DIOCGMEDIASIZE, &mediasize);
+
+	if (sectorsize <= 0)
+		errx(1, "bogus sectorsize: %d", sectorsize);
+
+	*size = mediasize / sectorsize;
+}
+
 /* ************************************************************** main ***** */
 /*
  * growfs(8)  is a utility which allows to increase the size of  an  existing
@@ -1921,6 +1940,7 @@
 	struct disklabel	*lp;
 	struct partition	*pp;
 	int	i,fsi,fso;
+	u_int32_t p_size;
 	char	reply[5];
 #ifdef FSMAXSNAP
 	int	j;
@@ -2020,25 +2040,30 @@
 	 */
 	cp=device+strlen(device)-1;
 	lp = get_disklabel(fsi);
-	if(lp->d_type == DTYPE_VINUM) {
-		pp = &lp->d_partitions[0];
-	} else if (isdigit(*cp)) {
-		pp = &lp->d_partitions[2];
-	} else if (*cp>='a' && *cp<='h') {
-		pp = &lp->d_partitions[*cp - 'a'];
+	if (lp != NULL) {
+		if (isdigit(*cp)) {
+			pp = &lp->d_partitions[2];
+		} else if (*cp>='a' && *cp<='h') {
+			pp = &lp->d_partitions[*cp - 'a'];
+		} else {
+			errx(1, "unknown device");
+		}
+		p_size = pp->p_size;
 	} else {
-		errx(1, "unknown device");
+		get_dev_size(fsi, &p_size);
 	}

 	/*
 	 * Check if that partition looks suited for growing a file system.
 	 */
-	if (pp->p_size < 1) {
+	if (p_size < 1) {
 		errx(1, "partition is unavailable");
 	}
+/*
 	if (pp->p_fstype != FS_BSDFFS) {
 		errx(1, "partition not 4.2BSD");
 	}
+*/

 	/*
 	 * Read the current superblock, and take a backup.
@@ -2067,11 +2092,11 @@
 	 * Determine size to grow to. Default to the full size specified in
 	 * the disk label.
 	 */
-	sblock.fs_size = dbtofsb(&osblock, pp->p_size);
+	sblock.fs_size = dbtofsb(&osblock, p_size);
 	if (size != 0) {
-		if (size > pp->p_size){
+		if (size > p_size){
 			errx(1, "There is not enough space (%d < %d)",
-			    pp->p_size, size);
+			    p_size, size);
 		}
 		sblock.fs_size = dbtofsb(&osblock, size);
 	}
@@ -2121,7 +2146,7 @@
 	 * later on realize we have to abort our operation, on that block
 	 * there should be no data, so we can't destroy something yet.
 	 */
-	wtfs((ufs2_daddr_t)pp->p_size-1, (size_t)DEV_BSIZE, (void *)&sblock,
+	wtfs((ufs2_daddr_t)p_size-1, (size_t)DEV_BSIZE, (void *)&sblock,
 	    fso, Nflag);

 	/*
@@ -2182,12 +2207,14 @@
 	/*
 	 * Update the disk label.
 	 */
-	pp->p_fsize = sblock.fs_fsize;
-	pp->p_frag = sblock.fs_frag;
-	pp->p_cpg = sblock.fs_fpg;
-
-	return_disklabel(fso, lp, Nflag);
-	DBG_PRINT0("label rewritten\n");
+	if (!unlabeled) {
+		pp->p_fsize = sblock.fs_fsize;
+		pp->p_frag = sblock.fs_frag;
+		pp->p_cpg = sblock.fs_fpg;
+
+		return_disklabel(fso, lp, Nflag);
+		DBG_PRINT0("label rewritten\n");
+	}

 	close(fsi);
 	if(fso>-1) close(fso);
@@ -2254,12 +2281,13 @@
 	if (!lab) {
 		errx(1, "malloc failed");
 	}
-	if (ioctl(fd, DIOCGDINFO, (char *)lab) < 0) {
-		errx(1, "DIOCGDINFO failed");
+	if (!ioctl(fd, DIOCGDINFO, (char *)lab)) {
+		return (lab);
 	}
+	unlabeled++;

 	DBG_LEAVE;
-	return (lab);
+	return (NULL);
 }
---8<---
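
One thing the patch leaves open is error handling around the size query:
the ioctl() return values are not checked, and get_dev_size() takes an
int * while main() passes the address of a u_int32_t. The following is
only a sketch of the conversion step with those checks made explicit. The
helper name media_to_sectors() and its 0/-1 return convention are invented
for illustration and are not part of growfs; in growfs the two inputs would
be the values obtained from DIOCGMEDIASIZE and DIOCGSECTORSIZE, after
checking that both ioctl() calls succeeded.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of growfs): convert a media size in bytes
 * and a sector size in bytes into a sector count that fits a 32-bit
 * p_size-style field.  Returns 0 on success, -1 on bad input or overflow.
 */
static int
media_to_sectors(int64_t mediasize, int sectorsize, uint32_t *sectors)
{
	int64_t n;

	if (sectorsize <= 0 || mediasize < 0)
		return (-1);
	n = mediasize / sectorsize;
	if (n > (int64_t)UINT32_MAX)
		return (-1);
	*sectors = (uint32_t)n;
	return (0);
}
```

For a 64 MB volume with 512-byte sectors this yields 131072 sectors; a zero
sector size or a count that does not fit in 32 bits is rejected rather than
silently truncated.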

best regards,
le

-- 
Lukas Ertl                             eMail: l.ertl@univie.ac.at
UNIX-Systemadministrator               Tel.:  (+43 1) 4277-14073
Zentraler Informatikdienst (ZID)       Fax.:  (+43 1) 4277-9140
der Universität Wien                   http://mailbox.univie.ac.at/~le/

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 03:27:07 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id E981637B404; Thu, 17 Apr 2003 03:27:07 -0700 (PDT)
Received: from franky.speednet.com.au (franky.speednet.com.au [203.57.65.5])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id BC54243FCB; Thu, 17 Apr 2003 03:27:06 -0700 (PDT)
	(envelope-from andyf@speednet.com.au)
Received: from hewey.af.speednet.com.au (hewey.af.speednet.com.au
	[203.38.96.242])h3HAR2l1080319;	Thu, 17 Apr 2003 20:27:02 +1000 (EST)
	(envelope-from andyf@speednet.com.au)
Received: from hewey.af.speednet.com.au (hewey.af.speednet.com.au
	[203.38.96.242])h3HAR1g9002252;	Thu, 17 Apr 2003 20:27:01 +1000 (EST)
	(envelope-from andyf@speednet.com.au)
Date: Thu, 17 Apr 2003 20:27:00 +1000 (EST)
From: Andy Farkas <andyf@speednet.com.au>
X-X-Sender: andyf@hewey.af.speednet.com.au
To: "Paul M. Lambert" <plambert@plambert.net>
In-Reply-To: <20030417075306.GZ71088@slappy.plambert.net>
Message-ID: <20030417194056.B795-100000@hewey.af.speednet.com.au>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-fs@freebsd.org
cc: freebsd-questions@freebsd.org
Subject: Re: chflags "archived" flag?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 10:27:08 -0000


[cc'd to -fs because you might have more clue..]

>
> chflags(1) and chflags(2) and chflags(3) all mention SF_ARCHIVED as a flag
> that the superuser can set on a file or directory.
>
> My question is simple: what's this flag do?  Does it have any effect?
>

Short answer: nothing. no.

It's only there to support msdos(5) type file systems.

--

 :{ andyf@speednet.com.au

        Andy Farkas
    System Administrator
   Speednet Communications
 http://www.speednet.com.au/


From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 04:45:38 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0CB9637B401; Thu, 17 Apr 2003 04:45:38 -0700 (PDT)
Received: from premijer.tel.fer.hr (premijer.tel.fer.hr [161.53.19.221])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 0481F43FAF; Thu, 17 Apr 2003 04:45:37 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from tel.fer.hr (unknown [161.53.19.14])
	by premijer.tel.fer.hr (Postfix) with ESMTP
	id 120DA1380; Thu, 17 Apr 2003 13:45:17 +0200 (MET DST)
Message-ID: <3E9E93D8.EB16ED42@tel.fer.hr>
Date: Thu, 17 Apr 2003 13:45:28 +0200
From: Marko Zec <zec@tel.fer.hr>
X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: David Schultz <das@FreeBSD.org>
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<20030414101935.GB18110@HAL9000.homeunix.com> <3E9C5975.43755858@tel.fer.hr>
	<20030416101136.GA868@HAL9000.homeunix.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@FreeBSD.org
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 11:45:38 -0000

David Schultz wrote:

> On Tue, Apr 15, 2003, Marko Zec wrote:
> >
> > > - The fiddling with rushjob seems rather arbitrary.  You can probably
> > >   just let the existing code increment it as necessary and force a sync
> > >   if the value gets too high.
> >
> > If rushjob were not used for forcing prompt synching, the original code
> > could not guarantee the sync to occur immediately. Instead, the synching could
> > be further delayed for up to 30 seconds, which is not desirable if our major
> > design goal is to do as much disk I/O as possible in a small time interval and
> > leave the disk idle otherwise.
>
> I was referring to all the places where rushjob is set to or
> incremented by syncer_maxdelay.  AFAIK, it should never be that
> large.

Hmm... Why? :)

> I don't think you want to overload a low memory handling
> mechanism with the task of syncing the disk.

As far as I can see, the rushjob variable is used in only one place in
kern/vfs_subr.c, to tell the softupdates syncing scheduler to start syncing
earlier than the normal timers would expire. I just reused the same mechanism
to urge syncing of dirty buffers when the extra delay timer expires, or when
outstanding disk I/O occurs, so that disk updates are coalesced with
occasional disk spinups.

Marko


From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 05:03:56 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id A36D537B401; Thu, 17 Apr 2003 05:03:56 -0700 (PDT)
Received: from premijer.tel.fer.hr (premijer.tel.fer.hr [161.53.19.221])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 8DCF843FBD; Thu, 17 Apr 2003 05:03:55 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from tel.fer.hr (unknown [161.53.19.14])
	by premijer.tel.fer.hr (Postfix) with ESMTP
	id 0B89C1380; Thu, 17 Apr 2003 14:03:37 +0200 (MET DST)
Message-ID: <3E9E9827.4BB19197@tel.fer.hr>
Date: Thu, 17 Apr 2003 14:03:51 +0200
From: Marko Zec <zec@tel.fer.hr>
X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Ian Dowse <iedowse@maths.tcd.ie>
References: <200304162310.aa96829@salmon.maths.tcd.ie>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
cc: Kirk McKusick <mckusick@beastie.mckusick.com>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 12:03:57 -0000

Ian Dowse wrote:

> In message <3E9C517B.6039679A@tel.fer.hr>, Marko Zec writes:
> >Tempted by a lot of opposition to the concept of (optionally) ignoring
> >fsync() calls when running on battery power, I wonder what effect the
> >concept of unconditional delaying of _all_ disk updates by ATA-disk
> >firmware will make on FS consistency in case of system crash or power
> >failure? I do not want to imply such a concept is a priori bad, however
> >I fail to realize its advantages over OS-controlled delaying of disk
> >synching.
>
> Note that the ATA "delayed write" mechanism only delays writes while
> the disk is spun down; at other times there is no change in behaviour.
> Since the disk only spins down after it has been idle for a time,
> it is very unlikely that the disk is left in an inconsistent state
> while it is stopped.
>
> Just after the disk spins up there is a small window where the
> cached writes get written out in a burst. Due to the amount of
> cached data and the probable re-ordering of writes, the disk is
> quite likely to be in an inconsistent state during this flurry of
> writes, but the window is short so it is probably not a big issue
> in practice.
>
> The main advantage of using the ATA delayed write mechanism is that
> the disk itself can take advantage of knowing whether or not it is
> spinning, whereas the OS does not have that information.

The OS _does_ know (approximately) when the disk is spinning and when not.
For example, if the disk is configured to stop spinning immediately after
the last I/O operation, the OS can safely assume that 10 or more seconds
afterwards the spinning will have stopped. The OS only has to keep a record
(in the form of a timestamp or something similar) of when it issued the last
I/O request to the disk. In my patch this is accomplished using the
stratcalls marker, which is incremented every time the strategy routine of
the ATA disk driver is invoked. Therefore the OS can also successfully
coalesce the pending disk updates with other outstanding disk I/O
operations, which are typically reads of uncached sectors or VM swapping.
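
The bookkeeping described above can be sketched roughly as follows. This is
not the code from the patch: apart from the stratcalls counter, every name
here (disk_activity, note_strategy_call, probably_spun_down, should_flush)
and the use of plain seconds for time are assumptions made for illustration.

```c
#include <stdbool.h>

/* Illustrative record of disk activity; layout is hypothetical. */
struct disk_activity {
	unsigned long	stratcalls;	/* strategy-routine invocations so far */
	long		last_io;	/* time of the most recent one, seconds */
};

/* Called whenever the disk driver's strategy routine is entered. */
static void
note_strategy_call(struct disk_activity *da, long now)
{
	da->stratcalls++;
	da->last_io = now;
}

/* Guess, from elapsed idle time alone, whether the platters have stopped. */
static bool
probably_spun_down(const struct disk_activity *da, long now,
    long spindown_delay)
{
	return (now - da->last_io >= spindown_delay);
}

/*
 * Decide whether delayed updates should be pushed out now: either the disk
 * was touched since the last check (so it is spinning anyway), or the
 * oldest delayed update has waited at least max_delay seconds.
 */
static bool
should_flush(const struct disk_activity *da, unsigned long seen_stratcalls,
    long oldest_dirty, long now, long max_delay)
{
	if (da->stratcalls != seen_stratcalls)
		return (true);
	return (now - oldest_dirty >= max_delay);
}
```

The caller remembers the stratcalls value it saw at its previous check; any
change means some other I/O has already spun the disk up, so flushing then
costs no extra spinup.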

> The downside
> is that it is not guaranteed that fsync'd data gets written to disk
> immediately, though in practice the disk tends to be spinning when
> the fsync is performed due to the previous accesses. I've been using
> ATA delayed writes on a few laptops for over a year and it has never
> caused me any problems - it generally works just right in the sense
> that the disk remains spun down when the machine is mostly idle,
> and spins up when you save files from an editor etc.

I agree that ATA delayed writes are a great feature that can help save
battery power. I just want to point out that it can suffer from the same
consistency problems as the model of OS-controlled delayed syncing combined
with null fsync() processing. However, if the OS controls the delaying of
updates, you can turn normal fsync() semantics on or off as desired. With
delayed writes in ATA firmware, you simply do not have the choice :)

Cheers,

Marko

> Doing the write delaying in the OS is always going to be a tradeoff
> between excessively delaying writes when the machine is busy and
> maximising the time between spin-ups when idle, though obviously
> there is more control possible over which writes get delayed and
> which don't.
>
> Ian


From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 09:32:22 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 03BE437B408
	for <freebsd-fs@freebsd.org>; Thu, 17 Apr 2003 09:32:22 -0700 (PDT)
Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net
	[207.217.120.139])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7888843FBD
	for <freebsd-fs@freebsd.org>; Thu, 17 Apr 2003 09:32:21 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0064.cvx21-bradley.dialup.earthlink.net ([209.179.192.64]
	helo=mindspring.com)
	by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196CIp-0002Yk-00; Thu, 17 Apr 2003 09:32:00 -0700
Message-ID: <3E9ED6B3.CF700528@mindspring.com>
Date: Thu, 17 Apr 2003 09:30:43 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Lukas Ertl <l.ertl@univie.ac.at>
References: <20030417114652.A11713@pcle2.cc.univie.ac.at>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a41429122b84c67fbd04aaa6225954f26a350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@freebsd.org
Subject: Re: growing filesystems in 5-current
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 16:32:22 -0000

Lukas Ertl wrote:
> Since growfs currently is not able to grow filesystems on vinum volumes in
> 5-current, I started playing around with it and hacked up the following patch.
> On first look it seems to work, but there is still a problem I can't
> explain.

[ ... ]

> So far, so good. Then I attach another 32 MB subdisk to the plex and try
> my hacked growfs on it and I get this:
[ ... ]
> new file systemsize is: 32768 frags
> Warning: 16160 sector(s) cannot be allocated.
> growfs: 56.1MB (114912 sectors) block size 16384, fragment size 2048
>         using 7 cylinder groups of 8.02MB, 513 blks, 1088 inodes.
> super-block backups (for fsck -b #) at:
>  65824, 82240, 98656
> ---8<---
> 
> Why do I lose so many sectors there? Can you help me find the bug?

The simple answer is that you must be getting the size of the
underlying plex wrong, if you are really losing anything.

In reality, I think it's because you are expecting the stats
to apply to the whole range, and what's happening is that it's
only initializing the cylinder groups for the new part you
added.  The progression is:

	65824, 82240, 98656

If we project this backwards, we see:

	82240 - 65824 = 16416
	98656 - 82240 = 16416

So:

	65824 - 16416 = 49408
	49408 - 16416 = 32992
	32992 - 16416 = 16576

With a remainder of 160 (16576 - 16416) for FS control structures or whatever.

So the previous progression was:

	160 (start)
	16576, 32992, 49408

...and also consists of 3 elements, following "start", so it
seems you aren't losing anything, at least to me.

Probably your patch is fine.
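
The backwards projection above can be checked mechanically. The following is
a small sketch, with project_backups() being a made-up helper name rather
than anything in growfs; it assumes the backups are evenly spaced and simply
steps back from the first reported offset until it would go below zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk backwards from the first reported backup offset in steps of the
 * spacing between reported backups, storing each earlier offset in out[]
 * (nearest first).  Returns the number of offsets stored.
 */
static size_t
project_backups(const long *reported, size_t n, long *out, size_t max_out)
{
	long stride, pos;
	size_t count = 0;

	assert(n >= 2);
	stride = reported[1] - reported[0];
	assert(stride > 0);
	for (pos = reported[0] - stride; pos > 0 && count < max_out;
	    pos -= stride)
		out[count++] = pos;
	return (count);
}
```

Fed the three offsets growfs printed (65824, 82240, 98656), it yields 49408,
32992, 16576 and 160, i.e. a spacing of 16416 sectors and four earlier
offsets if the initial 160 is counted as a backup location too.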

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 09:41:56 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id A47B037B401; Thu, 17 Apr 2003 09:41:56 -0700 (PDT)
Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net
	[207.217.120.139])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id D25C943FAF; Thu, 17 Apr 2003 09:41:55 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0064.cvx21-bradley.dialup.earthlink.net ([209.179.192.64]
	helo=mindspring.com)
	by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196CSM-0003jx-00; Thu, 17 Apr 2003 09:41:51 -0700
Message-ID: <3E9ED902.8BF30AA7@mindspring.com>
Date: Thu, 17 Apr 2003 09:40:34 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<3E9C5975.43755858@tel.fer.hr><3E9E93D8.EB16ED42@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a41429122b84c67fbd29c0a98c3d2fd16b3ca473d225a0f487350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 16:41:57 -0000

Marko Zec wrote:
> David Schultz wrote:
> > I was referring to all the places where rushjob is set to or
> > incremented by syncer_maxdelay.  AFAIK, it should never be that
> > large.
> 
> Hmm... Why? :)

Increased latency; larger pool retention time, larger pool size,
more kernel memory tied up in dependency lists for longer, more
operations blocked because a dependency is already on the write
list, and so locked against modification.


> > I don't think you want to overload a low memory handling
> > mechanism with the task of syncing the disk.
> 
> As far as I can see the rushjob variable is used only at one place in
> kern/vfs_subr.c to notify softupdates synching scheduler to start
> synching earlier than the normal timers would expire. I just reused
> the same mechanism to urge synching of dirty buffers when the extra
> delay timer expires, or when outstanding disk I/O occurs, to coalesce
> disk updates with occasional disk spinups.

...and not syncing in the normal place.

I'm wondering if this really helps some real world situation;
my gut feeling is that it doesn't, and it increases memory use
considerably, until it's flushed.

What I'd like to see is a statistics counter of "delayed syncs"
that occur as a result of doing this, gathered over a period of
time, along with another statistics counter of "drive spindowns".

I know that this will probably end up being observer influenced
enough to be merely anecdotal, but say gather two sets over an
extended period of use without powering the machine down; the
first set without the change, and the next set with the change.

Either way it turns out, it would make a stronger case for or
against than just hand-waving.  8-).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 09:55:42 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0E18537B401; Thu, 17 Apr 2003 09:55:42 -0700 (PDT)
Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net
	[207.217.120.139])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 3FA3943FE1; Thu, 17 Apr 2003 09:55:41 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0064.cvx21-bradley.dialup.earthlink.net ([209.179.192.64]
	helo=mindspring.com)
	by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196Cff-0005dn-00; Thu, 17 Apr 2003 09:55:36 -0700
Message-ID: <3E9EDC38.1CE381C6@mindspring.com>
Date: Thu, 17 Apr 2003 09:54:16 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <200304162310.aa96829@salmon.maths.tcd.ie>
	<3E9E9827.4BB19197@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4c5c06b35ece679a4cfdfcaf6b4f66f3993caf27dac41a8fd350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@freebsd.org
cc: Ian Dowse <iedowse@maths.tcd.ie>
cc: freebsd-stable@freebsd.org
cc: Kirk McKusick <mckusick@beastie.mckusick.com>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 16:55:42 -0000

Marko Zec wrote:
> Ian Dowse wrote:
> > Note that the ATA "delayed write" mechanism only delays writes while
> > the disk is spun down; at other times there is no change in behaviour.
> > Since the disk only spins down after it has been idle for a time,
> > it is very unlikely that the disk is left in an inconsistent state
> > while it is stopped.

I'm wondering if the ATA "delayed write" actually does this, or if
it merely relaxes the cache restrictions, without retaining the
ordering enforcement.

I suspect that it does not retain the ordering enforcement, as
there is no way to disconnect on a tagged queue write, because
you must issue a request for status, and it can't be done as a
separate ATA operation (see the posts by the Maxtor employee, on
and around January 20th of this year to the -FS list for details).

You are much better off accumulating requests in the kernel in
buffers, and then using the normal write mechanism to push them
out to the drive ordered (IMO).  This implies a barrier and new
code above the bwrite interface, to keep the buffers from getting
locked, and stalling your applications in user space.

A problem I see here is that swap is on a totally different path,
and in a different area of the disk (practically guaranteeing a
seek, and a track buffer invalidation on the disk), even if you
could cause swapping to be delayed (I don't think you can; FreeBSD
aggressively uses memory, and so when you need to swap, you *need*
to swap).


> The OS _does_ know (approximately) when the disk is spinning and when not.
> For example, if the disk is configured to stop spinning immediately after
> the last I/O operation, the OS can safely assume 10 or more seconds
> afterwards the spinning will be stopped. The OS only has to keep record (in
> form of timestamp or something similar) when it has issued the last I/O
> request to the disk. In my patch this is accomplished using the stratcalls
> marker, which is increased every time the strategy routine of the ATA disk
> driver is invoked. Therefore the OS can also successfully coalesce the
> pending disk updates with other outstanding I/O disk operations, which are
> typically reads of uncached sectors or VM swapping.

This is useful, but not enough.  You need to actually communicate
the information above the block I/O layer, to the soft updates.  I
think, effectively, what you actually want to do is to stop the
soft updates clock, rather than trying to play stupid disk tricks
with timers, etc., above and beyond what you have to do.  I can see
it being useful on SCSI disks, as well, particularly where there are
temperature issues.  Though in that case, you probably are more
memory starved than anything, and it will end up doing you no good.

> I agree the ATA delayed writes is a great functionality that can help save
> battery power.

I don't; only if the write order is maintained is it "great".

> I just want to point out that it can suffer from the same
> consistency problems as the model of OS controlled delayed synching combined
> with null fsync() processing. However, if the OS controls the delaying of
> updates, you can turn on or off normal fsync() semantics as desired. With
> delaying writes in ATA firmware, you simply do not have the choice :)

I think people are confusing fsync() with syncd at this point.  8-(.

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 09:57:16 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2E84C37B401
	for <freebsd-fs@freebsd.org>; Thu, 17 Apr 2003 09:57:16 -0700 (PDT)
Received: from mailbox.univie.ac.at (mailbox.univie.ac.at [131.130.1.27])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C5CCA43FD7
	for <freebsd-fs@freebsd.org>; Thu, 17 Apr 2003 09:57:14 -0700 (PDT)
	(envelope-from l.ertl@univie.ac.at)
Received: from localhost.localdomain (adslle.cc.univie.ac.at [131.130.102.11])
	by mailbox.univie.ac.at (8.12.2/8.12.2) with ESMTP id h3HGuxil214586;
	Thu, 17 Apr 2003 18:57:06 +0200
Date: Thu, 17 Apr 2003 18:56:59 +0200 (CEST)
From: Lukas Ertl <l.ertl@univie.ac.at>
To: Terry Lambert <tlambert2@mindspring.com>
In-Reply-To: <3E9ED6B3.CF700528@mindspring.com>
Message-ID: <20030417184604.V719@leelou.in.tern>
References: <20030417114652.A11713@pcle2.cc.univie.ac.at>
	<3E9ED6B3.CF700528@mindspring.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
X-DCC-ZID-Univie-Metrics: mx1 4261; Body=2 Fuz1=2 Fuz2=2
cc: freebsd-fs@freebsd.org
Subject: Re: growing filesystems in 5-current
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 16:57:16 -0000

On Thu, 17 Apr 2003, Terry Lambert wrote:

> ...and also consists of 3 elements, following "start", so it
> seems you aren't losing anything, at least to me.
>
> Probably your patch is fine.

Thanks for your answer, Terry, seems reasonable.

There's still one thing I have noticed now that bothers me, and
I can't explain this one either.

Consider again this 32 MB vinum volume. If I newfs it with the default
size of 65536 sectors I get this:

---8<---
# newfs -O2 -s 65536 /dev/vinum/mytest
/dev/vinum/mytest: 32.0MB (65536 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 8.02MB, 513 blks, 1088 inodes.
super-block backups (for fsck -b #) at:
 160, 16576, 32992, 49408

# df -k /dev/vinum/mytest
Filesystem          1K-blocks     Used    Avail Capacity  Mounted on
/dev/vinum/mytest       31470        2    28952    0.%
---8<---

Four cg's with 8.02MB each? 513 blocks? Why's that? Shouldn't that be 8MB
each and 512 blocks?

If I growfs this one I get the behaviour I described in my first mail.

Now look at this:

---8<---
# newfs -O2 -s 65535 /dev/vinum/mytest
/dev/vinum/mytest: 32.0MB (65532 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 8.00MB, 512 blks, 1024 inodes.
super-block backups (for fsck -b #) at:
 160, 16544, 32928, 49312

# df -k /dev/vinum/mytest
Filesystem          1K-blocks     Used    Avail Capacity  Mounted on
/dev/vinum/mytest       31532        2    29008    0.%
---8<---

So I explicitly make the FS one sector smaller than the default value, and
I get not only 4 cg's with 8 MB and 512 blocks (which would seem correct
to me), but I also get more space available on the FS.

And if I growfs this one, everything works as expected:

---8<---
# growfs /dev/vinum/mytest
We strongly recommend you to make a backup before growing the Filesystem

 Did you backup your data (Yes/No) ? Yes
new file systemsize is: 32768 frags
growfs: 64.0MB (131072 sectors) block size 16384, fragment size 2048
        using 8 cylinder groups of 8.00MB, 512 blks, 1024 inodes.
super-block backups (for fsck -b #) at:
 65696, 82080, 98464, 114848
---8<---

What the heck is going on here? newfs bug? Or did I get something wrong?
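
The figures from both runs are at least consistent with simple
cylinder-group arithmetic. The sketch below assumes 512-byte sectors, that
one cylinder group spans "blks" blocks of the reported block size, and that
the grown volume is 131072 sectors; compute_cg_layout() is a hypothetical
helper, not newfs or growfs code, and it does not explain why newfs picks
513 blocks per group in the first place.

```c
#include <assert.h>

struct cg_layout {
	long	sectors_per_cg;	/* sectors covered by one full cylinder group */
	long	full_groups;	/* whole cylinder groups that fit */
	long	leftover;	/* sectors past the last full group */
};

/* Split a volume of total_sectors into whole cylinder groups plus a tail. */
static struct cg_layout
compute_cg_layout(long total_sectors, long blks_per_cg, long block_size,
    long sector_size)
{
	struct cg_layout l;

	assert(sector_size > 0 && blks_per_cg > 0);
	assert(block_size % sector_size == 0);
	l.sectors_per_cg = blks_per_cg * (block_size / sector_size);
	l.full_groups = total_sectors / l.sectors_per_cg;
	l.leftover = total_sectors % l.sectors_per_cg;
	return (l);
}
```

With 513 blocks per group, a group is 16416 sectors, seven groups cover
114912 sectors and 16160 sectors are left over, which matches both the size
and the warning growfs printed in the first run. With 512 blocks per group,
a group is 16384 sectors and eight groups fill 131072 sectors exactly.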

best regards,
le

-- 
Lukas Ertl                             eMail: l.ertl@univie.ac.at
UNIX-Systemadministrator               Tel.:  (+43 1) 4277-14073
Zentraler Informatikdienst (ZID)       Fax.:  (+43 1) 4277-9140
der Universität Wien                   http://mailbox.univie.ac.at/~le/

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 10:42:15 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5801E37B401
	for <freebsd-fs@freebsd.org>; Thu, 17 Apr 2003 10:42:15 -0700 (PDT)
Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net
	[207.217.120.188])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 941B543FA3
	for <freebsd-fs@freebsd.org>; Thu, 17 Apr 2003 10:42:10 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0064.cvx21-bradley.dialup.earthlink.net ([209.179.192.64]
	helo=mindspring.com)
	by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196DOA-0004D8-00; Thu, 17 Apr 2003 10:41:35 -0700
Message-ID: <3E9EE6F9.6672A808@mindspring.com>
Date: Thu, 17 Apr 2003 10:40:09 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Lukas Ertl <l.ertl@univie.ac.at>
References: <20030417114652.A11713@pcle2.cc.univie.ac.at>
	<3E9ED6B3.CF700528@mindspring.com> <20030417184604.V719@leelou.in.tern>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4d08911b75f359757177a757c45a4e0d1a8438e0f32a48e08350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@freebsd.org
Subject: Re: growing filesystems in 5-current
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 17:42:15 -0000

Lukas Ertl wrote:
> Consider again this 32 MB vinum volume. If I newfs it with the default
> size of 65536 sectors I get this:
> 
> ---8<---
> # newfs -O2 -s 65536 /dev/vinum/mytest
> /dev/vinum/mytest: 32.0MB (65536 sectors) block size 16384, fragment size
> 2048
>         using 4 cylinder groups of 8.02MB, 513 blks, 1088 inodes.
> super-block backups (for fsck -b #) at:
>  160, 16576, 32992, 49408
> 
> # df -k /dev/vinum/mytest
> Filesystem          1K-blocks     Used    Avail Capacity  Mounted on
> /dev/vinum/mytest       31470        2    28952    0.%
> ---8<---
> 
> Four cg's with 8.02MB each? 513 blocks? Why's that? Shouldn't that be 8MB
> each and 512 blocks?

The short answer for the first question is that the MB calculation
is not what you think.

The short answer for the second question is "because of the frag size".

So the answer to the last question is "no".
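
As a rough cross-check of the reported numbers (a sketch only, assuming
512-byte sectors and 1 MB = 1024 * 1024 bytes; the function names are made
up for illustration), the "8.02MB" figure and the spacing of the
super-block backups both follow from 513 blocks of 16384 bytes per
cylinder group.  This does not explain why newfs picked 513 blocks, only
that the output is self-consistent:

```python
BLOCK_SIZE = 16384   # bytes, taken from the newfs output
SECTOR_SIZE = 512    # bytes, assumed sector size


def cg_size_mb(blocks_per_cg):
    """Size of one cylinder group in MB (1 MB = 1024 * 1024 bytes)."""
    return blocks_per_cg * BLOCK_SIZE / (1024 * 1024)


def cg_size_sectors(blocks_per_cg):
    """Size of one cylinder group in sectors."""
    return blocks_per_cg * BLOCK_SIZE // SECTOR_SIZE


# Blocks per cylinder group and super-block backup locations, copied
# from the two newfs runs quoted in this thread.
runs = (
    (513, (160, 16576, 32992, 49408)),
    (512, (160, 16544, 32928, 49312)),
)

for blks, backups in runs:
    spacing = backups[1] - backups[0]
    print(f"{blks} blks: {cg_size_mb(blks):.2f}MB per cg, "
          f"{cg_size_sectors(blks)} sectors per cg, "
          f"backup spacing {spacing} sectors")
```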

As to the available capacity, you can only use even numbers of
cylinder groups, because there's a bitmap.


> If I growfs this one I get the behaviour I described in my first mail.
> 
> Now look at this:
> 
> ---8<---
> # newfs -O2 -s 65535 /dev/vinum/mytest
> /dev/vinum/mytest: 32.0MB (65532 sectors) block size 16384, fragment size
> 2048

The most important thing to note here is that, before, you told
it 65536, and it gave you 65536.  Here you are asking for 65535,
and getting 65532.  That's 3 fewer sectors, to get to a 4-sector
boundary, so that you have an even multiple of the frag size of
2048 (512b * 4 = 2048).
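
The rounding described above can be sketched as follows.  This is only an
illustration of the arithmetic, assuming 512-byte sectors; the function
name is hypothetical and this is not the code newfs itself uses:

```python
SECTOR_SIZE = 512   # bytes, assumed sector size
FRAG_SIZE = 2048    # bytes, from the newfs output


def round_down_to_frag(sectors):
    """Round a sector count down to a whole number of fragments."""
    sectors_per_frag = FRAG_SIZE // SECTOR_SIZE   # 4 sectors per frag
    return sectors - sectors % sectors_per_frag


print(round_down_to_frag(65536))   # 65536, already on a frag boundary
print(round_down_to_frag(65535))   # 65532, three sectors dropped
```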


>         using 4 cylinder groups of 8.00MB, 512 blks, 1024 inodes.
> super-block backups (for fsck -b #) at:
>  160, 16544, 32928, 49312
> 
> # df -k /dev/vinum/mytest
> Filesystem          1K-blocks     Used    Avail Capacity  Mounted on
> /dev/vinum/mytest       31532        2    29008    0.%
> ---8<---
> 
> So I explicitly make the FS one sector smaller than the default value, and
> I get not only 4 cg's with 8 MB and 512 blocks (which would seem correct
> to me), but I also get more space available on the FS.

If you want to know exactly where it comes from, you've added
additional frags, which are counted in the numbers, so you get
those additional "whole disk blocks".  We can do the math again:

	29008(1K) - 28952(1K) = 48(1K) = 24(2K) / 3 = 8(2K)

	...and you have a total of 8 cylinder groups.

Only whole file system blocks are considered for the calculation
of the free reserve, so you get "more free space" that's not tied
up in 16K disk blocks.


> And if I growfs this one, everything works as expected:

[ ... ]

> What the heck is going on here? newfs bug? Or did I get something wrong?

Nope; just power-of-two math, and an impedance mismatch in
rounding.  8-).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 12:08:57 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2DF6737B401; Thu, 17 Apr 2003 12:08:57 -0700 (PDT)
Received: from gatekeeper.oremut01.us.wh.verio.net
	(gatekeeper.oremut01.us.wh.verio.net [198.65.168.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7240E43FD7; Thu, 17 Apr 2003 12:08:56 -0700 (PDT)
	(envelope-from fclift@verio.net)
Received: from mx.dmz.orem.verio.net (mx.dmz.orem.verio.net [10.1.1.10])
	by gatekeeper.oremut01.us.wh.verio.net (Postfix) with ESMTP
	id 0EE433BF43A; Thu, 17 Apr 2003 13:08:56 -0600 (MDT)
Received: from vespa.dmz.orem.verio.net (vespa.dmz.orem.verio.net [10.1.1.59])
	by mx.dmz.orem.verio.net (8.11.6p2/8.11.6) with ESMTP id h3HJ8tJ98405;
	Thu, 17 Apr 2003 13:08:55 -0600 (MDT)
Date: Thu, 17 Apr 2003 13:12:39 -0600 (MDT)
From: Fred Clift <fclift@verio.net>
X-X-Sender: <fred@vespa.dmz.orem.verio.net>
To: Ian Dowse <iedowse@maths.tcd.ie>
In-Reply-To: <200304162310.aa96829@salmon.maths.tcd.ie>
Message-ID: <20030417130651.N46464-100000@vespa.dmz.orem.verio.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 19:08:57 -0000

On Wed, 16 Apr 2003, Ian Dowse wrote:

>
> Just after the disk spins up there is a small window where the
> cached writes get written out in a burst. Due to the amount of
> cached data and the probable re-ordering of writes, the disk is
> quite likely to be in an inconsistent state during this flurry of
> writes, but the window is short so it is probably not a big issue
> in practice.

Of course, this is when your power-supply is most likely to fail due to
the sudden increased load :).

<Anecdote>
I lost a disk that I had been occasionally using as a backup drive due to
an effect like this.  I had two scsi drives in an external enclosure, and
I wanted to re-newfs the other drive in the enclosure, so I started a tar
job to copy all the files over.  The PS blew about 20 seconds into the
write, since both drives were 'busy' rather than just one or the other, as
had been the case for quite a while as the machine sat in the corner and
did nothing for a year.  The target drive was hosed badly enough that you
couldn't newfs it any more, and the vendor's low-level format tools claimed
the disk was unrepairable...
</Anecdote>

I guess in a laptop, this failure mode isn't as likely as in my case...


Fred

--
Fred Clift - fclift@verio.net -- Remember: If brute
force doesn't work, you're just not using enough.

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 12:27:13 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3F08D37B401; Thu, 17 Apr 2003 12:27:13 -0700 (PDT)
Received: from mail.tel.fer.hr (zg07-196.dialin.iskon.hr [213.191.150.197])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 832FC43F3F; Thu, 17 Apr 2003 12:27:10 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.202.105])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3HJPFxI000836;
	Thu, 17 Apr 2003 21:25:20 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: Terry Lambert <tlambert2@mindspring.com>
Date: Thu, 17 Apr 2003 21:26:57 +0200
User-Agent: KMail/1.5
References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr>
	<3E9ED902.8BF30AA7@mindspring.com>
In-Reply-To: <3E9ED902.8BF30AA7@mindspring.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200304172126.57611.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 19:27:13 -0000

On Thursday 17 April 2003 18:40, Terry Lambert wrote:
> Marko Zec wrote:
> > David Schultz wrote:
> > > I was referring to all the places where rushjob is set to or
> > > incremented by syncer_maxdelay.  AFAIK, it should never be that
> > > large.
> >
> > Hmm... Why? :)
>
> Increased latency; larger pool retention time, larger pool size,
> more kernel memory tied up in dependency lists for longer, more
> operations blocked because a dependency is already on the write
> list, and so locked against modification.

Increasing "rushjob" has only a single consequence, and that is precisely a 
prompt flushing of dirty buffers. Are you sure we are talking about the same 
code here, rushjob in kern/vfs_subr.c, or something completely different?
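
To make the point about "rushjob" concrete, here is a toy model of a
syncer wheel.  It is a simplified sketch, not the code in
kern/vfs_subr.c: the class and method names are made up, and the wheel
size of 32 slots is an assumed value.  It only illustrates the claim
that a pending rushjob count makes the syncer process additional slots
right away instead of waiting for the next tick:

```python
SYNCER_MAXDELAY = 32   # number of wheel slots; assumed value for this sketch


class ToySyncer:
    """Toy timer wheel: one slot of delayed work is flushed per tick."""

    def __init__(self):
        self.wheel = [[] for _ in range(SYNCER_MAXDELAY)]
        self.slot = 0        # slot to be processed on the next tick
        self.rushjob = 0     # extra slots to process without waiting
        self.flushed = []    # items written out so far, in order

    def schedule(self, item, delay):
        """Queue item to be flushed `delay` ticks from now."""
        index = (self.slot + delay) % SYNCER_MAXDELAY
        self.wheel[index].append(item)

    def tick(self):
        """One wakeup: flush the current slot, plus one more per rushjob."""
        while True:
            self.flushed.extend(self.wheel[self.slot])
            self.wheel[self.slot] = []
            self.slot = (self.slot + 1) % SYNCER_MAXDELAY
            if self.rushjob == 0:
                break
            self.rushjob -= 1


syncer = ToySyncer()
syncer.schedule("dirty buffer", delay=30)
syncer.tick()
print(syncer.flushed)      # still empty, the item is 30 slots away

syncer.rushjob = SYNCER_MAXDELAY
syncer.tick()
print(syncer.flushed)      # the whole wheel was swept in one wakeup
```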

>
> > > I don't think you want to overload a low memory handling
> > > mechanism with the task of syncing the disk.
> >
> > As far as I can see the rushjob variable is used only at one place in
> > kern/vfs_subr.c to notify softupdates synching scheduler to start
> > synching earlier than the normal timers would expire. I just reused
> > the same mechanism to urge synching of dirty buffers when the extra
> > delay timer expires, or when outstanding disk I/O occurs, to coalesce
> > disk updates with occasional disk spinups.
>
> ...and not syncing in the normal place.
>
> I'm wondering if this really helps some real world situation;
> my gut feeling is that it doesn't, and it increases memory use
> considerably, until it's flushed.

Ignoring fsync _really_ helps in real world situations, if you keep in mind 
that the original purpose of the patch is to keep the disk spun down and 
save battery power.

>
> What I'd like to see is a statistics counter of "delayed syncs"
> that occur as a result of doing this, gathered over a period of
> time, along with another statistics counter of "drive spindowns".
>
> I know that this will probably end up being observer influenced
> enough to be merely anecdotal, but say gather two sets over an
> extended period of use without powering the machine down; the
> first set without the change, and the next set with the change.
>
> Either way it turns out, it would make a stronger case for or
> against than just hand-waving.  8-).

Such a measurement would be relevant only if one precisely defined a test 
load. Obviously, different results could be expected depending on whether 
the machine is completely idle or not. Instead of just hand-waving, could 
we specify more closely what we consider a relevant load for a 
battery-powered laptop? :)

Marko

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 12:43:47 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 978E537B404; Thu, 17 Apr 2003 12:43:47 -0700 (PDT)
Received: from mail.tel.fer.hr (zg05-025.dialin.iskon.hr [213.191.138.26])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 9F19C43FA3; Thu, 17 Apr 2003 12:43:45 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.202.105])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3HJfhxI000841;
	Thu, 17 Apr 2003 21:41:47 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: Terry Lambert <tlambert2@mindspring.com>
Date: Thu, 17 Apr 2003 21:43:26 +0200
User-Agent: KMail/1.5
References: <200304162310.aa96829@salmon.maths.tcd.ie>
	<3E9E9827.4BB19197@tel.fer.hr> <3E9EDC38.1CE381C6@mindspring.com>
In-Reply-To: <3E9EDC38.1CE381C6@mindspring.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200304172143.26387.zec@tel.fer.hr>
cc: freebsd-fs@freebsd.org
cc: Ian Dowse <iedowse@maths.tcd.ie>
cc: freebsd-stable@freebsd.org
cc: Kirk McKusick <mckusick@beastie.mckusick.com>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Apr 2003 19:43:48 -0000

On Thursday 17 April 2003 18:54, Terry Lambert wrote:
> Marko Zec wrote:
> > Ian Dowse wrote:
> > > Note that the ATA "delayed write" mechanism only delays writes while
> > > the disk is spun down; at other times there is no change in behaviour.
> > > Since the disk only spins down after it has been idle for a time,
> > > it is very unlikely that the disk is left in an inconsistent state
> > > while it is stopped.
>
> I'm wondering if the ATA "delayed write" actually does this, or if
> it merely relaxes the cache restrictions, without retaining the
> ordering enforcement.
>
> I suspect that it does not retain the ordering enforcement, as
> there is no way to disconnect on a tagged queue write, because
> you must issue a request for status, and it can't be done as a
> separate ATA operation (see the posts by the Maxtor employee, on
> and around January 20th of this year to the -FS list for details).
>
> You are much better off accumulating requests in the kernel in
> buffers, and then using the normal write mechanism to push them
> out to the drive ordered (IMO). 

That is precisely what the original OS-controlled delayed synching patch does 
:)

> This implies a barrier and new
> code above the bwrite interface, to keep the buffers from getting
> locked, and stalling your applications in user space.
>
> A problem I see here is that swap is on a totally different path,
> and in a different area of the disk (practically guaranteeing a
> seek, and a track buffer invalidation on the disk), even if you
> could cause swapping to be delayed (I don't think you can; FreeBSD
> aggressively uses memory, and so when you need to swap, you *need*
> to swap).
>
> > The OS _does_ know (approximately) when the disk is spinning and when
> > not. For example, if the disk is configured to stop spinning immediately
> > after the last I/O operation, the OS can safely assume 10 or more seconds
> > afterwards the spinning will be stopped. The OS only has to keep record
> > (in form of timestamp or something similar) when it has issued the last
> > I/O request to the disk. In my patch this is accomplished using the
> > stratcalls marker, which is increased every time the strategy routine of
> > the ATA disk driver is invoked. Therefore the OS can also successfully
> > coalesce the pending disk updates with other outstanding I/O disk
> > operations, which are typically reads of uncached sectors or VM swapping.
>
> This is useful, but not enough.  You need to actually communicate
> the information above the block I/O layer, to the soft updates.  I
> think, effectively, what you actually want to do is to stop the
> soft updates clock

Hey man, that's exactly what I have done in my patch ("stopping the soft 
updates clock" as you call it). On the block I/O layer I'm only checking if 
the disk is active or not... Are you sure you have checked out the patch / 
code?
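
For readers who have not looked at the patch, the idea of watching a
strategy-call counter can be sketched roughly like this.  This is a toy
model under stated assumptions, not the patch itself: only the name
"stratcalls" comes from the description above, while the classes, the
methods and the polling scheme are hypothetical:

```python
class Disk:
    """Stand-in for the ATA driver: counts calls to its strategy routine."""

    def __init__(self):
        self.stratcalls = 0
        self.written = []

    def strategy(self, request):
        self.stratcalls += 1
        self.written.append(request)


class DelayedSyncer:
    """Holds back updates until some other I/O shows the disk is active."""

    def __init__(self, disk):
        self.disk = disk
        self.pending = []
        self.seen = disk.stratcalls   # counter value at the last poll

    def delay(self, update):
        self.pending.append(update)

    def poll(self):
        """Flush pending updates only if the disk did other I/O meanwhile."""
        if self.disk.stratcalls == self.seen:
            return 0                  # no activity, leave the disk alone
        count = len(self.pending)
        for update in self.pending:
            self.disk.strategy(update)
        self.pending = []
        # Our own writes bumped the counter too, so resynchronise with it.
        self.seen = self.disk.stratcalls
        return count


disk = Disk()
syncer = DelayedSyncer(disk)
syncer.delay("inode update")
print(syncer.poll())              # 0, nothing else touched the disk
disk.strategy("read of uncached sector")
print(syncer.poll())              # 1, coalesced with the read
```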

> , rather than trying to play stupid disk tricks
> with timers, etc., above and beyond what you have to do.  I can see
> it being useful on SCSI disks, as well, particularly where there are
> temperature issues.  Though in that case, you probably are more
> memory starved than anything, and it will end up doing you no good.
>
> > I agree the ATA delayed writes is a great functionality that can help
> > save battery power.
>
> I don't; only if the write order is maintained is it "great".
>
> > I just want to point out that it can suffer from the same
> > consistency problems as the model of OS controlled delayed synching
> > combined with null fsync() processing. However, if the OS controls the
> > delaying of updates, you can turn on or off normal fsync() semantics as
> > desired. With delaying writes in ATA firmware, you simply do not have the
> > choice :)
>
> I think people are confusing fsync() with syncd at this point.  8-(.
>
> -- Terry

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 17:08:50 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 69F7437B40B; Thu, 17 Apr 2003 17:08:47 -0700 (PDT)
Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net
	[207.217.120.189])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 6E12843FDD; Thu, 17 Apr 2003 17:08:46 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0101.cvx22-bradley.dialup.earthlink.net ([209.179.198.101]
	helo=mindspring.com)
	by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196JQm-00035J-00; Thu, 17 Apr 2003 17:08:41 -0700
Message-ID: <3E9F4195.C830A6AD@mindspring.com>
Date: Thu, 17 Apr 2003 17:06:45 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr>
	<3E9ED902.8BF30AA7@mindspring.com> <200304172126.57611.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4c29abeabdf7b825298ca9fb9a6590f09548b785378294e88350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 00:08:50 -0000

Marko Zec wrote:
> On Thursday 17 April 2003 18:40, Terry Lambert wrote:
> > Marko Zec wrote:
> > > David Schultz wrote:
> > > > I was referring to all the places where rushjob is set to or
> > > > incremented by syncer_maxdelay.  AFAIK, it should never be that
> > > > large.
> > >
> > > Hmm... Why? :)
> >
> > Increased latency; larger pool retention time, larger pool size,
> > more kernel memory tied up in dependency lists for longer, more
> > operations blocked because a dependency is already on the write
> > list, and so locked against modification.
> 
> Increasing "rushjob" has only a single consequence, and that is precisely a
> prompt flushing of dirty buffers. Are you sure we are talking about the same
> code here, rushjob in kern/vfs_subr.c, or something completely different?

I'm talking about what David Schultz was talking about when you
said "Hmm... Why?".  8-).

If you increase the syncer delay, you increase the amount of
unsynced data that's outstanding, on average, which is what
makes doing it dangerous.  Especially right now, where there
is a lot of code that doesn't expect a NULL return from the
kernel malloc, but the new kernel malloc can always return
NULL.  Any additional amount of memory pressure you force on
things through added latency delays is Bad(tm).


> > I'm wondering if this really helps some real world situation;
> > my gut feeling is that it doesn't, and it increases memory use
> > considerably, until it's flushed.
> 
> Ignoring fsync _really_ helps in real world situations, if you keep in mind
> that the original purpose of the patch is to keep the disk spun down and
> save battery power.

I understand the original purpose; I'd still like to see stats to
back up whether or not it accomplishes it.  8-).


> > I know that this will probably end up being observer influenced
> > enough to be merely anecdotal, but say gather two sets over an
> > extended period of use without powering the machine down; the
> > first set without the change, and the next set with the change.
> >
> > Either way it turns out, it would make a stronger case for or
> > against than just hand-waving.  8-).
> 
> Such a measurement could turn out to be relevant only if one would precisely
> define a test load.

Which is why I suggested a statistical load, instead.  FreeBSD
isn't well enough put together to allow you to replay an I/O
load like that, particularly a sparse one, so the best you are
going to be able to get is statistical significance.

Actually, if you think about it, it would be hard to prove that
even a repeatable sparse load was unbiased for a particular
result, so you're back to gathering statistical data anyway, to
create a couple of "representative" load sets.

> Obviously different results could be expected if the
> machine would be completely idle and if it would be not. Instead of just
> hand-waving, could we just more closely specify what we consider a relevant
> load for a battery-powered laptop? :)

I guess that would be "any load where the fsync patch helps"?

8-) 8-).

I think it would probably be better to stall the soft updates
clock, flush the pending block I/O out (to unlock the buffers),
and then spin down the disks under OS control.  You could really
guarantee relevance in that case.  Anyone who complained could
pick their own relevance criteria, and hack the code.

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 17:18:54 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id A75C037B401; Thu, 17 Apr 2003 17:18:54 -0700 (PDT)
Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net
	[207.217.120.189])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id DAD4543F75; Thu, 17 Apr 2003 17:18:53 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0101.cvx22-bradley.dialup.earthlink.net ([209.179.198.101]
	helo=mindspring.com)
	by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196JaU-00052N-00; Thu, 17 Apr 2003 17:18:43 -0700
Message-ID: <3E9F4413.D294E69E@mindspring.com>
Date: Thu, 17 Apr 2003 17:17:23 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <200304162310.aa96829@salmon.maths.tcd.ie>
	<3E9E9827.4BB19197@tel.fer.hr> <3E9EDC38.1CE381C6@mindspring.com>
	<200304172143.26387.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4a048fd8ed6dd21c47885f8499087e66e3ca473d225a0f487350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@freebsd.org
cc: Ian Dowse <iedowse@maths.tcd.ie>
cc: freebsd-stable@freebsd.org
cc: Kirk McKusick <mckusick@beastie.mckusick.com>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 00:18:55 -0000

Marko Zec wrote:
> > You are much better off accumulating requests in the kernel in
> > buffers, and then using the normal write mechanism to push them
> > out to the drive ordered (IMO).
> 
> That is precisely what the original OS-controlled delayed synching patch does
> :)

Yeah, but the spin-down isn't really under OS control, except
as a sort of statistical hysteresis thing.  8-).

The real problem that people have with the patch is that it is
moderately evil, in that the fsync() doesn't block until it has
actually sync'ed the data out to the disk, like fsync() is
supposed to... and it lets dependent operations keep going.  So
you break the semantics.
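
As an illustration of the kind of dependent operation meant here, this
is a minimal sketch of an application that relies on a blocking fsync():
it appends a record to a data file, waits for fsync() to return, and
only then records the offset in an index file.  The file layout and the
function names are invented for the example; the os calls are the
standard Python ones:

```python
import os
import tempfile


def append_durably(path, payload):
    """Append payload to path and return only after fsync() has returned."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        offset = os.lseek(fd, 0, os.SEEK_END)   # where the payload will land
        os.write(fd, payload)
        os.fsync(fd)                            # the application trusts this
    finally:
        os.close(fd)
    return offset


def add_record(data_path, index_path, record):
    """Write the record first, then the index entry that points at it."""
    offset = append_durably(data_path, record)
    entry = f"{offset} {len(record)}\n".encode()
    append_durably(index_path, entry)
    return offset


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "data")
    index = os.path.join(tmp, "index")
    add_record(data, index, b"first record")
    add_record(data, index, b"second record")
    with open(index) as f:
        print(f.read(), end="")
```

If fsync() returns before the data is on stable storage, the index can
reach the disk ahead of the record it points to, which is exactly the
ordering the application was trying to rule out.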

I think people would be happier if you just stopped the soft
updates sync clock, and then if someone actually fsync()'ed, or
the dependency list got too big, it spun up the disk, completed
all the I/O quickly, and then spun it down again.
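
A rough sketch of that policy, as a toy model rather than kernel code
(the class name, the max_pending limit and the spin-up counter are all
hypothetical): writes are held while the clock is stopped, and either an
explicit fsync() or too many pending items forces one spin-up that
writes everything out in order:

```python
class LazySyncPolicy:
    """Toy model: hold writes back, flush on fsync() or when too many pile up."""

    def __init__(self, max_pending=4):
        self.max_pending = max_pending   # assumed limit on held-back items
        self.pending = []                # updates held while the clock is stopped
        self.on_disk = []                # updates that reached the disk, in order
        self.spinups = 0                 # how often the disk had to be woken

    def _flush(self):
        """Spin up, complete all pending I/O in order, spin down again."""
        if not self.pending:
            return
        self.spinups += 1
        self.on_disk.extend(self.pending)
        self.pending.clear()

    def write(self, item):
        self.pending.append(item)
        if len(self.pending) > self.max_pending:
            self._flush()                # dependency list got too big

    def fsync(self):
        self._flush()                    # blocks until the data is out


policy = LazySyncPolicy(max_pending=2)
policy.write("a")
policy.write("b")
print(policy.spinups)    # 0, the disk stays spun down
policy.fsync()
print(policy.on_disk)    # ['a', 'b']
```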


> > This is useful, but not enough.  You need to actually communicate
> > the information above the block I/O layer, to the soft updates.  I
> > think, effectively, what you actually want to do is to stop the
> > soft updates clock
> 
> Hey man, that's exactly what I have done in my patch ("stopping the soft
> updates clock" as you call it). On the block I/O layer I'm only checking if
> the disk is active or not... Are you sure you have checked out the patch /
> code?

See above; do that AND preserve the fsync() semantics, and
you'll have something (still thinking there's a confusion
between fsync() semantics and syncd operation).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 17:46:09 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 836C037B401; Thu, 17 Apr 2003 17:46:05 -0700 (PDT)
Received: from mail.tel.fer.hr (zg06-163.dialin.iskon.hr [213.191.148.164])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 0119343F3F; Thu, 17 Apr 2003 17:46:02 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.201.107])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3I0iBxI000859;
	Fri, 18 Apr 2003 02:44:13 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: Terry Lambert <tlambert2@mindspring.com>
Date: Fri, 18 Apr 2003 02:45:52 +0200
User-Agent: KMail/1.5
References: <200304162310.aa96829@salmon.maths.tcd.ie>
	<200304172143.26387.zec@tel.fer.hr> <3E9F4413.D294E69E@mindspring.com>
In-Reply-To: <3E9F4413.D294E69E@mindspring.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200304180245.53107.zec@tel.fer.hr>
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 00:46:10 -0000

On Friday 18 April 2003 02:17, Terry Lambert wrote:

> I think people would be happier if you just stopped the soft
> updates sync clock, and then if someone actually fsync()'ed, or
> the dependency list got too big, it spun up the disk, completed
> all the I/O quickly, and then spun it down again.

The updated patch does precisely what you just described above. It already 
includes a tunable vfs.ena_lazy_fsync (off by default) which allows choosing 
whether blocking (standard) or null fsync() semantics apply. Check out 
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=15720+0+current/freebsd-fs
:)

Marko

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 18:09:14 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5134837B401; Thu, 17 Apr 2003 18:09:14 -0700 (PDT)
Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net
	[207.217.120.139])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 92D0A43F75; Thu, 17 Apr 2003 18:09:13 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0101.cvx22-bradley.dialup.earthlink.net ([209.179.198.101]
	helo=mindspring.com)
	by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196KNK-0000DZ-00; Thu, 17 Apr 2003 18:09:11 -0700
Message-ID: <3E9F4FE4.9B8567DC@mindspring.com>
Date: Thu, 17 Apr 2003 18:07:48 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <200304162310.aa96829@salmon.maths.tcd.ie>
	<200304172143.26387.zec@tel.fer.hr> <3E9F4413.D294E69E@mindspring.com>
	<200304180245.53107.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4ca8a7942351ba88739234b2a71b4f4c7350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 01:09:14 -0000

Marko Zec wrote:
> On Friday 18 April 2003 02:17, Terry Lambert wrote:
> > I think people would be happier if you just stopped the soft
> > updates sync clock, and then if someone actually fsync()'ed, or
> > the dependency list got too big, it spun up the disk, completed
> > all the I/O quickly, and then spun it down again.
> 
> The updated patch does precisely what you just described above. It already
> includes a tunable vfs.ena_lazy_fsync (off by default) which allows choosing
> whether blocking (standard) or null-fsync() semantics apply. Check out
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=15720+0+current/freebsd-fs
> :)

No, you are missing my previous point: the check for free space
should include a check for number of elements *TOTAL* in all slots
on the soft updates timer wheel.  Otherwise it can eat all of
memory.

The free space check only works in the case that you've done a
delete and are allocating new space: the case where you are doing
more and more allocations/overwrites of data is not handled, and
can grow to eat all available kernel memory.  There was in fact a
bug, early on, that Matt Dillon worked around that caused it under
load, and it was in exactly the code you are touching.
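
A minimal sketch of the check described above, written in Python
rather than kernel C.  The class, the `max_total` limit and the
method names are all hypothetical; this is not the soft updates
code, only an illustration of counting entries across *all* wheel
slots and forcing a flush once the total grows too large, even
while the wheel is not being advanced.

```python
from collections import deque

class SyncWheel:
    """Toy model of a syncer timer wheel (hypothetical, not kernel code)."""

    def __init__(self, slots=32, max_total=100):
        self.slots = [deque() for _ in range(slots)]
        self.pos = 0                # slot the syncer would service next
        self.total = 0              # entries queued across ALL slots
        self.max_total = max_total  # assumed limit on pending entries
        self.flushed = []           # stands in for "written to disk"

    def schedule(self, item, delay):
        """Queue item `delay` slots ahead; flush if too much is pending."""
        slot = (self.pos + delay) % len(self.slots)
        self.slots[slot].append(item)
        self.total += 1
        if self.total > self.max_total:
            self.flush_all()

    def flush_all(self):
        """Drain every slot in wheel order, starting at the current one."""
        n = len(self.slots)
        for i in range(n):
            slot = self.slots[(self.pos + i) % n]
            while slot:
                self.flushed.append(slot.popleft())
        self.total = 0

# The wheel never advances here (disk spun down), yet memory stays bounded.
wheel = SyncWheel(slots=32, max_total=100)
for i in range(250):
    wheel.schedule(("buf", i), delay=i % 30)
```

With the total tracked on every insert, the pending count can never
exceed the limit by more than one entry, regardless of which slots
the entries land in.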


Also, the "ena_lazy_fsync" needs to be overridable, based on
barriers in the dependency list: it's not acceptable to violate
the POSIX semantics over trying to delay fsync().  You insert a
dependency that is blocked by some other dependency already
there, and you're in semantic trouble.  Normally, this would be
prevented by a write lock on the buffer in question, but it's
not queued for write, because the wheel's not moving.

The "ena_lazy_fsync" is really a problem, if it permits an
operation, such as the update of a database index file to
point to a new record that has been written to the database
data file.  At this point, fsync() is used for implied
contracts.  The only way you can legitimately delay it is
if there isn't an implied contract, which you should be able
to see as a barrier in the soft update dependency list.
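
The implied contract can be sketched from the application side.
This is a simplified illustration in Python, with a hypothetical
`append_record` helper and file layout; it assumes fsync() really
blocks until the data is stable, which is exactly the guarantee a
null fsync() would remove.

```python
import os
import tempfile

def append_record(data_path, index_path, record):
    """Append record to the data file, then publish its offset in the index."""
    with open(data_path, "ab") as data:
        offset = data.seek(0, os.SEEK_END)
        data.write(record)
        data.flush()
        os.fsync(data.fileno())   # barrier: the record must be stable first
    with open(index_path, "ab") as index:
        index.write(b"%d %d\n" % (offset, len(record)))
        index.flush()
        os.fsync(index.fileno())  # only now may the index point at it
    return offset

# Small demonstration in a scratch directory (paths are illustrative).
with tempfile.TemporaryDirectory() as d:
    data_path = os.path.join(d, "records.dat")
    index_path = os.path.join(d, "records.idx")
    first = append_record(data_path, index_path, b"hello")
    second = append_record(data_path, index_path, b"world!")
    with open(index_path, "rb") as f:
        index_lines = f.read().splitlines()
```

If the first fsync() returns without doing anything, the index
entry can reach the disk before the record it points to, and a
crash in between leaves the index referring to garbage.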


Under what circumstances do you find that delaying fsync()
helps you?  What program are you running that calls fsync()?

I think that maybe you are running a program (like qmail)
that doesn't trust the FS to comply with POSIX, so it inserts
some extra fsync()'s "just in case we are running on ext2fs"
or whatever.

And it still needs a sysctl that counts the number of them
that actually get delayed.  Even if you don't use it for a
statistical check, it will check you on the number of times
fsync() (and sync()) get called by someone.  If it's a small
number, you need to fix the bogus program, rather than hack
the kernel.  8-).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Thu Apr 17 21:46:26 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id F191337B401; Thu, 17 Apr 2003 21:46:25 -0700 (PDT)
Received: from harmony.village.org (rover.bsdimp.com [204.144.255.66])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 112C043FA3; Thu, 17 Apr 2003 21:46:25 -0700 (PDT)
	(envelope-from imp@bsdimp.com)
Received: from localhost (warner@rover2.village.org [10.0.0.1])
	by harmony.village.org (8.12.8/8.12.3) with ESMTP id h3I4kMA7086789;
	Thu, 17 Apr 2003 22:46:23 -0600 (MDT)
	(envelope-from imp@bsdimp.com)
Date: Thu, 17 Apr 2003 22:46:01 -0600 (MDT)
Message-Id: <20030417.224601.38718174.imp@bsdimp.com>
To: cdillon@wolves.k12.mo.us
From: "M. Warner Losh" <imp@bsdimp.com>
In-Reply-To: <20030416100921.U91118@duey.wolves.k12.mo.us>
References: <20030415160925.U86854@duey.wolves.k12.mo.us>
	<3E9D157E.96FD09AE@mindspring.com>
	<20030416100921.U91118@duey.wolves.k12.mo.us>
X-Mailer: Mew version 2.1 on Emacs 21.2 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@freebsd.org
cc: mckusick@McKusick.COM
cc: das@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 04:46:26 -0000

In message: <20030416100921.U91118@duey.wolves.k12.mo.us>
            Chris Dillon <cdillon@wolves.k12.mo.us> writes:
: quickly.  Even with a life of two million write cycles, the
: "occasional" 30-second round of updates that happen to write the same
: bits over and over will give your flash part a life of only 1.9 years
: (2000000 writes * 30 seconds apart = 60000000 seconds to failure).
: Also, I doubt you'll actually get 2 million writes out of the average
: consumer flash part.

I've gotten 10M writes in the lab here on parts that didn't fail.
Also, that's 2M writes per cell, and the CF parts average the wear
across cells.  That is why this happens: there is typically more
than 1 cell per part.

However, you are *MUCH* better off logging to a memory file system
with cron.  Or better yet, not running cron or not logging it at all.
We log our stuff to /var/log (and don't bother logging the cron
messages) and newsyslog to a small writable partition once a day or so
on the average.  So using this as an argument to trash fsync is not
very strong.  There are much better ways to deal with these issues for
CF systems.

You are much better off doing a read-only / with a small writable
partition for things that need to be saved (we call ours /mod).  We
have a write rate of about 10 per hour, which gives our system an
expected life in excess of 20 years.
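
The arithmetic behind the two lifetime figures in this message can
be checked with a short sketch.  It assumes the 2-million-cycle
figure quoted above, no wear averaging, and that every write hits
the same cell; the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
HOURS_PER_YEAR = 365.25 * 24

def life_years_from_interval(write_cycles, seconds_between_writes):
    """Years until write_cycles are used up, one write every N seconds."""
    return write_cycles * seconds_between_writes / SECONDS_PER_YEAR

def life_years_from_rate(write_cycles, writes_per_hour):
    """Years until write_cycles are used up at a given hourly write rate."""
    return write_cycles / writes_per_hour / HOURS_PER_YEAR

# One write every 30 seconds: 60,000,000 seconds, a little under 2 years.
every_30s = life_years_from_interval(2_000_000, 30)

# About 10 writes per hour: 200,000 hours, a bit over 20 years.
ten_per_hour = life_years_from_rate(2_000_000, 10)
```

Any wear averaging inside the part would stretch both numbers
further, so these are pessimistic under the stated assumptions.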

Our company has shipped over 200 flash systems, and we've had 3
flashes fail, all due to infant mortality...

Warner

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 00:13:32 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9F57737B401; Fri, 18 Apr 2003 00:13:32 -0700 (PDT)
Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com
	[12.233.57.131])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id AB73D43FB1; Fri, 18 Apr 2003 00:13:31 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3I7DU9E009228;
	Fri, 18 Apr 2003 00:13:30 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3I7DTni009227;
	Fri, 18 Apr 2003 00:13:29 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Date: Fri, 18 Apr 2003 00:13:29 -0700
From: David Schultz <das@FreeBSD.org>
To: Marko Zec <zec@tel.fer.hr>
Message-ID: <20030418071329.GA9125@HAL9000.homeunix.com>
Mail-Followup-To: Marko Zec <zec@tel.fer.hr>, freebsd-fs@FreeBSD.org,
	freebsd-stable@FreeBSD.org
References: <3E976EBD.C3E66EF8@tel.fer.hr>
	<20030414101935.GB18110@HAL9000.homeunix.com> <3E9C5975.43755858@tel.fer.hr>
	<20030416101136.GA868@HAL9000.homeunix.com> <3E9E93D8.EB16ED42@tel.fer.hr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3E9E93D8.EB16ED42@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 07:13:33 -0000

On Thu, Apr 17, 2003, Marko Zec wrote:
> David Schultz wrote:
> 
> > On Tue, Apr 15, 2003, Marko Zec wrote:
> > >
> > > > - The fiddling with rushjob seems rather arbitrary.  You can probably
> > > >   just let the existing code increment it as necessary and force a sync
> > > >   if the value gets too high.
> > >
> > > If rushjob were not used for forcing prompt synching, the original code
> > > could not guarantee the sync to occur immediately. Instead, the synching could
> > > be further delayed for up to 30 seconds, which is not desirable if our major
> > > design goal is to do as much disk I/O as possible in a small time interval and
> > > leave the disk idle otherwise.
> >
> > I was referring to all the places where rushjob is set to or
> > incremented by syncer_maxdelay.  AFAIK, it should never be that
> > large.
> 
> Hmm... Why? :)
> 
> > I don't think you want to overload a low memory handling
> > mechanism with the task of syncing the disk.
> 
> As far as I can see the rushjob variable is used only at one place in
> kern/vfs_subr.c to notify softupdates synching scheduler to start synching earlier
> than the normal timers would expire. I just reused the same mechanism to urge
> synching of dirty buffers when the extra delay timer expires, or when outstanding
> disk I/O occurs, to coalesce disk updates with occasional disk spinups.

When the system is low on memory or has reached a related limit,
it tries to sync data to disk faster by slowly increasing the
value of rushjob until the situation improves.  If the syncer is
able to keep up, it will process data faster and pull rushjob back
down to zero.  If rushjob gets too high (half the maximum sync
delay, usually 15), the system resorts to other measures.

Your code bumps rushjob up by the arbitrary value 32, which is
rather large.  Doing so is going to throw things out of whack.
What you would probably want to do is leave rushjob alone.  If it
ever becomes nonzero, the syncer should wake up and start writing
again.  If you would like to write the data out more quickly
whenever the disks start up so you can make them spin down again,
look at softdep_request_cleanup() in -CURRENT.
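
A toy model of the rushjob behaviour described above, in Python.
It is a sketch under assumptions, not the code in kern/vfs_subr.c:
the class and method names are hypothetical, and the syncer is
reduced to "one wheel slot per second, plus one extra slot per
unit of rushjob".

```python
SYNCDELAY = 30  # assumed maximum sync delay in seconds; half of it is 15

class Syncer:
    """Simplified syncer: counts how many wheel slots get processed."""

    def __init__(self):
        self.rushjob = 0

    def speedup(self):
        """Request one extra slot; refuse once rushjob hits SYNCDELAY // 2."""
        if self.rushjob < SYNCDELAY // 2:
            self.rushjob += 1
            return True
        return False  # too high: the caller has to resort to other measures

    def run(self, seconds):
        """Simulate `seconds` ticks and return the number of slots processed."""
        processed = 0
        for _ in range(seconds):
            processed += 1            # the normal slot for this second
            while self.rushjob > 0:   # extra slots, no sleeping in between
                self.rushjob -= 1
                processed += 1
        return processed

syncer = Syncer()
granted = [syncer.speedup() for _ in range(20)]  # low-memory pressure
pending = syncer.rushjob
slots_in_first_second = syncer.run(1)
```

In this model the increments are small and capped, and the syncer
pulls the value back to zero as it catches up.  Adding a large
constant to rushjob directly would bypass the cap that `speedup()`
enforces here.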

But really, even getting fsync() to do *remotely* the right thing
(i.e. correct ordering but no guarantee of writing data to stable
storage when in power save mode) is going to be *really*hard*.
Warner has a much better suggestion.

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 05:49:20 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4C2BE37B401; Fri, 18 Apr 2003 05:49:20 -0700 (PDT)
Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com
	[12.233.57.131])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 949E343F3F; Fri, 18 Apr 2003 05:49:19 -0700 (PDT)
	(envelope-from das@FreeBSD.ORG)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3ICnG9E011022;
	Fri, 18 Apr 2003 05:49:16 -0700 (PDT)
	(envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3ICnEHq011021;
	Fri, 18 Apr 2003 05:49:15 -0700 (PDT)
	(envelope-from das@FreeBSD.ORG)
Date: Fri, 18 Apr 2003 05:49:14 -0700
From: David Schultz <das@FreeBSD.ORG>
To: Terry Lambert <tlambert2@mindspring.com>
Message-ID: <20030418124914.GA10979@HAL9000.homeunix.com>
Mail-Followup-To: Terry Lambert <tlambert2@mindspring.com>,
	Marko Zec <zec@tel.fer.hr>, freebsd-fs@freebsd.org,
	Ian Dowse <iedowse@maths.tcd.ie>, freebsd-stable@freebsd.org,
	Kirk McKusick <mckusick@beastie.mckusick.com>
References: <200304162310.aa96829@salmon.maths.tcd.ie>
	<3E9E9827.4BB19197@tel.fer.hr> <3E9EDC38.1CE381C6@mindspring.com>
	<200304172143.26387.zec@tel.fer.hr> <3E9F4413.D294E69E@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3E9F4413.D294E69E@mindspring.com>
cc: freebsd-fs@FreeBSD.ORG
cc: freebsd-stable@FreeBSD.ORG
cc: Ian Dowse <iedowse@maths.tcd.ie>
cc: Kirk McKusick <mckusick@beastie.mckusick.com>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 12:49:20 -0000

On Thu, Apr 17, 2003, Terry Lambert wrote:
> Marko Zec wrote:
> > > You are much better off accumulating requests in the kernel in
> > > buffers, and then using the normal write mechanism to push them
> > > out to the drive ordered (IMO).
> > 
> > That is precisely what the original OS-controlled delayed synching patch does
> > :)
> 
> Yeah, but the spin-down isn't really under OS control, except
> as a sort of statistical hysteresis thing.  8-).

The OS can know exactly when the disk is spinning if it tells the
disk not to timeout all by itself with the IDLE command, and
explicitly tells it to IDLE IMMEDIATE at the appropriate time.
But being exact about this isn't particularly important.

As for the ATA delayed write feature, I don't believe it will
guarantee consistency.  This is true even if the drive doesn't
reorder writes, which it is free to do.  Consider a correctness
constraint given by the partial ordering of blocks A->B->A.  That
is, we have to first make a change to block A, then update block
B, then make a different change to block A.  This is going to be
fairly common if a fair number of writes are queued; it happens
whenever an editor saves a file using the correct fsync/rename
sequence, for instance.  The disk will coalesce the two writes to
A in its cache and therefore violate the constraint.
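
The A->B->A case can be illustrated with a small Python model.
This is a sketch under assumptions: the "cache" is a plain
dictionary that keeps only the newest data per block, and the
function names and block contents are invented for the example.

```python
def flush_in_order(writes):
    """Apply each write as issued; return every intermediate disk state."""
    disk, history = {}, []
    for block, value in writes:
        disk[block] = value
        history.append(dict(disk))
    return history

def flush_coalesced(writes):
    """Model a cache that keeps one copy per block, then flushes it."""
    cache = {}
    for block, value in writes:
        cache[block] = value  # the second write to A replaces the first
    disk, history = {}, []
    for block, value in cache.items():
        disk[block] = value
        history.append(dict(disk))
    return history

def violates(history):
    """True if some state holds the second A without B being on disk."""
    return any(s.get("A") == "a2" and "B" not in s for s in history)

writes = [("A", "a1"), ("B", "b1"), ("A", "a2")]
ordered = flush_in_order(writes)
coalesced = flush_coalesced(writes)
```

In the coalesced run the second version of A reaches the disk
before B does, so a crash at that point leaves a state the
ordering was meant to rule out, even though no write was reordered
relative to the cache contents.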

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 09:24:38 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 63CEE37B405; Fri, 18 Apr 2003 09:24:38 -0700 (PDT)
Received: from gatekeeper.oremut01.us.wh.verio.net
	(gatekeeper.oremut01.us.wh.verio.net [198.65.168.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id ACEC543FB1; Fri, 18 Apr 2003 09:24:36 -0700 (PDT)
	(envelope-from fclift@verio.net)
Received: from mx.dmz.orem.verio.net (mx.dmz.orem.verio.net [10.1.1.10])
	by gatekeeper.oremut01.us.wh.verio.net (Postfix) with ESMTP
	id F27713BF437; Fri, 18 Apr 2003 10:24:35 -0600 (MDT)
Received: from vespa.dmz.orem.verio.net (vespa.dmz.orem.verio.net [10.1.1.59])
	by mx.dmz.orem.verio.net (8.11.6p2/8.11.6) with ESMTP id h3IGOZJ30971;
	Fri, 18 Apr 2003 10:24:35 -0600 (MDT)
Date: Fri, 18 Apr 2003 10:28:24 -0600 (MDT)
From: Fred Clift <fclift@verio.net>
X-X-Sender: <fred@vespa.dmz.orem.verio.net>
To: David Schultz <das@freebsd.org>
In-Reply-To: <20030418124914.GA10979@HAL9000.homeunix.com>
Message-ID: <20030418101259.M49571-100000@vespa.dmz.orem.verio.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 16:24:38 -0000

On Fri, 18 Apr 2003, David Schultz wrote:

> explicitly tells it to IDLE IMMEDIATE at the appropriate time.
> But being exact about this isn't particularly important.

I think it might be nice to have something like this that immediately
spins the disk down after the burst of writes - though, if I remember
correctly, keeping a disk spinning takes far far less power than spinning
it up, so shutting down the disk 3 minutes earlier than you otherwise
might won't be that big of a power savings compared to avoiding spinning it
up so much.


> As for the ATA delayed write feature, I don't believe it will
> guarantee consistency.  This is true even if the drive doesn't

There has been a lot of talk on this thread about how the
(not-enabled-by-default) fsync portion of this patch violates the 'fsync
contract' and violates guarantees of consistency.  As was stated by the
creator of the patch, this is intended to only be used in situations where
it is relatively 'unimportant' to have these guarantees.  His typical
usage is on a non-mission-critical machine (his laptop) that doesn't
contain data which, _when_lost_, is going to be irreplaceable.

There have been many objections about various databases not getting
updates, qmail/sendmail losing mail, vi removing/overwriting a file, etc,
but apparently these are not the cases for which this patch was designed.
If a person cared about these possibilities, he wouldn't turn this
functionality on.

If on the other hand, a person were stuck at the doctor's office waiting
room, with low battery, playing nethack, then perhaps this patch is just
what you want.

Can we stop going on and on about how terrible this patch is for
'important' and 'unrecoverable' data?  This patch should not be used on
any machine that has irreplaceable data.  If I were using this on my laptop
working on code and I LOST my changes, I can always cvs update to get the
file back and start working again, having lost 30 minutes of work.  Of
course my laptop doesn't get major mission-critical use either...

On the other hand _if_ the patch could be slightly modified to still
guarantee fsync semantics (when qmail writes mail, vi overwrites a file,
or mysql updates a table, etc) so that data would be safer, but not
significantly degrade the utility of the patch then I'd say let's 1) make
the small change (ie disk spin-up/write/spin-down on every fsync?  will
this take more power than it is worth?) and 2) incorporate this into
FreeBSD and let people get on with using it! (It doesn't have to be
committed into the tree to get use, but it certainly would get much more
use this way.)

Fred

--
Fred Clift - fclift@verio.net -- Remember: If brute
force doesn't work, you're just not using enough.

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 09:46:26 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 22FA537B405; Fri, 18 Apr 2003 09:46:26 -0700 (PDT)
Received: from pa-plum1b-166.pit.adelphia.net (pa-plum1b-122.pit.adelphia.net
	[24.53.161.122])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 2EECF43FCB; Fri, 18 Apr 2003 09:46:25 -0700 (PDT)
	(envelope-from wmoran@potentialtech.com)
Received: from potentialtech.com (working [172.16.0.95])
	h3IGkNwl000376;	Fri, 18 Apr 2003 12:46:24 -0400 (EDT)
	(envelope-from wmoran@potentialtech.com)
Message-ID: <3EA02BDF.7020306@potentialtech.com>
Date: Fri, 18 Apr 2003 12:46:23 -0400
From: Bill Moran <wmoran@potentialtech.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.2.1) Gecko/20030301
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Fred Clift <fclift@verio.net>
References: <20030418101259.M49571-100000@vespa.dmz.orem.verio.net>
In-Reply-To: <20030418101259.M49571-100000@vespa.dmz.orem.verio.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@freebsd.org
cc: David Schultz <das@freebsd.org>
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 16:46:26 -0000

Fred Clift wrote:
> On Fri, 18 Apr 2003, David Schultz wrote:
> 
>>explicitly tells it to IDLE IMMEDIATE at the appropriate time.
>>But being exact about this isn't particularly important.
> 
> I think it might be nice to have something like this that immediately
> spins the disk down after the burst of writes - though, if I remember
> correctly, keeping a disk spinning takes far far less power than spinning
> it up, so shutting down the disk 3 minutes earlier than you otherwise
> might won't be that big of a power savings compared to avoiding spinning it
> up so much.
> 
>>As for the ATA delayed write feature, I don't believe it will
>>guarantee consistency.  This is true even if the drive doesn't
> 
> There has been a lot of talk on this thread about how the
> (not-enabled-by-default) fsync portion of this patch violates the 'fsync
> contract' and violates guarantees of consistency.  As was stated by the
> creator of the patch, this is intended to only be used in situations where
> it is relatively 'unimportant' to have these guarantees.  His typical
> usage is on a non-mission-critical machine (his laptop) that doesn't
> contain data which, _when_lost_, is going to be irreplaceable.
> 
> There have been many objections about various databases not getting
> updates, qmail/sendmail losing mail, vi removing/overwriting a file, etc,
> but apparently these are not the cases for which this patch was designed.
> If a person cared about these possibilities, he wouldn't turn this
> functionality on.
> 
> If on the other hand, a person were stuck at the doctor's office waiting
> room, with low battery, playing nethack, then perhaps this patch is just
> what you want.
> 
> Can we stop going on and on about how terrible this patch is for
> 'important' and 'unrecoverable' data?  This patch should not be used on
> any machine that has irreplaceable data.  If I were using this on my laptop
> working on code and I LOST my changes, I can always cvs update to get the
> file back and start working again, having lost 30 minutes of work.  Of
> course my laptop doesn't get major mission-critical use either...
> 
> On the other hand _if_ the patch could be slightly modified to still
> guarantee fsync semantics (when qmail writes mail, vi overwrites a file,
> or mysql updates a table, etc) so that data would be safer, but not
> significantly degrade the utility of the patch then I'd say let's 1) make
> the small change (ie disk spin-up/write/spin-down on every fsync?  will
> this take more power than it is worth?) and 2) incorporate this into
> FreeBSD and let people get on with using it! (It doesn't have to be
> committed into the tree to get use, but it certainly would get much more
> use this way.)

I've been following this thread for a while out of curiosity.
I understand the dangers of suicidal fsync, and I understand the benefits.
I know this isn't normally the kind of thing that should get said on these
lists, but if anyone is taking a vote, I agree with Fred 100%.  Include
the functionality, document the dangers, and leave it off by default.
Despite the hundred-billion places where it would be a bad idea, I feel
there are a number of places where it would be helpful.

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 11:13:36 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id F2CF937B401; Fri, 18 Apr 2003 11:13:35 -0700 (PDT)
Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net
	[207.217.120.189])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 26CA443FD7; Fri, 18 Apr 2003 11:13:35 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0577.cvx22-bradley.dialup.earthlink.net ([209.179.200.67]
	helo=mindspring.com)
	by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196aMd-0000Nt-00; Fri, 18 Apr 2003 11:13:32 -0700
Message-ID: <3EA03FF1.280B6810@mindspring.com>
Date: Fri, 18 Apr 2003 11:12:01 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: David Schultz <das@FreeBSD.ORG>
References: <200304162310.aa96829@salmon.maths.tcd.ie>
	<3E9E9827.4BB19197@tel.fer.hr> <3E9EDC38.1CE381C6@mindspring.com>
	<200304172143.26387.zec@tel.fer.hr> <3E9F4413.D294E69E@mindspring.com>
	<20030418124914.GA10979@HAL9000.homeunix.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a491ef9124fa8972c225596c66c40ce4b6350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@FreeBSD.ORG
cc: freebsd-stable@FreeBSD.ORG
cc: Ian Dowse <iedowse@maths.tcd.ie>
cc: Kirk McKusick <mckusick@beastie.mckusick.com>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 18:13:36 -0000

David Schultz wrote:
> > Yeah, but the spin-down isn't really under OS control, except
> > as a sort of statistical hysteresis thing.  8-).
> 
> The OS can know exactly when the disk is spinning if it tells the
> disk not to timeout all by itself with the IDLE command, and
> explicitly tells it to IDLE IMMEDIATE at the appropriate time.
> But being exact about this isn't particularly important.

As it sits, the implementation is via a timer that is not under
OS control.  It would be nice if it used this method, instead,
since it would allow anyone who wanted to to implement a "policy",
if the default policy bothered them (e.g. do it when the screen
saver kicks on, or do it when there haven't been any mouse/keyboard
input events for XX seconds, etc. -- you could even hook this to
whether the delayed fsync is active or not, which seems a better
time for it to be active, anyway).


> As for the ATA delayed write feature, I don't believe it will
> guarantee consistency.

It doesn't.  I checked, after voicing my suspicions of it.

> This is true even if the drive doesn't
> reorder writes, which it is free to do.  Consider a correctness
> constraint given by the partial ordering of blocks A->B->A.  That
> is, we have to first make a change to block A, then update block
> B, then make a different change to block A.  This is going to be
> fairly common if a fair number of writes are queued; it happens
> whenever an editor saves a file using the correct fsync/rename
> sequence, for instance.  The disk will coalesce the two writes to
> A in its cache and therefore violate the constraint.

You can't turn the reordering off, and your example is exactly
the "barrier" case I had previously described.  8-).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 13:43:18 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 96CF237B401; Fri, 18 Apr 2003 13:43:18 -0700 (PDT)
Received: from mail.tel.fer.hr (zg04-042.dialin.iskon.hr [213.191.137.43])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id CF01243F85; Fri, 18 Apr 2003 13:43:15 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.201.107])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3IKfPxI000931;
	Fri, 18 Apr 2003 22:41:26 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: David Schultz <das@FreeBSD.org>
Date: Fri, 18 Apr 2003 22:43:05 +0200
User-Agent: KMail/1.5
References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr>
	<20030418071329.GA9125@HAL9000.homeunix.com>
In-Reply-To: <20030418071329.GA9125@HAL9000.homeunix.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200304182243.05739.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 20:43:18 -0000

On Friday 18 April 2003 09:13, David Schultz wrote:
> When the system is low on memory or has reached a related limit,
> it tries to sync data to disk faster by slowly increasing the
> value of rushjob until the situation improves.  If the syncer is
> able to keep up, it will process data faster and pull rushjob back
> down to zero.

True.

> If rushjob gets too high (half the maximum sync
> delay, usually 15), the system resorts to other measures.

Which measures, and in which cases? The only two chunks of code in the entire 
-stable kernel that probe the value of rushjob (indirectly through invoking 
speedup_syncer() ) are newdirrem() and inodedep_lookup() in 
ufs/ffs/ffs_softdep.c. Neither of these two will either corrupt a single bit 
of data or crash the system if rushjob gets higher than max syncdelay / 2.

> Your code bumps rushjob up by the arbitrary value 32, which is
> rather large.  Doing so is going to throw things out of whack.

Which things and how?

> What you would probably want to do is leave rushjob alone.  If it
> ever becomes nonzero, the syncer should wake up and start writing
> again.

Sure, that's precisely why I increment rushjob - to instruct the syncer to 
start synching when I want it to. What's wrong with that?

> If you would like to write the data out more quickly
> whenever the disks start up so you can make them spin down again,
> look at softdep_request_cleanup() in -CURRENT.
>
> But really, even getting fsync() to do *remotely* the right thing
> (i.e. correct ordering but no guarantee of writing data to stable
> storage when in power save mode) is going to be *really*hard*.
> Warner has a much better suggestion.

If I'm not mistaken, Warner was talking about using a memory-based FS and
periodically syncing it to a flash-based device. Such a concept is perfectly
sane for appliances using solid-state disks; however, I don't see how it can
be applied to a typical laptop.

Marko

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 14:08:07 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 78E7337B40B; Fri, 18 Apr 2003 14:08:07 -0700 (PDT)
Received: from mail.tel.fer.hr (zg07-053.dialin.iskon.hr [213.191.150.54])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7A04F43FE9; Fri, 18 Apr 2003 14:08:05 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.201.107])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3IL6CxI000936;
	Fri, 18 Apr 2003 23:06:16 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: Terry Lambert <tlambert2@mindspring.com>
Date: Fri, 18 Apr 2003 23:07:53 +0200
User-Agent: KMail/1.5
References: <200304162310.aa96829@salmon.maths.tcd.ie>
	<200304180245.53107.zec@tel.fer.hr> <3E9F4FE4.9B8567DC@mindspring.com>
In-Reply-To: <3E9F4FE4.9B8567DC@mindspring.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200304182307.53890.zec@tel.fer.hr>
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 21:08:08 -0000

On Friday 18 April 2003 03:07, Terry Lambert wrote:
> No, you are missing my previous point: the check for free space
> should include a check for number of elements *TOTAL* in all slots
> on the soft updates timer wheel.  Otherwise it can eat all of
> memory.
>
> The free space check only works in the case that you've done a
> delete and are allocating new space: the case where you are doing
> more and more allocations/overwrites of data is not handled, and
> can grow to eat all available kernel memory.  There was in fact a
> bug, early on, that Matt Dillon worked around that caused it under
> load, and it was in exactly the code you are touching.

If what you are saying were true, then one could simply crash an _unpatched_
system by doing a lot of FS write operations. What my patch does is that it
just temporarily suspends the softupdates "wheels" as you call it. However,
if VM or another ffs subsystem indicates (by increasing the value of rushjob)
that buffers should get flushed more frequently, then my patch will
_immediately_ drop out of the delay loop and allow the syncing to proceed
ASAP. I really do not see what can be wrong with such a concept?

> 
> Under what circumstances do you find that delaying fsync()
> helps you?  What program are you running that calls fsync()?

The vi editor, pretty much every e-mail client, and so on...

> Even if you don't use it for a
> statistical check, it will check you on the number of times
> fsync() (and sync()) get called by someone. If it's a small
> number, you need to fix the bogus program, rather than hack
> the kernel.  8-).

No, those programs are not bogus, and neither is the kernel. I just want to
have a method to keep the damn disk spun down, that's all.

Marko

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 14:21:33 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id BDD4B37B401; Fri, 18 Apr 2003 14:21:33 -0700 (PDT)
Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net
	[207.217.120.139])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 2300C43F85; Fri, 18 Apr 2003 14:21:33 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0240.cvx22-bradley.dialup.earthlink.net ([209.179.198.240]
	helo=mindspring.com)
	by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196dIY-0005ad-00; Fri, 18 Apr 2003 14:21:31 -0700
Message-ID: <3EA06C07.A34F1C31@mindspring.com>
Date: Fri, 18 Apr 2003 14:20:07 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr>
	<200304182243.05739.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a435745805e1197074fb358af05677c83f667c3043c0873f7e350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 21:21:34 -0000

Marko Zec wrote:
> On Friday 18 April 2003 09:13, David Schultz wrote:
> > Your code bumps rushjob up by the arbitrary value 32, which is
> > rather large.  Doing so is going to throw things out of whack.
> 
> Which things and how?
> 
> > What you would probably want to do is leave rushjob alone.  If it
> > ever becomes nonzero, the syncer should wake up and start writing
> > again.
> 
> Sure, that's precisely why I increment rushjob - to instruct the syncer to
> start synching when I want it to. What's wrong with that?

Touching rushjob is probably not a good idea.

The main technical (not philosophical) problem with the patch
as it sits is that you can cause the soft updates wheel to wrap
around.

Then when you write things out, they write out of order.

The purpose of the wheel is to allow placing of operations at
some relative offset in the future to an outstanding operation,
to ensure ordering.


No matter what else you do, you can not allow the wheel to
"wrap".  Because the offsets are "future relative", that means
that you have to flush at some number of wheel entries equal
to:

	wrap_boundary - the_largest_potential_future_offset - 1.

Making the wheel bigger is probably acceptable, but then you
will exacerbate the memory problem that rushjob was invented
to resolve (please do a "cvs log" and look at the checkin
comments; I still believe it was "dillon" who made the change).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 14:24:56 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5C79B37B401; Fri, 18 Apr 2003 14:24:56 -0700 (PDT)
Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net
	[207.217.120.139])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id BD3B543FD7; Fri, 18 Apr 2003 14:24:55 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0240.cvx22-bradley.dialup.earthlink.net ([209.179.198.240]
	helo=mindspring.com)
	by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196dLp-00061Q-00; Fri, 18 Apr 2003 14:24:54 -0700
Message-ID: <3EA06CD2.E299D864@mindspring.com>
Date: Fri, 18 Apr 2003 14:23:30 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <200304162310.aa96829@salmon.maths.tcd.ie>
	<200304180245.53107.zec@tel.fer.hr> <3E9F4FE4.9B8567DC@mindspring.com>
	<200304182307.53890.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4fc7764252737ce9c695e8026c7825103667c3043c0873f7e350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@freebsd.org
cc: freebsd-stable@freebsd.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 21:24:56 -0000

Marko Zec wrote:
> On Friday 18 April 2003 03:07, Terry Lambert wrote:
> > No, you are missing my previous point: the check for free space
> > should include a check for number of elements *TOTAL* in all slots
> > on the soft updates timer wheel.  Otherwise it can eat all of
> > memory.
> >
> > The free space check only works in the case that you've done a
> > delete and are allocating new space: the case where you are doing
> > more and more allocations/overwrites of data is not handled, and
> > can grow to eat all available kernel memory.  There was in fact a
> > bug, early on, that Matt Dillon worked around that caused it under
> > load, and it was in exactly the code you are touching.
> 
> If what you are saying were true, then one could simply crash an _unpatched_
> system by doing a lot of FS write operations.

No.  See the checkin comments for "rushjob".

> What my patch does is that it
> just temporarily suspends the softupdates "wheels" as you call it. However,
> if VM or another ffs subsystem indicates (by increasing the value of rushjob)
> that buffers should get flushed more frequently, then my patch will
> _immediately_ drop out of the delay loop and allow the syncing to proceed
> ASAP. I really do not see what can be wrong with such a concept?

No.  See last posting: the wheel can not be allowed to "wrap".

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 14:49:13 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id EBB3937B404; Fri, 18 Apr 2003 14:49:12 -0700 (PDT)
Received: from mail.tel.fer.hr (zg06-176.dialin.iskon.hr [213.191.148.177])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id E960443FB1; Fri, 18 Apr 2003 14:49:10 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.201.107])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3ILlGxI000941;
	Fri, 18 Apr 2003 23:47:21 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: Terry Lambert <tlambert2@mindspring.com>
Date: Fri, 18 Apr 2003 23:48:58 +0200
User-Agent: KMail/1.5
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182243.05739.zec@tel.fer.hr>
	<3EA06C07.A34F1C31@mindspring.com>
In-Reply-To: <3EA06C07.A34F1C31@mindspring.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200304182348.58356.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Apr 2003 21:49:13 -0000

On Friday 18 April 2003 23:20, Terry Lambert wrote:
> > Sure, that's precisely why I increment rushjob - to instruct the syncer
> > to start synching when I want it to. What's wrong with that?
>
> Touching rushjob is probably not a good idea.
>
> The main technical (not philosophical) problem with the patch
> as it sits is that you can cause the soft updates wheel to wrap
> around.

No, that just cannot happen. You are probably confusing rushjob with
syncer_delayno, which gets reset to 0 each time it reaches the value of
syncer_maxdelay. The rushjob variable simply tells the syncer how many times
it should iterate _sequentially_ through the softupdates queues before
going to sleep on lbolt.
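
To make the distinction concrete, here is a minimal userland sketch. It is
a simplified model, not the kernel's sched_sync(): the struct, the function
names and the wheel size of 32 are made up for illustration, and only the
roles of syncer_delayno, syncer_maxdelay and rushjob are taken from the
description above. The slot index wraps at the wheel size, while rushjob
only adds extra sequential steps to a single wakeup.

```c
#include <assert.h>

#define SYNCER_MAXDELAY 32  /* model wheel size (assumed value) */

struct syncer_model {
	int delayno;                  /* current slot, plays the role of syncer_delayno */
	int rushjob;                  /* extra sequential passes requested */
	int pending[SYNCER_MAXDELAY]; /* number of queued work items per slot */
	int flushed;                  /* total work items written out so far */
};

/* Flush the current slot, then advance the hand, wrapping at the wheel size. */
void
syncer_step(struct syncer_model *s)
{
	assert(s->delayno >= 0 && s->delayno < SYNCER_MAXDELAY);
	s->flushed += s->pending[s->delayno];
	s->pending[s->delayno] = 0;
	s->delayno += 1;
	if (s->delayno == SYNCER_MAXDELAY)
		s->delayno = 0;
}

/*
 * One wakeup of the model syncer: one regular step, plus one additional
 * step for every unit of rushjob.  Returns the number of slots processed.
 */
int
syncer_tick(struct syncer_model *s)
{
	int steps = 0;

	syncer_step(s);
	steps++;
	while (s->rushjob > 0) {
		s->rushjob--;
		syncer_step(s);
		steps++;
	}
	return (steps);
}
```

In this model, a wakeup with rushjob set to 1 processes two consecutive
slots and leaves rushjob at 0; the wrap of the slot index happens
independently of rushjob.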

>
> Then when you write things out, they write out of order.

Uhh.. NO!

> The purpose of the wheel is to allow placing of operations at
> some relative offset in the future to an outstanding operation,
> to ensure ordering.

True. And this has not changed with my patch.

> No matter what else you do, you can not allow the wheel to
> "wrap".  Because the offsets are "future relative", that means
> that you have to flush at some number of wheel entries equal
> to:
>
> 	wrap_boundary - the_largest_potential_future_offset - 1.
>
> Making the wheel bigger is probably acceptable, but then you
> will exacerbate the memory problem that rushjob was invented
> to resolve (please do a "cvs log" and look at the checkin
> comments; I still believe it was "dillon" who made the change).

Where did you get the idea I'm making the wheel bigger? The size of the
softupdates "wheel" is determined by the value of syncer_maxdelay, which I
not only haven't touched at all, but which is also completely unrelated to
the rushjob variable.

If it is of any relevance for this discussion, I want to add that I've been 
running my system with extended delaying all the time for the last two weeks 
(even when on AC power). I have had absolutely no problems nor have lost a 
single bit of data, even during the most stressful tests such as untarring of
huge archives, or making the kernel etc. Not to mention this is also my 
primary and "production" machine, with all my e-mail on it etc.

Marko

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 17:36:05 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id E338837B405; Fri, 18 Apr 2003 17:36:05 -0700 (PDT)
Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com
	[12.233.57.131])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id E2D7743FE9; Fri, 18 Apr 2003 17:36:04 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3J0Zx9E012929;
	Fri, 18 Apr 2003 17:35:59 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3J0Zwnj012928;
	Fri, 18 Apr 2003 17:35:58 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Date: Fri, 18 Apr 2003 17:35:58 -0700
From: David Schultz <das@FreeBSD.org>
To: Fred Clift <fclift@verio.net>
Message-ID: <20030419003558.GA12856@HAL9000.homeunix.com>
Mail-Followup-To: Fred Clift <fclift@verio.net>,
	freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org
References: <20030418124914.GA10979@HAL9000.homeunix.com>
	<20030418101259.M49571-100000@vespa.dmz.orem.verio.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030418101259.M49571-100000@vespa.dmz.orem.verio.net>
cc: freebsd-fs@FreeBSD.org
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 00:36:06 -0000

On Fri, Apr 18, 2003, Fred Clift wrote:
> There have been many objections about various databases not getting
> updates, qmail/sendmail losing mail, vi removing/overwriting a file, etc,
> but apparently these are not the cases for which this patch was designed.
> If a person cared about these possibilities, he wouldn't turn this
> functionality on.
> 
> If on the other hand, a person were stuck at the doctor's office waiting
> room, with low battery, playing nethack, then perhaps this patch is just
> what you want.

If you're in the doctor's office writing a long letter, and
following a crash you find that not only the latest changes but
the *entire* *file* just vanished, you might not be such a happy
camper.  If you leave fsync() alone, your computer will do exactly
what you want it to do.  It will guarantee that *some* version of
the file is on disk, and when you tell your editor to save, it
will guarantee that the *latest* version is on disk.  So if you
want the disk to stay in power save mode, you just don't ask your
editor to write it to disk.

If you're playing nethack, on the other hand, you won't be
fsyncing anyway because nethack doesn't have state that's vitally
important.
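
For reference, a common way for an application to get the "some version is
always on disk" behavior is to write the new contents to a temporary file,
fsync() it, and only then rename() it over the old name. The following is
a minimal sketch under those assumptions. The function name save_file() and
the ".tmp" suffix are hypothetical, and this is not claimed to be what vi
or any particular editor actually does. Making the rename itself durable
may additionally need an fsync() on the containing directory, which is
omitted here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace `path` with `len` bytes from `buf` without ever leaving the
 * old name pointing at a partially written file.
 * Returns 0 on success, -1 on error.
 */
int
save_file(const char *path, const char *buf, size_t len)
{
	char tmp[1024];
	size_t off = 0;
	int fd, n;

	assert(path != NULL && buf != NULL);

	/* Build the temporary name next to the target file. */
	n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return (-1);

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);

	/* write() may be partial, so loop until everything is handed over. */
	while (off < len) {
		ssize_t w = write(fd, buf + off, len - off);
		if (w < 0) {
			close(fd);
			unlink(tmp);
			return (-1);
		}
		off += (size_t)w;
	}

	/* Ask for the new data to reach stable storage before the swap. */
	if (fsync(fd) != 0) {
		close(fd);
		unlink(tmp);
		return (-1);
	}
	if (close(fd) != 0) {
		unlink(tmp);
		return (-1);
	}

	/* Only now replace the old copy with the new one. */
	if (rename(tmp, path) != 0) {
		unlink(tmp);
		return (-1);
	}
	return (0);
}
```

If fsync() returned before the data was on stable storage, the rename()
could take effect while the new contents were still only in memory, which
is the "neither copy survives" case raised earlier in the thread.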

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 18:30:50 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1989937B404; Fri, 18 Apr 2003 18:30:50 -0700 (PDT)
Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net
	[207.217.120.189])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 1B39943FE9; Fri, 18 Apr 2003 18:30:48 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0114.cvx21-bradley.dialup.earthlink.net ([209.179.192.114]
	helo=mindspring.com)
	by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196hBj-0007jF-00; Fri, 18 Apr 2003 18:30:44 -0700
Message-ID: <3EA0A647.BEC5931A@mindspring.com>
Date: Fri, 18 Apr 2003 18:28:39 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182243.05739.zec@tel.fer.hr>
	<3EA06C07.A34F1C31@mindspring.com> <200304182348.58356.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a419d7a1c3f61e46c70ea0413bc5b08d503ca473d225a0f487350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 01:30:50 -0000

Marko Zec wrote:
> > The main technical (not philosophical) problem with the patch
> > as it sits is that you can cause the soft updates wheel to wrap
> > around.
> 
> No, that just cannot happen. You are probably confusing rushjob with
> syncer_delayno, which gets reset to 0 each time it reaches the value of
> syncer_maxdelay. The rushjob variable simply tells the syncer how many times
> it should iterate _sequentially_ through the softupdates queues before
> going to sleep on lbolt.


Obviously I am not explaining myself correctly.  I guess the next
step would be to offer my own patch set for doing what you are
trying to do.  Before I do that, let me try one more time.

I think it is important that syncer_delayno continue to be
incremented once a second, and that the modified sched_sync(),
which with your patch no longer does this, use its own counter.

In other words, I think that you need to implement a two handed
clock algorithm, to keep the buckets from getting too deep with
work items, and in case there is some dependency which is not
being accounted for that has been working because there is an
implicit delay of 1 second or more in vn_syncer_add_to_worklist()
calls (your patch would break this, so without taking this into
account, we would have to retest all of soft updates).


> > Then when you write things out, they write out of order.
> 
> Uhh.. NO!

Uh, yes; potentially they do.  See the implicit dependency
situation described above.  There are other cases, too, but
they are much more complicated.  I wish Kirk would speak up
in more technical detail about the problems you are potentially
introducing; they require a deep understanding of the soft
updates code.


> > The purpose of the wheel is to allow placing of operations at
> > some relative offset in the future to an outstanding operation,
> > to ensure ordering.
> 
> True. And this has not changed with my patch.

No, it has changed.  It has changed in three ways: in the depth
of the queue entries, in the relative spacing of implicit
dependencies, and in the relative depth for two or more dependent
operations with future offsets.

In the depth case, the code, when it runs, is going to stall the
system for a relatively long time, because there are a number of
worklists which are *substantially* deep: vn_syncer_add_to_worklist()
was using a syncer_delayno that is assumed to be updated once a
second, but that never changed during your stall.  This means that
the worklist represented by
syncer_workitem_pending[syncer_delayno] is going to contain
*almost all work* that was enqueued in the interim.

The problem with this is that in the for(;;) loop in sched_sync()
in the "if (LIST_FIRST(slp) == vp)" code block, you are likely
to run yourself into a panic.  See the comment about "sync_fsync()
moves it to a different slot so we are safe"?  That comment is no
longer true.


> > No matter what else you do, you can not allow the wheel to
> > "wrap".  Because the offsets are "future relative", that means
> > that you have to flush at some number of wheel entries equal
> > to:
> >
> >       wrap_boundary - the_largest_potential_future_offset - 1.
> >
> > Making the wheel bigger is probably acceptable, but then you
> > will exacerbate the memory problem that rushjob was invented
> > to resolve (please do a "cvs log" and look at the checkin
> > comments; I still believe it was "dillon" who made the change).
> 
> Where did you get the idea I'm making the wheel bigger? The size of the
> softupdates "wheel" is determined by the value of syncer_maxdelay, which I
> not only haven't touched at all, but which is also completely unrelated to
> the rushjob variable.

I didn't get the idea you were making the wheel bigger.  That's
the problem: you probably need to make the wheel bigger, so that
the [new!] second hand on the two handed clock has more time until
it runs into the first hand on the clock.  You will have to do this
so you can bound the vn_syncer_add_to_worklist() add delay to something
less than "syncer_maxdelay - 2"; I suggest "syncer_maxdelay / 2", as
a first approximation (remember this needs to be a power of 2, due
to syncer_mask).
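
A minimal sketch of the slot calculation described above, assuming a wheel
whose size is a power of two so that wrapping can be done with a mask.
WHEEL_SIZE, WHEEL_MASK and wheel_slot() are illustrative names, not kernel
symbols, and the clamp to half the wheel is the "syncer_maxdelay / 2"
first approximation rather than a tested value.

```c
#include <assert.h>

#define WHEEL_SIZE 32u               /* assumed; must be a power of two */
#define WHEEL_MASK (WHEEL_SIZE - 1u) /* plays the role of syncer_mask */

/*
 * Return the slot for a work item that should run `delay` ticks after the
 * slot the hand currently points at.  The delay is clamped to half the
 * wheel so that an insertion can never land on or behind the hand.
 */
unsigned
wheel_slot(unsigned hand, unsigned delay)
{
	/* The mask trick only works for power-of-two wheel sizes. */
	assert((WHEEL_SIZE & WHEEL_MASK) == 0);
	assert(hand < WHEEL_SIZE);

	if (delay > WHEEL_SIZE / 2)
		delay = WHEEL_SIZE / 2;
	return ((hand + delay) & WHEEL_MASK);
}
```

With a 32-slot wheel, a hand at slot 30 and a delay of 5 gives slot 3, and
any delay above 16 is treated as 16.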

You also want to count workitem insertions and removals, so you have
a total count.  This is easy: it's already protected by sync_mtx,
so all you need is a static global counter.

When the counter gets to a certain size (configurable), you have too
much memory tied up in the work queue -- so you flush it.
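
A minimal sketch of such a counter, assuming a single-threaded userland
model. The struct and function names are hypothetical and the limit is
whatever the configurable threshold would be. In the kernel the updates
would happen under whatever lock already protects the worklists, which is
not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

struct work_counter {
	int count; /* work items currently queued on the whole wheel */
	int limit; /* configurable threshold that triggers a flush */
};

/* True when enough work is queued that the syncer should flush now. */
bool
work_needs_flush(const struct work_counter *c)
{
	return (c->count >= c->limit);
}

/* Account for one insertion; reports whether a flush is now due. */
bool
work_insert(struct work_counter *c)
{
	c->count++;
	return (work_needs_flush(c));
}

/* Account for one removal; the count must never go negative. */
void
work_remove(struct work_counter *c)
{
	assert(c->count > 0);
	c->count--;
}
```

The point of counting across all slots, rather than looking at one slot,
is that the total is what bounds the memory tied up in pending work.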


> If it is of any relevance for this discussion, I want to add that I've been
> running my system with extended delaying all the time for the last two weeks
> (even when on AC power). I have had absolutely no problems nor have lost a
> single bit of data, even during the most stressful tests such as untarring of
> huge archives, or making the kernel etc. Not to mention this is also my
> primary and "production" machine, with all my e-mail on it etc.

Write some code that specifically stresses a specific FS dependency
on a set of files, iteratively, over and over again.  Then close all
the files, and call "sync", and wait.

Or run your test, and then unmount the FS on which the test was
running, before your delayed fsync gets a chance to run, and then
do a shutdown.  When the system comes back up, check the data to
see if it's what it's supposed to be.

Basically, you are going to have to provide something *other than*
"rushjob" so that unmounts and other "special" code can force the
fsync (consider removable media, like flash, if nothing else).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Fri Apr 18 19:21:49 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8E1D137B401
	for <freebsd-fs@FreeBSD.org>; Fri, 18 Apr 2003 19:21:49 -0700 (PDT)
Received: from laptop.tenebras.com (laptop.tenebras.com [66.92.188.18])
	by mx1.FreeBSD.org (Postfix) with SMTP id 944E843FBF
	for <freebsd-fs@FreeBSD.org>; Fri, 18 Apr 2003 19:21:47 -0700 (PDT)
	(envelope-from kudzu@tenebras.com)
Received: (qmail 22690 invoked from network); 19 Apr 2003 02:21:44 -0000
Received: from queequeg.tenebras.com (HELO tenebras.com) (192.168.188.241)
  by 0 with SMTP; 19 Apr 2003 02:21:44 -0000
Message-ID: <3EA0B2B8.4000600@tenebras.com>
Date: Fri, 18 Apr 2003 19:21:44 -0700
From: Michael Sierchio <kudzu@tenebras.com>
User-Agent: Mozilla/5.0 (X11; U; Linux i386; en-US; rv:1.3) Gecko/20030312
X-Accept-Language: en-us, en, zh-cn, zh-tw
MIME-Version: 1.0
To: Terry Lambert <tlambert2@mindspring.com>
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182243.05739.zec@tel.fer.hr>
	<3EA06C07.A34F1C31@mindspring.com> <200304182348.58356.zec@tel.fer.hr>
	<3EA0A647.BEC5931A@mindspring.com>
In-Reply-To: <3EA0A647.BEC5931A@mindspring.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@FreeBSD.org
cc: freebsd-stable@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 02:21:49 -0000

Terry Lambert wrote:

> Obviously I am not explaining myself correctly.  I guess the next
> step would be to offer my own patch set for doing what you are
> trying to do.  Before I do that, let me try one more time.

Forgive me, but let me cut through this Gordian Knot and just say:
the proposal is for the introduction of a feature of questionable
value, with consequences that have not been adequately considered.

It should never be committed.

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 00:03:27 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id CBDD037B401; Sat, 19 Apr 2003 00:03:27 -0700 (PDT)
Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com
	[12.233.57.131])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 15DA743FDF; Sat, 19 Apr 2003 00:03:27 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3J73P9E014134;
	Sat, 19 Apr 2003 00:03:25 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3J73K2o014133;
	Sat, 19 Apr 2003 00:03:20 -0700 (PDT)
	(envelope-from das@FreeBSD.org)
Date: Sat, 19 Apr 2003 00:03:20 -0700
From: David Schultz <das@FreeBSD.org>
To: Marko Zec <zec@tel.fer.hr>
Message-ID: <20030419070320.GA14034@HAL9000.homeunix.com>
Mail-Followup-To: Marko Zec <zec@tel.fer.hr>, freebsd-fs@FreeBSD.org,
	freebsd-stable@FreeBSD.org
References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr>
	<20030418071329.GA9125@HAL9000.homeunix.com>
	<200304182243.05739.zec@tel.fer.hr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200304182243.05739.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 07:03:28 -0000

On Fri, Apr 18, 2003, Marko Zec wrote:
> > If rushjob gets too high (half the maximum sync
> > delay, usually 15), the system resorts to other measures.
> 
> Which measures, and in which cases? The only two chunks of code in the entire 
> -stable kernel that probe the value of rushjob 

Look at -CURRENT.

> > Your code bumps rushjob up by the arbitrary value 32, which is
> > rather large.  Doing so is going to throw things out of whack.
> 
> Which things and how?

My complaint was simply that you're incrementing rushjob by some
number you pulled out of a hat, namely 32.  This causes the syncer
to spin around 32 times every time someone calls sync(), and most
of the time, it won't have anything to do.  Moreover, in -CURRENT,
you can lead the system to believe that resources are scarcer than
they really are.  Look at what request_cleanup() does when
speedup_syncer() fails, for instance.

> > What you would probably want to do is leave rushjob alone.  If it
> > ever becomes nonzero, the syncer should wake up and start writing
> > again.
> 
> Sure, that's precisely why I increment rushjob - to instruct the syncer to 
> start synching when I want it to. What's wrong with that?

You seem to be overthinking this.  On a relatively quiescent
laptop, all you have to do is have the drives spin down and
suspend the operation of the syncer as long as no processes are
blocked on I/O.  If this results in too many dirty buffers, the
system will automatically notice this and kick the syncer.  You
don't need to step in and kick the syncer 32 times or disable
fsync() in order to get reasonable benefits without breaking
things.  This simple approach can easily be refined later if need be.
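
The policy described above can be sketched as a single predicate. This is
only an illustration under stated assumptions, not code from any FreeBSD
tree: the function name, its parameters, and the dirty-buffer limit are all
hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical policy check, for illustration only: decide whether a
 * power-saving syncer should wake up.  None of these names exist in the
 * kernel; dirty_limit stands in for whatever threshold the system
 * already uses to notice that too many dirty buffers have accumulated.
 */
static bool
syncer_should_run(bool disk_spun_down, int procs_blocked_on_io,
    int dirty_buffers, int dirty_limit)
{
	if (!disk_spun_down)
		return (true);	/* disk already spinning: behave normally */
	if (procs_blocked_on_io > 0)
		return (true);	/* someone is waiting on I/O: do not stall them */
	if (dirty_buffers > dirty_limit)
		return (true);	/* too much dirty data: kick the syncer */
	return (false);		/* quiescent: leave the disk asleep */
}
```

Note that nothing in this sketch touches rushjob or fsync(); it only decides
when the syncer may stay idle.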

> > But really, even getting fsync() to do *remotely* the right thing
> > (i.e. correct ordering but no guarantee of writing data to stable
> > storage when in power save mode) is going to be *really*hard*.
> > Warner has a much better suggestion.
> 
> If I'm not mistaken, Warner was talking about using a memory-based FS and
> periodically synching it to a flash based device. Such a concept is perfectly 
> sane for appliances using solid state disks, however I don't see how it can 
> be applied to a typical laptop.

It's the same principle.  For flash, you want to limit the number
of writes since you only get a finite number of them.  For
laptops, you want to limit the number of writes because keeping
your drive spinning drains the battery.  In both cases, you can
solve the problem by using a memory filesystem for things like
cron that write frequently.

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 02:53:30 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 11DE437B401; Sat, 19 Apr 2003 02:53:30 -0700 (PDT)
Received: from mail.tel.fer.hr (zg05-039.dialin.iskon.hr [213.191.138.40])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 41B0043FBF; Sat, 19 Apr 2003 02:53:25 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.201.107])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3J9pMxI000991;
	Sat, 19 Apr 2003 11:51:27 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: Terry Lambert <tlambert2@mindspring.com>
Date: Sat, 19 Apr 2003 11:53:03 +0200
User-Agent: KMail/1.5
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182348.58356.zec@tel.fer.hr>
	<3EA0A647.BEC5931A@mindspring.com>
In-Reply-To: <3EA0A647.BEC5931A@mindspring.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200304191153.03970.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 09:53:30 -0000

On Saturday 19 April 2003 03:28, Terry Lambert wrote:
> I think that it is important that the value of syncer_delayno
> needs to continue to be incremented once a second, and that the
> modified sched_sync(), which with your patch no longer does this,
> needs to use its own counter.

If you look again at the _unpatched_ syncer loop, you will clearly see that 
syncer_delayno is not guaranteed to be incremented only once a second. If 
speedup_syncer() increases the value of rushjob, the syncer_delayno will be 
increased up to rushjob times in a second in the syncer loop, depending on
whether the buffers can be flushed fast enough. The expedited synching will proceed
until rushjob drops down to 0.
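
As a rough illustration of the loop behaviour described above, here is a toy
model of one second of syncer activity. It is not the sched_sync() source:
the function name, the TOY_MAXDELAY constant, and the assumption that
flushing a slot takes no time are all made up for the sketch.

```c
/* Assumed wheel size for this toy model; the real value lives in the kernel. */
#define TOY_MAXDELAY 32

/*
 * Toy model of one second of the syncer loop (not the kernel code).
 * Each pass handles the slot under the hand and advances the hand.
 * While rushjob is nonzero the loop goes around again without
 * sleeping; otherwise it would sleep until the next second.
 * Returns the number of slots visited during this second.
 */
static int
toy_syncer_second(int *delayno, int *rushjob)
{
	int visited = 0;

	for (;;) {
		*delayno = (*delayno + 1) % TOY_MAXDELAY;
		visited++;
		if (*rushjob > 0) {
			(*rushjob)--;
			continue;	/* expedited pass, no sleep */
		}
		break;			/* would sleep until the next tick */
	}
	return (visited);
}
```

In this model a call with rushjob at 0 advances the hand by one slot, while
a call with rushjob at N advances it by N + 1 slots and leaves rushjob at 0.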

My patch neither invented nor changed that model.

> In other words, I think that you need to implement a two handed
> clock algorithm, to keep the buckets from getting too deep with
> work items, and in case there is some dependency which is not
> being accounted for that has been working because there is an
> implicit delay of 1 second or more in vn_syncer_add_to_worklist()
> calls (your patch would break this, so without taking this into
> account, we would have to retest all of soft updates).

Again, please look at the _unmodified_ syncer code. My patch didn't change a 
thing regarding the possibility for the syncer to try flushing more than one 
syncer_workitem_pending queue in a second.

>
> > > Then when you write things out, they write out of order.
> >
> > Uhh.. NO!
>
> Uh, yes; potentially they do.  See the implicit dependency
> situation described above.  There are other cases, too, but
> they are much more complicated.  I wish Kirk would speak up
> in more technical detail about the problems you are potentially
> introducing; they require a deep understanding of the soft
> updates code.

I wish he would, too...

> > > The purpose of the wheel is to allow placing of operations at
> > > some relative offset in the future to an outstanding operation,
> > > to ensure ordering.
> >
> > True. And this has not changed with my patch.
>
> No, it has changed.  It's changed both in the depth of the
> queue entries, and it's changed in the relative spacing of
> implicit dependencies, and it's changed in the relative depth,
> for two or more dependent operations with future offsets.

How? By simply stopping the softupdates clock for a couple of seconds (ok, 
minutes :) more than usual?

> In the depth case, when the code runs, is going to stall the
> system for a really long time, relatively, because there are
> a number of worklists which are *substantially* deep, because
> vn_syncer_add_to_worklist() was using a syncer_delayno that has
> been assumed to be updated once a second, and never changed
> during your stall.  This means that the worklist represented by
> syncer_workitem_pending[syncer_delayno] is going to contain
> *almost all work* that was enqueued in the interim.
>
> The problem with this is that in the for(;;) loop in sched_sync()
> in the "if (LIST_FIRST(slp) == vp)" code block, you are likely
> to run yourself into a panic.  See the comment about "sync_fsync()
> moves it to a different slot so we are safe"?  That comment is no
> longer true.

Is it possible that you are confusing the sync_fsync() routine in
kern/vfs_subr.c (which I didn't touch) with the modified fsync() handler in
kern/vfs_syscalls.c?

[the rest of the debate deleted]

Can we please either slowly conclude this discussion, or provide a feasible
alternative to the proposed patch? I am starting to feel like we are wasting
a tremendous amount of time here while going nowhere.

Marko

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 11:20:06 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id D585737B401; Sat, 19 Apr 2003 11:20:06 -0700 (PDT)
Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net
	[207.217.120.188])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id F2F6843FAF; Sat, 19 Apr 2003 11:20:05 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0234.cvx22-bradley.dialup.earthlink.net ([209.179.198.234]
	helo=mindspring.com)
	by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196wwS-0000ja-00; Sat, 19 Apr 2003 11:20:01 -0700
Message-ID: <3EA19303.1DB825C8@mindspring.com>
Date: Sat, 19 Apr 2003 11:18:43 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182348.58356.zec@tel.fer.hr>
	<3EA0A647.BEC5931A@mindspring.com> <200304191153.03970.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a46cdfd73c98dcc69db972ae3c87b5828d350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 18:20:07 -0000

Marko Zec wrote:
> On Saturday 19 April 2003 03:28, Terry Lambert wrote:
> > I think that it is important that the value of syncer_delayno
> > needs to continue to be incremented once a second, and that the
> > modified sched_sync(), which with your patch no longer does this,
> > needs to use its own counter.
> 
> If you look again at the _unpatched_ syncer loop, you will clearly see that
> syncer_delayno is not guaranteed to be incremented only once a second. If
> speedup_syncer() increases the value of rushjob, the syncer_delayno will be
> increased up to rushjob times in a second in the syncer loop, depending on
> whether the buffers can be flushed fast enough. The expedited synching will proceed
> until rushjob drops down to 0.
> 
> My patch neither invented nor changed that model.

The problem is not the distribution of the entries removed from
the wheel, it is the distribution of entries inserted onto the
wheel.

Running the wheel forward quickly during removal is not a problem.

*Not* running the wheel forward _at all_ during insertion *is* a
problem.

What we care about here is distribution of entries which exist,
not distribution of entries which no longer exist.


> > In other words, I think that you need to implement a two handed
> > clock algorithm, to keep the buckets from getting too deep with
> > work items, and in case there is some dependency which is not
> > being accounted for that has been working because there is an
> > implicit delay of 1 second or more in vn_syncer_add_to_worklist()
> > calls (your patch would break this, so without taking this into
> > account, we would have to retest all of soft updates).
> 
> Again, please look at the _unmodified_ syncer code. My patch didn't change a
> thing regarding the possibility for the syncer to try flushing more than one
> syncer_workitem_pending queue in a second.

Again, I don't care about flushing for this case: I care about
insertion.


> > > > Then when you write things out, they write out of order.
> > >
> > > Uhh.. NO!
> >
> > Uh, yes; potentially they do.  See the implicit dependency
> > situation described above.  There are other cases, too, but
> > they are much more complicated.  I wish Kirk would speak up
> > in more technical detail about the problems you are potentially
> > introducing; they require a deep understanding of the soft
> > updates code.
> 
> I wish he would, too...

Be aware that I was at least associated with the FreeBSD soft
updates code implementation (I did the original "make it compile
and link pass", among other things, when Whistle, the company I
worked for, paid Kirk to do the implementation), and I was also
part of a team which implemented soft updates for FFS in a
different environment in 1995.

I'm not trying to claim authority here, since I'm one of the
sides in this disagreement, but realize I'm not totally clueless
when it comes to soft updates.


> > No, it has changed.  It's changed both in the depth of the
> > queue entries, and it's changed in the relative spacing of
> > implicit dependencies, and it's changed in the relative depth,
> > for two or more dependent operations with future offsets.
> 
> How? By simply stopping the softupdates clock for a couple of seconds (ok,
> minutes :) more than usual?

Say you stop the clock for 30 seconds: syncer_delayno is not
incremented during those 30 seconds.  Now, during that time,
vn_syncer_add_to_worklist() is called once a second to add
workitems.  Say they are the same workitems (delay 0, delay 6).
Now (relative to the original syncer_delayno), the buckets that
are represented by "syncer_workitem_pending[syncer_delayno+delay]"
fill up as follows:

vn_syncer_add_to_worklist() instance
|
|             syncer_workitem_pending[original_syncer_delayno + N]
|  0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
|  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
v
0  1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[your patch:]
30 30          30
[not your patch:]
   1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 0

Your patch causes us to get two single buckets, each 30 deep.

Not having your patch spreads the same 60 items over 36 buckets; 12 are
1 deep, 24 are 2 deep.
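
The two rows of the diagram can be reproduced with a small toy model of the
wheel. This is a sketch, not kernel code: WHEEL_SLOTS, fill_wheel() and
count_depth() are hypothetical names, and the wheel is simply made large
enough that nothing wraps around in this example.

```c
#include <string.h>

#define WHEEL_SLOTS 64	/* hypothetical size, large enough to avoid wraparound */

/*
 * Toy model of insertion onto the syncer wheel (not kernel code).
 * Once per "second", two work items are inserted at delays d1 and d2
 * relative to the wheel hand.  If clock_running is nonzero the hand
 * advances one slot per second; otherwise it stays where it is, as it
 * would while the clock is stopped.
 */
static void
fill_wheel(int depth[WHEEL_SLOTS], int seconds, int d1, int d2,
    int clock_running)
{
	int hand = 0;

	memset(depth, 0, WHEEL_SLOTS * sizeof(depth[0]));
	for (int t = 0; t < seconds; t++) {
		depth[(hand + d1) % WHEEL_SLOTS]++;
		depth[(hand + d2) % WHEEL_SLOTS]++;
		if (clock_running)
			hand = (hand + 1) % WHEEL_SLOTS;
	}
}

/* Count the slots whose depth is exactly want. */
static int
count_depth(const int depth[WHEEL_SLOTS], int want)
{
	int n = 0;

	for (int i = 0; i < WHEEL_SLOTS; i++)
		if (depth[i] == want)
			n++;
	return (n);
}
```

Calling fill_wheel(depth, 30, 0, 6, 0) models the stopped clock and piles
everything into slots 0 and 6; calling it with clock_running set to 1
reproduces the "[not your patch:]" row of the diagram.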

Does this make sense now?  It is about insertion in the face of a
stopped clock, and how bursty the resulting "catchup" will be.

If you look at the code, you will see that there is no opportunity
for other code to run in a single bucket list traversal, but in the
rushjob case of multiple bucket traversals, the system gets control
back in between buckets, so the operation of the system is much,
much smoother in the case that individual buckets are not allowed
to get too deep.  This is normally accomplished by incrementing the
value of syncer_delayno once per second, as a continuous function,
rather than a bursty increment once every 30 seconds.


> Is it possible you are confusing the sync_fsync() routine in kern/vfs_subr.c
> (which I didn't touch) with the modified fsync() handler in
> kern/vfs_syscalls.c.

No.  I am only talking about the vn_syncer_add_to_worklist() and
sched_sync() functions, and how they interact on the syncer_delayno
clock.


> Can we please either slowly conclude this discussion, or provide a feasible
> alternative to the proposed patch? I am starting to feel like we are wasting
> a tremendous amount of time here while going nowhere.

Please read the above, specifically the diagram of bucket list
depths with a working clock vs. a stopped clock, and the fact
that the bucket list traversals are atomic, but multiple bucket
traversals of the same number of equally distributed work items
are not.

I guess I'm willing to provide an alternate patch, if I have to
do so, but I would prefer that you understand the issues yourself,
since that makes one more person clueful about the issues, in case
the few of us who already are get hit by a bus.  8-).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 12:35:29 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 871EE37B401; Sat, 19 Apr 2003 12:35:29 -0700 (PDT)
Received: from mail.tel.fer.hr (zg02-002.dialin.iskon.hr [213.191.130.3])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id DF19E43FB1; Sat, 19 Apr 2003 12:35:09 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.201.107])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3JJX9xI001016;
	Sat, 19 Apr 2003 21:33:14 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: Terry Lambert <tlambert2@mindspring.com>
Date: Sat, 19 Apr 2003 21:34:51 +0200
User-Agent: KMail/1.5
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304191153.03970.zec@tel.fer.hr>
	<3EA19303.1DB825C8@mindspring.com>
In-Reply-To: <3EA19303.1DB825C8@mindspring.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200304192134.51484.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 19:35:30 -0000

On Saturday 19 April 2003 20:18, Terry Lambert wrote:
> Say you stop the clock for 30 seconds: syncer_delayno is not
> incremented during those 30 seconds.  Now, during that time,
> vn_syncer_add_to_worklist() is called once a second to add
> workitems.  Say they are the same workitems (delay 0, delay 6).
> Now (relative to the original syncer_delayno), the buckets that
> are represented by "syncer_workitem_pending[syncer_delayno+delay]"
>
> vn_syncer_add_to_worklist() instance
> |
> |             syncer_workitem_pending[original_syncer_delayno + N]
> |  0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
> |  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
> v
> 0  1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [your patch:]
> 30 30          30
> [not your patch:]
>    1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 0
>
> Your patch causes us to get two single buckets 30 deep.
>
> Not having your patch spreads the same 60 items over 36 buckets; 12 are
> 1 deep, 24 are 2 deep.

The whole purpose of the patch is to delay disk writes when running on battery
power. In that case it is completely irrelevant whether the work items are
distributed more or less evenly over all the delay queues or concentrated
in only two (or, more precisely, three) of them. In either case, all
the queues will be flushed as quickly as possible when the disk is spun
up, so that the disk is active for the shortest possible time.

> Does this make sense now?  It is about insertion in the face of a
> stopped clock, and how bursty the resulting "catchup" will be.

And that is exactly what the user of a battery-powered laptop wants:
infrequent but bursty writes to disk, and an idle disk at all other times. I
have described this behaviour since my very first post. This is a feature,
not a bug. What's wrong with that?

> If you look at the code, you will see that there is no opportunity
> for other code to run in a single bucket list traversal, but in the
> rushjob case of multiple bucket traversals, the system gets control
> back in between buckets, so the operation of the system is much,
> much smoother in the case that individual buckets are not allowed
> to get too deep.  This is normally accomplished by incrementing the
> value of syncer_delayno once per second, as a continuous function,
> rather than a bursty increment once every 30 seconds.

I completely agree with you that smoothness will be sacrificed, but again,
please keep in mind the original purpose of the patch. When running on
battery power, smoothness is a bad thing. When running on AC, the patch will 
become inactive, so 100% normal operation is automatically restored, and you 
get all the smoothness back.

> Please read the above, specifically the diagram of bucket list
> depths with a working clock vs. a stopped clock, and the fact
> that the bucket list traversals are atomic, but multiple bucket
> traversals of the same number of equally distributed work items
> are not.

True. But this still doesn't justify your claims from previous posts that the 
patched system is likely to corrupt data or crash the system. I am still 
pretty much convinced it will do neither of these two things, both by looking 
at the scope of the modifications the patch introduces, and from my 
experience with a production system running all the time on a patched kernel.

Cheers,

Marko

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 13:56:58 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 5933537B401; Sat, 19 Apr 2003 13:56:58 -0700 (PDT)
Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net
	[207.217.120.218])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 77ADA43FBD; Sat, 19 Apr 2003 13:56:57 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0234.cvx22-bradley.dialup.earthlink.net ([209.179.198.234]
	helo=mindspring.com)
	by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 196zOF-0002sx-00; Sat, 19 Apr 2003 13:56:52 -0700
Message-ID: <3EA1B72D.B8B96268@mindspring.com>
Date: Sat, 19 Apr 2003 13:53:01 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304191153.03970.zec@tel.fer.hr>
	<3EA19303.1DB825C8@mindspring.com> <200304192134.51484.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4b20386bb8968a0ef5f2a25a2ddac2613a7ce0e8f8d31aa3f350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 20:56:58 -0000

Marko Zec wrote:
> > If you look at the code, you will see that there is no opportunity
> > for other code to run in a single bucket list traversal, but in the
> > rushjob case of multiple bucket traversals, the system gets control
> > back in between buckets, so the operation of the system is much,
> > much smoother in the case that individual buckets are not allowed
> > to get too deep.  This is normally accomplished by incrementing the
> > value of syncer_delayno once per second, as a continuous function,
> > rather than a bursty increment once every 30 seconds.
> 
> I completely agree with you that smoothness will be sacrificed, but again,
> please do have in mind the original purpose of the patch. When running on
> battery power, smoothness is a bad thing. When running on AC, the patch will
> become inactive, so 100% normal operation is automatically restored, and you
> get all the smoothness back.

You are still missing the point.

If I have 30 entries each on 2 queues, the rest of the system
gets an opportunity to run once between what might be significant
bouts of I/O, which is the slowest thing you can do.

If I have 2 entries each on 30 queues, the rest of the system
gets an opportunity to run 29 times between much less significant
bouts of I/O (1/15th of the latency).

So the difference is between the disk spinning up and the system
freezing for the duration, or the disk spinning up and the
system freezing unnoticeably to the user for 1/10th of a second
per worklist for a larger number of worklists.
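
The comparison between the two cases above can be made concrete with a
small helper. It is a sketch under the assumption stated in this thread,
namely that one bucket is traversed without interruption and that the rest
of the system only gets to run between buckets; struct flush_profile and
profile_flush() are hypothetical names, not kernel interfaces.

```c
/* Summary of how bursty a catch-up flush would be (toy model). */
struct flush_profile {
	int yields;		/* chances for other code to run between buckets */
	int longest_burst;	/* deepest bucket, traversed without a break */
};

/*
 * Walk an array of bucket depths and report how many gaps there are
 * between non-empty buckets and how deep the deepest bucket is.
 */
static struct flush_profile
profile_flush(const int *depth, int nslots)
{
	struct flush_profile p = { 0, 0 };
	int nonempty = 0;

	for (int i = 0; i < nslots; i++) {
		if (depth[i] == 0)
			continue;
		nonempty++;
		if (depth[i] > p.longest_burst)
			p.longest_burst = depth[i];
	}
	p.yields = nonempty > 0 ? nonempty - 1 : 0;
	return (p);
}
```

For two buckets of 30 entries this reports one yield and a longest burst of
30; for thirty buckets of 2 entries it reports 29 yields and a longest burst
of 2, which is where the 1/15th figure comes from.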

Add to this that the batches of I/O are unlikely to be on the
same track, and therefore there's seek latency as well, and you
have a significant freeze that's going to appear like the machine
is locked up.

I guess if you are willing to monitor the mailing lists and explain
why this isn't a bad thing every time users complain about it, it's
no big deal, except to people who want the feature, but don't agree
with your implementation.  8-).


> > Please read the above, specifically the diagram of bucket list
> > depths with a working clock vs. a stopped clock, and the fact
> > that the bucket list traversals are atomic, but multiple bucket
> > traversals of the same number of equally distributed work items
> > are not.
> 
> True. But this still doesn't justify your claims from previous posts
> that the patched system is likely to corrupt data or crash the system.

The previous claim for potential panic was based on the fact
that the same bucket was being used for the next I/O, rather
than the same + 1 bucket, which is what the code assumed.  I
just took it for granted that the failure case was self-evident.

You need to read the comment in the sched_sync() code, and
understand why it is saying what it is saying:

                                /*
                                 * Note: VFS vnodes can remain on the
                                 * worklist too with no dirty blocks, but
                                 * since sync_fsync() moves it to a different
                                 * slot we are safe.
                                 */

Your changes make it so the insertion *does not* put it in a
different slot (because the fsync is most likely delayed).
Therefore we are *not* safe.


The other FS corruption occurs because you don't specifically
disable the delaying code before a shutdown or umount or mount
-u -o ro, etc..


> I am still pretty much convinced it will do neither of these two
> things, both by looking at the scope of the modifications the
> patch introduces,

My analysis (and several other people's) differs from yours.


> and from my experience with a production system
> running all the time on a patched kernel.

This is totally irrelevant; it's anecdotal, and therefore has
nothing whatsoever to do with provable correctness.

"From my experience" is the same argument that Linux used to
justify async mounts in ext2fs, and they were provably wrong.

-

I guess at this point, I have to ask: what's wrong with Ian
Dowse's patches to do approximately the same thing?

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 14:51:10 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 056C137B401; Sat, 19 Apr 2003 14:51:10 -0700 (PDT)
Received: from mail.tel.fer.hr (zg02-229.dialin.iskon.hr [213.191.130.230])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id E7B0B43FBF; Sat, 19 Apr 2003 14:51:04 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.201.107])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3JLn6xI001024;
	Sat, 19 Apr 2003 23:49:11 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: Terry Lambert <tlambert2@mindspring.com>
Date: Sat, 19 Apr 2003 23:50:48 +0200
User-Agent: KMail/1.5
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304192134.51484.zec@tel.fer.hr>
	<3EA1B72D.B8B96268@mindspring.com>
In-Reply-To: <3EA1B72D.B8B96268@mindspring.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
Message-Id: <200304192350.48576.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 21:51:10 -0000

On Saturday 19 April 2003 22:53, Terry Lambert wrote:
> You are still missing the point.
>
> If I have 30 entries each on 2 queues, the rest of the system
> gets an opportunity to run once between what might be significant
> bouts of I/O, which is the slowest thing you can do.
>
> If I have 2 entries each on 30 queues, the rest of the system
> gets an opportunity to run 29 times between much less significant
> bouts of I/O (1/15th of the latency).
>
> So the difference is between the disk spinning up and the system
> freezing for the duration, or the disk spinning up and the
> system freezing unnoticeably to the user for 1/10th of a second
> per worklist for a larger number of worklists.
>
> Add to this that the batches of I/O are unlikely to be on the
> same track, and therefore there's seek latency as well, and you
> have a significant freeze that's going to appear like the machine
> is locked up.

Does the laptop owner care if the system freezes for a couple of milliseconds
more than usual? If you had tried the patch yourself, you would certainly
have observed that the freeze you are talking about is completely unnoticeable.
Even under the highest loads, my system can accumulate at most around 300 dirty
buffers before starting to sync for one reason or another. Modern ATA disks
possess a significant amount of RAM available for write caching, which will
compensate even for such write bursts. Therefore the disk head seek latency
you mentioned won't be noticeable in most cases.

> Your changes make it so the insertion *does not* put it in a
> different slot (because the fsync is most likely delayed).
                              ^^^^^
> Therefore we are *not* safe.

Again, in my understanding the (modified) fsync() handler is completely 
unrelated to the (unmodified) sync_fsync() function.

> The other FS corruption occurs because you don't specifically
> disable the delaying code before a shutdown or umount or mount
> -u -o ro, etc..

Such a problem simply does not exist. Please try out the patch, enable the
delaying, create as many dirty buffers as possible, and unmount the FS. You
will notice that a) all the dirty buffers will be automatically written to
the disk; b) the unmount operation will succeed; c) the system will not crash;
and d) the FS will be perfectly consistent at the next mount.

> My analysis (and several other people's) differs from yours.
>
> > and from my experience with a production system
> > running all the time on a patched kernel.
>
> This is totally irrelevant; it's anecdotal, and therefore has
> nothing whatsoever to do with provable correctness.

No offence, but your argument would be much more convincing if
you could provoke a system crash with the patch enabled, and then provide a
backtrace. If the patch is as bad as you are suggesting, that shouldn't be
that hard to do, should it?

Marko

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 16:42:14 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2E9AD37B404; Sat, 19 Apr 2003 16:42:14 -0700 (PDT)
Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net
	[207.217.120.188])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7C91243FA3; Sat, 19 Apr 2003 16:42:13 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from pool0077.cvx40-bradley.dialup.earthlink.net ([216.244.42.77]
	helo=mindspring.com)
	by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 1971yE-0007HZ-00; Sat, 19 Apr 2003 16:42:11 -0700
Message-ID: <3EA1DE82.68F32B77@mindspring.com>
Date: Sat, 19 Apr 2003 16:40:50 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Marko Zec <zec@tel.fer.hr>
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304192134.51484.zec@tel.fer.hr>
	<3EA1B72D.B8B96268@mindspring.com> <200304192350.48576.zec@tel.fer.hr>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4c348be04d95caf49280f378ec0a8b4c6a2d4e88014a4647c350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Apr 2003 23:42:14 -0000

Marko Zec wrote:
> Does the laptop owner care if the system freezes for a couple of milliseconds
> more than usual?

I am a laptop owner.  I care.

> If you have tried the patch yourself, you would certainly observe
> that the freeze you are talking about is completely unnoticeable.

I run 13 jails for 12 virtual machines on my laptop.  I noticed.


> Therefore the disk head seek latency you mentioned won't be
> noticeable in most cases.

Define "most cases".


> > Your changes make it so the insertion *does not* put it in a
> > different slot (because the fsync is most likely delayed).
>                               ^^^^^
> > Therefore we are *not* safe.
> 
> Again, in my understanding the (modified) fsync() handler is completely
> unrelated to the (unmodified) sync_fsync() function.

You're wrong.  You have to take into account both the vnodes on
the FS, and the devfs vnodes that the FS is mounted on.


> > The other FS corruption occurs because you don't specifically
> > disable the delaying code before a shutdown or umount or mount
> > -u -o ro, etc..
> 
> Such a problem simply does not exist. Please try out the patch, enable the
> delaying, fill in as many dirty buffers as possible, and unmount the FS. You
> will notice that a) all the dirty buffers will be automatically written to
> the disk; b) the unmount operation will succeed; c) the system will not crash
> and d) the FS will be perfectly consistent at the next mount.

This is not true.  I've proved it by corrupting an FS by holding
down the power button on my laptop to force an ATX power-off, with
no recourse.  This is the same type of failure that could occur on
a normal laptop when battery output drops the power out from under
you.

The basic problem is that the forced fsync is no longer forced.


> > This is totally irrelevant; it's anecdotal, and therefore has
> > nothing whatsoever to do with provable correctness.
> 
> No offence please, but your argumentation would look much more
> convincing if you could provoke a system crash with the patch
> enabled, and then provide a backtrace. If the patch is as bad
> as you are suggesting, that shouldn't be that hard to do, should it?

I've done it.  I guess you want me to do it again, citing that
absence of evidence is not evidence of absence?

The problem here is well understood.  Rather than arguing further,
I will offer a modification of your patches.

Note that this modification is still unsafe, due to the lack of
a "force" flag for the fsync in the unmount and mount -u cases;
give me a couple of days, since I test patches before I post
them (normally 2 weeks; I'll make an exception in this case).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 17:27:58 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id CB67E37B401; Sat, 19 Apr 2003 17:27:58 -0700 (PDT)
Received: from mail.tel.fer.hr (zg04-020.dialin.iskon.hr [213.191.137.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 9181743FB1; Sat, 19 Apr 2003 17:27:56 -0700 (PDT)
	(envelope-from zec@tel.fer.hr)
Received: from marko-tp (marko@[192.168.201.107])
	by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3K0Q2xI001037;
	Sun, 20 Apr 2003 02:26:06 +0200 (CEST)
	(envelope-from zec@tel.fer.hr)
From: Marko Zec <zec@tel.fer.hr>
To: Terry Lambert <tlambert2@mindspring.com>
Date: Sun, 20 Apr 2003 02:27:43 +0200
User-Agent: KMail/1.5
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304192350.48576.zec@tel.fer.hr>
	<3EA1DE82.68F32B77@mindspring.com>
In-Reply-To: <3EA1DE82.68F32B77@mindspring.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200304200227.44268.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz <das@FreeBSD.org>
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 20 Apr 2003 00:27:59 -0000

On Sunday 20 April 2003 01:40, Terry Lambert wrote:
> Marko Zec wrote:
> > If you have tried the patch yourself, you would certainly observe
> > that the freeze you are talking about is completely unnoticeable.
>
> I run 13 jails for 12 virtual machines on my laptop.  I noticed.

:)
If you are really serious about running 12 VMs on a laptop, then:

a) you do not want to have this patch enabled in the first place, and
b) what kind of delay exactly did you notice?

> > Therefore the disk head seek latency you mentioned won't be
> > noticeable in most cases.
>
> Define "most cases".

Those where the onboard write-caching RAM on the ATA disk is large enough to 
compensate for disk head seek latency for the whole write burst.

> > Again, in my understanding the (modified) fsync() handler is completely
> > unrelated to the (unmodified) sync_fsync() function.
>
> You're wrong.  You have to take into account both the vnodes on
> the FS, and the devfs vnodes that the FS is mounted on.

Hmm, the original patch was against 4.8-R, and this whole discussion is 
flooding the -stable mailing list, in case you forgot. Where did devfs 
suddenly come from? And even with devfs, what if my patch (optionally) ignores 
fsync()? Does that mean that all the programs that close their files without 
calling fsync() are going to crash the system? Uhhh....

> > > The other FS corruption occurs because you don't specifically
> > > disable the delaying code before a shutdown or umount or mount
> > > -u -o ro, etc..
> >
> > Such a problem simply does not exist. Please try out the patch, enable
> > the delaying, fill in as many dirty buffers as possible, and unmount the
> > FS. You will notice that a) all the dirty buffers will be automatically
> > written to the disk; b) the unmount operation will succeed; c) the system
> > will not crash and d) the FS will be perfectly consistent at the next
> > mount.
>
> This is not true.  I've proved it by corrupting an FS by holding
> down the power button on my laptop to force an ATX power-off, with
> no recourse.

??? You have proved what by pulling out the plug? That umount or shutdown do 
not work (pls. read your previous claim 10 lines above)? I cannot believe I 
am reading this...

> > No offence please, but your argumentation would look much more
> > convincing if you could provoke a system crash with the patch
> > enabled, and then provide a backtrace. If the patch is as bad
> > as you are suggesting, that shouldn't be that hard to do, should it?
>
> I've done it.  I guess you want me to do it again, citing that
> absence of evidence is not evidence of absence?

I'd simply prefer to receive a backtrace, rather than just tons of noise.
An improved patch couldn't hurt either :)

Marko

From owner-freebsd-fs@FreeBSD.ORG  Sat Apr 19 23:30:52 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id F36AE37B401; Sat, 19 Apr 2003 23:30:51 -0700 (PDT)
Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11])
	by mx1.FreeBSD.org (Postfix) with SMTP
	id 1040643FDF; Sat, 19 Apr 2003 23:30:50 -0700 (PDT)
	(envelope-from iedowse@maths.tcd.ie)
Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP
          id <aa34354@salmon>; 20 Apr 2003 07:30:49 +0100 (BST)
To: Terry Lambert <tlambert2@mindspring.com>
In-Reply-To: Your message of "Fri, 18 Apr 2003 11:12:01 PDT."
             <3EA03FF1.280B6810@mindspring.com> 
Date: Sun, 20 Apr 2003 07:30:44 +0100
From: Ian Dowse <iedowse@maths.tcd.ie>
Message-ID: <200304200730.aa34354@salmon.maths.tcd.ie>
cc: freebsd-fs@FreeBSD.ORG
cc: David Schultz <das@FreeBSD.ORG>
cc: freebsd-stable@FreeBSD.ORG
cc: Kirk McKusick <mckusick@beastie.mckusick.com>
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 20 Apr 2003 06:30:52 -0000

In message <3EA03FF1.280B6810@mindspring.com>, Terry Lambert writes:
>David Schultz wrote:
>> As for the ATA delayed write feature, I don't believe it will
>> guarantee consistency.
>
>It doesn't.  I checked, after voicing my suspicions of it.

Yes, write ordering and hence FS consistency is not guaranteed; my
original point was just that the situation regarding FS consistency
with ATA delayed writes is not significantly worse than that with
the default behaviour of having ATA write caching enabled. In fact,
if the OS is modified to perform writes in batches then the two
cases are almost identical: in one case the disk collects a batch
of writes, possibly reorders them, and writes them out in one burst;
in the other case the OS sends a burst of writes, the disk possibly
reorders them and writes them out. For reference I've included below
what IBM say about the delayed write feature in their disk
documentation.

BTW, to answer a point Marko mentioned, I don't consider the delayed
write behaviour to be nearly as bad as a null fsync(), because you
are very unlikely to completely lose a file that has been modified,
saved and then fsync()'d. If the write/rename/fsync all happen while
the disk is spun down then the old version of the file is still
intact on the media if the power fails. With a null fsync(), there
can be a considerable window where the disk contains just a zero-length
file.
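
To make the dependence on fsync() concrete, here is a minimal C sketch
of a commonly used save sequence: write a temporary file, fsync() it,
then rename() it over the original. This is an illustration only, not
the code of vi or of any particular editor; the function name
safe_replace() and the ".tmp" suffix are assumptions made for the
example. If fsync() returns without doing anything, the rename can
reach the disk before the data does, which is the zero-length-file
window described above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace `path` with `len` bytes from `data` so
 * that, provided fsync() really reaches stable storage, a crash leaves
 * either the old or the new contents under `path`.
 * Returns 0 on success, -1 on failure.
 */
static int
safe_replace(const char *path, const void *data, size_t len)
{
	char tmp[1024];
	const char *p = data;
	size_t off = 0;
	int fd, n;

	assert(path != NULL && data != NULL);

	/* Build the temporary name next to the target (same filesystem). */
	n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return (-1);

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);

	/* write() may be partial, so loop until everything is handed over. */
	while (off < len) {
		ssize_t w = write(fd, p + off, len - off);
		if (w == -1) {
			close(fd);
			unlink(tmp);
			return (-1);
		}
		off += (size_t)w;
	}

	/* The step a null fsync() would silently turn into a no-op. */
	if (fsync(fd) == -1) {
		close(fd);
		unlink(tmp);
		return (-1);
	}
	if (close(fd) == -1) {
		unlink(tmp);
		return (-1);
	}

	/* Only now switch the name over to the new contents. */
	if (rename(tmp, path) == -1) {
		unlink(tmp);
		return (-1);
	}
	return (0);
}
```

A caller would use it as safe_replace("notes.txt", buf, buflen). The
sketch does not fsync() the containing directory, so it says nothing
about when the rename itself becomes durable.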

I completely accept that there is more flexibility at the OS side
to control which writes get delayed and by how much, and that an
OS-side implementation would be extremely useful. However I think
it would require further work to develop a good implementation. For
example, the current proposed patch effectively assumes that there
is only one disk in the system since `stratcalls' is a global
variable (e.g., I believe that reading from an ATA flash device
would trigger a flush to any real ATA disks in the system). It would
also be useful if the solution was not specific to ATA devices and
had per-device control over the behaviour.
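
The single-counter concern can be illustrated with a toy model. The
sketch below is not taken from the patch: only the name `stratcalls'
comes from it, and the structure, the functions and the "activity since
the last check means flush" rule are assumptions made for the example.
It contrasts one global counter with one counter per device.

```c
#include <assert.h>
#include <stdbool.h>

#define NDEV 2	/* device 0: real ATA disk, device 1: ATA flash (assumed) */

/* Hypothetical per-device bookkeeping. */
struct dev_activity {
	unsigned long stratcalls;	/* I/O requests seen so far */
	unsigned long last_seen;	/* counter value at the previous check */
};

static unsigned long global_stratcalls;
static unsigned long global_last_seen;
static struct dev_activity devs[NDEV];

/* Record one I/O request issued to device `d`. */
static void
note_io(int d)
{
	assert(d >= 0 && d < NDEV);
	global_stratcalls++;
	devs[d].stratcalls++;
}

/* Global policy: I/O on any device since the last check triggers a flush. */
static bool
global_should_flush(void)
{
	bool active = (global_stratcalls != global_last_seen);

	global_last_seen = global_stratcalls;
	return (active);
}

/* Per-device policy: only I/O on device `d` itself triggers its flush. */
static bool
dev_should_flush(int d)
{
	bool active;

	assert(d >= 0 && d < NDEV);
	active = (devs[d].stratcalls != devs[d].last_seen);
	devs[d].last_seen = devs[d].stratcalls;
	return (active);
}
```

After a single note_io(1), that is a read from the flash device,
global_should_flush() reports activity, while dev_should_flush(0)
leaves the real disk alone.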

I guess my point of view is more that doing this right at the OS
side is hard, and ATA delayed write is an unobtrusive neat feature
that does mostly the right thing at the cost of only a marginal
increase in the risk of data loss for typical uses.

Ian

	11.13 Delayed Write function (vendor specific)

	Delayed Write function is a power saving enhancement whereby
	the device delays the actual data writing into the media.
	When the device is in the power saving mode and the Write
	command (Write Sectors, Write Multiple, or Write DMA) comes
	from the host, the transferred data is not written into the
	media immediately, only stored into the cache buffer. When
	the cache buffer becomes full or reaches the predefined
	size, or if any command except the Write command is issued,
	the operation to write the data from the cache buffer into
	the media is begun.

	Power consumption can be reduced by Delayed Write. When
	Write commands come with a long interval, the device must
	exit from the power saving mode and enter into the power
	saving mode again without Delayed Write function. If Delayed
	Write is enabled, such power saving mode transition times
	can be reduced. As a result, the additional energy for power
	saving mode transition can be saved, then the average power
	consumption of the device can be reduced.

	However, the time elapsed from the completion of the Write
	command to the media write completion will be extended with
	Delayed Write function. If the power for the device is
	turned off during this time, the data which has not been
	written to the media is lost.  Therefore, a command listed
	in the Write Cache Function section shall be issued before
	the power off to confirm whole cached data has been written
	into the media.

	For safety, Delayed Write function is disabled at Power On
	Default.
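
The behaviour described in the excerpt can be modelled in a few lines
of C. This is a toy model under stated assumptions, not drive firmware:
the drive is taken to be in power saving mode already, sizes are counted
in sectors, and the type and function names (struct drive,
drive_command(), drive_power_loss()) as well as the threshold value are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command classes; only WRITE is treated specially. */
enum cmd { CMD_WRITE, CMD_READ, CMD_FLUSH_CACHE };

struct drive {
	size_t cached;		/* sectors held only in the cache buffer */
	size_t on_media;	/* sectors actually written to the media */
	size_t threshold;	/* "predefined size" that starts a flush */
};

/* Move everything in the cache buffer to the media. */
static void
drive_flush(struct drive *d)
{
	d->on_media += d->cached;
	d->cached = 0;
}

/*
 * Apply one command. Writes are only cached until the threshold is
 * reached; any command other than a write starts the flush.
 */
static void
drive_command(struct drive *d, enum cmd c, size_t nsectors)
{
	assert(d != NULL);
	if (c == CMD_WRITE) {
		d->cached += nsectors;
		if (d->cached >= d->threshold)
			drive_flush(d);
	} else {
		drive_flush(d);
	}
}

/* Power is cut: whatever is still cached is lost. Returns sectors lost. */
static size_t
drive_power_loss(struct drive *d)
{
	size_t lost = d->cached;

	d->cached = 0;
	return (lost);
}
```

In this model a burst of writes followed by any non-write command is
safe, while a burst followed directly by drive_power_loss() loses
everything written since the last flush. That is the window the
excerpt warns about.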