From: Terry Lambert <tlambert2@mindspring.com>
Date: Sat, 19 Apr 2003 13:53:01 -0700
To: Marko Zec
Cc: freebsd-fs@FreeBSD.org, David Schultz, freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

Marko Zec wrote:
> > If you look at the code, you will see that there is no opportunity
> > for other code to run in a single bucket list traversal, but in the
> > rushjob case of multiple bucket traversals, the system gets control
> > back in between buckets, so the operation of the system is much,
> > much smoother in the case that individual buckets are not allowed
> > to get too deep.  This is normally accomplished by incrementing the
> > value of syncer_delayno once per second, as a continuous function,
> > rather than a bursty increment once every 30 seconds.
>
> I completely agree with you that smoothness will be sacrificed, but
> again, please do have in mind the original purpose of the patch.  When
> running on battery power, smoothness is a bad thing.  When running on
> AC, the patch will become inactive, so 100% normal operation is
> automatically restored, and you get all the smoothness back.

You are still missing the point.

If I have 30 entries each on 2 queues, the rest of the system gets an
opportunity to run once between what might be significant bouts of I/O,
which is the slowest thing you can do.  If I have 2 entries each on 30
queues, the rest of the system gets an opportunity to run 29 times
between much less significant bouts of I/O (1/15th of the latency).

So the difference is between the disk spinning up and the system
freezing for the duration, or the disk spinning up and the system
freezing unnoticeably to the user for 1/10th of a second per worklist,
for a larger number of worklists.

Add to this that the batches of I/O are unlikely to be on the same
track, and therefore there's seek latency as well, and you have a
significant freeze that's going to appear as though the machine is
locked up.
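To put numbers on this, here is a minimal userland sketch of the same
arithmetic (the 60-item workload and the show() helper are invented for
illustration; this is not the syncer code itself):

#include <stdio.h>

/*
 * Toy model of the tradeoff: the same amount of dirty work spread over
 * a few deep buckets vs. many shallow ones.  The syncer drains one
 * bucket at a time, and other code only gets to run between buckets.
 */
static void
show(int items, int buckets)
{
	printf("%2d buckets of %2d items each: %2d breaks for other code\n",
	    buckets, items / buckets, buckets - 1);
}

int
main(void)
{
	show(60, 2);	/* stalled wheel: deep buckets, long freezes */
	show(60, 30);	/* ticking wheel: shallow buckets, short bursts */
	return (0);
}

With 60 items, 2 buckets give one break between runs of 30 back-to-back
items; 30 buckets give 29 breaks between runs of 2 items each, which is
the 1/15th latency figure above.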
I guess if you are willing to monitor the mailing lists and explain why
this isn't a bad thing every time users complain about it, it's no big
deal, except to people who want the feature, but don't agree with your
implementation.  8-).

> > Please read the above, specifically the diagram of bucket list
> > depths with a working clock vs. a stopped clock, and the fact
> > that the bucket list traversals are atomic, but multiple bucket
> > traversals of the same number of equally distributed work items
> > are not.
>
> True.  But this still doesn't justify your claims from previous posts
> that the patched system is likely to corrupt data or crash the system.

The previous claim of a potential panic was based on the fact that the
same bucket was being used for the next I/O, rather than the same + 1
bucket, which is what the code assumed.  I just took it for granted
that the failure case was self-evident.

You need to read the comment in the sched_sync() code, and understand
why it is saying what it is saying:

	/*
	 * Note: VFS vnodes can remain on the
	 * worklist too with no dirty blocks, but
	 * since sync_fsync() moves it to a different
	 * slot we are safe.
	 */

Your change makes it so the insertion *does not* put it in a different
slot (because the fsync is most likely delayed).  Therefore we are
*not* safe (see the sketch of the slot arithmetic at the end of this
message).

The other FS corruption occurs because you don't specifically disable
the delaying code before a shutdown or umount or mount -u -o ro, etc.

> I am still pretty much convinced it will do neither of these two
> things, both by looking at the scope of the modifications the
> patch introduces,

My analysis (and several other people's) differs from yours.

> and from my experience with a production system
> running all the time on a patched kernel.

This is totally irrelevant; it's anecdotal, and therefore has nothing
whatsoever to do with provable correctness.  "From my experience" is
the same argument that Linux used to justify async mounts in ext2fs,
and they were provably wrong.

I guess at this point, I have to ask: what's wrong with Ian Dowse's
patches to do approximately the same thing?

-- Terry
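[Appended for reference: a minimal sketch of the slot arithmetic the
sched_sync() comment above depends on.  The WHEEL_SIZE/WHEEL_MASK names
and the slot_for() helper are invented for the example; only
syncer_delayno corresponds to the variable discussed earlier, and none
of this is a quote of the kernel source.]

#include <stdio.h>

#define WHEEL_SIZE	32		/* syncer buckets; a power of two */
#define WHEEL_MASK	(WHEEL_SIZE - 1)

static int syncer_delayno;		/* index of the bucket being drained */

/* Slot an item lands on when (re)queued "delay" ticks ahead. */
static int
slot_for(int delay)
{
	return ((syncer_delayno + delay) & WHEEL_MASK);
}

int
main(void)
{
	int i;

	/*
	 * Ticking wheel: syncer_delayno advances once per second, so a
	 * re-queue with the same delay resolves to a fresh slot each time.
	 */
	for (i = 0; i < 3; i++) {
		printf("ticking: draining %2d, re-queued on %2d\n",
		    syncer_delayno, slot_for(2));
		syncer_delayno = (syncer_delayno + 1) & WHEEL_MASK;
	}

	/*
	 * Frozen wheel: syncer_delayno stays put, so the same item keeps
	 * resolving to the same slot instead of marching around the
	 * wheel -- the progress the comment above is implicitly relying on.
	 */
	for (i = 0; i < 3; i++)
		printf("frozen:  draining %2d, re-queued on %2d\n",
		    syncer_delayno, slot_for(2));
	return (0);
}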