From: Terry Lambert <tlambert2@mindspring.com>
Date: Sat, 19 Apr 2003 13:53:01 -0700
To: Marko Zec
Cc: freebsd-fs@FreeBSD.org, David Schultz, freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

Marko Zec wrote:
> > If you look at the code, you will see that there is no opportunity
> > for other code to run in a single bucket list traversal, but in the
> > rushjob case of multiple bucket traversals, the system gets control
> > back in between buckets, so the operation of the system is much,
> > much smoother in the case that individual buckets are not allowed
> > to get too deep.  This is normally accomplished by incrementing the
> > value of syncer_delayno once per second, as a continuous function,
> > rather than a bursty increment once every 30 seconds.
>
> I completely agree with you that smoothness will be sacrificed, but
> again, please do have in mind the original purpose of the patch.  When
> running on battery power, smoothness is a bad thing.  When running on
> AC, the patch will become inactive, so 100% normal operation is
> automatically restored, and you get all the smoothness back.

You are still missing the point.

If I have 30 entries each on 2 queues, the rest of the system gets an
opportunity to run once between what might be significant bouts of I/O,
which is the slowest thing you can do.  If I have 2 entries each on 30
queues, the rest of the system gets an opportunity to run 29 times
between much less significant bouts of I/O (1/15th of the latency).

So the difference is between the disk spinning up and the system
freezing for the duration, or the disk spinning up and the system
freezing unnoticeably to the user for 1/10th of a second per worklist,
for a larger number of worklists.

Add to this that the batches of I/O are unlikely to be on the same
track, and therefore there's seek latency as well, and you have a
significant freeze that's going to appear as though the machine is
locked up.
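To put numbers on this, here is a minimal userland sketch of the same
arithmetic (the 60-item workload and the show() helper are invented for
illustration; this is not the syncer code itself):

#include <stdio.h>

/*
 * Toy model of the tradeoff: the same amount of dirty work spread over
 * a few deep buckets vs. many shallow ones.  The syncer drains one
 * bucket at a time, and other code only gets to run between buckets.
 */
static void
show(int items, int buckets)
{
	printf("%2d buckets of %2d items each: %2d breaks for other code\n",
	    buckets, items / buckets, buckets - 1);
}

int
main(void)
{
	show(60, 2);	/* stalled wheel: deep buckets, long freezes */
	show(60, 30);	/* ticking wheel: shallow buckets, short bursts */
	return (0);
}

With 60 items, 2 buckets give one break between runs of 30 back-to-back
items; 30 buckets give 29 breaks between runs of 2 items each, which is
the 1/15th latency figure above.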
I guess if you are willing to monitor the mailing lists and explain why
this isn't a bad thing every time users complain about it, it's no big
deal, except to people who want the feature, but don't agree with your
implementation.  8-).

> > Please read the above, specifically the diagram of bucket list
> > depths with a working clock vs. a stopped clock, and the fact
> > that the bucket list traversals are atomic, but multiple bucket
> > traversals of the same number of equally distributed work items
> > are not.
>
> True.  But this still doesn't justify your claims from previous posts
> that the patched system is likely to corrupt data or crash the system.

The previous claim of a potential panic was based on the fact that the
same bucket was being used for the next I/O, rather than the same + 1
bucket, which is what the code assumed.  I just took it for granted
that the failure case was self-evident.

You need to read the comment in the sched_sync() code, and understand
why it is saying what it is saying:

	/*
	 * Note: VFS vnodes can remain on the
	 * worklist too with no dirty blocks, but
	 * since sync_fsync() moves it to a different
	 * slot we are safe.
	 */

Your change makes it so the insertion *does not* put it in a different
slot (because the fsync is most likely delayed).  Therefore we are
*not* safe (see the sketch of the slot arithmetic at the end of this
message).

The other FS corruption occurs because you don't specifically disable
the delaying code before a shutdown or umount or mount -u -o ro, etc.

> I am still pretty much convinced it will do neither of these two
> things, both by looking at the scope of the modifications the
> patch introduces,

My analysis (and several other people's) differs from yours.

> and from my experience with a production system
> running all the time on a patched kernel.

This is totally irrelevant; it's anecdotal, and therefore has nothing
whatsoever to do with provable correctness.  "From my experience" is
the same argument that Linux used to justify async mounts in ext2fs,
and they were provably wrong.

I guess at this point, I have to ask: what's wrong with Ian Dowse's
patches to do approximately the same thing?

-- Terry
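[Appended for reference: a minimal sketch of the slot arithmetic the
sched_sync() comment above depends on.  The WHEEL_SIZE/WHEEL_MASK names
and the slot_for() helper are invented for the example; only
syncer_delayno corresponds to the variable discussed earlier, and none
of this is a quote of the kernel source.]

#include <stdio.h>

#define WHEEL_SIZE	32		/* syncer buckets; a power of two */
#define WHEEL_MASK	(WHEEL_SIZE - 1)

static int syncer_delayno;		/* index of the bucket being drained */

/* Slot an item lands on when (re)queued "delay" ticks ahead. */
static int
slot_for(int delay)
{
	return ((syncer_delayno + delay) & WHEEL_MASK);
}

int
main(void)
{
	int i;

	/*
	 * Ticking wheel: syncer_delayno advances once per second, so a
	 * re-queue with the same delay resolves to a fresh slot each time.
	 */
	for (i = 0; i < 3; i++) {
		printf("ticking: draining %2d, re-queued on %2d\n",
		    syncer_delayno, slot_for(2));
		syncer_delayno = (syncer_delayno + 1) & WHEEL_MASK;
	}

	/*
	 * Frozen wheel: syncer_delayno stays put, so the same item keeps
	 * resolving to the same slot instead of marching around the
	 * wheel -- the progress the comment above is implicitly relying on.
	 */
	for (i = 0; i < 3; i++)
		printf("frozen:  draining %2d, re-queued on %2d\n",
		    syncer_delayno, slot_for(2));
	return (0);
}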