From owner-freebsd-fs@FreeBSD.ORG Sun Apr 13 10:55:10 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4FD9837B405; Sun, 13 Apr 2003 10:55:09 -0700 (PDT) Received: from chez.McKusick.COM (chez.mckusick.com [209.31.233.177]) by mx1.FreeBSD.org (Postfix) with ESMTP id CBF0643F93; Sun, 13 Apr 2003 10:55:07 -0700 (PDT) (envelope-from mckusick@mckusick.com) Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.12.8/8.12.3) with ESMTP id h3D04Vb5006635; Sat, 12 Apr 2003 17:04:32 -0700 (PDT) (envelope-from mckusick@beastie.mckusick.com) Message-Id: <200304130004.h3D04Vb5006635@beastie.mckusick.com> To: Marko Zec In-Reply-To: Your message of "Sat, 12 Apr 2003 03:41:17 +0200." <3E976EBD.C3E66EF8@tel.fer.hr> Date: Sat, 12 Apr 2003 17:04:31 -0700 From: Kirk McKusick cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2003 17:55:10 -0000 I am of the opinion that fsync should work. Applications like `vi' use fsync to ensure that the write of the new file is on stable store before removing the old copy. If that semantic is broken, it would be possible to have neither the old nor the new copy of your file after a crash. I do not consider that acceptable behavior. Further, the fsync call is used to ensure that link/unlink/rename have been completed. So more than just fsync is being affected by your change. Lastly, I often write out a file when I am about to suspend my laptop (for low battery or other reasons) and I really want that file on the disk now. I do not want to have to wait for it to decide at some future time to spin up the disk. 
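The save sequence described above can be sketched roughly as follows. This is a minimal illustration, not vi's actual code: the function name safe_save and the ".tmp" suffix are assumptions made for the example, and rename() stands in for "removing the old copy". The point is only the ordering: fsync() must really reach stable storage before the old file is replaced.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Replace `path` with `data` so that a crash leaves either the old
 * or the new contents on disk, never neither.  Returns 0 on success,
 * -1 on error.  The temporary name "<path>.tmp" is an assumption of
 * this sketch.
 */
int
safe_save(const char *path, const char *data, size_t len)
{
    char tmp[1024];
    int fd, n;

    n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return (-1);

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return (-1);

    /* Write the new copy and force it to stable storage first. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return (-1);
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return (-1);
    }

    /* Only now is it safe to replace the old copy. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return (-1);
    }
    return (0);
}
```

If fsync() returns without writing anything, the rename can reach the disk before the data does, which is exactly the "neither copy" outcome. A stricter version would also fsync the containing directory.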
I suggest that you make the disabling of fsync a separate option from the rest of your change so that people can decide for themselves whether they want partial savings with working semantics, or greater savings with broken semantics. I am also intrigued by the changes proposed by Ian Dowse that may better accomplish the same goals with less breakage. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Mon Apr 14 03:19:39 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B81F937B404; Mon, 14 Apr 2003 03:19:39 -0700 (PDT) Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com [12.233.57.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9E3C543F75; Mon, 14 Apr 2003 03:19:38 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3EAJaN7018721; Mon, 14 Apr 2003 03:19:36 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3EAJZZI018720; Mon, 14 Apr 2003 03:19:35 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Date: Mon, 14 Apr 2003 03:19:35 -0700 From: David Schultz To: Marko Zec Message-ID: <20030414101935.GB18110@HAL9000.homeunix.com> Mail-Followup-To: Marko Zec , freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, mckusick@McKusick.COM References: <3E976EBD.C3E66EF8@tel.fer.hr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3E976EBD.C3E66EF8@tel.fer.hr> cc: freebsd-fs@FreeBSD.ORG cc: mckusick@McKusick.COM cc: freebsd-stable@FreeBSD.ORG Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2003 10:19:40 -0000 On 
Sat, Apr 12, 2003, Marko Zec wrote:
> Here's a patch against 4.8-RELEASE kernel that allows disk writes on
> softupdates-enabled filesystems to be delayed for (theoretically)
> arbitrarily long periods of time. The motivation for such updating
> policy is surprisingly not purely suicidal - it can allow disks on
> laptops to spin down immediately after I/O operations and stay idle for
> longer periods of time, thus saving considerable amount of battery
> power.

Very nice! I have been thinking about doing something like this for a long time, but I never managed to find the time. Some comments:

- As others have mentioned, the fsync-disabling feature is questionable and ought to be separate. You can make it somewhat more useful by at least guaranteeing transactional consistency, i.e. by treating every fsync() call as a write barrier. You would need to ensure this for both data and metadata, which I expect would be devilishly hard to do within the softupdates framework. However, you might be able to accomplish it at the disk buffer level. For instance, you could have fsync() push the appropriate dirty buffers out to a separate cache, then commit the contents of the cache in the order of the fsyncs when the disk is next active.
- The fiddling with rushjob seems rather arbitrary. You can probably just let the existing code increment it as necessary and force a sync if the value gets too high.
- Patches against -CURRENT would be nice. (Sorry, that will be a doozy.)
- It looks like you have a few separate changes in there, such as + TUNABLE_INT_FETCH("kern.maxvnodes", &desiredvnodes); and - long starttime; + time_t starttime; From owner-freebsd-fs@FreeBSD.ORG Mon Apr 14 16:47:31 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B01E437B401 for ; Mon, 14 Apr 2003 16:47:31 -0700 (PDT) Received: from mail-out2.apple.com (mail-out2.apple.com [17.254.0.51]) by mx1.FreeBSD.org (Postfix) with ESMTP id 171A843F85 for ; Mon, 14 Apr 2003 16:47:31 -0700 (PDT) (envelope-from mday@apple.com) Received: from mailgate1.apple.com (A17-128-100-225.apple.com [17.128.100.225]) by mail-out2.apple.com (8.12.9/8.12.9) with ESMTP id h3ENlVQd008164 for ; Mon, 14 Apr 2003 16:47:31 -0700 (PDT) Received: from scv1.apple.com (scv1.apple.com) by mailgate1.apple.com ; Mon, 14 Apr 2003 16:47:17 -0700 Received: from apple.com (daylight.apple.com [17.202.44.244]) by scv1.apple.com (8.12.9/8.12.9) with ESMTP id h3ENlIVX016100; Mon, 14 Apr 2003 16:47:18 -0700 (PDT) Date: Mon, 14 Apr 2003 16:46:59 -0700 Content-Type: text/plain; charset=US-ASCII; format=flowed Mime-Version: 1.0 (Apple Message framework v552) To: mistral@imasy.or.jp (Yoshihiko Sarumaru) From: Mark Day In-Reply-To: <030413020639.M0101472@mistral.imasy.or.jp> Message-Id: <627913C9-6ED3-11D7-A790-00039354009A@apple.com> Content-Transfer-Encoding: 7bit X-Mailer: Apple Mail (2.552) cc: fs@freebsd.org Subject: Re: time stamp on msdosfs could not be set by general user X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2003 23:47:32 -0000 On Saturday, April 12, 2003, at 10:06 AM, Yoshihiko Sarumaru wrote: > mistral% cp -p somefile /dos/ > cp: utimes: /dos/somefile: Operation not permitted > cp: chmod: /dos/somefile: Operation not 
permitted > > I can understand errors about chmod, but I can not understand errors > about utimes and modified time could not be set at all. This is a consequence of the user and group IDs not being settable per-file on DOS volumes. In effect, the user and group IDs are being changed behind your back. > Below patch ignores unmatching of user and file owner Which means that the user who did the "cp" is not the same as the user associated with the volume (the one who owns everything on that volume -- which is settable via a mount option). But since you were able to create the file in the first place, the user doing the cp must have had write access (as part of the group, or world). Workarounds would be to do the cp as root, or mount the volume as owned by the same user as the one doing the cp. > Any objection ? Hard to say. It violates the documented behavior of utimes -- that only the owner should be able to modify the times. But if the volume properly stored user and group IDs, you would have been the owner of the file, and the utimes would have worked in this case. Your change would allow utimes to work even for a file you didn't just create, as long as you had write access. That's potentially a security problem, but msdosfs doesn't really have security to begin with. For comparison, Darwin and Mac OS X generally avoid the problem by the way they manage the user ID. By default, everything on a msdosfs volume is owned by a special user ID that gets mapped dynamically to whoever is logged in at the console. For the one-user-at-a-time case, this works well. But a different user would get the same behavior you see. 
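For reference, the step that fails in "cp -p" boils down to something like the sketch below. It is an illustration only, not cp's actual source: the function name copy_times is made up for the example. Because the timestamps are passed explicitly, the caller has to own the destination file (or be root), and on msdosfs the owner is whatever the mount says it is.

```c
#include <sys/stat.h>
#include <sys/time.h>

/*
 * Copy access and modification times from `src` to `dst`, roughly as
 * "cp -p" does.  Returns 0 on success, -1 with errno set otherwise.
 */
int
copy_times(const char *src, const char *dst)
{
    struct stat sb;
    struct timeval tv[2];

    if (stat(src, &sb) != 0)
        return (-1);

    tv[0].tv_sec = sb.st_atime;     /* access time */
    tv[0].tv_usec = 0;
    tv[1].tv_sec = sb.st_mtime;     /* modification time */
    tv[1].tv_usec = 0;

    /*
     * Explicit times: the caller must own dst or be the superuser,
     * otherwise this fails with EPERM as in the report above.
     */
    return (utimes(dst, tv));
}
```

On a file the caller owns this succeeds; on a volume where every file belongs to the mount's configured user, it fails for everyone else even though they were allowed to create the file.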
-Mark From owner-freebsd-fs@FreeBSD.ORG Tue Apr 15 07:12:52 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2737337B401 for ; Tue, 15 Apr 2003 07:12:52 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id E643643F3F for ; Tue, 15 Apr 2003 07:12:50 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id AAA30745; Wed, 16 Apr 2003 00:12:37 +1000 Date: Wed, 16 Apr 2003 00:12:37 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Mark Day In-Reply-To: <627913C9-6ED3-11D7-A790-00039354009A@apple.com> Message-ID: <20030415233658.E1376@gamplex.bde.org> References: <627913C9-6ED3-11D7-A790-00039354009A@apple.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: Yoshihiko Sarumaru cc: fs@freebsd.org Subject: Re: time stamp on msdosfs could not be set by general user X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Apr 2003 14:12:52 -0000 On Mon, 14 Apr 2003, Mark Day wrote: > On Saturday, April 12, 2003, at 10:06 AM, Yoshihiko Sarumaru wrote: > > > mistral% cp -p somefile /dos/ > > cp: utimes: /dos/somefile: Operation not permitted > > cp: chmod: /dos/somefile: Operation not permitted > > > > I can understand errors about chmod, but I can not understand errors > > about utimes and modified time could not be set at all. > > This is a consequence of the user and group IDs not being settable > per-file on DOS volumes. In effect, the user and group IDs are being > changed behind your back. Not really behind one's back. 
They are set to constants determined at mount time, and whoever had mount permission usually has permission to decide them.

> > Below patch ignores unmatching of user and file owner
>
> Which means that the user who did the "cp" is not the same as the user
> associated with the volume (the one who owns everything on that volume
> -- which is settable via a mount option).
>
> But since you were able to create the file in the first place, the user
> doing the cp must have had write access (as part of the group, or
> world).

It is also settable via chown on the mount point (before mounting). I use root:msdosfs and am in group msdosfs so that I can access them like I want except for this problem.

> Workarounds would be to do the cp as root, or mount the volume as owned
> by the same user as the one doing the cp.

Neither is what I like. I normally do "cp -p somefile /dospartition/somewhere", then say "@&*@^" and switch to another terminal running a root shell and repeat the copy, except when copying a lot of files I use root too much.

> > Any objection ?
>
> Hard to say. It violates the documented behavior of utimes -- that
> only the owner should be able to modify the times. But if the volume
> properly stored user and group IDs, you would have been the owner of
> the file, and the utimes would have worked in this case.
>
> Your change would allow utimes to work even for a file you didn't just
> create, as long as you had write access. That's potentially a security
> problem, but msdosfs doesn't really have security to begin with.

I don't like ignoring the ownerships completely. Perhaps relaxing the ownership check to a group membership check would be acceptable. msdosfs honors the ownerships for everything now, so it is no more insecure than the configured ownerships permit. Note that utimes with a null arg works now, since that only requires write permission. So we can change the timestamps to "now" by using utimes().
This is OK (not just for msdosfs) since it is nothing more than we could do using write()+truncate() and read().

> For comparison, Darwin and Mac OS X generally avoid the problem by the
> way they manage the user ID. By default, everything on a msdosfs
> volume is owned by a special user ID that gets mapped dynamically to
> whoever is logged in at the console. For the one-user-at-a-time case,
> this works well. But a different user would get the same behavior you
> see.

FreeBSD only has non-dynamic mapping via /etc/fbtab. This isn't quite enough even for a one-user system since everything normally gets mounted before anyone can log in.

Bruce

From owner-freebsd-fs@FreeBSD.ORG Tue Apr 15 11:25:28 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8B4A237B401; Tue, 15 Apr 2003 11:25:28 -0700 (PDT) Received: from mail.tel.fer.hr (zg03-108.dialin.iskon.hr [213.191.135.109]) by mx1.FreeBSD.org (Postfix) with ESMTP id B466343FA3; Tue, 15 Apr 2003 11:25:24 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from tel.fer.hr (marko-tp.katoda.net [192.168.201.109]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3FINQxK000657; Tue, 15 Apr 2003 20:23:36 +0200 (CEST) (envelope-from zec@tel.fer.hr) Message-ID: <3E9C4E85.F1F578B6@tel.fer.hr> Date: Tue, 15 Apr 2003 20:25:09 +0200 From: Marko Zec X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Content-Type: multipart/mixed; boundary="------------CB13BF8AD3C84FDA09AF0A11" Subject: UPDATE: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Apr 2003 18:25:29 -0000

This is a multi-part message in MIME format.
--------------CB13BF8AD3C84FDA09AF0A11 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit

Attached are updated patches (against both 4.8 and 5.0) for delaying disk buffer synching on softupdates-enabled FS. The original patch started a rather lengthy debate over whether fsync() processing should also be delayed while disk updates are being delayed. As Kirk McKusick already summarized, some people will prefer partial battery power savings with working fsync() semantics, while others will desire greater savings with broken semantics. Therefore, as suggested, the updated patch introduces an additional sysctl tunable, vfs.ena_lazy_fsync, which controls whether fsync() calls will be ignored or not.

Note that when vfs.sync_extdelay is set to 0, vfs.ena_lazy_fsync has no effect, i.e. fsync() always works with standard semantics.

Cheers,

Marko

--------------CB13BF8AD3C84FDA09AF0A11 Content-Type: text/plain; charset=us-ascii; name="syncdelay-4.8.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="syncdelay-4.8.diff" --- /usr/src/sys.org/dev/ata/ata-disk.c Thu Jan 30 08:19:59 2003 +++ dev/ata/ata-disk.c Sat Apr 12 00:31:26 2003 @@ -294,6 +294,7 @@ adstrategy(struct buf *bp) struct ad_softc *adp = bp->b_dev->si_drv1; int s; + stratcalls++; if (adp->device->flags & ATA_D_DETACHING) { bp->b_error = ENXIO; bp->b_flags |= B_ERROR; --- /usr/src/sys.org/kern/vfs_subr.c Sun Oct 13 18:19:12 2002 +++ kern/vfs_subr.c Mon Apr 14 23:27:52 2003 @@ -116,6 +116,13 @@ SYSCTL_INT(_vfs, OID_AUTO, reassignbufme static int nameileafonly = 0; SYSCTL_INT(_vfs, OID_AUTO, nameileafonly, CTLFLAG_RW, &nameileafonly, 0, ""); +int stratcalls = 0; +int sync_extdelay = 0; +SYSCTL_INT(_vfs, OID_AUTO, sync_extdelay, CTLFLAG_RW, &sync_extdelay, 0, ""); + +int ena_lazy_fsync = 0; +SYSCTL_INT(_vfs, OID_AUTO, ena_lazy_fsync, CTLFLAG_RW, &ena_lazy_fsync, 0, ""); + #ifdef ENABLE_VFS_IOOPT int vfs_ioopt = 0; SYSCTL_INT(_vfs, OID_AUTO, ioopt, CTLFLAG_RW, &vfs_ioopt, 0, 
""); @@ -137,7 +144,7 @@ static vm_zone_t vnode_zone; * The workitem queue. */ #define SYNCER_MAXDELAY 32 -static int syncer_maxdelay = SYNCER_MAXDELAY; /* maximum delay time */ +int syncer_maxdelay = SYNCER_MAXDELAY; /* maximum delay time */ time_t syncdelay = 30; /* max time to delay syncing data */ time_t filedelay = 30; /* time to delay syncing files */ SYSCTL_INT(_kern, OID_AUTO, filedelay, CTLFLAG_RW, &filedelay, 0, ""); @@ -145,7 +152,7 @@ time_t dirdelay = 29; /* time to delay SYSCTL_INT(_kern, OID_AUTO, dirdelay, CTLFLAG_RW, &dirdelay, 0, ""); time_t metadelay = 28; /* time to delay syncing metadata */ SYSCTL_INT(_kern, OID_AUTO, metadelay, CTLFLAG_RW, &metadelay, 0, ""); -static int rushjob; /* number of slots to run ASAP */ +int rushjob; /* number of slots to run ASAP */ static int stat_rush_requests; /* number of times I/O speeded up */ SYSCTL_INT(_debug, OID_AUTO, rush_requests, CTLFLAG_RW, &stat_rush_requests, 0, ""); @@ -1119,7 +1127,7 @@ sched_sync(void) { struct synclist *slp; struct vnode *vp; - long starttime; + time_t starttime; int s; struct proc *p = updateproc; @@ -1127,8 +1135,6 @@ sched_sync(void) SHUTDOWN_PRI_LAST); for (;;) { - kproc_suspend_loop(p); - starttime = time_second; /* @@ -1198,8 +1204,25 @@ sched_sync(void) * matter as we are just trying to generally pace the * filesystem activity. 
*/ - if (time_second == starttime) + if (time_second != starttime) + continue; + + if (sync_extdelay >= syncer_maxdelay) + while (syncer_delayno == 0 && rushjob == 0 && + abs(time_second - starttime) < sync_extdelay) { + stratcalls = 0; tsleep(&lbolt, PPAUSE, "syncer", 0); + kproc_suspend_loop(p); + if (stratcalls != 0 && syncer_maxdelay < + abs(time_second - starttime)) { + rushjob = syncer_maxdelay; + break; + } + } + else { + tsleep(&lbolt, PPAUSE, "syncer", 0); + kproc_suspend_loop(p); + } } } --- /usr/src/sys.org/kern/vfs_syscalls.c Thu Jan 2 18:26:18 2003 +++ kern/vfs_syscalls.c Tue Apr 15 13:42:01 2003 @@ -563,6 +563,9 @@ sync(p, uap) register struct mount *mp, *nmp; int asyncflag; + /* Notify sched_sync() to try flushing syncer_workitem_pending[*] */ + rushjob += syncer_maxdelay; + simple_lock(&mountlist_slock); for (mp = TAILQ_FIRST(&mountlist); mp != NULL; mp = nmp) { if (vfs_busy(mp, LK_NOWAIT, &mountlist_slock, p)) { @@ -2627,6 +2630,10 @@ fsync(p, uap) struct file *fp; vm_object_t obj; int error; + + /* Just return if we are artificially delaying disk syncs */ + if (sync_extdelay && ena_lazy_fsync) + return (0); if ((error = getvnode(p->p_fd, SCARG(uap, fd), &fp)) != 0) return (error); --- /usr/src/sys.org/ufs/ffs/ffs_alloc.c Fri Sep 21 21:15:21 2001 +++ ufs/ffs/ffs_alloc.c Sat Apr 12 00:06:20 2003 @@ -125,6 +125,10 @@ ffs_alloc(ip, lbn, bpref, size, cred, bn #endif /* DIAGNOSTIC */ if (size == fs->fs_bsize && fs->fs_cstotal.cs_nbfree == 0) goto nospace; + /* Speedup flushing of syncer_wokitem_pending[*] if low on freespace */ + if (rushjob == 0 && + freespace(fs, fs->fs_minfree + 2) - numfrags(fs, size) < 0) + rushjob = syncer_maxdelay; if (cred->cr_uid != 0 && freespace(fs, fs->fs_minfree) - numfrags(fs, size) < 0) goto nospace; @@ -195,6 +199,10 @@ ffs_realloccg(ip, lbprev, bpref, osize, if (cred == NOCRED) panic("ffs_realloccg: missing credential"); #endif /* DIAGNOSTIC */ + /* Speedup flushing of syncer_wokitem_pending[*] if low on freespace */ + 
if (rushjob == 0 && + freespace(fs, fs->fs_minfree + 2) - numfrags(fs, nsize - osize) < 0) + rushjob = syncer_maxdelay; if (cred->cr_uid != 0 && freespace(fs, fs->fs_minfree) - numfrags(fs, nsize - osize) < 0) goto nospace; --- /usr/src/sys.org/sys/buf.h Sat Jan 25 20:02:23 2003 +++ sys/buf.h Sat Apr 12 00:30:48 2003 @@ -478,6 +478,7 @@ extern char *buffers; /* The buffer con extern int bufpages; /* Number of memory pages in the buffer pool. */ extern struct buf *swbuf; /* Swap I/O buffer headers. */ extern int nswbuf; /* Number of swap I/O buffer headers. */ +extern int stratcalls; /* I/O ops since last buffer sync */ extern TAILQ_HEAD(swqueue, buf) bswlist; extern TAILQ_HEAD(bqueues, buf) bufqueues[BUFFER_QUEUES]; --- /usr/src/sys.org/sys/vnode.h Sun Dec 29 19:19:53 2002 +++ sys/vnode.h Mon Apr 14 23:28:36 2003 @@ -294,6 +294,10 @@ extern struct vm_zone *namei_zone; extern int prtactive; /* nonzero to call vprint() */ extern struct vattr va_null; /* predefined null vattr structure */ extern int vfs_ioopt; +extern int rushjob; +extern int syncer_maxdelay; +extern int sync_extdelay; +extern int ena_lazy_fsync; /* * Macro/function to check for client cache inconsistency w.r.t. leasing. --------------CB13BF8AD3C84FDA09AF0A11 Content-Type: text/plain; charset=us-ascii; name="syncdelay-5.0.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="syncdelay-5.0.diff" --- /usr/src/sys.org/dev/ata/ata-disk.c Sat Nov 16 09:07:36 2002 +++ dev/ata/ata-disk.c Tue Apr 15 15:23:37 2003 @@ -289,6 +289,7 @@ adstrategy(struct bio *bp) struct ad_softc *adp = bp->bio_dev->si_drv1; int s; + stratcalls++; if (adp->device->flags & ATA_D_DETACHING) { biofinish(bp, NULL, ENXIO); return; --- /usr/src/sys.org/kern/vfs_subr.c Sat Nov 16 09:08:02 2002 +++ kern/vfs_subr.c Tue Apr 15 15:34:19 2003 @@ -73,6 +73,8 @@ #include #include +#define abs(x) (((x) < 0) ? 
-(x) : (x)) + static MALLOC_DEFINE(M_NETADDR, "Export Host", "Export host address structure"); static void addalias(struct vnode *vp, dev_t nvp_rdev); @@ -130,6 +132,13 @@ SYSCTL_INT(_vfs, OID_AUTO, reassignbufca static int nameileafonly; SYSCTL_INT(_vfs, OID_AUTO, nameileafonly, CTLFLAG_RW, &nameileafonly, 0, ""); +int stratcalls = 0; +int sync_extdelay = 0; +SYSCTL_INT(_vfs, OID_AUTO, sync_extdelay, CTLFLAG_RW, &sync_extdelay, 0, ""); + +int ena_lazy_fsync = 0; +SYSCTL_INT(_vfs, OID_AUTO, ena_lazy_fsync, CTLFLAG_RW, &ena_lazy_fsync, 0, ""); + #ifdef ENABLE_VFS_IOOPT /* See NOTES for a description of this setting. */ int vfs_ioopt; @@ -208,7 +217,7 @@ static struct synclist *syncer_workitem_ static struct mtx sync_mtx; #define SYNCER_MAXDELAY 32 -static int syncer_maxdelay = SYNCER_MAXDELAY; /* maximum delay time */ +int syncer_maxdelay = SYNCER_MAXDELAY; /* maximum delay time */ static int syncdelay = 30; /* max time to delay syncing data */ static int filedelay = 30; /* time to delay syncing files */ SYSCTL_INT(_kern, OID_AUTO, filedelay, CTLFLAG_RW, &filedelay, 0, ""); @@ -216,7 +225,7 @@ static int dirdelay = 29; /* time to de SYSCTL_INT(_kern, OID_AUTO, dirdelay, CTLFLAG_RW, &dirdelay, 0, ""); static int metadelay = 28; /* time to delay syncing metadata */ SYSCTL_INT(_kern, OID_AUTO, metadelay, CTLFLAG_RW, &metadelay, 0, ""); -static int rushjob; /* number of slots to run ASAP */ +int rushjob; /* number of slots to run ASAP */ static int stat_rush_requests; /* number of times I/O speeded up */ SYSCTL_INT(_debug, OID_AUTO, rush_requests, CTLFLAG_RW, &stat_rush_requests, 0, ""); @@ -1669,7 +1678,7 @@ sched_sync(void) struct synclist *slp; struct vnode *vp; struct mount *mp; - long starttime; + time_t starttime; int s; struct thread *td = FIRST_THREAD_IN_PROC(updateproc); /* XXXKSE */ @@ -1679,8 +1688,6 @@ sched_sync(void) SHUTDOWN_PRI_LAST); for (;;) { - kthread_suspend_check(td->td_proc); - starttime = time_second; /* @@ -1765,8 +1772,25 @@ sched_sync(void) * 
matter as we are just trying to generally pace the * filesystem activity. */ - if (time_second == starttime) + if (time_second != starttime) + continue; + + if (sync_extdelay >= syncer_maxdelay) + while (syncer_delayno == 0 && rushjob == 0 && + abs(time_second - starttime) < sync_extdelay) { + stratcalls = 0; tsleep(&lbolt, PPAUSE, "syncer", 0); + kthread_suspend_check(td->td_proc); + if (stratcalls != 0 && syncer_maxdelay < + abs(time_second - starttime)) { + rushjob = syncer_maxdelay; + break; + } + } + else { + tsleep(&lbolt, PPAUSE, "syncer", 0); + kthread_suspend_check(td->td_proc); + } } } --- /usr/src/sys.org/kern/vfs_syscalls.c Sat Nov 16 09:08:02 2002 +++ kern/vfs_syscalls.c Tue Apr 15 17:38:55 2003 @@ -123,6 +123,9 @@ sync(td, uap) struct mount *mp, *nmp; int asyncflag; + /* Notify sched_sync to try flushing dirty buffers */ + rushjob += syncer_maxdelay; + mtx_lock(&mountlist_mtx); for (mp = TAILQ_FIRST(&mountlist); mp != NULL; mp = nmp) { if (vfs_busy(mp, LK_NOWAIT, &mountlist_mtx, td)) { @@ -2704,6 +2707,10 @@ fsync(td, uap) struct file *fp; vm_object_t obj; int error; + + /* Just return if we are artificially delaying disk synchs */ + if (sync_extdelay && ena_lazy_fsync) + return (0); GIANT_REQUIRED; --- /usr/src/sys.org/sys/bio.h Sat Nov 16 09:08:19 2002 +++ sys/bio.h Tue Apr 15 15:24:20 2003 @@ -134,6 +134,8 @@ bioq_first(struct bio_queue_head *head) return (TAILQ_FIRST(&head->queue)); } +extern int stratcalls; + void biodone(struct bio *bp); void biofinish(struct bio *bp, struct devstat *stat, int error); int biowait(struct bio *bp, const char *wchan); --- /usr/src/sys.org/sys/vnode.h Sat Nov 16 09:08:21 2002 +++ sys/vnode.h Tue Apr 15 15:23:38 2003 @@ -361,6 +361,10 @@ extern struct uma_zone *namei_zone; extern int prtactive; /* nonzero to call vprint() */ extern struct vattr va_null; /* predefined null vattr structure */ extern int vfs_ioopt; +extern int rushjob; +extern int syncer_maxdelay; +extern int sync_extdelay; +extern int ena_lazy_fsync; 
/* * Macro/function to check for client cache inconsistency w.r.t. leasing. --- /usr/src/sys.org/ufs/ffs/ffs_alloc.c Sat Nov 16 09:08:21 2002 +++ ufs/ffs/ffs_alloc.c Tue Apr 15 15:26:37 2003 @@ -139,6 +139,10 @@ ffs_alloc(ip, lbn, bpref, size, cred, bn #endif /* DIAGNOSTIC */ reclaimed = 0; retry: + /* Speedup flushing of dirty buffers in sched_sync */ + if (rushjob == 0 && + freespace(fs, fs->fs_minfree + 2) - numfrags(fs, size) < 0) + rushjob = syncer_maxdelay; if (size == fs->fs_bsize && fs->fs_cstotal.cs_nbfree == 0) goto nospace; if (suser_cred(cred, PRISON_ROOT) && @@ -222,6 +226,10 @@ ffs_realloccg(ip, lbprev, bprev, bpref, #endif /* DIAGNOSTIC */ reclaimed = 0; retry: + /* Speedup flushing of dirty buffers in sched_sync */ + if (rushjob == 0 && + freespace(fs, fs->fs_minfree + 2) - numfrags(fs, nsize - osize) < 0) + rushjob = syncer_maxdelay; if (suser_cred(cred, PRISON_ROOT) && freespace(fs, fs->fs_minfree) - numfrags(fs, nsize - osize) < 0) goto nospace; --------------CB13BF8AD3C84FDA09AF0A11-- From owner-freebsd-fs@FreeBSD.ORG Tue Apr 15 11:38:06 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9778537B401; Tue, 15 Apr 2003 11:38:06 -0700 (PDT) Received: from mail.tel.fer.hr (zg03-155.dialin.iskon.hr [213.191.135.156]) by mx1.FreeBSD.org (Postfix) with ESMTP id E9E3543F85; Tue, 15 Apr 2003 11:38:04 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from tel.fer.hr (marko-tp.katoda.net [192.168.201.109]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3FIa3xK000661; Tue, 15 Apr 2003 20:36:08 +0200 (CEST) (envelope-from zec@tel.fer.hr) Message-ID: <3E9C517B.6039679A@tel.fer.hr> Date: Tue, 15 Apr 2003 20:37:47 +0200 From: Marko Zec X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: Kirk McKusick References: <200304130004.h3D04Vb5006635@beastie.mckusick.com> Content-Type: text/plain; charset=us-ascii 
Content-Transfer-Encoding: 7bit cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Apr 2003 18:38:07 -0000

Kirk McKusick wrote:

> I am of the opinion that fsync should work. Applications like
> `vi' use fsync to ensure that the write of the new file is on
> stable store before removing the old copy. If that semantic
> is broken, it would be possible to have neither the old nor
> the new copy of your file after a crash. I do not consider
> that acceptable behavior. Further, the fsync call is used
> to ensure that link/unlink/rename have been completed. So
> more than just fsync is being affected by your change. Lastly,
> I often write out a file when I am about to suspend my laptop
> (for low battery or other reasons) and I really want that file
> on the disk now. I do not want to have to wait for it to decide
> at some future time to spin up the disk.
>
> I suggest that you make the disabling of fsync a separate
> option from the rest of your change so that people can
> decide for themselves whether they want partial savings
> with working semantics, or greater savings with broken
> semantics. I am also intrigued by the changes proposed by
> Ian Dowse that may better accomplish the same goals with
> less breakage.

Prompted by the amount of opposition to the concept of (optionally) ignoring fsync() calls when running on battery power, I wonder what effect the unconditional delaying of _all_ disk updates by ATA-disk firmware will have on FS consistency in case of a system crash or power failure? I do not want to imply that such a concept is a priori bad; however, I fail to see its advantages over OS-controlled delaying of disk synching.
Marko From owner-freebsd-fs@FreeBSD.ORG Tue Apr 15 12:12:01 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0291337B401; Tue, 15 Apr 2003 12:12:01 -0700 (PDT) Received: from mail.tel.fer.hr (zg07-145.dialin.iskon.hr [213.191.150.146]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4130743FA3; Tue, 15 Apr 2003 12:11:59 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from tel.fer.hr (marko-tp.katoda.net [192.168.201.109]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3FJA6xK000670; Tue, 15 Apr 2003 21:10:11 +0200 (CEST) (envelope-from zec@tel.fer.hr) Message-ID: <3E9C5975.43755858@tel.fer.hr> Date: Tue, 15 Apr 2003 21:11:50 +0200 From: Marko Zec X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: David Schultz References: <3E976EBD.C3E66EF8@tel.fer.hr> <20030414101935.GB18110@HAL9000.homeunix.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: freebsd-fs@FreeBSD.ORG cc: mckusick@McKusick.COM cc: freebsd-stable@FreeBSD.ORG Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Apr 2003 19:12:01 -0000 David Schultz wrote: > For instance, you could > have fsync() push the appropriate dirty buffers out to a separate > cache, then commit the contents of the cache in the order of the > fsyncs when the disk is next active. Huh... such a concept would still break fsync() semantics. Note that the original patch also ensures dirty buffers get flushed if / when the disk spins up, even before the delay timer gets expired. > - The fiddling with rushjob seems rather arbitrary. 
You can probably
> just let the existing code increment it as necessary and force a sync
> if the value gets too high.

If rushjob were not used to force prompt synching, the original code could not guarantee that the sync would occur immediately. Instead, the synching could be delayed for up to a further 30 seconds, which is not desirable if our major design goal is to do as much disk I/O as possible in a small time interval and leave the disk idle otherwise.

Marko

From owner-freebsd-fs@FreeBSD.ORG Tue Apr 15 14:54:53 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 276C437B401; Tue, 15 Apr 2003 14:54:53 -0700 (PDT) Received: from testmail.wolves.k12.mo.us (testmail.wolves.k12.mo.us [207.160.214.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4059043FA3; Tue, 15 Apr 2003 14:54:52 -0700 (PDT) (envelope-from cdillon@wolves.k12.mo.us) Received: by testmail.wolves.k12.mo.us (Postfix, from userid 1001) id 1D957CD61; Tue, 15 Apr 2003 16:54:51 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by testmail.wolves.k12.mo.us (Postfix) with ESMTP id 1A2C2CD19; Tue, 15 Apr 2003 16:54:51 -0500 (CDT) Date: Tue, 15 Apr 2003 16:54:51 -0500 (CDT) From: Chris Dillon To: Marko Zec In-Reply-To: <3E9C5975.43755858@tel.fer.hr> Message-ID: <20030415160925.U86854@duey.wolves.k12.mo.us> References: <3E976EBD.C3E66EF8@tel.fer.hr> <20030414101935.GB18110@HAL9000.homeunix.com> <3E9C5975.43755858@tel.fer.hr> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-fs@freebsd.org cc: mckusick@McKusick.COM cc: David Schultz cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Apr 2003 21:54:53 -0000

On Tue, 15 Apr 2003,
Marko Zec wrote: > Huh... such a concept would still break fsync() semantics. Note that > the original patch also ensures dirty buffers get flushed if / when > the disk spins up, even before the delay timer gets expired. Sorry to butt in on this thread... :-) It just occurred to me that the ability to delay all writes given an arbitrary time period would be good for more than just laptops. It would be great for non-volatile flash filesystems which have a limited write life. The only thing you would have to change for that case is make the "flush on read" optional, since the purpose would be to minimize writes, not minimize disk spin-ups which don't exist on flash parts. This would only be advantageous if delaying the writes will actually cause fewer writes to be made to the flash part than would have been made without the delay, i.e. via normal soft-updates optimizations (a file created and removed within the delay period never gets written, or delaying atime updates of oft-read files), which I'm guessing would be the case most of the time. For example, on a small flash-based firewall I currently use at home, I would use a delay time of 60 minutes or more. That would correspond to how I currently handle saving the important dynamic information kept on a memory filesystem, such as DHCP leases, which is every 60 minutes mount a small filesystem read-write on the flash part, tar up the dynamic data, and then umount the filesystem. I then have to un-tar that data onto the memory filesystem during boot. Being able to keep all of that information directly on a read-write filesystem on the flash part but delay writes for a relatively long period of time would alleviate all of that. If the "clean" bit is set on the FS during that long delay that would be even slicker (does it do that already?), since if the filesystem is consistent thanks to softupdates it shouldn't need to be fsck'd at all on boot. 
-- Chris Dillon - cdillon(at)wolves.k12.mo.us FreeBSD: The fastest and most stable server OS on the planet - Available for IA32 (Intel x86) and Alpha architectures - IA64, PowerPC, UltraSPARC, ARM, and S/390 under development - http://www.freebsd.org No trees were harmed in the composition of this message, although some electrons were mildly inconvenienced. From owner-freebsd-fs@FreeBSD.ORG Tue Apr 15 16:27:46 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3B57637B401; Tue, 15 Apr 2003 16:27:46 -0700 (PDT) Received: from mail.tel.fer.hr (zg06-140.dialin.iskon.hr [213.191.148.141]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0B29643F75; Tue, 15 Apr 2003 16:27:44 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from tel.fer.hr ([192.168.202.105]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3FNPpxK000691; Wed, 16 Apr 2003 01:25:55 +0200 (CEST) (envelope-from zec@tel.fer.hr) Message-ID: <3E9C9566.8603E312@tel.fer.hr> Date: Wed, 16 Apr 2003 01:27:34 +0200 From: Marko Zec X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: Chris Dillon References: <3E976EBD.C3E66EF8@tel.fer.hr> <20030414101935.GB18110@HAL9000.homeunix.com> <20030415160925.U86854@duey.wolves.k12.mo.us> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Apr 2003 23:27:46 -0000 Chris Dillon wrote: > On Tue, 15 Apr 2003, Marko Zec wrote: > > > Huh... such a concept would still break fsync() semantics. 
Note that > > the original patch also ensures dirty buffers get flushed if / when > > the disk spins up, even before the delay timer gets expired. > > Sorry to butt in on this thread... :-) It just occurred to me that > the ability to delay all writes given an arbitrary time period would > be good for more than just laptops. It would be great for > non-volatile flash filesystems which have a limited write life. The > only thing you would have to change for that case is make the "flush > on read" optional, since the purpose would be to minimize writes, not > minimize disk spin-ups which don't exist on flash parts. This would > only be advantageous if delaying the writes will actually cause fewer > writes to be made to the flash part than would have been made without > the delay, i.e. via normal soft-updates optimizations (a file created > and removed within the delay period never gets written, or delaying > atime updates of oft-read files), which I'm guessing would be the case > most of the time. To achieve such a functionality, simply remove or comment out the stratcalls++ line in /sys/dev/ata/ata-disk.c. A cleaner method would of course be adding another tunable knob, which would also be a trivial thing to... Cheers, Marko > For example, on a small flash-based firewall I currently use at home, > I would use a delay time of 60 minutes or more. That would correspond > to how I currently handle saving the important dynamic information > kept on a memory filesystem, such as DHCP leases, which is every 60 > minutes mount a small filesystem read-write on the flash part, tar up > the dynamic data, and then umount the filesystem. I then have to > un-tar that data onto the memory filesystem during boot. Being able > to keep all of that information directly on a read-write filesystem on > the flash part but delay writes for a relatively long period of time > would alleviate all of that. 
> > If the "clean" bit is set on the FS during that long delay that would > be even slicker (does it do that already?), since if the filesystem is > consistent thanks to softupdates it shouldn't need to be fsck'd at all > on boot. > > -- > Chris Dillon - cdillon(at)wolves.k12.mo.us > FreeBSD: The fastest and most stable server OS on the planet > - Available for IA32 (Intel x86) and Alpha architectures > - IA64, PowerPC, UltraSPARC, ARM, and S/390 under development > - http://www.freebsd.org > > No trees were harmed in the composition of this message, although some > electrons were mildly inconvenienced. From owner-freebsd-fs@FreeBSD.ORG Tue Apr 15 20:30:30 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 255A137B407 for ; Tue, 15 Apr 2003 20:30:30 -0700 (PDT) Received: from tango.chessclub.com (tango.chessclub.com [204.178.125.70]) by mx1.FreeBSD.org (Postfix) with SMTP id ED5CD43FB1 for ; Tue, 15 Apr 2003 20:30:27 -0700 (PDT) (envelope-from sleator@tango.chessclub.com) Received: (qmail 81144 invoked by uid 1000); 16 Apr 2003 03:19:17 -0000 Date: 16 Apr 2003 03:19:17 -0000 Message-ID: <20030416031917.81143.qmail@tango.chessclub.com> From: Danny Sleator To: freebsd-fs@freebsd.org Subject: better ways to get the news X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2003 03:30:30 -0000 I'm alternately outraged and depressed by what's happening in the world. We now have the most powerful, deceitful, arrogant, and belligerent administration in US history. And almost everything they're doing is wrong. Here's one example to illustrate the power of Emperor Bush. He can start a unilateral, preemptive, unprecedented war costing hundreds of billions of dollars. 
His justification for it constantly changes, and is buttressed by a stream of lies. Simultaneously he can demand and get from congress a huge tax cut for the rich, despite the fact that we're in a recession and there's a huge budget deficit. And while doing all this outrageous stuff, he remains extremely popular. I'm thinking about what I, an average Joe, can do to slow down this juggernaut. One thing I did was put up this lighted sign outside of my house: http://www.cs.cmu.edu/~sleator/pictures/no-war.jpg But I think the real problem, and the reason for Bush's popularity, is that the American people basically don't have a clue about what's really happening. The mainstream media are not communicating it. Here are four examples to illustrate this point. 1. Remember the huge crowd of Iraqis cheering and pulling down a statue of Saddam? It turns out that the crowd was very small and some (all?) of the jubilant members of the crowd were actors. http://www.informationclearinghouse.info/article2842.htm 2. Remember the rampant looting of Baghdad? Perhaps you knew that the US didn't lift a finger to stop it. But did you know that it was encouraged by US troops as a photo op? http://truthout.org/docs_03/041603D.shtml 3. Did you know that Richard Perle (a key author of the US's current Iraq policy) worked to undermine the Camp David accords in the summer of 2000? http://www.guardian.co.uk/israel/Story/0,2763,342857,00.html 4. There's an outrageous, little-known part of NAFTA called chapter 11, which foreign corporate investors are using to challenge laws designed to protect public health, environmental regulations, and jury verdicts. The cases are heard before a secret international trade tribunal. http://www.citizen.org/publications/release.cfm?ID=7076 These are just a tiny sample to illustrate the problems of missing and/or misleading stories in the media. 
This situation goes a long way toward explaining why the war is so much more popular in the US than it is everywhere else. So I'm suggesting (to all the addresses in my inbox over the last few years) some good alternative sources of information that I've found. A good place to start is http://www.truthout.org They collect stories from reputable sources all over the world. You can sign up for a daily mailing of stories of their suggested stories. I've included one below. Sign up for their mailings at: http://216.25.72.229/membership/sub_mgmt.php http://www.fair.org is a media watchdog group. They maintain a web site, and they let you sign up for sporadic mailings about media deceptions and bias. They often have action alerts about specific outrages in the media. Another very good organization is http://www.moveon.org They email reminders when congress is considering important issues. They make it easy to contact your congress person to voice your opinion. They also run ads in mainstream publications and on TV. I also highly recommend the book "What Liberal Media?" by Eric Alterman. He explains in great detail all the ways in which the media system is broken, and how it got this way. Here are some other great sites to take a look at: http://www.consortiumnews.com http://www.copvcia.com http://www.democraticunderground.com http://www.informationclearinghouse.info http://www.tompaine.com http://www.zmag.org/weluser.htm I hope you find this mailing useful, and I apologize if you got this more than once. Feel free to distribute this further. One warning: If you keep up with these sites, your world view will start to diverge from the "standard" (i.e. false) world view. You risk being viewed as a conspiracy theorist or a nut. 
Danny Sleator
Professor of Computer Science
Carnegie Mellon University
Email: sleator@cmu.edu

t r u t h o u t | 04.16
Eagleburger: Bush Should be Impeached if He Attacks Syria
Echoes of Empires Past
Bomb Before You Buy
US Troops Encouraged Ransacking
Reflections on the Battle of Baghdad
Bush-Hitler Remark Sinks Movie Exec
'Fearless' Dean Wins Converts
What About Private Lori?

Problems with the links? Go direct to our HomePage : http://www.truthout.org

t r u t h o u t | 04.15
William Rivers Pitt | How America Lost the War
Rout Proves Anti-War Point
Aftermath: The Bush Doctrine
Baghdad Seeths With Anger Toward U.S.
Syria Could Be Next, Warns Washington
America Targeted 14,000 Sites. So Where Are The WMDs?
Scandal-Hit US Firm Wins Key Contracts
Civilisation Torn To Pieces
Mesopotamia. Babylon. The Tigris and Euphrates

t r u t h o u t | 04.14
War and Peace: Anarchy in the Streets
U.S. Marines Exchange Heavy Fire in Central Baghdad
Pillagers Strip Iraqi Museum of Its Treasure
Crime Against Humanity
Garner Waiting For "Last Shot" To Rule Baghdad
Vanishing Liberties -- Where's the Press?
Anthrax Source Probably Domestic
India Mulls 'Pre-Emptive' Pakistan Strike, Cites Iraq War Precedent
Outspoken Yellowstone Ranger Loses Job

t r u t h o u t | 04.13
Congressman Questions Iraq Work Given To Halliburton Subsidiary Without Competition
US Arms Group Heads for Lisbon
US Show of Force Galls Arab World
U.S. Govt Accused of War Crimes against Journalists
Ordinary People Fear Their Nation Could Be Next Target of 'Regime Change'
Northern Iraq Falls, Mobs Run Riot in Baghdad
The Future of Iraq's Oil
War Within A War A Real Possibility

t r u t h o u t | 04.12
Hans Blix: War Planned 'Long in Advance'
The Press and the War
Spoils of War
Suicide Bomber In Baghdad Injures Four Marines
Security Council Balks at Postwar Plans
Murdoch Adds to Empire With Control of DirecTV
Bush Offers Crooks And Warmongers To Lead Iraq
House Revives ANWAR Again

t r u t h o u t | 04.11
Despite Cheering Crowds, Army Unit Sees Urban Combat in Baghdad
In Search of Horror Weapons
Syria Now Top US Target for 'Regime Change'
Descent Into a Charnel-House Hospital Hell
Republicans Want Patriot Act Made Permanent
The Pentagon's 'Trainee,' Ahmad Chalabi
UNICEF Warns Of Worsening Situation For Children In Iraq
House Democrats Want Halliburton Probe
CPJ Condemns Journalists' Deaths In Iraq

t r u t h o u t | 04.10
William Rivers Pitt | The Longest Winter
Dark Day for Journalists in Iraq
Wailing Children, the Wounded, the Dead
The Taliban are Back in Southeast Afghanistan
War Out of Compassion
Iraqis In Basra Weigh Freedom's Cost
Oakland Cops Defend Use of Force Against Protesters
Coleman Apologizes For Remark About Wellstone
Saddam Hussein, "Chemical Ali" Apparently Survive Attacks
Economy on the Edge

t r u t h o u t | 04.09
Oakland Police Open Fire At Anti-War Protest
Iraqis Launch Urban Fightback in Baghdad
Simpson: 'This Is Like A Scene From Hell.'
Baghdad Hospitals Overwhelmed, No Longer Counting Casualties
"Smoking Gun" WMD Site in Iraq Turns Out to Contain Pesticide
'I Love My Country, But.'
Cronies Set To Make A Killing
William Rivers Pitt's New Book Now Available

t r u t h o u t | 04.08
Surgeon Describes "Horrific Injuries," Sanitized War
Up to 3,000 Iraqi Fighters Dead in Show of Force
Red Cross: Iraq Wounded Too High to Count
U.S. Finds No Weapons of Mass Destruction in Iraq
Little Hope for Post-War Boom in US Economy
Carlyle Group Heads for Lisbon
Army Chaplain Offers Baptisms, Baths
Irish Anti-War Marchers to Confront Bush
Disarmament In Tatters
7-Year-Old Kurd: 'I Like War'

t r u t h o u t | 04.07
Thirsty Iraqis Must Be Baptized to Get Water
Near Baghdad, U.S. Troops Encounter 'Remarkable' Foe
Britain Admits There May Be No WMD's in Iraq
Forecasters Underrating Weakness of US Economy
US Marines Kill Seven Iraqis After Truck Fails to Stop (Again)
Baghdad Hospitals Stretched to their Limits
American Portrayal of War of Liberation Faltering Across Arab World
Blair and Friends Staring Into War's Political Abyss
Turf War Rages in Washington Over Who Will Rule Iraq
To Activists, Real Battles Are on Home Front

t r u t h o u t | 04.06
Red Cross Horrified by Number of Dead Civilians
Samar's Story
At Umm Qasr, the "Secured" Port, "It's Chaos"
How the Dissidents Fooled the Washington Hawks
US Military Admits 'Suspicious' Powder is Explosive
Kerry Lashes Out at Republican Criticisms
Saddam Was Not Always Washington's 'Demon'
The War's Dirty Secret: It's About Changing United States, Not Iraq
Jobs Show Worse-Than-Expected Drop
Senate Won't Debate Alaska Oil Drilling

To JOIN the TO list: http://www.truthout.org/membership/sub_mgmt.php

From owner-freebsd-fs@FreeBSD.ORG Wed Apr 16 01:35:45 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E087E37B401; Wed, 16 Apr 2003 01:35:45 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 369BA43F93; Wed, 16 Apr 2003 01:35:45 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0135.cvx40-bradley.dialup.earthlink.net ([216.244.42.135] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 195iOD-0004QY-00; Wed, 16 Apr 2003 01:35:34 -0700 Message-ID: <3E9D157E.96FD09AE@mindspring.com> Date: Wed, 16 Apr 2003 01:34:06 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Chris Dillon References: <3E976EBD.C3E66EF8@tel.fer.hr> <20030414101935.GB18110@HAL9000.homeunix.com> <20030415160925.U86854@duey.wolves.k12.mo.us> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a48a278b88c0ad456bc35dc084ece7a78e548b785378294e88350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@freebsd.org cc: mckusick@McKusick.COM cc: freebsd-stable@freebsd.org cc: David Schultz Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2003 08:35:46 -0000

Chris Dillon wrote:
> On Tue, 15 Apr 2003, Marko Zec wrote:
> > Huh... such a concept would still break fsync() semantics.
Note that > > the original patch also ensures dirty buffers get flushed if / when > > the disk spins up, even before the delay timer gets expired. > > Sorry to butt in on this thread... :-) It just occurred to me that > the ability to delay all writes given an arbitrary time period would > be good for more than just laptops. It would be great for > non-volatile flash filesystems which have a limited write life. The life expectancy of these devices is really, really underestimated. In practice, I've seen two million write cycles from some of these in lab machines which get rewritten pretty often. You are actually better off with a "noatime" option, to avoid cron beating the same set of bits once a second, or even a read-only mount for most/all of your FS's to avoid having to worry about writes at all. > If the "clean" bit is set on the FS during that long delay that would > be even slicker (does it do that already?), since if the filesystem is > consistent thanks to softupdates it shouldn't need to be fsck'd at all > on boot. That's called "soft read-only". Kirk implemented that for the BSDI version, but not for FreeBSD or OpenBSD. We discussed it when he was doing the FreeBSD work on contract for Whistle. It's actually not that hard to do, I think, but it's probably evil to not update access times on an FS that's *technically* mounted read/write, if you're expecting those semantics. Practically, you can't really trust the BG fsck when it comes to real disks, because you can lose whole tracks, and if you ever do end up with an inconsistency, you are pretty screwed if it results in a panic. For something that's solid state, that's less of a problem. 8-). 
-- Terry From owner-freebsd-fs@FreeBSD.ORG Wed Apr 16 03:11:37 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9E9F837B401; Wed, 16 Apr 2003 03:11:37 -0700 (PDT) Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com [12.233.57.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id 013C643FAF; Wed, 16 Apr 2003 03:11:37 -0700 (PDT) (envelope-from das@FreeBSD.org) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3GABa9E001264; Wed, 16 Apr 2003 03:11:36 -0700 (PDT) (envelope-from das@FreeBSD.org) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3GABaTM001263; Wed, 16 Apr 2003 03:11:36 -0700 (PDT) (envelope-from das@FreeBSD.org) Date: Wed, 16 Apr 2003 03:11:36 -0700 From: David Schultz To: Marko Zec Message-ID: <20030416101136.GA868@HAL9000.homeunix.com> Mail-Followup-To: Marko Zec , freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org, mckusick@McKusick.COM References: <3E976EBD.C3E66EF8@tel.fer.hr> <20030414101935.GB18110@HAL9000.homeunix.com> <3E9C5975.43755858@tel.fer.hr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3E9C5975.43755858@tel.fer.hr> cc: freebsd-fs@FreeBSD.org cc: mckusick@McKusick.COM cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2003 10:11:38 -0000 On Tue, Apr 15, 2003, Marko Zec wrote: > David Schultz wrote: > > > For instance, you could > > have fsync() push the appropriate dirty buffers out to a separate > > cache, then commit the contents of the cache in the order of the > > fsyncs when the disk is next active. > > Huh... 
such a concept would still break fsync() semantics. Note that the
> original patch also ensures dirty buffers get flushed if / when the disk spins
> up, even before the delay timer gets expired.

I didn't say it wouldn't still break fsync() semantics; it does. However, you could guarantee that data are in a consistent state with my proposal. On the other hand, the more I think about the details, the more I think this could be more of a pain than it's worth.

>
> > - The fiddling with rushjob seems rather arbitrary. You can probably
> > just let the existing code increment it as necessary and force a sync
> > if the value gets too high.
>
> If rushjob were not used for forcing prompt synching, the original code
> could not guarantee the sync to occur immediately. Instead, the synching could
> be further delayed for up to 30 seconds, which is not desirable if our major
> design goal is to do as much disk I/O as possible in a small time interval and
> leave the disk idle otherwise.

I was referring to all the places where rushjob is set to or incremented by syncer_maxdelay. AFAIK, it should never be that large. I don't think you want to overload a low memory handling mechanism with the task of syncing the disk.
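
The proposal quoted in this message (have fsync() push dirty buffers into a separate cache, then commit that cache in fsync order once the disk is next active) can be modelled in a few lines. What follows is only a toy user-space sketch in Python, not kernel code and not anything taken from the patch; the class DeferredFsyncCache and its method names are hypothetical and exist purely for illustration.

```python
from collections import deque


class DeferredFsyncCache:
    """Toy model: fsync() requests queue up while the disk is spun down."""

    def __init__(self):
        self.pending = deque()  # (file_id, data) pairs, in fsync order
        self.disk = []          # what has actually reached stable storage
        self.spun_up = False

    def fsync(self, file_id, data):
        # Hypothetical behaviour: write through if the disk is active,
        # otherwise remember the request and return immediately.
        if self.spun_up:
            self.disk.append((file_id, data))
        else:
            self.pending.append((file_id, data))

    def spin_up(self):
        # Commit queued requests strictly in the order they were issued,
        # so the on-disk state never reflects a later fsync without the
        # earlier ones.
        self.spun_up = True
        while self.pending:
            self.disk.append(self.pending.popleft())

    def spin_down(self):
        self.spun_up = False
```

In this model fsync() returns before anything is on stable storage, so the fsync() guarantee is still broken, which is the objection raised in this thread. The only thing the ordering buys is that a crash leaves the disk at some earlier consistent point rather than an arbitrary mix.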
From owner-freebsd-fs@FreeBSD.ORG Wed Apr 16 03:28:53 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 34B2C37B401; Wed, 16 Apr 2003 03:28:53 -0700 (PDT) Received: from mail.r.caley.org.uk (82-41-209-16.cable.ubr12.edin.blueyonder.co.uk [82.41.209.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6C32643F75; Wed, 16 Apr 2003 03:28:51 -0700 (PDT) (envelope-from rjc@caley.org.uk) Received: from pele.r.caley.org.uk (pele.r.caley.org.uk [10.0.0.12]) by mail.r.caley.org.uk (8.12.6/8.12.6) with ESMTP id h3GASnXj093442; Wed, 16 Apr 2003 11:28:49 +0100 (BST) (envelope-from rjc@bast.r.caley.org.uk) Received: from pele.r.caley.org.uk (localhost [127.0.0.1]) by pele.r.caley.org.uk (8.12.6/8.12.6) with ESMTP id h3GASnFl051393; Wed, 16 Apr 2003 11:28:49 +0100 (BST) (envelope-from rjc@bast.r.caley.org.uk) Received: (from rjc@localhost) by pele.r.caley.org.uk (8.12.6/8.12.6/Submit) id h3GASnQQ051390; Wed, 16 Apr 2003 11:28:49 +0100 (BST) (envelope-from rjc@bast.r.caley.org.uk) X-Authentication-Warning: pele.r.caley.org.uk: rjc set sender to rjc@bast.r.caley.org.uk using -f Sender: rjc@caley.org.uk To: Marko Zec References: <200304121438.h3CEct41030991@lurza.secnetix.de> <3E9840B8.F00E018F@tel.fer.hr> From: Richard Caley In-Reply-To: <3E9840B8.F00E018F@tel.fer.hr> Date: 16 Apr 2003 11:28:49 +0100 Message-ID: <87smsiohwe.fsf@pele.r.caley.org.uk> Lines: 26 User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.1 (Cuyahoga Valley) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2003 10:28:53 -0000 In article <3E9840B8.F00E018F@tel.fer.hr>, Marko 
Zec (mz) writes:

mz> I agree that additional tunable for controlling fsync() behavior couldn't hurt,
mz> however as explained in previous note I see the fsync() as the most common
mz> initiator of disk spin-ups, so a method for suppressing it must be made
mz> available, otherwise the whole patch wouldn't make much sense...

Would it make sense to make the fsync behaviour a per-process choice? That way certain system processes could, if this delay behaviour is enabled, use the null fsync. For instance, if syslog is one of the things causing annoying spin-ups, then the user could tell syslog not to really fsync, trading forensic information in the event of a crash for battery life.

Additionally there could be a really_really_fsync call to be used to make certain programs delay-aware. E.g., it might be acceptable for my emacs checkpointing not to fsync; again I'm trading losing a little more work in the event of a crash for battery life. But when I explicitly save, I am saying I want that stuff on disk and stable NOW, and damn the battery.
-- Mail me as MYFIRSTNAME@MYLASTNAME.org.uk _O_ |<

From owner-freebsd-fs@FreeBSD.ORG Wed Apr 16 06:39:03 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2A28C37B404 for ; Wed, 16 Apr 2003 06:39:03 -0700 (PDT) Received: from laptop.tenebras.com (laptop.tenebras.com [66.92.188.18]) by mx1.FreeBSD.org (Postfix) with SMTP id 8EEEB43FB1 for ; Wed, 16 Apr 2003 06:39:00 -0700 (PDT) (envelope-from kudzu@tenebras.com) Received: (qmail 12497 invoked from network); 16 Apr 2003 13:38:58 -0000 Received: from queequeg.tenebras.com (HELO tenebras.com) (192.168.188.241) by 0 with SMTP; 16 Apr 2003 13:38:58 -0000 Message-ID: <3E9D5CF2.7090606@tenebras.com> Date: Wed, 16 Apr 2003 06:38:58 -0700 From: Michael Sierchio User-Agent: Mozilla/5.0 (X11; U; Linux i386; en-US; rv:1.3) Gecko/20030312 X-Accept-Language: en-us, en, zh-cn, zh-tw MIME-Version: 1.0 To: Richard Caley References: <200304121438.h3CEct41030991@lurza.secnetix.de> <3E9840B8.F00E018F@tel.fer.hr> <87smsiohwe.fsf@pele.r.caley.org.uk> In-Reply-To: <87smsiohwe.fsf@pele.r.caley.org.uk> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2003 13:39:03 -0000

Richard Caley wrote:
> Additionally there could be a really_really_fsync call ...

There is. It is used in hundreds of programs. It is called fsync(2).
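
For readers who have not used it directly, the "explicit save" case described in this exchange is what fsync(2) already covers. Below is a minimal sketch in Python, where os.fsync is a thin wrapper around fsync(2), of the pattern of writing a new copy, forcing it to stable storage, and only then replacing the old file. The function name safe_save is made up for illustration, and the final directory fsync assumes a POSIX system on which a directory can be opened read-only.

```python
import os
import tempfile


def safe_save(path, data):
    """Replace `path` with `data` so that either the old or the new copy survives."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # drain the user-space buffer
            os.fsync(tmp.fileno())  # ask the kernel to put the data on disk
        os.replace(tmp_path, path)  # rename over the old copy
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also sync the directory so the rename itself is on stable storage.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If fsync() is silently turned into a no-op, the rename can reach the disk before the new file's contents do, and a crash at that point can leave neither a usable old copy nor a complete new one.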
From owner-freebsd-fs@FreeBSD.ORG Wed Apr 16 09:25:05 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 43C3037B401; Wed, 16 Apr 2003 09:25:05 -0700 (PDT) Received: from testmail.wolves.k12.mo.us (testmail.wolves.k12.mo.us [207.160.214.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 068C443FD7; Wed, 16 Apr 2003 09:25:04 -0700 (PDT) (envelope-from cdillon@wolves.k12.mo.us) Received: by testmail.wolves.k12.mo.us (Postfix, from userid 1001) id DA7C0CD7C; Wed, 16 Apr 2003 11:25:02 -0500 (CDT) Received: from localhost (localhost [127.0.0.1]) by testmail.wolves.k12.mo.us (Postfix) with ESMTP id D8EACCD19; Wed, 16 Apr 2003 11:25:02 -0500 (CDT) Date: Wed, 16 Apr 2003 11:25:02 -0500 (CDT) From: Chris Dillon To: Terry Lambert In-Reply-To: <3E9D157E.96FD09AE@mindspring.com> Message-ID: <20030416100921.U91118@duey.wolves.k12.mo.us> References: <3E976EBD.C3E66EF8@tel.fer.hr> <20030414101935.GB18110@HAL9000.homeunix.com> <20030415160925.U86854@duey.wolves.k12.mo.us> <3E9D157E.96FD09AE@mindspring.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-fs@freebsd.org cc: mckusick@McKusick.COM cc: freebsd-stable@freebsd.org cc: David Schultz Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2003 16:25:05 -0000 On Wed, 16 Apr 2003, Terry Lambert wrote: > Chris Dillon wrote: > > On Tue, 15 Apr 2003, Marko Zec wrote: > > > Huh... such a concept would still break fsync() semantics. Note > > > that the original patch also ensures dirty buffers get flushed > > > if / when the disk spins up, even before the delay timer gets > > > expired. > > > > Sorry to butt in on this thread... 
:-) It just occurred to me > > that the ability to delay all writes given an arbitrary time > > period would be good for more than just laptops. It would be > > great for non-volatile flash filesystems which have a limited > > write life. > > The life expectancy of these devices is really, really > underestimated. In practice, I've seen two million write cycles > from some of these in lab machines which get rewritten pretty often. I realize they have what looks like a really big number of writes on a human scale, but to a computer which does things methodically day in and day out without stopping, those writes can add up relatively quickly. Even with a life of two million write cycles, the "occasional" 30-second round of updates that happen to write the same bits over and over will give your flash part a life of only 1.9 years (2000000 writes * 30 seconds apart = 60000000 seconds to failure). Also, I doubt you'll actually get 2 million writes out of the average consumer flash part. A little USB key drive I have here is only rated at 1 million writes, so it would likely last less than a year under the above conditions. > You are actually better off with a "noatime" option, to avoid cron > beating the same set of bits once a second, or even a read-only > mount for most/all of your FS's to avoid having to worry about > writes at all. Yeah, I already do that in the stuff I've built, I'm just saying it would be advantageous not to have to do that in certain cases. > > If the "clean" bit is set on the FS during that long delay that > > would be even slicker (does it do that already?), since if the > > filesystem is consistent thanks to softupdates it shouldn't need > > to be fsck'd at all on boot. > > That's called "soft read-only". Kirk implemented that for the BSDI > version, but not for FreeBSD or OpenBSD. We discussed it when he > was doing the FreeBSD work on contract for Whistle. 
It's actually > not that hard to do, I think, but it's probably evil to not update > access times on an FS that's *technically* mounted read/write, if > you're expecting those semantics. I've seen some versions of Windows do the soft-read-only thing with FAT filesystems. I also recall surprising a FreeBSD box with a reset button and seeing a few RW-mounted filesystems go by marked "clean" during boot, but if we don't have soft-read-only I was probably just imagining it, or something else was at play. As for atimes, if you're expecting all writes to be delayed, and you still want atimes to be updated, you'll surely take into account that the atime updates will be delayed as well. This is all purely optional behaviour, remember, so you should understand which bits of your foot you're likely to shoot off when you turn it on. It's not really foot-shooting in that case, either, as long as you're not relying on your atimes for anything important. > Practically, you can't really trust the BG fsck when it comes to > real disks, because you can lose whole tracks, and if you ever do > end up with an inconsistency, you are pretty screwed if it results > in a panic. For something that's solid state, that's less of a > problem. 8-). Yes, definitely. Soft-read-only combined with regular foreground fsck's would be the way to go with the current crop of drives. -- Chris Dillon - cdillon(at)wolves.k12.mo.us FreeBSD: The fastest and most stable server OS on the planet - Available for IA32 (Intel x86) and Alpha architectures - IA64, PowerPC, UltraSPARC, ARM, and S/390 under development - http://www.freebsd.org No trees were harmed in the composition of this message, although some electrons were mildly inconvenienced.
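
The flash-lifetime estimate in the message above is simple arithmetic, and a short C sketch makes it easy to try other ratings and update intervals. The function names are illustrative and do not come from the thread. The model assumes, as the message does, the worst case in which every periodic update rewrites the same cells, and it ignores anything the device might do to spread writes around.

```c
#include <assert.h>
#include <stdio.h>

/* Seconds in a 365-day year. */
#define SECONDS_PER_YEAR (365LL * 24 * 60 * 60)

/*
 * Worst-case seconds until a flash cell reaches its rated write count,
 * assuming the same cell is rewritten once every interval_s seconds.
 */
long long
seconds_to_wearout(long long rated_writes, long long interval_s)
{
	assert(rated_writes >= 0 && interval_s >= 0);
	return rated_writes * interval_s;
}

/* Convert a number of seconds to (fractional) 365-day years. */
double
seconds_to_years(long long seconds)
{
	return (double)seconds / (double)SECONDS_PER_YEAR;
}

/* Print one line of the estimate for a given rating and interval. */
void
print_estimate(long long rated_writes, long long interval_s)
{
	long long s = seconds_to_wearout(rated_writes, interval_s);

	printf("%lld writes, one every %lld s: %lld s (%.2f years)\n",
	    rated_writes, interval_s, s, seconds_to_years(s));
}
```

Calling print_estimate(2000000, 30) reports 60000000 seconds, about 1.90 years. With a rating of 1000000 writes the same interval gives about 0.95 years, which matches the "less than a year" remark.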
From owner-freebsd-fs@FreeBSD.ORG Wed Apr 16 15:10:09 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0330937B404; Wed, 16 Apr 2003 15:10:09 -0700 (PDT) Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11]) by mx1.FreeBSD.org (Postfix) with SMTP id C2BE943F75; Wed, 16 Apr 2003 15:10:07 -0700 (PDT) (envelope-from iedowse@maths.tcd.ie) Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP id ; 16 Apr 2003 23:10:07 +0100 (BST) To: Marko Zec In-Reply-To: Your message of "Tue, 15 Apr 2003 20:37:47 +0200." <3E9C517B.6039679A@tel.fer.hr> Date: Wed, 16 Apr 2003 23:10:06 +0100 From: Ian Dowse Message-ID: <200304162310.aa96829@salmon.maths.tcd.ie> cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org cc: Kirk McKusick Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2003 22:10:09 -0000 In message <3E9C517B.6039679A@tel.fer.hr>, Marko Zec writes: >Tempted by a lot of opposition to the concept of (optionally) ignoring >fsync() calls when running on battery power, I wonder what effect the >concept of unconditional delaying of _all_ disk updates by ATA-disk >firmware will make on FS consistency in case of system crash or power >failure? I do not want to imply such a concept is a priori bad, however >I fail to realize its advantages over OS-controlled delaying of disk >synching. Note that the ATA "delayed write" mechanism only delays writes while the disk is spun down; at other times there is no change in behaviour. Since the disk only spins down after it has been idle for a time, it is very unlikely that the disk is left in an inconsistent state while it is stopped. 
Just after the disk spins up there is a small window where the cached writes get written out in a burst. Due to the amount of cached data and the probable re-ordering of writes, the disk is quite likely to be in an inconsistent state during this flurry of writes, but the window is short so it is probably not a big issue in practice. The main advantage of using the ATA delayed write mechanism is that the disk itself can take advantage of knowing whether or not it is spinning, whereas the OS does not have that information. The downside is that it is not guaranteed that fsync'd data gets written to disk immediately, though in practice the disk tends to be spinning when the fsync is performed due to the previous accesses. I've been using ATA delayed writes on a few laptops for over a year and it has never caused me any problems - it generally works just right in the sense that the disk remains spun down when the machine is mostly idle, and spins up when you save files from an editor etc. Doing the write delaying in the OS is always going to be a tradeoff between excessively delaying writes when the machine is busy and maximising the time between spin-ups when idle, though obviously there is more control possible over which writes get delayed and which don't. 
Ian From owner-freebsd-fs@FreeBSD.ORG Wed Apr 16 19:26:09 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 78E1137B401; Wed, 16 Apr 2003 19:26:09 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5894943FCB; Wed, 16 Apr 2003 19:26:08 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0250.cvx40-bradley.dialup.earthlink.net ([216.244.42.250] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 195z66-0005pC-00; Wed, 16 Apr 2003 19:26:00 -0700 Message-ID: <3E9E1063.C7D29C29@mindspring.com> Date: Wed, 16 Apr 2003 19:24:35 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Chris Dillon References: <3E976EBD.C3E66EF8@tel.fer.hr> <20030414101935.GB18110@HAL9000.homeunix.com> <20030415160925.U86854@duey.wolves.k12.mo.us> <20030416100921.U91118@duey.wolves.k12.mo.us> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a47ab6d6c875cbb8072b3f8575e5e62c02a8438e0f32a48e08350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@freebsd.org cc: mckusick@McKusick.COM cc: freebsd-stable@freebsd.org cc: David Schultz Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 02:26:09 -0000 Chris Dillon wrote: > As for atimes, if you're expecting all writes to be delayed, and you > still want atimes to be updated, you'll surely take into account that > the atime updates will be delayed as well. 
This is all purely > optional behaviour, remember, so you should understand which bits of > your foot you're likely to shoot off when you turn it on. It's not > really foot-shooting in that case, either, as long as you're not > relying on your atimes for anything important. POSIX sometimes says "SHALL be updated"; but mostly, it says "SHALL be marked for update". Probably you can delay those indefinitely, as long as the timestamp is set at the time you mark, so it matches what would have been there. It's probably OK to coalesce them to the most recent one, as well. The atime is actually one of the things I had to "POSIX lawyer" in a project back around 1994. 8-). -- Terry From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 02:49:02 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 849F437B401 for ; Thu, 17 Apr 2003 02:49:02 -0700 (PDT) Received: from mailbox.univie.ac.at (mailbox.univie.ac.at [131.130.1.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7F3C543F3F for ; Thu, 17 Apr 2003 02:49:01 -0700 (PDT) (envelope-from l.ertl@univie.ac.at) Received: from pcle2.cc.univie.ac.at (pcle2.cc.univie.ac.at [131.130.2.177]) by mailbox.univie.ac.at (8.12.2/8.12.2) with ESMTP id h3H9mnvN029940 for ; Thu, 17 Apr 2003 11:48:55 +0200 Date: Thu, 17 Apr 2003 11:48:49 +0200 (CEST) From: Lukas Ertl To: freebsd-fs@freebsd.org Message-ID: <20030417114652.A11713@pcle2.cc.univie.ac.at> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE X-DCC-ZID-Univie-mailbox-Metrics: mailbox 4251; Body=1 Fuz1=1 Fuz2=1 Subject: growing filesystems in 5-current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 09:49:02 -0000 Hi!
(I've sent the following mail to -hackers, but haven't received a reply yet, so I'm trying here - thanks.)

Since growfs currently is not able to grow filesystems on vinum volumes in 5-current, I started playing around with it and hacked up the following patch. At first glance it seems to work, but there is still a problem I can't explain.

Consider a simple vinum volume with a concat plex, containing a 32 MB subdisk. I newfs this volume like this:

---8<---
# newfs -O2 /dev/vinum/mytest
/dev/vinum/mytest: 32.0MB (65536 sectors) block size 16384, fragment size 2048
	using 4 cylinder groups of 8.02MB, 513 blks, 1088 inodes.
super-block backups (for fsck -b #) at:
 160, 16576, 32992, 49408
---8<---

So far, so good. Then I attach another 32 MB subdisk to the plex and try my hacked growfs on it and I get this:

---8<---
# growfs /dev/vinum/mytest
We strongly recommend you to make a backup before growing the Filesystem

 Did you backup your data (Yes/No) ? Yes
new file systemsize is: 32768 frags
Warning: 16160 sector(s) cannot be allocated.
growfs: 56.1MB (114912 sectors) block size 16384, fragment size 2048
	using 7 cylinder groups of 8.02MB, 513 blks, 1088 inodes.
super-block backups (for fsck -b #) at:
 65824, 82240, 98656
---8<---

Why do I lose so many sectors there? Can you help me find the bug? At first I suspected sblock.fs_fpg, since a debug printf after:

---8<---
if (sblock.fs_size % sblock.fs_fpg != 0 &&
    sblock.fs_size % sblock.fs_fpg < cgdmin(&sblock, sblock.fs_ncg)) {
---8<---

said that sblock.fs_fpg is 0 - a debug printf before that if statement told me a more likely number.

Apart from that: am I going the wrong way with this patch? Is there a better way to fit growfs to the new vinum/geom stuff?

Here's the patch:

---8<---
Index: growfs.c
===================================================================
RCS file: /u/cvs/cvs/src/sbin/growfs/growfs.c,v
retrieving revision 1.13
diff -u -r1.13 growfs.c
--- growfs.c	30 Dec 2002 21:18:05 -0000	1.13
+++ growfs.c	16 Apr 2003 17:51:02 -0000
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -111,6 +112,8 @@
 static char		inobuf[MAXBSIZE];	/* inode block */
 static int		maxino;			/* last valid inode */
+static int unlabeled;
+
 /*
  * An array of elements of type struct gfs_bpp describes all blocks to
  * be relocated in order to free the space needed for the cylinder group
@@ -148,6 +151,7 @@
 static void	updrefs(int, ino_t, struct gfs_bpp *, int, int, unsigned int);
 static void	indirchk(ufs_lbn_t, ufs_lbn_t, ufs2_daddr_t, ufs_lbn_t,
 		    struct gfs_bpp *, int, int, unsigned int);
+static void get_dev_size(int, int *);
 /* ************************************************************ growfs ***** */
 /*
@@ -1884,6 +1888,21 @@
 	return columns;
 }
+static void
+get_dev_size(int fd, int *size)
+{
+	int sectorsize;
+	off_t mediasize;
+
+	ioctl(fd, DIOCGSECTORSIZE, &sectorsize);
+	ioctl(fd, DIOCGMEDIASIZE, &mediasize);
+
+	if (sectorsize <= 0)
+		errx(1, "bogus sectorsize: %d", sectorsize);
+
+	*size = mediasize / sectorsize;
+}
+
 /* ************************************************************** main ***** */
 /*
  * growfs(8) is a utility which allows to increase the size of an existing
@@ -1921,6 +1940,7 @@
 	struct disklabel	*lp;
 	struct partition	*pp;
 	int	i,fsi,fso;
+	u_int32_t p_size;
 	char	reply[5];
 #ifdef FSMAXSNAP
 	int	j;
@@ -2020,25 +2040,30 @@
 	 */
 	cp=device+strlen(device)-1;
 	lp = get_disklabel(fsi);
-	if(lp->d_type == DTYPE_VINUM) {
-		pp = &lp->d_partitions[0];
-	} else if (isdigit(*cp)) {
-		pp = &lp->d_partitions[2];
-	} else if (*cp>='a' && *cp<='h') {
-		pp = &lp->d_partitions[*cp - 'a'];
+	if (lp != NULL) {
+		if (isdigit(*cp)) {
+			pp = &lp->d_partitions[2];
+		} else if (*cp>='a' && *cp<='h') {
+			pp = &lp->d_partitions[*cp - 'a'];
+		} else {
+			errx(1, "unknown device");
+		}
+		p_size = pp->p_size;
 	} else {
-		errx(1, "unknown device");
+		get_dev_size(fsi, &p_size);
 	}
 	/*
 	 * Check if that partition looks suited for growing a file system.
 	 */
-	if (pp->p_size < 1) {
+	if (p_size < 1) {
 		errx(1, "partition is unavailable");
 	}
+/*
 	if (pp->p_fstype != FS_BSDFFS) {
 		errx(1, "partition not 4.2BSD");
 	}
+*/
 	/*
 	 * Read the current superblock, and take a backup.
@@ -2067,11 +2092,11 @@
 	 * Determine size to grow to. Default to the full size specified in
 	 * the disk label.
 	 */
-	sblock.fs_size = dbtofsb(&osblock, pp->p_size);
+	sblock.fs_size = dbtofsb(&osblock, p_size);
 	if (size != 0) {
-		if (size > pp->p_size){
+		if (size > p_size){
 			errx(1, "There is not enough space (%d < %d)",
-			    pp->p_size, size);
+			    p_size, size);
 		}
 		sblock.fs_size = dbtofsb(&osblock, size);
 	}
@@ -2121,7 +2146,7 @@
 	 * later on realize we have to abort our operation, on that block
 	 * there should be no data, so we can't destroy something yet.
 	 */
-	wtfs((ufs2_daddr_t)pp->p_size-1, (size_t)DEV_BSIZE, (void *)&sblock,
+	wtfs((ufs2_daddr_t)p_size-1, (size_t)DEV_BSIZE, (void *)&sblock,
 	    fso, Nflag);
 	/*
@@ -2182,12 +2207,14 @@
 	/*
 	 * Update the disk label.
 	 */
-	pp->p_fsize = sblock.fs_fsize;
-	pp->p_frag = sblock.fs_frag;
-	pp->p_cpg = sblock.fs_fpg;
-
-	return_disklabel(fso, lp, Nflag);
-	DBG_PRINT0("label rewritten\n");
+	if (!unlabeled) {
+		pp->p_fsize = sblock.fs_fsize;
+		pp->p_frag = sblock.fs_frag;
+		pp->p_cpg = sblock.fs_fpg;
+
+		return_disklabel(fso, lp, Nflag);
+		DBG_PRINT0("label rewritten\n");
+	}
 	close(fsi);
 	if(fso>-1) close(fso);
@@ -2254,12 +2281,13 @@
 	if (!lab) {
 		errx(1, "malloc failed");
 	}
-	if (ioctl(fd, DIOCGDINFO, (char *)lab) < 0) {
-		errx(1, "DIOCGDINFO failed");
+	if (!ioctl(fd, DIOCGDINFO, (char *)lab)) {
+		return (lab);
 	}
+	unlabeled++;
 	DBG_LEAVE;
-	return (lab);
+	return (NULL);
 }
---8<---

best regards,
le

-- 
Lukas Ertl                             eMail: l.ertl@univie.ac.at
UNIX-Systemadministrator               Tel.:  (+43 1) 4277-14073
Zentraler Informatikdienst (ZID)       Fax.:  (+43 1) 4277-9140
der Universität Wien                   http://mailbox.univie.ac.at/~le/

From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 03:27:07 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E981637B404; Thu, 17 Apr 2003 03:27:07 -0700 (PDT) Received: from franky.speednet.com.au (franky.speednet.com.au [203.57.65.5]) by mx1.FreeBSD.org (Postfix) with ESMTP id BC54243FCB; Thu, 17 Apr 2003 03:27:06 -0700 (PDT) (envelope-from andyf@speednet.com.au) Received: from hewey.af.speednet.com.au (hewey.af.speednet.com.au [203.38.96.242])h3HAR2l1080319; Thu, 17 Apr 2003 20:27:02 +1000 (EST) (envelope-from andyf@speednet.com.au) Received: from hewey.af.speednet.com.au (hewey.af.speednet.com.au [203.38.96.242])h3HAR1g9002252; Thu, 17 Apr 2003 20:27:01 +1000 (EST) (envelope-from andyf@speednet.com.au) Date: Thu, 17 Apr 2003 20:27:00 +1000 (EST) From: Andy Farkas X-X-Sender: andyf@hewey.af.speednet.com.au To: "Paul M.
Lambert" In-Reply-To: <20030417075306.GZ71088@slappy.plambert.net> Message-ID: <20030417194056.B795-100000@hewey.af.speednet.com.au> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-fs@freebsd.org cc: freebsd-questions@freebsd.org Subject: Re: chflags "archived" flag? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 10:27:08 -0000 [cc'd to -fs because you might have more clue..] > > chflags(1) and chflags(2) and chflags(3) all mention SF_ARCHIVED as a flag > that the superuser can set on a file or directory. > > My question is simple: what's this flag do? Does it have any effect? > Short answer: nothing, and no. It's only there to support msdos(5) type file systems. -- :{ andyf@speednet.com.au Andy Farkas System Administrator Speednet Communications http://www.speednet.com.au/ From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 04:45:38 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0CB9637B401; Thu, 17 Apr 2003 04:45:38 -0700 (PDT) Received: from premijer.tel.fer.hr (premijer.tel.fer.hr [161.53.19.221]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0481F43FAF; Thu, 17 Apr 2003 04:45:37 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from tel.fer.hr (unknown [161.53.19.14]) by premijer.tel.fer.hr (Postfix) with ESMTP id 120DA1380; Thu, 17 Apr 2003 13:45:17 +0200 (MET DST) Message-ID: <3E9E93D8.EB16ED42@tel.fer.hr> Date: Thu, 17 Apr 2003 13:45:28 +0200 From: Marko Zec X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: David Schultz References: <3E976EBD.C3E66EF8@tel.fer.hr> <20030414101935.GB18110@HAL9000.homeunix.com> <3E9C5975.43755858@tel.fer.hr> <20030416101136.GA868@HAL9000.homeunix.com> Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit cc: freebsd-fs@FreeBSD.org cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 11:45:38 -0000 David Schultz wrote: > On Tue, Apr 15, 2003, Marko Zec wrote: > > > > > - The fiddling with rushjob seems rather arbitrary. You can probably > > > just let the existing code increment it as necessary and force a sync > > > if the value gets too high. > > > > If rushjob were not used to force prompt synching, the original code > > could not guarantee the sync to occur immediately. Instead, the synching could > > be further delayed for up to 30 seconds, which is not desirable if our major > > design goal is to do as much disk I/O as possible in a small time interval and > > leave the disk idle otherwise. > > I was referring to all the places where rushjob is set to or > incremented by syncer_maxdelay. AFAIK, it should never be that > large. Hmm... Why? :) > I don't think you want to overload a low memory handling > mechanism with the task of syncing the disk. As far as I can see, the rushjob variable is used in only one place in kern/vfs_subr.c, to notify the softupdates synching scheduler to start synching earlier than the normal timers would expire. I just reused the same mechanism to urge synching of dirty buffers when the extra delay timer expires, or when outstanding disk I/O occurs, to coalesce disk updates with occasional disk spinups.
Marko From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 05:03:56 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A36D537B401; Thu, 17 Apr 2003 05:03:56 -0700 (PDT) Received: from premijer.tel.fer.hr (premijer.tel.fer.hr [161.53.19.221]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8DCF843FBD; Thu, 17 Apr 2003 05:03:55 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from tel.fer.hr (unknown [161.53.19.14]) by premijer.tel.fer.hr (Postfix) with ESMTP id 0B89C1380; Thu, 17 Apr 2003 14:03:37 +0200 (MET DST) Message-ID: <3E9E9827.4BB19197@tel.fer.hr> Date: Thu, 17 Apr 2003 14:03:51 +0200 From: Marko Zec X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: Ian Dowse References: <200304162310.aa96829@salmon.maths.tcd.ie> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org cc: Kirk McKusick Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 12:03:57 -0000 Ian Dowse wrote: > In message <3E9C517B.6039679A@tel.fer.hr>, Marko Zec writes: > >Tempted by a lot of opposition to the concept of (optionally) ignoring > >fsync() calls when running on battery power, I wonder what effect the > >concept of unconditional delaying of _all_ disk updates by ATA-disk > >firmware will make on FS consistency in case of system crash or power > >failure? I do not want to imply such a concept is a priori bad, however > >I fail to realize its advantages over OS-controlled delaying of disk > >synching. > > Note that the ATA "delayed write" mechanism only delays writes while > the disk is spun down; at other times there is no change in behaviour. 
> Since the disk only spins down after it has been idle for a time, > it is very unlikely that the disk is left in an inconsistent state > while it is stopped. > > Just after the disk spins up there is a small window where the > cached writes get written out in a burst. Due to the amount of > cached data and the probable re-ordering of writes, the disk is > quite likely to be in an inconsistent state during this flurry of > writes, but the window is short so it is probably not a big issue > in practice. > > The main advantage of using the ATA delayed write mechanism is that > the disk itself can take advantage of knowing whether or not it is > spinning, whereas the OS does not have that information. The OS _does_ know (approximately) when the disk is spinning and when not. For example, if the disk is configured to stop spinning immediately after the last I/O operation, the OS can safely assume 10 or more seconds afterwards the spinning will be stopped. The OS only has to keep record (in form of timestamp or something similar) when it has issued the last I/O request to the disk. In my patch this is accomplished using the stratcalls marker, which is increased every time the strategy routine of the ATA disk driver is invoked. Therefore the OS can also successfully coalesce the pending disk updates with other outstanding I/O disk operations, which are typically reads of uncached sectors or VM swapping. > The downside > is that it is not guaranteed that fsync'd data gets written to disk > immediately, though in practice the disk tends to be spinning when > the fsync is performed due to the previous accesses. I've been using > ATA delayed writes on a few laptops for over a year and it has never > caused me any problems - it generally works just right in the sense > that the disk remains spun down when the machine is mostly idle, > and spins up when you save files from an editor etc. I agree the ATA delayed writes is a great functionality that can help save battery power. 
I just want to point out that it can suffer from the same consistency problems as the model of OS controlled delayed synching combined with null fsync() processing. However, if the OS controls the delaying of updates, you can turn on or off normal fsync() semantics as desired. With delaying writes in ATA firmware, you simply do not have the choice :) Cheers, Marko > Doing the write delaying in the OS is always going to be a tradeoff > between excessively delaying writes when the machine is busy and > maximising the time between spin-ups when idle, though obviously > there is more control possible over which writes get delayed and > which don't. > > Ian From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 09:32:22 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 03BE437B408 for ; Thu, 17 Apr 2003 09:32:22 -0700 (PDT) Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7888843FBD for ; Thu, 17 Apr 2003 09:32:21 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0064.cvx21-bradley.dialup.earthlink.net ([209.179.192.64] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196CIp-0002Yk-00; Thu, 17 Apr 2003 09:32:00 -0700 Message-ID: <3E9ED6B3.CF700528@mindspring.com> Date: Thu, 17 Apr 2003 09:30:43 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Lukas Ertl References: <20030417114652.A11713@pcle2.cc.univie.ac.at> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a41429122b84c67fbd04aaa6225954f26a350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@freebsd.org Subject: Re: growing filesystems in 5-current X-BeenThere: freebsd-fs@freebsd.org 
X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 16:32:22 -0000

Lukas Ertl wrote:
> Since growfs currently is not able to grow filesystems on vinum volumes in
> 5-current, I started playing around with it and hacked up the following patch.
> At first glance it seems to work, but there is still a problem I can't
> explain.

[ ... ]

> So far, so good. Then I attach another 32 MB subdisk to the plex and try
> my hacked growfs on it and I get this:

[ ... ]

> new file systemsize is: 32768 frags
> Warning: 16160 sector(s) cannot be allocated.
> growfs: 56.1MB (114912 sectors) block size 16384, fragment size 2048
>         using 7 cylinder groups of 8.02MB, 513 blks, 1088 inodes.
> super-block backups (for fsck -b #) at:
>  65824, 82240, 98656
> ---8<---
>
> Why do I lose so many sectors there? Can you help me find the bug?

The simple answer is that you must be getting the size of the underlying plex wrong, if you are really losing anything.

In reality, I think it's because you are expecting the stats to apply to the whole range, and what's happening is that it's only initializing the cylinder groups for the new part you added. The progression is:

	65824, 82240, 98656

If we project this backwards, we see:

	82240 - 65824 = 16416
	98656 - 82240 = 16416

So:

	65824 - 16416 = 49408
	49408 - 16416 = 32992
	32992 - 16416 = 16576

With a remainder of 160 for FS control structures or whatever. So the previous progression was:

	160 (start)
	16576, 32992, 49408

...and also consists of 3 elements, following "start", so it seems you aren't losing anything, at least to me.

Probably your patch is fine.
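
The spacing argument above can be checked mechanically. The following C sketch, with illustrative function names that are not part of growfs, verifies that the backup superblock offsets reported by newfs and growfs are evenly spaced, and computes how many sectors a given number of whole cylinder groups covers. It assumes 512-byte sectors, so the reported 32768 fragments of 2048 bytes correspond to 131072 sectors.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the common spacing between consecutive offsets, or -1 if
 * fewer than two offsets are given or they are not evenly spaced.
 */
long
uniform_spacing(const long *offsets, size_t n)
{
	long step;
	size_t i;

	assert(offsets != NULL);
	if (n < 2)
		return -1;
	step = offsets[1] - offsets[0];
	for (i = 2; i < n; i++) {
		if (offsets[i] - offsets[i - 1] != step)
			return -1;
	}
	return step;
}

/* Sectors covered by ncg whole cylinder groups of cg_sectors each. */
long
covered_sectors(long ncg, long cg_sectors)
{
	return ncg * cg_sectors;
}
```

With the seven reported offsets (160, 16576, 32992, 49408, 65824, 82240, 98656), uniform_spacing() returns 16416, and covered_sectors(7, 16416) is 114912, the size growfs printed. The difference from 131072 is 16160, the figure in the warning, which would be consistent with a tail slightly shorter than one more full cylinder group.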
-- Terry From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 09:41:56 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A47B037B401; Thu, 17 Apr 2003 09:41:56 -0700 (PDT) Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id D25C943FAF; Thu, 17 Apr 2003 09:41:55 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0064.cvx21-bradley.dialup.earthlink.net ([209.179.192.64] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196CSM-0003jx-00; Thu, 17 Apr 2003 09:41:51 -0700 Message-ID: <3E9ED902.8BF30AA7@mindspring.com> Date: Thu, 17 Apr 2003 09:40:34 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Marko Zec References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9C5975.43755858@tel.fer.hr><3E9E93D8.EB16ED42@tel.fer.hr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a41429122b84c67fbd29c0a98c3d2fd16b3ca473d225a0f487350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@FreeBSD.org cc: David Schultz cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 16:41:57 -0000 Marko Zec wrote: > David Schultz wrote: > > I was referring to all the places where rushjob is set to or > > incremented by syncer_maxdelay. AFAIK, it should never be that > > large. > > Hmm... Why? 
:) Increased latency; larger pool retention time, larger pool size, more kernel memory tied up in dependency lists for longer, more operations blocked because a dependency is already on the write list, and so locked against modification. > > I don't think you want to overload a low memory handling > > mechanism with the task of syncing the disk. > > As far as I can see the rushjob variable is used only at one place in > kern/vfs_subr.c to notify softupdates synching scheduler to start > synching earlier than the normal timers would expire. I just reused > the same mechanism to urge synching of dirty buffers when the extra > delay timer expires, or when outstanding disk I/O occurs, to coalesce > disk updates with occasional disk spinups. ...and not syncing in the normal place. I'm wondering if this really helps some real world situation; my gut feeling is that it doesn't, and it increases memory use considerably, until it's flushed. What I'd like to see is a statistics counter of "delayed syncs" that occur as a result of doing this, gathered over a period of time, along with another statistics counter of "drive spindowns". I know that this will probably end up being observer influenced enough to be merely anecdotal, but say gather two sets over an extended period of use without powering the machine down; the first set without the change, and the next set with the change. Either way it turns out, it would make a stronger case for or against than just hand-waving. 8-). 
-- Terry From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 09:55:42 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0E18537B401; Thu, 17 Apr 2003 09:55:42 -0700 (PDT) Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3FA3943FE1; Thu, 17 Apr 2003 09:55:41 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0064.cvx21-bradley.dialup.earthlink.net ([209.179.192.64] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196Cff-0005dn-00; Thu, 17 Apr 2003 09:55:36 -0700 Message-ID: <3E9EDC38.1CE381C6@mindspring.com> Date: Thu, 17 Apr 2003 09:54:16 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Marko Zec References: <200304162310.aa96829@salmon.maths.tcd.ie> <3E9E9827.4BB19197@tel.fer.hr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4c5c06b35ece679a4cfdfcaf6b4f66f3993caf27dac41a8fd350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@freebsd.org cc: Ian Dowse cc: freebsd-stable@freebsd.org cc: Kirk McKusick Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 16:55:42 -0000 Marko Zec wrote: > Ian Dowse wrote: > > Note that the ATA "delayed write" mechanism only delays writes while > > the disk is spun down; at other times there is no change in behaviour. > > Since the disk only spins down after it has been idle for a time, > > it is very unlikely that the disk is left in an inconsistent state > > while it is stopped. 
I'm wondering if the ATA "delayed write" actually does this, or if it merely relaxes the cache restrictions, without retaining the ordering enforcement. I suspect that it does not retain the ordering enforcement, as there is no way to disconnect on a tagged queue write, because you must issue a request for status, and it can't be done as a separate ATA operation (see the posts by the Maxtor employee, on and around January 20th of this year to the -FS list for details). You are much better off accumulating requests in the kernel in buffers, and then using the normal write mechanism to push them out to the drive ordered (IMO). This implies a barrier and new code above the bwrite interface, to keep the buffers from getting locked, and stalling your applications in user space. A problem I see here is that swap is on a totally different path, and in a different area of the disk (practically guaranteeing a seek, and a track buffer invalidation on the disk), even if you could cause swapping to be delayed (I don't think you can; FreeBSD aggressively uses memory, and so when you need to swap, you *need* to swap). > The OS _does_ know (approximately) when the disk is spinning and when not. > For example, if the disk is configured to stop spinning immediately after > the last I/O operation, the OS can safely assume 10 or more seconds > afterwards the spinning will be stopped. The OS only has to keep record (in > form of timestamp or something similar) when it has issued the last I/O > request to the disk. In my patch this is accomplished using the stratcalls > marker, which is increased every time the strategy routine of the ATA disk > driver is invoked. Therefore the OS can also successfully coalesce the > pending disk updates with other outstanding I/O disk operations, which are > typically reads of uncached sectors or VM swapping. This is useful, but not enough. You need to actually communicate the information above the block I/O layer, to the soft updates. 
I think, effectively, what you actually want to do is to stop the soft updates clock, rather than trying to play stupid disk tricks with timers, etc., above and beyond what you have to do. I can see it being useful on SCSI disks, as well, particularly where there are temperature issues. Though in that case, you probably are more memory starved than anything, and it will end up doing you no good. > I agree the ATA delayed writes is a great functionality that can help save > battery power. I don't; only if the write order is maintained is it "great". > I just want to point out that it can suffer from the same > consistency problems as the model of OS controlled delayed synching combined > with null fsync() processing. However, if the OS controls the delaying of > updates, you can turn on or off normal fsync() semantics as desired. With > delaying writes in ATA firmware, you simply do not have the choice :) I think people are confusing fsync() with syncd at this point. 8-(. -- Terry From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 09:57:16 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2E84C37B401 for ; Thu, 17 Apr 2003 09:57:16 -0700 (PDT) Received: from mailbox.univie.ac.at (mailbox.univie.ac.at [131.130.1.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id C5CCA43FD7 for ; Thu, 17 Apr 2003 09:57:14 -0700 (PDT) (envelope-from l.ertl@univie.ac.at) Received: from localhost.localdomain (adslle.cc.univie.ac.at [131.130.102.11]) by mailbox.univie.ac.at (8.12.2/8.12.2) with ESMTP id h3HGuxil214586; Thu, 17 Apr 2003 18:57:06 +0200 Date: Thu, 17 Apr 2003 18:56:59 +0200 (CEST) From: Lukas Ertl To: Terry Lambert In-Reply-To: <3E9ED6B3.CF700528@mindspring.com> Message-ID: <20030417184604.V719@leelou.in.tern> References: <20030417114652.A11713@pcle2.cc.univie.ac.at> <3E9ED6B3.CF700528@mindspring.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 
Content-Transfer-Encoding: QUOTED-PRINTABLE X-DCC-ZID-Univie-Metrics: mx1 4261; Body=2 Fuz1=2 Fuz2=2 cc: freebsd-fs@freebsd.org Subject: Re: growing filesystems in 5-current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 16:57:16 -0000 On Thu, 17 Apr 2003, Terry Lambert wrote: > ...and also consists of 3 elements, following "start", so it > seems you aren't losing anything, at least to me. > > Probably your patch is fine. Thanks for your answer, Terry, seems reasonable. There's still a thing that I have recognized now and that bothers me, and I can't explain this one too. Consider again this 32 MB vinum volume. If I newfs it with the default size of 65536 sectors I get this: ---8<--- # newfs -O2 -s 65536 /dev/vinum/mytest /dev/vinum/mytest: 32.0MB (65536 sectors) block size 16384, fragment size 2048 using 4 cylinder groups of 8.02MB, 513 blks, 1088 inodes. super-block backups (for fsck -b #) at: 160, 16576, 32992, 49408 # df -k /dev/vinum/mytest Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/vinum/mytest 31470 2 28952 0.% ---8<--- Four cg's with 8.02MB each? 513 blocks? Why's that? Shouldn't that be 8MB each and 512 blocks? If I growfs this one I get the behaviour I described in my first mail. Now look at this: ---8<--- # newfs -O2 -s 65535 /dev/vinum/mytest /dev/vinum/mytest: 32.0MB (65532 sectors) block size 16384, fragment size 2048 using 4 cylinder groups of 8.00MB, 512 blks, 1024 inodes. super-block backups (for fsck -b #) at: 160, 16544, 32928, 49312 # df -k /dev/vinum/mytest Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/vinum/mytest 31532 2 29008 0.% ---8<--- So I explicitly make the FS one sector smaller than the default value, and I get not only 4 cg's with 8 MB and 512 blocks (which would seem correct to me), but I also get more space available on the FS. 
And if I growfs this one, everything works as expected: ---8<--- # growfs /dev/vinum/mytest We strongly recommend you to make a backup before growing the Filesystem Did you backup your data (Yes/No) ? Yes new file systemsize is: 32768 frags growfs: 64.0MB (131072 sectors) block size 16384, fragment size 2048 using 8 cylinder groups of 8.00MB, 512 blks, 1024 inodes. super-block backups (for fsck -b #) at: 65696, 82080, 98464, 114848 ---8<--- What the heck is going on here? newfs bug? Or did I get something wrong? best regards, le -- Lukas Ertl eMail: l.ertl@univie.ac.at UNIX-Systemadministrator Tel.: (+43 1) 4277-14073 Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140 der Universität Wien http://mailbox.univie.ac.at/~le/ From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 10:42:15 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5801E37B401 for ; Thu, 17 Apr 2003 10:42:15 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id 941B543FA3 for ; Thu, 17 Apr 2003 10:42:10 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0064.cvx21-bradley.dialup.earthlink.net ([209.179.192.64] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196DOA-0004D8-00; Thu, 17 Apr 2003 10:41:35 -0700 Message-ID: <3E9EE6F9.6672A808@mindspring.com> Date: Thu, 17 Apr 2003 10:40:09 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Lukas Ertl References: <20030417114652.A11713@pcle2.cc.univie.ac.at> <3E9ED6B3.CF700528@mindspring.com> <20030417184604.V719@leelou.in.tern> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: 
b1a02af9316fbb217a47c185c03b154d40683398e744b8a4d08911b75f359757177a757c45a4e0d1a8438e0f32a48e08350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@freebsd.org Subject: Re: growing filesystems in 5-current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 17:42:15 -0000 Lukas Ertl wrote: > Consider again this 32 MB vinum volume. If I newfs it with the default > size of 65536 sectors I get this: > > ---8<--- > # newfs -O2 -s 65536 /dev/vinum/mytest > /dev/vinum/mytest: 32.0MB (65536 sectors) block size 16384, fragment size > 2048 > using 4 cylinder groups of 8.02MB, 513 blks, 1088 inodes. > super-block backups (for fsck -b #) at: > 160, 16576, 32992, 49408 > > # df -k /dev/vinum/mytest > Filesystem 1K-blocks Used Avail Capacity Mounted on > /dev/vinum/mytest 31470 2 28952 0.% > ---8<--- > > Four cg's with 8.02MB each? 513 blocks? Why's that? Shouldn't that be 8MB > each and 512 blocks? The short answer for the first question is that the MB calculation is not what you think. The short answer for the second question is "because of the frag size". So the answer to the last question is "no". As to the available capacity, you can only use even numbers of cylinder groups, because there's a bitmap. > If I growfs this one I get the behaviour I described in my first mail. > > Now look at this: > > ---8<--- > # newfs -O2 -s 65535 /dev/vinum/mytest > /dev/vinum/mytest: 32.0MB (65532 sectors) block size 16384, fragment size > 2048 The most important thing to note here is that, before, you told it 65536, and it gave you 65536. Here you are asking for 65535, and getting 65532. That's 3 less sectors to get to a 4 sector boundary, so that you have an even multiple of the frag size of 2048 (512b * 4 = 2048). > using 4 cylinder groups of 8.00MB, 512 blks, 1024 inodes. 
> super-block backups (for fsck -b #) at: > 160, 16544, 32928, 49312 > > # df -k /dev/vinum/mytest > Filesystem 1K-blocks Used Avail Capacity Mounted on > /dev/vinum/mytest 31532 2 29008 0.% > ---8<--- > > So I explicitly make the FS one sector smaller than the default value, and > I get not only 4 cg's with 8 MB and 512 blocks (which would seem correct > to me), but I also get more space available on the FS. If you want to know exactly where it comes from, you've added additional frags, which are counted in the numbers, so you get those additional "whole disk blocks". We can do the math again: 29008(1K) - 28952(1K) = 48(1K) = 24(2K) / 3 = 8(2K) ...and you have a total of 8 cylinder groups. Only whole file system blocks are considered for the calculation of the free reserve, so you get "more free space" that's not tied up in 16K disk blocks. > And if I growfs this one, everything works as expected: [ ... ] > What the heck is going on here? newfs bug? Or did I get something wrong? Nope; just power of two math, and an impedance mismatch in rounding. 8-). 
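The sector rounding described above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not newfs code: the helper name round_down_to_frag is made up for illustration, and the only inputs are the 512-byte sector size and the 2048-byte fragment size shown in the newfs output quoted in this thread.

```python
SECTOR_SIZE = 512   # bytes per sector, as used in the thread
FRAG_SIZE = 2048    # fragment size reported by newfs in the quoted output


def round_down_to_frag(sectors, sector_size=SECTOR_SIZE, frag_size=FRAG_SIZE):
    """Round a sector count down to a whole number of fragments.

    Hypothetical helper: it only illustrates why a request for 65535
    sectors ends up as 65532 when a fragment spans 4 sectors.
    """
    sectors_per_frag = frag_size // sector_size   # 2048 / 512 = 4
    return (sectors // sectors_per_frag) * sectors_per_frag


print(round_down_to_frag(65536))   # 65536, already a multiple of 4
print(round_down_to_frag(65535))   # 65532, three sectors dropped
print(131072 // (FRAG_SIZE // SECTOR_SIZE))   # 32768 frags, as growfs reported
```

The same division explains the "32768 frags" line in the growfs output: 131072 sectors at 4 sectors per fragment.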
-- Terry From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 12:08:57 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2DF6737B401; Thu, 17 Apr 2003 12:08:57 -0700 (PDT) Received: from gatekeeper.oremut01.us.wh.verio.net (gatekeeper.oremut01.us.wh.verio.net [198.65.168.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7240E43FD7; Thu, 17 Apr 2003 12:08:56 -0700 (PDT) (envelope-from fclift@verio.net) Received: from mx.dmz.orem.verio.net (mx.dmz.orem.verio.net [10.1.1.10]) by gatekeeper.oremut01.us.wh.verio.net (Postfix) with ESMTP id 0EE433BF43A; Thu, 17 Apr 2003 13:08:56 -0600 (MDT) Received: from vespa.dmz.orem.verio.net (vespa.dmz.orem.verio.net [10.1.1.59]) by mx.dmz.orem.verio.net (8.11.6p2/8.11.6) with ESMTP id h3HJ8tJ98405; Thu, 17 Apr 2003 13:08:55 -0600 (MDT) Date: Thu, 17 Apr 2003 13:12:39 -0600 (MDT) From: Fred Clift X-X-Sender: To: Ian Dowse In-Reply-To: <200304162310.aa96829@salmon.maths.tcd.ie> Message-ID: <20030417130651.N46464-100000@vespa.dmz.orem.verio.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 19:08:57 -0000 On Wed, 16 Apr 2003, Ian Dowse wrote: > > Just after the disk spins up there is a small window where the > cached writes get written out in a burst. Due to the amount of > cached data and the probable re-ordering of writes, the disk is > quite likely to be in an inconsistent state during this flurry of > writes, but the window is short so it is probably not a big issue > in practice. Of course, this is when your power-supply is most likely to fail due to the sudden increased load :). 
I lost a disk that I had been occasionally using as a backup drive due to an effect like this. I had two scsi drives in an external enclosure, and I wanted to re-newfs the other drive in the enclosure so I started a tar job to copy all the files over and the PS blew about 20 seconds into the write since both drives were 'busy' rather than just one or the other as had been the case for quite a while as the machine sat in the corner and did nothing for a year. The target drive was hosed bad enough that you couldn't newfs it any more and the vendor's low-level format tools claimed the disk was unrepairable... I guess in a laptop, this failure mode isn't as likely as in my case... Fred -- Fred Clift - fclift@verio.net -- Remember: If brute force doesn't work, you're just not using enough. From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 12:27:13 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3F08D37B401; Thu, 17 Apr 2003 12:27:13 -0700 (PDT) Received: from mail.tel.fer.hr (zg07-196.dialin.iskon.hr [213.191.150.197]) by mx1.FreeBSD.org (Postfix) with ESMTP id 832FC43F3F; Thu, 17 Apr 2003 12:27:10 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from marko-tp (marko@[192.168.202.105]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3HJPFxI000836; Thu, 17 Apr 2003 21:25:20 +0200 (CEST) (envelope-from zec@tel.fer.hr) From: Marko Zec To: Terry Lambert Date: Thu, 17 Apr 2003 21:26:57 +0200 User-Agent: KMail/1.5 References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr> <3E9ED902.8BF30AA7@mindspring.com> In-Reply-To: <3E9ED902.8BF30AA7@mindspring.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200304172126.57611.zec@tel.fer.hr> cc: freebsd-fs@FreeBSD.org cc: David Schultz cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS 
(soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 19:27:13 -0000 On Thursday 17 April 2003 18:40, Terry Lambert wrote: > Marko Zec wrote: > > David Schultz wrote: > > > I was referring to all the places where rushjob is set to or > > > incremented by syncer_maxdelay. AFAIK, it should never be that > > > large. > > > > Hmm... Why? :) > > Increased latency; larger pool retention time, larger pool size, > more kernel memory tied up in dependency lists for longer, more > operations blocked because a dependency is already on the write > list, and so locked against modification. Increasing "rushjob" has only a single consequence, and that is precisely a prompt flushing of dirty buffers. Are you sure we are talking about the same code here, rushjob in kern/vfs_subr.c, or something completely different? > > > > I don't think you want to overload a low memory handling > > > mechanism with the task of syncing the disk. > > > > As far as I can see the rushjob variable is used only at one place in > > kern/vfs_subr.c to notify softupdates synching scheduler to start > > synching earlier than the normal timers would expire. I just reused > > the same mechanism to urge synching of dirty buffers when the extra > > delay timer expires, or when outstanding disk I/O occurs, to coalesce > > disk updates with occasional disk spinups. > > ...and not syncing in the normal place. > > I'm wondering if this really helps some real world situation; > my gut feeling is that it doesn't, and it increases memory use > considerably, until it's flushed. Ignoring fsync _really_ helps in real world situations, if you keep in mind that the original purpose of the patch is to keep the disk spun down and save battery power. 
> > What I'd like to see is a statistics counter of "delayed syncs" > that occur as a result of doing this, gathered over a period of > time, along with another statistics counter of "drive spindowns". > > I know that this will probably end up being observer influenced > enough to be merely anecdotal, but say gather two sets over an > extended period of use without powering the machine down; the > first set without the change, and the next set with the change. > > Either way it turns out, it would make a stronger case for or > against than just hand-waving. 8-). Such a measurement could turn out to be relevant only if one would precisely define a test load. Obviously different results could be expected if the machine would be completely idle and if it would be not. Instead of just hand-waving, could we just more closely specify what we consider a relevant load for a battery-powered laptop? :) Marko From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 12:43:47 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 978E537B404; Thu, 17 Apr 2003 12:43:47 -0700 (PDT) Received: from mail.tel.fer.hr (zg05-025.dialin.iskon.hr [213.191.138.26]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9F19C43FA3; Thu, 17 Apr 2003 12:43:45 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from marko-tp (marko@[192.168.202.105]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3HJfhxI000841; Thu, 17 Apr 2003 21:41:47 +0200 (CEST) (envelope-from zec@tel.fer.hr) From: Marko Zec To: Terry Lambert Date: Thu, 17 Apr 2003 21:43:26 +0200 User-Agent: KMail/1.5 References: <200304162310.aa96829@salmon.maths.tcd.ie> <3E9E9827.4BB19197@tel.fer.hr> <3E9EDC38.1CE381C6@mindspring.com> In-Reply-To: <3E9EDC38.1CE381C6@mindspring.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: 
<200304172143.26387.zec@tel.fer.hr> cc: freebsd-fs@freebsd.org cc: Ian Dowse cc: freebsd-stable@freebsd.org cc: Kirk McKusick Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2003 19:43:48 -0000 On Thursday 17 April 2003 18:54, Terry Lambert wrote: > Marko Zec wrote: > > Ian Dowse wrote: > > > Note that the ATA "delayed write" mechanism only delays writes while > > > the disk is spun down; at other times there is no change in behaviour. > > > Since the disk only spins down after it has been idle for a time, > > > it is very unlikely that the disk is left in an inconsistent state > > > while it is stopped. > > I'm wondering if the ATA "delayed write" actually does this, or if > it merely relaxes the cache restrictions, without retaining the > ordering enforcement. > > I suspect that it does not retain the ordering enforcement, as > there is no way to disconnect on a tagged queue write, because > you must issue a request for status, and it can't be done as a > separate ATA operation (see the posts by the Maxtor employee, on > and around January 20th of this year to the -FS list for details). > > You are much better off accumulating requests in the kernel in > buffers, and then using the normal write mechanism to push them > out to the drive ordered (IMO). That is precisely what the original OS-controlled delayed synching patch does :) > This implies a barrier and new > code above the bwrite interface, to keep the buffers from getting > locked, and stalling your applications in user space. 
> > A problem I see here is that swap is on a totally different path, > and in a different area of the disk (practically guaranteeing a > seek, and a track buffer invalidation on the disk), even if you > could cause swapping to be delayed (I don't think you can; FreeBSD > aggressively uses memory, and so when you need to swap, you *need* > to swap). > > > The OS _does_ know (approximately) when the disk is spinning and when > > not. For example, if the disk is configured to stop spinning immediately > > after the last I/O operation, the OS can safely assume 10 or more seconds > > afterwards the spinning will be stopped. The OS only has to keep record > > (in form of timestamp or something similar) when it has issued the last > > I/O request to the disk. In my patch this is accomplished using the > > stratcalls marker, which is increased every time the strategy routine of > > the ATA disk driver is invoked. Therefore the OS can also successfully > > coalesce the pending disk updates with other outstanding I/O disk > > operations, which are typically reads of uncached sectors or VM swapping. > > This is useful, but not enough. You need to actually communicate > the information above the block I/O layer, to the soft updates. I > think, effectively, what you actually want to do is to stop the > soft updates clock Hey man, that's exactly what I have done in my patch ("stopping the soft updates clock" as you call it). On the block I/O layer I'm only checking if the disk is active or not... Are you sure you have checked out the patch / code? > , rather than trying to play stupid disk tricks > with timers, etc., above and beyond what you have to do. I can see > it being useful on SCSI disks, as well, particularly where there are > temperature issues. Though in that case, you probably are more > memory starved than anything, and it will end up doing you no good. > > > I agree the ATA delayed writes is a great functionality that can help > > save battery power. 
> > I don't; only if the write order is maintained is it "great". > > > I just want to point out that it can suffer from the same > > consistency problems as the model of OS controlled delayed synching > > combined with null fsync() processing. However, if the OS controls the > > delaying of updates, you can turn on or off normal fsync() semantics as > > desired. With delaying writes in ATA firmware, you simply do not have the > > choice :) > > I think people are confusing fsync() with syncd at this point. 8-(. > > -- Terry From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 17:08:50 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 69F7437B40B; Thu, 17 Apr 2003 17:08:47 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6E12843FDD; Thu, 17 Apr 2003 17:08:46 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0101.cvx22-bradley.dialup.earthlink.net ([209.179.198.101] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196JQm-00035J-00; Thu, 17 Apr 2003 17:08:41 -0700 Message-ID: <3E9F4195.C830A6AD@mindspring.com> Date: Thu, 17 Apr 2003 17:06:45 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Marko Zec References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr> <3E9ED902.8BF30AA7@mindspring.com> <200304172126.57611.zec@tel.fer.hr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4c29abeabdf7b825298ca9fb9a6590f09548b785378294e88350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@FreeBSD.org cc: David Schultz cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org 
X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 00:08:50 -0000 Marko Zec wrote: > On Thursday 17 April 2003 18:40, Terry Lambert wrote: > > Marko Zec wrote: > > > David Schultz wrote: > > > > I was referring to all the places where rushjob is set to or > > > > incremented by syncer_maxdelay. AFAIK, it should never be that > > > > large. > > > > > > Hmm... Why? :) > > > > Increased latency; larger pool retention time, larger pool size, > > more kernel memory tied up in dependency lists for longer, more > > operations blocked because a dependency is already on the write > > list, and so locked against modification. > > Increasing "rushjob" has only a single consequence, and that is precisely a > prompt flushing of dirty buffers. Are you sure we are talking about the same > code here, rushjob in kern/vfs_subr.c, or something completely different? I'm talking about what David Schultz was talking about when you said "Hmm... Why?". 8-). If you increase the syncer delay, you increase the amount of unsynced data that's outstanding, on average, which is what makes doing it dangerous. Especially right now, where there is a lot of code that doesn't expect a NULL return from the kernel malloc, but the new kernel malloc can always return NULL. Any additional amount of memory pressure you force on things through added latency delays is Bad(tm). > > I'm wondering if this really helps some real world situation; > > my gut feeling is that it doesn't, and it increases memory use > > considerably, until it's flushed. > > Ignoring fsync _really_ helps in real world situations, if you keep in mind > that the original purpose of the patch is to keep the disk spinned down and > save battery power. I understand the original purpose; I'd still like to see stats to back up whether or not it accomplishes it. 8-). 
> > I know that this will probably end up being observer influenced > > enough to be merely anecdotal, but say gather two sets over an > > extended period of use without powering the machine down; the > > first set without the change, and the next set with the change. > > > > Either way it turns out, it would make a stronger case for or > > against than just hand-waving. 8-). > > Such a measurement could turn out to be relevant only if one would precisely > define a test load. Which is why I suggested a statistical load, instead. FreeBSD isn't well enough put together to allow you to replay an I/O load like that, particularly a sparse one, so the best you are going to be able to get is statistical significance. Actually, if you think about it, it would be hard to prove that even a repeatable sparse load was unbiased for a particular result, so you're back to gathering statistical data anyway, to create a couple of "representative" load sets. > Obviously different results could be expected if the > machine would be completely idle and if it would be not. Instead of just > hand-waving, could we just more closely specify what we consider a relevant > load for a battery-powered laptop? :) I guess that would be "any load where the fsync patch helps"? 8-) 8-). I think it would probably be better to stall the soft updates clock, flush the pending block I/O out (to unlock the buffers), and then spin down the disks under OS control. You could really guarantee relevance in that case. Anyone who complained could pick their own relevance criteria, and hack the code. 
-- Terry From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 17:18:54 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A75C037B401; Thu, 17 Apr 2003 17:18:54 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id DAD4543F75; Thu, 17 Apr 2003 17:18:53 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0101.cvx22-bradley.dialup.earthlink.net ([209.179.198.101] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196JaU-00052N-00; Thu, 17 Apr 2003 17:18:43 -0700 Message-ID: <3E9F4413.D294E69E@mindspring.com> Date: Thu, 17 Apr 2003 17:17:23 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Marko Zec References: <200304162310.aa96829@salmon.maths.tcd.ie> <3E9E9827.4BB19197@tel.fer.hr> <3E9EDC38.1CE381C6@mindspring.com> <200304172143.26387.zec@tel.fer.hr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4a048fd8ed6dd21c47885f8499087e66e3ca473d225a0f487350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@freebsd.org cc: Ian Dowse cc: freebsd-stable@freebsd.org cc: Kirk McKusick Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 00:18:55 -0000 Marko Zec wrote: > > You are much better off accumulating requests in the kernel in > > buffers, and then using the normal write mechanism to push them > > out to the drive ordered (IMO). 
> > That is precisely what the original OS-controlled delayed synching patch does > :) Yeah, but the spin-down isn't really under OS control, except as a sort of statistical hysteresis thing. 8-). The real problem that people have with the patch is that it is moderately evil, in that the fsync() doesn't block until it has actually sync'ed the data out to the disk, like fsync() is supposed to... and it lets dependent operations keep going. So you break the semantics. I think people would be happier if you just stopped the soft updates sync clock, and then if someone actually fsync()'ed, or the dependency list got too big, it spun up the disk, completed all the I/O quickly, and then spun it down again. > > This is useful, but not enough. You need to actually communicate > > the information above the block I/O layer, to the soft updates. I > > think, effectively, what you actually want to do is to stop the > > soft updates clock > > Hey man, that's exactly what I have done in my patch ("stopping the soft > updates clock" as you call it). On the block I/O layer I'm only checking if > the disk is active or not... Are you sure you have checked out the patch / > code? See above; do that AND preserve the fsync() semantics, and you'll have something (still thinking there's a confusion between fsync() semantics and syncd operation). 
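The policy suggested above (hold writes while the sync clock is stopped, but let a real fsync() or an oversized dependency list force a spin-up and a full flush) can be modelled in a few lines of Python. This is a toy model under stated assumptions, not kernel code and not the patch under discussion: the class name LazySyncer, the max_pending limit and the spinups counter are all hypothetical names introduced only for illustration.

```python
class LazySyncer:
    """Toy model: queue dirty buffers while the disk is spun down."""

    def __init__(self, max_pending):
        self.max_pending = max_pending  # hypothetical cap on queued writes
        self.pending = []               # dirty buffers, kept in issue order
        self.on_disk = []               # what has reached stable storage
        self.spinups = 0                # how often the disk had to be woken

    def write(self, block):
        # Normal writes are only queued; the sync clock is "stopped".
        self.pending.append(block)
        if len(self.pending) > self.max_pending:
            self._flush()               # dependency list got too big

    def fsync(self):
        # Blocking semantics: return only after everything is on disk.
        self._flush()
        return True

    def _flush(self):
        if not self.pending:
            return                      # nothing to do, leave the disk asleep
        self.spinups += 1               # spin up once, write it all, spin down
        self.on_disk.extend(self.pending)   # order of issue is preserved
        self.pending.clear()


syncer = LazySyncer(max_pending=3)
for name in ("a", "b", "c"):
    syncer.write(name)
print(syncer.spinups)                   # 0, the disk was never woken
syncer.fsync()
print(syncer.spinups, syncer.on_disk)   # 1 ['a', 'b', 'c']
```

The point of the model is that fsync() still does not return until the data is out, so the semantics are preserved, while ordinary writes cost no spin-ups until one of the two triggers fires.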
-- Terry From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 17:46:09 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 836C037B401; Thu, 17 Apr 2003 17:46:05 -0700 (PDT) Received: from mail.tel.fer.hr (zg06-163.dialin.iskon.hr [213.191.148.164]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0119343F3F; Thu, 17 Apr 2003 17:46:02 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from marko-tp (marko@[192.168.201.107]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3I0iBxI000859; Fri, 18 Apr 2003 02:44:13 +0200 (CEST) (envelope-from zec@tel.fer.hr) From: Marko Zec To: Terry Lambert Date: Fri, 18 Apr 2003 02:45:52 +0200 User-Agent: KMail/1.5 References: <200304162310.aa96829@salmon.maths.tcd.ie> <200304172143.26387.zec@tel.fer.hr> <3E9F4413.D294E69E@mindspring.com> In-Reply-To: <3E9F4413.D294E69E@mindspring.com> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200304180245.53107.zec@tel.fer.hr> cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 00:46:10 -0000 On Friday 18 April 2003 02:17, Terry Lambert wrote: > I think people would be happier if you just stopped the soft > updates sync clock, and then if someone actually fsync()'ed, or > the dependency list got too big, it spun up the disk, completed > all the I/O quickly, and then spun it down again. The updated patch does precisely what you just described above. It already includes a tunable vfs.ena_lazy_fsync (off by default) which allows choosing whether blocking (standard) or null- fsync() semantics apply. 
Check out http://docs.freebsd.org/cgi/getmsg.cgi?fetch=15720+0+current/freebsd-fs :) Marko From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 18:09:14 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5134837B401; Thu, 17 Apr 2003 18:09:14 -0700 (PDT) Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id 92D0A43F75; Thu, 17 Apr 2003 18:09:13 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0101.cvx22-bradley.dialup.earthlink.net ([209.179.198.101] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196KNK-0000DZ-00; Thu, 17 Apr 2003 18:09:11 -0700 Message-ID: <3E9F4FE4.9B8567DC@mindspring.com> Date: Thu, 17 Apr 2003 18:07:48 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Marko Zec References: <200304162310.aa96829@salmon.maths.tcd.ie> <200304172143.26387.zec@tel.fer.hr> <3E9F4413.D294E69E@mindspring.com> <200304180245.53107.zec@tel.fer.hr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4ca8a7942351ba88739234b2a71b4f4c7350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 01:09:14 -0000 Marko Zec wrote: > On Friday 18 April 2003 02:17, Terry Lambert wrote: > > I think people would be happier if you just stopped the soft > > updates sync clock, and then if someone actually fsync()'ed, or > > the dependency list got too big, it spun up the 
disk, completed > > all the I/O quickly, and then spun it down again. > > The updated patch does precisely what you just described above. It already > includes a tunable vfs.ena_lazy_fsync (off by default) which allows choosing > whether blocking (standard) or null- fsync() semantics apply. Check out > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=15720+0+current/freebsd-fs > :) No, you are missing my previous point: the check for free space should include a check for the number of elements *TOTAL* in all slots on the soft updates timer wheel. Otherwise it can eat all of memory. The free space check only works in the case that you've done a delete and are allocating new space: the case where you are doing more and more allocations/overwrites of data is not handled, and can grow to eat all available kernel memory. There was in fact a bug, early on, that Matt Dillon worked around that caused it under load, and it was in exactly the code you are touching. Also, the "ena_lazy_fsync" needs to be overridable, based on barriers in the dependency list: it's not acceptable to violate the POSIX semantics over trying to delay fsync(). You insert a dependency that is blocked by some other dependency already there, and you're in semantic trouble. Normally, this would be prevented by a write lock on the buffer in question, but it's not queued for write, because the wheel's not moving. The "ena_lazy_fsync" is really a problem, if it permits an operation, such as the update of a database index file to point to a new record that has been written to the database data file. At this point, fsync() is used for implied contracts. The only way you can legitimately delay it is if there isn't an implied contract, which you should be able to see as a barrier in the soft update dependency list. Under what circumstances do you find that delaying fsync() helps you? What program are you running that calls fsync()? 
I think that maybe you are running a program (like qmail) that doesn't trust the FS to comply with POSIX, so it inserts some extra fsync()'s "just in case we are running on ext2fs" or whatever. And it still needs a sysctl that counts the number of them that actually get delayed. Even if you don't use it for a statistical check, it will check you on the number of times fsync() (and sync()) get called by someone. If it's a small number, you need to fix the bogus program, rather than hack the kernel. 8-). -- Terry From owner-freebsd-fs@FreeBSD.ORG Thu Apr 17 21:46:26 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F191337B401; Thu, 17 Apr 2003 21:46:25 -0700 (PDT) Received: from harmony.village.org (rover.bsdimp.com [204.144.255.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 112C043FA3; Thu, 17 Apr 2003 21:46:25 -0700 (PDT) (envelope-from imp@bsdimp.com) Received: from localhost (warner@rover2.village.org [10.0.0.1]) by harmony.village.org (8.12.8/8.12.3) with ESMTP id h3I4kMA7086789; Thu, 17 Apr 2003 22:46:23 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Thu, 17 Apr 2003 22:46:01 -0600 (MDT) Message-Id: <20030417.224601.38718174.imp@bsdimp.com> To: cdillon@wolves.k12.mo.us From: "M. 
Warner Losh" In-Reply-To: <20030416100921.U91118@duey.wolves.k12.mo.us> References: <20030415160925.U86854@duey.wolves.k12.mo.us> <3E9D157E.96FD09AE@mindspring.com> <20030416100921.U91118@duey.wolves.k12.mo.us> X-Mailer: Mew version 2.1 on Emacs 21.2 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: freebsd-fs@freebsd.org cc: mckusick@McKusick.COM cc: das@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 04:46:26 -0000 In message: <20030416100921.U91118@duey.wolves.k12.mo.us> Chris Dillon writes: : quickly. Even with a life of two million write cycles, the : "occasional" 30-second round of updates that happen to write the same : bits over and over will give your flash part a life of only 1.9 years : (2000000 writes * 30 seconds apart = 60000000 seconds to failure). : Also, I doubt you'll actually get 2 million writes out of the average : consumer flash part. I've gotten 10M writes in the lab here on parts that didn't fail. Also, that's 2M writes per cell, and the CF parts wear average. The reason why this happens is because there are typically more than 1 cell per part. However, you are *MUCH* better off logging to a memory file system with cron. Or better yet, not running cron or not logging it at all. We log our stuff to /var/log (and don't bother logging the cron messages) and newsyslog to a small writable partition once a day or so on the average. So using this as an argument to trash fsync is not very strong. There are much better ways to deal with these issues for CF systems. You are much better off doing a read-only / with a small writable partition for things that need to be saved (we call ours /mod). 
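The endurance figures traded in this exchange are easy to check. The sketch below assumes the 2,000,000-cycle figure from the quoted estimate, treats every write as landing on the same cell (no wear leveling), and compares one write every 30 seconds with a rate of 10 writes per hour, the figure given for the systems described here. The function name is made up for the example, and the message does not say which cycle count the 20-year estimate is based on.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_wear_out(write_cycles, seconds_between_writes):
    """Years until a single cell has absorbed `write_cycles` writes."""
    return write_cycles * seconds_between_writes / SECONDS_PER_YEAR


# One write to the same cell every 30 seconds (the quoted worst case).
every_30s = years_to_wear_out(2_000_000, 30)

# Ten writes per hour, i.e. one write every 360 seconds.
ten_per_hour = years_to_wear_out(2_000_000, 3600 / 10)

print(f"{every_30s:.1f} years at one write per 30 s")
print(f"{ten_per_hour:.1f} years at 10 writes per hour")
```

Under these assumptions the two cases come out at roughly 1.9 and 22.8 years, which matches both the quoted estimate and the "in excess of 20 years" claim.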
We have a write rate of about 10 per hour, which gives our system an expected life in excess of 20 years. Our company has shipped over 200 flash systems, and we've had 3 flashes fail, all due to infant mortality... Warner From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 00:13:32 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9F57737B401; Fri, 18 Apr 2003 00:13:32 -0700 (PDT) Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com [12.233.57.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id AB73D43FB1; Fri, 18 Apr 2003 00:13:31 -0700 (PDT) (envelope-from das@FreeBSD.org) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3I7DU9E009228; Fri, 18 Apr 2003 00:13:30 -0700 (PDT) (envelope-from das@FreeBSD.org) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3I7DTni009227; Fri, 18 Apr 2003 00:13:29 -0700 (PDT) (envelope-from das@FreeBSD.org) Date: Fri, 18 Apr 2003 00:13:29 -0700 From: David Schultz To: Marko Zec Message-ID: <20030418071329.GA9125@HAL9000.homeunix.com> Mail-Followup-To: Marko Zec , freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org References: <3E976EBD.C3E66EF8@tel.fer.hr> <20030414101935.GB18110@HAL9000.homeunix.com> <3E9C5975.43755858@tel.fer.hr> <20030416101136.GA868@HAL9000.homeunix.com> <3E9E93D8.EB16ED42@tel.fer.hr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3E9E93D8.EB16ED42@tel.fer.hr> cc: freebsd-fs@FreeBSD.org cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 07:13:33 -0000 On Thu, Apr 17, 2003, Marko Zec wrote: > David 
Schultz wrote: > > > On Tue, Apr 15, 2003, Marko Zec wrote: > > > > > > > - The fiddling with rushjob seems rather arbitrary. You can probably > > > > just let the existing code increment it as necessary and force a sync > > > > if the value gets too high. > > > > > > If rushjob were not used for forcing prompt synching, the original code > > > could not guarantee the sync to occur immediately. Instead, the synching could > > > be further delayed for up to 30 seconds, which is not desirable if our major > > > design goal is to do as much disk I/O as possible in a small time interval and > > > leave the disk idle otherwise. > > > > I was referring to all the places where rushjob is set to or > > incremented by syncer_maxdelay. AFAIK, it should never be that > > large. > > Hmm... Why? :) > > > I don't think you want to overload a low memory handling > > mechanism with the task of syncing the disk. > > As far as I can see the rushjob variable is used only at one place in > kern/vfs_subr.c to notify the softupdates synching scheduler to start synching earlier > than the normal timers would expire. I just reused the same mechanism to urge > synching of dirty buffers when the extra delay timer expires, or when outstanding > disk I/O occurs, to coalesce disk updates with occasional disk spinups. When the system is low on memory or has reached a related limit, it tries to sync data to disk faster by slowly increasing the value of rushjob until the situation improves. If the syncer is able to keep up, it will process data faster and pull rushjob back down to zero. If rushjob gets too high (half the maximum sync delay, usually 15), the system resorts to other measures. Your code bumps rushjob up by the arbitrary value 32, which is rather large. Doing so is going to throw things out of whack. What you would probably want to do is leave rushjob alone. If it ever becomes nonzero, the syncer should wake up and start writing again. 
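The mechanism described in the paragraph above can be illustrated with a toy model. This is a Python sketch under stated assumptions, not the code in kern/vfs_subr.c: the constant `SYNCDELAY = 30`, the class and method names, and the "one extra slot per tick" rule are all simplifications invented here. It shows requests nudging rushjob up one step at a time, a cap at half the delay beyond which the caller has to do something else, and a suspended syncer that wakes as soon as rushjob is nonzero.

```python
SYNCDELAY = 30  # assumed sync delay in seconds; half of it is the cap


class ToySyncer:
    def __init__(self):
        self.rushjob = 0
        self.slots_processed = 0

    def speedup(self):
        """A low-memory path asks for faster syncing, one step at a time."""
        if self.rushjob < SYNCDELAY // 2:
            self.rushjob += 1
            return True
        return False  # cap reached: the caller must resort to other measures

    def tick(self, suspended=False):
        """One second of syncer time; returns the number of slots processed."""
        if suspended and self.rushjob == 0:
            return 0  # nothing is urgent, so the disk can stay spun down
        done = 1
        if self.rushjob > 0:
            # Work off one unit of urgency by processing an extra slot.
            self.rushjob -= 1
            done = 2
        self.slots_processed += done
        return done


if __name__ == "__main__":
    s = ToySyncer()
    print(s.tick(suspended=True))   # idle: no work while rushjob is zero
    s.speedup()
    print(s.tick(suspended=True))   # nonzero rushjob wakes the syncer
    print(s.rushjob)                # pulled back down to zero
```

In this model a single large bump (such as adding 32 at once) would exceed the cap that `speedup()` enforces, which is the kind of interference with the low-memory path that the message warns about.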
If you would like to write the data out more quickly whenever the disks start up so you can make them spin down again, look at softdep_request_cleanup() in -CURRENT. But really, even getting fsync() to do *remotely* the right thing (i.e. correct ordering but no guarantee of writing data to stable storage when in power save mode) is going to be *really*hard*. Warner has a much better suggestion. From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 05:49:20 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4C2BE37B401; Fri, 18 Apr 2003 05:49:20 -0700 (PDT) Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com [12.233.57.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id 949E343F3F; Fri, 18 Apr 2003 05:49:19 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3ICnG9E011022; Fri, 18 Apr 2003 05:49:16 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3ICnEHq011021; Fri, 18 Apr 2003 05:49:15 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Date: Fri, 18 Apr 2003 05:49:14 -0700 From: David Schultz To: Terry Lambert Message-ID: <20030418124914.GA10979@HAL9000.homeunix.com> Mail-Followup-To: Terry Lambert , Marko Zec , freebsd-fs@freebsd.org, Ian Dowse , freebsd-stable@freebsd.org, Kirk McKusick References: <200304162310.aa96829@salmon.maths.tcd.ie> <3E9E9827.4BB19197@tel.fer.hr> <3E9EDC38.1CE381C6@mindspring.com> <200304172143.26387.zec@tel.fer.hr> <3E9F4413.D294E69E@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3E9F4413.D294E69E@mindspring.com> cc: freebsd-fs@FreeBSD.ORG cc: freebsd-stable@FreeBSD.ORG cc: Ian Dowse cc: Kirk McKusick Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 12:49:20 -0000 On Thu, Apr 17, 2003, Terry Lambert wrote: > Marko Zec wrote: > > > You are much better off accumulating requests in the kernel in > > > buffers, and then using the normal write mechanism to push them > > > out to the drive ordered (IMO). > > > > That is precisely what the original OS-controlled delayed synching patch does > > :) > > Yeah, but the spin-down isn't really under OS control, except > as a sort of statistical hysteresis thing. 8-). The OS can know exactly when the disk is spinning if it tells the disk not to timeout all by itself with the IDLE command, and explicitly tells it to IDLE IMMEDIATE at the appropriate time. But being exact about this isn't particularly important. As for the ATA delayed write feature, I don't believe it will guarantee consistency. This is true even if the drive doesn't reorder writes, which it is free to do. Consider a correctness constraint given by the partial ordering of blocks A->B->A. That is, we have to first make a change to block A, then update block B, then make a different change to block A. This is going to be fairly common if a fair number of writes are queued; it happens whenever an editor saves a file using the correct fsync/rename sequence, for instance. The disk will coalesce the two writes to A in its cache and therefore violate the constraint. 
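The A->B->A argument can be checked with a small simulation. The Python sketch below is only a model of a coalescing write cache (the function names and the block contents `a1`, `b1`, `a2` are invented for the example). It lists the on-disk states that are legal when the three writes land strictly in order, then shows that once the two writes to A are merged, either flush order passes through a state that the ordering constraint forbids.

```python
def ordered_states(writes):
    """On-disk states reachable when writes are applied strictly in order."""
    disk, states = {}, [{}]
    for block, data in writes:
        disk[block] = data
        states.append(dict(disk))
    return states


def coalesce(writes):
    """Model a write cache that keeps only the newest data for each block."""
    pending = {}
    for block, data in writes:
        pending[block] = data
    return pending


def crash_states(pending, flush_order):
    """States a crash could expose while the coalesced cache is flushed."""
    disk, states = {}, [{}]
    for block in flush_order:
        disk[block] = pending[block]
        states.append(dict(disk))
    return states


writes = [("A", "a1"), ("B", "b1"), ("A", "a2")]
legal = ordered_states(writes)
pending = coalesce(writes)

for order in (["A", "B"], ["B", "A"]):
    bad = [s for s in crash_states(pending, order) if s not in legal]
    print(order, "exposes illegal states:", bad)
```

Flushing A first puts the second change to A on disk without B; flushing B first puts B on disk without the first change to A. A crash at that moment leaves a state the file system never intended to be visible.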
From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 09:24:38 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 63CEE37B405; Fri, 18 Apr 2003 09:24:38 -0700 (PDT) Received: from gatekeeper.oremut01.us.wh.verio.net (gatekeeper.oremut01.us.wh.verio.net [198.65.168.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id ACEC543FB1; Fri, 18 Apr 2003 09:24:36 -0700 (PDT) (envelope-from fclift@verio.net) Received: from mx.dmz.orem.verio.net (mx.dmz.orem.verio.net [10.1.1.10]) by gatekeeper.oremut01.us.wh.verio.net (Postfix) with ESMTP id F27713BF437; Fri, 18 Apr 2003 10:24:35 -0600 (MDT) Received: from vespa.dmz.orem.verio.net (vespa.dmz.orem.verio.net [10.1.1.59]) by mx.dmz.orem.verio.net (8.11.6p2/8.11.6) with ESMTP id h3IGOZJ30971; Fri, 18 Apr 2003 10:24:35 -0600 (MDT) Date: Fri, 18 Apr 2003 10:28:24 -0600 (MDT) From: Fred Clift X-X-Sender: To: David Schultz In-Reply-To: <20030418124914.GA10979@HAL9000.homeunix.com> Message-ID: <20030418101259.M49571-100000@vespa.dmz.orem.verio.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 16:24:38 -0000 On Fri, 18 Apr 2003, David Schultz wrote: > explicitly tells it to IDLE IMMEDIATE at the appropriate time. > But being exact about this isn't particularly important. 
I think it might be nice to have something like this that immediately spins the disk down after the burst of writes - though, if I remember correctly, keeping a disk spinning takes far, far less power than spinning it up, so shutting down the disk 3 minutes earlier than you otherwise might won't be that big of a power savings compared to avoiding spinning it up so much. > As for the ATA delayed write feature, I don't believe it will > guarantee consistency. This is true even if the drive doesn't There has been a lot of talk on this thread about how the (not-enabled-by-default) fsync portion of this patch violates the 'fsync contract' and violates guarantees of consistency. As was stated by the creator of the patch, this is intended to only be used in situations where it is relatively 'unimportant' to have these guarantees. His typical usage is on a non-mission-critical machine (his laptop) that doesn't contain data which, _when_lost_, is going to be irreplaceable. There have been many objections about various databases not getting updates, qmail/sendmail losing mail, vi removing/overwriting a file, etc., but apparently these are not the cases for which this patch was designed. If a person cared about these possibilities, he wouldn't turn this functionality on. If on the other hand, a person were stuck at the doctor's office waiting room, with low battery, playing nethack, then perhaps this patch is just what you want. Can we stop going on and on about how terrible this patch is for 'important' and 'unrecoverable' data? This patch should not be used on any machine that has irreplaceable data. If I were using this on my laptop working on code and I LOST my changes, I can always cvs update to get the file back and start working again, having lost 30 minutes of work. Of course my laptop doesn't get major mission-critical use either... 
On the other hand _if_ the patch could be slightly modified to still guarantee fsync semantics (when qmail writes mail, vi overwrites a file, or mysql updates a table, etc) so that data would be safer, but not significantly degrade the utility of the patch then I'd say let's 1) make the small change (i.e. disk spin-up/write/spin-down on every fsync? will this take more power than it is worth?) and 2) incorporate this into FreeBSD and let people get on with using it! (It doesn't have to be committed into the tree to get use, but it certainly would get much more use this way.) Fred -- Fred Clift - fclift@verio.net -- Remember: If brute force doesn't work, you're just not using enough. From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 09:46:26 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 22FA537B405; Fri, 18 Apr 2003 09:46:26 -0700 (PDT) Received: from pa-plum1b-166.pit.adelphia.net (pa-plum1b-122.pit.adelphia.net [24.53.161.122]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2EECF43FCB; Fri, 18 Apr 2003 09:46:25 -0700 (PDT) (envelope-from wmoran@potentialtech.com) Received: from potentialtech.com (working [172.16.0.95]) h3IGkNwl000376; Fri, 18 Apr 2003 12:46:24 -0400 (EDT) (envelope-from wmoran@potentialtech.com) Message-ID: <3EA02BDF.7020306@potentialtech.com> Date: Fri, 18 Apr 2003 12:46:23 -0400 From: Bill Moran User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.2.1) Gecko/20030301 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Fred Clift References: <20030418101259.M49571-100000@vespa.dmz.orem.verio.net> In-Reply-To: <20030418101259.M49571-100000@vespa.dmz.orem.verio.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-fs@freebsd.org cc: David Schultz cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org 
X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 16:46:26 -0000 Fred Clift wrote: > On Fri, 18 Apr 2003, David Schultz wrote: > >>explicitly tells it to IDLE IMMEDIATE at the appropriate time. >>But being exact about this isn't particularly important. > > I think it might be nice to have something like this that immediately > spins the disk down after the burst of writes - though, if I remember > correctly, keeping a disk spinning takes far, far less power than spinning > it up, so shutting down the disk 3 minutes earlier than you otherwise > might won't be that big of a power savings compared to avoiding spinning it > up so much. > >>As for the ATA delayed write feature, I don't believe it will >>guarantee consistency. This is true even if the drive doesn't > > There has been a lot of talk on this thread about how the > (not-enabled-by-default) fsync portion of this patch violates the 'fsync > contract' and violates guarantees of consistency. As was stated by the > creator of the patch, this is intended to only be used in situations where > it is relatively 'unimportant' to have these guarantees. His typical > usage is on a non-mission-critical machine (his laptop) that doesn't > contain data which, _when_lost_, is going to be irreplaceable. > > There have been many objections about various databases not getting > updates, qmail/sendmail losing mail, vi removing/overwriting a file, etc., > but apparently these are not the cases for which this patch was designed. > If a person cared about these possibilities, he wouldn't turn this > functionality on. > > If on the other hand, a person were stuck at the doctor's office waiting > room, with low battery, playing nethack, then perhaps this patch is just > what you want. > > Can we stop going on and on about how terrible this patch is for > 'important' and 'unrecoverable' data? 
This patch should not be used on > any machine that has irreplaceable data. If I were using this on my laptop > working on code and I LOST my changes, I can always cvs update to get the > file back and start working again, having lost 30 minutes of work. Of > course my laptop doesn't get major mission-critical use either... > > On the other hand _if_ the patch could be slightly modified to still > guarantee fsync semantics (when qmail writes mail, vi overwrites a file, > or mysql updates a table, etc) so that data would be safer, but not > significantly degrade the utility of the patch then I'd say let's 1) make > the small change (i.e. disk spin-up/write/spin-down on every fsync? will > this take more power than it is worth?) and 2) incorporate this into > FreeBSD and let people get on with using it! (It doesn't have to be > committed into the tree to get use, but it certainly would get much more > use this way.) I've been following this thread for a while out of curiosity. I understand the dangers of suicidal fsync, and I understand the benefits. I know this isn't normally the kind of thing that should get said on these lists, but if anyone is taking a vote, I agree with Fred 100%. Include the functionality, document the dangers, and leave it off by default. Despite the hundred-billion places where it would be a bad idea, I feel there are a number of places where it would be helpful. 
-- Bill Moran Potential Technologies http://www.potentialtech.com From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 11:13:36 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F2CF937B401; Fri, 18 Apr 2003 11:13:35 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 26CA443FD7; Fri, 18 Apr 2003 11:13:35 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0577.cvx22-bradley.dialup.earthlink.net ([209.179.200.67] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196aMd-0000Nt-00; Fri, 18 Apr 2003 11:13:32 -0700 Message-ID: <3EA03FF1.280B6810@mindspring.com> Date: Fri, 18 Apr 2003 11:12:01 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: David Schultz References: <200304162310.aa96829@salmon.maths.tcd.ie> <3E9E9827.4BB19197@tel.fer.hr> <3E9EDC38.1CE381C6@mindspring.com> <200304172143.26387.zec@tel.fer.hr> <3E9F4413.D294E69E@mindspring.com> <20030418124914.GA10979@HAL9000.homeunix.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a491ef9124fa8972c225596c66c40ce4b6350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@FreeBSD.ORG cc: freebsd-stable@FreeBSD.ORG cc: Ian Dowse cc: Kirk McKusick Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 18:13:36 -0000 David Schultz wrote: > > Yeah, but the spin-down isn't really under OS control, except > > as a sort of statistical hysteresis thing. 8-). 
> > The OS can know exactly when the disk is spinning if it tells the > disk not to timeout all by itself with the IDLE command, and > explicitly tells it to IDLE IMMEDIATE at the appropriate time. > But being exact about this isn't particularly important. As it sits, the implementation is via a timer that is not under OS control. It would be nice if it used this method, instead, since it would allow anyone who wanted to, to implement a "policy", if the default policy bothered them (e.g. do it when the screen saver kicks on, or do it when there haven't been any mouse/keyboard input events for XX seconds, etc. -- you could even hook this to whether the delayed fsync is active or not, which seems a better time for it to be active, anyway). > As for the ATA delayed write feature, I don't believe it will > guarantee consistency. It doesn't. I checked, after voicing my suspicions of it. > This is true even if the drive doesn't > reorder writes, which it is free to do. Consider a correctness > constraint given by the partial ordering of blocks A->B->A. That > is, we have to first make a change to block A, then update block > B, then make a different change to block A. This is going to be > fairly common if a fair number of writes are queued; it happens > whenever an editor saves a file using the correct fsync/rename > sequence, for instance. The disk will coalesce the two writes to > A in its cache and therefore violate the constraint. You can't turn the reordering off, and your example is exactly the "barrier" case I had previously described. 8-). 
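For readers unfamiliar with the "fsync/rename sequence" mentioned in the quoted text, here is a minimal Python sketch of the usual pattern on a POSIX system. It is a generic illustration, not the code of any particular editor, and the helper name `safe_save` is invented. The new contents are written to a temporary file, fsync() makes them durable, and only then does the rename replace the old copy, so a crash leaves either the old file or the new one.

```python
import os
import tempfile


def safe_save(path, data):
    """Replace `path` with `data` so a crash leaves the old or the new copy."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)   # same directory, same file system
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())    # new contents must be stable before the rename
        os.replace(tmp, path)       # atomic rename over the old copy
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry as well (works on POSIX systems).
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "notes.txt")
        safe_save(target, b"first draft\n")
        safe_save(target, b"second draft\n")
        with open(target, "rb") as f:
            print(f.read())
```

The whole scheme depends on fsync() not returning until the data is on stable storage, which is why a no-op fsync() removes the protection the rename step relies on.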
-- Terry From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 13:43:18 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 96CF237B401; Fri, 18 Apr 2003 13:43:18 -0700 (PDT) Received: from mail.tel.fer.hr (zg04-042.dialin.iskon.hr [213.191.137.43]) by mx1.FreeBSD.org (Postfix) with ESMTP id CF01243F85; Fri, 18 Apr 2003 13:43:15 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from marko-tp (marko@[192.168.201.107]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3IKfPxI000931; Fri, 18 Apr 2003 22:41:26 +0200 (CEST) (envelope-from zec@tel.fer.hr) From: Marko Zec To: David Schultz Date: Fri, 18 Apr 2003 22:43:05 +0200 User-Agent: KMail/1.5 References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr> <20030418071329.GA9125@HAL9000.homeunix.com> In-Reply-To: <20030418071329.GA9125@HAL9000.homeunix.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-2" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200304182243.05739.zec@tel.fer.hr> cc: freebsd-fs@FreeBSD.org cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 20:43:18 -0000 On Friday 18 April 2003 09:13, David Schultz wrote: > When the system is low on memory or has reached a related limit, > it tries to sync data to disk faster by slowly increasing the > value of rushjob until the situation improves. If the syncer is > able to keep up, it will process data faster and pull rushjob back > down to zero. True. > If rushjob gets too high (half the maximum sync > delay, usually 15), the system resorts to other measures. Which measures, and in which cases? 
The only two chunks of code in the entire -stable kernel that probe the value of rushjob (indirectly through invoking speedup_syncer()) are newdirrem() and inodedep_lookup() in ufs/ffs/ffs_softdep.c. Neither of these two will either corrupt a single bit of data or crash the system if rushjob gets higher than max syncdelay / 2. > Your code bumps rushjob up by the arbitrary value 32, which is > rather large. Doing so is going to throw things out of whack. Which things and how? > What you would probably want to do is leave rushjob alone. If it > ever becomes nonzero, the syncer should wake up and start writing > again. Sure, that's precisely why I increment rushjob - to instruct the syncer to start synching when I want it to. What's wrong with that? > If you would like to write the data out more quickly > whenever the disks start up so you can make them spin down again, > look at softdep_request_cleanup() in -CURRENT. > > But really, even getting fsync() to do *remotely* the right thing > (i.e. correct ordering but no guarantee of writing data to stable > storage when in power save mode) is going to be *really*hard*. > Warner has a much better suggestion. If I'm not mistaken, Warner was talking about using a memory-based FS and periodically synching it to a flash-based device. Such a concept is perfectly sane for appliances using solid-state disks; however, I don't see how it can be applied to a typical laptop. 
Marko From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 14:08:07 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 78E7337B40B; Fri, 18 Apr 2003 14:08:07 -0700 (PDT) Received: from mail.tel.fer.hr (zg07-053.dialin.iskon.hr [213.191.150.54]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7A04F43FE9; Fri, 18 Apr 2003 14:08:05 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from marko-tp (marko@[192.168.201.107]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3IL6CxI000936; Fri, 18 Apr 2003 23:06:16 +0200 (CEST) (envelope-from zec@tel.fer.hr) From: Marko Zec To: Terry Lambert Date: Fri, 18 Apr 2003 23:07:53 +0200 User-Agent: KMail/1.5 References: <200304162310.aa96829@salmon.maths.tcd.ie> <200304180245.53107.zec@tel.fer.hr> <3E9F4FE4.9B8567DC@mindspring.com> In-Reply-To: <3E9F4FE4.9B8567DC@mindspring.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-2" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200304182307.53890.zec@tel.fer.hr> cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 21:08:08 -0000 On Friday 18 April 2003 03:07, Terry Lambert wrote: > No, you are missing my previous point: the check for free space > should include a check for the number of elements *TOTAL* in all slots > on the soft updates timer wheel. Otherwise it can eat all of > memory. > > The free space check only works in the case that you've done a > delete and are allocating new space: the case where you are doing > more and more allocations/overwrites of data is not handled, and > can grow to eat all available kernel memory. 
There was in fact a > bug, early on, that Matt Dillon worked around that caused it under > load, and it was in exactly the code you are touching. If what you are saying were true, then one could simply crash an _unpatched_ system by doing a lot of FS write operations. What my patch does is just temporarily suspend the softupdates "wheel", as you call it. However, if VM or another ffs subsystem indicates (by increasing the value of rushjob) that buffers should get flushed more frequently, then my patch will _immediately_ drop out of the delay loop and allow the syncing to proceed ASAP. I really do not see what can be wrong with such a concept. > > Under what circumstances do you find that delaying fsync() > helps you? What program are you running that calls fsync()? The vi editor, pretty much every e-mail client, and so on... > Even if you don't use it for a > statistical check, it will check you on the number of times > fsync() (and sync()) get called by someone. If it's a small > number, you need to fix the bogus program, rather than hack > the kernel. 8-). No, those programs are not bogus, and neither is the kernel. I just want to have a method to keep the damn disk spun down, that's all. 
Marko From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 14:21:33 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BDD4B37B401; Fri, 18 Apr 2003 14:21:33 -0700 (PDT) Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2300C43F85; Fri, 18 Apr 2003 14:21:33 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0240.cvx22-bradley.dialup.earthlink.net ([209.179.198.240] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196dIY-0005ad-00; Fri, 18 Apr 2003 14:21:31 -0700 Message-ID: <3EA06C07.A34F1C31@mindspring.com> Date: Fri, 18 Apr 2003 14:20:07 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Marko Zec References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr> <200304182243.05739.zec@tel.fer.hr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a435745805e1197074fb358af05677c83f667c3043c0873f7e350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@FreeBSD.org cc: David Schultz cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 21:21:34 -0000 Marko Zec wrote: > On Friday 18 April 2003 09:13, David Schultz wrote: > > Your code bumps rushjob up by the arbitrary value 32, which is > > rather large. Doing so is going to throw things out of whack. > > Which things and how? > > > What you would probably want to do is leave rushjob alone. 
If it > > ever becomes nonzero, the syncer should wake up and start writing > > again. > > Sure, that's precisely why I increment rushjob - to instruct the syncer to > start synching when I want it to. What's wrong with that? Touching rushjob is probably not a good idea. The main technical (not philosophical) problem with the patch as it sits is that you can cause the soft updates wheel to wrap around. Then when you write things out, they write out of order. The purpose of the wheel is to allow placing of operations at some relative offset in the future to an outstanding operation, to ensure ordering. No matter what else you do, you can not allow the wheel to "wrap". Because the offsets are "future relative", that means that you have to flush at some number of wheel entries equal to: wrap_boundary - the_largest_potential_future_offset - 1. Making the wheel bigger is probably acceptable, but then you will exacerbate the memory problem that rushjob was invented to resolve (please do a "cvs log" and look at the checkin comments; I still believe it was "dillon" who made the change). 
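For readers following the arithmetic, the wrap constraint described above can be sketched in user space. This is only an illustration under stated assumptions, not code from the patch or from the kernel: the 32-slot wheel size and the names WHEEL_SIZE, wheel_slot(), ticks_until() and flush_threshold() are hypothetical, and the slot formula is modelled on the "future relative" offsets and the power-of-2 mask mentioned in this thread.

```c
#include <assert.h>

#define WHEEL_SIZE 32                  /* assumed size; must be a power of 2 */
#define WHEEL_MASK (WHEEL_SIZE - 1)

/* Slot for an item that is due `delay` ticks after the hand position. */
int wheel_slot(int hand, int delay)
{
    assert(delay >= 0);
    return (hand + delay) & WHEEL_MASK;
}

/* Number of ticks the hand needs to travel from `hand` to reach `slot`. */
int ticks_until(int hand, int slot)
{
    return (slot - hand + WHEEL_SIZE) & WHEEL_MASK;
}

/*
 * The bound quoted in the message above:
 * wrap_boundary - the_largest_potential_future_offset - 1.
 */
int flush_threshold(int wrap_boundary, int largest_future_offset)
{
    return wrap_boundary - largest_future_offset - 1;
}
```

With the hand at slot 30, an item due in 5 ticks lands in slot 3 and is still reached 5 ticks later, so ordering survives the index wrapping. An offset of a full WHEEL_SIZE, however, aliases to the slot the hand is already on, which is the kind of wrap the bound is meant to rule out.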
-- Terry From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 14:24:56 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5C79B37B401; Fri, 18 Apr 2003 14:24:56 -0700 (PDT) Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD3B543FD7; Fri, 18 Apr 2003 14:24:55 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0240.cvx22-bradley.dialup.earthlink.net ([209.179.198.240] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196dLp-00061Q-00; Fri, 18 Apr 2003 14:24:54 -0700 Message-ID: <3EA06CD2.E299D864@mindspring.com> Date: Fri, 18 Apr 2003 14:23:30 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Marko Zec References: <200304162310.aa96829@salmon.maths.tcd.ie> <200304180245.53107.zec@tel.fer.hr> <3E9F4FE4.9B8567DC@mindspring.com> <200304182307.53890.zec@tel.fer.hr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4fc7764252737ce9c695e8026c7825103667c3043c0873f7e350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@freebsd.org cc: freebsd-stable@freebsd.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 21:24:56 -0000 Marko Zec wrote: > On Friday 18 April 2003 03:07, Terry Lambert wrote: > > No, you are missing my previous point: the check for free space > > should include a check for number of elements *TOTAL* in all slots > > on the soft updates timer wheel. Otherwise it can eat all of > > memory. 
> > > > The free space check only works in the case that you've done a > > delete and are allocating new space: the case where you are doing > > more and more allocations/overwrites of data is not handled, and > > can grow to eat all available kernel memory. There was in fact a > > bug, early on, that Matt Dillon worked around that caused it under > > load, and it was in exactly the code you are touching. > > If what you are saying were true, then one could simply crash an _unpatched_ > system by doing a lot of FS write operations. No. See the checkin comments for "rushjob". > What my patch does is that it > just temporarily suspends the softupdates "wheels" as you call it. However, > if VM or another ffs subsystem indicates (by increasing the value of rushjob) > that buffers should get flushed more frequently, then my patch will > _immediately_ drop out of the delay loop and allow the syncing to proceed > ASAP. I really do not see what can be wrong with such a concept? No. See my last posting: the wheel cannot be allowed to "wrap". 
-- Terry From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 14:49:13 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EBB3937B404; Fri, 18 Apr 2003 14:49:12 -0700 (PDT) Received: from mail.tel.fer.hr (zg06-176.dialin.iskon.hr [213.191.148.177]) by mx1.FreeBSD.org (Postfix) with ESMTP id E960443FB1; Fri, 18 Apr 2003 14:49:10 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from marko-tp (marko@[192.168.201.107]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3ILlGxI000941; Fri, 18 Apr 2003 23:47:21 +0200 (CEST) (envelope-from zec@tel.fer.hr) From: Marko Zec To: Terry Lambert Date: Fri, 18 Apr 2003 23:48:58 +0200 User-Agent: KMail/1.5 References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182243.05739.zec@tel.fer.hr> <3EA06C07.A34F1C31@mindspring.com> In-Reply-To: <3EA06C07.A34F1C31@mindspring.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-2" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200304182348.58356.zec@tel.fer.hr> cc: freebsd-fs@FreeBSD.org cc: David Schultz cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2003 21:49:13 -0000 On Friday 18 April 2003 23:20, Terry Lambert wrote: > > Sure, that's precisely why I increment rushjob - to instruct the syncer > > to start synching when I want it to. What's wrong with that? > > Touching rushjob is probably not a good idea. > > The main technical (not philosophical) problem with the patch > as it sits is that you can cause the soft updates wheel to wrap > around. No, that just cannot happen. You are probably confusing rushjob with syncer_delayno, which gets reset to 0 each time it reaches the value of syncer_maxdelay. 
The rushjob variable simply tells the syncer how many times it should iterate _sequentially_ through the softupdates queues before going to sleep on lbolt. > > Then when you write things out, they write out of order. Uhh.. NO! > The purpose of the wheel is to allow placing of operations at > some relative offset in the future to an outstanding operation, > to ensure ordering. True. And this has not changed with my patch. > No matter what else you do, you can not allow the wheel to > "wrap". Because the offsets are "future relative", that means > that you have to flush at some number of wheel entries equal > to: > > wrap_boundary - the_largest_potential_future_offset - 1. > > Making the wheel bigger is probably acceptable, but then you > will exacerbate the memory problem that rushjob was invented > to resolve (please do a "cvs log" and look at the checkin > comments; I still believe it was "dillon" who made the change). Where did you get the idea I'm making the wheel bigger? The size of the softupdates "wheel" is determined by the value of syncer_maxdelay, which I not only haven't touched at all, but which is also completely unrelated to the rushjob variable. If it is of any relevance for this discussion, I want to add that I've been running my system with extended delaying all the time for the last two weeks (even when on AC power). I have had absolutely no problems, nor have I lost a single bit of data, even during the most stressful tests such as untarring of huge archives, or making the kernel etc. Not to mention this is also my primary and "production" machine, with all my e-mail on it etc. 
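The behaviour described in this message (syncer_delayno wrapping to 0 at syncer_maxdelay, and rushjob requesting extra sequential passes before the sleep) can be modelled roughly as follows. This is a user-space sketch based only on that description, not an excerpt from kern/vfs_subr.c; the struct, the function name syncer_run_until_sleep() and the wheel size of 32 are assumptions made for illustration.

```c
#include <assert.h>

#define SYNCER_MAXDELAY 32    /* assumed wheel size for this model */

struct syncer_model {
    int delayno;              /* models syncer_delayno: next slot to process */
    int rushjob;              /* models rushjob: extra passes requested */
};

/*
 * Run the loop up to the point where the syncer would go to sleep.
 * visited[] receives the slot indices in the order they were processed;
 * the return value is the number of slots processed.
 */
int syncer_run_until_sleep(struct syncer_model *s, int *visited, int cap)
{
    int n = 0;

    for (;;) {
        assert(n < cap);
        visited[n++] = s->delayno;       /* "flush" this slot */
        s->delayno += 1;
        if (s->delayno == SYNCER_MAXDELAY)
            s->delayno = 0;              /* the index wraps, the order does not change */
        if (s->rushjob > 0) {
            s->rushjob -= 1;             /* one more pass without sleeping */
            continue;
        }
        break;                           /* the real loop would sleep here */
    }
    return n;
}
```

Starting at slot 30 with rushjob set to 3, the model processes slots 30, 31, 0 and 1 in that order and then stops with rushjob back at 0. It says nothing about where new items are inserted while the loop is suspended.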
Marko From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 17:36:05 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E338837B405; Fri, 18 Apr 2003 17:36:05 -0700 (PDT) Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com [12.233.57.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id E2D7743FE9; Fri, 18 Apr 2003 17:36:04 -0700 (PDT) (envelope-from das@FreeBSD.org) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3J0Zx9E012929; Fri, 18 Apr 2003 17:35:59 -0700 (PDT) (envelope-from das@FreeBSD.org) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3J0Zwnj012928; Fri, 18 Apr 2003 17:35:58 -0700 (PDT) (envelope-from das@FreeBSD.org) Date: Fri, 18 Apr 2003 17:35:58 -0700 From: David Schultz To: Fred Clift Message-ID: <20030419003558.GA12856@HAL9000.homeunix.com> Mail-Followup-To: Fred Clift , freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org References: <20030418124914.GA10979@HAL9000.homeunix.com> <20030418101259.M49571-100000@vespa.dmz.orem.verio.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030418101259.M49571-100000@vespa.dmz.orem.verio.net> cc: freebsd-fs@FreeBSD.org cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2003 00:36:06 -0000 On Fri, Apr 18, 2003, Fred Clift wrote: > There have been many objections about various databases not getting > updates, qmail/sendmail loosing mail, vi removing/overwirting a file, etc, > but aparently these are not the cases for which this patch was designed. 
> If a person cared about these possiblities, he wouldn't turn this > functionality on. > > If on the other hand, a person were stuck at the doctor's office waiting > room, with low battery, playing nethack, then perhaps this patch is just > what you want. If you're in the doctor's office writing a long letter, and following a crash you find that not only the latest changes but the *entire* *file* just vanished, you might not be such a happy camper. If you leave fsync() alone, your computer will do exactly what you want it to do. It will guarantee that *some* version of the file is on disk, and when you tell your editor to save, it will guarantee that the *latest* version is on disk. So if you want the disk to stay in power save mode, you just don't ask your editor to write it to disk. If you're playing nethack, on the other hand, you won't be fsyncing anyway because nethack doesn't have state that's vitally important. From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 18:30:50 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1989937B404; Fri, 18 Apr 2003 18:30:50 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1B39943FE9; Fri, 18 Apr 2003 18:30:48 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0114.cvx21-bradley.dialup.earthlink.net ([209.179.192.114] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 196hBj-0007jF-00; Fri, 18 Apr 2003 18:30:44 -0700 Message-ID: <3EA0A647.BEC5931A@mindspring.com> Date: Fri, 18 Apr 2003 18:28:39 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Marko Zec References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182243.05739.zec@tel.fer.hr> <3EA06C07.A34F1C31@mindspring.com> 
<200304182348.58356.zec@tel.fer.hr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a419d7a1c3f61e46c70ea0413bc5b08d503ca473d225a0f487350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@FreeBSD.org cc: David Schultz cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2003 01:30:50 -0000 Marko Zec wrote: > > The main technical (not philosophical) problem with the patch > > as it sits is that you can cause the soft updates wheel to wrap > > around. > > No, that just cannot happen. You are probably confusing rushjob with > syncer_delayno, which gets reset to 0 each time it reaches the value of > syncer_maxdelay. The rushjob variable simply tells the syncer how many times > it should iterate _sequentially_ through the softupdates queues before > getting to sleep on lbolt. Obviously I am not explaining myself correctly. I guess the next step would be to offer my own patch set for doing what you are trying to do. Before I do that, let me try one more time. I think that it is important that the value of syncer_delayno needs to continue to be incremented once a second, and that the modified sched_sync(), which with your patch no longer does this, needs to used it's own counter. In other words, I think that you need to implement a two handed clock algorithm, to keep the buckets from getting too deep with work items, and in case there is some dependency which is not being accounted for that has been working because there is an implicit delay of 1 second or more in vn_syncer_add_to_worklist() calls (your patch would break this, so without taking this into account, we would have to retest all of soft updates). 
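One way to picture the two-handed clock suggested above is the following user-space sketch. It is an illustration under stated assumptions, not a patch: the names two_hand_clock, clock_add(), clock_lag() and clock_tick(), the 32-slot wheel and the max_lag parameter are all hypothetical. The insertion hand advances on every tick, while the flush hand may fall behind during a power-save stall, but never by more than max_lag slots.

```c
#include <assert.h>
#include <stdbool.h>

#define WHEEL_SIZE 32                  /* assumed size; must be a power of 2 */
#define WHEEL_MASK (WHEEL_SIZE - 1)

struct two_hand_clock {
    int insert_hand;                   /* advances every tick ("now") */
    int flush_hand;                    /* next slot to be written out */
    int pending[WHEEL_SIZE];           /* work items queued per slot */
};

/* How many slots the flush hand is behind the insertion hand. */
int clock_lag(const struct two_hand_clock *c)
{
    return (c->insert_hand - c->flush_hand + WHEEL_SIZE) & WHEEL_MASK;
}

/* Queue one work item `delay` ticks in the future. */
void clock_add(struct two_hand_clock *c, int delay)
{
    /* The new item must not land in a slot the flush hand still has to pass. */
    assert(delay >= 0 && delay + clock_lag(c) < WHEEL_SIZE);
    c->pending[(c->insert_hand + delay) & WHEEL_MASK]++;
}

/*
 * One tick. The insertion hand always moves; the flush hand catches up
 * unless flushing is suspended, and even then it is forced forward once
 * the lag exceeds max_lag. Returns the number of items flushed.
 */
int clock_tick(struct two_hand_clock *c, bool suspended, int max_lag)
{
    int flushed = 0;

    c->insert_hand = (c->insert_hand + 1) & WHEEL_MASK;
    while (clock_lag(c) > 0 && (!suspended || clock_lag(c) > max_lag)) {
        flushed += c->pending[c->flush_hand];
        c->pending[c->flush_hand] = 0;
        c->flush_hand = (c->flush_hand + 1) & WHEEL_MASK;
    }
    return flushed;
}
```

In this model the largest insertion delay plus max_lag has to stay below WHEEL_SIZE, which is why a longer stall would need either a bigger wheel or a tighter bound on the delay.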
> > Then when you write things out, they write out of order. > > Uhh.. NO! Uh, yes; potentially they do. See the implicit dependency situation described above. There are other cases, too, but they are much more complicated. I wish Kirk would speak up in more technical detail about the problems you are potentially introducing; they require a deep understanding of the soft updates code. > > The purpose of the wheel is to allow placing of operations at > > some relative offset in the future to an outstanding operation, > > to ensure ordering. > > True. And this has not changed with my patch. No, it has changed. It's changed in the depth of the queue entries, it's changed in the relative spacing of implicit dependencies, and it's changed in the relative depth, for two or more dependent operations with future offsets. In the depth case, when the code runs, it is going to stall the system for a really long time, relatively, because there are a number of worklists which are *substantially* deep, because vn_syncer_add_to_worklist() was using a syncer_delayno that has been assumed to be updated once a second, and never changed during your stall. This means that the worklist represented by syncer_workitem_pending[syncer_delayno] is going to contain *almost all work* that was enqueued in the interim. The problem with this is that in the for(;;) loop in sched_sync() in the "if (LIST_FIRST(slp) == vp)" code block, you are likely to run yourself into a panic. See the comment about "sync_fsync() moves it to a different slot so we are safe"? That comment is no longer true. > > No matter what else you do, you can not allow the wheel to > > "wrap". Because the offsets are "future relative", that means > > that you have to flush at some number of wheel entries equal > > to: > > > > wrap_boundary - the_largest_potential_future_offset - 1. 
> > > > Making the wheel bigger is probably acceptable, but then you > > will exacerbate the memory problem that rushjob was invented > > to resolve (please do a "cvs log" and look at the checkin > > comments; I still believe it was "dillon" who made the change). > > Where from did you get the idea I'm making the wheel bigger? The size of the > softupdates "wheel" is determined by the value of syncer_maxdelay, which not > only I haven't touched at all, but is also completely unrelated to the > rushjob variable. I didn't get the idea you were making the wheel bigger. That's the problem: you probably need to make the wheel bigger, so that the [new!] second hand on the two-handed clock has more time until it runs into the first hand on the clock. You will have to do this so you can bound the vn_syncer_add_to_worklist() add delay to something less than "syncer_maxdelay - 2"; I suggest "syncer_maxdelay / 2", as a first approximation (remember this needs to be a power of 2, due to syncer_mask). You also want to count workitem insertions and removals, so you have a total count. This is easy: it's already protected by sync_mtx, so all you need is a static global counter. When the counter gets to a certain size (configurable), you have too much memory tied up in the work queue -- so you flush it. > If it is of any relevance for this discussion, I want to add that I've been > running my system with extended delaying all the time for the last two weeks > (even when on AC power). I have had absolutely no problems nor have lost a > single bit of data, even during the most stressful tests such as untarring of > huge archives, or making the kernel etc. Not to mention this is also my > primary and "production" machine, with all my e-mail on it etc. Write some code that stresses a specific FS dependency on a set of files, iteratively, over and over again. Then close all the files, and call "sync", and wait. 
Or run your test, and then unmount the FS on which the test was running, before your delayed fsync gets a change to run, and then do a shutdown. When the system comes back up, check the data to see if it's what it's supposed to be. Basically, you are going to have to provide something *other than* "rushjob" to be able to cause unmounts and other "special" code to be able to force the fsync (consider removable media, like flash, if nothing else). -- Terry From owner-freebsd-fs@FreeBSD.ORG Fri Apr 18 19:21:49 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8E1D137B401 for ; Fri, 18 Apr 2003 19:21:49 -0700 (PDT) Received: from laptop.tenebras.com (laptop.tenebras.com [66.92.188.18]) by mx1.FreeBSD.org (Postfix) with SMTP id 944E843FBF for ; Fri, 18 Apr 2003 19:21:47 -0700 (PDT) (envelope-from kudzu@tenebras.com) Received: (qmail 22690 invoked from network); 19 Apr 2003 02:21:44 -0000 Received: from queequeg.tenebras.com (HELO tenebras.com) (192.168.188.241) by 0 with SMTP; 19 Apr 2003 02:21:44 -0000 Message-ID: <3EA0B2B8.4000600@tenebras.com> Date: Fri, 18 Apr 2003 19:21:44 -0700 From: Michael Sierchio User-Agent: Mozilla/5.0 (X11; U; Linux i386; en-US; rv:1.3) Gecko/20030312 X-Accept-Language: en-us, en, zh-cn, zh-tw MIME-Version: 1.0 To: Terry Lambert References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182243.05739.zec@tel.fer.hr> <3EA06C07.A34F1C31@mindspring.com> <200304182348.58356.zec@tel.fer.hr> <3EA0A647.BEC5931A@mindspring.com> In-Reply-To: <3EA0A647.BEC5931A@mindspring.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-fs@FreeBSD.org cc: freebsd-stable@FreeBSD.org cc: David Schultz Subject: Re: PATCH: Forcible delaying of UFS (soft)updates X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2003 02:21:49 -0000 Terry Lambert wrote: > Obviously I am not explaining myself correctly. I guess the next > step would be to offer my own patch set for doing what you are > trying to do. Before I do that, let me try one more time. Forgive me, but let me cut through this Gordian Knot and just say: the proposal is for the introduction of a feature of questionable value, with consequences that have not been adequately considered. It should never be committed. From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 00:03:27 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CBDD037B401; Sat, 19 Apr 2003 00:03:27 -0700 (PDT) Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com [12.233.57.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id 15DA743FDF; Sat, 19 Apr 2003 00:03:27 -0700 (PDT) (envelope-from das@FreeBSD.org) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.9/8.12.5) with ESMTP id h3J73P9E014134; Sat, 19 Apr 2003 00:03:25 -0700 (PDT) (envelope-from das@FreeBSD.org) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.9/8.12.5/Submit) id h3J73K2o014133; Sat, 19 Apr 2003 00:03:20 -0700 (PDT) (envelope-from das@FreeBSD.org) Date: Sat, 19 Apr 2003 00:03:20 -0700 From: David Schultz To: Marko Zec Message-ID: <20030419070320.GA14034@HAL9000.homeunix.com> Mail-Followup-To: Marko Zec , freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org References: <3E976EBD.C3E66EF8@tel.fer.hr> <3E9E93D8.EB16ED42@tel.fer.hr> <20030418071329.GA9125@HAL9000.homeunix.com> <200304182243.05739.zec@tel.fer.hr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200304182243.05739.zec@tel.fer.hr> cc: freebsd-fs@FreeBSD.org cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2003 07:03:28 -0000 On Fri, Apr 18, 2003, Marko Zec wrote: > > If rushjob gets too high (half the maximum sync > > delay, usually 15), the system resorts to other measures. > > Which measures, and in which cases? The only two chunks of code in the entire > -stable kernel that probe the value of rushjob Look at -CURRENT. > > Your code bumps rushjob up by the arbitrary value 32, which is > > rather large. Doing so is going to throw things out of whack. > > Which things and how? My complaint was simply that you're incrementing rushjob by some number you pulled out of a hat, namely 32. This causes the syncer to spin around 32 times every time someone calls sync(), and most of the time, it won't have anything to do. Moreover, in -CURRENT, you can lead the system to believe that resources are scarcer than they really are. Look at what request_cleanup() does when speedup_syncer() fails, for instance. > > What you would probably want to do is leave rushjob alone. If it > > ever becomes nonzero, the syncer should wake up and start writing > > again. > > Sure, that's precisely why I increment rushjob - to instruct the syncer to > start synching when I want it to. What's wrong with that? You seem to be overthinking this. On a relatively quiescent laptop, all you have to do is have the drives spin down and suspend the operation of the syncer as long as no processes are blocked on I/O. If this results in too many dirty buffers, the system will automatically notice this and kick the syncer. You don't need to step in and kick the syncer 32 times or disable fsync() in order to get reasonable benefits without breaking things. This simple approach can easily be refined later if need be. > > But really, even getting fsync() to do *remotely* the right thing > > (i.e. 
correct ordering but no guarantee of writing data to stable > > storage when in power save mode) is going to be *really*hard*. > > Warner has a much better suggestion. > > If I'm not mistaking Warner was talking about using memory based FS and > periodically synching it to a flash based device. Such a concept is perfectly > sane for appliances using solid state disks, however I don't see how it can > be applied to a typical laptop. It's the same principle. For flash, you want to limit the number of writes since you only get a finite number of them. For laptops, you want to limit the number of writes because keeping your drive spinning drains the battery. In both cases, you can solve the problem by using a memory filesystem for things like cron that write frequently. From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 02:53:30 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 11DE437B401; Sat, 19 Apr 2003 02:53:30 -0700 (PDT) Received: from mail.tel.fer.hr (zg05-039.dialin.iskon.hr [213.191.138.40]) by mx1.FreeBSD.org (Postfix) with ESMTP id 41B0043FBF; Sat, 19 Apr 2003 02:53:25 -0700 (PDT) (envelope-from zec@tel.fer.hr) Received: from marko-tp (marko@[192.168.201.107]) by mail.tel.fer.hr (8.12.6/8.12.6) with ESMTP id h3J9pMxI000991; Sat, 19 Apr 2003 11:51:27 +0200 (CEST) (envelope-from zec@tel.fer.hr) From: Marko Zec To: Terry Lambert Date: Sat, 19 Apr 2003 11:53:03 +0200 User-Agent: KMail/1.5 References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182348.58356.zec@tel.fer.hr> <3EA0A647.BEC5931A@mindspring.com> In-Reply-To: <3EA0A647.BEC5931A@mindspring.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-2" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200304191153.03970.zec@tel.fer.hr> cc: freebsd-fs@FreeBSD.org cc: David Schultz cc: freebsd-stable@FreeBSD.org Subject: Re: PATCH: Forcible delaying of UFS (soft)updates 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2003 09:53:30 -0000 On Saturday 19 April 2003 03:28, Terry Lambert wrote: > I think that it is important that the value of syncer_delayno > needs to continue to be incremented once a second, and that the > modified sched_sync(), which with your patch no longer does this, > needs to used it's own counter. If you look again at the _unpatched_ syncer loop, you will clearly see that syncer_delayno is not guaranteed to be incremented only once a second. If speedup_syncer() increases the value of rushjob, the syncer_delayno will be increased up to rushjob times in a second in the syncer loop, depending if the buffers can be flushed fast enough. The expedited synching will proceed until rushjob drops down to 0. My patch didn't invent nor did change that model at all. > In other words, I think that you need to implement a two handed > clock algorithm, to keep the buckets from getting too deep with > work items, and in case there is some dependency which is not > being accounted for that has been working because there is an > implicit delay of 1 second or more in vn_syncer_add_to_worklist() > calls (your patch would break this, so without taking this into > account, we would have to retest all of soft updates). Again, please look at the _unmodified_ syncer code. My patch didn't change a thing regarding the possibility for the syncer to try flushing more than one syncer_workitem_pending queue in a second. > > > > Then when you write things out, they write out of order. > > > > Uhh.. NO! > > Uh, yes; potentially they do. See the implicit dependency > situation described above. There are other cases, too, but > they are much more complicated. 
I wish Kirk would speak up > in more technical detail about the problems you are potentially > introducing; they require a deep understanding of the soft > updates code. I wish he would, too... > > > The purpose of the wheel is to allow placing of operations at > > > some relative offset in the future to an outstanding operation, > > > to ensure ordering. > > > > True. And this has not changed with my patch. > > No, it has changed. It's changed both in the depth of the > queue entries, and it's changed in the relative spacing of > implicit dependencies, and it's changed in the relative depth, > for two or more dependent operations with future offsets. How? By simply stopping the softupdates clock for a couple of seconds (ok, minutes :) more than usual? > In the depth case, when the code runs, it is going to stall the > system for a really long time, relatively, because there are > a number of worklists which are *substantially* deep, because > vn_syncer_add_to_worklist() was using a syncer_delayno that has > been assumed to be updated once a second, and never changed > during your stall. This means that the worklist represented by > syncer_workitem_pending[syncer_delayno] is going to contain > *almost all work* that was enqueued in the interim. > > The problem with this is that in the for(;;) loop in sched_sync() > in the "if (LIST_FIRST(slp) == vp)" code block, you are likely > to run yourself into a panic. See the comment about "sync_fsync() > moves it to a different slot so we are safe"? That comment is no > longer true. Is it possible you are confusing the sync_fsync() routine in kern/vfs_subr.c (which I didn't touch) with the modified fsync() handler in kern/vfs_syscalls.c? [the rest of the debate deleted] Can we please either slowly conclude this discussion, or provide a feasible alternative to the proposed patch? I am starting to feel like we are wasting a tremendous amount of time here while going nowhere. 
Marko

From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 11:20:06 2003
Message-ID: <3EA19303.1DB825C8@mindspring.com>
Date: Sat, 19 Apr 2003 11:18:43 -0700
From: Terry Lambert
To: Marko Zec
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304182348.58356.zec@tel.fer.hr> <3EA0A647.BEC5931A@mindspring.com> <200304191153.03970.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

Marko Zec wrote:
> On Saturday 19 April 2003 03:28, Terry Lambert wrote:
> > I think that it is important that the value of syncer_delayno
> > needs to continue to be incremented once a second, and that the
> > modified sched_sync(), which with your patch no longer does this,
> > needs to use its own counter.
>
> If you look again at the _unpatched_ syncer loop, you will clearly see that
> syncer_delayno is not guaranteed to be incremented only once a second. If
> speedup_syncer() increases the value of rushjob, the syncer_delayno will be
> increased up to rushjob times in a second in the syncer loop, depending on
> whether the buffers can be flushed fast enough. The expedited syncing will
> proceed until rushjob drops down to 0.
>
> My patch neither invented nor changed that model at all.

The problem is not the distribution of the entries removed from
the wheel, it is the distribution of entries inserted onto the
wheel.

Running the wheel forward quickly during removal is not a problem.

*Not* running the wheel forward _at all_ during insertion *is* a
problem. What we care about here is distribution of entries which
exist, not distribution of entries which no longer exist.

> > In other words, I think that you need to implement a two handed
> > clock algorithm, to keep the buckets from getting too deep with
> > work items, and in case there is some dependency which is not
> > being accounted for that has been working because there is an
> > implicit delay of 1 second or more in vn_syncer_add_to_worklist()
> > calls (your patch would break this, so without taking this into
> > account, we would have to retest all of soft updates).
>
> Again, please look at the _unmodified_ syncer code. My patch didn't change a
> thing regarding the possibility for the syncer to try flushing more than one
> syncer_workitem_pending queue in a second.

Again, I don't care about flushing for this case: I care about
insertion.

> > > > Then when you write things out, they write out of order.
> > >
> > > Uhh.. NO!
> >
> > Uh, yes; potentially they do. See the implicit dependency
> > situation described above. There are other cases, too, but
> > they are much more complicated. I wish Kirk would speak up
> > in more technical detail about the problems you are potentially
> > introducing; they require a deep understanding of the soft
> > updates code.
>
> I wish also...

Be aware that I was at least associated with the FreeBSD soft
updates code implementation (I did the original "make it compile
and link pass", among other things, when Whistle, the company I
worked for, paid Kirk to do the implementation), and I was also
part of a team which implemented soft updates for FFS in a
different environment in 1995. I'm not trying to claim authority
here, since I'm one of the sides in this disagreement, but realize
I'm not totally clueless when it comes to soft updates.

> > No, it has changed. It's changed both in the depth of the
> > queue entries, and it's changed in the relative spacing of
> > implicit dependencies, and it's changed in the relative depth,
> > for two or more dependent operations with future offsets.
>
> How? By simply stopping the softupdates clock for a couple of seconds (ok,
> minutes :) more than usual?

Say you stop the clock for 30 seconds: syncer_delayno is not
incremented during those 30 seconds. Now, during that time,
vn_syncer_add_to_worklist() is called once a second to add
workitems. Say they are the same workitems (delay 0, delay 6).
Now (relative to the original syncer_delayno), the buckets that
are represented by "syncer_workitem_pending[syncer_delayno+delay]"

vn_syncer_add_to_worklist() instance
|
|  syncer_workitem_pending[original_syncer_delayno + N]
|  0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
|  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
v
0  1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[your patch:]
   30 30 30
[not your patch:]
   1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 0

Your patch causes us to get two single buckets 30 deep.
Not having your patch gives us 37 buckets; 12 are 1 deep, 24 are 2 deep.

Does this make sense now? It is about insertion in the face of a
stopped clock, and how bursty the resulting "catchup" will be.

If you look at the code, you will see that there is no opportunity
for other code to run in a single bucket list traversal, but in the
rushjob case of multiple bucket traversals, the system gets control
back in between buckets, so the operation of the system is much,
much smoother in the case that individual buckets are not allowed
to get too deep. This is normally accomplished by incrementing the
value of syncer_delayno once per second, as a continuous function,
rather than a bursty increment once every 30 seconds.

> Is it possible you are confusing the sync_fsync() routine in kern/vfs_subr.c
> (which I didn't touch) with the modified fsync() handler in
> kern/vfs_syscalls.c?

No. I am only talking about the vn_syncer_add_to_worklist() and
sched_sync() functions, and how they interact on the syncer_delayno
clock.

> Can we please either slowly conclude this discussion, or provide a feasible
> alternative to the proposed patch? I am starting to feel like we are wasting
> a tremendous amount of time here while going nowhere.

Please read the above, specifically the diagram of bucket list
depths with a working clock vs. a stopped clock, and the fact
that the bucket list traversals are atomic, but multiple bucket
traversals of the same number of equally distributed work items
are not.

I guess I'm willing to provide an alternate patch, if I have to
do so, but I would prefer that you understand the issues yourself,
since that makes one more person clueful about the issues, in case
the few of the rest of us get hit by a bus. 8-).
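The bucket arithmetic in the diagram above can be checked with a small user-space model. This is only a sketch in C, not kernel code: `fill_wheel`, `count_depth`, `max_depth` and `WHEEL_SIZE` are hypothetical names, and the model assumes one insertion per second, fixed delays, and a wheel large enough that nothing wraps. Under those assumptions a stopped clock piles all 30 insertions for each delay into a single bucket, while a clock that advances once a second leaves 12 buckets 1 deep and 24 buckets 2 deep, as in the "not your patch" row.

```c
#include <assert.h>
#include <string.h>

#define WHEEL_SIZE 64	/* hypothetical; large enough that nothing wraps here */

/*
 * Fill depth[] with the number of work items that land in each bucket when
 * `seconds` insertions happen, one per second, each adding one item at every
 * offset listed in delays[]. If clock_runs is zero, the wheel position stays
 * at 0 for the whole interval (the "stopped clock" case).
 */
static void
fill_wheel(int depth[WHEEL_SIZE], int seconds, const int *delays,
    int ndelays, int clock_runs)
{
	memset(depth, 0, WHEEL_SIZE * sizeof(depth[0]));
	for (int t = 0; t < seconds; t++) {
		int delayno = clock_runs ? t : 0;

		for (int i = 0; i < ndelays; i++) {
			assert(delays[i] >= 0);
			depth[(delayno + delays[i]) % WHEEL_SIZE]++;
		}
	}
}

/* Number of buckets that hold exactly `want` items. */
static int
count_depth(const int depth[WHEEL_SIZE], int want)
{
	int n = 0;

	for (int i = 0; i < WHEEL_SIZE; i++)
		if (depth[i] == want)
			n++;
	return (n);
}

/* Deepest bucket, i.e. the longest traversal done without a break. */
static int
max_depth(const int depth[WHEEL_SIZE])
{
	int m = 0;

	for (int i = 0; i < WHEEL_SIZE; i++)
		if (depth[i] > m)
			m = depth[i];
	return (m);
}
```

Calling `fill_wheel(depth, 30, delays, 2, 0)` with delays {0, 6} models the stopped clock; passing 1 as the last argument models the clock advancing once a second.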
-- Terry

From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 12:35:29 2003
From: Marko Zec
To: Terry Lambert
Date: Sat, 19 Apr 2003 21:34:51 +0200
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304191153.03970.zec@tel.fer.hr> <3EA19303.1DB825C8@mindspring.com>
In-Reply-To: <3EA19303.1DB825C8@mindspring.com>
Message-Id: <200304192134.51484.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

On Saturday 19 April 2003 20:18, Terry Lambert wrote:
> Say you stop the clock for 30 seconds: syncer_delayno is not
> incremented during those 30 seconds. Now, during that time,
> vn_syncer_add_to_worklist() is called once a second to add
> workitems. Say they are the same workitems (delay 0, delay 6).
> Now (relative to the original syncer_delayno), the buckets that
> are represented by "syncer_workitem_pending[syncer_delayno+delay]"
>
> vn_syncer_add_to_worklist() instance
> |
> |  syncer_workitem_pending[original_syncer_delayno + N]
> |  0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
> |  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
> v
> 0  1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> [your patch:]
>    30 30 30
> [not your patch:]
>    1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 0
>
> Your patch causes us to get two single buckets 30 deep.
>
> Not having your patch gives us 37 buckets; 12 are 1 deep, 24 are 2
> deep.

The whole purpose of the patch is to delay disk writes when running on
battery power. In such a case it is completely irrelevant whether the
work items get more or less evenly distributed over all the delay queues,
or they get concentrated in only two (or more precisely: in three). In
either case, all the queues will be flushed as quickly as possible when the
disk gets spun up, in order for the disk to be active for the shortest
possible time.

> Does this make sense now? It is about insertion in the face of a
> stopped clock, and how bursty the resulting "catchup" will be.

And that is exactly what the user of a battery-powered laptop wants: to have
infrequent but bursty writes to disk, and an idle disk at all other times.
I have claimed such functionality since my very first post. This is a
feature, not a bug. What's wrong with that?

> If you look at the code, you will see that there is no opportunity
> for other code to run in a single bucket list traversal, but in the
> rushjob case of multiple bucket traversals, the system gets control
> back in between buckets, so the operation of the system is much,
> much smoother in the case that individual buckets are not allowed
> to get too deep. This is normally accomplished by incrementing the
> value of syncer_delayno once per second, as a continuous function,
> rather than a bursty increment once every 30 seconds.

I completely agree with you that smoothness will be sacrificed, but again,
please do keep in mind the original purpose of the patch. When running on
battery power, smoothness is a bad thing. When running on AC, the patch will
become inactive, so 100% normal operation is automatically restored, and you
get all the smoothness back.

> Please read the above, specifically the diagram of bucket list
> depths with a working clock vs. a stopped clock, and the fact
> that the bucket list traversals are atomic, but multiple bucket
> traversals of the same number of equally distributed work items
> are not.

True. But this still doesn't justify your claims from previous posts that
the patched system is likely to corrupt data or crash the system. I am still
pretty much convinced it will do neither of these two things, both by
looking at the scope of the modifications the patch introduces, and from my
experience with a production system running all the time on a patched
kernel.
Cheers,

Marko

From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 13:56:58 2003
Message-ID: <3EA1B72D.B8B96268@mindspring.com>
Date: Sat, 19 Apr 2003 13:53:01 -0700
From: Terry Lambert
To: Marko Zec
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304191153.03970.zec@tel.fer.hr> <3EA19303.1DB825C8@mindspring.com> <200304192134.51484.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

Marko Zec wrote:
> > If you look at the code, you will see that there is no opportunity
> > for other code to run in a single bucket list traversal, but in the
> > rushjob case of multiple bucket traversals, the system gets control
> > back in between buckets, so the operation of the system is much,
> > much smoother in the case that individual buckets are not allowed
> > to get too deep. This is normally accomplished by incrementing the
> > value of syncer_delayno once per second, as a continuous function,
> > rather than a bursty increment once every 30 seconds.
>
> I completely agree with you that smoothness will be sacrificed, but again,
> please do keep in mind the original purpose of the patch. When running on
> battery power, smoothness is a bad thing. When running on AC, the patch will
> become inactive, so 100% normal operation is automatically restored, and you
> get all the smoothness back.

You are still missing the point.

If I have 30 entries each on 2 queues, the rest of the system
gets an opportunity to run once between what might be significant
bouts of I/O, which is the slowest thing you can do.

If I have 2 entries each on 30 queues, the rest of the system
gets an opportunity to run 29 times between much less significant
bouts of I/O (1/15th of the latency).

So the difference is between the disk spinning up and the system
freezing for the duration, or the disk spinning up and the
system freezing unnoticeably to the user for 1/10th of a second
per worklist for a larger number of worklists.

Add to this that the batches of I/O are unlikely to be on the
same track, and therefore there's seek latency as well, and you
have a significant freeze that's going to appear like the machine
is locked up.

I guess if you are willing to monitor the mailing lists and explain
why this isn't a bad thing every time users complain about it, it's
no big deal, except to people who want the feature, but don't agree
with your implementation. 8-).

> > Please read the above, specifically the diagram of bucket list
> > depths with a working clock vs. a stopped clock, and the fact
> > that the bucket list traversals are atomic, but multiple bucket
> > traversals of the same number of equally distributed work items
> > are not.
>
> True. But this still doesn't justify your claims from previous posts
> that the patched system is likely to corrupt data or crash the system.

The previous claim for potential panic was based on the fact
that the same bucket was being used for the next I/O, rather than
the same + 1 bucket, which is what the code assumed. I just
took it for granted that the failure case was self-evident.

You need to read the comment in the sched_sync() code, and
understand why it is saying what it is saying:

	/*
	 * Note: VFS vnodes can remain on the
	 * worklist too with no dirty blocks, but
	 * since sync_fsync() moves it to a different
	 * slot we are safe.
	 */

Your changes make it so the insertion *does not* put it in a
different slot (because the fsync is most likely delayed).
Therefore we are *not* safe.

The other FS corruption occurs because you don't specifically
disable the delaying code before a shutdown or umount or mount
-u -o ro, etc..

> I am still pretty much convinced it will do neither of these two
> things, both by looking at the scope of the modifications the
> patch introduces,

My analysis (and several other people's) differs from yours.

> and from my experience with a production system
> running all the time on a patched kernel.

This is totally irrelevant; it's anecdotal, and therefore has
nothing whatsoever to do with provable correctness.

"From my experience" is the same argument that Linux used to
justify async mounts in ext2fs, and they were provably wrong.

I guess at this point, I have to ask: what's wrong with Ian
Dowse's patches to do approximately the same thing?
-- Terry

From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 14:51:10 2003
From: Marko Zec
To: Terry Lambert
Date: Sat, 19 Apr 2003 23:50:48 +0200
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304192134.51484.zec@tel.fer.hr> <3EA1B72D.B8B96268@mindspring.com>
In-Reply-To: <3EA1B72D.B8B96268@mindspring.com>
Message-Id: <200304192350.48576.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

On Saturday 19 April 2003 22:53, Terry Lambert wrote:
> You are still missing the point.
>
> If I have 30 entries each on 2 queues, the rest of the system
> gets an opportunity to run once between what might be significant
> bouts of I/O, which is the slowest thing you can do.
>
> If I have 2 entries each on 30 queues, the rest of the system
> gets an opportunity to run 29 times between much less significant
> bouts of I/O (1/15th of the latency).
>
> So the difference is between the disk spinning up and the system
> freezing for the duration, or the disk spinning up and the
> system freezing unnoticeably to the user for 1/10th of a second
> per worklist for a larger number of worklists.
>
> Add to this that the batches of I/O are unlikely to be on the
> same track, and therefore there's seek latency as well, and you
> have a significant freeze that's going to appear like the machine
> is locked up.

Does the laptop owner care if the system freezes for a couple of
milliseconds more than usual? If you had tried the patch yourself, you would
certainly have observed that the freeze you are talking about is completely
unnoticeable. Even under the highest loads, my system can accumulate at most
around 300 dirty buffers before starting to sync for one reason or another.
Modern ATA disks possess a significant amount of RAM available for write
caching, which will compensate even for such write bursts. Therefore the
disk head seek latency you mentioned won't be noticeable in most cases.

> Your changes make it so the insertion *does not* put it in a
> different slot (because the fsync is most likely delayed).
                              ^^^^^
> Therefore we are *not* safe.

Again, in my understanding the (modified) fsync() handler is completely
unrelated to the (unmodified) sync_fsync() function.

> The other FS corruption occurs because you don't specifically
> disable the delaying code before a shutdown or umount or mount
> -u -o ro, etc..

Such a problem simply does not exist. Please try out the patch, enable the
delaying, fill in as many dirty buffers as possible, and unmount the FS. You
will notice that a) all the dirty buffers will be automatically written to
the disk; b) the unmount operation will succeed; c) the system will not
crash and d) the FS will be perfectly consistent at the next mount.

> My analysis (and several other people's) differs from yours.
>
> > and from my experience with a production system
> > running all the time on a patched kernel.
>
> This is totally irrelevant; it's anecdotal, and therefore has
> nothing whatsoever to do with provable correctness.

No offence please, but your argumentation would look much more convincing if
you could provoke a system crash with the patch enabled, and then provide a
backtrace. If the patch is as bad as you are suggesting, that shouldn't be
that hard to do, should it?

Marko

From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 16:42:14 2003
Message-ID: <3EA1DE82.68F32B77@mindspring.com>
Date: Sat, 19 Apr 2003 16:40:50 -0700
From: Terry Lambert
To: Marko Zec
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304192134.51484.zec@tel.fer.hr> <3EA1B72D.B8B96268@mindspring.com> <200304192350.48576.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

Marko Zec wrote:
> Does the laptop owner care if the system freezes for a couple of
> milliseconds more than usual?

I am a laptop owner. I care.

> If you had tried the patch yourself, you would certainly have observed
> that the freeze you are talking about is completely unnoticeable.

I run 13 jails for 12 virtual machines on my laptop. I noticed.

> Therefore the disk head seek latency you mentioned won't be
> noticeable in most cases.

Define "most cases".

> > Your changes make it so the insertion *does not* put it in a
> > different slot (because the fsync is most likely delayed).
>                               ^^^^^
> > Therefore we are *not* safe.
>
> Again, in my understanding the (modified) fsync() handler is completely
> unrelated to the (unmodified) sync_fsync() function.

You're wrong. You have to take into account both the vnodes on
the FS, and the vnodes that the FS is mounted on, on devfs.

> > The other FS corruption occurs because you don't specifically
> > disable the delaying code before a shutdown or umount or mount
> > -u -o ro, etc..
>
> Such a problem simply does not exist. Please try out the patch, enable the
> delaying, fill in as many dirty buffers as possible, and unmount the FS. You
> will notice that a) all the dirty buffers will be automatically written to
> the disk; b) the unmount operation will succeed; c) the system will not crash
> and d) the FS will be perfectly consistent at the next mount.

This is not true. I've proved it by corrupting an FS by holding
down the power button on my laptop to force an ATX power-off, with
no recourse. This is the same type of failure that could occur on
a normal laptop when battery output drops the power out from under
you. The basic problem is that the forced fsync is no longer forced.
> > This is totally irrelevant; it's anecdotal, and therefore has
> > nothing whatsoever to do with provable correctness.
>
> No offence please, but your argumentation would look much more
> convincing if you could provoke a system crash with the patch
> enabled, and then provide a backtrace. If the patch is as bad
> as you are suggesting, that shouldn't be that hard to do, should it?

I've done it. I guess you want me to do it again, citing that
absence of evidence is not evidence of absence?

The problem here is well understood. Rather than arguing further,
I will offer a modification of your patches. Note that this
modification is still unsafe, due to the lack of a "force" flag
for the fsync in the unmount and mount -u cases; give me a couple
of days, since I test patches before I post them (normally 2 weeks;
I'll make an exception in this case).

-- Terry

From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 17:27:58 2003
From: Marko Zec
To: Terry Lambert
Date: Sun, 20 Apr 2003 02:27:43 +0200
References: <3E976EBD.C3E66EF8@tel.fer.hr> <200304192350.48576.zec@tel.fer.hr> <3EA1DE82.68F32B77@mindspring.com>
In-Reply-To: <3EA1DE82.68F32B77@mindspring.com>
Message-Id: <200304200227.44268.zec@tel.fer.hr>
cc: freebsd-fs@FreeBSD.org
cc: David Schultz
cc: freebsd-stable@FreeBSD.org
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

On Sunday 20 April 2003 01:40, Terry Lambert wrote:
> Marko Zec wrote:
> > If you had tried the patch yourself, you would certainly have observed
> > that the freeze you are talking about is completely unnoticeable.
>
> I run 13 jails for 12 virtual machines on my laptop. I noticed.

:) If you are really serious about running 12 VMs on a laptop, then:
a) you do not want to have this patch enabled in the first place, and
b) what kind of delay exactly did you notice?

> > Therefore the disk head seek latency you mentioned won't be
> > noticeable in most cases.
>
> Define "most cases".

Those where the onboard write-caching RAM on the ATA disk is large enough to
compensate for disk head seek latency for the whole write burst.

> > Again, in my understanding the (modified) fsync() handler is completely
> > unrelated to the (unmodified) sync_fsync() function.
>
> You're wrong. You have to take into account both the vnodes on
> the FS, and the vnodes that the FS is mounted on, on devfs.

Hmm, the original patch was against 4.8-R, and this whole discussion is
flooding the -stable mailing list, in case you forgot. Where did you now
pull devfs from? And even with devfs, what if my patch (optionally) ignores
fsync()? Does that mean that all the programs that close their files without
calling fsync() are going to crash the system? Uhhh....

> > > The other FS corruption occurs because you don't specifically
> > > disable the delaying code before a shutdown or umount or mount
> > > -u -o ro, etc..
> >
> > Such a problem simply does not exist. Please try out the patch, enable
> > the delaying, fill in as many dirty buffers as possible, and unmount the
> > FS. You will notice that a) all the dirty buffers will be automatically
> > written to the disk; b) the unmount operation will succeed; c) the system
> > will not crash and d) the FS will be perfectly consistent at the next
> > mount.
>
> This is not true. I've proved it by corrupting an FS by holding
> down the power button on my laptop to force an ATX power-off, with
> no recourse.

??? You have proved what by pulling out the plug? That umount or shutdown do
not work (please read your previous claim 10 lines above)? I can't believe I
am reading this...

> > No offence please, but your argumentation would look much more
> > convincing if you could provoke a system crash with the patch
> > enabled, and then provide a backtrace. If the patch is as bad
> > as you are suggesting, that shouldn't be that hard to do, should it?
>
> I've done it. I guess you want me to do it again, citing that
> absence of evidence is not evidence of absence?

I'd simply prefer to receive a backtrace, rather than just tons of noise.
An improved patch couldn't hurt either :)

Marko

From owner-freebsd-fs@FreeBSD.ORG Sat Apr 19 23:30:52 2003
To: Terry Lambert
In-Reply-To: Your message of "Fri, 18 Apr 2003 11:12:01 PDT." <3EA03FF1.280B6810@mindspring.com>
Date: Sun, 20 Apr 2003 07:30:44 +0100
From: Ian Dowse
Message-ID: <200304200730.aa34354@salmon.maths.tcd.ie>
cc: freebsd-fs@FreeBSD.ORG
cc: David Schultz
cc: freebsd-stable@FreeBSD.ORG
cc: Kirk McKusick
Subject: Re: PATCH: Forcible delaying of UFS (soft)updates

In message <3EA03FF1.280B6810@mindspring.com>, Terry Lambert writes:
>David Schultz wrote:
>> As for the ATA delayed write feature, I don't believe it will
>> guarantee consistency.
>
>It doesn't. I checked, after voicing my suspicions of it.

Yes, write ordering and hence FS consistency is not guaranteed; my
original point was just that the situation regarding FS consistency
with ATA delayed writes is not significantly worse than that with
the default behaviour of having ATA write caching enabled. In fact,
if the OS is modified to perform writes in batches then the two
cases are almost identical: in one case the disk collects a batch
of writes, possibly reorders them, and writes them out in one burst;
in the other case the OS sends a burst of writes, the disk possibly
reorders them and writes them out. For reference I've included below
what IBM say about the delayed write feature in their disk
documentation.

BTW, to answer a point Marko mentioned, I don't consider the delayed
write behaviour to be nearly as bad as a null fsync(), because you
are very unlikely to completely lose a file that has been modified,
saved and then fsync()'d. If the write/rename/fsync all happen while
the disk is spun down then the old version of the file is still
intact on the media if the power fails. With a null fsync(), there
can be a considerable window where the disk contains just a
zero-length file.
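The dependence on a working fsync() can be made concrete with a user-space sketch. The C code below illustrates the common "write a temporary file, fsync() it, then rename() it over the original" pattern, assuming a POSIX system; `replace_file` and the caller-supplied temporary name are hypothetical and are not taken from vi or from the patch under discussion. The sketch relies on fsync() actually forcing the new data out before the rename; with a null fsync() that guarantee is gone, which is the window described above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace `path` with `len` bytes from `buf`. `tmp_path` is a scratch name
 * chosen by the caller in the same directory (a hypothetical convention for
 * this sketch). Returns 0 on success, -1 on error.
 */
static int
replace_file(const char *path, const char *tmp_path, const void *buf,
    size_t len)
{
	const char *p = buf;
	size_t left = len;
	int fd;

	assert(path != NULL && tmp_path != NULL);

	fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);

	/* Write the complete new contents to the temporary file. */
	while (left > 0) {
		ssize_t n = write(fd, p, left);

		if (n <= 0)
			goto fail;
		p += n;
		left -= (size_t)n;
	}

	/* Ask the kernel to push the new data to stable storage. */
	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0) {
		unlink(tmp_path);
		return (-1);
	}

	/* Only now swap the new file into place of the old one. */
	if (rename(tmp_path, path) != 0) {
		unlink(tmp_path);
		return (-1);
	}
	return (0);

fail:
	close(fd);
	unlink(tmp_path);
	return (-1);
}
```

A caller would use it as `replace_file("notes.txt", "notes.txt.tmp", data, data_len)`; the old file stays in place until the rename succeeds.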
I completely accept that there is more flexibility at the OS side to
control which writes get delayed and by how much, and that an OS-side
implementation would be extremely useful. However, I think it would
require further work to develop a good implementation. For example,
the currently proposed patch effectively assumes that there is only
one disk in the system since `stratcalls' is a global variable (e.g.,
I believe that reading from an ATA flash device would trigger a flush
to any real ATA disks in the system). It would also be useful if the
solution was not specific to ATA devices and had per-device control
over the behaviour.

I guess my point of view is more that doing this right at the OS side
is hard, and ATA delayed write is an unobtrusive neat feature that
does mostly the right thing at the cost of only a marginal increase
in the risk of data loss for typical uses.

Ian

11.13 Delayed Write function (vendor specific)

Delayed Write function is a power saving enhancement whereby the device delays the actual data writing into the media. When the device is in the power saving mode and the Write command (Write Sectors, Write Multiple, or Write DMA) comes from the host, the transferred data is not written into the media immediately, only stored into the cache buffer. When the cache buffer becomes full or reaches the predefined size, or if any command except the Write command is issued, the operation to write the data from the cache buffer into the media is begun.

Power consumption can be reduced by Delayed Write. When Write commands come with a long interval, the device must exit from the power saving mode and enter into the power saving mode again without Delayed Write function. If Delayed Write is enabled, such power saving mode transition times can be reduced. As a result, the additional energy for power saving mode transition can be saved, then the average power consumption of the device can be reduced.

However, the time elapsed from the completion of the Write command to the media write completion will be extended with Delayed Write function. If the power for the device is turned off during this time, the data which has not been written to the media is lost. Therefore, a command listed in the Write Cache Function section shall be issued before the power off to confirm whole cached data has been written into the media. For safety, Delayed Write function is disabled at Power On Default.
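
The flush rule in the quoted section can be written down as a toy
model, which may make the trade-off easier to see. Everything in this
sketch is hypothetical: the struct, the names and the sector-count
threshold are illustration only and are not taken from IBM firmware
or from any ATA driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy command classes: the quoted text only distinguishes these two. */
enum dw_cmd { DW_CMD_WRITE, DW_CMD_OTHER };

/* Hypothetical model of a drive with the Delayed Write feature. */
struct dw_model {
	bool   dw_enabled;	/* feature switched on by the host */
	bool   power_saving;	/* drive is currently in power saving mode */
	size_t cached;		/* sectors held in cache, not yet on media */
	size_t threshold;	/* assumed "predefined size" of the cache */
	size_t on_media;	/* sectors that have reached the media */
};

/*
 * Submit one command; returns the number of sectors written to the
 * media as a result.  A Write is deferred only while the feature is
 * enabled, the drive is in power saving mode and the cache is below
 * the threshold.  Anything else flushes the cache and wakes the drive.
 */
size_t
dw_submit(struct dw_model *d, enum dw_cmd cmd, size_t nsect)
{
	size_t flushed;

	assert(d != NULL);

	if (cmd == DW_CMD_WRITE) {
		d->cached += nsect;
		if (d->dw_enabled && d->power_saving &&
		    d->cached < d->threshold)
			return (0);	/* deferred, drive stays asleep */
	}

	flushed = d->cached;
	d->on_media += flushed;
	d->cached = 0;
	d->power_saving = false;
	return (flushed);
}

/* Power failure: whatever is still only in the cache is lost. */
size_t
dw_power_loss(struct dw_model *d)
{
	size_t lost;

	assert(d != NULL);

	lost = d->cached;
	d->cached = 0;
	return (lost);
}
```

In this model a power failure loses only the sectors that are still
deferred in the cache. Sectors already counted in on_media are not
touched, which corresponds to the point above that the old version of
a file stays intact on the media.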