From owner-svn-src-all@FreeBSD.ORG Tue Feb 7 20:43:28 2012 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B9AEF1065680; Tue, 7 Feb 2012 20:43:28 +0000 (UTC) (envelope-from mckusick@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id A815D8FC1C; Tue, 7 Feb 2012 20:43:28 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.4/8.14.4) with ESMTP id q17KhSQV006767; Tue, 7 Feb 2012 20:43:28 GMT (envelope-from mckusick@svn.freebsd.org) Received: (from mckusick@localhost) by svn.freebsd.org (8.14.4/8.14.4/Submit) id q17KhSJx006765; Tue, 7 Feb 2012 20:43:28 GMT (envelope-from mckusick@svn.freebsd.org) Message-Id: <201202072043.q17KhSJx006765@svn.freebsd.org> From: Kirk McKusick Date: Tue, 7 Feb 2012 20:43:28 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r231160 - head/sys/ufs/ffs X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Feb 2012 20:43:28 -0000 Author: mckusick Date: Tue Feb 7 20:43:28 2012 New Revision: 231160 URL: http://svn.freebsd.org/changeset/base/231160 Log: In the original days of BSD, a sync was issued on every filesystem every 30 seconds. This spike in I/O caused the system to pause every 30 seconds which was quite annoying. So, the way that sync worked was changed so that when a vnode was first dirtied, it was put on a 30-second cleaning queue (see the syncer_workitem_pending queues in kern/vfs_subr.c). If the file has not been written or deleted after 30 seconds, the syncer pushes it out. As the syncer runs once per second, dirty files are trickled out slowly over the 30-second period instead of all at once by a call to sync(2). The one drawback to this is that it does not cover the filesystem metadata. To handle the metadata, vfs_allocate_syncvnode() is called to create a "filesystem syncer vnode" at mount time which cycles around the cleaning queue being sync'ed every 30 seconds. In the original design, the only things it would sync for UFS were the filesystem metadata: inode blocks, cylinder group bitmaps, and the superblock (e.g., by VOP_FSYNC'ing devvp, the device vnode from which the filesystem is mounted). Somewhere in its path to integration with FreeBSD the flushing of the filesystem syncer vnode got changed to sync every vnode associated with the filesystem. The result of this change is to return to the old filesystem-wide flush every 30-seconds behavior and makes the whole 30-second delay per vnode useless. This change goes back to the originally intended trickle out sync behavior. Key to ensuring that all the intended semantics are preserved (e.g., that all inode updates get flushed within a bounded period of time) is that all inode modifications get pushed to their corresponding inode blocks so that the metadata flush by the filesystem syncer vnode gets them to the disk in a timely way. Thanks to Konstantin Belousov (kib@) for doing the audit and commit -r231122 which ensures that all of these updates are being made. Reviewed by: kib Tested by: scottl MFC after: 2 weeks Modified: head/sys/ufs/ffs/ffs_vfsops.c Modified: head/sys/ufs/ffs/ffs_vfsops.c ============================================================================== --- head/sys/ufs/ffs/ffs_vfsops.c Tue Feb 7 20:24:52 2012 (r231159) +++ head/sys/ufs/ffs/ffs_vfsops.c Tue Feb 7 20:43:28 2012 (r231160) @@ -1436,17 +1436,26 @@ ffs_sync(mp, waitfor) int softdep_accdeps; struct bufobj *bo; + wait = 0; + suspend = 0; + suspended = 0; td = curthread; fs = ump->um_fs; if (fs->fs_fmod != 0 && fs->fs_ronly != 0 && ump->um_fsckpid == 0) panic("%s: ffs_sync: modification on read-only filesystem", fs->fs_fsmnt); /* + * For a lazy sync, we just care about the filesystem metadata. + */ + if (waitfor == MNT_LAZY) { + secondary_accwrites = 0; + secondary_writes = 0; + lockreq = 0; + goto metasync; + } + /* * Write back each (modified) inode. */ - wait = 0; - suspend = 0; - suspended = 0; lockreq = LK_EXCLUSIVE | LK_NOWAIT; if (waitfor == MNT_SUSPEND) { suspend = 1; @@ -1517,11 +1526,12 @@ loop: #ifdef QUOTA qsync(mp); #endif + +metasync: devvp = ump->um_devvp; bo = &devvp->v_bufobj; BO_LOCK(bo); - if (waitfor != MNT_LAZY && - (bo->bo_numoutput > 0 || bo->bo_dirty.bv_cnt > 0)) { + if (bo->bo_numoutput > 0 || bo->bo_dirty.bv_cnt > 0) { BO_UNLOCK(bo); vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY); if ((error = VOP_FSYNC(devvp, waitfor, td)) != 0)