From owner-freebsd-current Thu Sep 27 14: 0:32 2001 Delivered-To: freebsd-current@freebsd.org Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by hub.freebsd.org (Postfix) with ESMTP id 1ED2D37B40D for ; Thu, 27 Sep 2001 14:00:27 -0700 (PDT) Received: from vicor-nb.com (dhcp122.vicor-nb.com [208.206.78.122]) by mail.vicor-nb.com (Postfix) with ESMTP id B396D1B21B for ; Thu, 27 Sep 2001 14:00:26 -0700 (PDT) Message-ID: <3BB3936A.D1E1F6B3@vicor-nb.com> Date: Thu, 27 Sep 2001 14:00:26 -0700 From: Julian Elischer Organization: VICOR X-Mailer: Mozilla 4.76 [en] (X11; U; FreeBSD 4.3-RELEASE i386) X-Accept-Language: en MIME-Version: 1.0 To: current@freebsd.org Subject: RFC: mod for 'du' Content-Type: multipart/mixed; boundary="------------5937F02517145C712E28C299" Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG This is a multi-part message in MIME format. --------------5937F02517145C712E28C299 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit 'du' keeps an array of files it has encountered that have > 1 link. Whenever it encounters another, it checks to see if it's one it has already seen and thus can avoid counting its space twice.. This is ok for small filesystems, however VICOR maintains 500GB filesystems on which much of the data has several links. The following patch to replace the linear array (which it realocs if too small) (which it scans linearly) with a hash-table can makle a DRASTIC change to how DU perfomrs for us in this environment. In a small test, we made a linked copy of /usr/src the run times were: old: 0.410u 2.221s 1:55.41 2.2% 12+1355k 6325+0io 2pf+0w new: 8.610u 2.665s 2:09.23 8.7% 10+718k 6367+0io 2pf+0 I must stress that the new code is slightly slower for small numbers of linked files but the old algorythm is O(N^2) and so started taking sizable portions of whole days on our filesystems as they grew. The link checking code is not called at all for files with only a single link and so should not affect the speed in those cases. Does anyone have objections to me committing this (or a variant of it)? Julian --------------5937F02517145C712E28C299 Content-Type: text/plain; charset=us-ascii; name="lnk.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="lnk.diff" Index: du.c =================================================================== RCS file: /home/ncvs/src/usr.bin/du/du.c,v retrieving revision 1.21 diff -u -r1.21 du.c --- du.c 2001/09/04 09:43:31 1.21 +++ du.c 2001/09/27 20:57:13 @@ -2,6 +2,14 @@ * Copyright (c) 1989, 1993, 1994 * The Regents of the University of California. All rights reserved. * + * This version of du has been modified by Andre de Bruin and Scott Macy for Vicor. + * The change is related to the handling of file links, in the official + * release, du keeps a simple linked list of all visited i-nodes with + * multiple links. This list is now implemented as a "hash table" (actually + * a fixed size array) with link list nodes providing a + * performance of up to 30% better than the original version. + * The only changes are in file create.c and marked with "AdB". + * * This code is derived from software contributed to Berkeley by * Chris Newcomb. * @@ -104,6 +112,15 @@ void ignoreclean __P((void)); int ignorep __P((FTSENT *)); +typedef struct _ID { + dev_t dev; + ino_t inode; + struct _ID *next; +} ID; + +#define NHASHSLOTS 100000 + + int main(argc, argv) int argc; @@ -310,35 +327,43 @@ } -typedef struct _ID { - dev_t dev; - ino_t inode; -} ID; - int linkchk(p) FTSENT *p; { - static ID *files; - static int maxfiles, nfiles; - ID *fp, *start; + ID *lp; ino_t ino; dev_t dev; + int hashval; + static ID **ahash; ino = p->fts_statp->st_ino; dev = p->fts_statp->st_dev; - if ((start = files) != NULL) - for (fp = start + nfiles - 1; fp >= start; --fp) - if (ino == fp->inode && dev == fp->dev) - return (1); + hashval = ino % NHASHSLOTS; + + + /* AdB Use simple hash with linklist to record visited inodes */ + if ( ahash == NULL) { + + if ( (ahash = (ID **)malloc( sizeof(void *) * NHASHSLOTS )) == NULL) + err(1, "`can't allocate memory"); + memset( ahash, 0, NHASHSLOTS); + } else { + for (lp = ahash[ hashval ];lp != NULL; lp = lp->next) { + + if ( (ino == lp->inode) && (dev == lp->dev) ) return (1 ); + } + } + + /* Not found, add it */ - if (nfiles == maxfiles && (files = realloc((char *)files, - (u_int)(sizeof(ID) * (maxfiles += 128)))) == NULL) + if ( (lp = (ID *) malloc((unsigned) (sizeof (ID) ))) == NULL ) errx(1, "can't allocate memory"); - files[nfiles].inode = ino; - files[nfiles].dev = dev; - ++nfiles; + lp->inode = ino; + lp->dev = dev; + lp->next = ahash[hashval]; + ahash[hashval] = lp; return (0); } --------------5937F02517145C712E28C299-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message