Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 27 Sep 2001 14:00:26 -0700
From:      Julian Elischer <julian@vicor-nb.com>
To:        current@freebsd.org
Subject:   RFC: mod for 'du'
Message-ID:  <3BB3936A.D1E1F6B3@vicor-nb.com>

next in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------5937F02517145C712E28C299
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

'du' keeps an array of files it has encountered that have > 1 link.
Whenever it encounters another, it checks to see if it's one it has
already seen
and thus can avoid counting its space twice..

This is ok for small filesystems, however VICOR maintains
500GB filesystems on which much of the data has several links.

The following patch to replace the linear array (which it realocs if too
small)
(which it scans linearly) with a hash-table can makle a DRASTIC change
to how DU perfomrs for us in this environment.

In a small test, we made a linked copy of /usr/src

the run times were:
old: 0.410u 2.221s 1:55.41 2.2%      12+1355k 6325+0io 2pf+0w
new: 8.610u 2.665s 2:09.23 8.7%      10+718k 6367+0io 2pf+0  

I must stress that the new code is slightly slower for small numbers 
of linked files but the old algorythm is O(N^2) and so started taking 
sizable portions of whole days on our filesystems as they grew.

The link checking code is not called at all for files with only a 
single link and so should not affect the speed in those cases.

Does anyone have objections to me committing this (or a variant of it)?

Julian
--------------5937F02517145C712E28C299
Content-Type: text/plain; charset=us-ascii;
 name="lnk.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="lnk.diff"

Index: du.c
===================================================================
RCS file: /home/ncvs/src/usr.bin/du/du.c,v
retrieving revision 1.21
diff -u -r1.21 du.c
--- du.c	2001/09/04 09:43:31	1.21
+++ du.c	2001/09/27 20:57:13
@@ -2,6 +2,14 @@
  * Copyright (c) 1989, 1993, 1994
  *	The Regents of the University of California.  All rights reserved.
  *
+ * This version of du has been modified by Andre de Bruin and Scott Macy for Vicor.
+ * The change is related to the handling of file links, in the official
+ * release, du keeps a simple linked list of all visited i-nodes with
+ * multiple links. This list is now implemented as a "hash table" (actually
+ * a fixed size array) with link list nodes providing a 
+ * performance of up to 30% better than the original version.
+ * The only changes are in file create.c and marked with "AdB".
+ *
  * This code is derived from software contributed to Berkeley by
  * Chris Newcomb.
  *
@@ -104,6 +112,15 @@
 void		ignoreclean __P((void));
 int		ignorep __P((FTSENT *));
 
+typedef struct _ID {
+	dev_t	    dev;
+	ino_t	    inode;
+        struct _ID *next;
+} ID;
+
+#define NHASHSLOTS 100000
+
+
 int
 main(argc, argv)
 	int argc;
@@ -310,35 +327,43 @@
 }
 
 
-typedef struct _ID {
-	dev_t	dev;
-	ino_t	inode;
-} ID;
-
 
 int
 linkchk(p)
 	FTSENT *p;
 {
-	static ID *files;
-	static int maxfiles, nfiles;
-	ID *fp, *start;
+	ID     *lp;
 	ino_t ino;
 	dev_t dev;
+	int hashval;
+	static ID **ahash;
 
 	ino = p->fts_statp->st_ino;
 	dev = p->fts_statp->st_dev;
-	if ((start = files) != NULL)
-		for (fp = start + nfiles - 1; fp >= start; --fp)
-			if (ino == fp->inode && dev == fp->dev)
-				return (1);
+	hashval = ino % NHASHSLOTS;
+
+
+	/* AdB Use simple hash with linklist to record visited inodes */
+	if ( ahash == NULL) {
+
+	  if ( (ahash = (ID **)malloc( sizeof(void *) * NHASHSLOTS )) == NULL)
+	    err(1,  "`can't allocate memory");
+	  memset( ahash, 0, NHASHSLOTS);
+	} else {
+	  for (lp = ahash[ hashval ];lp != NULL; lp = lp->next) {
+    
+	    if ( (ino == lp->inode) && (dev == lp->dev) ) return (1 );
+	  }
+	}
+
+	/* Not found, add it */
 
-	if (nfiles == maxfiles && (files = realloc((char *)files,
-	    (u_int)(sizeof(ID) * (maxfiles += 128)))) == NULL)
+	if ( (lp = (ID *) malloc((unsigned) (sizeof (ID) )))  == NULL )
 		errx(1, "can't allocate memory");
-	files[nfiles].inode = ino;
-	files[nfiles].dev = dev;
-	++nfiles;
+	lp->inode = ino;
+	lp->dev = dev;
+	lp->next = ahash[hashval];
+	ahash[hashval] = lp;
 	return (0);
 }
 

--------------5937F02517145C712E28C299--


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3BB3936A.D1E1F6B3>