Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 28 Sep 2001 08:23:29 +0200
From:      Poul-Henning Kamp <phk@critter.freebsd.dk>
To:        Julian Elischer <julian@vicor-nb.com>
Cc:        current@FreeBSD.ORG
Subject:   Re: RFC: mod for 'du' 
Message-ID:  <69645.1001658209@critter>
In-Reply-To: Your message of "Thu, 27 Sep 2001 14:00:26 PDT." <3BB3936A.D1E1F6B3@vicor-nb.com> 

next in thread | previous in thread | raw e-mail | index | archive | help
In message <3BB3936A.D1E1F6B3@vicor-nb.com>, Julian Elischer writes:
>This is a multi-part message in MIME format.
>--------------5937F02517145C712E28C299
>Content-Type: text/plain; charset=us-ascii
>Content-Transfer-Encoding: 7bit
>
>'du' keeps an array of files it has encountered that have > 1 link.
>Whenever it encounters another, it checks to see if it's one it has
>already seen
>and thus can avoid counting its space twice..
>
>This is ok for small filesystems, however VICOR maintains
>500GB filesystems on which much of the data has several links.
>
>The following patch to replace the linear array (which it realocs if too
>small)
>(which it scans linearly) with a hash-table can makle a DRASTIC change
>to how DU perfomrs for us in this environment.
>
>In a small test, we made a linked copy of /usr/src
>
>the run times were:
>old: 0.410u 2.221s 1:55.41 2.2%      12+1355k 6325+0io 2pf+0w
>new: 8.610u 2.665s 2:09.23 8.7%      10+718k 6367+0io 2pf+0  

Have you examined if the hask key is evenly distributed ?

>Does anyone have objections to me committing this (or a variant of it)?

I would certainly have objections of you show us one patch and 
commit another..

>  *	The Regents of the University of California.  All rights reserved.
>  *
>+ * This version of du has been modified by Andre de Bruin and Scott Macy for Vicor.

Too long line ?

>+ * The change is related to the handling of file links, in the official
>+ * release, du keeps a simple linked list of all visited i-nodes with
>+ * multiple links. This list is now implemented as a "hash table" (actually
>+ * a fixed size array) with link list nodes providing a 
>+ * performance of up to 30% better than the original version.
>+ * The only changes are in file create.c and marked with "AdB".

We put such stuff in the CVS commit message, not in the source file.

>+typedef struct _ID {
>+	dev_t	    dev;
>+	ino_t	    inode;
>+        struct _ID *next;
>+} ID;

Why not use db(3) ?

It has a known good hash, it autosizes to different task-sets and
it could be setup with an option to put the hash-table in a diskfile
("du -t" ?) for known bad cases of many hardlinks

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?69645.1001658209>