From owner-freebsd-hackers  Sun Oct 29 16: 8:24 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135])
	by hub.freebsd.org (Postfix) with ESMTP id 1BE6637B65E
	for ; Sun, 29 Oct 2000 16:08:21 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id e9U07nY71868;
	Sun, 29 Oct 2000 16:07:49 -0800 (PST)
	(envelope-from dillon)
Date: Sun, 29 Oct 2000 16:07:49 -0800 (PST)
From: Matt Dillon
Message-Id: <200010300007.e9U07nY71868@earth.backplane.com>
To: Ryan Thompson
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Filesystem holes
References:
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> :> storage is rather inefficient for our table of about 2,850,000 members
:> :> (~2.1 GB total storage).  There are 64M possible hash values in our
:> :> current implementation, and our record size is variable, but could be
:> :> safely fixed at about 1.5KB... So, total storage if all values were used
:> :> would be about 96GB.  (See where I'm going with this?)
:...
:
:Remember, though, I'm not proposing a hash table.  I have been lucky
:enough to design a "hash" function that will crunch down all possible
:values into about 64M unique keys--so, no collisions.  I propose to use
:this function with address calculation.  So, O(1) everything, for single
:random lookups, which is (virtually) all this data structure must deal
:with.  Sorting isn't necessary, and that could be accomplished in O(n)
:anyway.

    Are you still talking about creating a 96GB file to manage 2.1GB worth
    of data?  I gotta say, that sounds kinda nuts to me!

    CPU cycles are cheap; I/O and memory are not.  Take the BTree example
    again -- if you think about it, all the internal nodes in the tree will
    fit into memory trivially.  It is only the leaf nodes that do not.  So
    in regard to actual disk accesses you still wind up doing only *ONE*
    for the btree, even if it winds up being four or five levels deep.

    The result is that you will be able to create an index to your data,
    which is only 2.8 million records, at around 32 bytes per record, or
    89 Megabytes.  Not 96GB.  Since you can cache all the internal nodes
    of the btree, you are done.  The machine will do a much better job
    caching an 89MB index than a 96GB one.

    Do not make the mistake of equating CPU to I/O.  The CPU required to
    iterate through 4 levels of the btree (assuming 64 entries per node,
    it only takes 4 levels to cover 16 million records) is *ZERO* ... as
    in, probably less than a microsecond, whereas accessing the disk is
    going to take milliseconds.

:> A B+Tree will also scale with the size of the dataset being managed,
:> so you do not have to preallocate or prereserve file space.
:
:So will address calculation + filesystem holes, and sufficiently large
:filesizes :-)

    Uh, not with a 50:1 file size factor it won't.

					-Matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
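
A quick back-of-the-envelope sketch of the btree sizing argument above, as a
small standalone C program.  The inputs (2.8 million records, ~32 bytes per
index entry, 64-way fanout) are the figures quoted in the mail; the variable
names and the depth calculation are purely illustrative and are not taken from
any real filesystem or database code.

/*
 * Sketch: estimate btree index size and depth for the figures in the mail.
 * Assumptions (from the mail, not from any real implementation):
 *   - ~2,800,000 records to index
 *   - ~32 bytes per index entry
 *   - 64 entries per btree node (fanout)
 */
#include <stdio.h>

int
main(void)
{
	const unsigned long nrecords = 2800000UL;	/* records to index */
	const unsigned long entry_size = 32UL;		/* bytes per index entry */
	const unsigned long fanout = 64UL;		/* entries per btree node */

	/* Total index size if every record costs one 32-byte entry. */
	unsigned long index_bytes = nrecords * entry_size;

	/* How many levels does a 64-way btree need to cover nrecords? */
	unsigned long levels = 1;
	unsigned long coverage = fanout;
	while (coverage < nrecords) {
		coverage *= fanout;
		levels++;
	}

	printf("index size: ~%lu MB (vs. a 96GB sparse file)\n",
	    index_bytes / 1000000UL);
	printf("btree depth at fanout %lu: %lu levels, covering %lu records\n",
	    fanout, levels, coverage);
	return (0);
}

Compiled and run, this reports roughly an 89MB index and 4 levels covering
about 16 million records, matching the arithmetic in the mail.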