From owner-freebsd-hackers  Sun Oct 29 16: 8:24 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from earth.backplane.com (placeholder-dcat-1076843399.broadbandoffice.net [64.47.83.135])
	by hub.freebsd.org (Postfix) with ESMTP id 1BE6637B65E
	for ; Sun, 29 Oct 2000 16:08:21 -0800 (PST)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.1/8.9.3) id e9U07nY71868;
	Sun, 29 Oct 2000 16:07:49 -0800 (PST)
	(envelope-from dillon)
Date: Sun, 29 Oct 2000 16:07:49 -0800 (PST)
From: Matt Dillon
Message-Id: <200010300007.e9U07nY71868@earth.backplane.com>
To: Ryan Thompson
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Filesystem holes
References:
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> :> storage is rather inefficient for our table of about 2,850,000 members
:> :> (~2.1 GB total storage).  There are 64M possible hash values in our
:> :> current implementation, and our record size is variable, but could be
:> :> safely fixed at about 1.5KB... So, total storage if all values were used
:> :> would be about 96GB.  (See where I'm going with this?)
:...
:
:Remember, though, I'm not proposing a hash table.  I have been lucky
:enough to design a "hash" function that will crunch down all possible
:values into about 64M unique keys--so, no collisions.  I propose to use
:this function with address calculation.  So, O(1) everything, for single
:random lookups, which is (virtually) all this data structure must deal
:with.  Sorting isn't necessary, and that could be accomplished in O(n)
:anyway.

    Are you still talking about creating a 96GB file to manage 2.1GB worth
    of data?  I gotta say, that sounds kinda nuts to me!

    CPU cycles are cheap; I/O and memory are not.  Take the BTree example
    again -- if you think about it, all the internal nodes in the tree will
    fit into memory trivially.  It is only the leaf nodes that do not.  So
    in regard to actual disk accesses you still wind up doing only *ONE*
    for the btree, even if it winds up being four or five levels deep.

    The result is that you will be able to create an index to your data,
    which is only 2.8 million records, at around 32 bytes per record, or
    89 Megabytes.  Not 96GB.  Since you can cache all the internal nodes
    of the btree, you are done.  The machine will do a much better job
    caching an 89MB index than a 96GB one.

    Do not make the mistake of equating CPU to I/O.  The CPU required to
    iterate through 4 levels of the btree (assuming 64 entries per node,
    it only takes 4 levels to cover 16 million records) is *ZERO* ... as
    in, probably less than a microsecond, whereas accessing the disk is
    going to take milliseconds.

:> A B+Tree will also scale with the size of the dataset being managed,
:> so you do not have to preallocate or prereserve file space.
:
:So will address calculation + filesystem holes, and sufficiently large
:filesizes :-)

    Uh, not with a 50:1 file size factor it won't.

					-Matt


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
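
A quick back-of-the-envelope sketch of the btree sizing argument above, as a
small standalone C program.  The inputs (2.8 million records, ~32 bytes per
index entry, 64-way fanout) are the figures quoted in the mail; the variable
names and the depth calculation are purely illustrative and are not taken from
any real filesystem or database code.

/*
 * Sketch: estimate btree index size and depth for the figures in the mail.
 * Assumptions (from the mail, not from any real implementation):
 *   - ~2,800,000 records to index
 *   - ~32 bytes per index entry
 *   - 64 entries per btree node (fanout)
 */
#include <stdio.h>

int
main(void)
{
	const unsigned long nrecords = 2800000UL;	/* records to index */
	const unsigned long entry_size = 32UL;		/* bytes per index entry */
	const unsigned long fanout = 64UL;		/* entries per btree node */

	/* Total index size if every record costs one 32-byte entry. */
	unsigned long index_bytes = nrecords * entry_size;

	/* How many levels does a 64-way btree need to cover nrecords? */
	unsigned long levels = 1;
	unsigned long coverage = fanout;
	while (coverage < nrecords) {
		coverage *= fanout;
		levels++;
	}

	printf("index size: ~%lu MB (vs. a 96GB sparse file)\n",
	    index_bytes / 1000000UL);
	printf("btree depth at fanout %lu: %lu levels, covering %lu records\n",
	    fanout, levels, coverage);
	return (0);
}

Compiled and run, this reports roughly an 89MB index and 4 levels covering
about 16 million records, matching the arithmetic in the mail.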