Date: Wed, 4 Aug 2004 07:59:30 -0400 (EDT)
From: Daniel Ellard <ellard@eecs.harvard.edu>
To: Kathy Quinlan
cc: freebsd-hardware@freebsd.org
Subject: Re: Big Problem
In-Reply-To: <4110C9C7.6080506@kaqelectronics.dyndns.org>
Message-ID: <20040804074504.L22815@bowser.eecs.harvard.edu>

On Wed, 4 Aug 2004, Kathy Quinlan wrote:

> First off, I am not a troll; this is a serious email. I cannot go
> into too many fine points as I am bound by an NDA.

That's too bad, because it will make this a little more complicated.
But nevertheless...

> The problem:
>
> I need to hold a text file in RAM. The text file in the foreseeable
> future could be up to 10 TB in size.
>
> My options:
>
> Design a computer (probably multiple AMD64s) to handle 10 TB of
> memory (plus a few extra GB of RAM for system overhead) and hold the
> file in one physical computer system.

If you can find/construct a mobo with sockets for 10 TB of RAM...
that's 10,000 1 GB sticks. That would be quite a design exercise.

> Build a server farm and have each server hold a portion, e.g. 4 GB
> each (250 servers, plus a few extra for system overhead).
>
> The reason the file needs to be in RAM is that I need speed when
> searching for patterns in the data (less than 1 second to pull out
> the relevant chunks).

If what you're doing is searching for relatively small subsets of the
data (i.e., a particular record or a handful of records), then you
don't need to do this entirely in RAM. If you use an
appropriately-sized B-tree and cache the high levels of the tree, it
only takes a few I/Os to find any particular record. If you can spread
the tree around a bit (split it among a bunch of hosts, each caching
the upper few GB of its subtree), it's even better.

Arrange the data over lots of thinly-allocated disks: you can get very
good read performance from disks if you aren't concerned about space
efficiency, and if you've got the budget to buy 10 TB of RAM and design
a custom machine to put it in, I'm guessing that buying a few racks of
disks won't be an issue.
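To put rough numbers on the "few I/Os" claim, here is a minimal
back-of-envelope sketch. The 8 KB page, 32-byte key/pointer entry and
256-byte record sizes are assumptions for illustration only; none of
these figures come from the thread:

/*
 * Back-of-envelope only -- page, key and record sizes here are
 * assumed for illustration, not numbers from this thread.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double total_bytes = 10e12;	/* ~10 TB of data */
	double record_size = 256;	/* assumed bytes per record */
	double page_size   = 8192;	/* assumed B-tree page size */
	double entry_size  = 32;	/* assumed key + child pointer */

	double nrecords = total_bytes / record_size;
	double fanout   = page_size / entry_size;	/* ~256 */
	int    height   = (int)ceil(log(nrecords) / log(fanout));

	/* Cache everything above the bottom two levels of the tree. */
	double cached_pages = 0.0, level_pages = 1.0;
	for (int i = 0; i < height - 2; i++) {
		cached_pages += level_pages;
		level_pages  *= fanout;
	}

	printf("%.3g records, fanout %.0f => tree is %d levels deep\n",
	    nrecords, fanout, height);
	printf("caching the top %d levels costs ~%.2f GB and leaves "
	    "about 2 disk I/Os per lookup\n",
	    height - 2, cached_pages * page_size / 1e9);
	return (0);
}

With those made-up sizes it comes out to a five-level tree, where about
half a gigabyte of cache per host leaves only two disk reads per
lookup -- which is why the whole file doesn't have to live in RAM.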
If, on the other hand, you're looking for something interesting in the
data (i.e., you're not just searching for keys, but are doing some
processing), then the issue probably isn't RAM and I/O, but raw
processing power. It takes a long time to scan through 10 TB of data,
whether that data is in RAM or on disk -- you'll never get it done in a
second. In this case, heaps of processors are probably your only hope.
Controlling them will be an interesting challenge.

Of course, the problem is probably somewhere in the middle. Tell us
what you can without violating your NDA...

	-Dan
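As a rough check on the "never get it done in a second" point above:
assuming each node can stream data at about 5 GB/s (an assumed figure,
not one from the thread), the scaling looks like this:

/*
 * Rough scaling estimate -- the per-node bandwidth is an assumption
 * for illustration, not a number from this thread.
 */
#include <stdio.h>

int
main(void)
{
	double total_bytes  = 10e12;	/* ~10 TB to scan */
	double node_bw      = 5e9;	/* assumed ~5 GB/s per node */
	double deadline_sec = 1.0;	/* "less than 1 second" */

	double one_node_sec = total_bytes / node_bw;	/* ~2000 s */
	double nodes_needed = one_node_sec / deadline_sec;

	printf("one node takes ~%.0f seconds per full scan\n",
	    one_node_sec);
	printf("meeting a %.0f second deadline means ~%.0f nodes "
	    "scanning their slice in parallel\n",
	    deadline_sec, nodes_needed);
	return (0);
}

Under those assumptions a single box needs on the order of 2000 seconds
per scan, so meeting the deadline by brute force means roughly 2000
nodes each scanning only its own slice of the data in parallel -- which
is where the "heaps of processors" and the coordination problem come
from.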