Date:      Wed, 4 Aug 2004 09:49:54 -0700
From:      Brooks Davis <brooks@one-eyed-alien.net>
To:        Kathy Quinlan <kat-free@kaqelectronics.dyndns.org>
Cc:        freebsd-hardware@freebsd.org
Subject:   Re: Big Problem
Message-ID:  <20040804164954.GB10063@Odin.AC.HMC.Edu>
In-Reply-To: <4110C9C7.6080506@kaqelectronics.dyndns.org>
References:  <4110C9C7.6080506@kaqelectronics.dyndns.org>


On Wed, Aug 04, 2004 at 07:34:31PM +0800, Kathy Quinlan wrote:
> Hi Guys and Gals,
>
> First off, I am not a troll; this is a serious email. I can not go into
> too many fine points as I am bound by an NDA.
>
> The problem:
>
> I need to hold a text file in RAM; in the foreseeable future the text
> file could be up to 10TB in size.
>
> My options:
>
> Design a computer (probably multiple AMD64s) to handle 10TB of memory
> (plus a few extra GB of RAM for system overhead) and hold the file in
> one physical computer system.
>
> Build a server farm and have each server hold a portion, e.g. 4GB each
> server (250 servers, plus a few extra for system overhead).

That only gets you to 1TB...
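
A quick back-of-the-envelope check of that option, using only the
numbers quoted above:

```python
# Capacity check for the "250 servers x 4GB each" option described above.
per_server_gb = 4
servers = 250

total_tb = per_server_gb * servers / 1024  # GB -> TB (binary units)
print(total_tb)  # ~0.98TB, i.e. roughly 1TB, an order of magnitude short

# Servers actually needed to hold 10TB at 4GB apiece:
needed = (10 * 1024) // per_server_gb
print(needed)  # 2560 servers, before any overhead or spares
```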

> The reason the file needs to be in RAM is that I need speed of search
> for patterns in the data (less than one second to pull out relevant
> chunks).
>
> I am sure I have missed some options; right now I am just kicking ideas
> around. The software will be based on FreeBSD with some major
> modifications to address the large amount of RAM (probably set it up as
> a virtual drive with one file).

Depending on your budget, I'd either give Cray or SGI a call, or build
a cluster of AMD64 machines.

You can get 16GB in a 1U chassis, so that would reduce your requirements
to around 700 machines; call it 18 racks, not counting the networking.
You will not be able to use that as a RAM disk and stripe it for a
single machine to search.  First, there's no way you'll be able to
maintain any kind of uptime if you do that.  With 5600 DIMMs, you'll
lose at least one a week, probably more.  Second, even assuming you
could completely process one 64-bit word per cycle on a single ~2GHz
machine and had enough bandwidth, 10TB is about 1.25 trillion 64-bit
words, so you would still need roughly 625 seconds to scan it all.  What
you will need to do is build a distributed application that a) allows
processing to run on each machine and b) provides a mechanism for fault
tolerance in the face of machine failures.
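
To make the scatter/gather shape of that concrete, here is a minimal
single-process sketch, with thread-pool "nodes" standing in for real
machines.  The shard contents, node names, timeout, and search function
are all illustrative assumptions, not a real cluster design:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Each "node" holds one shard of the data in memory and searches it
# locally; on a real cluster these would be separate machines.
SHARDS = {
    "node-0": "the quick brown fox",
    "node-1": "jumped over the lazy dog",
    "node-2": "pattern searching at scale",
}

def search_shard(node, pattern):
    # On a real cluster this would be an RPC to `node`; here it is local.
    text = SHARDS[node]
    return [(node, i) for i in range(len(text)) if text.startswith(pattern, i)]

def distributed_search(pattern, timeout=1.0):
    hits, failed = [], []
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        futures = {pool.submit(search_shard, n, pattern): n for n in SHARDS}
        for fut in as_completed(futures, timeout=timeout):
            node = futures[fut]
            try:
                hits.extend(fut.result())
            except Exception:
                # Fault tolerance: record the failed node; a real system
                # would retry its shard on a replica rather than failing
                # the whole query.
                failed.append(node)
    return hits, failed

hits, failed = distributed_search("the")
print(hits)  # (node, offset) pairs gathered from every responsive shard
```

Each node only ever scans its own few GB, so the per-query work stays
inside the sub-second budget, and a dead node costs you one shard's
results instead of the whole search.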

You would do well to read up on the techniques Google uses to manage
unreliable systems and provide high-performance search.

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4



