From: Simon Shapiro
To: nik@iii.co.uk
Cc: Amancio Hasty, Satoshi Asami, scrappy@hub.org, andreas@klemm.gtn.com, freebsd-database@FreeBSD.ORG, Wolfram Schneider, John Fieber
Date: Mon, 30 Mar 1998 12:25:00 -0800 (PST)
Subject: Re: Mailing list search interface
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation
In-Reply-To: <19980330164024.47510@iii.co.uk>

On 30-Mar-98 nik@iii.co.uk wrote:
...
> My disk is single 2GB Atlas II, with tagged queuing turned *off* (because
> of buggy firmware which I haven't updated yet).

Ah! This is useful information. thanx!

>> By quick back-of-an-envelope calculations, this is slower than
>> the current indexing scheme on hub by at least a factor of 10.
>
> The time above was for creation of the HTML archives and for indexing,
> not just indexing alone.

This is something we need to keep in mind. Generating 100% output
coverage for (probably) less than 10% need is wasteful.
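To put rough numbers on that, here is a minimal Python sketch, assuming HTML pages can be generated lazily, the first time a message is requested, instead of all up front. `Renderer`, `build_eagerly` and `make_on_demand` are hypothetical names, not part of any existing archive tool, and the request pattern (500 hits spread over 100 of 1000 messages, i.e. 10%) is invented for illustration.

```python
from functools import lru_cache


class Renderer:
    """Hypothetical stand-in for the message-to-HTML step; counts work done."""

    def __init__(self) -> None:
        self.calls = 0

    def render(self, msg_id: int) -> str:
        self.calls += 1
        return f"<html><body>message {msg_id}</body></html>"


def build_eagerly(r: Renderer, n_messages: int) -> dict[int, str]:
    # Pre-generate every page, whether or not anyone ever reads it.
    return {i: r.render(i) for i in range(n_messages)}


def make_on_demand(r: Renderer):
    # Render a page the first time it is requested, then serve the cached copy.
    @lru_cache(maxsize=None)
    def page(msg_id: int) -> str:
        return r.render(msg_id)

    return page


if __name__ == "__main__":
    n = 1000
    # 500 requests spread over 100 distinct messages (10% of the archive).
    requested = [i % 100 for i in range(500)]

    eager = Renderer()
    build_eagerly(eager, n)

    lazy = Renderer()
    page = make_on_demand(lazy)
    for msg_id in requested:
        page(msg_id)

    print(eager.calls, lazy.calls)  # 1000 100
```

The eager build does ten times the rendering work for the same set of pages actually read. `lru_cache(maxsize=None)` keeps every rendered page in memory; a real archive would more likely bound the cache or write generated pages to disk.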
>> Indexing anything large is typically an I/O bound operation and
>> when you start indexing much more than can fit in RAM, your
>> performance will degrade dramatically, so it is probably slower
>> by much more than a factor of 10.
>
> Don't know. I'll grab last year's archive of -hackers (or another one,
> if there's another you think would be more representative) and try that.
> I can bring back figures for the time to create the entire archive (and
> index), the time just to index, and the time to add a new message and
> then reindex.

Listen to the man :-) It gets worse. Extrapolation on a non-linear
function is called gambling :-) You will run into scaling problems at
certain sizes, and the degradation can be dramatic.

> I'd try this with the whole of the archives, but I don't have the spare
> disk space (yet).

I have. Is there an efficient way to get the whole archive here?
Downloading over a modem is NOT considered efficient.

> Are those survey results available online somewhere?

Please!

> A hybrid system is on my list of things to build here (but it'll be
> Oracle based). I haven't investigated Postgres enough to know if it's
> up to the task.

Oracle based is good. Now, please tell us how to run Oracle on FreeBSD,
legally, and with source available.

PostgreSQL is up to the task. This is not a dramatically complex
database problem: it is pretty much a linear table, with the text
searching TBD.

----------

Sincerely Yours,
Simon Shapiro
Shimon@Simon-Shapiro.ORG    Voice: 503.799.2313