Date:      Mon, 30 Mar 1998 12:25:00 -0800 (PST)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        nik@iii.co.uk
Cc:        Amancio Hasty <hasty@rah.star-gate.com>, Satoshi Asami <asami@FreeBSD.ORG>, scrappy@hub.org, andreas@klemm.gtn.com, freebsd-database@FreeBSD.ORG, Wolfram Schneider <wosch@cs.tu-berlin.de>, John Fieber <jfieber@indiana.edu>
Subject:   Re: Mailing list search interface
Message-ID:  <XFMail.980330122500.shimon@simon-shapiro.org>
In-Reply-To: <19980330164024.47510@iii.co.uk>


On 30-Mar-98 nik@iii.co.uk wrote:
 ...

> My disk is single 2GB Atlas II, with tagged queuing turned *off* (because
> of buggy firmware which I haven't updated yet).

Ah!  This is useful information.  thanx!

>> By quick back-of-an-envelope calculations, this is slower than
>> the current indexing scheme on hub by at least a factor of 10.
> 
> The time above was for creation of the HTML archives and for indexing,
> not just indexing alone.

This is something we need to keep in mind.  Generating output for 100% of
the archive when (probably) less than 10% of it is ever needed is wasteful.

>> Indexing anything large is typically an I/O bound operation and
>> when you start indexing much more than can fit in RAM, your
>> performance will degrade dramatically, so it is probably slower
>> by much more than a factor of 10.
> 
> Don't know. I'll grab last year's archive of -hackers (or another one,
> if there's another you think would be more representative) and try that.
> I can bring back figures for the time to create the entire archive (and
> index), the time just to index, and the time to add a new message and
> then reindex.

Listen to the man :-)  It gets worse.  Extrapolation on a non-linear
function is called gambling :-)  You will run into scaling problems at
certain sizes.  The worsening can be dramatic.
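
One way to avoid extrapolating is to measure at several sizes and watch
how the per-message cost changes.  A minimal sketch in Python, assuming
synthetic messages and a toy in-memory inverted index; make_messages,
build_index and time_index are hypothetical names, not part of any
indexer discussed in this thread.  Because everything here stays in
RAM, it will not reproduce the I/O-bound degradation described above;
it only shows the shape of the measurement.

```python
import random
import time
from collections import defaultdict

def make_messages(n, vocab_size=5000, words_per_msg=50, seed=0):
    """Generate n synthetic 'messages' as lists of word tokens (made-up data)."""
    rng = random.Random(seed)
    vocab = [f"w{i}" for i in range(vocab_size)]
    return [[rng.choice(vocab) for _ in range(words_per_msg)] for _ in range(n)]

def build_index(messages):
    """Build a toy in-memory inverted index: word -> set of message numbers."""
    index = defaultdict(set)
    for msg_no, words in enumerate(messages):
        for word in words:
            index[word].add(msg_no)
    return index

def time_index(n):
    """Return the seconds taken to index n synthetic messages."""
    messages = make_messages(n)
    start = time.perf_counter()
    build_index(messages)
    return time.perf_counter() - start

if __name__ == "__main__":
    # Sizes are arbitrary; pick ones that bracket the real archive.
    for n in (1_000, 5_000, 25_000):
        secs = time_index(n)
        print(f"{n:>6} messages: {secs:.3f} s ({secs / n * 1e6:.1f} us/message)")
```

If the per-message figure climbs as n grows, a straight-line estimate
from the small runs will understate the cost of the full archive.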

> I'd try this with the whole of the archives, but I don't have the spare
> disk space (yet).

I have.  Is there an efficient way to get the whole archive here? 
Downloading on a modem is NOT considered efficient.

> Are those survey results available online somewhere?

Please!


> A hybrid system is on my list of things to build here (but it'll be 
> Oracle based). I haven't investigated Postgres enough to know if it's
> up to the task.

Oracle based is good.  Now, please tell us how to run Oracle on FreeBSD,
legally, and with source available.

PostgreSQL is up to the task.  This is not a dramatically complex database
problem.  Pretty much a linear table, with the text searching TBD.
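
To illustrate what a "linear table" for archived mail might look like,
here is a rough sketch in Python.  It uses the standard-library sqlite3
module purely as a stand-in so that it runs without a database server;
it is not PostgreSQL.  The table name, column names, sample rows and the
search helper are all assumptions made up for illustration, and the
LIKE match is only a placeholder for the text searching that is still
TBD.

```python
import sqlite3

# In-memory SQLite database, standing in for a real server (assumption).
conn = sqlite3.connect(":memory:")

# One flat table: one row per archived message (hypothetical columns).
conn.execute("""
    CREATE TABLE messages (
        message_id TEXT PRIMARY KEY,
        list_name  TEXT NOT NULL,
        sent_at    TEXT NOT NULL,
        sender     TEXT NOT NULL,
        subject    TEXT NOT NULL,
        body       TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX messages_by_list_date ON messages (list_name, sent_at)")

# Made-up sample rows, not real archive data.
rows = [
    ("<example-1@example.org>", "freebsd-database", "1998-03-30T12:00:00",
     "a@example.org", "Search interface", "How fast is the indexing step?"),
    ("<example-2@example.org>", "freebsd-database", "1998-03-30T13:00:00",
     "b@example.org", "Re: Search interface", "Disk I/O dominates."),
    ("<example-3@example.org>", "freebsd-hackers", "1998-03-29T09:00:00",
     "c@example.org", "Unrelated", "Nothing about indexing on this list."),
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)", rows)

def search(conn, list_name, term):
    """Placeholder text search: substring match on the body within one list."""
    cur = conn.execute(
        "SELECT message_id, subject FROM messages "
        "WHERE list_name = ? AND body LIKE ? ORDER BY sent_at",
        (list_name, f"%{term}%"),
    )
    return cur.fetchall()
```

A substring scan like this reads every row of the chosen list, so it
says nothing about how a real text index would perform at archive size.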


----------


Sincerely Yours, 

Simon Shapiro
Shimon@Simon-Shapiro.ORG                      Voice:   503.799.2313

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-database" in the body of the message


