Date:      Mon, 30 Mar 1998 11:52:11 -0800 (PST)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        Wolfram Schneider <wosch@cs.tu-berlin.de>
Cc:        Amancio Hasty <hasty@rah.star-gate.com>, Satoshi Asami <asami@FreeBSD.ORG>, scrappy@hub.org, andreas@klemm.gtn.com, freebsd-database@FreeBSD.ORG
Subject:   Re: [PORTS] Pgaccess doesn't run on -current anymore, Update
Message-ID:  <XFMail.980330115211.shimon@simon-shapiro.org>
In-Reply-To: <19980330123130.39177@caramba.cs.tu-berlin.de>

On 30-Mar-98 Wolfram Schneider wrote:
> On 1998-03-29 13:57:30 -0800, Simon Shapiro wrote:
>> We have been playing with the idea of normalizing the archive into an
>> RDBMS.  Some of the benefits are:
>> 
>> *  no need to update the threads database.  It will always be updated.
>> *  Users can create, easily, their own thread logic with no impact on
>>    system performance.
>> *  Searching on normalized fields are many times faster, and much less
>>    costly in system resources.
> 
> Some figures ...
> 
> The FreeBSD mailing list archive is 620MB large. There are currently
> 270,000 messages. The archive grow with 100,000 messages/year.

Excellent.  How many years back do we want to keep?

> If you plan to use a real SQL database, you should consider at least
> 500,000 data sets, better 1 million. You need 2GB for the raw E-Mails
> and 2-4GB for the index. I don't know if there are free available
> databases which can handle this large data.

Large?  Assume 1 million messages in the ``current'' database.  People can
search the ``ancient'' database separately.  Even if your dataset numbers
are correct, this fits in two 4GB partitions in a RAID array.  For 4 million
records, an indexed search in PostgreSQL 6.2.1 took about 1-2 seconds on a
busy system (make buildworld running in the background).
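
To make the idea concrete, here is a minimal sketch of a normalized message
table with indexed lookups.  It is an illustration only: it uses Python's
built-in sqlite3 module as a stand-in for PostgreSQL, and the table name,
column names, index names, the thread() helper and the sample rows are all
made up for the example.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Hypothetical normalized layout: one row per message, one column per header.
conn.executescript("""
CREATE TABLE message (
    message_id  TEXT PRIMARY KEY,
    in_reply_to TEXT,
    sender      TEXT NOT NULL,
    sent_at     TEXT NOT NULL,
    subject     TEXT NOT NULL
);
-- Indexes on the fields people search and thread by.
CREATE INDEX message_in_reply_to ON message (in_reply_to);
CREATE INDEX message_sender ON message (sender, sent_at);
""")

# Invented sample data, not taken from the real archive.
rows = [
    ("a@example", None, "alice@example.org", "1998-03-29T10:00:00", "Archive in an RDBMS?"),
    ("b@example", "a@example", "bob@example.org", "1998-03-29T12:00:00", "Re: Archive in an RDBMS?"),
    ("c@example", "b@example", "alice@example.org", "1998-03-30T09:00:00", "Re: Archive in an RDBMS?"),
    ("d@example", None, "carol@example.org", "1998-03-30T11:00:00", "Unrelated"),
]
conn.executemany("INSERT INTO message VALUES (?, ?, ?, ?, ?)", rows)


def thread(conn, root_id):
    """Return the message ids of a thread, oldest first, by following In-Reply-To."""
    sql = """
    WITH RECURSIVE t(message_id, sent_at) AS (
        SELECT message_id, sent_at FROM message WHERE message_id = ?
        UNION ALL
        SELECT m.message_id, m.sent_at
        FROM message m JOIN t ON m.in_reply_to = t.message_id
    )
    SELECT message_id FROM t ORDER BY sent_at
    """
    return [r[0] for r in conn.execute(sql, (root_id,))]


thread_ids = thread(conn, "a@example")
by_alice = [r[0] for r in conn.execute(
    "SELECT message_id FROM message WHERE sender = ? ORDER BY sent_at",
    ("alice@example.org",),
)]
print(thread_ids)
print(by_alice)
```

With this kind of layout the thread is computed at query time from the
In-Reply-To column, so there is no separate threads database to rebuild, and
a user who wants different thread logic only has to write a different query.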

> That was the hardware part. You must hire a database expert, a Web
> designer and a cgi script programmer. All people should be willing to
> work for at least 2-3 years on this project. This is not an easy task.

Using your logic, we should close the FreeBSD project, as maintaining an
operating system like this takes 200-300 kernel experts.  The database
expert is available and willing to do it for free.  If not, there are other
database experts among FreeBSD users.  A CGI interface already exists for
the database interface.  The HTML interface can be written by people like
those who did the excellent job on the FreeBSD web pages.

In other words, if the FreeBSD project cannot find the people to do this,
then no one can.  BTW, your time estimate is good if you plan to be paid
hourly for it.  I have built much, much more complex RDBMS-based information
systems in a fraction of that time.  An email parser is no more than a week
of work.  The text search is about the same.
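
As a rough illustration of why the parser is the small part of the job, here
is a minimal sketch that pulls the normalized fields out of one message using
Python's standard email package.  The parse_message() helper, the dictionary
keys and the shortened sample body are assumptions made for this example;
a real parser would also have to deal with MIME parts, encoded headers and
malformed dates.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Sample message; the headers mirror this email, the body is a placeholder.
RAW = """\
From: Simon Shapiro <shimon@simon-shapiro.org>
To: Wolfram Schneider <wosch@cs.tu-berlin.de>
Date: Mon, 30 Mar 1998 11:52:11 -0800
Subject: Re: [PORTS] Pgaccess doesn't run on -current anymore, Update
Message-ID: <XFMail.980330115211.shimon@simon-shapiro.org>
In-Reply-To: <19980330123130.39177@caramba.cs.tu-berlin.de>

Body text goes here.
"""


def parse_message(raw):
    """Split one raw message into the fields a normalized table would store."""
    msg = message_from_string(raw)
    # parseaddr returns (display name, address); only the address is kept.
    _, sender = parseaddr(msg["From"])
    return {
        "message_id": msg["Message-ID"].strip("<>"),
        # In-Reply-To is missing on the first message of a thread.
        "in_reply_to": (msg["In-Reply-To"] or "").strip("<>") or None,
        "sender": sender.lower(),
        "sent_at": parsedate_to_datetime(msg["Date"]).isoformat(),
        "subject": msg["Subject"],
        "body": msg.get_payload(),
    }


row = parse_message(RAW)
for key, value in row.items():
    print(key, "=", repr(value))
```

Each returned dictionary maps directly onto one row of a message table, which
is what makes the later searches run on indexed columns instead of raw text.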

> A full update of the thread database took 6 min on hub (Pentium Pro),
> thats 100MB/min ;-) An update for the last week took 3-6 seconds.

Something is too good to be true here.  How can you read Unix filesystems
at 100 Megabytes per second?

Also, if the current engine is so great, how come all these people are
excited about replacing it?  I have no opinion, as my usage is too scarce
and too superficial to voice one.  My position is that IF there is
a desire to build an RDBMS-based engine, I will be happy to contribute my
modest knowledge of the subject and some of my time.


----------


Sincerely Yours, 

Simon Shapiro
Shimon@Simon-Shapiro.ORG                      Voice:   503.799.2313

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-database" in the body of the message


