From owner-freebsd-www Fri Aug 29 08:47:34 1997
Return-Path:
Received: (from root@localhost)
    by hub.freebsd.org (8.8.7/8.8.7) id IAA10697
    for www-outgoing; Fri, 29 Aug 1997 08:47:34 -0700 (PDT)
Received: from schubert.promo.de (schubert.Promo.DE [194.45.188.65])
    by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id IAA10648
    for ; Fri, 29 Aug 1997 08:46:27 -0700 (PDT)
Received: from [194.45.188.81] (stefan.Promo.DE [194.45.188.81])
    by schubert.promo.de (8.8.5/8.8.5) with ESMTP id RAA13924;
    Fri, 29 Aug 1997 17:41:35 +0200 (MET DST)
X-Sender: stefan@mail.promo.de
Message-Id:
In-Reply-To:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Fri, 29 Aug 1997 17:43:22 +0200
To: John Fieber
From: Stefan Bethke
Subject: Re: Something I've always wanted to see with the mailing list search
Cc: "Jordan K. Hubbard" , www@FreeBSD.ORG
Sender: owner-freebsd-www@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

At 15:52 +0200 on 29.08.1997, John Fieber wrote:
>For a good discussion of the issues, see:
>
> David D. Lewis and Kimberly A Knowles (1997) Threading
> electronic mail: a preliminary study. Information Processing &
> Management, 33(2):209-217.
>

Is this a book? If so, do you have an ISBN or something? Or do you know
of any online resource?

>It turns out that breaking messages down into quoted and
>unquoted chunks, indexing them separately, and using vector space
>similarity measures (what freeWAIS uses) for retrieval is more
>accurate in retrieving what a human would consider to be a
>message thread than following subject lines, in-reply-to or
>references fields. In the absence of those fields, it is really
>the only way to discover a thread.

You might be right. But given the amount of spare time I can put into
this project, I'll stick to In-Reply-To:/References: and Subject:.
Hopefully, the code will be modular enough that this can be changed
later.

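For illustration only, here is a rough sketch of the approach John describes: split a message into quoted and unquoted chunks and compare chunks with a vector space (cosine) similarity. It assumes that quoted lines start with ">" and uses raw term counts without any weighting. The function names split_quoted, term_vector and cosine are made up for this example; this is neither the method from the Lewis and Knowles paper nor what freeWAIS actually implements.

```python
import math
import re
from collections import Counter


def split_quoted(body):
    """Split a message body into (unquoted_text, quoted_text).

    Assumption: a line is quoted if it starts with '>' after
    leading whitespace.
    """
    quoted, unquoted = [], []
    for line in body.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">"):
            quoted.append(stripped.lstrip(">").strip())
        else:
            unquoted.append(stripped.strip())
    return " ".join(unquoted), " ".join(quoted)


def term_vector(text):
    """Build a bag-of-words vector (term -> raw count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The idea would be that the quoted chunk of a reply scores high against the unquoted text of the message it answers, even when the In-Reply-To: and References: fields are missing.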
>As for constructing threads at index time, this may be best for
>efficiency but extra care must be given to how threads are
>represented. For example, an "in-reply-to" linked tree may
>contain several distinct, but related threads. It should be
>possible to get at the sub-threads individually, as well as the
>larger thread. This means that any message may have multiple
>thread membership, either directly or indirectly via some thread
>record with pointers to parent/child threads. Ultimately, I
>would hope for thread discovery at search time rather than
>indexing time because it offers much more flexibility in tweaking
>various dimensions of the thread concept--broadening or narrowing
>the boundaries, building threads that cross boundaries between
>in-reply-to message trees, etc....

For the first step, it won't be a linked tree but a list: given a
message, the In-Reply-To id is used to look up the thread id of the
referenced message, so all follow-ups are assigned the same thread id.
Yes, this leaves room for improvement :-)

How do you determine the border between two threads that are linked to
the same ancestors?

My general feeling is that I would rather look through 100 messages to
find the one I want than let an "intelligent" system present only one
to me, and that one not being the one I'm looking for.

Cheers,
Stefan

--
Stefan Bethke
Promo Datentechnik    | Tel. +49-40-851744-0
+ Systemberatung GmbH | Fax. +49-40-851744-44
Eduardstrasse 46-48   | e-mail: stefan@Promo.DE
D-20257 Hamburg       | http://www.Promo.DE/
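
The flat thread-id scheme described above could look roughly like the following sketch. It assumes messages are indexed in arrival order, that each message carries at most one In-Reply-To id, and that a message whose parent is unknown starts a new thread named after its own message id. The function name assign_thread_ids and the choice of thread id are assumptions made for this example, not the actual indexer code.

```python
def assign_thread_ids(messages):
    """Assign a flat thread id to every message.

    messages: iterable of (message_id, in_reply_to) pairs in arrival
    order; in_reply_to may be None.
    Returns a dict mapping message_id -> thread_id.
    """
    thread_of = {}
    for message_id, in_reply_to in messages:
        if in_reply_to in thread_of:
            # Follow-up: inherit the thread id of the referenced message.
            thread_of[message_id] = thread_of[in_reply_to]
        else:
            # No known parent (assumption): start a new thread and use
            # the message's own id as the thread id.
            thread_of[message_id] = message_id
    return thread_of
```

Because every follow-up simply copies its parent's thread id, all messages in one In-Reply-To tree end up in a single list, and sub-threads are not distinguished. A reply that is indexed before its parent starts a separate thread in this sketch.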