Date: Fri, 29 Aug 1997 17:43:22 +0200
From: Stefan Bethke <stefan@promo.de>
To: John Fieber <jfieber@indiana.edu>
Cc: "Jordan K. Hubbard" <jkh@time.cdrom.com>, www@FreeBSD.ORG
Subject: Re: Something I've always wanted to see with the mailing list search
Message-ID: <l03102802b02c8925d358@[194.45.188.81]>
In-Reply-To: <Pine.BSF.3.96.970829075012.341E-100000@fallout.campusview.indiana.edu>
References: <l03102801b02c54a7fe9c@[194.45.188.81]>

At 15:52 +0200 on 29.08.1997, John Fieber wrote:
>For a good discussion of the issues, see:
>
>  David D. Lewis and Kimberly A. Knowles (1997) Threading
>  electronic mail: a preliminary study. Information Processing &
>  Management, 33(2):209-217.
>

Is this a book? If so, do you have an ISBN or something? Or do you know of
any online resource?

>It turns out that breaking messages down into quoted and
>unquoted chunks, indexing them separately, and using vector space
>similarity measures (what freeWAIS uses) for retrieval is more
>accurate in retrieving what a human would consider to be a
>message thread than following subject lines, in-reply-to or
>references fields. In the absence of those fields, it is really
>the only way to discover a thread.

You might be right. But given the amount of spare time I can put into this
project, I'll stick to In-Reply-To:/References: and Subject:. Hopefully, the
code will be modular enough that this can be changed later.

>As for constructing threads at index time, this may be best for
>efficiency but extra care must be given to how threads are
>represented. For example, an "in-reply-to" linked tree may
>contain several distinct, but related threads. It should be
>possible to get at the sub-threads individually, as well as the
>larger thread. This means that any message may have multiple
>thread membership, either directly or indirectly via some thread
>record with pointers to parent/child threads. Ultimately, I
>would hope for thread discovery at search time rather than
>indexing time because it offers much more flexibility in tweaking
>various dimensions of the thread concept--broadening or narrowing
>the boundaries, building threads that cross boundaries between
>in-reply-to message trees, etc....

For the first step, it won't be a linked tree but a list: given a message,
the In-Reply-To id is used to look up the thread id of the message
referenced, thus assigning all follow-ups the same thread id.
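
As a rough illustration of the list-based scheme just described, here is a
minimal Python sketch. It is not the actual indexing code: the names
assign_thread_ids and thread_of and the (message_id, in_reply_to) tuple layout
are hypothetical, and it assumes messages are processed in arrival order, so
that a parent is normally seen before its replies.

```python
from itertools import count


def assign_thread_ids(messages):
    """Map each message id to a thread id.

    `messages` is an iterable of (message_id, in_reply_to) pairs in
    arrival order; `in_reply_to` is None for messages without that header.
    """
    thread_of = {}          # message id -> thread id
    next_thread_id = count(1)

    for message_id, in_reply_to in messages:
        # Look up the thread id of the referenced message, if we know it.
        thread_id = thread_of.get(in_reply_to)
        if thread_id is None:
            # No (known) parent: this message starts a new thread.
            thread_id = next(next_thread_id)
        thread_of[message_id] = thread_id

    return thread_of


if __name__ == "__main__":
    sample = [
        ("a@example", None),
        ("b@example", "a@example"),
        ("c@example", "b@example"),
        ("d@example", None),
    ]
    for mid, tid in assign_thread_ids(sample).items():
        print(mid, tid)
```

Every follow-up inherits its parent's thread id, so a whole reply tree
collapses into one flat list. A reply whose parent is not in the index, or
arrives later, starts a thread of its own in this sketch.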
Yes, this leaves room for improvement :-)

How do you determine the border between two threads that are linked to the
same ancestors?

My general feeling is that I would rather look through 100 messages to find
the one I want than let an "intelligent" system present only one to me, and
that one not being the one I'm looking for.

Cheers,
Stefan

--
Stefan Bethke
Promo Datentechnik    | Tel. +49-40-851744-0
+ Systemberatung GmbH | Fax. +49-40-851744-44
Eduardstrasse 46-48   | e-mail: stefan@Promo.DE
D-20257 Hamburg       | http://www.Promo.DE/