Date: Fri, 29 Aug 1997 11:44:34 -0500 (EST)
From: John Fieber <jfieber@indiana.edu>
To: Stefan Bethke <stefan@promo.de>
Cc: "Jordan K. Hubbard" <jkh@time.cdrom.com>, www@FreeBSD.ORG
Subject: Re: Something I've always wanted to see with the mailing list search
Message-ID: <Pine.BSF.3.96.970829112828.341J-100000@fallout.campusview.indiana.edu>
In-Reply-To: <l03102802b02c8925d358@[194.45.188.81]>
On Fri, 29 Aug 1997, Stefan Bethke wrote:

> At 15:52 Uhr +0200 29.08.1997, John Fieber wrote:
> >For a good discussion of the issues, see:
> >
> >   David D. Lewis and Kimberly A Knowles (1997) Threading
> >   electronic mail: a preliminary study. Information Processing &
> >   Management, 33(2):209-217.
>
> Is this a book? If so, do you have an ISBN or something? Or do you know any
> online resource?

It is an article in a journal (Information Processing & Management, published in Great Britain by Elsevier).

> You might be right. But given the amount of spare time I can put into this
> project, I'll stick to In-Reply-To:/References: and Subject:. Hopefully,
> the code will be modular enough that this can be changed later.

Dates will be very helpful, but the Date: field cannot be relied upon. Rather, use the date in the Received: line that records the message's arrival at freefall or hub. That should provide much more consistency.

> How do you determine the border between two threads that are linked to the
> same ancestors? My general feeling is that I rather look through 100
> messages to find the one I want than let an "intelligent" system present
> only one to me, and that being not the one I'm looking for.

Without machinery for doing document similarity measures, about the only thing that comes to mind is relying on the human convention of using (was: ...) in subject lines. From a topological view of a tree alone, I can't think of any way to determine whether two long branches are related or not. You could assume that any branch longer than X constitutes a thread of its own, which just might be on the same topic as another branch. In fact, that is probably a better way to do it.

But if you can compute similarities between the documents on one branch and the documents on the other, and they are more different than the documents within a branch are from each other, we can guess that the common parent represented a branching point in the discussion.
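A minimal sketch of that branch comparison, assuming a plain bag-of-words cosine similarity stands in for the "document similarity" machinery (no particular measure is named here, so this choice is only an illustration). The function names are hypothetical, and each message is taken to be a plain body string:

```python
import math
import re
from collections import Counter
from itertools import combinations, product


def tokenize(text):
    """Lowercase word counts; a stand-in for real text preprocessing."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_similarity(pairs):
    """Average cosine similarity over an iterable of (vector, vector) pairs."""
    scores = [cosine(a, b) for a, b in pairs]
    return sum(scores) / len(scores) if scores else 0.0


def looks_like_branch_point(branch_a, branch_b):
    """Guess whether the common parent split the discussion.

    Returns True when messages are, on average, less similar across the
    two branches than they are within each branch.
    """
    va = [tokenize(m) for m in branch_a]
    vb = [tokenize(m) for m in branch_b]
    within = mean_similarity(list(combinations(va, 2)) + list(combinations(vb, 2)))
    across = mean_similarity(product(va, vb))
    return across < within
```

Raw word counts are crude; a real system would probably want stop-word removal or term weighting, and a margin rather than a strict comparison.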
Which gives me an interesting idea... find one interesting thread and use the whole thread as a query to see if the topic was discussed at another time.

-john
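One way the thread-as-query idea might be tried, assuming a simple bag-of-words cosine measure (an assumption; no measure is specified in the message). `thread_vector`, `rank_threads`, and the layout of `archive` as a dict from thread id to message bodies are all hypothetical:

```python
import math
import re
from collections import Counter


def thread_vector(messages):
    """Merge all messages of a thread into one word-count vector."""
    counts = Counter()
    for body in messages:
        counts.update(re.findall(r"[a-z0-9']+", body.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_threads(query_thread, archive):
    """Rank archived threads by similarity to the whole query thread.

    `archive` maps a thread id to its list of message bodies.
    Returns (thread_id, score) pairs, most similar first.
    """
    query = thread_vector(query_thread)
    scored = [(tid, cosine(query, thread_vector(msgs)))
              for tid, msgs in archive.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Treating the whole thread as a single long query gives the ranking far more vocabulary to work with than a typical short search string would.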