Date:      Fri, 29 Aug 1997 11:44:34 -0500 (EST)
From:      John Fieber <jfieber@indiana.edu>
To:        Stefan Bethke <stefan@promo.de>
Cc:        "Jordan K. Hubbard" <jkh@time.cdrom.com>, www@FreeBSD.ORG
Subject:   Re: Something I've always wanted to see with the mailing list search
Message-ID:  <Pine.BSF.3.96.970829112828.341J-100000@fallout.campusview.indiana.edu>
In-Reply-To: <l03102802b02c8925d358@[194.45.188.81]>

On Fri, 29 Aug 1997, Stefan Bethke wrote:

> At 15:52 Uhr +0200 29.08.1997, John Fieber wrote:
> >For a good discussion of the issues, see:
> >
> >  David D. Lewis and Kimberly A Knowles (1997) Threading
> >  electronic mail: a preliminary study.  Information Processing &
> >  Management, 33(2):209-217.
> 
> Is this a book? If so, do you have an ISBN or something? Or do you know any
> online resource?

An article in a journal (Information Processing & Management,
published in Great Britain by Elsevier).

> You might be right. But given the amount of spare time I can put into this
> project, I'll stick to In-Reply-To:/References: and Subject:. Hopefully,
> the code will be modular enough that this can be changed later.

Dates will be very helpful, but the Date: field cannot be
relied upon.  Rather, use the date in the Received: line that
records the message's arrival at freefall or hub.  That should
provide much more consistency.
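
A minimal sketch of that idea in Python, using the standard email
package.  The fully qualified host names and the function name
arrival_date are assumptions for illustration, not anything taken
from the archive software; adjust the host list to whatever the
Received: lines actually say.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hosts whose Received: stamp we trust (assumed names, adjust as needed).
TRUSTED_HOSTS = ("freefall.freebsd.org", "hub.freebsd.org")


def arrival_date(raw_message, trusted_hosts=TRUSTED_HOSTS):
    """Return the timestamp of the first Received: header written by a
    trusted host, or None if there is no such header."""
    msg = message_from_string(raw_message)
    for received in msg.get_all("Received", []):
        # The timestamp follows the last ';' in a Received: header.
        info, sep, stamp = received.rpartition(";")
        if not sep:
            continue
        # The "by <host>" clause names the machine that added this line.
        match = re.search(r"\bby\s+(\S+)", info, re.IGNORECASE)
        if match is None or match.group(1).lower() not in trusted_hosts:
            continue
        try:
            return parsedate_to_datetime(stamp.strip())
        except (TypeError, ValueError):
            # Unparseable stamp: keep looking at the remaining headers.
            continue
    return None
```

Sorting on this value instead of Date: sidesteps senders with
badly set clocks or time zones, at the cost of ignoring messages
that carry no Received: line from a trusted host.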

> How do you determine the border between two threads that are linked to the
> same ancestors? My general feeling is that I rather look through 100
> messages to find the one I want than let an "intelligent" system present
> only one to me, and that being not the one I'm looking for.

Without machinery for doing document similarity measures, about
the only thing that comes to mind is relying on the human
convention of using (was: ...) in subject lines.  From a
topological view of a tree alone, I can't think of any way to
determine whether two long branches are related or not.  You could
assume that any branch longer than X constitutes a thread of its
own, which just might be on the same topic as another branch.  In
fact, that is probably a better way to do it.
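
Both heuristics are easy to sketch in Python.  Everything named
here (split_subject, long_branches, the children mapping from a
message id to the ids of its direct replies) is hypothetical, and
the regular expressions only cover the common "Re:" and
"(was: ...)" spellings.

```python
import re

# Leading reply markers such as "Re:" or "Re[2]:", possibly repeated.
_REPLY_PREFIX = re.compile(r"^(?:\s*re(?:\[\d+\])?\s*:\s*)+", re.IGNORECASE)
# Trailing "(was: old subject)" or "[was: old subject]" marker.
_WAS_MARKER = re.compile(
    r"\s*[\(\[]\s*was\s*:?\s*(?P<old>.*?)\s*[\)\]]\s*$", re.IGNORECASE
)


def split_subject(subject):
    """Return (current_topic, previous_topic_or_None) for a Subject: line."""
    text = _REPLY_PREFIX.sub("", subject).strip()
    match = _WAS_MARKER.search(text)
    if match is None:
        return text, None
    old = _REPLY_PREFIX.sub("", match.group("old")).strip()
    return text[:match.start()].strip(), old


def branch_depth(children, node):
    """Length of the longest reply chain starting at node (node counts as 1)."""
    replies = children.get(node, [])
    return 1 + max((branch_depth(children, r) for r in replies), default=0)


def long_branches(children, parent, min_length):
    """Direct replies to parent whose longest chain has at least
    min_length messages; each is a candidate thread of its own."""
    return [
        reply
        for reply in children.get(parent, [])
        if branch_depth(children, reply) >= min_length
    ]
```

For example, split_subject("Re: Thread splitting (was: Re: Mailing
list search)") yields ("Thread splitting", "Mailing list search"),
and min_length plays the role of X above.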

But if you can compute similarities between documents on one
branch and documents on the other, and they are more different
than the documents within a branch are from each other, we can
guess that the common parent represented a branching point in the
discussion.
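
To make the comparison concrete, here is a rough Python sketch
using plain word counts and cosine similarity.  That particular
measure is my assumption for illustration (a real system would
want term weighting, stop words, and stripping of quoted text),
and the function names are made up.

```python
import math
import re
from collections import Counter
from itertools import combinations, product


def term_vector(text):
    """Bag-of-words vector: lowercase word -> number of occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def looks_like_branch_point(branch_a, branch_b):
    """True if messages across the two branches are, on average, less
    alike than messages within the same branch."""
    vecs_a = [term_vector(text) for text in branch_a]
    vecs_b = [term_vector(text) for text in branch_b]
    within = mean(
        cosine(x, y)
        for vecs in (vecs_a, vecs_b)
        for x, y in combinations(vecs, 2)
    )
    across = mean(cosine(x, y) for x, y in product(vecs_a, vecs_b))
    return across < within
```

Each branch is passed as a list of message bodies.  A branch with
a single message contributes no within-branch pairs, so the
answer is only meaningful when both branches have some length.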

Which gives me an interesting idea...find one interesting thread
and use the whole thread as a query to see if the topic was
discussed at another time.


-john



