Date:      Tue, 6 Jul 1999 16:57:08 -0500
From:      Chris Costello <chris@calldei.com>
To:        Nik Clayton <nclayton@lehman.com>
Cc:        Bill Fumerola <billf@chc-chimes.com>, doc@FreeBSD.ORG, hackers@FreeBSD.ORG
Subject:   Re: Searching the Handbook (was Re: 'rtfm script')
Message-ID:  <19990706165708.N4158@holly.dyndns.org>
In-Reply-To: <19990706115526.Z15628@lehman.com>; from Nik Clayton on Tue, Jul 06, 1999 at 11:55:26AM +0100
References:  <Pine.HPP.3.96.990705100523.26110A-100000@hp9000.chc-chimes.com> <19990705141635.D97224@holly.dyndns.org> <19990706115526.Z15628@lehman.com>

On Tue, Jul 6, 1999, Nik Clayton wrote:
> I've added doc@freebsd.org to the distribution list, for obvious reasons.
> 
> On Mon, Jul 05, 1999 at 02:16:36PM -0500, Chris Costello wrote:
> >    Note that I can't figure out a decent way to search the
> > Handbook at this point, but I'm open to ideas.
> 
> There are a couple of ways you could do it, some of them better than
> others.
> 
>    Executive summary:  sgrep is probably your best choice now, which can
>    be found at <URL:http://www.cs.helsinki.fi/~jjaakkol/sgrep.html>. 
>    But read on for more.
> 
> The simplest way is to assume that the user has the plain text handbook
> installed, and do a simple grep through that for what you're looking for.

   See the FAQ parser.  I want to be able to get meaningful
output for users.  sgrep is also not viable because it's not in
the default system.
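
   To make "meaningful output" from a plain-text search concrete,
here is a minimal sketch in Python.  It is only an illustration, not
how rtfm(1) is implemented: the function name grep_text and the
context parameter are hypothetical, and it assumes the plain-text
Handbook has already been read into a list of lines.

```python
def grep_text(lines, term, context=1):
    """Return (line_number, snippet) pairs for lines containing term.

    lines   -- the document as a list of strings, one per line
    term    -- substring to look for (matched case-insensitively)
    context -- how many neighbouring lines to include on each side
    """
    needle = term.lower()
    hits = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            # Clamp the context window to the bounds of the document.
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, "\n".join(lines[start:end])))
    return hits
```

   Returning the line number together with the surrounding lines
gives the user something to read rather than a bare match, and it
needs nothing beyond a stock interpreter.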

> This is nice and easy to do, but doesn't take advantage of the additional
> smarts built in to the Handbook's native format.  To do that requires some
> additional work.

   The Handbook's native format won't be on the default system,
will it?  The installed copies are all in HTML.

> A smart searching mechanism will be able to use this additional semantic
> information to reject (or lower the rankings of) results that don't match
> what the user wanted.

   See above.  I want rtfm(1) to remain viable on base installs.

> For example, suppose you're searching the Handbook for examples of the 
> make(1) command in action.  The simple string "make" occurs lots of times
> in the Handbook.  However, you're only interested in those sections where
> it occurs *inside* a <userinput> element; all the other occurrences can be
> ignored.
> 
> For a simple rtfm(1) style search most of this can probably be ignored, and
> you can just search the plain text handbook.  But even then you might want
> to provide switches that allow the user to specify:
> 
>   -  Only match this word if found in an example
> 
>   -  Only match this word if found in a title
> 
>   -  Only match this word if found in a command name

   This can only be done if the user can access the SGML files,
which are not installed by default.  Accessing them online would
mean searching either every individual file or one big combined
file, which isn't a good idea at this point.
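
   For reference, the element-restricted matching described in the
quoted text (only count "make" when it appears inside <userinput>, a
title, or a command name) can be sketched roughly as follows, given
access to the SGML source.  This is an assumption-laden illustration
in Python: find_in_element is a hypothetical helper, it uses a
regular expression rather than a real SGML parser, and it does not
handle an element nested inside itself or SGML tag minimization.

```python
import re


def find_in_element(sgml, element, word):
    """Return the contents of each <element>...</element> containing word.

    sgml    -- markup text to search
    element -- element name to restrict the match to, e.g. "userinput"
    word    -- whole word that must appear inside the element
    """
    # Non-greedy match of one element's content, across line breaks.
    element_re = re.compile(
        r"<{0}\b[^>]*>(.*?)</{0}\s*>".format(re.escape(element)),
        re.IGNORECASE | re.DOTALL,
    )
    # Match the word only on word boundaries, so "make" skips "remake".
    word_re = re.compile(r"\b{}\b".format(re.escape(word)))
    return [
        m.group(1).strip()
        for m in element_re.finditer(sgml)
        if word_re.search(m.group(1))
    ]
```

   Called with "userinput" and "make", it returns only the text of
<userinput> elements that mention make and ignores every other
occurrence of the word in the surrounding prose.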


> You could go the full SGML route.  This would involve building an 
> application that can parse the DocBook source of the Handbook (and other
> articles, and soon to be the FAQ) and allow the user to do their queries
> using this application.  This is probably the most 'correct' route from
> a purist point of view, but is an awful lot of work.

   If the FAQ is to be DocBook-ified, will the SGML sources be
made available via HTTP so rtfm(1) can still cleanly parse them
with a minor rewrite of the FAQ section?

> *Much* simpler is to build a grep-alike that understands structured 
> documents, but that doesn't care how those documents are structured.  This
> is such a great idea that someone's already done it -- sgrep, which can
> be found at <URL:http://www.cs.helsinki.fi/~jjaakkol/sgrep.html>, can 
> search structured text (such as DocBook, HTML, or even mail files).

   I'd have to integrate it into rtfm(1), and see above about
access to the handbook.

-- 
Chris Costello                                <chris@calldei.com>
Computers are only human.

