Date:      Tue, 11 Oct 2005 22:21:35 +0200
From:      Wolfram Schneider <wosch@FreeBSD.org>
To:        Tim Wilde <twilde@dyndns.com>
Cc:        www@FreeBSD.org
Subject:   Re: Using Yahoo! or Google search bar instead of search.cgi
Message-ID:  <434C1ECF.4090608@FreeBSD.org>
In-Reply-To: <Pine.BSF.4.63.0510101337140.4465@manganese.bos.dyndns.org>
References:  <Pine.BSF.4.63.0510101337140.4465@manganese.bos.dyndns.org>

Tim Wilde wrote:

> (Apologies for breaking threading, just joined freebsd-www so I don't 
> have the appropriate messages for a References: header.)
>
> As I mentioned in my earlier post, I think an even bigger problem than 
> the one Murray mentioned can be observed by the fact that a search for 
> "kernel" returns no results at all.


My guess at what happens here: "kernel" is a very common word (believe it or
not). Google has 18,900 hits for the word "kernel" on www.freebsd.org.
Common words (e.g. "a", "the", "an", "www", "is") are usually
ignored by search engines to save space or to speed up searches.
These are known as "stop words." Even Google has stop words.

From memory, search.cgi has a dynamic stop word list:
words that hit the limit of 20,000 will be ignored.
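
To illustrate the idea, here is a minimal Python sketch of a dynamic stop
word list. It is not the actual search.cgi code. The function names, the
in-memory index, and the choice to count hits per document are all
assumptions made for the example; only the limit of 20,000 comes from the
recollection above.

```python
HIT_LIMIT = 20_000  # limit quoted from memory above; everything else is hypothetical


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def is_stop_word(index, word, hit_limit=HIT_LIMIT):
    """A word becomes a stop word once its hit count reaches the limit."""
    return len(index.get(word, ())) >= hit_limit


def search(index, query, hit_limit=HIT_LIMIT):
    """Return ids of documents containing every non-stop word of the query."""
    terms = [w for w in query.lower().split()
             if not is_stop_word(index, w, hit_limit)]
    if not terms:
        # Every query term was dropped as a stop word, so nothing is returned.
        return set()
    results = set(index.get(terms[0], ()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

With a scheme like this, a query made up only of a very frequent word such
as "kernel" comes back empty once that word crosses the limit, while a
query that pairs it with a rarer word still returns matches.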

-Wolfram


> At DynDNS, we recently started indexing our site using ht://Dig 
> (http://www.htdig.org/), and have been very happy with the flexibility 
> it provides for tuning search results to get the most relevant 
> matches.  It is also a true spider, crawling the website over HTTP 
> rather than searching on the filesystem as the current search.cgi 
> seems to do.

-- 

Wolfram Schneider <wosch@FreeBSD.org> http://wolfram.schneider.org



