Date:      Fri, 29 Jul 2005 11:22:32 -0400
From:      Chuck Swiger <cswiger@mac.com>
To:        Michael Sharp <ms@probsd.org>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Need a good Unix script that..
Message-ID:  <42EA49B8.4070804@mac.com>
In-Reply-To: <1784.192.168.1.1.1122647757.squirrel@probsd.org>
References:  <1784.192.168.1.1.1122647757.squirrel@probsd.org>

Michael Sharp wrote:
> I need a simple sh script that will daily (via cron) crawl a website
> looking for multiple keywords, then reporting those keyword results and
> URL to an email address.
> 
> Anyone know of a pre-written script that does this, or point me in the
> right direction in using the FreeBSD core commands that can accomplish
> this?

If you feed the webserver's access log into an analysis program like analog, it 
will report on the keywords people searched for when following links into the 
site.  (This is not quite what you asked for, but I mention it because the 
suggestion might be closer to what you actually want to see... :-)
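For instance, a nightly cron(8) job could run analog over the access log.  This 
is an untested sketch; the paths and the log location are assumptions that will 
vary with your setup, and analog's real behavior is driven by its config file:

    # Hypothetical crontab(5) entry: run analog nightly over Apache's
    # access log.  Paths are placeholders; OUTFILE and the report
    # options would normally be set in analog's configuration file.
    0 4 * * * /usr/local/bin/analog /var/log/httpd-access.log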

Anyway, if you do not own the site and so do not have access to its logfiles, 
you ought to honor things like /robots.txt and the site's policies with regard 
to copyright and datamining, but you could easily use lynx, curl, or anything 
similar which supports a recursive/web-spider download capability, and then 
grep for keywords, build histograms, or whatever else on the content you 
download.
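As a rough, untested sketch using only the base system (single page rather than 
a full crawl; the URL, keyword list, and recipient address are placeholders you 
would substitute):

    #!/bin/sh
    # Untested sketch: fetch one page with fetch(1), grep it for
    # keywords, and mail any matching lines.  A real crawl would loop
    # over a list of URLs, or run a recursive spider first and then
    # grep the downloaded tree the same way.

    URL="http://www.example.org/"
    KEYWORDS="foo|bar|baz"          # grep -E alternation of keywords
    RCPT="you@example.org"

    PAGE=`mktemp -t crawl` || exit 1
    HITS=`mktemp -t crawl` || exit 1

    fetch -q -o "$PAGE" "$URL" || exit 1
    grep -Ei "$KEYWORDS" "$PAGE" > "$HITS"

    if [ -s "$HITS" ]; then        # -s: file exists and is non-empty
        mail -s "keyword report for $URL" "$RCPT" < "$HITS"
    fi

    rm -f "$PAGE" "$HITS"

Drop it into a crontab(5) entry such as "0 6 * * * /home/you/bin/crawlcheck.sh" 
(path is a placeholder) and it runs daily, mailing you only when something 
matched.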

-- 
-Chuck