Date: Fri, 29 Jul 2005 11:22:32 -0400
From: Chuck Swiger <cswiger@mac.com>
To: Michael Sharp <ms@probsd.org>
Cc: freebsd-questions@freebsd.org
Subject: Re: Need a good Unix script that..
Message-ID: <42EA49B8.4070804@mac.com>
In-Reply-To: <1784.192.168.1.1.1122647757.squirrel@probsd.org>
References: <1784.192.168.1.1.1122647757.squirrel@probsd.org>
Michael Sharp wrote:
> I need a simple sh script that will daily (via cron) crawl a website
> looking for multiple keywords, then reporting those keyword results and
> URL to an email address.
>
> Anyone know of a pre-written script that does this, or point me in the
> right direction in using the FreeBSD core commands that can accomplish
> this?

If you feed the webserver's access log into a program like analog, it will report on the keywords people used when searching their way into the site. (This is not quite what you asked for, but I mention it because the suggestion might be closer to what you want to see... :-)

Anyway, if you do not own the site and have access to the logfiles, you ought to honor things like /robots.txt and the site's policies with regard to copyright and datamining, but you could easily use lynx, curl, or anything similar which supports a recursive/web-spider download capability, and then grep for keywords, build histograms, or whatever on the content you DL.

-- 
-Chuck
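[A minimal sketch of the spider-then-grep approach described above, using wget (which honors /robots.txt when mirroring) rather than lynx or curl, since wget's -r flag does recursive downloads. The URL, keyword pattern, and mail address are placeholders, not anything from the thread -- adjust them for your own site, and run it from cron.]

```shell
#!/bin/sh
# Sketch: mirror a site, grep the downloaded pages for keywords,
# and mail any matches.  All three settings below are placeholders.

SITE="http://www.example.com/"        # site to crawl (placeholder)
KEYWORDS="freebsd|ports|kernel"       # keywords, as an egrep pattern
MAILTO="you@example.com"              # where to send the report

WORKDIR=$(mktemp -d /tmp/crawl.XXXXXX) || exit 1

# Recursively fetch the site, two levels deep, quietly, into WORKDIR.
wget -q -r -l 2 -P "$WORKDIR" "$SITE"

# Search every downloaded file: -E for egrep patterns, -i to ignore
# case, -r to recurse; output includes file name and matching line.
REPORT=$(grep -Eir "$KEYWORDS" "$WORKDIR")

# Mail the matches, if any, then clean up.
if [ -n "$REPORT" ]; then
    echo "$REPORT" | mail -s "keyword report for $SITE" "$MAILTO"
fi
rm -rf "$WORKDIR"
```

[A crontab line such as `0 4 * * * /usr/local/bin/crawlreport.sh` would then run it daily at 04:00.]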