From: Donald Burr
To: FreeBSD Ports, FreeBSD Questions
Subject: Squid: Proxying for fun and profit
Date: Fri, 13 Mar 1998 07:20:37 -0800 (PST)
Organization: Computer Help

-----BEGIN PGP SIGNED MESSAGE-----

Well, I just set up my first Web proxy server, using squid (in ports).
Wow. Now I have all of my various machines in here going through the
web proxy on my FreeBSD box instead of going out and trying to fetch
the page themselves. Certainly speeds things up since a lot of the
users access the same pages over and over again (the FreeBSD site is
quite popular :) ).

Now I'd like to try and do something more fancy, and perhaps someone
out there can help me out.

I would like to automatically keep the most current pages of various
web sites online. For example, a lot of people access
http://www.freebsd.org (FreeBSD) and http://www.linux.org (Linux).
I'd like my proxy server to automatically go through these entire
sites, and fetch all pages, and keep checking periodically to see if
any pages here get updated.
(Sort of like what a "web spider" type of indexing bot like AltaVista
does)

The catch, though, is that I don't want this automatic fetching to
cross site boundaries. For example, let's say I'm indexing
http://www.freebsd.org, and I get along to a page mentioning a new
device driver doohickey by Acme Computer (http://www.acme.com/). I
would like it to skip over www.acme.com -- i.e., only index
www.freebsd.org pages. Obviously, this is so that my index thing
doesn't run wild and try to download the entire Web to my computer,
which I don't want! [I do have a lot of disk space, but not THAT
much! -- like Steven Wright said, "You can't have everything -- where
would you put it?"]

Is there anything available (either in ports, or a Perl script that
someone hacked up, etc.) that will do this?

Your help would be greatly appreciated. Thanks!

- ---
Donald Burr - Ask me for my PGP key                     | PGP: Your
WWW HomePage: http://DonaldBurr.base.org/ ICQ #1347455  | right to
Address: P.O. Box 91212, Santa Barbara, CA 93190-1212   | 'Net privacy.
Phone: (805) 957-9666 FAX: (800) 492-5954               | USE IT.

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBNQlOxfjpixuAwagxAQFgKAP+IEWcp/9ZwN9+zN16KoopnLXH9SHxXbqQ
HmVu/gcnCsirYGPYuNbbRbuDIhc+3SC76f0EsJjQPJ9ImYcWcEz1O3FkmUAmWchK
6HUbC6f5D6rgcgZEBsvGjTsndJQk3dVTUCnkuCx+P+QzvgybxbNkbOZSveUSsox+
Nx0OE5G+eBY=
=ZmwL
-----END PGP SIGNATURE-----
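
The same-site restriction asked about above could be sketched roughly
as follows. This is only an illustration under stated assumptions, not
an existing port or a feature of squid: the names LinkCollector,
same_site_links, crawl, and make_proxy_fetch are made up for this
example, and the proxy address http://localhost:3128 merely assumes
squid is listening on its default port on the same machine. The idea is
to walk one site breadth-first, follow only links whose host matches
the starting page's host, and request every page through the proxy so
that the proxy gets a chance to cache it.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
import urllib.request


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute links in html that stay on page_url's host."""
    host = urlsplit(page_url).hostname
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.hostname == host:
            links.append(absolute)
    return links


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first walk of one site; fetch(url) must return HTML text."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to load
        visited.append(url)
        for link in same_site_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def make_proxy_fetch(proxy="http://localhost:3128"):
    """Build a fetch function that sends requests through the proxy.

    The default proxy address is an assumption (squid's default port).
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )

    def fetch(url):
        with opener.open(url, timeout=30) as response:
            return response.read().decode("latin-1")

    return fetch
```

Something like crawl("http://www.freebsd.org/", make_proxy_fetch()),
run periodically (from cron, say), would re-request each page of that
one site through the proxy and never follow a link to www.acme.com.
Whether the proxy actually refreshes its cached copies on each pass
depends on how its refresh rules are configured. The max_pages cap is
there so that a site with endless generated links cannot fill the disk.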