Date: Thu, 14 Aug 2003 10:36:52 -0500
From: "Jack L. Stone" <jackstone@sage-one.net>
To: Jez Hancock <jez.hancock@munk.nu>, freebsd-questions@freebsd.org
Subject: Re: Script help needed please
Message-ID: <3.0.5.32.20030814103652.012fa800@sage-one.net>
In-Reply-To: <20030814144446.GC69860@users.munk.nu>
References: <3.0.5.32.20030814084949.012f40e8@sage-one.net>
At 03:44 PM 8.14.2003 +0100, Jez Hancock wrote:
>On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
>> Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
>> The above is typical of the servers in use, with csh shells employed,
>> plus IPFW.
>>
>> My apologies for the length of this question, but the background seems
>> necessary, as brief as I can make it, so the question makes sense.
>>
>> The problem:
>> We have several servers that provide online reading of technical
>> articles, and each has several hundred MB to a GB of content.
>>
>> When we started providing the articles 6-7 years ago, folks used
>> browsers to read the articles. Now the trend has become a lazier
>> approach, and there is increasing use of those download utilities
>> which can be left unattended to download entire web sites, taking
>> several hours to do so. Multiply this by a number of similar
>> downloads and there goes the bandwidth, denying the other normal
>> online readers the speed needed for loading and browsing in the
>> manner intended. Several hundred will be reading at a time, and
>> several thousand daily.
><snip>
>There is no easy solution to this, but one avenue might be to look at
>bandwidth throttling in an Apache module.
>
>One that I've used before is mod_throttle, which is in the ports:
>
>/usr/ports/www/mod_throttle
>
>It allows you to throttle users by IP address to a certain number of
>documents and/or up to a certain transfer limit. IIRC it's fairly
>limited, though, in that you can only apply per-IP limits to _every_
>virtual host - i.e. in the global httpd.conf context.
>
>A more fine-grained solution (from what I've read; I haven't tried it)
>is mod_bwshare - this one isn't in the ports but can be found here:
>
>http://www.topology.org/src/bwshare/
>
>This module overcomes some of the shortfalls of mod_throttle and allows
>you to specify finer granularity over who consumes how much bandwidth
>over what time period.
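[Editor's note: the modules above keep per-client transfer accounting inside Apache. As a minimal, language-neutral sketch of that idea - not the actual mod_throttle or mod_bwshare implementation - the following Python class tracks bytes served per client IP over a rolling window and refuses service once a limit is exceeded. The class name and limits are illustrative assumptions.]

```python
import time
from collections import defaultdict

class IPThrottle:
    """Per-IP transfer accounting, the idea behind mod_throttle-style
    Apache modules: track bytes served to each client IP inside a
    rolling time window and deny service once a budget is exceeded."""

    def __init__(self, limit_bytes, window_seconds):
        self.limit = limit_bytes
        self.window = window_seconds
        # ip -> list of (timestamp, bytes) accounting entries
        self.usage = defaultdict(list)

    def allow(self, ip, nbytes, now=None):
        now = time.time() if now is None else now
        # Drop accounting entries older than the window.
        self.usage[ip] = [(t, b) for t, b in self.usage[ip]
                          if now - t < self.window]
        used = sum(b for _, b in self.usage[ip])
        if used + nbytes > self.limit:
            return False  # over budget; the server would send 503 here
        self.usage[ip].append((now, nbytes))
        return True
```

Each IP gets its own budget, so one unattended downloader exhausts only its own allowance while ordinary readers are unaffected - which is exactly the property the poster is after.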
>
>> Now, my question: is it possible to write a script that can
>> constantly scan the Apache logs to look for certain footprints of
>> those downloaders - perhaps the names, like "HTTRACK", being one I
>> see a lot? Whenever I see one of those sessions, I have been able to
>> abort them by adding a rule to the firewall to deny the IP address
>> access to the server. This aborts the downloading, but I have seen
>> the attempts continue constantly for a day or two, confirming
>> unattended downloads.
>>
>> Thus, if the script could spot an "offender" and then perhaps make
>> use of the firewall to add a rule containing the offender's IP
>> address and then flush to reset the firewall, this would at least
>> abort the download and free up the bandwidth (I already have a
>> script that restarts the firewall).
>>
>> Is this possible, and how would I go about it....???
>If you really wanted to go down this route, then I found a script
>someone wrote a while back to find 'rude robots' from an httpd
>logfile, which you could perhaps adapt to do dynamic filtering in
>conjunction with your firewall:
>
>http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html
>
>If you have any success let me know.
>
>--
>Jez

Interesting. Looks like a step in the right direction. Will weigh this
one among the possibilities.

Many thanks...!

Best regards,

Jack L. Stone,
Administrator
SageOne Net
http://www.sage-one.net
jackstone@sage-one.net
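[Editor's note: the scan-the-log-and-block approach discussed above can be sketched briefly. The following Python sketch matches Apache combined-format log lines against downloader user-agent footprints and emits ipfw deny rules for the offending IPs; it is illustrative only, not the linked 'rude robots' script. "HTTRACK" is from the thread; the other agent names, the rule numbering, and the helper names are assumptions.]

```python
import re

# Apache combined log format: client IP is the first field, the quoted
# user-agent string is the last.
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

# Footprints of unattended site-downloaders. HTTRACK is from the
# thread; the others are illustrative additions.
BAD_AGENTS = ("httrack", "wget", "webzip")

def offending_ips(loglines, blocked=None):
    """Return client IPs whose user-agent matches a downloader
    footprint, skipping addresses already blocked."""
    blocked = set() if blocked is None else set(blocked)
    hits = []
    for line in loglines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, agent = m.group(1), m.group(2).lower()
        if ip not in blocked and any(a in agent for a in BAD_AGENTS):
            blocked.add(ip)
            hits.append(ip)
    return hits

def ipfw_rules(ips, base=500):
    # One deny rule per offender; the rule numbers are an arbitrary
    # choice and would need to fit the existing ruleset.
    return ["ipfw add %d deny ip from %s to any" % (base + i, ip)
            for i, ip in enumerate(ips)]
```

Run periodically from cron against the tail of the access log, the emitted commands could be handed to the existing firewall-restart script; keeping a file of already-blocked IPs (the `blocked` argument) avoids adding duplicate rules on each pass.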