Date:      Fri, 17 Apr 1998 08:48:32 -0600
From:      "Jan L. Peterson" <jlp@Part.NET>
To:        spork <spork@super-g.com>
Cc:        isp@FreeBSD.ORG
Subject:   Re: log to st0? 
Message-ID:  <199804171448.IAA23909@loa.part.net>
In-Reply-To: Your message of "Fri, 17 Apr 1998 01:50:06 EDT." <Pine.BSF.3.96.980417014813.399B-100000@super-g.inch.com> 
References:  <Pine.BSF.3.96.980417014813.399B-100000@super-g.inch.com>

> We're running into problems with archiving hits from some of the larger
> sites we host.  We have been toying with the idea of a "log server" to
> collect and analyze the logs.  Any suggestions?

Well, I mentioned some things about how we did log processing at iMALL. 
Here's a little more detail:  

We were running the Stronghold server, which is based on Apache.  It has
the facility to log to a pipe instead of a file, which we used to feed a
process called "batcher".  We also used the LogFormat feature to tag log
lines for rooted vdomains with the account name (so we only had to
maintain one log stream instead of one for each vdomain).  batcher would
produce batch files containing five minutes' worth of logs each, which
would be copied over to the log machine with ssh.  All logs were written
without DNS resolution (resolving at log time made the servers too
slow), and logging via a
pipe meant that we never had to reload the servers just to change log
files.  Also, you could have multiple servers (we had four) all feeding
the same log processing system.
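
Here's roughly what a batcher-style piped logger could look like in
Perl.  This is just a sketch for illustration, not the actual batcher;
the batch directory, the five-minute constant, and the CustomLog hookup
in the comments are made-up examples.

    #!/usr/bin/perl -w
    # batcher.pl -- illustrative sketch of a piped log batcher.
    # Hooked into Apache/Stronghold with something along the lines of:
    #   CustomLog "|/usr/local/sbin/batcher.pl /var/log/batches" tagged
    # (the "tagged" LogFormat nickname is hypothetical).  Reads log lines
    # on STDIN and rolls over to a new batch file every five minutes.
    use strict;

    my $batchdir = shift || '/var/log/batches';   # made-up path
    my $interval = 300;                           # five minutes per batch
    my $current  = -1;                            # start of the open batch

    while (my $line = <STDIN>) {
        my $slot = int(time() / $interval) * $interval;
        if ($slot != $current) {
            close(BATCH) if $current >= 0;
            $current = $slot;
            open(BATCH, ">> $batchdir/batch.$current")
                or die "can't open $batchdir/batch.$current: $!";
            select((select(BATCH), $| = 1)[0]);   # autoflush the batch
        }
        print BATCH $line;
    }
    close(BATCH) if $current >= 0;

A finished batch file would then get pushed to the log machine, e.g.
from a cron job that scp's anything older than the current slot and
removes it afterwards.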

On the log machine, a process called the "cooker" would take the raw
five-minute batches and process out the DNS information, leaving
hostnames for IPs that could be resolved (local caching of both
resolvable and unresolvable IP addresses was also maintained).  It would
also re-write the request for any log lines that came from a rooted
vdomain so that they looked like they were served by the normal web
servers.  (All vdomains were actually sub-directories of the main
server, e.g. http://www.circuscircus.com/ could also be referenced as
http://www.imall.com/stores/ccmain/inc/, so all log lines from
http://www.circuscircus.com/whatever were re-written by the cooker to
look like they hit http://www.imall.com/stores/ccmain/inc/whatever.)
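
To give a feel for the cooking step, here is a cut-down sketch in Perl.
The input format (an account tag, then the client IP, then the rest of
the log line), the vdomain map, and the in-memory cache are assumptions
for the example; the real cooker was more involved and kept its DNS
cache locally between runs.

    #!/usr/bin/perl -w
    # cooker.pl -- illustrative sketch, not the original cooker.
    use strict;
    use Socket;

    # hypothetical map: vdomain tag -> subdirectory on the main server
    my %vdomain = (
        'www.circuscircus.com' => '/stores/ccmain/inc',
    );
    my %dnscache;   # IP -> hostname, or the IP itself if it won't resolve

    while (<>) {
        # assume "tag ip rest-of-common-log-format-line"
        my ($tag, $ip, $rest) = /^(\S+) (\S+) (.*)$/ or next;

        # cache both successful and failed lookups
        unless (exists $dnscache{$ip}) {
            my $packed = inet_aton($ip);
            my $name   = $packed ? gethostbyaddr($packed, AF_INET) : undef;
            $dnscache{$ip} = defined $name ? $name : $ip;
        }

        # rewrite "GET /whatever" from a rooted vdomain onto the main tree
        if (my $prefix = $vdomain{$tag}) {
            $rest =~ s{"(GET|POST|HEAD) /}{"$1 $prefix/};
        }
        print "$dnscache{$ip} $rest\n";
    }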

After the cooker finished with the batches, they were handed off to a 
third process called the "splitter".  splitter would take the
five-minute batches and join them together into a master log file for each
day.  When a day's log file had not been modified for 24 hours, 
splitter would compress it with gzip.  splitter also had the 
functionality to split out a particular store's logs from the master 
log and save them in an independent log file (one per store per day), 
but that facility was turned off: our custom log processing software
did not require separate log files for each customer, and it provided
enough information to the customer that they didn't need the raw logs
themselves.
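
A bare-bones version of the splitter's join-and-compress logic might
look like this (paths and the batch naming scheme are invented for the
example; the per-store split that we turned off isn't shown):

    #!/usr/bin/perl -w
    # splitter.pl -- simplified sketch of the splitter stage.
    use strict;
    use POSIX qw(strftime);

    my $cooked  = '/var/log/cooked';    # cooked five-minute batches
    my $masters = '/var/log/masters';   # one master log per day

    # fold each batch into the master log for the day it belongs to
    foreach my $batch (glob("$cooked/batch.*")) {
        my ($stamp) = $batch =~ /batch\.(\d+)$/ or next;
        my $day = strftime("%Y%m%d", localtime($stamp));
        open(MASTER, ">> $masters/access.$day") or die "open master: $!";
        open(BATCH,  "< $batch")                or die "open batch: $!";
        print MASTER $_ while <BATCH>;
        close(BATCH);
        close(MASTER);
        unlink($batch);
    }

    # compress master logs that haven't been touched for 24 hours
    foreach my $log (glob("$masters/access.*")) {
        next if $log =~ /\.gz$/;
        next unless time() - (stat($log))[9] > 24 * 60 * 60;   # mtime
        system("gzip", $log) == 0 or warn "gzip $log failed\n";
    }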

These master logs were processed nightly by a locally developed package 
called DAP, which would produce summary files and detailed log 
breakdowns for each customer.  These summaries were placed in a 
"backroom" for each store, where the customer could pick them up.  
(They included not only web server log information, but also 
information about how many times the store had shown up in a search, 
and how many times that resulted in a visit to the store.  Also logged 
were the number and dollar amounts of all sales through that store, and 
summary information such as dollars per visit, etc.)  DAP also mailed a 
summary to the store owner every two weeks.
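
DAP itself was a good-sized package, so the following is only a toy
illustration of the kind of per-store rollup it produced; the input
format ("store hits visits dollars" records) is made up:

    #!/usr/bin/perl -w
    # dap_summary.pl -- toy per-store summary, not the real DAP.
    use strict;

    my %store;
    while (<>) {
        my ($name, $hits, $visits, $dollars) = split;
        $store{$name}{hits}    += $hits;
        $store{$name}{visits}  += $visits;
        $store{$name}{dollars} += $dollars;
    }

    foreach my $name (sort keys %store) {
        my $s = $store{$name};
        my $per_visit = $s->{visits} ? $s->{dollars} / $s->{visits} : 0;
        printf "%-20s %8d hits %6d visits \$%10.2f total \$%6.2f/visit\n",
            $name, $s->{hits}, $s->{visits}, $s->{dollars}, $per_visit;
    }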

The finished master logs were left in a directory, and a cron job would 
come along and move any that were more than 45 days old out to a 
separate archive directory.  Another cron job would watch this archive 
directory, and when it was getting close to having about 650MB in it, 
would mail our operations queue a request to burn it off to CD.  This
process was performed manually by a staff
member (usually about once every four to six weeks).  This way, we had 
a permanent record of all web server accesses, broken down by day (in 
case some auditor needed to see or sample them).  Archiving them off to 
tape would have been similar, but we would have lost the random-access 
quality of the CDs (say you wanted the logs from 8 September 1997: with
tape you'd probably spend a lot of time reading through it to find the
right file, while with a CD you just pop it into a drive, mount it up,
and copy off the log file you want).  Oh yeah, we did run nightly
backups of the log processing machine, so worst case we could lose one 
day's worth of logs.  It wouldn't have been too difficult to have a 
staging area on another machine that would have removed this risk, but 
we determined that it wasn't worth the cost.  Another option would have 
been to have two independent log processing machines, and copy the 
batches down to both of them, but again, the cost outweighed the risk 
in our opinion.
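
The two cron jobs were simple; rolled into one script, the idea was
something like the sketch below (the paths, the mail address, and the
"close to 650MB" test are only examples):

    #!/usr/bin/perl -w
    # archive_logs.pl -- illustrative sketch of the archive cron jobs.
    use strict;
    use File::Copy qw(move);

    my $masters = '/var/log/masters';
    my $archive = '/var/log/archive';
    my $limit   = 650 * 1024 * 1024;          # roughly one CD-R worth
    my $cutoff  = time() - 45 * 24 * 60 * 60; # 45 days ago

    # move master logs older than 45 days into the archive directory
    foreach my $log (glob("$masters/access.*.gz")) {
        next unless (stat($log))[9] < $cutoff;        # mtime check
        move($log, $archive) or warn "move $log: $!\n";
    }

    # when the archive directory is getting close to a CD's worth,
    # ask a human to burn it off (ops address is hypothetical)
    my $bytes = 0;
    $bytes += (stat($_))[7] foreach glob("$archive/*");
    if ($bytes > 0.9 * $limit) {
        open(MAIL, "| /usr/sbin/sendmail -t") or die "sendmail: $!";
        print MAIL "To: ops\@example.com\n",
                   "Subject: log archive ready to burn\n\n",
                   "$archive holds $bytes bytes; please burn to CD.\n";
        close(MAIL);
    }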

Our hardware investment to handle this log processing was a Pentium
133-based system running FreeBSD (128MB of RAM), with a BusLogic
fast/wide SCSI controller driving a 4GB disk.  All of the log processing
programs described above were written in Perl.  The CDs were burned on a
second
Pentium 133 running (gasp) Linux (we were never able to get cdrecord to
work under FreeBSD with our weird-o CD-R drive, but Linux drove it just
fine).

If you're interested in setting up a similar system, that should be 
enough detail to get you going.  If not, I'm available for consulting 
at $100/hour plus expenses.  :-)

	-jan-
-- 
Jan L. Peterson         PartNET                    tel. +1 801 581 1118
Senior Systems Admin    423 Wakara Way, Suite 216  fax  +1 801 581 1785
jlp@part.net            Salt Lake City, UT 84108   http://www.part.net/


