Date:      Sun, 7 Apr 1996 21:55:47 -0700 (PDT)
From:      "Bryan K. Ogawa" <bkogawa@netvoyage.net>
To:        Dave Andersen <angio@shell.aros.net>
Cc:        Jaye Mathisen <mrcpu@cdsnet.net>, freebsd-questions@FreeBSD.ORG
Subject:   IP-host lookups in NCSA common log files (was: Re: Apache still and timeouts)
Message-ID:  <Pine.NEB.3.92.960407213404.9801A-100000@digital.netvoyage.net>
In-Reply-To: <199603301043.DAA20878@shell.aros.net>

On Sat, 30 Mar 1996, Dave Andersen wrote:

> Lo and behold, Jaye Mathisen once said:
>
> > The local named would cache some of them I would think as well, it may be
> > better to let named worry about it...
> >
> > I'd be interested in the script if you finish it.
>
>    The only downside to that is that you'll suffer a pretty hefty
> performance penalty.  Yes, the odds are .. somewhat good that named will
> cache the successful hits, but you're still stuck using the networking
> interface to do lookups (read: slow as hell) instead of reading them from
> local memory (infinitely faster. :) and you lose the benefit of being
> able to 'flag' unlookupable addresses quickly and efficiently so you
> don't do multiple unsuccessful queries - the real bogdown.
>
>    Just make sure you've got enough memory in the beast.  Even using a
> bunch of swap would be faster than a reverse namelookup on the IP.
[...]

My tests with my original script confirmed the above behavior (i.e. the
caching version was much faster).  I was also surprised at how little
memory the caching script took.

That said, I rewrote the scripts.  I have included two versions below:
somewhat obtuse, and extremely obtuse. :)  The first version is pretty
straightforward, but it includes a subroutine, designed to be short and
magic, that does all of the caching IP-to-host conversion.

Here it is:

#!/usr/bin/perl

# does IP to hostname conversion of NCSA common log format access_log
# files

# Usage: ip2host <filename> <filename> ...
# Or, it will take input from standard in.  In either case, output is to
# standard out.

# Bryan K. Ogawa <bkogawa@netvoyage.net>

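use Socket;    # provides AF_INET (sys/socket.ph only exists if h2ph was run)

while(<>) {
    # the first space-separated field of each log line is the client address
    ($ip, $rest) = split(/ /,$_,2);
    print &gethost($ip), " $rest";
}

# return the cached name for $_[0], doing the lookup on first sight only;
# a failed lookup caches the original string, so it is never retried
sub gethost {
    $CACHE{$_[0]} || ($CACHE{$_[0]}
      = (gethostbyaddr(pack("C4",split(/\./,$_[0])), AF_INET))[0] || $_[0]);
}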

__END__

For you non-perl people out there, you can leave out the __END__ if the
program (from the #! to the last } ) is by itself in a file.

Short explanation: The subroutine gethost uses an associative array as a
cache, filling it with the found hostname, or the original value if no
hostname is found (so, if the IP lookup fails, or the item was already a
name, the returned value should be the original value).
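
The caching behavior described above can be tried without touching the
network.  The following is only a minimal sketch: fake_resolve is a
hypothetical stand-in for gethostbyaddr, and cached_host, %cache and
$lookups are illustrative names that do not appear in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %cache;          # address string -> hostname (or the address itself on failure)
my $lookups = 0;    # counts how often the resolver is actually called

# Hypothetical stand-in for gethostbyaddr: knows one address, fails on the rest.
sub fake_resolve {
    my ($ip) = @_;
    $lookups++;
    return $ip eq '192.0.2.1' ? 'www.example.com' : undef;
}

# Same shape as gethost: return the cached value, or resolve once and
# remember either the name or, on failure, the original string.
sub cached_host {
    my ($ip) = @_;
    return $cache{$ip} ||= fake_resolve($ip) || $ip;
}

my @out = map { cached_host($_) } qw(192.0.2.1 192.0.2.9 192.0.2.1 192.0.2.9);
print "@out\n";                # www.example.com 192.0.2.9 www.example.com 192.0.2.9
print "lookups: $lookups\n";   # lookups: 2
```

Four requests cost only two resolver calls, because the failed lookup is
cached under its own address just like a successful one.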

If the host has a name like 205.162.host.net, and there is a name
associated with 205.162.0.0, it might produce incorrect values (the
non-numeric parts would be packed as 0); I didn't test that case.
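
The untested case can at least be examined offline.  This sketch shows
which address pack("C4", ...) would build from such a name, plus one
possible guard.  Both packed_quad and looks_like_ip are hypothetical
helpers, not part of either script, and the guard is only an assumption
about how one might skip the lookup for fields that are already names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Show which address pack("C4", ...) would actually build from a string.
sub packed_quad {
    my ($s) = @_;
    no warnings 'numeric';    # non-numeric parts silently count as 0
    return join('.', unpack('C4', pack('C4', split(/\./, $s))));
}

# Hypothetical guard: true only for four dot-separated numbers in 0..255.
sub looks_like_ip {
    my ($s) = @_;
    return 0 unless $s =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
    return !grep { $_ > 255 } $1, $2, $3, $4;
}

print packed_quad('205.162.host.net'), "\n";                       # 205.162.0.0
print looks_like_ip('205.162.host.net') ? "lookup\n" : "skip\n";   # skip
print looks_like_ip('205.162.0.0')      ? "lookup\n" : "skip\n";   # lookup
```

With a check like this in front of the lookup, a name in the first
field would be passed through unchanged instead of being resolved as
205.162.0.0.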

I decided I wanted to see how small I could make it, so I came up with
this:

#!/usr/bin/perl -ap
$_=$F[0];$_=join(" ",$a{$_}||($a{$_}=
(gethostbyaddr(pack(C4,split(/\./)),2))[0]||$_),@F[1..$#F])."\n";
__END__

Again, the __END__ isn't necessary.  The 2nd and 3rd lines can be
concatenated together--I split it for sending via mail.  In addition, it
can be invoked from the command line as:

perl -ape '$_=$F[0] ...<insert rest of 2nd/3rd lines here>...' <filenames>

Again, it can also use STDIN instead of a list of files, and in both
cases, it outputs to standard out.

This second version has another known caveat--it presumes AF_INET equals
2.
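
One way around that presumption, sketched here under the assumption
that the standard Socket module is available (it ships with Perl 5), is
to take AF_INET from the module.  The variable names are illustrative,
and 127.0.0.1 is just a sample address.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(AF_INET inet_aton);

# AF_INET comes from the platform's headers via Socket, not a literal 2.
print "AF_INET is ", AF_INET, " here\n";

# inet_aton builds the same 4-byte string as pack("C4", split(/\./, $ip)).
my $ip = '127.0.0.1';
my $packed = inet_aton($ip);
print "same bytes\n" if $packed eq pack('C4', split(/\./, $ip));

# Reverse lookup with the symbolic constant; fall back to the address.
my $name = gethostbyaddr($packed, AF_INET) || $ip;
print "$name\n";
```

For the one-liner, loading the module on the command line (perl -MSocket
-ape '...') should let AF_INET be written in place of the 2.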

I hope you find this useful.

bryan

--
Bryan K. Ogawa  II Infinitum  <><  On this account I speak for myself.
<bkogawa@netvoyage.net>       SDG  http://www.netvoyage.net/~bkogawa/



