Date:      Fri, 18 Sep 2009 06:03:38 -0400
From:      James Tanis <james@tanis.us>
To:        freebsd-questions@freebsd.org
Subject:   dhcpd related issues
Message-ID:  <7640F3CC-586D-4087-A78B-DF43F515A8E4@tanis.us>

I have a FreeBSD 7.0 gateway/server with isc-dhcpd 3.1.2p1_2. Late  
yesterday I began having some unique and intermittent issues.  
Basically, random computers will all of a sudden lose their dhcp  
leases and be unable to contact the dhcp server.

At first I figured the dhcp server had crashed, but it had not; it was
still up and running. Next I figured we had run out of leases; this
has happened before -- the school is growing rapidly, not to mention
the kids keep getting more connected. Unfortunately, after doubling
the number of available leases the problem still persists.

Now the issue gets more confused by the fact that some computers  
haven't been affected at all. There seems to be no real difference  
between their configurations and the configurations of the computers  
affected. For a while I was considering the possibility of the switch  
dropping packets or developing bad ports, but the behavior isn't  
consistent with that. One would think that if the port connecting a  
secondary switch to the main switch was going bad that it would affect  
all clients on the secondary switch -- this is not the case. There  
doesn't seem to be much rhyme or reason to which computers are affected.

The server isn't reporting any dropped packets on either of its  
interfaces and the links aren't even close to saturated. I'm  
completely at a loss as to the cause of the problem. The problem
occurs in a time period that is pretty consistent with the default
lease time -- which would suggest there is something odd happening
with lease renewal, but I can't seem to get a grasp on it.
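
For reference, the renewal schedule implied by that lease time can be worked out. A minimal sketch in Python, assuming clients use the RFC 2131 default timers (renew at 50% of the lease, rebind at 87.5%) and are handed the 600-second default-lease-time from the dhcpd.conf quoted in this message; the function name is only illustrative:

```python
def renewal_timers(lease_seconds):
    """Return (T1, T2) in seconds using the RFC 2131 default fractions."""
    t1 = lease_seconds * 0.5    # client starts unicast renewal (RENEWING state)
    t2 = lease_seconds * 0.875  # client falls back to broadcast (REBINDING state)
    return t1, t2


# 600 is the default-lease-time from the posted config (assumed to apply).
t1, t2 = renewal_timers(600)
print(t1, t2)  # 300.0 525.0
```

Under those assumptions a client would first try to renew after about 5 minutes and start broadcasting after about 8 minutes 45 seconds, so a client whose renewals go unanswered would be expected to drop off roughly 10 minutes after its last successful exchange.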

If I do a "cat debug.log|grep dhcpd" I get:

Sep 17 08:36:07 grendel dhcpd: ICMP Echo Reply for 192.168.1.243 late or spurious.
Sep 17 12:58:04 grendel dhcpd: ICMP Echo Reply for 192.168.1.57 late or spurious.
Sep 17 12:58:04 grendel dhcpd: ICMP Echo Reply for 192.168.1.57 late or spurious.
Sep 17 13:56:27 grendel dhcpd: ICMP Echo Reply for 192.168.1.155 late or spurious.
Sep 17 14:03:15 grendel dhcpd: ICMP Echo reply while lease 192.168.1.253 valid.
Sep 17 15:25:19 grendel dhcpd: ICMP Echo Reply for 192.168.1.74 late or spurious.

which doesn't seem particularly relevant or heinous. Many more  
computers than the ones above have been affected.
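
To compare the addresses in these messages against the list of affected machines, the lines can be tallied per address. A rough sketch in Python, assuming each message sits on a single line in the real log (the wrapping above comes from the mail client); the regular expression and function name are illustrative, and the sample lines are the ones quoted above:

```python
import re
from collections import Counter

# Matches both message forms seen above and captures the IPv4 address.
LINE_RE = re.compile(
    r"dhcpd: ICMP Echo [Rr]eply (?:for|while lease) (\d+\.\d+\.\d+\.\d+)"
)

SAMPLE = [
    "Sep 17 08:36:07 grendel dhcpd: ICMP Echo Reply for 192.168.1.243 late or spurious.",
    "Sep 17 12:58:04 grendel dhcpd: ICMP Echo Reply for 192.168.1.57 late or spurious.",
    "Sep 17 12:58:04 grendel dhcpd: ICMP Echo Reply for 192.168.1.57 late or spurious.",
    "Sep 17 13:56:27 grendel dhcpd: ICMP Echo Reply for 192.168.1.155 late or spurious.",
    "Sep 17 14:03:15 grendel dhcpd: ICMP Echo reply while lease 192.168.1.253 valid.",
    "Sep 17 15:25:19 grendel dhcpd: ICMP Echo Reply for 192.168.1.74 late or spurious.",
]


def tally_icmp_replies(lines):
    """Count dhcpd ICMP Echo reply messages per IP address."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


for address, count in tally_icmp_replies(SAMPLE).most_common():
    print(address, count)
```

To run it against the real file, pass an open file object instead of SAMPLE, for example `tally_icmp_replies(open("debug.log"))`.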

Doing the same for console.log got me a whole bunch of:

Sep 17 16:45:18 grendel dhcpd: if mdchs203-2.mdchs.org IN A rrset doesn't exist add mdchs203-2.mdchs.org 300 IN A 192.168.1.162: timed out.
Sep 17 16:45:26 grendel dhcpd: if mdchs100-1.mdchs.org IN A rrset doesn't exist add mdchs100-1.mdchs.org 300 IN A 192.168.1.126: timed out.

which is pretty much the norm and shouldn't be causing the problem.

The main switch is an HP ProCurve 1700-24 and it doesn't seem to be
reporting any problems. All ports that should be up are up. There is 1
"Rx Error Packet" being reported on Port 23. Port 23 is the one that
goes out to the server, but a single packet couldn't be causing this
kind of behavior.

Does anyone have *any* ideas? I'm about tapped out myself here. I'll
attack the problem fresh if it persists tomorrow, but I'd like to come
at it with some ideas from different perspectives.

Here is the dhcpd.conf file, recently changed to add more leases:

ddns-update-style ad-hoc;
option domain-name "mdchs.org";
option domain-name-servers 192.168.1.1;
option netbios-name-servers 192.168.1.1;
option netbios-node-type 8;

shared-network mdchs {
        default-lease-time 600;
        max-lease-time 7200;
        option subnet-mask 255.255.0.0;
        option broadcast-address 192.168.255.255;
        option routers 192.168.1.1;

        subnet 192.168.1.0 netmask 255.255.255.0 {
                range 192.168.1.46 192.168.1.253;

                host mdchs12 {
                        hardware ethernet xx:xx:xx:xx:xx:xx;
                        fixed-address 192.168.1.6;
                }
                #### snipped the rest of the host entries for brevity ####
        }

        subnet 192.168.2.0 netmask 255.255.255.0 {
                range 192.168.2.1 192.168.2.254;
        }
}
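
As a sanity check on the number of leases this config makes available, the two range statements can be counted. A small Python sketch, assuming only the two ranges shown above matter; it ignores the snipped host entries, any of which might carry a fixed-address inside a range, and the helper name is illustrative:

```python
import ipaddress


def pool_size(first, last):
    """Number of addresses in an inclusive range, as in a dhcpd 'range' statement."""
    return int(ipaddress.IPv4Address(last)) - int(ipaddress.IPv4Address(first)) + 1


# The two ranges from the config above.
POOLS = [
    ("192.168.1.46", "192.168.1.253"),
    ("192.168.2.1", "192.168.2.254"),
]

sizes = [pool_size(first, last) for first, last in POOLS]
print(sizes, sum(sizes))  # [208, 254] 462
```

That gives 208 dynamic addresses in the original subnet and 254 in the added one, 462 in total.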

It seems worth noting that this server was functioning perfectly well
for a year and a half before this occurred. Nothing was changed before
the problem manifested. After the problem manifested I upgraded to the
above-mentioned version and added the shared-network with the second
subnet. So far the nature of the problem has not changed whatsoever.

--
James Tanis
Technical Coordinator
Computer Science Department
Monsignor Donovan Catholic High School





