Date:      Thu, 13 Dec 2001 13:01:58 -0600 (CST)
From:      Ryan Thompson <ryan@sasknow.com>
To:        Anthony Atkielski <anthony@atkielski.com>
Cc:        FreeBSD Questions <freebsd-questions@FreeBSD.ORG>
Subject:   Re: Uptime not so good after all -- why does my net connection go dead?
Message-ID:  <20011213122631.L94416-100000@catalyst.sasknow.net>
In-Reply-To: <002201c183fd$6d028210$0a00000a@atkielski.com>

Anthony Atkielski wrote to FreeBSD Questions:

> I thought my FreeBSD system was going to stay up forever, based on
> what I had heard,

Yes, and it should, barring hardware problems, pilot error, an
extended power outage, or managerial downtime.

> but I had to boot it today.  For the umpteenth time, the OS
> abruptly and silently decided to stop communicating with my
> router.  It had no trouble talking to the other PC on my LAN, but
> it absolutely would not talk to the router.  As far as I could
> tell, it would not respond to traffic from the router, nor would
> it send traffic to the router.

To give you a more detailed response, we'll need to see what's
actually going on inside FreeBSD. For the most part, you're reporting
application-level symptoms, and ICMP echo requests (ping) don't tell
us much more in this case. If the problem is on your LAN, you need to
go down to the link layer...

From the router AND the NT machine, try ARP lookups for the FreeBSD
machine's public IP address. Do you get the same MAC address as the
one shown in the output of ifconfig(8) on FreeBSD? If not, then perhaps
your router has claimed the IP, or the IP was assigned to another
machine, etc., and you need to pinpoint that. This sort of thing can
happen behind your back.
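When comparing those answers, formatting can mislead you: FreeBSD's
ifconfig(8) and `arp -an` may print octets without leading zeros (e.g.
`0:a0:c9:12:34:56`), while NT's `arp -a` prints dash-separated pairs
(`00-a0-c9-12-34-56`). Here is a minimal Python sketch that normalizes
both forms before comparing them; the helper names and the sample
addresses are hypothetical, so paste in the values you actually see.

```python
def normalize_mac(mac: str) -> str:
    """Return a MAC address as six zero-padded, lowercase, colon-separated octets.

    Accepts the unpadded colon form ("0:a0:c9:12:34:56") and the
    dash-separated form ("00-A0-C9-12-34-56").
    """
    octets = mac.strip().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    values = [int(octet, 16) for octet in octets]  # raises ValueError on non-hex text
    if any(v > 0xFF for v in values):
        raise ValueError(f"octet out of range in {mac!r}")
    return ":".join(f"{v:02x}" for v in values)


def same_mac(a: str, b: str) -> bool:
    """True if two MAC address strings name the same hardware address."""
    return normalize_mac(a) == normalize_mac(b)


if __name__ == "__main__":
    # Hypothetical sample values; substitute what your machines report.
    freebsd_ifconfig = "0:a0:c9:12:34:56"   # FreeBSD: ifconfig xl0
    nt_arp = "00-A0-C9-12-34-56"            # NT: arp -a
    router_arp = "00:10:a4:ff:00:01"        # router's ARP table
    print(same_mac(freebsd_ifconfig, nt_arp))
    print(same_mac(freebsd_ifconfig, router_arp))
```

If the router's entry for your address does not match what ifconfig(8)
reports, that is a strong hint that something else is answering for
the IP.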

On the FreeBSD box, put your NIC in promiscuous mode and start
analyzing frames. What actually gets sent out on the wire? Is the
machine seeing the IP packets, but not actually passing them up to the
transport layer? Or maybe it just isn't sending anything out?
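On FreeBSD, tcpdump(1) puts the interface into promiscuous mode unless
you pass -p, and something like `tcpdump -i xl0 -e -n arp` prints the
link-level headers of ARP traffic, which is where an IP "claimed" by the
wrong box shows up. To make clear what to read out of such a frame, here
is a minimal Python sketch that decodes an Ethernet II frame carrying an
IPv4 ARP packet (the RFC 826 layout). The function names, addresses, and
sample frame are hypothetical; this is an illustration, not a capture
tool.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800


def mac_str(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_arp_frame(frame: bytes) -> dict:
    """Decode an Ethernet II frame that carries an IPv4-over-Ethernet ARP packet.

    Trailing bytes (e.g. Ethernet minimum-size padding) are ignored.
    """
    if len(frame) < 14 + 28:
        raise ValueError("frame too short for Ethernet + ARP")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_ARP:
        raise ValueError(f"not ARP (ethertype 0x{ethertype:04x})")
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42])
    if (htype, ptype, hlen, plen) != (1, ETHERTYPE_IPV4, 6, 4):
        raise ValueError("not an Ethernet/IPv4 ARP packet")
    return {
        "eth_src": mac_str(src),
        "eth_dst": mac_str(dst),
        "op": {1: "request", 2: "reply"}.get(oper, str(oper)),
        "sender_mac": mac_str(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": mac_str(tha),
        "target_ip": socket.inet_ntoa(tpa),
    }


def build_arp_reply(sender_mac: str, sender_ip: str,
                    target_mac: str, target_ip: str) -> bytes:
    """Build a sample ARP reply frame (only for exercising the decoder)."""
    smac = bytes.fromhex(sender_mac.replace(":", ""))
    tmac = bytes.fromhex(target_mac.replace(":", ""))
    eth = struct.pack("!6s6sH", tmac, smac, ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH6s4s6s4s", 1, ETHERTYPE_IPV4, 6, 4, 2,
                      smac, socket.inet_aton(sender_ip),
                      tmac, socket.inet_aton(target_ip))
    return eth + arp


if __name__ == "__main__":
    # Hypothetical: the router (00:10:a4:ff:00:01) answers for the FreeBSD
    # box's address 192.0.2.10 when the NT machine (192.0.2.2) asks.
    frame = build_arp_reply("00:10:a4:ff:00:01", "192.0.2.10",
                            "00:60:08:aa:bb:cc", "192.0.2.2")
    info = decode_arp_frame(frame)
    print(info["op"], info["sender_ip"], "is-at", info["sender_mac"])
```

The field to watch is the sender MAC in replies for your FreeBSD box's
address: if it is not the one ifconfig(8) reports, another device is
claiming the IP.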

I assume your IP address and netmask are set correctly with
ifconfig(8)? Does the router agree with you on the netmask?
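A netmask disagreement is easy to miss because each side can still
reach some addresses. FreeBSD's ifconfig(8) reports the mask in hex
(e.g. `netmask 0xffffff00`). A minimal Python sketch using the standard
`ipaddress` module that converts that value and checks whether the
FreeBSD box and the router each consider the other on-link; the
function names and sample addresses are hypothetical.

```python
import ipaddress
from typing import List


def freebsd_iface(addr: str, hex_mask: str) -> ipaddress.IPv4Interface:
    """Build an interface from ifconfig-style values, e.g. ("192.0.2.200", "0xffffff00")."""
    mask = ipaddress.IPv4Address(int(hex_mask, 16))  # 0xffffff00 -> 255.255.255.0
    return ipaddress.IPv4Interface(f"{addr}/{mask}")


def subnet_problems(host: ipaddress.IPv4Interface,
                    router: ipaddress.IPv4Interface) -> List[str]:
    """List disagreements between the host's and the router's view of the LAN."""
    problems = []
    if host.network != router.network:
        problems.append(f"netmask mismatch: host uses {host.network}, "
                        f"router uses {router.network}")
    if router.ip not in host.network:
        problems.append(f"host does not see router {router.ip} as on-link")
    if host.ip not in router.network:
        problems.append(f"router does not see host {host.ip} as on-link")
    return problems


if __name__ == "__main__":
    host = freebsd_iface("192.0.2.200", "0xffffff00")              # host thinks /24
    router = ipaddress.IPv4Interface("192.0.2.1/255.255.255.128")  # router thinks /25
    for problem in subnet_problems(host, router):
        print(problem)
```

In this made-up case the FreeBSD box happily sends to the router, but
the router does not consider 192.0.2.200 part of its LAN, which would
look a lot like "it just won't talk to the router".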

The output of `netstat -rn` would be extremely helpful. The output and
network config of the router would also be helpful.
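When you post `netstat -rn`, the lines worth checking first are the
default route and the entry for the router's own address; on FreeBSD
releases of this vintage, ARP entries may also appear in this table as
host routes whose Gateway column is a MAC address. A minimal Python
sketch that pulls those rows out of pasted output follows. It assumes a
FreeBSD-style header line beginning with `Destination`, and the sample
output is hypothetical, not taken from either machine.

```python
from typing import Dict, List, Optional


def parse_routes(text: str) -> List[Dict[str, str]]:
    """Parse the first (IPv4) table of `netstat -rn` output into header-keyed dicts."""
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 1 and fields[0].endswith(":"):  # section title, e.g. "Internet:"
            if header is not None:
                break  # next address family begins; stop after the first table
            continue
        if fields[0] == "Destination":
            header = fields
            continue
        if header is not None:
            # zip() stops early when trailing columns (such as Expire) are empty.
            rows.append(dict(zip(header, fields)))
    return rows


def find_route(rows: List[Dict[str, str]], destination: str) -> Optional[Dict[str, str]]:
    """Return the first row whose Destination matches exactly, or None."""
    for row in rows:
        if row.get("Destination") == destination:
            return row
    return None


SAMPLE = """\
Routing tables

Internet:
Destination        Gateway            Flags     Refs     Use     Netif Expire
default            192.0.2.1          UGSc        3        0       xl0
127.0.0.1          127.0.0.1          UH          1       12       lo0
192.0.2            link#1             UC          2        0       xl0
192.0.2.1          0:10:a4:ff:0:1     UHLW        4        0       xl0   1150
"""

if __name__ == "__main__":
    rows = parse_routes(SAMPLE)
    default = find_route(rows, "default")
    router = find_route(rows, "192.0.2.1")
    print("default via", default["Gateway"], "on", default["Netif"])
    print("router's link-layer entry:", router["Gateway"] if router else "missing")
```

Comparing the router entry's Gateway value with the MAC address the
router itself reports is one quick way to spot a stale ARP entry.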


Something else you can do:

Try plugging your FreeBSD machine directly into a port on your router,
and unplugging everything else (except your uplink :-). If THIS works,
then another device on the wire is misbehaving.


> - It's not the FreeBSD machine's NIC; the NIC continues to talk to the NT
> machine, and I can also make it work with the router by adding a new IP
> address to the interface ("ifconfig xl0 xxx.xxx.xxx.xxx alias").

This suggests that something is wrong with ARP and/or with the
routing tables on the FreeBSD machine or the router.


> Nothing seemed to make the problem go away, so after two weeks of
> continuous uptime, I finally bit the bullet and rebooted the
> machine.  The problem was gone when the machine came back up.  I
> did not power-cycle the hardware.

I'd hardly be "biting the bullet" after 2 weeks:
$ uptime
12:41PM  up 261 days,  9:56, 3 users, load averages: 2.37, 2.46, 2.42
$ uname -a
FreeBSD ren.sasknow.com 3.5-STABLE FreeBSD 3.5-STABLE #0: Sun Mar 25
22:28:19 CST 2001 hutenosa@ren.sasknow.com:/usr/src/sys/compile/REN i386

After nearly nine months, I think twice about rebooting. In that time,
this machine has survived two power failures, several brownouts, one
particularly memorable surge, a dead CPU fan, and experimental code
that turned into a fork bomb, filled up the proc table, exhausted the
swap space, and killed just about everything running on the machine,
not to mention the abuse it takes from all of our web clients :-) And
261 days isn't anywhere near what a properly maintained FreeBSD system
can achieve, but it definitely shows that this kind of uptime is
sustainable.

The last time the system went down, 261 days ago, it was to move it to
a different room and connect it to a different UPS. I had a kernel
upgrade ready for that. Total downtime was under 5 minutes. If not for
the "managerial decisions" I have made, this system probably wouldn't
have been down at all in the 4 years since it was installed.

FWIW, you most likely did NOT have to reboot the FreeBSD machine :-)
There are plenty of problems that can be "solved" by a reboot, but the
vast majority of them can be solved WITHOUT one if you know what to
fix. That is how many UNIX systems stay up for several months or even
a few years.


> This means that the NT machine still holds the record for uptime
> by a very handsome margin (several weeks).
>
> I'd like to know exactly what is happening inside FreeBSD when it
> decides to consign this particular IP address to the Twilight Zone
> for one particular destination/source (the router).

Sure, send answers to the questions I've posed, and we'll be able to
get much closer to an explanation.


> Obviously, this is a mission-critical issue, as no production
> system can afford to be completely deprived of external network
> connectivity.
>
> I used to have this problem a lot more until I discovered that the
> router was sending out DHCP and RIP traffic to the LAN.  I turned
> that off and the problem _seemed_ to go away.  Unfortunately, it
> looks like it simply became less frequent instead.  Once in two
> weeks is still completely unacceptable, however.

Which is exactly why you'll have to fix it! :-)



Hope this helps,
- Ryan

-- 
  Ryan Thompson <ryan@sasknow.com>
  Network Administrator, Accounts

  SaskNow Technologies - http://www.sasknow.com
  #106-380 3120 8th St E - Saskatoon, SK - S7H 0W2

        Tel: 306-664-3600   Fax: 306-664-1161   Saskatoon
  Toll-Free: 877-727-5669     (877-SASKNOW)     North America

