Date:      Thu, 27 Apr 2000 17:03:53 EDT
From:      "Eric Withabee" <ericwithabee@hotmail.com>
To:        freebsd-questions@freebsd.org
Subject:   Network interface hanging on 3.3-RELEASE system
Message-ID:  <20000427210353.79863.qmail@hotmail.com>

Hello.

I'm experiencing some strange problems with a 3.3-RELEASE system.  It runs 
fine for a few days, then it starts getting a continually increasing number 
of TCP connections stuck in the TIME_WAIT state.  The number of connections 
keeps building until it reaches a total of about 4000 TCP connections, then 
the server simply stops responding to any requests from the network. From 
the time the connections start building up to the time the server hangs 
varies from under half an hour to a few hours.  Again, once the buildup 
starts, the number of connections in the TIME_WAIT state only increases.
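
To make the buildup easier to track over time, the per-state counts can be pulled out of saved "netstat -n" output. The following is only a rough sketch: the function name is made up for illustration, and it assumes each TCP row has six whitespace-separated columns (protocol, Recv-Q, Send-Q, local address, foreign address, state).

```python
def count_tcp_states(netstat_output):
    """Return a dict mapping each TCP state to its number of connections.

    Assumes rows look like:
    tcp  0  0  <local addr>  <foreign addr>  <state>
    """
    counts = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows, and anything without a state column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            state = fields[5]
            counts[state] = counts.get(state, 0) + 1
    return counts
```

Logging count_tcp_states(text).get("TIME_WAIT", 0) every minute or so would show when the buildup starts and how quickly it grows.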

I've been trying to diagnose the problem, but haven't had much luck.  I'm 
not sure whether it's due to a bug or not, so I'm posting the question here 
instead of to freebsd-bugs.

The problem started as soon as I took the system live.  It replaced another 
FreeBSD system, and took over all its duties.  It's primarily acting as a 
mail server (Sendmail 8.9.3 and QPopper 2.53) and a web server (Apache 
1.3.9).  It's also running MySQL 9.33.  The server it replaced was a 133MHz 
Pentium, and the new server is a 233MHz Pentium II.  The old server did not 
experience this problem -- in fact, it was extremely stable.

I originally thought that it might be the NIC, a 3Com 3C905B, or the 
"xl" driver, so I replaced it with a Linksys LNE100TX ("mx" driver).  This 
seemed to help somewhat, as the duration between occurrences increased from 
a few hours to a few days.  However, it continues to occur, and I'm 
wondering if the improvement when I switched the NIC was just a 
coincidence.  That said, since I made the switch, the problem has never 
occurred as quickly as it did with the 3Com card.  We've had very good luck 
with 3Com NICs in the past, but this was the first time we'd used a 3C905B 
and the "xl" driver.

The time between occurrences varies significantly.  Sometimes, the system 
will run for over a week, while other times it will run for less than a day.

Just in case the problem was related to the number of mbufs, I bumped up the 
default settings so that it has a maximum of 4096 mbuf clusters.  It didn't 
help.  The system seems to peak at around 300 mbufs until the problem 
occurs.

I decided to see whether it might be a DoS attack, even though that doesn't 
really make sense, because the problem started as soon as I took the system 
live.  At the time the problem is occurring, the connections in the 
TIME_WAIT state don't originate primarily from one IP address.  I suppose 
this doesn't rule out a distributed DoS attack, but I think that's pretty 
unlikely.
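
The single-source check can be made a little more systematic by tallying the TIME_WAIT connections per remote address. This is a sketch under assumptions, not a definitive tool: the function names are hypothetical, and it assumes the foreign address column is printed as "address.port" (port after the last dot), so the port is stripped with a split on the final dot.

```python
from collections import Counter


def time_wait_by_remote_ip(netstat_output):
    """Tally TIME_WAIT connections per remote IP from 'netstat -n' text."""
    tally = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 6 and fields[0].startswith("tcp")
                and fields[5] == "TIME_WAIT"):
            # Assumed format: a.b.c.d.port, so drop everything after the last dot.
            remote_ip = fields[4].rsplit(".", 1)[0]
            tally[remote_ip] += 1
    return tally


def top_source_share(tally):
    """Fraction of all tallied connections owned by the busiest remote IP."""
    total = sum(tally.values())
    if total == 0:
        return 0.0
    return tally.most_common(1)[0][1] / total
```

A share close to 1.0 would point at a single misbehaving client, while a small share spread over many addresses matches what I'm seeing.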

Here are some specifics about the system:

ASUS P3B-F motherboard
Intel 233MHz PII
128MB RAM
2 Western Digital Expert 9.1GB 7200 RPM drives
   Mirrored via an Arco DupliDisk (Bay Mount)
Linksys EtherFast 10/100 NIC (LNE100TX)
Adaptec 2940UW SCSI Adapter
HP SureStore T20i Travan Tape Drive
Full-tower case with lots of fans

In the meantime, while I've been trying to figure this out, I've got a 
cron'ed script that checks the number of connections and reboots the 
server if it gets to a stage that indicates that the server has passed the 
point of no return.  Before it reboots it, it sends me an e-mail message 
giving the output from a "netstat -n", a "netstat -m" (I just added this 
today), and a "ps -ax".  It's an ugly hack, but it's keeping me from getting 
paged at 3:00AM.
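
For anyone who wants to try something similar, the logic of the watchdog is roughly as follows. This is a simplified sketch rather than the actual script: the threshold value and all function names are placeholders chosen for illustration, and the mailing and reboot steps are deliberately left out.

```python
import subprocess

# Hypothetical threshold, chosen to sit below the ~4000 connections
# at which the server stops responding.
REBOOT_THRESHOLD = 3500

# The three diagnostic commands whose output goes into the e-mail.
DIAGNOSTICS = (["netstat", "-n"], ["netstat", "-m"], ["ps", "-ax"])


def count_time_wait(netstat_output):
    """Count TCP rows whose sixth column (state) is TIME_WAIT."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 6 and fields[0].startswith("tcp")
                and fields[5] == "TIME_WAIT"):
            count += 1
    return count


def past_point_of_no_return(netstat_output, threshold=REBOOT_THRESHOLD):
    """True once the TIME_WAIT count reaches the reboot threshold."""
    return count_time_wait(netstat_output) >= threshold


def run_command(argv):
    """Run one command and return its standard output as text."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout


def build_report(run=run_command):
    """Concatenate the diagnostics into one message body."""
    sections = []
    for argv in DIAGNOSTICS:
        sections.append("$ " + " ".join(argv) + "\n" + run(argv))
    return "\n".join(sections)
```

The cron job would call past_point_of_no_return() on fresh "netstat -n" output and, only if it returns True, mail the result of build_report() and then reboot.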

Does anyone have any thoughts?  Thanks for taking the time to read all this.

Eric



