Date:      Fri, 28 Jul 2000 18:02:16 EDT
From:      "Eric Withabee" <ericwithabee@hotmail.com>
To:        freebsd-questions@freebsd.org
Subject:   Network interface hanging on 3.3-RELEASE system
Message-ID:  <20000728220216.16352.qmail@hotmail.com>

Hello.

I posted this message a while back, and it got a few responses, but nothing 
to really help solve the problem, so I thought I'd try throwing it out one 
last time.

The responses I got the first time suggested that it may be an overheating
processor.  However, the processor in the system in question is more than
adequately cooled.  Also, it seems strange that an overheating processor
would only affect the TCP/IP code, while all other applications continue to
run fine.

Anyway, if you have time to read it all, here's the original message with 
the detailed description of the problem:

I'm experiencing some strange problems with a 3.3-RELEASE system.  It runs
fine for a few days, then it starts accumulating a steadily increasing
number of TCP connections stuck in the TIME_WAIT state.  The number keeps
building until it reaches a total of about 4000 TCP connections, then the
server simply stops responding to any requests from the network.  The
interval between the start of the buildup and the hang varies from under
half an hour to a few hours.  Again, once the buildup starts, the number of
connections in the TIME_WAIT state only increases.
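
For anyone who wants to track the same buildup, here is a minimal Python
sketch that tallies TCP connections by state from saved "netstat -n" output.
It assumes the usual BSD column layout (protocol, Recv-Q, Send-Q, local
address, foreign address, state).  The function name and the sample text
are made up for illustration and are not taken from the affected server.

```python
from collections import Counter

# Made-up sample in the BSD "netstat -n" layout (documentation addresses).
SAMPLE = """\
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  192.0.2.10.80          198.51.100.7.1042      TIME_WAIT
tcp        0      0  192.0.2.10.25          203.0.113.9.2211       ESTABLISHED
tcp        0      0  192.0.2.10.80          198.51.100.8.1107      TIME_WAIT
udp        0      0  192.0.2.10.53          *.*
"""


def count_tcp_states(netstat_output):
    """Return a Counter mapping TCP state name to number of connections."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have six columns; headers and UDP rows are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states
```

Calling count_tcp_states(SAMPLE)["TIME_WAIT"] on the sample gives 2.
Logging that number every few minutes would show when the buildup starts.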

I've been trying to diagnose the problem, but haven't had much luck.  I'm 
not sure whether it's due to a bug or not, so I'm posting the question here 
instead of to freebsd-bugs.

The problem started as soon as I took the system live.  It replaced another 
FreeBSD system, and took over all its duties.  It's primarily acting as a 
mail server (Sendmail 8.9.3 and QPopper 2.53) and a web server (Apache 
1.3.9).  It's also running MySQL 9.33.  The server it replaced was a 133MHz 
Pentium, and the new server is a 233MHz Pentium II.  The old server did not 
experience this problem -- in fact, it was extremely stable.

I originally thought that it might be the NIC, a 3Com 3C905B, or the
"xl" driver, so I replaced it with a Linksys LNE100TX ("mx" driver).  This
seemed to help somewhat, as the interval between occurrences increased from
a few hours to a few days.  However, it continues to occur, and I'm
wondering if the improvement when I switched the NIC was just a
coincidence.  Still, since I made the switch, the problem has never
occurred as quickly as it did with the 3Com card.  We've had very good luck
with 3Com NICs in the past, but this was the first time we'd used a 3C905B
and the "xl" driver.

The time between occurrences varies significantly.  Sometimes, the system 
will run for over a week, while other times it will run for less than a day.

Just in case the problem was related to the number of mbufs, I bumped up the
default settings so that it has a maximum of 4096 mbuf clusters.  It didn't
help.  The system seems to peak at around 300 mbufs until the problem
occurs.

I decided to see whether it might be a DoS attack, even though that doesn't
really make sense, because the problem started as soon as I took the system
live.  At the time the problem is occurring, the connections in the
TIME_WAIT state don't originate primarily from one IP address.  I suppose
this doesn't rule out a distributed DoS attack, but I think that's pretty
unlikely.
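
The per-address check described above can be sketched in Python roughly as
follows.  It assumes BSD-style "netstat -n" output, where the foreign
address is printed as the IP followed by ".port".  The function names are
hypothetical, and what share counts as "primarily one address" is left to
the reader.

```python
from collections import Counter


def time_wait_by_host(netstat_output):
    """Count TIME_WAIT connections per foreign IP address."""
    hosts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 6 and fields[0].startswith("tcp")
                and fields[5] == "TIME_WAIT"):
            # BSD netstat appends the port after the last dot: a.b.c.d.port
            host, _, _port = fields[4].rpartition(".")
            hosts[host] += 1
    return hosts


def top_share(hosts):
    """Fraction of all counted connections that belong to the busiest host."""
    total = sum(hosts.values())
    if total == 0:
        return 0.0
    return hosts.most_common(1)[0][1] / total
```

A top_share() value close to 1.0 would point at a single source, while a
small value matches what I'm seeing, with connections spread over many
addresses.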

Here are some specifics about the system:

ASUS P3B-F motherboard
Intel 233MHz PII
128MB RAM
2 Western Digital Expert 9.1GB 7200 RPM drives
   Mirrored via an Arco DupliDisk (Bay Mount)
Linksys EtherFast 10/100 NIC (LNE100TX)
Adaptec 2940UW SCSI Adapter
HP SureStore T20i Travan Tape Drive
Full-tower case with lots of fans

In the meantime, while I've been trying to figure this out, I've set up a
cron script that checks the number of connections and reboots the server
once the count indicates that the server has passed the point of no
return.  Before rebooting, it sends me an e-mail message giving the
output from a "netstat -n", a "netstat -m" (I just added this today), and a
"ps -ax".  It's an ugly hack, but it's keeping me from getting paged at
3:00AM.
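
This is not the actual script, but a watchdog of that kind could look
roughly like the Python sketch below (Python 3.7 or later).  The threshold
of 3000, the "root" mail recipient, the use of mail(1) to send the report,
and all function names are assumptions made for illustration.  Nothing runs
until watchdog() is called, for example from a small script started by
cron.

```python
import subprocess

# Commands whose output goes into the report, in the order named above.
DIAGNOSTICS = (["netstat", "-n"], ["netstat", "-m"], ["ps", "-ax"])
THRESHOLD = 3000  # hypothetical "point of no return"; tune per system


def capture(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def count_time_wait(netstat_output):
    """Count lines whose last column is TIME_WAIT."""
    return sum(1 for line in netstat_output.splitlines()
               if line.split()[-1:] == ["TIME_WAIT"])


def watchdog(threshold=THRESHOLD, run_cmd=capture, act=subprocess.run,
             mail_to="root"):
    """Mail a diagnostic report and reboot if TIME_WAIT hits the threshold.

    Returns True if a reboot was requested, False otherwise.
    """
    outputs = [run_cmd(cmd) for cmd in DIAGNOSTICS]
    if count_time_wait(outputs[0]) < threshold:
        return False
    report = "\n\n".join(
        "$ " + " ".join(cmd) + "\n" + out
        for cmd, out in zip(DIAGNOSTICS, outputs))
    # Send the report first so it is queued before the machine goes down.
    act(["mail", "-s", "watchdog: rebooting", mail_to],
        input=report, text=True)
    act(["/sbin/reboot"])
    return True
```

Passing run_cmd and act as parameters makes it possible to dry-run the
logic with stand-in functions before trusting it with a real reboot.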

Does anyone have any thoughts?  Thanks for taking the time to read all this.

Eric
