Date:      Fri, 28 Apr 2000 06:52:32 +0930 (CST)
From:      james <wabit@adl.ussr.net>
To:        Eric Withabee <ericwithabee@hotmail.com>
Cc:        freebsd-questions@FreeBSD.ORG
Subject:   Re: Network interface hanging on 3.3-RELEASE system
Message-ID:  <Pine.BSF.4.21.0004280651490.8893-100000@gw.Adl.USSR.net>
In-Reply-To: <20000427210353.79863.qmail@hotmail.com>

I had this problem when my CPU fan was buggered, might be worth looking at
that?

regards
james


On Thu, 27 Apr 2000, Eric Withabee wrote:

> Date: Thu, 27 Apr 2000 17:03:53 EDT
> From: Eric Withabee <ericwithabee@hotmail.com>
> To: freebsd-questions@FreeBSD.ORG
> Subject: Network interface hanging on 3.3-RELEASE system
> 
> Hello.
> 
> I'm experiencing some strange problems with a 3.3-RELEASE system.  It runs 
> fine for a few days, then it starts getting a continually increasing number 
> of TCP connections stuck in the TIME_WAIT state.  The number of connections 
> keeps building until it reaches a total of about 4000 TCP connections, then 
> the server simply stops responding to any requests from the network. From 
> the time the connections start building up to the time the server hangs 
> varies from under half an hour to a few hours.  Again, once the buildup 
> starts, the number of connections in the TIME_WAIT state only increases.
> 
> I've been trying to diagnose the problem, but haven't had much luck.  I'm 
> not sure whether it's due to a bug or not, so I'm posting the question here 
> instead of to freebsd-bugs.
> 
> The problem started as soon as I took the system live.  It replaced another 
> FreeBSD system, and took over all its duties.  It's primarily acting as a 
> mail server (Sendmail 8.9.3 and QPopper 2.53) and a web server (Apache 
> 1.3.9).  It's also running MySQL 9.33.  The server it replaced was a 133MHz 
> Pentium, and the new server is a 233MHz Pentium II.  The old server did not 
> experience this problem -- in fact, it was extremely stable.
> 
> I originally thought that it might be the NIC card, a 3Com 3C905B, or the 
> "xl" driver, so I replaced it with a Linksys LNE100TX ("mx" driver).  This 
> seemed to help somewhat, as the duration between occurrences increased from 
> a few hours to a few days.  However, it continues to occur, and I'm 
> wondering if the improvement when I switched the NIC card was just a 
> coincidence.  Although, since I made the switch, the problem has never 
> occurred as quickly as it did with the 3Com card.  We've had very good luck 
> with 3Com NICs in the past, but this was the first time we'd used a 3C905B 
> and the "xl" driver.
> 
> The time between occurrences varies significantly.  Sometimes, the system 
> will run for over a week, while other times it will run for less than a day.
> 
> Just in case the problem was related to the number of mbufs, I bumped up the 
> default settings so that it has a maximum of 4096 mbuf clusters.  It didn't 
> help.  The system seems to peak at around 300 mbufs until the problem 
> occurs.
> 
> I decided to see whether it might be a DoS attack, even though that doesn't 
> really make sense, because the problem started as soon as I took the system 
> live.  At the time the problem is occurring, the connections in the 
> TIME_WAIT state don't originate primarily from one IP address.  I suppose 
> this doesn't rule out a distributed DoS attack, but I think that's pretty 
> unlikely.
> 
> Here are some specifics about the system:
> 
> ASUS P3B-F motherboard
> Intel 233MHz PII
> 128MB RAM
> 2 Western Digital Expert 9.1GB 7200 RPM drives
>    Mirrored via an Arco DupliDisk (Bay Mount)
> Linksys EtherFast 10/100 NIC (LNE100TX)
> Adaptec 2940UW SCSI Adapter
> HP SureStore T20i Travan Tape Drive
> Full-tower case with lots of fans
> 
> In the meantime, while I've been trying to figure this out, I've got a 
> cron'ed script that checks the number of connections and reboots the 
> server if it gets to a stage that indicates that the server has passed the 
> point of no return.  Before it reboots it, it sends me an e-mail message 
> giving the output from a "netstat -n", a "netstat -m" (I just added this 
> today), and a "ps -ax".  It's an ugly hack, but it's keeping me from getting 
> paged at 3:00AM.
> 
> Does anyone have any thoughts?  Thanks for taking the time to read all this.
> 
> Eric
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message
> 
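
The connection check described in the quoted message (count the sockets
in TIME_WAIT from "netstat -n" output and act once the count passes some
limit) could be sketched roughly as below.  This is only an illustration,
not Eric's actual script: the function names, the sample output and the
threshold of 3000 are all assumptions, and the e-mail and reboot steps are
left out.

```python
def count_time_wait(netstat_output):
    """Count TCP sockets in TIME_WAIT in `netstat -n` style text."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP lines start with "tcp" (tcp, tcp4, tcp6); the state is the last column.
        if fields and fields[0].startswith("tcp") and fields[-1] == "TIME_WAIT":
            count += 1
    return count


def past_threshold(netstat_output, threshold=3000):
    """True once the TIME_WAIT count reaches the (assumed) threshold."""
    return count_time_wait(netstat_output) >= threshold


# Made-up sample in the general shape of `netstat -n` output (hypothetical addresses).
SAMPLE = """\
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  10.0.0.1.80            10.0.0.7.1034          TIME_WAIT
tcp        0      0  10.0.0.1.25            10.0.0.8.2211          ESTABLISHED
tcp        0      0  10.0.0.1.110           10.0.0.9.4002          TIME_WAIT
udp        0      0  10.0.0.1.53            *.*
"""

if __name__ == "__main__":
    print(count_time_wait(SAMPLE), "connections in TIME_WAIT")
```

In real use the text would come from running "netstat -n" on the server
rather than from the sample string, and logging the count on every cron
run would also show how quickly the buildup progresses.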