From owner-freebsd-net Thu Jan  2 12:33:59 2003
Date: Thu, 02 Jan 2003 12:33:47 -0800
From: Jeff Behl <jeff@expertcity.com>
To: Mike Silbersack
Cc: freebsd-net
Subject: Re: when are mbuf clusters released?
Message-ID: <3E14A22B.5070203@expertcity.com>
References: <3E108C31.9060403@expertcity.com> <20021230132145.N22880-100000@patrocles.silby.com>
In-Reply-To: <20021230132145.N22880-100000@patrocles.silby.com>

Thanks for the info. Could you explain how mbuf clusters and mbufs are related? I'd like to better understand how we can run out of one and not the other. Also, is there an upper value for mbuf clusters that we should be wary of? Again, the tuning page is quite vague on this. 64000 seems not to do the trick, so I was thinking of 2x that. We have plenty of memory (1 GB, and the machine only runs Apache).

For those interested, we think we've found why we're getting lots of connections in the FIN_WAIT_1 state... it appears to be some sort of odd/bad interaction between IE and Flash (we think).
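[As a sanity check on doubling the limit, here is a back-of-the-envelope sketch of the worst case. It assumes the traditional 2 KB mbuf cluster size; the mbufs themselves are smaller, so clusters are the dominant term.]

```shell
# Rough worst-case memory pinned down if kern.ipc.nmbclusters is
# raised to 128000 (2x the current 64000), assuming 2 KB per cluster.
clusters=128000
bytes_per_cluster=2048
echo "$(( clusters * bytes_per_cluster / 1024 / 1024 )) MB"   # 250 MB
```

That is about a quarter of the machine's 1 GB, so on paper there is headroom; remember from Mike's explanation below that once allocated, this memory stays on the free list rather than being returned to the system.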
These machines serve popup ads (sorry!), and we're guessing that when a user with a slower connection gets one of these pop-ups and kills the window before the Flash file is done downloading, IE leaves the socket open... sorta, and here's where it gets interesting. It leaves the socket open, but closes the TCP window (sets it to 0)... or is the plugin doing this? Apache is done sending data and considers the connection closed, so its client timeout feature never comes into play. But there is still data in the sendq, including the FIN, we believe. BSD obeys the spec (unfortunately) and keeps probing to see if the window has opened so it can transmit the last of the data. This goes on indefinitely! So we get gobs of connections stuck in FIN_WAIT_1.

Interestingly, we have a Solaris machine serving the same purpose, and it does not have the problem. It seems not to follow the RFC: it silently closes the connection after a number of probes when a socket is in FIN_WAIT_1. That seems more reasonable to me.

This seems (as I've read from other posts as well) quite an opportunity for a DoS attack... just keep advertising a window of 0. A single client could easily tie everything up in FIN_WAIT_1... can anyone think of a workaround (besides not serving pop-ups :)

jeff

Mike Silbersack wrote:

> On Mon, 30 Dec 2002, Jeff Behl wrote:
>
>> 5066/52544/256000 mbufs in use (current/peak/max)
>> 5031/50612/64000 mbuf clusters in use (current/peak/max)
>>
>> is there some strange interaction going on between apache2 and bsd?
>> killing apache caused the mbuf clusters to start draining, but only
>> slowly. will clusters still be allocated in FIN_WAIT_? states? TIME_WAIT?
>
> Before I answer your question, let me explain how clusters are allocated.
> The first number above shows how many are in use at the moment. The
> second number shows how many have been used, and are currently allocated.
> The third is the limit you have set.
>
> What this means is that once an mbuf (or cluster) has been allocated, it
> is never truly freed, only returned to the free list. As a result, after
> your spike in mbuf usage, you never really get the memory back. However,
> this may be OK if you have plenty of RAM.
>
>> This machine was serving a couple hundred connections a second... which
>> doesn't seem like it should have taxed it much (P3, 1.2 GHz). CPU util
>> was low.
>>
>> Any help appreciated.
>>
>> Jeff
>
> Now, on to why the value spiked. Yes, connections in FIN_WAIT* states
> still hang on to mbuf clusters relating to the data they have been asked
> to send. There was a DoS script going around which intentionally stuck
> many sockets on a server in the FIN_WAIT_2 state until enough had been
> stuck to cause mbuf cluster exhaustion. To determine if this is the case,
> just run netstat -n and look at the sendq value; if you see high sendq
> values on a lot of sockets, this may be your answer.
>
> The other possibility is that you're being hit with lots of IP
> fragments... currently, the IP reassembly code allows too many unassembled
> packets to sit around. There's no way to inspect the IP reassembly queue
> actively, but you could use netstat -s to see "fragments received" - if
> the number is high, then it's likely something is up.
>
> Mike "Silby" Silbersack
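[Mike's netstat -n check can be scripted. A sketch, assuming the usual netstat TCP line layout of Proto / Recv-Q / Send-Q / Local Address / Foreign Address / State; adjust the awk field numbers if your netstat prints columns differently.]

```shell
# Count sockets stuck in FIN_WAIT_1 that still have data queued to send.
# Field 3 is Send-Q and field 6 is the TCP state in typical netstat output;
# a large count here matches the zero-window/persist pattern Jeff describes.
netstat -n -p tcp | awk '$6 == "FIN_WAIT_1" && $3 > 0' | wc -l
```

Dropping the `wc -l` shows the individual sockets, including which remote addresses are holding send queues open.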
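[For Mike's second possibility, the reassembly counters can be pulled out of the per-protocol statistics; a sketch, assuming the counter lines contain the word "fragment" as they do in typical netstat -s output.]

```shell
# Show only the IP fragment/reassembly counters; a rapidly growing
# "fragments received" count suggests the reassembly-queue problem.
netstat -s -p ip | grep -i fragment
```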