From owner-freebsd-performance@FreeBSD.ORG Tue Jul 22 23:04:43 2003
Message-ID: <3F1E252B.7F1F676@mindspring.com>
Date: Tue, 22 Jul 2003 23:03:23 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Michael Conlen
Cc: freebsd-performance@freebsd.org
Subject: Re: sbwait state for loaded Apache server

Michael Conlen wrote:
> I'm working with an Apache webserver running 1400 Apache processes,
> with the system pushing somewhere in the area of 50-60 Mbit/sec
> sustained.  The system seems to top out around 60 Mbit/sec, and I
> see some minor degradation of server response times; otherwise the
> response times are generally very, very stable.  Most of the Apache
> processes are in the sbwait state.  I've got 4 Gig of memory, so I
> can play with some of the values (nmbclusters has been turned up,
> and I never see delayed or dropped requests for mbufs).
>
> I don't see much about the state in my old Design & Implementation
> of 4.4BSD (the "Red Book"?), and I don't have a copy of TCP/IP
> Illustrated, Volume 2 handy these days, but if memory serves, sbwait
> means waiting on a socket buffer resource.  My guess is that these
> are processes waiting on the send buffer to drain.
>
> $ netstat -an | egrep '[0-9] 3[0-9]{4}' | wc -l
>      297
>
> This seems to indicate that I've got a lot of processes waiting to
> drain.  Looking at the actual output, most of these are ESTABLISHED.

	cd /usr/src/sys/kern
	grep sbwait

The sleep call is the sbwait() function in uipc_socket2.c.  It's
called:

o  On the send buffer from sendfile()
o  On the send buffer from sosend()
o  On the receive buffer from soreceive()

There's also a commented-out call on the receive buffer in unp_gc(),
which you can ignore, since it only deals with rights issues in the
uipc cases related to AF_UNIX (UNIX domain sockets).

The soreceive() case can probably be ignored, too, since it deals
with blocking reads on sockets with no data present, and Apache
generally doesn't do this.

So you are spending your time waiting for the send buffers to drain
in an sosend() or a sendfile().
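If you want to watch a process land in that state, here is a minimal
demo (my sketch, not Apache or kernel source; assumes FreeBSD, where
the wait channel shows up as "sbwait" in ps): a writer whose peer
never reads blocks in sbwait as soon as the send buffer fills, which
is exactly where those Apache processes are sitting.

	/*
	 * sbwait demo -- an illustrative sketch, not from the Apache
	 * or kernel sources.  The writer below fills the send buffer
	 * of a socketpair whose other end is never read, so write(2)
	 * eventually sleeps in sbwait; check it with
	 * "ps -axo pid,wchan,comm".
	 */
	#include <sys/socket.h>

	#include <err.h>
	#include <string.h>
	#include <unistd.h>

	int
	main(void)
	{
		int fds[2];
		char buf[4096];

		if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
			err(1, "socketpair");
		memset(buf, 'x', sizeof(buf));
		for (;;) {
			/*
			 * Nobody ever reads fds[1]; once the buffer
			 * space is exhausted, this blocks in sbwait.
			 */
			if (write(fds[0], buf, sizeof(buf)) == -1)
				err(1, "write");
		}
	}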
Basically, this probably means one of the following:

o  you have a client on a slow link talking to your server on a fast
   link
o  you have a client that is intentionally attempting to DOS you by
   sending a request and then not keeping up its end of the TCP/IP
   conversation
o  you are running Microsoft's WAST HTTP benchmark program without
   really understanding how it works
o  you are running a Web Avalanche(tm) box against your server
o  you have legitimate client traffic, but the client has dropped
   off the net

If I had to guess, I'd say "slow clients".

> So my thought is that by increasing the send queue size I could
> reduce this.  I've got a pretty good idea of the size of the files
> being sent, and my thought was to increase the send-q size to where
> Apache can write() the file and go to the keepalive state quickly,
> instead of waiting.

This would most likely be an incredibly bad idea, since you would be
more likely to go into the FIN-WAIT-2 state instead, and if you did
go into an application-level KeepAlive in Apache, none of the
KeepAlive messages you send are going anywhere until the rest of the
data has drained.  (The per-socket knob involved here is SO_SNDBUF;
see the sketch at the end of this message.)

Further, you really want to delay closes until the client has done
the close (1-2 seconds after you would have closed the socket), so
that the client goes into FIN-WAIT-2 instead of you going into that
state.  (Servers don't want to be the ones to close the connection,
because the FIN-WAIT-1 -> FIN-WAIT-2 transition doesn't get
reversed... even though you could technically pretend you never got
the second FIN and solicit either an RST, another FIN, or no response
at all, after a couple of which you could safely drop the
connection.)

> So the questions are:
>
> Would this affect actual network performance?

Yes.  It would negatively impact the overall total number of
connections you could handle without running out of RAM, and it would
penalize faster connections (people who get what they want and get
the heck off your server, thus reducing your load) in favor of people
with slower connections (people who get on your server and stay there
forever, consuming your resources).

> Would this reduce load on the machine (a handy thing to do, but
> secondary)?  Given c = number of connections, q = queue adjustment,
> and s = size of an mbuf, do I just need to make sure I have (c*q)/s
> buffers available, plus any fudge?

Not really.  All it would do is move your Apache processes into
trying to poll clients who aren't going to answer, asking them if
they are still there and still need the connection.  Or it will drop
you into FIN-WAIT-2, and if the client drops off the net, or is
trying to DOS your server, leave the connection in that state for
about 4 hours (the default for the FIN-WAIT-2 reaping timer that the
TCP/IP standard doesn't permit people to implement, but which people
implement anyway), chewing up those resources.  Much better to hack
the stack to pretend it didn't get the second FIN at this point, send
a FIN/ACK, and then only keep the connection around if you get a FIN
back.

> How do I know when I need to increase the overall system buffer
> size beyond 200 MB?

That's a hard one to answer.  The general answer is "when an overall
system buffer size less than or equal to 200 MB is constraining my
ability to service connections".

-- Terry
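The SO_SNDBUF sketch referred to above, as a minimal illustration
(not Apache code; the 256 KB figure is an arbitrary example, and on
FreeBSD a request larger than kern.ipc.maxsockbuf fails with ENOBUFS
rather than being silently clamped, so read the value back):

	/*
	 * Sketch of raising the per-socket send buffer.  Illustration
	 * only; the helper name and the 256 KB example value are mine,
	 * not from Apache.
	 */
	#include <sys/socket.h>

	#include <stdio.h>

	int
	set_sndbuf(int sock, int bytes)
	{
		socklen_t len = sizeof(bytes);

		if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
		    &bytes, sizeof(bytes)) == -1) {
			perror("setsockopt(SO_SNDBUF)");
			return (-1);
		}
		/* Read it back; the kernel may have refused or
		 * adjusted the request. */
		if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF,
		    &bytes, &len) == -1) {
			perror("getsockopt(SO_SNDBUF)");
			return (-1);
		}
		printf("send buffer is now %d bytes\n", bytes);
		return (0);
	}

	/* e.g. set_sndbuf(fd, 256 * 1024) on an accepted socket. */

Apache exposes the same knob through its SendBufferSize directive,
so no code change would be needed to experiment with it.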