Date:      Wed, 16 Jan 2002 15:29:08 -0700
From:      Chad David <davidc@acns.ab.ca>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Chad David <davidc@acns.ab.ca>, current@freebsd.org
Subject:   Re: socket shutdown delay?
Message-ID:  <20020116152908.A1476@colnta.acns.ab.ca>
In-Reply-To: <3C45F32A.5B517F7E@mindspring.com>; from tlambert2@mindspring.com on Wed, Jan 16, 2002 at 01:39:54PM -0800
References:  <20020116070908.A803@colnta.acns.ab.ca> <3C45F32A.5B517F7E@mindspring.com>

On Wed, Jan 16, 2002 at 01:39:54PM -0800, Terry Lambert wrote:
> Chad David wrote:
> > Has anyone noticed (or fixed) a bug in -current where socket connections
> > on the local machine do not shutdown properly?  During stress testing
> > I'm seeing thousands (2316 right now) of these:
> > 
> > tcp4       0      0  192.168.1.2.8080       192.168.1.2.2215       FIN_WAIT_2
> > tcp4       0      0  192.168.1.2.2215       192.168.1.2.8080       LAST_ACK
> > 
> > Both the client and the server are dead, but the connections stay in this
> > state.
> > 
> > I tested with the server on -current and the client on another box, and
> > all of the server sockets end up in TIME_WAIT.  Is there something delaying
> > the last ack on local connections?
> 
> A connection is in FIN_WAIT_2 when it has sent a FIN and received
> the ACK of that FIN, but has not yet received the peer's FIN; once
> that FIN arrives and is ACKed, the connection enters the TIME_WAIT
> state for 2MSL before proceeding to the CLOSED state.  That is the
> side that initiated the close (here, the server).
> 
> A connection is in LAST_ACK when it has received the peer's FIN,
> sent its own FIN, and is waiting for the ACK of that FIN before it
> can proceed to the CLOSED state.  That is the side that closed
> second (here, the client).

I've got TCP/IP Illustrated Vol. 1 right beside me, so I basically
knew what was happening, just not why.

Like I said in the original email, connections from another machine
end up in TIME_WAIT right away; it is only local connections that
get stuck.

> 
> Since it's showing IP addresses, you appear to be using real
> network connections, rather than loopback connections.

In this case yes.  Connections to 127.0.0.1 result in the same thing.
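
FWIW, the client side of the test boils down to a tight connect/close
loop against the local server.  A stripped-down sketch of the pattern
(not the actual test code; the port and request are just placeholders):

/*
 * Simplified client loop: repeatedly connect to the local server,
 * send a request, read the reply, and close() without ever calling
 * shutdown(2).
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        const char *req = "GET / HTTP/1.0\r\n\r\n";
        struct sockaddr_in sin;
        char buf[4096];
        int i, s;

        memset(&sin, 0, sizeof(sin));
        sin.sin_len = sizeof(sin);
        sin.sin_family = AF_INET;
        sin.sin_port = htons(8080);
        sin.sin_addr.s_addr = inet_addr("127.0.0.1");

        for (i = 0; i < 10000; i++) {
                if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
                        err(1, "socket");
                if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
                        err(1, "connect");  /* this is what eventually fails */
                write(s, req, strlen(req));
                while (read(s, buf, sizeof(buf)) > 0)
                        ;                   /* drain the reply */
                close(s);                   /* close(2) only, no shutdown(2) */
        }
        return (0);
}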

> 
> There are basically several ways to cause this:
> 
> 1)	You have something on your network, like a dummynet,
> 	that is deterministically dropping the ACK to
> 	the client when the server goes from FIN_WAIT_1,
> 	so that the server goes to CLOSING instead of going
> 	to FIN_WAIT_2 (client closes first), or the FIN in
> 	the other direction so that the server doesn't go
> 	to TIME_WAIT from FIN_WAIT_2 (server closes first).

Nothing like that on the box.

> 
> 2)	You have intentionally disabled KEEPALIVE, so that
> 	a close results in an RST instead of a normal
> 	shutdown of the TCP connection (I can't tell if
> 	you are doing a real call to "shutdown(2)", or if
> 	you are just relying on the OS resource tracking
> 	behaviour that is implicit to "close(2)" (but only
> 	if you don't set KEEPALIVE, and have disabled the
> 	sysctl default of always doing KEEPALIVE on every
> 	connection)).  In this case, it's possible that the
> 	RST was lost on the wire, and since RSTs are not
> 	retransmitted, you have shot yourself in the foot.
> 
> 	Note:	You often see this type of foolish foot
> 		shooting when running MAST, WAST, or
> 		webbench, which try to factor out response
> 		speed and measure connection speed, so that
> 		they benchmark the server, not the FS or
> 		other OS latencies in the document delivery
> 		path (which is why these tools suck as real
> 		world benchmarks go).  You could also cause
> 		this (unlikely) with a bad firewall rule.

I haven't changed any sysctls, and other than SO_REUSEADDR the
default sockopts are being used.  I also do not call shutdown() on
either end, and both the client and server processes have exited,
yet the connections still do not clear up (they do eventually, after
around 10 minutes).
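
The ~10 minutes looks suspiciously like the stock FIN_WAIT_2 timeout.
If I'm reading netinet/tcp_timer.h right (rough numbers from the
defaults, not a verbatim excerpt), a FIN_WAIT_2 connection whose
socket has already been closed is held until it has been idle for
tcp_maxidle and then dropped:

/*
 * My reading of the stock defaults (not a verbatim excerpt from
 * tcp_timer.h); times are in slow-timeout ticks (PR_SLOWHZ per sec).
 */
#define TCPTV_KEEPINTVL (75 * PR_SLOWHZ)        /* 75 seconds */
#define TCPTV_KEEPCNT   8

/*
 * tcp_maxidle = tcp_keepcnt * tcp_keepintvl
 *             = 8 * 75 s
 *             = 600 s, i.e. roughly the 10 minutes I'm seeing.
 */

That would explain why they drain on their own eventually, but not
why the last ACK never shows up in the first place.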

> 
> 3)	You've exhausted your mbufs before you've exhausted
> 	the number of simultaneous connections you are
> 	permitted, because you have incorrectly tuned your
> 	kernel, and therefore all your connections are sitting
> 	in a starvation deadlock, waiting for packets that can
> 	never be sent because there are no mbufs available.

The client eventually fails with EADDRNOTAVAIL.
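
I assume that is just the ephemeral port range filling up while the
old connections sit in FIN_WAIT_2 / LAST_ACK.  A quick hack to check
how many local ports connect() actually has to work with (assuming
the net.inet.ip.portrange.* sysctl names are what I think they are):

/*
 * Print the ephemeral port range used for outgoing connections.
 * Assumes the net.inet.ip.portrange.first/.last sysctls.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
        int first, last;
        size_t len;

        len = sizeof(first);
        if (sysctlbyname("net.inet.ip.portrange.first", &first, &len,
            NULL, 0) == -1)
                err(1, "portrange.first");
        len = sizeof(last);
        if (sysctlbyname("net.inet.ip.portrange.last", &last, &len,
            NULL, 0) == -1)
                err(1, "portrange.last");
        printf("ephemeral ports: %d-%d (%d usable)\n",
            first, last, last - first + 1);
        return (0);
}

If that is still the old 1024-5000 default it is only about 4000
ports, which would line up with the few thousand stuck connections
above.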

Here are the mbuf stats before and after.

Before test:
------------------------------------------------------------------------

colnta->netstat -m
mbuf usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    51/144 (in use/in pool)
        CPU #1 list:    51/144 (in use/in pool)
        Total:          102/288 (in use/in pool)
        Maximum number allowed on each CPU list: 512
        Maximum possible: 67584
        Allocated mbuf types:
          102 mbufs allocated to data
        0% of mbuf map consumed
mbuf cluster usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    50/86 (in use/in pool)
        CPU #1 list:    51/88 (in use/in pool)
        Total:          101/174 (in use/in pool)
        Maximum number allowed on each CPU list: 128
        Maximum possible: 33792
        0% of cluster map consumed
420 KBytes of wired memory reserved (54% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

After test:
------------------------------------------------------------------------
colnta->netstat -m
mbuf usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    59/144 (in use/in pool)
        CPU #1 list:    43/144 (in use/in pool)
        Total:          102/288 (in use/in pool)
        Maximum number allowed on each CPU list: 512
        Maximum possible: 67584
        Allocated mbuf types:
          102 mbufs allocated to data
        0% of mbuf map consumed
mbuf cluster usage:
        GEN list:       0/0 (in use/in pool)
        CPU #0 list:    58/86 (in use/in pool)
        CPU #1 list:    43/88 (in use/in pool)
        Total:          101/174 (in use/in pool)
        Maximum number allowed on each CPU list: 128
        Maximum possible: 33792
        0% of cluster map consumed
420 KBytes of wired memory reserved (54% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

and

colnta->netstat -an | grep FIN_WAIT_2 | wc
    2814   16884  219492

and a few minutes later:
colnta->netstat -an | grep FIN_WAIT_2 | wc
    1434    8604  111852


The box currently has 630MB free memory, and is 98.8% idle.

I'm not sure what other information would be useful.

> 
> 4)	You've got local hacks that you aren't telling us
> 	about (shame on you!).

Nope.  Stock -current, none of my patches applied.

> 
> 5)	You have found an introduced bug in -current.
> 
> 	Note:	I personally think this one is unlikely.

Me too, but I can't think of any reason why the machine wouldn't
send the last ACK.  I must be starving something... I'll go over
my code again and see if I can find a bug.

> 
> 6)	Maybe something I haven't thought of...
> 
> 	Note:	I personally think this one is unlikely,
> 		too... ;^)

Well, if you don't know, where does that leave me? :)

> 
> See RFC 793 (or Stevens) for details on the state machine for
> both ends of the connection, and you will see how your machine
> got into this mess in the first place.

I've been reading it... 

Thanks.

-- 
Chad David        davidc@acns.ab.ca
www.FreeBSD.org   davidc@freebsd.org
ACNS Inc.         Calgary, Alberta Canada
Fourthly, The constant breeders, beside the gain of eight shillings
sterling per annum by the sale of their children, will be rid of the
charge of maintaining them after the first year. - Jonathan Swift
