From owner-freebsd-qa Sun Jan  6 14:55:02 2002
Delivered-To: freebsd-qa@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id 7BC4E37B404;
	Sun, 6 Jan 2002 14:54:59 -0800 (PST)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.6/8.11.5) with SMTP id g06MsuD96317;
	Sun, 6 Jan 2002 17:54:56 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Date: Sun, 6 Jan 2002 17:54:56 -0500 (EST)
From: Robert Watson <robert@fledge.watson.org>
X-Sender: robert@fledge.watson.org
To: jlemon@FreeBSD.org
Cc: qa@FreeBSD.org
Subject: Reduced reliability due to larger socket queue defaults for TCP
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-qa@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Recently ran into the following circumstance on a server with about a
15-day uptime (and hence about a 15-day-old version of -STABLE):

tcp4       0  33090  204.156.12.50.80       213.197.75.52.2378     FIN_WAIT_1
tcp4       0  33304  204.156.12.50.80       198.54.202.4.24052     FIN_WAIT_1
tcp4       0  32120  204.156.12.50.80       24.27.14.83.50129      FIN_WAIT_1
tcp4       0  33089  204.156.12.50.80       213.197.75.52.2381     FIN_WAIT_1
tcp4       0  33304  204.156.12.50.80       198.54.202.4.23509     FIN_WAIT_1
tcp4       0  33304  204.156.12.50.80       212.182.63.102.28130   FIN_WAIT_1
tcp4       0  33304  204.156.12.50.80       62.233.128.65.13712    FIN_WAIT_1
tcp4       0  33580  204.156.12.50.80       212.182.13.23.3473     LAST_ACK
tcp4       0  31856  204.156.12.50.80       198.54.202.4.20584     FIN_WAIT_1
tcp4       0  31856  204.156.12.50.80       212.182.63.102.29962   LAST_ACK
tcp4       0  33304  204.156.12.50.80       198.54.202.4.23960     FIN_WAIT_1
tcp4       0  31482  204.156.12.50.80       213.197.75.52.2373     FIN_WAIT_1
tcp4       0  32551  204.156.12.50.80       213.197.75.52.2374     FIN_WAIT_1

(on the order of hundreds of these), resulting in mbufs getting
exhausted.
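A back-of-the-envelope calculation (a sketch, not part of the original report; it assumes the traditional 2048-byte mbuf cluster size and takes the ~33 KB Send-Q and nmbclusters=4608 figures from this message) shows why a few hundred such connections are enough:

```python
# How many stuck FIN_WAIT_1/LAST_ACK connections, each holding a full
# send queue, does it take to exhaust the mbuf cluster pool?
# Assumption: 2048-byte clusters (MCLBYTES); the Send-Q and nmbclusters
# values come from the netstat output and message above.

MCLBYTES = 2048       # bytes per mbuf cluster (assumed)
NMBCLUSTERS = 4608    # pool size with maxusers=256, per the message
SEND_Q = 33304        # bytes queued on a typical stuck connection

clusters_per_conn = -(-SEND_Q // MCLBYTES)        # ceiling division
conns_to_exhaust = NMBCLUSTERS // clusters_per_conn

print(f"clusters per stuck connection: {clusters_per_conn}")
print(f"connections needed to exhaust the pool: {conns_to_exhaust}")
```

At roughly 17 clusters per connection, only about 270 stuck connections drain the entire 4608-cluster pool, consistent with the "hundreds" observed here.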
maxusers is set to 256, so nmbclusters is 4608, which was previously a
reasonable default.  Presumably the problem I'm experiencing is that dud
connections have doubled in capacity due to a larger send queue size.
I've temporarily dropped the send queue max until I can reboot the
machine to increase nmbclusters, but this failure mode does seem
unfortunate.  It's also worth considering adding a release note entry
indicating that while this change can improve performance, it can also
reduce scalability.

I suppose this shouldn't have caught me by surprise, but it did, since
that server had previously not had a problem... :-)  I don't suppose the
TCP spec allows us to drain send socket queues in FIN_WAIT_1 or
LAST_ACK? :-)  Any other bright suggestions on ways we can make this
change "safer"?

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-qa" in the body of the message