From owner-freebsd-qa Sun Jan  6 14:55:02 2002
Delivered-To: freebsd-qa@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id 7BC4E37B404;
	Sun, 6 Jan 2002 14:54:59 -0800 (PST)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.6/8.11.5) with SMTP id g06MsuD96317;
	Sun, 6 Jan 2002 17:54:56 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Date: Sun, 6 Jan 2002 17:54:56 -0500 (EST)
From: Robert Watson <robert@fledge.watson.org>
X-Sender: robert@fledge.watson.org
To: jlemon@FreeBSD.org
Cc: qa@FreeBSD.org
Subject: Reduced reliability due to larger socket queue defaults for TCP
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-qa@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Recently ran into the following circumstance on a server with about a
15-day uptime (and hence about a 15-day-old version of -STABLE):

tcp4       0  33090  204.156.12.50.80       213.197.75.52.2378     FIN_WAIT_1
tcp4       0  33304  204.156.12.50.80       198.54.202.4.24052     FIN_WAIT_1
tcp4       0  32120  204.156.12.50.80       24.27.14.83.50129      FIN_WAIT_1
tcp4       0  33089  204.156.12.50.80       213.197.75.52.2381     FIN_WAIT_1
tcp4       0  33304  204.156.12.50.80       198.54.202.4.23509     FIN_WAIT_1
tcp4       0  33304  204.156.12.50.80       212.182.63.102.28130   FIN_WAIT_1
tcp4       0  33304  204.156.12.50.80       62.233.128.65.13712    FIN_WAIT_1
tcp4       0  33580  204.156.12.50.80       212.182.13.23.3473     LAST_ACK
tcp4       0  31856  204.156.12.50.80       198.54.202.4.20584     FIN_WAIT_1
tcp4       0  31856  204.156.12.50.80       212.182.63.102.29962   LAST_ACK
tcp4       0  33304  204.156.12.50.80       198.54.202.4.23960     FIN_WAIT_1
tcp4       0  31482  204.156.12.50.80       213.197.75.52.2373     FIN_WAIT_1
tcp4       0  32551  204.156.12.50.80       213.197.75.52.2374     FIN_WAIT_1

(on the order of hundreds of these), resulting in mbufs getting
exhausted.
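A back-of-the-envelope calculation (a sketch, not part of the original report; it assumes the traditional 2048-byte mbuf cluster size and takes the ~33 KB Send-Q and nmbclusters=4608 figures from this message) shows why a few hundred such connections are enough:

```python
# How many stuck FIN_WAIT_1/LAST_ACK connections, each holding a full
# send queue, does it take to exhaust the mbuf cluster pool?
# Assumption: 2048-byte clusters (MCLBYTES); the Send-Q and nmbclusters
# values come from the netstat output and message above.

MCLBYTES = 2048       # bytes per mbuf cluster (assumed)
NMBCLUSTERS = 4608    # pool size with maxusers=256, per the message
SEND_Q = 33304        # bytes queued on a typical stuck connection

clusters_per_conn = -(-SEND_Q // MCLBYTES)        # ceiling division
conns_to_exhaust = NMBCLUSTERS // clusters_per_conn

print(f"clusters per stuck connection: {clusters_per_conn}")
print(f"connections needed to exhaust the pool: {conns_to_exhaust}")
```

At roughly 17 clusters per connection, only about 270 stuck connections drain the entire 4608-cluster pool, consistent with the "hundreds" observed here.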
maxusers is set to 256, so nmbclusters is 4608, which was previously a
reasonable default.  Presumably the problem I'm experiencing is that dud
connections have doubled in capacity due to a larger send queue size.
I've temporarily dropped the send queue max until I can reboot the
machine to increase nmbclusters, but this failure mode does seem
unfortunate.  It's also worth considering adding a release note entry
indicating that while this change can improve performance, it can also
reduce scalability.

I suppose this shouldn't have caught me by surprise, but it did, since
that server had previously not had a problem... :-)  I don't suppose the
TCP spec allows us to drain send socket queues in FIN_WAIT_1 or
LAST_ACK? :-)  Any other bright suggestions on ways we can make this
change "safer"?

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-qa" in the body of the message