Date: Tue, 18 Dec 2007 16:43:40 +1100 (EST)
From: Bruce Evans
To: David G Lawrence
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Packet loss every 30.999 seconds
Message-ID: <20071218155642.D32807@delplex.bde.org>
In-Reply-To: <20071217102433.GQ25053@tnn.dglawrence.com>

On Mon, 17 Dec 2007, David G Lawrence wrote:

>> While trying to diagnose a packet loss problem in a RELENG_6 snapshot
>> dated November 8, 2007 it looks like I've stumbled across a broken
>> driver or kernel routine which stops interrupt processing long enough
>> to severely degrade network performance every 30.99 seconds.

I see the same behaviour under a heavily modified version of FreeBSD-5.2
(except the period was 2 ms longer and the latency was 7 ms instead of
11 ms when numvnodes was at a certain value).  Now with numvnodes = 17500,
the latency is 3 ms.

> I noticed this as well some time ago. The problem has to do with the
> processing (syncing) of vnodes. When the total number of allocated vnodes
> in the system grows to tens of thousands, the ~31 second periodic sync
> process takes a long time to run. Try this patch and let people know if
> it helps your problem. It will periodically wait for one tick (1ms) every
> 500 vnodes of processing, which will allow other things to run.

However, the syncer should be running at a relatively low priority and
should not cause packet loss.  I don't see any packet loss even in ~5.2,
where the network stack (but not drivers) is still Giant-locked.

Other too-high latencies showed up:

- syscons LED setting and vt switching gives a latency of 5.5 msec because
  syscons still uses busy-waiting for setting LEDs :-(.  Oops, I do see
  packet loss -- this causes it under ~5.2 but not under -current.  For
  the bge and/or em drivers, the packet loss shows up in netstat output as
  a few hundred errors for every LED setting on the receiving machine,
  while receiving tiny packets at the maximum possible rate of 640 kpps.
  sysctl is completely Giant-locked and so are upper layers of the network
  stack.  The bge hardware rx ring size is 256 in -current and 512 in ~5.2.
  At 640 kpps, 512 packets take 800 us, so bge wants to call the upper
  layers with a latency of far below 800 us.  I don't know exactly where
  the upper layers block on Giant.

- a user CPU hog process gives a latency of over 200 ms every half a second
  or so when the hog starts up, and 300-400 ms after the hog has been
  running for some time.  Two user CPU hog processes double the latency.
  Reducing kern.sched.quantum from 100 ms to 10 ms and/or renicing the hogs
  doesn't seem to affect this.
  Running the hogs at idle priority fixes this.  This won't affect packet
  loss, but it might affect user network processes -- they might need to
  run at real time priority to get low enough latency.  They might need to
  do this anyway -- a scheduling quantum of 100 ms should give a latency of
  100 ms per CPU hog quite often, though not usually, since the hogs should
  never be preferred to a higher-priority process.

Previously I've used a less specialized clock-watching program to determine
the syscall latency.  It showed similar problems for CPU hogs.  I just
remembered that I found the fix for these under ~5.2 -- remove a local hack
that sacrifices latency for reduced context switches between user threads.
-current with SCHED_4BSD does this non-hackishly, but seems to have a bug
somewhere that gives a latency that is large enough to be noticeable in
interactive programs.

Bruce
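
A clock-watching program of the kind mentioned above can be sketched roughly as follows.  This is only an illustration under assumptions, not the program used for the measurements in this message: the names max_gap_ns and ring_fill_time_us are hypothetical, and the loop only sees latency as experienced by one user process.  It reads a monotonic clock in a tight loop and remembers the largest gap between consecutive reads.  The second helper just repeats the ring arithmetic from the bge paragraph (512 packets at 640 kpps is 800 us).

```python
import time


def max_gap_ns(duration_ns, clock=time.monotonic_ns):
    """Read the clock in a tight loop for duration_ns nanoseconds.

    Returns (largest gap between consecutive reads in ns, number of reads).
    A large gap means this process did not get to run for about that long.
    The clock argument is a hypothetical hook so a fake clock can be passed in.
    """
    start = prev = clock()
    deadline = start + duration_ns
    worst = 0
    reads = 0
    while prev < deadline:
        now = clock()
        gap = now - prev
        if gap > worst:
            worst = gap
        prev = now
        reads += 1
    return worst, reads


def ring_fill_time_us(ring_size, packets_per_second):
    """Time in microseconds for a receive ring of ring_size entries to fill
    at the given packet rate, i.e. the latency budget before packets drop."""
    return ring_size * 1_000_000 / packets_per_second


if __name__ == "__main__":
    # Watch the clock for one second and report the worst stall seen.
    worst, reads = max_gap_ns(1_000_000_000)
    print(f"{reads} clock reads, worst gap {worst / 1e6:.3f} ms")
    # Ring sizes quoted in the message: 512 in ~5.2 and 256 in -current.
    for size in (512, 256):
        print(f"ring {size} at 640 kpps fills in {ring_fill_time_us(size, 640_000):.0f} us")
```

Running this while a CPU hog is active, or while toggling a keyboard LED, should show whether the worst gap jumps.  A gap reported this way includes ordinary scheduler preemption, so it is closer to the syscall latency described above than to interrupt latency in the driver.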