From owner-freebsd-net@FreeBSD.ORG  Tue Dec 18 05:43:52 2007
Date: Tue, 18 Dec 2007 16:43:40 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: David G Lawrence <dg@dglawrence.com>
In-Reply-To: <20071217102433.GQ25053@tnn.dglawrence.com>
Message-ID: <20071218155642.D32807@delplex.bde.org>
References: <D50B5BA8-5A80-4370-8F20-6B3A531C2E9B@eng.oar.net>
	<20071217102433.GQ25053@tnn.dglawrence.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Packet loss every 30.999 seconds

On Mon, 17 Dec 2007, David G Lawrence wrote:

>> While trying to diagnose a packet loss problem in a RELENG_6 snapshot
>> dated November 8, 2007 it looks like I've stumbled across a broken
>> driver or kernel routine which stops interrupt processing long enough
>> to severely degrade network performance every 30.99 seconds.

I see the same behaviour under a heavily modified version of FreeBSD-5.2
(except the period was 2 ms longer and the latency was 7 ms instead
of 11 ms when numvnodes was at a certain value).  Now with numvnodes =
17500, the latency is 3 ms.

>   I noticed this as well some time ago. The problem has to do with the
> processing (syncing) of vnodes. When the total number of allocated vnodes
> in the system grows to tens of thousands, the ~31 second periodic sync
> process takes a long time to run. Try this patch and let people know if
> it helps your problem. It will periodically wait for one tick (1ms) every
> 500 vnodes of processing, which will allow other things to run.

However, the syncer should be running at a relatively low priority and not
cause packet loss.  I don't see any packet loss even in ~5.2 where the
network stack (but not drivers) is still Giant-locked.

Other too-high latencies showed up:
- syscons LED setting and vt switching gives a latency of 5.5 msec because
   syscons still uses busy-waiting for setting LEDs :-(.  Oops, I do see
   packet loss -- this causes it under ~5.2 but not under -current.  For
   the bge and/or em drivers, the packet loss shows up in netstat output
   as a few hundred errors for every LED setting on the receiving machine,
   while receiving tiny packets at the maximum possible rate of 640 kpps.
   sysctl is completely Giant-locked and so are upper layers of the
   network stack.  The bge hardware rx ring size is 256 in -current and
   512 in ~5.2.  At 640 kpps, 512 packets take 800 us so bge wants to
   call the upper layers with a latency of far below 800 us.  I
   don't know exactly where the upper layers block on Giant.
- a user CPU hog process gives a latency of over 200 ms every half a
   second or so when the hog starts up, and 300-400 ms after the
   hog has been running for some time.  Two user CPU hog processes
   double the latency.  Reducing kern.sched.quantum from 100 ms to 10
   ms and/or renicing the hogs don't seem to affect this.  Running the
   hogs at idle priority fixes this.  This won't affect packet loss,
   but it might affect user network processes -- they might need to
   run at real time priority to get low enough latency.  They might need
   to do this anyway -- a scheduling quantum of 100 ms should give a
   latency of 100 ms per CPU hog quite often, though not usually since
   the hogs should never be preferred to a higher-priority process.
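
The ring-size arithmetic in the first item above can be checked with a
few lines of Python.  This is only a sketch: the helper name
ring_fill_time_us is hypothetical, and it assumes packets arrive at a
perfectly steady rate with nothing draining the ring.

```python
def ring_fill_time_us(ring_size, packets_per_second):
    """Microseconds until ring_size rx descriptors are used up,
    assuming a steady arrival rate and no draining (an idealization)."""
    return ring_size * 1_000_000 / packets_per_second


# Ring sizes quoted above: 512 in ~5.2, 256 in -current, both at 640 kpps.
for name, ring in (("~5.2", 512), ("-current", 256)):
    fill = ring_fill_time_us(ring, 640_000)
    print(f"{name}: {ring} descriptors fill in {fill:.0f} us")
```

The 512-entry case reproduces the 800 us figure.  The value for the
256-entry ring is derived here under the same 640 kpps assumption; it
is not a measurement.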

Previously I've used a less specialized clock-watching program to
determine the syscall latency.  It showed similar problems for CPU
hogs.  I just remembered that I found the fix for these under ~5.2 --
remove a local hack that sacrifices latency for reduced context
switches between user threads.  -current with SCHED_4BSD does this
non-hackishly, but seems to have a bug somewhere that gives a latency
that is large enough to be noticeable in interactive programs.
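
For illustration, a clock-watching loop of the general kind mentioned
above might look like the following.  This is a minimal Python sketch
and not the actual program referred to; the name watch_clock and its
parameters are hypothetical.  It spins reading a clock and reports gaps
between consecutive reads, on the assumption that a gap much larger
than the cost of one clock read means the process was not running for
about that long.

```python
import time


def watch_clock(duration_s, threshold_s, clock=time.perf_counter):
    """Spin reading the clock for about duration_s seconds.

    Returns (max_gap, long_gaps), where long_gaps lists
    (time_since_start, gap) for every gap >= threshold_s.
    """
    start = prev = clock()
    max_gap = 0.0
    long_gaps = []
    while prev - start < duration_s:
        now = clock()
        gap = now - prev          # time between two consecutive reads
        if gap > max_gap:
            max_gap = gap
        if gap >= threshold_s:    # record when the gap began and its size
            long_gaps.append((prev - start, gap))
        prev = now
    return max_gap, long_gaps


if __name__ == "__main__":
    max_gap, long_gaps = watch_clock(duration_s=0.2, threshold_s=0.001)
    print(f"max gap: {max_gap * 1e3:.3f} ms, gaps >= 1 ms: {len(long_gaps)}")
```

Running one copy alongside a CPU hog and comparing the reported gaps
with and without the hog gives a rough picture of scheduling latency
as seen from user space.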

Bruce