From: Bruce Evans <brde@optusnet.com.au>
Date: Wed, 19 Dec 2007 03:20:28 +1100 (EST)
To: David G Lawrence
Cc: freebsd-net@FreeBSD.org, freebsd-stable@FreeBSD.org, Bruce Evans
Subject: Re: Packet loss every 30.999 seconds
In-Reply-To: <20071218141742.GS25053@tnn.dglawrence.com>
Message-ID: <20071219022102.I34422@delplex.bde.org>
References: <20071217103936.GR25053@tnn.dglawrence.com>
 <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org>
 <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com>

On Tue, 18 Dec 2007, David G Lawrence wrote:

> I didn't say it caused any bogus disk I/O. My original problem
> (after a day or two of uptime) was an occasional large scheduling delay
> for a process that needed to process VoIP frames in real-time. It was
> happening every 31 seconds and was causing voice frames to be dropped
> due to the large latency causing the frame to be outside of the jitter
> window. I wrote a program that measures the scheduling delay by sleeping
> for one tick and then comparing the timeofday offset from what was
> expected. This revealed that every 31 seconds, the process was seeing
> a 17ms delay in scheduling. Further investigation found that 1) the

I got an almost identical delay (with 64000 vnodes). Now, 17ms isn't much.
Delays must have been much longer when CPUs were many times slower and
RAM/vnode counts were not so many times smaller.

High-priority threads just need to be able to preempt the syncer so that
they don't lose data (unless really hard real time is supported, which it
isn't). This should work starting with about FreeBSD-6 (you probably need
"options PREEMPT"). It doesn't work in ~5.2 due to Giant locking, but I
find that Giant locking rarely matters for UP. Old versions of FreeBSD
were only able to preempt to non-threads (interrupt handlers), yet they
somehow survived the longer delays. They didn't have Giant locking to get
in the way, and presumably avoided packet loss by doing a lot of the work
in interrupt handlers (hardware isr and netisr). I just remembered that I
have seen packet loss even under -current when I leave out or turn off
"options PREEMPT".
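A minimal sketch of the kind of measurement program described above (sleep
for roughly one tick, then compare the gettimeofday() offset with what was
expected) might look like the following. It is not David's actual program;
the 10 ms sleep and the 5 ms reporting threshold are assumptions.

/*
 * Sketch only: sleep for about one tick and report iterations where the
 * wakeup came back much later than expected.
 */
#include <sys/time.h>

#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct timeval before, after;
	double actual, expected;

	expected = 10.0;			/* ms; ~1 tick at HZ=100 (assumed) */
	for (;;) {
		if (gettimeofday(&before, NULL) == -1)
			err(1, "gettimeofday");
		usleep(10000);			/* sleep ~1 tick */
		if (gettimeofday(&after, NULL) == -1)
			err(1, "gettimeofday");
		actual = (after.tv_sec - before.tv_sec) * 1e3 +
		    (after.tv_usec - before.tv_usec) / 1e3;
		if (actual - expected > 5.0)	/* threshold is an assumption */
			printf("delay: expected %.1f ms, got %.1f ms\n",
			    expected, actual);
	}
}

Run on an otherwise idle system, a spike in the reported delay every ~31
seconds is the signature being discussed here.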
> ...
> and it completely resolved the problem. Since the wait that I added
> is at the bottom of the loop and the limit is 500 vnodes, this tells
> me that every 31 seconds, there are a whole lot of vnodes that are
> being "synced", when there shouldn't have been any (this fact wasn't
> apparent to me at the time, but when I later realized this, I had
> no time to investigate further). My tests and analysis have all been
> on an otherwise quiet system (no disk I/O), so the bottom of the
> ffs_sync vnode loop should not have been reached at all, let alone
> tens of thousands of times every 31 seconds. All machines were uni-
> processor, FreeBSD 6+. I don't know if this problem is present in 5.2.
> I didn't see ffs_syncvnode in your call graph, so it probably is not.

I chopped that to a flat profile with only the top callers. Any
significant calls from ffs_sync() would show up as top callers. I still
have the data, and the call graph shows much more clearly that there was
just one dirty vnode for the whole sync():

%                 0.00    0.01       1/1       syscall [3]
% [4]     88.7    0.00    0.01       1         sync [4]
%                 0.01    0.00       5/5       ffs_sync [5]
%                 0.01    0.00       6/6       vfs_msync [6]
%                 0.00    0.00       7/8       vfs_busy [260]
%                 0.00    0.00       7/8       vfs_unbusy [263]
%                 0.00    0.00       6/7       vn_finished_write [310]
%                 0.00    0.00       6/6       vn_start_write [413]
%                 0.00    0.00       1/1       vfs_stdnosync [472]
%
% -----------------------------------------------
%
%                 0.01    0.00       5/5       sync [4]
% [5]     50.7    0.01    0.00       5         ffs_sync [5]
%                 0.00    0.00       1/1       ffs_fsync [278]
%                 0.00    0.00       1/60      vget [223]
%                 0.00    0.00       1/60      ufs_vnoperatespec [78]
%                 0.00    0.00       1/26      vrele [76]

It passed the flags test just once to get to the vget(). ffs_syncvnode()
doesn't exist in 5.2, and ffs_fsync() is called instead.

%
% -----------------------------------------------
%
%                 0.01    0.00       6/6       sync [4]
% [6]     38.0    0.01    0.00       6         vfs_msync [6]
%
% -----------------------------------------------
% ...
%
%                 0.00    0.00       1/1       ffs_sync [5]
% [278]     0.0    0.00    0.00       1         ffs_fsync [278]
%                 0.00    0.00       1/1       ffs_update [368]
%                 0.00    0.00       1/4       vn_isdisk [304]

This is presumably to sync the 1 dirty vnode. BTW, I use noatime a lot,
including for all file systems used in the test, so the tree walk didn't
dirty any vnodes. A tar to /dev/zero would dirty all vnodes if everything
were mounted without this option.

% ...
%
%   %   cumulative    self                self     total
%  time    seconds   seconds    calls   ns/call   ns/call  name
%  50.7      0.008     0.008        5   1666427   1667246  ffs_sync [5]
%  38.0      0.015     0.006        6   1041217   1041217  vfs_msync [6]
%   3.1      0.015     0.001        0   100.00%            mcount [7]
%   1.5      0.015     0.000        0   100.00%            mexitcount [8]
%   0.6      0.015     0.000        0   100.00%            cputime [22]
%   0.6      0.016     0.000       34      2660      2660  generic_bcopy [24]
%   0.5      0.016     0.000        0   100.00%            user [26]

Bruce
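A rough sketch of the change David describes, a short sleep at the bottom
of the ffs_sync() vnode loop after every 500 vnodes, is below. It is
schematic only and not the actual patch: the FreeBSD 6-era
MNT_VNODE_FOREACH() loop is assumed, locking against the mount interlock
and the existing per-vnode work are omitted, and the wait channel and
one-tick sleep length are assumptions.

/*
 * Schematic fragment (not the actual patch): throttle the per-mount
 * vnode scan in ffs_sync() by sleeping for about one tick after every
 * 500 vnodes, so a long scan cannot monopolize the CPU.
 */
static void
sync_scan_throttled(struct mount *mp)
{
	struct vnode *vp, *mvp;
	int loopcnt = 0;

	MNT_VNODE_FOREACH(vp, mp, mvp) {
		/* ... existing work: skip clean vnodes, sync dirty ones ... */
		if (++loopcnt >= 500) {
			loopcnt = 0;
			/* Let other threads run for ~1 tick. */
			tsleep(&loopcnt, PZERO, "syncyld", 1);
		}
	}
}

The 500-vnode limit bounds how long the syncer runs between sleeps, which
in turn bounds the scheduling delay seen by a real-time thread such as the
VoIP process in the original report.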