Date:      Tue, 18 Dec 2007 16:49:14 -0500
From:      Mark Fullmer <maf@eng.oar.net>
To:        David G Lawrence <dg@dglawrence.com>
Cc:        freebsd-net@freebsd.org, freebsd-stable@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: Packet loss every 30.999 seconds
Message-ID:  <CD187AD1-8712-418F-9F49-FA3407BA1AC7@eng.oar.net>
In-Reply-To: <20071217102433.GQ25053@tnn.dglawrence.com>
References:  <D50B5BA8-5A80-4370-8F20-6B3A531C2E9B@eng.oar.net> <20071217102433.GQ25053@tnn.dglawrence.com>

A little progress.

I have a machine running a KTR-enabled kernel.

Another machine is running David's ffs_vfsops.c patch.

I left two other machines (GENERIC kernels) running the packet loss test
overnight.  At ~32480 seconds of uptime the problem starts.  This is really
close to a signed 16-bit overflow (2^15 = 32768)...  See
http://www.eng.oar.net/~maf/bsd6/p1.png and
http://www.eng.oar.net/~maf/bsd6/p2.png.  The missing impulses at the
31-second marks are the intervals between test runs.  The window of missing
packets (the time between the two received packets on either side of a
missing sequence number) is usually less than 4us, although I'm not sure
gettimeofday() can be trusted for measuring this.  See
https://www.eng.oar.net/~maf/bsd6/p3.png

Things I'll try tonight:

   o Check on the patched kernel.

   o Try enabling KTR debugging before and after an expected
     high-latency period.

   o Dump all files to /dev/null to trigger the behavior.

I would expect the vnode problem to look a little different on the packet
loss graphs over time.  If this leads anywhere, I'll add a counter before
the msleep() and see how often it's getting there.

On Dec 17, 2007, at 5:24 AM, David G Lawrence wrote:
>    I noticed this as well some time ago. The problem has to do with the
> processing (syncing) of vnodes. When the total number of allocated vnodes
> in the system grows to tens of thousands, the ~31 second periodic sync
> process takes a long time to run. Try this patch and let people know if
> it helps your problem. It will periodically wait for one tick (1ms) every
> 500 vnodes of processing, which will allow other things to run.
>
> Index: ufs/ffs/ffs_vfsops.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v
> retrieving revision 1.290.2.16
> diff -c -r1.290.2.16 ffs_vfsops.c
> *** ufs/ffs/ffs_vfsops.c	9 Oct 2006 19:47:17 -0000	1.290.2.16
> --- ufs/ffs/ffs_vfsops.c	25 Apr 2007 01:58:15 -0000
> ***************
> *** 1109,1114 ****
> --- 1109,1115 ----
>   	int softdep_deps;
>   	int softdep_accdeps;
>   	struct bufobj *bo;
> + 	int flushed_count = 0;
>
>   	fs = ump->um_fs;
>   	if (fs->fs_fmod != 0 && fs->fs_ronly != 0) {		/* XXX */
> ***************
> *** 1174,1179 ****
> --- 1175,1184 ----
>   			allerror = error;
>   		vput(vp);
>   		MNT_ILOCK(mp);
> + 		if (flushed_count++ > 500) {
> + 			flushed_count = 0;
> + 			msleep(&flushed_count, MNT_MTX(mp), PZERO, "syncw", 1);
> + 		}
>   	}
>   	MNT_IUNLOCK(mp);
>   	/*
>
> -DG
>
> David G. Lawrence
> President
> Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
> The FreeBSD Project - http://www.freebsd.org
> Pave the road of life with opportunities.



