From owner-freebsd-stable@FreeBSD.ORG Tue Dec 18 14:17:43 2007
Date: Tue, 18 Dec 2007 06:17:42 -0800
From: David G Lawrence <dg@dglawrence.com>
To: Bruce Evans
Cc: freebsd-net@FreeBSD.ORG, freebsd-stable@FreeBSD.ORG
Subject: Re: Packet loss every 30.999 seconds
Message-ID: <20071218141742.GS25053@tnn.dglawrence.com>
In-Reply-To: <20071218233644.U756@besplex.bde.org>

> >Right, it's a non-optimal loop when N is very large, and that's a fairly
> >well understood problem.  I think what DG was getting at, though, is
> >that this massive flush happens every time the syncer runs, which
> >doesn't seem correct.  Sure, maybe you just rsynced 100,000 files 20
> >seconds ago, so the upcoming flush is going to be expensive.  But the
> >next flush 30 seconds after that shouldn't be just as expensive, yet it
> >appears to be so.
>
> I'm sure it doesn't cause many bogus flushes.  iostat shows zero writes
> caused by calling this incessantly using "while :; do sync; done".

   I didn't say it caused any bogus disk I/O. My original problem (after
a day or two of uptime) was an occasional large scheduling delay for a
process that needed to process VoIP frames in real-time. It was
happening every 31 seconds and was causing voice frames to be dropped
because the added latency pushed the frames outside of the jitter
window.
   I wrote a program that measures the scheduling delay by sleeping for
one tick and then comparing the time-of-day offset against what was
expected. This revealed that every 31 seconds, the process was seeing a
17ms delay in scheduling. Further investigation found that 1) the
syncer was the process that was running every 31 seconds and causing
the delay (it was the only process in the system with that timing
interval), and 2) lowering kern.maxvnodes to something lowish (5000)
would mostly mitigate the problem. The patch to limit the number of
vnodes processed in the loop before sleeping was then developed, and it
completely resolved the problem.
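   A minimal sketch of that measurement technique -- sleep for one
tick, then compare the observed elapsed time against what was requested
-- looks like the following. This is not the original program; the 1 ms
sleep (one tick at HZ=1000) and the 10 ms report threshold are
illustrative choices.

#include <sys/time.h>

#include <stdio.h>
#include <unistd.h>

/* Elapsed time from *a to *b, in milliseconds. */
static double
tv_diff_ms(const struct timeval *a, const struct timeval *b)
{

	return ((b->tv_sec - a->tv_sec) * 1000.0 +
	    (b->tv_usec - a->tv_usec) / 1000.0);
}

int
main(void)
{
	struct timeval before, after;
	const double slept_ms = 1.0;	/* one tick at HZ=1000 */
	double delay;

	for (;;) {			/* run until interrupted */
		gettimeofday(&before, NULL);
		usleep((useconds_t)(slept_ms * 1000.0));
		gettimeofday(&after, NULL);
		delay = tv_diff_ms(&before, &after) - slept_ms;
		if (delay > 10.0)	/* abnormally late wakeup */
			printf("scheduling delay: %.1f ms\n", delay);
	}
}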
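   For reference, the vnode loop in question (in ffs_sync() in
ufs/ffs/ffs_vfsops.c, as of 6.x) has roughly the shape below, shown
with the 500-vnode sleep folded in at the bottom. This is a simplified
fragment rather than the actual patch: the interlock and vget()/vput()
handling are elided, and synced_cnt is a hypothetical counter added to
illustrate the sort of instrumentation suggested below. To exercise it,
first populate the vnode cache, e.g. with something like
"tar cf /dev/null /usr/src".

	/*
	 * vp, mvp, ip, waitfor, error, and allerror are as in the
	 * real ffs_sync(); locking is elided for clarity.
	 */
	int loop_cnt = 0, synced_cnt = 0;

	MNT_VNODE_FOREACH(vp, mp, mvp) {
		ip = VTOI(vp);
		/*
		 * On an otherwise idle system, nearly every vnode
		 * should be skipped right here.
		 */
		if (vp->v_type == VNON || ((ip->i_flag &
		    (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
		    vp->v_bufobj.bo_dirty.bv_cnt == 0))
			continue;
		synced_cnt++;		/* how many vnodes get this far? */
		if ((error = ffs_syncvnode(vp, waitfor)) != 0)
			allerror = error;
		/*
		 * Sleep briefly every 500 vnodes so a long pass over
		 * the vnode list can no longer stall other processes.
		 */
		if (++loop_cnt >= 500) {
			loop_cnt = 0;
			tsleep(&loop_cnt, PPAUSE, "ffssync", 1);
		}
	}
	printf("ffs_sync: %d vnodes reached the sync path\n", synced_cnt);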
   Since the wait that I added is at the bottom of the loop and the
limit is 500 vnodes, this tells me that every 31 seconds a whole lot of
vnodes are being "synced", when there shouldn't have been any (this
wasn't apparent to me at the time, and when I later realized it, I had
no time to investigate further). My tests and analysis have all been on
an otherwise quiet system (no disk I/O), so the bottom of the ffs_sync
vnode loop should not have been reached at all, let alone tens of
thousands of times every 31 seconds. All of the machines were
uniprocessor, running FreeBSD 6 or later. I don't know if this problem
is present in 5.2; I didn't see ffs_syncvnode in your call graph, so it
probably is not.
   Anyway, someone needs to instrument the vnode loop in ffs_sync and
figure out what is going on. As you've pointed out, it is first
necessary to read a lot of files (I use tar to /dev/null and make sure
it reads at least 100K files) in order to get the vnodes allocated. As
I mentioned previously, I suspect that either ip->i_flag is not getting
completely cleared in ffs_syncvnode or its children, or the
v_bufobj.bo_dirty.bv_cnt accounting is broken.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.