From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 18:33:52 2007
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5BBE916A420;
	Wed, 19 Dec 2007 18:33:52 +0000 (UTC)
	(envelope-from maf@eng.oar.net)
Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86])
	by mx1.freebsd.org (Postfix) with SMTP id 2209213C478;
	Wed, 19 Dec 2007 18:33:51 +0000 (UTC)
	(envelope-from maf@eng.oar.net)
Received: (qmail 60029 invoked from network); 19 Dec 2007 18:33:51 -0000
Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) (192.148.251.71)
	by sv1.eng.oar.net with SMTP; 19 Dec 2007 18:33:51 -0000
In-Reply-To: <20071219171331.GH25053@tnn.dglawrence.com>
References: <20071217102433.GQ25053@tnn.dglawrence.com>
	<20071220011626.U928@besplex.bde.org>
	<814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net>
	<20071219171331.GH25053@tnn.dglawrence.com>
Mime-Version: 1.0 (Apple Message framework v752.3)
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <50B64D0B-35E6-453F-A8AF-65982A503E20@eng.oar.net>
Content-Transfer-Encoding: 7bit
From: Mark Fullmer
Date: Wed, 19 Dec 2007 13:33:34 -0500
To: David G Lawrence
X-Mailer: Apple Mail (2.752.3)
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Packet loss every 30.999 seconds
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
X-List-Received-Date: Wed, 19 Dec 2007 18:33:52 -0000

Just to confirm the patch did not change the behavior. I ran with it
last night and double checked this morning to make sure.

It looks like if you put the check at the top of the loop and the next
node is changed during msleep() SLIST_NEXT will walk into the trash.
I'm in over my head here....

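To make the hazard concrete, here is a minimal user-space sketch of one
generic way to keep your place in a singly-linked list across a sleep:
park a marker node after the current element and resume from the marker.
This is not the ffs_sync() code and not a proposed patch. The names
struct node, drop_node(), walk_with_marker() and demo_marker_walk() are
all hypothetical, and drop_node() merely stands in for whatever changes
the list while the walker is inside msleep(). In a real kernel every
other walker of the list would also have to know to skip markers, and
the extra insert/remove per element is the kind of per-iteration
overhead David mentions.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical list element; is_marker flags a placeholder, not data. */
struct node {
    int id;
    int is_marker;
    SLIST_ENTRY(node) link;
};
SLIST_HEAD(nodelist, node);

/* Stand-in for "the list changed while we slept": unlink and free the
 * node with the given id, if it is present. */
void drop_node(struct nodelist *head, int id)
{
    struct node *n;

    SLIST_FOREACH(n, head, link) {
        if (!n->is_marker && n->id == id) {
            SLIST_REMOVE(head, n, node, link);
            free(n);
            return;             /* stop at once: n is gone */
        }
    }
}

/* Walk the list, "sleeping" after each element. The marker keeps our
 * place, so neither the current node nor its old successor is touched
 * after the sleep. Returns the number of real nodes visited. */
int walk_with_marker(struct nodelist *head, int victim_id)
{
    struct node marker = { .id = -1, .is_marker = 1 };
    struct node *n;
    int visited = 0;

    assert(head != NULL);
    n = SLIST_FIRST(head);
    while (n != NULL) {
        if (n->is_marker) {     /* skip placeholders of other walkers */
            n = SLIST_NEXT(n, link);
            continue;
        }
        visited++;
        SLIST_INSERT_AFTER(n, &marker, link);
        drop_node(head, victim_id);     /* simulated msleep() */
        n = SLIST_NEXT(&marker, link);  /* resume from the marker */
        SLIST_REMOVE(head, &marker, node, link);
    }
    return visited;
}

/* Build 1 -> 2 -> 3, walk it while node 2 is freed during the first
 * "sleep", then clean up. Returns the number of nodes visited. */
int demo_marker_walk(void)
{
    struct nodelist head = SLIST_HEAD_INITIALIZER(head);
    struct node *n;
    int visited;

    for (int id = 3; id >= 1; id--) {
        n = calloc(1, sizeof(*n));
        if (n == NULL)
            abort();
        n->id = id;
        SLIST_INSERT_HEAD(&head, n, link);
    }
    visited = walk_with_marker(&head, 2);
    while ((n = SLIST_FIRST(&head)) != NULL) {
        SLIST_REMOVE_HEAD(&head, link);
        free(n);
    }
    return visited;
}
```

In this demo the walker visits nodes 1 and 3, so demo_marker_walk()
returns 2. A loop that had cached SLIST_NEXT() before the sleep would
instead have followed a pointer to the freed node 2.
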

Setting kern.maxvnodes=1000 does stop both the periodic packet loss and
the high-latency syscalls, so it does look like walking this chain
without yielding the processor is part of the problem I'm seeing.

The other behavior I don't understand is why the em driver is able to
increment if_ipackets but still lose the packet.

Dumping the internal stats with dev.em.1.stats=1:

Dec 19 13:07:46 dytnq-nf1 kernel: em1: Excessive collisions = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Sequence errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Defer count = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Missed Packets = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive No Buffers = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive Length Errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Crc errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Alignment errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Collision/Carrier extension errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: RX overruns = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: watchdog timeouts = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Failed = 0

With FreeBSD 4 I was able to run a UDP data collector with rtprio set
and kern.ipc.maxsockbuf=20480000, then use setsockopt() with SO_RCVBUF
in the application. If packets were dropped, they would show up in
netstat -s as "dropped due to full socket buffers". Since the packet
never makes it to ip_input(), I no longer have any way to count drops.

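For readers unfamiliar with the SO_RCVBUF step, a minimal sketch of how
an application might request a large receive buffer and check what it
actually got. The helper names request_rcvbuf() and udp_rcvbuf_granted()
are hypothetical and this is not the collector's real code; only the
socket(), setsockopt(), getsockopt() and close() calls are standard.
Whether the kernel refuses, clamps or adjusts the request depends on the
OS and on limits such as kern.ipc.maxsockbuf, so the sketch reads the
value back instead of assuming it.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ask for a receive buffer of `bytes` on fd and return the size the
 * kernel reports afterwards, or -1 if either call fails. */
int request_rcvbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) == -1)
        return -1;
    return granted;
}

/* Hypothetical helper for trying this out: open a UDP socket, size its
 * receive buffer, close it again, and return the reported size. */
int udp_rcvbuf_granted(int bytes)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int granted;

    if (fd == -1)
        return -1;
    granted = request_rcvbuf(fd, bytes);
    close(fd);
    return granted;
}
```

A collector would call request_rcvbuf() on its own bound socket and
compare the return value with what it asked for. The reported size can
differ from the request, and a return of -1 means the request was
rejected outright.
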

There will always be corner cases where interrupts are lost and drops
not accounted for if the adapter hardware can't report them, but right
now I've got no way to estimate any loss.

--
mark

On Dec 19, 2007, at 12:13 PM, David G Lawrence wrote:

>>> Try it with "find / -type f >/dev/null" to duplicate the problem
>>> almost instantly.
>>
>> I was able to verify last night that (cd /; tar -cpf -) > all.tar
>> would trigger the problem. I'm working getting a test running with
>> David's ffs_sync() workaround now, adding a few counters there should
>> get this narrowed down a little more.
>
>    Unfortunately, the version of the patch that I sent out isn't going
> to help your problem. It needs to yield at the top of the loop, but vp
> isn't necessarily valid after the wakeup from the msleep. That's a
> problem that I'm having trouble figuring out a solution to - the
> solutions that come to mind will all significantly increase the
> overhead of the loop.
>    As a very inadequate work-around, you might consider lowering
> kern.maxvnodes to something like 20000 - that might be low enough to
> not trigger the problem, but also be high enough to not significantly
> affect system I/O performance.
>
> -DG
>
> David G. Lawrence
> President
> Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
> The FreeBSD Project - http://www.freebsd.org
> Pave the road of life with opportunities.