Date:      Wed, 19 Dec 2007 13:33:34 -0500
From:      Mark Fullmer <maf@eng.oar.net>
To:        David G Lawrence <dg@dglawrence.com>
Cc:        freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: Packet loss every 30.999 seconds
Message-ID:  <50B64D0B-35E6-453F-A8AF-65982A503E20@eng.oar.net>
In-Reply-To: <20071219171331.GH25053@tnn.dglawrence.com>
References:  <D50B5BA8-5A80-4370-8F20-6B3A531C2E9B@eng.oar.net> <20071217102433.GQ25053@tnn.dglawrence.com> <CD187AD1-8712-418F-9F49-FA3407BA1AC7@eng.oar.net> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com>

Just to confirm the patch did not change the behavior.  I ran with it
last night and double checked this morning to make sure.

It looks like if you put the check at the top of the loop and the next
node is changed during msleep(), SLIST_NEXT will walk into the trash.
I'm in over my head here....
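
To make the hazard concrete, here is a minimal user-space sketch of one
way a list walk can survive a yield: park a placeholder ("marker") node
after the current element before yielding, then resume from the marker
instead of from a possibly stale pointer. This is an illustration only,
not the kernel code and not David's patch. struct node,
walk_with_marker(), unlink_value() and demo_walk() are hypothetical
names, and the yield callback merely stands in for msleep().

```c
#include <stddef.h>
#include <sys/queue.h>

struct node {
    int value;
    int is_marker;                  /* 1 for a placeholder node */
    SLIST_ENTRY(node) link;
};
SLIST_HEAD(node_list, node);

/* Hypothetical stand-in for msleep(): may unlink any non-marker node. */
typedef void (*yield_fn)(struct node_list *head, void *arg);

/* Sum the values of all real nodes, yielding every `interval` nodes. */
static long
walk_with_marker(struct node_list *head, int interval, yield_fn yield,
    void *arg)
{
    struct node marker = { .value = 0, .is_marker = 1 };
    struct node *np = SLIST_FIRST(head);
    long sum = 0;
    int seen = 0;

    while (np != NULL) {
        if (np->is_marker) {        /* skip placeholders */
            np = SLIST_NEXT(np, link);
            continue;
        }
        sum += np->value;
        if (++seen % interval == 0) {
            /* Remember our place, then let others modify the list. */
            SLIST_INSERT_AFTER(np, &marker, link);
            yield(head, arg);       /* np may be gone after this */
            np = SLIST_NEXT(&marker, link);
            /* Linear-time removal on a singly-linked list. */
            SLIST_REMOVE(head, &marker, node, link);
        } else {
            np = SLIST_NEXT(np, link);
        }
    }
    return sum;
}

/* Example yield callback: unlink the first real node equal to *arg. */
static void
unlink_value(struct node_list *head, void *arg)
{
    int target = *(int *)arg;
    struct node *np;

    SLIST_FOREACH(np, head, link) {
        if (!np->is_marker && np->value == target) {
            SLIST_REMOVE(head, np, node, link);
            return;
        }
    }
}

/* Walk 1..5 while the node holding 2 is unlinked during the first yield. */
static long
demo_walk(void)
{
    struct node nodes[5];
    struct node_list head = SLIST_HEAD_INITIALIZER(head);
    int target = 2;

    for (int i = 4; i >= 0; i--) {
        nodes[i].value = i + 1;
        nodes[i].is_marker = 0;
        SLIST_INSERT_HEAD(&head, &nodes[i], link);
    }
    return walk_with_marker(&head, 2, unlink_value, &target);
}
```

In demo_walk() the node the walker was standing on is unlinked during
the first yield, yet the walk still resumes at the correct successor
because it restarts from the marker. The extra insert and the
linear-time SLIST_REMOVE are the kind of per-iteration overhead a
scheme like this adds.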

Setting kern.maxvnodes=1000 does stop both the periodic packet loss and
the high-latency syscalls, so it does look like walking this chain
without yielding the processor is part of the problem I'm seeing.

The other behavior I don't understand is why the em driver is able
to increment if_ipackets but still lose the packet.

Dumping the internal stats with dev.em.1.stats=1:

Dec 19 13:07:46 dytnq-nf1 kernel: em1: Excessive collisions = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Sequence errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Defer count = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Missed Packets = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive No Buffers = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive Length Errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Crc errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Alignment errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Collision/Carrier extension errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: RX overruns = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: watchdog timeouts = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Failed = 0

With FreeBSD 4 I was able to run a UDP data collector with rtprio set,
kern.ipc.maxsockbuf=20480000, then use setsockopt() with SO_RCVBUF
in the application.  If packets were dropped they would show up
with netstat -s as "dropped due to full socket buffers".
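
For reference, a minimal sketch of the application side of that setup:
request a large receive buffer with setsockopt() and read back what the
kernel actually granted. set_rcvbuf() and demo_rcvbuf() are
hypothetical helper names, and the 65536-byte request is only an
example value, not the size used on the collector. The kernel may cap,
adjust, or reject the request depending on system limits such as
kern.ipc.maxsockbuf.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Ask for a receive buffer of `want` bytes on socket `fd`.
 * Returns the size the kernel reports afterwards, or -1 on error.
 */
static int
set_rcvbuf(int fd, int want)
{
    int got = 0;
    socklen_t len = sizeof(got);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1)
        return -1;
    return got;
}

/* Create a UDP socket, request a 64 KB buffer, report the result. */
static int
demo_rcvbuf(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int got;

    if (fd == -1)
        return -1;
    got = set_rcvbuf(fd, 65536);    /* example size only */
    close(fd);
    return got;
}
```

Reading the value back matters because the granted size can differ from
the requested one, so the application should not assume it got what it
asked for.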

Since the packet never makes it to ip_input() I no longer have
any way to count drops.  There will always be corner cases where
interrupts are lost and drops not accounted for if the adapter
hardware can't report them, but right now I've got no way to
estimate any loss.

--
mark

On Dec 19, 2007, at 12:13 PM, David G Lawrence wrote:

>>> Try it with "find / -type f >/dev/null" to duplicate the problem almost
>>> instantly.
>>
>> I was able to verify last night that (cd /; tar -cpf -) > all.tar would
>> trigger the problem.  I'm working on getting a test running with
>> David's ffs_sync() workaround now; adding a few counters there should
>> get this narrowed down a little more.
>
>    Unfortunately, the version of the patch that I sent out isn't going to
> help your problem. It needs to yield at the top of the loop, but vp isn't
> necessarily valid after the wakeup from the msleep. That's a problem that
> I'm having trouble figuring out a solution to - the solutions that come
> to mind will all significantly increase the overhead of the loop.
>    As a very inadequate work-around, you might consider lowering
> kern.maxvnodes to something like 20000 - that might be low enough to
> not trigger the problem, but also be high enough to not significantly
> affect system I/O performance.
>
> -DG
>
> David G. Lawrence
> President
> Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
> The FreeBSD Project - http://www.freebsd.org
> Pave the road of life with opportunities.
