From owner-freebsd-hackers@FreeBSD.ORG  Sat Jan 24 19:12:25 2004
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2E49F16A4CE
	for <hackers@freebsd.org>; Sat, 24 Jan 2004 19:12:25 -0800 (PST)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7B69643D1D
	for <hackers@freebsd.org>; Sat, 24 Jan 2004 19:12:23 -0800 (PST)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (localhost [127.0.0.1])
	by fledge.watson.org (8.12.10/8.12.10) with ESMTP id i0P3ABUd078858;
	Sat, 24 Jan 2004 22:10:11 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Received: from localhost (robert@localhost)i0P3ABgb078821;
	Sat, 24 Jan 2004 22:10:11 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Date: Sat, 24 Jan 2004 22:10:11 -0500 (EST)
From: Robert Watson <rwatson@freebsd.org>
X-Sender: robert@fledge.watson.org
To: Matthew Dillon <dillon@apollo.backplane.com>
In-Reply-To: <200401250302.i0P32BON039881@apollo.backplane.com>
Message-ID: <Pine.NEB.3.96L.1040124220715.62871T-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: Max Laier <max@love2party.net>
cc: hackers@freebsd.org
Subject: Re: XL driver checksum producing corrupted but checksum-correct
	packets
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
X-List-Received-Date: Sun, 25 Jan 2004 03:12:25 -0000


On Sat, 24 Jan 2004, Matthew Dillon wrote:

>     Well, I tried to tcpdump a session.  I managed to hit the error three
>     times but in all three cases the tcpdump on the server dropped the
>     particular packet I was looking for.  I'm only able to get a 70%
>     retention rate in the tcpdump output on the server... it's just trying
>     to record too much for the machine to handle at the rate the NFS requests
>     are coming in.

To capture the corrupted packet on the machine where the corruption is
occurring, you might try hooking the UDP checksum drop case up to
BPF_MTAP() for a special BPF device or rule, or having it copy the dropped
packets to a raw socket (probably easier).
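To classify captured datagrams the way the checksum drop case does, a
capture tool can recompute the 16-bit one's-complement Internet checksum
(RFC 1071) that UDP uses. The sketch below is a userland illustration,
not kernel code; `inet_cksum` is a hypothetical helper name, and a real
UDP check must also fold in the IPv4 pseudo-header, which is omitted here.
It also shows why "corrupted but checksum-correct" packets are possible:
the sum is order-independent across 16-bit words, so corruption that
reorders whole words is not detected.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over a byte buffer (hypothetical helper).
 * Returns 0 when run over data that already includes a valid checksum.
 * NOTE: the UDP pseudo-header is not included in this sketch.
 */
static uint16_t inet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Sum the data as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);

    /* Pad an odd trailing byte with a zero low byte. */
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;

    /* Fold carries back into the low 16 bits (one's-complement add). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

With the RFC 1071 sample bytes `00 01 f2 03 f4 f5 f6 f7` the checksum is
0x220d, and moving the last two words to the front yields the same value,
which is exactly the kind of damage a checksum alone cannot catch.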

Problem is, the context-switching overhead kills BPF capture at this
rate, so if you can get another machine onto the segment without too much
switching in between (perhaps on a monitor port), using that third machine
to grab the on-the-wire packets might work best.  That way you can compare
each packet before and after the corruption.
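Once the same datagram has been captured both on the wire and on the
receiving host, a byte-by-byte comparison shows where the corruption lands
(for example, whether it is confined to one region or aligned to 16-bit
words). A minimal sketch, assuming both captures have already been
extracted as raw byte buffers starting at the same header; `diff_packets`
is a hypothetical helper, not part of tcpdump or libpcap.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Compare the wire copy (pre-corruption) with the host copy
 * (post-corruption) of one packet. Prints each differing offset to
 * `out` (pass NULL to suppress output) and returns the number of
 * differing bytes; bytes present in only one capture also count.
 * Hypothetical helper for illustration only.
 */
static size_t diff_packets(const uint8_t *wire, size_t wire_len,
                           const uint8_t *host, size_t host_len, FILE *out)
{
    size_t common = wire_len < host_len ? wire_len : host_len;
    size_t ndiff = 0;
    size_t i;

    for (i = 0; i < common; i++) {
        if (wire[i] != host[i]) {
            if (out != NULL)
                fprintf(out, "offset %zu: wire 0x%02x host 0x%02x\n",
                        i, (unsigned)wire[i], (unsigned)host[i]);
            ndiff++;
        }
    }

    /* A length mismatch means the tail bytes differ too. */
    ndiff += wire_len > host_len ? wire_len - host_len
                                 : host_len - wire_len;
    return ndiff;
}
```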

>     I'm going to give up trying to characterize the corruption for now.
>     It could very well be the PCI latency timer as previously discussed
>     but I can't test that right now.

If that is the problem, it may be easier to adjust the latency timer and
see whether the corruption goes away than to track down the packet :-).

good luck...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research