Date: Sat, 24 Jan 2004 22:10:11 -0500 (EST)
From: Robert Watson
To: Matthew Dillon
cc: Max Laier
cc: hackers@freebsd.org
In-Reply-To: <200401250302.i0P32BON039881@apollo.backplane.com>
Subject: Re: XL driver checksum producing corrupted but checksum-correct packets

On Sat, 24 Jan 2004, Matthew Dillon wrote:

> Well, I tried to tcpdump a session. I managed to hit the error three
> times but in all three cases the tcpdump on the server dropped the
> particular packet I was looking for. I'm only able to get a 70%
> retention rate in the tcpdump output on the server... it's just trying
> to record too much for the machine to handle at the rate the NFS requests
> are coming in.
To pick up the corrupted packet on the machine where the corruption is
occurring, you might want to try hooking the UDP checksum drop case up to
BPF_MTAP() for a special BPF device or rule, or have it send the dropped
packets to a raw socket (probably easier). The problem is that the
context-switching overhead is what does BPF in, so if you can get another
machine onto the segment where it sees the traffic without the switch
filtering it out (perhaps on a monitor port), using that third machine to
grab the on-the-wire packets might work best. That way you can compare the
pre-corruption and post-corruption copies.

> I'm going to give up trying to characterize the corruption for now.
> It could very well be the PCI latency timer as previously discussed
> but I can't test that right now.

If it is the problem, it may be easier to try adjusting it and see whether
that fixes things than to track down the packet :-).

Good luck...

Robert N M Watson            FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research
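If both copies of a packet can be captured (the one a third machine sees on the wire and the one that reaches the host where the corruption happens), a small userland helper along these lines could be used to line them up. This is only a sketch and not part of FreeBSD: `inet_cksum()` and `diff_packets()` are hypothetical names, and the sample bytes are made up. It computes the RFC 1071 Internet checksum (one's-complement sum of 16-bit words) over each copy and prints the offsets where they differ. Because that sum does not depend on word order, swapping two aligned 16-bit words changes four bytes but leaves the checksum unchanged, which illustrates one kind of corruption the checksum cannot catch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * RFC 1071 Internet checksum: one's-complement of the one's-complement
 * sum of big-endian 16-bit words. A trailing odd byte is padded with zero.
 */
static uint16_t inet_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)(buf[len - 1] << 8);
    /* Fold carries out of the high half back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: print every byte offset at which the on-the-wire
 * copy and the received copy differ, and return how many bytes differ.
 */
static size_t diff_packets(const uint8_t *wire, const uint8_t *rcvd, size_t len)
{
    size_t i, ndiff = 0;

    for (i = 0; i < len; i++) {
        if (wire[i] != rcvd[i]) {
            printf("offset %zu: wire 0x%02x, received 0x%02x\n",
                   i, (unsigned)wire[i], (unsigned)rcvd[i]);
            ndiff++;
        }
    }
    return ndiff;
}
```

For example, with a made-up payload `12 34 56 78 9a bc de f0` and a "received" copy in which the first two words are swapped (`56 78 12 34 9a bc de f0`), `diff_packets()` reports four differing offsets while `inet_cksum()` returns 0x1da6 for both copies.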