From owner-freebsd-current@FreeBSD.ORG Wed Jun 10 02:47:44 2009
From: Pyun YongHyeon
Date: Wed, 10 Jun 2009 11:49:59 +0900
To: Thomas Lotterer
Message-ID: 
<20090610024959.GD63941@michelle.cdnetworks.co.kr>
References: <4A2DA8D9.2030300@lotterer.net>
In-Reply-To: <4A2DA8D9.2030300@lotterer.net>
Cc: freebsd-current@freebsd.org
Subject: Re: suspect bug in vge(4)
Reply-To: pyunyh@gmail.com
List-Id: Discussions about the use of FreeBSD-current

On Tue, Jun 09, 2009 at 02:12:09AM +0200, Thomas Lotterer wrote:
> I need advice hunting down a network problem which I suspect to be
> a bug in the vge(4) driver. After spending a lot of time on
> investigation, I'm out of ideas.
>
> My recently built new home server running FreeBSD 8.0-CURRENT as of
> 2009-06-07 on a VIA ARTiGO A2000 [1] exhibits network problems when
> sending more than a couple of dozen kilobytes of TCP traffic.
>
> The server application is "Dovecot" [2] Secure IMAP server.
> The client application is "Thunderbird" [3] running on WindowsXP.
>
> The high-level view of the problem is that the client seems to stall
> while downloading messages or even a complex structure of IMAP folder
> names. When using STARTTLS the client often prints the infamous generic
> and misleading error "Thunderbird received a message with incorrect
> Message Authentication Code. If the error occurs frequently, contact the
> website administrator". The origin of this message is the SSL library
> that ships with Thunderbird. The same library is used for Firefox, where
> the hint might actually make sense when the user is attempting to access
> a broken HTTPS server. After lots of debugging I found out that the same
> error is printed not only for TLS/SSL issues but also for broken TCP
> streams, be it wrong TCP checksums or a server process dumping core.
> So I tried IMAP without TLS, only to see the same issue with the
> misleading SSL error replaced by an application hang. I ran truss(1)
> against Dovecot, placed Thunderbird in debug mode [4] and found out that
> during a stall condition the server did write(2) all the data to the TCP
> socket but some data did not arrive at the client.
>
> The low-level view of the problem is that Wireshark on the client side
> sooner or later - not for the first few dozen packets - sees a packet
> with an incorrect TCP checksum. Usually the next packet is from the
> server again, continuing the stream. What follows is an expected but
> fruitless attempt by the client to recover by sending duplicate ACKs for
> the last good packet, while the server incorrectly retransmits more TCP
> packets with bad checksums.
>
> To me it sounds like a broken implementation of hardware generated
> checksums. Trying to disable all the "-tso" "-lro" "-txcsum" "-rxcsum"
> options and using the "polling" option on the server side network
> interface did not help. So either something deeper is broken or maybe
> just the ability to disable these features needs fixing. Btw, the client
> is using the "VMware Accelerated AMD PCNet Adapter" driver with
> "TCP/IP Offload=off" and "TsoEnable=0".
>
> Sorry to bother you with more details but here's why I believe it's a
> hardware/driver issue. Before I purchased the hardware I tried a dry
> run. Installed FreeBSD 7.1-RELEASE as VM guest, then upgraded to FreeBSD
> 8.0-CURRENT using FreeBSD Administration Toolkit [5]. Built OS and apps
> from source, loaded my data - worked! Used the same client that has
> problems with the real hardware today. Then used that VM as build host
> to create the NanoBSD [6] Flash image for the ARTiGO. Both use exactly
> the same sources. The VM works, the metal is broken. One of the few
> differences is the NIC and its driver. As a workaround I copied the VM
> to a usual PC equipped with a fxp(4) NIC - worked!
> So it really looks like an OS/HW compatibility issue on the ARTiGO.
>
> In case you are considering a hardware defect please note that before I
> loaded the OS, apps and my data to this new hardware I thoroughly tested
> what I could. One week filling the disks to the max using repetitive
> copies of a file created from /dev/random and, after manually breaking
> and rebuilding the ZFS mirror, checking data integrity using message
> digests. No problems with disks, albeit poor SATA performance, but
> that's another story. One day running memtest86 [7]. No problems with
> memory. One hour NIC test copying /dev/zero to /dev/null over the wire
> using "scp -o compression=no". No hangs or hiccups here.
>
> Hope you can help me.
>

I already know there are possible edge cases in vge(4), but your issue
looks quite different from anything reported so far. Unfortunately the
vge(4) hardware I had was broken, so I couldn't complete the overhaul of
vge(4). The code at the following URLs is the latest WIP version, but I
don't know whether it fixes the issue, as it hasn't been tested at all on
real hardware.

http://people.freebsd.org/~yongari/vge/if_vge.c
http://people.freebsd.org/~yongari/vge/if_vgereg.h
http://people.freebsd.org/~yongari/vge/if_vgevar.h
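
To double-check a packet that Wireshark flags with an incorrect TCP
checksum, the checksum can be recomputed by hand from the captured bytes.
What follows is only a minimal sketch in Python, assuming IPv4 and
assuming the raw TCP segment (header plus payload) has already been
extracted from the capture. The function names internet_checksum and
tcp_checksum_ok are made up for illustration and are not part of vge(4),
Wireshark or any other tool mentioned in this thread.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Check a TCP segment (header + payload) against its IPv4 pseudo-header.

    Hypothetical helper: src_ip/dst_ip are dotted-quad strings taken from
    the IP header of the captured packet.
    """
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                    # reserved zero byte
        socket.IPPROTO_TCP,   # protocol number 6
        len(segment),         # TCP length: header plus payload
    )
    # With the transmitted checksum field left in place, a valid segment
    # sums to 0xFFFF, so the complemented result is 0.
    return internet_checksum(pseudo + segment) == 0
```

This kind of check is only meaningful for a capture taken on the receiving
side, as was done here. A capture taken on the sending host with transmit
checksum offload enabled typically shows wrong checksums, because the NIC
fills them in after the capture point.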