From: Thomas Lotterer
Date: Tue, 09 Jun 2009 02:12:09 +0200
To: freebsd-current@freebsd.org
Subject: suspect bug in vge(4)
Message-ID: <4A2DA8D9.2030300@lotterer.net>

I need advice hunting down a network problem which I suspect to be a bug in the vge(4) driver.
After spending a lot of time on investigation, I'm out of ideas.

My recently built new home server running FreeBSD 8.0-CURRENT as of 2009-06-07 on a VIA ARTiGO A2000 [1] exhibits network problems when sending more than a few dozen kilobytes of TCP traffic. The server application is the "Dovecot" [2] Secure IMAP server. The client application is "Thunderbird" [3] running on Windows XP.

The high-level view of the problem is that the client seems to stall while downloading messages or even a complex structure of IMAP folder names. When using STARTTLS the client often prints the infamous generic and misleading error "Thunderbird received a message with incorrect Message Authentication Code. If the error occurs frequently, contact the website administrator". The origin of this message is the SSL library that ships with Thunderbird. The same library is used for Firefox, where the hint might actually make sense when the user is attempting to access a broken HTTPS server. After lots of debugging I found out that the same error is printed not only for TLS/SSL issues but also for broken TCP streams, be it wrong TCP checksums or a server process dumping core. So I tried IMAP without TLS, just to see the same issue with the misleading SSL error replaced by an application hang. I ran truss(1) against Dovecot, placed Thunderbird in debug mode [4] and found out that during a stall condition the server did write(2) all the data to the TCP socket but some data did not arrive at the client.

The low-level view of the problem is that Wireshark on the client side sooner or later - not within the first few dozen packets - sees a packet with an incorrect TCP checksum. Usually the next packet is from the server again, continuing the stream. What follows is the expected but fruitless exchange: the client sends duplicate ACKs for the last good packet, and the server retransmits more TCP packets that again carry bad checksums.
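For anyone who wants to double-check what Wireshark flags, here is a minimal sketch of the TCP checksum test a receiver applies to an IPv4 segment. The function names, the port and sequence numbers, the payload and the 192.0.2.x addresses are made up for illustration; none of them come from the actual capture.

```python
import socket
import struct

TCP_PROTO = 6       # IP protocol number for TCP
CSUM_OFFSET = 16    # byte offset of the checksum field in the TCP header


def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header that TCP includes in its checksum."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, TCP_PROTO, tcp_len))


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Value a sender should store; the checksum field is treated as zero."""
    zeroed = segment[:CSUM_OFFSET] + b"\x00\x00" + segment[CSUM_OFFSET + 2:]
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(zeroed)) + zeroed)
    return ~total & 0xFFFF


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Receiver-side check: summing everything, checksum included, gives 0xFFFF."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF


def make_segment(src_ip: str, dst_ip: str, payload: bytes) -> bytes:
    """Build a hypothetical 20-byte-header TCP segment with a valid checksum."""
    # Fields: src port, dst port, seq, ack, data offset, flags (PSH|ACK),
    # window, checksum placeholder, urgent pointer. All values are invented.
    header = struct.pack("!HHIIBBHHH", 143, 1025, 1, 1, 5 << 4, 0x18, 65535, 0, 0)
    segment = header + payload
    csum = tcp_checksum(src_ip, dst_ip, segment)
    return segment[:CSUM_OFFSET] + struct.pack("!H", csum) + segment[CSUM_OFFSET + 2:]


SRC, DST = "192.0.2.1", "192.0.2.2"              # documentation addresses
good = make_segment(SRC, DST, b"* OK Dovecot ready.\r\n")
bad = good[:-1] + bytes([good[-1] ^ 0x01])       # flip one bit in the payload

print(tcp_checksum_ok(SRC, DST, good))
print(tcp_checksum_ok(SRC, DST, bad))
```

A check like this only means something for frames captured at the receiver, as in the client-side Wireshark trace described above. In a capture taken on the sending host, checksums can look wrong simply because the NIC fills them in after the capture point.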
To me it sounds like a broken implementation of hardware-generated checksums. Disabling the offload features with the "-tso", "-lro", "-txcsum" and "-rxcsum" options and enabling the "polling" option on the server-side network interface did not help. So either something deeper is broken or maybe just the ability to disable these features needs fixing. Btw, the client uses the "VMware Accelerated AMD PCNet Adapter" driver with "TCP/IP Offload=off" and "TsoEnable=0".

Sorry to bother you with more details, but here's why I believe it's a hardware/driver issue. Before I purchased the hardware I tried a dry run: I installed FreeBSD 7.1-RELEASE as a VM guest, then upgraded to FreeBSD 8.0-CURRENT using the FreeBSD Administration Toolkit [5]. Built OS and apps from source, loaded my data - worked! I used the same client that has problems with the real hardware today. Then I used that VM as build host to create the NanoBSD [6] Flash image for the ARTiGO. Both use exactly the same sources. The VM works, the metal is broken. One of the few differences is the NIC and its driver. As a workaround I copied the VM to an ordinary PC equipped with an fxp(4) NIC - worked! So it really looks like an OS/HW compatibility issue on the ARTiGO.

In case you are considering a hardware defect, please note that before I loaded the OS, apps and my data onto this new hardware I thoroughly tested what I could. One week filling the disks to the max using repeated copies of a file created from /dev/random and, after manually breaking and rebuilding the ZFS mirror, checking data integrity using message digests. No problems with disks, albeit poor SATA performance, but that's another story. One day running memtest86 [7]. No problems with memory. One hour NIC test copying /dev/zero to /dev/null over the wire using "scp -o compression=no". No hangs or hiccups here.

Hope you can help me.

**** manually trimmed/shaped server details ****

# uname -a
FreeBSD [...]
8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sun Jun 7 13:09:44 CEST 2009 root@[...]:/usr/obj/nanobsd/usr/src/sys/VIAARTIGOA2000 i386

# dmesg
CPU: VIA C7-D Processor 1500MHz (1499.85-MHz 686-class CPU)
  Origin = "CentaurHauls"  Id = 0x6d0  Stepping = 0
  Features=0xa7c9bbff
  Features2=0x4001
  VIA Padlock Features=0xffcc
real memory  = 2147483648 (2048 MB)
avail memory = 2031333376 (1937 MB)
ACPI APIC Table:
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
acpi0: on motherboard
pci0: on pcib0
vgapci0: mem 0xd8000000-0xdbffffff,0xde000000-0xdeffffff,0xc0000000-0xcfffffff at device 1.0 on pci0
pcib1: irq 27 at device 2.0 on pci0
pci1: on pcib1
pcib2: irq 31 at device 3.0 on pci0
pci2: on pcib2
vge0: port 0xec00-0xecff mem 0xdf7ff000-0xdf7ff0ff irq 28 at device 0.0 on pci2
miibus0: on vge0
ip1000phy0: PHY 22 on miibus0
ip1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
vge0: WARNING: using obsoleted if_watchdog interface
vge0: Ethernet address: 00:40:63:xx:xx:xx

# after boot
# ifconfig vge0
vge0: flags=8843 metric 0 mtu 1500
	options=1b
	ether 00:40:63:xx:xx:xx
	inet [...]
	media: Ethernet autoselect (1000baseT )
	status: active

# after adding options "-tso" "-lro" "-txcsum" "-rxcsum" "polling" and trying after each one the final result is
# ifconfig vge0
vge0: flags=8843 metric 0 mtu 1500
	options=18
	ether 00:40:63:xx:xx:xx
	inet [...]
	media: Ethernet autoselect (1000baseT )
	status: active

# pciconf -lbv
vge0@pci0:2:0:0: class=0x020000 card=0x01101106 chip=0x31191106 rev=0x82 hdr=0x00
    vendor     = 'VIA Technologies Inc'
    device     = ''Velocity' Gigabit Ethernet Controllers (VT6120/VT6121/VT6122)'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base 0xec00, size 256, enabled
    bar   [14] = type Memory, range 64, base 0xdf7ff000, size 256, enabled

# vmstat -i
interrupt                          total       rate
irq28: vge0                       328436         23

**** references ****

[1] VIA ARTiGO A2000 is a storage-oriented compact barebone PC
    -> http://www.via.com.tw/en/products/embedded/artigo/a2000/

[2] Dovecot Secure IMAP server, version 1.1.15
    -> http://www.dovecot.org/

[3] Mozilla's Thunderbird email application, version 2.0.0.21 (20090302)
    -> http://www.mozillamessaging.com/en-US/thunderbird/

[4] run Thunderbird in debug mode
    set NSPR_LOG_MODULES=IMAP:5
    set NSPR_LOG_FILE=C:\thunderbird.txt
    start /d "C:\Program Files\Mozilla Thunderbird\" thunderbird.exe
    -> http://wiki.Dovecot.org/Debugging/Thunderbird

[5] Convenient FreeBSD Administration Toolkit
    -> http://people.freebsd.org/~rse/adm/

[6] NanoBSD Howto
    -> http://www.freebsd.org/doc/en_US.ISO8859-1/articles/nanobsd/

[7] Memory Diagnostic
    -> http://www.memtest86.com/memtest86-3.5.iso.zip

**** related ****

No 1000baseTX on VIA Artigo A2000
    -> http://apps.sourceforge.net/phpbb/freenas/viewtopic.php?f=9&t=851

kern/130846: [vge] vge0 not autonegotiating to 1000baseTX full duplex in 7.1
    -> http://www.freebsd.org/cgi/query-pr.cgi?pr=130846

FreeNAS on the ARTiGO A2000
    -> http://www.logicsupply.com/blog/2008/12/29/freenas-on-the-artigo-a2000/

--
http://thomas.lotterer.net