From: John Polstra
To: freebsd-net@freebsd.org
Cc: Jack Vogel
Subject: Serious em problems under -current on two different platforms
Date: Fri, 17 Nov 2006 12:41:58 -0800 (PST)

Folks,

I'm using -current from 2006-11-16 05:00 UTC and find that my em
interfaces are unusable on two quite different platforms. I've tried
a lot of things to make sure it's not a local fubar here, including
doing a "make release" using a virgin source tree and installing
fresh from the resulting CD (with GENERIC kernel). I also have a
netbootable CD image that is part of the project I'm working on, and
it admittedly has some minor mods to the kernel. I booted that exact
same image on two different platforms with em devices in them, and
got the same results as when I used the virgin FreeBSD CD.

I don't think this is caused by the recent MSI support. I get the
same results when I disable it by adding "hw.pci.enable_msi=0" and
"hw.pci.enable_msix=0" to my /boot/loader.conf file. (And I confirmed
that MSI wasn't being used when I did that.)

The symptoms are complicated, so let's focus on one of the machines.
It's a Dell 1950 with two dual-core 3.0 GHz Xeons in it. The em
devices look like this (it's a dual-port PCI-Express card):

em0@pci11:0:0: class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'PRO/1000 PT'
    class    = network
    subclass = ethernet
em1@pci11:0:1: class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'PRO/1000 PT'
    class    = network
    subclass = ethernet

Starting with a freshly-booted system, we see this ifconfig output,
as expected:

em0: flags=8802 mtu 1500
        options=18b
        ether 00:0e:0c:6f:0e:18
        media: Ethernet autoselect (1000baseTX )
        status: active
em1: flags=8802 mtu 1500
        options=18b
        ether 00:0e:0c:6f:0e:19
        media: Ethernet autoselect (1000baseTX )
        status: active

Now I do "ifconfig em0 10.5.1.1/24" and then ping that address from
another machine on the LAN:

thin# ping 10.5.1.1
PING 10.5.1.1 (10.5.1.1): 56 data bytes
64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.524 ms

Then nothing after the first reply.

Leaving the ping running on the other machine, I configure the
address a 2nd time on the Dell with "ifconfig em0 10.5.1.1/24".
Still no response.

Next, ifconfig em0 down and then up again. After a few seconds, the
ping responses start coming in and continue to work. Try a flood
ping from the other machine: it works fine.

I kill the flood ping and go have lunch for a half-hour, then start
up a normal 1-per-second ping from the other machine:

thin# ping 10.5.1.1
PING 10.5.1.1 (10.5.1.1): 56 data bytes
64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.612 ms
[then nothing]

This time, I check the vmstat -i output a few times, and see that
em0 isn't generating any interrupts. I ifconfig em0 down and then
up, and the pings start working again.

Now, leaving that 1-per-second ping running, I start messing with
em1. I do "ifconfig em1 10.6.1.1/24", and within a few seconds, the
pings on em0 stop responding. Again em0 isn't generating interrupts.
Pings to em1 aren't working, either.

I ifconfig em1 down and then up. The pings still aren't working. I
set em1's address again with "ifconfig em1 10.6.1.1/24", and the
pings start working. Now I ping em0 from the other machine and find
that it works, too. Hallelujah! Now both interfaces are working at
the same time. But what's the key to getting to this point?

I let the pings run for awhile. Pretty soon, both of them stop
working again.

The other machine is a Tyan 2721 with dual Xeons in it. Its
dual-port NIC is on the motherboard, and it looks like this:

em0@pci7:1:0: class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class    = network
    subclass = ethernet
em1@pci7:1:1: class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class    = network
    subclass = ethernet

I can't get either port to send any packets at all. When I try, the
driver reports transmit watchdog timeouts.

Is this stuff working for anybody at all?

John