Date: Thu, 20 Mar 2014 15:32:02 +0200
From: Daniel Braniss <danny@cs.huji.ac.il>
To: Garrett Wollman <wollman@bimajority.org>
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, jackv@freebsd.org
Subject: Re: Network stack returning EFBIG?
Message-ID: <868FFD0A-106E-4C5E-A61C-10C3895C3281@cs.huji.ac.il>
In-Reply-To: <21290.60558.750106.630804@hergotha.csail.mit.edu>
References: <21290.60558.750106.630804@hergotha.csail.mit.edu>
Turn off TSO. The problem sounds similar to the one I reported a while back;
turning off TSO fixed it.

danny

On Mar 20, 2014, at 3:26 PM, Garrett Wollman <wollman@bimajority.org> wrote:

> I recently put a new server running 9.2 (with a local patches for NFS)
> into production, and it's immediately started to fail in an odd way.
> Since I pounded this server pretty heavily and never saw the error in
> testing, I'm more than a little bit taken aback. We have identical
> hardware in production with 9.1, and I have the same kernel running
> just peachy on a machine with Chelsio T4 NICs. The problem machine has
> ixgbe(4):
>
> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 0x9c00-0x9c1f mem 0xdef80000-0xdeffffff,0xdef7c000-0xdef7ffff irq 24 at device 0.0 on pci2
> ix0: Using MSIX interrupts with 7 vectors
> ix0: Ethernet address: 04:7d:7b:a5:87:32
> ix0: PCI Express Bus: Speed 5.0GT/s Width x4
> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 0x9880-0x989f mem 0xdee80000-0xdeefffff,0xdee7c000-0xdee7ffff irq 34 at device 0.1 on pci2
> ix1: Using MSIX interrupts with 7 vectors
> ix1: Ethernet address: 04:7d:7b:a5:87:33
> ix1: PCI Express Bus: Speed 5.0GT/s Width x4
>
> (pciconf tells me these are "82599EB 10-Gigabit SFI/SFP+ Network
> Connection". It's a bug that the driver doesn't tell me that.)
>
> These are glued together in a lagg(4) using LACP.
>
> Since we put this server into production, random network system calls
> have started failing with [EFBIG] or maybe sometimes [EIO]. I've
> observed this with a simple ping, but various daemons also log the
> errors:
> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5
>
> The machine eventually becomes unreachable and has to be rebooted from
> the console.
>
> So, can anyone tell me how this is possible, and what changed between
> 9.1 and 9.2 to cause it?
>
> -GAWollman
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
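
For anyone wanting to try the "turn off TSO" suggestion at the top of this reply, here is a minimal sketch, not something taken from the thread itself. It assumes the two lagg(4) member ports are ix0 and ix1 (as in the quoted dmesg output) and that TSO is cleared per interface with the ifconfig(8) "-tso" flag. The helper names tso_off_commands and disable_tso are made up for illustration. By default it only prints the commands; actually running them needs root.

```python
import subprocess


def tso_off_commands(interfaces):
    """Build one 'ifconfig <if> -tso' invocation per interface (hypothetical helper)."""
    # "-tso" is the ifconfig(8) flag that disables TSO on an interface.
    return [["ifconfig", ifname, "-tso"] for ifname in interfaces]


def disable_tso(interfaces, dry_run=True):
    """Print each command; execute it only when dry_run is False (requires root)."""
    commands = tso_off_commands(interfaces)
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if ifconfig fails.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Assumption: ix0 and ix1 are the LACP member ports from the quoted report.
    disable_tso(["ix0", "ix1"])
```

A change made this way should not survive a reboot; if it helps, the same "-tso" flag can be added to the ifconfig_ix0 and ifconfig_ix1 lines in /etc/rc.conf.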