Date: Thu, 15 Jun 2017 08:22:04 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 220004] nfsv4: vflush failure/nfsv4 client/server error (IFLIB)
Message-ID: <bug-220004-8@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220004

            Bug ID: 220004
           Summary: nfsv4: vflush failure/nfsv4 client/server error (IFLIB)
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: ohartmann@walstatt.org

Background:

Since IFLIB was introduced a while ago, our Intel-NIC based server/client
systems have suffered from corrupted network connectivity. The problem showed
up first on em() and then on igb() devices; it seems to be most prominent on
em() devices, but igb() is affected as well.

Both server and client run FreeBSD CURRENT (server: FreeBSD 12.0-CURRENT #15
r319965: Thu Jun 15 05:56:12 CEST 2017 amd64; client: FreeBSD 12.0-CURRENT #20
r319934: Wed Jun 14 06:18:46 CEST 2017 amd64 at this moment).

Symptom:

While syncing large amounts of data via rsync in an NFSv4 client/server
infrastructure (both sides, server and client, run 12-CURRENT of the most
recent revision), the connection from the client to the server gets corrupted
and dies.

A while ago, shortly after the introduction of IFLIB and the first occurrence
of this bug disaster, it was possible to revive the connection on the client
side (in the most prominent case an Intel i217-LM NIC (class=0x020000
card=0x11ed1734 chip=0x153a8086 rev=0x05 hdr=0x0) in a Fujitsu Celsius M740
workstation) by taking the device down and bringing it up again via
"ifconfig em0 down/up". As development progressed, we reached a state where
this procedure ends in a total loss of the NIC, and the only way to revive the
NIC is to reboot the system. On a couple of servers equipped with Intel i350
dual-port NICs I was also able to reproduce this failure by dd'ing a large
amount of data from an i350-equipped client to an i350-equipped counterpart
server (NFSv4 on both sides, FreeBSD 12-CURRENT on both sides).
That was a couple of weeks ago. Now the situation has become even nastier.

While syncing large amounts of data between a FreeBSD 12-CURRENT client and a
server (most recent OS versions as described above; the client mounts the NFSv4
share via autofs, which is built into the kernel), the connection gets
terminated on the client side. The data is a large poudriere repository that is
built on the creator machine and then synced via rsync to the
repository-delivering host.

This time, the NIC on the client seems to be alive, because I can ping other
hosts. Hitting "Ctrl-T" on the terminal from which I initiated the syncing
process, I get either

load: 0.34  cmd: rsync 13468 [nfsaio] 101.43r 0.00u 29.36s 0% 2089k

or, after I tried to restart automount on the client side as well as mountd on
the server side,

load: 0.22  cmd: rsync 23467 [nfsreq] 2364.43r 0.00u 39.11s 0% 3076k

The console on the client shows

[...]
WARNING: autofs_unmount: vflush failed with error 16
nfsv4 client/server protocol prob err=10020
nfsv4 client/server protocol prob err=10020
nfsv4 client/server protocol prob err=10020

So the most intuitive procedure to reset the connection fails and I'm stuck.

Please see also Bug 219428.

-- 
You are receiving this mail because:
You are the assignee for the bug.