Date:      Thu, 15 Jun 2017 08:22:04 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 220004] nfsv4: vflush failure/nfsv4 client/server error (IFLIB)
Message-ID:  <bug-220004-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220004

            Bug ID: 220004
           Summary: nfsv4: vflush failure/nfsv4 client/server error
                    (IFLIB)
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: ohartmann@walstatt.org

Background:

Since IFLIB was introduced a while ago, our Intel-NIC-based server/client
systems have suffered from corrupted network connectivity. The problem
appeared first on em() and then on igb() devices; it seems to be most
prominent on em() devices, but also affects igb().

Both server and client are FreeBSD CURRENT (server: FreeBSD 12.0-CURRENT #15
r319965: Thu Jun 15 05:56:12 CEST 2017 amd64, client is FreeBSD 12.0-CURRENT
#20 r319934: Wed Jun 14 06:18:46 CEST 2017 amd64 at this moment).

Symptom: While syncing large amounts of data via rsync in the context of an
NFSv4 client/server infrastructure (both sides, server and client, are
12-CURRENT of the most recent revision), the connection from the client to
the server gets corrupted and dies.

A while ago, shortly after the introduction of IFLIB and the first occurrence
of this bug, it was possible to revive the connection on the client side (in
the most prominent case an Intel i217-LM NIC (class=0x020000 card=0x11ed1734
chip=0x153a8086 rev=0x05 hdr=0x0) in a Fujitsu Celsius M740 workstation) by
bringing the device down and up again via "ifconfig em0 down/up". As
development progressed, we reached a state where this procedure ends in a
total loss of the NIC, and it is not possible to revive the NIC in any way
other than rebooting the system. On a couple of servers equipped with Intel
i350 dual-port NICs I was also able to reproduce this failure by dd'ing a
large amount of data from an i350-equipped client to an i350-equipped
counterpart server (NFSv4 on both sides, FreeBSD 12-CURRENT on both sides).
That was a couple of weeks ago.

Now the situation has become even nastier.

While syncing large amounts of data between a FreeBSD 12-CURRENT client and a
server (most recent OS versions as described above; the client mounts the
NFSv4 share via autofs, which is built into the kernel), the connection gets
terminated on the client side. The data is a large poudriere repository that
is built on the creator machine and then synced via rsync to the
repository-delivering host.

This time, the NIC on the client seems to be alive, because I can ping other
hosts. Hitting "Ctrl-T" on the terminal from which I initiated the syncing
process, I get either

load: 0.34  cmd: rsync 13468 [nfsaio] 101.43r 0.00u 29.36s 0% 2089k

or, after I tried to restart automount on the client side as well as mountd
on the server side

load: 0.22  cmd: rsync 23467 [nfsreq] 2364.43r 0.00u 39.11s 0% 3076k.

The console on the client shows

[...]
WARNING: autofs_unmount: vflush failed with error 16
nfsv4 client/server protocol prob err=10020
nfsv4 client/server protocol prob err=10020
nfsv4 client/server protocol prob err=10020

So the most intuitive procedure to reset the connection fails and I'm stuck.
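
For reference, the two numeric codes in the console output above can be
translated to symbolic names. The following is only a minimal sketch: it
assumes that "error 16" is a plain errno value and that "err=10020" is an
NFSv4 status code as listed in RFC 7530. The table is a small hand-copied
subset and the helper names are made up for illustration; it makes no claim
about the root cause.

```python
import errno

# Subset of NFSv4 status codes, hand-copied from RFC 7530 (assumption: the
# "err=" value printed by the client is such a status code).
NFS4_STATUS = {
    10018: "NFS4ERR_RESOURCE",
    10019: "NFS4ERR_MOVED",
    10020: "NFS4ERR_NOFILEHANDLE",
    10021: "NFS4ERR_MINOR_VERS_MISMATCH",
    10022: "NFS4ERR_STALE_CLIENTID",
}


def decode_errno(code):
    """Return the symbolic errno name for a numeric error, e.g. from vflush."""
    return errno.errorcode.get(code, "unknown")


def decode_nfs4(code):
    """Return the NFSv4 status name, or 'unknown' if not in the subset above."""
    return NFS4_STATUS.get(code, "unknown")


if __name__ == "__main__":
    print("vflush error 16 :", decode_errno(16))
    print("nfsv4 err=10020 :", decode_nfs4(10020))
```

Under these assumptions, error 16 maps to EBUSY (the autofs mount point is
still in use) and 10020 maps to NFS4ERR_NOFILEHANDLE.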

Please see also Bug 219428.

-- 
You are receiving this mail because:
You are the assignee for the bug.


