From owner-freebsd-net@freebsd.org  Sun Jan 20 05:28:14 2019
Return-Path: <owner-freebsd-net@freebsd.org>
Delivered-To: freebsd-net@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1EAB214A5B52
 for <freebsd-net@mailman.ysv.freebsd.org>;
 Sun, 20 Jan 2019 05:28:14 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 9749F86F57
 for <freebsd-net@freebsd.org>; Sun, 20 Jan 2019 05:28:13 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 576BA14A5B50; Sun, 20 Jan 2019 05:28:13 +0000 (UTC)
Delivered-To: net@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 34EE614A5B4F
 for <net@mailman.ysv.freebsd.org>; Sun, 20 Jan 2019 05:28:13 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 8077E86F55
 for <net@freebsd.org>; Sun, 20 Jan 2019 05:28:12 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 0BDB73CBD4F;
 Sun, 20 Jan 2019 16:28:09 +1100 (AEDT)
Date: Sun, 20 Jan 2019 16:28:07 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Martin Birgmeier <d8zNeCFG@aon.at>
cc: Eugene Grosbein <eugen@grosbein.net>, net@freebsd.org
Subject: Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior
In-Reply-To: <16ce1832-13da-d7bb-cce2-6682e058b5a6@aon.at>
Message-ID: <20190120145627.X1077@besplex.bde.org>
References: <bug-235031-7501@https.bugs.freebsd.org/bugzilla/>
 <bug-235031-7501-goXNmp3zVl@https.bugs.freebsd.org/bugzilla/>
 <20190119204156.D929@besplex.bde.org>
 <3e407ee7-54e3-a6ac-5535-d11aceca9558@grosbein.net>
 <20190120061258.X3312@besplex.bde.org>
 <16ce1832-13da-d7bb-cce2-6682e058b5a6@aon.at>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=DZtnkrlW c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=8mpgGmpqUNuNhbMj2SgA:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: 8077E86F55
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-6.96 / 15.00];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-0.96)[-0.956,0]; REPLY(-4.00)[];
 NEURAL_HAM_LONG(-1.00)[-1.000,0]
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 20 Jan 2019 05:28:14 -0000

On Sat, 19 Jan 2019, Martin Birgmeier wrote:

> I just tried the patch by Bruce (from the mail sent 10 hours ago), but
> it makes no difference.
>
> Also, it does not seem like bad frames or too high an interrupt rate are
> the problem (the machine should easily handle what is coming from its
> NFS client which only has a 100 Mbps interface).
>
> I believe that the simplifications introduced to sys/dev/e1000 between
> 11.2 and 12.0 have broken something.

They aren't exactly simplifications :-).

Did you check for the common problex of a duplex mismatch?  ISR that some
versions if iflib'ed em didn't negotiate right for your speed of 100 Mbps.

Here I can break nfs using "ifconfig em0 media 100baseTX mediaopt
full-duplex" and forgetting the mediaopt part.  This gives half-duplex.
ipv4 ping still works, but its latency increases from ~125 usec to ~76
msec.  The latter latency destroys nfs performance.  After the media
change, there are a lot of DUP packets with an initial latency of ~43
second and the latency decreasing by the ping interval of 1 second for
the next 42 or 43 DUPs until the backlog is cleared; the latency is
then between 71 and 80 msec.  Changing the media and mediaopt back
to 1000baseT[X] full-duplex restores low latency but causes 1 DUP with
delay ~19 seconds

Suspend/resume used to give much the same misbehaviour, by not stopping
the NIC when reinitializing it in resume.  This was fixed in r342855.
This might be the bug!  iflib_media_change() calls iflib_init_locked()
liked resume used to, so seems to be missing stopping.  Changing this
should fix at least the DUPs.

The function names or layering are confusing.  iflib_init_locked()
doesn't initialize the if.  iflib_if_init_locked() does that.  All
iflib_init_locked() does is call iflib_stop(), then iflib_init_locked().
and iflib.  Grep shows the following related iflib*init*() calls:
- iflib_netmap_register manually inlines iflib_if_init_locked().  This
   is a style bug
- iflib_media_change() only calls iflib_init_locked().  This seems to be
   a bug
- _task_fn_admin() calls iflib_if_init_locked() for resetting.  This seems
   to be correctly obfuscated
- iflib_if_init_locked() calls iflib_init_locked().  This is part of
   implementing the obfuscation 
- iflib_if_init() calls iflib_if_init_locked().  This is correct
- iflib_if_ioctl(): SIOCSIFMTU calls iflib_stop(), then does some locking,
   then sets the mtu in software, then calls iflib_init_locked().  This
   seems to be correct, and shows that the iflib_if_init_locked() is not
   even generally useful.  This gives down/up for non-null changes.  This
   works correctly (some ping packets are lost, but there are no DUPs.
- iflib_if_ioctl(): SIOCSIFCAP is like SIOCSIFMTU, except I didn't test
   it and its splitting of stopping and init'ing is a bit messier because
   both operations are under a more complicated conditional.
- iflib_if_ioctl():<reinit case at the end> calls iflib_if_init().  This
   is correct.
- iflib_vlan_[un]register() call iflib_if_init_locked().  This seems to be
   correctly obfuscated
- iflib_device_resume() calls iflib_if_init_locked().  This is correctly
   obfuscated
- if_setinitfn() is called to set iflib_if_init as the init function.  This
   is correct.

Summary: only media change seems to be broken, but there are some style bugs.

The bug apparently btoke resume by reinitializing an active state
(even locking doesn't help much, but I now remember than resume
succeeded every 10-100 tries in the buggy versions -- there were always
a lot of DUPs, but sometimes to low latency came back).  My tests
usually used zzz and my zzz and other utilities are on nfs, so nfs was
fairly active just before suspend.

I don't know if iflib_media_change() is called at boot time, especially
if the media is autoselect.  At boot time, the state might be less
active or closer to the reset state, so that even a manual media change
that surely calls iflib_media_change() has more chance of working than
at resume time with zzz and other utilities on nfs.

I don't know what the media was after the broken resume.  Its reported
result can't be trusted anyway.  To recover from the broken resume, it
usually worked to repeat down/up a few times.  This is consistent with
bug -- eventually, previous down/up's change the state to close enough
to stopped.  But using the interface in any way (including pinging it
to see if it is still broken) makes it not so close to being stopped.

Bruce