From owner-freebsd-net@freebsd.org Sun Jan 20 05:28:14 2019 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1EAB214A5B52 for ; Sun, 20 Jan 2019 05:28:14 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 9749F86F57 for ; Sun, 20 Jan 2019 05:28:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 576BA14A5B50; Sun, 20 Jan 2019 05:28:13 +0000 (UTC) Delivered-To: net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 34EE614A5B4F for ; Sun, 20 Jan 2019 05:28:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 8077E86F55 for ; Sun, 20 Jan 2019 05:28:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 0BDB73CBD4F; Sun, 20 Jan 2019 16:28:09 +1100 (AEDT) Date: Sun, 20 Jan 2019 16:28:07 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Martin Birgmeier cc: Eugene Grosbein , net@freebsd.org Subject: Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior In-Reply-To: <16ce1832-13da-d7bb-cce2-6682e058b5a6@aon.at> Message-ID: <20190120145627.X1077@besplex.bde.org> References: <20190119204156.D929@besplex.bde.org> <3e407ee7-54e3-a6ac-5535-d11aceca9558@grosbein.net> <20190120061258.X3312@besplex.bde.org> <16ce1832-13da-d7bb-cce2-6682e058b5a6@aon.at> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=DZtnkrlW c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=8mpgGmpqUNuNhbMj2SgA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 8077E86F55 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-6.96 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.96)[-0.956,0]; REPLY(-4.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Jan 2019 05:28:14 -0000 On Sat, 19 Jan 2019, Martin Birgmeier wrote: > I just tried the patch by Bruce (from the mail sent 10 hours ago), but > it makes no difference. > > Also, it does not seem like bad frames or too high an interrupt rate are > the problem (the machine should easily handle what is coming from its > NFS client which only has a 100 Mbps interface). > > I believe that the simplifications introduced to sys/dev/e1000 between > 11.2 and 12.0 have broken something. They aren't exactly simplifications :-). Did you check for the common problex of a duplex mismatch? ISR that some versions if iflib'ed em didn't negotiate right for your speed of 100 Mbps. Here I can break nfs using "ifconfig em0 media 100baseTX mediaopt full-duplex" and forgetting the mediaopt part. This gives half-duplex. ipv4 ping still works, but its latency increases from ~125 usec to ~76 msec. The latter latency destroys nfs performance. After the media change, there are a lot of DUP packets with an initial latency of ~43 second and the latency decreasing by the ping interval of 1 second for the next 42 or 43 DUPs until the backlog is cleared; the latency is then between 71 and 80 msec. Changing the media and mediaopt back to 1000baseT[X] full-duplex restores low latency but causes 1 DUP with delay ~19 seconds Suspend/resume used to give much the same misbehaviour, by not stopping the NIC when reinitializing it in resume. This was fixed in r342855. This might be the bug! iflib_media_change() calls iflib_init_locked() liked resume used to, so seems to be missing stopping. Changing this should fix at least the DUPs. The function names or layering are confusing. iflib_init_locked() doesn't initialize the if. iflib_if_init_locked() does that. All iflib_init_locked() does is call iflib_stop(), then iflib_init_locked(). and iflib. Grep shows the following related iflib*init*() calls: - iflib_netmap_register manually inlines iflib_if_init_locked(). This is a style bug - iflib_media_change() only calls iflib_init_locked(). This seems to be a bug - _task_fn_admin() calls iflib_if_init_locked() for resetting. This seems to be correctly obfuscated - iflib_if_init_locked() calls iflib_init_locked(). This is part of implementing the obfuscation - iflib_if_init() calls iflib_if_init_locked(). This is correct - iflib_if_ioctl(): SIOCSIFMTU calls iflib_stop(), then does some locking, then sets the mtu in software, then calls iflib_init_locked(). This seems to be correct, and shows that the iflib_if_init_locked() is not even generally useful. This gives down/up for non-null changes. This works correctly (some ping packets are lost, but there are no DUPs. - iflib_if_ioctl(): SIOCSIFCAP is like SIOCSIFMTU, except I didn't test it and its splitting of stopping and init'ing is a bit messier because both operations are under a more complicated conditional. - iflib_if_ioctl(): calls iflib_if_init(). This is correct. - iflib_vlan_[un]register() call iflib_if_init_locked(). This seems to be correctly obfuscated - iflib_device_resume() calls iflib_if_init_locked(). This is correctly obfuscated - if_setinitfn() is called to set iflib_if_init as the init function. This is correct. Summary: only media change seems to be broken, but there are some style bugs. The bug apparently btoke resume by reinitializing an active state (even locking doesn't help much, but I now remember than resume succeeded every 10-100 tries in the buggy versions -- there were always a lot of DUPs, but sometimes to low latency came back). My tests usually used zzz and my zzz and other utilities are on nfs, so nfs was fairly active just before suspend. I don't know if iflib_media_change() is called at boot time, especially if the media is autoselect. At boot time, the state might be less active or closer to the reset state, so that even a manual media change that surely calls iflib_media_change() has more chance of working than at resume time with zzz and other utilities on nfs. I don't know what the media was after the broken resume. Its reported result can't be trusted anyway. To recover from the broken resume, it usually worked to repeat down/up a few times. This is consistent with bug -- eventually, previous down/up's change the state to close enough to stopped. But using the interface in any way (including pinging it to see if it is still broken) makes it not so close to being stopped. Bruce