From: Bruce Evans
Date: Sat, 19 Jan 2019 21:21:04 +1100 (EST)
To: Martin Birgmeier
cc: net@freebsd.org
Subject: Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior
On Fri, 18 Jan 2019 a bug that doesn't want replies@freebsd.org wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235031
>
> Yes; I just thought it was going to help and wanted to make it permanent right
> away. Bad idea.
>
> In the meantime:
>
> [0]# cat /var/db/ntpd.drift
> -6.596
> [0]#
>
> What can you get from the ntp drift?

I doubt that anything useful can be learned from the ntp drift. Watching
it for several hours might show that it is wild, but even a wild drift
shouldn't affect nfs throughput much.

I use a couple of fixes for iflib and em, but only the following one is
related to nfs on PRO-1000:

XX Index: em_txrx.c
XX ===================================================================
XX --- em_txrx.c	(revision 343087)
XX +++ em_txrx.c	(working copy)
XX @@ -634,9 +634,20 @@
XX  
XX  		/* Make sure bad packets are discarded */
XX  		if (errors & E1000_RXD_ERR_FRAME_ERR_MASK) {
XX +#if 0
XX  			adapter->dropped_pkts++;
XX -			/* XXX fixup if common */
XX  			return (EBADMSG);
XX +#else
XX +			/*
XX +			 * XXX the above error handling is worse than none.
XX +			 * First it drops 'i' packets before the current
XX +			 * one and doesn't count them. Then it returns an
XX +			 * error. iflib can't really handle this error.
XX +			 * It just resets, and this usually drops many more
XX +			 * packets (without counting them) and much time.
XX +			 */
XX +			printf("lem: frame error: ignored\n");
XX +#endif
XX  		}
XX  
XX  		ri->iri_frags[i].irf_flid = 0;
XX @@ -697,8 +708,12 @@
XX  
XX  		/* Make sure bad packets are discarded */
XX  		if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
XX +#if 0
XX  			adapter->dropped_pkts++;
XX  			return EBADMSG;
XX +#else
XX +			printf("em: frame error: ignored\n");
XX +#endif
XX  		}
XX  
XX  		ri->iri_frags[i].irf_flid = 0;

On my system, the bug fixed by this patch occurs only rarely, and only on
the PRO-1000 (not on the I218-V going through the same low-end network
switch). It has only been observed under moderately heavy nfs use with
lots of small RPCs and not many i/o's. When it occurs, nfs with unpatched
em takes many seconds to recover, but with the patch nfs barely notices
the error.

I use nfs over UDP because TCP is significantly slower, due to its higher
latency, once the network latency is low enough (here it is 51 usec for
the old PRO-1000 and 80 usec for the I218-V, with about 20 usec in the
switch and a lower-latency old bge NIC on the other side). The cost is
that UDP gives worse error recovery.

Your problem looks more like lost interrupts. Under load, all em NICs
should interrupt at the default interrupt moderation rate of 8 kHz. Once
there are that many interrupts, there is not much else that can go wrong
(nfs would have to be working to generate that many interrupts).

Bugs in iflib are easy to avoid by running FreeBSD-11. The PRO-1000 is
supported by most versions of FreeBSD, and FreeBSD[7-11] don't have the
bug fixed by the above patch.

Bruce