From: Alexander Leidinger
Date: Fri, 17 Mar 2017 14:15:01 +0100
To: freebsd-current@freebsd.org, sbruno@freebsd.org, mmacy@nextbsd.org
Subject: Re: CURRENT: massive em0 NIC problems since IFLIB changes/introduction
Message-ID: <20170317141501.Horde.YTCr8GuMV2yI1YaUkdRTLlu@webmail.leidinger.net>
In-Reply-To: <20170317122018.21384497@freyja.zeit4.iv.bundesimmobilien.de>
List-Id: Discussions about the use of FreeBSD-current

Quoting "O. Hartmann" (from Fri, 17 Mar 2017 12:20:18 +0100):

> Since the introduction of the IFLIB changes, I realise severe problems on
> CURRENT. I already reported something like this to sbruno@ and M. Macy (in copy).
> Running the most recent CURRENT (FreeBSD 12.0-CURRENT #27 r315442: Fri Mar 17
> 10:46:04 CET 2017 amd64), the problems on a workstation got severe within the
> past two days:
>
> since a couple of weeks the em0 NIC (Intel i217-LM, see below) dies on heavy
> I/O. I realised this first when "rsync"ing poudriere repositories to a remote
> NFSv4 (automounted) folder. The em0 device could be revived by ifconfig down/up
> procedure.
> But not only the i217-LM chip is affected. On another box equipped with a i350 dual
> port GBit NIC I observed a similar behaviour under (artificially) high I/O load
> (but I didn't investigate that further since it occurred very seldom).

It's not only those chipsets.
It may be beneficial if you could provide the pciconf output for those
devices. Mine is:
---snip---
em0@pci0:2:6:0: class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541PI Gigabit Ethernet Controller'
---snip---

> Now, since around yesterday, the i217-LM dies without being revivable with
> ifconfig down/up: Doing so, my FreeBSD CURRENT machine (Fujitsu Celsius M740)

I don't know whether a simple down/up would help for the chip I see this
issue with (it's a headless server in a remote datacenter). For the
moment I'm using the workaround of something like "ping -C 1 || shutdown -r now"
in crontab. The system in question is at r314137.

> remains with a dead em0 device, reporting "no route" in some occasions but
> stuck in the dead state. Every attempt to establish manually the route again
> fails, only rebooting the box gives some relief.
>
> On the console, I have some very strange reports:
>
> - ping reports suddenly about no buffer space
> - or I see sometimes massive occurrences of "em0: TX(0) desc avail = 1024, pidx
> = 0" on the console

I don't see this in messages or the console log, but I do see in the
logs that ntpd can't resolve hostnames.

> Either way, sending/receiving large files on an established network GBit line
> which could be saturated by approx 100 MBytes/s tend to make the NIC fail.

I can report that an "svnlite update" of the FreeBSD src tree on the
box is able to trigger the issue in my case.

I have to add that before the iflib changes I saw frequent
em-watchdog timeouts in the logs / dmesg. So for me we have two issues
here:
 - the hardware wasn't 100% supported before the iflib changes (it seems)
 - the iflib changes have lost some watchdog functionality /
   auto-failure-recovery feature

Bye,
Alexander.
-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
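
For readers who want to try the cron workaround mentioned in the message, here is a minimal sketch of how it could look as a small script. It is not the script actually used on that server. The names TARGET, TRIES, probe, recover and check_link are assumptions made up for illustration, the address is a documentation placeholder, and retrying before rebooting is an addition meant to avoid a reboot on a single lost packet.

```shell
#!/bin/sh
# Sketch only: TARGET, TRIES and all function names are illustrative
# assumptions, not taken from the original mail.
TARGET="${TARGET:-192.0.2.1}"   # placeholder (TEST-NET-1); set to a host that should always answer
TRIES="${TRIES:-3}"             # number of probes before giving up

# Succeeds if TARGET answers a single echo request.
probe() {
    ping -c 1 "$TARGET" >/dev/null 2>&1
}

# Action taken when every probe fails; the mail uses a reboot.
recover() {
    shutdown -r now
}

# Probe up to TRIES times; call recover only if all attempts fail.
check_link() {
    i=0
    while [ "$i" -lt "$TRIES" ]; do
        if probe; then
            return 0
        fi
        i=$((i + 1))
    done
    recover
    return 1
}

# A cron-driven script would end with a call to: check_link
```

Keeping probe and recover as separate functions makes it easy to swap the reboot for something milder, for example an interface down/up, once it is known whether that is enough to revive the NIC.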