Date:      Fri, 17 Mar 2017 14:15:01 +0100
From:      Alexander Leidinger <Alexander@leidinger.net>
To:        freebsd-current@freebsd.org, sbruno@freebsd.org, mmacy@nextbsd.org
Subject:   Re: CURRENT: massive em0 NIC problems since IFLIB changes/introduction
Message-ID:  <20170317141501.Horde.YTCr8GuMV2yI1YaUkdRTLlu@webmail.leidinger.net>
In-Reply-To: <20170317122018.21384497@freyja.zeit4.iv.bundesimmobilien.de>

Quoting "O. Hartmann" <ohartmann@walstatt.org> (from Fri, 17 Mar 2017
12:20:18 +0100):

> Since the introduction of the IFLIB changes, I have been seeing severe
> problems on CURRENT.

I already reported something like this to sbruno@ and M. Macy (in copy).

> Running the most recent CURRENT (FreeBSD 12.0-CURRENT #27 r315442: Fri Mar 17
> 10:46:04 CET 2017  amd64), the problems on a workstation got severe within the
> past two days:
>
> For a couple of weeks now, the em0 NIC (Intel i217-LM, see below) has been
> dying under heavy I/O. I first noticed this when "rsync"ing poudriere
> repositories to a remote NFSv4 (automounted) folder. The em0 device could be
> revived by an ifconfig down/up procedure.
> But not only the i217-LM chip is affected. On another box equipped with an
> i350 dual-port GBit NIC I observed similar behaviour under (artificially)
> high I/O load (but I didn't investigate that further since it occurred very
> seldom).

It's not only those chipsets.

It may be beneficial if you could provide the pciconf output for those
devices. Mine is:
---snip---
em0@pci0:2:6:0: class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
     vendor     =3D 'Intel Corporation'
     device     =3D '82541PI Gigabit Ethernet Controller'
---snip---
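
For comparing chips across reports: the chip= value in the pciconf line above
packs the PCI device ID into the upper 16 bits and the vendor ID into the
lower 16 bits. A minimal sh sketch of splitting it (the function name
split_chip is made up for illustration, it is not part of pciconf):

```shell
#!/bin/sh
# Split a pciconf "chip=0xDDDDVVVV" value into vendor and device IDs.
# split_chip is a hypothetical helper name used only in this sketch.
split_chip() {
    # Lower 16 bits: vendor ID; upper 16 bits: device ID.
    printf 'vendor=0x%04x device=0x%04x\n' \
        $(( $1 & 0xffff )) $(( ($1 >> 16) & 0xffff ))
}
```

Calling "split_chip 0x107c8086" yields vendor 0x8086 (Intel) and device
0x107c, which matches the 82541PI shown above.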

> Now, since around yesterday, the i217-LM dies without being revivable with
> ifconfig down/up: Doing so, my FreeBSD CURRENT machine (Fujitsu Celsius M740)

I don't know whether a simple down/up would help for the chip I see this
issue with (it's a headless server in a remote datacenter). For the moment
I'm using a workaround in crontab, something like
"ping -c 1 <gateway> || shutdown -r now".

The system in question is at r314137.
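
A slightly more careful variant of that crontab workaround could look like the
sketch below. It is only an illustration under assumptions: the function name
watchdog, the retry count, and the PING_CMD / REBOOT_CMD variables are made up
here; only "ping -c 1" and "shutdown -r now" come from the one-liner above.
Retrying a few times avoids rebooting because of a single lost packet.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; names and defaults are assumptions.
# PING_CMD and REBOOT_CMD can be overridden, e.g. for a dry run.
: "${PING_CMD:=ping -c 1}"
: "${REBOOT_CMD:=shutdown -r now}"

# watchdog GATEWAY [TRIES]
# Probe GATEWAY up to TRIES times (default 3). Return 0 as soon as one
# probe succeeds; run REBOOT_CMD only if every probe fails.
watchdog() {
    gateway=$1
    tries=${2:-3}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if $PING_CMD "$gateway" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
    done
    $REBOOT_CMD
}
```

To use it, one would append a call such as "watchdog <gateway>" to the script
and run the script from root's crontab. Setting REBOOT_CMD="echo reboot" first
makes it possible to check the logic without actually rebooting.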

> remains with a dead em0 device, reporting "no route" on some occasions but
> stuck in the dead state. Every attempt to manually re-establish the route
> fails; only rebooting the box gives some relief.
>
> On the console, I have some very strange reports:
>
> - ping suddenly reports no buffer space
> - or I sometimes see massive occurrences of
>   "em0: TX(0) desc avail = 1024, pidx = 0" on the console

I don't see this in messages or the console log, but I do see in the logs
that ntpd can't resolve hostnames.

> Either way, sending/receiving large files over an established GBit network
> line, which can be saturated at approx. 100 MBytes/s, tends to make the NIC
> fail.

I can report that an "svnlite update" of the FreeBSD src tree on the box is
able to trigger the issue in my case.

I have to add that before the iflib changes I've seen frequent em-watchdog
timeouts in the logs / dmesg. So for me we have two issues here:
  - the hardware wasn't 100% supported before the iflib changes (it seems)
  - the iflib changes have lost some watchdog functionality /
    auto-failure-recovery feature

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
