From: Harald Schmalzbauer
Organization: OmniLAN
Date: Thu, 23 Apr 2015 10:21:09 +0200
To: Jack Vogel
CC: FreeBSD Stable, "freebsd-net@freebsd.org"
Subject: Re: igb(4) watchdog timeout, lagg(4) fails
Message-ID: <5538AB75.4070401@omnilan.de>
In-Reply-To: <54E733FA.1020208@omnilan.de>

Regarding Harald Schmalzbauer's message of 20.02.2015 14:17 (local time):
(https://lists.freebsd.org/pipermail/freebsd-stable/2015-February/081810.html)
> Regarding Harald Schmalzbauer's message of 11.02.2015 20:48
> (local time):
>> Regarding Jack Vogel's message of 11.02.2015 18:31 (local time):
>>> tdh and tdt mean the head and tail indices of the ring, and these
>>> values are obviously severely borked :)

Hello Jack,

could you find some time to have a look at this problem?
The reported values don't bother me; what does is the watchdog timeout, which
happens on NICs that are PCIe-connected via the PCH.
Please see my previous findings. I think the most significant hint for my
problem is the link_irq, which becomes garbage at the first watchdog
timeout occurrence, as previously described:

>> …
>> For the record: Rebooting the machine (ESXi guest only!) brought the
>> stalled igb1 back into operation.
>> The guest has 2 igb (Kawela) ports, one from a NIC (Intel ET Dual Port
>> 82576) @ CPU-PCIe and the second port from an identical NIC, but connected
>> via PCH-PCIe.
>> The watchdog timeout problem only occurs with the port from the
>> PCH-PCIe-connected NIC (verified)!
>> After the reboot the suspicious "dev.igb.1.link_irq=848" turned into:
>> dev.igb.0.link_irq: 3
>> dev.igb.1.link_irq: 4
> Jack,
>
> I'd like to let you know that "dev.igb.1.link_irq" again shows garbage
> after the watchdog timeout problem occurred again:
> dev.igb.1.link_irq: 1458
>
> I can imagine that the reset goes wrong and ends in a loss of link_irq.
> I now have to reboot the guest to get igb1 back to a working state; then
> link_irq will show "4" again. But I can't tell you which came first,
> the timeout reset or the "link_irq" jam. I guess the latter can't be the
> case, but I have no idea about the code.

Thanks for any help. Currently my lagg setup is permanently degraded :-(
It would be nice to have it back in a working state :-)

-Harry
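
One way to find out which comes first, the timeout reset or the link_irq change, might be to sample the sysctl with timestamps and compare the result against the time of the watchdog timeout message in the system log. The following is only a minimal sketch, assuming Python 3 is available on the guest. The function names and the `max_step` threshold are hypothetical helpers made up for illustration and are not part of igb(4); only the `sysctl -n` call and the `dev.igb.<unit>.link_irq` name come from the system itself.

```python
import subprocess
import time
from datetime import datetime, timezone


def parse_sysctl_line(line):
    """Split a 'name: value' line as printed by sysctl(8) into (name, int)."""
    name, _, value = line.partition(":")
    return name.strip(), int(value.strip())


def read_link_irq(unit=1):
    """Read dev.igb.<unit>.link_irq via sysctl(8). FreeBSD only."""
    out = subprocess.check_output(
        ["sysctl", "-n", f"dev.igb.{unit}.link_irq"], text=True
    )
    return int(out.strip())


def collect(read_value, count, interval=1.0):
    """Take `count` timestamped samples, `interval` seconds apart."""
    samples = []
    for i in range(count):
        stamp = datetime.now(timezone.utc).isoformat()
        samples.append((stamp, read_value()))
        if i + 1 < count:
            time.sleep(interval)
    return samples


def find_jumps(samples, max_step=1):
    """Return (timestamp, previous, current) for every sample whose value
    differs from the previous sample by more than `max_step` (an arbitrary
    threshold chosen for this sketch)."""
    jumps = []
    for (_, prev), (stamp, cur) in zip(samples, samples[1:]):
        if abs(cur - prev) > max_step:
            jumps.append((stamp, prev, cur))
    return jumps
```

On the affected guest, something like `find_jumps(collect(read_link_irq, 3600, 1.0))` would sample igb1 once per second for an hour. The timestamps of any reported jumps could then be lined up with the watchdog timeout entries in the log.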