From: Harald Schmalzbauer
Organization: OmniLAN
Date: Sat, 10 Jan 2015 11:51:30 +0100
To: Jack Vogel
Cc: FreeBSD Stable
Subject: Re: igb(4) watchdog timeout, lagg(4) fails

Regarding Jack Vogel's message of 09.01.2015 18:46 (localtime):
> The tuneable interrupt rate code is not mine, and looking at it I'm not
> entirely sure it works. Why are you focused on the interrupt rate anyway,
> do you have some reason to tie it to the watchdog?
>
> You could turn AIM off (enable_aim) and see if that changed anything?
>
> It seems most the time problems show up they involve the use of lagg, if
> you take it out of the mix does the problem go away?

Thanks for your attention!
Unfortunately I can't test anything without lagg(4); this machine is in
production (with lagg(4) being the parent of lots of vlan interfaces).
I guess the watchdog timeout is reported more often by people using
lagg(4) because that's where igb(4) really gets some (peak) load ;-)

Seriously, I can't see how lagg(4) could be the culprit for watchdog
timeouts, but stuck interrupts were my first guess. Especially since I'm
doing the kld-reload trick to get MSI-X working inside ESXi. (I reported
2 years ago that booting FreeBSD initializes the passthrough device with
some kind of wrong device-type identifier; warm-booting the guest or
simply kld-reloading solves this problem, and the hypervisor then gets
the correct device-type identifier for using MSI-X.)
As mentioned, this has been working without any issue for more than one
year with FreeBSD 9.1.

I have another machine with Kawela cards and a similar setup, but without
any load at all.
I'll see if I can reproduce the problem there and narrow it down by
removing lagg(4).

Is there a way to reset the interface without rebooting the machine?
The watchdog doesn't really reset the device; it is left in a
non-operating state afterwards. I need to 'ifconfig down' it to bring
lagg(4) back into an operational state.
Some kind of D3/D0 state switch for a single address? kldunloading would
destroy the remaining interface too...

Thanks,

-Harry