From owner-freebsd-stable@FreeBSD.ORG Thu Jan 8 10:05:22 2015
Message-ID: <54AE565D.50208@omnilan.de>
Date: Thu, 08 Jan 2015 11:05:17 +0100
From: Harald Schmalzbauer
Organization: OmniLAN
To: FreeBSD Stable
Subject: Re: igb(4) watchdog timeout, lagg(4) fails
References: <54ACC6A2.1050400@omnilan.de>
In-Reply-To: <54ACC6A2.1050400@omnilan.de>

Regarding Harry Schmalzbauer's message of 07.01.2015 06:39 (localtime):
> Hello,
>
> recently I upgraded one server from 9.1 to 10.1. There are two 82576
> (one port of two Intel ET Dual-Port GbE [kawela]), driven by igb(4).
> I've never seen any watchdog timeout with FreeBSD-9.1 but suddenly (with
> 10-stable) I see:
> igb0: Watchdog timeout -- resetting
> igb0: Queue(0) tdh = 2974, hw tdt = 2973
> igb0: TX(0) desc avail = 0,Next TX to Clean = 0
>
> My biggest problem is that lagg(4) doesn't detect the problem with
> igb0. It's configured with 'lagghash l2' and most connections were
> interrupted until I manually did 'ifconfig igb0 down'. Then lagg did its
> job and connectivity was restored via the remaining igb1.
>
> Is there a way to auto-if-down an interface which suffers from watchdog
> timeouts? And any way to really reset it without rebooting the machine?

The igb watchdog timeout happened again :-( about 48 hours after the last
one, with moderate-to-low average traffic.
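One possible workaround for the auto-if-down question quoted above, offered only as a sketch under assumptions: a small userland monitor polls the dev.igb.0.watchdog_timeouts counter (it appears in the sysctl dump below) with sysctl(8) and, when the counter increases, runs 'ifconfig igb0 down', which is the manual step that let lagg(4) fail over to igb1. The script, its polling interval and the names IFACE, watchdog_count() and timeout_occurred() are hypothetical; it does not reset or repair the NIC and is not part of igb(4) or lagg(4).

```python
#!/usr/bin/env python3
# Hypothetical monitor: mark an igb(4) port down when its watchdog fires,
# mirroring the manual 'ifconfig igb0 down' that made lagg(4) fail over.
# Assumes FreeBSD's sysctl(8) and ifconfig(8); names and interval are examples.
import subprocess
import sys
import time

IFACE = "igb0"       # hypothetical: the lagg(4) member to watch
POLL_SECONDS = 5     # hypothetical polling interval


def watchdog_count(iface: str) -> int:
    """Read dev.igb.<unit>.watchdog_timeouts with `sysctl -n`."""
    unit = iface[len("igb"):]
    out = subprocess.run(
        ["sysctl", "-n", f"dev.igb.{unit}.watchdog_timeouts"],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())


def timeout_occurred(previous: int, current: int) -> bool:
    """True if the watchdog counter grew since the previous poll."""
    return current > previous


def main() -> None:
    last = watchdog_count(IFACE)
    while True:
        time.sleep(POLL_SECONDS)
        now = watchdog_count(IFACE)
        if timeout_occurred(last, now):
            # Only repeats the manual workaround from the report; the port
            # stays down until an administrator brings it back up.
            subprocess.run(["ifconfig", IFACE, "down"], check=True)
        last = now


# dev.igb.* only exists on FreeBSD, so only start the loop there.
if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    main()
```

Polling a counter keeps the sketch independent of the driver; the trade-off is that lagg(4) keeps hashing onto the hung port for up to POLL_SECONDS after the watchdog fires.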
This time I was able to fetch the dev.igb sysctls before igb0 was reset by
the watchdog. They show a strange IRQ load:

dev.igb.%parent:
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=0 function=0 handle=\_SB_.PCI0.PE60.S1F0
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10c9 subvendor=0x8086 subdevice=0xa03c class=0x020000
dev.igb.0.%parent: pci7
dev.igb.0.nvm: -1
dev.igb.0.enable_aim: 1
dev.igb.0.fc: 3
dev.igb.0.rx_processing_limit: 100
dev.igb.0.link_irq: 5
dev.igb.0.dropped: 0
dev.igb.0.tx_dma_fail: 0
dev.igb.0.rx_overruns: 0
dev.igb.0.watchdog_timeouts: 1
dev.igb.0.device_control: 1488978497
dev.igb.0.rx_control: 67272738
dev.igb.0.interrupt_mask: 4
dev.igb.0.extended_int_mask: 2147483679
dev.igb.0.tx_buf_alloc: 0
dev.igb.0.rx_buf_alloc: 0
dev.igb.0.fc_high_water: 47488
dev.igb.0.fc_low_water: 47472
dev.igb.0.queue0.interrupt_rate: 8000
dev.igb.0.queue0.txd_head: 0
dev.igb.0.queue0.txd_tail: 468
dev.igb.0.queue0.no_desc_avail: 41
dev.igb.0.queue0.tx_packets: 90807
dev.igb.0.queue0.rxd_head: 0
dev.igb.0.queue0.rxd_tail: 4095
dev.igb.0.queue0.rx_packets: 443307
dev.igb.0.queue0.rx_bytes: 0
dev.igb.0.queue0.lro_queued: 0
dev.igb.0.queue0.lro_flushed: 0
dev.igb.0.queue1.interrupt_rate: 8000
dev.igb.0.queue1.txd_head: 0
dev.igb.0.queue1.txd_tail: 221
dev.igb.0.queue1.no_desc_avail: 0
dev.igb.0.queue1.tx_packets: 300702
dev.igb.0.queue1.rxd_head: 0
dev.igb.0.queue1.rxd_tail: 4095
dev.igb.0.queue1.rx_packets: 734853
dev.igb.0.queue1.rx_bytes: 0
dev.igb.0.queue1.lro_queued: 0
dev.igb.0.queue1.lro_flushed: 0
dev.igb.0.queue2.interrupt_rate: 8000
dev.igb.0.queue2.txd_head: 0
dev.igb.0.queue2.txd_tail: 116
dev.igb.0.queue2.no_desc_avail: 0
dev.igb.0.queue2.tx_packets: 635285
dev.igb.0.queue2.rxd_head: 0
dev.igb.0.queue2.rxd_tail: 4095
dev.igb.0.queue2.rx_packets: 163156
dev.igb.0.queue2.rx_bytes: 0
dev.igb.0.queue2.lro_queued: 0
dev.igb.0.queue2.lro_flushed: 0
dev.igb.0.queue3.interrupt_rate: 8000
dev.igb.0.queue3.txd_head: 0
dev.igb.0.queue3.txd_tail: 199
dev.igb.0.queue3.no_desc_avail: 0
dev.igb.0.queue3.tx_packets: 177701
dev.igb.0.queue3.rxd_head: 0
dev.igb.0.queue3.rxd_tail: 4095
dev.igb.0.queue3.rx_packets: 209749
dev.igb.0.queue3.rx_bytes: 0
dev.igb.0.queue3.lro_queued: 0
dev.igb.0.queue3.lro_flushed: 0
dev.igb.0.mac_stats.excess_coll: 0
dev.igb.0.mac_stats.single_coll: 0
dev.igb.0.mac_stats.multiple_coll: 0
dev.igb.0.mac_stats.late_coll: 0
dev.igb.0.mac_stats.collision_count: 0
dev.igb.0.mac_stats.symbol_errors: 0
dev.igb.0.mac_stats.sequence_errors: 0
dev.igb.0.mac_stats.defer_count: 0
dev.igb.0.mac_stats.missed_packets: 0
dev.igb.0.mac_stats.recv_length_errors: 0
dev.igb.0.mac_stats.recv_no_buff: 0
dev.igb.0.mac_stats.recv_undersize: 0
dev.igb.0.mac_stats.recv_fragmented: 0
dev.igb.0.mac_stats.recv_oversize: 0
dev.igb.0.mac_stats.recv_jabber: 0
dev.igb.0.mac_stats.recv_errs: 0
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.mac_stats.alignment_errs: 0
dev.igb.0.mac_stats.tx_no_crs: 0
dev.igb.0.mac_stats.coll_ext_errs: 0
dev.igb.0.mac_stats.xon_recvd: 0
dev.igb.0.mac_stats.xon_txd: 0
dev.igb.0.mac_stats.xoff_recvd: 0
dev.igb.0.mac_stats.xoff_txd: 0
dev.igb.0.mac_stats.unsupported_fc_recvd: 0
dev.igb.0.mac_stats.mgmt_pkts_recvd: 0
dev.igb.0.mac_stats.mgmt_pkts_drop: 0
dev.igb.0.mac_stats.mgmt_pkts_txd: 0
dev.igb.0.mac_stats.total_pkts_recvd: 1707305
dev.igb.0.mac_stats.good_pkts_recvd: 1551183
dev.igb.0.mac_stats.bcast_pkts_recvd: 179491
dev.igb.0.mac_stats.mcast_pkts_recvd: 1868
dev.igb.0.mac_stats.rx_frames_64: 212
dev.igb.0.mac_stats.rx_frames_65_127: 843418
dev.igb.0.mac_stats.rx_frames_128_255: 116516
dev.igb.0.mac_stats.rx_frames_256_511: 81391
dev.igb.0.mac_stats.rx_frames_512_1023: 14010
dev.igb.0.mac_stats.rx_frames_1024_1522: 495636
dev.igb.0.mac_stats.good_octets_recvd: 4228681579
dev.igb.0.mac_stats.total_octets_recvd: 4239899893
dev.igb.0.mac_stats.good_octets_txd: 3039302164
dev.igb.0.mac_stats.total_octets_txd: 3039302164
dev.igb.0.mac_stats.total_pkts_txd: 1424648
dev.igb.0.mac_stats.good_pkts_txd: 1424648
dev.igb.0.mac_stats.bcast_pkts_txd: 412
dev.igb.0.mac_stats.mcast_pkts_txd: 6
dev.igb.0.mac_stats.tx_frames_64: 639519
dev.igb.0.mac_stats.tx_frames_65_127: 253844
dev.igb.0.mac_stats.tx_frames_128_255: 180022
dev.igb.0.mac_stats.tx_frames_256_511: 873
dev.igb.0.mac_stats.tx_frames_512_1023: 292
dev.igb.0.mac_stats.tx_frames_1024_1522: 350098
dev.igb.0.mac_stats.tso_txd: 95280
dev.igb.0.mac_stats.tso_ctx_fail: 0
dev.igb.0.interrupts.asserts: 3323144
dev.igb.0.interrupts.rx_pkt_timer: 1551160
dev.igb.0.interrupts.rx_abs_timer: 0
dev.igb.0.interrupts.tx_pkt_timer: 0
dev.igb.0.interrupts.tx_abs_timer: 1551069
dev.igb.0.interrupts.tx_queue_empty: 1424637
dev.igb.0.interrupts.tx_queue_min_thresh: 0
dev.igb.0.interrupts.rx_desc_min_thresh: 0
dev.igb.0.interrupts.rx_overrun: 0
dev.igb.0.host.breaker_tx_pkt: 0
dev.igb.0.host.host_tx_pkt_discard: 0
dev.igb.0.host.rx_pkt: 23
dev.igb.0.host.breaker_rx_pkts: 0
dev.igb.0.host.breaker_rx_pkt_drop: 0
dev.igb.0.host.tx_good_pkt: 11
dev.igb.0.host.breaker_tx_pkt_drop: 0
dev.igb.0.host.rx_good_bytes: 4228681579
dev.igb.0.host.tx_good_bytes: 3039302164
dev.igb.0.host.length_errors: 0
dev.igb.0.host.serdes_violation_pkt: 0
dev.igb.0.host.header_redir_missed: 0

igb1 was also quite busy at that time, but igb1 never hung:

dev.igb.1.queue0.interrupt_rate: 10526
dev.igb.1.queue0.txd_head: 1879
dev.igb.1.queue0.txd_tail: 1879
dev.igb.1.queue0.no_desc_avail: 0
dev.igb.1.queue0.tx_packets: 8694
dev.igb.1.queue0.rxd_head: 1116
dev.igb.1.queue0.rxd_tail: 1115
dev.igb.1.queue0.rx_packets: 181340
dev.igb.1.queue0.rx_bytes: 11819287
dev.igb.1.queue0.lro_queued: 0
dev.igb.1.queue0.lro_flushed: 0
dev.igb.1.queue1.interrupt_rate: 76923
dev.igb.1.queue1.txd_head: 945
dev.igb.1.queue1.txd_tail: 945
dev.igb.1.queue1.no_desc_avail: 0
dev.igb.1.queue1.tx_packets: 9295572
dev.igb.1.queue1.rxd_head: 203
dev.igb.1.queue1.rxd_tail: 202
dev.igb.1.queue1.rx_packets: 18239691
dev.igb.1.queue1.rx_bytes: 23591559819
dev.igb.1.queue1.lro_queued: 0
dev.igb.1.queue1.lro_flushed: 0
dev.igb.1.queue2.interrupt_rate: 43478
dev.igb.1.queue2.txd_head: 4027
dev.igb.1.queue2.txd_tail: 4027
dev.igb.1.queue2.no_desc_avail: 0
dev.igb.1.queue2.tx_packets: 7335
dev.igb.1.queue2.rxd_head: 2158
dev.igb.1.queue2.rxd_tail: 2157
dev.igb.1.queue2.rx_packets: 2153
dev.igb.1.queue2.rx_bytes: 413198
dev.igb.1.queue2.lro_queued: 0
dev.igb.1.queue2.lro_flushed: 0
dev.igb.1.queue3.interrupt_rate: 43478

Should I consider tuning "hw.igb.max_interrupt_rate"?

Any help highly appreciated! As mentioned initially, I've never had this
issue with FreeBSD 9.1 with exactly the same environment/workload.

Thanks,

-Harry
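To make the head/tail and interrupt_rate values in the dump above easier to compare, here is a hedged Python sketch that parses dev.igb.N.queueM.* lines and computes, per queue, (a) how many TX descriptors lie between the hardware head and the driver's tail, modulo the ring size, and (b) the interrupt interval implied by interrupt_rate. Two assumptions, neither stated in the report: the TX ring holds 4096 descriptors (igb1's txd_head: 4027 and the watchdog's tdh = 2974 rule out anything smaller than 4028, and rxd_tail: 4095 fits a 4096-entry ring), and interrupt_rate is reported as 1000000 divided by the interval in microseconds, which is how the igb(4) sysctl handler appears to derive it from the EITR register. All function names and the sample are illustrative only.

```python
# Hypothetical helper: summarise TX ring occupancy and interrupt interval
# per queue from `sysctl dev.igb` output.
import re
from collections import defaultdict

RING_SIZE = 4096  # assumed TX ring size (descriptor count)

# Matches lines such as "dev.igb.0.queue0.txd_tail: 468".
LINE_RE = re.compile(r"^dev\.igb\.(\d+)\.queue(\d+)\.(\w+):\s*(\d+)$")


def parse_queues(text: str) -> dict:
    """Map (unit, queue) -> {field: value} for dev.igb.N.queueM.* lines."""
    queues = defaultdict(dict)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            unit, queue, field, value = m.groups()
            queues[(int(unit), int(queue))][field] = int(value)
    return dict(queues)


def tx_pending(head: int, tail: int, ring_size: int = RING_SIZE) -> int:
    """Descriptors queued up to tail that the NIC (head) has not processed."""
    return (tail - head) % ring_size


def interval_usec(rate: int) -> int:
    """Approximate interval, assuming rate = 1000000 // interval_usec."""
    return 1000000 // rate if rate > 0 else 0


def report(text: str) -> list:
    """Return sorted (unit, queue, pending_tx, interval_usec) tuples."""
    rows = []
    for (unit, queue), f in sorted(parse_queues(text).items()):
        pending = tx_pending(f.get("txd_head", 0), f.get("txd_tail", 0))
        rows.append((unit, queue, pending,
                     interval_usec(f.get("interrupt_rate", 0))))
    return rows


# A few lines copied from the dump above.
SAMPLE = """\
dev.igb.0.queue0.interrupt_rate: 8000
dev.igb.0.queue0.txd_head: 0
dev.igb.0.queue0.txd_tail: 468
dev.igb.1.queue1.interrupt_rate: 76923
dev.igb.1.queue1.txd_head: 945
dev.igb.1.queue1.txd_tail: 945
"""

if __name__ == "__main__":
    for unit, queue, pending, usec in report(SAMPLE):
        print(f"igb{unit} queue{queue}: {pending} TX descriptors pending, "
              f"~{usec} us between interrupts")
```

On the sample, igb0 queue0 comes out at 468 pending descriptors and about 125 us between interrupts, while igb1 queue1 has 0 pending and about 13 us. Under the same assumptions, the watchdog message's tdh = 2974, hw tdt = 2973 gives 4095 pending, i.e. a full ring, which would be consistent with "desc avail = 0".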