Date: Thu, 10 Nov 2011 01:50:41 -0800
From: Jeremy Chadwick
To: Willem Jan Withagen
Cc: "stable@freebsd.org", "Vogel, Jack"
Subject: Re: em0 watchdog timeout
Message-ID: <20111110095041.GA73812@icarus.home.lan>
References: <4EBB97DF.3020803@digiware.nl>
In-Reply-To: <4EBB97DF.3020803@digiware.nl>
List-Id: Production branch of FreeBSD source code

On Thu, Nov 10, 2011 at 10:22:39AM +0100, Willem Jan Withagen wrote:
> Still running this file server on ZFS, and every now and then em0
> goes down, and is not revivable.... Nothing goes in or out the
> box...
>
> Any suggestions as how to (help) fix this?

CC'ing Jack Vogel of Intel.
We need "pciconf -lvbc" output (-lv by itself isn't sufficient in this
regard).  Also, please run "sysctl dev.em.0.debug=1".  The command itself
prints nothing useful, but "dmesg" shortly afterwards should contain a
bunch of driver-level debugging information that should help (the output
starts with "Interface is ...").  Please provide that too.

> Nov 10 09:07:41 zfs kernel: em0: Watchdog timeout -- resetting
> Nov 10 09:07:41 zfs kernel: em0: Queue(0) tdh = 187, hw tdt = 189
> Nov 10 09:07:41 zfs kernel: em0: TX(0) desc avail = 1022,Next TX to Clean = 187
> Nov 10 09:11:32 zfs kernel: em0: Watchdog timeout -- resetting
> Nov 10 09:11:32 zfs kernel: em0: Queue(0) tdh = 139, hw tdt = 151
> Nov 10 09:11:32 zfs kernel: em0: TX(0) desc avail = 1012,Next TX to Clean = 139
> Nov 10 09:16:05 zfs kernel: em0: Watchdog timeout -- resetting
> Nov 10 09:16:05 zfs kernel: em0: Queue(0) tdh = 152, hw tdt = 163
> Nov 10 09:16:05 zfs kernel: em0: TX(0) desc avail = 1013,Next TX to Clean = 152
> Nov 10 09:33:10 zfs kernel: em0: Watchdog timeout -- resetting
> Nov 10 09:33:10 zfs kernel: em0: Queue(0) tdh = 161, hw tdt = 176
> Nov 10 09:33:10 zfs kernel: em0: TX(0) desc avail = 1008,Next TX to Clean = 160
> Nov 10 09:53:18 zfs kernel: em0: Watchdog timeout -- resetting
> Nov 10 09:53:18 zfs kernel: em0: Queue(0) tdh = 157, hw tdt = 172
> Nov 10 09:53:18 zfs kernel: em0: TX(0) desc avail = 1009,Next TX to Clean = 157
>
> Device is:
> Nov 10 10:07:27 zfs kernel: em0: port 0x1820-0x183f mem 0xdf900000-0xdf91ffff,0xdf924000-0xdf924fff irq 16 at device 25.0 on pci0
> Nov 10 10:07:27 zfs kernel: em0: Using an MSI interrupt
> Nov 10 10:07:27 zfs kernel: em0: [FILTER]
>
> pciconf -lv:
> em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9
> chip=0x10bd8086 rev=0x02 hdr=0x00
> vendor = 'Intel Corporation'
> device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
> class = network
> subclass = ethernet
>
> uname:
> 8.2-STABLE FreeBSD 8.2-STABLE #12: Sun Oct 2 13:36:55 CEST 2011
> amd64
>
> sysctl -a | grep em.0:
> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
> dev.em.0.%driver: em
> dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.LAN_
> dev.em.0.%pnpinfo: vendor=0x8086 device=0x10bd subvendor=0x15d9
> subdevice=0x10bd class=0x020000
> dev.em.0.%parent: pci0
> dev.em.0.nvm: -1
> dev.em.0.debug: -1
> dev.em.0.rx_int_delay: 0
> dev.em.0.tx_int_delay: 66
> dev.em.0.rx_abs_int_delay: 66
> dev.em.0.tx_abs_int_delay: 66
> dev.em.0.rx_processing_limit: 100
> dev.em.0.flow_control: 3
> dev.em.0.eee_control: 0
> dev.em.0.link_irq: 0
> dev.em.0.mbuf_alloc_fail: 0
> dev.em.0.cluster_alloc_fail: 0
> dev.em.0.dropped: 0
> dev.em.0.tx_dma_fail: 0
> dev.em.0.rx_overruns: 6
> dev.em.0.watchdog_timeouts: 5
> dev.em.0.device_control: 1074790976
> dev.em.0.rx_control: 67141634
> dev.em.0.fc_high_water: 8192
> dev.em.0.fc_low_water: 6692
> dev.em.0.queue0.txd_head: 78
> dev.em.0.queue0.txd_tail: 78
> dev.em.0.queue0.tx_irq: 0
> dev.em.0.queue0.no_desc_avail: 0
> dev.em.0.queue0.rxd_head: 376
> dev.em.0.queue0.rxd_tail: 375
> dev.em.0.queue0.rx_irq: 0
> dev.em.0.mac_stats.excess_coll: 0
> dev.em.0.mac_stats.single_coll: 0
> dev.em.0.mac_stats.multiple_coll: 0
> dev.em.0.mac_stats.late_coll: 0
> dev.em.0.mac_stats.collision_count: 0
> dev.em.0.mac_stats.symbol_errors: 0
> dev.em.0.mac_stats.sequence_errors: 0
> dev.em.0.mac_stats.defer_count: 0
> dev.em.0.mac_stats.missed_packets: 9
> dev.em.0.mac_stats.recv_no_buff: 0
> dev.em.0.mac_stats.recv_undersize: 0
> dev.em.0.mac_stats.recv_fragmented: 0
> dev.em.0.mac_stats.recv_oversize: 0
> dev.em.0.mac_stats.recv_jabber: 0
> dev.em.0.mac_stats.recv_errs: 1
> dev.em.0.mac_stats.crc_errs: 1
> dev.em.0.mac_stats.alignment_errs: 0
> dev.em.0.mac_stats.coll_ext_errs: 0
> dev.em.0.mac_stats.xon_recvd: 0
> dev.em.0.mac_stats.xon_txd: 0
> dev.em.0.mac_stats.xoff_recvd: 0
> dev.em.0.mac_stats.xoff_txd: 0
> dev.em.0.mac_stats.total_pkts_recvd: 160062850
> dev.em.0.mac_stats.good_pkts_recvd: 160062840
> dev.em.0.mac_stats.bcast_pkts_recvd: 79648
> dev.em.0.mac_stats.mcast_pkts_recvd: 10220
> dev.em.0.mac_stats.rx_frames_64: 0
> dev.em.0.mac_stats.rx_frames_65_127: 0
> dev.em.0.mac_stats.rx_frames_128_255: 0
> dev.em.0.mac_stats.rx_frames_256_511: 0
> dev.em.0.mac_stats.rx_frames_512_1023: 0
> dev.em.0.mac_stats.rx_frames_1024_1522: 0
> dev.em.0.mac_stats.good_octets_recvd: 107143604749
> dev.em.0.mac_stats.good_octets_txd: 129876768158
> dev.em.0.mac_stats.total_pkts_txd: 179010567
> dev.em.0.mac_stats.good_pkts_txd: 179010567
> dev.em.0.mac_stats.bcast_pkts_txd: 14608
> dev.em.0.mac_stats.mcast_pkts_txd: 206
> dev.em.0.mac_stats.tx_frames_64: 0
> dev.em.0.mac_stats.tx_frames_65_127: 0
> dev.em.0.mac_stats.tx_frames_128_255: 0
> dev.em.0.mac_stats.tx_frames_256_511: 0
> dev.em.0.mac_stats.tx_frames_512_1023: 0
> dev.em.0.mac_stats.tx_frames_1024_1522: 0
> dev.em.0.mac_stats.tso_txd: 3691806
> dev.em.0.mac_stats.tso_ctx_fail: 0
> dev.em.0.interrupts.asserts: 130023913
> dev.em.0.interrupts.rx_pkt_timer: 0
> dev.em.0.interrupts.rx_abs_timer: 0
> dev.em.0.interrupts.tx_pkt_timer: 0
> dev.em.0.interrupts.tx_abs_timer: 0
> dev.em.0.interrupts.tx_queue_empty: 0
> dev.em.0.interrupts.tx_queue_min_thresh: 0
> dev.em.0.interrupts.rx_desc_min_thresh: 0
> dev.em.0.interrupts.rx_overrun: 0
> dev.em.0.wake: 0

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
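
A side note on reading the quoted watchdog lines: the arithmetic behind "tdh", "hw tdt" and "desc avail" can be checked with a few lines of Python. This is only a rough sketch, not something taken from the driver. It assumes that tdh/tdt are the transmit descriptor head and tail indices, and that the TX ring holds 1024 descriptors; the ring size is not stated anywhere in this thread and is inferred from the numbers themselves. The names RING_SIZE and parse_watchdog are made up for the illustration.

```python
import re

# Watchdog detail lines copied from the quoted syslog output
# (timestamps and the "zfs kernel:" prefix dropped for brevity).
LOG = """\
em0: Queue(0) tdh = 187, hw tdt = 189
em0: TX(0) desc avail = 1022,Next TX to Clean = 187
em0: Queue(0) tdh = 139, hw tdt = 151
em0: TX(0) desc avail = 1012,Next TX to Clean = 139
em0: Queue(0) tdh = 152, hw tdt = 163
em0: TX(0) desc avail = 1013,Next TX to Clean = 152
em0: Queue(0) tdh = 161, hw tdt = 176
em0: TX(0) desc avail = 1008,Next TX to Clean = 160
em0: Queue(0) tdh = 157, hw tdt = 172
em0: TX(0) desc avail = 1009,Next TX to Clean = 157
"""

# ASSUMPTION: ring size is not given in the thread; 1024 is a guess
# that happens to make the logged numbers add up.
RING_SIZE = 1024

QUEUE_RE = re.compile(r"tdh = (\d+), hw tdt = (\d+)")
AVAIL_RE = re.compile(r"desc avail = (\d+),\s*Next TX to Clean = (\d+)")


def parse_watchdog(log, ring_size=RING_SIZE):
    """Pair each Queue(0) line with the TX(0) line that follows it."""
    queues = QUEUE_RE.findall(log)
    avails = AVAIL_RE.findall(log)
    events = []
    for (tdh, tdt), (avail, clean) in zip(queues, avails):
        tdh, tdt, avail, clean = int(tdh), int(tdt), int(avail), int(clean)
        events.append({
            # descriptors handed to the NIC that it has not yet consumed
            "pending_hw": (tdt - tdh) % ring_size,
            # descriptors the driver has not yet cleaned up
            "pending_sw": (tdt - clean) % ring_size,
            "desc_avail": avail,
        })
    return events


events = parse_watchdog(LOG)
for e in events:
    print(e)
```

Under those assumptions, every one of the five timeouts shows a small number of descriptors (between 2 and 15) sitting between head and tail, and "desc avail" plus the not-yet-cleaned count adds up to the assumed ring size each time. That is consistent with a transmit path that stopped making progress rather than a ring that filled up, but it is only a reading of the numbers, not a diagnosis.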