From owner-freebsd-net@FreeBSD.ORG Mon Nov 7 17:54:33 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DF0F106564A for ; Mon, 7 Nov 2011 17:54:33 +0000 (UTC) (envelope-from nitroboost@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 946C38FC13 for ; Mon, 7 Nov 2011 17:54:32 +0000 (UTC) Received: by bkbzs8 with SMTP id zs8so1664207bkb.13 for ; Mon, 07 Nov 2011 09:54:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=6WCjyQyQrgBzTqbX4Ds3xZ5TOICPHPi7MmstU1G8UF8=; b=T36Gn+kG2K1R5kItfcZeDMRnSpKWYmHTbeoBSctfIm1wiok+lhmvsR+O3BPQ66r3sh XIkCOyoQcF/SG2wWzNtHKErKUIi8U7LbKI6nESchKaFOePPN6vT2kQKrwpYBq5daFILI Olc8DJOFUA+6f3KoOmfsrPUh5uwXgXvse5s60= MIME-Version: 1.0 Received: by 10.182.227.99 with SMTP id rz3mr9463068obc.4.1320688470480; Mon, 07 Nov 2011 09:54:30 -0800 (PST) Received: by 10.182.30.164 with HTTP; Mon, 7 Nov 2011 09:54:30 -0800 (PST) In-Reply-To: <4EAE58A2.9040803@gmail.com> References: <4E8F157A.40702@sentex.net> <4E8F51D4.1060509@sentex.net> <4EA7E203.3020306@sepehrs.com> <4EA80818.3030504@sentex.net> <4EA80F88.4000400@hotplug.ru> <4EA82715.2000404@gmail.com> <4EA8FA40.7010504@hotplug.ru> <4EA91836.2040508@gmail.com> <4EA959EE.2070806@hotplug.ru> <4EAD116A.8090006@gmail.com> <4EAE58A2.9040803@gmail.com> Date: Mon, 7 Nov 2011 10:54:30 -0700 Message-ID: From: Jason Wolfe To: Hooman Fazaeli Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: pyunyh@gmail.com, freebsd-net@freebsd.org, Jack Vogel , Emil Muratov Subject: Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Nov 2011 17:54:33 -0000 On Mon, Oct 31, 2011 at 1:13 AM, Hooman Fazaeli wrote: > > Attached is a patch for if_em.c. It flushes interface queue when it is > full > and link is not active. Please note that when this happens, drops are > increasing > on interface and this will trigger your scripts as before. You need to > change > a little the scripts as follows: > > check interface TX status > if (interface TX seems hung) { > sleep 5 > check interface TX status > if (interface TX seems hung) { > reset the interface. > } > } > > For MULTIQUEUE, it just disables the check for link status (which is not > good). > so pls. test in non-MULTIQUEUE mode. > > The patch also contains some minor fixups to compile on 7 plus > a fix from r1.69 which addressed RX hang problem (the fix was > later removed in r1.70). I included it for Emil to give it a > try. > > Pls. let me know if you have any problems with patch. > > Hooman, Unfortunately one of the server just had a wedge event a couple hours ago with this patch. To confirm your changes should cause a recovery within the time I'm allowing, here is the current format: check interface TX status if (interface TX seems hung) { sleep 3 check packets out sleep 2 check packets out if (packets not incrementing) { reset the interface } } I bounced em0 because dropped packets incremented 1749543 to 1749708 and the interface is not incrementing packets out. 4:10AM up 6 days, 15:23, 0 users, load averages: 0.02, 0.12, 0.14 em0: flags=8843 metric 0 mtu 1500 options=219b ether X inet6 X%em0 prefixlen 64 scopeid 0x1 nd6 options=1 media: Ethernet autoselect (1000baseT ) status: active em1: flags=8843 metric 0 mtu 1500 options=219b ether X inet6 X%em1 prefixlen 64 scopeid 0x2 nd6 options=3 media: Ethernet autoselect (1000baseT ) status: active ipfw0: flags=8801 metric 0 mtu 65536 lo0: flags=8049 metric 0 mtu 16384 options=3 inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 inet X.X.X.X netmask 0xffffffff inet X.X.X.X netmask 0xffffffff inet X.X.X.X netmask 0xffffffff nd6 options=3 lagg0: flags=8843 metric 0 mtu 1500 options=219b ether X inet X.X.X.X netmask 0xffffff00 broadcast X.X.X.X inet6 X%lagg0 prefixlen 64 scopeid 0x5 inet6 X prefixlen 64 autoconf nd6 options=3 media: Ethernet autoselect status: active laggproto loadbalance laggport: em0 flags=4 laggport: em1 flags=4 interrupt total rate irq3: uart1 3810 0 cpu0: timer 1147568087 2000 irq256: em0:rx 0 59779710 104 irq257: em0:tx 0 2771888652 4831 irq258: em0:link 1 0 irq259: em1:rx 0 3736828886 6512 irq260: em1:tx 0 2790566376 4863 irq261: em1:link 27286 0 irq262: mps0 395687386 689 cpu1: timer 1147559894 2000 cpu2: timer 1147559901 2000 cpu3: timer 1147559902 2000 Total 14345029891 25001 13466/4144/17610 mbufs in use (current/cache/total) 2567/2635/5202/5853720 mbuf clusters in use (current/cache/total/max) 2567/633 mbuf+clusters out of packet secondary zone in use (current/cache) 6798/554/7352/2926859 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max) 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) 35692K/8522K/44214K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/0/0 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 0 requests for I/O initiated by sendfile 0 calls to protocol drain routines Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop em0 1500 00:25:90:2b:e5:75 60747643 0 0 11246408092 0 0 1750763 em0 1500 fe80:1::225:9 fe80:1::225:90ff: 0 - - 4 - - - em1 1500 00:25:90:2b:e5:75 11237195776 123950 0 11344722383 0 0 545682 em1 1500 fe80:2::225:9 fe80:2::225:90ff: 0 - - 1 - - - lagg0 1500 00:25:90:2b:e5:75 11297850142 0 0 22588666102 2296445 0 0 lagg0 1500 69.164.38.0/2 69.164.38.83 10189108030 - - 22592881776 - - - lagg0 1500 fe80:5::225:9 fe80:5::225:90ff: 24 - - 28 - - - lagg0 1500 2607:f4e8:310 2607:f4e8:310:12: 19578 - - 19591 - - - kern.msgbuf: Nov 7 04:10:06 cds1033 kernel: Interface is RUNNING and INACTIVE Nov 7 04:10:06 cds1033 kernel: em0: hw tdh = 558, hw tdt = 558 Nov 7 04:10:06 cds1033 kernel: em0: hw rdh = 889, hw rdt = 888 Nov 7 04:10:06 cds1033 kernel: em0: Tx Queue Status = 0 Nov 7 04:10:06 cds1033 kernel: em0: TX descriptors avail = 1024 Nov 7 04:10:06 cds1033 kernel: em0: Tx Descriptors avail failure = 0 Nov 7 04:10:06 cds1033 kernel: em0: RX discarded packets = 0 Nov 7 04:10:06 cds1033 kernel: em0: RX Next to Check = 889 Nov 7 04:10:06 cds1033 kernel: em0: RX Next to Refresh = 888 Nov 7 04:10:06 cds1033 kernel: em0: Link state: active Nov 7 04:10:06 cds1033 kernel: Interface is RUNNING and INACTIVE Nov 7 04:10:06 cds1033 kernel: em1: hw tdh = 837, hw tdt = 837 Nov 7 04:10:06 cds1033 kernel: em1: hw rdh = 198, hw rdt = 189 Nov 7 04:10:06 cds1033 kernel: em1: Tx Queue Status = 0 Nov 7 04:10:06 cds1033 kernel: em1: TX descriptors avail = 1021 Nov 7 04:10:06 cds1033 kernel: em1: Tx Descriptors avail failure = 0 Nov 7 04:10:06 cds1033 kernel: em1: RX discarded packets = 0 Nov 7 04:10:06 cds1033 kernel: em1: RX Next to Check = 303 Nov 7 04:10:06 cds1033 kernel: em1: RX Next to Refresh = 302 Nov 7 04:10:06 cds1033 kernel: em1: Link state: active net.inet.ip.intr_queue_maxlen: 512 net.inet.ip.intr_queue_drops: 0 dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3 dev.em.0.%driver: em dev.em.0.%location: slot=0 function=0 dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x10d3 class=0x020000 dev.em.0.%parent: pci1 dev.em.0.nvm: -1 dev.em.0.debug: -1 dev.em.0.rx_int_delay: 0 dev.em.0.tx_int_delay: 66 dev.em.0.rx_abs_int_delay: 66 dev.em.0.tx_abs_int_delay: 66 dev.em.0.rx_processing_limit: 100 dev.em.0.flow_control: 0 dev.em.0.eee_control: 0 dev.em.0.link_irq: 1 dev.em.0.mbuf_alloc_fail: 0 dev.em.0.cluster_alloc_fail: 0 dev.em.0.dropped: 0 dev.em.0.tx_dma_fail: 0 dev.em.0.rx_overruns: 0 dev.em.0.watchdog_timeouts: 0 dev.em.0.link_toggles: 3 dev.em.0.device_control: 1074790984 dev.em.0.rx_control: 67141634 dev.em.0.fc_high_water: 18432 dev.em.0.fc_low_water: 16932 dev.em.0.queue0.txd_head: 558 dev.em.0.queue0.txd_tail: 558 dev.em.0.queue0.tx_irq: 2771888613 dev.em.0.queue0.no_desc_avail: 0 dev.em.0.queue0.rxd_head: 102 dev.em.0.queue0.rxd_tail: 101 dev.em.0.queue0.rx_irq: 59779941 dev.em.0.mac_stats.excess_coll: 0 dev.em.0.mac_stats.single_coll: 0 dev.em.0.mac_stats.multiple_coll: 0 dev.em.0.mac_stats.late_coll: 0 dev.em.0.mac_stats.collision_count: 0 dev.em.0.mac_stats.symbol_errors: 0 dev.em.0.mac_stats.sequence_errors: 0 dev.em.0.mac_stats.defer_count: 0 dev.em.0.mac_stats.missed_packets: 0 dev.em.0.mac_stats.recv_no_buff: 0 dev.em.0.mac_stats.recv_undersize: 0 dev.em.0.mac_stats.recv_fragmented: 0 dev.em.0.mac_stats.recv_oversize: 0 dev.em.0.mac_stats.recv_jabber: 0 dev.em.0.mac_stats.recv_errs: 0 dev.em.0.mac_stats.crc_errs: 0 dev.em.0.mac_stats.alignment_errs: 0 dev.em.0.mac_stats.coll_ext_errs: 0 dev.em.0.mac_stats.xon_recvd: 0 dev.em.0.mac_stats.xon_txd: 0 dev.em.0.mac_stats.xoff_recvd: 0 dev.em.0.mac_stats.xoff_txd: 0 dev.em.0.mac_stats.total_pkts_recvd: 60768005 dev.em.0.mac_stats.good_pkts_recvd: 60768005 dev.em.0.mac_stats.bcast_pkts_recvd: 60744833 dev.em.0.mac_stats.mcast_pkts_recvd: 2948 dev.em.0.mac_stats.rx_frames_64: 60744822 dev.em.0.mac_stats.rx_frames_65_127: 3491 dev.em.0.mac_stats.rx_frames_128_255: 19293 dev.em.0.mac_stats.rx_frames_256_511: 364 dev.em.0.mac_stats.rx_frames_512_1023: 35 dev.em.0.mac_stats.rx_frames_1024_1522: 0 dev.em.0.mac_stats.good_octets_recvd: 3890774076 dev.em.0.mac_stats.good_octets_txd: 15295538422911 dev.em.0.mac_stats.total_pkts_txd: 11246435226 dev.em.0.mac_stats.good_pkts_txd: 11246435226 dev.em.0.mac_stats.bcast_pkts_txd: 4 dev.em.0.mac_stats.mcast_pkts_txd: 3830 dev.em.0.mac_stats.tx_frames_64: 60338532 dev.em.0.mac_stats.tx_frames_65_127: 897135811 dev.em.0.mac_stats.tx_frames_128_255: 13933787 dev.em.0.mac_stats.tx_frames_256_511: 22107540 dev.em.0.mac_stats.tx_frames_512_1023: 116374045 dev.em.0.mac_stats.tx_frames_1024_1522: 10136545511 dev.em.0.mac_stats.tso_txd: 0 dev.em.0.mac_stats.tso_ctx_fail: 0 dev.em.0.interrupts.asserts: 3 dev.em.0.interrupts.rx_pkt_timer: 0 dev.em.0.interrupts.rx_abs_timer: 0 dev.em.0.interrupts.tx_pkt_timer: 0 dev.em.0.interrupts.tx_abs_timer: 0 dev.em.0.interrupts.tx_queue_empty: 0 dev.em.0.interrupts.tx_queue_min_thresh: 0 dev.em.0.interrupts.rx_desc_min_thresh: 0 dev.em.0.interrupts.rx_overrun: 0 dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 7.2.3 dev.em.1.%driver: em dev.em.1.%location: slot=0 function=0 dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x10d3 class=0x020000 dev.em.1.%parent: pci2 dev.em.1.nvm: -1 dev.em.1.debug: -1 dev.em.1.rx_int_delay: 0 dev.em.1.tx_int_delay: 66 dev.em.1.rx_abs_int_delay: 66 dev.em.1.tx_abs_int_delay: 66 dev.em.1.rx_processing_limit: 100 dev.em.1.flow_control: 0 dev.em.1.eee_control: 0 dev.em.1.link_irq: 27207 dev.em.1.mbuf_alloc_fail: 0 dev.em.1.cluster_alloc_fail: 0 dev.em.1.dropped: 0 dev.em.1.tx_dma_fail: 0 dev.em.1.rx_overruns: 0 dev.em.1.watchdog_timeouts: 0 dev.em.1.link_toggles: 3 dev.em.1.device_control: 1074790984 dev.em.1.rx_control: 67141634 dev.em.1.fc_high_water: 18432 dev.em.1.fc_low_water: 16932 dev.em.1.queue0.txd_head: 804 dev.em.1.queue0.txd_tail: 804 dev.em.1.queue0.tx_irq: 2790571888 dev.em.1.queue0.no_desc_avail: 0 dev.em.1.queue0.rxd_head: 686 dev.em.1.queue0.rxd_tail: 686 dev.em.1.queue0.rx_irq: 3729597000 dev.em.1.mac_stats.excess_coll: 0 dev.em.1.mac_stats.single_coll: 0 dev.em.1.mac_stats.multiple_coll: 0 dev.em.1.mac_stats.late_coll: 0 dev.em.1.mac_stats.collision_count: 0 dev.em.1.mac_stats.symbol_errors: 0 dev.em.1.mac_stats.sequence_errors: 0 dev.em.1.mac_stats.defer_count: 0 dev.em.1.mac_stats.missed_packets: 123950 dev.em.1.mac_stats.recv_no_buff: 7063 dev.em.1.mac_stats.recv_undersize: 0 dev.em.1.mac_stats.recv_fragmented: 0 dev.em.1.mac_stats.recv_oversize: 0 dev.em.1.mac_stats.recv_jabber: 0 dev.em.1.mac_stats.recv_errs: 0 dev.em.1.mac_stats.crc_errs: 0 dev.em.1.mac_stats.alignment_errs: 0 dev.em.1.mac_stats.coll_ext_errs: 0 dev.em.1.mac_stats.xon_recvd: 0 dev.em.1.mac_stats.xon_txd: 0 dev.em.1.mac_stats.xoff_recvd: 0 dev.em.1.mac_stats.xoff_txd: 0 dev.em.1.mac_stats.total_pkts_recvd: 11237327404 dev.em.1.mac_stats.good_pkts_recvd: 11237203454 dev.em.1.mac_stats.bcast_pkts_recvd: 60741559 dev.em.1.mac_stats.mcast_pkts_recvd: 2935 dev.em.1.mac_stats.rx_frames_64: 4479481614 dev.em.1.mac_stats.rx_frames_65_127: 5124633522 dev.em.1.mac_stats.rx_frames_128_255: 13228214 dev.em.1.mac_stats.rx_frames_256_511: 23321029 dev.em.1.mac_stats.rx_frames_512_1023: 74960116 dev.em.1.mac_stats.rx_frames_1024_1522: 1521578959 dev.em.1.mac_stats.good_octets_recvd: 3024997755633 dev.em.1.mac_stats.good_octets_txd: 15349574964083 dev.em.1.mac_stats.total_pkts_txd: 11344737635 dev.em.1.mac_stats.good_pkts_txd: 11344737635 dev.em.1.mac_stats.bcast_pkts_txd: 2501 dev.em.1.mac_stats.mcast_pkts_txd: 8 dev.em.1.mac_stats.tx_frames_64: 62839963 dev.em.1.mac_stats.tx_frames_65_127: 962219224 dev.em.1.mac_stats.tx_frames_128_255: 14620849 dev.em.1.mac_stats.tx_frames_256_511: 20410906 dev.em.1.mac_stats.tx_frames_512_1023: 117272600 dev.em.1.mac_stats.tx_frames_1024_1522: 10167374093 dev.em.1.mac_stats.tso_txd: 0 dev.em.1.mac_stats.tso_ctx_fail: 0 dev.em.1.interrupts.asserts: 19749 dev.em.1.interrupts.rx_pkt_timer: 1 dev.em.1.interrupts.rx_abs_timer: 0 dev.em.1.interrupts.tx_pkt_timer: 0 dev.em.1.interrupts.tx_abs_timer: 0 dev.em.1.interrupts.tx_queue_empty: 0 dev.em.1.interrupts.tx_queue_min_thresh: 0 dev.em.1.interrupts.rx_desc_min_thresh: 0 dev.em.1.interrupts.rx_overrun: 18 hw.em.rx_hang_fixup: 0 hw.em.eee_setting: 0 hw.em.fc_setting: 0 hw.em.rx_process_limit: 100 hw.em.enable_msix: 1 hw.em.sbp: 0 hw.em.smart_pwr_down: 0 hw.em.txd: 1024 hw.em.rxd: 1024 hw.em.rx_abs_int_delay: 66 hw.em.tx_abs_int_delay: 66 hw.em.rx_int_delay: 0 hw.em.tx_int_delay: 66 Jason