From owner-freebsd-stable@FreeBSD.ORG Thu Sep 30 13:34:28 2010
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 710AC1065674
    for ; Thu, 30 Sep 2010 13:34:28 +0000 (UTC)
    (envelope-from mike@sentex.net)
Received: from smarthost2.sentex.ca (smarthost2-6.sentex.ca [IPv6:2607:f3e0:80:80::2])
    by mx1.freebsd.org (Postfix) with ESMTP id 138B18FC17
    for ; Thu, 30 Sep 2010 13:34:27 +0000 (UTC)
Received: from lava.sentex.ca (pyroxene.sentex.ca [199.212.134.18])
    by smarthost2.sentex.ca (8.14.4/8.14.4) with ESMTP id o8UDYJhQ096400
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
    for ; Thu, 30 Sep 2010 09:34:19 -0400 (EDT)
    (envelope-from mike@sentex.net)
Received: from mdt-xp.sentex.net (simeon.sentex.ca [192.168.43.27])
    by lava.sentex.ca (8.14.4/8.14.4) with ESMTP id o8UDYJiT017075;
    Thu, 30 Sep 2010 09:34:19 -0400 (EDT)
    (envelope-from mike@sentex.net)
Message-Id: <201009301334.o8UDYJiT017075@lava.sentex.ca>
X-Mailer: QUALCOMM Windows Eudora Version 7.1.0.9
Date: Thu, 30 Sep 2010 09:34:16 -0400
To: Jack Vogel
From: Mike Tancsa
References: <201006102031.o5AKVCH2016467@lava.sentex.ca>
    <201007021739.o62HdMOU092319@lava.sentex.ca>
    <20100702193654.GD10862@michelle.cdnetworks.com>
    <201008162107.o7GL76pA080191@lava.sentex.ca>
    <20100817185208.GA6482@michelle.cdnetworks.com>
    <201008171955.o7HJt67T087902@lava.sentex.ca>
    <20100817200020.GE6482@michelle.cdnetworks.com>
    <201009141759.o8EHxcZ0013539@lava.sentex.ca>
    <201009262157.o8QLvR0L012171@lava.sentex.ca>
    <201009262343.o8QNhgDG012676@lava.sentex.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
X-Scanned-By: MIMEDefang 2.67 on 205.211.164.50
Cc: pyunyh@gmail.com, freebsd-stable@freebsd.org
Subject: Re: RELENG_7 em problems (and RELENG_8)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Thu, 30 Sep 2010 13:34:28 -0000

At 08:00 PM 9/26/2010, Jack Vogel wrote:
>The system I've had stress tests running on has 82574 LOMs, so I hope it
>will solve the problem, will see tomorrow morning at how things have held
>up...

I pulled a copy of sys/dev/e1000 from HEAD and copied it onto my
RELENG_8 box. I had another nic lock up last night :(

Anyways, now running with the driver from HEAD on RELENG_8 amd64

em0: port 0x4040-0x405f mem 0xb4400000-0xb441ffff,0xb4425000-0xb4425fff irq 16 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:15:17:ed:68:a5
em1: port 0x2000-0x201f mem 0xb4100000-0xb411ffff,0xb4120000-0xb4123fff irq 16 at device 0.0 on pci9
em1: Using MSIX interrupts with 3 vectors
em1: [ITHREAD]
em1: [ITHREAD]
em1: [ITHREAD]
em1: Ethernet address: 00:15:17:ed:68:a4

em0@pci0:0:25:0: class=0x020000 card=0x34ec8086 chip=0x10ef8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP
em1@pci0:9:0:0: class=0x020000 card=0x34ec8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Intel 82574L Gigabit Ethernet Controller (82574L)'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
    cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 001517ffffed68a4

interrupt                          total       rate
irq4: uart0                         2283          6
irq16: siis0                        4332         11
irq18: arcmsr0                    137175        372
irq19: twa0                        18805         51
irq21: ehci0                        2734          7
irq23: ehci1                         675          1
cpu0: timer                       733804       1994
irq256: em0                        73195        198
irq257: em1:rx 0                     238          0
irq258: em1:tx 0                      37          0
irq260: ahci0                       4328         11
cpu1: timer                       725637       1971
cpu3: timer                       725709       1972
cpu2: timer                       725688       1971
Total                            3154640       8572

        ---Mike

>Jack
>
>
>On Sun, Sep 26, 2010 at 4:43 PM, Mike Tancsa <mike@sentex.net> wrote:
>At 06:19 PM 9/26/2010, Jack Vogel wrote:
>Your em1 is using MSI not MSIX and thus can't have multiple queues. I'm
>not sure what's broken from what you show here. I will try to get the new
>driver out shortly for you to try.
>
>
>With this particular NIC, it will wedge under high load. I tried 2
>different motherboards and chipsets, with the same behaviour.
>
>        ---Mike
>
>
>Jack
>
>
>
>On Sun, Sep 26, 2010 at 2:57 PM, Mike Tancsa <mike@sentex.net> wrote:
>At 06:36 PM 9/24/2010, Jack Vogel wrote:
>There is a new revision of the em driver coming next week, it's going thru some
>stress pounding over the weekend, if no issues show up I'll put it into HEAD.
>
>Yongari's changes in TX context handling which affects checksum and tso
>are added. I've also decided that multiple queues in 82574 just are a source
>of problems without a lot of benefit, so it still uses MSIX but with
>only 3 vectors, meaning it separates TX and RX but has a single queue.
>
>
>Thanks, looking forward to trying it out! With respect to the
>multiple queues, I thought the driver already used just the one on
>RELENG_8? If not, is there a way to force the existing driver to
>use just the one queue?
>
>On the box that has the NIC locking up, it shows
>
>em1@pci0:9:0:0: class=0x020000 card=0x34ec8086 chip=0x10d38086 rev=0x00 hdr=0x00
>    vendor     = 'Intel Corporation'
>    device     = 'Intel 82574L Gigabit Ethernet Controller (82574L)'
>    class      = network
>    subclass   = ethernet
>    cap 01[c8] = powerspec 2  supports D0 D3  current D0
>    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>
>and
>
>vmstat -i shows
>
>irq256: em0                      5129063        353
>irq257: em1                       531251         36
>
>in a wedged state, stats look like
>
>dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 7.0.5
>dev.em.1.%driver: em
>dev.em.1.%location: slot=0 function=0 handle=\_SB_.PCI0.PEX4.HART
>dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0x34ec class=0x020000
>dev.em.1.%parent: pci9
>dev.em.1.nvm: -1
>dev.em.1.rx_int_delay: 0
>dev.em.1.tx_int_delay: 66
>dev.em.1.rx_abs_int_delay: 66
>dev.em.1.tx_abs_int_delay: 66
>dev.em.1.rx_processing_limit: 100
>dev.em.1.link_irq: 0
>dev.em.1.mbuf_alloc_fail: 0
>dev.em.1.cluster_alloc_fail: 0
>dev.em.1.dropped: 0
>dev.em.1.tx_dma_fail: 0
>dev.em.1.fc_high_water: 18432
>dev.em.1.fc_low_water: 16932
>dev.em.1.mac_stats.excess_coll: 0
>dev.em.1.mac_stats.symbol_errors: 0
>dev.em.1.mac_stats.sequence_errors: 0
>dev.em.1.mac_stats.defer_count: 0
>dev.em.1.mac_stats.missed_packets: 41522
>dev.em.1.mac_stats.recv_no_buff: 19
>dev.em.1.mac_stats.recv_errs: 0
>dev.em.1.mac_stats.crc_errs: 0
>dev.em.1.mac_stats.alignment_errs: 0
>dev.em.1.mac_stats.coll_ext_errs: 0
>dev.em.1.mac_stats.rx_overruns: 41398
>dev.em.1.mac_stats.watchdog_timeouts: 0
>dev.em.1.mac_stats.xon_recvd: 0
>dev.em.1.mac_stats.xon_txd: 0
>dev.em.1.mac_stats.xoff_recvd: 0
>dev.em.1.mac_stats.xoff_txd: 0
>dev.em.1.mac_stats.total_pkts_recvd: 95229129
>dev.em.1.mac_stats.good_pkts_recvd: 95187607
>dev.em.1.mac_stats.bcast_pkts_recvd: 79244
>dev.em.1.mac_stats.mcast_pkts_recvd: 0
>dev.em.1.mac_stats.rx_frames_64: 93680
>dev.em.1.mac_stats.rx_frames_65_127: 1516349
>dev.em.1.mac_stats.rx_frames_128_255: 4464941
>dev.em.1.mac_stats.rx_frames_256_511: 4024
>dev.em.1.mac_stats.rx_frames_512_1023: 2096067
>dev.em.1.mac_stats.rx_frames_1024_1522: 87012546
>dev.em.1.mac_stats.good_octets_recvd: 0
>dev.em.1.mac_stats.good_octest_txd: 0
>dev.em.1.mac_stats.total_pkts_txd: 66775098
>dev.em.1.mac_stats.good_pkts_txd: 66775098
>dev.em.1.mac_stats.bcast_pkts_txd: 509
>dev.em.1.mac_stats.mcast_pkts_txd: 7
>dev.em.1.mac_stats.tx_frames_64: 48038472
>dev.em.1.mac_stats.tx_frames_65_127: 13402833
>dev.em.1.mac_stats.tx_frames_128_255: 5324413
>dev.em.1.mac_stats.tx_frames_256_511: 957
>dev.em.1.mac_stats.tx_frames_512_1023: 319
>dev.em.1.mac_stats.tx_frames_1024_1522: 8104
>dev.em.1.mac_stats.tso_txd: 1069
>dev.em.1.mac_stats.tso_ctx_fail: 0
>dev.em.1.interrupts.asserts: 0
>dev.em.1.interrupts.rx_pkt_timer: 0
>dev.em.1.interrupts.rx_abs_timer: 0
>dev.em.1.interrupts.tx_pkt_timer: 0
>dev.em.1.interrupts.tx_abs_timer: 0
>dev.em.1.interrupts.tx_queue_empty: 0
>dev.em.1.interrupts.tx_queue_min_thresh: 0
>dev.em.1.interrupts.rx_desc_min_thresh: 0
>dev.em.1.interrupts.rx_overrun: 0
>dev.em.1.host.breaker_tx_pkt: 0
>dev.em.1.host.host_tx_pkt_discard: 0
>dev.em.1.host.rx_pkt: 0
>dev.em.1.host.breaker_rx_pkts: 0
>dev.em.1.host.breaker_rx_pkt_drop: 0
>dev.em.1.host.tx_good_pkt: 0
>dev.em.1.host.breaker_tx_pkt_drop: 0
>dev.em.1.host.rx_good_bytes: 0
>dev.em.1.host.tx_good_bytes: 0
>dev.em.1.host.length_errors: 0
>dev.em.1.host.serdes_violation_pkt: 0
>dev.em.1.host.header_redir_missed: 0
>
>ifconfig down/up just panics or locks up the box when it's in this
>state. I also have IPMI enabled on this nic, but it shows the same
>issue with it disabled.
>
>        ---Mike
>

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike@sentex.net
Providing Internet since 1994                    www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike
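
For anyone comparing counters between a healthy and a wedged interface, here is a rough sketch (not part of the driver or of any FreeBSD tool) that parses saved `sysctl dev.em.1` text and derives a couple of numbers from it. The function names `parse_sysctl` and `rx_summary` are made up for illustration, and the sample values are copied from the quoted dump above. On those values, total_pkts_recvd minus good_pkts_recvd comes to 41522, the same as missed_packets, and missed packets are roughly 0.04% of everything received. Whether that equality holds in general is an assumption; the thread does not say how the driver defines these counters.

```python
# Sketch: parse "name: value" sysctl output and derive simple RX ratios.
# SAMPLE holds values copied from the quoted dev.em.1 dump in this message.
SAMPLE = """\
>dev.em.1.%driver: em
>dev.em.1.mac_stats.missed_packets: 41522
>dev.em.1.mac_stats.recv_no_buff: 19
>dev.em.1.mac_stats.rx_overruns: 41398
>dev.em.1.mac_stats.total_pkts_recvd: 95229129
>dev.em.1.mac_stats.good_pkts_recvd: 95187607
"""


def parse_sysctl(text):
    """Return {name: int} for every line of the form 'name: <integer>'."""
    stats = {}
    for line in text.splitlines():
        # Tolerate mail quote markers and indentation in pasted output.
        line = line.lstrip("> ").strip()
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        try:
            stats[name] = int(value)
        except ValueError:
            # String-valued nodes such as %desc or %driver are skipped.
            continue
    return stats


def rx_summary(stats, prefix="dev.em.1.mac_stats."):
    """Derive the received-but-not-good count and the missed fraction."""
    total = stats[prefix + "total_pkts_recvd"]
    good = stats[prefix + "good_pkts_recvd"]
    missed = stats[prefix + "missed_packets"]
    return {
        "not_good": total - good,
        "missed": missed,
        "missed_fraction": missed / total,
    }


if __name__ == "__main__":
    summary = rx_summary(parse_sysctl(SAMPLE))
    for key, value in summary.items():
        print(key, value)
```

To use it on a live box, the output of `sysctl dev.em.1` could be saved to a file and passed to `parse_sysctl` in place of `SAMPLE`; the `prefix` argument would need changing for a different unit number.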