Date:      Sat, 02 Oct 2010 10:32:02 -0400
From:      Mike Tancsa <mike@sentex.net>
To:        Jack Vogel <jfvogel@gmail.com>
Cc:        pyunyh@gmail.com, freebsd-stable@freebsd.org
Subject:   Re: RELENG_7 em problems (and RELENG_8)
Message-ID:  <201010021432.o92EWAIs033670@lava.sentex.ca>
In-Reply-To: <AANLkTik5mzeKPYrp3_80Ng9ByFj%2BLSHsd3xT2JCP98E%2B@mail.gmail.com>
References:  <201006102031.o5AKVCH2016467@lava.sentex.ca> <201007021739.o62HdMOU092319@lava.sentex.ca> <20100702193654.GD10862@michelle.cdnetworks.com> <201008162107.o7GL76pA080191@lava.sentex.ca> <20100817185208.GA6482@michelle.cdnetworks.com> <201008171955.o7HJt67T087902@lava.sentex.ca> <20100817200020.GE6482@michelle.cdnetworks.com> <201009141759.o8EHxcZ0013539@lava.sentex.ca> <AANLkTimiTmA1HHeWmGm1MAFf-H=OqC17vwZvFWpgcHCZ@mail.gmail.com> <201009262157.o8QLvR0L012171@lava.sentex.ca> <AANLkTinxLScVxQ2ib%2BcLXEBGTATAU36%2BOKr7%2B5SQXE89@mail.gmail.com> <201009262343.o8QNhgDG012676@lava.sentex.ca> <AANLkTik5mzeKPYrp3_80Ng9ByFj%2BLSHsd3xT2JCP98E%2B@mail.gmail.com>


Hi Jack,
         Two quick notes about the new driver.

On the server that was having NIC lockups, so far so good.  On Saturday 
mornings the box takes a lot of level 0 dumps and also pushes about 
70Mb/s of outbound rsync traffic.  With the old driver, the NIC would 
have wedged at least once by now.


On a different, new box, I decided to try HEAD with the new driver 
and ran into problems with the onboard NIC:

em0@pci0:0:25:0:        class=0x020000 card=0x00368086 
chip=0x10f08086 rev=0x06 hdr=0x00
     vendor     = 'Intel Corporation'
     class      = network
     subclass   = ethernet
     cap 01[c8] = powerspec 2  supports D0 D3  current D0
     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
     cap 13[e0] = PCI Advanced Features: FLR TP

em0: <Intel(R) PRO/1000 Network Connection 7.0.5> port 0xf020-0xf03f 
mem 0xfe500000-0xfe51ffff,0xfe527000-0xfe527fff irq 20 at device 25.0 on pci0
em0: Using MSI interrupt
em0: [FILTER]
em0: Ethernet address: 70:71:bc:09:5e:aa

This is an Intel-branded desktop board:

acpi0: <INTEL DH55TC> on motherboard

I find I have to disable RX and TX checksum offload on the interface; 
otherwise there are a lot of retransmits due to missed packets.  tcpdump 
implies the packets are going out, but they never seem to reach the 
wire.  The motherboard is at the office on an unmanaged switch right 
now, so I don't have any stats from the switch, but tcpdump shows a 
lot of outbound retransmits.  Turning off rxcsum and txcsum fixes the problem.
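
For reference, the workaround amounts to clearing the rxcsum and txcsum
capabilities with ifconfig(8).  A minimal sketch, assuming the interface
is em0 as in the output above.  The helper names are hypothetical, and
the script only prints the command unless apply=True, since actually
running it needs root on the affected box:

```python
import subprocess

def csum_offload_cmd(ifname, enable=False):
    """Build the ifconfig(8) argv that toggles RX/TX checksum offload.

    FreeBSD's ifconfig enables a capability with "rxcsum"/"txcsum" and
    disables it with the same word prefixed by "-".
    """
    prefix = "" if enable else "-"
    return ["ifconfig", ifname, prefix + "rxcsum", prefix + "txcsum"]

def set_csum_offload(ifname, enable=False, apply=False):
    """Print the command; run it only when apply=True (needs root)."""
    cmd = csum_offload_cmd(ifname, enable)
    print(" ".join(cmd))
    if apply:
        subprocess.run(cmd, check=True)
    return cmd

if __name__ == "__main__":
    # Dry run: shows the command that disables both offloads on em0.
    set_csum_offload("em0")
```

Passing enable=True builds the inverse command, which is handy for
re-testing with offload switched back on after a driver update.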

dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.0.8
dev.em.0.%driver: em
dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.GBE_
dev.em.0.%pnpinfo: vendor=0x8086 device=0x10f0 subvendor=0x8086 
subdevice=0x0036 class=0x020000
dev.em.0.%parent: pci0
dev.em.0.nvm: -1
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_processing_limit: 100
dev.em.0.link_irq: 0
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.dropped: 0
dev.em.0.tx_dma_fail: 0
dev.em.0.rx_overruns: 0
dev.em.0.watchdog_timeouts: 0
dev.em.0.device_control: 1074790976
dev.em.0.rx_control: 67141634
dev.em.0.fc_high_water: 8192
dev.em.0.fc_low_water: 6692
dev.em.0.queue0.txd_head: 15
dev.em.0.queue0.txd_tail: 17
dev.em.0.queue0.tx_irq: 0
dev.em.0.queue0.no_desc_avail: 0
dev.em.0.queue0.rxd_head: 843
dev.em.0.queue0.rxd_tail: 842
dev.em.0.queue0.rx_irq: 0
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.em.0.mac_stats.late_coll: 0
dev.em.0.mac_stats.collision_count: 0
dev.em.0.mac_stats.symbol_errors: 0
dev.em.0.mac_stats.sequence_errors: 0
dev.em.0.mac_stats.defer_count: 0
dev.em.0.mac_stats.missed_packets: 0
dev.em.0.mac_stats.recv_no_buff: 0
dev.em.0.mac_stats.recv_undersize: 0
dev.em.0.mac_stats.recv_fragmented: 0
dev.em.0.mac_stats.recv_oversize: 0
dev.em.0.mac_stats.recv_jabber: 0
dev.em.0.mac_stats.recv_errs: 0
dev.em.0.mac_stats.crc_errs: 0
dev.em.0.mac_stats.alignment_errs: 0
dev.em.0.mac_stats.coll_ext_errs: 0
dev.em.0.mac_stats.xon_recvd: 80
dev.em.0.mac_stats.xon_txd: 0
dev.em.0.mac_stats.xoff_recvd: 82
dev.em.0.mac_stats.xoff_txd: 0
dev.em.0.mac_stats.total_pkts_recvd: 35697
dev.em.0.mac_stats.good_pkts_recvd: 35535
dev.em.0.mac_stats.bcast_pkts_recvd: 231
dev.em.0.mac_stats.mcast_pkts_recvd: 85
dev.em.0.mac_stats.rx_frames_64: 0
dev.em.0.mac_stats.rx_frames_65_127: 0
dev.em.0.mac_stats.rx_frames_128_255: 0
dev.em.0.mac_stats.rx_frames_256_511: 0
dev.em.0.mac_stats.rx_frames_512_1023: 0
dev.em.0.mac_stats.rx_frames_1024_1522: 0
dev.em.0.mac_stats.good_octets_recvd: 14878015
dev.em.0.mac_stats.good_octets_txd: 14051783
dev.em.0.mac_stats.total_pkts_txd: 45313
dev.em.0.mac_stats.good_pkts_txd: 45313
dev.em.0.mac_stats.bcast_pkts_txd: 3
dev.em.0.mac_stats.mcast_pkts_txd: 5
dev.em.0.mac_stats.tx_frames_64: 0
dev.em.0.mac_stats.tx_frames_65_127: 0
dev.em.0.mac_stats.tx_frames_128_255: 0
dev.em.0.mac_stats.tx_frames_256_511: 0
dev.em.0.mac_stats.tx_frames_512_1023: 0
dev.em.0.mac_stats.tx_frames_1024_1522: 0
dev.em.0.mac_stats.tso_txd: 2788
dev.em.0.mac_stats.tso_ctx_fail: 0
dev.em.0.interrupts.asserts: 48733
dev.em.0.interrupts.rx_pkt_timer: 0
dev.em.0.interrupts.rx_abs_timer: 0
dev.em.0.interrupts.tx_pkt_timer: 0
dev.em.0.interrupts.tx_abs_timer: 0
dev.em.0.interrupts.tx_queue_empty: 0
dev.em.0.interrupts.tx_queue_min_thresh: 0
dev.em.0.interrupts.rx_desc_min_thresh: 0
dev.em.0.interrupts.rx_overrun: 0
dev.em.0.wake: 0
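
In the dump above the error counters are all zero, while some pause
frames have been received (xon_recvd 80, xoff_recvd 82).  To pick such
values out of a long sysctl listing, here is a rough sketch that parses
the "name: value" lines and reports the non-zero counters from a short
watch list.  The function names and the SUSPECT list are assumptions
made for illustration, not anything the driver defines:

```python
def parse_sysctl(text):
    """Parse `sysctl dev.em.N` style "name: value" lines into a dict.

    Only integer-valued entries are kept; descriptive lines (%desc,
    %pnpinfo) and wrapped continuation lines are skipped.  A leading
    ">" from quoted mail is stripped so quoted dumps parse as well.
    """
    stats = {}
    for line in text.splitlines():
        line = line.lstrip(">").strip()
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        try:
            stats[name] = int(value.strip())
        except ValueError:
            continue
    return stats

# Counter suffixes treated as worth a look (an assumption for this sketch).
SUSPECT = ("missed_packets", "recv_no_buff", "rx_overruns", "crc_errs",
           "recv_errs", "tx_dma_fail", "watchdog_timeouts", "xoff_recvd")

def nonzero_suspects(stats):
    """Return the watch-list counters that are non-zero, keyed by full name."""
    return {k: v for k, v in stats.items() if v != 0 and k.endswith(SUSPECT)}

if __name__ == "__main__":
    sample = """dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.0.8
dev.em.0.rx_overruns: 0
dev.em.0.mac_stats.missed_packets: 0
dev.em.0.mac_stats.xoff_recvd: 82
dev.em.0.mac_stats.total_pkts_recvd: 35697
"""
    for name, value in sorted(nonzero_suspects(parse_sysctl(sample)).items()):
        print(name, value)
```

The same functions can be fed the full output of `sysctl dev.em.0`
saved to a file, which makes it easier to compare a healthy and a
wedged interface.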



At 08:00 PM 9/26/2010, Jack Vogel wrote:
>The system I've had stress tests running on has 82574 LOMs, so I hope it
>will solve the problem; I will see tomorrow morning how things have held
>up...
>
>Jack
>
>
>On Sun, Sep 26, 2010 at 4:43 PM, Mike Tancsa 
><mike@sentex.net> wrote:
>At 06:19 PM 9/26/2010, Jack Vogel wrote:
>Your em1 is using MSI, not MSIX, and thus can't have multiple queues. I'm
>not sure what's broken from what you show here. I will try to get the new
>driver out shortly for you to try.
>
>
>With this particular NIC, it will wedge under high load.  I tried 2 
>different motherboards and chipsets, with the same behaviour.
>
>        ---Mike
>
>
>Jack
>
>
>
>On Sun, Sep 26, 2010 at 2:57 PM, Mike Tancsa 
><mike@sentex.net> wrote:
>At 06:36 PM 9/24/2010, Jack Vogel wrote:
>There is a new revision of the em driver coming next week. It's going through some
>stress pounding over the weekend; if no issues show up I'll put it into HEAD.
>
>Yongari's changes in TX context handling, which affect checksum and TSO,
>are added. I've also decided that multiple queues on the 82574 are just a source
>of problems without a lot of benefit, so it still uses MSIX but with only 3 vectors,
>meaning it separates TX and RX but has a single queue.
>
>
>Thanks, looking forward to trying it out!  With respect to the 
>multiple queues, I thought the driver already used just the one on 
>RELENG_8 ?  If not, is there a way to force the existing driver to 
>use just the one queue ?
>
>On the box that has the NIC locking up, it shows
>
>em1@pci0:9:0:0: class=0x020000 card=0x34ec8086 chip=0x10d38086 
>rev=0x00 hdr=0x00
>
>   vendor     = 'Intel Corporation'
>   device     = 'Intel 82574L Gigabit Ethernet Controller (82574L)'
>   class      = network
>   subclass   = ethernet
>   cap 01[c8] = powerspec 2  supports D0 D3  current D0
>   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>
>and
>
>vmstat -i shows
>
>irq256: em0                      5129063        353
>irq257: em1                       531251         36
>
>in a wedged state, stats look like
>
>dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 7.0.5
>dev.em.1.%driver: em
>dev.em.1.%location: slot=0 function=0 handle=\_SB_.PCI0.PEX4.HART
>dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 
>subdevice=0x34ec class=0x020000
>dev.em.1.%parent: pci9
>dev.em.1.nvm: -1
>dev.em.1.rx_int_delay: 0
>dev.em.1.tx_int_delay: 66
>dev.em.1.rx_abs_int_delay: 66
>dev.em.1.tx_abs_int_delay: 66
>dev.em.1.rx_processing_limit: 100
>dev.em.1.link_irq: 0
>dev.em.1.mbuf_alloc_fail: 0
>dev.em.1.cluster_alloc_fail: 0
>dev.em.1.dropped: 0
>dev.em.1.tx_dma_fail: 0
>dev.em.1.fc_high_water: 18432
>dev.em.1.fc_low_water: 16932
>dev.em.1.mac_stats.excess_coll: 0
>dev.em.1.mac_stats.symbol_errors: 0
>dev.em.1.mac_stats.sequence_errors: 0
>dev.em.1.mac_stats.defer_count: 0
>dev.em.1.mac_stats.missed_packets: 41522
>dev.em.1.mac_stats.recv_no_buff: 19
>dev.em.1.mac_stats.recv_errs: 0
>dev.em.1.mac_stats.crc_errs: 0
>dev.em.1.mac_stats.alignment_errs: 0
>dev.em.1.mac_stats.coll_ext_errs: 0
>dev.em.1.mac_stats.rx_overruns: 41398
>dev.em.1.mac_stats.watchdog_timeouts: 0
>dev.em.1.mac_stats.xon_recvd: 0
>dev.em.1.mac_stats.xon_txd: 0
>dev.em.1.mac_stats.xoff_recvd: 0
>dev.em.1.mac_stats.xoff_txd: 0
>dev.em.1.mac_stats.total_pkts_recvd: 95229129
>dev.em.1.mac_stats.good_pkts_recvd: 95187607
>dev.em.1.mac_stats.bcast_pkts_recvd: 79244
>dev.em.1.mac_stats.mcast_pkts_recvd: 0
>dev.em.1.mac_stats.rx_frames_64: 93680
>dev.em.1.mac_stats.rx_frames_65_127: 1516349
>dev.em.1.mac_stats.rx_frames_128_255: 4464941
>dev.em.1.mac_stats.rx_frames_256_511: 4024
>dev.em.1.mac_stats.rx_frames_512_1023: 2096067
>dev.em.1.mac_stats.rx_frames_1024_1522: 87012546
>dev.em.1.mac_stats.good_octets_recvd: 0
>dev.em.1.mac_stats.good_octest_txd: 0
>dev.em.1.mac_stats.total_pkts_txd: 66775098
>dev.em.1.mac_stats.good_pkts_txd: 66775098
>dev.em.1.mac_stats.bcast_pkts_txd: 509
>dev.em.1.mac_stats.mcast_pkts_txd: 7
>dev.em.1.mac_stats.tx_frames_64: 48038472
>dev.em.1.mac_stats.tx_frames_65_127: 13402833
>dev.em.1.mac_stats.tx_frames_128_255: 5324413
>dev.em.1.mac_stats.tx_frames_256_511: 957
>dev.em.1.mac_stats.tx_frames_512_1023: 319
>dev.em.1.mac_stats.tx_frames_1024_1522: 8104
>dev.em.1.mac_stats.tso_txd: 1069
>dev.em.1.mac_stats.tso_ctx_fail: 0
>dev.em.1.interrupts.asserts: 0
>dev.em.1.interrupts.rx_pkt_timer: 0
>dev.em.1.interrupts.rx_abs_timer: 0
>dev.em.1.interrupts.tx_pkt_timer: 0
>dev.em.1.interrupts.tx_abs_timer: 0
>dev.em.1.interrupts.tx_queue_empty: 0
>dev.em.1.interrupts.tx_queue_min_thresh: 0
>dev.em.1.interrupts.rx_desc_min_thresh: 0
>dev.em.1.interrupts.rx_overrun: 0
>dev.em.1.host.breaker_tx_pkt: 0
>dev.em.1.host.host_tx_pkt_discard: 0
>dev.em.1.host.rx_pkt: 0
>dev.em.1.host.breaker_rx_pkts: 0
>dev.em.1.host.breaker_rx_pkt_drop: 0
>dev.em.1.host.tx_good_pkt: 0
>dev.em.1.host.breaker_tx_pkt_drop: 0
>dev.em.1.host.rx_good_bytes: 0
>dev.em.1.host.tx_good_bytes: 0
>dev.em.1.host.length_errors: 0
>dev.em.1.host.serdes_violation_pkt: 0
>dev.em.1.host.header_redir_missed: 0
>
>ifconfig down/up just panics or locks up the box when it's in this 
>state.  I also have IPMI enabled on this nic, but it shows the same 
>issue with it disabled.
>
>       ---Mike
>
>
>
>--------------------------------------------------------------------
>Mike Tancsa,                                      tel +1 519 651 3400
>Sentex Communications,                            mike@sentex.net
>Providing Internet since 1994                    www.sentex.net
>Cambridge, Ontario Canada                         www.sentex.net/mike
>
>

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike@sentex.net
Providing Internet since 1994                    www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike



