Date:      Sun, 16 Nov 2014 18:58:40 -0500
From:      Mike Tancsa <mike@sentex.net>
To:        Adrian Chadd <adrian@freebsd.org>, FF <fusionfoto@gmail.com>
Cc:        "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject:   Re: em0 tx_dma_fail incrementing [SOLVED]
Message-ID:  <54693A30.3040404@sentex.net>
In-Reply-To: <CAJ-Vmo=oeRA6rFtm7t3eOumLVpnEJsRwBPP3d8XgLzQgpoJLqw@mail.gmail.com>
References:  <CAD=tpefu21VGs2sx8GMqhbQw-ivJKE99m_QNLKgayKCyKS8-ZQ@mail.gmail.com> <CAJ-Vmo=oeRA6rFtm7t3eOumLVpnEJsRwBPP3d8XgLzQgpoJLqw@mail.gmail.com>

On 11/16/2014 12:28 PM, Adrian Chadd wrote:
> Hi!
>
> Good catch! Would you mind filing a bug so we remember and
> (hopefully!) fix it to be the default?
>
> https://bugs.freebsd.org/submit/

I wonder if this is the bug I was running into:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193802

	---Mike

>
> Thanks!
>
>
> -adrian
>
>
> On 15 November 2014 08:31, FF <fusionfoto@gmail.com> wrote:
>> It looks like FreeBSD may be a victim of this bug:
>>
>> http://www.intel.com.au/content/dam/www/public/us/en/documents/specification-updates/82574-gbe-controller-spec-update.pdf
>>
>> 17. Tx Data Corruption When Using TCP Segmentation Offload
>>
>> Problem: When using TSO, a situation can occur where a PCIe MRd request
>> is repeated with the same address, resulting in data corruption. At the
>> end of the TCP packet, the Tx DMA hangs because the length doesn't
>> match. This can only occur when the following are true:
>>
>> • The first buffer of the packet is larger than
>>   [3 * (max_read_request - 4)].
>> • There is a 4 KB boundary within 64 bytes following the end of the
>>   header bytes in the buffer.
>>
>> Implication: Possible data corruption since a TCP packet is transmitted
>> containing the wrong data but with the correct checksum. Data
>> transmission halts as the Tx DMA module enters a hang state.
>>
>> Workaround: The failure can be avoided by ensuring at least one of the
>> following:
>>
>> • The buffer containing the headers should not be larger than
>>   [3 * (max_read_request - 4)]. To meet this requirement even for the
>>   minimum value of 128 bytes for max_read_request, the buffer should not
>>   be larger than 372 bytes.
>> • The alignment of the buffer containing the headers should be such that
>>   there is no 4 KB boundary within 64 bytes following the end of the
>>   header bytes. Assuming standard Ethernet/IP/TCP headers of 54 bytes,
>>   this means that the buffer should not start 54-118 bytes before a 4 KB
>>   boundary. For example, 128-byte alignment for this buffer could be used
>>   to fulfill this condition.
>>
>> This problem has not been reported when using an Intel Linux* or Windows*
>> drivers. Current analysis shows it is very unlikely for a situation to
>> exist that would cause the 82574 to be at risk for the errata when using
>> the Intel Linux or Windows drivers.
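
The two trigger conditions in the erratum come down to arithmetic on the first buffer's start address and length. What follows is a minimal Python sketch of that arithmetic, for illustration only. The function names are hypothetical, the 54-byte header length is the erratum's own assumption, and the inclusive 54-118 byte window is read off the workaround text. It is not taken from the em driver source.

```python
PAGE = 4096   # the 4 KB boundary named in the erratum
WINDOW = 64   # a boundary this close after the headers is the problem case


def max_safe_first_buffer(max_read_request: int) -> int:
    """Largest first-buffer size that avoids condition 1: 3 * (max_read_request - 4)."""
    return 3 * (max_read_request - 4)


def boundary_after_headers(start: int, header_len: int = 54) -> bool:
    """True if a 4 KB boundary falls within WINDOW bytes after the header bytes.

    `start` is the (hypothetical) bus address of the buffer holding the headers.
    """
    # Bytes from `start` up to the next 4 KB boundary (0 when already aligned).
    to_boundary = (-start) % PAGE
    return header_len <= to_boundary <= header_len + WINDOW


def at_risk(start: int, length: int,
            max_read_request: int = 128, header_len: int = 54) -> bool:
    """Both erratum conditions must hold for the failure to be possible."""
    return (length > max_safe_first_buffer(max_read_request)
            and boundary_after_headers(start, header_len))


if __name__ == "__main__":
    print(max_safe_first_buffer(128))          # the erratum's 372-byte figure
    print(at_risk(PAGE - 100, 1000))           # large buffer near a boundary
    print(any(boundary_after_headers(a) for a in range(0, 2 * PAGE, 128)))
```

With 128-byte alignment the distance to the next boundary is always a multiple of 128, so it can never land in the 54-118 range, which matches the erratum's suggested alignment.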
>>
>>
>>
>> Linux and other systems seem to have fixed it. FreeBSD may be exercising
>> the bug because the default buffer size for this driver was recently
>> raised above 256.
>>
>>
>> Since I didn't want to reboot to try the lower buffer size, I turned off
>> TSO on every machine I'd checked whose em interfaces were actively
>> incrementing tx_dma_fail, then re-added those interfaces to the LACP
>> bundle.
>>
>>
>> In brief testing (a few gigabits for a few minutes), tx_dma_fail has not
>> incremented and throughput has not been negatively impacted (before vs.
>> after re-enabling).
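
The thread does not show the exact commands used. On FreeBSD, TSO is normally disabled at runtime with `ifconfig em0 -tso`, and the counter can be read with `sysctl dev.em.0.tx_dma_fail`. The before/after check described above could be sketched in Python roughly as follows. The helper names are hypothetical, and the "after" sample is made up for illustration rather than taken from the thread.

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'name: value' lines of sysctl output, keeping integer values only."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        try:
            counters[name.strip()] = int(value.strip())
        except ValueError:
            # Non-numeric nodes such as %desc or %driver are skipped.
            continue
    return counters


def counter_delta(before: str, after: str, name: str) -> int:
    """How much the named counter grew between two sysctl snapshots."""
    return parse_sysctl(after)[name] - parse_sysctl(before)[name]


if __name__ == "__main__":
    # 'before' uses values quoted in this thread; 'after' is a hypothetical
    # second snapshot taken once TSO has been disabled.
    before = "dev.em.0.tx_dma_fail: 1834648\ndev.em.0.rx_overruns: 3109\n"
    after = "dev.em.0.tx_dma_fail: 1834648\ndev.em.0.rx_overruns: 3109\n"
    print(counter_delta(before, after, "dev.em.0.tx_dma_fail"))
```

A delta of zero over a period of sustained traffic is the result reported here. A counter that keeps growing would suggest the workaround did not take effect on that interface.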
>>
>>
>> I'm posting this so that anyone else scratching their head over terrible
>> em performance can solve it.
>>
>>
>> Best,
>>
>>
>> FF
>>
>>
>> On Thu, Nov 13, 2014 at 1:52 PM, FF <fusionfoto@gmail.com> wrote:
>>
>>>
>>> What knob do I need to turn to address this?
>>>
>>> This em0 is in an LACP bundle with an igb0 that isn't showing this problem.
>>>
>>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.3.8
>>> dev.em.0.%driver: em
>>> dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.GLAN
>>> dev.em.0.%pnpinfo: vendor=0x8086 device=0x153b subvendor=0x15d9 subdevice=0x153b class=0x020000
>>> dev.em.0.%parent: pci0
>>> dev.em.0.nvm: -1
>>> dev.em.0.debug: -1
>>> dev.em.0.fc: 3
>>> dev.em.0.rx_int_delay: 0
>>> dev.em.0.tx_int_delay: 66
>>> dev.em.0.rx_abs_int_delay: 66
>>> dev.em.0.tx_abs_int_delay: 66
>>> dev.em.0.itr: 488
>>> dev.em.0.rx_processing_limit: 100
>>> dev.em.0.eee_control: 1
>>> dev.em.0.link_irq: 0
>>> dev.em.0.mbuf_alloc_fail: 52
>>> dev.em.0.cluster_alloc_fail: 0
>>> dev.em.0.dropped: 0
>>> **
>>> dev.em.0.tx_dma_fail: 1834648
>>> dev.em.0.rx_overruns: 3109
>>> **
>>> dev.em.0.watchdog_timeouts: 0
>>> dev.em.0.device_control: 1209532992
>>> dev.em.0.rx_control: 67141634
>>> dev.em.0.fc_high_water: 23584
>>> dev.em.0.fc_low_water: 20552
>>> dev.em.0.queue0.txd_head: 577
>>> dev.em.0.queue0.txd_tail: 577
>>> dev.em.0.queue0.tx_irq: 0
>>> dev.em.0.queue0.no_desc_avail: 0
>>> dev.em.0.queue0.rxd_head: 967
>>> dev.em.0.queue0.rxd_tail: 966
>>> dev.em.0.queue0.rx_irq: 0
>>> dev.em.0.mac_stats.excess_coll: 0
>>> dev.em.0.mac_stats.single_coll: 0
>>> dev.em.0.mac_stats.multiple_coll: 0
>>> dev.em.0.mac_stats.late_coll: 0
>>> dev.em.0.mac_stats.collision_count: 0
>>> dev.em.0.mac_stats.symbol_errors: 0
>>> dev.em.0.mac_stats.sequence_errors: 0
>>> dev.em.0.mac_stats.defer_count: 0
>>> dev.em.0.mac_stats.missed_packets: 61094
>>> dev.em.0.mac_stats.recv_no_buff: 60008
>>> dev.em.0.mac_stats.recv_undersize: 0
>>> dev.em.0.mac_stats.recv_fragmented: 0
>>> dev.em.0.mac_stats.recv_oversize: 0
>>> dev.em.0.mac_stats.recv_jabber: 0
>>> dev.em.0.mac_stats.recv_errs: 0
>>> dev.em.0.mac_stats.crc_errs: 0
>>> dev.em.0.mac_stats.alignment_errs: 0
>>> dev.em.0.mac_stats.coll_ext_errs: 0
>>> dev.em.0.mac_stats.xon_recvd: 40226659
>>> dev.em.0.mac_stats.xon_txd: 2132
>>> dev.em.0.mac_stats.xoff_recvd: 40241216
>>> dev.em.0.mac_stats.xoff_txd: 2073563
>>> dev.em.0.mac_stats.total_pkts_recvd: 3219537541
>>> dev.em.0.mac_stats.good_pkts_recvd: 3139008594
>>> dev.em.0.mac_stats.bcast_pkts_recvd: 3953817
>>> dev.em.0.mac_stats.mcast_pkts_recvd: 607157
>>> dev.em.0.mac_stats.rx_frames_64: 0
>>> dev.em.0.mac_stats.rx_frames_65_127: 0
>>> dev.em.0.mac_stats.rx_frames_128_255: 0
>>> dev.em.0.mac_stats.rx_frames_256_511: 0
>>> dev.em.0.mac_stats.rx_frames_512_1023: 0
>>> dev.em.0.mac_stats.rx_frames_1024_1522: 0
>>> dev.em.0.mac_stats.good_octets_recvd: 3527296369841
>>> dev.em.0.mac_stats.good_octets_txd: 14348531993101
>>> dev.em.0.mac_stats.total_pkts_txd: 10735190291
>>> dev.em.0.mac_stats.good_pkts_txd: 10733114595
>>> dev.em.0.mac_stats.bcast_pkts_txd: 14
>>> dev.em.0.mac_stats.mcast_pkts_txd: 54334
>>> dev.em.0.mac_stats.tx_frames_64: 0
>>> dev.em.0.mac_stats.tx_frames_65_127: 0
>>> dev.em.0.mac_stats.tx_frames_128_255: 0
>>> dev.em.0.mac_stats.tx_frames_256_511: 0
>>> dev.em.0.mac_stats.tx_frames_512_1023: 0
>>> dev.em.0.mac_stats.tx_frames_1024_1522: 0
>>> dev.em.0.mac_stats.tso_txd: 902605586
>>> dev.em.0.mac_stats.tso_ctx_fail: 0
>>> dev.em.0.interrupts.asserts: 1392541431
>>> dev.em.0.interrupts.rx_pkt_timer: 0
>>> dev.em.0.interrupts.rx_abs_timer: 0
>>> dev.em.0.interrupts.tx_pkt_timer: 0
>>> dev.em.0.interrupts.tx_abs_timer: 0
>>> dev.em.0.interrupts.tx_queue_empty: 0
>>> dev.em.0.interrupts.tx_queue_min_thresh: 0
>>> dev.em.0.interrupts.rx_desc_min_thresh: 0
>>> dev.em.0.interrupts.rx_overrun: 0
>>> dev.em.0.wake: 0
>>>
>>> dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.10
>>> dev.igb.0.%driver: igb
>>> dev.igb.0.%location: slot=0 function=0 handle=\_SB_.PCI0.RP04.PXSX
>>> dev.igb.0.%pnpinfo: vendor=0x8086 device=0x1533 subvendor=0x15d9 subdevice=0x1533 class=0x020000
>>> dev.igb.0.%parent: pci5
>>> dev.igb.0.nvm: -1
>>> dev.igb.0.enable_aim: 1
>>> dev.igb.0.fc: 3
>>> dev.igb.0.rx_processing_limit: 100
>>> dev.igb.0.dmac: 0
>>> dev.igb.0.eee_disabled: 0
>>> dev.igb.0.link_irq: 33
>>> dev.igb.0.dropped: 0
>>> dev.igb.0.tx_dma_fail: 0
>>> dev.igb.0.rx_overruns: 0
>>> dev.igb.0.watchdog_timeouts: 0
>>> dev.igb.0.device_control: 1209795137
>>> dev.igb.0.rx_control: 71335938
>>> dev.igb.0.interrupt_mask: 4
>>> dev.igb.0.extended_int_mask: 2147483679
>>> dev.igb.0.tx_buf_alloc: 0
>>> dev.igb.0.rx_buf_alloc: 0
>>> dev.igb.0.fc_high_water: 31328
>>> dev.igb.0.fc_low_water: 31312
>>> dev.igb.0.queue0.no_desc_avail: 0
>>> dev.igb.0.queue0.tx_packets: 62464141
>>> dev.igb.0.queue0.rx_packets: 73012939
>>> dev.igb.0.queue0.rx_bytes: 22529663814
>>> dev.igb.0.queue0.lro_queued: 0
>>> dev.igb.0.queue0.lro_flushed: 0
>>> dev.igb.0.queue1.no_desc_avail: 0
>>> dev.igb.0.queue1.tx_packets: 404298046
>>> dev.igb.0.queue1.rx_packets: 307675818
>>> dev.igb.0.queue1.rx_bytes: 185919902229
>>> dev.igb.0.queue1.lro_queued: 0
>>> dev.igb.0.queue1.lro_flushed: 0
>>> dev.igb.0.queue2.no_desc_avail: 0
>>> dev.igb.0.queue2.tx_packets: 3441053015
>>> dev.igb.0.queue2.rx_packets: 5511826751
>>> dev.igb.0.queue2.rx_bytes: 3054219311510
>>> dev.igb.0.queue2.lro_queued: 0
>>> dev.igb.0.queue2.lro_flushed: 0
>>> dev.igb.0.queue3.no_desc_avail: 0
>>> dev.igb.0.queue3.tx_packets: 1047838830
>>> dev.igb.0.queue3.rx_packets: 1987495318
>>> dev.igb.0.queue3.rx_bytes: 2696179247028
>>> dev.igb.0.queue3.lro_queued: 0
>>> dev.igb.0.queue3.lro_flushed: 0
>>> dev.igb.0.mac_stats.excess_coll: 0
>>> dev.igb.0.mac_stats.single_coll: 0
>>> dev.igb.0.mac_stats.multiple_coll: 0
>>> dev.igb.0.mac_stats.late_coll: 0
>>> dev.igb.0.mac_stats.collision_count: 0
>>> dev.igb.0.mac_stats.symbol_errors: 0
>>> dev.igb.0.mac_stats.sequence_errors: 0
>>> dev.igb.0.mac_stats.defer_count: 283811
>>> dev.igb.0.mac_stats.missed_packets: 9449
>>> dev.igb.0.mac_stats.recv_no_buff: 340
>>> dev.igb.0.mac_stats.recv_undersize: 0
>>> dev.igb.0.mac_stats.recv_fragmented: 0
>>> dev.igb.0.mac_stats.recv_oversize: 0
>>> dev.igb.0.mac_stats.recv_jabber: 0
>>> dev.igb.0.mac_stats.recv_errs: 0
>>> dev.igb.0.mac_stats.crc_errs: 0
>>> dev.igb.0.mac_stats.alignment_errs: 0
>>> dev.igb.0.mac_stats.coll_ext_errs: 0
>>> dev.igb.0.mac_stats.xon_recvd: 46255557
>>> dev.igb.0.mac_stats.xon_txd: 261
>>> dev.igb.0.mac_stats.xoff_recvd: 46255994
>>> dev.igb.0.mac_stats.xoff_txd: 7027
>>> dev.igb.0.mac_stats.total_pkts_recvd: 7975033582
>>> dev.igb.0.mac_stats.good_pkts_recvd: 7880001465
>>> dev.igb.0.mac_stats.bcast_pkts_recvd: 5783868
>>> dev.igb.0.mac_stats.mcast_pkts_recvd: 563315
>>> dev.igb.0.mac_stats.rx_frames_64: 28412906
>>> dev.igb.0.mac_stats.rx_frames_65_127: 3310187919
>>> dev.igb.0.mac_stats.rx_frames_128_255: 784920450
>>> dev.igb.0.mac_stats.rx_frames_256_511: 17225962
>>> dev.igb.0.mac_stats.rx_frames_512_1023: 73415350
>>> dev.igb.0.mac_stats.rx_frames_1024_1522: 3665838878
>>> dev.igb.0.mac_stats.good_octets_recvd: 5990356613544
>>> dev.igb.0.mac_stats.good_octets_txd: 46326753008181
>>> dev.igb.0.mac_stats.total_pkts_txd: 33016014138
>>> dev.igb.0.mac_stats.good_pkts_txd: 33016006850
>>> dev.igb.0.mac_stats.bcast_pkts_txd: 834
>>> dev.igb.0.mac_stats.mcast_pkts_txd: 54331
>>> dev.igb.0.mac_stats.tx_frames_64: 30741691
>>> dev.igb.0.mac_stats.tx_frames_65_127: 2174824217
>>> dev.igb.0.mac_stats.tx_frames_128_255: 139804927
>>> dev.igb.0.mac_stats.tx_frames_256_511: 59190261
>>> dev.igb.0.mac_stats.tx_frames_512_1023: 386886648
>>> dev.igb.0.mac_stats.tx_frames_1024_1522: 30224559106
>>> dev.igb.0.mac_stats.tso_txd: 2384636909
>>> dev.igb.0.mac_stats.tso_ctx_fail: 0
>>> dev.igb.0.interrupts.asserts: 4556119857
>>> dev.igb.0.interrupts.rx_pkt_timer: 7879778770
>>> dev.igb.0.interrupts.rx_abs_timer: 0
>>> dev.igb.0.interrupts.tx_pkt_timer: 0
>>> dev.igb.0.interrupts.tx_abs_timer: 0
>>> dev.igb.0.interrupts.tx_queue_empty: 33015268817
>>> dev.igb.0.interrupts.tx_queue_min_thresh: 7880001470
>>> dev.igb.0.interrupts.rx_desc_min_thresh: 0
>>> dev.igb.0.interrupts.rx_overrun: 0
>>> dev.igb.0.host.breaker_tx_pkt: 0
>>> dev.igb.0.host.host_tx_pkt_discard: 0
>>> dev.igb.0.host.rx_pkt: 222702
>>> dev.igb.0.host.breaker_rx_pkts: 0
>>> dev.igb.0.host.breaker_rx_pkt_drop: 0
>>> dev.igb.0.host.tx_good_pkt: 738033
>>> dev.igb.0.host.breaker_tx_pkt_drop: 0
>>> dev.igb.0.host.rx_good_bytes: 5990357073320
>>> dev.igb.0.host.tx_good_bytes: 46326753008181
>>> dev.igb.0.host.length_errors: 0
>>> dev.igb.0.host.serdes_violation_pkt: 0
>>> dev.igb.0.host.header_redir_missed: 0
>>> dev.igb.0.wake: 0
>>>
>>>
>>> hw.em.eee_setting: 1
>>> hw.em.rx_process_limit: 100
>>> hw.em.enable_msix: 1
>>> hw.em.sbp: 0
>>> hw.em.smart_pwr_down: 0
>>> hw.em.txd: 1024
>>> hw.em.rxd: 1024
>>> hw.em.rx_abs_int_delay: 66
>>> hw.em.tx_abs_int_delay: 66
>>> hw.em.rx_int_delay: 0
>>> hw.em.tx_int_delay: 66
>>>
>>> hw.igb.rx_process_limit: 100
>>> hw.igb.num_queues: 0
>>> hw.igb.header_split: 0
>>> hw.igb.buf_ring_size: 4096
>>> hw.igb.max_interrupt_rate: 8000
>>> hw.igb.enable_msix: 1
>>> hw.igb.enable_aim: 1
>>> hw.igb.txd: 1024
>>> hw.igb.rxd: 1024
>>>
>>> FreeBSD systemname.com 9.2-RELEASE-p10 FreeBSD 9.2-RELEASE-p10 #0 r270148M: Mon Aug 18 23:14:36 EDT 2014     root@peta108:/usr/obj/usr/src/sys/CUSTOM10 amd64
>>>
>>> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>          options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>          ether 00:25:90:f2:2d:24
>>>          inet6 fe80::225:90ff:fef2:2d24%em0 prefixlen 64 scopeid 0x2
>>>          nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>          media: Ethernet autoselect (1000baseT <full-duplex>)
>>>          status: active
>>> igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>          options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>          ether 00:25:90:f2:2d:24
>>>          inet6 fe80::225:90ff:fef2:2d25%igb0 prefixlen 64 scopeid 0x4
>>>          nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>          media: Ethernet autoselect (1000baseT <full-duplex>)
>>>          status: active
>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>          options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
>>>          inet6 ::1 prefixlen 128
>>>          inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
>>>          inet 127.0.0.1 netmask 0xff000000
>>>          nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>> lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>          options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>          ether 00:25:90:f2:2d:24
>>>          inet 192.168.0.108 netmask 0xffffff00 broadcast 192.168.0.255
>>>          inet6 fe80::225:90ff:fef2:2d24%lagg0 prefixlen 64 scopeid 0x8
>>>          nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>          media: Ethernet autoselect
>>>          status: active
>>>          laggproto lacp lagghash l2,l3,l4
>>>          laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>>>          laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
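
The ifconfig output above shows TSO4 in the options list of em0, igb0 and lagg0. When auditing several machines, a small parser can list which interfaces still have it enabled. This is a minimal Python sketch with a hypothetical function name, assuming the usual FreeBSD layout of an `ifname: flags=` header line followed by an `options=` line.

```python
import re


def interfaces_with_option(ifconfig_text: str, option: str = "TSO4") -> list:
    """Return the interfaces whose options=<...> list contains `option`."""
    found = []
    current = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            continue
        # Anchoring at the line start skips the separate "nd6 options=" lines.
        opts = re.match(r"^\s*options=[0-9a-f]+<([^>]*)>", line)
        if opts and current and option in opts.group(1).split(","):
            found.append(current)
    return found


if __name__ == "__main__":
    sample = (
        "em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
        "\toptions=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>\n"
        "\tnd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>\n"
        "lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384\n"
        "\toptions=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>\n"
    )
    print(interfaces_with_option(sample))
```

Feeding it the full `ifconfig` output of a host gives a quick list of candidates for `ifconfig <ifname> -tso`.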
>>>
>>> Thanks in advance!
>>>
>>> --
>>> FF
>>>
>>
>>
>>
>> --
>> FF
>> _______________________________________________
>> freebsd-questions@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
>> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
>
>


-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/


