Date: Mon, 18 Jul 2011 17:11:18 +0100
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Steven Hartland" <killing@multiplay.co.uk>, "Kevin Oberman" <kob6558@gmail.com>, "Vogel, Jack" <jack.vogel@intel.com>
Cc: freebsd-net@freebsd.org
Subject: Re: high bandwidth tcp connection stalls on igb (was: igb enable_aim or flow_control causing tcp stalls?)
Message-ID: <22837855EB8D495BB30524767F04B5FF@multiplay.co.uk>
References: <379885BA631F4C7787C24E00A174B429@multiplay.co.uk> <CAN6yY1s=o=Fd5mgkjrPPqeKmtCXVeDbFrEUqJZk85ccBBa2X4Q@mail.gmail.com> <CEE01EEE7538428F8CB9A764CF0EFFFA@multiplay.co.uk>
Confirmed with a blade-to-blade transfer.

I also noticed that if two transfers are running at the same time, both stall,
not just one. ssh consoles don't seem to be affected, only high-volume transfers
like scp and rsync. It also seems that the more active connections there are,
the more likely a stall is.

Another thing I've noticed, looking at the trace from the source host in
wireshark, is a large number of "TCP ACKed lost segment" entries interspersed
with 8 - 16K IP packets, starting just after the ssh handshake. Might this be
relevant?

I've tried with as many hardware options disabled as I could find, but no change:

-tso -rxcsum -txcsum -lro -vlanhwtag
net.inet.tcp.tso=0
dev.igb.0.enable_aim=0
dev.igb.0.flow_control=0
dev.igb.1.enable_aim=0
dev.igb.1.flow_control=0

Here are the stats from the suspect device, which has just stalled again.
Jack, does anything look suspect here, and do you have any ideas what this might be?

dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.0.7
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=0 function=0
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10e7 subvendor=0x15d9 subdevice=0x10e7 class=0x020000
dev.igb.0.%parent: pci5
dev.igb.0.nvm: -1
dev.igb.0.flow_control: 0
dev.igb.0.enable_aim: 0
dev.igb.0.rx_processing_limit: 100
dev.igb.0.link_irq: 4
dev.igb.0.dropped: 0
dev.igb.0.tx_dma_fail: 0
dev.igb.0.rx_overruns: 0
dev.igb.0.watchdog_timeouts: 0
dev.igb.0.device_control: 14424641
dev.igb.0.rx_control: 67141634
dev.igb.0.interrupt_mask: 4
dev.igb.0.extended_int_mask: 2147484159
dev.igb.0.tx_buf_alloc: 0
dev.igb.0.rx_buf_alloc: 0
dev.igb.0.fc_high_water: 58976
dev.igb.0.fc_low_water: 58960
dev.igb.0.queue0.interrupt_rate: 8000
dev.igb.0.queue0.txd_head: 266
dev.igb.0.queue0.txd_tail: 266
dev.igb.0.queue0.no_desc_avail: 0
dev.igb.0.queue0.tx_packets: 9462610
dev.igb.0.queue0.rxd_head: 891
dev.igb.0.queue0.rxd_tail: 890
dev.igb.0.queue0.rx_packets: 15326075
dev.igb.0.queue0.rx_bytes: 19146964251
dev.igb.0.queue0.lro_queued: 0
dev.igb.0.queue0.lro_flushed: 0
dev.igb.0.queue1.interrupt_rate: 8000
dev.igb.0.queue1.txd_head: 225
dev.igb.0.queue1.txd_tail: 225
dev.igb.0.queue1.no_desc_avail: 0
dev.igb.0.queue1.tx_packets: 15985904
dev.igb.0.queue1.rxd_head: 999
dev.igb.0.queue1.rxd_tail: 998
dev.igb.0.queue1.rx_packets: 25696231
dev.igb.0.queue1.rx_bytes: 32902117763
dev.igb.0.queue1.lro_queued: 0
dev.igb.0.queue1.lro_flushed: 0
dev.igb.0.queue2.interrupt_rate: 8000
dev.igb.0.queue2.txd_head: 157
dev.igb.0.queue2.txd_tail: 157
dev.igb.0.queue2.no_desc_avail: 0
dev.igb.0.queue2.tx_packets: 12697405
dev.igb.0.queue2.rxd_head: 778
dev.igb.0.queue2.rxd_tail: 777
dev.igb.0.queue2.rx_packets: 20780810
dev.igb.0.queue2.rx_bytes: 26096219675
dev.igb.0.queue2.lro_queued: 0
dev.igb.0.queue2.lro_flushed: 0
dev.igb.0.queue3.interrupt_rate: 8000
dev.igb.0.queue3.txd_head: 242
dev.igb.0.queue3.txd_tail: 242
dev.igb.0.queue3.no_desc_avail: 0
dev.igb.0.queue3.tx_packets: 11831167
dev.igb.0.queue3.rxd_head: 111
dev.igb.0.queue3.rxd_tail: 110
dev.igb.0.queue3.rx_packets: 18590831
dev.igb.0.queue3.rx_bytes: 25894011731
dev.igb.0.queue3.lro_queued: 0
dev.igb.0.queue3.lro_flushed: 0
dev.igb.0.queue4.interrupt_rate: 8000
dev.igb.0.queue4.txd_head: 841
dev.igb.0.queue4.txd_tail: 841
dev.igb.0.queue4.no_desc_avail: 0
dev.igb.0.queue4.tx_packets: 13540958
dev.igb.0.queue4.rxd_head: 835
dev.igb.0.queue4.rxd_tail: 834
dev.igb.0.queue4.rx_packets: 21880643
dev.igb.0.queue4.rx_bytes: 28291440234
dev.igb.0.queue4.lro_queued: 0
dev.igb.0.queue4.lro_flushed: 0
dev.igb.0.queue5.interrupt_rate: 8000
dev.igb.0.queue5.txd_head: 941
dev.igb.0.queue5.txd_tail: 941
dev.igb.0.queue5.no_desc_avail: 0
dev.igb.0.queue5.tx_packets: 11124540
dev.igb.0.queue5.rxd_head: 214
dev.igb.0.queue5.rxd_tail: 213
dev.igb.0.queue5.rx_packets: 18048214
dev.igb.0.queue5.rx_bytes: 22957384083
dev.igb.0.queue5.lro_queued: 0
dev.igb.0.queue5.lro_flushed: 0
dev.igb.0.queue6.interrupt_rate: 8000
dev.igb.0.queue6.txd_head: 782
dev.igb.0.queue6.txd_tail: 783
dev.igb.0.queue6.no_desc_avail: 0
dev.igb.0.queue6.tx_packets: 13581988
dev.igb.0.queue6.rxd_head: 504
dev.igb.0.queue6.rxd_tail: 503
dev.igb.0.queue6.rx_packets: 21590520
dev.igb.0.queue6.rx_bytes: 29030489548
dev.igb.0.queue6.lro_queued: 0
dev.igb.0.queue6.lro_flushed: 0
dev.igb.0.queue7.interrupt_rate: 8000
dev.igb.0.queue7.txd_head: 961
dev.igb.0.queue7.txd_tail: 961
dev.igb.0.queue7.no_desc_avail: 0
dev.igb.0.queue7.tx_packets: 14163482
dev.igb.0.queue7.rxd_head: 38
dev.igb.0.queue7.rxd_tail: 37
dev.igb.0.queue7.rx_packets: 23149606
dev.igb.0.queue7.rx_bytes: 29114500225
dev.igb.0.queue7.lro_queued: 0
dev.igb.0.queue7.lro_flushed: 0
dev.igb.0.mac_stats.excess_coll: 0
dev.igb.0.mac_stats.single_coll: 0
dev.igb.0.mac_stats.multiple_coll: 0
dev.igb.0.mac_stats.late_coll: 0
dev.igb.0.mac_stats.collision_count: 0
dev.igb.0.mac_stats.symbol_errors: 0
dev.igb.0.mac_stats.sequence_errors: 0
dev.igb.0.mac_stats.defer_count: 0
dev.igb.0.mac_stats.missed_packets: 0
dev.igb.0.mac_stats.recv_no_buff: 0
dev.igb.0.mac_stats.recv_undersize: 0
dev.igb.0.mac_stats.recv_fragmented: 0
dev.igb.0.mac_stats.recv_oversize: 0
dev.igb.0.mac_stats.recv_jabber: 0
dev.igb.0.mac_stats.recv_errs: 0
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.mac_stats.alignment_errs: 0
dev.igb.0.mac_stats.coll_ext_errs: 0
dev.igb.0.mac_stats.xon_recvd: 0
dev.igb.0.mac_stats.xon_txd: 0
dev.igb.0.mac_stats.xoff_recvd: 0
dev.igb.0.mac_stats.xoff_txd: 0
dev.igb.0.mac_stats.total_pkts_recvd: 165067073
dev.igb.0.mac_stats.good_pkts_recvd: 165062852
dev.igb.0.mac_stats.bcast_pkts_recvd: 7827
dev.igb.0.mac_stats.mcast_pkts_recvd: 20
dev.igb.0.mac_stats.rx_frames_64: 18346
dev.igb.0.mac_stats.rx_frames_65_127: 2395695
dev.igb.0.mac_stats.rx_frames_128_255: 6686114
dev.igb.0.mac_stats.rx_frames_256_511: 9501896
dev.igb.0.mac_stats.rx_frames_512_1023: 14475414
dev.igb.0.mac_stats.rx_frames_1024_1522: 131985387
dev.igb.0.mac_stats.good_octets_recvd: 214093372362
dev.igb.0.mac_stats.good_octets_txd: 7388817393
dev.igb.0.mac_stats.total_pkts_txd: 102387885
dev.igb.0.mac_stats.good_pkts_txd: 102387885
dev.igb.0.mac_stats.bcast_pkts_txd: 4
dev.igb.0.mac_stats.mcast_pkts_txd: 0
dev.igb.0.mac_stats.tx_frames_64: 662
dev.igb.0.mac_stats.tx_frames_65_127: 102263884
dev.igb.0.mac_stats.tx_frames_128_255: 44518
dev.igb.0.mac_stats.tx_frames_256_511: 25033
dev.igb.0.mac_stats.tx_frames_512_1023: 18188
dev.igb.0.mac_stats.tx_frames_1024_1522: 35600
dev.igb.0.mac_stats.tso_txd: 0
dev.igb.0.mac_stats.tso_ctx_fail: 0
dev.igb.0.interrupts.asserts: 27233325
dev.igb.0.interrupts.rx_pkt_timer: 165061054
dev.igb.0.interrupts.rx_abs_timer: 0
dev.igb.0.interrupts.tx_pkt_timer: 0
dev.igb.0.interrupts.tx_abs_timer: 165062852
dev.igb.0.interrupts.tx_queue_empty: 102387060
dev.igb.0.interrupts.tx_queue_min_thresh: 0
dev.igb.0.interrupts.rx_desc_min_thresh: 0
dev.igb.0.interrupts.rx_overrun: 0
dev.igb.0.host.breaker_tx_pkt: 0
dev.igb.0.host.host_tx_pkt_discard: 0
dev.igb.0.host.rx_pkt: 1798
dev.igb.0.host.breaker_rx_pkts: 0
dev.igb.0.host.breaker_rx_pkt_drop: 0
dev.igb.0.host.tx_good_pkt: 825
dev.igb.0.host.breaker_tx_pkt_drop: 0
dev.igb.0.host.rx_good_bytes: 214093406063
dev.igb.0.host.tx_good_bytes: 7388817393
dev.igb.0.host.length_errors: 0
dev.igb.0.host.serdes_violation_pkt: 0
dev.igb.0.host.header_redir_missed: 0

----- Original Message -----
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Kevin Oberman" <kob6558@gmail.com>
Cc: <freebsd-net@freebsd.org>
Sent: Monday, July 18, 2011 12:20 PM
Subject: Re: igb enable_aim or flow_control causing tcp stalls?

> ----- Original Message -----
> From: "Kevin Oberman" <kob6558@gmail.com>
>>
>> Use "tcpdump -s0 -w file.pcap host remote-system" to see how it fails. You
>> may want to capture on both ends. Then use wireshark (in ports) to analyze
>> the data.
>>
>> There are other tools to provide other types of analysis, depending on the
>> type of problem.
>
> I've managed to get a capture from both ends, but it doesn't really make too
> much sense to me. You can clearly see the stall, which starts at the 2.1 second
> mark and recovers at the 65 second mark, but what's causing it is a mystery.
>
> I've attached what I believe is the relevant snippet from each trace.
>
> At this point I believe I've eliminated aim and flow_control, as these were
> both off when this test was performed.
>
> Any advice would be appreciated.
>
> The layout for this test was:
> Source (7.0-RELEASE-p2 on em0) -> Cisco 6509 -> supermicro blade -> Target
> (8.2-RELEASE on igb0)
>
> I'm going to try to eliminate the Cisco next by going between two blades
> on the local supermicro blade switch.
>
> Regards
> Steve
>
> ================================================
> This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the
> event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any
> information contained in it.
>
> In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
> or return the E.mail to postmaster@multiplay.co.uk.
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
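
For anyone wanting to check a dump like the dev.igb.0 stats above mechanically rather than by eye, here is a minimal Python sketch. It is not part of the driver or of any FreeBSD tool; the function names and the list of counters treated as "errors" are assumptions chosen for illustration, using only counter names that appear in the dump. It reports queues where txd_head and txd_tail differ and any of the listed counters that are nonzero. In the dump above, queue6 is the only queue where the two differ (782 vs 783), which may simply be a packet in flight at the moment of the snapshot.

```python
import re

# Matches "dev.igb.<unit>.<name>: <integer>", whether the dump has one entry
# per line or has been flattened onto a single line. Non-numeric entries
# such as %desc or %driver are skipped.
SYSCTL_RE = re.compile(r"dev\.igb\.(\d+)\.(\S+): (-?\d+)")

# Counter names taken from the dump; treating a nonzero value as
# "worth a look" is an assumption, not driver documentation.
ERROR_COUNTERS = (
    "dropped",
    "tx_dma_fail",
    "rx_overruns",
    "watchdog_timeouts",
    "mac_stats.missed_packets",
    "mac_stats.recv_no_buff",
    "mac_stats.crc_errs",
)


def parse_igb_sysctl(text, unit=0):
    """Return {name: int_value} for the numeric dev.igb.<unit> entries in text."""
    stats = {}
    for dev_unit, name, value in SYSCTL_RE.findall(text):
        if int(dev_unit) == unit:
            stats[name] = int(value)
    return stats


def queues_with_pending_tx(stats):
    """Return the queue names whose txd_head and txd_tail differ."""
    pending = []
    for name, head in stats.items():
        match = re.fullmatch(r"(queue\d+)\.txd_head", name)
        if match:
            tail = stats.get(match.group(1) + ".txd_tail")
            if tail is not None and tail != head:
                pending.append(match.group(1))
    return sorted(pending)


def nonzero_error_counters(stats):
    """Return the subset of ERROR_COUNTERS that are present and nonzero."""
    return {name: stats[name] for name in ERROR_COUNTERS if stats.get(name, 0) != 0}
```

To use it, feed the text printed by "sysctl dev.igb.0" to parse_igb_sysctl() and pass the result to the other two functions. Taking one snapshot during a stall and one after recovery and comparing the results would show whether a queue stays stuck or a counter moves.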