Date:      Thu, 15 Apr 2010 16:50:21 GMT
From:      AD <tempo@kgs.ru>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   i386/145728: Stops working lagg between two servers.
Message-ID:  <201004151650.o3FGoLTA035635@www.freebsd.org>
Resent-Message-ID: <201004151700.o3FH0CJk063549@freefall.freebsd.org>


>Number:         145728
>Category:       i386
>Synopsis:       Stops working lagg between two servers.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Apr 15 17:00:12 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     AD
>Release:        7.2-RELEASE-p6 and 7.2-STABLE
>Organization:
ad
>Environment:
FreeBSD 1 7.2-RELEASE-p6 FreeBSD 7.2-RELEASE-p6 #1: Wed Mar 17 22:31:00 KRAT 2010     root@1:/usr/obj/usr/src/sys/1  i386


FreeBSD 2 7.2-STABLE FreeBSD 7.2-STABLE #8: Thu Apr  1 02:06:36 KRAST 2010     root@2:/usr/obj/usr/src/sys/2  i386

>Description:
There are two servers, each with four network cards. On each server, two of the cards are aggregated into a lagg interface.

After a few days the lagg stops working:
Server 1:
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1b:21:3b:4d:4d
        inet 1.1.1.1 netmask 0xffffffc0 broadcast 1.1.1.255
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: em3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em2 flags=4<ACTIVE>

ifconfig em2
em2: flags=9c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,LINK0,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1b:21:3b:4d:4d
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        lagg: laggdev lagg0


#less /var/run/dmesg.boot | grep em2
em2: <Intel(R) PRO/1000 Network Connection 6.9.6.Yandex[$Revision: 1.36.2.17 $]> port 0x3000-0x301f mem 0xd3180000-0xd319ffff,0xd3100000-0xd317ffff,0xd31a0000-0xd31a3fff irq 16 at device 0.0 on pci2
em2: Using MSIX interrupts
em2: Using TXD_LOW instead of TXDW
em2: [FILTER]
em2: [FILTER]
em2: [FILTER]
em2: Ethernet address: 00:1b:21:3b:4d:4d


em2@pci0:2:0:0: class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet

em3@pci0:4:0:0: class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet


Server 2:
lagg1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1b:21:1b:19:5d
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: em4 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em1 flags=18<COLLECTING,DISTRIBUTING>

em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:1b:21:1b:19:5d
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        lagg: laggdev lagg1

# less /var/run/dmesg.boot |grep em1
em1: <Intel(R) PRO/1000 Network Connection 6.9.6.Yandex[$Revision: 1.36.2.17 $]> port 0x4000-0x401f mem 0xd0320000-0xd033ffff,0xd0300000-0xd031ffff irq 16 at device 0.0 on pci3
em1: Using MSI interrupt
em1: Using TXD_LOW instead of TXDW
em1: [FILTER]
em1: Ethernet address: 00:1b:21:1b:19:5d


em1@pci0:3:0:0: class=0x020000 card=0x10838086 chip=0x10b98086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82572EI PRO/1000 PT Desktop Adapter (Copper)'
    class      = network
    subclass   = ethernet
em4@pci0:5:0:0: class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
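
The laggport flag words in the ifconfig output above can be decoded with a small helper. This is only a sketch: the bit values (0x04, 0x08, 0x10) are inferred from the flag words and names printed in this report (1c, 4 and 18), and the function name decode_laggport_flags is made up for illustration.

```sh
#!/bin/sh
# Decode a lagg port flag word as printed by ifconfig, e.g. "1c".
# Bit values are an assumption inferred from the output in this report:
#   1c -> ACTIVE,COLLECTING,DISTRIBUTING   4 -> ACTIVE   18 -> COLLECTING,DISTRIBUTING
decode_laggport_flags() {
    flags=$((0x$1))          # the argument is a hex string without the 0x prefix
    out=""
    if [ $((flags & 0x04)) -ne 0 ]; then out="ACTIVE"; fi
    if [ $((flags & 0x08)) -ne 0 ]; then out="${out:+$out,}COLLECTING"; fi
    if [ $((flags & 0x10)) -ne 0 ]; then out="${out:+$out,}DISTRIBUTING"; fi
    echo "${out:-none}"
}

decode_laggport_flags 1c   # prints ACTIVE,COLLECTING,DISTRIBUTING
decode_laggport_flags 4    # prints ACTIVE
decode_laggport_flags 18   # prints COLLECTING,DISTRIBUTING
```

Read this way, em2 on server 1 is ACTIVE but neither collecting nor distributing, while em1 on server 2 is collecting and distributing without the ACTIVE bit, so one port on each side has dropped out of the normal three-flag state.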


Error log:
Apr 16 00:27:31 2 kernel: em4: link state changed to UP
Apr 16 00:27:34 2 kernel: em4: watchdog timeout -- resetting
Apr 16 00:27:34 2 kernel: em4: Excessive collisions = 0
Apr 16 00:27:34 2 kernel: em4: Sequence errors = 0
Apr 16 00:27:34 2 kernel: em4: Defer count = 0
Apr 16 00:27:34 2 kernel: em4: Missed Packets = 1217754
Apr 16 00:27:34 2 kernel: em4: Receive No Buffers = 0
Apr 16 00:27:34 2 kernel: em4: Receive Length Errors = 0
Apr 16 00:27:34 2 kernel: em4: Receive errors = 0
Apr 16 00:27:34 2 kernel: em4: Crc errors = 0
Apr 16 00:27:34 2 kernel: em4: Alignment errors = 0
Apr 16 00:27:34 2 kernel: em4: Collision/Carrier extension errors = 0
Apr 16 00:27:34 2 kernel: em4: RX overruns = 0
Apr 16 00:27:34 2 kernel: em4: watchdog timeouts = 143
Apr 16 00:27:34 2 kernel: em4: RX MSIX IRQ = 1654280804 TX MSIX IRQ = 1491971579 LINK MSIX IRQ = 1214367
Apr 16 00:27:34 2 kernel: em4: XON Rcvd = 203508246
Apr 16 00:27:34 2 kernel: em4: XON Xmtd = 3183073363
Apr 16 00:27:34 2 kernel: em4: XOFF Rcvd = 202792650
Apr 16 00:27:34 2 kernel: em4: XOFF Xmtd = 3170508497
Apr 16 00:27:34 2 kernel: em4: Good Packets Rcvd = 108209172443
Apr 16 00:27:34 2 kernel: em4: Good Packets Xmtd = 113645818564
Apr 16 00:27:34 2 kernel: em4: TSO Contexts Xmtd = 0
Apr 16 00:27:34 2 kernel: em4: TSO Contexts Failed = 0
Apr 16 00:27:34 2 kernel: em4: Adapter hardware address = 0xc52a0218
Apr 16 00:27:34 2 kernel: em4: CTRL = 0x58100248 RCTL = 0x801a
Apr 16 00:27:34 2 kernel: em4: Packet buffer = Tx=20k Rx=20k
Apr 16 00:27:34 2 kernel: em4: Flow control watermarks high = 18432 low = 16932
Apr 16 00:27:34 2 kernel: em4: tx_int_delay = 0, tx_abs_int_delay = 64
Apr 16 00:27:34 2 kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 66
Apr 16 00:27:34 2 kernel: em4: fifo workaround = 0, fifo_reset_count = 0
Apr 16 00:27:34 2 kernel: em4: hw tdh = 0, hw tdt = 1
Apr 16 00:27:34 2 kernel: em4: hw rdh = 0, hw rdt = 4095, next_rx_desc_to_check = 0
Apr 16 00:27:34 2 kernel: em4: Num Tx descriptors avail = 4095
Apr 16 00:27:34 2 kernel: em4: Tx Descriptors not avail1 = 12063
Apr 16 00:27:34 2 kernel: em4: Tx Descriptors not avail2 = 0
Apr 16 00:27:34 2 kernel: em4: Std mbuf failed = 0
Apr 16 00:27:34 2 kernel: em4: Std mbuf cluster failed = 6
Apr 16 00:27:34 2 kernel: em4: Driver dropped packets = 0
Apr 16 00:27:34 2 kernel: em4: Driver tx dma failure in encap = 0
Apr 16 00:27:34 2 kernel: em4: Packets pended due to reorder = 0
Apr 16 00:27:34 2 kernel: em4: RX interrupts has been masked = 77251713
Apr 16 00:27:34 2 kernel: em4: TX interrupts has been generated = 0
Apr 16 00:27:34 2 kernel: em4: link state changed to DOWN
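
Most counters in the dump above are zero, so it may help to filter it down to the non-zero ones. The following is a rough sketch, not part of the driver or of any FreeBSD tool: nonzero_counters is a hypothetical helper that reads syslog lines on stdin and keeps only the simple "name = integer" lines for the given interface. Lines carrying several values (for example the MSIX IRQ line) or hex values are skipped on purpose.

```sh
#!/bin/sh
# Print the non-zero "name = value" counters for one interface.
# Usage: nonzero_counters em4 < /var/log/messages
nonzero_counters() {
    awk -v ifname="$1" '
        # Match "<ifname>: <name> = <decimal integer>" at the end of the line.
        match($0, ifname ": [^=]+ = [0-9]+$") {
            skip = length(ifname) + 2              # length of "<ifname>: "
            rec = substr($0, RSTART + skip, RLENGTH - skip)
            split(rec, kv, " = ")                  # kv[1] = name, kv[2] = value
            if (kv[2] + 0 != 0)
                print kv[1] " = " kv[2]
        }'
}
```

Applied to the log above, this would keep lines such as "Missed Packets = 1217754" and "watchdog timeouts = 143" and drop the zero-valued error counters.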


tcpdump -i em4
00:47:06.511867 LACPv1, length: 110
00:47:36.997247 LACPv1, length: 110



After a reboot everything works normally again for some time.


>How-To-Repeat:
Connect two servers directly to each other through lagg.
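
A setup of this kind might look roughly like the sketch below. The interface names, address and netmask are taken from the server 1 output in the Description; the exact commands used on the affected machines are not given in this report, so treat this as an assumption. The helper lagg_setup_cmds is hypothetical and only prints the ifconfig commands so they can be reviewed before being run.

```sh
#!/bin/sh
# Print the ifconfig commands for an LACP lagg over the given ports.
# Usage: lagg_setup_cmds <lagg> <address> <netmask> <port>...
lagg_setup_cmds() {
    lagg=$1; addr=$2; mask=$3
    shift 3
    echo "ifconfig $lagg create"
    ports=""
    for p in "$@"; do
        echo "ifconfig $p up"            # member ports must be up
        ports="$ports laggport $p"
    done
    echo "ifconfig $lagg laggproto lacp$ports $addr netmask $mask"
}

# Server 1 as shown in the Description (em2 + em3 in lagg0):
lagg_setup_cmds lagg0 1.1.1.1 0xffffffc0 em2 em3
```

To apply the commands on a FreeBSD host, the output can be piped to sh as root. The second server would be set up the same way with its own lagg interface, ports and address.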
>Fix:
None so far; only a reboot helps :(

>Release-Note:
>Audit-Trail:
>Unformatted:


