Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 24 Jun 2013 11:32:46 GMT
From:      Boris Astardzhiev <boris.astardzhiev@gmail.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/179926: LACP: active aggregator selection bug
Message-ID:  <201306241132.r5OBWkdo085748@oldred.freebsd.org>
Resent-Message-ID: <201306241140.r5OBe0s5010521@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         179926
>Category:       kern
>Synopsis:       LACP: active aggregator selection bug
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 24 11:40:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Boris Astardzhiev
>Release:        FreeBSD 9.1-RELEASE #0 r243826
>Organization:
Smartcom Bulgaria AD
>Environment:
FreeBSD freebsd91 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243826: Tue Dec  4 06:55:39 UTC 2012     root@obrian.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
Hi,

I've been investigating the LACP implementation in FreeBSD and have
encountered a bug. Here's the set:

"                      ---------          ----------              "
"                      | lagg1 |          | bond0  |              "
"   ---------          |      xl0--------eth0      |   ---------  "
"   | hosts |----b1----1 FBSD rl0--------eth1 Linux|---| hosts |  "
"   ---------          |  9.1 rl1--------eth2      |   ---------  "
"                      |       |          |        |              "
"                      ---------          ----------              "

On a FreeBSD 9.1-RELEASE #0 r243826 system a lagg is created and three 
interfaces are added to it:
- xl0
- rl0
- rl1

On a Linux system a bonding interface is added *ONLY ONE* interface:
- eth0
Note: I think the Linux may be substituted with any other LACP implementation.

The lagg protocol on both of the systems is LACP.

LACPDUs transmission/reception takes place only between xl0 and eth0.
Here's the result:

root@freebsd91:/root # ifconfig lagg1
lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=2008<VLAN_MTU,WOL_MAGIC>
	ether 00:10:b5:7f:97:fb
	inet6 fe80::210:b5ff:fe7f:97fb%lagg1 prefixlen 64 scopeid 0x9 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: active
	laggproto lacp lagghash l2,l3,l4
	laggport: xl0 flags=18<COLLECTING,DISTRIBUTING>
	laggport: rl1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: rl0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

I consider that xl0 is the only available link therefor the aggregation must
rely on it. However the lacp implementation has chosen the other two links
that haven't received a single LACPDU.

I think the problem is related to the selection of best active aggregator -
in lacp_select_active_aggregator(). I've attached the debug output of
sysctl net.lacp_debug.

.. snippet ...
Jun 24 10:41:43 freebsd91 kernel: xl0: new pstate 3f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Jun 24 10:41:43 freebsd91 kernel: rl0: lacp_sm_mux: state 4
Jun 24 10:41:43 freebsd91 kernel: rl1: lacp_sm_mux: state 4
Jun 24 10:41:43 freebsd91 kernel: xl0: lacp_sm_mux: state 3
Jun 24 10:41:43 freebsd91 kernel: xl0: enable distributing on aggregator [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,E0-8F-EC-00-B5-2F,0009,0000,0000)], nports 0 -> 1
Jun 24 10:41:43 freebsd91 kernel: lacp_select_active_aggregator
Jun 24 10:41:43 freebsd91 kernel: [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,00-00-00-00-00-00,0000,0000,0000)], speed=200000000, nports=2
Jun 24 10:41:43 freebsd91 kernel: [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,E0-8F-EC-00-B5-2F,0009,0000,0000)], speed=100000000, nports=1
Jun 24 10:41:43 freebsd91 kernel: active aggregator not changed
Jun 24 10:41:43 freebsd91 kernel: new [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,00-00-00-00-00-00,0000,0000,0000)]
Jun 24 10:41:43 freebsd91 kernel: xl0: mux_state 3 -> 4
Jun 24 10:41:43 freebsd91 kernel: xl0: lacpdu transmit
.. snippet ...

Though there is an aggregator with an active partner the implementation has chosen the other aggregator:
Jun 24 10:41:43 freebsd91 kernel: new [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,00-00-00-00-00-00,0000,0000,0000)]

Do you think that such aggregators must be skipped in favour of aggregators with active partners? I've applied a patch that fixes this issue and xl0 remains the only active link but I'm not sure it is correct and it has the correct approach.

Any comments are appreciated.

Greetings,
Boris Astardzhiev,
Smartcom Bulgaria AD
>How-To-Repeat:
Follow the described set and the bug is reproduced.
>Fix:
A patch is attached.

Patch attached with submission follows:

diff --git a/sys/net/ieee8023ad_lacp.c b/sys/net/ieee8023ad_lacp.c
index 70de743..5060223 100644
--- a/sys/net/ieee8023ad_lacp.c
+++ b/sys/net/ieee8023ad_lacp.c
@@ -947,6 +947,7 @@ lacp_select_active_aggregator(struct lacp_softc *lsc)
 	struct lacp_aggregator *best_la = NULL;
 	uint64_t best_speed = 0;
 	char buf[LACP_LAGIDSTR_MAX+1];
+	u_char zero_mac[] = { 0, 0, 0, 0, 0, 0 };
 
 	LACP_TRACE(NULL);
 
@@ -957,6 +958,13 @@ lacp_select_active_aggregator(struct lacp_softc *lsc)
 			continue;
 		}
 
+		/*
+		 * Skip aggregators that has no partner.
+		 */
+		if (!memcmp(LACP_SYS_MAC(la->la_partner),
+		    zero_mac, sizeof(zero_mac)))
+			continue;
+
 		speed = lacp_aggregator_bandwidth(la);
 		LACP_DPRINTF((NULL, "%s, speed=%jd, nports=%d\n",
 		    lacp_format_lagid_aggregator(la, buf, sizeof(buf)),
diff --git a/sys/net/ieee8023ad_lacp.h b/sys/net/ieee8023ad_lacp.h
index 9481ce2..4076513 100644
--- a/sys/net/ieee8023ad_lacp.h
+++ b/sys/net/ieee8023ad_lacp.h
@@ -264,6 +264,7 @@ struct lacp_softc {
 	((((s1) ^ (s2)) & (mask)) == 0)
 
 #define	LACP_SYS_PRI(peer)	(peer).lip_systemid.lsi_prio
+#define LACP_SYS_MAC(peer)	(peer).lip_systemid.lsi_mac
 
 #define	LACP_PORT(_lp)	((struct lacp_port *)(_lp)->lp_psc)
 #define	LACP_SOFTC(_sc)	((struct lacp_softc *)(_sc)->sc_psc)


>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201306241132.r5OBWkdo085748>