Date:      Sun, 19 Jul 2009 16:21:58 -0500
From:      Matthew Grooms <mgrooms@shrew.net>
To:        freebsd-net@freebsd.org
Cc:        max@love2party.net
Subject:   FreeBSD + carp on VMWare ESX
Message-ID:  <4A638E76.2060706@shrew.net>

Hi all,

I was having problems running carp on VMWare ESX 4 and did a little 
investigative work to determine the cause. There are several posts on 
the VMWare forums from other users having the same difficulty, so I 
know it's not just me :)

In any case, for carp to have a chance of working on ESX you have to 
enable promiscuous mode on the vSwitch the port group is associated 
with. But after doing this, carp interfaces immediately go into BACKUP 
state. If the net.inet.carp.allow sysctl is set to 0, then they immediately 
move into a MASTER state. Of course this isn't useful if you actually 
want carp to work. tcpdump output showed multiple copies of the carp 
packets being bounced back to the host that emitted them. This made me 
suspect that the host was seeing its own advertisement, evaluating it as 
being sent by another host and placing its own carp interface into a 
BACKUP state as a result.
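
The suspected failure mode can be sketched in userland C: an advertisement whose source address matches one of the IPv4 addresses configured on the receiving interface is the host's own packet reflected back, and should be ignored rather than treated as coming from a peer. The function name `carp_src_is_local` and the flat address array are hypothetical, chosen only for illustration; the kernel keeps addresses on a per-interface list instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>		/* struct in_addr */

/*
 * Hypothetical helper (not a kernel function): return true when src
 * equals one of the n IPv4 addresses configured on the interface.
 */
bool
carp_src_is_local(const struct in_addr *addrs, size_t n, struct in_addr src)
{
	size_t i;

	assert(addrs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* both values are in network byte order: compare directly */
		if (addrs[i].s_addr == src.s_addr)
			return (true);
	}
	return (false);
}
```

A caller would fill `addrs` from the interface's configured addresses and drop the advertisement whenever the function returns true.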

To solve this, my first inclination was to add a pf rule to block all 
inbound carp traffic from itself for a given interface. Unfortunately, 
that didn't seem to work for some reason. I ended up writing a small 
kernel patch that basically does the same thing (IPv4 only), which does 
work without any problem that I can see. Unfortunately I don't have much 
experience with the FreeBSD kernel, so I assume that it's not safe to walk 
the interface address list without holding the appropriate lock.
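
The locking concern can be illustrated with a userland model: the address list is only walked while a mutex protecting it is held, and the lock is released on every path out of the walk. Everything here (`struct iface`, `struct addr_entry`, `iface_has_addr`, the pthread mutex) is a hypothetical stand-in, not kernel code. The kernel equivalent is presumably the per-interface address lock (`IF_ADDR_LOCK`/`IF_ADDR_UNLOCK` on this branch, stated here as an assumption that should be checked against the tree).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical userland stand-ins for struct ifaddr and struct ifnet. */
struct addr_entry {
	uint32_t		addr;		/* IPv4 address, network order */
	TAILQ_ENTRY(addr_entry)	link;
};

struct iface {
	pthread_mutex_t		 addr_mtx;	/* protects addrs */
	TAILQ_HEAD(, addr_entry) addrs;
};

void
iface_init(struct iface *ifp)
{
	pthread_mutex_init(&ifp->addr_mtx, NULL);
	TAILQ_INIT(&ifp->addrs);
}

void
iface_add(struct iface *ifp, struct addr_entry *e)
{
	pthread_mutex_lock(&ifp->addr_mtx);
	TAILQ_INSERT_TAIL(&ifp->addrs, e, link);
	pthread_mutex_unlock(&ifp->addr_mtx);
}

/* Return true if src is on the list; the walk happens under the lock. */
bool
iface_has_addr(struct iface *ifp, uint32_t src)
{
	struct addr_entry *e;
	bool found = false;

	assert(ifp != NULL);
	pthread_mutex_lock(&ifp->addr_mtx);
	TAILQ_FOREACH(e, &ifp->addrs, link) {
		if (e->addr == src) {
			found = true;
			break;
		}
	}
	/* single exit point, so the lock is dropped on every path */
	pthread_mutex_unlock(&ifp->addr_mtx);
	return (found);
}
```

Recording the result in `found` and unlocking once avoids the easy mistake of returning from inside the loop with the lock still held.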

Would someone please have a look at this? I really need this to work in 
a production system. Others would likely be very happy to have this work 
as well, even if they have to apply a patch.

Thanks in advance,

-Matthew

Index: ip_carp.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip_carp.c,v
retrieving revision 1.52.2.3
diff -u -r1.52.2.3 ip_carp.c
--- ip_carp.c	9 May 2009 00:35:38 -0000	1.52.2.3
+++ ip_carp.c	19 Jul 2009 20:12:49 -0000
@@ -533,7 +533,9 @@
 {
 	struct ip *ip = mtod(m, struct ip *);
 	struct carp_header *ch;
-	int iplen, len;
+	struct ifnet *ifp = m->m_pkthdr.rcvif;
+	struct ifaddr *ifa;
+	int len, iplen;
 
 	carpstats.carps_ipackets++;
 
@@ -543,21 +545,39 @@
 	}
 
 	/* check if received on a valid carp interface */
-	if (m->m_pkthdr.rcvif->if_carp == NULL) {
+	if (ifp->if_carp == NULL) {
 		carpstats.carps_badif++;
 		CARP_LOG("carp_input: packet received on non-carp "
 		    "interface: %s\n",
-		    m->m_pkthdr.rcvif->if_xname);
+		    ifp->if_xname);
 		m_freem(m);
 		return;
 	}
 
+	/*
+	 * drop advertisements whose source address is
+	 * assigned to the interface they arrived on. this
+	 * tends to happen with VMWare ESX vSwitches.
+	 */
+	TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
+		if (ifa->ifa_addr->sa_family != AF_INET)
+			continue;
+		if (ifatoia(ifa)->ia_addr.sin_addr.s_addr ==
+		    ip->ip_src.s_addr) {
+			m_freem(m);
+			return;
+		}
+	}
+
 	/* verify that the IP TTL is 255.  */
 	if (ip->ip_ttl != CARP_DFLTTL) {
 		carpstats.carps_badttl++;
 		CARP_LOG("carp_input: received ttl %d != 255i on %s\n",
 		    ip->ip_ttl,
-		    m->m_pkthdr.rcvif->if_xname);
+		    ifp->if_xname);
 		m_freem(m);
 		return;
 	}
@@ -592,7 +612,7 @@
 		carpstats.carps_badlen++;
 		CARP_LOG("carp_input: packet too short %d on %s\n",
 		    m->m_pkthdr.len,
-		    m->m_pkthdr.rcvif->if_xname);
+		    ifp->if_xname);
 		m_freem(m);
 		return;
 	}
@@ -609,7 +629,7 @@
 	if (carp_cksum(m, len - iplen)) {
 		carpstats.carps_badsum++;
 		CARP_LOG("carp_input: checksum failed on %s\n",
-		    m->m_pkthdr.rcvif->if_xname);
+		    ifp->if_xname);
 		m_freem(m);
 		return;
 	}
