From owner-freebsd-net@FreeBSD.ORG Sun Jul 19 21:45:58 2009
Message-ID: <4A638E76.2060706@shrew.net>
Date: Sun, 19 Jul 2009 16:21:58 -0500
From: Matthew Grooms
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: freebsd-net@freebsd.org
Content-Type: multipart/mixed; boundary="------------050006090109070805040105"
Cc: max@love2party.net
Subject: FreeBSD + carp on VMWare ESX
List-Id: Networking and TCP/IP with FreeBSD

This is a multi-part message in MIME format.
--------------050006090109070805040105
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi all,

I was having problems running carp on VMWare ESX 4 and did a little investigative work to determine the cause of the problem. There are several posts on the VMWare forums from other users having the same difficulty, so I know it's not just me :)

In any case, for carp to have a chance of working on ESX you have to enable promiscuous mode on the vSwitch the port group is associated with. But after doing this, carp interfaces immediately go into BACKUP state. If net.inet.carp.allow is set to 0, they immediately move into MASTER state. Of course this isn't useful if you actually want carp to work.

tcpdump output showed multiple copies of the carp packets being bounced back to the host that emitted them. This made me suspect that the host was seeing its own advertisement, evaluating it as being sent by another host and placing its own carp interface into BACKUP state as a result.

To solve this, my first inclination was to add a pf rule to block all inbound carp traffic that a host receives from itself on a given interface. Unfortunately, that didn't seem to work for some reason. I ended up writing a small kernel patch that does basically the same thing ( IPv4 only ) and works without any problem that I can see. Unfortunately, I don't have much experience with the FreeBSD kernel, so I assume that it's not safe to walk the interface address list without holding the appropriate lock.

Would someone please have a look at this? I really need this to work in a production system. Others would likely be very happy to have this work as well, even if they have to apply a patch.

Thanks in advance,

-Matthew

--------------050006090109070805040105
Content-Type: text/plain; name="ip_carp.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="ip_carp.c.diff"

Index: ip_carp.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip_carp.c,v
retrieving revision 1.52.2.3
diff -u -r1.52.2.3 ip_carp.c
--- ip_carp.c	9 May 2009 00:35:38 -0000	1.52.2.3
+++ ip_carp.c	19 Jul 2009 20:12:49 -0000
@@ -533,7 +533,9 @@
 {
 	struct ip *ip = mtod(m, struct ip *);
 	struct carp_header *ch;
-	int iplen, len;
+	struct ifnet *ifp = m->m_pkthdr.rcvif;
+	struct ifaddr *ifa;
+	int len, iplen;
 
 	carpstats.carps_ipackets++;
 
@@ -543,21 +545,39 @@
 	}
 
 	/* check if received on a valid carp interface */
-	if (m->m_pkthdr.rcvif->if_carp == NULL) {
+	if (ifp->if_carp == NULL) {
 		carpstats.carps_badif++;
 		CARP_LOG("carp_input: packet received on non-carp "
 		    "interface: %s\n",
-		    m->m_pkthdr.rcvif->if_xname);
+		    ifp->if_xname);
 		m_freem(m);
 		return;
 	}
 
+	/*
+	 * verify that the source address is not valid
+	 * for the interface it was received on. this
+	 * tends to happen with VMWare ESX vSwitches.
+	 */
+	TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
+		struct in_addr in;
+		in.s_addr = ifatoia(ifa)->ia_addr.sin_addr.s_addr;
+		if (ifa->ifa_addr->sa_family == AF_INET &&
+		    in.s_addr == ip->ip_src.s_addr ) {
+			m_freem(m);
+			return;
+		}
+	}
+
 	/* verify that the IP TTL is 255.  */
 	if (ip->ip_ttl != CARP_DFLTTL) {
 		carpstats.carps_badttl++;
 		CARP_LOG("carp_input: received ttl %d != 255i on %s\n",
 		    ip->ip_ttl,
-		    m->m_pkthdr.rcvif->if_xname);
+		    ifp->if_xname);
 		m_freem(m);
 		return;
 	}
@@ -592,7 +612,7 @@
 		carpstats.carps_badlen++;
 		CARP_LOG("carp_input: packet too short %d on %s\n",
 		    m->m_pkthdr.len,
-		    m->m_pkthdr.rcvif->if_xname);
+		    ifp->if_xname);
 		m_freem(m);
 		return;
 	}
@@ -609,7 +629,7 @@
 	if (carp_cksum(m, len - iplen)) {
 		carpstats.carps_badsum++;
 		CARP_LOG("carp_input: checksum failed on %s\n",
-		    m->m_pkthdr.rcvif->if_xname);
+		    ifp->if_xname);
 		m_freem(m);
 		return;
 	}

--------------050006090109070805040105--
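
For readers who want to experiment with the idea outside the kernel, the check the patch adds to carp_input() (drop an advertisement whose IPv4 source address is one of the receiving interface's own addresses) can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions: src_is_local() and its array-of-sockaddr argument are hypothetical stand-ins for the interface address list, not FreeBSD kernel APIs, and the sketch does no locking, which the kernel version would still need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not a kernel API): return true when src equals
 * one of the IPv4 addresses in addrs[0..n-1]. The array stands in for
 * the address list of the interface the packet was received on.
 */
static bool
src_is_local(const struct sockaddr *const *addrs, size_t n,
    struct in_addr src)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct sockaddr_in *sin;

		/* Skip empty slots and non-IPv4 entries before casting. */
		if (addrs[i] == NULL || addrs[i]->sa_family != AF_INET)
			continue;

		sin = (const struct sockaddr_in *)addrs[i];
		if (sin->sin_addr.s_addr == src.s_addr)
			return (true);
	}
	return (false);
}
```

A caller would drop the packet when src_is_local() returns true and process it normally otherwise. One deliberate difference from the patch: the sketch tests the address family before reading the entry as an IPv4 address, whereas the patch reads the address first and compares the family afterwards. Whether that ordering matters in carp_input() is something a reviewer familiar with the kernel would need to confirm.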