From: Bryan Venteicher
Date: Tue, 29 Apr 2014 06:18:06 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-projects@freebsd.org
Subject: svn commit: r265084 - in projects/vxlan: sbin/ifconfig sys/modules sys/modules/if_vxlan sys/net
Message-Id: <201404290618.s3T6I6UI069999@svn.freebsd.org>

Author: bryanv
Date: Tue Apr 29 06:18:06 2014
New Revision: 265084
URL: http://svnweb.freebsd.org/changeset/base/265084

Log:
  Initial WIP snapshot of the vxlan pseudo device

  Currently only supports IPv4 point to point configurations. Generally
  works, but still rough around the edges.
  Remaining work includes:
    - Add multicast support.
    - Add IPv6 support. I think adding support for RFC6936 is a
      prerequisite.
    - Cleanup the ifconfig CLI and ioctl interfaces.

Added:
  projects/vxlan/sbin/ifconfig/ifvxlan.c   (contents, props changed)
  projects/vxlan/sys/modules/if_vxlan/
  projects/vxlan/sys/modules/if_vxlan/Makefile   (contents, props changed)
  projects/vxlan/sys/net/if_vxlan.c   (contents, props changed)
  projects/vxlan/sys/net/if_vxlan.h   (contents, props changed)
Modified:
  projects/vxlan/sbin/ifconfig/Makefile
  projects/vxlan/sys/modules/Makefile

Modified: projects/vxlan/sbin/ifconfig/Makefile
==============================================================================
--- projects/vxlan/sbin/ifconfig/Makefile	Tue Apr 29 06:15:21 2014	(r265083)
+++ projects/vxlan/sbin/ifconfig/Makefile	Tue Apr 29 06:18:06 2014	(r265084)
@@ -30,6 +30,7 @@ SRCS+=	ifmac.c		# MAC support
 SRCS+=	ifmedia.c	# SIOC[GS]IFMEDIA support
 SRCS+=	iffib.c		# non-default FIB support
 SRCS+=	ifvlan.c	# SIOC[GS]ETVLAN support
+SRCS+=	ifvxlan.c	# VXLAN support
 SRCS+=	ifgre.c		# GRE keys etc
 SRCS+=	ifgif.c		# GIF reversed header workaround
 

Added: projects/vxlan/sbin/ifconfig/ifvxlan.c
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ projects/vxlan/sbin/ifconfig/ifvxlan.c	Tue Apr 29 06:18:06 2014	(r265084)
@@ -0,0 +1,458 @@
+/*-
+ * Copyright (c) 2014, Bryan Venteicher
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice unmodified, this list of conditions, and the following
+ *    disclaimer.
+ * 2.
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include + +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "ifconfig.h" + +#define sstosin(_ss) ((struct sockaddr_in *)(_ss)) +#define sintosa(_sin) ((struct sockaddr *)(_sin)) + +static struct ifvxlanparam params = { + .vxlp_vni = VXLAN_VNI_MAX, +}; + +static int +get_val(const char *cp, u_long *valp) +{ + char *endptr; + u_long val; + + errno = 0; + val = strtoul(cp, &endptr, 0); + if (cp[0] == '\0' || endptr[0] != '\0' || errno == ERANGE) + return (-1); + + *valp = val; + return (0); +} + +static int +do_cmd(int sock, u_long op, void *arg, size_t argsize, int set) +{ + struct ifdrv ifd; + + bzero(&ifd, sizeof(ifd)); + + strlcpy(ifd.ifd_name, ifr.ifr_name, sizeof(ifd.ifd_name)); + ifd.ifd_cmd = op; + ifd.ifd_len = argsize; + ifd.ifd_data = arg; + + return (ioctl(sock, set ? 
SIOCSDRVSPEC : SIOCGDRVSPEC, &ifd)); +} + +static int +vxlan_exists(int sock) +{ + struct ifvxlancfg cfg; + + bzero(&cfg, sizeof(cfg)); + + return (do_cmd(s, VXLAN_CMD_GET_CONFIG, &cfg, sizeof(cfg)) != -1); +} + +static void +vxlan_status(int s) +{ + struct ifvxlancfg cfg; + char src[NI_MAXHOST], dst[NI_MAXHOST]; + struct sockaddr_in *sin; + int vni, group; + uint16_t sport, dport; + + if (do_cmd(s, VXLAN_CMD_GET_CONFIG, &cfg, sizeof(cfg), 0) < 0) + return; + + vni = cfg.vxlc_vni; + + sin = sstosin(&cfg.vxlc_local_sa); + if (getnameinfo(sintosa(sin), sin->sin_len, src, sizeof(src), + NULL, 0, NI_NUMERICHOST) != 0) + src[0] = '\0'; + sport = ntohs(sin->sin_port); + + sin = sstosin(&cfg.vxlc_remote_sa); + if (getnameinfo(sintosa(sin), sin->sin_len, dst, sizeof(dst), + NULL, 0, NI_NUMERICHOST) != 0) + dst[0] = '\0'; + dport = ntohs(sin->sin_port); + group = IN_MULTICAST(sin->sin_addr.s_addr); + + printf("\tvxlan %d local %s:%u %s %s:%u\n", + vni, src, sport, group ? "group" : "remote", dst, dport); +} + +#define _LOCAL_ADDR46 \ + (VXLAN_PARAM_WITH_LOCAL_ADDR4 | VXLAN_PARAM_WITH_LOCAL_ADDR6) +#define _REMOTE_ADDR46 \ + (VXLAN_PARAM_WITH_REMOTE_ADDR4 | VXLAN_PARAM_WITH_REMOTE_ADDR6) + +static void +vxlan_verify_params(void) +{ + + if (params.vxlp_vni == VXLAN_VNI_MAX) + errx(1, "must specify a network identifier for vxlan create"); + if ((params.vxlp_with & _REMOTE_ADDR46) == 0) + errx(1, "must specify a remote or multicast group address"); + if ((params.vxlp_with & _LOCAL_ADDR46) == _LOCAL_ADDR46) + errx(1, "cannot specify both local IPv4 and IPv6 addresses"); + if ((params.vxlp_with & _REMOTE_ADDR46) == _REMOTE_ADDR46) + errx(1, "cannot specify both remote IPv4 and IPv6 addresses"); + if ((params.vxlp_with & VXLAN_PARAM_WITH_LOCAL_ADDR4 && + params.vxlp_with & VXLAN_PARAM_WITH_REMOTE_ADDR6) || + (params.vxlp_with & VXLAN_PARAM_WITH_LOCAL_ADDR6 && + params.vxlp_with & VXLAN_PARAM_WITH_REMOTE_ADDR4)) + errx(1, "cannot mix IPv4/IPv6 addresses"); +} + +#undef 
_LOCAL_ADDR46
+#undef _REMOTE_ADDR46
+
+static void
+vxlan_cb(int s, void *arg)
+{
+
+	//vxlan_verify_params();
+}
+
+static void
+vxlan_create(int s, struct ifreq *ifr)
+{
+
+	vxlan_verify_params();
+
+	ifr->ifr_data = (caddr_t) &params;
+	if (ioctl(s, SIOCIFCREATE2, ifr) < 0)
+		err(1, "SIOCIFCREATE2");
+}
+
+static
+DECL_CMD_FUNC(setvxlan_vni, arg, d)
+{
+	u_long val;
+
+	if (get_val(arg, &val) < 0 || val >= VXLAN_VNI_MAX)
+		errx(1, "invalid network identifier: %s", arg);
+
+	params.vxlp_with |= VXLAN_PARAM_WITH_VNI;
+	params.vxlp_vni = val;
+}
+
+static
+DECL_CMD_FUNC(setvxlan_local, addr, d)
+{
+	struct addrinfo *ai;
+	struct sockaddr *sa;
+	int error;
+
+	if ((error = getaddrinfo(addr, NULL, NULL, &ai)) != 0)
+		errx(1, "error in parsing local address string: %s",
+		    gai_strerror(error));
+
+	sa = ai->ai_addr;
+
+	switch (ai->ai_family) {
+#ifdef INET
+	case AF_INET: {
+		struct in_addr addr = ((struct sockaddr_in *) sa)->sin_addr;
+
+		if (IN_MULTICAST(addr.s_addr))
+			errx(1, "local address cannot be multicast");
+
+		params.vxlp_with |= VXLAN_PARAM_WITH_LOCAL_ADDR4;
+		params.vxlp_local_in4 = addr;
+		break;
+	}
+#endif
+#ifdef INET6
+	case AF_INET6: {
+		struct in6_addr *addr = &((struct sockaddr_in6 *)sa)->sin6_addr;
+
+		if (IN6_IS_ADDR_MULTICAST(addr))
+			errx(1, "local address cannot be multicast");
+
+		params.vxlp_with |= VXLAN_PARAM_WITH_LOCAL_ADDR6;
+		params.vxlp_local_in6 = *addr;
+		break;
+	}
+#endif
+	default:
+		errx(1, "local address %s not supported", addr);
+	}
+
+	freeaddrinfo(ai);
+}
+
+static
+DECL_CMD_FUNC(setvxlan_remote, addr, d)
+{
+	struct addrinfo *ai;
+	struct sockaddr *sa;
+	int error;
+
+	if ((error = getaddrinfo(addr, NULL, NULL, &ai)) != 0)
+		errx(1, "error in parsing remote address string: %s",
+		    gai_strerror(error));
+
+	sa = ai->ai_addr;
+
+	switch (ai->ai_family) {
+#ifdef INET
+	case AF_INET: {
+		struct in_addr addr = ((struct sockaddr_in *)sa)->sin_addr;
+
+		if (IN_MULTICAST(addr.s_addr))
+			errx(1, "remote address cannot be multicast");
+
+ params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_ADDR4; + params.vxlp_remote_in4 = addr; + break; + } +#endif +#ifdef INET6 + case AF_INET6: { + struct in6_addr *addr = &((struct sockaddr_in6 *)sa)->sin6_addr; + + if (IN6_IS_ADDR_MULTICAST(addr)) + errx(1, "remote address cannot be multicast"); + + params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_ADDR6; + params.vxlp_remote_in6 = *addr; + break; + } +#endif + default: + errx(1, "remote address %s not supported", addr); + } + + freeaddrinfo(ai); +} + +static +DECL_CMD_FUNC(setvxlan_group, addr, d) +{ + struct addrinfo *ai; + struct sockaddr *sa; + int error; + + if ((error = getaddrinfo(addr, NULL, NULL, &ai)) != 0) + errx(1, "error in parsing group address string: %s", + gai_strerror(error)); + + sa = ai->ai_addr; + + switch (ai->ai_family) { +#ifdef INET + case AF_INET: { + struct in_addr addr = ((struct sockaddr_in *)sa)->sin_addr; + + if (!IN_MULTICAST(addr.s_addr)) + errx(1, "group address must be multicast"); + + params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_ADDR4; + params.vxlp_remote_in4 = addr; + break; + } +#endif +#ifdef INET6 + case AF_INET6: { + struct in6_addr *addr = &((struct sockaddr_in6 *)sa)->sin6_addr; + + if (IN6_IS_ADDR_MULTICAST(addr)) + errx(1, "group address must be multicast"); + + params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_ADDR6; + params.vxlp_remote_in6 = *addr; + break; + } +#endif + default: + errx(1, "group address %s not supported", addr); + } + + freeaddrinfo(ai); +} + +static +DECL_CMD_FUNC(setvxlan_local_port, arg, d) +{ + u_long val; + + if (get_val(arg, &val) < 0 || val >= UINT16_MAX) + errx(1, "invalid local port: %s", arg); + + params.vxlp_with |= VXLAN_PARAM_WITH_LOCAL_PORT; + params.vxlp_local_port = val; +} + +static +DECL_CMD_FUNC(setvxlan_remote_port, arg, d) +{ + u_long val; + + if (get_val(arg, &val) < 0 || val >= UINT16_MAX) + errx(1, "invalid remote port: %s", arg); + + params.vxlp_with |= VXLAN_PARAM_WITH_REMOTE_PORT; + params.vxlp_remote_port = val; +} + +static 
+DECL_CMD_FUNC2(setvxlan_port_range, arg1, arg2) +{ + u_long min, max; + + if (get_val(arg1, &min) < 0 || min >= UINT16_MAX) + errx(1, "invalid port range minimum: %s", arg1); + if (get_val(arg2, &max) < 0 || max >= UINT16_MAX) + errx(1, "invalid port range maximum: %s", arg2); + if (max < min) + errx(1, "invalid port range"); + + params.vxlp_with |= VXLAN_PARAM_WITH_PORT_RANGE; + params.vxlp_min_port = min; + params.vxlp_max_port = max; +} + +static +DECL_CMD_FUNC(setvxlan_timeout, arg, d) +{ + u_long val; + + if (get_val(arg, &val) < 0 || (val & ~0xFFFFFFFF) != 0) + errx(1, "invalid timeout value: %s", arg); + + params.vxlp_with |= VXLAN_PARAM_WITH_FTABLE_TIMEOUT; + params.vxlp_ftable_timeout = val & 0xFFFFFFFF; +} + +static +DECL_CMD_FUNC(setvxlan_maxaddr, arg, d) +{ + u_long val; + + if (get_val(arg, &val) < 0 || (val & ~0xFFFFFFFF) != 0) + errx(1, "invalid maxaddr value: %s", arg); + + params.vxlp_with |= VXLAN_PARAM_WITH_FTABLE_MAX; + params.vxlp_ftable_max = val & 0xFFFFFFFF; +} + +static +DECL_CMD_FUNC(setvxlan_ttl, arg, d) +{ + u_long val; + + if (get_val(arg, &val) < 0 || val > 256) + errx(1, "invalid TTL value: %s", arg); + + params.vxlp_with |= VXLAN_PARAM_WITH_TTL; + params.vxlp_ttl = val; +} + +static +DECL_CMD_FUNC(setvxlan_learn, arg, d) +{ + + params.vxlp_with |= VXLAN_PARAM_WITH_NOLEARN; + params.vxlp_nolearn = !!d; +} + +static void +setvxlan_flush(const char *val, int d, int s, const struct afswtch *afp) +{ + struct ifvxlancmd cmd; + + bzero(&cmd, sizeof(cmd)); + if (d != 0) + cmd.vxlcmd_flags |= VXLAN_CMD_FLAG_FLUSH_ALL; + + if (do_cmd(s, VXLAN_CMD_FLUSH, &cmd, sizeof(cmd), 1) < 0) + err(1, "VXLAN_CMD_FLUSH"); +} + +static struct cmd vxlan_cmds[] = { + + DEF_CLONE_CMD_ARG("vni", setvxlan_vni), + DEF_CLONE_CMD_ARG("local", setvxlan_local), + DEF_CLONE_CMD_ARG("remote", setvxlan_remote), + DEF_CLONE_CMD_ARG("group", setvxlan_group), + DEF_CLONE_CMD_ARG("localport", setvxlan_local_port), + DEF_CLONE_CMD_ARG("remoteport", setvxlan_remote_port), + 
DEF_CLONE_CMD_ARG2("portrange", setvxlan_port_range), + DEF_CLONE_CMD_ARG("timeout", setvxlan_timeout), + DEF_CLONE_CMD_ARG("maxaddr", setvxlan_maxaddr), + DEF_CLONE_CMD_ARG("ttl", setvxlan_ttl), + DEF_CLONE_CMD("learn", 1, setvxlan_learn), + DEF_CLONE_CMD("-learn", 0, setvxlan_learn), + + DEF_CMD("flush", 0, setvxlan_flush), + DEF_CMD("flushall", 1, setvxlan_flush), +}; + +static struct afswtch af_vxlan = { + .af_name = "af_vxlan", + .af_af = AF_UNSPEC, + .af_other_status = vxlan_status, +}; + +static __constructor void +vxlan_ctor(void) +{ +#define N(a) (sizeof(a) / sizeof(a[0])) + size_t i; + + for (i = 0; i < N(vxlan_cmds); i++) + cmd_register(&vxlan_cmds[i]); + af_register(&af_vxlan); + callback_register(vxlan_cb, NULL); + clone_setdefcallback("vxlan", vxlan_create); +#undef N +} Modified: projects/vxlan/sys/modules/Makefile ============================================================================== --- projects/vxlan/sys/modules/Makefile Tue Apr 29 06:15:21 2014 (r265083) +++ projects/vxlan/sys/modules/Makefile Tue Apr 29 06:18:06 2014 (r265084) @@ -146,6 +146,7 @@ SUBDIR= \ if_tap \ if_tun \ if_vlan \ + if_vxlan \ ${_igb} \ ${_iir} \ ${_imgact_binmisc} \ Added: projects/vxlan/sys/modules/if_vxlan/Makefile ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ projects/vxlan/sys/modules/if_vxlan/Makefile Tue Apr 29 06:18:06 2014 (r265084) @@ -0,0 +1,10 @@ +# $FreeBSD$ + +.PATH: ${.CURDIR}/../../net + +KMOD= if_vxlan +SRCS= if_vxlan.c +SRCS+= opt_inet.h opt_inet6.h +#SRCS+= opt_vxlan.h + +.include Added: projects/vxlan/sys/net/if_vxlan.c ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ projects/vxlan/sys/net/if_vxlan.c Tue Apr 29 06:18:06 2014 (r265084) @@ -0,0 +1,2097 @@ +/*- + * Copyright (c) 2014, Bryan Venteicher + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include "opt_inet.h" +#include "opt_inet6.h" + +#include +__FBSDID("$FreeBSD$"); + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +struct vxlan_softc; +struct vxlan_ftable_entry; + +LIST_HEAD(vxlan_ftable_head, vxlan_ftable_entry); +LIST_HEAD(vxlan_softc_head, vxlan_softc); + +#define VXLAN_SO_VNI_HASH_SHIFT 5 +#define VXLAN_SO_VNI_HASH_SIZE (1 << VXLAN_SO_VNI_HASH_SHIFT) +#define VXLAN_SO_VNI_HASH(_vni) ((_vni) % VXLAN_SO_VNI_HASH_SIZE) + +struct vxlan_socket { + struct socket *vxlso_sock; + struct rwlock vxlso_lock; + u_int vxlso_refcnt; + union vxlan_sockaddr vxlso_laddr; + LIST_ENTRY(vxlan_socket) vxlso_entry; + /* Lookup hash from VNI to vxlan softc. */ + struct vxlan_softc_head vxlso_vni_hash[VXLAN_SO_VNI_HASH_SIZE]; +}; + +#define VXLAN_SO_RLOCK(_vso) rw_rlock(&(_vso)->vxlso_lock) +#define VXLAN_SO_RUNLOCK(_vso) rw_runlock(&(_vso)->vxlso_lock) +#define VXLAN_SO_WLOCK(_vso) rw_wlock(&(_vso)->vxlso_lock) +#define VXLAN_SO_WUNLOCK(_vso) rw_wunlock(&(_vso)->vxlso_lock) +#define VXLAN_SO_UNLOCK(_vso) rw_unlock(&(_vso)->vxlso_lock) +#define VXLAN_SO_LOCK_ASSERT(_vso) \ + rw_assert(&(_vso)->vxlso_lock, RA_LOCKED) +#define VXLAN_SO_LOCK_WASSERT(_vso) \ + rw_assert(&(_vso)->vxlso_lock, RA_WLOCKED) + +#define VXLAN_SO_ACQUIRE(_vso) refcount_acquire(&(_vso)->vxlso_refcnt) +#define VXLAN_SO_RELEASE(_vso) refcount_release(&(_vso)->vxlso_refcnt) + +struct vxlan_ftable_entry { + LIST_ENTRY(vxlan_ftable_entry) vxlfe_hash; + uint16_t vxlfe_flags; + uint8_t vxlfe_mac[ETHER_ADDR_LEN]; + union vxlan_sockaddr vxlfe_raddr; + time_t vxlfe_expire; +}; + +#define VXLAN_FE_FLAG_DYNAMIC 0x01 +#define VXLAN_FE_FLAG_STATIC 0x02 + +#define VXLAN_FE_IS_DYNAMIC(_fe) \ + 
((_fe)->vxlfe_flags & VXLAN_FE_FLAG_DYNAMIC) + +#define VXLAN_SC_FTABLE_SHIFT 8 +#define VXLAN_SC_FTABLE_SIZE (1 << VXLAN_SC_FTABLE_SHIFT) +#define VXLAN_SC_FTABLE_MASK (VXLAN_SC_FTABLE_SIZE - 1) +#define VXLAN_SC_FTABLE_HASH(_sc, _mac) \ + (vxlan_mac_hash(_sc, _mac) % VXLAN_SC_FTABLE_SIZE) + +struct vxlan_statistics { + uint32_t ftable_lock_upgrade; + +}; + +struct vxlan_softc { + struct ifnet *vxl_ifp; + struct vxlan_socket *vxl_sock; + struct rwlock vxl_lock; + volatile u_int vxl_refcnt; + union vxlan_sockaddr vxl_src_addr; + union vxlan_sockaddr vxl_dst_addr; + uint32_t vxl_vni; + uint32_t vxl_flags; +#define VXLAN_FLAG_INITING 0x0001 +#define VXLAN_FLAG_RELEASED 0x0002 +#define VXLAN_FLAG_NOLEARN 0x0004 + + uint32_t vxl_last_port_hash; + uint16_t vxl_min_port; + uint16_t vxl_max_port; + uint8_t vxl_ttl; + + /* Lookup table from MAC address to forwarding entry. */ + uint32_t vxl_ftable_cnt; + uint32_t vxl_ftable_max; + uint32_t vxl_ftable_nospace; + uint32_t vxl_ftable_timeout; + uint32_t vxl_ftable_hash_key; + struct vxlan_ftable_head *vxl_ftable; + + /* Derived from vxl_dst_addr. 
*/ + struct vxlan_ftable_entry vxl_default_fe; + + struct vxlan_statistics vxl_stats; + int vxl_unit; + struct callout vxl_callout; + uint8_t vxl_hwaddr[ETHER_ADDR_LEN]; + LIST_ENTRY(vxlan_softc) vxl_entry; +}; + +#define VXLAN_RLOCK(_sc) rw_rlock(&(_sc)->vxl_lock) +#define VXLAN_RUNLOCK(_sc) rw_runlock(&(_sc)->vxl_lock) +#define VXLAN_WLOCK(_sc) rw_wlock(&(_sc)->vxl_lock) +#define VXLAN_WUNLOCK(_sc) rw_wunlock(&(_sc)->vxl_lock) +#define VXLAN_UNLOCK(_sc) rw_unlock(&(_sc)->vxl_lock) +#define VXLAN_LOCK_UPGRADE(_sc) rw_try_upgrade(&(_sc)->vxl_lock) +#define VXLAN_LOCK_WOWNED(_sc) rw_wowned(&(_sc)->vxl_lock) +#define VXLAN_LOCK_ASSERT(_sc) rw_assert(&(_sc)->vxl_lock, RA_LOCKED) +#define VXLAN_LOCK_WASSERT(_sc) rw_assert(&(_sc)->vxl_lock, RA_WLOCKED) + +#define VXLAN_ACQUIRE(_sc) refcount_acquire(&(_sc)->vxl_refcnt) +#define VXLAN_RELEASE(_sc) refcount_release(&(_sc)->vxl_refcnt) + +struct vxlanudphdr { + struct udphdr vxlh_udp; + struct vxlan_header vxlh_hdr; +} __packed; + +static int vxlan_ftable_addr_cmp(const uint8_t *, const uint8_t *); +static void vxlan_ftable_init(struct vxlan_softc *); +static void vxlan_ftable_fini(struct vxlan_softc *); +static void vxlan_ftable_flush(struct vxlan_softc *, int); +static struct vxlan_ftable_entry * + vxlan_ftable_entry_alloc(void); +static void vxlan_ftable_entry_init(struct vxlan_softc *, + struct vxlan_ftable_entry *, const uint8_t *, + const struct sockaddr *, uint32_t); +static void vxlan_ftable_entry_free(struct vxlan_ftable_entry *); +static struct vxlan_ftable_entry * + vxlan_ftable_entry_lookup(struct vxlan_softc *, + const uint8_t *); +static int vxlan_ftable_entry_insert(struct vxlan_softc *, + struct vxlan_ftable_entry *); +static void vxlan_ftable_entry_destroy(struct vxlan_softc *, + struct vxlan_ftable_entry *fe); +static int vxlan_ftable_update(struct vxlan_softc *, + const struct sockaddr *, const uint8_t *); +static void vxlan_ftable_expire(struct vxlan_softc *); + +static struct vxlan_socket * + 
vxlan_socket_alloc(const union vxlan_sockaddr *); +static void vxlan_socket_destroy(struct vxlan_socket *); +static int vxlan_socket_create(struct ifnet *, + const union vxlan_sockaddr *, struct vxlan_socket **); +static int vxlan_socket_insert_softc(struct vxlan_socket *, + struct vxlan_softc *); +static struct vxlan_softc * + vxlan_socket_lookup_softc_locked(struct vxlan_socket *, + uint32_t); +static struct vxlan_softc * + vxlan_socket_lookup_softc(struct vxlan_socket *, uint32_t); + +static int vxlan_valid_init_config(struct vxlan_softc *); +static void vxlan_init(void *); +static void vxlan_stop(struct vxlan_softc *); +static void vxlan_release_socket(struct vxlan_softc *); +static void vxlan_release(struct vxlan_softc *); +static void vxlan_timer(void *); +static uint16_t vxlan_pick_source_port(struct vxlan_softc *, + const struct ether_header *); +static void vxlan_encap_header(struct vxlan_softc *, struct mbuf *, + int, uint16_t, uint16_t); +static int vxlan_encap4(struct vxlan_softc *, + const union vxlan_sockaddr *, struct mbuf *); +static int vxlan_transmit(struct ifnet *, struct mbuf *); +static void vxlan_qflush(struct ifnet *); + +static void vxlan_rcv_udp_packet(struct mbuf *, int, struct inpcb *, + const struct sockaddr *); +static int vxlan_input(struct vxlan_socket *, uint32_t, struct mbuf **, + const struct sockaddr *); + +static int vxlan_ctrl_get_config(struct vxlan_softc *, void *); +static int vxlan_ctrl_set_vni(struct vxlan_softc *, void *); +static int vxlan_ctrl_set_local_addr(struct vxlan_softc *, void *); +static int vxlan_ctrl_set_remote_addr(struct vxlan_softc *, void *); +static int vxlan_ctrl_set_local_port(struct vxlan_softc *, void *); +static int vxlan_ctrl_set_remote_port(struct vxlan_softc *, void *); +static int vxlan_ctrl_flush(struct vxlan_softc *, void *); +static int vxlan_ctrl_ftable_timeout(struct vxlan_softc *, void *); +static int vxlan_ctrl_ftable_max(struct vxlan_softc *, void *); +static int 
vxlan_ctrl_ftable_entry_add(struct vxlan_softc *, void *); +static int vxlan_ctrl_ftable_entry_rem(struct vxlan_softc *, void *); +static int vxlan_ctrl_ttl(struct vxlan_softc *, void *); +static int vxlan_ctrl_learn(struct vxlan_softc *, void *); +static int vxlan_ioctl_drvspec(struct vxlan_softc *, + struct ifdrv *, int); +static int vxlan_ioctl_ifflags(struct vxlan_softc *); +static int vxlan_ioctl(struct ifnet *, u_long, caddr_t); + +static void vxlan_set_default_config(struct vxlan_softc *); +static int vxlan_set_user_config(struct vxlan_softc *, + struct ifvxlanparam *); +static int vxlan_clone_create(struct if_clone *, int, caddr_t); +static void vxlan_clone_destroy(struct ifnet *); + +static uint32_t vxlan_mac_hash(struct vxlan_softc *, const uint8_t *); +static void vxlan_fakeaddr(struct vxlan_softc *); +static int vxlan_sockaddr_cmp(const union vxlan_sockaddr *, + const struct sockaddr *); +static void vxlan_sockaddr_copy(union vxlan_sockaddr *, + const struct sockaddr *); +static int vxlan_sockaddr_in_equal(const union vxlan_sockaddr *, + const struct sockaddr *); +static void vxlan_sockaddr_in_copy(union vxlan_sockaddr *, + const struct sockaddr *); +static int vxlan_sockaddr_in_any(const union vxlan_sockaddr *); +static int vxlan_sockaddr_in_multicast(const union vxlan_sockaddr *); + +static int vxlan_initing_or_running(struct vxlan_softc *); +static int vxlan_check_vni(uint32_t); +static int vxlan_check_ttl(int); +static int vxlan_check_ftable_timeout(uint32_t); +static int vxlan_check_ftable_max(uint32_t); + +static int vxlan_tunable_int(struct vxlan_softc *, const char *, int); + +SYSCTL_DECL(_net_link); +SYSCTL_NODE(_net_link, OID_AUTO, vxlan, CTLFLAG_RW, 0, + "Virtual eXtensible Local Area Network"); + +static int vxlan_legacy_port = 0; +TUNABLE_INT("net.link.vxlan.legacy_port", &vxlan_legacy_port); + +static const char vxlan_name[] = "vxlan"; +static MALLOC_DEFINE(M_VXLAN, vxlan_name, + "Virtual eXtensible LAN Interface"); +static struct if_clone 
*vxlan_cloner; +static struct mtx vxlan_list_mtx; +static LIST_HEAD(, vxlan_socket) vxlan_socket_list; + +/* Default maximum number of addresses in the forwarding table. */ +#ifndef VXLAN_FTABLE_MAX +#define VXLAN_FTABLE_MAX 2000 +#endif + +/* Timeout (in seconds) of addresses learned in the forwarding table. */ +#ifndef VXLAN_FTABLE_TIMEOUT +#define VXLAN_FTABLE_TIMEOUT (20 * 60) +#endif + +/* Number of seconds between pruning attempts of the forwarding table. */ +#ifndef VXLAN_FTABLE_PRUNE +#define VXLAN_FTABLE_PRUNE (5 * 60) +#endif + +static int vxlan_ftable_prune_period = VXLAN_FTABLE_PRUNE; + +struct vxlan_control { + int (*vxlc_func)(struct vxlan_softc *, void *); + int vxlc_argsize; + int vxlc_flags; +#define VXLAN_CTRL_FLAG_COPYIN 0x01 +#define VXLAN_CTRL_FLAG_COPYOUT 0x02 +#define VXLAN_CTRL_FLAG_SUSER 0x04 +}; + +static const struct vxlan_control vxlan_control_table[] = { + [VXLAN_CMD_GET_CONFIG] = + { vxlan_ctrl_get_config, + sizeof(struct ifvxlancfg), + VXLAN_CTRL_FLAG_COPYOUT + }, + + [VXLAN_CMD_SET_VNI] = + { vxlan_ctrl_set_vni, + sizeof(struct ifvxlancfg), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_SET_LOCAL_ADDR] = + { vxlan_ctrl_set_local_addr, + sizeof(struct ifvxlancfg), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_SET_REMOTE_ADDR] = + { vxlan_ctrl_set_remote_addr, + sizeof(struct ifvxlancfg), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_SET_LOCAL_PORT] = + { vxlan_ctrl_set_local_port, + sizeof(struct ifvxlancfg), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_SET_REMOTE_PORT] = + { vxlan_ctrl_set_remote_port, + sizeof(struct ifvxlancfg), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_FLUSH] = + { vxlan_ctrl_flush, + sizeof(struct ifvxlancmd), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_FTABLE_TIMEOUT] = + { vxlan_ctrl_ftable_timeout, + sizeof(struct ifvxlancmd), + VXLAN_CTRL_FLAG_COPYIN | 
VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_FTABLE_MAX] = + { vxlan_ctrl_ftable_max, + sizeof(struct ifvxlancmd), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_FTABLE_ENTRY_ADD] = + { vxlan_ctrl_ftable_entry_add, + sizeof(struct ifvxlancmd), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_FTABLE_ENTRY_REM] = + { vxlan_ctrl_ftable_entry_rem, + sizeof(struct ifvxlancmd), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_TTL] = + { vxlan_ctrl_ttl, + sizeof(struct ifvxlancmd), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, + + [VXLAN_CMD_LEARN] = + { vxlan_ctrl_learn, + sizeof(struct ifvxlancmd), + VXLAN_CTRL_FLAG_COPYIN | VXLAN_CTRL_FLAG_SUSER, + }, +}; + +static const int vxlan_control_table_size = nitems(vxlan_control_table); + +static int +vxlan_ftable_addr_cmp(const uint8_t *a, const uint8_t *b) +{ + int i, d; + + /* Same MAC comparison as done in if_bridge. */ + for (i = 0, d = 0; i < ETHER_ADDR_LEN && d == 0; i++) + d = ((int)a[i]) - ((int)b[i]); + + return (d); +} + +static void +vxlan_ftable_init(struct vxlan_softc *sc) +{ + static const uint8_t mac[ETHER_ADDR_LEN]; + int i; + + sc->vxl_ftable = malloc(sizeof(struct vxlan_ftable_head) * + VXLAN_SC_FTABLE_SIZE, M_VXLAN, M_ZERO | M_WAITOK); + + for (i = 0; i < VXLAN_SC_FTABLE_SIZE; i++) + LIST_INIT(&sc->vxl_ftable[i]); + sc->vxl_ftable_hash_key = arc4random(); + + vxlan_ftable_entry_init(sc, &sc->vxl_default_fe, mac, + &sc->vxl_dst_addr.sa, VXLAN_FE_FLAG_STATIC); +} + +static void +vxlan_ftable_fini(struct vxlan_softc *sc) +{ + int i; + + for (i = 0; i < VXLAN_SC_FTABLE_SIZE; i++) + KASSERT(LIST_EMPTY(&sc->vxl_ftable[i]), + ("%s: vxlan %p ftable[%d] not empty", __func__, sc, i)); + MPASS(sc->vxl_ftable_cnt == 0); + + free(sc->vxl_ftable, M_VXLAN); + sc->vxl_ftable = NULL; +} + +static void +vxlan_ftable_flush(struct vxlan_softc *sc, int all) +{ + struct vxlan_ftable_entry *fe, *tfe; + int i; + + for (i = 0; i < VXLAN_SC_FTABLE_SIZE; i++) { 
+ LIST_FOREACH_SAFE(fe, &sc->vxl_ftable[i], vxlfe_hash, tfe) { + if (all || VXLAN_FE_IS_DYNAMIC(fe)) + vxlan_ftable_entry_destroy(sc, fe); + } + } +} + +static void +vxlan_ftable_expire(struct vxlan_softc *sc) +{ + struct vxlan_ftable_entry *fe, *tfe; + int i; + + VXLAN_LOCK_WASSERT(sc); + + for (i = 0; i < VXLAN_SC_FTABLE_SIZE; i++) { + LIST_FOREACH_SAFE(fe, &sc->vxl_ftable[i], vxlfe_hash, tfe) { + if (VXLAN_FE_IS_DYNAMIC(fe) && + time_uptime >= fe->vxlfe_expire) + vxlan_ftable_entry_destroy(sc, fe); + } + } +} + +static int +vxlan_ftable_update_locked(struct vxlan_softc *sc, const struct sockaddr *sa, + const uint8_t *mac) +{ + union vxlan_sockaddr vxlsa; + struct vxlan_ftable_entry *fe; + int error; + + VXLAN_LOCK_ASSERT(sc); + +again: + /* + * A forwarding entry for this MAC address might already exist. If + * so, update it, otherwise create a new one. We may have to upgrade + * the lock if we have to change or create an entry. + */ + fe = vxlan_ftable_entry_lookup(sc, mac); + if (fe != NULL) { + /* Accept the race if we only hold the read lock. */ + fe->vxlfe_expire = time_uptime + sc->vxl_ftable_timeout; + + if (!VXLAN_FE_IS_DYNAMIC(fe)) + return (0); + if (vxlan_sockaddr_in_equal(&fe->vxlfe_raddr, sa)) + return (0); + if (!VXLAN_LOCK_WOWNED(sc) && VXLAN_LOCK_UPGRADE(sc) == 0) { + VXLAN_RUNLOCK(sc); *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***