From: Ryan Stone
To: freebsd-transport@freebsd.org
Date: Fri, 18 Dec 2015 17:26:21 -0500
Subject: Extending FIBs to support multi-tenancy

My employer is going through the process of extending our product to support multi-tenant networking. The details of what our product does aren't really relevant to the discussion -- it's enough to know that we have a number of daemons acting as servers for various network protocols.

Multi-tenancy, as we've defined the feature, requires that our network services be able to communicate with clients from completely independent networks. This has imposed the following new requirements on us:

- different tenant networks may have different DNS servers
- they may use different AAA mechanisms (e.g. one tenant uses LDAP, another uses Kerberos, and a third also uses LDAP but a different server)
- they may use independent routing tables
- different tenant networks may use overlapping IP ranges (so we might see two different clients from two different tenant networks with the IP 192.168.0.1, for instance)
- traffic from different tenant networks is not guaranteed to be segregated in any way -- it might all come in on the same network interface, without any vlan tagging or any other encapsulation that might differentiate tenant networks
- we need to scale to thousands of tenant networks
- we will require that our system not be assigned the same IP address on different tenant networks

Our intention is to use the destination IP address of incoming packets to determine which tenant network the packet came from (hence the requirement of not allowing the same IP to be configured for different tenant networks).

The obvious tool for meeting these requirements in FreeBSD is VIMAGE. However, we have already prototyped that approach, and I have been told that we found it will not scale to thousands of networks.
I don't have all of the details as to why, but the root of the problem is that any given process can only be associated with a single vnet instance. In our current architecture, we can't have thousands of instances of each network service running (and I'm not sure that we realistically could: if we support A services, B tenant networks and C CPU cores, we would need a minimum of A * B * C threads to ensure that any given service on any single tenant network could fully utilize the system's resources to process requests).

We're instead looking at using FIBs to implement the routing table requirement. To meet our requirements, we're expecting to have to make three important changes to how FIBs are managed:

1) Allow listening sockets to be wildcarded across FIBs
2) Make FIBs a property of an interface address, not the interface
3) Allow each thread to set a default FIB that will be applied to newly created sockets

1) We don't really want to change all of our services to instantiate one listening socket for every tenant network. Instead we're looking at implementing (and upstreaming) a kernel extension that allows a listening socket to be wildcarded across all FIBs (note: yesterday I described this feature as allowing us to pick and choose FIBs, but people internally have convinced me that a wildcard match would make their lives significantly easier). When a new connection to a listening socket in this mode is accepted, the new socket would not inherit its FIB from the listening socket. Instead, its FIB would be set based on the local IP address of the connection.

2) Currently, FIBs are a property of an interface (struct ifnet). We aren't very enthusiastic about the prospect of having to create thousands of interfaces to support thousands of tenant networks. We would instead like to make the FIB a property of the interface address.
For backwards compatibility reasons I would still let admins set a FIB on an ifnet, but it would instead serve as the default FIB for addresses that aren't explicitly assigned one. That should maintain the current behaviour while making it easy to push FIBs down into the address.

3) The idea of a per-thread FIB has gotten the most pushback so far, and I understand the objection. I'll explain the problem that we're trying to solve with this. When a new request comes in, we may need to perform authentication through LDAP or Kerberos. The problem is that the existing open-source implementations that we are using manage sockets directly. We really don't want to have to go through them and make their APIs entirely FIB-aware -- that is far too much churn. By moving awareness of the current FIB into the kernel, existing calls to socket() can do the right thing transparently. We're not entirely happy with the solution, but the "right" way to solve the problem involves rototilling a number of libraries. Even if we could convince the upstream projects to take patches, it's far more work than we're willing to take on.

We're planning on doing the work ourselves, but we feel that coming up with a solution that FreeBSD is comfortable taking into head is critical. We really don't want to be carrying around diffs from upstream in such a critical part of the network stack, so I am hoping that we can come to an agreement on the best path forward for both sides.

If anybody has any comments, concerns or objections to any of this, please pipe up now rather than after I've implemented all of this.

Thanks,
Ryan
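
To make the three proposed changes a little more concrete, here is a rough userland model in Python. It is only a sketch of the intended semantics, not kernel code and not an existing FreeBSD API: every name in it (Interface, WildcardListener, set_thread_fib, new_socket_fib) is hypothetical, and I'm assuming FIB 0 as the fallback default.

```python
import ipaddress
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical model of the proposed semantics; none of these names exist in FreeBSD.


@dataclass
class Interface:
    """Change 2: the FIB lives on each address; the ifnet FIB is only a default."""
    name: str
    default_fib: int = 0
    addr_fibs: Dict[ipaddress.IPv4Address, Optional[int]] = field(default_factory=dict)

    def add_address(self, addr: str, fib: Optional[int] = None) -> None:
        # fib=None means "not explicitly assigned": fall back to the ifnet default.
        self.addr_fibs[ipaddress.IPv4Address(addr)] = fib

    def fib_for(self, addr: str) -> int:
        fib = self.addr_fibs[ipaddress.IPv4Address(addr)]
        return self.default_fib if fib is None else fib


@dataclass
class Connection:
    local_addr: str
    remote_addr: str
    fib: int


class WildcardListener:
    """Change 1: a single listening socket that matches connections in every FIB."""

    def __init__(self, ifp: Interface) -> None:
        self.ifp = ifp

    def accept(self, local_addr: str, remote_addr: str) -> Connection:
        # The accepted socket's FIB comes from the local (destination) address,
        # not from the listening socket.
        return Connection(local_addr, remote_addr, self.ifp.fib_for(local_addr))


# Change 3: a per-thread default FIB applied to newly created sockets.
_thread_state = threading.local()


def set_thread_fib(fib: int) -> None:
    _thread_state.fib = fib


def new_socket_fib() -> int:
    # The FIB a plain socket() call would pick up in the calling thread;
    # falls back to 0 (assumed) if the thread never set one.
    return getattr(_thread_state, "fib", 0)
```

In this model, two clients that both use 192.168.0.1 but connect to different local addresses end up on different FIBs, and a worker thread that calls set_thread_fib(conn.fib) before invoking an unmodified LDAP or Kerberos library would have that library's sockets land in the right FIB without any API changes.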