From owner-freebsd-transport@freebsd.org  Fri Dec 18 22:26:25 2015
Date: Fri, 18 Dec 2015 17:26:21 -0500
Message-ID: <CAFMmRNxVUDNQ-H=r24iOQOAbnvXi17s77HC-ap+4_K1AHEbSvA@mail.gmail.com>
Subject: Extending FIBs to support multi-tenancy
From: Ryan Stone <rysto32@gmail.com>
To: freebsd-transport@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-transport@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussions of transport level network protocols in FreeBSD
 <freebsd-transport.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-transport>, 
 <mailto:freebsd-transport-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-transport/>
List-Post: <mailto:freebsd-transport@freebsd.org>
List-Help: <mailto:freebsd-transport-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-transport>, 
 <mailto:freebsd-transport-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Dec 2015 22:26:25 -0000

My employer is going through the process of extending our product to
support multi-tenant networking.  The details of what our product does
aren't really relevant to the discussion -- it's enough to know that we have
a number of daemons acting as servers for various network protocols.
Multi-tenancy, as we've defined the feature, requires that our network
services be able to communicate with clients from completely
independent networks. This imposes the following new requirements on us:

- different tenant networks may have different DNS servers
- they may use different AAA mechanisms (e.g. one tenant uses LDAP, another
uses Kerberos, and a third also uses LDAP but a different server)
- they may use independent routing tables
- different tenant networks may use overlapping IP ranges (so we might see
two different clients from two different tenant networks with the IP
192.168.0.1, for instance)
- traffic from different tenant networks is not guaranteed to be segregated
in any way -- it might all come in the same network interface, without any
vlan tagging or any other encapsulation that might differentiate tenant
networks
- we need to scale to thousands of tenant networks
- we will impose the requirement that our system can't be assigned the same
IP address for different tenant networks

Our intention is to use the destination IP address of incoming packets to
determine which tenant network the packet came from (hence the requirement
of not allowing the same IP to be configured for different tenant
networks).
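The classification rule above can be sketched as a simple table keyed on the local (destination) address. This is a hypothetical userland model, not kernel code: the table, `tenant_add_addr()`, `tenant_lookup()` and the `IPV4()` helper are all illustrative names, and addresses are kept in host byte order for simplicity. Note that the client's source address never enters the lookup, which is why two clients that are both 192.168.0.1 on different tenant networks can still be told apart.

```c
#include <stdint.h>

/* Build a host-byte-order IPv4 address (illustrative helper). */
#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
    ((uint32_t)(c) << 8) | (uint32_t)(d))

#define MAX_ADDRS 16

/* Hypothetical table: each local address belongs to exactly one tenant. */
struct tenant_addr {
	uint32_t local_addr;
	int      tenant_id;
};

static struct tenant_addr table[MAX_ADDRS];
static int ntable;

/*
 * Assign a local address to a tenant.  Returns -1 if the address is already
 * owned by a different tenant (the "no shared IP across tenants" rule) or if
 * the table is full; re-adding an address to its own tenant is a no-op.
 */
static int
tenant_add_addr(uint32_t addr, int tenant_id)
{
	for (int i = 0; i < ntable; i++)
		if (table[i].local_addr == addr)
			return (table[i].tenant_id == tenant_id ? 0 : -1);
	if (ntable == MAX_ADDRS)
		return (-1);
	table[ntable].local_addr = addr;
	table[ntable].tenant_id = tenant_id;
	ntable++;
	return (0);
}

/* Map the destination address of an incoming packet to its tenant, or -1. */
static int
tenant_lookup(uint32_t dst_addr)
{
	for (int i = 0; i < ntable; i++)
		if (table[i].local_addr == dst_addr)
			return (table[i].tenant_id);
	return (-1);
}
```

A linear scan is only for clarity; at thousands of tenants a hash or radix lookup would be the obvious replacement.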

The obvious tool for meeting these requirements in FreeBSD is VIMAGE.
However, we have already prototyped that approach, and I have been told
that we found it will not scale to thousands of
networks.  I don't have all of the details as to why, but the root of the
problem is that any given process can only be associated with a single vnet
instance.  In our current architecture, we can't have thousands of
instances of each network service running (and I'm not sure that we really
could: if we support A services, B tenant networks and C CPU cores, we
would need a minimum of A * B * C threads to ensure that any given service
on any single tenant network could fully utilize the system's resources to
process requests).
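To make the scaling argument concrete, the lower bound is just the product of the three factors. The numbers in the test (10 services, 1000 tenant networks, 16 cores) are hypothetical, chosen only to show the order of magnitude; `min_threads()` is an illustrative name.

```c
#include <stdint.h>

/*
 * Minimum worker threads needed if each process is bound to one vnet and
 * every service must be able to use every core for every tenant network.
 */
static uint64_t
min_threads(uint64_t services, uint64_t tenants, uint64_t cores)
{
	return (services * tenants * cores);
}
```

For example, `min_threads(10, 1000, 16)` is 160,000 threads, before counting any threads the services need for other work.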


We're instead looking at using FIBs to implement the routing table
requirement.  To meet our requirements, we're expecting to have to make
three important changes to how FIBs are managed:

1) Allow listening sockets to be wildcarded across FIBs
2) Make FIBs a property of an interface address, not the interface
3) Allow each thread to set a default FIB that will be applied to newly
created sockets

1)
We don't really want to change all of our services to instantiate one
listening socket for every tenant network.  Instead we're looking at
implementing (and upstreaming) a kernel extension that allows a listening
socket to be wildcarded across all FIBs (note: yesterday I described this
feature as allowing us to pick-and-choose FIBs, but people internally have
convinced me that a wildcard match would make their lives significantly
easier).  When a new connection attempt to a listening socket in this mode
is accepted, the socket would not inherit its FIB from the listening
socket.  Instead, it would be set based on the local IP address of the
connection.
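The accept-time rule for change 1 could look roughly like the following userland model. All names here are hypothetical: `all_fibs` stands in for whatever socket option would enable the wildcard mode, and the address-to-FIB map stands in for the kernel's view of which FIB owns each local address. Falling back to the listener's own FIB when the local address is unmapped is an assumption, not something the proposal specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping from a local IPv4 address (host order) to its FIB. */
struct addr_fib {
	uint32_t addr;
	int      fib;
};

/* Hypothetical listening socket state. */
struct listen_sock {
	int  fib;       /* FIB of the listener, e.g. as set with SO_SETFIB */
	bool all_fibs;  /* proposed "wildcard across all FIBs" mode */
};

/* FIB that owns a local address, or -1 if the address is unknown. */
static int
fib_for_local_addr(const struct addr_fib *map, size_t n, uint32_t laddr)
{
	for (size_t i = 0; i < n; i++)
		if (map[i].addr == laddr)
			return (map[i].fib);
	return (-1);
}

/* FIB assigned to a newly accepted connection. */
static int
accepted_fib(const struct listen_sock *ls, const struct addr_fib *map,
    size_t n, uint32_t laddr)
{
	int fib;

	if (!ls->all_fibs)
		return (ls->fib);  /* current behaviour: inherit from listener */
	fib = fib_for_local_addr(map, n, laddr);
	return (fib >= 0 ? fib : ls->fib);  /* assumed fallback if unmapped */
}
```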

2)
Currently, FIBs are a property of an interface (struct ifnet).  We aren't
very enthusiastic about the prospect of having to create thousands of
interfaces to support thousands of tenant networks.  We would instead
like to make the FIB a property of the interface address. For backwards
compatibility reasons I would still let admins set a FIB on an ifnet, but
that FIB would instead become the default FIB for addresses that aren't
explicitly assigned one.  That should maintain the current behaviour
while making it easy to push FIBs down into the address.
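The fallback described for change 2 amounts to a two-level lookup. The structs below are a hypothetical model loosely patterned on struct ifnet and struct ifaddr; the field names and the `FIB_UNSET` sentinel are illustrative and do not correspond to actual kernel members.

```c
/* Sentinel meaning "no FIB explicitly configured on this address". */
#define FIB_UNSET (-1)

/* Hypothetical interface: carries the admin-set, interface-wide FIB. */
struct model_ifnet {
	int fib;
};

/* Hypothetical interface address: may override the interface's FIB. */
struct model_ifaddr {
	struct model_ifnet *ifp;
	int fib;  /* FIB_UNSET unless explicitly configured */
};

/* Effective FIB: the address's own FIB if set, else the interface default. */
static int
ifa_effective_fib(const struct model_ifaddr *ifa)
{
	return (ifa->fib != FIB_UNSET ? ifa->fib : ifa->ifp->fib);
}
```

Because unset addresses read the interface value at lookup time, changing the FIB on an ifnet still affects every address that hasn't been given its own, which is what keeps existing configurations working unchanged.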

3)
The idea of a per-thread FIB has gotten the most pushback so far, and I
understand the objection.  I'll explain the problem that we're trying to
solve with this.  When a new request comes in, we may need to perform
authentication through LDAP or Kerberos.  The problem is that the existing
open-source implementations that we are using manage sockets directly.  We
really don't want to have to go through them and make their APIs entirely
FIB-aware -- that is far too much churn.  By moving awareness of the
current FIB into the kernel, existing calls to socket() can do the right
thing transparently.
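The precedence implied by change 3 can be modeled in userland. FreeBSD already has a per-process default FIB (setfib(2)) and a per-socket one (the SO_SETFIB socket option); the per-thread layer is the new piece. In this sketch `proc_fib` merely stands in for the process-wide setting, and `set_thread_fib()` and `fib_for_new_socket()` are hypothetical names for whatever interface the kernel change would expose.

```c
/* Sentinel meaning "this thread has no FIB of its own". */
#define FIB_UNSET (-1)

static int proc_fib = 0;                          /* stands in for setfib(2) */
static _Thread_local int thread_fib = FIB_UNSET;  /* proposed per-thread FIB */

/* Hypothetical: set (or clear with FIB_UNSET) the calling thread's FIB. */
static void
set_thread_fib(int fib)
{
	thread_fib = fib;
}

/* FIB a socket created by this thread would get at socket() time. */
static int
fib_for_new_socket(void)
{
	return (thread_fib != FIB_UNSET ? thread_fib : proc_fib);
}
```

In use, a worker would set the tenant's FIB before calling into the LDAP or Kerberos library and clear it afterwards, so any socket() calls the library makes internally pick up the right routing table without the library knowing FIBs exist.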

We're not entirely happy with the solution, but the "right" way to solve
the problem involves rototilling a number of libraries.  Even if we could
convince the upstream projects to take patches, it's far more work than
we're willing to take on.


We're planning on doing the work ourselves, but we feel that coming up with
a solution that FreeBSD is comfortable taking into head is critical.  We
really don't want to be carrying around diffs from upstream in such a
critical part of the network stack, so I am hoping that we can come to an
agreement on the best path forward for both sides.  If anybody has any
comments, concerns or objections to any of this, please pipe up now rather
than after I've implemented all of this.

Thanks,
Ryan