From owner-freebsd-net@FreeBSD.ORG Tue Aug 9 15:53:17 2005
From: Marko Zec <zec@icir.org>
To: freebsd-net@freebsd.org
Cc: Andre Oppermann
Date: Tue, 9 Aug 2005 17:37:32 +0200
Subject: Stack virtualization (was: running out of mbufs?)
In-Reply-To: <42F8A487.67183CA6@freebsd.org>
References: <1123040973.95445.TMDA@seddon.ca>
 <200508091104.06572.zec@icir.org> <42F8A487.67183CA6@freebsd.org>
Message-Id: <200508091737.32391.zec@icir.org>

On Tuesday 09 August 2005 14:41, Andre Oppermann wrote:
> Marko Zec wrote:
> > On Monday 08 August 2005 18:47, Andre Oppermann wrote:
> > > Marko Zec wrote:
> > > > On Monday 08 August 2005 12:32, Andre Oppermann wrote:
...
> > > > > There is a patch doing that for FreeBSD 4.x. However while
> > > > > interesting it is not the way to go. You don't want to have
> > > > > multiple parallel stacks but just multiple routing tables and
> > > > > interface groups one per jail. This gives you the same
                          ^^^^^^^^^^^^^^^^
> > > > > functionality as Cisco VRF but is far less intrusive to the
> > > > > kernel.
...

> I don't want to have non-global interface lists in the kernel.

But sooner or later you _will_ end up with some sort of non-global
interface lists after all, just as you stated yourself at the beginning
of this thread. Of course one can still maintain all interfaces linked
in a single global list and introduce another set of separate per-stack
lists used to logically group interfaces into smaller sets, but that's
really just a question of coding / design style.
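To make this a bit more concrete, here is a rough sketch of what I have
in mind (the struct and field names below are made up for illustration
only, this is not the actual code from my patches): each interface
simply carries a second set of list linkage, so it can sit on the global
list and on its own stack instance's list at the same time.

/* A sketch only -- struct and field names are made up for illustration. */
#include <sys/queue.h>

struct vnet_sketch;                                /* one per network stack instance */

struct ifnet_sketch {
        TAILQ_ENTRY(ifnet_sketch) if_global_link;  /* linkage into the global list */
        TAILQ_ENTRY(ifnet_sketch) if_vnet_link;    /* linkage into its stack's list */
        struct vnet_sketch *if_home_vnet;          /* back pointer to the owning stack */
        /* ... the rest of what lives in struct ifnet today ... */
};

TAILQ_HEAD(ifnethead_sketch, ifnet_sketch);

struct vnet_sketch {
        struct ifnethead_sketch vnet_ifnet;        /* interfaces grouped under this stack */
        /* ... per-stack PCB lists, routing tables, statistics, ... */
};

/* The global list stays exactly as it is today. */
static struct ifnethead_sketch ifnet_global =
    TAILQ_HEAD_INITIALIZER(ifnet_global);

/* Attaching an interface just links it into both places. */
static void
if_attach_sketch(struct ifnet_sketch *ifp, struct vnet_sketch *vnet)
{
        ifp->if_home_vnet = vnet;
        TAILQ_INSERT_TAIL(&ifnet_global, ifp, if_global_link);
        TAILQ_INSERT_TAIL(&vnet->vnet_ifnet, ifp, if_vnet_link);
}

Lookups that must not cross stack boundaries then simply walk the
per-stack list, while anything that genuinely needs a global view can
keep walking the global one.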
...

> > > Having multiple stacks duplicates a lot of structures for each
> > > stack which don't have to be duplicated. With your approach you
> > > need a new jail for every new stack. In each jail you have to
> > > run a new instance of a routing daemon (if you do routing). And
> > > it precludes having one routing daemon managing multiple routing
> > > tables. While removing one limitation you create some new ones
> > > in addition to the complexity.
> >
> > Bemusingly, none of the above claims are true.
> >
> > A new jail for each network stack instance is NOT required. Inside
> > the kernel what could be considered "per-jail" and per-network
> > stack structures are cleanly separated and independent. In fact,
> > one can run multiple jails bound to a single network stack
> > instance, if desired.
>
> Ok.
>
> > Furthermore, a single process can simultaneously attach to multiple
> > network stacks, thus potentially allowing a single routing daemon
> > to manage multiple separated routing tables and interface groups.
> > The entity that gets permanently bound to a network stack instance
> > is a socket and not a process. This translates to the capability of
> > a single process to open multiple sockets in multiple independent
> > stacks. IMO, one particular strength of such an approach is that
> > it requires absolutely no extensions or modifications to the
> > existing routing socket API.
>
> The existing API should be modified, it is pretty out of date.
>
> > And finally, I'm wondering what structures exactly are you
> > referring to when you say that this approach "duplicates a lot of
> > structures for each stack which don't have to be duplicated"? I
> > absolutely agree that the virtualization of the network stack
> > should be done as simply and non-intrusively as possible, but my
> > point is that it just cannot be done cleanly / properly without
> > taking some sacrifices in terms of the scope of minimum required
> > modifications.
>
> Multiple interface lists, vm zones, etc. as your FAQ spells out.

Multiple interface lists are a must, whether as a replacement (the way
I did it) or as a supplement to a global interface list. They cost
nothing in terms of memory use, greatly simplify the code, and prevent
potential performance and cross-stack-boundary-leaking pitfalls.

For a long time now my framework has _not_ used separate VM zones per
network stack instance for storing PCBs. True, the FAQ should probably
be updated, but it already clearly stated my doubts about whether
separate VM zones were really needed, and later experiments and working
code proved they indeed weren't. What still uses multiple VM zones is
the TCP syncache code, and I agree it could most likely be reworked to
use only a single global zone. Any other offending structures? :-)

It looks like we might be converging in terms of what it takes to
virtualize a network stack :-)

> Again, I think we are talking past each other right now and we have
> different solutions to different problem sets in mind (or already
> coded). When I have my paper finished my vision and intentions
> should be more clear and then we can have the discussion on the
> merits of each approach and whether parts of each are complementary
> or converse.

OK, looking forward to it...

Cheers,
Marko
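P.S. To make the VM zone point more concrete as well: what I mean is
that all network stack instances can allocate their PCBs from one and
the same global uma(9) zone, and only the list heads used for
bookkeeping are kept per instance. Roughly (again, the names below are
made up for illustration, this is not the code from my framework):

/* Sketch only -- names are made up for illustration. */
#include <sys/param.h>
#include <sys/malloc.h>
#include <sys/queue.h>
#include <vm/uma.h>

struct inpcb_sketch {
        LIST_ENTRY(inpcb_sketch) inp_list;   /* linkage into the owning stack's list */
        /* ... the usual inpcb contents ... */
};

LIST_HEAD(inpcbhead_sketch, inpcb_sketch);

/* One zone, shared by every network stack instance. */
static uma_zone_t inpcb_zone;

static void
inpcb_zone_init_sketch(void)
{
        inpcb_zone = uma_zcreate("inpcb_sketch", sizeof(struct inpcb_sketch),
            NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
}

/* Allocation is global, bookkeeping is per-stack. */
static struct inpcb_sketch *
inpcb_alloc_sketch(struct inpcbhead_sketch *head)
{
        struct inpcb_sketch *inp;

        inp = uma_zalloc(inpcb_zone, M_NOWAIT | M_ZERO);
        if (inp != NULL)
                LIST_INSERT_HEAD(head, inp, inp_list);
        return (inp);
}

So nothing gets duplicated on the allocator side; each stack instance
only pays for its own list heads.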