Date:      Wed, 21 Aug 2013 20:19:40 +0200
From:      Andre Oppermann <andre@freebsd.org>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        Barney Cordoba <barney_cordoba@yahoo.com>, Luigi Rizzo <rizzo@iet.unipi.it>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Message-ID:  <521504BC.6030205@freebsd.org>
In-Reply-To: <CAJ-Vmokpvvis-vvtSiQXzk_UXwKwf9kEex_6J6Vb0Y8nSF0QGw@mail.gmail.com>
References:  <520A6D07.5080106@freebsd.org> <520AFBE8.1090109@freebsd.org> <520B24A0.4000706@freebsd.org> <520B3056.1000804@freebsd.org> <20130814102109.GA63246@onelab2.iet.unipi.it> <1376745244.6575.YahooMailNeo@web121606.mail.ne1.yahoo.com> <1376748170.66110.YahooMailNeo@web121601.mail.ne1.yahoo.com> <CAJ-VmonGeqn5qqbfvF9xWaFPYNMNSVb6VwMx%2BoEVSGXVid98ag@mail.gmail.com> <1376833738.94737.YahooMailNeo@web121605.mail.ne1.yahoo.com> <71EA3DFB-B410-432D-98E0-B6341556BE6D@netgate.com> <CAJ-Vmo=0OX=_6cO_pZ45XrvfQzb%2BNVms00LUo5oRriZWUMBx%2Bg@mail.gmail.com> <1376851152.3322.YahooMailNeo@web121606.mail.ne1.yahoo.com> <CAJ-VmokPhxAe1CAVqfKDJhssqg0VaUZT4hRPNB9gigECebh7VA@mail.gmail.com> <1376859717.20232.YahooMailNeo@web121605.mail.ne1.yahoo.com> <CA%2BhQ2%2Bips=QUeyK3bwMQhc8yPavMzd3i-3YDjksy4hEVNBR%2BXA@mail.gmail.com> <CAJ-Vmokpvvis-vvtSiQXzk_UXwKwf9kEex_6J6Vb0Y8nSF0QGw@mail.gmail.com>

On 18.08.2013 23:54, Adrian Chadd wrote:
> Hi,
>
> I think the "UNIX architecture" is a bit broken for anything other than the
> occasional (for various traffic levels defining "occasional!") traffic
> connection. It's serving us well purely through the sheer force of will of
> modern CPU power but I think we can do a lot better.

I do not agree with you here.  The UNIX architecture is fine, but of course,
as with anything, you're not going to get the full raw and theoretically
possible performance for every special case out of it.  It is extremely
versatile and performs rather well over a broad set of applications.

> _I_ think the correct model is a netmap model - batched packet handling,
> lightweight drivers pushing and pulling batches of things, with some
> lightweight plugins to service that inside the kernel and/or push into the
> netmap ring buffer in userland. Interfacing into the ethernet and socket
> layer should be something that bolts on the side, kind of netgraph style.
> It would likely look a lot more like a switching backplane with socket IO
> being one of many processing possibilities. If socket IO stays packet at a
> time then great; but that's messing up the ability to do a lot of other
> interesting things.

Not really.  While netmap is really good at pushing packets (on x86 cache
coherent architectures only, I may add), it fails miserably as a general
"socket" layer.

On the receive side it has a fixed buffer pool and would grind to a halt if
you used those buffers directly as TCP receive socket buffers and the TCP
application stopped processing every packet immediately.  This means you have
to copy the packet contents from the NIC DMA pool to some other allocated
memory to prevent that.
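
To make the stall concrete, here is a toy model in Python.  It is only a
sketch, not netmap code: the names (FixedPool, receive) are made up for
illustration, the pool is assumed to hold four buffers, and the application
is assumed never to read from its socket buffer.  Socket buffer limits are
ignored.

```python
class FixedPool:
    """Toy stand-in for a fixed-size NIC DMA buffer pool (hypothetical)."""

    def __init__(self, nbufs):
        self.free = list(range(nbufs))   # indices of buffers the NIC may fill

    def rx(self):
        """NIC receives a packet: take a free buffer, or None if none left."""
        return self.free.pop() if self.free else None

    def release(self, idx):
        """Give a buffer back to the NIC."""
        self.free.append(idx)


def receive(pool, npackets, copy_out):
    """Deliver npackets to a socket buffer whose application never reads.

    copy_out=False: the socket buffer keeps the DMA buffer itself.
    copy_out=True:  the payload is copied and the DMA buffer is released.
    Returns (packets accepted, packets dropped).
    """
    sockbuf = []
    accepted = dropped = 0
    for seq in range(npackets):
        idx = pool.rx()
        if idx is None:          # pool exhausted, the NIC has nowhere to DMA
            dropped += 1
            continue
        if copy_out:
            sockbuf.append(("copy", seq))   # payload now lives elsewhere
            pool.release(idx)               # DMA buffer is free again
        else:
            sockbuf.append(("dma", idx))    # DMA buffer stays pinned
        accepted += 1
    return accepted, dropped


zero_copy = receive(FixedPool(4), 10, copy_out=False)
with_copy = receive(FixedPool(4), 10, copy_out=True)
```

Without the copy, receive stops as soon as the four buffers are pinned by
the idle application; with the copy, the pool keeps recycling.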

It doesn't have any security model and isn't really multi-app aware.  How
do you multiplex a number of protocols and connections to different
applications?  Copy through shared memory?

You'd have to re-implement the entire protocol stack starting with Ethernet,
IPv4 and IPv6 up to UDP and TCP, the latter being rather complex.  For
sending you also need to manage a routing table and ARP.

For data sending you run into the same buffer problem as with receive.  TCP
has to hold on to the data sent until it is acknowledged.  That's a data copy
again because you can't store it in the NIC DMA pool.  Memory pools (mbufs)
then need to be allocated and managed as well, not to mention page fault
issues (userspace).
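
The send-side constraint can be sketched the same way.  This is a toy Python
model under simplifying assumptions (cumulative ACKs only, no sequence number
wraparound, no windows); SendQueue and its methods are hypothetical names,
not any real stack's API.  The point is only that the data has to live in a
private copy until the ACK arrives.

```python
class SendQueue:
    """Toy TCP send buffer: keeps bytes until they are cumulatively ACKed."""

    def __init__(self, iss=0):
        self.snd_una = iss        # oldest unacknowledged sequence number
        self.buf = bytearray()    # private copy of all unacknowledged data

    def send(self, data):
        """Queue data; return the sequence number of its first byte."""
        seq = self.snd_una + len(self.buf)
        self.buf += data          # the copy: the caller's buffer is reusable
        return seq

    def ack(self, ackno):
        """Cumulative ACK: drop everything below ackno."""
        acked = ackno - self.snd_una
        if 0 < acked <= len(self.buf):
            del self.buf[:acked]
            self.snd_una = ackno

    def retransmit(self, mss):
        """Bytes to resend on a timeout: the oldest unacknowledged segment."""
        return bytes(self.buf[:mss])


q = SendQueue()
first = q.send(b"hello ")
second = q.send(b"world")
q.ack(6)                          # peer acknowledges "hello "
pending = q.retransmit(1460)      # what a retransmit timer would resend
```

Only after the ACK can the memory for "hello " be reclaimed; "world" must
stay available for retransmission.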

Once you're through all that you end up with the UNIX style kernel stack
moved into a userspace library.

> That's why I'm (more) interested in what you've done architecture wise than
> just saying "dump it in userland and be done with it." I think the VALE
> kernel stuff is very interesting from an architectural perspective. The
> questions (to me!) are:
>
> * how do we implement this in the current framework? (That's not too scary
> though; we'd just have the existing ethernet input/output path be one of
> many processing modules, and VALE would be another; netmap-userland would
> be another; etc, etc);
> * how do we make it a compile time fallback to the traditional model, for
> platforms that continue to be memory and/or cache constrained? (read:
> everything that's embedded)
> * ... and not simply have lots of #ifdef NETMAP everywhere, but make the
> fallback be something sane and fall out of the API design?

Netmap really excels at pushing packets.  I think a recent extension allows
a netmap process to push a netmap-received packet back into the kernel for
further processing.  That's a good hybrid model for those use cases that
need raw packet pushing speed and have only a little local traffic.
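
As a rough picture of that hybrid split, here is a toy Python sketch.  It
does not use the netmap API; dispatch, split_batch and the local address
192.0.2.1 are assumptions made up for the example.  Packets addressed to the
box itself are handed to the kernel stack, everything else stays on the fast
path.

```python
import ipaddress

# Assumed local address of the box (documentation range, illustration only).
LOCAL_ADDRS = {ipaddress.ip_address("192.0.2.1")}


def dispatch(dst, local_addrs=LOCAL_ADDRS):
    """Return 'host' for local traffic and 'forward' for everything else."""
    return "host" if ipaddress.ip_address(dst) in local_addrs else "forward"


def split_batch(dsts):
    """Split a batch of destination addresses into (forwarded, to_host)."""
    forwarded, to_host = [], []
    for dst in dsts:
        (to_host if dispatch(dst) == "host" else forwarded).append(dst)
    return forwarded, to_host


forwarded, to_host = split_batch(["198.51.100.7", "192.0.2.1", "203.0.113.9"])
```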

> I'll try to rope some more ideas into that design at the cambridge and euro
> BSD developer summits. I'll try to post some kind of work roadmap to the
> list(s) for comments and potential code hacking.
>
> Anyway. I'll continue waving hands and hacking on code until I have
> something that works.

Rather than day-dreaming of shiny new things we should invest in making the
kernel better and in fixing or removing bottlenecks.  It's a good kernel.

> Luigi - when are you next at a BSD developer summit / conference? Will you
> be at Malta?

He has submitted a talk about netmap and it was accepted, so I certainly hope
that he shows up. ;)

-- 
Andre



