From: Andre Oppermann
Date: Wed, 21 Aug 2013 20:19:40 +0200
To: Adrian Chadd
Cc: Barney Cordoba, Luigi Rizzo, "freebsd-net@freebsd.org"
Subject: Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux)
Message-ID: <521504BC.6030205@freebsd.org>

On 18.08.2013 23:54, Adrian Chadd wrote:
> Hi,
>
> I think the "UNIX architecture" is a bit broken for anything other than the
> occasional (for various traffic levels defining "occasional!") traffic
> connection. It's serving us well purely through the sheer force of will of
> modern CPU power but I think we can do a lot better.

I do not agree with you here. The UNIX architecture is fine, but of course,
as with anything, you're not going to get the full raw, theoretically
possible performance out of it for every special case. It is extremely
versatile and performs rather well over a broad set of applications.

> _I_ think the correct model is a netmap model - batched packet handling,
> lightweight drivers pushing and pulling batches of things, with some
> lightweight plugins to service that inside the kernel and/or push into the
> netmap ring buffer in userland. Interfacing into the ethernet and socket
> layer should be something that bolts on the side, kind of netgraph style.
> It would likely look a lot more like a switching backplane with socket IO
> being one of many processing possibilities. If socket IO stays packet at a
> time than great; but that's messing up the ability to do a lot of other
> interesting things.

Not really. While netmap is really good at pushing packets (on x86
cache-coherent architectures only, I may add), it fails miserably as a
general "socket" layer. On the receive side it has a fixed buffer pool, and
it would grind to a halt if you used those buffers directly as TCP receive
socket buffers and the TCP application stopped processing every packet
immediately. To prevent that you have to copy the packet contents from the
NIC DMA pool to some other allocated memory. It doesn't have any security
model and isn't really multi-app aware.
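
To illustrate the receive-side point above, here is a rough toy model of a
fixed receive buffer pool. It is only a sketch under stated assumptions: it
does not use the netmap API, and the names FixedRxPool and simulate are
hypothetical, made up for this example. It compares handing pool slots
directly to a socket buffer with copying the data out and returning the slot
at once, while the application is not reading at all.

```python
from collections import deque


class FixedRxPool:
    """Toy model of a fixed-size NIC receive buffer pool (not the netmap API)."""

    def __init__(self, slots):
        self.free = deque(range(slots))  # indices of currently free slots
        self.dropped = 0                 # packets lost because no slot was free

    def receive(self):
        """NIC side: claim a free slot for an arriving packet, or drop it."""
        if not self.free:
            self.dropped += 1
            return None
        return self.free.popleft()

    def release(self, slot):
        """Hand a slot back to the pool so the NIC can reuse it."""
        self.free.append(slot)


def simulate(packets, slots, copy_out):
    """Deliver `packets` packets to an application that never reads.

    Returns (packets queued for the application, packets dropped).
    """
    pool = FixedRxPool(slots)
    socket_buffer = []  # stands in for the TCP receive socket buffer
    for seq in range(packets):
        slot = pool.receive()
        if slot is None:
            continue  # pool exhausted, packet is lost
        if copy_out:
            # Copy the payload elsewhere and free the slot immediately.
            socket_buffer.append(("copy", seq))
            pool.release(slot)
        else:
            # Zero-copy: the socket buffer pins the slot until the app reads.
            socket_buffer.append(("slot", slot))
    return len(socket_buffer), pool.dropped


if __name__ == "__main__":
    print(simulate(10, 4, copy_out=False))  # (4, 6)
    print(simulate(10, 4, copy_out=True))   # (10, 0)
```

With four slots and a stalled reader, the zero-copy variant queues four
packets and drops the remaining six, while the copy-out variant keeps the
pool free at the price of one copy per packet. The model ignores that the
copied data needs memory of its own, which is the job of the mbuf pools in
the kernel stack.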

How do you multiplex a number of protocols and connections to different
applications? Copy through shared memory? You'd have to re-implement the
entire protocol stack, starting with ethernet, IPv4 and IPv6 up to UDP and
TCP, the latter being rather complex. For send you need a routing table and
ARP to be managed. For data sending you run into the same buffer problem as
with receive: TCP has to hold on to the data sent until it is acknowledged.
That's a data copy again, because you can't store it in the NIC DMA pool.
Memory pools (mbufs) then need to be allocated and managed as well, not to
mention page fault issues (userspace). Once you're through all that you end
up with the UNIX style kernel stack moved into a userspace library.

> That's why I'm (more) interested in what you've done architecture wise than
> just saying "dump it in userland and be done with it." I think the VALE
> kernel stuff is very interesting from an architectural perspective. The
> questions (to me!) are:
>
> * how do we implement this in the current framework? (That's not too scary
>   though; we'd just have the existing ethernet input/output path be one of
>   many processing modules, and VALE would be another; netmap-userland would
>   be another; etc, etc);
> * how do we make it a compile time fallback to the traditional model, for
>   platforms that continue to be memory and/or cache constrained? (read:
>   everything that's embedded)
> * ... and not simply have lots of #Ifdef NETMAP everywhere, but make the
>   fallback be something sane and fall out of the API design?

Netmap really excels at pushing packets. I think a recent extension allows
a netmap process to push a netmap-received packet back into the kernel for
further processing. That's a good hybrid model for those use cases that
need raw packet pushing speed and have only a little local traffic.

> I'll try to rope some more ideas into that design at the cambridge and euro
> BSD developer summits.
> I'll try to post some kind of work roadmap to the
> list(s) for comments and potential code hacking.
>
> Anyway. I'll continue waving hands and hacking on code until I have
> something that works.

Rather than day-dreaming of shiny new things we should invest in making the
kernel better and in fixing or removing bottlenecks. It's a good kernel.

> Luigi - when are you next at a BSD developer summit / conference? Will you
> be at Malta?

He has submitted a talk about netmap and it was accepted, so I surely do
hope that he shows up. ;)

-- 
Andre