From: Julian Elischer
Date: Mon, 27 Sep 2010 09:13:23 -0700
To: Andre Oppermann
Cc: FreeBSD Net
Subject: Re: mbuf changes
Message-ID: <4CA0C2A3.7000508@freebsd.org>
In-Reply-To: <4CA09792.3070307@freebsd.org>

On 9/27/10 6:09 AM, Andre Oppermann wrote:
> On 26.09.2010 08:32, Julian Elischer wrote:
>> On 9/25/10 1:20 AM, Andre Oppermann wrote:
>>> On 25.09.2010 09:19, Julian Elischer wrote:
>>>> Over the last few years there has been a bit of talk about some
>>>> changes people want to see in mbufs for 9.x:
>>>> extra fields, changes in the way things are done, etc.
>>>>
>>>> If you are one of these people, pipe up now.
>>>>
>>>> To get the ball rolling:
>>>>
>>>> * Add a field for the current FIB. Currently this is 4 bits
>>>> stolen from the flags.
>>>> What would be a good width: 8, 12, 16, 24, 32 bits?
>>>> This would allow setfib to use numbers greater than 16 (the
>>>> current max).
>>>
>>> 16 bits for 65536 FIBs should be sufficient. More than that seems
>>> really excessive.
>>>
>>>> * Preallocating some room for some number of tags before we start
>>>> allocating (expensively) new ones.
>>>
>>> Within the mbuf? Or at external and attached mbuf allocation time?
>>> Tags are variable width and as such not really suitable for
>>> pre-allocation.
>>
>> Yes, possibly within. There could be, for example, a reserved 20 byte
>> field, and if it doesn't fit in that we go to expensive tags.
>> I'm just waving my arms here.
>
> See my reply to Luigi for a detailed view on this.
>
>>>> * Dynamically working out what the front padding size should be,
>>>> per session. I.e. when a packet is sent out and needs to be
>>>> adjusted to add more headers, the originating socket should be
>>>> notified, or maybe the route should have this information,
>>>> so that future packets can start out with enough headroom.
>>>> (This is not strictly to do with mbufs but might need some added
>>>> field to point to the structure that needs to be updated.)
>>>
>>> We already have "max_linkhdr" that specifies how much space is left
>>> for prepends at the start of each packet. The link protocols set
>>> this and also IPSec adds itself in there if enabled. If you have
>>> other encapsulations you should make them add in there as well.
>>
>> This doesn't take into account tunneling and encapsulation.
>
> It should/could, but the tunneling and encapsulation protocols have to
> add themselves to it when active. IPSec does this.

Yes, but the trouble is that every packet is then given a worst-case
reserved area at the front.

>
>> We could do a lot better than this,
>> especially on a per-route basis.
>> If the first mbuf in a session had a pointer to the relevant rtentry,
>> then as it is processed that could be updated.
>
> Please please please don't add a rtentry pointer to the mbuf. Besides
> that, the routing table is a very poor place to do this. We don't have
> host routes anymore and the locking and refcounting is rather
> expensive.

Yes, but we do have a route cache (and we probably should still have
some form of host routes, but that's a different issue not to be argued
here).

>
> max_linkhdr should be sufficient (with a few small fixes to some
> protocol mbuf allocators) even for excessive cases of encapsulation:

max_linkhdr is way too big for 99% of all packets.

>
> TCP over IPv4 over IPSec(AH+ESP) over UDP over IPv6 over PPPoE over
> Ethernet =
> 60 + 20 + (8+24) + 8 + 40 + 8 + 14 = 182 total, of which 102 are
> prepends.
>
> Maybe we need an API for the tunneling and encapsulation protocols to
> add their overhead to max_linkhdr.
>
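
To make the two positions in this thread concrete, here is a rough sketch
in Python (not kernel code). It contrasts a global worst-case reservation,
where every active encapsulation registers its overhead, with a
per-session hint that only remembers what a session's packets actually
needed. The names LinkHdrRegistry and SessionHeadroom are hypothetical,
and treating max_linkhdr as a plain sum of registered overheads is an
assumption made for illustration only. The header sizes are the ones
quoted above.

```python
from dataclasses import dataclass, field

# Header sizes in bytes, as quoted in the thread above.
INNER = {"TCP": 60, "IPv4": 20}
PREPENDS = {
    "IPSec AH+ESP": 8 + 24,
    "UDP": 8,
    "IPv6": 40,
    "PPPoE": 8,
    "Ethernet": 14,
}


@dataclass
class LinkHdrRegistry:
    """Hypothetical global registry: each active encapsulation adds its overhead."""
    overheads: dict = field(default_factory=dict)

    def register(self, name: str, size: int) -> None:
        self.overheads[name] = size

    def unregister(self, name: str) -> None:
        self.overheads.pop(name, None)

    def max_linkhdr(self) -> int:
        # Assumption: worst case is every registered protocol stacked on one packet.
        return sum(self.overheads.values())


@dataclass
class SessionHeadroom:
    """Hypothetical per-session hint: the largest prepend actually seen so far."""
    needed: int = 0

    def observe(self, prepended: int) -> None:
        self.needed = max(self.needed, prepended)


registry = LinkHdrRegistry()
for name, size in PREPENDS.items():
    registry.register(name, size)

# Total header bytes for the full stack: inner headers plus all prepends.
total = sum(INNER.values()) + registry.max_linkhdr()

# A session that only ever goes out over plain Ethernet.
plain = SessionHeadroom()
plain.observe(PREPENDS["Ethernet"])

# A session that really does traverse the whole tunnel stack.
tunneled = SessionHeadroom()
tunneled.observe(registry.max_linkhdr())

print(total, registry.max_linkhdr(), plain.needed, tunneled.needed)
```

Run as a script, this prints 182 102 14 102: with the global figure, the
plain Ethernet session would reserve 102 bytes at the front of every
packet, while a per-session hint would settle on 14.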