From owner-freebsd-net@FreeBSD.ORG  Mon Sep 27 16:12:44 2010
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6CC3F106564A;
	Mon, 27 Sep 2010 16:12:44 +0000 (UTC)
	(envelope-from julian@freebsd.org)
Received: from out-0.mx.aerioconnect.net (out-0-18.mx.aerioconnect.net
	[216.240.47.78])
	by mx1.freebsd.org (Postfix) with ESMTP id 4AFD78FC08;
	Mon, 27 Sep 2010 16:12:44 +0000 (UTC)
Received: from idiom.com (postfix@mx0.idiom.com [216.240.32.160])
	by out-0.mx.aerioconnect.net (8.13.8/8.13.8) with ESMTP id
	o8RGChCQ029208; Mon, 27 Sep 2010 09:12:43 -0700
X-Client-Authorized: MaGic Cook1e
Received: from julian-mac.elischer.org
	(h-67-100-89-137.snfccasy.static.covad.net [67.100.89.137])
	by idiom.com (Postfix) with ESMTP id 6B6122D6018;
	Mon, 27 Sep 2010 09:12:42 -0700 (PDT)
Message-ID: <4CA0C2A3.7000508@freebsd.org>
Date: Mon, 27 Sep 2010 09:13:23 -0700
From: Julian Elischer <julian@freebsd.org>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US;
	rv:1.9.2.9) Gecko/20100915 Thunderbird/3.1.4
MIME-Version: 1.0
To: Andre Oppermann <andre@freebsd.org>
References: <4C9DA26D.7000309@freebsd.org> <4C9DB0C3.5010601@freebsd.org>
	<4C9EE905.5090701@freebsd.org> <4CA09792.3070307@freebsd.org>
In-Reply-To: <4CA09792.3070307@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.67 on 216.240.47.51
Cc: FreeBSD Net <net@freebsd.org>
Subject: Re: mbuf changes
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2010 16:12:44 -0000

  On 9/27/10 6:09 AM, Andre Oppermann wrote:
> On 26.09.2010 08:32, Julian Elischer wrote:
>> On 9/25/10 1:20 AM, Andre Oppermann wrote:
>>> On 25.09.2010 09:19, Julian Elischer wrote:
>>>> over the last few years there has been a bit of talk about some 
>>>> changes people want to see in mbufs
>>>> for 9.x
>>>> extra fields, changes in the way things are done, etc.
>>>>
>>>> If you are one of these people, pipe up now..
>>>>
>>>> to get the ball rolling..
>>>>
>>>> * Add a field for the current FIB.. currently this is 4 bits 
>>>> stolen from the flags.
>>>> what would be a good width: 8,12,16,24,32 bits?
>>>> this would allow setfib to use numbers greater than 16 (the 
>>>> current max)
>>>
>>> 16 bits for 65535 FIBs should be sufficient. More than that seems
>>> really excessive.
>>>
>>>> * Preallocating some room for some number of tags before we start 
>>>> allocating
>>>> (expensively) new ones.
>>>
>>> Within the mbuf? Or at external and attached mbuf allocation time?
>>> Tags are variable width and as such not really suitable for
>>> pre-allocation.
>>
>> yes, possibly within.. there could be, for example, a reserved 20 byte
>> field, and if it doesn't fit in that we go to expensive tags.
>> I'm just waving my arms here.
>
> See my reply to Luigi for a detailed view on this.
>
>>>> * dynamically working out what the front padding size should be.. 
>>>> per session.. i.e.
>>>> when a packet is sent out and needs to be adjusted to add more 
>>>> headers, the originating
>>>> socket should be notified, or maybe the route should have this 
>>>> information...
>>>> so that future packets can start out with enough head room.
>>>> (this is not strictly to do with mbufs but might need some added 
>>>> field to point to the structure
>>>> that needs to be
>>>> updated.
>>>
>>> We already have "max_linkhdr" that specifies how much space is left
>>> for prepends at the start of each packet. The link protocols set
>>> this and also IPSec adds itself in there if enabled. If you have
>>> other encapsulations you should make them add in there as well.
>>
>> this doesn't take into account tunneling and encapsulation.
>
> It should/could but the tunneling and encapsulation protocols have to
> add themselves to it when active.  IPSec does this.

yes, but the trouble is that every packet is then given a worst-case
reserved area at the front.
>
>> we could do a lot better than this.
>> especially on a per-route basis.
>> if the first mbuf in a session had a pointer to the relevant rtentry,
>> then as it is processed that could be updated..
>
> Please please please don't add a rtentry pointer to the mbuf.  Besides
> that the routing table is a very poor place to do this.  We don't have
> host routes anymore and the locking and refcounting is rather 
> expensive.

yes but we do have a route cache
(and we probably should still have some form of host routes but that's a
different issue not to be argued here.)

>
> max_linkhdr should be sufficient (with small fixes to some protocol mbuf
> allocators) even for excessive cases of encapsulation:

max_linkhdr is way too big for 99% of all packets.
>
>  TCP over IPv4 over IPSec(AH+ESP) over UDP over IPv6 over PPPoE over
>  Ethernet =
>  60 + 20 + (8+24) + 8 + 40 + 8 + 14 = 182 total, of which 102 are
>  prepends.
>
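
As a quick check of the arithmetic quoted above, here is a minimal
user-space C sketch that adds up the same per-layer figures. The struct
and function names are made up for illustration and are not kernel
code; the byte counts are copied from the quoted example, and treating
everything after TCP and IPv4 as a "prepend" is an assumption about how
the 102 was derived.

```c
#include <stddef.h>

/* One protocol layer and the header bytes it contributes. */
struct layer {
	const char *name;
	int bytes;
	int is_prepend;	/* 1 if added in front of the TCP/IPv4 packet */
};

/* Figures copied from the quoted example; nothing here is measured. */
static const struct layer stack[] = {
	{ "TCP",            60,     0 },
	{ "IPv4",           20,     0 },
	{ "IPSec (AH+ESP)", 8 + 24, 1 },
	{ "UDP",            8,      1 },
	{ "IPv6",           40,     1 },
	{ "PPPoE",          8,      1 },
	{ "Ethernet",       14,     1 },
};

#define NLAYERS (sizeof(stack) / sizeof(stack[0]))

/* Sum of every header in the stack. */
int
total_header_bytes(void)
{
	int sum = 0;

	for (size_t i = 0; i < NLAYERS; i++)
		sum += stack[i].bytes;
	return (sum);
}

/* Sum of only the headers that get prepended by lower layers. */
int
prepend_bytes(void)
{
	int sum = 0;

	for (size_t i = 0; i < NLAYERS; i++)
		if (stack[i].is_prepend)
			sum += stack[i].bytes;
	return (sum);
}
```

With these figures total_header_bytes() gives 182 and prepend_bytes()
gives 102, matching the numbers in the quoted text.
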
> Maybe we need an API for the tunneling and encapsulation protocols to
> add their overhead to max_linkhdr.
>
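
To make the suggested API a little more concrete, here is a rough
user-space sketch of what "add their overhead to max_linkhdr" could look
like. Everything in it is hypothetical: the hdrspace_* names do not
exist in the tree, the counters only stand in for the real max_linkhdr
global, and there is no locking. It assumes overheads stack additively,
as in the example sum above.

```c
#include <assert.h>

/* Stand-ins for kernel state; hypothetical, not the real globals. */
static int base_linkhdr;	/* room reserved for the link layer itself */
static int encap_extra;		/* sum of registered encapsulation overheads */

/* Reset the bookkeeping with the link-layer header size. */
void
hdrspace_init(int link_bytes)
{
	assert(link_bytes >= 0);
	base_linkhdr = link_bytes;
	encap_extra = 0;
}

/* Leading space every new packet would get under this scheme. */
int
hdrspace_current(void)
{
	return (base_linkhdr + encap_extra);
}

/* Called by a tunneling/encapsulation protocol when it becomes active. */
int
hdrspace_add(int bytes)
{
	assert(bytes >= 0);
	encap_extra += bytes;
	return (hdrspace_current());
}

/* Called when that protocol is torn down again. */
int
hdrspace_remove(int bytes)
{
	assert(bytes >= 0 && bytes <= encap_extra);
	encap_extra -= bytes;
	return (hdrspace_current());
}
```

For the stack in the example, hdrspace_init(14) followed by
hdrspace_add() calls of 32, 8, 40 and 8 ends up at 102. Note that a
single global like this still hands the same worst-case reservation to
every packet, which is the objection raised earlier in this reply.
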