From: Julian Elischer
Date: Mon, 27 Sep 2010 09:09:25 -0700
To: Andre Oppermann
Cc: Jeff Roberson, Luigi Rizzo, FreeBSD Net
Subject: Re: mbuf changes
Message-ID: <4CA0C1B5.2090309@freebsd.org>
In-Reply-To: <4CA098BA.2010106@freebsd.org>

On 9/27/10 6:14 AM, Andre
Oppermann wrote:
> On 27.09.2010 15:18, Luigi Rizzo wrote:
>> On Mon, Sep 27, 2010 at 02:55:45PM +0200, Andre Oppermann wrote:
>> ...
>>>> my idea was to have an extra field in the mbuf to tell how much room
>>>> should be reserved/used for metadata (such as mtags) after
>>>> the payload area so you don't need to change the allocator, and
>>>> possibly can even modify this on an existing mbuf.
>>>> Almost always mbufs have spare room (e.g. incoming pkts have all
>>>> data in the cluster and mostly empty mdata; outgoing, except
>>>> for rare cases, tend to be in a similar situation).
>>>> So this approach would allow taking an already allocated
>>>> mbuf and putting the mtag in the spare area after the data.
>>>
>>> For incoming data this approach could work, as usually 2K mbuf clusters
>>> are used and they have trailing space available, or rather the normal
>>> mbuf referencing the cluster has its own data section unused.
>>>
>>> If trailing space is to be used, M_TRAILINGSPACE() needs modifications
>>> and a full tree audit is required to make sure that all mbuf consumers
>>> are correctly using it and not some version of their own that directly
>>> assumes certain mbuf sizes, etc.  A lot of work.
>>>
>>> For locally generated mbufs and socket buffers we try to use the mbufs
>>> to their maximal extent.  When the socket buffer data is packetized it
>>> is normally referenced, and then we get the normal mbuf with its data
>>> portion unused.  So that could work.
>>>
>>> A complication is the m_tag_free() field and function, which puts the
>>> memory deallocation into the hands of the mtag user.  That means all
>>> mtag consumers have to be made aware of provided storage w/o returning
>>> the memory directly to the memory allocator (malloc/UMA).
>>>
>>> So the only way I realistically see is to make use of the mbuf's unused
>>> data portion when it has external storage attached.
>>> This should probably cover about 98% of all cases.  The rest has to
>>> malloc() the mtag storage as usual.
>>
>> so it wouldn't be bad -- i cannot judge the numbers, but definitely
>> it would work for all incoming traffic, plus all tcp data packets
>> (as the payload is in the cluster), plus all pure acks (which are
>> small), plus all UDP above some 200 bytes...
>
> Yes, about that.
>
>>> I could whip up a prototype for review in the next weeks.
>>
>> I seem to remember that jeffr already had something done in Perforce.
>
> That's a more general overhaul of the way mbufs are structured and
> allocated with UMA.  I'm not sure it provides for the mtag issue.  Will
> check though.

I'd like to see if we can go over his stuff and any other suggested
changes before 9.0 and see if we can agree on a change for 9.0.

Jeff, we discussed this a year ago.. do you still have your suggested
changes?
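
For readers following the thread, here is a rough userland sketch of the
scheme discussed above: put a tag in the mbuf's own unused data area when
the payload lives in external storage, fall back to malloc() otherwise,
and let a per-tag free hook decide whether memory goes back to the
allocator.  Everything in it is hypothetical (struct toy_mbuf, struct
toy_tag, toy_tag_alloc(), toy_tags_free(), the slot sizes); it is not the
FreeBSD struct mbuf or struct m_tag layout, and for simplicity the unused
area is carved into fixed-size slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_TAG_PAYLOAD 40   /* hypothetical max payload of an inline tag */
#define TOY_TAG_SLOTS   4    /* hypothetical number of inline tag slots */

struct toy_tag {
    struct toy_tag *next;
    void (*free_fn)(struct toy_tag *);  /* per-tag free hook */
    size_t len;                         /* payload length in bytes */
};

/* One inline slot: tag header plus room for a small payload. */
struct toy_tag_slot {
    struct toy_tag hdr;
    unsigned char payload[TOY_TAG_PAYLOAD];
};

struct toy_mbuf {
    int has_ext;             /* nonzero: payload is in an external cluster */
    unsigned slots_used;     /* inline slots already handed out */
    struct toy_tag *tags;    /* singly linked tag list */
    /* Stands in for the internal data area that is idle when has_ext is set. */
    struct toy_tag_slot inline_slots[TOY_TAG_SLOTS];
};

/* Free hook for malloc()ed tags: return the memory to the allocator. */
void toy_tag_free_malloc(struct toy_tag *t) { free(t); }

/* Free hook for inline tags: the storage belongs to the mbuf, so do nothing. */
void toy_tag_free_inline(struct toy_tag *t) { (void)t; }

/* Allocate a tag, preferring the mbuf's idle internal area. */
struct toy_tag *toy_tag_alloc(struct toy_mbuf *m, size_t len)
{
    struct toy_tag *t;

    if (m->has_ext && len <= TOY_TAG_PAYLOAD &&
        m->slots_used < TOY_TAG_SLOTS) {
        t = &m->inline_slots[m->slots_used++].hdr;
        t->free_fn = toy_tag_free_inline;
    } else {
        if (len > SIZE_MAX - sizeof(*t))
            return NULL;                 /* size computation would overflow */
        t = malloc(sizeof(*t) + len);
        if (t == NULL)
            return NULL;
        t->free_fn = toy_tag_free_malloc;
    }
    t->len = len;
    t->next = m->tags;
    m->tags = t;
    return t;
}

/* Release every tag through its own hook, then reclaim the inline slots. */
void toy_tags_free(struct toy_mbuf *m)
{
    struct toy_tag *t = m->tags;

    while (t != NULL) {
        struct toy_tag *next = t->next;
        assert(t->free_fn != NULL);
        t->free_fn(t);
        t = next;
    }
    m->tags = NULL;
    m->slots_used = 0;
}
```

With a zero-initialized toy_mbuf whose has_ext is set, a small
toy_tag_alloc() request lands in an inline slot and anything larger (or
any request on an mbuf without external storage) goes through malloc().
The free hook is what keeps callers from handing inline storage back to
the allocator, which is the complication raised above about m_tag_free().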