From: Julian Elischer
Date: Tue, 01 Jan 2008 15:58:25 -0800
To: Bruce Evans
Cc: FreeBSD Net
Subject: Re: m_freem()
Message-ID: <477AD3A1.4060401@elischer.org>
References: <4779697A.4050806@elischer.org> <20080101105918.I9594@delplex.bde.org>
In-Reply-To: <20080101105918.I9594@delplex.bde.org>

Bruce Evans wrote:
> On Mon, 31 Dec 2007, Julian Elischer wrote:
>
>> m_freem() would be a perfect candidate for an inline function.
>> (or even macro).
>> in the case where m is null, no function call would even be made...
>> The whole function is only 2 lines, and it gets called once for every
>> packet :-)
>
> On the contrary, m_freem() is a large function that is fairly unsuitable
> for inlining.
> I just happened to count that it usually takes 180
> instructions in -current with no INVARIANTS etc. (down from 245
> instructions in ~5.2). Further counting gave 112 and 132 instructions
> for it (180 was for ttcp udp input packets and 112 and 132 are for
> ping packets in 2 directions).
>
> m_freem() is only one statement, but that statement consists mainly
> of a function call to a function that is inline (m_free()). m_free()
> sometimes calls m_free_ext(), which is not inline, and usually calls
> uma_zfree(), which is inline, but which is just a wrapper for
> uma_zfree_arg(), which is not inline. uma_zfree_arg() is very large
> and thus very unsuitable for inlining. I didn't check for [nested]
> inlining of its internals at the source level. At runtime it usually
> calls the non-inline function m_dtor_mbuf() which calls the non-inline
> function m_tag_delete_chain(); then it calls critical_enter() and
> critical_exit(). critical_exit() is fairly large and sometimes calls
> thread_lock(), mi_switch() and thread_unlock(), but usually doesn't.
> So the non-inline part of the call chain is usually:
>
>     m_freem()
>       uma_zfree_arg()   # the following is just 1 short path through this
>         m_dtor_mbuf()
>           m_tag_delete_chain()
>         critical_enter()
>         critical_exit()
>
> [Pause to recover from a double fault panic in critical*(). critical*()
> or kdb is non-reentrant somewhere, so tracing through critical_*() or
> one of its callers in order to count instructions tends to cause panics.]
>
> All this is too large to inline. Inlining only the top level of it would
> only make a tiny difference. It might make a positive or negative
> difference, depending on whether the reduced instruction count has a larger
> effect than the increased cache pressure. Generally I think it is bogus
> to inline at the top level. Here inlining at the top level may win in 2
> ways:
> - by avoiding the function call to the next level (and thus all function
>   calls) in the usual case.
>   I think this doesn't happen here. I think it
>   is the usual case for the m_free_ext() call in m_free(), so inlining
>   m_free() is a clear win.
> - by improving branch prediction. With a branch in a non-inline function,
>   it may be mispredicted often because different classes of callers
>   make it go in different ways. With the branch distributed among callers
>   by inlining, it can be predicted perfectly in individual callers
>   that don't change its direction often and/or change its direction
>   in predictable ways. On Athlon CPUs, mispredicting a single branch
>   costs the same as several function calls provided the implicit branches
>   for all the function calls are not mispredicted. Too much inlining
>   is still bad. Apart from busting icaches, it can bust branch
>   prediction caches -- with enough distribution of branches, all
>   branches will be mispredicted.
>
> The m_freem() wrapper currently limits the icache bloat from the
> m_free() inline. In RELENG_4, both m_free() and m_freem() are non-inline
> and non-macro. That may be why networking in RELENG_4 is so much more
> efficient than in -current ;-). (Actually it makes little difference.)

Interesting. I hadn't realised that m_free() had become an inline.
It does make things more interesting.

>
> Bruce
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
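
For illustration only, the suggestion at the top of the thread (do the
NULL check in the caller and make the function call only when there is
something to free) can be sketched in userland C. Nothing below is
FreeBSD code: struct buf, buf_freem(), buf_free_chain_slow() and
buf_chain_alloc() are hypothetical names, and the out-of-line function
merely stands in for the large non-inline free path described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: only the chain link is modelled. */
struct buf {
	struct buf *next;
};

static int slow_calls;	/* times the out-of-line path was entered */
static int bufs_freed;	/* buffers actually released */

/*
 * Out-of-line part. In the kernel this would be the large allocator
 * path; here it just walks the chain and frees each element.
 */
static void
buf_free_chain_slow(struct buf *b)
{
	slow_calls++;
	while (b != NULL) {
		struct buf *n = b->next;

		free(b);
		bufs_freed++;
		b = n;
	}
}

/*
 * Inline wrapper: only the NULL test is duplicated into each caller,
 * so a NULL argument costs a compare and branch, not a function call.
 */
static inline void
buf_freem(struct buf *b)
{
	if (b != NULL)
		buf_free_chain_slow(b);
}

/* Helper for the example: build a chain of n buffers. */
static struct buf *
buf_chain_alloc(int n)
{
	struct buf *head = NULL;

	assert(n >= 0);
	while (n-- > 0) {
		struct buf *b = malloc(sizeof(*b));

		if (b == NULL)
			break;
		b->next = head;
		head = b;
	}
	return (head);
}
```

Calling buf_freem(NULL) never enters buf_free_chain_slow(), while
buf_freem(buf_chain_alloc(3)) enters it once and frees three buffers.
Only the test is inlined, so each call site grows by a compare and a
branch. Whether that pays off depends on how often callers really pass
NULL and on the cache and branch prediction effects discussed above.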