Date:      Mon, 25 Jun 2001 04:05:03 -0700 (PDT)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Peter Wemm <peter@wemm.org>, Mikhail Teterin <mi@aldan.algebra.com>, jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject:   Re: kernel size w/ optimized bzero() & patch set (was Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinettcp_subr.c)) 
Message-ID:  <200106251105.f5PB53004512@earth.backplane.com>
References:   <Pine.BSF.4.21.0106252337370.7918-100000@besplex.bde.org>

:On Sun, 24 Jun 2001, Matt Dillon wrote:
:
:[Peter Wemm wrote]
:> :Just think.. This new ``improved'' bzero code can now fill up all 4K of L1
:> :instruction cache on most of my systems, and most of my 8K L1 instruction
:> :cache on >= coppermine cpus.  I'm impressed.  Those microbenchmarks had
:> 
:>     Huh?  Peter, you obviously haven't been listening.  I strongly recommend
:>     that you review the last few postings I've made.  The suggested bzero
:>     code certainly does NOT in any way blow up the L1 cache, and I think
:>     I'm pretty clear on that.  I wouldn't be doing it if it did.
:
:It was an intermediate version that blew up the cache.  I have been trying
:slightly different versions, and found that gcc's builtin version doesn't
:make all that much difference in the code size, either up or down.  With
:the following version of bzero:
:
:#define	bzero(p, n) ({						\
:	if (__builtin_constant_p(n) && (n) <= X)		\
:		__builtin_memset((p), 0, (n));			\
:	else							\
:		(bzero)((p), (n));				\
:})
:
:for X = 0, 4, 8, 12, 16, 20, 32 and "infinity", the kernel sizes were:
:
:   text	   data	    bss	    dec	    hex	filename
:1962434	 151436	 349824	2463694	 2597ce	kernel.4
:1962442	 151436	 349824	2463702	 2597d6	kernel.8
:1962446	 151436	 349824	2463706	 2597da	kernel.12
:1962466	 151436	 349824	2463726	 2597ee	kernel.0
:1962802	 151436	 349824	2464062	 25993e	kernel.16
:1962866	 151436	 349824	2464126	 25997e	kernel.20
:1963538	 151436	 349824	2464798	 259c1e	kernel.32
:1964098	 151436	 349824	2465358	 259e4e	kernel.infinity
:
:Summary: it's hard for the inline version to be smaller; even when it
:only needs to do one store-immediate operation, the kernel is only 32
:bytes smaller than the one using function calls which have to push
:2 args, do the call, and clean up.  This is presumably due to increased
:register pressure for the inlined versions.
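
As a quick check of the arithmetic behind the summary above, the text sizes from the table can be compared against the kernel.0 build, where every bzero() stays a function call. This is only a sketch: the struct, array, and function names are made up for illustration, and the numbers are copied verbatim from the table.

```c
#include <stddef.h>

/* One row of the table: build name and text-segment size in bytes. */
struct kernel_size {
	const char *name;
	long text;
};

/* Values copied from the table; index 0 is the baseline (X = 0). */
static const struct kernel_size sizes[] = {
	{ "kernel.0",        1962466 },
	{ "kernel.4",        1962434 },
	{ "kernel.8",        1962442 },
	{ "kernel.12",       1962446 },
	{ "kernel.16",       1962802 },
	{ "kernel.20",       1962866 },
	{ "kernel.32",       1963538 },
	{ "kernel.infinity", 1964098 },
};

/* Text-size difference of build i relative to kernel.0, in bytes. */
static long
text_delta(size_t i)
{
	return sizes[i].text - sizes[0].text;
}
```

With these numbers, text_delta(1) is -32 for kernel.4, which matches the "only 32 bytes smaller" figure, while kernel.infinity comes out 1632 bytes larger than the baseline.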

    Very interesting!  Yes, I would tend to agree... though with bzero
    the register load should be minimal since all it is doing is storing
    zero through a pointer.  When I wrote DICE for the Amiga I had very
    similar problems implementing structural copies, which required an
    index and two pointers, but I did not have a problem with indirection
    through non-registerized pointers (which required just one address
    register), or array indexes (the 68000 didn't have scaled indexes,
    though the 68020 and later did).
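
For readers who want to experiment with the threshold idea outside the kernel, here is a minimal userland sketch of the same dispatch. The names ZERO, ZERO_INLINE_MAX, zero_call and zero_calls are hypothetical and not from the kernel source; the call counter exists only so the chosen path can be observed. It relies on the GCC builtins used in the quoted macro, which clang also accepts.

```c
#include <stddef.h>

/* Plays the role of X in the quoted macro; the value 8 is arbitrary. */
#define ZERO_INLINE_MAX	8

/* Counts trips through the out-of-line path (for observation only). */
static unsigned long zero_calls;

/* Out-of-line fallback, standing in for the real bzero() function. */
static void
zero_call(void *p, size_t n)
{
	unsigned char *c = p;

	zero_calls++;
	while (n-- > 0)
		*c++ = 0;
}

/*
 * Small compile-time-constant sizes go to the compiler's builtin,
 * which may expand to a few store instructions; everything else
 * takes the ordinary function call.
 */
#define ZERO(p, n) do {						\
	if (__builtin_constant_p(n) && (n) <= ZERO_INLINE_MAX)	\
		__builtin_memset((p), 0, (n));			\
	else							\
		zero_call((p), (n));				\
} while (0)
```

A call such as ZERO(buf, 4) should take the builtin path, while a size held in a volatile variable, or a constant above ZERO_INLINE_MAX, falls through to zero_call(). Whether the inlined path actually saves space depends on the surrounding code, as the size table suggests.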

:OTOH, the recent uninlining of the mbuf macros somehow reduced the
:size of my standard kernel by more than 5% (more than 100K).  It also
:reduced the compilation time by more than 10%.  Kernel compilation
:times are still 65% larger than in RELENG_3 for kernels with essentially
:the same options (this is using -current's compiler; they are 85%
:larger using RELENG_3's compiler).

    You know, I'm not surprised.  The mbuf macros were a really excellent
    example of things that should not be macroized.  Another good example
    of macros that should never have been written is the set in
    sys/nfs/nfsm_subs.h in the NFS subsystem (wasn't someone working on
    cleaning those up?  Alfred?).

:> :better be damn good, because it may end up the only thing that the system
:> :will do well now since all this excessive inlining looks like it is blowing
:> :the L1 cache out the door.
:> :
:> :(I also apply the same complaint to the vm/* inlines).
:> 
:>     And you are just as wrong.  The few functions inlined in vm/* are inlined
:>     mainly because (A) they are called with constant arguments, which means
:
:Some seem to have rotted a bit.  E.g., _vm_map_lock_upgrade() (adding
:an mtx_lock() to anything will bloat it in both space and time).
:
:Bruce

    Oh god, what have they done to my VM inlines!  What a holy mess!
    I disclaim all responsibility... blame whoever committed that mess.
    I would never do anything like that!  Those macros used to be just
    lockmgr() calls.

						-Matt