From owner-cvs-all Mon Jun 25 9: 8:41 2001
Delivered-To: cvs-all@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3A3CB37B405; Mon, 25 Jun 2001 09:08:32 -0700 (PDT)
	(envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.3/8.11.2) id f5PB53004512;
	Mon, 25 Jun 2001 04:05:03 -0700 (PDT)
	(envelope-from dillon)
Date: Mon, 25 Jun 2001 04:05:03 -0700 (PDT)
From: Matt Dillon
Message-Id: <200106251105.f5PB53004512@earth.backplane.com>
To: Bruce Evans
Cc: Peter Wemm, Mikhail Teterin, jlemon@FreeBSD.ORG,
	cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: kernel size w/ optimized bzero() & patch set (was Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinettcp_subr.c))
References:
Sender: owner-cvs-all@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

:On Sun, 24 Jun 2001, Matt Dillon wrote:
:
:[Peter Wemm wrote]
:> :Just think..  This new ``improved'' bzero code can now fill up all 4K of L1
:> :instruction cache on most of my systems, and most of my 8K L1 instruction
:> :cache on >= coppermine cpus.  I'm impressed.  Those microbenchmarks had
:>
:>     Huh?  Peter, you obviously haven't been listening.  I strongly recommend
:>     that you review the last few postings I've made.  The suggested bzero
:>     code certainly does NOT in any way blow up the L1 cache, and I think
:>     I'm pretty clear on that.  I wouldn't be doing it if it did.
:
:It was an intermediate version that blew up the cache.  I have been trying
:slightly different versions, and found that gcc's builtin version doesn't
:make all that much difference in the code size, either up or down.
:With the following version of bzero:
:
:#define bzero(p, n) ({						\
:	if (__builtin_constant_p(n) && (n) <= X)		\
:		__builtin_memset((p), 0, (n));			\
:	else							\
:		(bzero)((p), (n));				\
:})
:
:for X = 0, 4, 8, 12, 16, 32 and "infinity", the kernel sizes were:
:
:   text    data     bss     dec     hex filename
:1962434  151436  349824 2463694  2597ce kernel.4
:1962442  151436  349824 2463702  2597d6 kernel.8
:1962446  151436  349824 2463706  2597da kernel.12
:1962466  151436  349824 2463726  2597ee kernel.0
:1962802  151436  349824 2464062  25993e kernel.16
:1962866  151436  349824 2464126  25997e kernel.20
:1963538  151436  349824 2464798  259c1e kernel.32
:1964098  151436  349824 2465358  259e4e kernel.infinity
:
:Summary: it's hard for the inline version to be smaller; even when it
:only needs to do one store-immediate operation, the kernel is only 32
:bytes smaller than the one using function calls which have to push
:2 args, do the call, and clean up.  This is presumably due to increased
:register pressure for the inlined versions.

    Very interesting!  Yes, I would tend to agree... though with bzero
    the register load should be minimal since all it is doing is storing
    zero through a pointer.  When I wrote DICE for the Amiga I had very
    similar problems implementing structural copies, which required an
    index and two pointers, but I did not have a problem with indirection
    through non-registerized pointers (which required just one address
    register), or array indexes (the 68000 didn't have scaled indexes,
    though the 68020 and later did).

:OTOH, the recent uninlining of the mbuf macros somehow reduced the
:size of my standard kernel by more than 5% (more than 100K).  It also
:reduced the compilation time by more than 10%.  Kernel compilation
:times are still 65% larger than in RELENG_3 for kernels with essentially
:the same options (this is using -current's compiler; they are 85%
:larger using RELENG_3's compiler).

    You know, I'm not surprised.
    The mbuf macros were a really excellent example of things that should
    not be macroized.  Another good example of macros that should never
    have been written are those in sys/nfs/nfsm_subs.h in the NFS
    subsystem (wasn't someone working on cleaning those up?  Alfred?).

:> :better be damn good, because it may end up the only thing that the system
:> :will do well now since all this excessive inlining looks like it is blowing
:> :the L1 cache out the door.
:> :
:> :(I also apply the same complaint to the vm/* inlines).
:>
:>     And you are just as wrong.  The few functions inlined in vm/* are inlined
:>     mainly because (A) they are called with constant arguments, which means
:
:Some seem to have rotted a bit.  E.g., _vm_map_lock_upgrade() (adding
:an mtx_lock() to anything will bloat it in both space and time).
:
:Bruce

    Oh god, what have they done to my VM inlines!  What a holy mess!  I
    disclaim all responsibility... blame whoever committed that mess.  I
    would never do anything like that!  Those macros used to be just
    lockmgr() calls.

						-Matt
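
For readers who want to try the size-threshold macro quoted above outside a
kernel tree, here is a minimal userland sketch.  It assumes GCC or Clang in
GNU mode (statement expressions and __builtin_constant_p are extensions).
The names bzero_sketch, BZERO_INLINE_MAX, small_hdr, clear_small, clear_buf
and all_zero are made up for illustration and do not come from the thread;
the threshold of 16 is just one of the X values that was measured, not a
recommendation.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* declares the out-of-line bzero() */

/* Hypothetical stand-in for "X" in the quoted macro. */
#define BZERO_INLINE_MAX 16

/*
 * Same shape as the quoted macro, under a different name so that it does
 * not shadow the libc bzero().  A compile-time-constant size at or below
 * the threshold goes to __builtin_memset, which the compiler may expand
 * into a few stores; everything else calls the real function.  The
 * parentheses around (bzero) keep any function-like macro of that name
 * from being expanded.
 */
#define bzero_sketch(p, n) ({						\
	if (__builtin_constant_p(n) && (n) <= BZERO_INLINE_MAX)		\
		__builtin_memset((p), 0, (n));				\
	else								\
		(bzero)((p), (n));					\
})

/* Hypothetical small structure; 8 bytes on typical ABIs. */
struct small_hdr {
	int a;
	int b;
};

/* Constant size below the threshold: eligible for the inline path. */
static void
clear_small(struct small_hdr *h)
{
	bzero_sketch(h, sizeof(*h));
}

/* Size is a run-time value here, so this normally becomes a call. */
static void
clear_buf(void *p, size_t n)
{
	bzero_sketch(p, n);
}

/* Returns 1 if the first n bytes at p are all zero, 0 otherwise. */
static int
all_zero(const void *p, size_t n)
{
	const unsigned char *c = p;
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i] != 0)
			return (0);
	return (1);
}
```

Either path leaves the buffer zeroed; only the generated code differs.
Building variants with different BZERO_INLINE_MAX values and comparing them
with size(1) is one way to repeat the kind of measurement in the table,
though a small test program will not show the same effects as a full kernel.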