Date: Thu, 14 Jun 2007 18:10:49 -0700 (PDT)
From: youshi10@u.washington.edu
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: hackers@freebsd.org
Subject: Re: Reason for doing malloc / bzero over calloc (performance)?
Message-ID: <Pine.LNX.4.43.0706141810490.10404@hymn01.u.washington.edu>
In-Reply-To: <200706150104.l5F14RjG001010@apollo.backplane.com>

On Thu, 14 Jun 2007, Matthew Dillon wrote:

> I'm going to throw a wrench in the works, because it all gets turned
> around the moment you find yourself in an SMP environment where several
> threads are running on different cpus at the same time, using the
> same shared VM space.
>
> The moment you have a situation like that where you are futzing with
> the page tables, i.e. using mmap() for demand-zero and munmap() to
> free, the operation becomes extremely expensive versus anything
> else because any update to the page table (specifically any removal
> of page table entries from the page table) requires an SMP synchronization
> to occur between all the cpus actively sharing that VM space, and
> that's on top of the overhead of taking the page fault(s).
>
> This is true of any memory mapping the kernel has to do in kernel
> virtual memory (must be synchronized with ALL cpus) and any mapping
> the kernel does on behalf of userland for user memory (must be
> synchronized with any cpus actively using that VM space, i.e. threaded
> user programs). The synchronization is required to properly invalidate
> stale mappings on other cpus and it must be done synchronously due
> to bugs in Intel/AMD related to changing page table entries on one
> cpu when instructions are executing using that memory on another cpu.
> There is no way to avoid it without tripping up on the Intel/AMD hardware
> bugs.
>
> From this point of view it is much, much better to bzero() memory that
> is already mapped than it is to map/unmap new memory. I recently
> audited DragonFly and found an insane number of IPIs flying about due
> to PAGE_SIZE'd kernel mallocs using the VM trick via kernel_map &
> kmem_alloc(). They all went away when I made the kernel malloc use
> the slab cache for allocations up to and including PAGE_SIZE*2 bytes.
>
> Fun, eh?
>
> -Matt
> Matthew Dillon
> <dillon@backplane.com>

I have no intention of calling malloc/calloc, then free, and then repeating the same procedure.
It's better simply to reuse the memory already allocated, size permitting. I wasn't thinking about it that closely, though (ISA/hardware configuration versus OS implementation), but I had my suspicions, since the AMD64 architecture is very different from the PowerPC architecture in terms of word size, synchronization schemes, instruction count, etc.

Interesting insight though. Thanks :).

-Garrett
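
For readers who want to try the two approaches discussed above, here is a minimal userland sketch in C. It is not code from either poster: the function names fill_with_mmap() and fill_with_memset() are hypothetical, and the sketch only contrasts the shape of the two strategies (map and unmap fresh demand-zero pages every round, versus re-zeroing a buffer that stays mapped). It does not measure anything; any timing, and any run with several threads on different cpus, is left to the reader.

```c
#define _DEFAULT_SOURCE         /* lets glibc expose MAP_ANON under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Strategy A: ask the kernel for fresh demand-zero pages every round and
 * give them back with munmap(). Each round adds and removes page table
 * entries, which is the operation the quoted message describes as costly
 * when several cpus share the address space.
 * Returns 0 on success, -1 on failure.
 */
static int fill_with_mmap(size_t len, int rounds)
{
    assert(len > 0);
    for (int i = 0; i < rounds; i++) {
        unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANON, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        /* Anonymous mappings start out zero-filled. */
        if (p[0] != 0 || p[len - 1] != 0) {
            munmap(p, len);
            return -1;
        }
        /* Dirty the first and last byte so the pages are really faulted in. */
        p[0] = 1;
        p[len - 1] = 1;
        if (munmap(p, len) != 0)
            return -1;
    }
    return 0;
}

/*
 * Strategy B: keep one caller-supplied buffer mapped for the whole run and
 * clear it with memset() (the portable equivalent of bzero()) each round.
 * No page table entries are removed inside the loop.
 * Returns 0 on success, -1 on failure.
 */
static int fill_with_memset(unsigned char *buf, size_t len, int rounds)
{
    assert(buf != NULL && len > 0);
    for (int i = 0; i < rounds; i++) {
        memset(buf, 0, len);
        if (buf[0] != 0 || buf[len - 1] != 0)
            return -1;
        /* Dirty the buffer again so the next memset() has work to do. */
        buf[0] = 1;
        buf[len - 1] = 1;
    }
    return 0;
}
```

To experiment, call both functions from a small main() with the same length and round count, for example fill_with_mmap(8192, 100000) and fill_with_memset(buf, 8192, 100000) on an 8192-byte array, and wrap each call in a timer of your choice. The effect described in the quoted message concerns threaded programs, so a single-threaded run is only a starting point.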