Date:      Thu, 14 Jun 2007 18:10:49 -0700 (PDT)
From:      youshi10@u.washington.edu
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        hackers@freebsd.org
Subject:   Re: Reason for doing malloc / bzero over calloc (performance)?
Message-ID:  <Pine.LNX.4.43.0706141810490.10404@hymn01.u.washington.edu>
In-Reply-To: <200706150104.l5F14RjG001010@apollo.backplane.com>

On Thu, 14 Jun 2007, Matthew Dillon wrote:

>    I'm going to throw a wrench in the works, because it all gets turned
>    around the moment you find yourself in a SMP environment where several
>    threads are running on different cpus at the same time, using the
>    same shared VM space.
>
>    The moment you have a situation like that where you are futzing with
>    the page tables, i.e. using mmap() for demand-zero and munmap() to
>    free, the operation becomes extremely expensive versus anything
>    else because any update to the page table (specifically any removal
>    of page table entries from the page table) requires a SMP synchronization
>    to occur between all the cpus actively sharing that VM space, and
>    that's on top of the overhead of taking the page fault(s).
>
>    This is true of any memory mapping the kernel has to do in kernel
>    virtual memory (must be synchronized with ALL cpus) and any mapping
>    the kernel does on behalf of userland for user memory (must be
>    synchronized with any cpus actively using that VM space, i.e. threaded
>    user programs).  The synchronization is required to properly invalidate
>    stale mappings on other cpus and it must be done synchronously due
>    to bugs in Intel/AMD related to changing page table entries on one
>    cpu when instructions are executing using that memory on another cpu.
>    There is no way to avoid it without tripping up on the Intel/AMD hardware
>    bugs.
>
>    From this point of view it is much, much better to bzero() memory that
>    is already mapped than it is to map/unmap new memory.  I recently
>    audited DragonFly and found an insane number of IPIs flying about due
>    to PAGE_SIZE'd kernel mallocs using the VM trick via kernel_map &
>    kmem_alloc().  They all went away when I made the kernel malloc use
>    the slab cache for allocations up to and including PAGE_SIZE*2 bytes.
>
>    Fun, eh?
>
> 					-Matt
> 					Matthew Dillon
> 					<dillon@backplane.com>
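
As a rough illustration of the two approaches contrasted above, here is a minimal userland C sketch. It is not code from DragonFly or FreeBSD; the function names zeroed_by_mapping(), release_mapping() and zeroed_in_place() are made up for this example, and it assumes a system that provides anonymous mmap() via MAP_ANON. It only shows which calls touch the page tables and which do not; it does not measure the IPI cost described in the quote.

```c
#define _DEFAULT_SOURCE          /* assumption: needed on glibc to expose MAP_ANON */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Approach A: ask the kernel for fresh demand-zero pages.
 * Each call adds page table entries (filled in on first touch).
 */
static void *zeroed_by_mapping(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Giving the pages back removes page table entries; per the quoted
 * message, this is the step that forces cross-cpu synchronization
 * when other threads share the address space.
 */
static int release_mapping(void *p, size_t len)
{
    return munmap(p, len);
}

/*
 * Approach B: keep the mapping and clear it in place.
 * This is an ordinary memory write with no page table update.
 */
static void zeroed_in_place(void *p, size_t len)
{
    assert(p != NULL);
    memset(p, 0, len);
}
```

Under these assumptions, a threaded program that repeatedly calls zeroed_by_mapping() and release_mapping() pays the page table cost on every round, while one that maps once and calls zeroed_in_place() afterwards does not.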

I have no intention of calling malloc/calloc, then free, and then repeating the same cycle. It's better just to reuse the memory already allocated whenever its size permits.
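
A minimal sketch of that reuse pattern in C, assuming a single-threaded caller. The struct scratch type and the scratch_get() and scratch_free() helpers are hypothetical names invented for this example, not part of any library:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable buffer: remembers its capacity between uses. */
struct scratch {
    void  *mem;
    size_t cap;
};

/*
 * Return a zeroed buffer of at least `need` bytes.
 * The existing allocation is reused when it is large enough;
 * it only grows (via realloc) when the request does not fit.
 */
static void *scratch_get(struct scratch *s, size_t need)
{
    assert(s != NULL);
    if (need == 0)
        return NULL;               /* nothing to hand out */
    if (need > s->cap) {
        void *p = realloc(s->mem, need);
        if (p == NULL)
            return NULL;           /* old buffer is still owned by s */
        s->mem = p;
        s->cap = need;
    }
    memset(s->mem, 0, need);       /* clear only the part being handed out */
    return s->mem;
}

/* Release the buffer once, when it is no longer needed at all. */
static void scratch_free(struct scratch *s)
{
    free(s->mem);
    s->mem = NULL;
    s->cap = 0;
}
```

Initialize the struct with { NULL, 0 }, call scratch_get() once per iteration, and call scratch_free() only at the end. Whether this is actually faster than calloc/free in a given program still depends on the allocator and the platform, so it is worth measuring.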

I wasn't thinking about it that closely (ISA/hardware configuration versus OS implementation), but I had my suspicions, since the AMD64 architecture is very different from the PowerPC architecture in terms of word size, synchronization schemes, instruction count, etc.

Interesting insight though. Thanks :).

-Garrett



