Date:      Tue, 18 Aug 2015 00:21:26 +0000 (UTC)
From:      Jason Evans <jasone@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r286866 - in head: contrib/jemalloc contrib/jemalloc/doc contrib/jemalloc/include/jemalloc contrib/jemalloc/include/jemalloc/internal contrib/jemalloc/src include lib/libc/gen lib/libc/...
Message-ID:  <201508180021.t7I0LQwE016289@repo.freebsd.org>

Author: jasone
Date: Tue Aug 18 00:21:25 2015
New Revision: 286866
URL: https://svnweb.freebsd.org/changeset/base/286866

Log:
  Update jemalloc to version 4.0.0.

Added:
  head/contrib/jemalloc/include/jemalloc/internal/jemalloc_internal_decls.h   (contents, props changed)
  head/contrib/jemalloc/include/jemalloc/internal/pages.h   (contents, props changed)
  head/contrib/jemalloc/include/jemalloc/internal/valgrind.h   (contents, props changed)
  head/contrib/jemalloc/include/jemalloc/jemalloc_typedefs.h   (contents, props changed)
  head/contrib/jemalloc/src/pages.c   (contents, props changed)
Modified:
  head/contrib/jemalloc/COPYING
  head/contrib/jemalloc/ChangeLog
  head/contrib/jemalloc/FREEBSD-Xlist
  head/contrib/jemalloc/FREEBSD-diffs
  head/contrib/jemalloc/FREEBSD-upgrade
  head/contrib/jemalloc/VERSION
  head/contrib/jemalloc/doc/jemalloc.3
  head/contrib/jemalloc/include/jemalloc/internal/arena.h
  head/contrib/jemalloc/include/jemalloc/internal/atomic.h
  head/contrib/jemalloc/include/jemalloc/internal/base.h
  head/contrib/jemalloc/include/jemalloc/internal/bitmap.h
  head/contrib/jemalloc/include/jemalloc/internal/chunk.h
  head/contrib/jemalloc/include/jemalloc/internal/chunk_dss.h
  head/contrib/jemalloc/include/jemalloc/internal/chunk_mmap.h
  head/contrib/jemalloc/include/jemalloc/internal/ckh.h
  head/contrib/jemalloc/include/jemalloc/internal/ctl.h
  head/contrib/jemalloc/include/jemalloc/internal/extent.h
  head/contrib/jemalloc/include/jemalloc/internal/hash.h
  head/contrib/jemalloc/include/jemalloc/internal/huge.h
  head/contrib/jemalloc/include/jemalloc/internal/jemalloc_internal.h
  head/contrib/jemalloc/include/jemalloc/internal/jemalloc_internal_defs.h
  head/contrib/jemalloc/include/jemalloc/internal/jemalloc_internal_macros.h
  head/contrib/jemalloc/include/jemalloc/internal/mutex.h
  head/contrib/jemalloc/include/jemalloc/internal/private_namespace.h
  head/contrib/jemalloc/include/jemalloc/internal/prng.h
  head/contrib/jemalloc/include/jemalloc/internal/prof.h
  head/contrib/jemalloc/include/jemalloc/internal/public_namespace.h
  head/contrib/jemalloc/include/jemalloc/internal/ql.h
  head/contrib/jemalloc/include/jemalloc/internal/qr.h
  head/contrib/jemalloc/include/jemalloc/internal/quarantine.h
  head/contrib/jemalloc/include/jemalloc/internal/rb.h
  head/contrib/jemalloc/include/jemalloc/internal/rtree.h
  head/contrib/jemalloc/include/jemalloc/internal/size_classes.h
  head/contrib/jemalloc/include/jemalloc/internal/stats.h
  head/contrib/jemalloc/include/jemalloc/internal/tcache.h
  head/contrib/jemalloc/include/jemalloc/internal/tsd.h
  head/contrib/jemalloc/include/jemalloc/internal/util.h
  head/contrib/jemalloc/include/jemalloc/jemalloc.h
  head/contrib/jemalloc/include/jemalloc/jemalloc_FreeBSD.h
  head/contrib/jemalloc/src/arena.c
  head/contrib/jemalloc/src/base.c
  head/contrib/jemalloc/src/bitmap.c
  head/contrib/jemalloc/src/chunk.c
  head/contrib/jemalloc/src/chunk_dss.c
  head/contrib/jemalloc/src/chunk_mmap.c
  head/contrib/jemalloc/src/ckh.c
  head/contrib/jemalloc/src/ctl.c
  head/contrib/jemalloc/src/extent.c
  head/contrib/jemalloc/src/huge.c
  head/contrib/jemalloc/src/jemalloc.c
  head/contrib/jemalloc/src/mutex.c
  head/contrib/jemalloc/src/prof.c
  head/contrib/jemalloc/src/quarantine.c
  head/contrib/jemalloc/src/rtree.c
  head/contrib/jemalloc/src/stats.c
  head/contrib/jemalloc/src/tcache.c
  head/contrib/jemalloc/src/tsd.c
  head/contrib/jemalloc/src/util.c
  head/include/malloc_np.h
  head/lib/libc/gen/tls.c
  head/lib/libc/stdlib/jemalloc/Makefile.inc

Modified: head/contrib/jemalloc/COPYING
==============================================================================
--- head/contrib/jemalloc/COPYING	Mon Aug 17 23:44:38 2015	(r286865)
+++ head/contrib/jemalloc/COPYING	Tue Aug 18 00:21:25 2015	(r286866)
@@ -1,10 +1,10 @@
 Unless otherwise specified, files in the jemalloc source distribution are
 subject to the following license:
 --------------------------------------------------------------------------------
-Copyright (C) 2002-2014 Jason Evans <jasone@canonware.com>.
+Copyright (C) 2002-2015 Jason Evans <jasone@canonware.com>.
 All rights reserved.
 Copyright (C) 2007-2012 Mozilla Foundation.  All rights reserved.
-Copyright (C) 2009-2014 Facebook, Inc.  All rights reserved.
+Copyright (C) 2009-2015 Facebook, Inc.  All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:

Modified: head/contrib/jemalloc/ChangeLog
==============================================================================
--- head/contrib/jemalloc/ChangeLog	Mon Aug 17 23:44:38 2015	(r286865)
+++ head/contrib/jemalloc/ChangeLog	Tue Aug 18 00:21:25 2015	(r286866)
@@ -1,10 +1,166 @@
 Following are change highlights associated with official releases.  Important
-bug fixes are all mentioned, but internal enhancements are omitted here for
-brevity (even though they are more fun to write about).  Much more detail can be
-found in the git revision history:
+bug fixes are all mentioned, but some internal enhancements are omitted here for
+brevity.  Much more detail can be found in the git revision history:
 
     https://github.com/jemalloc/jemalloc
 
+* 4.0.0 (August 17, 2015)
+
+  This version contains many speed and space optimizations, both minor and
+  major.  The major themes are generalization, unification, and simplification.
+  Although many of these optimizations cause no visible behavior change, their
+  cumulative effect is substantial.
+
+  New features:
+  - Normalize size class spacing to be consistent across the complete size
+    range.  By default there are four size classes per size doubling, but this
+    is now configurable via the --with-lg-size-class-group option.  Also add the
+    --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
+    --with-lg-tiny-min options, which can be used to tweak page and size class
+    settings.  Impacts:
+    + Worst case performance for incrementally growing/shrinking reallocation
+      is improved because there are far fewer size classes, and therefore
+      copying happens less often.
+    + Internal fragmentation is limited to 20% for all but the smallest size
+      classes (those less than four times the quantum).  (1B + 4 KiB)
+      and (1B + 4 MiB) previously suffered nearly 50% internal fragmentation.
+    + Chunk fragmentation tends to be lower because there are fewer distinct run
+      sizes to pack.
+  - Add support for explicit tcaches.  The "tcache.create", "tcache.flush", and
+    "tcache.destroy" mallctls control tcache lifetime and flushing, and the
+    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
+    control which tcache is used for each operation.
+  - Implement per thread heap profiling, as well as the ability to
+    enable/disable heap profiling on a per thread basis.  Add the "prof.reset",
+    "prof.lg_sample", "thread.prof.name", "thread.prof.active",
+    "opt.prof_thread_active_init", "prof.thread_active_init", and
+    "prof.active" mallctls.
+  - Add support for per arena application-specified chunk allocators, configured
+    via the "arena.<i>.chunk_hooks" mallctl.
+  - Refactor huge allocation to be managed by arenas, so that arenas now
+    function as general purpose independent allocators.  This is important in
+    the context of user-specified chunk allocators, aside from the scalability
+    benefits.  Related new statistics:
+    + The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
+      "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
+      mallctls provide high level per arena huge allocation statistics.
+    + The "arenas.nhchunks", "arenas.hchunk.<i>.size",
+      "stats.arenas.<i>.hchunks.<j>.nmalloc",
+      "stats.arenas.<i>.hchunks.<j>.ndalloc",
+      "stats.arenas.<i>.hchunks.<j>.nrequests", and
+      "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
+      statistics.
+  - Add the 'util' column to malloc_stats_print() output, which reports the
+    proportion of available regions that are currently in use for each small
+    size class.
+  - Add "alloc" and "free" modes for junk filling (see the "opt.junk"
+    mallctl), so that it is possible to separately enable junk filling for
+    allocation versus deallocation.
+  - Add the jemalloc-config script, which provides information about how
+    jemalloc was configured, and how to integrate it into application builds.
+  - Add metadata statistics, which are accessible via the "stats.metadata",
+    "stats.arenas.<i>.metadata.mapped", and
+    "stats.arenas.<i>.metadata.allocated" mallctls.
+  - Add the "stats.resident" mallctl, which reports the upper limit of
+    physically resident memory mapped by the allocator.
+  - Add per arena control over unused dirty page purging, via the
+    "arenas.lg_dirty_mult", "arena.<i>.lg_dirty_mult", and
+    "stats.arenas.<i>.lg_dirty_mult" mallctls.
+  - Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
+    feature on/off during program execution.
+  - Add sdallocx(), which implements sized deallocation.  The primary
+    optimization over dallocx() is the removal of a metadata read, which often
+    suffers an L1 cache miss.
+  - Add missing header includes in jemalloc/jemalloc.h, so that applications
+    only have to #include <jemalloc/jemalloc.h>.
+  - Add support for additional platforms:
+    + Bitrig
+    + Cygwin
+    + DragonFlyBSD
+    + iOS
+    + OpenBSD
+    + OpenRISC/or1k
+
+  Optimizations:
+  - Maintain dirty runs in per arena LRUs rather than in per arena trees of
+    dirty-run-containing chunks.  In practice this change significantly reduces
+    dirty page purging volume.
+  - Integrate whole chunks into the unused dirty page purging machinery.  This
+    reduces the cost of repeated huge allocation/deallocation, because it
+    effectively introduces a cache of chunks.
+  - Split the arena chunk map into two separate arrays, in order to increase
+    cache locality for the frequently accessed bits.
+  - Move small run metadata out of runs, into arena chunk headers.  This reduces
+    run fragmentation, smaller runs reduce external fragmentation for small size
+    classes, and packed (less uniformly aligned) metadata layout improves CPU
+    cache set distribution.
+  - Randomly distribute large allocation base pointer alignment relative to page
+    boundaries in order to more uniformly utilize CPU cache sets.  This can be
+    disabled via the --disable-cache-oblivious configure option, and queried via
+    the "config.cache_oblivious" mallctl.
+  - Micro-optimize the fast paths for the public API functions.
+  - Refactor thread-specific data to reside in a single structure.  This assures
+    that only a single TLS read is necessary per call into the public API.
+  - Implement in-place huge allocation growing and shrinking.
+  - Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
+    additional optimizations that reduce maximum lookup depth to one or two
+    levels.  This resolves what was a concurrency bottleneck for per arena huge
+    allocation, because a global data structure is critical for determining
+    which arenas own which huge allocations.
+
+  Incompatible changes:
+  - Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
+    warnings by default.
+  - Assure that the constness of malloc_usable_size()'s return type matches that
+    of the system implementation.
+  - Change the heap profile dump format to support per thread heap profiling,
+    rename pprof to jeprof, and enhance it with the --thread=<n> option.  As a
+    result, the bundled jeprof must now be used rather than the upstream
+    (gperftools) pprof.
+  - Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
+    internally deadlock on some platforms.
+  - Change the "arenas.nlruns" mallctl type from size_t to unsigned.
+  - Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
+    "stats.arenas.<i>.bins.<j>.curregs".
+  - Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
+  - Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
+    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.
+
+  Removed features:
+  - Remove the *allocm() API, which is superseded by the *allocx() API.
+  - Remove the --enable-dss option, and make dss non-optional on all platforms
+    which support sbrk(2).
+  - Remove the "arenas.purge" mallctl, which was obsoleted by the
+    "arena.<i>.purge" mallctl in 3.1.0.
+  - Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
+    detects whether it is running inside Valgrind.
+  - Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
+    "stats.huge.ndalloc" mallctls.
+  - Remove the --enable-mremap option.
+  - Remove the "stats.chunks.current", "stats.chunks.total", and
+    "stats.chunks.high" mallctls.
+
+  Bug fixes:
+  - Fix the cactive statistic to decrease (rather than increase) when active
+    memory decreases.  This regression was first released in 3.5.0.
+  - Fix OOM handling in memalign() and valloc().  A variant of this bug existed
+    in all releases since 2.0.0, which introduced these functions.
+  - Fix an OOM-related regression in arena_tcache_fill_small(), which could
+    cause cache corruption on OOM.  This regression was present in all releases
+    from 2.2.0 through 3.6.0.
+  - Fix size class overflow handling for malloc(), posix_memalign(), memalign(),
+    calloc(), and realloc() when profiling is enabled.
+  - Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
+    "secondary" precedence is specified, but sbrk(2) is not supported.
+  - Fix fallback lg_floor() implementations to handle extremely large inputs.
+  - Ensure the default purgeable zone is after the default zone on OS X.
+  - Fix latent bugs in atomic_*().
+  - Fix the "arena.<i>.dss" mallctl to handle read-only calls.
+  - Fix tls_model configuration to enable the initial-exec model when possible.
+  - Mark malloc_conf as a weak symbol so that the application can override it.
+  - Correctly detect glibc's adaptive pthread mutexes.
+  - Fix the --without-export configure option.
+
 * 3.6.0 (March 31, 2014)
 
   This version contains a critical bug fix for a regression present in 3.5.0 and
@@ -21,7 +177,7 @@ found in the git revision history:
     backtracing to be reliable.
   - Use dss allocation precedence for huge allocations as well as small/large
     allocations.
-  - Fix test assertion failure message formatting.  This bug did not manifect on
+  - Fix test assertion failure message formatting.  This bug did not manifest on
     x86_64 systems because of implementation subtleties in va_list.
   - Fix inconsequential test failures for hash and SFMT code.
 
@@ -516,7 +672,7 @@ found in the git revision history:
   - Make it possible for the application to manually flush a thread's cache, via
     the "tcache.flush" mallctl.
   - Base maximum dirty page count on proportion of active memory.
-  - Compute various addtional run-time statistics, including per size class
+  - Compute various additional run-time statistics, including per size class
     statistics for large objects.
   - Expose malloc_stats_print(), which can be called repeatedly by the
     application.

Modified: head/contrib/jemalloc/FREEBSD-Xlist
==============================================================================
--- head/contrib/jemalloc/FREEBSD-Xlist	Mon Aug 17 23:44:38 2015	(r286865)
+++ head/contrib/jemalloc/FREEBSD-Xlist	Tue Aug 18 00:21:25 2015	(r286866)
@@ -1,6 +1,6 @@
 $FreeBSD$
-.git
-.gitignore
+.autom4te.cfg
+.git*
 FREEBSD-*
 INSTALL
 Makefile*
@@ -40,7 +40,10 @@ include/jemalloc/jemalloc_protos.h
 include/jemalloc/jemalloc_protos.h.in
 include/jemalloc/jemalloc_rename.h
 include/jemalloc/jemalloc_rename.sh
+include/jemalloc/jemalloc_typedefs.h.in
 include/msvc_compat/
 install-sh
+jemalloc.pc*
+src/valgrind.c
 src/zone.c
 test/

Modified: head/contrib/jemalloc/FREEBSD-diffs
==============================================================================
--- head/contrib/jemalloc/FREEBSD-diffs	Mon Aug 17 23:44:38 2015	(r286865)
+++ head/contrib/jemalloc/FREEBSD-diffs	Tue Aug 18 00:21:25 2015	(r286866)
@@ -1,15 +1,14 @@
 diff --git a/doc/jemalloc.xml.in b/doc/jemalloc.xml.in
-index d8e2e71..330ba2a 100644
+index 8fc774b..47b453c 100644
 --- a/doc/jemalloc.xml.in
 +++ b/doc/jemalloc.xml.in
-@@ -57,12 +57,23 @@
+@@ -53,6 +53,17 @@
      <para>This manual describes jemalloc @jemalloc_version@.  More information
      can be found at the <ulink
      url="http://www.canonware.com/jemalloc/">jemalloc website</ulink>.</para>
 +
 +    <para>The following configuration options are enabled in libc's built-in
-+    jemalloc: <option>--enable-dss</option>,
-+    <option>--enable-experimental</option>, <option>--enable-fill</option>,
++    jemalloc: <option>--enable-fill</option>,
 +    <option>--enable-lazy-lock</option>, <option>--enable-munmap</option>,
 +    <option>--enable-stats</option>, <option>--enable-tcache</option>,
 +    <option>--enable-tls</option>, <option>--enable-utrace</option>, and
@@ -17,17 +16,11 @@ index d8e2e71..330ba2a 100644
 +    <option>--enable-debug</option> is enabled in development versions of
 +    FreeBSD (controlled by the <constant>MALLOC_PRODUCTION</constant> make
 +    variable).</para>
++
    </refsect1>
    <refsynopsisdiv>
      <title>SYNOPSIS</title>
-     <funcsynopsis>
-       <funcsynopsisinfo>#include &lt;<filename class="headerfile">stdlib.h</filename>&gt;
--#include &lt;<filename class="headerfile">jemalloc/jemalloc.h</filename>&gt;</funcsynopsisinfo>
-+#include &lt;<filename class="headerfile">malloc_np.h</filename>&gt;</funcsynopsisinfo>
-       <refsect2>
-         <title>Standard API</title>
-         <funcprototype>
-@@ -2342,4 +2353,19 @@ malloc_conf = "lg_chunk:24";]]></programlisting></para>
+@@ -2759,4 +2770,18 @@ malloc_conf = "lg_chunk:24";]]></programlisting></para>
      <para>The <function>posix_memalign<parameter/></function> function conforms
      to IEEE Std 1003.1-2001 (&ldquo;POSIX.1&rdquo;).</para>
    </refsect1>
@@ -38,9 +31,8 @@ index d8e2e71..330ba2a 100644
 +    FreeBSD 7.0.</para>
 +
 +    <para>The <function>aligned_alloc<parameter/></function>,
-+    <function>malloc_stats_print<parameter/></function>,
-+    <function>mallctl*<parameter/></function>, and
-+    <function>*allocm<parameter/></function> functions first appeared in
++    <function>malloc_stats_print<parameter/></function>, and
++    <function>mallctl*<parameter/></function> functions first appeared in
 +    FreeBSD 10.0.</para>
 +
 +    <para>The <function>*allocx<parameter/></function> functions first appeared
@@ -48,20 +40,11 @@ index d8e2e71..330ba2a 100644
 +  </refsect1>
  </refentry>
 diff --git a/include/jemalloc/internal/jemalloc_internal.h.in b/include/jemalloc/internal/jemalloc_internal.h.in
-index 574bbb1..e3eafdf 100644
+index 7a137b6..b0001e9 100644
 --- a/include/jemalloc/internal/jemalloc_internal.h.in
 +++ b/include/jemalloc/internal/jemalloc_internal.h.in
-@@ -1,5 +1,8 @@
- #ifndef JEMALLOC_INTERNAL_H
- #define	JEMALLOC_INTERNAL_H
-+#include "libc_private.h"
-+#include "namespace.h"
-+
- #include <math.h>
- #ifdef _WIN32
- #  include <windows.h>
-@@ -65,6 +68,9 @@ typedef intptr_t ssize_t;
- #include <valgrind/memcheck.h>
+@@ -8,6 +8,9 @@
+ #include <sys/ktrace.h>
  #endif
  
 +#include "un-namespace.h"
@@ -70,7 +53,7 @@ index 574bbb1..e3eafdf 100644
  #define	JEMALLOC_NO_DEMANGLE
  #ifdef JEMALLOC_JET
  #  define JEMALLOC_N(n) jet_##n
-@@ -99,13 +105,7 @@ static const bool config_fill =
+@@ -42,13 +45,7 @@ static const bool config_fill =
      false
  #endif
      ;
@@ -85,11 +68,25 @@ index 574bbb1..e3eafdf 100644
  static const bool config_prof =
  #ifdef JEMALLOC_PROF
      true
+diff --git a/include/jemalloc/internal/jemalloc_internal_decls.h b/include/jemalloc/internal/jemalloc_internal_decls.h
+index a601d6e..e7094b2 100644
+--- a/include/jemalloc/internal/jemalloc_internal_decls.h
++++ b/include/jemalloc/internal/jemalloc_internal_decls.h
+@@ -1,6 +1,9 @@
+ #ifndef JEMALLOC_INTERNAL_DECLS_H
+ #define	JEMALLOC_INTERNAL_DECLS_H
+ 
++#include "libc_private.h"
++#include "namespace.h"
++
+ #include <math.h>
+ #ifdef _WIN32
+ #  include <windows.h>
 diff --git a/include/jemalloc/internal/mutex.h b/include/jemalloc/internal/mutex.h
-index de44e14..564d604 100644
+index f051f29..561378f 100644
 --- a/include/jemalloc/internal/mutex.h
 +++ b/include/jemalloc/internal/mutex.h
-@@ -43,9 +43,6 @@ struct malloc_mutex_s {
+@@ -47,15 +47,13 @@ struct malloc_mutex_s {
  
  #ifdef JEMALLOC_LAZY_LOCK
  extern bool isthreaded;
@@ -99,24 +96,31 @@ index de44e14..564d604 100644
  #endif
  
  bool	malloc_mutex_init(malloc_mutex_t *mutex);
+ void	malloc_mutex_prefork(malloc_mutex_t *mutex);
+ void	malloc_mutex_postfork_parent(malloc_mutex_t *mutex);
+ void	malloc_mutex_postfork_child(malloc_mutex_t *mutex);
++bool	malloc_mutex_first_thread(void);
+ bool	mutex_boot(void);
+ 
+ #endif /* JEMALLOC_H_EXTERNS */
 diff --git a/include/jemalloc/internal/private_symbols.txt b/include/jemalloc/internal/private_symbols.txt
-index 93516d2..22f9af9 100644
+index dbf6aa7..f87dba8 100644
 --- a/include/jemalloc/internal/private_symbols.txt
 +++ b/include/jemalloc/internal/private_symbols.txt
-@@ -226,7 +226,6 @@ iralloc
- iralloct
- iralloct_realign
+@@ -277,7 +277,6 @@ iralloct_realign
  isalloc
+ isdalloct
+ isqalloc
 -isthreaded
  ivsalloc
  ixalloc
  jemalloc_postfork_child
 diff --git a/include/jemalloc/jemalloc_FreeBSD.h b/include/jemalloc/jemalloc_FreeBSD.h
 new file mode 100644
-index 0000000..94554bc
+index 0000000..66d6da5
 --- /dev/null
 +++ b/include/jemalloc/jemalloc_FreeBSD.h
-@@ -0,0 +1,134 @@
+@@ -0,0 +1,137 @@
 +/*
 + * Override settings that were generated in jemalloc_defs.h as necessary.
 + */
@@ -192,6 +196,7 @@ index 0000000..94554bc
 +#undef je_realloc
 +#undef je_free
 +#undef je_posix_memalign
++#undef je_aligned_alloc
 +#undef je_malloc_usable_size
 +#undef je_mallocx
 +#undef je_rallocx
@@ -209,6 +214,7 @@ index 0000000..94554bc
 +#define	je_realloc		__realloc
 +#define	je_free			__free
 +#define	je_posix_memalign	__posix_memalign
++#define	je_aligned_alloc	__aligned_alloc
 +#define	je_malloc_usable_size	__malloc_usable_size
 +#define	je_mallocx		__mallocx
 +#define	je_rallocx		__rallocx
@@ -238,6 +244,7 @@ index 0000000..94554bc
 +__weak_reference(__realloc, realloc);
 +__weak_reference(__free, free);
 +__weak_reference(__posix_memalign, posix_memalign);
++__weak_reference(__aligned_alloc, aligned_alloc);
 +__weak_reference(__malloc_usable_size, malloc_usable_size);
 +__weak_reference(__mallocx, mallocx);
 +__weak_reference(__rallocx, rallocx);
@@ -263,32 +270,142 @@ index f943891..47d032c 100755
 +#include "jemalloc_FreeBSD.h"
  EOF
 diff --git a/src/jemalloc.c b/src/jemalloc.c
-index 204778b..9e5f2df 100644
+index ed7863b..d078a1f 100644
 --- a/src/jemalloc.c
 +++ b/src/jemalloc.c
-@@ -8,6 +8,10 @@ malloc_tsd_data(, arenas, arena_t *, NULL)
- malloc_tsd_data(, thread_allocated, thread_allocated_t,
-     THREAD_ALLOCATED_INITIALIZER)
+@@ -4,6 +4,10 @@
+ /******************************************************************************/
+ /* Data. */
  
 +/* Work around <http://llvm.org/bugs/show_bug.cgi?id=12623>: */
 +const char	*__malloc_options_1_0 = NULL;
 +__sym_compat(_malloc_options, __malloc_options_1_0, FBSD_1.0);
 +
  /* Runtime configuration options. */
- const char	*je_malloc_conf;
+ const char	*je_malloc_conf JEMALLOC_ATTR(weak);
  bool	opt_abort =
-@@ -457,7 +461,8 @@ malloc_conf_init(void)
- #endif
- 			    ;
+@@ -2475,6 +2479,107 @@ je_malloc_usable_size(JEMALLOC_USABLE_SIZE_CONST void *ptr)
+  */
+ /******************************************************************************/
+ /*
++ * Begin compatibility functions.
++ */
++
++#define	ALLOCM_LG_ALIGN(la)	(la)
++#define	ALLOCM_ALIGN(a)		(ffsl(a)-1)
++#define	ALLOCM_ZERO		((int)0x40)
++#define	ALLOCM_NO_MOVE		((int)0x80)
++
++#define	ALLOCM_SUCCESS		0
++#define	ALLOCM_ERR_OOM		1
++#define	ALLOCM_ERR_NOT_MOVED	2
++
++int
++je_allocm(void **ptr, size_t *rsize, size_t size, int flags)
++{
++	void *p;
++
++	assert(ptr != NULL);
++
++	p = je_mallocx(size, flags);
++	if (p == NULL)
++		return (ALLOCM_ERR_OOM);
++	if (rsize != NULL)
++		*rsize = isalloc(p, config_prof);
++	*ptr = p;
++	return (ALLOCM_SUCCESS);
++}
++
++int
++je_rallocm(void **ptr, size_t *rsize, size_t size, size_t extra, int flags)
++{
++	int ret;
++	bool no_move = flags & ALLOCM_NO_MOVE;
++
++	assert(ptr != NULL);
++	assert(*ptr != NULL);
++	assert(size != 0);
++	assert(SIZE_T_MAX - size >= extra);
++
++	if (no_move) {
++		size_t usize = je_xallocx(*ptr, size, extra, flags);
++		ret = (usize >= size) ? ALLOCM_SUCCESS : ALLOCM_ERR_NOT_MOVED;
++		if (rsize != NULL)
++			*rsize = usize;
++	} else {
++		void *p = je_rallocx(*ptr, size+extra, flags);
++		if (p != NULL) {
++			*ptr = p;
++			ret = ALLOCM_SUCCESS;
++		} else
++			ret = ALLOCM_ERR_OOM;
++		if (rsize != NULL)
++			*rsize = isalloc(*ptr, config_prof);
++	}
++	return (ret);
++}
++
++int
++je_sallocm(const void *ptr, size_t *rsize, int flags)
++{
++
++	assert(rsize != NULL);
++	*rsize = je_sallocx(ptr, flags);
++	return (ALLOCM_SUCCESS);
++}
++
++int
++je_dallocm(void *ptr, int flags)
++{
++
++	je_dallocx(ptr, flags);
++	return (ALLOCM_SUCCESS);
++}
++
++int
++je_nallocm(size_t *rsize, size_t size, int flags)
++{
++	size_t usize;
++
++	usize = je_nallocx(size, flags);
++	if (usize == 0)
++		return (ALLOCM_ERR_OOM);
++	if (rsize != NULL)
++		*rsize = usize;
++	return (ALLOCM_SUCCESS);
++}
++
++#undef ALLOCM_LG_ALIGN
++#undef ALLOCM_ALIGN
++#undef ALLOCM_ZERO
++#undef ALLOCM_NO_MOVE
++
++#undef ALLOCM_SUCCESS
++#undef ALLOCM_ERR_OOM
++#undef ALLOCM_ERR_NOT_MOVED
++
++/*
++ * End compatibility functions.
++ */
++/******************************************************************************/
++/*
+  * The following functions are used by threading libraries for protection of
+  * malloc during fork().
+  */
+@@ -2575,4 +2680,11 @@ jemalloc_postfork_child(void)
+ 	ctl_postfork_child();
+ }
  
--			if ((opts = getenv(envname)) != NULL) {
-+			if (issetugid() == 0 && (opts = getenv(envname)) !=
-+			    NULL) {
- 				/*
- 				 * Do nothing; opts is already initialized to
- 				 * the value of the MALLOC_CONF environment
++void
++_malloc_first_thread(void)
++{
++
++	(void)malloc_mutex_first_thread();
++}
++
+ /******************************************************************************/
 diff --git a/src/mutex.c b/src/mutex.c
-index 788eca3..6f5954e 100644
+index 2d47af9..934d5aa 100644
 --- a/src/mutex.c
 +++ b/src/mutex.c
 @@ -66,6 +66,17 @@ pthread_create(pthread_t *__restrict thread,
@@ -296,21 +413,45 @@ index 788eca3..6f5954e 100644
  JEMALLOC_EXPORT int	_pthread_mutex_init_calloc_cb(pthread_mutex_t *mutex,
      void *(calloc_cb)(size_t, size_t));
 +
-+__weak_reference(_pthread_mutex_init_calloc_cb_stub,
-+    _pthread_mutex_init_calloc_cb);
-+
++#pragma weak _pthread_mutex_init_calloc_cb
 +int
-+_pthread_mutex_init_calloc_cb_stub(pthread_mutex_t *mutex,
++_pthread_mutex_init_calloc_cb(pthread_mutex_t *mutex,
 +    void *(calloc_cb)(size_t, size_t))
 +{
 +
-+	return (0);
++	return (((int (*)(pthread_mutex_t *, void *(*)(size_t, size_t)))
++	    __libc_interposing[INTERPOS__pthread_mutex_init_calloc_cb])(mutex,
++	    calloc_cb));
 +}
  #endif
  
  bool
+@@ -137,7 +148,7 @@ malloc_mutex_postfork_child(malloc_mutex_t *mutex)
+ }
+ 
+ bool
+-mutex_boot(void)
++malloc_mutex_first_thread(void)
+ {
+ 
+ #ifdef JEMALLOC_MUTEX_INIT_CB
+@@ -151,3 +162,14 @@ mutex_boot(void)
+ #endif
+ 	return (false);
+ }
++
++bool
++mutex_boot(void)
++{
++
++#ifndef JEMALLOC_MUTEX_INIT_CB
++	return (malloc_mutex_first_thread());
++#else
++	return (false);
++#endif
++}
 diff --git a/src/util.c b/src/util.c
-index 93a19fd..70b3e45 100644
+index 4cb0d6c..25b61c2 100644
 --- a/src/util.c
 +++ b/src/util.c
 @@ -58,6 +58,22 @@ wrtmessage(void *cbopaque, const char *s)

Modified: head/contrib/jemalloc/FREEBSD-upgrade
==============================================================================
--- head/contrib/jemalloc/FREEBSD-upgrade	Mon Aug 17 23:44:38 2015	(r286865)
+++ head/contrib/jemalloc/FREEBSD-upgrade	Tue Aug 18 00:21:25 2015	(r286866)
@@ -80,7 +80,13 @@ do_extract() {
 }
 
 do_diff() {
-  (cd ${work}; git add -A; git diff --cached) > FREEBSD-diffs
+  (
+    cd ${work}
+    find . -name '*.orig' -delete
+    find . -name '*.rej' -delete
+    git add -A
+    git diff --cached
+  ) > FREEBSD-diffs
 }
 
 command=$1

Modified: head/contrib/jemalloc/VERSION
==============================================================================
--- head/contrib/jemalloc/VERSION	Mon Aug 17 23:44:38 2015	(r286865)
+++ head/contrib/jemalloc/VERSION	Tue Aug 18 00:21:25 2015	(r286866)
@@ -1 +1 @@
-3.6.0-0-g46c0af68bd248b04df75e4f92d5fb804c3d75340
+4.0.0-0-g6e98caf8f064482b9ab292ef3638dea67420bbc2

Modified: head/contrib/jemalloc/doc/jemalloc.3
==============================================================================
--- head/contrib/jemalloc/doc/jemalloc.3	Mon Aug 17 23:44:38 2015	(r286865)
+++ head/contrib/jemalloc/doc/jemalloc.3	Tue Aug 18 00:21:25 2015	(r286866)
@@ -2,12 +2,12 @@
 .\"     Title: JEMALLOC
 .\"    Author: Jason Evans
 .\" Generator: DocBook XSL Stylesheets v1.76.1 <http://docbook.sf.net/>;
-.\"      Date: 03/31/2014
+.\"      Date: 08/17/2015
 .\"    Manual: User Manual
-.\"    Source: jemalloc 3.6.0-0-g46c0af68bd248b04df75e4f92d5fb804c3d75340
+.\"    Source: jemalloc 4.0.0-0-g6e98caf8f064482b9ab292ef3638dea67420bbc2
 .\"  Language: English
 .\"
-.TH "JEMALLOC" "3" "03/31/2014" "jemalloc 3.6.0-0-g46c0af68bd24" "User Manual"
+.TH "JEMALLOC" "3" "08/17/2015" "jemalloc 4.0.0-0-g6e98caf8f064" "User Manual"
 .\" -----------------------------------------------------------------
 .\" * Define some portability stuff
 .\" -----------------------------------------------------------------
@@ -31,12 +31,10 @@
 jemalloc \- general purpose memory allocation functions
 .SH "LIBRARY"
 .PP
-This manual describes jemalloc 3\&.6\&.0\-0\-g46c0af68bd248b04df75e4f92d5fb804c3d75340\&. More information can be found at the
+This manual describes jemalloc 4\&.0\&.0\-0\-g6e98caf8f064482b9ab292ef3638dea67420bbc2\&. More information can be found at the
 \m[blue]\fBjemalloc website\fR\m[]\&\s-2\u[1]\d\s+2\&.
 .PP
 The following configuration options are enabled in libc\*(Aqs built\-in jemalloc:
-\fB\-\-enable\-dss\fR,
-\fB\-\-enable\-experimental\fR,
 \fB\-\-enable\-fill\fR,
 \fB\-\-enable\-lazy\-lock\fR,
 \fB\-\-enable\-munmap\fR,
@@ -53,8 +51,7 @@ make variable)\&.
 .sp
 .ft B
 .nf
-#include <stdlib\&.h>
-#include <malloc_np\&.h>
+#include <jemalloc/jemalloc\&.h>
 .fi
 .ft
 .SS "Standard API"
@@ -81,6 +78,8 @@ make variable)\&.
 .BI "size_t sallocx(void\ *" "ptr" ", int\ " "flags" ");"
 .HP \w'void\ dallocx('u
 .BI "void dallocx(void\ *" "ptr" ", int\ " "flags" ");"
+.HP \w'void\ sdallocx('u
+.BI "void sdallocx(void\ *" "ptr" ", size_t\ " "size" ", int\ " "flags" ");"
 .HP \w'size_t\ nallocx('u
 .BI "size_t nallocx(size_t\ " "size" ", int\ " "flags" ");"
 .HP \w'int\ mallctl('u
@@ -97,17 +96,6 @@ make variable)\&.
 .BI "void (*malloc_message)(void\ *" "cbopaque" ", const\ char\ *" "s" ");"
 .PP
 const char *\fImalloc_conf\fR;
-.SS "Experimental API"
-.HP \w'int\ allocm('u
-.BI "int allocm(void\ **" "ptr" ", size_t\ *" "rsize" ", size_t\ " "size" ", int\ " "flags" ");"
-.HP \w'int\ rallocm('u
-.BI "int rallocm(void\ **" "ptr" ", size_t\ *" "rsize" ", size_t\ " "size" ", size_t\ " "extra" ", int\ " "flags" ");"
-.HP \w'int\ sallocm('u
-.BI "int sallocm(const\ void\ *" "ptr" ", size_t\ *" "rsize" ", int\ " "flags" ");"
-.HP \w'int\ dallocm('u
-.BI "int dallocm(void\ *" "ptr" ", int\ " "flags" ");"
-.HP \w'int\ nallocm('u
-.BI "int nallocm(size_t\ *" "rsize" ", size_t\ " "size" ", int\ " "flags" ");"
 .SH "DESCRIPTION"
 .SS "Standard API"
 .PP
@@ -134,7 +122,7 @@ The
 \fBposix_memalign\fR\fB\fR
 function allocates
 \fIsize\fR
-bytes of memory such that the allocation\*(Aqs base address is an even multiple of
+bytes of memory such that the allocation\*(Aqs base address is a multiple of
 \fIalignment\fR, and returns the allocation in the value pointed to by
 \fIptr\fR\&. The requested
 \fIalignment\fR
@@ -145,7 +133,7 @@ The
 \fBaligned_alloc\fR\fB\fR
 function allocates
 \fIsize\fR
-bytes of memory such that the allocation\*(Aqs base address is an even multiple of
+bytes of memory such that the allocation\*(Aqs base address is a multiple of
 \fIalignment\fR\&. The requested
 \fIalignment\fR
 must be a power of 2\&. Behavior is undefined if
@@ -188,7 +176,8 @@ The
 \fBrallocx\fR\fB\fR,
 \fBxallocx\fR\fB\fR,
 \fBsallocx\fR\fB\fR,
-\fBdallocx\fR\fB\fR, and
+\fBdallocx\fR\fB\fR,
+\fBsdallocx\fR\fB\fR, and
 \fBnallocx\fR\fB\fR
 functions all have a
 \fIflags\fR
@@ -217,11 +206,32 @@ is a power of 2\&.
 Initialize newly allocated memory to contain zero bytes\&. In the growing reallocation case, the real size prior to reallocation defines the boundary between untouched bytes and those that are initialized to contain zero bytes\&. If this macro is absent, newly allocated memory is uninitialized\&.
 .RE
 .PP
+\fBMALLOCX_TCACHE(\fR\fB\fItc\fR\fR\fB) \fR
+.RS 4
+Use the thread\-specific cache (tcache) specified by the identifier
+\fItc\fR, which must have been acquired via the
+"tcache\&.create"
+mallctl\&. This macro does not validate that
+\fItc\fR
+specifies a valid identifier\&.
+.RE
+.PP
+\fBMALLOCX_TCACHE_NONE\fR
+.RS 4
+Do not use a thread\-specific cache (tcache)\&. Unless
+\fBMALLOCX_TCACHE(\fR\fB\fItc\fR\fR\fB)\fR
+or
+\fBMALLOCX_TCACHE_NONE\fR
+is specified, an automatically managed tcache will be used under many circumstances\&. This macro cannot be used in the same
+\fIflags\fR
+argument as
+\fBMALLOCX_TCACHE(\fR\fB\fItc\fR\fR\fB)\fR\&.
+.RE
+.PP
 \fBMALLOCX_ARENA(\fR\fB\fIa\fR\fR\fB) \fR
 .RS 4
 Use the arena specified by the index
-\fIa\fR
-(and by necessity bypass the thread cache)\&. This macro has no effect for huge regions, nor for regions that were allocated via an arena other than the one specified\&. This macro does not validate that
+\fIa\fR\&. This macro has no effect for regions that were allocated via an arena other than the one specified\&. This macro does not validate that
 \fIa\fR
 specifies an arena index in the valid range\&.
 .RE
@@ -274,6 +284,17 @@ function causes the memory referenced by
 to be made available for future allocations\&.
 .PP
 The
+\fBsdallocx\fR\fB\fR
+function is an extension of
+\fBdallocx\fR\fB\fR
+with a
+\fIsize\fR
+parameter to allow the caller to pass in the allocation size as an optimization\&. The minimum valid input size is the original requested size of the allocation, and the maximum valid input size is the corresponding value returned by
+\fBnallocx\fR\fB\fR
+or
+\fBsallocx\fR\fB\fR\&.
+.PP
+The
 \fBnallocx\fR\fB\fR
 function allocates no memory, but it performs the same size computation as the
 \fBmallocx\fR\fB\fR
@@ -367,7 +388,7 @@ uses the
 \fBmallctl*\fR\fB\fR
 functions internally, so inconsistent statistics can be reported if multiple threads use these functions simultaneously\&. If
 \fB\-\-enable\-stats\fR
-is specified during configuration, \(lqm\(rq and \(lqa\(rq can be specified to omit merged arena and per arena statistics, respectively; \(lqb\(rq and \(lql\(rq can be specified to omit per size class statistics for bins and large objects, respectively\&. Unrecognized characters are silently ignored\&. Note that thread caching may prevent some statistics from being completely up to date, since extra locking would be required to merge counters that track thread cache operations\&.
+is specified during configuration, \(lqm\(rq and \(lqa\(rq can be specified to omit merged arena and per arena statistics, respectively; \(lqb\(rq, \(lql\(rq, and \(lqh\(rq can be specified to omit per size class statistics for bins, large objects, and huge objects, respectively\&. Unrecognized characters are silently ignored\&. Note that thread caching may prevent some statistics from being completely up to date, since extra locking would be required to merge counters that track thread cache operations\&.
 .PP
 The
 \fBmalloc_usable_size\fR\fB\fR
@@ -378,126 +399,6 @@ function is not a mechanism for in\-plac
 \fBrealloc\fR\fB\fR; rather it is provided solely as a tool for introspection purposes\&. Any discrepancy between the requested allocation size and the size reported by
 \fBmalloc_usable_size\fR\fB\fR
 should not be depended on, since such behavior is entirely implementation\-dependent\&.
-.SS "Experimental API"
-.PP
-The experimental API is subject to change or removal without regard for backward compatibility\&. If
-\fB\-\-disable\-experimental\fR
-is specified during configuration, the experimental API is omitted\&.
-.PP
-The
-\fBallocm\fR\fB\fR,
-\fBrallocm\fR\fB\fR,
-\fBsallocm\fR\fB\fR,
-\fBdallocm\fR\fB\fR, and
-\fBnallocm\fR\fB\fR
-functions all have a
-\fIflags\fR
-argument that can be used to specify options\&. The functions only check the options that are contextually relevant\&. Use bitwise or (|) operations to specify one or more of the following:
-.PP
-\fBALLOCM_LG_ALIGN(\fR\fB\fIla\fR\fR\fB) \fR
-.RS 4
-Align the memory allocation to start at an address that is a multiple of
-(1 << \fIla\fR)\&. This macro does not validate that
-\fIla\fR
-is within the valid range\&.
-.RE
-.PP
-\fBALLOCM_ALIGN(\fR\fB\fIa\fR\fR\fB) \fR
-.RS 4
-Align the memory allocation to start at an address that is a multiple of
-\fIa\fR, where
-\fIa\fR
-is a power of two\&. This macro does not validate that
-\fIa\fR
-is a power of 2\&.
-.RE
-.PP
-\fBALLOCM_ZERO\fR
-.RS 4
-Initialize newly allocated memory to contain zero bytes\&. In the growing reallocation case, the real size prior to reallocation defines the boundary between untouched bytes and those that are initialized to contain zero bytes\&. If this macro is absent, newly allocated memory is uninitialized\&.
-.RE
-.PP
-\fBALLOCM_NO_MOVE\fR
-.RS 4
-For reallocation, fail rather than moving the object\&. This constraint can apply to both growth and shrinkage\&.
-.RE
-.PP
-\fBALLOCM_ARENA(\fR\fB\fIa\fR\fR\fB) \fR
-.RS 4
-Use the arena specified by the index
-\fIa\fR
-(and by necessity bypass the thread cache)\&. This macro has no effect for huge regions, nor for regions that were allocated via an arena other than the one specified\&. This macro does not validate that
-\fIa\fR
-specifies an arena index in the valid range\&.
-.RE
-.PP
-The
-\fBallocm\fR\fB\fR
-function allocates at least
-\fIsize\fR
-bytes of memory, sets
-\fI*ptr\fR
-to the base address of the allocation, and sets
-\fI*rsize\fR
-to the real size of the allocation if
-\fIrsize\fR
-is not
-\fBNULL\fR\&. Behavior is undefined if
-\fIsize\fR
-is
-\fB0\fR, or if request size overflows due to size class and/or alignment constraints\&.
-.PP
-The
-\fBrallocm\fR\fB\fR
-function resizes the allocation at
-\fI*ptr\fR
-to be at least
-\fIsize\fR
-bytes, sets
-\fI*ptr\fR
-to the base address of the allocation if it moved, and sets
-\fI*rsize\fR
-to the real size of the allocation if
-\fIrsize\fR
-is not
-\fBNULL\fR\&. If
-\fIextra\fR
-is non\-zero, an attempt is made to resize the allocation to be at least
-(\fIsize\fR + \fIextra\fR)
-bytes, though inability to allocate the extra byte(s) will not by itself result in failure\&. Behavior is undefined if
-\fIsize\fR
-is
-\fB0\fR, if request size overflows due to size class and/or alignment constraints, or if
-(\fIsize\fR + \fIextra\fR > \fBSIZE_T_MAX\fR)\&.
-.PP
-The
-\fBsallocm\fR\fB\fR
-function sets
-\fI*rsize\fR
-to the real size of the allocation\&.
-.PP
-The
-\fBdallocm\fR\fB\fR
-function causes the memory referenced by
-\fIptr\fR
-to be made available for future allocations\&.
-.PP
-The
-\fBnallocm\fR\fB\fR
-function allocates no memory, but it performs the same size computation as the
-\fBallocm\fR\fB\fR
-function, and if
-\fIrsize\fR
-is not
-\fBNULL\fR
-it sets
-\fI*rsize\fR
-to the real size of the allocation that would result from the equivalent
-\fBallocm\fR\fB\fR
-function call\&. Behavior is undefined if
-\fIsize\fR
-is
-\fB0\fR, or if request size overflows due to size class and/or alignment constraints\&.
 .SH "TUNING"
 .PP
 Once, when the first call is made to one of the memory allocation routines, the allocator initializes its internals based in part on various options that can be specified at compile\- or run\-time\&.
@@ -535,8 +436,8 @@ options\&. Some options have boolean val
 Traditionally, allocators have used
 \fBsbrk\fR(2)
 to obtain memory, which is suboptimal for several reasons, including race conditions, increased fragmentation, and artificial limitations on maximum usable memory\&. If
-\fB\-\-enable\-dss\fR
-is specified during configuration, this allocator uses both
+\fBsbrk\fR(2)
+is supported by the operating system, this allocator uses both
 \fBmmap\fR(2)
 and
 \fBsbrk\fR(2), in that order of preference; otherwise only
@@ -551,18 +452,29 @@ is specified during configuration, this 
 .PP
 Memory is conceptually broken into equal\-sized chunks, where the chunk size is a power of two that is greater than the page size\&. Chunks are always aligned to multiples of the chunk size\&. This alignment makes it possible to find metadata for user objects very quickly\&.
 .PP
-User objects are broken into three categories according to size: small, large, and huge\&. Small objects are smaller than one page\&. Large objects are smaller than the chunk size\&. Huge objects are a multiple of the chunk size\&. Small and large objects are managed by arenas; huge objects are managed separately in a single data structure that is shared by all threads\&. Huge objects are used by applications infrequently enough that this single data structure is not a scalability issue\&.
+User objects are broken into three categories according to size: small, large, and huge\&. Small and large objects are managed entirely by arenas; huge objects are additionally aggregated in a single data structure that is shared by all threads\&. Huge objects are typically used by applications infrequently enough that this single data structure is not a scalability issue\&.
 .PP
 Each chunk that is managed by an arena tracks its contents as runs of contiguous pages (unused, backing a set of small objects, or backing one large object)\&. The combination of chunk alignment and chunk page maps makes it possible to determine all metadata regarding small and large allocations in constant time\&.
 .PP
-Small objects are managed in groups by page runs\&. Each run maintains a frontier and free list to track which regions are in use\&. Allocation requests that are no more than half the quantum (8 or 16, depending on architecture) are rounded up to the nearest power of two that is at least
-sizeof(\fBdouble\fR)\&. All other small object size classes are multiples of the quantum, spaced such that internal fragmentation is limited to approximately 25% for all but the smallest size classes\&. Allocation requests that are larger than the maximum small size class, but small enough to fit in an arena\-managed chunk (see the
+Small objects are managed in groups by page runs\&. Each run maintains a bitmap to track which regions are in use\&. Allocation requests that are no more than half the quantum (8 or 16, depending on architecture) are rounded up to the nearest power of two that is at least
+sizeof(\fBdouble\fR)\&. All other object size classes are multiples of the quantum, spaced such that there are four size classes for each doubling in size, which limits internal fragmentation to approximately 20% for all but the smallest size classes\&. Small size classes are smaller than four times the page size, large size classes are smaller than the chunk size (see the
 "opt\&.lg_chunk"
-option), are rounded up to the nearest run size\&. Allocation requests that are too large to fit in an arena\-managed chunk are rounded up to the nearest multiple of the chunk size\&.
+option), and huge size classes extend from the chunk size up to one size class less than the full address space size\&.
 .PP
 Allocations are packed tightly together, which can be an issue for multi\-threaded applications\&. If you need to assure that allocations do not suffer from cacheline sharing, round your allocation requests up to the nearest multiple of the cacheline size, or specify cacheline alignment when allocating\&.
 .PP
-Assuming 4 MiB chunks, 4 KiB pages, and a 16\-byte quantum on a 64\-bit system, the size classes in each category are as shown in
+The
+\fBrealloc\fR\fB\fR,
+\fBrallocx\fR\fB\fR, and
+\fBxallocx\fR\fB\fR
+functions may resize allocations without moving them under limited circumstances\&. Unlike the
+\fB*allocx\fR\fB\fR
+API, the standard API does not officially round up the usable size of an allocation to the nearest size class, so technically it is necessary to call
+\fBrealloc\fR\fB\fR
+to grow e\&.g\&. a 9\-byte allocation to 16 bytes, or shrink a 16\-byte allocation to 9 bytes\&. Growth and shrinkage trivially succeeds in place as long as the pre\-size and post\-size both round up to the same size class\&. No other API guarantees are made regarding in\-place resizing, but the current implementation also tries to resize large and huge allocations in place, as long as the pre\-size and post\-size are both large or both huge\&. In such cases shrinkage always succeeds for large size classes, but for huge size classes the chunk allocator must support splitting (see
+"arena\&.<i>\&.chunk_hooks")\&. Growth only succeeds if the trailing memory is currently available, and additionally for huge size classes the chunk allocator must support merging\&.
+.PP
+Assuming 2 MiB chunks, 4 KiB pages, and a 16\-byte quantum on a 64\-bit system, the size classes in each category are as shown in
 Table 1\&.
 .sp
 .it 1 an-trap
@@ -588,8 +500,23 @@ l r l
 ^ r l
 ^ r l
 ^ r l
+^ r l
+^ r l
+l r l
+^ r l
+^ r l
+^ r l
+^ r l
+^ r l
+^ r l
+^ r l
 l r l
-l r l.
+^ r l
+^ r l
+^ r l
+^ r l
+^ r l
+^ r l.
 T{
 Small
 T}:T{
@@ -600,7 +527,7 @@ T}
 :T{
 16
 T}:T{
-[16, 32, 48, \&.\&.\&., 128]
+[16, 32, 48, 64, 80, 96, 112, 128]
 T}
 :T{

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***


