7.  Implementation

      A new malloc(3) implementation was written to meet the goals, and to the extent possible to address the shortcomings listed previously.

      The source is 1218 lines of C code, and can be found in FreeBSD 2.2 (and probably later versions as well) as src/lib/libc/stdlib/malloc.c.

      The main data structure is the page-directory which contains a void* for each page we have control over. The value can be one of:

      In addition, there exists a linked list of small data structures that describe the free space as runs of free pages.

      Notice that these structures are not part of the free pages themselves, but rather allocated with malloc so that the free pages themselves are never referenced while they are free.

      When a request for storage comes in, it will be treated as a ``page'' allocation if it is bigger than half a page. The free list will be searched and the first run of free pages that can satisfy the request is used. The first page gets set to MALLOC_FIRST status. If more than that one page is needed, the rest of them get MALLOC_FOLLOW status in the page-directory.

      If there were no pages on the free list, brk(2) will be called, and the pages will get added to the page-directory with status MALLOC_FREE and the search restarts.

      Freeing a number of pages is done by changing their state in the page directory to MALLOC_FREE, and then traversing the free-pages list to find the right place for this run of pages, possibly collapsing with the two neighbouring runs into one run and, if possible, releasing some memory back to the kernel by calling brk(2).

      If the request is less than or equal to half of a page, its size will be rounded up to the nearest power of two before being processed and if the request is less than some minimum size, it is rounded up to that size.

      These sub-page allocations are served from pages which are split up into some number of equal size chunks. For each of these pages a struct pginfo describes the size of the chunks on this page, how many there are, how many are free and so on. The description consist of a bitmap of used chunks, and various counters and numbers used to keep track of the stuff in the page.

      For each size of sub-page allocation, the pginfo structures for the pages that have free chunks in them form a list. The heads of these lists are stored in predetermined slots at the beginning of the page directory to make access fast.

      To allocate a chunk of some size, the head of the list for the corresponding size is examined, and a free chunk found. The number of free chunks on that page is decreased by one and, if zero, the pginfo structure is unlinked from the list.

      To free a chunk, the page is derived from the pointer, the page table for that page contains a pointer to the pginfo structure, where the free bit is set for the chunk, the number of free chunks increased by one, and if equal to one, the pginfo structure is linked into the proper place on the list for this size of chunks. If the count increases to match the number of chunks on the page, the pginfo structure is unlinked from the list and free(3)'ed and the actual page itself is free(3)'ed too.

      To be 100% correct performance-wise these lists should be ordered according to the recent number of accesses to that page. This information is not available and it would essentially mean a reordering of the list on every memory reference to keep it up-to-date. Instead they are ordered according to the address of the pages. Interestingly enough, in practice this comes out to almost the same thing performance wise.

      It's not that surprising after all, it's the difference between following the crowd or actively directing where it can go, in both ways you can end up in the middle of it all.

      The side effect of this compromise is that it also uses less storage, and the list never has to be reordered, all the ordering happens when pages are added or deleted.

      It is an interesting twist to the implementation that the struct pginfo Is allocated with malloc. That is, "as with malloc" to be painfully correct. The code knows the special case where the first (couple) of allocations on the page is actually the pginfo structure and deals with it accordingly. This avoids some silly "chicken and egg" issues.