Hawkins Allocator

Features

Early History

I needed a custom memory allocator for use in the DPU software framework I designed as part of my thesis. The built-in C++ new was unacceptable because it uses a best-fit algorithm that runs in unbounded time. The result of that effort was a less sophisticated algorithm than the one presented here, supporting only two block sizes (one large, one small).

Description of the original algorithm from my thesis

The one place where dynamic memory allocation is needed is the Scheduling System, to allow for runtime creation of Tasks. Unfortunately, the built-in C++ new operator is optimized for size efficiency and cannot meet our real-time constraints [41]. The simplest memory allocation algorithm we could use is to partition a large buffer into fixed-size chunks large enough to hold the largest possible Task. This would work fine if most Tasks were of comparable size, but we expect to have small Tasks for monitoring a single quantity and large Tasks for Event capture and histogramming, so a one-size-fits-all approach would be very inefficient. Having two buffers using two different chunk sizes could work, but we might find ourselves with no free memory in one buffer and plenty of free memory in the other. What we would like is a solution, similar to the double-ended priority queue, that allows us to collocate the small and large chunks while allowing for the inevitable fragmentation of memory.

The solution is to choose the large chunk size to be a power-of-two multiple of the small chunk size (for efficiency we choose both chunk sizes to be powers of two). The buffer is divided into large chunks that can be allocated for larger Tasks. If a smaller Task is needed, a large chunk can be subdivided into small chunks (as many as the ratio of the two sizes) and those small chunks can be handed out.
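
For concreteness, here is a minimal sketch of the size relationship in C++. The specific sizes (64-byte and 1024-byte chunks) are illustrative assumptions, not values from the thesis:

    #include <cstddef>

    // Illustrative sizes only: both chunk sizes are powers of two, and the
    // large chunk size is a power-of-two multiple of the small chunk size.
    constexpr std::size_t kSmallChunkSize = 64;
    constexpr std::size_t kLargeChunkSize = 1024;

    // A large chunk subdivides into this many small chunks.
    constexpr std::size_t kSubdivideRatio = kLargeChunkSize / kSmallChunkSize;

    static_assert((kSmallChunkSize & (kSmallChunkSize - 1)) == 0,
                  "small chunk size must be a power of two");
    static_assert((kLargeChunkSize & (kLargeChunkSize - 1)) == 0,
                  "large chunk size must be a power of two");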

[Figure 29: A typical allocation of large and small chunks in the Task heap]

The large chunks are managed by a set of THeapNode instances, each containing a bitmap that indicates which of its small chunks are in use when the large chunk is subdivided. These are created in a static array, and the index in the array is the index of the corresponding large chunk of memory being managed. Each THeapNode instance stores previous/next indices, and the instances are initially connected to form a doubly linked list containing all free large chunks. When a large chunk is needed, the head of this list is removed and the corresponding chunk is given out for use. When the first small chunk is needed, the first small chunk in the head of the free large chunk list is marked as used in the bitmap and the THeapNode instance is moved to a new list for large chunks with one small chunk in use. There are similar lists for large chunks with two, three, or any number of small chunks in use, up to and including all of them. When a new small chunk is needed, the lists are checked to find the large chunk with the fewest small chunks free; using this large chunk helps minimize memory fragmentation.
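
The description can be made concrete with a sketch. The THeapNode name comes from the thesis; everything else (field names, the 16:1 ratio, chunk count, list layout) is an assumption for illustration:

    #include <cstdint>

    constexpr int kChunkCount = 256;  // number of large chunks (assumed)
    constexpr int kRatio      = 16;   // small chunks per large chunk (assumed)
    constexpr int kNone       = -1;   // null index for the linked lists

    // One THeapNode per large chunk; the array index is also the index of the
    // large chunk of memory being managed.
    struct THeapNode {
        std::uint16_t usedMap;  // bit i set => small chunk i is in use
        int prev, next;         // doubly linked list of indices into nodes[]
    };

    THeapNode nodes[kChunkCount];

    // listHead[n] heads the list of large chunks with exactly n small chunks
    // in use; listHead[0] is the free large chunk list.
    int listHead[kRatio + 1];

    void unlink(int i, int& head) {
        THeapNode& n = nodes[i];
        if (n.prev != kNone) nodes[n.prev].next = n.next; else head = n.next;
        if (n.next != kNone) nodes[n.next].prev = n.prev;
    }

    void pushFront(int i, int& head) {
        nodes[i].prev = kNone;
        nodes[i].next = head;
        if (head != kNone) nodes[head].prev = i;
        head = i;
    }

    // Initially every large chunk is free and linked into listHead[0].
    void initHeap() {
        for (int n = 0; n <= kRatio; ++n) listHead[n] = kNone;
        for (int i = kChunkCount - 1; i >= 0; --i) {
            nodes[i].usedMap = 0;
            pushFront(i, listHead[0]);
        }
    }

    // Allocate a whole large chunk: pop the head of the free list. O(1).
    int allocateLargeChunk() {
        int i = listHead[0];
        if (i != kNone) unlink(i, listHead[0]);
        return i;  // kNone if out of memory
    }

    // Allocate a small chunk: scan from the most-used non-full list downward
    // so the large chunk with the fewest free small chunks is chosen,
    // minimizing fragmentation. The scan is bounded by kRatio, a fixed
    // constant, so the cost does not grow with the total number of chunks.
    int allocateSmallChunk() {
        for (int used = kRatio - 1; used >= 0; --used) {
            int i = listHead[used];
            if (i == kNone) continue;
            int bit = 0;
            while (nodes[i].usedMap & (1u << bit)) ++bit;  // first free small chunk
            nodes[i].usedMap |= static_cast<std::uint16_t>(1u << bit);
            unlink(i, listHead[used]);        // node now has used+1 chunks in use
            pushFront(i, listHead[used + 1]);
            return i * kRatio + bit;          // global index of the small chunk
        }
        return kNone;  // no free memory for a small chunk
    }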

To free memory, the address is used to locate the corresponding THeapNode instance (and the small chunk within it, if necessary). If a large chunk is being freed, the chunk is simply placed at the head of the free large chunk list. If a small chunk is being freed, the bitmap is updated to free the small chunk and the THeapNode instance is moved to the list corresponding to the new number of small chunks in use. This requires splicing the list around the THeapNode instance where it is removed, which is why the lists are doubly linked.
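
Continuing the same illustrative sketch, freeing is a bitmap update plus two list splices; the number of bits set in the bitmap identifies the node's current list:

    #include <bitset>

    // Free a small chunk (same illustrative structures as in the sketch above).
    void freeSmallChunk(int smallIndex) {
        int i   = smallIndex / kRatio;   // which THeapNode / large chunk
        int bit = smallIndex % kRatio;   // which small chunk within it
        int used = static_cast<int>(std::bitset<16>(nodes[i].usedMap).count());
        nodes[i].usedMap &= static_cast<std::uint16_t>(~(1u << bit));
        unlink(i, listHead[used]);        // splice out of the old list; this is
                                          // what requires doubly linked lists
        pushFront(i, listHead[used - 1]); // relink under the new in-use count
    }

    // Free a whole large chunk: just push it back onto the free list. O(1).
    void freeLargeChunk(int i) {
        nodes[i].usedMap = 0;
        pushFront(i, listHead[0]);
    }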

Allocating a large chunk, freeing a large chunk, and freeing a small chunk are all constant-time operations. Allocating a small chunk is linear in the ratio of the large and small chunk sizes (the head of each list must be checked to find the large chunk with the fewest free small chunks). Since this ratio is fixed, this is also constant time and does not scale with the total number of memory chunks.

McKusick-Karels Allocator

The algorithm I present here turns out to be very similar to the well-known McKusick-Karels allocator. The chief difference is that pages, not blocks, are placed into my primary linked lists, and each page contains a linked list of its free blocks. The memory usage is essentially the same, but additional operations are needed for allocations and deallocations. In exchange we get automatic coalescing and guaranteed constant-time operations.
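
A rough sketch of that structural difference, with all names assumed for illustration:

    // Each size class tracks pages, and free blocks are threaded through the
    // free space of their own page, rather than all free blocks of a size
    // class sitting in one global list.
    struct FreeBlock { FreeBlock* next; };

    struct Page {
        int prev, next;         // doubly linked list of pages in a size class
        FreeBlock* freeBlocks;  // singly linked list of this page's free blocks
    };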

From Design of a General Purpose Memory Allocator for the 4.3BSD UNIX Kernel

Marshall Kirk McKusick and Michael J. Karels, Proceedings of the San Francisco USENIX Conference, pp. 295-303, June 1988

Another technique to improve both the efficiency of memory utilization and the speed of allocation is to cluster same-sized small allocations on a page. When a list for a power-of-two allocation is empty, a new page is allocated and divided into pieces of the needed size. This strategy speeds future allocations as several pieces of memory become available as a result of the call into the allocator.
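
A minimal sketch of that strategy, assuming 4 KB pages and power-of-two size classes (an illustration of the idea, not the 4.3BSD code):

    #include <cstddef>
    #include <new>

    constexpr std::size_t kPageSize = 4096;  // assumed page size
    constexpr int kClassCount = 13;          // power-of-two classes up to 4096 bytes

    struct FreeBlock { FreeBlock* next; };   // threaded through the free memory
    FreeBlock* freeList[kClassCount];        // one list per power-of-two size class

    // Stand-in for the kernel page allocator.
    void* allocatePage() {
        return ::operator new(kPageSize, std::align_val_t{kPageSize});
    }

    // When the list for a size class is empty, carve a fresh page into blocks
    // of that size and thread them all onto the list: one call into the page
    // allocator makes many subsequent allocations cheap. (Classes smaller
    // than sizeof(FreeBlock) are assumed unused.)
    void refill(int sizeClass) {
        const std::size_t blockSize = std::size_t{1} << sizeClass;
        char* page = static_cast<char*>(allocatePage());
        for (std::size_t off = 0; off < kPageSize; off += blockSize) {
            FreeBlock* b = reinterpret_cast<FreeBlock*>(page + off);
            b->next = freeList[sizeClass];
            freeList[sizeClass] = b;
        }
    }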

Because the size is not specified when a block of memory is freed, the allocator must keep track of the sizes of the pieces it has handed out. The 4.2BSD user-level allocator stores the size of each block in a header just before the allocation. However, this strategy doubles the memory requirement for allocations that require a power-of-two-sized block. Therefore, instead of storing the size of each piece of memory with the piece itself, the size information is associated with the memory page. Figure 2 shows how the kernel determines the size of a piece of memory that is being freed, by calculating the page in which it resides, and looking up the size associated with that page. Eliminating the cost of the overhead per piece improved utilization far more than expected. The reason is that many allocations in the kernel are for blocks of memory whose size is exactly a power of two. These requests would be nearly doubled if the user-level strategy were used. Now they can be accommodated with no wasted memory.
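
Continuing the sketch above, the per-page size table makes freeing possible without a header. kmemsizes is the paper's name for the table; deallocate, heapBase, and the sizes are assumptions:

    constexpr std::size_t kPageCount = 1024;  // pages managed (assumed)
    unsigned short kmemsizes[kPageCount];     // block size in use on each page
    char* heapBase;                           // start of the managed region

    // Free without a size argument: compute the page the address lies in and
    // look up the size recorded for that page; no per-block header is needed.
    // (Storing log2 of the size instead would keep each entry to one byte.)
    void deallocate(void* ptr) {
        std::size_t page = (static_cast<char*>(ptr) - heapBase) / kPageSize;
        std::size_t blockSize = kmemsizes[page];
        int sizeClass = 0;
        while ((std::size_t{1} << sizeClass) < blockSize) ++sizeClass;
        FreeBlock* b = static_cast<FreeBlock*>(ptr);
        b->next = freeList[sizeClass];
        freeList[sizeClass] = b;
    }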

Initialization Values

Data Structures

Data Structure Diagram

[Diagram of Page Array, Size Array, and Head/Tail Table]

Invariants

Linked-List Diagram

[Diagram of linked list structures within the pages/blocks]

Constants

Variables

Allocation Algorithm

Deallocation Algorithm

Size Algorithm

Conclusions

The table below compares the Hawkins allocator to the McKusick-Karels allocator with and without some form of coalescing. Note that McKusick-Karels can delay coalescing, making it difficult to assign a cost to specific operations.

                                   Hawkins                          M-K                              M-K with coalescing
    Memory usage per page          1 SizeType                       1 SizeType                       1 SizeType
    Computing log2 for allocation  O{ log2( bits in SizeType ) }    O{ log2( bits in SizeType ) }    O{ log2( bits in SizeType ) }
    Allocation worst-case          O{ 1 }                           O{ 1 }                           O{ BlockCount }
    Allocation typical             O{ 1 }                           O{ 1 }                           O{ 1 }
    Deallocation                   O{ 1 }                           O{ 1 }                           depends on PageCount
    Checking the size of a block   O{ 1 }                           O{ 1 }                           O{ 1 }
    Constants in front of scaling  Generally larger                 Generally smaller                Generally smaller
    Coalesces?                     Yes                              No                               Yes (may be delayed)
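
The log2 row refers to finding the size class for an allocation request; a sketch of the usual bit-halving approach, assuming a 32-bit SizeType:

    #include <cstdint>

    // Finds the power-of-two size class for a request by binary search on the
    // bits of the size, giving the O{ log2( bits in SizeType ) } cost in the
    // table. Valid for size >= 2.
    int sizeClassFor(std::uint32_t size) {
        std::uint32_t v = size - 1;  // exact powers of two map to their own class
        int log2 = 0;
        if (v >> 16) { v >>= 16; log2 += 16; }
        if (v >> 8)  { v >>= 8;  log2 += 8;  }
        if (v >> 4)  { v >>= 4;  log2 += 4;  }
        if (v >> 2)  { v >>= 2;  log2 += 2;  }
        if (v >> 1)  {           log2 += 1;  }
        return log2 + 1;  // ceil(log2(size))
    }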

Overall the McKusick-Karels allocator is likely to be more efficient than the Hawkins allocator in run-time performance, and may also offer additional space savings if the kmemsizes[] array is changed to store log2 of the block size (useful when the maximum block size is too large to fit in a byte). The primary advantage of the Hawkins allocator is that, in cases where coalescing is needed, it provides bounded operations suitable for use in real-time applications.

Implementation

A fully portable C++ implementation and a more constrained and efficient C implementation are in the works and will be posted under an open-source license when complete.

Extensions


©2007 Donovan Hawkins <hawkins@cephira.com>

Cephira, Cephira.com, Cephira.org, Cephira.net, and CMPLT are unregistered trademarks of Donovan Hawkins