Fast Efficient Fixed-Size Memory Pool: No Loops and No Overhead

COMPUTATION TOOLS 2012: The Third International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking
Copyright (c) IARIA, 2012. ISBN: 978-1-61208-222-6

Ben Kenwright
School of Computer Science, Newcastle University, Newcastle, United Kingdom
[email protected]

Abstract: In this paper, we examine a ready-to-use, robust, and computationally fast fixed-size memory pool manager with no loops and no memory overhead that is highly suited to time-critical systems such as games. The algorithm achieves this by exploiting the unused memory slots for bookkeeping in combination with a trouble-free indexing scheme. We explain how it works with straightforward step-by-step examples. Furthermore, we compare how much faster the memory pool manager is than a system allocator (e.g., malloc) over a range of allocations and sizes.

Keywords: memory pools; real-time; memory allocation; memory manager; memory de-allocation; dynamic memory

I. INTRODUCTION

A high-quality memory management system is crucial for any application that performs a large number of allocations and de-allocations. Studies have shown that in some cases an application can spend on average 30% of its processing time within the memory manager functions [1–4], and in some cases this can be as high as 43% [5].

However, speed is only one of the features we look for in a good memory manager; in addition, we are also concerned with:

• Memory management must not use any resources (neither memory nor computational cost)
• Minimize fragmentation
• Complexity (ideally a straightforward and logical algorithm that can be implemented without too many problems)
• Ability to verify and identify memory problems (corruption, leaks).

Nevertheless, the majority of applications use a general memory management system, which tries to provide a best-for-all solution by catering for every possible scenario. For some systems where speed is critical, such as games, these solutions are overkill. Instead, a simplified approach of partitioning the memory into fixed-size regions known as pools can provide enormous enhancements, such as increased speed, zero fragmentation, and memory organization.

Hence, we focus on a fixed-pool solution and introduce an algorithm that has little overhead and almost no computational cost to create and destroy. In addition, it can be used in conjunction with an existing system to provide a hybrid solution with minimum difficulty. On the other hand, multiple instances of fixed-size pools can be used to produce a flexible general solution that works in place of the current system memory manager.

Alternatively, in some time-critical systems such as games, system allocations are reduced to a bare minimum to make the process run as fast as possible. However, for a dynamically changing system, it is necessary to allocate memory for changing resources, e.g., data assets (graphical images, sound files, scripts) which are loaded dynamically at runtime. The sizes of these resources can be determined prior to running. This makes the fixed memory pool manager ideal. Alternatively, as mentioned, a range of pools can be used for a best-fit approach to accommodate data of varying size.

Naive memory pool implementations initialize all the memory pool segments when created [6][7]. This can be expensive since it is usually necessary to loop over all the uninitialized segments. Our algorithm differs by initializing only the first element, and so has little computational overhead when it is created (i.e., no loops). Hence, if a memory pool is only partially used and then destroyed, fewer processor cycles are wasted. Furthermore, for dynamic memory systems where partitioned memory is constantly created and destroyed, this initialization cost can be important (e.g., pools being repeatedly partitioned into smaller pools at run-time).

In summary, a memory pool can make an application execute faster, give greater control, add greater flexibility, enable greater customizability, greatly enhance robustness, and prevent fragmentation. To conclude, this paper presents the implementation of a straightforward, fast, flexible, and portable fixed-size memory pool algorithm that achieves O(1) time complexity for memory allocation and de-allocation, which is ideal for high-speed applications.

The fixed-size pool algorithm we present boasts the following properties:

• No loops (fast access times)
• No recursive functions
• Little initialization overhead
• Small memory footprint (a few dozen bytes)
• Straightforward and trouble-free algorithm
• No fragmentation
• Control and organization of memory

The rest of the paper is organized as follows. First, Section II discusses related work. In Section III, we outline the contribution of the paper, followed by Section IV, which gives a detailed explanation of how the memory pool algorithm works. Section V discusses practical implementation issues. Section VI outlines some limitations of the method. Section VIII gives some benchmark experimental results. Finally, Section IX draws conclusions and discusses further work.

II. RELATED WORK

The subject of memory management techniques has been studied extensively over the past few decades [8–12][13]. A whole variety of techniques and algorithms are available, and some of them can be overly complex and confusing to understand. The technique we present here is not novel, but is a modification of an existing technique [14][6][13] in which loops and initialization overheads are removed; this makes the resulting algorithm extremely fast and straightforward. The method is also one of the most memory-efficient implementations available, since it has a very small memory footprint while giving an O(1) access time. We also give an uncomplicated implementation in C++ in the appendix.

Memory pools have been a well-known choice for speeding up memory allocations and de-allocations in high-speed systems [15][16][17]. Zhao et al. [18] grouped data together from successive calls into segregated memory using memory pools to reduce pre-fetch latency. An article by Applegate [19] gave a well-defined overview of the various methods and advantages of high-performance memory in portable applications and the advantages of memory pools. Further discussion in Malakhow [20] outlines the advantages of memory pools and their applicability in high-performance multi-threaded systems. While we present a single-pool allocator similar to Hanson [7], our algorithm is more clear-cut and is easier to customize for an ad-hoc implementation.

Additionally, performance considerations are discussed by Meyers [21], e.g., macros and monolithic functions, that can be applied to gain further speed-ups and greater reliability while incorporating good coding practices. Comparisons of computational cost indicate that a memory management system implemented in an object-oriented language (e.g., C++) is less efficient than one implemented in a procedural language (e.g., C) [3][22]; however, we implemented our fixed-size memory pool in C++ because we believe it makes it more re-usable, extensible, and modular.

III. CONTRIBUTION

The contribution of this paper is to demonstrate a practical, simple, fixed-size memory pool manager that has no loops, virtually no memory overhead, and is computationally fast. We also compare the algorithm with the standard system memory allocator (e.g., malloc) to give the reader a real-world computational comparison of the speed differences. The comparison emphasizes how much faster a simple and smart algorithm can be than a more complex and general solution.

IV. HOW IT WORKS

We explain how the fixed-size memory pool works by defining what information we have and what information we need to calculate (to help make the details more understandable, see Figure 1 and Figure 2 for illustrations).

When the pool is created, we obtain a continuous section of memory whose start and end addresses we know. This continuous range of memory is subdivided into equally sized memory blocks. Each memory block's address can be identified at this point from the start address, the block size, and the number of blocks.

This leaves the dynamic bookkeeping part of the algorithm. The algorithm must keep track of which blocks are used and unused as they are allocated and de-allocated.

We begin by identifying each memory block using a four-byte index number. This index number can be used to identify any memory location by multiplying it by the block size and adding the result to the start memory address. Hence, the blocks are indexed 0 to n-1, where n is the number of blocks.

The bookkeeping algorithm works by keeping a list of the unused blocks. We only need to know which blocks are unused, since every other block is in use. This list of unused blocks is modified as blocks are allocated and de-allocated.

[Figure 1 diagram: a continuous memory range from a start address (e.g., 0x005A1D38) to an end address (e.g., 0x005A1E8C), divided into n equally sized blocks indexed 0 to n-1. Each block is at least 4 bytes, and a 4-byte index number can identify up to 2^32 blocks.]

Figure 1. (a) Illustrates how the unused memory is linked together (the unused memory blocks store index information to identify the free space). (b) Example of how memory is subdivided into n blocks.

However, we avoid the cost of initializing and linking together all the unused blocks. Instead, we initialize a variable that records how many of the n blocks have been appended to the unused list. At each allocation, an unused block is appended to the list and the number-of-initialized-blocks variable is updated (see Figure 1).

The list uses no additional memory.
