Evaluating Effects of Cache Memory Compression on Embedded Systems
Anderson Farias Briglia, Nokia Institute of Technology, [email protected]
Allan Bezerra, Nokia Institute of Technology, [email protected]
Leonid Moiseichuk, Nokia Multimedia, OSSO, [email protected]
Nitin Gupta, VMware Inc., [email protected]

Abstract

Cache memory compression (or compressed caching) was originally developed for desktop and server platforms, but has also attracted interest on embedded systems, where memory is generally a scarce resource and hardware changes bring additional costs and energy consumption. Cache memory compression brings a considerable advantage in input-output-intensive applications by means of using a virtually larger cache for the local file system through compression algorithms. As a result, it increases the probability of fetching the necessary data in RAM itself, avoiding the need to make slow calls to local storage. This work evaluates an Open Source implementation of cache memory compression applied to Linux on an embedded platform, dealing with the unavoidable processor and memory resource limitations as well as with existing architectural differences. We will describe the Compressed Cache (CCache) design, the compression algorithm used, memory behavior tests, performance and power consumption overhead, and CCache tuning for embedded Linux.

1 Introduction

Compressed caching is the introduction of a new level into the virtual memory hierarchy. Specifically, RAM is used to store both an uncompressed cache of pages in their 'natural' encoding and a compressed cache of pages in some compressed format. By using RAM to store some number of compressed pages, the effective size of RAM is increased, and so the number of page faults that must be handled by very slow hard disks is decreased. Our aim is to improve system performance. When that is not possible, our goal is to introduce no (or minimal) overhead when compressed caching is enabled in the system.

Experimental data show that not only can we improve data input and output rates, but also that system behavior can be improved, especially in memory-critical cases, leading, for example, to such improvements as postponing out-of-memory activities altogether. Taking advantage of the kernel swap system, this implementation adds a virtual swap area (a dynamically sized portion of main memory) to store the compressed pages. Using a dictionary-based compression algorithm, page-cache (file-system) pages and anonymous pages are compressed and spread into variable-sized memory chunks. With this approach, fragmentation can be reduced to almost zero whilst achieving a fast page recovery process. The size of the Compressed Cache can be adjusted separately for page-cache and anonymous pages on the fly, using procfs entries, giving more flexibility to tune the system to the required use cases.

2 Compressed Caching

2.1 Linux Virtual Memory Overview

Physical pages are the basic unit of memory management [8], and the MMU is the hardware that translates virtual page addresses into physical page addresses and vice-versa. This compressed caching implementation, CCache [3], adds some new flags to help with compressed page identification and uses the same lists used by the PFRA (Page Frame Reclaiming Algorithm). When the system is under a low-memory condition, it evicts pages from memory. It uses Least Recently Used (LRU) criteria to determine the order in which to evict pages. It maintains two LRU lists: the active and inactive LRU lists. These lists may contain both page-cache (file-backed) and swap-cache (anonymous) pages.
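The LRU ordering criterion can be sketched in a few lines of user-space C. This is a toy model only, not the kernel's implementation (the kernel uses intrusive doubly linked lists and two-list aging); all names here are illustrative. Accessing a page moves it to the front of the list, so the tail is always the least recently used page and the first eviction candidate:

```c
#include <assert.h>

/* Toy LRU list: slot 0 is most recently used, the last occupied slot is
 * the eviction candidate. Illustrative only. */
#define NPAGES 4

static int lru[NPAGES];   /* page numbers, most recently used first */
static int nr;            /* pages currently on the list */

/* Record an access: move the page to the front of the list. */
static void touch(int page)
{
    int i = 0, j;
    while (i < nr && lru[i] != page)
        i++;                         /* find the page, or i == nr */
    if (i == nr) {                   /* not on the list yet */
        if (nr == NPAGES)
            i = --nr;                /* list full: tail gets dropped */
        nr++;
    }
    for (j = i; j > 0; j--)          /* shift everyone down one slot */
        lru[j] = lru[j - 1];
    lru[0] = page;
}

/* Reclaim the least recently used page (the tail). */
static int evict(void)
{
    return lru[--nr];
}
```

For example, after `touch(1); touch(2); touch(3); touch(1);` the order is 1, 3, 2, so `evict()` returns page 2 first.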
When under memory pressure, pages in the inactive list are freed as follows:

• Swap-cache pages are written out to swap disks using the swapper_space writepage() method (swap_writepage()).

• Dirty page-cache pages are flushed to filesystem disks using the filesystem-specific writepage().

• Clean page-cache pages are simply freed.

2.1.1 About Swap Cache

This is the cache for anonymous pages. All swap-cache pages are part of a single swapper_space, and a single radix tree maintains all pages in the swap cache. A swp_entry_t value is used as the key to locate the corresponding pages in memory; this value identifies the location in the swap device reserved for this page.

typedef struct {
	unsigned long val;
} swp_entry_t;

Figure 1: Fields in swp_entry_t for the default setup of MAX_SWAPFILES=32: 'type' (5 bits) and 'offset' (27 bits)

In Figure 1, 'type' identifies things we can swap to.

2.1.2 About Page Cache

This is the cache for file-system pages. Like the swap cache, it uses a radix tree to keep track of file pages; here, the offset in the file is used as the search key. Each open file has a separate radix tree. For pages present in memory, the corresponding radix node points to the struct page for the memory page containing the file data at that offset.

2.2 Compressed Cache Overview

For compressed cache to be effective, it needs to store both swap-cache and page-cache (clean and dirty) pages. So, a way is needed to transparently (i.e., with no changes required for user applications) take these pages in and out of the compressed cache.

This implementation handles anonymous pages and page-cache (filesystem) pages differently, due to the way they are handled by the kernel:

• For anonymous pages, we create a virtual swap. This is a memory-resident area where we store compressed anonymous pages. The swap-out path then treats this as yet another swap device (with the highest priority), and hence only minimal changes were required in this part of the kernel. The size of this swap can be dynamically adjusted using the provided proc nodes.

• For page-cache pages, we make the corresponding page-cache entry point to the location in the compressed area instead of to the original page. When a page is accessed again, we decompress it and make the page-cache entry point back to the page. We did not use the 'virtual swap' approach here, since these (file-system) pages are never 'swapped out': they are either flushed to file-system disk (if dirty) or simply freed (if clean).

In both cases, the actual compressed page is stored as a series of variable-sized 'chunks' in a specially managed part of memory, which is designed to have minimum fragmentation in storing these variable-sized areas while offering quick storage and retrieval operations. All kinds of pages share the common compressed area. The compressed area begins as a few memory pages. As more pages are compressed, the compressed area inflates (up to a maximum size which can be set through the procfs interface), and when requests for these compressed pages arrive, they are decompressed and the corresponding memory 'chunks' are put back onto the free-list.

2007 Linux Symposium, Volume One • 55
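The swp_entry_t layout of Figure 1 packs the 5-bit 'type' and the 27-bit 'offset' into a single unsigned long. A user-space sketch of that packing is below; it assumes the 32-bit layout shown in the figure, with the type in the upper bits. The kernel's real swp_type()/swp_offset() macros are architecture- and configuration-dependent, so treat this as an illustration rather than the kernel code:

```c
#include <assert.h>

/* Sketch of the Figure 1 layout: top 5 bits select one of the
 * MAX_SWAPFILES=32 swap devices, low 27 bits give the page offset
 * within that device. Illustrative, not the kernel's macros. */
#define TYPE_BITS   5u
#define OFFSET_BITS 27u
#define TYPE_SHIFT  OFFSET_BITS
#define TYPE_MASK   ((1ul << TYPE_BITS) - 1)
#define OFFSET_MASK ((1ul << OFFSET_BITS) - 1)

typedef struct { unsigned long val; } swp_entry_t;

/* Build an entry from a swap device number and a page offset. */
static swp_entry_t swp_entry(unsigned long type, unsigned long offset)
{
    swp_entry_t e = { (type << TYPE_SHIFT) | (offset & OFFSET_MASK) };
    return e;
}

static unsigned long swp_type(swp_entry_t e)
{
    return (e.val >> TYPE_SHIFT) & TYPE_MASK;
}

static unsigned long swp_offset(swp_entry_t e)
{
    return e.val & OFFSET_MASK;
}
```

Encoding device 3, offset 123456 and decoding it again recovers both fields, which is exactly the round trip the swap cache relies on when it uses swp_entry_t as a radix-tree key.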
2.3 Implementation Design

When a page is to be compressed, the radix node pointing to the page is changed to point to a chunk_head, which in turn points to the first of the chunks for the compressed page; all the chunks are also linked. This chunk_head structure contains all the information required to correctly locate and decompress the page (compressed page size, compression algorithm used, location of the first chunk, etc.).

Figure 2: Memory hierarchy with Compressed Caching (main memory holding pages backed by swap and pages backed by filesystem disks; the virtual swap / Compressed Cache; swap and filesystem disks)

When the compressed page is accessed later, a page-cache/swap-cache (radix) lookup is done. If we get a chunk_head structure instead of a page structure on lookup, we know this page was compressed. Since the chunk_head contains a pointer to the first of the chunks for this page, and all chunks are linked, we can easily retrieve the compressed version of the page. Then, using the information in the chunk_head structure, we decompress the page and make the corresponding radix node point back to this newly decompressed page.

2.3.1 Compressed Storage

The basic idea is to store compressed pages in variable-sized memory blocks (called chunks). A compressed page can be stored in several of these chunks. Memory space for chunks is obtained by allocating 0-order pages, one at a time, and managing this space using chunks. All the chunks are always linked in a doubly linked list called the master chunk list. Related chunks are also linked together as a singly linked list.

Figure 3: A sample compressed storage view highlighting 'chunked' storage. Identically colored blocks belong to the same compressed page, and white is free space. An arrow indicates related chunks linked together as a singly linked list. A long horizontal line across chunks shows that these chunks are also linked together as a doubly linked list, in addition to whatever other lists they might belong to.

• A chunk cannot cross page boundaries, as is shown for the 'green' compressed page in Figure 3; a chunk is split unconditionally at page boundaries. Thus, the maximum chunk size is PAGE_SIZE.

• This structure reduces fragmentation to a minimum, as all the variable-sized free-space blocks are being tracked.

• When compressed pages are taken out, the corresponding chunks are added to the free-list, and physically adjacent free chunks are merged together (while making sure chunks do not cross page boundaries). If the final merged chunk spans an entire page, the page is released.

So, the compressed storage begins as a single chunk of size PAGE_SIZE, and the free-list contains this single chunk. An LRU list is also maintained which contains these chunks in the order in which they are added (i.e., the 'oldest' chunk is at the tail).
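The rule that a chunk never crosses a page boundary can be captured in a few lines. The helper below is hypothetical (it is not from the CCache source, and the real allocator carves chunks out of free-list blocks rather than a contiguous range); it only shows how a compressed page of `len` bytes, placed at byte `start` of the compressed area, splits into chunk sizes that are each bounded by the end of the current page, so no chunk exceeds PAGE_SIZE:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096ul

/* Split `len` bytes starting at byte `start` into chunk sizes that
 * never cross a PAGE_SIZE boundary. Writes the sizes into `sizes` and
 * returns the number of chunks produced. Illustrative sketch only. */
static size_t split_into_chunks(size_t start, size_t len,
                                size_t sizes[], size_t max_chunks)
{
    size_t n = 0;
    while (len > 0 && n < max_chunks) {
        size_t room = PAGE_SIZE - (start % PAGE_SIZE); /* left in this page */
        size_t take = len < room ? len : room;         /* split at boundary */
        sizes[n++] = take;
        start += take;
        len -= take;
    }
    return n;
}
```

For example, 1000 compressed bytes placed 300 bytes before a page boundary become two chunks of 300 and 700 bytes, mirroring how a compressed page in Figure 3 is split unconditionally where it meets a 4 KB boundary.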