Compressed RAM for Embedded Systems
Total Page:16
File Type:pdf, Size:1020Kb
CRAMES: Compressed RAM for Embedded Systems Robert Dick http://robertdick.org/ or http://ziyang.eecs.northwestern.edu/∼dickrp/ Department of Electrical Engineering and Computer Science Northwestern University Application Application Application Swap Page Page Decomp Comp CRAMES Virtual Memory Introduction CRAMES design Experimental evaluation Collaborators on project NEC Northwestern University NEC Labs America Lei Yang Dr. Haris Lekatsas Lan S. Bai Dr. Srimat Chakradhar Robert P. Dick 2 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Outline 1. Introduction Motivation Past work 2. CRAMES design Design overview Pattern-based partial match compression 3. Experimental evaluation Experimental setup Commercialization and future directions 3 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Problem background RAM quantity limits application functionality RAM price dropping but usage growing faster Secure Internet access, email, music, and games How much RAM? Functionality Cost Power consumption Size 4 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Ideal hardware–software design process Ideal case Hardware and software engineers collaborate on system-level design from start to finish We teach the advantages of this in our classes It doesn’t always happen 5 Robert Dick CRAMES HW engineers SW engineers Introduction Motivation CRAMES design Past work Experimental evaluation Real hardware–software design process 6 Robert Dick CRAMES HW engineers SW engineers Introduction Motivation CRAMES design Past work Experimental evaluation Real hardware–software design process 6 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Options Option 1: Add more memory Implications: Hardware redesign, miss shipping target, get fired Option 2: Rip out memory-hungry application features Implications: Lose market to competitors, fail to recoup design and production costs, get fired Option 3: Make it seem as if memory increased without changing hardware, without changing applications, and without performance or power consumption penalties Nobody knew how to do this 7 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Goals Allow application RAM requirements to overrun initial estimates even after hardware design Reduce physical RAM, negligible performance and energy cost Improve functionality or performance with same physical RAM 8 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation HW RAM compression Tremaine 2001, Benini 2002, Moore 2003 Hardware (de)compression unit between cache and RAM Hardware redesign and application-specific compression hardware Past work claimed application-specific hardware essential to keep power and performance overhead low 9 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Code compression Lekatsas 2000, Xu 2004 Store code compressed, decompress during execution Compress off-line, decompress on-line For RAM, less important than on-line data compression 10 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Compression for swap performance Compressed caching Douglis 1993, Russinovich 1996, Wilson 1999, Kjelso 1999 Add compressed software cache to VM Swap compression RamDoubler, Cortez 2000, Roy 2001, Chihaia 2005 Compress swapped-out pages and store them in software cache Both techniques Target: general purpose system with disks Goal: improve system performance Interface to backing store (disk) 11 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Outline 1. Introduction Motivation Past work 2. CRAMES design Design overview Pattern-based partial match compression 3. Experimental evaluation Experimental setup Commercialization and future directions 12 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Design principles Page selection Scheduling compression and decompression Organizing compressed and uncompressed regions Dynamically adjust compressed region size Compression scheme High performance Energy efficient Good compression ratio Low memory requirement 13 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation CRAMES design Page Fault Memory Low 8 1 1 8 Kernel Swapping System 7 2 2 7 CRAMES Device Driver 6 6 3 3 4 Block Memory Block Decompressor Manager Compressor 5 5 4 Virtual Memory MMU RAM 14 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation System configuration Slightly slower Ramdisk with Original CRAMES file system 20 MB Ramdisk Compressed RAM with file system device with file system Main memory 12 MB 12 MB working area 10 MB Main memory working area Slightly slower Main memory 10 MB main memory working area working area 20 MB Compressed 20 MB swap device 10 MB System/User view System view User view 15 Robert Dick CRAMES Compression Ratio Working Memory 7600kB 0.39 10000 3700kB 0.34 bzip2 1000 zlib default 256KB 0.29 zilb level 1 100 Compression (bytes) 64KB zlib level 9 2 44KB 0.24 Decompression RLE log 16KB 16KB LZRW-1A 10 0.19 Compression Ratio Compression LZO 00 0 0.14 1 512 1024 2048 4096 8192 LZO bzip2 zlib default LZO LZRW-1A RLE Block Size (bytes) best existing Compression Algorithms compression Compression Time algorithm for Decompression Time 0.0025 0.0012 CRAMES 0.001 bzip2 0.002 bzip2 zlib default zilb default 0.0008 zlib level 1 0.0015 zlib level 1 zlib level 9 zlib level 9 0.0006 RLE 0.001 RLE Time (sec) Time Time (sec) Time 0.0004 LZRW-1A LZRW-1A 0.0005 LZO LZO 0.0002 0 0 512 1024 2048 4096 8192 512 1024 2048 4096 8192 Block Size (bytes) Block Size (bytes) Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Compression algorithm Compression Ratio Working Memory 7600kB 0.39 10000 3700kB 0.34 bzip2 1000 zlib default 256KB 0.29 zilb level 1 100 Compression (bytes) 64KB zlib level 9 2 44KB 0.24 Decompression RLE log 16KB 16KB LZRW-1A 10 0.19 Compression Ratio Compression LZO 00 0 0.14 1 512 1024 2048 4096 8192 bzip2 zlib default LZO LZRW-1A RLE Block Size (bytes) Compression Algorithms Compression Time Decompression Time 0.0025 0.0012 0.001 bzip2 0.002 bzip2 zlib default zilb default 0.0008 zlib level 1 0.0015 zlib level 1 zlib level 9 zlib level 9 0.0006 RLE 0.001 RLE Time (sec) Time Time (sec) Time 0.0004 LZRW-1A LZRW-1A 0.0005 LZO LZO 0.0002 0 0 512 1024 2048 4096 8192 512 1024 2048 4096 8192 Block Size (bytes) Block Size (bytes) 16 Robert Dick CRAMES Compression Ratio Working Memory 7600kB 0.39 10000 3700kB 0.34 bzip2 1000 zlib default 256KB 0.29 zilb level 1 100 Compression (bytes) 64KB zlib level 9 2 44KB 0.24 Decompression RLE log 16KB 16KB LZRW-1A 10 0.19 Compression Ratio Compression LZO 00 0 0.14 1 512 1024 2048 4096 8192 LZO bzip2 zlib default LZO LZRW-1A RLE Block Size (bytes) best existing Compression Algorithms compression Compression Time algorithm for Decompression Time 0.0025 0.0012 CRAMES 0.001 bzip2 0.002 bzip2 zlib default zilb default 0.0008 zlib level 1 0.0015 zlib level 1 zlib level 9 zlib level 9 0.0006 RLE 0.001 RLE Time (sec) Time Time (sec) Time 0.0004 LZRW-1A LZRW-1A 0.0005 LZO LZO 0.0002 0 0 512 1024 2048 4096 8192 512 1024 2048 4096 8192 Block Size (bytes) Block Size (bytes) Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Compression algorithm Compression Ratio Working Memory 7600kB 0.39 10000 3700kB 0.34 bzip2 1000 zlib default 256KB 0.29 zilb level 1 100 Compression (bytes) 64KB zlib level 9 2 44KB 0.24 Decompression RLE log 16KB 16KB LZRW-1A 10 0.19 Compression Ratio Compression LZO 00 0 0.14 1 512 1024 2048 4096 8192 bzip2 zlib default LZO LZRW-1A RLE Block Size (bytes) Compression Algorithms Compression Time Decompression Time 0.0025 0.0012 0.001 bzip2 0.002 bzip2 zlib default zilb default 0.0008 zlib level 1 0.0015 zlib level 1 zlib level 9 zlib level 9 0.0006 RLE 0.001 RLE Time (sec) Time Time (sec) Time 0.0004 LZRW-1A LZRW-1A 0.0005 LZO LZO 0.0002 0 0 512 1024 2048 4096 8192 512 1024 2048 4096 8192 Block Size (bytes) Block Size (bytes) 16 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Weak link: Compression algorithm LZO average performance penalty 9.5% when RAM reduced to 40% Developed simulation environment to permit profiling Compression and decompression were taking most time Needed a better compression algorithm 17 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Weak link: Compression algorithm LZO average performance penalty 9.5% when RAM reduced to 40% Developed simulation environment to permit profiling Compression and decompression were taking most time Needed a better compression algorithm 17 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Pattern-based partial dictionary match coding Input 32-bit word N Y Output dictionary Is something similar Output value location and in dictionary? difference Eliminate LRU dictionary entry Update dictionary entry Place entry in dictionary Move entry to front of dictionary