CRAMES: Compressed RAM for Embedded Systems
Robert Dick http://robertdick.org/ or http://ziyang.eecs.northwestern.edu/∼dickrp/ Department of Electrical Engineering and Computer Science Northwestern University
Application Application Application
Swap
Page Page Decomp Comp
CRAMES
Virtual Memory Introduction CRAMES design Experimental evaluation Collaborators on project
NEC
Northwestern University NEC Labs America Lei Yang Dr. Haris Lekatsas Lan S. Bai Dr. Srimat Chakradhar Robert P. Dick
2 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Outline
1. Introduction Motivation Past work
2. CRAMES design Design overview Pattern-based partial match compression
3. Experimental evaluation Experimental setup Commercialization and future directions
3 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Problem background
RAM quantity limits application functionality
RAM price dropping but usage growing faster Secure Internet access, email, music, and games
How much RAM? Functionality Cost Power consumption Size
4 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Ideal hardware–software design process
Ideal case Hardware and software engineers collaborate on system-level design from start to finish
We teach the advantages of this in our classes It doesn’t always happen
5 Robert Dick CRAMES HW engineers SW engineers
Introduction Motivation CRAMES design Past work Experimental evaluation Real hardware–software design process
6 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Real hardware–software design process
HW engineers SW engineers
6 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Options
Option 1: Add more memory Implications: Hardware redesign, miss shipping target, get fired
Option 2: Rip out memory-hungry application features Implications: Lose market to competitors, fail to recoup design and production costs, get fired
Option 3: Make it seem as if memory increased without changing hardware, without changing applications, and without performance or power consumption penalties Nobody knew how to do this
7 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Goals
Allow application RAM requirements to overrun initial estimates even after hardware design Reduce physical RAM, negligible performance and energy cost Improve functionality or performance with same physical RAM
8 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation HW RAM compression
Tremaine 2001, Benini 2002, Moore 2003 Hardware (de)compression unit between cache and RAM Hardware redesign and application-specific compression hardware Past work claimed application-specific hardware essential to keep power and performance overhead low
9 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Code compression
Lekatsas 2000, Xu 2004 Store code compressed, decompress during execution Compress off-line, decompress on-line For RAM, less important than on-line data compression
10 Robert Dick CRAMES Introduction Motivation CRAMES design Past work Experimental evaluation Compression for swap performance
Compressed caching Douglis 1993, Russinovich 1996, Wilson 1999, Kjelso 1999 Add compressed software cache to VM
Swap compression RamDoubler, Cortez 2000, Roy 2001, Chihaia 2005 Compress swapped-out pages and store them in software cache
Both techniques Target: general purpose system with disks Goal: improve system performance Interface to backing store (disk)
11 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Outline
1. Introduction Motivation Past work
2. CRAMES design Design overview Pattern-based partial match compression
3. Experimental evaluation Experimental setup Commercialization and future directions
12 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Design principles
Page selection Scheduling compression and decompression Organizing compressed and uncompressed regions Dynamically adjust compressed region size
Compression scheme High performance Energy efficient Good compression ratio Low memory requirement
13 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation CRAMES design
Page Fault Memory Low
8 1 1 8
Kernel Swapping System
7 2 2 7
CRAMES Device Driver
6 6 3 3 4 Block Memory Block Decompressor Manager Compressor
5 5 4
Virtual Memory
MMU
RAM
14 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation System configuration
Slightly slower Ramdisk with Original CRAMES file system 20 MB
Ramdisk Compressed RAM with file system device with file system Main memory 12 MB 12 MB working area 10 MB
Main memory working area Slightly slower Main memory 10 MB main memory working area working area 20 MB Compressed 20 MB swap device 10 MB
System/User view System view User view
15 Robert Dick CRAMES Compression Ratio Working Memory
7600kB 0.39 10000 3700kB
0.34 bzip2 1000 zlib default 256KB 0.29 zilb level 1 100 Compression (bytes) 64KB
zlib level 9 2 44KB 0.24 Decompression RLE log 16KB 16KB LZRW-1A 10 0.19 Compression Ratio Compression LZO 00 0 0.14 1 512 1024 2048 4096 8192 LZO bzip2 zlib default LZO LZRW-1A RLE Block Size (bytes) best existing Compression Algorithms compression Compression Time algorithm for Decompression Time 0.0025 0.0012 CRAMES 0.001 bzip2 0.002 bzip2 zlib default zilb default 0.0008 zlib level 1 0.0015 zlib level 1 zlib level 9 zlib level 9 0.0006 RLE 0.001 RLE Time (sec) Time Time (sec) Time 0.0004 LZRW-1A LZRW-1A 0.0005 LZO LZO 0.0002
0 0 512 1024 2048 4096 8192 512 1024 2048 4096 8192 Block Size (bytes) Block Size (bytes)
Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Compression algorithm
Compression Ratio Working Memory
7600kB 0.39 10000 3700kB
0.34 bzip2 1000 zlib default 256KB 0.29 zilb level 1 100 Compression (bytes) 64KB
zlib level 9 2 44KB 0.24 Decompression RLE log 16KB 16KB LZRW-1A 10 0.19 Compression Ratio Compression LZO 00 0 0.14 1 512 1024 2048 4096 8192 bzip2 zlib default LZO LZRW-1A RLE Block Size (bytes) Compression Algorithms
Compression Time Decompression Time
0.0025 0.0012
0.001 bzip2 0.002 bzip2 zlib default zilb default 0.0008 zlib level 1 0.0015 zlib level 1 zlib level 9 zlib level 9 0.0006 RLE 0.001 RLE Time (sec) Time Time (sec) Time 0.0004 LZRW-1A LZRW-1A 0.0005 LZO LZO 0.0002
0 0 512 1024 2048 4096 8192 512 1024 2048 4096 8192 Block Size (bytes) Block Size (bytes)
16 Robert Dick CRAMES Compression Ratio Working Memory
7600kB 0.39 10000 3700kB
0.34 bzip2 1000 zlib default 256KB 0.29 zilb level 1 100 Compression (bytes) 64KB
zlib level 9 2 44KB 0.24 Decompression RLE log 16KB 16KB LZRW-1A 10 0.19 Compression Ratio Compression LZO 00 0 0.14 1 512 1024 2048 4096 8192 bzip2 zlib default LZO LZRW-1A RLE Block Size (bytes) Compression Algorithms
Compression Time Decompression Time
0.0025 0.0012
0.001 bzip2 0.002 bzip2 zlib default zilb default 0.0008 zlib level 1 0.0015 zlib level 1 zlib level 9 zlib level 9 0.0006 RLE 0.001 RLE Time (sec) Time Time (sec) Time 0.0004 LZRW-1A LZRW-1A 0.0005 LZO LZO 0.0002
0 0 512 1024 2048 4096 8192 512 1024 2048 4096 8192 Block Size (bytes) Block Size (bytes)
Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Compression algorithm
Compression Ratio Working Memory
7600kB 0.39 10000 3700kB
0.34 bzip2 1000 zlib default 256KB 0.29 zilb level 1 100 Compression (bytes) 64KB
zlib level 9 2 44KB 0.24 Decompression RLE log 16KB 16KB LZRW-1A 10 0.19 Compression Ratio Compression LZO 00 0 0.14 1 512 1024 2048 4096 8192 LZO bzip2 zlib default LZO LZRW-1A RLE Block Size (bytes) best existing Compression Algorithms compression Compression Time algorithm for Decompression Time 0.0025 0.0012 CRAMES 0.001 bzip2 0.002 bzip2 zlib default zilb default 0.0008 zlib level 1 0.0015 zlib level 1 zlib level 9 zlib level 9 0.0006 RLE 0.001 RLE Time (sec) Time Time (sec) Time 0.0004 LZRW-1A LZRW-1A 0.0005 LZO LZO 0.0002
0 0 512 1024 2048 4096 8192 512 1024 2048 4096 8192 Block Size (bytes) Block Size (bytes)
16 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Weak link: Compression algorithm
LZO average performance penalty 9.5% when RAM reduced to 40% Developed simulation environment to permit profiling Compression and decompression were taking most time Needed a better compression algorithm
17 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Weak link: Compression algorithm
LZO average performance penalty 9.5% when RAM reduced to 40% Developed simulation environment to permit profiling Compression and decompression were taking most time Needed a better compression algorithm
17 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Pattern-based partial dictionary match coding
Input 32-bit word
N Y Output dictionary Is something similar Output value location and in dictionary? difference
Eliminate LRU dictionary entry Update dictionary entry
Place entry in dictionary Move entry to front of dictionary
18 Robert Dick CRAMES 100% xmxx xzxz 90% zxxz 80% Nearby values similar xzzx zxzz 70% (data, pointers): 27% zzxz xzzz 60% Small values: 10% xxzz xxzx 50% xzxx xxxz 40%
Percentage % Percentage zxxx 30% zzxx zxzx 20% Zeros: 38% mmmx mmxx 10% mmmm zzzx 0% xxxx ) ) ) 2 ,1) 4 ,1) ,4 ,2) 1) 2) (8, (4,4) (8, 6, zzzz (16,1) (32 (16,2) (64 (32,2) (16 (64 (32,4) 28, (64,4) (128,1) (25 (1 Dictionary layout (size, level)
Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Data regularity
100% xmxx xzxz 90% zxxz
80% xzzx zxzz 70% zzxz xzzz 60% xxzz xxzx 50% xzxx xxxz 40%
Percentage % Percentage zxxx 30% zzxx zxzx 20% mmmx mmxx 10% mmmm zzzx 0% xxxx ) ) ) 2 ,1) 4 ,1) ,4 ,2) 1) 2) (8, (4,4) (8, 6, zzzz (16,1) (32 (16,2) (64 (32,2) (16 (64 (32,4) 28, (64,4) (128,1) (25 (1 Dictionary layout (size, level)
19 Robert Dick CRAMES 100% xmxx xzxz 90% zxxz
80% xzzx zxzz 70% zzxz xzzz 60% xxzz xxzx 50% xzxx xxxz 40%
Percentage % Percentage zxxx 30% zzxx zxzx 20% mmmx mmxx 10% mmmm zzzx 0% xxxx ) ) ) 2 ,1) 4 ,1) ,4 ,2) 1) 2) (8, (4,4) (8, 6, zzzz (16,1) (32 (16,2) (64 (32,2) (16 (64 (32,4) 28, (64,4) (128,1) (25 (1 Dictionary layout (size, level)
Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Data regularity
100% xmxx xzxz 90% zxxz 80% Nearby values similar xzzx zxzz 70% (data, pointers): 27% zzxz xzzz 60% Small values: 10% xxzz xxzx 50% xzxx xxxz 40%
Percentage % Percentage zxxx 30% zzxx zxzx 20% Zeros: 38% mmmx mmxx 10% mmmm zzzx 0% xxxx ) ) ) 2 ,1) 4 ,1) ,4 ,2) 1) 2) (8, (4,4) (8, 6, zzzz (16,1) (32 (16,2) (64 (32,2) (16 (64 (32,4) 28, (64,4) (128,1) (25 (1 Dictionary layout (size, level)
19 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation Pattern-based partial match
Algorithm Consider each 32-bit word as an input Allow partial dictionary match Use most frequent patterns based on statistical analysis
Optimizations Two-way associative LRU 16-entry hash-mapped dictionary Optimized coding scheme Early termination for uncompressable data Fine-grained operation parallelization
20 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation PBPM evaluations
PBPM twice as fast as LZO Compression ratio degrades by 10% Improves performance and still doubles usable memory
21 Robert Dick CRAMES Introduction Design overview CRAMES design Pattern-based partial match compression Experimental evaluation CRAMES and in-RAM filesystem compression
Compressed Kernel RAM disk 7% reserved 37.5% 2.2 MB 12 MB
55.6% 17.8 MB Main memory
22 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation Outline
1. Introduction Motivation Past work
2. CRAMES design Design overview Pattern-based partial match compression
3. Experimental evaluation Experimental setup Commercialization and future directions
23 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation Experimental setup
Sharp Zaurus SL-5600 PDA Intel XScale PXA250 32 MB flash memory 32 MB RAM Embedix (Linux 2.4.18 kernel) Qt/Qtopia PDA edition
24 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation CRAMES for compressed filesystems
CRAMES was used to create a compressed RAM device on Zaurus SL-5600 EXT2 file system LZO compression Resource map memory allocation Benchmarks: common file system operations Increase capacity to 1.6× Execution time increases by 8.4% Energy consumption increases by 5.2%
25 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation Benchmarks
Impact of using CRAMES to reduce physical RAM Results for ADPCM: Speech compression application from MediaBench JPEG: Image encoding application from MediaBench MPEG2: Video CODEC application from MediaBench Straight-forward matrix multiplication Intentionally difficult for CRAMES Also tested on 10 GUI applications that came with Qtopia Next-generation cellphone prototype
26 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation Results
Reduced RAM from 20 MB to 8 MB Base case: 20 MB RAM, no compression
Without CRAMES All suffered significant performance penalties Matrix multiplication cannot execute
With CRAMES LZO average case 9% overhead, worst case 29% PBPM average case 2.5% overhead, worst case 9%
Also works on arbitrary in-RAM filesystems
27 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation Status
Patent pending: Northwestern University and NEC Millions of cellphones using CRAMES started shipping in June 2007 Technology won Computerwold Horizon Award in 2007
28 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation What did this require?
Bunny suit army?
29 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation What did this require?
Bunny suit army? No! One good Ph.D. student.
30 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation Another direction Memory expansion for MMU-less embedded systems
Observations and Results user application Application: Sensor networks LLVM compiler Implemented in LLVM, Instruction byte code replacement, tested on TelosB nodes MEMMU compiler compiler-time optimizations Increases usable memory by Calls to 50%, unchanged applications MEMMU modified byte code library Little overhead after MEMMU LLVM compiler runtime compiler optimizations library executable CASES’06
With Lan Bai and Lei Yang
31 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation Acknowledgments
This work was supported by NEC Labs America and NSF award CCR-0347941. We would like to acknowledge Hui Ding of Northwestern University for his suggestions on efficiently implementing PBPM and his help testing CRAMES with GUI applications. Professors Peter Dinda, Gokhan Memik, and Lawrence Henschen provided suggestions and feedback.
32 Robert Dick CRAMES Introduction Experimental setup CRAMES design Commercialization and future directions Experimental evaluation Thank you for attending!
For additional information about this work CRAMES and MEMMU at http://ziyang.eecs.northwestern.edu/∼dickrp/tools.html
For additional information about the team that created it http://ziyang.eecs.northwestern.edu/∼dickrp/people/students.html
33 Robert Dick CRAMES