A Virtual Architecture for Durable Compressed Archives
Total Page:16
File Type:pdf, Size:1020Kb
VXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology http://pdos.csail.mit.edu/~baford/vxa/ The Ubiquity of Data Compression Everything is compressed these days – Archive/Backup/Distribution: ZIP, tar.gz, ... – Multimedia streams: mp3, ogg, wmv, ... – Office documents: XML-in-ZIP – Digital cameras: JPEG, proprietary RAW, ... – Video camcorders: DV, MPEG-2, ... Compressed Data Formats Observation #1: Data compression formats evolve rapidly s es pr rc 2 C O a p R p Q U om R O H IP zi A zi z S L c A Z L Z g R b 7 Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Ð Lossless Compression 1980 1985 1990 1995 2000 2005 O b C s D e o rv m a a ta t p c io r o n e L m o s ss # le p 1 s ss r e C e : o Ð SQ d Im m s p Ð LU s ag r e e io D E ss n io n c n Ð compress a V odi f id n Ð ARC o t eo g r a E Ð ZOO m nc o F A d Ð BMP a ud in Ð LHarc t io g Ð ILBM s o E Ð TIFF Ð ZIP n e r c v 1 o Ð GIF m 9 di o 8 0 ng Ð PCX lv Ð TGA Ð gzip e a r ts Ð 8SVX Ð ANIM Ð RAR ap 1 Ð FLIC i 9 d 85 Ð bzip2 l Ð AIFF Ð QuickTime Ð JPEG y Ð PNG Ð AIFF-C Ð MPEG-1 Ð 7z Ð WAV 1 Ð MPEG-2 9 9 0 Ð DV Ð MP3 Ð JPEG 2000 Ð RealAudio Ð MPEG-4 19 Ð Sorenson 9 5 Ð AAC Ð WMV7 Ð WMV8 Ð WMA7 Ð WMV9 20 Ð FLAC 0 0 Ð Vorbis Ð WMA9 2 0 0 5 Compressed Data Formats Observation #1: Data compression formats evolve rapidly Problems: – Inconvenient: each new algorithm requires decoder install/upgrade – Impedes data portability: data unusable on systems without supported decoder – Threatens long-term data usability: old decoders may not run on new operating systems Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively r) r) o to ) ct c 4 ) 6 it X e e -6 it 6 8 b v E 6 b 3 - M t S v 8 - 08 0 2 n S P x 4 8 8 (3 M (i (F (6 Ð Ð Ð Ð Ð x86 Architecture 1980 1985 1990 1995 2000 2005 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively Fully Backward Compatible Extensions r) r) o to ) ct c 4 ) 6 it X e e -6 it 6 8 b v E 6 b 3 - M t S v 8 - 08 0 2 n S P x 4 8 8 (3 M (i (F (6 Ð Ð Ð Ð Ð x86 Architecture 1980 1985 1990 1995 2000 2005 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively r) r) o to ) ct c 4 ) 6 it X e e -6 it 6 8 b v E 6 b 3 - M t S v 8 - 08 0 2 n S P x 4 8 8 (3 M (i (F (6 Ð Ð Ð Ð Ð a x86 Architecture h C C lp S P 0 C I r A m 0 S R M R e iu 0 P A - w C 8 I P R A o E tan 6 M S A P P D I Ð Ð Ð Ð Ð Ð Ð Ð Other Architectures 1980 1985 1990 1995 2000 2005 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively r) r) o to ) ct c 4 ) 6 it X e e -6 it 6 8 b v E 6 b 3 - M t S v 8 - 08 0 2 n S P x 4 8 8 (3 M (i (F (6 Ð Ð Ð Ð Ð a x86 Architecture h C C lp S P 0 C I r A m 0 S R M R e iu 0 P A - w C 8 I P R A o E tan 6 M S A P P D I Ð Ð Ð Ð Ð Ð Ð Ð Other Architectures 1980 1985 1990 1995 2000 2005 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively r) r) o to ) ct c 4 ) 6 it X e e -6 it 6 8 b v E 6 b 3 - M t S v 8 - 08 0 2 n S P x 4 8 8 (3 M (i (F (6 Ð Ð Ð Ð Ð a x86 Architecture h C C lp C S P 0 I r A m 0 S R M -R e C iu 0 IP A w 8 P R A o E tan 6 M S A P P D I Ð Ð Ð Ð Ð Ð Ð Ð Other Architectures 1980 1985 1990 1995 2000 2005 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively r) r) o to ) ct c 4 ) 6 it X e e -6 it 6 8 b v E 6 b 3 - M t S v 8 - 08 0 2 n S P x 4 8 8 (3 M (i (F (6 Ð Ð Ð Ð Ð a x86 Architecture h C C lp C S P 0 I r A m 0 S R M R e iu 0 P A - w C 8 I P R A o E tan 6 M S A P P D I Ð Ð Ð Ð Ð Ð Ð Ð Other Architectures 1980 1985 1990 1995 2000 2005 Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively r) r) o to ) ct c 4 ) 6 it X e e -6 it 6 8 b v E 6 b 3 - M t S v 8 - 08 0 2 n S P x 4 8 8 (3 M (i (F (6 Ð Ð Ð Ð Ð a x86 Architecture h C C lp C S P 0 I r A m 0 S R M R e iu 0 P A - w C 8 I P R A o E tan 6 M S A P P D I Ð Ð Ð Ð Ð Ð Ð Ð Other Architectures 1980 1985 1990 1995 2000 2005 ic n a t I VXA: Virtual Executable Archives Observation 1+2: Instruction formats are historically more durable than compressed data formats Make archive self-extracting (data + executable decoder) To extract data, archive reader runs embedded decoder Archive Archive Archive Writer Reader Encoder Decoder D D Goals of VXA Make self-extracting archives... Archive Archive Archive Writer Reader Encoder Decoder D D Goals of VXA Make self-extracting archives... 1. Safe: malicious decoders can©t compromise host 2. Future-proof: simple, well-defined architecture [Lorie] Archive Archive Archive Writer Reader Encoder Emulator Decoder D D Goals of VXA Make self-extracting archives... 1. Safe: malicious decoders can©t compromise host 2. Future-proof: simple, well-defined architecture [Lorie] 3. Easy: allow reuse of existing code, languages, tools Archive Archive Archive Writer Reader Encoder x86 Emulator Decoder D D Goals of VXA Make self-extracting archives... 1. Safe: malicious decoders can©t compromise host 2. Future-proof: simple, well-defined architecture [Lorie] 3. Easy: allow reuse of existing code, languages, tools 4. Efficient: practical for short term data packaging too Archive Archive Archive Writer Reader Encoder Fast x86 Emulator Decoder D D Outline ● Archiver Operation ● vxZIP Archive Format ● Decoder Architecture ● Emulator Design & Implementation ● Evaluation (performance, storage overhead) ● Conclusion Archive Writer Operation VXA Archiver Archive Archive Writer Operation Uncompressed Input Files VXA Archiver General Compressor Decoder1 D1 Archive Archive Writer Operation Uncompressed Input Files VXA Archiver General Compressor Decoder1 D1 Archive Archive Writer Operation Uncompressed Input Files General Image Audio Compressor Compressor Compressor Decoder1 Decoder2 Decoder3 D1 D2 D3 Archive Archive Writer Operation Uncompressed Input Files Pre-Compressed Input Files General Image Audio Compressor Compressor Compressor Decoder1 Decoder2 Decoder3 D1 D2 D3 Archive Archive Writer Operation Uncompressed Input Files Pre-Compressed Input Files General Image Audio Image Format Audio Format Compressor Compressor Compressor Recognizer Recognizer Decoder1 Decoder2 Decoder3 Decoder4 Decoder5 D1 D2 D3 D4 D5 Archive Archive Reader Operation VXA Archive Reader x86 Emulator D1 D2 D3 D4 D5 Archive Archive Reader Operation Original Uncompressed Files VXA Archive Reader x86 Emulator Decoder1 D1 D2 D3 D4 D5 Archive Archive Reader Operation Original Uncompressed Files VXA Archive Reader x86 Emulator Decoder1 Decoder2 Decoder3 D1 D2 D3 D4 D5 Archive Archive Reader Operation Original Uncompressed Files Original Pre-Compressed Files VXA Archive Reader x86 Emulator Decoder1 Decoder2 Decoder3 D1 D2 D3 D4 D5 Archive Archive Reader Operation Original Uncompressed Files De-compressed Files VXA Archive Reader x86 Emulator Decoder1 Decoder2 Decoder3 Decoder4 Decoder5 D1 D2 D3 D4 D5 Archive vxZIP Archive Format ● Backward compatible with legacy ZIP format Image file Audio file Audio file Central Directory vxZIP Archive vxZIP Archive Format ● Backward compatible JP2 Decoder with legacy ZIP format Image file ● Decoders intermixed FLAC Decoder with archived files Audio file Audio file Central Directory vxZIP Archive vxZIP Archive Format ● Backward compatible JP2 Decoder with legacy ZIP format Image file (JP2-encoded) ● Decoders intermixed FLAC Decoder with archived files Audio file ● Archived files have (FLAC-encoded) Audio file new extension header (FLAC-encoded) pointing to decoder Central Directory vxZIP Archive vxZIP Archive Format ● Backward compatible JP2 Decoder (deflated) with legacy ZIP format Image file (JP2-encoded) ● Decoders intermixed FLAC Decoder with archived files (deflated) Audio file ● Archived files have (FLAC-encoded) Audio file new extension header (FLAC-encoded) pointing to decoder Central Directory ● Decoders are hidden, vxZIP Archive ªdeflatedº (gzip) vxZIP Decoder Architecture ● Decoders are ELF executables for x86-32 – Can be written in any language, safe or unsafe – Compiled using ordinary tools (GCC) ● Decoders have access to five ªsystem callsº: – read stdin, write stdout, malloc, next file, exit ● Decoders cannot: – open files, windows, devices, network connections, ..