Compressing Class files

Compressing Java Class files
1999 ACM SIGPLAN Conference on Programming Language Design and Implementation
William Pugh
Dept. of Computer Science, Univ. of Maryland

Java Class files
• Compiling Java source programs generates (lots of) class files
• Architecture neutral
• Standard distribution format
  – Java programs can be compiled to native executables, but I won’t talk about that

Class file contents
• class files contain lots of symbolic information
  – For javac, only 21% of uncompressed class file is bytecode
• Information for linking
• Allows code compiled against an old library to be linked with a new library
  – so long as dependent functionality still there

Java ARchive (jar) files
• A collection of class files and other resources (e.g., images)
• same format as zip archives
  – individual files can be compressed with zlib
• includes manifest
  – Information such as code signatures

Executing Java programs
• Download and install program
  – compact archive format
• Execute as downloaded
  – class files as needed, or
  – compact and progressive archive format

Download individual class files as needed
• TCP set-up costs for each class file
  – unless you use persistent http connection
• No compression
• Get just the class files you need
  – some class files are needed only for verification
  – entire class file is needed if you need only one method
• This approach isn’t used in practice
  – for non-trivial applications

William Pugh, Univ. of Maryland 1 Compressing Java Class files

A Wire format for class files?
• Bandwidth most important
• Decompression time relatively important
  – Compression time not very important
• Progressive
• Not random access
• Can translate into jar archive or class files
  – or load directly into JVM

Download jar archive
• 1 TCP connection
• zlib compression of individual class files
  – about a factor of 2 savings
• may download class files that are not used
• entire jar archive must be downloaded before any class files can be accessed

Debugging information
• Java class files often contain debugging information
  – source file
  – line number
  – local variables
• Will not include debugging information in wire format
  – could do so; would compress fairly well

Cleanup
• When comparing my format to jar files
  – Clean up first
• Remove debugging information
• Garbage collect constant pool
• Sort constant pool
  – improves compression
• Exclude non-class files from archive

Effects of Clean up
Sizes in KBytes.

                     j0r    jar   sjar
compressed?           no    yes    yes
debugging info?       no    yes     no
Cleaned up?          yes     no    yes
Hanoi                 86     57     46
icebrowserbean       226    125    116
javafig_dashO        269    136    131
javafig              357    198    170
jmark20              309    189    173
_213_javac           516    274    226
ImageEditor          454    359    257
tools              1,557    950    737
visaj              2,189  1,524  1,157
swingall           3,265  2,193  1,657
rt                 8,937  5,726  4,652

Easy wire format: Collective Zip
• In standard Jar archive,
  – files are compressed individually
• zlib finds repeating patterns
• lots of patterns repeat between but not as much within class files
• Generate jar file without individual compression
• Compress the entire resulting jar file
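The difference between individual and collective compression can be tried out with a short sketch. It assumes Python's standard `zlib` module and uses made-up byte strings as stand-ins for class files; the `SHARED` constant and the `files` list are hypothetical, and a real jar also carries per-entry headers that are ignored here.

```python
import zlib

# Hypothetical stand-ins for class files: each one repeats the same
# constant-pool style strings and differs only in a short class-specific part.
SHARED = b"java/lang/Object;java/lang/String;java/util/Vector;(Ljava/lang/String;)V"
files = [SHARED + b"Class%d" % i + SHARED for i in range(20)]

# Standard jar style: every file is compressed on its own, so patterns
# shared between files are paid for again in each file.
individual_total = sum(len(zlib.compress(f, 9)) for f in files)

# Collective style: store the files uncompressed, then compress the whole
# archive as one stream so that later files can refer back to earlier ones.
collective_total = len(zlib.compress(b"".join(files), 9))

print("individual:", individual_total, "bytes")
print("collective:", collective_total, "bytes")
```

With this toy input the collective stream comes out smaller, because the strings shared between the stand-in files are only stored once.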

Effectiveness of collective zip

Benchmark            Original Size  Collective Zip  Size of Collective Zip as % of Original
Hanoi                           46              31  67%
IBM Host on demand              98              85  87%
ICE Browser                    105              88  84%
JavaFig                        171             144  84%
tools                          737             513  70%
visaj                        1,157             703  61%
swingall                     1,657             998  60%
JDK 1.2 runtime              4,652           2,820  61%

A closer look at class file contents

                         swingall  javac
Total size                  3,265    516
excluding jar overhead      3,010    485
Field definitions              36      7
Method definitions             97     10
Code                          768    114
Other                          72     12
Constant pool               2,037    342
Utf8 entries                1,704    295
if shared                     372     56
if shared and factored        235     26
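The breakdown above can be turned into percentages with a few lines of arithmetic. This is only a sketch over the numbers in the table; the dictionary layout and variable names are illustrative.

```python
# Sizes copied from the "A closer look at class file contents" table,
# as (swingall, javac) pairs.
breakdown = {
    "Field definitions": (36, 7),
    "Method definitions": (97, 10),
    "Code": (768, 114),
    "Other": (72, 12),
    "Constant pool": (2037, 342),
}

# The five components add up to the "excluding jar overhead" row.
totals = [sum(sizes[i] for sizes in breakdown.values()) for i in range(2)]

for name, sizes in breakdown.items():
    shares = [100.0 * sizes[i] / totals[i] for i in range(2)]
    print(f"{name:20s} swingall {shares[0]:5.1f}%   javac {shares[1]:5.1f}%")
```

The constant pool accounts for roughly two thirds of both archives, with code making up most of the rest.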

What did we just learn?
• A substantial part of class files consists of constant pools
  – Bytecode is the only other substantial component
• Most of the space in constant pools is taken up by Utf8 entries
• Sharing Utf8 entries across class files is a huge win

Beating collective zip is hard
• A lot of the things you could do
  – e.g., share constant pool entries
• are already done by a collective zip
• You can work very hard
  – and find that you don’t beat collective zip by much

Compressing uniform streams
• class files are jumbles of different types
  – Utf8 encodings, bytecodes, constant pool entries
• Most compression algorithms work better if given a more uniform stream
  – separate out class files into streams for each type of information
  – compress each stream individually

Drawbacks to sharing
• Increases # of constant pool entries
  – How do we encode a reference?
• For most class files, less than 255 entries
  – can encode in a single byte

Compressing bytestreams
• Zlib (and most compression algorithms) are designed to work on bytestreams
• How do you compress a stream of shorts?
  – Standard serialization mixes types
  – Could use separate streams for high and low bytes
  – Use variable length encoding
    • Hope that most entries can be encoded in a single byte

Encoding references
• How do you encode a reference to an object (e.g., a constant pool entry) you may have seen before?
  – so that most references are encoded in 1 byte
• Overload id’s based on type
  – In almost all cases, know the type of the object being referenced
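One common way to get "most entries in a single byte" is a 7-bits-per-byte variable length encoding. The sketch below shows that general technique; the slides do not say which variable length scheme the packed format uses, so the byte layout and the names `encode_varint` and `decode_varint` are assumptions.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next position) for the integer starting at data[pos]."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


print(encode_varint(5).hex())    # 05
print(encode_varint(300).hex())  # ac02
```

Values below 128 take one byte, so a stream dominated by small numbers stays close to a plain bytestream.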

Encoding references (continued)
• Tried several schemes
• One that worked best was a move-to-front queue
  – Suggested by Ernst et al.
  – Long history in compression literature

Move to front queue
• Maintain a list of all the objects seen previously
• To encode an object seen previously
  – encode its position (1 for first entry)
  – move it to the front of the list
• To encode an object not seen previously
  – encode 0
  – put it at front of list
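The move-to-front rules above can be sketched as follows. This version uses a plain Python list, so each operation is linear in the list length. The pairing of each 0 code with the new object itself is an assumption made to keep the example self-contained, and the function names are illustrative.

```python
def mtf_encode(objects):
    """Return (index, literal) pairs; literal is None for objects seen before."""
    seen = []  # most recently used object first
    codes = []
    for obj in objects:
        if obj in seen:
            pos = seen.index(obj)          # 0-based position in the list
            codes.append((pos + 1, None))  # positions are reported 1-based
            seen.pop(pos)
        else:
            codes.append((0, obj))         # 0 marks an object not seen before
        seen.insert(0, obj)                # either way it ends up at the front
    return codes


def mtf_decode(codes):
    """Rebuild the original sequence from the output of mtf_encode."""
    seen = []
    objects = []
    for index, literal in codes:
        obj = literal if index == 0 else seen.pop(index - 1)
        seen.insert(0, obj)
        objects.append(obj)
    return objects


refs = ["String", "Object", "String", "String", "Vector", "Object"]
codes = mtf_encode(refs)
print([index for index, _ in codes])  # [0, 0, 2, 1, 0, 3]
```

Recently used objects get small indices, which is what lets most references fit in one byte.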

Implementation of Move-to-front queues
• Use a modified skip-list
  – links record distance they travel
• In decoder, a move-to-front operation on element k requires O(log k) time
  – regardless of total number of elements in list
• In encoder, requires O(log n) time

Factoring?
• The string “java.awt” occurs in the Utf8 encoding of many class names
• Method and field signatures contain separate Utf8 encoding of class names
• String f(String s) is recorded as having type (Ljava/lang/String;)Ljava/lang/String;
  – L is to differentiate between references and primitive types
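To illustrate what factoring a signature could look like, the sketch below pulls the class names out of method descriptors and splits each into a package name and a simple class name. It is an illustration of the idea only, not the packed format itself; the helper names and the sample descriptors (apart from the `String f(String s)` one) are made up.

```python
import re


def class_names(descriptor):
    """Return the class names that appear as L...; entries in a descriptor."""
    return re.findall(r"L([^;]+);", descriptor)


def split_name(class_name):
    """Split 'java/lang/String' into ('java/lang', 'String')."""
    package, _, simple = class_name.rpartition("/")
    return package, simple


descriptors = [
    "(Ljava/lang/String;)Ljava/lang/String;",  # String f(String s)
    "(Ljava/lang/Object;)Z",
    "(Ljava/awt/Color;Ljava/awt/Point;)V",
]

names = [n for d in descriptors for n in class_names(d)]
packages = sorted({split_name(n)[0] for n in names})
simple_names = sorted({split_name(n)[1] for n in names})

print(packages)      # ['java/awt', 'java/lang']
print(simple_names)  # ['Color', 'Object', 'Point', 'String']
```

Five class name occurrences reduce to two package names and four simple names, each of which needs to be stored only once.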

Reorganize class file
• Factor information to avoid as much redundancy as possible
  – packageNames
  – simpleClassNames
  – classNames
  – method type (array of classnames)
  – ...

Compressing bytecodes
• Separating out opcodes from operands helps
• Use separate streams for:
  – opcodes
  – different register types
  – branch offsets
  – integer constants
  – constant pool references (already separate)
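A minimal sketch of stream separation, assuming a simplified instruction list in which every instruction is one opcode byte followed by one operand byte. Real bytecode has variable-length instructions and several operand kinds, so the instruction list and the function names here are hypothetical.

```python
import zlib

# Hypothetical instruction list of (opcode, operand) pairs.
# 0x15 and 0x36 are the JVM opcodes for iload and istore.
instructions = [(0x15, i % 4) for i in range(50)] + [(0x36, i % 4) for i in range(50)]


def split_streams(instrs):
    """Separate the opcodes and the operands into two byte streams."""
    opcodes = bytes(op for op, _ in instrs)
    operands = bytes(arg for _, arg in instrs)
    return opcodes, operands


def merge_streams(opcodes, operands):
    """Inverse of split_streams: re-interleave the two streams."""
    return list(zip(opcodes, operands))


opcodes, operands = split_streams(instructions)

# Each stream now holds one kind of data and is compressed on its own.
packed_opcodes = zlib.compress(opcodes, 9)
packed_operands = zlib.compress(operands, 9)

restored = merge_streams(zlib.decompress(packed_opcodes), zlib.decompress(packed_operands))
print(restored == instructions)  # True
```

The decoder only needs to know the instruction format to put the streams back together, so nothing is lost by separating them.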

Compression results

Benchmark            jar size  collective zip  packed  Czip/jar  packed/jar
Hanoi                      46              31      14       67%         30%
IBM Host on demand         98              85      44       87%         45%
ICE Browser               105              88      36       84%         35%
JavaFig                   171             144      64       84%         37%
Cinderella                625               -     171         -         27%
tools                     737             513     204       70%         28%
Lotus eSuite Sheet      1,101               -     549         -         50%
visaj                   1,157             703     238       61%         21%
Lotus eSuite Chart      1,387               -     633         -         46%
swingall                1,657             998     338       60%         20%
Mockingbird             2,350               -     506         -         22%
Reservation System      3,067               -     736         -         24%
JDK 1.2 runtime         4,652           2,820   1,069       61%         23%

Compression ratios
[Chart: size as % of jar file versus size of jar file (KBytes), for j0r.gz, Jazz and Packed]

Download times
[Chart: download time (secs) versus download speed (KBytes/sec), for Jar and Pjar]

Execution Times
• For swingall.jar
  – Jar format: 1,657 KBytes
  – compressed size: 338 Kbytes
  – decompression time (Ultra 5 333Mhz)
    • to memory: 3 secs
    • to jar file: ~ 14 secs
  – time to load classes: 5.8 secs
    • time to define, resolve and verify
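The swingall numbers above allow a rough break-even estimate. The sketch assumes a simple model that is not stated in the slides: download and decompression do not overlap, the packed archive is decompressed to memory, and the jar's own decompression cost is ignored.

```python
JAR_KB = 1657.0        # swingall in jar format
PACKED_KB = 338.0      # swingall in packed format
DECOMPRESS_SECS = 3.0  # packed archive decompressed to memory


def jar_time(speed_kb_per_sec):
    """Seconds to download the jar at the given speed."""
    return JAR_KB / speed_kb_per_sec


def packed_time(speed_kb_per_sec):
    """Seconds to download the packed archive and then decompress it."""
    return PACKED_KB / speed_kb_per_sec + DECOMPRESS_SECS


# Speed at which both approaches take the same time:
# JAR_KB / s = PACKED_KB / s + DECOMPRESS_SECS
break_even = (JAR_KB - PACKED_KB) / DECOMPRESS_SECS

print(f"break-even speed: {break_even:.1f} KBytes/sec")
for speed in (10, 100, 1000):
    print(speed, round(jar_time(speed), 1), round(packed_time(speed), 1))
```

Under these assumptions the packed format wins below roughly 440 KBytes/sec, and the jar wins above that speed because the fixed 3 seconds of decompression dominates.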

Decoder size & security
• Decoder is about 35Kbytes
  – Could be downloaded
    • not useful for small archives
  – Could be installed as extension
• Decoder either needs permission to write to a temporary file or permission to create a class loader
  – Can do this under 1.2 security model

Providing Jar functionality
• Jar archives contain more than class files
  – images, text files, resources
  – manifest (signatures, …)
• Add a stream of non-class files
  – a zip archive, without individual compression but with overall compression

Complication for signatures
• Compression and decompression change a class file
  – by renumbering the constant pool
• Signatures from source class files won’t work on decoded class files
• Decompress once, sign decompressed class files, use those signatures
  – decompression is deterministic

Related work - lots!
• Used few ideas that hadn’t been considered previously
• Compression of executable code
  – Ernst, Evans, Fraser, Lucco and Proebsting, PLDI97
• Compression of Java Classfiles
  – Nigel Horspool et al.

Jax from IBM
• Java Application eXtraction
  – available from www.alphaworks.ibm.com
• Extracts just the classes and methods needed by application
• Very useful if your application uses small part of a large library
  – eliminates need to ship entire library

Combining Jax and Pack

Benchmark            jar size (Kbytes)  Jax'd  Packed  Jax'd & Packed
Hanoi                               46    46%     30%             15%
IBM Host on demand                  98    84%     45%             37%
ICE Browser                        105    87%     35%             32%
JavaFig                            171    79%     37%             31%
Cinderella                         625    65%     27%             17%
Lotus eSuite Sheet               1,101    35%     50%             11%
Lotus eSuite Chart               1,387    43%     46%             14%
Mockingbird                      2,350    13%     22%              4%
Reservation System               3,067    58%     24%             14%

Combining Jax with Packing
[Chart: size as % of Jar size versus Jar size (KBytes), for Jax'd, Packed, and Jax'd & Packed]

Software release
• Codec will be open source
  – want to avoid forking of source
• Alpha release available momentarily
  – Almost certainly has bugs
  – Current format not supported in the future
    • any small tweak changes the format

Getting it ready for the mass market
• Additional work needs to be done
  – Testing
  – User interface
  – Installation/Code signing
• I don’t have time to provide customer support
• Looking for partners

Future work
• Compact object serialization formats
• Progressive class file loading
  – Ordering class files
  – Reducing class files loaded but not used
    • some class files loaded only for verification
  – Eagerly load class files when no other work
  – Separating application into modules
    • don’t download modules unless needed

Questions?

Slides, software available from: http://www.cs.umd.edu/~pugh/java
