The Design, Implementation, and Deployment of a System to Transparently Compress Hundreds of Petabytes of Image Files for a File-Storage Service

Daniel Reiter Horn (Dropbox), Ken Elkabany (Dropbox), Chris Lesniewski-Laas (Dropbox), and Keith Winstein (Stanford University)

https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/horn

This paper is included in the Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17), March 27–29, 2017, Boston, MA, USA. ISBN 978-1-931971-37-9. Open access to the Proceedings is sponsored by USENIX.

Abstract

We report the design, implementation, and deployment of Lepton, a fault-tolerant system that losslessly compresses JPEG images to 77% of their original size on average. Lepton replaces the lowest layer of baseline JPEG compression—a Huffman code—with a parallelized arithmetic code, so that the exact bytes of the original JPEG file can be recovered quickly. Lepton matches the compression efficiency of the best prior work, while decoding more than nine times faster and in a streaming manner. Lepton has been released as open-source software and has been deployed for a year on the Dropbox file-storage backend. As of February 2017, it had compressed more than 203 PiB of user JPEG files, saving more than 46 PiB.

[Figure 1: scatter plot of decompression speed (Mbits/s, log scale) versus compression savings (percent) for JPEGrescan (progressive), MozJPEG (arithmetic), packjpg, and Lepton (this work); up and to the right is better.]

Figure 1: Compression savings and decompression speed (time-to-last-byte) of four lossless JPEG compression tools. Diamonds show 25th, 50th, and 75th percentiles across 200,000 JPEGs between 100 KiB and 4 MiB (§4).

1 Introduction

In the last decade, centrally hosted network filesystems have grown to serve hundreds of millions of users. These services include Amazon Cloud Drive, Box, Dropbox, Google Drive, Microsoft OneDrive, and SugarSync. Commercially, these systems typically offer users a storage quota in exchange for a flat monthly fee, or no fee at all. Meanwhile, the cost to operate such a system increases with the amount of data actually stored. Therefore, operators benefit from anything that reduces the net amount of data they store.

These filesystems have become gargantuan. After less than ten years in operation, the Dropbox system contains roughly one exabyte (±50%) of data, even after applying techniques such as deduplication and zlib compression. We report on our experience with a different technique: format-specific transparent file compression, based on a statistical model tuned to perform well on a large corpus.

In operating Dropbox, we observed that JPEG images [10] make up roughly 35% of the bytes stored. We suspected that most of these images were compressed inefficiently, either because they were limited to "baseline" methods that were royalty-free in the 1990s, or because they were encoded by fixed-function compression chips. In response, we built a compression tool, called Lepton, that replaces the lowest layer of baseline JPEG images—the lossless Huffman coding—with a custom statistical model that we tuned to perform well across a broad corpus of JPEG images stored in Dropbox. Lepton losslessly compresses an average JPEG image by about 23%. This is expected to save Dropbox more than 8% of its overall backend file storage. Lepton introduces new techniques that allow it to match the compression savings of prior work (PackJPG) while decoding more than nine times faster and in a streaming manner (Figure 1).
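To make the Huffman-versus-arithmetic tradeoff concrete, the sketch below shows the kind of adaptive binary probability model that drives an arithmetic coder. It is our illustration, not Lepton's actual model; the struct and the bit stream are invented for the example. A Huffman code must spend an integer number of bits (at least one) per coded symbol, while an arithmetic coder can spend about -log2(p) bits per symbol, so a good statistical model translates directly into savings:

    // Minimal sketch (not Lepton's model) of an adaptive binary
    // probability estimator of the kind an arithmetic coder is driven by.
    #include <cmath>
    #include <cstdio>

    struct AdaptiveBit {
        unsigned zeros = 1, ones = 1;              // Laplace-smoothed counts
        double probOne() const { return double(ones) / (zeros + ones); }
        void record(bool bit) { (bit ? ones : zeros) += 1; }
    };

    int main() {
        AdaptiveBit model;
        double arithmeticBits = 0;
        const bool stream[] = {0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0};  // mostly zeros
        for (bool b : stream) {
            double p = b ? model.probOne() : 1 - model.probOne();
            arithmeticBits += -std::log2(p);       // ideal arithmetic-coded cost
            model.record(b);                       // adapt exactly as the decoder will
        }
        std::printf("arithmetic: %.2f bits vs. Huffman: %zu bits\n",
                    arithmeticBits, sizeof(stream) / sizeof(stream[0]));
    }

On this toy stream the adaptive estimate costs roughly 8 bits in total, versus 16 bits for any per-symbol Huffman code; Lepton's far richer, context-dependent model plays the same role for JPEG coefficients.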
Some of the challenges in building Lepton included:

• Round-trip transparency. Lepton needs to deterministically recover the exact bytes of the original file, even for intentionally malformed files, and even after updates to the Lepton software since the file was originally compressed.

• Distribution across independent chunks. The Dropbox back-end stores files in independent 4-MiB chunks across many servers. Lepton must be able to decompress any substring of a JPEG file, without access to other substrings.

• Low latency and streaming. Because users are sensitive to file download latency, we must optimize decompression—from Lepton's format back into Huffman-coded JPEG output—for both time-to-first-byte and time-to-last-byte. To achieve this, the Lepton format includes "Huffman handover words" that enable the decoder to be multithreaded and to start transmitting bytes soon after a request.

• Security. Before reading input data, Lepton enters a restricted environment where the only allowed system calls are read, write, exit, and sigreturn. Lepton must pre-spawn threads and allocate all memory before it sees any input (see the sketch after this list).

• Memory. To preserve server resources, Lepton must work row-by-row on a JPEG file, instead of decoding the entire file into RAM.
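The restricted environment described in the Security bullet matches what Linux's strict seccomp mode provides: after the prctl call below, the kernel permits exactly those four system calls for the calling thread. The following is a minimal sketch of that lockdown sequence; it is our illustration, not Lepton's actual code, and the arena size and transcoding steps are placeholders:

    // Lock the process down before it touches untrusted input.
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <linux/seccomp.h>
    #include <unistd.h>
    #include <vector>

    int main() {
        // All memory must be allocated and all worker threads spawned
        // *before* entering the sandbox: brk/mmap/clone are banned after.
        std::vector<unsigned char> arena(64 << 20);  // hypothetical arena size
        (void)arena;                                 // used by the elided transcoder

        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            return 1;                                // could not sandbox; bail out

        // From here on, only read, write, exit, and sigreturn are allowed;
        // any other system call kills the thread with SIGKILL.
        // ... read() the input, transcode it inside `arena`, write() output ...

        // Even exiting needs care: returning from main would invoke
        // exit_group(), which strict mode forbids, so leave via the raw
        // exit system call instead.
        syscall(SYS_exit, 0);
    }

Strict mode predates the programmable seccomp-BPF filters, and its fixed whitelist happens to be exactly the four calls listed above, which makes it a natural fit for a process whose only job is to transform its input stream into its output stream.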
These tools ber of case studies and anomalies (x 6) encountered in use different techniques, but in our evaluations, achieved operating the system at a large scale. roughly the same compression savings (23%) on average. 2 Related Work These tools use techniques unavailable in a real-time setting. For example, one of PackJPG’s compression tech- In network filesystems and storage services, four general niques requires re-arranging all of the compressed pixel categories of approaches are used to compress user files. values in the file in a globally sorted order. This means Generic entropy compression. Many filesystems com- that decompression is single-threaded, requires access to press files using generic techniques, such as the Deflate the entire file, and requires decoding the image into RAM algorithm [6], which combines LZ77 compression [23] before any byte can be output. The time-to-first-byte and and Huffman coding [8] and is used by software such as time-to-last-byte are too high to satisfy the service goals zlib, pkzip, and gzip. More recent algorithms such as for the Dropbox service. PAQ8PX is considerably slower. LZMA [15], Brotli [3], and Zstandard [5] achieve a range Lepton was inspired by PackJPG and the algorithms of tradeoffs of compression savings and speed. developed by Lakhani [11], and Lepton uses the same In practice, none of these algorithms achieve much JPEG-parsing routines that PackJPG uses (uncmpjpg). compression on “already compressed” files, including However, unlike PackJPG, Lepton only utilizes compres- JPEGs. In our evaluation on 200,000 randomly selected sion techniques that can be implemented without “global” user JPEG images, each of these algorithms achieved operations, so that decoding can be distributed across in- savings of 1% or less (x 4). dependent chunks and multithreaded within each chunk. Lossy image compression. A second category of ap- 3 Lepton: Design and Implementation proach involves decoding user images and re-compressing At its core, Lepton is a stand-alone tool that performs them into a more-efficient format, at the cost of some vi- round-trip compression and decompression of baseline sual fidelity. The “more-efficient format” may simply JPEG files. We have released Lepton as open-source soft- be a lower-resolution or lower-quality JPEG. The JPEG ware [1]; it builds and runs on Linux, MacOS, Windows, files produced by fixed-function hardware encoders in iOS, Android, and Emscripten (JavaScript). cellular-phone cameras can typically be reduced in size In its current deployment at Dropbox, Lepton is exe- by 70% before users see a perceptual difference [19]. The cuted directly by a back-end file server or, when a file re-compression may also use more sophisticated tech- server is under high load, execution is “outsourced” from niques such as WebP or single-frame H.265. With this the file server to a cluster of machines dedicated to Lep- approach, storage savings for the provider can be as high ton only (x 5.5).