US007609179B2

(12) United States Patent: Diaz-Gutierrez et al.
(10) Patent No.: US 7,609,179 B2
(45) Date of Patent: Oct. 27, 2009

(54) METHOD FOR COMPRESSED DATA WITH REDUCED DICTIONARY SIZES BY CODING VALUE PREFIXES

(75) Inventors: Pablo Diaz-Gutierrez, Granada (ES); Vijayshankar Raman, Sunnyvale, CA (US); Garret Swart, Palo Alto, CA (US)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 11/970,844

(22) Filed: Jan. 8, 2008

(65) Prior Publication Data: US 2009/0174583 A1, Jul. 9, 2009

(51) Int. Cl.: H03M 7/38 (2006.01)

(52) U.S. Cl.: 341/51; 341/50

(58) Field of Classification Search: 341/50, 106, 65, 67, 78, 79; see application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,363,098 A * 11/1994 Antoshenkov ... 341/95
6,518,895 B1 2/2003 Weiss et al.
6,529,912 B2 * 3/2003 Satoh et al. ... 707/101
7,088,271 B2 * 8/2006 Marpe et al. ... 341/107

* cited by examiner

Primary Examiner: Peguy Jean Pierre
(74) Attorney, Agent, or Firm: IP Authority, LLC; Ramraj Soundararajan

(57) ABSTRACT

The speed of dictionary-based decompression is limited by the cost of accessing random values in the dictionary. If the size of the dictionary can be limited so it fits into cache, decompression is made to be CPU bound rather than memory bound. To achieve this, a value prefix coding scheme is presented, wherein value prefixes are stored in the dictionary to get good compression from small dictionaries. Also presented is an algorithm that determines the optimal entries for a value prefix dictionary. Once the dictionary fits in cache, decompression speed is often limited by the cost of mispredicted branches during Huffman code processing. A novel way is presented to quantize Huffman code lengths to allow code processing to be performed with few instructions, no branches, and very little extra memory. Also presented is an algorithm for code length quantization that produces the optimal assignment of Huffman codes, and it is shown that the adverse effect of quantization on the compression ratio is quite small.

15 Claims, 8 Drawing Sheets

[Drawing sheets 1 through 8 are not reproduced in this extraction. The recoverable figure residue indicates a front-page drawing of a value prefix dictionary built over Columns 1 through 4 under a dictionary budget of 11, a frequency plot on Sheet 1, and a code length/code pattern/value/count table on Sheet 8.]

METHOD FOR COMPRESSED DATA WITH REDUCED DICTIONARY SIZES BY CODING VALUE PREFIXES

BACKGROUND OF THE INVENTION

1. Field of Invention

The present invention relates generally to the field of encoding. More specifically, the present invention is related to a method for compressing data with reduced dictionary sizes by coding value prefixes.

2. Discussion of Related Art

With the success of multi-core CPUs and data parallelism in GPUs, modern processors are once again gaining processing capability at the same rate that they are gaining transistors. However, access to memory has not been improving as quickly. This has led to a resurgence of interest in compression, not only because it saves disk space and network bandwidth, but because it makes good use of limited-size memories and allows data to be transferred more efficiently through the data pathways of the processor and I/O system. The focus on using compression for increasing speed rather than capacity means that there is a new focus on the speed at which data can be decompressed. The speed of decompression is generally of more interest than the speed of compression because many data items are read mostly; that is, they are compressed and written once, but read and decompressed many times.

A standard compression technique is dictionary coding. The idea is to place some frequent values in a dictionary and assign them codewords, which are shorter (i.e., take fewer bits to represent) than the corresponding values. Then, occurrences of these values in a data set can be replaced by the codewords. This technique has been well studied and is widely used in different forms; a common example is Huffman coding.

The drawback with this standard dictionary coding is that dictionaries can get very large when the domain has many distinct values. For example, if a column in a database of retail transactions contains dates, then some dates might be much more common than others (say, weekends, or days before major holidays), so dictionary coding might be considered. But the number of distinct dates can run into tens of thousands.

However, dictionary-based compression, the most common and effective form of compression, has a problem when running on modern processors: compression replaces values on the sequentially accessed input stream with shorter codes that have to be looked up in a dictionary. This converts cheap sequential accesses from the input stream into what is arguably the most expensive instruction on a modern processor: a random read from main memory to access the value in the dictionary.

Random reads are expensive because they generally miss in the cache, access only a small portion of the cache line, and are not amenable to pre-fetching (see the paper to Baer et al. entitled, "An effective on-chip preloading scheme to reduce data access penalty"). One load instruction can cause the processor to stall for hundreds of cycles waiting for the result to arrive from memory, which has been getting slower as CPUs have gotten faster. With simultaneous multi-threading (SMT), the processor can overlap its resources by running a second hardware thread while the first thread waits for the load to complete. These costs can be avoided entirely if the dictionary is small enough to fit in the cache or on-chip memory of the processor.

The straightforward approach to limiting the size of the dictionary is to place only a subset of the values in the dictionary and to place any values missing from the dictionary directly in the input, inlined. While this simple approach improves the memory behavior, it decreases the compression ratio, making the compression far less effective and decreasing the data transfer speed. This is especially true when the on-chip memory is small.
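This subset-with-inlining approach can be made concrete with a short sketch. The following Python fragment is illustrative only and is not the value prefix method of the present invention; the top-k selection, the escape-code convention, and the function names are assumptions made for exposition.

from collections import Counter

# Illustrative sketch: a fixed-size dictionary holding the k most frequent
# values, with all other values escaped and carried inline as literals.
def compress(values, k):
    dictionary = [v for v, _ in Counter(values).most_common(k)]
    code_of = {v: i for i, v in enumerate(dictionary)}
    escape = len(dictionary)  # one code past the dictionary marks a literal
    out = []
    for v in values:
        if v in code_of:
            out.append(code_of[v])   # dictionary hit: emit the short code
        else:
            out.append(escape)       # dictionary miss: emit the escape...
            out.append(v)            # ...followed by the inlined literal
    return dictionary, out

def decompress(dictionary, stream):
    escape, result, i = len(dictionary), [], 0
    while i < len(stream):
        if stream[i] == escape:      # a literal follows the escape code
            result.append(stream[i + 1])
            i += 2
        else:                        # cheap lookup in a small dictionary
            result.append(dictionary[stream[i]])
            i += 1
    return result

data = ["2008-01-05", "2008-01-05", "2008-01-06", "2007-12-24"]
d, coded = compress(data, k=2)
assert decompress(d, coded) == data

Keeping k small keeps the dictionary cache-resident, but every dictionary miss inflates the output with an inlined literal, which is exactly the loss of compression described above.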
Once the dictionary fits in the on-chip memory of the processor, the next bottleneck in processing compressed data is caused by mispredicted branches during decompression. This occurs because traditional Huffman code processing executes one or more unpredictable branches for each code it processes. A branch is unpredictable if it is taken or not taken with similar probability. Each mispredicted branch can take the processor dozens of cycles to recover (see, for example, the paper to Sprangle et al. entitled, "Increasing processor performance by implementing deeper pipelines").

The entropy coding of data was one of the founding topics in information theory, and Huffman's seminal work (in the paper entitled, "A method for the construction of minimum redundancy codes") on entropy coding broke the ground for all that followed. Huffman codes separately encode each value as a bit string that can be compared and manipulated independently from the rest of the data.
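Huffman's construction itself is brief. The Python sketch below shows the classic algorithm, not code from the present invention; the tie-breaking counter and the function name are assumptions for exposition. It repeatedly merges the two least frequent subtrees and reads each value's bit string off the path from the root.

import heapq
from itertools import count

# Illustrative sketch of classic Huffman code construction. Values are
# assumed not to be tuples, since tuples represent internal tree nodes.
def huffman_codes(freq):
    tiebreak = count()               # keeps heap entries comparable
    heap = [(f, next(tiebreak), v) for v, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: record the bit string
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))

Decoding such variable-length codes is where the unpredictable branches discussed above arise: a naive decoder must test, for every code it consumes, which of several possible code lengths it is looking at.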
Most work on fixed-size dictionaries has been done in the context of online adaptive dictionary maintenance as embodied in LZ compression (see, for example, the paper to Lempel et al. entitled, "On the complexity of finite sequences"), where the dictionary is constantly changing in response to the data that has been compressed so far. In LZ compression and most of its variants, the dictionary is built over a window, a fixed-size sequence of previously scanned characters that are searched during compression for repeating strings and retained during decompression to be used by the dictionary. A primary topic of interest in this field is online item replacement algorithms, which determine when to replace an existing dictionary entry with a new entry or when to change the Huffman code associated with an entry. This problem has been shown to be difficult (for example, see the paper to Agostino et al. entitled, "Bounded size dictionary compression: SCk-completeness and NC algorithms").

Length-limited coding is a generalization of fixed-size dictionaries in which codes may have lengths between 1 and some integer L. One of the longest-lasting contributions in the area is the paper to Larmore et al. entitled, "A fast algorithm for optimal length-limited Huffman Codes," which introduces an O(nL) algorithm for computing optimal compression codes of bounded length L. A more system-specific approach, described in the paper to Liddell et al. entitled, "Length-restricted coding using modified probability distributions," optimally restricts the length of compression codes to a size that fits in a word of the underlying architecture. Like the papers to Milidiu et al. (entitled, "Efficient implementation of the WARM-UP algorithm for the construction of length-restricted prefix codes" and "In-place length-restricted prefix coding"), they also provide heuristics that run significantly faster than the algorithm of Larmore et al. while suffering little loss in compression performance.
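To make length-limited coding concrete, the sketch below shows a simple Python heuristic. It is not the O(nL) package-merge algorithm of Larmore et al., the Liddell et al. method, or the WARM-UP algorithm, and it is not guaranteed optimal; the repair strategy and all names are assumptions for exposition. It clamps unrestricted Huffman code lengths at L, then lengthens the shortest still-extendable codes until the Kraft inequality (the sum of 2^-length is at most 1, the condition for a prefix code to exist) holds again, and finally assigns canonical codes.

from fractions import Fraction

# Illustrative heuristic only; assumes len(lengths) <= 2**L so that some
# valid assignment of codes of length at most L exists.
def limit_lengths(lengths, L):
    capped = {v: min(l, L) for v, l in lengths.items()}
    kraft = sum(Fraction(1, 2 ** l) for l in capped.values())
    while kraft > 1:                 # prefix code exists iff kraft <= 1
        # Lengthen the shortest still-extendable code by one bit, which
        # halves that code's contribution to the Kraft sum.
        v = min((u for u, l in capped.items() if l < L),
                key=lambda u: capped[u])
        kraft -= Fraction(1, 2 ** (capped[v] + 1))
        capped[v] += 1
    return capped

def canonical_codes(capped):
    # Assign codes in (length, value) order: the standard canonical form.
    code, prev_len, out = 0, 0, {}
    for v, l in sorted(capped.items(), key=lambda kv: (kv[1], str(kv[0]))):
        code <<= l - prev_len
        out[v] = format(code, "0{}b".format(l))
        code += 1
        prev_len = l
    return out

lengths = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 5}
print(canonical_codes(limit_lengths(lengths, L=4)))

In this example the 1-bit code is demoted to 2 bits to pay for the two codes that were cut back from 5 to 4 bits; the resulting small increase in average code length is the same kind of loss that the optimal length-limited algorithms cited above minimize exactly.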