Lossless Data Compression, Christian Steinruecken, King's College

Dissertation submitted in candidature for the degree of Doctor of Philosophy, University of Cambridge, August 2014.

© 2014 Christian Steinruecken, University of Cambridge. Re-typeset on March 2, 2018, using LaTeX, gnuplot, and custom-written software.

Abstract

This thesis makes several contributions to the field of data compression. Lossless data compression algorithms shorten the description of input objects, such as sequences of text, in a way that allows perfect recovery of the original object. Such algorithms exploit the fact that input objects are not uniformly distributed: by allocating shorter descriptions to more probable objects and longer descriptions to less probable objects, the expected length of the compressed output can be made shorter than the object's original description. Compression algorithms can be designed to match almost any given probability distribution over input objects.

This thesis employs probabilistic modelling, Bayesian inference, and arithmetic coding to derive compression algorithms for a variety of applications, making the underlying probability distributions explicit throughout. A general compression toolbox is described, consisting of practical algorithms for compressing data distributed by various fundamental probability distributions, and mechanisms for combining these algorithms in a principled way.

Building on the compression toolbox, new mathematical theory is introduced for compressing objects with an underlying combinatorial structure, such as permutations, combinations, and multisets. An example application is given that compresses unordered collections of strings, even if the strings in the collection are individually incompressible.

For text compression, a novel unifying construction is developed for a family of context-sensitive compression algorithms. Special cases of this family include the PPM algorithm and the Sequence Memoizer, an unbounded depth hierarchical Pitman–Yor process model. It is shown how these algorithms are related, what their probabilistic models are, and how they produce fundamentally similar results. The work concludes with experimental results, example applications, and a brief discussion on cost-sensitive compression and adversarial sequences.
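As a small illustration of the principle described in the abstract (this sketch is not part of the thesis), the following Python fragment computes the ideal code length of about -log2(p) bits for each symbol of a toy source distribution; the symbols and probabilities are invented for the example.

```python
import math

# A toy, non-uniform source distribution over four symbols
# (illustrative values only, not taken from the thesis).
probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# An ideal lossless code assigns each symbol roughly -log2(p) bits,
# so more probable symbols receive shorter descriptions.
code_lengths = {s: -math.log2(p) for s, p in probabilities.items()}

# Expected code length under the source distribution (the entropy,
# 1.75 bits/symbol here), versus the 2 bits/symbol required by a
# fixed-length code over a four-symbol alphabet.
expected_length = sum(p * code_lengths[s] for s, p in probabilities.items())

print(code_lengths)     # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
print(expected_length)  # 1.75
```

For this source the expected code length is 1.75 bits per symbol rather than the 2 bits per symbol of a fixed-length code; arithmetic coding, developed in Chapter 3, can approach such ideal lengths for essentially any given distribution.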
Declaration

I hereby declare that my dissertation entitled "Lossless Data Compression" is not substantially the same as any that I have submitted for a degree or diploma or other qualification at any other University. I further state that no part of my dissertation has already been or is being concurrently submitted for any such degree or diploma or other qualification. Except where explicit reference is made to the work of others, this dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. This dissertation does not exceed sixty thousand words in length.

Date: ...........................  Signed: .........................................

Christian Steinruecken
King's College

Contents

Notation

1 Introduction
1.1 Codes and communication
1.2 Data compression
1.3 Outline of this thesis

2 Classical compression algorithms
2.1 Symbol codes
2.1.1 Huffman coding
2.2 Integer codes
2.2.1 Unary code
2.2.2 Elias gamma code
2.2.3 Exponential Golomb codes
2.2.4 Other integer codes
2.3 Dictionary coding
2.3.1 LZW
2.3.2 History and significance
2.3.3 Properties
2.4 Transformations
2.4.1 Move-to-front encoding
2.4.2 Block compression
2.5 Summary

3 Arithmetic coding
3.1 Introduction
3.1.1 Information content and information entropy
3.1.2 The relationship of codes and distributions
3.1.3 Algorithms for building optimal compression codes
3.2 How an arithmetic coder operates
3.2.1 Mapping a sequence of symbols to nested regions
3.2.2 Mapping a region to a sequence of symbols
3.2.3 Encoding a region as a sequence of binary digits
3.3 Concrete implementation of arithmetic coding
3.3.1 Historical notes
3.3.2 Other generic compression algorithms
3.4 Arithmetic coding for various distributions
3.4.1 Bernoulli code
3.4.2 Discrete uniform code
3.4.3 Finite discrete distributions
3.4.4 Binomial code
3.4.5 Beta-binomial code
3.4.6 Multinomial code
3.4.7 Dirichlet-multinomial code
3.4.8 Codes for infinite discrete distributions
3.5 Combining distributions
3.5.1 Sequential coding
3.5.2 Exclusion coding
3.5.3 Mixture models

4 Adaptive compression
4.1 Learning a distribution
4.1.1 Introduction
4.1.2 Motivation
4.2 Histogram methods
4.2.1 A simple exchangeable method for adaptive compression
4.2.2 Dirichlet processes
4.2.3 Power-law variants
4.2.4 Non-exchangeable histogram learners
4.3 Other adaptive models
4.3.1 A Pólya tree symbol compressor
4.4 Online versus header-payload compression
4.4.1 Online compression
4.4.2 Header-payload compression
4.4.3 Which method is better?
4.5 Summary

5 Compressing structured objects
5.1 Introduction
5.2 Permutations
5.2.1 Complete permutations
5.2.2 Truncated permutations
5.2.3 Set permutations via elimination sampling
5.3 Combinations
5.4 Compositions
5.4.1 Strictly positive compositions with unbounded components
5.4.2 Compositions with fixed number of components
5.4.3 Uniform K-compositions
5.4.4 Multinomial K-compositions
5.5 Multisets
5.5.1 Multisets via repeated draws from a known distribution
5.5.2 Multisets drawn from a Poisson process
5.5.3 Multisets from unknown discrete distributions
5.5.4 Submultisets
5.5.5 Multisets from a Blackwell–MacQueen urn scheme
5.6 Ordered partitions
5.7 Sequences
5.7.1 Sequences of known distribution, known length
5.7.2 Sequences of known distribution, uncertain length
5.7.3 Sequences of uncertain distribution
5.7.4 Sequences with known ordering
5.8 Sets
5.8.1 Uniform sets
5.8.2 Bernoulli sets
5.9 Summary

6 Context-sensitive sequence compression
6.1 Introduction to context models
6.1.1 Pure bigram and trigram models
6.1.2 Hierarchical context models
6.2 General construction
6.2.1 Engines
6.2.2 Probabilistic models
6.2.3 Learning
6.3 The PPM algorithm
6.3.1 Basic operation
6.3.2 Probability estimation
6.3.3 Learning mechanism
6.3.4 PPM escape mechanisms
6.3.5 Other variants of PPM
6.4 Optimising PPM
6.4.1 The effects of context depth
6.4.2 Generalisation of the PPM escape mechanism
6.4.3 The probabilistic model of PPM, stated concisely
6.4.4 Why the escape mechanism is convenient
6.5 Blending
6.5.1 Optimising the parameters of a blending PPM
6.5.2 Backing-off versus blending
6.6 Unbounded depth
6.6.1 History
6.6.2 The Sequence Memoizer
6.6.3 Hierarchical Pitman–Yor processes
6.6.4 Deplump
6.6.5 What makes Deplump compress better than PPM?
6.6.6 Optimising depth-dependent parameters
6.6.7 Applications of PPM-like algorithms
6.7 Conclusions

7 Multisets of sequences
7.1 Introduction
7.2 Collections of fixed-length binary sequences
7.2.1 Tree representation for multisets of fixed-length strings
7.2.2 Fixed-depth multiset tree compression