CALIFORNIA STATE UNIVERSITY, NORTHRIDGE
LOSSLESS COMPRESSION OF SATELLITE TELEMETRY DATA FOR A
NARROW-BAND DOWNLINK
A graduate project submitted in partial fulfillment of the requirements
For the degree of Master of Science in
Electrical Engineering
By
Gor Beglaryan
May 2014
Copyright
Copyright (c) 2014, Gor Beglaryan
Permission to use, copy, modify, and/or distribute the software developed for this project for any purpose with or without fee is hereby granted.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING
FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Copyright by Gor Beglaryan
Signature Page
The graduate project of Gor Beglaryan is approved:
_________________________________          ___________
Prof. James A. Flynn                       Date

_________________________________          ___________
Dr. Deborah K. Van Alphen                  Date

_________________________________          ___________
Dr. Sharlene Katz, Chair                   Date
California State University, Northridge
Contents

Copyright
Signature Page
List of Figures
List of Tables
ABSTRACT
1 Introduction
2 Background: Information Theory and Coding
   2.1 Outline
   2.2 Formulas and Measures of Performance
   2.3 Lossy vs. Lossless Compression
3 Design Procedure
   3.1 Problem Definition
   3.2 Huffman Codes
      3.2.1 How Huffman Codes Work
      3.2.2 Huffman Algorithm Design
      3.2.3 Efficiency of Huffman Compression
      3.2.4 Sample Output of the Static Huffman Algorithm
   3.3 Adaptive Huffman Codes
      3.3.1 How Adaptive Huffman Codes Work
      3.3.2 Adaptive Huffman Algorithm Design
      3.3.3 Efficiency of Adaptive Huffman Codes
      3.3.4 Sample Output of the Adaptive Huffman Algorithm
   3.4 Arithmetic Coding
      3.4.1 How Arithmetic Coding Works
      3.4.2 Arithmetic Coding Algorithm Design
      3.4.3 Efficiency of Arithmetic Coding
      3.4.4 Sample Output of the Static Arithmetic Algorithm
   3.5 Adaptive Arithmetic Coding
      3.5.1 How Adaptive Arithmetic Coding Works
      3.5.2 Efficiency of Adaptive Arithmetic Coding
4 Performance Tests and Comparison
   4.1 Pic33 Analog-to-Digital Converter Output Format
   4.2 Benchmark Test
      4.2.1 Test Data
      4.2.2 Compression Ratio Test
      4.2.3 Timing Test
      4.2.4 Discussion of Test Results
   4.3 Delta Compression
5 Conclusions
Bibliography
Appendix A: MATLAB Code for Static Huffman Compression
Appendix B: MATLAB Code for Adaptive Huffman Compression
Appendix C: MATLAB Code for Static Arithmetic Coding
Appendix D: MATLAB Code for Adaptive Arithmetic Coding
Appendix E: MATLAB Code for 10 bit Conversion
Appendix F: MATLAB Code for 12 bit Conversion
List of Figures
Figure 2.1 - Simplified source and channel coding system.
Figure 3.1 - Static Huffman compression flowchart.
Figure 3.2 - Huffman binary tree example.
Figure 3.3 - Flowchart for traversing a binary tree.
Figure 3.4 - Static Huffman compression output format.
Figure 3.5 - Encoded binary tree example.
Figure 3.6 - Output variable info of the Static Huffman program.
Figure 3.7 - Output variable info.codewords of the Static Huffman program.
Figure 3.8 - Histogram generated by the Static Huffman program.
Figure 3.9 - Adaptive Huffman encoding flowchart [15].
Figure 3.10 - Adaptive Huffman tree example.
Figure 3.11 - Adaptive Huffman tree node update example.
Figure 3.12 - Adaptive Huffman compression output format.
Figure 3.13 - Adaptive Huffman decoder flowchart [17].
Figure 3.14 - Output variable info of the Adaptive Huffman program.
Figure 3.15 - Final binary tree table of the Adaptive Huffman simulation.
Figure 3.16 - Output variable info.codewords of the Adaptive Huffman program.
Figure 3.17 - Generating a unique tag for Static Arithmetic Coding.
Figure 3.18 - Arithmetic Coding Case 0 and Case 1 rescaling [23].
Figure 3.19 - Arithmetic Coding Case S rescaling [23].
Figure 3.20 - Static Arithmetic Coding flowchart.
Figure 3.21 - Static Arithmetic Coding output format.
Figure 3.22 - Static Arithmetic Coding output example.
Figure 3.23 - Static Arithmetic decoder flowchart.
Figure 3.24 - Output of the Static Arithmetic program.
Figure 4.1 - dsPic33 ADC output format.
Figure 4.2 - Data conversion from input to output.
Figure 4.3 - Symbol histogram of the book of Genesis.
Figure 4.4 - Sample indoor temperature data.
Figure 4.5 - Sample outdoor temperature data.
Figure 4.6 - Sample wind speed data.
Figure 4.7 - Sample wind gust data.
Figure 4.8 - Sample magnetic field data.
Figure 4.9 - Compression results for the book of Genesis.
Figure 4.10 - Compression results for indoor temperature data.
Figure 4.11 - Compression results for outdoor temperature data.
Figure 4.12 - Symbol histogram for indoor and outdoor temperature data source alphabet.
Figure 4.13 - Compression results for magnetic field data.
Figure 4.14 - Compression results for wind gust data.
Figure 4.15 - Compression results for wind speed data.
Figure 4.16 - Compression time results (linear scale).
Figure 4.17 - Compression time results (logarithmic scale).
Figure 4.18 - Decompression time results (linear scale).
Figure 4.19 - Total compression and decompression time (linear scale).
Figure 4.20 - Total compression and decompression time (logarithmic scale).
List of Tables
Table 2.1 - Information theory and coding: outline of topics.
Table 3.1 - Table for storing the Huffman tree.
Table 3.2 - Table for storing the Adaptive Huffman tree.
Table 3.3 - Source alphabet information of the sequence "abracadabra".
ABSTRACT
LOSSLESS COMPRESSION OF SATELLITE TELEMETRY DATA FOR A
NARROW-BAND DOWNLINK
By
Gor Beglaryan
Master of Science in Electrical Engineering
The objective of this project is to select a lossless compression technique to be
implemented on a CubeSat being developed by CSUN. The goal is to compress satellite
telemetry data in a timely and computationally efficient manner while achieving a reasonable compression ratio. There are two main parts to this project: Algorithm Development and Benchmark Tests. In the former phase, four lossless compression techniques, namely Huffman, Adaptive Huffman, Arithmetic, and Adaptive Arithmetic coding, were implemented in MATLAB. A concise description and implementation details of each algorithm are given in the text. Next, during the Benchmark Test phase, sample data is input to the algorithms and performance metrics are collected. These measures include
compression ratio, compression time and decompression time. Based on the results, it is
recommended that Adaptive Arithmetic coding be selected for the CubeSat project.
1 Introduction
CubeSats are small satellites between 1000 and 3000 cm³ in volume. They are a low-cost method for conducting experiments in space. Due to the size, weight, and power limitations and the orbits of these small satellites, the downlink data rates are often limited.
Thus, it may be necessary to compress the data on the satellite before it is transmitted. This project is a study of some of the lossless compression schemes that might be used.
In order to select a lossless compression technique, four algorithms, each representing a different compression method, have been implemented in MATLAB. All algorithms perform the necessary data manipulation, parsing, compression, decompression, and original data recovery. These routines and subroutines model an actual encoder and decoder, and hence aid in the simulation and comparison of each compression method. As a result, the most suitable and efficient algorithm can be selected for further analysis and possible implementation on the CubeSat, as sketched below.
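To make the comparison concrete, the following is a minimal MATLAB sketch of the round-trip test each implementation must pass. The function names compressHuffman and decompressHuffman are hypothetical placeholders, not the actual routines listed in the appendices:

    % Hypothetical round-trip benchmark for one encoder/decoder pair.
    % compressHuffman/decompressHuffman are placeholder names only.
    data = uint8(randi([0 255], 1, 1000));        % synthetic telemetry bytes

    tic; encoded = compressHuffman(data);      encodeTime = toc;
    tic; decoded = decompressHuffman(encoded); decodeTime = toc;

    % Lossless compression must recover the original data exactly.
    assert(isequal(data, decoded), 'Decoded data does not match original');

    % Compression ratio: original size over compressed size (in bytes).
    ratio = numel(data) / numel(encoded);
    fprintf('ratio = %.2f, encode = %.4f s, decode = %.4f s\n', ...
        ratio, encodeTime, decodeTime);

The three quantities printed here (compression ratio, compression time, and decompression time) are exactly the metrics collected during the benchmark tests of Section 4.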
Section 2 of this report presents some background information on Information Theory and Coding that was needed to complete this project. Section 3 presents the four coding techniques that were studied and the algorithms for their implementation. Section 4 presents the benchmark tests and compares the coding techniques. Section 5 includes the conclusions and suggestions for future work.
2 Background: Information Theory and Coding
With the introduction of the personal computer in the 1970s and the establishment of the Internet in the ‘90s, the Information Age began and rapidly gained momentum. This revolution resulted in an exponential growth of information accessible to the general public, a growth that is still ongoing. This information takes the form of music, pictures, video, satellite telemetry data, medical imaging data, and the vast array of multimedia formats we encounter in daily life. It is fair to say that none of this would be possible without data compression.
Before the Information Age, data compression was mainly on the radar of a small group of engineers who had already developed modern compression techniques such as Huffman compression [1]. However, as the number of data sources expanded, so did the need for storage. New compression techniques were needed that could significantly reduce the number of bits required for storage.
2.1 Outline
Data compression, commonly referred to as “source coding”, is a sub-field of
Information Theory and Coding. To better understand the role of data compression in the field of information theory, consider Table 2.1 [2]. The table shows that data compression is applied to the source to reduce the amount of information to be transmitted. In contrast, error correction is performed to transfer information reliably over a non-ideal channel that is subject to noise and other forms of distortion. These components can be visualized in the simplified source and channel coding system shown in Figure 2.1.
                           Compression/Source Coding         Error-Correction/Channel
                           (efficiency)                      Coding (reliability)
    Information Theory     i.   Source coding theorem        i.   Noisy channel coding theorem
    (math)                 ii.  Kraft-McMillan inequality    ii.  Channel capacity theorem
                           iii. Rate-distortion theorem      iii. Typicality & AEP
    Coding methods         i.   Symbol codes, e.g.           i.   Hamming codes
    (algorithms)                Huffman codes                ii.  Turbo codes
                           ii.  Stream codes, e.g.           iii. Gallager codes
                                Arithmetic coding,           iv.  BCH codes, e.g.
                                Lempel-Ziv coding                 Reed-Solomon codes
Table 2.1 - Information theory and coding: outline of topics.

Figure 2.1 - Simplified source and channel coding system.

Furthermore, compression itself can be approached from two different angles, mathematical and algorithmic. The mathematical part is purely analytical and deals with theorems that help to measure performance and shed light on the limitations of compression. The algorithmic perspective is a hands-on approach that tries to overcome these limitations by creating complex compression packages that can manage a variety of data formats with reasonable performance.
2.2 Formulas and Measures of Performance
Studying information theory and coding can quickly lead into depths that are outside
the scope of this project. However, some fundamental formulas are necessary to have a
basic understanding of how information is measured, how much the source data can be
compressed with a given compression technique and how reliably the data can be
recovered.
When speaking of information theory it is essential to know how information is
measured and what it represents. If $P(A)$ is the probability that event A will occur, then the self-information associated with A quantifies the uncertainty of the event and is given by [3]:

$$i(A) = \log_b \frac{1}{P(A)} = -\log_b P(A) \qquad (2.1)$$

Equation 2.1 shows that the self-information of an event is a nonnegative quantity.
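As a quick numerical illustration of Equation 2.1, consider the following minimal MATLAB sketch; the probabilities below are arbitrary example values:

    % Evaluating Equation 2.1 with a base-2 logarithm (information in bits).
    selfInfo = @(p) -log2(p);    % i(A) = -log2(P(A))

    selfInfo(0.99)   % near-certain event: ~0.0145 bits
    selfInfo(0.5)    % fair coin flip:      1 bit
    selfInfo(0.01)   % rare event:         ~6.64 bits

Note how the information grows as the probability of the event shrinks, which is precisely the behavior described next.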
Moreover, the smaller the probability of the event, the higher the information. In other
words, if an event is not expected to happen, the information gained would be high if it
actually happened, or equivalently if an event has high probability of occurrence then there
is little uncertainty associated with the event. Probability and uncertainty of an event are
inversely proportional. When the logarithm base b is 2, self-information is measured in bits. Equation 2.1 gives the self-information of a single event; averaging it over an experiment consisting of a number of independent events leads to the average self-information. This quantity is called entropy [4] and is calculated by: