
Entropy Coding

A complete entropy coding system, which is an encoder/decoder pair, consists of the process of "encoding" (or "compressing") a random source (typically quantized transform coefficients) and the process of "decoding" (or "decompressing") the compressed signal to perfectly regenerate the original random source. In other words, there is no loss of information due to the process of entropy coding.

Entropy coding is also known as "zero-error coding" or lossless coding. Entropy coding is widely used in virtually all popular international multimedia compression standards such as JPEG and MPEG.

[Figure: Random Source → Entropy Encoding → Compressed Source; Compressed Source → Entropy Decoding → Random Source.]

Thus, entropy coding does not introduce any distortion, and hence the combination of the entropy encoder and the entropy decoder faithfully reconstructs the input to the entropy encoder.

Therefore, any possible loss of information or distortion that may be introduced in a signal compression system is not due to entropy encoding/decoding. As we discussed previously, a typical signal compression system includes, for example, a transform process, a quantization process, and an entropy coding stage. In such a system, the distortion is introduced by quantization. Moreover, for such a system, and from the perspective of the entropy encoder, the input "random source" to that encoder is the quantized transform coefficients.

[Figure: Random Source → Transform (examples: KLT, DCT, wavelets) → Quantization → Entropy Coding (examples: Huffman, arithmetic) → Compressed Source; the quantized transform coefficients form the input to the entropy coder.]


Code Design and Notations

In general, entropy coding (or "source coding") is achieved by designing a code, $C$, which provides a one-to-one mapping from any possible outcome of a random variable $X$ (the "source") to a codeword.

There are two alphabets in this case: one alphabet is the traditional alphabet $\mathcal{A}$ of the random source $X$, and the second alphabet, $B$, is the one that is used for constructing the codewords. Based on the second alphabet $B$, we can construct and define the set $D^*$, which is the set of all finite-length strings of symbols drawn from the alphabet $B$.

The most common and popular codes are binary codes, where the alphabet of the codewords is simply the binary bits "zero" and "one".

[Figure: the alphabet $\mathcal{A}$ of the random source $X$ (e.g., $A, a, B, b, C, c, \ldots$) is mapped by the code $C$ to a set of codewords in $D^*$ (e.g., $00, 01, 100, 101, \ldots$), built from the alphabet of code symbols $B = \{b_1, b_2\} = \{0, 1\}$. In this example, $|B| = 2$.]


Binary codes can be represented efficiently using binary trees. In this case, the two branches of the root node represent the possible values of the first bit of a codeword. Once that first bit is known, and if the codeword has a second bit, the next pair of branches represents the second bit, and so on.

[Figure: binary tree representation of a binary (D-ary, $D = 2$) code, with nodes $0, 1$; $00, \ldots, 11$; $000, \ldots, 111$; the set of codewords is $\{0, 10, 110, 111\}$, constructed from the alphabet of code symbols $B = \{0, 1\}$, $|B| = D = 2$.]


Definition

A source code, $C$, is a mapping from a random variable (source) $X$ with alphabet $\mathcal{A}$ to a finite-length string of symbols, where each string of symbols (codeword) is a member of the set $D^*$:
$$C : \mathcal{A} \to D^* .$$

The codewords in $D^*$ are formed from an alphabet $B$ that has $D$ elements: $|B| = D$. We say that we have a D-ary code, or that $B$ is a D-ary alphabet.

As discussed previously, the most common case is when the alphabet $B$ is the set $B = \{0, 1\}$; in this case, $D = 2$ and we have binary codewords.


Example

Let $X$ be a random source with $x \in \{1, 2, 3, 4\}$. Let $B = \{0, 1\}$, and hence $|B| = D = 2$. Then:
$$D^* = \{0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, \ldots, 111, \ldots\} .$$

We can define the code $C$ as follows:

Codeword            Length
$C(x=1) = 0$        $L_1 = 1$
$C(x=2) = 10$       $L_2 = 2$
$C(x=3) = 110$      $L_3 = 3$
$C(x=4) = 111$      $L_4 = 3$


Definition

For a random variable $X$ with a p.m.f. $p_1, p_2, \ldots, p_m$, the expected length of a code $C(X)$ is:
$$L(C) = \sum_{i=1}^{m} p_i L_i .$$

Code Types

The design of a good code follows the basic notion of entropy: to random outcomes with high probability, a good code assigns "short" codewords, and vice versa. The overall objective is to make the average length $\bar{L} = L(C)$ as small as possible.
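As a quick illustration of the formula, the following sketch computes the expected length of the four-symbol code from the earlier example; the probabilities used here are hypothetical (the example did not specify a p.m.f.):

```python
def expected_length(pmf, lengths):
    """Average codeword length L(C) = sum_i p_i * L_i."""
    return sum(p * L for p, L in zip(pmf, lengths))

# Code from the example above: C = {0, 10, 110, 111} with lengths 1, 2, 3, 3.
lengths = [1, 2, 3, 3]
pmf = [0.5, 0.25, 0.125, 0.125]        # hypothetical source probabilities
print(expected_length(pmf, lengths))   # -> 1.75 bits per symbol
```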


In addition, we have to design codes that are uniquely decodable. In other words, if the source generates a sequence $x_1, x_2, x_3, \ldots$ that is mapped into a sequence of codewords $C(x_1), C(x_2), C(x_3), \ldots$, then we should be able to recover the original source sequence $x_1, x_2, x_3, \ldots$ from the codeword sequence $C(x_1), C(x_2), C(x_3), \ldots$.

In general, and as a start, we are interested in codes that map each random outcome $x_i$ into a unique codeword that differs from the codeword of any other outcome. For a random source with alphabet $\{1, 2, \ldots, m\}$, a non-singular code meets the following constraint:
$$C(i) \ne C(j) \quad \forall\, i \ne j .$$


Although a non-singular code is uniquely decodable for a single symbol, it does not guarantee unique decodability for a sequence of outcomes of $X$.

Example:

Outcome    Code C1    Code C2
$x = 1$    1          10
$x = 2$    10         00
$x = 3$    101        11
$x = 4$    111        110


In the above example, the code C1 is non-singular; however, it is not uniquely decodable. Meanwhile, the code C2 is both non-singular and uniquely decodable. Therefore, not all non-singular codes are uniquely decodable; however, every uniquely decodable code is non-singular.

It is important to note that a uniquely decodable code may require the decoding of multiple codewords to uniquely identify the original source sequence. This is the case for the above code C2. (Can you give an example where the C2 decoder needs to wait for more codewords before being able to uniquely decode a sequence?)
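The notes do not give a mechanical test for unique decodability; for completeness, here is a sketch of the classical Sardinas-Patterson test applied to the two codes above (the function and variable names are our own):

```python
def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: True iff no codeword sequence has two parses."""
    if len(set(codewords)) < len(codewords):
        return False                              # singular codes are never uniquely decodable
    C = set(codewords)

    def dangling(A, B):
        # suffixes w such that some a in A equals b + w for a b in B, with w nonempty
        return {a[len(b):] for a in A for b in B if a != b and a.startswith(b)}

    S = dangling(C, C)                            # S_1
    seen = set()
    while S and not (S & C):                      # stop if a dangling suffix is a codeword
        if S <= seen:                             # no new suffixes -> uniquely decodable
            return True
        seen |= S
        S = dangling(C, S) | dangling(S, C)       # S_{n+1}
    return not (S & C)

print(is_uniquely_decodable(["1", "10", "101", "111"]))   # False (code C1)
print(is_uniquely_decodable(["10", "00", "11", "110"]))   # True  (code C2)
```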


Therefore, it is highly desirable to design a uniquely decodable code that can be decoded instantaneously upon receiving each codeword. This type of code is known as an instantaneous, prefix-free, or simply prefix code. In a prefix code, no codeword can be used as a prefix of any other codeword.

Example:

In the following example, no codeword is used as a prefix for any other codeword:

$C(1) = 0$, $C(2) = 10$, $C(3) = 110$, $C(4) = 111$.
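The prefix condition is easy to check mechanically; the small sketch below (our own helper, not part of the notes) tests whether any codeword is a prefix of another:

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of any other codeword."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))   # True  (the prefix code above)
print(is_prefix_free(["1", "10", "101", "111"]))   # False (code C1: "1" prefixes "10")
print(is_prefix_free(["10", "00", "11", "110"]))   # False (code C2: "11" prefixes "110")
```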


It should be rather intuitive that every prefix code is uniquely decodable, but the converse is not always true.

In summary, the three major types of codes, non-singular, uniquely decodable, and prefix codes, are related as shown in the following diagram.

[Figure: nested sets, from the outside in: all possible codes ⊃ non-singular codes ⊃ uniquely decodable codes ⊃ prefix (instantaneous) codes.]


Kraft Inequality

Based on the above discussion, it should be clear that uniquely decodable codes represent a subset of all possible codes. Also, prefix codes are a subset of uniquely decodable codes. Prefix codes meet a certain constraint, which is known as the Kraft inequality.

Theorem

For any prefix D-ary code $C$ with codeword lengths $L_1, L_2, \ldots, L_m$, the following must be satisfied:
$$\sum_{i=1}^{m} D^{-L_i} \le 1 .$$
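As a quick numerical check, the sketch below (our own helper) evaluates the Kraft sum for the binary prefix code used throughout, with lengths 1, 2, 3, 3, and for a set of lengths that no binary prefix code can have:

```python
def kraft_sum(lengths, D=2):
    """Evaluate sum_i D**(-L_i); a prefix D-ary code must give a value <= 1."""
    return sum(D ** (-L) for L in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> satisfies the Kraft inequality (with equality)
print(kraft_sum([1, 1, 2]))      # 1.25 -> no binary prefix code has these lengths
```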


Conversely, given a set of codeword lengths that meets the inequality $\sum_{i=1}^{m} D^{-L_i} \le 1$, there exists a prefix code with this set of lengths.

Proof

A prefix code $C$ can be represented by a D-ary tree. Below we illustrate the proof using a binary code and a corresponding binary tree. (The same principles apply to higher-order codes/trees.)

For illustration purposes, let us consider the code $C(1) = 0$, $C(2) = 10$, $C(3) = 110$, $C(4) = 111$. This code can be represented as follows.


[Figure: binary tree representation of a binary (D-ary, $D = 2$) prefix code with the set of codewords $\{0, 10, 110, 111\} \subset D^*$ and code-symbol alphabet $B = \{0, 1\}$, $|B| = D = 2$.]

An important attribute of the above tree representation of codes is the number of leaf nodes that are associated with each codeword. For example, for the first codeword, $C(1) = 0$, there are four leaf nodes associated with it. Similarly, the codeword $C(2) = 10$ has two leaf nodes.


The last two codewords are leaf nodes themselves, and hence each of them is associated with a single leaf node (itself).

[Figure: the same binary tree, highlighting the group $\mathcal{L}_1$ of leaf nodes of the codeword 0 and the group $\mathcal{L}_2$ of leaf nodes of the codeword 10.]


Note that for a prefix code, no codeword can be an ancestor of any other codeword.

Let $L_{\max}$ be the maximum length among all codeword lengths of a prefix code. Each codeword with length $L_i \le L_{\max}$ sits at depth $L_i$ of the D-ary tree.

[Figure: the same binary tree annotated with the disjoint leaf-node groups $\mathcal{L}_1$ (leaf nodes of codeword 0), $\mathcal{L}_2$ (leaf nodes of codeword 10), $\mathcal{L}_3$ (leaf nodes of codeword 110), and $\mathcal{L}_4$ (leaf nodes of codeword 111).]


Hence, the total number of leaf nodes that are associated with (i.e., descendants of) a codeword at depth $L_i$ is $D^{L_{\max} - L_i}$. Furthermore, since each group $\mathcal{L}_i$ of leaf nodes of a codeword with length $L_i$ is disjoint from every other group of leaf nodes $\mathcal{L}_j$, we have:
$$\sum_{i=1}^{m} D^{L_{\max} - L_i} \le D^{L_{\max}}, \quad \text{which implies} \quad \sum_{i=1}^{m} D^{-L_i} \le 1 .$$

By similar arguments, one can construct a prefix code for any set of lengths that satisfies the above constraint $\sum_{i=1}^{m} D^{-L_i} \le 1$. QED
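To make the converse concrete, here is a small sketch (our own construction, not from the notes) that builds a binary prefix code from a set of lengths satisfying the Kraft inequality, assigning codewords greedily in order of increasing length:

```python
def prefix_code_from_lengths(lengths):
    """Build a binary prefix code for lengths that satisfy the Kraft inequality."""
    assert sum(2 ** (-L) for L in lengths) <= 1, "lengths violate the Kraft inequality"
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])   # shortest first
    codewords = [None] * len(lengths)
    value, prev_len = 0, 0
    for idx in order:
        L = lengths[idx]
        value <<= (L - prev_len)                     # extend the running value to length L
        codewords[idx] = format(value, "0{}b".format(L))
        value += 1                                   # next codeword starts past this subtree
        prev_len = L
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))        # ['0', '10', '110', '111']
```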


Optimum Codes

Here we address the issue of finding minimum-length codes given the constraint imposed by the Kraft inequality. In particular, we are interested in finding codes that satisfy:
$$L^* = \min_{L_1, L_2, \ldots, L_m} L(C) = \min_{L_1, L_2, \ldots, L_m} \sum_{i=1}^{m} p_i L_i \quad \text{such that} \quad \sum_{i=1}^{m} D^{-L_i} \le 1 .$$

If we assume that equality is satisfied, $\sum_{i=1}^{m} D^{-L_i} = 1$, we can formulate the problem using Lagrange multipliers.


Consequently, we can minimize the following objective function:
$$J = \sum_{i=1}^{m} p_i L_i + \lambda \sum_{i=1}^{m} D^{-L_i} .$$

Setting the derivative to zero:
$$\frac{\partial J}{\partial L_i} = p_i - \lambda\, D^{-L_i} \ln D = 0 \;\Rightarrow\; D^{-L_i^*} = \frac{p_i}{\lambda \ln D} .$$

Using the constraint $\sum_{i=1}^{m} D^{-L_i^*} = 1$:
$$\lambda = \frac{1}{\ln D} \;\Rightarrow\; D^{-L_i^*} = p_i \;\Rightarrow\; L_i^* = -\log_D p_i .$$


Therefore, the average length $L(C^*)$ of an optimum code can be expressed as:
$$L^* = \sum_{i=1}^{m} p_i L_i^* \;\Rightarrow\; L^* = -\sum_{i=1}^{m} p_i \log_D p_i \;\Rightarrow\; L^* = H_D(X),$$
where $H_D(X)$ is the entropy of the original source $X$ (measured with a logarithmic base $D$). For a binary code, $D = 2$, and the average length is the same as the standard (base-2) entropy measured in bits.

Based on the above derivation, achieving an optimum prefix code $C^*$ with an average length $L^* = H_D(X)$ is only possible when:
$$D^{-L_i^*} = p_i, \quad \text{i.e.,} \quad L_i^* = -\log_D p_i .$$
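The sketch below (illustrative numbers of our own choosing) computes the ideal codeword lengths $-\log_D p_i$ and the entropy $H_D(X)$ for a binary code, showing one distribution whose ideal lengths are integers and one whose are not:

```python
import math

def ideal_lengths(pmf, D=2):
    """Ideal (generally non-integer) codeword lengths L_i* = -log_D(p_i)."""
    return [-math.log(p, D) for p in pmf]

def entropy(pmf, D=2):
    """Source entropy H_D(X) = -sum_i p_i log_D(p_i)."""
    return -sum(p * math.log(p, D) for p in pmf)

pmf = [0.5, 0.25, 0.125, 0.125]     # ideal lengths happen to be integers
print(ideal_lengths(pmf))           # [1.0, 2.0, 3.0, 3.0]
print(entropy(pmf))                 # 1.75 bits

pmf = [0.4, 0.3, 0.2, 0.1]          # ideal lengths are not integers
print([round(L, 3) for L in ideal_lengths(pmf)])   # [1.322, 1.737, 2.322, 3.322]
```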


However, in general, the probability distribution values $p_i$ do not necessarily yield integer-valued lengths for the codewords.

Below, we state one of the most fundamental theorems in information theory, which relates the average length of any prefix code to the entropy of a random source with general distribution values $p_i$. This theorem, commonly known as the entropy bound theorem, shows that no code can have an average length smaller than the entropy of the random source.


Theorem (Entropy Bound)

The expected length $L$ of a prefix D-ary code $C$ for a random source $X$ with entropy $H_D(X)$ satisfies the following inequality:
$$L \ge H_D(X),$$
with equality if-and-only-if $D^{-L_i} = p_i$ for all $i$.

Observation from the Entropy Bound Theorem

The Entropy Bound Theorem and its proof lead to important observations, outlined below:

For random sources with distributions that satisfy $p_i = D^{-L_i}$, where $L_i$ is an integer for $i = 1, 2, \ldots, m$, there exists a prefix code that achieves the entropy $H_D(X)$. Such distributions are known as D-adic.


For the binary case, $D = 2$, we have a dyadic distribution (or a dyadic code). An example of a dyadic distribution is:
$$p_1 = \tfrac{1}{2},\; p_2 = \tfrac{1}{4},\; p_3 = \tfrac{1}{8},\; p_4 = \tfrac{1}{8}; \quad \text{with} \quad L_1 = 1,\; L_2 = 2,\; L_3 = 3,\; L_4 = 3 .$$

Entropy Coding Methods

Here, we will discuss leading examples of entropy coding methods that are broadly used in practice and that have been adopted by leading international compression standards. In particular, we will discuss Huffman coding and arithmetic coding, both of which lead to optimal entropy coding.


Key Properties of Optimum Prefix Codes

Here, we outline a few key properties of optimum prefix codes that will lead to the Huffman coding procedure. We adopt the notation $C_i$ to represent the codeword with length $L_i$ of a code $C$.

Property 1

If $C_j$ and $C_k$ are two codewords of an optimum prefix code $C$, then:
$$p_j > p_k \;\Rightarrow\; L_j \le L_k .$$


Property 2

Assuming $p_1 \ge p_2 \ge \cdots \ge p_{m-1} \ge p_m$, the longest codewords of an optimum code have the same length:
$$L_{m-1} = L_m .$$

[Figure: a binary tree for a code whose two least probable codewords have lengths $L_{m-1} = 2$ and $L_m = 3$.]


[Figures: two binary-tree diagrams. In the first, the longest codeword has $L_m = 3$ while $L_{m-1} = 2$, leaving a shorter codeword unused; in the second, the longest codeword is moved to the unused shorter codeword so that $L_m = 2$.]


Property 3

There exists an optimum code $C$ in which the longest codewords are siblings (i.e., they differ only in their last bit).

Property 4

For a binary random source, the optimum prefix code has lengths:
$$L_1 = L_2 = 1 .$$


The Huffman Entropy Coding Procedure

The above properties lead to the Huffman entropy coding procedure for generating prefix codes. A core notion in this procedure is the observation that optimizing a given code $C$ is equivalent to optimizing a shortened version $C'$.

The Huffman coding procedure can be summarized by the following steps:

1. Sort the outcomes according to the probability distribution: $p_1 \ge p_2 \ge \cdots \ge p_{m-1} \ge p_m$.
2. Merge the two least probable outcomes, and assign a "zero" to one outcome and a "one" to the other (treat them as a binary pair and use an "optimum" binary code).
3. Repeat step 2 until we are left with a binary source which, once merged, results in a probability of 1.

We now illustrate the Huffman procedure using a few examples.
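Before the worked examples, here is a compact sketch of the same merging steps in Python (using the standard library's heapq; function and variable names are ours, not from the notes):

```python
import heapq
import itertools

def huffman_code(pmf):
    """Return binary codewords built by repeatedly merging the two least probable entries."""
    counter = itertools.count()                      # tie-breaker so the heap never compares lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(pmf)]
    codes = [""] * len(pmf)
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)          # least probable group
        p1, _, group1 = heapq.heappop(heap)          # second least probable group
        for i in group0:
            codes[i] = "0" + codes[i]                # prepend the bit assigned at this merge
        for i in group1:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, next(counter), group0 + group1))
    return codes

# The example that follows: p = 1/3, 1/3, 1/4, 1/12.
pmf = [1/3, 1/3, 1/4, 1/12]
codes = huffman_code(pmf)
print(codes)                                         # one optimal code; ties may be merged differently
print(sum(p * len(c) for p, c in zip(pmf, codes)))   # ~2.0 bits/symbol in either case
```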


Example

Find an optimum set of codewords for the source with
$$p_1 = \tfrac{1}{3}, \quad p_2 = \tfrac{1}{3}, \quad p_3 = \tfrac{1}{4}, \quad p_4 = \tfrac{1}{12}; \qquad C_1 = ?,\; C_2 = ?,\; C_3 = ?,\; C_4 = ?$$

The optimum codewords must meet the following: $L_1 \le L_2 \le L_3 \le L_4$; $L_3 = L_4$; and $C_3$ and $C_4$ are siblings.


Combining the two least probable outcomes, and then using the least probable outcomes of the shortened source:

[Figures: first, $p_3 = \tfrac{1}{4}$ (assigned bit 0) and $p_4 = \tfrac{1}{12}$ (assigned bit 1) are merged into a combined probability of $\tfrac{1}{3}$; then, in the shortened source $\{\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\}$, $p_2 = \tfrac{1}{3}$ (bit 0) and the combined outcome (bit 1) are merged into $\tfrac{2}{3}$.]


Now we have a probability of one, and there is nothing else to merge. Tracing the assigned bits back through the merges gives the codewords:

$C_1 = 0$, $C_2 = 10$, $C_3 = 110$, $C_4 = 111$.

[Figure: the complete merging tree; the final merge combines $p_1 = \tfrac{1}{3}$ (bit 0) and the combined probability $\tfrac{2}{3}$ (bit 1) into 1.]

What is the average length $\bar{L}$?


In some cases, we may encounter more than one choice for merging the probability distribution values. (This was the case in the above example.) One important question is: what is the impact of selecting one choice for combining the probabilities versus the other? We illustrate this below by selecting an alternative option for combining the probabilities.

[Figure: the alternative merging order leads to the code $C_1 = 00$, $C_2 = 01$, $C_3 = 10$, $C_4 = 11$.]


As can be seen in the above example, the Huffman procedure can lead to different prefix codes (if multiple options for merging are encountered). Hence, an important question is: does one option provide a better code (in terms of providing a smaller average code length $\bar{L}$)?

The Huffman procedure can also be used for the case when $D > 2$ (i.e., when the code is no longer binary). Care should be taken, though, when dealing with a non-binary code design.
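For the two codes obtained above, a quick check (sketch below, using exact fractions) shows that the average length is the same, 2 bits per symbol, so for this particular source neither merging choice is better:

```python
from fractions import Fraction

pmf = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 4), Fraction(1, 12)]

def average_length(pmf, codewords):
    """Exact average codeword length sum_i p_i * L_i."""
    return sum(p * len(c) for p, c in zip(pmf, codewords))

print(average_length(pmf, ["0", "10", "110", "111"]))   # first merging choice -> 2
print(average_length(pmf, ["00", "01", "10", "11"]))    # alternative choice   -> 2
```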


Arithmetic Coding

Although Huffman codes are optimal on a symbol-by-symbol basis, there is still room for improvement in terms of achieving lower "overhead". For example, a binary source with entropy $H(X) < 1$ still requires one bit per symbol when using a Huffman code. Hence, if, for example, $H(X) = 0.5$, then a Huffman code spends double the amount of bits per symbol (relative to the true optimum limit of $H(X) = 0.5$).

Arithmetic coding is an approach that addresses the overhead issue by coding a continuous sequence of source symbols while trying to approach the entropy limit $H(X)$. Arithmetic coding has roots in a coding approach proposed by Shannon, Fano, and Elias, and hence is sometimes called Shannon-Fano-Elias (SFE) coding. Therefore, we first outline the principles and procedures of SFE codes, and then describe arithmetic coding.


Shannon-Fano-Elias Coding

The SFE coding procedure is based on the cumulative distribution function (CDF) $F(x)$ of a random source $X$:
$$F(x) = \Pr(X \le x) .$$

The CDF provides a unique one-to-one mapping for the possible outcomes of any random source $X$.


In other words, if we denote the alphabet of a discrete random source $X$ by the integer index set $\mathcal{A} = \{1, 2, \ldots, m\}$, then it is well known that:
$$F(i) \ne F(j), \quad \forall\, i \ne j .$$

This can be illustrated by the following example of a typical CDF of a discrete random source.

[Figure: staircase CDF $F(x) = F(i)$ of a discrete source with outcomes $x = i \in \{1, 2, 3, 4\}$, showing the values $F(1), F(2), F(3), F(4)$.]


One important characteristic of the CDF of a discrete random source is that the CDF defines a set of non-overlapping intervals in its range of possible values between "zero" and "one". (Recall that the CDF provides a measure of probability, and hence it is always confined between "zero" and "one".) Based on the above CDF example, we can have a well-defined set of non-overlapping intervals, as shown in the next figure.


[Figure: the same staircase CDF, with the non-overlapping intervals between $F(1), F(2), F(3), F(4)$ marked on the vertical axis.]

Another important observation is that the size of each (non-overlapping) interval in the range of the CDF $F(x)$ is defined by the probability-mass-function (PMF) value $p(i) = \Pr(X = i)$ of the particular outcome $X = i$. This is the same as the size of the "jumps" that we observe in the staircase-like shape of the CDF of a discrete random source. This is highlighted by the next figure.


[Figure: the staircase CDF with the jump at each outcome $i$ equal to $p_i$; the interval sizes are $p_1, p_2, p_3, p_4$.]

Overall, by using the CDF of a random source, one can define a unique mapping between any possible outcome and a particular (unique) interval in the range between "zero" and "one". Furthermore, one can select any value within each (unique) interval of a corresponding random outcome $i$ to represent that outcome. This selected value serves as a "codeword" for that outcome $i$.

The SFE procedure, which is based on the above CDF-driven principles of unique mapping, can be defined as follows:

1. Map each outcome $X = i$ to the interval $[F(i-1), F(i))$, where the lower endpoint $F(i-1)$ is inclusive and the upper endpoint $F(i)$ is exclusive.


2. Select a particular value within the interval $[F(i-1), F(i))$ to represent the outcome $X = i$. This value is known as the "modified CDF" and is denoted by $\bar{F}(i)$.

In principle, any value within the interval $[F(i-1), F(i))$ can be used for the modified CDF $\bar{F}(i)$. A natural choice is the middle of the corresponding interval $[F(i-1), F(i))$. Hence, the modified CDF can be expressed as follows:


$$\bar{F}(i) = F(i-1) + \frac{p_i}{2},$$
which, in turn, can be expressed as:
$$\bar{F}(i) = \frac{F(i) + F(i-1)}{2} .$$

This is illustrated by the next figure.

[Figure: the staircase CDF with the modified CDF values $\bar{F}(1), \ldots, \bar{F}(4)$ marked at the midpoints of the corresponding intervals.]


So far, it should be clear that $\bar{F}(i) \in [0, 1)$, and that it provides a unique mapping for the possible random outcomes of $X$.

3. Generate a codeword to represent $\bar{F}(i)$, and hence to represent the outcome $X = i$. Below we consider simple examples of such codewords according to the SFE coding procedure.

Examples of Modified CDF Values and Codewords

The following table outlines a "dyadic" set of examples of values that could be used for a modified CDF $\bar{F}(i)$, and the corresponding codewords for such values.


F̄(i)             Binary representation   Codeword
1/2 = 2^(-1)      0.1                     1
1/4 = 2^(-2)      0.01                    01
1/8 = 2^(-3)      0.001                   001

The above values of the modified CDF can be combined to represent higher-precision values, as shown in the next table.

F̄(i)                       Binary representation   Codeword
0.75  = 2^(-1) + 2^(-2)     0.11                    11
0.625 = 2^(-1) + 2^(-3)     0.101                   101


In general, the number of bits needed to code the modified CDF value $\bar{F}(i)$ could be infinite, since $\bar{F}(i)$ could be any real number. In practice, however, a finite number of bits $L_i$ is used to represent ("approximate") $\bar{F}(i)$. It should be clear that the number of bits $L_i$ must be sufficiently large to make sure that the codeword representing $\bar{F}(i)$ is unique (i.e., there should be no overlap in the intervals representing the random outcomes). By using a truncated value for the original value $\bar{F}(i)$, we anticipate a loss in precision.


Let $\lfloor \bar{F}(i) \rfloor_{L_i}$ be the truncated value used to represent the original modified CDF $\bar{F}(i)$ based on $L_i$ bits. Naturally, the larger the number of bits used, the higher the precision and the smaller the difference between $\bar{F}(i)$ and $\lfloor \bar{F}(i) \rfloor_{L_i}$.

It can be shown that the difference between the original modified CDF value $\bar{F}(i)$ and its approximation $\lfloor \bar{F}(i) \rfloor_{L_i}$ satisfies the following inequality:
$$\bar{F}(i) - \lfloor \bar{F}(i) \rfloor_{L_i} < \frac{1}{2^{L_i}} .$$


Consequently, and based on the definition of the modified CDF value $\bar{F}(i) = F(i-1) + \frac{p_i}{2}$, in order to maintain a unique mapping, the maximum error $2^{-L_i}$ has to be smaller than $p_i/2$:
$$\frac{1}{2^{L_i}} < \frac{p_i}{2} .$$

This leads to the following constraint on the length $L_i$:
$$\log_2\!\left(\frac{1}{2^{L_i}}\right) < \log_2\!\left(\frac{p_i}{2}\right) \;\Rightarrow\; -L_i < \log_2 p_i - \log_2 2 \;\Rightarrow\; -L_i < \log_2 p_i - 1 \;\Rightarrow\; L_i > \log_2\!\left(\frac{1}{p_i}\right) + 1 .$$


Therefore:
$$L_i = \left\lceil \log_2 \frac{1}{p_i} \right\rceil + 1 .$$
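A short sketch of the single-symbol SFE encoder based on the formulas above (function and variable names are ours); it reproduces the codewords in the example table that follows:

```python
import math

def sfe_code(pmf):
    """SFE codewords: truncate the modified CDF to L_i = ceil(log2(1/p_i)) + 1 bits."""
    codewords = []
    F_prev = 0.0                                   # F(i-1)
    for p in pmf:
        F_bar = F_prev + p / 2                     # modified CDF: midpoint of [F(i-1), F(i))
        L = math.ceil(math.log2(1 / p)) + 1        # number of bits for this outcome
        bits, value = "", F_bar
        for _ in range(L):                         # binary expansion of F_bar, truncated to L bits
            value *= 2
            bits += str(int(value))
            value -= int(value)
        codewords.append(bits)
        F_prev += p                                # advance the CDF
    return codewords

print(sfe_code([0.5, 0.25, 0.125, 0.125]))         # ['01', '101', '1101', '1111']
```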

Example

The following table shows an example of a random source $X$ with four possible outcomes, the corresponding PMF, CDF, and modified CDF values, and the codewords used based on SFE coding.


X = i   p_i     F(i)    F̄(i)     F̄(i) (binary)   SFE code   L_i
1       0.5     0.5     0.25      0.01            01         2
2       0.25    0.75    0.625     0.101           101        3
3       0.125   0.875   0.8125    0.1101          1101       4
4       0.125   1.0     0.9375    0.1111          1111       4

Arithmetic Coding

The advantages of the SFE coding procedure can be realized when it is used to code multiple outcomes of the random source under consideration. Arithmetic coding is basically SFE coding applied to multiple outcomes of the random source.


Under arithmetic coding, we code a sequence of $n$ outcomes $\mathbf{i} = (i_1, i_2, \ldots, i_n)$, where each outcome $i_j \in \{1, 2, \ldots, m\}$. Each possible vector $\mathbf{X} = \mathbf{i}$ of the random source $X$ is mapped to a unique value:
$$\bar{F}(\mathbf{i}) = \bar{F}^{(n)} \in [0, 1) .$$

The best way to illustrate arithmetic coding is through a couple of examples, as shown below.


Example 1

Arithmetic coding begins by dividing the "zero" to "one" range based on the CDF of the random source. In this example, the source can take one of three possible outcomes, $x = i \in \{1, 2, 3\}$.

[Figure: the interval $[0, 1)$ divided at $F(1)$, $F(2)$, and $F(3) = 1$.]


If we assume that we are interested in coding $n = 2$ outcomes, the following figures show the particular interval and corresponding value $\bar{F}(\mathbf{x}) = \bar{F}(\mathbf{i})$ that arithmetic coding focuses on to code the vector $(i_1, i_2) = (3, 2)$.

[Figure: the sub-interval corresponding to $i_1 = 3$ is itself subdivided according to the CDF, and the vector $(3, 2)$ is mapped to the second sub-interval within it.]


[Figures: for $\mathbf{x} = \mathbf{i} = (3, 2)$, the interval for $i_1 = 3$ is subdivided, the sub-interval for $i_2 = 2$ is selected, and the value $\bar{F}(\mathbf{x})$ inside it is the number transmitted to represent the vector $(3, 2)$.]


Similarly, the following figure shows the particular interval and corresponding value $\bar{F}(\mathbf{x}) = \bar{F}(\mathbf{i})$ that arithmetic coding focuses on to code the vector $(i_1, i_2) = (1, 3)$.

[Figure: for $\mathbf{i} = (1, 3)$, the interval for $i_1 = 1$ is subdivided, and the sub-interval for $i_2 = 3$ contains the transmitted value $\bar{F}(\mathbf{x})$.]


Based on the above examples, we can define:
$$\bar{F}^{(n)} = \frac{F_l^{(n)} + F_u^{(n)}}{2} \quad \text{and} \quad \Delta^{(n)} = F_u^{(n)} - F_l^{(n)},$$
where $F_u^{(n)}$ and $F_l^{(n)}$ are the upper and lower bounds of the unique interval $[F_l^{(n)}, F_u^{(n)})$ that $\bar{F}^{(n)}$ belongs to. Below, we use these expressions to illustrate the arithmetic coding procedure.

Example

The coding process starts with the initial values:
$$F_l^{(0)} = 0, \quad F_u^{(0)} = 1, \quad \Delta^{(0)} = F_u^{(0)} - F_l^{(0)} = 1 .$$


After the initial step, the interval $\Delta^{(n)} = F_u^{(n)} - F_l^{(n)}$ and the corresponding value $\bar{F}^{(n)} = \frac{F_l^{(n)} + F_u^{(n)}}{2}$ are updated according to the particular outcomes that the random source is generating. This is illustrated below.

[Figure: the initial interval $[F_l^{(0)}, F_u^{(0)}) = [0, 1)$ with width $\Delta^{(0)}$, subdivided at $F(1)$, $F(2)$, $F(3)$.]


[Figures: example with $\mathbf{i} = (i_1, i_2) = (2, 3)$. Starting from $[F_l^{(0)}, F_u^{(0)}) = [0, 1)$ with width $\Delta^{(0)}$, observing $i_1 = 2$ selects the sub-interval between $F(1)$ and $F(2)$ as the new interval $[F_l^{(1)}, F_u^{(1)})$ with width $\Delta^{(1)}$.]


For $i_1 = 2$, the interval bounds are updated as:
$$F_u^{(1)} = F_l^{(0)} + \Delta^{(0)} F(i_1) = 0 + 1 \cdot F(2) = F(2),$$
$$F_l^{(1)} = F_l^{(0)} + \Delta^{(0)} F(i_1 - 1) = 0 + 1 \cdot F(1) = F(1).$$

[Figures: the same diagrams annotated with these updates for $\mathbf{i} = (2, 3)$.]


For the second outcome, $i_2 = 3$, the interval is refined in the same way:
$$F_u^{(2)} = F_l^{(1)} + \Delta^{(1)} F(i_2), \qquad F_l^{(2)} = F_l^{(1)} + \Delta^{(1)} F(i_2 - 1).$$

[Figure: the interval $[F_l^{(1)}, F_u^{(1)})$ subdivided again, with $[F_l^{(2)}, F_u^{(2)})$ of width $\Delta^{(2)}$ selected for $\mathbf{i} = (2, 3)$.]

The arithmetic coding procedure can be summarized by the steps outlined below.


1. Initialize: $F_l^{(0)} = 0$, $F_u^{(0)} = 1$, $\Delta^{(0)} = 1$.
2. For each new outcome $i_n$, update:
$$F_u^{(n)} = F_l^{(n-1)} + \Delta^{(n-1)} F(i_n), \qquad F_l^{(n)} = F_l^{(n-1)} + \Delta^{(n-1)} F(i_n - 1), \qquad \Delta^{(n)} = F_u^{(n)} - F_l^{(n)} .$$
3. After the last outcome, compute the value that represents the sequence:
$$\bar{F}(\mathbf{x}) = \bar{F}^{(n)} = \frac{F_l^{(n)} + F_u^{(n)}}{2} .$$

Similar to SFE coding, after determining the value $\bar{F}^{(n)}$, we use $L^{(n)}$ bits to represent $\bar{F}^{(n)}$ according to the constraint:
$$L^{(n)} = \left\lceil \log_2 \frac{1}{p(\mathbf{x})} \right\rceil + 1 .$$
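A direct transcription of these steps into Python (the probabilities and the input sequence below are hypothetical, chosen only to exercise the recursion):

```python
import math

def arithmetic_encode(pmf, sequence):
    """Return (F_bar, L) for a sequence of 1-based outcomes, following the recursion above."""
    cdf = [0.0]
    for p in pmf:                                  # cdf[i] = F(i), with F(0) = 0
        cdf.append(cdf[-1] + p)
    F_l, F_u, delta = 0.0, 1.0, 1.0                # step 1: initialization
    p_x = 1.0                                      # probability of the whole sequence
    for i in sequence:                             # step 2: interval refinement
        F_u = F_l + delta * cdf[i]
        F_l = F_l + delta * cdf[i - 1]
        delta = F_u - F_l
        p_x *= pmf[i - 1]
    F_bar = (F_l + F_u) / 2                        # step 3: value representing the sequence
    L = math.ceil(math.log2(1 / p_x)) + 1          # number of bits to transmit
    return F_bar, L

print(arithmetic_encode([0.5, 0.25, 0.25], [2, 3]))   # (0.71875, 5)
```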
