Lecture Notes on Channel Coding
Georg Böcherer
Institute for Communications Engineering
Technical University of Munich, Germany
[email protected]

July 5, 2016

arXiv:1607.00974v1 [cs.IT] 4 Jul 2016

These lecture notes on channel coding were developed for a one-semester course for graduate students of electrical engineering. Chapter 1 reviews the basic problem of channel coding. Chapters 2–5 are on linear block codes, cyclic codes, Reed-Solomon codes, and BCH codes, respectively. The notes are self-contained and were written with the intent to derive the presented results with mathematical rigor. The notes contain in total 68 homework problems, of which 20% require computer programming.

Contents

1 Channel Coding
  1.1 Channel
  1.2 Encoder
  1.3 Decoder
    1.3.1 Observe the Output, Guess the Input
    1.3.2 MAP Rule
    1.3.3 ML Rule
  1.4 Block Codes
    1.4.1 Probability of Error vs Transmitted Information
    1.4.2 Probability of Error, Information Rate, Block Length
    1.4.3 ML Decoder
  1.5 Problems
2 Linear Block Codes
  2.1 Basic Properties
    2.1.1 Groups and Fields
    2.1.2 Vector Spaces
    2.1.3 Linear Block Codes
    2.1.4 Generator Matrix
  2.2 Code Performance
    2.2.1 Hamming Geometry
    2.2.2 Bhattacharyya Parameter
    2.2.3 Bound on Probability of Error
  2.3 Syndrome Decoding
    2.3.1 Dual Code
    2.3.2 Check Matrix
    2.3.3 Cosets
    2.3.4 Syndrome Decoder
  2.4 Problems
3 Cyclic Codes
  3.1 Basic Properties
    3.1.1 Polynomials
    3.1.2 Cyclic Codes
    3.1.3 Proofs
  3.2 Encoder
    3.2.1 Encoder for Linear Codes
    3.2.2 Efficient Encoder for Cyclic Codes
  3.3 Syndromes
    3.3.1 Syndrome Polynomial
    3.3.2 Check Matrix
  3.4 Problems
4 Reed–Solomon Codes
  4.1 Minimum Distance Perspective
    4.1.1 Correcting t Errors
    4.1.2 Singleton Bound and MDS Codes
  4.2 Finite Fields
    4.2.1 Prime Fields $\mathbb{F}_p$
    4.2.2 Construction of Fields $\mathbb{F}_{p^m}$
  4.3 Reed–Solomon Codes
    4.3.1 Puncturing RS Codes
    4.3.2 RS Codes via Fourier Transform
    4.3.3 Syndromes
    4.3.4 Check Matrix for RS Codes
    4.3.5 RS Codes as Cyclic Codes
  4.4 Problems
5 BCH Codes
  5.1 Basic Properties
    5.1.1 Construction of Minimal Polynomials
    5.1.2 Generator Polynomial of BCH Codes
  5.2 Design of BCH Codes Correcting t Errors
  5.3 Erasure Decoding
    5.3.1 Erasure Decoding of MDS Codes
    5.3.2 Erasure Decoding of BCH Codes
  5.4 Decoding of BCH Codes
    5.4.1 Example
    5.4.2 Linear Recurrence Relations
    5.4.3 Syndrome Polynomial as Recurrence
    5.4.4 Berlekamp-Massey Algorithm
  5.5 Problems
Bibliography
Index

Preface

The essence of reliably transmitting data over a noisy communication medium by channel coding is captured in the following diagram.

    U \to \text{encoder} \to X \to \text{channel} \to Y \to \text{decoder} \to \hat{X} \to \hat{U}

Data U is encoded by a codeword X, which is then transmitted over the channel. The decoder uses its observation Y of the channel output to calculate a codeword estimate $\hat{X}$, from which an estimate $\hat{U}$ of the transmitted data is determined. These notes start in Chapter 1 with an invitation to channel coding, and provide in the following chapters a sample path through algebraic coding theory. The destination of this path is decoding of BCH codes. This seemed reasonable to me, since BCH codes are widely used in standards, of which DVB-T2 is an example.
This course covers only a small slice of coding theory. However, within this slice, I have tried to derive all results with mathematical rigor, except for some basic results from abstract algebra, which are stated without proof. The notes can hopefully serve as a starting point for the study of channel coding.

References

The notes are self-contained. When writing the notes, the following references were helpful:

• Chapter 1: [1], [2].
• Chapter 2: [3].
• Chapter 3: [3].
• Chapter 4: [4], [5].
• Chapter 5: [6], [7].

Please report errors of any kind to [email protected].

G. Böcherer

Acknowledgments

I used these notes when giving the lecture "Channel Coding" at the Technical University of Munich in the winter terms from 2013 to 2015. Many thanks to the students Julian Leyh, Swathi Patil, Patrick Schulte, Sebastian Baur, Christoph Bachhuber, Tasos Kakkavas, Anastasios Dimas, Kuan Fu Lin, Jonas Braun, Diego Suárez, Thomas Jerkovits, and Fabian Steiner for reporting errors to me, to Siegfried Böcherer for proofreading the notes, and to Markus Stinner and Hannes Bartz, who were my teaching assistants and contributed many of the homework problems.

G. Böcherer

1 Channel Coding

In this chapter, we develop a mathematical model of data transmission over unreliable communication channels. Within this model, we identify a trade-off between reliability, transmission rate, and complexity. We show that exhaustive search for systems that achieve the optimal trade-off is infeasible. This motivates the development of algebraic coding theory, which is the topic of this course.

1.1 Channel

We model a communication channel by a discrete and finite input alphabet $\mathcal{X}$, a discrete and finite output alphabet $\mathcal{Y}$, and transition probabilities

    P_{Y|X}(b|a) := \Pr(Y = b \mid X = a),  b \in \mathcal{Y}, a \in \mathcal{X}.    (1.1)

The probability $P_{Y|X}(b|a)$ is called the likelihood that the output value is b given that the input value is a. For each input value $a \in \mathcal{X}$, the output value is a random variable Y that is distributed according to $P_{Y|X}(\cdot|a)$.

Example 1.1. The binary symmetric channel (BSC) has the input alphabet $\mathcal{X} = \{0, 1\}$, the output alphabet $\mathcal{Y} = \{0, 1\}$ and the transition probabilities

    input 0: P_{Y|X}(1|0) = 1 - P_{Y|X}(0|0) = \delta    (1.2)
    input 1: P_{Y|X}(0|1) = 1 - P_{Y|X}(1|1) = \delta.   (1.3)

The parameter $\delta$ is called the crossover probability. Note that

    P_{Y|X}(0|0) + P_{Y|X}(1|0) = (1 - \delta) + \delta = 1    (1.4)

which shows that $P_{Y|X}(\cdot|0)$ defines a distribution on $\mathcal{Y} = \{0, 1\}$.

Example 1.2. The binary erasure channel (BEC) has the input alphabet $\mathcal{X} = \{0, 1\}$, the output alphabet $\mathcal{Y} = \{0, 1, e\}$ and the transition probabilities

    input 0: P_{Y|X}(e|0) = 1 - P_{Y|X}(0|0) = \epsilon,  P_{Y|X}(1|0) = 0    (1.5)
    input 1: P_{Y|X}(e|1) = 1 - P_{Y|X}(1|1) = \epsilon,  P_{Y|X}(0|1) = 0.   (1.6)

The parameter $\epsilon$ is called the erasure probability.
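To make the two channel models concrete, here is a minimal simulation sketch. It is not part of the original notes; the choice of Python, the function names, and the parameter values are illustrative assumptions. It draws channel outputs according to the transition probabilities of Examples 1.1 and 1.2.

```python
import random

def bsc(x, delta):
    """Binary symmetric channel: flip the input bit with crossover probability delta."""
    return 1 - x if random.random() < delta else x

def bec(x, eps):
    """Binary erasure channel: output the erasure symbol 'e' with probability eps."""
    return 'e' if random.random() < eps else x

# Empirical check that the output frequencies match the transition probabilities.
delta, eps, n = 0.1, 0.2, 100_000
bsc_out = [bsc(0, delta) for _ in range(n)]
bec_out = [bec(1, eps) for _ in range(n)]
print("BSC, input 0: relative frequency of output 1 =", bsc_out.count(1) / n)    # approx. delta
print("BEC, input 1: relative frequency of erasures =", bec_out.count('e') / n)  # approx. eps
```

The relative frequencies concentrate around $\delta$ and $\epsilon$, matching $P_{Y|X}(\cdot|a)$ as defined above.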
1.2 Encoder

For now, we model the encoder as a device that chooses the channel input X according to a distribution $P_X$ that is defined as

    P_X(a) := \Pr(X = a),  a \in \mathcal{X}.    (1.7)

For each symbol $a \in \mathcal{X}$, $P_X(a)$ is called the a priori probability of the input value a. In Section 3.2, we will take a look at how an encoder generates the channel input X by encoding data.

1.3 Decoder

Suppose we want to use the channel once. This corresponds to choosing the input value according to a distribution $P_X$ on the input alphabet $\mathcal{X}$. The probability to transmit a value a and to receive a value b is given by

    P_{XY}(ab) = P_X(a) P_{Y|X}(b|a).    (1.8)

We can think of one channel use as a random experiment that consists in drawing a sample from the joint distribution $P_{XY}$. We assume that both the a priori probabilities $P_X$ and the likelihoods $P_{Y|X}$ are known at the decoder.

1.3.1 Observe the Output, Guess the Input

At the decoder, the channel output Y is observed. Decoding consists in guessing the input X from the output Y. More formally, the decoder is a deterministic function

    f\colon \mathcal{Y} \to \mathcal{X}.    (1.9)

We want to design an optimal decoder, i.e., a decoder for which some quantity of interest is maximized. A natural objective for decoder design is to maximize the average probability of correctly guessing the input from the output, i.e., we want to maximize the average probability of correct decision, which is given by

    P_c := \Pr[X = f(Y)].    (1.10)

The average probability of error is given by

    P_e := 1 - P_c.    (1.11)

1.3.2 MAP Rule

We now derive the decoder that maximizes $P_c$:

    P_c = \Pr[X = f(Y)] = \sum_{ab \in \mathcal{X} \times \mathcal{Y}:\, a = f(b)} P_{XY}(ab)    (1.12)
        = \sum_{ab \in \mathcal{X} \times \mathcal{Y}:\, a = f(b)} P_Y(b) P_{X|Y}(a|b)           (1.13)
        = \sum_{b \in \mathcal{Y}} P_Y(b) P_{X|Y}(f(b)|b).                                        (1.14)

From the last line, we see that maximizing the average probability of correct decision is equivalent to maximizing, for each observation $b \in \mathcal{Y}$, the probability to guess the input correctly. The optimal decoder is therefore given by

    f(b) = \arg\max_{a \in \mathcal{X}} P_{X|Y}(a|b).    (1.15)

The operator "arg max" returns the argument where a function assumes its maximum value, i.e.,

    a^* = \arg\max_{a \in \mathcal{X}} f(a)  \Leftrightarrow  f(a^*) = \max_{a \in \mathcal{X}} f(a).

The probability $P_{X|Y}(a|b)$ is called the a posteriori probability of the input value a given the output value b.
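To illustrate the MAP rule, the following sketch (again not from the notes; the crossover probability and the a priori distribution are assumptions chosen only for this example) computes the decision f(b) of (1.15) for a BSC with a non-uniform input distribution and evaluates the probability of correct decision via (1.14).

```python
# MAP decoding for one use of a BSC; the channel and the a priori distribution
# below are illustrative assumptions, not values taken from the notes.
delta = 0.1
P_X = {0: 0.7, 1: 0.3}                              # a priori probabilities P_X(a)
P_YgX = {(b, a): 1 - delta if b == a else delta     # likelihoods P_{Y|X}(b|a)
         for a in (0, 1) for b in (0, 1)}

# Joint and output distributions, eq. (1.8): P_XY(ab) = P_X(a) P_{Y|X}(b|a).
P_XY = {(a, b): P_X[a] * P_YgX[(b, a)] for a in (0, 1) for b in (0, 1)}
P_Y = {b: sum(P_XY[(a, b)] for a in (0, 1)) for b in (0, 1)}

# MAP rule (1.15): f(b) = argmax_a P_{X|Y}(a|b); since P_Y(b) does not depend
# on a, it suffices to maximize the joint probability P_XY(ab) over a.
f = {b: max((0, 1), key=lambda a: P_XY[(a, b)]) for b in (0, 1)}

# Probability of correct decision, eq. (1.14): P_c = sum_b P_Y(b) P_{X|Y}(f(b)|b).
P_c = sum(P_Y[b] * (P_XY[(f[b], b)] / P_Y[b]) for b in (0, 1))
print("MAP decisions f:", f)           # here f = {0: 0, 1: 1}
print("P_c =", P_c, "P_e =", 1 - P_c)  # here P_c = 0.63 + 0.27 = 0.9
```

With a uniform a priori distribution, the same rule reduces to picking the input a that maximizes the likelihood $P_{Y|X}(b|a)$, which is the ML rule of Section 1.3.3.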