What Happens When the Noise Is Stochastic: Shannon's Theorem
Foreword

This chapter is based on lecture notes from coding theory courses taught by Venkatesan Guruswami at University of Washington and CMU, by Atri Rudra at University at Buffalo, SUNY, and by Madhu Sudan at MIT. This version is dated July 21, 2015. For the latest version, please go to http://www.cse.buffalo.edu/~atri/courses/coding-theory/book/

The material in this chapter is supported in part by the National Science Foundation under CAREER grant CCF-0844796. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

©Venkatesan Guruswami, Atri Rudra, Madhu Sudan, 2013. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

Chapter 6

What Happens When the Noise is Stochastic: Shannon's Theorem

Shannon was the first to present a rigorous mathematical framework for communication, which (as we have already seen) is the problem of reproducing at one point (typically called the "receiver" of the channel) a message selected at another point (called the "sender" of the channel). Unlike Hamming, Shannon modeled the noise stochastically, i.e. as a well-defined random process. He proved a result that pinpointed the best possible rate of transmission of information over a very wide range of stochastic channels. In fact, Shannon looked at the communication problem at a higher level, where he allowed for compressing the data first (before applying any error-correcting code), so as to minimize the number of symbols transmitted over the channel.

In this chapter, we will study some stochastic noise models, most of which were proposed by Shannon. We then prove an optimal tradeoff between the rate and the fraction of errors that are correctable for a specific stochastic noise model called the Binary Symmetric Channel.

6.1 Overview of Shannon's Result

Shannon introduced the notion of reliable communication¹ over noisy channels. Broadly, there are two types of channels that were studied by Shannon:

• (Noisy Channel) This type of channel introduces errors during transmission, which result in an incorrect reception of the transmitted signal by the receiver. Redundancy is added at the transmitter to increase the reliability of the transmitted data. The redundancy is taken off at the receiver. This process is termed Channel Coding.

• (Noise-free Channel) As the name suggests, this channel does not introduce any type of error in transmission. Redundancy in the source data is used to compress the source data at the transmitter. The data is decompressed at the receiver. This process is popularly known as Source Coding.

¹That is, the ability to successfully send the required information over a channel that can lose or corrupt data.

Figure 6.1 presents a generic model of a communication system, which combines the two concepts we discussed above.

[Figure 6.1: The communication process. Message → Source Encoder → Channel Encoder → Channel → Channel Decoder → Source Decoder → (Decoded) Message.]

In Figure 6.1, source coding and channel coding are coupled. In general, to get the optimal performance, it makes sense to design both the source and channel coding schemes simultaneously.
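To make the pipeline in Figure 6.1 concrete, here is a minimal sketch in Python of the four stages composed end to end. All function names are ours, and the stand-ins are deliberately trivial: the "source coder" is just a character-to-bits map (it performs no real compression) and the "channel coder" is a 3-fold repetition code; they only illustrate the structure of the pipeline, not Shannon's constructions.

```python
import random

def source_encode(msg: str) -> list[int]:
    # Stand-in "source coder": map each character to 8 bits (no real compression).
    return [int(b) for ch in msg for b in format(ord(ch), "08b")]

def channel_encode(bits: list[int]) -> list[int]:
    # Stand-in "channel coder": 3-fold repetition adds redundancy.
    return [b for b in bits for _ in range(3)]

def noisy_channel(bits: list[int], p: float = 0.05) -> list[int]:
    # A noisy channel that flips each bit independently with probability p.
    return [b ^ (random.random() < p) for b in bits]

def channel_decode(bits: list[int]) -> list[int]:
    # Undo the redundancy: majority vote over each block of 3 received bits.
    return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

def source_decode(bits: list[int]) -> str:
    # Undo the stand-in source coder: 8 bits back to one character.
    return "".join(chr(int("".join(map(str, bits[i:i + 8])), 2))
                   for i in range(0, len(bits), 8))

msg = "hello"
decoded = source_decode(channel_decode(noisy_channel(channel_encode(source_encode(msg)))))
print(decoded)  # with high probability, prints "hello"
```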
However, Shannon's source coding theorem allows us to decouple these two parts of the communication setup and study each part separately. Intuitively, this makes sense: if one can have reliable communication over the channel using channel coding, then for the source coding the channel effectively has no noise.

For source coding, Shannon proved a theorem that precisely identifies the amount by which the message can be compressed: this amount is related to the entropy of the message. We will, however, not talk much more about source coding in this book. (However, see Exercises 6.10, 6.11 and 6.12.) From now on, we will exclusively focus on the channel coding part of the communication setup. Note that one aspect of channel coding is how we model the channel noise. So far we have seen Hamming's worst-case noise model in some detail. Next, we will study some specific stochastic channels.

6.2 Shannon's Noise Model

Shannon proposed a stochastic way of modeling noise. The input symbols to the channel are assumed to belong to some input alphabet X, while the channel selects symbols from its output alphabet Y. The following diagram shows this relationship:

X ∋ x → channel → y ∈ Y

The channels considered by Shannon are also memoryless, that is, noise acts independently on each transmitted symbol. In this book, we will only study discrete channels where both the alphabets X and Y are finite. For the sake of variety, we will define one channel that is continuous, though we will not study it in any detail later on.

The final piece in the specification of a channel is the transition matrix M that governs the process by which the channel introduces errors. In particular, the channel is described by a matrix whose entries are the crossover probabilities over all combinations of the input and output alphabets. For any pair (x, y) ∈ X × Y, let Pr(y | x) denote the probability that y is output by the channel when x is input to the channel. Then the transition matrix is given by M(x, y) = Pr(y | x); that is, M is the matrix whose entry in row x and column y is Pr(y | x).

Next, we look at some specific instances of channels.

Binary Symmetric Channel (BSC). Let 0 ≤ p ≤ 1. The Binary Symmetric Channel with crossover probability p, or BSC_p, is defined as follows: X = Y = {0,1}. The 2 × 2 transition matrix can naturally be represented as a bipartite graph where the left vertices correspond to the rows and the right vertices correspond to the columns of the matrix, and M(x, y) is the weight of the corresponding edge (x, y). For BSC_p, the graph is illustrated in Figure 6.2.

[Figure 6.2: Binary Symmetric Channel BSC_p. Each input bit b ∈ {0,1} goes to b with probability 1 − p and to 1 − b with probability p.]

In other words, every bit is flipped with probability p. We claim that we need to only consider the case when p ≤ 1/2: if we know how to ensure reliable communication over BSC_p for p ≤ 1/2, then we can also handle the case of p > 1/2. (See Exercise 6.1.)
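As a quick illustration of BSC_p, the following sketch (Python; the function name bsc and its parameters are ours) flips each input bit independently with probability p. Sending a long all-zeros word through it, the fraction of flipped bits at the output concentrates around p.

```python
import random

def bsc(bits: list[int], p: float) -> list[int]:
    # BSC_p: each bit is flipped independently with probability p,
    # and left unchanged with probability 1 - p.
    return [b ^ (random.random() < p) for b in bits]

# Example: send the all-zeros word of length 10^5 over BSC_{0.1}.
sent = [0] * 100_000
received = bsc(sent, p=0.1)
print(sum(received) / len(received))  # fraction of ones, roughly 0.1
```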
q-ary Symmetric Channel (qSC). We now look at the generalization of BSC_p to alphabets of size q ≥ 2. Let 0 ≤ p ≤ 1 − 1/q. (As with the case of BSC_p, we can assume that p ≤ 1 − 1/q; see Exercise 6.2.) The q-ary Symmetric Channel with crossover probability p, or qSC_p, is defined as follows: X = Y = [q]. The transition matrix M for qSC_p is given by

M(x, y) = 1 − p            if y = x
M(x, y) = p / (q − 1)      if y ≠ x

In other words, every symbol is retained as is at the output with probability 1 − p and is distorted to each of the q − 1 possible different symbols with equal probability p/(q − 1).

Binary Erasure Channel (BEC). In the previous two examples, X = Y. However, this might not always be the case. Let 0 ≤ α ≤ 1. The Binary Erasure Channel with erasure probability α (denoted by BEC_α) is defined as follows: X = {0,1} and Y = {0,1,?}, where ? denotes an erasure. The transition matrix is illustrated in Figure 6.3.

[Figure 6.3: Binary Erasure Channel BEC_α. Each input bit b ∈ {0,1} goes to b with probability 1 − α and to ? with probability α.]

In Figure 6.3, any missing edge represents a transition that occurs with probability 0. In other words, every bit in BEC_α is erased with probability α (and is left unchanged with probability 1 − α).

Binary Input Additive Gaussian White Noise Channel (BIAGWN). We now look at a channel that is continuous. Let σ ≥ 0. The Binary Input Additive Gaussian White Noise Channel with standard deviation σ, or BIAGWN_σ, is defined as follows: X = {−1, 1} and Y = ℝ. The noise is modeled by a continuous Gaussian probability distribution function. The Gaussian distribution has many nice properties and is a popular choice for modeling noise that is continuous in nature. Given (x, y) ∈ {−1, 1} × ℝ, the noise y − x is distributed according to the Gaussian distribution with mean zero and standard deviation σ. In other words,

Pr(y | x) = (1 / (σ√(2π))) · exp(−(y − x)² / (2σ²)).

6.2.1 Error Correction in Stochastic Noise Models

We now need to revisit the notion of error correction from Section 1.3. Note that unlike Hamming's noise model, we cannot hope to always recover the transmitted codeword. As an example, in BSC_p there is always some positive probability that a codeword is distorted into another codeword during transmission. In such a scenario no decoding algorithm can hope to recover the transmitted codeword. Thus, in some stochastic channels there is always some decoding error probability (where the randomness is from the channel noise): see Exercise 6.14 for example channels where one can have zero decoding error probability. However, we would like this error probability to be small for every possible transmitted codeword. More precisely, for every message, we would like the decoding algorithm to recover the transmitted message with probability 1 − f(n), where lim_{n→∞} f(n) = 0, that is, f(n) is o(1). Ideally, we would like to have f(n) = 2^(−Ω(n)).
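To see what a vanishing decoding error probability looks like in the simplest possible setting, the following sketch (Python; all names are ours) estimates by simulation the decoding error probability of the n-fold repetition code with majority decoding over BSC_p. This is not the optimal tradeoff we will prove later, only an illustration: for p < 1/2 the estimate drops rapidly as n grows, which is the kind of 2^(−Ω(n)) behavior the last paragraph asks for.

```python
import random

def bsc(bits: list[int], p: float) -> list[int]:
    # Flip each bit independently with probability p.
    return [b ^ (random.random() < p) for b in bits]

def repetition_error_rate(n: int, p: float, trials: int = 10_000) -> float:
    # Encode the message bit 0 as the all-zeros codeword of length n,
    # send it over BSC_p, decode by majority vote, and count decoding errors.
    errors = 0
    for _ in range(trials):
        received = bsc([0] * n, p)
        decoded = int(sum(received) > n / 2)  # majority vote
        errors += (decoded != 0)
    return errors / trials

for n in (1, 5, 25, 125):
    print(n, repetition_error_rate(n, p=0.1))
# The empirical decoding error probability falls off quickly as n grows,
# consistent with 2^(-Omega(n)) decay when p < 1/2.
```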