INVITED PAPER

Channel Coding: The Road to Channel Capacity

Fifty years of effort and invention have finally produced coding schemes that closely approach Shannon's channel capacity limit on memoryless communication channels.

By Daniel J. Costello, Jr., Fellow IEEE, and G. David Forney, Jr., Life Fellow IEEE

ABSTRACT | Starting from Shannon's celebrated 1948 channel coding theorem, we trace the evolution of channel coding from Hamming codes to capacity-approaching codes. We focus on the contributions that have led to the most significant improvements in performance versus complexity for practical applications, particularly on the additive white Gaussian noise channel. We discuss algebraic block codes, and why they did not prove to be the way to get to the Shannon limit. We trace the antecedents of today's capacity-approaching codes: convolutional codes, concatenated codes, and other probabilistic coding schemes. Finally, we sketch some of the practical applications of these codes.

KEYWORDS | Algebraic block codes; channel coding; codes on graphs; concatenated codes; convolutional codes; low-density parity-check codes; turbo codes

I. INTRODUCTION

The field of channel coding started with Claude Shannon's 1948 landmark paper [1]. For the next half century, its central objective was to find practical coding schemes that could approach channel capacity (hereafter called "the Shannon limit") on well-understood channels such as the additive white Gaussian noise (AWGN) channel. This goal proved to be challenging, but not impossible. In the past decade, with the advent of turbo codes and the rebirth of low-density parity-check (LDPC) codes, it has finally been achieved, at least in many cases of practical interest.

As Bob McEliece observed in his 2004 Shannon Lecture [2], the extraordinary efforts that were required to achieve this objective may not be fully appreciated by future historians. McEliece imagined a biographical note in the 166th edition of the Encyclopedia Galactica along the following lines.

Claude Shannon: Born on the planet Earth (Sol III) in the year 1916 A.D. Generally regarded as the father of the Information Age, he formulated the notion of channel capacity in 1948 A.D. Within several decades, mathematicians and engineers had devised practical ways to communicate reliably at data rates within 1% of the Shannon limit ...

The purpose of this paper is to tell the story of how Shannon's challenge was met, at least as it appeared to us, before the details of this story are lost to memory. We focus on the AWGN channel, which was the target for many of these efforts. In Section II, we review various definitions of the Shannon limit for this channel.

In Section III, we discuss the subfield of algebraic coding, which dominated the field of channel coding for its first couple of decades. We will discuss both the achievements of algebraic coding, and also the reasons why it did not prove to be the way to approach the Shannon limit.

In Section IV, we discuss the alternative line of development that was inspired more directly by Shannon's random coding approach, which is sometimes called

"probabilistic coding." This line of development includes convolutional codes, product codes, concatenated codes, trellis decoding of block codes, and ultimately modern capacity-approaching codes.

In Section V, we discuss codes for bandwidth-limited channels, namely lattice codes and trellis-coded modulation.

Manuscript received May 31, 2006; revised January 22, 2007. This work was supported in part by the National Science Foundation under Grant CCR02-05310 and by the National Aeronautics and Space Administration under Grant NNGO5GH73G. D. J. Costello, Jr., is with the Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556 USA (e-mail: [email protected]). G. D. Forney, Jr., is with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]). Digital Object Identifier: 10.1109/JPROC.2007.895188

1150 Proceedings of the IEEE | Vol. 95, No. 6, June 2007 0018-9219/$25.00 © 2007 IEEE


Finally, in Section VI, we discuss the development of capacity-approaching codes, principally turbo codes and LDPC codes.

II. CODING FOR THE AWGN CHANNEL

A coding scheme for the AWGN channel may be characterized by two simple parameters: its signal-to-noise ratio (SNR) and its spectral efficiency η in bits per second per Hertz (b/s/Hz). The SNR is the ratio of average signal power to average noise power, a dimensionless quantity. The spectral efficiency of a coding scheme that transmits R bits per second (b/s) over an AWGN channel of bandwidth W Hz is simply η = R/W b/s/Hz.

Coding schemes for the AWGN channel typically map a sequence of bits at a rate R b/s to a sequence of real symbols at a rate of 2B symbols per second; the discrete-time code rate is then r = R/2B bits per symbol. The sequence of real symbols is then modulated via pulse amplitude modulation (PAM) or quadrature amplitude modulation (QAM) for transmission over an AWGN channel of bandwidth W. By Nyquist theory, B (sometimes called the "Shannon bandwidth" [3]) cannot exceed the actual bandwidth W. If B ≈ W, then the spectral efficiency is η = R/W ≈ R/B = 2r. We therefore say that the nominal spectral efficiency of a discrete-time coding scheme is 2r, the discrete-time code rate in bits per two symbols. The actual spectral efficiency η = R/W of the corresponding continuous-time scheme is upperbounded by the nominal spectral efficiency 2r and approaches 2r as B → W. Thus, for discrete-time codes, we will often denote 2r by η, implicitly assuming B ≈ W.

Shannon showed that on an AWGN channel with a given SNR and bandwidth W Hz, the rate of reliable transmission is upperbounded by

    R < W log_2(1 + SNR).

Moreover, if a long code with rate R < W log_2(1 + SNR) is chosen at random, then there exists a decoding scheme such that with high probability the code and decoder will achieve highly reliable transmission (i.e., low probability of decoding error).

Equivalently, Shannon's result shows that the spectral efficiency is upperbounded by

    η < log_2(1 + SNR)

or, given a spectral efficiency η, that the SNR needed for reliable transmission is lowerbounded by

    SNR > 2^η − 1.

So, we may say that the Shannon limit on rate (i.e., the channel capacity) is W log_2(1 + SNR) b/s, or equivalently that the Shannon limit on spectral efficiency is log_2(1 + SNR) b/s/Hz, or equivalently that the Shannon limit on SNR for a given spectral efficiency η is 2^η − 1. Note that the Shannon limit on SNR is a lower bound rather than an upper bound.

These bounds suggest that we define a normalized SNR parameter SNRnorm as follows:

    SNRnorm = SNR / (2^η − 1).

Then, for any reliable coding scheme, SNRnorm > 1; i.e., the Shannon limit (lower bound) on SNRnorm is 1 (0 dB), independent of η. Moreover, SNRnorm measures the "gap to capacity"; i.e., 10 log_10 SNRnorm is the difference in decibels (dB)¹ between the SNR actually used and the Shannon limit on SNR given η, namely 2^η − 1. If the desired spectral efficiency is less than 1 b/s/Hz (the so-called power-limited regime), then it can be shown that binary codes can be used on the AWGN channel with a cost in Shannon limit on SNR of less than 0.2 dB. On the other hand, since for a binary coding scheme the discrete-time code rate is bounded by r ≤ 1 bit per symbol, the spectral efficiency of a binary coding scheme is limited to η ≤ 2r ≤ 2 b/s/Hz, so multilevel coding schemes must be used if the desired spectral efficiency is greater than 2 b/s/Hz (the so-called bandwidth-limited regime). In practice, coding schemes for the power-limited and bandwidth-limited regimes differ considerably.

A closely related normalized SNR parameter that has been traditionally used in the power-limited regime is Eb/N0, which may be defined as

    Eb/N0 = SNR/η = ((2^η − 1)/η) SNRnorm.

For a given spectral efficiency η, Eb/N0 is thus lowerbounded by

    Eb/N0 > (2^η − 1)/η

so we may say that the Shannon limit (lower bound) on Eb/N0 as a function of η is (2^η − 1)/η. This function decreases monotonically with η and approaches ln 2 as η → 0, so we may say that the ultimate Shannon limit (lower bound) on Eb/N0 for any η is ln 2 (−1.59 dB).

¹ In decibels, a multiplicative factor of α is expressed as 10 log_10 α dB.
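The Shannon-limit formulas above are easy to evaluate numerically. The following sketch (the function names are ours, for illustration only) computes the SNR limit, SNRnorm, and the Eb/N0 lower bound, and checks two benchmark values that recur in this paper: 1.76 dB at η = 2, and the ultimate limit ln 2 ≈ −1.59 dB as η → 0.

```python
import math

def shannon_limit_snr(eta):
    """Minimum SNR (linear) for reliable transmission at spectral efficiency eta b/s/Hz."""
    return 2.0 ** eta - 1.0

def snr_norm(snr, eta):
    """Normalized SNR; exceeds 1 (0 dB) for any reliable scheme.
    10*log10 of this value is the gap to capacity in dB."""
    return snr / (2.0 ** eta - 1.0)

def ebn0_limit(eta):
    """Shannon lower bound on Eb/N0 (linear) at spectral efficiency eta: (2^eta - 1)/eta."""
    return (2.0 ** eta - 1.0) / eta

def db(x):
    """Express a linear factor in decibels."""
    return 10.0 * math.log10(x)

# Shannon limit on Eb/N0 at eta = 2 b/s/Hz: (2^2 - 1)/2 = 1.5 (about 1.76 dB)
print(db(ebn0_limit(2.0)))
# As eta -> 0, the bound approaches ln 2 (about -1.59 dB)
print(db(ebn0_limit(1e-9)))
```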


We see that as η → 0, Eb/N0 → SNRnorm ln 2, so Eb/N0 and SNRnorm become equivalent parameters in the severely power-limited regime. In the power-limited regime, we will therefore use the traditional parameter Eb/N0. However, in the bandwidth-limited regime, we will use SNRnorm, which is more informative in this regime.

III. ALGEBRAIC CODING

The algebraic coding paradigm dominated the first several decades of the field of channel coding. Indeed, most of the textbooks on coding of this period (including Peterson [4], Berlekamp [5], Lin [6], Peterson and Weldon [7], MacWilliams and Sloane [8], and Blahut [9]) covered only algebraic coding theory.

Algebraic coding theory is primarily concerned with linear (n, k, d) block codes over the binary field F_2. A binary linear (n, k, d) block code consists of 2^k binary n-tuples, called codewords, which have the group property: i.e., the componentwise mod-2 sum of any two codewords is another codeword. The parameter d denotes the minimum Hamming distance between any two distinct codewords, i.e., the minimum number of coordinates in which any two codewords differ. The theory generalizes to linear (n, k, d) block codes over nonbinary fields F_q.

The principal objective of algebraic coding theory is to maximize the minimum distance d for a given (n, k). The motivation for this objective is to maximize error-correction power. Over a binary symmetric channel (BSC: a binary-input, binary-output channel with statistically independent binary errors), the optimum decoding rule is to decode to the codeword closest in Hamming distance to the received n-tuple. With this rule, a code with minimum distance d can correct all patterns of (d − 1)/2 or fewer channel errors (assuming that d is odd), but cannot correct some patterns containing a greater number of errors.

The field of algebraic coding theory has had many successes, which we will briefly survey below. However, even though binary algebraic block codes can be used on the AWGN channel, they have not proved to be the way to approach channel capacity on this channel, even in the power-limited regime. Indeed, they have not proved to be the way to approach channel capacity even on the BSC. As we proceed, we will discuss some of the fundamental reasons for this failure.

A. Binary Coding on the Power-Limited AWGN Channel

A binary linear (n, k, d) block code may be used on a Gaussian channel as follows.

To transmit a codeword, each of its n binary symbols may be mapped into the two symbols {±α} of a binary pulse-amplitude-modulation (2-PAM) alphabet, yielding a two-valued real n-tuple x. This n-tuple may then be sent through a channel of bandwidth W at a symbol rate 2B up to the Nyquist limit of 2W binary symbols per second, using standard pulse amplitude modulation (PAM) for baseband channels or quadrature amplitude modulation (QAM) for passband channels.

At the receiver, an optimum PAM or QAM detector can produce a real-valued n-tuple y = x + n, where x is the transmitted sequence and n is a discrete-time white Gaussian noise sequence. The optimum (maximum-likelihood) decision rule is then to choose the one of the 2^k possible transmitted sequences x that is closest to the received sequence y in Euclidean distance.

If the symbol rate 2B approaches the Nyquist limit of 2W symbols per second, then the transmitted data rate can approach R = (k/n) 2W b/s, so the spectral efficiency of such a binary coding scheme can approach η = 2k/n b/s/Hz. As mentioned previously, since k/n ≤ 1, we have η ≤ 2 b/s/Hz; i.e., binary coding cannot be used in the bandwidth-limited regime.

With no coding (independent transmission of random bits via PAM or QAM), the transmitted data rate is 2W b/s, so the nominal spectral efficiency is η = 2 b/s/Hz. It is straightforward to show that with optimum modulation and detection the probability of error per bit is

    Pb(E) = Q(√SNR) = Q(√(2 Eb/N0))

where

    Q(x) = (1/√(2π)) ∫_x^∞ e^(−y²/2) dy

is the Gaussian probability of error function. This baseline performance curve of Pb(E) versus Eb/N0 for uncoded transmission is plotted in Fig. 1. For example, in order to achieve a bit error probability of Pb(E) ≈ 10^-5, we must have Eb/N0 ≈ 9.1 (9.6 dB) for uncoded transmission.

On the other hand, the Shannon limit on Eb/N0 for η = 2 is Eb/N0 = 1.5 (1.76 dB), so the gap to capacity at the uncoded binary-PAM spectral efficiency of η = 2 is SNRnorm ≈ 7.8 dB. If a coding scheme with unlimited bandwidth expansion were allowed, i.e., η → 0, then a further gain of 3.35 dB to the ultimate Shannon limit on Eb/N0 of −1.59 dB would be achievable. These two limits are also shown in Fig. 1.

The performance curve of any practical coding scheme that improves on uncoded transmission must lie between the relevant Shannon limit and the uncoded performance curve. Thus, Fig. 1 defines the "playing field" for channel coding. The real coding gain of a coding scheme at a given probability of error per bit Pb(E) will be defined as the difference (in decibels) between the Eb/N0 required to obtain that Pb(E) with coding versus without coding.
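This baseline can be reproduced with a few lines of code. The sketch below (our own illustration, using only the standard library) evaluates Pb(E) = Q(√(2 Eb/N0)) and confirms that Eb/N0 ≈ 9.6 dB yields Pb(E) ≈ 10^-5.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pb_uncoded(ebn0_db):
    """Bit error probability of uncoded binary PAM: Pb(E) = Q(sqrt(2 Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return Q(math.sqrt(2.0 * ebn0))

# Uncoded transmission needs Eb/N0 of about 9.6 dB for Pb(E) of about 1e-5
print(pb_uncoded(9.6))
```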


Fig. 1. Pb(E) versus Eb/N0 for uncoded binary PAM, compared to Shannon limits on Eb/N0 for η = 2 and η → 0.

Thus, the maximum possible real coding gain at Pb(E) ≈ 10^-5 is about 11.2 dB.

For moderate-complexity binary linear (n, k, d) codes, it can often be assumed that the decoding error probability is dominated by the probability of making an error to one of the nearest-neighbor codewords. If this assumption holds, then it is easy to show that with optimum (minimum-Euclidean-distance) decoding, the decoding error probability P_B(E) per block is well approximated by the union bound estimate

    P_B(E) ≈ N_d Q(√(d · SNR)) = N_d Q(√((dk/n)(2 Eb/N0)))

where N_d denotes the number of codewords of Hamming weight d. The probability of decoding error per information bit Pb(E)² is then given by

    Pb(E) = P_B(E)/k ≈ (N_d/k) Q(√((dk/n)(2 Eb/N0))) = (N_d/k) Q(√(γc (2 Eb/N0)))

where the quantity γc = dk/n is called the nominal coding gain of the code.³ The real coding gain is less than the nominal coding gain γc if the "error coefficient" N_d/k is greater than one. A rule of thumb that is valid when the error coefficient N_d/k is not too large and Pb(E) is on the order of 10^-6 is that a factor of 2 increase in the error coefficient costs about 0.2 dB of real coding gain. As Pb(E) → 0, the real coding gain approaches the nominal coding gain γc, so γc is also called the asymptotic coding gain.

For example, consider the binary linear (32, 6, 16) "biorthogonal" block code, so called because the Euclidean images of the 64 codewords consist of 32 orthogonal vectors and their negatives. With this code, every codeword has N_d = 62 nearest neighbors at minimum Hamming distance d = 16. Its nominal spectral efficiency is η = 3/8, its nominal coding gain is γc = 3 (4.77 dB), and its probability of decoding error per information bit is

    Pb(E) ≈ (62/6) Q(√(6 Eb/N0))

² The probability of error per information bit is not, in general, the same as the bit error probability (average number of bit errors per transmitted bit), although both normally have the same exponent (argument of the Q function).

³ An (n, k, d) code with odd minimum distance d may be extended by addition of an overall parity check to an (n + 1, k, d + 1) even-minimum-distance code. For error correction, such an extension is of no use, since the extended code corrects no more errors but has a lower code rate; but for an AWGN channel, such an extension always helps (unless k = 1 and d = n), since the nominal coding gain γc = dk/n increases. Thus, an author who discusses odd-distance codes is probably thinking about minimum-Hamming-distance decoding, whereas an author who discusses even-distance codes is probably thinking about minimum-Euclidean-distance decoding.
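As a numerical check of the union bound estimate (an illustrative sketch; the helper names are ours), the (32, 6, 16) biorthogonal code at Eb/N0 = 5.8 dB should give Pb(E) near 10^-5, consistent with this code's real coding gain of about 3.8 dB on the AWGN channel.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pb_union_bound(ebn0_db, n, k, d, Nd):
    """Union bound estimate Pb(E) ~ (Nd/k) Q(sqrt(gamma_c * 2 Eb/N0)),
    where gamma_c = dk/n is the nominal coding gain."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    gamma_c = d * k / n
    return (Nd / k) * Q(math.sqrt(gamma_c * 2.0 * ebn0))

# (32, 6, 16) biorthogonal code: Nd = 62, nominal coding gain gamma_c = 3 (4.77 dB)
print(pb_union_bound(5.8, 32, 6, 16, 62))
```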



Fig. 2. Pb(E) versus Eb/N0 for the (32, 6, 16) biorthogonal block code, compared to uncoded PAM and Shannon limits.

This estimate is plotted in Fig. 2. We see that this code requires Eb/N0 ≈ 5.8 dB to achieve Pb(E) ≈ 10^-5, so its real coding gain at this error probability is about 3.8 dB.

At this point, we can already identify two issues that must be addressed to approach the Shannon limit on AWGN channels. First, in order to obtain optimum performance, the decoder must operate on the real-valued received sequence y ("soft decisions") and minimize Euclidean distance, rather than quantize to a two-level received sequence ("hard decisions") and minimize Hamming distance. It can be shown that hard decisions (two-level quantization) generally cost 2 to 3 dB in decoding performance. Thus, in order to approach the Shannon limit on an AWGN channel, the error-correction paradigm of algebraic coding must be modified to accommodate soft decisions.

Second, we can see already that decoding complexity is going to be an issue. For optimum decoding, soft-decision or hard, the decoder must choose the best of 2^k codewords, so a straightforward exhaustive optimum decoding algorithm will require on the order of 2^k computations. Thus, as codes become large, lower complexity decoding algorithms that approach optimum performance must be devised.

B. The Earliest Codes: Hamming, Golay, and Reed–Muller

The first nontrivial code to appear in the literature was the (7, 4, 3) Hamming code, mentioned by Shannon in his original paper [1]. Richard Hamming, a colleague of Shannon at Bell Labs, developed an infinite class of single-error-correcting (d = 3) binary linear codes, with parameters (n = 2^m − 1, k = 2^m − 1 − m, d = 3) for m ≥ 2 [10]. Thus, k/n → 1 and η → 2 as m → ∞, while γc → 3 (4.77 dB). However, even with optimum soft-decision decoding, the real coding gain of Hamming codes on the AWGN channel never exceeds about 3 dB.

The Hamming codes are "perfect," in the sense that the spheres of Hamming radius 1 about each of the 2^k codewords contain 2^m binary n-tuples and thus form a "perfect" (exhaustive) partition of binary n-space (F_2)^n. Shortly after the publication of Shannon's paper, the Swiss mathematician Marcel Golay published a half-page paper [11] with a "perfect" binary linear (23, 12, 7) triple-error-correcting code, in which the spheres of Hamming radius 3 about each of the 2^12 codewords (containing (23 choose 0) + (23 choose 1) + (23 choose 2) + (23 choose 3) = 2^11 binary n-tuples) form an exhaustive partition of (F_2)^23, and also a similar "perfect" (11, 6, 5) double-error-correcting ternary code. These binary and ternary Golay codes have come to be considered probably the most remarkable of all algebraic block codes, and it is now known that no other nontrivial "perfect" linear codes exist. Berlekamp [12] characterized Golay's paper as the "best single published page" in coding theory during 1948–1973.

Another early class of error-correcting codes was the Reed–Muller (RM) codes, introduced in 1954.



Fig. 3. Tableau of Reed–Muller codes.

The RM codes were introduced in 1954 by David Muller [13] and then reintroduced shortly thereafter with an efficient decoding algorithm by Irving Reed [14]. The RM(r, m) codes are a class of multiple-error-correcting (n, k, d) codes parameterized by two integers r and m, 0 ≤ r ≤ m, such that n = 2^m and d = 2^(m−r). The RM(0, m) code is the (2^m, 1, 2^m) binary repetition code (consisting of two codewords, the all-zero and all-one words), and the RM(m, m) code is the (2^m, 2^m, 1) binary code consisting of all binary 2^m-tuples (i.e., uncoded transmission).

Starting with RM(0, 1) = (2, 1, 2) and RM(1, 1) = (2, 2, 1), the RM codes may be constructed recursively by the length-doubling |u|u+v| (Plotkin, squaring) construction as follows:

    RM(r, m) = {(u, u + v) : u ∈ RM(r, m − 1), v ∈ RM(r − 1, m − 1)}.

From this construction it follows that the dimension k of RM(r, m) is given recursively by

    k(r, m) = k(r, m − 1) + k(r − 1, m − 1)

or nonrecursively by k(r, m) = Σ_{i=0}^{r} (m choose i).

Fig. 3 shows the parameters of the RM codes of lengths ≤ 32 in a tableau that reflects this length-doubling construction. For example, the RM(2, 5) code is a (32, 16, 8) code that can be constructed from the RM(2, 4) = (16, 11, 4) code and the RM(1, 4) = (16, 5, 8) code.

RM codes include several important subclasses of codes. We have already mentioned the (2^m, 2^m, 1) codes consisting of all binary 2^m-tuples and the (2^m, 1, 2^m) repetition codes. RM codes also include the (2^m, 2^m − 1, 2) single-parity-check (SPC) codes, the (2^m, 2^m − m − 1, 4) extended Hamming codes,⁴ the (2^m, m + 1, 2^(m−1)) biorthogonal codes, and, for odd m, a class of (2^m, 2^(m−1), 2^((m+1)/2)) self-dual codes.⁵

Reed [14] introduced a low-complexity hard-decision error-correction algorithm for RM codes based on a simple majority-logic decoding rule. This simple decoding rule is able to correct all hard-decision error patterns of weight ⌊(d − 1)/2⌋ or less, which is the maximum possible (i.e., it is a bounded-distance decoding rule). This simple majority-logic, hard-decision decoder was attractive for the technology of the 1950s and 1960s.

RM codes are thus an infinite class of codes with flexible parameters that can achieve near-optimal decoding on a BSC with a simple decoding algorithm. This was an important advance over the Hamming and Golay codes, whose parameters are much more restrictive.

Performance curves with optimum hard-decision decoding are shown in Fig. 4 for the (31, 26, 3) Hamming code, the (23, 12, 7) Golay code, and the (31, 16, 7) shortened RM code. We see that they achieve real coding gains at Pb(E) ≈ 10^-5 of only 0.9, 2.3, and 1.6 dB, respectively. The reasons for this poor performance are the use of hard decisions, which costs roughly 2 dB, and the fact that by modern standards these codes are very short.

It is clear from the tableau of Fig. 3 that RM codes are not asymptotically "good."

⁴ An (n, k, d) code can be extended by adding code symbols or shortened by deleting code symbols; see footnote 3.

⁵ An (n, k, d) binary linear code forms a k-dimensional subspace of the vector space (F_2)^n. The dual of an (n, k, d) code is the (n − k)-dimensional orthogonal subspace. A code that equals its dual is called self-dual. For self-dual codes, it follows that k = n/2.


Fig. 4. Pb(E) versus Eb/N0 for the (31, 26, 3) Hamming code, (23, 12, 7) Golay code, and (31, 16, 7) shortened RM code with optimum hard-decision decoding, compared to uncoded binary PAM.

That is, there is no sequence of (n, k, d) RM codes of increasing length n such that both k/n and d/n are bounded away from zero as n → ∞. Since asymptotic goodness was the Holy Grail of algebraic coding theory (it is easy to show that typical random binary codes are asymptotically good), and since codes with somewhat better (n, k, d) (e.g., BCH codes) were found subsequently, theoretical attention soon turned away from RM codes.

However, in recent years it has been recognized that "RM codes are not so bad." RM codes are particularly good in terms of performance versus complexity with trellis-based decoding and other soft-decision decoding algorithms, as we note in Section IV-E. Finally, they are almost as good in terms of (n, k, d) as the best binary codes known for lengths less than 128, which is the principal application domain of algebraic block codes.

Indeed, with optimum decoding, RM codes may be "good enough" to reach the Shannon limit on the AWGN channel. Notice that the nominal coding gains of the self-dual RM codes and the biorthogonal codes become infinite as m → ∞. It is known that with optimum (minimum-Euclidean-distance) decoding, the real coding gain of the biorthogonal codes does asymptotically approach the ultimate Shannon limit, albeit with exponentially increasing complexity and vanishing spectral efficiency. It seems likely that the real coding gains of the self-dual RM codes with optimum decoding approach the Shannon limit at the nonzero spectral efficiency of η = 1, albeit with exponential complexity, but to our knowledge this has never been proved.

C. Soft Decisions: Wagner Decoding

On the road to modern capacity-approaching codes for AWGN channels, an essential step has been to replace hard-decision with soft-decision decoding, i.e., decoding that takes into account the reliability of received channel outputs.

The earliest soft-decision decoding algorithm known to us is Wagner decoding, described in [15] and attributed to C. A. Wagner. It is an optimum decoding rule for the special class of (n, n − 1, 2) SPC codes. Each received real-valued symbol r_k from an AWGN channel may be represented in sign-magnitude form, where the sign sgn(r_k) indicates the "hard decision," and the magnitude |r_k| indicates the "reliability" of r_k. The Wagner rule is: first check whether the hard-decision binary n-tuple is a codeword. If so, accept it. If not, then flip the hard decision corresponding to the output r_k that has the minimum reliability |r_k|.

It is easy to show that the Wagner rule finds the minimum-Euclidean-distance codeword, i.e., that Wagner decoding is optimum for an (n, n − 1, 2) SPC code. Moreover, Wagner decoding is much simpler than exhaustive minimum-distance decoding, which requires on the order of 2^(n−1) computations.

D. BCH and Reed–Solomon Codes

In the 1960s, research in channel coding was dominated by the development of algebraic block codes, particularly cyclic codes.
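Returning briefly to the Wagner rule of Section III-C: it is simple enough to state in a few lines of code. This sketch (ours, assuming the illustrative mapping bit 0 ↔ +1, bit 1 ↔ −1) decodes an (n, n − 1, 2) SPC code from real-valued channel outputs.

```python
def wagner_decode_spc(y):
    """Wagner decoding of an (n, n-1, 2) single-parity-check code.

    y: received real-valued n-tuple (2-PAM convention here: +1 for bit 0,
       -1 for bit 1, plus Gaussian noise).
    Returns the decoded binary n-tuple, which always has even overall parity.
    """
    # Hard decisions from the signs; reliabilities are the magnitudes.
    bits = [0 if r >= 0 else 1 for r in y]
    if sum(bits) % 2 != 0:
        # Hard-decision word is not a codeword: flip the least reliable position.
        worst = min(range(len(y)), key=lambda i: abs(y[i]))
        bits[worst] ^= 1
    return bits

# Codeword (0,0,0,0) sent as (+1,+1,+1,+1); noise flips position 2,
# but that position is also the least reliable, so the flip is corrected.
print(wagner_decode_spc([0.9, 1.1, -0.1, 0.8]))   # [0, 0, 0, 0]
```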



The algebraic coding paradigm used the structure of finite-field algebra to design efficient encoding and error-correction procedures for linear block codes operating on a hard-decision channel. The emphasis was on constructing codes with a guaranteed minimum distance d and then using the algebraic structure of the codes to design bounded-distance error-correction algorithms whose complexity grows only as a small power of d. In particular, the goal was to develop flexible classes of easily implementable codes with better performance than RM codes.

Cyclic codes are codes that are invariant under cyclic ("end-around") shifts of n-tuple codewords. They were first investigated by Eugene Prange in 1957 [16] and became the primary focus of research after the publication of Wesley Peterson's pioneering text in 1961 [4]. Cyclic codes have a nice algebraic theory and attractively simple encoding and decoding procedures based on cyclic shift-register implementations. Hamming, Golay, and shortened RM codes can be put into cyclic form.

The "big bang" in this field was the invention of Bose–Chaudhuri–Hocquenghem (BCH) and Reed–Solomon (RS) codes in three independent papers in 1959 and 1960 [17]–[19]. It was shortly recognized that RS codes are a class of nonbinary BCH codes, or alternatively that BCH codes are subfield subcodes of RS codes.

Binary BCH codes include a large class of t-error-correcting cyclic codes of length n = 2^m − 1, odd minimum distance d = 2t + 1, and dimension k ≥ n − mt. Compared to shortened RM codes of a given length n = 2^m − 1, there are more codes from which to choose, and for n ≥ 63 the BCH codes can have a somewhat larger dimension k for a given minimum distance d. However, BCH codes are still not asymptotically "good." Although they are the premier class of binary algebraic block codes, they have not been used much in practice, except as "cyclic redundancy check" (CRC) codes for error detection in automatic-repeat-request (ARQ) systems.

In contrast, the nonbinary RS codes have proved to be highly useful in practice (although not necessarily in cyclic form). An (extended or shortened) RS code over the finite field F_q, q = 2^m, can have any block length up to n = q + 1, any minimum distance d ≤ n (where Hamming distance is defined in terms of q-ary symbols), and dimension k = n − d + 1, which meets an elementary upper bound called the Singleton bound [20]. In this sense, RS codes are optimum.

An important property of RS and BCH codes is that they can be efficiently decoded by algebraic decoding algorithms using finite-field arithmetic. A glance at the tables of contents of the IEEE Transactions on Information Theory shows that the development of such algorithms was one of the most active research fields of the 1960s.

Already by 1960, Peterson had developed an error-correction algorithm with complexity on the order of d^3 [21]. In 1968, Elwyn Berlekamp [5] devised an error-correction algorithm with complexity on the order of d^2, which was interpreted by Jim Massey [22] as an algorithm for finding the shortest linear feedback shift register that can generate a certain sequence. This Berlekamp–Massey algorithm became the standard for the next decade. Finally, it was shown that these algorithms could be straightforwardly extended to correct both erasures and errors [23] and even to correct soft decisions [24], [25] (suboptimally, but in some cases asymptotically optimally).

The fact that RS codes are inherently nonbinary (the longest binary RS code has length 3) may cause difficulties in using them over binary channels. If the 2^m-ary RS code symbols are simply represented as binary m-tuples and sent over a binary channel, then a single binary error can cause an entire 2^m-ary symbol to be incorrect; this causes RS codes to be inferior to BCH codes as binary-error-correcting codes. However, in this mode RS codes are inherently good burst-error-correcting codes, since the effect of an m-bit burst that is concentrated in a single RS code symbol is only a single symbol error. In fact, it can be shown that RS codes are effectively optimal binary burst-error-correcting codes [26].

The ability of RS codes to correct both random and burst errors makes them particularly well suited for applications such as magnetic tape and disk storage, where imperfections in the storage media sometimes cause bursty errors. They are also useful as outer codes in concatenated coding schemes, to be discussed in Section IV-D. For these reasons, RS codes are probably the most widely deployed codes in practice.

E. RS Code Implementations

The first major application of RS codes was as outer codes in concatenated coding systems for deep-space communications. For the 1977 Voyager mission, the Jet Propulsion Laboratory (JPL) used a (255, 223, 33), 16-error-correcting RS code over F_256 as an outer code, with a rate-1/2, 64-state convolutional inner code (see also Section IV-D). The RS decoder used special-purpose hardware for decoding and was capable of running at up to about 1 Mb/s [27]. This concatenated convolutional/RS coding system became a NASA standard.

The year 1980 saw the first major commercial application of RS codes in the compact disc (CD) standard. This system used two short RS codes over F_256, namely (32, 28, 5) and (28, 24, 5) RS codes, and operated at bit rates on the order of 4 Mb/s [28]. All subsequent audio and video magnetic storage systems have used RS codes for error correction, nowadays at much higher rates.

Cyclotomics, Inc., built a prototype "hypersystolic" RS decoder in 1986–1988 that was capable of decoding a (63, 53, 11) RS code over F_64 at bit rates approaching 1 Gb/s [29]. This decoder may still hold the RS decoding speed record.

RS codes continue to be preferred for error correction when the raw channel error rate is not too large.

Vol. 95, No. 6, June 2007 | Proceedings of the IEEE 1157


they can provide substantial error-correction power with relatively small redundancy at data rates up to tens or hundreds of megabits per second. They also work well against bursty errors. In these respects, they complement modern capacity-approaching codes.

F. The "Coding is Dead" Workshop
The first IEEE Communication Theory Workshop, in St. Petersburg, Florida, in April 1971, became famous as the "coding is dead" workshop. No written record of this workshop seems to have survived. However, Bob Lucky wrote a column about it many years later in IEEE Spectrum [30], [134]. Lucky recalls the following:

"A small group of us in the communications field will always remember a workshop held in Florida about 20 years ago ... One of my friends [Ned Weldon] gave a talk that has lived in infamy as the 'coding is dead' talk. His thesis was that he and the other coding theorists formed a small, inbred group that had been isolated from reality for too long. He illustrated this talk with a single slide showing a pen of rats that psychologists had penned in a confined space for an extensive period of time. I cannot tell you what those rats were doing, but suffice it to say that the slide has since been borrowed many times to depict the depths of depravity into which such a disconnected group can fall ..."

Of course, as Lucky goes on to say, the irony is that since 1971 coding has flourished and become embedded in practically all communications applications. He asks plaintively, "Why are we technologists so bad at predicting the future of technology?"

From today's perspective, one answer to this question could be that what Weldon was really asserting was that "algebraic coding is dead" (or at least had reached the point of diminishing returns).

Another answer was given on the spot by Irwin Jacobs, who stood up in the back row, flourished a medium-scale-integrated circuit (perhaps a 4-bit shift register), and asserted that "This is the future of coding." Elwyn Berlekamp said much the same thing. Interestingly, Jacobs and Berlekamp went on to lead the two principal coding companies of the 1970s, Linkabit and Cyclotomics, the one championing convolutional codes, and the other, block codes.

History has shown that both answers were right. Coding has moved from theory to practice in the past 35 years because: 1) other classes of coding schemes have supplanted the algebraic coding paradigm and 2) advances in integrated circuit technology have ultimately allowed designers to implement any (polynomial-complexity) algorithm that they can think of. Today's technology is on the order of a million times faster than that of 1971. Even though Moore's law had already been propounded in 1971, it seems to be hard for the human mind to grasp what a factor of 10^6 can make possible.

G. Further Developments in Algebraic Coding Theory
Of course algebraic coding theory has not died; it continues to be an active research area. A recent text in this area is Roth [31].

A new class of block codes based on algebraic geometry (AG) was introduced by Goppa in the late 1970s [32], [33]. Tsfasman, Vladut, and Zink [34] constructed AG codes over nonbinary fields Fq with q ≥ 49 whose minimum distance as n → ∞ surpasses the Gilbert–Varshamov bound (the best known lower bound on the minimum distance of block codes), which is perhaps the most notable achievement of AG codes. AG codes are generally much longer than RS codes and can usually be decoded by extensions of RS decoding algorithms. However, AG codes have not yet been adopted for practical applications. For a nice survey of this field, see [35].

In 1997, Sudan [36] introduced a list decoding algorithm based on polynomial interpolation for decoding beyond the guaranteed error-correction distance of RS and related codes.6 Although in principle there may be more than one codeword within such an expanded distance, in fact, with high probability, only one will occur. Guruswami and Sudan [38] further improved the algorithm and its decoding radius, and Koetter and Vardy [39] extended it to handle soft decisions. There is currently some hope that algorithms of this type will be used in practice.

Other approaches to soft-decision decoding algorithms have continued to be developed, notably the ordered-statistics approach of Fossorier and Lin (see, e.g., [40]), whose roots can be traced back to Wagner decoding.

IV. PROBABILISTIC CODING
"Probabilistic coding" is a name for an alternative line of development that was more directly inspired by Shannon's probabilistic approach to coding. Whereas algebraic coding theory aims to find specific codes that maximize the minimum distance d for a given (n, k), probabilistic coding is more concerned with finding classes of codes that optimize average performance as a function of coding and decoding complexity. Probabilistic decoders typically use soft-decision (reliability) information, both as inputs (from the channel outputs) and at intermediate stages of the decoding process. Classical coding schemes that fall into this class include convolutional codes, product codes, concatenated codes, trellis-coded modulation, and trellis decoding of block codes. Popular textbooks that emphasize

6List decoding was an (unpublished) invention of Elias; see [37].


the probabilistic view of coding include Wozencraft and Jacobs [41], Gallager [42], Clark and Cain [43], Lin and Costello [44], Johannesson and Zigangirov [45], and the forthcoming book by Richardson and Urbanke [46].

For many years, the competition between the algebraic and probabilistic approaches was cast as a competition between block codes and convolutional codes. Convolutional coding was motivated from the start by the objective of optimizing the tradeoff of performance versus complexity, which on the binary-input AWGN channel necessarily implies soft decisions and quasi-optimal decoding. In practice, most channel coding systems have used convolutional codes. Modern capacity-approaching codes are the ultimate fruit of this line of development.

A. Elias' Invention of Convolutional Codes
Convolutional codes were invented by Elias in 1955 [47], [135]–[137]. Elias' goal was to find classes of codes for the binary symmetric channel (BSC) with as much structure as possible, without loss of performance. Elias' several contributions have been nicely summarized by Bob Gallager, who was Elias' student [48]:

"[Elias'] 1955 paper ... was perhaps the most influential early paper in information theory after Shannon's. This paper takes several important steps toward realizing the promise held out by Shannon's paper ....

The first major result of the paper is a derivation of upper and lower bounds on the smallest achievable error probability on a BSC using codes of a given block length n. These bounds decrease exponentially with n for any data rate R less than the capacity C. Moreover, the upper and lower bounds are substantially the same over a significant range of rates up to capacity. This result shows that: 1) achieving a small error probability at any rate near capacity necessarily requires a code with a long block length; 2) almost all randomly chosen codes perform essentially as well as the best codes; that is, most codes are good codes.

Consequently, Elias turned his attention to finding classes of codes that have some special structure, so as to simplify implementation, without sacrificing average performance over the class. His second major result is that the special class of linear codes has the same average performance as the class of completely random codes. Encoding of linear codes is fairly simple, and the symmetry and special structure of these codes led to a promise of simplified decoding strategies .... In practice, practically all codes are linear.

Elias' third major result was the invention of (linear time-varying) convolutional codes .... These codes are even simpler to encode than general linear codes, and they have many other useful qualities. Elias showed that convolutional codes also have the same average performance as randomly chosen codes."

We may mention at this point that Gallager's doctoral thesis on LDPC codes, supervised by Elias, was similarly motivated by the problem of finding a class of "random-like" codes that could be decoded near capacity with quasi-optimal performance and feasible complexity [49].

Linearity is the only algebraic property that is shared by convolutional codes and algebraic block codes.7 The additional structure introduced by Elias was later understood as the dynamical structure of a discrete-time, k-input, n-output finite-state Markov process. A convolutional code is characterized by its code rate k/n, where k and n are typically small integers, and by the number of its states, which is often closely related to decoding complexity.

In more recent terms, Elias' and Gallager's codes can be represented as "codes on graphs," in which the complexity of the graph increases only linearly with the code block length. This is why convolutional codes are useful as components of turbo coding systems. In this light, there is a fairly straight line of development from Elias' invention to modern capacity-approaching codes. Nonetheless, this development actually took the better part of a half-century.

B. Convolutional Codes in the 1960s and 1970s
Shortly after Elias' paper, Jack Wozencraft recognized that the tree structure of convolutional codes permits decoding by a sequential search algorithm [51]. Sequential decoding became the subject of intense research at MIT, culminating in the development of the fast, storage-free Fano sequential decoding algorithm [52] and an analytical proof that the rate of a sequential decoding system is bounded by the computational cutoff rate R0 [53].

Subsequently, Jim Massey proposed a very simple decoding method for convolutional codes, called threshold decoding [54]. Burst-error-correcting variants of threshold decoding developed by Massey and Gallager proved to be quite suitable for practical error correction [26]. Codex Corporation was founded in 1962 around the Massey and Gallager codes (including LDPC codes, which were never seriously considered for practical implementation). Codex built hundreds of burst-error-correcting threshold decoders during the 1960s, but the business never grew very large, and Codex left it in 1970.

In 1967, Andy Viterbi introduced what became known as the Viterbi algorithm (VA) as an "asymptotically optimal" decoding algorithm for convolutional codes, in order to prove exponential error bounds [55]. It was

7Linear convolutional codes have the algebraic structure of discrete-time multi-input multi-output linear dynamical systems [50], but this is rather different from the algebraic structure of linear block codes.


Fig. 5. Concatenated code.
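The parameter arithmetic behind the construction of Fig. 5 (discussed in Section IV-D below) is simple enough to check directly. The following sketch is our own illustration, using the (n, k, d) triple notation of the text:

```python
def concatenated_params(inner, outer):
    """Parameters of a concatenated code: an (n1, k1, d1) inner binary
    code and an (n2, k2, d2) outer RS code over GF(2^k1) yield an
    (n1*n2, k1*k2, d1*d2) binary linear block code."""
    (n1, k1, d1), (n2, k2, d2) = inner, outer
    return (n1 * n2, k1 * k2, d1 * d2)

# The example from the text: (7, 4, 3) Hamming inner code,
# (15, 11, 5) RS outer code over F16.
assert concatenated_params((7, 4, 3), (15, 11, 5)) == (105, 44, 15)
```

The same multiplicative rule gives the parameters of Elias' product codes, since both constructions multiply lengths, dimensions, and minimum distances.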

quickly recognized [56], [57] that the VA was actually an optimum decoding algorithm. More importantly, Jerry Heller at the Jet Propulsion Laboratory (JPL) [58], [59] realized that relatively short convolutional codes decoded by the VA were potentially quite practical; e.g., a 64-state code could obtain a sizable real coding gain, on the order of 6 dB.

Linkabit Corporation was founded by Irwin Jacobs, Len Kleinrock, and Andy Viterbi in 1968 as a consulting company. In 1969, Jerry Heller was hired as Linkabit's first full-time employee. Shortly thereafter, Linkabit built a prototype 64-state Viterbi algorithm decoder ("a big monster filling a rack" [60]), capable of running at 2 Mb/s [61].

During the 1970s, through the leadership of Linkabit and JPL, the VA became part of the NASA standard for deep-space communication. Around 1975, Linkabit developed a relatively inexpensive, flexible, and fast VA chip. The VA soon began to be incorporated into many other communications applications.

Meanwhile, although a convolutional code with sequential decoding was the first code in space (for the 1968 Pioneer 9 mission [56]), and a few prototype sequential decoding systems were built, sequential decoding never took off in practice. By the time electronics technology could support sequential decoding, the VA had become a more attractive alternative. However, there seems to be a current resurgence of interest in sequential decoding for specialized applications [62].

C. Soft Decisions: APP Decoding
Part of the attraction of convolutional codes is that all of these convolutional decoding algorithms are inherently capable of using soft decisions, without any essential increase in complexity. In particular, the VA implements minimum-Euclidean-distance sequence detection on an AWGN channel.

An alternative approach to using reliability information is to try to compute (exactly or approximately) the a posteriori probability (APP) of each transmitted bit being a zero or a one, given the APPs of each received symbol. In his thesis, Gallager [49] developed an iterative message-passing APP decoding algorithm for LDPC codes, which seems to have been the first appearance in any literature of the now-ubiquitous "sum–product algorithm" (also called "belief propagation"). At about the same time, Massey [54] developed an APP version of threshold decoding.

In 1974, Bahl, Cocke, Jelinek, and Raviv [63] published an algorithm for APP decoding of convolutional codes, now called the BCJR algorithm. Because this algorithm is more complicated than the VA (for one thing, it is a forward–backward rather than a forward-only algorithm) and its performance is more or less the same, it did not supplant the VA for decoding convolutional codes. However, because it is a soft-input soft-output (SISO) algorithm (i.e., APPs in, APPs out), it became a key element of iterative turbo decoding (see Section VI). Theoretically, it is now recognized as an implementation of the sum–product algorithm on a trellis.

D. Product Codes and Concatenated Codes
Before inventing convolutional codes, Elias had invented another class of codes now known as product codes [64]. The product of an (n1, k1, d1) with an (n2, k2, d2) binary linear block code is an (n1n2, k1k2, d1d2) binary linear block code. A product code may be decoded, simply but suboptimally, by independent decoding of the component codes. Elias showed that with a repeated product of extended Hamming codes, an arbitrarily low error probability could be achieved at a nonzero code rate, albeit at a code rate well below the Shannon limit.

In 1966, Forney introduced concatenated codes [65]. As originally conceived, a concatenated code involves a serial cascade of two linear block codes: an outer (n2, k2, d2) nonbinary RS code over a finite field Fq with q = 2^k1 elements, and an inner (n1, k1, d1) binary code with q = 2^k1 codewords (see Fig. 5). The resulting concatenated code is an (n1n2, k1k2, d1d2) binary linear block code. The key idea is that the inner and outer codes may be relatively short codes that are easy to encode and decode, whereas the concatenated code is a longer, more powerful code. For example, if the outer code is a (15, 11, 5) RS code over F16 and the inner code is a (7, 4, 3) binary Hamming code, then the concatenated code is a much more powerful (105, 44, 15) code.

The two-stage decoder shown in Fig. 5 is not optimum, but is capable of correcting a wide variety of error patterns. For example, any error pattern that causes at most one error in each of the inner codewords will be corrected. In addition, if bursty errors cause one or two inner codewords to be decoded incorrectly, they will appear as correctable symbol errors to the outer decoder. The overall result is a long, powerful code with a simple, suboptimum decoder that can correct many combinations of burst and random errors. Forney showed that with a proper choice of the


constituent codes, concatenated coding schemes could operate at any code rate up to the Shannon limit, with exponentially decreasing error probability, but only polynomial decoding complexity.

Concatenation can also be applied to convolutional codes. In fact, the most common concatenated code used in practice is one developed in the 1970s as a NASA standard (mentioned in Section III-E). It consists of an inner rate-1/2, 64-state convolutional code with minimum distance8 d = 10 along with an outer (255, 223, 33) RS code over F256. The inner decoder uses soft-decision Viterbi decoding, while the outer decoder uses the hard-decision Berlekamp–Massey algorithm. Also, since the decoding errors made by the Viterbi algorithm tend to be bursty, a symbol interleaver is inserted between the two encoders and a de-interleaver between the two decoders.

In the late 1980s, a more complex concatenated coding scheme with iterative decoding was proposed by Erik Paaske [66], and independently by Oliver Collins [67], to improve the performance of the NASA concatenated coding standard. Instead of a single outer RS code, Paaske and Collins proposed to use several outer RS codes of different rates. After one round of decoding, the outputs of the strongest (lowest rate) RS decoders may be deemed to be reliable and thus may be fed back to the inner (Viterbi) convolutional decoder as known bits for another round of decoding. Performance improvements of about 1.0 dB were achieved after a few iterations. This scheme was used to rescue the 1992 Galileo mission (see also Section IV-F). Also, in retrospect, its use of iterative decoding with a concatenated code may be seen as a precursor of turbo codes (see, for example, the paper by Hagenauer et al. [68]).

E. Trellis Decoding of Block Codes
A convolutional code may be viewed as the output sequence of a discrete-time finite-state system. By rolling out the state-transition diagram of such a system in time, we get a picture called a trellis diagram, which explicitly displays every possible state sequence, and also every possible output sequence (if state transitions are labelled by the corresponding outputs). With such a trellis representation of a convolutional code, it becomes obvious that on a memoryless channel the Viterbi algorithm is a maximum-likelihood sequence detection algorithm [56].

The success of VA decoding of convolutional codes led to the idea of representing a block code by a (necessarily time-varying) trellis diagram with as few states as possible and then using the VA to decode it. Another fundamental contribution of the BCJR paper [63] was to show that every (n, k, d) binary linear code may be represented by a trellis diagram with at most min{2^k, 2^(n−k)} states.9

The subject of minimal trellis representations of block codes became an active research area during the 1990s. Given a linear block code with a fixed coordinate ordering, it turns out that there is a unique minimal trellis representation; however, finding the best coordinate ordering is an NP-hard problem. Nonetheless, optimal coordinate orderings for many classes of linear block codes have been found. In particular, the optimum coordinate ordering for Golay and RM codes is known, and the resulting trellis diagrams are rather nice. On the other hand, the state complexity of any class of "good" block codes must increase exponentially as n → ∞. An excellent summary of this field by Vardy appears in [70].

In practice, this approach was superseded by the advent of turbo and LDPC codes, to be discussed in Section VI.

F. History of Coding for Deep-Space Applications
The deep-space communications application is the arena in which the most powerful coding schemes for the power-limited AWGN channel have been first deployed, because:
1) the only noise is AWGN in the receiver front end;
2) bandwidth is effectively unlimited;
3) fractions of a decibel have huge scientific and economic value;
4) receiver (decoding) complexity is effectively unlimited.
As we have already noted, for power-limited AWGN channels, there is a negligible penalty to using binary codes with binary modulation rather than more general modulation schemes.

The first coded scheme to be designed for space applications was a simple (32, 6, 16) biorthogonal code for the Mariner missions (1969), which can be optimally soft-decision decoded using a fast Hadamard transform. Such a scheme can achieve a nominal coding gain of 3 (4.8 dB). At a target bit error probability of Pb(E) ≈ 5 × 10^−3, the real coding gain achieved was only about 2.2 dB.

The first coded scheme actually to be launched into space was a rate-1/2 convolutional code with constraint length10 ν = 20 (2^20 states) for the Pioneer 1968 mission [3]. The receiver used 3-bit-quantized soft decisions and sequential decoding implemented on a general-purpose 16-bit minicomputer with a 1 MHz clock rate. At a rate of 512 b/s, the real coding gain achieved at Pb(E) ≈ 5 × 10^−3 was about 3.3 dB.

During the 1970s, as noted in Sections III-E and IV-D, the NASA standard became a concatenated coding scheme based on a rate-1/2, 64-state inner convolutional code and a (255, 223, 33) Reed–Solomon outer code over F256. The overall rate of this code is 0.437, and it achieves an

8The minimum distance between infinite code sequences in a convolutional code is also known as the free distance.
9This result is usually attributed to a subsequent paper by Wolf [69].
10The constraint length ν is the dimension of the state space of a convolutional encoder; the number of states is thus 2^ν.
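To make the trellis viewpoint concrete, here is a minimal hard-decision Viterbi decoder for a toy rate-1/2 convolutional code with constraint length ν = 2 (4 states) and the common generators (7, 5) in octal. This is our own illustrative sketch; the deployed decoders described in the text used soft decisions and far larger codes.

```python
# A minimal hard-decision Viterbi decoder for a toy rate-1/2, 4-state
# (nu = 2) convolutional code with generators (7, 5) in octal.
G = (0b111, 0b101)  # generator polynomials (octal 7 and 5)

def conv_encode(bits):
    """Encode a bit list; two output bits per input bit, zero initial state."""
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state            # reg = (b_t, b_{t-1}, b_{t-2})
        out += [bin(reg & g).count("1") & 1 for g in G]
        state = reg >> 1                  # next state = (b_t, b_{t-1})
    return out

def viterbi_decode(received):
    """Minimum-Hamming-distance (ML for the BSC) sequence detection."""
    INF = float("inf")
    metrics = [0, INF, INF, INF]          # start in the all-zero state
    history = []
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new, decisions = [INF] * 4, [None] * 4
        for state in range(4):
            if metrics[state] == INF:
                continue
            for b in (0, 1):              # extend each survivor by one branch
                reg = (b << 2) | state
                branch = [bin(reg & g).count("1") & 1 for g in G]
                m = metrics[state] + sum(x != y for x, y in zip(r, branch))
                nxt = reg >> 1
                if m < new[nxt]:          # keep the better path into each state
                    new[nxt], decisions[nxt] = m, (state, b)
        metrics = new
        history.append(decisions)
    # trace back the survivor from the best final state
    state = min(range(4), key=lambda s: metrics[s])
    bits = []
    for decisions in reversed(history):
        state, b = decisions[state]
        bits.append(b)
    return bits[::-1]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
noisy = conv_encode(msg)
noisy[3] ^= 1                             # one channel bit error
assert viterbi_decode(noisy) == msg       # the error is corrected
```

Since this code has free distance 5, any single channel error (and most double errors) is corrected; the decoding work per trellis section grows with the number of states, as noted in footnote 10.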


impressive 7.3 dB real coding gain at Pb(E) ≈ 10^−5; i.e., its gap to capacity (SNRnorm) is only about 2.5 dB (see Fig. 6).

When the primary antenna failed to deploy on the Galileo mission (circa 1992), an elaborate concatenated coding scheme using a rate-1/6, 2^14-state inner convolutional code with a Big Viterbi Decoder (BVD) and a set of variable-strength RS outer codes was reprogrammed into the spacecraft computers (see Section IV-D). This scheme was able to achieve Pb(E) ≈ 2 × 10^−7 at Eb/N0 ≈ 0.8 dB, for a real coding gain of about 10.2 dB.

Finally, within the last decade, turbo codes and LDPC codes for deep-space communications have been developed to get within 1 dB of the Shannon limit, and these are now becoming industry standards (see Section VI-H).

For a more comprehensive history of coding for deep-space channels, see [71].

V. CODES FOR BANDWIDTH-LIMITED CHANNELS
Most work on channel coding has focussed on binary codes. However, on a bandwidth-limited AWGN channel, in order to obtain a spectral efficiency ρ > 2 b/s/Hz, some kind of nonbinary coding must be used. Early work, primarily theoretical, focussed on lattice codes, which in many respects are analogous to binary linear block codes. The practical breakthrough in this field came with Ungerboeck's invention of trellis-coded modulation, which is similarly analogous to convolutional coding.

A. Coding for the Bandwidth-Limited AWGN Channel
Coding schemes for a bandwidth-limited AWGN channel typically use two-dimensional quadrature amplitude modulation (QAM). A sequence of QAM symbols may be sent through a channel of bandwidth W at a symbol rate up to the Nyquist limit of W QAM symbols per second. If the information rate is ρ bits per QAM symbol, then the nominal spectral efficiency is also ρ bits per second per Hertz.

An uncoded baseline scheme is simply to use a square M × M QAM constellation, where M is even, typically a power of two. The information rate is then ρ = 2 log2 M bits per QAM symbol. The average energy of such a constellation is easily shown to be

Es = (M^2 − 1)d^2/6 = (2^ρ − 1)d^2/6

where d is the minimum Euclidean distance between constellation points. Since SNR = Es/N0, it is then straightforward to show that with optimum modulation and detection the probability of error per QAM symbol is

Ps(E) ≈ 4Q(√(3 · SNRnorm))

where Q(x) is again the Gaussian probability of error function.
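This baseline formula can be inverted numerically. The following quick check is our own, using the standard identity Q(x) = erfc(x/√2)/2; it recovers the familiar figure of roughly 8.5 dB of SNRnorm at a symbol error probability of 10^−5:

```python
# Invert Ps(E) = 4*Q(sqrt(3*SNR_norm)) by bisection to find the
# SNR_norm required for Ps(E) = 1e-5 with uncoded square QAM.
from math import erfc, sqrt, log10

def Q(x):                                  # Gaussian tail probability
    return 0.5 * erfc(x / sqrt(2))

def ps_uncoded_qam(snr_norm):
    return 4 * Q(sqrt(3 * snr_norm))

lo, hi = 1.0, 100.0
for _ in range(200):                       # ps is decreasing in SNR_norm
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if ps_uncoded_qam(mid) > 1e-5 else (lo, mid)

snr_norm_db = 10 * log10(lo)
assert 8.2 < snr_norm_db < 8.7             # about 8.5 dB (SNR_norm about 7)
```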

Fig. 6. Pb(E) versus Eb/N0 for NASA standard concatenated code, compared to uncoded PAM and Shannon limit for ρ = 0.874.
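The spectral efficiency ρ = 0.874 in Fig. 6 follows directly from the code rates of the NASA standard. This small check is ours, assuming (as in the power-limited regime discussed earlier) one coded binary symbol per dimension and two dimensions per second per Hertz, so that ρ is twice the overall binary code rate:

```python
# Overall rate of the NASA standard concatenated code:
inner_rate = 1 / 2          # rate-1/2, 64-state convolutional inner code
outer_rate = 223 / 255      # (255, 223, 33) RS outer code over F256
overall_rate = inner_rate * outer_rate
assert abs(overall_rate - 0.437) < 5e-4    # "overall rate ... is 0.437"

# Nominal spectral efficiency for Fig. 6 (two dimensions per s per Hz):
rho = 2 * overall_rate
assert abs(rho - 0.874) < 1e-3
```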


Fig. 7. Ps(E) versus SNRnorm for uncoded QAM, compared to Shannon limits on SNRnorm with and without shaping.
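The offset between the two Shannon limits in Fig. 7 is the n → ∞ shaping gain πe/6 expressed in decibels; a one-line check (ours):

```python
# Ultimate shaping gain of an n-sphere over an n-cube as n -> infinity.
from math import pi, e, log10

shaping_limit_db = 10 * log10(pi * e / 6)
assert abs(shaping_limit_db - 1.53) < 0.01   # the 1.53 dB quoted in the text
```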

This baseline performance curve of Ps(E) versus SNRnorm for uncoded QAM transmission is plotted in Fig. 7. For example, in order to achieve a symbol error probability of Ps(E) ≈ 10^−5, we must have SNRnorm ≈ 7 (8.5 dB) for uncoded QAM transmission.

We recall from Section II that the Shannon limit on SNRnorm is 1 (0 dB), so the gap to capacity is about 8.5 dB at Ps(E) ≈ 10^−5. Thus, the maximum possible coding gain is somewhat smaller in the bandwidth-limited regime than in the power-limited regime. Furthermore, as we will discuss next, in the bandwidth-limited regime the Shannon limit on SNRnorm with no shaping is πe/6 (1.53 dB), so the maximum possible coding gain with no shaping at Ps(E) ≈ 10^−5 is only about 7 dB. These two limits are also shown on Fig. 7.

We now briefly discuss shaping. The set of all n-tuples of constellation points from a square QAM constellation is the set of all points on a d-spaced rectangular grid that lie within a 2n-cube in real 2n-space R^2n. The average energy of this 2n-dimensional constellation could be reduced if instead the constellation consisted of all points on the same grid that lie within a 2n-sphere of the same volume, which would comprise approximately the same number of points. The reduction in average energy of a 2n-sphere relative to a 2n-cube of the same volume is called the shaping gain γs(S2n) of a 2n-sphere. As n → ∞, γs(Sn) → πe/6 (1.53 dB).

For large signal constellations, shaping can be implemented more or less independently of coding, and shaping gain is more or less independent of coding gain. The Shannon limit essentially assumes n-sphere shaping with n → ∞ and therefore incorporates 1.53 dB of shaping gain (over an uncoded square QAM constellation). In the bandwidth-limited regime, coding without shaping can therefore get only to within 1.53 dB of the Shannon limit; the remaining 1.53 dB can be obtained by shaping and only by shaping.

We do not have space to discuss shaping schemes in this paper. It turns out that obtaining shaping gains on the order of 1 dB is not very hard, so nowadays most practical schemes for the bandwidth-limited Gaussian channel incorporate shaping. For example, the V.34 modem standard (see Section V-D) incorporates a 16-dimensional "shell mapping" shaping scheme whose shaping gain is about 0.8 dB.

The performance curve of any practical coding scheme that improves on uncoded QAM must lie between the relevant Shannon limit and the uncoded QAM curve. Thus Fig. 7 defines the "playing field" for coding and shaping in the bandwidth-limited regime.

The real coding gain of a coding scheme at a given symbol error probability Ps(E) will be defined as the difference (in decibels) between the SNRnorm required to obtain that Ps(E) with coding, but no shaping, versus without coding (uncoded QAM). Thus, the maximum possible real coding gain at Ps(E) ≈ 10^−5 is about 7 dB.

Again, for moderate-complexity coding, it can often be assumed that the error probability is dominated by the probability of making an error to one of the nearest neighbor codewords. Under this assumption, using a union bound estimate [75], [76], it is easily shown that with


optimum decoding, the probability of decoding error per QAM symbol is well approximated by

Ps(E) ≈ (2Nd/n) Q(√(3 · d^2 · 2^−κ · SNRnorm)) = (2Nd/n) Q(√(3 · γc · SNRnorm))

where d^2 is the minimum squared Euclidean distance between code sequences (assuming an underlying QAM constellation with minimum distance 1 between signal points), 2Nd/n is the number of code sequences at the minimum distance per QAM symbol, and κ is the redundancy of the coding scheme (the difference between the actual and maximum possible data rates with the underlying QAM constellation) in bits per two dimensions. The quantity γc = d^2 · 2^−κ is called the nominal coding gain of the coding scheme. The real coding gain is usually slightly less than the nominal coding gain, due to the effect of the "error coefficient" 2Nd/n.

B. Spherical Lattice Codes
It is clear from the proof of Shannon's capacity theorem for the AWGN channel that an optimal code for a bandwidth-limited AWGN channel consists of a dense packing of signal points within an n-sphere in a high-dimensional Euclidean space R^n.

Finding the densest packings in R^n is a longstanding mathematical problem. Most of the densest known packings are lattices [72], i.e., packings that have a group property. Notable lattice packings include the integer lattice Z in one dimension, the hexagonal lattice A2 in two dimensions, the Gosset lattice E8 in eight dimensions, and the Leech lattice Λ24 in 24 dimensions.

Therefore, from the very earliest days, there have been proposals to use spherical lattice codes as codes for the bandwidth-limited AWGN channel, notably by de Buda [73] and Lang in Canada. Lang proposed an E8 lattice code for telephone-line modems to a CCITT international standards committee in the mid-1970s and actually built a Leech lattice modem in the late 1980s [74].

By the union bound estimate, the probability of error per two-dimensional symbol of a spherical lattice code based on an n-dimensional lattice Λ on an AWGN channel with minimum-distance decoding may be estimated as

Ps(E) ≈ (2Kmin(Λ)/n) Q(√(3 · γc(Λ) · γs(Sn) · SNRnorm))

where Kmin(Λ) is the kissing number (number of nearest neighbors) of the lattice Λ, γc(Λ) is the nominal coding gain (Hermite parameter) of Λ, and γs(Sn) is the shaping gain of an n-sphere. Since Ps(E) ≈ 4Q(√(3 · SNRnorm)) for a square two-dimensional QAM constellation, the real coding gain of a spherical lattice code over a square QAM constellation is the combination of the nominal coding gain γc(Λ) and the shaping gain γs(Sn), minus a Ps(E)-dependent factor due to the larger "error coefficient" 2Kmin(Λ)/n.

For example (see [76]), the Gosset lattice E8 has a nominal coding gain of 2 (3 dB); however, Kmin(E8) = 240, so with no shaping

Ps(E) ≈ 60 Q(√(6 · SNRnorm))

which is plotted in Fig. 8. We see that the real coding gain of E8 is only about 2.2 dB at Ps(E) ≈ 10^−5. The Leech lattice Λ24 has a nominal coding gain of 4 (6 dB); however, Kmin(Λ24) = 196 560, so with no shaping

Ps(E) ≈ 16 380 Q(√(12 · SNRnorm))

also plotted in Fig. 8. We see that the real coding gain of Λ24 is only about 3.6 dB at Ps(E) ≈ 10^−5. Spherical shaping in 8 or 24 dimensions would contribute a shaping gain of about 0.75 dB or 1.1 dB, respectively.

For a detailed discussion of lattices and lattice codes, see the book by Conway and Sloane [72].

C. Trellis-Coded Modulation
The big breakthrough in practical coding for bandwidth-limited channels was Ungerboeck's invention of trellis-coded modulation (TCM), originally conceived in the 1970s, but not published until 1982 [77].

Ungerboeck realized that in the bandwidth-limited regime, the redundancy needed for coding should be obtained by expanding the signal constellation while keeping the bandwidth fixed, rather than by increasing the bandwidth while keeping a fixed signal constellation, as is done in the power-limited regime. From capacity calculations, he showed that doubling the signal constellation should suffice, e.g., using a 32-QAM rather than a 16-QAM constellation. Ungerboeck invented clever trellis codes for such expanded constellations, using minimum Euclidean distance rather than Hamming distance as the design criterion.

As with convolutional codes, trellis codes may be optimally decoded by a VA decoder, whose decoding complexity is proportional to the number of states in the encoder. Ungerboeck showed that effective coding gains of 3 to 4 dB could be obtained with simple 4- to 8-state trellis codes, with no bandwidth expansion. An 8-state 2-D QAM trellis code due to Lee-Fang Wei [79] (with a nonlinear twist to make it "rotationally invariant") was soon incorporated into the V.32 voice-grade telephone-line modem standard (see Section V-D). The nominal (and

1164 Proceedings of the IEEE | Vol. 95, No. 6, June 2007


Fig. 8. $P_s(E)$ versus $\mathrm{SNR}_{\mathrm{norm}}$ for Gosset lattice $E_8$ and Leech lattice $\Lambda_{24}$ with no shaping, compared to uncoded QAM and the Shannon limit on $\mathrm{SNR}_{\mathrm{norm}}$ without shaping.

real) coding gain of this 8-state 2-D code is $\gamma_c = 5/2 = 2.5$ (3.97 dB); its performance curve is approximately $P_s(E) \approx 4Q\big(\sqrt{7.5\, \mathrm{SNR}_{\mathrm{norm}}}\big)$, plotted in Fig. 9. Later standards such as V.34 have used a 16-state 4-D trellis code of Wei [80] (see Section V-D), which has less redundancy ($\rho = 1/2$ versus $\rho = 1$), a nominal coding gain of $\gamma_c = 4/\sqrt{2} \approx 2.82$ (4.52 dB), and performance $P_s(E) \approx 12Q\big(\sqrt{8.49\, \mathrm{SNR}_{\mathrm{norm}}}\big)$, also plotted in Fig. 9.
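The union bound estimates in this section are easy to evaluate numerically. The following sketch (our illustration, not from the original papers) solves each estimate $c \cdot Q(\sqrt{a \cdot \mathrm{SNR}_{\mathrm{norm}}}) = 10^{-5}$ for the required $\mathrm{SNR}_{\mathrm{norm}}$ by bisection and compares it against the uncoded-QAM baseline $4Q(\sqrt{3\, \mathrm{SNR}_{\mathrm{norm}}})$ to obtain approximate real coding gains:

```python
import math

def Q(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def snr_norm_for(coeff: float, arg_scale: float, target: float) -> float:
    """Solve coeff * Q(sqrt(arg_scale * s)) = target for s (geometric bisection)."""
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if coeff * Q(math.sqrt(arg_scale * mid)) > target:
            lo = mid          # error rate still too high: need more SNR
        else:
            hi = mid
    return math.sqrt(lo * hi)

def real_coding_gain_db(coeff: float, arg_scale: float, target: float = 1e-5) -> float:
    """Real coding gain vs. uncoded QAM, Ps(E) ~ 4 Q(sqrt(3 SNRnorm))."""
    s_uncoded = snr_norm_for(4.0, 3.0, target)
    s_coded = snr_norm_for(coeff, arg_scale, target)
    return 10.0 * math.log10(s_uncoded / s_coded)

# Union bound estimates quoted in the text:
print(f"E8 lattice:      {real_coding_gain_db(60.0, 6.0):.2f} dB")      # ~2 dB vs. 3 dB nominal
print(f"Leech lattice:   {real_coding_gain_db(16380.0, 12.0):.2f} dB")  # ~3.5 dB vs. 6 dB nominal
print(f"Wei 8-state:     {real_coding_gain_db(4.0, 7.5):.2f} dB")
print(f"Wei 16-state 4D: {real_coding_gain_db(12.0, 8.49):.2f} dB")
```

The gap between the printed values and the nominal gains is exactly the error-coefficient penalty discussed above; small numerical differences from the figures quoted in the text reflect reading values off Figs. 8 and 9.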

Fig. 9. $P_s(E)$ versus $\mathrm{SNR}_{\mathrm{norm}}$ for 8-state 2-D and 16-state 4-D Wei trellis codes with no shaping, compared to uncoded QAM and the Shannon limit on $\mathrm{SNR}_{\mathrm{norm}}$ without shaping.


We see that its real coding gain at $P_s(E) \approx 10^{-5}$ is about 4.2 dB.

Trellis codes have proved to be more attractive than lattice codes in terms of performance versus complexity, just as convolutional codes have been preferred to block codes. Nonetheless, the signal constellations used for trellis codes have generally been based on simple lattices, and their "subset partitioning" is often best understood as being based on a sublattice chain. For example, the V.32 code uses a QAM constellation based on the 2-D integer lattice $\mathbb{Z}^2$, with an eight-way partition based on the sublattice chain $\mathbb{Z}^2/R_2\mathbb{Z}^2/2\mathbb{Z}^2/2R_2\mathbb{Z}^2$, where $R_2$ is a scaled rotation operator. The Wei 4-D 16-state code uses a constellation based on the 4-D integer lattice $\mathbb{Z}^4$, with an eight-way partition based on the sublattice chain $\mathbb{Z}^4/D_4/R_4\mathbb{Z}^4/R_4D_4$, where $D_4$ is the 4-D "checkerboard lattice," and $R_4$ is a 4-D extension of $R_2$.

In 1977, Imai and Hirakawa introduced a related concept, called multilevel coding [78]. In this approach, an independent binary code is used at each stage of a chain of two-way partitions, such as $\mathbb{Z}^2/R_2\mathbb{Z}^2/2\mathbb{Z}^2/2R_2\mathbb{Z}^2$. By information-theoretic arguments, it can be shown that multilevel coding suffices to approach the Shannon limit [124]. However, TCM has been the preferred approach in practice.

D. History of Coding for Modem Applications
For several decades, the telephone channel was the arena in which the most powerful coding and modulation schemes for the bandwidth-limited AWGN channel were first developed and deployed, because:
1) at that time, the telephone channel was fairly well modeled as a bandwidth-limited AWGN channel;
2) 1 dB had significant commercial value;
3) data rates were low enough that a considerable amount of processing could be done per bit.

The first international standard to use coding was the V.32 standard (1986) for 9600 b/s transmission over the public switched telephone network (PSTN) (later raised to 14.4 kb/s in V.32bis). This modem used an 8-state, 2-D rotationally invariant Wei trellis code to achieve a real coding gain of about 3.5 dB with a 32-QAM (later 128-QAM in V.32bis) constellation at 2400 symbols/s, i.e., a nominal bandwidth of 2400 Hz.

The "ultimate modem standard" was V.34 (1994) for transmission at up to 28.8 kb/s over the PSTN (later raised to 33.6 kb/s in V.34bis). This modem used a 16-state, 4-D rotationally invariant Wei trellis code to achieve a coding gain of about 4.0 dB with a variable-sized QAM constellation with up to 1664 points. An optional 32-state, 4-D trellis code with an additional coding gain of 0.3 dB and four times (4x) the decoding complexity, and a 64-state, 4-D code with a further 0.15 dB coding gain and a further 4x increase in complexity, were also specified. A 16-D "shell mapping" constellation shaping scheme provided an additional gain of about 0.8 dB. A variable symbol rate of up to 3429 symbols/s was used, with symbol rate and data rate selection determined by "line probing" of individual channels.

However, the V.34 standard was shortly superseded by V.90 (1998) and V.92 (2000), which allow users to send data directly over the 56 or 64 kb/s digital backbones that are now nearly universal in the PSTN. Neither V.90 nor V.92 uses coding, because of the difficulty of achieving coding gain on a digital channel.

Currently, coding techniques similar to those of V.34 are used in higher speed wireline modems, such as digital subscriber line (DSL) modems, as well as on digital cellular wireless channels. Capacity-approaching coding schemes are now normally included in new wireless standards. In other words, bandwidth-limited coding has moved to these newer, higher bandwidth settings.

VI. THE TURBO REVOLUTION
In 1993, at the IEEE International Conference on Communications (ICC) in Geneva, Switzerland, Berrou, Glavieux, and Thitimajshima [81] stunned the coding research community by introducing a new class of "turbo codes" that purportedly could achieve near-Shannon-limit performance with modest decoding complexity. Comments to the effect of "It can't be true; they must have made a 3 dB error" were widespread.11 However, within the next year various laboratories confirmed these astonishing results, and the "turbo revolution" was launched.

Shortly thereafter, codes similar to Gallager's LDPC codes were discovered independently by MacKay at Cambridge [82], [83], [138] and by Spielman at MIT [84], [85], [139], [140], along with low-complexity iterative decoding algorithms. MacKay showed that in practice moderate-length LDPC codes ($10^3$–$10^4$ bits) could attain near-Shannon-limit performance, whereas Spielman showed that in theory, as $n \to \infty$, they could approach the Shannon limit with linear decoding complexity. These results kicked off a similar explosion of research on LDPC codes, which are currently seen as competitors to turbo codes in practice.

In 1995, Wiberg showed in his doctoral thesis at Linköping [86], [87] that both of these classes of codes could be understood as instances of "codes on sparse graphs," and that their decoding algorithms could be understood as instances of a general iterative APP decoding algorithm called the "sum–product algorithm." Late in his thesis work, Wiberg discovered that many of his results had previously been found by Tanner [88], in a largely forgotten 1981 paper. Wiberg's rediscovery of Tanner's work opened up a new field, called "codes on graphs."

11 Although both were professors, neither Berrou nor Glavieux had completed a doctoral degree.


In this section, we discuss the various historical threads leading to and springing from these watershed events of the mid-1990s, which have effectively answered the challenge laid down by Shannon in 1948.

A. Precursors
As we have discussed in previous sections, certain elements of the turbo revolution had been appreciated for a long time. It had been known since the early work of Elias that linear codes were as good as general codes. Information theorists also understood that maximizing the minimum distance was not the key to getting to capacity; rather, codes should be "random-like," in the sense that the distribution of distances from a typical codeword to all other codewords should resemble the distance distribution in a random code. These principles were already evident in Gallager's monograph on LDPC codes [49]. Gérard Battail, whose work inspired Berrou and Glavieux, was a long-time advocate of seeking "random-like" codes (see, e.g., [89]).

Another element of the turbo revolution whose roots go far back is the use of soft decisions (reliability information) not only as input to a decoder, but also in the internal workings of an iterative decoder. Indeed, by 1962 Gallager had already developed the modern APP decoder for decoding LDPC codes and had shown that retaining soft-decision (APP) information in iterative decoding was useful even on a hard-decision channel such as a BSC.

The idea of using SISO decoding in a concatenated coding scheme originated in papers by Battail [90] and by Joachim Hagenauer and Peter Hoeher [91]. They proposed a SISO version of the Viterbi algorithm, called the soft-output Viterbi algorithm (SOVA). In collaboration with John Lodge, Hoeher and Hagenauer extended their ideas to iterating separate SISO APP decoders [92]. Moreover, at the same 1993 ICC at which Berrou et al. introduced turbo codes and first used the term "extrinsic information" (see discussion in next section), a paper by Lodge et al. [93] also included the idea of "extrinsic information." By this time the benefits of retaining soft information throughout the decoding process had been clearly appreciated; see, for example, Battail [90] and Hagenauer [94]. We have already noted in Section IV-D that similar ideas had been developed at about the same time in the context of NASA's iterative decoding scheme for concatenated codes.

B. The Turbo Code Breakthrough
The invention of turbo codes began with Alain Glavieux's suggestion to his colleague Claude Berrou, a professor of VLSI circuit design, that it would be interesting to implement the SOVA decoder in silicon. While studying the principles underlying the SOVA decoder, Berrou was struck by Hagenauer's statement that "a SISO decoder is a kind of SNR amplifier." As a physicist, Berrou wondered whether the SNR could be further improved by repeated decoding, using some sort of "turbo-type" iterative feedback. As they say, the rest is history.

Fig. 10. Parallel concatenated rate-1/3 turbo encoder.

The original turbo encoder design introduced in [81] is shown in Fig. 10. An information sequence u is encoded by an ordinary rate-1/2, 16-state, systematic recursive convolutional encoder to generate a first parity bit sequence; the same information bit sequence is then scrambled by a large pseudorandom interleaver $\pi$ and encoded by a second, identical rate-1/2 systematic convolutional encoder to generate a second parity bit sequence. The encoder transmits all three sequences, so the overall encoder has rate 1/3. (This is now called the "parallel concatenation" of two codes, in contrast with the original kind of concatenation, now called "serial.")

The use of recursive (feedback) convolutional encoders and an interleaver turns out to be critical for making a turbo code somewhat "random-like." If a nonrecursive encoder were used, then a single nonzero information bit would necessarily generate a low-weight code sequence. It was soon shown by Benedetto and Montorsi [95] and by Perez et al. [96] that the use of a length-$N$ interleaver effectively reduces the number of low-weight codewords by a factor of $N$. However, turbo codes nevertheless have relatively poor minimum distance. Indeed, Breiling has shown that the minimum distance of turbo codes grows only logarithmically with the interleaver length $N$ [97].

The iterative turbo decoding system is shown in Fig. 11. Decoders 1 and 2 are APP (BCJR) decoders for the two constituent convolutional codes, $\pi$ is the same permutation as in the encoder, and $\pi^{-1}$ is the inverse permutation. Berrou et al. discovered that the key to achieving good iterative decoding performance is the removal of the "intrinsic information" from the output APPs $L^{(i)}(u_l)$, resulting in "extrinsic" APPs $L_e^{(i)}(u_l)$, which are then passed as a priori inputs to the other decoder. "Intrinsic information" represents the soft channel outputs $L_c r_l$ and the a priori inputs already known prior to decoding, while "extrinsic information" represents additional knowledge learned about an information bit during an iteration. The removal of "intrinsic information" has the effect of


Fig. 11. Iterative decoder for parallel concatenated turbo code.

reducing correlations from one decoding iteration to the next, thus allowing improved performance with an increasing number of iterations.12 (See [44], ch. 16 for more details.) The iterative feedback of the "extrinsic" APPs recalls the feedback of exhaust gases in a turbocharged engine.

12 We note, however, that correlations do build up with iterations and that a saturation effect is eventually observed, where no further improvement is possible.

The performance on an AWGN channel of the turbo code and decoder of Figs. 10 and 11, with an interleaver length of $N = 2^{16}$, after "puncturing" (deleting symbols) to raise the code rate to 1/2 ($\eta = 1$ b/s/Hz), is shown in Fig. 12. At $P_b(E) \approx 10^{-5}$, performance is about 0.7 dB from the Shannon limit for $\eta = 1$, or only 0.5 dB from the Shannon limit for binary codes at $\eta = 1$. In contrast, the real coding gain of the NASA standard concatenated code is about 1.6 dB less, even though its rate is lower and its decoding complexity is about the same.

With randomly constructed interleavers, at values of $P_b(E)$ somewhat below $10^{-5}$, the performance curve of turbo codes typically flattens out, resulting in what has become known as an "error floor" (as seen in Fig. 12, for example). This happens because turbo codes do not have large minimum distances, so ultimately performance is limited by the probability of confusing the transmitted codeword with a near neighbor. Several approaches have been suggested to mitigate the error-floor effect. These include using serial concatenation rather than parallel concatenation (see, for example, [98] or [99]), the design of structured interleavers to improve the minimum distance (see, for example, [100] or [101]), or the use of multiple interleavers to eliminate low-weight codewords (see, for example, [102] or [103]). However, the fact that the minimum distance of turbo codes cannot grow linearly with block length implies that the ensemble of turbo codes is not asymptotically good and that "error floors" cannot be totally avoided.

C. Rediscovery of LDPC Codes
Gallager's invention of LDPC codes and the iterative APP decoding algorithm was long before its time ("a bit of 21st-century coding that happened to fall in the 20th century"). His work was largely forgotten for more than 30 years. It is easy to understand why there was little interest in LDPC codes in the 1960s and 1970s, because these codes were much too complex for the technology of that time. It is not so easy to explain why they continued to be ignored by the coding community up to the mid-1990s.

Shortly after the turbo code breakthrough, several researchers with backgrounds in computer science and physics rather than in coding rediscovered the power and efficiency of LDPC codes. In his thesis, Dan Spielman [84], [85], [139], [140] used LDPC codes based on expander graphs to devise codes with linear-time encoding and decoding algorithms and with respectable error performance. At about the same time and independently, David MacKay [83], [104], [138], [141] showed empirically that near-Shannon-limit performance could be obtained with long LDPC-type codes and iterative decoding.

Given that turbo codes were already a hot topic, the rediscovery of LDPC codes kindled an explosion of interest in this field that has continued to this day.

An LDPC code is commonly represented by a bipartite graph as in Fig. 13, introduced by Michael Tanner in 1981 [88], and now called a "Tanner graph." Each code symbol $y_k$ is represented by a node of one type and each parity check by a node of a second type. A symbol node and a check node are connected by an edge if the corresponding symbol is involved in the corresponding check. In an LDPC code, the edges are sparse, in the sense that their number increases linearly with the block length $n$, rather than as $n^2$.

Spielman's impressive complexity results were quickly applied by Alon and Luby [105] to the Internet problem of reconstructing large files in the presence of packet erasures. This work exploits the fact


Fig. 12. Performance of rate-1/2 turbo code with interleaver length $N = 2^{16}$, compared to NASA standard concatenated code and relevant Shannon limits for $\eta = 1$.

that on an erasure channel,13 decoding linear codes is essentially a matter of solving linear equations and becomes very efficient if it can be reduced to solving a series of equations, each of which involves a single unknown variable.

An important general discovery that arose from this work was the superiority of irregular LDPC codes. In a regular LDPC code, such as the one shown in Fig. 13, all symbol nodes have the same degree (number of incident edges), and so do all check nodes. Luby et al. [106], [107] found that by using irregular graphs and optimizing the degree sequences (numbers of symbol and check nodes of each degree), they could approach the capacity of the erasure channel, i.e., achieve small error probabilities at code rates of nearly $1 - p$, where $p$ is the erasure probability. For example, a rate-1/2 LDPC code capable of correcting up to a fraction $p = 0.4955$ of erasures is described in [108]. "Tornado codes" of this type were commercialized by Digital Fountain, Inc. [109]. More recently, it has been shown [110] that on any erasure channel, binary or nonbinary, it is possible to design LDPC codes that can approach capacity arbitrarily closely, in the limit as $n \to \infty$. The erasure channel is the only channel for which such a result has been proved.
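The reduction of erasure decoding to repeatedly solving single-unknown equations can be made concrete. The sketch below (our toy example, not an actual Tornado code) "peels" erasures off a small hand-made parity-check matrix: whenever some check involves exactly one erased symbol, that symbol is recovered as the mod-2 sum of the others.

```python
# Iterative "peeling" erasure decoder: repeatedly find a parity check with
# exactly one erased symbol and solve that check for the missing bit.
# Toy parity-check matrix (one row per check); not an actual Tornado code.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def peel(received):
    """received: list of bits, with None marking erased positions."""
    y = list(received)
    progress = True
    while progress:
        progress = False
        for row in H:
            erased = [j for j, h in enumerate(row) if h and y[j] is None]
            if len(erased) == 1:  # one unknown in this check: solve for it
                j = erased[0]
                y[j] = sum(y[k] for k, h in enumerate(row) if h and k != j) % 2
                progress = True
    return y

# [1, 1, 0, 0, 1, 1] is a codeword (it satisfies all three checks);
# erase its first two symbols and recover them by peeling.
print(peel([None, None, 0, 0, 1, 1]))  # -> [1, 1, 0, 0, 1, 1]
```

If no check ever contains a single unknown, the decoder stalls with residual erasures, which is exactly the failure event that irregular degree-sequence design seeks to make unlikely.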

13 On an erasure channel, transmitted symbols or packets are either received correctly or not at all, i.e., there are no "channel errors."

Fig. 13. Tanner graph of the (8, 4, 4) extended Hamming code.
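To make Fig. 13 concrete: a Tanner graph is simply the bipartite adjacency structure of a parity-check matrix $H$, with an edge wherever $H_{ij} = 1$. The sketch below uses one standard parity-check matrix for the (8, 4, 4) extended Hamming code (the paper does not specify which equivalent form Fig. 13 uses, so this particular $H$ is our assumption) and verifies the code's parameters.

```python
from itertools import product

# One parity-check matrix for the (8, 4, 4) extended Hamming code.
# This code is self-dual, so H below also serves as a generator matrix.
H = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

# Tanner graph: one (check, symbol) edge per 1-entry of H.
edges = [(i, j) for i, row in enumerate(H) for j, h in enumerate(row) if h]

def encode(u):
    """Encode 4 information bits as a GF(2) combination of the rows of H."""
    return [sum(u[i] * H[i][j] for i in range(4)) % 2 for j in range(8)]

def satisfies_checks(c):
    return all(sum(h * cj for h, cj in zip(row, c)) % 2 == 0 for row in H)

# Every one of the 16 codewords satisfies every check node:
assert all(satisfies_checks(encode(list(u))) for u in product([0, 1], repeat=4))

# Minimum distance 4: every nonzero codeword has weight 4 or 8.
weights = {sum(encode(list(u))) for u in product([0, 1], repeat=4) if any(u)}
print(sorted(weights))  # -> [4, 8]
```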


Fig. 14. Performance of optimized rate-1/2 irregular LDPC codes: asymptotic analysis with maximum symbol degree $d_\lambda = 100$, 200, 8000, and simulations with maximum symbol degree $d_\lambda = 100$, 200 and $n = 10^7$ [113].

Building on the analytical techniques developed for Tornado codes, Richardson, Urbanke et al. [111], [112] used a technique called "density evolution" to design long irregular LDPC codes that for all practical purposes achieve the Shannon limit on binary AWGN channels.

Given an irregular binary LDPC code with arbitrary degree sequences, they showed that the evolution of probability densities on a binary-input memoryless symmetric (BMS) channel using an iterative sum–product (or similar) decoder can be analyzed precisely. They proved that error-free performance could be achieved below a certain threshold, for very long codes and large numbers of iterations. Degree sequences may then be chosen to optimize the threshold. By simulations, they showed that codes designed in this way could clearly outperform turbo codes for block lengths on the order of $10^5$ or more.

Using this approach, Chung et al. [113] designed several rate-1/2 codes for the AWGN channel, including one whose theoretical threshold approached the Shannon limit within 0.0045 dB, and another whose simulated performance with a block length of $10^7$ approached the Shannon limit within 0.040 dB at an error rate of $P_b(E) \approx 10^{-6}$, as shown in Fig. 14. It is rather surprising that this close approach to the Shannon limit required no extension of Gallager's LDPC codes beyond irregularity. The former (threshold-optimized) code had symbol node degrees {2, 3, 6, 7, 15, 20, 50, 70, 100, 150, 400, 900, 2000, 3000, 6000, 8000}, with average degree $\bar{d}_\lambda = 9.25$, and check node degrees {18, 19}, with average degree $\bar{d}_\rho = 18.5$. The latter (simulated) code had symbol degrees {2, 3, 6, 7, 18, 19, 55, 56, 200}, with $\bar{d}_\lambda = 6$, and all check degrees equal to 12.

In current research, more structured LDPC codes are being sought for shorter block lengths, on the order of 1000. The original work of Tanner [88] included several algebraic constructions of codes on graphs. Algebraic structure may be preferable to a pseudo-random structure for implementation and may allow control over important code parameters such as minimum distance, as well as graph-theoretic variables such as expansion and girth.14 The most impressive results are perhaps those of [114], in which it is shown that certain classical finite-geometry codes and their extensions can produce good LDPC codes. High-rate codes with lengths up to 524 256 have been constructed and shown to perform within 0.3 dB of the Shannon limit.

14 Expansion and girth are properties of a graph that relate to its suitability for iterative decoding.

D. RA Codes and Other Variants
Divsalar, McEliece et al. [115] proposed "repeat-accumulate" (RA) codes in 1998 as simple "turbo-like" codes for which one could prove coding theorems. An RA code is generated by the serial concatenation of a simple $(n, 1, n)$ repetition code, a large pseudo-random interleaver $\pi$, and a simple two-state rate-1/1 convolutional


"accumulator" code with input–output equation $y_k = x_k + y_{k-1}$, as shown in Fig. 15.

Fig. 15. An RA encoder.

The performance of RA codes turned out to be remarkably good, within about 1.5 dB of the Shannon limit, i.e., better than that of the best coding schemes known prior to turbo codes.

Other authors have proposed equally simple codes with similar or even better performance. For example, RA codes have been extended to "accumulate-repeat-accumulate" (ARA) codes [116], which have even better performance. Ping and Wu [117] proposed "concatenated tree codes" comprising $M$ two-state trellis codes interconnected by interleavers, which exhibit performance almost identical to turbo codes of equal block length, but with an order of magnitude less complexity (see also Massey and Costello [118]). It seems that there are many ways that simple codes can be interconnected by large pseudo-random interleavers and decoded with the sum–product algorithm so as to yield near-Shannon-limit performance.

E. Fountain (Rateless) Codes
Fountain codes, or "rateless codes," are a new class of codes designed for channels without feedback whose statistics are not known a priori, e.g., Internet packet channels where the probability $p$ of packet erasure is unknown. The "fountain" idea is that the transmitter encodes a finite-length information sequence into a potentially infinite stream of encoded symbols; the receiver then accumulates received symbols (possibly noisy) until it finds that it has enough for successful decoding.

The first codes of this type were the "Luby Transform" (LT) codes of Luby [119], in which each encoded symbol is a parity check on a randomly chosen subset of the information symbols. These were extended to the "Raptor codes" of Shokrollahi [120], in which an inner LT code is concatenated with an outer fixed-length, high-rate LDPC code. Raptor codes permit linear-time decoding and clean up error floors, with a slightly greater coding overhead than LT codes. Both types of codes work well on erasure channels, and both have been implemented for Internet applications by Digital Fountain, Inc. Raptor codes also appear to work well over more general noisy channels, such as the AWGN channel [121].

F. Approaching the Capacity of Bandwidth-Limited Channels
In Section V, we discussed coding for bandwidth-limited channels. Following the introduction of capacity-approaching codes, researchers turned their attention to applying these new techniques to bandwidth-limited channels. Much of the early research followed the approach of Ungerboeck's trellis-coded modulation [77] and the related work of Imai and Hirakawa on multilevel coding [78]. In two variations, turbo TCM due to Robertson and Wörz [122] and parallel concatenated TCM due to Benedetto et al. [123], Ungerboeck's set partitioning rules were applied to turbo codes with TCM constituent encoders. In another variation, Wachsmann and Huber [124] adapted the multilevel coding technique to work with turbo constituent codes. In each case, performance approaching the Shannon limit was demonstrated at spectral efficiencies $\eta \geq 2$ b/s/Hz with large pseudorandom interleavers.

Even earlier, a somewhat different approach had been introduced by Le Goff, Glavieux, and Berrou [125]. They employed turbo codes in combination with bit-interleaved coded modulation (BICM), a technique originally proposed by Zehavi [126] for bandwidth-efficient convolutional coding on fading channels. In this arrangement, the output sequence of a turbo encoder is bit-interleaved and then Gray-mapped directly onto a signal constellation, without any attention to set partitioning or multilevel coding rules. However, because turbo codes are so powerful, this seeming neglect of efficient signal mapping design rules costs only a small fraction of a decibel for most practical constellation sizes, and capacity-approaching performance can still be achieved. In more recent years, many variations of this basic scheme have appeared in the literature. A number of researchers have also investigated the use of LDPC codes in BICM systems. Because of its simplicity and the fact that coding and signal mapping can be considered separately, combining turbo or LDPC codes with BICM has become the most common capacity-approaching coding scheme for bandwidth-limited channels.

G. Codes on Graphs
The field of "codes on graphs" has been developed to provide a common conceptual foundation for all known classes of capacity-approaching codes and their iterative decoding algorithms.

Inspired partly by Gallager, Michael Tanner founded this field in a landmark paper nearly 25 years ago [88]. Tanner introduced the Tanner graph bipartite graphical model for LDPC codes, as shown in Fig. 13. Tanner also generalized the parity-check constraints of LDPC codes to arbitrary linear code constraints. He observed that this model included product codes, or more generally codes constructed "recursively" from simpler component codes. He derived the generic sum–product decoding


that constrain all incident state edges to be equal, while symbol nodes are replaced by repetition constraints and symbol half-edges. This conversion thus causes no change in graph topology or complexity. Both styles of graphical realization are in current use, as are BForney-style factor graphs.[ By introducing states, Wiberg showed how turbo codes and trellis codes are related to LDPC codes. Fig. 17(a) illustrates the factor graph of a conventional trellis code, where each constraint determines the possible combina- tions of (state, symbol, next state) that can occur. Fig. 17(b) is an equivalent normal graph, with state variables represented simply by edges. Note that the graph of a trellis code has no cycles (loops). Perhaps the key result following from the unification of Fig. 16. (a) Generic bipartite factor graph, with symbol variables (filled circles), state variables (open circles), and constraints (squares). trellis codes and general codes on graphs is the Bcut-set (b) Equivalent normal graph, with equality constraints replacing bound,[ which we now briefly describe. If a code graph is variables, and observed variables indicated by ‘‘half-edges.’’ disconnected into two components by deletion of a cut set (a minimal set of edges whose removal partitions the graph into two disconnected components), then the code constraints require a certain minimum amount of infor- algorithm and introduced what is now called the Bmin- mation to pass between the two components. In a trellis, sum[ (or Bmax-product[) algorithm. Finally, his grasp of this establishes a lower bound on state space size. In a the architectural advantages of Bcodes on graphs[ was general graph, it establishes a lower bound on the product clear. of the sizes of the state spaces corresponding to a cut set. 
The cut-set bound implies that cycle-free graphs cannot BThe decoding by iteration of a fairly simple basic have state spaces much smaller than those of conventional operation makes the suggested decoders naturally trellises, since cut sets in cycle-free graphs are single adapted to parallel implementation with large-scale- edges; dramatic reductions in complexity can occur only integrated circuit technology. Since the decoders in graphs with cycles, such as the graphs of turbo and can use soft decisions effectively, and because of LDPC codes. their low computational complexity and parallelism In this light, turbo codes, LDPC codes, and RA codes can decode large blocks very quickly, these codes can all be seen as codes whose graphs are made up of may well compete with current convolutional simple codes with linear-complexity graph realizations, techniques in some applications.[ connected by a long, pseudo-random interleaver Å, as shown in Figs. 18–20. Like Gallager’s, Tanner’s work was largely forgotten for Wiberg made equally significant conceptual contribu- many years, until Niclas Wiberg’s seminal thesis [86], [87]. tions on the decoding side. Like Tanner, he gave clean Wiberg based his thesis on LDPC codes, Tanner’s paper, characterizations of the min-sum and sum–product and the field of Btrellis complexity of block codes,[ algorithms, showing that they were essentially identical discussed in Section IV-E. except for the substitution of Bmin[ for Bsum[ and Bsum[ Wiberg’s most important contribution may have been for Bproduct[ (and even giving the further Bsemi-ring[ to extend Tanner graphs to include state variables as well as symbol variables, as shown in Fig. 16(a). A Wiberg- type graph, now called a Bfactor graph[ [127], is still bipartite; however, in addition to symbol variables, which are external, observable, and determined a priori, a factor graph may include state variables, which are internal, unobservable, and introduced at will by the code designer. 
Subsequently, Forney proposed a refinement of factor graphs, namely "normal graphs" [128]. Fig. 16(b) shows a normal graph that is equivalent to the generic factor graph of Fig. 16(a). In a normal graph, state variables are associated with edges and symbol variables with "half-edges"; state nodes are replaced by repetition constraints.

Fig. 17. (a) Factor graph of a trellis code. (b) Equivalent normal graph of a trellis code.

1172 Proceedings of the IEEE | Vol. 95, No. 6, June 2007

Authorized licensed use limited to: University of Illinois. Downloaded on September 18, 2009 at 10:42 from IEEE Xplore. Restrictions apply. Costello and Forney: Channel Coding: Road to Channel Capacity

Fig. 18. Normal graph of a Berrou-type turbo code. A data sequence is encoded by two low-complexity trellis codes, in one case after interleaving by a pseudo-random permutation Π.

Fig. 20. Normal graph of a rate-1/3 RA code. Data bits are repeated three times, permuted by a pseudo-random permutation Π, and encoded by a rate-1/1 convolutional encoder.
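The encoding chain in Fig. 20 (repeat, permute, accumulate) is simple enough to state in a few lines. The following sketch is our own illustration of the structure described in the caption; the rate-1/1 encoder is taken to be a mod-2 accumulator, i.e. 1/(1+D), a common choice for RA codes:

```python
def ra_encode(data_bits, perm):
    """Rate-1/3 repeat-accumulate encoding: repeat each data bit
    three times, reorder by the permutation `perm`, then pass the
    result through a mod-2 accumulator (running XOR)."""
    repeated = [b for b in data_bits for _ in range(3)]   # rate-1/3 repetition
    interleaved = [repeated[p] for p in perm]             # pseudo-random permutation
    out, acc = [], 0
    for b in interleaved:                                 # rate-1/1 accumulator
        acc ^= b
        out.append(acc)
    return out
```

With a pseudo-random `perm` (e.g., `random.sample(range(3*k), 3*k)` for k data bits), this realizes the structure of Fig. 20; with the identity permutation the output is easy to check by hand.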

He showed that on cycle-free graphs the min-sum and sum–product algorithms perform exact ML and APP decoding, respectively; in particular, on trellises they reduce to the Viterbi and BCJR algorithms, respectively.15 This result strongly motivates the heuristic extension of iterative sum–product decoding to graphs with cycles. Wiberg showed that the turbo and LDPC decoding algorithms may be understood as instances of iterative sum–product decoding applied to their respective graphs. While these graphs necessarily contain cycles, the probability of short cycles is low, and consequently iterative sum–product decoding works well.

Forney [128] showed that with a normal graph representation, there is a clean separation of functions in iterative sum–product decoding:
1) All computations occur at constraint nodes, not at states.
2) State edges are used for internal communication (message passing).
3) Symbol edges are used for external communication (I/O).

Fig. 19. Normal graph of a regular (dλ = 3, dρ = 6) LDPC code. Code bits satisfy single-parity-check constraints (indicated by "+"), with connections specified by a pseudo-random permutation Π.

Connections shortly began to be made to a variety of related work in various other fields, notably in [127] and [129]–[131]. In addition to the Viterbi and BCJR algorithms for decoding trellis codes and the turbo and LDPC decoding algorithms, the following algorithms have all been shown to be special cases of the sum–product algorithm operating on appropriate graphs:
1) the "belief propagation" and "belief revision" algorithms of Pearl [132], used for statistical inference on "Bayesian networks";
2) the "forward–backward" (Baum–Welch) algorithm [133], used for detection of hidden Markov models in signal processing, especially for speech recognition;
3) "junction tree" algorithms used with Markov random fields [129];
4) Kalman filters and smoothers for general Gaussian graphs [127].
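As an illustration of item 2) above, here is a toy forward–backward pass on a chain-structured graph (a hidden Markov model), written as sum–product message passing; the variable names and the two-state example are our own, not from any cited implementation:

```python
def forward_backward(obs_like, trans, init):
    """Forward-backward (Baum-Welch / BCJR-style) pass on a toy
    Markov chain, as sum-product message passing on a chain graph.
    obs_like[t][s] = likelihood of observation t given state s;
    trans[r][s] = transition probability r -> s; init = prior."""
    n, S = len(obs_like), len(init)
    # forward messages (alpha), passed left to right
    alpha = [[init[s] * obs_like[0][s] for s in range(S)]]
    for t in range(1, n):
        alpha.append([obs_like[t][s] *
                      sum(alpha[-1][r] * trans[r][s] for r in range(S))
                      for s in range(S)])
    # backward messages (beta), passed right to left
    beta = [[1.0] * S for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[s][r] * obs_like[t + 1][r] * beta[t + 1][r]
                       for r in range(S)) for s in range(S)]
    # combine and normalize to obtain per-symbol APPs
    post = []
    for t in range(n):
        p = [alpha[t][s] * beta[t][s] for s in range(S)]
        z = sum(p)
        post.append([x / z for x in p])
    return post
```

Replacing the sums by max (or min over log-domain metrics) turns this into a Viterbi-style pass, a concrete instance of the semi-ring substitution noted earlier.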
15 Indeed, the "extrinsic" APPs passed in a turbo decoder are exactly the messages produced by the sum–product algorithm.

In summary, the principles of all known capacity-approaching codes and a wide variety of message-passing



Table 1 Applications of Turbo Codes (Courtesy of C. Berrou)

algorithms used not only in coding but also in computer science and signal processing can be understood within the framework of "codes on graphs."

H. The Impact of the Turbo Revolution
Even though it has been less than 15 years since the introduction of turbo codes, these codes and the related class of LDPC codes have already had a significant impact in practice. In particular, almost all digital communication and storage system standards that involve coding are being upgraded to include these new capacity-approaching techniques.
Known applications of turbo codes as of this writing are summarized in Table 1. LDPC codes have been adopted for the DVB-S2 (Digital Video Broadcasting) and 10GBASE-T or IEEE 802.3an (Ethernet) standards and are currently also being considered for the IEEE 802.16e (WiMAX) and 802.11n (WiFi) standards, as well as for various storage system applications.
It is evident from this explosion of activity that capacity-approaching codes are revolutionizing the way that information is transmitted and stored.

VII. CONCLUSION
It took only 50 years, but the Shannon limit is now routinely being approached within 1 dB on AWGN channels, both power-limited and bandwidth-limited. Similar gains are being achieved in other important applications, such as wireless channels and Internet (packet erasure) channels.
So is coding theory finally dead? The Shannon limit guarantees that on memoryless channels such as the AWGN channel, there is little more to be gained in terms of performance. Therefore, channel coding for classical applications has certainly reached the point of diminishing returns, just as algebraic coding theory had by 1971. However, this does not mean that research in coding will dry up, any more than research in algebraic coding theory has disappeared. There will always be a place for discipline-driven research that fills out our understanding. Research motivated by issues of performance versus complexity will always be in fashion, and measures of "complexity" are sure to be redefined by future generations of technology. Coding for nonclassical channels, such as multiuser channels, networks, and channels with memory, is a hot area today that seems likely to remain active for a long time. The world of coding research thus continues to be an expanding universe.

Acknowledgment
The authors would like to thank A. Pusane for his help in the preparation of this manuscript. Comments on earlier drafts by C. Berrou, J. L. Massey, and R. Urbanke were very helpful.

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379–423 and 623–656, 1948.
[2] R. W. McEliece, "Are there turbo codes on Mars?" in Proc. 2004 Int. Symp. Inform. Theory, Chicago, IL, Jun. 30, 2004 (2004 Shannon Lecture).
[3] J. L. Massey, "Deep-space communications and coding: A marriage made in heaven," in Advanced Methods for Satellite and Deep Space Communications, J. Hagenauer, Ed. New York: Springer, 1992.
[4] W. W. Peterson, Error-Correcting Codes. Cambridge, MA: MIT Press, 1961.
[5] E. R. Berlekamp, Algebraic Coding Theory. New York: McGraw-Hill, 1968.
[6] S. Lin, An Introduction to Error-Correcting Codes. Englewood Cliffs, NJ: Prentice-Hall, 1970.



[7] W. W. Peterson and E. J. Weldon, Jr., Error-Correcting Codes. Cambridge, MA: MIT Press, 1972.
[8] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. New York: Elsevier, 1977.
[9] R. E. Blahut, Theory and Practice of Error Correcting Codes. Reading, MA: Addison-Wesley, 1983.
[10] R. W. Hamming, "Error detecting and error correcting codes," Bell Syst. Tech. J., vol. 29, pp. 147–160, 1950.
[11] M. J. E. Golay, "Notes on digital coding," Proc. IRE, vol. 37, p. 657, Jun. 1949.
[12] E. R. Berlekamp, Ed., Key Papers in the Development of Coding Theory. New York: IEEE Press, 1974.
[13] D. E. Muller, "Application of Boolean algebra to switching circuit design and to error detection," IRE Trans. Electron. Comput., vol. EC-3, pp. 6–12, Sep. 1954.
[14] I. S. Reed, "A class of multiple-error-correcting codes and the decoding scheme," IRE Trans. Inform. Theory, vol. IT-4, pp. 38–49, Sep. 1954.
[15] R. A. Silverman and M. Balser, "Coding for constant-data-rate systems," IRE Trans. Inform. Theory, vol. PGIT-4, pp. 50–63, Sep. 1954.
[16] E. Prange, "Cyclic error-correcting codes in two symbols," Air Force Cambridge Res. Center, Cambridge, MA, Tech. Note AFCRC-TN-57-103, Sep. 1957.
[17] A. Hocquenghem, "Codes correcteurs d'erreurs," Chiffres, vol. 2, pp. 147–156, 1959.
[18] R. C. Bose and D. K. Ray-Chaudhuri, "On a class of error-correcting binary group codes," Inform. Contr., vol. 3, pp. 68–79, Mar. 1960.
[19] I. S. Reed and G. Solomon, "Polynomial codes over certain finite fields," J. SIAM, vol. 8, pp. 300–304, Jun. 1960.
[20] R. C. Singleton, "Maximum distance Q-nary codes," IEEE Trans. Inform. Theory, vol. IT-10, pp. 116–118, Apr. 1964.
[21] W. W. Peterson, "Encoding and error-correction procedures for the Bose-Chaudhuri codes," IRE Trans. Inform. Theory, vol. IT-6, pp. 459–470, Sep. 1960.
[22] J. L. Massey, "Shift-register synthesis and BCH decoding," IEEE Trans. Inform. Theory, vol. IT-15, pp. 122–127, Jan. 1969.
[23] G. D. Forney, Jr., "On decoding BCH codes," IEEE Trans. Inform. Theory, vol. IT-11, pp. 549–557, Oct. 1965.
[24] G. D. Forney, Jr., "Generalized minimum distance decoding," IEEE Trans. Inform. Theory, vol. IT-12, pp. 125–131, Apr. 1966.
[25] D. Chase, "A class of algorithms for decoding block codes with channel measurement information," IEEE Trans. Inform. Theory, vol. IT-18, pp. 170–182, Jan. 1972.
[26] G. D. Forney, Jr., "Burst-correcting codes for the classic bursty channel," IEEE Trans. Commun. Technol., vol. COM-19, pp. 772–781, Oct. 1971.
[27] R. W. McEliece and L. Swanson, "Reed-Solomon codes and the exploration of the solar system," in Reed-Solomon Codes and Their Applications, S. B. Wicker and V. K. Bhargava, Eds. Piscataway, NJ: IEEE Press, 1994, pp. 25–40.
[28] K. A. S. Immink, "Reed-Solomon codes and the compact disc," in Reed-Solomon Codes and Their Applications, S. B. Wicker and V. K. Bhargava, Eds. Piscataway, NJ: IEEE Press, 1994, pp. 41–59.
[29] E. R. Berlekamp, G. Seroussi, and P. Tong, "A hypersystolic Reed-Solomon decoder," in Reed-Solomon Codes and Their Applications, S. B. Wicker and V. K. Bhargava, Eds. Piscataway, NJ: IEEE Press, 1994, pp. 205–241.
[30] R. W. Lucky, "Coding is dead," IEEE Spectrum, 1991.
[31] R. M. Roth, Introduction to Coding Theory. Cambridge, U.K.: Cambridge Univ. Press, 2006.
[32] V. D. Goppa, "Codes associated with divisors," Probl. Inform. Transm., vol. 13, pp. 22–27, 1977.
[33] V. D. Goppa, "Codes on algebraic curves," Sov. Math. Dokl., vol. 24, pp. 170–172, 1981.
[34] M. A. Tsfasman, S. G. Vladut, and T. Zink, "Modular codes, Shimura curves and Goppa codes better than the Varshamov-Gilbert bound," Math. Nachr., vol. 109, pp. 21–28, 1982.
[35] I. Blake, C. Heegard, T. Høholdt, and V. Wei, "Algebraic-geometry codes," IEEE Trans. Inform. Theory, vol. 44, pp. 2596–2618, Oct. 1998.
[36] M. Sudan, "Decoding of Reed-Solomon codes beyond the error-correction bound," J. Complexity, vol. 13, pp. 180–193, 1997.
[37] P. Elias, "Error-correcting codes for list decoding," IEEE Trans. Inform. Theory, vol. 37, pp. 5–12, Jan. 1991.
[38] V. Guruswami and M. Sudan, "Improved decoding of Reed-Solomon and algebraic-geometry codes," IEEE Trans. Inform. Theory, vol. 45, pp. 1757–1767, Sep. 1999.
[39] R. Koetter and A. Vardy, "Algebraic soft-decision decoding of Reed-Solomon codes," IEEE Trans. Inform. Theory, vol. 49, pp. 2809–2825, Nov. 2003.
[40] M. Fossorier and S. Lin, "Computationally efficient soft-decision decoding of linear block codes based on ordered statistics," IEEE Trans. Inform. Theory, vol. 42, pp. 738–750, May 1996.
[41] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: Wiley, 1965.
[42] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[43] G. C. Clark, Jr. and J. B. Cain, Error-Correction Coding for Digital Communications. New York: Plenum, 1981.
[44] S. Lin and D. J. Costello, Jr., Error Control Coding: Fundamentals and Applications, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2004.
[45] R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional Coding. Piscataway, NJ: IEEE Press, 1999.
[46] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge, U.K.: Cambridge Univ. Press, to be published.
[47] P. Elias, "Coding for noisy channels," IRE Conv. Rec., pt. 4, pp. 37–46, Mar. 1955.
[48] R. G. Gallager, "Introduction to 'Coding for noisy channels,' by Peter Elias," in The Electron and the Bit, J. V. Guttag, Ed. Cambridge, MA: EECS Dept., MIT, 2005, pp. 91–94.
[49] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[50] G. D. Forney, Jr., "Convolutional codes I: Algebraic structure," IEEE Trans. Inform. Theory, vol. IT-16, pp. 720–738, Nov. 1970.
[51] J. M. Wozencraft and B. Reiffen, Sequential Decoding. Cambridge, MA: MIT Press, 1961.
[52] R. M. Fano, "A heuristic discussion of probabilistic decoding," IEEE Trans. Inform. Theory, vol. IT-9, pp. 64–74, Jan. 1963.
[53] I. M. Jacobs and E. R. Berlekamp, "A lower bound to the distribution of computation for sequential decoding," IEEE Trans. Inform. Theory, vol. IT-13, pp. 167–174, 1967.
[54] J. L. Massey, Threshold Decoding. Cambridge, MA: MIT Press, 1963.
[55] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inform. Theory, vol. IT-13, pp. 260–269, Apr. 1967.
[56] G. D. Forney, Jr., "Review of random tree codes," NASA Ames Res. Ctr., Moffett Field, CA, Contract NAS2-3637, NASA CR73176, Appendix A, Final Rep., Dec. 1967.
[57] J. K. Omura, "On the Viterbi decoding algorithm," IEEE Trans. Inform. Theory, vol. IT-15, pp. 177–179, 1969.
[58] J. A. Heller, "Short constraint length convolutional codes," Jet Prop. Lab., Space Prog. Summary 37–54, vol. III, pp. 171–177, 1968.
[59] J. A. Heller, "Improved performance of short constraint length convolutional codes," Jet Prop. Lab., Space Prog. Summary 37–56, vol. III, pp. 83–84, 1969.
[60] D. Morton, Andrew Viterbi, Electrical Engineer: An Oral History. New Brunswick, NJ: IEEE History Center, Rutgers Univ., Oct. 1999.
[61] J. A. Heller and I. M. Jacobs, "Viterbi decoding for satellite and space communication," IEEE Trans. Commun. Tech., vol. COM-19, pp. 835–848, Oct. 1971.
[62] Revival of Sequential Decoding Workshop, Munich Univ. of Technology, J. Hagenauer and D. Costello, General Chairs, Munich, Germany, Jun. 2006.
[63] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. IT-20, pp. 284–287, Mar. 1974.
[64] P. Elias, "Error-free coding," IRE Trans. Inform. Theory, vol. IT-4, pp. 29–37, Sep. 1954.
[65] G. D. Forney, Jr., Concatenated Codes. Cambridge, MA: MIT Press, 1966.
[66] E. Paaske, "Improved decoding for a concatenated coding system recommended by CCSDS," IEEE Trans. Commun., vol. 38, pp. 1138–1144, Aug. 1990.
[67] O. Collins and M. Hizlan, "Determinate-state convolutional codes," IEEE Trans. Commun., vol. 41, pp. 1785–1794, Dec. 1993.
[68] J. Hagenauer, E. Offer, and L. Papke, "Matching Viterbi decoders and Reed-Solomon decoders in concatenated systems," in Reed-Solomon Codes and Their Applications, S. B. Wicker and V. K. Bhargava, Eds. Piscataway, NJ: IEEE Press, 1994, pp. 242–271.
[69] J. K. Wolf, "Efficient maximum-likelihood decoding of linear block codes using a trellis," IEEE Trans. Inform. Theory, vol. 24, pp. 76–80, 1978.
[70] A. Vardy, "Trellis structure of codes," in Handbook of Coding Theory, V. Pless and W. C. Huffman, Eds. Amsterdam, The Netherlands: Elsevier, 1998.



[71] D. J. Costello, Jr., J. Hagenauer, H. Imai, and S. B. Wicker, "Applications of error-control coding," IEEE Trans. Inform. Theory, vol. 44, pp. 2531–2560, Oct. 1998.
[72] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups. New York: Springer, 1988.
[73] R. de Buda, "The upper error bound of a new near-optimal code," IEEE Trans. Inform. Theory, vol. IT-21, pp. 441–445, Jul. 1975.
[74] G. R. Lang and F. M. Longstaff, "A Leech lattice modem," IEEE J. Select. Areas Commun., vol. 7, pp. 968–973, Aug. 1989.
[75] G. D. Forney, Jr. and L.-F. Wei, "Multidimensional constellations, Part I: Introduction, figures of merit, and generalized cross constellations," IEEE J. Select. Areas Commun., vol. 7, pp. 877–892, Aug. 1989.
[76] G. D. Forney, Jr. and G. Ungerboeck, "Modulation and coding for linear Gaussian channels," IEEE Trans. Inform. Theory, vol. 44, pp. 2384–2415, Oct. 1998.
[77] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Trans. Inform. Theory, vol. IT-28, pp. 55–67, Jan. 1982.
[78] H. Imai and S. Hirakawa, "A new multilevel coding method using error-correcting codes," IEEE Trans. Inform. Theory, vol. IT-23, pp. 371–377, May 1977.
[79] L.-F. Wei, "Rotationally invariant convolutional channel encoding with expanded signal space, Part II: Nonlinear codes," IEEE J. Select. Areas Commun., vol. 2, pp. 672–686, Sep. 1984.
[80] L.-F. Wei, "Trellis-coded modulation using multidimensional constellations," IEEE Trans. Inform. Theory, vol. IT-33, pp. 483–501, Jul. 1987.
[81] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo codes," in Proc. 1993 Int. Conf. Commun., Geneva, Switzerland, May 1993, pp. 1064–1070.
[82] D. J. C. MacKay and R. M. Neal, "Good codes based on very sparse matrices," in Cryptography and Coding: 5th IMA Conf., C. Boyd, Ed. Berlin, Germany: Springer, 1995, pp. 100–111.
[83] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low-density parity-check codes," Elect. Lett., vol. 32, pp. 1645–1646, Aug. 1996.
[84] M. Sipser and D. A. Spielman, "Expander codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1710–1722, Nov. 1996.
[85] D. A. Spielman, "Linear-time encodable and decodable error-correcting codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1723–1731, Nov. 1996.
[86] N. Wiberg, "Codes and decoding on general graphs," Ph.D. dissertation, Linköping Univ., Linköping, Sweden, 1996.
[87] N. Wiberg, H.-A. Loeliger, and R. Kötter, "Codes and iterative decoding on general graphs," Eur. Trans. Telecomm., vol. 6, pp. 513–525, Sep./Oct. 1995.
[88] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inform. Theory, vol. IT-27, pp. 533–547, Sep. 1981.
[89] G. Battail, "Construction explicite de bons codes longs," Annales des Télécommunications, vol. 44, pp. 392–404, 1989.
[90] G. Battail, "Pondération des symboles décodés par l'algorithme de Viterbi," Annales des Télécommunications, vol. 42, pp. 31–38, Jan.–Feb. 1987.
[91] J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft-decision outputs and its applications," in Proc. GLOBECOM'89, vol. 3, pp. 1680–1686.
[92] J. Lodge, P. Hoeher, and J. Hagenauer, "The decoding of multidimensional codes using separable MAP 'filters'," in Proc. 16th Biennial Symp. Communications, Kingston, ON, Canada, May 27–29, 1992, pp. 343–346.
[93] J. Lodge, R. Young, P. Hoeher, and J. Hagenauer, "Separable MAP 'filters' for the decoding of product and concatenated codes," in Proc. Int. Conf. Communications (ICC'93), Geneva, Switzerland, May 23–26, 1993, pp. 1740–1745.
[94] J. Hagenauer, "'Soft-in/soft-out,' the benefits of using soft values in all stages of digital receivers," in Proc. 3rd Int. Workshop Digital Signal Processing, Techniques Applied to Space Communications (ESTEC), Noordwijk, Sep. 1992, pp. 7.1–7.15.
[95] S. Benedetto and G. Montorsi, "Unveiling turbo codes: Some results on parallel concatenated coding schemes," IEEE Trans. Inform. Theory, vol. 42, pp. 409–428, Mar. 1996.
[96] L. C. Perez, J. Seghers, and D. J. Costello, Jr., "A distance spectrum interpretation of turbo codes," IEEE Trans. Inform. Theory, vol. 42, pp. 1698–1709, Nov. 1996.
[97] M. Breiling, "A logarithmic upper bound on the minimum distance of turbo codes," IEEE Trans. Inform. Theory, vol. 50, pp. 1692–1710, Jul. 2004.
[98] J. D. Andersen, "Turbo codes extended with outer BCH code," IEE Electron. Lett., vol. 32, pp. 2059–2060, Oct. 1996.
[99] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, pp. 909–926, May 1998.
[100] S. Crozier and P. Guinand, "Distance upper bounds and true minimum distance results for turbo codes designed with DRP interleavers," in Proc. 3rd Int. Symp. Turbo Codes Related Topics, Brest, France, Sep. 1–5, 2003.
[101] C. Douillard and C. Berrou, "Turbo codes with rate-m/(m+1) constituent convolutional codes," IEEE Trans. Commun., vol. 53, pp. 1630–1638, Oct. 2005.
[102] E. Boutillon and D. Gnaedig, "Maximum spread of d-dimensional multiple turbo codes," IEEE Trans. Commun., vol. 53, pp. 1237–1242, Aug. 2005.
[103] C. He, M. Lentmaier, D. J. Costello, Jr., and K. S. Zigangirov, "Joint permutor analysis and design for multiple turbo codes," IEEE Trans. Inform. Theory, vol. 52, pp. 4068–4083, Sep. 2006.
[104] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inform. Theory, vol. 45, pp. 399–431, Mar. 1999.
[105] N. Alon and M. Luby, "A linear-time erasure-resilient code with nearly optimal recovery," IEEE Trans. Inform. Theory, vol. 42, pp. 1732–1736, Nov. 1996.
[106] M. Luby, M. Mitzenmacher, M. A. Shokrollahi, D. A. Spielman, and V. Stemann, "Practical loss-resilient codes," in Proc. 29th Symp. Theory Computing, 1997, pp. 150–159.
[107] M. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, "Improved low-density parity-check codes using irregular graphs," IEEE Trans. Inform. Theory, vol. 47, pp. 585–598, Feb. 2001.
[108] A. Shokrollahi and R. Storn, "Design of efficient erasure codes with differential evolution," in Proc. 2000 IEEE Int. Symp. Inform. Theory, Sorrento, Italy, Jun. 2000, p. 5.
[109] J. W. Byers, M. Luby, M. Mitzenmacher, and A. Rege, "A digital fountain approach to reliable distribution of bulk data," in Proc. ACM SIGCOMM '98, Vancouver, Canada, 1998.
[110] M. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, "Efficient erasure-correcting codes," IEEE Trans. Inform. Theory, vol. 47, pp. 569–584, Feb. 2001.
[111] T. J. Richardson and R. Urbanke, "The capacity of low-density parity-check codes under message-passing decoding," IEEE Trans. Inform. Theory, vol. 47, pp. 599–618, Feb. 2001.
[112] T. J. Richardson, A. Shokrollahi, and R. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Trans. Inform. Theory, vol. 47, pp. 619–637, Feb. 2001.
[113] S.-Y. Chung, G. D. Forney, Jr., T. J. Richardson, and R. Urbanke, "On the design of low-density parity-check codes within 0.0045 dB from the Shannon limit," IEEE Commun. Lett., vol. 5, pp. 58–60, Feb. 2001.
[114] Y. Kou, S. Lin, and M. P. C. Fossorier, "Low-density parity-check codes based on finite geometries: A rediscovery," in Proc. 2000 IEEE Int. Symp. Inform. Theory, Sorrento, Italy, Jun. 2000, p. 200.
[115] D. Divsalar, H. Jin, and R. J. McEliece, "Coding theorems for 'turbo-like' codes," in Proc. 1998 Allerton Conf., Allerton, IL, Sep. 1998, pp. 201–210.
[116] A. Abbasfar, D. Divsalar, and Y. Kung, "Accumulate-repeat-accumulate codes," in Proc. IEEE Global Conf. (GLOBECOM 2004), Dallas, TX, Dec. 2004, pp. 509–513.
[117] L. Ping and K. Y. Wu, "Concatenated tree codes: A low-complexity, high-performance approach," IEEE Trans. Inform. Theory, vol. 47, pp. 791–799, Feb. 2001.
[118] P. C. Massey and D. J. Costello, Jr., "New low-complexity turbo-like codes," in Proc. IEEE Inform. Theory Workshop, Cairns, Australia, Sep. 2001, pp. 70–72.
[119] M. Luby, "LT codes," in Proc. 43rd Annu. IEEE Symp. Foundations of Computer Science (FOCS), Vancouver, Canada, Nov. 2002, pp. 271–280.
[120] A. Shokrollahi, "Raptor codes," IEEE Trans. Inform. Theory, vol. 52, pp. 2551–2567, Jun. 2006.
[121] O. Etesami and A. Shokrollahi, "Raptor codes on binary memoryless symmetric channels," IEEE Trans. Inform. Theory, vol. 52, pp. 2033–2051, May 2006.
[122] P. Robertson and T. Wörz, "Coded modulation scheme employing turbo codes," Inst. Elec. Eng. Electron. Lett., vol. 31, pp. 1546–1547, Aug. 1995.
[123] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Bandwidth-efficient parallel concatenated coding schemes," Inst. Elec. Eng. Electron. Lett., vol. 31, pp. 2067–2069, Nov. 1995.



[124] U. Wachsmann, R. F. H. Fischer, and J. B. Huber, "Multilevel codes: Theoretical concepts and practical design rules," IEEE Trans. Inform. Theory, vol. 45, pp. 1361–1391, Jul. 1999.
[125] S. Le Goff, A. Glavieux, and C. Berrou, "Turbo codes and high spectral efficiency modulation," in Proc. IEEE Int. Conf. Commun. (ICC 1994), New Orleans, LA, May 1994, pp. 645–649.
[126] E. Zehavi, "8-PSK trellis codes for a Rayleigh channel," IEEE Trans. Commun., vol. COM-40, pp. 873–884, May 1992.
[127] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum–product algorithm," IEEE Trans. Inform. Theory, vol. 47, pp. 498–519, Feb. 2001.
[128] G. D. Forney, Jr., "Codes on graphs: Normal realizations," IEEE Trans. Inform. Theory, vol. 47, pp. 520–548, Feb. 2001.
[129] S. M. Aji and R. J. McEliece, "The generalized distributive law," IEEE Trans. Inform. Theory, vol. 46, pp. 325–343, Mar. 2000.
[130] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance of Pearl's 'belief propagation' algorithm," IEEE J. Select. Areas Commun., vol. 16, pp. 140–152, Feb. 1998.
[131] F. R. Kschischang and B. J. Frey, "Iterative decoding of compound codes by probability propagation in graphical models," IEEE J. Select. Areas Commun., vol. 16, pp. 219–230, Feb. 1998.
[132] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann, 1988.
[133] L. E. Baum and T. Petrie, "Statistical inference for probabilistic functions of finite-state Markov chains," Ann. Math. Stat., vol. 37, pp. 1554–1563, Dec. 1966.
[134] R. W. Lucky, "Coding is dead," in Lucky Strikes...Again. Piscataway, NJ: IEEE Press, 1993, pp. 243–245.
[135] P. Elias, "Coding for noisy channels," in Key Papers in the Development of Information Theory, D. Slepian, Ed. New York: IEEE Press, 1973.
[136] P. Elias, "Coding for noisy channels," in Key Papers in the Development of Coding Theory, E. R. Berlekamp, Ed. New York: IEEE Press, 1974.
[137] P. Elias, "Coding for noisy channels," in The Electron and the Bit, J. V. Guttag, Ed. Cambridge, MA: EECS Dept., MIT, 2005.
[138] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low-density parity-check codes," Electron. Lett., vol. 33, pp. 457–458, Mar. 1997.
[139] M. Sipser and D. A. Spielman, "Expander codes," in Proc. 35th Symp. Found. Comp. Sci., 1994, pp. 566–576.
[140] D. A. Spielman, "Linear-time encodable and decodable error-correcting codes," in Proc. 27th ACM Symp. Theory Comp., 1995, pp. 388–397.
[141] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," in Proc. 5th IMA Conf. Crypto. Coding, 1995, pp. 100–111.

ABOUT THE AUTHORS

Daniel J. Costello, Jr. (Fellow, IEEE) received the Ph.D. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, in 1969. He then joined the Illinois Institute of Technology as an Assistant Professor. In 1985, he became a Professor at the University of Notre Dame and later served as Department Chair. His research interests include error control coding and coded modulation. He has more than 300 technical publications and has coauthored a textbook entitled Error Control Coding: Fundamentals and Applications (Pearson Prentice-Hall, 1983; 2nd edition, 2004). Dr. Costello received the Humboldt Research Prize from Germany in 1999, and in 2000 he was named Bettex Professor of Electrical Engineering at Notre Dame. He has been a member of the Information Theory Society Board of Governors from 1983-1995 and 2005-, and in 1986 he served as President. He also served as Associate Editor for the IEEE TRANSACTIONS ON COMMUNICATIONS and the IEEE TRANSACTIONS ON INFORMATION THEORY and as Co-Chair of the IEEE International Symposia on Information Theory in Kobe, Japan (1988), Ulm, Germany (1997), and Chicago, IL (2004). In 2000, he was selected by the IT Society as a recipient of a Third-Millennium Medal.

G. David Forney, Jr. (Life Fellow, IEEE) received the B.S.E. degree in electrical engineering from Princeton University, Princeton, NJ, in 1961, and the M.S. and Sc.D. degrees in electrical engineering from the Massachusetts Institute of Technology (MIT), Cambridge, MA, in 1963 and 1965, respectively. From 1965 to 1999 he was with the Codex Corporation, which was acquired by Motorola, Inc., in 1977, and its successor, the Motorola Information Systems Group, Mansfield, MA. Since 1996, he has been an Adjunct Professor at MIT. Dr. Forney was Editor of the IEEE TRANSACTIONS ON INFORMATION THEORY from 1970 to 1973. He has been a member of the Board of Governors of the IEEE Information Theory Society during 1970-1976, 1986-1994, and 2004-, and was President in 1992. He was awarded the 1970 IEEE Information Theory Group Prize Paper Award, the 1972 IEEE Browder J. Thompson Memorial Prize Paper Award, the 1990 IEEE Donald G. Fink Prize Paper Award, the 1992 IEEE Edison Medal, the 1995 IEEE Information Theory Society Claude E. Shannon Award, the 1996 Christopher Columbus International Communications Award, and the 1997 Marconi International Fellowship. In 1998, he received an IT Golden Jubilee Award for Technological Innovation, and two IT Golden Jubilee Paper Awards. He received an honorary doctorate from EPFL, Lausanne, Switzerland, in 2007. He was elected a member of the National Academy of Engineering (U.S.) in 1983, a Fellow of the American Association for the Advancement of Science in 1993, an honorary member of the Popov Society (Russia) in 1994, a Fellow of the American Academy of Arts and Sciences in 1998, and a member of the National Academy of Sciences (U.S.) in 2003.

