2384 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998

Modulation and Coding for Linear Gaussian Channels

G. David Forney, Jr., Fellow, IEEE, and Gottfried Ungerboeck, Fellow, IEEE

(Invited Paper)

Abstract—Shannon's determination of the capacity of the linear Gaussian channel has posed a magnificent challenge to succeeding generations of researchers. This paper surveys how this challenge has been met during the past half century. Orthogonal minimum-bandwidth modulation techniques and channel capacity are discussed. Binary coding techniques for low-signal-to-noise-ratio (SNR) channels and nonbinary coding techniques for high-SNR channels are reviewed. Recent developments, which now allow capacity to be approached on any linear Gaussian channel, are surveyed. These new capacity-approaching techniques include turbo coding and decoding, multilevel coding, and combined coding/precoding for intersymbol-interference channels.

Index Terms—Binary coding, block codes, canonical channel, capacity, coding gain, convolutional codes, lattice codes, multicarrier modulation, nonbinary coding, orthogonal modulation, precoding, shaping gain, single-carrier modulation, trellis codes.

I. INTRODUCTION

THE problem of efficient data communication over linear Gaussian channels has driven much of communication and coding research in the half century since Shannon's original work [93], [94], where this problem was central. Shannon's most celebrated result was the explicit computation of the capacity of the additive Gaussian noise channel. His result has posed a magnificent challenge to succeeding generations of researchers. Only in the past decade can we say that methods of approaching capacity have been found for practically all linear Gaussian channels. This paper focuses on the modulation, coding, and equalization techniques that have proved to be most effective in meeting Shannon's challenge.

Approaching channel capacity takes different forms in different domains. There is a fundamental difference between the power-limited regime with low signal-to-noise ratio (SNR) and the high-SNR bandwidth-limited regime.

Early work focused on the power-limited regime. In this domain, binary codes suffice, and intersymbol interference (ISI) due to nonflat channel responses is rarely a serious problem. As early as the 1960's, sequential decoding [119] of binary convolutional codes [31] was shown to be an implementable method for achieving the cutoff rate R_0, which at low SNR is 3 dB away from the Shannon limit. Many leading communications engineers took R_0 [78], [118] to be the "practical capacity," and concluded that the problem of reaching capacity had effectively been solved. Nonetheless, only in the 1990's have coding schemes that reach or exceed R_0 actually been implemented, even for such important and effectively ideal additive white Gaussian noise (AWGN) channels as deep-space communication channels.

Meanwhile, in the bandwidth-limited regime there was essentially no practical progress beyond uncoded multilevel modulation until the invention of trellis-coded modulation in the mid-1970's and its widespread implementation in the 1980's [104]. An alternative route to capacity in this regime is via multilevel coding, which was also introduced during the 1970's [60], [75]. Trellis-coded modulation and multilevel coding are both based on the concept of set partitioning. In the late 1980's, constellation shaping was recognized as a separable part of the high-SNR coding problem, which yields a small but essential contribution to approaching capacity [48].

For channels with strongly frequency-dependent attenuation or noise characteristics, it had long been known that multicarrier modulation could in principle be used to achieve the power and rate allocations prescribed by "water pouring" (see, e.g., [51]). However, practical realizations of multicarrier modulation in combination with powerful codes have been achieved only in recent years [90]. Multicarrier techniques are particularly attractive for channels for which the capacity-achieving band consists of multiple disconnected frequency intervals, as may happen with narrowband interference.

When the capacity-achieving band is a single interval, methods for approaching capacity using serial transmission and nonlinear transmitter pre-equalization ("precoding") have been developed in the past decade [34]. With these methods, the received signal is apparently ISI-free, so ideal-channel decoding can be performed at the channel output. In addition, signal redundancy at the channel output is exploited for constellation shaping at the channel input. The performance achieved is equivalent to the performance that would be obtained if ISI could be perfectly canceled in the receiver with decision-feedback equalization (DFE). At high SNR, the effective SNR is approximately equal to the optimal effective SNR achieved by water pouring. The capacity of an ISI channel can therefore be approached as closely as the capacity of an ideal ISI-free channel.

Manuscript received February 9, 1998; revised June 5, 1998. G. D. Forney, Jr. is with Motorola, Information Systems Group, Mansfield, MA 02048-1193 USA (e-mail: [email protected]). G. Ungerboeck was with IBM Research Division, Zurich Research Laboratory, CH-8803 Rueschlikon, Switzerland (e-mail: [email protected]). Publisher Item Identifier S 0018-9448(98)06314-7.

Recently, the invention of turbo codes [13] and the rediscovery of low-density parity-check (LDPC) codes [50]

0018–9448/98$10.00 © 1998 IEEE

have created tremendous excitement. These schemes operate successfully at rates well beyond the cutoff rate R_0, within tenths of a decibel of the Shannon limit, in both the low-SNR and high-SNR regimes. Whereas coding for rates below R_0 is a rather mature field, coding in the beyond-R_0 regime is in its early stages, and theoretical understanding is still weak. At R_0, it seems that a "phase transition" occurs from a static regime of regular code structures to a statistical regime of quasirandom codes, comparable to a phase transition between a solid and a liquid or gas.

In Section II, we define the general linear Gaussian channel and the ideal band-limited additive white Gaussian noise (AWGN) channel. We review how orthogonal modulation techniques convert a waveform channel to an equivalent ideal discrete-time AWGN channel. We discuss the Shannon limit for such channels, and emphasize the difference between the low-SNR and high-SNR regimes. We give baseline performance curves of uncoded M-ary PAM modulation, and evaluate the error probability as a function of the gap to capacity. We review union-bound performance analysis, which is a useful tool below R_0, but useless beyond R_0.

In Section III, we discuss the performance and complexity of binary block codes and convolutional codes for the low-SNR regime at rates below R_0. We also examine more powerful binary coding and decoding techniques such as sequential decoding, code concatenation with outer Reed–Solomon codes, turbo codes, and low-density parity-check codes.

In Section IV, we do the same for lattice and nonbinary trellis codes, which are the high-SNR analogs of binary block and convolutional codes, respectively. We also discuss more powerful schemes, such as multilevel coding with binary turbo codes [109].

Finally, in Section V, we consider the general linear Gaussian channel. We review water pouring and discuss approaching capacity with multicarrier modulation. For serial single-carrier transmission, we explain how to reduce the channel to an equivalent discrete-time channel without loss of optimality. We show that capacity can be approached if the intersymbol interference in this channel can be eliminated, and discuss various transmitter precoding techniques that have been developed to achieve this objective.

Appendix I discusses various forms of multicarrier modulation. In Appendix II, several uniformity properties that have proved to be helpful in the design and analysis of Euclidean-space codes are reviewed. Appendix III addresses discrete-time spectral factorization.

II. MODULATION AND CODING FOR THE IDEAL AWGN CHANNEL

In this section, we first define the general linear Gaussian channel and the ideal band-limited AWGN channel. We then review minimum-bandwidth orthogonal pulse amplitude modulation (PAM) for serial transmission of real and complex symbol sequences over the ideal AWGN channel. As is well known, orthogonal transmission at a rate of 1/T real (respectively, complex) symbols per second requires a minimum one-sided bandwidth of 1/2T Hz (respectively, 1/T Hz). This can be accomplished in a variety of ways. Any such modulation technique converts the continuous-time AWGN channel without loss of optimality to an ideal discrete-time AWGN channel.

We then review the channel capacity of the ideal AWGN channel, and give capacity curves for equiprobable M-ary PAM (M-PAM) inputs. We emphasize the significant differences between the low-SNR and high-SNR regimes. As a fundamental figure of merit for coding, we use the normalized signal-to-noise ratio SNR_norm defined in Section II-E, and compare it with the traditional figure of merit E_b/N_0. We propose that E_b/N_0 should be used only in the low-SNR regime. We give baseline performance curves for uncoded 2-PAM modulation as a function of SNR_norm and E_b/N_0, and for M-PAM as a function of SNR_norm. Finally, we discuss union-bound performance analysis.

A. General Linear Gaussian Channel and Ideal AWGN Channel

The general linear Gaussian channel is a real waveform channel with input signal x(t), a channel impulse response h(t), and additive Gaussian noise n(t). The received signal is

    r(t) = h(t) * x(t) + n(t)    (2.1)

where "*" denotes convolution. The Fourier transform (spectral response) of h(t) will be denoted by H(f). The input signal is subject to a power constraint P, and the noise has one-sided power spectral density (p.s.d.) N(f).

Since all waveforms are real and thus have Hermitian-symmetric spectra about f = 0, only positive frequencies need to be considered. We will follow this long-established tradition, although for analytic purposes it is often preferable to consider both positive and negative frequencies. All bandwidths and power spectral densities in this paper will therefore be "one-sided."

The ideal band-limited additive white Gaussian noise (AWGN) channel (for short, the ideal AWGN channel) is a linear Gaussian channel with flat (constant) H(f) and N(f) = N_0 over a frequency band B of one-sided bandwidth W. Signals can be transmitted only in the band B, which is not necessarily one continuous interval. The restriction to B may be due either to the channel (H(f) = 0 for f outside B) or to system constraints. Without loss of generality, we may normalize H(f) to |H(f)| = 1 within B, so that

    r(t) = x(t) + n(t)    (2.2)

where n(t) is AWGN with one-sided p.s.d. N_0. The signal-to-noise ratio is then

    SNR = P/(N_0 W).    (2.3)

B. Real and Complex Pulse Amplitude Modulation

Let {x_k} be a sequence of real modulation symbols representing digital data. Assume that {φ_k(t)} represents a set of real orthonormal waveforms; i.e., ∫ φ_k(t) φ_k'(t) dt = δ_kk', where the integral is over all time, and δ_kk' denotes the Kronecker delta function: δ_kk' = 1 if k = k', else δ_kk' = 0. A PAM modulator transmits the continuous-time signal

    x(t) = Σ_k x_k φ_k(t)    (2.4)

over an ideal AWGN channel. An optimal demodulator recovers a sequence {y_k} of noisy estimates of the transmitted symbols by correlating the received signal r(t) = x(t) + n(t) with the waveforms φ_k(t) (matched filtering):

    y_k = ∫ r(t) φ_k(t) dt = x_k + n_k.    (2.5)

By optimum detection theory [118], the sequence {y_k} is a set of sufficient statistics about the sequence {x_k}; i.e., all information about {x_k} that is contained in the continuous-time received signal r(t) is condensed into the discrete-time sequence {y_k}. Moreover, the orthonormality of the φ_k(t) ensures that there is no intersymbol interference (ISI), and that the sequence {n_k} is a set of independent and identically distributed (i.i.d.) Gaussian random noise variables with mean zero and variance N_0/2. The waveform channel is thus reduced to an equivalent discrete-time ideal AWGN channel.

For serial PAM transmission, the orthogonal waveforms are chosen to be time shifts of a pulse p(t) by integer multiples of the modulation interval T; i.e., φ_k(t) = p(t − kT), so that the transmitted signal becomes

    x(t) = Σ_k x_k p(t − kT).    (2.6)

The orthonormality condition requires that

    ∫ p(t − kT) p(t − k'T) dt = δ_kk'.    (2.7)

The frequency-domain equivalent to (2.7) is known as the Nyquist criterion for zero ISI, namely,

    S(f) = (1/T) Σ_m |P(f − m/T)|² = 1  for all f    (2.8)

where P(f) is the Fourier transform of p(t) and S(f) is the (1/T)-aliased spectrum of |P(f)|²/T. The Nyquist criterion imposes no restriction on the phase of P(f). Because S(f) is periodic with period 1/T, and for real signals P(f) and thus S(f) are symmetric around f = 0, it suffices that S(f) = 1 for the frequency interval 0 ≤ f ≤ 1/2T. This implies that for each f in this interval, P(f − m/T) must be nonzero for at least one m. It follows that serial ISI-free transmission of 1/T real symbols per second over a real channel requires a minimum one-sided bandwidth of 1/2T Hz, or 1/2 Hz per symbol dimension transmitted per second (1/2 Hz/dim/s).

For serial baseband transmission, the Nyquist criterion is met with minimum bandwidth by the brick-wall spectrum

    P(f) = √T for |f| ≤ 1/2T;  P(f) = 0 elsewhere.    (2.9)

Solutions requiring more bandwidth include spectra with symmetric finite rolloffs at the band-edge frequencies ±1/2T, e.g., the well-known family of raised-cosine spectra [74]. Of course, the Nyquist criterion is also satisfied by the spectrum |P(f)|² = T sinc²(fT), which corresponds to "unfiltered" modulation with the rectangular pulse p(t) = 1/√T for 0 ≤ t < T (p(t) = 0 elsewhere).

Serial passband transmission of 1/T real modulation symbols per second can be achieved with minimum bandwidth by choosing P(f) = √T for f_c ≤ f ≤ f_c + 1/2T, and zero elsewhere, where f_c is an integer multiple of 1/2T. This type of modulation may be referred to as carrierless single-sideband (CSSB) modulation; it is not often used. Serial passband transmission of complex modulation symbols by carrierless amplitude-phase (CAP) modulation or quadrature amplitude modulation (QAM) is usually preferred.

To transmit a complex signal over a real passband channel, one may use a complex signal with only positive-frequency components. Sending the real part thereof creates a negative-frequency image spectrum. In the receiver, the complex signal is recovered by suppressing the negative-frequency spectrum. Let p(t) be as defined in (2.9), and let

    p_1(t) + j p_2(t) = p(t) e^{j2πf_c t}

where f_c ≥ 1/2T, so that the Fourier transform of p_1(t) + j p_2(t) is analytic, i.e., zero for f < 0. Then p_2(t) is the Hilbert transform of p_1(t), and the two pulses are said to form a Hilbert pair. The two pulses are orthogonal with energy 1/2 each, and both have one-sided bandwidth 1/T Hz centered at f_c. The transmit signals for minimum-bandwidth CAP and QAM are, respectively,

    x_CAP(t) = Σ_k [a_k p_1(t − kT) − b_k p_2(t − kT)] = Re{ Σ_k x_k p(t − kT) e^{j2πf_c(t − kT)} }    (2.10)

and

    x_QAM(t) = Re{ Σ_k x_k p(t − kT) e^{j2πf_c t} }    (2.11)

where x_k = a_k + j b_k. Notice that QAM is equivalent to CAP with the symbols x_k replaced by the rotated symbols x_k e^{j2πf_c kT}. If f_c is an integer multiple of 1/T, then there is no difference between CAP and QAM. An advantage of CAP over CSSB is that the CAP spectrum does not need to be aligned with integer multiples of 1/2T, as is the case for CSSB. CAP may be preferred over QAM when f_c does not substantially exceed 1/T. On the other hand, QAM is the modulation of choice when f_c ≫ 1/T. For ISI-free transmission of 1/T complex (two-dimensional) signals per second, CAP and QAM require a minimum one-sided bandwidth of 1/T Hz, which again is 1/2 Hz/dim/s.

There exists literally an infinite variety of orthonormal modulation schemes with the same bandwidth efficiency. Consider general serial/parallel PAM modulation in which real N-vectors x_k = (x_k1, ..., x_kN) are transmitted at a rate of 1/T vectors per second over an ideal AWGN channel. Let p(t) = (p_1(t), ..., p_N(t)) be an N-vector of real pulses. A general PAM modulator transmits

    x(t) = Σ_k ⟨x_k, p(t − kT)⟩    (2.12)

where ⟨·,·⟩ denotes the inner product. If {p_n(t − kT)} is a set of orthonormal functions, an optimum demodulator may recover noisy estimates y_k of the vectors x_k from r(t) by matched filtering

    y_k = ∫ r(t) p(t − kT) dt = x_k + n_k.    (2.13)

The sequence {n_k} is a sequence of i.i.d. Gaussian N-vectors, whose elements have variance N_0/2. Thus the AWGN waveform channel is converted to a discrete-time vector AWGN channel, which is equivalent to N real discrete-time AWGN channels operating in parallel.

Let P(f) = (P_1(f), ..., P_N(f)) be the vector of the Fourier transforms of the elements of p(t). The frequency-domain condition for orthonormality is the generalized Nyquist criterion [74]

    (1/T) Σ_m P(f − m/T) P†(f − m/T) = I_N  for all f.    (2.14)

In (2.14), P†(f) denotes the conjugate transpose of the column vector P(f), P(f)P†(f) is the N × N matrix with entries P_n(f) P_n'*(f), P(f − m/T) denotes the (m/T)-aliased translate of P(f), and I_N is the N × N identity matrix. As with (2.8), it suffices that (2.14) holds for the frequency interval 0 ≤ f ≤ 1/2T. Now consider the matrix

    Q(f) = [P(f), P(f − 1/T), P(f − 2/T), ...]    (2.15)

whose columns are the aliased translates of P(f). Then (2.14) can be written as Q(f)Q†(f) = T I_N, which implies that Q(f) must have rank N for all f. It follows that the total one-sided bandwidth for which some P_n(f − m/T), 1 ≤ n ≤ N, is nonzero must be at least N/2T Hz. Because N/T symbol dimensions are transmitted per second, again a minimum one-sided bandwidth of 1/2 Hz/dim/s is needed.

If p(t) meets the minimum-bandwidth condition, then the nonzero columns of Q(f), scaled by 1/√T, must form an N × N unitary matrix in order that Q(f)Q†(f) = T I_N. Conversely, any set of unitary matrices for 0 ≤ f ≤ 1/2T defines a minimum-bandwidth orthonormal transmission scheme. The number of such schemes is therefore uncountably infinite.

Appendix I describes two currently popular multicarrier modulation schemes, namely, discrete multitone (DMT) and discrete wavelet multitone (DWMT) modulation. The optimum design of transmitter and receiver filters for serial/parallel transmission over general noisy linear channels has been discussed in [28].

C. Capacity of the Ideal AWGN Channel

We have seen that all orthonormal modulation schemes with optimum matched-filter detection are equivalent to an ideal discrete-time AWGN channel with real input symbols x_k and real output signals y_k = x_k + n_k, where the i.i.d. Gaussian noise variables n_k have zero mean and variance σ² = N_0/2. Let the average input-symbol energy per dimension be E_s, and let the average energy per information bit be E_b = E_s/ρ, where ρ is the code rate in bits per dimension (b/dim). Because 2W symbol dimensions are transmitted per second, E_s = P/(2W); with σ² = N_0/2, the signal-to-noise ratios of the AWGN waveform channel and the discrete-time AWGN channel are identical:

    SNR = E_s/σ² = P/(N_0 W).    (2.16)

The channel capacity is the maximum mutual information between channel input and output, which is obtained with a Gaussian distribution over the symbols x_k with average power E_s. Shannon's most famous result states the capacity in bits per second as

    C[b/s] = W log₂(1 + SNR).    (2.17)

Equivalently, the capacity in bits per dimension is given by

    C = (1/2) log₂(1 + SNR)  [b/dim].    (2.18)

Shannon [93], [94] showed that reliable transmission is possible for any rate ρ < C, and impossible if ρ > C. (Unless stated otherwise, in this paper capacity and code rates are given in bits per dimension.)

The capacity can be upper-bounded and approximated for low SNR by a linear function in SNR, and lower-bounded and approximated for high SNR by a logarithmic function in SNR, as follows:

    C ≤ (log₂ e / 2) SNR,  with C ≈ (log₂ e / 2) SNR for SNR ≪ 1;    (2.19)
    C ≥ (1/2) log₂ SNR,  with C ≈ (1/2) log₂ SNR for SNR ≫ 1.    (2.20)

A finite set A from which modulation symbols can be chosen is called a signal alphabet or signal constellation. An M-PAM constellation contains M equidistant real symbols centered on the origin; i.e., A = {−(M − 1)d/2, ..., −d/2, +d/2, ..., +(M − 1)d/2}, where d is the minimum distance between symbols. For example, {−3, −1, +1, +3} is a 4-PAM constellation with d = 2. If the symbols are equiprobable, then the average symbol energy is

    E_s = d²(M² − 1)/12.    (2.21)
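The average-energy formula (2.21) is easy to verify numerically. The following minimal sketch (the function names are ours, not the paper's) builds a zero-centered M-PAM constellation and compares direct averaging over equiprobable symbols with the closed form:

```python
import math

def mpam_constellation(M, d=2.0):
    """Return the M equidistant, zero-centered M-PAM symbol values."""
    return [d * (2 * i - (M - 1)) / 2 for i in range(M)]

def average_energy(symbols):
    """Average symbol energy for equiprobable symbols."""
    return sum(s * s for s in symbols) / len(symbols)

# Check the closed form E_s = d^2 (M^2 - 1) / 12 against direct averaging.
for M in (2, 4, 8, 16):
    d = 2.0
    Es_direct = average_energy(mpam_constellation(M, d))
    Es_formula = d * d * (M * M - 1) / 12
    assert math.isclose(Es_direct, Es_formula)
    print(M, Es_formula)
```

For the 4-PAM example {−3, −1, +1, +3} with d = 2, both computations give E_s = 5.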

Fig. 1. Capacity of the ideal AWGN channel with Gaussian inputs and with equiprobable M-PAM inputs.
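The capacity expressions above can be checked numerically. This is a minimal sketch (the function name is ours): it evaluates (2.18) and compares it against the low-SNR linear approximation (2.19) and the high-SNR logarithmic approximation (2.20).

```python
import math

def capacity_per_dim(snr):
    """Ideal AWGN channel capacity in bits per dimension, per (2.18)."""
    return 0.5 * math.log2(1.0 + snr)

low = 0.01    # about -20 dB: linear regime
high = 1000.0 # about 30 dB: logarithmic regime

# Low-SNR approximation (2.19): C is close to (log2 e / 2) * SNR.
print(capacity_per_dim(low), 0.5 * math.log2(math.e) * low)

# High-SNR approximation (2.20): C is close to (1/2) log2 SNR.
print(capacity_per_dim(high), 0.5 * math.log2(high))

# At SNR = 15 the capacity is exactly 2 b/dim, the saturation
# rate of uncoded 4-PAM in Fig. 1.
assert math.isclose(capacity_per_dim(15), 2.0)
```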

Fig. 1 shows the capacity C and the mutual information achieved with equiprobable M-PAM ("equiprobable M-PAM capacity") as a function of SNR. The M-PAM curves saturate because information cannot be sent at a rate higher than log₂ M b/dim.

D. Low-SNR and High-SNR Regimes

We see from Fig. 1 that in the low-SNR regime an equiprobable binary alphabet is nearly optimal. For SNR ≤ 1 (0 dB), the reduction in capacity is negligible.

In the high-SNR regime, the capacity of equiprobable M-PAM constellations asymptotically approaches a straight line parallel to the capacity of the AWGN channel. The asymptotic loss of πe/6 (1.53 dB) is due to using a uniform rather than a Gaussian distribution over the signal set. To achieve capacity, the use of powerful coding with equiprobable M-PAM signals is not enough. To obtain the remaining 1.53 dB, constellation-shaping techniques that produce a Gaussian-like distribution over an M-PAM constellation are required (see Section IV-B).

Thus coding techniques for the low-SNR and high-SNR regimes are quite different. In the low-SNR regime, binary codes are nearly optimal, and no constellation shaping is required. Good binary coding and decoding techniques have been known since the 1960's. On the other hand, in the high-SNR regime, nonbinary signal constellations must be used. Very little progress in developing practical codes for this regime was made until the advent of trellis-coded modulation in the late 1970's and 1980's. To approach capacity, coding techniques must be supplemented with constellation-shaping techniques. Moreover, on bandwidth-limited channels, ISI is often a dominant impairment, and practical techniques for combined coding, shaping, and equalization are required to approach capacity. Most of the progress in these areas has been made only in the past decade.

For these reasons, we discuss coding for the low-SNR and high-SNR regimes separately in Sections III and IV, and coding for the general linear Gaussian channel in Section V.

E. Baseline Performance of Uncoded M-PAM and Normalized SNR

In an uncoded M-PAM system, ρ = log₂ M information bits are independently encoded into each M-PAM symbol transmitted. In the receiver, the optimum decoding rule is then to make independent symbol-by-symbol decisions. The probability that a Gaussian noise variable exceeds half of the distance d between adjacent M-PAM symbols is Q(d/2σ), where [74]

    Q(x) = (1/√(2π)) ∫_x^∞ e^{−y²/2} dy.    (2.22)

The error probability per symbol is given by Q(d/2σ) for the two outer points, and by 2Q(d/2σ) for the M − 2 inner points, so the average error probability per symbol is

    P_s(E) = 2(1 − 1/M) Q(d/2σ) = 2(1 − 1/M) Q(√(3 SNR/(M² − 1)))    (2.23)

where SNR = E_s/σ² = d²(M² − 1)/(12σ²) follows from (2.16) and (2.21). Thus P_s(E) is a function only of M and SNR. The circles in Fig. 1 indicate the values of SNR at which a given target value of P_s(E) is achieved.

The capacity formula (2.18) can be rewritten as SNR = 2^{2C} − 1. This suggests defining the normalized SNR as

    SNR_norm = SNR/(2^{2ρ} − 1)    (2.24)

where ρ is the actual data rate of a given modulation and coding scheme. For a capacity-achieving scheme, ρ equals the channel capacity C and SNR_norm = 1 (0 dB). If ρ < C, as will always be the case in practice, then SNR_norm > 1. The value of SNR_norm thus signifies how far a system is operating from the Shannon limit (the "gap to capacity").

For uncoded M-PAM, from ρ = log₂ M and (2.24) one obtains SNR_norm = SNR/(M² − 1). Therefore, the average error probability per symbol of uncoded M-PAM can be written as

    P_s(E) = 2(1 − 1/M) Q(√(3 SNR_norm)) ≈ 2 Q(√(3 SNR_norm)),  M large.    (2.25)

Note that the baseline M-PAM performance curve of P_s(E) versus SNR_norm is nearly independent of M, if M is large. This shows that SNR_norm is appropriately normalized for rate in the high-SNR regime.

Because SNR = 2ρ E_b/N_0 by (2.16), the general relation between SNR_norm and E_b/N_0 at a given rate ρ (in bits per dimension) is given by

    SNR_norm = (2ρ/(2^{2ρ} − 1)) E_b/N_0.    (2.26)

If ρ = 1, then SNR_norm = (2/3) E_b/N_0, so the two figures of merit are equivalent up to a constant factor. If ρ ≪ 1, then SNR_norm ≈ (E_b/N_0)/ln 2; if ρ ≫ 1, then SNR_norm ≪ E_b/N_0.

In the low-SNR regime, if bandwidth is truly unconstrained, then as bandwidth is increased to permit usage of powerful low-rate binary codes, both SNR and ρ tend toward zero. In this power-limited regime, it is appropriate to use as a figure of merit the traditional ratio E_b/N_0. From the general Shannon limit ρ < C and SNR = 2ρ E_b/N_0, we obtain the Shannon limit on E_b/N_0 for a given rate ρ:

    E_b/N_0 > (2^{2ρ} − 1)/(2ρ).    (2.27)

This lower bound decreases monotonically with ρ, and as ρ approaches 0, it approaches the ultimate Shannon limit

    E_b/N_0 > ln 2  (−1.59 dB).    (2.28)

However, if bandwidth is limited, then the Shannon limit on E_b/N_0 is higher. For example, if ρ = 1/2, then E_b/N_0 > 1 (0 dB).

Because SNR = 2E_b/N_0 at ρ = 1, the error probability per symbol (or per bit) of uncoded 2-PAM may be expressed in two equivalent ways:

    P_b(E) = Q(√(2E_b/N_0)) = Q(√(3 SNR_norm)).    (2.29)

In this paper we will use E_b/N_0 in the low-SNR regime and SNR_norm in the high-SNR regime as the fundamental figures of merit of uncoded and coded modulation schemes. This practice appears to be on its way to general adoption.

The "effective coding gain" of a coded modulation scheme is measured by the reduction in required E_b/N_0 or SNR_norm to achieve a certain target error probability relative to a baseline uncoded scheme. In the low-SNR regime, the baseline will be taken as 2-PAM; in the high-SNR regime, the baseline will be taken as M-PAM (M large).

Fig. 2. Bit-error probability P_b(E) versus E_b/N_0 for uncoded 2-PAM, and symbol-error probability P_s(E) versus SNR_norm for uncoded M-PAM (M large).

Fig. 2 gives the probability of bit error for uncoded 2-PAM as a function of both SNR_norm and E_b/N_0. Note that the axis for E_b/N_0 is shifted by a factor of 3/2 (1.76 dB) relative to the axis for SNR_norm. At the reference error probability shown, the baseline uncoded binary modulation scheme operates about 12.5 dB away from the Shannon limit. Therefore, a coding gain of up to 12.5 dB in E_b/N_0 is in principle possible at this error probability, provided that bandwidth can be expanded sufficiently to permit the use of powerful very-low-rate binary codes. If bandwidth can be expanded by a factor of only 2, then with binary codes of rate 1/2 a coding gain of up to about 10.8 dB can in principle be achieved.

Fig. 2 also depicts the probability of symbol error P_s(E) for uncoded M-PAM as a function of SNR_norm for large M (in this case, the E_b/N_0 axis should be ignored). At the same reference error probability, a baseline uncoded M-PAM modulation scheme operates about 9 dB away from the Shannon limit in the bandwidth-limited regime. Thus if bandwidth is a fixed, nonexpandable resource, a coding gain of up to about 9 dB in SNR_norm is in principle possible. This conclusion holds even at ρ = 1.

F. The Union Bound

The union bound is a useful tool for evaluating the performance of moderately powerful codes, although it breaks down for rates beyond the cutoff rate R_0. The union bound is based on evaluating pairwise error probabilities between pairs of coded sequences. On an ideal real discrete-time AWGN channel, the received sequence y is the sum of the transmitted coded sequence x and an i.i.d. Gaussian sequence n with mean 0 and variance σ² = N_0/2 per symbol. Because the likelihood of y given x depends only on the squared Euclidean distance ||y − x||², maximum-likelihood (ML) decoding is equivalent to minimum-distance decoding.

Given two coded sequences x and x′ that differ by the Euclidean distance d = ||x − x′||, the probability that the received sequence y will be closer to x′ than to x is given by

    Pr{x → x′} = Q(d/2σ).    (2.30)

The probability that y will be closer to some x′ ≠ x than to x, which is precisely the probability of error with ML decoding, is thus upperbounded by

    Pr(E | x) ≤ Σ_{x′ ≠ x} Q(d(x, x′)/2σ).    (2.31)

The average probability of error over all coded sequences x is upperbounded by the union bound

    Pr(E) ≤ Σ_d N_d Q(d/2σ)    (2.32)

where N_d is the average number of coded sequences x′ at distance d from x.

The Gaussian error probability function Q(d/2σ) decays exponentially as e^{−d²/8σ²}. Therefore, if N_d does not rise too rapidly with d, the union bound is dominated by its first term, which is called the union bound estimate (UBE)

    Pr(E) ≈ N_{dmin} Q(d_min/2σ)    (2.33)

where d_min is the minimum distance between coded sequences, and N_{dmin} is the average number of sequences at minimum distance from a given coded sequence. The squared argument of the Q function in the union bound estimate yields a first-order estimate of performance relative to an uncoded baseline, namely,

    Pr(E) ≈ N_{dmin} Q(√(γ_c) · d_0/2σ)    (2.34)

where γ_c is the nominal coding gain, d_0² is the squared baseline distance, ρ_0 is the baseline rate, and ρ is the actual rate. In comparison to the baseline squared distance, there is a distance gain of a factor of d_min²/d_0². However, in comparison to the baseline rate, there is also a redundancy loss. In the high-SNR regime, the redundancy loss is approximately a factor of 2^{2r}, where r = ρ_0 − ρ is the redundancy of the code in b/dim. In the low-SNR regime, with 2-PAM signaling, the redundancy loss is simply a factor of ρ_0/ρ. The distance gain, divided by the redundancy loss, gives the nominal (or asymptotic) coding gain γ_c, which is based solely on the argument of the Q function in the UBE.

The effective coding gain is the difference between the signal-to-noise ratios required to achieve a certain target error rate with a coded modulation scheme and to obtain the same error rate with an uncoded scheme of the same rate ρ. The effective coding gain is typically less than the nominal coding gain, because the error coefficient N_{dmin} is typically larger than the baseline error coefficient. Note that N_{dmin} depends on exactly what is being estimated, e.g., bit-error probability, block-error probability, etc., and over which interval: per bit, per dimension, per block, etc. A rule of thumb based on the slope of the Q(·) curve in the error-probability region of interest is that every increase of a factor of two in the error coefficient costs about 0.2 dB in effective coding gain, provided that N_{dmin} is not too large.

The union bound blows up even for optimal codes at the cutoff rate R_0; i.e., the corresponding error exponent goes to zero [51]. For low-SNR channels, the cutoff rate is 3 dB away from the Shannon limit; i.e., the cutoff rate limit on E_b/N_0 is 2 ln 2 (1.42 dB). For high-SNR channels, the cutoff rate corresponds to an SNR which is a factor of 4/e away from the Shannon limit [95]; i.e., the cutoff rate limit on SNR_norm is 4/e (1.68 dB).

Beyond R_0, codes cannot be analyzed by using the union bound estimate. Any apparent agreement between a union bound estimate and actual performance in the region between R_0 and C must be regarded as fortuitous. Because the operational significance of minimum distance or nominal coding gain is that they give a good estimate of performance via the union bound, their significance in the beyond-R_0 regime is questionable.
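The Q-function expressions above can be checked numerically. The sketch below (a minimal sketch; the helper names are ours) verifies the equivalence of the two forms of the 2-PAM baseline (2.29) and compares the analytic bit-error probability against a Monte Carlo simulation of the discrete-time AWGN channel:

```python
import math
import random

def Q(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mc_2pam_bit_error(ebn0_db, n=200_000, seed=1):
    """Monte Carlo bit-error rate of uncoded 2-PAM on the discrete-time
    AWGN channel: x = +/-1 (E_b normalized to 1), noise variance N_0/2."""
    rng = random.Random(seed)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * ebn0))
    errors = 0
    for _ in range(n):
        x = rng.choice((-1.0, 1.0))
        y = x + rng.gauss(0.0, sigma)
        errors += (y < 0) != (x < 0)
    return errors / n

for ebn0_db in (2.0, 4.0, 6.0):
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    analytic = Q(math.sqrt(2.0 * ebn0))
    snr_norm = 2.0 * ebn0 / 3.0  # rho = 1, so SNR_norm = SNR / 3
    # The two forms of (2.29) agree exactly.
    assert math.isclose(analytic, Q(math.sqrt(3.0 * snr_norm)))
    print(ebn0_db, analytic, mc_2pam_bit_error(ebn0_db))
```

The simulated rates agree with Q(√(2E_b/N_0)) to within the expected Monte Carlo fluctuation.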
which is called the union bound estimate (UBE):

Pr(E) ≈ K_min(C) Q(d_min/2σ),   (2.33)

where d_min is the minimum distance between coded sequences, and K_min(C) is the average number of sequences at minimum distance from a given coded sequence.

III. BINARY CODES FOR POWER-LIMITED CHANNELS

In this section we discuss coding techniques for power-limited (low-SNR) ideal AWGN channels. As we have seen, low-rate binary codes are near-optimal in this regime, as long as soft decisions are used at the output. We develop a union bound estimate of the probability of error per bit with maximum-likelihood (ML) decoding. We also show that the nominal coding gain of a rate-r binary code with minimum distance d over baseline 2-PAM is simply rd.

Then we give the effective coding gains of known moderate-complexity block and convolutional codes versus the branch complexity of their minimal trellises, assuming ML decoding at error rates of practical interest. Convolutional codes are clearly superior in terms of coding gain versus trellis complexity.

Finally, we briefly discuss higher-performance codes, including convolutional codes with sequential decoding, concatenated codes with outer Reed–Solomon codes, and turbo codes and other capacity-approaching codes.

A. Optimality of Low-Rate Binary Linear Codes with Soft Decisions

We have seen in Fig. 1 that on an AWGN channel at low SNR, the capacity with equiprobable binary signaling is negligibly less than the true capacity. Moreover, on symmetric channels such as the ideal AWGN channel, it has long been known that there is no reduction in capacity if one restricts one's attention to linear binary codes [31], [51].

As shown by (2.26), if the spectral efficiency ρ (b/2D) is greater than zero, the figure of merit E_b/N_0 is lower-bounded by (2^ρ − 1)/ρ, which exceeds the ultimate Shannon limit of ln 2 (−1.59 dB). If ρ is small, then

(2^ρ − 1)/ρ ≈ (ln 2)(1 + (ρ ln 2)/2),   (3.1)

so the lower bound is approximately

−1.59 + 1.5ρ decibels.   (3.2)

For example, the penalty to operate at ρ = 1/2 b/2D is about 0.77 dB, which is not insignificant. However, there may be a limit on bandwidth, which scales as 1/ρ. Moreover, even if bandwidth is unlimited, there is usually a technology-dependent lower limit on the energy per transmitted symbol below which the performance of other system elements (e.g., tracking loops) degrades significantly.

On the other hand, whereas two-level quantization of the channel input costs little, two-level quantization of the channel output (hard decisions) costs of the order of 2–3 dB [118]. Optimized three-level quantization (hard decisions with erasures) costs only about half as much (1–1.5 dB) [39]. However, optimized uniform eight-level (3-bit) quantization (quantized soft decisions) costs only about 0.2 dB, and has often been used in practice [55].

B. The Union Bound and Nominal Coding Gain

An (n, k, d) binary linear block code C is a k-dimensional subspace of the n-dimensional binary vector space such that the minimum Hamming distance between any two code n-tuples is d. Its size is |C| = 2^k.

For use on the Gaussian channel, code bits x are usually mapped to real numbers via the standard 2-PAM map s(x) = α(−1)^x. The Euclidean image s(C) is then a subset of 2^k vertices of the 2^n vertices of an n-cube of side 2α centered on the origin. If two code n-tuples x, x′ differ in d_H(x, x′) places, then the squared Euclidean distance between their Euclidean images s(x) and s(x′) is 4α² d_H(x, x′). Consequently, the minimum squared Euclidean distance between any two vectors in s(C) is

d_min² = 4α² d.   (3.3)

Moreover, if C has N_d words of minimum weight d, then by linearity there are N_d vectors at minimum squared distance from every vector in s(C).

The union bound estimate (UBE) of the probability of a block decoding error is then

Pr(E) ≈ N_d Q(√(4α²d)/2σ) = N_d Q(√(2α²d/N_0)).   (3.4)

Because the energy per information bit is E_b = nα²/k and 2α²d/N_0 = 2(kd/n)(E_b/N_0), we may write the UBE as

Pr(E) ≈ N_d Q(√(2γ_c(C) E_b/N_0)),   (3.5)

where the nominal coding gain of an (n, k, d) binary linear code C is defined as

γ_c(C) = kd/n.   (3.6)

This is the product of a distance gain of a factor of d with a redundancy loss of a factor of k/n. The uncoded 2-PAM baseline corresponds to an (n, n, 1) code, for which γ_c = 1.

For comparability with the uncoded baseline, it is appropriate to normalize Pr(E) by k to obtain the probability of block decoding error per information bit (not the bit-error probability)

P_b(E) ≈ K_b(C) Q(√(2γ_c(C) E_b/N_0)),   (3.7)

in which the normalized error coefficient is K_b(C) = N_d/k. Graphically, a curve of the form of (3.7) may be obtained simply by moving the baseline curve P_b(E) = Q(√(2E_b/N_0)) of Fig. 2 to the left by γ_c(C) (in decibels), and upward by a factor of K_b(C).

C. Biorthogonal Codes

Biorthogonal codes are asymptotically optimal codes for the ideal AWGN channel, provided that bandwidth is unlimited. They illustrate both the usefulness and the limitations of union bound estimates.

For any integer m ≥ 2 there exists a (2^m, m+1, 2^{m−1}) binary linear code C, e.g., a first-order Reed–Muller code, such that its Euclidean image s(C) is a biorthogonal signal set in 2^m-space (i.e., a set of 2^m orthogonal vectors and their negatives). Its nominal coding gain is

γ_c(C) = (m+1) 2^{m−1}/2^m = (m+1)/2,   (3.8)

which goes to infinity as m goes to infinity. Such a code consists of the all-zero word, the all-one word, and 2^{m+1} − 2 codewords of weight 2^{m−1}. The union bound on block error probability thus becomes

Pr(E) ≤ (2^{m+1} − 2) e^{−(m+1)E_b/2N_0},   (3.9)

TABLE I PARAMETERS AND CODING GAINS OF REED–MULLER CODES AND THE GOLAY CODE

where we have used the upper bound Q(x) ≤ e^{−x²/2}. Thus if E_b/N_0 > 2 ln 2 (1.42 dB), then the union bound approaches zero exponentially with m. Thus the union bound shows that with biorthogonal codes it is possible to approach within 3 dB of the Shannon limit, which corresponds to the cutoff rate R_0 of the low-SNR AWGN channel.

A more refined upper bound [118] (3.10) shows that the error probability approaches zero exponentially with m whenever E_b/N_0 > ln 2 (−1.59 dB). In other words, this bound shows that biorthogonal codes can achieve an arbitrarily small error probability for any E_b/N_0 above the ultimate Shannon limit. It also illustrates the limitations of the union bound, which blows up 3 dB away.

Of course, the code rate (m+1)/2^m approaches zero rapidly as m becomes large, so long biorthogonal codes can be used only if bandwidth is effectively unlimited. They can be decoded efficiently by fast Hadamard transform techniques [43], but even so the decoding complexity is proportional to m 2^m, which increases exponentially with m. A (32, 6, 16) biorthogonal code decoded by a fast Hadamard transform ("Green machine") was used for an early deep-space mission (Mariner, 1969) [79].

D. Effective Coding Gains of Known Codes

For moderate codelengths, Bose–Chaudhuri–Hocquenghem (BCH) codes form a large and well-known class of binary linear block codes that often are the best known codes in terms of their parameters (n, k, d). Unfortunately, not much is known about soft-decision ML decoding algorithms for such codes. The most efficient general methods are trellis-based methods—i.e., the code is represented by a trellis, and the Viterbi algorithm (VA) is used for ML decoding. There has been considerable recent work on efficient trellis representations of binary linear block codes, including BCH codes [107].

Reed–Muller (RM) codes are as good as BCH codes at short lengths, and almost as good at moderate lengths, in terms of the parameters (n, k, d). Efficient trellis representations are known for all RM codes [43]. In fact, in terms of nominal coding gain or effective coding gain versus trellis complexity, RM codes are often better than BCH codes [107]. Moreover, the number N_d of minimum-weight codewords is known for all RM codes, but not for all BCH codes [77].

In Table I we give the parameters (n, k, d), the nominal coding gain γ_c, and the effective coding gain γ_eff for RM codes of short to moderate length, including biorthogonal codes, and also for the (24, 12, 8) Golay code. We also give a parameter that measures trellis complexity (the log branch complexity). To estimate the effective coding gain γ_eff from the nominal coding gain γ_c, we use the normalized error coefficient K_b = N_d/k, and apply the "0.2-dB loss per factor of 2" rule, which is not very accurate for large K_b. We remind the reader that these estimated effective coding gains assume soft decisions, ML decoding, and the accuracy of the union bound estimate, and that the complexity parameter assumes trellis-based decoding. Moreover, complexity comparisons are always technology-dependent.

There also exist algebraic decoding algorithms for all of these codes, as well as for BCH codes. Algebraic error-correcting decoders that are based on hard decisions are not appropriate for the AWGN channel, since hard decisions cost 2 to 3 dB. For BCH and other algebraic block codes, there do exist efficient soft-decision decoding algorithms such as generalized minimum distance (GMD) decoding [39] and the Chase algorithms [19] that are capable of approaching ML performance; furthermore, the complexity of "one-pass" GMD decoding [12] is not much greater than that of error-correction only. However, we know of no published results showing that

TABLE II PARAMETERS AND CODING GAINS OF SELECTED LOW-RATE CONVOLUTIONAL CODES

the performance–complexity tradeoff for moderate-complexity with constraint lengths [18]. The log branch com- binary block codes with soft-decision algebraic decoding can plexity parameter equals . (For some combinations of be better than that of moderate-complexity binary convolu- parameters , superior time-varying codes are known tional codes with trellis-based decoding on AWGN channels. [70].) From this table it appears that the best tradeoff between It is apparent from a comparison of Tables I and II that con- effective coding gain and trellis complexity is obtained with volutional codes offer a much better performance/complexity low-rate biorthogonal codes, if bandwidth is not an issue. An tradeoff than block codes. It is possible to obtain more than 6 effective coding gain of about 5 dB can be obtained with the dB of effective coding gain with a -state or -state rate- code, whose code rate is , with a trellis or rate- convolutional code with a simple, regular trellis branch complexity of only . The and (VA) decoder. No block code approaches such performance codes, which are close cousins, can also obtain with such modest complexity. an effective coding gain of about 5 dB, with a somewhat Consequently, for power-limited channels such as the deep- greater branch complexity of , but with a more space channel, convolutional codes rather than block codes comfortable code rate of . Beyond these codes, have been used almost from the earliest days [26]. Although complexity increases rapidly. this is due in part to the historical fact that efficient maximum- A similar analysis can be performed for binary linear con- likelihood (ML) or near-ML soft-decision decoding algorithms volutional codes. 
For a rate- binary linear convolutional were invented much earlier for convolutional codes, Tables code with free distance , the nominal coding gain is again I and II show that convolutional codes have an inherent .If is the number of weight- code performance/complexity advantage. sequences that start in a given -input, -output block, then the appropriate error coefficient to measure the probability of E. Sequential Decoding error event per information bit is again . Sequential decoding of convolutional codes was perhaps the Table II gives the parameters of the best known rate- earliest near-ML decoding algorithm, and is still one of the time-invariant binary linear convolutional codes for , , few that can operate at the cutoff rate . For this reason, 2394 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998 sequential decoding was the first coding scheme used for the for erasure-and-error correction and even for near-ML soft- deep-space application (Pioneer, 1968) [26]. decision decoding (i.e., generalized minimum-distance decod- Although there exist many variants of sequential decoding, ing [39]). RS error-correction algorithms are standard in VLSI they have in common a sequential search through a code libraries, and today are typically capable of decoding dozens tree (not a trellis) for the ML code sequence. The code of errors per block over at data rates of tens of megabits constraint length can be infinite in principle, although for per second (Mb/s) [98]. synchronization purposes (whether via zero-tail termination, When used with interleaving, RS codes are also ideal burst- tail-biting, or self-resynchronization) it is usually chosen to be error correctors in the sense of requiring the least possible not too large (of the order of – ). guard space [40]. 
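Trellis (Viterbi) decoding of such convolutional codes follows the standard add-compare-select recursion. A minimal soft-decision sketch, assuming the classic 4-state rate-1/2 code with generators (7, 5) octal and free distance 5 (nominal coding gain 5/2, about 4 dB):

```python
G = (0b111, 0b101)  # rate-1/2, 4-state, generators (7,5) octal, d_free = 5

def conv_encode(bits):
    """Shift each input bit into a length-3 register; each tap set
    in G produces one output (code) bit."""
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state
        out += [bin(reg & g).count("1") & 1 for g in G]
        state = reg >> 1
    return out

def viterbi(y):
    """Soft-decision ML sequence decoding: y holds one real value per
    code bit (convention: bit 0 -> +1, bit 1 -> -1); the input path
    of maximum correlation with y is returned."""
    NEG = float("-inf")
    metric = [0.0, NEG, NEG, NEG]          # encoder starts in state 0
    paths = [[], [], [], []]
    for t in range(len(y) // 2):
        new_m, new_p = [NEG] * 4, [[]] * 4
        for s in range(4):
            if metric[s] == NEG:
                continue
            for b in (0, 1):
                reg = (b << 2) | s
                m = metric[s]
                for i, g in enumerate(G):
                    c = bin(reg & g).count("1") & 1
                    m += y[2 * t + i] * (1 - 2 * c)   # correlate
                ns = reg >> 1
                if m > new_m[ns]:
                    new_m[ns], new_p[ns] = m, paths[s] + [b]
        metric, paths = new_m, new_p
    return paths[max(range(4), key=metric.__getitem__)]

msg = [1, 0, 1, 1, 0]
received = [1.0 - 2.0 * c for c in conv_encode(msg)]
received[2] = -received[2]                 # one channel "hard error"
decoded = viterbi(received)
```

With d_free = 5, any single flipped code bit is corrected, as the usage above illustrates; the branch complexity per information bit is the 8 state transitions examined per trellis section.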
The amount of computation to decode a given number of For these reasons RS codes are used in a large variety of information bits is (highly) variable in sequential decoding, applications, including the NASA deep-space standard adopted but is more or less independent of . The distribution of in the 1970’s [24]. RS codes are clearly the outstanding computation follows a Pareto distribution, practical success story of the field of algebraic block coding. , where is the Pareto exponent [63]. The Pareto exponent is equal to at the cutoff rate , which is usually considered to be the practical limit of sequential decoding (because a G. Turbo Codes and Other Capacity-Approaching Codes Pareto distribution has a finite mean only if ). Thus The invention of “turbo codes” [13] put a decisive end to on the low-SNR ideal AWGN channel, sequential decoding of the long-standing conjecture that the cutoff rate might rep- low-rate binary linear convolutional codes can approach the resent the “practical capacity.” Performance within tenths of Shannon limit within a factor of about 3 dB. a decibel of the Shannon limit is now routinely demonstrated If is large enough, then sequential decoders rarely make with reasonable decoding complexity, albeit with large delay errors; rather, they fail to decode due to excessive computation. [61]. This error-detection property is useful in many applications, The original turbo code operates as follows. An information such as deep-space telemetry. bit sequence is encoded in a simple (e.g., -state) recur- Moreover, experiments have shown that bidirectional se- sive systematic rate- convolutional encoder to produce quential decoding of moderate-length blocks can approxi- one check bit sequence. 
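The Pareto computational behavior of sequential decoding described above is easy to simulate. A sketch (the unit-scale model Pr(C > N) = N^(−ρ) for N ≥ 1 is an illustrative assumption):

```python
import random
import statistics

def sample_computation(rho, n=200_000, seed=1):
    """Inverse-CDF samples of a Pareto-distributed per-block
    computation C with Pr(C > N) = N**(-rho) for N >= 1; the mean of
    C = U**(-1/rho) is finite only when the Pareto exponent rho > 1."""
    rng = random.Random(seed)
    return [rng.random() ** (-1.0 / rho) for _ in range(n)]

work = sample_computation(3.0)
print(statistics.fmean(work))                  # ~ rho/(rho-1) = 1.5
print(sum(c > 10 for c in work) / len(work))   # ~ 10**-3
```

At ρ = 1 (the cutoff rate) the mean diverges, which is why R_0 is regarded as the practical limit of sequential decoding.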
The same information bit sequence mately double the Pareto exponent and thus significantly is permuted in a very long (e.g., – bits) interleaver reduce the gap to capacity, even after giving effect to the rate and then encoded in a second recursive systematic rate- loss due to zero-tail termination [65]. convolutional encoder to produce a second check bit sequence. In view of these desirable properties, it is something of The information bit sequence and both check bit sequences are a mystery why sequential decoding has received so little transmitted, so the code rate is . (Puncturing can be used practical attention during the past 30 years. to raise the rate.) Decoding is performed in an iterative fashion as follows. The received sequences corresponding to the information bit F. Concatenated Codes and RS Codes sequence and the first check bit sequence are decoded by a Like the Viterbi algorithm and the Lempel–Ziv algorithm, soft-decision decoder for the first convolutional code. The concatenated codes were originally introduced to solve a outputs of this decoder are a sequence of soft decisions for theoretical problem [39], but have turned out to be useful for each bit of the information sequence. (This is done by some a variety of practical applications. version of the forward–backward algorithm [5], [6].) These The basic idea is to use a moderate-strength “inner code” soft decisions are then used by a similar decoder for the second with an ML or near-ML decoding algorithm to achieve a convolutional code, which hopefully produces still better soft moderate error rate like – at a code rate as close to decisions that can then be used by the first decoder for a new capacity as possible. Then a powerful algebraic “outer code” iteration. Decoding iterates in this way for 10 to 20 cycles, capable of correcting many errors with low redundancy is used before hard decisions are finally made on the information bits. 
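The parallel concatenation just described can be sketched in a few lines. For brevity the sketch uses a minimal 2-state recursive systematic encoder (an accumulator) where practical turbo codes use larger ones (e.g., 16 states), and a toy 4-bit interleaver:

```python
def rsc_parity(bits):
    """Parity stream of a 2-state recursive systematic convolutional
    encoder: the running XOR (transfer function 1/(1+D))."""
    s, out = 0, []
    for b in bits:
        s ^= b
        out.append(s)
    return out

def turbo_encode(bits, perm):
    """Rate-1/3 parallel concatenation: transmit the systematic bits
    plus the parity streams of the same bits computed before and
    after interleaving."""
    interleaved = [bits[i] for i in perm]
    return bits, rsc_parity(bits), rsc_parity(interleaved)

sys_bits, parity1, parity2 = turbo_encode([1, 0, 1, 1], perm=[2, 0, 3, 1])
```

Puncturing parity bits raises the rate above 1/3, exactly as in the scheme described in the text; the iterative decoder then exchanges soft decisions between the two constituent decoders.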
to drive the error rate down to as low an error rate as may Empirically, operation within 0.3–0.7 dB of the Shannon be desired. It was shown in [39] that the error rate could be limit can be achieved at moderate error rates. Theoretical made to decrease exponentially with blocklength at any rate understanding of turbo codes is still weak. At low , less than capacity, while decoding complexity increases only turbo codes appear to behave like random codes whose block- polynomially. length is comparable to the interleaver length [84]. At higher Reed–Solomon (RS) codes are ideal for use as outer codes. , the performance of a turbo decoder is dominated by (They are not suitable for use as inner codes because they are certain low-weight codewords, whose multiplicity is inversely nonbinary.) RS codes are defined over finite fields whose proportional to the interleaver length (the so-called “error order is typically large (e.g., ). An floor” effect) [11]. RS code over exists for every and The turbo decoding algorithm is now understood to be [77]. The minimum distance is as large as possible, the general sum-product (APP) decoding algorithm, with a in view of the Singleton bound [77]. particular update schedule that is well matched to the turbo- Very efficient polynomial-time algebraic decoding algo- code structure [116]. However, little is known about the rithms are known, not just for error correction but also theoretical performance and convergence properties of this FORNEY AND UNGERBOECK: MODULATION AND CODING FOR LINEAR GAUSSIAN CHANNELS 2395 algorithm applied to turbo codes, or more generally to codes The two principal classes of high-SNR codes are lattices defined on graphs with cycles. and trellis codes, which are analogous to binary block and Many variants of the turbo coding idea have been studied. convolutional codes, respectively. 
By now the principles of Different arrangements of compound codes have been inves- construction of such codes are well understood, and it seems tigated, including “serially concatenated” codes like those of likely that the best codes have been found. We plot the [39], which mitigate the error floor effect [9]. The decoding effective coding gains of these known moderate-complexity algorithm has been refined in various ways. Overall, however, lattices and trellis codes versus the branch complexity of their improvements have been minor. minimal trellises, assuming maximum-likelihood decoding. A turbo coding scheme is now being standardized for future Trellis codes are somewhat superior, due mainly to their lower deep-space missions [29]. error coefficients. Even more recently, codes and decoding algorithms that ap- We briefly discuss coding schemes that can achieve coding proach turbo coding performance have been devised according gains beyond , of the order of 5 to 7 dB, at error prob- to quite different principles. With these methods turbo-code abilities of – . Multilevel schemes with multistage performance has been approached and even exceeded. Notable decoding allow the use of high-performance binary codes among these are low-density parity-check (LDPC) codes, such as those described in the previous section to be used originally proposed by Gallager in the early 1960’s [50], to approach the Shannon limit of high-SNR channels. Such subsequently almost forgotten, and recently rediscovered by high-performance schemes naturally involve greater decoding various authors [76], [96], [116]. Like turbo codes, these codes complexity and large delays. may be viewed as “codes defined on graphs” and may be We particularly note techniques that have been used in decoded by similar iterative APP decoding algorithms [116]. telephone-line standards, which have generally re- Long LDPC codes can operate well beyond ; furthermore, flected the state of the art for high-SNR channels. 
they have no error floor and rarely make decoding errors, but rather simply fail to decode [76]. Theoretically, it has been A. Lattice Constellations shown that long LDPC codes can achieve rates up to capacity It is clear from the proof of Shannon’s capacity theorem with ML decoding [50], [76]. Very recently, nonbinary LDPC that an optimal block code for a high-SNR ideal band-limited codes have been devised that outperform turbo codes [27]. AWGN channel consists of a dense packing of signal points However, the parallel (“flooding”) version of the iterative within a sphere in a high-dimensional Euclidean space. Most decoding algorithm that is usually used with these codes is of the known densest packings are lattices [25]. In this computationally expensive, and their lengths are comparable section we briefly describe lattice constellations, and analyze to those of turbo codes. their performance using the union bound estimate and large- constellation approximations. An -dimensional lattice is a discrete subgroup of - IV. NONBINARY CODES FOR BANDWIDTH-LIMITED CHANNELS space , which without essential loss of generality may In this section we discuss coding techniques for high- be assumed to span . The points of the lattice form a SNR ideal bandwidth-limited AWGN channels. In this regime uniform infinite packing of . By the group property, each nonbinary signal alphabets such as -PAM must be used point of the lattice has the same number of neighbors at to approach capacity. Using large-alphabet approximations, each distance, and all decision regions of a minimum-distance we show that the total coding gain of a coded-modulation decoder (Voronoi regions) are congruent and tesselate . scheme for the high-SNR ideal AWGN channel is the sum of These properties hold for any lattice translate . a coding gain and a shaping gain. At high SNR’s, the coding The key geometrical parameters of a lattice are the minimum and shaping problems are separable. 
squared distance between lattice points, the kissing Shaping gains are obtained by using signal constellations in number of nearest neighbors to any lattice point, high-dimensional spaces that are bounded by a quasispherical and the volume of -space per lattice point, which region, rather than the cubical region that results from indepen- is equal to the volume of any Voronoi region [25]. The dent -PAM signaling; or, alternatively, by using points in a Hermite parameter is the normalized parameter low-dimensional constellation with a Gaussian-like probability , which we will soon identify as the nominal distribution rather than equiprobably. The maximum possible coding gain of . shaping gain is a factor of (1.53 dB). We briefly mention A lattice constellation is the finite several shaping methods that can easily obtain about 1 dB of set of points in a lattice translate that lie within a compact shaping gain. bounding region of -space. The key geometric properties In the high-SNR regime, the SNR gap between uncoded of the region are its volume and the average energy baseline performance at and the cutoff limit per dimension of a uniform probability density function without shaping is about 5.8 dB. This gap arises as follows: the over : gap to the Shannon limit is 9 dB; with shaping the cutoff limit is 1.7 dB below the Shannon limit; and without shaping the (4.1) cutoff limit is 1.5 dB lower than with shaping. Coding gains of the order of 3–5 dB at error probabilities of – can The normalized second moment of is defined as be obtained with moderate complexity. [25]. 2396 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998
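The Hermite parameter just defined is simple to evaluate. A sketch using standard (d_min², V, n) values for some well-known lattices (the specific values below are supplied by us, not quoted from the text):

```python
import math

def hermite_db(d2_min, volume, n):
    """Hermite parameter gamma(Lambda) = d_min**2 / V(Lambda)**(2/n),
    i.e. the nominal coding gain over the integer lattice Z^n, in dB."""
    return 10 * math.log10(d2_min / volume ** (2.0 / n))

print(hermite_db(1, 1, 1))    # Z: 0 dB (baseline)
print(hermite_db(2, 2, 4))    # D4 (checkerboard): ~1.51 dB
print(hermite_db(2, 1, 8))    # E8 (Gosset): ~3.01 dB
print(hermite_db(4, 1, 24))   # Leech lattice: ~6.02 dB
```

Scaling invariance is visible in the formula: multiplying the lattice by α multiplies d_min² by α² and V^(2/n) by α², leaving γ unchanged.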

For example, an -PAM constellation with even The nominal coding gain measures the increase in density of over a baseline lattice, or . The shaping gain measures the decrease in average energy of relative to is a one-dimensional lattice constellation with a baseline region, namely, an interval or an - and . The key geometrical cube . Both contribute a multiplicative factor parameters of are of gain to the argument of the function. As before, the effective coding gain is reduced by the error coefficient . For comparability with the -PAM uncoded baseline, it is appropriate to normalize by the dimension to obtain the probability of block decoding error (4.2) per symbol The key parameters of are SNR (4.11)

in which the normalized error coefficient is (4.3) . Graphically, a curve of the form of (4.11) may be Both and are invariant under scaling, orthogo- obtained simply by moving the baseline curve nal transformations, and Cartesian products; i.e., SNR of Fig. 2 to the left by and (in and , where is any scale decibels), and upward by a factor of . factor, is any orthogonal matrix, and is any positive integer. In particular, this implies that for B. Shaping Gain and Shaping Techniques any version of an integer lattice , and that the normalized second moment of any -cube centered at the origin is . Although shaping is a newer and less important topic than For large lattice constellations, one may use the following coding, we discuss it first because its story is quite simple. approximations, the first two of which are collectively known The optimum -dimensional shaping region is an - as the continuous approximation [48]: sphere. The key geometrical parameters of an -sphere • the size of the constellation is ; of radius for even are [25] • the average energy per dimension of an equiprobable distribution over is ; • for large rates , we have ; • the average number of nearest neighbors to any point in is approximately . (4.12) The union bound estimate on probability of block decoding error is By Stirling’s approximation, namely as goes (4.4) to infinity [51], we have Because (4.5) (4.13) SNR (4.6) The shaping gain of an -sphere is plotted for dimensions SNR SNR in Fig. 3. Note that the shaping (4.7) gain of a -sphere is approximately 1 dB. However, for larger dimensions, we see from Fig. 3 that the shaping gain we may write the union bound estimate as approaches the ultimate shaping gain (1.53 dB) rather SNR (4.8) slowly. 
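The sphere shaping gains plotted in Fig. 3 can be reproduced from the geometric parameters above. A sketch (the closed form below is our reconstruction from the sphere's volume and second moment, not a quoted equation):

```python
import math

def sphere_shaping_gain_db(n):
    """Shaping gain of an n-sphere over an n-cube, in dB:
    gamma_s(n) = pi*(n+2) / (12 * Gamma(n/2 + 1)**(2/n)),
    which tends to pi*e/6 (1.53 dB) as n -> infinity."""
    log10_gamma = math.lgamma(n / 2.0 + 1.0) / math.log(10.0)
    return 10.0 * (math.log10(math.pi * (n + 2) / 12.0)
                   - (2.0 / n) * log10_gamma)

for n in (2, 16, 64, 512):
    print(n, round(sphere_shaping_gain_db(n), 2))
```

The values rise from about 0.2 dB at n = 2 through roughly 1 dB around n = 16, and are still noticeably short of 1.53 dB at n = 512, illustrating the slow convergence noted in the text.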
The projection of a uniform probability distribution over where the nominal coding gain of and the shaping gain of an -sphere onto one or two dimensions is a nonuniform are defined, respectively, as probability distribution that approaches a Gaussian density as . The ultimate shaping gain of (1.53 dB) may (4.9) be derived alternatively as the difference between the average (4.10) power of a uniform density over an interval and that of a Gaussian density with the same differential entropy [48]. For a baseline -PAM constellation, we have Shaping therefore induces a Gaussian-like probability dis- , and the UBE reduces to tribution on a one-dimensional PAM (or two-dimensional SNR QAM) constellation, rather than an equiprobable distribution. FORNEY AND UNGERBOECK: MODULATION AND CODING FOR LINEAR GAUSSIAN CHANNELS 2397

Fig. 3. Shaping gains of n-spheres over n-cubes for n = 1, 2, 4, ..., 512.

In principle, with spherical shaping, the lower dimensional for accommodating noninteger numbers of bits per symbol. constellation will become arbitrarily large, even with fixed The latter objective is also achieved by a mapping technique average power. In practice, the lower dimensional “shaping known as “modulus conversion” [3], which is an enumerative constellation expansion” is constrained by design to a permit- encoding technique for mapping large integers into blocks ted peak amplitude. If the -dimensional shape approximates (“mapping frames”) of signals in lexicographical order, where spherical shaping subject to this constraint, then the lower the sizes of the signal constellations do not need to be powers dimensional probability distribution approaches a truncated of . Gaussian distribution [48]. A nonuniform distribution over a low-dimensional constel- C. Coding Gains of Dense Lattices lation may alternatively be obtained by codes over bits that Finding the densest lattice packings in a given number of specify regions of the constellation [16]. dimensions is a mathematical problem of long standing. A With large constellations, shaping can be implemented more summary of the densest known packings is given in [25]. The or less independently of coding by operations on the “most nominal coding gains of these lattices in up to 24 dimensions is significant bits” of PAM or QAM constellation labels, which plotted in Fig. 4. Notable lattices include the eight-dimensional affect the large-scale shape of the -dimensional constellation. Gosset lattice , whose nominal coding gain is (3 dB), and In contrast, coding affects the “least significant bits” and the 24-dimensional Leech lattice , whose nominal coding determines small-scale structure. gain is (6 dB). 
Two practical schemes that can easily obtain shaping gains In contrast to shaping gain, the nominal coding gains of of 1 dB or more while limiting two-dimensional (2-D) shaping dense -dimensional lattices become infinite as . constellation expansion to a factor of or less are “trellis For example, for any integer , there exists a - shaping” and “shell mapping.” dimensional Barnes–Wall lattice whose nominal coding gain is Trellis shaping [45] is a kind of dual to trellis coding. Using (which is by no means the densest possible for large ) a trellis, one defines a set of equivalence classes of coded [25]. sequences. The shaping operation consists of determining However, effective coding gains cannot become infinite. among equivalent sequences in the trellis the minimum-energy Indeed, the Shannon limit shows that no lattice can have a sequence. Shaping gains of the order of 1 dB are easily combined effective coding gain and shaping gain greater than obtained. 9dBat . This limits the maximum possible In shell mapping [49], [66], [68], [69], [72] the signals effective coding gain to 7.5 dB, because shaping gain can in an -dimensional lattice constellation are in principle contribute up to 1.53 dB. labeled in order of increasing signal energy, and a one-to- What limits effective coding gain is the number of near one map is defined between possible input data sequences neighbors, which becomes very large for high-dimensional and an equal number of least energy constellation points. In dense lattices. For example, the kissing number of the - practice, generating-function methods are used to label points dimensional Barnes–Wall (BW) lattice is [25] in Cartesian-product constellations in approximate increasing order of energy based on shell-mapping labelings of lower BW (4.14) dimensional constituent constellations. 
The V.34 modem uses -dimensional shell mapping, and obtains shaping gains of the order of 0.8 dB with 2-D shaping constellation expansion which yields, for example, BW and limited to 25% [47]. These shaping methods are also useful BW 2398 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998

Fig. 4. Nominal and effective coding gains of the densest known lattices in dimensions n ≤ 24.

D. Trellis Codes The quantities are the intrasubset Trellis codes are dense packings in infinite-dimensional minimum squared distances of the th-level (sub)sets , Euclidean sequence space. Trellis codes are to lattices as , , . The objective of set partitioning is to binary convolutional codes are to block codes. We will see increase these distances as much as possible at every subset that trellis codes give a better performance/complexity tradeoff level. Set partitioning is continued until is at least as large than lattices in the high-SNR regime, just as convolutional as the desired minimum squared distance between codes give a better performance/complexity tradeoff than block trellis code sequences. The binary -tuples are codes in the low-SNR regime, although the difference is not called the subset labels of the final subsets . as dramatic. Partitioning one- or two-dimensional constellations in this The key ideas in the invention of trellis codes [104] were way is trivial [104]. Partitioning of higher dimensional con- stellations was first performed starting from lower dimensional • use of minimum squared Euclidean distance as the design partitions [113]. Subsequently, partitioning of lattice constel- criterion; lations was introduced using lattice partitions [17], [42], [43] • set partitioning of a constellation into subsets; (or, more generally, group-theoretic partitions [44]), as we will • maximizing intrasubset minimum squared distances; discuss below. • rate- convolutional coding over the subsets; In almost all cases of practical interest, the two first-level • selection of signals within subsets by further uncoded bits subsets and will be geometrically congruent to (if necessary). each other (see Appendix II-B); i.e., they will differ only by Most practical trellis codes have used either -dimensional translation, rotation, and/or reflection. 
lattice constellations based on versions of with , An encoder for a trellis code then operates as shown in , ,or ,or -dimensional -PSK constellations with Fig. 5 [105]. Given input bits per -dimensional symbol, or and . We will focus on lattice-type input bits are encoded by a rate- binary convolutional trellis codes in this section. encoder with states into a coded sequence of subset labels Set partitioning of an -dimensional constellation into . At time , the label is used to select subset . The subsets is usually performed by partitioning steps, with each remaining input bits are used to choose one signal two-way selection being identified by a label bit , from signals in the selected subset . The size of [104], [105] must therefore be , or a factor of (the “coding constellation expansion” factor per dimensions) larger than needed to send uncoded bits per symbol. (If there is any (4.15) shaping, it is performed on the uncoded bits and results in further “shaping constellation expansion.”) FORNEY AND UNGERBOECK: MODULATION AND CODING FOR LINEAR GAUSSIAN CHANNELS 2399

Fig. 5. Trellis-code encoder.

A trellis code may be maximum-likelihood decoded by a technique, the sequence of encoding operations at time is Viterbi algorithm (VA) decoder as follows. Given a received as follows: point , the receiver first finds the closest signal point a) the check bit is determined from the encoder state ; in each subset . This is called subset decoding. A b) the input bits at time select a signal from ; VA decoder then finds the code sequence for which the c) the remaining label bits are determined signals chosen in the subsets are closest to the entire received from ; sequence . The decoding complexity is dominated by the d) the next encoder state is determined from and . complexity of the VA decoder, which is approximately given by the branch complexity of the convolutional code, E. Lattice Constellation Partitioning normalized by the dimension . Partitioning of lattice constellations by means of lattice In practice, the redundancy of the convolutional code is partitions is done as follows [17], [43]. One starts with an - almost always one bit per -dimensional symbol dimensional lattice constellation , where is almost , so that the coding constellation expansion factor is always a version of an -dimensional integer lattice . The per dimensions. The lowest level (“least significant”) constellation is partitioned into subsets of equal label bit is then the sole parity-check bit. If the con- size that are congruent to a sublattice of index volutional code is linear, then in -transform notation the in , so that is the union of cosets (translates) of . check sequence is determined from the information The subsets are then the points of that lie in each bit sequences by a linear parity-check such subset, which form sublattice constellations of the form equation of the form . The region must be chosen so that there are an (4.16) equal number of points in each subset. The sublattice is usually chosen to be as dense as possible. 
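A small numerical sketch of the coset-code nominal coding gain of (4.17); the formula's form and the two example codes' parameters are our reconstruction, chosen to be consistent with standard published gains:

```python
import math

def coset_code_gain_db(d2_min, rho, vol_lambda=1.0, n=1):
    """Nominal coding gain of a coset code in the form of (4.17):
    gamma_c = d_min**2 * 2**(-2*rho) / V(Lambda)**(2/n),
    with rho the encoder redundancy in bits per dimension (the
    distance gain times the redundancy loss 2**(-2*rho))."""
    return 10 * math.log10(d2_min * 2 ** (-2 * rho) / vol_lambda ** (2.0 / n))

# 4-state 1D code on the Z/4Z partition: d_min^2 = 9,
# redundancy 1 bit per dimension -> ~3.5 dB
print(coset_code_gain_db(9, 1))
# 8-state 2D code on Z^2 (V.32-type): d_min^2 = 5,
# redundancy 1/2 bit per dimension -> ~4.0 dB
print(coset_code_gain_db(5, 0.5, 1.0, 2))
```

Both values match the well-known gains of these classic Ungerboeck codes, which is a useful sanity check on the redundancy-loss factor.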
where Codes based on lattice partitions are called coset codes. The nominal coding gain of a coset code based on is a set of relatively-prime parity-check polynomials. If a lattice is [42] the greatest degree of any of these polynomials is , then (4.17) a minimal encoder for the code has states. The convolutional code is chosen primarily to maximize where is the redundancy of the convolutional the minimum squared distance between trellis-code encoder in bits per dimension. Again, this coding gain is the sequences. If the redundancy is one bit per symbol, then product of a distance gain of over , a good first step is to ensure that in any encoder state and a redundancy loss . The effective coding gain is the set of possible next outputs is either or ,so reduced by the amount that the error coefficient that a squared distance contribution of is immediately per dimension exceeds the baseline -PAM error coefficient obtained when code sequences diverge from a common state. of per dimension. Again, the rule of thumb that an increase With a linear convolutional code, this is achieved by choosing of a factor of two costs 0.2 dB may be used. the low-order coefficients of the parity-check polynomials The encoder redundancy also leads to a “coding such that and , . The constellation expansion ratio” of a factor of per two same argument applies in the reverse time direction to the dimensions [42]—i.e., a factor of , , , for 1D, 2D, 4D, high-order coefficients . This design rule guarantees that codes, respectively. Minimization of coding constellation [104]. expansion has motivated the use of higher dimensional trellis An important consequence of such a choice is that at time codes. the lowest level label bit , which selects or , depends only on the encoder state . This observation leads to F. Coding Gains of Known Trellis Codes the idea of feedback trellis encoding [71], which was proposed Fig. 
6 shows the effective coding gains of important families during the development of the V.34 modem standard to permit of trellis codes for lattice-type signal constellations. The an improved type of precoding (see Section V). With this effective coding gains are plotted versus their VA decoding 2400 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998
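As an illustration of the subset-decoding step described above, the following sketch (our own, with hypothetical function names, not from the paper) finds the closest constellation point in each coset of the one-dimensional four-way partition Z/4Z; a VA decoder would then use the resulting squared distances as branch metrics.

```python
def closest_in_coset(r, j, q=4):
    """Closest point of the coset j + qZ to the received value r,
    together with its squared-distance (branch) metric."""
    p = q * round((r - j) / q) + j
    return p, (r - p) ** 2

def subset_decode(r, q=4):
    """Subset decoding: one candidate point and metric per coset of qZ in Z."""
    return [closest_in_coset(r, j, q) for j in range(q)]
```

For a received value r = 2.3, for example, the candidate in the coset 2 + 4Z is the point 2 at squared distance about 0.09; the VA decoder selects among code sequences using these per-subset metrics.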

Fig. 6. Effective coding gain versus complexity for Ungerboeck and Wei codes; V.32 and V.34 codes circled.

complexity, measured by a detailed operation count. The codes considered are as follows:
1) The original 1D (PAM) trellis codes of Ungerboeck [104], based on rate-1/2 convolutional codes and the four-way partition Z/4Z.
2) The 2D (QAM) trellis codes of Ungerboeck [104], based on rate-2/3 convolutional codes (except for the simplest code) and an eight-way partition of Z².
3) The 4D trellis codes of Wei [113], based on
a) convolutional codes of two different state complexities and an eight-way partition of Z⁴;
b) a convolutional code and a 16-way partition of Z⁴;
c) a convolutional code and a 32-way partition of Z⁴;
4) Two families of 8D trellis codes of Wei [113].

The V.32 modem (1984) uses an eight-state 2D trellis code, also due to Wei [112]. The performance/complexity tradeoff is the same as that of the original eight-state 2D Ungerboeck code. However, the Wei code uses a nonlinear convolutional encoder to achieve 90° rotational invariance. This code has an effective coding gain of about 3.6 dB and a coding constellation expansion ratio of 2.

The V.34 modem (1994) specifies three 4D trellis codes, with performance and complexity equivalent to the 4D Wei codes circled on Fig. 6 [47]. All have a coding constellation expansion ratio of √2. The 16-state code, which is the only code implemented by most manufacturers, is the original 16-state 4D Wei code, which has an effective coding gain of about 4.2 dB and the lowest branch complexity per four dimensions of the three. The 32-state code is due to Williams [117] and is based on a 16-way partition of Z⁴ into 16 subsets congruent to a sublattice generated by a Hadamard matrix, to ensure that there are no minimum-distance error events whose length is only two dimensions; it has an effective coding gain of about 4.5 dB. The 64-state code is a modification of the original 4D Wei code, modified to prevent quasicatastrophic error propagation; it has an effective coding gain of about 4.7 dB.

It is noteworthy that no one has improved on the performance/complexity tradeoff of the original 1D and 2D trellis codes of Ungerboeck or the subsequent multidimensional codes of Wei. By this time it seems safe to predict that no one ever will. There have, however, been new trellis codes that feature other properties and have about the same performance and complexity, as described in the previous two paragraphs, and there may still be room for further improvements of this kind.

Finally, we see that trellis codes have a performance/complexity advantage over lattice codes, when used with maximum-likelihood decoding. Effective coding gains of 4.2–4.7 dB, better than that of the Leech lattice or of BW32, are attainable with less complexity and much less constellation expansion. With the most complex 1D or 2D trellis codes, effective coding gains of the order of 5.5 dB can be achieved. These gains are larger than the gains that can be obtained with lattice codes of far greater complexity.

On the other hand, it seems very difficult to obtain effective coding gains approaching 6 dB. This is not surprising, because the effective coding gain at the Shannon limit is about 7.5 dB, and at the cutoff rate limit it is about 5.8 dB. To approach the Shannon limit, codes and decoding methods of higher complexity are necessary.

So far in this section we have only discussed trellis codes for lattice-type signals. However, the principle of set partitioning and the general encoder structure of Fig. 5 are also valid for trellis codes with signals from constellations which are not of lattice type, such as 8-PSK and 16-PSK constellations. Code tables for 8-PSK and 16-PSK trellis codes are given in [104] and [105]. A number of multidimensional linear PSK-type trellis codes are described in [114], all fully rotationally invariant. In [87], principles of set partitioning of multidimensional M-PSK constellations are given, based on concepts of multilevel block coding; tables of linear M-PSK trellis codes are presented, with the largest degree of rotational invariance given for each code; and figures similar to Fig. 6 that summarize effective coding gains versus decoding complexity are shown.

G. Sequential Decoding in the High-SNR Regime

In the high-SNR regime, the cutoff rate is a factor of 4/e (1.68 dB) away from capacity [95]. Therefore, sequential decoders should be able to achieve an effective coding gain of about 5.8 dB. Experiments have confirmed that sequential decoders can indeed achieve such performance [111].

H. Multilevel Codes and Multistage Decoding

To approach the Shannon limit even more closely, it is clear that much more powerful codes must be used, together with decoding methods that are simpler than optimal maximum-likelihood (ML) decoding, but with near-ML performance. Multilevel codes and multistage decoding may be used for this purpose [60], [75]. Multilevel coding may be based on a chain of sublattices

Λ = Λ_0 ⊇ Λ_1 ⊇ ... ⊇ Λ_m    (4.18)

which lead to a chain of lattice partitions Λ_(j−1)/Λ_j, 1 ≤ j ≤ m. A different trellis encoder as in Fig. 5 may be used independently on each such lattice partition.

Remarkably, it follows from the chain rule of mutual information that the capacity that can be obtained by coding over the lattice partition Λ/Λ_m is equal to the sum of the capacities that can be obtained by independent coding and decoding at each level [46], [58], [67], [110]. With multistage decoding, decoding is performed separately at each level; at the jth level the decisions at the lower levels are taken into account, whereas no coding is assumed for the higher levels. If the partition Λ/Λ_m is "large enough" and appropriately scaled, then its capacity approaches the capacity of the ideal AWGN channel.

The lattices may be the one- or two-dimensional integer lattices Z or Z², and the standard binary partition chains

Z ⊇ 2Z ⊇ 4Z ⊇ ...   or   Z² ⊇ RZ² ⊇ 2Z² ⊇ ...    (4.19)

may be used. Then a powerful binary code with rate close to the level capacity can be used at each level to approach the Shannon limit. In particular, by using turbo codes of appropriate rate at each level, it has been shown that reliable transmission can be achieved within 1 dB of the Shannon limit [109].

Powerful probabilistic coding methods such as turbo codes are really needed only at the lower levels. At the higher levels, the channels become quite clean and the capacity approaches one bit per label bit, so that the desired redundancy approaches zero. For these levels, algebraic codes and decoding methods may be more appropriate [46], [110].

In summary, multilevel codes and multistage decoding allow the Shannon limit to be approached as closely in the high-SNR regime as it can be approached in the low-SNR regime with binary codes. The state of the art in multilevel coding is reviewed in [110].

I. Other Capacity-Approaching Nonbinary Coded-Modulation Schemes

Various capacity-approaching alternative schemes to multilevel coding have been proposed.

One scheme, called bit-interleaved coded modulation (BICM) [15], has been developed primarily for fading channels, but has turned out to be capable of approaching the capacity of high-SNR AWGN channels as well. A BICM transmitter comprises an encoder for a binary code C, a "bit interleaver," and a signal mapper. The output sequence of the bit interleaver is segmented into b-bit blocks, which are mapped into a 2^b-point signal constellation using Gray coding. Although BICM cannot operate arbitrarily closely to capacity, good performance close to capacity can be obtained if C is a long parallel or serially concatenated turbo code, and the decoder is an iterative turbo decoder.

Other schemes that are more straightforward extensions of standard binary turbo codes are called turbo trellis-coded modulation (TTCM) [89] and parallel concatenated trellis-coded modulation (PCTCM) [8]. Instead of two binary systematic convolutional encoders, two rate-k/(k+1) trellis encoders are used. The interleaver is arranged so that the two encoded output bits of the two encoders are aligned with the same information bits. To avoid excessive rate loss, a special puncturing technique is used in TTCM: only one of the two coded bits is transmitted with each symbol, and the two encoder outputs are chosen alternately.

V. CODING AND EQUALIZATION FOR LINEAR GAUSSIAN CHANNELS

In this section we discuss coding and equalization methods for a general linear Gaussian channel. We first present the "water-pouring" solution for the capacity-achieving transmit spectrum. Then we describe the multicarrier approach to approaching capacity. The remaining parts of the section are devoted to serial transmission. We show how the linear Gaussian channel may be converted, without loss of optimality, to a discrete-time AWGN channel with intersymbol interference. We then give a pictorial illustration of the coding principles required to approach capacity. Finally, we review practical
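The chain-rule decomposition underlying multilevel coding (Section IV-H) can be checked numerically. The sketch below is our own illustration, with an assumed 4-PAM constellation, binary partition labeling, and noise level: it computes the mutual information of a uniform 4-PAM input on an AWGN channel and the two per-level mutual informations of the binary partition chain, and verifies that the level capacities sum to the aggregate mutual information.

```python
import numpy as np

sigma = 0.8                                    # assumed noise standard deviation
xs = np.array([-3.0, -1.0, 1.0, 3.0])          # 4-PAM points, spacing 2
subsets = [np.array([-3.0, 1.0]),              # first-level subsets: cosets of a
           np.array([-1.0, 3.0])]              # sublattice of index 2, spacing 4

y = np.linspace(-12.0, 12.0, 8001)
dy = y[1] - y[0]

def gauss(y, x):
    return np.exp(-(y - x)**2 / (2*sigma**2)) / np.sqrt(2*np.pi*sigma**2)

def entropy(p):                                # differential entropy in bits
    return -(p * np.log2(p + 1e-300)).sum() * dy

p_y = np.mean([gauss(y, x) for x in xs], axis=0)
H_y = entropy(p_y)
H_y_given_b0 = np.mean([entropy(np.mean([gauss(y, x) for x in S], axis=0))
                        for S in subsets])
H_y_given_x = 0.5 * np.log2(2*np.pi*np.e*sigma**2)

I_total  = H_y - H_y_given_x                   # I(X;Y)
I_level0 = H_y - H_y_given_b0                  # I(B0;Y): lowest-level code rate
I_level1 = H_y_given_b0 - H_y_given_x          # I(B1;Y|B0): higher-level rate
```

The identity I(X;Y) = I(B0;Y) + I(B1;Y|B0) holds exactly; the per-level values show that the higher level is nearly clean (close to one bit), so that almost all the coding redundancy is needed at the lowest level, as stated in the text.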

precoding methods that have been developed to implement these principles.

A. Water Pouring

The general linear Gaussian channel was characterized in Section II as a real channel with input signal x(t), impulse response h(t), and additive Gaussian noise with one-sided p.s.d. N(f). The channel output signal is y(t) = x(t)*h(t) + n(t) (see Fig. 8 below). In addition, the transmit signal has to satisfy a power constraint of the form

∫ S_x(f) df ≤ P    (5.1)

where S_x(f) is the one-sided p.s.d. of x(t). The channel SNR function for this channel is defined as

SNR(f) = |H(f)|² / N(f)    (5.2)

Intuitively, the preferred transmission band is where SNR(f) is largest.

The optimum S_x(f) maximizes the mutual information between channel input and output subject to the power constraint (5.1) and S_x(f) ≥ 0. A Lagrange multiplier argument shows that the largest mutual information is achieved by the "water-pouring" solution [94], illustrated in Fig. 7 for a typical SNR(f):

S_x(f) = K − 1/SNR(f) for f ∈ B*;   S_x(f) = 0 elsewhere    (5.3)

where B* is called the capacity-achieving band, and K is a constant chosen such that (5.1) is satisfied with equality. Capacity is achieved if x(t) is a Gaussian process with p.s.d. S_x(f), and in bits per second is equal to

C = ∫_{B*} log₂(1 + S_x(f) SNR(f)) df    (5.4)

Fig. 7. Optimization of transmit p.s.d. S_x(f) by "water pouring."

In analogy to an ideal AWGN channel, a linear Gaussian channel may be roughly characterized by two parameters, its one-sided bandwidth W* and its effective signal-to-noise ratio SNR_eff, defined implicitly such that

C = W* log₂(1 + SNR_eff)    (5.5)

Comparison with (5.4) yields

1 + SNR_eff = exp{ (1/W*) ∫_{B*} ln(1 + S_x(f) SNR(f)) df }    (5.6)

In other words, 1 + SNR_eff is the geometric mean of 1 + S_x(f) SNR(f) over B*. In further analogy to an ideal AWGN channel, it is then also appropriate to define the normalized signal-to-noise ratio for a system operating at rate ρ (in b/dim) on a general linear Gaussian channel as

SNR_norm = SNR_eff / (2^(2ρ) − 1)    (5.7)

where SNR_norm again measures the "gap to capacity."

The capacity-achieving band B* is the most important feature of the water-pouring spectrum. For many practical channels, B* will be a single frequency interval, as illustrated in Fig. 7. However, in other cases, e.g., in the presence of severe narrow-band interference, B* may consist of multiple frequency intervals. The dependency of mutual information on the exact choice of S_x(f) is not as critical as its dependency on B*. Mutual information achieved with a flat spectrum over B* is often nearly equal to capacity [64], [90].

B. Approaching Capacity with Multicarrier Transmission

One way of approaching capacity is suggested directly by the water-pouring argument. The capacity-achieving band B* may be divided into disjoint subbands of small enough width Δf that SNR(f) and thus S_x(f) are nearly constant over each subband. Each subband can then be treated as an ideal AWGN channel with bandwidth Δf. Multicarrier transmission methods that can be used for this purpose are discussed in Appendix I.

The transmit power allocated to a subband centered at frequency f_i should be approximately S_x(f_i) Δf. Then the signal-to-noise ratio in that subband will be approximately S_x(f_i) SNR(f_i), and the subband capacity will be approximately

C_i = Δf log₂(1 + S_x(f_i) SNR(f_i))    (5.8)

As Δf is made sufficiently small, the aggregate power in all subbands approaches P and the aggregate capacity approaches C as given in (5.4). To approach this capacity, powerful coding in each subband at a rate near the subband capacity is needed. To minimize delay, coding should be applied "across subbands" with variable numbers of bits per symbol in each subband [14], [90].

Multicarrier transmission is inherently well suited for channels with a highly frequency-dependent channel SNR function, particularly for channels whose capacity-achieving band consists of multiple intervals. The symbol length in each subchannel is of the order of 1/Δf, so that symbols become long, generally much longer than the response of the channel. This makes multicarrier transmission less sensitive to moderate impulsive noise than serial transmission, and simplifies equalization. However, the resulting delay is undesirable for some applications.

As the transmitted signal is the sum of many independent components, its distribution tends to be Gaussian. The resulting large peak-to-average ratio (PAR) is a potential disadvantage of multicarrier transmission. However, recently several methods of controlling the PAR of multicarrier signals have been developed [81], [82].

C. Approaching Capacity with Serial Transmission

Alternatively, the capacity of a linear Gaussian channel may be approached by serial transmission. In this subsection, we
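The water level K of (5.3) and the subband capacities (5.8) can be computed numerically. The sketch below is our own illustration (function names hypothetical): each subband i has an "inverse SNR" floor g_i = 1/SNR(f_i), receives power max(K − g_i, 0), and contributes Δf·log₂(1 + power/g_i) to the aggregate capacity; K is found by bisection on the total-power constraint.

```python
import numpy as np

def water_pour(g, total_power, tol=1e-12):
    """g[i] = 1/SNR(f_i): noise-to-channel-gain floor in subband i.
    Returns the water level K and the per-subband transmit powers."""
    lo, hi = 0.0, g.max() + total_power
    while hi - lo > tol:
        K = 0.5 * (lo + hi)
        if np.maximum(K - g, 0.0).sum() > total_power:
            hi = K          # water level too high: power budget exceeded
        else:
            lo = K
    powers = np.maximum(K - g, 0.0)
    return K, powers

def aggregate_capacity(g, powers, df=1.0):
    """Sum of subband capacities (5.8), counting only active subbands."""
    used = powers > 0
    return df * np.log2(1.0 + powers[used] / g[used]).sum()
```

For example, with two subbands g = (1, 3) and total power 4, the water level is K = 4, the powers are (3, 1), and the aggregate capacity is log₂4 + log₂(4/3) bits per unit bandwidth.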

Fig. 8. Conversion of a continuous-time linear Gaussian channel to an equivalent canonical discrete-time linear Gaussian channel.

first examine the conversion of the waveform channel to an equivalent discrete-time channel with intersymbol interference. This is achieved by using a transmit filter that shapes the transmit spectrum to the optimal water-pouring form, and employing a sampled matched filter (MF) or whitened matched filter (WMF) in the receiver. With the WMF, a canonical discrete-time channel with trailing intersymbol interference (ISI) and i.i.d. Gaussian noise is obtained, as depicted in Fig. 8. We show that the contribution of ISI to channel capacity vanishes at high SNR. This suggests that combining an ISI-canceling equalization technique with ideal-channel coding and shaping will suffice to approach capacity. In the following two subsections, we will then discuss the implementation of this approach.

We assume that the capacity-achieving band B* is a single positive-frequency interval of width W* = 1/T. If B* consists of several frequency intervals, then the same approach may be used over each interval separately, although the practical attractiveness of this approach diminishes with multiple intervals. We concentrate on minimum-bandwidth passband transmission of complex symbols at modulation rate 1/T. The transmission of complex signals over a real channel has been discussed in Section II.

In Fig. 8, the transmitted real signal is

x(t) = Re{ Σ_k x_k p(t − kT) }    (5.9)

where {x_k} ideally should be a sequence of i.i.d. complex Gaussian symbols with variance (average energy) E_x per symbol, and p(t) is a positive-frequency transmit symbol response with Fourier transform P(f). The transmit filter is chosen so that the p.s.d. of x(t) is the water-pouring p.s.d. S_x(f); i.e.,

(E_x/T) |P(f)|² = S_x(f),  f ∈ B*    (5.10)

In the receiver, the received real signal is first passed through a filter which suppresses negative-frequency components and whitens the noise in the positive-frequency band. This filter may be called a noise-whitening Hilbert splitter, because in terms of one-dimensional signals it "splits" the positive-frequency component of a real signal into its real and imaginary parts. The transfer function of this filter is chosen such that the noise at its output is white at least over the band B*, and zero for negative frequencies. The resulting complex signal is then

y'(t) = Σ_k x_k q(t − kT) + n'(t)    (5.11)

where q(t) is a symbol response with Fourier transform Q(f), and n'(t) is normalized AWGN with unit p.s.d. over the signal band B*.

Because the set of responses {q(t − kT)} represents a basis for the signal space, by the principles of optimum detection theory [118] the set of T-sampled matched-filter outputs

z_k = ∫ y'(t) q*(t − kT) dt    (5.12)

of a matched filter (MF) with response q*(−t) is a set of sufficient statistics for the detection of the symbol sequence {x_k}. Thus no loss of mutual information or optimality occurs in the course of reducing y'(t) to the sequence {z_k}. The composite receive filter consisting of the noise-whitening filter and the MF has the transfer function

G(f) = G_w(f) Q*(f)    (5.13)

where G_w(f) denotes the noise-whitening (Hilbert splitter) response. The Fourier transform of the end-to-end symbol response,

R(f) = |Q(f)|²    (5.14)

has the properties of a real nonnegative power spectrum that is band-limited to B*. Moreover, R(f) is equal to the noise p.s.d. at the MF output, because the noise at the MF input is white with unit p.s.d. over B*:

1 · |Q(f)|² = R(f)    (5.15)

The sampled output sequence of the MF is given by

z_k = Σ_j r_(k−j) x_j + n_k    (5.16)

where the coefficients r_k are the sample values of the end-to-end symbol response r(t). In general, the discrete-time Fourier transform of the sequence {r_k} is the (1/T)-aliased spectrum

R_T(f) = (1/T) Σ_m R(f − m/T)    (5.17)

In our case, because R(f) is limited to a positive-frequency band of width 1/T, there is no aliasing; i.e., the (1/T)-periodic function R_T(f) coincides with R(f) within B*. Because the noise p.s.d. at the MF output equals R(f), {n_k} is a Gaussian noise sequence with autocorrelation sequence {r_k}. Note that {r_k} is Hermitian-symmetric (r_(−k) = r_k*), because R(f) is real.

So far, we have obtained without loss of optimality an equivalent discrete-time channel model which can be written in D-transform notation as

z(D) = x(D) r(D) + n(D)    (5.18)

where r(D) is Hermitian-symmetric. We now proceed to develop an alternative, equivalent discrete-time channel model with a causal, monic, minimum-phase ("canonical") response and white Gaussian noise. For this purpose, we appeal to the discrete-time spectral factorization theorem:

Spectral Factorization Theorem (Discrete-Time) [83]: Let {r_k} be an autocorrelation sequence with D-transform r(D), and assume its discrete-time Fourier transform R_T(f) satisfies the discrete-time Paley–Wiener condition

T ∫ |log R_T(f)| df < ∞

where ∫ denotes integration over any interval of width 1/T. Then r(D) can be factored as follows:

r(D) = A² h(D) h*(D^(−1))    (5.19)

where h(D) is causal (h_k = 0 for k < 0), monic (h_0 = 1), and minimum-phase, and h*(D^(−1)) is the D-transform of the time-reversed conjugate response. The factor A² is the geometric mean of R_T(f) over a band of width 1/T; i.e.,

log A² = T ∫ log R_T(f) df    (5.20)

where the logarithms may have any common base.

The discrete-time Paley–Wiener criterion implies that R_T(f) can have only a discrete set of algebraic zeroes. Spectral factorization is further explained in Appendix III. A causal, monic, minimum-phase response is called canonical. The spectral factorization above is unique under the constraint that h(D) be canonical.

If R_T(f) satisfies the Paley–Wiener condition, then (5.18) can be written in the form

z(D) = x(D) A² h(D) h*(D^(−1)) + n'(D) A h*(D^(−1))    (5.21)

in which n'(D) is an i.i.d. Gaussian noise sequence with unit symbol variance. Filtering by the inverse of A h*(D^(−1)) yields the channel model of the equivalent canonical discrete-time Gaussian channel

w(D) = A x(D) h(D) + n'(D)    (5.22)

where n'(D) is an i.i.d. Gaussian noise sequence. This requires that A h*(D^(−1)) has a stable anti-causal inverse, which is true if R_T(f) has no spectral zeroes.

However, the invertibility of A h*(D^(−1)) is not a serious issue because w(D) can be obtained directly as the sequence of sampled outputs of a whitened matched filter (WMF) [41]. The transfer function of the composite receive filter consisting of the noise-whitening filter and the WMF is given by

G'(f) = G(f) / [A h*(e^(−j2πfT))]    (5.23)

The only condition for the stability of this filter is the Paley–Wiener criterion. It is shown in [41] that the time response of this filter is always well defined.

(The channel model (5.22) expresses the output sequence w(D) as the sum of a noise-free sequence A x(D) h(D) and additive white Gaussian noise. If x(D) is an uncoded sequence of symbols from a finite signal constellation, and h(D) is of finite length, then the noise-free received signal A x(D) h(D) is the output of a finite-state machine, and the sequence x(D) may be optimally detected by maximum-likelihood sequence detection (MLSD)—i.e., by the Viterbi algorithm [41]. Alternatively, as shown in [103], MLSD may be performed directly on the MF output sequence z(D), using a trellis of the same complexity. However, the main point of the development of this section is that MLSD is not necessary to approach capacity; powerful coding of x(D) plus tail-canceling equalization suffices!)

From the sequence w(D) of sampled WMF outputs the sequence z(D) of sampled MF outputs can always be obtained by filtering with the stable filter response A h*(D^(−1)). Because z(D) is a set of sufficient statistics for the estimation of x(D), w(D) must equally be a set of sufficient statistics. The mutual information between the channel input and the MF or WMF output sequence is therefore equal to the capacity C as given in (5.4). Because

1 + SNR_eff = exp{ T ∫_{B*} ln(1 + S_x(f) SNR(f)) df }    (5.24)

and W* = 1/T, the capacity and its high-SNR approximation can be written as

C = (1/T) log₂(1 + SNR_eff) ≈ (1/T) log₂ SNR_eff    (5.25)

We now show that at high SNR the capacity of the linear Gaussian channel is approximately equal to the capacity of the ideal ISI-free channel that would be obtained if somehow ISI could be eliminated from the sequence w(D). In other words, ISI does not contribute significantly to capacity. This observation was first made by Price [85].

Suppose that somehow the ISI-causing "tail" of the response h(D) could be eliminated, so that in the receiver w'(D) = A x(D) + n'(D) rather than w(D) could be observed. The signal-to-noise ratio of the resulting ideal AWGN channel would be SNR_ISI-free = A² E_x. Thus the capacity of the ISI-free channel, and its high-SNR approximation, would be

C_ISI-free = (1/T) log₂(1 + A² E_x) ≈ (1/T) log₂(A² E_x)    (5.26)
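The spectral factorization (5.19)–(5.20) can be computed numerically by the standard cepstral method. The sketch below is our own illustration, not from the paper: take the log of the folded spectrum, split its Fourier coefficients into causal and anticausal halves, and exponentiate the causal half to obtain the canonical (monic, minimum-phase) factor and the geometric mean A².

```python
import numpy as np

def canonical_factor(r_auto, N=4096):
    """r_auto = [r_0, r_1, ..., r_K]: one side of a real autocorrelation
    sequence. Returns A^2 and the canonical response h with h[0] = 1."""
    K = len(r_auto) - 1
    w = 2 * np.pi * np.arange(N) / N
    # folded spectrum R(w) >= 0 from the Hermitian autocorrelation
    R = r_auto[0] + sum(2 * r_auto[k] * np.cos(k * w) for k in range(1, K + 1))
    c = np.fft.ifft(np.log(R))              # cepstrum of log R (symmetric)
    A2 = np.exp(np.real(c[0]))              # geometric mean of R, cf. (5.20)
    c_causal = np.zeros(N, dtype=complex)
    c_causal[1:N // 2] = c[1:N // 2]        # strictly causal cepstral part
    h = np.real(np.fft.ifft(np.exp(np.fft.fft(c_causal))))
    return A2, h[:K + 1]
```

For example, for the autocorrelation [1.25, 0.5] of the response h(D) = 1 + 0.5D, the routine returns A² ≈ 1 and h ≈ [1, 0.5], i.e., it recovers the monic minimum-phase factor.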

Fig. 9. Coding objectives for a linear Gaussian channel with a known channel response r.

Price observed that at high SNR C ≈ C_ISI-free; i.e.,

(1/T) log₂ SNR_eff ≈ (1/T) log₂(A² E_x)    (5.27)

where the equality in (5.27) follows from (5.20).

An alternative "minimum-mean-squared-error" (MMSE) form of this result has been obtained in [22] and [23]. This result shows that if both ISI and bias are eliminated in an MMSE-optimized receiver, then the resulting signal-to-noise ratio is equal to SNR_eff for any linear Gaussian channel at any signal-to-noise ratio. If the interference in the MMSE-optimized unbiased receiver were Gaussian and uncorrelated with the signal, then this would imply C = C_ISI-free with equality at all SNR's; however, in general, this relation is again only approximate.

Price's result and the alternative MMSE result suggest strongly that the combination of an ISI-canceling equalization method and powerful ideal-channel coding and shaping will suffice to approach channel capacity on any linear Gaussian channel, particularly in the high-SNR regime.

D. Coding Objectives for Linear ISI-AWGN Channels

An intuitive explanation of the coding objectives for linear Gaussian channels is presented in Fig. 9. Signal sequences are shown here as finite-dimensional vectors. The convolution that determines the noise-free output sequence becomes a vector transformation y = Hx by a channel matrix H. Because the canonical response h(D) is causal and monic, the matrix H is triangular with an all-ones diagonal, so it has unit determinant: det H = 1. It follows that the linear transformation from x to y is volume-preserving [23].

To achieve shaping gain (minimize transmit power for a given volume), at the channel input the constellation points x should be uniformly distributed within a hypersphere. The channel matrix H transforms the hypersphere into a hyperellipse of equal volume containing an equal number of constellation points y.

To achieve coding gain, the noise-free channel output vectors y should be points in an infinite signal set with good distance properties (figuratively shown as an integer lattice). If the transmitter knows the channel matrix H, it can predistort the input vectors x to be points in an appropriately predistorted signal set (figuratively shown as a skewed integer lattice). The volumes per point in the two signal sets are equal.

The noise-free output vectors y are observed in the presence of i.i.d. Gaussian noise n. A minimum-distance detector for the infinite signal set can therefore obtain the coding gain of that signal set on an ideal AWGN channel.

In summary, coding and shaping take place in two different Euclidean spaces, which are connected by a known volume-preserving linear transformation. Coding and shaping may be separately optimized, by choosing an infinite signal set with a large coding gain (density) for an ideal AWGN channel in the coding space, and by choosing a shaping scheme with a large shaping gain (small normalized second moment) in the shaping space. The channel intersymbol interference may be eliminated at the channel output by predistortion of the infinite signal set at the input, based on the known channel response.

E. Precoding and Trellis Coding for Linear Gaussian Channels

Precoding is a pre-equalization method that achieves the coding objectives set out in Section V-D, assuming that the canonical channel response h(D) is known at the transmitter. A pre-equalized sequence x(D) is transmitted such that
a) the output sequence y(D) = x(D) h(D) is the output of an apparently ISI-free ideal channel;
b) the noise-free output sequence y(D) may therefore be a coded sequence from a code designed for the ideal AWGN channel, and may be decoded by an ideal-channel decoder;
c) redundancy in y(D) may be used to minimize the average power of the input sequence x(D), or to achieve other desirable characteristics for x(D).

Precoding was first developed for uncoded M-PAM transmission in two independent masters' theses by Tomlinson [97] and Harashima [54]. Its application was not pursued at that time because decision-feedback equalization was, and still is, the preferred ISI-canceling method for uncoded transmission.
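The volume-preserving property of Section V-D is easy to verify numerically: with a causal monic response, the convolution matrix is lower-triangular with a unit diagonal, so its determinant is exactly 1. A small sketch (our own illustration, with an assumed example response):

```python
import numpy as np

h = np.array([1.0, 0.9, 0.3])      # an assumed canonical (causal, monic) response
n = 5
H = np.zeros((n, n))
for i in range(n):
    for j in range(max(0, i - len(h) + 1), i + 1):
        H[i, j] = h[i - j]         # banded Toeplitz convolution matrix
# lower-triangular with an all-ones diagonal => det H = 1 (volume-preserving)
det = np.linalg.det(H)
```

Any hypersphere of input vectors x is thus mapped to an output body y = Hx of exactly the same volume, which is why shaping gain and coding gain can be pursued independently on the two sides of the channel.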

Fig. 10. Tomlinson–Harashima precoding for one-dimensional M-PAM transmission, with or without trellis coding.

With trellis coding, however, decision-feedback equalization is no longer attractive, because reliable decisions are not available from a Viterbi decoder without significant delay. As we shall see, Tomlinson–Harashima precoding can be combined with trellis coding, but not with shaping.

Trellis precoding [35] was the first technique that allowed combining trellis coding, precoding, and shaping. However, in trellis precoding, coding and shaping are coupled, which to some degree inhibits control of constellation characteristics such as peak-to-average ratio (PAR). Therefore, during the development of the V.34 modem standard, techniques called "flexible precoding" were proposed independently by Eyuboglu [80], Cole and Goldstein [52], and Laroia et al. [73]. In flexible precoding, coding and shaping are decoupled. Later, Laroia [71] proposed an "ISI precoding" technique that used feedback trellis coding (see Section IV-D) to reduce "dither power." Further improvements were made by Cole and Eyuboglu [53] and by Betts [4], and the [53] scheme was ultimately adopted for the V.34 standard [62]. Recently, feedback trellis coding has been combined with Tomlinson–Harashima precoding [20].

We will describe Tomlinson–Harashima precoding and flexible precoding, with and without trellis coding. The general principle applied in these schemes is as follows. Let the canonical channel response be h(D), written as h(D) = 1 + h'(D), where h'(D) is strictly causal. Then the noise-free output sequence of the channel is given by

y(D) = x(D) h(D) = x(D) + p(D),   p(D) = x(D) h'(D)    (5.28)

where p(D) is the sequence of trailing intersymbol interference. In all types of precoding, the precoder determines p_k from past signals and sends x_k = y_k − p_k, so that the channel output will be a desired signal y_k.

Tomlinson–Harashima (TH) precoding is illustrated in Fig. 10 for one-dimensional transmission with M-PAM signal constellations as defined in Section II-C. In the language of lattice constellations introduced in Section IV-A, an M-PAM signal constellation with a signal spacing of 2 is defined as

C(2Z + a, R_V(2MZ))

where a = 1 for M even and a = 0 for M odd, and R_V(2MZ) = [−M, M) is the fundamental Voronoi region of the sublattice 2MZ. The translates of R_V(2MZ) by integer multiples of 2M then cover ("tile") all of real signal space, so every real signal y may be expressed uniquely as y = a + 2Mm for a in [−M, M) and some integer m. The unique a so determined is called "y mod 2M."

Fig. 10 is drawn such that one can easily see that the noiseless channel-output sequence is y(D) = a(D) + 2M b(D), where a(D) is the sequence of M-PAM signals to be transmitted, and b(D) is an integer sequence generated by the "mod 2M" unit in the precoder so that the elements of the transmitted sequence x(D) are contained in [−M, M), and the elements of y(D) are in a(D) + 2MZ. The decoder operates on the noisy sequence y(D) + n(D) and outputs an estimate of y(D), which is then reduced by the "mod 2M" unit to an estimate of a(D). In the absence of decoding errors, the estimates of y(D) and therefore of a(D) are correct. Note that "inverse precoding" is memoryless, so no error propagation can occur. Because the channel response h(D) does not need to be inverted in the receiver, h(D) may exhibit spectral nulls, e.g., contain factors of the form 1 − D.

The combination of trellis coding with TH precoding requires that y(D) = a(D) + 2M b(D) is a valid code sequence. For practically all trellis codes, when c(D) is a multiple of 2M and a(D) is a code sequence, then a(D) + c(D) is also a code sequence (in particular, this holds for "mod-2" or "mod-4" coset codes [42], [43]). Thus trellis coding and Tomlinson–Harashima precoding are easily combined.

With a more complicated constellation, trellis coding can be combined with TH precoding by using the idea of feedback trellis coding [20], provided that a) the first-level subsets of the constellation are congruent; b) the constellation shape region, namely, the union of the Voronoi regions of the signals in the constellation, is a space-filling region (has the "tiling" property). Condition b) obviously holds for M-PAM and square QAM constellations. It also holds for a 12-QAM "cross" constellation, for example, but not for a 32-QAM cross constellation.

We note that signal constellations whose sizes are not powers of two have become practically important in connection with mapping techniques such as shell mapping and modulus conversion (see Section IV-B).

In summary, Tomlinson–Harashima precoding permits the combination of trellis coding with ISI-canceling (DFE-equivalent) equalization. Its main problem is the requirement that the signal space can be "tiled" with translates of the
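A minimal noiseless simulation of TH precoding for M-PAM with spacing 2 (our own sketch; variable names hypothetical): the precoder reduces each pre-equalized sample into [−M, M) by adding a multiple of 2M, and the receiver's memoryless mod-2M unit recovers the data from the channel output.

```python
import numpy as np

def th_precode(a, h, M):
    """a: M-PAM data in {±1, ±3, ..., ±(M−1)}; h: canonical response, h[0] = 1."""
    x = np.zeros(len(a))
    for k in range(len(a)):
        isi = sum(h[j] * x[k - j] for j in range(1, min(len(h), k + 1)))
        x[k] = np.mod(a[k] - isi + M, 2 * M) - M   # reduce into [-M, M)
    return x

M = 4
h = np.array([1.0, 0.9, 0.3])                      # assumed canonical response
a = np.array([-3.0, 1.0, 3.0, -1.0, 1.0, 3.0])     # data symbols
x = th_precode(a, h, M)
y = np.convolve(x, h)[:len(a)]                     # noiseless channel output
a_hat = np.mod(y + M, 2 * M) - M                   # memoryless inverse precoder
```

Each output y_k equals a_k plus a multiple of 2M, so the mod unit recovers a_k exactly, while the transmitted samples stay bounded in [−M, M); no channel inversion is needed, so no error propagation can occur.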

Fig. 11. Flexible precoding for one-dimensional w-PAM transmission, with or without trellis coding. constellation shape region. Tomlinson–Harashima precoding The main advantage of flexible precoding is that any results in a small increase in transmit signal power, equal to constellation-shaping method can be used, whereas with the average energy of a random “dither variable” uniformly Tomlinson–Harashima precoding or trellis precoding, shaping distributed over the Voronoi region around the constellation is connected with coding. The main disadvantage is that signals. If the Voronoi region is , then the increase decoding errors tend to propagate due to the channel inversion in signal power is , which is negligible for large in the receiver, which is particularly troublesome on channels constellations. with spectral nulls. A technique to mitigate error propagation Flexible (FL) precoding is illustrated in Fig. 11, again for in the inverse precoder is described in [37]. The main one-dimensional -PAM transmission. The general concept application of flexible precoding so far has been in V.34 of FL precoding is to subtract from a sequence of - . The application of precoding in digital subscriber PAM signals a sequence of smallest “dither” variables lines has been studied in [38]. such that the channel-output sequence is a valid uncoded In conclusion, at high SNR’s, precoding in combination with or coded sequence with elements in . The constellation powerful trellis codes and shaping schemes allows the capacity shape region of the signal constellation can be arbitrary, of an arbitrary linear Gaussian channel to be approached as so any shaping scheme can be used. At time , given the closely as capacity can be approached on an ideal ISI-free ISI term , the “ ” unit in the precoder determines the AWGN channel, with about the same coding and shaping integer that is closest to . The dither variable is complexity. the difference . Clearly, lies in the Voronoi interval . 
The dither variable will typically be uniformly distributed over its Voronoi interval, and thus increases the average energy of the transmit signal only slightly. As can be seen from Fig. 11, the sequence of noiseless channel-output signals becomes a valid uncoded or coded sequence. The decoder operates on the noisy received sequence and produces an estimate of this noiseless channel-output sequence. To obtain the data sequence, the transmitted sequence is recovered by channel inversion as shown in Fig. 11. A "bit-identical" realization of the two functional blocks shown in dashed boxes in Fig. 11 is absolutely essential.

The combination of flexible precoding with feedback trellis coding works as follows. At time k, the transmitter knows the selected integer and the current trellis-code state. To continue a valid code sequence at the noiseless channel output, the transmitter selects a data symbol such that the resulting channel-output signal lies in the first-level infinite subset determined by the code. The symbol is chosen to lie in a desired constellation shape region, or according to some other shaping scheme.

VI. FINAL REMARKS

Shannon's papers established ultimate limits, but gave no constructive methods to achieve them. They thereby threw down the gauntlet for the new field of coding. In particular, they established fundamental benchmarks for linear Gaussian channels that have posed tough challenges for subsequent code inventors.

The discouraging initial progress in developing good constructive codes was captured by the folk theorem [119]: "All codes are good, except those that we know of." However, by the early 1970's, good practical codes had been developed for the low-SNR regime:

• moderate-constraint-length binary convolutional codes with Viterbi decoding for 3–6-dB coding gain with moderate complexity;
• long-constraint-length convolutional codes with sequential decoding to reach the cutoff rate R0;
• concatenation of RS outer codes with algebraic decoding and moderate-complexity inner codes to achieve very low error rates at rates near R0.
(In the flexible-precoding arrangement described above, the selected data symbol also determines the next trellis-code state, as explained in Section IV-D.)

By the 1980's, the invention of trellis codes enabled similar progress to be made in the high-SNR regime, at least for the ideal band-limited AWGN channel. In the 1990's, these gains were extended to general linear Gaussian channels by use of multicarrier modulation, or alternatively by use of single-carrier modulation with transmitter precoding.

Notably, many of these techniques were developed by practically oriented engineers in standards-setting groups and in industrial R&D environments. Moreover, as a consequence of advances in VLSI design and technology, the lag between invention and appearance in commercial products has often been very short.

By now, it certainly appears that the field of modulation and coding for Gaussian channels has reached a mature state, at least below the cutoff rate R0, even though very few systems reaching R0 have ever actually been implemented. In the beyond-R0 regime, however, the invention of turbo codes has touched off intense activity that seems likely to continue well into the next half-century.

APPENDIX I
MULTICARRIER MODULATION

DFT-based multicarrier systems subdivide the channel into N narrowband subchannels whose center frequencies are spaced by integer multiples of the subchannel modulation rate. Let A_n[k] be the sequence of generally complex data symbols transmitted in the nth subchannel, and let A[k], the N-vector of these symbols, be transmitted over the subchannels at time k. A DFT-based "synthesis" filter bank generates the transmit signals

    s_l = sum_k sum_{n=0}^{N-1} A_n[k] e^{j2*pi*nl/N} h_{l-kN}        (AI.1)

where h_l is a real causal symbol response with a baseband spectrum H(f). This sampled response is sometimes called the "prototype response." The least integer q such that h_l = 0 for all l >= qN is called the "overlap factor." Note that the transmit sequence has a periodic spectrum and is composed of N subchannels whose center frequencies are located at integer multiples of the subchannel spacing, as illustrated in Fig. 12.
If we express the time index l as l = mN + i, with 0 <= i < N, then (AI.1) becomes

    s_{mN+i} = sum_k a_i[k] h^{(i)}_{m-k},    i = 0, 1, ..., N-1        (AI.2)

where a_i[k], the inverse discrete Fourier transform (IDFT) of A_n[k] (taken here without a normalizing factor), gives the time-domain symbols, and h^{(i)}_m = h_{mN+i}, for i = 0, 1, ..., N-1, is the N-sampled ith-phase component of the prototype response. Thus (AI.2) shows that the transmit signals may be generated by N independent convolutions of the time-domain symbol sequences with the corresponding phase components of the prototype response. Such a system is called a DFT-based polyphase filter bank [7], [106]. To obtain real transmit signals, the frequency-domain modulation symbols must exhibit Hermitian symmetry; i.e., A_{N-n}[k] must be the complex conjugate of A_n[k]. Then the IDFT coefficients are real and hence the signals are real.

The history of multicarrier modulation began more than 40 years ago with an early system called Kineplex [30] designed for digital transmission in the HF band. Work by Holsinger [57] and others followed. The use of the discrete Fourier transform (DFT) for modulation and demodulation was proposed in [115]. DFT-based multicarrier systems are now referred to as orthogonal frequency-division multiplexing (OFDM) or discrete multitone (DMT) systems [2], [14], [56], [90]. Basically, OFDM/DMT is a form of frequency-division multiplexing (FDM) in which modulation symbols are transmitted in individual subchannels using QAM or CAP modulation (see Section II-B). Current applications of OFDM/DMT include digital audio broadcasting (DAB) [33] and asymmetric digital subscriber lines (ADSL) [1].

Another multicarrier transmission technique of the FDM type became more recently known as discrete wavelet multitone (DWMT) modulation [91], [102]. DWMT modulation has its origin in filter banks for subband source coding. Efficient implementations of DWMT modulation and demodulation employ the discrete cosine transform (DCT).

A comprehensive treatment of multirate systems and filter banks is given in [106]. Recent reviews of theory and applications of filter banks and wavelet transforms are presented
in [99].

The coding aspects of multicarrier systems have been discussed in Section IV-B. In this appendix, we give brief descriptions of DMT and DWMT modulation (together with some background information). For the purposes of this appendix, we assume transmission over a noise-free discrete-time real linear channel modeled by convolution of the transmit sequence with the channel response.

A. DFT-Based Filter Banks and Discrete Multitone (DMT) Modulation

Before specifically addressing DMT modulation, we examine the general concept of DFT-based multicarrier modulation; these systems subdivide the channel into N narrowband subchannels whose center frequencies are spaced by integer multiples of the subchannel modulation rate.

The restriction to Hermitian symmetry allows for N real-signal degrees of freedom; i.e., it permits mappings of N real data symbols into the complex symbols A_n[k]. For N even, one possible mapping (AI.3) carries two real symbols in each complex symbol A_n[k] for 0 < n < N/2, and one real symbol in each of the real-valued symbols A_0[k] and A_{N/2}[k]. In the absence of distortion, the IDFT coefficients can be recovered from the received signals by a polyphase "analysis" filter bank, which performs the matched filter operations (AI.4).
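The real-output property of a Hermitian-symmetric IDFT is easy to check numerically. The following sketch uses an unnormalized IDFT, matching the convention of (AI.2); the symbol values are arbitrary assumptions chosen only for illustration.

```python
import cmath

def idft(A):
    # unnormalized inverse DFT: x_l = sum_n A_n * exp(+j*2*pi*n*l/N)
    N = len(A)
    return [sum(A[n] * cmath.exp(2j * cmath.pi * n * l / N) for n in range(N))
            for l in range(N)]

N = 8
A = [0j] * N
A[0], A[N // 2] = 1 + 0j, -1 + 0j            # "DC" and Nyquist bins carry real symbols
A[1], A[2], A[3] = 1 + 1j, -3 + 1j, 1 - 3j   # complex symbols on subchannels 1..N/2-1
for n in range(1, N // 2):                   # impose Hermitian symmetry A_{N-n} = conj(A_n)
    A[N - n] = A[n].conjugate()

x = idft(A)                                  # time-domain block; imaginary parts vanish
```

If the symmetry loop is removed, the imaginary parts of x are in general nonzero, which is exactly why the Hermitian restriction is needed for real transmit signals.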

Fig. 12. Spectrum of DFT-based multicarrier modulation (N even).
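The equivalence of direct synthesis and the polyphase form, i.e., of (AI.1) and (AI.2), can be verified numerically. The prototype response and symbol values below are arbitrary assumptions chosen only to exercise the identity.

```python
import cmath

N, q = 4, 2                                   # subchannels and overlap factor
h = [0.9 ** i for i in range(q * N)]          # toy real causal prototype response
A = [[complex(n + 1, k - 1) for n in range(N)] for k in range(3)]   # A[k][n]

def s_direct(l):
    # direct evaluation of the synthesis sum (AI.1)
    return sum(h[l - k * N] *
               sum(A[k][n] * cmath.exp(2j * cmath.pi * n * l / N) for n in range(N))
               for k in range(len(A)) if 0 <= l - k * N < len(h))

# unnormalized IDFT of each symbol vector gives the time-domain symbols a_i[k]
a = [[sum(A[k][n] * cmath.exp(2j * cmath.pi * n * i / N) for n in range(N))
      for i in range(N)] for k in range(len(A))]

def s_poly(l):
    # polyphase evaluation (AI.2): convolve a_i with the ith phase of h
    m, i = divmod(l, N)
    return sum(a[k][i] * h[(m - k) * N + i]
               for k in range(len(A)) if 0 <= (m - k) * N + i < len(h))
```

Both routes produce the same transmit samples; the polyphase form simply reorganizes the computation into N short convolutions, one per phase of the prototype.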

If the orthogonality conditions (AI.5) are satisfied for all subchannel and time indices, then the matched filter outputs reproduce the IDFT coefficients. Finally, the N-vector of symbol decisions is obtained as the DFT of the recovered coefficients. We note from (AI.5) that in an ideal system each of the N phase components of the prototype response must individually satisfy the Nyquist condition for zero-ISI transmission in its subchannel. Then, summing (AI.4) over the subchannels shows that the prototype response also satisfies the orthogonality condition for transmission at the full sampling rate.

The orthogonality conditions are trivially satisfied if the prototype response is a sampled rectangular pulse of length N. In this case, no polyphase filtering is required. However, because the spectra of the subchannels will then overlap according to the shape of the prototype response spectrum, small amounts of channel distortion can be sufficient to destroy orthogonality and cause severe interchannel interference (ICI). For some applications, a higher spectral concentration of the subchannel spectra will therefore be desirable.

With cyclic extension, every block of N IDFT coefficients is cyclically extended by L coefficients. For example, with "cyclic prefix extension" the kth block is extended from length N to length N + L as follows:

    (x_{N-L}[k], ..., x_{N-1}[k], x_0[k], x_1[k], ..., x_{N-1}[k]).        (AI.7)

Assume the actual length of the channel response is at most L + 1. Then the received sequence contains cyclic signal repetitions within windows of length N for every received block of extended length N + L. The receiver selects blocks of signals of length N such that the received signals at the beginning and end of these blocks are cyclically repeating. The DFT of these blocks then yields the products H_n A_n[k] (AI.8), where H_n is the Fourier transform of the channel response at the center frequency of the nth subchannel. Multiplication with the estimated inverse spectral channel response yields the desired symbols. With this "one tap per symbol" equalization method, distortion is dealt with in a simple manner at the expense of a rate loss by the factor L/(N + L); the ADSL standard provides an example.
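The cyclic-prefix mechanism and one-tap equalization can be sketched end to end in Python. The channel taps and symbols below are arbitrary illustrative assumptions; the point is that after discarding the prefix, the linear channel acts as a circular convolution, so each subchannel needs only a single complex division.

```python
import cmath

def dft(x, inv=False):
    # forward DFT, or normalized inverse DFT when inv=True
    N = len(x)
    s = 1 if inv else -1
    out = [sum(x[n] * cmath.exp(s * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inv else out

N, L = 8, 2
h = [1.0, 0.5, 0.25]                          # toy channel of length L + 1
A = [complex(a) for a in (1, -1, 3, -3, 1, 1, -1, 3)]   # subchannel symbols
x = dft(A, inv=True)                          # IDFT block of time-domain samples
tx = x[-L:] + x                               # prepend cyclic prefix of length L
y = [sum(h[i] * tx[k - i] for i in range(len(h)) if k - i >= 0)
     for k in range(len(tx))]                 # linear convolution with the channel
r = y[L:]                                     # discarding the prefix leaves a
                                              # circular convolution of x with h
H = [sum(h[i] * cmath.exp(-2j * cmath.pi * n * i / N) for i in range(len(h)))
     for n in range(N)]                       # channel frequency response
A_hat = [Rn / Hn for Rn, Hn in zip(dft(r), H)]   # one tap per subchannel
```

The L prefix samples carry no new information, which is the rate loss by the factor L/(N + L) mentioned above.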
This leads to a filter design problem, which consists in minimizing, for a given filter length, i.e., for a given overlap factor q, the spectral energy of the prototype response outside of the assigned subchannel band, while approximately maintaining orthogonality.

For DMT modulation, a rectangular prototype response is used, and the transmitted signals could simply be the unfiltered IDFT coefficients (AI.6). In the absence of channel distortion, this simple scheme would be sufficient. However, if the channel response has length L + 1, then the last L IDFT coefficients of every transmitted block of N IDFT coefficients would interfere with the first coefficients of the next block (assuming L < N). In principle, combinations of pre-equalization and post-equalization (before and after the DFT operation in the receiver) could be used to mitigate the effects of this interference. Practical DMT systems employ the simpler method of "cyclic extension," originally suggested in [86], to cope with distortion: if the channel response has at most length L + 1, every block of N IDFT coefficients is cyclically extended as in (AI.7). For example, the parameters specified in the ADSL standard [1] result in a rate loss of 5.8%.

In practical DMT systems, one may not rely entirely on the cyclic-extension method to cope with distortion. An adaptive pre-equalizer may be employed to minimize the channel-response length prior to the DFT-based receiver operations described above. The problem is essentially similar to adjusting the forward filter in a decision-feedback equalizer or an MLSD receiver such that the channel memory is truncated to a given length [21], [36].

B. Cosine-Modulated Filter Banks and Discrete Wavelet Multitone (DWMT) Modulation

Cosine-modulated filter banks were first introduced for subband speech coding. The related concept of quadrature mirror filters (QMF) was probably first mentioned in [32]. In the field of digital communications, multicarrier modulation by means of cosine-modulated filter banks is referred to as DWMT modulation [91], [102]. The main feature of DWMT is that all signal processing is performed on real signals.

Fig. 13. Spectrum of cosine-modulated multicarrier modulation.

DWMT can be understood as a form of multiple carrierless single-sideband modulation (CSSB, see Section II-B). The sequences a_i[k], i = 0, 1, ..., N - 1, are now sequences of real modulation symbols. DWMT employs a real prototype response with a baseband spectrum that satisfies the Nyquist criterion for ISI-free transmission at the subchannel symbol rate. Also, the prototype spectrum has no more than 100% spectral roll-off (i.e., it vanishes beyond twice the Nyquist frequency). Furthermore, it is convenient to assume that the prototype response has zero phase.

A cosine-modulated "synthesis" filter bank generates the transmit signals (AI.9), where the samples of the symbol response for the ith subchannel are given by (AI.10). The corresponding spectral symbol response is then given by (AI.11). The spectrum of the transmitted signal is shown in Fig. 13.

In the performance analysis of a binary linear code, one has to consider only the set of code distances from the all-zero sequence (i.e., the set of code weights), and one may assume that the all-zero sequence was transmitted. Similar uniformity properties are helpful in the design and analysis of Euclidean-space codes. A variety of such properties have been introduced, from the original "quasilinearity" property of Ungerboeck [104] to geometric uniformity [44] (a property that generalizes the group property of binary linear codes), and finally rotational invariance. In this section we briefly review these properties.

A. Geometrical Uniformity

As we have seen in Section IV, for coding purposes it is useful to regard a large constellation based on a dense packing such as a lattice translate as an effectively infinite constellation consisting of the entire packing. This approximation eliminates boundary effects, and the resulting infinite constellation usually has greater symmetry. As we have seen, shaping methods (selection of a finite subconstellation) may be designed and analyzed separately in the high-SNR regime.

The symmetry group of a constellation is the set of isometries of Euclidean n-space (i.e., rotations, reflections, translations) that map the constellation to itself. For a lattice translate, the symmetry group includes the infinite set of all translations by lattice elements.
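The orbit characterization of geometric uniformity can be checked directly for a PSK constellation: the whole constellation is the orbit of any one of its points under the rotations in its symmetry group. The 8-PSK instance below is an illustrative assumption.

```python
import cmath

M = 8
rot = cmath.exp(2j * cmath.pi / M)           # rotation by 2*pi/M
start = complex(1, 0)                        # any constellation point

# orbit of the starting point under the cyclic rotation group
orbit = sorted({(round((start * rot ** k).real, 9),
                 round((start * rot ** k).imag, 9)) for k in range(M)})

# the M-PSK constellation itself
constellation = sorted({(round(cmath.exp(2j * cmath.pi * n / M).real, 9),
                         round(cmath.exp(2j * cmath.pi * n / M).imag, 9))
                        for n in range(M)})
```

The rounding merely makes the floating-point sets comparable; the orbit and the constellation coincide, which is the geometric-uniformity property for M-PSK.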
One can verify that any two overlapping frequency-shifted prototype spectra are in phase quadrature. This is the so-called QMF condition, which ensures orthogonality [106]. For a DCT-based polyphase realization of cosine-modulated filter banks, see [106].

APPENDIX II
UNIFORMITY PROPERTIES OF EUCLIDEAN-SPACE CODES

A binary linear code has a useful and fundamental uniformity property: the code "looks the same" from the viewpoint of any of its sequences. The number of neighbors at each distance is the same for all code sequences, and, provided the channel has appropriate symmetry, the error probability is independent of which code sequence is transmitted. If this uniformity property holds, then one has to consider only the set of code distances from the all-zero sequence.

The symmetry group of a lattice translate will also generally include a finite set of orthogonal transformations; e.g., the symmetry group of the QAM lattice translate includes the eight symmetries of the square (the dihedral group D4). A constellation is geometrically uniform if it is the orbit of any of its points under its symmetry group. A lattice translate is necessarily geometrically uniform, because it can be generated by the translation group. An M-PSK constellation is geometrically uniform, because it can be generated by rotations by multiples of 2*pi/M.

A fundamental region of a lattice is a compact region that contains precisely one point from each translate (coset) of the lattice. For example, any Voronoi region is a fundamental region if boundary points are appropriately divided between neighboring regions. The translates, by the lattice elements, of any fundamental region tile n-space

without overlap, so that every point of n-space can be uniquely expressed as the sum of a lattice element and an element of the fundamental region, namely the unique element of the fundamental region in the same coset. The volume of any fundamental region equals the fundamental volume of the lattice.

As discussed in Section IV, trellis codes are often based on lattice partitions. A useful viewpoint that has emerged from multilevel coding is to regard such codes as being defined on a fundamental region of the sublattice. The cosets of the sublattice are represented by their unique representatives in this fundamental region. At the front end of the receiver, each received point is mapped to the unique point of its coset in the fundamental region. Although such a "mod-lattice map" is not information-lossless, it has been shown that it does not reduce capacity [46]. Moreover, it decouples coding and decoding implementation and performance from any coding that may occur at other levels. Conceptually, this viewpoint reduces infinite-constellation coding and decoding to coding and decoding using a finite constellation of points on a compact region. A geometrically uniform code based on a lattice partition may be constructed by a group-theoretic procedure, described below.

Quasilinearity depends only on the use of a rate-k/(k+1) binary linear convolutional code in an encoder like that in Fig. 5, and the fact that the subsets associated with the first-level partition are geometrically congruent to each other. Let two binary convolutional code sequences be given, and let their difference be the error sequence, which by linearity is also a code sequence. For each error term, define the squared Euclidean weight (AII.1) as the minimum squared distance between signals drawn from subsets whose labels differ by that error term, where the minimization ranges over all label values with the given error bit and over all signals in the two subsets so labeled. If the subsets are congruent, then it is easy to see that this minimum does not depend on the particular label values, and we may simply write w^2(e_k) for this common minimum. It then follows that the minimum squared distance between any two trellis-code sequences that correspond to
different convolutional-code sequences is given by the total weight of the error sequence, minimized over all nonzero code error sequences:

    d^2_min = min over nonzero error sequences e of sum_k w^2(e_k).        (AII.2)

The proof is that the two code sequences may differ arbitrarily in the information label bits, to which the encoder adds the appropriate parity-check bits, and for any error term the minimum in (AII.1) is achievable with either choice of label. Thus quasilinearity permits d^2_min to be computed just as the minimum Hamming distance of binary linear convolutional codes is computed; one simply replaces Hamming weights by the squared Euclidean weights w^2(e_k) [104], [120] in trellis searches. This property has been used extensively in searches for the trellis codes reported in [104] and by subsequent authors. The number of codes to be searched for a code with the highest value of d^2_min can be further reduced significantly if the weights attain, for all error terms, the set-partitioning lower bound given by the minimum intrasubset squared distance at the first partitioning level at which the error label is nonzero [104].

The group-theoretic construction of a geometrically uniform code based on a lattice partition proceeds as follows. Let a label group be chosen that is isomorphic to the quotient group of the partition; or, more generally, let a group of symmetries be chosen that can generate a set of mutually congruent subsets whose union is the constellation. Let a time axis be chosen (finite for block codes, the integers for convolutional codes), let the set of all label sequences defined on this time axis form a group under the componentwise group operation, and let any subgroup of this sequence group be selected. Then the orbit of any sequence of subsets under this subgroup is a geometrically uniform Euclidean-space code. In particular, for any geometrically uniform code, the set of distances from any code sequence to all other code sequences is the same for all code sequences, and the probability of error on an ideal AWGN channel does not depend on which sequence was sent.

This observation has stimulated research into group codes, namely codes that are subgroups of sequence groups, and their geometrically uniform Euclidean-space images. Although the ultimate goal of this research has been to construct new classes of good codes, the main results so far have been to show that most known good codes constructed by other
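A toy instance of the weight computation in (AII.1): below, a 4-PAM constellation is split into two congruent subsets at the first partition level (an illustrative assumption, not a code from the paper), and the squared Euclidean weight of each error bit and the intrasubset minimum distance are computed by brute force.

```python
subsets = {0: [-3, 1], 1: [-1, 3]}           # first-level partition of 4-PAM

def w2(e):
    # (AII.1)-style weight: minimize the squared distance over all label
    # values y and all signal pairs drawn from the subsets y and y XOR e
    return min((a - b) ** 2
               for y in (0, 1) for a in subsets[y] for b in subsets[y ^ e])

def intra2():
    # minimum intrasubset squared distance (set-partitioning distance)
    return min((a - b) ** 2
               for s in subsets.values() for a in s for b in s if a != b)
```

Here w2(0) = 0 (the two sequences may use the same signal when the labels agree), w2(1) = 4 (the squared spacing of the full constellation), and intra2() = 16, reflecting the doubled spacing within each subset.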
methods are geometrically uniform. Ungerboeck's original 1D and 2D codes are easily shown to be geometrically uniform. Trott showed that the nonlinear Wei trellis code used in the V.32 modem standard may be represented as a group code whose state space is a non-Abelian dihedral group [100]. The 4D Wei code and the Williams code used in the V.34 modem standard are both geometrically uniform; however, Sarvis and Trott have shown that there is no geometrically uniform code with the same parameters as the 4D Wei V.34 code [92].

B. Quasilinearity

A weaker uniformity property introduced in [104] is quasilinearity. This property permits the minimum squared distance d^2_min of a trellis code to be found by simply computing the weights of a set of error sequences, rather than by pairwise comparison of all pairs of possible code sequences. The congruence property on which quasilinearity rests usually holds for large geometrically uniform constellations.

C. Rotational Invariance

In carrier-modulated complex-signal transmission systems, the absolute carrier phase of the received waveform signal is generally not known. The receiver may demodulate the received signal with a constant carrier-phase offset drawn from the group of phase rotations that leave the two-dimensional signal constellation invariant. Then, if a signal sequence was transmitted, the decoder operates on the sequence of rotated signals. For example, for QAM constellations this group consists of the rotations by multiples of 90 degrees. Rotational invariance is the property that rotated code sequences are also valid code sequences. This is obviously the case for uncoded modulation with signal constellations that exhibit rotational symmetries. To achieve transparency of the transmitted information under phase offsets, phase-differential encoding and decoding may be employed.
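Phase-differential encoding can be illustrated with a toy quaternary example (a sketch of the principle only, not the differential-encoding rules of any modem standard): the data are carried in phase increments, so a constant rotation of the whole received sequence cancels out of the differences.

```python
# data are phase increments in Z4 (multiples of 90 degrees)
data = [0, 3, 1, 2, 2, 0, 1, 3]

phases = [0]
for d in data:                       # differential encoding: accumulate increments
    phases.append((phases[-1] + d) % 4)

offset = 1                           # unknown constant 90-degree carrier-phase offset
received = [(p + offset) % 4 for p in phases]

# differential decoding: the constant offset cancels in the differences
decoded = [(received[k + 1] - received[k]) % 4 for k in range(len(data))]
```

The decoded increments equal the data for every offset in the rotation group, which is exactly the transparency property described above.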
A trellis code is fully rotationally invariant if, for all coded signal sequences and all phase offsets in the symmetry group of the constellation, the rotated sequences are also code sequences. Note that signals here always denote two-dimensional signals, which may be component signals of a higher dimensional signal constellation.

Trellis codes are not necessarily rotationally invariant under the same set of symmetries as their two-dimensional signal constellations. It has been shown that trellis codes based on linear binary convolutional codes with mapping into two-dimensional PSK or QAM signals can at most be invariant under 180-degree rotation. Fully rotationally invariant linear trellis codes exist only for higher dimensional constellations. Such codes are given for QAM in [105], [113], and for PSK in [87], [114]. Fully rotationally invariant trellis codes over two-dimensional PSK or QAM constellations must be nonlinear [88], [112].

We briefly describe the analytic approach presented in [88] for the design of nonlinear two-dimensional PSK and QAM trellis codes by an example. Let a QAM signal constellation be partitioned into eight subsets with subset labels whose components are elements of a ring of integers modulo a power of two. The labeling is chosen such that under 90-degree rotation the labels are transformed in a prescribed way. A fully rotationally invariant trellis code can then be found by requiring that, for all label sequences that satisfy a given parity-check equation, the rotated label sequences also satisfy this equation.

The spectral factorization of Appendix III proceeds from the Fourier series (AIII.1), whose coefficients are given by (AIII.2). The Paley–Wiener condition ensures that this Fourier series exists. Grouping the terms with negative, zero, and positive indices of this Fourier series yields the desired factorization (AIII.3); in particular, this yields (5.20). To obtain an explicit expression for the coefficients of the causal factor, define the formal power series (AIII.4), where D is an indeterminate; it follows that (AIII.5) holds. Taking formal derivatives yields
(AIII.6) or, equivalently, (AIII.7). Repeated formal differentiation of (AIII.7) yields the recursive relation (AIII.8) for the coefficients. Finally, noting that (AIII.9) holds for all indices, we obtain the explicit expressions (AIII.10).

The rotational-invariance requirement stated above is achieved by a binary nonlinear parity-check equation of the form (AII.3), where some coefficients are binary while the remaining coefficients and the label elements are elements of the ring of integers modulo 4. The notation indicates that from the binary representation of every element the most significant bit is chosen. Let 1 denote the all-ones sequence. The codes defined by the parity-check equation are rotationally invariant if a simple condition on the coefficient sum is met. Code tables for fully rotationally invariant nonlinear trellis codes over two-dimensional QAM and PSK signal constellations are given in [88].

Alternative methods for the construction of rotationally invariant trellis codes are discussed in [10] and [101].

APPENDIX III
DISCRETE-TIME SPECTRAL FACTORIZATION

The spectral factorization given by (5.19) may be obtained by expressing the logarithm of the spectrum as a Fourier series. The explicit expressions (AIII.10) show that the causal factor is causal and monic, and that it is uniquely determined by the given spectrum. For a proof that this factor is minimum-phase, see [83].
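The cepstral route sketched in this appendix can be imitated numerically. The following Python sketch (the sampling density, the series length, and the test spectrum are all illustrative assumptions) computes the Fourier coefficients of the log-spectrum, keeps the causal part, and exponentiates the resulting series with a recursion of the same form as the formal-derivative relation above.

```python
import cmath, math

def spectral_factor(S_samples, K):
    """Cepstral sketch of spectral factorization: S = S0 * |f|^2 on the
    unit circle, with f causal, monic, and minimum-phase."""
    M = len(S_samples)
    # Fourier coefficients of log S, approximated by a discrete sum
    c = [sum(math.log(S_samples[m]) * cmath.exp(-2j * cmath.pi * m * n / M)
             for m in range(M)).real / M for n in range(K + 1)]
    S0 = math.exp(c[0])               # zero-index term gives the scale factor
    u = [0.0] + c[1:]                 # causal part of the log-spectrum series
    f = [1.0]                         # f is monic: f_0 = 1
    for n in range(1, K + 1):         # exponentiate: n*f_n = sum_k k*u_k*f_{n-k}
        f.append(sum(k * u[k] * f[n - k] for k in range(1, n + 1)) / n)
    return S0, f

M = 256
S = [abs(1 + 0.5 * cmath.exp(-2j * cmath.pi * m / M)) ** 2 for m in range(M)]
S0, f = spectral_factor(S, K=4)
```

For this test spectrum, sampled from |1 + 0.5 e^{-j theta}|^2, the routine recovers the scale factor 1 and the minimum-phase factor 1 + 0.5 D, up to numerical error.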

ACKNOWLEDGMENT

G. D. Forney wishes to acknowledge the education, stimulation, and support offered by many mentors and colleagues over the past 35 years, particularly R. G. Gallager, J. L. Massey, J. M. Wozencraft, A. Kohlenberg, S. U. Qureshi, G. R. Lang, F. M. Longstaff, L.-F. Wei, and M. V. Eyuboglu.

G. Ungerboeck wishes to thank K. E. Drangeid, Director of the IBM Zurich Research Laboratory, for his essential support of this author's early work on new modem concepts and their implementation. He also gratefully acknowledges the stimulation and encouragement that he has received throughout his career from numerous colleagues, in particular R. E. Blahut, G. D. Forney, J. L. Massey, and A. J. Viterbi.

[20] G. Cherubini, S. Oelcer, and G. Ungerboeck, "Trellis precoding for channels with spectral nulls," in Proc. 1997 IEEE Int. Symp. Information Theory (Ulm, Germany, June 1997), p. 464.
[21] J. S. Chow, J. M. Cioffi, and J. A. Bingham, "Equalizer training algorithms for multicarrier modulation systems," in Proc. 1993 Int. Conf. Communication (Geneva, Switzerland, May 1993), pp. 761–765.
[22] J. M. Cioffi, G. P. Dudevoir, M. V. Eyuboglu, and G. D. Forney, Jr., "MMSE decision-feedback equalizers and coding—Parts I and II," IEEE Trans. Commun., vol. 43, pp. 2582–2604, Oct. 1995.
[23] J. M. Cioffi and G. D. Forney, Jr., "Generalized decision-feedback equalization for packet transmission with ISI and Gaussian noise," in Communications, Computation, Control and Signal Processing, A. Paulraj et al., Eds. Boston, MA: Kluwer, 1997, pp. 79–127.
[24] Consultative Committee for Space Data Standards, "Recommendations for space data standard: Telemetry channel coding," Blue Book Issue 2, CCSDS 101.0-B2, Jan. 1987.
[25] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups. New York: Springer, 1988.
[26] D. J. Costello, Jr., J. Hagenauer, H. Imai, and S. B. Wicker, "Applications of error control coding," this issue, pp. 0000–0000.
[27] M. C. Davey and D. J. C.
MacKay, "Low-density parity-check codes over GF(q)," in Proc. 1998 Information Theory Workshop (Killarney, Ireland, June 1998).

REFERENCES

[1] American National Standards Institute, "Network and customer installation interfaces—Asymmetrical digital subscriber line (ADSL) metallic interface," ANSI Std. T1.413-1995, Aug. 18, 1995.
[2] A. N. Akansu, P. Duhamel, X. Lin, and M. de Courville, "Orthogonal transmultiplexers in communications: A review," IEEE Trans. Signal Processing, vol. 46, pp. 979–995, Apr. 1998.
[3] AT&T (B. Betts), "Additional details on AT&T's candidate modem for V.fast," Contribution D157, ITU Study Group XVII, Geneva, Switzerland, Oct. 1991.
[4] AT&T (B. Betts), "Trellis-enhanced precoder—Combined coding and precoding," TIA TR30.1 contribution, Boston, MA, July 1993.
[5] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. IT-20, pp. 284–287, Mar. 1974.
[28] B. Dejon and E. Hänsler, "Optimum multiplexing of sampled signals on noisy channels," IEEE Trans. Inform. Theory, vol. IT-17, pp. 257–262, May 1971.
[29] D. Divsalar, S. Dolinar, and F. Pollara, "Draft CCSDS recommendation for telemetry channel coding (updated to include turbo codes)," Consultative Committee for Space Data Systems, White Book, Jan. 1997.
[30] M. L. Doeltz, E. T. Heald, and D. L. Martin, "Binary data transmission techniques for linear systems," Proc. IRE, vol. 45, pp. 656–661, May 1957.
[31] P. Elias, "Coding for noisy channels," in IRE Conv. Rec., Mar. 1955, vol. 3, pt. 4, pp. 37–46.
[32] D. Esteban and C. Galand, "Application of quadrature mirror filters to splitband voice coding schemes," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, May 1977, pp. 191–195.
[33] European Telecommunications Standards Institute, "Radio broadcasting systems; digital audio broadcasting (DAB) to mobile, portable and fixed receivers," ETSI Std. ETS 300 401, Feb. 1995.
[6] L. E. Baum and T. Petrie, "Statistical inference for probabilistic functions of finite-state Markov chains," Ann.
Math. Statist., vol. 37, pp. 1554–1563, 1966.
[7] M. Bellanger, G. Bonnerot, and M. Coudreuse, "Digital filtering by polyphase network: Application to the sample rate alteration and filter banks," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 109–114, Apr. 1976.
[8] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Bandwidth efficient parallel concatenated coding schemes," Electron. Lett., vol. 31, pp. 2067–2069, 1995.
[9] ——, "Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, pp. 909–926, May 1998.
[10] S. Benedetto, R. Garello, M. Mondin, and M. D. Trott, "Rotational invariance of trellis codes—Part II: Group codes and decoders," IEEE Trans. Inform. Theory, vol. 42, pp. 766–778, May 1996.
[11] S. Benedetto and G. Montorsi, "Unveiling turbo codes: Some results on parallel concatenated coding schemes," IEEE Trans. Inform. Theory, vol. 42, pp. 409–428, Mar. 1996.
[34] M. V. Eyuboglu and G. D. Forney, Jr., "Combined equalization and coding using precoding," IEEE Commun. Mag., vol. 29, no. 12, pp. 25–34, Dec. 1991.
[35] ——, "Trellis precoding: Combined coding, precoding and shaping for intersymbol interference channels," IEEE Trans. Inform. Theory, vol. 38, pp. 301–314, Mar. 1992.
[36] D. D. Falconer and F. R. Magee, "Adaptive channel memory truncation for maximum likelihood sequence estimation," Bell Syst. Tech. J., vol. 52, pp. 1541–1562, Nov. 1973.
[37] R. F. H. Fischer, "Using flexible precoding for channels with spectral nulls," Electron. Lett., vol. 31, pp. 356–358, Mar. 1995.
[38] R. F. H. Fischer and J. B. Huber, "Comparison of precoding schemes for digital subscriber lines," IEEE Trans. Commun., vol. 45, pp. 334–343, Mar. 1997.
[39] G. D. Forney, Jr., Concatenated Codes. Cambridge, MA: MIT Press, 1966.
[12] E. R.
Berlekamp, "Bounded distance +1 soft-decision Reed–Solomon decoding," IEEE Trans. Inform. Theory, vol. 42, pp. 704–720, May 1996.
[13] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo codes," in Proc. 1993 Int. Conf. Communication (Geneva, Switzerland, May 1993), pp. 1064–1070.
[14] J. A. Bingham, "Multicarrier modulation for data transmission: An idea whose time has come," IEEE Commun. Mag., vol. 28, pp. 5–14, May 1990.
[15] G. Caire, G. Taricco, and E. Biglieri, "Bit-interleaved coded modulation," IEEE Trans. Inform. Theory, vol. 44, pp. 927–946, May 1998.
[16] A. R. Calderbank and L. H. Ozarow, "Nonequiprobable signaling on the Gaussian channel," IEEE Trans. Inform. Theory, vol. 36, pp. 726–740, July 1990.
[17] A. R. Calderbank and N. J. A. Sloane, "New trellis codes based on lattices and cosets," IEEE Trans. Inform. Theory, vol. IT-33, pp. 177–195, Mar. 1987.
[18] M. Cedervall and R. Johannesson, "A fast algorithm for computing the distance spectrum of convolutional codes," IEEE Trans. Inform. Theory, vol. 35, pp. 1146–1159, Nov. 1989.
[40] G. D. Forney, Jr., "Burst-correcting codes for the classic bursty channel," IEEE Trans. Commun. Technol., vol. COM-19, pp. 772–781, Oct. 1971.
[41] G. D. Forney, Jr., "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Trans. Inform. Theory, vol. IT-18, pp. 363–378, May 1972.
[42] G. D. Forney, Jr., "Coset codes—Part I: Introduction and geometrical classification," IEEE Trans. Inform. Theory, vol. 34, pp. 1123–1151, Sept. 1988.
[43] G. D. Forney, Jr., "Coset codes—Part II: Binary lattices and related codes," IEEE Trans. Inform. Theory, vol. 34, pp. 1152–1187, Sept. 1988.
[44] G. D. Forney, Jr., "Geometrically uniform codes," IEEE Trans. Inform. Theory, vol. 37, pp. 1241–1260, Sept. 1991.
[45] G. D. Forney, Jr., "Trellis shaping," IEEE Trans. Inform. Theory, vol. 38, pp. 281–300, Mar. 1992.
[46] G. D. Forney, Jr., "Approaching the channel capacity of the AWGN channel with coset codes and multilevel coset codes," submitted to IEEE Trans. Inform. Theory, Aug. 1997.
[47] G. D. Forney, Jr., L. Brown, M. V. Eyuboglu, and J. L.
Moran, III, "The V.34 high-speed modem standard," IEEE Commun. Mag., vol. 34, pp. 28–33, Dec. 1996.
[19] D. A. Chase, "A class of algorithms for decoding block codes with channel measurement information," IEEE Trans. Inform. Theory, vol. IT-18, pp. 170–182, Jan. 1972.
[48] G. D. Forney, Jr. and L.-F. Wei, "Multidimensional constellations—Part I: Introduction, figures of merit, and generalized cross constellations," IEEE J. Select. Areas Commun., vol. 7, pp. 877–892, Aug. 1989.

[49] P. Fortier, A. Ruiz, and J. M. Cioffi, “Multidimensional signal sets [78] J. L. Massey, “Coding and modulation in digital communications,” through the shell construction for parallel channels,” IEEE Trans. in Proc. 1974 Int. Zurich Seminar on Digital Communication (Zurich, Commun., vol. 40, pp. 500–512, Mar. 1992. Switzerland, Mar. 1974), pp. E2(1)–(4). [50] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: [79] R. J. McEliece and L. Swanson, “Reed–Solomon codes and the ex- MIT Press, 1962. ploration of the solar system,” in Reed–Solomon Codes and Their [51] , Information Theory and Reliable Communication. New York: Applications, S. B. Wicker and V. K. Bhargava, Eds. Piscataway, NJ: Wiley, 1968. IEEE Press, 1994, pp. 25–40. [52] General Datacomm (P. Cole and Y. Goldstein), “Distribution-preserving [80] Motorola Information Systems Group (M. V. Eyuboglu), “A flexible Tomlinson algorithm,” contribution D189 to CCITT Study Group XVII, form of precoding for V.fast,” Contribution D194, CCITT Study Group Geneva, Switzerland, June 1992. XVII, Geneva, Switzerland, June 1992. [53] General Datacomm (Y. Goldstein), Motorola (M. V. Eyuboglu), and [81] S. H. Mueller and J. B. Huber, “A comparison of peak power reduction Rockwell (S. Olafsson), “Precoding for V.fast,” TIA TR30.1 contribu- schemes for OFDM,” in Proc. 1997 IEEE Global Telecomm. Conf. tion, Boston, MA, July 1993. (Globecom’97) (Phoenix, AZ, Nov. 1997), pp. 1–5. [54] H. Harashima and H. Miyakawa, “A method of code conversion for [82] A. Narula and F. R. Kschischang, “Unified framework for PAR control a digital communication channel with intersymbol interference,” IEEE in multitone systems,” ANSI T1E1.4 contribution 98-183, Huntsville, Trans. Commun., vol. COM-20, pp. 774–780, Aug. 1972. AL, June 1998. [55] J. A. Heller and I. M. Jacobs, “Viterbi decoding for satellite and space [83] A. Papoulis, Signal Analysis. New York: McGraw-Hill, 1984. communication,” IEEE Trans. Commun. Technol., vol. 
COM-19, pp. [84] L. C. Perez, J. Seghers, and D. J. Costello, Jr., “A distance spectrum 835–848, Oct. 1971. interpretation of turbo codes,” IEEE Trans. Inform. Theory, vol. 42, pp. [56] B. Hirosaki, “An orthogonally multiplexed QAM system using the 1698–1709, Nov. 1996. discrete Fourier transform,” IEEE Trans. Commun., vol. COM-29, pp. [85] R. Price, “Nonlinearly feedback-equalized PAM versus capacity for 982–989, July 1981. noisy filter channels,” in Proc. 1972 Int. Conf. Communication, June [57] J. L. Holsinger, “Digital communication over fixed time-continuous 1972, pp. 22.12–22.17. channels with memory, with special application to telephone channels,” [86] A. Peled and A. Ruiz, “Frequency domain data transmission using MIT Res. Lab. Electron., Tech. Rep. 430, 1964. reduced computational complexity algorithms,” in Proc. IEEE Int. Conf. [58] J. Huber and U. Wachsmann, “Capacities of equivalent channels in Acoustics, Speech, and Signal Processing, Apr. 1980, pp. 964–967. multilevel coding schemes,” Electron. Lett., vol. 30, pp. 557–558, Mar. [87] S. S. Pietrobon, R. H. Deng, A. Lafanechere, G. Ungerboeck, and D. J. 1994. Costello, Jr., “Trellis-coded multidimensional ,” IEEE [59] J. Huber, U. Wachsmann, and R. Fischer, “Coded modulation by mul- Trans. Inform. Theory, vol. 36, pp. 63–89, Jan. 1990. tilevel codes: Overview and state of the art,” in Proc. ITG Fachtagung [88] S. S. Pietrobon, G. Ungerboeck, L. C. Perez, and D. J. Costello, “Codierung fuer Quelle, Kanal und Uebertragung" (Aachen, Germany, Jr., “Rotationally invariant nonlinear trellis codes for two-dimensional Mar. 1998), pp. 255–266. modulation,” IEEE Trans. Inform. Theory, vol. 40, pp. 1773–1791, Nov. [60] H. Imai and S. Hirakawa, “A new multilevel coding method using error 1994. correcting codes,” IEEE Trans. Inform. Theory, vol. 23, pp. 371–377, [89] P. Robertson and T. Woerz, “Coded modulation scheme employing turbo May 1977. codes,” Electron. Lett., vol. 31, pp. 1546–1547, Aug. 1995. [61] Proc. 
Int. Symp. Turbo Codes (Brest, France, Sept. 1997). [90] A. Ruiz, J. M. Cioffi, and S. Kasturia, “Discrete multiple tone modula- [62] ITU-T Recommendation V.34, “A modem operating at data signalling tion with coset coding for the spectrally shaped channel,” IEEE Trans. rates of up to 33 600 bit/s for use on the general switched telephone Commun., vol. 40, pp. 1012–1029, June 1992. network and on leased point-to-point 2-wire telephone-type circuits,” [91] S. D. Sandberg and M. A. Tzannes, “Overlapped discrete multitone Oct. 1996, replaces first version of Sept. 1994. modulation for high speed copper wire commmunications,” IEEE J. [63] I. M. Jacobs and E. R. Berlekamp, “A lower bound to the distribution Select. Areas Commun., vol. 13, pp. 1570–1585, Dec. 1995. [92] J. P. Sarvis, “Symmetries of trellis codes,” M. Eng. thesis, Dept. Elec. of computation for sequential decoding,” IEEE Trans. Inform. Theory, Eng. and Comp. Sci., MIT, Cambridge, MA, June 1995. vol. IT-13, pp. 167–174, 1967. [93] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. [64] I. Kalet, “The multitone channel,” IEEE Trans. Commun., vol. 37, pp. Tech. J., vol. 27, pp. 379–423 and pp. 623–656, July and Oct. 1948. 119–124, Feb. 1989. [94] , “Communication in the presence of noise,” Proc. IRE, vol. 37, [65] S. Kallel and K. Li, “Bidirectional sequential decoding,” IEEE Trans. pp. 10–21, 1949. Inform. Theory, vol. 43, pp. 1319–1326, July 1997. [95] , “Probability of error for optimal codes in a Gaussian channel,” [66] A. K. Khandani and P. Kabal, “Shaping multidimensional signal Bell Syst. Tech. J., vol. 38, pp. 611–656, May 1959. spaces—Part I: Optimum shaping, shell mapping,” IEEE Trans. Inform. [96] M. Sipser and D. A. Spielman, “Expander codes,” IEEE Trans. Inform. Theory, vol. 39, pp. 1799–1808, Nov. 1993. Theory, vol. 42, pp. 1710–1722, Nov. 1996. [67] Y. Kofman, E. Zehavi, and S. Shamai (Shitz), “Performance analysis [97] M. 
Tomlinson, “New automatic equalizer employing modulo arith- of a multilevel coded modulation system,” IEEE Trans. Commun., vol. metic,” Electron. Lett., vol. 7, pp. 138–139, Mar. 1971. 42, pp. 299–312, Feb. 1994. [98] P. Tong, “A 40 MHz encoder-decoder chip generated by a [68] F. R. Kschischang and S. Pasupathy, “Optimal nonuniform signaling for Reed–Solomon compiler,” in Proc. Cust. Int. Circuits Conf. (Boston, Gaussian channels,” IEEE Trans. Inform. Theory, vol. 39, pp. 913–929, MA, May 1990), pp. 13.5.1–13.5.4. May 1993. [99] IEEE Trans. Signal Processing (Special Issue on Theory and Applica- [69] G. R. Lang and F. M. Longstaff, “A Leech lattice modem,” IEEE J. tions of Filter Banks and Wavelet Transforms), vol. 46, Apr. 1998. Select. Areas Commun., vol. 7, pp. 968–973, Aug. 1989. [100] M. D. Trott, “The algebraic structure of trellis codes,” Ph.D. dissertation, [70] P. J. Lee, “There are many good periodically-time-varying convolutional Dept. Elec. Eng., Stanford Univ., Stanford, CA, Aug. 1992. codes,” IEEE Trans. Inform. Theory, vol. 35, pp. 460–463, Mar. 1989. [101] M. D. Trott, S. Benedetto, R. Garello, and M. Mondin, “Rotational [71] R. Laroia, “Coding for intersymbol interference channels—Combined invariance of trellis codes—Part I: Encoders and precoders,” IEEE coding and precoding,” IEEE Trans. Inform. Theory, vol. 42, pp. Trans. Inform. Theory, vol. 42, pp. 751–765, May 1996. 1053–1061, July 1996. [102] M. A. Tzannes, M. C. Tzannes, J. Proakis, and P. N. Heller, “DMT [72] R. Laroia, N. Farvardin, and S. Tretter, “On optimal shaping of multi- systems, DWMT systems, and filter banks,” in Proc. Int. Communication dimensional constellations,” IEEE Trans. Inform. Theory, vol. 40, pp. Conf., 1994. 1044–1056, July 1994. [103] G. Ungerboeck, ”Adaptive maximum-likelihood receiver for carrier- [73] R. Laroia, S. Tretter, and N. Farvardin, “A simple and effective modulated data-transmission systems,” IEEE Trans. Commun., vol. 
precoding scheme for noise whitening on intersymbol interference COM-22, pp. 1124–1136, May 1974. channels,” IEEE Trans. Commun., vol. 41, pp. 1460–1463, Oct. 1993. [104] , “Channel coding with multilevel/phase signals,” IEEE Trans. [74] E. A. Lee and D. G. Messerschmitt, Digital Communication, 2nd ed. Inform. Theory, vol. IT-28, pp. 55–67, Jan. 1982. Boston, MA: Kluwer, 1994. [105] , “Trellis-coded modulation with redundant signal sets: Part II,” [75] J. Leech and N. J. A. Sloane, “Sphere packing and error-correcting IEEE Commun. Mag., vol. 25, pp. 12–21, Feb. 1987. codes,” Canad. J. Math., vol. 23, pp. 718–745, 1971. [106] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood [76] D. J. C. MacKay and R. M. Neal, “Near Shannon limit performance of Cliffs, NJ: Prentice-Hall, 1993. low-density parity-check codes,” Electron. Lett., vol. 32, pp. 1645–1646, [107] A. Vardy, “Trellis structure of codes,” in Handbook of Coding Theory Aug. 1996, reprinted in vol. 33, pp. 457–458, Mar. 1997. V. S. Pless et al., Eds. Amsterdam, The Netherlands: Elsevier, 1998. [77] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting [108] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Codes. Amsterdam, The Netherlands: North-Holland, 1977. Coding. New York: McGraw-Hill, 1979. FORNEY AND UNGERBOECK: MODULATION AND CODING FOR LINEAR GAUSSIAN CHANNELS 2415

[109] U. Wachsmann and J. Huber, “Power and bandwidth efficient digital 1281–1295, Dec. 1989. communication using turbo codes in multilevel codes,” Euro. Trans. [115] S. B. Weinstein and P. M. Ebert, “Data transmission by frequency- Telecommun., vol. 6, pp. 557–567, Sept. 1995. division multiplexing,” IEEE Trans. Commun., vol. COM-19, pp. [110] U. Wachsmann, R. Fischer, and J. Huber, “Multilevel codes: Parts 1–3,” 628–634, Oct. 1971. June 1997, submitted to IEEE Trans. Inform. Theory. [116] N. Wiberg, H.-A. Loeliger, and R. Kotter,¨ “Codes and iterative decoding [111] F.-Q. Wang and D. J. Costello, Jr., “Sequential decoding of trellis codes on general graphs,” Euro. Trans. Telecommun., vol. 6, pp. 513–526, at high spectral efficiencies,” IEEE Trans. Inform. Theory, vol. 43, pp. Sept. 1995. 2013–2019, Nov. 1997. [117] R. G. C. Williams, “A trellis code for V.fast,” CCITT V.fast rapporteur [112] L.-F. Wei, “Rotationally invariant convolutional channel encoding with meeting, Bath, U.K., Sept. 1992. expanded signal space, Part II: Nonlinear codes,” IEEE J. Select. Areas [118] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Commun., vol. 2, pp. 672–686, Sept. 1984. Engineering. New York: Wiley, 1965. [113] , “Trellis-coded modulation using multidimensional constella- [119] J. M. Wozencraft and B. Reiffen, Sequential Decoding. Cambridge, tions,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 483–501, July 1987. MA: MIT Press, 1961. [114] , “Rotationally invariant trellis-coded with multi- [120] E. Zehavi and J. K. Wolf, “On the performance evaluation of trellis dimensional M-PSK,” IEEE J. Select. Areas Commun., vol. 7, pp. codes,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 192–201, Jan. 1987.