A General Formula for Channel Capacity
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 40, NO. 4, JULY 1994

Sergio Verdú, Fellow, IEEE, and Te Sun Han, Fellow, IEEE

Abstract—A formula for the capacity of arbitrary single-user channels without feedback (not necessarily information stable, stationary, etc.) is proved. Capacity is shown to equal the supremum, over all input processes, of the input-output inf-information rate defined as the liminf in probability of the normalized information density. The key to this result is a new converse approach based on a simple new lower bound on the error probability of m-ary hypothesis tests among equiprobable hypotheses. A necessary and sufficient condition for the validity of the strong converse is given, as well as general expressions for ε-capacity.

Index Terms—Shannon theory, channel capacity, channel coding theorem, channels with memory, strong converse.

Manuscript received December 15, 1992; revised June 12, 1993. This work was supported in part by the National Science Foundation under PYI Award ECSE-8857689 and by a grant from NEC. This paper was presented in part at the 1993 IEEE Workshop on Information Theory, Shizuoka, Japan, June 1993. S. Verdú is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544. T. S. Han is with the Graduate School of Information Systems, University of Electro-Communications, Tokyo 182, Japan. IEEE Log Number 9402452.

I. INTRODUCTION

Shannon's formula [1] for channel capacity (the supremum of all rates R for which there exist sequences of codes with vanishing error probability and whose size grows with the block length n as exp(nR)),

    C = \max_X I(X; Y),    (1.1)

holds for memoryless channels. If the channel has memory, then (1.1) generalizes to the familiar limiting expression

    C = \lim_{n \to \infty} \sup_{X^n} \frac{1}{n} I(X^n; Y^n).    (1.2)

However, the capacity formula (1.2) does not hold in full generality; its validity was proved by Dobrushin [2] for the class of information stable channels. Those channels can be roughly described as having the property that the input that maximizes mutual information and its corresponding output behave ergodically. That ergodic behavior is the key to generalizing the use of the law of large numbers in the proof of the direct part of the memoryless channel coding theorem. Information stability is not a superfluous sufficient condition for the validity of (1.2); in fact, it was shown by Hu [3] that information stability is essentially equivalent to the validity of formula (1.2). Consider a binary channel where the output codeword is equal to the transmitted codeword with probability 1/2 and independent of the transmitted codeword with probability 1/2. The capacity of this channel is equal to 0 because arbitrarily small error probability is unattainable. However, the right-hand side of (1.2) is equal to 1/2 bit/channel use.
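As a concrete check on this counterexample (this numerical sketch is not part of the original paper; it assumes equiprobable input codewords and that, when the channel "replaces" the codeword, the output is an independent equiprobable binary n-vector), the normalized information density of this channel takes only two values, and its mean is the normalized mutual information appearing on the right-hand side of (1.2):

import math

# Mixture channel: with prob. 1/2 the output equals the input codeword,
# with prob. 1/2 it is an independent equiprobable binary n-vector.
# Under an equiprobable input X^n:
#   P_{Y^n|X^n}(y|x) = 0.5*1{y=x} + 0.5*2^{-n},   P_{Y^n}(y) = 2^{-n}.

def density_distribution(n):
    """Two mass points of the normalized information density (1/n) i(X^n; Y^n)."""
    p_match = 0.5 + 0.5 * 2.0 ** (-n)               # P(Y^n = X^n)
    d_match = math.log2(0.5 * 2.0 ** n + 0.5) / n   # density when Y^n = X^n (about 1 bit)
    d_miss = math.log2(0.5) / n                     # density otherwise (= -1/n, about 0)
    return [(d_match, p_match), (d_miss, 1.0 - p_match)]

for n in (10, 100, 1000):
    points = density_distribution(n)
    mutual_info = sum(d * p for d, p in points)     # (1/n) I(X^n; Y^n) = mean of the density
    print(f"n={n:5d}  mass points={[(round(d, 3), round(p, 3)) for d, p in points]}"
          f"  (1/n)I = {mutual_info:.3f} bits")

# Half of the probability mass of the density sits near 0 bits, so its liminf in
# probability is 0 (matching the true capacity), even though the normalized
# mutual information approaches 1/2 bit/channel use.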
The immediate question is whether there exists a completely general formula for channel capacity, which does not require any assumption such as memorylessness, information stability, stationarity, causality, etc. Such a formula is found in this paper.

Finding expressions for channel capacity in terms of the probabilistic description of the channel is the purpose of channel coding theorems. The literature on coding theorems for single-user channels is vast (cf., e.g., [4]). Since Dobrushin's information stability condition is not always easy to check for specific channels, a large number of works have been devoted to showing the validity of (1.2) for classes of channels characterized by their memory structure, such as finite-memory and asymptotically memoryless conditions. The first example of a channel for which formula (1.2) fails to hold was given in 1957 by Nedoma [5]. In order to go beyond (1.2) and obtain capacity formulas for information unstable channels, researchers typically considered averages of stationary ergodic channels, i.e., channels which, conditioned on the initial choice of a parameter, are information stable. A formula for averaged discrete memoryless channels was obtained by Ahlswede [6], who realized that the Fano inequality fell short of providing a tight converse for those channels. Another class of channels that are not necessarily information stable was studied by Winkelbauer [7]: stationary discrete regular decomposable channels with finite input memory. Using the ergodic decomposition theorem, Winkelbauer arrived at a formula for ε-capacity that holds for all but a countable number of values of ε. Nedoma [8] had shown that some stationary nonergodic channels cannot be represented as a mixture of ergodic channels; however, the use of the ergodic decomposition theorem was circumvented by Kieffer [9], who showed that Winkelbauer's capacity formula applies to all discrete stationary nonanticipatory channels. This was achieved by a converse whose proof involves Fano's and Chebyshev's inequalities plus a generalized Shannon-McMillan theorem for periodic measures. The stationarity of the channel is a crucial assumption in that argument.

Using the Fano inequality, it can be easily shown (cf. Section III) that the capacity of every channel (defined in the conventional way, cf. Section II) satisfies

    C \le \liminf_{n \to \infty} \sup_{X^n} \frac{1}{n} I(X^n; Y^n).    (1.3)

To establish equality in (1.3), the direct part of the coding theorem needs to assume information stability of the channel. Thus, the main existing results that constitute our starting point are a converse theorem (i.e., an upper bound on capacity) which holds in full generality and a direct theorem which holds for information stable channels. At first glance, this may lead one to conclude that the key to a general capacity formula is a new direct theorem which holds without assumptions. However, the foregoing example shows that the converse (1.3) is not tight in that case. Thus, what is needed is a new converse which is tight for every channel. Such a converse is the main result of this paper. It is obtained without recourse to the Fano inequality which, as we will see, cannot lead to the desired result. The proof that the new converse is tight (i.e., a general direct theorem) follows from the conventional argument once the right definition is made.

The capacity formula proved in this paper is

    C = \sup_X \underline{I}(X; Y).    (1.4)

In (1.4), X denotes an input process in the form of a sequence of finite-dimensional distributions X = \{X^n = (X_1^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty}. We denote by Y = \{Y^n = (Y_1^{(n)}, \ldots, Y_n^{(n)})\}_{n=1}^{\infty} the corresponding output sequence of finite-dimensional distributions induced by X via the channel W = \{W^n = P_{Y^n|X^n}: A^n \to B^n\}_{n=1}^{\infty}, which is an arbitrary sequence of n-dimensional conditional output distributions from A^n to B^n, where A and B are the input and output alphabets, respectively. The symbol \underline{I}(X; Y) appearing in (1.4) is the inf-information rate between X and Y, which is defined in [10] as the liminf in probability of the sequence of normalized information densities (1/n) i_{X^n W^n}(X^n; Y^n), where

    i_{X^n W^n}(a^n; b^n) = \log \frac{P_{Y^n|X^n}(b^n|a^n)}{P_{Y^n}(b^n)}.    (1.5)

For ease of notation and to highlight the simplicity of the proofs, we have assumed in (1.5) and throughout the paper that the input and output alphabets are finite. However, it will be apparent from our proofs that the same results hold in general, with the information density in (1.5) defined as the log of the derivative of the conditional output measure with respect to the unconditional output measure.

The notion of inf/sup-information/entropy rates and the recognition of their key role in dealing with nonergodic/nonstationary sources are due to [10]. In particular, that paper shows that the minimum achievable source coding rate for any finite-alphabet source X = \{X^n\}_{n=1}^{\infty} is equal to its sup-entropy rate \bar{H}(X), defined as the limsup in probability of (1/n) \log 1/P_{X^n}(X^n). In contrast to the general capacity formula presented in this paper, the general source coding result can be shown by generalizing existing proofs.
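For intuition on how (1.4) reduces to the classical formulas in the information stable case, the following Monte Carlo sketch (an illustration added here, not taken from the paper) estimates the normalized information density (1.5) for a binary symmetric channel with crossover probability p and equiprobable i.i.d. inputs; recall that the liminf in probability of a sequence of random variables is, roughly, the largest constant below which the sequence falls with vanishing probability. For this channel the density concentrates around I(X; Y) = 1 - h(p), so the inf-information rate in (1.4) coincides with (1.1).

import numpy as np

rng = np.random.default_rng(0)

def normalized_densities(n, p, trials=5_000):
    """Samples of (1/n) i_{X^n W^n}(X^n; Y^n) for a BSC(p) with equiprobable inputs."""
    x = rng.integers(0, 2, size=(trials, n), dtype=np.int8)
    noise = (rng.random((trials, n)) < p).astype(np.int8)
    y = x ^ noise                                   # BSC output
    flips = (x != y).sum(axis=1)                    # number of crossovers per block
    log_pyx = flips * np.log2(p) + (n - flips) * np.log2(1 - p)   # log2 P_{Y^n|X^n}(y|x)
    log_py = -float(n)                              # Y^n is equiprobable, so log2 P_{Y^n} = -n
    return (log_pyx - log_py) / n

n, p = 500, 0.11
d = normalized_densities(n, p)
classical = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)   # 1 - h(p)
print(f"sample mean of the density : {d.mean():.4f} bits/channel use")
print(f"1 - h(p)                   : {classical:.4f} bits/channel use")
print(f"5th percentile of density  : {np.percentile(d, 5):.4f} (concentration around the mean)")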
The definition of channel as a sequence of finite-dimensional conditional distributions can be found in well-known contributions to the Shannon-theoretic literature (e.g., Dobrushin [2], Wolfowitz [12, ch. 7], and Csiszár and Körner [13, p. 100]), although, as we saw, previous coding theorems imposed restrictions on the allowable class of conditional distributions. Essentially the same general channel model was analyzed in [26], arriving at a capacity formula which is not quite correct. A different approach has been followed in the ergodic-theoretic literature, which defines a channel as a conditional distribution between spaces of doubly infinite sequences. In that setting (and within the domain of block coding [14]), codewords are preceded by a prehistory (a left-sided infinite sequence) and followed by a posthistory (a right-sided infinite sequence); the error probability may be defined in a worst case sense over all possible input pre- and posthistories. The channel definition adopted in this paper, namely, a sequence of finite-dimensional distributions, captures the physical situation to be modeled where block codewords are transmitted through the channel. It is possible to encompass physical models that incorporate anticipation, unlimited memory, nonstationarity, etc., because we avoid placing restrictions on the sequence of conditional distributions. Instead of taking the worst case error probability over all possible pre- and posthistories, whatever statistical knowledge is available about those sequences can be incorporated by averaging the conditional transition probabilities (and, thus, averaging the error probability) over all possible pre- and posthistories. For example, consider a simple channel with memory:

    y_i = x_i + x_{i-1} + n_i,

where {n_i} is an i.i.d.
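The example above is cut off in this excerpt, so the following sketch is purely illustrative and every concrete choice in it is an assumption: binary alphabets, modulo-2 addition in y_i = x_i + x_{i-1} + n_i, i.i.d. Bernoulli(q) noise, and an equiprobable prehistory symbol x_0. It shows the averaging idea of the paragraph above: the finite-dimensional conditional law W^n = P_{Y^n|X^n} is obtained by averaging over the unknown prehistory rather than taking a worst case over it.

import itertools

q = 0.05  # assumed noise parameter, for illustration only

def averaged_channel_law(x, noise_prob=q):
    """P_{Y^n|X^n}(. | x) as a dict over output n-tuples, averaged over the prehistory bit x_0."""
    law = {}
    for x0 in (0, 1):                                       # equiprobable prehistory bit
        for noise in itertools.product((0, 1), repeat=len(x)):
            prob = 0.5                                      # P(x_0) = 1/2
            prev, y = x0, []
            for xi, ni in zip(x, noise):
                y.append((xi + prev + ni) % 2)              # assumed modulo-2 addition
                prob *= noise_prob if ni else (1 - noise_prob)
                prev = xi
            key = tuple(y)
            law[key] = law.get(key, 0.0) + prob             # accumulate over prehistory and noise
    return law

x = (1, 0, 1)
law = averaged_channel_law(x)
print(f"total probability: {sum(law.values()):.6f}")        # sanity check: should be 1
for y, prob in sorted(law.items(), key=lambda kv: -kv[1])[:4]:
    print(y, round(prob, 4))                                # most likely outputs under the averaged law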