IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 6, OCTOBER 1998

The Method of Types

Imre Csiszár, Fellow, IEEE

(Invited Paper)

Abstract—The method of types is one of the key technical tools in Shannon Theory, and this tool is valuable also in other fields. In this paper, some key applications will be presented in sufficient detail enabling an interested nonspecialist to gain a working knowledge of the method, and a wide selection of further applications will be surveyed. These range from hypothesis testing and large deviations theory through error exponents for discrete memoryless channels and capacity of arbitrarily varying channels to multiuser problems. While the method of types is suitable primarily for discrete memoryless models, its extensions to certain models with memory will also be discussed.

Index Terms—Arbitrarily varying channels, choice of decoder, counting approach, error exponents, extended type concepts, hypothesis testing, large deviations, multiuser problems, universal coding.

I. INTRODUCTION

ONE of Shannon's key discoveries was that—for quite general source models—the negative logarithm of the probability of a typical long sequence divided by the number of symbols is close to the source entropy H; the total probability of all n-length sequences not having this property is arbitrarily small if n is large. Thus "it is possible for most purposes to treat long sequences as though there were just 2^{Hn} of them, each with a probability 2^{-Hn}" [75, p. 24]. Shannon demonstrated the power of this idea also in the context of channels. It should be noted that Shannon [75] used the term "typical sequence" in an intuitive rather than technical sense. Formal definitions of typicality, introduced later, need not concern us here.

At the first stage of development of information theory, the main theoretical issue was to find the best rates of source or channel block codes that, assuming a known probabilistic model, guarantee arbitrarily small probability of error (or tolerable average distortion) when the blocklength n is sufficiently large. For this purpose, covered by the previous quotation from Shannon [75], typical sequences served as a very efficient and intuitive tool, as demonstrated by the book of Wolfowitz [81]. The limitations of this tool became apparent when interest shifted towards the speed of convergence to zero of the error probability as n → ∞. Major achievements of the 1960's were, in the context of discrete memoryless channels (DMC's), the "random coding" upper bound and the "sphere packing" lower bound to the error probability of the best code of a given rate less than capacity (Fano [44], Gallager [46], Shannon, Gallager, and Berlekamp [74]). These bounds exponentially coincide for rates above a "critical rate" and provide the exact error exponent of a DMC for such rates. These results could not be obtained via typical sequences, and their first proofs used analytic techniques that gave little insight.

It turned out in the 1970's that a simple refinement of the typical sequence approach is effective—at least in the discrete memoryless context—also for error exponents, as well as for situations where the probabilistic model is partially unknown. The idea of this refinement, known as the method of types, is to partition the n-length sequences into classes according to type (empirical distribution). Then the error event of interest is partitioned into its intersections with the type classes, and the error probability is obtained by summing the probabilities of these intersections. The first key fact is that the number of type classes grows subexponentially with n. This implies that the error probability has the same exponential asymptotics as the largest one among the probabilities of the above intersections. The second key fact is that sequences of the same type are equiprobable under a memoryless probabilistic model. Hence to bound the probabilities of intersections as above it suffices to bound their cardinalities, which is often quite easy. This informal description assumes models involving one set of sequences (source coding or hypothesis testing); if two or more sets of sequences are involved (as in channel coding), joint types have to be considered.

In this paper, we will illustrate the working and the power of the method of types via a sample of examples that the author considers typical and both technically and historically interesting. The simple technical background, including convenient notation, will be introduced in Section II. The first key applications, viz., universally attainable exponential error bounds for hypothesis testing and channel block-coding, will be treated in Sections III and IV, complete with proofs. The universally attainable error exponent for source block-coding arises as a special case of the hypothesis testing result. A basic result of large deviations theory is also included in Section III. Section V is devoted to the arbitrarily varying channel (AVC) capacity problem. Here proofs could not be given in full, but the key steps are reproduced in detail, showing how the results were actually obtained and how naturally the method of types suggested a good decoder. Other typical applications are reviewed in Section VI, including rate-distortion theory, source–channel error exponents, and multiuser problems. Although the method of types is tailored to discrete memoryless models, there exist extensions of the type concept suitable for certain models with memory. These will be discussed in Section VII.

The selection of problems and results treated in this paper has been inevitably subjective. To survey all applications of the method of types would have required a paper the size of a book. In particular, several important applications in Combinatorics are not covered; in this respect the reader should consult the paper of Körner and Orlitsky in this issue. While historical aspects were taken seriously, and a rather large list of references has been included, no attempt was made to give a detailed account of the history of the method of types. About its origins let us just make the following brief comments.

The ingredients have been around for a long time. In probability theory, they appear in the basic papers of Sanov [72] and Hoeffding [53] on large deviations, cf. Section III below. A similar counting approach had been used in statistical physics even earlier, dating back to Boltzmann [18]. A remarkable example is the paper of Schrödinger [73] that predates modern large deviations theory but remained unknown outside the physics community until recently. Information theorists have also used ideas now considered pertinent to the method of types. Fano's [44] approach to the DMC error exponent problem was based on "constant composition codes," and Berger's [13] extension of the rate-distortion theorem to sources with partially known and variable statistics relied upon his key lemma about covering a type class. Later, in the 1970's, several authors made substantial use of the concept now called joint type, including Blahut [16], Dobrushin and Stambler [40], and Goppa [48].

While the ideas of the method of types were already around in the 1970's, this author believes that his research group is fairly credited for developing them into a general method, indeed, into a basic tool of the information theory of discrete memoryless systems. The key coworkers were János Körner and Katalin Marton. A systematic development appears in the book of Csiszár and Körner [30]. Were that book written now, both authors would prefer to rely even more extensively on types, rather than typical sequences. Indeed, while "merging nearby types, i.e., the formalism of typical sequences has the advantage of shortening computations" [30, p. 38], that advantage is relatively minor in the discrete memoryless context. On the other hand, the less delicate "typical sequence" approach is more robust: it can be extended also to those models with memory or continuous alphabets for which the type idea apparently fails.

Manuscript received December 16, 1997; revised April 20, 1998. This work was supported by the Hungarian National Foundation for Scientific Research under Grant T016386. The author is with the Mathematical Institute of the Hungarian Academy of Sciences, H-1364 Budapest, P.O. Box 127, Hungary. Publisher Item Identifier S 0018-9448(98)05285-7.

II. TECHNICAL BACKGROUND

The technical background of the method of types is very simple. In the author's information theory classes, the lemmas below are part of the introductory material.

X and Y will denote finite sets, unless stated otherwise; the size of X is denoted by |X|. The set of all probability distributions (PD's) on X is denoted by P(X). For P, Q in P(X), H(P) denotes entropy and D(P‖Q) denotes (information) divergence, i.e.,

  H(P) = −Σ_{x∈X} P(x) log P(x),  D(P‖Q) = Σ_{x∈X} P(x) log [P(x)/Q(x)]

with the standard conventions that 0 log 0 = 0, 0 log (0/0) = 0, and t log (t/0) = +∞ if t > 0. Here and in the sequel the base of exp and of log is arbitrary but the same; the usual choices are 2 or e.

The type of a sequence x = (x_1, …, x_n) ∈ X^n and the joint type of x ∈ X^n and y = (y_1, …, y_n) ∈ Y^n are the PD's P_x ∈ P(X) and P_{x,y} ∈ P(X × Y) defined by letting P_x(a) and P_{x,y}(a, b) be the relative frequency of a among x_1, …, x_n and of (a, b) among (x_1, y_1), …, (x_n, y_n), respectively, for all a ∈ X, b ∈ Y. Joint types of several n-length sequences are defined similarly. The subset of P(X) consisting of the possible types of sequences x ∈ X^n is denoted by P_n(X).

Lemma II.1:

  |P_n(X)| ≤ (n + 1)^{|X|}.

Proof: Elementary combinatorics.

The probability that n independent drawings from a PD Q ∈ P(X) give x ∈ X^n is denoted by Q^n(x). Similarly, the probability of receiving y ∈ Y^n when x ∈ X^n is sent over a DMC with matrix W : X → Y is denoted by W^n(y|x). Clearly, if x has type P and (x, y) has joint type given by P_{x,y}(a, b) = P(a)V(b|a) for a stochastic matrix V : X → Y, then

  Q^n(x) = exp {−n[H(P) + D(P‖Q)]}  (II.1)

  W^n(y|x) = exp {−n[H(V|P) + D(V‖W|P)]}.  (II.2)

Here the conditional entropy H(V|P) and conditional divergence D(V‖W|P) are defined for any P ∈ P(X) and stochastic matrices V, W : X → Y by

  H(V|P) = Σ_{a∈X} P(a) H(V(·|a)),  D(V‖W|P) = Σ_{a∈X} P(a) D(V(·|a)‖W(·|a)).  (II.3)

We will write P ≪ Q, respectively V ≪ W (P), to denote that Q(a), respectively W(b|a), is positive for each a, respectively each (a, b), with P(a) > 0, respectively P(a)V(b|a) > 0. The divergences in (II.1) and (II.2) are finite iff P ≪ Q, respectively V ≪ W (P).

For P ∈ P_n(X), the type class {x ∈ X^n : P_x = P} will be denoted by T_P^n. Similarly, for x ∈ X^n and a stochastic matrix V : X → Y we write T_V^n(x) = {y ∈ Y^n : P_{x,y}(a, b) = P_x(a)V(b|a), a ∈ X, b ∈ Y}.

Lemma II.2: For any type P ∈ P_n(X)

  (n + 1)^{−|X|} exp {nH(P)} ≤ |T_P^n| ≤ exp {nH(P)}.  (II.4)