
Contents

Feature: Information Theory and its Applications

1 Guest Editorial; Geir E. Øien

A Historical Perspective on Information Theory
3 Information Theory: The Foundation of Modern Communications; Geir E. Øien
20 On Shannon and "Shannon's Formula"; Lars Lundheim
30 Statistical Communication Theory 1948 – 1949; Nic. Knudtzon

Novel Developments on Channel Capacity and
35 The True Channel Capacity of Pair Cables With Respect to Near End Crosstalk; Nils Holte
47 Bounds on the Average Spectral Efficiency of Adaptive Coded Modulation; Kjell J. Hole
53 Breaking the Barriers of Shannon's Capacity; An Overview of MIMO Wireless Systems; David Gesbert and Jabran Akhtar

Turbo Coding and Iterative Decoding: Theory and Applications
65 An Introduction to Turbo Codes and Iterative Decoding; Øyvind Ytrehus
78 Theory and Practice of Error Control Coding for Satellite and Fixed Systems; Pål Orten and Bjarne Risløw

Modulation, Coding and Beyond
92 A New Look at the Exact BER Evaluation of PAM, QUAM and PSK Constellations; Pavan K. Vitthaladevuni and Mohamed-Slim Alouini
106 Performance Analysis of Adaptive Coded Modulation with Antenna Diversity and Feedback Delay; Kjell J. Hole, Henrik Holm and Geir E. Øien
114 Shannon Mappings for Robust Communication; Tor A. Ramstad

Historical Papers
129 Information Theory; Nic. Knudtzon
139 Statistical Communication Theory; Nic. Knudtzon
145 Statistically Optimal Networks; Nic. Knudtzon

Special
153 Multiple Bottom Lines? Telenor's Operations in Bangladesh; Arvind Singhal, Peer J. Svenkerud and Einar Flydal

Status
163 Introduction; Per Hjalmar Lehne
164 UMTS Network Domain Security; Geir M. Køien

Telektronikk
Volume 98 No. 1 – 2002
ISSN 0085-7130

Editor: Ola Espvik; Tel: (+47) 913 14 507; email: [email protected]
Status section editor: Per Hjalmar Lehne; Tel: (+47) 916 94 909; email: [email protected]
Editorial assistant: Gunhild Luke; Tel: (+47) 415 14 125; email: [email protected]
Editorial office: Telenor Communication AS, Telenor R&D, NO-1331 Fornebu, Norway; Tel: (+47) 67 89 00 00; email: [email protected]
Editorial board: Berit Svendsen, CTO Telenor; Ole P. Håkonsen, Professor; Oddvar Hesjedal, Director; Bjørn Løken, Director
Graphic design: Design Consult AS, Oslo
Layout and illustrations: Gunhild Luke and Britt Kjus, Telenor R&D
Prepress and printing: Optimal as, Oslo
Circulation: 3,750

Guest Editorial

GEIR E. ØIEN

On February 24, 2001, Claude Elwood Shannon died at the age of 84. As the father of, and most important contributor to, the field of information theory, he ranks as one of the most brilliant and important of all 20th century scientists. In a tribute speech made by Senator John D. Rockefeller in the US Congress after Shannon's death, his work was referred to as "The Magna Carta of the information age". Had a Nobel prize been awarded within the information and communication sciences, no one would have been a more worthy candidate than Shannon.

Shannon's ideas, first presented to the world at large in his seminal 1948 paper "A Mathematical Theory of Communication" in Bell System Technical Journal, have been crucial in enabling the information and communication technological advances which have created today's information society. As with all true pioneers, Shannon's way of thinking about information and communication represented a true paradigm shift. For example, prior to the arrival of his seminal papers of the late 40s and 50s, there had simply been no satisfactory way of modelling and analyzing the process of information generation, transfer, and reception from a transmitter to a receiver over a noisy communication channel, which is in fact a generic description of how all practical communication systems work. With Shannon's introduction of a generic communication system model, his view of information as a probabilistic entity (sidestepping its actual semantic meaning), the insight that the process of information transmission is fundamentally stochastic in nature, and his invention of precise mathematical tools to give a complete performance analysis of his model, the door was suddenly opened to a much more fundamental understanding of the possibilities and limitations of communication systems.

In the words of another notable information theorist (and former colleague of Shannon), David Slepian: "Probably no single work in this century has more profoundly altered man's understanding of communication than C.E. Shannon's article, 'A mathematical theory of communication', first published in 1948. The ideas in Shannon's paper were soon picked up by communication engineers and mathematicians around the world. They were elaborated upon, extended, and complemented with new related ideas. The subject thrived and grew to become a well-rounded and exciting chapter in the annals of science."

In a world where predictions for the future performance of telecommunication systems sometimes seem to be made more out of marketing concerns than out of scientifically sound judgement, information theory still has a lot to teach us: insights that are sometimes sobering, but may also be encouraging. The optimistic vision suggested by some, that "any telecommunication service can be made available anywhere, anytime, and to anyone" in the future, can fairly easily be shown to have no roots in reality. One example may illustrate this: The claims originally made regarding the available rates and coverage for the upcoming Universal Mobile Telecommunications System (UMTS) so far seem a bit over-optimistic …

However, in some cases information theory can also be used to uncover a performance potential beyond what was previously thought possible, and aid in the design of systems realizing this potential. As an important example of the applicability of Shannon's results, his theory showed us how to design more efficient communication and storage systems by demonstrating the enormous gains achievable by coding, and by providing the intuition for the correct design of coding systems. The sophisticated coding schemes used in systems as diverse as deep-space communication systems and home compact disc audio systems owe their success to the insights provided by Shannon's theory.

Shannon published many more important and influential works in a variety of disciplines, including Boolean algebra and cryptography. His work has had an influence on such diverse fields as linguistics, phonetics, psychology, gambling theory, stock trading, artificial intelligence, and digital circuit design. It also has strong links to disciplines such as thermodynamics and biochemistry.

Shannon was also known for his playfulness and eclectic interests, which led to famous stunts such as juggling while riding a unicycle down the halls of Bell Labs. He designed and built chess-playing, maze-solving, juggling, and mind-reading machines. These activities bear witness to Shannon's claim that his motivation was always curiosity more than usefulness. In an age where basic research motivated purely by scientific curiosity sometimes seems to be on the losing end as far as public interest and funding are concerned, this is a statement worth remembering. The success of information theory clearly shows how one person's curiosity can translate into very useful results.

In Norway, Shannon's memory and the achievements of information theory were recently honored with a Claude E. Shannon In Memoriam Seminar, arranged by Telenor R&D at Kjeller on August 9, 2001. The seminar drew over 100 participants from Norwegian industry and academia. They came to listen to technical talks by some of Norway's foremost experts within the fields of information and communication theory, as well as to some historically and philosophically flavored talks. The seminar was highlighted by a unique reminiscence by former Telenor R&D director Nic. Knudtzon, the only Norwegian to have met Shannon at the pioneering time when information theory was actually born. This special issue of Telektronikk was inspired to a great extent by the success of this seminar, and you will find papers by many of the same scientists here, including Dr. Knudtzon's wonderful reminiscence, as well as three 1950 papers of his which rank as the very first presentations of information theory made in Norway.

When putting together this issue, I set myself one main goal and four sub-goals. The main goal was to assemble a high-quality collection of papers which could serve to dispel the often-heard view that "information theory is not constructive, but is only concerned with theoretical limits which can never be reached in practice". It is my firm view that information theory can be very constructive, in the sense that it has a lot to say about what one should do and what one should not do when designing communication systems.

In fact, these are particularly exciting times for information theory, because we now finally have available powerful techniques for actually approaching the performance limits predicted by Shannon. One important example is the class of error control codes called "turbo codes". Why do these codes work so well? Because their construction turns out to be based to a great extent upon the "nonconstructive" proof techniques used by Shannon in his analysis!

Regarding my four sub-goals: first, the issue should place information theory in a historical context, while at the same time introducing its basic principles and discussing their importance. The papers collected under the heading "A historic perspective on information theory" serve this purpose.

Secondly, I wanted to show that information theory is still very much an active research field of practical importance, and that the Norwegian research community is currently making some important contributions to this research. Thirdly, some of the most important applications of information theory should be highlighted. Finally, the most promising and exciting current developments in communication systems design should be included. I do believe that the final result reflects these goals successfully, and that the standard of the papers within is very high. I hope the reader will find them as inspiring, insightful, and useful as I do. For me, putting together this issue has truly been a labour of love!

Front cover: Information appears as a change in the detectable pattern. The artist Odd Andersen visualises a set of planes as areas for information to appear. Whatever predictable pattern may already exist on those planes is of no interest. Only when that pattern is changed has information occurred. When part of the pattern of one plane is moved through a transmission channel to another plane and changes its pattern, this is seen as information received by the new plane. The artist's generic message: Information has been produced and understood when the pattern of one plane has been changed and detected.

Ola Espvik, Editor in Chief

Information Theory: The Foundation of Modern Communications

GEIR E. ØIEN

This paper gives a brief tutorial introduction to information theory, the fundamental concepts and theorems of which are at the very heart of modern communications and information technology. The results of information theory tell us:

• how to quantify the information content in a set of data;
• how to model and analyze a wide range of communication channels and their capacity for transmitting information;
• conditions under which error-free representation and transmission of information is possible, and when it is strictly impossible;
• conditions for the design of good ways of (codes for) representing information so as to achieve data compaction and compression, and channel error robustness;
• which minimal quality reduction we may expect for a given transmission rate, a given information source, and a given channel;
• how we may split our communication systems into subsystems, in order to simplify design without loss of theoretical performance.

We will present the basic concepts and most famous theorems of information theory and explain their usefulness in the design and analysis of communications and information storage systems.

Geir E. Øien (36) received his MSc and PhD degrees from the Norwegian University of Science and Technology (NTNU) in 1989 and 1993, respectively. From 1994 until 1996 he was with Stavanger University College as associate professor. Since 1996 he has been with NTNU, since 2001 as full professor of information theory. Prof. Øien is a member of IEEE and the Norwegian Signal Processing Society. He is author/co-author of more than 40 research papers. His current research interests are in information theory, communication theory, and signal processing, with emphasis on analysis and design of wireless communication systems. [email protected]

1 Introduction

Modern information theory was basically invented by Claude E. Shannon in a series of classic papers [1], [2], [3], and has since been extended and refined by a large number of researchers from all over the world. To some, the field of information theory might seem overly concerned with mathematical and statistical rigor, and with an abundance of long, technically involved mathematical proofs. However, the advantages of this rigor are many. Starting out with mostly simple, but stringently defined, mathematical assumptions, typically corresponding to practical/physical constraints under which real-world communication systems are to be designed, one may derive fundamental limits on the performance of such systems, as well as make statements about the conditions under which a given performance can be attained. In many situations we are also able to obtain useful guidelines for how the practical design should (not to mention should not) be done in order to approach the theoretical performance bounds.

In the following, we shall define and explain some of the most basic concepts and results introduced by Shannon, and try to expand on how they affect the design of communication and information storage systems. Due to space limitations and to ease the reading of this paper, we shall not provide proofs for any of the results presented. Most of the notation and language is based on that of Blahut [4] and the more recent book by Cover and Thomas [5]. Another important text on information theory is Berger's classic on rate distortion theory [6].

2 Shannon's Generic Model of a Communication System

Central to the development of information theory is the notion of a generic communication system model which can be used as a unifying framework suitable for describing a wide range of real-world systems. In the generic communication system model proposed by Shannon, information is transmitted from an information source to a user by means of a transmitter, a communication channel, and a receiver. This is depicted in Figure 1.

[Figure 1: Generic communication system model. Source → Transmitter → Channel → Receiver → User, with "Noise and impairments" entering at the channel.]

Examples of possible information sources in this context are: human speakers, video cameras, musical instruments, microphones, loudspeakers, and computer keyboards.

The transmitter and receiver perform information coding, which means processing the messages generated by the information source in order to
• represent (encode) the messages in a suitable way during transmission over the channel, and
• regenerate (decode) the messages at the receiver end, with as little deviation from what was originally transmitted as is required for the service under discussion.

Note that the term "transmission" here is intended to cover transmission both in space (between two different locations) and in time (i.e. storage of data on an imperfect medium). Examples of physical communication channels thus range from wireless channels such as satellite links, mobile radio channels, and broadcast channels, via wired links such as optical fibres and copper transmission lines, to magnetic storage media and CDs.

The impairments to the transmitted messages may differ a lot depending on the type of channel, but may include
• thermal noise and atmospheric noise (additive noise);
• interference from other sources and system users;
• reflections and scattering of transmitted radio wave power during propagation through the terrain;
• signal attenuation due to path loss in radio channels or transmission line resistance;
• intersymbol interference due to a lack of available bandwidth;
• Doppler shifts due to relative movements between receiver and transmitter;
• nonlinear effects, e.g. due to nonlinear power amplifier characteristics;
• stains or scratches on a CD-ROM.

Whatever the channel, the aim of information theory is to model these impairments in a quantitative way, mainly by statistical models. The modelling is then used to deduce performance limits, and to devise methods for efficient transmission over the channel, that is, coding algorithms.

The ultimate goal of the coding is to exploit the communication channel as well as possible. This is to say that we want to spend as little as possible of the limited physical resources we have available (time, bandwidth, transmit power, or disc space) on the transmission or storage of information, in order to maximize the number of users, systems, or services that are able to share these resources. At the same time we need to ensure that the quality of the information retrieved at the receiver end is satisfactory.

By "quality" we usually mean something like "degree of similarity to the transmitted message". The appropriate measure of, and typical demands on, quality depend on the type of information transmitted and on the application or service: For data communications, the criterion might be that the average information bit error probability (bit error rate, BER) should be less than, say, $10^{-9}$, whereas for speech and video communications, the aural or visual quality perceived by human ears or eyes is the most important thing, and a much higher BER can usually be accepted.

The accepted perceptual quality range also varies with the application, and is different e.g. for mobile telephony (a low-to-medium quality application) and high fidelity audio (a very high quality application). Real-time applications such as two-way speech communication also place constraints on average delay, buffering, probability of no transmission ("outage"), etc.

3 Information – What is it?

One of the most basic questions answered by information theory is "What is information, in a quantitative sense?" When discussing this issue it will be useful to distinguish between two different perspectives:
• How to quantify the information content in the data produced by a source (as discussed in Subsection 3.2).
• How to quantify the information content transmitted from a source to a user by means of a channel (as discussed in Section 5).

3.1 Information as a Measure of Unpredictability

Regardless of which of the two above perspectives is used, an intuitively appealing line of thinking about information content is as follows: An event has a low information content if its "future" can be forecast with a high degree of accuracy, based on knowledge of its "past". In this case an observer's uncertainty about the future of the event is low, which means he will not receive much new information by continued observation of the event.

Thus, an event's information content is intimately linked to the a priori degree of randomness and uncertainty associated with the event: The more predictable an event is, the less knowledge we need to describe or predict it; thus the less information we receive by observing it.

Conversely, if an event is highly unpredictable, a detailed observation of its actual development is needed in order to describe it, which means its information content is high. This holds regardless of the "physical" content of the event in question, sometimes referred to as the semantic meaning of the information.

The above can be restated as saying that the actual semantic meaning in a message is of no consequence for the amount of information carried by the message. All that matters is the degree of predictability, i.e. its statistical properties. An example of this is found in language modelling, applied e.g. in speech recognition, where an observer will need knowledge of the structure (usually in the sense of Markov models described by transition probabilities) and vocabulary of a language if he is to attempt predicting future words in a sentence, based on what has already been said.

[Figure 2: Example of time-discrete source output (sample index 0 to 100 on the horizontal axis, amplitude roughly between −4 and 5 on the vertical axis).]

Note, however, that the predictability of an event is usually strongly linked to our knowledge of physical or statistical properties and models of the event in question. The same event may appear unpredictable to one observer and highly predictable to another, the difference being that only the latter observer has a priori knowledge of the underlying model describing the event.

[Figure 3: Examples of continuous-time source outputs which are respectively amplitude-continuous (upper curve) and amplitude-discrete (lower curve).]

Telektronikk 1.2002 5 In a communication context, at least from the the probability of the source output xj is known second of the two above perspectives, the (e.g. through histogram estimation based on “events” under study are usually information training data) to be pj. messages passed from a source to a user. The degree of uncertainty on the receiver side, and 3.2.1 Source the a priori predictability of the messages, will Let us now introduce a notion which will turn then depend on the statistical properties of the out to be intimately linked to source information information source under study, the impairments content: The entropy of a source S as described introduced by the physical communication chan- above is defined as nel, and the way the transmitter and receiver are designed. J H(S)=− pj log2 pj, (1) 3.2 The Information Content j=1 in a Source and is measured in information bits per source Armed with these basic insights, we now address symbol. Note that the base of the in the following question: “What is the information the general definition of entropy is arbitrary, but content in a given source that outputs messages for the entropy to be measured in units of bits which we are interested in storing, compressing, per sample, the binary logarithm must be used. or transmitting?” An illustration of the entropy for the special case Initially, to make the explanations and notation of a binary source (J = 2) is shown as a function simple while still illuminating the general the- of symbol probability p (the two symbols in this ory, we will consider an information source that case have probabilities p and 1 – p, since their has been digitized, such that its output is discrete probabilities must sum to 1) in Figure 4. Note both in time (sampled) and amplitude (quan- that the entropy reaches its maximum – of 1 tized). 
Figure 2 shows outputs from a discrete- information bit per symbol – when the two pos- time source, while Figure 3 shows examples of sible outcomes are equiprobable, i.e. p = 0.5. continuous-time source outputs that are ampli- tude continuous and discrete, respectively. When 3.2.2 Entropy as a Measure of both time and amplitude have been discretized Information Content we say that the source output consists of dis- A commonly used information theoretic state- crete-valued samples, and the source is referred ment is that “the source entropy (as defined to as discrete. above) is a measure of the average amount of information per symbol produced by the We denote the set of J possible sample values source.” What does this mean? Shannon’s first (also called representation levels, or more gener- coding theorem (the noiseless source coding Figure 4 The entropy of ally source symbols) by S = {x1, ..., xJ}. This set theorem) addresses this question in a precise a binary source is called the source alphabet. We assume that way [5].

In the simplest version of this theorem, it is assumed that the source in question is memory- 1 less, i.e. that the different source symbols output by the source are statistically independent of 0.9 each other. Also, for simplicity, we shall assume that we want to represent each source symbol 0.8 by a distinct binary word, for storage or trans- mission on a channel where only binary symbols 0.7 are admitted (such as is the case e.g. on a CD).

0.6 We denote by lj the number of bits in a code- 0.5 word which will be used to represent the source symbol xj. The set of codewords for all j make 0.4 up the source code. Reasonable demands on this code are that it should be able to encode any 0.3 symbol string output by the source without

Entropy (infomation bits/symbol) errors, and that it should be uniquely decodable, 0.2 i.e. an arbitrary encoded binary string should be 0.1 possible to decode without errors into a unique sequence of source symbols. 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 In this situation H(S) turns out to be the lower Probability p limit for the average number of bits per symbol

6 Telektronikk 1.2002 ¯ (average codeword length), l , we may use to 4.1 Huffman Coding represent the source output without errors or As stated previously, there exists a constructive ambiguities: algorithm which may be used to build an optimal J code according to the above principles. The re- ¯ l = pjlj ≥ H(S). (2) sulting code is the well-known Huffman code [7]. j=1 In other words, it is not possible to construct a The Huffman algorithm is best illustrated by code with a lower average codeword length than means of an example: Consider a source with 5 H(S) bits per codeword. Conversely, it is theo- different possible symbol outcomes x1, …, x5, retically possible to construct codes whose aver- with corresponding probabilities given by age length comes arbitrarily close to this limit (from above). This is the essence of the noiseless p1 = 0.30 coding theorem. p2 = 0.24 p3 = 0.20 Thus the notion of entropy as a measure of the p4 = 0.15 source information content is rather obvious if p5 = 0.11 (3) viewed from a data storage viewpoint. Note that the entropy is maximal (log2 J bits/symbol) if all The Huffman code construction for this case is source symbols are equiprobable, i.e. maximally illustrated in Figure 6. As seen from the figure, unpredictable. It is minimal (0 bits/symbol) for a the symbol probabilities are initially ordered deterministic source1), whose future output “tells from the largest to the smallest. The two least us nothing new” and is thus totally redundant. probable symbols (to begin with, symbols 4 and 5 in this case) are assigned a distinct binary code 4 Source Code Construction symbol each – 0 or 1 – to be able to distinguish Of course, one thing is to know the attainable between them. Then their probabilities are limit of code efficiency, quite another is to actu- added, to form the probability of a “quasi-sym- ally find a code approaching this limit. 
Happily, bol” whose outcome is defined as “one of the information theory also provides necessary and two outcomes whose probabilities have just been sufficient conditions for how to construct the added”. The code symbols just assigned can be codewords of an optimal code, which actually used to distinguish between the two possible approaches the above lower codeword length quasisymbol outcomes. limit as closely as possible [4]. Then the remaining probabilities are re-ordered, We will here still consider only the (practically with the quasi-symbol probability being inserted quite interesting) case of a binary, variable- at its proper place according to its size. The length, and prefix-free code – i.e. a code whose above procedure is thereafter repeated for the codewords are binary strings, where different remaining symbols/quasi-symbols. Repeating codewords might have different lengths, and in the procedure several times, in the end there which no codeword is a prefix of another. The practical interest of this last property is to make the code instantaneously decodable, i.e. each codeword can be immediately decoded without any need for “look-ahead” to other codewords.

A necessary condition for a variable-length pre- fix-free code to be uniquely decodable is that all of its codewords can be put on a code tree. An example of such a tree is shown, for the example 0 1 of a source alphabet with four symbols, in Fig- ure 5. The way to obtain a codeword from the tree is to start at the top node and write down the binary-valued labels as the tree is traversed down to one of the end nodes. Each resulting 0 1 sequence of labels then constitutes one code- word – one for each end node. It is easily seen that the set of codewords thus constructed has Figure 5 Code tree for the the prefix-free property. binary prefix-free code 0 1 {0, 10, 110, 111}

1) This is the case e.g. if p1 = 1 and p2 = … = pJ = 0.

Telektronikk 1.2002 7 0.30 0.30 0.44 0.56 If the source is assumed memoryless, each vec- 00 tor is associated with a probability equal to the 0 product of the probabilities of its individual ele- ments. A Huffman code may now be constructed according to this new, extended set of probabili- 0.24 0.26 0.30 0.44 ties. The larger n is used, the closer to the 10 entropy we are able to come; in fact it is easy to 0 1 show that the following bounds on the average codeword length hold for a Huffman code when X2 0.20 0.24 0.26 n-fold source extension is used [5]: 11 1 H(S) ≤ l ≤ H(S) + [bits per symbol] (5) 0 1 n

will be only two symbols/quasi-symbols left (such as is the case to the right in Figure 6), of which one is assigned a 1 and the other a 0. The codewords which are used to distinguish between all original symbol outcomes can now finally be deduced by tracing the paths from the last two symbols back to each original symbol, while at the same time noting the binary labels given at each forward repetition of the procedure. In this example such backtracing gives the codes

C2 = {0, 1} for the two-symbol quasi-source,
C3 = {1, 00, 01} for the three-symbol quasi-source,
C4 = {00, 01, 10, 11} for the four-symbol quasi-source,
C5 = {00, 10, 11, 010, 011} for the original five-symbol source.     (4)

C5 is the Huffman code for the source.

Figure 6 Construction of Huffman code for a 5-symbol source. Two arrows pointing to a single node denote the adding of two probabilities to form a "quasi-symbol". The backtracing to find the codeword 010 for the symbol X4 is shown as dashed arrows, with corresponding binary labels marked by circles.

4.2 Source Extension
In general, to obtain a code with an average codeword length coming arbitrarily close to the entropy, one must combine Huffman coding with source extension, a process in which vectors of source symbols are treated as single symbols from an "extended" source. Using vectors of length n, the extended source has $J^n$ vector symbols. As an example, a third-order (3-dimensional) extension of a binary source with alphabet {0, 1} will yield a source with binary vector outcomes in the alphabet {000, 001, 010, 011, 100, 101, 110, 111}.

The above result is dependent on the fact that we know the symbol probabilities. Thus a limitation of the Huffman code is that the source's probability distribution has to be known or estimated a priori in order for a code to be constructed. If this distribution is estimated incorrectly, the code will be less efficient: the average codeword length will be increased by an amount which can also be quantified exactly using information theoretic notions [5].

However, there also exist source code constructions which are able to asymptotically (as the number of symbols to be coded goes to infinity) reach the entropy bound even without any a priori knowledge of the source's statistical properties. Such codes are called universal codes. The Lempel-Ziv algorithm [5] for compaction of large data files is by far the most used universal code. Note that universal codes do not necessarily provide compression for short symbol sequences.

4.3 Sources with Memory
Most natural information sources are not memoryless; rather, the samples are correlated with each other. Indeed, this is what makes it possible for us as observers to make sense of the source output. A good example is again that of human languages: It is precisely the existence of well-defined grammatical rules and structure, which lead to interword and inter-sentence correlation (i.e. memory, or statistical dependencies), that makes it possible to convey and understand information by means of a written text or a spoken message. How does the above theory apply to sources with memory?

4.3.1
The answer can loosely be said to be as follows: By applying source extension with sufficiently large vector lengths, any random source is essentially made into a memoryless extended source. There may be correlation or statistical dependency between individual elements within each vector, but not between two different vectors (or "extended symbols") if these are long enough. To make this argument rigorous in the
general case, one needs to let the vector length n go to infinity, to account for correlation over arbitrary lags. The minimum average number of bits per original source symbol is then equal to the entropy rate of the source,

$$\bar{H}(S) = \lim_{n \to \infty} \frac{1}{n} H\big(S^{(n)}\big), \qquad (6)$$

where $H(S^{(n)})$ is the entropy of the source S extended to n-dimensional "supersymbols". The entropy rate thus replaces the first order entropy as the natural measure of information content in a source with memory.

Figure 7 State diagram of a 1st order, 3-state Markov source (states a1, a2, a3; arrows labelled with the transition probabilities $P_{a_i|a_j}$)
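The Huffman construction of Figure 6 and the effect of source extension can be illustrated with a short Python sketch. It is not taken from the article: the helper names (`huffman_lengths`, `extension_rate`) are our own, and the binary source with P(1) = 0.1 is a hypothetical example. The sketch repeatedly merges the two least probable (quasi-)symbols, as in Figure 6. It tracks only the codeword lengths, because those are all that is needed to compute the average codeword length.

```python
import heapq
import itertools
import math


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities.

    The two least probable (quasi-)symbols are merged repeatedly, as in
    Figure 6; each merge adds one bit to every original symbol inside it.
    """
    if len(probs) == 1:
        return [1]  # a lone symbol still needs one bit per outcome
    lengths = [0] * len(probs)
    # heap entries: (probability, unique tie-breaker, original symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next_id, group1 + group2))
        next_id += 1
    return lengths


def entropy(probs):
    """First order entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def extension_rate(p_one, n):
    """Average Huffman codeword length per original symbol when coding
    the n-th extension of a memoryless binary source with P(1) = p_one."""
    vectors = itertools.product([0, 1], repeat=n)  # the 2**n vector symbols
    probs = [math.prod(p_one if bit else 1 - p_one for bit in v) for v in vectors]
    lengths = huffman_lengths(probs)
    return sum(p * l for p, l in zip(probs, lengths)) / n


if __name__ == "__main__":
    p_one = 0.1  # hypothetical source statistics, not from the article
    print(f"entropy: {entropy([1 - p_one, p_one]):.4f} bits/symbol")
    for n in range(1, 5):
        print(f"n = {n}: {extension_rate(p_one, n):.4f} bits/symbol")
```

For this hypothetical source the entropy is about 0.469 bits/symbol. Plain Huffman coding (n = 1) needs 1 bit/symbol, while the second extension already gets down to 0.645 bits/symbol. A Huffman code for the extended source has average length below $H(S^{(n)}) + 1$, and for a memoryless source $H(S^{(n)}) = nH(S)$. The rate per original symbol therefore stays below the entropy plus 1/n, which is why the printed values approach the entropy as n grows.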

4.3.2 Markov Sources
One common way of modelling memory in sources is to use Markov models. This is an accepted model e.g. for natural speech production. A (discrete-time, discrete-amplitude) Markov model of order M has the property that its memory stretches back only M time instances. More precisely, the probability of a symbol outcome at time m is dependent on the outcomes at times m – 1, …, m – M, but not further back in time. Each possible set of the M previous outcomes constitutes a model state. The number of possible states is less than or equal to $J^M$, where J is the number of possible outcomes in the source alphabet. Statistically, the model is completely characterized by transition probabilities which together quantify the probability of each state being followed by each of the other states.

A Markov model is usually visualized by means of a state diagram. A state diagram of a simple 1st order Markov model with J = 3 source symbols {a1, a2, a3} is depicted in Figure 7. Here, $P_{a_i|a_j}$ is by definition the probability of the transition from $a_j$ to $a_i$. If there is no arrow between two of the states, that particular transition is impossible to make.

For Markov sources, there is a simple expression for the entropy rate. It is simply the expected value, with respect to the state distribution, of the entropy conditioned on being in a given state. Denoting the nth state by $s_n$ we obtain:

$$\bar{H}(S) = \sum_{n=1}^{J^M} P(s_n)\, H(X \mid s_n), \qquad (7)$$

where $P(s_n) = \sum_{l=1}^{J^M} P(s_l)\, P(s_n \mid s_l)$, with $\sum_{l=1}^{J^M} P(s_l) = 1$. This set of equations can be used to solve for the state probabilities $P(s_l)$, $l = 1, \ldots, J^M$, when the transition probabilities $P(s_n \mid s_l)$ are given. The conditional entropy $H(X \mid s_n)$ is given by the usual entropy formula, only with state-conditional symbol probabilities replacing unconditional symbol probabilities:

$$H(X \mid s_n) = -\sum_{j=1}^{J} P(x_j \mid s_n)\, \log_2 P(x_j \mid s_n). \qquad (8)$$

Equation (7) can now be used to find the ultimate limit of error-free compressibility for a Markov source, and a set of state-conditional Huffman codes can be used to provide efficient source compaction. That is to say, for each state we can construct an optimal Huffman code as before, but according to the conditional symbol probabilities corresponding to the state. The encoder will then switch between the different Huffman codes according to the state of the source.

4.4 Continuous-Amplitude Sources
Most "natural" sources not only have memory, but are analogue in nature, and hence produce data that are continuous both in time (or space) and amplitude. Examples are acoustic and electromagnetic waves as encountered e.g. in audio technology and in radio communications. Sampling and quantization operations convert such sources into time- and amplitude-discrete ones, as discussed in previous sections. Of these operations, sampling can preserve all information as long as the source is band limited and the sampling theorem criterion, i.e. sampling at a rate of at least twice the bandwidth of the source, is fulfilled.

Quantization, however, always reduces information, as measured through the source entropy. In fact, while the entropy of a discrete source is always finite, every nontrivial continuous-amplitude source has infinite entropy. This merely reflects the fact that with a continuous probability density, every fixed amplitude value has probability zero, so it is impossible to predict (or even measure) with infinite resolution which values forthcoming samples will have.

The way to see that this is true mathematically is to model the continuous amplitude distribution as the result of a uniform quantization, with a quantization interval ∆ that goes to zero. This is illustrated for a Gaussian source in Figure 8.

Figure 8 A Gaussian pdf with expected value 0 and variance 1, with P(0.5 ≤ x ≤ 1) shown coloured, together with the approximation (quantization) $f_X(0.75) \cdot 0.5$ (i.e. ∆ = 0.5)

The entropy may then be approximated by a sum over the quantized probability distribution. This sum converges to an integral over a source probability density function (pdf) $f_X(x)$ as ∆ goes to zero. The expression for the amplitude-continuous source entropy then becomes

$$H(S) = -\int_{-\infty}^{\infty} f_X(x) \log_2 f_X(x)\, dx \;-\; \lim_{\Delta x \to 0} \log_2 \Delta x. \qquad (9)$$

The last term is an infinitely large positive constant, which is common to all analogue sources. This implies that the absolute entropy of an analogue source will be infinite.

The first integral term in the above equation, however, is finite and well defined, and merits its own name and notation: It is the differential entropy, h(S), of the source. The main reason for the importance of this term, which in itself has no fundamental physical interpretation2), is the fact that several important information theoretic concepts, such as channel capacity, are defined in terms of differences between entropies. In the continuous case, the difference between two (infinitely large) source entropies is always equal to the difference between the corresponding (finite) differential entropies, whose values can be computed. This is because the term $-\lim_{\Delta x \to 0} \log_2 \Delta x$ is the same for all continuous sources, and thus cancels out whenever a difference is computed.

2) In some cases, though, it can be seen as a measure of the "relative randomness" of a given source compared to another: For example, of two different Gaussian sources, the one with the larger variance (and hence the larger amplitude variation from sample to sample) will have the larger differential entropy. However, it is not possible to successfully generalize this interpretation to two sources whose probability densities have different functional forms.

5 Information Transmission over a Channel
We now consider the second of the two perspectives previously introduced regarding information: That of quantifying the amount of information transferred to a user from a source by means of a channel. In doing so, we need to introduce another information theoretic notion, that of mutual information. This is a function which can be said to measure the amount of useful source information received by a user. Here, it is important to emphasize the usability of the information received, as the total "information content" received can stem from many sources, including noise and interference. This type of "information" is not only superfluous; as we shall see, it actually reduces the useful information.

Mutual information is one of the most fundamental information theoretic concepts. We will show later that it can be used to describe not only the capacity properties of communication channels, but also the compression possibilities for a given source when a certain amount of error (distortion) is accepted in the source representation.

5.1 The Mutual Information Function
Initially consider two given discrete, possibly statistically dependent random variables, $X \in \{x_1, \ldots, x_J\}$ and $Y \in \{y_1, \ldots, y_K\}$. We can think of X as the input to, and Y as the output from a communication channel; i.e. Y is a "noisy" version of X. X and Y have probability distributions $\mathbf{p} = [p_1, \ldots, p_J]^T$ and $\mathbf{q} = [q_1, \ldots, q_K]^T$ respectively. The conditional probability distribution of $x_j$, given $y_k$, is denoted $P_{j|k}$. $P_{j|k}$ will then describe the probability of $x_j$ being transmitted if $y_k$ is observed at the receiver.

The mutual information between X and Y is now defined as

$$I(X;Y) = H(X) - H(X \mid Y), \qquad (10)$$

where H(X) is the entropy of the source that outputs X, and, by definition,

$$H(X \mid Y) = \sum_{k=1}^{K} q_k\, H(X \mid y_k) = -\sum_{j=1}^{J} \sum_{k=1}^{K} q_k P_{j|k} \log_2 P_{j|k}. \qquad (11)$$

H(X|Y) should be read "the entropy of X when Y is observed". It is a conditional entropy, which means that it is based on the a priori knowledge of some information, in this case Y. It is simply a measure of the average information content (uncertainty) which is left in X when Y is known.

Then, the mutual information I(X;Y) can be thought of as the information Y gives about X: the average reduction in the observer's uncertainty about X which is brought about by observing Y. If X is the input to, and Y is the output from, a given communication channel, H(X|Y) can then be thought of as information "lost" by the channel.

For an ideal (i.e. noiseless) channel Y would of course be equal to X, in which case it is easy to show from the definition that H(X|Y) is zero. Hence I(X;Y) = H(X), as is intuitively correct in this case: All transmitted information has been received; X has been fully determined by observing Y.

Correspondingly, if X and Y are independent variables, Y carries no information about X, and knowing Y cannot reduce our uncertainty about X. In a communication context this would correspond to the received signal being completely buried in noise. Hence it is intuitive that H(X|Y) = H(X): all information is lost during transmission. Thus the intuitive result would be I(X;Y) = 0 in this case. This also turns out to be the case if the above formulae are applied. For most practical channels, 0 < I(X;Y) < H(X).

It is interesting to note that the mutual information is a symmetric function, i.e. I(X;Y) = I(Y;X) = H(Y) – H(Y|X). In some cases, like when computing the channel capacity of a channel with input X and output Y, this might actually be a more useful form of the function. Physically, it means that knowing what was sent reduces the uncertainty about what will be received by exactly as much as observing what was received reduces the uncertainty about what was sent.

The above theory can be beautifully and simply illustrated by means of a Venn diagram, as shown in Figure 9. The joint entropy H(X,Y) referred to in this figure is the total average uncertainty an observer will encounter regarding the simultaneous outcomes of X and Y.

Figure 9 Visualization of the relationship between entropies, conditional entropies, mutual information and joint entropy for two variables X and Y (regions labelled H(X), H(Y), H(X|Y), H(Y|X), I(X;Y) and H(X,Y))

5.2 Mutual Information in the Continuous Case
Mutual information is an example of a concept defined in terms of a difference between two entropies. If the variables X and Y are continuous random variables (with infinite absolute entropies), this difference reduces to a difference between differential entropies. The formula becomes

$$I(X;Y) = h(X) - h(X \mid Y) = -\iint f_{XY}(x,y)\, \log_2 \frac{f_X(x)}{f_{X|Y}(x \mid y)}\, dx\, dy,$$

where $f_{XY}(x, y)$ is the joint pdf of X and Y. Both integrals are taken from –∞ to ∞. Through the use of Bayes' rule [4] we may manipulate this
formula in various ways to incorporate the probability densities that are most practical to use in a given situation. We shall now see how the mutual information function can be used to find the capacity of a general communication channel.

6 Channel Capacity and Coding
As discussed in the introduction, most of the physical channels we transmit information over are subject to noise and distortion of various kinds, resulting in errors in the received waveforms/channel symbols. For the sake of simplicity we shall here be content to study memoryless channels, which transmit discrete- or continuous-valued symbols in discrete time intervals3). Memorylessness of a channel means that the added noise at a given time instance does not influence the channel output in any other time instances.

3) This is a valid description also for a channel transmitting continuous-time waveforms, if it is perfectly bandlimited and the Nyquist criterion is fulfilled.

6.1 Channel Coding
Channel coding means applying a certain mapping (the channel encoder C) from the source alphabet S to a channel codeword alphabet C, and a mapping from C back to S (the channel decoder D). In a communication system the mappings are applied as depicted in Figure 10.

Figure 10 Channel encoding and decoding as mappings (information message → channel encoder C → sequence of channel symbols → channel with noise and distortion → sequence of noisy channel symbols → channel decoder D → decoded information message)

The two mappings should be designed to ensure that transmission errors in the symbol sequences produced by the channel encoder do not necessarily result in errors in the decoded source symbol sequences (information messages) produced by the channel decoder. Examples of classical channel codes are algebraic forward-error-correcting codes such as BCH codes, and convolutional codes [8]. More recently the field has been revolutionized by the advent of iterative decoding, especially as applied to turbo codes and low-density parity check codes. We refer to the paper [9] by Øyvind Ytrehus in this issue of Telektronikk, and the references therein, for a tutorial introduction to these fields.

An important question now is:

Under what circumstances, if any, is it theoretically possible to design a channel coding system (encoder + decoder) such that the overall transmission of information from source to user becomes as reliable as we desire?

The answer lies in the capacity of the channel, and in Shannon's second coding theorem, also known as the channel coding theorem. In order to introduce this result, we need some notation, and a statistical model for the channels under study.

Figure 11 Input-output transition diagram for a discrete memoryless channel (J = 2, K = 3): inputs a1, a2; outputs b1, b2, b3; each branch labelled with its transition probability $P_{b_k|a_j}$

6.2 Channel Modelling
A discrete memoryless channel may be described by its transition probability distribution, i.e. a description of how probable it is that the various input symbols to the channel emerge from the channel as each of the possible output symbols. Consider a discrete channel having an input alphabet $A = \{a_0, \ldots, a_{J-1}\}$ and an output alphabet $B = \{b_0, \ldots, b_{K-1}\}$. A practical example is a channel where BPSK modulation symbols are transmitted (J = 2), while the channel adds continuously distributed noise, and the receiver quantizes the noisy (and thus amplitude-continuous) received symbols to, say, K = 8 levels. Such quantization must be done in order to facilitate the use of digital processing during subsequent decoding.

Such a channel may be described by a probability transition matrix P whose (k,j)-element is $P_{k|j}$, the probability that the channel output is $b_k$ when the input was $a_j$. It is also common to visualize such a channel by an input-output transition diagram as exemplified in Figure 11.

For additive-noise channels transmitting continuous-amplitude symbols we may write the channel output as Y = X + Z, where X is the channel input and Z is the additive noise. The probability density function of the noise, $f_Z(z)$, then describes the channel.

6.3 The Channel Capacity of a Memoryless Channel
For a given memoryless channel, we now introduce the notion of channel capacity at cost S. The most obvious "cost" in a communication system is perhaps an upper limit on the average symbol power available for transmission over the channel. Another possible cost is that of bandwidth. For simplicity, we consider the case where the bandwidth is given, and thus is not a subject for optimization. The capacity at cost
S is still defined in terms of a maximum of the mutual information I(P;Q) between channel input and output4), as

$$C(S) = \max_{P \in \mathcal{P}_S} I(P;Q) \quad \text{bits per channel use}, \qquad (12)$$

where $\mathcal{P}_S$ is the set of all possible channel symbol distributions (discrete or continuous depending on the channel) such that the average cost per channel use, E[s], is less than or equal to the constant S. P is our notation for an arbitrary channel symbol distribution, while Q denotes the channel's statistical model, here assumed given. Note that the probability distribution P which actually achieves the maximum is the distribution of symbols that the channel encoder must approximate if we are to achieve performance close to the capacity limit in practice.

4) The mutual information is really a function of the input distribution and the channel transition probability properties, not of the random input and output variables themselves; therefore we often write it as I(P;Q) instead of I(X;Y).

We shall now state the channel coding theorem, which shows the physical significance of the capacity C(S):

The channel coding theorem:
Let C(S) be the capacity of a memoryless channel at cost S. For any R < C(S), and for any desired reliability of transmission over the channel (as measured through probability of channel decoding error) there exists a channel code of rate R (information bits per channel symbol) such that the desired reliability may be obtained. For rates R > C(S) no such codes exist, and there will always be a nonzero probability of decoding errors.

What was completely novel about this result, compared to the usual line of thinking before Shannon's papers, was that it shows that the achievable transmission rate is not a function of the desired degree of communication reliability. Either communication can be made reliable, or it cannot. In the case that it can, it can be made as reliable as we want: Methods for achieving any desired reliability have been proven to exist for transmission rates all the way up to the (constant) channel capacity, at the cost of increased system complexity. Conversely, it has been proven that there are no transmission schemes to be found which can guarantee any degree of reliability if one attempts to transmit above capacity.

The channel coding theorem holds for both discrete and continuous channels. It is, perhaps, chiefly an existence theorem; in other words, it does not exactly tell us how to construct our channel coding systems, beyond imposing necessary constraints on the channel symbol distribution. However, it has indeed been demonstrated that when one seeks to meet these constraints, by so-called shaping [10], the system performance does improve considerably over that of the more common systems where simple uniform symbol distributions are used regardless of the channel. Also, it is worth mentioning that the high-performance turbo codes are, perhaps almost by accident but nonetheless, constructed in a way which owes a great deal to techniques used by Shannon when proving the channel coding theorem. It may seem that information theory can be quite constructive after all, if only designers achieve enough insight into its results.

6.3.1 The Delay Problem
A limiting factor of the channel coding theorem, and indeed of many information theoretic results, is that if rates close to the capacity are desired, we may have to resort to coding extremely long blocks of symbols at a time in order to obtain the desired reliability. This will yield a very complex system with long coding delays. Much work has been done on so-called delay-constrained communication systems, and the corresponding capacity limits. The paper by Pål Orten and Bjarne Risløw [11] in this issue of Telektronikk contains a discussion of how additional delay constraints limit the achievable capacity.

6.4 The Capacity of a Binary Symmetric Channel
The simplest example of a communication channel one can think of is the binary symmetric channel. This channel, which models well e.g. the storage on a CD, or an optical fibre with binary modulation, takes binary symbols (denoted 0 and 1, for simplicity) as input and transmits them with a symmetric probability of error, denoted p. That is to say, the probability of a transmitted 0 being received as a 1 is p, as is the probability of a 1 being received as a 0. Note that p can be limited to [0, 0.5] since, if p were higher than 0.5, the bit error rate could be improved by inverting all the received bits. The probability transition diagram for this channel is depicted in Figure 12.

Figure 12 A binary symmetric channel (0 → 0 and 1 → 1 with probability 1 – p; 0 → 1 and 1 → 0 with probability p)
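Equations (10) to (12) can be evaluated directly for a discrete memoryless channel. The Python sketch below (function names such as `mutual_information` and `binary_input_capacity` are our own) computes I(P;Q) from an input distribution and a transition matrix laid out as in Section 6.2, with element (k, j) equal to $P_{k|j}$, using the symmetric form H(Y) – H(Y|X). It then approximates the maximization in Equation (12) for a binary-input channel by a plain grid search over the input distribution. No cost constraint is imposed, which is a simplifying assumption, and the grid search only stands in for proper optimization methods.

```python
import numpy as np


def entropy_bits(v):
    """Entropy in bits of a probability vector (zero entries are skipped)."""
    v = np.asarray(v, dtype=float)
    v = v[v > 0]
    return float(-np.sum(v * np.log2(v)))


def mutual_information(p_in, P):
    """I(P;Q) in bits for a discrete memoryless channel.

    p_in : input distribution over a_0, ..., a_{J-1}, shape (J,)
    P    : transition matrix with P[k, j] = P(b_k | a_j), shape (K, J),
           so that every column sums to one (the layout of Section 6.2).
    Uses the symmetric form I = H(Y) - H(Y|X).
    """
    p_in = np.asarray(p_in, dtype=float)
    P = np.asarray(P, dtype=float)
    q_out = P @ p_in  # output distribution: q_k = sum_j P(b_k|a_j) p_j
    h_y_given_x = sum(p_in[j] * entropy_bits(P[:, j]) for j in range(len(p_in)))
    return entropy_bits(q_out) - h_y_given_x


def binary_input_capacity(P, grid=10001):
    """Approximate max over P of I(P;Q) for a channel with J = 2 inputs by a
    grid search over P(a_0); no cost constraint is imposed."""
    best_i, best_p0 = -1.0, None
    for p0 in np.linspace(0.0, 1.0, grid):
        i = mutual_information([p0, 1.0 - p0], P)
        if i > best_i:
            best_i, best_p0 = i, float(p0)
    return best_i, best_p0


def bsc(p):
    """Transition matrix of a binary symmetric channel with error probability p."""
    return np.array([[1.0 - p, p],
                     [p, 1.0 - p]])


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5):
        c, p0 = binary_input_capacity(bsc(p))
        print(f"p = {p}: C = {c:.4f} bits/channel use at P(a_0) = {p0:.3f}")
```

For the binary symmetric channel the maximizing input distribution comes out at P = {0.5, 0.5}, and the maximum agrees with the closed form 1 – H(p) given for this channel. The same functions accept a 3 × 2 matrix for a channel like the one in Figure 11, but its transition probabilities would have to be supplied, since the article gives no numerical values.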

Figure 13 Channel capacity of a binary symmetric channel (channel capacity, 0 to 1, versus transition probability p, 0 to 1)

The channel capacity is achieved when the input symbols are uniformly distributed, i.e. P = {0.5, 0.5}. The maximum mutual information achieved is

$$C = 1 + p \log_2 p + (1-p) \log_2 (1-p) \quad \text{[information bits per channel bit]}. \qquad (13)$$

In other words, the capacity is C = 1 – H(p), where H(p) = –p log2(p) – (1 – p) log2(1 – p) is the entropy of the "channel noise source". The higher p is (up to 0.5), the more noise there is on the channel, and the smaller the capacity is. At p = 0.5 every received bit is equally likely to be correct or wrong, so the channel output tells the receiver nothing about what was sent. Hence, the capacity is zero. At p = 0 and p = 1 the capacity attains its maximum of 1, meaning that the channel is noiseless (in the case of p = 1 every single transmitted bit is inverted by the channel, but of course these "errors" can all easily be corrected by the receiver!). Thus, in these special cases no error protection (error control coding) is needed, and every transmitted channel bit can be an information bit.

6.5 The Capacity of a Gaussian Memoryless Channel
As another very important example, let us consider a memoryless continuous-amplitude channel with additive white Gaussian zero-mean noise (AWGN) of power (variance) N, statistically independent of the channel input. That is to say, the noise power is uniformly distributed in frequency (hence white noise), while the noise samples follow a Gaussian distribution.

This noise model is appropriate whenever the noise is the combined effect of many independent sources; in particular it models thermal noise in electronic components well. In a cellular mobile radio system with many active users transmitting in the same frequency band, it is also a good approximation when modelling interference between users.

We furthermore assume that the channel is bandlimited to B Hz and that the signals to be sent are critically sampled, i.e. at $f_s = 2B$ Hz. This is a good model e.g. for a telephone line channel. For such a channel, Shannon derived the following famous formula for the capacity at received signal power S (which is the same as transmitted power if, as is usually done for this channel model, we assume that no signal attenuation is present in the channel):

$$C(S) = B \log_2\left(1 + \frac{S}{N}\right) = B \log_2(1 + \mathrm{SNR}), \qquad (14)$$

measured in information bits per second. SNR denotes channel signal-to-noise ratio, i.e. the ratio between received signal power and noise power.

This formula, whose fascinating history is traced in the paper by Lars Lundheim [12] in the present issue of Telektronikk, has become so well known that it can sometimes be found misinterpreted as expressing the capacity of any noisy channel, which it does not. It is visualized in Figure 14, for various values of the channel SNR. From the derivation, not to be done here, it is evident that if we are to actually achieve this capacity in practice, we must use a signalling alphabet with channel codewords that also follow a Gaussian distribution, with zero mean and variance S.

Figure 14 Capacity (bits per second) as a function of channel signal-to-noise ratio (SNR) for a memoryless, bandwidth- and power-limited channel with additive Gaussian noise. The capacity is depicted for bandwidths ranging from 1000 Hz (lower curve) to 4000 Hz (upper curve), in steps of 1000 Hz (channel SNR on a linear scale from 0 to 10 × 10^5; channel capacity from 0 to 8 × 10^4)

The capacity curves shown in Figure 14 seem to imply a linear, unlimited capacity growth as the bandwidth increases. A weakness of this figure, however, is that it does not take into account the practical fact that thermal (white) noise power in a communication system is proportional to the system bandwidth. Thus, operating at the same SNR at different bandwidths means that the transmit power is different for each bandwidth. A more "fair" comparison from a resource point of view is obtained by normalizing the SNR with respect to the bandwidth, i.e. expressing the SNR as the ratio of transmitted signal power S to noise power per Hertz bandwidth, denoted $N_0$ [W/Hz]. Doing this normalization, we obtain the channel capacity on an AWGN channel as follows:

$$C = B \log_2\left(1 + \frac{S}{N_0 B}\right) \quad \text{[information bits/s]}, \qquad (15)$$

which implies a finite (assuming finite transmit power) asymptotic upper bound on the capacity as the bandwidth is increased to infinity:

$$C(B = \infty) = \frac{S}{N_0 \ln 2}. \qquad (16)$$

For a desired actual transmission rate R ≤ C [information bits/second] this can be used to obtain a lower bound on the transmit energy per information bit which must be used if error-free transmission is to be at this rate. The energy spent per information bit is

$$E_b = \frac{S}{R} \quad \text{[J/information bit]}. \qquad (17)$$

Thus $S = E_b R$, so, from Equation (15),

$$\frac{R}{B} \le \frac{C}{B} = \log_2\left(1 + \frac{E_b R}{N_0 B}\right), \qquad (18)$$

or

$$E_b \ge N_0\, \frac{2^{R/B} - 1}{R/B}. \qquad (19)$$

The absolute lower bound for error-free transmission over the channel using any transmission scheme is obtained from this equation by letting the spectral efficiency R/B go to zero, since $(2^{R/B} - 1)/(R/B)$ decreases towards ln(2) as R/B decreases:

$$\min E_b = N_0 \ln 2. \qquad (20)$$

This is often expressed as the Shannon limit for reliable transmission on an AWGN channel,

$$\min \frac{E_b}{N_0} = \ln 2, \qquad (21)$$

or –1.6 dB on the decibel scale. Figure 15 illustrates the upper bound (18) on the achievable rate per Hz bandwidth, as a function of the signal-to-noise ratio per information bit. It can be seen that the upper rate bound goes asymptotically to zero as the SNR approaches the Shannon limit.

Figure 15 Capacity limit of an AWGN channel as a function of $E_b/N_0$ (R/W versus $E_b/N_0$ in dB; the performance bound separates the region of realizable systems and reaches zero rate at –1.6 dB)

7 Source Compression and Rate Distortion Theory
At the beginning of this paper we studied data compaction, the process of efficient, error-free source representation. However, in many applications we are willing, or even forced, to accept some distortion in the source output (such as blurring or blocking effects in an image, or slightly "synthetic" quality, loss of treble or background noise in a speech signal) in order to fit the source data into a given channel, be that a magnetic disc, a telephone line, or a radio link. As we have seen, each such channel is characterized by a finite capacity which provides an upper bound on the amount of information per channel symbol (or per unit time) which can be reliably transmitted. If we have a source whose information content is higher than the capacity of the channel we want to use, we essentially have two choices:

1. Transmit at a rate higher than the source entropy rate (and thus higher than the channel capacity). Admittedly, this enables us to send the source data error-free into the channel, but we are then forced to accept the decoding errors which invariably will occur after transmission over the channel. These errors will typically be outside of our control, and are stochastic in nature.

2. Before transmission, reduce the information content of the source in a controlled way (thus introducing a type of distortion over which we have some amount of control, e.g. low pass filtering or coarser amplitude quantization) until it is lower than the channel capacity. The quality-reduced but hopefully still usable source can then in principle be reliably transmitted over the given channel.

Developing good methods for implementing choice 2 is the task of source compression. In general, source compression is performed by a source coder-decoder pair, which is placed within a complete communication system as shown in Figure 16.

Figure 16 Information encoding and decoding operations separated into source and channel encoding and decoding (original information signal → source encoder → compressed information (source code symbols) → channel encoder → error protected information to channel; noisy channel symbol sequence from channel → channel decoder → decoded source code → source decoder → decoded information signal)

The idea that the tasks of source and channel coding can be separated as shown in Figure 16, without losing overall system optimality but greatly reducing design complexity, is rooted in the separation principle, originally devised by Shannon. However, it is important to realize that this principle holds only when there are no constraints on computational complexity or overall delay in the system. For real systems where such practical constraints are imposed, there might be performance advantages in a joint design of source and channel coding. The paper by Tor A. Ramstad [13] in this Telektronikk issue addresses this problem.

7.1 Scalar Quantization
The very simplest way of compressing a source's information content is that of scalar amplitude quantization. A scalar quantizer Q is a nonlinear operation which is used, on a single-symbol basis, to limit the source alphabet to a finite, countable set: for each real-valued input symbol x, which may be taken from a continuous distribution, the quantizer outputs one of only a limited number N of amplitude levels or reproduction values. The output y = Q(x) is in each case chosen based on a nearest neighbour rule. That is, for each input symbol, the quantizer simply outputs the quantized output level which is closest to the input on the real line. It is common to visualize a scalar quantizer by means of its quantizer characteristic, which simply depicts the input-output relation y = Q(x) as a function.

In Figure 17 a non-uniform scalar quantizer characteristic is shown. A quantizer is said to be non-uniform when the quantization intervals (the intervals between the possible output levels) have varying length. This is beneficial when quantizing sources with non-uniform pdfs. One important example is the CCITT A-law used for speech in Pulse Code Modulation (PCM) for telephony [14]. Shown here is a 3-bit quantizer, which simply means that there are 8 = 2^3 output levels, which can be uniquely indexed by means of 3-bit strings or binary codewords. Thus the information content of the quantized source in this case is not more than 3 bits per source symbol, whereas the information content before quantization was infinitely large.

Figure 17 3-bit non-uniform scalar quantizer characteristic (output y = Q(x) versus input x)

7.2 Rate Distortion Theory
The branch of information theory that provides the foundation for source compression, i.e. the bounds we seek to approach when we compress e.g. an image or an audio signal, is called rate distortion theory. This theory is first and foremost concerned with the following question:

For a given source S, and a given representation rate R [code symbols/source symbol], find the minimal distortion D that can theoretically be attained in the reproduction of S – or, equivalently: For a given maximal distortion D we are willing to accept in the reproduction of S, what is the minimal rate R which is theoretically possible to use?

The answer to the above question(s) is given by the rate distortion (or source distortion) function R(D), which for a given (discrete or continuous) source S is also defined in terms of the mutual information function:

$$R(D) = \min_{Q \in \mathcal{Q}_D} I(S;Q). \qquad (22)$$

Here $\mathcal{Q}_D$ is the set of all possible noisy test channels that yield an average symbol distortion E[d] less than or equal to D when the source S is transmitted through the channel.

That is, one imagines that the distortion due to compression of the source has occurred due to the transmission of the source over some (imaginary) noisy channel, described by a transition probability matrix or noise distribution – a channel that gives rise to a distortion not larger than D. All channels for which this is possible to achieve are considered; then one chooses the channel where the lowest transmission rate could be used. This minimal rate is then R(D), and the noise characteristics of the corresponding channel tell us how the distortion due to coding should be distributed among the source symbols.

The challenge when designing good compression algorithms is then to find an actual algorithm which results in reproduction noise with the same noise characteristics as those of the imaginary channel mentioned above.

The choice of symbol distortion measure, d, is arbitrary and is left to the designer. In practice, one often uses the mean square distortion between original and decompressed source symbols, i.e. d = (X – Y)², such that E[(X – Y)²] ≤ D. This is mainly due to the computational tractability and simplicity of this measure.

The exact physical significance of the rate distortion function is summed up in Shannon’s third coding theorem, also termed the source-compression theorem:

The source-compression theorem
Let S be a memoryless source with rate distortion function R(D). Then, for every D > 0 it is possible to find a source code of rate r > R(D) such that the source after being encoded and decoded with this code is reproduced with average symbol distortion less than or equal to D. If r < R(D) there exists no code such that this is possible.

Again, the theorem holds for both discrete and continuous sources. It does not tell us exactly how to find practically realizable source codes that approach the rate-distortion function for a given source. However, from the proof of the theorem, as for the case of channel coding, it is evident that we may have to consider encoding symbol blocks of possibly infinite length n (and hence infinite delay and coder complexity) in order to do this.

As an example, consider a continuous, memoryless source with symbol outcomes following a Gaussian distribution N(0, σ²). This is a reasonably valid model e.g. for sub-band samples coming from a well-designed analysis filterbank [15]. For such a source, the rate distortion function is given by

$$R(D) = \frac{1}{2}\log_2\frac{\sigma^2}{D}, \qquad 0 < D \le \sigma^2. \qquad (23)$$

[Figure 18: The rate-distortion function R(D) for a memoryless N(0, 1) source, plotted against the quadratic per-symbol distortion D]
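Equation (23) and its inverse, the distortion-rate form D(R) = σ² 2^(−2R), are easy to evaluate numerically. The Python sketch below assumes, as is standard for this source, that R(D) = 0 for D ≥ σ² (the decoder can then simply output the mean); the function names are our own.

```python
import math

def gaussian_rate_distortion(D, sigma2=1.0):
    """R(D) in bits per source symbol for a memoryless N(0, sigma2) source, eq. (23).

    Assumes R(D) = 0 for D >= sigma2: at that distortion the decoder can simply
    output the source mean without receiving any bits.
    """
    if D <= 0:
        raise ValueError("R(D) grows without bound as D -> 0 for a continuous source")
    if D >= sigma2:
        return 0.0
    return 0.5 * math.log2(sigma2 / D)


def gaussian_distortion_rate(R, sigma2=1.0):
    """Inverse of (23): minimal mean square distortion at R bits per symbol."""
    return sigma2 * 2.0 ** (-2.0 * R)


print(gaussian_rate_distortion(0.25))  # distortion sigma2/4 needs 1 bit/symbol
for R in (1, 2, 3):
    D = gaussian_distortion_rate(R)
    print(f"R = {R} bit/symbol: D = {D}, SNR = {10 * math.log10(1.0 / D):.2f} dB")
```

Each additional bit per source symbol lowers the minimal mean square distortion by a factor of 4, i.e. about 6.02 dB in signal-to-noise ratio.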

In Figure 18 this function is depicted for σ² = 1. Even though closed-form expressions for the rate distortion function can be found only for relatively simple source models [6], its basic properties can be derived in general from the formal definition:

The function always goes from the rate value H(S) (which here is infinite, since the source is continuous) at distortion 0, to rate 0 at some finite maximal distortion Dmax. Between these two points the function is always strictly decreasing and convex; hence also continuous. This implies that we can be sure that if we increase the rate, it is always possible to obtain a source representation that has strictly improved, at least in a signal-to-noise ratio sense. The relative improvement will be largest at low rates.

7.3 OPTA – Optimum Performance Theoretically Attainable
The capacity bound for noisy channels and the rate distortion bound for compression may be combined to produce what is commonly referred to as the Optimum Performance Theoretically Attainable, or OPTA. This is the best performance that can ever be achieved in a system where a certain source is to be transmitted over a given channel. If the rate distortion function of the source is R(D), and the channel capacity is C, the minimal distortion that can be achieved is found by solving the equation C = R(D) with respect to D.

For an AWGN channel where the channel symbol rate is 1/Tc [channel symbols/s] and the source symbol rate is 1/Ts [source symbols/s], this solution can be found in a simple closed form [13]:

$$D_o = S\left(1 + \frac{S}{N_0 B}\right)^{-T_s/T_c}, \qquad (24)$$

with corresponding maximal SNR after decoding given by

$$\mathrm{SNR}_o = \frac{S}{D_o}. \qquad (25)$$

[Figure 19: Optimal signal-to-noise ratio (dB) after decoding, SNR_opta, as a function of the channel signal-to-noise ratio γ (dB), for a Gaussian memoryless source transmitted over a memoryless, bandlimited AWGN channel. Solid line: Ts = Tc/2 (bandwidth reduction during transmission). Dashed line: Ts = Tc. Dash-dotted line: Ts = 2Tc (bandwidth expansion during transmission).]

In Figure 19 this optimally achievable SNR after decoding is shown, for various ratios between the source and channel symbol rate. It is seen that the more channel bandwidth spent (bandwidth expansion), the better the fidelity. This is because bandwidth expansion allows for the use of more powerful error control schemes.

8 Concluding Remarks
This paper has merely scratched the surface of a huge and continuously expanding field, and tried to convey some basic insights into how information theory works. There is a lot more to be gained from studies of information theoretic topics than what could possibly be included here, e.g. error analysis of communication systems, construction and performance analysis of channel and source codes, optimization of quantization schemes, and so on.

Information theory does of course not provide us with all the answers to questions regarding communication system design. However, it does often provide us with important pointers as to what we should or should not do. For example, consider modern telephone line modem standards utilizing trellis coded modulation (TCM) techniques [16], with rates up to 33.4 kbit/s. 15 years ago, it was thought “impossible” to implement such modem rates in practice due to channel noise problems. Yet the capacity theorems of information theory have always indicated that the fundamental attainable rate limit for error-free transmission is far higher, approximately 60 kbit/s for today’s telephone lines. The modern high-rate modems would have been unimaginable without the insights that information theory provides, as would the knowledge of the actual potential to be gained.

The same can be said for most modern channel coding, modulation, compression, and compaction techniques. It is thus clear that information theory can be of much more practical use than what is suggested by the maybe too-common picture many communication engineers have of the field, as a set of theoretical bounds, attainable only with the aid of both unknown, and possibly infinitely complex, algorithms. It is a challenge to those of us who are teaching information theoretic concepts to replace this picture with a more positive and useful one.

Finally, we remark that there are still many challenges for information theory, and communication system design problems which might benefit from its insight. We would particularly like to mention the following fields as exciting areas for information theoretic research in the future:

• Design and performance analysis of multiuser communication networks and MAC protocols;

• Time-varying and frequency-dispersive wireless channels, particularly multiple-input-multiple-output (MIMO) channels;

• Diversity issues in general.

It seems quite clear that information theory still has an important role to play if tomorrow’s communication systems are to live up to the expectations towards them. Shannon’s ideas are likely to cast long shadows into the 21st century.

References

1 Shannon, C E. A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423 and 623–656, 1948.

2 Shannon, C E. Communication in the presence of noise. Proc. IRE, 37, 10–21, Jan. 1949.

3 Shannon, C E. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec., 142–163, Mar. 1959.

4 Blahut, R E. Principles and Practice of Information Theory. Reading, MA, Addison Wesley, 1987.

5 Cover, T M, Thomas, J A. Elements of Information Theory. New York, Wiley, 1991.

6 Berger, T. Rate Distortion Theory. Englewood Cliffs, NJ, Prentice-Hall, 1971.

7 Huffman, D A. A method for the construction of minimum redundancy codes. Proc. IRE, 40, 1098–1101, 1952.

8 Blahut, R E. Theory and Practice of Error Control Codes. Addison-Wesley, 1984.

9 Ytrehus, Ø. An Introduction to Turbo Codes and Iterative Decoding. Telektronikk, 98 (1), 65–77, 2002. (This issue.)

10 Forney, G D Jr. Trellis shaping. IEEE Trans. on Information Theory, 38 (2), 1992.

11 Orten, P, Risløw, B. Theory and Practice of Error Control Coding for Satellite and Fixed Radio Systems. Telektronikk, 98 (1), 78–91, 2002. (This issue.)

12 Lundheim, L. On Shannon and “Shannon’s formula”. Telektronikk, 98 (1), 20–29, 2002. (This issue.)

13 Ramstad, T. Shannon Mappings for Robust Communication. Telektronikk, 98 (1), 114–129, 2002. (This issue.)

14 Haykin, S. Communication Systems. Wiley, 1994 (3rd ed.).

15 Ramstad, T A, Aase, S O, Husøy, J H. Subband Compression of Images – Principles and Examples. North Holland, Elsevier, 1995.

16 Biglieri, E et al. Introduction to Trellis-coded modulation with applications. New York, Macmillan, 1991.

On Shannon and “Shannon’s Formula”

LARS LUNDHEIM

The period between the middle of the nineteenth and the middle of the twentieth century represents a remarkable period in the history of science and technology. During this epoch, several discoveries and inventions removed many practical limitations of what individuals and societies could achieve. Especially in the field of communications, revolutionary developments took place such as high speed railroads, steam ships, aviation and telecommunications.

Lars Lundheim (44) received his Siv.ing. and Dr.ing. degrees from the Norwegian University of Science and Technology (NTNU), Trondheim, in 1984 and 1992, respectively. He has held various research and teaching positions at NTNU, SINTEF, CERN and Trondheim College of Engineering. He is currently employed as Associate Professor at NTNU, doing research and teaching in signal processing for wireless communication.
[email protected]

It is interesting to note that as practical limitations were removed, several fundamental or principal limitations were established. For instance, Carnot showed that there was a fundamental limit to how much energy could be extracted from a heat engine. Later this result was generalized to the second law of thermodynamics. As a result of Einstein’s special relativity theory, the existence of an upper velocity limit was found. Other examples include Kelvin’s absolute zero, Heisenberg’s uncertainty principle and Gödel’s incompleteness theorem in mathematics. Shannon’s channel coding theorem, which was published in 1948, seems to be the last one of such fundamental limits, and one may wonder why all of them were discovered during this limited time-span. One reason may have to do with maturity. When a field is young, researchers are eager to find out what can be done – not to identify borders they cannot pass. Since telecommunications is one of the youngest of the applied sciences, it is natural that the more fundamental laws were established at a late stage.

In the present paper we will try to shed some light on developments that led up to Shannon’s information theory. When one compares the generality and power of explanation of Shannon’s paper “A Mathematical Theory of Communication” [1] to alternative theories at the time, one can hardly disagree with J.R. Pierce who states that it “came as a bomb” [4]. In order to see the connection with earlier work, we will therefore focus on one particular case of Shannon’s theory, namely the one which is sometimes referred to as “Shannon’s formula”. As will be shown, this result was discovered independently by several researchers, and serves as an illustration of a scientific concept whose time had come. Moreover, we will try to see how development in this field was spurred by technological advances, rather than theoretical studies isolated from practical life.

Besides the original sources cited in the text, this paper builds on historical overviews, such as [4] and [19]–[23].

“Shannon’s Formula”
Sometimes a scientific result comes quite unexpectedly as a “stroke of genius” from an individual scientist. More often a result is gradually revealed, by several independent research groups, and at a time which is just ripe for the particular discovery. In this paper we will look at one particular concept, the channel capacity of a band-limited information transmission channel with additive white Gaussian noise. This capacity is given by an expression often known as “Shannon’s formula”1):

C = W log2(1 + P/N) bits/second. (1)

We intend to show that, on the one hand, this is an example of a result for which time was ripe exactly a few years after the end of World War II. On the other hand, the formula represents a special case of Shannon’s information theory2) presented in [1], which was clearly ahead of its time with respect to the insight generally established.

“Shannon’s formula” (1) gives an expression for how many bits of information can be transmitted without error per second over a channel with a bandwidth of W Hz, when the average signal power is limited to P watt, and the signal is exposed to an additive, white (uncorrelated) noise of power N with Gaussian probability distribution. For a communications engineer of today, all the involved concepts are familiar – if not the result itself. This was not the case in 1948. Whereas bandwidth and signal power were well-established, the word bit was seen in

1) Many mathematical expressions are connected with Shannon’s name. The one quoted here is not the most important one, but perhaps the best known among communications engineers. It is also the one with the most immediately understandable significance at the time it was published. 2) For an introduction to Shannon’s work, see the paper by N. Knudtzon in this issue.

print for the first time in Shannon’s paper. The notion of probability distributions and stochastic processes, underlying the assumed noise model, had been used for some years in research communities, but was not part of an ordinary electrical engineer’s training.

The essential elements of “Shannon’s formula” are:

1. Proportionality to bandwidth W
2. Signal power P
3. Noise power N
4. A logarithmic function

The channel bandwidth sets a limit to how fast symbols can be transmitted over the channel. The signal to noise ratio (P/N) determines how much information each symbol can represent. The signal and noise power levels are, of course, expected to be measured at the receiver end of the channel. Thus, the power level is a function both of transmitted power and the attenuation of the signal over the transmission medium (chan- nel).
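To make the roles of the four elements concrete, the following Python sketch evaluates (1). The 3 kHz bandwidth and 30 dB signal-to-noise ratio are illustrative assumptions, loosely in the range of a voice-band telephone line, not figures taken from this paper.

```python
import math

def db_to_ratio(db):
    """Convert a power ratio in decibels to a linear ratio."""
    return 10.0 ** (db / 10.0)

def shannon_capacity(W, P, N):
    """Shannon's formula (1): C = W log2(1 + P/N) in bits/second.

    W is the bandwidth in Hz, P the average received signal power and
    N the power of the additive white Gaussian noise (same units as P).
    """
    return W * math.log2(1.0 + P / N)

# Illustrative, assumed numbers: 3 kHz bandwidth, 30 dB signal-to-noise ratio.
W, snr = 3000.0, db_to_ratio(30.0)
C = shannon_capacity(W, snr, 1.0)
print(f"C = {C:.0f} bit/s")
# Doubling the bandwidth while keeping P/N fixed doubles C (element 1) ...
print(f"2W: {shannon_capacity(2 * W, snr, 1.0):.0f} bit/s")
# ... whereas doubling the signal power only adds about W bit/s at high
# signal-to-noise ratio (element 4: the logarithm).
print(f"2P: {shannon_capacity(W, 2 * snr, 1.0):.0f} bit/s")
```

For these assumed numbers the formula gives roughly 30 kbit/s. Doubling W with P/N held fixed doubles C, whereas doubling P adds only about W bit/s, which is the logarithmic dependence at work.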

[Photo caption: Claude Elwood Shannon (1916–2001), the founder of information theory, also had a practical and a playful side. The photo shows him with one of his inventions: a mechanical “mouse” that could find its way through a maze. He is also known for his electronic computer working with Roman numerals and a gasoline-powered pogo stick.]

The most outstanding property of Shannon’s papers from 1948 and 1949 is perhaps the unique combination of generality of results and clarity of exposition. The concept of an information source is generalized as a symbol-generating mechanism obeying a certain probability distribution. Similarly, the channel is expressed essentially as a mapping from one set of symbols to another, again with an associated probability distribution. Together, these two abstractions make the theory applicable to all kinds of communication systems, man-made or natural, electrical or mechanical.

Independent Discoveries
One indicator that the time was ripe for a fundamental theory of information transfer in the first post-war years is given in the numerous papers attempting such theories published at that time. In particular, three sources give formulas quite similar to (1). The best known of these is the book entitled Cybernetics [2] published by Wiener in 1949. Norbert Wiener was a philosophically inclined and proverbially absent-minded professor of mathematics at MIT. Nonetheless, he was deeply concerned about the application of mathematics in all fields of society. This interest led him to found the science of cybernetics. This field, which is perhaps best defined by the subtitle of [2], “Control and Communication in the Animal and the Machine”, included, among other things, a theory for information content in a signal and the transmission of this information through a channel. However, Wiener was not a master of communicating his ideas to the technical community, and even though the relation to Shannon’s formula is pointed out in [2], the notation is cumbersome, and the relevance to practical communication systems is far from obvious.

Reference to Wiener’s work was made explicitly by Shannon in [1]. He also acknowledged the work by Tuller3). William G. Tuller was an employee at MIT’s Research Laboratory for Electronics in the second half of the 1940s. In 1948 he defended a thesis at MIT on “Theoretical Limitations on the Rate of Transmission of Information”4). In his thesis Tuller starts by referring to Nyquist’s and Hartley’s works (see below). Leaning on the use of sampling and quantization of a band-limited signal, and arguing that intersymbol interference introduced by a band-limited channel can in principle be eliminated, he states quite correctly that under noise-free conditions an unlimited amount of information can be transmitted over such a channel. Taking noise into account, he delivers an argument partly based on intuitive reasoning, partly on formal mathematics, arriving at his main result that the information H transmitted over a transmission link of bandwidth B during a time interval T with carrier-to-noise-ratio C/N is limited by

3) Shannon’s work was in no way based on Wiener or Tuller; their then unpublished contributions had been pointed out to Shannon after the completion of [1]. 4) Later published as RLE Technical report 114 and as a journal paper [5] (both in 1949).

H ≤ 2BT log(1 + C/N). (2)

This expression has a striking resemblance to Shannon’s formula, and would by most readers be considered equivalent. It is interesting to note that for the derivation of (2) Tuller assumes the use of PCM encoding.

A work not referenced by Shannon is the paper by Clavier [16]5). In a similar fashion to Tuller, starting out with Hartley’s work, and assuming the use of PCM coding, Clavier finds a formula essentially equivalent to (1) and (2). A fourth independent discovery is the one by Laplume published in 1948 [17].

[Sidebar: Norbert Wiener (1894–1964) had been Shannon’s teacher at MIT in the early 1930s. By his seminal work Extrapolation, Interpolation and Smoothing of Stationary Time Series, made during World War II, he laid the foundation for modern statistical signal processing. Although Shannon was influenced by Wiener’s ideas, they had little or no contact during the years when they made their contributions to communication theory. Their styles were very different. Shannon was down-to-earth in his papers, giving illustrative examples that made his concepts possible to grasp for engineers, and giving his mathematical expressions a simple, crisp flavour. Wiener preferred to use the space in between crowded formulas for philosophical considerations and esoteric topics like Maxwell’s demon.]

Early Attempts at a General Communication Theory
Shannon and the other researchers mentioned above were not the first investigators trying to find a general communication theory. Shannon, Tuller and Clavier all make references to the work done in the 1920s by Nyquist and Hartley.

By 1920 one can safely say that telegraphy as a practical technological discipline had reached a mature level. Basic problems related to sending and receiving apparatus, transmission lines and cables were well understood, and even wireless transmission had been routine for several years. At this stage of development, when only small increases in efficiency are gained by technological improvements, it is natural to ask whether one is close to fundamental limits, and to try to understand these limits. Harry Nyquist, in his paper “Certain Factors Affecting Telegraph Speed” [7], seems to be the first one to touch upon, if not fully identify, some of the issues that were clarified by Shannon twenty years later.

First, it is obvious to Nyquist that the “Speed of transmission of intelligence” (which he terms W) is limited by the bandwidth of the channel6). Without much formal mathematical argument, Nyquist derives the following approximate formula for W:

W = K log m (3)

where m is the “number of current values”, which in modern terms would be called “the size of the signalling alphabet” and K is a constant.

Whereas Nyquist’s paper is mostly concerned with practical issues such as choice of pulse waveform and different variants of the Morse code, a paper presented three years later by Hartley is more fundamental in its approach to the problem. The title is simply “Transmission of Information”, and in the first paragraph the author says that “What I hope to accomplish (...) is to set up a quantitative measure whereby the capacities of various systems to transmit information may be compared”. Even though Nyquist had given parts of the answer in his 1924 paper, this is the first time the question that was to lead up to Shannon’s information theory is explicitly stated.

Compared to Nyquist, Hartley went a couple of steps further. For one thing, he stated explicitly that the amount of information that may be transmitted over a system7) is proportional to the bandwidth of that system. Moreover, he formulated what would later be known as Hartley’s law, that information content is proportional to the product of time (T) and bandwidth (B), and that one quantity can be traded for the other. It should also be mentioned that Hartley argued that the theory for telegraph signals (or digital signals in modern terms) could be generalized to continuous-time signals such as speech or television. Hartley’s law can be expressed as

Amount of information = const · BT · log m. (4)

A relation between bandwidth and time similar to the one found by Nyquist was discovered simultaneously by Karl Küpfmüller in Germany [10].

5) It is, perhaps, strange that Shannon and Clavier do not refer to each other’s work, since both [3] and [16] were orally presented at the same meeting in New York on December 12, 1947, and printed more than a year afterwards. 6) Proportionality to bandwidth is not explicitly stated by Nyquist in 1924. He has probably been aware of it, and includes it in his more comprehensive paper [7] four years later. 7) Neither Nyquist nor Hartley make explicit distinction between source, channel and destination, as Shannon does twenty years later. This may seem like a trivial omission, but the distinction is essential for Shannon’s general definition of channel capacity, which requires this separation to define quantities such as source entropy and mutual information.

A more mathematically stringent analysis of the relation was carried out in Gabor’s “Theory of communication”.
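Hartley’s law (4) can be illustrated with a small Python sketch. Two choices here are our own assumptions rather than Hartley’s: the constant is set to 2 (2B pulses per second, the Nyquist signalling rate), and the logarithm is taken to base 2 so that the result is in bits. The last lines use a common textbook heuristic, not found in Nyquist’s or Hartley’s papers, in which noise limits the number of distinguishable levels to m = √(1 + P/N); under that assumption (4) reproduces Shannon’s formula (1) multiplied by T.

```python
import math

def hartley_information(B, T, m, const=2.0):
    """Hartley's law (4): const * B * T * log2(m), in bits.

    Assumptions made for this sketch: const = 2 (i.e. 2B independent pulses
    per second, the Nyquist signalling rate) and a base-2 logarithm.
    """
    return const * B * T * math.log2(m)

B, T = 3000.0, 1.0  # hypothetical 3 kHz channel observed for 1 second
for m in (2, 4, 16):
    print(f"m = {m:2d} levels: {hartley_information(B, T, m):.0f} bits")

# Hartley's formula says nothing about what limits m. If one heuristically lets
# noise limit the number of distinguishable amplitude levels to
# m = sqrt(1 + P/N), the result coincides with Shannon's formula (1) times T.
snr = 1000.0
hartley = hartley_information(B, T, math.sqrt(1.0 + snr))
shannon = B * T * math.log2(1.0 + snr)
print(f"Hartley with m = sqrt(1 + P/N): {hartley:.1f} bits, Shannon: {shannon:.1f} bits")
```

Without some such link between m and the signal-to-noise ratio, nothing in (4) stops m, and hence the information, from growing without bound.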

As was pointed out by Tuller [5], a fundamental deficiency of the theories of Nyquist, Hartley, Küpfmüller and Gabor is that their formulas do not include noise. The role of noise is that it sets a fundamental limit to the number of levels that can be reliably distinguished by a receiver. From expressions (3) and (4) we see that both Nyquist and Hartley were aware of the fact that the amount of information depends on the number of distinguishable signal levels (or symbols). However, they seem content to include this number m in their formulas instead of deriving it from a more fundamental quantity, such as the signal-to-noise ratio. In a short discussion, Nyquist mentions “interference” as one of the limiting factors of the signal alphabet size. Hartley points out the inter-symbol interference due to channel distortion as the most important limiting factor. This is fundamentally wrong, as Tuller remarks, since inter-symbol interference can, in principle, be removed by an equalizer. This is precisely what Nyquist shows in his 1928 paper [7].

[Photo caption: Harry Nyquist (right) (1889–1976) with John R. Pierce (left) and R. Kompfner. Nyquist was born in Nilsby in Värmland, Sweden, and emigrated to the USA in 1907 8). He earned an MS degree in 1914 and a PhD in physics at Yale in 1917. The same year he was employed by AT&T, where he remained until his retirement in 1954. Nyquist made contributions in very different fields, such as the modelling of thermal noise, the stability of feed-back systems, and the theory of digital communication.]

Developments in the Understanding of Bandwidth
As we have seen, the work of Nyquist, Hartley and Küpfmüller in the 1920s represents an important step towards the fully developed channel capacity formulas expressed twenty years later. A key insight was the realization that information transmission rate was limited by system bandwidth. We have argued that this understanding came when the maturity of telegraph technology made it natural to ask if fundamental limitations were within reach. Another, related factor was that the concept of bandwidth, so essential in the cited works, was now clearly understood by the involved researchers. Today, when students of electrical engineering are exposed to Fourier analysis and frequency domain concepts from early on in their education, it seems strange that such a fundamental signal property as bandwidth was not fully grasped until around 1920. We will therefore take a look at how frequency domain thinking was gradually established as communications technology evolved.

If one should assign a birth date to practical (electrical) telecommunications technology, it would be natural to connect it to one of the first successful demonstrations of telegraphy by Vail and Morse in the 1840s. It took only a few years before all developed countries had established systems for telegraph transmission. These systems were expensive, both to set up and to operate, and from the start it was important to make the transmission as cost-effective as possible. This concern is already reflected in the Morse code9) alphabet, which is designed according to the relative frequencies of characters in written English.

It was soon evident that, for transmission on overhead wires, the transmission speed was limited by the telegraph operator and, possibly, the inertia of the receiving apparatus, not by any properties of the transmission medium. The only problem connected to what we today would call the channel was signal attenuation. This was, however, a relatively small problem, since signal retransmission was easily accomplished, either manually or with automatic electromagnetic repeater relays.

For the crossing of rivers or stretches of ocean, the telegraph signals were transmitted using cables. This transmission medium soon proved to be far more problematic than overhead lines. For long spans repeaters were not a practical solution, meaning that the attenuation problem became serious. It was also found that operators

8) More about Harry Nyquist’s early years can be found on Lars-Göran Nylén’s homepage, URL: http://members.tripod.com/~lgn75/ 9) The Vail Code would be a more proper term, since it was Morse’s assistant Alfred Vail who in 1837 visited a print shop in Morristown to learn from the contents of the type cases which letters were more frequent in use. He then advised Morse to abandon his plan of using a word code involving the construction of a dictionary assigning a number to all English words, and use the more practical character code with unique dash-dot combinations for each letter [19].

would have to restrain themselves and use a lower speed than normal to obtain a distinguishable message at the receiver end. Both these problems were of concern in the planning of the first transatlantic telegraph cable. Expert opinions were divided from the start, but the mathematical analysis of William Thomson (later Lord Kelvin) showed that even though the attenuation would be large, practical telegraphy would be possible by use of sensitive receiving equipment. In particular, Thomson’s analysis explained how the dispersion of the cable sets a limit to the possible signalling speed.

In our connection, Thomson’s work is interesting because it was the first attempt at mathematical analysis of a communication channel. We see that two of the four elements of Shannon’s formula were indirectly taken into account: signal power reduced by the attenuation and bandwidth limiting the signalling speed. Bandwidth was not explicitly incorporated in Thomson’s theory. This is quite natural, since the relevant relationships were expressible in physical cable constants such as resistance and capacitance. These were parameters that were easily understood and that could be measured or calculated. Bandwidth, on the other hand, was simply not a relevant notion, since engineers of the time had not yet learnt to express themselves in frequency domain terms.

During the nineteenth century topics such as oscillation, wavelength and frequency were thoroughly studied in fields such as acoustics, optics and mechanics. For electrical and telegraph engineers, however, these concepts had little interest from the start.

Resonance was a well-known phenomenon in acoustics. It was therefore an important conceptual break-through when Maxwell showed mathematically how a circuit containing both capacitance and inductance would respond significantly differently when connected to generators producing alternating current of different frequencies. The phenomenon of electrical resonance was then demonstrated in practical experiments by Hertz10).

It is interesting to see how, even before electrical resonance was commonly understood, acoustical resonance was suggested as a means of enhancing the capacity of telegraph systems. As noted above, the telegraph operator represented the bottleneck in transmission by overhead wires. Thus, many ingenious schemes were suggested by which two or more operators could use the same line at a time. As early as 1853 the American inventor M.B. Farmer is reported to have suggested the first system for time division multiplex (TDM) telegraphy. The idea, which was independently set forth several times, was perfected and made practical by the Frenchman J.M.E. Baudot11) around 1878. In parallel with the TDM experiments, several inventors were working with frequency division multiplex (FDM) schemes. These were based on vibrating reeds, kept in oscillation by electromagnets. By assigning a specific frequency to each telegraph operator, and using tuned receivers, independent connections could be established over a single telegraph line. One of the most famous patents is the “harmonic telegraph” by A.G. Bell from the 1870s. It was during experiments with this idea that Bell more or less accidentally discovered a way of making a practical telephone.

[Figure: Detail from Alexander Graham Bell’s patent for the “Harmonic Telegraph”. An alternating current is generated in the left coil with a frequency given by the vibrating reed c. The reed h to the right will resonate and give a sound only if it is tuned to the same frequency. By this mechanism several users could transmit telegraph signals over the same line by using equipment tuned to different frequencies.]

By 1890 electrical resonance phenomena were generally understood by scientists and well-informed engineers. Consequently, practical patents on how to use electrical resonance in FDM telegraphy began to be filed. These attempts, which also included some ideas of FDM telephony (not practical at the time) were, however, overshadowed by a new invention: the wireless.

After the first few experiments with wireless telegraphy at the start of the twentieth century, it became clear that sharp resonance circuits, tuned to specific frequencies or wavelengths, were necessary to avoid disturbance among different users. This requirement made it necessary for radio engineers to have a good understanding of frequency and the behaviour of electrical circuits when connected to sources of varying frequency content. It is important to note that the concept of bandwidth was much more elusive than that of frequency. Today, the decomposition of an information-bearing signal into its Fourier components is a routine operation to any electrical or communications engineer. This was not the case 80–100 years ago. At that time, one would commonly assume that a radio transmitter was tuned

10) For more details of the history of electrical resonance, including an early contribution by Thomson, see the paper by Blanchard [20].
11) Baudot is today remembered by the unit baud, which measures the number of symbols per second transmitted through a communication channel.

to one – and only one – frequency. The bandwidth of a telegraph signal was small compared both to the carrier frequencies used and to the width of the resonance circuits employed. A general awareness of bandwidth did not develop until some experience had been gained with telephone transmission.

In parallel with the development of wireless telegraphy, and, gradually, telephony, work continued on FDM in telecommunications systems, or “carrier current telephony and telegraphy”12), which was the term used at the time. In these systems, first intended for enhancing the capacity of long-distance wire-bound connections, it became important to minimize the spacing between carrier frequencies and at the same time prevent cross-talk between the channels. To obtain this, traditional resonance circuits were no longer adequate for transmitter or receiver tuning. What was needed were band-pass filters, sufficiently broad to accept the necessary bandwidth of the modulated speech signal, flat enough to avoid distortion, and with sufficient stop-band attenuation to avoid interference with neighbour channels. This kind of device was developed during the years prior to World War I, and was first patented by G.A. Campbell of Bell Systems in 1917.

Example of Campbell's bandpass filter designs with transfer function. (Original figures from U.S. Patent No 1 227 113)

With hindsight it is curious to note that before Campbell's invention, band-limited channels in a strict and well-defined sense did not exist! Earlier transmission channels were surely band-limited in the sense that they could only be used in practice for a limited frequency range. However, the frequency response tended to roll off gradually, making the definition of bandwidth, such as is found in Shannon's formula, questionable.

So, around 1920 it was evident that an information-bearing signal needed a certain bandwidth, whether it was to be transmitted in original or modulated (by a carrier) form. There was, however, some discussion as to how large this bandwidth had to be. The discussion seems to have ceased after Carson's “Notes on the Theory of Modulation” in 1922. By this time it had been theoretically shown (by Carson) and practically demonstrated that by using so-called single sideband modulation (SSB) a modulated signal can be transmitted in a bandwidth insignificantly larger than the bandwidth of the original (unmodulated) signal. The aim of Carson's paper was to refute claims that further bandwidth reduction could be obtained by modulating the frequency of the carrier wave instead of the amplitude13). An often quoted remark from the introduction is that, according to Carson, “all such schemes [directed towards narrowing the bandwidth] are believed to involve a fundamental fallacy”. Among others, Gabor [11] takes this statement as a first step towards the understanding that bandwidth limitation sets a fundamental limit to the possible information transfer rate of a system.

The Significance of Noise

We have seen that the work towards a general theory of communication had two major breakthroughs where several investigators made similar but independent discoveries. The first one came in the 1920s with the discovery of the relation between bandwidth, time and information rate. The second one came 20 years later. An important difference between the theories published during these two stages is that in the 1920s the concept of noise was completely lacking.

Why did it take twenty years to fill the gap between Hartley's law and Shannon's formula? The only necessary step was to substitute 1+C/N for m in (4). Why, all of a sudden, did three or more people independently “see the

12) A paper [21] with this title by Colpitts and Blackwell was published in three instalments in the Journal of the American Institute of Electrical Engineers in 1921, giving a comprehensive overview of both the history of the subject and the state of the art around 1920.
13) This was an intuitively appealing idea at the time, but sounds almost absurd today, when the bandwidth expansion of FM is a well-known fact.

light” almost at the same time? Why did neither Nyquist, Hartley nor Küpfmüller realize that noise, or more precisely the signal-to-noise ratio, plays as significant a role for the information transfer capacity of a system as does the bandwidth?

One answer might be that they lacked the necessary mathematical tools for an adequate description of noise. At this time Wiener had just completed a series of papers on Brownian motion (1920–24) which would become a major contribution to what was later known as stochastic processes, the standard models for the description of noise and other unpredictable signals, and one of Shannon's favourite tools. These ideas, based on probabilistic concepts, were, however, far from mature enough to be used by even the most sophisticated electrical engineers of the time14). Against this explanation, it may be argued that when Shannon's formula was discovered, only two15) of the independent researchers used a formal probabilistically based argument. The others based their reasoning on more common-sense arguments, not resorting to mathematical techniques other than those that were tools of the trade in the 1920s.

Another explanation could be that the problem of noise was rather new at the time. Noise is never a problem as long as it is sufficiently small compared to the signal amplitude. Therefore it usually arises in situations where a signal has been attenuated during transmission over a channel. As we have seen, such attenuation had been a problem from the early days of telegraphy. From the beginning, the problem of attenuation was not that noise then became troublesome, but rather that the signal disappeared altogether. (More precisely, it became too weak to activate the receiving apparatus in the case of telegraphy, or too weak to be heard by the human ear in the case of telephony.) This situation changed radically around 1910, when the first practical amplifiers using vacuum tubes were devised. By use of these, long-distance telephony could for the first time be achieved, and ten years later the first commercial radio broadcasting could begin. But, alas, an electronic amplifier is not able to distinguish between signal and noise. So, as a by-product, interference such as thermal noise16), always present in both the transmission lines and the components of the amplifiers, would be amplified – and made audible – together with the signal. Early amplifiers were not able to amplify the signal very much, due to stability problems.

When the feedback principle, patented by H.S. Black in 1927, came into use, gains of the order of hundreds or thousands became possible by cascading several amplifier stages. This made noise a limiting factor in transmission systems, important to control, and by the 1930s signal-to-noise ratio had become a common term among communications engineers.

Although the researchers of the 1920s were aware of the practical problem represented by noise and interference, it seems that they did not regard it as a fundamental property of the transmission system, but rather as one of the many imperfections of practical systems that should be disregarded when searching for fundamental limits of what could be accomplished.

Thus, the fact that noise had just begun to play an active role in communications systems might partly explain why it was not given sufficient attention as a limitation to transmission capacity. However, when one looks at the reasoning used by both Tuller and Clavier (and to some degree Shannon), one will find that their arguments are inspired by two practical ideas, both invented during the 1930s, namely frequency modulation (FM) and pulse code modulation (PCM).

Two Important Inventions

When reading textbooks and taking university courses in engineering, one may get the idea that new products are based on the results of engineering science, which in turn rely on a thorough understanding of more basic sciences such as physics and chemistry, which finally lean on mathematics as the most basic of all exact knowledge. One can also get the impression that the development in these fields follows the same pattern: technology must wait for physics to explain new phenomena by mathematics made ready for the purpose in advance. Reality is quite different. Time and time again inventors with only coarse knowledge of the physical phenomena they exploit have come up with ingenious solutions to problems. Similarly, engineers and physicists have discovered several mathematical results which they have not been able to prove satisfactorily, leaving it to the established mathematicians to “tie up the ends” and provide the necessary comprehensive theory afterwards.

With this in mind, it is interesting to note that the understanding of noise in a theory of channel capacity had to wait for two practical inventions. These inventions illustrated how signal-to-noise

14) One of the first papers with a rudimentary mathematical treatment of noise was published by Carson in 1925 [12]. It should also be mentioned that Harry Nyquist derived a mathematical model of thermal noise in 1928 [13]. This model was, however, derived without the use of probabilistic methods.
15) Shannon and Wiener, of course.
16) Not to mention the “shot noise” generated by travelling electrons in the tubes themselves.

ratio (SNR) and bandwidth of a transmission system could actually be traded one against the other.

We have already mentioned Carson's 1922 paper, where he showed that frequency modulation (FM) would result in a signal bandwidth considerably larger than that resulting from SSB, or even traditional AM. This is undeniably true, and it was therefore natural that most researchers also accepted Carson's rejection, in the same article, of the claim that FM would be more robust with respect to noise. Among the few who continued working seriously with FM was Edwin Armstrong. After several years of experimentation, and after introducing an amplitude limiter in the receiver, he was able to demonstrate that it was possible to significantly increase the SNR of a radio communication system by using FM, at the cost of expanded bandwidth17) [14].

What Armstrong's results showed was that a trade-off between bandwidth and SNR was in principle possible. The next step, to realize that the information transfer capacity of a system depended on both bandwidth and SNR, took some time, and needed another invention.

PCM – Pulse Code Modulation – consists of the sampling and quantizing of a continuous waveform. The use of sampling in telephone transmission had been suggested as early as 1903 by W.M. Miner.18) Miner's motivation was to enhance the capacity of transmission lines by time division multiplex, as had already been done for telegraphy (see above). Miner's concept contained no form of quantizing and should not be considered a PCM system. This important addition was first introduced by A.H. Reeves in 1937.19) Reeves realized that his system would need more bandwidth than traditional modulation methods. His rationale was the same as Armstrong's: the combat of noise. Reeves' radical insight was that by inserting repeaters at suitable intervals along the transmission line, no noise would be added during transmission beyond the quantizing noise introduced by the encoding (modulation) process at the transmitter. The quantizing noise could be made arbitrarily small by using a sufficiently high number of quantization levels.

Alec Harley Reeves (1902 – 1971) with two figures from his PCM patent. Although Reeves was aware that his invention was far ahead of what was possible with the technology of his time, the patent included several circuit solutions to some of the functions involved.

Implicit in Reeves' patent lie two important principles:

1. An analog signal, such as speech, can be represented with arbitrary accuracy by use of sufficiently frequent sampling, and by quantizing each sample to one of a sufficiently large number of predefined levels.

2. Each quantized sample can be transmitted on a channel with arbitrarily small probability of error, provided the SNR is sufficiently large.

17) Carson immediately accepted this as a fact, and soon afterwards presented a paper together with T.C. Fry [15] showing mathematically how this mechanism worked, and that it was due to the amplitude limiter, which had not been included in the early proposals he refuted in 1922.
18) U.S. Patent 745,734.
19) French Patent 852,183.

A result from this, not explicitly stated by Reeves, is that on a noise-free channel, where arbitrarily many quantization levels could be distinguished, an infinite amount of information can be transmitted in an arbitrarily small bandwidth. This is in sharp contrast to the results of the 1920s, and should be considered a major reason why “Shannon's formula” was discovered by so many just at a time when PCM was starting to become well known.

Concluding Remarks

In this paper we have not written much about Shannon's general information theory. On the other hand, we have tried to show how the ideas leading up to “Shannon's formula” gradually emerged, as practical technological inventions made such ideas relevant. We have also seen that after the end of World War II, the subject was sufficiently mature that several independent researchers could complete what had been only partially explained in the 1920s.

On this background, one might be led to conclude that Shannon's work was only one among the others, and that, by a stroke of luck, he was the first one to publish his results. To avoid such a misunderstanding, we will briefly indicate how Shannon's work is clearly superior to the others. First, we should make a distinction between the works of Shannon and Wiener ([1]–[3]) and the others ([5], [16], [17]). Both Shannon and Wiener delivered a general information measure based on the probabilistic behaviour of information sources, which they both designate by entropy due to the likeness with similar expressions in statistical mechanics. Shannon, furthermore, uses this concept in his general definition of channel capacity:

$$C = \max\,[H(x) - H_y(x)].$$

This expression can be interpreted as the maximum, taken over all possible statistics of the transmitted signal, of the difference between the uncertainty about the message before and after reception. The result is given in bit/second and gives an upper bound on how much information can be transmitted without error on a channel. The most astonishing aspect of Shannon's result, which was not even hinted at by Wiener, was perhaps not so much the quantitative expression as the fact that information exchange with an arbitrarily small probability of error was possible on any channel, as long as the rate was below a certain value (the channel capacity). The entropy concept is absent in all the presentations of the other group, which deal explicitly with a channel with additive noise. All their reasoning is based on this special case of a transmission channel. The genius of Shannon was to see that the role of noise (or of any other disturbance, whether additive or affecting the signal in any other way) was to introduce an element of uncertainty in the transmission of symbols from source to destination. This uncertainty is adequately modelled by a probability distribution. This understanding was shared by Wiener, but his attention was turned in other directions than Shannon's. According to some, Wiener, “under the misapprehension that he already knew what Shannon had done, never actually found out” [4].

References

1 Shannon, C E. A Mathematical Theory of Communication. Bell Syst. Techn. J., 27, 379–423, 623–656, 1948.
2 Wiener, N. Cybernetics: or Control and Communication in the Animal and the Machine. Cambridge, MA, MIT Press, 1948.
3 Shannon, C E. Communication in the Presence of Noise. Proc. IRE, 37, 10–21, 1949.
4 Pierce, J R. The Early Days of Information Theory. IEEE Trans. on Information Theory, IT-19 (1), 1973.
5 Tuller, W G. Theoretical Limitations on the Rate of Information. Proc. IRE, 37 (5), 468–478, 1949.
6 Carson, J R. Notes on the Theory of Modulation. Proc. IRE, 10, 57, 1922.
7 Nyquist, H. Certain Factors Affecting Telegraph Speed. Bell Syst. Techn. J., 3, 324–352, 1924.
8 Nyquist, H. Certain Topics in Telegraph Transmission Theory. AIEE Trans., 47, 617–644, 1928.
9 Hartley, R V L. Transmission of Information. Bell Syst. Techn. J., 7, 535–563, 1928.
10 Küpfmüller, K. Über Einschwingvorgänge in Wellenfiltern. Elektrische Nachrichten-Technik, 1, 141–152, 1924.
11 Gabor, D. Theory of Communication. J. IEE, 93 (3), 429–457, 1946.
12 Carson, J R. Selective Circuits and Static Interference. Bell Syst. Techn. J., 4, 265, 1925.
13 Nyquist, H. Thermal Agitation of Electric Charge in Conductors. Phys. Rev., 32, 1928.
14 Armstrong, E H. A Method of Reducing Disturbances in Radio Signaling by a System of Frequency-Modulation. Proc. IRE, 24, 689–740, 1936.

15 Carson, J R, Fry, T C. Variable Frequency Circuit Theory with Application to the Theory of Frequency-Modulation. Bell Syst. Techn. J., 16, 513–540, 1937.
16 Clavier, A G. Evaluation of Transmission Efficiency According to Hartley's Expression of Information Content. Elec. Commun.: ITT Tech. J., 25, 414–420, 1948.
17 Laplume, J. Sur le nombre de signaux discernables en présence du bruit erratique dans un système de transmission à bande passante limitée. C. R. Acad. Sci. Paris, 226, 1348–1349, 1948.
18 Carson, J R. The Statistical Energy-Frequency Spectrum of Random Disturbances. Bell Syst. Techn. J., 10, 374–381, July 1931.
19 Oslin, G P. The Story of Telecommunications. Macon, Georgia, 1992.
20 Blanchard, J. The History of Electrical Resonance. Bell Syst. Techn. J., 23, 415–433, 1944.
21 Colpitts, Blackwell. Carrier Current Telephony and Telegraphy. J. AIEE, April 1921.
22 Bray, J. The Communications Miracle. New York, Plenum Press, 1995.
23 Hagemeyer, F W. Die Entstehung von Informationskonzepten in der Nachrichtentechnik: eine Fallstudie zur Theoriebildung in der Technik in Industrie- und Kriegsforschung. Berlin, Freie Universität Berlin, 1979. (PhD dissertation.)

Statistical Communication Theory 1948 – 1949

NIC. KNUDTZON1)

Dr. Nic. Knudtzon (80) obtained his Engineering degree from the Technical University of Norway, Trondheim in 1947 and his Doctor's degree from the Technical University in Delft, the Netherlands in 1957. In 1948–1949 he was with the Research Laboratory of Electronics, Massachusetts Institute of Technology, working with information theory and experiments. From 1950 to 1955 he was with the Norwegian Defence Research Establishment, Bergen, working on the development of microwave radio links; and from 1955 to 1967 he was Head of the Communications Division at SHAPE Technical Center in The Hague, the Netherlands, where his efforts went into the planning of military telecommunications networks and systems in Western Europe. From 1968 to 1992 he was Director of Research at the Norwegian Telecommunications Administration, working on the planning of future telecommunications systems, networks and services. Dr. Knudtzon has been a member of government commissions and various committees, including the Norwegian Research Council, the National Council for Research Libraries, the International Telecommunications Union, EURESCOM, etc.

Introduction

I thank you for the invitation to once again step up to this distinguished rostrum here at Telenor's R&D Department!

My presentation will consist of three parts:

1) The general state of affairs, 1948–1949: In order for you to properly understand and appreciate the topic I have been given, which dates back more than 50 years, I first need to place you in the world of those times.

2) Statistical communication theory, 1948–1949: Consistent with the original terminology, I shall use statistical communication theory as a common term for ’s information theory and Norbert Wiener's mathematical theory of statistically optimal networks. This part of the presentation is an overview of what I saw, heard, and learned about the topic during my stay at the MIT Research Laboratory of Electronics from February 1948 until July 1949. I had the good fortune – as the only Norwegian – to work in this most inspiring environment during those pioneering years. Furthermore, I shall give a very condensed reprise of the three papers I presented on the topic at “Studiemøtet i radioteknikk og elektroakustikk” (Symposium of Radio Technology and Electro-acoustics) at Farris Bad in 1950.

3) Reflections: Finally, I will conclude with some reflections on the developments – positive and negative – since those days.

The General State of Affairs, 1948 – 1949

Let us now place ourselves in 1948–1949 and list some notable events during this period:

• The relationship between the western world and the Soviet Union is dominated by the cold war.
• The new German states of West Germany and East Germany are established.
• Harry Truman unexpectedly wins the presidential election in the USA.
• The Marshall Plan is initiated.
• NATO is established, with Norway as a member.
• Mao Tse-Tung proclaims the People's Republic of China.
• The state of Israel is proclaimed.
• Mahatma Gandhi is murdered by a Hindu fanatic in Delhi.
• Long-playing records are introduced in the USA.
• The number of television receivers is reaching 750,000 in the USA.
• A committee is appointed to assess television in Norway.
• The transistor is invented.
• Claude E. Shannon publishes “A Mathematical Theory of Communication”.
• Norbert Wiener publishes the book “Cybernetics”.

The State of Science and Research in Norway, 1948

In 1941 we were 30 students who, based on our outstanding results from the matriculation exam, were admitted to The Faculty of Electrical Engineering at The Norwegian Institute of Technology (NTH): 20 to Power Electrical Engineering, and 10 to Electronics. In this wartime period the Faculty had three professors, all in Power Electrical Engineering. Their expertise was also made available to the Electronics students, although the frequency range was limited to 50 Hz. However, in the second half of our studies, laboratory engineer Reno Berg introduced us to frequencies above 50 Hz.

Our most memorable experience was the six weeks during the winter of 1946 when Helmer Dahl (38 at the time) and Matz Jenssen (36) – based on their achievements and experience

1) The paper is a transcript of a presentation given by the author at the Telenor seminar “From Shannon’s information theory to today’s information society: Claude Shannon (1916–2001) In Memoriam” on August 9, 2001. The presentation was given in Norwegian, and the paper has been translated by Geir E. Øien.

from UK laboratories during World War II – gave us unforgettable inspiration by providing insights into the enormous progress made in our scientific discipline. The frequencies reached into the GHz range!

My Personal Situation

Please allow me to say a few words about my personal situation, which led me to the USA and to close contact with the pioneers of statistical communication theory.

During his stay at NTH in early 1946, Helmer Dahl gave a lecture on a topic which, in his own words, “had nothing to do with the curriculum”: thermal noise as the fundamentally limiting factor of communication systems. This caught my attention to such a degree that the choice of topics both for my main thesis and my doctoral thesis was made there and then! Since NTH had no expertise in this area, I wrote to Helmer Dahl, who at the time was establishing the so-called Department of Radar – in fact its research activities came to be primarily associated with microwave radio links – at The Norwegian Defence Research Institute (FFI), and asked whether it would be possible for me to do my main thesis under his supervision.

The topic of the thesis was remarkable for Norway at the time: “Give an overview of molecular noise (fluctuation noise), and compute the signal-to-noise ratio associated with pulse-width modulation and frequency modulation for ultra-shortwave transmission.” This was the first step in a long-term strategy to determine the choice of modulation scheme for radio links developed by FFI. The two alternatives were frequency modulation, which is compatible with conventional carrier frequency telephony, and pulse modulation with time-shared channels. Hence, in 1947 I was already asked to assess fluctuation noise and compute signal-to-noise ratios in a digital system, which could not be efficiently realized in those pre-transistor days, but which was possible to analyse mathematically.

My studies were successful, and FFI sent me to the Massachusetts Institute of Technology (MIT) for a period of 5 months, which MIT generously offered to extend by one year. Travelling to the USA in those days was not a trivial affair; I had to take an oath – with my hand on the Bible – as I received my government official visa. Subsequently I travelled by ocean liner across the North Atlantic, in a hurricane.

USA 1948 – 1949

The transition from a war-ridden and run-down Norway, whose reconstruction had not yet properly started, to the dynamic USA, was overwhelming. As a small illustration, I may mention that both NTH and MIT were building new libraries at the time: at NTH they dug with spades and transported the earth by horse-drawn carts; at MIT I saw a bulldozer in action for the first time.

I came to a flourishing and proud USA, the winner of World War II, now acknowledged as the world's leading superpower.

At MIT I worked at the Research Laboratory of Electronics, in the temporary buildings of the Radiation Laboratory, which had been a center for US research on radar during World War II.

The annual updating within our scientific disciplines took place at the IRE (Institute of Radio Engineers) National Convention in New York, and my first meeting with this gathering of theory and practice, shortly after my arrival in 1948, was a great experience. In my diary I commented: “Good” (!) – particularly on a March 24 session, which included contributions from Claude Shannon and Norbert Wiener, whom I then saw and heard for the first time. Claude Shannon's contribution was “Communications in the presence of noise” [1].

Of the many other impressions, I would here particularly like to emphasize the immediate, friendly American way of communicating and the sense of being taken seriously – no matter your age – when you accepted a challenge. Claude Shannon serves as a typical example in his relations with me. When he visited the group of 5–6 researchers to which I belonged at MIT on May 17, 1948, he invited me to visit Bell Telephone Laboratories. During the whole of July 7 he was my cicerone, both at the old location in West Street and at Murray Hill. There I also met other well-known scientists, among them H. Nyquist and S.O. Rice, whose articles “Mathematical Analysis of Random Noise” [2] and “Statistical Properties of a Sine Wave plus Random Noise” [3] have given me the background for much of my work. On August 10, I personally received from Claude Shannon a copy of the preprint of his pioneering paper “A Mathematical Theory of Communication”, and with this treasure I immediately initiated a seminar series in our research group at MIT.

Claude Elwood Shannon (1916 – 2001) received his B.Sc. degrees in electrical engineering and mathematics at the University of Michigan in 1936, and his M.Sc. degree in electrical engineering as well as his Ph.D. degree in mathematics at MIT in 1940. He became a National Research Fellow in 1941. Subsequently he worked at Bell Telephone Laboratories and had an affiliation with

Figure 1 General Communication System: information source, transmitter, noise, receiver, destination (message → signal → signal + noise → message + noise)

this institution until 1972. He was a professor at MIT from 1956 until 1978, when he became Professor Emeritus. Apart from his monumental contributions in statistical communication theory he also wrote papers in several other areas, while at the same time serving as an advocate of moderation regarding publishing for its own sake. The most spectacular thing I have heard about him is his passion for riding a unicycle while juggling. Many have offered the opinion that Shannon was worthy of a Nobel Prize, but it seems he fell outside the six Nobel categories. My meetings with Claude Shannon took place when he was 32–33 years old and at the top of his most productive period, and I remember him as a pleasant and straightforward man.

Norbert Wiener (1894 – 1964) was in 1948 a world-famous 52-year-old mathematician who made his daily rounds in the MIT corridors and laboratories, a bit remote and notably absent-minded. Due to his interest in statistical communication theory he visited our research group quite often, also observing some of my measurements of the statistical properties of noise. I sporadically followed some of his lectures in mathematics, which could be quite peculiar – particularly when he became absorbed in his own computations at the blackboard, giving us associations to Maxwell's Demon himself. Norbert Wiener had been affiliated with MIT since 1919, and was a professor there when Claude Shannon acquired his Ph.D. in mathematics.

Figure 2 Example of prediction of filtered fluctuation noise

Statistical Communication Theory

The Development

In 1928, R.V.L. Hartley at Bell Telephone Laboratories concluded in his “Transmission of Information” [5], based on physical (as opposed to psychological) considerations, that the amount of information transmitted in a noiseless system is proportional to the product of bandwidth and transmission time.

In 1946 this result was developed further by D. Gabor at British Thomson-Houston Co. in “Theory of Communication” [6], but he, too, confined himself to a system without noise.

At a meeting of the IRE in New York on December 12, 1947, A.G. Clavier of ITT's Federal Telecommunication Laboratories presented “Evaluation of Transmission Efficiency According to Hartley's Expression of Information Content” [7]. This work contained an analysis of the transmission efficiency of frequency modulation and various kinds of pulse modulation, based on Hartley's definition of information content, but for a system with noise.

At the same IRE meeting in New York, Claude Shannon presented “Communications in the presence of noise” [1], which is identical to his previously mentioned presentation at the IRE National Convention in 1948. He took as a starting point his figure of a “General Communication System” (cf. Figure 1), used a geometrical viewpoint when solving the problem, and concluded with his formula for the optimal transmission capacity,

C = W log2 (P + N) / N,

for a communication channel with bandwidth W, average transmit power P, and thermal (Gaus- sian and white) noise power N. Furthermore, he presented figures showing how practical systems compared in quantitative performance to the optimal performance theoretically attainable. I discuss this paper in depth in [11] and [12].
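Shannon's formula can be evaluated directly. The short Python sketch below does this for an assumed channel. The 3 kHz bandwidth and 30 dB signal-to-noise ratio are illustrative values chosen here, not figures from the article. The function name is our own.

```python
import math


def shannon_capacity(W, P, N):
    """C = W log2((P + N) / N) in bit/s; W in Hz, P and N in the same power unit."""
    return W * math.log2((P + N) / N)


# Illustrative values (not from the article): 3 kHz bandwidth, P/N = 30 dB.
snr = 10 ** (30 / 10)
capacity = shannon_capacity(3000.0, snr, 1.0)
print(round(capacity))  # about 29.9 kbit/s
```

Only the ratio P/N enters the formula. The noise power is therefore set to 1 and the signal power is given as the linear signal-to-noise ratio.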

Then, in July 1948, Claude Shannon's pioneering paper [4] was published. This paper, too, is discussed in [11] and [12]. In a footnote (p. 626 in [4]), Claude Shannon expresses the view that "Communication theory is heavily indebted to Wiener for much of its basic philosophy and theory ...". He also states (p. 627) that "We may also here refer to Wiener's forthcoming book "Cybernetics" dealing with the general problems of communication and control." Without going further into the matter, it should also be noted that Claude Shannon worked on cryptography during World War II. His report on this work was not declassified until 1948, when it was published as "Communication Theory of Secrecy Systems" [8].

During World War II Norbert Wiener wrote "The Extrapolation, Interpolation, and Smoothing of Stationary Time Series" (National Defence Research Committee, Section D2 Report, Feb. 1942), dealing with control of gunfire aimed at targets in motion. This report was also classified, and in addition rather heavy reading, so it remained fairly unknown until it became the basis for an MIT course from 1947. I belonged to the second class to take this course. Norbert Wiener's starting point is an analysis of statistically stationary time series, a topic I discuss in [11] and [13]. For illustration, I have chosen to reproduce here, as Figure 2, a simplified version of Figure 5 from [13]. It shows the result of using a statistically optimal predictor for filtered fluctuation noise, built at MIT by my good friend Charles A. Stutt as part of his Ph.D. work.

In 1948 Norbert Wiener published his book "Cybernetics", about control through communication between, and feedback of, the processes in living organisms, in machines, and in society [9]. The title (derived from the Greek word for steersman) is not only apt; it has also entered the vocabulary of many languages. In Chapter III, "Time Series, Information, and Communication" (pp. 74–112), Norbert Wiener gives the following "certification": "The relevant general theory has been presented in a very satisfactory form by Dr. C. Shannon". Claude Shannon, in his book review of "Cybernetics" [10], writes: "Communication engineers have a charter right and responsibility in several of the roots of this broad field and will find Wiener's treatment interesting reading, filled with stimulating and occasionally controversial ideas. – Professor Wiener, who has contributed much to communication theory, is to be congratulated for writing an excellent introduction to a new and challenging branch of science."

The "Bit" as a Unit of Information
The information unit bit, for "binary digit", seems to have been officially introduced in print for the first time in [4], and was "suggested by J.W. Tukey" according to Claude Shannon. John Wilder Tukey (1915 – 2000) was a well-known statistician. He was a contemporary of Shannon at Bell Telephone Laboratories (born and died one year before Shannon), and at the same time professor at Princeton University. He also introduced the term "", and made important contributions to the development of Fast Fourier Transform (FFT) algorithms.

My 1950 Presentations of Statistical Information Theory
My journal papers [11], [12], and [13] were the first presentations on statistical information theory in Norway.2) I would like to point out one thing regarding these papers: the parts dealing with information theory are heavily influenced by Claude Shannon's own way of presenting his theory. Please note how many fundamental results can be derived using simple explanations. Following Shannon's publications, many others have come up with articles and books, many of which try to glorify the subject matter by using needlessly complicated descriptions. My advice has always been: "Read Shannon in the original!"

Reflections

On Mathematical Solutions
It may be worth pointing out that the methods of research have changed considerably as computers have become widely available and gained greater computational power.

Wiener's solution for statistically optimal filters, as developed during World War II, was based on the choice of minimum mean squared error as an optimality criterion. This led to the Wiener-Hopf equations, which were solvable analytically. Without further comparison intended, I was one of those facing a similar situation in my doctoral work. We were dependent on mastering the analytical solutions; otherwise our research might fail completely, even after years of work. Today, computers are available for obtaining numerical solutions. However, it might be added that uncritical use of computers can also lead to a loss of judgement, intuition, and physical understanding.

Good or Bad?
The development of telecommunications is steadily accelerating, particularly after the advent of transistors, satellites, and optical fibres in practical systems. The number of services available to the users is already overwhelming, and seems set to increase further.

As with all technology, telecommunications is in many cases a good servant, but it can also be a bad master. Personally, I particularly dislike three things:
• The mindless over-use. I am increasingly surprised to see that people have so much time and money to spend on telecommunication services.
• The lack of quality in the content that is transmitted, be it the shallowness of many mobile phone conversations, bad television programs, or the lack of quality control concerning what becomes available on the Internet.
• The distortion of our language found in e-mail.

However, regardless of
• type of communication link – copper wires, cable, fibres, or radio communication, direct or via satellite, fixed or mobile;
• user equipment – telegraph, telephone, facsimile, television, or computers;
Claude Shannon's fundamental capacity formula defines the optimal channel use, against which the various technical solutions can be measured; see Figure 3. Shannon gave the system planners the basis for quantitative assessments.

Figure 3 Comparison of PCM and PPM with ideal performance

2) Translator's note: Translated versions of [11], [12] and [13] are found in this volume of Telektronikk.

References
1 Shannon, C E. Communications in the presence of noise. Proc. of the IRE, 10–21, 1949.
2 Rice, S O. Mathematical analysis of random noise. Bell Syst. Technical Journal, XXIII, 282–332, 1944; and XXIV, 46–156, 1945.
3 Rice, S O. Statistical properties of a sine wave plus random noise. Bell Syst. Technical Journal, XXVII, 109–157, 1948.
4 Shannon, C E. A mathematical theory of communication. Bell Syst. Technical Journal, XXVII, 379–423 and 623–656, 1948. (Later reprinted in: Shannon, C E, Weaver, W. The Mathematical Theory of Communication. The University of Illinois Press: Urbana, 1949.)
5 Hartley, R V L. Transmission of information. Bell Syst. Technical Journal, VII, 535–573, 1928.
6 Gabor, D. Theory of communication. Journal of IEE, 93, 439–457, 1946.
7 Clavier, A G. Evaluation of transmission efficiency according to Hartley's expression of information content. Electrical Communication, 25, 414–420, 1948.
8 Shannon, C E. Communication theory of secrecy systems. Bell Syst. Technical Journal, XXVIII, 656–715, 1949.
9 Wiener, N. Cybernetics or The Control and Communication in the Animal and the Machine. New York, The Technology Press, John Wiley, 1948.
10 Shannon, C E. Cybernetics, or Control and Communication in the Animal and the Machine, by Norbert Wiener (book review). Proc. of the IRE, 1305, 1949.
11 Knudtzon, N. Statistisk kommunikasjonsteori. En kortfattet oversikt over problemstillingen [Statistical communication theory: A brief overview of the issues]. Teknisk Ukeblad, 16 Nov. 1950, 883–887.
12 Knudtzon, N. Informasjonsteori [Information theory]. Elektroteknisk Tidsskrift, 30, 373–380, 1950.
13 Knudtzon, N. Statistisk optimale nettverk [Statistically optimal networks]. Elektroteknisk Tidsskrift, 32, 413–416, 1950.

The True Channel Capacity of Pair Cables With Respect to Near End Crosstalk

NILS HOLTE

Previously, the channel capacity of pair cables that use two-way transmission has been calculated by means of a worst-case estimate of near end crosstalk (NEXT). This is a pessimistic estimate because near end crosstalk in frequency bands with some frequency separation is uncorrelated. In this paper the true channel capacity in a pair cable with respect to NEXT is calculated by means of a stochastic crosstalk model. It is shown how the mean and variance of the channel capacity can be calculated both by analytical methods and by simulation. The entire probability distribution of the channel capacity is estimated by means of a Monte Carlo simulation over a large ensemble of cables. Hence, a true estimate of the 1 % confidence limit of the channel capacity with respect to NEXT can be calculated for a given type of cable. The results are compared with traditional estimates. It is shown for a typical example that the worst-case channel capacity is approximately 5 % higher than the traditional estimate, and that the average channel capacity is approximately another 10 % higher than the true worst-case estimate.

Nils Holte (55) is professor in Telecommunications at the Norwegian University of Science and Technology (NTNU). He received his Siv.Ing. degree in 1971 and his Dr.Ing. degree in 1976, both from NTNU. His main research interests are adaptive filters, crosstalk in pair cables, and digital communications, with special emphasis on OFDM and xDSL systems.
[email protected]

1 Introduction
Different types of xDSL (Digital Subscriber Line) systems used in the existing pair cables of the public access network are some of the major alternatives for implementing fixed broadband access. Symmetrical systems like SHDSL [1] have equal transmission rates in both directions and use two-way transmission within the same frequency band. In this case the dominating noise mechanism is near end crosstalk (NEXT). The traditional method for calculating the channel capacity of pair cables with respect to crosstalk has been to use Shannon's channel capacity formula, where the noise power spectral density is based upon the 99 % confidence limit of the crosstalk power sum. This approach is used in textbooks like Starr, Cioffi and Silverman [2] and also in different xDSL standards, for instance the ITU standard for SHDSL [1]. This method gives a pessimistic estimate for NEXT, because NEXT has significant variations with frequency. The NEXT noise powers in two different frequency bands are uncorrelated for a frequency separation greater than approximately 100 – 200 kHz. This effect was first modelled in detail by Gibbs and Addie [3], who applied their model to single carrier baseband systems.

In the current paper, the basic principles of Gibbs and Addie are generalised and extended to the calculation of channel capacity. This gives a realistic estimate of the achievable bitrate for a multicarrier system using adaptive modulation. The calculations are based both upon analytical methods and upon a Monte Carlo simulation of random crosstalk coupling coefficients throughout the cables in a large ensemble of cables. By means of the Monte Carlo simulation the entire probability distribution of the channel capacity is calculated. Hence, the 1 % confidence limit of the channel capacity for a given system is estimated directly from the probability distribution of the channel capacity, instead of from the indirect estimate based upon worst case NEXT (99 % confidence limit).

The new method will primarily be suitable for systems with two-way transmission, where NEXT is the dominating noise mechanism. The results show that the new method gives an increase in bitrate of approximately 5 % compared to the traditional method when applied to a system with bandwidth 400 kHz in a 10 pair cable (10 pair binder group).

This new approach may in principle also be used for far end crosstalk (FEXT). However, in cables with identical propagation constants in all pairs, there will be full correlation between FEXT at different frequencies, so that the new and the traditional method will give identical results.

The paper is organised as follows. First, a NEXT model is presented together with an analytical analysis of NEXT. Then the Monte Carlo simulation is explained, and it is shown how the channel capacity may be calculated. Estimates of average rate and worst-case rate are presented for a multicarrier system example, and the results are compared with results found by using traditional estimates of NEXT. To conclude, it is explained how this new method might contribute to an improvement of the maximum range of transmission systems for the SHDSL application, by using a multicarrier system instead of the standardised single carrier SHDSL system.

2 Near End Crosstalk Model
The near end crosstalk between two pairs in a multipair cable will consist of contributions as illustrated in Figure 2.1. It is assumed that both pairs are terminated by their characteristic impedance $Z_0$ at both ends of the cable.

Figure 2.1 Near end crosstalk between two pairs

The near end crosstalk transfer function for high frequencies (f > 100 kHz) between pair no. i and pair no. j of a cable is given by Klein [4]:

$$H_{Ni,j}(f) = \frac{V_{20}}{V_{10}} = j\beta_0 \int_0^{\ell} \kappa_{Ni,j}(x)\, e^{-2\gamma x}\, dx, \tag{2.1}$$

where
$\beta_0 = \beta_0(f) = \omega\sqrt{LC}$ is the lossless phase constant of the cable in rad/km
$\gamma = \alpha + j\beta$ is the propagation constant of the cable
$\alpha = \alpha(f)$ is the attenuation constant of a pair in Neper/km
$\beta = \beta(f)$ is the phase constant of a pair in rad/km
$\kappa_{Ni,j}(x) = [C_{i,j}(x)/C + L_{i,j}(x)/L]/2$ is the normalised NEXT coupling coefficient between pair i and j at position x
$C_{i,j}(x)$ is the mutual capacitance per unit length between pair i and j
$L_{i,j}(x)$ is the mutual inductance per unit length between pair i and j
$C$ is the capacitance per unit length of the pairs
$L$ is the inductance per unit length of the pairs
$\ell$ is the cable length

According to Cravis and Crater [5], the coupling coefficients are stationary, Gaussian, random processes with correlation length less than a few meters. This is very short compared with the attenuation length 1/α and the wavelength 2π/β. The coupling coefficients may therefore be modelled as white noise processes with correlation functions:

$$R_{Ni,j}(\tau) = E\left[\kappa_{Ni,j}(x)\,\kappa_{Ni,j}(x+\tau)\right] = k_{Ni,j}\,\delta(\tau). \tag{2.2}$$

The constants $k_{Ni,j}$ will be different for different pair combinations and can be estimated from crosstalk measurements. Coupling coefficients in different pair combinations are statistically independent.

Most other authors [3, 5, 6] treat this coefficient as a random variable. Cravis and Crater [5] assume that $k_{Ni,j}$ follows a gamma distribution, while Bradley [6] and Gibbs and Addie [3] assume a log normal distribution, but these assumptions are only approximations. For a given cable design the coefficient $k_{Ni,j}$ is a deterministic function of the pair combinations. It is mainly determined by the average distance between the two pairs and the difference in twisting periods between them [7].

A pair cable consisting of N pairs is used as an example. For simplicity, only one specific disturbed pair is taken into account, and this pair is denoted pair no. 1. The procedure will be the same for other disturbed pairs. Measurements for a 0.6 mm pair cross stranded cable using 10 pair binder groups have been averaged over individual pair combinations [8], and the result is shown in Table 2.1.

There are moderate differences in NEXT power sum between the pairs of a binder group (1.5 dB between max and min). The pair with the median of average NEXT power sum has been chosen as the disturbed pair (pair no. 1). Table 2.1 shows the average NEXT between pair no. 1 and all the other pairs in the binder group.

Using calculations for only one pair introduces a minor absolute error. This error is probably less than 1 dB, and the approach will certainly be sufficient for a comparison of methods.

Table 2.1 Average NEXT for different pair combinations in a 0.6 mm cross stranded pair cable

Pair combination   Average NEXT at 1 MHz
1–2                54.2 dB
1–3                55.7 dB
1–4                57.1 dB
1–5                57.9 dB
1–6                59.0 dB
1–7                59.0 dB
1–8                59.1 dB
1–9                59.3 dB
1–10               59.6 dB
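Table 2.1 is enough to reproduce two numbers the paper relies on later. The first is the average NEXT power sum of pair no. 1 at 1 MHz (47.9 dB in the system example). The second is the coefficient ρ_kN = 0.137 used with the normalised covariance.

The Python sketch below adds the crosstalk powers on a linear scale, which is what the power sum does. It assumes, as in (2.7), that at a fixed frequency each $P_{1,i}$ is proportional to $k_{N1,i}$, so the unknown proportionality constant cancels in ρ_kN. The function names are our own.

```python
import math

# Average NEXT at 1 MHz between pair 1 and pairs 2..10 (Table 2.1), in dB.
NEXT_DB = [54.2, 55.7, 57.1, 57.9, 59.0, 59.0, 59.1, 59.3, 59.6]


def db_to_power(loss_db):
    """Convert a NEXT loss in dB to a linear power transfer ratio."""
    return 10 ** (-loss_db / 10)


def power_sum_db(losses_db):
    """Add the crosstalk powers and express the power sum as a loss in dB."""
    total = sum(db_to_power(x) for x in losses_db)
    return -10 * math.log10(total)


def rho_kn(losses_db):
    """sum(k^2) / (sum k)^2, using P_{1,i} proportional to k_{N1,i} at fixed f."""
    p = [db_to_power(x) for x in losses_db]
    return sum(v * v for v in p) / sum(p) ** 2


print(f"NEXT power sum at 1 MHz: {power_sum_db(NEXT_DB):.1f} dB")
print(f"rho_kN: {rho_kn(NEXT_DB):.3f}")
```

Running it prints 47.9 dB and 0.137, matching the values the paper uses. The table entries are rounded to 0.1 dB, so agreement is only to that precision.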

2.1 Average NEXT Transfer Functions
According to the model given by (2.1) and (2.2), the near end crosstalk transfer function will be a random variable with zero mean. The average NEXT power transfer function for one pair combination is given by:

$$P_{i,j}(f) = E\left[|H_{Ni,j}(f)|^2\right] = E\left[\beta_0^2 \int_0^{\infty}\!\!\int_0^{\infty} \kappa_{Ni,j}(x)\,\kappa_{Ni,j}(y)\, e^{-2\gamma x - 2\gamma^* y}\, dx\, dy\right], \tag{2.3}$$

where * denotes complex conjugate.

The upper integration limit has been increased from ℓ to infinity to simplify the calculations. This can be done because crosstalk coupling at the outer end of the cable does not contribute to NEXT for cable lengths of practical interest.

Using (2.2) the result may be expressed:

$$P_{i,j}(f) = k_{Ni,j}\,\beta_0^2 \int_0^{\infty} e^{-4\alpha x}\, dx = \frac{k_{Ni,j}\,\beta_0^2}{4\alpha}. \tag{2.4}$$

Above 100 kHz, the attenuation and phase constants may be approximated by:

$$\alpha = \alpha(f) = \alpha_{1M}\sqrt{F}, \tag{2.5}$$

$$\beta_0 = \beta = \beta(f) = \beta_{1M}\,F, \tag{2.6}$$

where $\alpha_{1M}$ and $\beta_{1M}$ are the attenuation and phase constants at 1 MHz, and F is the frequency in MHz.

Consequently the NEXT power transfer function is expressed:

$$P_{i,j}(f) = \frac{k_{Ni,j}\,\beta_{1M}^2}{4\alpha_{1M}}\,F^{1.5} = k_N\,F^{1.5}, \tag{2.7}$$

where $k_N$ is a constant. This is the result found by Cravis and Crater [5]. It shows that average NEXT increases 15 dB/decade of frequency, since a factor of 10 in F gives a factor of $10^{1.5}$ in power.

For a cable with many pairs, crosstalk from different pairs will add on a power basis. If the same type of system is used in all pairs of an N-pair cable, the effective crosstalk is given by the crosstalk power sum. Near end crosstalk power sum for pair no. 1 may be expressed:

$$|H_{Nps}(f)|^2 = \sum_{i=2}^{N} |H_{N1,i}(f)|^2. \tag{2.8}$$

Average NEXT power sum will be:

$$P_{ps}(f) = E\left[|H_{Nps}(f)|^2\right] = \sum_{i=2}^{N} E\left[|H_{N1,i}(f)|^2\right] = \frac{\beta_0^2}{4\alpha}\sum_{i=2}^{N} k_{N1,i} = k_{Nps}\,F^{1.5}, \tag{2.9}$$

where $k_{Nps}$ is a constant. NEXT power sum is also increasing 15 dB/decade with frequency.

2.2 The Covariance of NEXT
In order to study the correlation between NEXT at different frequencies, it is convenient to investigate the covariance of the crosstalk power transfer function. The covariance of the NEXT crosstalk power transfer function at two frequencies $f_1$ and $f_2$ for one specific pair combination is defined by:

$$\mathrm{cov}\left[|H_{Ni,j}(f_1)|^2, |H_{Ni,j}(f_2)|^2\right] = E\left[|H_{Ni,j}(f_1)|^2\,|H_{Ni,j}(f_2)|^2\right] - P_{i,j}(f_1)\,P_{i,j}(f_2). \tag{2.10}$$

The first term may be expressed as follows, where the indices 1 and 2 refer to the frequencies $f_1$ and $f_2$:

$$E\left[|H_{Ni,j}(f_1)|^2\,|H_{Ni,j}(f_2)|^2\right] = E\left[\beta_{01}^2\beta_{02}^2 \int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_0^{\infty} \kappa_{Ni,j}(x)\,\kappa_{Ni,j}(y)\,\kappa_{Ni,j}(z)\,\kappa_{Ni,j}(w)\, e^{-2\gamma_1 x - 2\gamma_1^* y - 2\gamma_2 z - 2\gamma_2^* w}\, dx\, dy\, dz\, dw\right]. \tag{2.11}$$

In correspondence with Gibbs and Addie [3], the solution is given by:

$$E\left[|H_{Ni,j}(f_1)|^2\,|H_{Ni,j}(f_2)|^2\right] = \frac{k_{Ni,j}^2\,\beta_{01}^2\,\beta_{02}^2}{16}\left[\frac{1}{\alpha_1\alpha_2} + \frac{4}{(\alpha_1+\alpha_2)^2 + (\beta_1-\beta_2)^2} + \frac{4}{(\alpha_1+\alpha_2)^2 + (\beta_1+\beta_2)^2}\right]. \tag{2.12}$$

The first term in the brackets reproduces $P_{i,j}(f_1)\,P_{i,j}(f_2)$ and therefore cancels in (2.10). It is common to use the assumption that α ≪ β in order to simplify the results, and hence the last term may be neglected. However, at the lowest frequencies this is only a fair approximation: at 100 kHz the ratio between β and α is less than 10. Using this assumption, the covariance may be expressed:

$$\mathrm{cov}\left[|H_{Ni,j}(f_1)|^2, |H_{Ni,j}(f_2)|^2\right] = \frac{k_{Ni,j}^2\,\beta_{01}^2\,\beta_{02}^2}{4\left[(\alpha_1+\alpha_2)^2 + (\beta_1-\beta_2)^2\right]}. \tag{2.13}$$
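The size of the term dropped between (2.12) and (2.13) can be checked numerically. This Python sketch uses the example-cable constants listed in the system example (15.1 dB/km and 31.4 rad/km at 1 MHz), scaled with frequency as in (2.5) and (2.6). The helper names are hypothetical. For $f_1 = f_2$ the ratio of the neglected to the retained bracket term reduces to α²/(α² + β²).

```python
import math

# Example-cable constants at 1 MHz, from the system example in the paper.
ALPHA_1M = 15.1 / (20 * math.log10(math.e))  # 15.1 dB/km converted to Np/km
BETA_1M = 31.4                               # rad/km


def alpha(F):
    """Attenuation constant in Np/km for F in MHz, eq. (2.5)."""
    return ALPHA_1M * math.sqrt(F)


def beta(F):
    """Phase constant in rad/km for F in MHz, eq. (2.6)."""
    return BETA_1M * F


def neglected_term_ratio(F1, F2):
    """Last bracket term of (2.12) divided by the term kept in (2.13)."""
    a1, a2, b1, b2 = alpha(F1), alpha(F2), beta(F1), beta(F2)
    kept = 4 / ((a1 + a2) ** 2 + (b1 - b2) ** 2)
    dropped = 4 / ((a1 + a2) ** 2 + (b1 + b2) ** 2)
    return dropped / kept


for F in (0.1, 0.5, 1.0):
    print(f"F = {F} MHz: beta/alpha = {beta(F) / alpha(F):4.1f}, "
          f"neglected/kept = {neglected_term_ratio(F, F):.4f}")
```

At 100 kHz the neglected term is about 3 % of the retained one (β/α ≈ 5.7). It shrinks quickly with frequency, consistent with the remark that the approximation is weakest at the lowest frequencies.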

The normalised covariance between the NEXT power transfer function for one pair combination at frequencies $f_1$ and $f_2$ is found by dividing by the product of the average power transfer functions at the two frequencies:

$$\rho = \frac{\mathrm{cov}\left[|H_{Ni,j}(f_1)|^2, |H_{Ni,j}(f_2)|^2\right]}{P_{i,j}(f_1)\,P_{i,j}(f_2)} = \frac{4\alpha_1\alpha_2}{(\alpha_1+\alpha_2)^2 + (\beta_1-\beta_2)^2}. \tag{2.14}$$

The covariance of NEXT power sum is defined by:

$$\mathrm{cov}\left[|H_{Nps}(f_1)|^2, |H_{Nps}(f_2)|^2\right] = E\left[\left(\sum_{i=2}^{N}|H_{N1,i}(f_1)|^2\right)\left(\sum_{j=2}^{N}|H_{N1,j}(f_2)|^2\right)\right] - P_{ps}(f_1)\,P_{ps}(f_2). \tag{2.15}$$

Since coupling coefficients in different pair combinations are statistically independent, the covariance of the sum is the sum of the individual covariances (2.13). The result is expressed:

$$\mathrm{cov}\left[|H_{Nps}(f_1)|^2, |H_{Nps}(f_2)|^2\right] = \frac{\beta_{01}^2\,\beta_{02}^2}{4\left[(\alpha_1+\alpha_2)^2 + (\beta_1-\beta_2)^2\right]}\sum_{i=2}^{N} k_{N1,i}^2. \tag{2.16}$$

The normalised covariance of NEXT power sum is:

$$\rho_{ps} = \frac{\mathrm{cov}\left[|H_{Nps}(f_1)|^2, |H_{Nps}(f_2)|^2\right]}{P_{ps}(f_1)\,P_{ps}(f_2)} = \frac{4\rho_{kN}\,\alpha_1\alpha_2}{(\alpha_1+\alpha_2)^2 + (\beta_1-\beta_2)^2} = \rho\cdot\rho_{kN}, \tag{2.17}$$

where:

$$\rho_{kN} = \frac{\sum_{i=2}^{N} k_{N1,i}^2}{\left[\sum_{i=2}^{N} k_{N1,i}\right]^2}. \tag{2.18}$$

The approximations (2.5) and (2.6) are used for the frequencies $F_1$ and $F_2$ given in MHz. Setting $\varepsilon = \alpha_{1M}/\beta_{1M}$, the normalised covariance of NEXT power sum is expressed:

$$\rho_{ps} = \frac{4\rho_{kN}\,\varepsilon^2\sqrt{F_1F_2}}{\varepsilon^2\left(\sqrt{F_1}+\sqrt{F_2}\right)^2 + (F_1-F_2)^2}. \tag{2.19}$$

For the 0.6 mm cable used as example, the coefficient $\rho_{kN}$ is equal to 0.137 and ε = 0.055. The normalised covariance between NEXT power sum at $f_1$ and $f_2$ for this cable is shown in Figure 2.2.

The figure shows that near end crosstalk at two different frequencies with a large frequency separation is almost uncorrelated. The correlation is low for frequency differences larger than 100 kHz in the frequency range below 500 kHz.

Figure 2.2 The normalised covariance between NEXT power sum at two frequencies as a function of the frequency difference (curves for f1 = 200 kHz, 500 kHz, 1 MHz and 2 MHz; vertical axis: normalised NEXT ps covariance at f1 and f2; horizontal axis: frequency difference f2 − f1 in MHz)

2.3 Probability Distributions of NEXT
Crosstalk will vary statistically over an ensemble of different cables of the same type. The NEXT transfer function is a linear function of the coupling coefficients, and hence it will be a complex Gaussian variable with zero mean. In order to calculate the channel capacity of a cable it is desirable to know the probability distributions of NEXT. The real and imaginary parts of the NEXT transfer function for a single pair combination are given by:

$$r_{i,j}(f) = \mathrm{Re}\left[H_{Ni,j}(f)\right] = \int_0^{\ell} \beta_0\,\kappa_{Ni,j}(x)\,\sin(2\beta x)\,e^{-2\alpha x}\,dx, \tag{2.20}$$

$$q_{i,j}(f) = \mathrm{Im}\left[H_{Ni,j}(f)\right] = \int_0^{\ell} \beta_0\,\kappa_{Ni,j}(x)\,\cos(2\beta x)\,e^{-2\alpha x}\,dx. \tag{2.21}$$

A detailed calculation of the probability distribution of the crosstalk transfer function $H_{Ni,j}(f)$ is given in Appendix A. Under the assumption that α ≪ β, the second order moments of $r_{i,j}(f)$ and $q_{i,j}(f)$ are given by:

$$E\left[r_{i,j}^2(f)\right] = E\left[q_{i,j}^2(f)\right] = \frac{k_{Ni,j}\,\beta_0^2}{8\alpha} = \frac{P_{i,j}(f)}{2}, \qquad E\left[r_{i,j}(f)\,q_{i,j}(f)\right] = 0. \tag{2.22}$$

The power transfer function of NEXT at one frequency for a specific pair combination is denoted $z_{i,j}$, where $z_{i,j} = z_{i,j}(f) = r_{i,j}^2(f) + q_{i,j}^2(f)$. Because $r_{i,j}(f)$ and $q_{i,j}(f)$ are independent Gaussian variables with identical variances, the probability density of $z_{i,j}$ will, in accordance with Appendix A, be given by:

$$p_{i,j}(z_{i,j}; f) = \frac{1}{P_{i,j}(f)}\exp\left(-\frac{z_{i,j}}{P_{i,j}(f)}\right), \quad z_{i,j} \ge 0. \tag{2.23}$$

NEXT power sum for pair no. 1 is denoted $z = z(f) = \sum_{i=2}^{N} z_{1,i}(f)$. The probability distribution of NEXT power sum is given by:

$$p(z; f) = p_{1,2}(z_{1,2}; f) * p_{1,3}(z_{1,3}; f) * \cdots * p_{1,N}(z_{1,N}; f), \tag{2.24}$$

where * denotes convolution.

For the general case where all pair combinations have different values of $k_{Ni,j}$, the result of the convolution is:

$$p(z; f) = \sum_{i=2}^{N}\frac{\left[P_{1,i}(f)\right]^{N-3}\exp\left(-\dfrac{z}{P_{1,i}(f)}\right)}{\prod_{j=2,\,j\ne i}^{N}\left[P_{1,i}(f) - P_{1,j}(f)\right]}, \quad z \ge 0. \tag{2.25}$$

For the special case that all pair combinations have the same NEXT level, $P(f) = P_{1,i}(f)$, i = 2, ..., N, the NEXT power sum will be gamma distributed with N − 1 degrees of freedom and probability density given by:

$$p(z; f) = \frac{1}{\Gamma(N-1)}\cdot\frac{z^{N-2}}{\left[P(f)\right]^{N-1}}\cdot\exp\left(-\frac{z}{P(f)}\right), \quad z \ge 0. \tag{2.26}$$

Γ(x) is the gamma function. In practical cables the NEXT level usually varies between different pair combinations, so that (2.25) represents a more realistic case than (2.26).

Figure 3.1 Illustration of a Monte Carlo simulation of M cables (cables no. 1 to M, each with pairs no. 1 to N divided into segments of length ∆x)

2.4 Joint Probability Distribution of NEXT at Two Frequencies
In order to calculate, for instance, the variance of the channel capacity, it is necessary to know the joint probability distributions of NEXT at two different frequencies $f_1$ and $f_2$. The following two-dimensional probability density function of NEXT power transfer function for a single pair combination is calculated in Appendix B. The variables $z_{1,i,j}$ and $z_{2,i,j}$ are the NEXT power transfer function between pair i and j at frequencies $f_1$ and $f_2$ respectively.

$$p_{i,j}(z_{1,i,j}, z_{2,i,j}; f_1, f_2) = \frac{1}{P_{i,j}(f_1)\,P_{i,j}(f_2)\,(1-\rho)}\exp\left(-\frac{z_{1,i,j}}{(1-\rho)P_{i,j}(f_1)} - \frac{z_{2,i,j}}{(1-\rho)P_{i,j}(f_2)}\right)I_0\left(\frac{2\sqrt{\rho}}{1-\rho}\sqrt{\frac{z_{1,i,j}\,z_{2,i,j}}{P_{i,j}(f_1)\,P_{i,j}(f_2)}}\right). \tag{2.27}$$

$I_0(x)$ is the modified Bessel function of order zero [14], and ρ is given by (2.14).

The two-dimensional probability density function of NEXT power sums $z_1 = z(f_1)$ and $z_2 = z(f_2)$ for pair no. 1 is given by:

$$p(z_1, z_2; f_1, f_2) = p_{1,2}(z_{1,1,2}, z_{2,1,2}; f_1, f_2) * p_{1,3}(z_{1,1,3}, z_{2,1,3}; f_1, f_2) * \cdots * p_{1,N}(z_{1,1,N}, z_{2,1,N}; f_1, f_2). \tag{2.28}$$

In this equation * denotes two-dimensional convolution. Let the two independent pairs of variables $(x_1, x_2)$ and $(y_1, y_2)$ have the probability densities $p_x(x_1, x_2)$ and $p_y(y_1, y_2)$. The joint probability density of the sums $z_1 = x_1 + y_1$ and $z_2 = x_2 + y_2$ may then be expressed by the two-dimensional convolution defined by:

$$p(z_1, z_2) = p_x(x_1, x_2) * p_y(y_1, y_2) = \int_{x_1=0}^{\infty}\int_{x_2=0}^{\infty} p_x(x_1, x_2)\,p_y(z_1 - x_1, z_2 - x_2)\,dx_1\,dx_2. \tag{2.29}$$

3 Monte Carlo Simulation of NEXT
A cable is simulated in a computer by dividing the cable into short segments of length ∆x, which is much shorter than the wavelength at the highest frequency under consideration (∆x < λ/20). Statistically independent crosstalk couplings are drawn according to (2.2) for all cable segments and all pair combinations in each cable, and an ensemble of many different cables may be generated. This is illustrated in Figure 3.1.

Near end crosstalk for a specific pair combination is then calculated from Equation (2.1). This gives results as shown in Figure 3.2 for three individual pair combinations of the 10 pair cable (binder group) described in Table 2.1. This shows the typical peaky nature of NEXT that is observed in measurements. Average NEXT for this cable type is also shown in the figure as a broken line. In accordance with (2.7) this is a straight line with a slope of 15 dB/decade of frequency.

Figure 3.2 NEXT for three pair combinations generated by a Monte Carlo simulation (near end crosstalk in dB versus frequency, 0.1–10 MHz; pair combinations 1, 2 and 3, and average NEXT)

The NEXT power sum in different cables of the type described in Table 2.1 has been calculated by simulation, and the result is shown in Figure 3.3 for pair no. 1 of three different cables. Due to the averaging over 9 different pair combinations, the NEXT power sum is less peaky than NEXT for individual pair combinations. The figure also shows the average NEXT power sum, which is a straight line that varies 15 dB/decade with frequency.

Figure 3.3 NEXT power sum for pair no. 1 of three different cables (NEXT power sum in dB versus frequency, 0.1–10 MHz; cables 1, 2 and 3, and average NEXT power sum)

4 Channel Capacity
The theoretical channel capacity of a copper pair can be found by Shannon's channel capacity formula. A realistic estimate of the channel capacity of a pair may be calculated by inserting two factors, $k_{eff}$ and η below, into Shannon's channel capacity formula. This modified channel capacity will be a realistic estimate of the transmission rate of a multicarrier system using adaptive modulation. According to Starr, Cioffi and Silverman [2], the transmission rate for a system operating in the frequency band $[f_l, f_h]$ of a cable of length ℓ may be expressed:

$$R(\ell) = k_{eff}\int_{f_l}^{f_h}\log_2\left(1 + \eta\cdot\frac{\exp(-2\alpha\ell)}{z}\right)df, \tag{4.1}$$

where
z = z(f) is the near end crosstalk power sum of the actual pair in the cable
$k_{eff} \le 1$ is the ratio between user available bitrate and the total bitrate
$\eta = 10^{-m_{dB}/10}$, where $m_{dB}$ is the distance to the Shannon bound in dB

The crosstalk power sum, z, is a random variable, and hence the channel capacity or transmission rate of the system will be a random variable. The moments and probability distribution of the channel capacity can either be calculated by analytical methods or by Monte Carlo simulation.
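The simulation of Section 3, followed by evaluating (4.1) for each simulated cable, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the author's simulator.

The following are choices made here:
- a segment length of 10 m;
- a simulated length of 4 km, beyond which NEXT contributions are negligible;
- a 41-point frequency grid;
- an ensemble of 500 cables;
- a 3 km cable.

The cable constants, the Table 2.1 values, k_eff = 0.9, the 9 dB distance to the Shannon bound and the 1–13 bit/s/Hz limits are taken from the system example. Each segment's coupling is drawn as a zero-mean Gaussian with variance k·∆x, which is one way to discretise the white-noise model (2.2). The function names are hypothetical.

```python
import numpy as np

# Example-cable constants (system example) and Table 2.1.
ALPHA_1M = 15.1 / (20 * np.log10(np.e))  # 15.1 dB/km -> Np/km at 1 MHz
BETA_1M = 31.4                           # rad/km at 1 MHz
NEXT_DB = np.array([54.2, 55.7, 57.1, 57.9, 59.0, 59.0, 59.1, 59.3, 59.6])
K_EFF = 0.9                              # user bitrate / total bitrate
ETA = 10 ** (-9.0 / 10)                  # 9 dB distance to the Shannon bound


def simulate_next_power_sum(F, n_cables, dx=0.01, length=4.0, seed=0):
    """NEXT power sum z(f) of pair 1 for an ensemble of simulated cables.

    F is a frequency grid in MHz, dx the segment length in km (below lambda/20
    up to 500 kHz) and length the simulated cable length in km.
    """
    rng = np.random.default_rng(seed)
    alpha = ALPHA_1M * np.sqrt(F)                         # (2.5)
    beta = BETA_1M * F                                    # (2.6), beta0 = beta
    x = (np.arange(int(round(length / dx))) + 0.5) * dx   # segment midpoints, km
    # k_{N1,i} chosen so that (2.7) reproduces Table 2.1 at 1 MHz
    k = 10 ** (-NEXT_DB / 10) * 4 * ALPHA_1M / BETA_1M**2
    damp = np.exp(-2 * np.outer(x, alpha))                # shape (segments, freqs)
    w_sin = beta * np.sin(2 * np.outer(x, beta)) * damp   # kernel of (2.20)
    w_cos = beta * np.cos(2 * np.outer(x, beta)) * damp   # kernel of (2.21)
    z = np.zeros((n_cables, F.size))
    for k_i in k:  # independent couplings for each disturbing pair
        # white-noise coupling integrated over one segment has variance k_i * dx
        c = rng.normal(0.0, np.sqrt(k_i * dx), size=(n_cables, x.size))
        z += (c @ w_sin) ** 2 + (c @ w_cos) ** 2          # r^2 + q^2, summed as in (2.8)
    return z


def bitrate(z, F, cable_km, clip=True):
    """Rate (4.1) in bit/s for each simulated cable (trapezoidal rule over F)."""
    snr = ETA * np.exp(-2 * ALPHA_1M * np.sqrt(F) * cable_km) / z
    bits = np.log2(1 + snr)                               # bit/s/Hz per frequency
    if clip:  # system example: zero below 1 bit/s/Hz, at most 13 bit/s/Hz
        bits = np.where(bits < 1.0, 0.0, np.minimum(bits, 13.0))
    df_hz = (F[1] - F[0]) * 1e6
    return K_EFF * df_hz * (bits[:, 1:] + bits[:, :-1]).sum(axis=1) / 2


F = np.linspace(0.1, 0.5, 41)                             # 100-500 kHz grid
z = simulate_next_power_sum(F, n_cables=500)
R = bitrate(z, F, cable_km=3.0)
R_av, R_1 = R.mean(), np.percentile(R, 1)                 # ensemble mean and 1 % limit
print(f"average {R_av / 1e6:.2f} Mbit/s, 1 % limit {R_1 / 1e6:.2f} Mbit/s")
```

Averaging z over the ensemble should reproduce the straight-line power sum $P_{ps}(f) \propto F^{1.5}$ of (2.9) to within the Monte Carlo error. R_av and R_1 play the roles of the ensemble average and the 1 % confidence limit described in Section 4.2. With only 500 cables, the 1 % point is a rough estimate.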

4.1 Calculation of Channel Capacity by Analytical Methods
The first and second order moments of the channel capacity can be calculated by means of the probability distributions of NEXT power sum found in Section 2.3. The average channel capacity will be:

$$E\left[R(\ell)\right] = k_{eff}\int_{f_l}^{f_h}\int_0^{\infty} p(z; f)\,\log_2\left(1 + \eta\cdot\frac{\exp(-2\alpha\ell)}{z}\right)dz\,df. \tag{4.2}$$

The variance of the channel capacity will be:

$$\mathrm{var}\left[R(\ell)\right] = E\left[R(\ell)^2\right] - \left\{E\left[R(\ell)\right]\right\}^2, \tag{4.3}$$

which may be expressed:

$$\mathrm{var}\left[R(\ell)\right] = k_{eff}^2\int_{f_l}^{f_h}\int_{f_l}^{f_h}\int_0^{\infty}\int_0^{\infty}\left[p(z_1, z_2; f_1, f_2) - p(z_1; f_1)\,p(z_2; f_2)\right]\log_2\left(1 + \eta\cdot\frac{\exp(-2\alpha_1\ell)}{z_1}\right)\log_2\left(1 + \eta\cdot\frac{\exp(-2\alpha_2\ell)}{z_2}\right)dz_1\,dz_2\,df_1\,df_2. \tag{4.4}$$

The integrals of (4.2) and (4.4) cannot be solved analytically and have to be calculated either by approximations or by numerical integration. In Section 5 it is shown that the channel capacity is approximately Gaussian. An approximate analytical probability distribution of the channel capacity is therefore defined by its mean (4.2) and variance (4.4). However, due to the complexity of both the above integrals and the convolution integral (2.28), Monte Carlo simulation has been chosen in the rest of this paper.

4.2 Calculation of Channel Capacity by Monte Carlo Simulation
An ensemble of M different cables is generated in a Monte Carlo simulation. The channel capacity of cable no. k is found by:

$$R_k(\ell) = k_{eff}\int_{f_l}^{f_h}\log_2\left(1 + \eta\cdot\frac{\exp(-2\alpha\ell)}{|H_{Nps,k}(f)|^2}\right)df, \tag{4.5}$$

where $|H_{Nps,k}(f)|^2$ is the NEXT power sum of this specific cable.

By means of the above formulas, the channel capacity can be calculated for the ensemble of cables. The average bitrate and the variance of the bitrate are estimated by:

$$R_{av}(\ell) = \frac{1}{M}\sum_{k=1}^{M} R_k(\ell), \tag{4.6}$$

$$\sigma_R^2(\ell) = \frac{1}{M}\sum_{k=1}^{M} R_k^2(\ell) - R_{av}^2(\ell). \tag{4.7}$$

The 1 % confidence limit (worst case estimate) of the bitrate, $R_{1\%}(\ell)$, may be estimated from the ensemble $\{R_k(\ell)\}$.

5 System Example
Two-way transmission in pair cables is efficient only at the lowest frequencies. At higher frequencies it is clearly advantageous to use one-way transmission. This is exploited in the standards for ADSL [9] and VDSL [10]. The basic differences between one-way and two-way systems are explained, for instance, in a tutorial paper by Holte [11]. Two-way transmission is used in HDSL systems and in the newly standardised SHDSL system [1].

The advantages of calculating the channel capacity by this new approach are demonstrated by an example. This example is a system which uses a spectrum similar to that of the SHDSL system with maximum rate, 2.3 Mbit/s. The SHDSL system is a baseband system with a 10 dB bandwidth of approximately 400 kHz. The same bandwidth is used in the example. However, because the crosstalk models are not valid below 100 kHz, the band has been shifted up to the frequency interval from 100 kHz to 500 kHz. Hence the capacity estimates will be pessimistic in comparison with a system operating from 0 – 400 kHz. The main assumptions of the system model used in the simulations are listed below:

Bandwidth: 100 kHz < f < 500 kHz
Modulation: Multicarrier modulation (DMT) [12]
Modulation method in each sub-band: Trellis coded M-QAM, 4 ≤ M ≤ 16384
Corresponding bandwidth efficiency: 1 to 13 bit/s/Hz
Distance to Shannon bound: 9 dB (6 dB margin + 3 dB for TCM)
User available bitrate: 90 % of total bitrate
Cable type: 0.6 mm copper cable (22 AWG)
Cable size: 10 pair binder group
Attenuation constant: 15.1 dB/km at 1 MHz, proportional to √f
Phase constant: 31.4 rad/km at 1 MHz, proportional to f
Noise model: Only NEXT between identical systems within a binder group; all pairs used

Average NEXT power sum: 47.9 dB at 1 MHz, proportional to f^1.5 (pair 1)
99 % confidence limit, NEXT power sum: 44.7 dB at 1 MHz (pair 1)

In order to take into account further practical limitations connected to adaptive modulation, the estimation of transmission rate has been slightly modified in comparison with Equation (4.5). The bandwidth efficiency is set to zero at frequencies where the signal to noise ratio gives a bandwidth efficiency below 1 bit/s/Hz, and the bandwidth efficiency is limited to a maximum of 13 bit/s/Hz. In the results presented in this paper, no quantisation of bandwidth efficiency has been used within the allowed interval. An alternative approach would be to quantise the bandwidth efficiency to integer values in order to simulate adaptive modulation which uses two-dimensional trellis coding. The reason for not using quantisation of bandwidth efficiency in this paper is that it introduces more or less random quantisation effects in the results, and this may be disturbing in relative comparisons. It is recommended to use quantisation of bandwidth efficiency in detailed dimensioning cases.

6 Results for Adaptive Modulation

Bitrates for different cable lengths have been estimated by means of the Monte Carlo simulation, based upon the above methods and the crosstalk measurements shown in Table 2.1. The estimated bitrates for 10,000 different cables at cable lengths 1, 2 and 3 km are shown in Figure 6.1 in a normal distribution plot. This result shows that the bitrate of a system with adaptive modulation fits very well with a Gaussian probability distribution.

[Figure 6.1 Cumulative distribution of estimated bitrate for a 0.6 mm cable for different cable lengths. Normal probability plot: probability (0.001 to 0.999) versus bitrate (1.5 to 4 Mbit/s); curves for L = 2 km, L = 3 km and L = 4 km.]

Both the average bitrate of the simulated cables and the 1 % confidence limit of the bitrate are shown in Figure 6.2 as a function of the cable length. The difference between the two upper curves shows the average increase in bitrate that is obtained by using adaptive modulation instead of offering a guaranteed fixed rate. This increase is approximately 0.25 Mbit/s, which is typically 10 % at a range of 3 km. Figure 6.2 also shows the bitrate for the same type of system based upon a traditional estimate of NEXT, meaning that the 99 % confidence limit of NEXT power sum is used at all frequencies. The difference between the two lower curves shows the increase in bitrate that is obtained by using the capacity estimates of the new method instead of the traditional approach. The increase in bitrate is approximately 5 % for this example. The increase in bitrate between the two approaches decreases for longer ranges. The reason is that the useful bandwidth decreases with range due to high attenuation in the upper end of the frequency band. Hence, the number of frequency bands that have independent NEXT will decrease as the cable length increases.

7 Implications for New Systems

The newly standardised SHDSL system [1] is a single carrier baseband system using 16-level pulse amplitude modulation and trellis coded modulation, and hence it has a bandwidth efficiency of less than 6 bit/s/Hz. If a new system is implemented which uses adaptive multicarrier modulation and is spectrally compatible with SHDSL, this new system will obtain significantly higher bitrates for the same cable lengths. The increase in bitrate is due to the following effects:

• higher bandwidth efficiency of multicarrier modulation, in particular at the lowest frequencies;
• less excess bandwidth for multicarrier modulation;
• independent NEXT in different frequency bands as explained in this paper;
• a variable rate system will have a larger average bitrate than a fixed rate system as shown in Figure 6.2.

8 Conclusions

It has been shown how the true channel capacity of a pair cable with respect to NEXT can be calculated both analytically and by simulation. It is demonstrated that by taking into account the frequency variations of near end crosstalk, the

channel capacity for two-way transmission in pair cables may be increased in comparison with traditional methods. The increase in worst-case transmission rate is approximately 5 % for a typical system example. For the same example it is also shown that the average bitrate for a rate adaptive system may be increased by typically 10 % in comparison with a system with a guaranteed minimum rate. This applies to multicarrier systems with adaptive modulation that use two-way transmission and echo cancelling at all frequencies.

[Figure 6.2 Estimated bitrates for two-way transmission as a function of range for a multicarrier system with adaptive modulation on a 0.6 mm pair cable. Bitrate (0 to 4.5 Mbit/s) versus range (1 to 6 km); curves: average bitrate, 1 % confidence limit, traditional estimate.]

Appendix A Probability Distributions of NEXT at a Single Frequency

The simplified notation r = r_{i,j}(f) and q = q_{i,j}(f) is used in this appendix for the real and imaginary part of the NEXT transfer function for one pair combination. Because the NEXT transfer function is a linear function of the coupling coefficients and the coupling coefficients are Gaussian with zero mean, r and q will be jointly Gaussian with zero mean. The variance of the real part will be given by:

$$ E[r^2] = \beta_0^2 \int_0^{\infty}\!\int_0^{\infty} E[\kappa_{N\,i,j}(x)\,\kappa_{N\,i,j}(y)]\cdot\sin(2\beta x)\,\sin(2\beta y)\cdot\exp(-2\alpha x - 2\alpha y)\,dx\,dy. \tag{A.1} $$

Inserting (2.2) gives:

$$ E[r^2] = \beta_0^2\, k_{N\,i,j}\int_0^{\infty}\sin^2(2\beta x)\cdot\exp(-4\alpha x)\,dx. \tag{A.2} $$

Solving the integral gives the result:

$$ \sigma_r^2 = E[r^2] = \frac{\beta_0^2\, k_{N\,i,j}}{8\alpha}\cdot\frac{\beta^2}{\alpha^2+\beta^2}. \tag{A.3} $$

The remaining second order moments of the NEXT transfer function are calculated correspondingly:

$$ E[r\cdot q] = \beta_0^2\, k_{N\,i,j}\int_0^{\infty}\sin(2\beta x)\cdot\cos(2\beta x)\cdot\exp(-4\alpha x)\,dx, \tag{A.4} $$

$$ E[r\cdot q] = \frac{\beta_0^2\, k_{N\,i,j}}{8}\cdot\frac{\beta}{\alpha^2+\beta^2}, \tag{A.5} $$

$$ E[q^2] = \beta_0^2\, k_{N\,i,j}\int_0^{\infty}\cos^2(2\beta x)\cdot\exp(-4\alpha x)\,dx, \tag{A.6} $$

$$ \sigma_q^2 = E[q^2] = \frac{\beta_0^2\, k_{N\,i,j}}{8\alpha}\cdot\frac{2\alpha^2+\beta^2}{\alpha^2+\beta^2}. \tag{A.7} $$

The correlation coefficient is defined by:

$$ \rho_{rq} = \frac{E[r\cdot q]}{\sigma_r\cdot\sigma_q} = \frac{\alpha}{\sqrt{2\alpha^2+\beta^2}}. \tag{A.8} $$

According to Papoulis [13], the probability density of NEXT for a single pair combination may be expressed:

$$ p_{i,j}(r,q) = \frac{1}{2\pi\,\sigma_r\sigma_q\sqrt{1-\rho_{rq}^2}}\exp\left[-\frac{1}{2(1-\rho_{rq}^2)}\left(\frac{r^2}{\sigma_r^2} - 2\rho_{rq}\frac{r\cdot q}{\sigma_r\sigma_q} + \frac{q^2}{\sigma_q^2}\right)\right]. \tag{A.9} $$

Insertion of (A.3), (A.7) and (A.8) and using the notation P_{i,j} = P_{i,j}(f) for average NEXT transfer function gives:

$$ p_{i,j}(r,q) = \frac{\sqrt{\alpha^2+\beta^2}}{\pi\,\beta\, P_{i,j}}\exp\left[-\frac{r^2+q^2}{P_{i,j}} - \frac{2\alpha\, r}{\beta^2 P_{i,j}}\,(\alpha r - \beta q)\right]. \tag{A.10} $$

By a change of variables the probability density may be expressed by the NEXT power transfer function z = r² + q² and the phase angle ϕ = arctan(q/r):

$$ p_{i,j}(z,\varphi) = \frac{\sqrt{\alpha^2+\beta^2}}{2\pi\,\beta\, P_{i,j}}\exp\left[-\frac{z}{P_{i,j}}\cdot\frac{\alpha^2+\beta^2}{\beta^2} - \frac{z\,\alpha\sqrt{\alpha^2+\beta^2}}{\beta^2 P_{i,j}}\cos(2\varphi+\theta)\right], \tag{A.11} $$

where θ = arctan(β/α), z ≥ 0, and 0 ≤ ϕ < 2π. The phase of the signals in different pairs in a cable is random. Hence, the phase angles of NEXT transfer functions are irrelevant. The probability density of the NEXT power transfer function for one pair combination is given by:

$$ p_{i,j}(z) = \int_0^{2\pi} p_{i,j}(z,\varphi)\,d\varphi = \frac{\sqrt{\alpha^2+\beta^2}}{\beta\, P_{i,j}}\exp\left[-\frac{z}{P_{i,j}}\cdot\frac{\alpha^2+\beta^2}{\beta^2}\right]\cdot I_0\!\left(\frac{z\,\alpha\sqrt{\alpha^2+\beta^2}}{\beta^2 P_{i,j}}\right), \quad z \ge 0, \tag{A.12} $$

where I0(·) is the modified Bessel function of the first kind and order zero. Under the assumption α << β this simplifies to an exponential distribution:

$$ p_{i,j}(z) = \frac{1}{P_{i,j}}\exp\left(-\frac{z}{P_{i,j}}\right), \quad z \ge 0. \tag{A.13} $$

Appendix B Joint Probability Distributions of NEXT at Two Different Frequencies

In order to calculate the joint probability distribution of NEXT at two different frequencies f1 and f2, the following simplified notation is introduced in this appendix: r1 = r_{i,j}(f1), q1 = q_{i,j}(f1), r2 = r_{i,j}(f2) and q2 = q_{i,j}(f2). Furthermore, the parameters α, β, β0 are given an additional index 1 or 2 to denote the values at f1 and f2 respectively. The vector of NEXT variables is defined by:

$$ \mathbf{v}_{i,j} = [r_1\ \ q_1\ \ r_2\ \ q_2]^T, \tag{B.1} $$

where T denotes transposition. The vector v_{i,j} will, in correspondence with Appendix A, be Gaussian with zero mean, and the probability density function is given by [13]:

$$ p_{i,j}(\mathbf{v}_{i,j}; f_1, f_2) = \frac{1}{(2\pi)^2\sqrt{|\mathbf{C}_{i,j}|}}\exp\left(-\frac12\,\mathbf{v}_{i,j}^T\,\mathbf{C}_{i,j}^{-1}\,\mathbf{v}_{i,j}\right), \tag{B.2} $$

where C_{i,j} is the covariance matrix of v_{i,j}. The elements of C_{i,j} are calculated in the same way as in Appendix A and are given by:

$$ E[r_k\cdot r_l] = \beta_{0k}\beta_{0l}\,k_{N\,i,j}\int_0^{\infty}\sin(2\beta_k x)\cdot\sin(2\beta_l x)\cdot\exp(-2\alpha_k x - 2\alpha_l x)\,dx, \tag{B.3} $$

$$ E[r_k\cdot q_l] = \beta_{0k}\beta_{0l}\,k_{N\,i,j}\int_0^{\infty}\sin(2\beta_k x)\cdot\cos(2\beta_l x)\cdot\exp(-2\alpha_k x - 2\alpha_l x)\,dx, \tag{B.4} $$

$$ E[q_k\cdot q_l] = \beta_{0k}\beta_{0l}\,k_{N\,i,j}\int_0^{\infty}\cos(2\beta_k x)\cdot\cos(2\beta_l x)\cdot\exp(-2\alpha_k x - 2\alpha_l x)\,dx, \tag{B.5} $$

k, l = 1, 2. The solutions are expressed:

$$ E[r_k\cdot r_l] = \frac{\beta_{0k}\beta_{0l}\,k_{N\,i,j}}{4}\left[\frac{\alpha_k+\alpha_l}{(\alpha_k+\alpha_l)^2+(\beta_k-\beta_l)^2} - \frac{\alpha_k+\alpha_l}{(\alpha_k+\alpha_l)^2+(\beta_k+\beta_l)^2}\right], \tag{B.6} $$

$$ E[r_k\cdot q_l] = \frac{\beta_{0k}\beta_{0l}\,k_{N\,i,j}}{4}\left[\frac{\beta_k-\beta_l}{(\alpha_k+\alpha_l)^2+(\beta_k-\beta_l)^2} + \frac{\beta_k+\beta_l}{(\alpha_k+\alpha_l)^2+(\beta_k+\beta_l)^2}\right], \tag{B.7} $$

$$ E[q_k\cdot q_l] = \frac{\beta_{0k}\beta_{0l}\,k_{N\,i,j}}{4}\left[\frac{\alpha_k+\alpha_l}{(\alpha_k+\alpha_l)^2+(\beta_k-\beta_l)^2} + \frac{\alpha_k+\alpha_l}{(\alpha_k+\alpha_l)^2+(\beta_k+\beta_l)^2}\right]. \tag{B.8} $$

Under the assumption α << β the last term in the equations (B.6) – (B.8) may be neglected, and for this case the covariance matrix is given by:

$$ \mathbf{C}_{i,j} = \begin{bmatrix} a & 0 & c & d \\ 0 & a & -d & c \\ c & -d & b & 0 \\ d & c & 0 & b \end{bmatrix}. \tag{B.9} $$

Using the notation P_{1,i,j} = P_{i,j}(f1) and P_{2,i,j} = P_{i,j}(f2) for average NEXT, the elements of the matrix may be expressed:

$$ a = E[r_1^2] = E[q_1^2] = \frac{P_{1,i,j}}{2}, \tag{B.10} $$

$$ b = E[r_2^2] = E[q_2^2] = \frac{P_{2,i,j}}{2}, \tag{B.11} $$

$$ c = E[r_1\cdot r_2] = E[q_1\cdot q_2] = \sqrt{P_{1,i,j}\cdot P_{2,i,j}}\cdot\frac{(\alpha_1+\alpha_2)\sqrt{\alpha_1\alpha_2}}{(\alpha_1+\alpha_2)^2+(\beta_1-\beta_2)^2}, \tag{B.12} $$

$$ d = E[r_1\cdot q_2] = -E[q_1\cdot r_2] = \sqrt{P_{1,i,j}\cdot P_{2,i,j}}\cdot\frac{(\beta_1-\beta_2)\sqrt{\alpha_1\alpha_2}}{(\alpha_1+\alpha_2)^2+(\beta_1-\beta_2)^2}, \tag{B.13} $$

$$ E[r_1\cdot q_1] = E[r_2\cdot q_2] = 0. \tag{B.14} $$

Gibbs and Addie [3] have neglected the term d. This is obviously a significant term, which means that Gibbs and Addie's calculation of joint probability densities at two frequencies is incorrect.

The determinant of C_{i,j} will be:

$$ |\mathbf{C}_{i,j}| = (a\cdot b - c^2 - d^2)^2 = g^2, \tag{B.15} $$

where g may be expressed:

$$ g = a\cdot b - c^2 - d^2 = \frac{P_{1,i,j}\cdot P_{2,i,j}}{4}\left(1 - \frac{4\,\alpha_1\alpha_2}{(\alpha_1+\alpha_2)^2+(\beta_1-\beta_2)^2}\right). \tag{B.16} $$

The inverse covariance matrix is given by:

$$ \mathbf{C}_{i,j}^{-1} = \frac{1}{g}\begin{bmatrix} b & 0 & -c & -d \\ 0 & b & d & -c \\ -c & d & a & 0 \\ -d & -c & 0 & a \end{bmatrix}. \tag{B.17} $$

The probability density of v_{i,j} may hence be expressed:

$$ p_{i,j}(\mathbf{v}_{i,j}) = \frac{1}{(2\pi)^2 g}\exp\left(-\frac{1}{2g}\left[b(r_1^2+q_1^2) + a(r_2^2+q_2^2) - 2c(r_1 r_2 + q_1 q_2) - 2d(r_1 q_2 - q_1 r_2)\right]\right). \tag{B.18} $$

A change of variables to NEXT power transfer functions and NEXT phase angles is carried out. The new variables are z1 = r1² + q1², ϕ = arctan[q1/r1], z2 = r2² + q2², and θ = arctan[q2/r2], and the probability density may then be written:

$$ p_{i,j}(z_1,\varphi,z_2,\theta) = \frac{1}{(4\pi)^2 g}\exp\left(-\frac{1}{2g}\left[b z_1 + a z_2 - 2\sqrt{z_1 z_2}\sqrt{c^2+d^2}\,\cos(\varphi-\theta+\psi)\right]\right), \tag{B.19} $$

where ψ = arctan[d/c], z1, z2 ≥ 0, and 0 ≤ ϕ, θ < 2π. The phase angles of NEXT transfer functions are irrelevant, and the joint probability density of z1 and z2 will be:

$$ p_{i,j}(z_1,z_2) = \int_0^{2\pi}\!\int_0^{2\pi} p_{i,j}(z_1,\varphi,z_2,\theta)\,d\varphi\,d\theta = \frac{1}{4g}\exp\left(-\frac{b z_1 + a z_2}{2g}\right)\cdot I_0\!\left(\frac{\sqrt{z_1 z_2\,(c^2+d^2)}}{g}\right), \quad z_1, z_2 \ge 0. \tag{B.20} $$

The parameters a, b, c, d are replaced according to (B.10) – (B.13), and the result is expressed by the normalised covariance ρ defined in (2.14). Hence, the joint probability density of the NEXT power transfer function at two frequencies f1 and f2 for one pair combination is given by:

$$ p_{i,j}(z_1,z_2) = \frac{1}{P_{1,i,j}\cdot P_{2,i,j}\cdot(1-\rho)}\exp\left[-\left(\frac{z_1}{P_{1,i,j}} + \frac{z_2}{P_{2,i,j}}\right)\frac{1}{1-\rho}\right]\cdot I_0\!\left(\frac{2}{1-\rho}\sqrt{\frac{\rho\, z_1 z_2}{P_{1,i,j}\cdot P_{2,i,j}}}\right), \quad z_1, z_2 \ge 0. \tag{B.21} $$

References

1 ITU-T. Single-Pair High-Speed Digital Subscriber Line (SHDSL) transceivers. Geneva, 2001. (ITU-T Recommendation G.991.2.)
2 Starr, T, Cioffi, J M, Silverman, P J. Understanding digital subscriber line technology. Upper Saddle River, Prentice Hall, 1999.
3 Gibbs, A J, Addie, R. The covariance of near end crosstalk and its application to PCM system engineering in multipair cable. IEEE Trans. on Commun., COM-27 (2), 1979, 469–477.
4 Klein, W. Die Theorie des Nebensprechens auf Leitungen. Berlin, Springer, 1955.
5 Cravis, H, Crater, T V. Engineering of T1 carrier system repeatered lines. Bell System Techn. Journal, March 1963, 431–486.
6 Bradley, S D. Crosstalk considerations for a 48 channel PCM repeatered line. IEEE Trans. on Communications, COM-23 (7), 1975, 722–728.
7 Holte, N. A Crosstalk Model for Cross-Stranded Cables. Int. Wire & Cable Symp., Cherry Hill, USA, Nov. 1982.
8 Holte, N. Crosstalk in subscriber cables, Final report. Trondheim, SINTEF, 1985. (SINTEF report no. STF44 F85001.) (In Norwegian.)
9 ITU-T. Asymmetric Digital Subscriber Line (ADSL) transceivers. Geneva, 1999. (ITU-T Recommendation G.992.1.)
10 ETSI. Very high speed Digital Subscriber Line (VDSL); Part 1: Functional requirements. Sophia Antipolis, 1999. (ETSI TS 101 270-1, V1.2.1.)
11 Holte, N. Broadband communication in existing copper cables by means of xDSL systems – a tutorial. NORSIG, Trondheim, Norway, October 2001.
12 Bingham, J A C. Multicarrier modulation for data transmission: An idea whose time has come. IEEE Communications Magazine, May 1990, 5–19.
13 Papoulis, A. Probability, random variables, and stochastic processes. 3rd ed., New York, McGraw Hill, 1991.
14 Abramowitz, M, Stegun, I A. Handbook of mathematical functions. New York, Dover, 1970.
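The Monte Carlo procedure of Section 4.2, applied to the system example of Section 5 with the rate limits of Section 6 (bandwidth efficiency set to zero below 1 bit/s/Hz and capped at 13 bit/s/Hz), can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the simulation used in the article. Assumptions not taken from the text: η is read as the 9 dB distance to the Shannon bound, k_eff as the 90 % user-available fraction, and the NEXT power sum of each cable comes from a hypothetical toy model in which 9 equal-mean exponential disturber powers (cf. (A.13)) are redrawn independently in each of a few coarse frequency bands. The function names and the parameter N_BANDS are illustrative only.

```python
import math
import random
import statistics

# System example of Section 5 (values from the list of assumptions).
F_LOW, F_HIGH = 100e3, 500e3      # bandwidth [Hz]
ATT_DB_KM_1MHZ = 15.1             # attenuation at 1 MHz [dB/km], scaled with sqrt(f)
NEXT_DB_1MHZ = 47.9               # average NEXT power sum loss at 1 MHz [dB]; NEXT power ~ f**1.5
MIN_EFF, MAX_EFF = 1.0, 13.0      # bandwidth-efficiency limits of Section 6 [bit/s/Hz]

# Assumptions for this sketch only (not taken from the article):
ETA = 10 ** (-9.0 / 10)           # eta read as the 9 dB distance to the Shannon bound
K_EFF = 0.9                       # k_eff read as the 90 % user-available fraction
N_DISTURBERS = 9                  # 10-pair binder group, all pairs in use
N_BANDS = 8                       # hypothetical number of bands with independent NEXT


def cable_bitrate(length_km, rng, n_freq=200):
    """Bitrate R_k(l) [bit/s] of one random cable, cf. (4.5) with the Section 6 limits."""
    # Toy NEXT model: per band, a mean-normalised sum of equal-mean
    # exponentially distributed disturber powers.
    band_factor = [
        sum(rng.expovariate(1.0) for _ in range(N_DISTURBERS)) / N_DISTURBERS
        for _ in range(N_BANDS)
    ]
    df = (F_HIGH - F_LOW) / n_freq
    total = 0.0
    for i in range(n_freq):
        f_mhz = (F_LOW + (i + 0.5) * df) / 1e6                  # midpoint rule
        att_db = ATT_DB_KM_1MHZ * math.sqrt(f_mhz) * length_km
        gain = 10 ** (-att_db / 10)                             # exp(-2*alpha*l)
        next_ps = 10 ** (-NEXT_DB_1MHZ / 10) * f_mhz ** 1.5 * band_factor[i * N_BANDS // n_freq]
        eff = math.log2(1 + ETA * gain / next_ps)
        if eff < MIN_EFF:
            eff = 0.0                                           # sub-band not used
        total += min(eff, MAX_EFF) * df
    return K_EFF * total


def monte_carlo(length_km, n_cables=1000, seed=1):
    """Return (R_av, sigma_R^2, R_1%) per (4.6), (4.7) and an empirical 1 % quantile."""
    rng = random.Random(seed)
    rates = sorted(cable_bitrate(length_km, rng) for _ in range(n_cables))
    r_av = statistics.fmean(rates)
    var = statistics.fmean(r * r for r in rates) - r_av ** 2
    r_1pct = rates[int(0.01 * n_cables)]    # simple order statistic
    return r_av, var, r_1pct


if __name__ == "__main__":
    for length in (2.0, 3.0, 4.0):
        r_av, var, r_1pct = monte_carlo(length)
        print(f"{length:.0f} km: R_av = {r_av / 1e6:.2f} Mbit/s, "
              f"sigma = {math.sqrt(var) / 1e6:.3f} Mbit/s, R_1% = {r_1pct / 1e6:.2f} Mbit/s")
```

Because the NEXT model here ignores the measured crosstalk data and the frequency correlation described by (B.21), the numbers it prints should not be compared quantitatively with Figure 6.2; the sketch only shows how the ensemble statistics (4.5) – (4.7) and the 1 % limit are obtained.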

Bounds on the Average Spectral Efficiency of Adaptive Coded Modulation

KJELL J. HOLE

An introduction to adaptive coded modulation (ACM) was given in an earlier paper by Hole and Øien [4]. It was shown that ACM may achieve a large average spectral efficiency (ASE) on wireless channels with slowly varying frequency-flat fading. This paper presents new upper bounds on the ASE of ACM with maximum likelihood decoding and sequential decoding. The bounds are compared to the theoretical maximum ASE. It is found that this theoretical maximum provides an optimistic upper bound on the achievable ASE of practical ACM schemes. However, the new bounds indicate that ACM may still provide a large ASE.

Kjell Jørgen Hole (41) received his BSc, MSc and PhD degrees from the University of Bergen in 1984, 1987 and 1991, respectively. He is currently Senior Research Scientist at the Department of Telecommunications at the Norwegian University of Science and Technology (NTNU) in Trondheim. His research interests are in the areas of coding theory and wireless communications.
[email protected]

I. Introduction

Time-varying channel conditions are an important feature of most wireless communication systems. Future systems must therefore exhibit a high degree of adaptivity on many levels to support traffic flows with large information rates (measured in transmitted information bits per second). Examples of such adaptivity are: power control, code adaptation, bandwidth adaptation, antenna adaptation, and protocol adaptation [1], [2]. This paper deals with code adaptation.

The spectral efficiency of a wireless channel or link is equal to the information rate per unit bandwidth. When the instantaneous spectral efficiency varies due to code adaptation, the average spectral efficiency (ASE) should be maximized. Goldsmith and Varaiya [3] have determined the maximum ASE of a single-user communication system with a slowly varying frequency-flat fading channel, fixed transmit signal power, and perfect channel-state information (CSI) available at the transmitter and receiver.

It is shown in [3] that the maximum ASE may be obtained by a theoretical adaptive coding scheme. Adaptive coded modulation (ACM) with fixed transmit power may be a practical adaptive transmission technique for frequency-flat fading channels with available CSI [4]–[11]. ACM may utilize e.g. a set of classical trellis codes [12]–[17] or a set of turbo-trellis codes [18]–[21] in which each code is designed for good performance on an additive white Gaussian noise (AWGN) channel. If the codes in a set are based on quadrature amplitude modulation (QAM) constellations of different sizes [22], then a low bit error rate (BER) and large ASE may be obtained simultaneously by switching adaptively between the codes according to the CSI. The goal of this paper is to present new upper bounds on the ASE of single-user wireless ACM/QAM channels with frequency-flat fading.

We begin by introducing, in Section II, a communication system model utilizing ACM/QAM on an arbitrary single-user flat-fading channel. Section III then upper bounds the ASE of ACM/QAM both for maximum likelihood decoding (MLD) and for sequential decoding (SD). As an example, the MLD bound is applied to single-user Nakagami fading channels, and both the MLD and SD bounds are compared to the theoretical maximum ASE for Nakagami fading channels [23]. Conclusions are drawn in Section IV.

II. Fading Channels and ACM

The discrete-time system model consists of a transmitter and a receiver communicating over a single-user wireless channel degraded by multipath fading. The fading is assumed to remain nearly constant over hundreds of channel symbols. Pilot symbols are sent repeatedly over the channel to ensure that the receiver is able to fully compensate for the amplitude and phase variations in the received signal, i.e. we assume ideal channel estimation and coherent detection.1) Hence, we may model the stationary and ergodic fading amplitude α(t) (≥ 0) as a stochastic variable with real values.

Let the transmitted signal have complex baseband representation x(t) at time index t ∈ {0, 1, 2, ...}. The received baseband signal is then given by y(t) = α(t)x(t) + n(t) where n(t) denotes complex AWGN. The real and imaginary parts of the noise are statistically independent, both with variance (N0B)/2 where N0 [W/Hz] is the total one-sided noise power spectral density and B [Hz] is the one-sided channel bandwidth.

Denote the average transmit power by S [W]. The instantaneous received carrier-to-noise ratio (CNR) is represented by the stochastic variable γ(t) = α²(t)S/(N0B). (In the sequel we omit the time reference t and refer to α and γ.) Let p(γ) be the probability density function (pdf)

1) The overhead bandwidth associated with the pilot symbols is ignored.

of the CNR γ. We only assume that p(γ) is a continuous function on the interval [0, ∞). The CNR has expectation E[γ] = γ̄ = ΩS/(N0B), where E[α²] = Ω is the average received power gain. We assume that there exists a noiseless and zero-delay feedback channel from the receiver to the transmitter such that both the transmitter and receiver have perfect knowledge of the instantaneous received CNR γ at all times.

Let N quantization levels (or fading regions) represent the time-varying received CNR γ. When ACM is used, one trellis code designed to combat AWGN is assigned to each fading region [4]–[11]. The N fading regions are defined by the thresholds 0 < γ1 < γ2 < ... < γN+1 = ∞. Code n, n ∈ {1, 2, ..., N}, is utilized every time the instantaneous received CNR γ falls in region n, i.e. when γn ≤ γ < γn+1. The ACM system is designed such that the BER never exceeds a given target maximum BER0 for any CNR γ. Details on how to choose thresholds {γn}, n = 1, ..., N + 1, to achieve BER ≤ BER0 for a given set of N codes can be found in [9].

Fading region n = 1 represents the smallest values of γ for which information is transmitted. No information is sent when γ < γ1 simply because the channel quality is too bad to successfully transmit any information with the available codes. Hence, an ACM scheme experiences an outage during which information must be buffered at the transmitter end. The probability of outage is

$$ P_{out}(\gamma_1) = \int_0^{\gamma_1} p(\gamma)\,d\gamma = 1 - \int_{\gamma_1}^{\infty} p(\gamma)\,d\gamma. \tag{1} $$

III. Bounds on ASE of ACM

This section upper bounds the ASE of ACM utilizing trellis codes for QAM signal constellations.

III.A Bound for MLD

Let 4 ≤ M1 < M2 < ... < MN denote the number of symbols in N two-dimensional QAM constellations of growing size, and let code n be based on the constellation with Mn symbols. For some small fixed L ∈ {1, 2, ...}, the encoder for code n accepts L · log2(Mn) – c information bits at each time index k = L · t ∈ {0, L, 2L, ...} and generates L · log2(Mn) coded bits, 1 ≤ c < L · log2(Mn). The coded bits specify L modulation symbols in the nth QAM constellation. These symbols are transmitted at time indices k, k + 1, ..., k + L – 1. The L two-dimensional symbols can be viewed as one 2L-dimensional symbol, and for this reason the code is said to be a 2L-dimensional trellis code.

Assuming Nyquist signaling, the time used to transmit one two-dimensional QAM symbol is Ts = 1/B [s]. Since the number of information bits per QAM symbol is log2(Mn) – c/L, the information rate of code n is Rn = (log2(Mn) – c/L)/Ts [bits/s], and the spectral efficiency is Rn/B = log2(Mn) – c/L [bits/s/Hz]. Consequently, the ASE of all N codes is [9], [24]

$$ \mathrm{ASE}(\{\gamma_n\}) = \sum_{n=1}^{N}\frac{R_n}{B}\cdot P(\gamma_n,\gamma_{n+1}) = \sum_{n=1}^{N}\left(\log_2(M_n) - c/L\right)P(\gamma_n,\gamma_{n+1}) \quad [\text{bits/s/Hz}] \tag{2} $$

where

$$ P(\gamma_n,\gamma_{n+1}) = \int_{\gamma_n}^{\gamma_{n+1}} p(\gamma)\,d\gamma \tag{3} $$

is the probability of the instantaneous CNR γ falling in fading region n.

In principle, infinitely many sets of 2L-dimensional codes are available. We restrict ourselves to code sets which contain N codes and consider all such sets having the same parameters c, L, and {Mn}, n = 1, ..., N. In general, the code sets will correspond to different sets of thresholds {γn} for a common target BER0 [9]. A change of code set therefore implies changing one or more of the associated thresholds. To determine which code set maximizes the ASE under the given conditions, we first prove the following result.

Lemma 1: For given c, L, BER0, and {Mn}, the ASE defined by (2) and (3) does not decrease but may increase when a threshold γn is reduced.

Proof: First, let n > 1 and reduce the nth threshold from γn to γ̂n such that γn–1 ≤ γ̂n < γn. Keep the other thresholds fixed. We study the sum of the (n – 1)th and nth terms in (2). Before the threshold is changed, the sum is given by

$$ Q = \tilde R_{n-1}\cdot P(\gamma_{n-1},\gamma_n) + \tilde R_n\cdot P(\gamma_n,\gamma_{n+1}) $$

for R̃j = Rj/B = log2(Mj) – c/L, j = n – 1, n, while after the threshold is changed the sum becomes

$$ \hat Q = \tilde R_{n-1}\cdot P(\gamma_{n-1},\hat\gamma_n) + \tilde R_n\cdot P(\hat\gamma_n,\gamma_{n+1}). $$

The pdf p(γ) in (3) is assumed to be a continuous function and we may write

$$ P(\gamma_{n-1},\gamma_n) = P(\gamma_{n-1},\hat\gamma_n) + P(\hat\gamma_n,\gamma_n) $$

$$ P(\hat\gamma_n,\gamma_{n+1}) = P(\hat\gamma_n,\gamma_n) + P(\gamma_n,\gamma_{n+1}) $$

The difference D = Q̂ – Q then becomes

$$ D = \left[\tilde R_n - \tilde R_{n-1}\right]\cdot P(\hat\gamma_n,\gamma_n) $$

where R̃n > R̃n–1 > 0. Since P(γ̂n, γn) ≥ 0, we have D ≥ 0. When n = 1 the difference reduces to

$$ D = \tilde R_1\left[P(\hat\gamma_1,\gamma_2) - P(\gamma_1,\gamma_2)\right] = \tilde R_1\cdot P(\hat\gamma_1,\gamma_1) $$

where R̃1 > 0. Since P(γ̂1, γ1) ≥ 0, we again have D ≥ 0. Q.E.D.

It is possible to lower bound the thresholds {γn} which can be used for a given BER0. Code n is only active when the instantaneous CNR falls in fading region n, i.e. γ ∈ [γn, γn+1). Hence, for a given region n, the fading channel can be approximated by a bandlimited AWGN channel with CNR at least equal to γn. The BER must never exceed the target maximum BER0, and code n must therefore achieve BER ≤ BER0 on an AWGN channel of CNR γn [9]. The maximum spectral efficiency of this AWGN channel is equal to the Shannon capacity, Cn [bits/s], divided by the bandwidth B [Hz]:

$$ \frac{C_n}{B} = \log_2(1+\gamma_n) \quad [\text{bits/s/Hz}]. $$

Since the spectral efficiency of code n must satisfy Rn/B ≤ Cn/B, we have

$$ \gamma_n \ge \frac{M_n}{2^{c/L}} - 1 \overset{\mathrm{def}}{=} \gamma_n^{MLD}, \quad n = 1, 2, \dots, N. $$

Note that the spectral efficiency Rn/B is obtained for a small finite target BER0 while an arbitrarily small BER is assumed for the maximum spectral efficiency Cn/B. Hence, the lower bound $\gamma_n^{MLD}$ is in fact valid for any fixed BER0. Furthermore, observe that the proof of the Shannon capacity Cn (see e.g. [25, Sec. 10.1]) does not require MLD; however, we use the superscript MLD to denote that $\gamma_n^{MLD}$ is the minimum possible threshold also for MLD.

As an example, let the number of symbols in the nth QAM constellation be Mn = 2^(n+1). The spectral efficiency Rn/B = n + 1 – c/L and the minimum threshold $\gamma_n^{MLD} = 2^{n+1-c/L} - 1$ are listed (in dB) in Table 1 for c = 1, L = 1, 2, and n = 1, 2, ..., 11.

Table 1 Spectral efficiency Rn/B [bits/s/Hz] and minimum thresholds $\gamma_n^{MLD}$ [dB] and $\gamma_n^{SD}$ [dB] for c = 1 and various values of n and L

 n    Mn     L   Rn/B    MLD [dB]   SD [dB]
 1    4      1   1.0      0.00       2.39
             2   1.5      2.62       4.80
 2    8      1   2.0      4.77       6.80
             2   2.5      6.68       8.61
 3    16     1   3.0      8.45      10.30
             2   3.5     10.13      11.94
 4    32     1   4.0     11.76      13.53
             2   4.5     13.35      15.09
 5    64     1   5.0     14.91      16.63
             2   5.5     16.46      18.17
 6    128    1   6.0     17.99      19.69
             2   6.5     19.52      21.21
 7    256    1   7.0     21.04      22.73
             2   7.5     22.55      24.24
 8    512    1   8.0     24.07      25.75
             2   8.5     25.58      27.26
 9    1024   1   9.0     27.08      28.76
             2   9.5     28.59      30.27
 10   2048   1   10.0    30.10      31.78
             2   10.5    31.61      33.28
 11   4096   1   11.0    33.11      34.79
             2   11.5    34.62      36.30

Theorem 1: For given c, L, and {Mn}, the ASE defined by (2) and (3) is maximized for the minimum thresholds $\gamma_n^{MLD}$, n = 1, 2, ..., N.

Proof: The ASE defined by (2) and (3) has an absolute maximum at $(\gamma_1^{MLD}, \dots, \gamma_N^{MLD})$. To see this, reduce the threshold γ1 to its minimum value $\gamma_1^{MLD}$. It follows from Lemma 1 that the ASE is not reduced. Repeat the procedure for each of the remaining thresholds γ2, γ3, ..., γN. Q.E.D.

Since the outage probability $P_{out}(\gamma_1)$ in (1) does not increase but may decrease when the threshold γ1 is reduced, we also have

Corollary 1: The minimum outage probability in an ACM/QAM system is $P_{out}(\gamma_1^{MLD})$.

To obtain an adaptive codec with ASE close to the maximum given by Theorem 1, the information rate Rn of each trellis code n must be close to the Shannon capacity Cn of an AWGN channel with CNR equal to $\gamma_n^{MLD}$. The capacity Cn is obtained with a Gaussian distribution over continuous-valued channel input symbols. ACM utilizes trellis codes based on equiprobable discrete-valued QAM symbols. However, there exist various techniques, called constellation shaping techniques, that achieve a more Gaussian-like distribution of the coded QAM symbols [17].
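The minimum thresholds and the ASE (2) are easy to check numerically. The Python sketch below regenerates the MLD column of Table 1 from $\gamma_n^{MLD} = M_n/2^{c/L} - 1$ and evaluates (2) – (3) for a given set of thresholds. As an illustrative assumption, the region probabilities are taken for Rayleigh fading, where γ is exponentially distributed with mean γ̄, so that P(γn, γn+1) = exp(–γn/γ̄) – exp(–γn+1/γ̄); this case is chosen here only because it has a simple closed form. All function names are hypothetical.

```python
import math


def mld_threshold(m_n, c=1, big_l=1):
    """Minimum MLD threshold gamma_n^MLD = M_n / 2**(c/L) - 1 as a linear CNR."""
    return m_n / 2 ** (c / big_l) - 1


def to_db(x):
    """Convert a linear power ratio to dB."""
    return 10 * math.log10(x)


def ase(thresholds, rates, region_prob):
    """ASE (2): sum over regions of R_n/B times P(gamma_n, gamma_{n+1}).

    thresholds: gamma_1 < ... < gamma_N (linear); gamma_{N+1} is taken as +inf.
    rates: spectral efficiencies R_n/B [bits/s/Hz].
    region_prob: function (a, b) -> probability that a <= gamma < b, cf. (3).
    """
    edges = list(thresholds) + [math.inf]
    return sum(r * region_prob(edges[i], edges[i + 1]) for i, r in enumerate(rates))


def rayleigh_region_prob(mean_cnr):
    """Region probability (3) for an exponentially distributed CNR (Rayleigh fading)."""
    # math.exp(-math.inf) == 0.0, so the open top region needs no special case.
    return lambda a, b: math.exp(-a / mean_cnr) - math.exp(-b / mean_cnr)


if __name__ == "__main__":
    c = 1
    for n in range(1, 12):
        m_n = 2 ** (n + 1)
        for big_l in (1, 2):
            r = math.log2(m_n) - c / big_l
            print(f"n={n:2d} M={m_n:5d} L={big_l} R/B={r:4.1f} "
                  f"gamma_MLD={to_db(mld_threshold(m_n, c, big_l)):6.2f} dB")
    gammas = [mld_threshold(2 ** (n + 1)) for n in range(1, 12)]
    rates = [math.log2(2 ** (n + 1)) - 1 for n in range(1, 12)]   # c = L = 1
    print(f"ASE, Rayleigh, 20 dB: {ase(gammas, rates, rayleigh_region_prob(100.0)):.2f} bits/s/Hz")
```

Raising every threshold in such a run can only lower the computed ASE, which is the numerical counterpart of Lemma 1 and Theorem 1.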

As an example, we apply Theorem 1 to maximize the ASE of Nakagami multipath fading channels whose fading amplitude α has a Nakagami pdf controlled by a real parameter m = Ω²/Var[α²] with constraint m ≥ 1/2 [26, p. 48].2) Here, Var[·] denotes the variance. The probability of γ falling in the nth fading region (3) is equal to [24]

$$ P(\gamma_n,\gamma_{n+1}) = \frac{\Gamma\!\left(m,\dfrac{m\gamma_n}{\bar\gamma}\right) - \Gamma\!\left(m,\dfrac{m\gamma_{n+1}}{\bar\gamma}\right)}{\Gamma(m)}, \tag{4} $$

where Γ(·, ·) is the complementary incomplete gamma function [27, Eq. (8.350.2)] and Γ(m) is the gamma function, which equals Γ(m) = (m – 1)! when m is a positive integer.

2) Nakagami is a general fading distribution that reduces to Rayleigh for m = 1. It also approximates the Rician distribution for m > 1, and it can approach the log-normal distribution.

Let m ∈ {1, 2, 4}, c = L = 1, and Mn = 2^(n+1) for n = 1, ..., 11. From Theorem 1, the maximum achievable ASE of ACM/QAM is obtained from (2) and (4) using the N = 11 minimum thresholds $\gamma_n^{MLD}$ for L = 1 in Table 1. The maximum achievable ASE is plotted in Figure 1. Observe that a large ASE is achievable even for Rayleigh fading (m = 1). The difference between the theoretical maximum ASE [23, Eq. (23)], often denoted MASE, and the maximum achievable ASE of ACM/QAM is plotted in Figure 2. The curves show that the use of N = 11 codes (with Shannon capacity performance on continuous-input AWGN channels) results in a maximum achievable ASE about 0.5 bit/s/Hz less than the theoretical maximum ASE. Nearly the same difference was found for L = 2.

[Figure 1 Maximum achievable ASE of ACM/QAM obtained from (2) and (4) for minimum thresholds $\gamma_n^{MLD}$ in Table 1, c = L = 1, n = 1, 2, ..., 11. Plot "ASE bound for MLD, c=L=1": bits/s/Hz (3 to 9) versus average CNR in dB (12 to 30); curves for m = 1, 2, 4.]

[Figure 2 Difference between theoretical maximum ASE and maximum achievable ASE of ACM/QAM. Plot "Difference between MASE and discrete-rate MLD bound, c=L=1": bits/s/Hz (0.495 to 0.515) versus average CNR in dB (12 to 30); curves for m = 1, 2, 4.]

III.B Bound for Sequential Decoding

In practice, MLD with the Viterbi algorithm is only implemented for classical QAM trellis codes with relatively short constraint lengths. For a set of codes with large constraint lengths, we may use SD instead [28]–[30]. The channel cutoff rate for each code n, R0(γn) [bits/s/Hz], is then the maximum spectral efficiency at which the average number of computations per decoded information bit is bounded for the lower threshold γn. For the bandlimited complex AWGN channel with Gaussian distributed input, we have [28]

$$ R_0(\gamma_n) = (\log_2 e)\left[1 + \frac{\gamma_n}{2} - \sqrt{1+\left(\frac{\gamma_n}{2}\right)^2}\,\right] + \log_2\!\left[\frac12\left(1 + \sqrt{1+\left(\frac{\gamma_n}{2}\right)^2}\,\right)\right] \quad [\text{bits/s/Hz}]. $$

Theorem 2: If SD with a finite average number of computations per decoded bit is to be used, then the ASE defined by (2) and (3) is maximized for the thresholds

$$ \gamma_n^{SD} \overset{\mathrm{def}}{=} \min\left\{\gamma_n : R_0(\gamma_n) \ge \log_2(M_n) - c/L\right\}, \quad n = 1, 2, \dots, N, \tag{5} $$

given c, L, and {Mn}.

Proof: The theorem is an immediate consequence of Lemma 1 and the following observation: When a set of N trellis codes based on QAM constellations with Mn symbols is used in conjunction with SD, the spectral efficiency of code n given by Rn/B = log2(Mn) – c/L must satisfy Rn/B ≤ R0(γn) where γn is the lower threshold of fading region n. Q.E.D.

Corollary 2: The minimum outage probability for SD is $P_{out}(\gamma_1^{SD})$.

A variation on Newton's method was used to determine the thresholds $\gamma_n^{SD}$ in (5). These values are tabulated in the rightmost column of Table 1 for c = 1, L = 1, 2 and Mn = 2^(n+1), n = 1, ..., 11. For two-dimensional codes (L = 1) and Nakagami fading with m ∈ {1, 2, 4}, the difference between the theoretical maximum ASE [23, Eq. (23)] and the achievable ASE (2), (4) is plotted in Figure 3. Nearly 1.06 bits/s/Hz is lost compared to the theoretical maximum ASE. The same is true for four-dimensional codes (L = 2). Consequently, assuming optimally performing codes, an increase in ASE of about 0.56 bit/s/Hz may be obtained by utilizing MLD instead of SD where MLD is feasible.
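The SD thresholds in (5) only require inverting the cutoff rate R0(γ), which increases monotonically with γ. The Python sketch below finds $\gamma_n^{SD}$ by plain bisection (a deliberately simple substitute for the Newton-type iteration used in the article) and evaluates the ASE (2) with the Nakagami region probabilities (4) for both the MLD and the SD thresholds. It assumes integer m, for which Γ(m, x)/Γ(m) = e^(–x) Σ x^k/k! with k = 0, ..., m – 1; this covers the cases m ∈ {1, 2, 4} shown in the figures. The function names are illustrative.

```python
import math

LOG2E = math.log2(math.e)


def cutoff_rate(gamma):
    """R0(gamma) [bits/s/Hz] for the bandlimited complex AWGN channel with Gaussian input."""
    root = math.sqrt(1 + (gamma / 2) ** 2)
    return LOG2E * (1 + gamma / 2 - root) + math.log2(0.5 * (1 + root))


def sd_threshold(rate, lo=1e-6, hi=1e6, iters=200):
    """Smallest gamma with R0(gamma) >= rate, found by bisection (R0 is increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cutoff_rate(mid) >= rate:
            hi = mid
        else:
            lo = mid
    return hi


def nakagami_tail(m, x):
    """Regularised upper incomplete gamma Gamma(m, x)/Gamma(m); valid for integer m >= 1."""
    if math.isinf(x):
        return 0.0
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(m))


def nakagami_ase(thresholds, rates, m, mean_cnr):
    """ASE (2) with region probabilities (4); the top region extends to infinity."""
    edges = list(thresholds) + [math.inf]
    return sum(
        r * (nakagami_tail(m, m * edges[i] / mean_cnr)
             - nakagami_tail(m, m * edges[i + 1] / mean_cnr))
        for i, r in enumerate(rates)
    )


if __name__ == "__main__":
    rates = [float(n) for n in range(1, 12)]      # R_n/B = n for M_n = 2**(n+1), c = L = 1
    mld = [2 ** r - 1 for r in rates]             # gamma_n^MLD = 2**(R_n/B) - 1
    sd = [sd_threshold(r) for r in rates]         # gamma_n^SD from (5)
    for n, (g_mld, g_sd) in enumerate(zip(mld, sd), start=1):
        print(f"n={n:2d}  MLD {10 * math.log10(g_mld):6.2f} dB   SD {10 * math.log10(g_sd):6.2f} dB")
    gbar = 10 ** (20 / 10)                        # average CNR of 20 dB
    for m in (1, 2, 4):
        gap = nakagami_ase(mld, rates, m, gbar) - nakagami_ase(sd, rates, m, gbar)
        print(f"m={m}: ASE(MLD) - ASE(SD) at 20 dB = {gap:.2f} bits/s/Hz")
```

Because R0(γ) < log2(1 + γ) for every γ > 0, each SD threshold lies above the corresponding MLD threshold, which is what produces the ASE gap discussed above.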

[Figure 3 Difference between theoretical maximum ASE and achievable ASE of ACM/QAM with SD, obtained from (2) and (4) for minimum thresholds $\gamma_n^{SD}$ in Table 1, c = L = 1, n = 1, 2, ..., 11. Plot "Difference between MASE and discrete-rate SD bound, c=L=1": bits/s/Hz (0.99 to 1.06) versus average CNR in dB (12 to 30); curves for m = 1, 2, 4.]

IV. Conclusions

For single-user systems with frequency-flat fading, it seems clear that the theoretical maximum ASE [3], [23] provides an optimistic upper bound on the achievable ASE of practical ACM codecs. It was found that any set of two-dimensional or four-dimensional trellis codes for MLD has an ASE at least 0.5 bit/s/Hz less than the maximum ASE (see Figure 2) on a Nakagami fading channel. However, ACM may still provide a large ASE.

The ASE of optimally performing ACM with MLD is about 0.56 bit/s/Hz larger than the ASE of optimally performing ACM with SD. A preliminary investigation indicates that this difference may be smaller for sets of known trellis codes. Hence, ACM with SD may be an interesting alternative to ACM with MLD. More research is needed to determine the real-world performance of ACM with SD on wireless channels.

References

1 Meyr, H. Algorithm design and system implementation for advanced wireless communications systems. In: Proc. International Zurich Seminar on Broadband Communications (IZS'2000), Zürich, Switzerland, Feb. 2000.
2 Bose, V, Wetherall, D, Guttag, J. Next century challenges: RadioActive Networks. In: Proc. ACM/IEEE International Conference on Mobile Computing and Networking (MOBICOM'99), Seattle, WA, Aug. 1999.
3 Goldsmith, A J, Varaiya, P P. Capacity of fading channels with channel side information. IEEE Trans. Inform. Theory, 43 (6), 1986–1992, 1997.
4 Hole, K J, Øien, G E. Adaptive coding and modulation: A key to bandwidth-efficient multimedia communications in future wireless systems. Telektronikk, 97 (1), 49–57, 2001.
5 Goldsmith, A J, Chua, S-G. Adaptive coded modulation for fading channels. IEEE Trans. Commun., 46 (5), 595–602, 1998.
6 Lau, V K N, Macleod, M D. Variable rate adaptive trellis coded QAM for high bandwidth efficiency applications in Rayleigh fading channels. In: Proc. 48th IEEE Vehicular Technology Conference (VTC'98), Ottawa, Canada, May 1998, 348–352.
7 Goldsmith, A J. Adaptive modulation and coding for fading channels. In: Proc. IEEE Inform. Theory and Commun. Workshop, Kruger National Park, South Africa, June 1999, 24–26.
8 Goeckel, D L. Adaptive coding for time-varying channels using outdated fading estimates. IEEE Trans. Commun., 47 (6), 844–855, 1999.
9 Hole, K J, Holm, H, Øien, G E. Adaptive multidimensional coded modulation over flat fading channels. IEEE J. Select. Areas Commun., 18 (7), 1153–1158, 2000.

Telektronikk 1.2002 51 10 Hole, K J, Øien, G E. Spectral efficiency of 22 Webb, W T, Hanzo, L. Modern Quadrature adaptive coded modulation in urban micro- Amplitude Modulation. Graham Lodge, Lon- cellular networks. IEEE Trans. Veh. Tech- don, Pentech Press, 1994. nol., 50 (1), 205–222, 2001. 23 Alouini, M-S, Goldsmith, A J. Capacity of 11 Vishwanath, S, Goldsmith, A J. Exploring Nakagami multipath fading channels. In: adaptive turbo coded modulation for flat fad- Proc. 47th IEEE Vehicular Technology Con- ing channels. In: Proc. 52nd IEEE Vehicular ference (VTC’97), Phoenix, Arizona, May Technology Conference (2000 VTC-Fall), 1997, 358–362. Boston, MA, Sept. 2000. 24 Alouini, M-S, Goldsmith, A J. Adaptive M- 12 Ungerboeck, G. Channel coding with multi- QAM modulation over Nakagami fading level/phase signals. IEEE Trans. Inform. channels. In: Proc. 6th Communications Theory, IT-28 (1), 55–67, 1982. Theory Mini-Conference (CTMC VI) in con- junction with IEEE Global Communications 13 Forney, G D Jr. et al. Efficient modulation Conference (GLOBECOM’97), Phoenix, for band-limited channels. IEEE J. Select. Arizona, Nov. 1997, 218–223. Areas Commun., SAC-2 (5), 632–647, 1984. 25 Cover, T M, Thomas, J A. Elements of Infor- 14 Wei, L-F. Trellis-coded modulation with mation Theory. New York, John Wiley, 1991. multidimensional constellations. IEEE Trans. Inform. Theory, 33 (4), 483–501, 26 Stüber, G L. Principles of Mobile Communi- 1987. cation. Norwell, MA, Kluwer Academic Publishers, 1996. 15 Pietrobon, S S, Costello, D J Jr. Trellis cod- ing with multidimensional QAM signal sets. 27 Gradshteyn, I S, Ryzhik, I M. Table of Inte- IEEE Trans. Inform. Theory, 39 (2), grals, Series, and Products. San Diego, CA, 325–336, 1993. Academic Press, fifth ed., 1994.

16 Wang, F-Q, Costello, D J Jr. New rotation- 28 Wang, F-Q, Costello, D J Jr. Erasure-free ally invariant four-dimensional trellis codes. sequential decoding of trellis codes. IEEE IEEE Trans. Inform. Theory, 42 (1), Trans. Inform. Theory, 40 (6), 1803–1817, 291–300, 1996. 1994.

17 Forney, G D Jr., Ungerboeck, G. Modulation 29 Couturier, S, Costello, D J Jr., Wang, F-Q. and coding for linear Gaussian channels. Sequential decoding with trellis shaping. IEEE Trans. Inform. Theory, 44 (6), IEEE Trans. Inform. Theory, 41 (6), 2384–2415, 1998. 2037–2040, 1995.

18 Le Goff, S, Glavieux, A, Berrou, C. Turbo- 30 Wang, F-Q, Costello, D J Jr. Sequential codes and high spectral e±ciency modula- decoding of trellis codes at high spectral tion. In: Proc. IEEE Int. Conf. Commun. efficiencies. IEEE Trans. Inform. Theory, (ICC’94), New Orleans, Louisiana, May 43 (6), 2013–2019, 1997. 1994, 645–649.

19 Robertson, P, Wörz, T. A novel bandwidth e±cient coding scheme employing turbo codes. In: Proc. IEEE Int. Conf. Commun. (ICC’96), Dallas, Texas, June 1996, 962–967.

20 S. Benedetto, S et al. Parallel concatenated trellis-coded modulation. In: Proc. IEEE Int. Conf. Commun. (ICC’96), Dallas, Texas, June 1996, 974–978.

21 Vucetic, B, Yuan, J. Turbo Codes : Princi- ples and Applications. Norwell, MA, Kluwer, 2000.

Breaking the Barriers of Shannon's Capacity: An Overview of MIMO Wireless Systems

DAVID GESBERT AND JABRAN AKHTAR

First appearing a few years ago in a series of information theory articles published by researchers at Bell Labs, multiple-input multiple-output (MIMO) systems have quickly become one of the most popular topics among wireless communication researchers and earned a spot on today's 'hottest wireless technology' list. In this overview paper, we revisit the fundamentals of MIMO wireless systems and explain the reasons for their success, driven mainly by the prospect of radio transmission capacities far greater than those available today. We also describe some practical transmission techniques used to signal data over MIMO links and address channel modeling issues. The challenges and limitations posed by deploying this technology in realistic propagation environments are discussed as well.

David Gesbert (32) holds an MSc from the Nat. Inst. for Telecommunications, Evry, France, 1993, and a PhD from Ecole Nat. Superieure des Telecommunications, Paris, 1997. He has worked with France Telecom Research and been a postdoctoral fellow in the Information Systems Lab., Stanford University. In 1998 he took part in the founding team of Iospan Wireless Inc., San Jose, a company promoting high-speed wireless Internet access networks. In 2001 he joined the Signal Processing Group at the Univ. of Oslo as adjunct associate professor. Dr. Gesbert's research interests are in the area of high-speed wireless data / IP networks, smart antennas and MIMO, link layer and system optimization.
[email protected]

Jabran Akhtar (25) is currently a PhD student at the University of Oslo. His research interests include MIMO systems and space-time coding techniques.
[email protected]

I. Introduction
Digital communication over MIMO (multiple-input multiple-output) links, sometimes called "volume to volume" wireless links, has emerged as one of the most promising research areas in wireless communications. It also figures prominently on the list of hot technologies that may help resolve the traffic capacity bottlenecks in the forthcoming high-speed broadband wireless Internet access networks (UMTS¹⁾ and beyond).

MIMO systems can be defined simply. Given an arbitrary wireless communication system, MIMO refers to a link for which the transmitting end as well as the receiving end is equipped with multiple antenna elements, as illustrated in Figure 1. The idea behind MIMO is that the signals on the transmit antennas at one end and those on the receive antennas at the other end are "combined" in such a way that the quality (Bit Error Rate) or the data rate (Bit/Sec) of the communication is improved. MIMO systems use space-time processing techniques in that the time dimension (the natural dimension of transmission signals) is complemented by the spatial dimension brought by the multiple antennas. MIMO systems can be viewed as an extension of the so-called "smart antennas" [1], a popular technology for improving wireless transmission that was first invented in the 70s. However, as we will see here, the underlying mathematical nature of MIMO environments can give performance which goes well beyond that of conventional smart antennas. Perhaps the most striking property of MIMO systems is the ability to turn multipath propagation, usually a pitfall of wireless transmission, into an advantage for increasing the user's data rate, as was first shown in groundbreaking papers by G. J. Foschini [2], [3].

In this paper, we attempt to explain the promise of MIMO techniques and the mechanisms behind it. To highlight the specifics of MIMO systems and give the necessary intuition, we illustrate the difference between MIMO and conventional smart antennas in section II. A more theoretical (information theory) standpoint is taken in section III. Practical design of MIMO solutions involves both transmission algorithms and channel modeling to measure their performance. These issues are addressed in sections IV and V respectively. Radio network level considerations to evaluate the overall benefits of MIMO setups are finally discussed in section VI.

II. MIMO Systems: More Than Smart Antennas
In the conventional wireless terminology, smart antennas refer to those signal processing techniques exploiting the data captured by multiple antenna elements located on one end of the link

[Figure 1 diagram: transmitter (coding, modulation, weighting/mapping) → channel "H" → receiver (weighting/demapping, demodulation, decoding)]

Figure 1 Diagram for a MIMO wireless transmission system. The transmitter and receiver are equipped with multiple antenna elements. Coding, modulation and mapping of the signals onto the antennas may be realized jointly or separately.

1) Universal Mobile Telecommunications System.

only, typically at the base station (BTS), where the extra cost and space are more easily affordable. The multiple signals are combined upon transmission before launching into the channel or upon reception. The goal is to offer a more reliable communications link in the presence of adverse propagation conditions such as multipath fading and interference. A key concept in smart antennas is that of beamforming, by which one increases the average signal-to-noise ratio (SNR) through focusing energy into desired directions. Indeed, if one estimates the response of each antenna element to a desired transmitted signal, one can optimally combine the elements with weights selected as a function of each element response. One can then maximize the average desired signal level and minimize the level of other components (noise and/or interference).

Another powerful effect of smart antennas is called spatial diversity. In the presence of multipath, the received power level is a random function of the user location and, at times, experiences fading. When using antenna arrays, the probability of losing the signal altogether vanishes exponentially with the number of decorrelated antenna elements. The diversity order is defined by the number of decorrelated spatial branches.

When multiple antennas are added at the subscriber's side as well to form a MIMO link, conventional benefits of smart antennas are retained since the optimization of the transmitting and receiving antenna elements can be carried out in a larger space. But in fact MIMO links offer advantages which go far beyond those of smart antennas [4]. Multiple antennas at both the transmitter and the receiver create a matrix channel (of size the number of receive antennas times the number of transmit antennas). The key advantage lies in the possibility of transmitting over several spatial modes of the matrix channel within the same time-frequency slot at no additional power expenditure.

While we use information theory below to demonstrate this rigorously, the best intuition is perhaps given by a simple example of a transmission algorithm over MIMO, referred to here as spatial multiplexing, which was initially described in [3], [5]. In Figure 2, a high rate bit stream (left) is decomposed into three independent bit sequences, which are then transmitted simultaneously using multiple antennas. The signals are launched and naturally mixed together into the wireless channel as they use the same frequency spectrum. At the receiver, after having identified the mixing channel matrix through training symbols, the individual bit streams are separated and estimated. This occurs in the same way as three unknowns are resolved from a linear system of three equations. The separation is possible only if the equations are independent, which can be interpreted as each antenna 'seeing' a sufficiently different channel. That is typically the case in the presence of rich multipath. Finally the bits are merged together to yield the original high rate signal.

[Figure 2 diagram: the bit stream b1 b2 b3 b4 b5 b6 ... is split into three sub-streams (b1 b4 ..., b2 b5 ..., b3 b6 ...), each modulated and mapped onto one transmit antenna (symbols A1 B1 C1, A2 B2 C2, A3 B3 C3); after the channel, signal processing at the receiver separates the three sub-streams and merges them back into b1 b2 b3 b4 b5 b6 ...]

Figure 2 Basic spatial multiplexing (SM) scheme with 3 transmit and 3 receive antennas, yielding a three-fold improvement in spectral efficiency

In general though, one will define the rank of the MIMO channel as the number of independent equations offered by the linear system mentioned above. It is also equal to the algebraic rank of the channel matrix. Clearly the rank is always less than or equal to both the number of transmit antennas and the number of receive antennas. In turn, the number of independent signals that one may safely transmit through the MIMO system is at most equal to the rank. In this example, the rank is assumed full (equal to three) and the system shows a spectrum efficiency gain of three. This surprising result can be demonstrated from an information theory standpoint.

III. Fundamental Limits of Wireless Transmission
Today's inspiration for research and applications of wireless MIMO systems was mostly triggered by the initial Shannon capacity results obtained independently by Bell Labs researchers E. Telatar [6] and G. J. Foschini [3], further demonstrating the seminal role of information theory in telecommunications. The analysis of information theory-based channel capacity gives very useful, although idealistic, bounds on the maximum information transfer rate one is able to realize between two points of a communication link modeled by a given channel. Further, the analysis of theoretical capacity gives information on how the channel model or the antenna setup itself may influence the transmission rate. Finally it helps the system designer benchmark transmitter and receiver algorithm performance. Here we examine the capacity aspects of MIMO systems compared with single input single output (SISO), single input multiple output (SIMO) and multiple input single output (MISO) systems.

III.A Shannon Capacity of Wireless Channels
Given a single channel corrupted by additive white Gaussian noise (AWGN), at a level of SNR denoted by ρ, the capacity (the rate that can be achieved with no constraint on code or signaling complexity) can be written as [7]:

$$C = \log_2(1 + \rho) \quad \text{Bit/Sec/Hz} \qquad (1)$$

At high SNR, this means that each extra bit per second per Hertz requires an increase of about 3 dB in SNR. In practice, wireless channels are time-varying and subject to random fading. In this case we denote by h the unit-power complex Gaussian amplitude of the channel at the instant of observation. The capacity, written as:

$$C = \log_2(1 + \rho |h|^2) \quad \text{Bit/Sec/Hz} \qquad (2)$$

becomes a random quantity, whose distribution can be computed. The cumulative distribution of this "1 × 1" case (one antenna on transmit and one on receive) is shown on the left in Figure 3. We notice that the capacity takes, at times, very small values, due to fading events.

Interesting statistics can be extracted from the random capacity related to different practical design aspects. The average capacity $C_a$, the average of all occurrences of C, gives information on the average data rate offered by the link. The outage capacity $C_o$ is defined as the data rate that can be guaranteed with a high level of certainty, for a reliable service:

$$\mathrm{Prob}\{C \ge C_o\} = 99.9\ldots9\,\% \qquad (3)$$

We will now see that MIMO systems affect $C_a$ and $C_o$ in different ways than conventional smart antennas do. In particular MIMO systems have the unique property of significantly increasing both $C_a$ and $C_o$.

III.B Multiple Antennas at One End
Given a set of M antennas at the receiver (SIMO system), the channel is now composed of M distinct coefficients $\mathbf{h} = [h_0, h_1, \ldots, h_{M-1}]$, where $h_i$ is the channel amplitude from the transmitter to the i-th receive antenna. The expression for the random capacity (2) can be generalized to [3]:

$$C = \log_2(1 + \rho\,\mathbf{h}\mathbf{h}^*) \quad \text{Bit/Sec/Hz} \qquad (4)$$

where * denotes the transpose conjugate. In Figure 3 we see the impact of multiple antennas on the capacity distribution with 8 and 19 antennas respectively. Both the outage area (bottom of the curve) and the average (middle) are improved. This is due to the spatial diversity, which reduces fading, and to the higher SNR of the combined antennas. However, going from 8 to 19 antennas does not give a very significant improvement, as spatial diversity benefits quickly level off. The increase in average capacity due to SNR improvement is also limited because the SNR is increasing inside the log function in (4). We also show the results obtained in the case of multiple transmit antennas and one receive antenna, "8 × 1" and "19 × 1", when the transmitter does not know the channel in advance (typical for a frequency division duplex system). In such circumstances the outage performance is improved but not the average capacity. That is because multiple transmit antennas cannot beamform blindly.

In summary, conventional multiple antenna systems are good at improving the outage capacity performance, attributable to the spatial diversity

effect, but this effect saturates with the number of antennas.

III.C Capacity of MIMO Links
We now consider a full MIMO link as in Figure 1 with N transmit and M receive antennas, respectively. The channel is represented by an M × N matrix with random independent elements, denoted by H. It was shown in [3] that the capacity, still in the absence of transmit channel information, is derived from:

$$C = \log_2\left[\det\left(\mathbf{I}_M + \frac{\rho}{N}\,\mathbf{H}\mathbf{H}^*\right)\right], \qquad (5)$$

where ρ is the average SNR at any receiving antenna. In Figure 3 we have plotted the results for the 3 × 3 and the 10 × 10 case, giving the same total of 9 and 20 antennas as previously. The advantage of the MIMO case is significant, both in average and outage capacity. In fact, for a large number M = N of antennas the average capacity increases linearly with M:

$$C_a \lesssim M \log_2(1 + \rho) \qquad (6)$$

In general the capacity grows proportionally to the smallest number of antennas, min(N, M), which now appears outside rather than inside the log function. Therefore, in theory and in the case of idealized random channels, limitless capacities can be realized provided we can afford the cost and space of many antennas and RF chains. In reality the performance will be dictated by the practical transmission algorithms selected and by the physical channel characteristics.

[Figure 3 plot: "Capacity of i.i.d Rayleigh diversity channels at 10dB SNR"; vertical axis Prob(Capacity < abscissa), horizontal axis Capacity in Bits/Sec/Hz (0 to 35); curves: 1 x 1; MIMO diversity 3 x 3 and 10 x 10; SIMO diversity 1 x 8 and 1 x 19; MISO diversity 8 x 1 and 19 x 1]

Figure 3 Shannon capacity as a function of the number of TX × RX antennas. The plots show the so-called cumulative distribution of capacity. For each curve, the bottom and the middle give an indication of the outage performance and the average data rate, respectively

IV. Data Transmission over MIMO Systems
A usual pitfall of information theoretic analysis is that it does not reflect the performance achieved by actual transmission systems, since it is an upper bound realized by algorithms/codes with boundless complexity. The development of algorithms with a reasonable performance/complexity compromise is required to realize the MIMO gains in practice. Here we give the intuition behind key transmission algorithms and compare their performance.

IV.A General Principles
Current transmission schemes over MIMO typically fall into two categories: data rate maximization or diversity maximization schemes. The first kind focuses on improving the average capacity behavior. For example, in the case of Figure 2, the objective is just to perform spatial multiplexing, as we send as many independent signals as we have antennas.

More generally, however, the individual streams should be encoded jointly in order to protect transmission against errors caused by channel fading. This leads to a second kind of approach, in which one also tries to minimize the outage probability.

Note that if the level of coding between the transmit antennas is increased, the amount of independence between the signals decreases. Ultimately it is possible to code the signals so that the effective data rate is back to that of a single antenna system. Effectively, each transmit antenna then sees a differently encoded version of the same signal. In this case the multiple antennas are only used as a source of spatial diversity and not to increase the data rate directly.

The set of schemes allowing one to adjust and optimize joint encoding of multiple transmit antennas is called space-time codes (STC). Although STC schemes were originally revealed in [8] in the form of convolutional codes for MISO systems, the popularity of such techniques really took off with the discovery of the so-called space-time block codes (STBC). In contrast to convolutional codes, which require computation-hungry trellis search algorithms at the receiver, STBC can be decoded with much simpler linear operators, at little loss of performance. In the interest of space and clarity we limit ourselves to an overview of STBC below. A more detailed summary of the whole area can be found in [9].

IV.B Maximizing Diversity with Space-Time Block Codes
The field of space-time block coding was initiated by Alamouti [10] in 1998. The objective behind this work was to place two antennas at

the transmitter side and thereby provide an order two diversity advantage to a receiver with only a single antenna, with no a priori channel information at the transmitter. The very simple structure of Alamouti's method itself makes it a very attractive scheme that is currently being considered in UMTS standards.

The strategy behind Alamouti's code is as follows. The symbols to be transmitted are grouped in pairs. Because this scheme is a pure diversity scheme and results in no rate increase²⁾, we take two symbol durations to transmit a pair of symbols, such as $s_0$ and $s_1$. We first transmit $s_0$ on the first antenna while sending $s_1$ simultaneously on the second one. In the next time interval, $-s_1^*$ is sent from the first antenna while $s_0^*$ is sent from the second one. In matrix notation, this scheme can be written as:

$$\mathbf{C} = \frac{1}{\sqrt{2}}\begin{bmatrix} s_0 & -s_1^* \\ s_1 & s_0^* \end{bmatrix}. \qquad (7)$$

The rows in the code matrix C denote the antennas while the columns represent the symbol period indexes. As one can observe, the block of symbols $s_0$ and $s_1$ is coded across time and space, giving the name space-time block code to such designs. The normalization factor additionally ensures that the total amount of energy transmitted remains at the same level as in the case of one transmitter.

The two (narrow-band) channels from the two antennas to the receiver can be placed in a vector format as $\mathbf{h} = [h_0, h_1]$. The receiver collects observations over two time frames in a vector $\mathbf{y} = [y_0, y_1]$, which can be written as $\mathbf{y} = \mathbf{h}\mathbf{C} + \mathbf{n}$ or, equivalently, after conjugating the second observation, as $[y_0, y_1^*]^T = \hat{\mathbf{H}}\mathbf{s} + \mathbf{n}$, where

$$\hat{\mathbf{H}} = \frac{1}{\sqrt{2}}\begin{bmatrix} h_0 & h_1 \\ h_1^* & -h_0^* \end{bmatrix}, \qquad \mathbf{s} = [s_0, s_1]^T,$$

and $\mathbf{n}$ is the noise vector.

Because the matrices C and $\hat{\mathbf{H}}$ are orthogonal by design, the symbols can be separated/decoded in a simple manner from filtering of the observed vector y. Furthermore, each symbol comes with a diversity order of exactly two. Notice finally that this happens despite the channel coefficients being unknown to the transmitter.

More recently some authors have tried to extend the work of Alamouti to more than two transmit antennas [11], [12]. It turns out, however, that in that case it is not possible to design a perfectly orthogonal code, except for real-valued modulations (e.g. PAM). In the case of a general complex symbol constellation, full-rate orthogonal codes cannot be constructed. This has therefore led to a variety of code design strategies to prolong Alamouti's work, where one sacrifices either the data rate to preserve a simple decoding structure or the orthogonality of the code to retain a full data rate [13], [14], [15]. Although transmit diversity codes have mainly been designed with multiple transmit and single receive antennas in mind, the same ideas can easily be extended towards a full MIMO setup. The Alamouti code implemented on a system with two antennas at both the transmitter and the receiver side will, for example, give a fourth-order diversity advantage to the user and still has a simple decoding algorithm. However, in a MIMO situation, one would not only be interested in diversity but also in increasing the data rate, as shown below.

IV.C Spatial Multiplexing
Spatial multiplexing, or V-BLAST (Vertical Bell Labs Layered Space-Time) [3], [16], can be regarded as a special class of space-time block codes where streams of independent data are transmitted over different antennas, thus maximizing the average data rate over the MIMO system. One may generalize the example given in section II in the following way: assuming a block of independent data C is transmitted over the N × M MIMO system, the receiver will obtain Y = HC + N. In order to perform symbol detection, the receiver must un-mix the channel, in one of various possible ways. Zero-forcing techniques use a straight matrix inversion, a simple approach that can also give poor results when the matrix H becomes very ill-conditioned in certain random fading events. The optimum decoding method, on the other hand, is known as maximum likelihood (ML), where the receiver compares all possible combinations of symbols that could have been transmitted with what is observed:

$$\hat{\mathbf{C}} = \arg\min_{\mathbf{C}} \left\| \mathbf{Y} - \mathbf{H}\mathbf{C} \right\| \qquad (8)$$

The complexity of ML decoding is high, and even prohibitive when many antennas or high order modulations are used. Enhanced variants of this, like sphere decoding [17], have recently been proposed. Another popular decoding strategy proposed alongside V-BLAST is known as nulling and canceling, which gives a reasonable tradeoff between complexity and performance. The matrix inversion process in nulling and canceling is performed in layers, where one estimates a symbol, subtracts this symbol estimate from Y and continues the decoding successively [3].

Straight spatial multiplexing allows for full independent usage of the antennas; however, it gives

2) Diversity gains can however be used to increase the order of the modulation.

limited diversity benefit and is not always the best transmission scheme for a given BER target. Coding the symbols within a block can result in additional coding and diversity gain, which can help improve the performance, even though the data rate is kept at the same level. It is also possible to sacrifice some data rate for more diversity. Methods to design such codes start from a general structure where one often assumes that a weighted linear combination of symbols may be transmitted from any given antenna at any given time. The weights themselves are selected in different fashions by using analytical tools or optimizing various cost functions [11], [18], [19], [20].

In what follows we compare four transmission strategies over a 2 × 2 MIMO system with ideally uncorrelated elements. All schemes result in the same spectrum efficiency but offer different BER performance.

Figure 4 shows such a plot, where the BER of various approaches is compared: the Alamouti code in (7), spatial multiplexing (SM) with zero forcing (ZF) and with maximum likelihood decoding (ML), and a combined STBC spatial multiplexing scheme [20]. A 4-QAM constellation is used for the symbols except for the Alamouti code, which is simulated with 16-QAM to keep the data rate at the same level. It can be seen from the figure that spatial multiplexing with zero-forcing returns rather poor results, while the curves for the other coding methods are closer to each other. Coding schemes, such as Alamouti and the block code, give better results than what can be achieved with spatial multiplexing alone for the case of two antennas. The Alamouti curve has the best slope at high SNR because it focuses entirely on diversity (order four). At lower SNR, the scheme combining spatial multiplexing with some block coding is the best one.

It is important to note that as the number of antennas increases, the diversity effect will give diminishing returns, while the data rate gain of spatial multiplexing remains linear with the number of antennas. Therefore, for a larger number of antennas it is expected that more weight has to be put on spatial multiplexing and less on space-time coding. Interestingly, having a larger number of antennas does not need to result in a larger number of RF chains. By using antenna selection techniques (see for example [21]) it is possible to retain the benefits of a large MIMO array with just a subset of antennas being active at the same time.

[Figure 4 plot: "2 transmitters - 2 receivers"; vertical axis Bit Error Rate (10^-5 to 10^-1), horizontal axis SNR (dB) per receive antenna (0 to 25); curves: Alamouti - Linear (16QAM), SM - ZF (4QAM), SM - ML (4QAM), STBC - ML (4QAM)]

Figure 4 Bit Error Rate (BER) comparisons for various transmission techniques over MIMO. All schemes use the same transmission rate

V. Channel Modeling
Channel modeling has always been an important area in wireless communications, and this area of research is particularly critical in the case of MIMO systems. In particular, as we have seen earlier, the promise of high MIMO capacities largely relies on decorrelation properties between antennas as well as on the full-rankness of the MIMO channel matrix. The performance of MIMO algorithms such as those above can vary enormously depending on whether or not such properties are realized. In particular, spatial multiplexing becomes completely inefficient if the channel has rank one. The final aim of channel modeling is therefore to get an understanding, by converting measurement data into tractable formulas, of what performance can reasonably be expected from MIMO systems in practical propagation situations. The other role of channel models is to provide the necessary tools to analyze the impact of selected antenna or propagation parameters (spacing, frequency, antenna height, etc.) on the capacity, in order to influence the system design in the best way. Finally, models are used to try out transmit and receive processing algorithms with more realistic simulation scenarios than those normally assumed in the literature.

V.A Theoretical Models
The original papers on MIMO capacity used an 'idealistic' channel matrix model consisting of perfectly uncorrelated (i.i.d.) random Gaussian elements. This corresponds to a rich multipath environment, yielding maximum excitation of all channel modes. It is also possible to define other types of theoretical models for the channel matrix H, which are not as ideal. In particular we emphasize the separate roles played by antenna correlation (on transmit or on receive)

and the rank of the channel matrix. While fully correlated antennas lead to a low rank channel, the converse is not true in general.

Let us next consider the following MIMO theoretical model classification, starting from Foschini's ideal i.i.d. model, and interpret the performance. In each case below we consider a frequency-flat channel. In the case of broadband, frequency-selective channels, a different frequency-flat channel can be defined at each frequency.

[Figure 5 diagram: local TX scatterers, remote scatterers and local RX scatterers between the transmit and receive arrays]

Figure 5 MIMO channel propagation. The complicated disposition of the scatterers in the environment will determine the number of excitable modes in the MIMO channel
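For this i.i.d. starting point, the capacity statistics of Section III can be estimated by Monte Carlo simulation. The sketch below assumes NumPy, unit-power i.i.d. complex Gaussian channel entries and a 10 dB SNR as in Figure 3; the function names are hypothetical, and the 1 % outage level is our own choice, since (3) leaves the number of nines open. It draws samples of (5), which reduces to (2) for one antenna at each end and to (4) for a single transmit antenna, and returns the average capacity $C_a$ and the outage capacity $C_o$.

```python
import numpy as np


def random_capacity(n_tx, n_rx, snr_db, trials=10_000, rng=None):
    """Samples of C = log2 det(I_M + (rho/N) H H^*) from (5) for i.i.d. Rayleigh H.

    With n_tx = n_rx = 1 this is (2); with n_tx = 1 it is the SIMO case (4);
    with n_rx = 1 it is the MISO case without channel knowledge at the transmitter.
    """
    rng = np.random.default_rng() if rng is None else rng
    rho = 10.0 ** (snr_db / 10.0)
    # Unit-power complex Gaussian entries: real and imaginary parts each have variance 1/2.
    shape = (trials, n_rx, n_tx)
    H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)
    G = np.eye(n_rx) + (rho / n_tx) * (H @ np.conj(np.swapaxes(H, -1, -2)))
    # slogdet is more robust than det; for Hermitian positive definite G the log-determinant is real.
    _, logdet = np.linalg.slogdet(G)
    return logdet / np.log(2.0)


def capacity_stats(samples, outage=0.01):
    """Average capacity C_a and outage capacity C_o, with Prob{C >= C_o} = 1 - outage, as in (3)."""
    return float(np.mean(samples)), float(np.quantile(samples, outage))


rng = np.random.default_rng(0)
for n_tx, n_rx in [(1, 1), (1, 8), (8, 1), (3, 3), (10, 10)]:   # TX x RX, as in Figure 3
    c_a, c_o = capacity_stats(random_capacity(n_tx, n_rx, snr_db=10.0, rng=rng))
    print(f"{n_tx} x {n_rx}: C_a = {c_a:5.2f}, C_o(99%) = {c_o:5.2f} bit/s/Hz")
```

The printed values are estimates, reproducible through the fixed seed. The multi-antenna rows should lie ahead of the 1 × 1 row, and the 3 × 3 link should beat the 1 × 8 link in average capacity, consistent with the discussion of Figure 3.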

• Uncorrelated High Rank (UHR, a.k.a. i.i.d.) model: The elements of H are i.i.d. complex Gaussian.

• Correlated Low Rank (CLR) model: $\mathbf{H} = g_{rx} g_{tx}^* \mathbf{u}_{rx}\mathbf{u}_{tx}^*$, where $g_{rx}$ and $g_{tx}$ are independent Gaussian coefficients (receive and transmit fading) and $\mathbf{u}_{rx}$ and $\mathbf{u}_{tx}$ are fixed deterministic vectors of size M × 1 and N × 1, respectively, with unit modulus entries. This model is obtained when antennas are placed too close to each other or there is too little angular spread at both the transmitter and the receiver. This case yields neither diversity nor multiplexing gain whatsoever, just receive array / beamforming gain. We may also imagine the case of uncorrelated antennas at the transmitter and correlated antennas at the receiver, or vice versa.

• Uncorrelated Low Rank (ULR) (or "pin-hole" [22]) model: $\mathbf{H} = \mathbf{g}_{rx}\mathbf{g}_{tx}^*$, where $\mathbf{g}_{rx}$ and $\mathbf{g}_{tx}$ are independent receive and transmit fading vectors with i.i.d. complex-valued components. In this model every realization of H has rank 1 despite uncorrelated transmit and receive antennas. Therefore, although diversity is present, capacity must be expected to be less than in the UHR model since there is no multiplexing gain. Intuitively, in this case the diversity order is equal to min(M,N).

V.B Heuristic Models
In practice, of course, the complexity of radio propagation is such that MIMO channels will not fall completely in either of the theoretical cases described above. Antenna correlation and matrix rank are influenced by as many parameters as the antenna spacing, the antenna height, the presence and disposition of local and remote scatterers, the degree of line of sight and more. Figure 5 depicts a general setting for MIMO propagation. The goal of heuristic models is to display a wide range of MIMO channel behaviors through the use of as few relevant parameters as possible with as much realism as possible.

A good model should give us answers to the following questions: What is the typical capacity of an outdoor or indoor MIMO channel? What are the key parameters governing capacity? Under what simple conditions do we get a full rank channel? If possible, the model parameters should be controllable (such as antenna spacing) or measurable (such as angular spread of multipath [23], [24]), which is not always easy to achieve.

The literature on these problems is still very scarce. For the line-of-sight (LOS) case it has only been shown how very specific arrangements of the antenna arrays at the transmitter and the receiver can maximize the orthogonality between antenna signatures and produce maximum capacity, as reported in [25]. But in a general situation with fading, which is the truly promising case, this work is not applicable.

In the presence of fading, the first step in increasing the model's realism consists in taking into account the correlation of antennas at either the transmit or receive side. The correlation can be modeled to be inversely proportional to the angular spread of the arriving/departing multipath. Experience suggests that higher correlation can be expected at the BTS side because the BTS antenna is usually higher above the clutter, causing reduced angular spread. In contrast, the subscriber's antenna will be buried in the clutter (if installed at street level) and will experience more multipath angle spread, hence less correlation for the same spacing. The way

Telektronikk 1.2002 59 MIMO models can take correlation into account V.B.1 Impact of Scattering Radius is similar to how usual smart antenna channel One limitation of simple models like the one in models do it. The channel matrix is pre- (or (11) is that it implies that rank loss of H can post-) multiplied by a correlation matrix control- only come from rank loss in R or in ling the antenna correlation as function of the θr ,dr path angles, the spacing and the wavelength. For R , i.e. a high correlation between the example, for a MIMO channel with correlated θt ,dt receive antennas, we have: antennas. However as suggested by the theoreti- 1/2 cal model “ULR” above, it may not always be H = R H0 (9) θr ,dr so. In practice such a situation can arise where there is significant local scattering around both where H0 is an ideal i.i.d. MIMO channel matrix the BTS and the subscriber’s antenna and still only a low rank is realized by the channel and R is the M × M correlation matrix. θ θr ,dr r matrix. That may happen because the energy is the receive angle spread and dr is the receive travels through a narrow “pipe”, i.e. if the scat- antenna spacing. Different assumptions on the tering radius around the transmitter and receiver statistics of the paths’ directions of arrival is small compared to the traveling distance. This (DOA) will yield different expressions for is depicted in Figure 6. This situation is referred to as pinhole or keyhole channel in the literature R [26], [27], [28]. For uniformly dis- θr ,dr [22], [30]. tributed DOAs, we find [27], [26] In order to describe the pinhole situation more, S−1 i= 2 π so-called double scattering models are devel- 1 −2πjk()−m dr cos()2 +θr,i R ,d = e (10) []θr r m,k ∑ oped that take into account the impact of the S i= S−1 2 scattering radius at the transmitter and at the receiver. 
The model is based on a simplified ver- where S (assumed odd) is the number of paths sion of Figure 5 shown in Figure 7 where only with corresponding DOAs θr,i. For “large” val- local scatterers contributing to the total aperture ues of the angle spread and/or antenna spacing, of the antenna as seen by the other end are con- sidered. The model can be written as [22]: R will converge to the identity matrix, θr ,dr which gives uncorrelated fading. For “small” 1 1/2 1/2 1/2 H = R H0,rR H0,tR , (12) S θr ,dr θS ,2Dr / S θt ,dt values of θr,dr, the correlation matrix becomes rank deficient (eventually rank one) causing fully correlated fading. The impact of the corre- where the presence of two (instead of one) i.i.d. lation on the capacity was analyzed in several random matrices H0,t and H0,r accounts for the papers, including [29]. Note that it is possible to double scattering effect. The matrix R generalize this model to include correlation on θS ,2Dr / S both sides by using two distinct correlation dictates the correlation between scattering ele- matrices: ments, considered as virtual receive antennas with virtual aperture 2Dr. When the virtual aper- 1/2 1/2 ture is small, either on transmit or receive, the H = R H0R (11) θr ,dr θt ,dt rank of the overall MIMO channel will fall regardless of whether the actual antennas are correlated or not.
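The correlation models (9) to (12) lend themselves to quick numerical experiments. The Python/NumPy sketch below builds the correlation matrix of (10), draws channels from the two-sided model (11) (often called the Kronecker model) and from the double scattering model (12), and evaluates the standard equal-power capacity $\log_2 \det(I + (\rho/N) H H^*)$. It is a minimal sketch under stated assumptions, not the authors' simulator: the S DOAs are assumed to be evenly spaced across the angle spread, spacings are in wavelengths, the same form (10) is reused for the scatterer correlation $R_{\theta_S,2D_r/S}$, and all function names are hypothetical.

```python
import numpy as np


def corr_matrix(n, spacing, spread, num_paths=21):
    """Correlation matrix of eq. (10) for an n-element array.

    spacing: antenna spacing in wavelengths; spread: total angle spread (rad).
    Assumption: the S = num_paths DOAs are spread evenly over [-spread/2, spread/2].
    """
    i = np.arange(num_paths) - (num_paths - 1) / 2      # i = -(S-1)/2, ..., (S-1)/2
    theta = spread * i / max(num_paths - 1, 1)           # DOAs theta_{r,i}
    idx = np.arange(n)
    k_minus_m = idx[None, :] - idx[:, None]              # entry [m, k] holds k - m
    phase = -2j * np.pi * k_minus_m[..., None] * spacing * np.cos(np.pi / 2 + theta)
    return np.exp(phase).mean(axis=-1)                   # (1/S) * sum over the S paths


def sqrtm_psd(r):
    """Hermitian square root R^{1/2} of a positive semidefinite matrix."""
    w, v = np.linalg.eigh(r)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def iid(rows, cols, rng):
    """i.i.d. zero-mean, unit-variance complex Gaussian matrix (H_0 in eq. (9))."""
    return (rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)


def kronecker_channel(r_rx, r_tx, rng):
    """Eq. (11): H = R_rx^{1/2} H_0 R_tx^{1/2}."""
    m, n = r_rx.shape[0], r_tx.shape[0]
    return sqrtm_psd(r_rx) @ iid(m, n, rng) @ sqrtm_psd(r_tx)


def double_scattering_channel(r_rx, r_s, r_tx, rng):
    """Eq. (12): H = S^{-1/2} R_rx^{1/2} H_{0,r} R_s^{1/2} H_{0,t} R_tx^{1/2}."""
    m, s, n = r_rx.shape[0], r_s.shape[0], r_tx.shape[0]
    return (sqrtm_psd(r_rx) @ iid(m, s, rng) @ sqrtm_psd(r_s)
            @ iid(s, n, rng) @ sqrtm_psd(r_tx)) / np.sqrt(s)


def capacity(h, snr):
    """log2 det(I + (snr/N) H H^*) in b/s/Hz, equal power on the N transmit antennas."""
    m, n = h.shape
    _, logdet = np.linalg.slogdet(np.eye(m) + (snr / n) * h @ h.conj().T)
    return logdet / np.log(2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eye4 = np.eye(4)
    trials = 200
    uhr = np.mean([capacity(kronecker_channel(eye4, eye4, rng), 100.0) for _ in range(trials)])
    r_narrow = corr_matrix(4, 0.5, 0.05)                 # small receive angle spread
    corr = np.mean([capacity(kronecker_channel(r_narrow, eye4, rng), 100.0) for _ in range(trials)])
    r_s = corr_matrix(1, 1.0, 0.0)                       # a single scatterer: S = 1
    pin = np.mean([capacity(double_scattering_channel(eye4, r_s, eye4, rng), 100.0)
                   for _ in range(trials)])
    print(f"UHR {uhr:.1f}, correlated RX {corr:.1f}, pinhole {pin:.1f} b/s/Hz")
```

With a single scatterer (S = 1), every realization of (12) has rank one even though the actual antennas are uncorrelated, which is the pinhole effect of Figure 6.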

[Figure 6: BTS area and user's area, each surrounded by a scatter ring, connected through a narrow pipe]
Figure 6 An example of pinhole realization. Reflections around the BTS and subscribers cause uncorrelated fading, yet the scatter rings are too small for the rank to build up

Figure 7 Double scattering MIMO channel model

V.C Broadband Channels
In broadband applications the channel experiences frequency selective fading. In this case the channel model can be written as H(f), where a new MIMO matrix is obtained at each frequency/sub-band. This type of model is of interest in the case of orthogonal frequency division multiplexing (OFDM) modulation with MIMO. It was shown that the MIMO capacity actually benefits from the frequency selectivity, because the additional paths that contribute to the selectivity also contribute to a greater overall angular spread and therefore improve the average rank of the MIMO channel across frequencies [31].

V.D Measured Channels
In order to validate the models as well as to foster the acceptance of MIMO systems into wireless standards, a number of MIMO measurement campaigns have been launched in the last two years, mainly led by Lucent and AT&T Labs and by various smaller institutions or companies such as Iospan Wireless in California. More recently, Telenor R&D put together its own MIMO measurement capability.

Samples of analysis for UMTS type scenarios can be found in [32], [33], [34]. Measurements conducted at 2.5 GHz for broadband wireless access applications can be found in [35]. So far, the reported results largely confirm the high level of dormant capacity of MIMO arrays, at least in urban or suburban environments. Indoor scenarios lead to even better results due to a very rich multipath structure. Eigenvalue analyses reveal that a large number of the modes of MIMO channels can be exploited to transmit data. Which particular combination of spatial multiplexing and space-time coding will lead to the best performance/complexity trade-off over such channels, however, remains an area of active research.

VI. System Level Issues

VI.A Optimum Use of Multiple Antennas
Multiple antenna techniques are not new in commercial wireless networks. Spatial diversity systems, using two or three antenna elements, co- or cross-polarized, have been in use since the early stages of mobile network deployments. More recently, beamforming-type BTS products equipped with five to ten or more antennas have been offered on the market. These products use diversity to improve the link budget and the beamforming capability to extend the cell range or help in load balancing.

Beyond the information theory aspects addressed earlier, there are significant network-level differences between the beamforming approach and the MIMO approach to using multiple antennas.

While beamforming systems tend to use a larger number of closely spaced antennas, MIMO will typically operate with fewer antennas (although the only true constraint is at the subscriber side rather than at the BTS side). Furthermore, the MIMO antennas will use as much space as can be afforded, to try to realize decorrelation between the elements, while the directional beamforming operation imposes stringent limits on spacing. Also, most MIMO algorithms focus on diversity or data rate maximization rather than just increasing the average SNR at the receiver or reducing interference. Finally, beamforming systems thrive in near line-of-sight environments because the beams can be more easily optimized to match one or two multipaths than a hundred. In contrast, MIMO systems turn rich multipath into an advantage and lose their multiplexing benefits in line-of-sight cases.
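The contrast between rich multipath and line of sight, and the eigenvalue analyses mentioned in Section V.D, can be illustrated with the eigenvalues of $HH^*$, which are the power gains of the parallel spatial modes. The sketch below is illustrative only: the line-of-sight channel is modeled, as an assumption, by a single plane wave between two uniform linear arrays with half-wavelength spacing, and the helper names are hypothetical.

```python
import numpy as np


def steering(n, spacing, angle):
    """Unit-modulus array response exp(-2*pi*j*k*d*cos(angle)), k = 0..n-1, d in wavelengths."""
    return np.exp(-2j * np.pi * np.arange(n) * spacing * np.cos(angle))


def eigenmodes(h):
    """Eigenvalues of H H^* in decreasing order: the power gains of the spatial modes."""
    return np.sort(np.linalg.eigvalsh(h @ h.conj().T))[::-1]


rng = np.random.default_rng(0)
m = n = 4
# Rich multipath: i.i.d. complex Gaussian entries (UHR model).
h_rich = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
# Pure line of sight (assumption): one plane wave between two uniform linear arrays.
h_los = np.outer(steering(m, 0.5, np.pi / 3), steering(n, 0.5, np.pi / 4).conj())

print("rich multipath:", np.round(eigenmodes(h_rich), 2))
print("line of sight: ", np.round(eigenmodes(h_los), 2))
```

The i.i.d. channel spreads its energy over four modes, whereas the line-of-sight channel $H = ab^*$ concentrates all of it in a single eigenvalue equal to $\|a\|^2\|b\|^2 = M \cdot N = 16$, so no multiplexing gain is available.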

[Figure 8 panels: 1 × 1, 1 × 2 and 2 × 3 antennas; user rate color scale from 0.0 to 13.5 Mbps]
Figure 8 User rates in 2 MHz FDD channels in a fixed wireless access system. The plots show the relative gains between various numbers of antennas at transmitter × receiver (SISO, SIMO, MIMO)

Because of these differences, the optimal way of using multiple antenna systems, at least at the BTS, is likely to depend on the situation. The search for compromise solutions, in which the degrees of freedom offered by the multiple antennas are best used at each location, is an active area of work. A key to this problem resides in adaptive techniques, which, by tracking the environment and propagation characteristics, are able to pick the right solution at all times.

VI.B MIMO in Broadband Internet Access
One unfavorable aspect of MIMO systems, when compared with traditional smart antennas, lies in the increased cost and size of the subscriber's equipment. Although a sensible design can extract significant gains with just two or three antennas at the user's side, this may already prove too much for simple devices. Instead, wireless LAN modems, PDAs and other high speed wireless Internet access devices, fixed or mobile, constitute the real opportunity for MIMO because of less stringent size and algorithmic complexity limitations. In Figure 8 we show the data rates achieved by a fixed broadband wireless access system with 2 × 3 MIMO. The realized user data rates are color coded from 0 to 13.5 Mb/s in a 2 MHz RF channel3), as a function of the user's location. The access point is located in the middle of an idealized hexagonal cell. Detailed assumptions can be found in [36]. The figure illustrates the advantages over a system with just one transmit antenna and one or two receive antennas. Current studies demonstrating the system level advantages of MIMO in wireless Internet access focus mainly on performance. While very promising, the evaluation of the overall benefits of MIMO systems, taking into account deployment and cost constraints, is still in progress.

3) A user gets a zero rate if the link quality does not satisfy the target BER.

VII. Conclusions
This paper reviews the major features of MIMO links for use in future wireless networks. Information theory reveals the great capacity gains which can be realized from MIMO. Whether we achieve this fully or at least partially in practice depends on a sensible design of transmit and receive signal processing algorithms. More progress in channel modeling will also be needed. In particular, upcoming performance measurements in specific deployment conditions will be key to evaluating precisely the overall benefits of MIMO systems in real-world wireless system scenarios such as UMTS.

References
1 Paulraj, A, Papadias, C B. Space-time processing for wireless communications. IEEE Signal Proc. Mag., 14, 49–83, 1997.
2 Foschini, G J, Gans, M J. On limits of wireless communications in a fading environment when using multiple antennas. Wireless Personal Communications, 6, 311–335, 1998.
3 Foschini, G J. Layered space-time architecture for wireless communication. Bell Labs Technical Journal, 1, 41–59, 1996.
4 Sheikh, K et al. Smart antennas for broadband wireless access. IEEE Communications Magazine, Nov 1999.
5 Paulraj, A J, Kailath, T. Increasing capacity in wireless broadcast systems using distributed transmission/directional reception. U.S. Patent, 1994. (No. 5,345,599.)
6 Telatar, I E. Capacity of multi-antenna Gaussian channels. Bell Labs Technical Memorandum, 1995.
7 Proakis, J G. Digital Communications. New York, McGraw-Hill, 1989.
8 Tarokh, V, Seshadri, N, Calderbank, A R. Space-time codes for high data rate wireless communication: Performance criterion and code construction. IEEE Trans. Inf. Theory, 44, 744–765, 1998.
9 Naguib, A, Seshadri, N, Calderbank, R. Increasing data rate over wireless channels. IEEE Signal Processing Magazine, May 2000.
10 Alamouti, S M. A simple transmit diversity technique for wireless communications. IEEE Journal on Selected Areas in Communications, 16, 1451–1458, 1998.
11 Tarokh, V, Jafarkhani, H, Calderbank, A R. Space-time block codes for wireless communications: Performance results. IEEE Journal on Selected Areas in Communications, 17, 1999.
12 Ganesan, G, Stoica, P. Space-time diversity using orthogonal and amicable orthogonal designs. Wireless Personal Communications, 18, 165–178, 2001.
13 Jafarkhani, H. A quasi orthogonal space-time block code. IEEE Trans. Comm., 49, 1–4, 2001.
14 Tirkkonen, O, Boariu, A, Hottinen, A. Minimal non-orthogonality rate 1 space-time block code for 3+ tx antennas. In: Proc. IEEE Int. Symp. Spread Spectrum Technology, 2000.
15 Tarokh, V, Jafarkhani, H, Calderbank, A R. Space-time block codes from orthogonal designs. IEEE Trans. Inf. Theory, 45, 1456–1467, 1999.
16 Golden, G D et al. Detection algorithm and initial laboratory results using the V-BLAST space-time communication architecture. Electronics Letters, 35, 1, 14–15, 1999.
17 Damen, M O, Chkeif, A, Belfiore, J C. Lattice codes decoder for space-time codes. IEEE Communications Letters, 4, 161–163, 2000.
18 Hassibi, B, Hochwald, B. High-rate codes that are linear in space and time. Submitted to IEEE Trans. on Information Theory, 2000.
19 Sandhu, S, Paulraj, A. Unified design of linear space-time block-codes. IEEE Globecom Conference, 2001.
20 Damen, M O, Tewfik, A, Belfiore, J C. A construction of a space-time code based on number theory. IEEE Trans. on Information Theory, March 2002.
21 Molisch, A, Win, M Z, Winters, J, Paulraj, A. Capacity of systems with antenna selection. In: IEEE Intern. Conf. on Communications, 570–574, 2001.
22 Gesbert, D et al. Outdoor MIMO wireless channels: Models and performance prediction. IEEE Trans. Communications, 2002. To appear.
23 Pedersen, K I, Mogensen, P E, Fleury, B. A stochastic model of the temporal and azimuthal dispersion seen at the base station in outdoor propagation environments. IEEE Trans. on Vehicular Technology, 49, 2000.
24 Rossi, J P, Barbot, J P, Levy, A. Theory and measurements of the angle of arrival and time delay of UHF radiowaves using a ring array. IEEE Trans. on Antennas and Propagation, May 1997.
25 Driessen, P, Foschini, G J. On the capacity formula for multiple input multiple output wireless channels: a geometric interpretation. IEEE Trans. Comm., 173–176, Feb 1999.
26 Ertel, R B et al. Overview of spatial channel models for antenna array communication systems. IEEE Personal Communications, 10–22, Feb 1998.
27 Asztély, D. On antenna arrays in mobile communication systems: Fast fading and GSM base station receiver algorithms. Royal Institute of Technology, Stockholm, Sweden, March 1996. (Tech. Rep. IR-S3-SB-9611.)
28 Fuhl, J, Molisch, A F, Bonek, E. Unified channel model for mobile radio systems with smart antennas. IEE Proc.-Radar, Sonar Navig., 145, 32–41, 1998.
29 Shiu, D et al. Fading correlation and its effect on the capacity of multi-element antenna systems. IEEE Trans. Comm., March 2000.
30 Chizhik, D, Foschini, G, Valenzuela, R A. Capacities of multi-element transmit and receive antennas: Correlations and keyholes. Electronics Letters, 1099–11, 2000.
31 Bölcskei, H, Gesbert, D, Paulraj, A J. On the capacity of wireless systems employing OFDM-based spatial multiplexing. IEEE Trans. Comm., 2002. To appear.
32 Martin, C C, Winters, J, Sollenberger, N. Multiple input multiple output (MIMO) radio channel measurements. In: IEEE Vehicular Technology Conference, Boston (MA), 2000.
33 Ling, J et al. Multiple transmitter multiple receiver capacity survey in Manhattan. Electronics Letters, 37, Aug 2001.
34 Buehrer, R et al. Spatial channel models and measurements for IMT-2000 systems. In: Proc. IEEE Vehicular Technology Conference, May 2001.
35 Pitchaiah, S et al. Modeling of multiple-input multiple-output (MIMO) radio channel based on outdoor measurements conducted at 2.5 GHz for fixed BWA applications. In: Proc. International Conference on Communications, 2002.
36 Gesbert, D et al. Technologies and performance for non line-of-sight broadband wireless access networks. IEEE Communications Magazine, April 2002.

An Introduction to Turbo Codes and Iterative Decoding

ØYVIND YTREHUS

The discovery of turbo codes by Berrou et al. [11] in 1993 revolutionized the theory of error-correcting codes. The purpose of this paper is

• to provide an introduction to turbo codes;
• to describe the weight distribution properties of turbo codes that allow low error probability, even when the underlying communication channel is poor;
• to explain the low complexity decoding algorithms that make turbo codes so attractive;
• to suggest how to select essential components of the turbo construction, such as interleavers and constituent codes;
• to mention that turbo coding can be used in practical situations, for example in a coded modulation scheme; and finally
• to point out the limitations of turbo codes that, so far, restrict their use in some applications.

Øyvind Ytrehus (42) is professor and currently the Department Chair at the Department of Informatics, University of Bergen. His research interests include error correcting properties and decoding complexity of algebraic codes, convolutional codes, turbo codes, and codes based on graphs; applications of coding theory in communication and storage; and the interaction between coding theory and cryptology.
[email protected]

I. Introduction
Consider the problem of sending a block u of K bits over a noisy channel. With some nonzero probability, these bits will be corrupted by the noise. To combat these effects, the information block is almost always encoded with an error-correcting code.

An error-correcting encoder works by adding N – K extra parity-check bits to the information block, to produce a codeword of N bits. The code is the set of codewords that arises when u ranges over all possible information blocks, and the code rate is the ratio R = K/N. The codeword is transmitted over the channel. At the receiving end, a decoder produces an estimate ũ of u, based on the received message. If ũ ≠ u, we have a decoding error; see Figure 1, from [1]1). The parity-check bits are selected with the aim of minimizing the probability of decoding error.

In his seminal paper [2], Shannon introduced the notion of the channel capacity C of a communication channel. He proved the following by a non-constructive argument: for a given communication channel and for any arbitrarily small ε, provided the code length N is sufficiently large, there exists a code of any rate R < C with the property that the decoding error probability with optimum decoding is less than ε. Conversely, for rates larger than the capacity, error free transmission is not possible.

During the last half of the 20th century, intense research efforts were devoted to designing practical error correcting codes with a performance approaching Shannon's predictions. However, this goal turned out to be difficult to achieve, even though powerful algebraic constructions were devised [5], [8]. The discovery of turbo codes by Berrou et al. [11] in 1993 represented a major breakthrough. Recent developments in this area have produced implementable codes with performance very close to Shannon's bounds [36].

Figure 2 shows the relationship between the signal-to-noise ratio (SNR) and the bit error rate (BER) for an additive white Gaussian noise (AWGN) channel. The SNR, expressed in dB, is

$$ \mathrm{SNR} = 10 \log_{10} \frac{E_b}{N_0}, \tag{1} $$

where $E_b$ is the average received signal energy per information bit, and $N_0$ is the single sided power spectral density of the Gaussian noise.2) For a given code rate, Shannon's results imply that there is a threshold SNR, referred to as the Shannon bound, below which error free transmission is impossible. For a continuous input AWGN channel, the Shannon bound in dB is given as

$$ \mathrm{SNR}_C = 10 \log_{10} \frac{2^{2R} - 1}{2R}. \tag{2} $$
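As a quick check of (1) and (2), the following Python sketch evaluates the Shannon bound for a few code rates. The function names are chosen for illustration only.

```python
import math


def snr_db(eb, n0):
    """Eq. (1): SNR in dB from the energy per information bit Eb and the noise PSD N0."""
    return 10 * math.log10(eb / n0)


def shannon_bound_db(rate):
    """Eq. (2): minimum SNR (dB) for error free transmission at code rate R
    over a continuous input AWGN channel."""
    return 10 * math.log10((2 ** (2 * rate) - 1) / (2 * rate))


for r in (1 / 3, 1 / 2, 2 / 3):
    print(f"R = {r:.3f}: Shannon bound = {shannon_bound_db(r):.2f} dB")
```

For R = 1/3 this gives about −0.55 dB, the value marked in Figure 2; as R decreases toward zero the bound approaches 10 log10(ln 2) ≈ −1.59 dB.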

1) For non-Norwegian readers: The story "God dag mann! – Økseskaft" is about a hearing-impaired ferryman who receives a call from the local policeman. Prior to the visit, the ferryman anticipates the policeman's questions. However, as the signal-to-noise level of the actual communication channel is too low, this results in an absurd conversation.
2) Throughout this paper, commonly known facts and results will be presented without explicit references. Explanations and derivations of these results can be found in some of the monographs [8], [10], [17], [24], [29].

The BER performance of a typical turbo code used with the iterative turbo decoding algorithm (to be discussed in Section III) is shown in Figure 2. The proximity to the rate-specific Shannon bound varies with the information length K, with the choice of constituent encoders and interleaver, and with details of implementation of the decoding algorithms. However, over the range of such varying parameters, the curves display the characteristics shown: there is a plateau region at very low SNR, where there is little or no improvement over uncoded transmission, followed by the waterfall region, where the BER drops off rapidly with increasing SNR. Finally, there is an error floor at higher SNR, where the BER mainly depends on the probability of decoding to a few most likely error vectors.

Berrou et al.'s turbo codes are parallel concatenated codes, generated by an encoder as shown in Figure 3. The encoder accepts an information block $u = (u_0, \ldots, u_{K-1})$ consisting of K bits. It will sometimes be convenient to consider u as a sequence $u(D) = \sum_{j=0}^{K-1} u_j D^j$ (where the variable D can be thought of as just a placeholder). The encoder is systematic, meaning that the information block u is visible as an explicit part of the codeword. The rest of the codeword consists of parity check symbols from the two constituent encoders, encoder A and encoder B. In Berrou et al.'s model, which we will consider throughout this paper except for the generalizations in Section V, encoders A and B are identical recursive convolutional encoders. This means that the first parity sequence $c_A(D)$ is obtained from the information block as $c_A(D) = u(D) f(D)/g(D)$, for some fixed binary polynomials f(D) and g(D). The second parity sequence is obtained from the information block as $c_B(D) = \pi(u(D)) f(D)/g(D)$, where $\pi(u(D))$ is the sequence resulting from permuting the coordinates of u(D) according to an interleaver map π. In some cases, 2ν extra bits, where ν is the memory of the constituent encoders, are appended to the input sequences in order to terminate the constituent trellises (see Section III and [24], [29]). This is not shown in Figure 3. Finally, some of the encoded bits (usually some of the parity check bits) are punctured (deleted) according to a puncturing pattern P, leaving a total of N encoded bits. Hence the turbo code is completely specified by the two constituent encoders, by the interleaver π, by the puncturing pattern P, and by the termination rules.

Figure 1 "God dag mann! – Økseskaft!": Example of decoding error

[Figure 2: log10(BER) versus SNR (dB), showing the plateau, waterfall region and error floor of the turbo code, together with uncoded transmission, a convolutional code, and the rate 1/3 Shannon bound at −0.55 dB]
Figure 2 The connection between the SNR and the BER. The Shannon bound of (2) is shown at approximately –0.55 dB for rate R = 1/3. Also shown are the BER curves for uncoded transmission (of rate 1), a 64-state rate 1/3 convolutional code, and a rate 1/3 turbo code with a simple random interleaver of information length K = 1000. In this example, the blue and the yellow curves intersect at an SNR of about 14 dB and a BER of about $10^{-55}$. However, the turbo code can easily be improved by changing the interleaver.

In Section II we investigate the properties of the weight distribution that explain the characteristic error curves shown in Figure 2. Section III deals with the turbo decoding algorithm. Selection of essential system components is considered in Section IV. Finally, Section V discusses variations, generalizations, limitations, and applications.

[Figure 3: the information block u (K information bits) feeds encoder A (f/g) directly and encoder B (f/g) through the interleaver π; the outputs pass through puncturing and multiplexing, and N bits are sent to the channel]
Figure 3 Turbo encoder. Trellis termination is not shown

In a short paper about a topic that in a few years has grown into a major research area, there is no space for entering into deeper discussion on the finer points. The reference list contains pointers to the literature, for those who want to pursue this subject.
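The encoder structure of Figure 3 can be sketched in a few lines of Python. This is an illustrative sketch, not the UMTS or Berrou et al. encoder: the constituent polynomials g(D) = 1 + D + D² and f(D) = 1 + D² (memory ν = 2, period p = 3) are assumed for brevity, no trellis termination is applied, the optional rate-1/2 puncturing that alternates the two parity streams is just one illustrative pattern P, and all function names are hypothetical.

```python
import random


def rsc_parity(bits, g=(1, 1, 1), f=(1, 0, 1)):
    """Parity c(D) = u(D) f(D)/g(D) of a recursive convolutional encoder over GF(2).

    g[0] must be 1. The defaults g = 1 + D + D^2 and f = 1 + D^2 are illustrative.
    """
    nu = len(g) - 1
    state = [0] * nu                   # state[k-1] holds a_{t-k}
    out = []
    for u in bits:
        a = u
        for k in range(1, nu + 1):     # feedback: a_t = u_t + sum_k g_k a_{t-k}
            a ^= g[k] & state[k - 1]
        c = f[0] & a
        for k in range(1, len(f)):     # output: c_t = sum_k f_k a_{t-k}
            c ^= f[k] & state[k - 1]
        out.append(c)
        state = [a] + state[:-1]       # shift the register
    return out


def turbo_encode(u, perm, rate_half=False):
    """Systematic bits u, parity c_A from u, parity c_B from pi(u)_j = u[perm[j]]."""
    c_a = rsc_parity(u)
    c_b = rsc_parity([u[j] for j in perm])
    out = []
    for t, bit in enumerate(u):
        if rate_half:                  # keep every systematic bit, alternate parities
            out += [bit, c_a[t] if t % 2 == 0 else c_b[t]]
        else:                          # rate 1/3: multiplex (u_t, cA_t, cB_t)
            out += [bit, c_a[t], c_b[t]]
    return out


if __name__ == "__main__":
    K = 16
    rng = random.Random(1)
    perm = list(range(K))
    rng.shuffle(perm)                  # a simple random interleaver
    u = [rng.randint(0, 1) for _ in range(K)]
    cw = turbo_encode(u, perm)
    print(len(cw), cw[0::3] == u)      # 48 True: rate 1/3, systematic
    print(rsc_parity([1, 0, 0, 0, 0, 0, 0, 0]))  # [1, 1, 1, 0, 1, 1, 0, 1]
    print(rsc_parity([1, 0, 0, 1, 0, 0, 0, 0]))  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The last two calls show the mechanism discussed in Section II.B: a single one produces a parity sequence that never dies out, while the weight-2 input $10^{p-1}1$ with p = 3 (here 1001, since 1 + D³ is divisible by g(D)) returns the encoder to the zero state and yields parity weight 4.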

II. Weight Distribution Properties: Performance at the Error Floor
When the BER is not too small, it can be accurately approximated using computer simulation. However, accurate simulation results require the observation of hundreds of error events (see any textbook in statistics or computer simulation), which are particularly hard to obtain for long turbo codes with low error floors.

The Hamming weight of a binary vector is defined as the number of nonzero positions in the vector. It is well known (see for example [10]) that, under the assumption that maximum likelihood decoding is used, the frame error rate (FER; the probability that a given frame contains an error) of any error correcting code can be approximated for high SNRs by the union bound,

$$ \mathrm{FER} \le \sum_{w=d}^{N} A_w \, Q\left(\sqrt{2wR \cdot \mathrm{SNR}}\right), \tag{3} $$

where $A_w$ is the number of codewords of Hamming weight w, d is the minimum distance of the code, i.e. the minimum nonzero w for which $A_w > 0$, and Q() is the Gaussian tail function, related to the complementary error function by $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$:

$$ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2} \, dt. \tag{4} $$

Inside Q(), SNR denotes the linear ratio $E_b/N_0$, not its dB value from (1). The set of numbers $\{A_w \mid w = 0, \ldots, N\}$ is called the weight distribution or the weight spectrum of the code. It is sometimes convenient to refer to the weight enumerator function $A(X) = \sum_{w=0}^{N} A_w X^w$. It is also often convenient to consider the conditional input-redundancy weight enumerator function $A_i(Z) = \sum_{z=0}^{N-K} A_{i,z} Z^z$ [14], where $A_{i,z}$ is the number of codewords of information weight i and parity check weight z. Note that

$$ \sum_{i=0}^{K} Z^i A_i(Z) = \sum_{i=0}^{K} Z^i \sum_{z=0}^{N-K} A_{i,z} Z^z = \sum_{i=0}^{K} \sum_{z=0}^{N-K} A_{i,z} Z^{i+z} = \sum_{w=0}^{N} Z^w \sum_{i=0}^{K} A_{i,w-i} = A(Z). \tag{5} $$

Similarly,

$$ \mathrm{BER} \le \sum_{w=d}^{N} \frac{B_w}{K} \, Q\left(\sqrt{2wR \cdot \mathrm{SNR}}\right), \tag{6} $$

where $B_w = \sum_{i=1}^{K} i \, A_{i,w-i}$ is the total information weight of all codewords of weight w.

The union bounds can be refined. These refinements lead to sharper bounds that also allow us to restrict the summations of equations (3) and (6) to the first few terms:

$$ \mathrm{FER} \le \sum_{w=d}^{d+m} A_w \, Q\left(\sqrt{2wR \cdot \mathrm{SNR}}\right) + \text{something that diminishes at high SNR}; \tag{7} $$

$$ \mathrm{BER} \le \sum_{w=d}^{d+m} \frac{B_w}{K} \, Q\left(\sqrt{2wR \cdot \mathrm{SNR}}\right) + \text{something that diminishes at high SNR}, \tag{8} $$

where m is some small integer. Thus in the error floor region, instead of performing a costly computer simulation, we would like to determine the first few terms of the weight distribution. For very large SNR, the code's minimum distance $d_C$ and the total information weight of weight-$d_C$ codewords, $B_{d_C}$, determine the BER quite precisely.

Benedetto and Montorsi [14] considered the case of a uniform interleaver, i.e. an idealized random interleaver. Under this assumption, they showed that the weight enumerator of the (basic, unpunctured) turbo code can be approximated by

$$ A_i(Z) \approx \frac{A_i^A(Z) \, A_i^B(Z)}{\binom{K}{i}}, \tag{9} $$

where $A_i^A(Z)$ and $A_i^B(Z)$, respectively, are the conditional input-redundancy weight enumerator functions of the constituent codes.

These results explain the behaviour of turbo codes with a random interleaver, as observed and demonstrated by Berrou et al.:

• The actual minimum distance of a turbo code is not impressive compared to the minimum distance $d_C$ of an arbitrary "classical" error correcting code of similar length and rate. This explains why, in the error floor region, the error floor is flatter for the turbo code than for the "classical" code.
• The number of codewords of moderately low weight (say, of weights not much larger than $d_C$) is extremely small compared to the "classical" code. This latter phenomenon is usually referred to as spectral thinning [14], and explains why the error floor of the turbo code is lower (for moderately low SNRs) than that of the "classical" code.
• For random interleavers, the expected minimum distance grows slowly, and the expected number of codewords of low weight is slowly reduced, with the interleaver length K.

An additional observation [14] that arises from (9) is that most low weight codewords are associated with input vectors of weight 2. This observation actually makes (9) obsolete, since it motivates the design of better, non-random interleavers. Interleaver design is discussed in Section IV. For now we conclude that, since (9) does not apply to non-random interleavers, we need another way to determine the initial part of the weight distribution for an arbitrary interleaver.

II.A Algorithms for Determination of the Weight Distribution
In this section we consider algorithms for determining the number of codewords of weight ≤ $w_{max}$ in a particular turbo code with fixed interleaver and constituent codes.

An upper bound on the minimum distance can be obtained by considering only input vectors of low Hamming weight, say of weights 1, 2 or 3. This method can be applied to very large codes, but there is no guarantee that no lower weight codewords exist with higher weight input vectors.

Breiling and Huber [31], refining the previous method, used pre-processing to obtain a list of all input vectors corresponding to constituent codewords of weight "almost" $w_{max}$. The algorithm then attempts to combine input vectors to the two constituent encoders. It works well for small values of $w_{max}$, but is prohibitively complex for larger $w_{max}$.

Another approach was followed by Benedetto et al. in [37], developed further by Rosnes and Ytrehus [43]. The idea is to set up a search tree containing all possible input vectors. Each node in the search tree is a constraint set, which determines the value of a subset of the input vector positions. This constraint set specifies a subcode of each constituent code. The algorithm evaluates the minimum distance of each subcode, and thereby also a lower bound t on the minimum distance of the corresponding subset of the turbo code. If t exceeds $w_{max}$, the constraint set can be discarded from the search tree. Otherwise, the search tree is expanded with two new nodes, extending the current constraint set in two new directions. See details in [37]. The efficiency of the evaluation methods is discussed and improved in [43], facilitating the analysis of larger codes and larger values of $w_{max}$.

[Figure 4: minimum distance (10 to 28) versus information length K (0 to 5000)]
Figure 4 The minimum distance of UMTS turbo codes of information length K ranging from 40 to 5114

The algorithm in [43] was used to determine the minimum distance of all UMTS turbo codes. The results are shown in Figure 4. As a comment on the first class of algorithms discussed in this section, it can be noted that in 351 of the 5075 cases considered in Figure 4, the minimum distance codewords are actually caused by weight-9 inputs.

As an example of the minimum distance's impact on the BER, we also show in Figure 5 results for the UMTS code of information length K = 5114. This code has a codeword of weight 26. We also found another turbo code, using the same constituent code but with an interleaver generated by a technique similar to the one described in [42], with minimum distance 36. Note that at target bit error rates larger than $10^{-6}$, the performance of the two turbo codes is almost identical. However, at a target BER of $10^{-8}$, the code with the larger minimum distance has a relative coding gain of almost 1 dB, i.e. it requires 1 dB less SNR to achieve the same BER. The asymptotic coding gain difference is approximately 1.2 dB.
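For very short codes, the weight distribution needed in (3) and (6) to (8) can simply be found by exhaustive enumeration of all $2^K$ inputs. The Python sketch below does this for a toy unpunctured, unterminated rate-1/3 turbo code and then evaluates the truncated sums of (7) and (8), ignoring their correction terms. It is only a baseline for illustration: the cost grows as $2^K$, which is exactly why the search-tree algorithms of [37], [43] are needed for codes like those in Figure 4. The constituent polynomials, the interleaver and all function names are hypothetical choices.

```python
import math
from collections import Counter
from itertools import product


def rsc_parity(bits, g=(1, 1, 1), f=(1, 0, 1)):
    """Parity c(D) = u(D) f(D)/g(D) over GF(2), g[0] = 1 (illustrative polynomials)."""
    nu = len(g) - 1
    state = [0] * nu                      # state[k-1] holds a_{t-k}
    out = []
    for u in bits:
        a = u
        for k in range(1, nu + 1):
            a ^= g[k] & state[k - 1]
        c = f[0] & a
        for k in range(1, len(f)):
            c ^= f[k] & state[k - 1]
        out.append(c)
        state = [a] + state[:-1]
    return out


def weight_spectrum(k, perm, w_max):
    """Exhaustive search over all 2^K inputs of an unpunctured, unterminated
    rate-1/3 turbo code. Returns {w: (A_w, B_w)} for codeword weights w <= w_max."""
    a_w, b_w = Counter(), Counter()
    for u in product((0, 1), repeat=k):
        i = sum(u)                        # information weight
        if i == 0:
            continue                      # skip the all-zero codeword
        w = i + sum(rsc_parity(u)) + sum(rsc_parity([u[j] for j in perm]))
        if w <= w_max:
            a_w[w] += 1
            b_w[w] += i
    return {w: (a_w[w], b_w[w]) for w in sorted(a_w)}


def q_func(x):
    """Q(x) of eq. (4), computed as erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def truncated_bounds(spectrum, k, rate, snr_db):
    """Leading terms of (7) and (8); the SNR is converted from dB to a linear ratio."""
    snr = 10 ** (snr_db / 10)
    fer = sum(a * q_func(math.sqrt(2 * w * rate * snr)) for w, (a, _) in spectrum.items())
    ber = sum(b / k * q_func(math.sqrt(2 * w * rate * snr)) for w, (_, b) in spectrum.items())
    return fer, ber


if __name__ == "__main__":
    K = 12
    perm = [(5 * j + 3) % K for j in range(K)]   # hypothetical toy interleaver
    spec = weight_spectrum(K, perm, w_max=8)
    print("low-weight spectrum {w: (A_w, B_w)}:", spec)
    for snr_db in (4.0, 6.0, 8.0):
        print(snr_db, truncated_bounds(spec, K, 1 / 3, snr_db))
```

Because no trellis termination is used, inputs near the end of the block produce very low weights; with the identity interleaver the minimum distance is only 3, caused by a single one in the last position.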

II.B Distance Bounds
Upper bounds on the minimum distance of turbo codes with arbitrary interleavers were derived by Breiling and Huber [40]. They consider only codewords of input weight two and four of the types shown in Figure 6 (a) and (b), respectively. An input vector containing a subvector $1\,0^{mp-1}\,1$ (a one followed by $mp - 1$ zeros followed by a one), where $m$ is a positive integer and $p$ is an integer called the period of the encoder denominator polynomial $g(D)$, will generate a constituent codeword of parity weight approximately proportional to $m$. Hence, for example, if the interleaver $\pi$ maps one such input vector $u$ into another vector $\pi(u)$ of the same type, as in Figure 6 (a), the corresponding turbo codeword will also have a low overall weight. The approach followed in [40] is to show that for any interleaver $\pi$, there must exist some short loops of these two types. One important theoretical consequence of this is that the normalized minimum distance $d_C/N$ of turbo codes as described in this paper approaches zero for large $N$. This is in contrast to the best possible block codes, whose normalized minimum distance, according to the Varshamov-Gilbert bound [5], stays bounded away from zero. In practice, observed minimum distances $d_C$ of moderate length turbo codes are much lower than Breiling and Huber's upper bounds. Therefore, all turbo codes will display a significant error floor.

Figure 5 Simulation results and bounds for information length 5114 (plot: $\log_{10}$ BER versus SNR in dB; UMTS code simulation, new code simulation, union bound for the UMTS code using terms of weight at most 36, and union bound for the new code using terms of weight at most 40)

Figure 6 Low weight errors made up of (a) weight two inputs (b) weight four inputs. Here, p = 3 (panel (a): input 1000001 to Encoder A and 1001 to Encoder B; panel (b): inputs 1000001 and 1001 to Encoder A, and 1001 and 1001 to Encoder B)

III. Iterative Decoding
In Section II, the analysis of the error probability was made under the assumption of maximum likelihood decoding, which in a strict sense unfortunately seems to be infeasible. However, Berrou and his colleagues re-invented (see Section V) iterative decoding, which for moderate to high SNRs appears to perform very close to maximum likelihood decoding, although a formal proof to this effect has yet to be presented. We will, however, return to this issue in Section IV.

Iterative decoding relies on cooperation between the decoding modules of the two constituent codes involved. Each of the decoding modules uses the available information, namely the received values for each transmitted symbol, and certain a priori information presented by the other decoder module. To motivate this discussion, let us return briefly to the situation of Figure 1. In this example, a priori information was assumed to aid the process of decoding. However, as the story sadly demonstrates, "incorrect" a priori information can end up being disastrous for the decoding process. Of course, in a decoding situation there is no way ahead of time to determine which a priori information is "correct" and which is not. Iterative decoding works by using extrinsic information, to be defined below, instead of complete information as a connection between the SISO decoders. During the course of many iterations, on average the contributions of correct extrinsic information will outweigh the negative effects of incorrect extrinsic information.

Iterative decoding is presented in Figure 7. Consider a stochastic variable X that takes a value x with some known probability P(x) (perhaps conditioned on some specific event). It will be convenient to represent this probability in terms of a log-likelihood ratio (LLR), i.e. a function of the type

$$\lambda(x) = \log \frac{P(x)}{1 - P(x)}. \tag{10}$$

Abusing notation but conforming to the literature, we will sometimes refer to LLRs as information3) (i.e. channel information, a priori information, extrinsic information, and complete information) on the given bit x. Using an LLR instead of a probability means that operations become additive rather than multiplicative.

Let $\lambda(x)$ denote a block of LLRs, i.e. $(\lambda(x_1), \ldots, \lambda(x_m))$ for some vector $(x_1, \ldots, x_m)$. The receiver de-multiplexes the received sequence into three blocks of LLRs, $\lambda^{(R:U)}$, $\lambda^{(R:C)A}$, and $\lambda^{(R:C)B}$, representing the channel information on the transmitted information symbols and the parity check symbols in the two constituent codes, respectively. The determination of the probability distributions requires an accurate estimate of the received SNR, but it is not particularly difficult to obtain such an estimate when the channel is stable. A punctured and non-transmitted symbol is represented by a zero in the corresponding LLR block.

For each constituent code, a soft-in-soft-output (SISO) decoder, to be discussed below, produces the complete a posteriori information $\lambda^{(Complete)}$ on the block of transmitted symbols, based on the a priori information $\lambda^{(A)}$, the channel information $\lambda^{(R)}$, and the structural relationship between the bits as imposed by the constituent code. On a block level,

$$\lambda^{(Complete)} = \lambda^{(A)} + \lambda^{(R)} + \lambda^{(E)}. \tag{11}$$

Subtracting $\lambda^{(A)}$ and $\lambda^{(R)}$ from $\lambda^{(Complete)}$, we obtain the extrinsic information $\lambda^{(E)}$, which can be regarded as the amount of information gained by this pass through the decoder. The extrinsic information $\lambda^{(E)}(u_j)$ on a particular information bit $u_j$ is independent of the a priori information $\lambda^{(A)}(u_j)$ on that same bit.

Figure 7 Turbo decoding (block diagram: the N bits from the channel pass through demultiplexing and log metric calculation into $\lambda^{(R:U)}$, $\lambda^{(R:C)A}$ and $\lambda^{(R:C)B}$; SISO A and SISO B exchange extrinsic information $\lambda^{(E)}$ and a priori information $\lambda^{(A)}$ through $\pi$ and $\pi^{-1}$; a threshold T gives the estimate $\tilde{u}$)

At the start of the decoding process, the blocks $\lambda^{(R:U)}$ and $\lambda^{(R:C)A}$ are submitted to SISO A, while the initial a priori information $\lambda^{(A)}$ is zero for all symbols. Next, the blocks $\lambda^{(R:U)}$ (after an appropriate interleaving) and $\lambda^{(R:C)B}$ are presented to SISO B, together with the extrinsic information $\lambda^{(E)}$ from the previous step, which after the interleaving is now used as a priori information $\lambda^{(A)}$ for SISO B. Subsequently, the extrinsic information from SISO B is in turn de-interleaved and presented to SISO A as a priori information for the second round of decoding. As the decoding progresses, the extrinsic information is gradually refined, and information on the whole block is used to produce a final estimate on each user bit $u_j$, $0 \le j < K$. Technically, one can argue that a threshold decision on the complete information $\lambda_j^{(Complete)}$ should be used to obtain the estimate $\tilde{u}_j$ on bit $u_j$. In practice, after some iterations it seems to make little difference if the current extrinsic information $\lambda_j^{(E)}$ is used in place of the complete information.

III.A The SISO Blocks and the BCJR Algorithm
The iterative decoding algorithm relies heavily on the SISO blocks. Several approaches to implementing the SISO blocks have been developed, including the maximum a posteriori (MAP) decoding invented by Bahl, Cocke, Jelinek, and Raviv in [4], subsequently known as the BCJR algorithm, and soft-output Viterbi decoding [15]. Below follows a description of the BCJR algorithm, in the additive version as presented by Benedetto et al. [16]. This version produces the extrinsic information directly.

The BCJR algorithm, like most SISO algorithms, requires a trellis [18] representation of the code, see Figure 8. A trellis is a directed graph. The set of nodes or states of the graph can be partitioned into subsets $S_0, S_1, \ldots, S_{n-1}, S_n$. The states of $S_j$ belong to the j-th depth. For a block code, and hence for a terminated convolutional code such as the constituent codes of a turbo code, the initial and final depths contain one state each, so $S_0 = \{s_0\}$ and $S_n = \{s_n\}$. An edge e from a state in $S_{i-1}$ terminates at a state in $S_i$, $1 \le i \le n$. Let $s^S(e)$ and $s^E(e)$ be the states where e starts and ends, respectively.
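The weight-two patterns $1\,0^{mp-1}\,1$ of Section II.B are easy to reproduce numerically. The sketch below assumes a hypothetical 4-state recursive systematic encoder with denominator $g(D) = 1 + D + D^2$ (period p = 3, matching Figure 6) and numerator $1 + D^2$; it is an illustration of the mechanism, not necessarily the encoder behind the figures. For this encoder the parity weight comes out as $2m + 2$, i.e. it grows linearly with $m$, while an input whose two ones are not a multiple of p apart leaves the encoder outside the zero state.

```python
def rsc_parity(bits):
    """Parity bits of a hypothetical RSC encoder with denominator (feedback)
    g(D) = 1 + D + D^2 and numerator 1 + D^2; returns (parity, final state)."""
    a1 = a2 = 0                  # feedback register contents a_{t-1}, a_{t-2}
    parity = []
    for u in bits:
        a = u ^ a1 ^ a2          # feedback bit a_t
        parity.append(a ^ a2)    # c_t = a_t + a_{t-2} (mod 2)
        a1, a2 = a, a1
    return parity, (a1, a2)

def weight_two_input(m, p=3, tail=6):
    """The subvector 1 0^(mp-1) 1, followed by `tail` zeros."""
    return [1] + [0] * (m * p - 1) + [1] + [0] * tail

for m in range(1, 5):
    parity, state = rsc_parity(weight_two_input(m))
    print(m, sum(parity), state)
# 1 4 (0, 0)
# 2 6 (0, 0)
# 3 8 (0, 0)
# 4 10 (0, 0)

# Spacing 2 (not a multiple of p): the encoder never returns to the zero state.
print(rsc_parity([1, 0, 1] + [0] * 6)[1])   # (1, 1)
```

Because such inputs return the encoder to the zero state, the resulting parity weight stays small; an interleaver that maps one such pattern onto another, as in Figure 6 (a), therefore yields a low-weight turbo codeword.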

3) This informal concept of information should not be confused with the mathematically defined information function of information theory, which is used in the discussion of density evolution in Section IV.
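Equations (10) and (11) amount to a few lines of arithmetic. The sketch below checks the claim that LLRs make operations additive: for two independent observations of the same bit (uniform prior), a direct Bayes computation with multiplied likelihoods gives the same posterior LLR as simply adding the two channel LLRs. The BPSK mapping (bit 1 to +1, bit 0 to -1) and the resulting channel LLR $2y/\sigma^2$ on an AWGN channel are our assumptions; the function names are illustrative only.

```python
import math

def llr(p):
    """Eq. (10): LLR of a probability p = P(x)."""
    return math.log(p / (1.0 - p))

def prob(lam):
    """Inverse of eq. (10): recover P(x) from its LLR."""
    return 1.0 / (1.0 + math.exp(-lam))

def channel_llr(y, sigma2):
    """Channel LLR for BPSK on AWGN, assuming bit 1 -> +1 and bit 0 -> -1."""
    return 2.0 * y / sigma2

def gaussian_likelihood(y, x, sigma2):
    """p(y | x) for Gaussian noise of variance sigma2."""
    return math.exp(-(y - x) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# Two independent noisy observations of the same bit.
y1, y2, sigma2 = 0.5, -0.2, 1.0
lam1, lam2 = channel_llr(y1, sigma2), channel_llr(y2, sigma2)

# Direct Bayes computation: multiply likelihoods, then normalize ...
p1 = gaussian_likelihood(y1, +1, sigma2) * gaussian_likelihood(y2, +1, sigma2)
p0 = gaussian_likelihood(y1, -1, sigma2) * gaussian_likelihood(y2, -1, sigma2)
posterior = llr(p1 / (p1 + p0))

# ... agrees with adding the LLRs.
print(round(posterior, 6), round(lam1 + lam2, 6))   # 0.6 0.6

# Eq. (11) rearranged for one bit: extrinsic = complete - a priori - channel.
complete, a_priori, channel = 3.1, 0.7, 1.4
extrinsic = complete - a_priori - channel
print(round(extrinsic, 6))                           # 1.0
```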

For the terminated trellis of a binary rate 1/2 recursive convolutional constituent code, $n = K + \nu$. The trellis in Figure 8 has $K = 3$ and $\nu = 2$. During the first K depths there are two edges leaving each state, but starting at the K-th depth there is just one edge leaving each state, giving a unique terminating path leading to the ending state $s_n$ at depth n. Each edge e is associated with the following labels:

• u(e), the information bit associated with e. This is shown by the color in Figure 8 (grey is zero, orange is one).
• c(e), the parity check symbol associated with e. This is not shown in Figure 8.

Figure 8 A trellis (states from $S_0$ to $S_n$)

Consider a path starting in $s_0$ and terminating in $s_n$ in the constituent code trellis. Collecting the pair of edge labels at each edge, we obtain an edge label sequence, which corresponds to a codeword of the constituent code. The constituent code consists of the set of $2^K$ edge label sequences obtained by traversing all possible paths from $s_0$ to $s_n$.

The algorithm requires two passes (or recursions): one forward pass through the trellis, where certain values regarding the j-th depth of the trellis are computed in a recursion based on the same values at the (j - 1)-th depth, and one backward pass. In the forward recursion, the decoder starts the recursion with $\alpha_0(s_0) = 0$ and computes, for depth j = 1, ..., n and for each state s at depth j, the value

$$\alpha_j(s) = \max^*_{e:\,s^E(e)=s}\Big\{\underbrace{\alpha_{j-1}\big(s^S(e)\big)}_{\text{preceding state}} + \underbrace{u(e)\lambda_j^{(A)}}_{\text{a priori value}} + \underbrace{u(e)\lambda_j^{(R:U)} + c(e)\lambda_j^{(R:C)}}_{\text{channel information}}\Big\} \tag{12}$$

where $\max^*$ operating on T numbers $a_1, \ldots, a_T$ is the sum-operation

$$\max^*(a_1, \ldots, a_T) = \log \sum_{t=1}^{T} e^{a_t}. \tag{13}$$

The quantity $\alpha_j(s)$ is an LLR representing the probability that the constituent encoder is in trellis state s at time j, conditioned on the available preceding channel information and a priori information pertaining to all symbols until time j. To avoid numerical problems, the values $\alpha_j(s)$ are normalized at each depth j by finding $m = \max_s \alpha_j(s)$, and subtracting m from $\alpha_j(s)$ for all states s at depth j.

The backward pass similarly starts the recursion with $\beta_n(s_n) = 0$ and calculates

$$\beta_j(s) = \max^*_{e:\,s^S(e)=s}\Big\{\beta_{j+1}\big(s^E(e)\big) + u(e)\lambda_{j+1}^{(A)} + u(e)\lambda_{j+1}^{(R:U)} + c(e)\lambda_{j+1}^{(R:C)}\Big\} \tag{14}$$

for all states s at depth j, j = n - 1, ..., 1. The quantity $\beta_j(s)$ is an LLR representing the probability that the constituent encoder is in trellis state s at time j, conditioned on the available channel information and a priori information on all symbols after time j. The values $\beta_j(s)$ are normalized in the same way as $\alpha_j(s)$.

The final step of the SISO module is to compute, for all j, $1 \le j \le K$, the extrinsic log metric of the j-th information bit, $\lambda_j^{(E)}$. For u = 0, 1, compute

$$\mu(u, j) = \max^*_{e:\,u(e)=u}\Big\{\alpha_{j-1}\big(s^S(e)\big) + \beta_j\big(s^E(e)\big) + u(e)\lambda_j^{(R:U)} + c(e)\lambda_j^{(R:C)}\Big\} \tag{15}$$

where the maximization runs over the edges e from depth j - 1 to depth j. Then

$$\lambda_j^{(E)} = \mu(1, j) - \mu(0, j). \tag{16}$$

Note that the extrinsic information $\lambda_j^{(E)}$ on the j-th information bit is independent of the a priori value $\lambda_j^{(A)}$ on the j-th bit.

III.B Termination Criteria
Experience, backed by the density evolution arguments to be described in Section IV, shows that the performance of iterative decoding usually improves as the number of iterations increases, up to a certain point. The number of iterations before this convergence is observed depends on the SNR.
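The recursions (12) to (16) of Section III.A can be turned into a short program. The sketch below is a minimal, non-optimized Python rendering that follows the equations as printed, using the exact $\max^*$ of (13) rather than a max-log approximation. The constituent code is our assumption: a hypothetical 4-state RSC encoder (feedback $1 + D + D^2$, feedforward $1 + D^2$) with $\nu = 2$ tail depths, so `build_trellis(3)` has the K = 3, $\nu$ = 2 shape of Figure 8. All helper names are illustrative, and the LLR arrays are indexed by depth (index j - 1 holds the value for depth j).

```python
import math

def max_star(values):
    """max* of eq. (13): log(sum_t exp(a_t)), evaluated stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def rsc_step(state, u):
    """Hypothetical RSC encoder step, feedback 1 + D + D^2, feedforward 1 + D^2.
    state = (a_{t-1}, a_{t-2}); returns (next state, parity bit c)."""
    a1, a2 = state
    a = u ^ a1 ^ a2
    return (a, a1), a ^ a2

def build_trellis(K, nu=2):
    """trellis[j-1] holds the edges (start, end, u, c) from depth j-1 to depth j."""
    trellis, reachable = [], {(0, 0)}
    for j in range(1, K + nu + 1):
        edges = []
        for s in sorted(reachable):
            # After depth K, a single tail input per state forces the feedback bit to 0.
            for u in ((0, 1) if j <= K else (s[0] ^ s[1],)):
                s_next, c = rsc_step(s, u)
                edges.append((s, s_next, u, c))
        trellis.append(edges)
        reachable = {e[1] for e in edges}
    return trellis

def encode(info, nu=2):
    """Terminated encoding: systematic bits (information plus tail) and parity bits."""
    state, us, cs = (0, 0), [], []
    for j in range(len(info) + nu):
        u = info[j] if j < len(info) else state[0] ^ state[1]
        state, c = rsc_step(state, u)
        us.append(u)
        cs.append(c)
    return us, cs

def normalize(metrics):
    """Subtract the largest metric at a depth, as described for alpha_j and beta_j."""
    top = max(metrics.values())
    return {s: v - top for s, v in metrics.items()}

def bcjr_extrinsic(trellis, K, lam_a, lam_u, lam_c):
    """Eqs. (12)-(16); returns lambda_j^(E) for j = 1..K."""
    n = len(trellis)
    alpha = [{(0, 0): 0.0}]                     # alpha_0(s_0) = 0
    for j in range(1, n + 1):                   # forward recursion, eq. (12)
        cand = {}
        for s, t, u, c in trellis[j - 1]:
            m = alpha[j - 1][s] + u * lam_a[j - 1] + u * lam_u[j - 1] + c * lam_c[j - 1]
            cand.setdefault(t, []).append(m)
        alpha.append(normalize({t: max_star(v) for t, v in cand.items()}))
    beta = [None] * (n + 1)
    beta[n] = {(0, 0): 0.0}                     # beta_n(s_n) = 0
    for j in range(n - 1, 0, -1):               # backward recursion, eq. (14)
        cand = {}
        for s, t, u, c in trellis[j]:           # edges from depth j to depth j+1
            m = beta[j + 1][t] + u * lam_a[j] + u * lam_u[j] + c * lam_c[j]
            cand.setdefault(s, []).append(m)
        beta[j] = normalize({s: max_star(v) for s, v in cand.items()})
    ext = []
    for j in range(1, K + 1):                   # eqs. (15) and (16)
        mu = {0: [], 1: []}
        for s, t, u, c in trellis[j - 1]:
            mu[u].append(alpha[j - 1][s] + beta[j][t] + u * lam_u[j - 1] + c * lam_c[j - 1])
        ext.append(max_star(mu[1]) - max_star(mu[0]))
    return ext

info = [1, 0, 1, 1]
us, cs = encode(info)
lam_u = [4.0 if b else -4.0 for b in us]        # strong, noiseless channel LLRs
lam_c = [4.0 if b else -4.0 for b in cs]
lam_a = [0.0] * len(us)                         # first iteration: no a priori information
ext = bcjr_extrinsic(build_trellis(len(info)), len(info), lam_a, lam_u, lam_c)
print([1 if x > 0 else 0 for x in ext])         # [1, 0, 1, 1]
```

Storing $\alpha$ and $\beta$ as dictionaries keyed by state keeps the code close to the notation; a practical decoder would use arrays and precomputed branch metrics instead.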

How many iterations must be carried out? Hagenauer et al. [15] suggested using a threshold on the cross-entropy between the extrinsic information produced by the two decoders. Wu et al. [30] suggested using a threshold on the number of sign differences between these two extrinsic values. To avoid some rare cases where the decoder does not converge, these termination criteria must be accompanied by a maximum limit on the number of iterations. The simulated BER performance of these techniques is close to what is obtained with a virtually unlimited number of iterations. The actual number of iterations depends on the constituent codes as well as on the channel SNR, as can also be deduced from the density evolution arguments in the next section. For "typical" constituent codes, [30] claims that the average number of iterations is small (less than four or five) at SNRs lower than the error floor threshold.

IV. Selection of System Components
The turbo code is made up of the constituent codes, the interleaver, and the puncturing schemes. A complete optimization of puncturing schemes seems to be infeasible; on the other hand, among simpler schemes there does not seem to be much of a difference. We will therefore not discuss the issue of puncturing, and focus on the constituent codes and the interleavers.

IV.A Constituent Codes
As for convolutional codes, there is a trade-off between decoding complexity and performance. Decoding complexity is directly related to trellis complexity [18]. At the error floor, complex constituent trellises generally offer better performance, but in the waterfall region the picture is not so clear.

IV.A.1 Performance at the Error Floor
In [20], a computer search is used to find good convolutional constituent codes. The goal of this search was to find recursive convolutional encoders which offer high weight codewords for all weight two input vectors, following the lessons learnt from equation (9). However, for non-random interleavers, weight two input vectors may be less important. Another design approach is to use codes whose codeword weights grow fast with the active length of the input vectors, i.e. the number of successive time instants when the encoder is not in the zero state. This corresponds to maximizing the minimum average cycle weights of the constituent codes.

IV.A.2 Performance in the Waterfall Region
Recently [33], [35], [39], density evolution considerations have been introduced as a tool for studying the iterative decoding procedure. In this approach, it is assumed that each SISO produces extrinsic information which obeys a Gaussian probability density function, and which is not correlated with the observed received values. These assumptions are reasonable for large interleavers [39].

The technique can be applied to many channel models. We assume an AWGN channel. For a fixed channel SNR, define $I_A = I(\lambda^{(A)}, u)$ to be the mutual information between the a priori values and the transmitted information.4) Similarly, we can define $I_E = I(\lambda^{(E)}, u)$ to be the mutual information between the extrinsic values produced by the SISO and the transmitted information. Now, consider the function $I_E = F(I_A, \mathrm{SNR})$. Analytical expressions for this function have been obtained for simple constituent codes [33]. For a general code, analytical expressions are unknown, but the relationships can easily be obtained by computer simulation. This computer simulation is much simpler [39] than a turbo code BER simulation, which requires the observation of a large number of very rare error events. In Figure 9 a typical function $I_E = F(I_A, \mathrm{SNR})$ is shown for two different SNRs.

Figure 9 The connection between a priori information and extrinsic information (plot titled "Density evolution": mutual information $I_E$ at the output of the SISO block versus mutual information $I_A$ at its input, both from 0 to 1, for a higher and a lower channel SNR)
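Under the Gaussian model of Section IV.A.2, the mutual information $I_A = I(\lambda^{(A)}, u)$ depends only on the spread of the a priori LLRs. The sketch below assumes, as is common in EXIT-chart analysis (e.g. [39]), "consistent" Gaussian LLRs whose mean is $\pm\sigma^2/2$ (sign given by the transmitted bit) and whose variance is $\sigma^2$, with equiprobable bits. It evaluates $I = 1 - E[\log_2(1 + e^{-\lambda})]$ by numerical integration; the function names and grid size are illustrative choices.

```python
import math

def log1p_exp_neg(x):
    """log(1 + exp(-x)) without overflow for large |x|."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

def mutual_information(sigma, points=4001):
    """I(lambda; u) when lambda given u is Gaussian with mean +/- sigma^2 / 2 and
    variance sigma^2, bits equiprobable. By symmetry,
    I = 1 - E[log2(1 + exp(-lambda)) | bit mapped to +1], via the trapezoidal rule."""
    if sigma <= 0:
        return 0.0
    mean, var = sigma ** 2 / 2, sigma ** 2
    lo, hi = mean - 10 * sigma, mean + 10 * sigma
    h = (hi - lo) / (points - 1)
    total = 0.0
    for k in range(points):
        xi = lo + k * h
        pdf = math.exp(-(xi - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        weight = 0.5 if k in (0, points - 1) else 1.0
        total += weight * pdf * log1p_exp_neg(xi)
    return 1.0 - total * h / math.log(2)

for sigma in (0.5, 1.0, 2.0, 4.0):
    print(sigma, round(mutual_information(sigma), 3))
```

Mapping a chosen $I_A$ to $\sigma$ with this function is how a priori LLRs are typically generated when $I_E = F(I_A, \mathrm{SNR})$ is measured by simulation: feed the SISO synthetic a priori values of known $I_A$ and measure $I_E$ at its output.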

4) At this point we will not go into the finer details of information theory. For a presentation of information theory, see any textbook, for example [9]. Note that a mutual information value of 1 means that knowledge of one parameter exactly determines the other. A mutual information value of 0 means that the two parameters are statistically independent.
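Returning briefly to the termination criteria of Section III.B: the sign-difference test of Wu et al. [30] compares, bit by bit, the extrinsic values produced by the two decoders (after de-interleaving). The sketch below is a hypothetical rendering of that idea, combined with the maximum iteration limit the text calls for; the parameter names, the default cap of 8 iterations and the threshold of 0 are our assumptions, not values taken from [30].

```python
def sign_differences(ext_a, ext_b):
    """Count positions where two extrinsic LLR blocks disagree in sign
    (an LLR of exactly 0 is treated as a decision for bit 0 here)."""
    return sum(1 for a, b in zip(ext_a, ext_b) if (a > 0) != (b > 0))

def should_stop(ext_a, ext_b, iteration, max_iterations=8, threshold=0):
    """Hypothetical stopping rule: stop when the two decoders' extrinsic values
    differ in sign in at most `threshold` positions, or when the cap is reached."""
    if iteration >= max_iterations:
        return True
    return sign_differences(ext_a, ext_b) <= threshold

print(sign_differences([2.1, -0.3, 1.7], [1.9, 0.4, 2.2]))   # 1
print(should_stop([2.1, -0.3], [1.9, -0.8], iteration=2))     # True
```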

Figure 10 contains an EXIT (extrinsic information transfer) chart. The functions $F$ for the first SISO and $F^{-1}$ for the second SISO, for some fixed SNR, are plotted in the same coordinate system. The iterative decoding procedure follows a trajectory between these two curves. As shown in Figure 10, at low SNR the two curves intersect, so that decoding beyond a few iterations will not lead to improved decoding results. At moderate SNR, there is an open tunnel between the two curves, allowing decoding to proceed. The smallest SNR for which the tunnel is open is called the pinch-off limit in [39]; it corresponds to the SNR at which the waterfall region starts. The EXIT charts can be used for multiple purposes, such as searching for and evaluating good constituent codes, and predicting (reasonably accurately) the BER after a prescribed number of decoding iterations. This technique also presents a strong argument that turbo coding is very close to optimal at high SNR.

Figure 10 Exit chart (plot titled "Density evolution": horizontal axis a priori information $I_A$ of SISO A, extrinsic information $I_E$ of SISO B; vertical axis extrinsic information $I_E$ of SISO A, a priori information $I_A$ of SISO B; both from 0 to 1)

Note that the density evolution technique does not take into consideration the effect of a limited minimum distance. Thus it cannot be applied to determine the error floor.

IV.B Interleavers
In principle, any permutation of the K input bits can serve as an interleaver, but not all permutations are equally good. Before we proceed to performance related issues, note that any permutation can be represented by a lookup table with K entries (equivalently, a K × K permutation matrix). For large K, or in the case where K is adaptive to channel conditions or can be selected by the user (within a range), lookup tables may be inconvenient. In such cases it would be nice to have a simple and fast deterministic algorithm to specify the interleaver. For example, a simple block interleaver provides maximum spreading of the interleaved symbols, which according to both of the following subsections is an advantage. However, the regularity of this construction also seems to be disadvantageous, both in the waterfall region and especially in the error floor region. As of this writing, the known simple deterministic algorithms produce interleavers with worse performance than the interleavers discussed below.

The purpose of the interleaver is basically to optimize the performance of the turbo code. We will consider the waterfall region and the error floor region separately.

IV.B.1 Performance in the Waterfall Region
The density evolution arguments rely on the information that is passed between the constituent SISO modules being independent. Since these SISO modules are linked through the interleaver, this assumption is not entirely correct. A first order cycle is made up of any two numbers i, j, $0 \le i < j < K$. The length of the cycle is defined as $l = l(i, j) = j - i + |\pi(i) - \pi(j)|$. One such cycle is shown in Figure 11. If l is small, the independence assumption is jeopardized. Indeed, the negative impact on the waterfall performance of an interleaver with many short cycles can be shown by density evolution arguments as well as by simulation.

Figure 11 Short interleaver cycles (positions i and j at the input to Encoder A are mapped to $\pi(i)$ and $\pi(j)$ at the input to Encoder B)

Hokfelt et al. [26] designed interleavers with the aim of minimizing the correlation between the extrinsic information produced by successive SISO iterations. The method seeks to minimize the number of short first order cycles. There does not seem to be a significant difference in the waterfall region between these interleavers and those designed in [42], discussed next.

IV.B.2 Performance in the Error Floor Region
In the error floor region, the key parameters determining performance are the code's minimum distance and the number of codewords of minimum distance. The interleaver can be

designed with the aim of optimizing these parameters.

Interestingly, the simplest step is again to make sure that there are no short first order cycles in the interleaver. Dolinar and Divsalar suggested using s-random interleavers [13]. Their s-random interleavers do not contain interleaver mappings $\pi(i)$, $\pi(j)$ if

$$|i - j| < s \quad \text{and} \quad |\pi(i) - \pi(j)| < s \quad \text{simultaneously}. \tag{17}$$

These are pseudo-randomly generated interleavers, where interleaver mappings $\pi(i)$ are added one by one and discarded during the generation process if they violate (17) together with any of the existing mappings $\pi(j)$. Crozier [42] redefined the s-random property slightly, by requiring that $l = |i - j| + |\pi(i) - \pi(j)| \ge s$, and presented two efficient ways to generate powerful interleavers.

Andrews et al. [21] suggested also removing short cycles of the type shown in Figure 6 (a). Breiling et al. [32] designed interleavers for specific pairs of constituent codes, by explicitly avoiding loops that involved known low weight codewords in both constituent codes. The "other" interleaver in Figure 5 was found by using these techniques in combination with the ones in [42].

V. Variations, Generalizations, Applications and Limitations

V.A Variations
Codes can be concatenated in other ways than by parallel concatenation, and still be decoded by the turbo decoding method. This can include two or more constituent codes. Serial concatenation is one possibility.

V.A.1 Serial Concatenation
Serial concatenation [19] is shown in Figure 12. Such constructions can also be decoded with an iterative decoding method. For details see [19]. Serially concatenated codes tend to have larger minimum distances than parallel concatenated codes of comparable complexity and rate. However, research results so far suggest that the turbo decoding algorithm also seems to be less efficient in this case.

V.A.2 Repeat-Accumulate Codes
Divsalar et al. [23] suggested a particularly simple serial concatenation: The inner "code" is a rate 1 accumulator mapping, i.e. its single output bit at time j is simply the modulo 2 sum of all input bits so far. The accumulator mapping can be described by a 2-state trellis, to which a SISO module is applied. The outer code is a rate 1/3 repetition code: each information bit is copied twice, so that it appears three times in the output. This simple construction performs surprisingly well, especially in the waterfall region. This can be explained by density evolution arguments [33].
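The repeat-accumulate encoder of Section V.A.2 fits in a few lines. In the sketch below the outer rate-1/3 repetition code, an interleaver (as in the serial concatenation of Figure 12) and the rate-1 accumulator are chained; the interleaver convention (output position k takes repeated bit `perm[k]`) and the function name are our assumptions.

```python
def repeat_accumulate(bits, perm, q=3):
    """Sketch of an RA encoder: outer rate-1/q repetition code, interleaver,
    inner rate-1 accumulator. Output position k takes repeated bit perm[k]."""
    repeated = [b for b in bits for _ in range(q)]
    if sorted(perm) != list(range(len(repeated))):
        raise ValueError("perm must be a permutation of range(q * len(bits))")
    out, acc = [], 0
    for k in range(len(repeated)):
        acc ^= repeated[perm[k]]   # accumulator state = modulo-2 sum of inputs so far
        out.append(acc)
    return out

print(repeat_accumulate([1, 0, 1], list(range(9))))   # [1, 0, 1, 1, 1, 1, 0, 1, 0]
```

The single variable `acc` is exactly the state of the 2-state accumulator trellis mentioned above, which is why the inner SISO module for this code is so cheap.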

V.B Generalizations
A turbo code can be viewed as a low-density parity-check (LDPC) code, introduced already in 1962 by Gallager [3], and reinvented by Tanner in 1981 [6]. Apparently, technology was not ripe for these ideas at the time when they were first introduced.

Wiberg et al. [12] observed that turbo codes, as well as LDPC codes, can be described by a Tanner graph. Turbo decoding, as well as iterative decoding of LDPC codes, proceeds as a message passing algorithm on this graph. This graph and the decoding algorithm act by factoring the decoding problem. Generalizing these concepts, we arrive at factor graphs, which cover a vast number of problems, from many application areas, and their solutions, as special cases. See [34] for more details.

In the message passing algorithms, optimal scheduling of the messages is a problem yet to be solved. This has led to the introduction of parallel message passing algorithms, and to the extreme case of parallelism: analog decoders [22], [25].

Figure 12 Serial concatenation (block diagram: u, K information bits → outer encoder → N′ bits → $\pi$ → inner encoder → N bits → multiplexing and puncturing → N bits through channel → demultiplexing, puncturing, and log metric calculation → $\lambda^{(R)}$ on all N′ code bits → inner SISO; the inner and outer SISO exchange extrinsic and a priori information through $\pi$ and $\pi^{-1}$; the outer SISO output $\lambda^{(E)}$ on the K information bits is thresholded (T) to give $\tilde{u}$)
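The interleaver constructions of Section IV.B can be tried out directly. The sketch below computes the shortest first order cycle $l(i, j) = j - i + |\pi(i) - \pi(j)|$ of an interleaver, builds a simple row-in, column-out block interleaver, and generates pseudo-random interleavers greedily, either under rule (17) or under the sum rule $|i - j| + |\pi(i) - \pi(j)| \ge s$ attributed to Crozier [42]. This is a plain restart-on-failure generator in the spirit of the description of (17), not the efficient methods of [42]; all function names and parameters are our own.

```python
import random

def min_cycle_length(perm):
    """Shortest first order cycle l(i, j) = j - i + |pi(i) - pi(j)| over i < j."""
    K = len(perm)
    return min(j - i + abs(perm[i] - perm[j]) for i in range(K) for j in range(i + 1, K))

def block_interleaver(rows, cols):
    """Write row by row, read column by column: pi(r*cols + c) = c*rows + r."""
    return [c * rows + r for r in range(rows) for c in range(cols)]

def violates(i, pi_i, j, pi_j, s, spread_sum):
    """True if the mappings i -> pi_i and j -> pi_j break the spreading rule."""
    di, dp = abs(i - j), abs(pi_i - pi_j)
    return di + dp < s if spread_sum else (di < s and dp < s)

def s_random_interleaver(K, s, spread_sum=False, max_attempts=1000, seed=None):
    """Greedy construction: pi(i) is picked for i = 0, 1, ... from the unused positions,
    skipping candidates that violate the rule with an earlier mapping; a dead end
    restarts the whole construction. spread_sum=False uses rule (17)."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        unused = list(range(K))
        rng.shuffle(unused)
        perm = []
        for i in range(K):
            for idx, cand in enumerate(unused):
                # Only earlier indices j with |i - j| < s can violate either rule.
                if not any(violates(i, cand, j, perm[j], s, spread_sum)
                           for j in range(max(0, i - s + 1), i)):
                    perm.append(unused.pop(idx))
                    break
            else:
                break          # no admissible candidate: restart
        if len(perm) == K:
            return perm
    raise RuntimeError("no interleaver found; try a smaller s")

print(min_cycle_length(list(range(16))))             # 2
print(min_cycle_length(block_interleaver(4, 4)))     # 5
perm = s_random_interleaver(64, 5, spread_sum=True, seed=1)
print(min_cycle_length(perm) >= 5)                   # True
```

The identity permutation has cycles of length 2 (neighbours stay neighbours), while the 4 × 4 block interleaver spreads every pair to length at least 5; the greedy generator trades that regularity for pseudo-randomness while still enforcing a minimum spread.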

V.C Applications
This paper has focused on turbo codes applied to AWGN channels with BPSK modulation. Turbo codes can of course be applied to a variety of coding and communication situations. We review some of these below.

• Coded modulation is possible with turbo codes as well. In contrast to traditional methods, like trellis coded modulation [7], set partitioning cannot be readily applied. Some challenges remain, concerning issues such as how to map the binary encoded bits to signal constellation points, or rotation invariance. Still, turbo coded modulation produces good results [29].
• Turbo codes have been applied with success to fading channels [29]. This usually requires a combined iterative detection/decoding scheme where equalization [25] and/or estimation of channel parameters [38] are included in the decoding process.
• An example of a completely different application is the use of turbo decoding in correlation attacks on cryptographic functions [27], [28].

McEliece [41] suggests that turbo-like codes will work well in most communication situations.

V.D Limitations
Turbo coding performs close to the Shannon bound even when a very small BER is required. The relatively small minimum distance presents some problems, though. This means that to achieve a small BER at low SNR, a certain minimum information length N is required. Since turbo decoding processes one block at a time, this can cause a decoding delay beyond what is tolerable for some applications.

References
1 Asbjørnsen, P C, Moe, J. God dag mann! – Økseskaft! In: Norske Folkeeventyr, 1841–1844. (In Norwegian)
2 Shannon, C. A Mathematical Theory of Communication. Bell System Tech. J., 27, July and October, 379–423, 623–656, 1948.
3 Gallager, R. Low-density parity-check codes. IRE Transactions on Information Theory, IT-8 (1), 21–28, 1962.
4 Bahl, L R et al. Optimal decoding of linear codes for minimizing symbol error rate. IEEE Transactions on Information Theory, IT-20 (2), 284–287, 1974.
5 MacWilliams, F J, Sloane, N J A. The theory of error-correcting codes. Amsterdam, North-Holland, 1977.
6 Tanner, M. A recursive approach to low complexity codes. IEEE Transactions on Information Theory, IT-27 (5), 533–547, 1981.
7 Ungerboeck, G. Channel coding with multilevel/phase signalling. IEEE Transactions on Information Theory, IT-28 (1), 55–67, 1982.
8 Lin, S, Costello, D. Error Control Codes. Englewood Cliffs, Prentice Hall, 1983.
9 Johannesson, R. Informationsteori: Grundvalen för (tele-)kommunikation. Lund, Studentlitteratur, 1988. (In Swedish)
10 Blahut, R. Digital Transmission of Information. New York, Addison-Wesley, 1990.
11 Berrou, C, Glavieux, A, Thitimajshima, P. Near Shannon limit error correcting coding and decoding: Turbo-codes. In: Proc. ICC'93, Geneva, Switzerland, May 1993, 1064–1070.
12 Wiberg, N, Loeliger, H-A, Koetter, R. Codes and iterative decoding on general graphs. European Transactions on Telecommunications, 6 (5), 513–525, 1995.
13 Dolinar, S, Divsalar, D. Weight distributions for turbo codes using random and nonrandom permutations. Pasadena, CA, USA, Jet Propulsion Lab (JPL), 1995. (TDA Progress report, 42-122.)
14 Benedetto, S, Montorsi, G. Unveiling turbo codes: some results on parallel concatenated coding schemes. IEEE Transactions on Information Theory, 42 (3), 409–428, 1996.
15 Hagenauer, J, Offer, E, Papke, L. Iterative decoding of binary block and convolutional codes. IEEE Transactions on Information Theory, 42 (3), 429–445, 1996.
16 Benedetto, S et al. A soft-input soft-output maximum a posteriori (MAP) module to decode parallel and serial concatenated codes. Pasadena, CA, USA, Jet Propulsion Lab (JPL), 1996. (TDA Progress report, 42-127.)
17 Pless, V, Huffman, W C (eds.). Handbook of Coding Theory. Amsterdam, North-Holland, 1998.

18 Vardy, A. Trellis structure of codes. In: Pless, V, Huffman, W C (eds.). Handbook of Coding Theory. Amsterdam, North-Holland, 1998, 1989–2118.
19 Benedetto, S et al. Serial Concatenation of Interleaved Codes: Performance Analysis, Design, and Iterative Decoding. IEEE Transactions on Information Theory, 44 (3), 909–926, 1998.
20 Benedetto, S, Garello, R, Montorsi, G. A search for good convolutional codes to be used in the construction of turbo codes. IEEE Transactions on Communications, 44 (9), 1101–1105, 1998.
21 Andrews, K S, Heegard, C, Kozen, D. Interleaver design methods for turbo codes. In: Proc. IEEE Int. Symposium on Information Theory, Cambridge, MA, USA, 1998, 420.
22 Loeliger, H-A et al. Iterative sum-product decoding with analog VLSI. In: Proc. IEEE Int. Symposium on Information Theory, Cambridge, MA, USA, 1998, 146.
23 Divsalar, D, Jin, H, McEliece, R J. Coding Theorems for 'turbo-like' codes. In: Proc. 1998 Allerton Conference on Communications, Allerton, IL, USA, Sept. 1998.
24 Heegard, C, Wicker, S B. Turbo Coding. Boston, Kluwer, 1999.
25 Hagenauer, J et al. Decoding and equalization with analog non-linear networks. European Transactions on Telecommunications, 10 (5), 659–690, 1999.
26 Hokfelt, J, Edfors, O, Maseng, T. Interleaver design for turbo codes based on the performance of iterative decoding. In: Proc. IEEE International Conference on Communications, Vancouver, BC, Canada, June 1999.
27 Johansson, T, Jönsson, F. Fast correlation attacks based on Turbo code techniques. In: Proceedings of Crypto'99, Santa Barbara, CA, USA, August 1999. Lecture Notes in Computer Science, Berlin, Springer, 1999.
28 Fossorier, M P C, Mihaljevic, M J, Imai, H. Critical noise for convergence of iterative probabilistic decoding with belief propagation in cryptographic applications. In: Proceedings of AAECC'13, Honolulu, HI, USA, November 1999. Lecture Notes in Computer Science, Berlin, Springer, 1999.
29 Vucetic, B, Yuan, J. Turbo Codes: Principles and Applications. Boston, Kluwer, 2000.
30 Wu, Y, Woerner, B D, Ebel, W J. A simple stopping criterion for turbo decoding. IEEE Communications Letters, 4 (8), 258–260, 2000.
31 Breiling, M, Huber, J B. A method for determining the distance profile of turbo codes. In: Proc. of 3rd ITG conference on source and channel coding, Munich, Germany, January 2000.
32 Breiling, M, Peeters, S, Huber, J B. Interleaver design using backtracking and spreading methods. In: Proc. IEEE Int. Symposium on Information Theory, Sorrento, Italy, June 2000, 451.
33 Divsalar, D, Dolinar, S, Pollara, F. Low complexity turbo-like codes. In: Proc. 2nd Int. Symposium on Turbo Codes, Brest, France, September 2000, 73–80.
34 Kschischang, F R, Frey, B J, Loeliger, H-A. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47 (2), 498–519, 2001.
35 Richardson, T, Urbanke, R. The capacity of low-density parity-check codes under message-passing decoding. IEEE Transactions on Information Theory, 47 (2), 599–618, 2001.
36 Chung, S Y et al. On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit. IEEE Communications Letters, 5 (2), 58–60, 2001.
37 Garello, R G, Pierleoni, P, Benedetto, S. Computing the free distance of turbo codes and serially concatenated codes with interleavers: Algorithms and applications. IEEE Journal on Selected Areas in Communications, 19 (5), 800–812, 2001.
38 Valenti, M C, Woerner, B D. Iterative channel estimation and decoding of pilot symbol assisted turbo codes over flat-fading channels. IEEE Journal on Selected Areas in Communications, 19 (9), 1697–1705, 2001.
39 ten Brink, S. Convergence behavior of iteratively decoded parallel concatenated codes. IEEE Transactions on Communications, 49 (10), 1727–1737, 2001.

40 Breiling, M, Huber, J B. Combinatorial analysis of the minimum distance of turbo codes. IEEE Transactions on Information Theory, 47 (7), 2737–2750, 2001.
41 McEliece, R J. Are turbo-like codes effective on non-standard channels? IEEE Information Theory Newsletter, 51 (4), 2001. (Presented at IEEE Int. Symposium on Information Theory, Washington, D.C., USA, June 2001.)
42 Crozier, S N. New High-Spread High-Distance Interleavers for Turbo-Codes. Preprint, 2001.
43 Rosnes, E, Ytrehus, Ø. Algorithms for turbo code weight distribution calculation with applications to UMTS codes. To be presented at IEEE Int. Symposium on Information Theory, Lausanne, Switzerland, July 2002.

Theory and Practice of Error Control Coding for Satellite and Fixed Radio Systems

PÅL ORTEN AND BJARNE RISLØW

Strong “state-of-the-art” Forward Error Correction (FEC) coding has been extensively applied for both satellite and fixed radio communication. With the discovery of Turbo codes, performance close to the capacity limits for moderate bit error rates became possible. In this paper we discuss various aspects of the capacity theorem and the capacity limits. We describe and present results for some error control coding schemes that are currently applied in satellite and fixed radio systems, and compare these results with the theoretical limits. Finally, we describe and evaluate promising new coding schemes that have not yet been applied in commercial systems.

Pål Orten (35) is Research Manager at Nera Research. He received his Siv.Ing. degree from the Norwegian University of Science and Technology in 1989, and his PhD from Chalmers University of Technology in 1999. From 1990 to 1995 he was Research Scientist at ABB Corporate Research, and from 1995 at Nera Research (interrupted by PhD studies). In addition to various research and development activities in channel coding and signal processing at ABB and Nera, he also participated in the European research project FRAMES, which resulted in the definition of the UMTS system in ETSI. His current research focuses on fixed wireless communications and mobile satellite communications.
[email protected]

Bjarne Risløw (38) is a Research Scientist at Nera Research. He received his MSc in Electrical Engineering at the Norwegian Institute of Technology in 1988. He has worked as a Research Scientist at SINTEF DELAB (1989–1992), ABB Corporate Research (1993–1994) and from 1995 at Nera Research. At ABB and Nera he has worked with the development of terminals and earth stations for the Inmarsat systems. In recent years he has mostly been involved in wireless broadband access projects, DVB-RCS and LMDS. His main field of interest is terminal technologies, in particular modem technology, coding and synchronisation.
[email protected]

1 Introduction
With the widespread use of mobile cellular phones, radio communications have become a natural part of life for many people around the world. This has been made possible by advanced technical research enabling cost-effective implementation and reliable communication in a rather harsh environment. In general a radio channel is exposed to many degrading and disturbing effects like interference, multipath propagation causing fading, Doppler shifts, and thermal noise. Obviously, to be able to operate a communication system under such conditions, some method or scheme for error control is required. Many of these advanced error control schemes were developed for deep space communications and satellite communications, where power (and to some extent bandwidth) limits the performance. Other advances in coding theory have been made for applications where bandwidth efficiency is very important. Examples of such systems are cable modems and radio relay communications.

An example of a satellite communication link is shown in Figure 1. Satellite communication systems are characterised by an extremely large distance between transmitter and receiver, since the communication signal is transmitted via a satellite located up to 36,000 km (for Geosynchronous Earth Orbits, GEO) above the earth surface. Since the intensity of electromagnetic waves (in free space) decays with the square of the radio path length, such a system is clearly power limited. Therefore, satellite systems normally need line of sight to have sufficient link margin for reliable operation. Naturally, sophisticated error control coding schemes are necessary for acceptable performance, as well as for limiting the power consumption in both satellite and terminal. This is especially important for mobile terminals. Another major problem with the long signal path of GEO satellites is the correspondingly long delay. This delay might reduce the perceived quality of speech services and cause problems for delay sensitive data communication services. Thus, satellite systems have also been designed with satellites in lower earth orbits. For such systems the delay is reduced, but other problems occur, such as larger Doppler shifts and the need for hand-over between satellites. A common and accepted channel model for satellite systems is the Ricean channel model. With handheld phones, and thus rather non-directive antennas, the scattered components may be significant. However, with rather directive antennas and slowly moving or fixed terminals, the direct ray is very strong compared to the diffuse scattered components. The channel is then quite close to an Additive White Gaussian Noise (AWGN) channel. For maritime or aeronautical applications these assumptions may not hold, and other fading models must be used. More information about satellite channels can be found for instance in [1].

Figure 1 Satellite System (satellite, terminal and earth station, connected by forward and return channels)

Figure 2 shows a radio link communication system with some high capacity radio hops, typically at STM-1 data rates1), bringing the communication to a city where the traffic is further distributed to the user by cable or radio transmission. Radio link communication systems are rather different from satellite communication systems, since they are typically used instead of fibre cables in areas where it is impractical or not cost effective to deploy fibre. Radio relay systems therefore have to be highly reliable, provide rather high data rates, and also a very low bit error rate. The goal for the radio system design is then to provide close to fibre quality and speed. Obviously, this sets strong requirements on the channel coding that may be applied. Radio links operate with line of sight communications, with antennas normally mounted several meters above the ground. Nevertheless, reflections may occur from oceans and/or mountains, resulting in a multipath fading channel. Precipitation may also cause the signal to fade. A much used channel model for radio link communication is a two-ray model, where the rays are separated by 6.3 ns.

Figure 2 Radio link communication system

In this paper we focus on coding for satellite communications and radio link communication systems. Initially, we present the capacity limits of Shannon, as well as the practical limits for a given code rate, modulation method and block length. We then study coding schemes that have been applied in many satellite and radio link systems, and show examples of their performance. We then proceed with a presentation of Turbo codes, which are now implemented in several recently designed satellite systems. Next, we look at coding schemes and performance for high spectral efficiency radio systems. Furthermore, we present coding schemes that have not yet been applied in commercial systems, including two block code based coding schemes with iterative decoding: Turbo Product Codes and Low Density Parity Check Codes. We believe these are strong candidates to be implemented in future satellite and radio systems. Finally, the paper presents a comparison of some of the coding schemes along with a discussion of their properties.

1) STM-1 (Synchronous Transport Module level 1) is the basic transmission rate (155.52 Mbit/s) in SDH (Synchronous Digital Hierarchy).

2 The Capacity Theorem and Capacity Limits

2.1 Introduction
Claude Shannon developed the formulas for "the maximum rate of transmission of binary digits over a system when the signal is perturbed by various types of noise" [2]. For the AWGN channel with bandwidth W and signal-to-noise ratio S/N, Shannon showed that it was possible to transmit binary digits at a rate

$$C = W \log_2\left(1 + \frac{S}{N}\right) \tag{1}$$

with as small a probability of error as desired. This maximum rate is also called the channel capacity. Shannon used a geometrical approach to prove this. Assume that m bits are encoded with M = 2^m different signal functions inside the code sphere of radius √(2TWP), where P is the average power of the code signals. Then, by letting T (the code length) go to infinity, one would reach the capacity limit. A remarkable point is that the bound holds on average even if the M signals are picked at random within the code sphere.

Many researchers have tried to design random-like codes, but a decoder for a pure random code would be very complex, since the decoder would have to compare the received signal against all M possible transmitted sequences. Such a decoder would generally have to calculate Euclidean distances (use soft decisions). With the discovery of Turbo codes one suddenly found a practical way to decode long random-like codes using soft decisions, but still with limited decoding complexity.

Let us take a closer look at the capacity formula. When signalling at a certain rate Rs, the signal-to-noise ratio S/N relates to the energy per bit to noise power spectral density ratio, Eb/N0, as:
$$\frac{S}{N} = \frac{E_b}{N_0}\cdot\frac{R_s}{W} \tag{2}$$

The minimum required Eb/N0 when transmitting at capacity (Rs = C) is then:

$$\frac{E_b}{N_0} = \frac{W}{C}\left(2^{C/W} - 1\right) \tag{3}$$

The relationship between the minimum required Eb/N0 and the relative bandwidth (W/C) is shown in Figure 3.

Figure 3 Relationship between the minimum required Eb/N0 and bandwidth (W/C) (W/C versus Eb/N0 [dB]; the Shannon limit asymptote is at -1.6 dB)

It is then easy to see that when the channel bandwidth W goes to infinity, the minimum required Eb/N0 becomes ln(2), or -1.6 dB. This result is not very useful since unlimited bandwidth will not be available. Nevertheless, it is a fundamental limit. If we now let η = C/W be the spectral efficiency (bit/s/Hz), Eq. (3) becomes Eb/N0 = (2^η - 1)/η, which is the unconstrained limit plotted in Figure 5.

Until a few years ago there was no practical way of getting really close to the Shannon limit. With a given coding scheme a reasonably good coding gain compared to uncoded modulation could be achieved, but we were still far away from the Shannon limit. As an example, concatenation of an inner convolutional code and an outer Reed-Solomon code was still about 2–3 dB away from the capacity limit. However, with the discovery of Turbo codes and iterative decoding the gap was closed, and performance within 1 dB of the Shannon limit was achieved with practical code lengths.

2.2 Capacity Limit as Function of Code Rate
Eq. (1) can be re-written to take into account the effect of the code rate R, i.e. the ratio between the number of information bits and the actual number of bits transmitted on the channel. If we let Es/N0 = R·Eb/N0 be the energy per transmitted (coded) symbol and W = 1/2T = Rs/2, then S/N = 2R·Eb/N0, and we get the capacity in bits per dimension as:

$$C = \frac{1}{2}\log_2\left(1 + \frac{2R\,E_b}{N_0}\right) \tag{4}$$

Figure 4 Minimum bit error rate as function of code rate (BER versus Eb/N0 [dB] for R = 0.1, 0.33, 0.5 and 0.75)

From the converse to the coding theorem one can derive the inequality [3]

$$R\,\bigl(1 - H_b(p)\bigr) \le C \tag{5}$$

where

$$H_b(p) = -p\log_2 p - (1 - p)\log_2(1 - p) \tag{6}$$

is the binary entropy function and p is the bit error probability of the coded system. One can now calculate the minimum bit error rate for a specific code rate and Eb/N0; the results are shown in Figure 4. Furthermore, the Shannon limit for the different code rates can be written as:

$$\frac{E_b}{N_0} = \frac{2^{2R} - 1}{2R} \tag{7}$$

From Eq. (7) and Figure 4 we can see that the capacity limit using 1/2-rate coding is 0 dB.
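The limits in Eqs. (3) to (7) are easy to evaluate numerically. A minimal sketch, assuming the unconstrained AWGN model above and using our own (hypothetical) function names, reproduces the -1.6 dB wideband limit, the 0 dB limit for rate 1/2, and the kind of lower bound on the bit error rate plotted in Figure 4:

```python
import math

# Eq. (3) with W -> infinity: Eb/N0 -> ln 2
WIDEBAND_LIMIT_DB = 10 * math.log10(math.log(2))


def shannon_limit_db(rate: float) -> float:
    """Unconstrained limit of Eq. (7), Eb/N0 = (2^(2R) - 1) / (2R), in dB."""
    return 10 * math.log10((2 ** (2 * rate) - 1) / (2 * rate))


def binary_entropy(p: float) -> float:
    """Binary entropy Hb(p) of Eq. (6), in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def min_ber(rate: float, ebn0_db: float) -> float:
    """Smallest bit error probability p allowed by R(1 - Hb(p)) <= C, Eqs. (4)-(5)."""
    ebn0 = 10 ** (ebn0_db / 10)
    capacity = 0.5 * math.log2(1 + 2 * rate * ebn0)  # Eq. (4), bits per dimension
    if capacity >= rate:
        return 0.0  # at or above the Shannon limit, any BER is attainable in principle
    target = 1 - capacity / rate  # Eq. (5) requires Hb(p) >= target
    lo, hi = 0.0, 0.5  # Hb is increasing on [0, 0.5], so bisect on that interval
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if binary_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


print(f"wideband limit: {WIDEBAND_LIMIT_DB:.2f} dB")
for r in (0.1, 0.33, 0.5, 0.75):
    print(f"R = {r}: Shannon limit {shannon_limit_db(r):.2f} dB, "
          f"minimum BER at -1 dB: {min_ber(r, -1.0):.1e}")
```

For rate 3/4 the same formula gives about 0.86 dB. As R goes to zero, shannon_limit_db approaches the wideband limit, which is the link between Eq. (3) and Eq. (7).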

2.3 Capacity Limit with Constrained Input
The bounds calculated earlier in this document are based on unconstrained input, but most systems use some kind of constrained input, for instance BPSK, QPSK, 8-PSK or 16-QAM. The channel capacity for equiprobable binary constrained input (±1) [4] can be written as:

$$C = \frac{1}{2}\int_{-\infty}^{\infty} p(y\mid 1)\,\log_2\frac{p(y\mid 1)}{p(y)}\,dy + \frac{1}{2}\int_{-\infty}^{\infty} p(y\mid -1)\,\log_2\frac{p(y\mid -1)}{p(y)}\,dy \tag{8}$$

where p(y) = ½[p(y | 1) + p(y | -1)]. The probability density function of the received signal including noise, p(y | ±1), has mean ±1 and variance σ² = 1/(2·Es/N0). Adjusting for the code rate, we have Es/N0 = R·Eb/N0. We can now determine the capacity in bit/s/Hz from Eq. (8). This expression can easily be extended to any square QAM constellation, and the capacity limit for some of these constellations is shown in Figure 5.

If we assume QPSK and a code rate of 1/2, this gives a spectral efficiency of 1 bit/s/Hz. The capacity limit in this case is 0.2 dB, while the unconstrained limit from Figure 4 was 0 dB. We see that for higher order modulation schemes the equiprobable square QAM constellation will never reach the Shannon limit, and asymptotically this distance will be 1.53 dB. There are two ways to improve the performance: either by making the probability distribution more Gaussian, or by increasing the dimension of the signal constellation. This is also called shaping gain, and at high SNR this gain is separable from the coding gain. Methods exist that can easily obtain 1 dB of the theoretical gain.

Figure 5 Spectral efficiency as function of capacity limit, QAM (spectral efficiency [bit/s/Hz] versus capacity limit Eb/N0 [dB] for unconstrained input, BPSK, QPSK, 16-QAM and 64-QAM)

2.4 Capacity Limit as Function of Block Length
Finally, the actual code length used will limit the capacity. This limit can be found from the sphere packing bound, and the detailed formulas can be found in [5]. In Figure 6 we have plotted the capacity limit as function of block length for rate 1/2 and rate 3/4.

Note that the influence of the block length is the same for both code rates; the curves are only shifted upwards as the code rate is increased. In order to reach the capacity limit a rather long block length (10^6 information bits) is required, but such block lengths are impractical for most systems. Typical block lengths can be the ATM cell size (424 bits) or the MPEG packet size (1504 bits), and if we assume 1/2-rate coding this gives a capacity loss of 1.4 dB and 0.8 dB respectively.

Figure 6 Capacity limit as function of block length (information bits) (capacity limit [dB] versus block length N from 10^2 to 10^6 information bits, for R = 0.5 and R = 0.75 together with the corresponding limits)

2.5 How to Utilise the Limits

Code Design
Turbo codes have been constructed that come very close to the Shannon limit at moderate BER levels. These codes and other related codes are not algebraic codes, but depend on some parameters which are found by a trial-and-error process. When simulating a set of code parameters, we can easily determine whether a code is good or not by comparing the simulation results against the theoretical limits. Further in this article we will use the limits to compare different coding schemes, since not all codes are directly comparable to each other (different code rate, block length).

System Design Level
New codes like Turbo codes offer very large flexibility with respect to code rates and block lengths. Evaluating different coding techniques for a new system usually involves extensive, time-consuming simulations. Instead, the capacity limits presented above offer an easy way to get a first estimate of the performance.

3 Channel Coding for Radio Systems

3.1 Introduction
Since the radio channel is subject to various impairments like fading, interference and thermal noise, advanced channel coding is necessary for sufficient performance and range. In addition to the performance criteria, the coding scheme must also allow sufficient system flexibility. For satellite communication systems it might also be necessary to apply adaptive modulation and coding to account for varying quality of service requirements and channel variations. The coding scheme should then be able to provide variable error protection depending on the channel conditions. For fixed radio systems, reliability and extremely low error rates are vital, and the flexibility is not easily utilised.

3.2 Coding for Satellite Applications

3.2.1 Viterbi Decoding of Convolutional Codes
Convolutional codes have been very popular for forward error correction in satellite systems. For convolutional codes, dependence between the symbols is obtained by performing a convolution on the data symbols. With more than one such linear combination, redundancy is added, and the code can correct errors. Figure 7 shows a feed-forward convolutional encoder with constraint length K and code rate R = 1/n, or R = 1/(n+1) if the systematic2) encoder switch is closed. Which shift register contents to add (modulo 2) is decided by the generator polynomials, gi. The outputs are then multiplexed onto the output line.

Figure 7 Convolutional feed-forward encoder (shift register stages 1, 2, 3, ..., K-2, K-1, K; generator taps g0 to gn)

A nice feature of convolutional codes is the existence of a maximum likelihood sequence decoding algorithm, originally proposed by Viterbi in 1967 [6] and now known as the Viterbi algorithm. The constraint length, K, of a convolutional code is commonly defined as the number of encoder memory (or delay) elements plus one3). Since the complexity of the Viterbi algorithm increases exponentially with the memory of the encoder, we normally have to choose constraint lengths around 10 or lower to be able to decode with the Viterbi algorithm. The code rate, R, is the relation between the number of information bits, k, and code bits, n, transmitted, such that R = k/n. Rate k/n codes where k is different from one can be constructed by applying multiple shift registers. A problem with this approach is that the number of trellis states becomes 2^(k(K-1)). Instead, high rate convolutional codes can be obtained by puncturing. Puncturing means that we delete some code bits of a lower rate 1/n code to obtain a code with a higher rate. For these codes the decoding complexity is essentially the same as for a rate 1/n code. Results have shown that the performance loss is also quite low when constructing families of multi-rate and rate-compatible codes by puncturing [7], [8]. An important parameter for good performance of convolutional codes is the free distance, df. The free distance is the distance (or weight) of the path with the shortest distance from the correct path in the trellis.

A convolutional code that has become almost a standard for satellite communications is the rate R = 1/2, constraint length K = 7 convolutional code with generator polynomials 133 and 171 (in octal notation). It is for instance used in a number of Inmarsat systems, and in the Digital Video Broadcasting (DVB) satellite system. The free distance of this code is 10. This is the highest possible free distance of a rate 1/2 feed-forward convolutional code with constraint length 7. This code also turns out to be optimum with regard to a more sophisticated criterion [9]. In Table 1 we give the required Eb/N0 to achieve a bit error rate of 10^-6 for this constraint length K = 7 code. Observe that although the code has a rather short constraint length, the performance is less than 5 dB away from the theoretical limit on a BPSK/QPSK channel.

2) When the encoder is systematic, the information sequence appears directly in the coded sequence.
3) Other definitions also exist in the literature.

Table 1 Performance of constraint length 7 convolutional code with Viterbi decoding, compared to capacity limits

Code rate | Eb/N0 to achieve performance at 10^-6 | Capacity for constrained binary input | Capacity limit, block length | Loss compared to constrained input limit
1/2 | 5.0 dB | 0.2 dB | Not applicable | 4.8 dB
3/4 | 6.0 dB | 1.6 dB | Not applicable | 4.4 dB

3.2.2 Sequential Decoding of Convolutional Codes
As described above, the complexity of the Viterbi decoding algorithm increases exponentially with the constraint length. At the same time, the error probability of the convolutional code decreases exponentially with the constraint length. To have sufficiently low bit error rates at low signal-to-noise ratios, we might have to use a constraint length that makes Viterbi decoding impractical. A possible solution is then to use sequential decoding, see for instance [10] or [11] for details. Sequential decoding is a sub-optimal decoding algorithm, but since the decoding complexity increases linearly with the constraint length instead of exponentially, we can always choose a constraint length that is sufficiently high to meet our BER requirements even if the decoding is sub-optimal.

The basic idea of sequential decoding is to search the code tree sequentially, going along the branch which gave the best metric increment. When the channel is good, this is most likely the path that would also have been chosen by the Viterbi algorithm. When the channel is noisier, we will from time to time make an incorrect local decision in our search. This will usually be realised quite soon, since the accumulated metric will then be bad. The decoder will thus back up and try alternative paths which locally gave a worse metric. Obviously, when the channel is very noisy there will be a lot of backward and forward searches before the correct path is found. Therefore, the decoding time will vary with the channel conditions and is thus a random variable. To cope with this varying decoding complexity, large buffers are required to store data in periods of intense search activity. With finite buffer sizes there is always a certain probability that the decoder has not finished decoding before the buffers are full (buffer overflow situation). The decoding must then be stopped. This is normally the most critical event with sequential decoding. Since the undetected error probability can be made sufficiently low by choosing a high constraint length, the overflow rate will dominate the total error rate. For sequential decoding the speed of the decoder will therefore influence the system performance. A better implementation, or faster hardware or signal processor, will improve the error rate, since the decoder has time for more search and buffer overflows become rarer. To be able to recover quickly from an overflow situation, systematic encoders are often used with sequential decoding. It can be shown that if a channel/modulation parameter called the Pareto exponent is above one, ρ > 1, then the mean value of the number of computations required will be bounded. In this case a sufficiently large constraint length can be chosen to have negligible error rate.

"Inmarsat B High speed data" is a mobile/portable satellite service launched some years ago. This system uses a rate 1/2 systematic convolutional code with constraint length K = 36. Due to the very high constraint length, Viterbi decoding is impossible and sequential decoding is applied. The generator polynomial of the encoder is 714461625313 (octal notation). This code is an optimum distance profile (ODP) convolutional code [12], which makes it well suited for sequential decoding.

There has also been some work on sequential decoding for Rayleigh fading channels [13], and for such channels a higher signal-to-noise ratio is needed for the Pareto exponent to be one.

Table 2 Performance of sequential decoding, compared to capacity limits

Code rate | Eb/N0 where Pareto exponent is equal to 1 | Capacity for constrained binary input | Capacity limit, block length | Loss compared to constrained input limit
1/2 | 2.2 dB | 0.2 dB | Not applicable | 2.0 dB
3/4 | 3.7 dB | 1.6 dB | Not applicable | 2.1 dB
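The "Capacity for constrained binary input" entries in Tables 1 and 2 (0.2 dB at rate 1/2, 1.6 dB at rate 3/4) follow from Eq. (8). A minimal sketch, assuming the model of Section 2.3 (received mean ±1, σ² = 1/(2·Es/N0), Es/N0 = R·Eb/N0), a plain trapezoidal rule over ±12σ, and our own function names, finds the smallest Eb/N0 at which the binary-input capacity equals the code rate:

```python
import math


def bpsk_capacity(es_n0: float, n: int = 2000) -> float:
    """Capacity (bit per channel use) of equiprobable +/-1 input on AWGN, Eq. (8).

    es_n0 is linear; the noise variance is sigma^2 = 1 / (2 * Es/N0).
    By symmetry the two integrals in Eq. (8) are equal, which gives
    C = 1 - E[log2(1 + exp(-2y / sigma^2))] with y ~ N(+1, sigma^2).
    """
    sigma2 = 1.0 / (2.0 * es_n0)
    sigma = math.sqrt(sigma2)
    lo, hi = 1.0 - 12.0 * sigma, 1.0 + 12.0 * sigma  # +/-12 sigma covers the Gaussian
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):  # trapezoidal rule
        y = lo + i * h
        pdf = math.exp(-((y - 1.0) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
        t = -2.0 * y / sigma2
        # log2(1 + e^t), written so that exp() cannot overflow
        penalty = (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / math.log(2.0)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * pdf * penalty
    return 1.0 - total * h


def bpsk_limit_db(rate: float) -> float:
    """Smallest Eb/N0 (dB) with bpsk_capacity(R * Eb/N0) >= R, found by bisection."""
    lo, hi = -2.0, 10.0  # the capacity is increasing in Eb/N0 on this search range
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if bpsk_capacity(rate * 10 ** (mid / 10)) < rate:
            lo = mid
        else:
            hi = mid
    return hi


# Measured Viterbi results for the K = 7 code, taken from Table 1
for rate, viterbi_db in [(0.5, 5.0), (0.75, 6.0)]:
    limit = bpsk_limit_db(rate)
    print(f"R = {rate}: limit {limit:.2f} dB, Viterbi K = 7 loss {viterbi_db - limit:.1f} dB")
```

It should return roughly 0.2 dB and 1.6 dB, and subtracting these from the measured values reproduces the loss columns of Tables 1 and 2 (for example 5.0 - 0.2 = 4.8 dB). Since QPSK is two independent BPSK dimensions, the rate 1/2 result is also the 0.2 dB QPSK limit quoted in Section 2.3.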

3.2.3 Concatenated Coding
Concatenated coding is a combination of an inner code and an outer code, where the inner code should work well at low signal-to-noise ratios using maximum likelihood decoding (soft decoding), and the outer code is a strong algebraic code with low redundancy. A much used combination is a convolutional code (K = 7) with Viterbi decoding and a Reed-Solomon outer code. The Reed-Solomon code is non-binary and typically works on 8-bit symbols. The errors from the Viterbi decoder are bursty, and in order to combat these errors a byte interleaver is used to split up the long error bursts. However, for packet transmission and time critical services interleaving cannot be used.

This coding scheme was for a long period the "state-of-the-art" in communication systems; it has been used in deep-space communication ("Voyager" [14]) and is used in the DVB satellite system. The Voyager code uses a (255,223) Reed-Solomon code and DVB-RCS a (204,188) Reed-Solomon code [15]. In the DVB-RCS system one is limited to MPEG packets and cannot use interleaving, while for Voyager one could interleave up to 8 Reed-Solomon blocks. The performance is shown in Table 3.

Table 3 Performance of Concatenated Codes (Convolutional and Reed-Solomon)

Code | Code rate | Eb/N0 to achieve BER of 10^-5 | Capacity for constrained input | Capacity limit, block length4) | Loss at 10^-5
Voyager | 0.44 | 2.5 dB | -0.1 dB | 0.2 dB | 2.3 dB
DVB-RCS | 0.46 | 3.5 dB | 0.0 dB | 0.8 dB | 2.7 dB

4) The block length for the Voyager code is based on 8 interleaved Reed-Solomon blocks, while the DVB-RCS code does not use interleaving.

3.2.4 Turbo Codes
Berrou et al. [16] introduced a new class of error correcting codes, called Turbo Codes. This coding technique is based on Parallel Concatenated Convolutional (PCC) codes, where two parallel convolutional codes are separated by an interleaver, see Figure 8. The component encoders are RSC (Recursive Systematic Convolutional) codes. These encoders can be very short. Probably the most important element in the code is the interleaver, which defines the minimum distance. Designing the interleaver is critical for the performance, and the interleaver should ideally be random or pseudo-random [17]. The Turbo code can easily be applied to any block size and code rate [16], [18]. However, one must be especially careful with the interleaver and puncturing map for high code rates [19].

Figure 8 The original Turbo Code as presented by Berrou (two recursive convolutional encoders built from delay elements T, the second fed through a pseudo-random interleaver, followed by puncturing)

The decoding is iterative and based on soft-input and soft-output. To achieve this, the Turbo decoding algorithm may apply the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [20], or the less optimal soft-output Viterbi algorithm (SOVA) [21]. A simplification of the BCJR algorithm has been shown to achieve very good results with very little performance loss [22]; however, several simplified versions exist. The actual complexity depends on the component codes and the number of iterations required. The Turbo coding/decoding technique has achieved results very close to the Shannon limit.

A lot of work has been done on PCC codes, and they have been proposed for a wide range of standards (EDGE, UMTS, DVB-RCS, CCSDS and several more). Commercial implementations can be
bought, even though most of them are based on Field-Programmable Gate Array (FPGA) solutions, like the Nera Turbo decoder [23] and others [24], [25].

Nera uses both a variant of the original Berrou code and a similar but somewhat less complex Turbo code in its satellite products. The latter is used in the DVB Return Channel System (DVB-RCS) [15], [26]. In Table 4 the performance of these two codes is given for MPEG block sizes (188 bytes). We see that both codes are within 1 dB of the capacity limit.

Table 4 Performance of the original Turbo and DVB-RCS code

Code | Code rate | Eb/N0 to achieve BER of 10^-6 | Capacity for constrained input | Capacity limit, block length5) | Loss at 10^-6
Berrou code (variant) | 1/2 | 1.6 dB | 0.2 dB | 1.0 dB | 0.6 dB
DVB-RCS Code | 1/2 | 1.8 dB | 0.2 dB | 1.0 dB | 0.8 dB

5) The capacity limit of 1/2-rate coding with binary constrained input is 0.2 dB, but including the 0.8 dB loss due to the limited block size (MPEG) we get 1.0 dB.

The disadvantage of the Turbo code is that one might get a flattening of the bit error curve. However, with a good interleaver design this flattening effect can be mitigated so that it only occurs at very low error rates. The inventors have patented both the Turbo encoding scheme and different decoders.

3.3 Comparison of Coding Techniques
In Figure 9 we compare convolutional coding and concatenated coding with Turbo coding, together with some coding limits. The comparison is done for the MPEG packet size and approximately 1/2-rate coding. We see that all codes achieve a very good coding gain, but the Turbo code gives an additional 2 dB compared to the more traditional coding techniques. Simulation results for sequential decoding were not available, but results from [13] for a Rayleigh fading channel and a constraint length K = 36 code show that BER = 10^-6 is achieved approximately 1 dB from the theoretical sequential limit. The difference is probably smaller for an AWGN channel. The sequential limit that is plotted is for 1/2-rate coding, corresponding to a Pareto exponent ρ = 1.

Figure 9 Comparison of coding techniques (BER from 10^-1 to 10^-7 versus Eb/N0 [dB] from 0 to 14, for the capacity limit, the sequential limit, uncoded transmission, convolutional coding, concatenated coding and the DVB-RCS Turbo code)

6) The distance to the Shannon limit is still higher for higher order modulation than for binary coding. Some improvement is expected from spectral shaping.
7) We have used the term effective block length to indicate that it is not necessarily a block code that is applied; for trellis codes the "block length" is not easily defined.

3.4 Coding for Systems with High Spectral Efficiency

3.4.1 System Requirements
The most effective transmission medium, both in terms of reliability and transmission speed, is optical fibre. However, in many cases it is not possible or cost effective to apply fibre cables. The obvious, and often the only, alternative is then radio. When the high data rates that are normally transmitted on fibre are to be transmitted via a radio system, we need modulation schemes with very high spectral efficiency, and coding schemes that can provide virtually error-free communication. The high spectral efficiency requirement means that the code rate must be rather high, otherwise the modulation alphabet must be extremely large. Virtually error-free communication means that the bit error rate must be below 10^-12. In the last few years coding schemes have been found that are quite close to the Shannon limit6). However, these schemes often also have rather large block lengths. Due to attenuation, fading and atmospheric disturbances, the range of such a radio transmission scheme is limited to between 50 and 150 km, where the specific range depends on the climatic conditions of the installation site, the topography and the carrier frequency that is applied. At high carrier frequencies the attenuation will be the limiting factor, while at lower frequencies multipath propagation may be the most critical. Therefore, in order to connect two
Telektronikk 1.2002 85 Table 5 Necessary spectral Channel Practical Spectral Minimum modulation efficiency STM-1 in different Spacing Symbol Rate Efficiency alphabet (QAM) channels 28 MHz 24 6.5 128-QAM

40 MHz 33 4.7 32-QAM

56 MHz 47 3.3 16-QAM

distant locations, we need several radio hops. Spectral Modulation Minimum Thus, to avoid a much higher delay than what efficiency alphabet Code rate is present for the fibre system, we also need to 3.3 16QAM 0.83 limit the processing delay for the radio transmit- ter and receiver. This puts strong requirements 3.3 32QAM 0.66 on the allowed effective block length7) of the 3.3 64QAM 0.55 coding scheme.

4.7 32QAM 0.94 Fixed radio systems are normally connected to 4.7 64QAM 0.78 the backbone network and must therefore at 4.7 256QAM 0.58 least offer STM-1 rates (155.52 Mbit/s). In most European countries these links must follow a 6.5 64QAM Not possible frequency plan with a channel spacing of 28 6.5 128QAM 0.93 MHz, 40 MHz or 56 MHz. In Table 5 we give the necessary spectral efficiencies in these cases. 6.5 256QAM 0.81 It would be beneficial to offer two STM-1 carri- 6.5 1024QAM 0.65 ers in one channel or even to go to the higher STM-4 rate (622.02 Mbit/s), and to offer as high capacity and spectral efficiency as possible. Table 6 Necessary minimum code rate for a given spectral efficiency and However, implementation issues limit the possi- modulation type ble spectral efficiency upward to around 8 bits/s/Hz.

Table 6 summarises the minimum code rates that can be applied to achieve the spectral efficiencies of Table 5 with a given modulation method. We see that if a code rate close to 1/2 is to be used, we need a very large modulation alphabet to achieve a sufficiently high spectral efficiency. Such large constellations will have other impacts on the system, such as synchronisation problems and non-linearity effects in the power amplifier and RF front end. Although we may obtain a spectral efficiency of 6.5 with a code rate of 1/2 using 8192QAM (13 coded bits per symbol), such a radio link would be extremely expensive or difficult to implement with existing technology.

Figure 10 Trellis Coded Modulation (TCM) and mapping of bits to symbols (block diagram: a rate m/(m+1) convolutional encoder turns the input bits X1 ... Xm into Z0 ... Zm, which select a subset; the uncoded bits Xm+1 ... Xn select the signal from that subset, and the mapping produces the transmitted symbol ai)
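As a quick cross-check of Tables 5 and 6, the short Python sketch below recomputes the required spectral efficiency (155.52 Mbit/s divided by the symbol rate, assuming the symbol rates in Table 5 are in Mbaud), the smallest QAM alphabet that can carry it uncoded, and the minimum code rate R = efficiency / log2(M). The helper names are hypothetical; the sketch only reproduces the arithmetic behind the tables.

```python
import math
from typing import Optional

STM1_MBIT_S = 155.52  # STM-1 line rate in Mbit/s


def spectral_efficiency(symbol_rate_mbaud: float, bit_rate: float = STM1_MBIT_S) -> float:
    """Information bits each transmitted symbol must carry (bits/s/Hz at Nyquist signalling)."""
    return bit_rate / symbol_rate_mbaud


def min_qam_order(efficiency: float) -> int:
    """Smallest power-of-two alphabet whose uncoded bits per symbol reach the efficiency."""
    return 2 ** math.ceil(efficiency)


def min_code_rate(efficiency: float, qam_order: int) -> Optional[float]:
    """Minimum code rate R = efficiency / log2(M); None if even R = 1 is not enough."""
    rate = efficiency / math.log2(qam_order)
    return rate if rate <= 1.0 else None


if __name__ == "__main__":
    # Channel spacing (MHz) -> practical symbol rate (Mbaud), as listed in Table 5
    for spacing, baud in [(28, 24), (40, 33), (56, 47)]:
        eff = spectral_efficiency(baud)
        print(f"{spacing} MHz: {eff:.2f} bit/symbol, at least {min_qam_order(eff)}-QAM")
    # Rate 1/2 at 6.5 bit/symbol needs 13 bits per symbol (8192QAM); 64QAM cannot reach 6.5
    print(min_code_rate(6.5, 8192), min_code_rate(6.5, 64))
```

Run as a script, it reports about 6.48, 4.71 and 3.31 bit/symbol with minimum alphabets of 128-, 32- and 16-QAM, matching Table 5, and confirms that a rate-1/2 code at 6.5 bit/symbol needs 8192QAM while 64QAM cannot reach 6.5 bit/symbol at any code rate.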

Table 7 Performance parameters for some specific Nera TCM schemes implemented in radio relay systems; XD means X-dimensional

| Parameter | Code rate | Overall spectral efficiency | Measured Eb/N0 at BER = 10^-6 | Loss compared to unconstrained capacity limit |
|---|---|---|---|---|
| 2D TCM, 32QAM | 0.80 | 4 | 11.7 dB | 6.0 dB |
| 2D TCM, 64QAM | 0.83 | 5 | 14.0 dB | 6.1 dB 8) |
| 4D TCM, 128QAM | 0.93 | 6.5 | 15.7 dB | 4.3 dB |

8) Compared against the constrained input the loss is 4.8 dB.
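The "loss compared to unconstrained capacity limit" column in Table 7 can be reproduced from the Shannon bound for the AWGN channel: at spectral efficiency η (bits/s/Hz), reliable transmission requires Eb/N0 ≥ (2^η − 1)/η. A minimal Python sketch, assuming η is the overall spectral efficiency column of Table 7 (the names are hypothetical):

```python
import math


def shannon_limit_ebn0_db(eta: float) -> float:
    """Smallest Eb/N0 (dB) for reliable transmission at spectral efficiency eta
    (bits/s/Hz) on the AWGN channel with unconstrained (Gaussian) input,
    from eta <= log2(1 + eta * Eb/N0), i.e. Eb/N0 >= (2**eta - 1) / eta."""
    return 10.0 * math.log10((2.0 ** eta - 1.0) / eta)


# (scheme, overall spectral efficiency, measured Eb/N0 in dB at BER = 1e-6), from Table 7
TABLE_7 = [
    ("2D TCM, 32QAM", 4.0, 11.7),
    ("2D TCM, 64QAM", 5.0, 14.0),
    ("4D TCM, 128QAM", 6.5, 15.7),
]

if __name__ == "__main__":
    for name, eta, measured_db in TABLE_7:
        limit_db = shannon_limit_ebn0_db(eta)
        print(f"{name}: limit {limit_db:.2f} dB, loss {measured_db - limit_db:.1f} dB")
```

The computed limits are about 5.74, 7.92 and 11.39 dB, giving losses of 6.0, 6.1 and 4.3 dB, in agreement with the table. The constrained-input figure in footnote 8 would instead require the capacity of the actual 64QAM alphabet, which this sketch does not compute.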

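Figure 11 Serial Concatenated Convolutional code, ten Brink (block diagram: outer repetition code, pseudo-random interleaver, recursive inner encoder with delay elements T producing the parity bits, and a doping switch that occasionally passes systematic bits instead of parity bits)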
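To make the structure of Figure 11 concrete, here is a minimal Python sketch of a ten Brink style encoder as described in Section 3.5.1: an outer rate-1/2 repetition code, a pseudo-random interleaver, a rate-1 recursive inner code and a doping switch that replaces one parity bit in every 100 by a systematic bit. As an assumption for brevity, the inner code is the simple accumulator 1/(1+D) rather than the longer-memory recursive code drawn in the figure, and the "systematic" bit is taken to be the inner encoder's input bit; all function names are hypothetical.

```python
import random
from typing import List, Sequence


def repeat_encode(bits: Sequence[int]) -> List[int]:
    """Outer rate-1/2 repetition code: every information bit is sent twice."""
    return [b for b in bits for _ in range(2)]


def accumulate(bits: Sequence[int]) -> List[int]:
    """Rate-1 recursive code 1/(1+D): running modulo-2 sum (memory 1).
    Hypothetical stand-in for the inner recursive code of Figure 11."""
    out, state = [], 0
    for b in bits:
        state ^= b
        out.append(state)
    return out


def sccc_encode(info: Sequence[int], perm: Sequence[int], doping_period: int = 100) -> List[int]:
    """Repetition code -> interleaver -> rate-1 recursive code, with systematic doping:
    every doping_period-th parity bit is replaced by the inner encoder's input bit."""
    repeated = repeat_encode(info)
    interleaved = [repeated[p] for p in perm]
    parity = accumulate(interleaved)
    return [interleaved[i] if i % doping_period == 0 else parity[i]
            for i in range(len(parity))]


if __name__ == "__main__":
    rng = random.Random(0)
    info = [rng.randint(0, 1) for _ in range(500)]
    perm = list(range(2 * len(info)))
    rng.shuffle(perm)  # pseudo-random interleaver
    code = sccc_encode(info, perm)
    print(len(info), len(code))  # 500 information bits -> 1000 code bits
```

The output has exactly twice as many bits as the input, so the overall rate is 1/2; the iterative decoder that the doping is meant to "kick-start" is not shown.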

3.4.2 Trellis Coded Modulation
Since Ungerboeck published the idea of trellis coded modulation (TCM) [28], combining the coding and modulation operations, TCM has become the obvious solution for spectrally efficient systems like radio relay systems. The codes normally used with TCM are rather short, and thus the delay does not lead to any serious problems. However, the mapping to the constellation points is a critical operation in order to achieve good performance. Figure 10 shows the basic idea of trellis coded modulation, including the mapping operation to constellation symbols. Table 7 presents some performance results for TCM schemes used with radio links.

3.5 Promising New Coding Schemes
The coding schemes described so far have been applied for some time and are in practical use in various systems or prototypes. In this section we describe coding schemes that have so far mostly been subject to research, but are promising candidates with potential for future applications. The first is a serial concatenation of convolutional codes with iterative decoding. Then we look at a class of Turbo codes that are based on product codes (see for instance [29]) and decoded iteratively; these codes have therefore been named Turbo Product Codes. The third method is a type of code first proposed by Gallager as early as 1963 [30], known as Low Density Parity Check (LDPC) codes.

3.5.1 Serial Concatenated Convolutional Codes
Serial Concatenated Convolutional Codes (SCCC) were introduced by Benedetto et al. [31] as an alternative to Turbo Codes, based on serial concatenation. Basically, the algorithms and much of the theory used for PCC can be applied to these codes. Note, however, that the interleaver and the component codes are different. According to [31] the outer code should be a maximum free distance, non-systematic convolutional code (NSC), while the inner code should be recursive. However, a somewhat different design, proposed by ten Brink [32], is to use a simple repetition code (R = 1/2) as the outer code and a rate-1 recursive convolutional code as the inner code. A switch is used to replace some of the parity bits with systematic bits at a very low rate (1:100). This doping is necessary to "kick-start" the iterative decoder. The code has very low complexity, but still gets within 0.1 dB of the Shannon limit at a BER of 10^-5 for very long block lengths.

Bit error rate (BER) performance results have shown these codes to be asymptotically better than PCC. There are few or no basic patents, due to the many publications, especially by Benedetto. This scheme converges more slowly than PCC, which might stem from the fact that the inner and outer decoders work at different signal-to-noise ratios. Good high code rate performance seems harder to achieve. It is unclear whether this is a fundamental problem or just a lack of good codes.

3.5.2 Turbo Product Codes
The idea of Turbo Product Codes (TPC) is to apply iterative (or Turbo) decoding to the decoding of two concatenated block codes, known as product codes (see for instance [29]). In Figure 12 we show the fundamental idea of product codes. A number of code words are generated using the first component code (the row vectors), and then encoding is performed "vertically" by the second component code. Observe that we also encode the parity symbols, creating parity on the parity. The two codes that are applied may be different or the same. Typically BCH codes, Hamming codes and/or parity check codes are used. We may also apply a third code, requiring a cube for illustration. To be able to perform iterative decoding, we need soft decisions both in and out of the individual decoding processes. Details of this decoder can be found in the paper [33] by Pyndiah, who first published the idea of Turbo Product Codes. These codes have since gained much interest due to some attractive properties. Block codes typically perform better than convolutional codes for high code rates, so Turbo Product Codes seem to perform better than the convolutional code based Turbo codes for high code rates. Therefore, we consider Turbo Product Codes to be a strong candidate for replacing TCM in the high spectral efficiency radio systems described above.

9) This feature is sometimes erroneously referred to as an error floor. We prefer to call it a levelling out feature since it is certainly not a floor, and the decrease in error rate is still better than for the uncoded system.
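The row-then-column encoding of Section 3.5.2 (Figure 12) can be sketched in a few lines of Python. For brevity, both component codes here are single-parity-check codes, so the array gets one parity column and one parity row rather than the two parity columns of Figure 12; any systematic linear block code (for example BCH or Hamming) could be passed in as row_code or col_code instead. The function names are hypothetical.

```python
from typing import Callable, List

Bits = List[int]


def spc_encode(data: Bits) -> Bits:
    """Single-parity-check (k+1, k) component code: append the modulo-2 sum of the data."""
    return data + [sum(data) % 2]


def product_encode(data_rows: List[Bits],
                   row_code: Callable[[Bits], Bits] = spc_encode,
                   col_code: Callable[[Bits], Bits] = spc_encode) -> List[Bits]:
    """Encode every data row with row_code, then encode every column of the result,
    parity columns included, with col_code ("parity on the parity")."""
    rows = [row_code(list(r)) for r in data_rows]
    columns = [col_code([row[j] for row in rows]) for j in range(len(rows[0]))]
    # Transpose the encoded columns back into the rows of the n2 x n1 code array
    return [[col[i] for col in columns] for i in range(len(columns[0]))]


def product_code_rate(k1: int, n1: int, k2: int, n2: int) -> float:
    """Overall rate of a two-dimensional product code, R = (k1 * k2) / (n1 * n2)."""
    return (k1 * k2) / (n1 * n2)


if __name__ == "__main__":
    data = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 0], [1, 1, 1, 0, 1]]  # 3 x 5 data block
    for row in product_encode(data):
        print(row)
    print(product_code_rate(5, 6, 3, 4))  # 15 / 24 = 0.625
```

For a 3 x 5 data block this yields a 4 x 6 code array in which every row and every column has even parity, including the parity-on-parity corner, and the rate is the product of the component code rates, (5 x 3)/(6 x 4) = 0.625.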

Figure 12 The principle of product codes: parity symbols are calculated for a data vector, and then vertically across the previously calculated code words with the second code

d11 d12 d13 d14 d15 P16 P17
d21 d22 d23 d24 d25 P26 P27
d31 d32 d33 d34 d35 P36 P37
P41 P42 P43 P44 P45 P46 P47

Another advantage is that the levelling out effect9) of the bit error rate curve that has been observed for other Turbo codes seems less dramatic for the product codes than for the convolutional codes. There are indications that the levelling out effect can be made negligible for many applications by some smart modifications [34]. For the PCC codes this effect is to some extent controlled or reduced by careful interleaver design, improving the minimum distance of the code.

As described, product codes consist of two or more concatenated block codes as component codes. Therefore, Turbo Product Codes have the same restriction on the relation between code rate and block length as their component codes. Thus, we cannot get an arbitrary code rate matching any desired block length. The total block length of the product code is the product of the block lengths of the component codes. This also means that the code rate is given by

$$R = \frac{k_1 k_2}{n_1 n_2},$$

where $k_i$ is the number of information bits encoded into a block with code i, and $n_i$ is the total number of encoded bits in code i. This lack of flexibility may be a rather undesired feature of TPC for many applications.

The authors are not aware of any commercial radio systems in operation today using Turbo Product Codes. However, these codes were seriously considered for several satellite systems that were standardised or planned recently. TPC is also an option in the IEEE 802.16 standard for wireless metropolitan area networks that was ratified in December 2001. Since these codes now start to appear in standards, there are also chip manufacturers that have TPC ASICs under development.

Figure 13 shows the performance of TPC with 64QAM modulation plotted together with 64TCM. The spectral efficiency of the 64TCM scheme is 5, while the spectral efficiency of the 64QAM modulated TPC is 4.7 (the code rate used is R = 0.779, and two BCH (64,57) component codes are used). Observe that the spectral efficiency is slightly lower for the TPC scheme, but it also significantly outperforms the TCM scheme. Because of the length of the TPC code, the delay is also somewhat higher than for TCM.

Figure 13 Turbo Product Codes with 64QAM compared to 64TCM (BER versus Eb/N0 [dB]; curves: capacity limit, TPC and 64QAM, 64-TCM)

3.5.3 Low Density Parity Check Codes
Low Density Parity Check (LDPC) codes are block codes that were first presented by Gallager [30], and are also referred to as Gallager codes. These codes were "forgotten" for many years until they were rediscovered by MacKay [35]. LDPC codes probably hold the world record in coding gain, with a performance only 0.005 dB from the Shannon limit [36] (using extremely long code words). These codes have a very sparse parity check matrix, H. A sparse parity check matrix means that the number of ones is very low compared to the number of zeros. This is why the codes are referred to as low density parity check (LDPC) codes. The parity check matrix is constructed in a semi-random manner, placing t ones in each column. Research results [35] indicate that it is advantageous to have an overlap of at most one (the overlap constraint) between the ones of any two columns10). In addition we try to achieve as uniform a row weight as possible. A parity check matrix constructed as described above is then transformed into systematic form, and the generator matrix G is easily obtained and used for encoding. For a given G the encoding process is exactly as for any block code.
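The construction just described can be illustrated with a small Python sketch: random columns of weight t are drawn, candidates violating the overlap constraint are rejected, and among the valid candidates the one touching the lightest rows is kept so that the row weights stay nearly uniform. Instead of explicitly forming the generator matrix G, the sketch reduces H to row echelon form over GF(2) and solves for the parity bits, which gives an equivalent systematic encoding. The sizes, the greedy selection rule and all names are assumptions for illustration; this is not MacKay's exact construction.

```python
import random
from typing import List, Tuple

Matrix = List[List[int]]


def make_sparse_h(m: int, n: int, t: int, seed: int = 1, draws: int = 1000) -> Matrix:
    """Semi-random m x n parity check matrix with exactly t ones per column and no two
    columns sharing more than one row (overlap constraint); rows kept nearly uniform."""
    rng = random.Random(seed)
    row_weight = [0] * m
    columns: List[frozenset] = []
    for _ in range(n):
        best = None
        for _ in range(draws):
            cand = frozenset(rng.sample(range(m), t))
            if all(len(cand & c) <= 1 for c in columns):
                cost = sum(row_weight[r] for r in cand)  # prefer lightly used rows
                if best is None or cost < best[0]:
                    best = (cost, cand)
        if best is None:
            raise ValueError("overlap constraint not met; try a larger m or another seed")
        columns.append(best[1])
        for r in best[1]:
            row_weight[r] += 1
    return [[1 if r in c else 0 for c in columns] for r in range(m)]


def rref_gf2(h: Matrix) -> Tuple[Matrix, List[int]]:
    """Reduced row echelon form of h over GF(2): (independent rows, pivot columns)."""
    a = [row[:] for row in h]
    pivots: List[int] = []
    r = 0
    for c in range(len(a[0])):
        p = next((i for i in range(r, len(a)) if a[i][c]), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        for i in range(len(a)):
            if i != r and a[i][c]:
                a[i] = [x ^ y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == len(a):
            break
    return a[:r], pivots


def ldpc_encode(h: Matrix, info: List[int]) -> List[int]:
    """Systematic encoding: information bits go to the non-pivot positions, and each
    pivot (parity) bit is solved from its reduced row so that H c^T = 0."""
    reduced, pivots = rref_gf2(h)
    free = [c for c in range(len(h[0])) if c not in pivots]
    if len(info) != len(free):
        raise ValueError(f"expected {len(free)} information bits")
    word = [0] * len(h[0])
    for c, bit in zip(free, info):
        word[c] = bit
    for row, p in zip(reduced, pivots):
        word[p] = sum(row[c] & word[c] for c in free) % 2
    return word


def syndrome(h: Matrix, word: List[int]) -> List[int]:
    """s = H c^T over GF(2)."""
    return [sum(a & b for a, b in zip(row, word)) % 2 for row in h]


if __name__ == "__main__":
    h = make_sparse_h(16, 20, 3)
    _, pivots = rref_gf2(h)
    k = len(h[0]) - len(pivots)
    info = [1, 0] * (k // 2) + [1] * (k % 2)
    word = ldpc_encode(h, info)
    print(k, syndrome(h, word))  # the syndrome should be all zeros
```

Here make_sparse_h(16, 20, 3) gives only a toy matrix; practical codes, such as those in Figure 14, have hundreds or thousands of columns, but the constraints and the encoding step are the same.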

10) This is to avoid short cycles in the graph, which are bad for the decoder performance.

Gallager codes where the columns of the parity check matrix have the same (uniform) weight are called regular Gallager codes. Similarly, irregular Gallager codes have parity check matrices with non-uniform column weights. There are indications that irregular Gallager codes have better performance than regular Gallager codes [35]. Gallager codes may also be constructed over GF(q). Such non-binary codes may be better than the irregular Gallager codes over GF(2) [35].

Gallager codes are decoded iteratively by a sum-product algorithm. After each iteration the syndrome is calculated, and a new iteration is performed if the syndrome is not zero. When the syndrome is zero, we have found a valid code word; the decoding is then terminated and the information bits are released. If the syndrome is still not zero after some predefined maximum number of iterations, the decoding is halted and the information part of the current code word estimate is released. The minimum distance of Gallager codes is high, and, except for very short block lengths, errors are normally only observed when the decoding is terminated because the maximum number of iterations was reached. Thus, Gallager codes have a built-in error detection mechanism that can make them promising candidates for packet data communication with retransmission (hybrid ARQ). Gallager codes need many more iterations in the decoding than the Turbo codes, but the complexity of each iteration is much lower.

Figure 14 Performance of LDPC codes with various packet sizes (MPEG and ATM packet sizes) and various code rates. Results are shown for regular LDPC codes, with t denoting the column weight of the parity check matrix (bit error rate versus Eb/N0 [dB]; curves, all with 100 iterations: N=848, R=1/2, t=3; N=530, R=4/5, t=5; N=3008, R=1/2, t=3; N=1880, R=4/5, t=5; N=477, R=8/9, t=5)
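The stopping rule described above (a syndrome check after every iteration, a maximum number of iterations, and a flag telling whether a valid code word was found) can be sketched in Python as follows. For brevity this swaps the sum-product update for Gallager's much simpler hard-decision bit-flipping rule, which flips the bits involved in the largest number of failed parity checks; only the control flow is meant to match the text, and the names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def syndrome(h: Matrix, word: List[int]) -> List[int]:
    """s = H c^T over GF(2); all zeros means word is a valid code word."""
    return [sum(a & b for a, b in zip(row, word)) % 2 for row in h]


def bit_flip_decode(h: Matrix, received: List[int],
                    max_iter: int = 50) -> Tuple[List[int], bool, int]:
    """Iterate until the syndrome is zero or max_iter is reached.
    Returns (code word estimate, valid code word found?, iterations used)."""
    word = list(received)
    for it in range(max_iter):
        s = syndrome(h, word)
        if not any(s):
            return word, True, it  # valid code word: stop and release it
        # For every bit, count how many of the parity checks it takes part in fail
        fails = [sum(s[i] for i in range(len(h)) if h[i][j]) for j in range(len(word))]
        worst = max(fails)
        word = [b ^ 1 if f == worst else b for b, f in zip(word, fails)]
    # Halted: release the current estimate and report whether it is a code word
    return word, not any(syndrome(h, word)), max_iter
```

The returned flag is what makes the built-in error detection usable for hybrid ARQ: if it is False after max_iter iterations, a retransmission can be requested. As a toy check with the (7,4) Hamming parity check matrix (which is not low density), a single error in the last position is corrected after one iteration.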

In contrast to the decoders for many other block codes, the decoder for Gallager codes can easily utilise soft decisions: the soft decisions are simply used to initialise the decoder with appropriate initial symbol probabilities. Consequently, soft decision decoding of Gallager codes adds little extra complexity. Figure 14 shows the performance of some LDPC codes with MPEG and ATM packet sizes. The performance is comparable to or better than that of the best Turbo codes.
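As an illustration of how soft decisions can enter the decoder, the sketch below computes the channel log-likelihood ratio for BPSK on an AWGN channel and turns it into an initial bit probability. It assumes the mapping bit 0 → +1, bit 1 → −1, unit-energy symbols and equiprobable bits, so that LLR = 2y/σ² with σ² = N0/2; the names are hypothetical.

```python
import math


def noise_variance(ebn0_db: float, code_rate: float) -> float:
    """sigma^2 = N0/2 for unit-energy BPSK symbols, each carrying code_rate information bits."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 1.0 / (2.0 * code_rate * ebn0)


def bpsk_llr(y: float, sigma2: float) -> float:
    """Channel LLR ln(P(bit=0 | y) / P(bit=1 | y)) for bit 0 -> +1, bit 1 -> -1."""
    return 2.0 * y / sigma2


def prob_bit_one(llr: float) -> float:
    """Initial probability P(bit = 1) = 1 / (1 + e^llr), computed without overflow."""
    if llr >= 0:
        e = math.exp(-llr)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(llr))


if __name__ == "__main__":
    sigma2 = noise_variance(2.0, 0.5)  # rate-1/2 code at Eb/N0 = 2 dB
    for y in (-1.2, -0.1, 0.0, 0.3, 1.5):  # made-up received samples
        print(y, round(prob_bit_one(bpsk_llr(y, sigma2)), 3))
```

A received value near zero gives a probability near 0.5 (no information), while a large |y| gives an almost certain initial belief; in a sum-product decoder these values simply replace the 0/1 hard decisions as starting probabilities.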

4 Comparison and Discussion
As a comparison of the discussed coding schemes, we have plotted the spectral efficiency of a number of coding techniques as a function of the signal-to-noise ratio (Eb/N0). In Figure 15 we present the spectral efficiency for ATM sized packets (i.e. 53 bytes or 424 information bits) as a function of the Eb/N0 required to achieve a packet error rate equal to 10^-5, where Eb is the bit energy and N0 the noise spectral density. Note that for the LDPC codes we have used bit error rate instead of packet error rate, and therefore the results are slightly too good compared to the other schemes. However, due to the steep curves, the SNR difference resulting from this is modest. We see that the iterative (or Turbo based) schemes perform significantly better than the concatenated schemes. In Figure 16 we have plotted the corresponding results for MPEG packets (i.e. 188 bytes or 1504 information bits). The concatenated coding scheme does not apply soft decisions when decoding the Reed Solomon code, since this would be rather complex. Soft decisions explain a major part of the performance difference.

Figure 15 Spectral efficiency of various error control schemes for ATM sized packets. We have applied a packet error rate of 10^-5, except for the LDPC codes, where a bit error rate equal to 10^-5 is applied. Modulation is BPSK (panel "ATM size block": spectral efficiency versus required Eb/N0 [dB]; curves: capacity, concatenated, PCC, TPC, LDPC)

Figure 16 Spectral efficiency of various error control schemes for MPEG sized packets. We have applied a packet error rate of 10^-5, except for the LDPC codes, where a bit error rate equal to 10^-5 is applied. Modulation is BPSK (panel "MPEG size block": spectral efficiency versus required Eb/N0 [dB]; curves: capacity, concatenated, PCC, TPC, LDPC)

The coding schemes presented also have different degrees of flexibility. The PCC Turbo code can accommodate any block length by changing the interleaver size, and we can change the code rate by altering the puncturing pattern. Thus, code rate and block length can be modified independently of each other, which gives great flexibility in the system design. For the Turbo Product Codes, the code rates and block lengths are directly given by the code rates and block lengths of the component codes. Shortening of the code is of course possible, but we still do not have the same flexibility as for the PCC Turbo codes. Note the relatively poor performance of the R = 0.6 code for TPC and MPEG packets. This comes from the limited flexibility of the TPC code that is applied; with additional component codes one could achieve improved performance. In flexibility, the LDPC codes lie somewhere between these two other Turbo codes. A code with any code rate and block length can in principle be designed, but distinct encoders and decoders must be implemented. There are indications of significant loss when puncturing LDPC codes [37], so changing the code rate by puncturing is not recommended.

References
1 Sheriff, R E, Hu, Y F. Mobile satellite communication networks. West Sussex, John Wiley, 2001.
2 Shannon, C E. Communication in the Presence of Noise. Proceedings of the IRE, 37, 10–21, January 1949. (Reprint available in Proceedings of the IEEE, 86 (2), 447–457, 1998.)
3 Blahut, R E. Principles and Practice of Information Theory. Addison-Wesley, 1987.
4 Proakis, J G. Digital Communications. 2nd Edition. New York, McGraw-Hill, 1989.
5 Dolinar, S, Divsalar, D, Pollara, F. Code Performance as a Function of Block Size. Pasadena, California, Jet Propulsion Laboratory, California Institute of Technology, 1998. (TMO Progress Report 42-133.)
6 Viterbi, A J. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, IT-13 (2), 260–269, 1967.
7 Frenger, P et al. Rate compatible convolutional codes for multirate DS-CDMA systems. IEEE Transactions on Communications, 47 (6), 828–836, 1999.
8 Frenger, P et al. Multirate convolutional codes. Gothenburg, Sweden, Chalmers University of Technology, Dept. of Signals and Systems, Communication Systems Group, 1998. (Tech. Rep. 21.)
9 Frenger, P, Orten, P, Ottosson, T. Convolutional codes with optimum distance spectrum. IEEE Communications Letters, 3 (11), 317–319, 1999.
10 Viterbi, A J. Principles of digital communication and coding. New York, McGraw-Hill, 1979.
11 Wicker, S B. Error control systems for digital communication and storage. New Jersey, Prentice Hall, 1995.
12 Johannesson, R. Some long rate one-half binary convolutional codes with an optimum distance profile. IEEE Transactions on Information Theory, IT-22 (5), 629–631, 1976.
13 Orten, P, Svensson, A. Sequential decoding of convolutional codes for Rayleigh fading channels. Wireless Personal Communications, 20 (1), 61–74, 2002. Kluwer Academic Publishers.
14 Costello, D J et al. Applications of Error-Control Coding. IEEE Transactions on Information Theory, 44 (6), 2531–2560, 1998.
15 ETSI. Standard Digital Video Broadcasting (DVB); Interaction channel for Satellite Distribution Systems. Sophia Antipolis, 2000. (EN 301 790 V.1.2.2, 2000-12.)
16 Glavieux, A, Berrou, C, Thitimajshima, P. Near Shannon Limit error correcting coding and decoding: Turbo-Codes (1). Proceedings of IEEE International Conference on Communications, Geneva, Switzerland, May 1993, 1064–1070.
17 Crozier, S et al (eds.). Performance of Turbo Codes with Relative Prime and Golden Interleaving Strategies. Sixth International Conference on Mobile Satellite Communication (IMSC '99), Ottawa, Canada, June 1999. (www.cra.ca/fec)
18 Pyndiah, R et al. Near optimum decoding of product codes. Proceedings of IEEE Global Telecommunications Conference (GLOBECOM '94), San Francisco, Nov–Dec 1994, 1/3, 339–343.
19 Acikel, O F, Ryan, W E. Punctured turbo-codes for BPSK/QPSK channels. IEEE Transactions on Communications, COM-47 (9), 1315–1323, 1999.
20 Bahl, L R et al. Optimal decoding of linear codes for minimizing symbol error rate. IEEE Transactions on Information Theory, IT-20, 284–287, 1974.
21 Hagenauer, J, Offer, E, Papke, L. Iterative decoding of binary block and convolutional codes. IEEE Transactions on Information Theory, 42 (2), 429–445, 1996.
22 Pietrobon, S, Barbulescu, A S. A simplification of the modified Bahl decoding algorithm for systematic convolutional codes. Int. Symposium on Information Theory & its Applications, Sydney, Australia, Nov. 1994, 1073–1077.
23 Risløw, B et al. Implementation of a 64 kbit/s Satellite Modem Using Turbo Codes and 16-QAM. Proceedings of Norwegian Signal Processing Symposium, Tromsø, May 23–24, 1997.
24 SmallWorld. (May 7, 2002) [online] – URL: http://www.sworld.com.au/
25 TurboConcept. (May 7, 2002) [online] – URL: http://www.turboconcept.com/
26 ETSI. Digital Video Broadcasting (DVB); Interaction channel for Satellite Distribution Systems, Guideline of use of EN 301 790. Sophia Antipolis, 2001. (ETSI Technical Report TR 101 790, 2001-06.)
27 Forney, G D Jr, Ungerboeck, G. Modulation and Coding for Linear Gaussian Channels. IEEE Transactions on Information Theory, 44 (6), 2384–2415, 1998.
28 Ungerboeck, G. Trellis-Coded Modulation with redundant signal sets. Part 1: Introduction. IEEE Communications Magazine, 25 (2), 5–11, 1987.
29 Wilson, S. Digital Modulation and Coding. New Jersey, Prentice Hall, 1996.
30 Gallager, R G. Low density parity-check codes. Cambridge, MA, MIT Press, 1963. (Research Monograph series no 21.)
31 Benedetto, S et al. Serial Concatenation of Interleaved Codes: Performance Analysis, Design and Iterative Decoding. IEEE Transactions on Information Theory, 44 (3), 909–926, 1998.
32 ten Brink, S. Rate one-half code for approaching the Shannon limit by 0.1 dB. IEE Electronics Letters, 36 (1), 1293–1294, 2000.
33 Pyndiah, R. Near optimum decoding of product codes: Block Turbo Codes. IEEE Transactions on Communications, 46 (8), 1003–1010, 1998.
34 Advanced Hardware Architectures. (May 7, 2002) [online] – URL: http://www.aha.com/
35 MacKay, D J. Good error-correcting codes based on very sparse matrices. IEEE Transactions on Information Theory, 45 (2), 399–431, 1999.
36 Chung, S et al. On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit. IEEE Communications Letters, 5 (2), 58–60, 2001.
37 Orten, P. Low density parity check codes for wireless communications. Billingstad, Nera Research, 2000. (Technical Report.)

A New Look at the Exact BER Evaluation of PAM, QAM and PSK Constellations*)

PAVAN K. VITTHALADEVUNI AND MOHAMED-SLIM ALOUINI

Hierarchical constellations offer different degrees of protection to the transmitted messages according to their relative importance. As such, they have found interesting applications in digital video broadcasting systems as well as in wireless multimedia services. Although a great deal of attention has been devoted in the recent literature to explicit closed-form expressions for the bit error rate (BER) performance of standard pulse amplitude modulation (PAM), quadrature amplitude modulation (QAM), and phase-shift-keying (PSK) constellations, very few results are known on the BER performance of hierarchical constellations. In this paper, we argue that a recursive way of Gray coding a constellation ensures the existence of a recursive algorithm for the exact and generic (in the constellation size) BER computation of generalized hierarchical PAM, QAM, and PSK constellations over additive white Gaussian noise (AWGN) channels. As a byproduct, this new approach also provides an alternative unified way to evaluate the BER of the well-known standard PAM, QAM, and PSK constellations. Because of its generic nature, this new approach readily allows numerical evaluation for various cases of practical interest.

Pavan K. Vitthaladevuni (23) received his B.Tech. degree in Electrical Engineering from the Indian Institute of Technology (IIT), Madras, India in 1999 and his MSEE degree from the University of Minnesota, Twin Cities in 2001. He is currently a graduate research assistant in the Department of Electrical and Computer Engineering at the University of Minnesota, Twin Cities and is working towards his PhD. His research interests span digital communications over wireless channels and information theory.
[email protected]

Mohamed-Slim Alouini (33) received his Diplome d'Ingenieur degree from TELECOM Paris and his Diplome d'Etudes Approfondies in Electronics from the Univ. of Pierre & Marie Curie, Paris, both in 1993. He received his M.S.E.E. from Georgia Tech, Atlanta, in 1995, and his PhD in electrical engineering from Caltech, Pasadena, in 1998. He joined the department of Electrical and Computer Engineering of the Univ. of Minnesota in 1998, where his current research interests include statistical modeling of multipath fading channels, adaptive and hierarchical modulation techniques, diversity systems, interference mitigation techniques, and digital communication over fading channels.
[email protected]

*) This work is supported in part by the National Science Foundation (NSF) grant CCR-9983462 and in part by the Center of Transportation Studies (CTS) through the Intelligent Transport Systems (ITS) Institute, Minneapolis, Minnesota.

I. Introduction
In his study of broadcast channels, Cover [1] showed about three decades ago that one strategy to guarantee basic communication in all conditions is to divide the broadcast messages into two or more classes and to give every class a different degree of protection according to its importance. The goal is that the most important information (known as basic or coarse data) must be recovered by all receivers, while the less important information (known as refinement, detail, or enhancement data) can only be recovered by the "fortunate" receivers, which benefit either from better propagation conditions (e.g. closer to the transmitter and/or with a direct line-of-sight path) or from better RF devices (e.g. lower noise amplifiers or higher antenna gains). Motivated by this information-theoretic study, many researchers have shown since then that one practical way of achieving this goal relies on the idea of hierarchical constellations (also known as embedded, multi-resolution, or asymmetric constellations), which consist of non-uniformly spaced signal points [2], [3], [4]. This concept was studied further in the early nineties for digital video broadcasting systems [3], [5] and has more recently gained renewed relevance with

• the demand to support multimedia services by the simultaneous transmission of different types of traffic, each with its own quality requirement [6], [7], [8]; and
• a possible application in the DVB-T standard [9], in which hierarchical modulations can be used on OFDM subcarriers.

The evaluation of the bit error rate (BER) of classical uniform M-ary pulse amplitude modulation (PAM), M-ary quadrature amplitude modulation (QAM), and M-ary phase-shift-keyed (PSK) constellations has long been of interest. For example, exact expressions for the BER of 16-QAM and 64-QAM were derived in [10]. Later on, generic (in M) but approximate BER expressions for uniform M-QAM were developed in [11] and [12]. More recently, Yoon et al. [13] obtained the explicit and generic (in M) closed-form expression for the BER of uniform square QAM. We extended these results to hierarchical square and non-square 4/M-QAM constellations (see [14]). For this particular family of hierarchical constellations, two bits are assigned to the basic information and (log2 M – 2) bits to the refinement information in every channel access. However, for more general hierarchical PAM, QAM or PSK constellations, the derivation of exact and generic closed-form expressions becomes more tedious. In this paper, we take a new look at this problem and present an alternative recursive approach for the exact BER computation of generalized hierarchical M-PAM, M-QAM, and M-PSK constellations over AWGN channels. For example, we consider the 2/4-PAM constellation, wherein a BPSK constellation is embedded into a 4-PAM constellation (see Figure 1), and the 2/4/8-PAM constellation, wherein a BPSK constellation is embedded into a 4-PAM constellation, which is in turn embedded into an 8-PAM constellation (see Figure 2). For these hierarchical constellations one information bit is sent for each of the two or three different levels of protection required, respectively. Apart from solving for the exact BER of these generalized hierarchical constellations, this new approach provides as a byproduct an alternative unified way for the BER evaluation of the well-known standard uniform constellations.

Figure 1 Generalized hierarchical 2/4-PAM constellation (distances 2d1 and 2d2; sub-channels i1 and i2; BPSK labels 1 and 0; 4-PAM labels 10, 11, 01, 00)

The remainder of this paper is organized as follows. The subsequent section presents some design issues as well as the model and parameters of generalized hierarchical constellations. Section III describes the recursive algorithm for the exact BER computation of the constellations under consideration, and section IV shows how these general results can be exploited to obtain the BER of standard uniform constellations. Section V illustrates the mathematical formalism by some numerical examples. Next, section VI outlines some potential applications of these new results. Finally, section VII concludes the paper with a summary of its main points.

II. System Model and Parameters

II.A General Setup

II.A.1 PAM
Consider a situation wherein we want to achieve unequal error protection for various bits in a PAM symbol. Given that information rides on the amplitude of the symbol, we need to construct a specific PAM constellation to achieve the stated goal of unequal error protection. We consider a generalized 2/4/⋅⋅⋅/M-PAM constellation with Karnaugh map style Gray mapping (see Figure 2 for the generalized 8-PAM example). In the case of 8-PAM, we assume that there are 3 streams of data, each of which has a priority (or equivalently, a target BER). We take one bit from each stream to form a symbol of 3 bits. The bit from the bit stream with the highest priority (lowest allowed BER) is assigned to the most significant position, and as such is referred to as the most significant bit (MSB). The bit from the bit stream with the second highest priority is assigned the next most significant position, and the bit with the least priority is assigned the least significant position and is referred to as the least significant bit (LSB). In general, if we were to have m bit streams of data (referred to as sub-channels i1 through im hereafter) with their respective priorities, we follow a similar procedure in assigning the bits to positions in the m-bit symbol. We would then have M = 2^m symbols, and these can be visualized as points in an M-PAM constellation. In addition to arranging the bits in a symbol with respect to (w.r.t.) their priorities, we need to place these symbols in a hierarchical way in the constellation to achieve unequal error protection. In Figure 2, the fictitious BPSK constellation denoted by the large grey circle represents the highest priority (hereafter referred to as first level of hierarchy or sub-channel i1). The distance d1 refers to the highest priority. The distance d2 represents the second priority (second level of hierarchy or sub-channel i2). Finally, the distance d3 represents the third priority (third level of hierarchy or sub-channel i3). In other words, the constellation (A through H) can be visualized as a BPSK (S1 and S2) embedded into a 4-PAM (T1 through T4), which is further embedded into an 8-PAM (A through H).

Figure 2 Generalized hierarchical 2/4/8-PAM constellations (fictitious BPSK points S1 = 1 and S2 = 0; 4-PAM points T1 to T4 labelled 10, 11, 01, 00; 8-PAM points A to H labelled 100, 101, 111, 110, 010, 011, 001, 000; distances 2d1, 2d2, 2d3; sub-channels i1, i2, i3)

Gray coding is also done hierarchically. The fictitious BPSK is coded using 1 bit. As shown in Figure 2, S1 is coded as 1. In the next level of hierarchy, the fictitious T1 and T2 are coded as 10 and 11 respectively. Note that we are tagging a 0 or 1 to the right of the Gray code of S1. A and B are then coded as 100 and 101 respectively, by tagging a 0 and 1 to the right of the Gray code for T1. We remind the reader that A through H are the symbols that are transmitted. It helps to visualize the 8-PAM constellation as one that evolves from a 4-PAM, which in turn evolves from a BPSK. This is why we will refer to this 8-PAM constellation as a 2/4/8-PAM. This procedure of Gray coding ensures the greatest protection to the MSB at the cost of the LSB.

In general, given any M-PAM (M = 2^m), this process can be automated as follows. Label the constellation points from left to right (or vice versa) with integers from 0 to M – 1. Then, convert the integer labels to their binary form (for example, for a 16-PAM (16 = 2^4), the binary representation of 4 would be 0100). For the k-th symbol, let the binary equivalent be b1,k b2,k ⋅⋅⋅ bm,k. Then the corresponding Gray code g1,k g2,k ⋅⋅⋅ gm,k (k = 0, 1, 2, ⋅⋅⋅, M – 1) of the symbol with integer label b1,k b2,k ⋅⋅⋅ bm,k is given by

$$g_{1,k} = b_{1,k}, \qquad g_{i,k} = b_{i,k} \oplus b_{i-1,k}, \quad i = 2, 3, \ldots, m, \tag{1}$$

where ⊕ represents modulo-2 addition. For reader convenience, MATLAB programs generating Gray codes are available at [15].

II.A.2 QAM
A generalized hierarchical square M-QAM constellation (M = 2^(2m)) can be modeled as follows. We assume that there are m = (1/2) log2 M bit streams of data. Each one of these incoming streams carries information of a particular priority. For every channel access, 2 bits are chosen from each level of priority. The 2 highest priority bits are assigned the MSB positions in the in-phase (I) and the quadrature phase (Q), respectively. Bits with lower priorities are assigned the subsequent positions of lower significance. For instance, the 2 bits with the second highest priority are assigned the second most significant positions in the I-phase and Q-phase, and so on, until the 2 lowest priority bits are assigned the LSB positions in the I- and Q-phase. This can be viewed as a 4/16/64/⋅⋅⋅/M-QAM constellation.

Figure 3 Generalized hierarchical 4/16-QAM constellations

The Gray codes for the symbols are given by $g_1^i g_1^q g_2^i g_2^q \cdots g_m^i g_m^q$, where the superscripts "i" and "q" refer to the in-phase and quadrature phase respectively. In other words, the Gray code for the symbol is obtained by interleaving the Gray codes of the symbol position in the I-channel and Q-channel √M-PAMs respectively. For instance, in the 4/16-QAM constellation shown in Figure 3, the symbol S1 has the I-channel Gray code 00 (rightmost) and the Q-channel Gray code 11 (third from the top). So, the Gray code of S1 is 0101.
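The labelling rule of equation (1) and the I/Q interleaving just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB programs at [15]; the helper names gray_code and qam_gray_code are hypothetical. Labelling the points of Figure 2 from right to left reproduces the codes 100, 101, 111, 110, 010, 011, 001, 000 read from left to right.

```python
def gray_code(label: int, m: int) -> str:
    """Gray code of an integer label 0..2**m - 1 following equation (1):
    g1 = b1 and gi = bi XOR b(i-1), where b1 is the MSB of the m-bit binary label."""
    b = [(label >> (m - 1 - i)) & 1 for i in range(m)]  # b[0] holds b1 (the MSB)
    g = [b[0]] + [b[i] ^ b[i - 1] for i in range(1, m)]
    return "".join(str(bit) for bit in g)


def qam_gray_code(i_code: str, q_code: str) -> str:
    """Interleave I- and Q-channel PAM Gray codes: g1^i g1^q g2^i g2^q ... gm^i gm^q."""
    if len(i_code) != len(q_code):
        raise ValueError("I and Q codes must have the same length")
    return "".join(gi + gq for gi, gq in zip(i_code, q_code))


if __name__ == "__main__":
    # 8-PAM labels 0..7, from one end of the axis to the other
    print([gray_code(k, 3) for k in range(8)])
    # ['000', '001', '011', '010', '110', '111', '101', '100']
    # 16-PAM: label 4 has binary form 0100
    print(gray_code(4, 4))            # 0110
    # Figure 3: S1 has I-channel code 00 and Q-channel code 11
    print(qam_gray_code("00", "11"))  # 0101
```

Successive labels differ in exactly one bit, which is what keeps the most likely symbol errors (between neighbours) down to a single bit error.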

II.A.3 PSK
The system model is similar to that of an M-PAM constellation, the only difference being that the information rides on the phase of the symbol rather than on its amplitude. In addition to arranging the bits in a symbol w.r.t. their priorities, we need to place these symbols in a hierarchical way in the constellation to achieve unequal error protection. In Figure 4, the fictitious BPSK constellation denoted by the * symbol represents the highest priority (hereafter referred to as the first level of hierarchy or sub-channel i1). The angle θ2 represents the second priority (second level of hierarchy or sub-channel i2). Finally, the angle θ3 represents the third priority (third level of hierarchy or sub-channel i3). In other words, the constellation (A through H) can be visualized as a BPSK (S1 and S2) embedded into a QPSK (T1 through T4), which is further embedded into an 8-PSK (A through H). Gray coding is also done hierarchically: the same equations (1) can be used to Gray code a generalized hierarchical M-PSK constellation. For reader convenience, MATLAB programs automating the Gray code generation are available at [16].

Figure 4 Generalized hierarchical 2/4/8-PSK constellations

II.B System Parameters

II.B.1 PAM
Distances
As we have described in Section II-A.1, the distances we use evolve in a hierarchy. To simplify the notation in our proposed algorithm, we define the distance vector as d = [d1 d2 ⋅⋅⋅ dm] and the priority vector p as

$$\mathbf{p} = [p_1\ p_2\ \cdots\ p_{m-1}\ p_m] = \left[\frac{d_1}{d_m}\ \ \frac{d_2}{d_m}\ \ \cdots\ \ \frac{d_{m-1}}{d_m}\ \ 1\right]. \tag{2}$$

This vector controls the relative message priorities. The larger the ratio pi / pi+1, the greater the protection of the bit in position i relative to the bit in position (i + 1).

Energies
The average symbol energy for the generalized M-PAM constellation can be computed as follows. If we denote every point in the constellation by its signed distance from the origin, then its coordinate has the form p a^T, where a is a coordinate row vector (with elements ±dm). For example, the symbol S1 in Figure 2 has vector a = [–1 +1 –1] d3. The energy of a point in the constellation with coordinate x can be written x². By construction, for every point with coordinate (d1 + x) there is a point with coordinate (d1 – x). Likewise, for every point with coordinate (d1 + d2 + x) there exists a point with coordinate (d1 + d2 – x), and this holds through the hierarchy from level 1 up to level m. Hence, because of this symmetry, all cross terms ±2 di dj cancel when the squared coordinates are averaged, and the average energy Es can be written as

$$E_s = d_1^2 + d_2^2 + \cdots + d_m^2 = \mathbf{p}\mathbf{p}^T d_m^2. \tag{3}$$

II.B.2 QAM
Distances
As has been mentioned previously, square M-QAM constellations can be viewed as two √M-PAMs in quadrature. Therefore, to describe them, we need two distance vectors, d^i and d^q, where the superscripts "i" and "q" again refer to the I-phase and the Q-phase respectively.

In the case of square QAM, these two vectors are identical and are given by

$$\mathbf{d}^i = \mathbf{d}^q = [d_1\ d_2\ \cdots\ d_m]. \tag{4}$$

Energies
Just as in the case of PAM, we can show for QAM that the average energy is given by

$$E_s = 2\left(d_1^2 + d_2^2 + \cdots + d_m^2\right) = 2\,\mathbf{p}\mathbf{p}^T d_m^2. \tag{5}$$

II.B.3 PSK
As we have described in Section II-A.3, a 2/4/⋅⋅⋅/M-PSK (M = 2^m) constellation can be described through a set of angles. To simplify the notation, we define an angle vector as

$$\boldsymbol{\theta} = [\theta_1\ \theta_2\ \cdots\ \theta_m]_{1\times m} \tag{6}$$

and the priority vector p as

$$\mathbf{p} = [p_1\ p_2\ \cdots\ p_{m-1}\ p_m] = \left[\frac{\theta_1}{\theta_m}\ \ \frac{\theta_2}{\theta_m}\ \ \cdots\ \ \frac{\theta_{m-1}}{\theta_m}\ \ 1\right]_{1\times m} \tag{7}$$

with θ1 = π/2. Note that these definitions follow the same lines as for PAMs. As in the case of PAMs, the p vector controls the relative message priorities: the larger the ratio pi / pi+1, the greater the protection of the bit in position i relative to the bit in position (i + 1). Another important system parameter is the symbol energy of the generalized 2/4/⋅⋅⋅/M-PSK constellation, which is given by Es = r², where r is the symbol amplitude.

II.B.4 Demodulator/Detector

PAM
The demodulator is based on the Maximum Likelihood (ML) rule. The amplitude d̂ of the incoming signal is recovered and then used for decoding the bit streams. The decoder can be interpreted in two ways. We could use d̂ to identify the most likely transmitted amplitude d, take the m bits corresponding to this d, and assign them to their respective bit streams (MSB to i1, ⋅⋅⋅, LSB to im). Equivalently, we could use d̂ to decode the individual bit streams directly, as shown in Figure 5. We note from Figure 2 that for the 2/4/8-PAM constellation the MSB is 0 in the right half plane and 1 in the left half plane. So, if the recovered amplitude d̂ is positive, we can directly assign "0" to the MSB. Similarly, the next most significant bit (i2) is 1 if |d̂| ≤ d1 and 0 otherwise. So, if |d̂| ≥ d1, we can directly assign "0" to bit i2. More generally, for 2/4/⋅⋅⋅/M-PAM, the demodulator uses the following decision rules:

• For bit i1: If d̂ ≥ 0, i1 = 0; else i1 = 1.
• For bit i2: If |d̂| ≥ d1, i2 = 0; else i2 = 1.
• For bit im: The decision boundaries are given by vector B, which can be constructed as shown by the MATLAB code in Table I.

This is nothing but ML decoding done on individual bits instead of symbols. Demodulation for QAMs can be done similarly.

Figure 5 Hierarchical 2/4/⋅⋅⋅/M-PAM demodulator: the received signal r(t) is integrated over [nTs, (n+1)Ts] to give d̂, which feeds a separate decision rule for each of the bits i1, i2, ⋅⋅⋅, im, producing the bit streams (sub-channels)

PSK
The demodulator is also based on the ML rule. The phase η of the incoming signal is recovered and then used for decoding the bit streams. Again, the decoder can be interpreted in two ways. We could use η to identify the most likely transmitted phase θ, take the m bits corresponding to this θ, and assign them to their respective bit streams (MSB to i1, ⋅⋅⋅, LSB to im). Equivalently, we could use η to decode the individual bit streams directly. We note from Figure 4 that for the 2/4/8-PSK constellation the MSB is 0 in the upper half plane and 1 in the lower half plane. So, if the recovered phase η satisfies 0 < η < π, we can directly assign "0" to the MSB. Similarly, the next most significant bit (i2) is 1 in the left half plane and 0 in the right half plane. So, if −π/2 < η < π/2, we directly assign "0" to bit i2. More generally, for 2/4/⋅⋅⋅/M-PSK, the demodulator uses the following decision rules:

• For bit i1: If 0 < η < π, i1 = 0; else i1 = 1.
• For bit i2: If −π/2 < η < π/2, i2 = 0; else i2 = 1.
• For bit im: The decision boundaries are given by vector B, which can be constructed similarly to what is shown by the MATLAB code in Table I.
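The PAM system parameters and decision rules of this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code; pam_symbols, level_points, average_energy and demodulate are hypothetical names. The code enumerates the 2^m points ±d1 ± ⋯ ± dm, checks equation (3) by brute force and applies the bit-wise rules. Two choices are ours. First, the Gray label of a point is read from its sign pattern: g1 = 0 for a positive point, and gi = 0 when the i-th and (i−1)-th signs agree. This reproduces the labels of Figure 2. Second, every bit ik with k ≥ 2 is decided by the parity of the number of level-(k−1) points below d̂. For i2 this gives the |d̂| ≥ d1 rule, and for im the level-(m−1) points are exactly the entries of vector B.

```python
from itertools import product


def pam_symbols(d):
    """All 2**m points x = s1*d1 + ... + sm*dm (si = +/-1) with their Gray labels.
    Label rule (our reading of eq. (1), MSB = 0 on the positive side):
    g1 = 0 if s1 > 0 else 1, and gi = 0 if si == s(i-1) else 1."""
    symbols = []
    for signs in product((-1, 1), repeat=len(d)):
        x = sum(s * di for s, di in zip(signs, d))
        label = [0 if signs[0] > 0 else 1]
        label += [0 if signs[i] == signs[i - 1] else 1 for i in range(1, len(d))]
        symbols.append((x, tuple(label)))
    return sorted(symbols)


def level_points(d):
    """Sorted points +/-d1 +/- ... +/-dn of one level of the hierarchy."""
    return sorted(sum(s * di for s, di in zip(signs, d))
                  for signs in product((-1, 1), repeat=len(d)))


def average_energy(d):
    """Average symbol energy by brute force; equation (3) predicts sum(di**2)."""
    xs = [x for x, _ in pam_symbols(d)]
    return sum(x * x for x in xs) / len(xs)


def demodulate(d_hat, d):
    """Bit-by-bit decisions on the recovered amplitude d_hat.
    Bit i1: 0 if d_hat >= 0. Bit ik (k >= 2): parity of the number of
    level-(k-1) points lying below d_hat (for k = m these points form vector B)."""
    bits = [0 if d_hat >= 0 else 1]
    for k in range(2, len(d) + 1):
        boundaries = level_points(d[:k - 1])
        bits.append(sum(b < d_hat for b in boundaries) % 2)
    return tuple(bits)


if __name__ == "__main__":
    d = [4, 2, 1]                                    # uniform 2/4/8-PAM, p = [4 2 1]
    print(average_energy(d), sum(x * x for x in d))  # 21.0 21
    for x, label in pam_symbols(d):
        print(x, "".join(map(str, label)), demodulate(x + 0.3, d) == label)
```

Each bit boundary is a midpoint between neighbouring symbols, so these bit-wise decisions coincide with the label of the nearest symbol. This is the equivalence between the two interpretations of the decoder described above.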

Once again, this is nothing but ML decoding done on individual bits instead of symbols.

Table I MATLAB pseudo-code for the generation of vector B

    % d = [d1 d2 ... dm] is the distance (row) vector, m >= 2
    DecisionBoundVector = d(1 : m-1);
    B = zeros(2^(m-1), 1);
    % positive boundaries d1 +/- d2 +/- ... +/- d(m-1), in increasing order
    for i = 2^(m-2) : 2^(m-1) - 1
        index = i;
        mult = zeros(1, m-1);
        for j = m-1 : -1 : 1
            mult(1, j) = 2*mod(index, 2) - 1;   % bit of index -> sign +1/-1
            index = floor(index/2);
        end
        B(i+1, 1) = mult*DecisionBoundVector';
    end
    % negative boundaries by symmetry about the origin
    for i = 2^(m-2) : -1 : 1
        B(i, 1) = -B(2^(m-1) - i + 1, 1);
    end

III. Exact BER Computation

III.A Exact BER for 2/4/⋅⋅⋅/M-PAM Constellations

III.A.1 2/4-PAM Constellation
Consider the 2/4-PAM constellation as shown in Figure 1. The probability of error for the bit i1 in symbol 00 is given by ½ erfc((d1 + d2)/√N0), where erfc(⋅) is the complementary error function defined by

$$\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} \exp\left(-z^2\right)dz$$

and N0/2 is the two-sided power spectral density of the band-pass AWGN. Similarly, the bit error probability for bit i1 in symbol 01 is given by ½ erfc((d1 − d2)/√N0). Hence, the average bit error probability for bit i1, Pb(4, (d1, d2), i1), is given by

$$P_b\big(4,(d_1,d_2),i_1\big) = \frac14\operatorname{erfc}\left(\frac{d_1-d_2}{\sqrt{N_0}}\right) + \frac14\operatorname{erfc}\left(\frac{d_1+d_2}{\sqrt{N_0}}\right). \tag{8}$$

Now, consider the second sub-channel i2. It can be shown that the average bit error probability for the sub-channel i2, Pb(4, (d1, d2), i2), is given by [17]

$$P_b\big(4,(d_1,d_2),i_2\big) = \frac14\left[2\operatorname{erfc}\left(\frac{d_2}{\sqrt{N_0}}\right) - \operatorname{erfc}\left(\frac{2d_1+d_2}{\sqrt{N_0}}\right) + \operatorname{erfc}\left(\frac{2d_1-d_2}{\sqrt{N_0}}\right)\right]. \tag{9}$$

III.A.2 Generalized 2/4/⋅⋅⋅/M-PAM Constellation
We propose a recursive algorithm for the exact BER computation of generalized 2/4/⋅⋅⋅/M-PAM constellations. We use the generalized 4-PAM constellation as the root for this algorithm. We have already developed the expressions for the exact BER of bits i1 and i2 in the case of 4-PAM (m = 2). Consider now the BER of bit ik, Pb(M, d, ik) (k = 1, 2, ⋅⋅⋅, m), where bit im represents the LSB and m > 2. The pseudo-code of the algorithm was shown in [17] to be:

• If k < m

$$P_b(M,\mathbf{d},i_k) = \frac12\left[P_b\left(\frac M2,\mathbf{d}_+,i_k\right) + P_b\left(\frac M2,\mathbf{d}_-,i_k\right)\right], \tag{10}$$

where

$$\mathbf{d}_+ = [d_1\ d_2\ \cdots\ d_{m-2}\ \ d_{m-1}+d_m]_{1\times(m-1)} \tag{11}$$

and

$$\mathbf{d}_- = [d_1\ d_2\ \cdots\ d_{m-2}\ \ d_{m-1}-d_m]_{1\times(m-1)} \tag{12}$$

are (m – 1)-dimensional row vectors (d is an m-dimensional row vector). As we proceed through the recursion, the constellation size keeps decreasing until we either reach a stage wherein the vectors d+ and d– are of length 2 (in the case of bits i1 and i2), or bit ik (for k > 2) becomes the LSB. In the former case we use the result for 4-PAM (given in Section III-A.1) and come out of the recursion. In the latter case (k = m), we do the following.

• If k = m, or in the event that bit ik becomes the LSB at some stage of the recursion, we need an LSB recursion algorithm. It was shown in [17] that the recursion for the LSB is given by

$$P_b(M,\mathbf{d},i_m) = \frac{1}{2^m}\left(P_0 + P_1\right), \tag{13}$$

where

$$P_0 = \sum_{i=1}^{2^{m-1}}\sum_{j=1}^{2^{m-1}} (-1)^{j+1}\,\frac12\operatorname{erfc}\left(\frac{B(j) - d_0(i)}{\sqrt{N_0}}\right) \tag{14}$$

and

$$P_1 = \sum_{i=1}^{2^{m-1}}\left(1 + \sum_{j=1}^{2^{m-1}} (-1)^{j}\,\frac12\operatorname{erfc}\left(\frac{B(j) - d_1(i)}{\sqrt{N_0}}\right)\right), \tag{15}$$

where
- B is the vector of decision boundary positions for the LSB, w.r.t. the origin,
- d0 is the vector of positions of constellation points whose LSB is 0, w.r.t. the origin, and
- d1 is the vector of positions of constellation points whose LSB is 1, w.r.t. the origin.

The generation of these vectors is shown in Tables I and II. For the reader's convenience, our MATLAB computer programs for PAM (along with a readme file) are available at [15] to allow one to immediately compare the performance of various generalized hierarchical constellations.

Table II MATLAB pseudo-code for generation of vectors d0 and d1

    % d = [d1 d2 ... dm] is the distance (row) vector, m >= 2
    dsym = zeros(2^m, 1);
    d0 = zeros(2^(m-1), 1);
    d1 = zeros(2^(m-1), 1);
    % symbol positions on the positive side, d1 +/- d2 +/- ... +/- dm, increasing
    for i = 2^(m-1) : 2^m - 1
        index = i;
        mult = zeros(1, m);
        for j = m : -1 : 1
            mult(1, j) = 2*mod(index, 2) - 1;
            index = floor(index/2);
        end
        dsym(i+1, 1) = mult*d';
    end
    % negative side by symmetry about the origin
    for i = 1 : 2^(m-1)
        dsym(i, 1) = -dsym(2^m - i + 1, 1);
    end
    % sorted positions carry LSBs 0,1,1,0,0,1,1,0,...: split them into d0 and d1
    d0(1, 1) = dsym(1, 1);
    x = 2; y = 1;
    while x < 2^m - 2
        d1(y, 1)   = dsym(x, 1);
        d1(y+1, 1) = dsym(x+1, 1);
        d0(y+1, 1) = dsym(x+2, 1);
        d0(y+2, 1) = dsym(x+3, 1);
        x = x + 4;
        y = y + 2;
    end
    d1(y, 1)   = dsym(x, 1);
    d1(y+1, 1) = dsym(x+1, 1);
    d0(y+1, 1) = dsym(x+2, 1);

III.B Extension to Generalized Hierarchical M-QAM Constellations
As mentioned before, QAM constellations can be viewed as 2 PAMs in quadrature. This fact helps to deduce the BER expressions [17] for QAMs in terms of those of the 2 PAM constellations. For instance, square QAM constellations use 2 bits for every level of priority (M = 2^(2m)). Figure 6 shows a 4/16/64-QAM constellation. In the case where every level of priority is not restricted to 2 bits per channel access, many other QAM constellations can arise. The recursive algorithm developed here for the exact BER computation can be readily adapted to treat all these cases. As an example of the applicability of the proposed algorithm, we will consider a family of rectangular M-QAM (i.e. M = 2^(2m+1)) constellations with m + 1 incoming streams of data. For this family of constellations, the highest priority level is assigned 1 bit in the in-phase, whereas all the other levels are assigned two bits in the same fashion as in the square QAM case described above. This can be viewed as a 2/8/32/⋅⋅⋅/M-QAM constellation. As an illustration of this family of constellations, Figure 7 shows a generalized 32-QAM constellation.

III.C Exact BER for 2/4/⋅⋅⋅/M-PSK Constellations

III.C.1 2/4-PSK Constellation
Consider the 2/4-PSK constellation shown in Figure 8. For the MSB (bit i1), it can be shown that

$$P_b\big(4,(\theta_1,\theta_2),i_1\big) = \frac12\operatorname{erfc}\left(\sqrt{\gamma}\,\sin\theta_2\right), \tag{16}$$

where γ = r²/N0 is the symbol carrier-to-noise ratio (CNR). Now, consider the LSB (second sub-channel i2). It can easily be shown that [18]

$$P_b\big(4,(\theta_1,\theta_2),i_2\big) = \frac12\operatorname{erfc}\left(\sqrt{\gamma}\,\cos\theta_2\right). \tag{17}$$

III.C.2 Generalized 2/4/⋅⋅⋅/M-PSK Constellations
We notice that the decision boundaries for bits i1, i2, ⋅⋅⋅, im in terms of θ1, θ2, ⋅⋅⋅, θm are the same in both 2/4/⋅⋅⋅/M-PSK and 2/4/⋅⋅⋅/M/2M-PSK. So, the protection angles for these bits in the latter case differ from those in the former by ±θm+1. Similar to the PAM case, we propose a recursive algorithm that uses the generalized 2/4-PSK constellation as a root. We have already developed the expressions for the exact BER of bits i1 and i2 in the case of 2/4-PSK (m = 2). To compute the BER for bits ik, Pb(M, θ, ik) (k = 1, 2, ⋅⋅⋅, m), where bit im represents the LSB, and

m > 2, we use a recursive algorithm whose pseudo-code is given as follows:

• If k < m

$$P_b(M,\boldsymbol{\theta},i_k) = \frac12\left[P_b\left(\frac M2,\boldsymbol{\theta}_+,i_k\right) + P_b\left(\frac M2,\boldsymbol{\theta}_-,i_k\right)\right], \tag{18}$$

where

$$\boldsymbol{\theta}_+ = [\theta_1\ \theta_2\ \cdots\ \theta_{m-2}\ \ \theta_{m-1}+\theta_m]_{1\times(m-1)} \tag{19}$$

and

$$\boldsymbol{\theta}_- = [\theta_1\ \theta_2\ \cdots\ \theta_{m-2}\ \ \theta_{m-1}-\theta_m]_{1\times(m-1)} \tag{20}$$

are (m – 1)-dimensional row vectors (note that θ is an m-dimensional row vector). As we proceed through the recursion, the constellation size keeps decreasing until we either reach a stage wherein the vectors θ+ and θ– are of length 2 (this is the case with bits i1 and i2) or the bit ik (for k > 2) becomes the LSB. In the former case we use the result for 2/4-PSK (given in Section III-C.1) and come out of the recursion. In the latter case, we use the following closed-form expression.

• If k = m, or in the event that bit ik becomes the LSB at some stage of the recursion, the BER of the LSB, Pb(M, θ, im), can be written in closed form [18] as

$$P_b(M,\boldsymbol{\theta},i_m) = \frac{1}{2^m}\left(P_0 + P_1\right), \tag{21}$$

where

$$P_0 = \sum_{i=1}^{2^{m-1}}\sum_{j=1}^{2^{m-1}} (-1)^{j}\,F\big(B(j) - \phi_0(i)\big) \tag{22}$$

and

$$P_1 = \sum_{i=1}^{2^{m-1}}\sum_{j=1}^{2^{m-1}} (-1)^{j+1}\,F\big(B(j) - \phi_1(i)\big), \tag{23}$$

where the F-function is defined in [19], [20] as follows:

$$F(\psi) = -\frac{\operatorname{sgn}(\psi)}{2\pi}\int_0^{\pi-|\psi|}\exp\left[-\gamma\,\frac{\sin^2\psi}{\sin^2\theta}\right]d\theta, \qquad -\pi < \psi < \pi, \tag{24}$$

where sgn(⋅) is the sign function. Please refer to [19], [20] for the properties of this function.

Figure 6 Hierarchical 64-QAM constellation. Gray coding is done hierarchically. For instance, the Gray code for symbol S1 is 001010, while that for symbol T1 is 110000
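The F-function of equation (24) is defined by a finite integral and is typically evaluated numerically. The Python sketch below uses composite Simpson's rule. The function name pawula_f and the default of n = 2000 panels are our choices, not the authors'. As a check, we rely on a derivation of our own rather than a statement in the text: for ψ = π/2 the integral reduces to Craig's form of the Gaussian Q-function, so F(π/2) = −¼ erfc(√γ). Here γ = r²/N0 as defined in Section III-C.1.

```python
import math


def pawula_f(psi, gamma, n=2000):
    """F-function of equation (24), evaluated with composite Simpson's rule.
    psi must lie in (-pi, pi); gamma = r**2 / N0 is the symbol CNR; n must be even."""
    if not -math.pi < psi < math.pi:
        raise ValueError("psi must lie in (-pi, pi)")
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of panels")
    if psi == 0.0:
        return 0.0                      # sgn(0) = 0
    s2 = math.sin(psi) ** 2
    upper = math.pi - abs(psi)

    def integrand(t):
        st = math.sin(t)
        if st == 0.0:
            return 0.0                  # limit of exp(-gamma*s2/sin^2 t) as t -> 0
        return math.exp(-gamma * s2 / (st * st))

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    return -math.copysign(1.0, psi) * integral / (2 * math.pi)


if __name__ == "__main__":
    gamma = 2.0
    # Both numbers should agree to many decimal places
    print(pawula_f(math.pi / 2, gamma), -0.25 * math.erfc(math.sqrt(gamma)))
```

For 0 < ψ < π, −F(ψ) is the probability that the phase error falls in (ψ, π). This is why F tends to −½ as ψ → 0+ and to 0 as ψ → π.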

Figure 7 Hierarchical 32-QAM constellation. The Gray coding is done hierarchically. For instance, the Gray code for symbol S1 is 00100, while that for symbol T1 is 01101

In (22) and (23), the vectors B, φ0 and φ1 are defined as follows.

- The elements of B represent the angular positions of the decision boundaries of the LSB w.r.t. the reference axis.
- The elements of φ0 represent the angular positions of those symbols in the constellation (2^m-PSK) whose LSB is '0', w.r.t. the reference axis.
- The elements of φ1 represent the angular positions of those symbols in the constellation (2^m-PSK) whose LSB is '1', w.r.t. the reference axis.

As an example, see Figure 9 for the generation of vectors a (the vector B of LSB decision boundaries), φ0 and φ1 in the case of the 2/4/8-PSK constellation. More generally, these vectors can easily be generated for any 2/4/⋅⋅⋅/M-PSK. Note that this vector B is similar to vector B in the case of PAMs, and it can be generated using the same pseudo-code given in Table I, replacing vector d with θ. Vectors φ0 and φ1 are similar to the vectors d0 and d1, respectively. They can be generated using the same pseudo-code given in Table II, replacing vectors d, d0 and d1 with θ, φ0 and φ1, respectively. For reader convenience, all our MATLAB programs for PSK (along with a readme file) are available at [16].

Figure 8 2/4-PSK constellation
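The vectors B, φ0 and φ1 of (22) and (23) can also be produced by enumerating sign patterns instead of running Tables I and II with θ in place of d. The Python sketch below assumes a hierarchical angle vector, where each θi is larger than the sum of the later angles and θ2 + ⋯ + θm < π/2, and it requires m ≥ 3. The helper psk_lsb_vectors is a hypothetical name. A symbol's LSB is labelled 0 when its last two signs agree. This is our assumption: it reproduces the 0, 1, 1, 0, ... pattern of Table II for angles sorted from −π, and for 2/4/8-PSK it yields the same sets as our reading of Figure 9 and the labels of Figure 4.

```python
import math
from itertools import product


def psk_lsb_vectors(theta):
    """B, phi0 and phi1 for a generalized 2/4/.../M-PSK, theta = [pi/2, theta2, ..., thetam].
    B: level-(m-1) angles +/-theta1 +/- ... +/- theta(m-1) (LSB decision boundaries).
    phi0 / phi1: symbol angles whose LSB is 0 / 1 (LSB = 0 iff the last two signs agree)."""
    m = len(theta)
    if m < 3:
        raise ValueError("the LSB closed form (21) is used for m >= 3")
    B = sorted(sum(s * t for s, t in zip(signs, theta[:-1]))
               for signs in product((-1, 1), repeat=m - 1))
    phi0, phi1 = [], []
    for signs in product((-1, 1), repeat=m):
        angle = sum(s * t for s, t in zip(signs, theta))
        (phi0 if signs[-1] == signs[-2] else phi1).append(angle)
    return B, sorted(phi0), sorted(phi1)


if __name__ == "__main__":
    th2, th3 = math.pi / 6, math.pi / 15    # illustrative angles, not from the article
    B, phi0, phi1 = psk_lsb_vectors([math.pi / 2, th2, th3])
    for name, vec in (("B", B), ("phi0", phi0), ("phi1", phi1)):
        print(name, [round(math.degrees(a), 1) for a in vec])
```

Sorting all symbol angles and reading off their LSBs gives 0, 1, 1, 0, 0, 1, 1, 0 for 8-PSK, the same split that the loop at the end of Table II performs on sorted positions.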

IV. Special Cases

IV.A Uniform M-PAM
Standard uniform M-PAM constellations can be viewed as a special case of the generalized M-PAMs when p = [2^(m−1) 2^(m−2) ⋅⋅⋅ 4 2 1]. The resulting BER for bit ik, Pb(M, p, ik), obtained by using the recursive algorithm, can be shown to be in agreement with the explicit closed-form expression recently derived in [13] and given by

$$P_b(M,\mathbf{p},i_k) = \frac{1}{M}\sum_{j=0}^{(1-2^{-k})M-1}(-1)^{\left\lfloor \frac{j\,2^{k-1}}{M}\right\rfloor}\left(2^{k-1} - \left\lfloor \frac{j\,2^{k-1}}{M} + \frac12\right\rfloor\right)\operatorname{erfc}\left((2j+1)\sqrt{\frac{3\gamma\log_2 M}{M^2-1}}\right), \tag{25}$$

where ⌊⋅⌋ is the floor function. The average BER Pb(M, p) is given by

$$P_b(M,\mathbf{p}) = \frac1m\sum_{k=1}^{m} P_b(M,\mathbf{p},i_k). \tag{26}$$

IV.B Uniform Square QAM
Uniform square M-QAM can be viewed as a special case of the generalized M-QAMs when p = [2^(m−1) 2^(m−2) ⋅⋅⋅ 4 2 1]. Please refer to Figure 6 for a 4/16/64-QAM example. This can be used when all the incoming bit streams have nearly the same priority.

Figure 9 Vectors a, φ0 and φ1 for a 2/4/8-PSK constellation

By symmetry, the in-phase and quadrature-phase bits (ik and qk) have the same BER. The average BER for both the in-phase and quadrature-phase bits is obtained by averaging the BER of all in-phase bits ik and quadrature bits qk, k = 1, ⋅⋅⋅, ½ log2 M, yielding

$$P_b^s(M,\mathbf{d}) = \frac{1}{2m}\left[\sum_{k=1}^{m}P_b\left(\sqrt M,\mathbf{d},i_k\right) + \sum_{k=1}^{m}P_b\left(\sqrt M,\mathbf{d},q_k\right)\right] = \frac1m\sum_{k=1}^{m}P_b\left(\sqrt M,\mathbf{d},i_k\right), \tag{27}$$

where Pb(√M, d, ik) is the BER of bit ik in a √M-PAM constellation.

IV.C Uniform PSK
Uniform M-PSK can be viewed as a special case of the generalized 2/4/⋅⋅⋅/M-PSKs when θ = [π/2 π/4 π/8 ⋅⋅⋅ π/M] (i.e. when p = [2^(m−1) 2^(m−2) 2^(m−3) ⋅⋅⋅ 4 2 1]). The average BER is obtained by averaging the BER of all the bits ik (k = 1, 2, ⋅⋅⋅, log2 M), yielding

$$P_b(M) = \frac{1}{\log_2 M}\sum_{k=1}^{\log_2 M} P_b(M,\boldsymbol{\theta},i_k), \tag{28}$$

where Pb(M, θ, ik) is the BER of bit ik in the constellation, which can be evaluated using (18) and (21). Using these equations in (28), it can also be shown that the exact BER of uniform M-PSK can be written in terms of the Hamming weights of the M symbols, Wk (k = 1, 2, ⋅⋅⋅, M), as

$$P_b(M) = \frac{1}{\log_2 M}\sum_{k=1}^{M} W_k\left[F\left(\frac{(2k-1)\pi}{M}\right) - F\left(\frac{(2k-3)\pi}{M}\right)\right],$$

in agreement with [21], [22] and [23, Section 4.1]. Note that for constellations Gray coded in the Karnaugh-map style (as shown in Section II-A.3), these Hamming weights Wk can be calculated in terms of the m bits (m = log2 M) in the Gray codes of the symbols, gk (corresponding to the k-th symbol), as

$$W_k = \sum_{i=1}^{\log_2 M} g_{i,k}.$$
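The agreement claimed in Section IV-A between the recursion of Section III-A.2 and the closed form (25) can be checked numerically. The Python sketch below implements equations (8) to (15), (25) and (27) directly; all function names are hypothetical, and this is not the authors' MATLAB code [15]. It rests on three assumptions. First, d is hierarchical, meaning each di exceeds the sum of the later entries, so the LSB labels follow the Table II pattern. Second, the LSB of a point is taken to be 0 when its last two signs agree. Third, γ in (25) is read as the SNR per bit, Eb/N0, which is the reading under which (25) reduces to (8) and (9) for M = 4.

```python
from itertools import product
from math import erfc, log2, sqrt


def level_points(d):
    """Sorted points +/-d1 +/- ... +/-dn of one level of the hierarchy."""
    return sorted(sum(s * x for s, x in zip(signs, d))
                  for signs in product((-1, 1), repeat=len(d)))


def pb_lsb(d, n0):
    """Equations (13)-(15): exact BER of the LSB of a 2**m-point generalized PAM."""
    m = len(d)
    B = level_points(d[:-1])              # LSB decision boundaries (Table I)
    p0 = p1 = 0.0
    for signs in product((-1, 1), repeat=m):
        x = sum(s * v for s, v in zip(signs, d))
        alt = sum((-1) ** (j + 1) * 0.5 * erfc((b - x) / sqrt(n0))
                  for j, b in enumerate(B, start=1))
        if signs[-1] == signs[-2]:        # LSB = 0: x belongs to d0, eq. (14)
            p0 += alt
        else:                             # LSB = 1: x belongs to d1, eq. (15)
            p1 += 1.0 - alt
    return (p0 + p1) / 2 ** m


def pb_pam(d, k, n0):
    """Exact BER of bit i_k (k = 1..m) of a generalized 2/4/.../M-PAM, M = 2**len(d)."""
    m = len(d)
    s = sqrt(n0)
    if m == 2:                            # root of the recursion: eqs. (8) and (9)
        d1, d2 = d
        if k == 1:
            return 0.25 * (erfc((d1 - d2) / s) + erfc((d1 + d2) / s))
        return 0.25 * (2 * erfc(d2 / s) - erfc((2 * d1 + d2) / s) + erfc((2 * d1 - d2) / s))
    if k == m:                            # bit i_k is the LSB: eq. (13)
        return pb_lsb(d, n0)
    d_plus = list(d[:-2]) + [d[-2] + d[-1]]    # eq. (11)
    d_minus = list(d[:-2]) + [d[-2] - d[-1]]   # eq. (12)
    return 0.5 * (pb_pam(d_plus, k, n0) + pb_pam(d_minus, k, n0))  # eq. (10)


def pb_square_qam(d, n0):
    """Equation (27): average BER of square M-QAM from its sqrt(M)-PAM bit BERs."""
    return sum(pb_pam(d, k, n0) for k in range(1, len(d) + 1)) / len(d)


def pb_uniform_pam(M, k, gamma_b):
    """Equation (25) for uniform Gray-coded M-PAM; gamma_b is read as Eb/N0."""
    a = sqrt(3 * gamma_b * log2(M) / (M ** 2 - 1))
    total = 0.0
    for j in range(M - M // 2 ** k):      # j = 0 .. (1 - 2**-k) * M - 1
        sign = (-1) ** ((j * 2 ** (k - 1)) // M)
        weight = 2 ** (k - 1) - (j * 2 ** k + M) // (2 * M)  # floor(j*2**(k-1)/M + 1/2)
        total += sign * weight * erfc((2 * j + 1) * a)
    return total / M


if __name__ == "__main__":
    d, n0 = [4.0, 2.0, 1.0], 1.0          # uniform 8-PAM, p = [4 2 1], d_m = 1
    gamma_b = sum(x * x for x in d) / 3 / n0   # Eb/N0 = Es / (log2(M) N0) = 7
    for k in (1, 2, 3):
        print(k, pb_pam(d, k, n0), pb_uniform_pam(8, k, gamma_b))
    print("uniform 64-QAM:", pb_square_qam(d, n0))
```

Because the recursion only ever halves the constellation, the cost grows with the number of leaves rather than with a symbol-by-symbol double sum over all M points for each bit. The closed form (25) is cheaper, but it applies to the uniform case only.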

Figure 10 Comparison of the various methods (Yang and Hanzo [12], Cho and Yoon [13] and the proposed recursive algorithm) for the computation of the BER for the uniform 64-QAM case with p = [4 2 1] (average bit error probability versus carrier-to-noise ratio Es/N0 [dB]; curves: approximate recursive algorithm [12], exact explicit expression for square QAM [13] and exact recursive algorithm, Monte Carlo simulation, leading-term approximation)

V. Numerical Examples
Since the number of different cases covered by our analysis is quite large, we present in this section only some numerical examples that demonstrate the usefulness of the proposed analytical tools for a variety of scenarios of practical interest. We further note that the analytical expressions derived in the previous sections, and the corresponding numerical results presented in this section, have been verified extensively by Monte Carlo simulations for the various constellations under consideration. Thus, the reader can be confident in the correctness of the newly derived recursive expressions and the accuracy of the numerical results illustrated below.

Figure 10 deals with the uniform 64-QAM case and shows the perfect match between the proposed exact recursive algorithm, the exact generic expression obtained by Yoon et al. [13], and the Monte Carlo simulations. The approximate recursive expression obtained in [12] comes very close to the exact result, whereas the leading-term approximation¹⁾ gives an optimistic result at low CNR. Figure 11 shows the performance of the 4/16/64-QAM constellation. Note that for the priority vector p = [5 2.25 1], the MSB gets higher protection than the LSB. Figure 12 shows similar behavior for 2/4/8/16-PSK. Figure 13 shows the performance of a 2/4/8/16/32-PSK constellation, where the p vector is given by p = [β·2^4 β·2^3 β·2^2 2 1], with β = 1.3 ≥ 1. This is a situation wherein the 3 most significant bits are protected better than the 2 least significant bits. Also, note that the bits i1, i2, i3 form a uniform 8-PSK, while the bits i4, i5 form a uniform 4-PSK. So, this is a constellation wherein an 8-PSK embedded in a 32-PSK is protected by the factor β. These protected bits (i1, i2, i3) are referred to as base bits, and the others (i4, i5) are referred to as refinement bits. As β is increased, the 2 sets of curves move further apart.

Figure 11 Performance of 4/16/64-QAM with p = [5 2.25 1]. Note that the MSB performs much better than the LSB (average bit error probability versus carrier-to-noise ratio Es/N0 [dB]; curves for the MSBs in the I and Q channel (bits i1 and q1), bits i2 and q2, and the LSBs in the I and Q channel (bits i3 and q3))

Figure 12 Performance of generalized 2/4/8/16-PSK with Θ = [π/2 π/6 π/15 π/40], p = [20 6.6667 2.6667 1]. Note that the MSB performs nearly 20 dB better than the LSB at BER ≤ 10^-2 (average bit error probability versus carrier-to-noise ratio Es/N0 [dB]; curves for bits i1 (MSB), i2, i3 and i4)

Figure 13 Performance of 2/4/8/16/32-PSK with Θ = [π/2 π/4 π/8 π/20.8 π/41.6], p = [20.8 10.4 5.2 2 1]. The 3 most significant bits are the base bits and the remaining are refinement bits. Note that on average, the base bits perform 10–15 dB better than the refinement bits at BER ≤ 10^-2 (average bit error probability versus carrier-to-noise ratio Es/N0 [dB]; curves for bits i1 and i2, bit i3 and the average for the base bits, and bits i4, i5 and the average for the refinement bits)

VI. Potential Applications

VI.A Voice and Data Applications
Four years ago, a new scheme for simultaneous data and voice transmission over fading channels was proposed [24]. It provided high average spectral efficiency for data transmission, while meeting stringent delay requirements for voice transmission. This modem always transmits voice packets, in the form of a BPSK signal on the Q-channel. When fading is not severe, the modem assigns a significant transmit power to data transmission. This is done by using a large PAM constellation (on the I-channel). As the channel fading gets severe, the modem reduces the size of this constellation, thereby giving more power to voice transmission to meet the stringent delay constraints. We are currently looking into designing an alternative adaptive scheme wherein hierarchical QAM or PSK constellations are used instead, in order to improve the spectral efficiency for simultaneous multimedia transmission.

1) By leading term, we mean just the dominant erfc(⋅) term.

VI.B Downlink Multiplexing/Multicasting
As we have seen in the previous section, by virtue of Gray coding, we protect the MSB much more than the LSB. Could this property, along with hierarchical channel coding, be used for multicasting information from a base station to mobile units? Information for the user who suffers the least fading could be transmitted on the LSBs of the signal, while that for the user suffering the worst fading could be transmitted on the MSBs of the same signal. The users receive the symbol and look only for the bits in particular positions within the symbol. We are now focusing on the development of this application and on the comparison of its spectral efficiency with that of existing multicast strategies.

VII. Conclusion
We have argued that the recursive way of Gray coding a PAM/PSK constellation ensures the existence of a recursive algorithm for the computation of the BER. As a result, we have obtained exact and generic expressions (in M) for the BER of generalized hierarchical M-PAM constellations over AWGN channels. These results can be extended easily to fading channels [17], [18]. We have also shown that these expressions can be extended to generalized hierarchical M-QAM constellations. Using the same argument, we have derived similar BER expressions for hierarchical PSK constellations. Because of their generic nature, these new expressions readily allow numerical evaluation for various cases of practical interest. In particular, numerical examples have shown that the leading-term approximation gives significantly optimistic BER values at low CNR but is quite accurate in the high-CNR region. Finally, we have suggested possible applications, such as voice and data integration and downlink multiplexing.

References
1 Cover, T. Broadcast channels. IEEE Trans. on Inform. Theory, IT-18, 2–14, 1972.
2 Sundberg, C E W, Wong, W C, Steele, R. Logarithmic PCM weighted QAM transmission over Gaussian and Rayleigh fading channels. IEE Proc., 134, 557–570, 1987.
3 Ramchandran, K et al. Multiresolution broadcast for digital HDTV using joint source/channel coding. IEEE Journal of Selected Areas in Communications, 11, 1993.
4 Wei, L-F. Coded modulation with unequal error protection. IEEE Trans. Commun., COM-41, 1439–1449, 1993.
5 Morimoto, M et al. A study on power assignment of hierarchical modulation schemes for digital broadcasting. IEICE Trans. Commun., E77-B, 1994.
6 Morimoto, M, Okada, M, Komaki, S. A hierarchical image transmission system in a fading channel. In: Proc. IEEE Int. Conf. Univ. Personal Comm. (ICUPC'95), 769–772, 1995.
7 Pursley, M B, Shea, J M. Nonuniform phase-shift-key modulation for multimedia multicast transmission in mobile wireless networks. IEEE Journal of Selected Areas in Communications, SAC-17, 774–783, 1999.
8 Pursley, M B, Shea, J M. Adaptive nonuniform phase-shift-key modulation for multimedia traffic in wireless networks. IEEE Journal of Selected Areas in Communication, SAC-00, 1394–1407, 2000.
9 DVB-T standard: ETS 300 744. Digital Broadcasting Systems for Television, Sound and Data Services: Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television. (ETSI Draft, 1.2.1, EN300 744, 1999-1.)
10 Fitz, M O, Seymour, J P. On the bit error probability of QAM modulation. International Journal of Wireless Information Networks, 1 (2), 131–139, 1994.
11 Lu, J et al. M-PSK and M-QAM BER computation using signal-space concepts. IEEE Trans. Commun., COM-47, 181–184, 1999.
12 Yang, L L, Hanzo, L. A recursive algorithm for the error probability evaluation of M-QAM. IEEE Commun. Letters, 4, 304–306, 2000.
13 Yoon, D, Cho, K, Lee, J. Bit error probability of M-ary quadrature amplitude modulation. In: Proc. IEEE Veh. Technol. Conf. (VTC'2000-Fall), Boston, Massachusetts, 2422–2427, September 2000.
14 Vitthaladevuni, P K, Alouini, M-S. BER computation of generalized hierarchical 4/M-QAM constellations. In: Proc. IEEE International Symposium on Personal, Indoor and Mobile Radio Commun. Conf. (PIMRC'2001), San Diego, California, 1, 85–89, September 2001. (Journal version in IEEE Trans. on Broadcasting, 47 (3), 228–239, 2001.)
15 Vitthaladevuni, P K, Alouini, M-S. MATLAB programs for design and BER computation of generalized hierarchical M-QAM constellations. Available at: http://www.ece.umn.edu/users/pavan/Generalized-Hierarchical-Qam.html
16 Vitthaladevuni, P K, Alouini, M-S. MATLAB programs for design and BER computation of generalized hierarchical M-PSK constellations. Available at: http://www.ece.umn.edu/users/pavan/Generalized-Hierarchical-PSK.html
17 Vitthaladevuni, P K, Alouini, M-S. BER computation of generalized QAM constellations. In: Proc. IEEE Global Commun. Conf. (GLOBECOM'2001), San Antonio, Texas, 1, 632–636, 2001. (Journal version submitted to IEEE Trans. on Information Theory.)
18 Vitthaladevuni, P K, Alouini, M-S. BER computation of generalized hierarchical PSK constellations. In: Proc. IEEE International Conference on Communications (ICC'2002), New York, April 2002. (Journal version submitted to IEEE Trans. Commun.)
19 Pawula, R F, Rice, S O, Roberts, J H. Distribution of the phase angle between two vectors perturbed by Gaussian noise. IEEE Trans. Commun., COM-30, 1828–1841, 1982.
20 Pawula, R F. Distribution of the phase angle between two vectors perturbed by Gaussian noise II. IEEE Trans. Veh. Technol., 50, 576–583, 2001.
21 Lee, P J. Computation of the bit error rate of coherent M-ary PSK with Gray code bit mapping. IEEE Trans. Commun., COM-34, 488–491, 1986.
22 Tellambura, C, Mueller, A J, Bhargava, V K. Analysis of M-ary phase-shift keying with diversity reception for land-mobile satellite channels. IEEE Trans. Veh. Technol., VT-46, 910–922, 1997.
23 Simon, M K, Hinedi, S M, Lindsey, W C. Digital Communication Techniques – Signal Design and Detection. Englewood Cliffs, NJ, Prentice Hall, 1995.
24 Alouini, M-S, Tang, X, Goldsmith, A. An adaptive modulation scheme for simultaneous voice and data transmission over fading channels. IEEE Journal of Selected Areas in Communication, SAC-17, 837–850, May 1999. (See also Proc. of the IEEE Vehicular Technology Conference (VTC'98), 939–943, Ottawa, Ontario, Canada, May 1998.)

Performance Analysis of Adaptive Coded Modulation with Antenna Diversity and Feedback Delay

KJELL J. HOLE, HENRIK HOLM AND GEIR E. ØIEN

A general adaptive coding scheme for spectrally efficient transmission on flat fading channels was introduced by the authors in an earlier paper [2]. An instance of the coding scheme utilizes a set of multidimensional trellis codes designed for additive white Gaussian noise channels of different qualities. A feedback channel between the decoder and encoder makes it possible for the encoder to switch adaptively between these codes based on channel state information fed back from the decoder. In this paper, the adaptive coding scheme is employed in a mobile wireless communication system consisting of a stationary transmitter with one antenna, a wireless Rayleigh fading channel, and a mobile terminal with one or more receive antennas. The bit-error-rate at the output of the decoder is determined for various terminal speeds, time delays in the feedback channel, and numbers of receive antennas. The obtained results indicate that the proposed adaptive coding scheme is well suited for communications over mobile wireless channels with carrier frequencies in the high MHz range, delay spread up to 250 ns, and terminal mobility up to pedestrian speed.

Kjell Jørgen Hole (41) received his BSc, MSc and PhD degrees in computer science from the University of Bergen in 1984, 1987 and 1991, respectively. He is currently Senior Research Scientist at the Department of Telecommunications at the Norwegian University of Science and Technology (NTNU) in Trondheim. His research interests are in the areas of coding theory and wireless communications. [email protected]

Henrik Holm (30) received his Siv.Ing. and PhD degrees in electrical engineering from the Norwegian University of Science and Technology (NTNU) in 1997 and 2002, respectively. He is currently a postdoctoral researcher at NTNU and a guest researcher at the University of Minnesota. His research interests include statistical modelling and robust transmission on wireless channels. [email protected]

I. Introduction

Many authors have studied adaptive (coded) modulation for wireless communications (see [1] and the references therein). In an earlier paper [2], we considered a general adaptive coding scheme for single-user channels with frequency-flat slowly varying multipath fading. A particular instance of this coding scheme utilizes a set of multidimensional trellis codes designed for additive white Gaussian noise (AWGN) channels of different qualities. A feedback channel makes it possible for the encoder to switch adaptively between these codes based on channel state information (CSI) fed back from the decoder. The result is an overall scheme with high spectral efficiency.

The output bit-error-rate (BER) of an adaptive coding scheme may increase with growing time delay in the feedback channel and/or increasing terminal speed [3]. Any implemented feedback channel has nonzero feedback delay, and it is necessary to allow for mobile terminals. It is therefore important to determine the BER degradation of the adaptive coding scheme proposed in [2]. Alouini and Goldsmith [4] have determined the BER degradation for uncoded adaptive modulation. In this paper, we extend their technique to determine the BER degradation of any instance of the proposed adaptive coding scheme.

We first introduce, in Section II, a mobile wireless channel with Rayleigh fading, where the mobile terminal has multiple receive antennas whose signals are combined using the maximal ratio combining (MRC) method [5, Ch. 5]. For any instance of the adaptive coding scheme and any number of receive antennas, Section III then shows how to determine the BER degradation associated with a nonzero feedback delay and a nonzero terminal speed. As an example, Section IV evaluates a specific adaptive encoder and decoder (codec) utilizing a set of four-dimensional trellis codes. A conclusion is drawn in Section V.

II. System Model and Coding Scheme

The system model consists of a stationary transmitter/receiver (transceiver), a wireless frequency-flat fading channel, and a mobile transceiver, or terminal. It is assumed that the distance between the stationary transceiver and the mobile terminal is not more than a few hundred meters. We will only consider the flow of user information on the downlink. Hence, in our model the feedback channel (or uplink) from the terminal to the stationary transceiver will only be used for CSI.

The stationary transceiver has one transmit antenna, while the mobile terminal has $H$ ($\ge 1$) receive antennas. Each of the $H$ antenna branches is modeled as a Rayleigh fading channel with ideal coherent detection. It is assumed that the branch signals are statistically independent.

Denoting the transmitted complex baseband signal at time index $t \in \{0, 1, 2, \ldots\}$ by $x(t)$, the received signal at antenna $h \in \{1, 2, \ldots, H\}$ can then be written as $y_h(t) = \alpha_h(t) \cdot x(t) + n_h(t)$. Here, the stationary and ergodic fading envelope $\alpha_h(t)$ is a real-valued random variable with a Rayleigh distribution, and $n_h(t)$ is complex-valued AWGN with statistically independent real and imaginary components. The total one-sided power spectral density of the AWGN is denoted $N_0$ [W/Hz] and the one-sided channel bandwidth is denoted $B$ [Hz].
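The branch model $y_h = \alpha_h x + n_h$ can be illustrated with a short simulation of MRC. The following minimal Python sketch is not code from the paper. It holds illustrative branch envelopes `alphas` fixed and sends QPSK symbols through every branch. It then weights branch $h$ by $\alpha_h$ and normalizes by $\sum_h \alpha_h^2$. The names `mrc_output_snr`, `es` and `n0` are hypothetical, and so is the QPSK alphabet. The symbol-energy-to-noise ratio $E_s/N_0$ stands in for $S/(N_0B)$. Under these assumptions the empirical output SNR should approach $\sum_h \alpha_h^2 E_s/N_0$, the sum of the branch CNRs.

```python
import math
import random


def mrc_output_snr(alphas, es=1.0, n0=0.1, n_sym=50_000, seed=1):
    """Empirical SNR after maximal ratio combining of H branches with fixed,
    real fading envelopes alphas[h] (ideal coherent detection assumed)."""
    rng = random.Random(seed)
    g = sum(a * a for a in alphas)          # sum of branch power gains
    sigma = math.sqrt(n0 / 2.0)             # std of real and imaginary noise parts
    noise_power = 0.0
    for _ in range(n_sym):
        # QPSK symbol with energy es (an illustrative choice of alphabet)
        x = complex(rng.choice((-1, 1)), rng.choice((-1, 1))) * math.sqrt(es / 2.0)
        # y_h = alpha_h * x + n_h on each branch; weight by alpha_h, sum, normalize
        z = sum(a * (a * x + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
                for a in alphas) / g
        noise_power += abs(z - x) ** 2
    return es / (noise_power / n_sym)


alphas = [0.5, 1.0, 1.5]
branch_cnrs = [a * a * 1.0 / 0.1 for a in alphas]   # alpha_h^2 * Es / N0
print(mrc_output_snr(alphas), sum(branch_cnrs))
```

Fixing the envelopes isolates the combining step. Drawing `alphas` from a Rayleigh distribution on each call would instead sample the fading, as in the model above.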

106 Telektronikk 1.2002

Geir E. Øien (36) received his MSc and PhD degrees from the Norwegian University of Science and Technology (NTNU) in 1989 and 1993, respectively. From 1994 until 1996 he was with Stavanger University College as associate professor. Since 1996 he has been with NTNU, since 2001 as full professor of information theory. Prof. Øien is a member of IEEE and the Norwegian Signal Processing Society. He is author/co-author of more than 40 research papers. His current research interests are in information theory, communication theory, and signal processing, with emphasis on analysis and design of wireless communication systems. [email protected]

Let $S$ [W] denote the constant average transmit power. The instantaneous received carrier-to-noise ratio (CNR) on antenna branch $h$ at time index $t$ is then

$$\gamma_h(t) = \frac{\alpha_h^2(t)\, S}{N_0 B}, \qquad h = 1, 2, \ldots, H,$$

with expectation $E[\gamma_h(t)] = \bar\gamma_h = \Omega S/(N_0 B)$, where $\Omega = E[\alpha_h^2(t)]$ is assumed independent of $h$. Thus, $\bar\gamma_h$ is also equal for all $h$.

The mobile terminal implements an MRC combiner to process the $H$ received branch signals [5, p. 316]. Since the branch signals are statistically independent, the instantaneous CNR at the output of the $H$-branch MRC combiner is given by $\gamma = \sum_{h=1}^{H} \gamma_h$.¹⁾ If we denote $E[\gamma] = \bar\gamma$, then $\bar\gamma_h = \bar\gamma/H$. The gamma probability density function (pdf) of the instantaneous CNR $\gamma$ at the output of the MRC combiner may then be written as [5, Eq. (5.2-14)]

$$p_\gamma(\gamma) = \left(\frac{H}{\bar\gamma}\right)^{H} \frac{\gamma^{H-1}}{(H-1)!} \exp\!\left(-\frac{H\gamma}{\bar\gamma}\right), \qquad \gamma \ge 0. \qquad (1)$$

1) We suppress the time dependence from now on for notational simplicity.

It is convenient to view the combination of the $H$ antenna branches and the MRC combiner as a single channel. The instantaneous CNR $\gamma$ at the output of this channel determines the channel state at a given time. We assume that the mobile terminal has perfect knowledge of $\gamma$. The range $[0, \infty)$ of possible CNR values is divided into $N + 1$ nonoverlapping intervals (or fading regions). At any given time the CNR will fall in one of these fading regions. The associated CSI, i.e. the region index $n \in \{0, 1, \ldots, N\}$, is sent to the stationary transceiver via the feedback channel, which is assumed to be error free.

Assume that $\gamma \in [0, \gamma_1)$ in fading region 0, $\gamma \in [\gamma_n, \gamma_{n+1})$ in region $n \in \{1, 2, \ldots, N-1\}$, and $\gamma \in [\gamma_N, \infty)$ in region $N$. Also, assume that the BER must never exceed a target maximum $\mathrm{BER}_0$. When $\gamma \in [\gamma_n, \gamma_{n+1})$ we use a multidimensional trellis code, denoted code $n \in \{1, 2, \ldots, N\}$, designed to achieve a BER $\le \mathrm{BER}_0$ on an AWGN channel of CNR $\gamma \ge \gamma_n$. For $\gamma < \gamma_1$, i.e. $\gamma$ in fading region 0, the channel conditions are so bad that no information is transmitted. We then have an outage during which the information flow is buffered.

Let $4 \le M_1 < M_2 < \cdots < M_N$ denote the number of symbols in $N$ quadrature amplitude modulation (QAM) constellations of growing size, and let code $n$ be based on the constellation with $M_n$ symbols. For some small fixed $L \in \{1, 2, \ldots\}$, the encoder for code $n$ accepts $L \cdot \log_2(M_n) - 1$ information bits at each time index $k = L \cdot t \in \{0, L, 2L, \ldots\}$ and generates $L \cdot \log_2(M_n)$ coded bits. The coded bits specify $L$ modulation symbols in the $n$th QAM constellation. These symbols are transmitted at time indices $k, k+1, \ldots, k+L-1$. The $L$ two-dimensional symbols can be viewed as one $2L$-dimensional symbol, and for this reason the code is said to be a $2L$-dimensional trellis code. In practice, the $N$ codes are chosen such that they may be encoded and decoded by the same codec [2].

To determine the values of the fading region boundaries (or thresholds) $\gamma_n$, we need to determine the BER performance of each code. When code $n$ is operating on an AWGN channel of CNR $\gamma$, the BER-CNR relationship for varying $\gamma$ may be approximated by the expression

$$\mathrm{BER} \approx a_n \exp\!\left(-\frac{b_n \gamma}{M_n}\right), \qquad (2)$$

where $a_n$ ($> 0$) and $b_n$ ($> 0$) are constants which depend only on the weight distribution of the code [2]. These constants can be found for any given code by least-squares curve fitting of data from AWGN channel simulations to (2). The fitting must be done separately for each code in the set.

Plots of BER found in the literature indicate that the approximation in (2) is accurate for any CNR $\gamma$ resulting in $\mathrm{BER} \lesssim 10^{-1}$ (see Figure 1 for an example). Unfortunately, for the minimum value $\gamma = 0$ the approximation reduces to $\mathrm{BER} \approx a_n$. Since $a_n$ can be larger than one, (2) may be of little use for low CNRs. This is not a problem when we only want to approximate the BER at moderate-to-high CNRs, as was done in [2]. However, in this paper we need to approximate the BER for any CNR $\gamma \ge 0$. We will therefore use the following BER expression for code $n$:

$$\mathrm{BER}_n = \begin{cases} a_n \exp\!\left(-\dfrac{b_n \gamma}{M_n}\right), & \gamma \ge \gamma_n^*, \\[6pt] \dfrac{1}{2}, & \gamma < \gamma_n^*. \end{cases} \qquad (3)$$

Here, the boundary $\gamma_n^* = \ln(2a_n) M_n / b_n$ is the smallest CNR such that the BER is no larger than 0.5. The boundary was obtained by assuming equality in (2), setting BER = 0.5, and solving for $\gamma$.

For a true BER between $10^{-1}$ and 0.5, the exponential expression in (3) tends to produce a larger value than the true BER. This assumes that the coded communication system manages to maintain synchronization. In practice, it is difficult to maintain synchronization for a very high BER, so the approximation BER = 0.5 may be close to the true BER of a real system. If the coded system should exhibit a BER > 0.5 for a very low CNR, all decoded information bits may be flipped to achieve a BER < 0.5. Hence, 0.5 is a reasonable upper bound on the BER.

Figure 1 The boxes are BER estimates generated by software simulation and the curves are estimates obtained from (3). The labels indicate the number of symbols in the QAM signal constellations utilized by the 4-dimensional trellis codes. [Plot: $\log_{10}$ BER from −4 to −0.5 versus CNR in dB from 5 to 30; curves labelled M = 4, 8, 16, 32, 64, 128, 256, 512.]

Assume a target $\mathrm{BER}_0$ such that $\gamma_n > \gamma_n^*$. Setting $\mathrm{BER}_n$ equal to $\mathrm{BER}_0$ in (3), the thresholds are given by [2]

$$\gamma_n = \frac{M_n K_n}{b_n}, \quad n = 1, 2, \ldots, N, \qquad \gamma_{N+1} = \infty, \qquad (4)$$

where $K_n = -\ln(\mathrm{BER}_0/a_n)$. The probability that $\gamma$ falls in fading region $n$, $P_n = P(\gamma_n \le \gamma < \gamma_{n+1})$, is given by [4, Eq. (10)]

$$P_n = \frac{\Gamma\!\left(H, H\gamma_n/\bar\gamma\right) - \Gamma\!\left(H, H\gamma_{n+1}/\bar\gamma\right)}{(H-1)!}, \qquad (5)$$

where

$$\Gamma(\upsilon, \mu) = \int_\mu^\infty t^{\upsilon-1} e^{-t}\, dt \qquad (6)$$

is the complementary incomplete gamma function [6, Eq. (8.350.2)]. Since $H$ is an integer in (5), the function may be calculated using [6, Eq. (8.352.2)].

III. BER Degradation

This section determines the BER degradation due to nonzero feedback delay and nonzero terminal speed. It is assumed that the communication system utilizes a set of $N$ trellis codes with known parameters $a_n$ and $b_n$.

Let the total feedback delay, $\tau$ [s], be the time between the moment the mobile terminal acquires a set of $L$ modulation symbols and the moment the stationary transmitter activates a new code. The total feedback delay is the sum of three delays:

i) the processing time needed by the terminal to estimate the instantaneous CNR $\gamma$ and to determine in which fading region $n$ the CNR falls,

ii) the time needed to feed back the region index $n$ to the transmitter, and

iii) the processing time needed by the transmitter to activate code $n$.

In a real system, the processing delay i) depends on the technique used to estimate the instantaneous received CNR, whereas the processing delay iii) depends on the encoder complexity. The distance between the stationary transceiver and the mobile terminal is assumed to be no more than a few hundred meters. The transmission delay ii) is therefore mainly determined by the communication protocols.

The size of the signal constellation $M_n = M_n(\gamma)$ at time index $t$ is a function of the instantaneous received CNR $\gamma$. However, the constellation is used at time $t + \tau$, when $\gamma$ has changed to $\gamma_\tau$. Consequently, while the CNR $\gamma$ falls in some fading region $n$, i.e. $\gamma_n \le \gamma < \gamma_{n+1}$, the CNR $\gamma_\tau$ may fall outside this region. Substituting $\gamma_\tau$ for $\gamma$ in (3), we can write the BER as a function of $\gamma_\tau$ for a given $\gamma$:

$$\mathrm{BER}_n(\gamma_\tau \mid \gamma) = \begin{cases} a_n \exp\!\left(-\dfrac{b_n \gamma_\tau}{M_n(\gamma)}\right), & \gamma_\tau \ge \gamma_n^*, \\[6pt] \dfrac{1}{2}, & \gamma_\tau < \gamma_n^*. \end{cases} \qquad (7)$$

The average BER for $\gamma$ in fading region $n$ is now given by

$$\overline{\mathrm{BER}}_n^{\,\tau} = \int_{\gamma_n}^{\gamma_{n+1}} \left\{ \int_0^\infty \mathrm{BER}_n(\gamma_\tau \mid \gamma)\, p_{\gamma_\tau|\gamma}(\gamma_\tau \mid \gamma)\, d\gamma_\tau \right\} p_\gamma(\gamma)\, d\gamma, \qquad (8)$$

where $p_\gamma(\gamma)$ is given by (1). Furthermore, $p_{\gamma_\tau|\gamma}(\gamma_\tau \mid \gamma)$ is the pdf of $\gamma_\tau$ conditioned on $\gamma$ [4],

$$p_{\gamma_\tau|\gamma}(\gamma_\tau \mid \gamma) = \frac{H}{(1-\rho)\bar\gamma} \left(\frac{\gamma_\tau}{\rho\gamma}\right)^{(H-1)/2} I_{H-1}\!\left(\frac{2H\sqrt{\rho\gamma\gamma_\tau}}{(1-\rho)\bar\gamma}\right) \exp\!\left(-\frac{H(\rho\gamma + \gamma_\tau)}{(1-\rho)\bar\gamma}\right). \qquad (9)$$

The function $I_{H-1}(\cdot)$ in (9) is the $(H-1)$th-order modified Bessel function of the first kind [7, Ch. 9]. The pdf also contains the channel power correlation coefficient $\rho$ at lag $\tau$. It is shown in Appendix A that $\rho$ is given by the square of the zeroth-order Bessel function of the first kind [7, Ch. 9],

$$\rho = J_0^2(2\pi f_D \tau), \qquad (10)$$

for any number of receive antennas. Here, $f_D = \upsilon/\lambda$ [Hz] is the maximum Doppler frequency shift defined by the terminal speed $\upsilon$ [m/s] and the wavelength $\lambda$ [m] of the carrier.

Using (7), the average BER in fading region $n$ given by (8) can be rewritten as the difference between two double integrals,

$$\overline{\mathrm{BER}}_n^{\,\tau} = I_1(n) - I_2(n),$$

where

$$I_1(n) \stackrel{\text{def}}{=} \int_{\gamma_n}^{\gamma_{n+1}} \left\{ \int_0^\infty a_n \exp\!\left(-\frac{b_n \gamma_\tau}{M_n}\right) p_{\gamma_\tau|\gamma}(\gamma_\tau \mid \gamma)\, d\gamma_\tau \right\} p_\gamma(\gamma)\, d\gamma \qquad (11)$$

and

$$I_2(n) \stackrel{\text{def}}{=} \int_{\gamma_n}^{\gamma_{n+1}} \left\{ \int_0^{\gamma_n^*} \left[ a_n \exp\!\left(-\frac{b_n \gamma_\tau}{M_n}\right) - \frac{1}{2} \right] p_{\gamma_\tau|\gamma}(\gamma_\tau \mid \gamma)\, d\gamma_\tau \right\} p_\gamma(\gamma)\, d\gamma. \qquad (12)$$

The double integral $I_2(n)$ is zero for $\gamma_n^* = 0$, i.e. when the parameters $a_n$ and $b_n$ result in good BER approximations for any $\gamma_\tau \ge 0$. Hence, $I_2(n)$ may be viewed as a "correction term" needed when $a_n$ and $b_n$ are only useful for $\gamma_\tau \ge \gamma_n^* > 0$.

It is shown in Appendix B that

$$I_1(n) = \frac{a_n}{(H-1)!} \left(\frac{H}{\bar\gamma}\right)^{H} \frac{\Gamma(H, \beta_n \gamma_n) - \Gamma(H, \beta_n \gamma_{n+1})}{(\omega_n)^H}, \qquad (13)$$

where

$$\beta_n = \frac{H}{\bar\gamma} + \frac{H\rho b_n}{HM_n + \bar\gamma(1-\rho) b_n} \qquad (14)$$

and

$$\omega_n = \frac{H}{\bar\gamma} + \frac{b_n}{M_n}. \qquad (15)$$

From Appendix C, we have

$$I_2(n) = S(a_n, b_n) - S(\tfrac{1}{2}, 0) \qquad (16)$$

for

$$S(a_n, b_n) \stackrel{\text{def}}{=} \frac{a_n (1-\rho)^H}{(H-1)!} \sum_{j=0}^{\infty} \frac{\rho^j}{(j+H-1)!\, j!} \left[\frac{HM_n}{b_n(1-\rho)\bar\gamma + HM_n}\right]^{j+H} \gamma_{\mathrm{inc}}\!\left(H+j,\ \left[\frac{b_n}{M_n} + \frac{H}{(1-\rho)\bar\gamma}\right]\gamma_n^*\right) \left[\Gamma\!\left(H+j,\ \frac{H\gamma_n}{(1-\rho)\bar\gamma}\right) - \Gamma\!\left(H+j,\ \frac{H\gamma_{n+1}}{(1-\rho)\bar\gamma}\right)\right], \qquad (17)$$

where

$$\gamma_{\mathrm{inc}}(\upsilon, \mu) = \int_0^\mu t^{\upsilon-1} e^{-t}\, dt \qquad (18)$$

is the incomplete gamma function [6, Eq. (8.350.1)]. Since $H + j$ is an integer in (17), the function may be calculated using [6, Eq. (8.352.1)].

The average BER over all $N$ codes is denoted by $\overline{\mathrm{BER}}^{\,\tau}$. It equals the expected number of information bits in error per modulation symbol divided by the expected number of transmitted information bits per modulation symbol,

$$\overline{\mathrm{BER}}^{\,\tau} = \frac{\sum_{n=1}^{N} i_n \overline{\mathrm{BER}}_n^{\,\tau}}{\sum_{n=1}^{N} i_n P_n} = \frac{\sum_{n=1}^{N} i_n \left[I_1(n) - I_2(n)\right]}{\sum_{n=1}^{N} i_n P_n}. \qquad (19)$$

Here, $i_n = \log_2(M_n) - 1/L$ is the number of information bits per modulation symbol and $P_n$ is defined by (5). In practice, the double integral $I_2(n)$ can only be approximated, since the sum in (17) must be terminated after a finite number of terms. Each term in the sum is positive, so the termination causes the expression in (19) to become an upper bound on the BER. The tightness of the bound improves as the number of terms is increased. We will use the first ten terms in the sum of (17) in the next section.
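Equations (4), (5), (13)–(17) and (19) can be assembled into a short numerical routine. The following Python sketch is one possible implementation, not the authors' code, and all function names are hypothetical. It makes these assumptions:

- $H$ is an integer and $0 \le \rho < 1$.
- $\bar\gamma = H\bar\gamma_h$.
- The codec is described by the Table 1 parameters of Section IV.

Both incomplete gamma functions are normalized by $(H+j-1)!$ through the finite sums behind [6, Eqs. (8.352.1), (8.352.2)]. This turns the coefficients of (17) into binomial weights $\binom{H+j-1}{j}(1-\rho)^H\rho^j$. The sum is truncated after `terms` terms (ten, as in the text).

```python
import math

# Table 1: constellation sizes M_n and least-squares fits a_n, b_n.
M = [4, 8, 16, 32, 64, 128, 256, 512]
A = [188.7471, 288.8051, 161.6898, 142.6920, 126.2118, 121.5189, 79.8360, 34.6128]
B = [9.8182, 6.8792, 7.8862, 7.8264, 7.4931, 7.7013, 7.1450, 6.9190]


def Q(n, x):
    """Gamma(n, x)/(n-1)! for integer n >= 1, via e^{-x} * sum_{k<n} x^k/k!."""
    if math.isinf(x):
        return 0.0
    term = total = math.exp(-x)
    for k in range(1, n):
        term *= x / k
        total += term
    return total


def thresholds(ber0=1e-4):
    """Region boundaries gamma_1..gamma_N from (4) (linear CNR), plus inf."""
    return [m * -math.log(ber0 / a) / b for m, a, b in zip(M, A, B)] + [math.inf]


def s_term(a, b, m, g_star, lo, hi, H, rho, gbar, terms):
    """S(a, b) from (17), truncated to `terms` terms."""
    c = H / ((1.0 - rho) * gbar)
    r = H * m / (b * (1.0 - rho) * gbar + H * m)
    x_star = (b / m + c) * g_star
    total = 0.0
    for j in range(terms):
        weight = math.comb(H + j - 1, j) * (1.0 - rho) ** H * rho ** j
        lower = 1.0 - Q(H + j, x_star)                  # gamma_inc / (H+j-1)!
        window = Q(H + j, c * lo) - Q(H + j, c * hi)    # Gamma difference / (H+j-1)!
        total += weight * r ** (j + H) * lower * window
    return a * total


def average_ber(rho, H, gbar_h=100.0, L=2, ber0=1e-4, terms=10):
    """Upper bound on the average BER (19) for 0 <= rho < 1."""
    gbar = H * gbar_h                                   # mean CNR after MRC
    g = thresholds(ber0)
    num = den = 0.0
    for n in range(len(M)):
        a, b, m = A[n], B[n], M[n]
        lo, hi = g[n], g[n + 1]
        g_star = max(0.0, math.log(2.0 * a) * m / b)    # boundary below (3)
        p_n = Q(H, H * lo / gbar) - Q(H, H * hi / gbar)                   # (5)
        beta = H / gbar + H * rho * b / (H * m + gbar * (1.0 - rho) * b)  # (14)
        omega = H / gbar + b / m                                           # (15)
        i1 = a * (H / (gbar * omega)) ** H * (Q(H, beta * lo) - Q(H, beta * hi))  # (13)
        i2 = (s_term(a, b, m, g_star, lo, hi, H, rho, gbar, terms)
              - s_term(0.5, 0.0, m, g_star, lo, hi, H, rho, gbar, terms))  # (16)
        bits = math.log2(m) - 1.0 / L                                      # i_n
        num += bits * (i1 - i2)
        den += bits * p_n
    return num / den


for H in (1, 2, 4):
    print(H, [round(math.log10(average_ber(r, H)), 2) for r in (0.90, 0.95, 0.99)])
```

The usage loop evaluates (19) with $\bar\gamma_h = 20$ dB and $L = 2$, the setting of Figure 2. For each $j$, the term of $S(a_n, b_n)$ minus the term of $S(\tfrac12, 0)$ is non-negative, since $a_n e^{-b_n\gamma_\tau/M_n} > \tfrac12$ on $[0, \gamma_n^*)$. Truncation therefore can only loosen the upper bound.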

IV. Evaluation of Example Codec

An adaptive codec with eight 4-dimensional trellis codes was described in [2]. The individual codes' BER performances on an AWGN channel were simulated for various CNRs. The obtained BER points (represented by boxes) are shown in Figure 1. Curve fitting with the least squares method was used to obtain the parameters $a_n$ and $b_n$ listed in Table 1. The corresponding BER approximations (3) are plotted in Figure 1. The expression in (4) was used to determine the tabulated thresholds $\gamma_n$ (rounded to one decimal digit) for target $\mathrm{BER}_0 = 10^{-4}$.²⁾

2) The thresholds in Table 1 are larger than the thresholds in [2, Table I] because we have reduced the target $\mathrm{BER}_0$ from $10^{-3}$ to $10^{-4}$. Furthermore, the path memory length of the Viterbi decoder was set to 9 in [2], while a path memory length of 16 was used in this paper.

Table 1 Parameters $a_n$ and $b_n$ for example codec and calculated thresholds $\gamma_n$ [dB] for target $\mathrm{BER}_0 = 10^{-4}$

  n    M_n      a_n        b_n      γ_n [dB] ≈
  1      4    188.7471   9.8182      7.7
  2      8    288.8051   6.8792     12.4
  3     16    161.6898   7.8862     14.6
  4     32    142.6920   7.8264     17.6
  5     64    126.2118   7.4931     20.8
  6    128    121.5189   7.7013     23.7
  7    256     79.8360   7.1450     26.9
  8    512     34.6128   6.9190     29.7

Using the thresholds $\gamma_n$, setting $L = 2$, and $\bar\gamma_h = 20$ dB, the base-10 logarithm of the average BER (19) is plotted as a function of the correlation coefficient $\rho$ in Figure 2 for $H \in \{1, 2, 4\}$ receive antennas. We observe that because the thresholds are chosen according to (4), the instantaneous BER is smaller than the target $\mathrm{BER}_0$ for $\gamma_n < \gamma < \gamma_{n+1}$ and $\rho$ close to one. As a result, the average BER will be below $\mathrm{BER}_0$ for large $\rho$ (see Figure 2).

Figure 2 Base-10 logarithm of average BER for correlation coefficient $0.8 \le \rho < 1$, target $\mathrm{BER}_0 = 10^{-4}$, and average antenna branch CNR $\bar\gamma_h = 20$ dB. [Plot: $\log_{10}$ BER from −5 to −1.5 versus correlation coefficient from 0.80 to 0.98; curves for H = 1, 2, 4.]

Let $\tau_{\max}$ denote the maximum total delay, or maximum tolerable delay, for a given target $\mathrm{BER}_0$. The expression (10) for $\rho$ can be used to determine the maximum tolerable delay $\tau_{\max}$ for different Doppler shifts $f_D$ and targets $\mathrm{BER}_0$. The minimum values of $\rho$ (rounded to three decimal digits) needed to achieve $\overline{\mathrm{BER}}^{\,\tau} \le \mathrm{BER}_0 = 10^{-4}$ are listed in Table 2 for $H \in \{1, 2, 4\}$.

Let the carrier frequency be $f = 1900$ MHz and use the value $c = 3 \cdot 10^8$ m/s for the speed of light. The wavelength of the carrier frequency is then $\lambda = c/f = 3/19$ ($\approx 0.16$) m. A mobile terminal with (pedestrian) speed $\upsilon = 1$ m/s then has Doppler shift $f_D = \upsilon/\lambda = 19/3$ ($\approx 6.33$) Hz. The corresponding maximum tolerable delays $\tau_{\max}$ (rounded to one decimal digit) are listed in Table 2.

Table 2 Minimum correlation coefficient $\rho$ needed to achieve $\overline{\mathrm{BER}}^{\,\tau} \le 10^{-4}$ for average antenna branch CNR $\bar\gamma_h = 20$ dB and different numbers $H$ of receive antennas. Maximum tolerable delay $\tau_{\max}$ [ms] and number of modulation symbols transmitted during $\tau_{\max}$ for carrier frequency 1900 MHz, bandwidth 400 kHz, and terminal speed $\upsilon = 1$ m/s

  H    min. ρ    τ_max [ms] ≈    τ_max/T [symb.] ≈
  1    0.997         2.7              1,080
  2    0.991         3.4              1,360
  4    0.963         6.9              2,760

To see that the fading is nearly constant over many hundred modulation symbols for communications at pedestrian speed, we calculate the number of symbols transmitted during the maximum tolerable delay $\tau_{\max}$. We first need to determine a bandwidth $B$ for which it is reasonable to assume that the fading is frequency-flat. The (rms) delay spread, $\sigma_d$ [s], measures how much a signal component may be delayed during transmission [8, Sec. 2.2.2]. The reciprocal of the delay spread provides a measure of the width of the band of frequencies which are similarly affected by the channel response. The channel is therefore approximately frequency-flat if the bandwidth $B \ll 1/\sigma_d$.
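Equation (10) and the arithmetic of this section can be checked with a few lines of Python. The sketch below uses hypothetical function names. It evaluates $\rho = J_0^2(2\pi f_D\tau)$ from the power series of $J_0$. It then inverts for $\tau_{\max}$ by bisection on the interval up to the first zero of $J_0$, where $\rho$ falls monotonically from 1 to 0. The carrier frequency, speed and bandwidth are the values stated above and in Table 2. The rows use the minimum $\rho$ values listed there.

```python
import math


def j0(x, terms=30):
    """Zeroth-order Bessel function of the first kind from its power series;
    adequate for the small arguments (x < 2.5) used here."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= -(x / 2.0) ** 2 / (k * k)
        total += term
    return total


def power_correlation(tau, f_d):
    """Channel power correlation coefficient (10): rho = J0^2(2*pi*f_D*tau)."""
    return j0(2.0 * math.pi * f_d * tau) ** 2


def tau_max(rho_min, f_d, iters=100):
    """Largest delay with rho(tau) >= rho_min, by bisection on [0, first zero
    of J0], where rho decreases monotonically from 1 to 0."""
    lo, hi = 0.0, 2.404825557695773 / (2.0 * math.pi * f_d)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if power_correlation(mid, f_d) >= rho_min:
            lo = mid
        else:
            hi = mid
    return lo


c, f_carrier, v, bandwidth = 3e8, 1900e6, 1.0, 400e3
wavelength = c / f_carrier          # 3/19 m
f_d = v / wavelength                # 19/3 Hz
T = 1.0 / bandwidth                 # 2.5 microseconds per symbol at the Nyquist rate
for H, rho_min in [(1, 0.997), (2, 0.991), (4, 0.963)]:
    t = tau_max(rho_min, f_d)
    print(f"H={H}: tau_max = {1e3 * t:.2f} ms, tau_max/T = {t / T:.0f} symbols")
```

With these inputs the $H = 2$ and $H = 4$ rows come out at about 3.4 ms and 6.9 ms, in line with Table 2. The $H = 1$ row ($\rho = 0.997$) gives about 1.9 ms rather than the tabulated 2.7 ms under (10) as printed, so that entry may be worth rechecking.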

At 1900 MHz, the multipath delay spread is up to $\sigma_d = 250$ ns for a cordless phone in indoor and outdoor environments [9]. Hence, we may assume that a channel with a bandwidth of up to at least $B = 400$ kHz has frequency-flat fading. The time needed to transmit one symbol at the Nyquist signaling rate is $T = 1/B = 2.5\ \mu$s, so $\tau_{\max}/T$ symbols are transmitted during the maximum tolerable delay. Using the rounded values of $\tau_{\max}$ in Table 2, we obtain the $\tau_{\max}/T$ values listed in the rightmost column of Table 2 for terminal speed $\upsilon = 1$ m/s and $H \in \{1, 2, 4\}$.

V. Conclusion

It has been shown (see Figure 2) that the BER performance may degrade considerably as $\rho$ decreases. For a given carrier frequency, this corresponds to increasing the terminal speed or the feedback delay. However, the degradation can be mitigated by the use of MRC antenna diversity. Still, our results indicate that adaptive coded modulation may be best suited for systems with moderate mobility requirements, with terminals moving at pedestrian speed.

Appendix A – Calculation of ρ

In this appendix we show that the channel power correlation coefficient $\rho$ is given by the expression in (10). The instantaneous received CNR on the channel may be expressed as $\gamma = \alpha^2 \cdot K$, where $\alpha^2$ is the channel power gain and $K = S/(N_0 B)$. Since $E[\gamma] = \sum_{h=1}^{H} E[\gamma_h] = HK\Omega$, we have $E[\alpha^2] = H\Omega$. Assume that $\alpha^2$ is the power gain at some time $t$ and let $\alpha_\tau^2$ be the power gain at time $t + \tau$ for $\tau > 0$. The correlation coefficient $\rho$ between $\alpha^2$ and $\alpha_\tau^2$ is then given by

$$\rho = \frac{\mathrm{cov}(\alpha^2, \alpha_\tau^2)}{\sigma_{\alpha^2}\, \sigma_{\alpha_\tau^2}} = \frac{E[\alpha^2 \alpha_\tau^2] - E[\alpha^2]\, E[\alpha_\tau^2]}{\sigma_{\alpha^2}\, \sigma_{\alpha_\tau^2}}. \qquad (20)$$

The channel gain $\alpha^2$ is gamma distributed [8, p. 48]. Hence, assuming that the channel power gains $\alpha^2$ and $\alpha_\tau^2$ have the same expectations and standard deviations, we have $E[\alpha^2]E[\alpha_\tau^2] = (H\Omega)^2$ and $\sigma_{\alpha^2}\sigma_{\alpha_\tau^2} = (E[\alpha^2])^2/H = H\Omega^2$ in (20).

To calculate $E[\alpha^2 \alpha_\tau^2]$, we first compare two different expressions for the instantaneous received CNR $\gamma$. When the communication channel is viewed as a Rayleigh fading channel with an $H$-branch MRC combiner, then

$$\gamma = \sum_{h=1}^{H} \gamma_h = K \sum_{h=1}^{H} \alpha_h^2,$$

where $\alpha_h^2$ is the power gain on the $h$th antenna branch. Since we also have $\gamma = \alpha^2 \cdot K$, it follows that $\alpha^2 = \sum_{h=1}^{H} \alpha_h^2$ and we can write

$$E[\alpha^2 \alpha_\tau^2] = E\!\left[\sum_{h=1}^{H} \alpha_h^2 \sum_{i=1}^{H} \alpha_{i,\tau}^2\right] = \sum_{h=1}^{H} E\!\left[\alpha_h^2 \alpha_{h,\tau}^2\right] + \sum_{h=1}^{H} \sum_{i \ne h} E\!\left[\alpha_h^2 \alpha_{i,\tau}^2\right]. \qquad (21)$$

Furthermore, because the signals on different antenna branches ($h \ne i$) are statistically independent, the covariance

$$\mathrm{cov}(\alpha_h^2, \alpha_{i,\tau}^2) = E[\alpha_h^2 \alpha_{i,\tau}^2] - E[\alpha_h^2]\, E[\alpha_{i,\tau}^2] = 0,$$

or equivalently, $E[\alpha_h^2 \alpha_{i,\tau}^2] = E[\alpha_h^2]\, E[\alpha_{i,\tau}^2] = \Omega^2$. The expression in (21) is then equal to

$$E[\alpha^2 \alpha_\tau^2] = H\, E[\alpha_h^2 \alpha_{h,\tau}^2] + H(H-1)\,\Omega^2,$$

and the correlation coefficient in (20) reduces to

$$\rho = \frac{E[\alpha_h^2 \alpha_{h,\tau}^2] - \Omega^2}{\Omega^2}. \qquad (22)$$

Observe that (22) is independent of the number of receive antennas $H$. In fact, (22) defines the correlation coefficient for a Rayleigh fading channel (without MRC). It is shown in [8, Eq. (2.68)] that the numerator in (22) is equal to $\Omega^2 J_0^2(2\pi f_D \tau)$. As a result, $\rho$ is given by the expression in (10).

Appendix B – Evaluation of $I_1(n)$

In the following we calculate the double integral in (11). For the inner integral, $\mathrm{BER}_n$ in (3) is fixed since the CNR $\gamma$ is fixed. It follows from (3) that

$$M_n = -\frac{b_n \gamma}{\ln(\mathrm{BER}_n/a_n)}.$$

Using this expression for $M_n$ and setting $D_n = -\ln(\mathrm{BER}_n/a_n)$, the inner integral in (11) is equal to

$$I_1(n, \gamma) \stackrel{\text{def}}{=} a_n \int_0^\infty \frac{H}{(1-\rho)\bar\gamma} \left(\frac{\gamma_\tau}{\gamma\rho}\right)^{(H-1)/2} \exp\!\left(-\frac{H(\rho\gamma + \gamma_\tau)}{(1-\rho)\bar\gamma} - \frac{D_n \gamma_\tau}{\gamma}\right) I_{H-1}\!\left(\frac{2H\sqrt{\rho\gamma\gamma_\tau}}{\bar\gamma(1-\rho)}\right) d\gamma_\tau. \qquad (23)$$
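The inner integral (23) is the Laplace transform of the conditional pdf (9), evaluated at $D_n/\gamma$. Its closed form (25), obtained below via the generalized Marcum Q-function, can therefore be checked numerically. The following Python sketch is an illustration only and the function names are hypothetical. It sums the Bessel series quoted in Appendix C, applies a midpoint rule over $\gamma_\tau$, and compares the result with (25). The values $\gamma = 50$, $D_n = 0.5$, $\rho = 0.9$ and $\bar\gamma = 100H$ are chosen arbitrarily.

```python
import math


def bessel_i(nu, z, max_terms=400):
    """Modified Bessel function I_nu(z) for integer nu >= 0, from its power series."""
    term = (z / 2.0) ** nu / math.factorial(nu)
    total = term
    for j in range(1, max_terms):
        term *= (z / 2.0) ** 2 / (j * (j + nu))
        total += term
        if term <= 1e-17 * total:
            break
    return total


def inner_integral_23(gamma, d_n, H, rho, gbar, a_n=1.0, steps=10_000):
    """Midpoint-rule evaluation of the inner integral (23) over gamma_tau."""
    c = H / ((1.0 - rho) * gbar)
    upper = 20.0 * (rho * gamma + (1.0 - rho) * gbar)   # far beyond the conditional mean
    h = upper / steps
    total = 0.0
    for k in range(steps):
        gt = (k + 0.5) * h
        pdf = (c * (gt / (rho * gamma)) ** ((H - 1) / 2.0)            # pdf (9)
               * math.exp(-c * (rho * gamma + gt))
               * bessel_i(H - 1, 2.0 * c * math.sqrt(rho * gamma * gt)))
        total += a_n * math.exp(-d_n * gt / gamma) * pdf * h
    return total


def closed_form_25(gamma, d_n, H, rho, gbar, a_n=1.0):
    """Closed form (25) of the same inner integral."""
    denom = H * gamma + gbar * (1.0 - rho) * d_n
    return a_n * (H * gamma / denom) ** H * math.exp(-rho * d_n * H * gamma / denom)


for H in (1, 2):
    gbar = 100.0 * H
    print(H, inner_integral_23(50.0, 0.5, H, 0.9, gbar), closed_form_25(50.0, 0.5, H, 0.9, gbar))
```

Setting `d_n = 0` reduces (23) to the total probability of the pdf (9). This corresponds to $Q_H(x, 0) = 1$ in the derivation, and the numerical value should then be 1.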

Introducing the constant

$$x = \frac{\rho H^2 \gamma^2}{\bar\gamma (1-\rho)\left(H\gamma + \bar\gamma(1-\rho) D_n\right)}$$

and making the substitution

$$z = \left(\frac{H}{\bar\gamma(1-\rho)} + \frac{D_n}{\gamma}\right)\gamma_\tau,$$

the integral (23) can be written as

$$I_1(n, \gamma) = a_n \left(\frac{H\gamma}{H\gamma + \bar\gamma(1-\rho) D_n}\right)^{H} \exp\!\left(-\frac{\rho D_n H\gamma}{H\gamma + \bar\gamma(1-\rho) D_n}\right) \int_0^\infty \left(\frac{z}{x}\right)^{(H-1)/2} e^{-z-x}\, I_{H-1}\!\left(2\sqrt{xz}\right) dz. \qquad (24)$$

The value of the integral in (24) is equal to $Q_H(x, 0)$, where $Q_H(\cdot, \cdot)$ is the generalized Marcum Q-function of order $H$ [7, Eq. (11.63)]. Since $Q_H(x, 0) = Q_1(x, 0) = 1$ for all $x$, we have

$$I_1(n, \gamma) = a_n \left(\frac{H\gamma}{H\gamma + \bar\gamma(1-\rho) D_n}\right)^{H} \exp\!\left(-\frac{\rho D_n H\gamma}{H\gamma + \bar\gamma(1-\rho) D_n}\right). \qquad (25)$$

The double integral in (11) can now be written as

$$I_1(n) = F(\gamma_n) - F(\gamma_{n+1}) \qquad (26)$$

for

$$F(\xi) = \int_\xi^\infty I_1(n, \gamma)\, p_\gamma(\gamma)\, d\gamma.$$

To calculate $F(\xi)$, we first observe that $\mathrm{BER}_n$ in (3) is no longer a constant since $\gamma$ varies. Using the connection $D_n = -\ln(\mathrm{BER}_n/a_n) = b_n\gamma/M_n$, it follows from (1) and (25) that

$$F(\xi) = \frac{a_n}{(H-1)!} \left(\frac{H}{\bar\gamma}\right)^{H} \left(\frac{HM_n}{HM_n + \bar\gamma(1-\rho) b_n}\right)^{H} \int_\xi^\infty \gamma^{H-1} \exp(-\beta_n \gamma)\, d\gamma,$$

where $\beta_n$ is defined by (14). Substituting $t = \beta_n \gamma$ and observing that

$$\left(\frac{HM_n}{HM_n + \bar\gamma(1-\rho) b_n}\right)^{H} \left(\frac{1}{\beta_n}\right)^{H} = \left(\frac{1}{\omega_n}\right)^{H}$$

for $\omega_n$ defined by (15), we get

$$F(\xi) = \frac{a_n}{(H-1)!} \left(\frac{H}{\bar\gamma}\right)^{H} \frac{\Gamma(H, \beta_n \xi)}{(\omega_n)^H}, \qquad (27)$$

where $\Gamma(\cdot, \cdot)$ is given by (6). The expression for $I_1(n)$ in (13) is now obtained from (26) and (27).

Appendix C – Evaluation of $I_2(n)$

We shall calculate the double integral $I_2(n)$ defined by (12). We first split the double integral in two to obtain

$$I_2(n) = \int_{\gamma_n}^{\gamma_{n+1}} \left\{ \int_0^{\gamma_n^*} a_n \exp\!\left(-\frac{b_n \gamma_\tau}{M_n}\right) p_{\gamma_\tau|\gamma}(\gamma_\tau \mid \gamma)\, d\gamma_\tau \right\} p_\gamma(\gamma)\, d\gamma \qquad (28)$$

$$\qquad\qquad - \; 2^{-1} \int_{\gamma_n}^{\gamma_{n+1}} \left\{ \int_0^{\gamma_n^*} p_{\gamma_\tau|\gamma}(\gamma_\tau \mid \gamma)\, d\gamma_\tau \right\} p_\gamma(\gamma)\, d\gamma. \qquad (29)$$

The second integral (29) is a special case of the first integral (28) with $a_n = 2^{-1}$ and $b_n = 0$. Hence, we only need to consider the first integral. The pdf $p_{\gamma_\tau|\gamma}(\gamma_\tau \mid \gamma)$ defined by (9) contains the $(H-1)$th-order modified Bessel function of the first kind, defined by [7, Eq. (9.28)]

$$I_\nu(z) = \left(\frac{z}{2}\right)^{\nu} \sum_{j=0}^{\infty} \frac{\left(\frac{1}{4} z^2\right)^j}{(j+\nu)!\, j!}$$

for integer $\nu$. Using this definition, the inner integral in (28) is equal to

$$a_n \left[\frac{HM_n}{b_n(1-\rho)\bar\gamma + HM_n}\right]^{H} \exp\!\left(-\frac{H\rho\gamma}{(1-\rho)\bar\gamma}\right) \sum_{j=0}^{\infty} \frac{1}{(j+H-1)!\, j!} \left[\frac{H^2 M_n \rho\gamma}{(1-\rho)\bar\gamma \left\{b_n(1-\rho)\bar\gamma + HM_n\right\}}\right]^{j} \gamma_{\mathrm{inc}}\!\left(H+j,\ \left[\frac{b_n}{M_n} + \frac{H}{(1-\rho)\bar\gamma}\right]\gamma_n^*\right),$$

where $\gamma_{\mathrm{inc}}(\cdot, \cdot)$ is defined by (18). The outer integral in (28) is then equal to (17). Since the double integral in (29) is a special case of the double integral in (28), the "correction term" $I_2(n)$ is now given by (16).

References

1 Hole, K J, Øien, G E. Spectral efficiency of adaptive coded modulation in urban microcellular networks. IEEE Trans. Veh. Technol., 50 (1), 205–222, 2001.

2 Hole, K J, Holm, H, Øien, G E. Adaptive multidimensional coded modulation over flat fading channels. IEEE J. Select. Areas Commun., 18 (7), 1153–1158, 2000.

3 Goeckel, D L. Adaptive coding for time-varying channels using outdated fading estimates. IEEE Trans. Commun., 47 (6), 844–855, 1999.

4 Alouini, M-S, Goldsmith, A J. Adaptive M-QAM modulation over Nakagami fading channels. Proc. 6th Communications Theory Mini-Conference (CTMC VI) in conjunction with IEEE Global Communications Conference (GLOBECOM’97), Phoenix, Arizona, Nov. 1997, 218–223.

5 Jakes, W C (ed.). Microwave Mobile Communications. Piscataway, NJ, IEEE Press, second ed., 1994.

6 Gradshteyn, I S, Ryzhik, I M. Table of Integrals, Series, and Products. San Diego, CA, Academic Press, fifth ed., 1994.

7 Temme, N M. Special Functions – An Introduction to the Classical Functions of Mathematical Physics. New York, NY, John Wiley, 1996.

8 Stüber, G L. Principles of Mobile Communication. Norwell, MA, Kluwer, 1996.

9 Ue, T et al. Symbol rate and modulation level-controlled adaptive modulation/TDMA/TDD system for high-bit-rate wireless data transmission. IEEE Trans. Veh. Technol., 47 (4), 1134–1147, 1998.

Shannon Mappings for Robust Communication

TOR A. RAMSTAD

Shannon’s geometric interpretation of messages and channel representations in communicating time- discrete, amplitude-continuous source symbols is exploited in this paper. The basic idea is to map what we can call source space symbols into channel space symbols, where the two symbols possibly have different dimensions. If we decrease the dimension through this operation, bandwidth reduction is obtained. If, on the other hand, the dimension is increased when mapping from the source to the channel space, bandwidth expansion results. In practical systems several mappings are applied for different dimension changes depending on the importance of the source symbols as well as the avail- able channel resources. The paper also discusses theoretical limits for Gaussian sources and channels, expressed by OPTA (optimal performance theoretically attainable), and compares practical mapping

Tor A. Ramstad (60) received constructions with these limits. Finally, the paper demonstrates how efficient image communication his Siv.Ing. and Dr.Ing. degrees systems can be designed, and shows that much higher robustness towards channel variations can be in 1968 and 1971, respectively, obtained than is possible for pure digital systems. both from the Norwegian Univer- sity of Science and Technology (NTNU). He has held various positions in the Dep. of Telecom- munications at NTNU, where since 1983 he has been a full 1 Shannon Mappings In this article we will present the general ideas, professor of telecommunica- Modern telecommunications are to a large introduce some simple examples of how to con- tions. In 1982–83 he was a visit- degree based on the works of Nyquist and Shan- struct mappings for special sources and chan- ing associate professor at the University of California, Santa non. The present work is no exception, but it is nels, and show complete systems for image Barbara; he was with the Geor- specifically related to a less known general idea transmission. gia Institute of Technology in presented by Shannon in his famous 1949 paper 1989–90 as a visiting adjunct professor, and again at UCSB [33]. There he suggests mapping time discrete 2 Introduction to the as a visiting professor in 1997– signals from a continuous multidimensional Communication Problem 98. Dr. Ramstad’s research source space to a continuous channel space of The basic objective in efficient communication interests include multirate signal processing, speech and image different dimensionality. 
By this mechanism one of natural signals like speech, images, and video, processing with emphasis on can obtain bandwidth compression, which will can be stated as: optimizing for conveying as image and video communica- increase the number of simultaneous users on a many signals as possible with a specified quality tions, where joint source-chan- nel coding is a central topic physical channel, or bandwidth expansion if it is over a given physical channel. The channel is necessary to increase the received signal fidelity usually band-limited, it has certain power and/or [email protected] due to a noisy channel. amplitude constraints, e.g. due to battery lifetime and regulations for radio transmission, and the When we initially started the work presented in channel will usually distort the signal by inser- this article, we were not aware of Shannon’s tion of different noise types, like thermic noise idea. When we discovered that Shannon had or contamination by crosstalk from other electri- introduced the geometrical mappings in his 1949 cal signals, or linear and nonlinear distortions paper [33], we were pleased and assumed we due to non-ideal behavior of the channel. were on the right track. To honor Shannon, we would like to propose to call the mappings used The exploitation of a given physical channel in this paper Shannon mappings. Shannon did can be influenced by different measures. These not pursue his idea, partly because technology include signal adaptation to the channel charac- was not ready for designing and implementing teristics, driving power, relay amplification/ such systems at the time. Today the situation is regeneration, and optimal receivers. Given the quite different. channel capacity, the number of useful signals that can be transmitted is also influenced by sig- The mapping description is very general and nal compression. 
However, the amount of com- leads to a deeper understanding of the general pression is limited by the unavoidable noise gen- communication problem. One can cast such erated when amplitude-continuous signals are techniques as quantization and modulation into compressed. this framework. The term joint source-channel coding is often encountered in modern literature. Under certain simplifying, but rather realistic The mapping idea can probably incorporate all conditions, the channel capacity can be calcu- the different techniques suggested for source- lated [33]. To approach the channel capacity, channel coding, and it is probably the most gen- which guarantees lossless transmission of multi- eral way of looking at the entire communication level signals, one has to resort to systems of in- problem. finite complexity, which also require infinite

delay. In such a channel, the compression operation can be viewed as a separate problem [32, 34] due to Shannon's separation theorem. To obtain a minimum digital representation, infinite complexity is also required for the codec.

The aim of this paper, however, is to contribute to developing understanding for the possible gains which can be reached by doing joint compression and channel coding, when we constrain the allowed complexity and delay. We stress that very good robustness is obtained for the proposed systems without resorting to explicit error protection.

2.1 Discrete Transmission over Noise-free, Band-limited Channels

Throughout the discussion we assume that all useful signals are either ideally band-limited, or can be made close to that through lowpass filtering without loss of subjective quality. (The passband of the filter can be freely chosen.) This enables us to sample the signal at or above the Nyquist rate without loss of information. If the bandwidth is B, and the minimum necessary sampling frequency is $F_s = 2B$, then the sampled signal can be transmitted over an ideal and noise-free Nyquist channel with bandwidth W = B. That is, the time-discrete signal requires the same bandwidth as the original analog signal.

In practice, we need to relax this assumption somewhat. First of all, some oversampling is necessary to allow for realistic low-pass anti-aliasing filters, and also, any Nyquist channel needs some roll-off factor in order to obtain finite-order filters. We can account for both effects by including an oversampling factor α, implying that the practical channel bandwidth must be W = αB. α is a design parameter that can be made close to 1 by increasing the filter complexities.

2.2 Limits in Signal Communication

The above discussion on error-free transmission specifies the number of samples that can be transmitted for a given channel bandwidth. It does not include limitations induced by noise, which is an omnipresent phenomenon. As we shall see, noise-free transmission would imply infinite channel capacity!

The second important aspect in communications is how signals carry information and how they can be represented by samples that can be conveyed through the physical channel. Actually, from a mathematical point of view any analog signal contains infinite information; it is only when accepting noise contamination that finite information results. However, this is not a disaster, as we shall see that the human observer is quite tolerant towards certain degradations.

2.2.1 Distortion and Rate

Natural signals conveying meaningful information to humans invariably have inherent sample-to-sample dependencies. Such dependencies are usually characterized as redundancy. Furthermore, human perception is not capable of distinguishing signals with certain distortions from the pure original. Even observable distortion is tolerated. Most communication systems rely heavily on the user tolerance to distortions. The amount of imperceptible distortion is often called irrelevancy. The amount of acceptable distortion can be called noise tolerance.

To assess distortion we need distortion measures. Such measures should agree as much as possible with the perception of the receiver. When the receiver of natural signals is a human being, the distortion measure should mimic human perception. This turns out to be rather complex for both visual and auditory perception.

Although it does not reflect human perception well, the most common distortion measure is the mean squared error (mse). If the original and reconstructed signals are given by vectors with components $x_i$, $i = 1, 2, \ldots, M$ and $\hat{x}_i$, $i = 1, 2, \ldots, M$, respectively, the mse is defined by

$$D = \varepsilon\left\{ \frac{1}{M} \sum_{i=1}^{M} \left(x_i - \hat{x}_i\right)^2 \right\}, \qquad (1)$$

where $\varepsilon\{\cdot\}$ is the expectation operator.

If we have a good signal model, we can, according to Shannon, evaluate the necessary rate for obtaining a given distortion. A Gaussian source of independent and identically distributed samples has a simple rate distortion (R-D) function, measured in bits/sample:

$$R = \begin{cases} \dfrac{1}{2}\log_2\left(\dfrac{\sigma_X^2}{\sigma_D^2}\right), & \text{for } \sigma_D^2 \le \sigma_X^2, \\[2ex] 0, & \text{for } \sigma_D^2 > \sigma_X^2, \end{cases} \qquad (2)$$

where $\sigma_X^2$ is the signal power and $\sigma_D^2$ is the accepted distortion. As the rate can never be negative, the limiting case means that the noise is set equal to the signal. That is, it is preferable to transmit nothing for this case. The expression $\sigma_X^2/\sigma_D^2$ is called the signal-to-noise ratio (SNR), and is usually measured in dB.

With correlated sources, the R-D function is more complex. The main point is that due to the interrelation between the samples, a correlated signal can be coded at a lower rate for the same distortion.

Although we measure the rate in bits/sample, the formula does not imply that we must quantize
the signal. This is only a measure of information, that is, it measures the information in a signal when a certain noise level is acceptable.

2.2.2 Channel Capacity

The Nyquist theorem tells us how many samples can be transmitted given the bandwidth. It does not tell us how much information can be conveyed by each sample when the channel has a given signal-to-noise ratio. Shannon's channel capacity provides this measure.

The simplest case is when the channel is memoryless and power limited, that is, the channel power is given by $\sigma_C^2$, and the channel is corrupted by white, Gaussian noise with power $\sigma_N^2$. Then the channel has a capacity measured in bits per sample:

$$C = \frac{1}{2}\log_2\left(1 + \frac{\sigma_C^2}{\sigma_N^2}\right). \qquad (3)$$

$\sigma_C^2/\sigma_N^2$ is called the channel signal-to-noise ratio (CSNR), and is also usually measured in dB.

It is again more complicated to find the capacity of channels with other constraints and memory as well as other noise types. For our purpose this channel will illustrate our main points.

2.2.3 Optimal Performance Theoretically Attainable (OPTA)

OPTA is a measure of the limits for efficient communications. It can be derived from the rate-distortion function and the channel capacity.

The simplest case occurs when we transmit one source sample per channel sample. This means that the source rate has to be equal to the channel capacity

$$R_s = \frac{1}{2}\log_2\left(\frac{\sigma_X^2}{\sigma_D^2}\right) = C = \frac{1}{2}\log_2\left(1 + \frac{\sigma_C^2}{\sigma_N^2}\right), \qquad (4)$$

which implies that

$$\frac{\sigma_X^2}{\sigma_D^2} = 1 + \frac{\sigma_C^2}{\sigma_N^2}. \qquad (5)$$

This means that the signal-to-noise ratio in the received signal is given by the signal-to-noise ratio on the channel plus 1.

An interesting fact is that, unlike almost all other cases, the OPTA can be reached using a very simple implementation.

[Figure 1: Optimal system with no bandwidth alteration; signal x(k) with variance $\sigma_X^2$, channel noise n(k) with variance $\sigma_N^2$, receiver gain β, output y(k)]

A possible system model is given by Figure 1. The signal samples are transmitted without any modification over a Nyquist channel with additive Gaussian noise. We reach the OPTA condition for this system when the signal is also Gaussian by selecting

$$\beta = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}. \qquad (6)$$

Remember that $\sigma_C^2 = \sigma_X^2$ for the above case.

If we try to construct an optimal system based on the separation theorem for this case, we would have to apply an infinite-dimensional vector quantizer for the source coder followed by a channel coder with infinite delay.

In the above case the signal bandwidth and the channel bandwidth are the same. If, for some reason, this is not acceptable, there must be a sample rate change. This is easily understood if we consider rate per time unit rather than rate per symbol. The source produces 2B samples per second, which indicates that its rate is given by $2BR_s$ bits per second. Likewise, the channel can transmit 2W symbols per second, which gives a capacity of 2WC bits per second. Setting these two rates equal, we obtain OPTA for the general case (still assuming the same simple source and channel models):

$$2B \cdot \frac{1}{2}\log_2\left(\frac{\sigma_X^2}{\sigma_D^2}\right) = 2W \cdot \frac{1}{2}\log_2\left(1 + \frac{\sigma_C^2}{\sigma_N^2}\right). \qquad (7)$$

Solving this equation with respect to the source signal-to-noise ratio, we obtain

$$\frac{\sigma_X^2}{\sigma_D^2} = \left(1 + \frac{\sigma_C^2}{\sigma_N^2}\right)^{W/B}. \qquad (8)$$

We observe that the bandwidth relation W/B enters the equation. This bandwidth change can be obtained only by dimension change, where M source samples are combined into K channel samples, with $K/M \approx W/B$.

In Figure 2 the OPTA curves for different sample compression ratios are given. We have included dimension increase in the plots, which gives sample rate expansion, as well as dimension reduction. We mentioned initially that the
ultimate aim is to make rate or bandwidth reduction, but this can be obtained only if the channel has a sufficient CSNR for the required received signal quality. If the channel is not good enough even without compression, sample rate increase is necessary.
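Equation (8), and the claim that the simple system of Figure 1 reaches OPTA, are easy to check numerically. The following is a minimal sketch, assuming an i.i.d. Gaussian source, a memoryless Gaussian channel and W/B = K/M; the function names and the Monte Carlo set-up are illustrative choices, not part of the original work.

```python
import math
import random

def db(ratio):
    """Power ratio to decibels."""
    return 10.0 * math.log10(ratio)

def opta_snr_db(csnr_db, m, k):
    """OPTA source SNR (dB) from Equation (8) for an M:K mapping, taking W/B = K/M."""
    csnr = 10.0 ** (csnr_db / 10.0)
    return (k / m) * db(1.0 + csnr)

def figure1_snr_db(csnr_db, n=100_000, seed=1):
    """Monte Carlo SNR of the uncoded 1:1 system in Figure 1.
    Assumes unit-variance Gaussian samples and sigma_C^2 = sigma_X^2."""
    rng = random.Random(seed)
    var_x = 1.0
    var_n = var_x / 10.0 ** (csnr_db / 10.0)
    beta = var_x / (var_x + var_n)                 # receiver gain, Equation (6)
    err = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta * (x + rng.gauss(0.0, math.sqrt(var_n)))
        err += (x - y) ** 2
    return db(var_x / (err / n))

# OPTA for some of the M:K ratios in Figure 2 (1:2 is expansion, 2:1 is reduction).
for m, k in [(1, 2), (1, 1), (2, 1)]:
    print(f"{m}:{k}", [round(opta_snr_db(c, m, k), 1) for c in (0, 10, 20)])

print(round(opta_snr_db(10, 1, 1), 2), round(figure1_snr_db(10), 2))
```

The last line prints two nearly identical values: with W = B, the uncoded system with the Wiener-type gain of Equation (6) already meets Equation (5).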

Practical signal sources are usually decomposed into sub-sources with different variances and even time variations. Each sub-source would normally require a different SNR, implying that some sub-sources can be compressed, others can be sent unmodified, while some have to be expanded. Actually, some sub-sources are insignificant and can be skipped altogether. This is a natural consequence of the condition that $R_s = 0$ for $\sigma_D^2 = \sigma_X^2$ in Equation 2. The average rate required by all these sub-sources is the interesting number for assessing the efficiency of the system.
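To make the last point concrete, the sketch below applies Equation (2) to a set of independent Gaussian sub-sources of equal size that all accept the same distortion $\sigma_D^2$. The variances are made-up example values. Sub-sources whose variance does not exceed the accepted distortion get zero rate and are skipped, as described above; the quantity of interest is the average over all sub-sources.

```python
import math

def gaussian_rate(var_x, var_d):
    """Equation (2): bits/sample for an i.i.d. Gaussian source with accepted distortion var_d."""
    if var_d >= var_x:
        return 0.0        # send nothing: the reconstruction error equals the signal power
    return 0.5 * math.log2(var_x / var_d)

def average_rate(variances, var_d):
    """Per-sub-source rates and their average, assuming equally sized sub-sources
    that all accept the same distortion var_d."""
    rates = [gaussian_rate(v, var_d) for v in variances]
    return sum(rates) / len(rates), rates

# Hypothetical sub-source variances, e.g. the outputs of a signal decomposition.
avg, rates = average_rate([64.0, 16.0, 4.0, 1.0, 0.25], var_d=1.0)
print(rates)   # [3.0, 2.0, 1.0, 0.0, 0.0]: the last two sub-sources are skipped
print(avg)     # 1.2
```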

This theory gives the limits for any single-source single-channel system, but it does not tell us how to get close to these limits. The rest of this paper will provide examples of possible methods where nonlinear mappings are used for dimension change. In most of these methods no bit representation is used, but it is convenient to compare the results to more traditional results where bits give an intermediate representation form.

[Figure 2: OPTA curves, SNR (dB) versus CSNR (dB), with M/K as parameter (1:2, 1:1, 3:2, 2:1, 3:1, 4:1)]

3 Geometric Description of Dimension Change

Define the input signal as a vector x consisting of M components. The vector can be represented geometrically as a point in a Euclidean space of dimension M. Mathematically we say that $\mathbf{x} \in \mathbb{R}^M$. Then the vector components $x_1, x_2, \ldots, x_M$ are the coordinate values of the vector. A dimension change can be performed by a mapping Ψ from the space of dimension M to a space of dimension K ($\Psi: \mathbb{R}^M \to \mathbb{R}^K$) as

$$\mathbf{y} = \Psi(\mathbf{x}). \qquad (9)$$

The operator Ψ is generally nonlinear, but we can occasionally make use of linear operators.

Mappings without dimension change (K = M) and mappings with dimension increase (K > M) can be made invertible, whereas dimension-reducing mappings (K < M) for all practical purposes involve approximative operations, and are thus not invertible. It is then instructive to split the mapping operation into two parts: an approximation operation q, which maps the complete space onto a subspace, and the dimension-changing operation T, which is an invertible operation:

$$\Psi = T \circ q. \qquad (10)$$

The operations are illustrated in Figure 3. The operation q is, of course, an identity operation when Ψ is invertible.

[Figure 3: Mapping from higher to lower dimension as a two-stage process. q maps $x_1, \ldots, x_M$ to $w_1, \ldots, w_M$ and indicates approximation, while T maps $w_1, \ldots, w_M$ to $y_1, \ldots, y_K$ and represents the mapping between dimensions]

The simplest way of doing dimension reduction is to skip some of the vector components.

Although dimension increase can always be done without approximation, the most common way of increasing the signal dimension is by quantization, which is not an invertible operation.

3.1 Quantization in Source-to-Channel Mappings

Quantization must be applied for making a digital representation from an analog signal. We will discuss the most general form, called vector quantization, and show that the principle is very simple and closely related to the mappings discussed in this paper.
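A minimal sketch of dimension increase by quantization, as just described: a uniform scalar quantizer on a hypothetical range [-x_max, x_max] followed by a plain binary index written as b antipodal channel symbols, so that Ψ maps one sample to b channel samples. The range, the natural binary index and the ±1 signalling are assumptions for illustration. Two inputs that fall in the same cell produce identical channel vectors, which is why the operation is not invertible.

```python
def quantize_to_bits(x, b, x_max=1.0):
    """Uniform b-bit quantizer on [-x_max, x_max] (assumed range).
    Returns the cell index and b antipodal (+1/-1) channel symbols, MSB first."""
    levels = 2 ** b
    step = 2 * x_max / levels
    i = int((x + x_max) // step)
    i = max(0, min(levels - 1, i))                  # clip to the outermost cells
    bits = [(i >> (b - 1 - j)) & 1 for j in range(b)]
    return i, [1.0 if bit else -1.0 for bit in bits]

def reconstruct(i, b, x_max=1.0):
    """Midpoint of cell i: the best guess, since the position inside the cell is lost."""
    step = 2 * x_max / 2 ** b
    return -x_max + (i + 0.5) * step

# 0.30 and 0.40 fall in the same cell, so they are sent as the same channel vector.
print(quantize_to_bits(0.30, 3), quantize_to_bits(0.40, 3), reconstruct(5, 3))
```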

A vector quantizer can be split into an approximation part q and a part which maps to a discrete representation. This is an intermediate representation from which we can map to the channel in many ways.

q is a mapping of $\mathbf{x} \in \mathbb{R}^M$ to a finite set $W = \{\mathbf{w}_0, \ldots, \mathbf{w}_{L-1} \mid \mathbf{w}_i \in \mathbb{R}^M\}$ of representation vectors:

$$q: \mathbb{R}^M \to W. \qquad (11)$$

The mapping q can be defined on a partition of $\mathbb{R}^M$ into L non-overlapping, M-dimensional cells $\{C_i\}$ according to the rule

$$\mathbf{x} \in C_i \Rightarrow q(\mathbf{x}) = \mathbf{w}_i, \quad i = 0, 1, \ldots, L-1, \qquad (12)$$

where $\{C_i\}$ satisfies

$$\bigcup_{i=0}^{L-1} C_i = \mathbb{R}^M \quad \text{and} \quad C_i \cap C_j = \emptyset \ \text{for} \ i \neq j. \qquad (13)$$

In a VQ setting the collection of representation vectors is referred to as the codebook. The cells $C_i$, called Voronoi regions, can be thought of as solid polygons in the M-dimensional space $\mathbb{R}^M$.

The index i identifies that vector uniquely and is therefore an efficient representation of the vector. The vector can be reconstructed approximately as $\mathbf{x} \approx \mathbf{w}_i$ by looking up the representation vector in cell number i. Thus, the rate in bits per sample in this scheme is $b = \log_2(L)/M$. A further bit reduction can be obtained by entropy coding of the indices, provided the symbols have different probabilities.

The next step is to design the channel representation. The simplest example is transmitting the indices bitwise on a binary channel using $K = bM = \log_2(L)$ channel symbols per vector. We then obtain a mapping $\Psi: \mathbb{R}^M \to \mathbb{R}^{\log_2(L)}$. Whether this is a bandwidth expansion or reduction depends on whether $\log_2(L) > M$ or $\log_2(L) < M$.

With reference to Figure 3, the dimension-changing mapping T consists of the index assignment and the channel representation of the index.

But there are many other ways of making channel representations of the index. One could group the bits in the index into groups of, say, $b_1$ bits, and transmit the message by $2^{b_1}$-level signals. This would reduce the bandwidth by a factor of $b_1$. Often we want to protect the bits using forward error correction (FEC), in which case we add parity bits, thus increasing the necessary channel bandwidth.

Let us analyze the simplest system of all, where we use scalar, uniform quantization (M = 1) with b bits per sample and binary signaling on the channel. Uniform quantization requires that all Voronoi regions are of length Δ on the real line. The reconstructed signal can then be written

$$\hat{x} = \Delta \, \mathrm{sign}(x) \sum_{i=1}^{b-1} a_i 2^{-i}, \qquad (14)$$

where the $a_i$'s represent the received bits, with values 0 and 1, and b is the total number of bits when we reserve one bit for the sign of x.

Consider now the result of a bit error when the $a_i$'s are transmitted directly on the binary channel. An error in the least significant bit hardly influences the reconstruction, whereas a bit error in the most significant bit changes the sign of the signal and thus can change the value greatly if the amplitude is large at the same time. This problem relates to what Shannon calls the threshold effect (see below). The advantage of this method is that we require only a modest CSNR to get a very low bit error rate.

If we, on the other hand, transmit the complete index directly using $L = 2^b$ channel levels, a transition from one state to another only makes a change in the least significant bit. This requires a much higher CSNR than for binary transmission, but at the same time guarantees a moderate channel noise influence on every sample.

Even though we can represent the process of going from the real line via quantization to some channel representation in many steps, the two important steps are the approximation step, which maps the real line to a discrete subset on the real line, and the mapping from these discrete values to a discrete channel space of possibly higher dimension.

We claimed earlier that dimension increase does not have to involve any approximation. And as a matter of fact, the quantization step is not necessary. We will return to this shortly.

We will now leave methods based on Shannon's separation theorem, study mappings in greater detail, and present some important examples. We start with a discussion of methods suggested by Shannon.

3.2 Ideas in Shannon's Paper

Shannon, in his 1949 paper [33], suggests a mapping which is reproduced in Figure 4. If used for signal expansion, the length along the curve from some reference point represents the source sample amplitude, while the two coordinates of any point on the curve give the two channel samples, which can be represented as a QAM symbol. Channel noise will disturb the
signal. If, in the receiver, the decision device projects the noise-corrupted sample down to the closest point on the curve, then small noise samples will only move the signal along the curve and thus produce only a small change. If the channel noise reaches a certain level, there is a probability of crossing to a neighboring line. This is what Shannon refers to as the threshold effect. This effect will be encountered for all nonlinear expanding mappings.

[Figure 4: One-to-two-dimensional mapping suggested by Shannon; the marked region indicates the uncertainty due to noise]

Shannon also mentions that the same mapping can be used for signal compression. Then the two coordinates represent the signal vector, and are approximated by the closest point on the curve. The channel symbol can be selected as the distance along the curve from a reference point to the approximation point on the curve.
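The threshold effect can be reproduced with a simple stand-in for the curve in Figure 4. The sketch below is hypothetical and is not Shannon's exact curve: it assumes a source sample x uniformly distributed on [0, 1), a curve made of n_lines horizontal segments in the square [-0.5, 0.5]², with every second segment traversed in the opposite direction (the same device the paper later uses for Figure 15), and a receiver that projects the noisy point onto the closest segment.

```python
import math
import random

# Hypothetical 1:2 "snake" mapping: n_lines horizontal segments in [-0.5, 0.5]^2,
# every second one reversed so that the end of one segment lies next to the start of the next.

def encode(x, n_lines):
    """Map x in [0, 1) to two channel samples (y1, y2) on the snake."""
    pos = x * n_lines                        # distance along the curve
    j = min(int(pos), n_lines - 1)           # segment index
    u = pos - j                              # position within the segment, in [0, 1)
    y1 = (u - 0.5) if j % 2 == 0 else (0.5 - u)
    y2 = (j + 0.5) / n_lines - 0.5           # height of segment j
    return y1, y2

def decode(r1, r2, n_lines):
    """Project the received point onto the closest segment and read off x."""
    j = round((r2 + 0.5) * n_lines - 0.5)    # nearest segment height
    j = max(0, min(n_lines - 1, j))
    u = max(0.0, min(1.0, r1 + 0.5))         # clamp to the segment's extent
    if j % 2 == 1:
        u = 1.0 - u
    return (j + u) / n_lines

def snr_db(csnr_db, n_lines, n=50_000, seed=2):
    """Source SNR (dB) over an AWGN channel; the CSNR is defined per channel sample."""
    rng = random.Random(seed)
    # Average power per channel sample for uniform x: E[y1^2] = 1/12 and
    # E[y2^2] = (n_lines^2 - 1) / (12 n_lines^2), averaged over the two samples.
    var_c = (1 / 12 + (n_lines ** 2 - 1) / (12 * n_lines ** 2)) / 2
    sigma_n = math.sqrt(var_c / 10 ** (csnr_db / 10))
    err = 0.0
    for _ in range(n):
        x = rng.random()
        y1, y2 = encode(x, n_lines)
        x_hat = decode(y1 + rng.gauss(0.0, sigma_n), y2 + rng.gauss(0.0, sigma_n), n_lines)
        err += (x - x_hat) ** 2
    return 10 * math.log10((1 / 12) / (err / n))   # var(x) = 1/12 for uniform x

for csnr in (0, 20, 40):
    print(csnr, round(snr_db(csnr, 1), 1), round(snr_db(csnr, 8), 1))
```

With n_lines = 8, the SNR exceeds the CSNR by roughly 18 dB when the noise is weak, because noise along a segment is divided by the total curve length; at low CSNR the noise regularly moves the received point to a neighboring segment and most of that gain disappears. A single line (n_lines = 1) degrades gracefully but gains only about 3 dB.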

It is easy to generalize this method to any mapping $\Psi: \mathbb{R}^M \to \mathbb{R}^1$ or even $\Psi: \mathbb{R}^M \to \mathbb{R}^K$.

Shannon suggested another algorithm for signal compression. For the special case of 2:1 mapping ($\Psi: \mathbb{R}_+^2 \to \mathbb{R}_+^1$) he first represents the two components in decimal or binary form:

$$x_1 = 0.c_1 c_2 c_3 \ldots, \qquad x_2 = 0.d_1 d_2 d_3 \ldots, \qquad (15)$$

where $c_i$ and $d_i$, $i = 1, 2, 3, \ldots$, are the digits. The one-dimensional signal y is obtained by picking digits from $x_1$ and $x_2$ alternately and inserting them into y as

$$y = 0.c_1 d_1 c_2 d_2 c_3 d_3 \ldots \qquad (16)$$

In this way it is possible to obtain an exact representation of any two real numbers by one real number using an infinite number of digits. In a communication system y can be transmitted as one sample.

It is important to note that if $x_1$ and $x_2$ are both described by b bits, then y needs 2b bits to be exactly represented. Assume we first transmit the original samples using one multilevel symbol for each component. This would require $2^b$ levels for each symbol. If y is transmitted as a multilevel symbol, the required bandwidth is reduced by a factor of 2, but the number of levels now needs to be $2^{2b}$ for exact representation. For transmission over noisy channels this would require approximately twice the signal-to-noise ratio, measured in dB, for transmitting y compared to transmitting the original symbols $x_1$ and $x_2$. So if we define a bandwidth-SNR product (Hz × dB), it remains unchanged for this mapping operation.

[Figure 5: Cantor maps for signal dimension change, representing the mapping given by Equations 15 and 16 (two panels, $x_2$ versus $x_1$ on the unit square)]

We will demonstrate that Shannon-like mappings work quite well for signal expansion, and argue that they may also perform well for compression if the signal is uniformly distributed over the square.

In order to compare this system with others, we cast the algorithm into geometric form. Assume that the two components have a support $x_k \in [0, 1]$, k = 1, 2, which is indicated by the way we have written the numbers in Equations 15 and 16. Also represent the components with
a finite number of bits, which obviously is the same as quantization. Then mappings using 3 and 4 bits are shown in the upper and lower parts of Figure 5, respectively.

The lines connecting the quantization values indicate how y is constructed. The numeric value of y is proportional to the number of nodes traversed from the origin to the actual vector point. From the two figures one easily deduces a way to construct systems with more bits in a systematic way. In the limit when $b \to \infty$, the whole space will be densely populated by points, and thus the accuracy of the compressed signal is also guaranteed.

In the next two sections we will dig deeper into the geometric construction of mappings for dimension change in communication systems.

3.3 Dimension Reduction

Let us discuss further how we can make exact representations of a signal vector by using a vector of a lower dimension. We consider 2:1 mappings which try to represent the 2-D space, which is a plane, by use of a one-dimensional continuous curve. This is somewhat different from the previous example, where we used a discrete subset as an intermediate representation. This new problem is related to what is called space-filling curves, or Peano curves after the inventor.

Assume that the region of support for the 2-D signal is a unit square. Figure 6 shows how the classical Hilbert curve can be constructed to fill the entire space. To be able to represent any point exactly, an infinite number of iterations has to be used.

[Figure 6: The first 4 iterations for constructing a space-filling Hilbert curve]

A possible one-dimensional representation of any point in the two-dimensional space is the distance from the entrance point of the curve to the point on the curve which coincides with the coordinates of the vector. It is obvious that this distance will, in general, be infinite. To cope with this problem for practical use, we must resort to a limited number of iterations, which would mean that we can only represent most points of the space approximately, but the distance to all points will then be finite.

3.3.1 Signal Approximation

We can conclude that it is necessary to make approximations when we lower the signal dimensionality and want to transmit the resulting signal components over a channel with finite signal-to-noise ratio.

The subspace generated by q in Figure 3 must have certain topological properties suited for subsequent mapping to the lower dimension in the operator T.

[Figure 7: Double spiral of Archimedes. The orange spiral can be represented by positive channel symbols, while the grey spiral can be represented by negative channel symbols. The star represents a 2-D input vector. The channel representation is obtained by approximation to the closest point on the spiral, indicated by a circle. The channel adds noise to the channel sample, which in turn causes the representation point to move along the spiral, as indicated by the diamond.]

A very good continuous mapping for the case when the region of support is a disk of radius 1 is a double spiral composed of two spirals of Archimedes. The spirals can be described parametrically as

$$x_1 = 2\Delta \frac{\theta}{2\pi}\cos(\theta), \qquad x_2 = 2\Delta \frac{\theta}{2\pi}\sin(\theta), \qquad (17)$$

and
$$x_1 = 2\Delta \frac{\theta}{2\pi}\cos(\theta + \pi), \qquad x_2 = 2\Delta \frac{\theta}{2\pi}\sin(\theta + \pi). \qquad (18)$$

[Figure 8: Overall system model: the input x is approximated by q, mapped by T to the channel vector y, corrupted by the noise vector n, and mapped back to the source space by R]

θ is the rotation angle to the point $[x_1, x_2]$ relative to the curves' tangent at the origin. $\Delta\theta/\pi$ is the radius to the point, while Δ is the distance between two neighboring crossings of the real axis by either of the two spirals.

A spiral example is shown in Figure 7. Any 2-D vector, such as the value represented by the star, will be approximated to the closest point on one of the spirals, in this case the point indicated by the circle. This is very similar to vector quantization. In VQ the approximations are points in the space, while here it is a continuous curve.

[Figure 9: Scatter plots of Gaussian signals with standard deviation $\sigma_X = 1.0$ inside the spiral ($x_2$ versus $x_1$)]

The transform T can be chosen in many ways. One possibility is to make y equal to the length along the spiral from the approximation point to the origin. One spiral can be represented by positive amplitudes, while the other can be represented by negative channel samples. Or one could select y to be the rotation angle θ. To fill the entire space with the two curves we need to let Δ → 0. Then the spiral length will be infinite.
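The approximation step q and one choice of T for the double spiral can be sketched in a few lines. This is an illustrative sketch under assumptions not taken from the text: T is the signed rotation angle (positive on the first spiral, negative on the second), the closest point is found approximately by intersecting the ray through the input with each spiral and refining with a small grid search in θ, and the source is i.i.d. unit-variance Gaussian.

```python
import math
import random

def spiral_point(theta, spiral, delta):
    """Equations (17) and (18): spiral 0 is unrotated, spiral 1 is rotated by pi."""
    r = 2 * delta * theta / (2 * math.pi)
    phi = theta + spiral * math.pi
    return r * math.cos(phi), r * math.sin(phi)

def approximate(x1, x2, delta, half_width=0.3, step=0.01):
    """q: (theta, spiral) of an approximately closest point on the double spiral.
    Candidates are where the ray through (x1, x2) crosses each spiral; each is
    refined by a small grid search in theta (an assumption, not an exact solver)."""
    rho, ang = math.hypot(x1, x2), math.atan2(x2, x1)
    n_steps = round(half_width / step)
    best = None
    for spiral in (0, 1):
        base = (ang - spiral * math.pi) % (2 * math.pi)
        k = max(0, round((rho * math.pi / delta - base) / (2 * math.pi)))
        for kk in (k - 1, k, k + 1):
            t0 = base + 2 * math.pi * kk
            for i in range(-n_steps, n_steps + 1):
                t = max(0.0, t0 + i * step)          # theta >= 0 on both spirals
                p1, p2 = spiral_point(t, spiral, delta)
                d = (p1 - x1) ** 2 + (p2 - x2) ** 2
                if best is None or d < best[0]:
                    best = (d, t, spiral)
    return best[1], best[2]

def encode(x1, x2, delta):
    """Psi = T o q, with T the signed rotation angle (+theta on spiral 0, -theta on spiral 1)."""
    theta, spiral = approximate(x1, x2, delta)
    return theta if spiral == 0 else -theta

def decode(y, delta):
    """R: the sign selects the spiral, |y| selects the point on it."""
    return spiral_point(abs(y), 0 if y >= 0 else 1, delta)

def snr_db(delta, csnr_db=None, n=1000, seed=3):
    """Source SNR (dB) for unit-variance Gaussian pairs; csnr_db=None means no channel noise."""
    rng = random.Random(seed)
    xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    ys = [encode(a, b, delta) for a, b in xs]
    sigma_n = 0.0
    if csnr_db is not None:
        power = sum(y * y for y in ys) / n
        sigma_n = math.sqrt(power / 10 ** (csnr_db / 10))
    err = 0.0
    for (a, b), y in zip(xs, ys):
        a_hat, b_hat = decode(y + rng.gauss(0.0, sigma_n), delta)
        err += (a - a_hat) ** 2 + (b - b_hat) ** 2
    return 10 * math.log10(2 * n / err)
```

Comparing snr_db(0.25) with snr_db(1.0) shows the approximation noise falling as the spiral gets denser, while passing a finite csnr_db exposes the other side of the trade-off discussed in Section 3.3.2, since the channel amplitudes grow as delta shrinks. Using the length along the spiral as the channel sample would be an equally valid choice of T.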

It is also easy to conceive systems for making space-filling curves for mappings from higher dimensions to one dimension. A one-dimensional approximate representation of a three-dimensional sphere can be visualized by a ball of yarn. The thread is the curve, and a one-dimensional representation of a point within the sphere could be the length of the thread to the closest possible point on the thread.

T can be generalized to any nonlinear length adjustment, as if the thread in the ball of yarn was made from rubber and could be stretched unevenly.

3.3.2 Channel Noise

In the complete communication chain, the channel noise is the next obstacle when we want to maximize throughput with a given fidelity. A more complete signal chain is shown in Figure 8. In the model we represent the channel noise by the vector n. At the receiver an approximate inverse operation, R, tries to minimize the noise while preserving the original signal as well as possible. The received signal vector is thus

$$\hat{\mathbf{x}} = R\left(\mathbf{n} + (T \circ q)(\mathbf{x})\right). \qquad (19)$$

Let us try to get more insight into the noise problem by studying the spiral example further. Figure 7 shows the effect of approximation and channel noise. As the amplitude of the transmitted signal is corrupted by noise, the reconstruction will be inexact. If the channel sample was originally obtained as the length along the spiral to the approximation point, the received signal will give the length along the spiral to the decoded point, as indicated in Figure 7 with a diamond.

How do the two noise contributions interact? It is obvious that the approximation noise depends on the density of the spiral relative to the signal components' standard deviation. We illustrate this in Figure 9. The approximation noise is more severe when the standard deviation of the signal is small relative to the distance Δ between the spiral arms.

But the denser the spiral becomes, the larger the channel amplitudes get. If the channel power (or the channel amplitudes, if the channel is amplitude limited) is too large, a downscaling is necessary. In effect, a corresponding upscaling must be applied in the receiver, which also increases the noise by the upscaling factor. We observe a typical trade-off: if we lower the approximation noise by making the spiral denser, the influence from the channel noise will become more severe, and vice versa. There
exists a balance between the two contributions which will make the system optimal for a given channel signal-to-noise ratio (CSNR). Figure 10 shows results from several simulations when transmitting Gaussian, white noise over a Gaussian, memoryless channel using different spirals and different channels.

[Figure 10: SNR (dB) versus CSNR (dB) for different values of $\sigma_X/\Delta$ = [0.1, 1.0, 2.0, 4.0]; the OPTA curve for the 2D-1D mapping is also shown]

It is quite interesting that the closest point from each of the simulated curves to OPTA is in the range of 1 to 2 dB. By studying the background material more closely, another interesting conclusion is that at the closest point to OPTA the relation $\sigma_N/\Delta = 0.35$ holds approximately for a large range of conditions when no scaling of the PAM symbols was performed.

Another important aspect of the double spiral approximation shown in Figure 9 is that it runs through the origin and covers the plane in a symmetric manner for the negative and positive channel amplitudes (drawn as grey and yellow lines, respectively). This implies that the channel representation is symmetric (provided that the source samples have a rotationally symmetric distribution), and that the transmitted power is low, because the probability density function of Gaussian signals is peaky at the origin and this region is represented by small channel amplitudes.

We can summarize the important requirements for obtaining a good dimension-reducing mapping:

1. The mapping should cover the entire space in such a way that any point is mapped to a close representation point.
2. Probable source symbols should be mapped to channel symbols with low amplitudes to minimize the average channel power.
3. Signals that are close in the channel space should remain close when mapped back to the source space. This will prevent small channel noise samples from inducing large errors in the decoded signal. The opposite is not necessary. Two close source samples may well be mapped to entirely different regions in the channel space.

Based on the above observations, will Shannon's mapping in Figure 4 perform well for dimension reduction? For the region of support shown in the figure, the curve covers the space quite nicely. If we pick the center of the figure as the reference point for zero channel amplitudes, the power requirement will also be satisfied if the probability density function is uniform over the square. On the other hand, if the signal is Gaussian, then the channel power will be higher for this mapping than for the spiral mapping. Finally, the channel noise will perturb the signal in an acceptable way. Altogether, these speculations indicate that Shannon's original mapping would perform fairly well, especially for uniformly distributed signals.

3.4 Dimension Increase

Dimension increase becomes necessary to improve the fidelity in transmitting signal amplitudes over noise-prone channels.

A well-known mapping is obtained by transmitting a signal amplitude K times and averaging the output signal. This is therefore a 1:K mapping. This will improve the signal-to-noise ratio in the receiver if each of the samples is contaminated by independent noise components. This is a repetition code.

It is easy to conclude that we gain 3 dB in SNR per doubling of K. A plot of the SNR versus CSNR using a repetition code for K = 2 is shown in Figure 11. There is a striking difference between the obtained performance and OPTA, except at very low rates.

[Figure 11: Comparison of the performance of a code where the samples are sent twice to the OPTA curve for two times bandwidth expansion; SNR (dB) versus CSNR (dB)]

Let us take a closer look at the above repetition code in terms of geometry and try to find the reason for its poor performance. Figure 12 shows the repetition code when K = 2. Both
channel components are equal, which implies that they lie on a diagonal line in the square channel space. It is immediately clear that the channel space is not well exploited. Most of the space is empty!

[Figure 12: Repetition expansion from one to two dimensions ($\Psi: \mathbb{R} \to \mathbb{R}^2$); $y_2$ versus $y_1$]
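The 3 dB-per-doubling rule and the gap to OPTA can be checked with a few lines. The sketch below assumes a unit-variance Gaussian source, channel power equal to the source power on each of the K channel samples, and independent Gaussian noise; it uses the plain averaging receiver described in the text and compares the result with Equation (8) for W/B = K.

```python
import math
import random

def repetition_snr_db(csnr_db, k, n=50_000, seed=4):
    """Send each unit-variance Gaussian sample k times over independent AWGN
    and average the k received copies (a 1:K repetition code)."""
    rng = random.Random(seed)
    sigma_n = math.sqrt(10 ** (-csnr_db / 10))     # channel power = source power = 1
    err = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        x_hat = sum(x + rng.gauss(0.0, sigma_n) for _ in range(k)) / k
        err += (x - x_hat) ** 2
    return 10 * math.log10(n / err)

def opta_db(csnr_db, k):
    """Equation (8) with W/B = k, i.e. one source sample spread over k channel samples."""
    return k * 10 * math.log10(1 + 10 ** (csnr_db / 10))

for csnr in (0, 10, 20):
    print(csnr, round(repetition_snr_db(csnr, 2), 1), round(opta_db(csnr, 2), 1))
```

Averaging K copies divides the noise power by K, so the SNR is K times the CSNR, while OPTA grows as the K-th power of 1 + CSNR; the gap therefore widens as the CSNR increases, in line with Figure 11.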

3.4.1 Error-free mapping $\Psi: \mathbb{R} \to \mathbb{R}^2$

Although expanding mappings can certainly be devised for dimension change by rational ratios, we limit our discussion to the case $\Psi: \mathbb{R} \to \mathbb{R}^2$. A very simple error-free expanding mapping uses one component that represents a discretized version of the signal and an extra component that represents the difference between the exact signal and the discretized signal.

The discretized signal can be transmitted as a PAM multilevel signal

$$y_1 = K_1 q(x), \qquad (20)$$

where $K_1$ is a scaling factor. The second component is the correction term, which can be transmitted as a continuous PAM signal

$$y_2 = K_2\left(x - q(x)\right),$$

where $K_2$ is a second scaling factor. The power is distributed among the two samples through the scaling factors. The quantizer is optimized both for decision and representation levels.

Two typical resulting mappings (from [2]) that have been optimized for different CSNRs are shown in Figure 13. Observe that the orange lines are not part of the mappings, but illustrate the connection between the different parts. The performance of this type of mapping is given in Figure 14. It performs much better than the repetition code presented in Figure 11!

[Figure 13: Optimized 1D-to-2D mapping for CSNR = 20 dB (top) and 3 dB (bottom), plotted in the ($y_2$, $y_1$) plane]

To approach the mapping suggested by Shannon, every second horizontal line in the previous mapping is reversed, resulting in the optimized system in Figure 15. The performance of this system is slightly worse than that of the system in Figure 13. The original mapping suggested by Shannon is thus expected to perform quite well for signal expansion.
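The error-free 1:2 mapping of Equation (20) can be sketched as follows. Unlike the mappings from [2], whose quantizer is optimized for both decision and representation levels, this sketch assumes a uniform quantizer with a hand-picked step size, gives both channel samples unit average power, and uses a unit-variance Gaussian source; these choices are illustrative only.

```python
import math
import random

def encode(x, step, k1, k2):
    """y1 = K1 q(x) (Equation 20) and y2 = K2 (x - q(x)), with a uniform q (assumption)."""
    v = step * round(x / step)
    return k1 * v, k2 * (x - v)

def decode(r1, r2, step, k1, k2):
    """Detect the PAM level carried by r1, then add back the scaled correction term."""
    v_hat = step * round(r1 / (k1 * step))
    return v_hat + r2 / k2

def simulate(csnr_db, step, n=50_000, seed=5):
    """SNR (dB) for a unit-variance Gaussian source; each channel sample has unit power."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    qs = [step * round(x / step) for x in xs]
    k1 = 1.0 / math.sqrt(sum(v * v for v in qs) / n)                    # power of y1 -> 1
    k2 = 1.0 / math.sqrt(sum((x - v) ** 2 for x, v in zip(xs, qs)) / n)  # power of y2 -> 1
    sigma_n = math.sqrt(10 ** (-csnr_db / 10))
    err = 0.0
    for x in xs:
        y1, y2 = encode(x, step, k1, k2)
        x_hat = decode(y1 + rng.gauss(0.0, sigma_n), y2 + rng.gauss(0.0, sigma_n), step, k1, k2)
        err += (x - x_hat) ** 2
    return 10 * math.log10(n / err)

print(round(simulate(20, 0.8), 1))
```

Without noise, decode(encode(x)) returns x up to floating-point rounding, which is what makes the mapping error-free. With noise, the step size trades occasional detection errors on $y_1$ (jumps of one quantizer step) against noise on $y_2$ (divided by $K_2$); the value 0.8 is merely a plausible choice for a CSNR around 20 dB.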

[Figure 14: Performance of the 1D-to-2D mapping as a function of the CSNR when the system is optimized for each point on the curve; SNR (dB) versus CSNR (dB). The upper curve is OPTA.]

4 Joint Source Coding and Modulation Incorporating Signal Decomposition

Real-world signals are much more complex than the signals we have studied so far. As a matter of fact, the type of information conceivable by human observers requires structures in the signal that involve statistical variations and sample dependencies. This can be modeled as signals with short-term statistics, such as spectra, that change from position to position. The non-white
Telektronikk 1.2002 123 Figure 15 “Shannon-look- 6 A complete system for signal transmission in- alike” mapping 4 cluding signal decomposition is shown in Figure 16. In this system the outputs from the filter 2 bank are analyzed and classified. The classifica-

1 tion information is used to select different map-

y 0 pings that are pre-optimized for the encoder. -2 4.1 System Example -4 In the following we present an example taken -6 from [2]. We give a brief review of the most -8-6 -4 -2 0 2 4 6 8 important aspects of the system. Details can be y2 found in the thesis.

In this system the signal decomposition is per- formed in a filter bank using “System K” from spectra, which account for sample dependencies, [1]. This is a separable filter bank which in each imply that some signal decomposition should be dimension first splits the signal into 8 subbands applied before any mapping takes place to de- of equal size. The resulting low-pass band is fur- correlate the signal. The variabilities must be ther split dyadically into three stages. The filter accounted for by some type of adaptivity. coefficients were found by optimizing for coding gain. Signal decomposition can be performed in fre- quency selective filter banks. If non-overlapping Altogether 5 different mappings are used in the bands are implemented, the output subband coder. These are, in terms of their K : M ratio channels will be uncorrelated. With many chan- given by 1:4, 1:2, 2:3, 1:1, 2:1. The dimension nels the bands will be narrow, which implies that reducing mappings are optimized for Laplacian each band also will be close to white. The filter sources, because this corresponds most closely bank outputs have different power in each chan- to the actual distribution. The optimization nel, which also vary with position. method was developed by Fuldseth [6]. It opti- Figure 16 Signal mizes a discrete set of points for minimum mse communication system taking the approximation noise and the channel including signal noise into account. That is, the mappings are decomposition and optimized for a certain CSNR. The 1:2 mapping mapping allocation is shown in Figure 17. Notice that the mappings Mapping have a “spiral form”. If we drew an optimized Allocation spiral for a Gaussian source, it would look even more like Archimedes’ spirals.

The 1:1 mapping is based on Equation (6), although it is optimal only for Gaussian sources. In [2] it is shown that the deviation from OPTA is slight even when the source is Laplacian. The 1:2 mappings are of the type shown in Figure 13.

A very important part of the coder is the mapping allocation [2] included in Figure 16. This is a mechanism for using the available resources optimally. The resources in this case are bandwidth and channel power. Bandwidth is directly related to the different mapping ratios, while power is a consequence of the mappings, but can be altered by including scaling factors when inserting the signal components into the channel. In this coder the allocation is based on the local variances in each of the subbands.
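How the mapping ratios consume bandwidth can be sketched as simple bookkeeping. Under the assumption that a K:M mapping turns every K source samples into M channel symbols, a block of n samples costs n·M/K channel symbols. The variance thresholds and the rule "one step up in M/K per exceeded threshold" are hypothetical stand-ins for the allocation in [2], which the text only describes as being based on local subband variances.

```python
from fractions import Fraction

# The five K:M ratios used in the coder, ordered by channel symbols per source sample (M/K).
MAPPINGS = [(2, 1), (1, 1), (2, 3), (1, 2), (1, 4)]


def choose_mapping(variance, thresholds):
    """Hypothetical rule: every threshold exceeded by the local variance moves one step up in M/K."""
    level = sum(variance > t for t in thresholds)
    return MAPPINGS[level]


def channel_symbols(blocks, thresholds):
    """Total channel symbols for blocks given as (num_samples, local_variance) pairs."""
    total = Fraction(0)
    for num_samples, variance in blocks:
        k, m = choose_mapping(variance, thresholds)
        total += Fraction(num_samples * m, k)   # K source samples -> M channel symbols
    return total


thresholds = [0.1, 1.0, 5.0, 50.0]              # hypothetical values, len(MAPPINGS) - 1 of them
blocks = [(64, 0.01), (64, 0.5), (32, 3.0), (16, 20.0)]
symbols = channel_symbols(blocks, thresholds)   # 32 + 64 + 48 + 32 = 176
rate = symbols / 1024                           # channel symbols per pixel for an assumed 1024-pixel region
```

To meet a budget such as the 0.1 channel symbols per pixel of Figure 19, the thresholds would have to be raised until the rate falls below the target; the power scaling factors mentioned above are not modelled here.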

Some digital side information must be conveyed to the receiver to indicate which mappings have been used. Full error protection is provided for this information to make sure that it is not lost before the noise destroys all meaningful information anyway. Channel resources are also allocated to this part of the message.

Figure 17 2:1 mapping for Laplacian source. The mapping was optimized for a white Gaussian channel with CSNR = 23.1 dB. 256 points were used for the optimization.

It is difficult to make a completely fair comparison to other systems. In [2] the reference system uses the JPEG2000 Part 1 baseline coder (ISO/IEC, 2001), implemented in verification model 8.0. Using QAM modulation with Gray coding and low-density parity-check codes with soft-decision decoding, performance 3 dB away from the channel capacity can be achieved for an additive white Gaussian noise channel [28]. The reference system is modelled accordingly: error-free performance down to this CSNR, and breakdown below it. The simulation results for two images under this assumption are shown in Figure 18. PSNR means Peak Signal-to-Noise Ratio, and is defined as the ratio of the squared peak value to the noise power, measured in dB. It is a common quality measure for image evaluation.
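The PSNR definition above translates directly into code. This sketch assumes 8-bit images (peak value 255) stored as flat sequences of pixel values, with the noise power taken as the mean squared error; the function name `psnr` is illustrative.

```python
import math


def psnr(original, received, peak=255.0):
    """Peak signal-to-noise ratio in dB: squared peak value divided by the mean squared error."""
    if len(original) != len(received) or len(original) == 0:
        raise ValueError("images must be non-empty and of equal size")
    noise_power = sum((a - b) ** 2 for a, b in zip(original, received)) / len(original)
    if noise_power == 0:
        return math.inf                      # identical images
    return 10.0 * math.log10(peak ** 2 / noise_power)


print(round(psnr([10, 20, 30, 40], [12, 18, 30, 40]), 2))  # prints 45.12
```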

Three curves are shown for each image. They represent systems optimized for the CSNR indicated by the circles and crosses. For the reference coder this is the CSNR at which the coder breaks down, and it determines the bit rate required for error protection. For a given design, the reference system does not improve when the channel CSNR increases. This is different for the proposed system.

For the “Goldhill” image, which is quite detailed, the proposed coder outperforms the reference coder everywhere. The situation is different for “Lenna”. This image is much smoother and favors the reference coder. But the proposed coder still offers graceful degradation and can be used at lower CSNRs.

We have argued that the signal-to-noise ratio does not correspond to our perception. Therefore the visual quality of the received signal should also be studied. In the two upper images of Figure 19, the new system is compared to the JPEG2000 system combined with the efficient channel code at CSNR = 20 dB. The lower image is for the new system at CSNR = 17 dB. This is a point where the JPEG2000 system breaks down, so there is no need to show that image, as it contains only rubbish.

What clearly should be observed is that JPEG2000 blurs the tiles and the brick structure of the walls. The new system maintains more of that structure even at 17 dB CSNR. Notice, however, that there is a light spot on the roof of the left-hand house, which is probably due to a severe channel noise component.

Figure 18 Simulation results (PSNR (dB) versus CSNR (dB)) for “Goldhill” (upper figure) and “Lenna” (lower figure), including three systems optimized for different CSNRs (marked with star or circle). The dashed lines represent the JPEG2000 reference coder, where the circles indicate the breakdown point. The solid lines give the performance of the proposed system. The crosses mark the design point for each curve.

5 Conclusion
This paper uses the geometrical mapping method suggested by Shannon for making channel representations from signal vectors. This is a direct method which does not need the intermediate quantization step for data reduction. What we optimize for is bandwidth and power use, which are the natural resources available. The paper shows results for a recently developed image coder. It gives coding results comparable to those of the JPEG2000 coder combined with the best channel coding methods available, and it offers much better robustness towards channel changes. It should also be noted that the complexity of the proposed system can be quite low, and there is no extra delay for channel coding for the main portion of the information.

Many other systems based on the mapping method have been devised and simulated, including video coders and other channel representations, such as phase modulation.

We believe that the robustness offered, and the possible gains obtained by further development of such systems, could make them good candidates for wireless systems, especially for image and video communication, but also for broadcasting.

The paper has for the most part only referenced the central Shannon papers plus the most important Dr.Ing. theses, because these give the most complete description of the methods. However, there exist several papers which present parts of the methods and results, and others that present aspects not covered in those references. For completeness the following references are therefore provided: [20, 18, 24, 25, 22, 19, 14, 23, 26, 27, 21, 8, 10, 11, 12, 13, 6, 9, 7, 16, 17, 15, 4, 5, 3, 29, 30, 31].

Figure 19 Image coding examples at the rate 0.1 channel symbols per pixel. Upper and lower images result from the proposed coder at CSNR = 20 dB and 17 dB, respectively. The middle image is by the reference coder at CSNR = 20 dB.

References
1 Balasingham, I. On Optimal Perfect Reconstruction Filter Banks for Image Compression. Trondheim, Norwegian University of Science and Technology, 1998. (PhD thesis.)
2 Coward, H. Joint source-channel coding: Development of methods and utilization in image communications. Trondheim, Norwegian University of Science and Technology, 2002. (PhD thesis.)
3 Coward, H, Ramstad, T A. Bandwidth doubling in combined source-channel coding of memoryless Gaussian sources. In: Proc. IEEE Int. Symp. Intell. Signal Processing Commun. Syst. (ISPACS), 1, 571–576, Honolulu, HI, USA, November 2000.
4 Coward, H, Ramstad, T A. Quantizer optimization in hybrid digital-analog transmission of analog source signals. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), Istanbul, June 2000, 2636–2640. IEEE.
5 Coward, H, Ramstad, T A. Robust image communication using bandwidth reducing and expanding mappings. In: Thirty-Fourth Asilomar Conference on Signals, Systems and Computers, 2, 1384–1388, Pacific Grove, CA, USA, October 2000.
6 Fuldseth, A. Robust Subband Video Compression for Noisy Channels with Multilevel Signaling. Trondheim, Norwegian University of Science and Technology, 1997. (PhD thesis.)
7 Fuldseth, A, Fischer, T R, Ramstad, T A. Channel-optimized trellis-coded vector quantization for channels with a power constraint. In: Proc. Information Theory Workshop (ITW-98), San Diego, USA, February 1998.
8 Fuldseth, A, Lervik, J M. Combined source and channel coding for channels with a power constraint and multilevel signaling. In: Proc. ITG Conference, München, Germany, October 1994, 429–436. ITG.
9 Fuldseth, A, Ramstad, T A. Robust and efficient video communication based on combined source- and channel coding. In: Proc. Nordic Signal Processing Symposium (NORSIG -97), Tromsø, Norway, May 1995, 65–68.
10 Fuldseth, A, Ramstad, T A. Combined video coding and multilevel modulation. In: Proc. Int. Conf. on Image Processing (ICIP), Lausanne, Switzerland, September 1996, I, 941–944.
11 Fuldseth, A, Ramstad, T A. Robust subband video coding with leaky prediction. In: Seventh IEEE Processing Workshop, Loen, Norway, September 1996, 57–60.
12 Fuldseth, A, Ramstad, T A. Bandwidth compression for continuous amplitude channels based on vector approximation to a continuous subset of the source signal space. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), 1997, IV, 3093–3096.
13 Fuldseth, A, Ramstad, T A. Channel-optimized subband video coding for channels with a power constraint. In: Proc. Int. Conf. on Image Processing (ICIP), 3, 428–431, Santa Barbara, CA, USA, October 1997.
14 Grøvlen, A, Lervik, J M, Ramstad, T A. Combined digital compression and digital modulation. In: Proc. NORSIG-95 (Signal Processing Symposium), Stavanger, Norway, September 1995, 69–74. IEEE/NORSIG.
15 Hjørungnes, A. Optimal bit and power constrained filter banks. Trondheim, Norwegian University of Science and Technology, 2000. (PhD thesis.)
16 Hjørungnes, A, Ramstad, T A. Linear solution of the combined source-channel coding problem using joint optimal analysis and synthesis filter banks. In: Thirty-First Asilomar Conference on Signals, Systems and Computers, Naval Postgraduate School, San Jose, CA, USA, 2, 990–994. Maple Press, November 1997.
17 Hjørungnes, A, Ramstad, T A. On the performance of linear transmission systems over power constrained, continuous amplitude channels. In: Proc. for the UCSB Workshop on Signal & Image Processing, Santa Barbara, USA, December 1998, 31–35.
18 Lervik, J M. Integrated system design in digital video broadcasting. Piksel’n, 10 (4), 12–22, 1993.
19 Lervik, J M. Joint optimization of digital communication systems: Principles and practice. In: Proc. ITG Conference, München, Germany, October 1994, 115–122. ITG.
20 Lervik, J M, Eriksen, H R, Ramstad, T A. Bandwidth efficient image transmission system based on subband coding. A possible method for HDTV. In: Proc. NOBIM Conf., Lillehammer, Norway, February 1993. (In Norwegian.)
21 Lervik, J M, Fischer, T R. Robust subband image coding for waveform channels with optimum power- and bandwidth allocation. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), München, Germany, April 1997. IEEE.
22 Lervik, J M, Fuldseth, A, Ramstad, T A. Combined image subband coding and multilevel modulation for communication over power- and bandwidth limited channels. In: Proc. Workshop on Visual Signal Processing and Communications, New Brunswick, NJ, USA, September 1994, 173–178. IEEE.
23 Lervik, J M, Grøvlen, A, Ramstad, T A. Robust digital signal compression and modulation exploiting the advantages of analog communication. In: Proc. IEEE GLOBECOM, Singapore, November 1995, 1044–1048. IEEE.
24 Lervik, J M, Ramstad, T A. An analog interpretation of compression for digital communication systems. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP), Adelaide, South Australia, April 1994, 5, V–281–V–284. IEEE.
25 Lervik, J M, Ramstad, T A. A new approach to objective evaluation of power- and bandwidth-limited integrated communication systems. In: Proc. Nordic Signal Processing Symposium (NORSIG -94), Ålesund, Norway, June 1994, 49–54. NORSIG.
26 Lervik, J M, Ramstad, T A. Robust image communication using subband coding and multilevel modulation. In: Proc. 1996 Symposium on Visual Communications and Image Processing (VCIP-96), Orlando, FL, USA, March 1996, SPIE 2727, 2, 524–535. SPIE/IEEE.
27 Lervik, J M. Subband Image Communication over Digital Transparent and Analog Waveform Channels. Trondheim, Norway, Norwegian University of Science and Technology, 1996. (PhD thesis.)
28 Myhre, B, Markhus, V, Øien, G E. LDPC coded adaptive multilevel modulation for slowly varying Rayleigh-fading channels. In: Proc. Norwegian Signal Processing Symp. (NORSIG), Trondheim, Norway, October 2001.
29 Ramstad, T A. Efficient and robust communication based on signal decomposition and approximative multi-dimensional mappings between source and channel spaces. In: Proc. NORSIG, Helsinki, Finland, September 1996.
30 Ramstad, T A. Digital image communication. In: Proc. International Workshop on Circuits, Systems and Signal Processing for Communications ’97, Tampere, Finland, April 1997.
31 Ramstad, T A. Robust image and video communication for mobile multimedia. In: NATO Advanced Study Institute, Signal Processing for Multimedia, Il Ciocco, July 1998.
32 Shannon, C E. A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423 and 623–656, 1948.
33 Shannon, C E. Communication in the presence of noise. Proc. IRE, 37, 10–21, January 1949.
34 Shannon, C E. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec., March 1959, 142–163.

Information Theory

A lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 16–18 June 1950 by Graduate Engineer Nic. Knudtzon, Norwegian Defence Research Establishment, Bergen

This is a translation into English of the paper “Informasjonsteori”, which appeared in Elektroteknisk Tidsskrift 63 (30), pp. 373–380, 1950. The translation was done by Berlitz GlobalNET and final quality control was done by Geir E. Øien.

Dr. Nic. Knudtzon (80) obtained his Engineering degree from the Technical University of Norway, Trondheim, in 1947 and his Doctor’s degree from the Technical University in Delft, the Netherlands, in 1957. In 1948–1949 he was with the Research Laboratory of Electronics, Massachusetts Institute of Technology, working with information theory and experiments. From 1950 to 1955 he was with the Norwegian Defence Research Establishment, Bergen, working on the development of microwave radio links, and from 1955 to 1967 he was Head of the Communications Division at the SHAPE Technical Center in The Hague, the Netherlands, where his efforts went into the planning of military telecommunications networks and systems in Western Europe. From 1968 to 1992 he was Director of Research at the Norwegian Telecommunications Administration, working on the planning of future telecommunications systems, networks and services. Dr. Knudtzon has been a member of government commissions and various committees, including the Norwegian Research Council, the National Council for Research Libraries, the International Telecommunication Union, EURESCOM, etc.

1 Introduction
Information theory is a branch of statistical communications theory. Figure 1-1 is a schematic representation of a communications system; it consists of an information source, a communications channel and a destination, the communications channel consisting of a transmitter, a connection and a receiver. The information source generates the messages that are to be transmitted. The transmitter converts the messages to a suitable signal for transmission, and in the receiver the inverse operation takes place. The destination is the person or equipment to whom, or to which, the message is addressed. On the way, noise is added. In the following treatment, we will ignore the effects of distortion of messages resulting from non-linear characteristics of the equipment, and other system “errors”.

We cannot know the content of the individual messages beforehand; all we can know is the statistical characteristics of the class of messages we wish to transmit. Telecommunication is therefore a statistical process, and communications systems must be constructed for a specified class of messages with given statistical characteristics.*) In order to make an objective assessment of the ability of a system to transmit information, it is necessary to define a unit of information, just as the volt is the unit of electrical voltage. We will now define such a unit. By means of this we will then study the information in different messages, how these should be converted to achieve an efficient transmission channel, and how much information it is possible to transmit through a channel with a given bandwidth and signal-to-noise ratio.

We are not concerned with the semantic content of the message (its meaning); a communications engineer must be capable of constructing an efficient telegraphy system for Greek, knowing the statistical characteristics of that language, even though he does not understand Greek.

We will attempt to reach our conclusions by means of clear and simple considerations, avoiding the abstract mathematics (probability theory and dimension theory) which would be necessary for exact derivations. Hence, we can only consider very simple examples, drawn from practical applications.

2 Classification of Systems According to Message Type
We can distinguish between the following systems:
a) Discrete system. The message and the signal both consist of a series of discrete (discontinuous) symbols. An example of this is ordinary telegraphy, in which the message is a series of letters and the signal consists of dots and dashes.
b) Continuous system. The message and the signal are both continuously varying. An example is ordinary telephony, in which the message consists of pressure variations in the air

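Figure 1-1 A communications system: information source, transmitter, channel (where noise is added), receiver and destination. The transmitter turns the message into a signal; the receiver gets signal + noise and delivers message + noise to the destination.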

*) See also “Statistical Communications theory. A brief overview of the problem”, an introductory lecture presented at a study session for radio technology and electro-acoustics, 16–18 June 1950. Teknisk Ukeblad 1950.

and the signal is a continuously varying electrical function of time.
c) Hybrid system. Both discrete and continuous signals are present. An example is pulse-code modulation (PCM).

The individual messages are generated through a series of selections from a given register, that is to say, by successive selection from a given collection of symbols of one type or another. For example, a written message is produced by selecting letters from the alphabet of the language in question, and speech by selecting particular sounds that the speaker can generate with his or her voice. It is worth noting that the register in general consists of a limited number of symbols: we have only a few dozen written characters at our disposal, and speech is physiologically restricted to approximately 50 different sounds. The number of possible messages of finite duration is therefore also limited. The larger the number of different symbols and the longer the duration, the more information the individual message will contain. The amount of information in a single message therefore increases with the number of possibilities.

3 Information in a Discrete System
Written text is a typical example of a discrete message. There are indications that signals in the human nervous system are also discrete, and can therefore be represented by a series of choices.

3.1 Unit of Information
The simplest type of choice is a choice between two equal possibilities: 1 or 0, yes or no, heads or tails, or simply, any case of equally possible “either – or” states. We will define the unit of information as the outcome of such a binary elementary choice, and we will designate it 1 bit (abbreviation of “binary digit”). The designation 1 Hartley has also been suggested, after one of the pioneers of information theory. Further, we will postulate that H independent elementary choices provide H bits of information. Based on this, we are able to develop methods of calculating the amount of information in discrete messages.

3.2 Nth-order Choice
The selection of one symbol from a register of N, that is, from a group of N possible elements, is called an Nth-order choice. To specify how much information such a choice represents, we must reduce it to a series of independent elementary choices. The number of these necessary to specify one of the N possible elements is by definition equal to the number of bits of information for the Nth-order choice in question.

3.21 Choice of N Elements of Equal Probability
We envisage these as N symbols arranged in a column, as shown for N = 8 in Figure 3-1. To specify one particular element by means of a series of elementary choices, we proceed as follows. First, we divide them into two equal groups, which represents one elementary choice. Then we divide each group into two sub-groups; two elementary choices will be sufficient to specify one of these. The process of successive division is continued until the desired symbol is isolated from the others. We see that

1 elementary choice is needed for N = 2
2 elementary choices are needed for N = 4
3 elementary choices are needed for N = 8
etc.

In general,

$$H_N = \log_2 N \ \text{bits} \tag{3-1}$$

are required to specify a choice from N equal possibilities.

Figure 3-1 Nth-order choices, elements of equal probability
Symbol   S1   S2   S3   S4   S5   S6   S7   S8
Number   000  001  010  011  100  101  110  111
(The 1st division separates S1–S4 from S5–S8, the 2nd splits each half into pairs, and the 3rd isolates the individual symbols.)

The groups which arise after each division can be designated by 0 or 1 respectively, as shown in

Figure 3-2 Nth-order choices, elements of unequal probability

A priori probability $p_i$   Symbol   Number   $H_i$
1/2                          S1       0        1
1/4                          S2       10       2
1/8                          S3       110      3
1/8                          S4       111      3

(The 1st division separates S1 from S2–S4, the 2nd separates S2 from S3 and S4, and the 3rd separates S3 from S4.)

Figure 3-1. The resulting number of digits for each symbol is then equal to $H_N$, that is, equal to the number of bits necessary to specify the symbols.

The above-mentioned result is strictly speaking only correct if N is a power of 2, so that $H_N$ is an integer. If this is not the case, the number of digits will be one of the two integers closest to $\log_2 N$, depending on which symbol is to be specified.

3.22 Choice of N Elements of Unequal Probability
Each of these possibilities has a given (a priori) probability. Again, we perform successive divisions in elementary choices, that is, into two groups of elements having equal probability. The divisions are performed such that the total probability in the two groups is equal.

Let us first consider an example. In Figure 3-2, four symbols $S_i$ are given, with probabilities $p_i$ (where i = 1, ..., 4). The divisions are performed as shown, and the groups are designated by 0s and 1s. The number of digits is then equal to the number of elementary choices, $H_i$ bits, that are necessary to specify the symbol in question. $H_i$ is different for the different symbols, the infrequent ones representing more information than the frequent ones. We find that

Symbol   S1   S2   S3   S4
$H_i$    1    2    3    3

The average information per choice is thus

$$H_N = \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3 + \tfrac{1}{8}\cdot 3 = \tfrac{7}{4}$$

Generally, we find that the number of bits necessary to specify a symbol with a priori probability $p_i$ is

$$H_i = \log_2 \frac{1}{p_i} = -\log_2 p_i$$

Since the sum of the probabilities of all symbols is equal to 1, the average information for the Nth-order choice is

$$H_N = \frac{\sum_{i=1}^{N} p_i \log_2 \frac{1}{p_i}}{\sum_{i=1}^{N} p_i} = -\sum_{i=1}^{N} p_i \log_2 p_i \tag{3-2}$$

This expression has a maximum at $p_i = 1/N$, that is to say, in the case of equally likely choices. It is then reduced to Equation (3-1). If the four symbols in the example above had been equally likely, i.e. all $p_i = 1/4$, the information per Nth-order choice would have been

$$H_N = \log_2 N = 2$$

which is greater than for any other set of probabilities.

The right-hand side of Equation (3-2) has, except for the sign, the same mathematical form as entropy in thermodynamics for a system whose possible states have the probabilities $p_i$. It has therefore been designated negative entropy.

The fact that the base 2 has been included in the above expressions results from our decision to define the unit of information as a choice between 2 equal possibilities. If we instead had chosen to define the unit as a choice between 10 equal possibilities, we would have obtained common (base-10) logarithms, etc. Mathematically, there is of course no difference. However, in practical terms, there are good arguments for working with binary numbers. An electrical apparatus, for example, does not need to measure the size of a pulse, but only decide whether the pulse is present or not. A switch or flip-flop circuit has two

possible states; m such elements therefore have $2^m$ possibilities, and can therefore store m bits of information. Moreover, from experience, it is natural for a person to divide successively into two equal groups, rather than directly into ten groups, for example.

3.3 Discrete Messages and Signals
A discrete message (signal) will generally consist of a series of symbols $S_1, S_2, \ldots, S_i, \ldots, S_N$, having the probabilities $p_1, p_2, \ldots, p_i, \ldots, p_N$, i.e. a series of Nth-order choices. Such a process, which depends on a set of a priori probabilities, is called a stochastic process. It is characterised by certain statistical properties, which are determined by these probabilities. An example of such a discrete message is ordinary text, in which the symbols are selected from the alphabet of the language in question, which has known letter frequencies of occurrence, etc.

In the following treatment, we will assume that the messages (signals) we are concerned with are statistically stationary, that is to say that their statistical characteristics are invariant under translation in time of the relevant time series. Stated in another way: we assume that the ergodic hypothesis is valid. The statistical characteristics can then be determined as time averages for a single message of sufficient duration, or as the average for all messages in the same class at a given point in time.

3.31 The Symbols are Independent
If the individual messages consist of one symbol that can be selected from N possible symbols, there are N possibilities. If they consist of two symbols, there are $N^2$ possibilities, and so on. The total number of messages possible with n such Nth-order choices is $M = N^n$. If these are equally possible, the average information per message (consisting of n symbols) is therefore

$$H_M = \log_2 M = n \log_2 N$$

and the average information per symbol is

$$H_S = \log_2 N \tag{3-3}$$

If the choices do not have equal probability, the average information per symbol according to the ergodic hypothesis is

$$H_S = -\sum_{i=1}^{N} p_i \log_2 p_i \tag{3-4}$$

It can be shown that this expression has a maximum value at $p_i = 1/N$, in other words, when all symbols are equally possible.

$H_S$ is zero only when all values of $p_i$ are zero except one, which equals 1, i.e. N = 1. In this case the message consists of a series of one single symbol, and is therefore characterised by an a priori probability of 1 (i.e. certainty), and contains no information.

3.32 The Symbols are Dependent
In practice, the symbols are often dependent. Thus, in Norwegian e is very often followed by n or r in word-endings, and there is therefore a greater conditional probability of n or r occurring after e than, for example, after l.

It can be shown that Equation (3-4) is valid also in the case of dependent choices, if for $p_i$ we introduce the conditional probability for symbol $S_i$, based on knowledge of all previous choices.

With dependent choices, the results of Section 3.31 can be considered as a first approximation. They will give too high values of $H_S$, because the uncertainty of each choice is smaller when those choices are dependent.

3.4 Redundancy
We define the redundancy of an information source as

$$R = \frac{H_{S\max} - H_S}{H_{S\max}}$$

where $H_{S\max}$ is the maximum negative entropy we could have obtained for the same symbols (if these were independent and equally probable). The redundancy is an expression of the correlation in the message.

For example, let us consider the English language. There are certain a priori conditional probabilities for the occurrence of digrams (two consecutive letters), trigrams (three consecutive letters), etc., and for certain words. The redundancy for everyday English is found to be approximately 50 %, if we do not consider statistical structures over more than eight letters. This means that half of written English is determined by the structure of the language, while the remainder is chosen freely. The above-mentioned figures have been determined in different ways: by calculating the negative entropy of statistical approximations to English, by eliminating a certain proportion of the letters in a text and letting another person read it, and by cryptographic methods. Here, it is worth mentioning that there is a considerable difference between the redundancy of Shakespeare’s English with a large vocabulary and that of so-called “Basic English” with a vocabulary of about 850 words. In the first case, it is possible to express oneself briefly and concisely, and the redundancy is low. In the second case, a large number of words is needed for an exact description, and the redundancy is therefore high. Similar considerations can be made for the different forms of the Norwegian language.

132 Telektronikk 1.2002

Another example is an address label on a postal package recently received by the author. It read:

Forsavarels Farstiningsint, [correct: Forsvarets Forskningsinstitutt]
Avd. for Radar
Bergen ... Norway

Thanks to the redundancy in this message, the package reached its destination.

Redundancy is also common in television signals. The background of the picture can remain unchanged for long periods, while a person in the foreground makes small movements.

In an efficient communications system, we will often remove part of the redundancy at the input to the channel. The more we remove, the more important the remaining symbols become. How much we remove therefore depends on the impairments (noise) on the channel.

3.5 Coding

The discrete messages consist of symbols of varying frequency of occurrence. They can therefore be represented by a series of Nth-order choices, which as a rule have unequal probability. Each of these can be reduced to a series of elementary choices; this process is called binary coding. Theoretically, optimal binary coding requires an average of HS elementary choices/symbol, where HS is given by Equation (3-4).

The example in Section 3.22 illustrates a very simple coding process. The binary codes are identical to the numbers that arise from the successive divisions into two groups of equal probability. It follows that the most frequently occurring symbol has the shortest code, and that the code becomes longer the more infrequent the symbols are.

We can think of coding as a "stretching" of the time scale according to the probability of occurrence of the individual symbols. In other words, coding is a statistical adaptation of the channel to the class of message that it is to transmit. Of course, this requires a time delay, which in the case of optimal coding can be infinitely long. HS in Equation (3-4) is therefore to be considered a lower bound in practice. It is, however, generally easy to calculate how close we are to this optimal state, and thus to indicate the degree of efficiency of the coding. This will depend on the memory of the transmitter.

Of course, we do not need to use binary coding: the Morse code, for example, is quaternary. For English and Norwegian text, this is certainly not bad, as the most commonly occurring letters such as e and t have the shortest codes, and so on. For the Czech language, which has completely different letter frequencies, it is probably less efficient.

4 Information in Continuous and Hybrid Systems

The information content in continuous messages and signals can be determined by reducing them to discrete signals. This is done in two stages: sampling the signals gives a series of ordinates, which can then be quantized. We shall now look at these two processes in more detail.

4.1 Sampling

The function of time f(t) is assumed to contain only frequency components in the bandwidth 0 – W p/s. Intuitively, we know that f(t) cannot then change to a substantially new value in an interval of time 1/(2W), which is half the period of the highest frequency. On this basis, pulse modulation was introduced. Experiments were performed to examine the intelligibility of signals with sampling frequencies from 1 up to 3 times W, and a factor of around 2 was found to be sufficient for good quality. We will now demonstrate that f(t) is defined exactly by sampling at a frequency of 2W per second.

f(t) is multiplied by a function S(t), consisting of delta pulses repeated at a frequency fr. The product f(t) · S(t) is then a series of ordinates of f(t) separated by a distance 1/fr. S(t) can be broken down into a Fourier series consisting of a DC component, the fundamental frequency and the higher harmonics.

As shown by the theory of pulse modulation, multiplication by f(t) will result in sidebands around each of these components with the same shape as the spectrum of f(t), that is, with a width of W. This is illustrated in Figure 4-1. If we are to separate the spectrum of f(t) around the DC component from the lower sideband around the fundamental, the two must not overlap. The condition for this is

$$f_r - W \ge W \quad\text{or}\quad f_r \ge 2W$$

In most practical cases, it is simplest to sample periodically, but this is not necessary. We can, for example, also determine f(t) uniquely by means of ordinates with varying time intervals, or by means of ordinates and derivatives, as long as the frequency of sampling is at least 2W per second.
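The claim that ordinates spaced 1/(2W) apart define f(t) exactly can be checked numerically. The sketch below makes several assumptions that are not taken from the lecture:

- It uses a test signal, sinc²(Wt), whose spectrum is zero above W p/s.
- It samples that signal at exactly fr = 2W.
- It rebuilds the signal between the sampling instants with the Whittaker-Shannon (sinc) interpolation formula. This is the standard reconstruction for this case, although the text does not name it.
- The values of W, the truncation length N and the test instants are arbitrary.

Any residual error comes only from cutting the infinite series off at N terms on each side.

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

W = 100.0    # assumed bandwidth 0..W p/s (illustrative value)
FR = 2 * W   # sampling frequency fr = 2W, the limiting case of fr >= 2W
N = 2000     # ordinates kept on each side of t = 0 (truncates the series)

def f(t):
    # Assumed test signal: sinc(W t)^2 has a triangular spectrum that is
    # zero above W p/s, so it is band-limited as Section 4.1 requires.
    return sinc(W * t) ** 2

# The ordinates f(k/fr), spaced 1/(2W) apart.
ordinates = {k: f(k / FR) for k in range(-N, N + 1)}

def reconstruct(t):
    """Whittaker-Shannon interpolation: one sinc pulse per ordinate."""
    return sum(value * sinc(FR * t - k) for k, value in ordinates.items())

# Instants that fall between the sampling instants (multiples of 0.005 s).
test_points = [0.0013, 0.0041, -0.0027, 0.0107]
max_error = max(abs(reconstruct(t) - f(t)) for t in test_points)
print(f"largest reconstruction error: {max_error:.1e}")
```

Sampling more slowly than 2W would make the shifted sidebands overlap the baseband spectrum, and no interpolation rule could then separate them again.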

[Figure 4-1 Frequency components for S(t), with associated sidebands for f(t) (amplitude versus frequency; components at 0, fr, 2fr, 3fr; sideband width W; fr − W marked)]

If we are interested in the time series f(t) for an interval of T, then

$$n = 2TW \tag{4-1}$$

samples will be sufficient to characterize f(t) uniquely and exactly.

4.2 Quantization

Through sampling, we have reduced a continuous function of time f(t) with limited bandwidth to a series of ordinates which exactly define f(t). These ordinates can nevertheless vary continuously in amplitude. Specifying any one of them would therefore need an infinite number of elementary choices, i.e. an infinite number of bits of information. However, there is no need to specify the amplitudes more accurately than a certain tolerance ∆, which is primarily determined by the level of noise. We will therefore only specify certain amplitude steps, and we call this quantization. If the maximum value of the amplitude of the function is A, we will be able to distinguish between a total of

$$N = \frac{A + \Delta}{\Delta}$$

levels. We wish to express N in terms of a signal power Ps and a noise power Pn, which are the characteristic expressions for statistical messages and noise. Knowing the amplitude distribution of fluctuation noise, we can, on the basis of experience, set

$$N = \frac{A + \Delta}{\Delta} = \sqrt{\frac{P_s + P_n}{P_n}} \tag{4-2}$$

[Figure 5-1 Continuous signal f(t) on a grid (amplitude axis up to A in steps of ∆; time axis t in steps of 1/(2W))]

5 The Information Capacity of the Channel

5.1 Derivation and Definition

We will now determine how many bits H of information it is possible to transmit in time T through a channel with bandwidth W and signal-to-noise ratio Ks = Ps/Pn (referred to the receiver input). Figure 5-1 shows a continuous signal f(t) in a grid. The division along the time axis is the sampling interval 1/(2W), and the division along the amplitude axis is the tolerance ∆. According to Equation (4-2), there are

$$N = \sqrt{\frac{P_s + P_n}{P_n}} = \sqrt{1 + K_s}$$

possible levels in each time interval. For a signal of duration T there are, according to Equation (4-1),

$$n = 2WT$$

such time intervals.

In a single time interval there are N possible amplitudes; after two time intervals there are therefore N² possibilities, and so on. The total number of possible signals is therefore

$$N^n = \left(\sqrt{1 + K_s}\right)^{2WT} = (1 + K_s)^{WT}$$

In other words, there is a limit to the number of signals we can transmit through this type of channel. If the signals are equally probable, the amount of information per signal in bits is at its maximum, that is

$$H = \log_2 (1 + K_s)^{WT} = WT \log_2 (1 + K_s) \tag{5-1}$$

The information the channel can transmit per unit time is therefore

$$C = \frac{H}{T} = W \log_2 (1 + K_s) \tag{5-2}$$

This quantity is called the information capacity of the channel, and its units are bits/sec.

If we are to exploit this capacity fully to transmit a message containing an amount of information H in a time T, optimal coding will be required (see Section 3.5). That is, the transmitter must convert the message to a signal which is completely statistically adapted to the channel. We will call such a communication system an ideal system. It is not possible to transmit more information per unit time than C.
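Equations (4-1), (5-1) and (5-2) chain together into a short calculation. The sketch below uses the level count √(1 + Ks) per ordinate, which is what makes n = 2WT ordinates yield Equation (5-1). From that it forms H and checks that H/T reproduces C. The channel values (3000 p/s, Ks = 30 dB, T = 2 s) are illustrative assumptions, not figures from the lecture.

```python
import math

def levels_per_interval(Ks):
    """Distinguishable amplitude levels per ordinate: sqrt(1 + Ks)."""
    return math.sqrt(1.0 + Ks)

def information_bits(W, Ks, T):
    """Equation (5-1): H = log2(N**n) with n = 2WT ordinates, Eq. (4-1)."""
    n = 2.0 * W * T
    # n*log2(N) is used instead of log2(N**n), which would overflow a float.
    return n * math.log2(levels_per_interval(Ks))

def capacity(W, Ks):
    """Equation (5-2): C = W log2(1 + Ks), in bits/sec."""
    return W * math.log2(1.0 + Ks)

def db_to_ratio(db):
    """Convert a power ratio in dB to a plain ratio."""
    return 10.0 ** (db / 10.0)

# Illustrative values only: a 3000 p/s channel at Ks = 30 dB, observed for 2 s.
W, Ks, T = 3000.0, db_to_ratio(30.0), 2.0
H = information_bits(W, Ks, T)
print(f"N = {levels_per_interval(Ks):.1f} levels, H = {H:.0f} bits, "
      f"C = {capacity(W, Ks):.0f} bits/sec")
```

H grows in proportion to T, so H/T does not depend on the observation time. That independence is why C is a property of the channel alone.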

5.2 Discussion of an Ideal System

Equation (5-2) applies to all values of Ks. For Ks << 1,

$$\log_2 (1 + K_s) \approx K_s \log_2 e, \quad\text{i.e.}\quad C \approx 1.443\, W K_s$$

For Ks >> 1,

$$C \approx W \log_2 K_s$$

Equation (5-2) shows the relationship between the parameters W, Ks and C. For the same information capacity C in two cases, marked 1 and 2,

$$(1 + K_{s1})^{W_1} = (1 + K_{s2})^{W_2}$$

$$1 + K_{s1} = (1 + K_{s2})^{W_2/W_1}$$

or, approximated for large values of Ks,

$$K_{s1} \approx K_{s2}^{\,W_2/W_1}$$

We can thus reduce the bandwidth W if we simply increase the signal-to-noise ratio enough. Halving W will therefore require Ks to be raised to the second power. Conversely, we can increase W and thus manage with a substantially lower Ks.

The reduction in bandwidth could, for example, be achieved in the following manner. Instead of specifying each sample (at intervals of 1/(2W)), every second sample and its derivative are transmitted as a composite number. In this way, the necessary bandwidth can be halved. On the other hand, the signal-to-noise ratio must be raised to the power of 2 to enable us to read the composite number with the same accuracy.

It should be mentioned that the bandwidth also enters into the expression for Ks = Ps/Pn, since, for fluctuation noise,

$$P_n = W \cdot 4kT_{abs}$$

where k is Boltzmann's constant and Tabs is the absolute temperature.

Equation (5-2) can be derived exactly by means of multi-dimensional geometry.3) This kind of treatment will also lead to more wide-reaching results, such as an explanation of the threshold problem in broadband modulation systems.

6 Applications

6.1 Miscellaneous

There are reasons to believe that the nervous systems of living organisms consist of nerve cells with two possible responses, 0 or 1, exactly like a switch. The signals can therefore be characterised by a series of binary choices. The concepts and results of information theory thus open up a possibility of studying the transmission of information in nervous systems.6)

The human central nervous system contains something of the order of 10,000 million cells. The most complicated computer so far built, ENIAC, has 10,000 binary elements, which is about the same number as in the nervous system of an earthworm. The large number of cells in the human brain makes it possible for a signal to be transmitted by several parallel routes, so that a fault in a single cell in a chain does not significantly interfere with the transmission. In an electronic computer, however, such a fault would usually lead to a completely erroneous result in all subsequent calculations, because the signal can usually travel by only one route. The results of information theory have become important in connection with the operation and correction of errors in computers.7)

Moreover, a complete theory for transmitting secret information (cryptography) has been developed.8)

[Figure 6-1 Graphic representation of C/W = log2(1 + Ks). Some typical practical systems are indicated (C/W in bits, 0 to 5, versus Ks in dB, −20 to 20; curves for the ideal system, PCM systems and PPM systems)]

6.2 Practical Communications Systems

The equation

$$C/W = \log_2 (1 + K_s)$$

is represented graphically in Figure 6-1. The curve shows how much information it is generally possible to transmit per unit time and per cycle of bandwidth for a signal-to-noise ratio Ks. In general, the ideal values will not be attainable in practice, since this would require optimal coding, which can result in an infinitely long time delay. However, it is possible to calculate how far a practical system is from the optimal system. The diagram therefore indicates points for typical pulse position modulation (PPM) systems and for binary, ternary, and higher-order pulse code modulation (PCM) systems, without delay in the transmitter.

We will define the information efficiency η as the ratio of the amount of information per second that is actually transmitted to the maximum amount of information that could have been transmitted through a corresponding ideal system. To put it another way,

$$\eta = \frac{\text{inf./sec. in received message}}{\text{information capacity}} = \frac{W_m \log_2 (1 + K_m)}{W_s \log_2 (1 + K_s)} \tag{6-1}$$

where the indices m and s represent the message and the signal, respectively.

Let us consider the information efficiency for a number of practical modulation systems.5)

Amplitude modulation (AM): for single sideband, Wm = Ws and Km ≈ Ks, so η ≈ 1 = 100 %. For double sideband, Wm = 0.5Ws and Km ≈ Ks, so η ≈ 0.5 = 50 %.

Frequency modulation (FM): η is in the region of 20 – 2 % when the modulation index µ (= frequency deviation/Wm) varies between 1 and 100, and η decreases as µ increases. When the modulation index increases, i.e. when the bandwidth Ws increases, a better message-to-noise ratio is obtained, but the information efficiency decreases. The reason is that the denominator increases with the bandwidth Ws, while the numerator only increases with the log of Km.

Pulse modulation: with the exception of PCM, the different systems have an information efficiency of the same order of magnitude as FM. This can be explained in the same way: the bandwidth Ws becomes significantly larger than Wm, so as to obtain a desired message-to-noise ratio for a relatively low Ks.

Pulse code modulation: here, η is around 40 %. As is known, PCM is the only practical system devised so far for which Km increases with the bandwidth Ws. An increase in Ws will therefore not reduce η, as it does with other types of pulse modulation.

Previously, a communications system's efficiency in transmitting information could only be determined by means of comprehensibility tests. This is a subjective measure, unless a very large number of people act as information source or destination and statistical methods are used to find average scores. This would, however, prove expensive, because the system must first be built and then subjected to time-consuming tests. On the basis of the general definition of information efficiency η in Equation (6-1), we are now able to calculate (that is, objectively assess) the ability of an individual communications system to transmit information of various types.

Literature

1 Fano, R M. The Transmission of Information. Massachusetts Institute of Technology, Research Laboratory of Electronics. (Technical Report No. 65.)
2 Shannon, C E. A Mathematical Theory of Communication. Bell Syst. Tech. J., 27, 623–656, 1948.
3 Shannon, C E. Communication in the Presence of Noise. Proc. IRE, 37, 10–21, 1949.
4 Tuller, W G. Theoretical Limitations on the Rate of Transmission of Information. Proc. IRE, 37, 468–478, 1949.
5 Clavier, A G. Evaluation of Transmission Efficiency According to Hartley's Expression of Information Content. Elec. Com., 25, 414–420, 1948.
6 Wiener, N. Cybernetics. Wiley, 1948.
7 Hamming, R W. Error Detecting and Correcting Codes. Bell Syst. Tech. J., 29, 147–160, 1950.
8 Shannon, C E. Communication Theory of Secrecy Systems. Bell Syst. Tech. J., 28, 656–715, 1949.

Excerpts from the Discussion After the Lecture

Garwick: The lecture was very interesting. I have noticed a couple of points that need further clarification.

1. The lecturer mentioned that when a time series is to be transmitted without distortion by means of sampling, it is sufficient to sample with a time interval equal to or less than 1/(2W), where W is the bandwidth. But what is the bandwidth? In the case of a completely square spectrum, this is simple enough. The same holds if the spectrum has some other shape but is limited such that frequency components above a certain limit do not exist. In practice, the bandwidth will often have infinite extent. The frequency spectrum's sidebands will then also have infinite extent, and the proof will therefore not be mathematically valid.

2. During his treatment of quantising, the lecturer stated that

$$N = \frac{A + \Delta}{\Delta} = \sqrt{\frac{P_s + P_n}{P_n}}$$

The left-hand side is correct. It is not immediately obvious, however, that the right-hand side is equal to the left-hand side, since it is the amplitudes that characterise the noise, and not the power.

Knudtzon: 1. It was assumed that the message was limited in frequency to the bandwidth 0 – W p/s. Whatever the shape of the spectrum within this bandwidth, the bandwidth is therefore W. It is easy to demonstrate mathematically that an infinitely long message can have a limited frequency spectrum. If the message has passed through an ideal low-pass filter whose upper frequency limit is W, the bandwidth will be W, irrespective of the behaviour of the system function in the pass band. In practice, the attenuation in the rejected band can be made large, but not infinite, for all frequencies greater than W. We will therefore introduce a tolerance in the same way as in quantising. The bandwidth will then be the frequency at which the attenuation definitively exceeds the tolerance.

2. I stressed that from our knowledge of the amplitude distribution of fluctuation noise, the two expressions can be equated to each other. In fact, it is precisely the power that characterises the noise. A mathematical treatment based on multi-dimensional geometry has been presented by Shannon.*)

Gaudernack: The lecturer has pointed out that a piece of information is not completely known until one has full knowledge of its statistical properties. He has also said that this knowledge is necessary to be able to dimension the network statistically correctly. What significance does this have for the practising engineer who wishes to try out these new methods? What investigations must be carried out before it is possible to say that one knows a class of information completely?

Knudtzon: I will mention some examples. In telegraphy, it is possible to set up tables showing the frequency of occurrence of letters, digraphs, trigraphs and words. Such tables exist at least for the English, French, German, Italian and Spanish languages. In the case of telephony, one needs the correlation function, which can be measured electronically. The result depends to a large extent upon what one wishes to transmit, for example to what extent the character (intonation, emotion, atmosphere) is to be preserved. These matters are being subjected to extensive measurements at the Psycho-Acoustic Laboratory at Harvard, among other places. Television signals are also being studied. It should, for example, be unnecessary to transmit a stationary background in a television picture continuously.

I will briefly summarise the technique for electronic determination of the correlation function. The autocorrelation function for a time function f(t) is defined as

$$\varphi(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{+T} f(t)\, f(t + \tau)\, dt$$

The function f(t) is sampled periodically at a frequency fs, in pairs separated by intervals of τk seconds. This gives the samples α1, β1, ..., αn, βn, ... Pulses are generated with heights proportional to αn and widths proportional to βn, so that their areas represent the products αnβn. The integrated product is then averaged over the observation time T, which gives the following expression for the autocorrelation at τk:

$$\varphi(\tau_k) \approx \frac{1}{T f_s} \sum_{n=1}^{T f_s} \alpha_n \beta_n$$

The measurement is then carried out for τk+1, and so on. Correlators of this type have been built at the Research Laboratory of Electronics at MIT.

Falnes: The lecturer has treated statistically distributed noise. In practice, impulsive noise is also experienced. Although the average noise voltage is low, strong spikes can completely obliterate a message such as a telegraphy character.

*) Proc. IRE 1949 pp 10-21.

[Figure: principle of the electronic correlator. f(t) is sampled in pairs αn, βn separated by τk, repeated at intervals 1/fs. Pulses of height αn and width βn are formed, and their areas are accumulated as Σ αnβn.]

The Hell system, which has replaced ordinary telegraphy, is less sensitive to impulsive noise, because each character is represented by a larger number of impulses. As the lecturer pointed out, one therefore achieves a better signal-to-noise ratio at the expense of bandwidth.

Garwick: I would like to mention some examples which illustrate how the transmission of unnecessary data provides a check on the accuracy of the transmitted signal. First, let us consider how the number 25 can be written in the binary system. The number is written as 11001. This is the minimum number of necessary characters, so if one of them is reproduced incorrectly, the number itself becomes incorrect. In a coded decimal system, the number can be written as 0010, 0101. The first group represents the figure 2, the second group the figure 5. In each group, there are some character combinations which do not represent a number. Thus, if such an impossible combination is received, this indicates that an error has occurred. Even safer systems can be achieved with other methods of coding. This method is used, for example, in a mathematical computer constructed at the Norwegian Defence Research Centre.

Helmer Dahl: It might be interesting to stress the importance of the derived equation for information capacity from a practical point of view. The equation reads

$$C = \frac{H}{T} = W \log_2 (1 + K_s)$$

It represents the maximum amount of information that can possibly be transmitted in a given time interval, at a bandwidth W and signal-to-noise ratio Ks. The equation in a sense defines the boundary between what is possible and what is impossible. For example, it can prevent us from trying to transmit a message through a channel that is not capable of carrying that message.

On the other hand, we can only achieve the maximum information capacity by using optimal coding, and this can demand expensive and complex technical equipment. Hence, it may well be economically justifiable to use a more primitive type of coding, and therefore a greater bandwidth and output power than is strictly necessary, technically speaking. This is especially the case when one attempts to save power by using large bandwidths. One then uses far greater bandwidths than the information capacity actually requires, in order to obtain simple coders and decoders (modulators and demodulators). This may also have historical causes, since up to now modulation systems have been developed without considering the concept of information capacity. It is possible that, by using information theory, more suitable methods may be found which are both economically and technically efficient.

Nevertheless, there is reason to believe that maximum information capacity and optimum economics will only rarely coincide. In cases where bandwidth is cheap, such as in UHF systems, it may be worthwhile to use large bandwidths in order to save power by simple technical means. On the other hand, it is interesting that in cases where bandwidth is expensive, as in cable communications, single sideband transmission has proved to be a suitable system which closely approximates the theoretical optimum for transmission. Carson demonstrated the special characteristics of this system long ago, on the basis of other criteria. Based on information theory, we can see his results in a broader perspective, as one of many possible optimal solutions.

Knudtzon: Finally, I would like to mention that in a communications system containing noise it is possible to calculate various negative entropies. They can be calculated at the input to the channel and at the output from the channel, and corresponding expressions can be found for the conditional probabilities between input and output. Many important conclusions can be derived on the basis of these calculations. It can also be demonstrated mathematically that the negative entropy will always decrease on passing through the system. The average information can therefore never increase in an isolated system. This is completely analogous to the second law of thermodynamics.
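Knudtzon's reply to Gaudernack describes a correlator that averages the products of sample pairs, αn = f(tn) and βn = f(tn + τk). The sketch below mimics that averaging numerically. It assumes a test signal f(t) = cos(2π f0 t), whose time-average autocorrelation is known to be φ(τ) = ½ cos(2π f0 τ), so the estimate can be checked. The test frequency, the pair-sampling rate fs, the observation time T and the lags are illustrative assumptions, not values from the discussion.

```python
import math

F0 = 50.0     # assumed test-signal frequency (p/s)
FS = 1000.0   # assumed pair-sampling frequency fs
T = 10.0      # assumed observation time in seconds

def f(t):
    """Assumed test signal; its autocorrelation is 0.5*cos(2*pi*F0*tau)."""
    return math.cos(2 * math.pi * F0 * t)

def correlator(tau_k):
    """Estimate phi(tau_k) as (1/(T fs)) * sum of alpha_n * beta_n."""
    count = int(T * FS)                 # T*fs sample pairs
    total = 0.0
    for n in range(1, count + 1):
        t_n = n / FS
        alpha_n = f(t_n)                # first sample of the pair
        beta_n = f(t_n + tau_k)         # second sample, tau_k seconds later
        total += alpha_n * beta_n       # corresponds to one pulse area
    return total / count

for tau_k in (0.0, 0.002, 0.005, 0.010):  # successive lags tau_k, tau_k+1, ...
    exact = 0.5 * math.cos(2 * math.pi * F0 * tau_k)
    print(f"tau = {tau_k:.3f} s: estimate {correlator(tau_k):+.4f}, exact {exact:+.4f}")
```

As in the electronic version, each lag needs a separate run over the observation time. For a stationary signal, a longer T reduces the spread of the estimate.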

Statistical Communication Theory
A Brief Outline of the Problem1)

Graduate Engineer Nic. Knudtzon, M.N.I.F.
Norwegian Defence Research Establishment, Bergen

This is a translation into English of the paper "Statistisk kommunikasjonsteori", which appeared in Teknisk Ukeblad on November 16, 1950, pp. 883–887. The translation was done by Berlitz GlobalNET and final quality control was done by Geir E. Øien.

Dr. Nic. Knudtzon (80) obtained his Engineering degree from the Technical University of Norway, Trondheim, in 1947 and his Doctor's degree from the Technical University in Delft, the Netherlands, in 1957. In 1948–1949 he was with the Research Laboratory of Electronics, Massachusetts Institute of Technology, working on information theory and experiments. From 1950 to 1955 he was with the Norwegian Defence Research Establishment, Bergen, working on the development of microwave radio links. From 1955 to 1967 he was Head of the Communications Division at the SHAPE Technical Center in The Hague, the Netherlands, where his efforts went into the planning of military telecommunications networks and systems in Western Europe. From 1968 to 1992 he was Director of Research at the Norwegian Telecommunications Administration, working on the planning of future telecommunications systems, networks and services. Dr. Knudtzon has been a member of government commissions and various committees, including the Norwegian Research Council, the National Council for Research Libraries, the International Telecommunication Union, EURESCOM, etc.

The schematic diagram in Figure 1 illustrates a communication system, which consists of an information source, a channel and a destination, with the channel consisting of sender, transmission and receiver. The information source generates messages, which can be either discrete2), as used in telegraphy, or continuous, as in telephony. The sender converts the message to a signal that suits the connection, which is a cable or radio connection. The receiver attempts to retrieve the original message by carrying out the inverse process of the sender. The destination is the person or device that the message is bound for.

[Figure 1 Communication system: information source → transmitter → channel (where noise is added) → receiver → destination; the message becomes a signal, then signal + noise, and finally message + noise]

Three things characterise a communication system of this type:

a) It is to transmit information.
b) The equipment and channel have a limited frequency band.
c) Noise occurs, while at the same time signal strength is lost. A certain signal/noise relationship is therefore required in order to achieve a specified message/noise relationship at the destination.

These facts may seem rather obvious to us today. However, it is only in recent years that we have fully drawn the consequences of them, and in so doing have arrived at a general communication theory. In order to see this more clearly we will briefly consider some fundamental stages in the development of communication techniques.

1) Lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 16–18 June 1950.
2) Non-continuous.

The original idea was that the electrical function of time representing the message at receiver-out should be an exact reproduction of the time series for the message at sender-in. Consequently, what was desired was the elimination of the distance between the information source and the destination, hence the prefix "tele" (far off) in telegraphy, telephone, television, telemetry etc. It was expected that part of the signal output would be lost in the channel. That the channel had a frequency-dependent response, just like all other electrical networks, was however a surprise to most of the communication engineers of the time. At that time only slow telegraphy and telephony had been developed, and the systems' frequency characteristics were determined on the basis of practical tests, most of which were comprehensibility tests. Due to the properties of these special types of signals and of the human ear, only the amplitude response was taken into consideration, while the phase response was almost completely ignored. Unfortunately, this became customary and is often still the case today. This further led to electrical networks being studied exclusively in the frequency domain for many years, by means of so-called "sinusoidal analysis" (steady-state analysis). The network is then uniquely determined by its system function H(jω), which can be defined as the ratio between a signal's frequency spectra at output and input. This is generally a complex-valued function.

The next milestone of significance to our discussion was the discovery of the radio valve, which enabled the amplification of the signals. The losses could now be recovered, and this was of importance for transmission over both cables and radio. However, a new problem arose as different types of noise manifested themselves. Fluctuation noise, which enters via the aerial and is generated in valves and resistors, occurs at all frequencies; its power is proportional to the bandwidth. In order to combat the noise, other modulation methods were introduced in addition to amplitude modulation (AM), which was known from multiplexing channels, amongst other things. Common to all modulation methods is that a low-frequency message can be transmitted as a high-frequency signal.

Frequency modulation (FM) was the first solution. This idea was first put forward in the hope of managing with narrower frequency bands than with AM, so that the effect of the noise would be reduced. However, the opposite proved to be true: practical tests showed that the broader the frequency band became, the more the message/noise relationship improved for the same transmit power. FM is thus an illustrative example of how fumbling progress has been made. After further experimentation, the different pulse modulation systems were developed one by one; as we know, intense work is currently being carried out on pulse code modulation (PCM). Detailed investigations of all these systems show the following for the same message/noise relationship at receiver-out. The sender's average signal power PS can be reduced when the bandwidth W is increased, provided that the signal/noise relationship KS = PS/PN (referred to sender-out) is higher than a certain threshold value. Thus, noise reduction requires a broad frequency band. It should be added that when calculating the message/noise relationship for the different modulation systems, only messages that are sinusoidal functions of time have so far been considered.

As is known, in pulse modulation a series of the message's ordinates is transmitted instead of the entire continuous time series. This process is known as sampling. It can be shown3) that the message is completely and exactly determined by the samples, if these are taken more often than the equivalent of twice the highest frequency component. The samples are quantized in PCM, which means that only certain discrete amplitudes are transmitted. The steps between these are determined by the distortion that is thereby allowed. By sampling and quantizing, we have managed to convert the continuous message to a discrete message.

3) "Information Theory", lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 15–18 June 1950. ETT no. 30, 1950.

Television was developed around the same time as the modulation methods. Television signals consist of a series of pulses; a typical example of how such a pulse is deformed in an electrical network is shown in Figure 2. It is extremely important that the pulses do not overlap. The development of television therefore led to studies of the networks' responses in the time domain for a pulse input of a suitable form. It has proved to be beneficial in calculations to use the so-called δ pulse, which is defined as follows:

$$\delta(t) = 0 \quad \text{for } t \neq 0, \qquad \int_{-\infty}^{+\infty} \delta(t)\, dt = 1$$

All other time series can be thought of as composed of such δ pulses. The network is subsequently precisely determined by its δ pulse response.

[Figure 2 Typical deformation of pulse in an electrical network: a pulse at the network input and the deformed pulse at its output, both plotted against time]

It may be appropriate here to explain the connection between the system function H(jω) and the δ pulse response h(t) for an electrical network. This has been done in detail in the Appendix. The results show that "sinusoidal analysis" and "δ pulse analysis" are closely connected and can be derived from each other. If we choose to work in the frequency domain, it has to be stressed that the amplitude response and the phase response are equally important, and that they cannot be specified independently of each other.

Servo mechanisms were developed particularly during and after World War II, as there was a great need for control equipment for artillery etc. Development in this area happened so quickly because it soon became apparent that the problems had much in common with those that were previously
encountered in network theory, and thus known results could be utilised. However, the opposite effect has also taken place. An important problem that has had major consequences arose in connection with anti-aircraft guns, where the target can move quickly and in irregular paths. It was therefore important to be able to predict the position of the target for the length of time that it took for the projectile to reach it. In order to solve this problem, a thorough study had to be made of the aircraft's path as a function of time. A pilot under fire will naturally steer in curved paths, but the freedom of movement is limited by the construction of the aircraft. For example, the radius of curvature cannot be made smaller than a certain size, depending on the aircraft's design, weight and speed. The flight path can therefore not be regarded as known in advance, but the path's statistical properties can be determined for the types of aircraft that are targeted. Consequently, this is what needs to be borne in mind when constructing the predictor.

We are faced with exactly the same problems in communication engineering. The individual messages that we want to transmit over the system cannot be known in advance if they are to contain any information for the destination. There is therefore no point in transmitting a sine function or a single δ pulse, as their time response is determined for all time. The received message could then be predicted immediately, and the easiest thing would be not to send it at all. "Sinusoidal analysis" and "δ pulse analysis" consequently have no real justification here. On the other hand, there is reason to emphasise that only a limited number of signals can be transmitted over a channel with a certain frequency band and signal-to-noise ratio. The communication system is therefore constructed for a class of messages with certain known statistical characteristics. These could be, for example, amplitude distribution, correlation functions, power density spectra, letter and word frequencies etc. We will assume that the messages are statistically stationary, i.e. that these characteristics are invariant with respect to a shift of the time scale. This is the nature of fluctuation noise.

[Figure 3 The filter problem: message fm(t) and noise fn(t) enter a network with system function H(jω) and δ pulse response h(t); the output fu(t) is compared with the desired time function fd(t)]

Let me now consider the filter problem illustrated in Figure 3. A message fm(t) and noise fn(t) are input to a linear electrical network, which we wish to construct so that the time function fu(t) is as exact a reproduction of the message-in fm(t) as possible. Completely exact rendering is out of the question, because stray capacitance etc. limits the frequency band; and if the band were not limited, we would get infinitely strong noise. The best option is therefore to construct the system so that we get the least possible error between the actual time series fu(t) and the desired time series fd(t). In this case, with the filter, fd(t) is identical to fm(t).

We must first decide what we mean by error. It is natural here to specify it in the time domain. If we directly take the difference [fu(t) – fd(t)], the error will be a function of time. If we take the mean over a long time, positive and negative errors will cancel each other out, and this is not what we want. We bypass this problem by taking the mean of the square of the difference. This means that the system will be determined by

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{+T} \left[ f_u(t) - f_d(t) \right]^2 dt = \text{minimum}$$

This error criterion is physically sound for many applications and leads to a mathematically solvable problem. We will call the corresponding network statistically optimal. When the problem is solved mathematically, it is shown4) that the network's system function is determined by the correlation functions for the classes of time series that occur at input and output. The network will always be realisable and stable, as the requirement for this is taken into consideration in the solution. It is also possible to derive an expression for the size of the error.

Above we have considered the filter problem, where fd(t) = fm(t). Generally, however, we can specify fd(t) as we wish, and determine the relevant network in a similar way as for the filter. For a statistically optimum differentiator we thus set fd(t) = fm'(t), and for a statistically optimum predictor fd(t) = fm(t + α), where α is the desired prediction time. It is worth noting that we can thus, without difficulty, demand a shift α in the time domain. The problem of constructing statistically optimum networks for classes of time functions with known correlation functions is therefore solved in theory, and practical applications have also been found. The problem is clearly posed very differently, and rests on a much sounder foundation, than previously known filter theory.

4) "Statistically Optimal Networks", lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 15–18 June 1950. ETT no. 30, 1950.

It is important to have a clear view of what we really want to transmit. We shall use telephony as an example. Since the ear does not react to small distortions of the phase, an exact reproduction of the microphone current is not essential. We know from experience that a frequency band of at least 4,000 p/s is required to transmit speech of reasonable quality. With such a band, the user understands what the information source is relaying, and also to some degree gets an impression of its emotions and feelings, as is the case in direct speech. Since a very large part of […] ever, we are here primarily interested in the fact that it is possible (without time delay) to transmit comprehensible speech over considerably narrower bands than those that are currently in use. This is done at the expense of the individual character of the different information sources. With drastic bandwidth reduction the listener will understand well enough what is being said, but will not be able to determine whether the information source is male or female, or happy or sad. In many cases, for example in systems for transmitting orders, weather reports etc., the individual character of the speech is of little consequence. By reducing the bandwidth we have thus managed to remove the redundancy in the message. Only rarely will all the redundancy be removed, since the remaining redundancy helps to preserve comprehensibility when there are disturbances. Similar considerations can be made for other communication systems, for example for telegraphy and television.
In the the conversations over the commercial telephone latter, transmitting the entire visible spectrum network convey emotions, emphasis here is put (3 × 1014 p/s) has never been suggested, but the on the transmitted message giving as close to the image is analysed, a code signal is transmitted same impression to the listener as if the informa- and the image is recovered in the receiver by tion source was addressing him directly. Practice synthesis; this is all in full analogy with the pre- has shown that the price of a telephone channel viously discussed vocoder telephony. The con- increases proportionately with the bandwidth, clusion is that communication systems must be and for financial reasons it is therefore desirable constructed based on knowledge of the informa- to reduce this. This can be done in a number of tion source, the destination and what is to be ways; we shall briefly discuss some of these transmitted. If information is being transmitted below. to or from a person we need to recognise the reaction of different messages for the type of a) Analysis-synthesis telephony. The micro- people that will use the system. Work is there- phone current is analysed in the sender, codes fore currently underway in the USA to measure are transmitted for the tone type and power the comprehensibility and quality of messages spectrum’s form, the codes then influence an to the ear and eye under different conditions. “artificial voice” (oscillator set) in the re- ceiver. The necessary bandwidth for the same Up to now we have used the expression informa- comprehensibility as that of an ordinary chan- tion without further discussion of what it actu- nel is approximately 400 p/s, i.e. reduction ally is. We have stated that we shall concentrate with a factor of 10. Systems of this type were the attention not so much on the time function, built before the beginning of the 1940s by Bell as on what the time function represents, and this Telephone Laboratories. 
They are known as what is information. We have furthermore indi- vocoders. cated that each message is selected from the class of messages that we wish to transmit; b)Sound code telephony. Speech is physiologi- information therefore has something to do with cally limited to approximately 50 different selection. The difficulty in trying to give a closer sounds, and someone that talks quickly can definition of what we mean is not only due to the barely express more than 5–6 per second. If fact that we have included people in our commu- every sound is given a code, a frequency band nication system. Even in cases where text is of approximately 40 p/s should be sufficient. transmitted from paper roll to paper roll in teleg- raphy (where people are not counted as either c) Sound group telephony. If we are only inter- information sources or destinations), we have ested in transmitting certain sound groups not yet been able to account for how much infor- (special sound combinations or words), the mation can be transmitted per second. Neither bandwidth for special systems can conceiv- have we been able to say how much information ably be further reduced by a factor of 10. is lost if 1% of the characters are transmitted incorrectly. We therefore need a measure (i.e. a If any of these methods of bandwidth reduction unit) of information, just like 1 volt is the unit are to be used, the terminal equipment will of for electric voltage. A unit of this type is course become more expensive so that they can defined5) and has been given the designation only be beneficial for long transmissions. How- 1 bit (or 1 Hartley).

With a unit of this type at our disposal we can logically attack the problem of transmitting information as effectively as possible. Let us consider a simple example from telegraphy. The message consists of letters of varying frequency of occurrence. In Norwegian and English the letter E appears more often than, for example, Z. In an efficient system we should therefore make the signal for E shorter than the signal for Z, as in Morse code. In a telegraphy system for the Czech language the messages will have other letter frequencies, and a different code is therefore desirable. Similar considerations can be made for the other types of communication systems. What is generally involved is converting the message so that it is statistically matched to the channel. This process is known as coding. It will generally require a time delay in the sender, which consequently must be equipped with a memory of one type or another. To avoid misunderstandings, we draw your attention to the fact that this form of coding must not be confused with the previously discussed simplification of the message by removing redundancy.

It can be shown6) that a channel with bandwidth W p/s and signal-to-noise ratio KS (referred to the sender output) can transmit, during the time T, a quantity of information

$$H = T\,W\log_2(1 + K_S)\ \text{bits}$$

With optimal coding, the channel's information capacity is thus

$$C = \frac{H}{T} = W\log_2(1 + K_S)\ \text{bits/s}$$

C expresses the maximum number of bits per second that it is possible to transmit over such a channel. There is thus an optimum that can easily be calculated but which in practice can only be achieved with optimal coding. Generally speaking, however, it is also possible to calculate how far from this optimum the individual systems are. Previously, a communication system's effectiveness for transmitting information could only be estimated by carrying out comprehensibility tests. Such tests are a subjective measurement unless many different people are used to represent the information source and destination. Now, on the other hand, we have obtained a tool for objectively calculating a system's information efficiency.

From the expression above we see that, for the same information capacity, the bandwidth W can be reduced if KS is increased sufficiently, i.e. by raising the sender power. Conversely, an increase in the bandwidth will allow a lower sender power. This concurs exactly with practical experience.

In the introduction we mentioned the three things that characterise a communication system: information, frequency band and signal-to-noise ratio. We have now arrived at a general relationship between them.

We have confined ourselves above to considering electrical communication systems. The points of view that have been expressed can, however, be applied to communication systems in a much wider context. Norbert Wiener, its founder, has given the name Cybernetics (from the Greek for steersman) to the science that deals generally with the control and transmission of information in living beings and machines.

What is fundamentally new is that we have found telecommunications to be a statistical process, and that a unit has been defined for information. Whereas before we considered the different systems individually, we can now look at them collectively. This line of thought is very important. It opens up great possibilities and will undoubtedly gain greater practical importance as we define statistical characteristics for the different classes of messages we want to transmit.

5) "Information Theory", l.c.
6) "Information Theory", l.c.

Appendix
Correlation between "sinusoidal analysis" and "δ pulse analysis"

The system function H(jω) of a network can be defined as the ratio between a signal's frequency spectrum at the output, Fu(jω), and at the input, Fi(jω):

$$H(j\omega) = \frac{F_u(j\omega)}{F_i(j\omega)} \tag{1}$$

We select a δ pulse as the signal. It can be regarded as the limiting shape of the pulse $\frac{a}{\sqrt{\pi}}\exp(-a^2t^2)$ as a → ∞. In accordance with Fourier's integral formula, the frequency spectrum at the input is then

$$F_i(j\omega) = \int_{-\infty}^{+\infty}\lim_{a\to\infty}\left[\frac{a}{\sqrt{\pi}}\exp\left(-a^2t^2 - j\omega t\right)\right]dt$$

or, if it is permitted to reverse the order of the limit and the integral,

$$F_i(j\omega) = \lim_{a\to\infty}\frac{a}{\sqrt{\pi}}\int_{-\infty}^{+\infty}\exp\left(-a^2t^2 - j\omega t\right)dt = 1 \tag{2}$$

The network's δ pulse response is h(t), i.e. the frequency spectrum at the output is

$$F_u(j\omega) = \int_{-\infty}^{+\infty} h(t)\exp(-j\omega t)\,dt \tag{3}$$

Consequently, inserting (2) and (3) in (1),

$$H(j\omega) = \int_{-\infty}^{+\infty} h(t)\exp(-j\omega t)\,dt$$

or, according to Fourier's integral formula,

$$h(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} H(j\omega)\exp(j\omega t)\,d\omega$$

There is therefore a unique relationship between H(jω) and h(t).

Literature

Shannon, C E. A Mathematical Theory of Communication. Bell Syst. Techn. J., 27, 379–423, 623–656, 1948.

Shannon, C E. Communication in the Presence of Noise. Proc. IRE, 37, 10–21, 1949.

Wiener, N. Cybernetics. Wiley, 1948.

Wiener, N. Extrapolation, Interpolation, and Smoothing of Stationary Time Series. Wiley, 1949.

Halsey, R J, Swaffield, J. Analysis-Synthesis Telephony with Special Reference to the Vocoder. Proc. Inst. Elec. Eng., III 95, 391–405, 1948.
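The capacity expression C = W log₂(1 + KS) in the paper above lends itself to a quick numerical illustration of the bandwidth/power trade-off it describes. The Python sketch below is illustrative only. The 4,000 p/s band and the signal-to-noise ratio of 1,000 (30 dB) are example values chosen here, and the function names are hypothetical.

```python
import math

def capacity(bandwidth_hz: float, snr: float) -> float:
    """C = W log2(1 + KS) in bits per second; snr is a power ratio, not dB."""
    return bandwidth_hz * math.log2(1.0 + snr)

def snr_for_capacity(target_bps: float, bandwidth_hz: float) -> float:
    """Invert C = W log2(1 + KS): the KS needed to reach target_bps in bandwidth W."""
    return 2.0 ** (target_bps / bandwidth_hz) - 1.0

W = 4000.0    # example band, p/s
KS = 1000.0   # example signal-to-noise ratio (30 dB)

C = capacity(W, KS)
print(f"C = {C:.0f} bits/s")

# Same capacity in half the bandwidth: the required KS grows sharply.
ks_half = snr_for_capacity(C, W / 2)
print(f"W/2 needs KS = {ks_half:.0f} ({10 * math.log10(ks_half):.1f} dB)")
```

Halving W requires 1 + KS to be squared, here from about 30 dB to about 60 dB. This makes concrete the statement that a narrower band must be paid for with "sufficiently" more sender power.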

Statistically Optimal Networks

Summary of lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 16–18 June 1950 by Graduate Engineer Nic. Knudtzon, Norwegian Defence Research Establishment, Bergen

This is a translation into English of the paper "Statistisk optimale nettverk", which appeared in Elektroteknisk Tidsskrift 63 (30), 413–416, 1950. The translation was done by Berlitz GlobalNET and final quality control was done by Geir E. Øien.

A more detailed presentation will be issued as a separate publication. Only a brief outline of the problem will be presented here without any details about the mathematical solution.

Dr. Nic. Knudtzon (80) obtained his Engineering degree from the Technical University of Norway, Trondheim in 1947 and his Doctor's degree from the Technical University in Delft, the Netherlands in 1957. In 1948–1949 he was with the Research Laboratory of Electronics, Massachusetts Institute of Technology, working with information theory and experiments. From 1950 to 1955 he was with the Norwegian Defence Research Establishment, Bergen, working on the development of microwave radio links. From 1955 to 1967 he was Head of the Communications Division at SHAPE Technical Center in The Hague, the Netherlands, where his efforts went into the planning of military telecommunications networks and systems in Western Europe. From 1968 to 1992 he was Director of Research at the Norwegian Telecommunications Administration, working on the planning of future telecommunications systems, networks and services. Dr. Knudtzon has been a member of government commissions and various committees, including the Norwegian Research Council, the National Council for Research Libraries, the International Telecommunications Union, EURESCOM, etc.

The individual messages (signals) that are transmitted over a communication system cannot be known in advance if they are to contain any information for the addressee. The only thing that is known is certain statistical characteristics of the class of messages we want to transmit.1)

Previously, electrical networks were designed on the basis of messages that were a sine curve or a single δ pulse. The responses to these time series are determined for all time, and they will not transmit any information. These methods are therefore not based on realistic situations. However, all other time series can conceivably be built up of such sine curves or δ pulses. We have been aware of this for a long time. Until recently, however, we have neglected to take account of the individual components' statistical weight, i.e. how often the individual frequency components or δ pulses appear in the messages. When we characterise an electrical network, we should combine the system function H(jω) and the δ pulse response h(t) with the statistical characteristics of the class of messages the network is to transmit. As we know, the following unique relationship exists between an electrical network's system function and δ pulse response:

$$H(j\omega) = \int_{-\infty}^{+\infty} h(t)\,\varepsilon^{-j\omega t}\,dt, \qquad h(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} H(j\omega)\,\varepsilon^{j\omega t}\,d\omega$$

A filter separates a desired message from an unwanted message and noise. Previously, work was carried out in the frequency domain. The filter's critical frequencies were set more or less arbitrarily on the basis of practical tests, which showed which amplitude and phase response separated the unwanted messages from the desired messages most effectively without any substantial distortion. "Most effective" and "substantial" were, however, subjective assessments, so the calculations were not founded on any firm basis. Gradually the need for more complete solutions increased. Then came the television problems, which, owing to the nature of the message, were different from those previously encountered in telephony and telegraphy. We were then led to proceed tentatively in finding usable solutions in this new field. Communication engineering was in many ways an art. Not only the principle of the constructions but also their application depended a lot on the engineer's ingenuity and "good nature", while the quality was open to debate according to personal taste. With the wealth of experience we have gradually gained in this way, very good and efficient networks have undoubtedly emerged. However, we have still not been able to judge how efficient these networks are, because the optimum has remained unknown. Moreover, we have been led to proceed tentatively each time a new type of problem has turned up. In brief, we have lacked a general and well-founded theory for networks that will transmit information of different kinds. The problem and methods we are now dealing with constitute a step in the development towards this goal.

We will now formulate the network problem in the time domain, see Figure 1. The messages are regarded as continuous functions of time, from which redundancy is removed to the extent we wish. The information in a message of this type can easily be calculated.2) Our problem is now to design a network such that, for certain classes of input time functions fi(t), we get output time functions fu(t) which deviate as little as possible from the desired time functions fd(t). The deviation is defined in the time domain as the mean squared error, i.e.

$$E = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T}\left[f_u(t) - f_d(t)\right]^2 dt = \text{minimum} \tag{1}$$

Figure 1 The network problem (labels: message fm(t), noise fn(t), In, network H(jω), h(t), Out fu(t), desired fd(t))

Networks that are designed on the basis of this minimum condition are defined as statistically optimal. Some examples of these networks are given in the table below, where fm(t) represents a message and fn(t) represents noise or an unwanted message; both have known statistical characteristics.

fi(t)              fd(t)          network
fm(t) + fn(t)      fm(t)          filter
fm(t)              fm'(t)         differentiator
fm(t)              fm(t + α)      predictor
fm(t) + fn(t)      fm(t + α)      filter-predictor

It is worth noting that we can specify a time shift α in the time domain. We are thereby able to construct networks, known as predictors, which can predict future values of fm(t).

It is assumed in the following that
1) all input time functions are statistically stationary, i.e. their statistical characteristics are invariant under a translation in time of the associated time function. Commonly occurring messages and noise will most often fulfil this condition;
2) the network is linear. This condition is imposed because of the mathematical difficulties of solving non-linear problems. A solution of the linear problem is, however, better than none at all.

As we already know, the following relationship exists between the output time functions fu(t) and the input time functions fi(t) in a linear electrical network:3)

$$f_u(t) = \int_{-\infty}^{+\infty} f_i(t-\tau)\,h(\tau)\,d\tau \tag{2}$$

where h(t) is the δ pulse response.

Upon insertion of equation (2) in (1), the condition for a minimum takes the following form:

$$
\begin{aligned}
E ={}& \int_{-\infty}^{+\infty} h(\tau)\,d\tau\int_{-\infty}^{+\infty} h(\sigma)\,d\sigma\left[\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T} f_i(t)\,f_i[t-(\sigma-\tau)]\,dt\right] \\
&- 2\int_{-\infty}^{+\infty} h(\tau)\,d\tau\left[\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T} f_i(t-\tau)\,f_d(t)\,dt\right] \\
&+ \left[\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T} f_d(t)\,f_d(t)\,dt\right] = \text{minimum}
\end{aligned}
\tag{3}
$$

The expressions in the brackets are all of the form

$$\varphi_{12}(\chi) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T} f_1(t)\,f_2(t\pm\chi)\,dt$$

This function is known as the correlation function. For f1(t) = f2(t) we get the auto-correlation, and for f1(t) ≠ f2(t) the cross-correlation.

The auto-correlation function φ11(χ) is a very important statistical parameter of the associated time function f1(t). Its most important property is expressed in the Wiener-Khintchine theorem: φ11(χ) is the Fourier transform of the power density spectrum Φ11(jω), which is defined as the mean power per unit frequency of f1(t). Mathematically:

$$\Phi_{11}(j\omega) = \int_{-\infty}^{+\infty}\varphi_{11}(\chi)\cos\omega\chi\,d\chi \tag{3a}$$

$$\varphi_{11}(\chi) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\Phi_{11}(j\omega)\cos\omega\chi\,d\omega \tag{3b}$$

Corresponding relations apply to the cross-correlation function. Equation (3a) is the key equation of the correlation method: the power density spectrum can be determined via the correlation function. The correlation function is often easily calculated from its defining equation by statistical methods.

1) See also "Statistical communication theory. A brief outline of the problem", Teknisk Ukeblad, 1950.
2) "Information theory", lecture presented at a study session for radio technology and electro-acoustics at Farris Bad, 16–18 June 1950. ETT no. 30, 1950.
3) See for example "Theory of Servomechanisms", R.L. Series No. 25, p 35.
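The bracketed time averages in equation (3), i.e. the correlation functions, can be approximated from a finite record. This is what calculating them "from the defining equation by statistical methods" amounts to in practice. The Python/NumPy sketch below is an illustration added here, not a procedure given in the lecture. It replaces the limit T → ∞ by a long but finite sampled record. The test signal is a synthetic first-order autoregressive sequence, chosen only because its auto-correlation, a^|k|/(1 − a²), is known in closed form. The function name `time_average_correlation` is hypothetical.

```python
import numpy as np

def time_average_correlation(f1: np.ndarray, f2: np.ndarray, lag: int) -> float:
    """Finite-record estimate of phi_12(lag) = <f1(t) f2(t + lag)> (discrete time, lag >= 0)."""
    n = len(f1) - lag
    return float(np.mean(f1[:n] * f2[lag:lag + n]))

rng = np.random.default_rng(0)
a = 0.8           # coefficient of the synthetic AR(1) test signal (an assumption)
N = 200_000       # record length, standing in for T -> infinity
noise = rng.standard_normal(N)
x = np.empty(N)
x[0] = noise[0] / np.sqrt(1 - a**2)       # start in the stationary state
for n in range(1, N):
    x[n] = a * x[n - 1] + noise[n]

estimated = [time_average_correlation(x, x, k) for k in range(4)]
theory = [a**k / (1 - a**2) for k in range(4)]
for k, (e, t) in enumerate(zip(estimated, theory)):
    print(f"lag {k}: estimate {e:.3f}, theory {t:.3f}")

# Cross-correlation with a delayed copy peaks at the delay.
delay = 5
y = np.roll(x, delay)      # y[n] = x[n - delay] (circular shift, negligible at this length)
cross = [time_average_correlation(x, y, k) for k in range(10)]
print("cross-correlation peaks at lag", int(np.argmax(cross)))
```

Longer records tighten the estimates, in line with the limit T → ∞ in the definition.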

Upon insertion of the correlation functions, equation (3) takes the following form:

$$
\begin{aligned}
E ={}& \int_{-\infty}^{+\infty} h(\tau)\,d\tau\int_{-\infty}^{+\infty} h(\sigma)\,d\sigma\,\varphi_{ii}(\tau-\sigma) \\
&- 2\int_{-\infty}^{+\infty} h(\tau)\,d\tau\,\varphi_{id}(\tau) + \varphi_{dd}(0) = \text{minimum}
\end{aligned}
\tag{4}
$$

Here the correlation functions are known for the classes of time series we will deal with, and the network is selected so that its δ pulse response h(τ) makes the error E a minimum. This is a variational problem, which is solved by varying h(τ) and using the Euler-Lagrange condition for extremal values. Equation (4) is thereby reduced to the Wiener-Hopf integral equation

$$\int_{-\infty}^{+\infty} h(\sigma)\,\varphi_{ii}(\tau-\sigma)\,d\sigma - \varphi_{id}(\tau) = 0 \quad \text{for } \tau > 0 \tag{5}$$

When solving this equation, consideration is given to the fact that the network must be stable. The condition for this is h(τ) = 0 for τ < 0. Expressed in the frequency domain, the system function H(λ) must have no poles in the right half-plane. The system function of the statistically optimal network can then be shown to be

$$H(\lambda) = \frac{1}{2\pi\,\Phi_{ii}^{(V)}(\lambda)}\int_{0}^{\infty}\varepsilon^{-\lambda t}\,dt\int_{c-j\infty}^{c+j\infty}\frac{\Phi_{id}(w)}{\Phi_{ii}^{(H)}(w)}\,\varepsilon^{wt}\,dw \tag{6}$$

where λ = σ + jω and w = x + jy are complex variables. The power spectrum of the input time series, Φii(λ), is split into factors Φii^(H)(λ) and Φii^(V)(λ). These have all their poles and zeros in the right and left half-plane respectively (H and V from the Norwegian høyre, right, and venstre, left), i.e.

$$\Phi_{ii}(\lambda) = \Phi_{ii}^{(H)}(\lambda)\cdot\Phi_{ii}^{(V)}(\lambda)$$

When H(λ) has thus been determined, we must find a configuration of common network elements that has this system function. This is known as synthesis, and there are several methods for solving this problem. It must be pointed out that the synthesis problem does not have a unique solution; several different configurations can have one and the same system function.

It is also possible to find an expression for the size of the error. Just like the system function, it is completely and exclusively determined by the correlation functions (or their Fourier transforms, the power spectra) of the classes of time functions that occur at the input and that are desired.

As an example we will find the system function of a predictor. Here

$$f_i(t) = f_m(t) \quad\text{and}\quad f_d(t) = f_m(t+\alpha)$$

where α is the prediction time. Consequently

$$\Phi_{ii}(\lambda) = \Phi_{mm}(\lambda)$$

and it can further be shown that

$$\Phi_{id}(\lambda) = \Phi_{mm}(\lambda)\,\varepsilon^{\lambda\alpha}$$

so that, by equation (6),

$$H(\lambda) = \frac{1}{2\pi\,\Phi_{mm}^{(V)}(\lambda)}\int_{0}^{\infty}\varepsilon^{-\lambda t}\,dt\int_{c-j\infty}^{c+j\infty}\Phi_{mm}^{(V)}(w)\,\varepsilon^{w(t+\alpha)}\,dw \tag{7}$$

We will apply this equation to a simple example. Let fm(t) be a time series as shown in Figure 2, consisting of a series of identical pulses of the form tε^{−t}. The individual pulses are completely independent of each other and occur at random, k per unit time.

Figure 2 Signal (random pulses of the form tε^{−t} plotted against time)

The correlation function can be shown to be

$$\varphi_{mm}(\tau) = \frac{k}{4}\,\varepsilon^{-|\tau|}\left(1+|\tau|\right)$$

and by applying equation (3a) the power spectrum is found as

$$\Phi_{mm}(j\omega) = \frac{k}{2\pi}\,\frac{1}{(1+\omega^2)^2}$$

This is factorised as follows:

$$\Phi_{mm}(j\omega) = \frac{k}{2\pi}\,\frac{1}{(1+j\omega)^2}\cdot\frac{1}{(1-j\omega)^2}$$

and this results in

$$\Phi_{mm}^{(V)}(w) = \frac{k}{2\pi}\,\frac{1}{(1+w)^2}$$
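The step "by applying equation (3a) the power spectrum is found" can be checked numerically. The Python/NumPy sketch below is added for illustration and sets k = 1. It integrates φmm(τ) = (k/4)ε^{−|τ|}(1 + |τ|) against cos ωτ exactly as (3a) is written, with no 1/2π factor. Under that normalization the integral comes out as k/(1 + ω²)². The extra 1/2π in the text's result corresponds to a convention that puts 1/2π in the forward transform (3a) instead of in (3b). The constant cancels in equation (7), so the predictor itself is unaffected.

```python
import numpy as np

def phi_mm(tau: np.ndarray, k: float = 1.0) -> np.ndarray:
    """Correlation function of the random pulse train: (k/4) exp(-|tau|) (1 + |tau|)."""
    a = np.abs(tau)
    return k / 4.0 * np.exp(-a) * (1.0 + a)

def spectrum_via_3a(omega: float, k: float = 1.0,
                    tau_max: float = 60.0, n: int = 240_001) -> float:
    """Evaluate (3a): integral of phi_mm(tau) cos(omega tau) over the real line.
    The infinite range is truncated at +/- tau_max, where phi_mm is negligible."""
    tau = np.linspace(-tau_max, tau_max, n)
    g = phi_mm(tau, k) * np.cos(omega * tau)
    dtau = tau[1] - tau[0]
    # Trapezoidal rule written out explicitly (avoids NumPy version differences).
    return float(dtau * (g.sum() - 0.5 * (g[0] + g[-1])))

for omega in (0.0, 0.5, 1.0, 2.0):
    numeric = spectrum_via_3a(omega)
    closed = 1.0 / (1.0 + omega**2) ** 2          # k/(1 + w^2)^2 with k = 1
    print(f"omega = {omega}: numeric {numeric:.6f}, k/(1+w^2)^2 = {closed:.6f}, "
          f"with extra 1/2pi = {closed / (2 * np.pi):.6f}")
```

The numeric column matches k/(1 + ω²)² to many decimals. Only the overall constant differs from the printed result.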

The system function of the optimal predictor for this time series is then

$$H(\lambda) = \frac{(1+\lambda)^2}{2\pi}\int_{0}^{\infty}\varepsilon^{-\lambda t}\,dt\int_{c-j\infty}^{c+j\infty}\frac{\varepsilon^{w(t+\alpha)}}{(1+w)^2}\,dw$$

or, when the complex integration is performed using the residue theorem,

$$H(\lambda) = \varepsilon^{-\alpha}\left(1+\alpha+\alpha\lambda\right)$$

This is a network as shown in Figure 3. No guidelines for the construction of predictors have previously existed.

Figure 3 Predictor circuit for the signal in Figure 2 (component labels: α, 1 + α)

Lee & Stutt [3] have carried out a practical demonstration of predictors for filtered fluctuation noise; see the block diagram in Figure 4. The noise source was a 6D4 radio tube, and the filter a single resonance circuit with fo = 1080 p/s and Q varied in steps from 10 to 90. The auto-correlation function of such filtered noise can be calculated. It is a damped cosine function with frequency fo and a decay rate inversely proportional to Q. The optimal predictor was then constructed based on Equation (7). Figure 5 shows an example of the prediction when Q = 10. The solid curve represents the time function at the predictor output and the dotted one at the predictor input. The predicted function follows the actual function closely. Such predictors can be envisaged to have a bearing on the suppression of noise in communication systems.

Figure 4 Demonstration of predictor for noise (blocks: noise source, filter, predictor, two-beam oscilloscope)

Figure 5 Example of prediction of filtered fluctuation noise

Similar examples can be found for other types of statistically optimal networks: filters, differentiators, compensators etc. These are determined by Equation (6) once the desired time series fd(t) has been specified. This equation therefore has very general validity.

Literature

1 Wiener, N. Extrapolation, Interpolation and Smoothing of Stationary Time Series. The Technology Press, John Wiley, 1949.

2 Lee, Y W, Wiesner, J B. Correlation Functions and Communication Applications. Electronics, 23, 86–92, 1950.

3 Lee, Y W, Stutt, C A. Statistical Prediction of Noise. Proc. Nat. Elec. Conf., Chicago, 5, 342–365, 1949.

Excerpts from the Discussion After the Lecture

Garwick:

1. Derivation of the system function H(jω) based on the delta function is not mathematically correct. The delta function is not a mathematical function; it does not satisfy Dirichlet's definition of a function, and cannot be integrated, either as a Riemann or as a Lebesgue integral. When using delta functions, the order of the limit and the integral sign is swapped. The limit must be carried out before the integral sign. It is possible that the result will be correct regardless, but this must be checked using other methods.

2. Where the error size E was discussed, there was absolutely no mention of the transmission channel's noise. Is the intention to imagine that the signal plus noise at the input are already amplified so much that the channel's own noise is negligible?

3. The equation

$$E = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T}\left[f_m(t) - f_d(t)\right]^2 dt$$

cannot always be used. In practice f(t) will be almost equal to zero for t < T1 and t > T2. The integral can then be divided into three:

$$\frac{1}{2T}\int_{-T}^{T_1}[\;]^2\,dt + \frac{1}{2T}\int_{T_1}^{T_2}[\;]^2\,dt + \frac{1}{2T}\int_{T_2}^{T}[\;]^2\,dt$$

When T goes to infinity, this expression goes to zero regardless of the network.

Knudtzon:

1. Communication engineers and mathematicians will probably never agree on the δ function. I refer to a very sound and worthy article by van der Pol.4) I only used the δ function when recapitulating known results from network theory. I could not take time to discuss this in more detail here on the blackboard without displacing the topic of my lectures.

2. Linear networks are assumed. The noise from the network is therefore referred to the input.

3. Statistically stationary time series are expressly assumed. The equation is mathematically correct.

C.B. & H.G.W.

4) Philips Research Reports 1948, pp 174–190.
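The closed-form predictor H(λ) = ε^{−α}(1 + α + αλ) from the lecture corresponds in the time domain to the output ε^{−α}[(1 + α)f(t) + αf′(t)], since multiplication by λ amounts to differentiation. For a single pulse f(t) = tε^{−t} (t ≥ 0), this expression equals f(t + α) exactly once the pulse has started. The Python/NumPy sketch below checks that on one pulse. It is illustrative only: the time grid, α = 0.5 and the use of `np.gradient` for f′ are choices made here, and the full random pulse train of Figure 2 is not simulated.

```python
import numpy as np

def pulse(t: np.ndarray) -> np.ndarray:
    """Single pulse t*exp(-t) starting at t = 0 (zero before)."""
    return np.where(t >= 0.0, t * np.exp(-t), 0.0)

def predictor_output(f: np.ndarray, t: np.ndarray, alpha: float) -> np.ndarray:
    """Time-domain form of H(lambda) = exp(-alpha) * (1 + alpha + alpha*lambda):
    lambda acts as d/dt, so y = exp(-alpha) * ((1 + alpha) f + alpha f')."""
    df = np.gradient(f, t)   # numerical derivative of the sampled input
    return np.exp(-alpha) * ((1.0 + alpha) * f + alpha * df)

alpha = 0.5
t = np.linspace(-2.0, 10.0, 120_001)
f = pulse(t)
y = predictor_output(f, t, alpha)
target = pulse(t + alpha)            # what a perfect predictor would deliver

after_start = t > 0.01               # stay clear of the kink at t = 0
before_start = (t > -alpha + 0.01) & (t < -0.01)
print("max error after the pulse has started:", np.max(np.abs(y - target)[after_start]))
print("max error while the pulse is still unseen:", np.max(np.abs(y - target)[before_start]))
```

The error before the pulse appears is unavoidable for any causal network, because it cannot anticipate a pulse that has not yet begun. This is why the optimum is defined in the mean-square sense over the whole random pulse train rather than pulse by pulse.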

Special

Multiple Bottom Lines?1) Telenor's Mobile Telephony Operations in Bangladesh

ARVIND SINGHAL, PEER J. SVENKERUD AND EINAR FLYDAL

The present article distills lessons learned about sound business and corporate social responsibility practices from Telenor’s participation in mobile telephony operations in Bangladesh. Telenor’s mobile telephony operations in Bangladesh provide valuable lessons about how corporations can strategically make forays in uncharted markets, and, while doing so, creatively seek and meet multiple co-existing bottom lines. Telenor is financially, intellectually, and structurally “richer” through its Bangladesh venture, gaining significant new insights and experiences in doing business in “distant” geographies, developing new culturally-based “benchmarking” standards, and experimenting with new business models that integrate telephone “ownership” with “access”. Further, by partnering in Bangladesh with an internationally-known socially-driven development organization (i.e. the Grameen Bank), Telenor

has strategically gained global visibility in both corporate and social sectors.

The social responsibility of business is to increase profits.
Milton Friedman, Nobel-prize-winning economist (quoted in Hood, 1996, p. 16).

Corporate social responsibility is not an occasional activity. It is not like visiting a mausoleum once a year, or hearing a church sermon every Sunday. It has to be completely integrated with the corporation's business function.
Muhammad Yunus, Managing Director of the Grameen Bank, in a personal interview (May 2, 2001).

I want my fellow Americans to know that the people of Bangladesh are a good investment. With loans to buy cell phones, entire villages are brought into the information age. I want people throughout the world to know this story.
U.S. President Bill Clinton in an address during his meeting with members of the Village Phone Project in Dhaka, Bangladesh in March, 2000.

The mobile phone is like a cow. It gives me "milk" several times a day. And all I need to do is to keep its battery charged. It does not need to be fed, cleaned, and milked. It has now connected our village with the world.
Parveen Begum, owner and sole dispenser of mobile telephony services in Village Chakalgram, Savar Thana, Bangladesh, in a personal interview (May 2, 2001).

The present article investigates2) the mobile telephony operations in Bangladesh of Telenor, the leading Norwegian telecommunication company. Telenor has forged a business and strategic social change partnership with the Grameen Bank, one of the best known development organizations in the world. A historical background on Telenor's involvement in Bangladesh is provided. Telenor's business and social accomplishments in Bangladesh are presented, highlighting how a corporation can pursue multiple bottom lines.

Dr. Arvind Singhal (39) is Presidential Research Professor and Scholar in the School of Interpersonal Communication, College of Communication, Ohio University. There he teaches and conducts research in the areas of diffusion of innovations, mobilizing for change, design and implementation of strategic communication campaigns, and the entertainment-education communication strategy. Dr. Singhal is co-author of four books and has won several Top Paper Awards. He has won the Baker Award for Research at Ohio University twice, and numerous other teaching and research recognitions. Dr. Singhal has served as a consultant to the World Bank and UN programs, as well as private corporations.

Peer J. Svenkerud (36) is Director of Stakeholder Relations, Telenor ASA. Svenkerud holds a PhD in organizational communication with specific focus on intercultural issues from Ohio University, and has conducted postgraduate studies at Harvard Business School. He is author of numerous peer-reviewed articles in international journals, and top academic papers at international conferences. His research interests center around information diffusion campaigns and the social impact of new communication technologies. Svenkerud was formerly director of Burson-Marsteller in Oslo, and Assistant Professor at the University of New Mexico.

This grocery shop is now the village information hub: The village phone is here during opening hours, and otherwise with the shopkeeper's wife at home

Telektronikk 1.2002 153

Telenor Goes to Bangladesh
How did Telenor get involved in Bangladesh? To fully answer this question, a little background on the Grameen Bank operations in Bangladesh is useful. The Grameen (rural) Bank, founded in Bangladesh in 1983 by Professor Muhammad Yunus, is a system of lending small amounts of money to poor women so that they can earn a living through self-employment. No collateral is needed, as the poor do not have any. Instead, the women borrowers are organized in a group of five friends. Each group member must repay their loan on time, while ensuring that other group members do the same, or else their opportunity for a future loan is jeopardized. This delicate dynamic between "peer pressure" and "peer support" among Grameen borrowers is at the heart of its widespread success (Yunus, 1999). By December 2001, the Grameen Bank had loaned money to about 2.4 million poor women borrowers, and had an enviable loan recovery rate of 95 percent. The idea of micro-lending, based on the Grameen Bank experience, has spread throughout the world, and has everywhere proven effective in gaining a high rate of repayment of the loans.

In the mid-1990s, the Grameen Bank began discussions with various mobile telephony operators around the world, including Telenor, to accomplish its vision of placing one mobile phone in each of the 68,000 villages of Bangladesh. At that time, there was one telephone in Bangladesh for every 400 people, representing one of the lowest telephone densities in the world3). There was virtually no access to telephony services in rural areas, where 85 percent of Bangladesh's 130 million people lived. Professor Yunus realized that while it was not possible for each rural household to own a telephone, it was possible through mobile telephone technology to provide access to each villager. To operationalize his vision, Professor Yunus established a non-profit organization called Grameen Telecom.

Telenor CEO Tormod Hermansen was intrigued by Professor Yunus' idea of the village telephone, and believed that Bangladesh, given that it only had 500,000 fixed line telephones in urban areas, presented a significant business opportunity for Telenor. Mobile telephony services could address the large unmet demand for telephony in Bangladesh, where the waiting period for a private fixed line connection was ten years. While there were significant "first-mover" advantages to be gained, the business risk was extremely high, given the unpredictable nature of Bangladesh's political and regulatory environment. Telenor had never previously conducted business in a developing country in Asia, and Bangladesh seemed aeons away from Norway. While Mr. Hermansen's top advisors were "torn" about whether or not to foray into Bangladesh, Mr. Hermansen was enthusiastic, and provided patronage for the project to move forward4).

So, in 1996, Telenor and Grameen Telecom formed a joint venture company called GrameenPhone Ltd (GP). Telenor provided 51 percent of the equity investment, Grameen Telecom provided 35 percent, Marubeni of Japan provided 9.5 percent, and Gonofone Development Corporation of USA provided the balance 4.5 percent. The company, GrameenPhone, was awarded a license to operate a nation-wide GSM-900 network on November 11, 1996. GP started its operation on March 26, 1997.

Einar Flydal (53) is cand.polit. from the University of Oslo, 1983, in political science, and holds a Master of Telecom Strategy from the Norwegian University of Science and Technology (NTNU), Trondheim, 2002. Apart from 1985–1993, when he was engaged in IT in education for the Ministry of Education and with small IT companies, Flydal has worked with Telenor in a variety of fields since 1983. He has also worked as a radio freelancer on minorities and music, on statistical indicators at the Chair of Peace and Conflict Research, Univ. of Oslo, and on work organisation on oil platforms at the Work Research Institute, Oslo. Present professional interests: environment, CSR, innovation, and ICT. [email protected]

In launching its Bangladesh operations, GrameenPhone knew that its commercial viability depended on meeting the large unmet need for telephony services in urban areas. Since its inception in 1997, GrameenPhone's subscription has doubled each year to reach 500,000 subscribers by December 2001. The company turned a profit three years later, in 2000, with even brighter business prospects ahead: Demand for mobile telephony services in Bangladesh is estimated at about 5 to 6 million subscribers (out of a population of 130 million people). GrameenPhone's growing mobile telephony network in the country, and its financial viability, allow Grameen Telecom's Village Phone Project to piggyback on it.

GrameenPhone sells air time in bulk to Grameen Telecom, which re-sells it to members of Grameen Bank in villages. The eventual goal is for one Grameen borrower in each of the nation's 68,000 villages to become the "telephone lady" for her village. Some 10,000 villages had been covered by February 2002. The village telephone lady operates a mobile pay phone business, with the cheapest cellular rate in the world: 9 cents per minute during peak hours and 6.7 cents in the off-peak. Her "mobile" presence means that all village residents can receive and make telephone calls, obviating the need to install expensive large-scale telephone exchanges and digital switching systems.

More than 10,000 village phone ladies can proudly show you their new means of income and social advancement: the mobile phone

Strategic Importance of GrameenPhone
The Telenor – GrameenPhone venture is of tremendous strategic importance to Telenor for at least two compelling reasons:

1 GrameenPhone was a majority stake, start-up venture for Telenor, as opposed to a minority holding purchased in an already established telecommunications business venture (as is the case with Telenor's involvement in Thailand and Malaysia). So Telenor was involved in launching GrameenPhone from day one, thus experiencing the full gamut of pioneering experiences.

2 The Bangladesh venture, metaphorically speaking, was as "distant" as could be from Telenor's past business ventures. Here was an established, affluent, Norwegian corporation, on the cutting edge of telecommunications technology, adept at doing business in a stable political and regulatory environment, and in a country where the telephone density is the highest in the world, establishing a business venture in a far-away, fledgling, unserved and underserved Bangladeshi market, where the telephone density is about the lowest in the world, and the political and regulatory environment is relatively unstable. When one adds to this the social and cultural "distance" between Norwegians and Bangladeshis, one realizes this venture was bound to yield significant new learnings for Telenor.

Multiple Bottom Lines
Telenor's involvement in the GrameenPhone mobile telephony project in Bangladesh has yielded an impressive list of business and social accomplishments:

Market Penetration and Return on Equity
• As noted previously, GrameenPhone's subscription has more than doubled each year to reach 500,000 subscribers by December 2001, which represents the biggest subscriber base and coverage of any mobile telephony operator in Bangladesh, and in the entire South Asia region. Mobile telephony users5) (650,000) in Bangladesh in December 2001 outnumber the country's fixed-line telephone subscribers (590,000).
• GrameenPhone earned a profit in both 2000 and 2001 and is strategically positioned in 2002 for "explosive" growth. The company's market value now, estimated by its management at a modest $600 to $800 per subscriber, is $300 to $400 million (U.S.). These numbers suggest that GrameenPhone's present value to Telenor is about six to seven times its majority (51 percent) equity investment of $40-plus million.

Contribution to the Bangladesh Economy
• GrameenPhone, to date, has invested $160 million in Bangladesh, making it the largest foreign private investor in the country.
• GrameenPhone has to date contributed $75 million to the national treasury of Bangladesh in the form of new telephony tariffs, license fees, fees for leasing of the fiber-optic line, and other such receivables.
• By February 2002, GrameenPhone had directly created about 600 jobs (its employee strength) internally and 10,000 jobs externally in 10,000 Bangladeshi villages through the Village Phone Project of Grameen Telecom.

Service Provision in Unserved and Underserved Areas
• The 10,000 village-based mobile phones, leased or purchased through a loan by women members of the Grameen Bank (also called "village telephone ladies"), serve 18 million rural inhabitants who previously did not have access to telephony services. By the end of 2004, the number of village phones will likely increase to 40,000, serving an estimated 75 to 80 million rural inhabitants, about two-thirds of the entire Bangladesh population.
• The village phones, on average, generate 3–4 times more revenue for GrameenPhone than an individual city/township subscription.
• The village telephone ladies, on average, make $70 to $80 per month of net profit from selling mobile telephony services in rural areas, which amounts to three times the per capita GNP of Bangladesh.
• In overall terms, the Village Phone Project (VPP) makes telephony services accessible and affordable to poor, rural Bangladeshis, spurs employment, increases the social status of the village telephone ladies, provides access to market information and to medical services, and represents a tool to communicate with family and friends within Bangladesh and outside. Studies indicate that the VPP has had a tremendous positive economic impact in rural areas, creating a substantial consumer surplus, and immeasurable quality-of-life benefits (Richardson et al., 2000; Bayes et al., 1999). For instance, the village phone now obviates the need for a rural farmer to make a trip to the city to find out the market price of produce, or to schedule a transport pick-up. The village phone accomplishes the task at about one-fourth of the transportation cost and almost instantaneously (as compared to the hours it can take to make the trip), serving as a boon to the rural poor.

Gains in Intellectual and Structural Capital
What gains in intellectual and structural capital have accrued to Telenor through its involvement in GrameenPhone?

Pioneering Experience in Uncharted Markets
• Reaching out to relatively underserved and unserved telecommunications markets, while relatively risky for an organization like Telenor, is a viable business proposition, especially if it wishes to expand operations geographically and pioneer in gaining new business competencies. Being an early entrant in the field of mobile telephony in a relatively unserved and underserved telecommunications market bodes well for Telenor's ability to maintain its dominant market leader position in Bangladesh, and to forge new opportunities elsewhere.

Rethinking Benchmarks
• Telenor has learned that conventional European benchmarking for estimating market potential – using measures of per capita GNP or Western patterns of telecommunications traffic – may be inappropriate or at least inadequate in the Asian context. In Bangladesh, for instance, people spend a much higher percentage of their disposable income on telephony than in Western countries. Also, Asian cultures, by virtue of their "collectivistic" orientation and extended kinship structures, spur more frequent telephone talk between family members and friends, and for extended durations.

Teleaccess Business Models
• In relatively underserved and unserved telecommunication markets, business potential should be evaluated not just on the basis of teledensity (or potential ownership) but also on teleaccessibility (or potential for providing access to those who cannot afford to own a subscription), as evidenced by the large reach of the Village Phone Project.

In towns, the phone service shop may often be split between a front office for men, and a back entrance for women. The phone lady in front of her books shows people that gender patterns can be changed

Choice of Partner
• In relatively "uncharted" markets or new geographies, it is of critical importance to choose a suitable local partner who adds long-term complementary value. Telenor's major partner in Bangladesh is Grameen Telecom, a non-governmental organization floated by the internationally acclaimed microlending institution, Grameen Bank, which has 2.4 million borrowers in 41,000 of the 68,000 Bangladeshi villages. In Bangladesh (and overseas), "Grameen" has tremendous brand equity by virtue of its widespread success in poverty alleviation, empowerment of rural women, and its well-known credo that "good development is good business" (the slogan of the Village Phone Project). Many in Bangladesh feel that the "Grameen" brand is far more recognized in Bangladesh than even Coca Cola! So branding the new venture "GrameenPhone" brought instant credibility to Telenor's business venture in Bangladesh. Also, Telenor's partnering with Grameen Telecom made possible the Village Phone Project, whereby Grameen Bank borrowers who take loans to lease or purchase the mobile telephone sets now settle their monthly telephone bills through the bank workers. The already existing village-based loan disbursement and repayment infrastructure of the Grameen Bank allows the logistics of the Village Phone Project to be handled at a very small additional marginal cost. As noted previously, while the Village Phone Project represents some 10,000 subscribers (2 percent of GrameenPhone's 500,000 subscribers), and a relatively small percentage of its revenues (6 to 8 percent), it yields very high social impact in terms of reaching 18 million rural Bangladeshis who previously did not have access to telephony services, resulting in significant quality-of-life enhancements for them.

• In an "uncharted" market or "new geography", a local partner can also play a significant role in familiarizing an organization like Telenor with the various political, regulatory, social, and cultural uncertainties, and in helping it to cope with them.

Two-Way Learning
• Telenor's foray into Bangladesh has not been a one-way flow of capital, technology, and organizational structures from Norway to Bangladesh. Rather, technology transfer, knowledge sharing, and capacity building have occurred in both directions, accruing significant benefits for Telenor. Telenor has helped create a corporate culture at GrameenPhone that is perceived by its Bangladeshi employees as being democratic, relatively non-hierarchical, merit-based, and gender-sensitive. Also, Telenor pioneered in Bangladesh the idea of integrating health, safety, and environmental issues in its business practices: For instance, it has immunized all its employees against the Hepatitis-B virus, and fields a doctor in its corporate office who provides medical consultation to employees and regularly conducts training programs on occupational health and safety. In turn, employees of GrameenPhone in Bangladesh have pioneered several operational innovations that hold tremendous value for Telenor in its greenfield companies and other established markets. For instance, the Customer Relations Division of GrameenPhone developed in-house software that cuts down the customer phone activation procedure from 19 computer keystrokes to two keystrokes. This innovation has significantly enhanced employee productivity, obviating the need for hiring additional personnel. One employee can now activate up to 2,000 telephone subscriptions a day, as compared to a paltry 150 previously. Telenor is presently sharing this GrameenPhone customer activation software, through a CD-ROM, in other geographies.

Human Resources for a Global Marketplace
• Over 50 Telenor officials have spent varying periods of time in Bangladesh, gaining invaluable experience in living and conducting business in a foreign nation's political, regulatory, bureaucratic, social, and cultural environment. Such experiences, laced with all kinds of uncertainty, adjustment, acculturation, and new learnings, contribute significantly to a corporation's human resources in a global playing field.

Lessons for Business and Corporate Social Responsibility
In addition to an impressive list of commercial and social accomplishments in Bangladesh, tremendous public relations and promotional benefits accrue to Telenor by cooperating with the Grameen family of companies, which represents an icon of development organizing to the outside world. When the world's leading agenda-setter, the U.S. President (at that time, Bill Clinton), visits the village telephone ladies in Bangladesh and hails the integrated business and social aspects of their venture (as expressed in the statement listed at the top of this case study), mass media, policy-makers, corporations, and the public all over the world take notice.

Women's network group leaving the Grameen Bank weekly micro-repayment meeting. Grameen Bank village facilities: building in corrugated iron sheets – itself a symbol of progress

Specifically, Telenor's foray into Bangladesh highlights the following lessons for its business and corporate social responsibility functions:

#1 Sound business means subscribing to multiple, co-existing, and mutually reinforcing (win-win) bottom lines, which also implies acting as a socially responsible corporation. In Bangladesh, Telenor's multiple bottom lines included:
• Meeting commercial interests in terms of revenues, profits, and growth.
• Meeting social cause-related interests in terms of serving unserved and underserved markets nationally, and also serving poor, rural, illiterate inhabitants who are traditionally excluded from markets (thus overcoming the digital divide).
• Gaining a substantial amount of experience in overseas operations by doing business in a "distant" geography and an unfamiliar market, which helps build intellectual and structural capital for future ventures.
• Gaining in "image" and "prestige" by partnering in a unique commercial and social experiment with an internationally acclaimed, branded local partner, the Grameen Bank. The "value" of endorsements from such world luminaries as U.S. President Bill Clinton, or the "value" of the GSM Community Award bestowed on the Village Phone Project during the GSM World Congress in Cannes, France, in 2000, is hard to gauge in pure economic terms, pointing to the value of recognizing multiple bottom lines.

#2 Corporate social responsibility does not mean merely "showcasing" one initiative (such as the Bangladesh case), but rather integrating CSR as a competitive asset in all business ventures. Hence, true corporate social responsibility means integrating all business functions with a social imperative, and measuring the effectiveness of the CSR function not just by what Telenor has achieved to date in Bangladesh, but by what more it can achieve in the long term.

At the present time, Telenor's operations in Bangladesh have centered around only one of its core competencies, i.e. mobile telephony. However, every aspect of Telenor's business (Internet services, communication satellites for narrow and broadband services, interactive Web-based services, cable television, telemedicine, fixed and mobile telephony services, and others) holds a strategic business potential in Bangladesh, and in other developing country markets. How can this business potential be tapped and leveraged in new geographies?

Telenor's well-established partnership with the Grameen Bank – which has launched several new information technology companies such as Grameen Telecom, Grameen Communications, Grameen Software, Grameen Cybernet, Grameen Shakti (power), and others – positions Telenor, like no other corporation in the world, to experiment with new initiatives in E-health, E-education, E-commerce, E-banking, and other services that may have a long-term business as well as a social function. For instance, GrameenPhone has 1,800 kilometers of available optical fiber (leased from Bangladesh Railways), which to date has been barely utilized. Can Telenor leverage its relationship with the Grameen family of companies to develop new business ventures with a social imperative?

GrameenPhone's main income source is the cities. Here publicity boards in one of Dhaka's most fashionable hotels, shown to us by GrameenPhone's information officer, Yamin Bakht

A new initiative: One of several Grameen IT education centers, providing the resources for self-reliant software development and distance services like secretarial, programming, punching etc.

To profit further from the lessons learned, should Telenor seriously consider looking at Bangladesh as the prime location for establishing an independent R&D and/or Business Development Center, mainly hiring talented Bangladeshi personnel to experiment with new initiatives in E-health, E-commerce, E-education, E-banking, and other need-based applications to establish ventures in one of the most "unserved" and "underserved" world markets? With its existing physical presence in Bangladesh through GrameenPhone, an already established relationship with a branded local partner in the Grameen Bank, and a tremendous base of already-gained intellectual and structural capital, can Telenor be at the forefront of developing new products and services which can be economically viable and also address the needs of unserved and underserved markets of Asia, Africa, and Latin America? Could Telenor, for instance, in association with the Grameen family of companies, experiment with the synergies that arise from the presence of credit (provided by Grameen Bank), connectivity (provided by GrameenPhone), and energy (provided by Grameen Shakti through solar panels) in unserved and underserved markets? With the microcredit movement growing by leaps and bounds around the world, and with the Grameen family of companies leading this march (Grameen replication efforts are now underway in over 75 countries), could Telenor position itself for new market opportunities as no other corporation could?

In summary, Telenor's mobile telephony operations in Bangladesh in co-operation with the Grameen system suggest that a corporation can strategically pursue multiple bottom lines. This does not imply that such a success story is always a clear and neat result of plans and detailed oversight. On the contrary, new options and innovative entrepreneurship also imply trials and errors and new and uncommon problems to solve. Nonetheless, Telenor's operations in Bangladesh, especially the strategic integration of business and corporate social responsibility functions, have resulted in an exemplary first mile in a long marathon race. Will Telenor consider running the full race?

References
Bayes, A, von Braun, J, Akhter, R. 1999. Village Pay Phones and Poverty Reduction. Bonn, Germany, Center for Development Research.
Hood, J. 1996. The Heroic Enterprise. New York, Free Press.
Richardson, D, Ramirez, R, Haq, M. 2000. Grameen Telecom's Village Phone Programme in Rural Bangladesh: A Multimedia Case Study. Ottawa, Canada, Canadian International Development Agency.
Yunus, M. 1999. Banker to the Poor. New York, Public Affairs.

Endnotes
1) We thank the following individuals who helped us in implementing the present project: Tormod Hermansen, CEO of Telenor; Beth Tungland, Senior Vice President, Telenor's Corporate Social Responsibility function; Marit Reutz, Director, Telenor Corporate University; Sigve Brekke, Managing Director, Telenor Asia; Ola Ree, Managing Director of GrameenPhone; Professor Muhammad Yunus, Managing Director of the Grameen Bank; Mr. Khalid Shams, Deputy Managing Director of the Grameen Bank; Syed Yamin Bakht, Additional General Manager of Information, GrameenPhone; and various others.

2) Our data-collection procedures consisted of (1) extensive archival research, including reading of various evaluation reports on the GrameenPhone project (for instance, the Richardson et al., 2000, and the Bayes et al., 1999, reports), and of books written on the Grameen Bank, including Professor Muhammad Yunus' (1999) book, Banker to the Poor, and others; (2) in-depth interviews at Telenor AS, Norway, with key individuals involved in the planning and implementation of the Telenor-GrameenPhone project in Bangladesh, and with their Bangladeshi counterparts, including Professor Muhammad Yunus, Managing Director, and Mr. Khalid Shams, Deputy Managing Director of Grameen Bank; (3) a two-week field visit to Bangladesh for observation of, and in-depth interviews with, key principals at GrameenPhone and the Village Pay Phone projects of Grameen Telecom. Our field-based activities in Bangladesh yielded about 30 in-depth taped interviews, several volumes of field notes, and over 250 photographs.

3) By December 2001, there is one telephone in Bangladesh for every 200 people, largely as a result of Telenor-GrameenPhone's mobile telephony operations.

4) Several middle and senior managers at Telenor continue to be worried about the sustainability of the GrameenPhone initiative. For the GrameenPhone business to continue growing, large capital investments are continually needed. In 2001, Telenor's other equity partners in GrameenPhone (Grameen Telecom, Marubeni, and Gonofone) were unable to raise their share of the new investments, which put Telenor in the awkward position of somehow raising (through credit) the needed funds at the last minute.

5) Some 150,000 mobile telephony users are served by other competitors of GrameenPhone.

Status

Introduction

PER HJALMAR LEHNE

In this issue of the Status section of Telektronikk, we focus on one of the most important but often underrated aspects of modern telecommunications, namely security. The section contains only one paper; however, it is a very comprehensive description of UMTS Network Domain Security written by Geir M. Køien from Telenor R&D, who is a delegate in 3GPP SA3, the group dealing with security.

In his paper, which is a follow-up to a previous paper from 2000, he addresses the further developments of the security specifications in UMTS Releases 4 and 5. Work on security for the control plane of the UMTS core network started with Release 4. Here only security of the Mobile Application Part (MAP) of SS7 was specified. This is referred to as MAPsec. Security for IP-based control plane protocols was scheduled for Release 5.

The paper thus contains a fairly detailed description of MAPsec, as specified by SA3 and written by one of the key people in this work. Additionally, the Network Domain Security for IP (NDS/IP) is explained and the limitations of the IP security protocol (IPsec) are addressed. He concludes with what lies ahead for UMTS Network Domain Security, namely the introduction of a Public Key Infrastructure (PKI) to support the use of digital certificates. Secure authentication methods are probably among the most important mechanisms needed to facilitate wide use of e-commerce in general and m-commerce in particular.

Per Hjalmar Lehne (44) obtained his MSc from the Norwegian University of Science and Technology in 1988. He has since been with Telenor R&D working with different aspects of terrestrial mobile communications. From 1988 to 1991 he was involved in standardisation of the ERMES paging system in ETSI as well as in studies and measurements on EMC. His work since 1993 has been in the area of radio propagation and access technology, especially on smart antennas for GSM and UMTS. He has participated in the RACE 2 Mobile Broadband Project (MBS), COST 231, and COST 259, and has since 2001 been vice-chairman of COST 273. His current interests are in 4th generation mobile systems and the use of MIMO technology in terrestrial mobile networks, where he participates in the IST project FLOWS. [email protected]

UMTS Network Domain Security

GEIR M. KØIEN

Geir Køien (36) has been an employee of Telenor R&D since 1998. He has been working with various mobile systems since 1991 and is interested in security and signalling aspects of mobile systems. He is the Telenor delegate to 3GPP SA3 (Security), where he has been rapporteur for the Network Domain Security specifications 3GPP TS 33.200 and 3GPP TS 33.210. He is also pursuing a PhD at Agder University College. [email protected]

1 Introduction
This article is a follow-up to the article Overview of UMTS security for Release 99 in Telektronikk 1.2000 [1]. During the last two years the UMTS security architecture has evolved to include security for the control plane of the core network as well as security for the new IP Multimedia Subsystem (IMS) architecture.

This article will focus on describing the services and features of the core network control plane security extensions. These are collectively called Network Domain Security and comprise two technical specifications.

A brief description will also be given of the way forward for the UMTS security specification process.

1.1 Security Features for UMTS Release 4
The work to provide security for the control plane of the UMTS core network started for real with UMTS Release 4 1). This work took place under the work item name Network Domain Security (NDS), and the goal was to secure all important control plane protocols in the core network. This included both protocols based on the telephone signalling system (SS7) protocol stack and protocols based on the IP protocol stack. It was realized that the SS7-based and IP-based protocols were sufficiently different to warrant separate security solutions. The work to protect IP-based protocols was scheduled for UMTS Release 5.

During the process for Release 1999 it was realized that protecting SS7-based protocols would inevitably mean protecting them at the application layer. The drawback of implementing protection at the application layer is that the target protocol itself will have to be updated in a pervasive and non-trivial way. This is an expensive and time-consuming process that would have to be repeated for every target protocol. After careful deliberations it was found that one could not afford to protect more than a selected set of protocols, and in the end it was decided that the only SS7-based protocol that would be protected was the Mobile Application Part (MAP) protocol [2].

The reason for choosing to protect MAP is that MAP is a crucial core network protocol that provides mobility management services and distributes the Authentication Vector (AV) security data from the HLR/AuC to the VLR/SGSN.

The technical specification 3GPP TS 33.200 Network Domain Security; MAP application layer security [3a] was completed for Release 4 by June 2001. The specification, which is often just called MAPsec, contained procedures for secure transport of MAP messages between MAP network elements, but lacked mechanisms for key negotiation and distribution. The key negotiation and distribution procedures are scheduled to be included in the Release 5 version of TS 33.200 [3b].

1) The naming conventions for the releases changed, and Release 4 is the release that follows Release 1999.

1.2 Network Domain Security Features for UMTS Release 5
The main Network Domain Security goals for UMTS Release 5 are:
a) to provide Network Domain Security protection for IP-based control plane protocols;
b) to complete the MAP security architecture to include key negotiation and distribution procedures.

The Network Domain Security for IP-based control plane protocols is based on the IETF IPsec protocols. The work was completed when the technical specification 3GPP TS 33.210 Network Domain Security; IP network layer security [4] was approved in March 2002.

2 Network Domain Security; MAP Application Layer Security
In this section an attempt is made to explain the technical realization of the MAP security protection.

2.1 MAPsec Security Services
The security services provided by MAPsec are:
• Cryptographic data integrity of the MAP messages;
• Data origin authentication for the MAP messages;

• Replay protection for the MAP messages;
• Confidentiality (encryption) for the MAP messages.

2.2 The MAP Security Architecture
The main MAP security architecture consists of the following elements and interfaces:

• Key Administration Centre (KAC) – Release 5
This new network element is responsible for key negotiation and distribution between network operators. The KAC is part of TS 33.200 Release 5 [3b].

• MAP Network Elements (MAP-NE)
The MAP network elements must be updated to support the Zf-interface to participate in secure MAP communication. MAP-NEs conforming to MAPsec Release 5 specifications must also support the Ze-interface.

Figure 1 Overview of the Zd, Ze and Zf-interfaces (from TS 33.200 [3a])

• Zd-interface (KAC – KAC) – Release 5
The Zd-interface is an IP-based interface that is part of MAPsec Release 5. It is used to negotiate MAPsec Security Associations (SAs)2) between MAP security domains. The only traffic over the Zd-interface is the Internet Key Exchange (IKE) negotiation of MAPsec security associations. The semantics of the MAPsec SAs are defined in the informational RFC “The MAP Security Domain of Interpretation for ISAKMP”. At present only a draft version of the MAPsec DoI is available (draft-arkko-map-doi-04.txt [5]). The security services specified by the security association are encoded in the protection profile information element.

• Ze-interface (KAC – MAP-NE) – Release 5
The Ze-interface is an IP-based interface that is part of MAPsec Release 5. This interface provides distribution of security association data from a KAC to a MAP-NE within one operator domain.

• Zf-interface (MAP-NE – MAP-NE) – Release 4
The Zf-interface is a MAP interface that is part of MAPsec Release 4. The MAP-NEs may be from the same security domain or from different security domains (as shown in Figure 1). The MAP-NEs use MAPsec security associations received from a KAC to protect the MAP operations. The MAP operations within the MAP dialogue are protected selectively according to the chosen MAPsec protection profile.

For the purpose of the actual protection, the SecureTransport meta-operation component is used. The original MAP component is encapsulated in SecureTransport.

From an architectural point of view, one should note that the use of IKE for key negotiation introduces an anomaly. Using IKE on the Zd-interface may seem an obvious choice, since IKE is a protocol specifically designed to carry out key negotiations. However, IPsec/IKE makes the basic assumption that IKE negotiates security associations on behalf of the network element on which it resides. For MAPsec this will not be the case, since the KAC shall not use the MAPsec SA itself. Furthermore, MAPsec security associations are valid on a network-to-network basis and not individually between the communicating parties.

This means that one pair of security associations will be used for all MAPsec communication between two domains. In Figure 1, MAP-NE_A1 and MAP-NE_A2 will use the same security association pair when communicating with MAP-NE_B1. This also extends to the case where two MAP-NEs within the same security domain need to engage in MAPsec secured dialogues, but here the security association (SA) pair must be a special initiator-SA_self and responder-SA_self pair.
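The consequence of domain-level SAs can be sketched in a few lines of Python. This is a minimal illustration under assumed names only: the DOMAIN_OF table, the NE labels borrowed from Figure 1 and the SA label strings are hypothetical and are not TS 33.200 structures. It shows just the selection rule described above: the SA pair depends on the pair of security domains, not on the individual MAP-NEs, and dialogues inside one domain use the SA_self pair.

```python
from typing import Dict, Tuple

# Hypothetical mapping of MAP network elements to security domains (cf. Figure 1).
DOMAIN_OF: Dict[str, str] = {
    "MAP-NE_A1": "A",
    "MAP-NE_A2": "A",
    "MAP-NE_B1": "B",
}


def sa_pair_for(local_ne: str, peer_ne: str) -> Tuple[str, str]:
    """Return labels for the pair of unidirectional MAPsec SAs a dialogue uses.

    Only the security domains of the two MAP-NEs are consulted, so every
    MAP-NE in domain A shares the same pair towards domain B.
    """
    local_domain = DOMAIN_OF[local_ne]
    peer_domain = DOMAIN_OF[peer_ne]
    if local_domain == peer_domain:
        # Dialogues inside one security domain use the special SA_self pair.
        return (f"initiator-SA_self[{local_domain}]",
                f"responder-SA_self[{local_domain}]")
    # Inter-domain: one unidirectional SA per direction between the two domains.
    return (f"SA[{local_domain}->{peer_domain}]",
            f"SA[{peer_domain}->{local_domain}]")


if __name__ == "__main__":
    print(sa_pair_for("MAP-NE_A1", "MAP-NE_B1"))  # ('SA[A->B]', 'SA[B->A]')
    print(sa_pair_for("MAP-NE_A2", "MAP-NE_B1"))  # same pair as the line above
    print(sa_pair_for("MAP-NE_A1", "MAP-NE_A2"))
```

Keying the lookup on domains rather than on network elements is the point of the sketch: adding MAP-NE_A3 to domain A would not add any new inter-domain SA.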

2) A MAPsec Security Association (SA) is a unidirectional logical control channel for a secured connection. Among other parameters, the SA specifies the security services, the algorithms and the lifetime of the secured connection. Due to different requirements, a MAPsec SA is different from an IPsec SA.

2.3 MAPsec Protection Modes
MAPsec provides for three different protection modes. These protection modes determine how the original MAP component is protected:

• Protection Mode 0: No Protection
Protection Mode 0 uses the encapsulation provided by MAPsec, but offers no cryptographic protection.

• Protection Mode 1: Integrity, Authenticity
In Protection Mode 1 the protected payload is a concatenation of the cleartext and the Message Authentication Code (MAC) generated by the integrity function f7.

• Protection Mode 2: Confidentiality, Integrity, and Authenticity
Integrity in Protection Mode 2 is achieved by the same means as for Protection Mode 1. Confidentiality is achieved by encrypting the cleartext using the encryption function f6.

Note that in Protection Mode 0 no protection is offered and that the “protected” payload is identical to the payload of the original MAP message. However, since a Protection Mode 0 component is encapsulated by means of SecureTransport, it is not identical to the original component when it comes to the processing steps taken by MAP/MAPsec.

2.4 MAPsec Protection Profiles
The notion of a MAPsec protection profile was invented to simplify negotiation of security associations between roaming partners. The idea was to make agreements on the extent of MAPsec protection very simple, both qualitatively and quantitatively. This would make it much easier for roaming partners to create mutually compatible MAPsec security requirements. Unfortunately, the actual construction of the protection profiles has become somewhat counterintuitive and slightly inflexible, and may appear a bit contrived.

Technically, MAPsec protection is specified per MAP operation component. The MAPsec protection profiles are organized by means of protection groups, which are loosely organized around the various transactions (dialogues) that MAP executes. Each protection group defines a set of MAP operations and their protection modes at the operation component level. The concept of “protection level” is introduced to administer the protection modes at operation component level: the protection level of an operation determines the protection modes used for the operation’s components according to Table 1.

Table 1 MAPsec protection levels (from TS 33.200 [3a])

Protection level | Protection mode for invoke component | Protection mode for result component | Protection mode for error component
1 | 1 | 0 | 0
2 | 1 | 1 | 0
3 | 1 | 2 | 0
4 | 2 | 1 | 0
5 | 2 | 2 | 0
6 | 2 | 0 | 0

It should be noted that not all MAP operations/components are included in the protection groups. The operations/components that are not present in any protection group cannot be protected by means of MAPsec.

The protection profiles are composed of non-overlapping protection groups and are predefined in TS 33.200. The current set of protection groups and protection profiles may be extended in later releases.

2.5 MAPsec Security Policies
MAPsec security policies are structured around the protection profile concept. By choosing the appropriate protection profile, the operator thereby regulates the extent of security protection, both in terms of which operations to protect and at which protection level. The protection profile concept offers very simple policy management at the expense of flexibility.

Network operators must agree on which protection profile to use in bilateral agreements. These agreements should become part of the standard agreements between operators.

3 Network Domain Security; IP Network Layer Security (NDS/IP)
The most important IP-based protocol to protect in the UMTS core network control plane is the GTP-C protocol [6]. NDS/IP is defined on the network layer and it is therefore easy to adapt NDS/IP to protect GTP-C. In fact, no changes are required to the target protocol.

Technical specification 3GPP TS 33.210 Network Domain Security; IP network layer security [4] defines how the IP-based core network control plane protocols can be protected. The IETF already has a stable and well-defined architecture for IP security (IPsec, RFC-2401 [7]) at the network layer. It was therefore an obvious choice for SA3 to base NDS/IP on the security services afforded by IPsec. For the purpose of the closed domain of NDS/IP, many of the options and services of IPsec were redundant. The use of IPsec in NDS/IP is therefore profiled to remove unnecessary functionality. The driving force behind the profiling was to reduce complexity in order to facilitate stable and effortless interoperability.

3.1 NDS/IP Security Services
IPsec offers a set of security services by means of the two security protocols Authentication Header (AH) (RFC-2402, [8]) and Encapsulating Security Payload (ESP) (RFC-2406, [9]). For NDS/IP the IPsec security protocol shall always be ESP, and it shall always be used in tunnel mode. Tunnel mode is an IPsec mode that provides protection for the whole of the original IP packet and is typically used between security gateways. The other mode, transport mode, is targeted specifically towards end-to-end communication between users, where the primary goal is to protect the payload portion of the original packet, i.e. to provide protection for application data.

The security services provided by NDS/IP are:
a) Connectionless data integrity
b) Replay protection
c) Data origin authentication
d) Data confidentiality for the whole original IP packet (optional)
e) Limited protection against traffic flow analysis when confidentiality is applied

List-1 Security services provided by NDS/IP

When using NDS/IP, the minimum protection level provided shall be integrity protection/message authentication with replay protection. This amounts to the services a), b) and c) in List-1. The ESP authentication mechanisms provide these security services.

Confidentiality protection (encryption) is an option when using NDS/IP, and for NDS/IP it shall always be used in conjunction with integrity protection, as recommended in the ESP RFC [9]. The ESP confidentiality mechanisms are used to provide services d) and e).

NDS/IP also specifies the use of the Internet Key Exchange (IKE) (RFC-2409, [10]) protocol for automatic key negotiation and distribution.

3.2 IPsec Limitations
IPsec operates at the IP layer and provides its security services to the transport layer protocols. The transport layer protocols have traditionally been the User Datagram Protocol (UDP) [11] and the Transmission Control Protocol (TCP) [12]. Since IPsec is located at the network layer, its processing rules are exclusively based on information in the IP header. This means that IPsec cannot identify logical connections at higher layers and cannot differentiate its services beyond the information found in the IP headers. For the purpose of the UMTS core network control plane protocols, this will not normally matter. The exception is that IPsec cannot discriminate on the contents of IP tunnels. This means that NDS/IP, located at the UMTS control plane, cannot selectively protect the contents of the GTP-U protocol [6], since it cannot inspect the contents of the tunnel.

As mentioned above, IPsec would normally be used to provide security services to the transport protocols TCP and UDP. This is all fine, except that the new IP multimedia services in UMTS will also use the new Stream Control Transmission Protocol (SCTP) transport layer protocol [13]. Among other things, SCTP is capable of supporting multiple IP addresses at each endpoint (multi-homing), and this is currently a problem for IPsec. For the time being one must therefore accept that IPsec, and therefore NDS/IP, cannot be guaranteed to support SCTP. Work is ongoing in the IETF to solve this problem, and NDS/IP will be updated to support SCTP when a solution is finalized for IPsec.

3.3 The NDS/IP Security Architecture
The NDS/IP key management and distribution architecture is based on the IPsec IKE protocol. This protocol provides automated Security Association (SA) negotiation and distribution for the IPsec protocols. During SA negotiation, IKE goes through two phases. Phase 1 is the set-up of the IKE “control channel”, and phase 2 is where the actual IPsec SA negotiation takes place.

The basic idea of the NDS/IP architecture is to provide hop-by-hop security. This is in accordance with the chained-tunnels or hub-and-spoke models of operation. The use of hop-by-hop security also makes it easy to operate separate security policies internally and towards other external security domains.

In NDS/IP only the Security Gateways (SEGs) shall engage in direct communication with entities in other security domains for NDS/IP traffic. The SEGs will then establish and maintain IPsec secured ESP tunnels between security domains. All NDS/IP traffic from an NE in one security domain towards an NE in a different security domain will be routed via an SEG and will be afforded chained-tunnels security protection towards the final destination.

Operators need only establish one ESP tunnel between two communicating security domains, which gives coarse-grained security. Alternatively, the operators may set up separate tunnels for each of the protocols and services that are protected3).

Figure 2 NDS architecture for IP-based protocols (from TS 33.210 [4]): in each of security domains A and B, the NEs are connected to each other and to their security gateway (SEG_A, SEG_B) over Zb; SEG_A and SEG_B are connected over Za by an IKE “connection” and an ESP tunnel.

The following interfaces are defined for protection of native IP-based protocols:

• Za-interface (SEG-SEG)
The Za-interface covers all NDS/IP traffic between security domains. The SEGs use IKE to negotiate, establish and maintain a secure ESP tunnel between them.

• Zb-interface (NE-SEG / NE-NE)
The Zb-interface is located between SEGs and NEs and between NEs within the same security domain. The Zb-interface is optional. Normally ESP shall be used with both encryption and authentication/integrity, but an authentication/integrity-only mode is allowed. The ESP tunnel shall be used for all control plane traffic that needs security protection. Whether the tunnel is established when needed or a priori is for the security domain operator to decide. The tunnel is subsequently used for exchange of NDS/IP traffic between the NEs.

The security policy established over the Za-interface is subject to roaming agreements. This differs from the security policy enforced over the Zb-interface, which is unilaterally decided by the security domain operator.

There is no NE-NE interface for NEs belonging to separate security domains, because it is important to have a clear separation between the security domains. This is particularly relevant when different security policies are employed within the security domain and towards external destinations. The restriction against secure inter-domain NE-NE communication does not prevent a single physical entity from containing both NE and SEG functionality.

3.4 NDS/IP Encryption Algorithms
IPsec offers a wide set of confidentiality transforms. The mandatory transforms that compliant IPsec implementations must support are the ESP_NULL and the ESP_DES transforms. However, the Data Encryption Standard (DES) transform is no longer considered cryptographically strong enough. For NDS/IP, neither of the mandatory transforms is allowed.

The new Advanced Encryption Standard (AES) [14] developed by NIST is expected to be available for IPsec shortly. For the purpose of NDS/IP, the ESP_AES transform(s) will be made mandatory when they are approved by the IETF. In the meantime the 3DES transform is to be used.

3.5 NDS/IP Integrity Algorithms
The integrity transforms that compliant IPsec implementations are required to support are the ESP_NULL, the ESP_HMAC_MD5 and the ESP_HMAC_SHA-1 transforms. Since NDS/IP traffic always requires the anti-replay service, which ESP can only provide together with integrity protection, the ESP_NULL transform is not allowed in NDS/IP.
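The profiling rules in sections 3.1, 3.4 and 3.5 amount to a short checklist for any proposed SA. The Python sketch below encodes that checklist as an illustration only: the SaProposal record and the transform label strings are hypothetical stand-ins, not the data structures of any IKE implementation, and modelling the authentication/integrity-only mode as encryption=None is an assumption made for the example. ESP_AES is accepted on the expectation, stated above, that it will become mandatory once approved.

```python
from dataclasses import dataclass
from typing import List, Optional

# Transform labels as written in the text; plain strings, not library constants.
FORBIDDEN_ENCRYPTION = {"ESP_NULL", "ESP_DES"}           # section 3.4
ALLOWED_ENCRYPTION = {"ESP_3DES", "ESP_AES"}             # 3DES now, AES once approved
ALLOWED_INTEGRITY = {"ESP_HMAC_MD5", "ESP_HMAC_SHA-1"}   # ESP_NULL excluded (3.5)


@dataclass(frozen=True)
class SaProposal:
    """Hypothetical, simplified view of a proposed IPsec SA."""
    protocol: str              # "ESP" or "AH"
    mode: str                  # "tunnel" or "transport"
    integrity: str             # integrity transform label
    encryption: Optional[str]  # None models integrity-only operation (assumption)


def nds_ip_violations(p: SaProposal) -> List[str]:
    """List the NDS/IP profiling rules the proposal breaks; empty means compliant."""
    problems: List[str] = []
    if p.protocol != "ESP":
        problems.append("security protocol must be ESP")
    if p.mode != "tunnel":
        problems.append("ESP must be used in tunnel mode")
    # Integrity, replay protection and origin authentication are the minimum.
    if p.integrity not in ALLOWED_INTEGRITY:
        problems.append(f"integrity transform {p.integrity} not allowed")
    # Confidentiality is optional, but when present it must use a strong cipher.
    if p.encryption is not None:
        if p.encryption in FORBIDDEN_ENCRYPTION:
            problems.append(f"encryption transform {p.encryption} not allowed")
        elif p.encryption not in ALLOWED_ENCRYPTION:
            problems.append(f"encryption transform {p.encryption} not recognised")
    return problems
```

For example, a tunnel-mode ESP proposal with ESP_HMAC_SHA-1 and ESP_3DES yields an empty list, while a transport-mode proposal using ESP_NULL integrity and ESP_DES encryption is rejected on three counts.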

3) IPsec will normally discriminate traffic based on the quintuple (source IP address, destination IP address, source port number, destination port number and transport protocol identity).
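Because IPsec can only tell traffic apart by this quintuple, NDS/IP can separate protection per protocol but not per tunnel content. A minimal Python sketch of first-match selector lookup follows. The Selector and Quintuple types, the 10.x address ranges and the tunnel names are hypothetical; None is used as a wildcard for simplicity; and GTP-C is assumed to run over UDP port 2123 (GTP-U over UDP port 2152).

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Quintuple:
    src: str
    dst: str
    sport: int
    dport: int
    proto: str  # transport protocol identity, e.g. "UDP", "TCP", "SCTP"


@dataclass(frozen=True)
class Selector:
    """Simplified policy selector; None in a field matches any value."""
    src_net: str
    dst_net: str
    sport: Optional[int]
    dport: Optional[int]
    proto: Optional[str]

    def matches(self, q: Quintuple) -> bool:
        return (ip_address(q.src) in ip_network(self.src_net)
                and ip_address(q.dst) in ip_network(self.dst_net)
                and self.sport in (None, q.sport)
                and self.dport in (None, q.dport)
                and self.proto in (None, q.proto))


def select_tunnel(policy: List[Tuple[Selector, str]], q: Quintuple) -> Optional[str]:
    """Return the tunnel of the first matching selector, or None."""
    for selector, tunnel in policy:
        if selector.matches(q):
            return tunnel
    return None


# Hypothetical policy: a dedicated tunnel for GTP-C, one catch-all tunnel for the rest.
POLICY = [
    (Selector("10.1.0.0/16", "10.2.0.0/16", None, 2123, "UDP"), "esp-tunnel-gtp-c"),
    (Selector("10.1.0.0/16", "10.2.0.0/16", None, None, None), "esp-tunnel-default"),
]
```

With this policy, GTP-C datagrams land in their own tunnel, while GTP-U datagrams fall through to the default tunnel whatever they carry: nothing inside the GTP-U payload ever reaches the lookup.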

Figure 3 IPsec used with chained tunnels, i.e. the hierarchy is still kept, with one Certificate Authority (CA) per security domain, which must be cross-certified with roaming partners (from S3-010622 [15]). Shown: CA_A and CA_B, cross-certified off-line, each publishing certificates/CRLs to its repository (Repository_A, Repository_B); SEG_A and SEG_B authenticate IKE phase 1 with Cert(SEG_A)CA_A and Cert(SEG_B)CA_B.

The new Advanced Encryption Standard (AES) [14] developed by NIST is expected to be available for IPsec shortly. For the purpose of NDS/IP, the ESP AES MAC transform(s) will be made mandatory when they are approved by the IETF. NDS/IP implementations shall also support the ESP_HMAC_SHA-1 transform.

4 The Road Ahead for UMTS Network Domain Security

4.1 Scalability
For the IP-based services, a need is foreseen for an authentication framework for network elements. The basis for this assumption is that authentication for the IPsec IKE phase-1 “control channel” is currently provided by means of pre-shared secrets. Normally, the use of pre-shared secrets scales poorly. For NDS/IP the problem is manageable due to the inherent sectioning and hierarchy introduced by the security domains and the fact that protection is by means of chained tunnels.

Nevertheless, in an environment where the number of IP-capable network elements is expected to rise dramatically, a more scalable architecture is likely to be needed within the next few years. The rise in the number of IP-addressable network elements is likely to coincide with migration to IPv6. This migration will also mean that many of the technical obstacles to end-to-end security found in IPv4 will likely disappear or be mitigated.

So at the same time as the number of IP-addressable network elements is expected to rise sharply, it will also be possible to introduce true end-to-end communication. In theory, this will require authentication between all IP-addressable network entities for all UMTS operators. This amounts to n(n – 1)/2 mutual authentication associations, where n is the number of all IP-addressable UMTS network entities. The need for a truly scalable authentication framework will therefore be strong at that time.

With this as the background, SA3 has started work to prepare for the introduction of a Public Key Infrastructure (PKI) to support digital certificates in the core network. These digital certificates will be used solely for authentication purposes for the core network elements. IPsec/IKE has been designed with support for various authentication methods and can use digital certificates for authentication. This means that IPsec and IKE can still be used for provision of security services. Furthermore, IPsec/IKE is sufficiently flexible to allow for gradual migration to PKI and digital certificates as the preferred authentication method.

4.2 Authentication Framework
A new work item is being created in SA3 to address this issue. The new work item, which will likely result in a new technical specification, has the tentative title “Network Domain Security; Authentication Framework (NDS/AF)”. The main purpose of NDS/AF is to provide PKI-based entity authentication for network elements participating in NDS/IP secured communication.

The introduction of PKI in NDS/IP will probably be executed in phases. For the first phases, the requirement to always use chained tunnels will likely be kept. This will facilitate smooth migration towards full PKI support in NDS/IP for all operators. For instance, an operator may choose to introduce PKI very early within its own security domain while still using pre-shared secrets towards external destinations. Figure 3 shows a case where PKI-based authentication is used for IKE phase 1 over the Za-interface between the SEGs. In this case, operator A may also use PKI internally while operator B uses pre-shared secrets internally.

When PKI is fully integrated in NDS/IP and IPv6 is widely deployed in the UMTS core networks, the requirement in NDS/IP to use a chained-tunnel architecture will likely be relaxed to allow for direct end-to-end communication between NDS/IP peers.
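The arithmetic behind the scalability argument in section 4.1 is easy to check. The Python sketch below compares the n(n – 1)/2 full-mesh count with a simplified chained-tunnel count. As an assumption made for illustration, every NE holds exactly one association with its own SEG and the SEGs of all domains form a full mesh; the function names and example sizes are hypothetical. The same d(d – 1)/2 term for d domains also bounds the number of domain-level MAPsec SA pairs.

```python
def full_mesh_associations(n: int) -> int:
    """Mutual authentication associations when all n entities talk end-to-end."""
    return n * (n - 1) // 2


def chained_tunnel_associations(domains: int, nes_per_domain: int) -> int:
    """Simplified hub-and-spoke count (assumption: one SEG per domain).

    Each NE authenticates only to its own SEG (Zb), and the SEGs
    authenticate pairwise across domains (Za).
    """
    intra = domains * nes_per_domain
    inter = full_mesh_associations(domains)
    return intra + inter


if __name__ == "__main__":
    # Hypothetical example: 10 operators with 100 IP-addressable NEs each.
    print(full_mesh_associations(10 * 100))      # 499500
    print(chained_tunnel_associations(10, 100))  # 1045
```

The quadratic full-mesh term is what pre-shared secrets cannot absorb, whereas under these assumptions the chained-tunnel count grows roughly linearly with the number of NEs.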

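Note that there is no strong need for PKI for MAPsec, since the number of MAPsec security associations is limited by the number of security domains and not by the number of MAP network elements. Therefore, the use of pre-shared secrets for authentication of the MAPsec IKE phase-1 session is unlikely to introduce scalability problems for the foreseen lifetime of MAPsec4).

5 Summary
During the last two years the development of Network Domain Security in UMTS has come a long way.

Through the standards defined in 3GPP TS 33.200 and 3GPP TS 33.210 it will be possible to protect the MAP protocol, GTP-C and other IP-based protocols in the UMTS core network control plane.

The work to refine the security protection in the core network will continue. Automated key management will be developed for MAPsec in order to facilitate robust interworking and management for MAP security. Improved authentication services based on PKI are planned for NDS/IP. Support for new transport layer protocols like SCTP, as well as for new cryptographic algorithms like AES, will be added when they become available from the IETF.

A remaining obstacle to security in the UMTS core network control plane is for operators to actually deploy the security protocols in their networks. Let us hope that operators will adopt the new security standards and use the security services aggressively.

6 Acronyms and Abbreviations
The technical report 3GPP TR 21.905 [16] contains all the official acronyms and abbreviations that apply to more than one technical specification group. In addition, most technical specifications and reports contain a list of abbreviations relevant to the respective document.

For the purposes of the present document, the following abbreviations apply:

3GPP  3rd Generation Partnership Project (www.3gpp.org)
AES  Advanced Encryption Standard
CR  Change Request
DoI  Domain of Interpretation
f6  MAP encryption algorithm
f7  MAP integrity algorithm
IETF  Internet Engineering Task Force (www.ietf.org)
IKE  Internet Key Exchange
IP  Internet Protocol
IPsec  IP security protocols (as defined in RFC 2401)
ISAKMP  Internet Security Association and Key Management Protocol
KAC  Key Administration Centre
MAC  Message Authentication Code
MAP  Mobile Application Part
MAP-NE  MAP Network Element
MAPsec  MAP security – the MAP security protocol suite
NDS  Network Domain Security
NDS/IP  Network Domain Security; IP network layer security
NDS/MAP  Network Domain Security; MAP application layer security
NE  Network Entity or Network Element
NIST  National Institute of Standards and Technology (www.nist.gov)
PKI  Public Key Infrastructure
RFC  Request for Comments (IETF standards are published as standards-track RFCs)
SA  Security Association
SA3  Services and System Aspects: Working Group 3 (Security)
SCTP  Stream Control Transmission Protocol
SEG  Security Gateway
SS7  Signalling System No. 7
TCP  Transmission Control Protocol
UDP  User Datagram Protocol
UMTS  Universal Mobile Telecommunications System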
Support for new transport layer proto- NDS/MAP Network Domain Security; cols like SCTP as well as support for new cryp- MAP application layer security toalgorithms like AES will be added when they NE Network Entity or Network Element become available from the IETF. NIST National Institute of Standards and Technology (www.nist.gov) A remaining obstacle to security in the UMTS PKI Public Key Infrastructure core network control plane is for operators to RFC Request For Comment (IETF stan- actually deploy the security protocols in their dards are published as standards networks. Let us hope that operators will adopt track RFCs) the new security standards and use the security SCTP Stream Control Transmission services aggressively. Protocol SEG Security Gateway 6 Acronyms and Abbreviations SA Security Association The technical report 3GPP TR 21.905 [16] con- SA3 Services and system Architecture: tains all the official acronyms and abbreviations Work Group 3 (Security) that apply to more than one technical specifica- SS7 Signalling System No. 7 tion group. In addition, most technical specifica- TCP Transmission Control Protocol tions and reports contain a list of abbreviations UDP User Datagram Protocol with relevance to the respective document. UMTS Universal Mobile Telecommunica- tions System

4) Observe that MAP itself may outlive MAPsec. MAP can already be run over the IP protocol stack, and NDS/IP could then conceivably be used to secure MAP instead of MAPsec. On the other hand, MAPsec, being an application layer solution, can be used for both MAP-over-SS7 and MAP-over-IP.

Appendix A Resources

3GPP Resources
All meeting contributions, technical reports and technical specifications are found at the 3GPP web and ftp sites. The web site is at www.3gpp.org and the ftp site is at ftp.3gpp.org.

On the website, under the “specification” title, one will find information about the current set of specifications, the latest approved versions, etc. This is very convenient. The 3GPP website also contains a lot of other useful information and is well worth a closer look if one is interested in UMTS standardization.

IETF Resources
A number of 3GPP specifications draw on IETF standards. The IETF standards can be found at the www.ietf.org website.

The email Exploder Lists
For those interested in following the email communications of the technical specification groups (TSGs) there are essentially two ways to do this.

The first is to subscribe to the list(s) of interest. This is done at the list server web page at http://list.3gpp.org/. You simply pick out the groups of interest to you and subscribe to them. Be warned that many of the groups have a high volume of email contributions to the lists.

The second is simply to use the list server web pages to browse the up-to-date archives of all the list server email lists. This is convenient since one does not have to subscribe to a list to be able to browse it.

7 References
1 Køien, G M. Overview of UMTS security for Release 99. Telektronikk, 96 (1), 102–107, 2000.
2 3GPP. Mobile Application Part (MAP). (3GPP TS 29.002)
3a 3GPP. Network Domain Security; MAP application layer security (Release 4). (3GPP TS 33.200)
3b 3GPP. Network Domain Security; MAP application layer security (Release 5). (3GPP TS 33.200)
4 3GPP. Network Domain Security; IP network layer security (Release 5). Draft. (3GPP TS 33.210)
5 IETF. The MAP Security Domain of Interpretation for ISAKMP. (Work in progress.) http://www.ietf.org/internet-drafts/draft-arkko-map-doi-04.txt (Note: Internet-Drafts are removed after 6 months.)
6 3GPP. GPRS Tunnelling Protocol (GTP) across the Gn and Gp interfaces. (3GPP TS 29.060)
7 IETF. Security Architecture for the Internet Protocol. (RFC-2401)
8 IETF. IP Authentication Header. (RFC-2402)
9 IETF. IP Encapsulating Security Payload (ESP). (RFC-2406)
10 IETF. The Internet Key Exchange (IKE). (RFC-2409)
11 IETF. User Datagram Protocol. (RFC-768)
12 IETF. Transmission Control Protocol. (RFC-793)
13 IETF. Stream Control Transmission Protocol. (RFC-2960)
14 NIST. Specification for the Advanced Encryption Standard (AES). (FIPS-197) (Copies can be obtained at http://csrc.nist.gov/encryption/aes/)
15 3GPP. Using PKI to provide network domain security (Telenor/Nokia). (Temporary document S3-010622.)
16 3GPP. Vocabulary for 3GPP specifications. (3GPP TR 21.905)
