A Iilstory of the THEORY of INFORMATION
Total Page:16
File Type:pdf, Size:1020Kb
Paper No. 1177 621.391(091) RADIO SECTION A IIlSTORY OF THE THEORY OF INFORMATION By E. COLIN CHERRY, M.Sc., Associate Member. (The paper was first received 1th February, and in revised form 28th May, 1951.) SUMMARY most important of which has been his ability to receive, to The paper mentions first some essential points about the early communicate and to record his knowledge. Communication development of languages, codes and symbolism, picking out those essentially involves a language, a symbolism, whether this is a fundamental points in human communication which have recently been spoken dialect, a stone inscription, a cryptogram, a Morse-code summarized by precise mathematical theory. A survey of telegraphy signal or a chain of numbers in binary digital form in a modern and telephony development leads to the need for "economy," which computing machine. It is interesting to observe that as technical has given rise to various systems of signal compression. Hartley's applications have increased in complexity with the passage of early theory of communication is summarized, and Gabor's theory time, languages have increased in simplicity, until to-day we are of signal structure is described. Modern statistical theory of Wiener and Shannon, by which considering the ultimate compression of information in the "information" may be expressed quantitatively, is shown to be a simplest possible forms. It is important to emphasize, at the logical extension of Hartley's work. A Section on calculating start, that we are not concerned with the meaning or the truth machines and brains attempts to clear up popular misunderstandings of messages; semantics lies outside the scope of mathematical and to separate definite accomplishments in mechanization of thought information theory. The material which is to be communicated processes from mere myths. between a transmitting mind and a receiving mind must, however, Finally, a generalization of the work over the whole field of be prearranged into an agreed language. We are not concerned scientific observation is shown, and evidence which supports the view with the "higher" mental processes which have enabled man to that "information plus entropy is an important invariant of a physical find a word for a concept, even less with the formation of a system" is included. concept; we start only from the point when he already has a dictionary. (1) INTRODUCTION A detailed history of spoken and written languages would be In recent years much has been hecird of the "theory of com irrelevant to our present subject, but nevertheless there are munication" or, in a wider scientific circle, the "theory of certain matters of interest which may be taken as a starting information": in scientific books, articles and broadcast talks, point. The early writings of Mediterranean civilizations were many concepts, words and expressions are used which are· part in picture, or "logographic" script; simple pictures were used to of the communication engineer's stock-in-trade. Although represent objects and also, by association, ideas, actions, names, electrical communication systems have been in use for over a and so on. Also, what is much more important, phonetic century it is only comparatively recently that any attempt has writing was developed, in which sounds were given symbols. been made to define what is the commodity that is being With the passage of time, the pictures were reduced to more communicated and how it may be measured. Modem com formal symbols, as determined by the difficulty of using a chisel, munication theory has evolved from the attempts of a few or a reed brush, while the phonetic writing simplified into a mathematically-minded communication engineers to establish set of two or three dozen alphabetic letters, divided into con such a quantitive basis to this most fundamental aspect of the sonants and vowels.t. 2 subject. Since the idea of "communication of information" has In Egyptian. hieroglyphics we have a supreme example of what a scope which extends far beyond electrical techniques of trans is now called redundancy in languages and code; one of the mission and reception, it is not surprising that the mathematical difficultiesin deciphering the Rosetta stone lay in the fact that a work is proving to be of value in many, widely diverse, fields of polysyllabic word might give each syllable not one symbol but a science. number of different ones in common use, in order that the word It may therefore be of interest to _communication engineers to should be thoroughly understood. 3 (The effect when literally consider historically the interrelation between various activities transcribed into English is one of stuttering.) On the other hand and fields of scientific investigation on which their subject is the Semitic languages show an early recognition of redundancy. beginning to impinge. The scope of information theory is Ancient Hebrew script had no vowels; modern Hebrew has none, extensive, and no attempt will be made here to present an exposi too, except in children's books. Many other ancient scripts tion of the theory; it is in any case still in process of development. show no vowels. Slavonic Russian went a step further in This paper should be regarded purely as a history; the references condensation: in religious texts, commonly used words were appended may be found useful by those wishing to study the abbreviated to a few letters, in a manner similar to our present subject in detail. day use of the ampersand, abbreviations such as lb and the increasing use of initials, e.g. U.S.A., Unesco, O.K. ( 1. 1) Languages and Codes But the Romans were responsible for a very big step in speeding Man's development and the growth of civilizations have up the recording of information. The freed slave Tyro invented depended in the main on progress in a few activities, one of the shorthand in about 63 B.c. in order to record, verbatim, the Written contributions on papers published without being read at meetings are speeches of Cicero. This is not unlike modern shorthand in invited for consideration with a view to publication. appearance (Fig. and it is known to have been used in Europe This is an "integrating" paper. Members are invited to submit papers in this 1), category, giving the full perspective of the developments leading to the present until the Middle Ages.4 practice in a particular part of one of the branches of electrical science. The paper is a modified version of one presented by the author to the Symposium Related to the structure of language is the theory of crypto on Information Theory held under the auspices of the Royal Society in September, grams or ciphers, certain aspects of which are of interest in our 1950. The original paper was given only restricted publication. Mr. Cherry is Reader in Telecommunications, University of London. present study, in connection with the problem of coding.24 [ 383] 384 CHERRY: A HISTORY OF THE THEORY OF INFORMATION this has assumed increasing importance. Certain types of message are of very common occurrence in telegraphy; e.g. some commercial expressions and birthday greetings, and these should be coded into very short symbols. By the year 1825 a number of such codes were in use. The modern view is that Ne fide/iter di/igit quernfastidit et ca/amitas queru/a. mo nam messages having a high probability of occurrence contain little Fig. 1.-Roman shorthand (orthographic). information and that any mathematical definition we adopt for the expression "information" must conform with this idea, that Ciphering, of vital importance for military and diplomatic the information conveyed by a symbol, a message or an observa secrecy, is as old as the Scriptures. The simple displaced alphabet tion, in a set of such events, must decrease as their frequency of code, known to every schoolboy, was most probably used by occurrence in the set increases. Shannon has emphasized the Julius Caesar.3 There are many historic uses of cipher; e.g. importance of the statistics of language in his recent publica Samuel Pepys's diary4 was entirely ciphered "to secrete it from tions,24 and has referred to the relative frequencies of letters in a his servants and the World"; also certain of Roger Bacon's scripts language, and of the digrams (or letter-pairs) such as ed, st, er, have as yet resisted all attempts at deciphering. A particularly and of the trigrams ing, ter, and so on, which occur in the English important cipher is one known as "Francis Bacon's Biliteral language. His estimate is that English has a redundancy of at Code": Bacon suggested the possibility of printing seemingly least 50%, meaning that if instead of writing every letter of an Vowels h d t e q a o u e ' Ill lllf 1111 I II Ill WI /Ill/ I II II Ill \1111 I II Ill Ill/ //� � b j v s n m g ng z Fig. 2.-The Ogam (Celtic) script. innocent lines of verse or prose, using two slightly different English sentence (or its coded symbol) we were suitably to founts (called, say, 1 and 2). The order of the 1 's and 2's was symbolize the digrams, trigrams, etc., the possible ultimate used for coding a secret message: each letter .of the alphabet compression would be at least two to one. If such a perfect was coded into five units, founts 1 and 2 being used as these code were used, any mistake which might occur in transmission units.3 Now, this code illustrates an important principle which of a message could not be corrected by guessing. Language seems to have been understood since the early civilizations statistics3 have been essential for centuries, for the purpose of -information may be coded into a two-symbol code.* There are deciphering secret codes and cryptograms.