
Formal Modeling in Cognitive Science
Lecture 29: Noisy Channel Model and Applications; Kullback-Leibler Divergence; Cross-entropy

Frank Keller
School of Informatics, University of Edinburgh
[email protected]
March 14, 2006

Outline
1 Noisy Channel Model: Channel Capacity; Properties of Channel Capacity; Applications
2 Kullback-Leibler Divergence
3 Cross-entropy


Noisy Channel Model

So far, we have looked at encoding a message efficiently, but what about transmitting the message? The transmission of a message can be modeled using a noisy channel:

a message W is encoded, resulting in a string X;
X is transmitted through a channel with the probability distribution f(y|x);
the resulting string Y is decoded, yielding an estimate of the message Ŵ.

[Figure: Message W → Encoder → X → Channel f(y|x) → Y → Decoder → Ŵ (estimate of message)]

Channel Capacity

We are interested in the mathematical properties of the channel used to transmit the message, and in particular in its capacity.

Definition: Discrete Channel
A discrete channel consists of an input alphabet X, an output alphabet Y, and a probability distribution f(y|x) that expresses the probability of observing symbol y given that symbol x is sent.

Definition: Channel Capacity
The channel capacity of a discrete channel is:

C = max_{f(x)} I(X; Y)

The capacity of a channel is the maximum of the mutual information of X and Y over all input distributions f(x).
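These definitions translate directly into a short numerical sketch. The Python code below is only an illustration: the channel matrix is made up, the function names are my own, and the brute-force sweep over input distributions only handles a binary input alphabet. It computes I(X; Y) from f(x) and f(y|x) and approximates C by maximising over f(x).

```python
import numpy as np

def mutual_information(fx, channel):
    """I(X;Y) in bits, given an input distribution fx (a vector over X)
    and a channel matrix with channel[x, y] = f(y|x)."""
    joint = fx[:, None] * channel                 # f(x, y) = f(x) f(y|x)
    fy = joint.sum(axis=0)                        # marginal f(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (fx[:, None] * fy[None, :]))
    return np.nansum(terms)                       # 0 log 0 terms drop out as NaN

def capacity_binary_input(channel, steps=1000):
    """Approximate C = max over f(x) of I(X;Y) for a channel with two input symbols."""
    qs = np.linspace(0.0, 1.0, steps + 1)
    return max(mutual_information(np.array([q, 1 - q]), channel) for q in qs)

# Hypothetical channel: symbol 0 always gets through, symbol 1 is corrupted 20% of the time.
channel = np.array([[1.0, 0.0],
                    [0.2, 0.8]])
print(capacity_binary_input(channel))             # capacity in bits per transmitted symbol
```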

Channel Capacity

Example: Noiseless Binary Channel
Assume a binary channel whose input is reproduced exactly at the output. Each transmitted bit is received without error:

[Figure: 0 → 0, 1 → 1]

The channel capacity of this channel is:

C = max_{f(x)} I(X; Y) = 1 bit

This maximum is achieved with f(0) = 1/2 and f(1) = 1/2.

Example: Binary Symmetric Channel
Assume a binary channel whose input is flipped (0 transmitted as 1 or 1 transmitted as 0) with probability p:

[Figure: 0 → 0 and 1 → 1 with probability 1 − p; 0 → 1 and 1 → 0 with probability p]

The mutual information of this channel is bounded by:

I(X; Y) = H(Y) − H(Y|X) = H(Y) − Σ_x f(x) H(Y|X = x)
        = H(Y) − Σ_x f(x) H(p) = H(Y) − H(p) ≤ 1 − H(p)

The channel capacity is therefore:

C = max_{f(x)} I(X; Y) = 1 − H(p) bits
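This bound is easy to check for a concrete flip probability. A minimal sketch, using the binary entropy function H(p) from the derivation above:

```python
import math

def binary_entropy(p):
    """H(p) in bits, with the convention 0 log 0 = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.1                              # flip probability of the binary symmetric channel

# With the uniform input f(0) = f(1) = 1/2 the output is also uniform, so H(Y) = 1
# and I(X;Y) = H(Y) - H(p) attains its upper bound 1 - H(p):
capacity = 1 - binary_entropy(p)
print(capacity)                      # about 0.531 bits per transmitted symbol
```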


Channel Capacity

Example: a binary data sequence of length 10,000 transmitted over a binary symmetric channel with p = 0.1:

[Figure: binary symmetric channel diagram, 0 → 0 and 1 → 1 with probability 1 − p, 0 → 1 and 1 → 0 with probability p]

Properties of Channel Capacity

Theorem: Properties of Channel Capacity
1 C ≥ 0, since I(X; Y) ≥ 0;
2 C ≤ log |X|, since C = max I(X; Y) ≤ max H(X) ≤ log |X|;
3 C ≤ log |Y|, for the same reason.
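A quick simulation of the transmission scenario above (a sketch; the random seed and variable names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.1, 10_000

x = rng.integers(0, 2, size=n)       # transmitted bits
flips = rng.random(n) < p            # each bit is flipped independently with probability p
y = np.where(flips, 1 - x, x)        # received bits

print(np.mean(x != y))               # close to 0.1: roughly 1,000 of the 10,000 bits arrive corrupted
```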

Applications of the Noisy Channel Model

The noisy channel can be applied to decoding processes involving linguistic information. A typical formulation of such a problem is:

we start with a linguistic input I;
I is transmitted through a noisy channel with the probability distribution f(o|i);
the resulting output O is decoded, yielding an estimate of the input Î.

[Figure: I → Noisy Channel f(o|i) → O → Decoder → Î]

Application | Input | Output | f(i) | f(o|i)
Machine translation | target language word sequences | source language word sequences | target language model | translation model
Optical character recognition | actual text | text with mistakes | language model | model of OCR errors
Part of speech tagging | POS sequences | word sequences | probability of POS sequences | f(w|t)
Speech recognition | word sequences | speech signal | language model | acoustic model
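All of these applications share the same decoding step: choose the input that maximises f(i) · f(o|i). A minimal sketch in the error-correction flavour of OCR; the candidate words and all probabilities below are made up purely for illustration:

```python
def decode(observed, candidates, prior, channel):
    """Noisy channel decoding: argmax over inputs i of f(i) * f(o|i)."""
    return max(candidates, key=lambda i: prior[i] * channel.get((observed, i), 0.0))

candidates = ["the", "ten", "tea"]
prior = {"the": 0.80, "ten": 0.15, "tea": 0.05}        # f(i): language model (hypothetical numbers)
channel = {("teh", "the"): 0.10,                        # f(o|i): error model (hypothetical numbers)
           ("teh", "ten"): 0.01,
           ("teh", "tea"): 0.01}

print(decode("teh", candidates, prior, channel))        # "the"
```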


Applications of the Noisy Channel Model

Let's look at machine translation in more detail. Assume that the French text (F) passed through a noisy channel and came out as English (E). We decode it to estimate the original French (F̂):

[Figure: F → Noisy Channel f(e|f) → E → Decoder → F̂]

We compute F̂ using Bayes' theorem:

F̂ = argmax_f f(f|e) = argmax_f f(f) f(e|f) / f(e) = argmax_f f(f) f(e|f)

Here f(e|f) is the translation model, f(f) is the French language model, and f(e) is the English language model (constant).

Example Output: Spanish–English

we all know very well that the current treaties are insufficient and that , in the future , it will be necessary to develop a better structure and different for the european union , a structure more constitutional also make it clear what the competences of the member states and which belong to the union . messages of concern in the first place just before the economic and social problems for the present situation , and in spite of sustained growth , as a result of years of effort on the part of our citizens . the current situation , unsustainable above all for many self-employed drivers and in the area of agriculture , we must improve without doubt . in itself , it is good to reach an agreement on procedures , but we have to ensure that this system is not likely to be used as a weapon policy . now they are also clear rights to be respected . i agree with the signal warning against the return , which some are tempted to the intergovernmental methods . there are many of us that we want a federation of nation states .

Example Output: Finnish–English

the rapporteurs have drawn attention to the quality of the debate and also the need to go further : of course , i can only agree with them . we know very well that the current treaties are not enough and that in future , it is necessary to develop a better structure for the union and , therefore perustuslaillisempi structure , which also expressed more clearly what the member states and the union is concerned . first of all , kohtaamiemme economic and social difficulties , there is concern , even if growth is sustainable and the result of the efforts of all , on the part of our citizens . the current situation , which is unacceptable , in particular , for many carriers and responsible for agriculture , is in any case , to be improved . agreement on procedures in itself is a good thing , but there is a need to ensure that the system cannot be used as a political lyomaaseena . they also have a clear picture of the rights of now , in which they have to work . i agree with him when he warned of the consenting to return to intergovernmental methods . many of us want of a federal state of the national member states .

Kullback-Leibler Divergence

Definition: Kullback-Leibler Divergence
For two probability distributions f(x) and g(x) for a random variable X, the Kullback-Leibler divergence or relative entropy is given as:

D(f||g) = Σ_{x∈X} f(x) log (f(x)/g(x))

The KL divergence compares the entropy of two distributions over the same random variable.

Intuitively, the KL divergence is the number of additional bits required when encoding a random variable with a distribution f(x) using the alternative distribution g(x).
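The definition carries over directly to a small sketch (base-2 logarithms, so the result is in bits; the two distributions are made up for illustration):

```python
import math

def kl_divergence(f, g):
    """D(f||g) = sum over x of f(x) * log2(f(x)/g(x)); terms with f(x) = 0 contribute 0."""
    return sum(p * math.log2(p / g[x]) for x, p in f.items() if p > 0)

f = {"a": 0.50, "b": 0.25, "c": 0.25}
g = {"a": 1/3, "b": 1/3, "c": 1/3}
print(kl_divergence(f, g))   # about 0.085 bits: the overhead of coding f with a code built for g
```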


Kullback-Leibler Divergence

Example
For a random variable X = {0, 1} assume two distributions f(x) and g(x) with f(0) = 1 − r, f(1) = r and g(0) = 1 − s, g(1) = s:

D(f||g) = (1 − r) log [(1 − r)/(1 − s)] + r log (r/s)
D(g||f) = (1 − s) log [(1 − s)/(1 − r)] + s log (s/r)

If r = s then D(f||g) = D(g||f) = 0. If r = 1/2 and s = 1/4:

D(f||g) = 1/2 log [(1/2)/(3/4)] + 1/2 log [(1/2)/(1/4)] = 0.2075
D(g||f) = 3/4 log [(3/4)/(1/2)] + 1/4 log [(1/4)/(1/2)] = 0.1887

Theorem: Properties of the Kullback-Leibler Divergence
1 D(f||g) ≥ 0;
2 D(f||g) = 0 iff f(x) = g(x) for all x ∈ X;
3 D(f||g) ≠ D(g||f) in general;
4 I(X; Y) = D(f(x,y)||f(x)f(y)).

So the mutual information is the KL divergence between f(x,y) and f(x)f(y). It measures how far a distribution is from independence.
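Both the worked example and property 4 can be verified numerically. In the sketch below the 2×2 joint distribution is hypothetical, chosen only so that its marginals are easy to read off:

```python
import math

def kl(f, g):
    """D(f||g) in bits over a shared finite outcome set."""
    return sum(p * math.log2(p / g[x]) for x, p in f.items() if p > 0)

# The worked example: r = 1/2, s = 1/4.
r, s = 0.5, 0.25
f = {0: 1 - r, 1: r}
g = {0: 1 - s, 1: s}
print(kl(f, g), kl(g, f))                  # 0.2075... and 0.1887...; D(f||g) != D(g||f)

# Property 4: I(X;Y) = D(f(x,y) || f(x) f(y)), checked on a hypothetical joint distribution.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
fx = {0: 0.5, 1: 0.5}                      # marginal over x
fy = {0: 0.4, 1: 0.6}                      # marginal over y
product = {(x, y): fx[x] * fy[y] for x in fx for y in fy}
print(kl(joint, product))                  # I(X;Y), about 0.125 bits away from independence
```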

Cross-entropy

Definition: Cross-entropy
For a random variable X with the probability distribution f(x), the cross-entropy for the probability distribution g(x) is given as:

H(X, g) = −Σ_{x∈X} f(x) log g(x)

The cross-entropy can also be expressed in terms of entropy and KL divergence:

H(X, g) = H(X) + D(f||g)

Intuitively, the cross-entropy is the total number of bits required when encoding a random variable with a distribution f(x) using the alternative distribution g(x).

Example
In the last lecture, we constructed a code for the following distribution using Huffman coding:

x          | a    | e    | i    | o    | u
f(x)       | 0.12 | 0.42 | 0.09 | 0.30 | 0.07
−log f(x)  | 3.06 | 1.25 | 3.47 | 1.74 | 3.84

The entropy of this distribution is H(X) = 1.995. Now compute the distribution g(x) = 2^{−l(x)} associated with the Huffman code:

x    | a   | e   | i    | o   | u
C(x) | 001 | 1   | 0001 | 01  | 0000
l(x) | 3   | 1   | 4    | 2   | 4
g(x) | 1/8 | 1/2 | 1/16 | 1/4 | 1/16
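A sketch that recomputes this example (variable names are my own; all logarithms are base 2):

```python
import math

f = {"a": 0.12, "e": 0.42, "i": 0.09, "o": 0.30, "u": 0.07}
code_lengths = {"a": 3, "e": 1, "i": 4, "o": 2, "u": 4}      # l(x) from the Huffman code
g = {x: 2 ** -l for x, l in code_lengths.items()}            # g(x) = 2^{-l(x)}

entropy = -sum(p * math.log2(p) for p in f.values())
cross_entropy = -sum(p * math.log2(g[x]) for x, p in f.items())

print(entropy)                     # about 1.995 bits
print(cross_entropy)               # about 2.02 bits
print(cross_entropy - entropy)     # D(f||g), about 0.025 bits
```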


Cross-entropy

Example (continued)
Then the cross-entropy for g(x) is:

H(X, g) = −Σ_{x∈X} f(x) log g(x)
        = −(0.12 log 1/8 + 0.42 log 1/2 + 0.09 log 1/16 + 0.30 log 1/4 + 0.07 log 1/16) = 2.02

The KL divergence is:

D(f||g) = H(X, g) − H(X) = 0.025

This means we are losing on average 0.025 bits by using the Huffman code rather than the theoretically optimal code given by the Shannon information.

Summary

The noisy channel can model the errors and loss when transmitting a message with input X and output Y;
the capacity of the channel is given by the maximum of the mutual information of X and Y;
a binary symmetric channel is one where each bit is flipped with probability p;
the noisy channel model can be applied to linguistic problems, e.g., machine translation;
the Kullback-Leibler divergence measures the difference between two distributions (the extra cost of encoding f(x) through g(x)).
