Problem Set 2

Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain a reasonable level of consistency and simplicity we ask that you do not use other software tools.] If you collaborated with other members of the class, please write their names at the end of the assignment. Moreover, you will need to write and sign the following statement: \In preparing my solutions, I did not look at any old homeworks, copy anybody's answers or let them copy mine." Problem 1: [10 Points] In many pattern classification problems we have the option to either assign a pattern to one of c classes, or reject it as unrecognizable - if the cost to reject is not too high. Let the cost of classification be defined as: 8 0 if ! = ! , (i.e. Correct Classification) <> i j λ(!ij!j) = λr if ! i = ! 0, (i.e. Rejection) > :λs Otherwise, (i.e. Substitution Error) Show that for the minimum risk classification, the decision rule should associate a test vector x with class ! i, if P(! ijx) ≥ P(! jjx) for all j and and P(! j x) ≥ 1 - λr , and reject otherwise. i λs Solution: Average risk is choosing class !i: c c X ∗ X R(!ijx) = λ(!ij!j)P (!jjx) = 0 P (!ijx) + λsP (!jjx) j=1 j=1;j!=i 1 where λ(!ij!j) is used to mean the cost of choosing class !i where the true class is !j. Hence: R(!ijx) = λs(1 − P (!ijx)) Associate x with the class !i if highest posterior class probability and the average risk is less than the cost of rejection: λs(1 − P (!ijx)) ≤ λr P (!ijx) ≥ 1 − λr/λs Problem 2: [16 points] In a particular binary hypothesis testing application, the conditional density for a scalar feature x given class w1 is 2 p(xjw1) = k1 exp(−x =20) Given class w2 the conditional density is 2 p(xjw2) = k2 exp(−(x − 6) =12) a. Find k1 and k2, and plot the two densities on a single graph using Matlab. Solution: We solve for the parameters k1 and k2 by recognizing that the two equations are in the form of the normal Gaussian distribution. 2 2 −(x−µ1) −x 1 2 k e 20 = p e 2σ1 1 2 2πσ1 2 2 −x −(x − µ1) = 2 20 2σ1 2 µ1 = 0; σ1 = 10 1 k1 = p 20π 2 0.16 p(x|w ) 1 p(x|w ) 0.14 2 0.12 R R R 0.1 1 2 1 p 0.08 0.06 0.04 0.02 0 −10 −5 0 5 10 15 20 25 30 x Figure 1: The pdf and decision boundaries In a similar fashion, we find that 2 µ2 = 6; σ2 = 6 1 k2 = p 12π These distributions are plotted in Figure 1. b. Assume that the prior probabilities of the two classes are equal, and that the cost for choosingp correctly isp zero. If the costs for choosing incorrectly are C12 = 3 and C21 = 5 (where Cij corresponds to predicting class i when it belongs to class j), what is the expression for the conditional risk? Solution: The conditional risk for Two-Category Classification is dis- cussed in Section 2.2.1 in D.H.S. where we can see that the formulas for the risk associated with the action, αi, of classifying a feature vector, x, as class, !i, is different for each action: R(α1jx) = λ11P (!1jx) + λ12P (!2jx) R(α2jx) = λ21P (!1jx) + λ22P (!2jx) The prior probabilities of the two classes are equal: 1 P (! ) = P (! ) = 1 2 2 3 The assumption that the cost for choosing correctly is zero: λ11 = 0; λ22 = 0 The costs for choosing incorrectly are given as C12 and C21: p λ12 = C12 = 3 p λ21 = C21 = 5 Thus the expression for the conditional risk of α1 is: R(α1jx) = λ11P (!1jx) + λ12P (!2jx) p R(α1jx) = 0P (!1jx) + 3P (!2jx) p R(α1jx) = 3P (!2jx) And the expression for the conditional risk of α2: R(α2jx) = λ21P (!1jx) + λ22P (!2jx) p R(α2jx) = 5P (!1jx) + 0P (!2jx) p R(α2jx) = 5P (!1jx) c. Find the decision regions which minimize the Bayes risk, and indicate them on the plot you made in part (a) Solution: The Bayes Risk is the integral of the conditional risk when we use the optimal decision regions, R1 and R2. So, solving for the optimal decision boundary is a matter of solving for the roots of the equation: R(α1jx) = R(α2jx) p p 3P (!2jx) = 5P (!1jx) p p 3P (xj! )P (! ) 5P (xj! )P (! ) 2 2 = 1 1 P (x) P (x) Given the priors are equal this simplifies to: p p 3P (xj!2) = 5P (xj!1) 4 Next, using the values, k1 and k2, from part (a), we have expressions for pxj!1 and pxj!2 . 2 p 1 −(x−6) p 1 −x2 3p e 12 = 5p e 20 12π 20π 2 −(x−6) −x2 e 12 = e 20 −(x − 6)2 −x2 = 12 20 20(x2 − 12x + 36) = 12x2 8x2 − 240x + 720 = 0 The decisionp boundary is found by solvingp for the roots of this quadratic, x1 = 15 − 3 15 = 3:381 and x2 = 15 + 3 15 = 26:62 d. For the decision regions in part (c), what is the numerical value of the Bayes risk? p Solution:p For x < 15 − 3 15p the decision rule will choose !1 For 15 − 3p15 < x < 15 + 3 15 the decision rule will choose !2 For 15 + 3 15 < x the decision rule will choose !1 p p Thus the decision region R1 is xp< 15 − 3 15 andpx > 15 + 3 15, and the decision region R2 is 15 − 3 15 < x < 15 + 3 15. 5 Z Risk = λ11P (!1jx) + λ12P (!2jx) R1 Z + λ21P (!1jy) + λ22P (!2jy) R2 Z Z = λ12P (yj!2)P (!2) + λ21P (yj!1)P (!1) R1 R2 p Z y=15−3 15 p 1 Z y=inf p 1 = 3N(6; 6) + p 3N(6; 6) y=− inf 2 y=15+3 15 2 p Z y=15+3 15 p 1 + 5N(0; 10) p 2 py=15−3 15 3 1 1 p p 1 1 p p = ( + erf(((15 − 3 15) − 6)= 12)) + ( − erf(((15 + 3 15) − 6)= 12)) 2 2 2 2 2 p 5 1 p p 1 p p + erf((15 + 3 15)= 20) − erf((15 − 3 15)= 20) 2 2 2 = 0:2827 Problem 3: [16 points] Let's consider a simple communication system. The transmitter sends out 3 1 messages m = 0 or m = 1, occurring with a priori probabilities 4 and 4 respectively. The message is contaminated by a noise n, which is indepen- 1 5 2 dent from m and takes on the values -1, 0, 1 with probabilities 8 , 8 , and 8 respectively. The received signal, or the observation, can be represented as r = m + n. From r, we wish to infer what the transmitted message m was (estimated state), denoted usingm ^ .m ^ also takes values on 0 or 1. When m =m ^ , the detector correctly receives the original message, otherwise an error occurs. a. Find the decision rule that achieves the maximum probability of correct decision. Compute the probability of error for this decision rule. Solution: It is equivalent to find the decision rule that achieves the minimum probability of error. The receiver decides the transmitted message is 1, i.e.,m ^ = 1 if 6 Figure 2: A simple receiver P (rjm = 1) · P r(m = 1) ≥ P (rjm = 0) · P r(m = 0) P (rjm = 1) ≥ P (rjm = 0) · 3 (1) Otherwise the receiver decidesm ^ = 0. The likelihood functions for these two cases are 8 1 if r = 0 > 8 <> 5 8 if r = 1 P (rjm = 1) = 2 > 8 if r = 2 > :0 otherwise 8 1 if r = −1 > 8 <> 5 8 if r = 0 P (rjm = 0) = 2 > 8 if r = 1 > :0 otherwise Eq. 1 holds only when r = 2. Therefore, the decision rule can be summarized as ( 1 if r = 2 m^ = 0 otherwise The probability of error is as follows: P r(e) = P r(m ^ = 1jm = 0) · P r(m = 0) + P r(m ^ = 0jm = 1) · P r(m = 1) = P r(r = 2jm = 0) · P r(m = 0) + P r(r 6= 2jm = 1) · P r(m = 1) 1 5 1 3 = 0 + ( + ) · = = 0:1875 8 8 4 16 7 1 0.9 0.8 0.7 0.6 0.5 p(n) 0.4 0.3 0.2 0.1 0 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 n Figure 3: The pdf of n b. Let's have the noise n be a continuous random variable, uniformly −3 2 distributed between 4 and 4 , and still statistically independent of m. First, plot the pdf of n. Then, find a decision rule that achieves the minimum probability of error, and compute the probability of error. Solution: The uniform distribution is plotted in Figure 3. The decision rule is still determined by using Eq. 1. The likelihood functions become continuous, instead of discrete in (a): ( 4 if 1 < r ≤ 6 P (rjm = 1) = 5 4 4 0 otherwise ( 4 if −3 < r ≤ 2 P (rjm = 0) = 5 4 4 0 otherwise The interesting region is where the two pdf's overlap with each other, 1 2 namely when 4 < r ≤ 4 .

Problem Set 2

Implications of Rational Inattention

The Physics of Optimal Decision Making: a Formal Analysis of Models of Performance in Two-Alternative Forced-Choice Tasks

Statistical Decision Theory Bayesian and Quasi-Bayesian Estimators

Statistical Decision Theory: Concepts, Methods and Applications

The Functional False Discovery Rate with Applications to Genomics

Optimal Trees for Prediction and Prescription Jack William Dunn

Rational Inattention in Controlled Markov Processes

Optimal Decision Making in Daily Life

Bayesian Decision Theory Chapter 2 (Jan 11, 18, 23, 25)

A CART-Based Genetic Algorithm for Constructing Higher Accuracy Decision Trees

FDR and Bayesian Multiple Comparisons Rules

1 Introduction