Probability and Statistics
Robert Stengel
Optimal Control and Estimation, MAE 546, Princeton University, 2018

• Concepts and reality
• Probability distributions
• Bayes's Law
• Stationarity and ergodicity
• Correlation functions and power spectra
Copyright 2018 by Robert Stengel. All rights reserved. For educational use only. http://www.princeton.edu/~stengel/MAE546.html http://www.princeton.edu/~stengel/OptConEst.html
Concepts and Reality of Probability (Papoulis, 1990)
• Theory may be exact
– Deals with averages of phenomena with many possible outcomes
– Based on models of behavior
• Application can only be approximate
– Measure of our state of knowledge or belief that something may or may not be true
– Subjective assessment
A: event
P(A): probability of event
nA: number of times A occurs experimentally
n: total number of trials
P(A) ≈ nA / n

Interpretations of Probability (Papoulis)
• Axiomatic Definition (Theoretical interpretation)
– Probability space, abstract objects (outcomes), and sets (events)
– Axiom 1: Pr(Ai) ≥ 0
– Axiom 2: Pr(“certain event”) = 1 = Pr[all events in probability space (or universe)]
– Axiom 3: With no common elements,
Pr(Ai ∪ Aj) = Pr(Ai) + Pr(Aj)
• Relative Frequency (Empirical interpretation)
Pr(Ai) = lim_{N→∞} (nAi / N)
N = number of trials (total)
nAi = number of trials with attribute Ai
Interpretations of Probability (Papoulis)
• Classical (“Favorable outcomes” interpretation)
Pr(Ai) = nAi / N, with N finite
nAi = number of outcomes “favorable to” Ai
• Measure of belief (Subjective interpretation)
– Pr(Ai) = measure of belief that Ai is true (similar to fuzzy sets)
– Informal induction precedes deduction
– Principle of insufficient reason (i.e., total prior ignorance):
• e.g., if there are 5 event sets, Ai , i = 1 to 5, Pr(Ai) = 1/5 = 0.2
Probability: “... a way of expressing knowledge or belief that an event will occur or has occurred.”
Statistics: “The science of making effective use of numerical data relating to groups of individuals or experiments.”
Empirical (or Relative) Frequency of Discrete, Mutually Exclusive Events in Sample Space
Pr(xi) = ni / N, i = 1 to I, in [0, 1]
• N = total number of events
• ni = number of events with value xi
• I = number of different values
• xi = ordered set of hypotheses or values
• xi is a discrete, real random variable
• Ensemble of equivalent sets:
Ai = {x ∈ U | x = xi}, i = 1 to I
• Cumulative probability over all sets:
Σ_{i=1}^{I} Pr(Ai) = Σ_{i=1}^{I} Pr(xi) = (1/N) Σ_{i=1}^{I} ni = 1
[Figure: histogram of Pr(x) for x = 1 to 6]

Cumulative Probability, Pr(x ≥/≤ a), Discrete Random Variables, and Continuous Variables
[Figure: Pr(x) with cumulative Pr(x ≥ a) and Pr(x ≤ a) for x = 1 to 5]
• Continuous, real random variable, x, e.g., a continuum of colors
– xi is the center of a categorical band in x (purple, blue, green, ...)
Pr(xi ± Δx/2) = ni / N
Σ_{i=1}^{I} Pr(xi ± Δx/2) = 1
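The relative-frequency definition lends itself to a quick numerical check: frequencies over mutually exclusive values sum exactly to 1 and approach the underlying probabilities. A minimal Python sketch (the six-valued uniform variable and the sample size are illustrative assumptions, not from the slides):

```python
import random

random.seed(1)
N = 100_000
# Discrete random variable with I = 6 equally likely values (an assumed example)
samples = [random.randint(1, 6) for _ in range(N)]

counts = {x: 0 for x in range(1, 7)}
for s in samples:
    counts[s] += 1

# Relative frequency Pr(x_i) = n_i / N
pr = {x: n / N for x, n in counts.items()}

# Probabilities over all mutually exclusive events sum to 1
total = sum(pr.values())
```

Since every sample falls into exactly one bin, `total` is 1 regardless of N; each Pr(xi) only approaches 1/6 as N grows.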
Probability Density Function, pr(x), and Cumulative Distribution Function, Pr(x < X)

pr(xi) ≜ Pr(xi ± Δx/2) / Δx → pr(x) as Δx → 0

Cumulative probability for x over (–∞, ∞):
Σ_{i=1}^{I} Pr(xi ± Δx/2) = Σ_{i=1}^{I} pr(xi) Δx → ∫_{–∞}^{∞} pr(x) dx = 1 as Δx → 0, I → ∞

Cumulative distribution function for x < X:
Pr(x < X) = ∫_{–∞}^{X} pr(x) dx
[Figure: Gaussian probability density and cumulative distribution functions]

Properties of Random Variables
• Mode
– Value of x for which pr(x) is maximum
• Median
– Value of x corresponding to the 50th percentile
– Pr(x < median) = Pr(x > median) = 0.5
• Mean
– Value of x corresponding to the statistical average
– First moment of x = expected value of x (“moment arm” × “force” analogy)
x̄ = E(x) = ∫_{–∞}^{∞} x pr(x) dx

Expected Values
Mean value: x̄ = E(x) = ∫_{–∞}^{∞} x pr(x) dx
• Second central moment of x = variance
– Variance is taken about the mean value rather than about zero
– A smaller value indicates less uncertainty in the value of x
E[(x – x̄)²] = σx² = ∫_{–∞}^{∞} (x – x̄)² pr(x) dx
• Expected value of any function of x, f(x):
E[f(x)] = ∫_{–∞}^{∞} f(x) pr(x) dx
See Supplemental Material for examples of non-Gaussian distributions

Expected Value is a Linear Operation
Expected value of a sum of random variables (x1 and x2 need not be statistically independent):
E[x1 + x2] = ∫_{–∞}^{∞} (x1 + x2) pr(x) dx = ∫_{–∞}^{∞} x1 pr(x) dx + ∫_{–∞}^{∞} x2 pr(x) dx = E[x1] + E[x2]
Expected value of a constant times a random variable:
E[k x] = ∫_{–∞}^{∞} k x pr(x) dx = k ∫_{–∞}^{∞} x pr(x) dx = k E[x]

Gaussian (Normal) Random Distribution
• Used in some random number generators (e.g., MATLAB's randn)
• Unbounded, symmetric distribution
• Defined entirely by its mean and standard deviation
pr(x) = [1 / (√(2π) σx)] exp[–(x – x̄)² / (2σx²)];  ∫_{–∞}^{∞} pr(x) dx = 1
Mean value (first moment, µ1): E(x) = ∫_{–∞}^{∞} x pr(x) dx = x̄
Variance (second central moment, µ2): E[(x – x̄)²] = ∫_{–∞}^{∞} (x – x̄)² pr(x) dx = σx²
Units of x and σx are the same
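These moment definitions and the linearity of the expectation operator can be checked by Monte Carlo sampling. A hedged Python sketch using the standard library (the values x̄ = 2, σx = 0.5, and k = 3 are arbitrary assumptions):

```python
import random

random.seed(2)
x_bar, sigma = 2.0, 0.5      # assumed mean and standard deviation
N = 200_000
xs = [random.gauss(x_bar, sigma) for _ in range(N)]

# First moment: E(x) -> x_bar
mean_est = sum(xs) / N

# Second central moment: E[(x - x_bar)^2] -> sigma^2 (unbiased divisor N - 1)
var_est = sum((x - mean_est) ** 2 for x in xs) / (N - 1)

# Linearity of expectation: E[k x] = k E[x]
k = 3.0
mean_kx = sum(k * x for x in xs) / N
```

The sample mean and variance converge on 2.0 and 0.25, and `mean_kx` matches `k * mean_est` to floating-point accuracy.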
Probability of Being Close to the Mean (Gaussian Distribution)
pr(x) = [1 / (√(2π) σx)] exp[–(x – x̄)² / (2σx²)]
Probability of being within ±1σx: Pr[x < (x̄ + σx)] – Pr[x < (x̄ – σx)] ≈ 68%
Probability of being within ±2σx: Pr[x < (x̄ + 2σx)] – Pr[x < (x̄ – 2σx)] ≈ 95%
Probability of being within ±3σx: Pr[x < (x̄ + 3σx)] – Pr[x < (x̄ – 3σx)] ≈ 99% (more precisely, 99.7%)

Experimental Determination of Mean and Variance
• Sample mean for N data points, x1, x2, ..., xN:
x̄ = (1/N) Σ_{i=1}^{N} xi
• Sample variance for the same data set:
σx² = [1/(N – 1)] Σ_{i=1}^{N} (xi – x̄)²
• Divisor is (N – 1) rather than N to produce an unbiased estimate
– Only (N – 1) terms are independent
– Inconsequential for large N
• Distribution is not necessarily Gaussian
– Prior knowledge: fit histogram to a known distribution
– Hypothesis test: determine best fit (e.g., Rayleigh, binomial, Poisson, ...)

Central Limit Theorem
• Probability distribution of the sum of independent, identically distributed (i.i.d.) variables
– Approaches a normal distribution as the number of variables approaches infinity
– Summation of continuous random variables produces a convolution of probability density functions (Papoulis, 1990)
– See Supplemental Material for sufficient conditions
pr(y) = ∫_{–∞}^{∞} pr(y – x) pr(x) dx;  pr(z) = ∫_{–∞}^{∞} pr(z – y) pr(y) dy

MATLAB Simulation of Central Limit Theorem
[Figure: uniform distribution, 100,000 points, 60 histogram bins]

Multiple Probability Densities and Expected Values
Probability density functions of two random variables, x and y:
pr(x) and pr(y) given for all x and y in (–∞, ∞)
pr(x, y): joint probability density function of x and y
∫_{–∞}^{∞} pr(x) dx = 1;  ∫_{–∞}^{∞} pr(y) dy = 1;  ∫∫ pr(x, y) dx dy = 1
Expected values of x and y:
Mean values: E(x) = ∫_{–∞}^{∞} x pr(x) dx = x̄;  E(y) = ∫_{–∞}^{∞} y pr(y) dy = ȳ
Covariance (about zero): E(xy) = ∫∫ x y pr(x, y) dx dy
Autocovariance (about zero): E(x²) = ∫_{–∞}^{∞} x² pr(x) dx

Joint Probability (n = 2)
Suppose x can take I values and y can take J values; then
Σ_{i=1}^{I} Pr(xi) = 1;  Σ_{j=1}^{J} Pr(yj) = 1
If x and y are independent,
Pr(xi, yj) = Pr(xi ∧ yj) = Pr(xi) Pr(yj)
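When the joint distribution factors, every joint probability is the product of the marginals, and the double sum over the joint distribution is 1. A small sketch using the marginals from the worked table (0.6/0.4 for x and 0.5/0.3/0.2 for y):

```python
# Marginals of two independent discrete variables (values from the slide's table)
pr_x = {1: 0.6, 2: 0.4}
pr_y = {1: 0.5, 2: 0.3, 3: 0.2}

# Joint probabilities under independence: Pr(x_i, y_j) = Pr(x_i) Pr(y_j)
pr_xy = {(i, j): pr_x[i] * pr_y[j] for i in pr_x for j in pr_y}

# Double sum over the joint distribution equals 1
total = sum(pr_xy.values())
```

For example, Pr(x = 1, y = 1) = 0.6 × 0.5 = 0.30, matching the first table entry.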
Example (independent x and y):
          Pr(yj):  0.5    0.3    0.2
Pr(xi) = 0.6:      0.30   0.18   0.12
Pr(xi) = 0.4:      0.20   0.12   0.08
and Σ_{i=1}^{I} Σ_{j=1}^{J} Pr(xi, yj) = 1

Conditional Probability (n = 2)
If x and y are not independent, their probabilities are related.
Probability that x takes its ith value when y takes its jth value:
Pr(xi | yj) = Pr(xi, yj) / Pr(yj)
• Similarly, Pr(yj | xi) = Pr(xi, yj) / Pr(xi)
Pr(xi | yj) = Pr(xi) iff x and y are independent of each other
Pr(yj | xi) = Pr(yj) iff x and y are independent of each other
Causality is not addressed by conditional probability

Applications of Conditional Probability (n = 2)
Joint probability can be expressed in two ways:
Pr(xi, yj) = Pr(yj | xi) Pr(xi) = Pr(xi | yj) Pr(yj)
Unconditional probability of each variable is expressed by a sum of terms:
Pr(xi) = Σ_{j=1}^{J} Pr(xi | yj) Pr(yj);  Pr(yj) = Σ_{i=1}^{I} Pr(yj | xi) Pr(xi)

Bayes's Rule
Thomas Bayes, 1702–1761
Bayes's Rule proceeds from the previous results.
Probability of x taking the value xi, conditioned on y taking its jth value:
Pr(xi | yj) = Pr(yj | xi) Pr(xi) / Pr(yj) = Pr(yj | xi) Pr(xi) / [Σ_{i=1}^{I} Pr(yj | xi) Pr(xi)]
... and the converse:
Pr(yj | xi) = Pr(xi | yj) Pr(yj) / Pr(xi) = Pr(xi | yj) Pr(yj) / [Σ_{j=1}^{J} Pr(xi | yj) Pr(yj)]
Bayes's or Bayes' Rule? See http://www.dailywritingtips.com/charless-pen-and-jesus-name/

Random Processes

Ensemble Statistics
• Probability metrics apply to an ensemble of events, Ai, i = 1 to I
• For I events, Pr(Ai) = lim_{N→∞} (nAi / N), i = 1 to I
• As I becomes large, Ai approaches a continuous, real random variable, x:
Ai = {x ∈ U | x = xi}, i = 1 to I → ∞
• and Σ_{i=1}^{I} Pr(Ai) = Σ_{i=1}^{I} Pr(xi) = Σ_{i=1}^{I} Pr(xi ± Δx/2) = Σ_{i=1}^{I} pr(xi) Δx → ∫_{–∞}^{∞} pr(x) dx = 1 as Δx → 0, I → ∞
• x has no specified dependence on time

Random Process
• A random-process variable (or time series), x(t), varies with time as well as across the ensemble of trials:
pr_ensemble(xi) = pr_ensemble[xi(t)], i = 1 to I, I → ∞
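The distinction between ensemble statistics and time-series statistics can be illustrated by simulating an ensemble of trials of a random process. A sketch, assuming a zero-mean white Gaussian process (the trial and step counts are arbitrary):

```python
import random

random.seed(3)
n_trials, n_steps = 400, 400

# Ensemble of trials of a zero-mean white Gaussian random process (assumed example)
ensemble = [[random.gauss(0.0, 1.0) for _ in range(n_steps)] for _ in range(n_trials)]

# Ensemble statistic: mean across trials at a fixed time index t
t = 100
ensemble_mean = sum(trial[t] for trial in ensemble) / n_trials

# Time-series statistic: mean over time for a single trial
time_mean = sum(ensemble[0]) / n_steps
```

For this stationary, ergodic example the two estimates agree (both near zero); for a non-stationary or non-ergodic process they generally would not.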
• Joint ensemble statistics (e.g., the joint probability distribution) depend on time:
pr_ensemble(x1, x2) = pr_ensemble[x(t1), x(t2)]

Non-Stationary Random Process
• Joint ensemble statistics depend on the specific values of time, t1 and t2:
pr_ensemble[x(t1)] ≠ pr_ensemble[x(t2)] {unless periodic}
pr_ensemble(x1, x2) = pr_ensemble[x(t1), x(t2)] ... unless the time series is periodic or a special case

Non-Stationary Process: Sinusoidal Mean
[Figure: Trials 1, 2, ..., n, with ensemble statistics]

Non-Stationary Process: Sinusoidal Variance
[Figure: Trials 1, 2, ..., n, with ensemble statistics]

Stationary Random Process
Joint ensemble statistics depend only on the difference, Δt, in values of time:
pr_ensemble[x(t1)] = pr_ensemble[x(t2)]
pr_ensemble[x(t1), x(t2)] = pr_ensemble{x(t1), x[t1 + (t2 – t1)]} = pr_ensemble[x(t), x(t + Δt)]

Stationary Process
[Figure: Trials 1, 2, ..., n] Ensemble statistics and time-series statistics are stationary but not the same for each trial

Ergodic Random Process
Ensemble statistics and time-series statistics are stationary, and they are the same:
pr_ensemble[x(t)] = pr_time[x(t)]
pr_ensemble[x(t), x(t + Δt)] = pr_time[x(t), x(t + Δt)]

Stationary, Ergodic Process
[Figure: Trials 1, 2, ..., n] Ensemble statistics and time-series statistics are stationary and the same for each trial

Histograms of Random Processes
[Figure: process with non-stationary mean; process with non-stationary variance; stationary, non-ergodic process; ergodic process]

Correlation and Covariance Functions

Autocorrelation Functions of Random Processes
Autocorrelation function (shown for a process with non-stationary variance):
E[x(t1) x(t2)] = ∫_{–∞}^{∞} ∫_{–∞}^{∞} x(t1) x(t2) pr_ensemble[x(t1), x(t2)] dx(t1) dx(t2) ≜ ψ[x(t1), x(t2)]
a scalar function of two variables
Autocovariance Functions of Random Processes
Mean values are subtracted from the variables:
E{[x(t1) – x̄(t1)][x(t2) – x̄(t2)]} = ∫∫ [x(t1) – x̄(t1)][x(t2) – x̄(t2)] pr_ensemble[x(t1), x(t2)] dx(t1) dx(t2)
≜ φ[x̃(t1), x̃(t2)], where x̃ ≜ x – x̄

With t1 = t2, the autocovariance function equals the variance:
E{[x(t1) – x̄(t1)][x(t2) – x̄(t2)]} = E{[x(t1) – x̄(t1)]²} = E[x̃²(t1)] = σx²,  where x̃(t) ≜ x(t) – x̄(t)

Cross-Covariance Functions
Cross-covariance function (shown for non-stationary variance):
E{[x(t1) – x̄(t1)][y(t2) – ȳ(t2)]} = ∫∫ [x(t1) – x̄(t1)][y(t2) – ȳ(t2)] pr_ensemble[x(t1), y(t2)] dx(t1) dy(t2) = φ[x̃(t1), ỹ(t2)]
With t1 = t2, the cross-covariance function is
E{[x(t1) – x̄(t1)][y(t1) – ȳ(t1)]} = σxy

Covariance Functions for Stationary Processes
For stationary processes, statistics are independent of absolute time:
Autocovariance: φ[x̃(t1) x̃(t2)] = φ[x̃(t) x̃(t + Δt)] ≜ φxx(Δt)
Cross-covariance: φ[x̃(t1) ỹ(t2)] = φ[x̃(t) ỹ(t + Δt)] ≜ φxy(Δt)
Ordering in the product is immaterial; therefore, the autocovariance function is symmetric:
φxx(Δt) = φxx(–Δt)
The cross-covariance function is not symmetric unless y = kx
Δt = lag time

Properties of Continuous-Time Covariance Functions for Ergodic Processes
Correlation depends on Δt only; ensemble averages equal time averages of the dependent variables:
φxx(Δt) = E[x(t) x(t + Δt)] = ∫_{–∞}^{∞} x(t) x(t + Δt) pr(x) dx = lim_{T→∞} (1/T) ∫_0^T x(t) x(t + Δt) dt
φyy(Δt) = E[y(t) y(t + Δt)] = lim_{T→∞} (1/T) ∫_0^T y(t) y(t + Δt) dt
Cross-covariance function:
φxy(Δt) = E[x(t) y(t + Δt)] = lim_{T→∞} (1/T) ∫_0^T x(t) y(t + Δt) dt

Properties of Discrete-Time Covariance Functions for Ergodic Processes
k = number of (±) lags; the discrete case approaches the continuous case as the sampling interval approaches zero
Time-average sums of the dependent variables:
φxx(k) = E(xn xn+k) = lim_{N→∞} (1/N) Σ_{n=1}^{N} xn xn+k
φyy(k) = E(yn yn+k) = lim_{N→∞} (1/N) Σ_{n=1}^{N} yn yn+k
Cross-covariance function:
φxy(k) = E(xn yn+k) = lim_{N→∞} (1/N) Σ_{n=1}^{N} xn yn+k

Additional Properties of Covariance Functions for Stationary Processes
“Self” correlation ≥ lagged correlation:
φxx(0) ≥ φxx(Δt)
φyy(0) ≥ φyy(Δt)
φxx(0) φyy(0) ≥ [φxy(Δt)]²

Additional Properties of Correlation Functions for Stationary Processes
φxx(k) = E(xn xn+k) = lim_{N→∞} (1/N) Σ_{n=1}^{N} xn xn+k
For a record of finite length, the number of products in the correlation-function estimate decreases as the number of lags increases.
[Table: x(n) sampled at 0.1-s intervals over 0 to 1 s; the φ(k) estimate uses 11 products at k = 0, 10 at k = 1, and 9 at k = 2]
Approximate correction: multiply φ by N/(N – k)

AutoCovariance Functions for Stationary and Non-Stationary Examples
[Figure: stationary, non-ergodic process; ergodic process; process with non-stationary mean; process with non-stationary variance]

http://en.wikipedia.org/wiki/Colors_of_noise

Dirac Delta Function
Singularity at a point in time:
δ(Δt) = ∞, Δt = 0; 0, Δt ≠ 0
∫_{–∞}^{∞} δ(t) dt = lim_{Δt1→0} ∫_{–Δt1}^{Δt1} δ(Δt) d(Δt) = 1
Representation of the Dirac delta function as a Gaussian distribution with vanishing standard deviation:
δ(Δt) = lim_{a→0} [1/(a√π)] exp[–(Δt/a)²]

Continuous-Time White Noise
(http://simplynoise.com)
Autocovariance function:
φxx(Δt) = σx² δ(Δt)
Conditional probability distribution = unconditional probability distribution:
pr[x(t) | x(t + Δt)] = pr[x(t)]
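The discrete-time covariance estimate, including the N/(N − k) finite-record correction discussed above, can be sketched and applied to simulated white noise, for which φxx(0) ≈ σx² and φxx(k) ≈ 0 for k ≠ 0 (the sample size and unit variance are assumptions):

```python
import random

random.seed(4)
N = 20_000
# Zero-mean, unit-variance white Gaussian sequence (assumed test signal)
x = [random.gauss(0.0, 1.0) for _ in range(N)]

def autocov(x, k):
    """Sample autocovariance at lag k, corrected by N/(N - k) for the
    reduced number of products in a finite-length record."""
    n = len(x)
    return sum(x[i] * x[i + k] for i in range(n - k)) / (n - k)

phi0 = autocov(x, 0)   # approaches sigma_x^2 = 1
phi5 = autocov(x, 5)   # approaches 0 for white noise
```

For a white sequence the lagged estimate is small at every nonzero lag, consistent with the delta-function autocovariance on the slide above.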
Kronecker Delta Function
A discrete function of indices:
δij = 1, i = j; 0, i ≠ j

Discrete-Time White Noise
Autocovariance function:
φxx(tj) = σx² δ(tj) = σx² δij
Conditional probability distribution = unconditional probability distribution:
pr[x(tj) | x(tj+k)] = pr[x(tj)]

Colored Noise*: First-Order Filter Effects
ẋ = ax – au, a < 0, u = white noise
[Figure: responses for a = –1 rad/s and a = –0.1 rad/s]
* colored = non-white noise; http://yusynth.net/Modular/EN/NOISE/

Colored Noise: Probability Density Functions
ẋ = –x + u
[Figure: input and output histograms, tf = 10,000 sec]
The input and output probability density functions are virtually identical

Markov Sequences (or Chains) and Processes
• Markov Sequence (Discrete Time)
– Probability distribution of a dynamic process at time tk+1 > tk > 0, conditioned on the past history, depends only on the state, x, at time tk:
Pr[xk+1 | (xk, xk–1, xk–2, ..., 0)] = Pr[xk+1 | xk]
• Markov Process (Continuous Time)
– Probability distribution of a dynamic process at time s > t > 0, conditioned on the past history, depends only on the state, x, at time t
– Follow the discussion of the continuous-time Markov chain (infinitesimal definition, jump chain/holding definition, transition probability definition, Kolmogorov forward equation) at https://en.wikipedia.org/wiki/Markov_chain

Probability Distributions, Markov Chains, and Expected Values
• Multivariate probability distributions are functions of many components of x
• If a process is Markovian, the description on the previous slide applies to the entire distribution
• Consequently, it applies to expected values of x and f(x) as well:
E(x) = ∫_{–∞}^{∞} x pr(x) dx;  E[f(x)] = ∫_{–∞}^{∞} f(x) pr(x) dx

Stochastic Steady State
Scalar, discrete-time LTI system with zero-mean Gaussian random input:
xi+1 = a xi + (1 – a) ui, 0 < a < 1
Expected value:
E(xi+1) = E[a xi + (1 – a) ui] = a E(xi) + (1 – a) E(ui), 0 < a < 1
Expected value at stochastic steady state:
E(xi+1) = E(xi)
(1 – a) E(xi) = (1 – a) E(ui)
E(xi) = E(ui) = 0

Autocovariance of a Markov Sequence
Scalar LTI system with zero-mean Gaussian random input:
φxx(1) = E(xi xi+1) = E{xi [a xi + (1 – a) ui]} = a E(xi²) + (1 – a) E(xi ui)
If the input is white noise, E(xi ui) = 0, and
φxx(1) = a E(xi²) = a σx²
φxx(k) = a^k σx², 0 < a < 1

Autocovariance of a Markov Process
Expected value at stochastic steady state:
ẋ = ax – au = a(x – u), a < 0
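The Markov-sequence result φxx(k) = a^k σx² can be checked by simulating the scalar system above. A hedged sketch with an assumed a = 0.8 and unit-variance white Gaussian input:

```python
import random

random.seed(5)
a = 0.8                       # assumed 0 < a < 1
N = 200_000
x = [0.0]
for _ in range(N):
    # x_{i+1} = a x_i + (1 - a) u_i, with zero-mean white Gaussian u_i
    x.append(a * x[-1] + (1.0 - a) * random.gauss(0.0, 1.0))
x = x[1000:]                  # discard the transient toward stochastic steady state
n = len(x)

var = sum(v * v for v in x) / n                       # sigma_x^2 (zero-mean process)
phi1 = sum(x[i] * x[i + 1] for i in range(n - 1)) / (n - 1)

ratio = phi1 / var            # theory: phi_xx(1) / sigma_x^2 = a
```

The lag-1 autocovariance divided by the variance recovers the pole location a, as the slide's result predicts.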
E(ẋ) = a[E(x) – E(u)] → 0 as t → ∞, so E(x) → E(u) as t → ∞
Autocovariance function of a Markov process:
φxx(Δt) = E[x(t) x(t + Δt)] = e^{a|Δt|} E(x²) = e^{a|Δt|} σx², a < 0

AutoCovariance Functions of Markov Random Sequences (discrete) and Processes (continuous)
[Figure; equivalent filter frequency f = –ln(a)/Δt]

Fourier Transform and Its Inverse
Fourier transform of x(t):
F[x(t)] = X(ω) = ∫_{–∞}^{∞} x(t) e^{–jωt} dt, ω = frequency, rad/s
Inverse Fourier transform:
x(t) = (1/2π) ∫_{–∞}^{∞} X(ω) e^{jωt} dω
x(t) and X(ω) are a Fourier transform pair

Power Spectral Density Function of a Random Process
Frequency distribution of the complex amplitude in x(t):
X(ω) = XRe(ω) + j XIm(ω)
Complex conjugate:
X*(ω) = XRe(ω) – j XIm(ω)
Power spectral density function:
|X(ω)|² = X(ω) X*(ω) = [XRe(ω) + j XIm(ω)][XRe(ω) – j XIm(ω)] = XRe²(ω) + XIm²(ω) ≜ Ψxx(ω)

Power Spectral Density and Autocorrelation Functions Form a Fourier Transform Pair
For a stationary process:
Power spectral density function: Ψxx(ω) = ∫_{–∞}^{∞} ψxx(τ) e^{–jωτ} dτ
Autocorrelation function: ψxx(τ) = (1/2π) ∫_{–∞}^{∞} Ψxx(ω) e^{jωτ} dω

Power Spectral Density and Autocovariance Functions for Variation from the Mean, x̃(t) = x(t) – x̄
Power spectral density function: Φxx(ω) = ∫_{–∞}^{∞} φxx(τ) e^{–jωτ} dτ
Autocovariance function: φxx(τ) = (1/2π) ∫_{–∞}^{∞} Φxx(ω) e^{jωτ} dω

Power Spectral Density and Autocovariance Functions for 1st-Order Filter Output, a = –1 rad/s, White-Noise Input
[Figure: autocovariance function and power spectral density function]
The power spectral density function has magnitude but no phase angle

Spectral Characteristics of Continuous-Time White Noise
φxx(0) = σx² = (1/π) ∫_0^∞ Φxx(ω) e^{jω(0)} dω = (1/π) ∫_0^∞ Φxx(ω) dω
which is (1/π) × the area, A, under the power spectral density curve.
For white noise,
Φxx(ω) = φxx(0) ∫_{–∞}^{∞} δ(τ) e^{–jωτ} dτ = φxx(0) = constant
However, A is infinite except when the constant integrand is zero
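The transform-pair relationship can be exercised numerically on a Markov-sequence autocovariance, φxx(k) = σx² a^|k|. The closed-form spectrum used for comparison, Φxx(ω) = σx²(1 − a²)/(1 − 2a cos ω + a²), is the standard discrete first-order result; it is not derived on these slides, and the parameter values are assumptions:

```python
import cmath
import math

a, sigma2 = 0.6, 1.0          # assumed pole and variance
omega = 1.0                   # rad/sample, an arbitrary test frequency

# Truncated discrete-time Fourier transform of phi_xx(k) = sigma2 * a^|k|;
# a^200 is negligibly small, so truncation error is below machine precision
K = 200
Phi = sum(sigma2 * a ** abs(k) * cmath.exp(-1j * omega * k)
          for k in range(-K, K + 1))

# Closed-form spectrum of a first-order Markov sequence (assumed reference)
Phi_exact = sigma2 * (1 - a * a) / (1 - 2 * a * math.cos(omega) + a * a)
```

The transform is purely real, consistent with the remark that a power spectral density has magnitude but no phase angle.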
White Noise and Band-Limiting
• Heuristic arguments for the use of “white noise”:
– Real systems are band-limited by dynamics
– Discrete-time systems are band-limited by sampling frequency
Assume that the power spectral density has a sharp cutoff at ωB:
Φxx(ω) = Φ, |ω| ≤ ωB; 0, |ω| > ωB
σx² = Φ ωB / π
φxx(τ) = Φ sin(ωBτ) / (πτ)
• If x(t) is a Markov process,
Φxx(ω) = 2f σx² / (ω² + f²)
When the power spectral density is plotted against frequency rather than log(frequency), the rolloff does not seem as abrupt.
https://en.wikipedia.org/wiki/White_noise

Next Time: Least-Squares Estimation

Supplemental Material

Relative Frequency Example: Probability of Rolling a “7” with Two Dice
• Proposition 1: 11 possible sums, one of which is 7
Pr(Ai) = nAi / N = 1/11
• Proposition 2: 21 possible pairs, not distinguishing between dice
– 3 pairs: 1-6, 2-5, 3-4
Pr(Ai) = nAi / N = 3/21 = 1/7
• Proposition 3: 36 possible outcomes, distinguishing between the two dice
– 6 pairs: 1-6, 2-5, 3-4, 6-1, 5-2, 4-3
Pr(Ai) = nAi / N = 6/36 = 1/6

Rolling Dice: Favorable-Outcomes Result
Dice rolling, 10,000,000 trials (05-Feb-2017 16:37:55). MATLAB script, shown for Number = 7:

    Number = 7;
    freq = 0;
    Trials = 1e7;
    d1 = randi(6,Trials,1);
    d2 = randi(6,Trials,1);
    for i = 1:Trials
        sum = d1(i) + d2(i);
        if sum == Number
            freq = freq + 1;
        end
    end

Output:
Number = 2, Probability = 0.027762
Number = 3, Probability = 0.055632
Number = 4, Probability = 0.083471
Number = 5, Probability = 0.11094
Number = 6, Probability = 0.13878
Number = 7, Probability = 0.16689 (or ~1/6)
Number = 8, Probability = 0.13907
Number = 9, Probability = 0.11113
Number = 10, Probability = 0.083353
Number = 11, Probability = 0.055476
Number = 12, Probability = 0.027808

Random Number Example
• Statistical properties are known prior to the actual event
• Excel spreadsheet: two random rows and one deterministic row
– RAND() generates a uniform random number on each call
[Figure: three trials of the two random rows and the deterministic row (0.1, 0.2, ..., 0.7)]
• Output for the 4th trial:
0.18 0.54 0.49 0.49 0.02 0.73 0.88
0.81 0.46 0.84 0.16 0.89 0.30 0.03
0.10 0.20 0.30 0.40 0.50 0.60 0.70
Once the experiment is over, the results are determined

Simple Hypothesis Test: t Test
Is A different from B?
• t compares the mean-value difference of two data sets, normalized by their standard deviations
– |t| is reduced by uncertainty in the data sets (σ)
– |t| is increased by the number of points in the data sets (n)
t = (mA – mB) / √(σA²/nA + σB²/nB)
m: mean value of a data set;  σ: standard deviation of a data set;  n: number of points in a data set
• |t| > 3 implies mA ≠ mB with ≥ 99.7% confidence (error probability ≤ 0.003 for Gaussian distributions) [n > 25]

Application of t Test to DNA Microarray Data (data from Alon et al., 1999)
t = (mT – mN) / √(σT²/36 + σN²/22)
• 58 RNA samples representing tumor and normal tissue
• 1,151 transcripts are over-/under-expressed in the tumor/normal comparison (p ≤ 0.003)
• Genetically dissimilar samples are apparent

Mean Value of a Uniform Random Distribution
• Used in most random number generators (e.g., MATLAB's rand)
• Bounded distribution
• Example is symmetric about the mean
pr(x) = 0, x < xmin; 1/(xmax – xmin), xmin < x < xmax; 0, x > xmax
∫_{–∞}^{∞} pr(x) dx = 1
Mean value:
x̄ = E(x) = ∫_{–∞}^{∞} x pr(x) dx = ∫_{xmin}^{xmax} x/(xmax – xmin) dx = (xmax² – xmin²)/[2(xmax – xmin)] = (xmax + xmin)/2

Variance and Standard Deviation of a Uniform Random Distribution
∫_{–∞}^{∞} pr(x) dx = 1
Variance, with xmax = –xmin ≜ a:
E[(x – x̄)²] = σx² = (1/2a) ∫_{–a}^{a} x² dx = [x³/(6a)] evaluated from –a to a = a²/3
Standard deviation:
σx = √(σx²) = a/√3

Some Non-Gaussian Distributions
• Binomial Distribution
– Random variable, x
– Probability of k successes in n trials
– Discrete probability distribution described by a probability mass function, pr(x)
pr(x) = [n!/(k!(n – k)!)] p^k (1 – p)^(n–k) = (n choose k) p^k (1 – p)^(n–k)
= probability of exactly k successes in n trials, in (0, 1); ~ normal distribution for large n
Parameters of the distribution: p = probability of occurrence, in (0, 1); n = number of trials

• Poisson Distribution
– Probability of a number of events occurring in a fixed period of time
– Discrete probability distribution described by a probability mass function
pr(k) = λ^k e^(–λ) / k!
λ = average rate of occurrence of the event (per unit time); k = number of occurrences of the event; pr(k) = probability of k occurrences (per unit time); ~ normal distribution for large λ

• Cauchy-Lorentz Distribution
pr(x) = γ / {π[γ² + (x – x0)²]}
Pr(x) = (1/π) tan⁻¹[(x – x0)/γ] + 1/2
– Mean and variance are undefined
– “Fat tails”: extreme values are more likely than for a normal distribution
– Central limit theorem fails

• Rayleigh Distribution
– Continuous, positive random variable
– Example: magnitude of a vector whose components have normal distributions
pr(x) = (x/σ²) e^(–x²/(2σ²)), x ≥ 0

• Weibull Distribution
– Positive random variable with shape, k, and scale, λ
– Example: time to failure for a process with failure rate proportional to a power of time
pr(x) = (k/λ)(x/λ)^(k–1) e^(–(x/λ)^k), x ≥ 0
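The uniform-distribution moments derived above, mean (xmax + xmin)/2 and variance a²/3, can be checked by Monte Carlo sampling; a sketch with an assumed bound a = 1:

```python
import random

random.seed(6)
a = 1.0                       # bounds are ±a, so pr(x) = 1/(2a) on [-a, a]
N = 300_000
xs = [random.uniform(-a, a) for _ in range(N)]

# Mean -> (x_max + x_min)/2 = 0 for the symmetric case
mean_est = sum(xs) / N
# Variance -> a^2 / 3
var_est = sum((x - mean_est) ** 2 for x in xs) / (N - 1)
```

The sample variance approaches 1/3, consistent with σx = a/√3.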
Skewness and Kurtosis of Probability Distributions (3rd and 4th Central Moments)
Skewness = µ3/σ³;  Excess Kurtosis = Kurtosis – 3 = µ4/σ⁴ – 3
[Figure: Laplace, hyperbolic secant, logistic, normal, raised cosine, Wigner semicircle, and uniform distributions]

Two Trials to Show Mean, Standard Deviation, Kurtosis, and Skewness
Stationary, ergodic mean and variance:
Trial 1: Mean = 0.0068123, SD = 1.0006, Kurtosis = 3.0291, Skewness = –0.0093799
Trial 2: Mean = 0.0050475, SD = 0.99947, Kurtosis = 3.0485, Skewness = 0.0147
Stationary, non-ergodic mean and variance:
Trial 1: Mean = 0.99847, SD = 1.3156, Kurtosis = 2.5933, Skewness = –0.12198
Trial 2: Mean = 0.99337, SD = 1.3262, Kurtosis = 2.6395, Skewness = –0.18208
Non-stationary (sinusoidal) mean:
Trial 1: Mean = 0.17438, SD = 1.2058, Kurtosis = 2.904, Skewness = –0.042752
Trial 2: Mean = 0.18366, SD = 1.1833, Kurtosis = 2.83, Skewness = –0.10543
Non-stationary (sinusoidal) variance:
Trial 1: Mean = –0.0043818, SD = 0.6804, Kurtosis = 4.6055, Skewness = 0.063442
Trial 2: Mean = 0.0072196, SD = 0.71168, Kurtosis = 5.0192, Skewness = –0.05492
Sine wave (identical in both trials): Mean = 6.6818e-06, SD = 0.707, Kurtosis = 1.5006, Skewness = –2.8338e-05

Some Non-Gaussian Distributions: Bimodal Distributions
• Bimodal Distribution: two peaks
– e.g., concatenation of 2 normal distributions with different means [histogram]
• Random Sine Wave
x = A sin[ωt + random phase angle]
pr(x) = 1/(π√(A² – x²)), |x| ≤ A; 0, |x| > A

Histograms (Wikipedia Example)
[Figure: scaled numbers, area = sum; normalized numbers, area = 1]

Sufficient Conditions for Central Limit Theorem (Papoulis, 1990)
The probability distribution of the sum of independent, identically distributed (i.i.d.) variables approaches a normal distribution as the number of variables approaches infinity, provided that
• the xi are i.i.d. random variables and E(xi³) is finite, or
• the xi are bounded, |xi| < A < ∞, and σi > a > 0, or
• E(|xi|³) < B < ∞ and Σ σn² → ∞ as n → ∞

Dependence and Correlation
x and y are independent if
pr(x, y) = pr(x) pr(y) at every x and y in (–∞, ∞), i.e., pr(x | y) = pr(x) and pr(y | x) = pr(y)
Dependence:
pr(x, y) ≠ pr(x) pr(y) for some x and y in (–∞, ∞)
x and y are uncorrelated if E(xy) = E(x)E(y):
∫∫ x y pr(x, y) dx dy = ∫ x pr(x) dx ∫ y pr(y) dy = x̄ ȳ
Correlation: E(xy) ≠ E(x)E(y)

Which Combinations are Possible?
• Independence and lack of correlation: pr(x, y) = pr(x) pr(y) everywhere, and ∫∫ x y pr(x, y) dx dy = ∫∫ x y pr(x) pr(y) dx dy = x̄ ȳ
• Independence and correlation: pr(x, y) = pr(x) pr(y) everywhere, but ∫∫ x y pr(x, y) dx dy ≠ x̄ ȳ
• Dependence and lack of correlation: pr(x, y) ≠ pr(x) pr(y) for some x and y, yet ∫∫ x y pr(x, y) dx dy = x̄ ȳ
• Dependence and correlation: pr(x, y) ≠ pr(x) pr(y) for some x and y, and ∫∫ x y pr(x, y) dx dy ≠ x̄ ȳ

Example
xk = [xk; yk] = [x1k; x2k]
2nd-order LTI system:
xk+1 = Φ xk + Λ wk, x0 = 0
Gaussian disturbance, wk, with independent, uncorrelated components:
w = [w1; w2];  Q = diag(σw1², σw2²)
Propagation of the state mean and covariance:
x̄k+1 = Φ x̄k + Λ w̄
Pk+1 = Φ Pk Φᵀ + Λ Q Λᵀ, P0 = 0
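The covariance propagation above can be iterated directly. A dependency-free 2×2 sketch with an assumed stable, diagonal Φ and Λ (the "independence and lack of correlation" case), so that P stays diagonal and converges to a stochastic steady state:

```python
# Plain-list 2x2 matrix helpers, to keep the sketch dependency-free
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_T(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

# Assumed stable, independent dynamics: Phi = diag(a, b), Lambda = diag(c, d)
Phi = [[0.9, 0.0], [0.0, 0.5]]
Lam = [[0.1, 0.0], [0.0, 0.5]]
Q = [[1.0, 0.0], [0.0, 1.0]]   # independent, uncorrelated disturbance components

P = [[0.0, 0.0], [0.0, 0.0]]   # P_0 = 0
LQL = mat_mul(mat_mul(Lam, Q), mat_T(Lam))
for _ in range(500):
    # P_{k+1} = Phi P_k Phi' + Lambda Q Lambda'
    P = mat_add(mat_mul(mat_mul(Phi, P), mat_T(Phi)), LQL)
```

With diagonal Φ, Λ, and Q, the off-diagonal elements of P remain zero (the states stay uncorrelated), and each diagonal element converges to c²/(1 − a²) for its own (a, c) pair.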
Off-diagonal elements of P and Q express correlation

Example (continued)
x̄k+1 = Φ x̄k + Λ w̄;  Pk+1 = Φ Pk Φᵀ + Λ Q Λᵀ, P0 = 0
• Independence and lack of correlation in the state (independent dynamics):
Φ = [a 0; 0 b];  Λ = [c 0; 0 d]
• Independent dynamics and correlation in the state:
Φ = [a 0; 0 a];  x0 ≠ 0;  w1 = w2 = w;  Λ = [c; c]
• Dependence and correlation in the state:
Φ = [a b; c d];  Λ = [e 0; 0 f]
• Dependence and lack of correlation in a nonlinear output (conjecture, t.b.d.):
Φ = [a b; c d];  Λ = [e 0; 0 f];  z = [x1; x2²]

Colored Noise: First-Order Filter, a = –1 rad/s
[Figure: input and output autocovariance functions, tf = 10,000 sec]

Bandwidth Effect: Comparison of AutoCovariance Functions for 1st-Order Filters
[Figure: a = –1 rad/s vs. a = –0.1 rad/s, tf = 10,000 sec]

Sampled Sine Wave Statistics
x(t) = sin(0.1t), Δt = 0.1 s, number of points = 1258
pr(x) = 1/(π√(A² – x²)), |x| ≤ A; 0, |x| > A — a bimodal distribution
[Figure: time series, histogram, and autocorrelation function (aliased by short sample length without the n/(n – k) correction)]

Sample-Length Effect on Estimate of Sine Wave Autocorrelation Function
x(t) = sin(t/16π)
The autocorrelation-function estimate becomes inaccurate when the number of lags exceeds ~10% of the number of sample points

Frequency Folding and Aliasing
τ = sampling interval, sec
ωD = ωC τ, rad
ωC = π/τ = folding frequency
• The discrete Fourier transform is periodic in frequency increments of ΔωD = 2π
• Relationship of the discrete Fourier transform to the continuous Fourier transform:
XD(ωC) = (1/τ) Σ_{n=–∞}^{∞} XC(ω + 2πn/τ)
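The folding-frequency relationship can be demonstrated by sampling: a sinusoid whose frequency is shifted by the full sampling rate 2π/τ produces exactly the same samples as the original, so any content above ωC = π/τ aliases below it. A sketch with an assumed τ = 0.1 s:

```python
import math

tau = 0.1                     # sampling interval, s (assumed)
omega_C = math.pi / tau       # folding frequency, rad/s
omega = 10.0                  # a test frequency below omega_C

# A signal shifted by the sampling rate aliases onto the original:
# cos((2*pi/tau - omega)*n*tau) = cos(2*pi*n - omega*n*tau) = cos(omega*n*tau)
alias = 2.0 * math.pi / tau - omega
s_low = [math.cos(omega * k * tau) for k in range(100)]
s_high = [math.cos(alias * k * tau) for k in range(100)]

max_diff = max(abs(u - v) for u, v in zip(s_low, s_high))
```

The two sampled sequences are identical to machine precision, so the two continuous frequencies are indistinguishable after sampling.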