
Robert Stengel
Optimal Control and Estimation, MAE 546
Princeton University, 2018

• Concepts and reality
• Probability distributions
• Bayes's Law
• Stationarity and ergodicity
• Correlation functions and spectra

Copyright 2018 by Robert Stengel. All rights reserved. For educational use only. http://www.princeton.edu/~stengel/MAE546.html http://www.princeton.edu/~stengel/OptConEst.html


Concepts and Reality of Probability (Papoulis, 1990)

• Theory may be exact
  – Deals with averages of phenomena with many possible outcomes
  – Based on models of behavior
• Application can only be approximate
  – Measure of our state of knowledge or belief that something may or may not be true
  – Subjective assessment

A : event
P(A) : probability of event

n_A : number of times A occurs experimentally
n : total number of trials

P(A) ≈ n_A / n

Interpretations of Probability (Papoulis)

• Axiomatic Definition (Theoretical interpretation)
  – Probability space: abstract objects (outcomes) and sets (events)

  – Axiom 1: Pr(A_i) ≥ 0
  – Axiom 2: Pr("certain event") = 1 = Pr[all events in probability space (or universe)]
  – Axiom 3: With no common elements,

Pr(Ai ∪ Aj ) = Pr(Ai ) + Pr(Aj )

• Relative Frequency (Empirical interpretation)

Pr(A_i) = lim_{N→∞} (n_{A_i} / N)

N = total number of trials
n_{A_i} = number of trials with attribute A_i
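A minimal MATLAB sketch of this empirical interpretation; the event, its true probability, and the trial count are illustrative choices, not from the slides:

    N  = 1e6;                        % total number of trials (illustrative)
    u  = rand(N,1);                  % uniform random draws on (0,1)
    A  = (u < 0.3);                  % event A: draw falls below 0.3, so Pr(A) = 0.3
    nA = sum(A);                     % number of times A occurs
    PrA_estimate = nA/N              % relative frequency; approaches 0.3 as N grows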


Interpretations of Probability (Papoulis)

• Classical ("Favorable outcomes" interpretation)

Pr(A_i) = n_{A_i} / N ,  N is finite
n_{A_i} = number of outcomes "favorable to" A_i

• Measure of belief (Subjective interpretation)

  – Pr(A_i) = measure of belief that A_i is true (similar to fuzzy sets)
  – Informal induction precedes deduction
  – Principle of insufficient reason (i.e., total prior ignorance):

• e.g., if there are 5 event sets, Ai , i = 1 to 5, Pr(Ai) = 1/5 = 0.2

Probability: "... a way of expressing knowledge or belief that an event will occur or has occurred."

Statistics: "The science of making effective use of numerical data relating to groups of individuals or experiments."


Empirical (or Relative) Frequency of Discrete, Mutually Exclusive Events

Pr(x_i) = n_i / N ;  i = 1 to I ,  in [0,1]

• N = total number of events

• n_i = number of events with value x_i
• I = number of different values

• xi = ordered set of hypotheses or values

• x_i is a discrete, real random variable

• Ensemble of equivalent sets

A_i = {x ∈ U | x = x_i} ;  i = 1 to I

• Cumulative probability over all sets

∑_{i=1}^{I} Pr(A_i) = ∑_{i=1}^{I} Pr(x_i) = (1/N) ∑_{i=1}^{I} n_i = 1

[Figure: histogram of Pr(x) for x = 1 to 6]

Cumulative Probability, Pr(x ≥/≤ a), Discrete Random Variables, and Continuous Variables

[Figure: Pr(x), Cum Pr(x ≥ a), and Cum Pr(x ≤ a) vs. x]

• Continuous, real variable, x, e.g., a continuum of values

– xi is the center of a categorical band in x (purple, blue, green, …)

Pr(x_i ± Δx/2) = n_i / N

∑_{i=1}^{I} Pr(x_i ± Δx/2) = 1

Probability Density Function, pr(x), and Continuous, Cumulative Distribution Function, Pr(x < X)

pr(x_i) ≜ Pr(x_i ± Δx/2) / Δx ⟶ pr(x) as Δx → 0

Cumulative probability for x over (−∞, +∞)

∑_{i=1}^{I} Pr(x_i ± Δx/2) = ∑_{i=1}^{I} pr(x_i) Δx ⟶ ∫_{−∞}^{∞} pr(x) dx = 1 as Δx → 0, I → ∞

Cumulative distribution function, x < X

Pr(x < X) = ∫_{−∞}^{X} pr(x) dx

[Figure: Gaussian probability density and cumulative distribution functions]

Properties of Random Variables

• Mode
  – Value of x for which pr(x) is maximum
• Median
  – Value of x corresponding to the 50th percentile
  – Pr(x < median) = Pr(x > median) = 0.5
• Mean
  – Value of x corresponding to the statistical average
• First moment of x = mean value of x ("expected value")

x̄ = E(x) = ∫_{−∞}^{∞} x pr(x) dx ,  with x as the "moment arm"

Expected Values

Mean value:  x̄ = E(x) = ∫_{−∞}^{∞} x pr(x) dx

• Second central moment of x = variance
  – Variance is measured from the mean value rather than from zero
  – Smaller value indicates less uncertainty in the value of x

E[(x − x̄)²] = σ_x² = ∫_{−∞}^{∞} (x − x̄)² pr(x) dx

• Expected value of any function of x, f(x)

E[f(x)] = ∫_{−∞}^{∞} f(x) pr(x) dx

See Supplemental Material for examples of non-Gaussian distributions

Expected Value is a Linear Operation

Expected value of sum of random variables

x_1 and x_2 need not be statistically independent:

E[x_1 + x_2] = ∫_{−∞}^{∞} (x_1 + x_2) pr(x) dx = ∫_{−∞}^{∞} x_1 pr(x) dx + ∫_{−∞}^{∞} x_2 pr(x) dx = E[x_1] + E[x_2]

Expected value of constant times random variable:

E[k x] = ∫_{−∞}^{∞} k x pr(x) dx = k ∫_{−∞}^{∞} x pr(x) dx = k E[x]


Gaussian (Normal) Random Distribution

• Used in some random number generators (e.g., RANDN)
• Unbounded, symmetric distribution
• Defined entirely by its mean and standard deviation

pr(x) = [1/(√(2π) σ_x)] e^{−(x − x̄)²/(2σ_x²)} ;  ∫_{−∞}^{∞} pr(x) dx = 1

Mean value: First moment, µ_1

E(x) = ∫_{−∞}^{∞} x pr(x) dx = x̄

Variance: Second central moment, µ2

E[(x − x̄)²] = ∫_{−∞}^{∞} (x − x̄)² pr(x) dx = σ_x²

Units of x and σ_x are the same

Probability of Being Close to the Mean (Gaussian Distribution)

pr(x) = [1/(√(2π) σ_x)] e^{−(x − x̄)²/(2σ_x²)}

Probability of being within ± 1σ

Pr[x < (x̄ + σ_x)] − Pr[x < (x̄ − σ_x)] ≈ 68%

Probability of being within ± 2σ

Pr[x < (x̄ + 2σ_x)] − Pr[x < (x̄ − 2σ_x)] ≈ 95%

Probability of being within ± 3σ

Pr[x < (x̄ + 3σ_x)] − Pr[x < (x̄ − 3σ_x)] ≈ 99%
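A short MATLAB check of these figures, assuming a zero-mean, unit-variance Gaussian; both the erf closed form and the Monte Carlo sample size are illustrative:

    k = [1 2 3];
    Pr_analytic = erf(k/sqrt(2))      % ~ [0.6827 0.9545 0.9973]

    N = 1e6;                          % Monte Carlo check (illustrative sample size)
    x = randn(N,1);                   % zero-mean, unit-variance Gaussian samples
    Pr_empirical = [mean(abs(x) < 1), mean(abs(x) < 2), mean(abs(x) < 3)]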


Experimental Determination of Mean and Variance

• Sample mean for N data points, x_1, x_2, ..., x_N

x̄ = (1/N) ∑_{i=1}^{N} x_i

• Sample variance for same data set

σ_x² = [1/(N − 1)] ∑_{i=1}^{N} (x_i − x̄)²

• Divisor is (N − 1) rather than N to produce an unbiased estimate
  – (N − 1) terms are independent
  – Inconsequential for large N
• Distribution is not necessarily Gaussian
  – Prior knowledge: fit histogram to known distribution
  – Hypothesis test: determine best fit (e.g., Rayleigh, binomial, Poisson, ...)

Central Limit Theorem

• Probability distribution of the sum of independent, identically distributed (i.i.d.) variables
  – Approaches a normal distribution as the number of variables approaches infinity
  – Summation of continuous random variables produces a convolution of probability density functions (Papoulis, 1990)
  – See Supplemental Material for sufficient conditions

pr(y) = ∫_{−∞}^{∞} pr(y − x) pr(x) dx ;  pr(z) = ∫_{−∞}^{∞} pr(z − y) pr(y) dy


MATLAB Simulation of Uniform distribution, 100,000 points, 60 histogram bins
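A sketch of a simulation along these lines, with an added central-limit check; the number of summed variables (12) is an illustrative assumption:

    N = 100000;
    x = rand(N,1);                          % uniform on (0,1)
    subplot(2,1,1), histogram(x,60), title('Uniform samples')

    m = 12;                                 % number of summed variables (illustrative)
    y = sum(rand(N,m),2);                   % each row: sum of m independent uniform draws
    subplot(2,1,2), histogram(y,60), title('Sum of 12 uniform variables')
    [mean(y), std(y)]                       % ~ [m/2, sqrt(m/12)] = [6, 1]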

Multiple Probability Densities and Expected Values

Probability density functions of two random variables, x and y: pr(x) and pr(y) given for all x and y in (−∞, ∞)

pr(x, y) : joint probability density function of x and y

∫_{−∞}^{∞} pr(x) dx = 1 ;  ∫_{−∞}^{∞} pr(y) dy = 1 ;  ∫_{−∞}^{∞} ∫_{−∞}^{∞} pr(x, y) dx dy = 1

Expected values of x and y

Mean value:  E(x) = ∫_{−∞}^{∞} x pr(x) dx = x̄
Mean value:  E(y) = ∫_{−∞}^{∞} y pr(y) dy = ȳ
Covariance:  E(xy) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y pr(x, y) dx dy
Autocovariance:  E(x²) = ∫_{−∞}^{∞} x² pr(x) dx

Joint Probability (n = 2)

Suppose x can take I values and y can take J values; then,

∑_{i=1}^{I} Pr(x_i) = 1 ;  ∑_{j=1}^{J} Pr(y_j) = 1

If x and y are independent,

Pr(x_i, y_j) = Pr(x_i ∧ y_j) = Pr(x_i) Pr(y_j)

and

∑_{i=1}^{I} ∑_{j=1}^{J} Pr(x_i, y_j) = 1

Example (independent x and y):

                 Pr(y_j):  0.5    0.3    0.2   | Row sum
  Pr(x_i):  0.6            0.30   0.18   0.12  |  0.6
            0.4            0.20   0.12   0.08  |  0.4
  Column sum:              0.5    0.3    0.2   |  1.0

Conditional Probability (n = 2)

If x and y are not independent, their probabilities are related.

Probability that x takes its ith value, conditioned on y taking its jth value (and, similarly, the converse):

Pr(x_i | y_j) = Pr(x_i, y_j) / Pr(y_j) ;  Pr(y_j | x_i) = Pr(x_i, y_j) / Pr(x_i)

Pr(x_i | y_j) = Pr(x_i) iff x and y are independent of each other
Pr(y_j | x_i) = Pr(y_j) iff x and y are independent of each other

Causality is not addressed by conditional probability
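A minimal MATLAB sketch using the joint-probability table above (independent x and y); it confirms that the conditionals reduce to the marginals in the independent case:

    Pr_x  = [0.6; 0.4];                 % marginal Pr(x_i)
    Pr_y  = [0.5 0.3 0.2];              % marginal Pr(y_j)
    Pr_xy = Pr_x * Pr_y;                % joint table: [0.30 0.18 0.12; 0.20 0.12 0.08]
    Pr_x_given_y = Pr_xy ./ Pr_y        % Pr(x_i | y_j); each column equals Pr_x (independence)
    sum(Pr_xy(:))                       % = 1, as required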


Applications of Conditional Probability (n = 2)

Joint probability can be expressed in two ways

Pr(xi , yj ) = Pr(yj | xi )Pr(xi ) = Pr(xi | yj )Pr(yj )

Unconditional probability of each variable is expressed by a sum of terms

Pr(x_i) = ∑_{j=1}^{J} Pr(x_i | y_j) Pr(y_j) ;  Pr(y_j) = ∑_{i=1}^{I} Pr(y_j | x_i) Pr(x_i)

20 Bayes’s Rule

Thomas Bayes, 1702-1761

Bayes’s Rule proceeds from the previous results th Probability of x taking the value xi conditioned on y taking its j value

Pr(x_i | y_j) = Pr(y_j | x_i) Pr(x_i) / Pr(y_j) = Pr(y_j | x_i) Pr(x_i) / ∑_{i=1}^{I} Pr(y_j | x_i) Pr(x_i)

... and the converse

Pr(y_j | x_i) = Pr(x_i | y_j) Pr(y_j) / Pr(x_i) = Pr(x_i | y_j) Pr(y_j) / ∑_{j=1}^{J} Pr(x_i | y_j) Pr(y_j)

Bayes’s or Bayes’ Rule? See http://www.dailywritingtips.com/charless-pen-and-jesus-name/ 21

Random Processes

Ensemble Statistics

• Probability metrics apply to an ensemble of events

A_i , i = 1 to I

• For I events, Pr(A_i) = lim_{N→∞} (n_{A_i} / N) ,  i = 1 to I

• As I becomes large, Ai approaches a continuous, real, random variable, x,

A_i = {x ∈ U | x = x_i} ;  i = 1 to I → ∞

• and

∑_{i=1}^{I} Pr(A_i) = ∑_{i=1}^{I} Pr(x_i) = ∑_{i=1}^{I} Pr(x_i ± Δx/2) = ∑_{i=1}^{I} pr(x_i) Δx ⟶ ∫_{−∞}^{∞} pr(x) dx = 1 as Δx → 0, I → ∞

• x has no specified dependence on time

Random Process

! Random-process variable (or time series), x(t), varies with time as well as across the ensemble of trials

pr_ensemble(x_i) = pr_ensemble[x_i(t)] ,  i = 1 to I, I → ∞

! Joint ensemble statistics (e.g., joint probability distribution) depend on time

pr_ensemble(x_1, x_2) = pr_ensemble[x(t_1), x(t_2)]

Non-Stationary Random Process

! Joint ensemble statistics (e.g., joint probability distribution) depend on specific

values of time, t_1 and t_2

pr_ensemble[x(t_1)] ≠ pr_ensemble[x(t_2)]  {unless periodic}

pr_ensemble(x_1, x_2) = pr_ensemble[x(t_1), x(t_2)]

… unless the process is periodic or a special case


Non-Stationary Process: Sinusoidal Mean

[Figure: Trials 1, 2, …, n of the process, with ensemble statistics taken across trials at fixed times]

Non-Stationary Process: Sinusoidal Variance

[Figure: Trials 1, 2, …, n of the process, with ensemble statistics taken across trials at fixed times]


Stationary Random Process

Joint ensemble statistics (e.g., joint probability distribution) depend only on the difference, Δt, in values of time

pr_ensemble[x(t_1)] = pr_ensemble[x(t_2)]

pr_ensemble[x(t_1), x(t_2)] = pr_ensemble{x(t_1), x[t_1 + (t_2 − t_1)]} = pr_ensemble[x(t), x(t + Δt)]

Stationary Process

[Figure: Trials 1, 2, …, n of the process, with ensemble statistics taken across trials at fixed times]

Ensemble statistics and time-series statistics are stationary and not the same for each trial

Ergodic Random Process

Ensemble statistics and time-series statistics are stationary, and they are the same

pr_ensemble[x(t)] = pr_time[x(t)]

pr_ensemble[x(t), x(t + Δt)] = pr_time[x(t), x(t + Δt)]

Stationary, Ergodic Process

Ensemble statistics and time-series statistics are stationary and the same for each trial

[Figure: Trials 1, 2, …, n of the process; time-series statistics along each trial and ensemble statistics across trials]

Histograms of Random Processes

[Figure: histograms for four cases: process with non-stationary mean, process with non-stationary variance, stationary non-ergodic process, and ergodic process]

Correlation and Covariance Functions


Autocorrelation Functions of Random Processes

[Figure: autocorrelation function for a process with non-stationary variance]

E[x(t_1) x(t_2)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x(t_1) x(t_2) pr_ensemble[x(t_1), x(t_2)] dx(t_1) dx(t_2) ≜ ψ[x(t_1), x(t_2)] ,  a function of two variables

Autocovariance Functions of Random Processes

Mean values subtracted from variables

E{[x(t_1) − x̄(t_1)][x(t_2) − x̄(t_2)]} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [x(t_1) − x̄(t_1)][x(t_2) − x̄(t_2)] pr_ensemble[x(t_1), x(t_2)] dx(t_1) dx(t_2)

≜ φ[x̃(t_1) x̃(t_2)] ,  where x̃ ≜ x − x̄


Autocovariance Functions of Random Processes

With t1 = t2, autocovariance function = Variance

E{[x(t_1) − x̄(t_1)][x(t_2) − x̄(t_2)]} = E{[x(t_1) − x̄(t_1)]²} = E[x̃²(t_1)] = σ_x²

x̃(t) ≜ [x(t) − x̄(t)]

Cross-Covariance Functions

[Figure: cross-covariance function for a process with non-stationary variance]

E{[x(t_1) − x̄(t_1)][y(t_2) − ȳ(t_2)]} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [x(t_1) − x̄(t_1)][y(t_2) − ȳ(t_2)] pr_ensemble[x(t_1), y(t_2)] dx(t_1) dy(t_2) = φ[x̃(t_1) ỹ(t_2)]

With t1 = t2, cross-covariance function

E{[x(t_1) − x̄(t_1)][y(t_1) − ȳ(t_1)]} = σ_xy

Covariance Functions for Stationary Processes

For stationary processes, statistics are independent of time

Autocovariance:  φ[x̃(t_1) x̃(t_2)] = φ[x̃(t) x̃(t + Δt)] ≜ φ_xx(Δt)

Cross-covariance:  φ[x̃(t_1) ỹ(t_2)] = φ[x̃(t) ỹ(t + Δt)] ≜ φ_xy(Δt)

Ordering in the product is immaterial; therefore, the autocovariance function is symmetric

φ_xx(Δt) = φ_xx(−Δt)

Cross-covariance function is not symmetric unless y = kx

Δt = lag time

Properties of Continuous-Time Covariance Functions for Ergodic Processes

Correlation depends on Δt only

Convolution of the dependent variables

φ_xx(Δt) = E[x(t) x(t + Δt)] = ∫_{−∞}^{∞} x(t) x(t + Δt) pr(x) dx = lim_{T→∞} (1/T) ∫_0^T x(t) x(t + Δt) dt

φ_yy(Δt) = E[y(t) y(t + Δt)] = lim_{T→∞} (1/T) ∫_0^T y(t) y(t + Δt) dt

Cross-covariance function

φ_xy(Δt) = E[x(t) y(t + Δt)] = lim_{T→∞} (1/T) ∫_0^T x(t) y(t + Δt) dt

Properties of Discrete-Time Covariance Functions for Ergodic Processes

k = number of (±) lags

Convolution sums of the dependent variables

Discrete case approaches continuous case as the time increment per lag → 0

φ_xx(k) = E(x_n x_{n+k}) = lim_{N→∞} (1/N) ∑_{n=1}^{N} (x_n x_{n+k})

φ_yy(k) = E(y_n y_{n+k}) = lim_{N→∞} (1/N) ∑_{n=1}^{N} (y_n y_{n+k})

Cross-covariance function

φ_xy(k) = E(x_n y_{n+k}) = lim_{N→∞} (1/N) ∑_{n=1}^{N} (x_n y_{n+k})

Additional Properties of Covariance Functions for Stationary Processes

"Self" correlation ≥ lagged correlation

φxx (0) ≥ φxx (Δt)

φyy (0) ≥ φyy (Δt)

φ_xx(0) φ_yy(0) ≥ [φ_xy(Δt)]²


Additional Properties of Correlation Functions for Stationary Processes

φ_xx(k) = E(x_n x_{n+k}) = lim_{N→∞} (1/N) ∑_{n=1}^{N} (x_n x_{n+k})

For finite record length, the number of products in the correlation function estimate decreases as the number of lags increases

Time, s:   0     0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
n:         0     1     2     3     4     5     6     7     8     9     10
x(n):      0.000 0.050 0.100 0.149 0.199 0.247 0.296 0.343 0.389 0.435 0.479

Shifted sequence (lag k)    Sum of products    # of products
  x(n)     (k = 0)          0.911              11
  x(n+1)   (k = 1)          0.785              10
  x(n+2)   (k = 2)          0.659              9

Approximate correction: multiply φ by N/(N − k)
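A sketch of the discrete-time autocovariance estimate with the N/(N − k) correction described above; the test signal and lag range are illustrative assumptions:

    N  = 1000;  dt = 0.1;
    x  = sin(0.5*(0:N-1)*dt) + 0.1*randn(1,N);   % illustrative signal: sinusoid plus noise
    xt = x - mean(x);                            % deviation from the mean
    maxLag = 50;
    phi = zeros(1,maxLag+1);
    for k = 0:maxLag
        phi(k+1) = sum(xt(1:N-k).*xt(1+k:N))/N;  % raw estimate (only N - k products exist)
        phi(k+1) = phi(k+1)*N/(N-k);             % approximate correction: multiply by N/(N - k)
    end
    plot(0:maxLag, phi), xlabel('lag, k'), ylabel('\phi_{xx}(k)')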

Autocovariance Functions for Stationary and Non-Stationary Examples

[Figure: autocovariance functions for four examples: stationary non-ergodic process, ergodic process, process with non-stationary mean, and process with non-stationary variance]


The Colors of Noise

http://en.wikipedia.org/wiki/Colors_of_noise

Dirac Delta Function

Singularity at a point in time

δ(Δt) = { ∞, Δt = 0 ;  0, Δt ≠ 0 }

∫_{−∞}^{∞} δ(t) dt = lim_{Δt_1→0} ∫_{−Δt_1}^{Δt_1} δ(Δt) d(Δt) = 1

Representation of the delta function as a Gaussian distribution with vanishing standard deviation

δ(Δt) = lim_{a→0} [1/(a√π)] e^{−(Δt/a)²}

Continuous-Time White Noise

http://simplynoise.com

Autocovariance function

φ_xx(Δt) = σ_x² δ(Δt)

Conditional probability distribution = unconditional probability distribution

pr[x(t) | x(t + Δt)] = pr[x(t)]

Kronecker Delta Function

Discrete function at integer indices

δ_ij = { 1, i = j ;  0, i ≠ j }


Discrete-Time White Noise

Autocovariance function

φ_xx(t_j) = σ_x² δ(t_j) = σ_x² δ_ij

Conditional probability distribution = unconditional probability distribution

pr[x(t_j) | x(t_{j+k})] = pr[x(t_j)]

Colored Noise*: First-Order Filter Effects

ẋ = ax − au ,  a < 0 ,  u = white noise

[Figure: filter outputs for a = −1 rad/s and a = −0.1 rad/s]

* = non-white noise    http://yusynth.net/Modular/EN/NOISE/
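A sketch of the coloring filter ẋ = ax − au driven by approximately white noise, using a simple Euler discretization; the step size, duration, and noise scaling are illustrative:

    a  = -1;                        % rad/s (repeat with a = -0.1 for the narrower bandwidth)
    dt = 0.01;  tf = 10000;         % step size and duration (illustrative)
    N  = round(tf/dt);
    u  = randn(1,N);                % zero-mean, unit-variance input samples
    x  = zeros(1,N);
    for k = 1:N-1
        x(k+1) = x(k) + dt*(a*x(k) - a*u(k));   % Euler integration of xdot = a*x - a*u
    end
    figure, histogram(u,60), title('Input histogram')
    figure, histogram(x,60), title('Output histogram')   % both are close to Gaussian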

Colored Noise: Probability Density Functions

ẋ = −x + u

[Figure: input histogram and density function; output histogram and density function]

tf = 10,000 sec

Probability density functions are virtually identical

Markov Sequences (or Chains) and Processes

! Markov Sequence (Discrete Time)
  ! Probability distribution of dynamic process at time t_{k+1} > t_k > 0, conditioned on the past history
  ! Depends only on the state, x, at time t_k

Pr[x_{k+1} | (x_k, x_{k−1}, x_{k−2}, …, 0)] = Pr[x_{k+1} | x_k]

! Markov Process (Continuous Time)
  ! Probability distribution of dynamic process at time s > t > 0, conditioned on the past history, depends only on the state, x, at time t
  ! Follow discussion of continuous-time Markov chains (infinitesimal definition, jump chain/holding time definition, transition probability definition, Kolmogorov forward equation) at

https://en.wikipedia.org/wiki/Markov_chain

Probability Distributions, Markov Chains, and Expected Values

• Multivariate probability distributions are functions of many components of x (tautology)
• If a process is Markovian, the description on the previous slide applies to the entire distribution
• Consequently, it applies to expected values of x and f(x) as well

E(x) = ∫_{−∞}^{∞} x pr(x) dx

E[f(x)] = ∫_{−∞}^{∞} f(x) pr(x) dx

Stochastic Steady State

Scalar discrete-time LTI system with zero-mean Gaussian random input

x_{i+1} = a x_i + (1 − a) u_i ,  0 < a < 1

Expected value

E(x_{i+1}) = E[a x_i + (1 − a) u_i] ,  0 < a < 1

= a E(x_i) + (1 − a) E(u_i)

Expected value at stochastic steady state

E(xi+1 ) = E(xi )

(1− a)E(xi ) = (1− a)E(ui )

E(x_i) = E(u_i) = 0

Autocovariance of a Markov Sequence Scalar LTI system with zero-mean Gaussian random input

φ_xx(1) = E(x_i x_{i+1}) = E[x_i a(x_i − u_i)] = a[E(x_i²) − E(x_i u_i)]

If input is white noise, E(xiui ) = 0

φ_xx(1) = a E(x_i²) = a σ_x²

φ_xx(k) = a^k σ_x² ,  0 < a < 1

Autocovariance of a Markov Process

Expected value at stochastic steady state

ẋ = ax − au = a(x − u) ,  a < 0

E(ẋ) = a[E(x) − E(u)] → 0 as t → ∞ ,  so E(x) → E(u) as t → ∞

Autocovariance function of a Markov process

φ_xx(Δt) = E[x(t) x(t + Δt)] = E[x(t) e^{a|Δt|} x(t)] = e^{a|Δt|} E(x²) = e^{a|Δt|} σ_x² ,  a < 0
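A sketch comparing the sample autocovariance of the Markov sequence x_{i+1} = a x_i + (1 − a) u_i with the result φ_xx(k) = a^k σ_x² derived above; the value of a and the record length are illustrative:

    a = 0.8;  N = 1e5;                     % illustrative constant and record length
    u = randn(1,N);                        % zero-mean Gaussian white-noise input
    x = zeros(1,N);
    for i = 1:N-1
        x(i+1) = a*x(i) + (1-a)*u(i);
    end
    sigma2     = var(x);
    phi_sample = zeros(1,11);
    for k = 0:10
        phi_sample(k+1) = mean(x(1:N-k).*x(1+k:N));   % sample autocovariance at lag k
    end
    phi_theory = sigma2 * a.^(0:10);                  % phi_xx(k) = a^k * sigma_x^2
    [phi_sample; phi_theory]                          % the two rows should nearly agree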

Autocovariance Functions of Markov Random Sequences and Processes

[Figure: autocovariance of a random sequence (discrete) and of a random process (continuous)]

f = −ln(a) / Δt

Fourier Transform and Its Inverse

Fourier transform of x(t)

F[x(t)] = X(ω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt ,  ω = frequency, rad/s

Inverse Fourier transform

x(t) = (1/2π) ∫_{−∞}^{∞} X(ω) e^{jωt} dω

x(t) and X ( jω ) are a Fourier transform pair


Power Spectral Density Function of a Random Process

Components of the complex amplitude in x(t):

X(ω) = X_Re(ω) + j X_Im(ω)

Complex conjugate

X *(ω) = XRe (ω) − jXIm (ω)

Power spectral density function

|X(ω)|² = X(ω) X*(ω)

= [X_Re(ω) + j X_Im(ω)][X_Re(ω) − j X_Im(ω)] = X_Re²(ω) + X_Im²(ω) ≜ Ψ_xx(ω)

Power Spectral Density and Autocorrelation Functions Form a Fourier Transform Pair

For a stationary process,

Power spectral density function

Ψ_xx(ω) = ∫_{−∞}^{∞} ψ_xx(τ) e^{−jωτ} dτ

Autocorrelation function

ψ_xx(τ) = (1/2π) ∫_{−∞}^{∞} Ψ_xx(ω) e^{jωτ} dω


Power Spectral Density and Autocovariance Functions for Variation from the Mean, x̃(t) = x(t) − x̄

Power spectral density function

Φ_xx(ω) = ∫_{−∞}^{∞} φ_xx(τ) e^{−jωτ} dτ

Autocovariance function

φ_xx(τ) = (1/2π) ∫_{−∞}^{∞} Φ_xx(ω) e^{jωτ} dω

Power Spectral Density and Autocovariance Functions for 1st-Order Filter Output, a = −1 rad/s, White-Noise Input

[Figure: autocovariance function and power spectral density function of the filter output]

Power spectral density function has magnitude but not angle
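A sketch of the transform pair for the first-order filter output: a periodogram estimate of the power spectral density (the discrete analog of the Fourier transform of the autocovariance function). The filter and simulation parameters follow the earlier filter sketch and are illustrative:

    a = -1;  dt = 0.01;  N = 2^17;              % filter constant and record length (illustrative)
    u = randn(1,N);  x = zeros(1,N);
    for k = 1:N-1
        x(k+1) = x(k) + dt*(a*x(k) - a*u(k));   % same first-order coloring filter
    end
    X   = fft(x - mean(x));                     % discrete Fourier transform of the record
    Phi = (abs(X).^2)*dt/N;                     % periodogram estimate of Phi_xx (magnitude only)
    w   = 2*pi*(0:N-1)/(N*dt);                  % frequency, rad/s
    loglog(w(2:N/2), Phi(2:N/2)), xlabel('\omega, rad/s'), ylabel('\Phi_{xx}(\omega)')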

Spectral Characteristics of Continuous-Time White Noise

φ_xx(0) = σ_x² = (1/π) ∫_0^∞ Φ_xx(ω) e^{jω(0)} dω = (1/π) ∫_0^∞ Φ_xx(ω) dω

which is (1/π) × area (A) under the power spectral density curve

For white noise,

Φ_xx(ω) = φ_xx(0) ∫_{−∞}^{∞} δ(τ) e^{−jωτ} dτ = φ_xx(0) = constant

However, A is infinite except when the constant integrand is zero

White Noise and Band-Limiting

! Heuristic arguments for use of "white noise":
  ! Real systems are band-limited
  ! Discrete-time systems are band-limited by sampling frequency

Assume that power spectral density

has a sharp cutoff at ωB

Φ_xx(ω) = { Φ, |ω| ≤ ω_B ;  0, |ω| > ω_B }

σ_x² = Φ ω_B / 2π

φ_xx(τ) = Φ sin(ω_B τ) / (2πτ)

! If x(t) is a Markov process

Φ_xx(ω) = 2 f σ_x² / (ω² + f²)

When power spectral density is plotted against frequency rather than log(frequency), rolloff does not seem as abrupt

https://en.wikipedia.org/wiki/White_noise

Next Time: Least-Squares Estimation

Supplemental Material


Relative Frequency Example: Probability of Rolling a "7" with Two Dice

• Proposition 1: 11 possible sums, one of which is 7

Pr(A_i) = n_{A_i} / N = 1/11

• Proposition 2: 21 possible pairs, not distinguishing between dice
  – 3 pairs: 1-6, 2-5, 3-4

Pr(A_i) = n_{A_i} / N = 3/21 = 1/7

• Proposition 3: 36 possible outcomes, distinguishing between the two dice – 6 pairs: 1-6, 2-5, 3-4, 6-1, 5-2, 4-3

Pr(A_i) = n_{A_i} / N = 6/36 = 1/6

Rolling Dice: Favorable Outcomes

Result: Dice Rolling, 10,000,000 Trials

MATLAB script:

    Number = 7;                       % target sum (the script was run for Number = 2 to 12)
    freq = 0;
    Trials = 1e7;
    d1 = randi(6,Trials,1);           % first die, 10,000,000 rolls
    d2 = randi(6,Trials,1);           % second die
    for i = 1:Trials
        sum = d1(i) + d2(i);          % note: "sum" shadows the built-in function here
        if sum == Number
            freq = freq + 1;
        end
    end
    Probability = freq/Trials         % relative frequency of the target sum

Results (Date and Time are 05-Feb-2017 16:37:55):

    Number = 2     Probability = 0.027762
    Number = 3     Probability = 0.055632
    Number = 4     Probability = 0.083471
    Number = 5     Probability = 0.11094
    Number = 6     Probability = 0.13878
    Number = 7     Probability = 0.16689  (or ~1/6)
    Number = 8     Probability = 0.13907
    Number = 9     Probability = 0.11113
    Number = 10    Probability = 0.083353
    Number = 11    Probability = 0.055476
    Number = 12    Probability = 0.027808


Random Number Example

• Statistical properties prior to actual event
• Excel spreadsheet: 2 random rows and one deterministic row
  – [RAND()] generates a uniform random number on each call

[Figure: spreadsheet of =RAND() cells and plots of the two random rows and the deterministic row for the 1st, 2nd, and 3rd trials]

• Output for the 4th trial: once the experiment is over, the results are determined

Simple Hypothesis Test: t Test

Is A different from B?

• t compares the mean-value difference of two data sets, normalized by their standard deviations
  – |t| is reduced by uncertainty in the data sets (σ)
  – |t| is increased by the number of points in the data sets (n)

t = (m_A − m_B) / √(σ_A²/n_A + σ_B²/n_B)

m : mean value of the data set
σ : standard deviation of the data set
n : number of points in the data set

• |t| > 3, mA ≠ mB with ≥99.7% confidence (error probability ≤ 0.003 for Gaussian distributions) [n > 25]
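A minimal sketch of this t statistic; the two data sets below are synthetic and illustrative:

    nA = 30;  nB = 30;
    A  = 1.0 + 0.5*randn(nA,1);     % synthetic data set A (mean 1.0, sigma 0.5)
    B  = 0.4 + 0.5*randn(nB,1);     % synthetic data set B (mean 0.4, sigma 0.5)
    t  = (mean(A) - mean(B)) / sqrt(var(A)/nA + var(B)/nB)
    % |t| > 3 suggests the means differ with >= 99.7% confidence (Gaussian assumption, n > 25)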


Application of t Test to DNA Microarray Data (Data from Alon et al, 1999)

t = (m_T − m_N) / √(σ_T²/36 + σ_N²/22)

• 58 RNA samples representing tumor and normal tissue
• 1,151 transcripts are over/under-expressed in the tumor/normal comparison (p ≤ 0.003)
• Genetically dissimilar samples are apparent

Mean Value of a Uniform Random Distribution

• Used in most random number generators (e.g., RAND)
• Bounded distribution
• Example is symmetric about the mean

pr(x) = { 0, x < x_min ;  1/(x_max − x_min), x_min < x < x_max ;  0, x > x_max } ,  ∫_{−∞}^{∞} pr(x) dx = 1

Mean value

x̄ = E(x) = ∫_{−∞}^{∞} x pr(x) dx = ∫_{x_min}^{x_max} x/(x_max − x_min) dx = (x_max² − x_min²)/[2(x_max − x_min)] = (x_max + x_min)/2

Variance and Standard Deviation of a Uniform Random Distribution

∫_{−∞}^{∞} pr(x) dx = 1

Variance

x_min = −x_max = −a

E[(x − x̄)²] = σ_x² = (1/2a) ∫_{−a}^{a} x² dx = [x³/(6a)] evaluated from −a to a = a²/3

Standard deviation

σ_x = √(σ_x²) = a/√3

Some Non-Gaussian Distributions

• Binomial Distribution
  – Random variable, x
  – Probability of k successes in n trials
  – Discrete probability distribution described by a probability mass function, pr(x)

pr(x) = [n!/(k!(n − k)!)] p(x)^k [1 − p(x)]^{n−k} ≜ (n choose k) p(x)^k [1 − p(x)]^{n−k}
   = probability of exactly k successes in n trials, in (0,1)
   ~ normal distribution for large n

p(x) : probability of occurrence, in (0,1)
n : number of trials

Some Non-Gaussian Distributions

• Poisson Distribution
  – Probability of a number of events occurring in a fixed period of time
  – Discrete probability distribution described by a probability mass function

pr(k) = λ^k e^{−λ} / k!

λ = average rate of occurrence of the event (per unit time)
k = # of occurrences of the event
pr(k) = probability of k occurrences (per unit time)
~ normal distribution for large λ

• Cauchy-Lorentz Distribution
  – Mean and variance are undefined
  – "Fat tails": extreme values more likely than normal distribution
  – Central limit theorem fails

pr(x) = γ / {π[γ² + (x − x_0)²]}

Pr(x) = (1/π) tan⁻¹[(x − x_0)/γ] + 1/2

Some Non-Gaussian Distributions

• Rayleigh Distribution
  – Continuous positive random variable
  – Example: magnitude of a vector whose components have normal distributions

pr(x) = (x/σ²) e^{−x²/(2σ²)} ,  x ≥ 0


Some Non-Gaussian Distributions

• Weibull Distribution
  – Positive random variable with shape, k, and scale, λ
  – Example: time to failure for a process with failure rate proportional to a power of time

pr(x) = (k/λ)(x/λ)^{k−1} e^{−(x/λ)^k} ,  x ≥ 0


Skewness and Kurtosis of Probability Distributions (3rd and 4th Central Moments)

Skewness = µ_3 / σ³ ;  Excess Kurtosis = Kurtosis − 3 = µ_4 / σ⁴ − 3

[Figure: distributions compared by excess kurtosis: Laplace, hyperbolic secant, logistic, normal, raised cosine, Wigner semicircle, uniform]

Two Trials to Show Mean, Standard Deviation, Kurtosis, and Skewness

                                        Trial 1         Trial 2
Stationary, Ergodic Mean and Variance:
  Mean                                  0.0068123       0.0050475
  Standard Deviation                    1.0006          0.99947
  Kurtosis                              3.0291          3.0485
  Skewness                             -0.0093799       0.0147

Stationary, Non-Ergodic Mean and Variance:
  Mean                                  0.99847         0.99337
  Standard Deviation                    1.3156          1.3262
  Kurtosis                              2.5933          2.6395
  Skewness                             -0.12198        -0.18208

Non-Stationary (Sinusoidal) Mean:
  Mean                                  0.17438         0.18366
  Standard Deviation                    1.2058          1.1833
  Kurtosis                              2.904           2.83
  Skewness                             -0.042752       -0.10543

Non-Stationary (Sinusoidal) Variance:
  Mean                                 -0.0043818       0.0072196
  Standard Deviation                    0.6804          0.71168
  Kurtosis                              4.6055          5.0192
  Skewness                              0.063442       -0.05492

Sine:
  Mean                                  6.6818e-06      6.6818e-06
  Standard Deviation                    0.707           0.707
  Kurtosis                              1.5006          1.5006
  Skewness                             -2.8338e-05     -2.8338e-05

Some Non-Gaussian Distributions: Bimodal Distributions

• Bimodal Distribution
  – Two peaks
  – e.g., concatenation of 2 normal distributions with different means

• Random Sine Wave

x = A sin[ωt + random phase angle]

pr(x) = { 1/(π√(A² − x²)), |x| ≤ A ;  0, |x| > A }

Histograms (Wikipedia Example)

Scaled Numbers

Area = Sum (+)

Normalized Numbers

Area = 1


Sufficient Conditions for Central Limit Theorem (Papoulis, 1990)

The probability distribution of the sum of independent, identically distributed (i.i.d.) variables approaches a normal distribution as the number of variables approaches infinity

x_i are independent, identically distributed (i.i.d.) random variables, and E(x_i³) is finite

x_i are bounded: |x_i| < A < ∞, and σ_i > a > 0

E(x_i³) < B < ∞, and ∑ σ_n² → ∞ as n → ∞

Dependence and Correlation

x and y are independent if

pr(x, y) = pr(x) pr(y) at every x and y in (−∞, ∞)
pr(x | y) = pr(x) ;  pr(y | x) = pr(y)

Dependence:  pr(x, y) ≠ pr(x) pr(y) for some x and y in (−∞, ∞)

x and y are uncorrelated if

E(xy) = E(x) E(y), i.e., ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y pr(x, y) dx dy = ∫_{−∞}^{∞} x pr(x) dx ∫_{−∞}^{∞} y pr(y) dy = x̄ ȳ

Correlation:  E(xy) ≠ E(x) E(y)


Which Combinations are Possible?

Independence and lack of correlation:
  pr(x, y) = pr(x) pr(y) at every x and y in (−∞, ∞)
  ∫∫ x y pr(x, y) dx dy = ∫ x pr(x) dx ∫ y pr(y) dy = x̄ ȳ

Independence and correlation:
  pr(x, y) = pr(x) pr(y) at every x and y in (−∞, ∞)
  ∫∫ x y pr(x, y) dx dy = ∫∫ x y pr(x) pr(y) dx dy ≠ ∫ x pr(x) dx ∫ y pr(y) dy = x̄ ȳ

Dependence and lack of correlation:
  pr(x, y) ≠ pr(x) pr(y) for some x and y in (−∞, ∞)
  ∫∫ x y pr(x, y) dx dy = ∫ x pr(x) dx ∫ y pr(y) dy = x̄ ȳ

Dependence and correlation:
  pr(x, y) ≠ pr(x) pr(y) for some x and y in (−∞, ∞)
  ∫∫ x y pr(x, y) dx dy ≠ ∫ x pr(x) dx ∫ y pr(y) dy = x̄ ȳ

Example

x_k = [ x_k , y_k ]ᵀ = [ x_{1,k} , x_{2,k} ]ᵀ

2nd-order LTI system

xk +1 = Φxk + Λwk , x0 = 0

Gaussian disturbance, wk, with independent, uncorrelated components

w = [ w_1 , w_2 ]ᵀ ;  Q = [ σ_w1²  0 ;  0  σ_w2² ]

Propagation of state mean and covariance

x̄_{k+1} = Φ x̄_k + Λ w̄

P_{k+1} = Φ P_k Φᵀ + Λ Q Λᵀ ,  P_0 = 0

Off-diagonal elements of P and Q express correlation
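A sketch of the mean and covariance propagation above for a 2nd-order discrete-time LTI system; the Φ, Λ, and Q values are hypothetical:

    Phi    = [0.9  0.1;                        % hypothetical state-transition matrix
              0    0.8];
    Lambda = eye(2);                           % hypothetical disturbance-input matrix
    Q      = diag([0.01 0.04]);                % disturbance covariance (independent components)
    xbar   = [0; 0];  P = zeros(2);            % initial mean and covariance
    for k = 1:100
        xbar = Phi*xbar;                       % mean propagation (zero-mean disturbance)
        P    = Phi*P*Phi' + Lambda*Q*Lambda';  % covariance propagation
    end
    P                                          % off-diagonal terms show state correlation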

x̄_{k+1} = Φ x̄_k + Λ w̄

Example

P_{k+1} = Φ P_k Φᵀ + Λ Q Λᵀ ,  P_0 = 0

Independence and lack of correlation in state:
  Φ = [ a 0 ; 0 b ] ;  Λ = [ c 0 ; 0 d ]

Independent dynamics and correlation in state:
  Φ = [ a 0 ; 0 a ] ;  x_0 ≠ 0 ;  w_1 = w_2 = w ;  Λ = [ c ; c ]

Dependence and correlation in state:
  Φ = [ a b ; c d ] ;  Λ = [ e 0 ; 0 f ]

Dependence and lack of correlation in nonlinear output:
  Φ = [ a b ; c d ] ;  Λ = [ e 0 ; 0 f ] ;  z = [ x_1 , x_2² ]ᵀ ;  Conjecture (t.b.d.)

Colored Noise: First-Order Filter, a = −1 rad/s

[Figure: input autocovariance function and output autocovariance function]

tf = 10,000 sec

Bandwidth Effect: Comparison of Autocovariance Functions for 1st-Order Filters

[Figure: autocovariance functions for a = −1 rad/s and a = −0.1 rad/s]

tf = 10,000 sec

Sampled Sine Wave Statistics

pr(x) = { 1/(π√(A² − x²)), |x| ≤ A ;  0, |x| > A }

[Figure: time series]

Histogram

Bi-modal distribution

x(t) = sin(0.1t) ,  Δt = 0.1 s ,  # points = 1258

[Figure: autocorrelation function (aliased by short sample length, without n/(n − k) correction)]


Sample-Length Effect on Estimate of Sine Wave Autocorrelation Function

x(t) = sin(t/16π)

Autocorrelation function estimate becomes inaccurate when the number of lags > ~10% of the number of sample points

Frequency Folding and Aliasing

τ : Sampling interval, sec

ω_D = ω_C τ ,  rad/sec

ω_C = π/τ = folding frequency

! Discrete Fourier transform is periodic in frequency, in increments of Δω_D = 2π

! Relationship of discrete Fourier transform to continuous Fourier transform

X_D(ω_C) = (1/τ) ∑_{n=−∞}^{∞} X_C(ω + 2πn/τ)

! The resulting distortion is called aliasing