
Spectral Estimation

Examples from the research of Kyoung Hoon Lee, Aaron Hastings, Don Gallant, Shashikant More, and Weonchan Sung (Herrick graduate students).

Estimation: Bias, Variance, and Mean Square Error

Let $\phi$ denote the quantity we are trying to estimate, and let $\hat\phi$ denote the result of an estimation based on one data set with N pieces of information. Each data set used for estimation → a different estimate of $\phi$.

Bias: $b(\hat\phi) = \phi - E[\hat\phi]$ — the true value minus the average of all possible estimates formed from N data points.

Variance: $\sigma^2 = E\big[(\hat\phi - E[\hat\phi])^2\big]$ — a measure of the spread of the estimates about the mean of all estimates.

Mean Square Error: $\text{m.s.e.} = E\big[(\hat\phi - \phi)^2\big] = b^2 + \sigma^2$

Estimation: Some Definitions

An estimate is consistent if, when we use more data to form the estimate, the mean square error is reduced.

If we have two ways of estimating the same thing, we say that the estimator that leads to the smaller mean square error is more efficient than the other estimator.

[Figure: scatter of estimates $\hat\phi = (\hat a, \hat b)$ about their mean in the (a, b) plane; the bias is the offset between the mean of all estimates and the true value $\phi = (a, b)$.]

Examples

Bias and variance of an estimate of the mean, $\hat\mu = \frac{1}{N}\sum_{n=1}^{N} X_n$:

$$E[\hat\mu] = E\!\left[\frac{1}{N}\sum_{n=1}^{N} X_n\right] = \frac{1}{N}\sum_{n=1}^{N} E[X_n] = \frac{1}{N}\sum_{n=1}^{N}\mu = \mu \quad \text{(unbiased)}$$

Derivation of the variance, assuming that the samples $X_n$ are independent of one another:

$$\sigma^2_{\hat\mu} = E\!\left[\big(\hat\mu - E[\hat\mu]\big)^2\right] = E\!\left[\left(\frac{1}{N}\sum_{n=1}^{N} X_n - \mu\right)^{\!2}\right] = E\!\left[\left(\frac{1}{N}\sum_{n=1}^{N} (X_n - \mu)\right)^{\!2}\right]$$

$$= \frac{1}{N^2}\, E\!\left[\sum_{n=1}^{N}\sum_{m=1}^{N} (X_m - \mu)(X_n - \mu)\right]$$

Separate into terms where $n \neq m$ and where $n = m$:

$$= \frac{1}{N^2}\left\{(N^2 - N)\,E\big[(X_n - \mu)(X_m - \mu)\big] + N\,E\big[(X_n - \mu)^2\big]\right\}$$

$$= \frac{1}{N^2}\left\{(N^2 - N)\,E\big[(X_n - \mu)\big]\,E\big[(X_m - \mu)\big] + N\,E\big[(X_n - \mu)^2\big]\right\} = \frac{1}{N^2}\,N\,E\big[(X_n - \mu)^2\big] = \frac{\sigma_x^2}{N}$$
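A minimal MATLAB sketch that checks these two results numerically; the sample size, trial count, and distribution are arbitrary example choices, not from the notes:

```matlab
% Sketch: Monte Carlo check that the sample mean is unbiased and has variance
% sigma_x^2/N. Sample size, trial count, and distribution are example choices.
N       = 100;                      % points per data set
Ntrials = 20000;                    % number of independent data sets
mu      = 3;  sigma_x = 2;          % true mean and standard deviation
X       = mu + sigma_x*randn(Ntrials, N);   % each row is one data set
mu_hat  = mean(X, 2);               % one estimate of the mean per data set
bias    = mu - mean(mu_hat)         % should be close to 0
var_hat = var(mu_hat)               % should be close to sigma_x^2/N = 0.04
```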

Examples

Biased estimate of the variance of a set of N measurements:

$$\frac{1}{N}\sum_{n=1}^{N} (X_n - \hat\mu)^2$$

Unbiased estimates of the variance of a set of N measurements:

$$\frac{1}{N-1}\sum_{n=1}^{N} (X_n - \hat\mu)^2 \qquad \text{and} \qquad \frac{1}{N}\sum_{n=1}^{N} (X_n - \mu)^2$$

For the first, estimate the mean and use that estimate in the calculation (one degree of freedom has been lost). The second is the special case where the mean is known and does not need to be estimated from the data.
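A minimal MATLAB sketch of the biased versus unbiased estimators using var's normalization flag; the sample size, trial count, and true variance are example assumptions:

```matlab
% Sketch: biased (1/N) vs unbiased (1/(N-1)) variance estimates using var's
% normalisation flag: var(X,1,dim) divides by N, var(X,0,dim) by N-1.
% Sample size, trial count, and the true variance are example choices.
N = 10;  Ntrials = 50000;  sigma_x = 2;
X = sigma_x*randn(Ntrials, N);          % zero-mean data, true variance 4
v_biased   = var(X, 1, 2);              % (1/N)     * sum (Xn - mu_hat)^2
v_unbiased = var(X, 0, 2);              % (1/(N-1)) * sum (Xn - mu_hat)^2
mean(v_biased)                          % tends to (N-1)/N * sigma_x^2 = 3.6
mean(v_unbiased)                        % tends to sigma_x^2 = 4
```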

Estimation of Autocovariance Functions

Two methods of estimating $R_{xx}(\tau)$ from T seconds of data:
1. Dividing by the integration time, $T - |\tau|$: the estimate is unbiased but has very high variance, particularly when $\tau$ is close to T.
2. Dividing by the total time, T: the estimate is biased (asymptotically unbiased). This is equivalent to multiplying the first estimate by a triangular window $(T - |\tau|)/T$, which attenuates the high-variance estimates.

[Figure: a T-second record of x(t), showing the product of x(t) and x(t+τ) for a lag τ.]

Calculating the average value of x(t)·x(t+τ) from T seconds of data.
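For sampled data, the two normalizations correspond to the 'unbiased' and 'biased' options of MATLAB's xcorr; a minimal sketch, in which the test signal and record length are example assumptions:

```matlab
% Sketch: the two normalisations for a sampled, zero-mean record of length N.
% xcorr 'unbiased' divides by N-|lag| (Method 1); 'biased' divides by N
% (Method 2). The test signal and its length are example choices.
N = 2048;
x = filter(1, [1 -0.9], randn(1, N));    % correlated test signal
x = x - mean(x);
[R1, lags] = xcorr(x, 'unbiased');       % unbiased, very noisy at large |lag|
[R2, ~   ] = xcorr(x, 'biased');         % = R1 .* (N - abs(lags))/N
plot(lags, R1, lags, R2); xlabel('lag (samples)'); legend('unbiased', 'biased');
```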


Estimation of Cross-Covariance Functions

Same issues as for auto-covariance: the bigger τ is, the less averaging is available for a finite T.

[Figure: T-second records of x(t) and y(t), showing the lag τ between x(t) and y(t+τ).]

x(t) and y(t) are zero-mean, weakly stationary random processes. Calculating the average value of x(t)·y(t+τ). Additional problem: T must be made large enough to accommodate system delays.

Estimation of Covariance Functions

With fast computation of spectra, covariance functions are now more usually estimated by inverse Fourier transforming the auto and cross spectral density estimates.

The inverse transform of a raw PSD or CSD estimate is equivalent to Method 2 for calculating covariance functions, with a triangular window for data of segment length Tr.
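A minimal MATLAB sketch of this route for a sampled record (the test signal, its length, and the zero-padding choice are assumptions): the inverse FFT of a raw, divide-by-N PSD estimate reproduces the biased (Method 2) autocovariance estimate.

```matlab
% Sketch: autocovariance via the inverse FFT of a raw PSD estimate. Zero-padding
% to 2N avoids circular wrap-around; the result matches xcorr(x,'biased'),
% i.e. Method 2. The test signal and its length are example choices.
N = 2048;
x = filter(1, [1 -0.9], randn(1, N));
x = x - mean(x);
X    = fft(x, 2*N);                   % zero-padded DFT
Sraw = (conj(X).*X) / N;              % raw estimate with divide-by-N normalisation
r    = real(ifft(Sraw));              % inverse transform
r    = r(1:N);                        % lags 0 .. N-1
[Rb, lags] = xcorr(x, 'biased');      % direct Method 2 estimate
max(abs(r - Rb(lags >= 0)))           % should be numerically negligible
```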

Power Spectral Density Estimation

Definition:
$$S_{xx}(f) = \lim_{T\to\infty} E\!\left[\frac{X_T^* X_T}{T}\right] = \int_{-\infty}^{+\infty} R_{xx}(\tau)\, e^{-j2\pi f\tau}\, d\tau.$$
Estimation:
1. Could Fourier transform the autocovariance function estimate (not computationally efficient).
2. Could use the frequency domain definition directly:
$$\text{Raw estimate} = \hat S_{xx}(f) = \frac{X_T^* X_T}{T}$$
No averaging! Extremely poor variance characteristics.

The variance is $S_{xx}^2(f)$ and is unaffected by T, the length of data used.

Power Spectral Density Estimation (Continued)

Smoothed estimate from segment averaging.

[Figure: x(t) divided into windowed segments w(t)·x(t), each Ts seconds long.]

1. Break the record up into Nseg segments, each Tr seconds long.
2. For each segment:
   1. Apply a window to smooth the transitions at the ends of the segment.

   2. Fourier transform the windowed segment → $X_{T_s}(f)$.
   3. Calculate a raw power spectral density estimate, $|X_{T_s}(f)|^2 / T_s$.
3. Average the results from each segment to get the smoothed estimate, and apply a power compensation for the window used:
$$\tilde S_{xx}(f) = \frac{1}{N_{SEG}\, w_{comp}} \sum_{i=1}^{N_{SEG}} \hat S_{xx_i}(f), \qquad w_{comp} = \frac{1}{T}\int w^2(t)\, dt$$
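A minimal MATLAB sketch of this segment-averaging procedure using pwelch, which applies the window, averages the segments, and performs the window power compensation internally; the signal, sample rate, and segment length are example assumptions:

```matlab
% Sketch: smoothed PSD by segment averaging with pwelch (Welch's method).
% The signal, fs, and segment length are example assumptions.
fs  = 1024;
t   = (0:fs*60 - 1)/fs;                          % 60 s of data
x   = sin(2*pi*100*t) + randn(size(t));          % tone buried in broadband noise
Ns  = 1024;                                      % samples per segment (Ts = Ns/fs)
w   = hann(Ns);                                  % window applied to each segment
[Sxx, f] = pwelch(x, w, Ns/2, Ns, fs);           % 50% overlap, one-sided PSD
plot(f, 10*log10(Sxx)); xlabel('Frequency (Hz)'); ylabel('PSD (dB/Hz)');
```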

Power Spectral Density Estimation (Continued)

Smoothed estimate from segment averaging.

[Figure: overlapping windowed segments w(t)·x(t), each Ts seconds long.]

Overlap: for some windows, segment overlap makes sense. With a Hann window and 50% overlap, data that is de-emphasized in one windowed segment is strongly emphasized in the next window (and vice versa).

Bias: note that PSD estimate bias is controlled by the size of the window (Ts), which controls the frequency resolution (1/Ts). Larger window, smoother transitions → less power leakage → less bias.

Power Spectral Density (PSD) Estimation (Continued)

We argued that the distribution of the smoothed PSD is related to that of a chi-squared variable ($\chi^2_\nu$) with $\nu = 2\,N_{SEG}$ degrees of freedom, if Tr is large enough that bias errors can be ignored.

Therefore:
$$\mathrm{Var}\!\left[\frac{2\,N_{seg}\,\tilde S_{xx}}{S_{xx}}\right] = \frac{4\,N_{seg}^2}{S_{xx}^2}\,\mathrm{Var}\big[\tilde S_{xx}\big] = 2\,(2\,N_{seg})$$

and rearranging, we showed that:
$$\mathrm{Var}\big[\tilde S_{xx}\big] = \frac{S_{xx}^2}{N_{seg}}$$

Therefore, we can control variance by averaging more segments.

Note: shorter segments mean larger bias, so for a fixed T seconds of data, there is a trade-off between Segment Length (Tr), which controls the bias, and Number of Segments (NSEG), which controls the variance: T=Tr.NSEG.
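A quick numerical check of the $1/N_{seg}$ behaviour, sketched with white noise (whose true PSD is flat); fs, the segment length, and the number of segments are made-up example values:

```matlab
% Sketch: checking Var[S_xx] ~ S_xx^2/Nseg using white noise, whose true PSD
% is flat. Segment length, number of segments, and fs are example values.
fs = 1024;  Ns = 512;  Nseg = 64;
x  = randn(1, Nseg*Ns);
[Sxx, f] = pwelch(x, rectwin(Ns), 0, Ns, fs);      % Nseg non-overlapping segments
ratio = std(Sxx(2:end-1)) / mean(Sxx(2:end-1))     % spread across frequency
% For a flat spectrum this ratio should be roughly 1/sqrt(Nseg) = 0.125.
```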

Cross Spectral Density (CSD)

Definition:
$$S_{xy}(f) = \lim_{T\to\infty} E\!\left[\frac{X_T^* Y_T}{T}\right] = \int_{-\infty}^{+\infty} R_{xy}(\tau)\, e^{-j2\pi f\tau}\, d\tau.$$
Estimation: could Fourier transform the cross-correlation function estimate (not computationally efficient), or use the definition directly:
$$\text{Raw estimate} = \hat S_{xy}(f) = \frac{X_T^* Y_T}{T}$$
As with the PSD, this has extremely poor variance characteristics, so:
– divide the time histories into segments,
– generate a raw estimate from each segment, and
– average to reduce variance and produce a smoothed estimate.

Cross Spectral Density Estimation: Segment Averaging

[Figure: x(t) and y(t) divided into windowed segments w(t)·x(t) and w(t)·y(t), each Ts seconds long.]

→ Fourier transform the windowed segments to get $X_{T_s}(f)$ and $Y_{T_s}(f)$.
$$\text{Raw estimate from the } i\text{th segment:}\quad \hat S_{xy_i}(f) = \frac{X_{T_s}^*(f)\, Y_{T_s}(f)}{T_s}$$
$$\text{Smoothed estimate:}\quad \tilde S_{xy}(f) = \frac{1}{N_{seg}} \sum_{i=1}^{N_{seg}} \hat S_{xy_i}(f)$$
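A minimal MATLAB sketch using cpsd for the segment-averaged CSD; the toy input/output pair and all parameter values are assumptions, and note the later calibration slide's warning about MATLAB's conjugation convention:

```matlab
% Sketch: segment-averaged cross spectral density with cpsd. The toy
% input/output pair and all parameter values are assumptions.
fs  = 1024;
x   = randn(1, fs*60);                                    % input: white noise
y   = filter([0 0 0 0 0.5], 1, x) + 0.1*randn(size(x));   % delayed, scaled + noise
Ns  = 1024;  w = hann(Ns);
[Sxy, f] = cpsd(x, y, w, Ns/2, Ns, fs);                   % smoothed CSD estimate
% Note: check MATLAB's conjugation convention against the X*Y definition used
% here (see the calibration slide on cpsd later in these notes).
plot(f, abs(Sxy)); xlabel('Frequency (Hz)'); ylabel('|S_{xy}|');
```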

Issues with Cross Spectral Density Estimates

1. Reduce bias by choosing the segment length (Tr) as large as possible. (Bias is greatest where the cross spectral density changes rapidly.)
2. Reduce variance by averaging many segments.
3. A large amount of averaging might be required to reduce noise effects:
$$y_m(t) = y(t) + n(t) = h(t) * x(t) + n(t)$$
x(t), n(t): zero-mean, weakly stationary,

uncorrelated random processes.
$$\mathrm{SNR}_{y_m} = \frac{S_{yy}}{S_{n_y n_y}} = \frac{|H(f)|^2 S_{xx}}{S_{n_y n_y}}, \qquad \tilde S_{xy} \approx H(f)\,\tilde S_{xx} + \tilde S_{xn} \to H(f)\,\tilde S_{xx}$$
$$\mathrm{Var}\{\tilde S_{xy}\} \ \propto\ \frac{1}{N_{seg}}\left[1 + \frac{1}{\gamma^2_{xy}}\right]$$

4. Time delays between x and y cause problems if the time delay (t0) is greater than a small fraction of the segment length (Tr). One can estimate t0 and offset the y segments, but this needs T + t0 seconds of data.

Cross Spectral Density Estimation: Segment Averaging with System Delays

[Figure: windowed segments of x(t), and windowed segments of y(t) offset by the estimated delay t0, each Ts seconds long.]

→ Fourier transform the windowed segments to get $X_{T_s}(f)$ and $Y_{T_s}(f)$.

Offsetting the y segments essentially removes most of the delay from the estimated function.

The delay effects can be put back in by multiplying the estimate of H(f) by $e^{-j2\pi f \hat{t}_0}$.

Coherence Function Estimation: Substitute in Smoothed Estimates of Spectral Densities

Coherence takes values in the range 0 to 1.
$$\text{Definition:}\quad \gamma^2_{xy} = \frac{|S_{xy}|^2}{S_{xx} S_{yy}}; \qquad \text{Estimate:}\quad \tilde\gamma^2_{xy} = \frac{|\tilde S_{xy}|^2}{\tilde S_{xx}\,\tilde S_{yy}}$$
– Substituting raw spectral density estimates into the formula results in a coherence of 1 at all frequencies. A coherence of 1 at all frequencies from measured data should be treated with a high degree of suspicion.
– The estimate is highly sensitive to bias in the spectral density estimates, which is particularly bad where the phase of the cross spectral density changes rapidly (at maxima and minima in |Sxy|).
– Coherence → 0 because of: noise on input and output, nonlinearity, and bias errors in estimation.
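A minimal MATLAB sketch using mscohere, which forms the coherence from segment-averaged spectra; the toy system and all parameter values are assumptions:

```matlab
% Sketch: smoothed coherence estimate with mscohere. The toy system and all
% parameter values are assumptions.
fs  = 1024;
x   = randn(1, fs*60);
y   = filter([0 0 0 0 0.5], 1, x) + 0.2*randn(size(x));   % toy system + output noise
Ns  = 1024;  w = hann(Ns);
[Cxy, f] = mscohere(x, y, w, Ns/2, Ns, fs);   % gamma^2 from segment-averaged spectra
% With a single full-length segment (no averaging) the estimate is identically 1
% at every frequency -- the "treat with suspicion" case described above.
plot(f, Cxy); ylim([0 1]); xlabel('Frequency (Hz)'); ylabel('\gamma^2_{xy}');
```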

Example: System with Some Nonlinearities (cubic stiffness) and Noisy Measurements

[Figure: PSD and coherence for the nonlinear system. The nonlinearity causes a spread of energy around 3× and 5× the excitation frequency and causes broad dips in the coherence function; driving the system harder makes these regions wider. Poor output SNR and bias errors also cause dips in the coherence.]

Example: System with Noisy Output Measurements

[Figure: coherence estimates for three cases: high SNR with Tr = 512/fs; high SNR with Tr = 2048/fs, where the remaining dips are due to bias errors; and low SNR on the output with Tr = 512/fs, where the dip is filled in with noise.]

Less averaging compared with the N = 512 case: fewer segments → greater variance, but the bias effects are less.

Bias greatest where phase change is fastest

Dips mainly due to bias… and thus they get smaller as the resolution increases. Poor SNRy also affects the coherence here.

H1 and H2 Estimates of H: Effects of Noise

If the system is linear and there is no noise (ignoring all other estimation errors):
$$H(f) = \frac{S_{xy}(f)}{S_{xx}(f)} \;\; (\text{H1 approach}) \; = \; \frac{S_{yy}(f)}{S_{yx}(f)} \;\; (\text{H2 approach})$$

Cases with Noise: Assume that estimation errors are small (Tr and Nseg both large).

$$H_1 \text{ estimate} = \frac{S_{x_m y_m}}{S_{x_m x_m}} = \frac{S_{xy}(f)/S_{xx}(f)}{1 + S_{n_x n_x}/S_{xx}} = \frac{H(f)}{1 + S_{n_x n_x}/S_{xx}}$$
Noise on the input adversely affects this estimate of H. Theory: $|H_{1\,\text{estimate}}| < |H|$.

$$H_2 \text{ estimate} = \frac{S_{y_m y_m}}{S_{x_m y_m}^*} = \frac{S_{yy}(f)}{S_{xy}^*(f)}\left[1 + \frac{S_{n_y n_y}}{S_{yy}}\right] = H(f)\left[1 + \frac{S_{n_y n_y}}{S_{yy}}\right]$$
Noise on the output adversely affects this estimate of H. Theory: $|H_{2\,\text{estimate}}| > |H|$.

Note that with bias errors due to windowing (Tr not as large as you would like), these inequalities may not hold, but $|H_{1\,\text{estimate}}| < |H_{2\,\text{estimate}}|$.
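A minimal MATLAB sketch of the H1 and H2 estimates built from segment-averaged spectra computed directly with FFTs, so that the conjugation convention matches the definitions above; the toy system and all parameter values are assumptions:

```matlab
% Sketch: H1 and H2 frequency response estimates from segment-averaged spectra,
% computed directly from windowed FFTs so the conjugation matches the
% definitions above. The toy system and all values are assumptions; the
% normalisation constants cancel in H1 and H2 and are omitted.
fs = 1024;  Ns = 1024;  w = hann(Ns)';
x  = randn(1, fs*60);                                     % measured input
y  = filter([0 0 0 0 0.5], 1, x) + 0.1*randn(size(x));    % measured, noisy output
Nseg = floor(length(x)/Ns);
Sxx = zeros(1, Ns);  Syy = zeros(1, Ns);  Sxy = zeros(1, Ns);
for i = 1:Nseg
    idx = (i-1)*Ns + (1:Ns);
    X = fft(w .* x(idx));  Y = fft(w .* y(idx));
    Sxx = Sxx + conj(X).*X;        % accumulate raw auto- and cross-spectra
    Syy = Syy + conj(Y).*Y;
    Sxy = Sxy + conj(X).*Y;        % X* Y, as in the CSD definition
end
H1 = Sxy ./ Sxx;                   % pulled low by noise on the input
H2 = Syy ./ conj(Sxy);             % pushed high by noise on the output
f  = (0:Ns-1)*fs/Ns;               % two-sided frequency axis, 0 .. fs-df
```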

Estimation of H

Note that, e.g., $E[\hat H] = E\!\left[\dfrac{\tilde S_{xy}}{\tilde S_{xx}}\right] \neq \dfrac{E[\tilde S_{xy}]}{E[\tilde S_{xx}]}$

Frequency response function estimates are extremely sensitive to bias errors, which are worse at peaks and troughs. Large segment sizes are required to overcome bias, but this means fewer segments to average, and thus higher variance.

Note: a low coherence function does not necessarily imply a poor frequency response function estimate. If the coherence function is low because of noise on the response (input), then the H1 (H2) frequency response estimate should be accurate, provided sufficient averaging was done to reduce the variance of the estimates.

Calibration of PSD and CSD in MATLAB

psd – old program; pwelch – new program; cpsd – gives the complex conjugate of what you want. The mean square value of the time signal (its variance) should give the same result as integrating the PSD (Parseval's theorem).

Check whether you are getting a two-sided or a one-sided PSD.
One-sided: negative and positive frequency contributions are added (not for the components at f = 0 and fs/2, though, which should be zero anyway) – this is what MATLAB does.
Two-sided: when you integrate the PSD from 0 to fs/2 you will get about half of what you expect (no addition of positive and negative frequency contributions has occurred).
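A minimal MATLAB sketch of this Parseval check on a one-sided pwelch estimate; the signal and parameter values are assumptions:

```matlab
% Sketch: Parseval check of a one-sided pwelch PSD. Example values only.
fs = 1024;  x = randn(1, fs*60);
Ns = 1024;
[Sxx, f] = pwelch(x, hann(Ns), Ns/2, Ns, fs);     % one-sided PSD
df = f(2) - f(1);
[sum(Sxx)*df, var(x)]                             % the two values should agree
% A two-sided PSD integrated only over 0 to fs/2 would give roughly half var(x).
```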

MATLAB also doubles the CPSD from 0 to fs/2, which does not really make sense, but it is convenient when you estimate the frequency response function, because the doubling cancels.

Calibration (Continued)

Power Spectral Density Estimates Using DFTs:

Recall that for –fs/2 < f < fs/2,

$$X_T(f)\Big|_{f = k\frac{f_s}{N_s}} \approx \Delta \cdot \mathrm{DFT}\big\{ w(n\Delta)\, x(n\Delta),\; n = 0, 1, \dots, N_s - 1 \big\} = \Delta \cdot X_k$$

$$\hat S_{xx}(f_k) = \frac{X_T^*(f_k)\, X_T(f_k)}{T_s\, w_{comp}} \approx \Delta^2 \frac{X_k^* X_k}{N_s \Delta\, w_{comp}} = \frac{\Delta\, |X_k|^2}{N_s\, w_{comp}}$$

Ns = number of points in a segment.
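A minimal MATLAB sketch of this calibration for one segment, compared with pwelch on the same segment; the agreement assumes MATLAB's default PSD scaling, and all signal and parameter values are made-up examples:

```matlab
% Sketch: raw PSD calibration from a DFT, Sxx(fk) = Delta*|Xk|^2/(Ns*wcomp),
% checked against pwelch on a single segment. Example values only.
fs = 1024;  dt = 1/fs;  Ns = 1024;
x  = randn(1, Ns);                        % one segment, Ts = Ns*dt seconds
w  = hann(Ns)';
wcomp = mean(w.^2);                       % (1/T) * integral of w(t)^2 dt
Xk   = fft(w .* x);
Sraw = dt * abs(Xk).^2 / (Ns * wcomp);    % two-sided raw PSD estimate
[Spw, f] = pwelch(x, w, 0, Ns, fs, 'twosided');
max(abs(Sraw(:) - Spw))                   % should be numerically negligible
```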

Calibration Continued: Energy Spectral Density

We sometimes have segments that each contain a single transient (e.g., tap testing of structures), and we average the raw spectra from each segment to remove noise effects. [Be careful when applying this random process theory to different types of signals; each segment used in the estimation should contain similar information.] If we choose a different Tr, i.e., allow a shorter or longer time between successive transients (the transient should have died away within the segment), the PSD will change because of the division by Tr in the formula.

[Figure: a record containing repeated transients, divided into segments of length Ts.]

To overcome this problem we estimate an Energy Spectral Density (ESD) instead, i.e., remove the division by Tr in the raw PSD estimate:
$$\text{Raw ESD estimate} = |X_{T_s}(f)|^2 \approx \Delta^2\, |X_k|^2 \quad \text{(per Hz)}$$
[You also need to be careful with the window choice here so as not to distort the transient.]

Calibration Continued: Power Spectrum

Segment averaging is often applied to signals that have both periodic and random components. In a power spectrum (which works well for periodic signals), as the resolution increases (the frequency spacing gets smaller) the noise floor decreases. Total power = the sum of the power at each spectral component.

Recall: Ck = Xk/N, if you synchronize, don’t alias and there is no noise.

Power Spectral Density (PSD) (ideal for random signals: the level is unaffected by changes in frequency resolution – window size). Total power = the integral of the PSD = the sum of the PSD values × the frequency resolution.
$$\text{Power estimate} = \frac{|X_k|^2}{N^2} = \text{Raw PSD estimate} \times \text{frequency resolution} = \left(\frac{\Delta\, |X_k|^2}{N}\right)\cdot\left(\frac{f_s}{N}\right)$$
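A minimal MATLAB sketch contrasting the two scalings via pwelch's 'psd' (default) and 'power' options; the tone, noise level, and segment length are example assumptions:

```matlab
% Sketch: power spectrum vs PSD scaling in pwelch. Example values only.
fs = 1024;  t = (0:fs*60-1)/fs;
x  = sin(2*pi*100*t) + 0.5*randn(size(t));         % bin-centred tone plus noise
Ns = 2048;  w = hann(Ns);
[Ppsd, f] = pwelch(x, w, Ns/2, Ns, fs);            % PSD (V^2/Hz): noise floor level
                                                   % independent of Ns
[Ppow, ~] = pwelch(x, w, Ns/2, Ns, fs, 'power');   % power spectrum (V^2): tone power
                                                   % approx 0.5 regardless of Ns,
                                                   % noise floor drops as Ns grows
```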