
Appendix 1 Linear Systems Analysis

LINEAR SYSTEMS AND FILTERS

The output of a linear system is given by the following relation:

y(t) = ∫_{-∞}^{∞} h(t, τ) x(τ) dτ   (A1-1)

where x(t) is called the input, h(t, τ) the system function, and y(t) the output. If the system is time invariant, the output signal of the linear system is given by

y(t) = ∫_{-∞}^{∞} h(t − τ) x(τ) dτ   (A1-2)

h(t) is called the impulse response, because if the input is an impulse, then the output is

y(t) = ∫_{-∞}^{∞} h(t − τ) δ(τ) dτ = h(t)   (A1-3)

For a time invariant linear system, when the input is x(t − t₀), the output is y(t − t₀), where t₀ is the delay. The system is called causal if

h(t) = 0,  t < 0   (A1-4)

This means there is no response prior to the input. The system is called stable if a bounded input produces a bounded output. This requires that the impulse response satisfies

∫_{-∞}^{∞} |h(t)| dt < ∞   (A1-5)

The time invariant linear system satisfies the linearity condition: If

x_i(t) → y_i(t),  i = 1, 2

then

a₁x₁(t) + a₂x₂(t) → a₁y₁(t) + a₂y₂(t)   (A1-6)

where a₁ and a₂ are constants. Note that the output of a causal time invariant system is given by

y(t) = ∫_{-∞}^{t} h(t − τ) x(τ) dτ ≜ h(t) ⊗ x(t)   (A1-7)

the convolution of x(t) with h(t). Taking the Fourier transform of both sides of (A1-7), when the transforms exist, we get

Y(f) = H(f)X(f) (Al-8) where

X(f) = ℱ(x(t)) = ∫_{-∞}^{∞} x(t) e^{−i2πft} dt

H(f) = ℱ(h(t)) = ∫_{-∞}^{∞} h(t) e^{−i2πft} dt

Y(f) = ℱ(y(t)) = ∫_{-∞}^{∞} y(t) e^{−i2πft} dt

Y(f), H(f), and X(f) are called the output, system, and input transfer functions. In other words, the system transfer function, the Fourier transform of the impulse response, is given by the relation

H(f) = Y(f)/X(f) = (Fourier transform of output signal)/(Fourier transform of input signal)   (A1-9)

The time invariant linear system can also be given by the constant coefficient linear differential equation with a forcing term. The system is shown in Fig. (Al-l).

Fig. A1-1. Linear time invariant system: input x(t), impulse response h(t), output y(t).
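The equivalence of time-domain convolution (A1-7) and frequency-domain multiplication (A1-8) can be checked numerically for a discretized system. The sketch below is a minimal illustration, assuming Python with NumPy; the exponential impulse response and pulse input are arbitrary choices for the example, not taken from the text.

```python
import numpy as np

# Discretize time: the convolution integral (A1-7) becomes a sum times dt.
dt = 0.01
t = np.arange(0.0, 10.0, dt)
h = np.exp(-t)                      # causal impulse response h(t) = e^{-t}, t >= 0
x = np.where(t < 1.0, 1.0, 0.0)     # input: a unit pulse of width 1

# Time-domain output y(t) = (h (x) x)(t), Eq. (A1-7)
y_time = np.convolve(h, x)[: len(t)] * dt

# Frequency-domain output Y(f) = H(f) X(f), Eq. (A1-8), with zero padding
H = np.fft.fft(h, 2 * len(t))
X = np.fft.fft(x, 2 * len(t))
y_freq = np.real(np.fft.ifft(H * X))[: len(t)] * dt

print(np.max(np.abs(y_time - y_freq)))   # agreement up to numerical error
```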

HILBERT TRANSFORM

The Hilbert transform filter has the transfer function

H(f) = { −j,  f > 0
       {  j,  f < 0      (A1-10),  where j = √(−1)

Taking the inverse Fourier transform of (A1-10), the filter impulse response is

h(t) = 1/(πt)   (A1-11)

The output, known as the Hilbert transform, is given by

y(t) = (1/(πt)) ⊗ x(t) = (1/π) ∫_{-∞}^{∞} x(τ)/(t − τ) dτ   (A1-12)

If x(t) = cos t, then y(t) = sin t. The Hilbert transform shifts the phase by 90° and is therefore called a quadrature filter. A signal z(t) is called an analytic signal if

z(t) = x(t) + i x̂(t),  i = √(−1)   (A1-13)

where x̂(t) is the Hilbert transform of x(t). If x(t) = cos t, then z(t) = exp[it].
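A quick numerical check of the quadrature property and the analytic signal (A1-13) can be made with SciPy, whose `scipy.signal.hilbert` routine returns the analytic signal directly. This is a minimal sketch assuming Python with NumPy/SciPy; the 10 Hz tone is an arbitrary test signal.

```python
import numpy as np
from scipy.signal import hilbert

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
x = np.cos(2 * np.pi * 10 * t)        # test input x(t) = cos(2*pi*10*t)

z = hilbert(x)                        # analytic signal z(t) = x(t) + i x_hat(t), Eq. (A1-13)
x_hat = np.imag(z)                    # Hilbert transform of x

# For a cosine the Hilbert transform is the corresponding sine (90-degree shift)
print(np.allclose(x_hat, np.sin(2 * np.pi * 10 * t), atol=1e-2))
print(np.allclose(np.abs(z), 1.0, atol=1e-2))   # envelope of a unit-amplitude tone
```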

SAMPLING

Signals of discrete type are called digital signals. Analog signals can be sampled; the sampled values of the signals form digital signals. It is possible to reconstruct the analog signal from its sampled values provided the signal is band limited and it is sampled at the Nyquist rate. A signal is called bandlimited to f_x if its one-sided spectrum is zero outside f_x. The Nyquist sampling interval is defined as T_s = 1/(2f_x), the Nyquist sampling rate is f_s = 2f_x, and the Nyquist instants are

t_n = nT_s,  n = 0, ±1, ±2, …   (A1-14)

For example, if x(t) = A cos 12πt, then f_x = 6 Hz, the Nyquist sampling interval is 1/12 sec, and the Nyquist rate is 12 Hz.

IDEAL INSTANTANEOUS SAMPLING THEOREM

A band limited signal of bandwidth f_x can be exactly reconstructed from its sampled values uniformly spaced in time with period T_s < 1/(2f_x) by passing them through an ideal low pass filter with bandwidth B, where f_x < B < f_s − f_x. The reconstruction is given by

x̂(t) = 2BT_s Σ_{n=-∞}^{∞} x(nT_s) · sin[2πB(t − nT_s)]/[2πB(t − nT_s)]   (A1-15)

This formula is also known as the interpolation formula or the cardinal series.
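The cardinal-series reconstruction can be tried numerically. The sketch below is a minimal illustration assuming Python with NumPy, using B = f_s/2 so that the series reduces to the familiar sinc interpolation; the 6 Hz tone and 20 Hz sampling rate are arbitrary choices, and the finite sum is only an approximation to the infinite series (A1-15).

```python
import numpy as np

fx = 6.0                      # signal bandwidth (Hz)
fs = 20.0                     # sampling rate > 2*fx
Ts = 1.0 / fs
n = np.arange(0, 40)          # sample indices
samples = np.cos(2 * np.pi * fx * n * Ts)

def reconstruct(t, samples, Ts):
    # truncated cardinal series with B = fs/2: x(t) ~ sum_n x(nTs) sinc((t - nTs)/Ts)
    k = np.arange(len(samples))
    return np.sum(samples * np.sinc((t - k * Ts) / Ts))

t_test = 0.7137               # an arbitrary time well inside the sampled interval
print(reconstruct(t_test, samples, Ts), np.cos(2 * np.pi * fx * t_test))
```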

z-TRANSFORM

Let {x_n} be samples of a continuous signal. We define the z-transform of x_n as

X(z) = Σ_{n=-∞}^{∞} x_n z^{−n}   (A1-16)

where z is a complex variable. The set of values of z for which the series converges to a finite value is called the domain or region of convergence. The z-transform X(z) is analytic in the region R₁ < |z| < R₂ whenever the right side of Eq. (A1-16) has a finite limit. If x_n = 0 when n < 0, then

X(z) = Σ_{n=0}^{∞} x_n z^{−n}   (A1-17)

The z-transform of x_n is then a series in negative powers of z only. The z-transform exists if

Σ_{n=0}^{∞} |x_n z^{−n}| < ∞   (A1-18)

If z = R exp[i2πf], then X(z) exists if

Σ_{n=0}^{∞} |x_n| R^{−n} < ∞

The series converges absolutely if

|z| > R₁   (A1-19)

The domain of convergence is the region outside the circle of radius R₁. If

x_n = 0 when n > 0, then

X(z) = Σ_{n=-∞}^{0} x_n z^{−n} = Σ_{m=0}^{∞} x_{−m} z^{m}   (A1-20)

X(z) is then a power series in positive powers of z. The series converges absolutely if |z| < R₂. The domain of convergence is the interior of the circle with center at the origin and radius R₂. The transform of x_n in Eq. (A1-16) can be written as

X(z) = Σ_{n=-∞}^{0} x_n z^{−n} + Σ_{n=0}^{∞} x_n z^{−n} − x₀   (A1-21)

For the existence of X(z), the domain of convergence is R₁ < |z| < R₂. When X(z) is a rational function of z, X(z) = P(z)/Q(z), where P(z) and Q(z) are polynomials in z. The roots of P(z) are called the zeros of X(z) and the roots of Q(z) are called the poles of X(z). When X(z) is a rational function of z, the domain of convergence is bounded by the minimum and maximum moduli of the poles.

Properties of the z-transform

1. Linearity. If

Σ_{n=-∞}^{∞} x_n z^{−n} = X(z),  R₁ < |z| < R₂

and

Σ_{n=-∞}^{∞} y_n z^{−n} = Y(z),  R₃ < |z| < R₄

then

Σ_{n=-∞}^{∞} (a x_n + b y_n) z^{−n} = aX(z) + bY(z),  R₅ < |z| < R₆   (A1-22)

where

R₅ = max[R₁, R₃]

R₆ = min[R₂, R₄]

2. Delay. If

Σ_{n=-∞}^{∞} x_n z^{−n} = X(z), then

Σ_{n=-∞}^{∞} x_{n−m} z^{−n} = z^{−m} X(z)   (A1-23)

3. Conjugation. If

Σ_{n=-∞}^{∞} x_n z^{−n} = X(z),

then

Σ_{n=-∞}^{∞} x_n^* z^{−n} = X^*(z^*)   (A1-24)

Here x_n^* stands for the complex conjugate of x_n, and x_n is a complex sequence.

4. Convolution. If

Σ_{n=-∞}^{∞} x_n z^{−n} = X(z)

Σ_{n=-∞}^{∞} h_n z^{−n} = H(z)

and

y_n = Σ_{k=-∞}^{∞} h_{n−k} x_k

then

Y(z) = H(z)X(z) (Al-25)

The inverse z-transform is given by

x_n = (1/(2πi)) ∮_C X(z) z^{n−1} dz   (A1-26)

where C is a closed path in the region of convergence, traversed in the counterclockwise direction. If X(z) is a rational function of z, x_n can be evaluated by the partial fraction method or by the residue theorem:

x_n = (1/(2πi)) ∮_C X(z) z^{n−1} dz
    = {  Σ [residues of X(z)z^{n−1} at the poles inside C],   n ≥ 0      (A1-27)
      { −Σ [residues of X(z)z^{n−1} at the poles outside C],  n < 0

5. Parseval's theorem.

Σ_{n=-∞}^{∞} x_n y_n^* = (1/(2πi)) ∮_C X(w) Y^*(1/w^*) dw/w   (A1-28)

6. Complex convolution. If u_n = x_n y_n, then

U(z) = Σ_{n=-∞}^{∞} u_n z^{−n} = (1/(2πi)) ∮_C X(w) Y(z/w) dw/w   (A1-29)

where the region of convergence of X(z) is R₁ < |z| < R₂ and that of Y(z) is R₃ < |z| < R₄.

A linear system is called bounded input bounded output (BIBO) stable if the output is bounded for every bounded input. Let x_n be a bounded input sequence, h_n the impulse response sequence, and y_n the output of the linear time-invariant discrete-time system. The output y_n is given by

y_n = Σ_{k=-∞}^{∞} h_k x_{n−k}

If the system is causal, then

h_k = 0 for k < 0   (A1-30)

in this case,

y_n = Σ_{k=0}^{∞} h_k x_{n−k}

If a linear, causal and time invariant system is stable, then the output is bounded

|y_n| ≤ M

when the input satisfies |x_n| < L; L and M are constants. Since

y_n = Σ_{k=0}^{∞} h_k x_{n−k}

then

|y_n| ≤ Σ_{k=0}^{∞} |h_k| |x_{n−k}|

The causal system is stable if

Σ_{k=0}^{∞} |h_k| ≤ C

where C is a constant. On the other hand, if

Σ_{k=0}^{∞} |h_k| = ∞

then |y_n| is not bounded. We conclude that a linear, time invariant, and causal system is stable if and only if

Σ_{k=0}^{∞} |h_k| < ∞   (A1-31)

The z-transform of the causal linear time invariant system is

H(z) = Σ_{n=0}^{∞} h_n z^{−n}

The region of convergence of H(z) is |z| > R₁ because H(z) is a series in negative powers of z. The system is causal if and only if the region of convergence of H(z) is the exterior of a circle of finite radius in the z-plane. A causal system is stable if and only if the region of convergence of H(z) contains the exterior of the unit circle centered at the origin in the z-plane. The system H(z) is invertible, causal, and stable if the zeros and poles of H(z) are inside the unit circle. Such a system is called minimum phase. For example, if

H(z) = z(4z − 1) / [(2z − 1)(3z − 2)]

then H(z) has zeros at z = 0 and z = 1/4 and poles at z = 1/2 and z = 2/3. Since the zeros and poles are inside the circle |z| = 1, H(z) is causal and stable.
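The pole–zero locations in the example above can be verified numerically. This is a short sketch assuming Python with NumPy; the polynomials are written in descending powers of z.

```python
import numpy as np

# H(z) = z(4z - 1) / ((2z - 1)(3z - 2))
num = np.polymul([4.0, -1.0], [1.0, 0.0])   # z(4z - 1) -> 4z^2 - z
den = np.polymul([2.0, -1.0], [3.0, -2.0])  # (2z - 1)(3z - 2) -> 6z^2 - 7z + 2

zeros = np.roots(num)
poles = np.roots(den)
print("zeros:", zeros)                       # 0 and 1/4
print("poles:", poles)                       # 1/2 and 2/3
print("minimum phase:", np.all(np.abs(zeros) < 1) and np.all(np.abs(poles) < 1))
```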

VECTORS AND MATRICES

Let n be a positive integer and let x₁, x₂, …, x_n be real or complex numbers. The ordered n-tuple

X = (x₁, x₂, …, x_n)   (A1-32)

is defined as an n-dimensional vector. A set V of vectors is called a vector space if X ∈ V, Y ∈ V implies X + Y ∈ V, and if X ∈ V implies αX ∈ V for any scalar α. The transpose of a vector is defined as

X' = [x₁, x₂, x₃, …, x_n]ᵀ (a column of the elements x₁, x₂, …, x_n)   (A1-33)

X is called a row vector and X' is called a column vector. A set of vectors X₁, X₂, …, X_n is said to be linearly dependent if there exist scalars c₁, c₂, …, c_n, not all simultaneously zero, such that c₁X₁ + c₂X₂ + … + c_nX_n = 0. If no such scalars exist, then X₁, X₂, …, X_n are called linearly independent.

The inner product of two vectors is defined as

(X, Y) = X'Y = Σ_{i=1}^{n} x_i y_i   (A1-34)

The norm of a vector X is defined as

‖X‖ = √(X, X)   (A1-35)

Two vectors are orthogonal if

(X, Y) = 0   (A1-36)

An m × n matrix A consists of a collection of mn elements a_{ij}, i = 1, …, m, j = 1, …, n, and is written as

A = {a_{ij}} = [a_{11} … a_{1n}; a_{21} … a_{2n}; …; a_{m1} … a_{mn}]   (A1-37a)

If m = n, then A is called a square matrix. If m = 1, then A is a row vector with n components. If n = 1, then A is a column vector with m elements. If m = n = 1, then A is a scalar quantity. If

A = {a_{ij}},  an m × n matrix

B = {b_{ij}},  an m × n matrix

then

C = αA + βB = {α a_{ij} + β b_{ij}}   (A1-37b)

If α = β = 1, then C is the sum of the two matrices A and B. It should be noted that A + B = B + A. Let

A = {a_{ij}},  an m × k matrix

B = {b_{ij}},  a k × n matrix

The product of the two matrices is defined as

C = AB = {c_{ij}},  c_{ij} = Σ_{l=1}^{k} a_{il} b_{lj}   (A1-38)

The transpose of the matrix A, an m × n matrix, is defined as

A' = {a_{ji}},  an n × m matrix, where A = {a_{ij}} is an m × n matrix   (A1-39)

If A = A', then A is called a symmetric matrix. If A is a square matrix with all diagonal elements 1 and the remaining elements zero, then A is called the n × n identity matrix I. It can be shown that

AI = IA = A (Al-40)

A square matrix A is called a nonsingular matrix if the determinant of A, |A|, is not equal to zero. Let A be a nonsingular matrix. The matrix B is called the inverse of A if

BA = AB = I   (A1-41)

and B is denoted by A⁻¹. It should be noted that

(ABC)' = C'B'A',  (ABC)⁻¹ = C⁻¹B⁻¹A⁻¹   (A1-42)

The rank of a matrix is defined as the number of independent rows (or columns). A square matrix A of complex elements is said to be unitary if AᴴA = I, where Aᴴ is the conjugate transpose of A. If A is a real square matrix, the condition becomes A'A = I, and such a matrix is called an orthogonal matrix; equivalently, A' = A⁻¹. If A = Aᴴ, then A is called a self-adjoint or Hermitian matrix. The trace of a square matrix A is defined as

Tr(A) = a_{11} + a_{22} + a_{33} + … + a_{nn}   (A1-43)

We have:

Tr(A + B) = Tr(A) + Tr(B) and Tr(AB) = Tr(BA)

The inner product of two matrices A and B is defined as

[A, B] = Tr(ABᴴ)   (A1-44)

The norm of a matrix A is denoted by ‖A‖ and is defined as

‖A‖ = [A, A]^{1/2} = √Tr(AAᴴ)   (A1-45)

It can be proved that

|[A, B]| ≤ ‖A‖ ‖B‖   (A1-46)

This inequality is known as the Cauchy–Schwarz inequality. If

AX = λX   (A1-47)

then λ is called an eigenvalue or characteristic value and X is called an eigenvector or characteristic vector. The characteristic polynomial is given by

Δ(λ) = |λI − A| = Σ_{i=0}^{n} α_i λ^i = 0   (α_i are constant coefficients)   (A1-48)

where A is an n × n square matrix. A square matrix A is called simple if its eigenvalues are distinct. Let X₁, X₂, …, X_n be the eigenvectors corresponding to the distinct eigenvalues λ₁, …, λ_n. Let the modal matrix be

P = [X₁, X₂, …, X_n]   (A1-49)

such that

AX_i = λ_i X_i,  i = 1, …, n

where A is a simple matrix. Then P is nonsingular and

P⁻¹AP = Λ = diag(λ₁, λ₂, …, λ_n)   (A1-50)

For every symmetric matrix, there is an orthogonal matrix P, P'P = I and

P'AP = Λ   (A1-51)

A matrix A is positive definite if

Q(X) = [AX, X] = X'AX > 0 for every X ≠ 0   (A1-52)

If A is a symmetric and positive definite matrix, then A is called a covariance matrix.

(i) If A is positive definite, then all the eigenvalues are positive.
(ii) If P is a modal matrix of a positive definite matrix A, then P'AP is also positive definite.
(iii) If A is positive definite, then A⁻¹ exists.
(iv) If A is positive definite, then there exists a nonsingular matrix W such that

A = W'W   (A1-53)

This factorization is called a Cholesky decomposition if W is a triangular matrix.

(v) If A is positive definite and the matrix (I − B'A⁻¹Q) is nonsingular, then

(A − QB')⁻¹ = A⁻¹ + A⁻¹Q(I − B'A⁻¹Q)⁻¹B'A⁻¹   (A1-54)

where Q is an n × m matrix and B' is an m × n matrix, m ≤ n. This equality is known as the Woodbury formula.

(a) Corollary

where R and A are positive definite matrices. This equality is known as the matrix inversion lemma.

(b) Cholesky Decomposition

The n x n symmetric matrix A can be decomposed as

A = T'T, (AI-56)

where T is a triangular matrix and is given by

T = [ t₁₁  0    0   …  0
      t₂₁  t₂₂  0   …  0
      t₃₁  t₃₂  t₃₃ …  0
      …
      t_{n1} t_{n2} …  t_{nn} ]   (A1-57)

with elements obtained recursively from the elements of A:

t₁₁ = √a₁₁,  t_{j1} = a_{j1}/t₁₁,  j = 2, 3, …, n

t_{ii} = (a_{ii} − Σ_{k=1}^{i−1} t_{ik}²)^{1/2},  t_{ji} = (a_{ji} − Σ_{k=1}^{i−1} t_{jk} t_{ik})/t_{ii},  j > i,  i = 2, 3, …, n   (A1-58)

This decomposition is also called the square root method. The inverse of a nonsingular matrix can be obtained using the Cholesky decomposition into upper and lower triangular matrices. For any vector g_i,

(T')⁻¹[A | g_i] = [(T')⁻¹A | (T')⁻¹g_i] = [T | h_i],  i = 1, 2, …, n   (A1-59)

where T'h_i = g_i, and g_i, h_i are vectors, i = 1, 2, …, n. The components of each h vector are computed by successive substitution in T'h_i = g_i, working through the components l = 2, 3, …, n   (A1-60)
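The Cholesky factorization and the associated triangular solves can be illustrated with NumPy. This is a minimal sketch; note that `np.linalg.cholesky` returns the lower triangular factor L with A = LL', which is the same decomposition up to the transposition convention used in the text, and the test matrix is an arbitrary positive definite example.

```python
import numpy as np

# A symmetric positive definite test matrix (an arbitrary choice for illustration)
A = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])

L = np.linalg.cholesky(A)          # lower triangular factor, A = L L'
print(np.allclose(L @ L.T, A))     # True: the factorization reproduces A

# Solving A x = b through the triangular factors (the "square root" method)
b = np.array([1.0, 2.0, 3.0])
y = np.linalg.solve(L, b)          # forward substitution  L y = b
x = np.linalg.solve(L.T, y)        # back substitution     L' x = y
print(np.allclose(A @ x, b))
```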

(c) Singular Value Decomposition

Let A be an m × n matrix of rank r. There exist an m × m unitary matrix U, an n × n unitary matrix V, and an r × r diagonal matrix D with strictly positive elements such that

A = U Σ Vᴴ,  Σ = [ D 0; 0 0 ]  (r and m − r rows, r and n − r columns),  σ_i > 0   (A1-61)

where Σ is an m × n matrix. If r = m = n, then Σ = D. Vᴴ is the transpose of the complex conjugate of V. The elements σ_i of D are called the singular values of A. The matrix A can be expressed as

A = Σ_{i=1}^{r} σ_i u_i v_iᴴ   (A1-62)

where {u_i: i = 1, …, r}, the column vectors of U, are the left singular vectors (the eigenvectors of AAᴴ), {v_j: j = 1, …, r}, the column vectors of V, are the right singular vectors (the eigenvectors of AᴴA), and σ_i are the singular values of A. Let the singular value decomposition of A be given by (A1-62). The pseudo-inverse Aᴵ is given by

Aᴵ = V Σᴵ Uᴴ,  Σᴵ = [ D⁻¹ 0; 0 0 ]   (A1-63a)

If r = n, then the pseudo-inverse of A is given by

Aᴵ = (AᴴA)⁻¹Aᴴ   (A1-63b)
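The relations (A1-62), (A1-63a), and (A1-63b) can be checked numerically. A minimal sketch assuming Python with NumPy; the 3 × 2 matrix is an arbitrary full-column-rank example.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])            # 3 x 2, rank 2

U, s, Vh = np.linalg.svd(A, full_matrices=False)
print(s)                              # singular values sigma_i > 0

# Pseudo-inverse from the SVD, Eq. (A1-63a)
A_pinv = Vh.T @ np.diag(1.0 / s) @ U.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))

# Full column rank case, Eq. (A1-63b): A^I = (A^H A)^{-1} A^H
print(np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T))
```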

(d) Householder Matrix

A matrix T is called the Householder matrix (transform) if

T = I - 2hh' (Al-64) where

h'h = 1 (Al-65)

(e) Functions of Matrices and Matrix Calculus

Let A be a square matrix. Then it can be shown that

1. e^{At} = I + At + A²t²/2! + A³t³/3! + …   (A1-66)

2. (d/dt) e^{At} = A e^{At} = e^{At} A   (A1-67)

3. dA/dt = {d a_{ij}(t)/dt}   (A1-68)

4. (d/dt)(AB) = (dA/dt)B + A(dB/dt)   (A1-69)

5. ∫ A dt = {∫ a_{ij}(t) dt}   (A1-70)

6. dQ/dX = AX + A'X, where Q = X'AX   (A1-71)

Cayley–Hamilton Theorem. Let Δ(λ) denote the characteristic polynomial of A. Then

Δ(A) = 0   (A1-72a)

This theorem can be used to compute the inverse of a nonsingular matrix A by multiplying the equation Δ(A) = 0 by A⁻¹. Let

A = [ 2 1; 3 4 ]

Then

Δ(λ) = λ² − 6λ + 5

Δ(A) = A² − 6A + 5I = 0

A⁻¹ = [6I − A]/5 = (1/5) [ 4 −1; −3 2 ]

Inverse of a Partitioned Matrix.

(a)

[ A B; B' D ]⁻¹ = [ A⁻¹ + F E⁻¹ F'   −F E⁻¹;  −E⁻¹ F'   E⁻¹ ]   (A1-72b)

where E = D − B'A⁻¹B, F = A⁻¹B, A and D are symmetric matrices, and all the inverses exist.

(b)

(A + UV')⁻¹ = A⁻¹ − A⁻¹UV'A⁻¹ / (1 + V'A⁻¹U)   (A1-72c)

where U and V are two column vectors.
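Because the formulas (A1-72b) and (A1-72c) above are reconstructed here from the standard partitioned-inverse and rank-one (Sherman–Morrison) identities, a quick numerical verification is useful. The sketch below, assuming Python with NumPy and a randomly generated positive definite matrix, checks the rank-one update form.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)          # symmetric positive definite test matrix
u = rng.standard_normal((n, 1))
v = rng.standard_normal((n, 1))

Ainv = np.linalg.inv(A)
# Rank-one update: (A + u v')^{-1} = A^{-1} - A^{-1} u v' A^{-1} / (1 + v' A^{-1} u)
lhs = np.linalg.inv(A + u @ v.T)
rhs = Ainv - (Ainv @ u @ v.T @ Ainv) / (1.0 + float(v.T @ Ainv @ u))
print(np.allclose(lhs, rhs))         # True up to round-off
```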

BIBLIOGRAPHICAL NOTES

Schwarz and Friedland (1965), Cooper and McGillem (1967), Liu and Liu (1975), Kreyszig (1979), and Wylie and Barrett (1982) provide the background material on signals, systems, and transform applications of vectors and matrices. Lighthill (1959), Papoulis (1965), and Goodman (1968) are excellent texts on Fourier series and transforms. A good account of vector and matrix theory can be found in Bellman (1970), Fadeeva (1959), Moore (1968), Graybill (1969), Rao (1973), Stewart (1973), and Searle (1982). Discrete-time signals and time series analysis emphasizing digital filters, sampling and word size, quantization, recursive and nonrecursive filters, and filter implementation are treated in Stearns (1975), Tretter (1976), and Otnes and Enochson (1978). Applications of functional analysis in signal synthesis and spectral analysis are given in Franks (1968) and Papoulis (1977).

Appendix 2 Probability

Consider a random experiment E. An outcome of E is called an elementary event, and the set of all these elementary events is called the universal set or sample space Ω. Let us denote an element of Ω by ω. Let A and B be two subsets of Ω. We define

A + B = {ω: ω ∈ A or ω ∈ B}   (A2-1)
AB = {ω: ω ∈ A and ω ∈ B}   (A2-2)

Aᶜ = {ω: ω is not in A}   (A2-3)

The class of sets ℱ on which probabilities are to be assigned should have the following properties: If A ∈ ℱ and B ∈ ℱ, then

1. Aᶜ ∈ ℱ
2. A + B ∈ ℱ
3. AB ∈ ℱ   (A2-4)

ℱ is called a Boolean field. For each event A ∈ ℱ, we assign a value P(A), called the probability of A. Note that probability is a function on a set. The triplet (Ω, ℱ, P) is called the probability space. Two events A and B are called disjoint if AB = ∅, the empty set. A probability measure P(·) is a mapping from events into the reals satisfying the following three axioms:

I. P(A) ≥ 0   (A2-5)
II. P(Ω) = 1   (A2-6)
III. For a countable set {A_i} of events with A_iA_j = ∅ for i ≠ j,

P(Σ_i A_i) = Σ_i P(A_i)   (A2-7)

The reader may be able to derive the following important results:

(a) P(∅) = 0, the probability of occurrence of an impossible event.   (A2-8)

(b) 0 ≤ P(A) ≤ 1 for A ∈ ℱ   (A2-9)

(c) P(Aᶜ) = 1 − P(A),  A ∈ ℱ   (A2-10)

(d) If A₁ ⊂ A₂, then P(A₁) ≤ P(A₂).   (A2-11)

For any two arbitrary events (sets) A₁ and A₂ with A₁ ∈ ℱ and A₂ ∈ ℱ,

P(A₁ + A₂) = P(A₁) + P(A₂) − P(A₁A₂)   (A2-12)

RANDOM VARIABLE

A real-valued point function X(·) defined on the probability space (Ω, ℱ, P) is called a random variable (r.v.) if the set {ω: X(ω) ≤ x} ∈ ℱ for every real x. A random variable X is a function that assigns a real value to each outcome in Ω. We will assume that P[X(ω) = ∞] = 0 and P[X(ω) = −∞] = 0.

DISTRIBUTION FUNCTION

The function

F_X(x) = P{ω: X(ω) ≤ x} ≜ P[X ≤ x]   (A2-13)

defined on the real line R is called the distribution function of the random variable X. The properties of the distribution function F_X(x) are

1. F_X(−∞) = 0,  F_X(∞) = 1   (A2-14)

2. If x₁ < x₂, then F_X(x₁) ≤ F_X(x₂); the function is nondecreasing.   (A2-15)

3. F_X(x⁺) = F_X(x); the function is continuous, at least from the right.   (A2-16)

There are two kinds of distributions. One is called the discrete type, corresponding to a random variable assuming discrete values a₁, a₂, …, a_n such that

F_X(x) = Σ_{i=1}^{n} p_i u(x − a_i),  u(x) = { 1, x ≥ 0; 0, x < 0 }   (A2-17)

such that

(i) p_i = P{X = a_i}

(ii) p_i ≥ 0, for i = 1, …, n   (A2-18)

(iii) Σ_{i=1}^{n} p_i = 1   (A2-19)

There are also singular distributions. The other type is called the absolutely continuous type and is given by

F_X(x) = ∫_{-∞}^{x} f_X(u) du   (A2-20)

where f_X(x) is the probability density function (p.d.f.) and

1. f_X(x) ≥ 0   (A2-21)

2. ∫_{-∞}^{∞} f_X(x) dx = 1   (A2-22)

For the absolutely continuous case, i.e., when F_X(x) is differentiable, the p.d.f. of X is defined as

f_X(x) = dF_X(x)/dx   (A2-23)

Note that

P[a ≤ X ≤ b] = F_X(b) − F_X(a)   (A2-24)
            = ∫_a^b f_X(x) dx, for the continuous case   (A2-25)

P[k₁ < X < k₂] = Σ_{k=k₁}^{k₂} P{X = k}, for the discrete case

Examples of discrete distributions are binomial and Poisson.

Binomial Distribution

If a coin is tossed n times, the probability of r heads is given by the binomial distribution

p_r ≜ P[X = r] = C(n, r) p^r (1 − p)^{n−r},  r = 0, 1, …, n   (A2-27)

where p = P(head) and 0 < p < 1. Hence the distribution function is

F_X(x) = Σ_{r=0}^{n} C(n, r) p^r (1 − p)^{n−r} u(x − r)

where

u(t) = { 1, t > 0; 0, t ≤ 0 }

Poisson Distribution

The number of people arriving at a counter or the number of photons at a photodetector is given by the Poisson distribution:

p_r ≜ P[X = r] = e^{−λ} λ^r / r!,  r = 0, 1, …, ∞   (A2-28)

where λ is the average intensity, λ ≥ 0. Hence the distribution function is

F_X(x) = Σ_{r=0}^{∞} e^{−λ} (λ^r / r!) u(x − r)   (A2-29)

Uniform Density

The phase of a transmitted signal takes any value from a to b uniformly. The probability density function is given by

f_X(x) = { 1/(b − a),  a ≤ x ≤ b;  0, elsewhere }   (A2-30)

Gaussian (Normal) Density

A random variable X which takes values from −∞ to ∞ and which can be expressed approximately as the sum of many independent, identically distributed random variables has the density

f_X(x) = (1/(σ_x √(2π))) exp[−(x − μ_x)²/(2σ_x²)]   (A2-31)

A random variable X is called a binomial r.v. if its distribution is given by (A2-27). A binomial r.v. is called a Bernoulli r.v. if n = 1 in (A2-27). A Bernoulli trial is an experiment which has two possible outcomes or events. Tossing a coin can be considered a Bernoulli trial with two possible outcomes, heads or tails. The output of a soft limiter (half-wave rectifier) is either 0 or 1 depending on whether the input signal is negative or positive. Let us denote the two possible outcomes of a Bernoulli trial by 0 and 1. Let X be a Bernoulli r.v. with P[X = 1] = p and P[X = 0] = 1 − p. Let Y be the number of trials necessary to get the event 1. The r.v. Y is called a geometric r.v. and its distribution is given by

P[Y = k] = (1 − p)^k p,  k = 0, 1, 2, 3, …   (A2-32)

Let X be a r.v. on a probability space (Ω, ℱ, P). The nth moment of X is defined as

E[Xⁿ] = ∫_{-∞}^{∞} xⁿ f(x) dx, if X has an absolutely continuous distribution, n = 1, 2, 3, …   (A2-33)

E[Xⁿ] = Σ_j (x_j)ⁿ P(X = x_j), if X has a discrete distribution, n = 1, 2, 3, …   (A2-34)

Sometimes it is written in the compact notation

E[Xⁿ] = ∫_{-∞}^{∞} xⁿ dF_X(x)   (A2-35)

When n = 1, the first moment is E(X). E(X) is called the mean or average value and is denoted by μ_x. When n = 2, E(X²) is called the second moment. The variance of X is defined as

Variance of X ≜ σ_x² = E[X²] − [E(X)]² ≜ Var(X) = second moment − (first moment)²   (A2-36)

σ_x is called the standard deviation. It should be noted that for some r.v.s some moments may not exist. The characteristic function (c.f.) is given by

φ_X(v) = E(e^{iXv})   (A2-37)

= ∫_{-∞}^{∞} e^{ixv} f(x) dx   (A2-38)

if X has an absolutely continuous distribution, and

φ_X(v) = Σ_j e^{ix_j v} P(X = x_j)   (A2-39)

if X has a discrete distribution. It should be noticed that in the case of a r.v. with absolutely continuous distribution, the p.d.f. is given by

f_X(x) = (1/2π) ∫_{-∞}^{∞} φ_X(v) e^{−ixv} dv,  i = √(−1)   (A2-40)

Therefore, f_X(x) and φ_X(v) form a Fourier transform pair. Note that |φ_X(v)| ≤ 1.

Moment theorem:

E(Xⁿ) = (1/iⁿ) dⁿφ_X(v)/dvⁿ |_{v=0}   (A2-41)

Therefore from the characteristic function all the moments can be obtained, whenever they exist, and the probability density function (p.d.f.) can be recovered.

FUNCTION OF RANDOM VARIABLES

In case Y = g(X) has multiple roots x₁, …, x_n such that

y = g(x₁) = g(x₂) = … = g(x_n)

then the probability density of Y is given by

f_Y(y) = Σ_{i=1}^{n} f_X(x_i)/|g'(x_i)|

where

g'(x_i) = (d/dx) g(x) |_{x=x_i},  i = 1, 2, …, n   (A2-42)

and g⁻¹ is the inverse of g(·) when it exists.

Expectation of Function of R.V.

Let Y = g(X) be a measurable function of X and denote F_Y(y) as its distribution function (d.f.). The expectation is

E(Y) = ∫_{-∞}^{∞} y dF_Y(y)

= ∫_{-∞}^{∞} g(x) dF_X(x) = Σ_i g(x_i) P(X = x_i), for the discrete case   (A2-43)

= ∫_{-∞}^{∞} g(x) f_X(x) dx, for the continuous case   (A2-44)

where f_X(x) is the p.d.f. of X.

Chebyshev's Inequality

For any random variable X with mean μ_x and variance σ_x²,

P[|X − μ_x| ≥ c] ≤ σ_x²/c²,  c > 0   (A2-45)
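The Chebyshev bound (A2-45) can be checked against an empirical estimate by simulation. A minimal sketch assuming Python with NumPy; the exponential distribution is an arbitrary choice of a distribution with finite variance.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200_000)   # any distribution with finite variance
mu, var = x.mean(), x.var()

for c in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= c)   # estimated P(|X - mu| >= c)
    bound = var / c**2                          # Chebyshev bound (A2-45)
    print(f"c={c}: P(|X-mu|>=c) ~ {empirical:.4f} <= bound {bound:.4f}")
```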

MULTIDIMENSIONAL RANDOM VARIABLE-RANDOM VECTOR

The collection of n single-valued functions X₁(ω), X₂(ω), …, X_n(ω) mapping Ω into Rⁿ is called an n-dimensional random variable if the set

{ω: X₁(ω) ≤ x₁, X₂(ω) ≤ x₂, …, X_n(ω) ≤ x_n} ∈ ℱ

for all x, where x is an n-dimensional vector. We define the joint distribution function as

F_X(x₁, …, x_n) = P[X₁ ≤ x₁, …, X_n ≤ x_n]   (A2-46)

It can be shown that

1. F is nondecreasing in each argument.
2. It is right continuous.   (A2-47)
3. F(x_i) = F(∞, …, ∞, x_i, ∞, …, ∞) is called the marginal distribution function (d.f.).

The joint probability density (p.d.f.) is defined as

f_X(x₁, …, x_n) = ∂ⁿF_X(x₁, …, x_n)/(∂x₁ ⋯ ∂x_n)   (A2-48)

whenever the right side exists. For example, when n = 3, the p.d.f. is

f_X(x₁, x₂, x₃) = ∂³F_X(x₁, x₂, x₃)/(∂x₁ ∂x₂ ∂x₃)

where F_X(x₁, x₂, x₃) is the joint probability distribution function of (X₁, X₂, X₃). The joint p.d.f. of (X₁, X₂, X₃) must satisfy the following conditions:

f_X(x₁, x₂, x₃) ≥ 0

∫_{-∞}^{∞} ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_X(x₁, x₂, x₃) dx₁ dx₂ dx₃ = 1

The marginal density is obtained by integrating the joint density over the remaining variables. For n = 3, the marginal densities are

f_X(x₁) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_X(x₁, x₂, x₃) dx₂ dx₃

f_X(x₂) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_X(x₁, x₂, x₃) dx₁ dx₃

and

f_X(x₃) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_X(x₁, x₂, x₃) dx₁ dx₂

f_X(x₁, x₃) = ∫_{-∞}^{∞} f_X(x₁, x₂, x₃) dx₂

Two random variables X₁, X₂ are called independent if

F_X(x₁, x₂) = F_{X₁}(x₁) F_{X₂}(x₂)   (A2-51)

For the continuous case, the two random variables X₁ and X₂ are called independent if

f_X(x₁, x₂) = f_{X₁}(x₁) f_{X₂}(x₂)   (A2-52)

Similarly, three r.v.s X₁, X₂, and X₃ are called independent if

F_X(x₁, x₂, x₃) = F_{X₁}(x₁) F_{X₂}(x₂) F_{X₃}(x₃)   (A2-53)

For the continuous case, X₁, X₂, X₃ are independent if

f_X(x₁, x₂, x₃) = f_{X₁}(x₁) f_{X₂}(x₂) f_{X₃}(x₃)   (A2-54)

These definitions can be extended to independent r.v.s X₁, …, X_n, i.e.,

F_X(x₁, …, x_n) = Π_{j=1}^{n} F_{X_j}(x_j)   (A2-55)

For the continuous case,

f_X(x₁, …, x_n) = Π_{j=1}^{n} f_{X_j}(x_j)   (A2-56)

where F_{X_j}(x_j) and f_{X_j}(x_j) are the marginal distribution and density functions, respectively. The joint density function satisfies the following:

1. f_X(x₁, …, x_n) ≥ 0   (A2-57)

2. ∫_{-∞}^{∞} ⋯ ∫_{-∞}^{∞} f_X(x₁, …, x_n) dx₁ ⋯ dx_n = 1   (A2-58)

Expectations for Multidimensional r.v.

It can be proved that if X₁ and X₂ are two independent r.v.s, then

1. E[aX₁ + bX₂] = aE[X₁] + bE[X₂]   (A2-59)

2. E[X₁X₂] = E[X₁]E[X₂]   (A2-60)

Defn: Two random variables X₁ and X₂ are uncorrelated if

E[X₁X₂] = E[X₁]E[X₂]   (A2-61)

3. Var(aX + bY) = a² Var(X) + b² Var(Y)   (A2-62)

when X and Y are independent r.v.s. Recall

Var(X) = E[(X − μ_x)²]

where μ_x = E(X) is the mean or average value of X, i.e., μ_x = ∫_{-∞}^{∞} x f_X(x) dx.

Defn: Two random variables X and Y are orthogonal if the expectation of their product is zero, i.e., E(XY) = 0.

4. E[g(X, Y)] = ∫∫ g(x, y) dF(x, y) = { Σ_i Σ_j g(x_i, y_j) P(X = x_i, Y = y_j), discrete case
                                      { ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f(x, y) dx dy, continuous case

We define the characteristic function of X as

φ_X(v₁, …, v_n) = E[e^{i Σ_j v_j X_j}] = ∫ ⋯ ∫ (n-fold) e^{i Σ_j v_j x_j} f_X(x₁, …, x_n) dx₁ ⋯ dx_n   (A2-63)

When X₁, X₂, …, X_n are independent, the characteristic function of X is given by

φ_X(v₁, …, v_n) = Π_{j=1}^{n} φ_{X_j}(v_j)   (A2-64)

where

φ_{X_j}(v_j) = ∫_{-∞}^{∞} e^{i v_j x_j} f_{X_j}(x_j) dx_j, if the X_j are continuous

φ_{X_j}(v_j) = Σ_k e^{i v_j x_k} P(X_j = x_k), if the X_j are discrete

If

Z = X + Y   (A2-65)

where X and Y are independent, the characteristic function of Z is

φ_Z(v) = φ_X(v) φ_Y(v)   (A2-66)

The p.d.f. of Z, whenever it exists, is

f_Z(z) = ∫_{-∞}^{∞} f_X(x) f_Y(z − x) dx   (A2-67)
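The convolution relation (A2-67) can be illustrated numerically by adding two independent uniform r.v.s and comparing a histogram of the sum with the convolution of the two densities. This is a minimal sketch assuming Python with NumPy; the uniform densities are an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500_000
x = rng.uniform(0.0, 1.0, N)      # X ~ uniform(0, 1)
y = rng.uniform(0.0, 1.0, N)      # Y ~ uniform(0, 1), independent of X
z = x + y

# Density of Z from Eq. (A2-67): numerical convolution of the two densities
grid = np.linspace(0.0, 1.0, 1001)
dx = grid[1] - grid[0]
fx = np.ones_like(grid)           # f_X = 1 on [0, 1]
fz = np.convolve(fx, fx) * dx     # triangular density on [0, 2]
z_grid = np.linspace(0.0, 2.0, len(fz))

hist, edges = np.histogram(z, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - np.interp(centers, z_grid, fz))))  # small discrepancy
```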

SUM OF GAUSSIAN INDEPENDENT R.V.'s.

Let

Z = X₁ + X₂ + … + X_n

where

E(X_i) = μ_i,  i = 1, …, n,  Var(X_i) = σ_i²,  i = 1, …, n

and X₁, X₂, …, X_n are independent Gaussian random variables. The p.d.f. and characteristic function (c.f.) of Z are given by

f_Z(z) = (1/(σ_z √(2π))) exp[−(z − μ_z)²/(2σ_z²)]   (A2-68)

where

E(Z) = μ_z = Σ_{i=1}^{n} E(X_i) = Σ_{i=1}^{n} μ_i   (A2-69)

and

σ_z² = Σ_{i=1}^{n} σ_i² = Σ_{i=1}^{n} Var(X_i)   (A2-70)

φ_Z(v) = exp[i μ_z v − σ_z² v²/2]   (A2-71)

The sum of independent Gaussian r.v.s is also a Gaussian r.v. with mean equal to the sum of all the means and variance equal to the sum of the variances. Let X be an n-dimensional random vector. We define the covariance as the n × n matrix

𝒞_X = E[(X − μ)(X − μ)']   (A2-72)

where

X = [X₁, X₂, …, X_n]',  μ = [μ₁, μ₂, …, μ_n]',  X' = (X₁, X₂, …, X_n)

E[X_i] = μ_i,  1 ≤ i ≤ n

It can be seen that

E[(Σ a_i X_i)²] = E[a'XX'a] = a'E[XX']a

where

a = [a₁, …, a_n]'

Note that the covariance matrix is symmetric and positive definite. The covariance matrix is also called the dispersion matrix. The cross covariance matrix of two random vectors X and Y is defined as

𝒞_XY = E[(X − μ_x)(Y − μ_y)'] = E[XY'] − μ_x μ_y'   (A2-73)

If 𝒞_XY = 0, then X and Y are called uncorrelated random vectors. If

E[XY'] = 0

then X and Y are called orthogonal vectors. Let us denote

μ_{ij} = E[(X_i − μ_i)(X_j − μ_j)]   (A2-74)

and

ρ_{ij} = μ_{ij}/(σ_i σ_j),  i, j = 1, …, n   (A2-75)

where

σ_i = √μ_{ii},  i = 1, …, n

ρ_{ij} is defined as the correlation coefficient of X_i and X_j. If X_i and X_j are independent, then

ρ_{ij} = 0,  i.e., μ_{ij} = 0, and

|ρ_{ij}| ≤ 1 for all i, j   (A2-76)

The covariance matrix (dispersion matrix) can be written as

𝒞_X = {μ_{ij}}, an n × n matrix with diagonal elements σ_i² and off-diagonal elements μ_{ij}.

In case (X₁, …, X_n) are uncorrelated, 𝒞_X is a diagonal matrix. If X = (X₁, …, X_n), where X_i, i = 1, …, n, is a Gaussian r.v., then X is called an n-dimensional Gaussian vector. The p.d.f. of X is given by

f_X(x) = (2π)^{−n/2} |𝒞_X|^{−1/2} exp[−(1/2)(x − μ)' 𝒞_X⁻¹ (x − μ)]   (A2-77)

where 𝒞_X⁻¹ is the inverse of the covariance matrix; we assume the covariance matrix is positive definite and |𝒞_X|, the determinant of the matrix 𝒞_X, ≠ 0. The c.f. of X is

φ_X(v) = exp[i v'μ − (1/2) v'𝒞_X v]   (A2-78)

where

v = (v₁, v₂, …, v_n)'

Let us represent Y as

Y = PX   (A2-79)

where P is an n × n nonsingular matrix. The p.d.f. of Y is given by

f_Y(y) = f_X(P⁻¹y)/|P|,  |P| = det(P)   (A2-80)

The mean and covariance of Yare

m_y ≜ E(Y) = Pμ_x

𝒞_Y = E[(Y − μ_y)(Y − μ_y)'] = P𝒞_X P'   (A2-81)

If X is a Gaussian random vector, then the p.d.f. of Y is given by

f_Y(y) = (2π)^{−n/2} |𝒞_Y|^{−1/2} exp[−(1/2)(y − μ_y)' 𝒞_Y⁻¹ (y − μ_y)]   (A2-82)

When

μ_y = Pμ_x,  𝒞_Y = P𝒞_X P'

Y is also a Gaussian vector. For

n = 2,

𝒞_X = [ σ₁²      ρσ₁σ₂
        ρσ₁σ₂    σ₂²   ]

Note again 𝒞_X is symmetric and positive definite. The bivariate (2-dimensional) density is

f_X(x₁, x₂) = [1/(2πσ₁σ₂√(1 − ρ²))] exp{ −(1/(2(1 − ρ²)))[(x₁ − μ₁)²/σ₁² − 2ρ(x₁ − μ₁)(x₂ − μ₂)/(σ₁σ₂) + (x₂ − μ₂)²/σ₂²] }   (A2-83)

where

μ_i = E[X_i],  σ_i² = Var(X_i),  i = 1, 2, and ρ is the correlation coefficient of X₁ and X₂.

For n = 3, 𝒞_X is the corresponding 3 × 3 matrix of the elements μ_{ij}.

This matrix is symmetric and positive definite.
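Correlated Gaussian vectors with a prescribed covariance matrix 𝒞_X can be generated from standard normal samples by using a Cholesky factor of 𝒞_X. This is a minimal sketch assuming Python with NumPy; the 2 × 2 mean and covariance values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])            # symmetric positive definite covariance

# Generate correlated Gaussian vectors: X = mu + L Z, with C = L L' and Z standard normal
L = np.linalg.cholesky(C)
Z = rng.standard_normal((2, 100_000))
X = mu[:, None] + L @ Z

print(np.mean(X, axis=1))             # close to mu
print(np.cov(X))                      # close to C
```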

Transformation of Random Variables

Let X₁ and X₂ be continuous random variables whose joint density f_X(x₁, x₂) is given. Let Y₁ = g(X₁, X₂) and Y₂ = h(X₁, X₂) such that g⁻¹(·,·) and h⁻¹(·,·) exist, as well as the derivatives of g(x₁, x₂) and h(x₁, x₂) with respect to x₁ and x₂. If (x₁⁽¹⁾, x₂⁽¹⁾), (x₁⁽²⁾, x₂⁽²⁾), …, (x₁⁽ⁿ⁾, x₂⁽ⁿ⁾) are all the real solutions of g(x₁⁽ⁱ⁾, x₂⁽ⁱ⁾) = y₁ and h(x₁⁽ⁱ⁾, x₂⁽ⁱ⁾) = y₂, i = 1, …, n, then the joint density of Y is

f_Y(y₁, y₂) = Σ_{i=1}^{n} f_X(x₁⁽ⁱ⁾, x₂⁽ⁱ⁾)/|J_i|   (A2-84)

where

J_i = det[ ∂g/∂x₁  ∂g/∂x₂;  ∂h/∂x₁  ∂h/∂x₂ ], evaluated at (x₁⁽ⁱ⁾, x₂⁽ⁱ⁾),

is the Jacobian of the transformation. If X = [X₁, X₂, …, X_n]' and the {X_i} are uncorrelated and Gaussian, then the {X_i} are independent Gaussian r.v.s. Let (Ω, ℱ, P) be a probability space. Consider a set A such that P(A) ≠ 0 and B, AB ∈ ℱ. We define the conditional probability of B given that event A has occurred as

P(B|A) = P(AB)/P(A)   (A2-85)

Let A_i, i = 1, 2, …, n, be disjoint sets such that P(A_i) ≠ 0 for each i and A_i, A_iB ∈ ℱ. For any set B ∈ ℱ, we obtain

P(B) = Σ_{j=1}^{n} P(B|A_j) P(A_j)   (A2-86a)

This is known as the total probability theorem. We state Bayes' theorem:

P(A_i|B) = P(B|A_i) P(A_i) / Σ_{j=1}^{n} P(B|A_j) P(A_j)   (A2-86b)

Two events A and B are called statistically independent if

P(B|A) = P(B)   (A2-87)

By definition

P(B|A) = P(AB)/P(A) = P(B)   (A2-88)

therefore A and B are also statistically independent if

P(AB) = P(B)P(A)   (A2-89)

Two r.v.s X and Y are independent if and only if

P[X ∈ B | Y ∈ A] = P(X ∈ B)   (A2-90)

for all events A and B. Consider a pair of random variables (X₁, X₂) and a set Z ∈ ℱ₁ such that P[ω: X₁(ω) ∈ Z] ≠ 0. We define the conditional distribution function

F_{X₂}(x₂ | X₁ ∈ Z) = P[X₂ ≤ x₂, X₁ ∈ Z]/P[X₁ ∈ Z]   (A2-91)

The conditional density function of X₂ given X₁, when it exists, is

f(x₂|x₁) = f(x₁, x₂)/f(x₁)   (A2-92)

so that

f(x₁, x₂) = f(x₂|x₁) f(x₁)   (A2-93)

Similarly it can be shown that

f(x₁) = ∫ f(x₁|x₂) f(x₂) dx₂

f(x₁|x₂) = f(x₁, x₂)/f(x₂) = f(x₁, x₂)/∫_{-∞}^{∞} f(x₁, x₂) dx₁   (A2-94)

The conditional expectation of Y given X is defined as

E(Y|X) = ∫_{-∞}^{∞} y f(y|X) dy

= ψ(X)

Therefore,

E(Y) = E{E(Y|X)} = E(ψ(X))

= E[∫_{-∞}^{∞} y f(y|X) dy]

= ∫_{-∞}^{∞} [∫_{-∞}^{∞} y f(y|x) dy] f(x) dx

= ∫_{-∞}^{∞} ∫ y f(y|x) f(x) dy dx

= ∫_{-∞}^{∞} ∫ y f(y, x) dy dx

Conditional Expectation Theorem.

E(Y) = E_X{E_Y(Y|X)}

= ∫_{-∞}^{∞} E(Y|X = x) dF_X(x)   (A2-95)

where E_X and E_Y stand for the expectation operators with respect to the r.v.s X and Y, respectively.

DISTRIBUTIONS ASSOCIATED WITH A GAUSSIAN RANDOM VARIABLE

Let X be a Gaussian r.v. with mean μ and variance σ_x². If

Z = (X − μ)/σ_x

then the p.d.f. of Z is given by

f_Z(z) = (1/√(2π)) e^{−z²/2}

where the mean of Z is μ_z = 0 and the variance of Z is σ_z² = 1. The p.d.f. of X is

f_X(x) = (1/(σ_x √(2π))) exp[−(x − μ)²/(2σ_x²)]   (A2-96)

(a) Amplitude Density (Chi)

Let

R_(n) = χ_n = √(X₁² + X₂² + … + X_n²)   (A2-97)

where X₁, X₂, …, X_n are independent normalized Gaussian r.v.s. The p.d.f. of R_(n) is given by

f_{R_(n)}(r) = [r^{n−1}/(2^{(n/2)−1} Γ(n/2))] e^{−r²/2} u(r)   (A2-98)

where Γ(n/2) is the gamma function and u(r) is the unit step function. When n = 2,

f_{R_(2)}(r) = r e^{−r²/2} u(r)   (A2-99)

f_{R_(2)}(r) is known as the Rayleigh intensity density. Most fading channels are described by a Rayleigh density (Kennedy (1968)). When n = 3,

f_{R_(3)}(r) = √(2/π) r² e^{−r²/2} u(r)   (A2-100)

f_{R_(3)}(r) is called the Maxwell density. The probability density of propagation waves is given by the Maxwell density.

(b) Power Density (Chi-Square)

Let

R_[n] = χ_n² = X₁² + X₂² + … + X_n²   (A2-101)

where X₁, X₂, …, X_n are independent normalized Gaussian r.v.s. The p.d.f. of R_[n] is given by

f_{R_[n]}(x) = f_{χ_n²}(x) = [1/(2^{n/2} Γ(n/2))] x^{(n−2)/2} e^{−x/2} u(x)   (A2-102)

and is called the power or chi-square density with n degrees of freedom (diversity). When n = 2, the power density is known as the Rayleigh power density, and is given by

f_{χ_2²}(x) = (1/2) e^{−x/2},  x ≥ 0   (A2-103)

It can be shown that

E(χ_n²) = μ_{χ_n²} = n   (A2-104)

Var(χ_n²) = σ²_{χ_n²} = E(χ_n² − μ_{χ_n²})² = 2n   (A2-105)

χ_n and χ_n² tend to a (normal) Gaussian density when n ≳ 30.

(c) Student's t Distribution

Let Y and Z be independent random variables such that Y has a χ_n² density function and Z has a (normal) Gaussian density with mean 0 and variance 1. Let

t_n = Z/√(Y/n)   (A2-106)

The p.d.f. of t_n is given by

f_{t_n}(t) = {Γ[(n + 1)/2]/[√(nπ) Γ(n/2)]} [1 + t²/n]^{−(n+1)/2}   (A2-107)

The p.d.f. of t_n is called Student's t distribution with n degrees of freedom. Note that

σ_{t_n}² = E(t_n²) = n/(n − 2) for n > 2.

When n → ∞, f_{t_n}(t) becomes a standard Gaussian (normal) density. When n = 1, Student's t distribution reduces to

f_{t_1}(t) = 1/[π(1 + t²)]

which is known as the Cauchy distribution. The mean and variance do not exist for the Cauchy distribution.

LIMIT THEOREMS

Let (Ω, ℱ, P) be the basic probability space, {X_i(ω), i = 1, 2, …} be an infinite sequence of r.v.s, and F_n be the d.f. of (X₁(ω), …, X_n(ω)) defined on Rⁿ (n-dimensional Euclidean space). Then we have the following consistency property: F_n(x₁, …, x_n) = F_{n+1}(x₁, x₂, …, x_n, ∞). A sequence of r.v.s X_n, n = 1, 2, …, is said to converge to a constant C:

(a) Weakly or in probability (written X_n →ᴾ C) if for every given ε > 0,

lim_{n→∞} P(|X_n − C| > ε) = 0   (A2-108)

(b) Strongly or almost surely (written X_n →^{a.s.} C) if

P(lim_{n→∞} X_n = C) = 1   (A2-109)

(c) In quadratic mean (written X_n →^{q.m.} C, also written X_n →^{m.s.} C) if

lim_{n→∞} E(X_n − C)² = 0   (A2-110)

It can be proved that:

(i) Convergence in q.m. implies convergence in probability. (This follows from Chebyshev's inequality.)

(ii) Convergence almost surely implies convergence in probability.

(d) The sequence of r.v.s {X_n} is said to converge in distribution to a r.v. X with d.f. F_X(·) if F_n → F as n → ∞ at all continuity points of F. Such convergence is written X_n →ᵈ X. Let φ_n(v) be the c.f. of X_n. If X_n →ᵈ X, then φ_n(v) → φ(v), where φ(v) is the c.f. of X. Conversely, if φ_n(v) → φ(v) and the limit function is continuous at v = 0, then X_n →ᵈ X and φ(v) is the c.f. of X.

(e) Law of large numbers: Let {X_n}, n = 1, 2, …, be a sequence of observations and X̄_n be the average of the first n observations, i.e.,

X̄_n = (1/n) Σ_{k=1}^{n} X_k

(i) Khinchin's Theorem: Let {X_n}, n = 1, 2, …, be independent and identically distributed (i.i.d.) r.v.s with E(X_i) = μ < ∞ for i = 1, 2, …. Then X̄_n →ᴾ μ. This is the weak law of large numbers.

(ii) Kolmogorov's Theorem: Let X₁, X₂, … be a sequence of i.i.d. r.v.s. Then a necessary and sufficient condition that X̄_n → μ almost surely is that E(X_i) exists and is equal to μ. This is called the strong law of large numbers.

(f) Central Limit Theorem: For a sequence {X_n}, n = 1, 2, …, of i.i.d. r.v.s,

X̄_n → μ

in the weak or strong sense provided μ = E(X) exists. We will discuss the distribution of X̄_n as n → ∞. Let us denote

Z_n = (X̄_n − μ)/(σ/√n) = (1/(σ√n)) (Σ_{k=1}^{n} X_k − nμ)

Note that

E(Z_n) = 0

and

E(Z_n²) = (1/(σ²n)) E[(Σ_{k=1}^{n} X_k − nμ)²] = 1

Assume that E(X_k − μ)² = σ² for all k. The density function of Z_n as n → ∞ is given by a Gaussian density with mean 0 and variance 1.

(g) Cauchy–Schwarz Inequality. Let X and Y be r.v.s with finite second moments. Then

[E(XY)]² ≤ E(X²) E(Y²)   (A2-111)
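The central limit behavior of the normalized sample mean Z_n can be seen directly by simulation. A minimal sketch assuming Python with NumPy; the uniform distribution (μ = 1/2, σ² = 1/12) is an arbitrary non-Gaussian choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 1000, 50_000
x = rng.uniform(0.0, 1.0, size=(trials, n))       # i.i.d. uniform samples

mu, sigma = 0.5, np.sqrt(1.0 / 12.0)
z = (x.mean(axis=1) - mu) / (sigma / np.sqrt(n))  # normalized sample means Z_n

print(z.mean(), z.var())                          # approximately 0 and 1
# Fraction within one standard deviation, ~0.683 for a standard Gaussian
print(np.mean(np.abs(z) <= 1.0))
```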

BIBLIOGRAPHICAL NOTES

There are several books on probability theory and applications for each area of interest. Only a few books are cited here. Papoulis (1965), Breiman (1969), Davenport (1970), Breipohl (1970), and Meyer (1970) provide adequate background on probability theory and its application in engineering and science. A rigorous treatment of probability theory requires a background of measure theory and functional analysis; see Rudin (1966) and Royden (1968). Advanced texts in this category are Feller (1966), Vols. I and II, Breiman (1968), and Chung (1969).

Appendix 3 Stochastic Integrals

Let β(t), −∞ < t < ∞, be a Wiener process and f(t) be a continuously differentiable function on the closed interval [a, b]. We want to evaluate

I = ∫_a^b f(t) n(t) dt = ∫_a^b f(t) dβ(t)   (A3-1)

where

n(t) = dβ/dt = lim_{ε→0} [β(t + ε) − β(t)]/ε   (A3-2)

is the white noise process. But n(t) does not exist because the Wiener process β(t) is not differentiable. So I is not defined in the sense of a Riemann or Lebesgue integral. However, by a limiting process, I can be defined meaningfully. We express

I = lim_{ε→0} ∫_a^b f(t) [β(t + ε) − β(t)]/ε dt

  = lim_{ε→0} ∫_a^b f(t) (d/dt)[(1/ε) ∫_t^{t+ε} β(s) ds] dt

  = lim_{ε→0} { f(t)(1/ε) ∫_t^{t+ε} β(s) ds |_a^b − ∫_a^b f'(t)[(1/ε) ∫_t^{t+ε} β(s) ds] dt }

integrating by parts. Taking the limits, we get

I = f(t)β(t)|_a^b − ∫_a^b f'(t)β(t) dt

  = f(b)β(b) − f(a)β(a) − ∫_a^b f'(t)β(t) dt   (A3-3)

where f'(t) = df(t)/dt. The integral on the right side of Eq. (A3-3) is well defined. We note that I is a Gaussian random variable because the right-hand side is a linear function of a Wiener process, which is a Gaussian process.


Theorem A3-1. Let f(t) and g(t) be two continuously differentiable functions on the closed interval [a, b]. If β(t) is a Wiener process with variance σ²t and f(t), g(t) are integrable in [a, b], then

E[∫_a^b f(t) dβ(t) ∫_a^b g(t) dβ(t)] = σ² ∫_a^b f(t) g(t) dt   (A3-4)

Proof. Using the formula in Eq. (A3-3), we get

∫_a^b f(t) dβ(t) = f(b)β(b) − f(a)β(a) − ∫_a^b f'(t)β(t) dt

= f(b)[β(b) − β(a)] − ∫_a^b f'(t)[β(t) − β(a)] dt   (A3-5)

Similarly,

∫_a^b g(t) dβ(t) = g(b)[β(b) − β(a)] − ∫_a^b g'(t)[β(t) − β(a)] dt   (A3-6)

We will derive first the following identity:

E[∫_a^b f'(t)(β(t) − β(a)) dt · ∫_a^b g'(t)(β(t) − β(a)) dt]

= σ² ∫_a^b (f(t) − f(b))(g(t) − g(b)) dt   (A3-7)

The left side of Eq. (A3-7) is

= ∫_a^b f'(s) {∫_a^b g'(t) E[(β(t) − β(a))(β(s) − β(a))] dt} ds

= σ² ∫_a^b f'(s) {∫_a^b g'(t) min(s − a, t − a) dt} ds

= σ² ∫_a^b f'(s) {∫_a^s g'(t) min(s − a, t − a) dt + ∫_s^b g'(t) min(s − a, t − a) dt} ds

= σ² ∫_a^b f'(s) {∫_a^s (t − a) g'(t) dt + (s − a) ∫_s^b g'(t) dt} ds   (A3-8)

In the second step, we have used the property:

E[β(t)β(s)] = σ² min(t, s)

Integrating the inner integral by parts, we get

∫_a^s (t − a) g'(t) dt + (s − a) ∫_s^b g'(t) dt = (t − a)g(t)|_a^s − ∫_a^s g(t) dt + (s − a)(g(b) − g(s))

= (s − a)g(b) − ∫_a^s g(t) dt

= ∫_a^s (g(b) − g(t)) dt   (A3-9a)

Let

β₁(t) = β(t) − β(a)

Using Eqs. (A3-8) and (A3-9a), the left side of Eq. (A3-7) becomes

E[∫_a^b f'(t)β₁(t) dt · ∫_a^b g'(t)β₁(t) dt]

= σ² ∫_a^b f'(s) {∫_a^s (g(b) − g(t)) dt} ds

= σ² ∫_a^b ∫_a^s (g(b) − g(t)) f'(s) dt ds

= σ² ∫_a^b (g(b) − g(t)) (∫_t^b f'(s) ds) dt

= σ² ∫_a^b (g(b) − g(t))(f(b) − f(t)) dt   (A3-9b)

In the next-to-last step we have interchanged the order of integration. See Figure A3-1; the area of integration is shaded. Next we remark that

Figure A3-1. Region of integration (shaded area between t = s and the interval [a, b]).

E[(β(b) − β(a)) ∫_a^b f'(t)(β(t) − β(a)) dt]

= ∫_a^b f'(t) E[(β(b) − β(a))(β(t) − β(a))] dt

= ∫_a^b f'(t) E[(β(b) − β(t) + β(t) − β(a))(β(t) − β(a))] dt

= ∫_a^b f'(t) E[(β(t) − β(a))(β(t) − β(a))] dt   (A3-10)

= ∫_a^b f'(t) σ²(t − a) dt

= σ² [(b − a)f(b) − ∫_a^b f(t) dt]   (A3-11a)

Since (β(b) − β(t)) and (β(t) − β(a)) are uncorrelated, the expectation of their product is zero. This property is used in deriving Eq. (A3-10). Further we note that

E[β²(t)] = E[(β(t) − β(a) + β(a))²] = E[(β(t) − β(a))²] + E[β²(a)]

E[(β(t) − β(a))²] = E[β²(t)] − E[β²(a)] = σ²t − σ²a = σ²(t − a),  t > a   (A3-11b)

Similarly

E[(β(b) − β(a)) ∫_a^b g'(t)(β(t) − β(a)) dt] = σ² [(b − a)g(b) − ∫_a^b g(t) dt]   (A3-12)

We observe further that

E[f(b)(β(b) − β(a)) g(b)(β(b) − β(a))]

= f(b)g(b) E[(β(b) − β(a))²] = f(b)g(b) σ²(b − a)   (A3-13)

Finally, using Eqs. (A3-5) and (A3-6), we get

E[∫_a^b f(t) dβ(t) · ∫_a^b g(t) dβ(t)]

= E[{f(b)(β(b) − β(a)) − ∫_a^b f'(t)β₁(t) dt} · {g(b)(β(b) − β(a)) − ∫_a^b g'(t)β₁(t) dt}]

= E[f(b)g(b)(β(b) − β(a))² − f(b)(β(b) − β(a)) ∫_a^b g'(t)β₁(t) dt − g(b)(β(b) − β(a)) ∫_a^b f'(t)β₁(t) dt + ∫_a^b f'(t)β₁(t) dt · ∫_a^b g'(t)β₁(t) dt]   (A3-14)

where β₁(t) = β(t) − β(a).

Substitution of Eqs. (A3-13), (A3-12), (A3-11a), and (A3-9b) in Eq. (A3-14) gives

E[∫_a^b f(t) dβ(t) · ∫_a^b g(t) dβ(t)]

= f(b)g(b)σ²(b − a) − σ²f(b)[(b − a)g(b) − ∫_a^b g(t) dt] − σ²g(b)[(b − a)f(b) − ∫_a^b f(t) dt] + σ² ∫_a^b (g(b) − g(t))(f(b) − f(t)) dt

= f(b)g(b)σ²(b − a) − σ²f(b)[(b − a)g(b) − ∫_a^b g(t) dt] − σ²g(b)[(b − a)f(b) − ∫_a^b f(t) dt] + σ² {g(b)f(b)(b − a) − g(b)∫_a^b f(t) dt − f(b)∫_a^b g(t) dt + ∫_a^b g(t)f(t) dt}

= σ² ∫_a^b g(t)f(t) dt   (A3-15a)

When f(t) = g(t) in (A3-4), we get

E[(∫_a^b f(t) dβ(t))²] = σ² ∫_a^b f²(t) dt   (A3-15b)

Therefore we remark that

I = ∫_a^b f(t) dβ(t)   (A3-16)

is a Gaussian random variable with mean

E[I] = ∫_a^b f(t) E(dβ(t)) = 0   (A3-17)

and variance

E(I²) = E[(∫_a^b f(t) dβ(t))²]

= σ² ∫_a^b f²(t) dt   (A3-18)

provided that f²(t) is integrable in [a, b]. It can be shown that if f(t) and g(t) are continuously differentiable functions in (−∞, b] and (−∞, c], f(t)g(t) is integrable in (−∞, min(b, c)], and β(t) is a Wiener process with variance σ²t, then

E[∫_{-∞}^{b} f(t) dβ(t) ∫_{-∞}^{c} g(t) dβ(t)] = σ² ∫_{-∞}^{min(b,c)} f(t) g(t) dt

EXAMPLE 1. Let x(t), t ≥ 0, be given by

x(t) = ∫_{-∞}^{t} e^{−a(t−u)} dβ(u),  a > 0

Find the mean and covariance of x(t).

Solution. The mean is

E[x(t)] = ∫_{-∞}^{t} e^{−a(t−u)} E(dβ(u)) = 0

The covariance is

E[x(t)x(s)] = (σ²/(2a)) e^{−a(t+s)} e^{2a min(t,s)} = (σ²/(2a)) e^{−a|t−s|}
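The process in Example 1 (the stationary Ornstein–Uhlenbeck form of the Wiener integral) can be simulated and its covariance checked against the result above. The sketch below is illustrative only, assuming Python with NumPy; it uses the exact one-step recursion for this process with arbitrarily chosen parameters.

```python
import numpy as np

rng = np.random.default_rng(5)
a, sigma = 1.0, 1.0
dt, n_steps, n_paths = 0.01, 100, 5000     # propagate over tau = 1.0

# Start each path in the stationary distribution, variance sigma^2/(2a)
x = rng.normal(0.0, sigma / np.sqrt(2 * a), size=n_paths)
x0 = x.copy()
phi = np.exp(-a * dt)                                   # one-step decay
q = sigma**2 * (1 - np.exp(-2 * a * dt)) / (2 * a)      # one-step noise variance
for _ in range(n_steps):
    x = phi * x + rng.normal(0.0, np.sqrt(q), size=n_paths)

tau = n_steps * dt
print(np.var(x))                                        # ~ sigma^2/(2a)
print(np.mean(x * x0), sigma**2 / (2 * a) * np.exp(-a * tau))  # covariance check
```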

APPLICATION TO STOCHASTIC DIFFERENTIAL EQUATIONS

Consider a first order linear stochastic differential equation of Langevin type given by

dx/dt + ax(t) = n(t)   (A3-19)

where

n(t) = dβ/dt = lim_{ε→0} [β(t + ε) − β(t)]/ε   (A3-20)

and β(t) is a standard Wiener process, σ² = 1. Equation (A3-19) can be written as

dx(t) + ax(t) dt = dβ(t),  x(t₀) = x₀   (A3-21)

Multiplying both sides of Eq. (A3-21) by e^{at}, the resultant equation is

e^{at}[dx(t) + ax(t) dt] = e^{at} dβ(t)

or

d[e^{at} x(t)] = e^{at} dβ(t)   (A3-22)

Integrating both sides of Eq. (A3-22), we get

x(t)e^{at} − x(t₀)e^{at₀} = ∫_{t₀}^{t} e^{au} dβ(u)

Simplification yields

x(t) = x(t₀) e^{−a(t−t₀)} + ∫_{t₀}^{t} e^{−a(t−u)} dβ(u)   (A3-23)

The right-side integral of Eq. (A3-23) is defined in the sense of (A3-3), and is also called the Wiener integral. Let us assume that x(t₀) = x₀ is independent of β(t). Then x(t) is a Gaussian process with mean

E[x(t)] = E[x₀] e^{−a(t−t₀)}

Let y(t) = x(t) − E[x(t)]; then

E[y(t)] = 0

E[y(t)y(s)] = E[∫_{t₀}^{t} e^{−a(t−u)} dβ(u) ∫_{t₀}^{s} e^{−a(s−v)} dβ(v)]

= e^{−a(t+s)} ∫_{t₀}^{min(t,s)} e^{2au} du

= e^{−a(t+s)} [e^{2au}/(2a)]_{t₀}^{min(t,s)}

= (1/(2a)) e^{−a(t+s)} [e^{2a min(t,s)} − e^{2a t₀}]   (A3-24)

The variance of x(t) is given by Eq. (A3-24) with s = t.

Appendix 4 Hilbert Space

We consider a space H whose elements are real random functions X₁(t, ω), X₂(t, ω), X₃(t, ω), …, ω ∈ Ω, where Ω is the sample space. We assume that in this space all the elements X_i(t, ω) ≜ X_i(t), i = 1, 2, …, have finite second moments, i.e.,

E[X_i²(t)] < ∞,  i = 1, 2, …

We claim that the space H is a linear space. If X_i(t) ∈ H and X_j(t) ∈ H, then αX_i(t) + βX_j(t) is also in H. We define the inner product and norm in this space by the following relations:

(X_i, X_j) = E[X_i(t) X_j(t)]

‖X_i(t)‖² = E(X_i(t)²) ≥ 0

We assume the following axioms in space H.

A1. X_i(t) + X_j(t) = X_j(t) + X_i(t) (commutative law)
A2. (X_i(t) + X_j(t)) + X_k(t) = X_i(t) + (X_j(t) + X_k(t)) (associative law)
A3. There is a null element θ ∈ H such that

X_i(t) + θ = X_i(t) for all X_i(t) ∈ H

A4. α(X_i(t) + X_j(t)) = αX_i(t) + αX_j(t)
A5. (α + β)X_i(t) = αX_i(t) + βX_i(t) (distributive laws)
A6. (αβ)X_i(t) = α(βX_i(t))
A7. 0·X_i(t) = θ,  1·X_i(t) = X_i(t)

B1. ‖X_i(t)‖ ≥ 0 for all X_i(t) ∈ H, and if ‖X_i(t)‖ = 0 then X_i(t) = θ.
B2. ‖X_i(t) + X_j(t)‖ ≤ ‖X_i(t)‖ + ‖X_j(t)‖ for all X_i(t) and X_j(t) in H.
B3. ‖αX_i(t)‖ = |α| ‖X_i(t)‖ for all scalars α and X_i(t) ∈ H.

C1. (X_i(t), X_j(t)) = (X_j(t), X_i(t))
C2. (X_i(t) + X_j(t), X_k(t)) = (X_i(t), X_k(t)) + (X_j(t), X_k(t))
C3. (λX_i(t), X_j(t)) = λ(X_i(t), X_j(t))
C4. (X_i(t), X_i(t)) ≥ 0; (X_i(t), X_i(t)) = 0 if X_i(t) = θ

598 HILBERT SPACE 599

H is called a linear vector space if it satisfies the axioms A1–A7. H is called a normed linear space if it satisfies the axioms A1–A7 and B1–B3. A sequence {X_n(t)} ∈ H is called a Cauchy sequence if E|X_n(t) − X_m(t)|² → 0 as n, m → ∞. If every Cauchy sequence has a limit lim_{n→∞} X_n(t) = X(t) in the mean square sense with X(t) ∈ H, then the space H is called complete. A complete normed linear space H is called a Banach space. If the complete space further satisfies the axioms C1–C4, then the space H is called a Hilbert space. We now consider a Hilbert space in which each element is a vector. We denote elements by {X_i(t)}, where X_i(t) = [X_{i1}(t), X_{i2}(t), …, X_{in}(t)]'; each vector has n elements. We define the inner product and norm in H as

(X_i(t), X_j(t)) = E[X_i'(t) X_j(t)]

‖X_i(t)‖² = E[X_i'(t) X_i(t)] ≥ 0

Two vectors X_i(t) ∈ H and X_j(t) ∈ H are said to be orthogonal if (X_i(t), X_j(t)) = 0. We denote this by X_i(t) ⊥ X_j(t), i ≠ j.

Lemma 1. If X ⊥ Y, then ‖X + Y‖² = ‖X‖² + ‖Y‖². Write X_i(t) = X and X_j(t) = Y for simplicity of notation.

Proof. ‖X + Y‖² = (X + Y, X + Y) = (X, X) + (X, Y) + (Y, X) + (Y, Y), using axioms A2 and C2. Since X and Y are orthogonal, (X, Y) = E(X'Y) = 0. From C1, (Y, X) = (X, Y) = 0. Hence

‖X + Y‖² = (X, X) + (Y, Y) = ‖X‖² + ‖Y‖²

This lemma is known as the Pythagorean theorem

Lemma 2. |(X, Y)| ≤ ‖X‖ ‖Y‖. Equality holds if X = λY or Y = θ, where θ is the zero vector.

Proof. By axiom B1,

(X − λY, X − λY) ≥ 0,

where λ ≠ 0 is a scalar quantity. We note that

(X − λY, X − λY) = (X, X) − λ(Y, X) − λ(X, Y) + λ²(Y, Y)

by using the axioms in B1–B3 and C1–C4. We choose

λ = (X, Y) ‖Y‖⁻²

(X − λY, X − λY) = (X, X) − |(X, Y)|² ‖Y‖⁻²

= (X, X) − |(X, Y)|²/(Y, Y) ≥ 0

Hence Lemma 2 follows. This inequality is known as the Cauchy-Schwarz inequality.

Lemma 3. ‖X + Y‖² + ‖X − Y‖² = 2‖X‖² + 2‖Y‖².

Proof.

‖X + Y‖² = (X + Y, X + Y) = (X, X) + 2(X, Y) + (Y, Y)   (A4-1)

‖X − Y‖² = (X − Y, X − Y)

= (X, X) − 2(X, Y) + (Y, Y)   (A4-2)

where we have used C1:

(X, Y) = (Y, X)   (A4-3)

Combining (A4-1) and (A4-2), we get

‖X + Y‖² + ‖X − Y‖² = 2(X, X) + 2(Y, Y) = 2(‖X‖² + ‖Y‖²)

Lemma 3 is known as the parallelogram law.

Definition: X_n converges to X if ‖X_n − X‖ → 0 as n → ∞.

Lemma 4. If X_n → X and Y_n → Y, then (X_n, Y_n) → (X, Y).

Proof. If X_n → X and Y_n → Y, then

‖X_n − X‖ → 0 as n → ∞

and

‖Y_n − Y‖ → 0 as n → ∞

Now HILBERT SPACE 601

(X, Y) = (X − X_n + X_n, Y − Y_n + Y_n) = (X − X_n, Y − Y_n) + (X − X_n, Y_n) + (X_n, Y − Y_n) + (X_n, Y_n)

But

|(X − X_n, Y_n)| ≤ ‖Y_n‖ ‖X − X_n‖ ≤ M₂ ‖X − X_n‖ → 0 as n → ∞

|(X_n, Y − Y_n)| ≤ ‖X_n‖ ‖Y − Y_n‖ ≤ M₁ ‖Y − Y_n‖ → 0 as n → ∞

|(X − X_n, Y − Y_n)| ≤ ‖X − X_n‖ ‖Y − Y_n‖ → 0 as n → ∞

using Lemma 2 for each of the above inequalities. Hence when n → ∞,

(X_n, Y_n) → (X, Y)

This lemma is known as the continuity of the norm. A linear manifold, L ⊂ H, is a subset of vectors in H such that for any X_i ∈ L and X_j ∈ L, α₁X_i + α₂X_j ∈ L for any pair of scalars α₁ and α₂. A set of vectors {X_i, i = 1, 2, …, n} is said to be linearly independent if, whenever c₁, c₂, …, c_n are such that

Σ_{i=1}^{n} c_i X_i = θ

then all {c_i} are zero. If the linearly independent vectors in H generate or span H, then the vectors are called a basis for H. In the n-dimensional Euclidean space Rⁿ, the basis vectors are

e₁ = [1, 0, …, 0]',  e₂ = [0, 1, …, 0]',  …,  e_n = [0, 0, …, 1]'

Any vector X ∈ Rⁿ is given by

X = Σ_{i=1}^{n} e_i x_i = (x₁, x₂, …, x_n)

where x_i is the ith component of the vector. If the basis vectors {X_i} are such that

(X_i, X_j) = 0,  i ≠ j

then the basis vectors are called orthogonal basis vectors. If the orthogonal basis vectors satisfy the condition

‖X_i‖ = 1 for all i

the basis vectors are called orthonormal basis vectors. If a smooth scalar function x(t) is a periodic function with period T and ∫_0^T |x(t)|² dt < ∞, then x(t) can be expressed by a Fourier expansion:

x(t) = Σ_{n=-∞}^{∞} c_n exp[i n 2π f₀ t]

where f₀ = 1/T and T is the period. The orthonormal basis vectors are

{x_n, n = 0, ±1, ±2, …},  where x_n = exp[i n 2π f₀ t]

If Y₁, Y₂, … are any basis vectors, then we can always construct orthonormal basis vectors V₁, V₂, … by using the Gram–Schmidt orthonormalization scheme. The recursive scheme is given by the following equations:

X₁ = Y₁

X₂ = Y₂ − a₂₁ X₁

where a₂₁ = (Y₂, X₁)/(X₁, X₁), and in general X_n = Y_n − Σ_{k=1}^{n−1} a_{nk} X_k with a_{nk} = (Y_n, X_k)/(X_k, X_k). We get the orthonormal basis by normalizing the X_n vectors:

V_n = X_n/‖X_n‖  for all n
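The recursion just described is easy to implement. The following is a minimal sketch assuming Python with NumPy, applied to ordinary Euclidean vectors (the three test vectors are an arbitrary linearly independent set); for random functions the inner products would be expectations instead of dot products.

```python
import numpy as np

def gram_schmidt(Y):
    """Orthonormalize the columns of Y by the Gram-Schmidt recursion."""
    V = []
    for y in Y.T:
        x = y.astype(float).copy()
        for v in V:
            x -= np.dot(v, y) * v      # subtract the projection onto each earlier V_k
        V.append(x / np.linalg.norm(x))
    return np.column_stack(V)

Y = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = gram_schmidt(Y)
print(np.allclose(V.T @ V, np.eye(3)))   # columns are orthonormal
```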

The vectors V₁, V₂, … form an orthonormal basis of the linear space spanned by the basis vectors Y₁, Y₂, …. A set of vectors r₁, r₂, …, r_n are called reciprocal basis or dual basis vectors if

(r_i, V_j) = δ_{ij} = { 1, i = j; 0, i ≠ j }

For any X ∈ H, a space spanned by orthonormal basis vectors, we can write

X = Σ_{i=0}^{∞} α_i V_i = Σ_{i=0}^{∞} (r_i, X) V_i

where {r_i} and {V_i} are dual basis vectors and {V_i} are orthonormal basis vectors. Let L be a subset of H and M₀ be the set of finite linear combinations of elements in L, i.e., M₀ = {Y: Y = Σ_{i=1}^{n} a_i X_i, X_i ∈ L}. M₀ is called the linear manifold spanned by L. To get M, we add to M₀ its limiting points in H so that the new set M is complete. M is called a subspace of H. M is closed with respect to addition and scalar multiplication, and M contains all its limiting points. Let L₁ and L₂ be two subsets of H. If every element of L₁ is orthogonal to every element of L₂, then we say L₁ is orthogonal to L₂, and we write L₁ ⊥ L₂. If M₁ and M₂ are two subspaces generated by L₁ and L₂, respectively, then M₁ and M₂ are orthogonal. It is known from elementary geometry that the shortest distance from a point to a straight line is the perpendicular distance. Similarly, the shortest distance (vector) from a point to a subspace is orthogonal to the subspace. Let X ∈ H, the Hilbert space, and let M be a subspace of H such that X is not in M. The problem is to find the vector m ∈ M such that ‖X − m‖ is minimum. If m₀ minimizes the norm ‖X − m‖, then m₀ is called the projection of X on M. As shown in Fig. A4-1, (X − m₀) is the perpendicular from X to M.

Fig. A4-1. Geometrical interpretation of the projection theorem.

Projection Theorem. Let M be a subspace of H, and X belong to H but not in M. Then X can be represented by

X = m₀ + Z   (A4-4)

where m₀ ∈ M and Z ∈ M^⊥, the set perpendicular to M. Further,

‖X − m₀‖ ≤ ‖X − m‖   (A4-5)

for all m ∈ M, where equality holds if m₀ = m.

Proof. (a) Existence. We will prove first that there exists m₀ ∈ M which yields the minimum norm (mean square error). Denote

e_m = min_{m∈M} ‖X − m‖

M^⊥ = {m: m ⊥ M, m ∈ H}

Let there be a sequence m_n ∈ M such that ‖m_n − X‖ → e_m as n → ∞. Suppose e_m = 0. Then there exists a sequence of vectors m_n ∈ M such that ‖X − m_n‖ → 0 as n → ∞. This implies m_n → X. Since M is complete, X ∈ M. But this contradicts the hypothesis that X is not in M. Note that e_m is never negative because the norm (power) is always nonnegative. Hence

lim_{n→∞} ‖X − m_n‖ → min_{m∈M} ‖X − m‖ ≥ 0   (A4-6)

we form the identity:

‖(m_i − X) + (X − m_j)‖² + ‖(m_i − X) − (X − m_j)‖² = 2‖m_i − X‖² + 2‖m_j − X‖²   (A4-7a)

using the parallelogram law (Lemma 3). Rearranging (A4-7a), we get

‖(m_i − X) + (X − m_j)‖² = 2‖m_i − X‖² + 2‖m_j − X‖² − ‖(m_i − X) − (X − m_j)‖²   (A4-7b)

On simplification of (A4-7b), we obtain

‖m_i − m_j‖² = 2‖m_i − X‖² + 2‖m_j − X‖² − ‖m_i + m_j − 2X‖²
            = 2‖m_i − X‖² + 2‖m_j − X‖² − 4‖(m_i + m_j)/2 − X‖²   (A4-8)

Since m_i, m_j ∈ M and M is a subspace, (m_i + m_j)/2 ∈ M and

‖(m_i + m_j)/2 − X‖ ≥ e_m   (A4-9a)

Fix ε > 0; choose N such that for all i, j > N,

‖m_i − X‖² ≤ e_m² + ε/4,  ‖m_j − X‖² ≤ e_m² + ε/4   (A4-9b)

Equations (A4-8), (A4-9a), and (A4-9b) yield

‖m_i − m_j‖² ≤ 2(ε/4 + e_m²) + 2(ε/4 + e_m²) − 4e_m² = ε

Hence {m_i} is a Cauchy sequence. Therefore there exists a vector m₀ ∈ H such that ‖m_i − m₀‖ → 0 as i → ∞. Also M is a closed subspace, so m₀ ∈ M. By Lemma 4, ‖m₀ − X‖ = e_m. This establishes the existence of m₀ such that

min_{m∈M} ‖X − m‖ = ‖X − m₀‖ = e_m

(b) Orthogonality. Let us denote

Z = X − m₀

Suppose Z is not in M^⊥, i.e., Z is not orthogonal to M. In this case

(Z, m₁) = r ≠ 0

for some m₁ ∈ M. Since m₀ ∈ M and m₁ ∈ M, then m₀ + λm₁ ∈ M. Thus

‖Z − λm₁‖ = ‖X − (m₀ + λm₁)‖ ≥ e_m   (A4-9c)

‖Z‖ = ‖X − m₀‖ = e_m   (A4-10)

Hence, from Eqs. (A4-9c) and (A4-10), we get

‖Z − λm₁‖² ≥ ‖Z‖²   (A4-11)

Further,

‖X − m₀ − λm₁‖² = ‖Z − λm₁‖² = (Z − λm₁, Z − λm₁) = ‖Z‖² − 2λ(Z, m₁) + λ²‖m₁‖² = ‖Z‖² − 2λr + λ²‖m₁‖²   (A4-12)

Since λ is arbitrary, we can find λ (for example λ = r/‖m₁‖²) such that

‖Z − λm₁‖² < ‖Z‖²

when r ≠ 0, which contradicts (A4-11). Hence Z = (X − m₀) ∈ M^⊥.

(c) Uniqueness. Denote

X = m₁ + Z₁,  X = m₂ + Z₂

We show the representation is unique, i.e., m₁ = m₂ and Z₁ = Z₂. Since m₁ ∈ M and m₂ ∈ M, then m₁ − m₂ ∈ M. Z₁ ∈ M^⊥ and Z₂ ∈ M^⊥, so Z₁ − Z₂ ∈ M^⊥ and (m₁ − m₂) ⊥ (Z₁ − Z₂). Also

m₁ − m₂ = Z₂ − Z₁

‖m₁ − m₂‖² = (m₁ − m₂, m₁ − m₂) = (m₁ − m₂, Z₂ − Z₁) = 0

m₁ − m₂ = Z₂ − Z₁ = θ

Hence the representation is unique. Finally we note that (X − m₀) ⊥ M. Hence

‖X − m‖² = ‖X − m₀ + m₀ − m‖² = ‖X − m₀‖² + ‖m₀ − m‖² ≥ ‖X − m₀‖²

by using the Pythagorean theorem (Lemma 1) in the second step. Equality occurs if m = m₀. This completes the proof.

Let {Y(0), …, Y(n − 1)} be n observations, and let M be the linear manifold spanned by the {Y(j)}. X(n) is not in M, but the projection of X(n) is in M. Denote the projection of X(n) by X̂(n). Let

X̂(n) = Σ_{i=0}^{n−1} a_i Y(i)   (A4-13)

where the {a_i} are constants. The projection theorem states that (X(n) − X̂(n)) is orthogonal to the space of observations, M: (X(n) − X̂(n)) ⊥ M. Hence (X(n) − X̂(n)) is orthogonal to all the elements in M. Therefore

E[(X(n) − X̂(n))' Y(j)] = 0,  0 ≤ j ≤ n − 1   (A4-14)

Simplification of (A4-14) gives

E(X̂'(n) Y(j)) = E(X'(n) Y(j)),  0 ≤ j ≤ n − 1   (A4-15)

If {X(n)} is a stationary random process with mean zero and finite covariance, (A4-15) yields

Σ_{i=0}^{n−1} a_i R_YY(i − j) = R_XY(n − j),  0 ≤ j ≤ n − 1   (A4-16)
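The normal equations (A4-16) can be set up and solved numerically for a simple one-step prediction problem. The sketch below is illustrative only, assuming Python with NumPy; the first-order autoregressive data and the three-coefficient predictor are arbitrary choices, and the lag covariances are estimated from the data.

```python
import numpy as np

rng = np.random.default_rng(6)
# Synthetic stationary data: a first-order autoregression y(k) = 0.8 y(k-1) + noise
N, phi = 100_000, 0.8
y = np.zeros(N)
for k in range(1, N):
    y[k] = phi * y[k - 1] + rng.standard_normal()

# One-step predictor y_hat(n) = sum_i a_i y(n - 1 - i), coefficients from (A4-16)
p = 3
R = np.array([np.mean(y[: N - k] * y[k:]) for k in range(p)])          # R_yy(0..p-1)
Ryy = np.array([[R[abs(i - j)] for j in range(p)] for i in range(p)])  # covariance matrix
rxy = np.array([np.mean(y[k + 1:] * y[: N - k - 1]) for k in range(p)])  # R_yy(1..p)
a = np.linalg.solve(Ryy, rxy)
print(a)          # close to [0.8, 0, 0] for this AR(1) process
```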

The projection of X(n) on M yields the minimum mean square error by the projection theorem. Hence the orthogonality principle gives the minimum mean square error. Equation (A4-16) is known as the normal equation. The reader is referred to Royden (1968) and Yosida (1972) for further study of Hilbert spaces. Luenberger (1969) deals with projection theorems and optimization.

Bibliography

Abramowitz, M. and 1. Stegun (1965), Handbook of Mathematical Functions, Dover Publica• tions, N.Y. Akaike, H. (1974) A new look at the Statistical Model Identification, IEEE Trans. Automatic Control AC-19, 667-673. Alder, Berno, editor (1973), Methods in Computational Physics, Vol. 13, Academic Press, N.Y. Anderson, Brian D. O. and John B. Moore (1979), Optimal Filtering, Prentice-Hall Inc. Asefi, T. (1978), Stochastic Process, Estimation Theory and Image Enhancement, JPL Publica• tion, Pasedena, California. Astrom, Karl J. (1970), Introduction to Stochastic , Academic Press. N.Y. Balakrishnan, A. V., editor (1966), Advances in Communication Systems, Volume 2, Academic Press, N.Y. ---, editor (1968), Communication Theory, McGraw-Hili Inter-University Electronic Series. Bartlett, M. S., (1978) An Introduction to Stochastic Processes, Cambridge University Press, Cambridge, U.K. Battin, R. (1964), Astronomical Guidance, McGraw Hill Book Co., N.Y. Bellman, Richard E. (1967) Introduction to Mathematical Theory of Control Processes, Linear Equations and Quadratic Criteria, Vol. I, Academic Press, N.Y. ---(1970), Introduction to Matrices, McGaw-Hili Book Co. New York, N.Y. ---, and S. Dryfus (1962), Applied Dynamic Programming, Princeton University, Princeton, N.J. --- and R. E. Kalaba (1965), Quasilinearization and Nonlinear Boundary-Value Problems, Volume 3, American Elsevier Publishing Company Inc. N.Y. Bendat, Julius S. and Allan G. Piersol (1971), Random Data: Analysis and Measurement Proce• dures, Wiley-Interscience Inc., N.Y. ---(1977), Principles and Applications of Random Noise Theory, Robert E. Krieger Publish• ing Company, Huntingon, N.Y. Bendat, Julius S. and A. G. Piersol, (1980), Engineering Aspects of Correlation and Spectral Analysis, John Wiley, N.Y. Bhat, V. Narayan (1972), Elements of Applied Stochastic Processes, John Wiley and Sons, Inc. New York. Bierman, Gerald J. (1973), A Comparison of Discrete Linear Filtering, Algorithms, IEEE Trans• actions on Aerospace and Electronic System~;-Vol. AES-9, No.1, 28-37. ---(1977), Factorization Methods for Discrete Sequential Estimation, Academic Press, N.Y. ---, and Catherine L. Thornton (1977), Numerical Comparison of Kalman Filter Al- gorithms: Orbit Determination Case Study, Automatica Vol. 13,23-35, Pergamon Press. Blackman, Nelson M. (1966), Noise and its Effect on Communication, McGraw Hill Book Company. N.Y. Blackman, R. B. and J. W. Tukey (1958), The Measurement of Power Spectra. Dover Publica• tions, Inc. N.Y. Blake, 1. F. and W. C. Lindsey (1973), Level Crossing problems for Random Processes, IEEE Transaction on , Vol. 17-19, No.3 295-315. Breiman, L. (1968), Probability, Addison-Wesley Book Co., Reading Mass. Breipohl, Arthur M. (1970), Probabilistic Systems Analysis, An Introduction to Probabilistic Models, Decisions, and Applications of Random Processes, John Wiley and Sons, Inc. N.Y.


Brillinger, David R. (1981) Time Series, Data Analysis and Theory, Holden-Day, Inc., San Francisco, CA. Brighham, E. O. and R. R. Morrow (1967), The Fast Fourier Transform, IEEE Spectrun, 63-70. Box, George E. P. and Gwilym M. Jenkins (1976), Times Series Analysis Forecasting and Control, Holden-Day Inc., San Francisco, CA. Brockett, R. W. (1970), Finite Dimensional Linear Systems, John Wiley & Sons. Inc., N.Y. Brogan, W. L. (1974), Modern Control Theory, Quantum Publishers, Inc., N.Y. Bronez, T. P. and J. Cadzow (1983), An Algebraic Approach to Superresolution Array Process• ing, IEEE Transaction on Aerospace & Electronics, Vol. AGSI9, 1, 123-133. Brown, R. G. (1983), Introductions to Signal Analysis and Kalman Filtering, John Wiley & Son Inc. N.Y. Bryson, Jr. Arthur E. and Yu-Chi-Ho (1975), Applied Optimal Control Optimization Estimation and Control, Hemisphere Publishing Corp., Washington, D.C. Bucy, Richard S. and R. D. Joseph (1968), Filteringfor Stochastic Processes with Applications to Guidance, Interscience Publishers. John Wiley and Sons, Inc., N.Y. Burg, J. P. (1967), Maximum Entropy Spectral Analysis, 37th Annual International S.E.G., Meeting, Oklahoma City, OK. Cadzow, James A. (1973), Discrete Time Systems, Prentice-Hall, Inc., Englewood Cliffs, N.J. --- (1982), ARMA Modeling of Time Series, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-4, No.2, 124-128. Capon, J. (1969), High Resolution Frequency Wave Number Spectrum Analysis, Proc. IEEE, 57,1408-1418. Carlson, Neal A (1973), Fast Triangular Formulation of the Square Root Filter, AIAA Journal, Vol. 11, No.9, 1259-1265. Chan, Y. T. and J. M. M. Lavoie and J. B. Plant (1981), A Parameter Estimation Approach to Estimation of Frequencies· of Sinusoids, IEEE Transaction on Acoustics, Speech and , Vol. ASSP-29, No.2, 214-219. Chatfield, C. (1980), The Analysis of Time Series: An Introduction, Second Edition, Chapean and, Hall Ltd., London. Childers, Donald G. Editor (1978), Modern Spectrum Analysis, IEEE Press, selected reprint series. Chung, K. L. (1968), Probability, Harcourt, Brace Jovanovich, N.Y. Claerbout, J. F. (1976), Fundamentals of Geophysical Data Processing, McGraw Hill Book Co., N.Y. Conover, W. J. (1980), Practical Nonparametric Statistics, John Wiley and Sons, N.Y. Cooper, George R. and Clare D. McGillen (1967), Methods of Signal and System Analysis, Holt, Rinehart and Winston, Inc., N.Y. Cramer, Harold and M. R. Leadbetter (1967), Stationary and Related Stochastic Processes, John Wiley and Sons, Inc., N.Y. Cruz, Jose B. and M. E. VanValkenbur (1974), Signals in Linear Circuits, Houghton Mifflin Company, Boston, Mass. Dahlquist, B., A. Byorch and N. Anderson (1974), Numerical Methods, Prentice Hall Inc. Englewood Cliffs, N.J. Davenport, Wilbur B. Jr., and William Root (1958), An Introduction to the Theory of Random Signals and Noise, McGraw-Hill Book Company, N.Y. ---(1970), Probability and Random Processes, McGraw-Hill Book Company, N.Y. Davis, M. H. A. (1977), Linear Estimation and Stochastic Control, Chapman and Hall LTD, A Halsted Press Book, John Wiley and Sons, N.Y. Desoer, C. A. (1970), A Second Course on Linear Systems, Van Nostrand Reinhold Co., N.Y. BIBLIOGRAPHY 611

DiFranco, J. V. and W. L. Rubin (1968), Radar Detection, Prentice-Hall, Inc. Englewood Cliffs, N.J. Dillard, G. M. and C. E. Antoniak (1970), A Practical Distribution-Force Detection Proceedure for Multiple Range-Bin Radars, IEEE Trans on Aerospace and Electronic Systems, Vol. AES6, No.5, 629-635. Doob, J. L. (1953), Stochastic Processes, John Wiley and Sons, Inc., N.Y. Duda, Richard O. and Peter E. Hart (1973), Pattern Classification and Scene Analysis, Wiley• Interscience, N.Y. Durbin, J. (1960), The Filtering of time-series models, Rev Inst. Internat. Statistics, 28, 233-244. Edward, J. A. and M. M. Fitelson (1973), Notes on Maximum-Entropy Processing, IEEE Transactions on Information Theory, 232-233. Eghbali, H. J. (1979), K-S Test for Detecting Changes from Landsat Imagery Data, IEEE Trans. on Systems, Man and Cybernatics, Vol. SMC-9, No. I, 17-23. Eykhoff, Pieter Editor, (1981), Trends and Progress on System Identification, Pergamon Press, Inc., Maxwell House, Fairview Park, N.Y. Faddeeva, V. N. (1959), Computational Methods of Linear Algebra, Dover Publications, Inc. N.Y. Feller, William (1966), An Introduction to Probability Theory and Its Applications, Vol. II John Wiley and Sons, Inc., N.Y. ---(1968), An Introduction to Probability Theory and Its Applications, Vol. I John Wiley and Sons, Inc., N.Y. Ferguson, T. S. (1967), Mathematical Statistics, Academic Press, N.Y. Fleming, Wendell H. and Raymond W. Rishel (1975), Deterministic and Stochastic Optimal Control, Springer-Verlag, N.Y. Frank, L. 1969, Signal Theory, Prentice-Hall, Englewood Cliffs, N.J. Franklin, Bener, F. and David J. Powell (1980), Digital Control of Dynamic Systems, Addison• Wesley Publishing Co., Reading, Mass. Fraser, Donald C. and James E. Potter (1969), The Optimum Linear Smoother as a Combina• tion of two Optimum Linear Filters, IEEE Transactions on Automatic Control, 387-390. Friedlander, Benjamin (1982), Lattice Filters for Adaptive Processing, Proceedings of the IEEE, Vol. 70, No.8, 829-867. ---(1983), Instrumental Variable Methods for ARMA Spectral Estimation, IEEE Transac• tions on Acoustics; Speech, and Signal Processing, Vol. ASSP-31, No.2, 404-415. Fukunaga, Keinosuke (1972), Introduction to Statistical Pattern Recognition, Academic Press, N.Y. Gagliardi, Robert M. (1978), Introduction to Communications Engineering, Wiley-Interscience, N.Y. Gallager, Robert G. (1968), Information Theory and Reliable Communication, John Wiley and Sons Inc. Gelb, Arthur Editor (1974), Applied Optimal Estimation, The M.LT. Press, Cambridge, MA. Gersch, Will and Douglas Foutch (1974), Least Squares Estimates of Structural System Param• eters Using Covariance Function Data, IEEE Transactions on Automatic Control, Vol. AC- 19, No.6, 898-903. ---, and D. R. Sharpe (1973), Estimation of Power Spectra with Finite-order Autoregressive Models, IEEE Trans on Automatic Control, Vol. ACI8, 8. 367-369. Gibson, Jerry D., and James L. Melsa (1975), Introduction to Nonparametric Detection with Applications, Academic Press, N.Y. Gold, Bernard and Charles Rader (1969), Digital Processing of Signals, McGraw Hill Book Company N.Y. Goodman, J. W. (1968), Introduction to Fourier Optics, McGraw Hill Book Co., N.Y. 612 BIBLIOGRAPHY

Gradshteyn,1. S. and 1. M. Ryzhik (1965), Table of Integrals, Series, and Products, Translation edited by Alan Jeffrey, Academic Press, N.Y. Graybill F. A. (1969), Introduction to Matrices with Application to Statistics, Wadsworth Pub• lishing Co., Belmount, CA. Grenander, Ulf, and Gabor Szego (1969), Toeplitz Forms and Their Applications, University of California Press, Berkley and Los Angeles. Hancock, John C. and Wintz, Paul A. (1966) Signal Detection Theory, McGraw Hill N.Y. Hansen, V. G. and B. A. Olsen, (1971) Nonparametric Radar Extraction Using a Generalized Sign Test, IEEE Trans. on AES, Vol. AE7, No.5, 942-950. Harris, C. J. (1976), Problems in System Identification and Control, The Institute of Mathematics and Its Applications, May, 139-150. ---, F. J. (1978) On the Use of Windows for Harmonic Analysis, IEEE, Vol. 66, 51-83. Hassab, Joseph C. and Ronald E. Baucherm (1979), Optimum Estimation of Time Delay by a Generalized Correlator, IEEE Transactions on Acoustics, Speech, and Signal Processing Vol. ASSP-27. No.4 373-379. Haykin, Simon, Editor (1976), Detection and Estimation Applications to Radar, Dowden Hut• chinson and Ross, Inc., Stroudsburg, Penn. ---, Editor (1979) Nonlinear Methods of Spectral Analysis, Springer-Verlag, Berlin, Heidel- berg, N.Y. ---, and J. Cadzow Editors (1982) Proc IEEE, August. Helstrom, Carl W. (1968), Statistical Theory of Signal Detection, Pergamon Press N.Y. Hildebrand, F. B. (1956), Introduction to Numerical Analysis, McGraw-Hill Book Company, Inc. Ho, Yu-Chi and Ashok Agarwal (1968), On Pattern Classification Algorithms, Introduction and Survey, IEEE Transactions on Automatic Control, Vol. AC-13, No.6, 676-689. Hoe!, Paul G., Sidney C. Port and Charles J. Stone (1972), Introduction to Stochastic Processes, Houghton Mifflin Company, Boston, Mass. Hollander, Miles and Douglas A. Wolf (1973), Nonparametric Statistical Methods, John Wiley and Sons, N.Y. Isermann, R. U. Bauro, W. Bamberger, P. Kneppo and H. Siebert (1974), Comparison of Six on-line Identification and Parameter Estimation Methods, Automatica, Vol. 10, 81-103, Pergamon Press. Jazwinski, Andrew H. (1970), Stochastic Processes and Filtering Theory, Academic Press, N.Y. Jenkins, Gwilym M. and Donald G. Watts (1968), Spectral Analysis and its Applications, Holden-Day, San Francisco, CA. Kagiwada, Harriet and Robert Kalaba (1974), Integral Equations via Imbedding Methods, Addison-Wesley, Reading, Mass. Kailath, Thomas (1968), An Innovations Approach to Least-Squares Estimation Part I: Linear Filtering in Additive White Noise, IEEE Transactions on Automatic Control, Vol. AC-13, No.6., 646-660. --- (1969a) Fredholom Resolvents, Wiener-HolpfEquation and Ricati-Differential Equa• tions, IEEE Transaction on Information Theory, Vol. I T-15, No.6, 665-672. --- (1969b), A General Likelihood-ratio Formula for Random Signals in Gaussian Noise, IEEE Transaction on Information Theory, Vol. I T-15, 350-361. ---(1970), The Innovations Approach to Detection and Estimation Theory, Proceedings of the IEEE, Vol. 58 No.5., 680-695. --- (1972), A Note on Least Squares Estimation by the Innovations Method, SlAMS Control, Vol. 10(3).477-486. --- (1974), A view of Three Decades of Linear Filtering, IEEE Trans. on Information Theory, IT 20, 2146-181. ---(1980), Linear Systems, Prentice Hall, Englewood Cliffs, N.J. BIBLIOGRAPHY 613

Kallianpur, Gopinath (1980), Stochastic Filtering Theory, Springer-Verlag, N.Y. Kalman, R. E. and R. S. Bucy (1961), New results in Linear Filtering and Prediction, J. Basic Eng, Trans ASME, 83, 3, 95-108. Kanwal, Ram P. (1971), Linear Integral Equations: Theory and Technique, Academic Press N.Y. Karlin, Samuel and Howard M. Taylor (1975), A First Course in Stochastic Processes, Academic Press. N.Y. Kassam, S. A., and J. B. Thomas (1975), A Class of Nonparametric Detectors for Dependent Input Data, IEEE Trans on Information Theory, Vol. IT-21. No.4, 431-437. Kay·, Steven M. and Stanley Lawrence Marple (1981), Spectrum Analysis-A Modern Perspec• tive, Proceedings of the IEEE, Vol. 69, No. 11, 1380-1419. Kazakos, P. Papantoni and Demitri Kazakos Editors (1977), Nonparametric Methods in Com• munications, Marcel Dekker Inc., N.Y. Kendall, Sir Maurice (1976), Time-Series, 2nd edition, Hafner Press, A Dddivision of MacMillian Publishing Co., Inc. N.Y. Kennedy, R. (1968), Communication through Dispersive Channel, McGraw Hill, N.Y. Kleinnock, Leonard (1975), Queueing Systems, Volumes I and II, Wiley-Interscience, N.Y. Koopmans, L. H. (1974), The Spectral Analysis of Time Series, Academic Press Inc., N.Y. Kreyszig, E. (1979), Advanced Engineering Mathematics, John Wiley & Sons, N.Y. Kumarsan, Ramdas and Donald W. Tufts (1983), Estimating the Angles of Arrival of Multiple Plane Waves, IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-19, No.1, 134-369. Kushner, H. J. (1969), Stochastic Control and Stability, Academic Press, N.Y. ---(1971), Introduction to Stochastic Control, Holt, Rinehart and Winston, Inc., N.Y. --- (1976), Finite Difference Methods for Weak Solutions, J. Math Anal and Api., 55, 251-265. Landers, T. E. and R. T. LaCoss (1977), Some Geophysical Applications of Autoregressive Spectral Estimates, IEEE Trans. On Geoscience Electronics, Vol. GElS, 26-37. Lang, Stephen W. and James H. McClellan (1983), Spectral Estimation for Sensor Arrays, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-31, 349-358. Larimore, M. G. (1980), Convergence Studies of Thompson's Unbiased Adaptive Spectral Estimator, 14th Asilomar Con! on Circuits, Systems and Computation, Pilcifjc Grove, CA, 17-19. Larson, H. J. and B. O. Shubert (1979), Probabilistic Models in Engineering Sciences, Vol II: Random Noise, Signals and Dynamical Systems, John Wiley and Sons. N.Y. ---and ---(1979), Probabilistic Models in Engineering Sciences, Vols. I & II, John Wiley and Sons, N.Y. LaSalle, J and S. Lefschetz (1961), Stability by Liapunov Methods, Academic Press, N.Y. Laub, A. J. (1979), A Schur Method for solving Algebraic Ricati Equations, IEEE Trans. Automatic Control, Vol. 24, No.6, 913-921. Lawson, C. L. and R. J. Hanson (1974), Solving Least Squares Problems, Prentice-Hall, Englewood, N.J. Lee, Daniel T. L., Martin Morf and Benjamin Friedlander (1981), Recursive Least Squares Ladder Estimation Algorithms, IEEE Transactions on Acoustics, Speech, and Signal Process• ing, Vol. ASSP-29, No.3, pp. 627-641. Lehmann, E. L. (1975), Nonparameterics: Statistical Methods Based on Ranks, Holden Day, San Francisco, CA. Lighthill, M. J. (1959), Introduction to Fourier Analysis, Cambridge Univ. Press., Cambridge, U.K. Lindsey, W. C and M. K. Simon (1975), Telecommunication Systems Engineering, Prentice-Hall, Englewood Cliffs, N.J. ---and Heinrich Meyr (1977), Complete Statistical Description of the Phase-Error Process 614 BIBLIOGRAPHY

Generated by Correlative Tracking Systems, IEEE Transactions on Information Theory, Vol. IT. 23, No.2, 194-199. Liu, C. L. and Jane W. Liu (1975), Linear Systems Analysis, McGraw Hill Book Company, N.Y. Lloyd, Emlyn, Editor (1980), Handbook of Applicable Mathematics, Vol. II, Wiley-Interscience, N.Y. Luenberger, David G. (1969), Optimization by Vector Space Methods, John Wiley and Sons. Inc. N.Y. ---(1979), Introduction to Dynamic Systems, Theory, Models, and Applications, John Wiley and Sons, N.Y. Liung, Lennart and Keith Glover (1981), Frequency Domain Versus Time Domain Methods in Systems Identification, Automatica, Vol. 17., No. 1,71-86. Meditch, J. S. (1969), Stochastic Optimal Linear Estimation and Control, McGraw-Hill Book Company. N.Y. Makhoul, John (1975), Linear Prediction: A Tutorial Review, IEEE Proceedings Vol. 63, No.4., 561-579. --- (1977), Stable and Efficient Lattice Methods for Linear Prediction, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-25, 423-428, No.5. --- (1978), A Class of All-zero Lattice Digital Filters: Properties and Applications, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-26, No.4, 304-314. ---, and Cossell (1981), Adaptive Lattice Analysis of Speech, IEEE Transactions on Acous• tics Speech, and Signal Processing, Vol. ASSP-29, No.3, 654-659. Marple, Larry (1980), A New Autoregressive Spectrum Analysis Algorithm, IEEE Transac• tions on Acoustics, Speech, and Signal Processing, Vol. ASSP-28, 441-453. Maybeck, Peter S. (1982) Stochastic Models, Estimation, and Control, Vols. I and 2, Academic Press, N.Y. Meditch, J. S. (1969), Stochastic Optimal Linear Estimation and Control, McGraw-Hill Book Company, N.Y. Mehra, Raman K. (1971), On-line Identification of Linear Dynamic Systems with Applications to Kalman Filtering, IEEE Transactions on Automatic Control, Vol. AC-16, No.1, 12-21. ---(1971), A Comparison of Several Nonlinear Filters for Reentry Vehicle Tracking, IEEE Transactions on Automatic Control, Vol. AC-16, No.4. 307-319. ---, and Dimitri G. Lainiotis (1976), System Identification Advances and Case Studies, Academic Press, N.Y. Mendel, Jerry M. (1973) Discrete Techniques of Parameter Estimation, Marcel Dekker, Inc. N.Y. Meyer, Paul L. (1970), Introductory Probability and Statistical Applications. Addison-Wesley Reading, Mass. Meyer, Paul Andre (1972), Martingales and Stochastic Integrals, Springer-Verlag, N.Y. Meyer, Stuart L. (1975), Data Analysis for Scientists and Engineers, John Wiley & Sons Inc. N.Y. Miller, Kenneth S. (1974), Complex Stochastic Processes: An Introduction to Theory and Application, Addison-Welsley Publishing Company, Inc., Reading, Mass. Mohanty, N. C. (1971), Identificability of Laguerre Mixtures, IEEE Trans. on In! Theory. Vol. 8,514-515. ---(1973), m-ary Laguerre Detection, IEEE Trans. on Aerospace and Electronics, 464-467. ---(1974), Estimation of Delay in MPPM Signals in Optical Communications, IEEE Trans. on Communication, Vol. 22, 713-714. ---(1976), Error Probability in Binary Digital Communication in the Presence ofInterfer• ring Signals, IEEE Trans. on Aerospace and Electronics Systems, Vol. AES-12, 517-519. BIBLIOGRAPHY 615

--- and T. T. Soong (1977), Linear Filtering in Multiplication Noise, Information and Control. Vol. 34,141-147. --- (1977) Transition Density of Phase error in Nonlinear Tracking Systems, Information Sciences, 13,239-252. --- (1978), Spectrum Estimation of Non-Uniform Sampled Values, Int. Conf. on Acoustic, Speech and Signal Processing, Tulsa, Oklahoma. --- (1978), Laguerre and Poisson Distributions, Bulletin of Int. Statistical Institute, 346-349. --(1978), Radar Reflection from Randomly Moving Scatteres, Froc. IEEE, 66, 86-88. --(1978), Generation of Isotropic Random Field, SIAM J. Appl. Mathematics, Vol. 35, 358-361. ---(1979), Detection of Narrow band FM Signal, IEEE Trans on Communication, Vol. 27, 1806-1809. --- (1981), Terminal Homing Guidance Seeker, International Telemetering Cont. San Diego, CA. 1327-1349. ---(1981), Computer Tracking of Moving objects in Space, IEEE Trans on Pattern Analy• sis and Machine Intelligence, 3, 606-611. --- (1981), Adaptive Equalizer for Carrier Modulated Data Transmission Int. Information Theory Symp. Snata Monica, CA. --- (1982), Image Detection and Enhancement of Moving Ship in the Clultered Back• ground, IEEE Conference on Image Processing and Pattern Recognition. Las Vegas, Nerada. --- (1983), Autonomous High Altitude Naviation Satellittes, Information Sciences, 125-150. --(1984), Error Probability in Rice-Nakagarri Channel, Proc ofIEEE. Vol. 72, 129-130. Morf, Martin and Thomas Kailath (1975), Square-Root Algorithms for least-Squares Estima• tion, IEEE Transactions on Automatic Control, Vol. Ac-20, No.4., 487-497. ---, J. R. Dobbins, Benjamin Friedlander and Thomas Kailath (1979), Square-root Al• gorithms for Parallel processing in Optimal Estimation, Automatica Vol. 15,299-306. Mood, Alexander McFarlane and Franklin A. Graybill (1963), "Introduction to the Theory of Statistics," McGraw-Hill Book Company, Inc. New York. Moore, John T. (1968), "Elements of Linear Algebra and Matrix Theory," McGraw-Hill Book Company, N.Y. Nahi, N. E. (1969), Estimation Theory and Applications, John Wiley and Sons, Inc. N.Y. Nguyen, V. V. and E. F. Wood (1982), Review and unification of Linear Identificability Concepts, SIAM Review Vol. 26, 34-52. Noether, Gottfried (1967), Elements of Nonparametric Statistics, John Wiley and Sons, Inc. N.Y. Nugent, Sherwin T. and John P. Finley (1983), Spectral Analysis of Periodic and Normal Breathing in Infants, IEEE Transactions on Biomedical En-gineering Vol. BME-30 No. 10, 672-675. Nuttall, Albert H. (1981), Some Windows with very good Sidelobe Behavior, IEEE Transac• tions on Acoustics, Speech, and Signal Processing, Vol. ASSP-29, No. 84-91. Olkin, Ingram, Leon J. Glesser and Cyrus Derman (1980), Probability, Models and applica• tions, MacMillan Publishing Co., Inc. N.Y. Oppenheim, Alan V. and Ronald Scafer (1975), Digital Signal Processing, Prentice-Hall, Inc., Englewood, Cliffs, N.J. ---, Editor (1978) Applications of Digital Signal Processing, Prentice-Hall, Inc. Englewood, Cliffs, N.J. Ostle, Bernard and Richard Mensing (1975), Statistics in Research, The Iowa State University Press, Ames, Iowa. 616 BIBLIOGRAPHY

Otnes, Robert K. and Loren Enochson (1978), Applied Time Series Analysis, Vol. 1, Basic Techniques, Wiley-Interscience, N.Y. Papas, Thrasyvoulos, Alan J. Laub and Nils R. Sandell Jr. (1980), On the Numerical Solution of the Discrete-Time Algebraic Riccati Equation, IEEE Transactions on Automatic Control, V 0t AC-25, No.4, 631-641. Papoulis, Athanasios (1965), Probability, Random Variables, and Stochastic Processes, McGraw-Hill Book Company N.Y. ---(1977), Signal Analysis, McGraw-Hill Book Company, N.Y. --- (1981), Maximum Entropy and Spectral Estimation A Review, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-29, No.6, 1176-1186. Parzen, Emanuel (1962), Stochastic Processes, Holden Day, San Francisco, CA. ---(1967), Time Series Analysis, Holden Day, San Francisco, CA. Phadke, M. S. and S. M. Wu (1977), Identification of Multiinput Multioutput Transfer Func• tion and Noise Model Model of a Blast Furnace from Closed-Loop Data, IEEE Transactions on Automatic Control, Vol. AC-19, 6, 944-951. Piersol, Allan G. (1981) Time Delay Estimation Using Phase Data, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-29, No.3, 471-477. --- (1981), Use of Coherence and Phase Data Between Two Receivers in Evaluation of Noise Environments, Journal of Sound and Vibration, Vol. 56,215-228. Pisarenko, V. F. (1973), The Retrieval of Harmonics from a Covariance Function, Geophysics, J. R. Astr., 347-366. Priestley, M. B. (1981), Spectral Analysis and Time Series, Vols. I & II, Academic Press, N.Y. Proakis, John G. (1974), Channel Identification for High Speed Digital Communications, IEEE Transactions on Automatic Control, Vol. AC-19, No.6, 916-922. Rabiner, L. R., and B. Gold (1975), Digital Signal Processing, Prentice-Hall, Englewood Cliffs, N.J. Rajban, N. S., and V. M. Chadeev (1980), Identification of Industrial Processes, North Holland Publishing Co., Amsterdam. Rao, C. Radhakrishnan (1973), Linear Statistical Inference and Its Applications, Second Edi• tion, John Wiley & Sons, N.Y. Rauch, H. E., F. Tung, and C. T. Striebel (1965), Maximum Likelihood Estimates of Linear Dynamic Systems, AIAA Journal, Vol. 3, No.8, 1445-1449. Reddi, S. S. (1979), Multiple Source Location-A Digital Approach, IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-15, No.1, 95-104. Reddy, V. V., B. Egardt and T. Kailath (1982), Least Squares Type Algorithm for Adaptive Impelementation of Pisarenko's Harmonic Retrieval Method, IEEE Transaction on Acous• tics, Speech and Signal Processing, Vol. ASSP-30, No.3, 399-405. Reid, W. T. (1972), Riccati Differential Equation, Academic Press, N.Y. Riesz, Frigyes and B. Sz.-Nagy (1965), Functional Analysis, Translated from the 2nd French Edition by Leo F. Boron, Frederick Ungar Publishing Co., N.Y. Robinson, Enders A. (1983), Multichannel Time Series Analysis with Digital Computer Progress, Goose Pond Press, Texas, 3rd Edition. Ross, Sheldon, M. (1970), Applied Probability Models with Optimization, Holden-Day, San Francisco, CA. Rounds, E. M. (1978), Computations of Two Sample Kolmogorov-Sairnov Statistics, Project Memo TSC-PM-AI42-5, Technology Service Corporation, Santa Monica, CA. --- (1979), A Combined Nonparametric Approach to Feature Selection and Binary Deci- sion Tree Design, IEEE, CH 1428,38-43. Royden, H. L. (1968), Real Analysis, The MacMillan Company, N.Y. Rozanov, Y. A. (1977), Innovation Processes, John Wiley & Sons, N.Y. Rudin, Walter (1966), Real and Complex Analysis, McGraw-Hill Book Company, N.Y. BIBLIOGRAPHY 617

Sage, A. P. and J. L. Melsa (1971), Estimation Theory, With Applications to Communications and Control, McGraw Hill Book Co., N.Y. Saridis, George N. (1974), Comparison of Six On-Line Identification Algorithms, Automatica, Vol. 10,69-79. Satorius, E. H. and S. T. Alexander, (1979), Channel Equalization Using Adaptive Algorithms, IEEE Transactions on Communication, Vol. COM-27, No.6, pp. 899-905. Schwartz, Mischa and Leonard Shaw (1975), Signal Processing, McGraw Hill Book Company, N.Y. ---, and J. D. Peck (1981), Application of Least Squares Lattice Algorithms to Adaptive Equalization, IEEE Transactions on Communications. Vol. COM-29, No.2, pp. 136-142. Schwarz, R. and B. Friedland (1965), Linear Systems, McGraw Hill, N.Y. Searle, Shayle R. (1982), Matrix Algebra Usefulfor Statistics, John Wiley & Sons, N.Y. Selby, Samuel M., Editor-in-Chief, Mathematics, (1975) Standard Mathematical Tables, CRC Press, Inc., Boca Raton, FL. Selin, Ivan (1965), Detection Theory, The Rand Corporation, Princeton University Press, Princeton, N.J. Shanmugam, K. Sam (1979), Digital and Analog Communication Systems, John Wiley and Sons N.Y. Singleton, R. C. (1969), An Algorithm for Computing the Mixed Radix Fast Fourier Trans• form, IEEE. Trans. Audio Electronics, AU-17, 93-103. Slepian, D., and T. T. Kadota (1969), Four Integral Equations of Dectection Theory, SIAM Appl. Math., Vol. 17, No.6, 1102-1116. Snyder, D. (1975), Random Point Process, John Wiley & Sons, N.Y. Soderstrom, T., and P. Stoica (1987), Comparison of Some Instrumental Variable Methods, Automatica, Vol. 17, I, 101-115. Sorenson, H. W. (1980), Parameter Estimation, Marcel Dekker, Inc., N.Y. Specht, Donald F. (1969), Vector Cardiographic Diagnosis Using the Polynomial Discriminant Method of Pattern Recognition, IEEE Transactions on Biomedical Engineering, Vol. BME-14, No. 2,90-95. . Srinath, M. D. and P. K. Rajsekharan (1979), An Introduction to Statistical Signal Processing with Applications, Wiley-Interscience N.Y. Stearns, S. D. (1975) Digital Signal Analysis, Hayden Book Co., Rochelle Park, N.J. Strand, Otto Neall (1977), Multichannel Complex Maximum Entropy (Autoregressive) Spectral Analysis, IEEE Transactions on Automatic Control, Vol. AC-22, No.4, 634-640. Strang, Gilbert (1976), Linear Algebra and its Application, Academic Press, N.Y. Stratonovich, R. L. (1963), Translated from the Russian by Richard A. Silverman Topics in the Theory of Random Noise, Vols. I, II, Gordon and Breach, N.Y. Strejc, V. (1980), Least Squares Parameter Estimation, Automatica, Vol. 16535-550. Stoica, Petre and Torsten Soderstrom (1983), Optimal Instrumental Variable Estimation and Approximate Implementations, IEEE Transactions on Automatic Control, Vol. AC-28, No.7, 753-771. Thomas, John B. (1969), An Introduction to Statistical Communication Theory. John Wiley & Sons, Inc., N.Y. ---(1970), Nonparametric Detection, Proceedings of the IEEE, Vol. 58, No.5., 623-631. Thomasian, Aram J. (1969), The Structure of Probability Theory with Applications, McGraw Hill, N.Y. Thompson, Paul A. (1979), An Adaptive Spectral Analysis Technique for Unbiased Frequency Estimation in the Presence of White Noise, 13th Asilomar Con! on Circuits, Systems and Computation, Pacific Groove, CA, 5-7. Tretter, Steven A. (1976), Introduction to Discrete-Time Signal Processing, John Wiley & Sons, Inc., N.Y. 618 BIBLIOGRAPHY

Ulrych, Tad J. and Rob W. Clayton (1976), Time Series Modelling and Maximum Entropy, Physics of the Earth and Planetary Interiors, 12, 188-200, Elsevier Scientific Publishing Company, Amsterdam. Van Blaricum, Michael L. and Raj Mittra (1978), Problems and Solution Associated With Prony's Method for Processing Transient Data, IEEE Transactions on Antennas and Propa• gation, Vol. AP-26, No. I, 174-183. Van Trees, Harry L. (1968), Detection, Estimation, and Modulation Theory, Part-I John Wiley & Sons, Inc. N.Y. --- (1971), Detection, Estimation, and Modulation Theory, Part II, Nonlinear Modulation Theory, John Wiley & Sons, N.Y. ---(1970), Detection, Estimation and Modulation, Part III, Radar and Sonar Signal Process• ing, John Wiley & Sons, N.Y. Wainstein, L. A. and V. D. Zubakov (1962), Extraction of Signal from Noise, Dover Publica• tions, Inc., N.Y. Wald, A. (1947), Sequential Analyses, Dover Publication, N.Y. Walpole, Ronald E. and Raymond H. Myers (1978), Probability and Statistics for Engineers and Scientists, MacMillian Publishing Co., Inc., N.Y., 2nd edition. Wax, Nelson (1954), Selected Papers on Noise and Stochastic Processes, Dover Publication, Inc., N.Y. Weber, Charles, L. (1968), Elements of Dectection and Signal Design, McGraw-Hill Series in Systems Science, McGraw Hill Book Company, N.Y. Wellstead, Peter E. (1981), Non-Parametric Methods of System Identification, Automatica, Vol. \7, No. 1,55-69. Whalen, Anthony D. (1971), Detection of Signals in Noise, Academic Press, N.Y. Widrow, Bernard (1971), Adaptive Filters: Aspects of Network and System Theory, Edited by R. E. Kalman, Holt, Rinehart and Winston, Inc., N.Y. Wiener, N. (1949), The Extrapolation, Interpolation, and Smoothing of Time Series, The MIT Press, Cambridge, Mass. Wong, Eugene (1971), Stochastic Processes in Information and Dynamical Systems, McGraw• Hill Book Company, N.Y. Wood, M. G., J. B. Moore and B. D. O. Anderson (1971), Study of and Integral Equations Arising in Detection Theory, IEEE Trans. on Information Theory, Vol. 14-17, No.6., pp. 677-686. Woodward, P. M. (1953), Probability and Information Theory, with Applications to Radar, Pergamon Press, N.Y. Wozencraft, John M. and Irwin Mark Jacobs (1965), Principles of Communication Engineering, John Wiley & Sons, Inc., N.Y. Wylie, C. R. and L. C. Barrett (1982), Advanced Engineering Mathematics, McGraw Hill Book Co., N.Y. Yaglom, A. M. (1961), Translated and Edited by Richard A. Silverman, An Introduction to the Theory of Stationary Random Functions, Dover Publications, Inc., N.Y. Yosida, K. (1972), Functional Analysis, Springer-Verlag, Berlin and N.Y. Young, Peters C. (1972), Comments on On-Line Identification of Linear Dynamic Systems with Applications to Kalman Filtering, IEEE Transactions on Automatic Control, 17,269-271. ---(1974), Recursive Approaches to Time Series Analysis, The Institute of Mathematics and its Applications, May/June, 209-224. ---(1981), Parameter Estimation for Continuous Time Models, A Survey, Automatica, Vol. 17, No. 1,23-39. Zeoli, G. W., and T. S. Fong (1971). Performance of a Two Sample Mann-Whitney Nonpara• meteric Detector in a Radar Applications, IEEE Trans. on Aerospace and Electronics Systems, Vol. AES-7 No.5, 931-959. Index
