Math Notes for ECE 278

G. C. Papen

September 6, 2017

© 2017 by George C. Papen. All rights reserved. No part of this manuscript is to be reproduced without written consent of the author.

Contents

1 Background  1
  1.1 Linear Systems  1
    1.1.1 Bandwidth and Timewidth  7
    1.1.2 Passband and Complex-Baseband Signals  12
    1.1.3 Signal Space  15
  1.2 Random Signals  23
    1.2.1 Probability Distribution Functions  23
    1.2.2 Random Processes  35
  1.3 Electromagnetics  46
    1.3.1 Material Properties  47
    1.3.2 The Wave Equation  50
    1.3.3 Random Electromagnetic Fields  55
  1.4 References  56
  1.5 Problems  56

2 Examples  65
  2.1 Filter Estimation  65
  2.2 Constant-Modulus Objective Function  68
  2.3 Adaptive Estimation  69

Bibliography 71

1 Background

The study of communication systems is rich and rewarding, bringing together a broad range of topics in engineering and physics. Our development of this subject draws on an understanding of basic material in the subjects of linear systems, random signals, and electromagnetics. The emphasis in this chapter is on the concepts that are relevant to the understanding of modern digital communication systems. This background chapter also introduces and reinforces the various and alternative sets of notation that are used throughout the book. Much of the understanding of the various topics in this book depends on the choice of clear and appropriate notation and terminology.

1.1 Linear Systems

A communication system conveys information by embedding that information into temporal and perhaps spatial variations of a propagating signal. We begin with a discussion of the properties of signals and systems. A signal is a real-valued or complex-valued function of a continuous or discrete variable called time. A system responds to a signal s(t) at its input, producing one or more signals r(t) at its output. The most amenable systems are linear systems because a linear mathematical model can support powerful methods of analysis and design. We are interested in both discrete systems and continuous systems expressed in a variety of mathematical forms, such as continuous integral equations, continuous differential equations, or discrete difference equations.

Figure 1.1: A block diagram of a linear system characterized by an impulse response function h(t). Using the properties of homogeneity and additivity, an input a x_1(t) + b x_2(t) produces an output a y_1(t) + b y_2(t).

A communication signal may be a real function of time or a complex function of time. The rectangular form of a complex function is a(t) = a_R(t) + i a_I(t), where a_R(t) is the real part, a_I(t) is the imaginary part, and i^2 = -1. The polar form is a(t) = A(t) e^{i\phi(t)}, where A(t) = \sqrt{a_R^2(t) + a_I^2(t)} is the amplitude and \phi(t) = \tan^{-1}(a_I/a_R) is the phase. Systems can be classified by the properties that relate the input s(t) to the output r(t).

Linearity A system, either real or complex, is linear if it is homogeneous and additive:

1. Homogeneous systems: If input s(t) has output r(t), then for every scalar a, real or complex, input a s(t) has output a r(t).

2. Additive systems: If input x_1(t) has output y_1(t) and input x_2(t) has output y_2(t), then input x_1(t) + x_2(t) has output y_1(t) + y_2(t).

The output r(t) of a linear continuous-time system can be written as a superposition integral of the input s(t) and a function h(t, τ):

    r(t) = \int_{-\infty}^{\infty} h(t, \tau)\, s(\tau)\, d\tau,        (1.1.1)

where h(t, τ), called the time-varying impulse response, is defined as the output of the system at time t in response to a Dirac impulse δ(t − τ).

The Dirac impulse δ(t) is not a proper function.[a] It is defined by the formal integral relationship

    s(t) = \int_{-\infty}^{\infty} \delta(t - \tau)\, s(\tau)\, d\tau,        (1.1.2)

for any function s(t). This integral is referred to as the sifting property of a Dirac impulse. For the treatment of discrete-time signals, a Kronecker impulse δ_{mn} is useful, defined by δ_{mn} equal to one if m is equal to n, and δ_{mn} equal to zero otherwise.

[a] A Dirac impulse is an example of a generalized function or a generalized signal. For the formal theory see Strichartz (2003).

Shift Invariance Under appropriate conditions, a system described by a superposition integral can be reduced to a simpler form known as a shift-invariant system, or, when appropriate, as a time-invariant or a space-invariant system. If input s(t) has output r(t), then for every τ, input s(t − τ) has output r(t − τ). In this case, the form of the impulse response for a linear and shift-invariant system depends only on the time difference, so that h(t, τ) = h(t − τ, 0) and (1.1.1) reduces to

    r(t) = \int_{-\infty}^{\infty} h(\tau)\, s(t - \tau)\, d\tau.        (1.1.3)

The output is then a convolution of the input s(t) and the shift-invariant impulse response h(t) and is denoted by r(t) = s(t) ~ h(t). The shift-invariant impulse response is also called, simply, the impulse response. Every linear shift-invariant system can be described as a linear shift-invariant filter. Convolution has the following properties:

1. Commutative property: h(t) ~ s(t) = s(t) ~ h(t).

2. Distributive property: h(t) ~ (x_1(t) + x_2(t)) = h(t) ~ x_1(t) + h(t) ~ x_2(t).

3. Associative property: h_1(t) ~ (h_2(t) ~ s(t)) = (h_1(t) ~ h_2(t)) ~ s(t).

Using the distributive property of convolution, we can write for complex functions

    a(t) = b(t) ~ c(t)

    a_R(t) + i a_I(t) = (b_R(t) + i b_I(t)) ~ (c_R(t) + i c_I(t))
                      = b_R(t) ~ c_R(t) - b_I(t) ~ c_I(t) + i\big( b_R(t) ~ c_I(t) + b_I(t) ~ c_R(t) \big).        (1.1.4)

The class of shift-invariant systems includes all those described by constant-coefficient, linear differential equations. An example of a spatially-invariant system is free space because it has no boundaries and thus the choice of the spatial origin is arbitrary. Systems with spatial boundaries are spatially-varying in at least one direction, but may be spatially-invariant in the other directions. However, many spatial systems with boundaries can be approximated as spatially-invariant over a limited range of spatial inputs.
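The decomposition in (1.1.4) can be checked numerically by replacing the continuous convolution with a discrete one. The following sketch assumes Python with NumPy, which is not part of these notes; the sequences standing in for b(t) and c(t) are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)

    # Arbitrary complex sequences standing in for b(t) and c(t).
    b = rng.standard_normal(64) + 1j * rng.standard_normal(64)
    c = rng.standard_normal(64) + 1j * rng.standard_normal(64)

    # Direct complex convolution, the discrete analog of a(t) = b(t) ~ c(t).
    a = np.convolve(b, c)

    # Rebuild a(t) from the real and imaginary parts as in (1.1.4).
    bR, bI, cR, cI = b.real, b.imag, c.real, c.imag
    a_parts = (np.convolve(bR, cR) - np.convolve(bI, cI)
               + 1j * (np.convolve(bR, cI) + np.convolve(bI, cR)))

    print(np.allclose(a, a_parts))   # True: the two forms agree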

Causality A causal filter h(t) is a linear filter whose impulse response has a value equal to zero for all times t less than zero. A causal system cannot produce an output before it receives an input. A right-sided signal s(t) has a value equal to zero for all times less than zero. A linear time-invariant system is causal if and only if it has a right-sided impulse response. A causal h(t) can be defined using the unit-step function, which is

    u(t) = \begin{cases} 1 & \text{for } t > 0 \\ 1/2 & \text{for } t = 0 \\ 0 & \text{for } t < 0. \end{cases}        (1.1.5)

A linear shift-invariant system is causal if its impulse response h(t) satisfies h(t) = h(t)u(t) except at t = 0. For this case, the lower limit of the integral for the output signal given in (1.1.3) is equal to zero. A function related to the unit-step function is the signum function, defined as

    sgn(t) = 2u(t) - 1 = \begin{cases} 1 & \text{for } t > 0 \\ 0 & \text{for } t = 0 \\ -1 & \text{for } t < 0. \end{cases}        (1.1.6)

A system for which the output r(t) depends on only the current value of s(t) is called memoryless. The corresponding property in space is called local.

The Fourier Transform

The Fourier transform[b] (or spectrum) S(f) of the temporal signal s(t) is defined, provided the integral exists, as

    S(f) = \int_{-\infty}^{\infty} s(t)\, e^{-i 2\pi f t}\, dt.        (1.1.7)

The Fourier transform formally exists for any signal whose energy[c] E, given by

    E = \int_{-\infty}^{\infty} |s(t)|^2\, dt,        (1.1.8)

is finite. Such signals are called finite-energy or square-integrable signals. The Fourier transform can be extended to include a large number of signals and generalized signals with infinite energy, but finite power, such as cos(2πf_c t) and e^{i 2\pi f_c t}, by means of a limiting process that often can be expressed using the Dirac impulse δ(t). The signal s(t) can be recovered as an inverse Fourier transform

    s(t) = \int_{-\infty}^{\infty} S(f)\, e^{i 2\pi f t}\, df,        (1.1.9)

with s(t) ←→ S(f) denoting the transform pair. To this purpose, two signals whose difference has zero energy are regarded as the same signal. Another way to say this is that the two signals are equal almost everywhere.

A Fourier transform can also be defined for spatial signals. For a one-dimensional spatial signal f(x), we have[d]

    F(k) = \int_{-\infty}^{\infty} f(x)\, e^{i k x}\, dx,        (1.1.10)

where k is the spatial frequency, which is the spatial equivalent of the temporal angular frequency ω = 2πf.

Properties of the Fourier Transform Several properties of the Fourier transform used to analyze communication systems are listed below.

1. Scaling

    s(at) \longleftrightarrow \frac{1}{|a|}\, S\!\left(\frac{f}{a}\right)        (1.1.11)

for any nonzero real value a. This scaling property states that the width of a function in one domain scales inversely with the width of the function in the other domain.

[b] The angular frequency ω = 2πf is also used to define a Fourier transform pair, where S(ω) = \int_{-\infty}^{\infty} s(t) e^{-i\omega t} dt and s(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega) e^{i\omega t} d\omega. We will use this alternative notation for electromagnetics, where it is conventional.
[c] The term energy here refers to a mathematical concept and does not necessarily correspond to physical energy.
[d] The usual sign convention for the spatial Fourier transform is the opposite of the sign convention for the temporal Fourier transform, but this is a matter of preference.

2. Differentiation

    \frac{d}{dt} s(t) \longleftrightarrow i 2\pi f\, S(f).        (1.1.12)

The dual property is

    t\, s(t) \longleftrightarrow \frac{1}{-i 2\pi} \frac{d}{df} S(f).        (1.1.13)

3. Convolution

    s(t) ~ h(t) \longleftrightarrow S(f) H(f).        (1.1.14)

Convolution in the time domain is equivalent to multiplication in the frequency domain. The dual property is

    s(t) h(t) \longleftrightarrow S(f) ~ H(f).        (1.1.15)

4. Modulation A special case of the convolution property occurs when h(t) = e^{i 2\pi f_c t} and gives

    s(t)\, e^{i 2\pi f_c t} \longleftrightarrow S(f - f_c)        (1.1.16)

for any real value f_c. Multiplication in the time domain by e^{i 2\pi f_c t} translates the frequency origin of the baseband signal S(f) to the carrier frequency f_c, which can be written as

    S(f) ~ \delta(f - f_c) = S(f - f_c).

The modulation process is linear with respect to s(t) but does contain frequency components that are not present in the original baseband signal. The dual property is

    s(t - t_0) \longleftrightarrow e^{-i 2\pi f t_0}\, S(f)        (1.1.17)

for any real value t_0.

5. Parseval's relationship Two signals s(t) and h(t) with finite energy satisfy

    \int_{-\infty}^{\infty} s(t) h^*(t)\, dt = \int_{-\infty}^{\infty} S(f) H^*(f)\, df.        (1.1.18)

When h(t) = s(t), the two integrals express the energy in s(t) computed both in the time domain and in the frequency domain. These integrals are equal and finite.

If s(t) is real, then the following relationships hold for S(f) = S_R(f) + i S_I(f), where S_R(f) = Re[S(f)] is the real part and S_I(f) = Im[S(f)] is the imaginary part of the Fourier transform:

    S(f) = S^*(-f)
    S_R(f) = S_R(-f)
    S_I(f) = -S_I(-f)
    |S(f)| = |S(-f)|
    \phi(f) = -\phi(-f),        (1.1.19)

where |S(f)| = \sqrt{S_R(f)^2 + S_I(f)^2} is the magnitude[e] of the Fourier transform, and

    \phi(f) = \arg S(f) = \tan^{-1}\!\left(\frac{S_I(f)}{S_R(f)}\right)

is the phase of the Fourier transform. A consequence of these properties is that the Fourier transform of a real signal s(t) is conjugate symmetric, meaning that the negative-frequency part of the Fourier transform contains the same information as the positive-frequency part. This observation allows us to construct an equivalent representation of the real signal that consists only of the nonnegative frequency components. To do so, define

    Z(f) = \begin{cases} 2S(f) & \text{for } f > 0 \\ S(0) & \text{for } f = 0 \\ 0 & \text{for } f < 0. \end{cases}        (1.1.20)

The function Z(f) is equal to twice the positive part of S(f) for positive frequencies, has a value equal to S(0) at the zero-frequency component,[f] and contains no negative frequency components. The inverse Fourier transform of Z(f) is called the analytic signal z(t) corresponding to s(t). The analytic signal z(t) is complex. Similarly, the real signal s(t) is related to z(t) by

    s(t) = \tfrac{1}{2}\big( z(t) + z^*(t) \big)        (1.1.21a)
         = \mathrm{Re}[z(t)],        (1.1.21b)

where S(f) = S^*(-f) has been used because s(t) is real. The analytic signal z(t) is directly related to the real signal s(t) by

    z(t) = s(t) + i\hat{s}(t)

with z(t) ←→ Z(f), where \hat{s}(t) is the Hilbert transform of s(t), defined as

    \hat{s}(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{s(\tau)}{t - \tau}\, d\tau.        (1.1.22)

The Hilbert transform is formally the convolution of s(t) and (πt)^{-1}. For example, if s(t) = cos(2πft), then the analytic signal is z(t) = cos(2πft) + i sin(2πft) = e^{i2πft}.

The Hilbert transform relates a real function of time to a complex function of time with a one-sided function of frequency Z(f). A counterpart of the Hilbert transform, called the Kramers-Kronig transform (or the Kramers-Kronig relationship), relates a function of frequency to a real-valued one-sided (causal) function of time. The inverse of this transform relates a real-valued one-sided function of time to a function of frequency. Let s(t) be a real-valued causal function with Fourier transform

    S(\omega) = S_R(\omega) + i S_I(\omega),

conventionally stated using the angular frequency ω in place of f. The functions S_R(ω) and S_I(ω) are related by

    S_I(\omega) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{S_R(\Omega)}{\omega - \Omega}\, d\Omega.        (1.1.23)

An alternative form of the Kramers-Kronig transform expresses instead S_R(ω) in terms of S_I(ω).

[e] The word modulus is sometimes used for the magnitude of a complex number.
[f] The value at zero frequency is often called the DC value (direct current) even if the signal does not represent current.
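As a numerical illustration of the analytic signal, the sketch below assumes Python with NumPy and SciPy, which are not part of these notes; scipy.signal.hilbert returns the analytic signal z(t) = s(t) + i ŝ(t) rather than the Hilbert transform itself, and the tone frequency and sample rate are arbitrary choices.

    import numpy as np
    from scipy.signal import hilbert

    fs = 1000.0                        # sample rate (arbitrary)
    t = np.arange(0, 1.0, 1 / fs)
    f0 = 50.0                          # an integer number of cycles fits the window
    s = np.cos(2 * np.pi * f0 * t)

    z = hilbert(s)                     # analytic signal z(t) = s(t) + i*s_hat(t)
    s_hat = z.imag                     # the Hilbert transform of s(t)

    # For s(t) = cos(2 pi f0 t), z(t) should equal exp(i 2 pi f0 t),
    # so s_hat(t) should equal sin(2 pi f0 t) and |z(t)| should equal 1.
    print(np.allclose(s_hat, np.sin(2 * np.pi * f0 * t), atol=1e-9))
    print(np.allclose(np.abs(z), 1.0, atol=1e-9))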

Modes of a Linear Time-Invariant System

Let the input to a linear time-invariant system characterized by an impulse response h(t) consist of a single complex frequency component given by s(t) = e^{i2πft}. Using the commutative property of the convolution operation, the output is

    r(t) = \int_{-\infty}^{\infty} h(\tau)\, e^{i 2\pi f (t - \tau)}\, d\tau
         = e^{i 2\pi f t} \int_{-\infty}^{\infty} h(\tau)\, e^{-i 2\pi f \tau}\, d\tau
         = H(f)\, e^{i 2\pi f t},        (1.1.24)

where H(f), called the transfer function, is the Fourier transform of h(t). For any frequency f_0, the output H(f_0) e^{i2πf_0 t} is a scaled version of the input e^{i2πf_0 t}. Therefore, the function e^{i2πf_0 t} is an eigenfunction,[g] eigenmode, or simply a mode of a linear, shift-invariant system, with the value H(f_0) being the eigenvalue. For a linear transformation described by a matrix, the corresponding vector is called an eigenvector. Given that a linear, shift-invariant system can only scale the function e^{i2πf_0 t} by a complex number H(f_0), a linear shift-invariant system cannot create new frequency components.

Any s(t) at the input to h(t) will have an output described by the convolution r(t) = s(t) ~ h(t). Using the convolution property of the Fourier transform given in (1.1.14), the output signal R(f) in the frequency domain is given by R(f) = S(f)H(f), where S(f) is the Fourier transform of s(t). The inverse Fourier transform (cf. (1.1.9)) of R(f) yields the output signal r(t):

    r(t) = \int_{-\infty}^{\infty} S(f) H(f)\, e^{i 2\pi f t}\, df.        (1.1.25)

The relationship between the input and the output of a linear shift-invariant system in both the time domain and the frequency domain is shown in Figure 1.2, where the two-way arrows represent the Fourier transform relationship.

Figure 1.2: Time and frequency representation of a time-invariant linear system: s(t) ←→ S(f), h(t) ←→ H(f), and r(t) = s(t) ~ h(t) ←→ R(f) = S(f)H(f).
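The equivalence of the two descriptions in Figure 1.2 can be verified numerically. The sketch below assumes Python with NumPy (not part of these notes) and uses zero-padded FFTs so that the circular convolution matches the linear convolution; the signal and impulse response are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    s = rng.standard_normal(128)       # samples standing in for the input s(t)
    h = rng.standard_normal(32)        # samples standing in for h(t)

    # Time-domain output r(t) = s(t) ~ h(t).
    r_time = np.convolve(s, h)

    # Frequency-domain output R(f) = S(f) H(f) with zero padding.
    n = len(s) + len(h) - 1
    r_freq = np.fft.ifft(np.fft.fft(s, n) * np.fft.fft(h, n)).real

    print(np.allclose(r_time, r_freq))   # True: both paths give the same output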

[g] In general, an eigenfunction of a linear transformation is any function that is unchanged by that transformation except for a scalar multiplier called an eigenvalue.

1.1.1 Bandwidth and Timewidth

Signals used in communication systems are constructed from finite-energy pulses. A signal s(t) of finite energy must have most of its energy in some finite region of the time axis, and its Fourier transform S(f) must have most of its energy in some finite region of the frequency axis. The timewidth is a measure of the width of the signal s(t). The bandwidth is a measure of the width of the spectrum S(f).

The root-mean-squared bandwidth W_rms of a signal s(t) with nonzero energy is defined by

    W_{rms}^2 = \frac{1}{E} \int_{-\infty}^{\infty} (f - \bar{f})^2\, |S(f)|^2\, df,        (1.1.26)

where S(f) is the Fourier transform of s(t), where

    \bar{f} = \frac{1}{E} \int_{-\infty}^{\infty} f\, |S(f)|^2\, df        (1.1.27)

is the centroid or the mean of the term |S(f)|^2/E, and where E = \int_{-\infty}^{\infty} |S(f)|^2 df is the energy in the pulse s(t). Expanding the square in (1.1.26) and simplifying yields the alternative form

    W_{rms}^2 = \frac{1}{E} \int_{-\infty}^{\infty} f^2\, |S(f)|^2\, df - \bar{f}^2 = \overline{f^2} - \bar{f}^2,        (1.1.28)

where

    \overline{f^2} = \frac{1}{E} \int_{-\infty}^{\infty} f^2\, |S(f)|^2\, df        (1.1.29)

is defined as the mean-squared frequency. The root-mean-squared timewidth T_rms is defined in an analogous fashion to (1.1.28) as

    T_{rms}^2 = \overline{t^2} - \bar{t}^2,        (1.1.30)

where \bar{t} = \frac{1}{E}\int_{-\infty}^{\infty} t\,|s(t)|^2\,dt and \overline{t^2} = \frac{1}{E}\int_{-\infty}^{\infty} t^2\,|s(t)|^2\,dt. The value T_{rms}^2 defines the mean-squared timewidth of the pulse s(t).

The relationship between T_rms and W_rms for a baseband pulse is shown in Figure 1.3a. If the same pulse is modulated onto a carrier to produce a passband pulse, as shown in Figure 1.3b, then the definition of W_rms is not as useful because the spectrum S(f) is not contiguous. In this case, the passband bandwidth B is twice the baseband bandwidth W because it occupies twice the frequency range.

Other measures of the bandwidth and timewidth are common. The ideal bandwidth is the smallest value of W such that S(f) = S(f)\,rect(f/W). The three-decibel bandwidth or half-power bandwidth of a signal s(t) whose power density |S(f)|^2 is unimodal is denoted by W_h. It is defined as the frequency at which |S(f)|^2 is half (or -3 dB) of the power density at the maximum of |S(f)|^2.

The effective timewidth T_amp of a nonnegative real pulse s(t) is defined as

    T_{amp} = \frac{1}{E} \left( \int_{-\infty}^{\infty} s(t)\, dt \right)^2 = \frac{\left( \int_{-\infty}^{\infty} s(t)\, dt \right)^2}{\int_{-\infty}^{\infty} s^2(t)\, dt}.        (1.1.31)

Figure 1.3: (a) A baseband pulse s(t) with timewidth T_rms and its spectrum S(f) with bandwidth W_rms. (b) A passband pulse and its spectrum, centered at ±f_c.

Instead, the effective timewidth T_power of a complex pulse s(t) is defined differently in terms of the instantaneous power P(t) = |s(t)|^2 as

    T_{power} = \frac{\left( \int_{-\infty}^{\infty} P(t)\, dt \right)^2}{\int_{-\infty}^{\infty} P^2(t)\, dt} = \frac{\left( \int_{-\infty}^{\infty} |s(t)|^2\, dt \right)^2}{\int_{-\infty}^{\infty} |s(t)|^4\, dt}.        (1.1.32)

These two definitions are similar, but are not the same.

Timewidth-Bandwidth Product

The Schwarz inequality, discussed in Section 1.1.3 (cf. (1.1.72)), can be used to determine a lower bound on the timewidth-bandwidth product[h] of the root-mean-squared timewidth T_rms of the signal s(t) and the root-mean-squared bandwidth W_rms of the corresponding spectrum S(f). A pulse s(t) with a mean time \bar{t} and a mean frequency \bar{f} has the same timewidth and bandwidth as the pulse s(t - \bar{t}) e^{i 2\pi \bar{f} t}. Therefore it is enough to consider s(t) with both means, \bar{t} and \bar{f}, equal to zero. Normalize the energy so that \int_{-\infty}^{\infty} |s(t)|^2 dt = \int_{-\infty}^{\infty} |S(f)|^2 df = 1 (cf. (1.1.18)). The derivation then follows from the expression

    \frac{d}{dt}\big( t\, |s(t)|^2 \big) = |s(t)|^2 + t\, s(t) \frac{ds^*(t)}{dt} + t\, s^*(t) \frac{ds(t)}{dt}
                                        = |s(t)|^2 + 2\,\mathrm{Re}\!\left[ t\, s(t) \frac{ds^*(t)}{dt} \right].        (1.1.33)

Integrate both sides from -\infty to \infty:

    \Big[ t\, |s(t)|^2 \Big]_{-\infty}^{\infty} = \int_{-\infty}^{\infty} |s(t)|^2\, dt + 2\,\mathrm{Re}\!\left[ \int_{-\infty}^{\infty} t\, s(t) \frac{ds^*(t)}{dt}\, dt \right].

The left side is zero because the power |s(t)|^2 in a finite-energy pulse must go to zero faster than 1/|t| as |t| goes to infinity. Therefore, the squared magnitudes of the two terms on the right are equal, so that

    \left( \int_{-\infty}^{\infty} |s(t)|^2\, dt \right)^2 = 4 \left( \mathrm{Re}\!\left[ \int_{-\infty}^{\infty} t\, s(t) \frac{ds^*(t)}{dt}\, dt \right] \right)^2.        (1.1.34)

Setting the left side to E^2 and applying the Schwarz inequality given in (1.1.72) to the right side gives

    E^2 \leq 4 \int_{-\infty}^{\infty} |t\, s(t)|^2\, dt \int_{-\infty}^{\infty} \left| \frac{ds^*(t)}{dt} \right|^2 dt.        (1.1.35)

The first integral on the right equals E\,T_{rms}^2 (cf. (1.1.30)). Using the differentiation property of the temporal Fourier transform (cf. (1.1.12)) and Parseval's relationship (cf. (1.1.18)), the second integral can be written as (cf. (1.1.26))

    \int_{-\infty}^{\infty} \left| \frac{ds^*(t)}{dt} \right|^2 dt = \int_{-\infty}^{\infty} |i 2\pi f\, S(f)|^2\, df = 4\pi^2 E\, W_{rms}^2.

With these expressions, (1.1.35) now leads to the following inequality for the timewidth-bandwidth product:[i]

    T_{rms}\, W_{rms} \geq \frac{1}{4\pi}.        (1.1.36)

[h] This is also called the time-bandwidth product.

As an example, the Fourier transform S(f) of a gaussian pulse s(t) = e^{-\pi t^2} in time is a gaussian pulse in frequency, with the transform pair given by

    e^{-\pi t^2} \longleftrightarrow e^{-\pi f^2}.        (1.1.37)

Expressing these pulses in the standard form e^{-t^2/2\sigma^2}, each pulse is characterized by \sigma^2 = 1/2\pi. For a gaussian pulse, because T_rms is defined using |s(t)|^2 and W_rms is defined using |S(f)|^2, T_{rms}^2 = W_{rms}^2 = 1/4\pi so that T_{rms}\, W_{rms} = 1/4\pi, which satisfies (1.1.36) with equality. This means that a gaussian pulse, perhaps time-shifted or frequency-shifted, produces the minimum value of the timewidth-bandwidth product.
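The bound (1.1.36) can be checked numerically by approximating the moment integrals with Riemann sums and the spectrum with an FFT. The sketch below assumes Python with NumPy (not part of these notes); the grid span and the second test pulse (a two-sided exponential) are arbitrary illustrative choices.

    import numpy as np

    def rms_widths(s, t):
        """Return (T_rms, W_rms) for pulse samples s on a uniform time grid t."""
        dt = t[1] - t[0]
        p = np.abs(s) ** 2                      # |s(t)|^2
        E = p.sum() * dt                        # pulse energy
        t_bar = (t * p).sum() * dt / E
        T2 = ((t - t_bar) ** 2 * p).sum() * dt / E
        # Riemann-sum approximation of the continuous Fourier transform S(f).
        f = np.fft.fftshift(np.fft.fftfreq(t.size, dt))
        S = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(s))) * dt
        P = np.abs(S) ** 2
        df = f[1] - f[0]
        f_bar = (f * P).sum() * df / E          # sum(P)*df also equals E (Parseval)
        W2 = ((f - f_bar) ** 2 * P).sum() * df / E
        return np.sqrt(T2), np.sqrt(W2)

    t = np.linspace(-8.0, 8.0, 4096, endpoint=False)

    T, W = rms_widths(np.exp(-np.pi * t ** 2), t)   # gaussian pulse of (1.1.37)
    print(T * W, 1.0 / (4.0 * np.pi))               # both ~0.0796: equality in (1.1.36)

    T, W = rms_widths(np.exp(-np.abs(t)), t)        # two-sided exponential pulse
    print(T * W)                                    # ~0.11 > 1/(4*pi)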

Communication Pulses Several basic pulses are commonly used in the study of communication systems. A rectangular pulse of unit height and unit width centered at the origin is the rect pulse defined as

    rect(t) = \begin{cases} 1 & |t| \leq 1/2 \\ 0 & |t| > 1/2. \end{cases}        (1.1.38)

[i] This is called the Heisenberg uncertainty relationship in other contexts.

The Fourier transform of this rectangular pulse is

    S(f) = \int_{-\infty}^{\infty} s(t)\, e^{-i 2\pi f t}\, dt = \frac{\sin(\pi f)}{\pi f} = \mathrm{sinc}(f),        (1.1.39)

where the sinc pulse is defined as

    \mathrm{sinc}(t) = \frac{\sin(\pi t)}{\pi t}.        (1.1.40)

The sinc pulse has its zeros on the nonzero integers and is equal to one for t equal to zero. The Fourier transform pairs

    rect(t) \longleftrightarrow \mathrm{sinc}(f)        (1.1.41a)
    \mathrm{sinc}(t) \longleftrightarrow rect(f)        (1.1.41b)

are duals. The scaling property of the Fourier transform gives the pair

    \frac{1}{T}\, rect(t/T) \longleftrightarrow \mathrm{sinc}(fT).        (1.1.42)

In the limit as T goes to zero, the left side approaches a Dirac impulse and the right side approaches the constant one. In the sense of this limit, the Fourier transform pair

    \delta(t) \longleftrightarrow 1,        (1.1.43)

and its dual

    1 \longleftrightarrow \delta(f),        (1.1.44)

can be defined. Another useful Fourier transform pair that is defined using a limiting process is the Fourier transform of an infinite series of Dirac impulses, which is given by[j]

    \sum_{j=-\infty}^{\infty} \delta(t - j) \longleftrightarrow \sum_{j=-\infty}^{\infty} \delta(f - j).        (1.1.45)

This Fourier transform pair is abbreviated as comb(t) ←→ comb(f). The transform pair

    \sum_{j=-\infty}^{\infty} \delta(t - jT_s) \longleftrightarrow (1/T_s) \sum_{j=-\infty}^{\infty} \delta(f - j/T_s)        (1.1.46)

[j] This pair is whimsically called the "picket fence miracle". A companion statement that avoids the use of impulses is the Poisson summation formula.

then follows from the scaling property of the Fourier transform. Another useful pulse is the gaussian pulse

    s(t) = e^{-t^2/2\sigma^2}.        (1.1.47)

The transform pair e^{-\pi t^2} \longleftrightarrow e^{-\pi f^2} given in (1.1.37) becomes

    e^{-t^2/2\sigma^2} \longleftrightarrow \sqrt{2\pi}\,\sigma\, e^{-2\pi^2 \sigma^2 f^2} = \sqrt{2\pi}\,\sigma\, e^{-\sigma^2 \omega^2/2},        (1.1.48)

by using the scaling property of the Fourier transform given in (1.1.11).

Inserting the mathematical symbol i into the argument of a gaussian pulse gives another pulse called a quadratic phase pulse, a chirp pulse, or an imaginary gaussian pulse. Because e^{-i\pi t^2} has infinite energy, it does not conform to the requirements of the formal definition of a Fourier transform pair. Therefore, the Fourier transform must be defined by a limiting process and is given by

    e^{-i\pi t^2} \longleftrightarrow e^{-i\pi/4}\, e^{i\pi f^2} = e^{-i\pi/4}\, e^{i\omega^2/4\pi}.        (1.1.49a)

The duality property of the Fourier transform gives

    e^{-i\pi/4}\, e^{i\pi t^2} \longleftrightarrow e^{-i\pi f^2}.        (1.1.50)

Another transform pair is

    \pi e^{-2\pi |t|} \longleftrightarrow \frac{1}{1 + f^2},        (1.1.51)

with the pulse waveform in the frequency domain called a lorentzian pulse. For this pulse, T_rms exists, but W_rms does not (or is infinite).

A list of Fourier transform pairs is provided for reference in Table 1.1.
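The pair rect(t) ←→ sinc(f) in (1.1.41a) can be confirmed by approximating the Fourier integral directly. This sketch assumes Python with NumPy (not part of these notes); the step size and frequency grid are arbitrary.

    import numpy as np

    # Midpoint-rule approximation of S(f) = integral of exp(-i 2 pi f t) over |t| <= 1/2.
    dt = 1e-4
    t = np.arange(-0.5, 0.5, dt) + dt / 2          # midpoints across the rect support
    f = np.linspace(-6.0, 6.0, 121)

    S = np.array([np.sum(np.exp(-2j * np.pi * fk * t)) * dt for fk in f])
    print(np.max(np.abs(S.real - np.sinc(f))))     # ~1e-6: matches sinc(f) = sin(pi f)/(pi f)
    print(np.max(np.abs(S.imag)))                  # ~0 by the even symmetry of rect(t)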

1.1.2 Passband and Complex-Baseband Signals

A passband signal is a signal of the form

    \tilde{s}(t) = A(t) \cos\!\big( 2\pi f_c t + \phi(t) \big),        (1.1.52)

where A(t) is the amplitude, φ(t) is the phase, and both A(t) and φ(t) vary slowly compared to the carrier or reference frequency f_c. Radio-frequency signals of the form of (1.1.52) are passband signals because of the high frequency of the carrier as compared to the baseband modulation bandwidth.

An equivalent representation for a passband signal can be derived using the trigonometric identity cos(A + B) = cos A cos B − sin A sin B to yield

    \tilde{s}(t) = s_I(t) \cos(2\pi f_c t) - s_Q(t) \sin(2\pi f_c t),        (1.1.53)

where s_I(t) = A(t) cos φ(t) is the in-phase component, and s_Q(t) = A(t) sin φ(t) is the quadrature component. A passband signal can also be written as the real part of a complex signal

    \tilde{s}(t) = \mathrm{Re}\!\left[ A(t) e^{i\phi(t)} e^{i 2\pi f_c t} \right]        (1.1.54a)
                 = \mathrm{Re}\!\left[ \big( s_I(t) + i s_Q(t) \big) e^{i 2\pi f_c t} \right]        (1.1.54b)
                 = \mathrm{Re}\!\left[ s(t)\, e^{i 2\pi f_c t} \right],        (1.1.54c)

    s(t)                                        S(f)
    1                                           \delta(f)
    \delta(t)                                   1
    e^{i 2\pi f_c t}                            \delta(f - f_c)
    rect(t)                                     \mathrm{sinc}(f)
    \mathrm{sinc}(t)                            rect(f)
    e^{-\pi t^2}                                e^{-\pi f^2}
    e^{-t^2/2\sigma^2}                          \sqrt{2\pi}\,\sigma\, e^{-2\pi^2 \sigma^2 f^2}
    e^{-i\pi t^2}                               e^{-i\pi/4}\, e^{i\pi f^2}
    \pi e^{-2\pi |t|}                           1/(1 + f^2)
    \sum_{j=-\infty}^{\infty} \delta(t - j)     \sum_{j=-\infty}^{\infty} \delta(f - j)
    comb(t)                                     comb(f)
    \sum_{j=-K}^{K} \delta(t - j)               \sin\big( (2K+1)\pi f \big) / \sin(\pi f)

Table 1.1: Table of Fourier transform pairs.

where s(t) = s_I(t) + i s_Q(t) is the complex-baseband signal that represents the passband signal \tilde{s}(t). The amplitude of a passband signal can be written in terms of the root-mean-squared amplitude A_rms(t), defined by

    A_{rms}(t) = \sqrt{ \frac{1}{T} \int_T \tilde{s}^2(t)\, dt } = \sqrt{ \frac{1}{T} \int_T A^2(t) \cos^2\!\big( 2\pi f_c t + \phi(t) \big)\, dt } \approx \sqrt{ \tfrac{1}{2} A^2(t) },        (1.1.55)

where \frac{1}{T}\int_0^T \cos^2(2\pi f_c t)\, dt \approx 1/2 has been used, and the integration time T is large compared to 1/f_c and small compared to any temporal variation of A(t). If the complex-baseband signal is a function of both time and space, then the signal is called the complex signal envelope a(z, t), and is often expressed using a root-mean-squared amplitude.

The Fourier transform of the passband signal \tilde{s}(t) is

    \tilde{S}(f) = \int_{-\infty}^{\infty} \tilde{s}(t)\, e^{-i 2\pi f t}\, dt = \int_{-\infty}^{\infty} \mathrm{Re}\!\left[ s(t)\, e^{i 2\pi f_c t} \right] e^{-i 2\pi f t}\, dt.

Using the identity Re[z] = \tfrac{1}{2}(z + z^*) gives

    \tilde{S}(f) = \frac{1}{2} \int_{-\infty}^{\infty} \left( s(t)\, e^{i 2\pi f_c t} + s^*(t)\, e^{-i 2\pi f_c t} \right) e^{-i 2\pi f t}\, dt.

Applying the modulation property of the Fourier transform yields

    \tilde{S}(f) = \frac{1}{2} \left( S(f - f_c) + S^*(-f - f_c) \right),        (1.1.56)

where S(f) is the Fourier transform of the complex-baseband signal s(t). The notion of a passband signal implies that S(f - f_c) and S(f + f_c) have essentially no overlap.

A passband impulse response \tilde{h}(t) has a passband transfer function of the form

    \tilde{H}(f) = H(f - f_c) + H^*(-f - f_c),        (1.1.57)

which has the same functional form as (1.1.56), but without the factor of 1/2. Provided the terms H(f - f_c) and H(f + f_c) do not overlap,

    |\tilde{H}(f)|^2 = |H(f - f_c)|^2 + |H(-f - f_c)|^2.        (1.1.58)

To define the baseband equivalent of the passband system, the complex-baseband transfer function H(f) is defined as

    H(f) = \begin{cases} \tilde{H}(f + f_c) & f < f_c \\ 0 & f > f_c. \end{cases}

Using the modulation property of the Fourier transform given in (1.1.16) and (1.1.57), the real passband impulse response can be written as

    \tilde{h}(t) = h(t)\, e^{i 2\pi f_c t} + h^*(t)\, e^{-i 2\pi f_c t} = 2\,\mathrm{Re}\!\left[ h(t)\, e^{i 2\pi f_c t} \right],        (1.1.59)

where h(t) is the complex-baseband impulse response, which is the inverse Fourier transform of the complex-baseband transfer function H(f). The output passband signal \tilde{r}(t) has a Fourier transform given by

    \tilde{R}(f) = \tilde{S}(f)\tilde{H}(f) = \frac{1}{2} \left( R(f - f_c) + R^*(-f - f_c) \right),        (1.1.60)

which can be verified using the definitions of \tilde{S}(f) and \tilde{H}(f), noting that the two cross terms S(f - f_c)H^*(-f - f_c) and S^*(-f - f_c)H(f - f_c) are zero under the same set of constraints used to derive (1.1.58).

In summary, the output of a passband linear system \tilde{h}(t) for a passband signal \tilde{s}(t) at the input can be determined in either the time domain or the frequency domain using

    \tilde{r}(t) = \tilde{s}(t) ~ \tilde{h}(t)        \tilde{R}(f) = \tilde{S}(f)\tilde{H}(f).

With these translated to complex baseband using the same frequency reference, the output of the corresponding complex-baseband linear system h(t) with an input complex-baseband signal s(t) is

    r(t) = s(t) ~ h(t)        R(f) = S(f)H(f).
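A short numerical check of the passband and complex-baseband descriptions is sketched below, assuming Python with NumPy (not part of these notes). The envelope, phase, carrier frequency, and sample rate are arbitrary choices satisfying the slowly-varying assumption.

    import numpy as np

    fs = 10_000.0                      # sample rate
    t = np.arange(0, 1.0, 1 / fs)
    fc = 1_000.0                       # carrier, much faster than the envelope

    A = 1.0 + 0.3 * np.cos(2 * np.pi * 5 * t)      # slowly varying amplitude
    phi = 0.5 * np.sin(2 * np.pi * 3 * t)          # slowly varying phase

    s = A * np.exp(1j * phi)                                 # complex-baseband signal
    s_pb_1 = A * np.cos(2 * np.pi * fc * t + phi)            # passband form (1.1.52)
    s_pb_2 = np.real(s * np.exp(1j * 2 * np.pi * fc * t))    # passband form (1.1.54c)

    print(np.allclose(s_pb_1, s_pb_2))             # the two representations agree

    # Energy relation (1.1.74): the complex-baseband energy is about twice
    # the passband energy when the carrier is far above the modulation bandwidth.
    E_pb = np.sum(s_pb_1 ** 2) / fs
    E_bb = np.sum(np.abs(s) ** 2) / fs
    print(E_bb / E_pb)                             # ~2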

1.1.3 Signal Space

The set of all complex signals of finite energy on an interval [0, T] defines the signal space over that interval. Two elements within the signal space are deemed to be equivalent, or equal, if the energy of the difference of the two signals is zero. A countable set of signals {ψ_n(t)} of a signal space spans that signal space if every element of the signal space can be expressed as a linear combination of the ψ_n(t). This means that every s(t) can be written as

    s(t) = \sum_n s_n \psi_n(t),        (1.1.61)

in the sense that the difference between the left side and the right side has zero energy. The set {ψ_n(t)} is called a basis if the elements of the set are linearly independent and span the signal space. Every basis for a signal space is countably infinite. An orthonormal basis {ψ_n(t)} satisfies the additional requirement that

    \int_0^T \psi_m(t)\, \psi_n^*(t)\, dt = \delta_{mn},        (1.1.62)

for all m and n, where δ_{mn} is the Kronecker impulse. This means that each basis function satisfies

    \int_0^T |\psi_n(t)|^2\, dt = 1        (1.1.63a)

and

    \int_0^T \psi_m(t)\, \psi_n^*(t)\, dt = 0 \quad (m \neq n).        (1.1.63b)

For an orthonormal basis, the coefficient sn of the expansion in (1.1.61) is given by

    s_n = \int_0^T s(t)\, \psi_n^*(t)\, dt.        (1.1.64)

A set of basis functions must span the entire signal space, which implies that the number of functions in any basis for the signal space is infinite. A basis must be infinite, but not every infinite orthonormal set is a basis for the set of square-integrable functions.

A linear transformation on a signal space is a mapping from the space onto itself that satisfies the linearity properties. With respect to a fixed basis, a linear transformation can be described by a matrix, called the transformation matrix.

Inner Product For any orthonormal basis {ψ_m(t)}, a signal s(t) in the signal space over [0, T] is completely determined by an infinite sequence of complex components s_n, which are the coefficients of the expansion given in (1.1.61). These coefficients may be regarded as forming an infinitely long vector s called a signal vector. Given a signal vector s with complex components, define the conjugate transpose vector as the vector s† whose components are the complex conjugates of the corresponding components of the vector s. If s is a column vector, then s† is a row vector.

Using a(t) = \sum_m a_m \psi_m(t) (cf. (1.1.61)) and b(t) = \sum_n b_n \psi_n(t), define the inner product as

    a \cdot b = \int_0^T a(t)\, b^*(t)\, dt
              = \sum_m \sum_n a_m b_n^* \int_0^T \psi_m(t)\, \psi_n^*(t)\, dt
              = \sum_m \sum_n a_m b_n^* \delta_{mn}
              = \sum_m a_m b_m^*,        (1.1.65)

where (1.1.62) is used in going from the second line to the third line. Setting a(t) = b(t) in (1.1.65) immediately gives the energy statement

    \int_0^T |a(t)|^2\, dt = \sum_m |a_m|^2.        (1.1.66)

For a finite-energy signal a(t), this implies that |a_m|^2 goes to zero as m goes to infinity. For some integer N, an arbitrarily small amount of energy is discarded by including only N terms.

The term a_m is a component of the (column) signal vector a. The term b_m^* is a component of the (row) signal vector b†. Therefore,

    a \cdot b = b^\dagger a,        (1.1.67)

where b†a is the matrix product of a one-by-N matrix and an N-by-one matrix. Using (1.1.66), the energy in the signal s(t) is

    E = \int_0^T |s(t)|^2\, dt = \sum_n |s_n|^2 = |s|^2.        (1.1.68)

Similarly, the component sn in (1.1.64) is determined using a(t) = s(t) and b(t) = ψn(t)

    s_n = \int_0^T s(t)\, \psi_n^*(t)\, dt = s \cdot \psi_n.        (1.1.69)

This expression defines the projection of s(t) onto ψn(t). The vector ψn has the nth component equal to one and all other components equal to zero. It is a basis vector that corresponds to the basis function ψn(t) defined in (1.1.61). The set of all linear combinations of basis vectors along with an inner product is an instance of a Hilbert space.
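The expansion (1.1.61) and the energy relation (1.1.66) can be illustrated with a discretized orthonormal Fourier basis on [0, T]. The sketch below assumes Python with NumPy (not part of these notes); the test signal and the number of retained basis functions are arbitrary.

    import numpy as np

    T, N = 1.0, 2048
    t = np.arange(N) * (T / N)
    dt = T / N

    # Orthonormal basis psi_n(t) = exp(i 2 pi n t / T) / sqrt(T), n = -40..40.
    n_modes = np.arange(-40, 41)
    Psi = np.exp(2j * np.pi * np.outer(n_modes, t) / T) / np.sqrt(T)

    # An arbitrary finite-energy signal on [0, T].
    s = np.exp(-50 * (t - 0.5) ** 2) * np.cos(2 * np.pi * 12 * t)

    # Coefficients s_n = integral of s(t) psi_n^*(t) dt, as in (1.1.64).
    coeffs = (Psi.conj() * s).sum(axis=1) * dt

    # Partial reconstruction from the basis, as in (1.1.61).
    s_hat = (coeffs[:, None] * Psi).sum(axis=0).real

    E_signal = np.sum(np.abs(s) ** 2) * dt
    E_coeffs = np.sum(np.abs(coeffs) ** 2)
    print(E_coeffs / E_signal)             # ~1, the energy statement (1.1.66)
    print(np.max(np.abs(s - s_hat)))       # small truncation error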

Outer Product In contrast to the inner product operation a \cdot b = b^\dagger a defined in (1.1.65), which produces a scalar, the outer product operation of two vectors produces a transformation. Just as the inner product operation is invariant with respect to a change in basis, the outer product as an operation is invariant with respect to a change in basis. However, the matrix representation M of an outer product, with elements m_{ij} = a_i b_j^*, does depend on the basis.

The outer product of two column vectors is defined as

    a \otimes b = a b^\dagger.        (1.1.70)

The outer product is also called the dyadic product or the tensor product of the two vectors. The outer product distributes over both addition and scalar multiplication so that

    (\alpha a + \beta b) \otimes (\gamma c + \delta d) = \alpha\gamma (a \otimes c) + \alpha\delta (a \otimes d) + \beta\gamma (b \otimes c) + \beta\delta (b \otimes d).

Schwarz Inequality Any two signal vectors satisfy

    |s_1 \cdot s_2|^2 \leq |s_1|^2\, |s_2|^2,

which is known as the Schwarz inequality. For the signal space of complex functions with finite energy on the interval [0, T], the Schwarz inequality can be written using (1.1.65) as

    \left| \int_0^T s_1(t)\, s_2^*(t)\, dt \right|^2 \leq \int_0^T |s_1(t)|^2\, dt \int_0^T |s_2(t)|^2\, dt,        (1.1.71)

whereas for the set of square-integrable functions on the infinite line, it is

    \left| \int_{-\infty}^{\infty} s_1(t)\, s_2^*(t)\, dt \right|^2 \leq \int_{-\infty}^{\infty} |s_1(t)|^2\, dt \int_{-\infty}^{\infty} |s_2(t)|^2\, dt.        (1.1.72)

In either case, the equality holds if and only if s1(t) = ks2(t) for some constant k, possibly complex.

Distance in a Signal Space Using (1.1.65) and (1.1.68), the squared euclidean distance d_{12}^2 between two signals s_1(t) and s_2(t), or, equivalently, two signal vectors s_1 and s_2, is defined as

    d_{12}^2 = \int_0^T |s_1(t) - s_2(t)|^2\, dt
             = \int_0^T \left( |s_1(t)|^2 + |s_2(t)|^2 - s_1(t) s_2^*(t) - s_1^*(t) s_2(t) \right) dt
             = E_1 + E_2 - 2\,\mathrm{Re}[s_1 \cdot s_2].        (1.1.73)

This expression states that the squared euclidean distance between two signals depends on the energy of each signal as well as their inner product.

Now consider an infinite-duration passband signal

    \tilde{s}(t) = A(t) \cos\!\big( 2\pi f_c t + \phi(t) \big) = \mathrm{Re}\!\left[ s(t)\, e^{i 2\pi f_c t} \right].

Using Parseval's relationship and the modulation property of the Fourier transform, the energy in this passband signal is

    E_{\tilde{s}} = \int_{-\infty}^{\infty} \tilde{s}^2(t)\, dt = \int_{-\infty}^{\infty} |\tilde{S}(f)|^2\, df
                  = \int_{-\infty}^{\infty} \left| \tfrac{1}{2} S(f - f_c) + \tfrac{1}{2} S^*(-f - f_c) \right|^2 df
                  = \frac{1}{2} \int_{-\infty}^{\infty} |S(f)|^2\, df = \tfrac{1}{2} E_s,        (1.1.74)

provided that S(f - f_c)S(f + f_c) = 0. Equation (1.1.74) states that, under this condition, the energy in a complex-baseband signal s(t) is twice the energy in the passband signal \tilde{s}(t). Similarly, under this condition, the squared euclidean distance between two complex-baseband signals is twice the squared distance between the equivalent passband signals, so that

    d_{ij}^2(\text{complex-baseband signal}) = 2\, d_{ij}^2(\text{passband signal}).        (1.1.75)

Using the same line of reasoning, the cosine and sine components are orthogonal with

    \int_{-\infty}^{\infty} s_I(t) \cos(2\pi f_c t)\, s_Q(t) \sin(2\pi f_c t)\, dt = 0.        (1.1.76)

For a narrowband signal, both A(t) and φ(t) vary slowly compared to the carrier frequency f_c. Therefore, over a finite time interval T much greater than 1/f_c, the energy in the signal is well approximated by (1.1.74), with the cosine and sine components being nearly orthogonal.

Fourier Series A function in signal space on the interval [0,T ] can be expanded in a Fourier series. The sinusoids are the basis functions and the Fourier coefficients are the expansion coefficients.

Nyquist-Shannon Series A deterministic baseband waveform s(t) whose spectrum S(f) is zero for |f| larger than W is called a bandlimited waveform. Because (-W, W) defines an interval on the frequency axis, the set of functions on this interval is a signal space and is spanned by a countable set of basis functions. One such set of basis functions is given by the Nyquist-Shannon sampling theorem.

The sampling theorem can be described by setting W = 1/2, and multiplying s(t) by comb(t) (cf. (1.1.45)). This produces a sampled waveform with the samples s(j) spaced by the sampling interval T_s = 1. Two sampled waveforms are shown in Figure 1.4 for two different sampling rates. Using comb(t) ←→ comb(f) and its dual (cf. Table 1.1), and the convolution property of the Fourier transform (cf. (1.1.15)) gives

    s(t)\,comb(t) \longleftrightarrow S(f) ~ comb(f)        (1.1.77a)
    s(t)\,comb(t) ~ \mathrm{sinc}(t) \longleftrightarrow \big[ S(f) ~ comb(f) \big] rect(f).        (1.1.77b)

The left side of (1.1.77a) is an infinite sequence of impulses with the area of the kth impulse equal to the kth sample value. The right side is the spectrum of the sampled waveform S(f) ~ comb(f) and is shown at the bottom of Figure 1.4 for two different sampling rates. For the left set of curves in Figure 1.4a, multiplying the right side by rect(f), which is shown as a dashed line, recovers S(f) because the support of S(f) is [-1/2, 1/2] and thus the images of the original spectrum S(f) do not overlap in S(f) ~ comb(f). Multiplication by rect(f) in frequency corresponds to a convolution in time with sinc(t). Because [S(f) ~ comb(f)]rect(f) = S(f), the convolution of sinc(t) with the left side of (1.1.77b) recovers s(t) so that

    s(t) = \mathrm{sinc}(t) ~ \big( s(t)\,comb(t) \big)
         = \sum_{j=-\infty}^{\infty} s(j)\,\mathrm{sinc}(t - j).        (1.1.78)

Figure 1.4: (a) Sampling at greater than the Nyquist rate. (b) Sampling at less than the Nyquist rate showing the effect of aliasing. (From top to bottom, the panels show the comb in time, the baseband signal, the sampled signal, and the spectrum of the sampled signal with the reconstruction rect function.)

This expression states that a waveform s(t) bandlimited to [-1/2, 1/2] can be expanded using an orthogonal set of basis functions {sinc(t - j)}, with the coefficients simply being samples s(j) of the bandlimited waveform s(t) for a sampling interval T_s = 1. The expression for an arbitrary sampling interval T_s can be determined by applying the scaling property of the Fourier transform and gives

    s(t) = \sum_{j=-\infty}^{\infty} s(jT_s)\,\mathrm{sinc}(2W t - j),        (1.1.79)

where T_s = 1/2W. In this way, the sequence of sinc functions is seen as the sequence of interpolating functions for a bandlimited signal.

The images of the original signal spectrum S(f) shown in Figure 1.4 are offset by the sampling rate R_s = 1/T_s. When R_s ≥ 2W, the images do not overlap. The minimum sampling rate R_s = 2W that generates nonoverlapping images of the signal spectrum is called the Nyquist rate. When the sampling rate is greater than or equal to the Nyquist rate, the images do not overlap and the original signal s(t) can be reconstructed as given by (1.1.79). The set of curves in Figure 1.4a shows a spectrum for a signal that is sampled at greater than the Nyquist rate.

When the sampling rate is less than the Nyquist rate, the images of the original signal spectrum S(f) overlap. This effect is called aliasing and is shown in the set of curves in Figure 1.4b. In this case, any filter that reconstructs all of the frequency components of the original signal must also pass some frequency components from one or more images. Aliasing is a form of signal distortion that replicates frequency components in the original signal at other frequencies in the reconstructed signal.
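The interpolation formula (1.1.79) can be demonstrated directly. The sketch below assumes Python with NumPy (not part of these notes); the test waveform sinc^2(3t) is an arbitrary choice that is strictly bandlimited to 3 Hz, below the reconstruction bandwidth W = 4 Hz, and the sample set is truncated to a finite range.

    import numpy as np

    W = 4.0                               # reconstruction bandwidth in (1.1.79)
    Ts = 1.0 / (2.0 * W)                  # sampling interval, R_s = 2W samples/s

    def s(t):
        # sinc^2 has a triangular spectrum confined to |f| <= 3 Hz < W.
        return np.sinc(3.0 * t) ** 2

    j = np.arange(-400, 401)              # finite set of sample indices
    samples = s(j * Ts)

    t = np.linspace(-2.0, 2.0, 1001)      # dense grid for the reconstruction
    # Sinc interpolation: s(t) = sum_j s(j Ts) sinc(2 W t - j).
    s_rec = np.sum(samples[:, None] * np.sinc(2.0 * W * t[None, :] - j[:, None]), axis=0)

    print(np.max(np.abs(s_rec - s(t))))   # small truncation error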

19 Matrices

A matrix A is a doubly-indexed set {a_{ij}, i = 1, ..., n, j = 1, ..., m} of real or complex numbers conventionally interpreted as a two-dimensional array A = [a_{ij}]. The conjugate of A is the matrix A^* = [a_{ij}^*]. The transpose of A is the matrix A^T = [a_{ji}]. The conjugate transpose of A is the matrix A^\dagger = [a_{ji}^*]. A matrix with n = m is a square matrix.

Trace of a Matrix The trace of a square matrix A is defined as the sum \sum_n A_{nn} of the diagonal elements of A. The trace is an inherent property of the transformation represented by the matrix and is independent of the choice of basis. Accordingly, the trace is the sum of the eigenvalues of the matrix. The trace operation has the following properties:

    \mathrm{trace}(cA) = c\,\mathrm{trace}(A)        (1.1.80a)
    \mathrm{trace}(A + B) = \mathrm{trace}(A) + \mathrm{trace}(B)        (1.1.80b)
    \mathrm{trace}(AB) = \mathrm{trace}(BA).        (1.1.80c)

The trace of a square matrix that can be expressed as an outer product of two vectors xy† is equal to the inner product of the same two vectors y†x. This is given by

trace(xy†) = y†x. (1.1.81)

The proof of this statement is asked as an end-of-chapter exercise.

Determinant of a Matrix The determinant of a square matrix A is a real or complex number defined in the usual way. The determinant is an inherent property of the transformation represented by a matrix and is independent of the choice of the basis. The determinant, denoted det(·), is defined by the Laplace recursion formula. Let A_{ij} be the (n - 1) by (n - 1) matrix obtained from the n by n matrix A by striking out the ith row and the jth column. Then, for any fixed i,

    \det A = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij},        (1.1.82)

where a_{ij} is the element of A indexed by i and j.

A matrix whose determinant is nonzero is a full-rank matrix. For this matrix, all the eigenvalues are nonzero, and if they are distinct, then the eigenvectors are linearly independent, with the number equal to the size of the square matrix. The rank of a matrix is defined as the maximum number of linearly independent row vectors in the matrix. Equivalently, the rank of a matrix is the maximum number of linearly independent column vectors in the matrix. The rank of a matrix is equal to the size of the largest square submatrix with a nonzero determinant. The determinant has the following properties for an n by n matrix:

    \det(AB) = \det A\, \det B        (1.1.83a)
    \det(cA) = c^n \det A        (1.1.83b)
    \det A = \frac{1}{\det A^{-1}},        (1.1.83c)

provided that det A is nonzero, where A^{-1} is the matrix inverse of A. The trace and the determinant are the two important invariant scalar metrics describing the characteristics of a square matrix.
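The invariance properties of the trace and determinant, and the identity (1.1.81), can be checked numerically. The sketch below assumes Python with NumPy (not part of these notes); the matrices are arbitrary complex test cases.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    eig_A = np.linalg.eigvals(A)

    # Trace properties (1.1.80) and the trace as the sum of the eigenvalues.
    print(np.isclose(np.trace(A @ B), np.trace(B @ A)))
    print(np.isclose(np.trace(A), eig_A.sum()))

    # Determinant properties (1.1.83) and the determinant as the eigenvalue product.
    print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))
    print(np.isclose(np.linalg.det(A), np.prod(eig_A)))

    # trace(x y^dagger) = y^dagger x, as in (1.1.81).
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    print(np.isclose(np.trace(np.outer(x, y.conj())), y.conj() @ x))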

Hermitian Matrices A matrix that is invariant under the conjugate transpose satisfies A^\dagger = A. The transformation it represents is called a self-adjoint transformation, and the matrix is called a hermitian matrix. Every hermitian matrix can be diagonalized by a change of basis and has only real eigenvalues. However, not every matrix with real eigenvalues is hermitian. The eigenvectors of a hermitian matrix are orthogonal. By normalizing each eigenvector, the inner products satisfy \hat{e}_j^\dagger \hat{e}_j = 1 for every j, and the set of normalized eigenvectors forms an orthonormal basis.

A transformation matrix A that satisfies AA^\dagger = A^\dagger A = I is a unitary matrix representing a unitary transformation, where I is the identity matrix. The inverse of a unitary matrix satisfies A^{-1} = A^\dagger. Multiplication of a vector by a unitary matrix preserves length and can be regarded as a generalized rotation in the space spanned by the eigenvectors of A.

Discrete Linear Transformations

A discrete linear transformation, represented by the transformation matrix A, can be represented as a vector-matrix product

r = As.

When the length of the output vector r is equal to the length of the input vector s, the matrix A is a square matrix. A discrete linear transformation corresponding to a matrix A is characterized by a finite set of eigenvectors {e_j} and a corresponding finite set of eigenvalues {λ_j}, possibly complex, such that

    A e_j = \lambda_j e_j.

The eigenvectors are linearly independent when the eigenvalues are distinct, and so form a basis; for a hermitian matrix they are orthogonal, and can be constrained to be orthogonal even when the eigenvalues are not distinct. The eigenvalues are the zeros of det(A - λI) = 0, which is a polynomial in λ of degree n.

Projections The set of orthonormal eigenvectors {\hat{e}_j} of a hermitian matrix representation of a self-adjoint transformation can be used as a set of orthonormal basis vectors. The inner product of an orthonormal basis vector with itself is equal to one. The outer product of an orthonormal basis vector with itself is a matrix P_j = \hat{e}_j \hat{e}_j^\dagger with only a single nonzero element equal to one, which is on the diagonal of the matrix. This matrix is referred to as a projection matrix P_j, where the subscript denotes that the projection is onto the jth orthonormal basis vector \hat{e}_j. For a given basis, the sum of all projection matrices P_j equals the identity matrix I in the signal space spanned by that basis. Thus

    I = \sum_j P_j = \sum_j \hat{e}_j \hat{e}_j^\dagger.        (1.1.84)

The transformation A expressed by a hermitian matrix can be re-expressed in terms of its eigenvalues λ_j and its projection matrices P_j. This is a diagonal matrix D given by

    D = \sum_j \lambda_j P_j = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) = \sum_j \hat{e}_j \lambda_j \hat{e}_j^\dagger.        (1.1.85)

The same transformation A can also be re-expressed using yet another basis {\hat{x}_m}. Each eigenvector \hat{e}_j in the new basis is given as \hat{e}_j = \sum_n a_n \hat{x}_n. The matrix X in the new basis is

    X = \sum_j \lambda_j \hat{e}_j \hat{e}_j^\dagger
      = \sum_{m,n} \sum_j \lambda_j a_n a_m^* \hat{x}_n \hat{x}_m^\dagger
      = \sum_{m,n} X_{mn}\, \hat{x}_n \hat{x}_m^\dagger,        (1.1.86)

where the matrix X has elements X_{mn} = \sum_j \lambda_j a_n a_m^*. This equation states that any hermitian matrix can be expressed as a linear combination of the outer products of the basis vectors that are used to represent A. If the basis consists of the set of eigenvectors of A, then (1.1.86) reduces to (1.1.85).

Commuting Transformations Two square matrices A and B of the same size commute if AB = BA. A matrix A that commutes with A^\dagger is called a normal matrix. Then AA^\dagger = A^\dagger A. Two matrices with a common set of eigenvectors commute. For this case, the order in which the transformations are applied, AB or BA, does not affect the outcome. Two transformations that are represented by matrices comprise a relationship known as a commutator, defined as

    [A, B] = AB - BA.        (1.1.87)

Two matrices that do not commute do not have a common set of eigenvectors, the commutator [A, B] is nonzero, and the order in which the transformations are applied does affect the outcome. Two square matrices A and B that do not commute can always be embedded in two larger square matrices that do commute by appending additional rows and columns. The proof of this statement is asked as an exercise at the end of the chapter.

Singular Value Decomposition A transformation described by a matrix need not have real eigenvalues and the eigenvectors need not be orthogonal. A useful decomposition of the matrix A, called the singular-value decomposition, is

    A = U M V^\dagger.        (1.1.88)

The matrices U and V are each unitary. The columns of U are the eigenvectors of AA^\dagger, whereas the columns of V are the eigenvectors of A^\dagger A. All of the nonzero elements of the matrix M are among the diagonal elements m_k. These are called the singular values of A. These values are the nonnegative square roots \sqrt{\xi_k} of the eigenvalues \xi_k of the hermitian matrix AA^\dagger so that \xi_k = |m_k|^2. For a hermitian matrix, because A^\dagger is equal to A, U is equal to V and the matrix is diagonalized with the orthogonal eigenvectors of A.
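The decomposition (1.1.88) and the relation between the singular values and the eigenvalues of A^\dagger A can be verified numerically. The sketch below assumes Python with NumPy (not part of these notes); the rectangular complex matrix is arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

    # Singular-value decomposition A = U M V^dagger.
    U, m, Vh = np.linalg.svd(A, full_matrices=False)

    print(np.allclose(U @ np.diag(m) @ Vh, A))        # reconstruction of A
    print(np.allclose(U.conj().T @ U, np.eye(3)))     # columns of U are orthonormal
    print(np.allclose(Vh @ Vh.conj().T, np.eye(3)))   # columns of V are orthonormal

    # The squared singular values equal the eigenvalues of A^dagger A (and A A^dagger).
    xi = np.sort(np.linalg.eigvalsh(A.conj().T @ A))[::-1]
    print(np.allclose(m ** 2, xi))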

1.2 Random Signals

Probability and are important topics for the subject of communications because the in- formation that is conveyed is always random and a communications channel is always noisy. To introduce a quantitative discussion of a , consider the random thermally-generated voltage across a resistor as a function of time. Because the voltage is random, the specific mea- sured waveform is only one of an infinite number of waveforms that could have been measured. Each possible waveform is called a sample function, or realization. The collection of all possible realizations defines a random process.k Four sample functions of the voltage waveform are shown in Figure 1.5. A random process is described in terms of its amplitude structure and its time structure.

Figure 1.5: (a) Four realizations of a random voltage waveform. The slice across all possible realizations at a time t_1 defines the random variable v(t_1) with a corresponding probability density function f_v(v) shown in (b).

In Section 1.2.1, the amplitude structure of a random process at one time instant is described using a function. In Section 1.2.2, the temporal (or spatial) structure of a random process is described using correlation functions and power density spectra.

1.2.1 Probability Distribution Functions

Consider a slice at a fixed time t_1 through all possible realizations of the random voltage shown in Figure 1.5a. The voltage at time t_1 is a random variable, and as such is denoted by v. A specific instance, or realization, of the random variable v is denoted by v. If the sample function v(t) represents a complex random function of time, such as a complex-baseband signal, then the random variable v(t_1) defined at time t_1 is a complex random variable v consisting of a real part and an imaginary part.

Associated with any discrete random variable is a probability distribution function called a probability mass function, denoted by p_x(x) or p(x).[l] Associated with any real continuous random variable x is a cumulative probability distribution function (cdf), F_x(x), and a probability density function, f_x(x). The cumulative probability distribution function is defined as

    F_x(x) = \Pr\{ x \leq x \},

[k] This is also called a stochastic process.
[l] The underlined subscript on p denoting the random variable is sometimes omitted for brevity.

where Pr{x ≤ x} is the probability that the random variable x is less than or equal to x. Every cumulative probability distribution function is nonnegative, monotonically increasing, and goes to one as x goes to infinity. The probability density function f_x(x) is related to the cumulative probability distribution function F_x(x) by

    f_x(x) = \frac{d}{dx} F_x(x),        (1.2.1)

provided the derivative exists. The probability density function f_x(x) is a nonnegative real function that integrates to one. The definite integral

    \Pr\{ x_1 < x < x_2 \} = \int_{x_1}^{x_2} f_x(x)\, dx

is the probability that the random variable x lies in the interval between x_1 and x_2. The statistical expectation, or the expected value, of the random variable x is defined as

    \langle x \rangle = \int_{-\infty}^{\infty} x\, f_x(x)\, dx.        (1.2.2)

The expectation of x is also called the mean or the first moment of the random variable x. In a similar way, the expected value of a function g(x) of the random variable x is

    \langle g(x) \rangle = \int_{-\infty}^{\infty} g(x)\, f_x(x)\, dx,        (1.2.3)

provided the integral exists. The nth moment of x is defined as

    \langle x^n \rangle = \int_{-\infty}^{\infty} x^n\, f_x(x)\, dx,        (1.2.4)

provided the integral exists. In order for the nth moment to exist, the probability density function must decrease faster than |x|^{-(n+1)} as |x| goes to infinity.

The variance \sigma_x^2 of the random variable x is defined as (cf. (1.1.30))

    \sigma_x^2 = \langle (x - \langle x \rangle)^2 \rangle
               = \langle x^2 \rangle - 2 \langle x \rangle \langle x \rangle + \langle x \rangle^2
               = \langle x^2 \rangle - \langle x \rangle^2.        (1.2.5)

The variance measures the spread of the random variable x about the mean \langle x \rangle. The square root \sigma_x of the variance is called the standard deviation of x. This is the root-mean-squared (rms) value of x if x has zero mean.

As an example, consider the gaussian probability density function (cf. (1.2.18)), which will be discussed in detail later in this section. An important property of a gaussian random variable is that all moments of the probability density function of an order larger than two can be expressed

in terms of the first-order and second-order moments.[m] Therefore, the gaussian distribution is completely characterized by its mean and variance.

As another example, consider the probability density function

    f_x(x) = \begin{cases} \lambda x^{-(\lambda+1)} & x \geq 1 \\ 0 & x < 1, \end{cases}        (1.2.6)

which is called the Pareto probability density function with a Pareto index λ that is a positive number. It is asked as a problem at the end of the chapter to show that the mean is λ/(λ - 1) for λ > 1 and otherwise is infinite, and to show that the variance is λ/[(λ - 1)^2(λ - 2)] for λ > 2 and otherwise is infinite.
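The Pareto moments quoted above can be checked by Monte Carlo sampling. The sketch below assumes Python with NumPy (not part of these notes); the index λ = 5 and the sample size are arbitrary choices for which both moments are finite.

    import numpy as np

    rng = np.random.default_rng(4)
    lam = 5.0                                 # Pareto index (> 2: finite variance)

    # Inverse-CDF sampling: if U ~ Uniform(0,1), then X = U**(-1/lam) has the
    # density lam * x**-(lam+1) on x >= 1.
    x = rng.uniform(size=2_000_000) ** (-1.0 / lam)

    print(x.mean(), lam / (lam - 1.0))                          # ~1.25
    print(x.var(), lam / ((lam - 1.0) ** 2 * (lam - 2.0)))      # ~0.104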

Joint Probability Distributions A probability density function can be defined for more than one random variable. The probability density function for two random variables, f_{x,y}(x, y), is called a bivariate probability density function or a joint probability density function. The probability that the joint event {x_1 < x < x_2} ∩ {y_1 < y < y_2} occurs is equal to the volume under f_{x,y}(x, y) over the rectangle supported by the two corners (x_1, y_1) and (x_2, y_2). Several other probability density functions can be defined in terms of the joint probability density function f_{x,y}(x, y).

Every joint probability density function f_{x,y}(x, y) is associated with marginal probability density functions and conditional probability density functions. The marginal probability density functions f_x(x) and f_y(y) are determined by integrating f_{x,y}(x, y) over the range of the other variable, a process called marginalization. Thus,

    f_y(y) = \int_{-\infty}^{\infty} f_{x,y}(x, y)\, dx        (1.2.7a)

    f_x(x) = \int_{-\infty}^{\infty} f_{x,y}(x, y)\, dy.        (1.2.7b)

Substituting the marginal (1.2.7a) into (1.2.2) gives

    \langle y \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, f_{x,y}(x, y)\, dx\, dy

as the mean of y. The probability density function of y given that the event {x = x} has occurred is called the conditional probability density function. It is denoted by f_{y|x}(y|x) and is given by

    f_{y|x}(y|x) = \frac{f_{x,y}(x, y)}{f_x(x)},

where the notation y|x indicates that the probability density function of y depends on, or is conditioned by, the event {x = x}. Likewise, the conditional probability density function denoted by f_{x|y}(x|y) is the probability density function of x given that the event {y = y} has occurred.

[m] This statement is called the Isserlis theorem. See Reed (1962).

The joint probability density function of the events {x = x} and {y = y} is equal to the probability density function of the event {x = x} multiplied by the conditional probability density function of the event {y = y | x = x}, so that

    f_{x,y}(x, y) = f_x(x)\, f_{y|x}(y|x).        (1.2.8)

Similarly, f_{x,y}(x, y) = f_y(y)\, f_{x|y}(x|y). Equating these two expressions gives

    f_{x|y}(x|y) = \frac{f_x(x)\, f_{y|x}(y|x)}{f_y(y)},        (1.2.9)

which is a form of Bayes rule.[n] When x represents a transmitted signal value and y represents the value of the received signal, the term on the left is called the posterior probability density function. It is the probability density that x was transmitted given that y is received.

The marginal probability density function f_y(y) can be expressed in terms of the conditional density function f_{y|x}(y|x) and the other marginal probability density function f_x(x) by integrating both sides of (1.2.8) with respect to x and using (1.2.7) to give

    f_y(y) = \int_{-\infty}^{\infty} f_x(x)\, f_{y|x}(y|x)\, dx.        (1.2.10)

Correlation and Independence The random variables x and y are independent, and f_{x,y}(x, y) is called a product distribution, if the joint probability density function can be written as f_{x,y}(x, y) = f_x(x) f_y(y). If two random variables are independent, then knowing the realization of one random variable does not affect the probability density function of the other random variable. This means that f_{y|x}(y|x) = f_y(y) for independent random variables x and y.

If x and y are independent, then the probability density function of the sum z = x + y of the random variables is

    f_z(z) = \int_{-\infty}^{\infty} f_x(z - y)\, f_y(y)\, dy = f_x(z) ~ f_y(z),        (1.2.11)

where ~ is the convolution operator defined in (1.1.3). The derivation of this equation is discussed in an exercise at the end of the chapter.

The correlation of two real random variables x and y is the expected value of their product. Then

    \langle xy \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y\, f_{x,y}(x, y)\, dx\, dy.        (1.2.12)

The covariance cov is defined as

    \mathrm{cov} = \langle xy \rangle - \langle x \rangle \langle y \rangle.        (1.2.13)

If at least one of the two random variables has zero mean, then cov = \langle xy \rangle.

Two random variables x and y are uncorrelated if \langle xy \rangle = \langle x \rangle \langle y \rangle in (1.2.12) or if cov = 0 in (1.2.13). If the random variables x and y are uncorrelated and at least one has a zero mean, then \langle xy \rangle = cov = 0.

[n] Bayes rule is simply an immediate consequence of the definition of marginal and conditional distributions.

26 Characteristic Functions The expected value of the function eiωx of the random variable x that has a probability density function f(x) is called the characteristic function Cx(ω)

    C_x(\omega) = \langle e^{i\omega x} \rangle
                = \int_{-\infty}^{\infty} f_x(x)\, e^{i\omega x}\, dx
                = \sum_{n=0}^{\infty} \frac{(i\omega)^n}{n!} \langle x^n \rangle,        (1.2.14)

provided the moments exist, where the power-series expansion for e^x has been used in the last expression. For a discrete random variable with a probability mass function p(k) on the integers, the characteristic function is

    C_k(\omega) = \sum_{k=-\infty}^{\infty} e^{i\omega k}\, p(k).        (1.2.15)

Using the convolution property of the Fourier transform given in (1.1.14), the characteristic function C_z(\omega) of the probability density function f_z(z) for the sum of two independent random variables x and y (cf. (1.2.11)) is equal to the product C_x(\omega) C_y(\omega) of the characteristic functions of the two probability density functions for the two random variables. The inverse Fourier transform of this product yields the desired probability density function

    f_z(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} C_x(\omega)\, C_y(\omega)\, e^{-i\omega z}\, d\omega.        (1.2.16)

This expression is readily extended to multiple independent random variables.

The nth moment of a probability density function, if it exists, can be determined by differentiation of the characteristic function:

    \langle x^n \rangle = \frac{1}{i^n} \frac{d^n}{d\omega^n} C_x(\omega) \Big|_{\omega = 0}.        (1.2.17)

The derivation of this expression is assigned as a problem at the end of the chapter.
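The statement that the density of a sum of independent random variables is the convolution of the densities, (1.2.11), or equivalently the product rule for characteristic functions, can be checked on a simple case. The sketch below assumes Python with NumPy (not part of these notes); the exponential densities and grid spacing are arbitrary choices, and the exact density of the sum of two unit-rate exponentials is z e^{-z}.

    import numpy as np

    dz = 1e-3
    z = np.arange(0.0, 10.0, dz)
    f_x = np.exp(-z)                   # density of an exponential(1) random variable
    f_y = np.exp(-z)

    # Density of the sum via the convolution (1.2.11), evaluated on the grid.
    f_z = np.convolve(f_x, f_y)[: z.size] * dz

    # Compare with the exact density z * exp(-z) of the sum.
    print(np.max(np.abs(f_z - z * np.exp(-z))))     # ~dz, the grid resolution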

Probability Density Functions Used in Communication Theory

Probability density functions and probability mass functions are regularly encountered in the analysis of communication systems. The most relevant probability density functions for the analysis of digital communication systems are reviewed in this section.

Gaussian Probability Density Function A gaussian random variable has a gaussian probability density function defined by

    f_x(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - m)^2/2\sigma^2}.        (1.2.18)

It is easy to verify that the mean of x is m and the variance is σ^2. A unique property of gaussian random variables is that any weighted sum of multiple gaussian random variables, whether independent or dependent, is also a gaussian random variable.

The probability that a zero-mean, unit-variance gaussian random variable exceeds a value z, Pr{x > z}, is expressed in terms of the complementary error function, which is denoted by erfc and defined as[o]

    \frac{1}{\sqrt{2\pi}} \int_z^{\infty} e^{-x^2/2}\, dx = \frac{1}{2}\,\mathrm{erfc}\!\left( \frac{z}{\sqrt{2}} \right),        (1.2.19)

where erfc(z) = 1 - erf(z), with erf(z) being the error function, defined as

    \mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-s^2}\, ds.

For large arguments, the complementary error function can be approximated by

    \mathrm{erfc}(x) \approx \frac{1}{x\sqrt{\pi}}\, e^{-x^2}.        (1.2.20)
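The tail probability (1.2.19) and the large-argument approximation (1.2.20) can be compared against a Monte Carlo estimate. The sketch below assumes Python with NumPy and SciPy (not part of these notes); the thresholds and sample size are arbitrary.

    import numpy as np
    from scipy.special import erfc

    rng = np.random.default_rng(5)
    x = rng.standard_normal(5_000_000)     # zero-mean, unit-variance gaussian samples

    for z in (0.5, 1.0, 2.0):
        tail_mc = np.mean(x > z)                     # Monte Carlo estimate of Pr{x > z}
        tail_exact = 0.5 * erfc(z / np.sqrt(2.0))    # closed form from (1.2.19)
        tail_approx = np.exp(-z ** 2 / 2) / (z * np.sqrt(2 * np.pi))  # from (1.2.20)
        print(z, tail_mc, tail_exact, tail_approx)   # the approximation improves as z grows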

A multivariate gaussian probability density function is a joint probability density function for a block x of real random variables with components x_i given by

    f_x(x) = \frac{1}{\sqrt{(2\pi)^N \det C}}\, e^{-\frac{1}{2} (x - \langle x \rangle)^T C^{-1} (x - \langle x \rangle)},        (1.2.21)

where C is the real covariance matrix defined for any multivariate probability density function as

\mathbf{C} = \left\langle (\mathbf{x}-\langle\mathbf{x}\rangle)(\mathbf{x}-\langle\mathbf{x}\rangle)^{\mathrm{T}} \right\rangle.   (1.2.22)

The square matrix C has a determinant det C. The diagonal matrix element C_ii is the variance of the random variable x_i. The off-diagonal matrix element C_ij is the covariance (cf. (1.2.13)) of the two random variables x_i and x_j. These two random variables are uncorrelated if C_ij equals zero. In general, being uncorrelated is not a strong statement, but for gaussian random variables it implies that they are independent.
It is possible to have a joint probability density function such that each marginal density function is a gaussian probability density function, yet the joint probability density function is not jointly gaussian, and so is not given by (1.2.21). This means that knowing that each marginal probability density function is gaussian is not sufficient to infer that the joint probability density function is jointly gaussian. This is discussed in an end-of-chapter exercise.
A zero-mean bivariate gaussian random variable consists of two random, zero-mean gaussian components x and y, which may be correlated. The covariance matrix given in (1.2.22) is

\mathbf{C} = \begin{pmatrix} \sigma_x^2 & \rho_{xy}\sigma_x\sigma_y \\ \rho_{xy}\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix},   (1.2.23)

where

\rho_{xy} \doteq \frac{\langle xy \rangle}{\sigma_x \sigma_y}   (1.2.24)

is defined as the correlation coefficient.


Figure 1.6: Contours of the joint gaussian probability density function f_{x,y}(x,y) as a function of the correlation coefficient ρ_xy: (a) ρ_xy = 0, (b) ρ_xy = 0.5.

An example of a two-dimensional joint gaussian probability density function is shown in plan view in Figure 1.6. If σ_x = σ_y = σ, then (1.2.21) reduces to

f_{x,y}(x,y) = \frac{1}{2\pi\sigma^2\sqrt{1-\rho_{xy}^2}}\, \exp\!\left( -\frac{x^2 - 2\rho_{xy}xy + y^2}{2\sigma^2\,(1-\rho_{xy}^2)} \right).   (1.2.25)

Moreover, if ρ_xy = 0, then f_{x,y}(x,y) is a product distribution in the chosen coordinate system. For this case, the bivariate gaussian density function is called a circularly-symmetric density function, with the bivariate gaussian random variable called a circularly-symmetric gaussian random variable. The joint gaussian probability density function f_{x,y}(x,y), now also including a nonzero mean for each component, can then be written as

f_{x,y}(x,y) = \left( \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\langle x\rangle)^2/2\sigma^2} \right) \left( \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\langle y\rangle)^2/2\sigma^2} \right),   (1.2.26)

where the probability density function of each component is written in the form of (1.2.18). Therefore, uncorrelated gaussian random variables are independent.
For a set of N independent real gaussian random variables with C = σ²I_N, the joint probability density function is

f_{\mathbf{x}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi\sigma^2)^N}}\; e^{-|\mathbf{x}-\langle\mathbf{x}\rangle|^2/2\sigma^2},   (1.2.27)

which factors as a product of the N single-variable gaussian densities. A complex version of (1.2.27) is given in (1.2.33).
A covariance matrix is a hermitian matrix and can be diagonalized by a change of basis. Therefore, any multivariate gaussian probability density function has a basis for which the probability density function expressed in this basis is a product distribution. The resulting marginal gaussian random variables in this basis are independent, but need not have the same mean and variance.
(Footnote: This probability is also expressed using the equivalent function Q(x) ≐ (1/2) erfc(x/√2).)

For example, consider the two-dimensional gaussian probability density function given in (1.2.25) with diagonal elements σ_x² = σ_y² = σ² and off-diagonal elements σ²ρ_xy. Define a new basis (x′, y′) that is a rotation of the original basis (x, y). The components in the new basis for this example can be expressed by a unitary transformation R of the components in the original basis as given by

\begin{pmatrix} x' \\ y' \end{pmatrix} = \mathbf{R} \begin{pmatrix} x \\ y \end{pmatrix}.

The matrix R is generated from the normalized eigenvectors of the covariance matrix C given in (1.2.23) and satisfies the matrix equation

\mathbf{R}^{\mathrm{T}} \mathbf{C}\, \mathbf{R} = \mathbf{D},

where D is a diagonal matrix with diagonal elements given by the eigenvalues of C. Using these eigenvalues, the variances of the uncorrelated gaussian random variables in this new basis are σ_{x′}² = σ²(1 + ρ_xy) and σ_{y′}² = σ²(1 − ρ_xy), which can be equal only if ρ_xy = 0. Using the normalized eigenvectors of C, the components of the new basis are x′ = (x + y)/√2 and y′ = (x − y)/√2. The joint gaussian probability density function in the new basis is a product distribution given by

f(x', y') = \left[ \frac{1}{\sqrt{2\pi\sigma^2(1+\rho_{xy})}}\, e^{-x'^2/2\sigma^2(1+\rho_{xy})} \right] \left[ \frac{1}{\sqrt{2\pi\sigma^2(1-\rho_{xy})}}\, e^{-y'^2/2\sigma^2(1-\rho_{xy})} \right],

which is written to show that each marginal probability density function in the new basis is an independent gaussian probability density function of the form of (1.2.18), with each distribution having a different variance.
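A quick numerical check of this diagonalization is sketched below in Python. The values σ = 1 and ρ_xy = 0.5 are illustrative assumptions; the code builds the covariance matrix of (1.2.23) with σ_x = σ_y, diagonalizes it with its eigenvectors, and verifies that rotated samples are uncorrelated with variances σ²(1 ± ρ_xy).

```python
import numpy as np

sigma, rho = 1.0, 0.5
C = sigma**2 * np.array([[1.0, rho],
                         [rho, 1.0]])

# Columns of R are the normalized eigenvectors, so R^T C R is diagonal.
eigvals, R = np.linalg.eigh(C)
D = R.T @ C @ R

print("eigenvalues:", eigvals)          # sigma^2 (1 - rho) and sigma^2 (1 + rho)
print("R^T C R:\n", np.round(D, 12))    # diagonal matrix of the eigenvalues

# Check with samples: rotate correlated gaussian samples into the new basis
rng = np.random.default_rng(1)
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=C, size=100_000)
xy_rot = xy @ R                          # components in the (x', y') basis
print("sample covariance in new basis:\n", np.round(np.cov(xy_rot.T), 3))
```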

Complex Gaussian Random Variables and Vectors  A complex gaussian random variable z = x + iy has components x and y described by a real bivariate gaussian random variable (x, y). A complex gaussian random vector, denoted as z = x + iy, has vector components z_k, each described by a real bivariate gaussian random variable (x_k, y_k). The multivariate complex gaussian probability density function describing a complex gaussian random vector z is

f_{\mathbf{z}}(\mathbf{z}) = \frac{1}{\pi^N \det \mathbf{W}}\; e^{-(\mathbf{z}-\langle\mathbf{z}\rangle)^{\dagger} \mathbf{W}^{-1} (\mathbf{z}-\langle\mathbf{z}\rangle)},   (1.2.28)

where

\mathbf{W} \doteq \left\langle (\mathbf{z}-\langle\mathbf{z}\rangle)(\mathbf{z}-\langle\mathbf{z}\rangle)^{\dagger} \right\rangle   (1.2.29)

is the complex covariance matrix, with † denoting the complex conjugate transpose. Using the properties of determinants, the leading term (π^N det W)^{-1} in (1.2.28) can be written as det(πW)^{-1}.

Circularly-Symmetric Complex Gaussian Random Variables and Vectors  A complex gaussian random variable z = x + iy with independent, zero-mean components x and y of equal variance is a complex circularly-symmetric gaussian random variable. The corresponding probability density

function is called a complex circularly-symmetric gaussian probability density function. A circularly-symmetric complex gaussian random variable has the property that e^{iθ}z has the same probability density function for all θ.
Generalizing, a complex, jointly gaussian random vector z = x + iy is circularly symmetric when each vector component e^{iθ}z_k has the same probability density function for all θ. The multivariate complex gaussian probability density function for a circularly-symmetric gaussian random vector z is determined by setting ⟨z⟩ equal to zero in (1.2.28), giving

f_{\mathbf{z}}(\mathbf{z}) = \frac{1}{\pi^N \det \mathbf{W}}\; e^{-\mathbf{z}^{\dagger} \mathbf{W}^{-1} \mathbf{z}},   (1.2.30)

with

\mathbf{W} \doteq \left\langle \mathbf{z}\, \mathbf{z}^{\dagger} \right\rangle   (1.2.31)

being the autocovariance matrix W.
To relate the covariance matrix W for a complex random vector to the covariance matrix C for a real random vector, define z as a vector of N complex circularly-symmetric gaussian random variables with a complex covariance matrix W given in (1.2.29). Define x as a real vector of length 2N that consists of the real part Re[z] and the imaginary part Im[z] in the order x = {Re[z_1], ..., Re[z_N], Im[z_1], ..., Im[z_N]}. The real 2N × 2N covariance matrix C given in (1.2.22) is expressed in block form as

\mathbf{C} = \frac{1}{2} \begin{pmatrix} \mathrm{Re}\,\mathbf{W} & -\mathrm{Im}\,\mathbf{W} \\ \mathrm{Im}\,\mathbf{W} & \mathrm{Re}\,\mathbf{W} \end{pmatrix},   (1.2.32)

in terms of the N × N complex covariance matrix W.
As an example, the covariance matrix of a set of N uncorrelated circularly-symmetric complex gaussian random variables of equal variance has the form W = 2σ²I_N, where I_N is the N-by-N identity matrix. Using det(2σ²I_N) = (2σ²)^N, the joint probability density function given in (1.2.28) reduces to

f_{\mathbf{z}}(\mathbf{z}) = \frac{1}{(2\pi\sigma^2)^N}\; e^{-|\mathbf{z}-\langle\mathbf{z}\rangle|^2/2\sigma^2},   (1.2.33)

which separates into a product of N terms.
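The example above can be checked numerically. The following Python sketch (with illustrative values of N and σ) generates circularly-symmetric complex gaussian samples with variance σ² in each of the real and imaginary parts, estimates W = ⟨zz†⟩ ≈ 2σ²I_N, and verifies the real block-form covariance of (1.2.32), which reduces to σ²I_{2N} in this case.

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma, num = 3, 0.7, 200_000

z = sigma * (rng.standard_normal((num, N)) + 1j * rng.standard_normal((num, N)))

# Sample estimate of the complex covariance matrix W = <z z^H>
W = (z.T @ z.conj()) / num
print("W ≈\n", np.round(W.real, 3))          # ≈ 2*sigma^2 * I_N, imaginary part ≈ 0

# Real covariance of the stacked vector (Re z, Im z); cf. the block form (1.2.32)
x = np.hstack([z.real, z.imag])
C = (x.T @ x) / num
print("C ≈\n", np.round(C, 3))               # ≈ sigma^2 * I_2N
```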

Rayleigh and Ricean Probability Density Functions  Several useful random variables and their associated probability density functions are generated from the square root of the sum of the squares, r = √(x² + y²), of independent gaussian random variables x and y, usually with equal variances and possibly with nonzero means. The random amplitude r has a geometrical interpretation as the length of a random vector with components x and y that are gaussian-distributed.
The probability density function of the amplitude can be determined by transforming the joint probability density function f_{xy}(x, y) from rectangular coordinates, which is given in (1.2.26), into the corresponding joint probability density function f_{rφ}(r, φ) in polar coordinates. The marginal probability density function f_r(r) is generated by integrating out the functional dependence on φ.


Figure 1.7: (a) The marginal probability density function of the amplitude f_r(r) given in (1.2.35), with the expected amplitude A increasing from left to right. (b) The marginal probability density functions for the phase f_φ(φ) given in (1.2.37). The leftmost set of plots is for A = 0.

Let A = √(⟨x⟩² + ⟨y⟩²) be the expected amplitude, where ⟨x⟩² and ⟨y⟩² are the squares of the means of the independent gaussian random variables x and y, both with variance σ². Now make the change of variables ⟨x⟩ = A cos θ, ⟨y⟩ = A sin θ, dx dy = r dr dφ, r = √(x² + y²), x = r cos φ, and y = r sin φ. Then redefine φ with respect to the phase θ of the constant-amplitude signal so that θ − φ is replaced by φ. Using these substitutions and standard trigonometric identities, (1.2.26) becomes

f_{r\phi}(r,\phi) = \frac{r}{2\pi\sigma^2}\; e^{-(r^2 - 2Ar\cos\phi + A^2)/2\sigma^2}.   (1.2.34)

The marginal probability density function fr(r) is

f_r(r) = \frac{r}{2\pi\sigma^2}\; e^{-(r^2+A^2)/2\sigma^2} \int_0^{2\pi} e^{Ar\cos\phi/\sigma^2}\, d\phi.

The integral can be expressed in terms of the modified Bessel function of the first kind of order zero. Using the change of variable x = Ar/σ², the probability density function of the amplitude r is

f_r(r) = \frac{r}{\sigma^2}\; e^{-(r^2+A^2)/2\sigma^2}\; I_0\!\left(\frac{Ar}{\sigma^2}\right), \qquad r \geq 0.   (1.2.35)

This probability density function is known as the ricean probability density function and characterizes a ricean random variable.

(Footnote: The modified Bessel function of the first kind of order ν is defined as I_ν(x) ≐ (1/2π) ∫_{−π}^{π} e^{x cos θ} cos(νθ) dθ. The order ν can be an integer or a half integer.)

As the ratio A/σ becomes large, the ricean probability density function begins to resemble a gaussian probability density function with the same mean and variance. For A = 0, the probability density function reduces to

f_r(r) = \frac{r}{\sigma^2}\; e^{-r^2/2\sigma^2}, \qquad r \geq 0,   (1.2.36)

which is known as a rayleigh probability density function and characterizes a rayleigh random variable. This probability density function has mean σ√(π/2) and variance σ²(2 − π/2). Plots of the ricean and the rayleigh probability density functions are shown in Figure 1.7a.
The marginal probability density function of the phase φ can be obtained by integrating (1.2.34) over r with the variables changed as r′ = r/(√2 σ), F = A²/2σ², and B = √F cos φ. Completing the square in the exponent and factoring produces

f_\phi(\phi) = \frac{1}{\pi}\; e^{(B^2 - F)} \int_0^{\infty} r'\, e^{-(r'-B)^2}\, dr'.

To evaluate the integral, use the second change of variables R = r′ − B to yield

\int_0^{\infty} r'\, e^{-(r'-B)^2}\, dr' = \int_{-B}^{\infty} (R+B)\, e^{-R^2}\, dR = \frac{1}{2}\left[ e^{-B^2} + B\sqrt{\pi}\,\big(1 + \mathrm{erf}(B)\big) \right].

Substituting this expression back into the expression for fφ(φ) gives

f_\phi(\phi) = \frac{1}{2\pi}\left[ e^{-F} + \sqrt{\pi F}\,\cos\phi\; e^{-F\sin^2\phi}\,\big(1 + \mathrm{erf}(\sqrt{F}\cos\phi)\big) \right].   (1.2.37)

The function f_φ(φ) is a zero-mean, even, periodic function with a period T = 2π that integrates to one over one period for any value of F. If F = 0, then f_φ(φ) = 1/2π, and the marginal probability density function of the phase is a uniform probability density function over [−π, π). If F ≠ 0, then f_φ(φ) becomes "peaked" about φ = 0, with the width of the probability density function inversely related to F. These effects are shown in Figure 1.7b.
For F ≫ 1 and φ ≈ 0, the approximation erf(√F cos φ) ≈ 1 holds. Using this expression, setting cos φ ≈ 1 and sin²φ ≈ φ², and neglecting the first term in (1.2.37) as compared to the second term gives

f_\phi(\phi) \approx \sqrt{\frac{F}{\pi}}\; e^{-F\phi^2}.   (1.2.38)

This is a zero-mean gaussian probability density function with variance 1/2F = σ²/A². This form is evident in the rightmost plot of Figure 1.7b. While this approximation is defined only over −π ≤ φ < π, when the variance is small it is sometimes a mathematically expedient approximation to extend the range to −∞ < φ < ∞ because the value of f_φ(φ) is negligible outside the interval −π ≤ φ < π.
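A Monte Carlo sketch of the amplitude statistics discussed above follows. It draws gaussian components with expected amplitude A, forms r = √(x² + y²), and compares the histogram with the ricean density of (1.2.35); the values of A, σ, and the sample size are illustrative assumptions.

```python
import numpy as np
from scipy.special import i0

rng = np.random.default_rng(3)
A, sigma, num = 2.0, 1.0, 500_000

x = rng.normal(A, sigma, num)       # in-phase component with mean A
y = rng.normal(0.0, sigma, num)     # quadrature component with zero mean
r = np.hypot(x, y)                  # amplitude r = sqrt(x^2 + y^2)

# Compare the histogram with the ricean density of (1.2.35)
edges = np.linspace(0.0, 6.0, 61)
hist, _ = np.histogram(r, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
ricean = (centers / sigma**2) * np.exp(-(centers**2 + A**2) / (2 * sigma**2)) \
         * i0(A * centers / sigma**2)

print("max |histogram - ricean pdf| ≈", np.max(np.abs(hist - ricean)))
```

Setting A = 0 in the same sketch reproduces the rayleigh density of (1.2.36).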

Exponential and Gamma Probability Density Functions  The central chi-square random variable with two degrees of freedom is known as an exponential random variable with an associated exponential probability density function. The general form of an exponential probability density function is

f(z) = \mu\, e^{-\mu z},   (1.2.39)

with mean μ⁻¹ and variance μ⁻². The sum of k independent, identically-distributed exponential random variables, each with a mean μ⁻¹, is a gamma random variable with a probability density function given by

f(z) = \frac{\mu}{\Gamma(k)}\, (\mu z)^{k-1}\, e^{-\mu z},   (1.2.40)

where k > 0 and Γ(k) is the gamma function. The mean of the gamma probability density function is kμ⁻¹ and the variance is kμ⁻². Using the substitutions k = N/2 and μ = 1/2σ², the gamma probability density function is equal to a central chi-square probability density function with N degrees of freedom. Plots of the gamma probability density function for several pairs (μ, k) are shown in Figure 1.8.


Figure 1.8: (a) Plot of an exponential probability density function with µ = k = 1 and a gamma probability density function with µ = k = 2. (b) For a fixed value of µ (here µ = 1, with k = 5, 30, and 50), as k increases, the gamma probability density function approaches a gaussian probability density function.
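A brief simulation check of the statement above: the sum of k independent exponential random variables with mean 1/µ has the gamma density (1.2.40) with mean k/µ and variance k/µ². The values of µ, k, and the sample size in the Python sketch below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, k, num = 2.0, 5, 200_000

# Sum of k independent exponential random variables, each with mean 1/mu
z = rng.exponential(scale=1.0 / mu, size=(num, k)).sum(axis=1)

print("sample mean     :", z.mean(), " expected:", k / mu)
print("sample variance :", z.var(),  " expected:", k / mu**2)
```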

The Central Limit Theorem  A random variable that describes an event may arise as the sum of many independent random variables x_i for repeated instances of some underlying constituent event. If the variances of the random variables x_i are finite and equal, then the probability density function p_x(x) for the normalized sum x = (1/√N) Σ_i (x_i − ⟨x_i⟩) as N goes to infinity will usually tend towards a gaussian probability density function with mean ⟨x⟩, irrespective of the functional form of the probability density functions p_{x_i}(x_i) of the individual constituent events. When the density functions are all the same, the formal statement is called the central limit theorem. The central limit theorem explains why the gaussian probability density function and its variants are ubiquitous in statistical analysis.
As an example of the application of the central limit theorem, consider the probability density function of the normalized sum of N independent and identically distributed (IID) complex random variables A_i e^{iφ_i} added to a constant Ae^{iθ}, where A_i and φ_i are zero-mean, independent, and

identically-distributed random variables, and the probability density function of φ_i is uniform over [0, 2π). The resulting normalized sum is a complex random variable written as

S = A e^{i\theta} + \frac{1}{\sqrt{N}} \sum_{i=1}^{N} A_i\, e^{i\phi_i}.

This can be written as S = x + iy, where x = A cos θ + (1/√N) Σ_{i=1}^{N} A_i cos φ_i and y = A sin θ + (1/√N) Σ_{i=1}^{N} A_i sin φ_i. In the limit as N goes to infinity, asserting the central limit theorem yields a joint probability density function f_{x,y}(x, y) for S that is a circularly-symmetric gaussian probability density function centered on the constant Ae^{iθ}. This is shown schematically in Figure 1.9.


Figure 1.9: The limit of the sum of many independent random vectors superimposed on a constant signal is a circularly-symmetric gaussian probability density function centered on the constant Ae^{iθ}.

Although the central limit theorem is quite powerful, the convergence to a gaussian distribution is not complete for any finite number of summand random variables. This means that calculations of small probabilities of events that use the central limit theorem to justify a gaussian distribution may not be valid.
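The phasor-sum example can be illustrated with a short simulation. In the Python sketch below, N zero-mean contributions with uniform random phases are summed, scaled by 1/√N, and superimposed on a constant Ae^{iθ}; the resulting scatter is approximately circularly symmetric about the constant. The values of N, A, θ, and the number of trials are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
N, trials = 200, 10_000
A, theta = 3.0, np.pi / 4

amp = rng.standard_normal((trials, N))            # zero-mean amplitudes A_i
phase = rng.uniform(0.0, 2 * np.pi, (trials, N))  # uniform phases phi_i

S = A * np.exp(1j * theta) + (amp * np.exp(1j * phase)).sum(axis=1) / np.sqrt(N)

print("mean of S      :", np.round(S.mean(), 3), " (≈ A e^{i theta})")
print("var of Re, Im  :", np.round(S.real.var(), 3), np.round(S.imag.var(), 3))
```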

1.2.2 Random Processes

Returning to the random voltage measurement shown in Figure 1.5, the probability density function of the random voltage defined at a single time instant is a first-order probability density function. To study the time structure of the random process at two time instants, consider a joint probability density function f(v_1, v_2; t_1, t_2) that describes the joint probability that the voltage v_1 is measured at time t_1 and the voltage v_2 is measured at time t_2. This probability density function is called a second-order probability density function because it relates two, possibly complex, random variables at two different times.
The correlation of two continuous random variables defined from the same random process at two different times t_1 and t_2 defines the autocorrelation function

R(t_1, t_2) \doteq \langle v_1 v_2^* \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} v_1\, v_2^*\, f(v_1, v_2; t_1, t_2)\, dv_1\, dv_2.   (1.2.41)

The covariance function is defined as C(t_1, t_2) = R(t_1, t_2) − ⟨v_1⟩⟨v_2⟩ (cf. (1.2.13)). Similar expressions in time and space are defined for electromagnetic fields in Section 1.3.3. There, the temporal properties of a random electromagnetic field are characterized by a temporal coherence function that is an extension of the autocorrelation function to electromagnetic fields. Higher-order joint probability density functions can be defined in a similar way using n different times. The collection, or ensemble, of all sample functions that could be measured, which are shown schematically in Figure 1.5, along with all of the nth-order probability density functions for all values of n, completely specifies a general random process.
A gaussian random process is a random process for which every nth-order probability density function is jointly gaussian. If a gaussian random process is transformed or filtered by a linear system, then the output at any time t is a weighted superposition of gaussian random variables and so is also a gaussian random variable. Accordingly, filtering a gaussian random process produces another gaussian random process, with the filtering affecting the mean, the variance, and the correlation properties, but not the gaussian form of the random process.

Stationarity and Ergodicity  The analysis of a random process is simplified whenever some or all of the probability density functions are shift-invariant. When the first-order and the second-order probability density functions are time-invariant, the mean is independent of the time t_1, and the autocorrelation function depends only on the time difference τ = t_2 − t_1, so that R(t_1, t_2) = R(t_2 − t_1, 0) = R(τ, 0). When the autocorrelation function is time-invariant, it is written as R(τ).
Random processes for which the first-order and second-order probability density functions are time-invariant are called stationary in the wide sense. In this case, the subscripts are dropped because the mean and the mean-squared value defined by (1.2.2) and (1.2.5) are now independent of time. If all probability density functions that can be defined for a random process are time-invariant, then the process is called strict-sense stationary.
Often, every sample function of a stationary random process contains the same statistical information as every other sample function. If the statistical moments of the ensemble can be constructed from the temporal moments of a single sample function, then the random process is called ergodic. In particular, the expectation can be replaced with a time average over a single sample function v(t),

\langle v \rangle = \overline{v} \doteq \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} v(t)\, dt

for an ergodic random process.

Power Density Spectrum  The power density spectrum S(f) can be regarded as the density of the power per infinitesimal frequency interval at the frequency f. For a wide-sense stationary random process s(t), the power density spectrum can be derived by defining a temporal function s_T(t) with finite support given by s_T(t) ≐ s(t) rect(t/2T), with a Fourier transform U_T(f). Convolving s_T(t) and s_T^*(t) and using the

convolution property of the Fourier transform gives

\int_{-\infty}^{\infty} s_T(\tau)\, s_T^*(t-\tau)\, d\tau \;\longleftrightarrow\; |U_T(f)|^2.

Take the expectation of each side and divide by T. Then, because the expectation is linear, we can write

\frac{1}{T} \int_{-\infty}^{\infty} \left\langle s_T(\tau)\, s_T^*(t-\tau) \right\rangle d\tau \;\longleftrightarrow\; \frac{1}{T} \left\langle |U_T(f)|^2 \right\rangle.

Now define R_T(τ) ≐ ⟨s_T(τ) s_T^*(t − τ)⟩ as the autocorrelation function of the finite-duration random process s_T(t). In the limit as T goes to infinity, this becomes

\lim_{T\to\infty} \frac{1}{T} \int_{-\infty}^{\infty} R_T(\tau)\, d\tau \;\longleftrightarrow\; \lim_{T\to\infty} \frac{1}{T} \left\langle |U_T(f)|^2 \right\rangle.   (1.2.42)

The Wiener-Khintchine theorem states that the left side is the autocorrelation function R(τ) of s(t), with the right side defined as the two-sided power density spectrum. This means that the autocorrelation function R(τ) and the power density spectrum S(f) are a Fourier transform pair given by

\mathcal{S}(f) = \int_{-\infty}^{\infty} R(\tau)\, e^{-i2\pi f\tau}\, d\tau   (1.2.43a)

R(\tau) = \int_{-\infty}^{\infty} \mathcal{S}(f)\, e^{i2\pi f\tau}\, df.   (1.2.43b)

If the autocorrelation function R(τ) is a real and even function, then from the properties of the Fourier transform (cf. (1.1.19)), the power density spectrum is real and even as well. For this case, a one-sided power density spectrum can be defined for positive f with a value that is twice that of S(f). The relationship between the signal power and the two-sided power density spectrum is given by

P = \int_{-\infty}^{\infty} \mathcal{S}(f)\, df.   (1.2.44)
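A small numerical illustration of the Wiener-Khintchine relationship is sketched below in Python: the autocorrelation function of a sampled random process is estimated, its Fourier transform forms the power density spectrum as in (1.2.43a), and the integral of the spectrum recovers the signal power as in (1.2.44). The one-pole recursion used to color the noise and the chosen sample spacing are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
num, dt = 2**16, 1.0e-3                        # number of samples, sample spacing

w = rng.standard_normal(num)
s = np.empty(num)
s[0] = w[0]
for n in range(1, num):                        # simple one-pole lowpass coloring
    s[n] = 0.9 * s[n - 1] + 0.1 * w[n]

K = 200                                        # maximum lag retained
R = np.array([np.mean(s[: num - k] * s[k:]) for k in range(K)])
R_full = np.concatenate([R[::-1], R[1:]])      # even extension to negative lags

# Power density spectrum as the Fourier transform of R(tau), eq. (1.2.43a)
S = np.real(np.fft.fft(np.fft.ifftshift(R_full))) * dt

print("signal power <s^2>      :", np.mean(s ** 2))
print("integral of S(f) over f :", np.sum(S) / (R_full.size * dt))
```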

Coherence Time

The coherence time τ_c, defined as

\tau_c \doteq \frac{1}{|R(0)|^2} \int_{-\infty}^{\infty} |R(\tau)|^2\, d\tau,   (1.2.45)

is a measure of the width of the magnitude of the autocorrelation function of a stationary random process. For a random electromagnetic field (cf. (1.3.3)), the coherence time is the width of the temporal coherence function, which is discussed in Section 1.3.3.
Within a coherence interval, defined as any interval of duration τ_c, the values of the random process are highly correlated. This means that over a coherence interval, the random process can be approximated as unchanging and described by a single random variable. A random process can

then be approximated as a sequence of random variables, with each random variable defined over a coherence interval.
The reciprocal quantity B_c = 1/τ_c of the coherence time is the effective bandwidth, and also quantifies the number of coherence intervals per unit time. The effective bandwidth can be written as

\mathcal{B}_c = \frac{\left( \int_{-\infty}^{\infty} \mathcal{S}(f)\, df \right)^2}{\int_{-\infty}^{\infty} |R(\tau)|^2\, d\tau} = \frac{|R(0)|^2}{\int_{-\infty}^{\infty} |\mathcal{S}(f)|^2\, df},   (1.2.46)

as a consequence of Parseval's relationship.
The concepts of autocorrelation function and power density spectrum are illustrated by determining the autocorrelation function and the power density spectrum for an example of a random process called the random telegraph signal. This signal is a sequence of random and independent nonoverlapping pulses, each transmitted within a time interval T and of duration T. For each pulse interval, the probability of transmitting amplitude 0 or A is equiprobable. This random process is written as

s(t) = \sum_{n=-\infty}^{\infty} A_n\, \mathrm{rect}\!\left( \frac{t - nT - j}{T} \right),

where j is an offset time described by a uniformly-distributed random variable over [0, T]. Three realizations with different random offset times are shown in Figure 1.10.


Figure 1.10: Three possible realizations of a random binary waveform consisting of a random sequence of marks of amplitude A and spaces of amplitude zero, offset by a random variable j.

For any realization s(t) and for τ > T , the pulses are independent because they are generated from independent bits. Therefore, the autocorrelation function R(τ) in this region for equally likely pulses with p = 1/2 is

R(\tau) = \langle s(t)\, s(t+\tau) \rangle = \langle s(t) \rangle \langle s(t+\tau) \rangle = \frac{A}{2} \cdot \frac{A}{2} = \frac{A^2}{4} \qquad \text{for } \tau > T.

This is the expected power in the random signal with a mean amplitude of A/2.
For τ < T, there are two possibilities depending on the value of the offset j for each sample function. If j > T − τ, then the random variables s(t) and s(t + τ) are defined in different time intervals. Thus ⟨s(t)s(t + τ)⟩ = A²/4 because each term has amplitude A with a probability of one-half and otherwise has amplitude zero. If j < T − τ, then s(t) and s(t + τ) are defined in the same time interval. If a mark is transmitted, then s(t)s(t + τ) = A². If a space is transmitted, then s(t)s(t + τ) = 0. Because marks and spaces are equally probable, ⟨s(t)s(t + τ)⟩ = A²/2. The results for j < T − τ and j > T − τ are combined by recalling that the offset j is described by a uniform probability density function, f(j) = 1/T. The resulting autocorrelation is

\langle s(t)\, s(t+\tau) \rangle = \int_0^{T-\tau} \frac{A^2}{2T}\, dj + \int_{T-\tau}^{T} \frac{A^2}{4T}\, dj = \frac{A^2}{4}\left( 2 - \frac{\tau}{T} \right) \qquad (0 < \tau < T).

The same analysis holds for negative values of τ. Therefore, the autocorrelation function is

R(\tau) = \begin{cases} \dfrac{A^2}{4}, & |\tau| > T \\[1ex] \dfrac{A^2}{4}\left( 2 - \dfrac{|\tau|}{T} \right), & |\tau| \leq T. \end{cases}   (1.2.47)

A plot of the autocorrelation function is shown in Figure 1.11a.
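A simulation sketch of this result follows. The Python code generates the random binary waveform with equiprobable amplitudes 0 or A in slots of duration T, estimates the autocorrelation by time-averaging one long realization (so the random offset plays no role), and compares it with (1.2.47). The slot length, amplitude, and sampling grid are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
A, T, samples_per_slot, num_slots = 1.0, 1.0, 20, 20_000
dt = T / samples_per_slot

bits = rng.integers(0, 2, num_slots) * A          # amplitude 0 or A per slot
s = np.repeat(bits, samples_per_slot)             # piecewise-constant waveform

lags = np.arange(0, 2 * samples_per_slot + 1)     # lags from 0 to 2T
R_est = np.array([np.mean(s[: s.size - k] * s[k:]) for k in lags])

tau = lags * dt
R_theory = np.where(tau > T, A**2 / 4, (A**2 / 4) * (2 - tau / T))

print("max |R_est - R_theory| ≈", np.max(np.abs(R_est - R_theory)))
```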


Figure 1.11: (a) Autocorrelation function of a binary sequence of marks and spaces. (b) Power density spectrum, showing the zero-frequency component and the modulated signal component.

The corresponding power density spectrum S(f) is the Fourier transform of R(τ). The Fourier transform of the constant mean power A²/4 is a Dirac impulse (A²/4)δ(f). The Fourier transform of the triangular function is a sinc-squared function. The total power density spectrum is

\mathcal{S}(f) = \frac{A^2}{4}\, \delta(f) + \frac{A^2 T}{4}\, \mathrm{sinc}^2(fT),   (1.2.48)

and is shown in Figure 1.11b. For this waveform, half of the signal power is carried in the zero-frequency component and conveys no information.
As a second example, consider the random process

x(t) = \frac{1}{\sqrt{M}} \sum_{m=1}^{M} A_m\, e^{-i2\pi f_m t},

where {A_m} is again a set of identically-distributed random variables indexed by m, and f_m is a known frequency for each m. We want to determine whether this random process is wide-sense stationary and whether this random process is ergodic.
To be wide-sense stationary, the expected value must be independent of time and the autocorrelation function can depend only on the time difference. The expected value is

\langle x(t) \rangle = \frac{1}{\sqrt{M}} \sum_{m=1}^{M} \langle A_m \rangle\, e^{-i2\pi f_m t}.

For the process to be stationary, the expected value cannot depend on t. Examining ⟨x(t)⟩, this condition is satisfied only if ⟨A_m⟩ = 0. Therefore, the probability density function of the amplitude must have zero mean. The autocorrelation function is

R(t, t+\tau) = \sum_{m=1}^{M} \sum_{\ell=1}^{M} \langle A_m A_\ell^* \rangle\, e^{-i2\pi f_m t}\, e^{i2\pi f_\ell (t+\tau)} = \sum_{m=1}^{M} \sum_{\ell=1}^{M} \langle A_m A_\ell^* \rangle\, e^{-i2\pi (f_m - f_\ell) t}\, e^{i2\pi f_\ell \tau}.

In order for the autocorrelation function to depend only on τ and not on t, the expected value ⟨A_m A_ℓ^*⟩ must vanish when m ≠ ℓ. If m = ℓ, then ⟨A_m A_ℓ^*⟩ = ⟨|A_m|²⟩. Thus the requirement for the process to be stationary is that ⟨A_m A_ℓ^*⟩ = ⟨|A_m|²⟩ δ_{mℓ}, where δ_{mℓ} is the Kronecker impulse. This condition implies that the random variables are uncorrelated. Combining the two observations, we conclude that in order for the process to be stationary, the random variables must have zero mean and be uncorrelated.
To test for ergodicity, the temporal moments of a single sample function must be equal to the statistical moments of the ensemble. However, each term in the summation for a sample function is of the form A_n e^{-i2πf_n t}, and is sinusoidal. Therefore, each sample function is a deterministic function with individual temporal sections correlated over any time interval. It follows that the random process is not ergodic.

Noise Processes  Many common noise processes, both electrical and optical, can be accurately modeled as stationary gaussian random processes with a constant power density spectrum over a limited frequency range. Let S_n(f) = N_0/2 be the constant (two-sided) power density spectrum of a stationary, zero-mean noise process n(t), possibly gaussian. This noise process has an equal contribution to the power density spectrum from every frequency component and is called a white noise process. Because S_n(f) = N_0/2 is a constant, the autocorrelation function R_n(τ) of a white noise process is a scaled Dirac impulse.
When a stationary random process with a power density spectrum S_0(f) is the input to a linear time-invariant system with a causal impulse response and a baseband transfer function H(f), the

(Footnote: A photodetected electrical signal has units of current, so N_0 is sometimes expressed using an equivalent power density spectrum per unit resistance with units of A²/Hz. A discussion of units is given in the book section titled "Notation".)

power density spectrum S(f) of the random process at the output is related to the power density spectrum S_0(f) at the input by the expression

\mathcal{S}(f) = \mathcal{S}_0(f)\, |H(f)|^2.   (1.2.49)

For white noise, S_0(f) = N_0/2, and

\mathcal{S}(f) = \frac{N_0}{2}\, |H(f)|^2.   (1.2.50)

Using (1.2.44), and the fact that the stationary noise process has zero mean, the output noise power P_n at time t is equal to the variance σ² (cf. (1.2.5)), so that

P_n = \sigma^2 = \frac{N_0}{2} \int_{-\infty}^{\infty} |H(f)|^2\, df   (1.2.51a)
             = \frac{N_0}{2} \int_{-\infty}^{\infty} |h(t)|^2\, dt,   (1.2.51b)

where the second line follows from Parseval's relationship (1.1.18).
In general, the power in the filtered noise process can be determined using (1.2.49) and (1.2.51b) along with the convolution property of the Fourier transform given by (1.1.14),

P_n = \int_{-\infty}^{\infty} \mathcal{S}_n(f)\, |H(f)|^2\, df
    = \left. \int_{-\infty}^{\infty} \mathcal{S}_n(f)\, |H(f)|^2\, e^{i2\pi f\tau}\, df \right|_{\tau=0}
    = \left. R_n(\tau) \circledast h(\tau) \circledast h^*(-\tau) \right|_{\tau=0},   (1.2.52)

where R_n(τ) is the noise autocorrelation function defined in (1.2.43b) corresponding to S_n(f). A noise source whose power density spectrum varies with frequency is called a colored noise source. The noise power σ² can be written as

\sigma^2 = R(0) = G N_0 B_N,   (1.2.53)

where B_N is defined as the noise equivalent bandwidth

B_N \doteq \frac{1}{2G} \int_{-\infty}^{\infty} |H(f)|^2\, df   (1.2.54)
         = \frac{1}{2G} \int_0^{\infty} |h(t)|^2\, dt,   (1.2.55)

with the normalization constant G = max|H(f)|² equal to the maximum power gain of the bandlimited system that defines the noise power. The concept of noise equivalent bandwidth regards the total noise power as equivalent to that of a power density spectrum that is flat out to a frequency B_N and zero thereafter.
For example, the noise equivalent bandwidth of filtered white noise using a baseband system described by

H(f) = \frac{1}{1 + if/\mathcal{W}_h}

is

B_N = \frac{1}{|H(0)|^2} \int_0^{\infty} \frac{1}{1 + (f/\mathcal{W}_h)^2}\, df = \frac{\pi}{2}\, \mathcal{W}_h,

where max|H(f)|² = |H(0)|² = G = 1, and W_h is the half-power or 3-dB bandwidth of the baseband spectrum. The expression |H(f)|² and the corresponding noise equivalent bandwidth are plotted in Figure 1.12a.
The passband noise equivalent bandwidth B measures the width of a passband spectrum (cf. Figure 1.3). It is twice the bandwidth W of the baseband signal because it occupies twice the frequency range. The corresponding noise equivalent bandwidth of a passband system is plotted in Figure 1.12b.
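The value πW_h/2 computed above can be confirmed with a one-line numerical integration; the half-power bandwidth W_h = 1 kHz and the frequency grid in the Python sketch below are illustrative assumptions.

```python
import numpy as np

# Numerical check of B_N for the one-pole response H(f) = 1/(1 + i f/W_h)
W_h = 1.0e3
f = np.linspace(0.0, 1.0e7, 2_000_001)          # fine frequency grid, f >= 0

H2 = 1.0 / (1.0 + (f / W_h) ** 2)               # |H(f)|^2 with G = |H(0)|^2 = 1
B_N = np.sum(H2) * (f[1] - f[0])                # one-sided integral, G = 1

print("numerical B_N :", B_N)
print("pi*W_h/2      :", np.pi * W_h / 2)
```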


Figure 1.12: (a) Noise equivalent bandwidth of a baseband transfer function. (b) Noise equivalent bandwidth of a passband transfer function that is symmetric about the carrier frequency fc.

To determine the noise equivalent bandwidth of a system described by the causal impulse response h(t) = A rect((t/T) − 1/2), use (1.2.55) with the upper limit set to T. The Fourier transform of a rectangular pulse is a sinc function, which has its maximum value at f = 0. Therefore G = max|H(f)|² = |H(0)|² = (∫₀^T h(t) dt)² = (AT)². Then

B_N = \frac{1}{2G} \int_0^{T} A^2\, dt = \frac{A^2 T}{2G} = \frac{1}{2T}.   (1.2.56)

If A = 1, then the filter integrates the noise over a time T, and the noise power is

\sigma^2 = G N_0 B_N = T^2 N_0\, \frac{1}{2T} = \frac{N_0 T}{2},   (1.2.57)

as given by (1.2.53). The effect of the noise is quantified by the signal-to-noise ratio given by

\mathrm{SNR} = \frac{\text{expected electrical signal power}}{\text{expected electrical noise power}}.   (1.2.58)

Passband Noise

When the passband bandwidth B is much smaller than the passband center frequency f_c, the bandlimited noise process is called a passband noise process. A passband noise process ñ(t) can be written in several equivalent forms as

\tilde{n}(t) = n_I(t)\cos(2\pi f_c t) - n_Q(t)\sin(2\pi f_c t)
             = A(t)\cos\big(2\pi f_c t + \phi(t)\big)
             = \mathrm{Re}\!\left[ A(t)\, e^{i(2\pi f_c t + \phi(t))} \right]
             = \mathrm{Re}\!\left[ n(t)\, e^{i2\pi f_c t} \right],   (1.2.59)

where n(t) = n_I(t) + i n_Q(t) = A(t)e^{iφ(t)} is the complex-baseband noise process. The term n_I(t) = A(t) cos φ(t) is the in-phase component of the noise, and n_Q(t) = A(t) sin φ(t) is the quadrature component of the noise.
The autocorrelation function of ñ(t) is

R_{\tilde{n}}(\tau) = \langle \tilde{n}(t)\, \tilde{n}(t+\tau) \rangle = \left\langle \mathrm{Re}\!\left[ n(t)\, e^{i2\pi f_c t} \right] \mathrm{Re}\!\left[ n(t+\tau)\, e^{i2\pi f_c (t+\tau)} \right] \right\rangle.

Using the identity Re[z_1]Re[z_2] = ½Re[z_1 z_2] + ½Re[z_1^* z_2], and the fact that θ is an independent phase, gives

R_{\tilde{n}}(\tau) = \tfrac{1}{2}\,\mathrm{Re}\!\left[ \langle n(t)\, n(t+\tau) \rangle\, e^{i2\pi f_c (2t+\tau)} \right] + \tfrac{1}{2}\,\mathrm{Re}\!\left[ \langle n^*(t)\, n(t+\tau) \rangle\, e^{i2\pi f_c \tau} \right]
                    = \tfrac{1}{2}\,\mathrm{Re}\!\left[ \langle n^*(t)\, n(t+\tau) \rangle\, e^{i2\pi f_c \tau} \right]
                    = \tfrac{1}{2}\,\mathrm{Re}\!\left[ R_n(\tau)\, e^{i2\pi f_c \tau} \right],   (1.2.60)

where the rapidly oscillating first term is neglected compared to the second term. The term R_n(τ) = ⟨n^*(t) n(t+τ)⟩ is the autocorrelation function of the complex-baseband noise process.
The passband power density spectrum is obtained by taking the Fourier transform of (1.2.60),

\mathcal{S}_{\tilde{n}}(f) = \tfrac{1}{4}\left[ \mathcal{S}_n(f - f_c) + \mathcal{S}_n^*(-f - f_c) \right],   (1.2.61)

where the modulation property of the Fourier transform given in (1.1.16) has been used. The passband power density spectrum for the noise is shown in Figure 1.13a.
The power density spectrum of the complex-baseband noise process is obtained by substituting (1.1.58) into (1.2.61) and equating terms,

\mathcal{S}_n(f) = 2 N_0\, |H(f)|^2 = 4\, \mathcal{S}_{\tilde{n}}(f + f_c), \qquad f > 0,   (1.2.62)

where only the positive-frequency part of the passband noise power density spectrum S_ñ(f) is used to determine the complex-baseband power density spectrum S_n(f), and H(f) is the complex-baseband transfer function defined in (1.1.58). This power density spectrum is shown in Figure 1.13b.


Figure 1.13: (a) Power density spectrum of the real passband noise process S_ñ(f). (b) The complex-baseband power density spectrum S_n(f). (c) Power density spectrum of the complex-baseband noise components, S_{n_I}(f) and S_{n_Q}(f), along with the passband noise equivalent bandwidth.

The power density spectrum for each complex-baseband noise component of a passband noise process with a passband bandwidth B is determined using the same steps that were used to derive (1.2.61) with

\mathcal{S}_{n_I}(f) = \mathcal{S}_{n_Q}(f) = \mathcal{S}_{\tilde{n}}(f - f_c) + \mathcal{S}_{\tilde{n}}(f + f_c).

This spectrum is shown in Figure 1.13c.
Using (1.2.60), the noise power σ² in the passband noise process ñ(t) is

\sigma^2 = \frac{N_0}{2} \int_{-\infty}^{\infty} |\tilde{H}(f)|^2\, df = R_{\tilde{n}}(0) = \tfrac{1}{2}\,\mathrm{Re}\left[ \langle n(t)\, n^*(t) \rangle \right] = \tfrac{1}{2} \left\langle |n(t)|^2 \right\rangle.   (1.2.63)

A complex-baseband signal has twice the signal energy of the corresponding passband signal (cf.

(1.1.74)). Accordingly, the complex-baseband noise equivalent bandwidth B_N is defined to be twice as large as the baseband noise equivalent bandwidth given in (1.2.54), so that

B_N \doteq \frac{1}{G} \int_{-\infty}^{\infty} |H(f)|^2\, df.   (1.2.64)

The difference in the noise equivalent bandwidth for a real-baseband signal and a complex-baseband signal is shown in Figure 1.12.
For the idealized case in which a constant noise power density spectrum N_0 is filtered by an ideal lowpass filter, the noise equivalent bandwidth B_N is equal to the effective bandwidth B_c defined in (1.2.46). To show this, let H(f) = rect(f/B), so that the filtered noise process is S_n(f) = N_0 rect(f/B). The effective bandwidth of this noise process is

\mathcal{B}_c = \frac{\left( \int_{-\infty}^{\infty} \mathcal{S}_n(f)\, df \right)^2}{\int_{-\infty}^{\infty} \mathcal{S}_n^2(f)\, df} = \frac{\left( \int_{-\infty}^{\infty} N_0\, \mathrm{rect}(f/B)\, df \right)^2}{\int_{-\infty}^{\infty} N_0^2\, \mathrm{rect}^2(f/B)\, df} = B.   (1.2.65)

The complex-baseband noise equivalent bandwidth is given by (1.2.64),

B_N = \frac{1}{G} \int_{-\infty}^{\infty} |H(f)|^2\, df = \frac{1}{N_0^2} \int_{-\infty}^{\infty} N_0^2\, \mathrm{rect}^2(f/B)\, df = B.

Therefore, BN = Bc = B. For this case, the coherence time defined in (1.2.45) is the exact reciprocal of the noise equivalent bandwidth BN . For other filters, this equality need not hold, and the number of coherence intervals per unit time Bc does not equal the noise equivalent bandwidth BN .

When the in-phase noise component n_I(t) and the quadrature noise component n_Q(t) of a passband noise process ñ(t) are independent and have the same statistics, the autocorrelation function for each noise component is the same, with R_{n_I}(τ) equal to R_{n_Q}(τ). The corresponding autocorrelation function R_n(τ) of the complex-baseband process is

R_n(\tau) = 2 R_{n_I}(\tau) = 2 R_{n_Q}(\tau).   (1.2.66)

The power in each quadrature component can be determined using (1.2.63) and (1.2.66)

\sigma^2 = \left\langle \tilde{n}^2(t) \right\rangle = \left\langle n_I^2(t) \right\rangle = \left\langle n_Q^2(t) \right\rangle = \tfrac{1}{2} \left\langle |n(t)|^2 \right\rangle.   (1.2.67)

When the complex process is a gaussian random process, the two-dimensional gaussian probability density function of the in-phase and quadrature components at any time t describes a circularly-symmetric gaussian random variable, with the process being a circularly-symmetric gaussian random process.
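The power identity (1.2.67) can be checked with a short simulation. In the Python sketch below, a passband noise waveform is built from independent gaussian in-phase and quadrature components; the carrier frequency, sample rate, and record length are illustrative assumptions, and the components are white rather than strictly bandlimited, which is sufficient for the power check.

```python
import numpy as np

rng = np.random.default_rng(8)
num, fs, fc, sigma = 2**18, 1.0e6, 1.0e5, 1.0
t = np.arange(num) / fs

n_I = sigma * rng.standard_normal(num)          # in-phase component
n_Q = sigma * rng.standard_normal(num)          # quadrature component
n = n_I + 1j * n_Q                              # complex-baseband noise
n_pb = n_I * np.cos(2 * np.pi * fc * t) - n_Q * np.sin(2 * np.pi * fc * t)

print("<n_pb^2>        :", np.mean(n_pb**2))    # ≈ sigma^2
print("<n_I^2>, <n_Q^2>:", np.mean(n_I**2), np.mean(n_Q**2))
print("0.5*<|n|^2>     :", 0.5 * np.mean(np.abs(n)**2))
```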

Noise Figure The amount of noise added to a signal in a linear system can be quantified by the concept of noise figure FN , which is defined in terms of the signal power and the noise power over a bandwidth B. A frequency-dependent noise figure FN (f) can be defined by constraining the bandwidth B, centered at f, to be sufficiently narrow so that the signal power and the noise power are independent of frequency over bandwidth B.

The frequency-dependent noise figure F_N(f) is defined as the ratio of the total noise power in the system to the noise power P_in(f) at the input due solely to thermal noise,

F_N(f) = \frac{P_{\mathrm{in}}(f) + P_a(f)}{P_{\mathrm{in}}(f)} = 1 + \frac{P_a(f)}{P_{\mathrm{in}}(f)},   (1.2.68)

where P_a(f) is an additional uncorrelated noise contribution, typically from an amplifier, that is added to the input noise P_in(f). The noise powers are referenced to the output of the system with a transfer function H(f) defined over the bandwidth B, given that the noise and signal are frequency independent. The input signal power S_in(f) is filtered by the system to produce the output signal power S_out(f) = S_in(f)|H(f)|². The frequency-dependent noise figure can be expressed in terms of the signal-to-noise ratio by multiplying the numerator and denominator of (1.2.68) by S_out(f) = S_in(f)|H(f)|² to obtain

F_N(f) = \frac{S_{\mathrm{in}}(f)\, |H(f)|^2\, \big( P_{\mathrm{in}}(f) + P_a(f) \big)}{S_{\mathrm{in}}(f)\, |H(f)|^2\, P_{\mathrm{in}}(f)}.   (1.2.69)

Then

F_N(f) = \frac{S_{\mathrm{in}}(f)}{S_{\mathrm{out}}(f)} \cdot \frac{P_{\mathrm{out}}(f)}{P_{\mathrm{in}}(f)},   (1.2.70)

where P_out(f) = |H(f)|²(P_a(f) + P_in(f)) is the output noise power.
The frequency-independent form of the noise figure F_N is

F_N = \frac{S_{\mathrm{in}}\, P_{\mathrm{out}}}{S_{\mathrm{out}}\, P_{\mathrm{in}}} = \frac{\mathrm{SNR}_{\mathrm{in}}}{\mathrm{SNR}_{\mathrm{out}}},   (1.2.71)

where the signal power and the noise power need not be frequency independent over the bandwidth B.
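A minimal numerical example of the noise-figure definition (1.2.71) follows; the power values and gain are illustrative assumptions.

```python
import numpy as np

# Noise figure as the ratio of input SNR to output SNR, eq. (1.2.71)
S_in, P_in = 1.0e-9, 4.0e-15        # input signal and thermal noise power (W)
gain = 100.0                        # power gain |H|^2 of the amplifier
P_a = 6.0e-15                       # added noise, referred to the input (W)

S_out = gain * S_in
P_out = gain * (P_in + P_a)

F_N = (S_in / S_out) * (P_out / P_in)
print("noise figure F_N  :", F_N)                 # = 1 + P_a/P_in = 2.5
print("noise figure in dB:", 10 * np.log10(F_N))  # ≈ 4.0 dB
```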

1.3 Electromagnetics

The physical quantity used for wireless communications is a time-varying electromagnetic field described by Maxwell's equations. These equations comprise a system of partial differential equations having a rich set of solutions that depend both on the geometry and on the medium. This section provides a summary review of Maxwell's equations as used for analyzing electromagnetic signal propagation.
The electromagnetic field in free space can be described by two vector functions of space r and time t. These functions are the electric field vector E(r, t) and the magnetic field vector H(r, t). When an electromagnetic field interacts with a material, two additional material-dependent vector quantities D(r, t) and B(r, t) are needed to describe the interaction of the electromagnetic field with the material. In the most basic case, these quantities are scaled forms of E(r, t) and H(r, t) with D(r, t) = εE(r, t) and B(r, t) = µH(r, t), where ε and µ are material-dependent constants. These constants convert the field quantities E(r, t) and H(r, t) into the material-dependent quantities D(r, t) and B(r, t).

Suppressing the arguments of the vector functions, Maxwell's set of partial differential equations for a material with no free charge is given by

\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}   (1.3.1a)
\nabla \times \mathbf{H} = \frac{\partial \mathbf{D}}{\partial t}   (1.3.1b)
\nabla \cdot \mathbf{B} = 0   (1.3.1c)
\nabla \cdot \mathbf{D} = 0.   (1.3.1d)

The operator ∇· is the scalar divergence differential operator, and the operator ∇× is the vector curl differential operator. Integrating the electric field vector E along a line produces the potential difference between the end points of the line. Integrating the magnetic field vector H around a closed curve yields the "displacement current" that passes through the surface enclosed by the closed curve.
Each material-dependent quantity, D or B, has units of field per unit area and is called a flux density. The term D, called the electric flux density, accounts for the bound charge within the dielectric material, and so can generate an electric field. The term B, called the magnetic flux density, accounts for any closed current paths that are induced within the material, and so generate a magnetic field.
The application of Maxwell's equations to electromagnetic wave propagation at the single frequency ω can be described using a monochromatic electric field vector E(r, t) = Re[E(r, ω)e^{iωt}]. Using a set of unit vectors {x̂, ŷ, ẑ} defined for cartesian coordinates, the expression

\mathbf{E}(\mathbf{r}, \omega) = E_x(\mathbf{r}, \omega)\,\hat{\mathbf{x}} + E_y(\mathbf{r}, \omega)\,\hat{\mathbf{y}} + E_z(\mathbf{r}, \omega)\,\hat{\mathbf{z}}   (1.3.2)

is the complex-baseband vector field at a position r and frequency ω of the monochromatic component given by e^{iωt}. The general electric field is then a superposition over all appropriate ω. As will be explained in Section 1.3.2, the function E(r, ω) satisfies the vector Helmholtz equation (see (1.3.15)) given by

\nabla^2 \mathbf{E}(\mathbf{r}, \omega) + k_0^2\, n^2(\mathbf{r}, \omega)\, \mathbf{E}(\mathbf{r}, \omega) = 0,

where k_0 = ω/c_0 is the free-space wavenumber, and n(r, ω) is the index of refraction as a function of the position r and the frequency ω. Given the boundary conditions determined by the geometry, the solution to this equation describes the propagation of a monochromatic electromagnetic field, which is a field that has a single temporal frequency.

1.3.1 Material Properties The two quantities D and B account for the effect of the material on the field. In free space, the relationship between the electric field vector E and the electric flux density D is given by D = ε0E where ε0 is a constant known as the permittivity of free space. Similarly, in free space, the magnetic flux density B is related to the magnetic field vector by B = µ0H, where µ0 is a constant known as the permeability of free space.

(Footnote: The term "complex-baseband vector field" is not generally used in electromagnetics. Its use here is to emphasize the relationship between electromagnetic fields and the communication signals derived from electromagnetic fields.)

The materials used for the fabrication of optical communication fibers are glass or plastic. These have no free charge. Materials with no free charge are called dielectric materials. To guide light, the materials are spatially modified to create two or more regions with different dielectric properties. The fields in each region are constrained by the boundary conditions at the interface between the regions. The boundary conditions can be derived from the integral formulation of Maxwell's equations, where either a line integral or a surface integral spans the discontinuous interface. Applying a limiting operation results in the expressions for the boundary conditions for the differential form of Maxwell's equations. These conditions state that for a dielectric material, the tangential components of the fields E and H and the normal components of the flux densities D and B must be continuous across the interface.
When an electric field E is applied to a dielectric material, the bound charge separates slightly to create dipoles. These separated charges produce an additional field, called the material polarization P, that is added to the original field to produce the electric flux density D. Accordingly, in a dielectric material, the flux densities are related to the fields by

D = ε0E + P (1.3.3a)

B = µ0H, (1.3.3b) which are known as the constitutive relationships. The relationship between the applied electric field E and the resulting material polarization P is the material susceptibility, denoted X. When the susceptibility is linear and does not depend on time or space, it appears as the simple expression

P(t) = ε0XE(t). (1.3.4)

In general the susceptibility may be a scalar function of space X(r), a scalar function of time X(t), or both X(r, t), or even a tensor. These are each described in turn in the following paragraphs.

Homogeneous and inhomogeneous media A material whose parameters do not depend on the position r is homogeneous. If some material parameters do depend on r, then the material is inho- mogeneous. A common form of an inhomogeneous material is a material for which the susceptibility depends on r, and so the index of refraction, denoted n(r), depends on r. The relationship between the index of refraction and the susceptibility is derived later in this section.

Isotropic and anisotropic materials  A material whose properties do not depend on the orientation of the electric field within the material is called an isotropic material. Materials for which the material properties depend on the orientation of the electric field are called anisotropic materials. The response of an anisotropic material varies depending on the orientation of the electric field vector E with respect to a set of preferred directions, called principal axes, which are a consequence of the internal structure of the material.
An optically-anisotropic material may exhibit birefringence, which describes a material that has an angularly-dependent index of refraction with an index that is different for two orthogonal polarization states. (Footnote: Two distinctly different properties are referred to using the word "polarization"—the field polarization and the material polarization P within a material.) For a birefringent material, the material polarization component P_i in a direction i, for i = 1, 2, 3 representing the directions x, y, z, depends on the electric field component E_i in that direction as well as on the components E_j in directions j not equal to i. For a material that is both linear and memoryless, this dependence can be written as

\mathcal{P}_i = \varepsilon_0 \sum_{j=1}^{3} X_{ij}\, \mathcal{E}_j,   (1.3.5)

where X_ij is an element of a three-by-three matrix X called the susceptibility tensor. The susceptibility tensor represents the anisotropic nature of the material. Birefringent materials are discussed later in this section.

Linear causal media  Many materials exhibit a causal, linear response in time, and a local response in space for which the material polarization P at each position r depends on the incident electric field E only at that position and not at other positions. In this case, suppressing the spatial dependence on r from the notation, the polarization P(t) for an isotropic material can be written, in general, as a temporal convolution

\mathbf{P}(t) = \varepsilon_0 \int_0^{\infty} X(t - \tau)\, \mathbf{E}(\tau)\, d\tau,   (1.3.6)

where X(t) is called the temporal susceptibility of the material. The temporal susceptibility describes the real, causal impulse response of the material to the electric field at each position r. Accordingly, the temporal susceptibility represents the temporal memory of the material to the applied E field, as expressed by the temporal convolution in (1.3.6).

Linear Material Dispersion The susceptibility X(t) may have a narrow timewidth compared to the relevant time scale. Then the susceptibility is modeled as instantaneous with X(t) = Xδ(t) where X is a real constant. Materials that have no memory on the relevant time scale are called nondispersive materials. Using (1.1.2) in (1.3.6), the material polarization at a single point in space for a nondispersive material is given by (1.3.4). Substituting that expression into (1.3.3a) gives

\mathbf{D}(t) = \varepsilon_0 (1 + X)\, \mathbf{E}(t) = \varepsilon_0 \varepsilon_r\, \mathbf{E}(t),   (1.3.7)

where ε_r ≐ 1 + X is the relative permittivity of the nondispersive material.
Materials that do exhibit memory on the relevant time scale are called dispersive materials. For a dispersive material, the material polarization P at a single point of the dielectric material often has the linear restoring force of a simple harmonic oscillator, in which case the material is linear. The dielectric material consists of a volume of bound charges with a density N per cubic centimeter, possibly depending on r. Each bound charge with a mass m and a total charge q has a linear restoring force that models the "stiffness" of the material response. The restoring force can also be nonlinear, leading to a nonlinear relationship between E and P.
For a linear, isotropic, dispersive dielectric material, the material polarization P(t) at each point r is in the same direction as the applied electric field E(t), with the response of a single component P(t) to a single component of an applied electric field E(t) described by the second-order differential equation

\frac{d^2 \mathcal{P}(t)}{dt^2} + \sigma_\omega \frac{d\mathcal{P}(t)}{dt} + \omega_0^2\, \mathcal{P}(t) = \frac{Nq^2}{m}\, \mathcal{E}(t),   (1.3.8)

where ω_0 = √(K/m) is the resonant frequency of the material response, with K being the stiffness of the linear restoring force, and σ_ω is the width of the resonance. For this linear material, the response to a monochromatic electric field E(t) = Re[E(ω)e^{iωt}] is another monochromatic field P(t) = Re[P(ω)e^{iωt}]. Substituting these forms into the differential equation (1.3.8), the complex spectral susceptibility, denoted as χ(ω), is

\chi(\omega) = \frac{1}{\varepsilon_0} \frac{P(\omega)}{E(\omega)} = \chi_0\, \frac{\omega_0^2}{\omega_0^2 - \omega^2 + i\sigma_\omega \omega}.   (1.3.9)

Because the spectral susceptibility χ(ω) is the temporal Fourier transform of the temporal susceptibility X(t), denoted as X(t) ↔ χ(ω), the temporal susceptibility X(t) follows from (1.3.9). As ω goes to zero, χ(ω) goes to χ_0, where χ_0 = Nq²/(mε_0ω_0²) is the low-frequency susceptibility. As ω goes to infinity, χ(ω) goes to zero.
Because the temporal susceptibility X(t) is a real, causal, linear function of time, there is an inherent relationship between the real part χ_R(ω) and the imaginary part χ_I(ω) of the complex spectral susceptibility χ(ω), given by the Kramers-Kronig transform (cf. (1.1.23)). A similar relationship exists between the magnitude of χ(ω) and the phase of χ(ω).
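The single-resonance susceptibility (1.3.9) is easy to evaluate numerically. The Python sketch below tabulates its real and imaginary parts near the resonance; the values of χ_0, ω_0, and σ_ω are illustrative parameter choices, not values from the text.

```python
import numpy as np

chi0, omega0, sigma_w = 1.0, 2 * np.pi * 1.0e14, 2 * np.pi * 1.0e12

omega = np.linspace(0.8 * omega0, 1.2 * omega0, 5)
chi = chi0 * omega0**2 / (omega0**2 - omega**2 + 1j * sigma_w * omega)

for w, c in zip(omega, chi):
    print(f"omega/omega0 = {w/omega0:5.2f}   chi_R = {c.real:+9.3f}   chi_I = {c.imag:+9.3f}")
```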

1.3.2 The Wave Equation Modulated waveforms within the wave-optics signal model are electromagnetic waves. The wave equation for electromagnetics can be derived by applying the vector curl operation to both sides of (1.3.1a). Substituting (1.3.1b) and the constitutive relationships, (1.3.3a) and (1.3.3b) for a nondispersive material into the resulting equation yields

\nabla \times \nabla \times \mathbf{E}(\mathbf{r}, t) + \frac{1}{c_0^2} \frac{\partial^2 \mathbf{E}(\mathbf{r}, t)}{\partial t^2} = -\mu_0 \frac{\partial^2 \mathbf{P}(\mathbf{r}, t)}{\partial t^2},   (1.3.10)

where c_0 = 1/√(ε_0 μ_0) is the speed of light in vacuum. Substituting the vector identity

\nabla \times \nabla \times \mathbf{E}(\mathbf{r}, t) = \nabla\big(\nabla \cdot \mathbf{E}(\mathbf{r}, t)\big) - \nabla^2 \mathbf{E}(\mathbf{r}, t)

and (1.3.4) into (1.3.10) gives

\nabla^2 \mathbf{E}(\mathbf{r}, t) - \nabla\big(\nabla \cdot \mathbf{E}(\mathbf{r}, t)\big) - \frac{n^2}{c_0^2} \frac{\partial^2 \mathbf{E}(\mathbf{r}, t)}{\partial t^2} = 0,   (1.3.11)

where

n^2 \doteq 1 + X = \varepsilon_r   (1.3.12)

defines the index of refraction n for a nondispersive homogeneous material, commonly referred to simply as the index. More generally, for an inhomogeneous material, the index depends on r. For

a dispersive material, the index depends on time (or frequency). The index n = c_0/c relates the speed of light c in the material to the speed of light c_0 in vacuum, with c = c_0/n, as will be shown later in this section. For silica glass, the index is approximately 1.5.
For a nondispersive inhomogeneous material for which the index n(r) varies as a function of r slowly in comparison to the variations of E(r, t) as a function of r, the term ∇(∇·E(r, t)) in (1.3.11) is zero or can be neglected, and the wave equation is written as

\nabla^2 \mathbf{E}(\mathbf{r}, t) - \frac{n^2(\mathbf{r})}{c_0^2} \frac{\partial^2 \mathbf{E}(\mathbf{r}, t)}{\partial t^2} = 0.   (1.3.13)

The specific conditions that need to be satisfied for (1.3.13) to be valid are discussed in an end-of-chapter problem.

Complex Representations  The analysis of Maxwell's equations in a dispersive medium can often be simplified by using a complex representation based on Fourier analysis. For a time-invariant system, the temporal dependence of the electric field vector E(r, t) can be written as an inverse temporal Fourier transform (cf. (1.1.9))

\mathbf{E}(\mathbf{r}, t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \mathbf{E}(\mathbf{r}, \omega)\, e^{i\omega t}\, d\omega,   (1.3.14)

showing the temporal dependence of E(r, t) expressed as a superposition of complex frequency components E(r, ω)e^{iωt}. Substituting this form into (1.3.13), and using ∂²(E(r, ω)e^{iωt})/∂t² = −ω²E(r, ω)e^{iωt}, the complex representation of the wave equation for a dispersive material is described by the vector Helmholtz equation

\nabla^2 \mathbf{E}(\mathbf{r}, \omega) + k_0^2\, n^2(\mathbf{r}, \omega)\, \mathbf{E}(\mathbf{r}, \omega) = 0,   (1.3.15)

where E(r, ω) is the complex electric vector field at a position r and frequency ω (cf. (1.3.2)), and n(r, ω) is the index at that position and frequency. A similar expression governs the complex vector magnetic field H(r, ω).
The square n²(r, ω) of the frequency-dependent index appearing in (1.3.15) is equal to 1 + χ(r, ω) (cf. (1.3.12)). The complex spectral susceptibility χ(r, ω) (cf. (1.3.9)) is the temporal Fourier transform of the real temporal susceptibility X(r, t) (cf. (1.3.6)) and characterizes the dispersive properties of the material. For a dielectric material, a basic expression for χ(r, ω) is given by (1.3.9).
In a homogeneous, nondispersive dielectric material, n(r, ω) reduces to a constant n, and the vector Helmholtz equation reduces to

\nabla^2 \mathbf{E}(\mathbf{r}) + k^2\, \mathbf{E}(\mathbf{r}) = 0,   (1.3.16)

where k ≐ nk_0 is defined as the wavenumber.

(Footnote: The expression for the frequency-dependent index n(r, ω) can be formally derived by introducing (1.3.6) into (1.3.11).)

When all vector components of the E field and the H field have the same functional form, wave propagation can be analyzed using a scalar form of the Helmholtz equation given by

\nabla^2 U(\mathbf{r}, \omega) + k_0^2\, n^2(\mathbf{r}, \omega)\, U(\mathbf{r}, \omega) = 0,   (1.3.17)

where U(r, ω) is a scalar function representing one component of either the electric field E(r, ω) or the magnetic field H(r, ω), normalized so that |U(r, ω)|² represents a spatial power density. This normalization is discussed later (cf. (1.3.29)). For a constant index, the scalar Helmholtz equation simplifies to (cf. (1.3.16))

\nabla^2 U(\mathbf{r}) + k^2\, U(\mathbf{r}) = 0,   (1.3.18)

with k being the wavenumber. This equation is used to develop geometrical optics later in this section.

Plane waves in unbounded media  To solve the vector Helmholtz equation for a specified geometry and medium requires applying the specified boundary conditions in an appropriate coordinate system. The most basic geometry, consisting of an unbounded, lossless, linear, isotropic, homogeneous medium, not necessarily free space, allows the simplest solution of the Helmholtz equation. To this end, fields E and H of the form

\mathbf{E}(\mathbf{r}) = E_0\, e^{-i\boldsymbol{\beta}\cdot\mathbf{r}}\, \hat{\mathbf{e}}   (1.3.19a)
\mathbf{H}(\mathbf{r}) = H_0\, e^{-i\boldsymbol{\beta}\cdot\mathbf{r}}\, \hat{\mathbf{h}}   (1.3.19b)

satisfy (1.3.15) for such a medium, where ê and ĥ are orthogonal unit vectors. This is a plane wave. The cross product ê × ĥ defines the real propagation vector β = β_x x̂ + β_y ŷ + β_z ẑ, with the unit vector β̂ ≐ β/β pointing in the direction of propagation of the plane wave. The magnitude β = |β| is called the propagation constant. In general, a wave for which both the electric field and the magnetic field are transverse to the direction of propagation is called a transverse electromagnetic wave or a TEM wave.
The complex amplitudes E_0 and H_0 for the field given in (1.3.19) for the plane wave are related by H_0 = E_0/η, where the material-dependent quantity η = √(μ_0/ε) is the impedance, which may depend on the frequency. The spatial phase term e^{-iβ·r} depends on the inner product of the propagation vector β and the position vector r = x x̂ + y ŷ + z ẑ.
Using E_0 = |E_0|e^{iφ}, the real electric field for a plane wave is given by

\mathbf{E}(\mathbf{r}, t) = \mathrm{Re}\!\left[ \mathbf{E}(\mathbf{r})\, e^{i\omega t} \right] = |E_0| \cos(\omega t - \boldsymbol{\beta}\cdot\mathbf{r} + \phi)\, \hat{\mathbf{e}}.   (1.3.20)

The cosine function repeats in time whenever ωt = 2πm, where m is an integer. This condition defines the temporal period T = 2π/ω = 1/f. The cosine function repeats in space whenever β·r = 2πm. This corresponds to a spatial period or wavelength λ = c/f that is defined as the distance along the β̂ direction between two consecutive phase fronts in the medium. For a plane wave, the propagation vector β is also called the wavevector k, and is given by

\boldsymbol{\beta} = \mathbf{k} \doteq k_0\, n(\omega)\, \hat{\boldsymbol{\beta}},   (1.3.21)

where the magnitude k = |k| = n(ω)k_0 is the wavenumber. Setting the propagation constant β equal to the wavenumber k gives kλ = 2π, or

λ = 2π/k.

The phase velocity c is defined by

c \doteq \frac{\lambda}{T} = \lambda(\omega)\, \frac{\omega}{2\pi} = \frac{\omega}{\beta(\omega)},   (1.3.22)

where the notation shows the dependence of the propagation constant β, and thereby the dependence of the wavelength λ, on the frequency ω.

Dispersion Relationship  Substituting the plane waves of (1.3.19) into the vector Helmholtz equation (1.3.15), and noting that the spatial operator ∇² reduces to a multiplication by −β² for a plane-wave field of the form of (1.3.19), the Helmholtz equation has a solution when β(ω) satisfies

\beta(\omega) = k_0\, n(\omega) = \frac{\omega\, n(\omega)}{c_0} = \frac{2\pi\, n(\omega)}{\lambda_0},   (1.3.23)

where k_0 = 2π/λ_0 is the free-space wavenumber, with λ_0 being the free-space wavelength. Using c_0 = ω/k_0 = λ_0 f, the index of refraction n(ω) = c_0/c(ω) is the ratio of the phase velocity in free space to the phase velocity in the medium.
The functional dependence β(ω) of the propagation constant β on the frequency ω is called the dispersion relationship. Values of ω and β that satisfy (1.3.23) produce plane-wave solutions to the Helmholtz equation. As the field propagates a distance L in a lossless medium in the direction of β̂, the complex amplitude of the field is multiplied by e^{-iβ(ω)L}, which is a distance-dependent phase shift. This phase shift does not change the functional form of the solution. Each plane-wave solution is called an eigenfunction, or eigenmode, of the unbounded, lossless, linear, homogeneous medium, with the phase shift e^{-iβ(ω)L} being the eigenvalue defined at a distance L.
For other geometries, such as waveguiding structures, the dispersion relationship is different than (1.3.23) because of the presence of boundaries. The dispersion relationship must be derived for a given geometry starting with the vector Helmholtz equation given in (1.3.15) and applying the geometry-dependent boundary conditions. This means that different waveguiding geometries have different dispersion relationships.
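A short numerical illustration of the dispersion relationship (1.3.23) for a homogeneous medium follows; the index n = 1.5 (approximately that of silica glass, as noted earlier) and the free-space wavelength are illustrative choices.

```python
import numpy as np

c0 = 2.99792458e8            # speed of light in vacuum (m/s)
n = 1.5                      # index of refraction
lambda0 = 1.55e-6            # free-space wavelength (m)

k0 = 2 * np.pi / lambda0     # free-space wavenumber
omega = c0 * k0              # angular frequency
beta = k0 * n                # propagation constant in the medium, eq. (1.3.23)

print("propagation constant beta (rad/m):", beta)
print("wavelength in the medium (m)     :", 2 * np.pi / beta)   # lambda0 / n
print("phase velocity c = omega/beta    :", omega / beta)       # c0 / n
```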

Poynting Vector The cross product of the time-varying vector fields E(r, t) and H(r, t), which are not necessarily plane waves, nor orthogonal, is defined as the Poynting vector S(r, t)

S(r, t) ≜ E(r, t) × H(r, t).

The Poynting vector is a directional spatial power density with units of power per unit area. It may have components that are not in the direction of propagation.

Narrowband fields can be written as E(r, t) = Re[E(r, t) e^{iωct}] and H(r, t) = Re[H(r, t) e^{iωct}], where the complex-baseband electromagnetic fields^u E(r, t) and H(r, t) are slowly varying as compared to the carrier frequency ωc. The time-average power density Save(r, t) for the complex-baseband field is expressed as

Save(r, t) = (1/2) Re[E(r, t) × H^*(r, t)]   (1.3.24a)
           = Re[Save(r, t)],   (1.3.24b)

where Save(r, t) = (1/2) E(r, t) × H^*(r, t) is the complex Poynting vector.

The intensity of the field I(r, t) is defined as the magnitude of Save(r, t)

I(r, t) ≜ |Save(r, t)|.   (1.3.25)

When both field vectors are transverse to the direction of propagation, the wave is a TEM wave with the Poynting vector along the direction of propagation. When, instead, one of the two fields has an axial vector component along the direction of propagation, as in a dielectric waveguide, the Poynting vector has a component along the direction of propagation and a component transverse to the direction of propagation.

Power The power P(z, t) flowing along the z axis through a cross-sectional region 𝒜 transverse to the z axis at a distance z is given by

P(z, t) = ∫_𝒜 Save(r, t) · ẑ dA,   (1.3.26)

where ẑ is a unit vector in the z direction and dA is the differential of the area. For a TEM wave propagating in the z direction, the Poynting vector Save(z, t) is in the z direction, and

P(z, t) = ∫_𝒜 I(r, t) dA.   (1.3.27)

If the intensity is constant over the transverse region 𝒜, then the power P(z, t) and the intensity I(z, t) differ only by the constant scaling factor equal to the area of the region 𝒜.

As an example, the complex Poynting vector given in (1.3.24a) for the plane wave described by (1.3.19) is

Save(r, t) = (1/2) E(r, t) × H^*(r, t)
           = (1/2) (E0 e^{−iβ·r} ê) × (H0^* e^{iβ·r} ĥ)
           = (|E0|²/(2η)) β̂,   (1.3.28)

where ê × ĥ = β̂ and H0 = E0/η (cf. (1.3.19)). The corresponding intensity is I = |Save| = |E0|²/2η.

^u The same symbol E is used for the complex-baseband electric field E(r, t) in the time domain and the complex electric field E(r, ω) in the frequency domain. The context will resolve the ambiguity.

Equation (1.3.28) states that the intensity of a TEM wave is proportional to the squared magnitude |E0|² of the electric field, and therefore proportional to the squared magnitude of the magnetic field. For this case, it is convenient to define a complex field envelope U(r, t), with the squared magnitude of U(r, t) normalized so that

I(r, t) = |U(r, t)|²,   (1.3.29)

where U(r, t) represents either E(r, t) or H(r, t).
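The relationships (1.3.26)-(1.3.29) can be exercised numerically. The sketch below is a minimal example; the field amplitude, the index of 1.45, and the cross-sectional area are arbitrary assumptions made for illustration, and the intensity is assumed uniform over that area.

    # Illustrative sketch (assumed values): intensity and power of a TEM plane wave.
    eta0 = 376.73              # impedance of free space (ohms)
    n = 1.45                   # assumed index, so eta = eta0 / n
    E0 = 100.0                 # assumed electric-field amplitude (V/m)
    area = 1e-6                # assumed cross-sectional area (m^2)

    eta = eta0 / n
    intensity = abs(E0) ** 2 / (2 * eta)   # I = |E0|^2 / (2 eta), cf. (1.3.28)-(1.3.29)
    power = intensity * area               # uniform intensity over the area, cf. (1.3.27)

    print(f"intensity = {intensity:.2f} W/m^2, power = {power * 1e6:.2f} microwatts")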

1.3.3 Random Electromagnetic Fields

A noncoherent wireless carrier may have random fluctuations in both time and space. The temporal properties of a stationary random signal were characterized in Section 1.2.2 by an autocorrelation function. This section extends that analysis to the random properties of an electromagnetic field by defining a joint coherence function for time and space together.

The temporal randomness of an electromagnetic field is quantified by the temporal coherence function. For a single point in space, this function is an extension of the autocorrelation function defined in (1.2.41) that includes the vector nature of the field. The spatial randomness in an electromagnetic field is quantified by the spatial coherence function. In general, the term autocorrelation function is commonly used when describing the statistical properties of electrical signals, whereas the term coherence function is commonly used when describing the statistical properties of electromagnetic fields. The meaning is much the same.

For a random electromagnetic field, the intensity I(r, t) at a single time instant and single point in space is defined as the expectation of the random complex field envelope U(r, t) (cf. (1.3.29))

I(r, t) = ⟨U^*(r, t) · U(r, t)⟩ = ⟨|U(r, t)|²⟩.   (1.3.30)

Extending this definition to two points r1 and r2 in space, one of which may be delayed in time by τ, the first-order mutual coherence function ϕ(r1, r2, τ) is defined as

ϕ(r1, r2, τ) ≜ ⟨U(r1, t) · U^*(r2, t + τ)⟩,

where the expectation is over both time and space. The mutual coherence function may be viewed as the ability of an electromagnetic field to interfere at two points r1 and r2 in space at two times separated by τ.

For r1 = r2 = r, the temporal coherence function describes the ability of an electromagnetic field to interfere with a time-delayed version of itself at the single point r. The width of the temporal coherence function, which can be defined in several ways, is called the coherence time τc (cf. (1.2.45)). Likewise, for τ = 0, the spatial coherence function describes the ability of an electromagnetic field to interfere in space at a single time instant t. A coherence region 𝒜coh is the spatial equivalent of the coherence time τc and is a region for which the electromagnetic field is highly correlated in space.

For a single point in space, an ergodic electromagnetic field is one that satisfies

ϕ(r, r, τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} U(r, t) · U^*(r, t + τ) dt.

Loosely, this expression states that a single realization of the random, ergodic time-varying electromagnetic field is enough to compute a property defined for the ensemble. In general, the electromagnetic field from an unmodulated source is ergodic. Likewise, the associated cross-coherence function for two complex electromagnetic fields is

ϕij(r1, r2, τ) = ⟨Ui(r1, t) · Uj^*(r2, t + τ)⟩.   (1.3.31)

The intensity autocoherence function is defined as

ϕI(r, r, τ) ≜ ⟨I(r, t) I(r, t + τ)⟩
           = ⟨U(r, t) U^*(r, t) U(r, t + τ) U^*(r, t + τ)⟩.

This is a fourth-order function of the complex field envelope at a single point in space.
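A time-average estimate of the temporal coherence function at a single point follows directly from the ergodic form given above. The sketch below is an assumption-laden illustration: the scalar field envelope is modeled as lowpass-filtered circularly-symmetric gaussian noise, and the record length, sample spacing, and smoothing length are arbitrary choices, not values from the text.

    # Illustrative sketch: time-average estimate of phi(r, r, tau) for a scalar complex envelope.
    import numpy as np

    rng = np.random.default_rng(0)
    N, dt, M = 200_000, 1e-12, 50          # samples, sample spacing (s), smoothing length (assumed)
    u = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    U = np.convolve(u, np.ones(M) / np.sqrt(M), mode="same")   # impose a finite coherence time

    def coherence(U, max_lag):
        # Time-average estimate of phi(tau) = <U(t) U*(t + tau)> for lags 0 .. max_lag - 1.
        return np.array([np.mean(U[: len(U) - m] * np.conj(U[m:])) for m in range(max_lag)])

    phi = coherence(U, 4 * M)
    below_half = np.abs(phi) < 0.5 * np.abs(phi[0])
    tau_c = dt * np.argmax(below_half)     # crude half-width estimate of the coherence time
    print(f"estimated coherence time ~ {tau_c:.2e} s")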

1.4 References

Basic material on linear systems can be found in Kudeki and Munson (2009). Background material on random processes is discussed in Helstrom (1991) as well as Stark and Woods (1994). The subject of generalized functions was developed by Schwartz (1950). Probability distributions involving gaussian random variables are discussed in Simon (2007). Electromagnetic theory is covered in Harrington (1961), Kong (1990), and Chew (1990).

1.5 Problems

1. Linear systems Show that for any constants a and b, the definition of a linear system can be replaced by the single statement:

a x1(t) + b x2(t) → a y1(t) + b y2(t),

whenever x1(t) → y1(t) and x2(t) → y2(t).

2. Properties of the Fourier transform
a) Starting with the definition of the Fourier transform and its inverse, derive the primary properties of the Fourier transform listed in Section 1.1.
b) Using the modulation property of the Fourier transform and the transform pair 1 ←→ δ(f), show that ∫_{−∞}^{∞} e^{i2πf1 t} e^{−i2πf2 t} dt = δ(f2 − f1), thereby demonstrating that the set {e^{−i2πfj t}} of time-harmonic functions is orthogonal.
c) The statement of the Fourier transform and the statement of its inverse are identical with the exception of the signs in the exponent. Using this observation and the differentiation property, derive the expression

t^n f(t) ←→ i^n (d^n/dω^n) F(ω).

3. Gram-Schmidt procedure
The Gram-Schmidt procedure is a constructive method to create an orthonormal basis for the space spanned by a set of N signal vectors that are not necessarily linearly independent. Let {xn(t)} be a set of signal vectors. The procedure is as follows:
a) Set ψ1(t) = x1(t)/√E1, where E1 is the signal energy.

b) Determine the component of x2(t) that is linearly independent of ψ1(t) by finding the projection of x2(t) along ψ1(t). This component is given by [x2(t) · ψ1(t)]ψ1(t), where the inner product is defined in (1.1.65).

c) Subtract this component from x2(t).
d) Normalize the difference. The resulting basis vector can be written as

ψ2(t) = (x2(t) − [x2(t) · ψ1(t)] ψ1(t)) / ‖x2(t) − [x2(t) · ψ1(t)] ψ1(t)‖.

e) Repeat for each subsequent vector in the set, forming the normalized difference between the vector and the projection of the vector onto each of the basis vectors already determined. If the difference is zero, then the vector is linearly dependent on the previous vectors and does not constitute a new basis vector.
f) Continue until all vectors have been used.

Using this procedure, determine:
a) An orthonormal basis for the space over the interval [0, 1] spanned by the functions x1(t) = 1, x2(t) = sin(2πt), and x3(t) = cos²(2πt).
b) An orthonormal basis for the space over the interval [0, 1] spanned by the functions x1(t) = e^t, x2(t) = e^{−t}, and x3(t) = 1.

4. Gaussian pulse
a) Using the Fourier transform pair e^{−πt²} ←→ e^{−πf²} and the scaling property of the Fourier transform, show that

e^{−t²/2σ²} ←→ √(2π) σ e^{−2π²σ²f²} = √(2π) σ e^{−σ²ω²/2},

and thereby prove that the timewidth-bandwidth product Trms Wrms for a gaussian pulse is equal to (2π)^{−1} when σf is measured in hertz.
b) Repeat part (a) and show that when the root-mean-squared timewidth is defined using the squared magnitude of the pulse and the root-mean-squared bandwidth is defined using the squared magnitude of the Fourier transform of the pulse, Trms Wrms = 1/2.
c) Derive the relationship between the root-mean-squared bandwidth Wrms for the signal power and the 3-dB or half-power bandwidth Wh for a pulse whose power P(t) is given by e^{−t²/2σP²}.
d) An electromagnetic pulse s(t) modeled as a gaussian pulse with a root-mean-squared timewidth Trms is incident on a square-law photodetector, with the electrical pulse p(t) generated by direct photodetection given by |s(t)|²/2. Determine the following:

i. The root-mean-squared timewidth of p(t) in terms of Trms.

ii. The root-mean-squared timewidth of the electrical power per unit resistance Pe(t) = p(t)² in terms of Trms.
e) Finally, rank-order the root-mean-squared timewidth of the electromagnetic pulse s(t), the electrical pulse p(t) generated by direct photodetection, and the electrical power pulse Pe(t). Is this ordering valid for any pulse shape?

5. Pulse formats
Derive relationships between the root-mean-squared width, the 3-dB width, and the full-width-half-maximum width in both the time domain and the frequency domain for:
a) A rectangular pulse defined as p(t) = 1 for −W/2 ≤ t ≤ W/2, and zero otherwise.
b) A triangular pulse defined as p(t) = 1 − |t|/W for |t| ≤ W, and zero otherwise.
c) A lorentzian pulse defined as

p(t) = 2α/(t² + α²),

where α is a constant.

6. Pulse characterization
The rectangular pulse p(t) defined in Problem 2.5 is used as the input to a time-invariant linear system defined by h(t) = p(t) so that the impulse response is equal to the input pulse.
a) Derive the full-width-half-maximum timewidth and the root-mean-squared timewidth of the output y(t) = p(t) ⊛ p(t) and show explicitly that 2σp² = σy².
b) Let the full-width-half-maximum width be denoted by F. Determine if the relationship 2Fp² = Fy² holds for each of the signals defined in Problem 2.5.

7. Passband, baseband, analytic signals, and the Hilbert transform

a) Using

s̃(t) = A(t) cos(2πfc t + φ(t))
     = Re[(sI(t) + i sQ(t)) e^{i2πfc t}]
     = Re[z(t)],

determine expressions for A(t) and φ(t) in terms of sI(t) and sQ(t).
b) Verify the following relationships:

i. sI(t) = Re[z(t) e^{−i2πfc t}]

ii. sQ(t) = Im[z(t) e^{−i2πfc t}]
iii. A(t) = |z(t)|
iv. φ(t) = arg[z(t) e^{−i2πfc t}]

c) Derive a relationship for the Hilbert transform ŝ(t) in terms of the complex-baseband signal sI(t) + i sQ(t) and the carrier frequency fc.
d) Given that s(t) is a real causal function with the Fourier transform pair s(t) ←→ SR(ω) + i SI(ω), use the conjugate symmetry properties of SR(ω) and SI(ω) to show that the Kramers-Kronig transform can be written as

SI(ω) = (2/π) ∫_0^∞ [ω/(ω² − Ω²)] SR(Ω) dΩ
SR(ω) = (2/π) ∫_0^∞ [Ω/(Ω² − ω²)] SI(Ω) dΩ.

8. Preservation of the commutator property
Prove that the commutator property [A, B] = 0 is preserved under a change of basis.

9. Trace of an outer product
Using (1.1.86), a square matrix T can be written as a weighted sum of outer products

T = Σ_{m,n} Tmn x̂n x̂m^†.

Using this expression and the properties of the trace operation given in (1.1.80b) and (1.1.80c), show that the trace of a square matrix expressed as an outer product of two vectors is equal to the inner product of the same two vectors as given in (1.1.81).

10. Probability density functions
a) Verify that the mean of the rayleigh probability density function

f(r) = (r/σ²) e^{−r²/2σ²},  r ≥ 0,

is σ√(π/2) and that the variance is σ²(2 − π/2).
b) Show that as A becomes large, a ricean probability density function can be approximated by a gaussian probability density function. Why should this be expected?

11. Transformation of a function of a random variable
A new probability density function f(y) is generated when a random variable x with probability density function f(x) is transformed by the functional relationship y = T(x), where T is invertible over the region where f(x) is defined.
a) Using the fact that the transformation must preserve probabilities on intervals so that fy(y)dy = fx(x)dx, show that

fy(y) = fx(T^{−1}(y)) dx/dy.

b) Using fy(y) = fx(T^{−1}(y)) dx/dy, show that for y = x²/2 and fx(x) = (x/σ²) exp(−x²/2σ²), which is a rayleigh probability density function, fy(y) is an exponential probability density function with an expected value σ².

c) Let x = G(w, z) and y = F(w, z) be the inverse transformations that express the variables (x, y) in terms of the variables (w, z). For a joint probability density function fxy(x, y), the expression for fwz(w, z) is

fwz(w, z) = fxy(G(w, z), F(w, z)) |J|,

where |J| is the determinant of the jacobian matrix of this transformation, which is given by

[ ∂F/∂w  ∂F/∂z
  ∂G/∂w  ∂G/∂z ].

Using this expression and the transformation x = G(w, z) = z − w and y = F(w, z) = w, show that

fwz(w, z) = fxy(z − w, w).

d) Using the result of part (c), show that

fz(z) = ∫_{−∞}^{∞} fxy(z − y, y) dy.

e) Show that if the two random variables x and y are independent, then fxy(z − y, y) is a product distribution, and the probability density function fz(z) for z is given by

fz(z) = ∫_{−∞}^{∞} fx(z − y) fy(y) dy = fx(z) ⊛ fy(z),

which is (1.2.11).

12. Marginalization The bivariate gaussian probability density function has the form

px,y(x, y) = A e^{−(ax² + 2bxy + cy²)}.

a) Express A in terms of a, b and c.

b) Find the marginals, px(x) and py(y), and the conditionals px|y(x|y) and py|x(y|x).
c) Find the means ⟨x⟩, ⟨y⟩, the variances σx², σy², and the correlation ⟨xy⟩.
Hint: ax² + 2bxy + cy² = (a − b²/c)x² + c(y + bx/c)².

13. Number of required terms for the Fourier series expansion of the phase function (requires numerics)

Let f1(φ) be the exact form for the marginal probability density function of the phase given

in (1.2.37) and let f2(N, φ) be the series approximation using N terms of the Fourier series as given by

fφ(φ) = 1/(2π) + Σ_{n=1}^{∞} An cos(nφ),

where F = A²/2σ² and where the zero-frequency component is 1/2π. The coefficients of the cosine Fourier series are given by^v

An = (1/2) √(F/π) e^{−F/2} [I_{(n−1)/2}(F/2) + I_{(n+1)/2}(F/2)],

where In is the modified Bessel function of order n. Define the root-mean-squared error as follows:

δ(N) = √( 2 ∫_0^π (f1(φ) − f2(N, φ))² dφ ).

a) How many terms are required so that the error is less than 10^{−6} if F = 1?
b) How many terms are required if F = 10?
c) Discuss the results with respect to the number of terms required for a specified accuracy as a function of F.

14. Joint and marginal gaussian probability density functions
The joint probability density function p(x, y) is given as

p(x, y) = (1/(2πσxσy)) exp[−(1/2)(x²/(2σx²) + y²/(2σy²))]  if xy > 0,
p(x, y) = 0  if xy < 0.

a) Show that this function is a valid probability density function.
b) Sketch p(x, y) in plan view and in three dimensions. Is this joint probability density function jointly gaussian?

c) Find the marginal probability density functions px(x) and py(y) and comment on this result.

15. Coherence function and the power density spectrum
a) Let R(τ) = e^{−|τ|} e^{i2πfcτ}. Determine the one-sided power density spectrum 𝒮(f).

b) A wireless carrier has a power density spectrum 𝒮(f) given by

𝒮(f) = π / ((f − fc)² + π).

Determine the total signal power P.

^v See Prabhu (1969).

c) Determine the full-width-half-maximum width of the spectrum in part (b).

d) Estimate the coherence time τc for the spectrum in part (b).

16. Autocorrelation and the power density spectrum of a random signal using sinusoidal pulses
A binary waveform consists of a random and independent sequence of copies of the pulse (1 + cos(2πt/T)) rect(t/T) with random amplitude An for the nth term of the sequence. The start time j of the pulse sequence is a uniformly-distributed random variable over [0, T]. The symbols transmitted in each nonoverlapping interval of length T are independent. The probability of transmitting a mark with an amplitude A is 1/2. The probability of transmitting a space with an amplitude 0 is 1/2.
a) Determine the autocorrelation function of the signal.
b) Determine the power density spectrum of the signal.

17. Covariance matrices
Define z as a vector of N circularly-symmetric gaussian random variables with a complex covariance matrix W given in (1.2.29). Define x as a vector of length 2N that consists of the real part Re[z] and the imaginary part Im[z] in the order x = {Re[z1], ..., Re[zN], Im[z1], ..., Im[zN]}. Show that the real 2N × 2N covariance matrix C given by (cf. (1.2.22))

C = ⟨(x − ⟨x⟩)(x − ⟨x⟩)^T⟩,

where x is a column vector formed by pairwise terms, can be expressed in block form in terms of the N × N complex covariance matrix W as

C = (1/2) [ Re W   −Im W
            Im W    Re W ].

18. Pareto probability density function The Pareto probability density function (cf. (1.2.6)) is

fx(x) = λ x^{−(λ+1)}  for x ≥ 1,
fx(x) = 0  for x < 1.

a) Show that the mean is λ/(λ − 1) for λ > 1 and otherwise is infinite.
b) Show that the variance is λ/[(λ − 1)²(λ − 2)] for λ > 2 and otherwise is infinite.

19. Diagonalizing a covariance matrix
A real covariance matrix C of a bivariate gaussian random variable is given by

C = [ 1  1
      1  4 ].

a) Determine a new coordinate system (x′, y′) such that the joint probability density function in that coordinate system is a product distribution, and express the joint probability density function in that coordinate system as the product of two one-dimensional gaussian probability density functions.

b) Plot this probability density function using a contour plot showing the original coordinates (x, y) and the transformed coordinates (x′, y′).
c) Determine the angle θ of rotation defined as the angle between the x axis and the x′ axis.

20. Sums of gaussian random variables
A random variable G is formed from the sum of two independent gaussian random variables of equal variance σ² and expected values E1 and E1(1 + δ), as shown in the figure below.

a) Determine the probability that the value of G is less than E1 − xσ in terms of x, σ, and δ, where x is a scaling parameter.
b) Show that if x is large, then for a finite value of δ, the probability determined in part (a) is dominated by the random variable centered at E1 and that the contribution to the probability from the random variable centered at E1 + δ is negligible.

21. Material impedance

a) By substituting the complex plane-wave fields at a frequency ω

E = E0 e^{−iβ·r} ê
H = H0 e^{−iβ·r} ĥ

into the two curl equations

∇ × E = −∂B/∂t
∇ × H = ∂D/∂t,

and using the constitutive relationships along with ê × ĥ = β̂, show that H0 = E0/η, where η = √(µ0/ε) is the material impedance.
b) Starting with

n²(r) ≜ ε(r)/ε0 = εr(r) = 1 + χ(r),

show that the material impedance η(ω) as a function of ω can be written as

η(ω) = η0/n(ω),

where η0 = √(µ0/ε0) is the impedance of free space. (The impedance of free space has a value of 377 Ω.)

22. Wave equation
This problem discusses the derivation of the wave equation.
a) Using (1.3.7), solve for the divergence term ∇ · E(r) in (1.3.11) in terms of D(r). The result consists of two terms, one of which is E(r) · ∇ log_e n²(r).
b) State the conditions under which the divergence term can be neglected.

2 Examples

2.1 Filter Estimation

In communication systems, a discrete-time filter {y_k} based on minimum mean-squared error at the filter output can be estimated directly from the received signal without first estimating the channel impulse response. A detection filter estimated using this direct approach computes the form of the detection filter from the received sequence based on statistical knowledge about the data. This method results in a different filter than an equalization filter. The design of that equalization filter, described as a matched filter followed by a transversal filter, requires knowledge about the received pulse p(t) obtained from an estimate of the channel impulse response.

When cast as an estimation problem, the sampled output component f_k of the finite-length detection filter can be written as

f_k = Σ_{m=0}^{L−1} y_m r_{k−m},   (2.1.1)

where the sequence {r_k} consists of the received samples in additive white gaussian noise and the sequence {y_m} specifies the detection filter to be estimated. The summation in (2.1.1) can be written as an inner product of the complex conjugate y^* of the desired filter of length L and a vector r_k of noisy samples with (cf. (1.1.65))

f_k = y^T r_k = r_k^T y,   (2.1.2)

where y is an L-by-one column vector whose components are the filter coefficients y_k, and r_k is an L-by-one column vector of noisy samples from the sequence {r_k} defined for each k as r_k = [r_k, . . . , r_{k−L+1}]^T. The error in the kth component is e_k = s_k − f_k.

The filter coefficients y_k are chosen to minimize the mean-squared-error objective function

J(k, y) = ⟨e_k e_k^*⟩.   (2.1.3)

The filter y that minimizes this objective function is determined by taking the gradient of (2.1.3) and equating the result to zero, where the gradient consists of the vector of all relevant partial derivatives. Because e_k is a complex function, a complex gradient is used in preference to the combination of the scalar gradient on each complex component. The complex gradient is defined either as^a

∇_y = [∂/∂y_1, ∂/∂y_2, . . . , ∂/∂y_L]^T   (2.1.4a)

^a For details regarding the complex gradient, see Brandwood (1983).

or as

∇_{y^*} = [∂/∂y_1^*, ∂/∂y_2^*, . . . , ∂/∂y_L^*]^T,   (2.1.4b)

where, for y_j = u_j + i v_j,

∂/∂y_j ≜ (1/2)(∂/∂u_j − i ∂/∂v_j)   (2.1.5a)
∂/∂y_j^* ≜ (1/2)(∂/∂u_j + i ∂/∂v_j).   (2.1.5b)

These two forms of the complex gradient are essentially equivalent. Either form can be used. The complex gradient satisfies

∇_{y^*} [Σ_{k=1}^{M} a_k y_k] = 0   (2.1.6)

for any complex vector (a_1, . . . , a_M). The proof of this statement is asked in an end-of-chapter exercise. It is convenient here to choose ∇_{y^*} as the complex gradient used to derive the desired filter. Applying the product rule, this complex gradient becomes

∇_{y^*} J(k, y) = ⟨(∇_{y^*} e_k) e_k^* + e_k ∇_{y^*} e_k^*⟩.

Because ∂y_j/∂y_j^* = 0 (cf. (2.1.6)), and e_k is linear in y_j, the term ∇_{y^*} e_k is equal to zero. The second term is evaluated as follows. Using e_k^* = s_k^* − y^† r_k^* (cf. (2.1.2)), the term ∇_{y^*} e_k^* is written as

∇_{y^*} e_k^* = ∇_{y^*} (s_k^* − y^† r_k^*)
             = −r_k^*   (2.1.7)

because ∇_{y^*} (y^† r_k^*) = r_k^*. Therefore,

∇_{y^*} J(k, y) = −⟨e_k r_k^*⟩.   (2.1.8)

⟨e_k r_k^*⟩ = 0   (2.1.9)

for each k, leads to the minimum mean-squared error, where 0 is the zero vector of length L. Equivalently, ⟨e_k r_{k−ℓ}^*⟩ = 0 for ℓ = 0, . . . , L − 1.

The optimal filter y_opt is the filter for which f_opt = y_opt^T r has the minimum error e_min for each k. Multiplying each side of (2.1.9) by y_opt^T gives ⟨e_k y_opt^T r_k^*⟩ = y_opt^T 0 or

⟨e_min f_opt⟩ = 0,   (2.1.10)

where f_opt = y_opt^T r_k^* for each k. This condition can be described geometrically on the complex plane as an orthogonality condition between the minimum error e_min and the optimal estimated data value f_opt for each k, as is shown in Figure 2.1.


Figure 2.1: (a) The error e_k for the kth component is the vector difference on the complex plane between the value s_k and the estimated value f_k. (b) If the error is minimized, then the error is orthogonal to the estimated value f_k and is orthogonal to the kth observed value r_k for all k.

Now write the orthogonality condition in terms of the components of the sequences

⟨e_k r_{k−ℓ}^*⟩ = ⟨(s_k − Σ_{m=0}^{L−1} y_m r_{k−m}) r_{k−ℓ}^*⟩ = 0  for ℓ = 0, 1, ..., L − 1.

Separating and equating terms gives

Σ_{m=0}^{L−1} y_m R_rr(m − ℓ) = R_rs(−ℓ)  for ℓ = 0, 1, ..., L − 1,   (2.1.11)

where R_rr(m − ℓ) = ⟨r_{k−m} r_{k−ℓ}^*⟩ is the autocorrelation of the received sequence (cf. (??)), and R_rs(−ℓ) = ⟨r_{k−ℓ}^* s_k⟩ is the cross-correlation of the received sequence with the desired data sequence. Equation (2.1.11) is valid for each value of ℓ. The resulting set of equations expressed in vector-matrix form is (cf. (??))

R y = w_rs,   (2.1.12)

where R is the matrix form of the statistical autocorrelation function (cf. (??)) with R_{mℓ} = R_rr(m − ℓ),

w_rs = ⟨r_k^* s_k⟩ = [R_rs(0), R_rs(−1), . . . , R_rs(1 − L)]^T   (2.1.13)

is the vector form of the cross-correlation function, and

y = [y_0, y_1, . . . , y_{L−1}]^T   (2.1.14)

is a vector of length L of the filter coefficients. The solution to (2.1.12) is given by

y_opt = R^{−1} w_rs,   (2.1.15)

where the inverse R^{−1} exists when R is a positive-definite matrix, which is almost always the case for an autocorrelation function. The causal filter described by y_opt is called a finite-impulse-response Wiener filter, with the set of equations specified by (2.1.12) called the Wiener-Hopf equations.
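A direct numerical illustration of (2.1.11)-(2.1.15) is sketched below. The statistical correlations are replaced by sample averages over a known training sequence; the channel, noise level, filter length, and sequence length are all assumptions made for this example and are not quantities from the text.

    # Illustrative sketch (assumed simulation setup): sample-based Wiener-Hopf solution (2.1.15).
    import numpy as np

    rng = np.random.default_rng(1)
    K, L = 10_000, 5
    s = rng.choice([-1.0, 1.0], size=K)                       # known (training) data sequence s_k
    h = np.array([1.0, 0.5, 0.2])                             # assumed channel impulse response
    r = np.convolve(s, h)[:K] + 0.1 * rng.standard_normal(K)  # received noisy samples r_k

    # Vectors r_k = [r_k, r_{k-1}, ..., r_{k-L+1}]^T for k = L-1, ..., K-1.
    Rk = np.array([r[k - L + 1 : k + 1][::-1] for k in range(L - 1, K)])
    sk = s[L - 1 : K]

    R = (Rk.conj().T @ Rk) / len(sk)        # sample estimate of the autocorrelation matrix R
    w_rs = (Rk.conj().T @ sk) / len(sk)     # sample estimate of the cross-correlation vector w_rs
    y_opt = np.linalg.solve(R, w_rs)        # solution of R y = w_rs, cf. (2.1.12) and (2.1.15)
    print("estimated filter:", np.round(y_opt, 3))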

The error for each estimated component can be determined using (2.1.2) to rewrite the objective function given in (2.1.3) in terms of matrices. Then

J(k, y) = σ_s² − y^† w_rs − w_rs^† y + y^† R y,   (2.1.16)

where (2.1.13) is used to write w_rs^† = ⟨s_k^* r_k^T⟩. Provided that R^{−1} exists, J(k, y) can be written as the sum of a "perfect square" term and a residual error term given by

J(k, y) = (y − R^{−1} w_rs)^† R (y − R^{−1} w_rs) + σ_s² − w_rs^† R^{−1} w_rs,   (2.1.17)

showing that the objective function is an L-dimensional quadratic surface with a unique minimum value given by y_opt = R^{−1} w_rs. The error at this minimum is

σ_err² = σ_s² − w_rs^† R^{−1} w_rs
       = σ_s² − w_rs^† y_opt,   (2.1.18)

where the second equation follows from the first equation using y_opt = R^{−1} w_rs.
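Continuing the sketch above (same assumed setup, reusing the names sk, w_rs, and y_opt defined there), the residual error of (2.1.18) can be evaluated from the same sample estimates:

    sigma_s2 = np.mean(np.abs(sk) ** 2)          # estimate of sigma_s^2
    sigma_err2 = sigma_s2 - w_rs.conj() @ y_opt  # sigma_err^2 = sigma_s^2 - w_rs^dagger y_opt
    print(f"residual mean-squared error ~ {sigma_err2:.4f}")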

2.2 Constant-Modulus Objective Function

An alternative linear technique for sequence detection starts with the form of the estimate given in (2.1.2) but with a different objective function called the constant-modulus objective function. Recall that {s_k} is the sequence of data values, {f_k} is the sequence of estimated data values, and {r_k} is a sequence of noisy samples from which the estimate is formed. Whereas the criterion of minimizing the mean-squared error ⟨|s_k − f_k|²⟩ (cf. (2.1.3)) was used in Section 2.1, the constant-modulus method for estimation of the filter coefficients instead uses the alternative criterion that the squared magnitude |f_k|² is equal to a constant for all k. This criterion is suitable for a constant-magnitude signal constellation. The advantage of this criterion is that a cross-correlation function R_rs(ℓ) based on the data {s_k} is not required.

An objective function that enforces a constant-modulus constraint is^b

J(k, y) = ⟨(|f_k|² − 1)²⟩,   (2.2.1)

where the squared magnitude of each symbol is normalized to one. The objective function J(k, y) is minimized when ∇_{y^*} J(k, y) = 0.

Using (2.1.2) to write f_k f_k^* as y^† r_k^* r_k^T y, the complex gradient with respect to the filter y^* (cf. (2.1.4b)) has the same form as (2.1.8) with

∇_{y^*} J(k, y) = 2⟨(|f_k|² − 1) ∇_{y^*}(y^† r_k^* r_k^T y)⟩
                = 2⟨(|f_k|² − 1)(r_k^* r_k^T y)⟩
                = 2⟨(|f_k|² − 1) f_k r_k^*⟩
                = 2⟨e_k r_k^*⟩,   (2.2.2)

^b Other forms of the objective function replace the value one by a ratio of the moments of the statistical distribution of the datastream.

where e_k = (|f_k|² − 1) f_k is the error in the estimate and where the property ∇_{y^*}(y^† B y) = B y of the complex gradient has been used with B = r_k^* r_k^T. The error is minimized when e_k is orthogonal to r_k.

The constant-modulus objective function does not define an L-dimensional quadratic surface as was the case for the minimum mean-squared error. Therefore, there may be local minima at which the gradient is zero. Nevertheless, for small errors, the optimal filter defined by the constant-modulus objective function is a scaled form of the Wiener filter defined by the minimum mean-squared-error objective function (cf. (2.1.12)).^c

Setting (2.2.2) to the zero vector gives a set of equations written as

R y = w_rf,   (2.2.3)

where R is the autocorrelation matrix of the received noisy signal, and w_rf = ⟨(|f_k|² − 1) f_k r_k^*⟩ is the cross-correlation between (|f_k|² − 1) f_k and r_k (cf. (2.1.13)).

2.3 Adaptive Estimation

Examining (2.1.15), the calculation of the Wiener filter requires the cross-correlation w_rs, which depends on the joint statistics of the received sequence and the data sequence. An initial estimate can be obtained by using a training sequence. The maintenance of the minimum of the objective function can then be cast into an adaptive form. To do so, write the vector of filter coefficients y(k + 1) for the (k + 1)th step in terms of y(k) for the kth step and the gradient of the objective function evaluated at k, so that

y(k + 1) = y(k) − µ ∇_{y^*} J(k, y),   (2.3.1)

where µ is a gain parameter that balances the rate of convergence with the accuracy. Using (2.1.8) and replacing the statistical expectation with the instantaneous value gives

y(k + 1) = y(k) + µ e_k r_k^*.   (2.3.2)

This replacement leads to a gradient that is random. This estimation method is called the stochastic gradient descent method. Accordingly, the algorithm will execute a random walk about the optimal solution y_opt with an excursion that depends on the noise.

The constant-modulus algorithm can also be cast into an adaptive form following the same steps that were used to develop an adaptive form of the Wiener filter. Start with (2.2.2) and replace the expectation by its instantaneous value. Incorporating the factor of two into the definition of µ leads to

y(k + 1) = y(k) − µ (|f_k|² − 1) f_k r_k^*
         = y(k) − µ e_k r_k^*,   (2.3.3)

where e_k ≜ (|f_k|² − 1) f_k is the update error.
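A minimal sketch of the two adaptive updates, (2.3.2) and (2.3.3), is given below, with the expectations replaced by instantaneous values. The step size and the sample values in the usage lines are arbitrary assumptions; the functions illustrate the update equations and are not the author's implementation.

    # Illustrative sketch: single steps of the LMS update (2.3.2) and the constant-modulus update (2.3.3).
    import numpy as np

    def lms_step(y, r_k, s_k, mu):
        # y(k+1) = y(k) + mu * e_k * conj(r_k), with e_k = s_k - y^T r_k.
        e_k = s_k - y @ r_k
        return y + mu * e_k * np.conj(r_k)

    def cma_step(y, r_k, mu):
        # y(k+1) = y(k) - mu * (|f_k|^2 - 1) * f_k * conj(r_k), with f_k = y^T r_k.
        f_k = y @ r_k
        return y - mu * (np.abs(f_k) ** 2 - 1.0) * f_k * np.conj(r_k)

    # Example usage with arbitrary (assumed) values.
    y = np.zeros(5, dtype=complex); y[0] = 1.0
    r_k = np.array([1.0, 0.4, 0.1, 0.0, 0.0], dtype=complex)
    y = lms_step(y, r_k, s_k=1.0, mu=0.01)
    y = cma_step(y, r_k, mu=0.01)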

^c See Treichler and Agee (1983).


Bibliography

D. H. Brandwood. A complex gradient operator and its application in adaptive array theory. Microwaves, Optics and Antennas, IEE Proceedings H, 130(1):11–16, 1983.

W.C. Chew. Waves and Fields in Inhomogeneous Media. IEEE Press Series on Electromagnetic Waves. Van Nostrand Reinhold, 1990.

R. F. Harrington. Time-Harmonic Electromagnetic Fields. McGraw-Hill, New York, NY, 1961.

C. W. Helstrom. Probability and Stochastic Processes for Engineers. Macmillan; Collier-Macmillan Canada; Maxwell Macmillan International, New York: Toronto, 1991.

J. A. Kong. Electromagnetic Wave Theory. Wiley, New York, NY, 1990.

E. Kudeki and D.C. Munson. Analog Signals and Systems. Illinois ECE series. Pearson Prentice Hall, 2009.

V. K. Prabhu. Error-rate considerations for digital phase-modulation systems. IEEE Transactions on Communication Technology, 17(1):33–42, 1969.

I. Reed. On a moment theorem for complex gaussian processes. IRE Transactions on Information Theory, 8(3):194–195, 1962.

L. Schwartz. Théorie des Distributions. Hermann and Cie, Paris, 1950.

M.K. Simon. Probability Distributions Involving Gaussian Random Variables: A Handbook for Engineers and Scientists. International series in engineering and computer science. Springer US, 2007.

H. Stark and J.W. Woods. Probability, Random Processes, and Estimation Theory for Engineers. Industrial and Systems Engineering. Prentice Hall, 1994.

R.S. Strichartz. A Guide to Distribution Theory and Fourier Transforms. Studies in advanced mathematics. World Scientific, 2003.

J.R. Treichler and B.G. Agee. A new approach to multipath correction of constant modulus signals. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-31:459–72, Apr. 1983.


Index

Adaptive estimation, 69 probability mass function, 27 Aliasing, 19 Circularly-symmetric Amplitude -gaussian probability density function, 29, 45 root-mean-squared, 13 multivariate, 30, 31 Analytic signal, 6, 58 -gaussian random variable, 29, 45 Angular frequency,4 -gaussian random vector, 30, 62 Anisotropic material, 48 Coherence Associative property,3 function, 55 Autocorrelation function, 35, 40, 62 temporal, 36 complex-baseband noise, 43 interval, 37 noise, 41 region, 55 width, 37 spatial, 55 Autocovariance function, see Covariance function temporal, 55 Axis time, 37, 55 principal, 48 Coherence function, 61 Coherence time, 45 Bandlimited Colored noise, 41 noise, 43 Commutative property,2 signal, 7 Commutator, 22 system, 41 Commuting operator, 22 waveform, 18 Complementary error function, 28 Bandwidth, 7,8 Complex conjugate transpose, 15, 30 3-dB, 8 Complex envelope effective, 38 field, 55 half-power, 8, 42 signal, 13 maximum, 18 Complex gradient, 65 noise equivalent Complex random variable, 23 baseband, 41 Complex-baseband noise complex-baseband, 44 autocorrelation function, 43 passband, 42, 44 equivalent bandwidth, 44 root-mean-squared,8, 57 power density spectrum, 43 Baseband random process, 43 noise equivalent bandwidth, 41 Complex-baseband signal, 13, 23, 59 Basis, 15 energy, 18 gaussian distribution, 29 Complex-baseband system orthonormal, 15, 21, 57 impulse response, 14 Bayes rule, 26 transfer function, 14 Bessel function Component modified, 32, 61 in-phase, 12 Birefringence, 48 quadrature, 12 Bivariate gaussian random variable, 62 Conditional probability, 25 Bivariate probability density function, 25 Conditional probability density function, 25 gaussian, 29, 60 Conjugate symmetric signal,6 Boundary conditions, 48 Constant free-space permeability, 47 Carrier free-space permittivity, 47 frequency,5 Constant-modulus objective function, 68 Causal impulse response, 40 Constitutive relationships, 48, 50 Causality,3 Continuous-time Central limit theorem, 34 system,2 Centroid,8 Convolution, 2 Characteristic function, 27 associative property,3

73 commutative property,2 complex signal, 13 distributive property,2 Equalizer Fourier transform of,5 constant modulus, 68 Correlation, 26 Ergodic process, 36 Correlation coefficient Error function, 28, 33 random variable, 29 complementary, 28 Covariance, 26, 28 Estimation Covariance function, 36 adaptive, 69 Covariance matrix, 29 detection filter, 65 complex, 30 Euclidean distance, 17 real, 28 squared, 17 Cross-coherence function, 56 Expectation, 24 Cumulative probability distribution function, 23 Expected value, 24 Curl, 47, 50 Exponential probability density function, 34, 59 Exponential random variable, 34 Density function, see Probability density function Determinant, 20 Field Dielectric material, 48, 51 electric, 46 Dielectric waveguide, 54 electromagnetic, 46 Differential equation magnetic, 46 Helmholtz, 52 monochromatic, 47, 50 Dipole, 48 narrowband, 54 Dirac impulse, 2 plane-wave, 52 Dispersion Filter material, 49 estimation, 65 Dispersion relationship, 53 linear shift-invariant,2 plane-wave, 53 Wiener, 67 Dispersive material, 49 filter Distance causal,3 euclidean, 17 First moment, 24 squared, 17 Flux density, 47 Distributive property,2 Fourier series, 18 Divergence, 47 cosine, 61 Dyadic product, 17 Fourier transform,4, 12 convolution property,5 Effective timewidth,8 differentiation property,5 Eigenfunction, 7, 53 inverse,4 Eigenmode,7, 53 modulation property,5 Eigenvalue, 7, 21, 53 properties,4, 56 Eigenvector, 7, 20, 21 scaling property,4 -of a covariance matrix, 30 spatial, 4 Electric field, 46 temporal, 4 complex representation, 51 Free-space Electric flux density, 47 impedance, 63 Electrical pulse, 57 permeability, 47 Electromagnetic permittivity, 47 ergodic, 55 phase velocity, 53 Electromagnetic field, 46 wavelength, 53 complex, 56 wavenumber, 47, 53 complex envelope, 55 Frequency complex-baseband, 54 angular,4 random, 36, 55 carrier,5 spatially random, 55 spatial, 4 temporally random, 55 Full-rank matrix, 20 Electromagnetic signal Function power, 54 coherence, 55 Electromagnetic wave, 50 complementary error, 28 Energy error, 28, 33 complex-baseband signal, 18 gamma, 34 passband signal, 17 generalized,2 Ensemble, 36 modified Bessel, 32, 61 Envelope rect, 10 complex field, 55 signum,3

74 sinc, 11 Isserlis theorem, 25 triangular, 39 unit-step,3 Jacobian matrix, 60 Joint probability density function, 25 Gamma probability density function, 34 Gamma random variable, 34 Kramers-Kronig transform, 6, 50, 59 Gaussian probability density function, see Probability Kronecker impulse, 2, 15 density function, gaussian joint, 61 Laplace recursion formula, 20 marginal, 61 Linear causal media, 49 Gaussian pulse, 12 Linear shift-invariant filter,2 Gaussian random process, see Random process, gaussian Linear time-invariant system, 7, 51 Gaussian random variable, see Random variable, gaussian Linearity,1 Generalized function,2 Linearly independent set, 15 Gradient complex, 65 Magnetic field, 46 Gram-Schmidt procedure, 57 complex representation, 51 Magnetic flux density, 47 Half-power bandwidth, 42 Marginal probability density function, 25 Harmonic oscillator Marginalization, 25, 60 classical, 49 Material Heisenberg uncertainty relationship, 10 anisotropic, 48 Helmholtz equation, 47, 51 dielectric, 48, 51 scalar, 52 dispersive, 49 vector, 47, 51 homogeneous, 48 Hermitian matrix, 21, 29 inhomogeneous, 48 Hilbert space, 16 isotropic, 48 Hilbert transform, 6, 58 nondispersive, 49 Homogeneity,1 Material impedance, 63 Homogeneous material, 48 Material polarization, 48 Homogeneous system,2 Matrix conjugate, 20 Identity matrix, 21 conjugate transpose, 20 Impedance, 52 covariance, see Covariance matrix free-space, 63 full-rank, 20 Impulse hermitian, 21 Dirac, 2 identity, 21 Kronecker, 2, 15 nonhermitian, 22 sifting property,2 normal, 22 Impulse response,1, 2, 58 projection, 21 causal, 40, 42 rank, 20 complex-baseband, 14 singular-value decomposition, 22 passband, 14 square, 20 right-sided,3 trace, 20 shift-invariant,2 outer product, 59 time-varying,2 transformation, 15 In-phase noise component, 43 transpose, 20 In-phase signal component, 12 unitary, 21 Independent Maxwell’s equations, 46 -random variable, 26 complex representation, 51 random vector, 35 Mean, 24 Index of refraction, 47, 50, 53 Mean-squared value angularly-dependent, 48 -frequency,8 Inequality timewidth, 8 Schwarz, 17 Memoryless system,3 time-bandwidth, 10 Minimum mean-squared error Inhomogeneous material, 48 orthogonality condition, 66 Inner product, 16, 21 Mode Intensity, 54 -of a linear system,7 Intensity autocoherence function, 56 transverse electromagnetic (TEM), 52 Interpolation, 19 Modified Bessel function, 32, 61 Inverse Fourier transform,4 Modulus,6 Isotropic material, 48 Moment

75 nth, 24 Permittivity, 47, 49 first, 24 Phase noise Monochromatic probability density function, 33 field, 47, 50 Phase probability density function, 60 Multivariant gaussian probability density function, 28 Phase velocity, 53 Mutual coherence function, 55 Photodetector square-law, 57 Narrowband Plane wave, 52, 63 field, 54 Poison summation formula, 11 signal, 18 Polarization (field), 48 Noise Polarization (material), 48 additive white gaussian, 40 Position vector, 52 autocorrelation function, 41 Posterior probability, 26 bandlimited, 43 Power colored, 41 effective bandwidth, 38 complex-baseband electromagnetic, 54 autocorrelation function, 43 Power density spectrum, 36, 61 equivalent bandwidth, 44 complex-baseband noise, 43 equivalent bandwidth, 41 electrical in-phase component, 43 per unit resistance, 40 passband, see Passband, noise one-sided, 37 power, 41 passband noise, 43 passband, 44 random binary waveform, 39 quadrature component, 43 two-sided, 37 thermal, 46 noise, 40 white, 41 Poynting vector, 53 Noise figure, 45 complex, 54 Nondispersive material, 49 Principal axis, 48 Normal matrix, 22 Probability density function, 23 Nyquist rate, 19 bivariate, 25 Nyquist-Shannon sampling theorem, 18 gaussian, 29, 60 conditional, 25 Objective function exponential, 34, 59 constant-modulus, 68 first-order, 35 mean-squared error, 65 gamma, 34 Operator gaussian, 27 commuting, 22 circularly-symmetric, 29, 31, 45 convolution, 26 complex, 31 determinant, 20 multivariate, 28, 30, 31 trace, 20 joint, 25 Orthogonal signals, 18 marginal, 25 Orthogonality condition Pareto, 25, 62 minimum mean-squared error, 66 phase, 33, 60 Orthonormal basis, 15, 21, 57 rayleigh, 31, 33, 59 Outer product, 16, 20 ricean, 31, 32, 59 orthonormal basis vector, 21 second-order, 35 Probability distribution function, 23 Pareto index, 25 Probability mass function, 23 Pareto probability density function, 62 Gordon, see Gordon distribution Parseval’s relationship, 5, 17, 38, 41 Product distribution, 26 Passband gaussian, 30 impulse response, 14 Projection, 16, 57 noise, 43 matrix, 21 power density spectrum, 43 Propagation constant, 52 noise equivalent Propagation vector, 52 bandwidth, 44 Pulse noise equivalent bandwidth, 42 chirp, 12 signal, 12 gaussian, 12, 57 energy, 17 lorentzian, 12, 58 Passband noise process, 43 quadratic phase, 12 Period rectangular, 10 temporal, 52 sinc, 11 Permeability, 47 triangular, 58

76 Quadrature noise component, 43 Sifting property of an impulse, 2 Quadrature signal component, 12 Signal, 1 analytic, 6, 58 Random process, 23, 35 bandlimited, 7 circularly-symmetric gaussian, 45 complex envelope, 13 complex-baseband noise, 43 complex-baseband, 13, 23, 59 ergodic, 36 conjugate symmetric,6 gaussian, 36 narrowband, 18 stationary orthogonal, 18 strict sense, 36 passband, 12 widesense, 36 right-sided,3 Random telegraph signal, 38 spectrum,4 Random variable, 23 square-integrable,4 nth moment, 24 timelimited,7 bivariate Signal model gaussian, 28 wave-optics, 50 complex, 23 Signal space, 15 correlation, 26 basis, 15 expectation, 24 definition of distance, 17 exponential, 34 Signal vector, 15 first moment, 24 basis, 57 gamma, 34 euclidean distance, 17 gaussian, 27 Signum function,3 bivariate, 28 Silica glass, 51 circularly-symmetric, 29, 45 Sinc pulse, 11 complex, 30 Singular value, 22 multivariate, 28 Singular-value decomposition, 22 uncorrelated, 29 Space-invariant system,2 independent, 26 Span, 15 mean, 24 Spatial coherence rayleigh, 33 function, 55 realization, 23 Spatial coherence function, 55 ricean, 32 Spatial Fourier transform, 4 root-mean-squared value, 24 Spatial frequency, 4 uncorrelated, 26 Spectral susceptibility, 50 uniform, 38 Spectrum,4 variance, 24 Speed of light Random vector material, 51 gaussian vacuum, 50 complex, 30 Square-integrable signal,4 Rayleigh probability density function, 31, 33, 59 Standard deviation, 24 Rayleigh random variable, 33 Stationarity Realization strict sense, 36 random variable, 23 widesense, 36 sample function, 23 Stationary random process, 36 Rectangular pulse, 10 Statistic, 23 Relative permittivity, 49 Stochastic gradient decent method, 69 Ricean probability density function, 31, 32, 59 Stochastic process, 23 Ricean random variable, 32 Submatrix, 20 Root-mean-squared Superposition,1 -bandwidth,8, 57 Superposition integral,2 -timewidth, 8, 57 Susceptibility, 48, 49 Root-mean-squared amplitude, 13, 24 spectral, 50 Susceptibility tensor, 49 Sample, 18 System,1 Sample function, 23, 36 additive,2 Sampling rate, 19 bandlimited, 41 Sampling theorem, 18 causal,3 Scalar,2 continuous time,2 Scalar Helmholtz equation, 52 homogeneous,2 Schwarz inequality, 17 linear, 1,1 Self-adjoint transformation, 21 linear time-invariant, 7, 51 Shift-invariant system,2 memoryless,3

77 shift-invariant,2 Transformation space-invariant,2 self-adjoint, 21 spatially local,3 unitary, 21 time-invariant,2 Transformation matrix, 15 Transverse electromagnetic mode (TEM), 52 Temporal coherence, 55 Triangular pulse, 58 Temporal coherence function, 55 Temporal Fourier transform, 4 Unit-step function,3 Tensor product, 17 Unitary matrix, 21 Theorem Unitary transformation, 21 central limit, see Central limit theorem Isserlis, 25 sampling, see Sampling theorem Variance, 24 Wiener-Khintchine, 37 Vector field, 51 Thermal noise, 46 Velocity Time-bandwidth inequality, 10 phase, see Phase velocity Time-invariant system,2 Timelimited signal,7 Wave equation, 50, 64 Timewidth, 7, 49 Wave optics effective,8 signal model, 50 root-mean-squared, 8, 57 Waveform Timewidth-bandwidth product, 9, 57 bandlimited, 18 Trace, 20 random binary, 38 outer product, 59 Waveguide Transfer function, 7, 40 dielectric, 54 baseband Wavelength, 52 noise equivalent bandwidth, 42 free-space, 53 complex-baseband, 14 Wavenumber, 51, 53 passband, 14 free-space, 47, 53 noise equivalent bandwidth, 42 Wavevector, 52 Transform White noise, 41 Fourier, 4, 12 Wiener filter, 67 Hilbert, 6, 58 Wiener-Hopf equations, 67 Kramers-Kronig, 6, 50, 59 Wiener-Khintchine theorem, 37
