Observational Astrophysics II
JOHAN BLEEKER SRON Netherlands Institute for Space Research Astronomical Institute Utrecht
FRANK VERBUNT Astronomical Institute Utrecht
January 18, 2007

Contents
1 Radiation Fields
   1.1 Stochastic processes: distribution functions, mean and variance
   1.2 Autocorrelation and autocovariance
   1.3 Wide-sense stationary and ergodic signals
   1.4 Power spectral density
   1.5 Intrinsic stochastic nature of a radiation beam: Application of Bose-Einstein statistics
       1.5.1 Intermezzo: Bose-Einstein statistics
       1.5.2 Intermezzo: Fermi-Dirac statistics
   1.6 Stochastic description of a radiation field in the thermal limit
   1.7 Stochastic description of a radiation field in the quantum limit

2 Astronomical measuring process: information transfer
   2.1 Integral response function for astronomical measurements
   2.2 Time filtering
       2.2.1 Finite exposure and time resolution
       2.2.2 Error assessment in sample average μ_T
   2.3 Data sampling in a bandlimited measuring system: Critical or Nyquist frequency

3 Indirect Imaging and Spectroscopy
   3.1 Coherence
       3.1.1 The Visibility function
       3.1.2 Young's dual beam interference experiment
       3.1.3 The mutual coherence function
       3.1.4 Interference law for a partially coherent radiation field: the complex degree of coherence
   3.2 Indirect spectroscopy
       3.2.1 Temporal coherence
       3.2.2 Longitudinal correlation
   3.3 Indirect imaging
       3.3.1 Quasi-monochromatic point source: spatial response function of a two-element interferometer
       3.3.2 Quasi-monochromatic extended source: spatial or lateral coherence
       3.3.3 The Van Cittert-Zernike theorem
       3.3.4 Etendue of coherence
   3.4 Aperture synthesis
       3.4.1 Quasi-monochromatic point source: spatial response function (PSF) and optical transfer function (OTF) of a multi-element interferometer
       3.4.2 Earth Rotation Aperture Synthesis (ERAS)
       3.4.3 The Westerbork Radio Synthesis Telescope (WSRT)
       3.4.4 Maximum allowable bandwidth of a quasi-monochromatic source
       3.4.5 The general case: arbitrary baselines and a field of view in arbitrary direction

4 Radiation sensing: a selection
   4.1 General
   4.2 Heterodyne detection
   4.3 Frequency limited noise in the thermal limit: signal to noise ratio and limiting sensitivity
   4.4 Photoconductive detection
       4.4.1 Operation principle
       4.4.2 Responsivity of a photoconductor
       4.4.3 Temporal frequency response of a photoconductor
   4.5 Frequency limited noise in the quantum limit
       4.5.1 Frequency limited shot noise in the signal-photon limit
       4.5.2 Example: Shot noise in the signal-photon limit for a photoconductor
       4.5.3 Shot noise in the background-photon limit for a photoconductor: the normalized detectivity D*
   4.6 Charge Coupled Devices
       4.6.1 Operation principle
       4.6.2 Charge storage in a CCD
       4.6.3 Charge transport in a CCD
       4.6.4 Charge capacity and transfer speed in CCD structures
       4.6.5 Focal plane architectures
       4.6.6 Wavelength response of CCDs

5 Fitting observed data
   5.1 Errors
       5.1.1 Accuracy versus precision
       5.1.2 Computing the various distributions
   5.2 Error propagation
       5.2.1 Examples of error propagation
   5.3 Errors distributed as a Gaussian and the least squares method
       5.3.1 Weighted averages
       5.3.2 Fitting a straight line
       5.3.3 Non-linear models: Levenberg-Marquardt
       5.3.4 Optimal extraction of a spectrum
   5.4 Errors distributed according to a Poisson distribution and Maximum likelihood
       5.4.1 A constant background
       5.4.2 Constant background plus one source
   5.5 General Methods
       5.5.1 Amoeba
       5.5.2 Genetic algorithms
   5.6 Exercises

6 Looking for variability and periodicity
   6.1 Fitting sine-functions: Lomb-Scargle
   6.2 Period-folding: Stellingwerf
   6.3 Variability through Kolmogorov-Smirnov tests
   6.4 Fourier transforms
       6.4.1 The discrete Fourier transform
       6.4.2 From continuous to discrete
       6.4.3 The statistics of power spectra
       6.4.4 Detecting and quantifying a signal
   6.5 Exercises

Chapter 1
Radiation Fields
1.1 Stochastic processes: distribution functions, mean and variance
A stochastic process is defined as an infinite series of stochastic variables, one for each value of the time t. For a specific value of t, the stochastic variable X(t) possesses a certain probability density distribution. The numerical value of the stochastic variable X(t) at time t corresponds to a particular draw (outcome) from this probability distribution at time t. The time series of draws represents a single time function and is commonly referred to as a realisation of the stochastic process. The full set of all realisations is called the ensemble of time functions (see Fig. 1.1). At each time t, the stochastic variable X(t) describing the stochastic process is distributed
Figure 1.1: The ensemble of time functions of a stochastic process.
by a momentary cumulative distribution, called the first-order distribution:
F(x; t) = P{X(t) ≤ x}   (1.1)

which gives the probability that the outcome at time t will not exceed the numerical value x. The probability density function, or first-order density, of X(t) follows from the derivative of F(x; t):

f(x; t) ≡ ∂F(x; t)/∂x   (1.2)

This probability function may be a binomial, Poisson or normal (Gaussian) distribution. For the determination of the statistical properties of a stochastic process it suffices in many applications to consider only certain averages, in particular the expected values of X(t) and of X²(t). The mean or average μ(t) of X(t) is the expected value of X(t) and is defined as
μ(t) = E{X(t)} = ∫_{−∞}^{+∞} x f(x; t) dx   (1.3)
The variance of X(t) is the expected value of the squared difference between X(t) and μ(t) and is, by definition, equal to the square of the standard deviation:

σ²(t) = E{(X(t) − μ(t))²} = E{X²(t)} − μ²(t)   (1.4)

(in the case of real functions X(t) and μ(t)).
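To make this concrete: at a fixed time t, X(t) is an ordinary random variable, so μ(t) and σ²(t) can be estimated from many independent draws. A minimal sketch in Python, assuming numpy and an arbitrarily chosen Poisson distribution with rate 5:

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # At a fixed time t, X(t) is a single random variable. Draw many
    # independent realisations of it; the Poisson distribution with
    # rate 5 is an arbitrary illustrative choice.
    draws = rng.poisson(lam=5.0, size=100_000)

    mu = draws.mean()                 # estimate of E{X(t)}, Eq. (1.3)
    var = (draws**2).mean() - mu**2   # E{X^2(t)} - mu^2, Eq. (1.4)

    print(mu, var)  # both tend to 5: mean = variance for a Poisson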
1.2 Autocorrelation and autocovariance
This distribution can be generalized to higher-order distributions; for example, the second-order distribution of the process X(t) is the joint distribution
G(x₁, x₂; t₁, t₂) = P{X(t₁) ≤ x₁, X(t₂) ≤ x₂}   (1.5)
The corresponding second-order density follows from the second derivative:

g(x₁, x₂; t₁, t₂) ≡ ∂²G(x₁, x₂; t₁, t₂) / (∂x₁ ∂x₂)   (1.6)
The autocorrelation R(t₁, t₂) of X(t) is the expected value of the product X(t₁) · X(t₂), or X(t₁) · X*(t₂) if we deal with complex quantities:
R(t₁, t₂) = E{X(t₁) · X*(t₂)} = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x₁ x₂* g(x₁, x₂; t₁, t₂) dx₁ dx₂   (1.7)

with the asterisk (*) indicating the complex conjugate in the case of complex variables (see Fig. 1.2). The value of R(t₁, t₂) on the diagonal t₁ = t₂ = t represents the average power of the signal at time t:

R(t) = R(t, t) = E{X²(t)} = E{|X(t)|²}   (1.8)
Figure 1.2: The autocorrelation of a stochastic process.
Since the average of X(t) is in general not zero, another quantity is introduced, the autocovariance C(t₁, t₂), which is centered around the averages μ(t₁) and μ(t₂):
C(t₁, t₂) = E{(X(t₁) − μ(t₁)) · (X(t₂) − μ(t₂))*}   (1.9)
For t₁ = t₂ = t this yields (verify yourself):
C(t) = C(t, t) = R(t, t) − |μ(t)|² = σ²(t)   (1.10)
C(t) represents the average power contained in the fluctuations of the signal around its mean value at time t.
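The relation C(t, t) = R(t, t) − |μ(t)|² of Eq. 1.10 can also be checked numerically. A minimal sketch, assuming an arbitrarily chosen ensemble of realisations (a deterministic sine plus Gaussian noise of standard deviation 0.5):

    import numpy as np

    rng = np.random.default_rng(seed=2)

    # Ensemble of realisations: a deterministic mean function plus
    # Gaussian noise (both arbitrary illustrative choices).
    n_real, n_t = 20_000, 256
    t = np.linspace(0.0, 1.0, n_t)
    X = np.sin(2 * np.pi * 3 * t) + 0.5 * rng.standard_normal((n_real, n_t))

    mu = X.mean(axis=0)        # mu(t): ensemble average, Eq. (1.3)
    R = (X * X).mean(axis=0)   # R(t, t) = E{X^2(t)}, Eq. (1.8)
    C = R - np.abs(mu)**2      # C(t, t), Eq. (1.10)

    print(C.mean())  # ~0.25: the power in the fluctuations, sigma^2(t)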
1.3 Wide-sense stationary and ergodic signals
Next, consider a signal for which the average does not depend on time, and for which the autocorrelation only depends on the time difference τ ≡ t₂ − t₁. This is called a wide-sense stationary (wss) signal. The following relations hold:
signal average:    μ(t) = μ = constant   (1.11)
autocorrelation:   R(t₁, t₂) = R(τ)   (1.12)
autocovariance:    C(t₁, t₂) = C(τ) = R(τ) − μ²   (1.13)
For τ = 0:

R(0) = μ² + C(0) = μ² + σ²   (1.14)

which expresses that the total power in the signal equals the power in the average signal plus the power in its fluctuations around the average value. Note that both the autocorrelation and the autocovariance are even functions, i.e. R(−τ) = R(τ) and C(−τ) = C(τ). Finally, a wss stochastic process X(t) is considered ergodic when the ensemble average of X(t) can be interchanged with its time average, i.e. when
μ = lim_{T→∞} (1/T) ∫_{−T/2}^{+T/2} X(t) dt = E{X(t)}   (1.15)

and

R(τ) = lim_{T→∞} (1/T) ∫_{−T/2}^{+T/2} X*(t) · X(t + τ) dt = E{X*(t) · X(t + τ)}   (1.16)
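Ergodicity is easily illustrated numerically. In the sketch below (assuming an arbitrary but convenient wss example, first-order autoregressive noise) the time averages of Eqs. 1.15 and 1.16 over a single long realisation reproduce the analytically known ensemble values μ = 0 and R(τ) = φ^|τ|/(1 − φ²):

    import numpy as np

    rng = np.random.default_rng(seed=3)

    # One long realisation of a wide-sense stationary process:
    # first-order autoregressive noise x[n] = phi * x[n-1] + w[n].
    phi, n = 0.9, 500_000
    w = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = w[0]
    for i in range(1, n):
        x[i] = phi * x[i - 1] + w[i]

    # Time averages over the single realisation (Eqs. 1.15 and 1.16):
    mu_time = x.mean()                   # -> E{X(t)} = 0
    tau = 5
    R_tau = (x[:-tau] * x[tau:]).mean()  # -> R(tau)

    # The ensemble values are known analytically for this process:
    print(mu_time, R_tau, phi**tau / (1 - phi**2))  # R(5) ~ 3.1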
1.4 Power spectral density
The autocorrelation function R(τ), defined in equations 1.7 and 1.16, has a Fourier transform
S(s) = ∫_{−∞}^{+∞} R(τ) · e^{−2πisτ} dτ   (1.17)
Since R(τ) has the dimension of a signal power (e.g. watt), the dimension of S(s) is [power × time], which corresponds to power per unit frequency, e.g. watt·Hz⁻¹. The quantity S(s) is called the power spectral density (PSD), which represents the double-sided (meaning: frequency −∞ → +∞) power density of a stochastic variable X(t). The Fourier pair R(τ) ⇔ S(s) is commonly referred to as the Wiener-Khinchin relation. For τ = 0 this relation becomes:

R(0) = E{|X(t)|²} = ∫_{−∞}^{+∞} S(s) ds   (1.18)

and if the signal average μ = 0:
R(0) = C(0) = σ² = ∫_{−∞}^{+∞} S(s) ds   (1.19)

i.e. the power contained in the fluctuations of a signal centered around zero mean follows from the integration of the spectral power density over the entire frequency band. Since physical frequencies run from zero (i.e. DC) to +∞, the one-sided power spectral density S_OS(s) = 2S(s) is commonly used. The total signal power follows from integration of this quantity over the physical frequency domain (see also figure 1.3):

R(0) = ∫_0^∞ S_OS(s) ds   (1.20)
Figure 1.3: Power spectra: the area under the square of the function h(t) (panel a) equals the area under its one-sided power spectrum at positive frequencies (panel b) and also equals the area under its double-sided spectrum (panel c). Note the factor 2 in amplitude between the one-sided and double-sided spectrum. Figure taken from Press et al. 1992.
Figure 1.4 shows two examples of S_OS(s), displaying the PSDs which relate to the characteristic light curves of certain compact X-ray binary systems, the so-called Low Mass X-Ray Binaries (LMXBs). The binary source GX 5-1 shows a "red-noise" excess, i.e. the power increases at low frequencies, roughly following a power-law dependence proportional to s^{−β}, on which a broad peak around a central frequency s₀ is superimposed. This peak refers to so-called quasi-periodic oscillations (QPOs), indicating that the X-ray source produces temporarily coherent oscillations in luminosity. The other source, Sco X-1, shows a similar behaviour, although the red noise, which refers to stochastic (i.e. non-coherent) variability in X-ray luminosity, is largely absent in this case.

Figure 1.4: One-sided power spectra of the X-ray sources GX 5-1 and Sco X-1.

Note: relation 1.16 defines the autocorrelation function of a stochastic process as a time average. If the stochastic variable X(t) is replaced by a general function f(u), the autocorrelation is defined as (provided that ∫_{−∞}^{+∞} |f(u)| du is finite):
R(x) = ∫_{−∞}^{+∞} f*(u) · f(u + x) du = ∫_{−∞}^{+∞} f*(u − x) · f(u) du   (1.21)

where no averaging takes place. The Fourier transform Φ(s) of R(x) is now given by

Φ(s) = ∫_{−∞}^{+∞} R(x) · e^{−2πisx} dx = F(s) · F*(s) = |F(s)|²   (1.22)
in which f(u) ⇔ F(s). This is the Wiener-Khinchin theorem for a finite function f(u), which is only non-zero over a limited interval of the coordinate u. For x = 0 this results in:

R(0) = ∫_{−∞}^{+∞} |f(u)|² du = ∫_{−∞}^{+∞} Φ(s) ds = ∫_{−∞}^{+∞} |F(s)|² ds   (1.23)

The relation

∫_{−∞}^{+∞} |f(u)|² du = ∫_{−∞}^{+∞} |F(s)|² ds   (1.24)
is referred to as Parseval's theorem. Each integral represents the amount of "energy" in a system, one integral being taken over all values of a coordinate (e.g. angular coordinate, wavelength, time), the other over all spectral components in the Fourier domain; see also figure 1.3. The function Φ(s) therefore has the dimension of an energy density. Quite confusingly (and erroneously), in the literature this is often also referred to as a power density, although no averaging process has taken place.
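Both the Wiener-Khinchin theorem (Eq. 1.22) and Parseval's theorem (Eq. 1.24) have exact discrete analogues that can be checked with a fast Fourier transform. A minimal sketch, assuming numpy's unnormalized DFT convention (which introduces the factor 1/N in the discrete Parseval relation):

    import numpy as np

    rng = np.random.default_rng(seed=4)

    n = 4096
    f = rng.standard_normal(n)  # an arbitrary real "signal" f(u)
    F = np.fft.fft(f)           # discrete counterpart of F(s)

    # Discrete Parseval (cf. Eq. 1.24): the "energy" is the same in
    # both domains; the 1/n stems from the unnormalized DFT.
    print(np.sum(np.abs(f)**2), np.sum(np.abs(F)**2) / n)

    # Discrete Wiener-Khinchin (cf. Eq. 1.22): the Fourier transform
    # of the circular autocorrelation R(x) equals |F(s)|^2.
    R = np.fft.ifft(np.abs(F)**2).real
    print(np.allclose(np.fft.fft(R).real, np.abs(F)**2))  # True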
1.5 Intrinsic stochastic nature of a radiation beam: Application of Bose-Einstein statistics
Irrespective of the type of information carrier, e.g. electromagnetic or particle radiation, the incoming radiation beam in astronomical measurements is subject to fluctuations which derive from the incoherent nature of the emission process in the information (= radiation) source. For a particle beam this is immediately obvious from the corpuscular character of the radiation; for electromagnetic radiation, however, the magnitude of the fluctuations depends on whether the wave character or the quantum character (i.e. photons) dominates. By employing Bose-Einstein statistics, the magnitude of these intrinsic fluctuations can be computed for the specific case of a blackbody radiation source which is incoherent or "chaotic" with respect to time. Photons are bosons, which are not subject to the Pauli exclusion principle; consequently, many bosons may occupy the same quantum state.
1.5.1 Intermezzo: Bose-Einstein statistics

The world is quantum-mechanical. One way of describing this is to note that particles in a unit volume of space ('a cubic centimeter') are distributed in momentum space in boxes with a size proportional to h³, where h is Planck's constant. At each energy, there is a finite number Z of boxes (where Z ∝ 4πp²dp with p the particle momentum). Consider nᵢ particles, each with energy εᵢ, and call the number of boxes available at that energy Zᵢ. Bosons can share a box. The number of ways W(nᵢ) in which the nᵢ bosons can be distributed over the Zᵢ boxes is given by
W(nᵢ) = (nᵢ + Zᵢ − 1)! / (nᵢ! (Zᵢ − 1)!)   (1.25)
(To understand this: the problem is equivalent to laying nᵢ particles and Zᵢ − 1 boundaries in a row. The number of permutations is (nᵢ + Zᵢ − 1)!; then note that interchanging the particles among themselves, or the boundaries among themselves, does not produce a new distribution, which removes the factors nᵢ! and (Zᵢ − 1)!.) Similarly, we put nⱼ bosons in Zⱼ boxes, etc. The number of ways in which N = Σ_{i=1}^∞ nᵢ bosons can be distributed over the boxes in momentum space thus is W = Π_{i=1}^∞ W(nᵢ).

The basic assumption of statistical physics is that the probability of a distribution is proportional to the number of ways in which this distribution can be obtained, i.e. to W. The physics behind this assumption is that collisions between the particles will continuously redistribute the particles over the boxes. The most likely particle distribution is thus the one which can be reached in the largest number of different ways. We find this maximum in W by determining the maximum of ln W, setting its derivative to zero:
ln W = Σ_{i=1}^∞ ln W(nᵢ)  ⇒  Δ ln W = Σ_{i=1}^∞ [∂ ln W(nᵢ)/∂nᵢ] Δnᵢ = 0   (1.26)
We consider one term from the sum on the right-hand side. With Stirling's approximation ln x! ≈ x ln x − x for large x we get

ln W(nᵢ) = (nᵢ + Zᵢ − 1) ln(nᵢ + Zᵢ − 1) − (nᵢ + Zᵢ − 1) − nᵢ ln nᵢ + nᵢ − (Zᵢ − 1) ln(Zᵢ − 1) + (Zᵢ − 1)
         = (nᵢ + Zᵢ − 1) ln(nᵢ + Zᵢ − 1) − nᵢ ln nᵢ − (Zᵢ − 1) ln(Zᵢ − 1)   (1.27)
For a nearby number nᵢ + Δnᵢ, we have
Δ ln W(nᵢ) ≡ ln W(nᵢ + Δnᵢ) − ln W(nᵢ) ≈ [∂ ln W(nᵢ)/∂nᵢ] Δnᵢ = Δnᵢ [ln(nᵢ + Zᵢ − 1) − ln nᵢ]   (1.28)

to first order in Δnᵢ. For equilibrium, i.e. the most likely particle distribution, we now have to set:

Δ ln W = Σ_{i=1}^∞ Δnᵢ [ln(nᵢ + Zᵢ − 1) − ln nᵢ] = 0   (1.29)
Since we consider a system in thermodynamic equilibrium, i.e. one for which the number of particles N = Σᵢ nᵢ per unit volume and the energy E = Σᵢ nᵢεᵢ per unit volume are constant, the variations in nᵢ must be such as to conserve N and E, i.e.

ΔN = Σ_{i=1}^∞ Δnᵢ = 0   (1.30)

and

ΔE = Σ_{i=1}^∞ εᵢ Δnᵢ = 0   (1.31)

These restrictions imply that, for arbitrary Lagrange multipliers α and β, we also have:

Δ ln W − α ΔN − β ΔE = Σ_{i=1}^∞ Δnᵢ [ln(nᵢ + Zᵢ − 1) − ln nᵢ − α − βεᵢ] = 0   (1.32)
The sum in Eq. 1.32 is zero for arbitrary variations Δnᵢ, provided that for each i

ln(nᵢ + Zᵢ − 1) − ln nᵢ − α − βεᵢ = 0  ⇒  nᵢ/(Zᵢ − 1) = 1/(e^{α+βεᵢ} − 1)   (1.33)

which is the Bose-Einstein distribution. Since Zᵢ ≫ 1, nᵢ/(Zᵢ − 1) can be replaced by nᵢ/Zᵢ, which represents the average occupation at energy level εᵢ (the occupation number). The values of α and β depend on the total number of particles and the total energy, and can be determined from these by substituting nᵢ in N = Σ_{i=1}^∞ nᵢ and in E = Σ_{i=1}^∞ nᵢεᵢ.

For the Planck function, the number of photons need not be conserved: e.g. an atom can absorb a photon and jump from orbit 1 to orbit 3, and then emit 2 photons by returning via orbit 2. Thus the Lagrange condition Eq. 1.30 does not apply to photons, and we obtain the Planck function by dropping α in Eq. 1.33:

nᵢ/(Zᵢ − 1) = 1/(e^{βεᵢ} − 1) = n_νₖ   (1.34)

Here n_νₖ is the average occupation number of photons at frequency νₖ (with εᵢ = hνₖ). Note that photons do not collide directly with one another, but reach equilibrium only via interaction with atoms.
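To get a feel for Eq. 1.34: anticipating the identification β = 1/(kT) derived just below, the average occupation number of photons of frequency ν in a blackbody field of temperature T is n_ν = 1/(e^{hν/kT} − 1). A minimal sketch; the example frequencies and temperature are arbitrary illustrative choices:

    import numpy as np

    h = 6.626e-34  # Planck constant [J s]
    k = 1.381e-23  # Boltzmann constant [J/K]

    def occupation(nu_hz, t_kelvin):
        """Average photon occupation number, Eq. (1.34) with beta = 1/(kT)."""
        return 1.0 / np.expm1(h * nu_hz / (k * t_kelvin))

    # Radio photons (1.4 GHz) in a 5800 K field: h*nu << kT, so the
    # occupation number is huge and the wave character dominates.
    print(occupation(1.4e9, 5800.0))   # ~ 8.6e4

    # Optical photons (6e14 Hz, ~500 nm) in the same field: h*nu ~ 5 kT,
    # occupation far below unity, so the photon character dominates.
    print(occupation(6.0e14, 5800.0))  # ~ 7e-3

These two regimes correspond to the thermal and quantum limits treated in sections 1.6 and 1.7.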
11 We can make the connection to thermodynamics by taking
S ≡ k ln W  ⇒  ΔS = k Δ ln W   (1.35)
With Eq. 1.32 we get

ΔS = kα ΔN + kβ ΔE   (1.36)

and with T ΔS = −ζ ΔN + ΔE we find that β = 1/(kT); here ζ ≡ −α/β is the thermodynamic potential per particle.
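Spelled out: dividing T ΔS = −ζ ΔN + ΔE by T and matching the coefficients of ΔN and ΔE against Eq. 1.36 gives

kα ΔN + kβ ΔE = −(ζ/T) ΔN + (1/T) ΔE  ⇒  kβ = 1/T ,  kα = −ζ/T

so that β = 1/(kT) and ζ = −α/β, as stated.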
Fluctuations around equilibrium
To investigate the fluctuations we look at the number of ways in which nᵢ + Δnᵢ particles can be distributed, as compared to that for nᵢ particles:
ln W(nᵢ + Δnᵢ) = ln W(nᵢ) + Δnᵢ [∂ ln W(nᵢ)/∂nᵢ] + (Δnᵢ²/2) [∂² ln W(nᵢ)/∂nᵢ²]   (1.37)

where we now include the second-order term. In equilibrium the term in Eq. 1.37 proportional to Δnᵢ is zero. Consequently, we can rewrite Eq. 1.37 as
W(nᵢ + Δnᵢ) = W(nᵢ) e^{−Δnᵢ²/(2W̃(nᵢ))}   where   W̃(nᵢ) ≡ −[∂² ln W(nᵢ)/∂nᵢ²]⁻¹   (1.38)
In other words, the probability of a deviation Δnᵢ drops exponentially with the square of Δnᵢ, i.e. the probability of Δnᵢ is given by a Gaussian! The average value of Δnᵢ² is found by integrating over all values of Δnᵢ: