CHAPTER 2

THEORY OF POLARIZED RADIATION

2.1. Introduction

The framework of classical physics will be used in the following four chapters to develop the theory of polarized radiative transfer in stellar atmospheres permeated by magnetic fields. Scattering and the Hanle effect will also be covered by the classical treatment. This allows us to establish a direct correspondence between the classical and quantum worlds when the quantum field theory is developed in Chapters 7 and 8. The prior development of the classical theory is needed for a good understanding of the physical content of the often abstruse formalism of quantum field theory. In the classical treatment the link between the macroscopic and microscopic properties of the medium is most conveniently described in terms of the macroscopic complex refractive index n, which is determined by the collective effect of the oscillating electric dipoles in the individual atoms. The atomic dipole moment is induced by the electric field of the electromagnetic radiation and is modified by the ambient magnetic field. The presence of the dipoles determines the dispersion relation for the electromagnetic waves, and this dispersion relation can be parametrized in terms of the refractive index n. The polarization of the electromagnetic waves can be treated using several powerful tools, in particular the Jones vector, the coherency matrix, or the Stokes vector. Within these formalisms the effect of the medium is generally described by non-commuting matrix operators, leading to matrix representations of the radiative transfer problem. The primary physical effects, derived from first principles within a classical framework, are most naturally expressed in terms of Jones vectors. The Jones formalism is however unable to describe a statistical ensemble of uncorrelated photons, which is needed to relate the theory to the world of observations.
The statistical properties of the radiation field can be described by the coherency matrix and Stokes vector formalisms, which are equivalent to each other and can both be derived from the Jones vectors. The Stokes formalism is directly related to the effect of measuring devices and is therefore the natural one to use for the interpretation of observations. The coherency matrix, which contains equivalent polarization information, is the most natural representation of the radiation field in quantum field theory, since it relates most directly to the creation and annihilation operators used there. A fourth way of describing polarized radiation is in terms of complex spherical vectors. It is needed when evaluating the interaction of the electromagnetic waves with the oscillating dipoles in a magnetic field, since the coupling between the dipole vector components caused by the v × B force of the magnetic field can be removed by transforming from a Cartesian to a spherical vector system. This formalism is most naturally introduced in the next chapter, where the interaction between radiation and matter is treated.

2.2. Maxwell’s Equations

The standard set of equations for the electromagnetic field can be written (cf. Jackson, 1975)

$$
\nabla \cdot \mathbf{D} = \rho_c , \qquad
\nabla \cdot \mathbf{B} = 0 , \qquad
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} , \qquad
\nabla \times \mathbf{H} = \mathbf{j} + \frac{\partial \mathbf{D}}{\partial t} . \tag{2.1}
$$

The first of these four equations expresses Coulomb's law after Gauss' theorem has been applied (ρc is the charge density), while the second expresses the absence of magnetic monopoles. The third equation represents Faraday's law, the fourth Ampère's law. The displacement current ∂D/∂t is necessary to ensure that the continuity equation for the charge and current densities,

$$
\frac{\partial \rho_c}{\partial t} + \nabla \cdot \mathbf{j} = 0 , \tag{2.2}
$$

is satisfied. In addition we need two equations relating D and B to E and H. We will only make use of the equations valid for vacuum, i.e.,

$$
\mathbf{D} = \epsilon_0 \mathbf{E} , \qquad \mathbf{B} = \mu_0 \mathbf{H} , \tag{2.3}
$$

which contain the universal constants

$$
\epsilon_0 = 8.854 \times 10^{-12}\ \mathrm{farad\ m^{-1}} , \qquad
\mu_0 = 4\pi \times 10^{-7}\ \mathrm{henry\ m^{-1}} ,
$$

representing the permittivity and permeability of the vacuum. Often in electromagnetic theory the macroscopic effects of a medium are described in terms of a dielectric constant ε and a magnetic permeability μ in Eq. (2.3). Such concepts however represent a phenomenological element of the theory, which hides the underlying physics and has led to physically inconsistent treatments in the past. We will therefore entirely refrain from introducing any ε or μ, so that we can work from a rigorous version of Maxwell's theory. All the effects of matter will enter exclusively via the charge density ρc and the current density j.
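As a quick numerical illustration (a minimal sketch in plain Python, not part of the original treatment), the two vacuum constants combine into the speed of light through c = (ε0μ0)^(−1/2), the relation introduced as Eq. (2.10) below. The names EPS0, MU0, and speed_of_light are our own.

```python
import math

EPS0 = 8.854e-12            # vacuum permittivity [farad/m], value quoted in the text
MU0 = 4.0 * math.pi * 1e-7  # vacuum permeability [henry/m]


def speed_of_light(eps0: float = EPS0, mu0: float = MU0) -> float:
    """Return c = (eps0 * mu0)**(-1/2), the combination defined in Eq. (2.10)."""
    return 1.0 / math.sqrt(eps0 * mu0)


c = speed_of_light()
print(f"c = {c:.6e} m/s")
```

With the four-digit value of ε0 used here the result agrees with the defined speed of light to about one part in 10^5; a more precise ε0 closes the gap.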

2.3. The Poynting Flux

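An energy equation containing an expression for the radiative energy flux (the Poynting flux) can be obtained by multiplying Faraday's law in Eq. (2.1) by −H and Ampère's law by E, and adding them. This gives

$$
-\mathbf{H} \cdot (\nabla \times \mathbf{E}) + \mathbf{E} \cdot (\nabla \times \mathbf{H}) = \mathbf{j} \cdot \mathbf{E} + \mathbf{E} \cdot \frac{\partial \mathbf{D}}{\partial t} + \mathbf{H} \cdot \frac{\partial \mathbf{B}}{\partial t} . \tag{2.4}
$$

Since the left-hand side can be written as −∇ · (E × H), we obtain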

$$
\frac{\partial}{\partial t} \left( \frac{\epsilon_0}{2} E^2 + \frac{1}{2\mu_0} B^2 \right) + \nabla \cdot (\mathbf{E} \times \mathbf{H}) = -\mathbf{j} \cdot \mathbf{E} . \tag{2.5}
$$

This represents a continuity equation for the energy. The first term on the left-hand side is the time variation of the energy density of the electromagnetic field, the second term is the divergence of the energy flux E × H, while the right-hand side describes the energy losses due to Joule heating by the electric currents. If we are dealing with complex vectors E and H, e.g. plane waves of the type E = E0 exp(−iωt + ik · r) with a constant complex amplitude E0, then a similar derivation gives

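$$
\frac{\partial}{\partial t} \left( \frac{\epsilon_0}{2} |\mathbf{E}|^2 + \frac{1}{2\mu_0} |\mathbf{B}|^2 \right) + \mathrm{Re}\, \nabla \cdot (\mathbf{E} \times \mathbf{H}^*) = -\mathrm{Re}\, (\mathbf{j} \cdot \mathbf{E}^*) . \tag{2.6}
$$

Since the energy terms are here formed from the amplitudes |E| and |B| of the oscillations, the time-averaged energy terms for a harmonically oscillating field are obtained by multiplying Eq. (2.6) by 1/2 (since ⟨sin²⟩ = 1/2). The average radiative flux is thus ½ Re (E × H*).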
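The factor 1/2 in the time-averaged flux can be checked numerically. The sketch below (plain Python; all function names are hypothetical helpers, not taken from the text) builds a vacuum plane wave travelling along +z, assuming the standard vacuum plane-wave relation H = (ẑ × E)/(μ0c), and compares ½ Re(E × H*) with a brute-force average of the real fields sampled over one period.

```python
import cmath
import math

EPS0 = 8.854e-12            # vacuum permittivity [F/m]
MU0 = 4.0 * math.pi * 1e-7  # vacuum permeability [H/m]
C = 1.0 / math.sqrt(EPS0 * MU0)


def cross(a, b):
    """Cross product of two 3-vectors with real or complex entries."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def vacuum_plane_wave_H(E0):
    """H amplitude of a vacuum plane wave along +z (assumed relation H = z_hat x E / (mu0 c))."""
    zhat = (0.0, 0.0, 1.0)
    return tuple(h / (MU0 * C) for h in cross(zhat, E0))


def mean_flux_from_amplitudes(E0, H0):
    """Time-averaged Poynting vector (1/2) Re(E x H*)."""
    H_conj = tuple(complex(h).conjugate() for h in H0)
    return tuple(0.5 * s.real for s in cross(E0, H_conj))


def mean_flux_sampled(E0, H0, n=64):
    """Average of Re(E e^{-i w t}) x Re(H e^{-i w t}) over n equally spaced phases of one period."""
    total = [0.0, 0.0, 0.0]
    for m in range(n):
        phase = cmath.exp(-2j * math.pi * m / n)  # exp(-i omega t) at t = m T / n
        E = tuple((e * phase).real for e in E0)
        H = tuple((h * phase).real for h in H0)
        for i, s in enumerate(cross(E, H)):
            total[i] += s / n
    return tuple(total)


E0 = (1.0, 0.5j, 0.0)          # hypothetical elliptically polarized amplitude [V/m]
H0 = vacuum_plane_wave_H(E0)
print(mean_flux_from_amplitudes(E0, H0))
print(mean_flux_sampled(E0, H0))
```

Both routes give the same flux along z, equal to ½(|E0x|² + |E0y|²)/(μ0c); sampling at three or more equally spaced phases is already enough, because the oscillating cross terms at 2ω average to zero.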

2.4. The Electromagnetic Wave Equation

For the treatment of radiation problems it is convenient to reformulate Maxwell's equations in terms of a vector potential A and a scalar potential Φ, defined by

$$
\mathbf{B} = \nabla \times \mathbf{A} , \qquad \mathbf{E} = -\nabla \Phi - \frac{\partial \mathbf{A}}{\partial t} . \tag{2.7}
$$

These definitions ensure that B is divergence free and that E satisfies Faraday's law. The remaining two Maxwell equations, relating D to the charge density ρc and H to the current density j, can then be written (since the spatial and temporal derivatives commute)

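$$
\nabla^2 \Phi + \frac{\partial}{\partial t} (\nabla \cdot \mathbf{A}) = -\rho_c / \epsilon_0 , \qquad
\nabla \times (\nabla \times \mathbf{A}) = \mu_0 \mathbf{j} - \epsilon_0 \mu_0 \frac{\partial}{\partial t} \left( \nabla \Phi + \frac{\partial \mathbf{A}}{\partial t} \right) . \tag{2.8}
$$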

Making use of the vector identity

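$$
\nabla \times (\nabla \times \mathbf{A}) = \nabla (\nabla \cdot \mathbf{A}) - \nabla^2 \mathbf{A} , \tag{2.9}
$$

and defining

$$
c = (\epsilon_0 \mu_0)^{-1/2} , \tag{2.10}
$$

the second of Eqs. (2.8) can be written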

$$
\nabla^2 \mathbf{A} - \nabla (\nabla \cdot \mathbf{A}) - \frac{1}{c^2} \frac{\partial}{\partial t} \left( \nabla \Phi + \frac{\partial \mathbf{A}}{\partial t} \right) = -\mu_0 \mathbf{j} . \tag{2.11}
$$

Here c represents the velocity of electromagnetic waves in vacuum (the speed of light). The potentials are not uniquely determined by the above equations: one has the freedom to choose a gauge, the effect of which vanishes when the differential operators are applied to derive the values of the physical fields E and B. For radiation problems the natural gauge to choose is the so-called Coulomb or radiation gauge, defined by

$$
\nabla \cdot \mathbf{A} = 0 . \tag{2.12}
$$

With this gauge condition Eqs. (2.8) and (2.11) become

$$
\nabla^2 \Phi = -\rho_c / \epsilon_0 , \qquad
\nabla^2 \mathbf{A} - \frac{1}{c^2} \frac{\partial^2 \mathbf{A}}{\partial t^2} = -\mu_0 \mathbf{j} + \frac{1}{c^2} \frac{\partial}{\partial t} \nabla \Phi . \tag{2.13}
$$

The first of these two equations is a Poisson equation for the scalar potential Φ, implying an instantaneous response of Φ to fluctuations in the charge density, without any wave propagation. This apparently non-local behaviour of Φ is allowed, since Φ is not an observable physical quantity, and the corresponding physical field E always behaves locally (disturbances propagate with the speed of light). The second equation, on the other hand, is a wave equation for the vector potential A, with the sources on the right-hand side. These sources also contain a Φ term, which however does not contribute to the A waves, for the following reason. Let us consider the solution to the Poisson equation,

$$
\Phi(\mathbf{r}) = \frac{1}{4\pi\epsilon_0} \int \frac{\rho_c(\mathbf{r}')\, dV'}{|\mathbf{r} - \mathbf{r}'|} . \tag{2.14}
$$

At distances large compared with the size of the charge region, Φ(r) becomes proportional to 1/r, with asymptotically vanishing deviations from spherical symmetry. Time variations in the charge density may therefore, via the ∂∇Φ/∂t term, contribute to A components parallel to ∇Φ, i.e., parallel to r. These varying components can however never lead to A waves propagating from or to the charge region, since the waves can only be transverse and thus perpendicular to r. The transverse nature of the waves is a direct consequence of the gauge condition (2.12). If we introduce plane waves

$$
\mathbf{A}(t, \mathbf{r}) = \mathbf{A}_0(\omega)\, e^{-i\omega t + i\mathbf{k} \cdot \mathbf{r}} , \tag{2.15}
$$

the gauge condition implies

$$
\mathbf{k} \cdot \mathbf{A} = 0 , \tag{2.16}
$$

i.e., the waves must be transverse. The scalar potential Φ thus decouples from the wave propagation problem, and the electromagnetic waves are described by the single equation

$$
\nabla^2 \mathbf{A} - \frac{1}{c^2} \frac{\partial^2 \mathbf{A}}{\partial t^2} = -\mu_0 \mathbf{j}_t . \tag{2.17}
$$

Only the transverse component of j (marked by index t), i.e., the component of the current density perpendicular to the propagation direction given by the wave vector k, can serve as a source for the waves. The electric field E of the waves is obtained via Eq. (2.7) as

$$
\mathbf{E} = -\frac{\partial \mathbf{A}}{\partial t} , \tag{2.18}
$$

since Φ does not contribute to the wave. It also follows from Eq. (2.7) that B is perpendicular to both E and k. The oscillations of E and B are in phase with each other and lie in a plane perpendicular to the propagation direction.
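The transversality statements can be illustrated with a small sketch in plain Python (the helper names and the numerical values of k, j, and A0 are arbitrary illustrations, not from the text). It projects a current density onto its transverse part j_t, enforces k · A0 = 0, forms the amplitudes of E = −∂A/∂t and B = ∇ × A for the plane wave (2.15), and checks that the real fields at several instants are perpendicular to each other and to k.

```python
import cmath


def dot(a, b):
    """Bilinear dot product without complex conjugation (as in k . A = 0)."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3-vectors with real or complex entries."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def transverse_part(v, k):
    """Component of v perpendicular to the real wave vector k: v - k (k . v) / k^2."""
    proj = dot(k, v) / dot(k, k)
    return tuple(vi - proj * ki for vi, ki in zip(v, k))


def plane_wave_fields(A0, k, omega):
    """Amplitudes of E = -dA/dt and B = curl A for A = A0 exp(-i omega t + i k.r)."""
    E0 = tuple(1j * omega * a for a in A0)     # -d/dt of exp(-i omega t) gives +i omega
    B0 = tuple(1j * v for v in cross(k, A0))   # curl of exp(i k.r) gives i k x
    return E0, B0


def real_field(amplitude, phase_angle):
    """Physical field Re(amplitude * exp(-i * phase_angle)) at one instant."""
    factor = cmath.exp(-1j * phase_angle)
    return tuple((a * factor).real for a in amplitude)


k = (1.0, 2.0, 2.0)                        # hypothetical wave vector
j = (1.0 + 1.0j, 0.5, -2.0)                # hypothetical current density amplitude
j_t = transverse_part(j, k)                # only this part is a source in Eq. (2.17)
A0 = transverse_part((1.0, 0.3j, 0.0), k)  # enforce the gauge condition k . A0 = 0
E0, B0 = plane_wave_fields(A0, k, omega=3.0)

for theta in (0.0, 0.7, 2.1):
    E, B = real_field(E0, theta), real_field(B0, theta)
    print(f"phase {theta}: E.B = {dot(E, B):.1e}, k.E = {dot(k, E):.1e}, k.B = {dot(k, B):.1e}")
```

Because B0 = k × E0/ω with a real k, the real fields satisfy B = k × E/ω at every instant, which is why they stay perpendicular and oscillate in phase.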

2.5. Dipole Moment and Refractive Index

Plane wave solutions of the type (2.15) to the wave equation (2.17) imply

$$
\left( -k^2 + \frac{\omega^2}{c^2} \right) \mathbf{A}_0\, e^{-i\omega t + i\mathbf{k} \cdot \mathbf{r}} = -\mu_0 \mathbf{j}_t . \tag{2.19}
$$

If the currents j are carried by moving electrons, each electron i contributes to the current density by the amount

$$
\mathbf{j}_i = -e\, \dot{\mathbf{r}}_i . \tag{2.20}
$$

For the radiation problems that we are interested in here we let the motion of the electrons represent oscillations around an average position r0i (that of the atomic nucleus). The position vector of electron no. i can then be written

$$
\mathbf{r}_i = \mathbf{r}_{0i} + \mathbf{r}'_i . \tag{2.21}
$$

As we may disregard the motion ṙ0i of the average position (because of its much larger mass, the atomic nucleus moves much more slowly than the electrons), we have

$$
\dot{\mathbf{r}}'_i = \dot{\mathbf{r}}_i . \tag{2.22}
$$

If we denote the electric dipole moment per unit volume by d, its time derivative is, according to Eq. (2.22),

$$
\dot{\mathbf{d}} = -\sum_i e\, \dot{\mathbf{r}}_i , \tag{2.23}
$$

i.e., it equals the total current density j. Let us now define a refractive index n as embodying the dispersion relation between the wave number k and the angular frequency ω through

$$
k \equiv n\omega / c . \tag{2.24}
$$

Then we obtain from Eqs. (2.10), (2.19), (2.20), and (2.23)

$$
\dot{\mathbf{d}}_t = \epsilon_0\, \omega^2 (n^2 - 1)\, \mathbf{A}_0\, e^{-i\omega t + i\mathbf{k} \cdot \mathbf{r}} , \tag{2.25}
$$

where index t denotes the transverse component. This constitutes a relation between the wave amplitude A0 and the amplitude of the electric dipole moment per volume element of the medium, expressed in terms of the complex refractive index n. The dipole moment density can be determined from the radiation-matter interaction term in the Hamiltonian of the system (see next chapter). In this way a second relation between d and A0 is established, which, when combined with Eq. (2.25), allows a solution for the refractive index n to be found.

The physical interpretation of the complex refractive index is straightforward. From Eqs. (2.15) and (2.24) it is clear that the imaginary part corresponds to damping or absorption of the wave, while the real part corresponds to an oscillating phase factor that determines the phase velocity of the wave and thus the ordinary refraction effects. The dispersion effects of the medium become large near the angular frequencies of the atomic resonances.
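Writing the spatial factor of Eq. (2.15) along the propagation direction z as exp(inωz/c) makes this interpretation explicit: Re(n) fixes the phase velocity c/Re(n), while Im(n) > 0 attenuates the intensity as exp(−2 Im(n) ωz/c). The following plain-Python sketch evaluates both for an arbitrarily chosen, hypothetical n; it only illustrates Eqs. (2.15) and (2.24) and is not a model of any particular medium.

```python
import cmath
import math

C = 299792458.0  # speed of light [m/s]


def spatial_factor(n, omega, z):
    """exp(i k z) with k = n omega / c (Eq. 2.24), for propagation along z."""
    return cmath.exp(1j * n * omega * z / C)


def phase_velocity(n):
    """Phase velocity c / Re(n), set by the real part of the refractive index."""
    return C / n.real


def intensity_attenuation_coefficient(n, omega):
    """kappa such that |exp(i k z)|^2 = exp(-kappa z), i.e. kappa = 2 omega Im(n) / c."""
    return 2.0 * omega * n.imag / C


n = 1.0001 + 1.0e-6j             # hypothetical complex refractive index
omega = 2.0 * math.pi * 5.0e14   # optical angular frequency [rad/s]
z = 0.05                         # path length [m]

kappa = intensity_attenuation_coefficient(n, omega)
print(f"phase velocity        = {phase_velocity(n):.6e} m/s")
print(f"|exp(ikz)|^2 at z     = {abs(spatial_factor(n, omega, z)) ** 2:.4f}")
print(f"exp(-kappa z)         = {math.exp(-kappa * z):.4f}")
```

For a purely real n the modulus of the spatial factor stays at unity, so only the phase advances.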

2.6. Representations of Polarized Light

As we have seen above, light represents transverse vibrations of the electric and magnetic vectors, both of which are derivable from the transverse vibrations of the vector potential A. As E and B have a fixed relation to each other (they are always in phase and perpendicular to each other), it is sufficient to describe light in terms of one of them, and normally the choice is E. The best way to picture the state of polarization is to describe the oscillations in a fixed plane as seen from the direction of the observer. It is clear what we mean by linear polarization, but the definition of the sign of circular polarization requires some care. The convention that is generally used, and which defines the sign unambiguously, is the following: when the electric vector rotates clockwise in a fixed plane as seen by the observer, we speak of right-handed circular polarization; when the sense of rotation is counterclockwise, we have left-handed circular polarization. We will stick to this terminology. Other definitions, in terms of the handedness of the spiral pattern traced out in space by the electric vectors, tend to lead to considerable confusion.

A complete description of polarized light needs four parameters. Various powerful tools that allow us to deal efficiently with all aspects of polarization will be introduced in the next subsections. For further details we refer to the monographs by Shurcliff (1962), Robson (1974), and Collett (1993). In all these formalisms the physical content can be visualized by considering the effect of inserting four different, idealized filters, Fk, k = 0, 1, 2, 3, in the light beam. Their properties are illustrated in Fig. 2.1. Filter F0 simply represents empty space, without any absorbing effects, while F1 and F2 transmit linear polarization with the electric vector at position angles 0° and 45°, respectively, and F3 transmits right-handed circular polarization.
Since F1, F2, and F3 block the orthogonal polarization state, the intensity of an unpolarized beam is reduced by a factor of two.

[Fig. 2.1: four filter symbols, F0 (unpolarized), F1 (0°), F2 (45°), F3 (right-handed circular polarization).]

Fig. 2.1. Symbolic properties of the four idealized filters described in the text.

A detector records the light intensity Ik transmitted by each filter. These four intensity readings uniquely determine the full state of polarization of the incident beam.

2.6.1. JONES FORMALISM

Let us introduce a set of orthogonal basis vectors e1 and e2 in a plane perpendicular to the propagation direction. At any fixed point in space the electric vector E can then be decomposed as

$$
\mathbf{E} = \mathrm{Re}\, (E_1 \mathbf{e}_1 + E_2 \mathbf{e}_2) , \tag{2.26}
$$

where

$$
E_k = E_{0k}\, e^{-i\omega t} , \qquad k = 1, 2, \tag{2.27}
$$

and E0k are complex amplitudes. Since both amplitudes and phases are involved, the two complex numbers represent four parameters characterizing the light. The oscillating phase factor exp(−iωt) can be ignored for polarization problems, since it is the same for both components of the electric vector, and it disappears when forming observable quantities (which involve products with complex conjugate components, cf. the following subsections). The Jones vector J is simply defined as

$$
\mathbf{J} = \begin{pmatrix} E_1 \\ E_2 \end{pmatrix} . \tag{2.28}
$$

The interaction with a medium can be described by a matrix w operating on J:

$$
\mathbf{J}' = w \mathbf{J} . \tag{2.29}
$$

The Jones matrices for the four filters of Fig. 2.1 are (cf. Shurcliff, 1966)

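$$
w_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \quad
w_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} , \quad
w_2 = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} , \quad
w_3 = \frac{1}{2} \begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} . \tag{2.30}
$$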

To derive w1 and w2 for the linearly polarizing filters, one needs to use a linear polarization basis e1 and e2, while for the derivation of w3 a circular polarization basis is the appropriate one. (Note that filter F3 does not represent a λ/4 plate + linear polarizer, as is normally used in practical instruments to detect circular polarization, since the light that emerges would then be linearly instead of circularly polarized. Another λ/4 plate has to be added to make the transmitted light have the same right-handed circular polarization as the incident beam, in order to represent the function of F3.) It is useful to express a 2 × 2 matrix in terms of its components with respect to an orthogonal basis (analogous to the decomposition of a vector into its components). Such an orthogonal basis is provided by the Pauli matrices

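$$
\sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \quad
\sigma_1 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} , \quad
\sigma_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \quad
\sigma_3 = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} . \tag{2.31}
$$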

The Jones matrices of the four filters can now be expressed in the conveniently compact form

$$
w_k = \tfrac{1}{2} (\sigma_0 + \sigma_k) , \qquad k = 0, 1, 2, 3. \tag{2.32}
$$

The coherency matrix and Stokes formalisms will allow us to obtain a deeper physical understanding of these various 2 × 2 matrices.

The Jones vectors provide the most direct representation of the polarization of electromagnetic waves. They therefore emerge naturally from the basic theory of the interaction between matter and radiation. However, since each Jones vector always represents 100 % polarization, the Jones formalism is unable to describe partially polarized light. Partial polarization arises from the incoherent superposition of mutually uncorrelated photons with different polarization states in a statistical ensemble of photons. The photons are mutually uncorrelated since the atomic processes by which they are created are stochastically independent of each other. To treat statistical ensembles we need to use the coherency matrix of the radiation field, which however can be derived directly from the Jones vectors.
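The compact form (2.32) can be verified directly against the explicit matrices (2.30). In the plain-Python sketch below (nested lists for 2 × 2 matrices; the helper names are our own), filter_matrix(k) builds w_k from the Pauli matrices of Eq. (2.31). Since σk² = σ0, each w_k also satisfies w_k w_k = w_k, i.e. the ideal filters act as projectors. The last lines apply w3 to two circular Jones vectors: one passes unchanged (by the definition of F3 in the text, this is right-handed circular polarization in the present convention), and the orthogonal one is blocked.

```python
# Pauli matrices in the sign convention of Eq. (2.31) (note the sign of sigma_3).
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, 1j], [-1j, 0]],
]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def apply(w, J):
    """Jones-matrix action J' = w J, Eq. (2.29)."""
    return [w[0][0] * J[0] + w[0][1] * J[1], w[1][0] * J[0] + w[1][1] * J[1]]


def filter_matrix(k):
    """Jones matrix w_k = (sigma_0 + sigma_k) / 2 of the idealized filter F_k, Eq. (2.32)."""
    return [[(SIGMA[0][i][j] + SIGMA[k][i][j]) / 2 for j in range(2)] for i in range(2)]


W = [filter_matrix(k) for k in range(4)]
for k, w in enumerate(W):
    print(f"w_{k} =", w)

# Circular Jones vectors: F3 passes the first unchanged and blocks the second.
s = 2 ** -0.5
print(apply(W[3], [s, -1j * s]))
print(apply(W[3], [s, 1j * s]))
```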

2.6.2. COHERENCY MATRIX FORMALISM

The 2 × 2 coherency matrix D of the radiation field is directly obtained from the Jones vector by

$$
D = \mathbf{J} \mathbf{J}^\dagger , \tag{2.33}
$$

where J† denotes the adjoint of J (transposition and complex conjugation of J). The averaging of D over a statistical ensemble of photons represents the case of incoherent superposition of the photon states, since the phase factors exp(−iωt) of the different photons disappear when forming the bilinear products of the Jones vector in Eq. (2.33). If we instead average the Jones vectors of the individual photons before forming the product (2.33), this corresponds to a coherent superposition of the states. Since the different photons in a stellar atmosphere arise from stochastically independent atomic processes, the incoherent superposition is the appropriate one to use.

The intensity I of a light beam is proportional to |E0|², the square of the amplitude of the electric vector. As the constant of proportionality is unimportant for the description of the polarization state, it is convenient in the context of polarization theory to choose it to be unity. Thus we define

$$
I = |E_{01}|^2 + |E_{02}|^2 , \tag{2.34}
$$

which implies

$$
I = \mathrm{Tr}\, D , \tag{2.35}
$$

where Tr means the trace (sum over the diagonal matrix elements). If the Jones matrix of the medium is w, then the coherency matrix transforms according to

$$
D' = w D w^\dagger , \tag{2.36}
$$

which is readily seen from Eqs. (2.29) and (2.33). The 2 × 2 coherency matrix can also be represented in the form of a 4-dimensional vector Dv, defined as

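$$
D_v = \begin{pmatrix} D_{11} \\ D_{12} \\ D_{21} \\ D_{22} \end{pmatrix} , \tag{2.37}
$$

where Dij are the components of D. This vector is transformed by a medium according to

$$
D_v' = W D_v , \tag{2.38}
$$

where

$$
W = w \otimes w^* =
\begin{pmatrix}
w_{11} w_{11}^* & w_{11} w_{12}^* & w_{12} w_{11}^* & w_{12} w_{12}^* \\
w_{11} w_{21}^* & w_{11} w_{22}^* & w_{12} w_{21}^* & w_{12} w_{22}^* \\
w_{21} w_{11}^* & w_{21} w_{12}^* & w_{22} w_{11}^* & w_{22} w_{12}^* \\
w_{21} w_{21}^* & w_{21} w_{22}^* & w_{22} w_{21}^* & w_{22} w_{22}^*
\end{pmatrix} \tag{2.39}
$$

(cf. Robson, 1974). The symbols ⊗ and ∗ denote the tensor product and complex conjugation, respectively.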
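The distinction between incoherent and coherent superposition, and the equivalence of Eqs. (2.36) and (2.38), can be illustrated with a short plain-Python sketch. The Jones vectors and the Jones matrix w below are arbitrary, hypothetical values; kron_conj implements w ⊗ w* with the same (11, 12, 21, 22) ordering of rows and columns as Dv in Eq. (2.37).

```python
import cmath


def outer(J):
    """Coherency matrix D = J J^dagger of one Jones vector, Eq. (2.33)."""
    return [[J[i] * J[j].conjugate() for j in range(2)] for i in range(2)]


def add(a, b):
    """Elementwise sum of two 2x2 matrices (incoherent superposition of D's)."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def dagger(w):
    """Adjoint (conjugate transpose) of a 2x2 matrix."""
    return [[w[j][i].conjugate() for j in range(2)] for i in range(2)]


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def trace(D):
    return D[0][0] + D[1][1]


def vec(D):
    """Vector form D_v = (D11, D12, D21, D22) of Eq. (2.37)."""
    return [D[0][0], D[0][1], D[1][0], D[1][1]]


def kron_conj(w):
    """W = w (x) w*, Eq. (2.39): row (i, j), column (k, l) holds w_ik * conj(w_jl)."""
    return [[w[i][k] * w[j][l].conjugate() for k in range(2) for l in range(2)]
            for i in range(2) for j in range(2)]


Jx, Jy = [1 + 0j, 0j], [0j, 1 + 0j]    # two photons, orthogonal linear polarizations

D_incoherent = add(outer(Jx), outer(Jy))             # add the D's: unpolarized
D_coherent = outer([Jx[0] + Jy[0], Jx[1] + Jy[1]])   # add the J's first: the state F2 passes
print(trace(D_incoherent), D_incoherent)
print(trace(D_coherent), D_coherent)

# A common phase factor such as exp(-i omega t) drops out of D.
phase = cmath.exp(-1.3j)
D_rephased = outer([phase * a for a in Jx])

# D' = w D w^dagger (Eq. 2.36) agrees with D_v' = W D_v (Eq. 2.38).
w = [[0.9 + 0.1j, 0.2j], [-0.1 + 0j, 0.7 - 0.3j]]   # arbitrary hypothetical Jones matrix
D_prime = matmul(matmul(w, D_incoherent), dagger(w))
W = kron_conj(w)
Dv = vec(D_incoherent)
D_prime_v = [sum(W[r][c] * Dv[c] for c in range(4)) for r in range(4)]
print(vec(D_prime))
print(D_prime_v)
```

Both superpositions carry the same intensity Tr D = 2, but only the incoherent one is unpolarized (D proportional to σ0); the coherent one has non-zero off-diagonal elements.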

Using Eqs. (2.35) and (2.36), we can now write down the intensities Ik (k = 0, 1, 2, 3) measured behind the four filters Fk of Fig. 2.1, which were represented by the Jones matrices wk. They are

$$
I_k = \mathrm{Tr}\, (w_k D w_k^\dagger) . \tag{2.40}
$$

Inserting the formula (2.32) for wk, we get an expression that subsequently can be reduced to the simple form

$$
I_k = \tfrac{1}{2} \left[ I + \mathrm{Tr}\, (\sigma_k D) \right] , \qquad k = 0, 1, 2, 3, \tag{2.41}
$$

by noting that σk† = σk, D† = D (as a consequence of the definition (2.33)), σ0 D = D, σk² = σ0, and Tr (σk D) = Tr (D σk).
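Written out step by step (an expansion added here for clarity, using only the identities just listed together with the cyclic property of the trace), the reduction reads

$$
\begin{aligned}
I_k &= \mathrm{Tr}\,(w_k D w_k^\dagger) = \tfrac{1}{4}\, \mathrm{Tr}\,[(\sigma_0 + \sigma_k)\, D\, (\sigma_0 + \sigma_k)] \\
&= \tfrac{1}{4}\, [\mathrm{Tr}\, D + \mathrm{Tr}\,(\sigma_k D) + \mathrm{Tr}\,(D \sigma_k) + \mathrm{Tr}\,(\sigma_k D \sigma_k)] \\
&= \tfrac{1}{4}\, [\mathrm{Tr}\, D + 2\, \mathrm{Tr}\,(\sigma_k D) + \mathrm{Tr}\,(\sigma_k^2 D)] \\
&= \tfrac{1}{4}\, [2 I + 2\, \mathrm{Tr}\,(\sigma_k D)] = \tfrac{1}{2}\, [I + \mathrm{Tr}\,(\sigma_k D)] ,
\end{aligned}
$$

where w_k† = w_k follows from σk† = σk, and the last step uses σk² = σ0 and Tr D = I. For k = 0 the result is I0 = I, as expected for empty space.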

2.6.3. STOKES FORMALISM

The Stokes parameters Sk can be defined operationally in terms of the intensity measurements Ik with the four filters of Fig. 2.1 as

$$
S_k = 2 I_k - I_0 . \tag{2.42}
$$

S0 thus represents the ordinary intensity, S1 and S2 the amount of linear polarization along position angles 0° and 45°, and S3 the amount of right-handed circular polarization. Using expression (2.41) in Eq. (2.42), we obtain the relation between the Stokes parameters and the coherency matrix:

$$
S_k = \mathrm{Tr}\, (\sigma_k D) . \tag{2.43}
$$
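The operational definition (2.42) and the trace formula (2.43) can be compared numerically. The plain-Python sketch below (the beam composition is a hypothetical example) forms a partially polarized beam as the incoherent sum of a fully polarized component, in the Jones state that F3 transmits unchanged, and an unpolarized component. It then simulates the four filter readings with Eq. (2.40) and recovers the same Stokes vector both ways.

```python
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, 1j], [-1j, 0]],
]


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def dagger(w):
    return [[complex(w[j][i]).conjugate() for j in range(2)] for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def filter_matrix(k):
    """w_k = (sigma_0 + sigma_k) / 2, Eq. (2.32)."""
    return [[(SIGMA[0][i][j] + SIGMA[k][i][j]) / 2 for j in range(2)] for i in range(2)]


def filter_readings(D):
    """Intensities I_k = Tr(w_k D w_k^dagger) behind filters F_0..F_3, Eq. (2.40)."""
    return [trace(matmul(matmul(filter_matrix(k), D), dagger(filter_matrix(k)))).real
            for k in range(4)]


def stokes_from_readings(I):
    """Operational definition S_k = 2 I_k - I_0, Eq. (2.42)."""
    return [2 * I[k] - I[0] for k in range(4)]


def stokes_from_coherency(D):
    """S_k = Tr(sigma_k D), Eq. (2.43)."""
    return [trace(matmul(SIGMA[k], D)).real for k in range(4)]


# Hypothetical beam: 30% of the intensity in the state F3 passes, 70% unpolarized.
s = 2 ** -0.5
J_rc = [s, -1j * s]
D_pol = [[0.3 * J_rc[i] * J_rc[j].conjugate() for j in range(2)] for i in range(2)]
D = [[D_pol[i][j] + (0.35 if i == j else 0.0) for j in range(2)] for i in range(2)]

I = filter_readings(D)
print(I)
print(stokes_from_readings(I))
print(stokes_from_coherency(D))
```

For this beam both routes give (I, Q, U, V) = (1, 0, 0, 0.3): only the circularly polarized fraction contributes to S3, and F1 and F2 each transmit exactly half of the intensity.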

The Stokes parameters, which are often denoted I, Q, U, and V instead of Sk, form a 4-vector

$$
\mathbf{S} = \begin{pmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{pmatrix} \equiv \begin{pmatrix} I \\ Q \\ U \\ V \end{pmatrix} , \tag{2.44}
$$

which in a medium transforms according to

$$
\mathbf{S}' = M \mathbf{S} . \tag{2.45}
$$

M is the 4 × 4 Mueller matrix. The formalism for calculating the effect of a medium or of an optical train on the Stokes vector is called Mueller calculus. The Stokes and coherency matrix formalisms are equivalent to each other and give a complete description of the state of polarization of a beam of electromagnetic radiation. The inverse of Eq. (2.43) can be written

$$
D = \tfrac{1}{2} \sum_k S_k \sigma_k , \tag{2.46}
$$

i.e., the coherency matrix can be expanded in terms of the basis matrices σk, with the Stokes parameters divided by two as the expansion coefficients. Explicitly, in terms of a single matrix, Eq. (2.46) becomes

$$
D = \frac{1}{2} \begin{pmatrix} I + Q & U + iV \\ U - iV & I - Q \end{pmatrix} . \tag{2.47}
$$
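A short round trip confirms that Eqs. (2.43), (2.46), and (2.47) are mutually consistent. In this plain-Python sketch, with an arbitrary hypothetical Stokes vector, the explicit matrix (2.47) and the Pauli expansion (2.46) give the same Hermitian D, and taking traces with σk returns the original Stokes parameters.

```python
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, 1j], [-1j, 0]],
]


def coherency_from_stokes(S):
    """Explicit matrix form of Eq. (2.47)."""
    I, Q, U, V = S
    return [[(I + Q) / 2, (U + 1j * V) / 2],
            [(U - 1j * V) / 2, (I - Q) / 2]]


def coherency_from_expansion(S):
    """Pauli expansion D = (1/2) sum_k S_k sigma_k, Eq. (2.46)."""
    return [[sum(S[k] * SIGMA[k][i][j] for k in range(4)) / 2 for j in range(2)]
            for i in range(2)]


def stokes_from_coherency(D):
    """S_k = Tr(sigma_k D) = sum_{i,m} sigma_k[i][m] D[m][i], Eq. (2.43)."""
    return [sum(SIGMA[k][i][m] * D[m][i] for i in range(2) for m in range(2)).real
            for k in range(4)]


S = [1.0, 0.2, -0.1, 0.3]      # hypothetical (I, Q, U, V)
D = coherency_from_stokes(S)
print(D)
print(stokes_from_coherency(D))
```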

The transformation matrix W for the vector version Dv of the coherency matrix, as defined by Eqs. (2.37)–(2.39), can be transformed into a Mueller matrix through

$$
M = T W T^{-1} , \tag{2.48}
$$

where

$$
T = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 \\ 0 & -i & i & 0 \end{pmatrix} , \qquad
T^{-1} = \frac{1}{2} \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & i \\ 0 & 0 & 1 & -i \\ 1 & -1 & 0 & 0 \end{pmatrix} . \tag{2.49}
$$
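As a check on Eqs. (2.48) and (2.49), the following plain-Python sketch (the helper names are our own) builds M = T W T^(−1) for the four filter Jones matrices of Eq. (2.32). The results reproduce what Eq. (2.41) implies: behind filter Fk the transmitted intensity is ½(S0 + Sk), carried entirely in the polarization state that the filter passes. For instance, F1 gives a Mueller matrix whose first two rows are both (½, ½, 0, 0) and whose last two rows vanish. The sketch also checks that T T^(−1) is the unit matrix and that M comes out real for an arbitrary, hypothetical Jones matrix.

```python
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, 1j], [-1j, 0]],
]

# Transformation matrices of Eq. (2.49).
T = [[1, 0, 0, 1],
     [1, 0, 0, -1],
     [0, 1, 1, 0],
     [0, -1j, 1j, 0]]
T_INV = [[0.5 * x for x in row] for row in ([1, 1, 0, 0],
                                             [0, 0, 1, 1j],
                                             [0, 0, 1, -1j],
                                             [1, -1, 0, 0])]


def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(4)) for j in range(4)] for i in range(4)]


def kron_conj(w):
    """W = w (x) w*, Eq. (2.39), with rows/columns ordered (11, 12, 21, 22)."""
    return [[w[i][k] * complex(w[j][l]).conjugate() for k in range(2) for l in range(2)]
            for i in range(2) for j in range(2)]


def filter_matrix(k):
    """Jones matrix w_k = (sigma_0 + sigma_k) / 2, Eq. (2.32)."""
    return [[(SIGMA[0][i][j] + SIGMA[k][i][j]) / 2 for j in range(2)] for i in range(2)]


def mueller(w):
    """Mueller matrix M = T W T^{-1}, Eq. (2.48)."""
    return matmul4(matmul4(T, kron_conj(w)), T_INV)


for k in range(4):
    M = mueller(filter_matrix(k))
    print(f"M_{k}:")
    for row in M:
        print("   ", [round(x.real, 6) for x in row])
```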