
TECTO-125540; No of Pages 36 Tectonophysics xxx (2012) xxx–xxx

Contents lists available at SciVerse ScienceDirect

Tectonophysics

journal homepage: www.elsevier.com/locate/tecto

Review Article

Seismic imaging: From classical to adjoint tomography

Q. Liu a,⁎, Y.J. Gu b

a Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A7
b Department of Physics, University of Alberta, Edmonton, Alberta, Canada T6G 2E1

Article info

Article history: Received 6 December 2011; received in revised form 3 July 2012; accepted 11 July 2012; available online xxxx

Keywords: Seismic tomography; Sensitivity kernels; Adjoint methods; Adjoint tomography; Full waveform inversion

Abstract

Seismic tomography has been a vital tool in probing the Earth's internal structure and enhancing our knowledge of dynamical processes in the Earth's crust and mantle. While various tomographic techniques differ in data types utilized (e.g., body vs. surface waves), data sensitivity (ray vs. finite-frequency approximations), and choices of model parameterization and regularization, most global mantle tomographic models agree well at long wavelengths, owing to the presence and typical dimensions of cold subducted oceanic lithosphere and hot, ascending mantle plumes (e.g., in central Pacific and Africa). Structures at relatively small length scales remain controversial, though, as will be discussed in this paper, they are becoming increasingly resolvable with the fast expanding global and regional seismic networks and improved forward modeling and inversion techniques. This review paper aims to provide an overview of classical tomography methods, key debates pertaining to the resolution of mantle tomographic models, as well as to highlight recent theoretical and computational advances in forward-modeling methods that spearheaded the developments in accurate computation of sensitivity kernels and adjoint tomography. The first part of the paper is devoted to traditional traveltime and waveform tomography. While these approaches established a firm foundation for global and regional seismic tomography, data coverage and the use of approximate sensitivity kernels remained as key limiting factors in the resolution of the targeted structures. In comparison to classical tomography, adjoint tomography takes advantage of full 3D numerical simulations in forward modeling and, in many ways, revolutionizes the seismic imaging of heterogeneous structures with strong velocity contrasts. For this reason, this review provides details of the implementation, resolution and potential challenges of adjoint tomography. Further discussions of techniques that are presently popular in seismic array analysis, such as noise correlation functions, receiver functions, inverse scattering imaging, and the adaptation of adjoint tomography to these different datasets highlight the promising future of seismic tomography.

Crown Copyright © 2012 Published by Elsevier B.V. All rights reserved.

Contents

1. Introduction
2. The tomographic problem and classical solutions
2.1. Inverse problem
2.2. Traveltime inversions
2.3. Linearization and parameterization
2.4. Waveform inversions
2.4.1. Path average approximation (PAVA)
2.4.2. PAVA in classical traveltime tomography
2.4.3. Improved waveform approaches
2.4.4. Partitioned waveform inversion and multi-stage inversion schemes
2.5. Inversions for anisotropic structure
2.6. Inversions with other geophysical or geochemical constraints
2.7. Global optimization methods and probabilistic tomography
3. Resolution of tomographic models
3.1. Data constraints and general model resolution assessment

⁎ Corresponding author. Tel.: +1 416 978 5434. E-mail address: [email protected] (Q. Liu).

0040-1951/$ – see front matter. Crown Copyright © 2012 Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.tecto.2012.07.006

Please cite this article as: Liu, Q., Gu, Y.J., Seismic imaging: From classical to adjoint tomography, Tectonophysics (2012), http://dx.doi.org/10.1016/j.tecto.2012.07.006

3.2. Ray vs finite-frequency kernel debate
4. Adjoint tomography: theory and implementations
4.1. Full waveform inversions in exploration
4.2. Applications of FWI to seismic imaging in the crust and mantle
4.3. Forward solvers for 3D velocity models
4.4. Theory and implementations
4.4.1. Definition of misfit function
4.4.2. Gradient of misfit function
4.4.3. Hessian matrix
4.4.4. Time-domain implementation
4.4.5. Comparisons with classical tomographic methods
4.4.6. Sensitivity kernels for 1D reference velocity models
4.4.7. Scattering-integral methods
5. Adjoint tomography: inversion schemes
5.1. Misfit functions and measurements
5.2. 3D initial models and parameterization
5.3. Regularization
5.4. Step length and preconditioner
5.5. Model verification, appraisal and resolution test
5.6. Source inversions
6. Adjoint tomography: discussions and future directions
6.1. Noise correlation tomography
6.2. Receiver functions and inverse scattering methods by passive seismic arrays
7. Conclusions
Acknowledgment
References

1. Introduction

For good reasons, seismic tomography has been closely linked with computed tomography (CT) commonly performed on X-ray or ultrasound recordings. The similarities are undeniable, as both techniques aim to extract the internal properties of an object based on integrated data trajectories. The former method is predicated on seismic waves propagating through crustal or mantle rocks, and is therefore influenced by, and sensitive to, the medium elastic parameters. The latter approach mainly seeks a numerical description of tissue density as a function of position. When performed under ideal conditions, both approaches are able to reconstruct accurate two- or three-dimensional (3D) images of the target structures through planar slices (or tomos in Greek connotation).

The similarity goes beyond the obvious: in fact, the governing principles of seismic and computed, mainly radiological (Cormack, 1963), tomography are all rooted in the Abel transform of linear paths in a radially concentric sphere (Herglotz, 1907; Wiechert, 1910). Mathematically this approach was identical to the construction of a spherically symmetric quantum mechanical potential in 3D (Keller et al., 1956; Snieder and Trampert, 1999). A more general form of linear tomography was presented as the central slice theorem by Radon (1917), which was later shown to reduce to the Abel transform for Earth-like geometry (Deans, 1983; Nowack, 1990; Vest, 1974). While the mathematics introduced by Radon (1917) are beyond the scope of this review, the simple concept of integrating and inverting a projection function between two generic points connected by a straight line (or path) was instrumental not only for seismic and radiological tomography, but also for modern-day data de-noising and reconstruction (Deans, 1983; Gu, 2010; Vest, 1974).

It is unclear whether the subsequent developments of CT and seismic tomography were contemporaneous or whether the former inspired the early pioneering efforts of seismic data analysis. In 1971, the first practical CT brain scanner by Sir Hounsfield in England opened a new chapter in medical radiology. By recording X-ray transmission through patients, medical tomographers began to extract images of the X-ray attenuation coefficients via what was once termed EMI scans (Cunningham and Judy, 2000; Filler, 2009). This breakthrough pre-dated the first 2D seismic tomographic regional map produced by Aki and Lee (1976), though it could be legitimately argued that the fundamentals of geophysical inverse problems were already in place when Backus and Gilbert (1967, 1968, 1970) published their seminal studies on the formation, solution and non-uniqueness of linearized inversions (Wiggins, 1972). Rather than quantifying the inverse problem in abstract mathematical terms, Backus and Gilbert (1967) provided formulation and practical solutions to the inversion of normal mode observations based on Rayleigh's Principle (Rayleigh, 1877). The findings of this and subsequent studies by the same authors championed a new generation of studies pertaining to the Earth's free oscillations (Dziewonski, 1971; Dziewonski and Anderson, 1981; Gilbert and Dziewonski, 1975). Just as importantly, functional derivatives of seismic measurements with respect to structural model parameters (Backus and Gilbert, 1967; Fréchet, 1941; Woodhouse, 1974), commonly known as Fréchet derivatives, Fréchet kernels or sensitivity kernels, made their debut and would soon become the backbone of subsequent studies of traveltime (Aki and Lee, 1976; Woodhouse, 1974, 1981) and waveform (Dziewonski and Anderson, 1981; Woodhouse and Dziewonski, 1984) tomography.

The era of global seismic tomography officially began in 1981 with the publication of the Preliminary Reference Earth Model (PREM, Dziewonski and Anderson, 1981), the first well-accepted one-dimensional (1D), radially anisotropic model of the Earth. This study combines normal mode, traveltime, and attenuation observations with physical parameters such as Earth's mass and moment of inertia to establish an accurate reference frame, with typical errors less than 1% between predicted and observed traveltimes for teleseismic arrivals, for advanced 3D surveys. This seminal study eventually earned its two authors the well-deserved Crafoord Prize, the equivalent of the Nobel Prize for the geosciences. Over the next 10 years, a series of pioneering studies in linear (Dziewonski, 1984; Nakanishi and Anderson, 1982; Woodhouse and Dziewonski, 1984) and non-linear (Ho-Liu et al., 1988, 1989; Snieder and Romanowicz, 1988; Tarantola and Valette, 1982) inversions followed suit, combining mathematical elegance with artistry to propel seismic tomography to the forefront of Earth sciences. The need for greater computing power and memory storage also became apparent. For instance, to enable inversions for up to degree 6 spherical harmonics, the original Fortran 77 codes for key parts of the waveform analysis had to resort to the extended use of 'equivalence' statements to circumvent computer memory limitations (Woodhouse and Dziewonski, 1984). It is fair to state that, four decades after Backus and Gilbert's first studies, the pursuit of higher resolution and accuracy of the tomographic images continues to push computer software and hardware capacities to their allowable limits.

Increasing seismic data combined with improvements in measurement accuracy, model parameterization, and forward modeling theory have resulted in a number of published global tomographic models from different geophysics programs (e.g., Ekström and Dziewonski, 1998; Ekström et al., 1995; Gu et al., 2001; Houser et al., 2008; Kárason and van der Hilst, 2000; Kustowski et al., 2008; Masters et al., 2000; Montelli et al., 2004; Panning and Romanowicz, 2006). Captivating images of the mantle, which at times invoke inevitable comparisons with objects of little scientific connection (Fig. 1), are often considered compelling observations by researchers from various disciplines. Steady efforts were also made to go beyond simple ray theory and compute sensitivities of broadband seismic phases by mode coupling (e.g., Li and Romanowicz, 1995), body-wave ray theory (e.g., Dahlen et al., 2000), surface-wave ray theory (e.g., Zhou et al., 2004), normal-mode summation (e.g., Capdeville, 2005; Zhao and Jordan, 1998) and full 3D numerical simulations (e.g., Liu and Tromp, 2008). The improved Fréchet derivatives (Backus and Gilbert, 1967; Woodhouse, 1974) translate to more accurate design matrices for the tomographic inverse problem and, at least in principle, should improve the quality of the inverted images of the Earth's interior. However, different datasets, processing algorithms, sensitivity kernels and regularization have manifested in global tomographic models that agree only at relatively long wavelengths (e.g., Dziewonski, 2005; Romanowicz, 2003, also see Section 3). Despite considerable efforts in quantifying and reconciling the model differences (e.g., Becker and Boschi, 2002; Becker et al., 2003; Fukao et al., 2001; Gu et al., 2003; Romanowicz, 2003), global tomography seems to have reached an impasse in resolution, which impedes its capability of being directly linked to tectonic and geodynamical interpretations beyond long wavelengths. The impact of sensitivity kernels on the resolution of global seismic tomography was highlighted by a contentious debate surrounding banana-doughnut sensitivity kernels (Dahlen et al., 2000) and their applications (e.g., Dahlen and Nolet, 2005; Montelli et al., 2004, 2006a; van der Hilst and de Hoop, 2005).

Parallel to the development of global seismic tomography, structural inversions have long been formulated in the exploration context as full-waveform inversion problems, where the calculation of Fréchet derivatives of an objective function involves two numerical simulations of wave propagation. This formulation was applied initially to exploration seismic problems due to the heavy computational cost associated with numerical simulations two decades ago (e.g., Gauthier et al., 1986; Igel et al., 1996). However, with the advancement of numerical modeling techniques and improvement of computational resources, simulations of wave propagation for complicated and realistic 3D heterogeneous media at both global and regional scales became practical in recent years (e.g., Castro and Käser, 2010; Igel, 1996; Komatitsch and Tromp, 2002a; Moczo et al., 2007; Tromp et al., 2010a). This naturally led to interest in calculating finite-frequency sensitivity kernels based on more accurate forward modeling methods (Boschi et al., 2007; Liu and Tromp, 2006, 2008; Zhao et al., 2005), and their ensuing applications to tomographic problems (Chen et al., 2007a; Fichtner et al., 2009b; Tape et al., 2010). This type of tomographic inversion method (which we refer to as adjoint tomography in this paper) takes advantage of accurate 3D numerical simulations, and through iterations can resolve velocity perturbations up to 10–30%, far beyond those achievable by classical tomographic techniques. However, adjoint tomography is still in its infancy, where computational cost reduction (Akcelik et al., 2003; Tape et al., 2010), objective assessment of model resolution (Fichtner and Trampert, 2011a, 2011b), and applications to array data (e.g., via noise correlation and scattered wave field analysis) remain areas of ongoing research. Still, adjoint tomography holds promise for a new generation of high-resolution images of the Earth at local, regional and global scales.

While a chronological account of exploration, crustal and mantle tomographic techniques and results may be tempting, a truly objective presentation of all relevant research is impractical due to space limitations. In the first part of this paper, we aim to revisit methods that had been the backbone of global seismic tomography (up to ~2003); key formulations are provided for completeness and easy comparison. We then discuss some of the recent debates associated with the resolution of tomographic images (up to ~2006), and carefully review the formulation and applications of adjoint tomography. Further information on the broad subject of seismic tomography can be found in a number of informative reviews (e.g., Dziewonski, 1996; Dziewonski and Woodhouse, 1987; Fukao et al., 2001, 2009; Grand, 2002; Masters, 1989; Masters and Shearer, 1995; Masters et al., 2000; Montagner, 1994; Rawlinson et al., 2010; Ritzwoller and Lavely, 1995; Romanowicz, 1991, 2003; Trampert and van der Hilst, 2005; Virieux and Operto, 2009; Woodhouse and Dziewonski, 1989).

[Fig. 1 image not reproduced. Title: 'Teeth' Beneath Our Feet. Map labels: N (north arrow), Europe, Africa, Saudi Arabia.]

2. The tomographic problem and classical solutions

2.1. Inverse problem

According to Joseph Keller, a prominent mathematician and physicist, inverse and forward problems are tightly coupled where the 'formulation of each involves all or part of the solution of the other'. In other words, an inverse problem is only defined if the answer to 'the inverse of what?' is provided through an explicit expression for, and a solution of, the forward problem (Engl et al., 2000). For a forward operator G that maps model parameters m to observations d based on selected rules or principles of physics,

$$G(\mathbf{m}) = \mathbf{d}. \tag{1}$$

The problem of recovering suitable parameters from a set of observations is the associated inverse problem (e.g., Aster et al., 2005; Iyer and Hirahara, 1993; Menke, 1984; Nolet, 1993; Tarantola, 2005; Wiggins, 1972). The forward-inverse relationship is exemplified by integration and differentiation, two rudimentary mathematical operations that are critical in the inversion approach of Backus and Gilbert (1967, 1968, 1970). A simple solution of this inverse problem involves the following key steps (Aster et al., 2005; Menke, 1984; Snieder, 1998):

1. Writing the problem based on a set of discrete model coefficients.
2. Computing the predicted data based on the choice of model parameters for an a priori structure, the majority being known 1D model structures.
3. Defining an objective function and adjusting the model parameters to meet the pre-defined goodness-of-fit criteria.
4. Estimating the accuracy and resolution of the inversion outcome, repeating the above steps when necessary.

The first two steps govern the forward problem, which takes advantage of the laws of physics and makes informed predictions of the observation. The last two steps aim to recover and refine the physical parameters, if properly defined, by minimizing the differences between observations and predictions while attaining certain desired properties in the model outcomes (Fig. 2; Aster et al., 2005; Menke, 1984; Tarantola, 2005). This process could be repeated for continued refinement of the model and objective function.

With a suitable choice of model parameters, Eq. (1) can often be linearized as a straightforward multiplication of G, known as the design matrix, and a vector containing the unknowns (see Section 2.2). This process is often aided by a sound initial solution and perturbation theory, the foundation of Backus and Gilbert (1967) in the investigation of rock density and elastic moduli based on Rayleigh's principle (Pekeris and Jarosch, 1958; Rayleigh, 1877, 1906; Woodhouse, 1976). On the other hand, existence, ill-posedness, and uniqueness (Backus and Gilbert, 1967) must be critically assessed for any inversion, especially in view of limited observations and the nonlinearity of inverse problems pertaining to Earth's internal structure. The section below revisits key algorithms in classical tomography, especially linear inversions, in the order of increasing sophistication.

Fig. 1. Isotropic seismic velocity perturbations beneath Africa and surrounding regions down to 250-km depth. The model values are computed from S20U7L5 (Ekström et al., 1995) and the iso-surfaces highlight shear velocity contours of 1% (exterior) and 2% (inner surface, blue). The continental roots suggested by the 2% shear velocity perturbations display striking similarities to human teeth. The tomographic rendering was courtesy of Misha Salganik.

Fig. 2. A flow chart showing the key steps in the construction and evaluation of tomographic models. This figure is inspired by Fig. 2 of Snieder (1998). [Diagram labels: ground truth m; educated guess; starting model m′; forward; data; inversion; estimated model m″; appraisal; iterate?]

2.2. Traveltime inversions

Ray-based traveltime tomography has been the staple of seismic imaging largely due to its simplicity. The forward problem involving the computation of the traveltime t of a ray can be expressed as

$$t = \int_{\mathbf{x}_s}^{\mathbf{x}_r} \mathbf{s}\cdot d\mathbf{l}, \tag{2}$$

where $d\mathbf{l}$ is a small segment of the ray path, $\mathbf{s}$ is the slowness vector tangential to the ray path with a magnitude equal to the inverse of the local seismic wave speed (Chernov, 1960), and $\mathbf{x}_s$ and $\mathbf{x}_r$ are the spatial coordinates of the source and receiver, respectively. Based on Snell's law (Descartes and Laurence, 1631; Khan and Lantz, 2002; Snellius, 1621), one could trace a given ray between the two end points and solve Eq. (2) as a boundary value problem either by shooting (Acton, 1997; Keller, 1968) or bending (Julian and Gubbins, 1977) methods; the latter approach was adopted due to faster convergence despite more complex formulations at discontinuous boundaries within the Earth.

In this formulation, the perturbation of arrival time relative to the unperturbed model prediction t is directly attributed to local perturbations in slowness. The main objective of the inverse problem is to determine the fractional change of velocity for all solvable segments along the ray path (Menke, 1984; Tarantola and Valette, 1982; Woodhouse, 1978).

2.3. Linearization and parameterization

Basis functions with desirable properties (e.g., orthogonality) are generally adopted to expand slowness perturbation as a set of linear equations with discrete unknown weights (as model coefficients). Early tomographic studies took advantage of a back-projection approach based on gridded blocks (e.g., Aki and Lee, 1976; Chou and Brooker, 1979; Hearn and Clayton, 1986). This parameterization was influential for global traveltime inversions in the 1990s using large catalogs of P wave traveltimes (e.g., Grand et al., 1997; van der Hilst et al., 1997) and, owing to its simplicity, remains a popular choice in regional and exploration seismic applications. The formulation is a natural extension of the discrete form of traveltime

$$\delta t_i = \sum_{j=1}^{N_p} \delta s_j\, l_{ij}, \tag{3}$$

where $\delta t_i$ is the traveltime difference between observation and reference model prediction for ray i, $\delta s_j$ is the slowness perturbation of the j-th block in the medium, and $l_{ij}$ is the length of the i-th ray in the j-th block. The summation can be performed over the entire study region, though only a small number of blocks ($N_p$) for a given source–station pair are sampled and contribute to the traveltime residual (Aki and Lee, 1976; Ho-Liu et al., 1989). The objective of the inverse problem is to determine $\delta s_j$ for the majority, if not all, of the blocks through inversions.

While uniform grids based on rectangular cells or blocks are an obvious choice due to the simplistic assumption of piece-wise straight-line path segments, the shapes and distribution of the blocks vary broadly in published studies. In fact, to circumvent the lack of sensitivity of surface wave 'great-circle' paths (Backus, 1964) to local heterogeneities, early tomographic inversions adopted prior regionalization schemes and partitioned the Earth's surface into a few irregular, continent-sized blocks (Dziewonski and Steim, 1982; Kanamori, 1970). Problems associated with a priori regionalization (Kawakatsu, 1983) were soon recognized and partially resolved by the path-average approximation and a global expansion based on spherical harmonics (Woodhouse and Dziewonski, 1984, see also next section).

The desirable properties of orthogonality and smoothness made spherical harmonics, which were first proposed as solutions to Laplace's equation, a popular choice of horizontal basis functions in early global tomographic studies of traveltime and/or waveform data (Fig. 3a). The expansion takes the following form:

$$\frac{\delta v(r,\theta,\phi)}{v(r)} = \sum_{k=0}^{K}\sum_{l=0}^{L}\sum_{m=0}^{l} f_k(r)\, p_l^m(\cos\theta)\left[{}_kA_l^m \cos(m\phi) + {}_kB_l^m \sin(m\phi)\right] \tag{4}$$
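As a rough illustration of how Eq. (4) can be evaluated, the sketch below sums the expansion at a single point, with θ taken as colatitude and ϕ as longitude, both in radians. It is a minimal sketch under stated assumptions rather than the code behind any of the cited models: the function names (`assoc_legendre`, `n_horizontal_coeffs`, `relative_perturbation`), the dictionary layout of the coefficients and the use of unnormalized associated Legendre functions (with the Condon-Shortley phase) are all hypothetical choices, since the normalization convention is not specified in the text.

```python
import math
import numpy as np


def assoc_legendre(l, m, x):
    """Unnormalized associated Legendre function P_l^m(x) for 0 <= m <= l.

    Uses the standard upward recursion in degree, including the
    Condon-Shortley phase (-1)^m. Normalization is an assumption here.
    """
    if not 0 <= m <= l:
        raise ValueError("require 0 <= m <= l")
    x = np.asarray(x, dtype=float)
    somx2 = np.sqrt((1.0 - x) * (1.0 + x))
    # Seed value: P_m^m(x) = (-1)^m (2m-1)!! (1 - x^2)^(m/2)
    pmm = np.ones_like(x)
    for i in range(1, m + 1):
        pmm = -pmm * (2 * i - 1) * somx2
    if l == m:
        return pmm
    # Next degree: P_{m+1}^m(x) = x (2m+1) P_m^m(x)
    pmmp1 = x * (2 * m + 1) * pmm
    # Three-term recursion up to the requested degree l
    for ll in range(m + 2, l + 1):
        pmm, pmmp1 = pmmp1, ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1


def n_horizontal_coeffs(lmax):
    """Count the cos and sin coefficients up to degree lmax.

    Each degree l contributes l+1 cosine terms and l sine terms
    (the m = 0 sine term vanishes), giving (lmax + 1)^2 in total.
    """
    return (lmax + 1) ** 2


def relative_perturbation(coeffs, radial_basis, r, theta, phi):
    """Evaluate the sum in Eq. (4) at one point.

    coeffs       : dict mapping (k, l, m) -> (A, B) model coefficients
    radial_basis : list of callables, radial_basis[k](r) = f_k(r)
    """
    cos_theta = math.cos(theta)
    total = 0.0
    for (k, l, m), (a, b) in coeffs.items():
        plm = float(assoc_legendre(l, m, cos_theta))
        total += radial_basis[k](r) * plm * (a * math.cos(m * phi) + b * math.sin(m * phi))
    return total
```

With this counting, `n_horizontal_coeffs(18)` returns 361, the number of horizontal coefficients quoted for degree 18 in the caption of Fig. 3. A real application would replace the constant or toy radial functions with the chosen radial basis and would fix a normalization consistent with the model being read.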


Fig. 3. (a) Normalized spherical harmonic function (Press et al., 2007) with angular order l=18 and azimuthal order m=6. The total number of model coefficients (i.e., sin and cos terms) up to degree 18 is 361. (b) Equal-area tessellation of the Earth's surface. The total number of vertices is 362. (c) Degree-18 spherical harmonic expansion of crustal thickness (CRUST 2.0, Bassin et al., 2000). (d) Expansion of crustal thickness using spherical B-splines (Gu et al., 2001) centered at each vertex in panel (b). With similar numbers of model coefficients, the results of the inversion-based expansions are highly consistent between the two parameterizations. [Panel titles: (a) Spherical Harmonics, color scale −1 to +1; (b) Tessellation (362 nodes); (c) Spherical Harmonic Expansion, color scale −20 to 20 km; (d) B-spline expansion, color scale −20 to 20 km.]
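For comparison with the global basis functions illustrated in Fig. 3, the local block parameterization of Eq. (3) can be sketched in a few lines. The example below is only a toy under stated assumptions: rays are straight lines through a 2D grid of square blocks, the path lengths $l_{ij}$ are approximated by dense sampling along each ray, and the system is solved by damped least squares. The function names (`path_lengths`, `build_design_matrix`, `damped_least_squares`), the geometry and the damping value are hypothetical and are not taken from any of the cited studies.

```python
import numpy as np


def path_lengths(src, rec, nx, ny, cell=1.0, n_samples=2000):
    """Approximate l_ij for one straight ray through an nx-by-ny block grid.

    The ray is sampled at n_samples midpoints, and each sample adds an
    equal share of the total ray length to the block that contains it.
    """
    src = np.asarray(src, dtype=float)
    rec = np.asarray(rec, dtype=float)
    total = np.linalg.norm(rec - src)
    frac = (np.arange(n_samples) + 0.5) / n_samples
    pts = src + frac[:, None] * (rec - src)
    ix = np.clip((pts[:, 0] // cell).astype(int), 0, nx - 1)
    iy = np.clip((pts[:, 1] // cell).astype(int), 0, ny - 1)
    row = np.zeros(nx * ny)
    # Accumulate the per-sample length into the (row-major) block index
    np.add.at(row, iy * nx + ix, total / n_samples)
    return row


def build_design_matrix(sources, receivers, nx, ny, cell=1.0):
    """Stack one row of block path lengths per source-receiver pair."""
    return np.array([path_lengths(s, r, nx, ny, cell)
                     for s in sources for r in receivers])


def damped_least_squares(G, dt, damping=1e-3):
    """Minimize ||G ds - dt||^2 + damping^2 ||ds||^2 for block slowness ds."""
    n = G.shape[1]
    G_aug = np.vstack([G, damping * np.eye(n)])
    d_aug = np.concatenate([dt, np.zeros(n)])
    ds, *_ = np.linalg.lstsq(G_aug, d_aug, rcond=None)
    return ds


if __name__ == "__main__":
    # Toy 3x3 experiment: sources on the left and bottom edges,
    # receivers on the right and top edges, one anomalous central block.
    srcs = [(0.0, y) for y in (0.5, 1.5, 2.5)] + [(x, 0.0) for x in (0.5, 1.5, 2.5)]
    recs = [(3.0, y) for y in (0.5, 1.5, 2.5)] + [(x, 3.0) for x in (0.5, 1.5, 2.5)]
    G = build_design_matrix(srcs, recs, nx=3, ny=3)
    ds_true = np.zeros(9)
    ds_true[4] = 0.01
    dt = G @ ds_true  # synthetic traveltime residuals from Eq. (3)
    print(np.round(damped_least_squares(G, dt), 4))
```

Each row of `G` is sparse in the sense described for Eq. (3): only the blocks crossed by a given ray carry nonzero path lengths. The damping term is one simple way to stabilize blocks that are poorly sampled, and its value here is arbitrary.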

In Eq. (4), θ is colatitude (90° − latitude), ϕ is longitude, r is radius, and v(r) is the depth-dependent velocity of the 1D reference Earth model. The term $p_l^m$ is the associated Legendre polynomial of degree l and azimuthal order m, and $f_k(r)$ corresponds to a radial basis function that captures 1D variations of wave speeds along the depth axis (Dziewonski, 1984; Su et al., 1992; Woodhouse and Dziewonski, 1984). Finally, the constant multipliers ${}_kA_l^m$ and ${}_kB_l^m$ are the target model coefficients in the 3D expansion of seismic velocities. The above formulation deviates slightly from the original study by Woodhouse and Dziewonski (1984), in which a similar expression was detailed for squared velocity perturbations using Legendre polynomials up to the third power and spherical harmonics up to degree 8. In either form, the first two terms of the Taylor series of traveltimes establish a linear dependence between perturbations in time (observation) and velocity (target of investigation) relative to a reference model such as PREM (Dziewonski and Anderson, 1981) or IASPEI91 (Kennett and Engdahl, 1991). In matrix form, a model vector m containing unknowns ${}_kA_l^m$ and ${}_kB_l^m$ in Eq. (4) could be estimated from the following set of linear equations

$$\mathbf{G}\mathbf{m} = \mathbf{d}, \tag{5}$$

where G contains assumed path sensitivity and d represents the set of traveltime or waveform observations. Frequently cited examples of tomographic models based on this approach are S12WM13 (Su et al., 1992), S20A (Ekström and Dziewonski, 1998), S20RTS (Ritsema and van Heijst, 2000) and, more recently, S40RTS, which contains more than 1600 horizontal model parameters (Ritsema et al., 2011).

While parameterization and nominal resolution (Kissling et al., 2001) often depend on a researcher's personal preference, judicious choices should reflect the merits of basis functions as well as data coverage and density. Unlike well-designed exploration seismic experiments where receiver distributions are dense and uniform, the global tomographic problem is under-determined due to uneven source and station distributions. For this reason, variable resolutions or irregular global grids (see review by Sambridge and Rawlinson, 2005) have often been adopted to provide a closer association between model and data spaces. Widely accepted but inherently subjective criteria for model evaluation, such as smoothness, also play key roles in the selection of model parameters. For instance, spline functions (Gu et al., 2001, 2003; Kustowski et al., 2008; Mégnin and Romanowicz, 2000; Nettles and Dziewonski, 2008) have been widely adopted as an effective compromise between localization and smoothness. Still, robust outcomes from well constrained inversion problems should not be strongly influenced by the choice of model parameters. When data coverage is sufficient, the global tomographic problem is inherently similar to a simple interpolation problem. Fig. 3 compares an inversion of global Moho depth of CRUST2.0 (Bassin et al., 2000) based on different basis functions. For similar numbers of model parameters (361 and 362), the outcomes of 2D expansions of crustal thickness based on B-splines and spherical harmonics are virtually identical. The 'same earth' assumption will be further assessed through comparisons of published 3D models in Section 3.

2.4. Waveform inversions

2.4.1. Path average approximation (PAVA)

Due to the lack of consensus among published studies, we subjectively characterize waveform tomography as an inversion that 1) uses time series rather than traveltimes, amplitudes or other secondary attributes of the recorded data, and 2) depends on the approximation of the full wave equation rather than the ray approximation (Panning et al., 2009; Romanowicz, 2003). Hence, waveform inversions are more accurate than their traveltime counterparts despite more complex formulations and increased nonlinearity from waveform fitting.

Woodhouse and Dziewonski (1984) marks a milestone effort in the early stages of global waveform tomography. Having recognized the merits of surface wave analysis (e.g., Brune et al., 1963; Knopoff, 1972) and its largely untapped potential on the global scale (Masters et al., 1982), this study abandoned controversial a priori regionalization schemes (Dziewonski, 1971; Jordan, 1981; Kanamori, 1970; Kawakatsu, 1983; Souriau and Souriau, 1983; Toksoz and Anderson, 1966) and formally introduced the path integral approximation (or path average approximation, for short PAVA, see also Mosca and Trampert, 2009; Panza, 1985; Woodhouse and Dziewonski, 1986) in search of lateral heterogeneity in the mantle. This approximation, which is an effective WKBJ assumption (Bretherton, 1968; Woodhouse, 1974), essentially states that phase perturbations of surface waves are caused by lateral heterogeneities horizontally averaged over the source–receiver trajectory. Despite inaccurate assumptions of path sensitivities of body waves, triplications, and frequency-dependent surface waves (Li and Romanowicz, 1996; Snieder, 1996; Snieder and Nolet, 1987) along the great-circle arc (Backus, 1964), Woodhouse and Dziewonski (1984) is worthy of a brief review for its pioneering efforts in combining fundamental theories of linear inversion and free oscillations with data mining through global surface and body wave observations.

Adopting the original notation of Jordan (1978), Woodhouse and Dziewonski (1984) express total phase perturbations between the source and receiver after n orbits as

$$\delta\phi = \int_0^{T_x} \delta\omega_{\mathrm{local}}\, dt + n\int_0^{T} \delta\omega_{\mathrm{local}}\, dt, \tag{6}$$

where $\delta\omega_{\mathrm{local}}$ denotes the local mantle wave frequency shift in response to lateral heterogeneities, and the first integral is performed over the group traveltime $T_x$ along the source-station great circle path for either a major or minor arc. The second term is a cumulative sum of the phase shift after n passages of the surface wave around the great-circle path defined by the source and receiver. One can define a fictitious frequency shift as

$$\delta\hat{\omega} = \frac{1}{T_x}\int_0^{T_x} \delta\omega_{\mathrm{local}}\, dt = \text{path average}. \tag{7}$$

The essence of PAVA is to rewrite the above phase perturbation in terms of this apparent frequency term and the source–receiver distance perturbation δθ (see Eq. (11) of Woodhouse and Dziewonski, 1984). The apparent frequency and distance shifts are different for each mode, and their partial derivatives with respect to model parameters are used to compute the Fréchet kernels for the waveform inversion problem.

Based on PREM (Dziewonski and Anderson, 1981), synthetic seismograms can be computed by normal mode superposition (Gilbert, 1970; Gilbert and Dziewonski, 1975; Romanowicz, 2008; Tanimoto, 1984) and PAVA facilitates the formulation of a linear inverse problem of model perturbations. It was the building block for a series of global models of mantle heterogeneities (e.g., Ekström and Dziewonski, 1998; Gu et al., 2001, 2003, 2005; Kustowski et al., 2008; Liu and Dziewonski, 1998; Nettles and Dziewonski, 2008; Ritsema and van Heijst, 2000; Su et al., 1992, 1994). In theory, PAVA is sensitive to both the phase and the amplitude information of surface wave waveforms. In practice, however, the perturbation of waveforms in connection with the reduction in data variance is dominated by phase shifts (e.g., Li and Romanowicz, 1996; Woodhouse and Dziewonski, 1984); consequently, waveform inversions based on PAVA fall short in properly accounting for the amplitudes of surface waves (e.g., Dahlen and Tromp, 1998; Li and Romanowicz, 1995; Panning et al., 2009; Zhao and Dahlen, 1995).

2.4.2. PAVA in classical traveltime tomography

The application of PAVA goes far beyond waveform tomography. While it has been suggested earlier (see review by Romanowicz, 2003), a recent study by Mosca and Trampert (2009) explicitly validat- [...] a depth-dependent velocity v(r), the ray theoretical traveltime t(p) of a seismic phase (Brune, 1966; Woodhouse, 1978) becomes,

$$t(p) = 2\int_0^{r_b}\left[\frac{1}{v^2(r)} - \frac{p^2}{r^2}\right]^{1/2} dr, \tag{8}$$

where r is the instantaneous position of a ray segment from the center of the Earth in kilometers and p represents the spherical ray parameter. The key is to recognize the inherent connection between the horizontal ray parameter p and the traveltime equivalence of the frequency–phase velocity relationship (Dahlen and Tromp, 1998; Mosca and Trampert, 2009)

$$\frac{\delta\omega_{\mathrm{local}}}{\omega} = \frac{\Delta(p)/T(p)}{c}\,\frac{\delta c_{\mathrm{local}}}{c}, \tag{9}$$

where Δ(p) and T(p) correspond to ray-parameter dependent distance and traveltime, respectively, and phase velocity c is defined by the simple relationship $c = p^{-1}$. Eq. (9) is the traveltime equivalence of Eq. (6), and the total traveltime perturbation becomes an integral of local perturbations in the exact form of Eq. (7) (Mosca and Trampert, 2009),

$$\delta T(p) = \frac{1}{\Delta}\int_0^{\Delta} \delta t_{\mathrm{local}}(p,\theta,\phi)\, d\Delta. \tag{10}$$

In essence, the equivalence between waveform PAVA and ray theory is the manifestation of mode-ray duality under the zeroth order asymptotic assumption (Mosca and Trampert, 2009; Zhao and Dahlen, 1995). For this reason, the application of PAVA is far more prevalent in seismic tomography than that suggested by waveform inversions.

2.4.3. Improved waveform approaches

PAVA detailed by Woodhouse and Dziewonski (1984) was initially applied to low-pass filtered (cutoff period 135 s) long-period mantle waves, named in consideration of their broad depth sensitivity to mantle structure, and body waves (cutoff period 45 s). Low computation cost was a crucial advantage for waveform tomography based on PAVA in the 1980s due to the assumed dependence of synthetic seismograms on model perturbations averaged along the source–receiver path. However, with vastly improved computer memory and processor speed in the early 1990s, tomographers began to place a greater emphasis on the accuracy of the inversion outcomes than on savings in computational resources. Even for relatively 'smooth' variations, a condition that PAVA is based upon, there are well-documented caveats. First, in a laterally heterogeneous mantle, PAVA is inaccurate for body-wave phases with geometrically banana-shaped sensitivities (Dahlen et al., 2000). More accurate kernels were developed by considering mode-branch coupling (Li and Romanowicz, 1995; Marquering and Snieder, 1995), which reduces finite-frequency sensitivity to planar, 2-D ray paths. In the same year, Zhao and Dahlen (1995) derived the asymptotic expressions for the Fréchet kernels of modal eigenfunctions in the heterogeneous mantle. Differential waveforms (Passier and Snieder, 1995) have also been suggested as a partial solution to improve the accuracy of PAVA.

Inaccurate body wave path assumptions are only a part of the problem associated with PAVA. Even with improved geometrical path integrals, PAVA would remain a gross assumption due to the lack of consideration for finite frequency effects of both body (Nolet et al., 2005) and surface (Zhou et al., 2004) waves. This problem is highlighted by the use of 'wiry' ray theoretical kernels in traveltimes, the waveform equivalent of PAVA, and requires further discussions (see Section 3.2).

2.4.4.
models of mantle heterogeneities (e.g., Ekström and Dziewonski, More accurate kernels were developed by considering mode-branch cou- 1998; Gu et al., 2001, 2003, 2005; Kustowski et al., 2008; Liu and pling (Li and Romanowicz, 1995; Marquering and Snieder, 1995), which Dziewonski, 1998; Nettles and Dziewonski, 2008; Ritsema and van reduces finite-frequency sensitivity to planar, 2-D ray paths. In the same Heijst, 2000; Su et al., 1992, 1994). In theory, PAVA is sensitive to both year, Zhao and Dahlen (1995) derived the asymptotic expressions for the the phase and the amplitude information of surface wave waveforms. Fréchet kernels of modal eigenfunctions in the heterogeneous mantle. In practice, however, perturbations of waveforms in connection with Differential waveforms (Passier and Snieder, 1995)havealsobeen the reduction in data variance is dominated by phase shifts (e.g., Li suggested as a partial solution to improve the accuracy of PAVA. and Romanowicz, 1996; Woodhouse and Dziewonski, 1984); conse- Inaccurate body wave path assumptions are only a part of the quently, waveform inversions based on PAVA falls short in properly problem associated with PAVA. Even with improved geometrical accounting for the amplitudes of surface waves (e.g., Dahlen and path integrals, PAVA would remain a gross assumption due to the Tromp, 1998; Li and Romanowicz, 1995; Panning et al., 2009; Zhao lack of consideration for finite frequency effects of both body (Nolet and Dahlen, 1995). et al., 2005) and surface (Zhou et al., 2004) waves. This problem is highlighted by the use of ‘wiry’ ray theoretical kernel in traveltimes, 2.4.2. PAVA in classical traveltime tomography the waveform equivalent of PAVA, and requires further discussions The application of PAVA goes far beyond waveform tomography. (see Section 3.2). While it has been suggested earlier (see review by Romanowicz, 2003), a recent study by Mosca and Trampert (2009) explicitly validat- 2.4.4. 
Partitioned waveform inversion and multi-stage inversion schemes ed the notion of ray theory as the body-wave equivalence of PAVA to Based on Fréchet kernels, the inversion approaches outlined in the perturbations in modal frequencies. Assume ray bottom radius rb and preceding sections aim to resolve perturbations in seismic velocities

in reference to a starting 1D (or known 3D) model directly from traveltime, phase, or waveform observations. While Woodhouse and Dziewonski (1984) suggested that a direct inversion for the coefficients is more reliable for their long period Love and Rayleigh wave datasets, this one-step inversion process is, for most datasets, not a requirement but a choice. In fact, the earliest tomographic applications (especially in medical imaging based on the central slice theorem of Radon, 1917) were predicated on two-step processes involving the construction of 3D volumes from a series of 2D sections. Multi-stage inversion approaches have also been adopted to recover mantle velocity structures, exemplified by the Partitioned Waveform Inversion (PWI) (Nolet, 1990) method.

The PWI method introduced by Nolet (1990, 2008) consists of two key steps: 1) fit each waveform observation with an average 1D velocity along the source–receiver great-circle path, and 2) solve for 3D structure via linearized inversions and the collection of path-averaged misfits from step 1 (e.g., Das and Nolet, 1998; Maggi and Priestley, 2005; Marquering et al., 1996; Nolet, 1990; van der Lee and Nolet, 1997). Step 1 is conceptually important and potentially computationally intensive due to nonlinear optimization (Nolet, 1990). Unlike Eq. (7), this study recasts the path integral to a more confined model domain and defines the path-average shear velocity perturbation as

$$\delta v^{(j)}(r) = \frac{1}{\Delta_j}\int_{\mathrm{source}}^{\mathrm{receiver}} \left[v(r,\theta,\phi) - v^{(j)}(r)\right] d\Delta, \tag{11}$$

where $v^{(j)}(r)$ represents the radially symmetric reference speed, and $v(r,\theta,\phi)$ corresponds to 3D shear velocity variations along a given path j with a path length of $\Delta_j$. Then the path-averaged perturbations to the wavenumber $\delta k_n^{(j)}(\omega)$ of a surface wave mode at frequency ω can be expressed as

$$\delta k_n^{(j)}(\omega) = \int_0^{a} \left[\frac{\partial k_n^{(j)}(\omega)}{\partial v^{(j)}(r)}\right] \delta v^{(j)}(r)\,dr, \tag{12}$$

where a is the radius of the Earth. The spectrum of the wavefield $S_j(\omega)$ at this station becomes an exponential function of the path integral (Nolet, 1990) and, based on the WKBJ approximation, the velocity term can be discretized and recovered from the waveform misfit. To construct 3D variations in shear velocities $\delta v(r,\theta,\phi)$ from model coefficients, Nolet (1990) defines a new data vector d based on the 1D model coefficients and links the 3D velocity $v(r,\theta,\phi)$ to the partitioned 1D velocity model $v^{(j)}(r)$ for path j,

$$\frac{1}{\Delta_j}\int_0^{a}\int_{\mathrm{path}\ j} \left[v(r,\theta,\phi) - v^{(j)}(r)\right] g_i^{-1}(r)\,dr\,d\Delta = d_i \pm \delta d_i, \tag{13}$$

where $\Delta_j$ corresponds to the source–receiver distance for the j-th path. This equation is in the form of Gm=d (see Eq. (5)) with $g_i^{-1}(r)$ as the sensitivity kernel and $v(r,\theta,\phi)$ as the unknown model vector. Unlike step 1, the 3D velocity perturbations are solved directly via a linearized inversion approach (Nolet, 1990).

A key advantage of PWI over PAVA is flexibility, since multiple starting models can be simultaneously introduced during the 1D waveform fitting procedure. During the past 20 years, a number of continental-scale studies have benefited from PWI in the investigation of crust (Das and Nolet, 1998; Manaman et al., 2011) and mantle structure (Maggi and Priestley, 2005; van der Lee and Frederiksen, 2005; van der Lee and Nolet, 1997). Similar to PAVA (Woodhouse and Dziewonski, 1984), however, the original form of PWI (Nolet, 1990) approximated body-wave path sensitivity using the WKBJ assumption in connection with surface wave modes. While this inaccuracy was largely resolved by incorporating mode coupling (e.g., Marquering and Snieder, 1995; Marquering et al., 1996), the lack of consideration for wave amplitudes (Panning et al., 2009) and finite frequency effects (see Section 3.2) remains problematic for PWI.

Two-step inversions of an entirely different flavor have also been well established in surface wave analysis. These studies first seek solutions to surface wave phase velocity (Ekström and Dziewonski, 1998; Montagner and Nataf, 1986; Nakanishi and Anderson, 1984; Nataf et al., 1984; Ritzwoller et al., 2002; Tanimoto and Anderson, 1985; Trampert and Woodhouse, 1996; Yang and Forsyth, 2006), group velocity (e.g., Bensen et al., 2007; Ekström and Dziewonski, 1998; Yao et al., 2006) or arrival angle information (Larson et al., 1998; Laske and Masters, 1996) for multiple frequency ranges, then construct the 3D structures by inverting the resulting dispersion curves or phase/group velocity maps (e.g., Ekström and Dziewonski, 1998; Shapiro and Ritzwoller, 2002). Stutzmann and Montagner (1993) proposed a two-step waveform inversion scheme for surface wave fundamental and higher modes that incorporated amplitude information. Rather than utilizing full waveforms, as was the case for PAVA or PWI, maps obtained by these studies generally emphasize a specific aspect or aspects of surface wave records.

2.5. Inversions for anisotropic structure

Since the early 1960s, the dependence of seismic wave speed variations on particle motion or direction of propagation (generally known as anisotropy) has been documented through the discrepant observations of Pn (Hess, 1964; Raitt et al., 1971) and surface (Aki, 1968; Forsyth, 1975; Nataf et al., 1984; Schlue and Knopoff, 1977; Yu and Mitchell, 1979) waves, as well as anomalous particle motions associated with body waves (Anderson, 1966; Ando et al., 1980; Hirn, 1977). Tradeoffs between heterogeneity and anisotropy have long been recognized as a potential factor affecting the outcomes of tomographic models (e.g., Dziewonski and Anderson, 1981; Smith and Dahlen, 1973; Woodhouse and Dziewonski, 1984), but the isotropic assumption was frequently made for mathematical and computational conveniences (Anderson, 1989). For instance, inversions of general anisotropic bodies require the recovery of 21 independent elastic parameters according to Hooke's Law. Considering a coarsely sampled mantle with spherical harmonic expansion up to degree 20 (horizontal) and Chebyshev polynomial up to degree 13 (radial) (e.g., Ekström and Dziewonski, 1998), a back-of-envelope calculation suggests a total number of unknowns of $(20+1)^2 \times (13+1) \times 21 \approx 130{,}000$. This implies that a non-gradient factorization method such as Cholesky decomposition (Ekström and Dziewonski, 1998; Gu et al., 2001; Su et al., 1994) would require the storage and manipulation of two upper-triangular inner product matrices with $130000 \times 130001/2 \approx 8.5 \times 10^9$ elements each, or a total memory requirement of ∼700 GB in single precision. Even though today's high-end supercomputers could meet this baseline memory requirement, a fully anisotropic inversion remains impractical due to the extensive computation time for matrix inversion, the inherent tradeoffs between the daunting number of independent parameters and, more importantly, the insufficient data constraints.

Despite challenges in modeling general anisotropy, tomographic inversions targeting anisotropic wave speeds garnered considerable attention in the early 1980s (Dziewonski and Anderson, 1981; Nataf et al., 1984; Woodhouse, 1981) and became more practical in the mid 1990s following the increased affordability and accessibility of supercomputers. Simplifying considerations were introduced based on the approximate symmetry of anisotropic minerals (Anderson, 1961; Backus, 1962; Carlson and Christensen, 1979; Christensen and Crosson, 1968; Dziewonski and Anderson, 1981; Postma, 1955). By assuming hexagonal symmetry, general anisotropy reduces to transverse isotropy (or radial anisotropy) (e.g., Anderson and Dziewonski, 1982) with five independent elastic parameters A, C, N, L, and F (Love, 1927). The first four elastic constants can be determined by velocity measured in two orthogonal directions and F is a function of velocity at intermediate directions (Dziewonski and Anderson, 1981). In addition to mathematical convenience, transverse isotropy offered

appealing physical descriptions of laminated or finely layered solids, oriented cracks or melt zones, as well as preferentially oriented mantle minerals such as olivine and pyroxene under shear deformation (Anderson, 1989; Nishimura and Forsyth, 1989; Silver, 1996).

Linearized anisotropic inversion methods pertaining to transverse isotropic solids were introduced by Dziewonski and Anderson (1981) and Woodhouse (1981) for waveform and traveltime data, respectively. Based on the theory initially developed by Takeuchi and Saito (1966) and the definition of Fréchet derivatives (e.g., Backus and Gilbert, 1967), a relative change in the squared eigenfrequency of a normal mode can be written as (Anderson and Dziewonski, 1982; Dziewonski and Anderson, 1981)

$$\frac{\delta\omega^2}{\omega^2} = \int_0^1 r^2\,dr\left(\tilde{A}\,\delta A + \tilde{C}\,\delta C + \tilde{N}\,\delta N + \tilde{L}\,\delta L + \tilde{F}\,\delta F + \tilde{R}\,\delta\rho\right), \tag{14}$$

where the symbols with tilde signs represent the differential kernels (see Takeuchi and Saito, 1966 for alternative formulations) expressible as functions of the eigenfunctions (or the derivatives of eigenfunctions) and the angular degree of spherical harmonics (Dziewonski and Anderson, 1981). For practical reasons, this expression was recast into Fréchet derivatives of more measurable quantities such as horizontal and vertical wave speeds (VXH where ‘X’ can be either P or S), and the last term corresponds to perturbations to the density of the anisotropic structure. Simple scaling relationships between P, S and density have been adopted (Ekström and Dziewonski, 1998; Gu et al., 2005; Kustowski et al., 2008; Marone et al., 2007; Nettles and Dziewonski, 2008; Yuan and Romanowicz, 2010) to reduce the number of free parameters in the inversion process. Anisotropic parameters have also been retrieved from independent inversions of Rayleigh and Love wave datasets (e.g., Ekström and Dziewonski, 1998; Nishimura and Forsyth, 1989), though the assumed polarized kernels are inaccurate due to the dependence of higher-mode surface waves on both vertical and horizontal wave speeds (see below).

Anisotropic inversions of body-wave traveltime observations were introduced by Woodhouse (1981) as an addendum to Dziewonski and Anderson (1981). For simplicity we briefly outline the key results on the traveltime integral of SH waves and refer the readers to the original study by Woodhouse (1981) for more complex formulations for P–SV waves. Assuming $q_r$ is the integrand of a one-way traveltime path integral (see Eq. (8)), asymptotic ray theory enables the computation of partial derivatives of SH-wave traveltime with respect to all five elastic parameters (Woodhouse, 1981) for a radially anisotropic solid. To obtain explicit expressions to recover anisotropic seismic velocities from traveltime observations, one first needs to map the Fréchet derivatives of $q_r$ with respect to the five elastic constants to horizontal and vertical velocities (Woodhouse, 1981) using the chain rule, then insert the appropriate terms into the path integral over radius,

$$\delta t = 2\int_0^{r_b} \frac{\partial q_r}{\partial V_{SH}}\,\delta V_{SH}\,dr + 2\int_0^{r_b} \frac{\partial q_r}{\partial V_{SV}}\,\delta V_{SV}\,dr, \tag{15}$$

where $\delta V_{SH}$ and $\delta V_{SV}$ are the unknown perturbations in horizontal and vertical shear wave velocities, respectively, along the ray path. For proper choices of basis functions (see Sections 2.2 and 2.3), this formulation is convenient for a linear, anisotropic inversion of the model coefficients using traveltime observations.

Fig. 4 shows examples of traveltime and normal-mode frequency sensitivities to anisotropic shear velocities. The ray-theoretical traveltime of P1 (an SH-polarized first arrival, Fig. 4a) is highly sensitive to wave polarization and propagation directions (Fig. 4b; Gu et al., 2005). The

[Fig. 4: four panels. (a) Ray path, showing P1 (SH wave) and P2 (SV wave) with the 670, CMB and ICB boundaries. (b) Travel time path sensitivity of P1 and P2 to VSV and VSH, plotted against depth (km). (c) Mode 0T25: sensitivity to VSH, VSV and VISO, plotted against depth (km). (d) Anisotropic velocity: PREM and inverted VSH and VSV, shear velocity (km/s) against depth (km).]

Fig. 4. (a) Ray paths of S (SH-polarized) and SKS (SV-polarized). (b) Traveltime depth sensitivities of the two phases shown in graph (a) to horizontal (VSH) and vertical (VSV) wave speeds. The sensitivity of P2 is cut off at 2500 km. (c) Sensitivity of a toroidal normal mode frequency to horizontal and vertical wave speeds. The isotropic kernels (ISO) are computed using the Voigt average (see Ekström and Dziewonski, 1998; Gu et al., 2005). (d) Averaged 1D anisotropic velocities near the East Pacific Rise inverted by Gu et al. (2005), relative to PREM (Dziewonski and Anderson, 1981). Inverted 1D VSH and VSV structures are denoted by the thick solid and dashed lines, respectively, and the corresponding values of PREM VSH and VSV are represented by the thin solid and dashed lines, respectively. Significant anisotropy exists and potentially extends to the top of the transition zone.
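The Voigt average mentioned in the caption of Fig. 4 can be illustrated with a short sketch. It assumes the commonly used form $V_S^2 = (2V_{SV}^2 + V_{SH}^2)/3$, which the text does not spell out, and it is applied here to velocities rather than to sensitivity kernels. The function names and the sample velocities are hypothetical and are not read from Fig. 4d.

```python
import math


def voigt_average_vs(vsv, vsh):
    """Voigt-average isotropic shear speed from VSV and VSH (same units).

    Assumed form: VS^2 = (2 * VSV^2 + VSH^2) / 3.
    """
    return math.sqrt((2.0 * vsv ** 2 + vsh ** 2) / 3.0)


def xi(vsv, vsh):
    """Radial anisotropy parameter xi = (VSH / VSV)^2; xi == 1 is isotropic."""
    return (vsh / vsv) ** 2


# Hypothetical values in km/s, chosen only for illustration.
vsv, vsh = 4.40, 4.60
v_iso = voigt_average_vs(vsv, vsh)
print(f"Voigt VS = {v_iso:.3f} km/s, xi = {xi(vsv, vsh):.3f}")
```

With VSH greater than VSV, the Voigt average falls between the two speeds and xi exceeds 1, which is one common way of summarizing the kind of VSH and VSV separation plotted in panel (d).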

consideration for anisotropy is vital near the source and receiver positions where the wave vectors are approximately parallel to the symmetry axis, or near the turning points where an SH-polarized wave is only sensitive to the horizontal velocity. On the other hand, SV-polarized waves are only sensitive to vertical shear speed, but the sensitivity varies as a function of propagation direction (see Fig. 4b). The dependence of mode frequencies on the anisotropic wave speeds is more complex than the traveltime counterpart, and contributions of both VSH and VSV are nonzero for toroidal (Fig. 4c) as well as spheroidal (not shown) modes. Fig. 4d shows the averaged horizontal and vertical shear wave speeds under the East Pacific Rise (Gu et al., 2005), retrieved using both traveltime and waveform information. The substantial differences between the inverted anisotropic wave speeds, their departures from PREM (Dziewonski and Anderson, 1981; Tanimoto, 1984) and the large depth extent (e.g., Beghein and Trampert, 2004; Chen and Brudzinski, 2003; Gu et al., 2005; Jung and Karato, 2001; Karato, 1998; Leveque et al., 1998; Panning and Romanowicz, 2006) clearly accentuate the need to properly account for seismic anisotropy in seismic tomography.

Finally, tomography based on the radially isotropic assumption generally considers minimal lateral variations in wave speeds in planes perpendicular to the symmetry axis. This is a gross approximation considering the abundance of evidence, dating back to early observations of Pn wave traveltimes (Hess, 1964; Raitt et al., 1971) and shear-wave birefringence (Long and Silver, 2009). Smith and Dahlen (1973) introduced the following formulation to characterize the azimuthal dependence of wave speeds with respect to a fixed direction (also known as azimuthal anisotropy),

$$\frac{\delta c(\mathbf{r},\phi)}{c} = a_0(\mathbf{r}) + a_1(\mathbf{r})\cos 2\phi + a_2(\mathbf{r})\sin 2\phi + a_3(\mathbf{r})\cos 4\phi + a_4(\mathbf{r})\sin 4\phi, \tag{16}$$

where $\mathbf{r}$ is a 2D positional vector, ϕ is the relative azimuth of the ray path and $a_i$ (i=1,⋯,4) represent the unknown coefficients of the major (2ϕ) and secondary (4ϕ) terms (e.g., Tanimoto, 1984; Tanimoto and Anderson, 1985). This is a convenient linear equation where coefficients are readily invertible based on observed phase velocity perturbations δc/c. Similar to radial anisotropy, the strength and nature of azimuthal anisotropy vary broadly depending on the tectonic history and depth extent (Becker et al., 2003; Larson et al., 1998; Marone and Romanowicz, 2007; Montagner and Nataf, 1986; Montagner and Tanimoto, 1990; Park and Levin, 2002; Simons and van der Hilst, 2003; Tanimoto, 1984; Thybo and Perchuc, 1997; Trampert and van Heijst, 2002; Trampert and Woodhouse, 2003; Wookey et al., 2005; Yuan and Romanowicz, 2010). The broad range of outcomes from these studies was highly instrumental for the understanding of mineralogy and deformation below the surface. At the present time, however, determination and interpretation of seismic anisotropy remain challenging, especially in view of its tradeoff with isotropic heterogeneity in anisotropic inversions and the apparent lack of depth sensitivity from S/SKS splitting observations.

2.6. Inversions with other geophysical or geochemical constraints

The accuracy of tomographic imaging is a strong function of data density and data sensitivity to the targeted structure. Despite a fast growing number of seismic stations worldwide and increasing diversity in seismic data types, high-resolution global seismic tomography remains an under-determined inverse problem (Dziewonski, 2005; Romanowicz, 2003) and could certainly benefit from additional data constraints on the geometry, density and composition of mantle thermochemical variations (see Section 3.1 for detailed discussions). One such constraint is the existence and undulation of known and postulated seismic interfaces (e.g., crust, discontinuities, D″).

Due to strong temperature dependence, mineral phase changes (Fig. 5a) of olivine to wadsleyite and ringwoodite to perovskite and magnesiowüstite (Ita and Stixrude, 1992; Katsura and Ito, 1989, see Helffrich, 2000; Shearer, 2000 for reviews) can occur at depths tens of kilometers away from their respective global averages (Deuss, 2009; Flanagan et al., 1998; Gu et al., 1998; Shearer, 1993). These steep depth variations (Fig. 5b) will impact the traveltimes and trajectories of direct and reflected/converted seismic body waves. Hence, rather than relying on the conventional treatment of discontinuity depths (i.e., assumption of flat interfaces; Fig. 5c), a growing number of studies are beginning to resolve the tradeoff between seismic wave speed and discontinuity depth (Fig. 5d) via simultaneous inversions (Chang et al., 2010; Gu et al., 2003; Houser et al., 2008; Lawrence and Shearer, 2006).

A joint inversion approach represents a significant improvement over a priori corrections and assumptions of flat interfaces. Mathematically, the added constraints and new model parameters can be incorporated into the inversion procedure via

$$\begin{bmatrix} G_0 \\ G_1 \end{bmatrix} \delta v = \begin{bmatrix} d_0 \\ d_1 \end{bmatrix}, \tag{17}$$

where $d_0$ and $d_1$ denote conventional and newly added data, respectively, and the matrix on the left hand side of the above equation contains the combined data sensitivity. In this simple example, model vector δv is the seismic velocity perturbation, which is modified by the additional weighted equations of constraints during the regularization and minimization of data errors. Useful constraints include, but are not limited to, the depths of Moho (Chang et al., 2010; Xu and Song, 2010), core–mantle boundary (Boschi and Dziewonski, 2000; Morelli and Dziewonski, 1987) and transition zone discontinuities (Gu et al., 2003; Houser et al., 2008; Lawrence and Shearer, 2006), as well as those associated with magnetotelluric (Moorkamp et al., 2007, 2010; Roux et al., 2011), geodynamical (Forte, 2000; Forte and Mitrovica, 2001; Forte et al., 1994; Simmons et al., 2006, 2007; Trampert et al., 2004) and geodetic (Chambat and Valette, 2001; Khan et al., 2008) observations.

Data integration, which has been a driving force behind some of the recent tomographic efforts (e.g., Chang et al., 2010; Yuan and Romanowicz, 2010), is vital for the pursuit of a self-consistent mantle model (see Section 3). The accuracy and interpretation of these models could benefit further from improved imaging of anelasticity in crust and mantle rocks (e.g., Bhattacharyya et al., 1996; Dalton et al., 2008; Durek et al., 1993; Flanagan and Wiens, 1994; Gung and Romanowicz, 2004; Ho-Liu et al., 1988; Romanowicz, 1990, 1994, 1995; Roth and Wiens, 2000; Suda et al., 1991), an important observational constraint (see Romanowicz, 1998; Mitchell and Romanowicz, 1998 and references therein) that received unfair coverage in this review due to space limitation. The compatibility of the different data sets and the relative weights of vastly different datasets, however, remain critical issues to be resolved by more quantitative investigations.

2.7. Global optimization methods and probabilistic tomography

With the exception of PWI, methods documented in this section emphasize the retrieval of a single set of model parameters through the process of minimization. This deterministic approach is mathematically ill-conditioned due to noise and finite path coverage (Romanowicz, 2003; Snieder and Trampert, 1999), thus requiring regularization (e.g., Menke, 1984; Parker, 1994; Tarantola and Valette, 1982), and a single optimal solution for highly nonlinear problems such as waveform matching (Gauthier et al., 1986; Mosegaard, 1998; Mosegaard and Tarantola, 2002; Sambridge et al., 1991; Shaw, 1992) has also been questioned at the conceptual level (Mosegaard and Tarantola, 2002; Resovsky and Trampert, 2003; Scales and Snieder, 1997, 2000; Tarantola, 2005). A key alternative to this inversion paradigm is a

probabilistic solution based on the selection of a suitable ensemble of models that fit a certain tolerance criterion (e.g., Metropolis et al., 1953; Press et al., 2007; Tarantola and Valette, 1982). Generally known as the Monte Carlo method (Metropolis and Ulam, 1949), nonlinear stochastic approaches take advantage of random numbers and statistical description of model spaces without resorting to gradients of objective functions (Mosegaard and Tarantola, 2002). After the pioneering study of Keilis-Borok and Yanovskaja (1967), applications of Monte Carlo methods became increasingly diverse in earthquake seismology (e.g., Bodin et al., 2012; Kennett et al., 1988; Koper et al., 1999; Menke and Schaff, 2004; Sambridge and Drijkoningen, 1992; Sambridge and Kennett, 1986; Scales et al., 1992; Shaw, 1992; Trampert et al., 2004). Rather than uniformly sampling the solution space (Sambridge and Kennett, 1986), which is computationally demanding and prone to ‘rough’ model outcomes (Sambridge, 1990), subsequent nonlinear global seismic inversion methods feature more sophisticated optimization methods that originated from ferromagnetic, thermodynamic and cellular systems; prime examples include simulated annealing (Iyer and Hirahara, 1993; Weber, 2000), genetic algorithm (Bodin et al., 2012; Koper et al., 1999; Sambridge and Drijkoningen, 1992) and neural network algorithm (Meier et al., 2007a,2007b). Through proper ‘training’ or ‘tuning’ (Mosegaard, 1998; Mosegaard and Tarantola, 2002), these algorithms are ideally suited for inverse problems involving high degrees of nonlinearity and/or multi-modal probability distributions (Koper et al., 1999; Mosegaard and Tarantola, 1995; Scales et al., 1992). Furthermore, the statistical description of the probabilistic model ensemble offers a more objective metric of model resolution than the standard resolution tests of a linear inversion (Fichtner and Trampert, 2011b; Sambridge and Mosegaard, 2002).

[Fig. 5: four panels. (a) Phase diagram with pressure (GPa, 10 to 25) against temperature, showing the olivine, wadsleyite, ringwoodite and Pv+Mw fields and the 410 and 660 boundaries. (b) Surface, 410 and 660 boundaries in cold and hot regions. (c) Fast anomaly between flat 410 and 660 interfaces, with S410S and S660S ray paths. (d) Actual tradeoffs: fast anomaly with deflected 410 and 660 interfaces, with S410S and S660S ray paths.]

Fig. 5. (a) Phase diagrams of olivine phase transitions. The Clapeyron slopes of olivine to wadsleyite and ringwoodite to perovskite and magnesiowüstite transitions have opposite signs (Ita and Stixrude, 1992; Katsura and Ito, 1989, see Helffrich, 2000 for a review). Notation: Pv=perovskite, Mw=magnesiowüstite. (b) The expected phase transition behavior under low and high temperatures. A thick mantle transition zone is expected in a low temperature region such as an active subduction zone. (c) Results of conventional inversions for seismic velocity only. The velocity anomaly has no effective connection with, or effect on, the transition zone phase boundaries. (d) Illustration of how an outcome of a joint inversion of discontinuity topography and seismic velocity differs from (c). The tradeoff of the two quantities is properly considered in this case, i.e., the amplitude of the heterogeneity decreases due to changes in discontinuity depth (hence wave traveltime).

Like any other inversion algorithm, probabilistic inversion has its share of challenges. For instance, unbiased sampling of the solution space for a large number of model parameters remains a costly procedure even with fast improving computation speed and parallelization. A priori/posteriori probability distributions (Tarantola, 2005; Tarantola and Valette, 1982) could also be biased by the tomographer's subjective input (Scales and Snieder, 1997) in the presence of noise (Meier et al., 2007a). Hence, despite recent advances in the nonlinear analysis of velocity and density (e.g., Bodin et al., 2012; Fichtner and Igel, 2008; Fichtner and Trampert, 2011b; Maraschini and Foti, 2010; Mewes et al., 2010; Resovsky and Trampert, 2003; Trampert et al., 2004), probabilistic methods in regional and global seismic applications remain a promising but unfinished product.

3. Resolution of tomographic models

Results obtained from classical seismic tomography have been instrumental in the understanding of the large-scale structure and dynamics of the Earth's interior. Through sustained efforts and advances in technology and data acquisition, tomographic images have become

the de facto standard in assessing the dimension, motion and intrinsic properties of the lithosphere (e.g., Ekström and Dziewonski, 1998; Gu et al., 2005; Gung et al., 2003; Li and Romanowicz, 1996; Simons and van der Hilst, 2003; Su et al., 1992; Yuan and Romanowicz, 2010; Zhang and Tanimoto, 1992; Zhou and Clayton, 1990), plate boundary zones (e.g., Fukao et al., 1992; Nishimura and Forsyth, 1989; van der Hilst et al., 1991; Zhao, 2001a; Zhao et al., 1994), upper mantle transition zone (Becker et al., 2003; Gu et al., 2001; Mégnin and Romanowicz, 2000; Widiyantoro and van der Hilst, 1997, see Fukao et al., 2001, 2009 for reviews regarding regional subduction slab analysis), mid/lower mantle (Boschi and Ekström, 2002; Grand et al., 1997; Gu et al., 2003; Kárason and van der Hilst, 2000; Mégnin and Romanowicz, 2000; Ritsema and van Heijst, 2000; Zhao, 2001a), the lowermost mantle (Gu et al., 2001; Ishii, 2004; Kárason and van der Hilst, 2001; Liu and Dziewonski, 1998; Panning and Romanowicz, 2006) and the controversial outer core (Boschi and Dziewonski, 2000; Cao and Romanowicz, 2004; Vasco and Johnson, 1998). Impressive tomographic images of deep-penetrating slabs (Fig. 6; e.g., Bijwaard et al., 1998; Fukao et al., 1992, 2001; Kárason and van der Hilst, 2000; van der Hilst et al., 1991) and ascending mantle plumes beneath Africa and the Pacific Ocean (Bijwaard and Spakman, 1999; Gu et al., 2001; Ritsema, 1999; Su et al., 1992; Zhao, 2001a) have served as popular reminders of the complex dynamic processes within the Earth's mantle. Many of these first-order mantle heterogeneities are well resolved by a wide spectrum of tomographic models, as evidenced by the general agreement between model predictions in the upper mantle transition zone (see Fig. 6).

[Fig. 6: map panels titled "Shear Velocity Perturbations (540 km)" for models S12/WM13, SAW12D, MK12_S, S362D1, sb10l18, sb4l18, S20RTS, S20A, SAW24B16 and Grand2000. Color scale: δv/v (%) from -1.5 to +1.5.]

Fig. 6. Shear velocity perturbations in the upper mantle transition zone. Despite significant uncertainties under the oceans (see Section 3), all models appear to agree on the presence of high velocity zones in the western Pacific and South America subduction zones. The references for the selected models are S12/WM13 (Su et al., 1994), SAW12D (Li and Romanowicz, 1996), MK12_S (abbreviation of the S model from MK12WM13; Su and Dziewonski, 1997), S362D1 (Gu et al., 2001), sb10l18 and sb4l18 (Masters et al., 2000), S20RTS (Ritsema and van Heijst, 2000), S20A (Ekström and Dziewonski, 1998), SAW24B16 (Mégnin and Romanowicz, 2000), and Grand2000 (Grand, 2002).

3.1. Data constraints and general model resolution assessment

The resolution increase in seismic tomography has been steady, if not spectacular, in the past two decades. The road to higher resolution began with spherical harmonics (or basis functions with equivalent resolution) up to degrees 6–8 (Dziewonski, 1984; Woodhouse and Dziewonski, 1984) in the 1980s. With a resolving length of 3000–5000 km, the results of these early models established a first-order connection between major tectonic regimes (i.e., continents and oceans) and seismic velocities. Following the development of broadband seismometers and the convenient data access provided by the Incorporated Research Institutions for Seismology (IRIS), different shear-wave phases were measured and the maximum spherical harmonic degree increased to 12–14 in the early 1990s (Bolton and Masters, 1991; Li and Romanowicz, 1996; Su et al., 1992), with radial basis functions to characterize the depth variations in shear velocity. In addition to offering 3–4 times the resolving power, this new generation of models was able to systematically quantify the spectral signatures, hence the characteristic scales, of mantle structure and deformation. In the late 1990s, the landscape of tomographic imaging changed significantly with the arrival of high-resolution models based on automatic traveltime picks from International Seismic Center (ISC) bulletins. While these large databases lack the quality control of individual phase picks (e.g., Bolton and Masters, 1991; Liu and Dziewonski, 1998; Ritsema, 1999; Su et al., 1992), the quantity of the data enables inversions with vastly increased nominal resolution. The output of these models (Grand et al., 1997; Kárason and van der Hilst, 2000) features spectacular descending slab-like structures with lateral length scales as small as a few hundred kilometers. While the realistic resolutions and their correlation with traditional long-wavelength models have been questioned (Boschi and Dziewonski, 1999), there is no denying that high-resolution block models became a catalyst in the debate on the style of mantle convection (e.g., Albarede and van der Hilst, 2002; Kellogg et al., 1999). The advent of high-resolution tomography and renewed focus on resolution improvements (Becker and Boschi, 2002; Ekström and Dziewonski, 1998; Gu et al., 2001, 2003; Houser et al., 2008; Mégnin and Romanowicz, 2000) tipped the perception scale clearly in favor of a partially layered convection system.

While it is impractical to assess the resolutions of individual global models in a short review, there are certain helpful tips from the past that could assist future construction of global and regional models. First of all, more is not necessarily better. Due to the large number of pre-existing stations (e.g., Global Digital Seismic Network stations) and the large number of earthquakes occurring on pre-existing fault lines, there is considerable overlap in the path coverage over time. Adding more data from similar paths would increase the redundancy, but not the true resolution in the inversion process. On the other hand, more variety is better. Incorporation of different data types offers additional path coverage and constraints in spite of similar source-station locations for many of the phase picks (Fig. 7). The depth sensitivities of common data types for inversions of shear velocities are illustrated in Fig. 8. Surface wave dispersion measurements offer the best global constraint on the anisotropic mantle lithosphere (e.g., Ekström and Dziewonski, 1998; Gung et al., 2003; Larson et al., 1998; Laske and Masters, 1996; Montagner and Nataf, 1986; Nettles and Dziewonski, 2008; Nishimura and Forsyth, 1989; Yuan and Romanowicz, 2010). Since surface waves are shear waves in nature, global models of shear velocity generally achieve greater spatial resolutions than compressional velocity models at upper mantle depths (Romanowicz, 2003). The longest period surface waves also contribute to the resolution of mantle and


transition zone (see Fig. 8), particularly with full waveform information from deep-penetrating, reverberating mantle waves. On the other hand, the lower-mid mantle region is best resolved by body wave traveltimes (see Fig. 7), both for P (Antolik et al., 2003; Grand et al., 1997; van der Hilst et al., 1997; Widiyantoro and van der Hilst, 1997) and S (Gu et al., 2001, 2003; Houser et al., 2008; Kustowski et al., 2008; Masters et al., 2000; Mégnin and Romanowicz, 2000). Inclusion of direct and reflected P waves can provide coverage in otherwise undersampled regions such as the oceans. Comparable data coverage and sensitivity in this depth range render a combined P and S (or bulk and sound) analysis particularly informative (e.g., Antolik et al., 2003; Houser et al., 2008; Kennett et al., 1998; Manners, 2008; Su and Dziewonski, 1997). Relative to an average lower mantle depth, the data coverage and sensitivity increase exponentially near the core-mantle boundary. Traveltimes of reflected, diffracted and differential core phases (see Fig. 7; Boschi and Dziewonski, 2000; Gu et al., 2001, 2003; Kárason and van der Hilst, 2001; Liu and Dziewonski, 1998; Mégnin and Romanowicz, 2000; Obayashi and Fukao, 1997; Tkalčić et al., 2002) have been widely used in tomographic imaging to illuminate the base of the mantle.

[Fig. 7 graphic, titled "Global Data Constraints": ray paths from events to receivers for surface waves and the body-wave phases S/P, SS/PP, SdS/PdP, ScS/PcP, Sdiff/Pdiff, SKS/PKP and SKKS/PKKP, with the 660-km discontinuity, CMB and ICB marked.]

Fig. 7. Sample ray paths of commonly used seismic phases in global seismic tomography. The significant differences in penetrating depth and coverage among the phases highlight the importance of an integrated study. Notation: d=discontinuity depth.

It should be noted that differences between tomographic images could be both informative and challenging for isolating facts from artifacts (Dziewonski, 2005). Existing global P and S tomographic models differ in parameterization, data coverage and inversion strategies. Some of these differences reflect the tomographer's personal preferences, e.g., choices of model parameterization and matrix factorization algorithms (e.g., Boschi and Dziewonski, 2000; Spakman and Nolet, 1988; van der Hilst, 1999); others result from informed decisions based on limited data coverage and/or finite computational resources. The inaccessibility of vast sub-crustal depths through direct means necessitates an objective assessment of true (as opposed to nominal) resolution in model appraisal. As a simple exercise we compute the bootstrapped average (Efron and Tibshirani, 1991) and standard deviation of 18 published global tomographic models from 1992 to 2004 (Fig. 9). Perturbations in P wave velocities have been amplified by a factor of 1.73 (Masters et al., 2000; Romanowicz, 2003) to be compatible with the S-wave models. Robust heterogeneous structures include strong continent–ocean contrast in the upper mantle lithosphere, high velocities along the western Pacific subduction zones in the upper mantle transition zone, a weak north–south oriented high velocity zone across the Americas, and anomalously low velocities associated with the Pacific and African 'plume groups' (Dziewonski et al., 1993) (see Fig. 9). The spectra of the entire mantle are clearly dominated by low-degree spherical harmonics (Gu et al., 2001; Masters, 1989; Su et al., 1992), or longer wavelengths (Dziewonski, 2005; Romanowicz, 2003), despite an apparent loss of overall amplitude due to averaging.
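The ensemble exercise described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the 18 "models" are synthetic random maps on a hypothetical common grid of 200 cells, not the published models; the function name `bootstrap_mean_std` is invented for this example; and the "standard deviation" is taken here to be the spread of the resampled ensemble means, which may differ from the definition used for Fig. 9. The only number taken from the text is the factor of 1.73 applied to the P models.

```python
import numpy as np

P_TO_S_SCALE = 1.73  # factor quoted in the text for making P models comparable to S models


def bootstrap_mean_std(models, n_boot=1000, seed=0):
    """Bootstrap the ensemble mean of a set of tomographic models.

    models : array of shape (n_models, n_cells), dv/v in percent on a common grid.
    Returns the average of the resampled mean maps and, per cell, the standard
    deviation of those resampled means.
    """
    rng = np.random.default_rng(seed)
    n_models = models.shape[0]
    # Each row of idx is one bootstrap sample: n_models draws with replacement.
    idx = rng.integers(0, n_models, size=(n_boot, n_models))
    boot_means = models[idx].mean(axis=1)  # shape (n_boot, n_cells)
    return boot_means.mean(axis=0), boot_means.std(axis=0, ddof=1)


# Synthetic stand-ins for 10 S models and 8 P models (assumption, not real data).
rng = np.random.default_rng(42)
n_cells = 200
shared = rng.normal(0.0, 1.0, n_cells)  # structure common to all models
s_models = shared + rng.normal(0.0, 0.3, (10, n_cells))
p_models = (shared + rng.normal(0.0, 0.3, (8, n_cells))) / P_TO_S_SCALE

# Amplify the P perturbations before mixing them with the S models.
ensemble = np.vstack([s_models, P_TO_S_SCALE * p_models])
mean_map, std_map = bootstrap_mean_std(ensemble)
print(mean_map.shape, float(std_map.mean()))
```

Cells where `std_map` is large relative to `mean_map` would be the ones flagged as poorly constrained by the ensemble. Averaging also damps any structure that is not shared by most models, which is the amplitude loss mentioned in the text.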

[Fig. 8 graphic: columns are data types (mantle waveforms at T>85 s, T>135 s and T>200 s and body waveforms WF1 and WF2 at T>45 s, each on V and T components; Rayleigh and Love waves; S, SS, ScS, SS-S, ScS-S, S-SKS and SKKS-SKS traveltimes); rows are approximate depth ranges (km) from the Moho to the CMB, with poor coverage marked near the transition zone (TZ); color scale: normalized amplitude, 0–100%.]

Fig. 8. A graphic representation of the lower half of the inner-product matrix used by Gu et al. (2001) in the inversion for mantle isotropic velocities. The vertical axis is approximated by the peak locations and half-widths of B-spline functions used in the inversion. WF1 represents a waveform dataset from recordings between 1991 and 1998; waveform records prior to this time period are represented by WF2. Relatively low resolution is observed in the upper mantle transition region (400–1000 km). This figure is edited from Plate 1 of Gu et al. (2001).


[Fig. 9 graphic: maps of apparent S velocity (left column, color scale δv/v from −1.5% to +1.5%) and uncertainty (right column, δv/v from 0.0% to 1.0%) at depths of 100, 660, 1300, 2200 and 2800 km.]

Fig. 9. Mean shear velocity perturbations (left column) and standard deviations (right column) from 18 published global tomographic (10 shear and 8 compressional, up to 2004) models. The lithosphere shows the best agreement among the global models, whereas the upper mantle transition zone, particularly in the vast oceanic regions, remains problematic. Models used in this analysis include all S velocity models presented in Fig. 6 as well as eight additional P and/or S models from Boschi and Dziewonski (1999); Kárason and van der Hilst (2001); Zhao (2001b); Antolik et al. (2003); Montelli et al. (2004).

Radially, the lithosphere appears to be best resolved, showing a 3+% average heterogeneity with merely ~1% maximum uncertainty near the East Pacific Rise and cratonic North America; dense surface wave coverage (e.g., Ekström and Dziewonski, 1998; Laske and Masters, 1996; Ritsema, 1999) is largely responsible for the enhanced resolution in this depth range (see also Figs. 7 and 8). Model differences at the remaining depths (see Fig. 9) accentuate the dichotomy in data sensitivity between continents and major oceans: structures beneath the oceanic regions, most notably the Pacific Ocean, show the largest disagreements among the models. Mantle compositional variations (e.g., near the base of the mantle), which would enhance the model difference between P and S velocities (Masters et al., 2000; Saltzer et al., 2001), are only partially responsible. Large model uncertainties clearly highlight the challenges in understanding the upper mantle transition zone, a well-documented under-sampled regime critical to the understanding of mantle circulation and mineralogical phase changes (see Fig. 8; Dziewonski, 2005; Fukao et al., 2001, 2009; Gu et al., 2001; Romanowicz, 2003), and the bulk of the lower mantle (Boschi and Dziewonski, 2000). A greater overall resolution in seismic tomography likely requires major improvements in ocean-bottom seismic recording and body-wave phase analysis.

It is worth noting that the model comparisons detailed in this section are only a simple exercise in the spirit of a statistical description of the solution space. Practical and unbiased assessment of data resolution remains a key area of geophysical research (Sambridge and Mosegaard, 2002). The following list highlights key considerations and challenges during the resolution assessment of tomographic inversions:

1. High resolution does not automatically translate to smaller-scale seismic structures (e.g., Boschi and Dziewonski, 1999). The characteristic length scales of anomalies are, to first order, reflected by low order spherical harmonics reported over two decades ago (e.g., Romanowicz, 2003; Woodhouse and Dziewonski, 1984).
2. Nominal resolution is not equivalent to the true resolution of a given dataset. The term nominal resolution, which appeared in early discussions of source parameter inversions (Gilbert and Dziewonski, 1975), is a subjective choice of the tomographer based on his/her own assessment of the dataset. The true resolution is an intrinsic property of the dataset independent of the modeling technique.
3. Successful recovery of an input checkerboard in the deterministic inversion approaches is only a necessary condition, but not a sufficient condition, in the pursuit of the true structure. Furthermore, resolution assessment centered on a single optimal model likely underestimates the true uncertainties of the inversion results (e.g., Sambridge and Mosegaard, 2002).

In addition to continued deployment of more seismic stations (see more discussions in Section 6), seismic tomography could greatly benefit from multiscale inversions with enhanced clarity on well-covered regions (e.g., Boschi et al., 2004; Chiao and Kuo, 2001; Hung et al., 2011; Zhao, 2009) as well as data mining beyond main body and surface wave arrivals on existing waveforms (see Section 6). The ultimate goal is to be able to explain every wiggle and recover as much information about the structures of the Earth's interior as data permit.

3.2. Ray vs finite-frequency kernel debate

Potential improvement in the resolution of tomographic models can also be achieved through more accurate forward modeling theory. Fitting all wiggles on a seismogram is a very challenging task, and as is true of all inverse problems, the accuracy and efficiency of solutions to the forward problem (i.e., seismic wave equations) are critical. Due to the complications involved in solving the seismic wave equation, the majority of tomographers still rely on approximation theories (e.g., high-frequency ray approximation for body-wave modeling or asymptotic methods for surface-wave modeling; see more discussions in Section 2) or assume simple 1D Earth structure (e.g., in normal-mode theory) to compute Green's functions. The accuracy of sensitivity kernels for measurements, which form row vectors of the design matrix G in Eq. (5), is strongly influenced by the accuracy of these forward solutions.

The classical solution began with geometrical ray theory for high-frequency waves, which is effectively a zeroth order approximation to seismic wave equations emphasizing the onset time of seismic arrivals (see Section 2). The resulting sensitivity kernels of high-frequency arrival times are reduced to 2D geometrical ray paths connecting sources and receivers (see Section 2). The applicability of ray theory relies on the assumption of weak and relatively large-scale lateral heterogeneities, which is not valid for structures with sizes that are comparable to or less than the dominant wavelengths of the interrogated seismic waves. Instead, the sizes of Fresnel zones, i.e., regions in the neighborhood of geometrical ray paths that give rise to constructively interfering arrivals, may be significant for global finite-frequency arrivals measured from broadband recordings. For instance, the Fresnel zone width of a 5-s P wave with a wavelength of ~40 km is approximately 560 km at 60° epicentral distance, which is similar to the resolution limit of global tomographic inversions (Nolet, 2008, Chapter 2). The complicated

sensitivity kernels for reflected waves present further complications (Deuss, 2009; Liu and Tromp, 2008; Shearer, 1993). Therefore, finite-frequency effects need to be considered during the computation of body (at 5–40 s) and surface (40 s or longer, Ritzwoller et al., 2011) wave sensitivities in global tomographic inversions.

Decades of theoretical and computational development have aimed to go beyond ray theory by combining solutions to seismic wave equations with the first-order Born approximation. Improved sensitivities of broadband seismic phases have been calculated by mode coupling (e.g., Li and Romanowicz, 1995; Li and Tanimoto, 1993; Marquering et al., 1998), body-wave ray theory (e.g., Dahlen et al., 2000; Hung et al., 2000), surface-wave ray theory (e.g., Zhou, 2009; Zhou et al., 2004, 2005), normal-mode summation (e.g., Capdeville, 2005; Zhao and Chevrot, 2011a; Zhao and Jordan, 1998; Zhao et al., 2006) and full 2D/3D numerical simulations (Liu and Tromp, 2006, 2008; Nissen-Meyer et al., 2007a). These sensitivity kernels generally follow geometrical ray paths for the selected phases, and occupy volumetric regions in the vicinity of the ray paths instead of being restricted to the center of the volume. In particular, Marquering et al. (1999) and Dahlen et al. (2000) showed that, counter-intuitively, sensitivity kernels for finite-frequency cross-correlation traveltime measurements are precisely zero along the geometrical paths and therefore acquire the shape of 'hollow bananas'. Based on the paraxial ray approximation that simplifies the calculations of traveltimes and geometrical spreading factors for body waves, these sensitivity kernels can be computed efficiently and have been applied to long-period body waves in both global (e.g., Montelli et al., 2004, 2006b; Tian et al., 2009) and regional (e.g., Hung et al., 2004, 2010, 2011; Obrebski et al., 2011, 2012; Sigloch et al., 2008) tomographic problems.

The counter-intuitive 'banana-doughnut' shape of these sensitivity kernels (and the novelty of its name), as well as some striking global tomography images based on these kernels depicting the morphology of possible mantle plumes (e.g., Montelli et al., 2004, 2006b), drew considerable attention from a wide geophysical audience. This new set of mantle images also spurred significant controversies over its improvement in resolution (Fig. 10; Dahlen and Nolet, 2005; de Hoop and van der Hilst, 2004; Montelli et al., 2006a; van der Hilst and de Hoop, 2005, 2006). de Hoop and van der Hilst (2004) contended that the zero sensitivity of these banana-doughnut kernels on the ray paths is mainly the result of a combination of factors (e.g., the use of ray-theory, choice of measurement, assumption of background media, and model dimensionality), and does not hold for realistic media, different choices of measurements, and model parameterizations. Finite-frequency sensitivity kernels can be regarded as a form of regularization for infinitely thin rays, and thus exhibit similar effects in tomographic inversions as geometrical rays once model parameterization and damping are applied. Still, the main issues hindering the improvement of global tomographic resolution are the uneven distributions of events and stations, as well as the lack of ray coverage in parts of the mantle (see Section 3.1). As global tomographic problems remain under-determined, regularization is inevitable and hampers a fair comparison between finite-frequency- and ray-based tomographic images (Montelli et al., 2006b; Panning and Romanowicz, 2006). This provides a plausible explanation for the lack of discernible improvement in the image resolution of global tomographic models when approximate 2D and 3D sensitivity kernels were used in place of rays (e.g., Kárason and van der Hilst, 2000; Li and Romanowicz, 1995).

[Fig. 10 graphic: cartoon titled "When Ray Theory meets Finite Frequency Approximation", with labels "Ray", "Fat Ray" and "banana donut".]

Fig. 10. A cartoon that underscores the heated debate centered on the resolution improvement of global tomographic models based on finite-frequency sensitivity kernels vs. simple ray approximation.

However, the divergent views presented above do not diminish the importance of choosing proper sensitivity kernels to map lateral heterogeneities, particularly with an increasing number of measurements from global seismic networks and regional seismic arrays (Romanowicz, 2008). In principle, the use of appropriate sensitivity kernels accounts for the finite-frequency nature of observations, allows different datasets to be consistently combined in tomographic inversions, and makes it possible to image heterogeneities of sizes similar to the first Fresnel zone (e.g., Chevrot and Zhao, 2007; Hung et al., 2004). After all, consistent sensitivity kernels based on improved theory are paramount for the pursuit of more accurate solutions to tomographic problems.

4. Adjoint tomography: theory and implementations

With the advancement of numerical modeling techniques and expansion of computational resources, numerical simulations of seismic wave propagation for complex and realistic 3D heterogeneous media at both global and regional scales have now become routine practices (e.g., Castro and Käser, 2010; Komatitsch and Tromp, 2002a; Moczo et al., 2007; Tromp et al., 2010a). Factors such as gravity forcing, rotation, attenuation, anisotropy, topography of the free surface and internal discontinuities that are important for global and/or regional seismology can now be accounted for accurately. This naturally renders the possibility of calculating finite-frequency sensitivity kernels based on improved forward solvers (Liu and Tromp, 2006, 2008; Zhao et al., 2005). At the present time, however, numerical calculations of kernels for all individual source–receiver pairs (i.e., row vectors of matrix G) in realistic regional and global tomographic problems remain computationally prohibitive.

Empowered by computational advancement, and potentially inspired by the success of adjoint methods in meteorological (e.g., Courtier and Talagrand, 1987; Talagrand and Courtier, 1987) and geodynamical (e.g., Bunge et al., 2003) applications, Tromp et al. (2005) made important connections among time-reversal imaging, finite-frequency sensitivity kernels in seismic tomography and full waveform inversion popular in exploration seismology. Similar to 'time-reversal mirrors' used in medical imaging (Fink, 1992, 1997; Fink et al., 1989) where residuals of recorded data at receivers are physically time-reversed and transmitted to focus back at either original source locations or possible locations of model inadequacy, data residuals at seismic stations can now be time-reversed to obtain updates on earthquake source parameters based on 3D background models (Hjörleifsdóttir, 2007; Tape et al., 2007). The first step of this method is, in essence, a more sophisticated version of the back-projection technique used for imaging earthquake rupture processes (e.g., Ishii et al., 2005; Larmat et al., 2010). On the other hand, similar to results derived in full waveform inversions (e.g., Tarantola, 1984a), Fréchet derivatives of structural model parameters (e.g., density and wave speeds) can be computed by the interaction of the regular forward wavefield in the reference model and the adjoint wavefield generated by setting time-reversed data residual signals at receivers as simultaneous fictitious sources, i.e., two simulations of seismic wave equations.
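The two-simulation recipe can be illustrated with a deliberately small frequency-domain analogue. The sketch below is not the time-domain 3D workflow of the studies cited above: it assumes a hypothetical 1-D, real-valued Helmholtz-type operator A(m) = K − ω² diag(m) with m the squared slowness, a single source, four receivers and a least-squares waveform misfit. In this setting the adjoint "simulation" is a second linear solve with the receiver residuals as fictitious sources (the frequency-domain counterpart of injecting time-reversed residuals), and the gradient is the pointwise product of the forward and adjoint fields. All names and parameter values are illustrative assumptions.

```python
import numpy as np


def helmholtz_matrix(m, omega, h):
    """Toy 1-D operator A(m) = K - omega^2 diag(m) with fixed (Dirichlet) ends.

    m holds the squared slowness (1/c^2) at each interior grid node.
    """
    n = m.size
    K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return K - omega**2 * np.diag(m)


def misfit_and_gradient(m, omega, h, f, rec, d_obs):
    """Least-squares misfit and its gradient from one forward and one adjoint solve."""
    A = helmholtz_matrix(m, omega, h)
    u = np.linalg.solve(A, f)            # simulation 1: forward field
    residual = u[rec] - d_obs            # synthetics minus data at the receivers
    adj_src = np.zeros_like(u)
    adj_src[rec] = residual              # residuals injected as fictitious sources
    lam = np.linalg.solve(A.T, adj_src)  # simulation 2: adjoint field
    chi = 0.5 * float(residual @ residual)
    # d(chi)/d(m_i) = -lam^T (dA/dm_i) u, with dA/dm_i = -omega^2 e_i e_i^T
    grad = omega**2 * lam * u
    return chi, grad


# Hypothetical setup: 50 nodes on the unit interval, one source, four receivers.
n = 50
h = 1.0 / (n + 1)
omega = 20.0
f = np.zeros(n)
f[5] = 1.0 / h
rec = np.array([15, 25, 35, 45])

m_true = np.ones(n)
m_true[20:30] = 1.1  # "true" model: a 10% anomaly in squared slowness
m0 = np.ones(n)      # homogeneous reference model

d_obs = np.linalg.solve(helmholtz_matrix(m_true, omega, h), f)[rec]
chi, grad = misfit_and_gradient(m0, omega, h, f, rec, d_obs)


def finite_difference_gradient(i, eps=1e-5):
    """Brute-force check: two extra forward solves per model parameter."""
    dm = np.zeros(n)
    dm[i] = eps
    chi_plus, _ = misfit_and_gradient(m0 + dm, omega, h, f, rec, d_obs)
    chi_minus, _ = misfit_and_gradient(m0 - dm, omega, h, f, rec, d_obs)
    return (chi_plus - chi_minus) / (2.0 * eps)


for i in (10, 22, 40):
    print(i, grad[i], finite_difference_gradient(i))
```

The finite-difference check needs two solves per model parameter, whereas the adjoint route returns the full gradient from two solves in total. That cost difference is why computing kernels for every source–receiver pair separately is described as prohibitive while the gradient of the total misfit remains affordable.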
Applications of adjoint methods to seismological problems have been similarly derived by Fichtner et al. (2006a,b) and Fichtner (2010).

The seismic inverse problem may be treated as an optimization problem, where the optimal solution is sought to minimize a pre-defined misfit criterion that quantifies the difference between data and synthetics. When numerical solvers are used for the forward problem, the optimization/inverse problem is usually solved based solely on local gradients, i.e., Fréchet derivatives of the misfit function (e.g., Epanomeritakis et al., 2008; Fichtner et al., 2009b; Tape et al., 2010), without forming and accessing the Hessian matrix (i.e., the second-order derivatives of the misfit function, see also Section 4.4.3) due to computational and storage restrictions. Depending on the technical details of computational practices, this type of optimization method has appeared under many different names and caused some confusion in the literature. For the consistency of this paper, we adopt the name of adjoint tomography, and loosely define it as tomographic inversion techniques based on numerical simulations of seismic waves and local gradients computed by adjoint methods. The following sections provide a detailed account of the origin, derivation, merits, seismic applications and future outlook of adjoint tomography.

4.1. Full waveform inversions in exploration seismology

There is a natural overlap between adjoint tomography, a subject that recently became popular in earthquake seismology, and the full waveform inversion (FWI) techniques that have existed for nearly three decades in exploration seismology. Therefore it is impossible to review the recent development on adjoint tomography without referring to the voluminous literature in exploration seismology pertaining to FWI (see Virieux and Operto, 2009 for a recent review). Based on the inverse theory developed by Tarantola and Valette (1982), Tarantola (1984a) and Lailly (1984) formulated the acoustic imaging problem as an optimization problem that seeks the best model parameters which minimize the generalized least-square sum of weighted data residuals. The reflected wavefield is treated as a scattered wavefield generated by secondary sources associated with perturbations to the background velocity model. Based on the single-scatter approximation, the gradient of the nonlinear misfit function is calculated by correlation of forward propagation of the actual source in the current model and a propagation backward in time of the weighted data residuals at the receivers as virtual sources. The backward wavefield was also referred to as the 'missing diffracted wavefield' or the 'residual wavefield'. Tarantola (1984b) also showed that the first iteration of this optimization problem is equivalent to pre-stack migration techniques based on imaging principles (Claerbout, 1971). However, instead of obtaining reflectivity images with no direct physical interpretation from pre-stack migration, nonlinear waveform inversions make improvements of physical model parameters such as velocities and density through successive iterations. Diffractions, refractions and multiple reflections are also automatically accounted for through iterations when forward problems are calculated based on numerical simulations. To improve the well-posedness of the nonlinear inversions, this method seeks solutions that are close to a given a priori model and simultaneously minimizes the data residual. Formulations were later derived for elastic wave equations (Tarantola, 1986) and anelastic wave equations (Tarantola, 1988). Parameter selection is also critical as long-period waveform inversions are mostly sensitive to seismic velocities while short-period waveforms are strongly affected by impedance contrasts in exploration-scale problems (Tarantola, 1986), observations that reflect the scattering properties of elastic waves (Wu and Aki, 1985).

Gauthier et al. (1986) and Mora (1987, 1988) demonstrated the feasibility of FWI methods based on finite-difference schemes. Monte Carlo inversion techniques (see Section 2.7) were dismissed at the time due to high computational cost, while the more efficient local-gradient techniques were favored, since only two (forward and backward) simulations are needed to compute the gradient direction and one (or a few) simulation(s) are needed to estimate the step length of model update along the direction. Serious concerns were also raised on the nonlinearity of the waveform misfit function, where the final model may be trapped in a local minimum (Gauthier et al., 1986). Initial solutions need to be close enough to the true model to keep the misfit function in the quasi-linear regime with respect to model perturbations (Mora, 1987). Numerical tests showed that surface reflected data (equivalent to scattered/converted waves in earthquake seismology, see more discussions in Section 6.2), due to their nonlinearity with respect to large wavelength structures, recover mostly the high-frequency component of the original model, while transmission data (equivalent to direct main arrivals in earthquake seismology) improve the resolvability of the low-frequency content of the model (Gauthier et al., 1986; Mora, 1988). It was noted that preconditioning of the gradient direction based on simple geometrical arguments may help reduce the number of iterations (Gauthier et al., 1986). Mora (1987) also derived the FWI problem based on matrix formulation, and suggested that the best preconditioner is the inverse Hessian matrix such that the optimization problem can be solved by one quasi-Newton step.

Moderate success has been achieved when applying FWI techniques to realistic industry data (e.g., Crase et al., 1990; Igel et al., 1996), but serious challenges regarding the nonlinearity of the waveform misfit function remain unsolved. Luo and Schuster (1991) remedied this problem by introducing misfit functions that quantify the traveltime difference between observed and synthetic seismograms and noted that no phase identification for ray-theoretical arrivals is required when numerical modeling is used. Traveltime misfit functions are much more linear compared to full-waveform misfit, though with reduced model resolution. Another successful approach is the frequency-domain iterative waveform inversion based on frequency-domain numerical solvers. Models are first inverted at a low frequency where the initial guess is sufficiently close to the true model to guarantee the linearity of the waveform misfit function, and then successively inverted at higher frequencies using inversion results from the previous frequency as the initial model for the current iteration (e.g., Brenders and Pratt, 2007; Geller and Hara, 1993; Plessix, 2009; Pratt, 1999; Pratt and Worthington, 1990a,1990b; Sirgue and Pratt, 2004). Whereas traveltime tomography based on full-wave equations resolves velocity anomalies of the size of the first Fresnel zone, reflected waves and scattered waves can resolve structures with sizes close to the dominant wavelength used in the inversion (Wu and Toksoz, 1987). It was suggested that a combination of traveltime and waveform inversion will be ideal to recover both the long and short wavelength structures of the subsurface while keeping the inverse problem in the quasi-linear regime (e.g., Luo and Schuster, 1991; Pratt and Shipp, 1999). Multi-scale inversions in the time-domain also proved to be promising based on similar principles (Virieux and Operto, 2009).

4.2. Applications of FWI to seismic imaging in the crust and mantle

Although the theory and techniques of FWI can be directly applied to earthquake seismology, several essential distinctions exist. First, there is the obvious lack of control in source locations and uneven station distributions. Regional seismic imaging for the crust and mantle mainly utilizes teleseismic data, since abundant recordings of local/regional earthquakes are available only for a few tectonically active regions. Furthermore, earthquake seismologists still work most comfortably with traveltime data to map internal structures of the earth, and are more cautious towards utilizing amplitude anomalies (or entire waveforms) that are prone to attenuation or surface-wave focusing effects. Measurements are usually taken from individual body-wave phases and may consist of secondary arrival-time observables from surface-wave waveforms (e.g., Cara and Leveque, 1987). Since windowing is difficult to implement in frequency-domain FWI techniques, traveltime inversions may be performed only in the

Please cite this article as: Liu, Q., Gu, Y.J., Seismic imaging: From classical to adjoint tomography, Tectonophysics (2012), http://dx.doi.org/10.1016/j.tecto.2012.07.006

time-domain. Lastly, due to the computation cost of a single numerical simulation at global or regional scales, which ranges from less than an hour to tens of hours even on modern parallel computers, inversion schemes need to be designed to reduce the number of iterations and speed up convergence.

From the inversion setup point of view, individual sensitivity kernels of source–receiver pairs form row vectors of the design matrix $\mathbf{G}$ for tomographic inversions and are often computed to demonstrate the effects of underlying approximations and finite frequency in global seismology. However, as full waveforms are inverted in exploration applications, computing sensitivities for individual data points of the scattered waves (also defined as waveform perturbation density, Zhao and Chevrot, 2011a) is impractical and unnecessary due to the high-frequency nature of controlled sources. Instead, the gradient of the misfit function is computed in two simulations, equivalent to the weighted sum of sensitivity kernels for scattered waves between all source–receiver pairs.

4.3. Forward solvers for 3D velocity models

The efficiency and accuracy of the forward-modeling technique is crucial to the success of inverse methods. Therefore we will summarize some of the common numerical techniques used to solve seismic wave equations for 3D heterogeneous models as a prelude to our discussion on adjoint tomography. This section by no means includes all references in the related fields, though it is our hope that further information could be obtained from the references provided by this review.

With the development of computer technology and numerical techniques, significant advances have been made in solving seismic wave equations numerically. Finite difference (FD) methods have gained popularity since the early 80s due to their relatively simple formulation (e.g., Igel, 1996; Igel et al., 2002; Kelly et al., 1976; Moczo et al., 2007; Olsen et al., 1995). Pseudo-spectral methods (e.g., Carcione, 1994; Carcione et al., 2005; Tessmer et al., 1992) have also been used, although they were mostly restricted to smooth models. Both methods exhibit considerable difficulties with implementing free-surface boundary conditions for the accurate propagation of surface waves. On the other hand, the finite-element method (FEM) implicitly incorporates natural boundary conditions (e.g., Akcelik et al., 2002; Bao et al., 1998) despite being more programmatically involved. In particular, a specific type of finite-element method, the spectral-element method (SEM), was introduced to seismology from computational fluid dynamics (Patera, 1984) for both global (Capdeville et al., 2003; Chaljub and Valette, 2004; Fichtner et al., 2009a; Komatitsch and Tromp, 2002a,2002b) and regional (Komatitsch et al., 2004; Lee et al., 2008; Peter et al., 2011a; Pilz et al., 2011; Stupazzini, 2006) seismic wave propagation in 3D heterogeneous media (see Chaljub et al., 2007 for a review). It combines the geometrical flexibility of FEMs with exponential convergence properties of spectral methods. Surface topography and internal discontinuities are naturally accounted for by a finite-element mesh design. Mesh doubling or tripling takes advantage of the velocity increase with depth to minimize computational cost. Unlike classical FEM wherein large linear systems need to be solved at each time step, the mass matrix of the SEM is diagonal due to the choice of Gauss–Lobatto–Legendre points as both interpolation and integration points within hexahedral elements (Komatitsch and Tromp, 1999). This makes SEM extremely efficient and scalable on parallel computers, especially when ported to take advantage of recent Graphics Processing Unit (GPU) hardware (Komatitsch et al., 2009, 2010). SEM has been applied to automatically simulate seismic wave propagation for Mw ≥ 3.5 events in southern California (Komatitsch et al., 2004, http://www.shakemovie.caltech.edu) and Mw ≥ 5.5 events at the global scale (Tromp et al., 2010a, http://www.globalshakemovie.princeton.edu). However, despite the computational efficiency, it remains challenging to design a proper conforming hexahedral unstructured SEM mesh for a region with strong velocity variations that 1) honors predefined internal surfaces and material discontinuities, and 2) maintains a consistent number of points-per-wavelength throughout the entire model for accuracy considerations. External hexahedral meshing software is often used (Casarotti et al., 2008; Peter et al., 2011a; Stupazzini et al., 2009), and careful considerations on load balancing between individual processors (Peter et al., 2011a) are required to efficiently simulate wave propagation on such a complicated mesh in parallel.

One variant of SEM is proposed by Capdeville et al. (2003), wherein waves are simulated based on the coupling between a 1-D normal-mode solution for a spherically symmetric interior (such as the core) and SEM solution for a 3D heterogeneous outer shell (such as the crust and mantle) using a Dirichlet-to-Neumann boundary-condition operator. This method speeds up simulations without a significant loss of accuracy, and has been adopted by various global and continental-scale tomographic studies (Lekić and Romanowicz, 2011; Yuan et al., 2011). Discontinuous Galerkin (DG) methods also gained popularity in seismology in recent years (e.g., Castro and Käser, 2010; Dumbser and Käser, 2006; Hermann et al., 2010; Käser and Dumbser, 2006). Being more general in formulation than SEM, the DG approach can handle material discontinuities across element boundaries and be operated on both tetrahedral and hexahedral meshes. In particular, it is extremely useful for dynamic rupture simulations where high-frequency waves and super-shear near the fault can only be captured by a denser mesh and improved time stepping schemes (e.g., de la Puente et al., 2009). Despite being more computationally costly, the use of tetrahedral elements makes it relatively easier to honor complicated fault systems and internal discontinuous surfaces in the mesh design for geologically complex regions, therefore lending itself well to local and regional-scale seismic wave simulations.

In the context of seismic imaging and inverse problems, most material interfaces beneath the surface are not precisely known. To first order, most geological regions are composed of large blocks with relatively smooth velocity variations within. In such cases, SEM may be a preferred approach that strikes a good balance between computational cost and accuracy of mesh design. Certainly at the global scale, where velocity variations are believed to be long-wavelength (100–1000 km) and smooth (<10% in Vs) in most regions, SEM is still the most viable solution to simulate wave propagation in 3D.

4.4. Theory and implementations

The basic formulation of adjoint tomography has been derived by numerous researchers, either based on Born approximation and reciprocity of the Green's function (Tarantola, 1984a; Tromp et al., 2005), Lagrange multiplier approach (Akcelik et al., 2003; Liu and Tromp, 2006), where the adjoint field is just the time reversal of the Lagrange multiplier, or more general functional analysis where no assumption on Green's functions and reciprocity is required (Fichtner et al., 2006a,2006b). The Lagrange multiplier approach also enabled Liu and Tromp (2008) to derive expressions of Fréchet derivatives for more complicated global seismic wave equations including factors such as ellipticity, anelasticity and gravity. Here we only symbolically outline expressions for local gradients based on frequency-domain wave equations, emphasizing the differences (e.g., the choice of misfit function and the computation/storage challenges with utilizing numerical solvers) between adjoint and classical (either ray or finite-frequency) tomographic techniques.

4.4.1. Definition of misfit function

Tomographic inversion aims to utilize as much information of recorded seismograms as possible to map internal structures of the Earth. It can also be cast as an optimization problem, where an optimal model $m_{\mathrm{opt}}(\mathbf{x})$ is sought to minimize a misfit functional that quantifies the difference between observed seismograms and synthetics

calculated based on model parameters $\mathbf{m}$. For example, a least-squares misfit functional that quantifies the weighted residual between a specific type of measurement $a^{o}_{erp}$, made on the $p$-th phase (or wave group) of the data recorded by the $r$-th receiver for the $e$-th event, and the corresponding model prediction $a_{erp}(\mathbf{m})$ is given by

$$\Phi = \sum_e W_e \Phi_e = \frac{1}{2}\sum_e W_e \sum_{rp} W_{rp}\left[a_{erp}(\mathbf{m}) - a^{o}_{erp}\right]^2, \tag{18}$$

where $\Phi_e$ and $W_e$ are the misfit functional and weight associated with the $e$-th event, respectively, and $W_{rp}$ is the weight assigned based on measurement error and the relative importance of this phase and receiver in the inversion.

In principle, a minimization problem involving Eq. (18) could be solved using global optimization methods based on Monte Carlo simulations or neural networks (see Section 2.7). In addition to the optimal models achieved, the most probable region of model parameter space may be sampled and the posterior probability distributions of model parameters may be estimated, which provide a basis for probabilistic tomography (Mosca, 2010; Resovsky and Trampert, 2003). However, these methods often involve many evaluations of the forward problem, which are generally impractical despite moderate success (e.g., Valentine and Trampert, 2011), due to high computational cost of numerical solvers. Hence, local gradient methods which linearize the misfit function with respect to a reference model are favored, especially for high-resolution inversions at regional and global scales, but special care must be taken to reduce the nonlinearity of the misfit function and properly assess the non-uniqueness of the inverse problem (see Section 2.7; Snieder, 1998).

4.4.2. Gradient of misfit function

Computing the gradient of misfit function $\Phi_e$ for a single event $e$ is a key ingredient in tomographic inversions, although alternative data reduction schemes have been adopted to take advantage of the sparsity of individual sensitivity kernels by stacking multiple event recordings for a common station (Capdeville et al., 2005). For brevity, the index dependence on event $e$ will be omitted and only the misfit function for one event will be examined until later sections. Let us assume that the variation of the measurement $\delta a(\mathbf{m})$ can be related to the variation of synthetics $\delta\mathbf{s}$ through a connective function (Luo and Schuster, 1991) $\mathbf{g}_{rp}$:

$$\delta a_{rp}(\mathbf{m}) = \int \mathbf{g}_{rp}(\tau)\cdot\delta\mathbf{s}(\mathbf{x}_r,\tau,\mathbf{m})\,d\tau, \tag{19}$$

where $\mathbf{x}_r$ is the receiver location. The applicability of adjoint tomography to a particular type of measurement hinges on our ability to derive the above analytical expression $\mathbf{g}(\tau)$ (Fichtner et al., 2006a; Tape et al., 2010), which may be algebraically involved at times (P. Chen et al., 2010).

By substituting Eq. (19) into the variation of Eq. (18) and transforming the integration from time to frequency domain based on Parseval's theorem, the variation of misfit function can be rewritten as

$$\delta\Phi = \iint \sum_{rp} W_{rp}\left[a_{rp}(\mathbf{m}) - a^{o}_{rp}\right]\delta(\mathbf{x}-\mathbf{x}_r)\,\mathbf{g}^{*}_{rp}(\omega)\cdot\delta\mathbf{s}(\mathbf{x},\omega,\mathbf{m})\,d\omega\,dV, \tag{20}$$

where $\mathbf{g}^{*}_{rp}(\omega)$ is the complex conjugate of $\mathbf{g}_{rp}(\omega)$, equivalent to a time reversal of $\mathbf{g}_{rp}(t)$ (Tarantola, 1984b).

Given seismic wave equations in the frequency domain (for simplicity, the effects of ellipticity, rotation and gravity are ignored)

$$-\rho\omega^2\mathbf{s} = \nabla\cdot(\mathbf{C}:\nabla\mathbf{s}) + \mathbf{f}, \tag{21}$$

where $\rho(\mathbf{x})$ is the background density distribution, $\mathbf{C}(\mathbf{x})$ is the elastic tensor relating stress to strain (note that $\mathbf{C}$ can be generalized to a complex number $\mathbf{C}=\mathbf{C}(\omega)$ for a visco-elastic medium, relating stress to strain by $\mathbf{T}(\omega)=\mathbf{C}(\omega):\nabla\mathbf{s}(\omega)$), and $\mathbf{f}$ is the force term associated with a seismic event. The dependence of synthetics $\mathbf{s}(\mathbf{x},\mathbf{m})$ on model $\mathbf{m}$ can then be linearized based on the Born approximation (i.e., single-scatter approximation, Nolet, 2008, Chapter 4):

$$-\rho\omega^2\delta\mathbf{s} = \nabla\cdot(\mathbf{C}:\nabla\delta\mathbf{s}) + \omega^2\mathbf{s}\,\delta\rho + \nabla\cdot(\delta\mathbf{C}:\nabla\mathbf{s}), \tag{22}$$

i.e., the perturbation to the current wavefield $\delta\mathbf{s}$ is generated by secondary sources based on interactions of density and elastic tensor perturbations ($\delta\rho$, $\delta\mathbf{C}$) with the current wavefield $\mathbf{s}$. The Born approximation is only a first-order approximation excluding contributions from multiple-scattering, and the perturbed wavefield calculated based on Eq. (22) will always arrive later than the unperturbed field (Fichtner et al., 2006b).

Suppose the Green's function $\mathbf{G}(\mathbf{x},\mathbf{x}_s,\omega)$ for the forward problem (21) can be computed for 1D and/or 3D reference models via approximate/semi-analytical methods or numerical simulations, the perturbed (scattered) wavefield then becomes

$$\delta\mathbf{s}(\mathbf{x},\omega) = \int \mathbf{G}(\mathbf{x},\mathbf{x}',\omega)\cdot\left[\omega^2\mathbf{s}(\mathbf{x}')\,\delta\rho(\mathbf{x}') + \nabla\cdot\left(\delta\mathbf{C}(\mathbf{x}'):\nabla\mathbf{s}(\mathbf{x}')\right)\right]d\mathbf{x}'. \tag{23}$$

This expression can be substituted into Eq. (20) to obtain the relation between event misfit and material properties. In particular, if one defines an adjoint wavefield generated by injecting time-reversed weighted residuals at receivers as virtual adjoint sources as

$$\mathbf{s}^{\dagger}(\mathbf{x}',\omega) = \sum_{rp} W_{rp}\left[a_{rp}(\mathbf{m}) - a^{o}_{rp}\right]\mathbf{g}^{*}_{rp}(\omega)\cdot\mathbf{G}(\mathbf{x}',\mathbf{x}_r,\omega), \tag{24}$$

then based on the source–receiver reciprocity of Green's function (Aki and Richards, 2002, Chapter 2), the variation of misfit function (20) becomes

$$\delta\Phi = \iint \mathbf{s}^{\dagger}(\mathbf{x}',\omega)\cdot\left[\omega^2\mathbf{s}(\mathbf{x}',\omega)\,\delta\rho(\mathbf{x}') + \nabla\cdot\left(\delta\mathbf{C}(\mathbf{x}'):\nabla\mathbf{s}(\mathbf{x}',\omega)\right)\right]d\mathbf{x}'\,d\omega = \int\left[K^{e}_{\rho}(\mathbf{x})\frac{\delta\rho}{\rho} + K^{e}_{c}(\mathbf{x})\frac{\delta c}{c}\right]d\mathbf{x}. \tag{25}$$

In this formulation, expressions for Fréchet derivatives of $\Phi$ with respect to relative perturbations of $\mathbf{m}$, also known as the event kernels (Tape et al., 2007), are given by

$$\begin{aligned}
K^{e}_{\rho}(\mathbf{x}) &= \rho(\mathbf{x})\int \omega^2\,\mathbf{s}(\mathbf{x},\omega)\cdot\mathbf{s}^{\dagger}(\mathbf{x},\omega)\,d\omega,\\
K^{e}_{c}(\mathbf{x}) &= -\mathbf{C}(\mathbf{x})\int \nabla\mathbf{s}(\mathbf{x},\omega)\,\nabla\mathbf{s}^{\dagger}(\mathbf{x},\omega)\,d\omega,
\end{aligned} \tag{26}$$

and involve the interaction of forward and adjoint wavefields. Detailed expressions of sensitivity kernels for isotropic and general anisotropic parameterization can be found in Tromp et al. (2005) and Sieminski et al. (2007a,b).

Note that the adjoint source used to generate the adjoint wavefield in Eq. (24) stems entirely from the definition of misfit function (18) and its variation Eq. (20). The choices of measurement and misfit function affect the gradient of the misfit function (25) through sensitivity kernels (26), both involving the adjoint field.

4.4.3. Hessian matrix

What has been computed so far is only the gradient of the misfit function. To solve the minimization problem, it may sometimes be desirable to obtain the second-order derivatives of the misfit function. For simplicity, the spatial function of model $m(\mathbf{x})$ is first cast into a discrete model parameter vector $\mathbf{m}$ through a set of basis functions (see Section 2.3). The misfit function can be expanded to second order about a reference model $\mathbf{m}$ as

$$\Phi(\mathbf{m}+\Delta\mathbf{m}) \sim \Phi(\mathbf{m}) + \mathbf{g}^{T}\Delta\mathbf{m} + \frac{1}{2}\Delta\mathbf{m}^{T}\mathbf{H}\,\Delta\mathbf{m}, \tag{27}$$


where $\mathbf{g} = \partial\Phi/\partial\mathbf{m}$ is the gradient vector obtained from expanding Eq. (26) by the basis functions and $\mathbf{H} = \partial^2\Phi/\partial\mathbf{m}^2$ is the Hessian matrix. It follows from the definition of event misfit (18) that

$$\frac{\partial^2\Phi}{\partial\mathbf{m}^2} = \sum_{rp} W_{rp}\left\{\left[a_{rp}(\mathbf{m}) - a^{o}_{rp}\right]\frac{\partial^2 a_{rp}}{\partial\mathbf{m}^2} + \left(\frac{\partial a_{rp}}{\partial\mathbf{m}}\right)^{T}\frac{\partial a_{rp}}{\partial\mathbf{m}}\right\}. \tag{28}$$

Under the assumption that the measurement function $a_{rp}(\mathbf{m})$ is quasi-linear with respect to $\mathbf{m}$, or that the reference model is sufficiently close to the true model such that the residual $\left[a_{rp}(\mathbf{m}) - a^{o}_{rp}\right]$ is generally small, the first term on the RHS of Eq. (28) can be ignored. Furthermore, the computation of the Hessian requires access to derivatives of individual measurements $\partial a_{rp}/\partial\mathbf{m}$, which are row vectors of the design matrix in Eq. (5). Therefore, the approximate Hessian matrix is given by

$$\mathbf{H} = \frac{\partial^2\Phi}{\partial\mathbf{m}^2} \sim \sum_{rp} W_{rp}\left(\frac{\partial a_{rp}}{\partial\mathbf{m}}\right)^{T}\frac{\partial a_{rp}}{\partial\mathbf{m}} = \mathbf{G}^{T}\mathbf{W}\mathbf{G}, \tag{29}$$

where $\mathbf{W}$ is the weighting matrix with $W_{rp}$ as diagonal elements. Neither $\mathbf{G}$ nor $\mathbf{H}$ involves actual data $a^{o}_{rp}$, and both are fully determined by source–receiver geometry and the known background model. The optimal model parameters that minimize the quadratic function (27) can be obtained by solving the linear system

$$\mathbf{H}\,\Delta\mathbf{m} = -\mathbf{g}, \tag{30}$$

although regularization has to be applied to the Hessian in most global tomographic problems involving insufficient data coverage.

In general, the misfit function can be approximated only locally by a quadratic function (27), and successive iterations in which both $\mathbf{H}$ and $\mathbf{g}$ are recomputed based on updated velocity models $\mathbf{m}+\Delta\mathbf{m}$ are necessary to achieve final convergence. If the full Hessian matrix is used as in Eq. (28), the inverse method is referred to as Newton's method; if the approximate Hessian matrix is used as in Eq. (29), it becomes the Gauss-Newton method (Nocedal and Wright, 2006).

For the case of $\Phi = a_{rp}(\mathbf{m})$ in Section 4.4.2, i.e., an individual measurement, the gradient is also given by Eq. (25) where the adjoint field is now generated by the time-reversed connectivity function $\mathbf{g}^{*}_{rp}(\omega)$ as adjoint source. In addition to one forward calculation for $\mathbf{s}$, one adjoint calculation is also required to compute $\partial a_{rp}/\partial\mathbf{m}$. Therefore, the total number of adjoint calculations to assemble row vectors of $\mathbf{G}$ related to the $e$-th event is $\sim N_R N_P$, where $N_R$ is the average number of receivers used in an event and $N_P$ is the average number of phases picked per receiver. In comparison, the computation of $\mathbf{g}$ as in Eq. (30) requires just one additional adjoint simulation where residuals for all phases and receivers are injected back as adjoint sources simultaneously. When Green's functions are obtained through computationally expensive numerical methods, assembling $\mathbf{H}$ becomes extremely costly and is generally avoided. Instead, optimization methods such as nonlinear conjugate-gradient techniques based only on gradient functions (Fichtner et al., 2009b; Fletcher, 1987; Tape et al., 2007) can effectively approximate the operation of $\mathbf{H}^{-1}$ by applying polynomial functions of $\mathbf{G}$ (the forward operator) and $\mathbf{G}^{T}$ (the adjoint operator) through successive iterations.

4.4.4. Time-domain implementation

In practice, when seismic wave equations are solved by numerical techniques, one simulation is performed to obtain the forward field $\mathbf{s}(\mathbf{x},t)$ and another targets the adjoint wavefield $\mathbf{s}^{\dagger}(\mathbf{x},t)$. However, when Eq. (26) is transformed into the time domain, e.g.,

$$K^{e}_{\rho}(\mathbf{x}) = -\rho(\mathbf{x})\int_{0}^{T}\partial_t^2\mathbf{s}(\mathbf{x},T-t)\cdot\mathbf{s}^{\dagger}(\mathbf{x},t)\,dt, \tag{31}$$

where $T$ denotes the simulation end time, the forward field and the adjoint field need to be convolved at any spatial point $\mathbf{x}$ and then evaluated at time $T$. This requires access to $\mathbf{s}$ and $\mathbf{s}^{\dagger}$ at the respective times $T-t$ and $t$, which is generally unachievable when forward and adjoint simulations are run simultaneously. One apparent solution is to first carry out the forward simulation and save the entire wavefield as a function of time and space, then launch the adjoint simulation and at time $t$ access the $T-t$ slice of $\mathbf{s}$ stored on disk. This procedure could be challenging due to the high computer storage demand associated with saving the entire global/regional wavefields (typically, tens-to-hundreds of millions of grid points and tens of thousands of time steps). Having recognized that fewer grid points are required to accurately sample sensitivity kernels than to simulate seismic waves, Fichtner et al. (2009b) suggested a remedy by sub-sampling both in time and space based on the necessary spatial and temporal resolutions. An alternative solution, which reduces the storage requirement at the expense of computational cost, is to simulate wave propagation backward in time with the last frame of the wavefield as end condition by exploiting the invariance of the elastic seismic wave equation (where $\mathbf{C}(\omega)=\mathbf{C}$) under time reflection. In this case, while the forward field is reconstructed backward in time, the adjoint simulation is performed forward in time. At any given time $t$, the $T-t$ slice of $\mathbf{s}$ and the $t$ slice of $\mathbf{s}^{\dagger}$ are simultaneously updated in computer memory and henceforth combined to obtain the contribution of time slice $t$ to the sensitivity kernels in Eq. (26) (e.g., Akcelik et al., 2003; Liu and Tromp, 2006, 2008). Here the requirement on storing the entire forward wavefield is replaced by an extra simulation of reconstructing the forward field backward in time, which also doubles the memory requirement for kernel calculations. Fig. 11 demonstrates the computation of a traveltime sensitivity kernel based on this approach for a single source–receiver pair in a 2D homogeneous model. If attenuation is included in the seismic wave equation (where $\mathbf{C}(\omega)$ is a complex number), back reconstruction of $\mathbf{s}$ becomes unstable and a checkpointing scheme, hence extra computer storage, would be required to periodically reset the reconstructed wavefield (Liu and Tromp, 2008).

As indicated in Section 4.4.3, the gradient of individual measurements may be expensive to compute, while the overall gradient of the event misfit function (i.e., event kernels) can be retrieved based on the interaction of the forward and adjoint wavefields. The gradient of the misfit function of all events in Eq. (18) is then simply a weighted sum of all event kernels. Fig. 12 shows examples of event kernels generated for a synthetic checkerboard pattern based on a 2D uniform background velocity model. Each event kernel can be interpreted as the patterns of velocity anomalies 'seen' by the measurements pertaining to this particular event and skewed by its particular source–receiver geometry. When the Hessian matrix is present, this 'skewed' view can be corrected or sharpened by applying the inverse Hessian as a preconditioner, and the velocity model is simply updated in one iteration step as $\Delta\mathbf{m} = -\mathbf{H}^{-1}\mathbf{g}$, consistent with Eq. (30). However, when the Hessian is not available, multiple (6–7 for 2D examples in Tape et al., 2007) nonlinear conjugate-gradient iterations are necessary to resemble the effect of one Hessian iteration (Akcelik et al., 2002; Chen et al., 2007b; Tape et al., 2007).

4.4.5. Comparisons with classical tomographic methods

Although derived under the framework of optimization methods, Eq. (30) is also essential in classical tomographic inversions (i.e., the normal equation of (5)). In this sense, classical ray-based and finite-


[Fig. 11 image: four columns (Regular Wavefield, Adjoint Wavefield, Interaction Field, Event Kernel) by four rows (t = 0, 48, 96 and 144 s); map axes span 32° to 36° latitude and −120° to −115° longitude. Color scales: s(φ,θ,t) from −0.9 to 0.9 (10⁻³ m); s†(φ,θ,t) from −2.25 to 2.25 (10⁻⁸ kg⁻¹ s); K′(φ,θ,t) from −0.09 to 0.09 (10⁻⁷ m⁻² s⁻¹); K(φ,θ,t) from −0.45 to 0.45 (10⁻⁷ m⁻²).]

Fig. 11. Computation of sensitivity kernels by the interaction of the forward wavefield (first column) and the adjoint wavefield (second column) at four different time slices in a simple 2D homogeneous model. The contribution to sensitivity kernels by one particular time slice is shown in the third column, and the integrated value up to that time slice is shown in the fourth column. The forward field is constructed backwards in time while the adjoint field is constructed forward in time from top to bottom rows. Note the sensitivity kernel for a single traveltime measurement shows a ‘cigar’ shape in 2D instead of ‘banana-doughnut’ shape in 3D. Courtesy of Tape et al. (2007).
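The time-slice interaction shown in Fig. 11 can be illustrated with a minimal numerical sketch of Eq. (31). It makes several simplifying assumptions that are not part of the original text: both wavefields are scalar and already stored for every time step (the "save the entire forward field" option rather than backward reconstruction), the function name `density_kernel` and the array layout `(nt, nx)` are hypothetical, and the time integral uses a plain trapezoidal rule.

```python
import numpy as np

def density_kernel(forward, adjoint, rho, dt):
    """Sketch of Eq. (31) for stored scalar wavefields.

    forward, adjoint : arrays of shape (nt, nx), sampled at t = 0, dt, ..., T
    rho              : array of shape (nx,), background density
    Returns the per-time-slice interaction field and the integrated kernel.
    """
    # Second time derivative of the forward field (finite differences).
    velocity = np.gradient(forward, dt, axis=0, edge_order=2)
    acceleration = np.gradient(velocity, dt, axis=0, edge_order=2)

    # Pair the forward slice at time T - t with the adjoint slice at time t.
    interaction = -rho[np.newaxis, :] * acceleration[::-1] * adjoint

    # Trapezoidal integration over [0, T].
    kernel = dt * (interaction.sum(axis=0)
                   - 0.5 * (interaction[0] + interaction[-1]))
    return interaction, kernel

# Toy check with analytic fields: s = t^2 (so d2s/dt2 = 2) and s_adj = 1,
# for which Eq. (31) gives K = -2 * rho * T.
nt, dt = 101, 0.01
t = np.arange(nt) * dt
rho = np.array([1.0, 2.0, 3.0])
forward = np.tile((t ** 2)[:, np.newaxis], (1, rho.size))
adjoint = np.ones((nt, rho.size))
interaction, kernel = density_kernel(forward, adjoint, rho, dt)
print(kernel)
```

In this sketch `interaction` plays the role of the third column of Fig. 11 and `kernel` the fourth column at the final time slice. A production solver would avoid holding `forward` in memory, which is the storage issue the text describes.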

frequency-based tomography only differ in how $\mathbf{H}$ and $\mathbf{g}$ are computed. In classical ray tomography, the design matrix $\mathbf{G}$ consists of effective lengths of source–receiver rays associated with model parameters, whereas different approximate or semi-analytical methods have been used to calculate Green's function $\mathbf{G}(\mathbf{x},\mathbf{x}_r,t)$ as well as the corresponding sensitivity kernels in the $\mathbf{G}$ matrix (see more discussions in Section 3.2) for the case of finite-frequency tomography. By adopting approximations or semi-analytical approaches, Green's functions can be computed relatively quickly and the $\mathbf{H}$ matrix can be assembled efficiently to obtain a model update. However, the approximate nature of these techniques restricts them to the application of specific wave types or simple (predominantly 1D) starting models. Consequently, only one Hessian-based inversion step can be taken for regions with strong velocity heterogeneities (e.g., the crust), even though the misfit function is highly nonlinear with respect to model parameters. Any successive iteration which requires recomputing the $\mathbf{H}$ matrix under the updated model will be infeasible or computationally expensive, and hence is less often attempted in classical tomography (e.g., Bijwaard and Spakman, 2000; Lekić and Romanowicz, 2011).

4.4.6. Sensitivity kernels for 1D reference velocity models

With the exception of lithosphere and portions of the D″ layer, the amplitude of 3D mantle heterogeneities on the global scale is relatively small with respect to 1D reference models. Therefore, sensitivity


[Fig. 12 image: panels (a) Event kernel 04, (b) Event kernel 05, (c) Event kernel 06, (d) Event kernel 07, (e) Misfit kernel (25 events), (f) Target phase speed model (data). Color scales: K(φ,θ) from −1.8 to 1.8 (10⁻⁷ m⁻² s); K(φ,θ) from −1.35 to 1.35 (10⁻⁶ m⁻² s); −6 to 6 % perturbation.]

Fig. 12. Event kernels generated for a 2D synthetic example. The ‘true’ model shown in graph (f) is a checkerboard pattern of phase velocity perturbations from a uniform background model. The first four graphs (a–d) show event kernels for four different events (locations indicated by stars), illustrating their ability to ‘see’ the true model. The sum of all twenty-five event kernels (i.e., gradient vector g) is shown in graph (e), which indicates the direction of model update in the first iteration. Courtesy of Tape et al. (2007).
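The summation of event kernels into the gradient of Eq. (18), and the effect of applying the approximate Hessian of Eq. (29) through Eq. (30), can be illustrated on a toy linear problem. This is only a sketch under stated assumptions: the per-event design matrices `G_events` are random stand-ins (in adjoint tomography the event kernel would come from one forward and one adjoint simulation, not from an explicit matrix), the data are noise-free, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_events, n_meas, n_params = 4, 6, 3

m_true = np.array([0.05, -0.03, 0.02])   # hypothetical target perturbation
m_ref = np.zeros(n_params)               # reference (starting) model

# Stand-in design matrices: row rp of G_e plays the role of d a_rp / d m.
G_events = [rng.normal(size=(n_meas, n_params)) for _ in range(n_events)]
W_rp = [np.ones(n_meas) for _ in range(n_events)]   # measurement weights
W_e = np.ones(n_events)                              # event weights
data = [G_e @ m_true for G_e in G_events]            # noise-free observations

def event_kernel(G_e, w, obs, m):
    """Gradient of Phi_e = 0.5 * sum_rp w_rp (a_rp(m) - a_rp^o)^2."""
    return G_e.T @ (w * (G_e @ m - obs))

# Misfit gradient: weighted sum of event kernels, as in Fig. 12(e).
g = sum(We * event_kernel(Ge, w, d, m_ref)
        for We, Ge, w, d in zip(W_e, G_events, W_rp, data))

# Approximate (Gauss-Newton) Hessian, H = G^T W G summed over events.
H = sum(We * (Ge.T @ (w[:, np.newaxis] * Ge))
        for We, Ge, w in zip(W_e, G_events, W_rp))

# One Hessian-based update, H dm = -g (Eq. 30).
dm = np.linalg.solve(H, -g)
m_new = m_ref + dm
print(m_new)
```

Because the toy measurements are exactly linear in the model, a single Gauss-Newton step recovers `m_true`, whereas the raw gradient `-g` only gives a direction distorted by the source–receiver geometry. Real problems are nonlinear and usually need regularization of `H`, as noted in Section 4.4.3.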

kernels for most long-period global body and surface wave phases may not differ significantly for 1D reference models and 3D tomographic models (Liu and Tromp, 2008; Zhou et al., 2011), i.e., classical tomographic inversions based on 1D reference models may still be an efficient and sufficient approach to provide velocity model updates at long wavelength scales (Capdeville et al., 2003). There has been great interest in efficient computation of sensitivity kernels for 1D velocity models based on accurate forward modeling tools such as normal-mode theory (Zhao and Chevrot, 2011a,2011b), frequency-domain algorithms (Friederich and Wielandt, 1995), Direct Solution Methods (Geller and Ohminato, 1994), algorithms that reduce 3D SEM calculations to 2D (Nissen-Meyer et al., 2007b, 2008), as well as full 3D SEM simulations (Liu, 2009). In particular, sensitivity kernels for up to 1-s teleseismic P waves can be efficiently obtained with modest computational resources (Fuji et al., 2011). This is particularly crucial for phases that cannot be treated accurately by ray theory or asymptotic methods, such as core diffracted phases that significantly deviate from assumed diffracted ray paths but are critical for resolving structures above the core–mantle boundary (Manners, 2008; Wysession, 1996; Zhao et al., 2000).

4.4.7. Scattering-integral methods

Another flavor of tomography methods based on full numerical simulations, known as scattering-integral (SI) methods (Chen, 2011; Chen et al., 2005, 2007a), differs from adjoint tomography by its approach in obtaining the adjoint wavefield in Eq. (24). When only a limited number of receivers are involved, Green's functions $\mathbf{G}(\mathbf{x},\mathbf{x}_r,t)$ can be computed numerically for all receivers and stored on temporal and spatial grids. This amounts to a total of $3N_r$ simulations (Eisner and Clayton, 2001) with an additional $N_s$ simulations for the forward wavefield $\mathbf{s}(\mathbf{x},\mathbf{x}_s,t)$. Sensitivity kernels for a single measurement pertaining to a source–receiver pair can hence be constructed based on Eq. (26), which provides extreme flexibility to compute both the gradient vector $\mathbf{g}$ and individual kernels that are the building blocks for $\mathbf{G}$ and $\mathbf{H}$; the inverse problem is then reduced to the classical case involving Eq. (30). SI methods also enable experimentation with inversion schemes, e.g., weighting schemes ($W_e$, $W_{rp}$) and exclusion/inclusion of certain sources, receivers or phases, without extra simulation for either $\mathbf{g}$ or $\mathbf{H}$. The biggest gain is in the formal resolution analysis where checkerboard tests can now be performed to determine the resolution and error of the inverted model.

SI methods are particularly appealing when the number of receivers is not excessive (i.e., $3N_r$ forward solutions can be attained in a reasonable amount of time), and the simulation domain is of relatively small size such that $\mathbf{G}(\mathbf{x},\mathbf{x}_r,t)$ as a function of $(\mathbf{x},t)$ can be archived on hard disks for speedy access later. However, if the simulation domain is expanded, or shorter periods of seismic waves are inverted, storage may become a daunting issue even after interpolation schemes are introduced to project the simulation grid onto a coarser grid more reflective of the frequency content of the waves (Chen et al., 2007a; Zhao et al., 2006). The other disadvantage is that kernels of single measurements are computed based on the starting model, hence the Hessian matrix $\mathbf{H}$ is only valid for this model. When the actual structure deviates from the reference model significantly, e.g., up to 10–30% as in some local (Tape et al., 2009) and regional (Fichtner et al., 2009b) tomographic problems, one Hessian iteration may not be enough to fully eliminate the data residuals and successive iterations with Green's functions and Hessian matrices recomputed based on updated velocity models are necessary. For example, structural inversions for the Los Angeles basin in Chen et al. (2007a) showed a maximum 10% of shear-wave speed variations


in one quasi-Newton step, while adjoint tomographic inversions in Tape et al. (2010) displayed up to 30% perturbation after 16 adjoint iterations for a similar region. As one Hessian iteration may be equivalent to a few gradient-only iterations (usually related to the number of distinct eigenvalues of the Hessian, for example, approximately 6–7 for the 2D examples in Tape et al., 2007), the computation and storage demand for SI methods may outweigh the flexibility of Hessian inversions for regions with strong velocity contrasts. Overall, the optimal tomographic technique based on numerical simulations, either SI or adjoint methods, is problem specific, and may be subject to change with the rapid advances in computer and storage technologies (Chen et al., 2007b).

On the other hand, the storage of strain Green's function fields for seismic stations $\nabla\mathbf{G}(\mathbf{x},\mathbf{x}_r,t)$ for SI methods also allows rapid seismic source determinations independent of, or as a part of, structural inversions (see also Section 5.6). Stand-alone source inversions based on adjoint methods and full numerical simulations are feasible (Hingee et al., 2011; Kim et al., 2011; Liu et al., 2004), but can be numerically expensive. Therefore, source inversions based on stored Green's function databases for 3D reference models in SI methods provide an efficient alternative for improving the accuracy of earthquake source parameters in strongly heterogeneous regions (Lee et al., 2011).

5. Adjoint tomography: inversion schemes

In this section, we will discuss several aspects of adjoint tomographic inversion schemes, particularly in consideration of the intense computation and storage requirements of numerical simulations. In principle, tomographic problems can be treated as an unconstrained optimization problem, with a vast amount of literature available from the field of applied and computational mathematics (Nocedal and Wright, 2006). This concept has proved successful to some degree as demonstrated by both the elegant mathematical derivations and synthetic examples in Akcelik et al. (2002, 2003). However, applications to realistic crustal and mantle imaging require more practical considerations, e.g., choices of misfit functions, weighting schemes, preconditioners used for nonlinear conjugate techniques, as well as parameterization and regularization. Careful considerations of these subjects pay great dividends in decreasing the ill-posedness of inverse problems and speeding up the convergence of velocity models.

5.1. Misfit functions and measurements

The choices of measurements made to data and synthetics may include typical cross-correlation traveltime and amplitude measurements of body waves (Hung et al., 2000; Luo and Schuster, 1991; Sigloch and Nolet, 2006), frequency-dependent phase and amplitude measurements of surface waves (Tape et al., 2010; Zhou et al., 2004) based on multi-taper techniques (Laske and Masters, 1996; Park et al., 1987; Thomson, 1982), instantaneous phase differences and envelope ratios (Bozdağ et al., 2011a), and other frequency-dependent functionals such as those based on short-time Fourier transform (Fichtner et al., 2008), or generalized seismological data functional (Chen, 2011). Waveform difference between data and synthetics prevalent in exploration problems (see also Section 4.1) has also been utilized in global tomographic settings (Li and Romanowicz, 1996; Panning and Romanowicz, 2006).

calculate derivatives $\partial\mathbf{s}/\partial\mathbf{m}$ more accurately by including ad-hoc path-averaged phase delays (Woodhouse and Dziewonski, 1984) for global surface waves (Panning et al., 2009; Romanowicz, 2008), most tomographic applications opt to invert traveltime or frequency-dependent phase measurements (see Section 2). Amplitude and waveform, which are also strongly affected by earthquake source parameters, are generally used to verify final inverted models (Chen et al., 2007a; Fichtner et al., 2009b; Tape et al., 2010).

Selecting proper phases and wavegroups with reasonable similarity in waveform and relatively small traveltime delays is crucial for the successful application of adjoint tomography. However, except for problems involving relatively small datasets, where manual selection and screening are possible (e.g., Chen et al., 2007a; Fichtner et al., 2009b), an automatic or semi-automatic scheme is needed to mine through the ever-growing broadband datasets. Automatic phase picking algorithms have been applied to body-wave phase selection for decades (e.g., Bai and Kennett, 2000; Earle and Shearer, 1994; Houser et al., 2008; Sigloch and Nolet, 2006). The use of 3D simulation made it possible to select windows based solely on waveform similarity and phase delays without a priori knowledge of associated phases. Maggi et al. (2009) scan seismograms for ‘pulse’ or ‘wavegroup’-like features, and select windows that satisfy criteria on cross-correlation, amplitude ratio, and time shift between data and synthetics within the windows. This automatic scheme has been successfully applied to both global (Bozdağ et al., 2011b) and regional (Kim et al., 2011; Peter et al., 2011b; Zhu et al., 2012) datasets, and was a key component of the adjoint tomography for southern California crust (Tape et al., 2010).

Weights used in the misfit function (18) can be assigned based on measurement errors such as those estimated from similar measurements (e.g., P arrival differences at vertical and radial seismograms, Chen et al., 2007a). Alternatively, they can be associated with the remaining differences between data and synthetics after measurement corrections are applied (Tape et al., 2010), which further reinforces the requirement on window similarity. Weight functions can also be selected based on wave amplitudes within windows in order to balance contributions from different wave types (Fichtner et al., 2009b; Tape et al., 2010).

5.2. 3D initial models and parameterization

One of the main advantages of adjoint tomography compared to classical tomography is that it allows the use of 3D reference models and accurate computation of corresponding sensitivity kernels. Due to the equivalent computational cost of numerical simulations, 3D reference models could readily replace 1D reference models as the starting point of adjoint tomographic inversions. The selected 3D starting models could either originate from existing tomographic studies (Fichtner et al., 2009b) or represent an integration of existing results from tomography, local , and seismic reflection/refraction profiles (Komatitsch et al., 2004; Plesch et al., 2011). This approach is dubbed 3D–3D tomographic inversion, where the two references of 3D correspond to the background velocity model and physical simulation domain, respectively. By this definition, classical tomographic studies based on 1D reference models can be generally considered as 1D–3D inversions (Tape et al., 2007). The use of 3D initial models usually produces synthetics that more closely resemble data,
However, it is recognized which not only increases the number of windows or phases selected that the relation between waveform residual Δu=s−d and model for iterative inversions, but also improves the applicability of Born or perturbations can be highly nonlinear in the case of large phase Rytov approximation on nonlinear misfit functions. delays (Panning et al., 2009), and the validity of Born approximation For global and continental scale applications, surface-wave wave- (see Eq. (22)) requires the initial velocity model to be close to the forms are strongly affected by the 3D heterogeneous crust (Bozda and true model. In comparison, the Rytov approximation relating Trampert, 2008; Lekić et al., 2010). The simulation accuracy of surface traveltime delay ΔT to model perturbation is quasi-linear, which is waves is directly linked to the ability to mesh the crust accurately, even valid for traveltime delays up to a quarter of the dominant period for relatively long period waves (Capdeville and Marigo, 2007). Unfortu- (Chen et al., 2007b). Although attempts have been made recently to nately, honoring the Moho such as those provided by CRUST2.0 (Bassin

Please cite this article as: Liu, Q., Gu, Y.J., Seismic imaging: From classical to adjoint tomography, Tectonophysics (2012), http://dx.doi.org/10.1016/j.tecto.2012.07.006

et al., 2000), e.g., adapting to the thin oceanic crust, may require small mesh sizes and significant computation cost for the forward simulation. Possible remedies are proposed by introducing a smooth and effective medium for the crust that is equivalent to the original model by applying corrections based on homogenization of varying coefficients (Capdeville and Marigo, 2007), matching observations such as local dispersion curves (Fichtner and Igel, 2008), or inverting for a smooth crustal model based on short-period group velocity dispersion maps (Lekić and Romanowicz, 2011).

Depending on the dataset and measurements used (see Section 5.1), either isotropic (Chen et al., 2007a) or anisotropic (Fichtner et al., 2010, also see Section 2.5) material properties can be targeted for the inversion procedure. Tarantola (2005) (section 6.3.3 thereof) indicated that the bulk-sound and shear-wave speeds are independent, and therefore constitute a better set of parameters than the conventional compressional and shear velocities. Traveltime or phase delay measurements are less sensitive to density variations for inversions of relatively long-period waves (Tarantola, 1986). However, density kernels proved to be instructive in identifying internal sharp velocity contrasts at shorter periods (Liu and Tromp, 2006; Luo et al., 2009; Zhu et al., 2009). Attenuation structures have also been inverted in combination with elastic structures for 2D synthetic cases based on waveform misfit functions (Askan et al., 2007). However, attenuation has not been inverted for extensively in adjoint tomography due to its significant tradeoff with focusing/de-focusing on wave amplitudes. Therefore, we mainly focus on inversions of elastic velocity structures in the following sections.

5.3. Regularization

Due to the non-uniform distribution of sources and receivers and the uneven spatial coverage of sensitivities of phases or wavegroups, the inverse problem, i.e., solving linear system (30), is generally under-determined and ill-posed, while the associated solutions are non-unique (Tarantola and Valette, 1982, also see Section 3). Some form of regularization is needed to stabilize the formal action of inverting the Hessian matrix. For Newton type of methods, a damping parameter is often used and its value is selected from L-curves that capture the tradeoff between model norm and reduction in misfit function value (Aster et al., 2005; Tape et al., 2007). In principle, basis functions for inversion parameters could be built on the 3D numerical simulation grids in adjoint tomography (Akcelik et al., 2002; Tape et al., 2010). However, grid spacings used in simulations (dictated by computation accuracy requirements) are generally much smaller than the actual model resolvability (dictated by the bandwidth of given dataset and source–receiver coverage), and using all grid points increases the ill-posedness of the inverse problem. Regularization may be explicitly introduced into the misfit function by providing constraints on model variations such as norm or roughness (Tarantola, 2005). Although L2 norms are ubiquitously used in regularizing tomographic problems due to their simple formulation, L1 norms of model variations also prove to be effective in preserving discontinuities in material properties (Akcelik et al., 2002; Askan et al., 2007; Aster et al., 2005, Chapter 7).

On the other hand, adjoint tomography relies mainly on the gradient vector of misfit function for model update at each iteration, and iterates successively towards models with improved data fits at the expense of increased model norm (or roughness). Spurious model updates in regions of inadequate source–receiver coverage are less likely to occur in the iteration process. The number of iterations is selected based on convergence criterion, which is similar to the discrepancy principle used in selecting the ‘best’ solution on an L-curve (Aster et al., 2005, Chapter 4). Still, implicit regularization may be introduced by simple local smoothing of gradient vectors, which also eliminates excessively large amplitudes of sensitivity kernels at sources and receivers due to singularities introduced by point moment-tensors or forces in the forward and adjoint simulations (Fichtner et al., 2009b; Liu and Tromp, 2008; Tape et al., 2007). It is worth noting that when smoothing is applied to gradient vectors, a similar smoothing operation should be applied to inverted models at each iteration in order to conform to the same data rule (i.e., Eq. (5), Chiao et al., 2010).

5.4. Step length and preconditioner

It is desirable to design inversion schemes that reduce the number of 3D numerical simulations of the seismic wave equation. When only gradient vectors are available to minimize the misfit function (18), nonlinear conjugate gradient (CG) methods could be the most natural solutions (Fletcher, 1987). In nonlinear CG methods, successive search (i.e., conjugate-gradient) directions are defined as linear combinations of the current gradient directions and previous search directions. Model update requires the computation of a step length that minimizes the misfit function in the given search direction, i.e., a line search (Nocedal and Wright, 2006). Quadratic and cubic interpolations, which require one (forward) and two (forward and adjoint) extra simulations, respectively, can be performed to obtain an optimal step length (Tromp et al., 2005). Since both schemes offer similar convergence rates (Tape et al., 2007), the quadratic approach is favored in line searches of most adjoint-type inversions when the quasi-linear misfit function could be approximated locally by a quadratic function of step length along the search direction (Fichtner et al., 2009b; Gauthier et al., 1986; Vigh et al., 2009). Alternative line search algorithms can be applied (Fletcher, 1987; Nocedal and Wright, 2006), as long as a balance is struck between extra computational cost and misfit reduction (Akcelik et al., 2002). Practical constraints, e.g., envelope misfit reduction (Fichtner et al., 2009b), may be used to guide the selection of step length. To reduce the computational cost at each iteration, a representative subset of events may be chosen for the determination of optimal step length (Bozdağ et al., 2011b; Zhu et al., 2012).

Weights defined in the misfit function (18) are effective preconditioners to the Hessian matrix that emphasize certain measurements while down-weighing others. More generally, a preconditioner P can be applied to Eq. (30) to give an equivalent linear system

$$\mathbf{P}\mathbf{H}\,\Delta\mathbf{m} = \mathbf{P}\mathbf{g}. \qquad (32)$$

P is usually a symmetric positive definite matrix that helps reduce the condition number of this linear system and improve the convergence rate. In particular, the ideal form of P is the inverse Hessian matrix H⁻¹ (Pratt et al., 1998), which reduces 6–7 steps of nonlinear CG iterations to a simple one-step Hessian inversion for some synthetic experiments (Chen et al., 2007b; Tape et al., 2007). As mentioned in Section 4.4.3, the H⁻¹ preconditioner acts as a ‘lens’ that scales and sharpens the image of a gradient vector into an actual model update. The diagonal terms of H also provide natural scaling of different types of parameters as shown by Tape et al. (2007) and Kim et al. (2011). In adjoint tomography, H is not formulated and applying H⁻¹ as a preconditioner is impractical. However, when sensitivity kernels of measurements can be relatively cheaply calculated by approximate methods (Hung et al., 2000; Li and Tanimoto, 1993; Zhou et al., 2004) or based on 1D background velocity models (Zhao et al., 2000, 2006), it is conceivable to apply an approximate H⁻¹ as a preconditioner to speed up the convergence (Shin et al., 2001). Having realized that the Hessian matrix essentially captures information on the source–receiver geometry and measurement bandwidth, Igel et al. (1996) and Fichtner et al. (2009b) used less-elaborate preconditioners based on simple geometrical arguments to minimize the excessively large sensitivity at sources and receivers. This approach is closely linked to the diagonal of the Hessian (Virieux and Operto, 2009), and may have similar effects as regularization (Tape et al., 2007). Alternatively, Tape et al. (2009, 2010) computed the ‘best’ model update in each iteration as a linear combination of event kernels based on a source subspace method


(Sambridge et al., 1991), which exploited the common features of different event kernels and effectively preconditioned the gradient vectors. In this case, no extra test solution needs to be evaluated for quadratic interpolations in the gradient direction, and the number of equivalent iterations to one Hessian-based step in the nonlinear inverse problem may be further reduced.

On the other hand, the Hessian matrix and the local curvature of the misfit function can also be approximated based on previous iterations (Akcelik et al., 2003). An extra adjoint simulation used in cubic interpolation for step length provides additional local curvature information that can be incorporated into the updates of the approximate H⁻¹ in BFGS-type quasi-Newton optimization algorithms (Epanomeritakis et al., 2008; Nocedal and Wright, 2006), also known as variable metric methods (section 3.4.3 of Tarantola, 2005). Applications of these approximate H⁻¹ as preconditioners to gradient vectors speed up the convergence and are more advantageous than nonlinear CG methods in practice (Nocedal and Wright, 2006). In particular, in the limited-memory BFGS method (L-BFGS, Byrd et al., 1995), local curvature information is stored as a set of vectors saved over iterations without the need to form actual H⁻¹ matrices; this approach is more adaptable to the procedures of adjoint tomography. Most importantly, the final approximate H⁻¹ (i.e., the posterior covariance operator) provides a possible means to estimate model errors (Tarantola, 2005; more discussions in the next section).

Akcelik et al. (2002, 2003) approached the nonlinear optimization problem with Gauss–Newton methods. H was not formulated explicitly (for the same reasons discussed in Section 4.4.3); however, the matrix–vector product of H with any model update vector q can be calculated by two extra simulations, one forward and one adjoint. The ability to compute Hq allows the linear system (30) to be solved iteratively based on matrix-free Gauss–Newton–Krylov iterations. The number of inner iterations (Akcelik et al., 2002, Table 4.2) for every Gauss–Newton update is consistent with those demonstrated by Tape et al. (2007) and Chen et al. (2007b). However, the total number of outer iterations drastically exceeds those in Tape et al. (2009) and Fichtner et al. (2009b), due to the significantly larger velocity perturbation (∼100%) used in their synthetic examples as well as the choice of simple unweighted waveform misfit function. In a practical tomographic problem, a reasonable 1D or 3D model from other studies can always be sought as an initial model for which velocity perturbations will be much less (e.g., <30% in regional-scale and <10% in continental-scale studies), and convergence can be achieved in significantly fewer (typically <20) iterations.

Another common practice to reduce the non-uniqueness of the inverse problem is to adopt a multi-scale approach (e.g., Akcelik et al., 2002; Askan et al., 2007) or iterative bootstrap inversion process (Pratt and Shipp, 1999). To avoid being trapped in local minima, initial solutions need to be sufficiently close to true solutions. Datasets are first filtered at longer periods and inverted for model parameters on a coarser grid corresponding to long-wavelength structures. The new model is then used as the initial model in a subsequent inversion of shorter-period data for finer-scale structures, where data residuals (such as traveltime or phase delay) will presumably have been reduced compared to those based on the original reference model. Iterative procedures therefore help resolve increasingly finer scale structures in the close vicinity of the ‘true’ model. Practical variations of this technique include successively decreasing the upper cutoff period over iterations (Fichtner et al., 2009b) or combining measurements from different period bands, where the longer period data dominate the first few iterations while more measurements from shorter periods are successively incorporated in later iterations (Tape et al., 2010). Fig. 13 shows the application of adjoint tomography to the inversion of southern California crustal structure (Tape et al., 2009) based on an automatic windowing scheme (Maggi et al., 2009). Large velocity variations are successfully resolved after 16 iterations as shown in the final model update (up to ±30% in the last column of Fig. 13a) and detailed 3D heterogeneous structures depicted by the final shear velocity model at shallow depth (2nd column of Fig. 13a). Waveform fits are significantly improved in successive iterations such that an extra usable window is automatically selected from the transverse component seismogram in the final iteration (see Fig. 13b).

5.5. Model verification, appraisal and resolution test

With the ability to numerically simulate wave propagation in 3D, adjoint tomography (as well as scattering-integral methods) can verify the final velocity model in terms of both reduction of measurement residuals and waveform fit itself. Tape et al. (2010) computed synthetics for a set of independent ‘extra’ events that were not used in the inversions for their final tomographic model. Reductions in both phase delay times and waveform misfits are statistically similar to those for events used in inversions. For inversions based on traveltime or phase delay measurements, amplitude measurements also provide independent datasets to validate the final velocity models (Chen et al., 2007a; Fichtner et al., 2009b; Tape et al., 2010). Usually more phases and windows are selected as iteration progresses, reflecting that the inverted models successively approach the true model, and the final model may produce synthetics that offer significantly more usable windows or phases under the same selection criteria (see Fig. 13b). This practice effectively improves data coverage.

The diagonal terms of the H matrix provide information on how well a domain is sampled based on the given source–receiver geometry and selected phases/wavegroups. Similar to the ray density maps prevalent in classical tomography, volumetric coverage kernels can be constructed by omitting measurement residuals in the adjoint sources defined in Eq. (24). This provides an ad hoc way of understanding model resolvability and offers guidance to the interpretation of final models (Tape et al., 2010). However, formal model error analysis and resolution tests common in classical tomography (as discussed in Section 3.1) are much more difficult for adjoint tomography. As the formal resolution matrix of model update in each Newton step of a Hessian-based inversion is given by

$$\mathbf{R} = \left(\mathbf{G}^{T}\mathbf{G} + \boldsymbol{\Gamma}\right)^{-1}\mathbf{G}^{T}\mathbf{G}, \qquad (33)$$

where Γ is a matrix representing regularization applied, the lack of individual sensitivity kernels (i.e., row vectors of G) in adjoint tomography makes it impossible to compute the full resolution matrix. Even checkerboard tests common in classical tomography require the full adjoint inversion machinery when only gradient vectors are available, i.e., one checkerboard test is numerically as costly as one full 3D adjoint inversion. Nevertheless, attempts were made to design checkerboard tests to evaluate model resolutions at both long and short length scales. Fig. 14 illustrates checkerboard tests performed by Fichtner et al. (2009b) for adjoint tomographic inversions of the Australasian region. The input model (‘true’ model) is a checkerboard pattern (3°×3°) superposed onto a high-velocity patch (~25°×25°) mimicking positive velocity anomalies in the central and western part of the region. When velocity perturbations of the checkerboards are set to be 3%, 6% and even as large as 9%, both the long and short wavelength structures are well recovered by adjoint inversions except in regions with limited ray coverage. In classical tomography, the checkerboard test is simply a machinery to gauge the resolution of the design matrix (G), where the linearization of measurements with respect to model parameters has already been made. In contrast, these checkerboard tests compute synthetic data based on full 3D numerical simulations, thus providing metrics on the effect of source–receiver and kernel coverages as well as the nonlinearity of the misfit function, particularly in the presence of strong velocity variations.
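As a small numerical illustration of Eq. (33), the sketch below evaluates the resolution matrix for a toy linear problem in which the design matrix G is available, as it would be in classical or scattering-integral tomography. The matrix values, the choice Γ = λ²I (plain norm damping) and the function name `resolution_matrix` are assumptions made for illustration only; they are not taken from any of the cited studies.

```python
import numpy as np

def resolution_matrix(G, damping):
    """Return R = (G^T G + Gamma)^{-1} G^T G, assuming Gamma = damping**2 * I."""
    GtG = G.T @ G
    Gamma = damping**2 * np.eye(GtG.shape[0])
    # Solve (GtG + Gamma) R = GtG rather than forming an explicit inverse.
    return np.linalg.solve(GtG + Gamma, GtG)

# Hypothetical design matrix: 3 measurements, 2 model parameters.
# The second parameter is sampled much more weakly than the first.
G = np.array([[1.0, 0.0],
              [1.0, 0.1],
              [0.0, 0.1]])

R_undamped = resolution_matrix(G, damping=0.0)
R_damped = resolution_matrix(G, damping=0.3)

print(np.round(R_undamped, 3))
print(np.round(np.diag(R_damped), 3))
```

Without damping and with a full-rank G, R reduces to the identity (perfect resolution). With damping, the diagonal entries fall below one, and they fall furthest for the weakly sampled parameter. This is the kind of information that is lost when only gradient vectors are available.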


[Fig. 13 graphic. Panel (a): map views. Panel (b): vertical (Z), radial (R) and transverse (T) seismogram panels for models 1D, m00, m01, m04 and m16, with horizontal axis ticks from 80 to 150. Annotated time shifts: 1D, ΔT = 9.60 s and 9.85 s; m00, ΔT = 7.00 s and 7.30 s; m01, ΔT = 4.90 s and 5.15 s; m04, ΔT = 2.90 s and 3.05 s; m16, ΔT = 0.95 s, 1.00 s and 0.05 s.]

Fig. 13. Adjoint tomographic inversions of southern California crust. (a) Comparisons of the initial model, final model after 16 iterations, and model update of shear velocities at the depth of 10 km. The final model exhibits velocity perturbations up to ±30% and depicts geological features such as sedimentary basins, exhumed batholiths and lithological contrast across faults. Event 9983429 is denoted by a beach ball and station DAN is denoted by a red triangle. (b) Evolution of waveform fits between data and synthetics at station DAN for event 9983429 over iterations. Both are filtered between 6 and 30 s. Windows automatically picked by the FLEXWIN algorithm (Maggi et al., 2009) are indicated by the gray boxes.

Note the successive improvement in waveform fits from the initial model m00 to the final model m16, and the extra window picked on the transverse component after 16 iterations. Courtesy of Tape et al. (2009).
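The windows in Fig. 13b are accepted on the basis of criteria of the kind described in Section 5.1: cross-correlation, amplitude ratio and time shift between data and synthetics within a window. The following is a minimal sketch of such a test for a single candidate window. It is not the FLEXWIN implementation; the function names, the synthetic pulse and all threshold values are hypothetical.

```python
import numpy as np

def window_measurements(data, synt, dt):
    """Return (max normalized cross-correlation, time shift in s, amplitude ratio dlnA)."""
    cc = np.correlate(data, synt, mode="full")
    lags = np.arange(-(len(synt) - 1), len(data))   # lag in samples for each cc entry
    k = int(np.argmax(cc))
    cc_max = cc[k] / np.sqrt(np.sum(data**2) * np.sum(synt**2))
    time_shift = lags[k] * dt                        # > 0: data arrive later than synthetics
    dlnA = 0.5 * np.log(np.sum(data**2) / np.sum(synt**2))
    return cc_max, time_shift, dlnA

def accept_window(data, synt, dt, cc_min=0.7, tshift_max=5.0, dlnA_max=1.0):
    """Accept the window if all three (hypothetical) thresholds are satisfied."""
    cc_max, tshift, dlnA = window_measurements(data, synt, dt)
    return bool(cc_max >= cc_min and abs(tshift) <= tshift_max and abs(dlnA) <= dlnA_max)

# Toy wavegroup: the "data" are the synthetics delayed by 2 s and scaled by 0.8.
dt = 0.5
t = np.arange(0.0, 60.0, dt)
synt = np.exp(-((t - 30.0) / 4.0) ** 2) * np.cos(2 * np.pi * (t - 30.0) / 10.0)
data = 0.8 * np.roll(synt, 4)

cc_max, tshift, dlnA = window_measurements(data, synt, dt)
print(cc_max, tshift, dlnA, accept_window(data, synt, dt))
```

In this toy case the window is accepted, whereas a polarity-reversed trace (`-synt`) fails the cross-correlation threshold. As the model improves over iterations, more windows pass the same criteria, which is how the extra transverse-component window in Fig. 13b comes to be selected.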

Realizing that the inverse Hessian matrix H⁻¹ provides an approximation to the covariance matrix of inverted model parameters assuming individual data are uncorrelated (Tarantola, 2005), Stadler (2010) made efforts to estimate significant eigenvalues and the corresponding eigenvectors of the H matrix through numerical methods (e.g., Lanczos algorithm) based solely on the matrix–vector product of H with a given directional vector q (Saad, 2011). In this case, the computation of Hq can be achieved by an extra forward and adjoint simulation (Akcelik et al., 2002). However, hundreds of such operations may be required to obtain significant eigenvalues of H, which is still impractical for most realistic applications. Alternatively, the approximate H⁻¹ matrix saved over iterations can be naturally used as an approximate posterior model covariance matrix if a cubic interpolation scheme is used for step-length selection (Tromp et al., 2005) and BFGS-type of inversion schemes are selected (see also Section 5.4). The diagonal elements of the approximate H⁻¹ provide error-bars of model parameters, while the off-diagonal elements are associated with correlations between different model parameters. Only one extra adjoint simulation is required for each iteration to construct the approximate H⁻¹ matrix, and the ultimate computation cost may even be less than those of nonlinear CG methods due to improved convergence of BFGS-type of inversion algorithms and potentially more accurate line-search based on cubic interpolation. Therefore BFGS-type of inversion schemes may hold promise for both improved convergence and model resolution analysis in the future. To effectively sample the posterior probability distribution of model parameters, access to the square-root of H⁻¹ is required (Tarantola, 2007), which prompts the use of square-root variable metric inversion techniques (Hull and Tapley, 1977).

An apparent advantage of scattering-integral methods over adjoint methods is the ability to perform a formal resolution analysis, thanks to the availability of the entire design matrix G (Chen et al., 2007a). The effect of nonlinearity of the misfit functions will also be reflected in checkerboard tests as synthetics are calculated numerically for 3D synthetic models. However, as indicated in Section 4.4.7, more than one scattering-integral iteration may be necessary to achieve further


Fig. 14. Resolution tests for adjoint tomographic inversions of the Australasian region. Initial models are 3°×3° checkerboard patterns of various velocity perturbations (3%, 6%, 9%) superposed onto a 25°×25° high-velocity patch. The 1D background velocity model suitable for this region is given by Fig. 7 of Fichtner et al. (2009b). Resolution tests undergo the same adjoint tomographic procedures as the actual 3D inversions and recover both the high velocity patch and checkerboard patterns of three different perturbations in regions with sufficient source–receiver coverage. Courtesy of Fichtner et al. (2009b).
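An input model of the kind described in the Fig. 14 caption, a 3°×3° checkerboard superposed onto a 25°×25° high-velocity patch, can be sketched on a regular latitude/longitude grid as follows. The grid extent, the patch location and the patch amplitude are hypothetical (the caption gives only the checkerboard amplitudes), and the function name `checkerboard_model` is an illustrative assumption rather than code from Fichtner et al. (2009b).

```python
import numpy as np

def checkerboard_model(lat, lon, cell_deg=3.0, amp=0.03,
                       patch_center=(0.0, 0.0), patch_deg=25.0, patch_amp=0.05):
    """Relative velocity perturbation on a lat/lon grid (defaults are hypothetical)."""
    lon2d, lat2d = np.meshgrid(lon, lat)
    # Alternate the sign of the perturbation from one cell to the next.
    i = np.floor(lat2d / cell_deg).astype(int)
    j = np.floor(lon2d / cell_deg).astype(int)
    checker = amp * np.where((i + j) % 2 == 0, 1.0, -1.0)
    # Broad high-velocity patch superposed onto the checkerboard.
    in_patch = ((np.abs(lat2d - patch_center[0]) <= patch_deg / 2) &
                (np.abs(lon2d - patch_center[1]) <= patch_deg / 2))
    return checker + patch_amp * in_patch

# Cell-centered samples every 0.5 degrees over a 60 x 60 degree region.
lat = np.arange(-30.0, 30.0, 0.5) + 0.25
lon = np.arange(-30.0, 30.0, 0.5) + 0.25
dm = checkerboard_model(lat, lon, amp=0.06)   # 6% checkerboard perturbation
print(dm.shape, np.unique(np.round(dm, 6)))
```

In an adjoint resolution test, synthetics computed through such a model by full 3D simulation play the role of data, and the whole iterative inversion is then repeated starting from the background model. This is why one test costs about as much as one full inversion.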

misfit reduction and convergence for regions with strong velocity contrasts.

5.6. Source inversions

Because the misfit between data and synthetics is caused by the intertwined effects of unknown source and structural variations, accurate knowledge of earthquake source parameters is critical in achieving unbiased model updates in tomography iterations. Similar to classical tomographic inversions (e.g., Manners, 2008; Roecker et al., 2006), adjoint tomographic inversions may explicitly include source parameters as part of the inversion parameter set (Chen et al., 2007a). However, for most cases, alternate source and structure inversions are performed where updates in one help reduce the bias in the other (Fichtner et al., 2009b; Tape et al., 2009). Since model updates are moderate between conjugate gradient iterations, source inversions for location and/or mechanisms only need to be performed at selected iterations (Tape et al., 2010). Even though Akcelik et al. (2003) inverted both a simple finite-fault source model and structural parameters from displacement recordings of a 2D synthetic example, source-time histories of events are not inverted in most practical tomographic problems due to the severe nonlinearity of the misfit functions. Events are carefully selected based on magnitudes, and source durations are generally much smaller than the main period bands of the data used for structural inversions (Fichtner et al., 2009b; Tape et al., 2010). In practice, finite-fault inversions based on adjoint techniques (Kremers et al., 2011) can be sufficiently decoupled from structural inversions and represent simple extensions of the popular back-projection techniques for fault rupture imaging (e.g., Ishii et al., 2005; Larmat et al., 2008).

6. Adjoint tomography: discussions and future directions

Here we summarize major advantages of adjoint tomography methods. The use of accurate Green's function through 3D full numerical simulations of seismic wave propagation offers accurate kernels of selected windows on seismograms. No prior knowledge of phases related to these time windows is assumed, which allows all possible time windows (either as impulses or wave groups) to be included in adjoint inversions. Compared to classical tomography, in which only a few known phases are selected and measured, more usable windows selected for adjoint tomography effectively increase the ray-path coverage under the same source–receiver geometry. Another important consequence is that 3D models, inverted by ray- or finite-frequency-based classical tomographic techniques, can now be exploited as initial models for adjoint tomography since the simulation time of 3D models is similar to that of 1D models. Successive linearization of the inverse problem at updated models and the ability to recalculate gradients of misfit functions in every iteration are critical for the ultimate model convergence, particularly for regions with large velocity variations. Synthetics for the final models can be computed again by 3D numerical simulations, thus enabling true validation on final models.

At regional and global scales where velocity variations are moderate, hybrid quasi-Newton methods have been proposed whereby misfit functions are evaluated by accurate 3D numerical simulations and an approximate partial derivative matrix G is computed based on asymptotic methods (Lekić and Romanowicz, 2011; Yuan et al., 2011). The accurate 3D Green's functions reduce the effect of heterogeneous crust on surface waves and systematic errors associated with measurements, while the approximate G matrix exonerates the inversion from costly gradient calculations required by adjoint tomography. However, in the case of strong velocity heterogeneities, the asymptotic approximation of the G matrix may break down and a less accurate G matrix will require a significantly greater number of iterations for the final convergence. Despite the high computation cost, adjoint tomography may be the ultimate tool to produce a new generation of high-resolution images for strongly heterogeneous regions such as the crust with large Moho and/or surface topography variations (Tape et al., 2010), or tectonically active regions such as subduction zones (M. Chen et al.,


2007), hotspots (Montelli et al., 2004) and continental collision zones (M. Chen et al., 2010).

Adjoint tomography also has its own share of challenges in addition to high computational cost. Localized gradient methods require initial solutions to be sufficiently close to the ground truth so that final solutions are not trapped in local minima of model space. This condition in turn requires either well-educated initial guesses of the media or well-designed misfit functions that reside within the linear regime. Convergence can be extremely slow if the gradient scheme is not properly preconditioned. Designing an appropriate preconditioner for seismic inverse problems involving millions of model parameters can also be a challenging task, although the end result could lead to a significant reduction in the required number of iterations. Empirical damping in classical tomography is replaced by implicit regularization of the gradient vector. The sum of sensitivity kernels of residual measurements designates regions of models that require updates, rendering the inversion process stable. However, without the Hessian matrix (or its inverse), formal resolution analysis becomes difficult. A standard checkerboard test, which provides information on the resolvability of anomalies of various sizes, requires solving the full optimization problem again and is therefore computationally infeasible unless numerical simulations become much less costly with future hardware improvements (Komatitsch et al., 2009, 2010). BFGS-type inversion schemes with an approximate inverse Hessian matrix may be a viable remedy at this stage. Formal checkerboard tests and resolution analysis for adjoint tomography remain areas of active research (Fichtner and Trampert, 2011a, 2011b).

With ever-expanding computing power, applications of adjoint tomographic inversions based on full 3D numerical simulations have flourished in the past five years. They have generated, and are in the process of generating, robust and high-resolution images of various regions of the Earth's interior at local (Chen et al., 2007a; Tape et al., 2010), regional (Fichtner et al., 2009b, 2010; Peter et al., 2011b; Zhu et al., 2012) and global (Bozdağ et al., 2011b) scales. Applications to other geologically and tectonically interesting regions are also on the horizon (M. Chen et al., 2010), especially with the continued improvement in numerical techniques, computational infrastructure and inversion schemes adapted to local-gradient based inversions. Combined with geodynamical modeling, the interpretation of these improved high-resolution images will further our understanding of the temperature, composition and dynamics of the Earth's interior.

In addition, with the proliferation of both permanent (e.g., Chen et al., 2006; Gee et al., 1996; Kanamori et al., 1991; Okada et al., 2004) and transportable arrays (e.g., Levander et al., 1999; Nábelek et al., 2009; Velasco et al., 2007), the concept of adjoint tomography could be highly valuable to the analysis of various datasets in array seismology (Gu, 2010; Levander and Nolet, 2005; Rost and Thomas, 2002, 2009). Modeling of ambient noise correlation functions and scattered/converted waves of teleseismic phases are two prime examples that will benefit from the framework of adjoint tomography. The following sections briefly review these two types of measurements and discuss how they may be naturally incorporated into adjoint tomography.

6.1. Noise correlation tomography

Since the late 1990s, techniques based on the cross-correlation of ambient noise fields have been overwhelmingly successful for a broad spectrum of problems in physics, geophysics (both global and exploration-based), engineering, acoustics and medical imaging (see reviews by Weaver and Lobkis, 2004; Snieder et al., 2009). Data and applications vary substantially among the various disciplines, but the vast majority of the studies aim to complete one or both of the following tasks: 1) to extract the Green's function between two receivers by cross-correlating their noise records (Aki, 1957; Shapiro and Campillo, 2004), and 2) to locate and characterize the source of dominant noise signals (e.g., Gu et al., 2007; Sabra et al., 2005; Yang et al., 2008). When the noise sources are uniformly distributed (or ambient), it has been shown that the cross-correlation (or its derivative) of noise records at two nearby receivers is symmetric about the zero lag time and equivalent to the Green's function (Snieder, 2004). This result forms the basis for an increasing number of high-resolution, noise-based images of regional seismic structure; successful examples include California (e.g., Sabra et al., 2005; Shapiro et al., 2005), North America (e.g., Lin et al., 2008, 2010; Moschetti et al., 2007, 2010; Yang et al., 2008, 2011), China (e.g., Yao et al., 2006; Zheng et al., 2008), Northeast Asia (e.g., Cho et al., 2007; Nishida et al., 2008; Zheng et al., 2011), Africa (e.g., Yang et al., 2008), Europe (e.g., Villaseñor et al., 2007; Yang et al., 2007) and Oceania (e.g., Saygin and Kennett, 2012).

The theory behind noise correlation has undergone several stages of development since Aki published his seminal studies on the theory and auto/cross-correlation of microtremors (e.g., from traffic signals in Tokyo, Aki, 1957) over half a century ago (Aki, 1956, 1957). The concept of correlation-Green's function equivalence was reiterated through autocorrelations of plane waves at different depths in a layered medium, later known as the Claerbout conjecture (e.g., Claerbout, 1968; Rickett and Claerbout, 1999), and the presence of seismic response in the time-domain cross-correlation function was validated both in theory (Lobkis and Weaver, 2002) and in observations of unfiltered seismic coda waves (Campillo and Paul, 2003). Subsequent studies using reciprocity and scattered waves (Snieder, 2004; Wapenaar, 2004) led to increased confidence in tomographic maps under the assumption of ambient noise sources.

Technically, for a spatially uniform broadband noise distribution and uniform medium with velocity v, the time derivative of the noise correlation function (NCF) of two receivers separated by a distance L can be approximated by the sum of the time-domain Green's functions from the forward and time-reversed wavefields (Roux et al., 2005; Sabra et al., 2005; Snieder, 2004). Mathematically, this is expressible as (Sabra et al., 2005; Weaver and Lobkis, 2003),

$$\frac{dC_{ij}}{dt} = -G_{ij}(\mathbf{r}_1; \mathbf{r}_2, t) + G_{ji}(\mathbf{r}_2; \mathbf{r}_1, -t), \tag{34}$$

where $C_{ij}$ corresponds to the NCF for the i-th station and j-th component. The time-domain Green's function $G_{ij}(\mathbf{r}_1; \mathbf{r}_2, t)$ is the displacement response in direction j at location $\mathbf{r}_2$ to a unit concentrated impulse input in direction i at location $\mathbf{r}_1$ (Sabra et al., 2005). The above approximation presents an effective means to extract the Green's function based on the derivative of the correlation function, or occasionally by the correlation function between two stations (e.g., Bensen et al., 2007; Campillo and Paul, 2003; Shapiro et al., 2005). The lag times of the correlation peaks offer reliable constraints on regional shear/Rayleigh wave speeds (e.g., Bensen et al., 2007; Shapiro et al., 2005; Yang et al., 2008; Yao et al., 2006), anisotropy (e.g., Lin et al., 2009; Moschetti et al., 2010; Yao and van der Hilst, 2009; Yao et al., 2010) and attenuation (Lawrence and Prieto, 2011; Levshin et al., 2010; Lin et al., 2011) down to ∼100 km depth.

Aside from uncertainties associated with the ‘ambient noise’ assumption (e.g., Brzak et al., 2009; Stehly et al., 2006; Yang and Ritzwoller, 2008), noise correlation tomography is, in principle, no different from earthquake tomography. For this reason, both classical and adjoint tomographic approaches pertaining to earthquake observations are equally valid in the extraction of structural parameters from ambient noise. Furthermore, Tromp et al. (2010b) computed sensitivity kernels for ensemble-averaged noise correlation measurements based on the interaction of an ensemble forward wavefield and an ensemble adjoint wavefield, similar to methods detailed in Section 4.4.4. By injecting a signal with the power spectrum of the noise at the first station, the ensemble forward calculation first simulates and stores a generating field at the source region of the noise (e.g., the surface of the Earth), and then uses the generating field as virtual sources for another forward calculation. The ensemble adjoint field is simulated

by injecting the time-reversed measurement between synthetics and observed ensemble-averaged correlation functions at the second station as fictitious sources. Kernels calculated based on the interaction of these two ensemble wavefields for surface-wave cross-correlation functions sometimes display ‘extruding parabolic jets’ in addition to the typical ‘banana-doughnut’ shape (see also Section 3.2), which may be explained by sensitivities of multi-orbit surface waves to both minor and major arcs. Though the kernel calculation for ensemble-averaged noise measurements requires an extra forward simulation and the storage of the generating wavefield, it also allows the implementation of nonuniform noise sources. Therefore, tomographic inversions based on properly calculated sensitivity kernels for ensemble-averaged correlation functions may provide a greater understanding of structures embedded in the ambient noise as well as the nature of the noise sources.

6.2. Receiver functions and inverse scattering methods by passive seismic arrays

Regional-scale, high-resolution imaging has become popular in the past two decades, thanks to the proliferation of dense seismic arrays deployed around the world. In order to take advantage of dense station spacing, which varies from tens to hundreds of kilometers, seismic imaging methods have been adapted to map crust and mantle structures at passive receiver arrays. For example, traditional body-wave tomographic techniques have been modified for passive seismic array analysis by measuring multi-channel cross-correlation traveltimes of teleseismic waves between stations (e.g., Vandecar and Crosson, 1990) and attributing traveltime residuals solely to velocity anomalies underneath the observing array; surface-wave phase velocity maps are more frequently determined in combination with incident wavefields based on two- or multi-plane wave assumptions for passive seismic arrays (e.g., Forsyth et al., 1998; Friederich and Wielandt, 1995; Pollitz, 1999; Yang and Forsyth, 2006). These techniques led to hundreds of regional tomography studies and greatly improved knowledge of regional velocity structures (e.g., Chang and van der Lee, 2011; Dando et al., 2011; Hung et al., 2010; Mercier et al., 2009; Obrebski et al., 2011; Waldhauser and Tolstoy, 2011; Yang and Forsyth, 2006).

In addition to seismic tomography, amplitudes, arrival times and waveforms of reflected, converted and scattered seismic waves from interfacial or heterogeneous structures are effective measures of fine-scale structures and changes in rock elastic properties beneath seismic arrays. As stated in Section 2.4.1, inclusion and modeling of these secondary phases are a necessity, rather than a luxury, for a greater accuracy in crustal and upper-mantle imaging/inversion. The list of popular secondary phases that may benefit from seismic arrays includes ScS reverberations (e.g., Gaherty and Jordan, 1995; Revenaugh and Jordan, 1989), SS and PP precursors (Shearer, 1991; Shearer and Masters, 1992), P-to-S/S-to-P converted waves (Langston, 1977; Vinnik, 1977) and P/S wave codas (Langston, 1989; Langston and Hammer, 2001). The first two phase groups are particularly well suited for global mapping of mantle stratification (see Section 2.6; Deuss, 2009; Flanagan et al., 1998; Gu et al., 1998; Shearer, 1993), though their dominant frequencies (<0.1 Hz) are less than ideal for resolving structural variations with characteristic length scales less than 300 km (Neele et al., 1997). This section will emphasize the key properties and future prospects of imaging based on scattered and converted waves, a phase group that exhibits broad frequency spectra and strong sensitivities to both crust and mantle structural variations at variable length scales.

Studies of converted waves have greatly benefited from the receiver-function technique, which was introduced in the late 1970s (e.g., Langston, 1977; Vinnik, 1977) and gradually became a routine method in seismic array analysis (e.g., Chen et al., 2009; Levander et al., 2005; Rondenay, 2009). Converted teleseismic P waves at the Moho (such as the Ps phase) and later surface reverberations and multiple converted waves (such as PpPs, PpSs+PsPs, etc.) are primarily observed on the radial component. Therefore, in principle, deconvolution of the vertical (Z) from the radial (R) component will remove the source signature and isolate the receiver function response due to mode conversions; a rotation from Z–R to the P–SV ray coordinate system is often applied, and the SV component is subsequently deconvolved from the P component to obtain the receiver function. In reality, P coda waves (i.e., vertical receiver functions) may also contain information about the subsurface scatterers (e.g., Langston and Hammer, 2001; Pavlis, 2005), and the effective removal of source signature remains an open-ended topic of discussion (e.g., Li and Nábelek, 1999; Langston and Hammer, 2001; Bostock, 2004; Escalante et al., 2007; C.W. Chen et al., 2010).

Based on a similar approach, teleseismic S receiver functions can be calculated for the analysis of S-to-P conversions below station (or source) locations (Farra and Vinnik, 2000; Vinnik and Farra, 2002; Wilson et al., 2006). Unlike P-receiver functions, S-receiver functions are not prone to interference from multiple P waves, and hence are highly informative and complementary in mapping mantle stratifications such as the lithosphere–asthenosphere boundaries (LAB; e.g., Rychert and Shearer, 2011; Rychert et al., 2005) and transition zone discontinuities. However, S-receiver functions can be extracted only for a limited range of epicentral distances (Yuan et al., 2006), and therefore provide less data than P-receiver functions.

Local crustal thickness and Poisson's ratio can be estimated from receiver functions at individual stations (e.g., Niu and James, 2002; Yan and Clayton, 2007; Zhu and Kanamori, 2000), while 2D/3D subsurface image sections can be constructed through migration and stacking in an array setting (e.g., Dueker and Sheehan, 1997; Levander et al., 2005; Wilson and Aster, 2003). Transverse component seismograms could also be utilized to constrain the seismic anisotropy in the crust and mantle (Levin et al., 2008). In addition to arrival times and amplitude information, receiver functions may be inverted for physical structural parameters (velocity and density perturbations) based on inverse scattering techniques widely used in exploration seismology, e.g., the generalized Radon transform (e.g., Bostock and Rondenay, 1999; Bostock et al., 2001; Rondenay et al., 2005; Shragge et al., 2001).

Other minor phases surrounding the main arrivals, e.g., coda waves and precursors, provide further information on potential scatterers and interfaces at depth. For instance, van der Hilst et al. (2007) and Wang et al. (2008) used scattered ScS precursor and SKKS coda waves to detect multiple piecewise continuous interfaces in the D″ layer beneath Central and North America, while Cao et al. (2011) and Gu et al. (2012) used SS precursors to image the upper-mantle discontinuities beneath the central and western Pacific Ocean, respectively.

Adjoint tomography can be readily adapted to robust high-resolution imaging of localized structures based on converted or scattered waves. Isochrons (e.g., Bostock et al., 2001) may be replaced by sensitivity kernels that are constructed based on the interaction of the forward teleseismic wavefield corresponding to the main phase and the adjoint wavefield generated by time-reversing converted or scattered waves. Constructing the forward teleseismic wavefield at the receiver region by propagating the field from the earthquake source to the receiver array is too costly, as numerical simulations of wave propagation accurate to typical P (1–2 s) and S (3–6 s) receiver-function periods are far from practical (Komatitsch et al., 2010). However, if one assumes that scattered waves are only caused by local structures directly underneath receivers (for receiver functions) or reflection points (for PP and SS reverberations), then a significantly smaller section of the teleseismic wavefield centered on the area of imaging interest would be needed in the kernel computations. Hybrid methods can be applied to simulate wave propagation in the 3D local media in which initial and boundary conditions are exchanged with a rapidly computed forward wavefield for 1D background models (Bielak and Christiano, 1984; Bielak et al., 2003; Bouchon and Sanchez-Sesma, 2007; Godinho et al., 2009; Moczo et al., 1997; Opršal et al., 2009; Robertsson and Chapman,


2000). For receiver function studies of local arrays where the incident teleseismic wavefield can be well approximated by plane waves (Liu and Chen, 2011), the forward wavefield for a 1D background medium can be easily computed by semi-analytical methods for layered media (Aki and Richards, 2002; Zhu and Rivera, 2002). On the other hand, fast and accurate semi-analytical methods for 1D global background models have to be employed for large aperture arrays where the flat-earth approximation is no longer valid (Chevrot et al., 2011).

Similar to adjoint tomographic methods in global and regional studies, the advantages of inverting converted/scattered waves based on adapted adjoint tomography for passive arrays are many. First, the use of full 3D numerical simulations, i.e., accurate solutions to the full wave equations with plane wave or teleseismic wave inputs, enables the incorporation of all possible phases into the same time windows. For example, when interfaced with 1D global models, sensitivity kernels for time windows including scattered and converted P waves properly account for the incoming multiple P phases, and their use thereby reduces artifacts in the interpretation of reflecting interfaces and scatterers. Furthermore, these sensitivity kernels are three-dimensional by construction and, as discussed in Section 4.1, not only exhibit a similar effect as Kirchhoff migration in the standard receiver function analysis (e.g., Levander et al., 2005), but also provide directions of actual physical model updates as in the inverse scattering techniques (e.g., Bostock et al., 2001; Rondenay et al., 2005). In addition, based on hybrid methods, sensitivity kernels can also be computed for 3D background models produced by previous tomographic studies based on measurements from local earthquakes as well as noise/multi-channel cross correlations. With 3D background models, interfaces may be mapped to more precise depths and the resulting image may appear sharper. Other advantages such as model iterations and validations are also carried over.

In short, adjoint tomography for converted, reflected and scattered waves at passive arrays may become a vital tool in high-resolution imaging of tectonically and geodynamically interesting regions in the near future.

7. Conclusions

Seismic tomography has made significant strides since its practical applications in the late 1970s. The increasingly detailed images of the crust and mantle greatly improved our knowledge of the Earth's internal structures and dynamics, as well as their potential connections to surface . The continued improvement in image resolution can be largely attributed to advances in instrumentation, proliferation of permanent regional seismic networks, and numerous deployments of transportable seismic arrays. While uneven spatial data coverage remains a major challenge in further improving seismic imaging resolution, theoretical and computational advances, the focus of this review, made it possible to interpret the fast-expanding datasets with greater accuracy. Among the various classical and current methods, we highlighted adjoint tomography as a promising tool for a new generation of regional and global imaging efforts. By taking advantage of accurate 3D forward simulations, this inversion method could resolve strongly heterogeneous structures with improved resolution.

The future of seismic imaging also rests on our abilities to develop and promote innovative approaches to extract more information from seismic recordings and reliably interpret them. While only a handful of such new and exciting datasets were highlighted in this review, it should be recognized that inversions based on a wide range of data could benefit from the framework of adjoint tomography and 3D numerical simulations. The outcomes of these inversions could provide further insights into regional and global processes at resolution and precision far exceeding those achievable by standard tomographic techniques.

Acknowledgment

First and foremost, we would like to thank Timothy Horscroft and Hans Thybo from Tectonophysics for the invitation to contribute to this review. They have shown extraordinary patience and understanding in the past 1.5 years during the drafting and reviewing of this manuscript. We thank Po Chen and Andreas Fichtner for their detailed and encouraging reviews. We thank Carl Tape and Andreas Fichtner for kindly providing some of the figures. We also would like to thank Jeroen Tromp, Guy Masters, Shu-huei Hung, Yingjie Yang, Dimitri Komatitsch, Li Zhao, Chin-wu Chen, Ling-yun Chiao, Huajian Yao, Adam Dziewonski and the late Albert Tarantola for their inspiring and fruitful discussions. Many figures were originally made using the GMT software (Wessel and Smith, 1991). Q. Liu is supported by the Discovery Grants of the Natural Sciences and Engineering Research Council of Canada (NSERC) and the University of Toronto Startup Fund. Y. J. Gu is currently supported by NSERC and Alberta Geological Survey.

References

Acton, F.S., 1997. Numerical Methods that Usually Work. The Mathematical Association of America.
Akcelik, V., Biros, G., Ghattas, O., 2002. Parallel multiscale Gauss–Newton–Krylov methods for inverse wave propagation. Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pp. 1–15.
Akcelik, V., Bielak, J., Epanomeritakis, I., Biros, G., Ghattas, O., Fernandez, A., Kim, E.J., Lopez, J., O'Hallaron, D., Tu, T., Urbanic, J., 2003. High resolution forward and inverse earthquake modeling on terascale computers. Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, p. 52.
Aki, K., 1956. Correlogram analyses of seismograms by means of a simple automatic computer. Journal of Physics of the Earth 4, 71–79.
Aki, K., 1957. Space and time spectra of stationary stochastic waves with special reference to microtremors. Bulletin of the Earthquake Research Institute 35, 415–456.
Aki, K., 1968. Seismological evidence for the existence of soft thin layers in the upper mantle beneath Japan. Journal of Geophysical Research 73, 585–594.
Aki, K., Lee, W.H.K., 1976. Determination of three-dimensional velocity anomalies under a seismic array using first P arrival times from local earthquakes, 1. A homogeneous initial model. Journal of Geophysical Research 81, 4381–4399.
Aki, K., Richards, P.G., 2002. Quantitative Seismology, 2nd edition. University Science Books.
Albarede, F., van der Hilst, R.D., 2002. Zoned mantle convection. Philosophical Transactions of the Royal Society of London, Series A 360, 2569–2592.
Anderson, D.L., 1961. Elastic wave propagation in layered anisotropic media. Journal of Geophysical Research 66, 2953–2963.
Anderson, D.L., 1966. Recent evidence concerning the structure and composition of the Earth's mantle. Physics and Chemistry of the Earth 6, 1–131.
Anderson, D.L., 1989. Theory of the Earth. Blackwell Scientific Publications.
Anderson, D.L., Dziewonski, A.M., 1982. Upper mantle anisotropy: evidence from free oscillation. Geophysical Journal of the Royal Astronomical Society 69, 383–404.
Ando, M., Ishikawa, Y., Wada, H., 1980. S-wave anisotropy in the upper mantle under a volcanic area in Japan. Nature 286, 43–46.
Antolik, M., Gu, Y.J., Ekström, G., Dziewonski, A.M., 2003. J362D28: a new joint model of compressional and shear velocity in the Earth's mantle. Geophysical Journal International 153, 443–466.
Askan, A., Akcelik, V., Bielak, J., Ghattas, O., 2007. Full waveform inversion for seismic velocity and anelastic losses in heterogeneous structures. Bulletin of the Seismological Society of America 97, 1990–2008.
Aster, R.C., Borchers, B., Thurber, C.H., 2005. Parameter Estimation and Inverse Problems. Elsevier Academic Press.
Backus, G.E., 1962. Long-wave elastic anisotropy produced by horizontal layering. Journal of Geophysical Research 67, 4427–4440.
Backus, G.E., 1964. Geographical interpretation of measurements of average phase velocities of surface waves over great circular and great semi-circular paths. Bulletin of the Seismological Society of America 54, 571–610.
Backus, G.E., Gilbert, F., 1967. Numerical applications of a formalism for geophysical inverse problems. Geophysical Journal of the Royal Astronomical Society 13, 247–276.
Backus, G.E., Gilbert, F., 1968. The resolving power of gross earth data. Geophysical Journal of the Royal Astronomical Society 16, 169–205.
Backus, G.E., Gilbert, F., 1970. Uniqueness in the inversion of inaccurate gross earth data. Philosophical Transactions of the Royal Society of London, Series A 266, 123–192.
Bai, C.Y., Kennett, B.L.N., 2000. Automatic phase-detection and identification by full use of a single three-component broadband seismogram. Bulletin of the Seismological Society of America 90, 187–198.
Bao, H., Bielak, J., Ghattas, O., Kallivokas, L.F., O'Hallaron, D.R., Shewchuk, J.R., Xu, J., 1998. Large-scale simulation of elastic wave propagation in heterogeneous media on parallel computers. Computer Methods in Applied Mechanics and Engineering 152, 85–102.
Bassin, C., Laske, G., Masters, G., 2000. The current limits of resolution for surface wave tomography in North America. EOS Transactions AGU 81, F897.
Becker, T.W., Boschi, L., 2002. A comparison of tomographic and geodynamic mantle models. Geochemistry, Geophysics, Geosystems 3. http://dx.doi.org/10.1029/2001GC000168.


Becker, T.W., Kellogg, J.B., Ekström, G., O'Connell, R.J., 2003. Comparison of azimuthal seismic anisotropy from surface waves and finite strain from global mantle-circulation models. Geophysical Journal International 155, 696–714.
Beghein, C., Trampert, J., 2004. Probability density functions for radial anisotropy: implications for the upper 1200 km of the mantle. Earth and Planetary Science Letters 217, 151–162.
Bensen, G.D., Ritzwoller, M.H., Barmin, M.P., Levshin, A.L., Lin, F.C., Moschetti, M.P., Shapiro, N.M., Yang, Y., 2007. Processing seismic ambient noise data to obtain reliable broad-band surface wave dispersion measurements. Geophysical Journal International 169, 1239–1260.
Bhattacharyya, J., Masters, G., Shearer, P.M., 1996. Global lateral variations of shear wave attenuation in the upper mantle. Journal of Geophysical Research 101, 22,273–22,289.
Bielak, J., Christiano, P., 1984. On the effective seismic input for non-linear soil-structure interaction systems. Earthquake Engineering and Structural Dynamics 12, 107–119.
Bielak, J., Loukakis, K., Hisada, Y., Yoshimura, C., 2003. Domain reduction method for three-dimensional earthquake modeling in localized regions, part I: theory. Bulletin of the Seismological Society of America 93, 817–824.
Bijwaard, H., Spakman, W., 1999. Tomographic evidence for a narrow whole mantle plume below Iceland. Earth and Planetary Science Letters 166, 121–126.
Bijwaard, H., Spakman, W., 2000. Non-linear global P-wave tomography by iterated linearized inversion. Geophysical Journal International 141, 71–82.
Bijwaard, H., Spakman, W., Engdahl, E.R., 1998. Closing the gap between regional and global travel time tomography. Journal of Geophysical Research 103, 30,055–30,078.
Bodin, T., Sambridge, M.S., Tkalčić, H., Arroucau, P., Gallagher, K., Rawlinson, N., 2012. Transdimensional inversion of receiver functions and surface wave dispersion. Journal of Geophysical Research 117. http://dx.doi.org/10.1029/2011JB008560.
Bolton, H., Masters, G., 1991. Long period absolute P times and lower mantle structure. EOS Transactions AGU 72, 339.
Boschi, L., Dziewonski, A.M., 1999. High- and low-resolution images of the Earth's mantle: implications of different approaches to tomographic modeling. Journal of Geophysical Research 104, 25,567–25,594.
Boschi, L., Dziewonski, A.M., 2000. Whole Earth tomography from delay times of P, PcP and PKP phases: Lateral heterogeneities in the outer core or radial anisotropy in the mantle. Journal of Geophysical Research 105, 13,675–13,696.
Boschi, L., Ekström, G., 2002. New images of the Earth's upper mantle from measurements of surface wave phase velocity anomalies. Journal of Geophysical Research 107. http://dx.doi.org/10.1029/2000JB000059.
Boschi, L., Ekström, G., Kustowski, B., 2004. Multiple resolution surface wave tomography: the Mediterranean basin. Geophysical Journal International 157, 293–304.
Boschi, L., Ampuero, J.P., Peter, D., Mai, P., Soldati, G., Giardini, D., 2007. Petascale computing and resolution in global seismic tomography. Physics of the Earth and Planetary Interiors 163, 245–250.
Bostock, M.G., 2004. Green's functions, source signatures, and the normalization of teleseismic wave fields. Journal of Geophysical Research 109. http://dx.doi.org/10.1029/2003JB002783.
Bostock, M.G., Rondenay, S., 1999. Migration of scattered teleseismic body waves. Geophysical Journal International 137, 732–746.
Bostock, M.G., Rondenay, S., Shragge, J., 2001. Multiparameter two-dimensional inversion of scattered teleseismic body waves, 1. Theory for oblique incidence. Journal of Geophysical Research 106, 30,771–30,782.
Bouchon, M., Sanchez-Sesma, F.J., 2007. Boundary integral equations and boundary elements methods in elastodynamics. Advances in Geophysics 48, 157–189.
Bozdağ, E., Trampert, J., 2008. On crustal corrections in surface wave tomography. Geophysical Journal International 172, 1066–1082.
Bozdağ, E., Trampert, J., Tromp, J., 2011a. Misfit functions for full waveform inversion based on instantaneous phase and envelope measurements. Geophysical Journal International 185, 845–870.
Bozdağ, E., Zhu, H.J., Peter, D., Tromp, J., 2011b. Towards global adjoint tomography, abstract S41A-2167. 2011 Fall Meeting. AGU.
Brenders, A.J., Pratt, R.G., 2007. Full waveform tomography for lithospheric imaging: results from a blind test in a realistic crustal model. Geophysical Journal International 168, 133–151.
Bretherton, F.P., 1968. Propagation in slowly varying waveguides. Proceedings of the Royal Society of London Series A 302, 555–576.
Brune, J., 1966. P and S wave travel times and spheroidal normal modes of a homogeneous sphere. Journal of Geophysical Research 71, 2959–2965.
Brune, J., Espinosa, A., Oliver, J., 1963. Relative excitation of surface waves by earthquakes and underground explosions in the California–Nevada region. Journal of Geophysical Research 68, 3501–3513.
Brzak, K., Gu, Y.J., Ökeler, A., Steckler, M., Lerner-Lam, A., 2009. Migration imaging and forward modeling of microseismic noise sources near southern Italy. Geochemistry, Geophysics, Geosystems 10. http://dx.doi.org/10.1029/2008GC02234.
Bunge, H.P., Hagelberg, C.R., Travis, B.J., 2003. Mantle circulation models with variational data assimilation: inferring past mantle flow and structure from plate motion histories and seismic tomography. Geophysical Journal International 152, 280–301.
Byrd, R.H., Lu, P.H., Nocedal, J., Zhu, C.Y., 1995. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing 16, 1190–1208.
Campillo, M., Paul, A., 2003. Long-range correlations in the diffuse seismic coda. Science 299, 547–549.
Cao, A.M., Romanowicz, B., 2004. Constraints on density and shear velocity contrasts at the inner core boundary. Geophysical Journal International 157, 1146–1151.
Capdeville, Y., 2005. An efficient Born normal mode method to compute sensitivity kernels and synthetic seismograms in the Earth. Geophysical Journal International 163, 639–646.
Capdeville, Y., Marigo, J.J., 2007. Second order homogenization of the elastic wave equation for non-periodic layered media. Geophysical Journal International 170, 823–838.
Capdeville, Y., Vilotte, J.P., Montagner, J.P., 2003. Coupling the spectral element method with a modal solution for elastic wave propagation in global Earth models. Geophysical Journal International 152, 34–67.
Capdeville, Y., Romanowicz, B., Gung, Y., 2005. Towards global earth tomography using the spectral element method: a technique based on source stacking. Geophysical Journal International 162, 541–554.
Cara, M., Leveque, J.J., 1987. Waveform inversion using secondary observables. Geophysical Research Letters 14, 1046–1049.
Carcione, J.M., 1994. The wave equation in generalized coordinates. Geophysics 59, 1911–1919.
Carcione, J.M., Helle, H.B., Seriani, G., Plasencia Linares, M.P., 2005. Simulation of seismograms in a 2-D viscoelastic Earth by pseudospectral methods. Geofisica International 44, 123–142.
Carlson, R.L., Christensen, N.I., 1979. Velocity anisotropy in semi-indurated calcareous deep sea sediments. Journal of Geophysical Research 84, 205–211.
Casarotti, E., Komatitsch, D., Stupazzini, M., Lee, S.J., Tromp, J., Piersanti, A., 2008. CUBIT and seismic wave propagation based upon the Spectral-Element Method: an advanced unstructured mesher for complex 3D geological media. Proceedings, 16th International Meshing Roundtable. Springer, pp. 1–14.
Castro, C.E., Käser, M., 2010. Seismic waves in heterogeneous material: subcell resolution of the discontinuous Galerkin method. Geophysical Journal International 182, 250–264.
Chaljub, E., Valette, B., 2004. Spectral element modelling of three-dimensional wave propagation in a self-gravitating Earth with an arbitrarily stratified outer core. Geophysical Journal International 158, 131–141.
Chaljub, E., Komatitsch, D., Vilotte, J.P., Capdeville, Y., Valette, B., Festa, G., 2007. Spectral-element analysis in seismology. Advances in Geophysics 48, 365–418.
Chambat, F., Valette, B., 2001. Mean radius, mass, and inertia for reference Earth models. Physics of the Earth and Planetary Interiors 124, 237–253.
Chang, S.J., van der Lee, S., 2011. Mantle plumes and associated flow beneath Arabia and East Africa. Earth and Planetary Science Letters 302, 448–454.
Chang, S.J., van der Lee, S., Flanagan, M.P., Bedle, H., Marone, F., Matzel, E.M., Pasyanos, M.E., Rodgers, A.J., Romanowicz, B., Schmid, C., 2010. Joint inversion for three-dimensional S velocity mantle structure along the Tethyan margin. Journal of Geophysical Research 115. http://dx.doi.org/10.1029/2009JB007204.
Chen, P., 2011. Full-wave seismic data assimilation: theoretical background and recent advances. Pure and Applied Geophysics 168, 1527–1552.
Chen, W.P., Brudzinski, M.R., 2003. Seismic anisotropy in the mantle transition zone beneath Fiji-Tonga. Geophysical Research Letters 30. http://dx.doi.org/10.1029/2002GL016330.
Chen, P., Zhao, L., Jordan, T.H., 2005. Finite-moment tensor of the 3 September 2002 Yorba Linda earthquake. Bulletin of the Seismological Society of America 95, 1170–1180.
Chen, Q.F., Chen, Y., Li, L., 2006. China digital seismic network improves coverage and quality. EOS Transactions AGU 87, 4–7.
Chen, M., Tromp, J., Helmberger, D.V., Kanamori, H., 2007. Waveform modeling of the slab beneath Japan. Journal of Geophysical Research 112. http://dx.doi.org/10.1029/2006JB004394.
Chen, P., Zhao, L., Jordan, T.H., 2007a. Full 3D tomography for the crustal structure of the Los Angeles region. Bulletin of the Seismological Society of America 97, 1094–1120.
Chen, P., Zhao, L., Jordan, T.H., 2007b. Full three-dimensional tomography: a comparison between the scattering-integral and adjoint-wavefield methods. Geophysical Journal International 170, 175–181.
Chen, C.W., Rondenay, S., Evans, R.L., Snyder, D.B., 2009. Geophysical detection of relict metasomatism from an Archean (3.5 Ga) subduction zone. Science 326, 1089–1091.
Chen, C.W., Miller, D.E., Djikpesse, H.A., Haldorsen, J.B.U., Rondenay, S., 2010a. Array-conditioned deconvolution of multiple-component teleseismic recordings. Geophysical Journal International 182, 967–976.
Chen, M., Huang, H., Yao, H., van der Hilst, R.D., 2010b. Adjoint tomography using Green's functions from ambient noise, Abstract S31B-07. 2010 Fall Meeting. AGU.
Chen, P., Jordan, T.H., Lee, E.J., 2010c. Perturbation kernels for generalized seismological data functionals (GSDF). Geophysical Journal International 183, 869–883.
Chernov, L.A., 1960. Wave Propagation in a Random Medium. McGraw-Hill.
Chevrot, S., Zhao, L., 2007. Multiscale finite-frequency Rayleigh wave tomography of the Kaapvaal craton. Geophysical Journal International 169, 201–215.
Chevrot, S., Monteiller, V., Komatitsch, D., Fuji, N., Martin, R., 2011. A hybrid technique for 3-D waveform modeling and inversion of high frequency teleseismic body waves, Abstract S11D-05. 2011 Fall Meeting. AGU.
Chiao, L.Y., Kuo, B.Y., 2001. Multiscale seismic tomography. Geophysical Journal International 145, 517–527.
Chiao, L.Y., Fang, H.Y., Gung, Y., Chang, Y.H., Hung, S.H., 2010. Comparative appraisal of linear inverse models constructed via distinctive parameterizations (comparing distinctly inverted models). Journal of Geophysical Research 115. http://dx.doi.org/10.1029/2009JB006867.
Cho, K.H., Herrmann, R.B., Ammon, C.J., Lee, K., 2007. Imaging the upper crust of the Korean Peninsula by surface-wave tomography. Bulletin of the Seismological Society of America 97, 198–207.
Cao, Q., van der Hilst, R.D., de Hoop, M.V., Shim, S.H., 2011. Seismic imaging of transi-
Chou, C.W., Brooker, J.R., 1979.
A Backus–Gilbert approach to inversion of travel-time tion zone discontinuities suggests hot mantle west of Hawaii. Science 332, data for three-dimensional velocity structure. Geophysical Journal of the Royal 1068–1071. Astronomical Society 59, 325–344.






Aspherical earth structure from Levshin, A.L., Yang, X.N., Barmin, M.P., Ritzwoller, M.H., 2010. Midperiod Rayleigh wave fundamental spheroidal-mode data. Nature 298, 609–613. attenuation model for Asia. Geochemistry, Geophysics, Geosystems 11. http:// Masters, G., Laske, G., Bolton, H., Dziewonski, A.M., 2000. The relative behavior of shear dx.doi.org/10.1029/2010GC003164. velocity, bulk sound speed, and compressional velocity in the mantle: implications Li, X.Q., Nfibelek, J.L., 1999. Deconvolution of teleseismic body waves for enhancing for chemical and thermal structure. Earth's Deep Interior: Mineral Physics and structure beneath a array. Bulletin of the Seismological Society of Tomography from the Atomic to the Global Scale: Geophysical monograph, 117. America 89, 190–201. AGU, pp. 63–87. Li, X.D., Romanowicz, B., 1995. Comparison of global waveform inversions with and Mégnin, C., Romanowicz, B., 2000. A comparison between tomographic and without considering cross-branch modal coupling. Geophysical Journal Interna- geodynamic models of the Earth's mantle. History and Dynamics of plate Motions. tional 121, 695–709. : Geophysical Monograph, 121. AGU, pp. 257–276. Li, X.D., Romanowicz, B., 1996. Global mantle shear velocity model developed using Meier, U., Curtis, A., Trampert, J., 2007a. Fully nonlinear inversion of fundamental mode nonlinear asymptotic coupling theory. Journal of Geophysical Research 101, surface waves for a global crustal model. Geophysical Research Letters 34. http:// 22,245–22,272. dx.doi.org/10.1029/2007GL030989. Li, X.D., Tanimoto, T., 1993. Waveforms of long-period body waves in a slightly aspher- Meier, U., Curtis, A., Trampert, J., 2007b. Global crustal thickness from neural network ical earth model. Geophysical Journal International 112, 92–102. inversion of surface wave data. Geophysical Journal International 169, 706–722. Lin, F.C., Moschetti, M.P., Ritzwoller, M.H., 2008. Surface wave tomography of the Menke, W., 1984. 
Geophysical Data Analysis: Discrete Inverse Theory. Academic Press. western United States from ambient seismic noise: Rayleigh and Love wave Menke, W., Schaff, D., 2004. Absolute earthquake locations with differential data. phase velocity maps. Geophysical Journal International 173, 281–298. Bulletin of the Seismological Society of America 94, 2254–2264. Lin, F.C., Ritzwoller, M.H., Snieder, R., 2009. Eikonal tomography: surface wave tomog- Mercier, J.P., Bostock, M.G., Cassidy, J.F., Dueker, K., Gaherty, J.B., Garnero, E.J., raphy by phase front tracking across a regional broad-band seismic array. Revenaugh, J., Zandt, G., 2009. Body-wave tomography of western Canada. Geophysical Journal International 177, 1091–1910. Tectonophysics 475, 480–492. Lin, F.C., Ritzwoller, M.H., Yang, Y., Moschetti, M.P., Fouch, M.J., 2010. Complex and var- Metropolis, N., Ulam, S., 1949. The Monte Carlo method. Journal of the American Statistical iable crustal and uppermost mantle seismic anisotropy in the western United Association 44, 335–341. States. Nature Geoscience 4, 55–61. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of Lin, F.C., Ritzwoller, M.H., Shen, W.S., 2011. On the reliability of attenuation measure- state calculation by fast computing machines. The Journal of Chemical Physics 21, ments from ambient noise cross-correlations. Geophysical Research Letters 38. 1087–1092. http://dx.doi.org/10.1029/2011GL047366. Mewes, A., Kulessa, B., McKinley, J.D., Binley, A.M., 2010. Anisotropic seismic inversion Liu, Q., 2009. A database of global finite-frequency sensitivity kernels based upon using a multigrid Monte Carlo approach. Geophysical Journal International 183, spectral-element simulations. Abstract S33B-1761: Eos Trans. AGU, 90. 267–276.

Mitchell, B.J., Romanowicz, B. (Eds.), 1998. Q of the Earth: Global, Regional, and Laboratory Studies: PAGEOPH, vol. 153. Birkhauser Verlag, Basel.
Moczo, P., Bystricky, E., Kristek, J., Carcione, J.M., Bouchon, M., 1997. Hybrid modeling of P–SV seismic motion at inhomogeneous viscoelastic topographic structures. Bulletin of the Seismological Society of America 87, 1305–1323.
Moczo, P., Robertsson, J.O.A., Eisner, L., 2007. The finite-difference time-domain method for modeling of seismic wave propagation. Advances in Geophysics 48. http://dx.doi.org/10.1016/S0065-2687(06)48008-0.
Montagner, J.P., 1994. Can seismology tell us anything about convection in the mantle. Reviews of Geophysics 32, 115–137.
Montagner, J.P., Nataf, H.C., 1986. A simple method for inverting the azimuthal anisotropy of surface waves. Journal of Geophysical Research 91, 511–520.
Montagner, J.P., Tanimoto, T., 1990. Global anisotropy in the upper mantle inferred from the regionalization of phase velocities. Journal of Geophysical Research 95, 4797–4819.
Montelli, R., Nolet, G., Dahlen, F.A., Masters, G., Engdahl, E.R., Hung, S.H., 2004. Finite-frequency tomography reveals a variety of plumes in the mantle. Science 303, 338–343.
Montelli, R., Nolet, G., Dahlen, F.A., 2006a. Comment on ‘Banana-doughnut kernels and mantle tomography’ by van der Hilst and de Hoop. Geophysical Journal International 167, 1204–1210.
Montelli, R., Nolet, G., Dahlen, F.A., Masters, G., 2006b. A catalogue of deep mantle plumes: new results from finite-frequency tomography. Geochemistry, Geophysics, Geosystems 7. http://dx.doi.org/10.1029/2006GC001248.
Moorkamp, M., Jones, A.G., Eaton, D.W., 2007. Joint inversion of teleseismic receiver functions and magnetotelluric data using a genetic algorithm: are seismic velocities and electrical conductivities compatible? Geophysical Research Letters 34. http://dx.doi.org/10.1029/2007GL030519.
Moorkamp, M., Jones, A.G., Fishwick, S., 2010. Joint inversion of receiver functions, surface wave dispersion, and magnetotelluric data. Journal of Geophysical Research 115. http://dx.doi.org/10.1029/2009JB006369.
Mora, P., 1987. Nonlinear two-dimensional elastic inversion of multioffset seismic data. Geophysics 52, 1211–1228.
Mora, P., 1988. Elastic wave-field inversion of reflection and transmission data. Geophysics 53, 750–759.
Morelli, A., Dziewonski, A.M., 1987. Topography of the core–mantle boundary and lateral homogeneity of the liquid core. Nature 325, 678–683.
Mosca, I., 2010. Probabilistic tomography using body wave, normal-mode and surface wave data. Ph.D. thesis. Utrecht University.
Mosca, I., Trampert, J., 2009. Path-average kernels for long wavelength traveltime tomography. Geophysical Journal International 177, 639–650.
Moschetti, M.P., Ritzwoller, M.H., Shapiro, N.M., 2007. Surface wave tomography of the western United States from ambient seismic noise: Rayleigh wave group velocity maps. Geochemistry, Geophysics, Geosystems 8. http://dx.doi.org/10.1029/2007GC001655.
Moschetti, M.P., Ritzwoller, M.H., Lin, F., Yang, Y., 2010. Seismic evidence for widespread western-US deep-crustal deformation caused by extension. Nature 464, 885–889.
Mosegaard, K., 1998. Resolution analysis of general inverse problems through inverse Monte Carlo sampling. Inverse Problems 14, 405–426.
Mosegaard, K., Tarantola, A., 1995. Monte Carlo sampling of solutions to inverse problem. Journal of Geophysical Research 100, 12,431–12,447.
Mosegaard, K., Tarantola, A., 2002. Probabilistic approach to inverse problems. International handbook of earthquake and engineering seismology, vol. 81A, pp. 237–265.
Nábelek, J., Hetényi, G., Vergne, J., Sapkota, S., Kafle, B., Jiang, M., Su, H.P., Chen, J., Huang, B.S., 2009. Underplating in the Himalaya-Tibet collision zone revealed by the Hi-CLIMB experiment. Science 325, 1371–1374.
Nakanishi, I., Anderson, D.L., 1982. Worldwide distribution of group velocity of mantle Rayleigh wave as determined by spherical harmonic inversion. Bulletin of the Seismological Society of America 72, 1185–1194.
Nakanishi, I., Anderson, D.L., 1984. Measurements of mantle wave velocities and inversion for lateral heterogeneity and anisotropy – II. Analysis by the single-station method. Geophysical Journal of the Royal Astronomical Society 78, 573–617.
Nataf, H.C., Nakanishi, I., Anderson, D.L., 1984. Anisotropy and shear-velocity heterogeneities in the upper mantle. Geophysical Research Letters 11, 109–112.
Neele, F., de Regt, H., Vandecar, J., 1997. Gross errors in upper-mantle discontinuity topography from underside reflection data. Geophysical Journal International 129, 194–204.
Nettles, M., Dziewonski, A.M., 2008. Radially anisotropic shear velocity structure of the upper mantle globally and beneath North America. Journal of Geophysical Research 113. http://dx.doi.org/10.1029/2006JB004819.
Nishida, K., Kawakatsu, H., Obara, K., 2008. Three-dimensional crustal S wave velocity structure in Japan using microseismic data recorded by Hi-net tiltmeters. Journal of Geophysical Research 113. http://dx.doi.org/10.1029/2007JB005395.
Nishimura, C.E., Forsyth, D.W., 1989. The anisotropic structure of the upper mantle in the Pacific. Geophysical Journal 96, 203–229.
Nissen-Meyer, T., Dahlen, F.A., Fournier, A., 2007a. Spherical-earth Fréchet sensitivity kernels. Geophysical Journal International 168, 1051–1066.
Nissen-Meyer, T., Fournier, A., Dahlen, F.A., 2007b. A two-dimensional spectral-element method for computing spherical-earth seismograms – I. Moment-tensor source. Geophysical Journal International 168, 1067–1092.
Nissen-Meyer, T., Fournier, A., Dahlen, F.A., 2008. A 2-D spectral-element method for computing spherical-earth seismograms – II. Waves in solid–fluid media. Geophysical Journal International 174, 873–888.
Niu, F.L., James, D.E., 2002. Fine structure of the lowermost crust beneath the Kaapvaal craton and its implications for crustal formation and evolution. Earth and Planetary Science Letters 200, 121–130.
Nocedal, J., Wright, S.J., 2006. Numerical Optimization, 2nd edition. Springer.
Nolet, G., 1990. Partitioned waveform inversion and two-dimensional structure under the network of autonomously recording seismographs. Journal of Geophysical Research 95, 8499–8512.
Nolet, G., 1993. Solving large linearized tomographic problems. In: Iyer, H.M., Hirahara, K. (Eds.), Seismic Tomography: Theory and Practice. Chapman & Hall, London, pp. 227–247.
Nolet, G., 2008. A Breviary of Seismic Tomography: Imaging the Interior of the Earth and Sun. Cambridge University Press.
Nolet, G., Dahlen, F.A., Montelli, R., 2005. Traveltimes and amplitudes of seismic waves: a re-assessment. In: Levander, A., Nolet, G. (Eds.), Seismic Earth: Array Analysis of Broadband Seismograms. AGU, pp. 37–47.
Nowack, R.L., 1990. Tomography and the Herglotz–Wiechert inverse formulation. Pure and Applied Geophysics 133, 305–315.
Obayashi, M., Fukao, Y., 1997. P and PcP travel time tomography for the core–mantle boundary. Journal of Geophysical Research 102, 17,825–17,841.
Obrebski, M., Allen, R.M., Pollitz, F., Hung, S.H., 2011. Lithosphere–asthenosphere interaction beneath the western United States from the joint inversion of body-wave traveltimes and surface-wave phase velocities. Geophysical Journal International 185, 1003–1021.
Obrebski, M., Allen, R.M., Zhang, F.X., Pan, J.T., Wu, Q.W., Hung, S.H., 2012. Shear wave tomography of China using joint inversion of body and surface wave constraints. Journal of Geophysical Research 117. http://dx.doi.org/10.1029/2011JB008349.
Okada, Y., Kasahara, K., Hori, S., Obara, K., Sekiguchi, S., Fujiwara, H., Yamamoto, A., 2004. Recent progress of seismic observation networks in Japan. Earth, Planets and Space 56, xv–xxviii.
Olsen, K.B., Pechmann, J.C., Schuster, G.T., 1995. Simulation of 3D elastic wave propagation in the Salt Lake basin. Bulletin of the Seismological Society of America 85, 1688–1710.
Opršal, I., Matyska, C., Irikura, K., 2009. The source-box wave propagation hybrid methods: general formulation and implementation. Geophysical Journal International 176, 555–564.
Panning, M., Romanowicz, B., 2006. A three-dimensional radially anisotropic model of shear velocity in the whole mantle. Geophysical Journal International 167, 361–379.
Panning, M., Capdeville, Y., Romanowicz, B., 2009. Seismic waveform modelling in a 3-D Earth using the Born approximation: potential shortcomings and a remedy. Geophysical Journal International 177, 161–178.
Panza, G.F., 1985. Synthetic seismograms: The Rayleigh waves modal summation. Journal of Geophysics 58, 125–145.
Park, J., Levin, V., 2002. Seismic anisotropy: tracing plate dynamics in the mantle. Science 296, 485–489.
Park, J., Lindberg, C.R., Thomson, D.J., 1987. Multiple-taper spectral analysis of terrestrial free oscillations: part I. Geophysical Journal of the Royal Astronomical Society 91, 755–794.
Parker, R.L., 1994. Geophysical Inverse Theory. Princeton University Press.
Passier, M.L., Snieder, R., 1995. Using differential waveform data to retrieve local S velocity structure or path-averaged S velocity gradients. Journal of Geophysical Research 100, 24,061–24,078.
Patera, A., 1984. A spectral element method for fluid dynamics: laminar flow in a channel expansion. Journal of Computational Physics 54, 468–488.
Pavlis, G.L., 2005. Direct imaging of the coda of teleseismic P waves. In: Levander, A., Nolet, G. (Eds.), Seismic Earth: Array Analysis of Broadband Seismograms. AGU, pp. 171–185.
Pekeris, C.L., Jarosch, H., 1958. The free oscillations of the earth. Contributions in Geophysics in Honor of Beno Gutenberg. Pergamon, New York, pp. 171–192.
Peter, D., Komatitsch, D., Luo, Y., Martin, R., Goff, L., Casarotti, E., Loher, P.L., Magnoni, F., Liu, Q., Blitz, C., Nissen-Meyer, T., Basini, P., Tromp, J., 2011a. Forward and adjoint simulations of seismic wave propagation on unstructured hexahedral meshes. Geophysical Journal International 186, 721–739.
Peter, D., Savage, B., Rodgers, A., Morency, C., Tromp, J., 2011b. Adjoint tomography of the Middle East, abstract S44A-04. 2011 Fall Meeting. AGU.
Pilz, M., Parolai, S., Stupazzini, M., Paolucci, R., Zschau, J., 2011. Modelling basin effects on earthquake ground motion in the Santiago de Chile basin by a spectral element code. Geophysical Journal International 187, 929–945.
Plesch, A., Tape, C.H., Graves, J.R., Small, P., Ely, G., Shaw, J.H., 2011. Updates for the CVM-H including new representations of the offshore Santa Maria and San Bernardino basin and a new Moho surface. Proceedings and Abstracts, vol. 21.
Plessix, R.E., 2009. Three-dimensional frequency-domain full-waveform inversion with an iterative solver. Geophysics 74, WCC149–WCC157.
Pollitz, F., 1999. Regional velocity structure in northern California from inversion of scattered seismic surface waves. Journal of Geophysical Research 104, 15,043–15,072.
Postma, G.W., 1955. Wave propagation in a stratified medium. Geophysics 20, 780–806.
Pratt, R.G., 1999. Seismic waveform inversion in the frequency domain, part 1: theory and verification in a physical scale model. Geophysics 64, 888–901.
Pratt, R.G., Shipp, R.M., 1999. Seismic waveform inversion in the frequency domain, part 2: fault delineation in sediments using crosshole data. Geophysics 64, 902–914.
Pratt, R.G., Worthington, M.H., 1990a. Inverse theory applied to multi-source cross-hole tomography. Part 2: elastic wave-equation method. Geophysical Prospecting 38, 311–329.
Pratt, R.G., Worthington, M.H., 1990b. Inverse theory applied to multi-source cross-hole tomography. Part 1: acoustic wave-equation method. Geophysical Prospecting 38, 287–310.
Pratt, R.G., Shin, C., Hicks, G.J., 1998. Gauss–Newton and full Newton methods in frequency-space seismic waveform inversion. Geophysical Journal International 133, 341–362.

Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 2007. Numerical Recipes: The Art of Scientific Computing, 3rd edition. Cambridge University Press.
Radon, J., 1917. On the determination of functions from their integrals along certain manifolds. Berichte über die Verhandlungen der Sächsische Akademie der Wissenschaften 69, 262–277.
Raitt, R., Shor Jr., G., Morris, G., Kirk, H., 1971. Mantle anisotropy in the Pacific Ocean. Tectonophysics 12, 173–186.
Rawlinson, N., Pozgay, S., Fishwick, S., 2010. Seismic Tomography: A Window into Deep Earth. Physics of the Earth and Planetary Interiors 178, 101–135.
Rayleigh, L., 1877. The Theory of Sound. Macmillan, London.
Rayleigh, L., 1906. On the dilatational stability of the Earth. Proceedings of the Royal Society of London Series A 77, 486–499.
Resovsky, J., Trampert, J., 2003. Using probabilistic seismic tomography to test mantle velocity–density relationships. Earth and Planetary Science Letters 215, 121–134.
Revenaugh, J., Jordan, T.H., 1989. A study of mantle layering beneath the Western Pacific. Journal of Geophysical Research 94, 5787–5813.
Rickett, J., Claerbout, J.F., 1999. Acoustic daylight imaging via spectral factorization: helioseismology and reservoir monitoring. The Leading Edge 18, 957–960.
Ritsema, J., 1999. Complex shear wave velocity structure imaged beneath Africa and Iceland. Science 286, 1925–1928.
Ritsema, J., van Heijst, H.J., 2000. Seismic imaging of structural heterogeneity in Earth's mantle: evidence for large-scale mantle flow. Science Progress 83, 243–259.
Ritsema, J., Deuss, A., van Heijst, H.J., Woodhouse, J.H., 2011. S40RTS: A degree-40 shear-velocity model for the mantle from new Rayleigh wave dispersion, teleseismic traveltime and normal-mode splitting function measurements. Geophysical Journal International 184, 1223–1236.
Ritzwoller, M.H., Lavely, E.M., 1995. Three-dimensional seismic models of the Earth's mantle. Reviews of Geophysics 33, 1–66.
Ritzwoller, M.H., Shapiro, N.M., Barmin, M.P., Levshin, A.L., 2002. Global surface wave diffraction tomography. Journal of Geophysical Research 107. http://dx.doi.org/10.1029/2002JB001777.
Ritzwoller, M.H., Lin, F.C., Shen, W.S., 2011. Ambient noise tomography with a large seismic array. Comptes Rendus Geoscience 343, 558–570.
Robertsson, J.O.A., Chapman, C.H., 2000. An efficient method for calculating finite-difference seismograms after model alterations. Geophysics 65, 907–918.
Roecker, S., Thurber, C.H., Roberts, K., Powell, L., 2006. Refining the image of the San Andreas Fault near Parkfield, California using a finite difference travel time computation technique. Tectonophysics 426, 189–205.
Romanowicz, B., 1990. The upper mantle degree 2: constraints and inferences. Journal of Geophysical Research 95, 11,051–11,071.
Romanowicz, B., 1991. Seismic tomography of the Earth's mantle. Annual Review of Earth and Planetary Sciences 19, 77–99.
Romanowicz, B., 1994. Anelastic tomography: a new perspective on upper mantle thermal structure. Earth and Planetary Science Letters 128, 113–121.
Romanowicz, B., 1995. A global tomographic model of shear attenuation in the upper mantle. Journal of Geophysical Research 100, 12,375–12,394.
Romanowicz, B., 1998. Attenuation tomography of the Earth's mantle: a review of current status. Pure and Applied Geophysics 153, 257–272.
Romanowicz, B., 2003. Global mantle tomography: progress status in the past 10 years. Annual Review of Earth and Planetary Sciences 31, 303–328.
Romanowicz, B., 2008. Using seismic waves to image Earth's internal structure. Nature 451, 266–268.
Rondenay, S., 2009. Upper mantle imaging with array recordings of converted and scattered teleseismic waves. Surveys in Geophysics 30, 377–405.
Rondenay, S., Bostock, M.G., Fischer, K.M., 2005. Multichannel inversion of scattered teleseismic body waves: practical considerations and applicability. In: Levander, A., Nolet, G. (Eds.), Seismic Earth: Array Analysis of Broadband Seismograms. AGU, pp. 187–203.
Rost, S., Thomas, C., 2002. Array seismology: methods and applications. Reviews of Geophysics 40. http://dx.doi.org/10.1029/2000RG000100.
Rost, S., Thomas, C., 2009. Improving seismic resolution through array processing techniques. Surveys in Geophysics 30, 271–299.
Roth, E.G., Wiens, D.A., 2000. An empirical relationship between seismic attenuation and velocity anomalies in the upper mantle. Geophysical Research Letters 27, 601–604.
Roux, P., Sabra, K.G., Gerstoft, P., Kuperman, W.A., 2005. P-waves from cross-correlation of seismic noise. Geophysical Research Letters 32. http://dx.doi.org/10.1029/2005GL023803.
Roux, E., Moorkamp, M., Jones, A.G., Bischoff, M., Endrun, B., Lebedev, S., Meier, T., 2011. Joint inversion of long-period magnetotelluric data and surface-wave dispersion curves for anisotropic structure: application to data from Central Germany. Geophysical Research Letters 38. http://dx.doi.org/10.1029/2010GL046358.
Rychert, C.A., Shearer, P.M., 2011. Imaging the lithosphere–asthenosphere boundary beneath the Pacific using SS waveform modeling. Journal of Geophysical Research 116. http://dx.doi.org/10.1029/2010JB008070.
Rychert, C.A., Fischer, K.M., Rondenay, S., 2005. A sharp lithosphere–asthenosphere boundary imaged beneath eastern North America. Nature 436, 542–545.
Saad, Y., 2011. Numerical Methods for Large Eigenvalue Problems, 2nd edition. SIAM.
Sabra, K.G., Gerstoft, P., Roux, P., Kuperman, W.A., Fehler, M.C., 2005. Extracting time-domain Green's function estimates from ambient seismic noise. Geophysical Research Letters 32. http://dx.doi.org/10.1029/2004GL021862.
Saltzer, R.L., van der Hilst, R.D., Kárason, H., 2001. Comparing P and S wave heterogeneity in the mantle. Geophysical Research Letters 28, 1335–1338.
Sambridge, M.S., 1990. Non-linear arrival time inversion: constraining velocity anomalies by seeking smooth models in 3-D. Geophysical Journal International 102, 653–677.
Sambridge, M.S., Drijkoningen, G., 1992. Genetic algorithms in seismic waveform inversion. Geophysical Journal International 109, 323–342.
Sambridge, M.S., Kennett, B.L.N., 1986. A novel method of hypocentre location. Geophysical Journal of the Royal Astronomical Society 87, 679–697.
Sambridge, M.S., Mosegaard, K., 2002. Monte Carlo methods in geophysical inverse problems. Reviews of Geophysics 40. http://dx.doi.org/10.1029/2000RG000089.
Sambridge, M.S., Rawlinson, N., 2005. Seismic tomography with irregular meshes. In: Levander, A., Nolet, G. (Eds.), Seismic Earth: Array Analysis of Broadband Seismograms. AGU, pp. 49–65.
Sambridge, M.S., Tarantola, A., Kennett, B.L.N., 1991. An alternative strategy for non-linear inversion of seismic waveforms. Geophysical Prospecting 39, 723–736.
Saygin, E., Kennett, B.L.N., 2012. Crustal structure of Australia from ambient seismic noise tomography. Journal of Geophysical Research 117. http://dx.doi.org/10.1029/2011JB008403.
Scales, J.A., Snieder, R., 1997. To Bayes or not to Bayes? Geophysics 62, 1045–1046.
Scales, J.A., Snieder, R., 2000. The anatomy of inverse problems. Geophysics 65, 1708–1710.
Scales, J.A., Smith, M.L., Fischer, T.L., 1992. Global optimization for multimodal inverse problems. Journal of Computational Physics 103, 258–268.
Schlue, J.W., Knopoff, L., 1977. Shear-wave polarization anisotropy in the Pacific Basin. Geophysical Journal of the Royal Astronomical Society 49, 145–165.
Shapiro, N.M., Campillo, M., 2004. Emergence of broadband Rayleigh waves from correlations of the ambient seismic noise. Geophysical Research Letters 31. http://dx.doi.org/10.1029/2004GL019491.
Shapiro, N.M., Ritzwoller, M.H., 2002. Monte-Carlo inversion for a global shear-velocity model of the crust and upper mantle. Geophysical Journal International 151, 88–105.
Shapiro, N.M., Campillo, M., Stehly, L., Ritzwoller, M.H., 2005. High-resolution surface-wave tomography from ambient seismic noise. Science 307, 1615–1618.
Shaw, P.R., 1992. Quantitative comparison of seismic data sets with waveform inversion: testing the age-dependent evolution of crustal structure. Journal of Geophysical Research 97, 19,981–19,991.
Shearer, P.M., 1991. Imaging global body wave phases by stacking long-period seismograms. Journal of Geophysical Research 96, 20,353–20,364.
Shearer, P.M., 1993. Global mapping of upper mantle reflectors from long-period SS precursors. Geophysical Journal International 115, 878–904.
Shearer, P.M., 2000. Upper mantle seismic discontinuities. Earth's Deep Interior: Mineral Physics and Tomography from the Atomic to the Global Scale. Geophysical Monograph, 117. AGU, pp. 115–131.
Shearer, P.M., Masters, T.G., 1992. Global mapping of topography on the 660-km discontinuity. Nature 355, 791–796.
Shin, C., Jang, S., Min, D.J., 2001. Improved amplitude preservation for prestack depth migration by inverse scattering theory. Geophysical Prospecting 49, 592–606.
Shragge, J., Bostock, M.G., Rondenay, S., 2001. Multiparameter two-dimensional inversion of scattered teleseismic body waves, 2. Numerical examples. Journal of Geophysical Research 106, 30,783–30,793.
Sieminski, A., Liu, Q., Trampert, J., Tromp, J., 2007a. Finite-frequency sensitivity of body waves to anisotropy based upon adjoint methods. Geophysical Journal International 171, 368–389.
Sieminski, A., Liu, Q., Trampert, J., Tromp, J., 2007b. Finite-frequency sensitivity of surface waves to anisotropy based upon adjoint methods. Geophysical Journal International 168, 1153–1174.
Sigloch, K., Nolet, G., 2006. Measuring finite-frequency body-wave amplitudes and traveltimes. Geophysical Journal International 167, 271–287.
Sigloch, K., McQuarrie, N., Nolet, G., 2008. Two-stage subduction history under North America inferred from multiple-frequency tomography. Nature Geoscience 1. http://dx.doi.org/10.1038/ngeo231.
Silver, P.G., 1996. Seismic anisotropy beneath the continents: probing the depths of geology. Annual Review of Earth and Planetary Sciences 24, 385–432.
Simmons, N.A., Forte, A.M., Grand, S.P., 2006. Constraining mantle flow with seismic and geodynamic data: a joint approach. Earth and Planetary Science Letters 246, 109–124.
Simmons, N.A., Forte, A.M., Grand, S.P., 2007. Thermochemical structure and dynamics of the African superplume. Geophysical Research Letters 34. http://dx.doi.org/10.1029/2006GL028009.
Simons, F.J., van der Hilst, R.D., 2003. Seismic and mechanical anisotropy and the past and present deformation of the Australian lithosphere. Earth and Planetary Science Letters 211, 271–286.
Sirgue, L., Pratt, R.G., 2004. Efficient waveform inversion and imaging: a strategy for selecting temporal frequencies. Geophysics 69, 231–248.
Smith, M.L., Dahlen, F.A., 1973. The azimuthal dependence of Love and Rayleigh wave propagation in a slightly anisotropic medium. Journal of Geophysical Research 78, 3321–3333.
Snellius, W., 1621. Cyclometricus. Reprinted by Kessinger Publishing, 2010.
Snieder, R., 1996. Surface wave inversions on a regional scale. In: Boschi, E., Ekstrom, G., Morelli, A. (Eds.), Seismic Modelling of Earth Structure: Editrice Compositori, vol. 25, pp. 149–173.
Snieder, R., 1998. The role of nonlinearity in inverse problems. Inverse Problems 14, 387–404.
Snieder, R., 2004. Extracting the Green's function from the correlation of coda waves: a derivation based on stationary phase. Physical Review E 69. http://dx.doi.org/10.1103/PhysRevE.69.046610.
Snieder, R., Nolet, G., 1987. Linearized scattering of surface waves on a spherical Earth. Journal of Geophysics 61, 55–63.
Snieder, R., Romanowicz, B., 1988. A new formalism for the effect of lateral heterogeneity on normal modes and surface waves – I: isotropic perturbations, perturbations of interfaces and gravitational perturbations. Geophysical Journal 92, 207–222.
Snieder, R., Trampert, J., 1999. Inverse problems in geophysics. In: Wirgin, A. (Ed.), Wavefield Inversion. Springer-Verlag, New York, pp. 119–190.

Please cite this article as: Liu, Q., Gu, Y.J., Seismic imaging: From classical to adjoint tomography, Tectonophysics (2012), http://dx.doi.org/ 10.1016/j.tecto.2012.07.006 Q. Liu, Y.J. Gu / Tectonophysics xxx (2012) xxx–xxx 35

Snieder, R., Miyazawa, M., Slob, E., Vasconcelos, I., Wapenaar, K., 2009. A comparison of Trampert, J., Deschamps, F., Resovsky, J., Yuen, D., 2004. Probabilistic tomography maps strategies for seismic interferometry. Surveys in Geophysics 30, 503–523. chemical heterogeneities throughout the lower mantle. Science 306, 853–856. Souriau, A., Souriau, M., 1983. Test of tectonic models by great circle Rayleigh waves. Tromp, J., Tape, C.H., Liu, Q., 2005. Seismic tomography, adjoint methods, time reversal Geophysical Journal of the Royal Astronomical Society 73, 533–551. and banana-doughnut kernels. Geophysical Journal International 160, 195–216. Spakman, W., Nolet, G., 1988. Imaging algorithms, accuracy and resolution in delay Tromp, J., Komatitsch, D., Hjörleifsdóttir, V., Liu, Q., Zhu, H.J., 2010a. Near real-time sim- time tomography. In: Vlaar, N.J., Nolet, G., Wortel, M.J.R., Cloetingh, S. (Eds.), Math- ulations of global CMT earthquakes. Geophysical Journal International 183, ematical Geophysics: A Survey of Recent Developments in Seismology and 381–389. Geodynamics. D. Reidel, pp. 155–188. Tromp, J., Luo, Y., Hanasoge, S., Peter, D., 2010b. Noise cross-correlation sensitivity ker- Stadler, G., 2010. Spectral DG for wave propagation and inversion in coupled acoustic- nels. Geophysical Journal International 183, 791–819. elastic media. 1st QUEST Workshop, Sardinia, Italy. Valentine, A.P., Trampert, J., 2011. Reducing the dimension of seismic waveforms using Stehly, L., Campillo, M., Shapiro, N.M., 2006. A study of the seismic noise from its long- Autoencoders: a step towards fully non-linear tomography, and a practical tool, range correlation properties. Journal of Geophysical Research 111. http:// Abstract S12C-05. 2011 Fall Meeting. AGU. dx.doi.org/10.1029/2005JB004237. van der Hilst, R.D., 1999. Compositional heterogeneity in the bottom 1000 kilometers Stupazzini, M., 2006. 3D ground motion simulation of the Grenoble Valley by GeoELSE. 
of Earth's mantle: toward a hybrid convection model. Science 283, 1885–1888. Proceedings of the 3rd International Symposium on the Effects of Surface Geology van der Hilst, R.D., de Hoop, M.V., 2005. Banana-doughnut kernels and mantle tomog- on Seismic Motion (ESG), Grenoble, France. raphy. Geophysical Journal International 163, 956–961. Stupazzini, M., Paolucci, R., Igel, H., 2009. Near-fault earthquake ground-motion simu- van der Hilst, R.D., de Hoop, M.V., 2006. Reply to comment by R. Montelli, G. Nolet and lation in the Grenoble Valley by a high-performance spectral element code. Bulle- F. A. Dahlen on ‘Banana-doughnut kernels and mantle tomography’. Geophysical tin of the Seismological Society of America 99, 286–301. Journal International 167, 1211–1214. Stutzmann, E., Montagner, J.P., 1993. An inverse technique for retrieving higher mode van der Hilst, R.D., Engdahl, E.R., Spakman, W., Nolet, G., 1991. Tomographic imaging of phase velocity and mantle structure. Geophysical Journal International 113, 669–683. subducted lithosphere below northwest Pacific island arcs. Nature 353, 37–43. Su, W.J., Dziewonski, A.M., 1997. Simultaneous inversion for 3-D variations in shear van der Hilst, R.D., Widiyantoro, S., Engdahl, E.R., 1997. Evidence for deep mantle circu- and bulk velocity in the mantle. Physics of the Earth and Planetary Interiors 100, lation from global tomography. Nature 386, 578–584. 135–156. van der Hilst, R.D., de Hoop, M.V., Wang, P., Shim, S.H., Ma, P., Tenorio, L., 2007. Struc- Su, W.J., Woodward, R.L., Dziewonski, A.M., 1992. Deep origin of mid-ocean-ridge seis- ture of Earth's core–mantle boundary region. Science 315, 1813–1817. mic velocity anomalies. Nature 360, 149–152. van der Lee, S., Frederiksen, A., 2005. Surface wave tomography applied to the North Su, W.J., Woodward, R.L., Dziewonski, A.M., 1994. Degree 12 model of shear velocity American upper mantle. In: Levander, A., Guust, N. (Eds.), Seismic Earth: Array heterogeneity in the mantle. 
Journal of Geophysical Research 99, 6945–6980. Analysis of Broadband Seismograms. AGU, pp. 67–80. Suda, N., Shibata, N., Fukao, Y., 1991. Degree-2 pattern of attenuation structure in the van der Lee, S., Nolet, G., 1997. Upper mantle S velocity structure of North America. upper mantle from apparent complex frequency measurements of fundamental Journal of Geophysical Research 102, 22,815–22,838. spheroidal modes. Geophysical Research Letters 18, 1119–1122. Vandecar, J.C., Crosson, R.S., 1990. Determination of teleseismic relative phase arrival Takeuchi, H., Saito, M., 1966. Seismic surface waves. Methods in Computational Physics, times using multi-channel cross-correlation and least-squares. Bulletin of the Seis- 11. Academic Press, New York, pp. 217–295. mological Society of America 80, 150–169. Talagrand, O., Courtier, P., 1987. Variational assimilation of meteorological observa- Vasco, D.W., Johnson, L.R., 1998. Whole earth structure estimated from seismic arrival tions with the adjoint vorticity equation. I. Theory. Quarterly Journal of the Royal times. Journal of Geophysical Research 103, 2633–2671. Meteorological Society 113, 1311–1328. Velasco, A.A., Gee, V.L., Rowe, C., Grujic, D., Hollister, L.S., Hernandez, D., Miller, K.C., Tanimoto, T., 1984. A simple derivation of the formula to calculate synthetic long- Tobgay, T., Fort, M., Harder, S., 2007. Using small, temporary seismic networks period seismograms in a heterogeneous earth by normal mode summation. for investigating tectonic deformation: brittle deformation and evidence for Geophysical Journal of the Royal Astronomical Society 77, 275–278. strike-slip faulting in Bhutan. Seismological Research Letters 78, 446–453. Tanimoto, T., Anderson, D.L., 1985. Lateral heterogeneity and azimuthal anisotropy of Vest, C.M., 1974. Formation of images from projections: Radon and Abel transforms. the upper mantle: Love and Rayleigh waves 100–250 s. 
Journal of Geophysical Journal of the Optical Society of America 64, 1215–1218. Research 90, 1842–1858. Vigh, D., Starr, E.W., Kapoor, J., 2009. Developing earth models with full waveform Tape, C.H., Liu, Q., Tromp, J., 2007. Finite-frequency tomography using adjoint methods inversion. The Leading Edge 28, 432–435. — methodology and examples using membrane surface waves. Geophysical Jour- Villaseñor, A., Yang, Y., Ritzwoller, M.H., Gallart, J., 2007. Ambient noise surface wave nal International 168, 1105–1129. tomography of the Iberian Peninsula: implications for shallow seismic structure. Tape, C.H., Liu, Q., Tromp, J., Maggi, A., 2009. Adjoint tomography of the southern Geophysical Research Letters 34. http://dx.doi.org/10.1029/2007GL030164. California crust. Science 325, 988–992. Vinnik, L.P., 1977. Detection of waves converted from P to SV in the mantle. Physics of Tape, C.H., Liu, Q., Maggi, A., Tromp, J., 2010. Seismic tomography of the southern Cal- the Earth and Planetary Interiors 15, 39–45. ifornia crust based on spectral-element and adjoint methods. Geophysical Journal Vinnik, L., Farra, V., 2002. Subcratonic low-velocity layer and flood basalts. Geophysical International 180, 433–462. Research Letters 29. http://dx.doi.org/10.1029/2001GL014064. Tarantola, A., 1984a. Inversion of seismic reflection data in the acoustic approximation. Virieux, J., Operto, S., 2009. An overview of full-waveform inversion in exploration geo- Geophysics 49, 1259–1266. physics. Geophysics 74, WCC1–WCC26. Tarantola, A., 1984b. Linearized inversion of seismic reflection data. Geophysical Waldhauser, F., Tolstoy, M., 2011. Seismogenic structure and processes associated with Prospecting 32, 998–1015. magma inflation and hydrothermal circulation beneath the East Pacific Rise at Tarantola, A., 1986. A strategy for nonlinear elastic inversion of seismic reflection data. 9∘50′N. Geochemistry, Geophysics, Geosystems 12. http://dx.doi.org/10.1029/ Geophysics 51, 1893–1903. 
2011GC003568. Tarantola, A., 1988. Theoretical background for the inversion of seismic waveforms Wang, P., de Hoop, M.V., van der Hilst, R.D., 2008. Imaging of the lowermost mantle (D″) including elasticity and attenuation. Pure and Applied Geophysics 128, 365–399. and the core–mantle boundary with SKKS coda waves. Geophysical Journal Interna- Tarantola, A., 2005. Inverse Problem Theory and Methods for Model Parameter Estimation. tional 175, 103–115. SIAM. Wapenaar, K., 2004. Retrieving the elastodynamic Green's function of an arbitrary in- Tarantola, A., 2007. The Square-root Variable Metric Algorithm. homogeneous medium by cross correlation. Physical Review Letters 93. http:// Tarantola, A., Valette, B., 1982. Generalized nonlinear inverse problems solved using dx.doi.org/10.1103/PhysRevLett.93.254301. the least squares criterion. Reviews of Geophysics and Space Physics 20, 219–232. Weaver, R.L., Lobkis, O., 2003. Elastic wave thermal fluctuations, ultrasonic waveforms Tessmer, E., Kosloff, D., Behle, A., 1992. Elastic wave propagation simulation in the by correlation of thermal phonons. The Journal of the Acoustical Society of America presence of surface topography. Geophysical Journal International 108, 621–632. 113, 2611–2621. Thomson, D.J., 1982. Spectrum estimation and harmonic analysis. Proceedings of the Weaver, R.L., Lobkis, O., 2004. Diffuse fields in open systems and the emergence of the IEEE 70, 1055–1096. Green's function (L). The Journal of the Acoustical Society of America 116, 2731–2734. Thybo, H., Perchuc, E., 1997. The seismic 8o discontinuity and partial melting in conti- Weber, Z., 2000. Seismic traveltime tomography: a simulated annealing approach. nental mantle. Science 275, 1626–1629. Physics of the Earth and Planetary Interiors 119, 149–159. Tian, Y., Sigloch, K., Nolet, G., 2009. Multiple-frequency SH-wave tomography of the Wessel, P., Smith, W.H., 1991. Free software helps map and display data. EOS Transactions western US upper mantle. 
Geophysical Journal International 178, 1384–1402. AGU 72, 441. Tkalčić, H., Romanowicz, B., Houy, N., 2002. Constraints on D″ structure using PKP(AB- Widiyantoro, S., van der Hilst, R.D., 1997. Mantle structure beneath Indonesia inferred DF), PKP(BC-DF) and PcP–P traveltime data from broad-band records. Geophysical from high-resolution tomographic imaging. Geophysical Journal International 130, Journal International 148, 599–616. 167–182. Toksoz, M.N., Anderson, D.L., 1966. Phase velocities of long-period surface waves and Wiechert, E., 1910. Bestimmung des weges der erdbebenwellen im erdinneren, I. structure of the upper mantle. Journal of Geophysical Research 71, 1649–1658. theoretisches. Physikalische Zeitschrift 11, 294–304. Trampert, J., van der Hilst, R.D., 2005. Towards a quantitative interpretation of global Wiggins, R.A., 1972. The general linear inverse problem: Implication of surface waves seismic tomography. Earth's deep mantle: Structure, composition, and evolution. and free oscillations for Earth structure. Reviews of Geophysics and Space Physics AGU, pp. 47–62. 10, 251–285. Trampert, J., van Heijst, H.J., 2002. Global azimuthal anisotropy in the transition zone. Wilson, D.C., Aster, R.C., 2003. Imaging crust and upper mantle seismic structure in the Science 296, 1297–1299. southwestern United States using teleseismic receiver functions. The Leading Edge Trampert, J., Woodhouse, J.H., 1996. High resolution global phase velocity distributions. 22, 232–235. Geophysical Research Letters 23, 21–24. Wilson, D.C., Angus, D.A., Ni, J.F., Grand, S.P., 2006. Constraints on the interpretation of Trampert, J., Woodhouse, J.H., 2003. Global anisotropic phase velocity maps for fundamen- S‐to‐P receiver functions. Geophysical Journal International 165, 969–980. tal mode surface waves between 40 and 150 s. Geophysical Journal International 154, Woodhouse, J.H., 1974. Surface waves in a laterally varying layered structure. Geophysical 154–165. 
Journal of the Royal Astronomical Society 37, 461–490.

Please cite this article as: Liu, Q., Gu, Y.J., Seismic imaging: From classical to adjoint tomography, Tectonophysics (2012), http://dx.doi.org/ 10.1016/j.tecto.2012.07.006 36 Q. Liu, Y.J. Gu / Tectonophysics xxx (2012) xxx–xxx

Woodhouse, J.H., 1976. On Rayleigh's Principle. Geophysical Journal of the Royal Astro- Yuan, H.Y., Romanowicz, B., Fischer, K.M., Abt, D.L., 2011. 3-D shear wave radially and nomical Society 46, 11–22. azimuthally anisotropic velocity model of the North American upper mantle. Geo- Woodhouse, J.H., 1978. Asymptotic results for elastodynamic propagator matrices in physical Journal International 184, 1237–1260. plane-stratified and spherically-stratified earth models. Geophysical Journal of Zhang, Y.S., Tanimoto, T., 1992. Ridges, hotspots and their interaction as observed in the Royal Astronomical Society 54, 263–280. seismic velocity maps. Nature 355, 45–49. Woodhouse, J.H., 1981. A note on the calculation of travel times in a transversely Zhao, D.P., 2001a. New advances of seismic tomography and its applications to subduc- isotropic Earth model. Physics of the Earth and Planetary Interiors 25, 357–359. tion zones and earthquake fault zones: a review. Island Arc 10, 68–84. Woodhouse, J.H., Dziewonski, A.M., 1984. Mapping the upper mantle: three-dimensional Zhao, D.P., 2001b. Seismic structure and origin of hotspots and mantle plumes. Earth modeling of earth structure by inversion of seismic waveforms. Journal of Geophysical and Planetary Science Letters 192, 251–265. Research 89, 5953–5986. Zhao, D.P., 2009. Multiscale seismic tomography and mantle dynamics. Gondwana Re- Woodhouse, J.H., Dziewonski, A.M., 1986. Three-dimensional mantle models based on search 15, 297–323. mantle wave and long period body wave data. Eos Transactions AGU 67, 307. Zhao, L., Chevrot, S., 2011a. An efficient and flexible approach to the calculation of Woodhouse, J.H., Dziewonski, A.M., 1989. Seismic modelling of the Earth's large-Scale three-dimensional full-wave Fréchet kernels for seismic tomography — I. Theory. three-dimensional structure. Philosophical Transactions of the Royal Society of Geophysical Journal International 185, 922–938. London, Series A 328, 291–308. 
Zhao, L., Chevrot, S., 2011b. An efficient and flexible approach to the calculation of Wookey, J., Stackhouse, S., Kendall, J.M., Brodholt, J., Price, G.D., 2005. Efficacy of the three-dimensional full-wave Fréchet kernels for seismic tomography — II. Numer- post-perovskite phase as an explanation for lowermost-mantle seismic properties. ical results. Geophysical Journal International 185, 939–954. Nature 438, 1004–1007. Zhao, L., Dahlen, F.A., 1995. Asymptotic normal modes of the Earth — III. Frechet kernel Wu, R.S., Aki, K., 1985. Scattering characteristics of elastic waves by an elastic heterogene- and group velocity. Geophysical Journal International 122, 229–325. ity. Geophysics 50, 582–595. Zhao, L., Jordan, T.H., 1998. Sensitivity of frequency-dependent traveltimes to laterally Wu, R.S., Toksoz, M.N., 1987. Diffraction tomography and multisource holography applied heterogeneous, anisotropic Earth structure. Geophysical Journal International 133, to seismic imaging. Geophysics 52, 11–25. 683–704. Wysession, M.E., 1996. Large-scale structure at the core–mantle boundary from Zhao, D.P., Hasegawa, A., Kanamori, H., 1994. Deep structure of Japan subduction zone diffracted waves. Nature 382, 244–248. as derived from local, regional and teleseismic events. Journal of Geophysical Xu, Z.J., Song, X.D., 2010. Joint inversion for crustal and Pn velocities and Moho depth in Research 99, 22,313–22,329. eastern margin of the Tibetan Plateau. Tectonophysics 491, 185–193. Zhao, L., Jordan, T.H., Chapman, C.H., 2000. Three-dimensional Frechet differential Yan, Z., Clayton, R.W., 2007. Regional mapping of the crustal structure in southern kernels for seismic delay times. Geophysical Journal International 141, 558–576. California from receiver functions. Journal of Geophysical Research 112. http:// Zhao, L., Jordan, T.H., Olsen, K.B., Chen, P., 2005. Fréchet kernels for imaging regional dx.doi.org/10.1029/2006JB004622. 
earth structure based on three-dimensional reference models. Bulletin of the Seis- Yang, Y., Forsyth, D.W., 2006. Rayleigh wave phase velocities, small-scale convection, mological Society of America 95, 2066–2080. and azimuthal anisotropy beneath southern California. Journal of Geophysical Zhao, L., Jordan, T.H., Chen, P., 2006. Strain Green's tensors, reciprocity, and their appli- Research 111. http://dx.doi.org/10.1029/2005JB004180. cations to seismic source and structure studies. Bulletin of the Seismological Society Yang, Y., Ritzwoller, M.H., 2008. Characteristics of ambient seismic noise as a source for sur- of America 96, 1753–1763. face wave tomography. Geochemistry, Geophysics, Geosystems 9. http://dx.doi.org/ Zheng, S.H., Sun, X.L., Song, X.D., Yang, Y., Ritzwoller, M.H., 2008. Surface wave tomog- 10.1029/2007GC001814. raphy of China from ambient seismic noise correlation. Geochemistry, Geophysics, Yang, Y., Ritzwoller, M.H., Levshin, A.L., Shapiro, N.M., 2007. Ambient noise Rayleigh Geosystems 9. http://dx.doi.org/10.1029/2008GC001981. wave tomography across Europe. Geophysical Journal International 168, 259–274. Zheng, Y., Shen, W.S., Zhou, L.Q., Yang, Y., Xie, Z.J., Ritzwoller, M.H., 2011. Crust and up- Yang, Y., Li, A., Ritzwoller, M.H., 2008. Crustal and uppermost mantle structure in permost mantle beneath the North China Craton, northeastern China, and the Sea southern Africa revealed from ambient noise and teleseismic tomography. of Japan from ambient noise tomography. Journal of Geophysical Research 116. Geophysical Journal International 174, 235–248. http://dx.doi.org/10.1029/2011JB008637. Yang, Y., Ritzwoller, M.H., Jones, C.H., 2011. Crustal structure determined from ambient Zhou, Y., 2009. Surface-wave sensitivity to 3-D anelasticity. Geophysical Journal Inter- noise tomography near the magmatic centers of the Coso region, southeastern national 178, 1403–1410. California. Geochemistry, Geophysics, Geosystems 12. 
http://dx.doi.org/10.1029/ Zhou, H.W., Clayton, R.W., 1990. P and S wave travel time inversions for subducted slab 2010GC003362. under the island arcs of the Northwest Pacific. Journal of Geophysical Research 95, Yao, H., van der Hilst, R.D., 2009. Analysis of ambient noise energy distribution and 6829–6851. phase velocity bias in ambient noise tomography, with application to SE Tibet. Zhou, Y., Nolet, G., Dahlen, F.A., 2004. Three-dimensional sensitivity kernels for surface Geophysical Journal International 179, 1113–1132. wave observables. Geophysical Journal International 158, 142–168. Yao, H., van der Hilst, R.D., de Hoop, M.V., 2006. Surface-wave array tomography in SE Zhou, Y., Nolet, G., Dahlen, F.A., Laske, G., 2005. Finite-frequency effects in global Tibet from ambient seismic noise and two-station analysis: I. Phase velocity maps. surface-wave tomography. Geophysical Journal International 163, 1087–1111. Geophysical Journal International 166, 732–744. Zhou, Y., Liu, Q., Tromp, J., 2011. Surface-wave sensitivity: mode summation versus ad- Yao, H., van der Hilst, R.D., Montagner, J.P., 2010. Heterogeneity and anisotropy of the joint SEM. Geophysical Journal International 187, 1570–1576. lithosphere of SE Tibet from surface wave array tomography. Journal of Geophysical Zhu, L., Kanamori, H., 2000. Moho depth variation in southern California from Research 115. http://dx.doi.org/10.1029/2009JB007142. teleseismic receiver functions. Journal of Geophysical Research 105, 2969–2980. Yu, G.K., Mitchell, B.J., 1979. Regionalized shear velocity models of the Pacific upper Zhu, L., Rivera, L.A., 2002. A note on the dynamic and static displacements from a point mantle from observed Love and Rayleigh wave dispersion. Geophysical Journal of source in multilayered media. Geophysical Journal International 148, 619–627. the Royal Astronomical Society 57, 311–341. Zhu, H.J., Luo, Y., Nissen-Meyer, T., Morency, C., Tromp, J., 2009. 
Elastic imaging and time- Yuan, H.Y., Romanowicz, B., 2010. Lithospheric layering in the North American craton. lapse migration based on adjoint methods. Geophysics 74, WCA167–WCA177. Nature 466, 1063–1068. Zhu, H.J., Bozda , E., Peter, D., Tromp, J., 2012. Adjoint tomography reveals structure of Yuan, X.H., Kind, R., Li, X.Q., Wang, R.J., 2006. The S receiver functions: synthetics and the European upper mantle. Nature Geoscience 5, 493–498. data example. Geophysical Journal International 165, 555–564.

Please cite this article as: Liu, Q., Gu, Y.J., Seismic imaging: From classical to adjoint tomography, Tectonophysics (2012), http://dx.doi.org/ 10.1016/j.tecto.2012.07.006