<<

Inverse Problems

TOPICAL REVIEW Related content

- A comparison of seismic velocity inversion The seismic methods for layered acoustics - A mathematical framework for inverse To cite this article: W W Symes 2009 Inverse Problems 25 123008 wave problems in heterogeneous media - A method for inverse scattering based on the generalized Bremmer coupling series

View the article online for updates and enhancements. Recent citations

- On quasi- propagation in highly anisotropic triclinic layer between distinct semi-infinite triclinic geomedia Pato Kumari and Neha

- Reduced Order Model Approach to Inverse Scattering Liliana Borcea et al

- Seismic wavefield redatuming with regularized multi-dimensional Nick Luiken and Tristan van Leeuwen

This content was downloaded from IP address 189.46.171.20 on 21/10/2020 at 04:11 IOP PUBLISHING INVERSE PROBLEMS Inverse Problems 25 (2009) 123008 (39pp) doi:10.1088/0266-5611/25/12/123008

TOPICAL REVIEW

The seismic reflection inverse problem

W W Symes

Computational and Applied Mathematics, Rice University, MS 134, Rice University, Houston, TX 77005, USA

E-mail: [email protected]

Received 3 July 2009, in final form 2 September 2009 Published 1 December 2009 Online at stacks.iop.org/IP/25/123008

Abstract The seismic reflection method seeks to extract maps of the ’s sedimentary crust from transient near-surface recording of echoes, stimulated by explosions or other controlled sound sources positioned near the surface. Reasonably accurate models of seismic energy propagation take the form of hyperbolic systems of partial differential equations, in which the coefficients represent the spatial distribution of various mechanical characteristics of rock (density, stiffness, etc). Thus the fundamental problem of reflection is an inverse problem in partial differential equations: to find the coefficients (or at least some of their properties) of a linear hyperbolic system, given the values of a family of solutions in some part of their domains. The exploration community has developed various methods for estimating the Earth’s structure from seismic data and is also aware of the inverse point of view. This article reviews mathematical developments in this subject over the last 25 years, to show how the mathematics has both illuminated innovations of practitioners and led to new directions in practice. Two themes naturally emerge: the importance of single scattering dominance and compensation for spectral incompleteness by spatial redundancy.

1. Introduction

Reflection seismology is the main tool used by the oil and gas industry to map deposits in the Earth’s upper crust. Environmental and civil engineers also use variants of the technique to locate bedrock, aquifers and other near-surface features, and academic geophysicists have extended it to a tool for imaging the lower crust and mantle. Seismic wave propagation, the physical phenomenon underlying the reflection method as well as other types of seismology, is modeled with reasonable accuracy as small-amplitude displacement of a continuum, using various specializations and generalizations of linear elastodynamics. In these models, the various mechanical properties of rock regulating the wave propagation phenomenon appear as spatially varying coefficients in a system of time- dependent hyperbolic linear partial differential equations (‘PDEs’). (Somewhat confusingly,

0266-5611/09/123008+39$30.00 © 2009 IOP Publishing Ltd Printed in the UK 1 Inverse Problems 25 (2009) 123008 Topical Review the literature on this subject also uses the word ‘model’ to connote the collection of coefficients characterizing an instance of a continuum mechanical model. I shall also use the word in this sense.) Thus the main goal of reflection seismology could be construed as the solution of inverse problems for certain PDEs. The data of the reflection seismology inverse problem are samples of the solutions of these PDEs, in the form of time series of pressure or related quantities collected at points corresponding to locations of seismic sensors (the common name for these is ‘receivers’). The solution of the inverse problem is the model (collection of coefficient functions), and the implicit relation between data and solution is the system of PDEs. It should also be noted that seismic surveys are multiexperiments: that is, they consist of a number of more or less identical individual experiments (commonly called ‘shots’), from each of which part of the data is collected. In modeling terms, the data are samples taken from solutions of a family of linear PDEs, sharing the same coefficients but with different (and, to some extent, known) right-hand sides. Geophysicists have developed a variety of methods for the approximate solution of this inverse problem. In many cases, these methods were not originally construed as solutions of inverse problems for PDEs, but as processes which convert seismic data to images of the Earth’s interior. The purpose of this review is to overview the mathematical understanding of these seismic imaging methods that has developed almost entirely over the last 25 years. This understanding has both illuminated the rationale and performance of methods used in practice and suggested improvements and alternatives to widely used processes as well as algorithms to extract previously inaccessible information from the data. Amongst other things, the mathematical meaning of the word ‘imaging’ has become much clearer, along with the relation between imaging and inversion. New approaches to solution of inverse problems for hyperbolic PDEs have also emerged from this study. Two central themes pervade this subject. The first is the importance of single scattering, which in mathematical terms is more or less equivalent to linearization of the equations of motion with respect to the model (coefficients), at a smooth model. That single scattering is apparent at all is somewhat surprising, since rocks are typically highly heterogeneous, exhibiting spatial structure at all scales from micrometers to kilometers. Given an appropriate relation between wavelengths, travel distances and correlation lengths of material structures, elastodynamics can produces diffuse fields (Weaver and Lobkis 2006), in which the singly scattered waves are submerged in a sea of multiple scattering. A similar phenomenon occurs in optical tomography of biological tissue, in which ballistic photons typically play an insignificant role (Arridge 1999). The onset of elastodynamic diffusion may in fact limit the scope of the seismic reflection method (Delprat-Jannaud and Lailly 2005). On the other hand, frequencies, length scales, times and distances appear to conspire to produce at least discernible, if not dominant, single scattering in much reflection data from the upper crust. This is the fact upon which the state of the art in reflection seismic imaging is founded. The linearization of the equations of motion with respect to the coefficients results in small error when the background or reference model is smooth, and the perturbation is oscillatory. (The words ‘smooth’ and ‘oscillatory’ are vague—a precise sense for this decomposition is discussed below.) Linearization defines a linear map from coefficient perturbations to data perturbations (sampled solution perturbations). This linear map has been studied extensively in the last 25 years, and the story that has emerged is one of the two main topics of this review. The main results follow from its oscillatory integral representation: it is a so-called Fourier integral operator, a class of oscillatory integral operators developed by Hormander¨ and others to represent the solutions of non-elliptic PDEs (Hormander¨ 1971, 1983). These operators have singularity mapping properties which facilitate an explanation of the relation between imaging and inversion, and indeed motivate a definition of imaging. Critical geological features—

2 Inverse Problems 25 (2009) 123008 Topical Review transitions over very short distances from one type of rock to another with different mechanical characteristics—amount to singular (nonsmooth) or approximately singular features in the coefficient functions. If one defines an image of an Earth structure model as any function which has the same singularities (precise definition below) as an algebraic combination of the coefficient functions of a model, then the mapping properties of the linearized operator explain why solution of the linearized inverse problem results in an image. Also, the imaging processes developed in the seismic literature are approximations in various senses of the linearized inverse, retaining its singularity mapping properties. This picture developed starting in the 1980s with the work of Beylkin (1985), Rakesh (1988), Beylkin and Burridge (1990) and others. The relation between widely used imaging (‘migration’) methods and linearized inversion was already implicit in the pathbreaking work of Cohen and Bleistein (1977, 1979) and was further developed by Lailly (1983, 1984) and many others. Bleistein (1999) provides an overview of this history; I give another below. The second pervasive theme is the compensation for data incompleteness in one sense (spectral) by data redundancy in another sense (spatial). For excellent physical reasons, seismic time series are spectrally incomplete—Fourier components at both very low and very high frequencies are below the noise level. The lack of very high frequencies means that singularities cannot actually be reproduced in the linearized inverse, but their locations are still determinate to within a wavelength, roughly speaking. Something similar seems to be true of the nonlinear inverse problem, though for the most part very little is known mathematically along these lines. The lack of low frequency data is much more damaging, as it turns out. Were data complete to zero Hertz, it would apparently be possible to reconstruct a model of that part of the Earth probed by the waves within the experimental time window from the result of a single seismic experiment (that is, one shot). This is certainly true for the one-dimensional version of the problem (Symes 1986a), but appears also to hold for two and three space dimensions, according to very limited mathematical evidence (Sacks and Symes 1985,Ramm 1986). However, the low temporal frequencies in the data correspond, at least roughly, to the low spatial frequencies in the model. These low spatial frequencies are in turn the controlling model feature in determining the kinematics (time of travel) of waves, since time of travel is a path integral of reciprocal wave velocity. I will develop a simple example which shows this correspondence explicitly. The kinematics in turn determine the singularity mapping properties of the linear coefficient-data map. Therefore, missing low frequencies in the data imply that single-experiment data are insufficient for imaging (or linearized inversion) purposes. This observation explains why seismic surveys are multiexperiments. On the other hand, multiexperiment data, even with the aforementioned spectral lacunae, are overdetermined. Even worse, data from different models tend to be essentially orthogonal, so that the L2 difference is large unless the models are very close. Serendipitous similarity between limited parts of the data leads to oscillatory behavior of least-squares and other error measures. Thus, the very natural error minimization or optimization formulation of the inverse problem, promoted extensively by Tarantola and others in the 1980s (Tarantola 1984, 1986, 1987), is not amenable to solution by descent or Newton-type methods (Gauthier et al 1986, Bunks et al 1995, Shin and Min 2006). Since these are the only feasible numerical methods given the scale of industrially appropriate problems, the least error approach (‘full waveform inversion’ in the seismic literature) has met with very limited success. The failure of least error inversion in reflection seismology presents a striking contrast to the effectiveness of this technique in other geophysical inversion problems (gravimetry, controlled source electromagnetics, traveltime tomography, etc).

3 Inverse Problems 25 (2009) 123008 Topical Review

Data redundancy provides an alternate avenue to determine the large scale structure of Earth models, and hence the kinematic information necessary for imaging. The geophysical industry has developed this avenue within the context of partially linearized modeling: that is, the model is represented as a perturbation of a smooth background or kinematic model, and both the background and perturbation parts of the model are to be determined. This methodology goes under the name velocity analysis and is perhaps the part of the technology which least resembles anything in the mathematical inverse problems’ literature. I have proposed a mathematical framework within which velocity analysis may be viewed as a solution of the partially linearized inverse problem (Symes 2008b), which I review here. This overview explicitly omits a number of important topics, to keep the length reasonable. For example, recent years have seen a flowering of work on inverse problems for wave equations from the purely mathematical point of view—see for example Belishev (2007), Stefanov and Uhlmann (2005), Imanuvilov and Yamamoto (2001). I will not discuss these very interesting developments, as they concern problem formulations which do not directly address the data types and degree of model heterogeneity characteristic of reflection seismology. I will also limit myself to inverse problems whose data consist of samples of solutions of the governing PDEs, rather than derived data such as travel times of waves. The so-called traveltime or reflection tomography is a tool of major importance in the practice of reflection seismology (Bishop et al 1987) with many recent and important theoretical developments (Delprat-Jannaud and Lailly 1993, Kurylev et al 2009). Finally, I have not discussed the extensive literature on one-dimensional problems, for which quite complete results are available—see for instance Gel’fand and Levitan (1955), Goupillaud (1961), Gopinath and Sondhi (1971), Bamberger et al (1979), Santosa and Schwetlick (1982), Gray and Symes (1985), Symes (1986a, 1991), Browning (2000), Rakesh (2001)—except insofar as these illuminate various issues arising for two- and three-dimensional problems. After establishing some notations for seismic reflection experiments and their continuum mechanical modeling in the first section, I explore linearized inversion in the second. A simple one-dimensional model problem captures many of the problem’s essential features, and will pop up in several places. I then review the Fourier integral operator theory of the linearized problem. One of its main results is that the linear coefficient-data map is approximately unitary, in that its adjoint is an imaging operator (reproduces the positions, if not the detailed forms, of singularities). The adjoint state method provides a derivation of the so-called reverse time migration imaging operator, which is currently under intensive industry development. The third section explains velocity analysis in terms of model extension and discusses some attempts to formulate velocity analysis as an optimization problem. The final section explores some ideas which may reduce the reliance of the technology on linearization, thus considerably expanding its scope.

2. Mathematical framework

This section reviews the physical circumstances of the seismic experiment and the nature of the data acquired, and then proposes a class of continuum mechanical models and a basic formulation for the inverse problem of reflection seismology.

2.1. The reflection seismic experiment The seismic experiment has three essential components: • energy/sound source—creates wave traveling into the subsurface; • receivers–record waves (echoes) reflected from the subsurface;

4 Inverse Problems 25 (2009) 123008 Topical Review

hydrophone streamer acoustic source xxrsh (airgun array)

Figure 1. Cartoon of the marine seismic experiment. (This figure is in colour only in the electronic version)

• recording and signal-processing instrumentation. A survey consists of many experiments or shots. Each shot amounts to the use of one source, localized near time and space—position xs . Survey output is the simultaneous recording of reflections at many localized receivers, positions xr , over a time interval = 0 − O(10)safter initiation of the source. Insofar as it makes any difference, the marine version of reflection seismology will guide the conceptual developments in this review. For marine seismology, • typical energy source is the airgun array, towed behind a survey ship, which releases (array of) supersonically expanding bubbles of compressed air, and so generates a sound pulse in water; • typical receivers are hydrophones (waterproof microphones) in one or more 5–10 km flexible streamer(s)—wired together in 500–30 000 groups or acoustic antennas, and towed behind a survey ship, usually the same as the one towing the airgun array but sometimes another vessel; • onboard the survey ship(s), lots of recording and processing capacity. A cartoon of this configuration appears in figure 1. Land acquisition is similar in concept, but uses quite different equipment. Acquisition and processing are more complex and labor-intensive—for example, the receivers must be moved by manual labor (usually). Consequently, the vast bulk (90%+) of data acquired each year is marine. Non-streamer marine acquisition is growing (for example, ocean bottom cables), but streamer surveys still dominate. Typical data parameters describing streamer data include

• time t − 0  t  tmax,tmax = 5–15 s; • source (airgun array) location xs , 100–100 000 distinct values (10 000 in a typical survey); • receiver (hydrophone group) location xr ; • for each source location, receiver locations typically occupy the same range of offsets = − − = xr xs xr xs for each shot, often parametrized by half offset, either vector h 2 , or scalar h =|h|, with 100–300 000 distinct values (typical: 5000); • data values: microphone output (volts), filtered version of local pressure (force/area). Modern cable positioning and navigation equipment apparently determines the location parameters to within a few meters or less. In idealized marine ‘streamer’ geometry, xs and xr lie roughly on constant depth planes, and source–receiver lines are parallel. Thus, three spatial degrees of freedom (for example, xs ,h) suffice to describe all source–receiver pairs. Since the possible source and receiver positions, assuming fixed depths, could occupy parts of two planes, this geometry could be termed codimension 1.

5 Inverse Problems 25 (2009) 123008 Topical Review

Contemporary surveys may feature simultaneous recording by multiple streamers (up to 12 or more) with many hundreds of receiver groups, many (roughly) parallel ship tracks (‘lines’) and recording times up to 15 s or more with sample rates of as little as 2 ms. Typically these surveys generate Terabytes of data. The recent development of Wide Angle Towed Streamer (WATS) technology has increased potential data volume by at least another order of magnitude. WATS relies on the accuracy of modern navigation hardware to position ships repeatedly in the same places and uses multiple passes of multiple ships towing streamer arrays to generate for areal sampling of source and receiver positions with reasonable density (Regone 2006). The imaging improvements due to WATS are sufficiently significant to compensate for its considerable addition expense, at least in some cases. We introduce the relation Y ⊂ R3 × R3 of (source, receiver) position pairs, which largely characterizes survey geometry. Since all time series recorded generally have the same length T, the data become a real-valued function on d : Y × [0,T] → R. As noted before, source and receiver depths zs and zr are fixed by the survey equipment, at least approximately, so Y ⊂{(x,y,z): z = zr }×{(x,y,z): z = zs }. Certain distinguished data subsets play an important role in the description of the data and processes to extract images from it, and in the theory to be developed below. A trace is the time series recorded at one receiver for one source position, that is, t → d(xr ,t; xs ).Agather or bin is a collection of traces corresponding to a subset Y, characterized by the common value of an acquisition parameter. Important examples include

• shot (or common source) gather: traces with same shot position xs (that is, the data collected in a single field experiment); • offset (or common offset) gather: traces with same half offset h = (xr − xs )/2; • common midpoint (CMP) gather: traces with same midpoint between source and receiver (xr + xs )/2. In practice, ‘same’ is replaced by ‘same to within specificed tolerance’, to account for residual position inaccuracies. A typical shot gather is depicted in figure 2. A midpoint gather from the same survey, after some filtering (smoothing via convolution in time) and cutting off via multiplication by a smoothed characteristic function (muting), appears as in figure 3. Both figures plot amplitude of the recorded signal versus offset and time, and demonstrate the characteristic feature of this type of data: waves, that is, spacetime coherent structures, called events in the seismic literature. Figure 2 is raw data, as recorded in the field, and illustrates a very important feature of reflection data, namely temporal bandlimitation: this data is oscillatory, and lacks energy at both low and high frequencies (in the latter case, even at frequencies which can be well represented at the sample rate employed). The lack of high frequencies arises from several causes: the airgun source generates substantial output only over a limited range of frequencies, the hydrophones used to detect the signal are sensitive only over a limited band, and high frequencies are selectively attenuated by various absorption processes as the waves propagate through the Earth. Low frequencies (below roughly 3 Hz) are not present for different reasons, having to do with the energy required to produce them (Ziolkowski 1993, Ziolkowski and Bokhorst 1993). A simple thought experiment showing the link between total energy output and absence of low frequencies: to introduce a nonzero amplitude at 0 Hz would require shifting the entire Earth out of its current orbit, or plastic deformation of the entire region containing the sources and receivers—within linear elastodynamic theory (to be reviewed in the next section), both options take essentially infinite energy! We now turn to choice of models adequate to represent at least the salient behavior of seismic waves.

6 Inverse Problems 25 (2009) 123008 Topical Review

offset (km) -4 -3 -2 -1 0

1

2 time (s) 3

4

5

Figure 2. Shot gather from survey in the .

-2.5 -2.0 -1.5 -1.0 -0.5 1.0

1.5

2.0

2.5

3.0

Figure 3. CMP gather from the Gulf of Mexico survey, filtered with trapezoidal bandpass (4–10– 25–40 Hz) and muted. The vertical axis is time (s), and the horizontal axis is offset (km).

2.2. Continuum mechanical models Seismic wavelengths run in the tens of meters, so it is reasonable to presume that the mechanical properties of rocks responsible for seismic wave motion might be locally homogeneous on length scales of millimeters or less—that is, that the Earth might be modeled as a

7 Inverse Problems 25 (2009) 123008 Topical Review mechanical continuum. Except possibly for a few meters around the source location, the wavefield produced in seismic reflection experiments does not appear to result in permanent damage or deformation—the waves are entirely transient. These considerations suggest linear elastodynamics as a mechanical model for seismic wave motion, and this choice has been widely accepted for more than 100 years (Aki and Richards 1980). The dynamic equations of linear elasticity may be written as ∂v ρ =∇·σ + f, ∂t (1) ∂σ 1 = C(∇v + ∇vT ) + γ, ∂t 2 where v is the particle velocity field, σ is the stress tensor, f is a body force density, γ is a defect in the elastic constitutive law, ρ is the mass density and C is the Hooke tensor (Achenbach 1973). The right-hand sides f and γ provide a variety of representations for external energy input to the system (that is, the energy source). Seismic experiments use transient energy sources and produce transient wavefields. Accordingly, the appropriate initial conditions for the system (1)are v = 0,σ= 0fort  0. (2) The sea surface supports essentially no normal stress, which provides one set of boundary conditions. Since the energy sources are localized in space and the time extent of the seismic experiment is limited, the domain for the system (1) is essentially unbounded. For numerical purposes, various absorbing boundary conditions mock up an unbounded domain while confining actual computations to a bounded one (Moczo et al 2006). Even the system (1) is not really adequate: seismic waves are observed to undergo attenuation and frequency dispersion as they propagate, phenomena not predicted by (1). Other theories such as viscoelasticity (Christensen 1982), in which the Hooke tensor becomes a convolution operator in time, must be invoked to capture these phenomena. Isotropic elastodynamics depends on three independent parameters, for example the compressional and shear wave speeds cp and cs and the density ρ. It is instructive to examine direct measurements of these quantities, made in a borehole. The measurements (so-called well logs) are actually solutions of another type of inverse problem, involving waves with frequencies one to two orders of magnitude higher than those occurring in field seismograms like the one displayed in figure 2, as well as ionizing radiation. They nonetheless constitute the only direct access to mechanical properties of rocks in situ deep in the Earth. Figure 4 shows cp,cs and ρ measured along a borehole in the . Observe that • all of these quantities are heterogeneous at all scales represented (the data were subject to a 30 m moving average, to remove distracting fine scale fluctuations—in other words, the actual measurements are even more heterogeneous than the data displayed) and • cp and cs show large-scale trends, with cp doubling and cs tripling over the 2 km extent of the log, whereas ρ is much closer to constant. Thus, wave speeds can be expected to vary considerably both on short and long scales, whereas most of the variation in density appears to occur on a short scale. The physical reason for this contrast is that gravitational compression tends to increase the stiffness of the rock with depth, but is not sufficient to overcome the intermolecular forces to change density substantially. In this review, I will not develop the theory of the seismic inverse problem in the general context of the elastodynamic system (1), as in for example Stolk and De Hoop (2002), or more general systems (Blanch et al 1998). Instead, I will limit my discussion to a special case, linear acoustics. In this model, it is supposed that the material does not support shear stress.

8 Inverse Problems 25 (2009) 123008 Topical Review

4500

4000

3500

3000

2500

2000

1500

1000

500 1000 1200 1400 1600 1800 2000 2200 2400 2600 2800 3000 depth (m)

Figure 4. Blocked logs from a well in the North Sea. Solid: p-wave velocity (m s−1), dashed: s-wave velocity (m s−1), dash-dot: density (kg m−3). ‘Blocked’ means ‘averaged’ (over a moving 30 m window). Original sample rate of log tool <1m.

The stress tensor becomes scalar, σ =−pI, p being the pressure, and only one significant component, the bulk modulus κ, is left in the Hooke tensor. The system (1) collapses to ∂v ρ =−∇p + f, ∂t (3) 1 ∂p =−∇·v + g, κ ∂t in which I have chosen to represent the energy source as a constitutive law defect g. This model predicts neither multiple wave modes and speeds, nor material anisotropy, attenuation or frequency dispersion of waves. It does however predict wave motion with = κ spatially varying wave speed c ρ . It is convenient to represent the acoustic dynamics in terms of the acoustic field potential t u(x,t)= −∞ dsp(x,s): ∂u 1 p = , v = ∇u. ∂t ρ A short calculation leads to the second-order wave equation for the potential, equivalent to the acoustic system (3): 1 ∂2u 1 −∇· ∇u = g. (4) ρc2 ∂t2 ρ To make a well-posed problem out of (4)or(3), one must add suitable initial and boundary conditions. Transience and causality of the wavefield imply u ≡ 0,t 0. (5) Zero normal stress at the sea surface (roughly, z = 0) translates into p = 0 for acoustics, so

u|z=0 ≡ 0. (6)

9 Inverse Problems 25 (2009) 123008 Topical Review

In principle, the domain should be the unbounded half space {z>0}. As for elasticity, numerical absorbing boundary conditions have been developed for computationally feasible unbounded medium simulation. For the most part, I will add two further idealizations: • density ρ is constant, • the source (transient constitutive law defect g) is an isotropic point radiator located at the source point x = xs with known time dependence (‘source pulse’ w(t), typically of compact support),

g(x,t; xs ) = w(t)δ(x − xs ). (7)

Thus, the acoustic potential and pressure fields also depend on the source location xs . The isotropic point radiator does not produce the radiation pattern of actual seismic sources, which can be quite directional (anisotropic). For a study of the extent to which multipole expansions, that is linear combinations of partial derivatives of distributions of the form (7), can represent arbitrary localized sources, see Santosa and Symes (2000). If the wave velocity c is constant, then the system (4), (5) and (7)(inallofR4, rather than the half-space) has the well-known outgoing spherical wave solution (see for example Courant and Hilbert (1962)): w(t − r/c) u(x,t)= ,r=|x − x |. (8) 4πr s The free surface (boundary condition (6)) solution in a half-space may be constructed from this expression by the method of images. One sees from the above expression that only heterogeneity in the sound velocity or density can be responsible for the existence of reflections. In view of the observed highly heterogeneous character of sedimentary rocks, it is natural to ask: how nonconstant can c, ρ be and still permit ‘reasonable’ solutions of wave equation? This question was answered in the late 1960s by J-L Lions—see for example Lions (1972), and also Stolk (2000b), Blazek et al (2008) for generalizations and refinements. The sense in which the system is solved requires careful definition. For the sake of concreteness, suppose that (4)istobesolvedina domain × [0,T] ⊂ R4, with sufficiently regular boundary. The cited references give details on necessary boundary regularity; one admissible domain is a hypercube, which suffices for present purposes. Assume that Dirichlet conditions are posed on ∂ × R.Aweak solution of the Dirichlet problem (similar treatment for other boundary conditions): ∈ 1 ; 2 ∩ 0 ; 1 u C (R L ( )) C R H0 ( ) satisfies: for any φ ∈ C∞( × R),  0   1 ∂u ∂φ 1 dt dx − ∇u ·∇φ + gφ = 0. 2 ρc ∂t ∂t ρ Then, Theorem (Lions 1972). Suppose that log ρ,log c ∈ L∞( ), f ∈ L2( × R). Then weak solutions of the Dirichlet problem exist; initial data ∂u u(·, 0) ∈ H 1( ), (·, 0) ∈ L2( ) 0 ∂t uniquely determine them. In particular, a unique causal weak solution (u = 0,t  0) exists, provided that f is also causal. Since the problem is autonomous, additional right-hand side regularity in time immediately translates into additional time regularity of the weak solution. For additional spatial regularity in the solution, the coefficients (ρ,c in this case) must also be more regular.

10 Inverse Problems 25 (2009) 123008 Topical Review

2.3. An inverse problem

The well-posedness result of Lions, described in the last section, does not quite justify sampling the solution of the inverse problem at the receivers to define the modeled data, nor does it accommodate singular sources such as the isotropic point radiator. For modeling the marine seismic experiment, a simple remedy is available. In the usual geometry of streamer surveys, the sources and receivers are positioned in the water column 0  z  zw, that is 0

• a distribution solution in the usual sense of the constant-coefficient problem (4), supported in the water column 0  z  zw, for which the right-hand side is the sum of the isotropic point radiator (7) and a smooth function f supported in the water column, and 2 • a weak solution in z>zs , whose L right-hand side is −f .

The solution constructed in this way is smooth in the water column except at the point source. These observations justify the definition of the mapping

2 F : A[c0,ρ0,c−,ρ−,c+,ρ+] → L (Y × [0,T]), ∞ 3 A[c0,ρ0,c−,ρ−,c+,ρ+] ={c, ρ ∈ L (R ) : c = c0,

ρ = ρ0,z0}.HereY is supplied with a measure. For discrete Y, modeling actual acquisition at discrete locations, the measure is simply a linear combination of delta functions supported at the points of Y. For the idealized case of a smooth submanifold of dimension 4, the measure is the usual induced one. The latter choice is idealized in the sense that sampling is regarded as continuous. It is possible to do away with the requirement of a homogenous layer, for example to model land seismic, at the cost of considerably more refined definition of the map F. F maps model (fields c, ρ) to predicted data (function on Y × [0,T]). In the parlance of inverse problems, F is the forward or data-prediction map. The inverse problem of (marine) reflection seismology is: given d ∈ L2(Y × [0,T]), find (c, ρ) ∈ A so that F[c, ρ] d. I have deliberately used the vague ‘ ’ symbol to reflect the variety of possible meanings that might be assigned to this problem. In view of the many approximations and idealizations involved in modeling the seismic wavefield via linear acoustics as described here, field data d seems unlikely to lie in the range of F. A very common approach is to interpret ‘ ’ as ‘as close as possible in the sense of L2(Y × [0,T])’, that is, to turn the inverse problem stated above into a nonlinear least-squares problem, perhaps with some additional regularizing constraints on the model. I shall return to this formulation of the inverse problem below, after discussing the approximation responsible for most practically useful work on .

11 Inverse Problems 25 (2009) 123008 Topical Review

3. Linear inverse scattering

Contemporary seismic imaging technology is based almost entirely on linearization of F with respect to c. This approximation is also known as first-order perturbation or (somewhat less accurately) as the Born approximation. For convenience, I will restrict this discussion to constant density acoustics, that is, ρ independent of x. With appropriate choice of units, ρ = 1. Define F as above, as the map from c to the restriction to Y × [0,T] of the solution of (4), (5), (7) and (6). It turns out to be convenient to express the first-order change in F[c + δc], due to a small change δc in sound velocity, in terms of the relative perturbation r = δc/c. Thus, write c = v(1+r) and treat r as relative first-order perturbation about v, resulting in perturbation of the pressure field δp = ∂δu = 0,t  0, in which ∂t 1 ∂2 2r ∂2u −∇2 δu = ; u ≡ 0,t 0. (10) v2 ∂t2 v2 ∂t2 Define the linearized forward map F by

F [v]r = δp|Y ×[0,T ]. (11) The linearized inverse problem for constant density acoustics is: given d ∈ L2(Y × [0,T]), and suitable v, find r so that F [v]r d. The reference velocity field v is part of the data of this problem. Of course, v is not data in the sense of being measured directly, any more than is r, or any other descriptor of the subsurface. Ultimately v must be determined as well. Methods for doing so will be discussed in the next section. An immediate and natural question is: in what sense is F [v]r = DF[v](vr) the (Gateaux)ˆ derivative of F at c = v in the direction δc = vr? This question was answered by Chris Stolk in his thesis (Stolk 2000b). Suppose δc ∈ L∞( ), and again assume Dirichlet conditions on ∂ , as in the statement of Lion’s theorem above. Then the linearized problem (11) has a unique causal weak solution, and δu by solving the perturbational problem: set v = c, r = δc/c. Stolk shows that for small enough h ∈ R,

u[c + hδc] − u[c] − δu C0([0,T ],L2( )) = o(h) (Stolk 2000b, pp 52–4). Note ‘loss of derivative’: the error in Newton quotient is o(1) in the weaker norm (L2 as opposed to H1) than that defining the space of weak solutions. A small modification of Stolk’s argument also shows that

F[c] L2(Y ×[0,T ]) = O( w L2(R)) but

F[v(1+r)] − F[v] − F [v]r L2(Y ×[0,T ]) = O( w H 1(R)). (12) As will be shown below, these estimates are both sharp.

3.1. Linearization error Numerical experiment and physical intuition suggest that the error estimate (12), despite being sharp, cannot possibly tell the whole story. Crudely speaking, the linearization error (left-hand side of (12)) appears to be • small when v is smooth, r is rough or oscillatory on the wavelength scale (well-separated scales) and

12 Inverse Problems 25 (2009) 123008 Topical Review

x (km) x_r (km) 0 0.5 1.0 1.5 2.0 0 0.2 0.4 0.6 0.8 1.0 0 0

0.2 0.2

2.4 0.4 2.2 0.4 2.0 t (s) z (km) 0.6 1.8

1.6 0.6

0.8

0.8

1.0

Figure 5. Left: total velocity c = v(1+r) with smooth (linear) background v(x, z), oscillatory (random) r(x, z). Standard deviation of r = 5%, distributed uniformly. Right: simulated seismic response for shot located in the upper center of the 2D model (figure 5), that is, (F[v(1+r)]), = bandpass filter 4–10–30–45 Hz. Simulator is (2,4) finite difference scheme.

• large when v is not smooth and/or r is not oscillatory (poorly separated scales). Figures 5–9 illustrate this phenomenon. These figures show the results of finite difference simulation applied to a two-dimensional (‘2D’) version of problems (4), (5), (7) and (6). The data consist of a single shot gather, with a ‘split spread’ receiver array, that is, receivers on both sides. Thus, Y is one dimensional. The velocity field in figure 5 left is the sum of a smooth (linear) v(x, z) and oscillatory δc(x,z) = v(x,z)r(x,z) depending only on z, that is, a ‘layered medium’ (not a bad simplification, as many types of sedimentary rocks begin life as flat layers of sediments). The two summands appear as given in figure 6. The oscillatory part r is uniformly random with mean zero and standard deviation of 5%. The source wavelet w(t) is a bandpass filter, with corner frequencies 4–10–30–45 Hz. Figure 5 (right) shows the simulated seismic data (seismogram). Note that the considerable heterogeneity in the model leads to the presence of many reflections in the data. The simulation in the smooth background (figure 7(left)) contains no visible reflections, only the sampling on Y of what appears to be an outgoing circular wave. Geometric optics predicts this behavior, as will be discussed in the next section. The same finite difference method applied to the linearized problem (10) produces figure 7 (right), which appears to consist entirely of reflections. The sum of the data in these two figures appears to resemble closely the data depicted in figure 5 (right). Figure 8 presents the contrasting case of smooth perturbation of nonsmooth models. The left panel shows the simulation for c = v(0.95 + r), that is, for the smooth component shown in figure 6 diminished by 5%, with the same rough component. The right panel shows the linearized simulation for the difference, that is, F [v(0.95 + r)](0.05v/v(0.95 + r)).The linearization error appears to be considerably larger, and that is indeed the case. Figure 9 depicts the two linearization errors, plotted on the same scale. The linear approximation of

13 Inverse Problems 25 (2009) 123008 Topical Review

x (km) x (km) 0 0.5 1.0 1.5 2.0 0 0.5 1.0 1.5 2.0 0 0

0.2 0.2

0.10 2.4 0.4 0.4 0.05 2.2 0 2.0 z (km) z (km) -0.05 0.6 1.8 0.6 -0.10 1.6

0.8 0.8

1.0 1.0

Figure 6. Left: smooth part (v) of the model depicted in figure 5. Right: rough/oscillatory part (r) of the model depicted in figure 5.

x_r (km) x_r (km) 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0

0.2 0.2

0.4 0.4 t (s) t (s)

0.6 0.6

0.8 0.8

Figure 7. Left: simulated shot for smooth model of figure 6,thatis,F[v], same conditions as in figure 5 (right). Note that no reflected waves are present, only the so-called direct wave traveling horizontally in the water column. Right: simulated linearized shot for the perturbational model of figure 6,thatis,F [v]r, same conditions as in figure 5 (right). Note that data appear to consist entirely of reflected waves. The pointwise sum of the data displayed in these two panels comes very close to reproducing the data displayed in figure 5 (right).

the data in figure 5 with smooth reference and oscillatory perturbation is considerably more accurate than the same approximation with nonsmooth reference and smooth perturbation.

14 Inverse Problems 25 (2009) 123008 Topical Review

x_r (km) x_r (km) 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0

0.2 0.2

0.4 0.4 t (s) t (s)

0.6 0.6

0.8 0.8

Figure 8. Left: simulated shot for the nonsmooth model obtained by subtracting 5% of the model shown in figure 6 from the model shown in figure 5,thatis,F[v(0.95 + r)]. Right: simulated linearized shot for smooth perturbation of the nonsmooth model, that is, F [v(0.95 + r)](0.05v/v(0.95 + r)).

A suggestive ‘physical’ explanation for this phenomenon derives from the kinematics of waves, which will be formalized in the next section in the form of geometric optics. The time of travel of the wave, during its transit from source to reflection point to receiver, is the line integral of the reciprocal velocity (‘slowness’) over a path. A zero mean oscillatory perturbation of velocity does not much change this path integral, so the time of arrival stays the same, to good approximation. A smooth perturbation, on the other hand, has a substantial effect on the time of travel, resulting in a time shift.Timeshifts1/4 wavelength or more are very large perturbations for oscillatory signals. A close examination of figures 5 and 8 will show that the very large error (on the order of 100%) evident in figure 9 (right) is largely the effect of timeshifts on the order of 1/2 wavelength, caused by the 5% smooth velocity perturbation. The anomalous accuracy of linearization about smooth velocity models (‘macromodels’) appears to be a very large part of the reason for the success of contemporary imaging technology. This is not to say that the models used (implicitly) as reference in perturbation computations are always smooth—however, they are almost always at least piecewise smooth, and to some extent the anomalous linearization accuracy persists for this larger class. Mathematically rigorous explanation of this phenomenon seems quite difficult. The only (partial) result, of which the author is aware, concerned the 1D problem, and was part of R M Lewis’ thesis (Lewis and Symes 1991).

3.2. The 1D problem: an instructive example A great deal about the seismic inverse scattering problem can be understood from examination of the one-dimensional (‘1D’) version, linearized about constant background. For the single

15 Inverse Problems 25 (2009) 123008 Topical Review

x_r (km) x_r (km) 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 0 0

0.2 0.2

0.4 0.4 t (s) t (s)

0.6 0.6

0.8 0.8

Figure 9. Left: linearization error (left-hand side of (12)), for nonsmooth perturbation of the smooth model (figures 5 (right), 6 (left) and 6 (right)). Right: linearization error (left-hand side of (12)), for smooth perturbation of the nonsmooth model (figures 5 (right), 8 (left) and 8 (right)). Figures plotted on the same gray scale. spatial coordinate we use z, signifying depth. We place both sources and receivers at depth zero and assume that the sound velocity is constant in the left half plane z<0—in effect, a simple model of seismology in an infinitely deep ocean. The constant density acoustic version of this problem couples the acoustic potential field u(z, t), its perturbation δu(z,t), the constant reference sound velocity c and its perturbation δc(z) through the linearized equations of motion 1 ∂2u ∂2u − = 0, (13) c2 ∂t2 ∂z2 1 ∂2δu ∂2δu 2δc ∂2u − = , (14) c2 ∂t2 ∂z2 c3 ∂t2 and causal initial conditions u(z, t) = 0,δu(z,t)= 0,t 0. (15) This pair of initial value problems is well posed and has a unique solution depending continuously on the initial conditions (i.e. on w) in a suitable sense, by a minor extension of the theory discussed above. Assume that δc(z) = 0forz<0 and w(t) = 0fort<0. Then it is easy to see that δu obeys the radiation boundary condition

1 ∂δu ∂u − (0,t)= 0. c ∂t ∂z That is, at z = 0,δuconsists of waves moving in the negative z direction, commonly called upcoming.

16 Inverse Problems 25 (2009) 123008 Topical Review

We will assume that supp δc ⊂{z>0}, and that the receiver depth zr = 0. Define the linearized forward map (formal derivative of F)by DF[c]δc(t) = δu(0,t). (Physically, the pressure p = ∂u/∂t is the more natural measurement; however, the choice of u instead simplifies the subsequent calculations and is in principle equivalent.) The solution u(z, t) = w(t − z/c) of the reference problem is a downgoing wave. A short calculation using the Green’s function for the constant density acoustic 1D wave operator

c |z − z | G(z, t; z ,t ) = H t − t − (16) 2 c and equation (14) shows that    1 δc cτ DF[c]δc(t) = dτw(t− τ) 2 . (17) 2 c Thus, the impulse response (w = δ)is   1 δc ct DF [c]δc(t) = 2 . (18) δ 2 c In terms of the reflectivity r = δc/c, in the notation of the previous section,

1 ct F [c]r(t) = r . (19) δ 2 2 The RHS of equation (18) is often called the reflectivity—its convolution with the pulse w gives the linearized reflection. A number of conclusions about this very simple problem may be reached from inspection of equation (17). (1) The influence of localized perturbations in c is also local: a localized perturbation in c near depth z results in a perturbation in the reflected wave (that is, the forward map) at time 2z/c, which is the time of travel from the surface z = 0 to depth z and back, at speed c. (2) The spatial frequency band in which the linearized seismogram controls the velocity perturbation is entirely determined by the temporal frequency band of the pulse w: the Fourier transform δc(ˆ 2πk) is well determined iffw ˆ is safely nonzero at frequency f = ck/2. (3) As discussed in the last section, the passband of w typically takes the form [fmin,fmax] with fmin > 0 (typically fmin 3–5 Hz). Thus both very short and very long wavelengths in δc are undetermined from the data of the inverse problem, representing limitations on both spatial resolution and ability to discern large-scale trends in velocity (perturbation). (4) It follows from the results in Stolk (2000b) that F : A → L2[0,T] is well defined for a suitable open domain A ⊂ L∞(R) of positive velocities, for any w ∈ L2(R). However, differentiability requires that w ∈ H 1(R); part of this conclusion is evident from equation (17), namely that the derivative is a bounded linear map only under this condition. Thus differentiating F with respect to c results in a ‘loss of one derivative’. (5) If w ∈ L2(R), then DF[c] may be regarded as a map on L2(R), hence composable with its formal adjoint. The normal operator DF[c]∗DF[c] is a convolution operator, whose kernel is the autocorrelation of w.Ifw has small support near t = 0, then the same is true of its autocorrelation. Hence, the support of DF[c]∗DF[c]δc is very close to the support of δc. If the data d(t)are consistent with the linearized model, that is, d(t) = DF[c]δc(t) for some δc, then DF[c]∗d has support near that of δc. Thus DF[c]∗ is an imaging operator.

17 Inverse Problems 25 (2009) 123008 Topical Review

(6) For the idealized impulsive case (w = δ), DF[c]∗DF[c] is an elliptic differential operator (in fact, a multiplier by a smooth positive function), and hence preserves the singularities of δc. All of these conclusions will hold in a suitably modified form for the more complex problems to be described below. In particular, observation (6) motivates a precise and abstract definition of imaging operator.

3.3. The generalized Radon transform representation The following paragraphs present an integral representation of the linearized forward map F [v]. I will develop this representation for the particular, idealized source pulse w = δ, and denote the result as Fδ[v]. Of course any other version of the forward map is related to this one by time convolution. At a deeper level, the role of high frequencies (on the length scale associated with the smooth velocity v) is predominant, and this dominance is nicely captured by Fδ[v]. Jack Cohen and Norm Bleistein were early proponents of this viewpoint (Cohen and Bleistein 1977, 1979). It is both implicit in early geophysical literature on migration (Hagedoorn 1954), and key to the geometrical understanding of single scattering developed over the last quarter-century. Under this assumption, the acoustic potential u is the same as the causal Green’s function, or the retarded fundamental solution G(x,t; x ). It is the distribution solution of s 1 ∂2 −∇2 G(x,t; x ) = δ(t)δ(x − x ) v2 ∂t2 s s and G ≡ 0,t <0. Then (w = δ!) p = ∂G,δp = ∂δG, and ∂t ∂t 1 ∂2 2 ∂2G −∇2 δG(x,t; x ) = (x,t; x )r(x). v2 ∂t2 s v2(x) ∂t2 s = | It is convenient to ‘forget’ a derivative: from now on, define Fδ[v]r δG x=xr . Duhamel’s principle then implies that   2r(x) ∂2G F [v]r = dx dsG(x ,t − s; x) (x,s; x ). (20) δ v(x)2 r ∂t2 s Since v is smooth, one expects the geometric optics (or geometric acoustics) approximation of G to be reasonably accurate. The local version states that if x is ‘not too far’ from xs , then

G(x,t; xs ) = a(x; xs )δ(t − τ(x; xs )) + R(x,t; xs ), in which the traveltime τ(x; xs ) solves the eikonal equation v|∇τ|=1 (21) with ‘initial condition’ |x − xs | τ(x; xs ) ∼ , x → xs . v(xs )

The amplitude a(x; xs ) solves the transport equation ∇·(a2∇τ) = 0. This approximation is developed in many books on partial differential equations; see, for example, Courant and Hilbert (1962), Friedlander (1958). In this context, ‘not too far’ means: there should be one and only one ray of geometric optics connecting each xs or xr to each x ∈ supp r. I will call this hypothesis the simple geometric optics assumption.

18 Inverse Problems 25 (2009) 123008 Topical Review

xxr s

y

τ τ t= (y,xrs )+ (y,x )

Figure 10. Illustration of the integration surfaces for the generalized Radon transform approximation (equation (23)).

All of this is meaningful only if the remainder R is small in a suitable sense. A simple exercise in energy estimates yields   T 2 dx dt|R(x,t; xs )|  C v C4 , 0 for a constant C depending on the size of the domain and the time T but not otherwise on v. That is, the remainder is square-integrable, hence smoother than the leading term. Smoother means smaller for high frequency components, which is the sense in which geometric optics is an approximation. Assuming the simple geometric optics hypothesis, the distribution kernel K of F [v]is  ∂2G 2 K(x ,t,x ; x) = dsG(x ,t − s; x) (x,s; x ) r s r 2 s 2  ∂t v (x) 2a(x , x)a(x, x ) ds r s δ(t − s − τ(x , x))δ (s − τ(x, x )) v2(x) r s 2a(x, x )a(x, x ) = r s δ (t − τ(x, x ) − τ(x, x )), (22) v2(x) r s provided that

∇xτ(x, xr ) + ∇xτ(x, xs ) = 0, which is to say that the velocity at x of the ray from xs is not the negative of the velocity of the ray from xr . In other words, the last step above is valid provided that no forward scattering occurs (Gel’fand and Shilov 1958). The symbol in (22) means ‘differs by the kernel of a relatively smoothing operator’. In principle, one can complete the geometric optics approximation of the Green’s function so that the kernel difference is C∞. Then the approximation has the same singularity-mapping properties as Fδ[v]. Practical computations tend to ignore this refinement. Moreover, the singularity mapping properties turn out to be governed by the leading term, that is, the approximation presented in (22), so I will also ignore the distinction. For r supported in a simple geometric optics domain, and assuming no forward scattering,  ∂2 2r(x) F [v]r(x ,t; x ) dx a(x, x )a(x, x )δ(t − τ(x, x ) − τ(x, x )). (23) δ r s ∂t2 v2(x) r s r s That is, the pressure perturbation is the sum (integral) of r over reflection isochrons {x : t = τ(x, xr ) + τ(x, xs )}, with weighting and filtering. Note that if v is constant, then the isochron is ellipsoid, as τ(xs , x) =|xs − x|/v.The focii are the source and receiver positions, and the radius is the distance corresponding to the time t. This situation is depicted in figure 10. In general, the approximation (23) computes Fδ[v]r as the weighted integral of r over a family of (possibly) distorted ellipses, parametrized by the source and receiver positions and

19 Inverse Problems 25 (2009) 123008 Topical Review

H(φ)=0 φ φ <0 =0 ξ x H(φ)=1

φ>0

Figure 11. WF (H (φ)) ={(x,ξ): φ(x) = 0,ξ ∇φ(x)}. time. Accordingly, this expression has been dubbed the generalized Radon transform (‘GRT’) approximation of the linearized forward map (Beylkin 1985).

3.4. Beylkin’s theorem and a definition of image The first important result in the study of the linearized inverse problem is due to Beylkin (1985), with alternate proofs, refinements and important extensions by Rakesh (1988). This theorem gives precise on the singularity mapping properties of Fδ[v]. Its statement requires a concept from harmonic analysis (Hormander¨ 1983, Duistermaat 1996). E = ∞ D = ∞ In the following, I will use the standard notations ( ) C ( ), ( ) C0 ( ) for the smooth functions, respectively, the smooth functions with compact support, on an open set ⊂ Rn. The topological dual spaces are E ( ), the distributions of compact support, and D ( ), the distributions. The wave front set WF(u) of a distribution u ∈ D (Rn) is the subset of Rn × Rn\{0} (the set of pairs (position, direction)) defined by the condition (x0,ξ0)/∈ WF(u)if and only if × ⊂ n × n\{ } ∈ ∞ n there is an open neighborhood X R R 0 of (x0,ξ0) so that for any χ C0 (R ) with suppχ ⊂ X,anyN ∈ N, and any ξ ∈ , |χu(τξ) |=O(τ−N ).

That is, the Fourier transform of a localization of u near x0 acts like the Fourier transform of a smooth function of compact support (decreases rapidly at infinity) in directions ξ near ξ0. Thus the wave front set of a distribution detects orientation as well as position of singularities. Proofs of several important properties of wave front sets may be found in Hormander¨ (1983) and Duistermaat (1996): (1) The direction part of the neighborhood may naturally be taken to be a cone (ξ ∈ if and only if τξ ∈ for any τ>0). Such a neighborhood of (x0,ξ0) is called conic. (2) WF(u)is invariant under the change of coords if it is regarded as a subset of the cotangent bundle T ∗(Rn) (that is, the ξ components transform as covectors). (3) The standard example (see figure 11): suppose φ ∈ C∞(Rn), and φ(x) = 0 only if ∇φ(x) = 0, so that at least locally the zero level set of φ is a smooth hypersurface. If u is smooth except possibly for discontinuity across the ‘interface’ φ(x) = 0, then WF(u) ⊂ Nφ ={(x,ξ): φ(x) = 0,ξ ∇φ(x)} (the normal bundle of {φ = 0}). One of the chief consequences of the results in Beylkin (1985) is this: provided that no forward scattering takes place, and that no rays of geometric optics are tangent to ∗ the acquisition submanifold Y,thenormal operator Fδ[v] Fδ[v] is well defined as a map on distributions supported in a simple geometric optics domain. For such distributions ∗ r, W F (Fδ[v] Fδ[v]r) ⊂ WF(r).

20 Inverse Problems 25 (2009) 123008 Topical Review

∗ One proof (essentially Beylkin’s) begins by observing that Fδ[v] is also expressed as a GRT. Introducing the Fourier transform and approximating the resulting integral representation for the normal operator for large wavenumbers by the principle of stationary phase lead to an expression for the normal operator in the form  F [v]∗F [v]r(x) = p(x,D)r(x) ≡ dξp(x,ξ)eix·ξ r(ξ)ˆ + ···, (24) in which p ∈ C∞, and for some m (the order of p), all multi-indices α, β, and all compact 3 K ⊂ R , there exist constants Cα,β,K  0forwhich

α β  | | m−|β| ∈ Dx Dξ p(x,ξ) Cα,β,K (1+ ξ ) , x K. For details, including an explicit computation of the symbol p, see Symes (1995). The ellipses stand for an integral operator with smooth kernel, the presence of which will not affect the wavefront sets in question. An operator defined by an oscillatory integral of the form (24) is called the pseudodifferential operator (‘DO’). A standard result in theory of DOs states that they do not enlarge wavefront sets: if A is a DO, u ∈ E (Rn), then WF (Au) ⊂ WF(u). Together with the observation (24), this finishes the proof. A few other important facts about DOs are worth mentioning: • differential operators are DOs; • DOs of order m form a module over C∞(Rn); • DOs may be composed: the composition of a DO A of order m and a DO B of order l is a DO AB of order  m + l; • scalar DOs commute to the leading order: the order of AB−BA is  m + l − 1; • formal adjoints of DOs are DOs of the same order. Complete accounts of theory and many applications may be found in Taylor (1981), Hormander¨ (1983), Treves (1980) and Duistermaat (1996). These ideas lead to an idealized definition of image: a reflectivity model r˜ is an image of r if WF(r)˜ ⊂ WF(r)(the closer to equality, the better the image). Using this definition, it is possible to define an idealized migration problem, which algorithms approximately solve: given d (hence WF(d)) deduce somehow a function which has the right reflectors, i.e. a function r˜ with WF(r)˜ WF(r). ∗ ∗ Note that if the data are model-consistent, d = Fδ[v]r, then Fδ[v] d = Fδ[v] Fδ[v]r. ∗ Thus the result of Beylkin quoted above implies that the adjoint operator Fδ[v] is a migration ∗ or imaging operator. That is, application of the adjoint operator Fδ[v] to the data solves the migration problem, at least for noise-free data. This is an appropriate point at which to remind the reader that the reference velocity v is assumed known (and smooth) in all of these developments. In practice, it must also be estimated from the data.

3.5. Microlocal ellipticity and asymptotic prestack inversion Recall that in the 1D constant coefficient problem,

1 ct F [v]r(t) = r . δ 2 2 It follows that 1 2z F [v]∗d(z) = d , (25) δ c c

21 Inverse Problems 25 (2009) 123008 Topical Review so 1 F [v]∗F [v]r(z) = r(z), (26) δ δ 2c which is about the simplest invertible operator imaginable. In particular, the normal operator is an elliptic differential operator (of order zero). Something similar is true for the multidimensional linearized problem, with some ∗ important differences. The main complication is that Fδ[v] Fδ[v] cannot be invertible, at least not stably. The physical reason is simple to state: for certain orientations of reflectors, the energy in the reflected wave travels along paths which do not intersect the acquisition ∗ manifold Y. Thus because WF(Fδ[v] Fδ[v]r) can be quite a bit ‘smaller’ than WF(r). In the next subsection I will describe a conic set [v,Y] ⊂ R3 × R3\{0} within which the normal operator is microlocally elliptic,thatis,ifWF(r) ⊂ [v,Y], then ∗ WF(Fδ[v] Fδ[v]r) = WF(r). Thus for reflectivities with high-frequency components ∗ concentrated in [v,Y],Fδ[v] Fδ[v] ‘acts invertible’. In fact, Beylkin (1985) showed that a choice of amplitude or integration weight b(xr ,t; xs ) exists for which the modified GRT migration operator  † F [v] d(x) = dxr dxs dtb(xr ,t; xs )δ(t − τ(x; xs ) − τ(x; xr ))d(xr ,t; xs ) (27) yields F [v]†F [v]r − r ∈ C∞(R3) if WF(r) ⊂ [v]. For details of Beylkin’s construction, see Beylkin (1985), and also Bleistein (1987) and Symes (1995). This approximate pseudoinverse goes under various names, such as GRT inversion, Ray–Born inversion, migration/inversion, true amplitude migration, etc. It has been generalized to various submodels of elasticity: see for example Beylkin and Burridge (1990), Virieux et al (1992), Madariaga et al (1992), De Hoop and Bleistein (1997), Burridge et al (1998), Operto et al (2000), Xu et al (2001) and Stolk and De Hoop (2002). Even with physically incomplete models such as constant-density acoustics, GRT inversion can provide reasonably good notion of where ‘strong reflectors’, that is, large contrasts in rock mechanics, sit within the Earth. An example appears in figure 12, constructed in the mid-1990s by K Araya. The input data were a 2D (single sail line, single streamer) marine survey from the Gulf of Mexico, consisting of about 500 shot gathers. One of these appears as figure 2. The components of the GRT inversion formula (27), that is, the amplitude b and the traveletime τ, were all constructed as by-products of a numerical finite difference solution of the eikonal equation (21). The velocity v was estimated by means of an algorithm to be explained in the next section. Various geological features of the Earth are suggested. In particular, the very strong horizontal signal in the central part of the figure at roughly 2000 m depth corresponds to a commercial gas deposit, in which the presence of significant gas dissolved in pore fluids causes a dramatic change in compressibility of the rock. The very strong signal is thus indicative of an actual feature. An evident limitation of this construction is its apparent reliance on the simple geometric optics approximation. The next section explores the extent to which this hypothesis can be dropped.

3.6. Global inverse scattering The simple ray geometry hypotheses generally fails in the presence of strong . For example, White (1982) showed that for random but smooth v in 2D with pointwise variance σ , points at distance O(σ−2/3) from the source have more than one ray connecting to the source, with probability 1. This multipathing phenomenon is also associated with the formation of

22 Inverse Problems 25 (2009) 123008 Topical Review

CDP 0 200 400 600 800 1000 1200 1400 0

500

1000

1500 Depth in Meters

2000

2500

Figure 12. Example of GRT inversion (application of F [v]†). ‘2.5D’ linear inversion of marine streamer data from the Gulf of Mexico, 500 source positions, 120 receiver channels. caustics ray envelopes. The formation of caustics invalidates asymptotic analysis on which the theory discussed in the last subsection is based. Strong refraction leading to multipathing and caustic formation is typical in some geological regimes, for example salt (4–5 km s−1) intrusion into sedimentary rock (2–3 km s−1) as is common in the Gulf of Mexico or chalk formations intermixed with shales and sands under the North Sea. These are some of the most economically promising petroleum provinces. From the mathematical point of view, the theory of GRT inversion explained above is local—it holds only for a region of the Earth model relatively close to the source and receiver points. It is natural to ask for more global results. The first global results about linearized inversion were established by Rakesh (1988), who gave a geometrical description of the relation between wavefront sets of r and Fδ[v]r. Rakesh 3 studied the case of a single point source (Y ⊂{xs = 0}×R ). Nolan and Symes (1997) generalized these results to general manifolds Y, using similar ideas. ∗ 3 ∗ The canonical relation C[v,Y]ofFδ[v] is the subset of (T (R )\{0})×(T (Y ×R))\{0} defined by the possible singularities of input and output: (x,ξ,y,η)∈ C[v,Y], if and only if 3 for some r ∈ E (R ), (x,ξ)∈ WF(r) and (y,η)∈ WF(Fδ[v]r). The set [v,Y] mentioned in the last section is simply the projection of C[v,Y] onto the first factor, that is, [v,Y] ={(x,ξ)∈ T ∗(R3)\{0} : (x,ξ,y,η)∈ C[v,Y]forsome(y,η)∈ T ∗(Y )\{0}}. The rays of geometric optics play a fundamental role in the geometry of the canonical relation: they are solutions of the Hamiltonian system: dX d =∇ H(X, ), =−∇ H(X, ), dt dt X with H(X, )= 1 − v2(X)| |2 = 0 (28) (also known as null bicharacteristics).

23 Inverse Problems 25 (2009) 123008 Topical Review

Π Π

ξ x ξ x s,, s r, r

Ξ X s, s Ξ X r, r x,ξ −Ξ r(t−t r )

Ξ (t ) s s

Figure 13. Diagram of the reflection geometry, encoded in the canonical relation.

Rakesh showed that ∈ ⊂ ∗ 3 \{ }× ∗ \{ } ((x,ξ),(xs ,t,xr ,ξs,τ,ξr)) CFδ [v] T (R ) 0 T (Y ) 0 , if and only if there are rays of geometric optics (Xs , s ), (Xr , r ) and times ts ,tr so that

(Xs (0), t, Xr (t), s (0), τ, r (t)) = (xs ,t,xr ,ξs ,τ,ξr ),

Xs (ts ) = Xr (t − tr ) = x,ts + tr = t, s (ts) − r (t − tr ) ξ.

Since s (ts), − r (t − tr ) have the same length because of (28), their sum is their bisector. Thus velocity vectors of the incident ray from the source and the reflected ray from the receiver (traced backward in time) make equal angles with reflector at x with normal ξ. See figure 13. In other words, the canonical relation of Fδ[v] simply enforces the equal-angles law of reflection, interpreted in terms of rays. Note that Rakesh’s characterization of CF is global: no assumptions about ray geometry, other than no forward scattering and no grazing incidence on the acquisition surface Y,are needed. The proof applies several tools of harmonic analysis of PDEs to the linearized problem (10), including the Gabor calculus of wave front sets (Duistermaat 1996) and Hormander’s¨ theorem on propagation of singularities of solutions to PDEs (Taylor 1981). See Rakesh (1988) for details. Rakesh also showed that Fδ[v]isaFourier integral operator (‘FIO’). This class of oscillatory integral operators was introduced by Hormander¨ and others in the 1970s to describe the solutions of nonelliptic PDEs (Hormander¨ 1971,Treves1980, Duistermaat 1996). Phases and amplitudes of FIOs satisfy certain restrictive conditions and generate canonical relations of which C[v,Y] is a special case. The adjoint of an FIO is an FIO with the inverse canonical relation. DOsarealsoFIOs. Composition of FIOs does not yield an FIO in general. Beylkin had shown that F [v]∗F [v] is FIO (DO, actually) under simple ray geometry hypothesis—but this condition is only sufficient. Rakesh noted that Beylkin’s result follows from general results of Hormander¨ (1971): the simple ray geometry hypothesis implies that the canonical relation C[v,Y]isa graph (of the exterior derivative of the phase function in the FIO representation of Fδ[v]), and Hormander¨ had already shown that in this case the composition with the adjoint defines a

24 Inverse Problems 25 (2009) 123008 Topical Review

xxsr

x x

Figure 14. Violation of traveltime injectivity (‘TIC’). Symmetric waveguide: time of travel from xs to x¯ to xr is the same as the time of travel from xs to x to xr , so TIC fails.

FIO. Moreover, various mapping properties between Sobolev spaces follow from the general theory of FIOs: for example, Fδ[v] is an operator of order (d − 1)/2 in space dimension d,in the sense that a bounded extension s− d−1 s d → 2 × Fδ[v]:H0 (R ) Hloc (Y R) exists. The first step beyond simple ray geometry in the understanding of the normal operator ∗ was the work of Ten Kroode et al (1998). They showed that Fδ[v] Fδ[v]isaDO, as in the simple ray geometry case, provided that

(1) source, receiver positions (xs , xr ) form an open 4D manifold (‘complete coverage’—all source, receiver positions at least locally), and −1 ⊂ ∗ × \{ }× ∗ 3 \{ } (2) the traveltime injectivity condition (‘TIC’) holds: CF [v] T (Y R) 0 T (R ) 0 is a function—that is, initial data for source and receiver rays and total travel time together determine reflector location uniquely. TIC is a nontrivial constraint on ray geometry, and hence on v. Figure 14 shows a cartoon of a TIC violation. For an explicit construction, see Stolk (2000a). While the recently developed WATS technology comes close to satisfying the ‘complete coverage’ condition 1 above, modulo sampling limitations, most data acquired in both marine and land seismic do not. Inversion from partial data is also possible and plays a role in velocity analysis algorithms discussed in the next section. These data subsets—such as individual shot or offset gathers—do not satisfy condition 1, of course. Nolan and Symes (1997)showed that the normal operator is a DO even without the ‘complete coverage’ condition, provided that TIC is satisfied, and that in addition the projection C[v,Y] → T ∗Y is an embedding. The latter can occur for example in the classic co-dimension 1 marine streamer geometry, provided that the velocity changes sufficiently slowly and smoothly in the ‘crossline’ direction (perpendicular to the sail lines). However, examples violating TIC are much easier to construct when source–receiver submanifold has a positive co-dimension. Nolan and Symes (1997), De Hoop et al (2003) and Stolk and Symes (2004) give explicit examples for single gather data—common shot, common offset, etc. The examples show that failure of the normal operator to be a DO expresses itself in the appearance of kinematic artifacts in the inversion output (r), that is,

25 Inverse Problems 25 (2009) 123008 Topical Review spatially coherent signal representing spurious reflectors not present in the reflectivity model used to generate the data. The positions of these artifacts follows from the analysis of the canonical relation, and can be computed via geometric optics (Stolk and Symes 2004). Thus ∗ the adjoint Fδ[v] may fail to be an imaging operator, and its output cannot be turned into an approximate linearized inversion by application of a DO. In particular, even versions of GRT that account for multiple rays connecting sources, receivers and scattering points (discussed below) cannot produce a linearized inversion. The likelihood of encountering a velocity for which the normal operator is not (at least essentially) a DO was addressed by Stolk (2000b), under the ‘complete coverage’ hypothesis. ∗ According to Rakesh (1988), Fδ[v] and Fδ[v] are FIOs in general. Recall that the class of FIOs is not closed under composition, so the normal operator is not guaranteed to be even a FIO. Stolk showed that the set of velocities for which the normal operator is a FIO is open and dense in the topology of C∞. Furthermore, for space dimension 2, the set of v for which F [v]∗F [v]isaDO plus a relatively smoothing error is also open and dense C∞(R2). Thus the DO property of the normal operator, with its positive consequences for linearized inversion, is generic, that is, recovered by an arbitrarily small perturbation of the velocity, at least for two space dimensions. It is natural to conjecture that the same property holds for three-dimensional problems.

3.7. Non-GRT inversion and the adjoint state method

It remains to describe how to compute a linearized inverse, or at least an image, when the simple ray geometry hypothesis is violated; hence, the GRT inversion formula (27) is not available. The challenge is to accommodate somehow the multiplicity of rays connecting sources, receivers and scattering points, that is, the non-graph nature of the canonical relation. A number of attempts in the 1990s to use selected single ray pairs, that is, a restricted canonical relation which is a graph, met with mixed success (Gray and May 1994, Audebert et al 1997). Operto et al (2000) analyze systematically the various coherent subrelations of the canonical relation in a complex example, and conclude that effectively all rays (that is, the entire canonical relation) must be taken into account to produce a reasonable result. They show that an imaging operator may be constructed as a sum of terms of the same form as the right-hand side of (27), each term corresponding to one ray pair connecting the scattering point to the source and receiver. The book-keeping for this construction is formidable. For example, if typically five rays connect a point in the Earth model with a point on the surface—not a difficult situation to arrange, with a mildly refracting velocity model—then 25 separate terms of the form (27) must be summed to produce the image. De Hoop et al (2003) address the additional computations required to turn the image into a linearized inversion. According to the last subsection, the production of an image boils down to application of the adjoint F [v]∗, for a choice of pulse at least close in some sense to δ. Alternate methods, not based on the GRT representation (23)ofF [v] and therefore not suffering its limitation to the simple ray geometry case, have been used since the 1970s to carry out this computation. These non-GRT algorithms fall into two classes. The first, pioneered by Claerbout (1971), produces an image as a by-product of solution of the wave equation (4) as an evolution equation in depth. As is well known, this evolution problem is ill-posed (Courant and Hilbert 1962). Various regularizations have been devised over the years; the literature on this depth extrapolation version of migration is enormous. I limit myself to pointing out the papers (Stolk and De Hoop 2005, 2006), which give a mathematically cogent discussion of this technique together with many references.

26 Inverse Problems 25 (2009) 123008 Topical Review

The second alternative originates in control theory (as does the depth extrapolation approach, viewed in the right way). Control engineers long ago devised a method for computing the adjoint of a linear map defined by solving an evolution problem. The method applies the adjoint by solving another evolution problem, the adjoint state system. Lions pioneered the application of this idea to control problems governed by partial differential equations (Lions 1971, 1981). Chavent carried it over to inverse problems (Chavent and Lemmonier 1974). Tarantala popularized the adjoint state method for application to seismic inversion (Tarantola 1984, 1986, 1987). Meanwhile, reflection seismologists discovered the concept quite independently, calling it reverse time migration (‘RTM’) (Whitmore 1983, Loewenthal and Mufti 1983, Baysal et al 1983). RTM was almost immediately identified as the adjoint state method (Lailly 1983, 1984), and as thereby related to the inverse problem, as will be discussed in the next section. It gained a reputation as the Cadillac of migration methods: accurate but expensive. As computer hardware and algorithms have improved, the relative cost of RTM has decreased, to the point where it is now very much a mainstream industry interest. See for example Mulder and Plessix (2004), Bednar and Bednar (2006) for some comparison of RTM with depth extrapolation migration, and Plessix (2006) for an overview of the adjoint state method in geophysics. In the context of the problem defining F [v] (equations (4), (5), (7), (6) and (10)), the adjoint state method requires introduction of another field q(x,t; xs ), which is a solution of the backward-in-time initial-boundary value problem  1 ∂2 −∇2 q(x,t; x ) = dm(x ; x )d(x ,t; x )δ(x − x ), (29) 2 2 s r s r s r v ∂t Y(xs ) q(·,t;·) = 0,.t>T. (30)

In (29), m is the induced measure on the receiver set Y(xr ) for source at xs . In practice, m is a discrete measure, the integral becomes a sum, and the right-hand side of (29) is a superposition of point radiators. Thus the ‘receivers act as sources, backward in time’. Finally,   2 ∂2u F [v]∗r(x) = dx dtq(x,t; x ) (x,t; x ). (31) v2(x) s s ∂t2 s Widely used implementations solve the wave equations (4) and (29) using finite difference or finite element methods, and tend to ignore the factor of 2/v2 and the second time derivative on the reference field u. The latter amendments do not affect the imaging property: the operator ∗ so produced differs from Fδ[v] by a DO factor. Convergence of numerical methods, on the ∗ other hand, requires finite frequency content, so that Fδ[v] is not really accessible. The pulse used should approximate δ—a bandpass filter, capturing the frequency content of the data, seems to be an optimal choice. The use of a finite frequency pulse is equivalent to applying ∗ Fδ[v] to smoothed data, with an as-yet only partly understood effect on image resolution. All of the considerations regarding success or failure in imaging (reproduction of WF(r)) discussed in the preceding paragraphs apply ipso facto to RTM, as it is simply a computation of F [v]∗. For example, Nolan and Symes (1997) produced the first numerical examples of kinematic artifacts by application of RTM to single shot gathers. It remains to consider whether RTM could be used as a component in linearized inversion. It is required to apply pseudoinverse of the normal operator. Two obvious avenues are (1) approximation of the pseudoinverse as a DO, via construction of the symbol p(x,ξ)in the defining oscillatory integral (24), and

27 Inverse Problems 25 (2009) 123008 Topical Review

(2) solving the system of normal equations: F [v]∗F [v]r = F [v]∗d, using an iterative method such as conjugate gradient iteration (Golub and Loan 1989). To this author’s knowledge, the first option has never been exercised, especially not in the non- simple ray geometry case, where Operto et al (2000) would suggest that the entire canonical relation would need to be included in the computation. The second has been carried out many times, usually as part of the solution of the nonlinear inverse problem; some of this work will be cited below. A third option uses the generic DO nature of the normal operator in the case of complete coverage to compute the action of the inverse normal operator implicitly, using only a small fixed number of applications of F [v] and F [v]∗. See Symes (2008a) for description and references.

4. Velocity analysis and nonlinear inversion

Even within the context of the linearized acoustic inverse problem, the reference velocity v is an essential ingredient. It is no more a data input from the seismic experiment than is the perturbation part of the model, which linearized inversion recovers. Some direct information about the velocity may be available, for example, from sonic logs in boreholes. However, sample a very small part of the subsurface volume. Also, sonic logging uses frequencies in the kHz range, which may excite somewhat different rock physics than do seismic waves in the tens of Hz. Thus information about velocity for linearized inversion must generally also be extracted from seismic survey data. The limitations of the linearized problem also suggest that some attention be devoted to the nonlinear inverse problem as formulated in subsection 2.3. As mentioned before, linearization is essentially synonymous with single scattering. Thus the chief physical phenomenon not within the scope of the linearized model is multiple reflection, in which waves interact multiple times with the material. Multiple-reflected waves are visible in most seismic field data, and often dominate. For example, in regions with acoustically hard sea bottoms, such as parts of the North Sea, multiple reflection between the surface and the sea bottom may obscure single scattering from deeper reflectors. Similarly, multiple reflection between the sea surface and any strong reflector, such as the top or bottom of a salt body, may appear as a very strong, near-repetitive signal. The acoustic wave equation (along with other less specialized variants of elastodynamics) models all of these data components, at least to some extent. It is natural to think that inversion of all of the data, based on a nonlinearized elastodynamic model, might more faithfully reflect the actual physics of seismic waves, hence result in a more accurate estimation of subsurface mechanics.

4.1. Nonlinear least-squares and waveform inversion A natural formulation of the nonlinear inverse problem of section 2.3, in view of the mapping properties of the forward map F, is nonlinear least squares:

minc F[c] − d L2(Y ×[0,T ]). (32) The least-squares approach to inversion has a long and productive history in geosciences; the classic works (Backus and Gilbert 1968, 1970, Jackson 1972, 1976, 1979, Robinson and Treitel 1980,Parker1977, Lines and Treitel 1984) provide excellent overviews of theory and application. Albert Tarantola popularized this approach to the inverse problem of

28 Inverse Problems 25 (2009) 123008 Topical Review reflection seismology in the 1980s, and many researchers have followed his lead. Considerable computational experience with this problem has accumulated over the last quarter-century: a partial list of contributions is Tarantola (1984, 1986), Gauthier et al (1986), Kolb et al (1986), Jannane et al (1989), Santosa and Symes (1989), Cao et al (1990), Tarantola et al (1990), Crase et al (1990), Bunks et al (1995), Cheong et al (2006), Shin and Min (2006), Chung et al (2007). As simulation of acoustic and elastic wavefields has become more computationally feasible, the nonlinear least-squares approach has enjoyed a robust revival of interest in the oil industry, under the name full waveform inversion. Two obstacles stand in the way of successful application to the reflection seismic problem. First, considerable circumstantial evidence suggests that even for consistent data (d = F[ctrue]), the least-squares problem (32) has many local minima, most having nothing to do with the model ctrue generating the data. Since both range and domain must be represented discretely as very high-dimensional vector spaces, rapidly convergent algorithms are highly favored—practically, this means steepest descent, Newton’s method, and combinations and variants. These fast optimization methods are all local, in that they find stationary points. In application to the problem (32) for various seismic models, these method tend to stall at non-informative presumed local optima, suggesting that many local minima exist. Thus, a very good initial guess is necessary in order that iterative optimization methods converge to global minimizers. Once such high-quality estimates are available, it is not clear that linearized inversion is not equally effective. This circumstance is the principal obstacle to successful deployment of (32). Finally, very little of any mathematical depth is known about the optimization problem at present. Natural mathematical questions include the following. • Do subdomains of natural function spaces of models c exist in which is the objective function (32) is differentiable? This is not a purely academic question, as it determines the form and characteristics of the optimization. • For such choices of domain, does (32) have a global minimizer? If not, what sorts of regularization can be employed to induce the existence of a minimizer without destroying critical properties of models (for example, rapid transitions)? • What is the effect on the global minimizer of the finite bandwidth of field data? • Do local minima actually exist? Some of these questions have been answered, at least partially, in the 1D case. A very instructive treatment of the 1D problem by Bamberger et al (1979) illustrates the importance of proper definition of domain; see the introduction for other 1D references. Delprat-Jannaud and Lailly (2005) demonstrate some of the potential pathology that can result from apparently natural choices. Very little is known about these questions for the 3D problem. For these and other reasons, most of the practical work on the nonlinear aspect of seismic waveform inversion has focused on the partially linearized problem, on which intuitive and mathematical information about single scattering can be brought to bear.

4.2. Partially linearized inversion The partially linearized inverse problem is as follows. Given d, find v,r so that F [v]r d. Just as in the nonlinear case, this problem may be formulated as a nonlinear least-squares problem:

minv,r F [v]r − d L2(Y ×[0,T ]). (33) In fact, problem (33) exhibits the pathology suspected of the nonlinear inverse problem (32), in a more transparent form.

29 Inverse Problems 25 (2009) 123008 Topical Review

To see how this works, turn once again to the 1D model problem of subsection 3.2. Introduce the explicit expression (19)into(33) to obtain  T 2 1 ct minc,r dt r − d(t) . (34) 0 2 2 This optimization problem has the trivial global zero-residual solution 2z r(z) = 2d , (35) c in which c functions as a free parameter. In other words, there is not enough information in the data to determine the model. This example has meaning for the actual problem insofar as the linearization is accurate. Accuracy requires that the perturbation (cr) be oscillatory, in particular lacking in low spatial frequency energy. Thus the predicted data are also oscillatory. This ability to fit oscillatory data with a wide variety of models with differing low-spatial- frequency trends is also a feature of the nonlinear inverse problem (see appendix D of Santosa and Symes (1989)). As asserted in the introduction to this review, spatial redundancy compensates for spectral incompleteness. A simple problem generalizing the 1D Born approximation problem and exhibiting this phenomenon is the plane wave inverse problem for layered constant density acoustics. The role of source parameter is played by the time lag vector ξ or slowness built into a plane wave source: g(x,t; ξ) = w(t − ξ · x)δ(z). (36) Plane waves have a long history in this subject—see for example Diebold and Stoffa (1981), Treitel et al (1982) and Duquet and Lailly (2006). Sources of the form (36) are not feasible in the field, but data for plane wave sources can be approximated ex post facto from field data for point sources by a version of the Radon transform (Santosa and Symes 1988). The model is layered if c = c(z), r = r(z), that is, the material properties depend only on depth. In that case, the solution exhibits the same symmetry as the source, at least for small enough |ξ|. By virtue of radial symmetry, it suffices to consider ξ = (ξ, 0, 0). Denote by u(x,t; ξ) the solution of (4) with the plane wave source as defined above. Then a short calculation shows that u(x,t; ξ) = U(z,t − ξx; ξ), for a solution U of 1 ∂2 ∂2 − ξ 2 − U(z,t; ξ) = w(t)δ(z), (37) c2(z) ∂t2 ∂z2 which is the same as the 1D constant density acoustic equation with the sound velocity c replaced by the vertical wave velocity c(z) v(z,ξ) = . 1 − c(z)2ξ 2 Note that for this expression to be sensible, it must be the case that c(z)|ξ| < 1. Accordingly, assume global bounds cmin  c(z)  cmax on velocity, and a corresponding bound |ξ|  −1 ξmax

30 Inverse Problems 25 (2009) 123008 Topical Review

The least-squares problem takes the form   ξmax T 2 1 v(ξ)t minc,r dξ dt r − d(t; ξ) . (39) ξmin 0 2 2 View (39) for a moment as a quadratic optimization for r, with c held fixed. The adjoint and normal operators are quickly computed, by analogy with (25) and (26):  ξmax 1 2z F [c]∗d(z) = dξ d ,ξ , (40) ξmin v(2ξ) v(ξ)  1 ξmax 1 F [c]∗F [c]r(z) = dξ r(z) ≡ N[c]r(z). (41) 2 ξmin 2v(ξ)

The normal operator is once again multiplied by a positive constant N[c], and the normal equations are easily solved to yield the minimizer in r of (39) for fixed c:  1 ξmax 1 2z r[c](z) = dξ d ,ξ . N[c] ξmin v(2ξ) v(ξ) Now suppose that the data d are consistent with this model, that is, for some r∗ ∈ L2(R), c∗ ∈ ∗ ∗ [cmin,cmax],d = F [c ]r . Then  ∗ 1 ξmax 1 v (ξ) r[c](z) = dξ r∗ z . (42) 2N[c] ξmin v(2ξ) v(ξ) If c = c∗, then r[c] = r∗ and the residual is zero, as it should be. If c = c∗, then it is easy to verify that for a positive K depending on c ,c ,ξ ,ξ , min max min max ∗ d v (ξ) ∗  K|c − c |,ξ∈ [ξmin,ξmax]. (43) dξ v(ξ) It follows that there exists c so that for sufficiently oscillatory r∗   ξmax T 2 1 v(ξ)t minr dξ dt r − d(t; ξ) 2 2 ξmin 0   ξmax T = dξ dt|F [c]r[c](t; ξ)− d(t; ξ)|2 ξmin 0 1 ξmax T  dξ dt|d(t; ξ)|2, (44) 2 ξmin 0 provided that |c − c∗|  c. To give a more precise meaning to this statement, consider a family of reflectivities   1 1 z − zmax r∗(z) = √ r∗ 2 , (45)   1  in which d 1 1 r∗(z) = R(z), R ∈ C∞ − z , z . (46) 1 dz 0 4 max 4 max This family is a fortiori oscillatory, with mean spatial wavelength proportional to . Insert the representation (45) and the definition (46) in the integral (42) and integrate by parts, using (43) to justify a change of variables under the integral sign and estimate the integrand. It follows that r[c](z) given by (42)isO(/|c − c∗|), whence the estimate (44)

31 Inverse Problems 25 (2009) 123008 Topical Review follows. A more refined analysis shows that for velocity errors on the order of , the mean square error is essentially constant and near 100%. That is, the width of the zone of small residuals for the least-squares objective is proportional to a wavelength. This is the behavior widely suspected of the nonlinear least-squares objective (32) as well, and explains the general failure of descent-based optimization methods for this problem.

4.3. Extensions and velocity analysis Since a straightforward data-fitting approach does not work, an alternative point of view has developed, leading to effective algorithms in daily use. I will present a formal version of this circle of ideas, rather different from the usual presentation in the geophysical literature, emphasizing the intrinsic connection to inversion. For many references on this topic, the reader may consult Symes (2008b). In the following, X denotes the part of R3 containing the support of the reflectivity r. By an extension of F [v] I mean manifold X¯ and maps χ : E (X) → E (X),¯ F¯ [v]: E (X)¯ → D (Y × (0.T )) so that F¯ [v] E (X)¯ → D (Y × (0,T)) χ ↑↑id E (X) → D (Y × (0,T)) F [v] commutes. An extension is invertible if F¯ [v] has a right parametrix F¯ [v]†, i.e. I − F¯ [v]F¯ [v]† is smoothing. (The trivial extension—X¯ = X, F¯ = F —is virtually never invertible.) Also χ has a left inverse η. Claim. The linearized inverse problem stated at the beginning of section 3 is equivalent to the following problem: given d, find v so that F¯ [v]†d ∈ R(χ). Proof. To say that F¯ [v]†d ∈ R(χ) means that for some r ∈ E (X), χr = F¯ [v]†d. Then the parametrix relation implies that F¯ [v]χr = F¯ [v]F¯ [v]†d d but the left-hand side of the above equation is simply F [v]r, because F¯ [v] extends F [v], establishing the claim. Note that restatement of the linearized inverse problem involves only v explicitly, not r (though it is determined implicitly as well), which is the origin of the term ‘velocity analysis’. Industrial terminology for the parametrix F¯ [v]† is prestack true amplitude migration operator, or one of several homonyms. It is common to replace the parametrix with the adjoint F¯ [v]∗, or with some other operator which differs from it by a DO so has the same singularity mapping properties. Membership in the range of χ then must be relaxed to having the same wavefront set as a member of the range of χ. The terminology originates in the form of one of the two commonly used extensions, which I shall call the standard extension. In both, X¯ = X × H , where H is a neighborhood of 0inRd ,d  3. In the standard extension, χ is the pullback by the projection on the X factor, that is, χr(x, h) = r(x). The extended forward map F¯ corresponding to F [v]is   δ ∂2 2r(¯ x, h) F¯ [v]r(¯ x +2h,t,x ) = dx dsG(x + h,t − s; x)G(x ,s; x). (47) s s ∂t2 v2(x) s s

32 Inverse Problems 25 (2009) 123008 Topical Review

channel offset (km) 10 20 30 40 50 0.5 1.0 1.5 2.0 2.5 0

0.6

0.8 0.5

1.0

1.2 1.0 time (s) time (s) 1.4

1.6 1.5

1.8

2.0

Figure 15. The right panel is the image gather created from the left panel (a CMP gather from a North Sea marine survey) by NMO correction, a crude approximation to prestack migration roughly correct when velocity is layered or nearly so and gently varying. Velocity was selected by an automated velocity analysis algorithm of differential semblance type to produce a maximally flat gather.

Note that the only difference between the right-hand sides of (20) and (47) is the presence of r(¯ x, h) rather than r(x), and that if r¯ = χr then F¯ [v]r¯ = Fδ[v]r. Note also that h is the half-offset between source and receiver positions (xr = 2h+xs ) in the predicted data (left-hand side), and is simply a parameter: F¯ [v]r(¯ · +2h, ·, ·) depends only on r(¯ ·, h)—that is, F¯ [v]is ‘block diagonal’, with h playing the role of the block index. Other versions of this extension use shot position or receiver position rather than half-offset as the block index. Being in the range of χ means being independent of h. Therefore, fixing the horizontal coordinates x and y, and plotting r¯ as a function of depth z vertically and one of the offset parameters (often h =|h|) horizontally produce a two-dimensional graphic with horizontal striations. A similar graphical display also appears flat if r¯ is the output of an operator differing from the parametrix only by a DO factor, such as the adjoint of F¯ [v]. These (depth, offset) slices through r¯ are termed image gathers. A typical example is presented in figure 15. Their flatness is an index of velocity correctness and leads to a generic velocity analysis algorithm: adjust the velocity until the image gathers are flat.In practice, only a limited selection of image gathers are inspected, either visually or digitally, and the velocities are parametrized with few enough parameters to achieve maximal flatness by systematically varying them. In the common case in which ray velocity vectors stay in a symmetric cone of aperture less than π about the vertical, r(¯ ·,z)is affected by velocity only at depths smaller than z, so it is possible to adjust spatially local velocity parameters (such as spline nodal values) progressively in depth, returning later to use deeper reflectors to refine shallower velocity parameters. This type of algorithm was strictly manual and interactive in

33 Inverse Problems 25 (2009) 123008 Topical Review the past. More recently, various techniques for automation have been advanced, one of which is reviewed in the next section. The standard extension is adequate for nearly layered structures. For more refractive velocity fields, it loses the invertible property: a right parametrix no longer exists, or is no longer related to the adjoint by a DO factor, which amounts to the same thing. This failure of invertibility was explained in Nolan’s thesis work reviewed in the last section (Nolan and Symes 1997), and is discussed at length by Stolk and Symes (2004). For velocity analysis in such complex refractive models, a different extension is more likely to be invertible. I term this the Claerbout extension, as it was introduced implicitly by Claerbout as part of his pioneering development of imaging technology (Claerbout 1985). The extension operator is χ[r](x, h) = r(x)δ(h), which makes sense for distributions of compact support, and    ∂2 2r(¯ x, h) F¯ [v]r(¯ x ,t,x ) = dx dh dsG(x ,t − s; x +2h)G(x ,s; x). r s ∂t2 v(x)2 s r Note that in the Claerbout extension, the half-offset vector h is not a function of the acquisition geometry, but rather a difference between two scattering points (second arguments of the Green’s functions). Also the condition of being in the range of χ is focusing rather than flatness. In other respects, this extension can be used in the same way as the standard extension to develop velocity analysis algorithms. As with the standard extension, the adjoint rather than a parametrix is typically used and is computed via either depth extrapolation or a variant of RTM, for which reason the somewhat misleading term ‘wave equation migration’ is often used. See Symes (2008b) for details and many further references to the large literature on this topic. The critical point is that the Claerbout extension is invertible, under much more general conditions than apply to the standard extension. Provided that (i) the complete coverage condition holds, modulo sampling, (ii) velocity vectors of rays (or at least those rays carrying significant portions of the wave front set of the scattered field) stay in a symmetric cone of aperture <πabout the vertical (that is, rays do not turn horizontal), and (ii) offset vectors h are restricted to horizontal (h = (hx ,hy , 0)), the canonical relation of F¯ [v] is a graph of the differential of a phase function. This landmark result of Stolk and De Hoop (2001)is explained in Stolk et al (2005) and Stolk and De Hoop (2006). It is this property, rather than the use of numerical wave equation solvers, to which the perceptible superiority of the Claerbout extension and velocity analysis based on it in complex refracting model is due. I close this subsection by reiterating its main point: velocity analysis is actually a method for approximate solution of the partially linearized inverse problem. The conceptual link between the two is the concept of extension. 

4.4. Optimization formulations of velocity analysis Since velocity analysis is a technique for solution of a partially linearized version of the inverse problem, it is natural to wonder if it can be posed as an optimization problem. This subsection presents one such reformulation. Suppose W : E (X)¯ → D (Z) annihilates the range of χ: χW E (X) → E (X)¯ → D (Z) → 0 and moreover W is bounded on L2(X)¯ . Then ; = 1 ¯ † 2 = 1  ¯ † ∗ ∗ ¯ †  J [v d] 2 WF [v] d 2 d,(F [v] ) W WF [v] d (48)

34 Inverse Problems 25 (2009) 123008 Topical Review attains its minimum value (0) when F¯ [v]†d] = χr, in which case (v, r) solves the partially linearized inverse problem. Guillemin (1985) first considered the construction of DO annihilators for the the ranges of FIOs such as (F [v]). For this particular FIO, the extension provides a simple avenue for such a construction. For example, in the standard extension one ∗ † may take W =∇h. Then F¯ [v]W WF¯ [v] is a DO vanishing on the range of F [v]. For the Claerbout extension, an obvious choice is multiplication by h. Provided that W is a DO, smoothness of J [v,d]inv follows from the DO property of the range annihilator F¯ [v]W ∗WF¯ [v]†, which differs from (F¯ [v]†)∗W ∗WF¯ [v]† in expression (48)byaDO factor. Smoothness of the symbols of these DOs follows from geometric optics, and smoothness of J [v,d] is a direct consequence. It is somewhat more interesting that only DO ‘pre-annihilators’ W give rise to smooth objective functions of this form. See Stolk and Symes (2003) for a discussion of this fact and its consequences. Objective functions of the form (48) were first proposed in Symes (1986b), and now go under the name differential semblance. For much additional discussion and examples of velocity analysis with synthetic and field data via differential semblance optimization, see Symes and Carazzone (1991), Symes (1991, 1993), Symes and Versteeg (1993), Kern and Symes (1994), Minkoff and Symes (1997), Plessix et al (1999), Plessix (2000), Chauris and Noble (2001), Mulder and ten Kroode (2002), Shen et al (2003), Brandsberg-Dahl et al (2003), De Hoop et al (2003), Shen et al (2005), De Hoop et al (2005), Khoury et al (2006), Albertin et al (2006), Li and Symes (2007), Verm and Symes (2006), Shen and Symes (2008). Beyond smoothness, not much is known mathematically, except for the special case of layered media. For layered partially linearized inversion, the author has been able to establish the absence of spurious stationary points, justifying the use of local optimization methods to minimize J [v,d] (Symes 2001, 1999). The reader is invited to investigate this question for the differential semblance function for the partially linearized plane wave modeling problem for layered acoustics and constant reference velocity v (equation (38)). An appropriate version of the standard extension simply permits the reflectivity r to depend on ξ. A natural ‘pre- annihilator’ W is differentiation with respect to ξ. With these choices, it is easy to see that J [v,d] is a convex quadratic function of v. Numerical experience as reported in the cited papers tends to indicate that the absence of spurious local minima holds generally for the differential semblance function and a wide variety of models.

5. Prospects: beyond linearization

As was discussed at the beginning of the last section, compelling practical and mathematical reasons exist for interest in the inverse problem without linearization. Despite a great deal of current industry interest in full waveform inversion (as the nonlinear least-squares problem (32) is known), the mathematical obstacles identified a quarter-century ago by Tarantola and colleagues (Gauthier et al 1986) remain in force. It seems likely that ideas that have proven useful in application to the partially linearized inverse problem, that is, velocity analysis, will need to be carried over in some form to the nonlinear problem if it is to be successfully attacked as a large-scale optimization problem. In particular, nonlinear versions of differential semblance seem feasible. The crucial step is identifying the appropriate extension. The analogue of the standard extension is straightforward: one simply models each gather (offset, shot,etc) independently, using a model depending on the gather parameter. I have shown that the differential semblance objective so obtained is locally convex near the global optimum, in the case of layered media and plane wave modeling (Symes 1991). These arguments depended strongly on the 1D theory cited in

35 Inverse Problems 25 (2009) 123008 Topical Review the introduction, and new ideas will be required to prove similar results for multidimensional problems. The standard extension is of limited utility even for the partially linearized problem, as explained in the last section. In Symes (2008b), the author introduced a nonlinear generalization of the Claerbout extension, which involves hyperbolic systems with operator coefficients. It remains to be seen whether this line of thought is productive. The two extensions introduced in the last section for the partially linearized problem are not globally equivalent, in any sense. The classification problem for extensions remains open, even for the partially linearized problem: to devise a satisfactory definition of equivalence, and to identify the resulting equivalence classes. Extensions inequivalent to the standard and Claerbout would be particularly interesting. I will end by mentioning one other source of open problems, suppressed throughout the rest of the discussion. The energy source, represented by the right-hand side g in the acoustic model (4) for example, is not known any more than is the model. This fact is at first surprising, but in fact direct measurement of the source is quite difficult. This observation raises the question: can the source be determined from the seismic data as well, at least insofar as is necessary to construct a reliable estimate of Earth mechanics? Some positive evidence exists in the context of the linearized problem (Minkoff and Symes 1997, Routh et al 2003); however, Lailly and coworkers have cast doubt on the possibility for the nonlinear acoustic inverse problem without strong constraints on the acoustic parameters (Delprat-Jannaud and Lailly 2005). A resolution of this issue is crucial to the prospects for successful nonlinear inversion in practice.

Acknowledgments

Many people have contributed to the picture of seismic inverse scattering sketched in these pages. I am especially grateful to Chris Stolk, Maarten De Hoop, Biondo Biondi, Sven Treitel, Albert Tarantola, Jim Carazzone, Scott Morton, Jon Claerbout, Norm Bleistein, Patrick Lailly, Guy Chavent, Paul Sacks, Fadil Santosa, Pierre Kolb, Greg Beylkin, Ken Bube, Joakim Blanch, Mark Gockenbach, Johan Robertsson, James Rickett, Gary Margrave, Rene-Edouard´ Plessix, Sergey Fomel, Wim Mulder, Fons Ten Kroode, Gils Lambare,´ Pascal Podvin, Herve´ Chauris, Mark Noble, Rakesh, Cliff Nolan, Sue Minkoff, Paul Sava, Felix Herrmann, Peng Shen, Alan Levander, Chris Stork, Jean-David Benamou, Eric Dussaud and many others for sharing their insights and enthusiasm over the years. I have tried to give the reader a reasonable overview of the literature, without any hope of being truly comprehensive, and apologize for any omissions. I thank ExxonMobil Upstream Research Co. and its predecessors, Exxon Production Research Co. and Mobil Research and Development Corp., for permission to use the data behind figures 2, 3, 4 and 12. Figure 15 was derived from data provided by NAM, de Nederlandse Aardolie Maatschappij, a joint venture of Shell and ExxonMobil.

References

Achenbach J 1973 Wave Propagation in Elastic Solids (Amsterdam: North-Holland) Aki K and Richards P 1980 Quantitative Seismology: Theory and Methods (San Francisco: Freeman) Albertin U, Sava P, Etgen J and Maharramov M 2006 Expanded Abstracts (Society of Exploration Geophysicists) pTOM2.1 Arridge S 1999 Inverse Problems 15 R41–93 Audebert F, Nichols D, Rekdal T, Biondi B, Lumley D and Urdaneta H 1997 Geophysics 62 1533–43 Backus G and Gilbert F 1968 Geophys. J. R. Astron. Soc. 16 169–205 Backus G and Gilbert F 1970 Phil. Trans. R. Soc. A 266 123–92

36 Inverse Problems 25 (2009) 123008 Topical Review

Bamberger A, Chavent G and Lailly P 1979 Appl. Math. Optim. 5 1–47 Baysal E, Kosloff D D and Sherwood J W C 1983 Geophysics 48 1514–24 Bednar J B and Bednar C J 2006 Expanded Abstracts (Society of Exploration Geophysicists) p SPMI1.4 Belishev M 2007 Inverse Problems 23 R1–R67 Beylkin G 1985 J. Math. Phys. 26 99–108 Beylkin G and Burridge R 1990 Wave Motion 12 15–22 Bishop T, Bube K, Cutler R, Laingan R, Love P, Resnick J, Shuey R, Spindler D and Wyld H 1987 Geophysics 50 903–23 Blanch J, Symes W W and Versteeg R 1998 Comparison of Seismic Inversion Methods on a Single Real Dataset ed R Keys and D Foster (Tulsa, OK: Society of Exploration Geophysicists) Blazek K, Stolk C C and Symes W W 2008 Well-posedness of initial/boundary value problems for hyperbolic integro-differential systems with nonsmooth coefficients Technical Report 08-5 Department of Computational and Applied Mathematics Rice University, Houston, TX, USA (http://www.caam.rice.edu) Bleistein N 1987 Geophysics 52 931–42 Bleistein N 1999 The Leading Edge 18 918–27 Brandsberg-Dahl S, De Hoop M V and Ursin B 2003 Geophysics 68 232–54 Browning B 2000 Inverse Problems 16 1377–403 Bunks C, Saleck F, Zaleski S and Chavent G 1995 Geophysics 60 1457–73 Burridge R, De Hoop M V, Miller D and Spencer C 1998 Geophys. J. Int. 134 757–77 Cao D, Singh S and Tarantola A 1990 Geophysics 55 458–69 Chauris H and Noble M 2001 Geophys. J. Int. 144 14–26 Chavent G and Lemmonier P 1974 Appl. Math. Optim. 1 121–62 Cheong S, Pyun S and Shin C S 2006 J. Seism. Explor. 14 335–45 Christensen R M 1982 Theory of Viscoelasticity—An Introduction (New York: Academic) Chung W, Ha T and Shin C 2007 Expanded Abstracts (Society of Exploration Geophysicists) pp SI2–4 Claerbout J F 1971 Geophysics 36 467–81 Claerbout J F 1985 Imaging the Earth’s Interior (Oxford: Blackwell) Cohen J and Bleistein N 1977 SIAM J. Appl. Math. 32 784–99 Cohen J and Bleistein N 1979 Geophysics 44 1077–85 Courant R and Hilbert D 1962 Methods of Mathematical Physics vol 2 (New York: Wiley Interscience) Crase E, Pica A, Noble M, McDonald J and Tarantola A 1990 Geophysics 55 527–38 De Hoop M V and Bleistein N 1997 Inverse Problems 16 669–90 De Hoop M V, Brandsberg-Dahl S and Ursin B 2003 Geophys. Prospect. 51 295–314 De Hoop M V, Foss S K and Ursin B 2005 Geophysics 70 U51–65 Delprat-Jannaud F and Lailly P 1993 J. Geophys. Res. 98 6589–605 Delprat-Jannaud F and Lailly P 2005 Geophysics 70 R1–R14 Diebold J and Stoffa P 1981 Geophysics 46 238–54 Duistermaat J J 1996 Fourier Integral Operators (Amsterdam: Birkhauser) Duquet B and Lailly P 2006 Geophysics 71 S185–97 Friedlander F 1958 Sound Pulses (Cambridge: Cambridge University Press) Gauthier O, Tarantola A and Virieux J 1986 Geophysics 51 1387–403 Gel’fand I and Levitan B 1955 Transl. Am. Math. Soc. 1 253–304 Gel’fand I and Shilov G 1958 Generalized Functions vol 1 (New York: Academic) Golub G H and Loan C F V 1989 Matrix Computations 2nd edn (Baltimore, MD: John Hopkins University Press) Gopinath B and Sondhi M 1971 J. Acoust. Soc. Am. 49 1867–73 Goupillaud P 1961 Geophysics 26 754–60 Gray S H and May W P 1994 Geophysics 59 810–7 Gray S H and Symes W W 1985 Geophys. J. R. Astron. Soc. 80 149–63 Guillemin V 1985 Pseudodifferential Operators and Applications (Providence, RI: American Mathematical Society) pp 149–55 Hagedoorn J G 1954 Geophys. Prospect. 2 85–127 Hormander¨ L 1971 Acta Math. 127 79–123 Hormander¨ L 1983 The Analysis of Linear Partial Differential Operators vols 1–4 (Berlin: Springer) Imanuvilov O and Yamamoto M 2001 Commun. Partial Differ. Equ. 26 1409–25 Jackson D D 1972 Geophys. J. R. Astron. Soc. 28 97–110 Jackson D D 1976 J. Geophys. Res. 81 1027–30 Jackson D D 1979 Geophys. J. R. Astron. Soc. 57 137–57 Jannane M et al 1989 Geophysics 54 906–10

37 Inverse Problems 25 (2009) 123008 Topical Review

Kern M and Symes W W 1994 Geoph. Prosp. 99 565–614 Khoury A, Symes W W, Williamson P and Shen P 2006 Expanded Abstracts (Society of Exploration Geophysicists) p SPMI3.4 Kolb P, Collino F and Lailly P 1986 Proc. IEEE 74 498–506 Kurylev S, Lassas M and Uhlmann G 2009 Am. J. Math. (to appear) Lailly P 1983 Conference on Inverse Scattering: Theory and Applications ed J Bednar (Philadelphia: Society for Industrial and Applied Mathematics) pp 206–20 Lailly P 1984 Inverse Problems of Acoustic and Elastic Waves ed F Santosa (Philadelphia: Society for Industrial and Applied Mathematics) Lewis R M and Symes W W 1991 Inverse Problems 7 597–632 Li J and Symes W W 2007 Geophysics 72 U75–88 Lines L and Treitel S 1984 Geophys. Prospect. 32 159–86 Lions J L 1971 Optimal Control of Systems Governed by Partial Differential Equations (Berlin: Springer) Lions J L 1972 Nonhomogeneous Boundary Value Problems and Applications (Berlin: Springer) Lions J L 1981 Some Methods in Mathematical Analysis of Systems and Their Control (New York: Gordon and Breach) Loewenthal D and Mufti I 1983 Geophysics 48 627–35 Madariaga R, Lambare´ G, Virieux J and Jin S 1992 Geophysics 57 1138–54 Minkoff S E and Symes W W 1997 Geophysics 62 540–53 Moczo P, Robertsson J O A and Eisner L 2006 Adv. Geophys. 48 421–516 Mulder W A and Plessix R E 2004 Geophysics 69 1491–504 Mulder W A and ten Kroode A P E 2002 Geophysics 67 1184–91 Nolan C and Symes W W 1997 Commun. Partial Diff. Equ. 22 919–52 Operto M S, Xu S and Lambare´ G 2000 Geophysics 65 1223–38 Parker R 1977 Annu. Rev. Earth Planet. Sci. 5 35–64 Plessix R E 2000 Geophys. Prospect. 48 937–51 Plessix R E 2006 Geophys. J. Int. 167 495–503 Plessix R E, de Roeck Y H and Chavent G 1999 SIAM J. Sci. Comput. 20 1033–52 Rakesh 1988 Commun. Partial Diff. Equ. 13 573–601 Rakesh 2001 Inverse Problems 17 1401–17 Ramm A 1986 Scattering by Obstacles (Dordrecht: Reidel) Regone C 2006 The Leading Edge 25 1467–75 Robinson E and Treitel S 1980 Geophysical Signal Analysis (Englewood Cliffs, NJ: Prentice-Hall) Routh P, Anno P and Baumel R 2003 Expanded Abstracts (Society of Exploration Geophysicists) Sacks P E and Symes W W 1985 Commun. Partial Diff. Equ. 10 635–76 Santosa F and Schwetlick H 1982 Wave Motion 4 99–110 Santosa F and Symes W W 1988 J. Comput. Phys. 74 318–81 Santosa F and Symes W W 1989 An Analysis of Least-Squares Velocity Inversion (Geophysical Monographs vol 4) (Tulsa, OK: Society of Exploration Geophysicists) Santosa F and Symes W W 2000 Chin. J. Mech. 16 15–21 Shen P and Symes W W 2008 Geophysics 73 VE49–60 Shen P, Symes W W, Morton S and Calandra H 2005 Expanded Abstracts (Society of Exploration Geophysicists) p SPVA1.4 Shen P, Symes W W and Stolk C C 2003 Expanded Abstracts (Society of Exploration Geophysicists) pp 2135–9 Shin C and Min D J 2006 Geophysics 71 R31–42 Stefanov P and Uhlmann G 2005 Int. Math. Res. Not. 17 1047–61 Stolk C C 2000a Wave Motion 32 267–90 Stolk C C 2000b On the modeling and inversion of seismic data PhD Thesis Universiteit Utrecht Stolk C C and De Hoop M V 2001 Seismic inverse scattering in the ‘wave-equation’ approach Technical Report2001- 047 The Mathematical Sciences Research Institute (http://msri.org/publications/preprints/2001.html) Stolk C C and De Hoop M V 2002 Commun. Pure Appl. Math. 55 261–301 Stolk C C and De Hoop M V 2005 SIAM J. Appl. Math. 65 1388–406 Stolk C C and De Hoop M V 2006 Wave Motion 43 579–98 Stolk C C, De Hoop M V and Symes W W 2005 Kinematics of shot-geophone migration Technical Report 05-04 Department of Computational and Applied Mathematics Rice University, Houston, TX, USA (http://www.caam.rice.edu) Stolk C C and Symes W W 2003 Inverse Problems 19 73–89 Stolk C C and Symes W W 2004 Geophysics 69 562–75

38 Inverse Problems 25 (2009) 123008 Topical Review

Symes W W 1986a SIAM J. Math. Anal. 17 1400–20 Symes W W 1986b Proc. 7th Int. Conf. on Computing Methods in Applied Science and ed R Glowinski and J Lions (New York: North-Holland) Symes W W 1991 SIAM J. Math. Anal. 22 680–716 Symes W W 1993 J. Geophys. Res. 98 2061–73 Symes W W 1995 Mathematical Foundations of Reflection Seismology Technical report The Rice Inversion Project Rice University, Houston, TX, USA (http://www.trip.caam.rice.edu) Symes W W 1999 All stationary points of differential semblance are asymptotic global minimizers: layered acoustics Technical Report 99-09 Department of Computational and Applied Mathematics Rice University, Houston, TX, USA (http://www.caam.rice.edu) Symes W W 2001 Asymptotic analysis of differential semblance for layered acoustics Technical Report 01-21 Department of Computational and Applied Mathematics Rice University, Houston, TX, USA (http://www.caam.rice.edu) Symes W W 2008a Geophysics 73 R23–35 Symes W W 2008b Geophys. Prospect. 56 765–90 Symes W W and Carazzone J J 1991 Geophysics 56 654–63 Symes W W and Versteeg R 1993 Expanded Abstracts (Society of Exploration Geophysicists) pp 696–9 Tarantola A 1984 Geophysics 49 1259–66 Tarantola A 1986 Geophysics 51 1893–903 Tarantola A 1987 Inverse Problem Theory (Amsterdam: Elsevier) Tarantola A, Crase E, Jervis M, Konen Z, Lindgren J, Mosegard K and Noble M 1990 Expanded Abstracts (Society of Exploration Geophysicists) p S13.7 Taylor M 1981 Pseudodifferential Operators (Princeton, NJ: Princeton University Press) Ten Kroode A P E, Smit D J and Verdel A R 1998 Wave Motion 28 149–72 Treitel S, Gutowski P and Wagner D 1982 Geophysics 47 1375–401 Treves F 1980 Introduction to Pseudodifferential Operators and Fourier Integral Operators vol II (New York: Plenum) Verm R and Symes W W 2006 Expanded Abstracts (Society of Exploration Geophysicists) pSI3.7 Virieux J, Jin S, Madariaga R and Lambare´ G 1992 Geophys. J. Int. 108 575–88 Weaver R and Lobkis O 2006 Geophysics 71 S15–9 White B S 1982 SIAM J. Appl. Math. 44 127–49 Whitmore N D 1983 Expanded Abstracts (Society of Exploration Geophysicists) p S10.1 Xu S, Chauris H, Lambare´ G and Noble M 2001 Geophysics 66 1877–94 Ziolkowski A 1993 Geophysics 58 1174–82 Ziolkowski A and Bokhorst K 1993 Geophysics 58 1183–94

39