Geodynamic tomography: constraining upper-mantle deformation patterns from Bayesian inversion of surface waves J. K. Magali, T Bodin, N Hedjazian, H Samuel, S Atkins

To cite this version:

J. K. Magali, T Bodin, N Hedjazian, H Samuel, S Atkins. Geodynamic tomography: constraining upper-mantle deformation patterns from Bayesian inversion of surface waves. Geophysical Journal International, Oxford University Press (OUP), 2021, 224 (3), pp. 2077-2099. doi: 10.1093/gji/ggaa577. hal-03189035

HAL Id: hal-03189035 https://hal.archives-ouvertes.fr/hal-03189035 Submitted on 2 Apr 2021

Geophys. J. Int. (2021) 224, 2077–2099. doi: 10.1093/gji/ggaa577. Advance Access publication 2020 December 03.

Geodynamic tomography: constraining upper-mantle deformation patterns from Bayesian inversion of surface waves

J. K. Magali,1 T. Bodin,1 N. Hedjazian,1 H. Samuel2 and S. Atkins3
1 UCBL, CNRS, LGL-TPE, Université de Lyon, 69622 Villeurbanne, France. E-mail: [email protected]
2 Institut de Physique du Globe de Paris, Université de Paris, CNRS, F-75005 Paris, France
3 Laboratoire de Géologie, École Normale Supérieure, PSL Res. Univ, 75005 Paris, France
Downloaded from https://academic.oup.com/gji/article/224/3/2077/6019874 by INFU BIBLIO PLANETS user on 02 April 2021

Accepted 2020 December 1. Received 2020 November 21; in original form 2020 September 23

SUMMARY
In the Earth's upper mantle, seismic anisotropy mainly originates from the crystallographic preferred orientation (CPO) of olivine due to mantle deformation. Large-scale observation of anisotropy in tomography models provides unique constraints on present-day mantle flow. However, surface waves are not sensitive to all 21 coefficients of the elastic tensor, and therefore the complete anisotropic tensor cannot be resolved independently at every location. This large number of parameters may be reduced by imposing spatial smoothness and symmetry constraints on the elastic tensor. In this work, we propose to regularize the tomographic problem by using constraints from geodynamic modelling to reduce the number of model parameters. Instead of inverting for seismic velocities, we parametrize our inverse problem directly in terms of physical quantities governing mantle flow: a temperature field, and a temperature-dependent viscosity. The forward problem consists of three steps: (1) calculation of mantle flow induced by thermal anomalies, (2) calculation of the induced CPO and elastic properties using a micromechanical model, and (3) computation of azimuthally varying surface wave dispersion curves. We demonstrate how a fully nonlinear Bayesian inversion of surface wave dispersion curves can retrieve the temperature and viscosity fields, without having to explicitly parametrize the elastic tensor. Here, we consider simple flow models generated by spherical temperature anomalies. The results show that incorporating geodynamic constraints in surface wave inversion helps to retrieve patterns of mantle deformation. The solution to our inversion problem is an ensemble of models (i.e. thermal structures) representing a posterior probability, therefore providing uncertainties for each model parameter.

Key words: Inverse theory; Probability distributions; Seismic anisotropy; Seismic tomography; Surface wave and free oscillations.

1 INTRODUCTION

Seismic anisotropy reveals key insights into the Earth's interior structure and dynamics. In the upper mantle, the propagation of seismic waves appears to be anisotropic, which has generally been associated with the preferred alignment of mantle minerals (Nicolas & Christensen 1987; Montagner 1994). This so-called intrinsic anisotropy relates to the strain history induced by regional-scale convection and is observable with various seismological tools, including surface waves.

1.1 Surface wave tomography studies

Surface wave tomography offers a powerful technique to constrain seismic anisotropy and to image the structure of the upper mantle at both regional and global scales. With growing amounts of seismic data, tomographers have produced detailed models of azimuthal anisotropy (e.g. Debayle et al. 2005; Deschamps et al. 2008; Adam & Lebedev 2012; Yuan & Beghein 2013, 2014), and radial anisotropy (e.g. Plomerová et al. 2002; Lebedev et al. 2006; Nettles & Dziewoński 2008; Chang et al. 2014, 2015). Numerous studies have inverted dispersion curves by minimizing the difference between observed and synthetic phase and/or group velocities, showing that they can effectively constrain the depth dependence of anisotropy (e.g. Montagner & Tanimoto 1990; Ritzwoller et al. 2002).

Seismic anisotropy can be described with 21 independent components of the elastic tensor. In practice, however, the full tensor cannot be resolved by the seismic data independently at every location, and generally only a restricted number of parameters are inverted for. This is done by assuming specific symmetry classes, or by using petrological constraints to impose relations between

© The Author(s) 2020. Published by Oxford University Press on behalf of The Royal Astronomical Society. All rights reserved. For permissions, please e-mail: [email protected]

some of the parameters. Surface waves in particular are only sensitive to 13 parameters that are linear combinations of the elastic constants (Montagner & Nataf 1986). Common practice in surface wave tomography is thus to investigate: (1) radial anisotropy (assuming vertical transverse isotropy, VTI, where the axis of hexagonal symmetry is vertical), constrained by comparing the speed of Rayleigh waves with that of Love waves, also known as the Rayleigh–Love discrepancy (Babuška & Cara 1991); or (2) azimuthal anisotropy, which deals with first-order variations of velocities as a function of the azimuth of propagation. For example, azimuthal anisotropy can be inferred from the azimuthal terms of the phase velocities (Smith & Dahlen 1973).

Simultaneous interpretations of radial and azimuthal anisotropy have been the subject of extensive research (e.g. Beghein et al. 2014; Burgos et al. 2014). Efforts involving the use of a priori information have already been made to reduce the high dimensionality of anisotropic inversion. Montagner & Anderson (1989) showed that correlations exist between the elastic constants derived from petrological models, thereby reducing the total number of free parameters to be inverted for. This motivated the development of 'vectorial tomography', which involves inverting for seven parameters instead of 13: two angles defining the strike and dip of the symmetry axis, three coefficients defining the strength of anisotropy and finally two isotropic coefficients (Montagner & Nataf 1988; Montagner & Jobert 1988). Such a medium is also known as tilted transverse isotropy (TTI) and describes the 3-D distributions of anisotropy. This further led to studies revealing that deformation-induced anisotropy can be described by a TTI medium where correlations appear to exist between P- and S-wave anisotropy (Becker et al. 2006). Such correlations can then be exploited to further simplify anisotropic inversion. Panning & Nolet (2008) then laid the groundwork to derive finite-frequency kernels of surface waves that are explicitly based on a TTI medium. In practice, however, constraining the tilt may still be difficult due to sparse azimuthal sampling, alongside other competing factors such as non-uniqueness of the solution and poor data quality. Even so, simultaneous inversions for radial and azimuthal anisotropy using TTI models have already been applied at the regional scale using probabilistic approaches to combat these shortcomings (Xie et al. 2015, 2017).

Surface wave tomography is an ill-posed inverse problem. This arises from the uneven distribution of sources and receivers causing limited ray path coverage, and from noise in the observed seismograms. The type of spatial parametrization may also lead to ambiguity when interpreting tomographic results. A conventional technique is to separate the problem into two steps. The first step is to construct velocity maps for each considered period, which is an almost linear inverse problem. It is followed by an inversion of each local dispersion curve to build a model of elastic parameters. The inversion is in general performed using a linearized technique, which favours a stable and unique solution through regularization, for example by adding a spatial smoothness constraint on the model parameters.

More recently, the development of probabilistic approaches using direct sampling of the model space has made it possible to handle the non-uniqueness of the solution and estimate uncertainties on the inferred parameters. These methods require the evaluation of the forward model a large number of times, and hence have a high computational cost. Nevertheless, numerous works have been successful in applying such inversion schemes to seismic data and in particular to the inversion of surface wave dispersion curves (Shapiro & Ritzwoller 2002; Shen et al. 2012; Bodin et al. 2016; Ravenna & Lebedev 2017; Xu & Beghein 2019).

In this study, we propose a complementary approach to estimate the full elastic tensor. This involves the incorporation of geodynamic and mineral physics modelling constraints: the textural evolution of peridotite aggregates during their deformation in the convective mantle. We propose a method to invert directly for the temperature field that produces convective flow and texture evolution. Modelling intrinsic anisotropy in this way removes the issue of low sensitivity from seismic waves since the elastic tensor is not explicitly inverted for, but instead computed directly from texture evolution models. Additionally, the inversion is performed using a Bayesian sampling algorithm, and hence provides uncertainties on the obtained temperature field.

1.2 Deformation-induced seismic anisotropy

In the upper mantle, the existence of large-scale anisotropy appears to be ubiquitous in regions associated with strong deformation (McKenzie 1979). Its interpretation is based on the development of crystallographic preferred orientation (CPO) in olivine aggregates during their plastic deformation (Nicolas & Christensen 1987). Due to the physical process at its origin, seismic anisotropy can be interpreted in terms of the strain history associated with upper-mantle circulation.

Different proxies have been used to interpret seismic anisotropy directly in terms of mantle flow. First-order seismic observations suggest that the fast axis of azimuthal anisotropy tends to align with horizontal mantle flow (Ribe 1989; Becker et al. 2003, 2014). However, this behaviour may not always be exhibited due to complex local deformation mechanisms associated with CPO evolution. Moreover, it is also important to emphasize that the development of anisotropy relates to the history of velocity gradients along a flow line, and not to the velocity field itself. Laboratory experiments of simple shear suggest that, at low strains, the orientation of the olivine fast axis tends to be aligned with the long axis of the finite-strain ellipsoid (FSE; Zhang & Karato 1995; Ribe 1992). The amplitude of anisotropy, on the other hand, can be approximated as a monotonic function of the ratio between the long axis and the short axis of the FSE (Ribe 1992; Hedjazian & Kaminski 2014). At sufficiently large strains, however, CPO evolution deviates from the FSE due to the onset of dynamic recrystallization. The fast axis tends to align nearly parallel to the direction of shear instead (Zhang & Karato 1995; Bystricky et al. 2000), although its transient behaviour remains complex (Hansen et al. 2014a). Following this observation, a possible proxy is to interpret the orientation of the anisotropy fast axis as the infinite strain axis (ISA), that is, the axis of the FSE in the limit of infinite strains (Kaminski & Ribe 2002). In practice, however, this proxy has had limited success at the global scale (Becker et al. 2014).

For that reason, an adequate interpretation of seismic anisotropy is usually based on numerical models of texture evolution. They require geodynamic flow models as inputs to provide the complete strain history. However, in some problems, the complete flow trajectory is unknown or too costly to compute, and only present-day flow is available. In this case, we propose to use a steady-state assumption to reconstruct the deformation history. This approximation is acceptable provided that the time-scale of texture evolution is much smaller than that of the flow fluctuations (Kaminski & Ribe 2002).
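As a worked illustration of the finite-strain ellipsoid proxy discussed in Section 1.2, the sketch below computes the orientation and axial ratio of the FSE for homogeneous 2-D simple shear. This is a textbook kinematic calculation rather than part of the method presented here; the function name `fse_simple_shear` and the choice of the x-axis as shear direction are assumptions made for illustration.

```python
import math


def fse_simple_shear(gamma):
    """Return (long-axis angle in degrees from the shear direction, axial ratio)
    of the 2-D finite-strain ellipsoid after a simple shear of magnitude gamma."""
    # Deformation gradient F = [[1, gamma], [0, 1]]; left Cauchy-Green tensor B = F F^T.
    b11 = 1.0 + gamma * gamma
    b12 = gamma
    b22 = 1.0
    # Eigenvalues of the symmetric 2x2 tensor B are the squared principal stretches.
    mean = 0.5 * (b11 + b22)
    radius = math.hypot(0.5 * (b11 - b22), b12)
    lam_max, lam_min = mean + radius, mean - radius
    # Orientation of the eigenvector associated with the largest stretch.
    angle = 0.5 * math.atan2(2.0 * b12, b11 - b22)
    return math.degrees(angle), math.sqrt(lam_max / lam_min)


for gamma in (0.1, 1.0, 2.0, 10.0):
    angle_deg, ratio = fse_simple_shear(gamma)
    print(f"gamma={gamma:5.1f}  long axis at {angle_deg:5.2f} deg  axial ratio {ratio:8.2f}")
```

In this idealized case the long axis starts near 45° to the shear direction at small strain and rotates towards it as strain accumulates, while the axial ratio grows monotonically.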

1.3 Interpreting tomographic images with geodynamic modelling

In order to explain surface wave anisotropy, particularly in intra-oceanic and young continental regions, first-order interpretations involve finite strains computed from global circulation models (Becker et al. 2003). In that work, the density field derived from isotropic tomography (Becker & Boschi 2002) is used to compute instantaneous flow solutions in the upper mantle. Finite-strain models derived from the flow are subsequently compared with azimuthal anisotropy in surface waves. However, as discussed above, finite strain-derived models may fall short at larger strains due to dynamic recrystallization (Zhang & Karato 1995). This calls for computational strategies that incorporate texture evolution models to estimate the level of CPO anisotropy.

Texture evolution can be modelled using micromechanical models of viscoplastic deformation of upper-mantle minerals (Tommasi et al. 2000). One such model uses a kinematic formalism to model texture evolution of olivine aggregates by plastic deformation and dynamic recrystallization (Kaminski et al. 2004). It has been extensively applied to predict CPO-induced anisotropy from geodynamic flow models in a forward modelling approach at the regional (Hall et al. 2000; Lassak et al. 2006; Miller & Becker 2012; Faccenda & Capitanio 2013) and at the global scale (Becker et al. 2006, 2008). Forward models such as this assist further in the interpretation of seismic tomography models in terms of mantle circulation patterns. To cite an example, CPO-induced anisotropy resulting from 3-D numerical simulations of subducting slabs shows consistency with radial anisotropy patterns inferred from global tomographic images (Ferreira et al. 2019; Sturgeon et al. 2019). However, most studies rely on visual comparisons between CPO obtained from numerical simulations and tomographic images. To the best of our knowledge, no study yet exists where mantle deformation has been inferred directly from seismic observations using an inverse approach.

1.4 Geodynamic tomography

This motivated us to implement geodynamic tomography, an approach where no symmetry is imposed on the elastic tensor at the outset, and where seismic observations are inverted with constraints from geodynamic modelling, in a fully Bayesian parameter search approach. To constrain the patterns of mantle deformation, we invert Love and azimuthally varying Rayleigh phase velocity dispersion curves to retrieve the present-day thermal structure of the upper mantle. The thermal structure relates to density anomalies through a linear equation of state. The complete forward problem proceeds as follows (see Fig. 1): (1) given a temperature field, we first numerically solve an instantaneous 3-D convection problem with temperature-dependent viscosity (Samuel 2012). (2) Using the resulting velocity field and velocity gradients, we track CPO evolution of olivine crystals, where the steady-state assumption of the flow is implied. The result is a complete elastic tensor $S_{ij}$ at each point in space (Kaminski et al. 2004). (3) The last step involves computing synthetic surface wave dispersion curves using normal mode summation in a spherical earth (Smith & Dahlen 1973) and their azimuthal variations from the full $S_{ij}$ (Montagner & Nataf 1986).

The inversion explores the model space using a Markov chain Monte Carlo (McMC) algorithm, and evaluates through Bayesian inference the posterior probability of model parameters. In contrast to conventional tomography, where elastic parameters are to be inverted for, our method directly inverts for a single scalar field (e.g. temperature anomalies) and extra information is driven by the physics of mantle convection. The complete solution to our problem is a probability distribution of the 3-D present-day thermal structure of the upper mantle. Since the complete elastic tensor is computed for each sampled model, we can also obtain a posterior distribution of the full elastic tensor. In fact, any variable that is implicitly computed in the forward model can be expressed as a posterior distribution in its respective model space (temperature, flow, deformation and anisotropy). Thus, geodynamic tomography may be viewed as a technique to reduce model dimension (i.e. the number of inverted parameters) in the inverse problem. Our goal in this study is to establish a proof of concept by applying it to simple synthetic temperature fields. In Section 2, we explain how geodynamic tomography is implemented, starting with the model parametrization, followed by the forward problem, the data and finally the Bayesian inversion scheme. This is followed by Section 3, where we apply the method to synthetic data obtained from prescribed temperature fields. The last section discusses current limitations of geodynamic tomography, and its potential applications to real-Earth problems.

2 METHODOLOGY

Geodynamic tomography involves two main procedures: (1) evaluate the forward model completely, and (2) implement a fully Bayesian nonlinear inversion scheme with an McMC sampling technique. The solution of our inversion scheme is a posterior distribution of thermal structures and their corresponding uncertainty bounds. Fig. 1 illustrates the complete inversion scheme.

2.1 Model parametrization

To parametrize the 3-D thermal structure in a Cartesian domain $(x, y, z)$, we build a basis containing spherical temperature anomalies, on top of an adiabatic temperature gradient. Mathematically, this translates to:

$$T(\mathbf{r}) = T_{\mathrm{background}}(\mathbf{r}) + \sum_{i=1}^{M} T^{i}_{\mathrm{anomaly}}(\mathbf{r}), \qquad (1)$$

where the background temperature is assumed to be linear and only a function of depth $z$:

$$T_{\mathrm{background}}(\mathbf{r}) = T_0 + \left(\frac{z}{L_s} - 1\right)(T_0 - 1200\ \mathrm{K}), \qquad (2)$$

and $M$ is the number of spherical anomalies, $\mathbf{r} = (x, y, z)$ defines any point in the 3-D volume, $T_0$ is the temperature at the bottom (i.e. also the reference value) and $L_s$ is the characteristic length scale. Each anomaly has a distinct size, temperature and position. We define the basis function for one given spherical anomaly as:

$$T_{\mathrm{anomaly}}(\mathbf{r}) = -\frac{T_c}{2}\left[1 - \tanh\left(\frac{\beta}{L_s}\left(\lVert \mathbf{r} - \mathbf{r}_0 \rVert - \frac{R}{2}\right)\right)\right], \qquad (3)$$

where $T_c$ is the maximum temperature anomaly reached at the centre of the sphere $\mathbf{r}_0 = (x_0, y_0, z_0)$ and $R$ controls its size. These five variables are unknown model parameters to be inverted for in our problem. The non-dimensional constant $\beta = 20$ controls the sharpness of the temperature gradient. Additional details can be found in Appendix A.
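The parametrization of eqs (1)–(3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, lengths are taken in km, and the default $T_0 = 1900$ K is an assumption (it is the temperature scale quoted for the Rayleigh number). The example anomaly uses the values of the synthetic test (centre at 200 km in each direction, $R = 120$ km, $T_c = 800$ K).

```python
import math


def background_temperature(z, t0=1900.0, ls=400.0):
    """Eq. (2): linear profile from 1200 K at z = 0 to t0 at depth z = ls."""
    return t0 + (z / ls - 1.0) * (t0 - 1200.0)


def anomaly_temperature(r, r0, radius, tc, ls=400.0, beta=20.0):
    """Eq. (3): smooth spherical anomaly, close to -tc at its centre and 0 far away."""
    dist = math.dist(r, r0)
    return -0.5 * tc * (1.0 - math.tanh(beta / ls * (dist - 0.5 * radius)))


def temperature(r, anomalies, t0=1900.0, ls=400.0, beta=20.0):
    """Eq. (1): background profile plus the sum of M spherical anomalies.

    `anomalies` is a list of (x0, y0, z0, R, Tc) tuples.
    """
    total = background_temperature(r[2], t0, ls)
    for x0, y0, z0, radius, tc in anomalies:
        total += anomaly_temperature(r, (x0, y0, z0), radius, tc, ls, beta)
    return total


anomalies = [(200.0, 200.0, 200.0, 120.0, 800.0)]
t_centre = temperature((200.0, 200.0, 200.0), anomalies)
t_corner = temperature((0.0, 0.0, 0.0), anomalies)
print(f"T at anomaly centre: {t_centre:.1f} K, T at surface corner: {t_corner:.1f} K")
```

With $\beta = 20$ the hyperbolic tangent saturates quickly, so the anomaly is essentially $-T_c$ inside the sphere and vanishes a short distance outside it.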

Figure 1. Geodynamic tomography (green) in comparison with traditional tomographic techniques (red). In geodynamic tomography, the unknown model to be inverted for is the temperature field, denoted by $T$, whereas in traditional tomography the model is a fourth-order elastic tensor $S_{ij}$ with 21 independent coefficients. Tomographers often impose hexagonal symmetry on $S_{ij}$ to reduce model complexity. The complete forward model (in green) is cast in a Bayesian McMC framework. One of the advantages of geodynamic tomography is the reduction of unknown model parameters due to constraints from geodynamics.
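The Bayesian McMC framework referred to in Fig. 1 can be illustrated with a minimal sampler. The sketch below is an assumption-laden toy, not the scheme used in this study: it uses plain random-walk Metropolis (the text specifies only 'an McMC algorithm'), a hypothetical two-parameter linear function `forward` standing in for the full flow, texture and surface wave computation, a uniform prior with arbitrary bounds, and arbitrary proposal widths. The log-likelihood has the form of eq. (15), in which the noise level has been eliminated by maximum likelihood.

```python
import math
import random


def forward(m, periods):
    """Hypothetical stand-in for the forward model: a linear dispersion curve."""
    c_ref, slope = m
    return [c_ref + slope * t for t in periods]


def log_likelihood(m, periods, c_obs):
    """Form of eq. (15): -(N/2) ln(sum of squared residuals), up to a constant."""
    ssr = sum((o - p) ** 2 for o, p in zip(c_obs, forward(m, periods)))
    return -0.5 * len(c_obs) * math.log(ssr)


def metropolis(periods, c_obs, start, bounds, steps, n_iter, seed=0):
    """Random-walk Metropolis sampler with a uniform prior inside `bounds`."""
    rng = random.Random(seed)
    current = list(start)
    logl = log_likelihood(current, periods, c_obs)
    samples = []
    for _ in range(n_iter):
        proposal = [v + rng.gauss(0.0, s) for v, s in zip(current, steps)]
        inside = all(lo <= v <= hi for v, (lo, hi) in zip(proposal, bounds))
        if inside:  # outside the bounds the prior is zero, so the move is rejected
            logl_new = log_likelihood(proposal, periods, c_obs)
            if rng.random() < math.exp(min(0.0, logl_new - logl)):
                current, logl = proposal, logl_new
        samples.append(tuple(current))
    return samples


# Synthetic data from a known toy model plus Gaussian noise (illustrative values).
rng = random.Random(1)
periods = list(range(20, 201, 20))
true_m = (4.0, 0.002)
c_obs = [c + rng.gauss(0.0, 0.01) for c in forward(true_m, periods)]

chain = metropolis(periods, c_obs, start=(3.8, 0.004),
                   bounds=[(3.0, 5.0), (0.0, 0.01)],
                   steps=(0.005, 5e-5), n_iter=20000)
kept = chain[5000:]  # discard the burn-in phase
mean_m = [sum(s[k] for s in kept) / len(kept) for k in range(2)]
print("posterior mean:", mean_m)
```

The retained samples approximate the posterior distribution, so means, standard deviations or marginal histograms of any parameter can be read directly from `kept`.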

We model the medium rheology by assuming a temperature-dependent viscosity, following the Frank–Kamenetskii approximation to Arrhenius-type viscosity. Here, we only invert for a dimensionless scalar constant $E$, which plays a similar role to the conventional activation energy (i.e. the sensitivity of viscosity to temperature). The viscosity field is described by:

$$\eta(\mathbf{r}) = \eta_0 \exp\left(-E\,\frac{T(\mathbf{r}) - T_0}{T_0}\right), \qquad (4)$$

where $\eta_0$ is a reference value for viscosity. The total number of parameters defining the model is therefore $5M + 1$, and the corresponding model vector $\mathbf{m}$ is defined as:

$$\mathbf{m} = [E, x_0^1, y_0^1, z_0^1, R^1, T_c^1, \ldots, x_0^M, y_0^M, z_0^M, R^M, T_c^M]. \qquad (5)$$

2.2 The forward problem

The forward problem involves three main steps: (1) regional flow modelling in 3-D Cartesian coordinates, (2) modelling texture evolution and computation of the full elastic tensor and (3) computation of seismic surface wave dispersion curves. We enhance the computational efficiency in Step 2 by using a surrogate model based on an artificial neural network (ANN) to compute the deformation-induced anisotropy.

2.2.1 Flow model

For our instantaneous flow models, we consider the buoyancy-driven convection of a highly viscous, Newtonian and incompressible fluid in a 3-D Cartesian coordinate system. The flow is subjected to free-slip boundary conditions. The system of equations describing the flow is given by:

$$\nabla \cdot \mathbf{u} = 0, \qquad (6)$$

and

$$-\nabla P + \nabla \cdot \left[\eta\left(\nabla \mathbf{u} + \nabla \mathbf{u}^{T}\right)\right] + \rho\, g\, \hat{\mathbf{e}}_g = 0, \qquad (7)$$

where $\mathbf{u}$ is the flow velocity, $P$ is the dynamic pressure and $\hat{\mathbf{e}}_g$ is a unit vector pointing in the direction of gravity. We assume density $\rho$ to be a function of temperature $T$ using a linear equation of state controlled by a thermal expansion coefficient $\alpha$, where $\rho(T) = \rho_0 - \rho_0\,\alpha\,(T - T_0)$. The Rayleigh number, a dimensionless quantity that relates to the level of free convection, is chosen such that it is representative of the upper mantle ($Ra = 1.05 \times 10^{6}$). The dimensional values of the governing parameters are listed in Table 1.

The Stokes equations are discretized using a finite-volume approach (e.g. Patankar 1980; Albers 2000), and are solved using the coupled iterative geometric multigrid method using V-cycles (Brandt 1982; Gerya 2010), yielding linear convergence with the number of unknowns. The complete code is parallelized with OpenMP. The accuracy of the numerical solution has been benchmarked against numerical and analytical solutions (Samuel 2012, 2018).

Table 1. Dimensional parameters that define the Rayleigh number.
Symbol    Parameter             Value
η0        Viscosity             10^21 Pa s
α         Thermal expansion     2 × 10^-5 K^-1
g         Gravity               9.81 m s^-2
Ls        Layer thickness       400 km
T0        Temperature scale     1900 K
k         Thermal diffusivity   10^-6 m^2 s^-1
ρ0        Density               3800 kg m^-3
Ra        Rayleigh number       1.05 × 10^6
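The constitutive relations used by the flow model are simple enough to sketch directly. The snippet below is illustrative only: the function names are hypothetical, the defaults are taken from Table 1, and treating the 1900 K temperature scale as the reference temperature $T_0$ is an assumption. `pack_model` shows one possible in-memory layout of the model vector of eq. (5).

```python
import math


def viscosity(temp, e_act, t0=1900.0, eta0=1e21):
    """Eq. (4): Frank-Kamenetskii viscosity in Pa s for temperature `temp` in K."""
    return eta0 * math.exp(-e_act * (temp - t0) / t0)


def density(temp, t0=1900.0, rho0=3800.0, alpha=2e-5):
    """Linear equation of state rho(T) = rho0 - rho0 * alpha * (T - T0), in kg m^-3."""
    return rho0 - rho0 * alpha * (temp - t0)


def pack_model(e_act, anomalies):
    """Eq. (5): flatten E and M anomalies (x0, y0, z0, R, Tc) into a list of length 5M + 1."""
    m = [e_act]
    for anomaly in anomalies:
        m.extend(anomaly)
    return m


# A region 800 K colder than the reference temperature, with E = 11.
cold = 1900.0 - 800.0
ratio = viscosity(cold, 11.0) / viscosity(1900.0, 11.0)
print(f"viscosity contrast: {ratio:.1f}, density: {density(cold):.1f} kg m^-3")
print(pack_model(11.0, [(200.0, 200.0, 200.0, 120.0, 800.0)]))
```

Under these assumptions a cold anomaly is both denser and roughly two orders of magnitude more viscous than its surroundings, which is what drives and shapes the flow in the forward problem.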

Although the code accommodates sharp viscosity contrasts, the latter tend to reduce the speed of convergence. Sharp viscosity contrasts are avoided in this study since smooth thermal structures are considered in our prior distribution. The velocity gradients are obtained by second-order finite differences of the computed velocity field.

2.2.2 Modelling intrinsic anisotropy

Upper-mantle minerals develop CPO due to progressive shearing along a flow path. We initially model CPO evolution by employing D-Rex, a kinematic model of strain-induced crystal lattice preferred orientation of olivine and enstatite aggregates developed by Kaminski et al. (2004). The crystal aggregates respond to an imposed macroscopic deformation by two mechanisms: (1) dislocation creep, which induces re-orientation of each crystallographic axis, and (2) dynamic recrystallization, which allows for the evolution of crystallographic volume fractions by grain nucleation and grain boundary migration. In this study, we only consider pure olivine of type-A fabric corresponding to dry upper-mantle conditions. The raw output of D-Rex is a set of crystallographic orientations and volume fractions for a given aggregate. Finally, its effective elastic properties can be estimated with an averaging scheme such as the Voigt average (Mainprice 1990). In Voigt notation, the elastic tensor can be represented as a 6 × 6 matrix with 21 independent elastic coefficients.

D-Rex does not account for pressure and temperature dependence of the single crystal elastic parameters. We model the temperature and pressure dependence of the isotropic seismic wave speeds ($V_p$ and $V_s$) using Perple_X, a numerical tool that solves the Gibbs free energy minimization problem (Connolly 2005, 2009). We use the thermodynamic model from Stixrude & Lithgow-Bertelloni (2011). We assume an olivine mantle composition for isotropic seismic wave speed calculations. Meanwhile, the elastic tensor given by D-Rex is at a reference temperature and pressure. It can be decomposed into an isotropic and an anisotropic part, $S_{\mathrm{iso}}$ and $\delta S(T_0, P_0)$, respectively:

$$S(T_0, P_0) = S_{\mathrm{iso}}(T_0, P_0) + \delta S(T_0, P_0). \qquad (8)$$

We replace the isotropic part of the tensor with the one computed from Perple_X. To account for the pressure and temperature dependence of the anisotropic part, it is scaled by the ratio between the shear modulus $\mu(T, P)$ at the given pressure and temperature, and the shear modulus at the reference temperature and pressure $\mu(T_0, P_0)$ (Gallego et al. 2013). Other methods are available, such as the use of first-order corrections around the elastic tensor at ambient $T$ and $P$ conditions (Estey & Douglas 1986; Becker et al. 2006). Thus, the full elastic tensor, whose isotropic part depends on pressure and temperature, is:

$$S(T, P) = S_{\mathrm{iso}}(T, P) + \frac{\mu(T, P)}{\mu(T_0, P_0)}\,\delta S(T_0, P_0). \qquad (9)$$

2.2.3 Fast forward approximation for texture evolution calculations

Sampling-based techniques such as McMC schemes can be applied to most geophysical inverse problems provided that the parameter space can be sampled efficiently. In some cases, however, the forward model is computationally expensive, and sampling-based techniques may not be efficient at approximating a multidimensional probability distribution. Fast approximations of the forward model, such as ANNs, are therefore sometimes used. Such approximations, however, lead to a theoretical error (also called modelling error). The form of these errors can be estimated and modelled as a Gaussian probability distribution, with its resulting variance being accounted for in the likelihood function during the inversion process (Hansen et al. 2014b; Köpke et al. 2018). In our case, the computational bottleneck is clearly the texture evolution modelling, which we address by using an ANN-based surrogate model to approximate seismic anisotropy.

In the field of geophysics, these methods have already been used to approximate the inverse function in a variety of applications in seismology (e.g. Meier et al. 2007; Käufl et al. 2014; Hansen & Cordua 2017; Hulbert et al. 2019), and in geodynamics (e.g. Shahnas et al. 2018). Among these studies, some have already applied surrogate models for fast forward approximations in sampling-based techniques (Hansen & Cordua 2017; Köpke et al. 2018; Conway et al. 2019; Moghadas et al. 2020).

These networks are composed of highly nonlinear functions that can be trained to approximate a nonlinear mapping between an input and an output (Bishop et al. 1995). To approximate such a function, one needs to train the network on a collection of training data consisting of input and output pairs. In this work, we replicate the operator for texture evolution, which we now denote as $g_{\mathrm{CPO}}$. Flow streamlines with assigned local velocity gradients are fed into the network as training inputs. The training output contains the anisotropic part of the elastic tensor $\delta S(T_0, P_0)$ computed from D-Rex. The package scikit-learn in Python is used to train the network (Pedregosa et al. 2011, see Appendix B for full details of the method).

Once the network, which we denote as the operator $g_{\mathrm{nn}}$, is trained, we perform a simple numerical test of 3-D deformation due to a cold spherical temperature anomaly, and apply both operators to output seismic anisotropy. Fig. 2 shows the percentage of total anisotropy found by the two methods. We observe comparable levels of anisotropy. Moreover, the approximation also appears to capture some important features such as the absence of anisotropy at the centre, which is ascribed to the larger viscosity of the anomaly in this region. However, the surrogate model tends to underestimate the total anisotropy, which may be attributed to the simplicity of the network architecture, and the amount of training data used.

2.2.4 Predicting surface wave data

For any geographical location at the surface, we can extract the 1-D velocity profile (e.g. $S_{ij}$ as a function of depth) and compute dispersion curves for Love and Rayleigh waves. The azimuthal dependence of surface wave phase velocity can be treated as a small anisotropic perturbation around an isotropic phase velocity model (Smith & Dahlen 1973), giving:

$$c(T, \theta) = c_0(T) + c_1(T)\cos(2\theta) + c_2(T)\sin(2\theta) + c_3(T)\cos(4\theta) + c_4(T)\sin(4\theta), \qquad (10)$$

where $T$ is here the period (not the temperature) and $\theta$ is the azimuthal angle.

In this work, we only invert $c_0(T)$, $c_1(T)$ and $c_2(T)$ for Rayleigh waves and only $c_0(T)$ for Love waves. It is not common to invert the other terms, due to low sensitivity or to high levels of noise. For convenience, we denote the isotropic Rayleigh wave phase velocity as $c_R(T)$ and the isotropic Love wave phase velocity as $c_L(T)$.

The different terms in eq. (10) can be computed from $S_{ij}$ in a fully nonlinear fashion by normal mode summation with a Runge–Kutta

Figure 2. Vertical cross section of the percentages of total anisotropy obtained from: neural networks (left), and D-Rex (right). The total anisotropy is derived from the norm of the elastic tensor. The slices are oriented along the yz-plane, and taken at the centre of the x-axis (i.e. x = 200 km).
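The azimuthal expansion of eq. (10) in Section 2.2.4 is straightforward to evaluate once the coefficients at a given period are known. The sketch below assumes the standard Smith & Dahlen form with both cosine and sine 4θ terms; the helper `two_theta_summary`, which converts $c_1$ and $c_2$ into an amplitude and a fast-propagation azimuth, is an illustrative addition and not part of the inversion described here. All numerical values are hypothetical.

```python
import math


def phase_velocity(theta, c0, c1, c2, c3=0.0, c4=0.0):
    """Eq. (10) at a single period; theta is the propagation azimuth in radians."""
    return (c0 + c1 * math.cos(2 * theta) + c2 * math.sin(2 * theta)
            + c3 * math.cos(4 * theta) + c4 * math.sin(4 * theta))


def two_theta_summary(c1, c2):
    """Amplitude and fast azimuth (radians) of the 2-theta term.

    Uses c1 cos(2 theta) + c2 sin(2 theta) = A cos(2 (theta - phi)).
    """
    return math.hypot(c1, c2), 0.5 * math.atan2(c2, c1)


# Hypothetical coefficients in km/s at one period.
c0, c1, c2 = 4.0, 0.03, 0.04
amplitude, fast_azimuth = two_theta_summary(c1, c2)
c_fast = phase_velocity(fast_azimuth, c0, c1, c2)
c_slow = phase_velocity(fast_azimuth + math.pi / 2, c0, c1, c2)
print(f"A = {amplitude:.3f} km/s, fast azimuth = {math.degrees(fast_azimuth):.2f} deg")
print(f"c_fast = {c_fast:.3f} km/s, c_slow = {c_slow:.3f} km/s")
```

Only $c_0$, $c_1$ and $c_2$ are needed for the Rayleigh wave data used in this study, which is why the 4θ coefficients default to zero in the sketch.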

matrix integration (Takeuchi & Saito 1972). We refer the reader to Montagner & Nataf (1986) and Bodin et al. (2016) for details. The seismic forward model is computed using a 1-D earth assumption beneath each geographical location. We acknowledge that surface wave velocities depend on 3-D heterogeneities, and in particular that surface wave computations exhibit nonlinearities due to mode-coupling and finite-frequency effects (e.g. Sieminski et al. 2007; Ekström 2011). However, these approximations can be treated as theoretical errors and can be accounted for in the Bayesian inversion procedure.

Table 2. True model parameters defining the synthetic temperature field.

Model parameter    Assigned value
x0                 200 km
y0                 200 km
z0                 200 km
R                  120 km
Tc                 800 K
E                  11.0

2.3 Bayesian sampling scheme

We formulate the problem in a fully nonlinear Bayesian framework (Box & Tiao 2011; Smith 1991; Mosegaard & Tarantola 1995), where the predicted surface wave dispersion curves estimated for a large ensemble of models (3-D temperature fields) are compared to observed data. The solution of the inverse problem is the posterior distribution p(m|d), the probability of the model parameters m given the data d. According to Bayes' theorem, we have:

$$p(\mathbf{m}|\mathbf{d}) \propto p(\mathbf{m})\, p(\mathbf{d}|\mathbf{m}). \tag{11}$$

The prior distribution p(m) describes our predetermined knowledge on m (i.e. the position and the amplitude of thermal anomalies, as well as the activation energy). The likelihood function p(d|m) describes the probability of observing the data given our current knowledge of the model parameters.

Since our forward problem is highly nonlinear, the posterior distribution is sampled using an McMC algorithm. It involves direct sampling of the parameter space by random iterative search, where the distribution of the sampled models asymptotically converges towards the posterior distribution.

2.3.1 The likelihood function

The likelihood function p(d|m) quantifies how well the model parameters explain the observed data (i.e. the ensemble of local dispersion curves located at the surface). Supposing that each data type (i.e. cR and cL for isotropic Rayleigh and Love wave dispersion curves, respectively; c1 and c2 for Rayleigh wave anisotropy) is measured independently, the likelihood function gives:

$$p(\mathbf{d}|\mathbf{m}) = p(c_R|\mathbf{m})\, p(c_L|\mathbf{m})\, p(c_1|\mathbf{m})\, p(c_2|\mathbf{m}). \tag{12}$$

For all dispersion curves, we assume that the errors are uncorrelated and follow Gaussian distributions with zero mean, and variances $\sigma^2_{c_R}$, $\sigma^2_{c_L}$, $\sigma^2_{c_1}$ and $\sigma^2_{c_2}$. For isotropic Rayleigh and isotropic Love waves cR and cL, respectively, we can express the likelihood function as a Gaussian distribution:

$$p(c_{R,L}|\mathbf{m}) = \frac{1}{(2\pi\sigma^2_{c_{R,L}})^{N/2}} \exp\left[\frac{-\|c^{obs}_{R,L} - c_{R,L}(\mathbf{m})\|^2}{2\sigma^2_{c_{R,L}}}\right]. \tag{13}$$

Here, the likelihood function corresponds to a single dispersion measurement where N is the number of discrete periods. The likelihood functions of the 2θ terms, c1 and c2, can be written in the same manner as eq. (13).

2.3.2 A maximum-likelihood estimate of data errors

In general, it is difficult to estimate $\sigma_{c_{R,L}}$ due to the lack of knowledge on the error distribution. In particular, approximating an elastic tensor with a neural network may introduce errors that are difficult to quantify.

In this work, we use a maximum-likelihood estimate (MLE) of the noise parameters $\sigma_{c_{R,L}}$ and $\sigma_{c_{1,2}}$ following the work of Dettmer et al. (2007). This is performed by maximizing the likelihood function over the data standard deviation. The strength of this technique is that it is not necessary to estimate each contribution to the noise parameters individually. Maximizing eq. (13) over $\sigma_{c_{R,L}}$ yields:

$$\sigma_{c_{R,L}} = \left[\frac{1}{N}\sum_{i=1}^{N}\left(c^{obs}_{R,L} - c_{R,L}(\mathbf{m})\right)^2\right]^{1/2}. \tag{14}$$

Substituting eq. (14) into eq. (13), and taking the log likelihood, we obtain (up to an additive constant that does not depend on m):

$$\ln[p(c_{R,L}|\mathbf{m})] = -\frac{N}{2}\ln\left[\sum_{i=1}^{N}\left(c^{obs}_{R,L} - c_{R,L}(\mathbf{m})\right)^2\right]. \tag{15}$$

The log-likelihood functions of c1 and c2 can be defined using the same procedure. This method has two advantages: (1) the absolute value of errors need not be defined and (2) in the case of joint inversion, we do not have to define the relative weights between each data type. Finally, the full log-likelihood function gives:

$$\ln[p(\mathbf{d}|\mathbf{m})] = \ln[p(c_R|\mathbf{m})] + \ln[p(c_L|\mathbf{m})] + \ln[p(c_1|\mathbf{m})] + \ln[p(c_2|\mathbf{m})]. \tag{16}$$

Figure 3. (a) Cross-sectional view in the yz-plane of the 3-D temperature field. The slice is taken at the centre of the x-axis. (b) 3-D flow velocity due to the sinking anomaly. Largest flow magnitudes correspond to the cold anomaly.

Figure 4. Phase velocity maps resulting from one sinking anomaly at 100 s period. (a) Rayleigh wave phase velocity (km s−1). (b) Azimuthal anisotropy in Rayleigh waves (km s−1). The solid black lines correspond to the direction of the fast propagation axis. Surface wave maps always lie along the xy-lateral plane.

2.3.3 The prior distribution

In Bayesian inference, one expresses the a priori information in terms of a probability distribution p(m). In geophysical inverse problems, model parameters are typically given a uniform prior distribution with given upper and lower bounds inferred from prior knowledge (Mosegaard & Sambridge 2002). Adopting the same formulation, the prior can be written as:

$$p(m_i) = \begin{cases} \dfrac{1}{\Delta m}, & m_{min} \le m_i \le m_{max}, \\ 0, & \text{otherwise}, \end{cases} \tag{17}$$

where $m_{max}$ and $m_{min}$ are the prior bounds for the model and $\Delta m = m_{max} - m_{min}$. Assuming that the model parameters in our inversion are prior independent, we can express the prior fully as:

$$p(\mathbf{m}) = p(E)\prod_{i=1}^{M} p(x_0^i)\, p(y_0^i)\, p(z_0^i)\, p(R^i)\, p(T_c^i), \tag{18}$$

where p(E) is the prior distribution for the activation energy, and M is the total number of spherical temperature anomalies. For an
ith temperature anomaly, $p(x_0^i)$, $p(y_0^i)$ and $p(z_0^i)$ are the prior distributions for position; $p(R^i)$ and $p(T_c^i)$ are the prior distributions for the size and temperature, respectively. We choose wide uniform prior distributions. For the prior bounds, we select: (1) the length of the spatial domain (0–400 km) for the positions x0, y0 and z0, (2) 40–240 km for R, (3) 500–1200 K for Tc and (4) 6–12 for E. Choosing wide bounds ensures that the model parameters are loosely constrained from the prior, and more emphasis is given to the information provided by the data.

Figure 5. Synthetic surface wave dispersion curves from 10 to 200 s at a given location: (a) Rayleigh wave phase velocity, (b) Love wave phase velocity, (c) Rayleigh anisotropy c1 and (d) Rayleigh anisotropy c2. Scatter plot: observed dispersion curve with added noise. Line plot: observed dispersion curve without noise.

2.3.4 A random walk to sample the posterior distribution

We use an McMC algorithm to sample the posterior distribution. It begins by randomly selecting an initial temperature model followed by the evaluation of the initial log likelihood. At each iteration, the current model is perturbed to propose a new model. The proposal proceeds sequentially as follows:

(i) Assign local perturbation: one sphere is randomly picked out of the M spheres. Once a sphere is picked, we randomly select one of four possible ways to perturb it:
(1) perturb horizontal position; i.e. x0 and y0 together;
(2) perturb vertical position z0;
(3) perturb the size of the sphere R;
(4) perturb the temperature of the sphere Tc.
Each perturbation is drawn from a univariate normal distribution centred at the current value of the model parameter.
(ii) Perturb the activation energy: we then apply eq. (1) to define the 3-D temperature field. Alongside, we perturb the activation energy E by using a normal distribution centred at the current value of E, and apply eq. (4) to define the 3-D viscosity field. These two scalar fields are used as inputs in the flow calculation.

If the proposed model lies within the prior bounds following eq. (18), we evaluate the forward problem completely. The computed dispersion curves from the latter are compared with the observed data using eq. (15). The resulting likelihood is then compared to the likelihood of the current model, and the proposed model is either accepted or rejected according to an acceptance probability (Metropolis et al. 1953; Hastings 1970). If the proposed model is accepted, it becomes the current model for the next iteration. After a sufficient number of iterations, the ensemble of accepted models converges towards the posterior distribution.
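The sampling loop of Section 2.3.4 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used in the paper: `toy_forward` is a hypothetical two-parameter stand-in for the full forward problem (flow, texture and dispersion curves), a single data type is used, and all function names, step sizes and bounds are chosen here for the example. The log-likelihood is the concentrated form of eq. (15), the prior is the uniform box of eq. (17), and the proposal is a symmetric Gaussian perturbation of one randomly chosen parameter.

```python
import numpy as np


def log_likelihood(c_obs, c_pred):
    """Concentrated log-likelihood of eq. (15): -(N/2) ln(sum of squared residuals)."""
    resid = np.asarray(c_obs) - np.asarray(c_pred)
    return -0.5 * resid.size * np.log(np.sum(resid ** 2))


def sigma_mle(c_obs, c_pred):
    """Maximum-likelihood estimate of the noise level, eq. (14)."""
    resid = np.asarray(c_obs) - np.asarray(c_pred)
    return np.sqrt(np.mean(resid ** 2))


def in_prior(m, lower, upper):
    """Uniform prior of eq. (17): non-zero only inside the bounds."""
    return bool(np.all((m >= lower) & (m <= upper)))


def toy_forward(m, periods):
    """Hypothetical forward model (assumption): a dispersion-like curve."""
    x = np.log(periods)
    return m[0] + m[1] * (x - x.mean())


def metropolis(c_obs, periods, lower, upper, step, n_iter, rng):
    """Random-walk Metropolis sampler with one-parameter-at-a-time proposals."""
    m = rng.uniform(lower, upper)                 # random initial model
    logl = log_likelihood(c_obs, toy_forward(m, periods))
    samples = np.empty((n_iter, m.size))
    for it in range(n_iter):
        k = rng.integers(m.size)                  # pick which parameter to perturb
        m_new = m.copy()
        m_new[k] += step[k] * rng.standard_normal()
        # Proposals outside the prior bounds are rejected without a forward run.
        if in_prior(m_new, lower, upper):
            logl_new = log_likelihood(c_obs, toy_forward(m_new, periods))
            # Symmetric proposal and flat prior: accept on the likelihood ratio.
            if np.log(rng.uniform()) < logl_new - logl:
                m, logl = m_new, logl_new
        samples[it] = m                           # a rejected step repeats the current model
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    periods = np.linspace(10.0, 200.0, 20)
    truth = np.array([4.0, 0.1])
    data = toy_forward(truth, periods) + 0.02 * rng.standard_normal(periods.size)
    chain = metropolis(data, periods, np.array([3.0, -0.5]), np.array([5.0, 0.5]),
                       np.array([0.02, 0.02]), 20000, rng)
    print("posterior mean:", chain[10000:].mean(axis=0))
```

In a joint inversion, the four log-likelihood terms of eq. (16) would simply be summed before the acceptance test, which is why no relative weighting between data types has to be chosen.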

Figure 6. Posterior probability distribution in the 6-D parameter space inferred from the isotropic inversion p(m|cR , cL ). Diagonal panels show 1-D marginal distributions for each model parameter. Off-diagonal panels show 2-D marginal distributions and depict possible trade-offs between pairs of model parameters. The red vertical lines and the black markers indicate the true model values for the diagonal and the off-diagonal panels, respectively. The intensity pertains to the level of posterior probability (i.e. high intensity means high probability, and thus low misfit).

3 APPLICATION WITH 3-D SYNTHETIC TEMPERATURE FIELDS

3.1 Inversion for one spherical anomaly

We demonstrate our proof of concept by setting up a simple temperature field consisting of one spherical negative temperature anomaly (i.e. negatively buoyant) placed at the middle of a 400 km × 400 km × 400 km box. The setup is a very simple toy example inspired by the work of Baumann et al. (2014) where they applied Bayesian inversion to constrain rheology from gravity anomalies and surface velocities.

Table 2 shows the complete list of true model parameters, and Fig. 3 displays a cross-sectional view of the temperature field, and its associated instantaneous velocity field.

We simulate the full forward model given the true model parameters to generate synthetic dispersion curves at periods between 10 and 200 s. Fig. 4 shows a map of the computed phase velocity and azimuthal anisotropy for Rayleigh waves at 100 s. In Fig. 4(a), the phase velocity is maximum at the middle of the region, due to the presence of the cold anomaly underneath.

Fig. 4(b) shows a map of azimuthal anisotropy in Rayleigh waves. Here, anisotropy is at its minimum at the centre, above where the cold, more viscous anomaly is located. As a result of this higher rigidity, local velocity gradients are lower, resulting in smaller amounts of deformation and hence lower anisotropy. Another feature is the presence of strong anisotropy at certain locations. These regions are points where shear deformation is at its maximum due to the convergence of flow lines. On top of the level of azimuthal anisotropy is the orientation of its fast axis. Since we expect the flow direction to converge towards the centre when observed from the top, the fast axis may be interpreted as the horizontal projection of the flow.

The complete data constitute a regular array of 8 × 8 locations containing cR, cL, c1 and c2 spanning the entire surface. We emphasize that the data generated comes from an elastic tensor computed with D-Rex whereas during inversion, the estimated data are obtained from an elastic tensor approximated by neural networks.
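The amplitude and fast-axis direction plotted in maps such as Fig. 4(b) can be derived from the two 2θ terms. The sketch below assumes the usual convention c(θ) = c0 + c1 cos 2θ + c2 sin 2θ, with θ the propagation azimuth; this convention and the function names are assumptions made for illustration and should be checked against the definition of c1 and c2 used earlier in the paper.

```python
import numpy as np


def phase_velocity(c0, c1, c2, theta_deg):
    """Azimuthally varying phase velocity, assuming c = c0 + c1 cos(2θ) + c2 sin(2θ)."""
    theta = np.radians(theta_deg)
    return c0 + c1 * np.cos(2.0 * theta) + c2 * np.sin(2.0 * theta)


def azimuthal_summary(c1, c2):
    """Return peak-to-peak amplitude and fast-axis azimuth (degrees, in [0, 180))."""
    peak_to_peak = 2.0 * np.hypot(c1, c2)          # max(c) - min(c) over azimuth
    fast_axis = (0.5 * np.degrees(np.arctan2(c2, c1))) % 180.0
    return peak_to_peak, fast_axis


if __name__ == "__main__":
    p2p, az = azimuthal_summary(0.006, 0.008)
    print(f"peak-to-peak = {p2p:.3f} km/s, fast axis = {az:.1f} deg")
```

The factor of one half arises because the 2θ terms repeat every 180°, so the fast direction is an axis rather than a vector.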

Figure 7. Posterior probability distribution in the 6-D parameter space inferred from the anisotropic inversion p(m|cR , cL , c1, c2).

Finally, we added random uncorrelated noise onto cR, cL, c1 and c2. Standard deviations for Rayleigh and Love waves are set at σR = 0.05 km s−1 and σL = 0.05 km s−1, whereas σc1 = 0.01 km s−1 and σc2 = 0.01 km s−1. Fig. 5 illustrates the resulting dispersion curves at one given location with and without noise.

The inversion consists of 20 independent Markov chains, each containing 40 000 samples and initiated at a random temperature structure. We demonstrate two cases. The first is an isotropic inversion, where no anisotropy is involved in the forward model. In this case, it is not necessary to compute instantaneous flow and anisotropy, as isotropic seismic velocities Vp and Vs can be directly scaled with temperature. The inverted data are the isotropic phase velocities cR and cL. Secondly, we present an anisotropic inversion (geodynamic tomography). Both isotropic and anisotropic inversions are given the same wide uniform priors, allowing for more mobility when searching the parameter space. We initiate geodynamic tomography by first employing an isotropic inversion. Once the chains have converged in this phase, we then start the actual anisotropic inversion procedure.

The diagonal panels of Figs 6 and 7 illustrate the ensemble of models recovered from isotropic inversion and anisotropic inversion. The off-diagonal panels depict 2-D marginal distributions as 2-D histograms to explore possible trade-offs. The black circles indicate the values of the true model parameters. Compared to isotropic inversion, the width of the posterior distribution inferred from geodynamic tomography has been reduced considerably. More information is thus added by introducing geodynamic constraints in the tomographic problem.

As expected, the posterior distribution on the activation energy E in the isotropic case is flat, as isotropic velocities are only sensitive to temperature and not to viscosity. Anisotropic inversion, on the other hand, constrains E as shown in Fig. 7. The distribution, however, appears to be distant from the correct value of E. Such a behaviour is also evident in its 2-D marginal posterior where the true value is outside the inferred distribution. This clearly exhibits a bias which results from the imperfections of the neural network when computing anisotropy. This effect is eliminated when one uses the correct forward operator for modelling anisotropy. Another distinct feature in these figures is the negative trade-off between Tc and R, which may be attributed to the symmetry of the problem considered. An increase in temperature of the anomaly compensates for an increase in its radius. Such trade-offs may be reduced in the case where the true model exhibits less symmetry.
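Marginal distributions and trade-offs of the kind shown in Figs 6 and 7 are obtained by pooling the post burn-in samples of all chains. The following sketch shows one way this could be done with NumPy; the array layout (chains × samples × parameters), the burn-in length and the function names are assumptions for illustration, and any samples fed to it here would be synthetic stand-ins rather than results from the paper.

```python
import numpy as np


def pool_chains(chains, burn_in):
    """Discard the first `burn_in` samples of each chain and stack the rest.

    `chains` has shape (n_chains, n_samples, n_params).
    """
    chains = np.asarray(chains)
    return chains[:, burn_in:, :].reshape(-1, chains.shape[-1])


def marginal_summary(samples, names):
    """Posterior mean and standard deviation of each parameter (1-D marginals)."""
    return {name: (samples[:, j].mean(), samples[:, j].std(ddof=1))
            for j, name in enumerate(names)}


def tradeoff(samples, names, a, b):
    """Correlation coefficient between two parameters; negative means a negative trade-off."""
    i, j = names.index(a), names.index(b)
    return np.corrcoef(samples[:, i], samples[:, j])[0, 1]


def marginal_2d(samples, names, a, b, bins=30):
    """2-D marginal density (histogram) for a pair of parameters."""
    i, j = names.index(a), names.index(b)
    return np.histogram2d(samples[:, i], samples[:, j], bins=bins, density=True)
```

A strongly negative value returned by `tradeoff(samples, names, "Tc", "R")` would correspond to the kind of elongated off-diagonal panel described in the text.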

Figure 8. Upper panel: cross-sectional view in the yz-plane of the mean temperature field recovered from (a) isotropic inversion, and (b) anisotropic inversion. Lower panel: standard deviations around the mean temperature fields from (c) isotropic inversion and (d) anisotropic inversion. These cross-sections are taken at the centre of the x-axis.

We also plot the mean temperature models from both inversions (see Fig. 8). The figures are obtained by averaging the temperature values at each point. By visual inspection, anisotropic inversion better resolves the 3-D thermal structure. This is further supported by the standard deviation computed around the mean temperature at a given pixel as shown in Figs 8(c) and (d). In both cases, the standard deviation is higher at the centre of the box, where the spherical anomaly is located. This is due to the variations in the location and amplitude of the sphere in the ensemble of sampled models. In the anisotropic case, the vertical position of the sphere is less constrained than its horizontal position, as can be seen in the 2-D histograms. The ensemble of sampled spheres therefore share the same horizontal position but have a variable vertical position, which explains the shape of the standard deviation map in Fig. 8(c). The posterior uncertainties are also relatively small compared to the recovered temperature field, implying that sufficient information can be retrieved from the noisy dispersion curves.

Fig. 9(a) shows the 1-D depth marginal posterior probability profiles (see the captions for further details) for temperature, and Fig. 9(b) for radial anisotropy ξ, peak-to-peak azimuthal anisotropy, and the trend of the fast axis of azimuthal anisotropy at a given location. Both methods capture the 1-D structures for temperature. However, by adding geodynamic constraints (i.e. anisotropic inversion), we observe that the temperature is much better resolved. Additionally, we successfully recover radial anisotropy and azimuthal anisotropy without having to explicitly invert for the elastic tensor (see Fig. 9b). Here, due to the positioning of the chosen depth profile for temperature (passing nearly through the centre of the anomaly), the azimuthal anisotropy appears to be non-existent at this location. For that reason, we consider another depth profile (x = 325 km and y = 225 km) where azimuthal anisotropy is notable (Fig. 9b, middle).

Figure 9. Upper panel: probability density plots of temperature with depth. Lower panel: probability density plots of radial anisotropy, peak-to-peak azimuthal anisotropy and its fast axis with depth. The depth profiles of temperature and radial anisotropy are taken nearly through the centre of the sphere. To show that azimuthal anisotropy is also well constrained, we took a depth profile at (x = 325 km and y = 225 km), where azimuthal anisotropy is large. Geodynamic tomography offers the capability to constrain seismic anisotropy. The solid red lines indicate the true structures.

This method also allows us to resolve 3-D structures of seismic properties. In fact, any implicitly computed variable can be reconstructed in 3-D. Figs 10 and 11 show the resulting structures computed from the mean temperature model placed side by side with that of the true model. It appears that the value of anisotropy computed with the neural network is underestimated compared to that of D-Rex when using the same input model. This explains why the activation energy E resulting from the inversion is lower compared to the true value: to produce larger anisotropy and replicate
the same output as obtained from D-Rex, one has to reduce the value of E. Indeed, reducing the viscosity of the material allows for a stronger deformation. The resulting percentages of total anisotropy from both figures are nearly identical. Fig. 10 shows the presence of positive radial anisotropy at the bottom, indicating horizontal flow. Due to the imposition of free-slip boundary conditions combined with zero normal velocities imposed on all surfaces, the flow at the bottom of the box is oriented nearly horizontally. The negative radial anisotropy we observe implies vertical flow (see Fig. 10 caption for details). This is a result of convection cells forming at the sides of the anomaly as it sinks. At the top of the anomaly, negative radial anisotropy also indicates vertical flow due to downwelling. Finally and as we expect, radial anisotropy at the middle is nearly unity due to the presence of the more viscous anomaly. The difference in the structures may be attributed to the following: (1) imperfections of the forward model used in the inversion; and (2) information loss related to data sensitivity and data noise.

We tested the convergence of the Markov chain by plotting the estimates for data errors with MC steps. For further details, refer to Appendix C.

Figure 10. Cross-sectional view in the yz-plane of the radial anisotropy ξ inferred from (a) true model and (b) mean model. Radial anisotropy is often used as a proxy to infer flow orientation. A ξ > 1 (positive radial anisotropy) is often interpreted as horizontal flow. A ξ < 1 (negative radial anisotropy) on the other hand, pertains to vertical flow. A ξ = 1 indicates the absence of radial anisotropy. The cross-sections are taken at the centre of the x-axis.

Figure 11. Cross-sectional view in the yz-plane of the percentage of total anisotropy (i.e. norm of Sij) inferred from (a) true model and (b) mean model. The absence of anisotropy at the centre corresponds to a region of minimal deformation for the cold and highly viscous anomaly. The cross-sections are taken at the centre of the x-axis.

3.2 Inversion for multiple spherical anomalies

This section covers the inversion for ten spherical temperature anomalies with different properties (i.e. temperature Tc and radius R), positioned randomly in 3-D space. Such a parametrization scheme may be essential to represent anomalies with complex shapes (e.g. a subducting slab) using a collection of several spheres with different characteristics. The synthetic data are generated from a true temperature model consisting of ten spherical anomalies as well.

We compare the true temperature model with the mean temperature models obtained from isotropic and anisotropic inversions (Fig. 12). Even with this much more complex structure, we are able to recover the main features of the temperature field. Also, as in the test of Section 3.1, anisotropic inversion better recovers the structure than isotropic inversion. Posterior uncertainties are represented in Fig. 13 and support this observation. However, some differences with the exact true structure remain, even using anisotropic inversion. Surface waves are long-period observations and hence, small and sharp thermal anomalies may not be resolved. Other contributing factors involve the very nature of the tomographic problem itself as enumerated earlier (e.g. data and modelling errors).

Figure 12. Isovolumetric view of the temperature fields. Left: true temperature field. Middle: mean temperature field from isotropic inversion. Right: mean temperature field from anisotropic inversion.

In Fig. 14, we choose one depth profile to show the 1-D marginal posterior probability densities for temperature, radial anisotropy and azimuthal anisotropy. The dashed black lines represent the true model. Based on the recovered profiles, anisotropic inversion again resolves temperature better than the isotropic case, due to the complementary information brought by geodynamic constraints. Radial and azimuthal anisotropy still appear to be tightly constrained, although with some notable deviations from the true model.

4 DISCUSSION

4.1 Additional comments on the method

Model parametrization: the goal of this study was to test the method in the simplest cases, and we acknowledge that our parametrization of the temperature field in terms of a sum of spherical anomalies is simplistic. However, such a parametrization can be applied to invert for more complex geometries such as a detached slab, a homogeneous plume, or upper-mantle structures beneath cratons. A step further will be to test more realistic approaches. One possible alternative parametrization is the use of initial temperature models inferred from isotropic tomography, and an iterative update of the structure based on the anisotropy signature at the surface (i.e. anisotropic surface wave dispersion curves). This, however, may only be feasible at the global scale due to boundary effects. It should still be possible to apply this technique at the regional scale, but the structure of interest should be far from the borders of the region considered in order to avoid these boundary effects. Another simple yet effective parametrization would be to invert for constant parameters (e.g. density and viscosity) within geometrical blocks defined from a priori information regarding the tectonics of the region (Baumann et al. 2014). In general, the quality of the results will depend on the choice of the model parameters, and the prior information available for the region of interest.

4.1.1 Neural network-based approach to texture evolution

The computational demands of direct sampling techniques such as McMC are high, as they require evaluating the forward model a large number of times. Among all routines involved in the forward model, calculating CPO anisotropy proved to be the most costly. We therefore devised a surrogate model that computes texture evolution via a neural network, thus reducing the computation time by three orders of magnitude compared to D-Rex (see Appendix B for absolute computation times of both methods).

However, the surrogate model introduces theoretical errors, which can be reduced by using a network architecture or a training procedure more adapted to the problem at hand. More accurate predictions could be obtained by using a larger training data set, but this has a higher initial computational cost. We observed that the surrogate model does not generalize well. It has been trained for a specific type of flow (convective flows due to spherical temperature anomalies), and thus provides correct predictions only for flow models of the same nature. However, only these specific flow types are tested in the McMC scheme, and it is therefore not necessary here to have a general neural network that applies to any type of flow.
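The limited generalization of a surrogate can be illustrated with a deliberately simple toy experiment. In the sketch below, a polynomial fit stands in for the neural network and a sine function stands in for the expensive texture calculation; both are hypothetical substitutes chosen only so that the example runs in a few lines, and nothing here reproduces the network or D-Rex. Residuals between surrogate and exact model are small inside the range covered by the training set and grow rapidly outside it.

```python
import numpy as np


def exact_model(x):
    """Hypothetical 'expensive' model (assumption): stands in for the exact calculation."""
    return np.sin(2.0 * np.pi * x)


def train_surrogate(x_train, degree=5):
    """Fit a cheap polynomial surrogate to exact-model outputs on the training inputs."""
    coeffs = np.polyfit(x_train, exact_model(x_train), degree)
    return lambda x: np.polyval(coeffs, x)


def residual_stats(surrogate, x):
    """Mean (bias) and root-mean-square of surrogate-minus-exact residuals."""
    resid = surrogate(x) - exact_model(x)
    return resid.mean(), np.sqrt(np.mean(resid ** 2))


if __name__ == "__main__":
    surrogate = train_surrogate(np.linspace(0.0, 1.0, 200))
    print("inside training range :", residual_stats(surrogate, np.linspace(0.0, 1.0, 50)))
    print("outside training range:", residual_stats(surrogate, np.linspace(1.5, 2.0, 50)))
```

Residual statistics computed inside the sampled range are the ones relevant to the inversion, since the sampler only proposes models of the type the surrogate was trained on.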

Figure 13. Isovolumetric view of the standard deviations around the mean temperature models. Left: standard deviation for the isotropic inversion. Right: standard deviation for the anisotropic inversion.

The success of our synthetic tests is in some ways a proof of the quality of the neural network. The inverted anisotropic seismic data sets were calculated using the exact D-Rex model. Therefore, any errors introduced by the network would manifest themselves by producing a poor fit to the observed data. These theoretical errors have been quantified and accounted for in the Bayesian inversion (see Section 2.3.2). If we want to treat another problem, such as a sinking slab with complex geometry, one needs to re-train the surrogate model for the specific parametrization and prior distribution used. A possible future avenue of geodynamic tomography that is independent of this specific step would be to directly parametrize mantle flow, and build a family of expected convection patterns (together with their predicted anisotropy) to investigate flow patterns underneath mid-ocean ridges and subduction zones. Such a parametrization can be easily extended to the global scale by treating these patterns in terms of source and sink models derived from prescribed plate velocities (Bercovici 1995).

The Bayesian formulation is a practical tool to quantify and account for the theoretical errors introduced by the parametrization choice and the surrogate model. Statistics of these errors can be studied by comparing responses obtained with the true forward and the surrogate models. If the distribution of residuals is approximated as a normal distribution, theoretical errors can be accounted for in the likelihood function (Hansen et al. 2014b). However, the size of the residual vector may not be large enough to properly represent the statistics of errors. Here instead, we used an MLE to implicitly account for these theoretical errors (Dettmer et al. 2007).

Figure 14. Upper panel: comparison between isotropic and anisotropic inversion. Probability density plots of temperature with depth. The profiles are taken nearly through the centre of the sphere. Lower panel: anisotropic inversion: probability density plots of radial anisotropy, peak-to-peak azimuthal anisotropy and its fast axis with depth. All profiles correspond to the temperature profile above. The solid red lines indicate the true structures.

4.1.2 The data

In this work, we assume that the measurement errors in the data are uncorrelated. In reality however, surface wave dispersion measurements are inherently smooth, and correlated both in space and frequency. A simple improvement when modelling noise can be made by introducing a function that varies with period while still maintaining the assumption of uncorrelated errors, as in the work of Ravenna & Lebedev (2017). One may proceed a step further by constructing a covariance matrix of data noise, which becomes more important when working on highly spatially correlated data sets.

It is also worth mentioning that the method is not limited to the use of a single data type (i.e. surface wave measurements) to effectively constrain the patterns of upper-mantle deformation. This calls for the inclusion of other data types such as gravity anomalies, surface topography and/or surface velocities in a joint or separate approach. Such strategies have already been successfully implemented to invert for the 3-D density structure of the mantle (Ricard & Wuming 1991).
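The two noise refinements mentioned in Section 4.1.2, a period-dependent standard deviation and a full data covariance matrix, both fit into a general Gaussian log-likelihood. The sketch below is one possible formulation and is not taken from the paper: the exponential correlation between neighbouring periods, the correlation length and the function names are all assumptions made for illustration. With a diagonal covariance it reduces to the logarithm of the uncorrelated form of eq. (13), with a separate variance for each period.

```python
import numpy as np


def exp_covariance(sigma, corr_length):
    """Covariance with period-dependent std `sigma` and exponential correlation.

    C_ij = sigma_i * sigma_j * exp(-|i - j| / corr_length)  (assumed model).
    """
    sigma = np.asarray(sigma, dtype=float)
    idx = np.arange(sigma.size)
    corr = np.exp(-np.abs(idx[:, None] - idx[None, :]) / corr_length)
    return corr * np.outer(sigma, sigma)


def gaussian_loglike(resid, cov):
    """Log of a zero-mean multivariate Gaussian evaluated at the residual vector."""
    resid = np.asarray(resid, dtype=float)
    chol = np.linalg.cholesky(cov)                 # cov = chol @ chol.T
    white = np.linalg.solve(chol, resid)           # whitened residuals
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))   # ln det(cov)
    return -0.5 * (white @ white + logdet + resid.size * np.log(2.0 * np.pi))


if __name__ == "__main__":
    sigma = np.linspace(0.03, 0.06, 10)            # noise growing with period (assumed)
    resid = 0.02 * np.ones(10)
    print("uncorrelated:", gaussian_loglike(resid, np.diag(sigma ** 2)))
    print("correlated  :", gaussian_loglike(resid, exp_covariance(sigma, 3.0)))
```

Using the Cholesky factor avoids forming the inverse covariance explicitly and gives the log-determinant at no extra cost.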

4.2 Physical assumptions

The trade-off between physical complexity and computational cost is evident in every geophysical problem considered. In this work, we chose to decrease the computational cost to massively explore the parameter space (using an inverse problem formulation) but at the price of using simplified physical assumptions.

4.2.1 Nature of the flow model

We assumed that the flow is in steady state in order to trace the flow streamlines, which is a pre-requisite to compute CPO anisotropy. However, this may not be the case in regions where flow appears to be time-dependent such as migrating trenches and mid-ocean ridges (Heuret & Lallemand 2005; Masalu 2007). A time-dependent flow could be implemented by accounting for the evolution of the surface tectonics (Ricard et al. 1993) and the retrodiction of internal heterogeneities (Bunge et al. 2003; Steinberger et al. 2004). Nevertheless, the steady-state assumption is still valid in some places such as intra-oceanic regions where flow has been observed to be in steady state over the last 40 Myr (Becker et al. 2003, 2006).

Another limiting factor is the imposition of arbitrary boundary conditions on the sides of the model domain, which strongly impact the nature of the flow. Note that the boundary conditions could be treated as an unknown parameter to be inverted for. An obvious way to address this issue is also to work at the global scale. In this case, a fast and reliable method to compute geodynamic flow in a spherical Earth is indispensable. To cite an example, semi-analytical circulation models such as that of Hager & O'Connell (1981) can be computed from simple density distributions assuming no lateral variations in viscosity. However, the latter may not be a reasonable assumption within the context of geodynamic tomography since lateral viscosity variations affect the flow significantly, and thus may also strongly influence the resulting anisotropy.

In the context of inverse modelling, the inclusion of lateral viscosity variations is indeed computationally more challenging. However, it remains attainable by performing these calculations on a coarser grid to obtain the general pattern of the flow. This step can be followed by interpolating the coarse grid solution on a finer grid prior to the computation of CPO. Using iterative approaches to flow calculations, another practical approach is to degrade the accuracy of the solution should convergence be an impediment. When cast in a Bayesian formulation, the modelling error due to the approximation of the flow can be accounted for in the inversion process, similar to how the errors due to the ANN were dealt with (see Section 2.3.2). Consequently, texture evolution modelling at the global scale could reasonably be achieved from flows of this nature. The availability of global surface wave maps on the other hand should thus make geodynamic tomography feasible at the global scale.

4.2.2 Composition of the mantle

Here, we assumed the mantle to be composed of olivine, with an A-type crystal fabric, corresponding to dry upper-mantle conditions. In the real Earth, seismic wave velocities not only depend on temperature and pressure variations, but also on the compositional structure of the minerals. Recently, self-consistent thermodynamic models have been incorporated in seismic inversion schemes to interpret tomographic images in terms of mantle composition (Ricard et al. 2005; Cammarano et al. 2009). While the bulk properties (i.e. seismic wave speeds) obtained from Gibbs minimization are isotropic, to our knowledge, deformation-induced anisotropy has not yet been formulated cohesively with thermodynamic models, let alone cast in an inverse problem.

In general, intrinsic anisotropy in the upper mantle results from complex deformation processes, which depend on a plethora of physical parameters that may be linked to one another. Unlike conventional tomographic techniques, the elastic structure recovered in our scheme directly depends on the assumptions made on these upper-mantle processes. As an example, one would expect that the inclusion of enstatite in our models would dilute the overall amplitude of anisotropy in surface waves. In addition, inversion results depend on control parameters for CPO modelling such as the choice of the slip systems of olivine. For the moment, the values of these parameters have been chosen ad hoc, using current available knowledge mostly originating from laboratory experiments, and thus can be viewed as prior (regularization). Ultimately, the flexibility of Bayesian inference would allow us to treat these parameters as unknown parameters to be inverted for in geodynamic tomography.

5 CONCLUSION

We have laid the groundwork for geodynamic tomography, a novel approach that involves constraints from geodynamic modelling to invert seismic surface waves. Imposing these geodynamic constraints reduces the number of model parameters to a single scalar field (i.e. temperature) and one scalar variable (i.e. activation energy for viscosity). The inverse problem is cast using Bayesian inference where we directly sample the model space using an McMC algorithm. Here, instantaneous flow, deformation history, and finally seismic anisotropy are computed in our forward problem. The model space is reduced further by parametrizing the temperature field as a sum of spherical temperature anomalies with variable position, size and temperature.

We tested geodynamic tomography in simple cases, where we successfully recovered synthetic 3-D temperature fields, by jointly inverting fundamental mode anisotropic Rayleigh wave and isotropic Love wave phase velocities. In the process, we are also able to constrain the complete deformation pattern, to provide a quantitative interpretation of seismic anisotropy in the mantle. Given the Bayesian formulation, one may express the ensemble of temperature models, and any implicitly computed variables (such as deformation or anisotropy) as posterior probability distributions, and quantify their associated uncertainties. Geodynamic tomography is therefore a potentially powerful technique to study the structure of the upper mantle, and interpret seismic observations in terms of mantle deformation patterns.

ACKNOWLEDGEMENTS

We thank Yanick Ricard for his valuable comments on the manuscript. This work was funded by the European Union's Horizon 2020 research and innovation programme under grant agreement no. 716542.

DATA AVAILABILITY

The code underlying the inversion scheme, the instantaneous flow computation, as well as the neural networks will be shared upon reasonable request to the authors. No new data were generated or analysed in support of this research.

REFERENCES

Adam, J.M.-C. & Lebedev, S., 2012. Azimuthal anisotropy beneath Southern Africa from very broad-band surface-wave dispersion measurements, Geophys. J. Int., 191(1), 155–174.
Albers, M., 2000. A local mesh refinement multigrid method for 3-d convection problems with strongly variable viscosity, J. Comput. Phys., 160(1), 126–150.
Babuska, V. & Cara, M., 1991. Seismic Anisotropy in the Earth, Vol. 10, Springer Science & Business Media.
Baumann, T.S., Kaus, B.J. & Popov, A.A., 2014. Constraining effective rheology through parallel joint geodynamic inversion, Tectonophysics, 631, 197–211.
Becker, T.W. & Boschi, L., 2002. A comparison of tomographic and geodynamic mantle models, Geochem. Geophys. Geosyst., 3(1), doi:10.1029/2001GC000168.
Becker, T.W., Kellogg, J.B., Ekström, G. & O'Connell, R.J., 2003. Comparison of azimuthal seismic anisotropy from surface waves and finite strain from global mantle-circulation models, Geophys. J. Int., 155(2), 696–714.
Becker, T.W., Chevrot, S., Schulte-Pelkum, V. & Blackman, D.K., 2006. Statistical properties of seismic anisotropy predicted by upper mantle geodynamic models, J. geophys. Res.: Solid Earth, 111(B8), doi:10.1029/2005JB004095.
Becker, T.W., Kustowski, B. & Ekström, G., 2008. Radial seismic anisotropy as a constraint for upper mantle rheology, Earth planet. Sci. Lett., 267(1–2), 213–227.
Becker, T.W., Conrad, C.P., Schaeffer, A.J. & Lebedev, S., 2014. Origin of azimuthal seismic anisotropy in oceanic plates and mantle, Earth planet. Sci. Lett., 401, 236–250.
Beghein, C., Yuan, K., Schmerr, N. & Xing, Z., 2014. Changes in seismic anisotropy shed light on the nature of the gutenberg discontinuity, Science, 343(6176), 1237–1240.
Bercovici, D., 1995. A source-sink model of the generation of plate tectonics from non-Newtonian mantle flow, J. geophys. Res.: Solid Earth, 100(B2), 2013–2030.
Bishop, C.M., 1995. Neural Networks for Pattern Recognition, Oxford University Press.
Bodin, T., Leiva, J., Romanowicz, B., Maupin, V. & Yuan, H., 2016. Imaging anisotropic layering with Bayesian inversion of multiple data types, Geophys. J. Int., 206(1), 605–629.
Box, G.E. & Tiao, G.C., 2011. Bayesian Inference in Statistical Analysis, Vol. 40, John Wiley & Sons.
Brandt, A., 1982. Guide to multigrid development, Lect. Notes Math., 960, 220–312.
Bunge, H.-P., Hagelberg, C. & Travis, B., 2003. Mantle circulation models with variational data assimilation: inferring past mantle flow and structure from plate motion histories and seismic tomography, Geophys. J. Int., 152(2), 280–301.
Burgos, G., Montagner, J.-P., Beucler, E., Capdeville, Y., Mocquet, A. & Drilleau, M., 2014. Oceanic lithosphere-asthenosphere boundary from surface wave dispersion data, J. geophys. Res.: Solid Earth, 119(2), 1079–1093.
Bystricky, M., Kunze, K., Burlini, L. & Burg, J.-P., 2000. High shear strain of olivine aggregates: rheological and seismic consequences, Science, 290(5496), 1564–1567.
Cammarano, F., Romanowicz, B., Stixrude, L., Lithgow-Bertelloni, C. & Xu, W., 2009. Inferring the thermochemical structure of the upper mantle from seismic data, Geophys. J. Int., 179(2), 1169–1185.
Chang, S.-J., Ferreira, A.M., Ritsema, J., van Heijst, H.J. & Woodhouse, J.H., 2014. Global radially anisotropic mantle structure from multiple datasets: a review, current challenges, and outlook, Tectonophysics, 617, 1–19.
Chang, S.-J., Ferreira, A.M., Ritsema, J., Heijst, H.J. & Woodhouse, J.H., 2015. Joint inversion for global isotropic and radially anisotropic mantle structure including crustal thickness perturbations, J. geophys. Res.: Solid Earth, 120(6), 4278–4300.
Connolly, J., 2009. The geodynamic equation of state: what and how, Geochem. Geophys. Geosyst., 10(10), doi:10.1029/2009GC002540.
Connolly, J.A., 2005. Computation of phase equilibria by linear programming: a tool for geodynamic modeling and its application to subduction zone decarbonation, Earth planet. Sci. Lett., 236(1–2), 524–541.
Conway, D., Alexander, B., King, M., Heinson, G. & Kee, Y., 2019. Inverting magnetotelluric responses in a three-dimensional earth using fast forward approximations based on artificial neural networks, Comput. Geosci., 127, 44–52.
Debayle, E., Kennett, B. & Priestley, K., 2005. Global azimuthal seismic anisotropy and the unique plate-motion deformation of australia, Nature, 433(7025), 509, doi:10.1038/nature03247.
Deschamps, F., Lebedev, S., Meier, T. & Trampert, J., 2008. Azimuthal anisotropy of Rayleigh-wave phase velocities in the east-central united states, Geophys. J. Int., 173(3), 827–843.
Dettmer, J., Dosso, S.E. & Holland, C.W., 2007. Uncertainty estimation in seismo-acoustic reflection travel time inversion, J. acoust. Soc. Am., 122(1), 161–176.
Ekström, G., 2011. A global model of love and rayleigh surface wave dispersion and anisotropy, 25–250 s, Geophys. J. Int., 187(3), 1668–1686.
Estey, L.H. & Douglas, B.J., 1986. Upper mantle anisotropy: a preliminary model, J. geophys. Res.: Solid Earth, 91(B11), 11393–11406.
Faccenda, M. & Capitanio, F., 2013. Seismic anisotropy around subduction zones: insights from three-dimensional modeling of upper mantle deformation and sks splitting calculations, Geochem. Geophys. Geosyst., 14(1), 243–262.
Ferreira, A.M., Faccenda, M., Sturgeon, W., Chang, S.-J. & Schardong, L., 2019. Ubiquitous lower-mantle anisotropy beneath subduction zones, Nat. Geosci., 12(4), 301–306.
Gallego, A., Ito, G. & Dunn, R., 2013. Investigating seismic anisotropy beneath the reykjanes ridge using models of mantle flow, crystallographic evolution, and surface wave propagation, Geochem. Geophys. Geosyst., 14(8), 3250–3267.
Gerya, T.V., 2010. Introduction to Numerical Geodynamic Modeling, Cambridge University Press.
Hager, B.H. & O'Connell, R.J., 1981. A simple global model of plate dynamics and mantle convection, J. geophys. Res.: Solid Earth, 86(B6), 4843–4867.
Hall, C.E., Fischer, K.M., Parmentier, E. & Blackman, D.K., 2000. The influence of plate motions on three-dimensional back arc mantle flow and shear wave splitting, J. geophys. Res.: Solid Earth, 105(B12), 28009–28033.
Hansen, L.N., Zhao, Y.-H., Zimmerman, M.E. & Kohlstedt, D.L., 2014a. Protracted fabric evolution in olivine: Implications for the relationship among strain, crystallographic fabric, and seismic anisotropy, Earth planet. Sci. Lett., 387, 157–168.
Hansen, T.M. & Cordua, K.S., 2017. Efficient monte carlo sampling of inverse problems using a neural network-based forward—applied to gpr crosshole traveltime inversion, Geophys. J. Int., 211(3), 1524–1533.
Hansen, T.M., Cordua, K.S., Jacobsen, B.H. & Mosegaard, K., 2014b. Accounting for imperfect forward modeling in geophysical inverse problems—exemplified for crosshole tomography, Geophysics, 79(3), H1–H21.
Hastings, W., 1970. Monte carlo sampling methods using markov chains and their applications, Biometrika, 57(1), 97–109.
Hedjazian, N. & Kaminski, E., 2014. Defining a proxy for the interpretation of seismic anisotropy in non-newtonian mantle flows, Geophys. Res. Lett., 41(20), 7065–7072.
Heuret, A. & Lallemand, S., 2005. Plate motions, slab dynamics and back-arc deformation, Phys. Earth planet. Inter., 149(1–2), 31–51.
Hulbert, C., Rouet-Leduc, B., Johnson, P.A., Ren, C.X., Rivière, J., Bolton, D.C. & Marone, C., 2019. Similarity of fast and slow earthquakes illuminated by machine learning, Nat. Geosci., 12(1), 69–74.
Kaminski, É. & Ribe, N.M., 2002. Timescales for the evolution of seismic anisotropy in mantle flow, Geochem. Geophys. Geosyst., 3(8), 1–17.
Kaminski, E., Ribe, N.M. & Browaeys, J.T., 2004. D-rex, a program for calculation of seismic anisotropy due to crystal lattice preferred orientation in the convective upper mantle, Geophys. J. Int., 158(2), 744–752.

Käufl, P., Valentine, A.P., O'Toole, T.B. & Trampert, J., 2014. A framework for fast probabilistic centroid-moment-tensor determination—inversion of regional static displacement measurements, Geophys. J. Int., 196(3), 1676–1693.
Köpke, C., Irving, J. & Elsheikh, A.H., 2018. Accounting for model error in bayesian solutions to hydrogeophysical inverse problems using a local basis approach, Adv. Water Res., 116, 195–207.
Lassak, T.M., Fouch, M.J., Hall, C.E. & Kaminski, É., 2006. Seismic characterization of mantle flow in subduction systems: can we resolve a hydrated mantle wedge? Earth planet. Sci. Lett., 243(3–4), 632–649.
Lebedev, S., Meier, T. & van der Hilst, R.D., 2006. Asthenospheric flow and origin of volcanism in the baikal rift area, Earth planet. Sci. Lett., 249(3–4), 415–424.
LeCun, Y., Bengio, Y. & Hinton, G., 2015. Deep learning, Nature, 521(7553), 436, doi:10.1038/nature14539.
Mainprice, D., 1990. A fortran program to calculate seismic anisotropy from the lattice preferred orientation of minerals, Comput. Geosci., 16(3), 385–393.
Masalu, D.C., 2007. Mapping absolute migration of global mid-ocean ridges since 80 Ma to present, Earth Planets Space, 59(9), 1061–1066.
McKenzie, D., 1979. Finite deformation during fluid flow, Geophys. J. Int., 58(3), 689–715.
Meier, U., Curtis, A. & Trampert, J., 2007. Global crustal thickness from neural network inversion of surface wave data, Geophys. J. Int., 169(2), 706–722.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. & Teller, E., 1953. Equation of state calculations by fast computing machines, J. Chem. Phys., 21(6), 1087–1092.
Miller, M.S. & Becker, T.W., 2012. Mantle flow deflected by interactions between subducted slabs and cratonic keels, Nat. Geosci., 5(10), 726, doi:10.1038/ngeo1553.
Moghadas, D., Behroozmand, A.A. & Christiansen, A.V., 2020. Soil electrical conductivity imaging using a neural network-based forward solver: applied to large-scale bayesian electromagnetic inversion, J. appl. Geophys., 104012, doi:10.1016/j.jappgeo.2020.104012.
Montagner, J. & Nataf, H., 1988. Vectorial tomography. Part I: theory, Geophys. J. Int., 94, 295–307.
Montagner, J.-P., 1994. Can seismology tell us anything about convection in the mantle? Rev. Geophys., 32(2), 115–137.
Montagner, J.-P. & Anderson, D.L., 1989. Petrological constraints on seismic anisotropy, Phys. Earth planet. Inter., 54(1–2), 82–105.
Montagner, J.-P. & Jobert, N., 1988. Vectorial tomography—II. Application to the indian ocean, Geophys. J. Int., 94(2), 309–344.
Montagner, J.-P. & Nataf, H.-C., 1986. A simple method for inverting the azimuthal anisotropy of surface waves, J. geophys. Res.: Solid Earth, 91(B1), 511–520.
Montagner, J.-P. & Tanimoto, T., 1990. Global anisotropy in the upper mantle inferred from the regionalization of phase velocities, J. geophys. Res.: Solid Earth, 95(B4), 4797–4819.
Mosegaard, K. & Sambridge, M., 2002. Monte carlo analysis of inverse problems, Inverse Probl., 18(3), R29, doi:10.1088/0266-5611/18/3/201.
Mosegaard, K. & Tarantola, A., 1995. Monte carlo sampling of solutions to inverse problems, J. geophys. Res.: Solid Earth, 100(B7), 12431–12447.
Nettles, M. & Dziewoński, A.M., 2008. Radially anisotropic shear velocity structure of the upper mantle globally and beneath North America, J. geophys. Res.: Solid Earth, 113(B2), doi:10.1029/2006JB004819.
Nicolas, A. & Christensen, N.I., 1987. Formation of anisotropy in upper mantle peridotites—a review, Compos. Struct. Dynam. Lithos.-Asthenos. Syst., 16, 111–123.
Panning, M.P. & Nolet, G., 2008. Surface wave tomography for azimuthal anisotropy in a strongly reduced parameter space, Geophys. J. Int., 174(2), 629–648.
Patankar, S.V., 1980. Numerical Heat Transfer and Fluid Flow, Hemisphere Publishing Corporation, New York.
Pedregosa, F. et al., 2011. Scikit-learn: machine learning in Python, J. Mach. Learn. Res., 12, 2825–2830.
Plomerová, J., Kouba, D. & Babuška, V., 2002. Mapping the lithosphere–asthenosphere boundary through changes in surface-wave anisotropy, Tectonophysics, 358(1–4), 175–185.
Ravenna, M. & Lebedev, S., 2017. Bayesian inversion of surface-wave data for radial and azimuthal shear-wave anisotropy, with applications to central mongolia and west-central italy, Geophys. J. Int., 213(1), 278–300.
Ribe, N.M., 1989. Seismic anisotropy and mantle flow, J. geophys. Res.: Solid Earth, 94(B4), 4213–4223.
Ribe, N.M., 1992. On the relation between seismic anisotropy and finite strain, J. geophys. Res.: Solid Earth, 97(B6), 8737–8747.
Ricard, Y. & Wuming, B., 1991. Inferring the viscosity and the 3-d density structure of the mantle from geoid, topography and plate velocities, Geophys. J. Int., 105(3), 561–571.
Ricard, Y., Richards, M., Lithgow-Bertelloni, C. & Le Stunff, Y., 1993. A geodynamic model of mantle density heterogeneity, J. geophys. Res.: Solid Earth, 98(B12), 21895–21909.
Ricard, Y., Mattern, E. & Matas, J., 2005. Mineral physics in thermo-chemical mantle models, eds Hilst, R., Bass, J.D. & Matas Trampert, J., Composition, Structure and Evolution of the Earth Mantle, AGU Monograph, Vol. 160, pp. 283–300.
Ritzwoller, M.H., Shapiro, N.M., Barmin, M.P. & Levshin, A.L., 2002. Global surface wave diffraction tomography, J. geophys. Res.: Solid Earth, 107(B12), ESE–4, doi:10.1029/2002JB001777.
Rumelhart, D.E., Hinton, G.E. & Williams, R.J., 1985. Learning internal representations by error propagation, Tech. Rep., California Univ San Diego La Jolla Inst for Cognitive Science.
Samuel, H., 2012. Time-domain parallelization for computational geodynamics, G-cubed, doi:10.1029/2011GC003905.
Samuel, H., 2018. A deformable particle-in-cell method for advective transport in geodynamic modelling, Geophys. J. Int., 214, 1744–1773, doi:10.1093/gji/ggy231.
Shahnas, M., Yuen, D. & Pysklywec, R., 2018. Inverse problems in geodynamics using machine learning algorithms, J. geophys. Res.: Solid Earth, 123(1), 296–310.
Shapiro, N. & Ritzwoller, M., 2002. Monte-carlo inversion for a global shear-velocity model of the crust and upper mantle, Geophys. J. Int., 151(1), 88–105.
Shen, W., Ritzwoller, M.H., Schulte-Pelkum, V. & Lin, F.-C., 2012. Joint inversion of surface wave dispersion and receiver functions: a Bayesian Monte-Carlo approach, Geophys. J. Int., 192(2), 807–836.
Sieminski, A., Liu, Q., Trampert, J. & Tromp, J., 2007. Finite-frequency sensitivity of surface waves to anisotropy based upon adjoint methods, Geophys. J. Int., 168(3), 1153–1174.
Smith, A.F.M., 1991. Bayesian computational methods, Philos. Trans. R. Soc. Lond. Ser. A: Phys. Eng. Sci., 337(1647), 369–386.
Smith, M.L. & Dahlen, F., 1973. The azimuthal dependence of Love and Rayleigh wave propagation in a slightly anisotropic medium, J. geophys. Res., 78(17), 3321–3333.
Steinberger, B., Sutherland, R. & O'Connell, R.J., 2004. Prediction of emperor-hawaii seamount locations from a revised model of global plate motion and mantle flow, Nature, 430(6996), 167–173.
Stixrude, L. & Lithgow-Bertelloni, C., 2011. Thermodynamics of mantle minerals—II. Phase equilibria, Geophys. J. Int., 184(3), 1180–1213.
Sturgeon, W., Ferreira, A.M., Faccenda, M., Chang, S.-J. & Schardong, L., 2019. On the origin of radial anisotropy near subducted slabs in the midmantle, Geochem. Geophys. Geosyst., 20(11), 5105–5125.
Takeuchi, H. & Saito, M., 1972. Seismic surface waves, Methods Comput. Phys., 11, 217–295.
Tommasi, A., Mainprice, D., Canova, G. & Chastel, Y., 2000. Viscoplastic self-consistent and equilibrium-based modeling of olivine lattice preferred orientations: implications for the upper mantle seismic anisotropy, J. geophys. Res.: Solid Earth, 105(B4), 7893–7908.
Xie, J., Ritzwoller, M.H., Brownlee, S. & Hacker, B., 2015. Inferring the oriented elastic tensor from surface wave observations: preliminary application across the western united states, Geophys. J. Int., 201(2), 996–1021.
Xie, J., Ritzwoller, M.H., Shen, W. & Wang, W., 2017. Crustal anisotropy across eastern Tibet and surroundings modeled as a depth-dependent tilted hexagonally symmetric medium, Geophys. J. Int., 209(1), 466–491.

Xu, H. & Beghein, C., 2019. Measuring higher-mode surface wave dispersion using a transdimensional bayesian approach, Geophys. J. Int., doi:10.1093/gji/ggz133.
Yuan, K. & Beghein, C., 2013. Seismic anisotropy changes across upper mantle phase transitions, Earth planet. Sci. Lett., 374, 132–144.
Yuan, K. & Beghein, C., 2014. Three-dimensional variations in Love and Rayleigh wave azimuthal anisotropy for the upper 800 km of the mantle, J. geophys. Res.: Solid Earth, 119(4), 3232–3255.
Zhang, S. & Karato, S.-i., 1995. Lattice preferred orientation of olivine aggregates deformed in simple shear, Nature, 375(6534), 774, doi:10.1038/375774a0.

APPENDIX A: PARAMETRIZING TEMPERATURE WITH SPHERICAL ANOMALIES

For a given anomaly, we define a basis function corresponding to that anomaly using eq. (3). The negative sign indicates that the anomaly is colder than the background temperature if $T_c$ is positive (a negatively buoyant anomaly). Should $T_c$ be negative, then the anomaly adds to the background temperature, resulting in a positively buoyant anomaly. The function is designed such that: (1) when $\lVert \mathbf{r} - \mathbf{r}_0 \rVert / L_s > R / L_s$ and tanh returns a value of nearly one, the temperature is just the background temperature. (2) When $\lVert \mathbf{r} - \mathbf{r}_0 \rVert / L_s = R / L_s$, i.e. at a distance $R$ from the centre of the anomaly, the temperature is equal to $T_{background} - T_c/2$. (3) Finally, when $\lVert \mathbf{r} - \mathbf{r}_0 \rVert / L_s < R / L_s$ and tanh returns a value of minus one, this corresponds to the temperature at the centre of the anomaly, $T_{background} - T_c$.

Here, $\beta$ controls the sharpness of the temperature gradient and is held at a fixed value. Choosing a very large value for $\beta$ results in a sharp temperature gradient (see Fig. A1). In addition, opting for a smooth function such as the hyperbolic tangent avoids very sharp viscosity contrasts when computing the flow. The advantage of building a basis set is to reduce the number of model parameters. In conventional inversion schemes of scalar fields, we usually invert for a scalar at a given grid point. Hence, the number of model parameters depends on the grid size. In a cube, this would result in $N^3$ model parameters to constrain, where $N^3$ is the size of the 3-D block. In our case, this gives us $5M$ parameters to be inverted (position, size and temperature of each anomaly), where $M$ is the number of spherical anomalies. Finally, we define the 3-D scalar temperature field as the sum of the background temperature and the spherical anomalies as shown in eq. (1).

APPENDIX B: A NEURAL NETWORK-BASED APPROXIMATION TO D-REX

In this work, we use an ANN as a surrogate model $g_{nn}$ to approximate the forward operator for texture evolution $g_{CPO}$. We consider a simple architecture of feedforward neural network called a multilayer perceptron (MLP) with two hidden layers, similar to the work of LeCun et al. (2015), defined by:

$$[g_{nn}(X)]_l = Y_l = a_1\left(\sum_{k=1}^{N_{h1}} w^3_{kl}\, a_2\left(\sum_{j=1}^{N_{h2}} w^2_{jk}\, a_3\left(\sum_{i=1}^{N_x} w^1_{ij} X_i\right)\right)\right). \tag{B1}$$

The output $Y_l$ of the MLP is an estimate of the 21 independent coefficients of the stiffness tensor, where $l$ is the index pertaining to one element in the tensor. $N_{h1}$ and $N_{h2}$ are the sizes of the two hidden layers considered, and $N_x$ is the size of the input vector. We design the network such that the input $X$ contains the deformation history along a flow streamline. The streamline is divided into 200 time steps. Each step contains one $L_{ij}$ matrix and one corresponding $dt$. Thus, each step has 10 independent components as inputs (the nine components of $L_{ij}$ and $dt$). The number of inputs in the first layer of the neural network is therefore $N_x = 200 \times 10 = 2000$ (see eq. B1). The functions $a_1$, $a_2$ and $a_3$ are known as activation functions, whose purpose is to introduce nonlinearity to the output of one neuron and to constrain its output to a desired range and distribution. Here, we choose them as default rectified linear unit functions to allow faster convergence (Pedregosa et al. 2011). Lastly, the $w$'s refer to the weights, which reflect the significance of a given neuron.

To build a suitable surrogate model to D-Rex, the weights $w^1$, $w^2$ and $w^3$ have to be adjusted to the proper value. This is performed by minimizing a loss function, which is the difference between the training outputs $g_{CPO}(X)$ and the output of the network itself $g_{nn}(X)$, using a stochastic gradient descent algorithm (Rumelhart et al. 1985). Formally, the loss function is a squared $L_2$ norm and takes the form:

$$\mathrm{Loss}(\hat{Y}, Y, w) = \frac{1}{2}\lVert \hat{Y} - Y \rVert_2^2 + \frac{\lambda}{2}\lVert w \rVert_2^2. \tag{B2}$$

The second term constrains the weights to avoid data overfitting, where $\lambda$ is a regularization parameter that quantifies the degree of penalization. The weights are updated iteratively by subtracting from their current value the gradient of the loss function with respect to the weights, scaled by the learning rate:

$$w_{i+1} = w_i - \epsilon \nabla \mathrm{loss}_i, \tag{B3}$$

where $\epsilon$ is the learning rate, which controls the step size for updating the weights, and $i$ is the iteration step. The training achieves convergence when the tolerance value tol for the loss function is reached. However, the algorithm may also be stopped once the maximum number of iterations is reached.

The network is trained by considering 30 flow models, each comprising $M$ spherical anomalies to drive thermal convection. Each sphere has a random position and size, and can either be positively or negatively buoyant. This is to ensure that each flow path we define is unique enough so that the network can learn a variety of input–output combinations. Here, we acknowledge that the choice of flow models is not enough to be able to predict seismic anisotropy in the most general case. However, in this work, we only attempt to predict anisotropy for a small class of flow models (convection due to a collection of spherical temperature anomalies). Since only such classes of models are tested, we can restrict ourselves to this type of model when training the network.

One training input corresponds to one deformation history along a streamline, whereas one training output corresponds to one stiffness matrix computed with D-Rex. The training set can be represented as a matrix containing the stiffness coefficients and the input parameters, given by $[Y_{l=1,21}, X_{i=1,N_x}]_{n=1,N_{train}}$, where $N_{train}$ is the number of training sets. Thus, the training inputs are of the size $[2000, N_{train}]$ and the training outputs are of the size $[21, N_{train}]$. In this problem, $16^3$ input–output combinations for each 3-D flow model are used to train the network. In total, there are $N_{train} = 30 \times 16^3 = 1.2288 \times 10^5$ training sets for the network to learn from.

We adopt the Python package scikit-learn to train the network (Pedregosa et al. 2011). Table B1 below summarizes the parameters used to design and build the network.

The network is tested by considering a 3-D deformation due to a sinking anomaly that is not part of the training input. Table B2 shows the computation times for computing anisotropy from both D-Rex and neural networks. The relative speed-up of using neural networks is over three orders of magnitude compared to performing texture evolution calculations with D-Rex. For reference, we also give the computation times for network training, flow modelling, as well as for surface wave dispersion curve calculations. Each routine in the forward problem has been executed in a serial fashion for the sake of comparison.

Figure A1. 1-D temperature profiles with depth for different values of R and β. Left: β = 5. Middle: β = 20. Right: β = 50. Here, we consider a spherical anomaly with Tc = 800 K located at the centre of the 3-D volume. The plots refer to 1-D depth profiles of temperature through the middle of the sphere at specified values of R and β. The x- and y-axes correspond to temperature and depth, respectively. Based on our parametrization, increasing the value of R at constant β increases the size of the temperature anomaly. At constant R, the anomalies retain their respective sizes but the temperature gradient becomes sharper at increasing β. Thus, choosing an appropriate β is important so as to avoid sharp viscosity contrasts (since η depends on T) when computing flow. In our inversion, we choose to fix β = 20, and invert for R.

Figure A2. 1-D marginal distribution of the difference between gCPO(X) and gnn(X) in terms of the VTI and HTI-projected elastic tensor.
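
The spherical-anomaly parametrization of Appendix A (whose profiles are described in Fig. A1) can be illustrated with a short Python sketch. The exact form of eq. (3) is not reproduced in this appendix, so the tanh expression below is an assumption, chosen only because it satisfies the three limiting cases listed in Appendix A. The function names, the background temperature of 1600 K and the anomaly position are hypothetical; β = 20 and Tc = 800 K are the values quoted in Fig. A1.

```python
import math


def anomaly_perturbation(r, r0, R, Tc, beta=20.0, Ls=1.0):
    """Temperature perturbation of one spherical anomaly at point r.

    ASSUMED form (not taken from the paper's eq. 3):
        dT = -(Tc / 2) * [1 - tanh(beta * (|r - r0| - R) / Ls)]
    Far outside the sphere dT -> 0, at |r - r0| = R dT = -Tc / 2,
    and well inside the sphere dT -> -Tc.
    """
    d = math.dist(r, r0)  # Euclidean distance to the anomaly centre
    return -0.5 * Tc * (1.0 - math.tanh(beta * (d - R) / Ls))


def temperature_at(r, anomalies, T_background, beta=20.0, Ls=1.0):
    """Background temperature plus the sum of all anomaly perturbations.

    Each anomaly is a 5-tuple (x0, y0, z0, R, Tc), i.e. the five
    parameters per sphere that make up the 5M model parameters.
    """
    T = T_background
    for x0, y0, z0, R, Tc in anomalies:
        T += anomaly_perturbation(r, (x0, y0, z0), R, Tc, beta, Ls)
    return T


if __name__ == "__main__":
    # One cold sphere in the middle of a unit cube (hypothetical values).
    spheres = [(0.5, 0.5, 0.5, 0.2, 800.0)]
    for x in (0.0, 0.3, 0.5, 0.7, 1.0):
        print(x, round(temperature_at((x, 0.5, 0.5), spheres, 1600.0), 2))
```

With a positive Tc the sphere is colder than the background, as in the negatively buoyant case; passing a negative Tc gives the positively buoyant case.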

Table B1. Neural network parameters.

| Ntrain | Nx | Ny | Nh1 | Nh2 | λ | Learning rate | tol | Max iterations |
|---|---|---|---|---|---|---|---|---|
| 1.2288 × 10^5 | 2000 | 21 | 100 | 50 | 0.1 | 1.0 × 10^−3 | 1.0 × 10^−4 | 1000 |
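
As a rough guide to how the parameters of Table B1 might map onto scikit-learn, the sketch below configures an `MLPRegressor` with two hidden layers of 100 and 50 neurons. The mapping of λ to `alpha`, of the learning rate to `learning_rate_init`, and of tol and the maximum number of iterations to `tol` and `max_iter` is an assumption, as is leaving the solver at its default; the source only states that scikit-learn and a stochastic gradient descent algorithm were used. The helper names and the random toy data are hypothetical and stand in for the D-Rex training set.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def build_surrogate(random_state=0):
    """MLP with the layer sizes and hyperparameters listed in Table B1.

    Assumed mapping: lambda -> alpha (L2 penalty), learning rate ->
    learning_rate_init. The solver is left at scikit-learn's default
    ('adam', a stochastic gradient-based optimizer) because the source
    does not say which solver was selected.
    """
    return MLPRegressor(
        hidden_layer_sizes=(100, 50),  # Nh1, Nh2
        activation="relu",             # rectified linear unit
        alpha=0.1,                     # regularization parameter
        learning_rate_init=1.0e-3,     # learning rate
        tol=1.0e-4,                    # tolerance on the loss
        max_iter=1000,                 # maximum number of iterations
        random_state=random_state,
    )


def make_toy_training_set(n_train=64, n_x=2000, n_y=21, seed=0):
    """Random stand-in for (deformation history, stiffness) pairs.

    scikit-learn expects one sample per row, so the arrays have shapes
    (n_train, n_x) and (n_train, n_y), the transpose of the
    [2000, Ntrain] and [21, Ntrain] layout used in the text.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_train, n_x))
    W = rng.normal(size=(n_x, n_y)) / np.sqrt(n_x)
    return X, X @ W
```

A typical use would be `model = build_surrogate(); model.fit(X, Y); Y_hat = model.predict(X_new)`, with `X` holding one flattened deformation history per row.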

Table B2. Computation times for each subroutine in the forward model.

| Routine | D-Rex | ANN | Flow | Dispersion | Training |
|---|---|---|---|---|---|
| Time (s) | 73919.83 | 21.55 | 6.6 | 119.63 | 603.85 |

The elastic tensor computed from gnn is projected into both a VTI medium, thus having elastic parameters A, C, F, L and N, and radial anisotropy strength ξ, φ and η; and an HTI medium, with parameters Gs, Gc, Bs and Bc. Aside from plotting the percentage of total anisotropy (as in Section 2.2.3), we compare the results further with D-Rex by plotting 1-D marginal distributions of the residuals of each seismic parameter. Each parameter contains a small bias very close to zero which is attributed to the minimization of the L2 loss function.

APPENDIX C: A SIMPLE TEST FOR CONVERGENCE

Fig. C1 shows the noise estimate plotted against MC step in the one sphere case. The standard deviation of data noise is implicitly computed with MLE (see Section 2.3.2), and is simply given by the level of data fit. The starting point for each plot is the iteration at which anisotropic tomography commences. The trends exhibit well-mixed random walk behaviours indicating that convergence has been achieved. This level of noise estimated by MLE represents the combination of observational errors (white noise added to the data), and theoretical errors (errors of the surrogate model used for texture evolution).
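
To make the statement that the noise level "is simply given by the level of data fit" concrete, here is a minimal Python sketch. It assumes independent Gaussian errors sharing one standard deviation, in which case the maximum-likelihood estimate of that standard deviation is the root-mean-square misfit between observed and predicted data. Whether Section 2.3.2 uses exactly this estimator is not confirmed here, and the function name is hypothetical.

```python
import math


def mle_noise_std(observed, predicted):
    """Maximum-likelihood estimate of the noise standard deviation.

    Assumes independent, zero-mean Gaussian errors with a common
    standard deviation; the estimate is then the RMS data misfit.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    sum_sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum_sq / len(observed))
```

Evaluated at every MC step for each data type, a quantity of this kind would trace curves like those described for Fig. C1, settling near the true noise level once the chain fits the data.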

Figure C1. Noise estimate with MC step for (a) Rayleigh waves, (b) Love waves, (c) c1 and (d) c2. Each coloured line plot is associated with one independent Markov chain. Solid green line indicates the standard deviation of random errors added to the data.