
Copyright by Yang Xue 2013

The Dissertation Committee for Yang Xue certifies that this is the approved version of the following dissertation:

NOVEL STOCHASTIC INVERSION METHODS AND WORKFLOW FOR RESERVOIR CHARACTERIZATION AND MONITORING

Committee:

Mrinal K. Sen, Supervisor

Robert H. Tatham

Sergey Fomel

Kyle T. Spikes

Sanjay Srinivasan

Long Jin

NOVEL STOCHASTIC INVERSION METHODS AND WORKFLOW FOR RESERVOIR CHARACTERIZATION AND MONITORING

by

Yang Xue, Dipl.-Ing.

Dissertation Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment

of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

The University of Texas at Austin

December 2013

To my parents Musheng Cao and Li Xue

To my husband Liqing

Acknowledgements

The course of my Ph.D. study and the writing of this dissertation would not have been possible without the support of many people. First of all, I would like to express my deepest appreciation to my supervisor, Dr. Mrinal K. Sen, for his financial support, help and guidance throughout my Ph.D. program. As a student from a different background, I benefited greatly from Dr. Sen, who gave me the opportunity to develop my research interest in exploration geophysics, provided me with exciting research subjects and helped me explore ideas. Without his patient guidance, persistent help, constant inspiration and great effort in editing this dissertation, this journey would not have brought me so much fun and this dissertation would never have taken shape.

I would also like to give special thanks to my committee members, Dr. Robert Tatham, Dr. Sergey Fomel, Dr. Kyle Spikes, Dr. Sanjay Srinivasan and Dr. Long Jin, whose time, energy and insightful comments provided valuable input to my dissertation. In addition, I sincerely thank Dr. Clark R. Wilson and Dr. Zong-Yang, who supported me in my first semester of study. My thanks also go to Dr. Paul Stoffa and Dr. Zhiwen for their kind discussions, from which I benefited a lot. I also sincerely thank Philip Guerrero, Margo Grace, Judy Sansom and Nancy Hard for their administrative assistance.

Many thanks also go to the Jackson School of Geosciences, the sponsors of the EDGER Forum and Shell Oil Company for their financial support during my research. Special thanks go to the quantitative reservoir management (QRM) team at Shell Oil Company, which financially supported my research for two years, offered me exciting projects, allowed me to work on the projects at the Shell office for more than one year, provided me with excellent in-house software, hardware and many learning opportunities, and allowed me to publish my work at Shell. I must thank Dr. Long Jin and Dr. Eduardo Jimenez again for their teaching, helpful guidance and supervision during my visit at Shell. It has been a lot of fun working with them and learning from them; without their collaboration, the projects would not have gone so smoothly. I would also like to thank the other team members and experts at Shell, Dr. Daniel Weber, Dr. Javier Ferrandis, Dr. Jaap Leguijt, Dr. Paul Gelderblom, Dr. Tim Barker, Dr. Rocky Detomo, Dr. Jorge Lopez, Dr. Paul van den Hoek, Dr. Detlef Hohl and Dr. Hans Potters, for their constructive comments and generous help with my research. I also thank EAGE (the European Association of Geoscientists and Engineers) for permission to include chapters four and five of my dissertation, which were originally presented at the EAGE conference in London, 2013.

I owe a debt of gratitude to my family for standing behind me all the time. My mother, Musheng Cao, and my father, Li Xue, always stressed the importance of education and provided me with the opportunity to study across three continents. These ten years of studying and living abroad have made my life quite different and so exciting. I am also grateful to my husband, Liqing Huang, not only for his endless love and tolerance, but also for his companionship and encouragement throughout this journey of exploration. He is the source of my strength and he makes my life so colorful.


Novel Stochastic Inversion Methods and Workflow for Reservoir Characterization and Monitoring

Yang Xue, Ph.D.

The University of Texas at Austin, 2013

Supervisor: Mrinal K. Sen

Reservoir models are generally constructed from seismic data, well logs and other related datasets using inversion methods and geostatistics. Geoscientists have long recognized that such a process is prone to non-uniqueness, yet practical methods for estimating the resulting uncertainty remain elusive. In my dissertation, I propose two new methods to estimate uncertainty in reservoir models from seismic data, well logs and well production data.

The first part of my research is aimed at estimating reservoir impedance models and their uncertainties from seismic data and well logs. This constitutes an inverse problem, and we recognize that multiple models can fit the measurements. A deterministic inversion based on minimization of the error between the observation and the forward modeling provides only one of the best-fit models, which is usually band-limited. A complete solution should include both the models and their uncertainties, which requires drawing samples from the posterior distribution. A global optimization method called very fast simulated annealing (VFSA) is commonly used to approximate the posterior distribution with fast convergence. Here I address some of the limitations of VFSA by developing a new stochastic inference method, named Greedy Annealed Importance Sampling (GAIS). GAIS combines VFSA with greedy importance sampling (GIS), which

uses a greedy search in the important regions located by VFSA to attain fast convergence and provide unbiased estimation. I demonstrate the performance of GAIS on post- and pre-stack data from real fields to estimate impedance models. The results indicate that GAIS estimates both the expectation value and the uncertainties more accurately than VFSA alone. Furthermore, principal component analysis (PCA), an efficient parameterization method, is employed together with GAIS to improve lateral continuity through simultaneous inversion of all traces.

The second part of my research involves estimation of reservoir permeability models and their uncertainties using quantitative joint inversion of dynamic measurements, including synthetic production data and time-lapse seismic related data. The impacts of different objective functions and different data sets on the model uncertainty and model predictability are investigated as well. The results demonstrate that joint inversion of production data and time-lapse seismic related data (here, water saturation maps) reduces model uncertainty, improves model predictability and performs better than inversion using either type of data alone.


Table of Contents

List of Figures

Chapter 1: Introduction
1.1 Inverse problem in exploration geophysics
1.2 Bayes theorem
1.3 Methods for geophysical inversion
1.4 Simultaneous seismic inversion
1.5 Reservoir monitoring

Chapter 2: Overview of stochastic inversion
2.1 Importance sampling
2.2 Markov chain
2.3 Metropolis sampling
2.4 Global optimization methods
2.4.1 Metropolis simulated annealing
2.4.2 Very fast simulated annealing
2.4.3 Genetic algorithm
2.4.4 Particle swarm optimization
2.4.5 Summary of global optimization methods
2.5 Joint approach (annealed importance sampling)
2.6 Current shortcomings and new objectives

Chapter 3: Novel stochastic seismic inversion using Greedy Annealed Importance Sampling (GAIS)
3.1 Introduction
3.2 Background: greedy importance sampling
3.3 Methodology: greedy annealed importance sampling (GAIS)
3.4 Application of GAIS to seismic inversion
3.5 Synthetic test of GAIS in seismic inversion
3.6 Inversion of post-stack seismic data
3.7 Inversion of pre-stack seismic data
3.8 Discussions and conclusions

Chapter 4: Simultaneous stochastic seismic inversion using Principal Component Analysis (PCA)
4.1 Introduction
4.2 Principal component analysis
4.3 Workflow of simultaneous inversion
4.3.1 Sampling of training images
4.3.2 Reconstruction of the original model space
4.3.3 Model perturbation
4.3.4 PCA based simultaneous inversion
4.4 Simultaneous inversion of post-stack seismic data
4.5 Simultaneous inversion of pre-stack seismic data
4.6 Discussions and conclusions

Chapter 5: Reservoir monitoring using joint inversion of production data and time-lapse seismic related data
5.1 Introduction
5.2 Novel workflow for joint inversion
5.3 Objective functions
5.4 Synthetic test of history matching workflow
5.4.1 Reference model
5.4.2 History matching
5.4.3 Model predictability
5.5 Discussions and conclusions

Chapter 6: Conclusions and future work
6.1 Conclusions
6.2 Future works

Bibliography

Vita

List of Figures

Figure 1.1: A schematic representation of the relationship between prior, likelihood and posterior (Hong, 2008)

Figure 2.1: A challenging situation for importance sampling, with the blue curve showing the proposal distribution and the red curve showing the target distribution. There are many unrepresented points, and only a few highly weighted points dominate the estimation (Schuurmans and Southey, 2000).

Figure 2.2: Pseudo-FORTRAN code of the Metropolis simulated annealing algorithm (Sen and Stoffa, 2013)

Figure 2.3: A temperature-dependent Cauchy-like distribution applied in model generation: more substantial perturbation (fatter tail) is generated at high temperature than at low temperature.

Figure 2.4: Pseudo-FORTRAN code of VFSA (Sen and Stoffa, 2013)

Figure 2.5: Process of single-point crossover.

Figure 2.6: Flowchart for PSO (Shaw and Srivastava, 2007)

Figure 3.1: Workflow of greedy importance sampling (Schuurmans and Southey, 2000).

Figure 3.2: An example of sampling a Gaussian distribution using GIS: a) draw samples m1, m2, ..., m100 independently from a uniform prior distribution; b), c) and d) expand each individual point mi to a block of points {mi,1, mi,2, mi,3, ...} with mi,1 = mi by taking a step size of 0.1 and climbing 100 steps until the local maximum of the probability density is reached; figure b) is associated with the point m1, figure c) with m2 and m3, and figure d) with m4. A similar process would be applied to the other initial samples m5, ..., m100.

Figure 3.3: a) Gaussian with µ=0 and σ=3; b) histogram of samples drawn using multiple VFSA with a mean of 0.036 and standard deviation of 0.78; c) histogram of samples drawn using GIS with a mean of -0.036 and standard deviation of 2.82.

Figure 3.4: Each initial point (red point outlined with a black rectangle) drawn from the uniform distribution is expanded to a block of points (blue dots) by downhill movement along axis-parallel directions with a fixed step length of 1.

Figure 3.5: Synthetic post-stack seismic trace generated by convolution of reflectivity with a Ricker wavelet (central frequency of 30 Hz, sampling interval of 2 ms).

Figure 3.6: a) Estimated expectation value of P impedance derived from GAIS (blue) compared with the result from VFSA (green), the well log (red) and the initial model (magenta); b) relative standard deviation (normalized by the posterior mean) sampled from GAIS (blue) and from VFSA (green).

Figure 3.7: a) Reference seismic trace; b) synthetic seismic trace generated from the posterior mean of the VFSA models; c) synthetic seismic trace generated from the posterior mean of the GAIS models; d) residuals of the synthetic seismic derived from VFSA; e) residuals of the synthetic seismic derived from GAIS.

Figure 3.8: Histograms of impedance models sampled by GAIS at selected layers, with the green lines indicating the posterior mean of the estimation and the red lines indicating the reference impedance model.

Figure 3.9: Histograms of impedance models sampled by VFSA at selected layers (the same as in figure 3.8), with the green lines indicating the posterior mean of the estimation and the red lines indicating the reference impedance model.

Figure 3.10: Post-stack seismic data along a 2D line from the HRS demo dataset, STRATA module.

Figure 3.11: Extracted wavelet of the seismic data from HRS

Figure 3.12: Frequency spectrum of the seismic data generated from HRS

Figure 3.13: a) Initial P impedance model; b) inverted P impedance profile from HRS STRATA.

Figure 3.14: Estimated expectation value of the P impedance profile from VFSA (a) and from GAIS (b).

Figure 3.15: Normalized standard deviation σ(Zp)/Zp of posterior impedance maps drawn from VFSA (a) and from GAIS (b).

Figure 3.16: a) Synthetic 2D post-stack seismic derived from the expected impedance model using GAIS; b) residuals of the synthetic seismic derived from GAIS subtracted from the observation.

Figure 3.17: Pre-stack angle gather from the HRS demo data, AVO module.

Figure 3.18: Pre-stack seismic data at an angle of 24 degrees along a 2D line from the HRS demo data. The well log of P impedance marked as a red curve in figure 3.17 is located at trace number 40 in figure 3.18.

Figure 3.19: Extracted wavelets (top) and their frequency spectra (bottom) of the near offset 3°-15° (red) and far offset 15°-24° (blue) from HRS.

Figure 3.20: Well logs of P- and S-impedance (calculated in HRS based on the Castagna equation and Gassmann fluid substitution), density, gamma ray, resistivity and Vp/Vs ratio, with the top and base of the gas layer marked at 628 ms and 634 ms in TWT.

Figure 3.21: Comparison of the initial model (magenta) and the estimated expectation values of the P impedance model (a) and S impedance model (b) from VFSA (green) and from GAIS (blue) with the well logs (red). The gas layer is marked by the black dotted line.

Figure 3.22: Normalized standard deviation of the posterior P impedance (a) and S impedance models (b) from VFSA (green) and GAIS (blue).

Figure 3.23: a) Observed pre-stack seismic data at the well location; b) synthetic pre-stack seismic from GAIS at the well location; c) residuals between the synthetic and observed seismic.

Figure 3.24: a) Initial P impedance model along the 2D line; b) initial S impedance model along the 2D line.

Figure 3.25: a) Inverted P impedance profile; b) inverted S impedance profile; c) inverted Vp/Vs ratio from HRS STRATA. The well is located at trace 40 (marked by the black dotted line) with the gas layer marked in red.

Figure 3.26: Estimated expectation value of the P impedance model from a) VFSA and b) GAIS. The well is located at trace 40 (marked by the black dotted line) with the gas layer marked in red.

Figure 3.27: Estimated expectation value of the S impedance model from a) VFSA and b) GAIS. The well is located at trace 40 (marked by the black dotted line) with the gas layer marked in red.

Figure 3.28: Estimated expectation value of the Vp/Vs ratio from a) VFSA and b) GAIS. The well is located at trace 40 (marked by the black dotted line) with the gas layer marked in red.

Figure 3.29: Normalized standard deviation σ(Zp)/Zp of posterior P impedance maps drawn from VFSA (a) and from GAIS (b).

Figure 3.30: Normalized standard deviation σ(Zs)/Zs of posterior S impedance maps drawn from VFSA (a) and from GAIS (b).

Figure 3.31: a) Synthetic 2D post-stack seismic derived from the expected impedance model using GAIS; b) residuals of the synthetic seismic subtracted from the observation.

Figure 4.1: a) P impedance log along the TWT; b) spectral density of the impedance logs; c) rescaled range analysis to estimate the Hurst coefficient. The slope of the best-fit grey line gives a Hurst coefficient of around 0.82; two lines with slopes of 0.5 and 1.0 show the theoretical limits of the Hurst coefficient (copied from Srivastava and Sen, 2010).

Figure 4.2: Workflow of PCA-based simultaneous inversion along a 2D line using GAIS.

Figure 4.3: Randomly picked training images of the P impedance profile along the post-stack seismic line.

Figure 4.4: Energy plot of the initial model space constructed by the 1000 training images of the P impedance profile along the post-stack seismic line, with most of the energy (>90%) contained in the first 200 eigenvectors.

Figure 4.5: Selected principal components of the P impedance profile along the post-stack seismic line.

Figure 4.6: Estimated expectation value of the P impedance profile from a) trace-by-trace inversion using GAIS; b) simultaneous inversion using PCA-based GAIS. Improved lateral continuity is demonstrated in the zones marked by the black ellipse.

Figure 4.7: Normalized standard deviation σ(Zp)/Zp of posterior P impedance maps drawn from simultaneous inversion using PCA-based GAIS.

Figure 4.8: a) Synthetic seismic section derived from the expectation value of the impedance profile using PCA-based GAIS; b) residuals subtracted from the observed post-stack seismic data.

Figure 4.9: Randomly picked training images of P impedance along the pre-stack seismic line.

Figure 4.10: Randomly picked training images of S impedance along the pre-stack seismic line.

Figure 4.11: Energy plot of the initial model space constructed by the 1000 training images of elastic properties (P-, S-impedance and density) along the pre-stack seismic line. The first 200 eigenvectors contain more than 80% of the total energy.

Figure 4.12: Selected principal components of the P impedance profile along the pre-stack seismic line.

Figure 4.13: Selected principal components of the S impedance profile along the pre-stack seismic line.

Figure 4.14: Estimated expectation value of the P impedance profile from a) trace-by-trace inversion using GAIS and b) simultaneous inversion using PCA-based GAIS. Higher resolution is demonstrated by the simultaneous inversion in the zone marked by the black ellipse. The well is located at trace 40 (marked by the black dotted line) with the gas layer marked in red.

Figure 4.15: Estimated expectation value of the S impedance profile from a) trace-by-trace inversion using GAIS and b) simultaneous inversion using PCA-based GAIS. Improved lateral continuity and higher resolution are demonstrated by the simultaneous inversion in the zone marked by the black ellipse. The well is located at trace 40 (marked by the black dotted line) with the gas layer marked in red.

Figure 4.16: Estimated P/S impedance ratio from a) trace-by-trace inversion using GAIS and b) simultaneous inversion using PCA-based GAIS. Higher resolution is demonstrated by the simultaneous inversion than by the trace-based inversion, and the gas zone is more concentrated in the P/S impedance ratio map derived from simultaneous inversion, providing a more confident interpretation of the boundary of the gas zone. The interpreted gas zone with low Vp/Vs ratio is marked by the small black ellipse in figure b), which matches the gas layer from the well log (marked in red along the black dotted line) very well.

Figure 4.17: a) Normalized standard deviation σ(Zp)/Zp of posterior P impedance maps estimated from PCA-based GAIS; b) normalized standard deviation σ(Zs)/Zs of posterior S impedance maps estimated from PCA-based GAIS.

Figure 4.18: a) Comparison of the P impedance at the well location estimated from simultaneous inversion (blue) with the reference P impedance log; b) comparison of the S impedance at the well location estimated from simultaneous inversion (blue) with the reference S impedance log.

Figure 4.19: a) Synthetic pre-stack seismic at an angle of 24 degrees derived from the expectation value of the 2D elastic properties using PCA-based GAIS; b) residuals compared with the observed pre-stack seismic at an angle of 24 degrees.

Figure 5.1: Meeting points for geophysics and reservoir engineering (Landa and Kumar, 2011)

Figure 5.2: Workflow for joint inversion of PP and PS waves (copied from Deng et al., 2011)

Figure 5.3: a) Objective function of joint inversion starting from an unconditioned initial reservoir model, with the misfit of production data (green), the misfit of water saturation change (red) and the sum of equally weighted misfits; b) objective function of joint inversion starting from a reservoir model constrained by water saturation maps, with the misfit of production data (green), the misfit of water saturation change (red) and the sum of equally weighted misfits.

Figure 5.4: Novel HM workflow using two loops in a sequential way: in the first loop a), time-lapse seismic related data are applied to constrain the models; the constrained models are used as initial models for the second loop b), where the misfits from both types of data are equally weighted.

Figure 5.5: Procedure for calculating the objective function Ewsat_binary: maps of water saturation changes simulated from the reference model and from an updated model are converted to binary images by defining a water front, such that more than 60% water saturation is considered water (shown in blue) and less than 60% water saturation is considered oil (shown in red). The synthetic binary image is then subtracted from the reference binary image, and Ewsat_binary is calculated by summing the absolute values of all residuals.

Figure 5.6: a) Reference facies model; b) reference permeability model (PermX)

Figure 5.7: Reference production data (water rate) over a five-year history, sampled monthly.

Figure 5.8: Reference maps of water saturation changes over a two-year (a) and a five-year (b) history.

Figure 5.9: Selected training images of facies models shown in a), b) and c) and their corresponding permeability models shown in d), e) and f).

Figure 5.10: Selected principal components of the permeability models on a logarithmic scale.

Figure 5.11: Comparison of the mean of 60 binary images of the water saturation changes after two years before HM a) and after HM using different objective functions: c) Ewsat_pixel, d) Ewsat_binary, e) Ewsat_corr2. The reference binary image of water saturation after two years is shown in b).

Figure 5.12: Comparison of the average of 60 binary images of the water saturation changes after two years before HM a) and after HM using different objective functions: c) Ecomb_pixel, d) Ecomb_binary. The reference binary image of water saturation after two years is shown in b).

Figure 5.13: Comparison of the reference permeability model c) with selected best-fit permeability models based on different objective functions: a) Ewsat_pixel, b) Ewsat_binary, d) Ep, e) Ecomb_pixel, f) Ecomb_binary

Figure 5.14: Comparison of c) the standard deviation of the initial permeability models before HM with the standard deviation of the conditioned permeability models after HM using different objective functions: a) Ewsat_pixel, b) Ewsat_binary, d) Ep, e) Ecomb_pixel, f) Ecomb_binary

Figure 5.15: Simulated water rates using different objective functions: a) Ewsat_pixel, b) Ewsat_binary, d) Ep, e) Ecomb_pixel, f) Ecomb_binary, including the 5-year history period (before the black dashed line) and the 7-year forecast period (after the black dashed line). Green and blue curves show the simulated water rate before and after history matching, respectively; the red curve is the simulation from the reference model.

Chapter 1: Introduction

1.1 INVERSE PROBLEM IN EXPLORATION GEOPHYSICS

When direct measurement of interior properties is prohibitively expensive or impossible, an inverse problem is formulated to estimate the physical properties of a system. In such a case, the interior properties are usually termed model parameters, and their values can be estimated by combining the measurements of the observables (data), the physical relationship between data and model parameters, and prior information on the model parameters. The scientific procedure for a general inverse problem can be divided into the following three steps (Tarantola, 2005):

i. Parameterization of the system: discovery of a minimal set of model parameters whose values completely characterize the system.

ii. Forward modeling: discovery of the physical laws that allow us, for given values of the model parameters, to predict the results of measurements on some observable parameters.

iii. Inverse modeling: use of measurements of the observable parameters to infer the actual values of the model parameters.

The inverse problem in hydrocarbon exploration geophysics involves estimation of reservoir properties as a function of two-way vertical travel time or depth. The most accurate and highest-resolution logs can be generated from well measurements, but only in the vertical direction at a few sparse locations. Pseudo-logs of reservoir properties away from well locations are usually derived from surface-recorded seismic data, which are recordings of seismic waves from man-made sources.

In the seismic inversion studied in this research, reservoir properties are parameterized by the elastic properties, including compressional- and shear-impedances (the products of compressional- and shear-wave velocities with density). The seismic traces (wave amplitudes) are generated by the convolution of a wavelet with a series of reflection coefficients. The most challenging part is the inverse modeling, which is introduced in section 1.3.
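Since the forward model here is the classical convolutional model, a minimal sketch may help fix ideas. The layer impedances below are illustrative assumptions; the 30 Hz Ricker wavelet and 2 ms sampling echo the synthetic test of figure 3.5 but are not parameters taken from the field data.

```python
import numpy as np

def ricker(f0, dt, length=0.128):
    """Ricker wavelet with central frequency f0 (Hz) and sample interval dt (s)."""
    t = np.arange(-length / 2, length / 2, dt)
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def reflectivity(impedance):
    """Normal-incidence reflection coefficients from a layered impedance model."""
    z = np.asarray(impedance, dtype=float)
    return (z[1:] - z[:-1]) / (z[1:] + z[:-1])

def forward_model(impedance, f0=30.0, dt=0.002):
    """Synthetic post-stack trace: reflectivity convolved with a Ricker wavelet."""
    return np.convolve(reflectivity(impedance), ricker(f0, dt), mode="same")

# Illustrative three-layer P-impedance model (assumed units of m/s * g/cc)
zp = np.concatenate([np.full(50, 6000.0), np.full(30, 7500.0), np.full(40, 6800.0)])
trace = forward_model(zp)   # d = g(m) in the notation of section 1.2
```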

1.2 BAYES THEOREM

Before introducing several commonly used geophysical inversion methods, Bayes' theorem is described here to demonstrate how the inverse problem is cast in a statistical framework. In the Bayesian framework, the degree of belief is updated by evidence. Expressed in mathematical form, the posterior probability density function (PPD) is proportional to the product of the prior and the likelihood (figure 1.1):

p(m)  p(dobs | m) p(m | dobs)  , p(dobs) (1.1)

where m and d_obs represent the model parameter and data vectors, respectively; p(m) is the a priori probability density function (pdf) representing prior knowledge independent of the data; p(d_obs | m) is the likelihood function, representing the probability of obtaining the observations given the model; and p(m | d_obs) is the conditional pdf of m given the data, representing the complete solution of the inverse problem, also called the target distribution.

The denominator p(d_obs), called the marginal likelihood, is a constant for normalization purposes (the PPD integrates to one). Normally the likelihood function, rather than the prior pdf, dominates over the much larger subspaces of the model space. The choice of likelihood function depends on the noise distribution. Under the assumption of Gaussian noise, the likelihood function can be approximated as:

$$\ell(\mathbf{d}_{obs} \mid \mathbf{m}) \propto \exp(-E(\mathbf{m})), \qquad (1.2)$$

where E(m) is the error function given by

T 1 E ( m )   ( d obs  g(m)) / 2 CD (dobs  g(m)), (1.3)

where g(m) is the forward modeling operator and C_D is the data covariance matrix, which accounts for both the observation error and the theory error. A complete solution of an inverse problem can then be described by the expectation value (the posterior mean model, eqn. 1.4), the marginal distributions of the model parameters (eqn. 1.5) and the posterior covariance (eqn. 1.6):

m  dm m p(m| d ),  obs (1.4) (m |d )  dm ... dm dm ... dm (m|d ), i obs  1  i1 i1  M obs (1.5)

C '  dm  ( m  m) (m  m)T  p(m| d ). (1.6) M  obs

All these equations can be expressed in the form of an integral

I  dmf (m)p(m| d ).  obs (1.7)

Analytically evaluating these integrals requires forward modeling at each point in the model space, which is impractical in a high-dimensional model space.
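As a concrete illustration of eqns. 1.1-1.3, the sketch below evaluates the Gaussian error function and the unnormalized posterior density for a candidate model. The arguments `g`, `C_D` and `log_prior` are placeholders for the forward operator, data covariance and prior of a specific problem, not implementations from this dissertation.

```python
import numpy as np

def error_function(m, d_obs, g, C_D):
    """E(m) of eqn. 1.3: half the Mahalanobis distance between d_obs and g(m)."""
    r = d_obs - g(m)
    return 0.5 * r @ np.linalg.solve(C_D, r)   # r^T C_D^{-1} r / 2

def unnormalized_posterior(m, d_obs, g, C_D, log_prior):
    """Proportional to p(d_obs | m) p(m); the constant p(d_obs) is never needed."""
    return np.exp(-error_function(m, d_obs, g, C_D) + log_prior(m))
```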


To circumvent analytic evaluation of these integrals, the stochastic inference methods introduced below are commonly used.

Figure 1.1: A schematic representation of the relationship between prior, likelihood and posterior (Hong, 2008)


1.3 METHODS FOR GEOPHYSICAL INVERSION

Geophysical inversion methods can be broadly classified into two categories: (1) direct inversion methods, or operator-based inversion; and (2) model-based inversion methods (Sen and Stoffa, 2013). In direct inversion methods, the model parameters are calculated directly from the measurements using a reverse operator, as in the layer-stripping method (Yagle and Levy, 1983, 1985; Clarke, 1984; Ziolkowski et al., 1989; Singh et al., 1989) and Born inversion (Clayton and Stolt, 1981; Weglein and Gray, 1993; Stolt and Weglein, 1985). However, designing a reverse operator is quite challenging when the forward modeling is very complex. Furthermore, operator-based inversion methods are usually quite sensitive to incomplete data contaminated with noise (Sen and Stoffa, 2013). To overcome these problems, model-based inversion methods are preferred for seismic inversion.

Unlike direct inversion methods, model-based inversion methods have no need to construct a reverse operator. In a model-based inversion approach, the model parameters are updated to obtain a better match (fitness) between the observed data and synthetic data calculated by forward modeling from the updated model. The fitness is quantified by a misfit function, or objective function, usually given by an error norm (e.g., Menke, 2012). If the misfit is small enough, the current model is accepted as one of the solutions; otherwise, we perturb the current model and recalculate the misfit. This forward-modeling-based procedure, sketched below, is repeated until an acceptably small misfit is obtained. Model-based inversion methods can be further divided into linear/linearized methods, iterative linear methods for quasi-linear problems, enumerative methods, stochastic inference methods and global optimization methods (Sen and Stoffa, 2013).
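The model-based procedure described above can be summarized in a few lines. This is a schematic with a naive greedy acceptance rule; the stochastic and global methods discussed next replace that rule with probabilistic criteria. All function arguments are placeholders for problem-specific components.

```python
def model_based_inversion(m0, d_obs, g, misfit, perturb, tol=1e-3, max_iter=10000):
    """Generic model-based loop: perturb, forward model, keep the better model."""
    m, e = m0, misfit(d_obs, g(m0))
    for _ in range(max_iter):
        if e < tol:                          # acceptably small misfit: stop
            break
        m_trial = perturb(m)                 # candidate model
        e_trial = misfit(d_obs, g(m_trial))  # forward modeling + misfit
        if e_trial < e:                      # greedy acceptance (a local search)
            m, e = m_trial, e_trial
    return m, e
```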

Linear/linearized methods, such as least-squares methods, are usually applied when the forward modeling is a linear operator, or when the relationship between the data (or the change in data misfit) and the model parameters (or the change in model parameters) can be linearized under certain conditions. The solution can then be obtained in a one-step procedure using the methods of linear algebra.

When such linearity does not hold, the error surface is curved, and the minimum of the curved error surface usually cannot be reached by a one-step procedure. In such situations it may be approached iteratively through a series of linear segments, which are defined by the sensitivity matrix whose elements are the partial derivatives of the observed data with respect to the model parameters. For such quasi-linear problems, the solution is updated iteratively through linear methods. This gradient-based approach belongs to the category of local optimization methods because the solution obtained depends on the initial model; if the error surface has multiple minima, only the one closest to the initial model will be found.

However, in many geophysical inverse problems, the relationship between the model and the data is nonlinear and the error function usually contains multiple minima of different heights. Gradient-based local optimization methods may not be able to find the solution with the global minimum error unless the initial model is close enough to it. In such highly nonlinear problems, enumerative methods, stochastic inference methods and global optimization methods are more suitable than linear or iterative linear methods.

In an enumerative method, the best-fit model is located by performing forward modeling at each point in a pre-defined model space. Such a grid-search technique works well in a low-dimensional model space with a limited number of model parameters and grid points. It is, however, computationally quite expensive and impractical in a high-dimensional model space.


Stochastic inference methods solve the inversion problem by drawing samples from the posterior probability distribution of the model space given the observed data and evaluating the Monte Carlo integral (eqn. 1.7). Stochastic inference methods can be divided into two groups (Schuurmans and Southey, 2001): independent Monte Carlo methods, such as importance sampling and rejection sampling (MacKay, 1998; Geweke, 1989), and dependent Markov chain Monte Carlo methods, such as Metropolis sampling (Metropolis et al., 1953) and the Gibbs sampler (Geman and Geman, 1984).

Independent Monte Carlo methods, such as importance sampling, randomly draw samples independently from a proposal distribution and then assign weights to the samples according to the ratio of the PPD to the prior probability density, to compensate for the biased sampling from a different distribution. An unbiased estimate of the expectation value can be obtained, however, only if the proposal distribution is similar to the target distribution, and it is hard to obtain an appropriate proposal distribution for each model parameter in most geophysical inverse problems without sharply peaked prior distributions. A simple variation of importance sampling, called greedy importance sampling (GIS), was presented by Schuurmans and Southey (2000, 2001). It shows improved inference quality compared to conventional stochastic inference methods; GIS has been tested on several inference problems and shown to yield unbiased estimates even when the prior distribution misses the high-probability regions of the posterior distribution.

Dependent MCMC methods draw samples from the target distribution by constructing a Markov chain. The most common MCMC method is the Metropolis-Hastings sampler (Metropolis and Ulam, 1949; Hastings, 1970), which updates the current sample using a proposal distribution and accepts or rejects the update based on the change in misfit. If the change in misfit is negative, the update is always accepted; otherwise, it is accepted with a certain probability. The critical factor is how to balance computational efficiency and accuracy. A high acceptance rate can be obtained by taking small steps, but the chain is then very slow to explore the entire region of the posterior distribution. Large perturbations are usually associated with high rejection rates, and the risk of missing high-probability regions is high.

Global optimization methods, such as simulated annealing (SA), the genetic algorithm (GA) and very fast simulated annealing (VFSA), are the most popular meta-heuristics that look for the maximum a posteriori (MAP) point. SA (Kirkpatrick et al., 1983) uses a temperature-dependent probability to define the acceptance rule such that, along a predefined cooling schedule, updates at high temperatures are more likely to be accepted than those at low temperatures, where the SA chain approaches the MAP point. The purpose of this design is to avoid getting trapped in local minima. The GA (Goldberg, 1989; Davis, 1991; Davis and Principe, 1993; Suzuki, 1998) borrows an analogy between model updating and biological evolution: a new update, considered one generation, is generated by the genetic processes of selection, crossover and mutation, with the acceptance rule based on the change in fitness. Stoffa and Sen (1991, 1992) introduced a temperature-dependent fitness function for better control of the convergence of the GA. Very fast simulated annealing is a modification of SA proposed by Ingber (1989); the speed of convergence is dramatically increased by generating a model in each iteration from a temperature-dependent Cauchy distribution with a specified cooling schedule. SA, GA and VFSA have been applied to many geophysical inversion problems with the purpose of searching for the global minimum error (e.g., Rothman, 1985; Basu and Frazer, 1990; Sen and Stoffa, 1991, 1992, 1996, 2013; Sambridge and Drijkoningen, 1992).

The problems with MAP methods are: 1) for a complex probability distribution, the distribution is skewed and the expectation value may be far from the MAP point; and 2) the uncertainty is usually under-estimated, because methods such as SA and VFSA distort the distribution toward the peak of the PPD. In general, sampling based on MAP methods results in biased estimation, and its accuracy depends on the shape of the target distribution, the optimization method and the number of independent runs (Sen and Stoffa, 1996). More details of some commonly used stochastic inference methods and MAP methods are given in chapter two for completeness.

For an optimal balance between computational efficiency and accuracy, a hybrid method combining a global optimization method and an independent Monte Carlo method is developed and introduced in chapter 3. This new method, called greedy annealed importance sampling (GAIS), is a combination of VFSA and GIS. It comprises two major steps. In the first step, multiple VFSA threads with a small number of iterations are employed to roughly locate the important regions of the target distribution. In the second step, dense sampling around each important region is performed using GIS (Schuurmans and Southey, 2000, 2001) by taking small steps with a fixed step length along axis-parallel directions. This design enables sampling the important part of the posterior distribution accurately and efficiently by taking advantage of good starting models and a grid-walk strategy. GAIS is applied to estimate reservoir elastic properties as well as their uncertainties using both post- and pre-stack seismic data from the

Hampson-Russell STRATA demo data in chapters 3 and 4.
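A schematic of the two GAIS steps, under simplifying assumptions, is sketched below: `run_vfsa` stands in for a short VFSA run that returns a starting model, and the importance weighting of the expanded blocks (the GIS step proper, detailed in chapter 3) is omitted. The step length and step count echo the synthetic example of figure 3.2 but are otherwise arbitrary.

```python
import numpy as np

def gais_blocks(posterior_density, run_vfsa, n_chains=20, step=0.1, n_steps=100):
    """Schematic GAIS: VFSA locates important regions, a greedy grid walk densifies them."""
    blocks = []
    for _ in range(n_chains):
        m = run_vfsa()                           # step 1: short VFSA run -> starting model
        block = [m]
        for _ in range(n_steps):                 # step 2: greedy axis-parallel grid walk
            axes = np.eye(m.size)
            candidates = [m + step * e for e in axes] + [m - step * e for e in axes]
            best = max(candidates, key=posterior_density)
            if posterior_density(best) <= posterior_density(m):
                break                            # local maximum of the density reached
            m = best
            block.append(m)
        blocks.append(block)                     # blocks are importance-weighted in full GIS
    return blocks
```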

1.4 SIMULTANEOUS SEISMIC INVERSION

Lateral continuity plays an important role in stratigraphic interpretation of a depositional system. However, traditional trace-by-trace seismic inversion, which does not consider the correlation between traces, may not preserve lateral continuity very well. The discontinuity in the inverted image might be due to inconsistency of data acquisition and processing at different locations, as well as to the inherent non-uniqueness of inversion. In most seismic inverse problems, especially in post-stack inversion, multiple models may fit the measurements at each trace, and the inverted impedance model at one trace may differ from the neighboring models, which is physically unrealistic. Such a laterally discontinuous earth model can result in incorrect stratigraphic interpretation of the depositional environment.

Geological constraints are generally incorporated in a trace-by-trace inversion for better spatial smoothness. Gelderblom and Leguijt (2010) introduced lateral continuity in a stochastic seismic inversion by using a conditional prior distribution derived from the current model at neighboring traces, well logs and variograms: if an increase in the value of a model parameter at the current location has been accepted, an increase of the model parameter at neighboring locations is more likely to be accepted. Merletti et al. (2003) conducted trace-based geostatistical inversion using a lateral variogram inferred from the fluvial depositional environment and a vertical variogram inferred from correlation of well logs. Although these geostatistics-based inversion algorithms can improve lateral continuity of certain geological features, they may also damage real discontinuous events related to faults and folds, and they do not replicate the physics of deposition.

An alternative approach to improving lateral continuity is simultaneous inversion of the traces from all surface locations along a 2D line or in a 3D volume. Rather than estimating the elastic properties at one trace at a time, we update the elastic properties at all locations simultaneously to match the seismic profile. This involves optimization of a function with a large number of variables using a large data volume, which was not attempted before due to its extremely high computational demand, considering the high-dimensional model space and large dataset. To reduce the dimension of the model space, an efficient model parameterization method, principal component analysis (PCA), is applied to enable simultaneous seismic inversion. PCA has been successfully applied as an efficient parameterization tool in image recognition and compression (Kim, 2002), reservoir modeling (Echeverria and Mukerji, 2009) and history matching (Chen et al., 2012). Given a set of training images drawn from the prior distribution, PCA linearly transforms them into a set of uncorrelated principal components. The number of principal components (PCs) is usually much smaller than the number of model parameters due to the strong correlation between the training images. Based on the PCs and the average of the training images, the model space can be reconstructed with a small amount of error by updating only the coefficients of the PCs. In PC-based seismic inversion, instead of updating the impedance models of all traces simultaneously, only the weights of the PCs are updated at one time; therefore, the dimension of the model space is reduced significantly. The feasibility of simultaneous inversion using PCA is reported for post-stack and pre-stack data along a 2D seismic line in chapter 4. The results are compared with those obtained from trace-by-trace inversion of the same seismic lines reported in chapter 3.
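A minimal sketch of this PCA parameterization is given below, assuming an ensemble of flattened training images. The sizes (1000 images, 200 retained components) mirror figures 4.4 and 4.11; the random ensemble is only a stand-in for geostatistically simulated training images.

```python
import numpy as np

def pca_basis(training, n_pc=200):
    """Mean model and leading principal components of a training ensemble."""
    mean = training.mean(axis=0)
    X = training - mean                              # center the ensemble
    U, s, Vt = np.linalg.svd(X, full_matrices=False) # rows of Vt are the PCs
    return mean, Vt[:n_pc]

def reconstruct(mean, pcs, coeff):
    """Model rebuilt from PC coefficients; inversion updates only `coeff`."""
    return mean + coeff @ pcs

training = np.random.rand(1000, 5000)     # placeholder for 1000 training images
mean, pcs = pca_basis(training)
m = reconstruct(mean, pcs, np.zeros(pcs.shape[0]))   # 200 unknowns instead of 5000
```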

1.5 RESERVOIR MONITORING

Reservoir monitoring involves reservoir modeling and prediction of future production; it is needed throughout the production period. In reservoir modeling, the reservoir fluid-flow properties (e.g., permeability and porosity) are calibrated (or inverted) by matching the history of dynamic measurements such as production data. This process is traditionally known as "history matching" (HM). HM is essentially an inverse problem in which the reservoir properties are updated to find the global minimum of the misfit between the synthetic flow responses and the dynamic surface measurements of the flow history. The adjusted reservoir models can then be used to simulate future production for reservoir management and business decision making. A reliable forecast depends primarily on the accuracy of the reservoir model.

Conventional HM is usually carried out by geologists and reservoir engineers. First, geologists build static geo-models by honoring well data and 3D seismic data. Then reservoir engineers sample the reservoir models by honoring geology and dynamic measurements, i.e., the production data. However, HM using production data alone suffers from a large degree of non-uniqueness, and the uncertainty away from the wells is very high. Simulations from various arrangements of reservoir parameters might match the production history, and even though the adjusted models reproduce the production history, they may yield different production forecasts. The main challenge of HM is how to reduce the model uncertainty and improve the model predictability.

Quantitative integration of production data and time-lapse geophysical data is aimed at overcoming this challenge. Time-lapse seismic data (repeated seismic surveys) measure the differences in seismic amplitude over the same region at successive times. These changes, called the 4D signal, are due to variations in pore pressure and water saturation (Nur and Simmons, 1969; Nur, 1988) caused by fluid movement, such as oil/gas production and water injection. Integration of the 4D signal in the HM process provides an opportunity to reduce uncertainty in reservoir models and increase the reliability of the forecast (Stephen and MacBeth, 2006; Landa and Kumar, 2011; Xue et al., 2013a), because seismic data have higher lateral resolution (10-20 m) than well logs.


Quantitative integration of production data and time-lapse seismic (related) data has been actively studied in the last decade; however, it still remains a challenging task. Previous studies of HM using quantitative joint inversion generally differ in: 1) the meeting point between geophysicists and reservoir engineers in the workflow (fig. 5.1), namely the seismic attribute used in the objective (misfit) function, for instance change in pressure and saturation (Landa and Horne, 1997), seismic impedance (Gosselin et al., 2003; Stephen et al., 2005; Roggero et al., 2007; Castro, 2007), a gas-presence indicator (Kretz et al., 2004), seismic amplitude (Landa and Kumar, 2011; Dadashpour et al., 2007), and seismic time shift and time strain (Tolstukhin et al., 2012); 2) the inversion algorithm used to update the models, such as gradient-based methods (Landa and Horne, 1997; Dadashpour et al., 2007), the gradual deformation method (Kretz et al., 2004; Roggero et al., 2007), probability-based perturbation (Castro and Caers, 2006), particle swarm optimization (Suman, 2011; Jin et al., 2012), very fast simulated annealing (Jin et al., 2012) and Markov chain Monte Carlo methods (Landa and Kumar, 2011); and 3) the model parameterization, such as principal component analysis (PCA) (Dadashpour, 2009; Echeverria and Mukerji, 2009; Suman, 2011; Chen et al., 2012), pilot points (Jin et al., 2009), and petrophysical properties (permeability, saturation, fault transmissibility, fracture orientation/density) and their related global and regional multipliers (Stephen et al., 2005; Landa and Kumar, 2011; Tolstukhin et al., 2012). One important aspect that has not been reported so far is the impact of different objective functions, or different types of data sets, on model uncertainty and predictability.

Here I report a novel stochastic inversion workflow designed for quantitative integration of production data and time-lapse seismic data to reduce model uncertainty and improve model predictability. Different objective functions based on different types of data, including both production and time-lapse seismic data, are investigated in chapter 5 to study their impact on the model uncertainty estimation and predictability. Considering the expensive computational cost of reservoir simulation, multiple very fast simulated annealing (MVFSA) is applied for the HM and uncertainty analysis. The tools that I used for sampling the training images and for forward modeling of the reservoir simulation are internal Shell resources, which are not covered in this dissertation.
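As a sketch of how such a joint objective can be formed, the code below combines a production-data misfit with a pixel-based water-saturation-map misfit in an equally weighted sum (one of several saturation objectives compared in chapter 5). The `simulate` callable is a placeholder for the reservoir simulator, and the least-squares form and equal weights are assumptions for illustration only.

```python
import numpy as np

def combined_misfit(m, simulate, d_prod, d_wsat, w_prod=0.5, w_wsat=0.5):
    """Equally weighted joint misfit of production data and saturation-change maps."""
    prod, wsat = simulate(m)                 # simulator returns rates and saturation maps
    e_prod = np.sum((d_prod - prod) ** 2)    # production-data misfit
    e_wsat = np.sum((d_wsat - wsat) ** 2)    # pixel-based saturation-map misfit
    return w_prod * e_prod + w_wsat * e_wsat
```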


Chapter 2: Overview of stochastic inversion

Two basic problems in statistical inference are model selection and parameter estimation (Gregory, 2005). Model selection is the procedure of deciding which model is the most probable among all competing candidate models given the current state of knowledge; the competing models may have different model parameterizations. Parameter estimation, on the other hand, assumes that the model parameterization is known and involves evaluating the expectation of a function of interest under the posterior.

The expected value of a random variable is the value one would "expect" to find if the random process could be repeated an infinite number of times; it is essentially a weighted average of all samples. The variance measures the dispersion of the samples around the expected value. These two statistical properties are the key factors in the characterization of a probability distribution, and they are the quantities desired in most geophysical inverse problems. In this thesis, I focus mainly on the estimation of the expectation value of each parameter and its variance, with the posterior defined in the Bayesian framework (introduced in section 1.2).

Monte Carlo integral (eqn. 1.7). Stochastic inference methods are usually applied to address this problem. The family of stochastic inference methods can be divided into two major groups (Shuurmans and Southey, 2000): independent Monte Carlo methods, such as importance sampling (Hammersley and Handscomb, 1964; Rubinstein, 1981; Geweke, 1989) and dependent Markov Chain Monte Carlo (MCMC) methods, such as Metropolis- Hastings sampling (Metropolis and Ulam, 1949; Metropolis et al., 1953; and Hastings, 15

1970). Both approaches are computationally quite expensive especially in a high dimensional problem. Instead of sampling from the entire posterior probability distribution, global optimization algorithms, such as simulated annealing, genetic algorithm and particle swarm optimization, speed the convergence by sampling only the most significant part of posterior probability distribution, which is around the maximum a-posteriori (MAP) point.

As a background review, importance sampling, Metropolis sampling and several global optimization methods are introduced in this chapter with details on their strategies and limitations.

2.1 IMPORTANCE SAMPLING

In importance sampling, we attempt to estimate the expectation value of a function of interest f(m), where the probability density of m is proportional to p(m). Instead of sampling m directly from the target distribution, we draw samples independently from a simple proposal distribution with probability density proportional to q(m). Referring to the inversion problem, p(m) is essentially the probability density of the posterior p(m | d_obs), and q(m) may be the probability density of the prior distribution. To compensate for the biased sampling from a different distribution, each sample is then weighted according to the ratio of the posterior probability density to the prior probability density, w(m) = p(m)/q(m). In the end, the expectation value of f(m) can be determined by summing all the weighted samples using the following formula from Neal (1998):

$$E[f(\mathbf{m})] \approx \sum_{i=1}^{n} w(\mathbf{m}_i)\, f(\mathbf{m}_i) \Big/ \sum_{i=1}^{n} w(\mathbf{m}_i), \qquad (2.1)$$

where w(m_i) is the importance weight of each sample. As n goes to infinity, the average of the weights converges to Z_p/Z_q, where Z_p and Z_q are the normalizing constants of p(m) and q(m), and the estimate converges to the expectation value of f(m) under the distribution defined by p(m) (Neal, 1998). Referring to the Monte Carlo integrals (eqn. 1.4 and eqn. 1.6), we note that f(m) = m for the estimation of the expected mean, and f(m) is the squared deviation from the mean for the estimation of the variance of m.

The accuracy of importance sampling depends on the variance of the normalized importance weights (Neal, 1998). If the proposal distribution misses high-probability regions of the target distribution (fig. 2.1), importance weights with high variability will result in biased estimation, because there are many unrepresented points in the sample and only a few highly weighted points dominate the estimation (Schuurmans and Southey, 2000). For an unbiased estimation, the proposal distribution must approximate the target distribution over most of the domain. However, finding a fairly good proposal distribution is quite challenging, especially when m is high-dimensional and the target has multiple peaks or troughs (i.e., a multi-modal function). Furthermore, the shape of the target distribution is usually unknown; deriving it requires evaluation of each point in the model space, which is computationally quite expensive for a high-dimensional problem.

To circumvent the direct evaluation of the Monte Carlo integral, dependent sampling methods based on the construction of a Markov chain, called Markov chain Monte Carlo (MCMC) methods, are applied to reach the posterior distribution as their equilibrium distribution. The properties of Markov chains and the Metropolis-algorithm-based MCMC methods are described in sections 2.2 and 2.3, respectively.
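A minimal one-dimensional sketch of eqn. 2.1 follows: a broad Gaussian proposal is used to estimate E[m^2] under an unnormalized bimodal target. The specific target, proposal width and sample count are illustrative assumptions; note that the unknown normalizing constants of both densities cancel in the self-normalized estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized bimodal target density (equal modes at -3 and +3)
p = lambda m: np.exp(-0.5 * (m - 3) ** 2) + np.exp(-0.5 * (m + 3) ** 2)

q_sigma = 5.0                                        # broad proposal covers both modes
samples = rng.normal(0.0, q_sigma, size=100_000)     # draws from the proposal
q = np.exp(-0.5 * (samples / q_sigma) ** 2)          # unnormalized proposal density
w = p(samples) / q                                   # importance weights w = p/q

f = samples ** 2                                     # function of interest f(m)
estimate = np.sum(w * f) / np.sum(w)                 # eqn. 2.1; approx. 10 for this target
```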

Markov Chain (MCMC) methods are applied to reach the posterior distribution as its equilibrium distribution. The properties of Markov chain and Metropolis algorithm based MCMC methods are described in the section 2.2 and section 2.3, respectively.

17

1

0.8

0.6

0.4

0.2

0 -20 -10 0 10 20 30

Figure 2.1: A challenging situation for importance sampling, with the blue curve showing the proposal distribution and the red curve showing the target distribution. There are many unrepresented points, and only a few highly weighted points dominate the estimation (Schuurmans and Southey, 2000).


2.2 MARKOV CHAIN

A Markov chain is a discrete-time stochastic process: a sequence (chain) of random variables in which the conditional probability distribution of the future state depends only on the current state and not on the states that preceded it. The sequence of random variables is connected through a transition probability, which is the probability that the process moves from one state to another in a single step. More specifically, at each state a tentative step is proposed by model perturbation, and a pre-defined acceptance criterion is then applied to allow the movement to another state. The transition probability is therefore determined by the product of the model generation (proposal) probability and the acceptance probability.

A Markov chain starting from a random variable will converge to a unique stationary distribution (independent of the initial sample) after a sufficient number of iterations if the chain possesses two properties: irreducibility and aperiodicity (Sen and Stoffa, 2013). Irreducibility means that all states communicate with each other, i.e., each state can be reached from every other state in a finite number of transitions. A Markov chain is said to be aperiodic if its period is one; in other words, the number of steps taken to return to a state is not restricted to multiples of some integer, which would result in a cycle of fixed length (Walsh, 2004). As irreducibility and aperiodicity are essentially determined by the transition probability, the key factors of a Markov chain are how we perturb the model and how we set the acceptance rule. Model perturbation is defined by the proposal distribution, such as a random walk or a Gaussian. The most common acceptance rule satisfying irreducibility and aperiodicity is defined by the Metropolis criterion (Metropolis et al., 1953), which is described in the section below.


2.3 METROPOLIS SAMPLING

The Metropolis-algorithm-based Markov chain Monte Carlo (MCMC) method was developed by Metropolis and Ulam (1949), Metropolis et al. (1953) and Hastings (1970). The idea is to construct a random (Monte Carlo) walk in which each step depends only on the previous one (a Markov chain). The proposed moves are accepted or rejected based on the Metropolis rule. After a large number of moves (the so-called burn-in period), a stationary distribution independent of the initial sample is reached, which in the inversion problem is the posterior distribution. An example of the Metropolis-rule-based MCMC is given below.

Assume we are trying to draw samples m_i from a distribution with probability density p(m) = q(m)/K, with K an unknown normalizing constant that is difficult to compute. The steps taken by the MCMC are listed below (adapted from Walsh, 2004):

1. Start with any initial value m_0 satisfying q(m_0) > 0.

2. Based on the current value of m, sample a candidate point m* from a symmetric proposal distribution G(m_1, m_2), which is the probability of returning a value of m_2 given a previous value of m_1. Using the random walk sampler, this step can be implemented as m_{i+1} = m_i + σε, where σ is a small positive number considered the step length of the Markov chain and ε is a random variable sampled from a Gaussian distribution with zero mean and unit variance.

3. Given the candidate point m*, calculate the ratio of the probability densities at the candidate m* and the current m_{j-1} points,

$$\alpha = \frac{p(\mathbf{m}^*)}{p(\mathbf{m}_{j-1})} = \frac{q(\mathbf{m}^*)}{q(\mathbf{m}_{j-1})}. \qquad (2.2)$$

Notice that the normalization constant K cancels out.

4. If the perturbation increases the density (α > 1), accept the candidate point (set m_j = m*) and return to step 2. If the perturbation decreases the density (α < 1), generate a random number u from the uniform distribution [0, 1]; if α > u, accept the candidate point, else reject it and return to step 2.

For an arbitrary proposal distribution, the acceptance probability can be represented using the following Metropolis-Hastings criterion:

$$\alpha = \min\!\left[\frac{q(\mathbf{m}^*)\, G(\mathbf{m}^*, \mathbf{m}_{j-1})}{q(\mathbf{m}_{j-1})\, G(\mathbf{m}_{j-1}, \mathbf{m}^*)},\; 1\right]. \qquad (2.3)$$

Metropolis–Hastings based and other MCMC algorithms have the following disadvantages. First, it may be quite time consuming for the Markov chain to reach the desired joint distribution in a high dimensional model space. The process undergone by the system before it reaches the target (stationary) distribution is called the "burn-in". Taking steps that are too large will likely result in a high rejection rate, while steps that are too small will generate a set of highly correlated samples that move slowly towards the desired distribution. Second, even after the stationary distribution has been reached, nearby samples are correlated with each other, which may not correctly reflect the distribution. This means that to obtain independent samples of the target distribution, the majority of samples need to be thrown away: only every nth (e.g. 100th) sample is kept after the burn-in period, and the initial samples (e.g. the first one thousand) drawn during the burn-in period should be discarded because their distribution is different from the target distribution.

In geophysical inversion problems, we face the challenges that the model space is usually high dimensional and the posterior may be multi-modal. In such a case, global optimization methods looking for the highest peak of the PPD (the maximum a-posteriori probability) may simplify the problem, although they cannot fully characterize the PPD.

2.4 GLOBAL OPTIMIZATION METHODS

Global optimization methods provide point estimates, looking for the point corresponding to the maximum a-posteriori probability (MAP), which is the same as the point with the global minimum of an objective (or error) function defined by the misfit. Compared with MCMC methods, the MAP methods provide fast convergence to the highest peak of the PPD by reducing the "random walk" behavior. Several commonly used global optimization methods are introduced in this section.

2.4.1 Metropolis simulated annealing

Simulated Annealing (SA) is a global optimization method that simulates the physical annealing process to reach a global minimum energy state, in which model parameters are considered as particles in an idealized physical system (Kirkpatrick et al., 1983). In a physical system, when a solid is heated to a certain temperature, all the particles are distributed randomly in a liquid phase. Following a slow cooling schedule, the particles arrange themselves in the low energy state, where crystallization occurs. Referring to stochastic processes, each configuration of particles is considered as a state of model parameters, the energy function is the error (objective) function, and the configuration with the lowest energy state is the MAP point. Before going into the details of the sampling strategy, a quick review of the mathematical background is given below. At each temperature, thermal equilibrium is reached when the probability of being in state i with energy E_i follows the Gibbs pdf:

exp(Ei /(KT )) p ( E i )  , (2.4)  exp(E j /(KT )) j  S where E is the energy (error) function and T is the control parameter that has the same dimension as that of E. The set S consists of all possible configurations of particles (models) and K is the Boltzmann’s constant. As the temperature decreases, the peak of Gibbs distribution becomes more and more distinguishable and converges to the MAP point.

In geophysical inversion problems, K is set equal to 1. Now the temperature T dependent Gibbs distribution can be rewritten as:

exp(E(mi ) /T ) p ( m i )  , (2.5)  exp(E(m j ) /T ) jS where mi is the model configuration in the state i and E(m) is the error function calculated by T 1 (2.6) E(m) 1/ 2(dobs  g(m)) CD (dobs  g(m)),

where d_obs denotes the measured data, g(m) is the forward modeling operator and C_D is the data covariance matrix, which consists of the observation error and the theory error. The temperature-dependent Gibbs distribution is not the posterior distribution, but it converges to the highest mode of the posterior distribution as the temperature approaches 0. Furthermore, the Gibbs distribution at temperature T = 1 is essentially the posterior distribution under the assumption of Gaussian error (Tarantola, 2005; Sen and Stoffa, 2013).


The computational procedure of Metropolis based SA (with its pseudo code given in fig. 2.2) is quite similar to that of Metropolis based MCMC, except that SA uses a temperature-dependent acceptance probability. SA starts with an initial sample and perturbs it to obtain a new sample. The error function of the current sample is compared with that of the new sample. If the error decreases, the new update is always accepted; otherwise, it is accepted with a temperature-dependent probability P = exp(-ΔE/T). This means that when the temperature is high, possible hill-climbing (with respect to the error function) is accepted with a reasonable probability in order to explore the entire space, while such hill-climbing moves are more likely to be rejected as the temperature decreases. The purpose of this design is to avoid being trapped in local minima of the error function. Such a temperature-dependent acceptance rule indicates that SA does not sample from the posterior distribution; rather, it samples from a series of intermediate distributions, which are biased towards the MAP point. The major difficulty for SA in reaching the state of global minimum error is choosing an appropriate cooling schedule for a given problem: if it starts at too low a temperature or cools too fast, the system may become "quenched" by becoming trapped in a local minimum.


Figure 2.2: A pseudo FORTRAN code of Metropolis simulated annealing algorithm (Sen and Stoffa, 2013)
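Since the pseudo code itself is not reproduced here, the following Python sketch illustrates the same acceptance logic; the toy quadratic error function, the starting temperature of 10 and the geometric cooling factor are assumptions for illustration, not settings used elsewhere in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

def error(m):
    # Toy error function standing in for the data misfit E(m).
    return np.sum((m - 3.0) ** 2)

m = rng.uniform(-10.0, 10.0, size=4)   # random starting model
T = 10.0                               # assumed starting temperature
for _ in range(5000):
    m_new = m + 0.1 * rng.standard_normal(m.size)   # perturb the current sample
    dE = error(m_new) - error(m)
    # Downhill moves are always accepted; hill-climbing moves are accepted
    # with the temperature-dependent probability P = exp(-dE/T).
    if dE <= 0 or rng.uniform() < np.exp(-dE / T):
        m = m_new
    T *= 0.999                         # slow (geometric) cooling schedule
```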


2.4.2 Very fast simulated annealing

Although the global optimum can be obtained by SA with a carefully controlled cooling schedule, SA is quite time consuming if a random perturbation is applied in a high dimensional model space. To speed up the convergence, several modifications were introduced by Ingber (1989, 1993), known as very fast simulated annealing (VFSA). Instead of sampling the D-dimensional parameter space directly, a product of D one-dimensional Cauchy-like distributions (fig. 2.3) is applied for the model perturbation to speed up the algorithm. The proposal probability is given as:

\[
q(\Delta m) \propto \frac{T}{(\Delta m^{2} + T^{2})^{1/2}}, \tag{2.7}
\]

where Δm is the model perturbation and T is the temperature, which follows an annealing schedule in which the temperature decreases exponentially with iteration k,

\[
T(k) = T_0 \exp\left(-c\,k^{1/D}\right).
\]

Different model parameters can have different starting temperatures, perturbing ranges and decay parameters c. The temperature-dependent Cauchy distribution has a "fatter" tail than the Gaussian distribution, which allows more substantial perturbation of the current sample at high temperature and smaller perturbations at low temperature. A pseudo FORTRAN code of VFSA is shown in fig. 2.4. Although this design speeds up the convergence by sampling from a progressively sharper proposal distribution, it also introduces bias as the proposal tails shorten (Sen and Stoffa, 1996). Furthermore, the problem of designing a slow cooling schedule is not fully solved (Neal, 1998). The settings of the initial temperature T0, the decay parameter c and the parameter D are case dependent, and an inappropriate setting may lead to stagnation at a local minimum.
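A minimal Python sketch of these two ingredients follows; the toy misfit, the values of T0 and c, and the use of a single temperature for both the perturbation and the acceptance rule are simplifying assumptions (in practice, separate schedules may be used for each model parameter).

```python
import numpy as np

rng = np.random.default_rng(0)

def error(m):
    return np.sum(m ** 2)              # placeholder misfit

D = 4                                  # number of model parameters
T0, c = 10.0, 1.0                      # assumed starting temperature and decay rate
m = rng.uniform(-5.0, 5.0, size=D)
for k in range(1, 2001):
    T = T0 * np.exp(-c * k ** (1.0 / D))        # exponential cooling schedule T(k)
    # Product of D one-dimensional Cauchy perturbations: heavy tails
    # (large jumps) at high T, small perturbations at low T.
    m_new = m + T * rng.standard_cauchy(size=D)
    dE = error(m_new) - error(m)
    if dE <= 0 or rng.uniform() < np.exp(-dE / T):
        m = m_new
```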


Figure 2.3: A temperature-dependent Cauchy-like distribution applied in model generation: more substantial perturbation (fatter tail) is generated at high temperature than at low temperature.


Figure 2.4: A pseudo FORTRAN code of VFSA (Sen and Stoffa, 2013)


2.4.3 Genetic algorithm

Genetic algorithm (GA), first proposed by John Holland (1975) and described in Goldberg (1989), is a global optimization method based on an analogy with biological evolution. Each realization of model parameters is represented by an individual chromosome, and model perturbation is conducted through biological evolution processes. During evolution, information between models is exchanged efficiently and randomly updated in the processes of selection, crossover and mutation. This allows the algorithm to improve the model fitness by assimilating and exploiting the accumulated information (Sambridge and Drijkoningen, 1992). Davis and Principe (1993) further demonstrated the asymptotic convergence of GA when the mutation operator is used. GA (Holland, 1975; Goldberg, 1989; Davis, 1991; Forrest, 1993) starts with designing a coding scheme of the model space with specifications of boundaries and resolution. In a simple GA, the model space is coded in binary. Other coding schemes, such as gray coding (Forrest, 1993), logarithmic scaling, delta coding (Whitley et al., 1991) and real coding (Hong and Sen, 2008), have been used to improve the numerical precision. Different model parameters may use different coding schemes due to their variable ranges and resolutions. After a coding scheme has been selected, a chromosome can be formed by putting all the coded model parameters together as a long bit string. In this way, the model space can be parameterized using a population of chromosomes, with the first generation of chromosomes chosen at random. In the second step, a fitness function is defined to measure the similarity between the data and the synthetic response of the current generation of individuals. The definition of the fitness function is case dependent. Take the example of a geophysical inversion problem (Sen and Stoffa, 1991), where two normalized correlation functions are given below:

d  d (m) F(m)  0 s , (2.8) 1/ 2 1/ 2 (d 0  d0) (d s (m)  ds (m))

and

2d  d (m) F(m)  0 s , (2.9) (d 0  d0) (d s (m)  ds (m))

where d0 is the observed data, ds(m) is the synthetic response from generation of m, and  represents correlation between the two.

After defining the fitness function, the biological evolution begins with the process of selection, where chromosomes with higher fitness are selected with higher probability than those with lower fitness. Then a certain number of the selected chromosomes are paired for the crossover operation. In crossover, two offspring are produced from each pair of parents by exchanging information between the paired models. Crossover can be performed either at a single point or at multiple points. In a single-point crossover, one random crossover point is picked for both parents; the string from the beginning of the chromosome to the crossover point is copied from one parent and the rest is copied from the second parent (fig. 2.5). In a multi-point crossover, a similar process is repeated for each model parameter with independent crossover locations. In this way, the population of the next generation preserves the genetic information of the parents and also gains variety by exchanging information. The variety of each generation can be further increased by mutation, in which a random mutation point is selected and its bit value is changed according to a certain mutation probability. By repeating these three operators, the model parameters are perturbed in the direction of increasing fitness. The basic GA is quite robust; it has been successfully applied in many cases, including seismic inversion problems (Sambridge and Drijkoningen, 1992; Sen and Stoffa, 2013). The performance of the basic GA depends on the model parameterization and the diversity of the population. With a lack of diversity or an inefficient parameterization, either premature convergence (rapid stagnation at a local minimum) or slow convergence to the global optimum may occur.

Figure 2.5: Process of single-point crossover
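For illustration, a hypothetical Python sketch of the single-point crossover of fig. 2.5, together with a bit-flip mutation operator, is given below; the function names and the mutation probability are assumptions, not part of the original description.

```python
import random

def single_point_crossover(parent1: str, parent2: str) -> tuple[str, str]:
    """Swap the bit-string tails of two chromosomes at a random point."""
    point = random.randrange(1, len(parent1))       # random crossover location
    child1 = parent1[:point] + parent2[point:]      # head of parent 1, tail of 2
    child2 = parent2[:point] + parent1[point:]      # head of parent 2, tail of 1
    return child1, child2

def mutate(chromosome: str, p_mut: float = 0.01) -> str:
    """Flip each bit independently with a small mutation probability."""
    return ''.join(b if random.random() > p_mut else '10'[int(b)]
                   for b in chromosome)

# Example usage: offspring = single_point_crossover('110100101', '001011010')
```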

2.4.4 Particle swarm optimization

Particle swarm optimization (PSO), first introduced by Kennedy and Eberhart (1995), searches for the global optimum by simulating the social behavior of a swarm of animals (e.g., birds or fish) hunting for food. If one member of the swarm finds a desirable way to go, the other members will follow quickly.

Borrowing the analogy between a swarm of birds hunting for food and the convergence of samples (particles) towards the MAP point, each individual bird is considered as one particle in a multi-dimensional space, characterized by its position and velocity. The position represents the current state of the particle, and the velocity determines its movement towards a future state. The particles move around the model space and communicate the best positions they remember to each other. The best positions include the "local best" and the "swarm best". The "local best", gained from its own cognitive knowledge, drives each individual particle towards the best location it has already found. The "swarm best" is the best location found by all the particles in the swarm. More details of PSO are described below. At the initial step of PSO, a swarm of particles is drawn randomly from a specified model space, with each particle in an M-dimensional space representing one sample (the ith) of the model parameters, m_i = (m_{i1}, m_{i2}, ..., m_{iM}). For each particle, its previous best position and velocity in the M-dimensional space are memorized. The position of each particle is updated through a velocity adjustment, which is determined by its own previous best position and the best position provided among all the particles in the swarm. This updating is represented in the equations below,

k k1 l k g k vi  vi  b (.)(mi  mi )  c ran(.)(m  mi ) , (2.10) k1 k k mi  mi  a  vi , (2.11)

where m_i^k and v_i^k represent the current location and velocity of the ith particle at the kth iteration; m_i^l is the best location achieved by the particle so far, and m^g is the best location achieved by all the particles in the swarm prior to the kth iteration. The initial velocity of each particle is set to zero. The constants b and c are the learning rates, and the constant a is a constriction factor introduced by Clerc (1999). The symbol ran(·) represents a random number sampled from a uniform distribution within [0, 1]. The flowchart of PSO is shown in fig. 2.6.
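Equations 2.10 and 2.11 translate directly into the short Python sketch below; the swarm size, learning rates, constriction factor and toy misfit are assumed values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def error(m):
    return np.sum(m ** 2)                      # placeholder misfit

n, M = 30, 2                                   # swarm size and model dimension
a, b, c = 0.9, 1.5, 1.5                        # assumed constriction factor and learning rates
m = rng.uniform(-10.0, 10.0, size=(n, M))      # particle positions
v = np.zeros((n, M))                           # initial velocities set to zero
local_best = m.copy()
swarm_best = m[np.argmin([error(p) for p in m])].copy()

for _ in range(200):
    r1, r2 = rng.uniform(size=(n, 1)), rng.uniform(size=(n, 1))
    v = v + b * r1 * (local_best - m) + c * r2 * (swarm_best - m)   # eqn 2.10
    m = m + a * v                                                   # eqn 2.11
    for i in range(n):                         # update the remembered best positions
        if error(m[i]) < error(local_best[i]):
            local_best[i] = m[i]
        if error(m[i]) < error(swarm_best):
            swarm_best = m[i].copy()
```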

Compared with GA, PSO is simpler to implement. PSO has been successfully applied in many applications, including geophysical inversion (Carlisle and Dozier, 2001; Shaw and Srivastava, 2007; Fernandez-Martinez et al., 2008). However, PSO perturbs the model by using information on the global and local best simultaneously, which essentially makes a compromise between the two. The accuracy of the estimation can therefore become critical in the case of a multi-modal problem.

Figure 2.6: Flowchart for PSO (Shaw and Srivastava, 2007)


2.4.5 Summary of global optimization methods

Compared with MCMC methods, global optimization methods provide fast convergence by sampling towards the MAP point. This fast convergence strategy, however, biases the evaluation of Monte Carlo integrals (including the estimation of the expected mean and variance) of the PPD. First, in a complex high dimensional model space, the marginal PPD is skewed and the MAP point might be far away from the expectation value of the individual parameters. Second, the uncertainty of the model parameters is usually underestimated because sampling is concentrated near the peak of the PPD. In general, sampling based on MAP methods will result in biased estimation, and its accuracy is dependent on the shape of the target distribution, the optimization method and the number of independent runs (Sen and Stoffa, 1996). For an optimal balance between computational efficiency and accuracy, a joint approach combining independent Monte Carlo sampling and a dependent Markov chain approach is of interest. In this category, Neal (1998) introduced annealed importance sampling (section 2.5), which combines SA and importance sampling.

2.5 JOINT APPROACH (ANNEALED IMPORTANCE SAMPLING)

Neal (1998) proposed annealed importance sampling (AIS), which combines multiple simulated annealing chains using an importance sampling distribution. AIS is especially suitable for estimating the expectation value of a variable when multimodality may be a problem. As discussed in section 2.1, the accuracy of the estimation of expectations depends on the variance of the normalized importance weights. If the prior distribution misses the important region of the posterior distribution, the variance of the weights will be very high. This leads to a biased estimation, which is dominated by only a few points with large weights. To reduce the variance of the weights and gain more heavily weighted points, multiple independent SA chains are applied to sample points from the target distribution. Weights are then assigned to the final samples of the multiple SA chains for an accurate estimation of the expectation value. As mentioned in section 2.4.1, Metropolis SA, which allows relatively free movement on a global scale, is more suitable for a multi-modal problem than conventional Markov chain Monte Carlo (MCMC). However, SA has difficulty providing a precisely correct estimation and exactly the right probability (Neal, 1998). Therefore, weighting the points based on their importance in the PPD is necessary to compensate for the bias of the estimation. Suppose the objective is to find the expectation value of f(m) under the probability density function p_0(m), which is difficult to sample from directly. We approximate p_0(m) using a sequence of other distributions p_1(m) to p_n(m), and we are able to compute q_j(m) proportional to p_j(m). An SA chain is then constructed with the sequence of distributions

\[
q_j(m) = q_0(m)^{\beta_j}\, q_n(m)^{1-\beta_j}, \tag{2.12}
\]

where 1 = \beta_0 > \beta_1 > \cdots > \beta_n = 0, q_n(m) is the (unnormalized) prior density from which the chain starts, and q_0(m) is the unnormalized posterior density. Notice that the SA chain defined here is different from the conventional Metropolis SA chain (section 2.4.1): the probability distribution of the final state of this chain (as \beta_j \to 1) is proportional to the target distribution p_0(m). The procedure of sampling can be summarized as follows:

Generate m_{n-1} from p_n.

Generate m_{n-2} from m_{n-1} using a transition T_{n-1}.

...

Generate m_1 from m_2 using a transition T_2.

Generate m_0 from m_1 using a transition T_1.

Then the final sample (m_0) of each SA chain is used as an independent sample for the evaluation of the Monte Carlo integration. The weight w_i for the independent sample from the ith SA chain is defined as follows (more details of the derivation of the weights can be found in Neal, 1998):

qn1(mn1) qn2(mn2 ) q1(m1) q0(m0) wi  ... . (2.13) qn (mn1) qn1(mn2 ) q2(m1) q1(m0)

In the end, the expectation value of function f(m) can be estimated using eqn. 2.14 (Neal, 1998).

\[
E[f(m)] = \sum_{i=1}^{N} w_i\, f(m_i) \Big/ \sum_{i=1}^{N} w_i. \tag{2.14}
\]

Annealed importance sampling is able to handle isolated modes and to characterize the posterior distribution by combining independent sampling with SA. Unlike dependent MCMC, there is no need to calculate autocorrelations to assess the accuracy of AIS; instead, an estimation of the variance of the normalized importance weights is required. It is therefore important to select appropriate Markov chain transitions so that each SA chain reaches the target distribution (Neal, 1998). Furthermore, AIS can be computationally expensive, because a large number of SA runs is needed to obtain sufficient independent samples.
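As an illustration of eqns 2.12-2.14, the following Python sketch runs multiple annealing chains on a toy bimodal target; the Gaussian prior, the linear schedule of exponents and the single Metropolis update per intermediate distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_q0(m):   # unnormalized log posterior: a bimodal toy target
    return np.logaddexp(-0.5 * ((m - 3.0) / 0.5) ** 2,
                        -0.5 * ((m + 3.0) / 0.5) ** 2)

def log_qn(m):   # log prior density (Gaussian, easy to sample from)
    return -0.5 * (m / 10.0) ** 2

betas = np.linspace(0.0, 1.0, 101)   # exponents, traversed from prior (0) to target (1)
N = 500                              # number of independent annealing chains
samples, log_w = np.zeros(N), np.zeros(N)

for i in range(N):
    m = 10.0 * rng.standard_normal()          # generate m_{n-1} from the prior p_n
    for j in range(1, len(betas)):
        # Incremental log weight between adjacent intermediate distributions q_j.
        log_w[i] += (betas[j] - betas[j - 1]) * (log_q0(m) - log_qn(m))
        # One Metropolis transition T_j that leaves q_j invariant.
        m_new = m + 0.5 * rng.standard_normal()
        log_a = betas[j] * (log_q0(m_new) - log_q0(m)) \
              + (1.0 - betas[j]) * (log_qn(m_new) - log_qn(m))
        if np.log(rng.uniform()) < log_a:
            m = m_new
    samples[i] = m                            # final state m_0 of the ith chain

w = np.exp(log_w - log_w.max())               # importance weights (eqn 2.13, in log form)
mean_estimate = np.sum(w * samples) / np.sum(w)   # eqn 2.14
```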


2.6 CURRENT SHORTCOMINGS AND NEW OBJECTIVES

The major problems encountered in solving most geophysical inversion problems are: 1) there is a nonlinear relationship between models and data; and 2) the number of observations is smaller than the number of model parameters, which results in non-unique solutions and an irregular error surface with multiple peaks and troughs (multi-modal). Deterministic methods, such as steepest descent, conjugate gradients and least squares, use gradient-based algorithms that assume the error surface is well defined, and they provide only one best-fit model, which usually depends on the initial model. A better way to address this problem is to use stochastic inference methods to sample the posterior distribution given the available data, for a fair representation of the expectation value and for uncertainty analysis. Unlike gradient-based methods, no linearization is required by stochastic inference methods. Instead, synthetics are forward modeled directly from randomly sampled model parameters and compared with the measurements to find better models with smaller misfit.

Stochastic inference methods can be divided into two classes: independent Monte Carlo methods and dependent Markov chain methods. Independent Monte Carlo methods, such as importance sampling, sample from a proposal distribution independently and then assign weights to compensate for the biased sampling. The problems associated with independent Monte Carlo methods are, first, that it is usually computationally quite expensive to explore the entire model space in a high dimension. Second, their accuracy is dependent on the prior distribution: sampling from an inappropriate prior distribution usually leads to a high variance of the weights and a biased estimation of the expectations. An alternative way to sample the posterior distribution is through Markov chain based Monte Carlo methods. The conventional Metropolis/Metropolis-Hastings samplers (Metropolis and Ulam, 1949; Hastings, 1970) are proven to asymptotically converge to a stationary distribution, which is the posterior distribution. However, it is usually quite time consuming for the walker to explore the entire model space by taking relatively small steps, especially in a high dimensional model space, and a majority of samples need to be discarded considering the possibly long burn-in time and the correlation between nearby samples. Instead of sampling from the posterior distribution, maximum a posteriori (MAP) methods, such as SA, GA, VFSA and PSO, improve the convergence speed by sampling towards the MAP point, which is the highest mode of the posterior distribution. Such distorted sampling provides fast convergence, but it may bring biased estimation as well. First, in the case of a high dimensional model space with a complex PPD, the marginal PPD of the model parameters may be asymmetric; therefore, the MAP point may differ from the expected mean of each individual parameter. Second, biased sampling towards the global optimum is usually associated with an underestimated posterior variance (Sen and Stoffa, 1996). Furthermore, the performance of MAP methods depends on the user's experience: sampling with an inappropriate cooling schedule or coding scheme may not reach the global optimum, but become stuck at some local minima.

As a joint approach of independent Monte Carlo estimation and the dependent Markov chain method, Neal (1998) proposed annealed importance sampling (AIS), which is a combination of importance sampling and SA. Multiple independent SAs are conducted, with importance weights assigned to the end point of each SA chain. Compared with importance sampling, it avoids biased estimation and a high variance of the normalized importance weights by sampling from a distribution proportional to the target distribution. Compared with SA alone, it improves the accuracy of the estimation by independent sampling and by fairly weighting the samples. However, to attain a heavily weighted point, an appropriate sequence of intermediate distributions is required to reach equilibrium, which is not clearly guided in the real physical problem. In this thesis, a new joint approach combining modified importance sampling and VFSA is proposed for an optimal balance between computational efficiency and accuracy. Compared with AIS, fast convergence of each chain can be achieved by VFSA, and the limitation of weight-dominated accuracy can be addressed by using greedy importance sampling (GIS). Even if the final states of the multiple VFSA threads do not reach the important regions of the posterior distribution, the greedy search strategy defined by GIS will generate independent blocks of heavily weighted points, including both global optimum points and local optimum points from the posterior distribution.


Chapter 3: Novel stochastic seismic inversion using Greedy Annealed Importance Sampling (GAIS)

3.1 INTRODUCTION

One of the major tasks in exploration geophysics is to estimate a log of rock properties as a function of two-way vertical travel time or depth. High resolution reservoir properties in both vertical and horizontal directions are expected to improve the confidence of interpretation, and the accuracy of reservoir location and reservoir volume.

The most accurate and highest resolution logs can be generated from direct well measurements; however, these are available only in the vertical direction at a few sparse locations. To fill in the gaps between the wells, seismic data with better areal coverage in the horizontal direction are integrated with well data to invert for elastic properties such as compressional and shear-wave impedance. Because most seismic inversion problems are non-linear, a model-based inversion is preferred. In model-based inversion methods, an objective (misfit) function is defined as the difference between the observed and synthetic data, the latter calculated by forward modeling using sampled model parameters. The model parameters are then iteratively updated to search for the minimum of the objective function. However, the topography of a typical objective function in a high dimensional seismic inversion problem is usually very complex, containing multiple local minima. In addition, we are faced with a non-uniqueness problem, in which several models match the observation equally well within acceptable limits. Deterministic inversion methods based on a gradient search, such as least squares, steepest descent and conjugate gradient methods, only provide one of the solutions, which is usually the nearest to the initial model and is band limited (poor resolution). For an accurate estimation of the expectation value of the model parameters and for uncertainty analysis, sampling of the entire posterior distribution might be necessary. The most common way to approach that is to make use of stochastic inference methods. The family of stochastic inference methods can be divided into two major groups: independent Monte Carlo methods and dependent Markov chain methods. Independent Monte Carlo methods, such as importance sampling, draw samples independently from a proposal distribution and then assign weights to each sample to compensate for the biased sampling. In the end, the expectation value of the target distribution can be approximated by a summation of all the weighted samples. This scheme is easy to implement; however, an appropriate proposal distribution is necessary for an unbiased estimation, which is usually impractical in a high dimensional inversion problem, because the marginal posterior distribution of each model parameter is unknown. To address this problem, Schuurmans and Southey (2000, 2001) made a simple variation on importance sampling, called greedy importance sampling (GIS). In GIS, each individual point is expanded to a block of points by a greedy search for the important regions of the target distribution. In this way, each individual block must contain at least one or two heavily weighted (important) points. Unbiased estimation independent of the prior distribution and improved inference quality using GIS are demonstrated in their papers by comparison with other conventional stochastic inference methods (e.g. importance sampling and the Metropolis sampler). However, the efficiency of GIS in a high dimensional problem remains critical: it is computationally quite expensive to explore the entire model space with small steps from arbitrary starting points. Instead of independent sampling, Markov chain Monte Carlo (MCMC) methods try to approach the equilibrium state of the target distribution through a Markov chain, in which the next sample depends only on the current sample. The most common

MCMC method is the Metropolis-Hastings sampler (Metropolis and Ulam, 1949; Hastings, 1970), which generates random samples from a proposal distribution and rejects proposed moves based on the Metropolis criterion. Although this method is proven to asymptotically converge to a stationary distribution, it is generally very slow. Other Markov chain based global optimization methods, such as simulated annealing (SA), very fast simulated annealing (VFSA) and genetic algorithm (GA), speed up the convergence by biased sampling towards the maximum a posteriori (MAP) point.

The SA (Kirkpatrick et al., 1983) simulates the physical annealing process with a slow cooling schedule to reach the stationary distribution of the global minimum energy (error). To further improve the speed of convergence, VFSA uses a temperature-dependent Cauchy distribution for model perturbation, in which substantial perturbations occur at high temperatures compared to those at low temperatures (Ingber, 1989; Sen and Stoffa, 2013). The performance of SA and VFSA is, however, dependent on the starting temperature and the defined cooling schedule. The genetic algorithm (GA) is a global optimization method that is based on an analogy with biological evolution (Holland, 1975; Goldberg, 1989; Davis, 1991; Stoffa and Sen, 1991; Sambridge and Drijkoningen, 1992; Davis and Principe, 1991; Suzuki, 1998). Each realization of model parameters is represented by an individual chromosome, and model perturbation is conducted through biological evolution processes. During evolution, information between models is exchanged efficiently and randomly updated in the processes of selection, crossover and mutation. This allows the algorithm to improve the model fitness by assimilating and exploiting the accumulated information. A successful GA requires defining appropriate algorithm parameters. Premature convergence may occur when employing a small number of parameters, while over-parameterization results in slow convergence and non-uniqueness. Stoffa and Sen (1991, 1992) introduced a temperature-dependent fitness function for fast convergence while preventing premature stagnation. Methods such as SA and GA can be termed MAP methods. The problems with MAP methods are: 1) for a complex probability distribution, the distribution is skewed and the expectation value (posterior mean) may be far away from the MAP point; 2) the uncertainty is usually underestimated because of biased sampling near the peak of the PPD (Sen and Stoffa, 1996). In general, a MAP method will result in biased estimation of the expectation value and variance of the samples unless the PPD is truly Gaussian.

Its accuracy is dependent on the shape of the target distribution, the optimization method and the number of independent runs (Sen and Stoffa, 1996). Neal (1998) proposed a method called annealed importance sampling (AIS), which is a combination of importance sampling and a modified SA that has the target distribution as the final state of the chain. Multiple independent Markov chains are conducted, and weights are assigned to the end point of each chain. Compared with importance sampling alone, it avoids biased estimation by sampling directly from the target distribution, which has already been located by the multiple annealing chains. Compared with SA alone, it improves the accuracy of the estimation by fairly weighting the independent samples in the target distribution. However, an appropriate sequence of intermediate distributions (cooling schedule) is required for the annealing chain to reach the target distribution, which is not clearly guided in a real physical problem. Furthermore, the computation can be quite expensive if a long "burn-in" time (from the initial state to the target distribution) is needed for each chain. In this chapter, a new stochastic inference method is introduced to improve the accuracy of the estimation of the models, their uncertainties and the expectation value of the target distribution, which is especially suitable for a high dimensional multi-modal problem. In this method, important regions of the model space are explored and exploited efficiently through a combination of a global optimization method, multiple VFSA, and a local optimization method, greedy importance sampling (Schuurmans and Southey, 2000, 2001). This new method, called greedy annealed importance sampling (GAIS, Xue et al., 2011), is employed to estimate reservoir elastic properties and their uncertainties using both post- and pre-stack seismic data. The results are compared with those derived from VFSA and from the deterministic inversion method used in HRS.

3.2 BACKGROUND: GREEDY IMPORTANCE SAMPLING

Schuurmans and Southey (2000, 2001) introduced greedy importance sampling (GIS), which is a simple variation of importance sampling. The objective of both GIS and importance sampling is to estimate the expectation value of a function of interest, f(m), given models drawn from the target distribution P(m). The problem is that it is usually difficult to draw samples directly from P(m). The strategy applied in importance sampling is that samples are first drawn from a simple distribution Q(m) and then weighted based on the ratio w_i = P(m_i)/Q(m_i), where w_i is the weight assigned to the sample m_i, P(m) is the probability density of the target distribution and Q(m) is the probability density of a simple distribution that can be easily sampled from. The limitation of importance sampling is that an appropriate distribution Q(m) is required for an unbiased estimation, because the accuracy is dependent on the variance of the weights (Neal, 1998). To collect more samples with larger weights, GIS attempts to sample independent blocks of points from the important regions of P(m), with each block containing at least one or two heavily weighted samples.


The procedure of GIS is demonstrated in figure 3.1. The basic idea is to greedily search the important regions of the target distribution P by sampling from a simple proposal distribution. Starting with independent sampling from a given proposal distribution Q, GIS expands each individual point into a block of points, step by step, along the ascending direction of |f(m)P(m)| until a local maximum has been reached or a certain number of steps has been taken. A distribution with probability density proportional to |f(m)P(m)| is called the optimal proposal distribution, which minimizes the variance of the estimation (Rubinstein, 1981; Evans, 1991; Schuurmans and Southey, 2000, 2001). The final samples are composed of the ascending samples from all blocks, and the expectation value can then be calculated by assembling all the weighted samples. The key question is how to weigh the samples. An auxiliary weighting method is introduced in GIS, in which B_i represents a block initiated by m_i and m_j is one of the successors in its block. An indicator I_{ij} is defined such that I_{ij} = 1 if m_j \in B_i, otherwise I_{ij} = 0. For each m_j in the block B_i, its weight w_i(m_j) is defined as w_i(m_j) = P(m_j)\,\lambda_{ij}/Q(m_i). The associated \lambda weights are relatively arbitrary except that they must satisfy \sum_i \lambda_{ij} I_{ij} = 1. This means that the summation of all incoming \lambda-weights from different blocks to each particular destination point must equal 1. This requirement is necessary because the same m_j may appear in different blocks in a discrete model space (when different search paths merge or collide), and counting the same sample multiple times in the Monte Carlo integral results in biased estimation. The above workflow is also applicable in a high dimensional continuous space.

To avoid corrections for compressing or dilating transformations, additional requirements of searching along axis-parallel directions with fixed-size steps need to be satisfied (Schuurmans and Southey, 2000). More specifically, for each initial point m_i in an n-dimensional Euclidean space, we conduct the search among the 2n neighbors of m_i by perturbing m_i along only one axis at a time with a fixed step length. For instance, starting from an initial point located at [u, v], we compare the posterior probability at its neighbors [u+∆u, v], [u, v+∆v], [u-∆u, v], [u, v-∆v] and choose the one with the highest probability as our next starting point. Furthermore, the \lambda-weight can be simplified to 1, because the search paths seldom merge or collide in a continuous model space with large dimensions. The advantage of GIS is that even if Q misses the high probability regions of P, the weighted samples from Q will still be able to provide a "fair" representation of P.

Schuurmans and Southey (2000) applied GIS for conducting Monte Carlo inference in several cases and demonstrated that GIS yields unbiased estimates (up to sampling error) independent of the prior distribution.


Step 1: Draw samples m_1, m_2, ..., m_q independently from Q.

Step 2: For each m_i, let m_{i,1} = m_i. Compute the block \{m_{i,1}, m_{i,2}, ..., m_{i,n}\} by taking local search steps in the direction of maximum |f(m)P(m)| until a local maximum is reached or n-1 steps have been taken.

Step 3: Create the final sample from the blocks of points \{m_{1,1}, ..., m_{1,n}, m_{2,1}, ..., m_{2,n}, ..., m_{q,1}, ..., m_{q,n}\}.

Step 4: Assign each point m_j \in B_i a weight

\[
w_i(m_j) = P(m_j)\,\lambda_{i,j}/Q(m_i),
\]

where m_i is the initial point of block B_i, m_j is one of the successors in its block, and \lambda_{ij} is relatively arbitrary except that it must satisfy \sum_{m_i \in M} \lambda_{ij} I_{ij} = 1, with I_{ij} = 1 if m_j \in B_i and I_{ij} = 0 if m_j \notin B_i.

Step 5: Estimate the expectation value of f(m) by assembling all the weighted samples together:

\[
E[f(m)] = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} f(m_{ij})\, w_i(m_j).
\]

Figure 3.1: Workflow of greedy importance sampling (Schuurmans and Southey, 2000).
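A minimal one-dimensional Python sketch of this workflow is given below; for simplicity, the climb follows ascending P(m) (the variant adopted later in this chapter), with a uniform proposal Q, the lambda weights treated as 1, and an assumed Gaussian target.

```python
import numpy as np

rng = np.random.default_rng(0)

def P(m):
    # Unnormalized target density (assumed Gaussian for illustration).
    return np.exp(-0.5 * ((m - 2.0) / 1.0) ** 2)

q_lo, q_hi = -10.0, 10.0          # uniform proposal Q on [-10, 10]
Q = 1.0 / (q_hi - q_lo)
step, max_steps = 0.1, 100

samples, weights = [], []
for m in rng.uniform(q_lo, q_hi, size=100):   # step 1: independent draws from Q
    for _ in range(max_steps):                # step 2: expand each point to a block
        samples.append(m)
        weights.append(P(m) / Q)              # step 4: w = P(m_j)/Q(m_i), lambda = 1
        up, down = m + step, m - step
        if P(up) <= P(m) and P(down) <= P(m): # local maximum reached
            break
        m = up if P(up) > P(down) else down   # move in the ascending direction

samples, weights = np.array(samples), np.array(weights)
mean = np.sum(weights * samples) / np.sum(weights)   # step 5: weighted estimate
```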


To visualize the search strategy of GIS, two examples are demonstrated here. The first example estimates the expectation value and standard deviation of a 1D Gaussian distribution (as the target distribution) with µ = 0 and σ = 3 by sampling from a uniform prior distribution. The steps of GIS for a Gaussian distribution, using the representative points m1, m2, m3 and m4, are illustrated in figure 3.2. At first, one hundred independent samples are generated from the uniform distribution. The greedy search strategy is then applied, but with a modified proposal distribution. Instead of searching for an increased |f(m)P(m)|, as used by Schuurmans and Southey (2000, 2001) to minimize the variance of the estimation, the search here is done along the ascending direction of P(m). The purpose of this design is to draw important samples from the target distribution, because a fair representation of the variance of the target distribution is desired in this research. The histogram of the final samples drawn using GIS is compared with the target distribution and the result from VFSA (fig. 3.3). As expected, the distribution sampled using GIS is very close to the posterior distribution and flatter than that sampled using VFSA, which is biased towards the MAP point. The expectation values estimated from both GIS and VFSA are very close to the true mean of 0, with less than 4% bias. However, VFSA underestimated the standard deviation of the target distribution (an estimate of 0.78 compared with the true standard deviation of 3) due to the continuous change of the proposal distribution (getting sharper and sharper towards the MAP point). The uncertainty value estimated from GIS is 2.82, while the true standard deviation is 3.


Figure 3.2: An example of sampling a Gaussian distribution using GIS: a) draw samples m1, m2, …, m100 independently from a uniform prior distribution; b), c) and d) expand each individual point mi to a block of points {mi,1, mi,2, mi,3,…} with mi,1=mi by taking step size of 0.1 and climbing 100 steps until the local maximum of the probability density has been reached, figure b) is associated with the point of m1 , figure c) with m2 and m3, figure d) with m4. A similar process would be applied to other initial samples of m5, …, m100.


Figure 3.3: a) Gaussian with µ=0 and σ=3; b) histogram of samples drawn using multiple VFSA with a mean of 0.036 and standard deviation of 0.78; c) histogram of samples drawn using GIS with mean of -0.036 and standard deviation of 2.82.

Because many geophysical inverse problems involve searching for the global minimum of a misfit function, the second example is designed for this purpose. A 2D Ackley function, with one global minimum at [0, 0] and many local minima widely distributed within the domain, is investigated as the misfit function. The surface of the Ackley function within ([-10, 10], [-10, 10]) is color mapped in each subplot of fig. 3.4. To avoid being trapped in one of the local minima, a large step size (∆v = ∆u = 1) is taken here. The footprint of solid blue dots in each subplot of figure 3.4 shows the search path. The starting point (colored in red and outlined with a black rectangle) of each subplot is drawn from a uniform distribution over ([-10, 10], [-10, 10]). At each step, the misfit of the current sample is compared with those of its four neighbors ([u+∆u, v], [u, v+∆v], [u-∆u, v], [u, v-∆v]), and the one with the smallest misfit is accepted as the next starting point. The search continues until a local maximum of the PPD (a local minimum of the misfit) is reached. The estimated global minimum from GIS is located at [0.0069, -0.0191].


Figure 3.4: Each initial point (red point outlined with a black rectangle) drawn from the uniform distribution is expanded to a block of points (blue dots) by downhill movement along axis-parallel directions with a fixed step length of 1.
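A compact Python version of this grid walk is sketched below; the Ackley formula is the standard one, and because the starting point is random, the landing point will differ slightly from the values quoted above.

```python
import numpy as np

def ackley(u, v):
    # Standard 2D Ackley function: global minimum of 0 at (0, 0).
    return (-20.0 * np.exp(-0.2 * np.sqrt(0.5 * (u ** 2 + v ** 2)))
            - np.exp(0.5 * (np.cos(2 * np.pi * u) + np.cos(2 * np.pi * v)))
            + 20.0 + np.e)

rng = np.random.default_rng(0)
du = dv = 1.0                                # large fixed step to jump over local minima
u, v = rng.uniform(-10.0, 10.0, size=2)      # starting point from the uniform prior
while True:
    # Compare the misfit at the four axis-parallel neighbors only.
    neighbors = [(u + du, v), (u - du, v), (u, v + dv), (u, v - dv)]
    best = min(neighbors, key=lambda p: ackley(*p))
    if ackley(*best) >= ackley(u, v):        # local minimum of the misfit reached
        break
    u, v = best
print(u, v)   # ends near the grid point closest to the global minimum
```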

Although GIS is independent of the prior probability distribution, the locations of the starting points used in the greedy search are essential for fast convergence and accurate estimation, especially in a high dimensional problem. Sampling from a uniform distribution is perhaps logical, but it is time consuming to generate blocks with many samples, considering that forward modeling is required at each step. An efficient alternative is to start from a set of points with large weights that lie near the important regions of the target distribution but are far away from each other. This can be achieved by applying multiple threads of VFSA (MVFSA) starting at different temperatures, with a small number of iterations each. This new approach of MVFSA-initiated GIS is introduced in the section below.


3.3 METHODOLOGY: GREEDY ANNEALED IMPORTANCE SAMPLING (GAIS)

Here I propose a new stochastic inference method called GAIS (Xue et al., 2011), based on a modification of GIS, which is designed for an optimal balance between computational efficiency and accuracy of estimation. Modifications are proposed mainly on the following two aspects. 1) How should the initial point of each block be sampled for an efficient estimation? By taking small steps, GIS can either easily get trapped in a local minimum or spend too much time on the search. Approximately locating the important regions of the posterior distribution is necessary for fast convergence and unbiased estimation, because the accuracy of the estimation depends on the variance of the normalized weights. Therefore, multiple VFSA (MVFSA) threads are applied, and the end point of each VFSA thread is used as an initial point for the greedy search. However, if a large number of iterations is taken for each VFSA thread such that it converges to a very good solution, the greedy search will no longer be needed. To gain variability of the initial samples and avoid underestimating the uncertainty, a small number (hundreds) of iterations should be taken for each VFSA thread. Furthermore, the starting temperatures for MVFSA should be drawn uniformly for use in the different VFSA threads, because temperature can be considered a hyper-parameter (a parameter of a prior distribution) in a Bayesian framework of VFSA. 2) How can biased estimation of the uncertainty be avoided? For a fair representation of the shape of the PPD, samples need to cover the important regions of the target distribution as much as possible. Therefore, the proposal distribution used for the greedy search here is the target distribution P(m); in other words, local search steps are taken in the direction of ascending P(m) instead of ascending |f(m)P(m)|, which is used by the classic GIS (Schuurmans and Southey, 2000, 2001). In addition, to prevent the search from stopping too early because MVFSA may locate good starting points that are already close to the global optimum or to some local optima, the number of steps is fixed to cover larger prospective regions. More specifically, GAIS first specifies a reliable range of the model space (e.g. the values of elastic properties should be larger than zero and smaller than certain boundaries) and then conducts the greedy search up to the edges of the specified range. The same strategy of a "fixed size grid walk" is applied, but with a "forced moving policy", in which we compare the posterior probability density only over all neighbors of the current sample (not including the current sample itself) and accept the one with the highest probability density as the next point. In other words, if the current sample is a local optimum, "downhill movement" is allowed (accepted) to enable a continued search until a certain number of steps has been reached. In the end, all the accepted samples obtained from the greedy search are evaluated in the estimation of the expectation value and the uncertainty.
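The forced moving policy can be sketched in Python as follows; the placeholder log posterior, the step lengths, the model range and the number of steps are assumptions, and in practice each block would be seeded by the end point of an MVFSA thread rather than by a random draw.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_ppd(m):
    # Placeholder log posterior, -|E(m)| up to a constant; any misfit would do.
    return -abs(float(np.sum((m - 1.0) ** 2) - 2.0))

step = np.array([0.5, 0.5])      # assumed fixed step length per parameter
n_steps = 50                     # fixed number of forced moves per block
lo, hi = -5.0, 5.0               # assumed reliable range of the model space

def gais_greedy(m_start):
    """Forced-move grid walk: always accept the best neighbor, even downhill."""
    m, block = m_start.copy(), [m_start.copy()]
    for _ in range(n_steps):
        neighbors = []
        for d in range(m.size):                       # 2n axis-parallel neighbors
            for sign in (+1.0, -1.0):
                cand = m.copy()
                cand[d] = np.clip(cand[d] + sign * step[d], lo, hi)
                neighbors.append(cand)
        m = max(neighbors, key=log_ppd)               # current point itself excluded
        block.append(m.copy())
    return block

# In GAIS the seeds are MVFSA end points; random seeds stand in for them here.
blocks = [gais_greedy(rng.uniform(lo, hi, size=2)) for _ in range(10)]
```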

3.4 APPLICATION OF GAIS TO SEISMIC INVERSION

Seismic inversion involves setting the model parameters, forward modeling, defining the objective function, and adjusting the inversion parameters. In this thesis, the model space of reservoir elastic properties is parameterized by traces of compressional wave (P) impedance and shear wave (S) impedance. Within each trace, the impedance layers have the same sampling interval as the seismic data, which is usually 2ms. The forward modeling to generate a seismic trace consists of two steps: 1. calculation of the reflection coefficient R at each layer boundary from the impedance model; 2. convolution of R with the seismic wavelet w. The reflection coefficients for post-stack data (eqn. 3.1) and for angle-dependent pre-stack data (eqn. 3.2) are computed using Fatti's approximation (Fatti et al., 1994), given below:


Zp  Zp R  i1 i , (3.1) Zpi1  Zpi 2  Zs  2  i  2 Rpp(θ)  (1 ( ))[ln Zpi1  ln Zpi ]/ 2  4  sin  [ln Zsi1  ln Zsi ]  Zpi  (3.2) 2 tan 2 ( )  Zs   ρ    2 i  sin 2   , 2  Zp  ρ   i  

where Zp_i and Zs_i are the P and S impedances, respectively, at the ith interface, ρ is the density, and θ is the reflection angle. Seismic wiggles are generated by the convolution of the reflection coefficient with the wavelet w extracted from the seismic data:

\[
S = R * w. \tag{3.3}
\]

The objective functions for both post-stack and pre-stack inversion are the normalized L2 norm of the difference between the synthetic and observed seismic traces (eqn. 3.4), as follows (Srivastava and Sen, 2010):

\[
d = \frac{\sum_{i,j} \left(S_{obs} - S_{syn}\right)^2}{\sum_{i,j} \left(S_{obs} + S_{syn}\right)^2 + \sum_{i,j} \left(S_{obs} - S_{syn}\right)^2}. \tag{3.4}
\]
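A sketch of this forward modeling and misfit evaluation is given below, assuming zp, zs and rho are arrays with one value per layer and w is the extracted wavelet; averaging the density across the interface is one reasonable reading of eqn 3.2, not a detail specified in the text.

```python
import numpy as np

def fatti_rpp(zp, zs, rho, theta_deg):
    """Angle-dependent reflectivity following the reconstruction of eqn 3.2."""
    th = np.radians(theta_deg)
    rp = np.diff(np.log(zp)) / 2.0                     # [ln Zp_{i+1} - ln Zp_i] / 2
    rs = np.diff(np.log(zs)) / 2.0                     # [ln Zs_{i+1} - ln Zs_i] / 2
    drho = np.diff(rho) / (0.5 * (rho[1:] + rho[:-1])) # interface density contrast
    k = (zs[:-1] / zp[:-1]) ** 2                       # (Zs/Zp)^2 at layer i
    return ((1.0 + np.tan(th) ** 2) * rp
            - 8.0 * k * np.sin(th) ** 2 * rs
            - (0.5 * np.tan(th) ** 2 - 2.0 * k * np.sin(th) ** 2) * drho)

def synthetic(zp, zs, rho, theta_deg, w):
    """Synthetic trace: reflectivity convolved with the wavelet (eqn 3.3)."""
    return np.convolve(fatti_rpp(zp, zs, rho, theta_deg), w, mode='same')

def misfit(s_obs, s_syn):
    """Normalized L2 objective of eqn 3.4."""
    num = np.sum((s_obs - s_syn) ** 2)
    return num / (np.sum((s_obs + s_syn) ** 2) + num)
```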

Application of GAIS to seismic inversion involves setting the cooling schedule, the number of iterations per VFSA thread, the number of VFSA threads, and the step length and number of steps for the greedy search. Based on tests of seismic inversion with synthetic and real data sets, 50 to 100 threads of VFSA with 200 to 500 iterations per thread are chosen to sample the starting points before the greedy search. The cooling schedule depends mainly on the noise in the seismic data. In practice, the cooling schedule can be adjusted by quality control at the well location, if well logs are available.


The next issue is how to calculate the weights defined by w_i(m_j) = P(m_j)\,\lambda_{ij}/Q(m_i). The prior distribution Q is constructed by the initial points sampled from MVFSA. This distribution can be considered uniform, because a small number of iterations is taken by each VFSA thread and their starting temperatures (the hyper-parameter of the prior distribution) are drawn from a uniform distribution. Although the exact shape of the PPD is unknown, it is known that the PPD is proportional to the Gibbs pdf at the temperature T = 1 (Tarantola, 2005), with the analytical form \exp(-|E|)\,p(m), where \exp(-|E|) is the likelihood, E is the misfit and p(m) is the prior probability density of the models. A temporary weight can then be calculated by substituting the likelihood for P: \tilde{w}(m_j) = \exp(-|E_j|)/Q(m_j). Considering that the summation of all final weights must equal one, a normalization constant K needs to be applied, w(m_j) = \tilde{w}(m_j)/K, in which K can be approximated by summing all the temporary weights together: K \approx \sum_j \tilde{w}(m_j) (Neal, 1998). The associated \lambda weights are relatively arbitrary except that they must satisfy \sum_i \lambda_{ij} I_{ij} = 1. This means that the summation of all incoming \lambda-weights from different blocks to each particular destination point must equal 1, because counting the same point multiple times in the Monte Carlo integral results in biased estimation. Considering that no discretization of the model space is made here (although a finite number of samples is generated for the evaluation of the integral) and that the search paths from different blocks seldom collide or merge in a high dimensional inversion problem, the \lambda-weight can be ignored (or treated as one). GAIS is first tested using a synthetic seismic trace and then applied to estimate reservoir elastic properties as well as their uncertainties using both post- and pre-stack seismic data obtained from the Hampson-Russell Strata demo dataset. A fixed step length of 50 (m/s)(g/cc) for compressional wave (P) impedance and 30 (m/s)(g/cc) for shear wave (S) impedance is set for the greedy search here, considering the fact that the grid size should be comparable to the model resolution, which depends on the seismic resolution and the signal to noise ratio.

3.5 SYNTHETIC TEST OF GAIS IN SEISMIC INVERSION

The GAIS is applied to estimate the elastic impedance models and their uncertainties from a synthetic seismic trace. The synthetic post-stack seismic trace with TWT of 72ms (fig. 3.5) is generated by the convolution of the reflection coefficient with a Ricker wavelet, with a peak frequency of 30Hz and a sampling interval of 2ms.


Figure 3.5: Synthetic post-stack seismic trace generated by convolution of reflectivity with a Ricker wavelet (a central frequency of 30Hz and a sampling interval of 2ms).


The workflow of GAIS in a trace-based inversion of post-stack data is listed below:

1. Generate the initial impedance model by smoothing the reference model for a low frequency trend.

2. Apply 100 threads of VFSA with different starting temperatures, drawn from a uniform distribution ranging from 1 degree to 100 degrees.

3. Starting from the models obtained after 300 iterations of MVFSA, greedily search along the ascending direction of the probability density P(m) by taking 100 steps with a fixed step length of 50 [g/cc*m/s] along axis-parallel directions.

4. Weigh each model along the (probability) ascending paths based on

\[
w_i = \frac{\exp(-|E_i|)}{\sum_{i=1}^{101 \times 100} \exp(-|E_i|)}
\]

and sum all the weighted models together to estimate the expected impedance model.

The results from GAIS are compared with the results from 5000 iterations of VFSA, including comparisons of the expectation value of P impedance (fig. 3.6a), the normalized standard deviation of the marginal posterior distribution (fig. 3.6b), the residuals of the synthetic seismic traces (fig. 3.7) and the histograms of sampled impedance models at selected layers (fig. 3.8 for GAIS and fig. 3.9 for VFSA). In each histogram, the estimated posterior mean of the impedance model is marked in green, while the reference impedance model is marked in red. The estimation of the expectation value from GAIS is very close to the reference model and is more accurate than using VFSA alone. The asymmetry of the marginal PPD of an individual model parameter is demonstrated in the histograms of GAIS (fig. 3.8), in which the expectation value is not necessarily located at the peak of the distribution. The histograms of VFSA have relatively sharp peaks, where heavy sampling occurs. This leads to underestimation of the uncertainty using VFSA: the normalized standard deviation estimated from VFSA (10%) is less than that from GAIS (15%).
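A compact helper consistent with step 4 of this workflow is sketched below; the shift by the minimum misfit is a numerical-stability detail added here, and using the weighted standard deviation as the uncertainty measure is an assumption consistent with the normalized standard deviations reported below.

```python
import numpy as np

def gais_estimate(models, errors):
    """Weighted posterior mean and standard deviation over all greedy-search
    samples, following step 4: w_i = exp(-|E_i|) / sum_i exp(-|E_i|)."""
    models = np.asarray(models)            # shape (n_samples, n_layers)
    errors = np.asarray(errors)            # misfit |E_i| of each sampled model
    w = np.exp(-(errors - errors.min()))   # shift by the minimum for stability;
    w /= w.sum()                           # the constant factor cancels here
    mean = w @ models                      # expected impedance model
    var = w @ (models - mean) ** 2         # weighted posterior variance per layer
    return mean, np.sqrt(var)
```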


Figure 3.6: a) Estimated expectation value of P impedance derived from GAIS (blue) compared with the result from VFSA (green), well log (red) and initial model (magenta); b) relative standard deviation (normalized by posterior mean) sampled from GAIS (blue) and from VFSA (green).


Figure 3.7: a) Reference seismic trace; b) synthetic seismic trace generated from the posterior mean of VFSA models; c) synthetic seismic trace generated from the posterior mean of GAIS models; d) residuals of synthetic seismic derived from VFSA; e) residuals of synthetic seismic derived from GAIS.


Figure 3.8: Histograms of sampled impedance models by GAIS at selected layers with the green lines indicating the posterior mean of estimation and the red lines indicating the reference impedance model.


Figure 3.9: Histograms of sampled impedance models by VFSA at selected layers (the same as in fig. 3.8) with the green lines indicating the posterior mean of estimation and the red lines indicating the reference impedance model.


3.6 INVERSION OF POST-STACK SEISMIC DATA

Post-stack seismic data along a 2D line with 119 traces and a depth in TWT ranging from 980ms to 1102ms (fig. 3.10), given as the demonstration data set in the Hampson and Russell software (HRS) STRATA module, are investigated to estimate the elastic impedance profile using GAIS. The seismic data were sampled at 2ms. The wavelet extracted from HRS is shown in fig. 3.11, with a frequency band ranging from 10 to 90 Hz and a dominant frequency of 35Hz (fig. 3.12). The initial model of the P impedance profile (fig. 3.13a) was generated by HRS based on the picked horizons and extrapolation of smoothed well logs away from this seismic line.


Figure 3.10: Post-stack seismic data along a 2D line from HRS demo dataset STRATA module.


Figure 3.11: Extracted wavelet of seismic data from HRS

Figure 3.12: Frequency spectrum of seismic data generated from HRS


Trace-by-trace inversion using GAIS is conducted following the workflow demonstrated in section 3.5. The expectation value of the P impedance profile estimated from GAIS (fig. 3.14b) is compared with that derived from 1000 iterations of VFSA (fig. 3.14a) and with the deterministic solution estimated from HRS (fig. 3.13b). The estimation from VFSA is quite noisy and is inconvenient for structural interpretation. The estimation from GAIS provides a clearer structure than VFSA and demonstrates improved lateral continuity and higher resolution than the results from HRS.

The standard deviation of the posterior P impedance models drawn from GAIS at each trace is normalized by its posterior mean and compared with that obtained from VFSA (fig. 3.15). The uncertainty values estimated by VFSA (presented as the normalized standard deviation) average about 10-15% of the P-impedance, while those estimated using GAIS average about 20-30%. The synthetic seismic traces derived from GAIS and their residuals with respect to the observed seismic are shown in fig. 3.16.


Figure 3.13: a) Initial P impedance model; b) inverted P impedance profile from HRS strata.


Figure 3.14: Estimated expectation value of P impedance profile from VFSA (a) and from GAIS (b).


Figure 3.15: Normalized standard deviation: σ(Zp)/Zp of posterior impedance maps drawn from VFSA (a) and from GAIS (b).


Figure 3.16: a) Synthetic 2D post-stack seismic derived from expected impedance model using GAIS; b) residuals of synthetic seismic derived from GAIS subtracted from the observation.


3.7 INVERSION OF PRE-STACK SEISMIC DATA

Another demonstration data set (a pre-stack angle gather along another 2D line) from the Hampson and Russell software AVO module is investigated to estimate both the P and S impedance profiles using GAIS. The pre-stack seismic data have angle gathers ranging from 3° to 24° (fig. 3.17), with a frequency band from 10Hz to 70Hz and a central frequency of 35Hz. Because little difference is observed between the near offset (3° to 15°) and far offset (15° to 24°) wavelets extracted from HRS (fig. 3.19), only the near offset wavelet is used for the inversion of all angle gathers. Initial models of the P and S impedance profiles were generated by HRS based on the picked horizons and extrapolation of smoothed well logs. The well is located at cross line 71 in fig. 3.17 (corresponding to trace number 40 in fig. 3.18). The top and the bottom of the gas reservoir are marked in the logs (fig. 3.20, with the TWT ranging from 628ms to 634ms), demonstrating low P impedance, low density, low gamma ray, high S impedance, high resistivity and a Vp/Vs ratio of 1.6. The depth (TWT) of the gas layer marked in the logs is the same as the depth (TWT) of the bright spot located in the seismic (fig. 3.18).


Figure 3.17: Pre-stack angle gather from HRS demo data in AVO module.


Figure 3.18: Pre-stack seismic data at an angle of 24 degrees along a 2D line from the HRS demo data. The P impedance well log, marked as a red curve in figure 3.17, is located at trace number 40 in this figure.

Figure 3.19: Extracted wavelets (top) and their frequency spectra (bottom) for the near offset 3°-15° (red) and the far offset 15°-24° (blue) from HRS.


Figure 3.20: The well logs of P- and S-impedance (calculated from HRS based on the Castagna equation and Gassmann fluid substitution), density, gamma ray, resistivity and Vp/Vs ratio, with the top and base of the gas layer marked from 628 ms to 634 ms in TWT.

Before inversion along the entire 2D line, inversion of the angle gathers at the well location is investigated first for quality control. A workflow similar to that shown in section 3.5 is employed, but with angle-dependent seismic forward modeling, i.e., using Fatti's approximation (eqn. 3.2). The model space is parameterized by P-impedance, S-impedance and density. At each trace location, the observation includes 8 angle-dependent traces. The objective function is the L2 norm of the summation of misfits defined in eqn. 3.4. Compared with inversion of post-stack data, the degrees of freedom are reduced because the number of observations grows faster than the number of unknown model parameters. Furthermore, after MVFSA, density models are not updated in the greedy search process, to avoid additional computational cost, because seismic traces of


near offset are usually insensitive to the density contrast, and inverted density models are usually not reliable due to the low signal-to-noise ratio of the far-offset data. The expected P- and S-impedance and their normalized standard deviations estimated from GAIS are compared with those derived from VFSA after 1000 iterations and with the well logs (fig. 3.21 and fig. 3.22). The results from GAIS match the well logs very well. Although the results from VFSA are able to catch the major trend of the logs, unrealistic high frequency components are significant, especially in the P-impedance models. The uncertainty values estimated from both the P- and S-impedance models using GAIS are larger than those obtained by VFSA in most of the domain. The synthetic pre-stack seismic derived from GAIS is compared with the observations at each angle, and the residuals are plotted together with the observed seismic in fig. 3.23.
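Eqn. 3.2 is not repeated here, but for concreteness the sketch below shows one plausible implementation of this angle-dependent convolutional forward modeling, assuming the standard three-term form of Fatti's approximation and a constant background Vs/Vp ratio (gamma); the function names are illustrative, not the code used in this dissertation:

    import numpy as np

    def fatti_reflectivity(zp, zs, rho, theta_deg, gamma=0.5):
        # Three-term Fatti (1994) PP reflectivity at one incidence angle;
        # zp, zs, rho are layer vectors of P impedance, S impedance and density
        t = np.radians(theta_deg)
        rp = (zp[1:] - zp[:-1]) / (zp[1:] + zp[:-1])            # dIp / (2 Ip)
        rs = (zs[1:] - zs[:-1]) / (zs[1:] + zs[:-1])            # dIs / (2 Is)
        rd = 2.0 * (rho[1:] - rho[:-1]) / (rho[1:] + rho[:-1])  # drho / rho
        return ((1.0 + np.tan(t) ** 2) * rp
                - 8.0 * gamma ** 2 * np.sin(t) ** 2 * rs
                - (0.5 * np.tan(t) ** 2 - 2.0 * gamma ** 2 * np.sin(t) ** 2) * rd)

    def synthetic_gather(zp, zs, rho, wavelet, angles=range(3, 25, 3)):
        # Convolve the angle-dependent reflectivity with the (near-offset) wavelet;
        # the default angles give the 8 angle-dependent traces used at each location
        return np.column_stack([
            np.convolve(fatti_reflectivity(zp, zs, rho, a), wavelet, mode='same')
            for a in angles])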


Figure 3.21: Comparison of initial model (magenta), estimated expectation value of P impedance model (a) and S impedance model (b) from VFSA (green) and from GAIS (blue) with the well logs (red). The gas layer is marked by the black dotted line.


Figure 3.22: Normalized standard deviation of posterior P impedance (a) and S impedance models (b) from VFSA (green) and GAIS (blue).


Figure 3.23: a) Observed pre-stack seismic data at the well location; b) synthetic pre-stack seismic from GAIS at the well location; c) residuals between the synthetic and the observed seismic.

After quality control, trace-by-trace inversion using GAIS is conducted along the seismic line with the initial models shown in fig. 3.24. The expectation values of the P-impedance, S-impedance and Vp/Vs ratio derived from GAIS (fig. 3.26b, fig. 3.27b and fig. 3.28b) are compared with those derived from VFSA after 1000 iterations (fig. 3.26a, fig. 3.27a and fig. 3.28a) and with the deterministic solutions calculated by HRS (fig. 3.25). Similar to the post-stack inversion, GAIS gets rid of the unrealistic high frequency components (compared with VFSA) and provides a clearer structure with improved lateral continuity and higher resolution (compared with HRS). The normalized uncertainties of the P- and S-impedance along the 2D profile are compared with those derived from VFSA (fig. 3.29 and fig. 3.30). Compared to the post-stack inversion, the uncertainties in the resulting impedance decrease in the pre-stack inversion. Within the pre-stack inversion, the S-impedance models have larger normalized uncertainties than the P-impedance models. More specifically,


uncertainty values estimated by VFSA are about 5% in P-impedance and 5-10% in S-impedance, while uncertainty values estimated by GAIS are around 10% in P-impedance and 10-15% in S-impedance. In the end, the synthetic pre-stack seismic at an angle of 15 degrees derived from GAIS is compared with the observed seismic, with the residual plotted in fig. 3.31.


Figure 3.24: a) Initial P impedance model along 2D line; b) initial S impedance model along 2D line.



Figure 3.25: a) Inverted P impedance profile; b) inverted S impedance profile; and c) inverted Vp/Vs ratio by HRS STRATA. The well is located at the trace 40 (marked by the black dotted line) with the gas layer marked in red.



Figure 3.26: Estimated expectation value of P impedance model from a) VFSA; from b) GAIS. The well is located at the trace 40 (marked by the black dotted line) with the gas layer marked in red.



Figure 3.27: Estimated expectation value of S impedance model from a) VFSA; from b) GAIS. The well is located at the trace 40 (marked by the black dotted line) with the gas layer marked in red.



Figure 3.28: Estimated expectation value of Vp/Vs ratio from a) VFSA; from b) GAIS. The well is located at the trace 40 (marked by the black dotted line) with the gas layer marked in red.


Figure 3.29: Normalized standard deviation: σ(Zp)/Zp of posterior P impedance maps drawn from VFSA (a) and from GAIS (b).


Figure 3.30: Normalized standard deviation: σ(Zs)/Zs of posterior S impedance maps drawn from VFSA (a) and from GAIS (b).


Figure 3.31: a) Synthetic pre-stack seismic at an angle of 15 degrees derived from the expected impedance models using GAIS; b) residuals of the synthetic seismic subtracted from the observation.


3.8 DISCUSSIONS AND CONCLUSIONS

A new hybrid stochastic inversion method combining a dependent Markov chain method (VFSA) and an independent Monte Carlo method (GIS), named greedy annealed importance sampling (GAIS), is developed to sample models from the posterior distribution and to estimate their uncertainties and expectation value. Both global and local optima in the important regions are considered in this new method for an optimized balance between computational efficiency and accuracy. GAIS seeks the important regions starting with models that are close to those already located by MVFSA with a small number of iterations, and explicitly explores the target distribution by grid-searching the high PPD regions with a fixed step length along axis-parallel directions. GAIS is applied in trace-based seismic inversion of 1D synthetic post-stack data and 2D field data (post-stack and pre-stack, given as HRS demo data sets) to sample the posterior distribution of impedance models. In the synthetic test, the expectation value of the P impedance model estimated from GAIS matches the reference model very well. The estimation from VFSA also follows the trend of the reference model; however, it appears noisy due to unrealistic high frequency components, which are essentially imposed by the stochastic perturbations. The histograms of GAIS demonstrate that the marginal PPD of an individual parameter is usually asymmetric and that the expectation value is close to, but not necessarily located at, the MAP point. The same phenomena can be observed in the histograms of

VFSA as well. The posterior mean estimated from VFSA is close to the reference model; however, its accuracy is worse than that of GAIS and the samples are biased towards the MAP point.


Compared with VFSA, the accurate estimation of the expectation value and the improved uncertainty analysis from GAIS benefit from the independent sampling of its building blocks - the grid-walk strategy with a fixed step length and the weighting scheme. The superior performance of GAIS compared to VFSA is also demonstrated in the test on real field data. In both post- and pre-stack inversion, the expected impedance profiles estimated from GAIS demonstrate more clearly identified structures with higher resolution and better lateral continuity than the results from VFSA and HRS. The noisy results derived from VFSA are mainly due to random perturbations imposed without any trace-to-trace correlation; therefore, the estimation at one trace location may not be similar to that at its neighboring trace location. The stability of GAIS is, however, demonstrated by the smooth impedance profiles obtained even from trace-based inversion. The normalized STD of the P impedance posterior distribution sampled from GAIS is around 20-30% in post-stack inversion and around 7-12% in pre-stack inversion, because the degrees of freedom are reduced by the multiple angle gathers in pre-stack data. The normalized STD of the S impedance posterior distribution sampled from GAIS is larger than that of the P impedance, around 10-15%, because the S wave only affects the far offsets with large angles. Generally speaking, the normalized STD from VFSA is about half of that obtained from GAIS.

Although the effect of anisotropy is not considered here, GAIS can in principle be applied to this or even more complicated cases by replacing the forward modeling and adding more parameters. However, the applicability of GAIS depends on the number of parameters and the computational cost of each forward modeling run, because the objective functions of the 2n neighbors need to be calculated at each step of the greedy search in n-dimensional Euclidean space. If a large number of model parameters and a time-


consuming forward modeling are present, MVFSA with an appropriate cooling schedule is recommended to approximate the posterior mean.


Chapter 4: Simultaneous stochastic seismic inversion using Principal Component Analysis (PCA)

4.1 INTRODUCTION

Lateral continuity plays an important role in the stratigraphic interpretation of a depositional environment. However, traditional trace-based seismic inversion may not preserve lateral continuity very well. The discontinuity in the inverted image may be due to inconsistencies in data acquisition and processing at different locations, as well as the inherent non-uniqueness of inversion. Multiple models may fit the measurements at one trace, and the inverted impedance model at one trace may differ from the neighboring models. Such laterally discontinuous earth models can result in incorrect stratigraphic interpretation and biased estimation of reservoir volume. To improve the lateral continuity, geological constraints are generally incorporated into a trace-by-trace inversion. Gelderblom and Leguijt (2010) introduced lateral continuity into a stochastic seismic inversion by using a conditional prior distribution derived from the current models at neighboring traces, well logs and variograms: if an increase in the value of a model parameter at the current location has been accepted, an increase of the model parameter at its neighboring locations is more likely to be accepted. Merletti et al. (2003) conducted trace-based geostatistical inversion by using a lateral variogram inferred from a fluvial depositional environment and a vertical variogram inferred from the correlation of well logs. Although these geostatistics-based inversion algorithms can improve the lateral continuity of certain geological features, they may also damage real discontinuous events, such as faults and folds.


An alternative approach to improving lateral continuity is simultaneous inversion of traces from all surface locations along a 2D line or within a 3D volume. Rather than estimating the elastic properties at one trace, we update the elastic properties at all locations simultaneously to match the seismic profile. This involves optimization of a function with a large number of variables using a large data volume, which is a computationally challenging task. To enable the simultaneous inversion, an efficient parameterization method is needed to reduce the dimensionality of the model space. One popular parameterization method employed in reservoir modeling uses pilot points (Long et al., 2009). This is essentially an up-scaling method using representative cells at only a few sparse locations (the pilot points). However, it is not trivial to choose the locations of these pilot points and to recover the rock properties at other locations accurately. To circumvent these problems, we investigate dimension reduction in terms of an orthogonal transformation, called principal component analysis (PCA).

PCA has been successfully applied as an efficient parameterization tool in image recognition and compression (Kim, 2002), reservoir modeling (Echeverria and Mukerji, 2009) and history matching (Chen et al., 2012). Given a set of training images drawn from the prior distribution, PCA linearly transforms them into a set of uncorrelated principal components. The number of principal components (PCs) is usually much smaller than that of the model parameters due to the strong correlation between the training images. Based on the PCs and the average of the training images, the model space can be reconstructed using a linear combination of the PCs. Application of PCA in seismic inversion brings a new concept to the inversion process (Xue et al., 2013b). At first, thousands of training images containing 2D elastic properties are simulated to cover a large uncertainty range of the prior distribution. Then

their principal components (PCs) are calculated from the major eigenvectors of the covariance matrix of the mean-subtracted training images. Based on these PCs, which usually carry over 80% of the total energy of the prior model space, we are able to reconstruct the posterior elastic profiles by updating only the weights associated with the PCs. The applicability of PCA based simultaneous inversion reported in this chapter is tested on both post-stack and pre-stack seismic lines, and the results are compared with those obtained from trace-by-trace inversion using the same data sets.

4.2 PRINCIPAL COMPONENT ANALYSIS

Similar to singular value decomposition (SVD), PCA uses a linear transformation to project a large number of correlated models (training images) onto a set of orthonormal bases, or uncorrelated principal components, for better analysis of the relationships among the original models (Kim, 2002). This provides the opportunity to order the dimensions along the directions with the most variation. In this way, the original models can be reconstructed using fewer dimensions. The use of PCA as an efficient parameterization tool proceeds in two parts: de-correlation and reconstruction. In the first part, the eigenvectors and eigenvalues of the covariance matrix of the zero-mean training images are calculated to identify the directions with large variances. In the second part, reconstruction of the original image is made by using a linear combination of the important eigenvectors (those with large variances) only. Let us, for example, consider PCA for image reconstruction. Given N training images with n cells each (n = rows of image * columns of image), the steps of de-correlation can be summarized as follows (Kim, 2002):


First, each image is vectorized into a column of length n, and the N images are stored in a matrix of size n × N:

M = [m_1, m_2, \ldots, m_N].  (4.1)

Then the mean image \mu = \frac{1}{N} \sum_{k=1}^{N} m_k is subtracted from each image vector:

Y_k = m_k - \mu.  (4.2)

The covariance matrix C is calculated as

C = Y Y^T / (N - 1).  (4.3)

Because n is usually much larger than N (n can be over four thousand for an image with 64 × 64 cells), it is more practical to solve for the eigenvectors of Y^T Y instead of Y Y^T. Let e_i and \lambda_i be the eigenvectors and eigenvalues of Y^T Y, respectively, i.e.,

Y^T Y e_i = \lambda_i e_i.  (4.4)

Multiplying both sides by Y, and setting W = [e_1, e_2, \ldots, e_N] and \Lambda as the diagonal matrix with diagonal entries \lambda_i, gives

Y Y^T (Y W) = (Y W) \Lambda,  (4.5)

in which the columns of (YW)_{normalized} and the \lambda_i are the N - 1 eigenvectors (the -1 comes from the subtraction of the mean image) and the scaled eigenvalues of the matrix Y Y^T, respectively. The eigenvectors are then sorted in descending order of their corresponding eigenvalues. A small number (n_r) of eigenvectors (YW)_{normalized} with large variances are selected as PCs based on the criterion

\sum_{i=1}^{n_r} \lambda_i \geq t \sum_{i=1}^{N} \lambda_i,  (4.6)

in which the summation of their eigenvalues reaches the major part (the threshold value t is usually larger than 80%) of the total energy. In this way, we project a high-dimensional


model space onto a subspace with fewer dimensions, and we are able to reconstruct the original model space with negligible error using a linear combination of these PCs.

An important study on the reconstruction of original models using PCs is presented by Chen et al. (2012). At first, singular value decomposition is applied to the matrix Y Y^T:

Y Y^T = (YW)_{norm} \Lambda (YW)_{norm}^T = (YW)_{norm} \Lambda^{1/2} \Lambda^{1/2} (YW)_{norm}^T.  (4.7)

Let p be a random vector with zero mean and an identity covariance matrix, i.e.,

I = \frac{1}{n_r - 1} p p^T,  (4.8)

then eqn. 4.7 can be rewritten as

Y Y^T = \left[ \frac{1}{\sqrt{n_r - 1}} (YW)_{norm} \Lambda^{1/2} p \right] \left[ \frac{1}{\sqrt{n_r - 1}} (YW)_{norm} \Lambda^{1/2} p \right]^T.  (4.9)

If we bring eqn. 4.2 to the left side of eqn. 4.9, each training image can be approximately reconstructed as

m_k \approx \hat{m}_k = \mu + \frac{1}{\sqrt{n_r - 1}} (Y_k W)_{normalized} \Lambda^{1/2} p.  (4.10)

The vector p associated with each training image can be considered a coefficient vector in the n_r-dimensional subspace, which follows a Gaussian normal distribution (eqn. 4.8). The scaled eigenvector \frac{1}{\sqrt{n_r - 1}} (Y_k W)_{normalized} \Lambda^{1/2} is called the principal component in this thesis. Referring to the inverse problem of post-stack data with n_l layers (a sampling interval of 2 ms is considered one layer) and n_t traces, each zero-mean training image of the P impedance profile is formed as a vector m_k with a total length of n_l * n_t. In the case


of pre-stack inversion, m_k has a length of 3 * n_l * n_t due to the tripled number of model parameters:

m_k = [Z_p^T, Z_s^T, \rho^T]^T.  (4.11)
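For concreteness, the de-correlation and reconstruction of eqns. 4.1-4.10 can be sketched in a few lines of NumPy; this is an illustration of the procedure rather than the code used in this thesis, and the names build_pcs, M and reconstruct are assumptions of the example:

    import numpy as np

    def build_pcs(M, t=0.8):
        # M: n x N matrix of vectorized training images (eqn 4.1)
        mu = M.mean(axis=1, keepdims=True)             # mean image
        Y = M - mu                                     # eqn 4.2
        lam, E = np.linalg.eigh(Y.T @ Y)               # eigenpairs of the small N x N matrix (eqn 4.4)
        order = np.argsort(lam)[::-1]                  # sort by descending eigenvalue
        lam, E = lam[order], E[:, order]
        nr = int(np.searchsorted(np.cumsum(lam) / lam.sum(), t)) + 1  # energy criterion (eqn 4.6)
        U = Y @ E[:, :nr]                              # eigenvectors of Y Y^T (eqn 4.5)
        U /= np.linalg.norm(U, axis=0)                 # (YW)_normalized
        pcs = U * np.sqrt(np.abs(lam[:nr])) / np.sqrt(nr - 1)  # scaled PCs of eqn 4.10
        return mu.ravel(), pcs

    def reconstruct(mu, pcs, p):
        # eqn 4.10: model = mean image + linear combination of PCs with coefficients p
        return mu + pcs @ p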

4.3 WORKFLOW OF SIMULTANEOUS INVERSION

Before introducing the workflow of simultaneous seismic inversion, three questions need to be considered, namely, how to sample the training images, how to reconstruct the elastic properties and how to perturb the model.

4.3.1 Sampling of training images

A training image in this thesis refers to the elastic properties, including P- and S-impedance and density, along a 2D seismic profile or within a 3D volume. Training images should cover the full uncertainty range of possible elastic profiles given the observation. In some reported studies, various geostatistical approaches have been used to sample the training images, for instance, variogram-based simulation using sequential Gaussian simulation (Deutsch and Journel, 1998) and object-based simulation containing conceptual geological patterns, such as channels and faults (Strebelle, 2000). To strike a balance between the required strong correlation between training images and maximum uncertainty coverage of all possible models, we use a trace-based stochastic inversion method to sample the training images by constructing a series of intermediate probability distributions. The fractal property of well logs, represented by the mean, the variance and the autocorrelation of the logs, is applied to constrain the prior distribution. Due to the presence of cycles in sedimentary strata at different scales, most well logs demonstrate fractal behaviour (Hewett, 1986; Emanual et al., 1987; Hardy, 1992; Dimri, 2000, 2005; Browaeys and Fomel, 2009; Srivastava and Sen, 2009, 2010), or the so-called self-affine

property (Mandelbrot, 1983). A self-affine fractal has the property that its energy spectrum P(f) depends on the frequency f through a power law:

P(f) \propto f^{-\beta}.  (4.12)

The exponent \beta is related to the Hurst coefficient H; for fractional Gaussian noise,

\beta = 2H - 1.  (4.13)

The Hurst coefficient, which quantifies the correlation of a time series (Hurst, 1951), can be calculated by the R/S algorithm:

H(T) = \frac{\log(R(T)/S(T))}{\log(T)},  (4.14)

where R(T) and S(T) are the range of variations (maximum - minimum) and the standard deviation over the partial time series of length T (bin size), respectively. Most well logs have H values between 0.5 and 1, demonstrating long-term positive autocorrelation, which means that a high value in the time series is likely to be followed by another high value nearby. The Hurst coefficient of the P impedance log is around 0.8 according to the fractal character analysis (fig. 4.1) by Srivastava and Sen (2010) using the same logs as in the pre-stack seismic line. Interpolated (pseudo) well logs are used to estimate the fractal properties away from the well locations, including the mean, the variance and the Hurst coefficient.
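As a concrete illustration of eqn. 4.14, a minimal rescaled-range (R/S) estimator could look like the sketch below; the bin sizes and the least-squares fit of log(R/S) against log(T) are assumptions of this example rather than the exact procedure of Srivastava and Sen (2010):

    import numpy as np

    def hurst_rs(x, bin_sizes=(8, 16, 32, 64, 128)):
        # x: 1-D well log (e.g., P impedance versus TWT), assumed longer than
        # the largest bin size
        logT, logRS = [], []
        for T in bin_sizes:
            ratios = []
            for i in range(0, len(x) - T + 1, T):      # partial series of length T
                w = x[i:i + T]
                z = np.cumsum(w - w.mean())            # cumulative deviations
                R = z.max() - z.min()                  # range of variations
                S = w.std()                            # standard deviation of the bin
                if S > 0:
                    ratios.append(R / S)
            logT.append(np.log(T))
            logRS.append(np.log(np.mean(ratios)))
        return np.polyfit(logT, logRS, 1)[0]           # slope ~ H (eqn 4.14)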


Figure 4.1: a) P impedance log along the TWT; b) spectral density of the impedance log; c) rescaled range analysis to estimate the Hurst coefficient. The slope of the best-fit grey line gives a Hurst coefficient of around 0.82. Two lines with slopes of 0.5 and 1.0 show the theoretical limits of the Hurst coefficient. (copied from Srivastava and Sen, 2010)


The next step is to simulate multiple realizations of elastic properties honouring the fractal properties of the (pseudo) logs. Assuming that the time series within a reservoir zone is stationary, the algorithm of the exact fractional Gaussian process (Caccia et al., 1997; Srivastava and Sen, 2009, 2010) is applied to generate fractal Gaussian noise (fGn) trace by trace along the seismic line. These fGn images are used as initial guesses of the impedance models in a trace-based stochastic inversion. The stochastic method used to construct the intermediate distributions, from the fractal-based prior distribution to the seismic-data-constrained posterior distribution, is multiple independent threads of very fast simulated annealing (MVFSA), taking advantage of its fast convergence and its large uncertainty coverage through variable starting temperatures. More specifically, 10 threads of VFSA are conducted with different starting temperatures ranging from 10 to 100. From each VFSA thread, 100 samples are picked among the 1000 iterations. In the end, we have in total 1000 training images, sampled from 10 threads of trace-based inversion using fractal-based initial guesses (100 samples per thread * 10 threads), as sketched below.
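The assembly of the training-image set can be summarized by the following sketch, in which run_vfsa_trace_inversion is a hypothetical placeholder for the trace-based MVFSA inversion with fGn initial models described above:

    import numpy as np

    def sample_training_images(fgn_initial_model, run_vfsa_trace_inversion):
        # run_vfsa_trace_inversion: hypothetical callable returning the chain of
        # vectorized impedance images visited by one VFSA thread
        images = []
        for t0 in np.linspace(10.0, 100.0, 10):        # 10 starting temperatures
            chain = run_vfsa_trace_inversion(fgn_initial_model, t0, n_iter=1000)
            keep = np.linspace(0, len(chain) - 1, 100).astype(int)
            images.extend(chain[k] for k in keep)      # 100 samples per thread
        return np.column_stack(images)                 # n x 1000 matrix M for PCA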

4.3.2 Reconstruction of the original model space

To reconstruct the elastic properties along the seismic line, a linear combination of PCs is applied (eqn. 4.15 for post-stack inversion and eqn. 4.16 for pre-stack inversion), in which Z_{p0}, Z_{s0} and \rho_0 are the averages of the P impedance, S impedance and density over the training images and p is the coefficient vector, which is updated to match the observed seismic traces at all surface locations:

Z_p = Z_{p0} + \frac{1}{\sqrt{n_r - 1}} (YW)_{normalized} \Lambda^{1/2} p,  (4.15)

\begin{bmatrix} Z_p \\ Z_s \\ \rho \end{bmatrix} = \begin{bmatrix} Z_{p0} \\ Z_{s0} \\ \rho_0 \end{bmatrix} + \frac{1}{\sqrt{n_r - 1}} (YW)_{normalized} \Lambda^{1/2} p.  (4.16)

4.3.3 Model Perturbation

In the PCA based inversion, we do not perturb the elastic models directly, but only through updating the coefficient vector p. The prior distribution of p is sampled from a Gaussian distribution. Then p is iteratively updated to seek the best fit to the 2D seismic profile, i.e., the global/local minimum of the objective function. The objective function for both post-stack and pre-stack inversion is the normalized L2 norm of the summed differences between the synthetic and the observed 2D seismic profiles (eqn. 4.17). The difference between the objective functions of trace-based and simultaneous inversion is that for the former we sum the residuals of one trace only, while for the latter the residuals of all traces are summed together, i.e.,

E = \frac{\sum_{i,j} (S_{obs} - S_{syn})^2}{\sum_{i,j} \left[ (S_{obs} + S_{syn})^2 + (S_{obs} - S_{syn})^2 \right]}.  (4.17)
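In NumPy form, this objective function is a direct transcription of eqn. 4.17; the array names s_obs and s_syn are assumptions of this sketch:

    import numpy as np

    def seismic_misfit(s_obs, s_syn):
        # s_obs, s_syn: observed and synthetic profiles, shape (n_samples, n_traces);
        # for trace-based inversion the arrays would hold a single trace instead
        num = np.sum((s_obs - s_syn) ** 2)
        den = np.sum((s_obs + s_syn) ** 2 + (s_obs - s_syn) ** 2)
        return num / den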

The GAIS is applied to estimate the expected image of elastic properties along the 2D line by perturbing the coefficient vector p. The workflow of GAIS in PCA based simultaneous inversion is listed below:

1. Sample an initial coefficient vector p from a Gaussian distribution.

2. Apply 20 threads of VFSA with different starting temperatures, uniformly drawn from 10 to 100.

3. Starting with the coefficient models p after 500 iterations of MVFSA, conduct a greedy search along the ascending direction of the probability density \exp(-|E|) (in which E is calculated from eqns. 4.15 to 4.17) by taking 40 steps with a fixed step length of 0.05 along all possible directions defined by the PCs.

4. Reconstruct the profiles of elastic properties (P impedance, S impedance and density), weigh each elastic profile by \exp(-|E_i|) / \sum_{i=1}^{21 \times 40} \exp(-|E_i|), and sum all the weighted elastic models together to estimate the expectation value of the elastic profile (a minimal sketch of this weighting step is given below).

Unlike the trace-based GAIS introduced in chapter 3, the PCA based greedy search is conducted in a simplified subspace of much smaller dimension and the search directions are orthonormal. Therefore, a small number of threads and steps is applied here to reduce the computational cost.
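Step 4 amounts to an importance-weighted average of all profiles visited by the greedy search; a minimal sketch (the array names profiles and misfits are assumptions of this example) is:

    import numpy as np

    def weighted_expectation(profiles, misfits):
        # profiles: elastic profiles reconstructed from all sampled coefficient
        #           vectors, shape (n_models, n_cells)
        # misfits:  eqn 4.17 value per sampled model
        w = np.exp(-np.abs(misfits))
        w /= w.sum()                                   # normalized weights
        return w @ profiles                            # expected elastic profile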

4.3.4 PCA based simultaneous inversion

The workflow of the PCA based simultaneous inversion, described in fig. 4.2 using the example of a 2D seismic line, can also be applied to 3D seismic volumes. At first, trace-based inversion for the simulation of training images is conducted with ten parallel VFSA threads with different starting temperatures ranging from 10 to 100. Fractal initial models (Srivastava and Sen, 2009, 2010) are employed to extend the frequency band of the impedance models and to gain correlation in the model space by using the fractal properties (mean, variance and Hurst coefficient) as constraints. One hundred training images of an elastic profile are selected from the one thousand iterations of each


VFSA thread. In total, one thousand training images of the elastic profile are simulated from the ten VFSA threads. After applying PCA to the training images, a small number of PCs with large eigenvalues are selected to represent the original model space. In the end, GAIS is applied to update the 2D elastic profile simultaneously by randomly perturbing the coefficient vector associated with the PCs, to estimate the expectation value of the elastic profiles.

Figure 4.2: Workflow of PCA based simultaneous inversion along a 2D line using GAIS.


4.4 SIMULTANEOUS INVERSION OF POST-STACK SEISMIC DATA

The same post-stack data as in the trace-based inversion (refer to fig. 3.10 in section 3.6) is investigated in the simultaneous inversion for comparison purposes. Following the workflow (fig. 4.2), MVFSA threads with fractal initial models are first applied to simulate the training images. Some randomly selected training images of P impedance models are shown in fig. 4.3. In the next step, PCA is applied to these 1000 training images: singular value decomposition is applied to the covariance matrix Y^T Y with a dimension of (1000, 1000). From the energy plot (fig. 4.4), we notice that the first 200 eigenvectors contain more than 90% of the total energy (the summation of the eigenvalues). Based on these 200 eigenvectors, the principal components (PCs) are calculated using eqn. 4.10. Several PCs of the P impedance are illustrated in fig. 4.5. The first PC has the largest variance, and the variance decreases with the corresponding eigenvalues.

In the end, PCA based GAIS is applied to estimate the expected P impedance profile following the workflow in section 4.3.3. We are able to match the observed seismic line by perturbing only the coefficient vector p associated with these 200 PCs (fig. 4.8). The estimated expectation value of the P impedance profile using PCA based GAIS is compared with that from trace-based GAIS (fig. 4.6). The normalized uncertainty of the P impedance profile estimated from the simultaneous inversion is shown in fig. 4.7.



Figure 4.3: Randomly picked training images of the P impedance profile along the post-stack seismic line.



Figure 4.4: Energy plot of the initial model space constructed from the 1000 training images of the P impedance profile along the post-stack seismic line, with most of the energy (>90%) contained in the first 200 eigenvectors.


[Six panels showing the 1st, 5th, 10th, 15th, 20th and 25th PCs of the P impedance, plotted as TWT (ms) versus trace number.]

Figure 4.5: Selected principal components of the P impedance profile along the post-stack seismic line.



Figure 4.6: Estimated expectation value of P impedance profile by a) trace-by-trace inversion using GAIS (left); b) simultaneous inversion using PCA based GAIS (right). Improved lateral continuity is demonstrated in the zones marked by black ellipse.


Figure 4.7: Normalized standard deviation: σ(Zp)/Zp of posterior P impedance maps drawn from simultaneous inversion using PCA based GAIS.



Figure 4.8: a) Synthetic seismic section derived from the expectation value of impedance profile using PCA based GAIS; b) the residuals subtracted from the observed post-stack seismic data.


4.5 SIMULTANEOUS INVERSION OF PRE-STACK SEISMIC DATA

Here I use the same pre-stack data as in the trace-based inversion (refer to fig. 3.17 and fig. 3.18 in section 3.7) for the simultaneous inversion. Trace-based inversion using GAIS already demonstrates promising lateral continuity (fig. 3.26 and fig. 3.27) due to the reduced uncertainty, which benefits from the increased number of observations (angle gathers) at each trace location. Therefore, we do not expect a significant improvement of lateral continuity from simultaneous seismic inversion of pre-stack data. The major purpose of this exercise is to test the applicability of PCA based GAIS when a large number of observations and unknown parameters are present. Similar to the simultaneous inversion of post-stack data, MVFSA threads with fractal initial models are first applied to simulate 1000 training images per elastic property (P-impedance, S-impedance and density profiles). Several randomly picked training images of the P- and S-impedance profiles are shown in fig. 4.9 and fig. 4.10, respectively. Compared with the post-stack inversion, the training images in the pre-stack inversion look more similar to each other, because the degrees of freedom in pre-stack inversion are reduced by the increased number of observations (angle-dependent gathers) at each trace.

Unlike in the post-stack inversion, each vectorized training image Yk in the pre-stack inversion is constructed by stacking the vectorized training images of the different elastic properties: Yk = [Yk_zp; Yk_zs; Yk_rho] (eqn. 4.11). More specifically, referring to eqn. 4.2, Y is now a matrix of size (24600 by 1000), in which each vectorized image Yk has a length of 24600, coming from 82 layers/trace * 100 traces * 3 elastic properties. PCA is still applicable to these oversized training images, because we apply singular value


decomposition to Y^T Y instead of Y Y^T. We select the first two hundred eigenvectors, carrying more than 80% of the total energy (fig. 4.11), to calculate the PCs using eqn. 4.10. Several PCs of the P- and S-impedance are illustrated in fig. 4.12 and fig. 4.13, respectively. As in the post-stack inversion, PCA based GAIS is applied to perturb the coefficient vector. We are able to match the synthetic 2D seismic profile with the observation (fig. 4.19) by updating only the coefficient vector. The estimated expectation values of the elastic profiles of P-impedance, S-impedance and P/S impedance ratio from simultaneous inversion using PCA based GAIS are compared with those from trace-based inversion using GAIS (fig. 4.14, fig. 4.15 and fig. 4.16). Higher resolution of the impedance models, with improved lateral continuity at certain thin layers, is demonstrated by the simultaneous inversion relative to the trace-based inversion, which increases our confidence in picking the fault and the gas zone (marked in fig. 4.16). The normalized uncertainty images of the P- and S-impedance profiles estimated from the simultaneous inversion are shown in fig. 4.17. Furthermore, the inverted P- and S-impedance models at the well location from the simultaneous inversion are compared with the reference logs for quality control (fig. 4.18).



Figure 4.9: Randomly picked training images of P impedance along the pre-stack seismic line.



Figure 4.10: Randomly picked training images of S impedance along the pre-stack seismic line.



Figure 4.11: Energy plot of initial model space constructed by the 1000 training images of elastic properties (P-, S- impedance and density) along pre-stack seismic line. The first 200 eigenvectors contain more than 80% of the total energy.


[Six panels showing the 1st, 5th, 10th, 15th, 20th and 25th PCs of the P impedance along the pre-stack line, plotted as TWT (ms) versus trace number.]

Figure 4.12: Selected principal components of the P impedance profile along the pre-stack seismic line.


Figure 4.13: Selected principal components of the S impedance profile along the pre-stack seismic line.



Figure 4.14: Estimated expectation value of P impedance profile from a) trace by trace inversion using GAIS and from b) simultaneous inversion using PCA based GAIS. Higher resolution is demonstrated from simultaneous inversion than trace-based inversion in the zone marked by the black ellipse. The well is located at the trace 40 (marked by the black dotted line) with the gas layer marked in red.



Figure 4.15: Estimated expectation value of S impedance profile from a) trace by trace inversion using GAIS and from b) simultaneous inversion using PCA based GAIS. Improved lateral continuity and higher resolution are demonstrated from the simultaneous inversion in the zone marked by the black ellipse. The well is located at the trace 40 (marked by the black dotted line) with the gas layer marked in red.



Figure 4.16: Estimated P/S impedance ratio from a) trace-by-trace inversion using GAIS and b) simultaneous inversion using PCA based GAIS. Higher resolution is demonstrated in the simultaneous inversion than in the trace-based inversion. The gas zone is more concentrated in the P/S impedance ratio map derived from simultaneous inversion than in that from trace-based inversion, so simultaneous inversion provides a more confident interpretation of the boundary of the gas zone. The interpreted gas zone with low Vp/Vs ratio is marked by the small black ellipse in figure b, which matches the gas layer (marked in red along the black dotted line) from the well log very well.



Figure 4.17: a) Normalized standard deviation σ(Zp)/Zp of posterior P impedance maps estimated from PCA based GAIS; b) normalized standard deviation σ(Zs)/Zs of posterior S impedance maps estimated from PCA based GAIS.



Figure 4.18: a) Comparison of P impedance at the well location estimated from simultaneous inversion (blue) with the reference P impedance log; b) comparison of S impedance at the well location estimated from simultaneous inversion (blue) with the reference S impedance log.



Figure 4.19: a) Synthetic pre-stack seismic at the angle of 24 degree derived from the expectation value of 2D elastic properties using PCA based GAIS; b) the residuals compared with the observed pre-stack seismic at the angle of 24 degree.


4.6 DISCUSSIONS AND CONCLUSIONS

The novel strategy of seismic inversion introduced in this chapter is simultaneous inversion of all traces using an efficient model parameterization method: principal component analysis (PCA). An orthogonal transformation is applied to project the original model space of elastic properties onto a smaller number of uncorrelated principal components (PCs). Instead of perturbing the model parameters at each layer of each trace, only the weighting (coefficient) vector for the PCs is updated, or inverted. The advantages of applying PCA in seismic inversion are: 1) dimension reduction, which addresses the difficulty of convergence caused by the large number of unknown parameters involved in the simultaneous inversion; 2) it is data driven, which avoids damaging the real physical structure by imposing user-defined geological constraints. The applicability of simultaneous inversion using PCA based GAIS is studied using the same post-stack and pre-stack seismic data as in chapter 3, and the results are compared with those from trace-based inversion using GAIS. The dimension of the original model space is reduced to two hundred after de-correlation using SVD. We are able to match the observed 2D seismic profiles by perturbing only the weighting vector for the PCs along both the post-stack and the pre-stack seismic lines. Compared with trace-based inversion, we notice that simultaneous inversion improves the lateral continuity significantly along the post-stack seismic line and enhances the lateral continuity at small scales (thin layers), with improved resolution, along the pre-stack seismic line. This provides the opportunity for confident stratigraphic interpretation

(such as horizon picking), accurate estimation of reservoir volume and reliable reservoir forecasts (related to flow connectivity). One of the key factors for a successful dimension reduction using PCA is how to sample the training images. In order to have a small number of eigenvectors containing most of the total energy, a strong correlation of the training images is required. However, the variability of these training images is expected to be large enough to cover the maximum uncertainty ranges. In this thesis, I obtain correlation by simulating a series of intermediate probability distributions, which are gradually conditioned by the seismic data. To increase the uncertainty range, multiple realizations of fractal Gaussian noise are imposed on the initial guesses and the models are updated stochastically using multiple VFSA threads.

Compared to trace-based inversion, the uncertainties of the posterior elastic profiles are reduced in simultaneous inversion. Two possible reasons for the uncertainty reduction are as follows: 1. the variability of the training images may not be large enough to cover the maximum uncertainty range; 2. the eigenvectors with small variances are cut off for dimension reduction.


Chapter 5: Reservoir monitoring using joint inversion of production data and time-lapse seismic related data

5.1 INTRODUCTION

Reservoir monitoring, which is required during the production period, involves reservoir modeling and prediction of future production. In reservoir modeling, reservoir fluid flow parameters, such as permeability and porosity, are adjusted (or inverted) by matching the history of dynamic measurements such as the production data. This work process is traditionally known as "history matching" (HM). The adjusted reservoir models can then be used to simulate future production for reservoir management and business decision making. A traditional HM is usually carried out by geologists and reservoir engineers. At first, geologists build static geo-models by honouring well data and 3D seismic data. Then reservoir engineers sample the reservoir models by honouring the geology and the dynamic measurements, i.e., the production data. However, HM using production data alone entails a large degree of freedom, and thus the uncertainty away from the well locations is very high. Many different arrangements of reservoir parameters may match the production history, and even though these adjusted models all reproduce the history, they may give different production forecasts. The main challenge of HM is how to reduce the model uncertainty and improve the model predictability. Quantitative integration of production data and time-lapse geophysical data is aimed at overcoming this challenge.

Time-lapse seismic data, which is the change of seismic amplitudes and/or travel times between multiple surveys over the same area, monitors changes in water saturation and pressure caused by oil/gas production and injection of water. The time-


lapse seismic related data refers to changes of reservoir properties that are directly related to the changes in seismic amplitude, such as the seismic time shift and the water saturation change. Integration of time-lapse seismic (related) data in the HM process provides an opportunity to reduce the uncertainty in reservoir models and increase the reliability of the forecast (Stephen and MacBeth, 2006; Landa and Kumar, 2011; Xue et al., 2013a). This is because seismic data has relatively high resolution in the lateral directions (10-20 m).

The problem of quantitative integration of production data and time-lapse seismic (related) data has been an area of active investigation for the last ten years. It still remains a non-trivial task despite advances in technology and approaches. We are faced with the following challenges: 1) The quality of time-lapse seismic data depends on the timing of acquisition, the repeatability and the change of reservoir properties. In practice, the data are usually noise contaminated due to poor repeatability, and the changes may not be significant. 2) It is very difficult to match time-lapse seismic (related) data because the comparison of seismic attributes is made at each pixel of a given reservoir zone. It is even more difficult to find good reservoir models with additional seismic constraints than with production constraints alone, and the minimization of the misfit at each pixel of the reservoir zone usually suffers from slow convergence. 3) The integration of multi-disciplinary knowledge involves contributions from geologists, reservoir engineers, petrophysicists and geophysicists. Constructing a forward modeling chain requires fluid flow simulations, petro-elastic modeling and seismic modeling, as well as a reasonable interpretation of the time-lapse seismic data for a better understanding of the fluid changes.

4) The number of unknown model parameters is usually quite large for the estimation of high resolution reservoir models, because the reservoir zone is gridded for flow simulation and the value (permeability and/or porosity) at each grid cell needs to be updated simultaneously during an iterative optimization. Optimization of a function with a large number of variables is generally a computationally challenging task. Previous studies of HM using quantitative joint inversion address these challenges in different ways: 1) setting the meeting point between geophysicists and reservoir engineers in the workflow (fig. 5.1), namely defining which reservoir property is used in the objective (misfit) function, for instance, changes in pressure and saturation (Landa and Horne, 1997), seismic impedance (Gosselin et al., 2003; Stephen et al., 2005; Roggero et al., 2007; Castro, 2007), a gas-presence indicator (Kretz et al., 2004), seismic amplitude (Landa and Kumar, 2011; Dadashpour et al., 2007), and seismic time shift and time strain (Tolstukhin et al., 2012); 2) the inversion algorithm used for updating the models, such as gradient based methods (Landa and Horne, 1997; Dadashpour et al., 2007), the gradual deformation method (Kretz et al., 2004; Roggero et al., 2007), probability based perturbation (Castro and Caers, 2006), Particle Swarm Optimization (Suman, 2011; Jin et al., 2012), Very Fast Simulated Annealing (Jin et al., 2012) and the Markov chain Monte Carlo method (Landa and Kumar, 2011); 3) model parameterization for dimension reduction, such as Principal Component Analysis (PCA) (Dadashpour, 2009; Echeverria and Mukerji, 2009; Suman, 2011; Chen et al., 2012), pilot points (Jin et al., 2009), and petrophysical properties (permeability, saturation, fault transmissibility, fracture orientation/density) and their related global and regional multipliers (Stephen et al., 2005; Landa and Kumar, 2011; Tolstukhin et al., 2012). In this thesis, a novel stochastic inversion workflow for quantitative integration of production data and time-lapse seismic data is designed to reduce the model uncertainty and

improve the model predictability. In this project, I focus mainly on testing the feasibility of our new workflow for joint inversion and on studying the impact of different objective functions, or different types of data, including both production and time-lapse seismic data, on the uncertainty estimation and model predictability. A one layer permeability model is built as the reference model to simulate synthetic production data over a five year history and time-lapse seismic data in the form of maps of water saturation changes after two and five years. To simplify the problem and avoid uncertainty propagation through the petro-elastic modelling, we assume here that the changes of seismic amplitudes are caused only by the changes in water saturation. Therefore, instead of minimizing the changes of seismic amplitude, we minimize the misfit of the changes in the water saturation maps, which are considered as time-lapse seismic related data. Due to the expensive computational cost (up to 30 minutes) per flow simulation, multiple very fast simulated annealing (MVFSA) is applied to sample the permeability models with acceptable misfit and to estimate their uncertainties. In the end, production over a seven year forecast period is predicted based on the permeability models derived from each type of objective function, and the predictions are compared with the forecast production from the reference model.


Figure 5.1: Meeting points for geophysics and reservoir engineering (Landa and Kumar, 2011).


5.2 NOVEL WORKFLOW FOR JOINT INVERSION

One of the critical factors in joint inversion is how to weigh the objective (misfit) functions from the different types of data. To circumvent this problem, a game theoretical approach (Aumann, 1987; Myerson, 1991) for joint inversion of PP and PS seismic data (fig. 5.2) was introduced by Deng et al. (2011). Instead of assigning weights to the misfits of the PP and PS data, a sequential update using PP and PS data is applied per iteration, such that the accepted model from one type of data is used as the initial model to match the other type of data. The interaction between the PP and PS data continues until the misfits for both types of data reach their minima. Such a sequential updating usually requires a large number of iterations (1000, or even more) until the equilibrium has been reached. It is, however, not practical in a HM loop because the flow simulation is computationally quite expensive. Motivated by the above game theoretical approach, a novel HM workflow using two loops in a sequential way, instead of sequential updating per iteration, is developed in this project and is described in the paragraph below.

It is known that the large dimensional time-lapse seismic (related) data is more difficult to match than the production data (a volume of liquid produced) alone at sparse locations. If the misfits from the two types of data are equally weighed in a joint inversion, the sampling of reservoir models will be biased towards the model distribution matching the production data alone, considering that an increase in the misfit of the seismic (related) data will still be accepted if the misfit of the production data decreases more. Such behavior is demonstrated in our test shown in fig. 5.3a. Therefore, if the weights are unknown or the misfits are equally weighed, it is better to start from the distribution that is already constrained by the time-lapse seismic related data; extra constraints from the production data can then be added in the HM loop. Based on this idea, HM using two loops in a sequential way is designed. In the first loop, multiple independent threads of VFSA (Ingber, 1989; Sen and Stoffa, 1996, 2013) are applied to update the models by matching the time-lapse seismic related data (water saturation changes) alone. The first loop stops when an acceptably small misfit is reached by every VFSA thread. The conditioned models (output) are then used as initial models for the second HM loop, where both the production data and the water saturation changes are taken into account and equally weighted. In this way, despite a possible increase in the seismic misfit during the iterations, the misfits from both types of data remain low in the final stage. The interaction between the misfits from the two types of data is demonstrated in our test shown in fig. 5.3b.

Figure 5.2: Workflow for joint inversion of PP and PS wave (copied from Deng et al., 2011)



Figure 5.3: a) Objective function of the joint inversion starting from an unconditioned initial reservoir model, with the misfit of the production data (green), the misfit of the water saturation change (red) and the summation of the equally weighed misfits; b) objective function of the joint inversion starting from a reservoir model constrained by the water saturation maps, with the misfit of the production data (green), the misfit of the water saturation change (red) and the summation of the equally weighed misfits.


The flowcharts of the two HM loops are shown in fig. 5.4. Similar to the simultaneous inversion in chapter 4 (fig. 4.2), PCA (Kim, 2002; Echeverria and Mukerji, 2009; Chen et al., 2012) is used here as the model parameterization method. At first, thousands of training images of permeability models are generated. Because most histograms of permeability models follow a log-normal distribution, the training images are converted to logarithmic scale for the PCA. After applying PCA, a small number of eigenvectors carrying the major part of the energy (the sum of the eigenvalues) are selected to represent the prior model space. Instead of trying to estimate the reservoir properties at every grid cell, we update only the coefficient vector p for these principal components. For fast convergence, and in order to obtain a good approximation of the uncertainty, multiple (e.g., 60) independent threads of VFSA are applied to optimize the reservoir models. The loop stops when an acceptably small misfit is reached by each VFSA thread. A similar procedure is employed in the second HM loop except that: 1) the initial models are the output from the first HM loop; 2) the objective function is the summation of the equally weighed misfits from both types of data (a minimal sketch of the two loops is given below). This workflow can also be applied to estimate other reservoir properties and to match other types of data with additional forward modelling after the flow simulation.
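The two sequential loops can be caricatured as follows; the misfit functions below are toy stand-ins (the real objectives of section 5.3 require one flow simulation per evaluation), the annealing update is a simplified Metropolis-style rule rather than the full VFSA perturbation scheme, and the 60 threads and 150 coefficients follow the numbers quoted in this chapter:

    import numpy as np

    rng = np.random.default_rng(0)

    def e_wsat(p):       # toy stand-in for eqns 5.2-5.4 (would call the flow simulator)
        return float(np.mean(p ** 2))

    def e_joint(p):      # equally weighted joint misfit, as in eqns 5.5-5.7
        return 0.5 * float(np.mean(np.abs(p))) + 0.5 * e_wsat(p)

    def anneal(p0, objective, n_iter=60, t0=10.0):
        p, e = p0.copy(), objective(p0)
        for k in range(n_iter):
            T = t0 * np.exp(-0.5 * k ** 0.5)           # fast cooling schedule
            trial = p + 0.01 * T * rng.standard_normal(p.size)
            e_trial = objective(trial)
            if e_trial < e or rng.random() < np.exp(-(e_trial - e) / T):
                p, e = trial, e_trial                  # accept the move
        return p

    # Loop 1: condition the PCA coefficients on the saturation-change maps alone;
    # Loop 2: restart from those models and match both data types together
    seeds = [anneal(rng.standard_normal(150), e_wsat) for _ in range(60)]
    models = [anneal(p0, e_joint) for p0 in seeds]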


Figure 5.4: Novel HM workflow using two loops in a sequential way: in the first loop a) time-lapse seismic related data is applied to constrain the models; the constrained models are used as initial models for the second loop b), where the misfits from both types of data are equally weighed.


5.3 OBJECTIVE FUNCTIONS

To demonstrate the superior performance of joint inversion in terms of model uncertainty and predictability, different objective functions for the different types of data are defined. HM using one type of data alone follows only the first HM loop (fig. 5.4a), substituting the appropriate misfit (objective) function. Assuming a linear error model, the objective functions are defined by using: a) production data alone (the water rate is used here, which is the volume of water produced per day); b) water saturation changes alone; c) both the water rate and the water saturation changes.

a) The normalized objective function using production data alone is defined as (eqn. 5.1):

E_p = \sum |d_p^{ref} - d_p^{syn}| / \sum |d_p^{ref} + d_p^{syn}|,  (5.1)

where E_p is the misfit of the water rate, d_p^{ref} is the simulated water rate from the reference model and d_p^{syn} is the synthetic water rate from the updated reservoir model.

b) Because it is more challenging to match water saturation changes than 1D production data, three objective functions, named Ewsat_pixel, Ewsat_binary and Ewsat_corr2, are defined for a better HM of the water saturation changes in each scenario. The total misfit of water saturation changes is the sum of the misfits from the different scenarios (after two years and after five years).

To calculate Ewsat_pixel, the value of the reference map of water saturation change at each pixel location is subtracted from the value of the synthetic map at the same pixel location; the residuals are then summed pixel by pixel and normalized by the sum of the reference map and the synthetic map (eqn. 5.2). To calculate Ewsat_binary, each standard image of water saturation change is transformed into a binary image by defining a threshold value, e.g. 60%, that allows us to parameterize where the water front is located.

Specifically, if the water saturation change at a given pixel is larger than 60%, the value of the binary map at that pixel is 1 (shown in blue in figure 5.5); otherwise it is 0 (shown in red in figure 5.5). Then the residuals between the synthetic binary image and the reference binary image at each pixel are summed and normalized (eqn. 5.3). Transforming the saturation change maps into a binary space smoothens the surface of the objective function, which may help the optimization algorithm converge faster, though some (acceptable) resolution is lost. To calculate Ewsat_corr2, the correlation coefficient between the reference image and the synthetic image is calculated and subtracted from one (eqn. 5.4).

E_{wsat\_pixel} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} |D_{i,j}^{ref} - D_{i,j}^{syn}|}{\sum_{i=1}^{N}\sum_{j=1}^{M} |D_{i,j}^{ref} + D_{i,j}^{syn}|}    (5.2)

E_{wsat\_binary} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} |B_{i,j}^{ref} - B_{i,j}^{syn}|}{\sum_{i=1}^{N}\sum_{j=1}^{M} |B_{i,j}^{ref} + B_{i,j}^{syn}|}    (5.3)

E_{corr2} = 1 - \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} (D_{i,j}^{ref} - \bar{D}_{ref})(D_{i,j}^{syn} - \bar{D}_{syn})}{\sqrt{\left[\sum_{i=1}^{N}\sum_{j=1}^{M} (D_{i,j}^{ref} - \bar{D}_{ref})^2\right]\left[\sum_{i=1}^{N}\sum_{j=1}^{M} (D_{i,j}^{syn} - \bar{D}_{syn})^2\right]}} ,    (5.4)

where D_{i,j}^{ref} (B_{i,j}^{ref}) is the reference (binary) water saturation change at the pixel location (i, j), D_{i,j}^{syn} (B_{i,j}^{syn}) is the synthetic (binary) water saturation change of the updated model at the pixel location (i, j), \bar{D}_{ref} is the mean of the water saturation changes over all pixels in the reference map, and \bar{D}_{syn} is the mean of the water saturation changes over all pixels in the synthetic map.

c) For the joint inversion in the second HM loop, the misfits from both data types are equally weighted and added together (eqns. 5.5, 5.6 and 5.7):

E_{comb\_pixel} = 0.5\, E_p + 0.5\, (E_{wsat\_pixel\_2} + E_{wsat\_pixel\_5})    (5.5)

E_{comb\_binary} = 0.5\, E_p + 0.5\, (E_{wsat\_binary\_2} + E_{wsat\_binary\_5})    (5.6)

E_{comb\_corr2} = 0.5\, E_p + 0.5\, (E_{corr2\_2} + E_{corr2\_5})    (5.7)

Figure 5.5: The procedure for calculating the objective function Ewsat_binary: standard maps of water saturation changes simulated from the reference model and from an updated model are transformed into binary images by defining a water front, such that a water saturation change of more than 60% is considered water (shown in blue) and less than 60% is considered oil (shown in red). The synthetic binary image is then subtracted from the reference binary image, and Ewsat_binary is calculated by summing the absolute values of all the residuals.
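Taken together, eqns. 5.1-5.7 reduce to a few array operations. A minimal numpy sketch follows; all array names are illustrative, the 60% front threshold follows the text, and the binary misfit assumes at least one map crosses the threshold so the denominator is nonzero.

```python
import numpy as np

def e_prod(d_ref, d_syn):
    """Eqn. 5.1: normalized water-rate misfit over the monthly samples."""
    return np.abs(d_ref - d_syn).sum() / np.abs(d_ref + d_syn).sum()

def e_wsat_pixel(d_ref, d_syn):
    """Eqn. 5.2: normalized pixel-by-pixel residual of the saturation-change maps."""
    return np.abs(d_ref - d_syn).sum() / np.abs(d_ref + d_syn).sum()

def e_wsat_binary(d_ref, d_syn, front=0.60):
    """Eqn. 5.3: the same residual after thresholding at the 60% water front.
    Assumes at least one map contains water, so the denominator is nonzero."""
    b_ref = (d_ref > front).astype(float)
    b_syn = (d_syn > front).astype(float)
    return np.abs(b_ref - b_syn).sum() / np.abs(b_ref + b_syn).sum()

def e_corr2(d_ref, d_syn):
    """Eqn. 5.4: one minus the correlation coefficient of the two maps."""
    return 1.0 - np.corrcoef(d_ref.ravel(), d_syn.ravel())[0, 1]

def e_comb(ep, e2, e5):
    """Eqns. 5.5-5.7: equally weighted production and saturation-change misfits."""
    return 0.5 * ep + 0.5 * (e2 + e5)

# Example on random 67x67 maps standing in for simulated saturation changes:
rng = np.random.default_rng(1)
ref, syn = rng.random((67, 67)), rng.random((67, 67))
print(e_wsat_pixel(ref, syn), e_wsat_binary(ref, syn), e_corr2(ref, syn))
```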


5.4 SYNTHETIC TEST OF HISTORY MATCHING WORKFLOW

5.4.1 Reference model

In the synthetic test, a one-layer reference model with 67×67 pixels (pixel size 15 ft × 15 ft) is built; the pressure and porosity are set constant at each pixel of the reservoir zone and throughout the production period. The reference facies model and the corresponding permeability model are shown in figures 5.6a and 5.6b, respectively. To reduce the number of unknown model parameters, the flow simulation assumes that the permeability in the X-direction (the X- and Y-directions lie in the horizontal plane; the Z-direction is vertical) is equal to the permeability in the Y-direction and is 10 times larger than that in the Z-direction (permx = permy = 10*permz). The injection well and the production well are marked in figure 5.6. Based on the reference model, production data (water rate at the producer) over a five-year history sampled monthly (figure 5.7) and two scenarios of water saturation change maps, after two years (figure 5.8a) and after five years (figure 5.8b), are simulated using Shell's in-house flow simulation tool.
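As a concrete reading of this anisotropy assumption, the three permeability arrays handed to a flow simulator could be set up as follows (the 100 mD background is an illustrative value, not taken from the reference model):

```python
import numpy as np

nx = ny = 67                        # one-layer grid of 15 ft x 15 ft pixels
permx = np.full((nx, ny), 100.0)    # background permeability in mD (illustrative)
permy = permx.copy()                # isotropic in the horizontal plane
permz = permx / 10.0                # permx = permy = 10 * permz
```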


Figure 5.6: a) Reference facies model; b) reference permeability model (PermX)


Figure 5.7: The reference production data (water rate) over a five-year history sampled monthly.


Figure 5.8: The reference maps of water saturation changes after two years (a) and after five years (b) of production.


5.4.2 History matching

To cover a large uncertainty range, thousands of training images (fig. 5.9) of facies models and their corresponding permeability models (permx) with various sand/shale ratios and anisotropy parameters are sampled first. The tools used for sampling the training images and for the forward modelling of the reservoir simulation are internal Shell Oil Company resources and are not discussed in this dissertation. The permeability models are then converted to logarithmic scale for PCA. After PCA, we note that 150 eigenvectors contain more than 85% of the total energy (sum of the eigenvalues); these are selected to represent the prior model space. Selected principal components, in logarithmic scale, are shown in figure 5.10. To reconstruct a permeability model in logarithmic scale, a linear combination of the principal components is employed (eqn. 5.8):

\hat{m}_k = \mu + s\,p ,    (5.8)

where \hat{m}_k is the reconstructed permeability model in logarithmic scale, \mu is the mean of the training images in logarithmic scale, s is the matrix of principal components of size (67*67, 150), with each column representing a vectorized principal component, and p is the coefficient vector of size (150, 1), with each component representing the weight of the corresponding principal component. More background on PCA can be found in section 4.2.
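A sketch of this parameterization, computing the PCA via an SVD of the centered log-permeability matrix, is given below. The lognormal training set is a random stand-in for the facies-based models, so the eigenvector count it reports will differ from the 150 quoted above.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_cells, n_pc = 1000, 67 * 67, 150

# Random stand-in for the training set: log-normal permeability, stored in
# log scale (one flattened 67x67 model per row).
log_perm = np.log10(rng.lognormal(mean=2.0, sigma=1.0, size=(n_train, n_cells)))

mu = log_perm.mean(axis=0)
_, sv, vt = np.linalg.svd(log_perm - mu, full_matrices=False)

energy = np.cumsum(sv**2) / np.sum(sv**2)
print("PCs holding 85% of the energy:", np.searchsorted(energy, 0.85) + 1)

S = vt[:n_pc].T                   # (n_cells, n_pc): one principal component per column
p = rng.standard_normal(n_pc)     # coefficient vector, the target of the VFSA update
m_hat = mu + S @ p                # eqn. 5.8 reconstruction in log scale
perm = (10.0 ** m_hat).reshape(67, 67)   # back to linear permeability
```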

To update the permeability models, 60 threads of VFSA are employed to perturb the coefficient vector p of the 150 principal components, searching for the global minimum of each objective function defined in section 5.3. The initial coefficient vector p is sampled from a Gaussian distribution. After 100 iterations of VFSA, the misfits

reduce by around 90% in the test using production data alone and by around 70-90% in the tests using water saturation data alone. The convergence of the production data (fig. 5.15) is shown together with the prediction over the next seven years in section 5.4.3. To illustrate the convergence of the water saturation changes, the averages of the 60 images of water saturation changes (after two years) before HM and after HM are shown together and compared with the reference water saturation changes after two years (fig. 5.11). Before averaging, each standard (original) image of water saturation changes

(before and after HM) is converted to a binary image by defining a water front (if a pixel has a water saturation change larger than 60%, it is defined as water, shown in blue; otherwise it is defined as oil, shown in red). The training images before HM are sampled randomly; the binary images before HM are therefore quite different from each other, and their mean has a large area of non-integer values between 0 and 1, shown as the (yellow and green) boundary between red (oil) and blue (water). The thick boundary before HM (fig. 5.11a) indicates large uncertainty in the water front, i.e. large variability of the water saturation changes. With a successful HM, the water front is expected to converge to the water front of the reference image (fig. 5.11b). Significant convergence of the water fronts is demonstrated for all three objective functions (fig. 5.11c, 5.11d and 5.11e). However, the converged water front obtained with the correlation-coefficient-based objective function (fig. 5.11e) overshoots the water front of the reference image, i.e. the water saturation changes are overestimated. HM of water saturation changes using the correlation-coefficient-based objective function (Ewsat_corr2) is therefore deemed unsuccessful and is discarded in the joint inversion. In the second HM loop (the joint inversion), the sum of the misfits from both types of data is reduced by 80-90% after 100 iterations. The same comparison

procedure is applied to the images of water saturation changes after HM (fig. 5.12). The shape of the converged water front using both types of data approximates the reference water front better than using water saturation changes alone. Some small features of the water front (marked with the white circles in fig. 5.12c and fig. 5.12d) can be identified using the joint inversion. Due to the large degree of non-uniqueness involved in reservoir modeling and the unavailability of additional constraints, the best-fit permeability models from different

VFSA threads can be quite different from each other, although they share similar flow behavior at certain locations and over certain parts of the history. Selected best-fit permeability models based on the different objective functions are shown together and compared with the reference model (fig. 5.13). The locations of the injection well and the production well are indicated by the black and purple spots, respectively. For uncertainty analysis, the standard deviations (STD) of the output permeability models after HM are compared with those before HM for each objective function (fig.

5.14). HM using production data alone reduces the model uncertainty only along the streamline. By adding the constraint of the water saturation changes, we obtain an overall reduction in the uncertainty.
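The diagnostics behind figures 5.11 and 5.14 are simple ensemble statistics; a sketch, assuming `dsw` holds the 60 simulated saturation-change maps and `models` the 60 conditioned permeability models (random stand-ins here):

```python
import numpy as np

rng = np.random.default_rng(3)
dsw = rng.random((60, 67, 67))      # 60 simulated saturation-change maps (stand-in)
models = rng.random((60, 67, 67))   # 60 conditioned permeability models (stand-in)

# Fig. 5.11-style diagnostic: mean of the binary (water/oil) images. Pixels
# strictly between 0 and 1 form the uncertain band around the water front;
# the thinner the band, the better the front has converged.
front_mean = (dsw > 0.60).mean(axis=0)

# Fig. 5.14-style diagnostic: pixel-wise standard deviation of the ensemble.
perm_std = models.std(axis=0)
print(front_mean.shape, float(perm_std.max()))
```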


Figure 5.9: Selected training images of facies models shown in a), b) and c) and their corresponding permeability models shown in d), e) and f).


[Figure 5.10 panels: the 1st, 5th, 10th, 15th, 20th and 25th principal components, each displayed on the 67×67 grid with a common color scale from 0 to 1.5.]

Figure 5.10: Selected principal components of the permeability models in a logarithmic scale.


Figure 5.11: Comparison of the mean of 60 binary images of the water saturation changes after two years before HM a) and after HM using different objective functions c) Ewsat_pixel, d) Ewsat_binary, e) Ewsat_corr2. The reference binary image of water saturation after two years is shown in b).


Figure 5.12: Comparison of the average of 60 binary images of the water saturation changes after two years before HM a) and after HM using different objective functions c) Ecomb_pixel, d) Ecomb_binary. The reference binary image of water saturation after two years is shown in b).


Figure 5.13: Comparison of the reference permeability model c) with selected best-fit permeability models based on different objective functions: a) Ewsat_pixel, b) Ewsat_binary, d) Ep, e) Ecomb_pixel, f) Ecomb_binary.


Figure 5.14: Comparison of c): the standard deviation of the initial permeability models before HM with the standard deviation of conditioned permeability models after HM using different objective functions: a) Ewsat_pixel, b) Ewsat_binary, d) Ep, e) Ecomb_pixel , f) Ecomb_binary .


5.4.3 Model predictability

To test model predictability, water rates over a seven-year forecast period are simulated from the 60 optimized permeability models derived with each objective function and compared against the reference model before and after HM (fig. 5.15). The spread of the blue curves demonstrates the uncertainty of the prediction. With successful reservoir modeling, the blue curves are expected to converge to the reference production data (red curve) with a narrow spread.
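The width of that spread can be quantified directly from the ensemble; a sketch with random stand-in curves (60 months of history plus 84 months of forecast, sampled monthly):

```python
import numpy as np

rng = np.random.default_rng(4)
# Stand-in for 60 simulated water-rate curves, 144 monthly samples each.
rates = rng.random((60, 144)).cumsum(axis=1)

history_end = 60                       # month index of the black dashed line
forecast = rates[:, history_end:]
spread = forecast.max(axis=0) - forecast.min(axis=0)   # full width of the blue fan
p10, p90 = np.percentile(forecast, [10, 90], axis=0)   # a robust uncertainty band
print(float(spread.mean()), float((p90 - p10).mean()))
```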

Although the permeability models matching the water saturation changes do not necessarily share similar flow behavior at the well location over the five-year history, their flow responses converge in the long term. The prediction based on the permeability models derived from the joint inversion matches the reference production very well and has a narrower spread than that using production data alone.


Figure 5.15: Simulated water rates using different objective functions: a) Ewsat_pixel, b) Ewsat_binary, d) Ep, e) Ecomb_pixel, f) Ecomb_binary, including the five-year history period (before the black dashed line) and the seven-year forecast period (after the black dashed line). Green and blue curves show the simulated water rates before and after history matching, respectively. The red curve is the simulation from the reference model.


5.5 DISCUSSIONS AND CONCLUSIONS

From our results, the following observations and conclusions can be made (Xue et al., 2013a). (1) As expected, production data do little to reduce the uncertainty of the derived permeability images away from the wells, and the result is biased towards high permeability values. In practice, this could lead to severe overestimation of hydrocarbon recovery. (2) Including the time-lapse seismic values, either in the form of pixels or binary maps, effectively reduces the uncertainty in the permeability images, whether or not production data are used. (3) In the joint inversion, our results show that better images are derived using time-lapse seismic in the form of binary maps. This is due to the regularization provided by the binary conditioning image, which highlights its importance in history matching. (4) Overall, the production responses from the posterior ensemble represent the reference signal well, for both the history-matching and prediction periods, when both time-lapse seismic and production data are used. Due to the low temporal resolution of the seismic maps, it is difficult to fully define the arrival time at the wells when only seismic data are used. Finally, our results demonstrate the importance of bringing in additional data to support production-based history matching.


Chapter 6: Conclusions and future work

6.1 CONCLUSIONS

Estimation of reservoir properties from surface measurements constitutes an inverse problem with non-unique solutions. A complete solution usually contains the expected values of the model parameters, models drawn from the posterior distribution, and their uncertainties. To address this problem, a novel stochastic inversion method and workflows are developed in this thesis and applied to seismic inversion and reservoir monitoring. The novel stochastic inversion method, named greedy annealed importance sampling (GAIS), is a hybrid that combines a global optimization method (VFSA) and an independent Monte Carlo method (GIS). GAIS starts from models in the close neighborhood of the important regions of the target distribution, which multiple VFSA (MVFSA) threads locate rapidly in a few iterations, and then explicitly explores the target distribution by grid-searching the high-probability-density regions with a fixed step length along axis-parallel directions. In this thesis, the target distribution is the posterior distribution of the reservoir properties given the measurements. First, GAIS is applied to trace-by-trace seismic inversion for the estimation of the expectations of compressional and shear impedance models and their uncertainties. The data used for testing are 1D synthetic post-stack traces and the 2D post-stack and pre-stack seismic data sets from the HRS demo data. From the tests, I demonstrated that 1: impedance models sampled by GAIS have higher resolution than the deterministic model inverted by HRS, even when low-frequency starting models are used;

149

2: the estimated expected values of the impedance models from GAIS are more accurate and stable than those obtained using VFSA alone, which are contaminated with unrealistic high-frequency components; 3: benefiting from the accurate estimation at each trace location, lateral continuity is improved by GAIS without imposing any trace-to-trace correlation; 4: by grid-walking over the important region of the posterior distribution with a fixed step length, the uncertainty is better quantified by GAIS than by VFSA alone, which underestimates the posterior variance due to biased sampling towards the MAP point.
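The axis-parallel grid walk at the core of GAIS can be illustrated on a toy 2D posterior. The sketch below keeps only the greedy, fixed-step search and omits the importance-weighting step of GIS, so it is a caricature of the method rather than a full implementation; all names are illustrative.

```python
import numpy as np

def log_post(m):
    """Toy unnormalized 2D log-posterior with mode at (1.0, -0.5)."""
    return -0.5 * ((m[0] - 1.0) ** 2 + 4.0 * (m[1] + 0.5) ** 2)

def greedy_axis_walk(m, step=0.1, n_sweeps=50):
    """Greedy grid search along axis-parallel directions with a fixed step,
    starting from a model already near a high-probability region (e.g. the
    best VFSA model). Every visited model is kept for posterior averaging."""
    visited = [m.copy()]
    for _ in range(n_sweeps):
        moved = False
        for axis in range(m.size):
            for sign in (+1.0, -1.0):
                trial = m.copy()
                trial[axis] += sign * step
                if log_post(trial) > log_post(m):   # accept only improvements
                    m, moved = trial, True
                    visited.append(m.copy())
        if not moved:                               # local grid optimum reached
            break
    return np.array(visited)

samples = greedy_axis_walk(np.array([0.0, 0.0]))
print(samples[-1])   # ends near the mode (1.0, -0.5)
```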

A new seismic inversion workflow involving GAIS and an efficient parameterization tool (PCA) is developed and applied to the simultaneous inversion of all (one hundred) traces over the surface locations. Taking advantage of the strong correlation among the pre-conditioned training images, the dimension of the model space is reduced to two hundred after PCA. By perturbing only the coefficient vector of the two hundred principal components, the impedance models along the 2D line are updated simultaneously, with the synthetic seismic profile gradually converging to the observation. Compared with trace-based inversion using GAIS, the simultaneous inversion of all traces shows improved lateral continuity and higher resolution at small scales (thin layers). Consequently, a more confident structural interpretation is possible, such as fault picking and identification of the gas-zone location. A stochastic inversion workflow containing two HM loops is developed and applied to the quantitative integration of production data and time-lapse-seismic-related data (water saturation changes) for reservoir monitoring. It is found that, using the two HM loops, the misfits of both types of data can be reduced to acceptably small values. By

150

comparing the results of the joint inversion with those of the inversion (HM) using one type of data alone, it is demonstrated that 1: HM using production data alone has difficulty reducing the uncertainty in the derived permeability models away from the streamline, and the conditioned models are biased towards high permeability values; 2: although the water saturation changes can effectively reduce the uncertainty of the permeability models, it is hard to fully define the arrival time of the fluid flow at the wells due to the low temporal resolution of time-lapse seismic (related) data;

3: joint inversion of both types of data reduces model uncertainty, improves model predictability and shows superior performance compared to inversion using one type of data alone.

6.2 FUTURE WORK

In this thesis, I use elastic properties (compressional impedance, shear impedance and density) as model parameters for seismic inversion. The inverted impedance and density models can then be used to estimate petrophysical properties, such as porosity, water saturation and clay content, based on theoretical petro-elastic models (Berryman, 1995; Mavko et al., 1998) or empirical models (eqn. 6.1) derived for a specific depositional environment (Koesoemadinata, 2001, 2003). That is,

V_p = F(\phi, C, S, \ln P, f)
V_s = F(\phi, C, S, \ln P, f) ,    (6.1)
\rho = F(\phi, C, S)

where F is a linear operator, \phi is the porosity, C is the clay content, S is the saturation, P is the effective pressure, and f is the frequency. However, such a sequential inversion ignores the

non-uniqueness of seismic inversion and assumes that the impedance and density are accurately inverted, which allows errors from the earlier inversion to propagate. Direct sampling of petrophysical properties in seismic inversion has been the subject of many studies (Mazzotti, 2003; Spikes et al., 2006; Jin et al., 2007; Shahin et al., 2012). Compared with the inversion for elastic properties, the solution is better constrained when searching over petrophysical properties, because the range of possible solutions is significantly reduced: the range for porosity is usually between 0.05 and 0.35, and water saturation ranges from 0 to 1. Results from previous studies show that porosity is the best-resolved property, with less uncertainty than water saturation (Mazzotti, 2003; Bachrach, 2006; Spikes et al., 2007). Direct searching over petrophysical properties is also applicable to the two new approaches developed for seismic inversion in this thesis: trace-based GAIS and PCA-based GAIS for simultaneous inversion. For trace-based inversion using GAIS, we only need to modify the search range and step length for each petrophysical property and choose appropriate petro-elastic models to link the petrophysical properties with the elastic properties. For simultaneous inversion using PCA-based GAIS, we need to generate training images of porosity, water saturation or clay-content models for dimension reduction using PCA.
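Evaluating an empirical model of the form of eqn. 6.1 is straightforward; in the sketch below every coefficient is invented for illustration and would in practice be regressed from core and log data for the specific depositional environment (cf. Koesoemadinata and McMechan, 2001).

```python
import numpy as np

# Illustrative (uncalibrated) linear petro-elastic relations of the form of
# eqn. 6.1: velocities in km/s, density in g/cc, effective pressure in MPa.
def vp(phi, c, s, p_eff, f):
    return 5.0 - 7.0 * phi - 2.0 * c + 0.2 * s + 0.3 * np.log(p_eff) + 1e-4 * f

def vs(phi, c, s, p_eff, f):
    return 3.2 - 4.8 * phi - 1.8 * c - 0.1 * s + 0.2 * np.log(p_eff) + 1e-4 * f

def rho(phi, c, s):
    # Simple mixing: quartz matrix plus water/oil-filled pore space plus clay.
    return 2.65 * (1.0 - phi) + phi * (1.0 * s + 0.8 * (1.0 - s)) + 0.05 * c

print(vp(0.25, 0.1, 0.8, 20.0, 30.0), rho(0.25, 0.1, 0.8))
```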

One of the interesting questions in the quantitative integration of production data and time-lapse seismic (related) data is where to set the meeting point between reservoir engineers and geophysicists, i.e. in which domain to compare the time-lapse seismic data: in the time domain of the seismic traces, or in the domain of inverted impedance or inverted water saturation? Comparison in the seismic domain, which requires only forward modeling rather than an inverted domain, is usually preferred for a better understanding of uncertainty (Landa and Kumar, 2011). However, if the comparison is set at the level of the

seismic domain, high sensitivity of the time-lapse seismic data to variations of the model parameters is desirable (Sagitov and Stephen, 2013). The study by Sagitov and Stephen (2013) demonstrates that, if the changes of pressure and saturation do not result in significant variations of impedance, comparison using the (error-free) synthetic impedance constrains the model parameters better (with higher resolution) than comparison at the seismic amplitude level, owing to the limited resolution of the seismic data. This conclusion, however, does not fully answer the question of which data to compare, because in practice uncertainty is always associated with the inverted properties, especially in seismic inversion. For a better understanding of uncertainty propagation and for choosing an appropriate meeting point, comparison using inverted impedance or saturation models drawn from the posterior distribution, instead of synthetic impedance, may be of interest for future work, including a study of the sensitivities of the model parameters to the errors (uncertainty) of the inverted properties (impedance/saturation).


Bibliography

Ackley, D. H. (1987). A connectionist machine for genetic hillclimbing. Boston: Kluwer Academic Publishers.
Aumann, R. J. (1987). Game theory. The New Palgrave: A Dictionary of Economics, 2, pp. 460-482.
Bachrach, R. (2006). Joint estimation of porosity and saturation using stochastic rock-physics modeling. Geophysics, 71, O53-O63.
Basu, A., and L. N. Frazer. (1990). Rapid determination of critical temperature in simulated annealing inversion. Science, 249, 1409-1412.
Berryman, J. G. (1995). Mixture theories for rock properties. In T. J. Ahrens (Ed.), A handbook of physical constants (pp. 205-228). American Geophysical Union.
Browaeys, T. J. and S. Fomel. (2009). Fractal heterogeneities in sonic logs and low-frequency scattering attenuation. Geophysics, 74, no. 2, WA77-WA92.
Caccia, D. C., D. Percival, M. J. Cannon, G. Raymond, and J. B. Bassingthwaighte. (1997). Analyzing exact fractal time series: Evaluating dispersional analysis and rescaled range methods. Physica A, 246, 609-632.
Carlisle, A., and G. Dozier. (2001). An off-the-shelf PSO. Proceedings of the Workshop on Particle Swarm Optimization, 1-6.
Castro, S. and J. Caers. (2006). A Probabilistic Approach to Integration of Well Log, Geological Information, 3D/4D Seismic and Production Data. 10th European Conference on the Mathematics of Oil Recovery. Amsterdam, The Netherlands.
Castro, S. A. (2007). A probabilistic approach to jointly integrate 3D/4D seismic production data and geological information for building reservoir models. PhD dissertation. Stanford University.
Chen, C. H., L. Jin, G. H., D. Weber, J. C. Vink, D. Hohl, F. Alpak and C. Pirmez. (2012). Assisted History Matching Using Three Derivative-Free Optimization Algorithms. 74th EAGE/SPE Conference and Exhibition. Copenhagen, Denmark.
Clarke, T. J. (1984). Full reconstruction of a layered elastic medium from P-SV slant stack data. Geophysical Journal of the Royal Astronomical Society, 78, 775-793.
Clayton, R. W., and R. H. Stolt. (1981). A Born-WKBJ inversion method for acoustic reflection data. Geophysics, 46, 1559-1567.
Clerc, M. (1999). The swarm and the queen: Towards a deterministic and adaptive particle swarm optimization. Proceedings of the IEEE Congress on Evolutionary Computation, 1951-1957.

Dadashpour, M., M. Landro and J. Kleppe. (2007). Porosity and Permeability Estimation from 4D Seismic Data. EAGE 69th Conference and Exhibition. London, UK.
Davis, L. D. (1991). Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold.
Davis, T. E., and J. C. Principe. (1991). A simulated annealing-like convergence theory for the simple genetic algorithm. Proceedings of the Fourth International Conference on Genetic Algorithms (pp. 174-181).
Deng, Z., M. K. Sen, U. Wang, X., and Y. Xue. (2011). Prestack PP & PS wave joint stochastic inversion in the same PP time scale. SEG Annual Meeting. San Antonio.
Deutsch, C. and A. Journel. (1998). GSLIB: Geostatistical Software Library. Oxford: Oxford University Press.
Dimri, V. P. (1992). Deconvolution and inverse theory: Application to geophysical problems. Elsevier Science Publ. Co., Inc.
Dimri, V. P. (2000). Fractal dimension analysis of soil for flow studies. In V. P. Dimri (Ed.), Application of fractals in earth sciences (pp. 189-193). A. A. Balkema.
Dimri, V. P. (2005). Fractals in geophysics and seismology: An introduction. In V. P. Dimri (Ed.), Fractal behaviour of the earth system (pp. 1-18). Springer.
Echeverria, D. and T. Mukerji. (2009). A robust scheme for spatio-temporal inverse modeling of oil reservoirs. 18th World IMACS/MODSIM Congress. Cairns, Australia.
Emanuel, A. S., G. D. Alameda, R. A. Behrens, and T. A. Hewett. (1987). Reservoir performance prediction methods based on fractal geostatistics. 62nd Annual Technical Conference, SPE, paper 16971.
Evans, M. (1991). Adaptive Importance Sampling and Chaining. In N. Flournoy and R. K. Tsutakawa (Eds.), Statistical Multiple Integration. Providence, RI: American Mathematical Society.
Fatti, J. L., G. C. Smith, P. J. Vail, P. J. Strauss, and P. R. Levitt. (1994). Detection of gas in sandstone reservoirs using AVO analysis. Geophysics, 59, 1362-1376.
Fernández-Martínez, J. L., J. P. Fernández-Álvarez, E. García-Gonzalo, C. O. Menéndez-Pérez, and H. A. Kuzma. (2008). Particle Swarm Optimization (PSO): a simple and powerful algorithm family for geophysical inversion. SEG Expanded Abstracts, 27, 3568-3571.
Forrest, S. (1993). Genetic algorithms: Principles of natural selection applied to computation. Science, 261, 872-878.
Gelderblom, P. and J. Leguijt. (2010). Geological constraints in model-based seismic inversion. SEG Annual Meeting. Denver.

Geman, S., and D. Geman. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans., PAMI-6, 721-741.
Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57, 1317-1339.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, Massachusetts: Addison-Wesley.
Gosselin, O., S. I. Aanonsen, I. Aavatsmark, A. Cominelli, R. Gonard, M. Kolasinski, F. Ferdinandi, L. Kovacic and K. Neylon. (2003). History Matching Using Time-Lapse Seismic. SPE Annual Technical Conference. Denver, Colorado.
Gregory, P. (2005). Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica Support. Cambridge University Press.
Hammersley, J. M. and D. C. Handscomb. (1964). Monte Carlo Methods. London: Chapman & Hall.
Hardy, H. H. (1992). The fractal character of photos of slabbed cores. Mathematical Geology, 24, 73-97.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97-109.
Hewett, T. A. (1986). Fractal distributions of reservoir heterogeneity and their influence on fluid transport. 61st Annual Technical Conference, SPE, paper 15386.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press.
Hong, T. (2008). MCMC algorithm, integrated 4D seismic reservoir characterization and uncertainty analysis in a Bayesian framework. PhD dissertation, Jackson School of Geosciences, The University of Texas at Austin.
Hong, T. and M. K. Sen. (2008). Joint Bayesian inversion for reservoir characterization and uncertainty quantification. SEG Annual Meeting. Las Vegas, Nevada.
Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116, 770-808.
Ingber, L. (1989). Very fast simulated re-annealing. Mathematical and Computer Modelling, 12, 967-973.
Ingber, L. (1993). Simulated annealing: Practice versus theory. Mathl. Comput. Modelling, 18(11), 29-57.
Jin, L., G. H. Gao, J. C. Vink, C. Chen, D. Weber, F. O. Alpak and P. Hoek. (2012). An Improved Inversion Workflow Jointly Assimilating 4D Seismic and Production Data. EAGE/SPE Annual Conference. Copenhagen, Denmark.


Jin, L., M. K. Sen, T. Hong and P. L. Stoffa. (2007). Joint Estimation of Porosity and Saturation by Combining a Rock Physics Model and Constrained Pre-stack Seismic Waveform Inversion. SEG Annual Meeting. San Antonio.
Jin, L., P. L. Stoffa, M. K. Sen, R. K. Seif and A. Sena. (2009). Pilot point parameterization in stochastic inversion for reservoir properties using time-lapse seismic and production data. J. Seis. Explor., 18(1), 1-20.
Kennedy, J. and R. C. Eberhart. (1995). Particle swarm optimization. Proceedings of the IEEE International Conference on Neural Networks (pp. 1942-1948). Piscataway, NJ.
Kim, K. (2002). Face recognition using principal component analysis. http://www.umiacs.umd.edu/~knkim/KG_VISA/PCA/FaceRecog_PCA_Kim.pdf.
Kirkpatrick, S., C. D. Gelatt Jr., and M. P. Vecchi. (1983). Optimization by simulated annealing. Science, 220, 671-680.
Koesoemadinata, A. P., and G. A. McMechan. (2001). Empirical estimation of viscoelastic seismic parameters from petrophysical properties of sandstones. Geophysics, 66, 1457-1470.
Kretz, V., M. L. Ravalec-Dupin, and F. Roggero. (2004). An integrated reservoir characterization study matching production data and 4D seismic. SPE Reservoir Evaluation and Engineering, 7(2), 116-122.
Landa, J. L. and D. Kumar. (2011). Joint inversion of 4D seismic and production data. SPE 146771, SPE Annual Technical Conference and Exhibition. Denver, Colorado.
Landa, J. L., and R. N. Horne. (1997). A procedure to integrate all well test data, reservoir performance history and 4D seismic information into a reservoir description. SPE 38653, SPE Annual Technical Conference and Exhibition.
MacKay, D. (1998). Introduction to Monte Carlo methods. In Learning in Graphical Models. Kluwer.
Mandelbrot, B. B. (1983). The fractal geometry of nature. W. H. Freeman and Co.
Mavko, G., T. Mukerji, and J. Dvorkin. (1998). The rock physics handbook. Cambridge University Press.
Mazzotti, A. and E. Zamboni. (2003). Petrophysical inversion of AVA data. Geophysical Prospecting, 51, 517-530.
Merletti, G. D., J. C. Hlebszevitsch, and C. Torres-Verdín. (2003). Geostatistical inversion for the lateral delineation of thin-layer hydrocarbon reservoirs: a case study in San Jorge Basin, Argentina (Expanded Abstract). SEG 73rd Ann. Internat. Mtg., Dallas, Texas, October 26-31.


Menke, W. (2012). Geophysical Data Analysis: Discrete Inverse Theory. New York: Academic Press.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087-1091.
Metropolis, N., and S. Ulam. (1949). The Monte Carlo method. Journal of the American Statistical Association, 44, 335-341.
Myerson, R. B. (1991). Game theory: analysis of conflict. Harvard University Press. ISBN 978-0-674-34116-6.
Neal, R. M. (1998). Annealed Importance Sampling. Technical Report No. 9805, Department of Statistics, University of Toronto.
Nur, A., and G. Simmons. (1969). The effect of saturation on velocity in low porosity rocks. Earth and Planetary Science Letters, 7, 183-193.
Roggero, F., D. Y., P. Berthet, F. Lefeuvre, P. Perfetti and C. Bordenave. (2007). Constraining reservoir models to production and 4D seismic data - application to the Girassol field, offshore Angola. SPE 109929, SPE Annual Technical Conference and Exhibition. Anaheim, California.
Rothman, D. H. (1985). Nonlinear inversion, statistical mechanics, and residual statics estimation. Geophysics, 50, 2784-2796.
Rubinstein, R. Y. (1981). Simulation and the Monte Carlo Method. New York: John Wiley and Sons.
Sagitov, I. and K. D. Stephen. (2013). Optimizing the Integration of 4D Seismic Data in History Matching: Which Data Should We Compare? SPE 164852, EAGE/SPE Annual Conference and Exhibition. London, United Kingdom.
Sambridge, M. (1999). Geophysical inversion with a neighbourhood algorithm - Searching a parameter space. Geophys. J. Int., 138, 479-494.
Sambridge, M. and G. Drijkoningen. (1992). Genetic algorithm in seismic waveform inversion. Geophys. J. Int., 109, 323-342.
Schuurmans, D. and F. Southey. (2000). Monte Carlo inference via greedy importance sampling. UAI.
Sen, M. K. and P. L. Stoffa. (2013). Global Optimization Methods in Geophysical Inversion. Elsevier Science Publications.
Sen, M. K. and P. L. Stoffa. (1992). Rapid sampling of model space using genetic algorithms: Examples from seismic waveform inversion. Geophysical Journal International, 108, 281-292.
Sen, M. K., and P. L. Stoffa. (1996). Bayesian inference, Gibbs' sampler and uncertainty estimation in geophysical inversion. Geophysical Prospecting, 44, 313-350.

Sen, M. K. and P. L. Stoffa. (1991). Nonlinear one-dimensional seismic waveform inversion using simulated annealing. Geophysics, 56, 1624-1638.
Shahin, A., K. Key, P. Stoffa and R. Tatham. (2012). Petro-elastic modeling for CSEM reservoir characterization and monitoring. Geophysics, 77, E9-E20.
Shaw, R. and S. Srivastava. (2007). Particle swarm optimization: A new tool to invert geophysical data. Geophysics, 72, F75-F83.
Singh, S. C., G. F. West, N. D. Bregman and C. H. Chapman. (1989). Full waveform inversion of reflection data. Journal of Geophysical Research, 94, 1777-1794.
Southey, F., D. Schuurmans, and A. Ghodsi. (2002). Regularized greedy importance sampling. In Advances in Neural Information Processing Systems, 14 (pp. 753-760).
Spikes, K. M., T. Mukerji, J. Dvorkin and G. Mavko. (2007). Probabilistic seismic inversion based on rock-physics models. Geophysics, 72, R87-R97.
Srivastava, R. P. and M. K. Sen. (2009). Fractal-based stochastic inversion of post-stack seismic data using very fast simulated annealing. Journal of Geophysics and Engineering, 6, 412-425.
Srivastava, R. P. and M. K. Sen. (2010). Stochastic inversion of pre-stack seismic data using fractal-based initial models. Geophysics, 75, R47-R59.
Stephen, K. D. and C. MacBeth. (2006). Reducing reservoir prediction uncertainty using seismic history matching. EAGE 68th Conference and Exhibition. Vienna, Austria.
Stephen, K. D., J. Soldo, C. MacBeth, and M. Christie. (2005). Multiple Model Seismic and Production History Matching: A Case Study. SPE 94173, SPE Europec/EAGE Annual Conference. Madrid, Spain.
Stolt, R. H., and A. B. Weglein. (1985). Migration and inversion of seismic data. Geophysics, 50, 2458-2472.
Strebelle, S. (2000). Sequential Simulation Drawing Structures from Training Images. PhD dissertation, Department of Geological and Environmental Sciences, Stanford University.
Suman, A., J. L. Fernandez-Martinez and T. Mukerji. (2011). Joint Inversion of Time-Lapse Seismic and Production Data for Norne Field. SEG Annual Meeting. San Antonio, Texas.
Suzuki, J. (1998). A further result on the Markov chain model of Genetic Algorithms and its applications to a Simulated Annealing-like strategy. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 28(1), 95-102.
Tarantola, A. (2005). Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM.

Tolstukhin, E., B. Lyngnes, and H. H. Sudan. (2012). Ekofisk 4D Seismic - Seismic History Matching Workflow. 74th EAGE/SPE Conference and Exhibition. Copenhagen, Denmark.
Walsh, B. (2004). Markov Chain Monte Carlo and Gibbs Sampling. Lecture Notes.
Wang, Z. and A. Nur. (1988). Seismic velocities in heavy oil and tar sands: The basis for in-situ recovery monitoring. 4th UNITAR/UNDP Conference on Heavy Crude and Tar Sands, paper no. 110.
Weglein, A. B., and S. H. Gray. (1983). The sensitivity of Born inversion to the choice of reference velocity - a simple example. Geophysics, 48, 36-38.
Whitley, D., T. Starkweather and D. Shanner. (1991). The traveling salesman and sequence scheduling: Quality solutions using genetic edge recombination. In L. Davis (Ed.), Handbook of Genetic Algorithms (pp. 350-372). New York: Van Nostrand Reinhold.
Xue, Y., E. Jimenez and L. Jin. (2013a). Integrated Analysis of Reservoir Model Uncertainty and Predictability Using Production and Time-lapse Seismic Data. 75th EAGE Conference & Exhibition incorporating SPE EUROPEC 2013. London, United Kingdom.
Xue, Y., L. Jin and M. K. Sen. (2013b). Application of Principal Component Analysis to Simultaneous Seismic Inversion. 75th EAGE Conference & Exhibition incorporating SPE EUROPEC 2013. London, United Kingdom.
Xue, Y., M. K. Sen and Z. Deng. (2011). A new stochastic inference method for inversion of pre-stack seismic data. 2011 SEG Annual Meeting.
Yagle, A. E., and B. C. Levy. (1983). Application of the Schur algorithm to the inverse problem for a layered acoustic medium. Journal of the Acoustical Society of America, 76, 301-308.
Yagle, A. E., and B. C. Levy. (1985). A layer-stripping solution of the inverse problem for a one-dimensional elastic medium. Geophysics, 50, 425-433.
Ziolkowski, A., J. T. Fokkema, K. Koster, A. Confurius, and R. Vanboom. (1989). Inversion of common mid-point seismic data. In P. L. Stoffa (Ed.), Tau-p: A Plane Wave Approach to the Analysis of Seismic Data (pp. 141-176). Boston: Kluwer Academic Publishers.


Vita

Yang Xue was born in P. R. China. She entered the School of Remote Sensing and Information Engineering at Wuhan University in 2001 and was selected as an exchange student to the University of Stuttgart in 2002. She then enrolled in the Diplom program of the Department of Geodesy and Geoinformatics at the University of Stuttgart in 2003 and obtained her Diplom-Ingenieur degree in 2009. She started her Ph.D. study in the Jackson School of Geosciences at the University of Texas at Austin in the fall of 2009. During the summers of 2010 and 2011, she worked for Shell Oil Company, Houston. Upon graduation, she will join Shell Oil Company, Houston.

Email: [email protected] This dissertation was typed by the author.
