Advances in Bayesian inference and stable optimization for large-scale machine learning problems

Francois Fagan

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences, Columbia University, 2019. © 2018 Francois Fagan. All Rights Reserved.

ABSTRACT

A core task in machine learning, and the topic of this thesis, is developing faster and more accurate methods of posterior inference in probabilistic models. The thesis has two components. The first explores using deterministic methods to improve the efficiency of Markov Chain Monte Carlo (MCMC) algorithms. We propose new MCMC algorithms that can use deterministic methods as a "prior" to bias MCMC proposals to be in areas of high posterior density, leading to highly efficient sampling. In Chapter 2 we develop such methods for continuous distributions, and in Chapter 3 for binary distributions. The resulting methods consistently outperform existing state-of-the-art sampling techniques, sometimes by several orders of magnitude. Chapter 4 uses similar ideas as in Chapters 2 and 3, but in the context of modeling the performance of left-handed players in one-on-one interactive sports.

The second part of this thesis explores the use of stable stochastic gradient descent (SGD) methods for computing a maximum a posteriori (MAP) estimate in large-scale machine learning problems. In Chapter 5 we propose two such methods for softmax regression. The first is an implementation of Implicit SGD (ISGD), a stable but difficult to implement SGD method, and the second is a new SGD method specifically designed for optimizing a double-sum formulation of the softmax. Both methods comprehensively outperform the previous state-of-the-art on seven real world datasets. Inspired by the success of ISGD on the softmax, we investigate its application to neural networks in Chapter 6. In this chapter we present a novel layer-wise approximation of ISGD that has efficiently computable updates. Experiments show that the resulting method is more robust to high learning rates and generally outperforms standard backpropagation on a variety of tasks.

Table of Contents

List of Figures
List of Tables
1 Introduction

I Advances in Bayesian inference: Leveraging deterministic methods for efficient MCMC inference

2 Elliptical Slice Sampling with Expectation Propagation
  2.1 Introduction
  2.2 Expectation propagation and elliptical slice sampling
    2.2.1 Elliptical slice sampling
    2.2.2 Expectation propagation
    2.2.3 Elliptical Slice Sampling with Expectation Propagation
  2.3 Recycled elliptical slice sampling
  2.4 Analytic elliptical slice sampling
    2.4.1 Analytic EPESS for TMG
  2.5 Experimental results
    2.5.1 Probit regression
    2.5.2 Truncated multivariate Gaussian
    2.5.3 Log-Gaussian Cox process
  2.6 Discussion and conclusion
3 Annular Augmentation Sampling
  3.1 Introduction
  3.2 Annular augmentation sampling
    3.2.1 Annular Augmentations
    3.2.2 Generic Sampler
    3.2.3 Operator 2: Rao-Blackwellized Gibbs
    3.2.4 Choice And Effect Of The Prior p̂
  3.3 Experimental results
    3.3.1 2D Ising Model
    3.3.2 Heart Disease Risk Factors
  3.4 Conclusion
4 The Advantage of Lefties in One-On-One Sports
  4.1 Introduction and Related Literature
  4.2 The Latent Skill and Handedness Model
    4.2.1 Medium-Tailed Priors for the Innate Skill Distribution
    4.2.2 Large N Inference Using Only Aggregate Handedness Data
    4.2.3 Interpreting the Posterior of L
  4.3 Including Match-Play and Handedness Data
    4.3.1 The Posterior Distribution
    4.3.2 Inference Via MCMC
  4.4 Using External Rankings and Handedness Data
  4.5 Numerical Results
    4.5.1 Posterior Distribution of L
    4.5.2 On the Importance of Conditioning on Top_{n,N}
    4.5.3 Posterior of Skills With and Without the Advantage of Left-handedness
  4.6 A Dynamic Model for L
  4.7 Conclusions and Further Research

II Advances in numerical stability for large-scale machine learning problems

5 Stable and scalable softmax optimization with double-sum formulations
  5.1 Introduction
  5.2 Convex double-sum formulation
    5.2.1 Instability of ESGD
    5.2.2 Choice of double-sum formulation
  5.3 Stable and scalable SGD methods
    5.3.1 Implicit SGD
    5.3.2 U-max method
  5.4 Experiments
    5.4.1 Experimental setup
    5.4.2 Results
  5.5 Conclusion
  5.6 Extensions
6 Implicit backpropagation
  6.1 Introduction
  6.2 ISGD and related methods
    6.2.1 ISGD method
    6.2.2 Related methods
  6.3 Implicit Backpropagation
  6.4 Implicit Backpropagation updates for various activation functions
    6.4.1 Generic updates
    6.4.2 Relu update
    6.4.3 Arctan update
    6.4.4 Piecewise-cubic function update
    6.4.5 Relative run time difference of IB vs EB measured in flops
  6.5 Experiments
  6.6 Conclusion

III Bibliography

IV Appendices

A Elliptical Slice Sampling with Expectation Propagation
  A.1 Monte Carlo integration
  A.2 Markov Chain Monte Carlo (MCMC)
  A.3 Slice sampling
B Annular Augmentation Sampling
  B.1 Rao-Blackwellization
  B.2 2D-Ising model results
    B.2.1 LBP accuracy
    B.2.2 Benefit of Rao-Blackwellization
    B.2.3 RMSE for node and pairwise marginals
  B.3 Heart disease dataset
  B.4 CMH + LBP sampler
  B.5 Simulated data
    B.5.1 d = 10
C The Advantage of Lefties in One-On-One Sports
  C.1 Proof of Proposition 1
  C.2 Rate of Convergence for Probability of Left-Handed Top Player Given Normal Innate Skills
  C.3 Difference in Skills at Given Quantiles with Laplace Distributed Innate Skills
  C.4 Estimating the Kalman Filtering Smoothing Parameter
D Stable and scalable softmax optimization with double-sum formulations
  D.1 Comparison of double-sum formulations
  D.2 Proof of variable bounds and strong convexity
  D.3 Proof of convergence of U-max method
  D.4 Stochastic Composition Optimization
  D.5 Proof of general ISGD gradient bound
  D.6 Update equations for ISGD
    D.6.1 Single datapoint, single class
    D.6.2 Bound on step size
    D.6.3 Single datapoint, multiple classes
    D.6.4 Multiple datapoints, multiple classes
  D.7 Comparison of double-sum formulations
  D.8 Results over epochs
  D.9 L1 penalty …
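The Implicit SGD method mentioned in the abstract is defined by solving for the new iterate inside its own gradient, θ_{t+1} = θ_t − η ∇ℓ(θ_{t+1}), which damps the step and gives the stability referred to above. Below is a minimal sketch of this generic idea on one-dimensional least squares, where the implicit step has a closed form; it illustrates ISGD in general, not the thesis's softmax or layer-wise updates, and all names are my own.

```python
import numpy as np

def explicit_sgd_step(theta, x, y, lr):
    # Explicit SGD: gradient of 0.5*(theta*x - y)**2 evaluated at the current iterate.
    grad = (theta * x - y) * x
    return theta - lr * grad

def implicit_sgd_step(theta, x, y, lr):
    # Implicit SGD solves theta_new = theta - lr * grad(theta_new).
    # For squared loss this equation is linear in theta_new, giving a closed form:
    #   theta_new = (theta + lr * x * y) / (1 + lr * x**2)
    return (theta + lr * x * y) / (1 + lr * x ** 2)

rng = np.random.default_rng(0)
theta_explicit = theta_implicit = 0.0
for _ in range(100):
    x = rng.normal(scale=5.0)
    y = 2.0 * x + rng.normal()
    theta_explicit = explicit_sgd_step(theta_explicit, x, y, lr=0.5)
    theta_implicit = implicit_sgd_step(theta_implicit, x, y, lr=0.5)

# With this large learning rate the explicit iterate typically diverges,
# while the implicit iterate stays near the true slope of 2.
print(theta_explicit, theta_implicit)
```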
Recommended publications
  • Sampling Methods (Gatsby ML1 2017)
    Probabilistic & Unsupervised Learning: Sampling Methods. Maneesh Sahani [email protected], Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science, University College London. Term 1, Autumn 2017.
    Sampling: for inference and learning we need to compute both posterior distributions (on latents and/or parameters) or predictive distributions, and expectations with respect to these distributions. Both are often intractable. Deterministic approximations on distributions (factored variational / mean-field; BP; EP) or expectations (Bethe / Kikuchi methods) provide tractability, at the expense of a fixed approximation penalty. An alternative is to represent distributions and compute expectations using randomly generated samples. Results are consistent, often unbiased, and precision can generally be improved to an arbitrary degree by increasing the number of samples.
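    As a concrete illustration of the sampling idea summarized above (a minimal sketch assuming a toy standard-normal "posterior"; not taken from the slides), an intractable expectation E[f(x)] is replaced by an average over random draws whose precision improves with the number of samples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "posterior": a standard normal; the target expectation E[x^4] equals 3.
f = lambda x: x ** 4

for n in [10, 1_000, 100_000]:
    samples = rng.normal(size=n)
    estimate = f(samples).mean()                    # Monte Carlo estimate of E[f(x)]
    std_err = f(samples).std(ddof=1) / np.sqrt(n)   # precision improves like 1/sqrt(n)
    print(n, round(estimate, 3), round(std_err, 3))
```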
  • Slice Sampling for General Completely Random Measures
    Slice Sampling for General Completely Random Measures. Peiyuan Zhu, Alexandre Bouchard-Côté, Trevor Campbell. Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z4.
    Abstract: Completely random measures provide a principled approach to creating flexible unsupervised models, where the number of latent features is infinite and the number of features that influence the data grows with the size of the data set. Due to the infinity of the latent features, posterior inference requires either marginalization—resulting in dependence structures that prevent efficient computation via parallelization and conjugacy—or finite truncation, which arbitrarily limits the flexibility of the model. In this paper we present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables, enabling efficient, parallelized computation without sacrificing flexibility.
    …such model can select each category zero or once, the latent categories are called features [1], whereas if each category can be selected with multiplicities, the latent categories are called traits [2]. Consider for example the problem of modelling movie ratings for a set of users. As a first rough approximation, an analyst may entertain a clustering over the movies and hope to automatically infer movie genres. Clustering in this context is limited; users may like or dislike movies based on many overlapping factors such as genre, actor and score preferences. Feature models, in contrast, support inference of these overlapping movie attributes. As the amount of data increases, one may hope to capture increasingly sophisticated patterns. We therefore want the model to increase its complexity accordingly. In our movie example, this means uncovering more and more diverse user preference patterns and movie attributes.
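    To make the feature/trait distinction above concrete (a toy illustration of my own, not from the paper): in a feature model each user–attribute entry is binary, while in a trait model it can be any non-negative count.

```python
import numpy as np

# Rows are users, columns are latent movie attributes (hypothetical example).
# Feature model: each attribute is either active (1) or inactive (0) for a user.
Z_features = np.array([[1, 0, 1],
                       [0, 1, 1]])

# Trait model: each attribute may be selected with multiplicity.
Z_traits = np.array([[2, 0, 5],
                     [0, 3, 1]])
```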
  • Efficient Sampling for Gaussian Linear Regression with Arbitrary Priors
    Efficient sampling for Gaussian linear regression with arbitrary priors. P. Richard Hahn (Arizona State University), Jingyu He (Booth School of Business, University of Chicago), and Hedibert F. Lopes (INSPER Institute of Education and Research). May 18, 2018.
    Abstract: This paper develops a slice sampler for Bayesian linear regression models with arbitrary priors. The new sampler has two advantages over current approaches. One, it is faster than many custom implementations that rely on auxiliary latent variables, if the number of regressors is large. Two, it can be used with any prior with a density function that can be evaluated up to a normalizing constant, making it ideal for investigating the properties of new shrinkage priors without having to develop custom sampling algorithms. The new sampler takes advantage of the special structure of the linear regression likelihood, allowing it to produce better effective sample size per second than common alternative approaches. Keywords: Bayesian computation, linear regression, shrinkage priors, slice sampling. (The authors gratefully acknowledge the Booth School of Business for support.)
    1 Introduction. This paper develops a computationally efficient posterior sampling algorithm for Bayesian linear regression models with Gaussian errors. Our new approach is motivated by the fact that existing software implementations for Bayesian linear regression do not readily handle problems with large numbers of observations (hundreds of thousands) and predictors (thousands). Moreover, existing sampling algorithms for popular shrinkage priors are bespoke Gibbs samplers based on case-specific latent variable representations. By contrast, the new algorithm does not rely on case-specific auxiliary variable representations, which allows for rapid prototyping of novel shrinkage priors outside the conditionally Gaussian framework.
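    The slice-sampling primitive this paper builds on needs only a target density evaluable up to a normalizing constant. Here is a generic univariate stepping-out/shrinkage slice sampler in the style of Neal (2003); it is a sketch of the general technique, not the paper's regression-specific algorithm.

```python
import numpy as np

def slice_sample_1d(log_f, x0, w=1.0, rng=np.random.default_rng()):
    """One slice-sampling update for a 1-D target known up to a constant."""
    log_y = log_f(x0) + np.log(rng.uniform())   # slice height under f(x0)
    # Step out: place an interval of width w around x0 and grow it until it
    # brackets the slice.
    left = x0 - w * rng.uniform()
    right = left + w
    while log_f(left) > log_y:
        left -= w
    while log_f(right) > log_y:
        right += w
    # Shrinkage: sample uniformly in the interval, shrinking on rejections.
    while True:
        x1 = rng.uniform(left, right)
        if log_f(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

# Example: sample from an unnormalized double-exponential density.
log_f = lambda x: -abs(x)
x, draws = 0.0, []
for _ in range(5000):
    x = slice_sample_1d(log_f, x)
    draws.append(x)
```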
  • Approximate Slice Sampling for Bayesian Posterior Inference
    Approximate Slice Sampling for Bayesian Posterior Inference. Christopher DuBois (GraphLab, Inc.), Anoop Korattikara (Dept. of Computer Science, UC Irvine), Max Welling (Informatics Institute, University of Amsterdam), Padhraic Smyth (Dept. of Computer Science, UC Irvine).
    Abstract: In this paper, we advance the theory of large scale Bayesian posterior inference by introducing a new approximate slice sampler that uses only small mini-batches of data in every iteration. While this introduces a bias in the stationary distribution, the computational savings allow us to draw more samples in a given amount of time and reduce sampling variance. We empirically verify on three different models that the approximate slice sampling algorithm can significantly outperform a traditional slice sampler if we are allowed only a fixed amount of computing time for our simulations.
    …features, or as is now popular in the context of deep learning, dropout. We believe deep learning is so successful because it provides the means to train arbitrarily complex models on very large datasets that can be effectively regularized. Methods for Bayesian posterior inference provide a principled alternative to frequentist regularization techniques. The workhorse for approximate posterior inference has long been MCMC, which unfortunately is not (yet) quite up to the task of dealing with very large datasets. The main reason is that every iteration of sampling requires O(N) calculations to decide whether a proposed parameter is accepted or not. Frequentist methods based on stochastic gradients are orders of magnitude faster because they effectively exploit the fact that far away from convergence, the information in the data about how to improve the model is highly redundant.
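    The O(N) bottleneck described above is what the mini-batch approximation attacks: the accept/reject test is evaluated on a random subset of the data rather than the full dataset. A rough, simplified sketch of that general idea under an i.i.d.-data assumption (my own illustration with hypothetical names, not the paper's test):

```python
import numpy as np

def approx_accept(log_lik_per_point, data, theta_prop, theta_curr,
                  log_u, batch_size, rng):
    """Approximate accept/reject test using a random mini-batch.

    The exact test compares sum_i [ll(theta_prop; x_i) - ll(theta_curr; x_i)]
    with the threshold log_u; here that sum is estimated from a mini-batch,
    costing O(batch_size) per iteration instead of O(N), at the price of a
    small bias in the stationary distribution.
    """
    data = np.asarray(data)
    idx = rng.choice(len(data), size=batch_size, replace=False)
    diff = log_lik_per_point(theta_prop, data[idx]) - \
           log_lik_per_point(theta_curr, data[idx])
    estimate = diff.mean() * len(data)   # scale the mini-batch mean up to N points
    return estimate > log_u
```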
  • Sampling Methods (Gatsby ML1 2015)
    Probabilistic & Unsupervised Learning: Sampling Methods. Maneesh Sahani [email protected], Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science, University College London. Term 1, Autumn 2016.
    Sampling: for inference and learning we need to compute both posterior distributions (on latents and/or parameters) or predictive distributions, and expectations with respect to these distributions. Both are often intractable. Deterministic approximations on distributions (factored variational / mean-field; BP; EP) or expectations (Bethe / Kikuchi methods) provide tractability, at the expense of a fixed approximation penalty. An alternative is to represent distributions and compute expectations using randomly generated samples. Results are consistent, often unbiased, and precision can generally be improved to an arbitrary degree by increasing the number of samples.
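    In symbols, the sampling approach described in these slides replaces an intractable expectation by an empirical average whose error shrinks at the usual Monte Carlo rate (a standard identity for i.i.d. samples, stated here for reference rather than taken from the slides):

```latex
\mathbb{E}_{p(x)}[f(x)] \;\approx\; \hat{f}_N = \frac{1}{N}\sum_{n=1}^{N} f\big(x^{(n)}\big),
\qquad x^{(n)} \sim p(x), \qquad
\operatorname{Var}\big[\hat{f}_N\big] = \frac{\operatorname{Var}_{p}[f(x)]}{N}.
```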
  • Computer Intensive Statistics STAT:7400
    Computer Intensive Statistics STAT:7400. Luke Tierney. Spring 2019.
    Introduction: Syllabus and Background. Basics: review the course syllabus at http://www.stat.uiowa.edu/~luke/classes/STAT7400/syllabus.pdf; fill out info sheets (name, field, statistics background, computing background).
    Homework: some problems will cover ideas not covered in class; working together is OK; try to work on your own; write-up must be your own; do not use solutions from previous years; submission by GitHub at http://github.uiowa.edu or by Icon at http://icon.uiowa.edu/.
    Project: find a topic you are interested in; written report plus possibly some form of presentation.
    Ask Questions: ask questions if you are confused or think a point needs more discussion; questions can lead to interesting discussions.
    Computational Tools, Computers and Operating Systems: We will use software available on the Linux workstations in the Mathematical Sciences labs (Schaeffer 346 in particular). Most things we will do can be done remotely by using ssh to log into one of the machines in Schaeffer 346. These machines are l-lnx2xy.stat.uiowa.edu with xy = 00, 01, 02, ..., 19. You can also access the CLAS Linux systems using a browser at http://fastx.divms.uiowa.edu/. This connects you to one of several servers. It is OK to run small jobs on these servers. For larger jobs you should log into one of the lab machines. Most of the software we will use is available free for installing on any Linux, Mac OS X, or Windows computer.
  • Slice Sampling with Multivariate Steps by Madeleine B. Thompson
    Slice Sampling with Multivariate Steps. Madeleine B. Thompson. A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Statistics, University of Toronto. Copyright © 2011 by Madeleine B. Thompson.
    Abstract: Markov chain Monte Carlo (MCMC) allows statisticians to sample from a wide variety of multidimensional probability distributions. Unfortunately, MCMC is often difficult to use when components of the target distribution are highly correlated or have disparate variances. This thesis presents three results that attempt to address this problem. First, it demonstrates a means for graphical comparison of MCMC methods, which allows researchers to compare the behavior of a variety of samplers on a variety of distributions. Second, it presents a collection of new slice-sampling MCMC methods. These methods either adapt globally or use the adaptive crumb framework for sampling with multivariate steps. They perform well with minimal tuning on distributions when popular methods do not. Methods in the first group learn an approximation to the covariance of the target distribution and use its eigendecomposition to take non-axis-aligned steps. Methods in the second group use the gradients at rejected proposed moves to approximate the local shape of the target distribution so that subsequent proposals move more efficiently through the state space. Finally, this thesis explores the scaling of slice sampling with multivariate steps with respect to dimension, resulting in a formula for optimally choosing scale tuning parameters.
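    The covariance/eigendecomposition idea in the abstract can be sketched generically (my own illustration of non-axis-aligned proposals, not the thesis's algorithms): estimate the target's covariance from past samples and step along its eigenvectors, each scaled by the square root of its eigenvalue.

```python
import numpy as np

def non_axis_aligned_step(past_samples, x, scale=1.0, rng=np.random.default_rng()):
    """Propose a step aligned with the estimated covariance of the target."""
    cov = np.cov(np.asarray(past_samples).T)   # empirical covariance estimate
    eigvals, eigvecs = np.linalg.eigh(cov)     # principal directions and scales
    z = rng.normal(size=len(x))
    # Step = combination of eigen-directions, each scaled by sqrt(eigenvalue),
    # i.e. a draw from N(x, scale**2 * cov).
    return x + scale * eigvecs @ (np.sqrt(np.maximum(eigvals, 0.0)) * z)
```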
  • Slice Sampling on Hamiltonian Trajectories
    Slice Sampling on Hamiltonian Trajectories. Benjamin Bloem-Reddy [email protected], John P. Cunningham [email protected]. Department of Statistics, Columbia University.
    Abstract: Hamiltonian Monte Carlo and slice sampling are amongst the most widely used and studied classes of Markov Chain Monte Carlo samplers. We connect these two methods and present Hamiltonian slice sampling, which allows slice sampling to be carried out along Hamiltonian trajectories, or transformations thereof. Hamiltonian slice sampling clarifies a class of model priors that induce closed-form slice samplers. More pragmatically, inheriting properties of slice samplers, it offers advantages over Hamiltonian Monte Carlo, in that it has fewer tunable hyperparameters and does not require gradient information. We demonstrate the utility of Hamiltonian slice sampling out of the box on prob…
    …tonian system, and samples a new point from the distribution by simulating a dynamical trajectory from the current sample point. With careful tuning, HMC exhibits favorable mixing properties in many situations, particularly in high dimensions, due to its ability to take large steps in sample space if the dynamics are simulated for long enough. Proper tuning of HMC parameters can be difficult, and there has been much interest in automating it; examples include Wang et al. (2013); Hoffman & Gelman (2014). Furthermore, better mixing rates associated with longer trajectories can be computationally expensive because each simulation step requires an evaluation or numerical approximation of the gradient of the distribution. Similar in objective but different in approach is slice sampling, which attempts to sample uniformly from the volume under the target density.
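    To ground the cost argument above (each trajectory step needs a gradient evaluation), here is a standard leapfrog integrator for simulating a Hamiltonian trajectory. This is the generic HMC ingredient, not the paper's gradient-free sampler, and the names are my own.

```python
import numpy as np

def leapfrog(grad_log_p, x, p, step_size, n_steps):
    """Simulate Hamiltonian dynamics; one gradient evaluation per step."""
    x, p = np.copy(x), np.copy(p)
    p += 0.5 * step_size * grad_log_p(x)     # half-step for momentum
    for _ in range(n_steps - 1):
        x += step_size * p                   # full position step
        p += step_size * grad_log_p(x)       # full momentum step (gradient call)
    x += step_size * p
    p += 0.5 * step_size * grad_log_p(x)     # final momentum half-step
    return x, p

# Example trajectory on a standard Gaussian target, where grad log p(x) = -x.
x_new, p_new = leapfrog(lambda x: -x, np.zeros(2), np.ones(2), 0.1, 20)
```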
  • A Markov Chain Monte Carlo Version of the Genetic Algorithm Differential Evolution: Easy Bayesian Computing for Real Parameter Spaces
    Stat Comput (2006) 16:239–249. DOI 10.1007/s11222-006-8769-1. A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces. Cajo J. F. Ter Braak. Received: April 2004 / Accepted: January 2006. © Springer Science + Business Media, LLC 2006.
    Abstract: Differential Evolution (DE) is a simple genetic algorithm for numerical optimization in real parameter spaces. In a statistical context one would not just want the optimum but also its uncertainty. The uncertainty distribution can be obtained by a Bayesian analysis (after specifying prior and likelihood) using Markov Chain Monte Carlo (MCMC) simulation. This paper integrates the essential ideas of DE and MCMC, resulting in Differential Evolution Markov Chain (DE-MC). DE-MC is a population MCMC algorithm, in which multiple chains are run in parallel. DE-MC solves an important problem in MCMC, namely that of choosing an appropriate scale and orientation for the jumping distribution. In DE-MC the jumps are simply a fixed multiple of the differences of two random parameter vectors that are currently in the population.
    Keywords: Block updating · Evolutionary Monte Carlo · Metropolis algorithm · Population Markov Chain Monte Carlo · Simulated Annealing · Simulated Tempering · Theophylline Kinetics.
    1. Introduction: This paper combines the genetic algorithm called Differential Evolution (DE) (Price and Storn, 1997, Storn and Price, 1997, Price, 1999) for global optimization over real parameter space with Markov Chain Monte Carlo (MCMC) (Gilks et al., 1996) so as to generate a sample from a target distribution. In Bayesian analysis the target distribution is typically a…
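    The DE-MC jump described in the abstract can be sketched as follows (a simplified illustration of that update rule with my own variable names; the proposal for chain i uses the difference of two other randomly chosen chains, and is then accepted or rejected with the usual Metropolis ratio):

```python
import numpy as np

def de_mc_proposal(population, i, gamma, noise_scale, rng):
    """Differential-Evolution-style proposal for chain i of a population MCMC."""
    n_chains = len(population)
    a, b = rng.choice([j for j in range(n_chains) if j != i], size=2, replace=False)
    # Jump is a fixed multiple of the difference of two other chains' states,
    # plus a small random perturbation to keep the chain irreducible.
    return population[i] + gamma * (population[a] - population[b]) + \
           noise_scale * rng.normal(size=population[i].shape)
```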
  • Slice Sampler Algorithm for Generalized Pareto Distribution
    Slice sampler algorithm for generalized Pareto distribution. Mohammad Rostami, Mohd Bakri Adam Yahya, Mohamed Hisham Yahya, Noor Akma Ibrahim.
    Abstract: In this paper, we developed the slice sampler algorithm for the generalized Pareto distribution (GPD) model. Two simulation studies have shown the performance of the peaks over given threshold (POT) and GPD density function on various simulated data sets. The results were compared with another commonly used Markov chain Monte Carlo (MCMC) technique called Metropolis-Hastings algorithm. Based on the results, the slice sampler algorithm provides closer posterior mean values and shorter 95% quantile based credible intervals compared to the Metropolis-Hastings algorithm. Moreover, the slice sampler algorithm presents a higher level of stationarity in terms of the scale and shape parameters compared with the Metropolis-Hastings algorithm. Finally, the slice sampler algorithm was employed to estimate the return and risk values of investment in Malaysian gold market.
    Keywords: extreme value theory, Markov chain Monte Carlo, slice sampler, Metropolis-Hastings algorithm, Bayesian analysis, gold price. 2000 AMS Classification: 62C12, 62F15, 60G70, 62G32, 97K50, 65C40.
    1. Introduction: The field of extreme value theory (EVT) goes back to 1927, when Fréchet [1] formulated the functional equation of stability for maxima, which later was solved with some restrictions by Fisher and Tippett [2] and finally by Gnedenko [3] and De Haan [4]. There are two main approaches in the modelling of extreme values. First, under certain conditions, the asymptotic distribution of a series of maxima (minima) can be properly approximated by Gumbel, Weibull and Fréchet distributions which have been unified in a generalized form named generalized extreme value (GEV) distribution [5].
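    For reference, the GPD density underlying the peaks-over-threshold model above, in its standard form with scale σ > 0 and shape ξ (stated here for context, not reproduced from the paper), is:

```latex
f(x \mid \sigma, \xi) =
\begin{cases}
\dfrac{1}{\sigma}\left(1 + \dfrac{\xi x}{\sigma}\right)^{-1/\xi - 1}, & \xi \neq 0,\\[6pt]
\dfrac{1}{\sigma}\exp\!\left(-\dfrac{x}{\sigma}\right), & \xi = 0,
\end{cases}
\qquad x \ge 0 \ \ (\text{and } x \le -\sigma/\xi \text{ when } \xi < 0).
```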
  • Sampling Algorithms, from Survey Sampling to Monte Carlo Methods: Tutorial and Literature Review
    Sampling Algorithms, from Survey Sampling to Monte Carlo Methods: Tutorial and Literature Review. Benyamin Ghojogh* [email protected] (Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada); Hadi Nekoei* [email protected] (MILA (Montreal Institute for Learning Algorithms) – Quebec AI Institute, Montreal, Quebec, Canada); Aydin Ghojogh* [email protected]; Fakhri Karray [email protected] (Department of Electrical and Computer Engineering, Centre for Pattern Analysis and Machine Intelligence, University of Waterloo, Waterloo, ON, Canada); Mark Crowley [email protected] (Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada). * The first three authors contributed equally to this work.
    Abstract: This paper is a tutorial and literature review on sampling algorithms. We have two main types of sampling in statistics. The first type is survey sampling which draws samples from a set or population. The second type is sampling from probability distribution where we have a probability density or mass function. In this paper, we cover both types of sampling. First, we review some required background on mean squared error, variance, bias, maximum likelihood estimation, Bernoulli, Binomial, and Hypergeometric distributions, the Horvitz–Thompson estimator, and the Markov property.
    …pendent, we cover importance sampling and rejection sampling. For MCMC methods, we cover Metropolis algorithm, Metropolis-Hastings algorithm, Gibbs sampling, and slice sampling. Then, we explain the random walk behaviour of Monte Carlo methods and more efficient Monte Carlo methods, including Hamiltonian (or hybrid) Monte Carlo, Adler’s overrelaxation, and ordered overrelaxation. Finally, we summarize the characteristics, pros, and cons of sampling methods compared to each other. This paper can be useful for different fields of statistics, machine learning, reinforcement learning, and computa…
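    As a small companion to the importance-sampling portion of the tutorial described above (a generic self-normalized importance sampling sketch on a toy Gaussian example, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Target p(x) known up to a constant (standard normal); proposal q = N(0, 2^2).
log_p = lambda x: -0.5 * x ** 2
log_q = lambda x: -0.5 * (x / 2.0) ** 2 - np.log(2.0)

x = rng.normal(scale=2.0, size=10_000)   # draws from the proposal q
log_w = log_p(x) - log_q(x)              # unnormalized log importance weights
w = np.exp(log_w - log_w.max())
w /= w.sum()                             # self-normalized weights

estimate = np.sum(w * x ** 2)            # estimates E_p[x^2] = 1
print(round(estimate, 3))
```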
  • Multivariate Slice Sampling
    Multivariate Slice Sampling. A Thesis Submitted to the Faculty of Drexel University by Jingjing Lu in partial fulfillment of the requirements for the degree of Doctor of Philosophy, June 2008. © Copyright June 2008, Jingjing Lu. All Rights Reserved.
    Dedications: To my beloved family.
    Acknowledgements: I would like to express my gratitude to my entire thesis committee: Professor Avijit Banerjee, Professor Thomas Hindelang, Professor Seung-Lae Kim, Professor Merrill Liechty, Professor Maria Rieders, who were always available and always willing to help. This dissertation could not have been finished without them. Especially, I would like to give my special thanks to my advisor and friend, Professor Liechty, who offered me stimulating suggestions and encouragement during my graduate studies. I would like to thank Decision Sciences department and all the faculty and staff at LeBow College of Business for their help and caring. I thank my family for their support and encouragement all the time. Thank you for always being there for me. Many thanks also go to all my friends, who share a lot of happiness and dreams with me.
    Table of Contents: List of Figures; List of Tables; Abstract; 1. Introduction; 1.1 Bayesian Computations; 1.2 Monte…