Lecture Notes on Bayesian Nonparametrics Peter Orbanz

Total Pages: 16

File Type: pdf, Size: 1020 KB

Lecture Notes on Bayesian Nonparametrics. Peter Orbanz. Version: May 16, 2014.

These are class notes for a PhD-level course on Bayesian nonparametrics, taught at Columbia University in Fall 2013. This text is a draft. I apologize for the many typos, false statements and poor explanations no doubt still left in the text; I will post corrections as they become available.

Contents

Chapter 1. Terminology
  1.1. Models
  1.2. Parameters and patterns
  1.3. Bayesian and nonparametric Bayesian models
Chapter 2. Clustering and the Dirichlet Process
  2.1. Mixture models
  2.2. Bayesian mixtures
  2.3. Dirichlet processes and stick-breaking
  2.4. The posterior of a Dirichlet process
  2.5. Gibbs-sampling Bayesian mixtures
  2.6. Random partitions
  2.7. The Chinese restaurant process
  2.8. Power laws and the Pitman-Yor process
  2.9. The number of components in a mixture
  2.10. Historical references
Chapter 3. Latent features and the Indian buffet process
  3.1. Latent feature models
  3.2. The Indian buffet process
  3.3. Exchangeability in the IBP
  3.4. Random measure representation of latent feature models
Chapter 4. Regression and the Gaussian process
  4.1. Gaussian processes
  4.2. Gaussian process priors and posteriors
  4.3. Is the definition meaningful?
Chapter 5. Models as building blocks
  5.1. Mixture models
  5.2. Hierarchical models
  5.3. Covariate-dependent models
Chapter 6. Exchangeability
  6.1. Bayesian models and conditional independence
  6.2. Prediction and exchangeability
  6.3. de Finetti's theorem
  6.4. Exchangeable partitions
  6.5. Exchangeable arrays
  6.6. Applications in Bayesian statistics
Chapter 7. Posterior distributions
  7.1. Existence of posteriors
  7.2. Bayes' theorem
  7.3. Dominated models
  7.4. Conjugacy
  7.5. Gibbs measures and exponential families
  7.6. Conjugacy in exponential families
  7.7. Posterior asymptotics, in a cartoon overview
Chapter 8. Random measures
  8.1. Sampling models for random measure priors
  8.2. Random discrete measures and point processes
  8.3. Poisson processes
  8.4. Total mass and random CDFs
  8.5. Infinite divisibility and subordinators
  8.6. Poisson random measures
  8.7. Completely random measures
  8.8. Normalization
  8.9. Beyond the discrete case: General random measures
  8.10. Further references
Appendix A. Poisson, gamma and stable distributions
  A.1. The Poisson
  A.2. The gamma
  A.3. The Dirichlet
  A.4. The stable
Appendix B. Nice spaces
  B.1. Polish spaces
  B.2. Standard Borel spaces
  B.3. Locally compact spaces
Appendix C. Conditioning
  C.1. Probability kernels
  C.2. Conditional probability
  C.3. Conditional random variables
  C.4. Conditional densities
Appendix. Index of definitions
Appendix. Bibliography

CHAPTER 1. Terminology

1.1. Models

The term "model" will be thrown around a lot in the following. By a statistical model on a sample space X, we mean a set of probability measures on X. If we write PM(X) for the space of all probability measures on X, a model is a subset M ⊂ PM(X). The elements of M are indexed by a parameter θ with values in a parameter space T, that is,

  M = {P_θ | θ ∈ T},   (1.1)

where each P_θ is an element of PM(X). (We require, of course, that the set M is measurable in PM(X), and that the assignment θ ↦ P_θ is bijective and measurable.)
We call a model parametric if T has finite dimension (which usually means T ⊂ R^d for some d ∈ N). If T has infinite dimension, M is called a nonparametric model.

To formulate statistical problems, we assume that n observations x_1, ..., x_n with values in X are recorded, which we model as random variables X_1, ..., X_n. In classical statistics, we assume that these random variables are generated i.i.d. from a measure in the model, i.e.

  X_1, ..., X_n ∼_iid P_θ   for some θ ∈ T.   (1.2)

The objective of statistical inference is then to draw conclusions about the value of θ (and hence about the distribution P_θ of the data) from the observations.

Example 1.1 (Parametric and nonparametric density estimation). There is nothing Bayesian about this example: we merely try to illustrate the difference between parametric and nonparametric methods. Suppose we observe data X_1, X_2, ... in R and would like to get an estimate of the underlying density. Consider the following two estimators:

(1) Gaussian fit. We fit a Gaussian density to the data by maximum likelihood estimation.
(2) Kernel density estimator. We again use a Gaussian density function g, in this case as a kernel: for each observation X_i = x_i, we add one Gaussian density with mean x_i to our model. The density estimate is then the density

  p_n(x) := (1/n) Σ_{i=1}^n g(x | x_i, σ).

Intuitively, we are "smoothing" the data by convolution with a Gaussian. (Kernel estimates usually also decrease the variance with increasing n, but we skip details.)

Figure 1.1. Density estimation with Gaussians: maximum likelihood estimation (left) and kernel density estimation (right).

Figure 1.1 illustrates the two estimators. Now compare the number of parameters used by each of the two estimators:

• The Gaussian maximum likelihood estimate has 2 degrees of freedom (mean and standard deviation), regardless of the sample size n. This model is parametric.
• The kernel estimate requires an additional mean parameter for each additional data point. Thus, the number of degrees of freedom grows linearly with the sample size n. Asymptotically, the number of scalar parameters required is infinite, and to summarize them as a vector in a parameter space T, we need an infinite-dimensional space. /
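The following is a minimal sketch, in plain NumPy, of the two estimators from Example 1.1; the synthetic data, the fixed bandwidth, and all function names are illustrative choices, not part of the notes.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def gaussian_mle_fit(data):
    """Parametric estimator: a single Gaussian fitted by maximum likelihood (2 parameters)."""
    mu, sigma = data.mean(), data.std()
    return lambda x: gaussian_pdf(x, mu, sigma)

def kernel_density_estimate(data, bandwidth=0.3):
    """Nonparametric estimator: p_n(x) = (1/n) * sum_i g(x | x_i, bandwidth)."""
    return lambda x: np.mean(gaussian_pdf(x, data, bandwidth))

# Illustrative data: a bimodal sample that a single Gaussian cannot capture.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(1.5, 0.7, 100)])

p_mle = gaussian_mle_fit(data)
p_kde = kernel_density_estimate(data)
print(p_mle(0.0), p_kde(0.0))   # the kernel estimate reflects the dip between the two modes
```

The parametric fit always stores two numbers, whereas the kernel estimate stores one location per observation, which is exactly the growth in degrees of freedom discussed above.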
1.2. Parameters and patterns

A helpful intuition, especially for Bayesian nonparametrics, is to think of θ as a pattern that explains the data. Figure 1.2 (left) shows a simple example, a linear regression problem. The dots are the observed data, which show a clear linear trend. The line is the pattern we use to explain the data; in this case, simply a linear function. Think of θ as this function. The parameter space T is hence the set of linear functions on R. Given θ, the distribution P_θ explains how the dots scatter around the line. Since a linear function on R can be specified using two scalars, an offset and a slope, T can equivalently be expressed as R². Comparing to our definitions above, we see that this linear regression model is parametric.

Now suppose the trend in the data is clearly nonlinear. We could then use the set of all functions on R as our parameter space, rather than just linear ones. Of course, we would usually want a regression function to be continuous and reasonably smooth, so we could choose T as, say, the set of all twice continuously differentiable functions on R. An example function θ, with data generated from it, could then look like the function in Figure 1.2 (right). The space T is now infinite-dimensional, which means the model is nonparametric.

Figure 1.2. Regression problems: linear (left) and nonlinear (right). In either case, we regard the regression function (plotted in blue) as the model parameter.

1.3. Bayesian and nonparametric Bayesian models

In Bayesian statistics, we model the parameter as a random variable: the value of the parameter is unknown, and a basic principle of Bayesian statistics is that all forms of uncertainty should be expressed as randomness. We therefore have to consider a random variable Θ with values in T. We make a modeling assumption on how Θ is distributed, by choosing a specific distribution Q and assuming Q = L(Θ). The distribution Q is called the prior distribution (or prior for short) of the model. A Bayesian model therefore consists of a model M as above, called the observation model, and a prior Q. Under a Bayesian model, data is generated in two stages, as

  Θ ∼ Q
  X_1, X_2, ... | Θ ∼_iid P_Θ .   (1.3)

This means the data is conditionally i.i.d. rather than i.i.d. Our objective is then to determine the posterior distribution, the conditional distribution of Θ given the data,

  Q[Θ ∈ · | X_1 = x_1, ..., X_n = x_n].   (1.4)

This is the counterpart to parameter estimation in the classical approach. The value of the parameter remains uncertain given a finite number of observations, and Bayesian statistics uses the posterior distribution to express this uncertainty.

A nonparametric Bayesian model is a Bayesian model whose parameter space has infinite dimension. To define a nonparametric Bayesian model, we have to define a probability distribution (the prior) on an infinite-dimensional space. A distribution on an infinite-dimensional space T is a stochastic process with paths in T. Such distributions are typically harder to define than distributions on R^d, but we can draw on a large arsenal of tools from stochastic process theory and applied probability.

CHAPTER 2. Clustering and the Dirichlet Process

The first of the basic models we consider is the Dirichlet process, which is used in particular in data clustering. In a clustering problem, we are given observations x_1, ..., x_n, and the objective is to subdivide the sample into subsets, the clusters. The observations within each cluster should be mutually similar, in some sense we have to specify.
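To make the two-stage sampling scheme (1.3) concrete in the clustering setting of this chapter, here is a minimal sketch that generates data from a simple finite Bayesian Gaussian mixture; the number of components, the Dirichlet and Gaussian hyperparameters, and the unit observation variance are illustrative assumptions, not values taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 1: draw the parameter Theta = (weights, means) from the prior Q.
K = 3                                             # number of mixture components (illustrative)
weights = rng.dirichlet(np.ones(K))               # mixture weights ~ Dirichlet(1, ..., 1)
means = rng.normal(loc=0.0, scale=5.0, size=K)    # component means from a Gaussian prior

# Stage 2: given Theta, draw observations conditionally i.i.d. from P_Theta.
n = 10
z = rng.choice(K, size=n, p=weights)              # latent cluster assignments
x = rng.normal(loc=means[z], scale=1.0)           # observations scattered around their cluster means

print(np.round(weights, 2), np.round(means, 2))
print(list(zip(z.tolist(), np.round(x, 2).tolist())))
```

Clustering then amounts to inferring the assignments z (and the component parameters) from x alone; the Dirichlet process discussed in this chapter removes the need to fix K in advance.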
Recommended publications
  • Introduction to Lévy Processes
Introduction to Lévy Processes. Huang Lorick, [email protected]. Document type: lecture notes; typos, errors, and imprecisions are expected, and comments are welcome. This version is available at http://perso.math.univ-toulouse.fr/lhuang/enseignements/. Year of publication: 2021. Terms of use: this work is licensed under a Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
Contents: 1. Introduction and Examples (infinitely divisible distributions; examples of infinitely divisible distributions; the Lévy-Khintchine formula; digression on relativity). 2. Lévy processes (definition of a Lévy process; examples of Lévy processes; exploring the jumps of a Lévy process). 3. Proof of the Lévy-Khintchine formula (the Lévy-Itô decomposition; consequences of the Lévy-Itô decomposition; exercises; discussion). 4. Lévy processes as Markov processes (properties of the semigroup; the generator; recurrence and transience; fractional derivatives). 5. Elements of stochastic calculus with jumps (example of use in applications; stochastic integration; construction of the stochastic integral; quadratic variation and Itô formula with jumps; stochastic differential equations). Bibliography.
Chapter 1, Introduction and Examples: In this introductory chapter, we start by defining the notion of infinitely divisible distributions. We then give examples of such distributions and end the chapter by stating the celebrated Lévy-Khintchine formula.
  • Superprocesses and McKean-Vlasov Equations with Creation of Mass
Superprocesses and McKean-Vlasov equations with creation of mass. L. Overbeck, Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, CA 94720, U.S.A. (On leave from the Universität Bonn, Institut für Angewandte Mathematik, Wegelerstr. 6, 53115 Bonn, Germany; supported by a Fellowship of the Deutsche Forschungsgemeinschaft.)
Abstract. Weak solutions of McKean-Vlasov equations with creation of mass are given in terms of superprocesses. The solutions can be approximated by a sequence of non-interacting superprocesses or by the mean-field of multitype superprocesses with mean-field interaction. The latter approximation is associated with a propagation of chaos statement for weakly interacting multitype superprocesses. Running title: Superprocesses and McKean-Vlasov equations.
1. Introduction. Superprocesses are useful in solving nonlinear partial differential equations of the type Δf = f^{1+β}, β ∈ (0, 1], cf. [Dy]. We now change the point of view and show how they provide stochastic solutions of nonlinear partial differential equations of McKean-Vlasov type, i.e. we want to find weak solutions of

  ∂μ_t/∂t = Σ_{i,j=1}^d ∂²/(∂x_i ∂x_j) (a_ij(x, μ_t) μ_t) + Σ_{i=1}^d ∂/∂x_i (b_i(x, μ_t) μ_t) + c(x, μ_t) μ_t .   (1.1)

A weak solution μ = (μ_t) ∈ C([0, T], M(R^d)) satisfies

  ⟨μ_t, f⟩ = ⟨μ_0, f⟩ + ∫_0^t ⟨μ_s, Σ_{i,j} a_ij ∂²f/(∂x_i ∂x_j) + Σ_i b_i ∂f/∂x_i + c f⟩ ds .

Equation (1.1) generalizes McKean-Vlasov equations of two different types.
  • Birth and Death Process in Mean Field Type Interaction
BIRTH AND DEATH PROCESS IN MEAN FIELD TYPE INTERACTION. Marie-Noémie Thai. arXiv:1510.03238v1 [math.PR], 12 Oct 2015.
Abstract. The aim of this paper is to study the asymptotic behavior of a system of birth and death processes in mean field type interaction in discrete space. We first establish the exponential convergence of the particle system to equilibrium for a suitable Wasserstein coupling distance. The approach provides an explicit quantitative estimate on the rate of convergence. We next prove a uniform propagation of chaos property. As a consequence, we show that the limit of the associated empirical distribution, which is the solution of a nonlinear differential equation, converges exponentially fast to equilibrium. This paper can be seen as a discrete version of the particle approximation of the McKean-Vlasov equations and is inspired by previous works of Malrieu et al. and of Caputo, Dai Pra and Posta. AMS 2000 Mathematics Subject Classification: 60K35, 60K25, 60J27, 60B10, 37A25. Keywords: interacting particle system - mean field - coupling - Wasserstein distance - propagation of chaos.
Contents: 1. Introduction (long time behavior of the particle system; propagation of chaos; long time behavior of the nonlinear process); 2. Proof of Theorem 1.1; 3. Proof of Theorem 1.2; 4. Proof of Theorem 1.5; 5. Appendix; References.
1. Introduction. The concept of mean field interaction arose in statistical physics with Kac [17] and then McKean [21] in order to describe the collisions between particles in a gas, and has later been applied in other areas such as biology or communication networks. A particle system is in mean field interaction when the system acts on one fixed particle through the empirical measure of the system.
  • The Analysis of Marked and Weighted Empirical Processes of Estimated Residuals
The analysis of marked and weighted empirical processes of estimated residuals. Vanessa Berenguer-Rico, Department of Economics, University of Oxford, Oxford, OX1 3UQ, UK, and Mansfield College, Oxford, OX1 3TF, [email protected]. Søren Johansen, Department of Economics, University of Copenhagen, and CREATES, Department of Economics and Business, Aarhus University, [email protected]. Bent Nielsen, Department of Economics, University of Oxford, Oxford, OX1 3UQ, UK, and Nuffield College, Oxford, OX1 1NF, UK, bent.nielsen@nuffield.ox.ac.uk. 29 April 2019.
Summary. An extended and improved theory is presented for marked and weighted empirical processes of residuals of time series regressions. The theory is motivated by 1-step Huber-skip estimators, where a set of good observations is selected using an initial estimator and an updated estimator is found by applying least squares to the selected observations. In this case, the weights and marks represent powers of the regressors and the regression errors, respectively. The inclusion of marks is a non-trivial extension of previous theory and requires refined martingale arguments. Keywords: 1-step Huber-skip; non-stationarity; robust statistics; stationarity.
1. Introduction. We consider marked and weighted empirical processes of residuals from a linear time series regression. Such processes are sums of products of an adapted weight, a mark that is a power of the innovations, and an indicator for the residuals belonging to a half line. They have previously been studied by Johansen & Nielsen (2016a) - JN16 henceforth - generalising results by Koul & Ossiander (1994) and Koul (2002) for processes without marks. The results presented extend and improve upon expansions previously given in JN16, while correcting a mistake in the argument, simplifying proofs and allowing weaker conditions on the innovation distribution and regressors.
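As a rough illustration of the 1-step Huber-skip idea described above (an initial estimator, selection of "good" observations, then least squares on the selected set), here is a toy sketch; the cutoff constant, the crude scale estimate, and the simulated data are arbitrary choices, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data with a few gross outliers.
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=1.0, size=n)
y[:10] += 15.0                                     # contaminate 10 observations

X = np.column_stack([np.ones(n), x])

# Initial estimator: ordinary least squares on all observations.
beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta0
scale = np.median(np.abs(resid)) / 0.6745          # crude robust scale of the residuals

# Keep observations with small residuals, then re-fit by least squares (the 1-step update).
keep = np.abs(resid) <= 2.5 * scale
beta1, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
print(beta0, beta1)                                # the updated fit is far less affected by the outliers
```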
  • A Mathematical Theory of Network Interference and Its Applications
INVITED PAPER. A Mathematical Theory of Network Interference and Its Applications. A unifying framework is developed to characterize the aggregate interference in wireless networks, and several applications are presented. By Moe Z. Win, Fellow IEEE, Pedro C. Pinto, Student Member IEEE, and Lawrence A. Shepp.
ABSTRACT | In this paper, we introduce a mathematical framework for the characterization of network interference in wireless systems. We consider a network in which the interferers are scattered according to a spatial Poisson process and are operating asynchronously in a wireless environment subject to path loss, shadowing, and multipath fading. We start by determining the statistical distribution of the aggregate network interference. We then investigate four applications of the proposed model: 1) interference in cognitive radio networks; 2) interference in wireless packet networks; 3) spectrum of the aggregate radio-frequency emission of wireless networks; and 4) coexistence between ultrawideband and narrowband systems. Our framework accounts for all the essential physical parameters that affect network interference, such as the wireless propagation effects, the transmission technology,
I. INTRODUCTION. In a wireless network composed of many spatially scattered nodes, communication is constrained by various impairments such as the wireless propagation effects, network interference, and thermal noise. The effects introduced by propagation in the wireless channel include the attenuation of radiated signals with distance (path loss), the blocking of signals caused by large obstacles (shadowing), and the reception of multiple copies of the same transmitted signal (multipath fading). The network interference is due to accumulation of signals radiated by other transmitters, which undesirably affect receiver nodes in the network. The thermal noise is introduced by the receiver electronics and is usually modeled as additive white Gaussian noise (AWGN).
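A toy simulation of the setting described above (interferers scattered by a spatial Poisson process, aggregate interference measured at the origin under power-law path loss) might look as follows; the density, region size, path-loss exponent, and fading model are illustrative assumptions only, not the paper's model specification.

```python
import numpy as np

rng = np.random.default_rng(0)

density = 0.1      # interferers per unit area (illustrative)
radius = 100.0     # radius of the observation region around a receiver at the origin
alpha = 3.5        # path-loss exponent (illustrative)

# Number and positions of interferers: a homogeneous spatial Poisson process on a disc.
n = rng.poisson(density * np.pi * radius ** 2)
r = radius * np.sqrt(rng.uniform(size=n))          # distances to the origin, uniform over the disc
fading = rng.exponential(scale=1.0, size=n)        # per-interferer fading / transmit power

# Aggregate interference at the origin: sum of faded powers attenuated by path loss.
aggregate = np.sum(fading * r ** (-alpha))
print(n, aggregate)    # occasional nearby interferers dominate, producing heavy-tailed aggregates
```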
  • Hyperprior on Symmetric Dirichlet Distribution
Hyperprior on symmetric Dirichlet distribution. Jun Lu, Computer Science, EPFL, Lausanne, [email protected]. November 5, 2018.
Abstract. In this article we introduce how to put a vague hyperprior on the Dirichlet distribution, and we update its parameter by adaptive rejection sampling (ARS). Finally we analyze this hyperprior in an over-fitted mixture model in some synthetic experiments.
1. Introduction. It has become popular to use over-fitted mixture models in which the number of clusters K is chosen as a conservative upper bound on the number of components, under the expectation that only a relatively small number K_0 of the components will be occupied by data points in the sample X. This kind of over-fitted mixture model has been successful due to the ease of computation. Previously, Rousseau & Mengersen (2011) proved that, quite generally, the posterior behaviour of overfitted mixtures depends on the chosen prior on the weights and on the number of free parameters in the emission distributions (here D, i.e. the dimension of the data). Specifically, they proved that (a) if min(α_k, k ≤ K) > D/2 and the number of components is larger than it should be, then asymptotically two or more components in an overfitted mixture model will tend to merge with non-negligible weights; (b) in contrast, if max(α_k, k ≤ K) < D/2, the extra components are emptied at a rate of N^{−1/2}. Hence, if none of the components are small, this implies that K is probably not larger than K_0. In the intermediate case, if min(α_k, k ≤ K) ≤ D/2 ≤ max(α_k, k ≤ K), then the situation varies depending on the α_k's and on the difference between K and K_0.
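As a toy illustration of the role of the concentration parameter (not a reproduction of the paper's posterior analysis), the sketch below draws symmetric Dirichlet weight vectors for a small and a large α and counts how many components carry non-negligible weight; the truncation K, the threshold, and the number of draws are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 20            # conservative upper bound on the number of components
threshold = 0.01  # weights below this are treated as "empty" (arbitrary cutoff)

for alpha in (0.05, 5.0):
    w = rng.dirichlet(np.full(K, alpha), size=1000)        # 1000 draws of a K-vector of weights
    occupied = (w > threshold).sum(axis=1).mean()          # average number of non-negligible weights
    print(f"alpha = {alpha:4.2f}: on average {occupied:5.1f} of {K} weights exceed {threshold}")
```

Small α concentrates the prior mass on a few components, which is the mechanism the over-fitted mixture argument exploits.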
  • Analysis of Error Propagation in Particle Filters with Approximation
The Annals of Applied Probability, 2011, Vol. 21, No. 6, 2343-2378. DOI: 10.1214/11-AAP760. © Institute of Mathematical Statistics, 2011.
ANALYSIS OF ERROR PROPAGATION IN PARTICLE FILTERS WITH APPROXIMATION. By Boris N. Oreshkin and Mark J. Coates, McGill University.
This paper examines the impact of approximation steps that become necessary when particle filters are implemented on resource-constrained platforms. We consider particle filters that perform intermittent approximation, either by subsampling the particles or by generating a parametric approximation. For such algorithms, we derive time-uniform bounds on the weak-sense L_p error and present associated exponential inequalities. We motivate the theoretical analysis by considering the leader node particle filter and present numerical experiments exploring its performance and the relationship to the error bounds.
1. Introduction. Particle filters have proven to be an effective approach for addressing difficult tracking problems [8]. Since they are more computationally demanding and require more memory than most other filtering algorithms, they are really only a valid choice for challenging problems, for which other well-established techniques perform poorly. Such problems involve dynamics and/or observation models that are substantially nonlinear and non-Gaussian. A particle filter maintains a set of "particles" that are candidate state values of the system (e.g., the position and velocity of an object). The filter evaluates how well individual particles correspond to the dynamic model and set of observations, and updates weights accordingly. The set of weighted particles provides a pointwise approximation to the filtering distribution, which represents the posterior probability of the state.
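For context, the basic (unapproximated) algorithm the paper starts from can be sketched as a bootstrap particle filter for a toy linear-Gaussian model; the model, noise scales, and particle count below are illustrative, and none of the paper's subsampling or parametric-approximation steps are implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy state-space model: x_t = 0.9 x_{t-1} + process noise, y_t = x_t + observation noise.
T, N = 50, 500
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal(scale=1.0)
y = x_true + rng.normal(scale=0.5, size=T)

particles = rng.normal(size=N)          # initial particle cloud
filter_means = []
for t in range(T):
    particles = 0.9 * particles + rng.normal(scale=1.0, size=N)   # propagate through the dynamics
    log_w = -0.5 * ((y[t] - particles) / 0.5) ** 2                # observation log-likelihoods
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    filter_means.append(np.sum(w * particles))                    # weighted-particle estimate of the state
    particles = particles[rng.choice(N, size=N, p=w)]             # resample to avoid weight degeneracy

print(filter_means[-1], x_true[-1])
```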
  • Introduction to Lévy Processes
Introduction to Lévy processes. Graduate lecture, 22 January 2004. Matthias Winkel, Departmental Lecturer (Institute of Actuaries and Aon Lecturer in Statistics). Outline: 1. Random walks and continuous-time limits; 2. Examples; 3. Classification and construction of Lévy processes; 4. Examples; 5. Poisson point processes and simulation.
1. Random walks and continuous-time limits. Definition 1. Let Y_k, k ≥ 1, be i.i.d. Then S_n = Σ_{k=1}^n Y_k, n ∈ N, is called a random walk. Random walks have stationary and independent increments Y_k = S_k − S_{k−1}, k ≥ 1. Stationarity means the Y_k have identical distribution. Definition 2. A right-continuous process X_t, t ∈ R_+, with stationary independent increments is called a Lévy process.
What are S_n, n ≥ 0, and X_t, t ≥ 0? Stochastic processes; mathematical objects, well-defined, with many nice properties that can be studied. If you don't like this, think of a model for a stock price evolving with time. There are also many other applications. If you worry about negative values, think of logs of prices. What does Definition 2 mean? The increments X_{t_k} − X_{t_{k−1}}, k = 1, ..., n, are independent and X_{t_k} − X_{t_{k−1}} ∼ X_{t_k − t_{k−1}}, k = 1, ..., n, for all 0 = t_0 < ... < t_n. Right-continuity refers to the sample paths (realisations).
Can we obtain Lévy processes from random walks? What happens, e.g., if we let the time unit tend to zero, i.e. take a more and more remote look at our random walk? If we focus on a fixed time, 1 say, and speed up the process so as to make n steps per time unit, we know what happens; the answer is given by the Central Limit Theorem: Theorem 1 (Lindeberg-Lévy). If σ² = Var(Y_1) < ∞, then (S_n − E(S_n))/√n → Z ∼ N(0, σ²) in distribution, as n → ∞.
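A small simulation of Definition 1 and Theorem 1 (with an arbitrary unit-variance increment distribution and illustrative sample sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random walk S_n = Y_1 + ... + Y_n with i.i.d. increments (here: centred exponential, variance 1).
n, reps = 5_000, 1_000
Y = rng.exponential(scale=1.0, size=(reps, n)) - 1.0
S_n = Y.sum(axis=1)

# Lindeberg-Lévy CLT: (S_n - E S_n) / sqrt(n) is approximately N(0, sigma^2) for large n.
Z = S_n / np.sqrt(n)          # here E S_n = 0 and sigma^2 = 1
print(Z.mean(), Z.var())      # should be close to 0 and 1 respectively
```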
  • FCLTs for the Quadratic Variation of a CTRW and for Certain Stochastic Integrals
FCLTs for the Quadratic Variation of a CTRW and for certain stochastic integrals. Noèlia Viles Cuadros (joint work with Enrico Scalas), Universitat de Barcelona. Sevilla, 17 September 2013.
Damped harmonic oscillator subject to a random force. The equation of motion is informally given by

  ẍ(t) + γ ẋ(t) + k x(t) = ξ(t),   (1)

where x(t) is the position of the oscillating particle with unit mass at time t, γ > 0 is the damping coefficient, k > 0 is the spring constant and ξ(t) represents white Lévy noise. [I. M. Sokolov, Harmonic oscillator under Lévy noise: unexpected properties in the phase space. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 83, 041118 (2011).]
The formal solution is

  x(t) = F(t) + ∫_{−∞}^t G(t − t') ξ(t') dt',   (2)

where G(t) is the Green function for the homogeneous equation. The solution for the velocity component can be written as

  v(t) = F_v(t) + ∫_{−∞}^t G_v(t − t') ξ(t') dt',   (3)

where F_v(t) = dF(t)/dt and G_v(t) = dG(t)/dt.
• Replace the white noise with a sequence of instantaneous shots of random amplitude at random times.
• They can be expressed in terms of the formal derivative of a compound renewal process, a random walk subordinated to a counting process, called a continuous-time random walk.
A continuous-time random walk (CTRW) is a pure jump process given by a sum of i.i.d. random jumps {Y_i}_{i∈N} separated by i.i.d. random waiting times (positive random variables) {J_i}_{i∈N}.
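A minimal simulation of a CTRW as just defined (i.i.d. jumps separated by i.i.d. positive waiting times); the exponential waiting times and Gaussian jumps are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_jumps = 1_000
waits = rng.exponential(scale=1.0, size=n_jumps)   # i.i.d. positive waiting times J_i
jumps = rng.normal(scale=1.0, size=n_jumps)        # i.i.d. jumps Y_i
jump_times = np.cumsum(waits)                      # epochs of the underlying counting process

def ctrw(t):
    """Value of the CTRW at time t: the sum of all jumps that have occurred by time t."""
    return jumps[jump_times <= t].sum()

print(ctrw(10.0), ctrw(100.0))
```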
  • Lévy Processes
LÉVY PROCESSES, STABLE PROCESSES, AND SUBORDINATORS. Steven P. Lalley.
1. Definitions and Examples. Definition 1.1. A continuous-time process X_t = {X(t)}_{t≥0} with values in R^d (or, more generally, in an abelian topological group G) is called a Lévy process if (1) its sample paths are right-continuous and have left limits at every time point t, and (2) it has stationary, independent increments, that is: (a) for all 0 = t_0 < t_1 < ... < t_k, the increments X(t_i) − X(t_{i−1}) are independent; (b) for all 0 ≤ s ≤ t, the random variables X(t) − X(s) and X(t − s) − X(0) have the same distribution. The default initial condition is X_0 = 0. A subordinator is a real-valued Lévy process with nondecreasing sample paths. A stable process is a real-valued Lévy process {X_t}_{t≥0} with initial value X_0 = 0 that satisfies the self-similarity property

  X_t / t^{1/α} =_D X_1   for all t > 0.   (1.1)

The parameter α is called the exponent of the process.
Example 1.1. The most fundamental Lévy processes are the Wiener process and the Poisson process. The Poisson process is a subordinator, but is not stable; the Wiener process is stable, with exponent α = 2. Any linear combination of independent Lévy processes is again a Lévy process, so, for instance, if the Wiener process W_t and the Poisson process N_t are independent then W_t − N_t is a Lévy process. More important, linear combinations of independent Poisson processes are Lévy processes: these are special cases of what are called compound Poisson processes: see sec.
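A sketch of the two fundamental examples on a discrete time grid, together with the linear combination W_t − N_t mentioned in Example 1.1; the grid resolution and the unit Poisson rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, steps = 10.0, 10_000
dt = T / steps

# Wiener process: stationary, independent Gaussian increments (stable with exponent alpha = 2).
W = np.cumsum(rng.normal(scale=np.sqrt(dt), size=steps))

# Poisson process with rate 1: stationary, independent Poisson increments (a subordinator).
N = np.cumsum(rng.poisson(lam=dt, size=steps))

# A linear combination of independent Lévy processes is again a Lévy process.
X = W - N
print(W[-1], N[-1], X[-1])
```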
  • Financial Modeling with Lévy Processes and Applying Lévy Subordinator to Current Stock Data
FINANCIAL MODELING WITH LÉVY PROCESSES AND APPLYING LÉVY SUBORDINATOR TO CURRENT STOCK DATA, by Gonsalge Almeida. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Dissertation Advisor: Dr. Wojbor A. Woyczynski, Department of Mathematics, Applied Mathematics and Statistics, Case Western Reserve University, January 2020.
Case Western Reserve University, School of Graduate Studies. We hereby approve the dissertation of Gonsalge Almeida, candidate for the Doctor of Philosophy degree. Committee Chair: Dr. Wojbor Woyczynski, Professor, Department of Mathematics, Applied Mathematics and Statistics. Committee: Dr. Alethea Barbaro, Associate Professor, Department of Mathematics, Applied Mathematics and Statistics. Committee: Dr. Jenny Brynjarsdottir, Associate Professor, Department of Mathematics, Applied Mathematics and Statistics. Committee: Dr. Peter Ritchken, Professor, Weatherhead School of Management. Acceptance date: June 14, 2019. We also certify that written approval has been obtained for any proprietary material contained therein.
Contents: List of Figures; List of Tables; Introduction. 1. Financial Modeling with Lévy Processes and Infinitely Divisible Distributions (introduction; preliminaries on Lévy processes; characteristic functions; cumulant generating function; α-stable distributions; tempered stable distribution and process; tempered stable diffusion and super-diffusion; numerical approximation of stable and tempered stable sample paths; Monte Carlo simulation for tempered α-stable Lévy processes). 2. Brownian Subordination (Tempered Stable Subordinator) (introduction; tempered anomalous subdiffusion; subordinators; time-changed Brownian motion).
  • A Dirichlet Process Mixture Model of Discrete Choice (arXiv:1801.06296v1)
A Dirichlet Process Mixture Model of Discrete Choice. 19 January 2018. Rico Krueger (corresponding author), Research Centre for Integrated Transport Innovation, School of Civil and Environmental Engineering, UNSW Australia, Sydney NSW 2052, Australia, [email protected]. Akshay Vij, Institute for Choice, University of South Australia, 140 Arthur Street, North Sydney NSW 2060, Australia, [email protected]. Taha H. Rashidi, Research Centre for Integrated Transport Innovation, School of Civil and Environmental Engineering, UNSW Australia, Sydney NSW 2052, Australia, [email protected]. arXiv:1801.06296v1 [stat.AP], 19 Jan 2018.
Abstract. We present a mixed multinomial logit (MNL) model, which leverages the truncated stick-breaking process representation of the Dirichlet process as a flexible nonparametric mixing distribution. The proposed model is a Dirichlet process mixture model and accommodates discrete representations of heterogeneity, like a latent class MNL model. Yet, unlike a latent class MNL model, the proposed discrete choice model does not require the analyst to fix the number of mixture components prior to estimation, as the complexity of the discrete mixing distribution is inferred from the evidence. For posterior inference in the proposed Dirichlet process mixture model of discrete choice, we derive an expectation maximisation algorithm. In a simulation study, we demonstrate that the proposed model framework can flexibly capture differently-shaped taste parameter distributions. Furthermore, we empirically validate the model framework in a case study on motorists' route choice preferences and find that the proposed Dirichlet process mixture model of discrete choice outperforms a latent class MNL model and mixed MNL models with common parametric mixing distributions in terms of both in-sample fit and out-of-sample predictive ability.
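The truncated stick-breaking construction referred to in the abstract can be sketched in a few lines; the concentration parameter and the truncation level below are illustrative, not the paper's settings.

```python
import numpy as np

def truncated_stick_breaking(alpha, K, rng):
    """Weights of a Dirichlet process mixture truncated at K components (stick-breaking)."""
    betas = rng.beta(1.0, alpha, size=K)                   # stick-breaking proportions
    betas[-1] = 1.0                                        # truncation: the last break uses the rest of the stick
    remaining = np.cumprod(np.concatenate(([1.0], 1.0 - betas[:-1])))
    return betas * remaining                               # w_k = beta_k * prod_{j<k} (1 - beta_j)

rng = np.random.default_rng(0)
w = truncated_stick_breaking(alpha=1.0, K=25, rng=rng)
print(np.round(w, 3), w.sum())                             # the weights sum to one
```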