MONTE CARLO METHODS FOR STRUCTURED DATA
A DISSERTATION SUBMITTED TO THE INSTITUTE FOR COMPUTATIONAL AND MATHEMATICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
Adam Guetz January 2012
© 2012 by Adam Nathan Guetz. All Rights Reserved. Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/rg833nw3954
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Susan Holmes, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Amin Saberi, Co-Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Peter Glynn
Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
Recent years have seen an increased need for modeling of rich data across many engineering and scientific disciplines. Much of this data contains structure, or non-trivial relationships between elements, that should be exploited when performing statistical inference. Sampling from and fitting complicated models present challenging computational issues, and available deterministic heuristics may be ineffective. Monte Carlo methods present an attractive framework for finding approximate solutions to these problems. This thesis covers two closely related techniques: adaptive importance sampling and sequential Monte Carlo. Both of these methods make use of sampling-importance resampling to generate approximate samples from distributions of interest. Sequential importance sampling is well known to have difficulties in high-dimensional settings. I present a technique called conditional sampling-importance resampling, an extension of sampling-importance resampling to conditional distributions that improves performance, particularly when independence structure is present. The primary application is to multi-object tracking for a colony of harvester ants in a laboratory setting. Previous approaches tend to make simplifying parametric assumptions on the model in order to make computations more tractable, while the approach presented finds approximate solutions to more complicated and realistic models. To analyze structural properties of networks, I expand adaptive importance sampling techniques to the analysis of network growth models such as preferential attachment, using the Plackett-Luce family of distributions on permutations, and I present an application of sequential Monte Carlo to a special form of network growth model called vertex censored stochastic Kronecker product graphs.
Acknowledgements
I’d like to thank my wife Heidi Lubin, my son Levi, my principal advisor Susan Holmes, my co-advisor Amin Saberi, my parents, and all of my friends and extended family.
Contents
Abstract iv
Acknowledgements v
1 Introduction 1
1.1 Monte Carlo Integration 2
1.2 Applications 3
2 Approximate Sampling 6
2.1 Importance Sampling 7
2.1.1 Effective Sample Size 9
2.1.2 Sampling Importance Resampling 11
2.2 Markov Chain Monte Carlo 13
2.2.1 Markov Chains 13
2.2.2 Metropolis Hastings 14
2.2.3 Gibbs Sampler 14
2.2.4 Data Augmentation 16
2.2.5 Hit-and-Run 16
3 Sequential Monte Carlo 18
3.1 Sequential Models 18
3.2 Sequential Importance Sampling 20
3.3 Particle Filter 22
4 Adaptive Importance Sampling 24
4.1 Background 24
4.1.1 Variance Minimization 25
4.1.2 Cross-Entropy Method 26
4.2 Avoiding Degeneracy 28
4.3 Related Methods 30
4.3.1 Annealed Importance Sampling 30
4.3.2 Population Monte Carlo 31
5 Conditional Sampling Importance Resampling 33
5.1 Motivation 33
5.2 Conditional Resampling 34
5.2.1 Estimating Marginal Importance Weights 36
5.2.2 Conditional Effective Sample Size 36
5.2.3 Importance Weight Accounting 37
5.3 Example: Multivariate Normal 38
6 Multi-Object Particle Tracking 43
6.1 Background 43
6.1.1 Single Object Tracking 43
6.1.2 Multi Object Tracking 45
6.1.3 Tracking Notation 46
6.2 Conditional SIR Particle Tracking 47
6.2.1 Grouping Subsets for Multi-Object Tracking 48
6.3 Application: Tracking Harvester Ants 49
6.3.1 Object Detection 49
6.3.2 Observation Model 51
6.3.3 State-Space Model 53
6.3.4 Importance Distribution 54
6.3.5 Computing Relative and Marginal Importance Weights 62
6.4 Empirical Results 64
6.4.1 Simulated Data 64
6.4.2 Short Harvester Ant Video 65
7 Network Growth Models 70
7.1 Background 71
7.1.1 Erdős–Rényi 73
7.1.2 Preferential Attachment 73
7.1.3 Duplication/Divergence 75
7.2 Computing Likelihoods with Adaptive Importance Sampling 75
7.2.1 Marginalizing Vertex Ordering 78
7.2.2 Plackett-Luce Model as an Importance Distribution 79
7.2.3 Choice of Description Length Function 80
7.3 Examples 81
7.3.1 Modified Preferential Attachment Model 81
7.3.2 Adaptive Importance Sampling 82
7.3.3 Annealed Importance Sampling 82
7.3.4 Computational Effort 83
7.3.5 Numerical Results 84
8 Kronecker Product Graphs 91
8.1 Motivation 92
8.2 Stochastic Kronecker Product Graph Model 94
8.2.1 Likelihood under Stochastic Kronecker Product Graph Model 94
8.2.2 Sampling Permutations 96
8.2.3 Computing Gradients 96
8.3 Vertex Censored Stochastic Kronecker Product Graphs 97
8.3.1 Importance Sampling for Likelihoods 98
8.3.2 Choosing Censored Vertices 100
8.3.3 Sampling Permutations 100
8.3.4 Multiplicative Attribute Graphs 101
8.4 Empirical Results 101
8.4.1 Implementation 102
List of Tables
6.1 Observation event types. 53
7.1 Comparison of estimators for sparse 500 node preferential attachment dataset from Figure 7.1 84
7.2 Comparison of estimators for dataset: 5 networks, 30 nodes each, average degree 2, 20 samples each method 86
7.3 Comparison of estimators for dataset: 2 networks, 100 nodes each, average degree 2 86
7.4 Estimated log-likelihoods for Mus musculus protein-protein interaction networks 87
List of Figures
3.1 Dependence structure of hidden Markov models 19
5.1 CSIR Normal example: eigenvalues of covariance matrices 40
5.2 CSIR Normal example: estimated KL-divergences 41
5.3 Same experiments as in Figure 5.2, plotted by method. 42
6.1 Example grouping subset functions 49
6.2 Blob bisection via spectral partitioning 52
6.3 Association of objects with observations. 'Events' correspond to connected components in this bipartite graph, including normal observations, splitting, merging, false positives, false negatives, and joint events. 57
6.4 "True" distribution of path lengths and trajectories per frame, simulated example. 66
6.5 Centroid observations per frame, simulated example. 66
6.6 Distribution of path lengths and trajectories per frame using a sample from the importance distribution, simulated example. 67
6.7 Distribution of path lengths and trajectories per frame using CSIR, simulated example. 67
6.8 GemVident screenshot, showing centroids. 68
6.9 Centroid observations per frame from harvester ant example. 68
6.10 Distribution of path lengths and trajectories per frame using a sample from the importance distribution, harvester ant example. 69
6.11 Distribution of path lengths and trajectories per frame using CSIR, harvester ant example. 69
7.1 Example runs comparing annealed importance sampling and adaptive importance sampling 85
7.2 Likelihoods and importance weights for cross-entropy method. 88
7.3 Mus musculus (common mouse) PPI network. 89
7.4 Convergence of adaptive importance sampling and annealed importance sampling for Mus musculus PPI network. 90
8.1 Comparison of crude and SIS Monte Carlo for Kronecker graph likelihoods. 103
8.2 Comparison of SKPG and VCSKPG models for AS-ROUTEVIEWS graph 104
Chapter 1
Introduction
Contemporary data analysis often involves information with complicated and high-dimensional relationships between elements. Traditional, deterministic analytic techniques are often unable to directly cope with the computational challenge, and must make simplifying assumptions or heuristic approximations. An attractive alternative is the suite of randomized methods known as Monte Carlo. The types of problems examined in this thesis often contain both discrete and continuous components, and can generally be expressed as or related to integral or summation type problems. Suppose one wishes to compute some quantity µ defined as

$\mu = \int_{\Omega} X(\omega)\, P(d\omega)$.  (1.1)
If $X : \Omega \to \mathbb{R}$ is a random variable defined on the probability space $(\Omega, \Sigma, P)$, then this can be equivalently expressed as the expected value
$\mu = E[X]$.  (1.2)
In some cases, µ can be computed exactly using analytic techniques. For many examples this is not possible and one must resort to methods of approximation. Deterministic numerical integration, or quadrature, generally has good convergence properties for low- and moderate-dimensional integrals. However, the computational complexity of quadrature increases exponentially in the dimension of the sample space
Ω, making high-dimensional inference computationally intractable. This general phenomenon is known as the curse of dimensionality [11], and can be explained in terms of the relative "sparseness" of high-dimensional space. Monte Carlo integration can be a viable alternative to quadrature in these settings, as its error decreases in proportion to the inverse square root of the sample size, regardless of dimension.
1.1 Monte Carlo Integration
Given $N$ independent, identically distributed random variables $X_1, \dots, X_N$ with $E[X^2] < \infty$, and writing $\hat{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i$, the strong law of large numbers gives
$\hat{X}_N \xrightarrow{a.s.} \mu$,  (1.3)
where $\xrightarrow{a.s.}$ denotes almost sure convergence. This provides the motivation to use $\hat{x}_N$ as an approximation to µ. Using $\hat{x}_N$ to approximate µ is known as Monte Carlo integration. Notationally, in this thesis bold uppercase letters such as X indicate random variables, while lowercase letters such as x indicate observations of those random variables. Under the above conditions, the central limit theorem states that
$\hat{X}_N - \mu \xrightarrow{d} \mathcal{N}(0, \sigma_X^2 / N)$,  (1.4)

where $\xrightarrow{d}$ denotes convergence in distribution, and $\sigma_X^2 = \mathrm{var}[X]$. Roughly speaking, this means that in the limit, convergence of $\hat{X}_N$ to µ occurs at a $\sqrt{N}$ rate. In other words, to get another digit of accuracy (a factor of 10), one would need $10^2 = 100$ times as many samples. While this rate of convergence is unappealing for low-dimensional integrals, the rate holds regardless of the sample space Ω, thereby allowing one to circumvent the curse of dimensionality for high-dimensional problems. This formulation, however, hides some of the additional complexity inherent in Monte Carlo integration. One difficulty is that the variance $\sigma_X^2$ may grow exponentially in the dimension of the sample space Ω. In these cases, the Monte Carlo convergence guarantee is unhelpful, since the starting point is an estimator with exponentially high variance. To make Monte Carlo methods practical in this case, it is necessary to employ variance reduction techniques. For comprehensive reviews of variance reduction techniques, see Liu [68] or Asmussen and Glynn [6]. The primary method for variance reduction used in this text is importance sampling, where instead of directly sampling the random variable X to estimate (1.1), one instead samples from some biased random variable Y and corrects for the bias. See §2.1 for background on importance sampling. Another potential source of complexity is in the generation of independent random draws from the sample space ω ∈ Ω and the computation of the random variable X(ω). For many commonly occurring problems, the best known algorithms for sampling exactly from the sample space of interest take exponential (or worse) time. In these cases the only feasible alternative is to use approximate sampling techniques. In this thesis, I will examine two main techniques for approximate sampling: sampling importance resampling (SIR) and Markov chain Monte Carlo (MCMC). Background for these techniques is covered in §2. Advanced topics covered include sequential Monte Carlo (§3), including particle filtering, and adaptive importance sampling (§4).
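To make the $\sqrt{N}$ behavior concrete, here is a minimal crude Monte Carlo integration sketch; the integrand and sample sizes are illustrative choices, not examples from the text:

```python
import math
import random

def mc_estimate(f, sample, n):
    """Crude Monte Carlo: average f over n iid draws from `sample`."""
    return sum(f(sample()) for _ in range(n)) / n

random.seed(0)
# Estimate mu = E[cos(U)] for U ~ Uniform(0, 1); the exact value is sin(1).
exact = math.sin(1.0)
for n in (100, 10_000):
    est = mc_estimate(math.cos, random.random, n)
    # The absolute error shrinks like 1/sqrt(n), independent of dimension.
    print(n, abs(est - exact))
```

Quadrupling the sample size roughly halves the error, which is the $\sqrt{N}$ rate of (1.4) in action.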
1.2 Applications
In §6, techniques are discussed for tracking large numbers of possibly interacting objects. In multi-object tracking (also known as multi-target tracking), the sequential process of interest can be represented as a hidden Markov model (HMM). The primary goal is to make inferences about the hidden state. There are many techniques available for tracking multiple targets, but almost all of those currently available either make broad simplifying assumptions or are infeasible for large problem sizes. Generally it is preferable to use models of movement and observation that are as realistic as possible while still permitting scalable analysis. One can generally state the problem of inference as being equivalent to sampling from a posterior distribution, but in the tracking models studied in this thesis it is generally impossible to make exact draws from the posterior distribution. Instead, sequential Monte Carlo techniques are used
to draw approximate samples. The standard particle filter typically does not work well with high-dimensional models due to degeneracy issues, where the bulk of the importance weight mass is concentrated on a small number of particles. Part of the degeneracy issue can be resolved using standard SIR; however, in the high-dimensional multi-target tracking example studied in §6 it is insufficient. To address this issue, §5 introduces a conditional sampling importance resampling step that takes advantage of inherent independence structures in the model. When individual targets are far from one another, conditional SIR effectively admits a separate particle filter for each target while asymptotically maintaining the correct joint distribution. An implementation of this algorithm is given, with empirical examples of tracking the movements of harvester ants in a laboratory setting. Adaptive importance sampling is a technique similar in many respects to the particle filter. In §7 an application of adaptive importance sampling to inference for network growth models is studied. Network growth models are models of network creation in which a new vertex arrives and attaches edges to pre-existing vertices according to a rule depending on the current state of the network. In the application considered, the goal is to estimate the likelihood that a given network originated from the growth model, for the purpose of model selection. A primary technical difficulty is that one typically does not know the order in which vertices joined the network. A priori, each ordering is equally likely, so one needs to consider all possible permutations of orderings to make valid inferences. This quantity can be expressed as a summation over all permutations, and can be represented as estimating the normalizing constant of the distribution that has probability proportional to the model likelihood for each permutation.
Since there are a factorial number of permutations, direct summation is infeasible for even moderate numbers of vertices, and one must resort to approximation techniques. In this setting, crude Monte Carlo tends to work poorly, since most of the likelihood is concentrated on a vanishingly small subset of permutations. To reduce the variance of the estimator, adaptive importance sampling is used, with importance distributions selected from the Plackett-Luce family of permutation distributions. This is a novel use of this family of distributions in the importance sampling context. An example is given using the technique on a modified version of
preferential attachment. In §8, inference in a special type of network growth model known as the stochastic Kronecker product graph (SKPG) is discussed. These models have a simple formulation that permits relatively efficient estimation of maximum likelihood parameters. The SKPG model is a generalization of the Erdős–Rényi G(n, p) model, and implicitly constructs a matrix of Bernoulli edge probabilities using Kronecker products of smaller seed matrices. These models suffer from the same difficulty as other network growth models in that in order to compute the likelihood one needs to sum over all possible vertex labeling permutations, but SKPGs have the advantage that it is relatively easy to compute the normalizing constant. To address the permutation issue, Leskovec and Faloutsos [64] use a Markov chain Monte Carlo algorithm over permutation space. However, these models encounter difficulties when there is a mismatch between the model dimensions and the number of vertices in the network data. To address these issues, the vertex censored stochastic Kronecker product graph (VCSKPG) model is introduced, which allows more flexibility in the allowable number of model vertices. A sequential importance sampling scheme is proposed to perform efficient parameter fitting and likelihood estimation for this model.

Chapter 2
Approximate Sampling
In many application settings, it is desirable to sample from a distribution of interest π, but it is impossible to do so in a reasonable amount of time, i.e. sampling from π is computationally intractable. Approximate sampling refers to a set of methods that attempt the next best thing, which is to sample from some distribution γ that is in some sense "close" to π. There is a strong connection between approximate sampling and estimation problems. In particular, Jerrum et al. [53] were able to give a polynomial-time reduction between almost uniform sampling and approximate counting. Another way to view this relationship is through the well-known importance sampling identity (2.6), which gives a zero-variance estimator when sampling from the optimal importance distribution γ*, and low-variance estimates when approximately sampling from γ*. To measure "closeness" of γ to π, one can use either metrics between probability distributions, such as the total variation distance,
$d_{TV}(\pi, \gamma) = \sup_{A \subset \Omega} |\pi(A) - \gamma(A)|$,  (2.1)
or pseudo-metrics such as the Kullback-Leibler divergence,
$d_{KL}(\pi \| \gamma) = E_{\pi}\!\left[\log \frac{\pi(X)}{\gamma(X)}\right]$  (2.2)
$= E_{\pi}[\log \pi(X)] - E_{\pi}[\log \gamma(X)]$,  (2.3)
where the notation $E_{\pi}$ indicates that the random variable X is distributed according to π. Although Kullback-Leibler divergence is not a true distance function, as it is not symmetric ($d_{KL}(\pi \| \gamma) \neq d_{KL}(\gamma \| \pi)$ in general), it does have the property that $d_{KL}(\pi \| \gamma) = 0$ if and only if $\pi(x) = \gamma(x)$ for all x with nonzero measure. The two terms on the RHS of (2.3) are the negative entropy of π, $E_{\pi}[\log \pi(X)]$, roughly representing sample diversity, and the negative cross-entropy, $E_{\pi}[\log \gamma(X)]$, representing the "goodness of fit" of γ to π.
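For discrete distributions, the divergence and its asymmetry are easy to verify numerically. A small sketch (the two distributions are arbitrary illustrations, not taken from the text):

```python
import math

def kl_divergence(p, q):
    """d_KL(p || q) = sum_x p(x) log(p(x) / q(x)) for discrete distributions."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # positive
print(kl_divergence(q, p))  # differs from the above: d_KL is not symmetric
print(kl_divergence(p, p))  # exactly 0 when the distributions agree
```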
2.1 Importance Sampling
Suppose that there exists a random variable $X : \Omega \to \mathbb{R}$, and one wishes to compute the expected value of X,

$\mu \equiv E[X] = \int_{\Omega} X(\omega)\, \pi(d\omega)$,  (2.4)

where π is the target distribution. In many cases, computing µ exactly is intractable since it is necessary to integrate over the entire state space Ω. For example, the problem of approximating the permanent of a matrix [52] can be represented as
" n # 1 Y perm(A) = E a , (2.5) n! i,X(i) i=1 where X is a random permutation. Computing the permanent is a computationally difficulty and is known to be #P-complete [98]. #P-complete comprises a set of counting problems with no known polynomial-time algorithms, and can be thought of as the counting analog of NP-complete problems. One way to estimate such prob- lems is through crude Monte Carlo simulations, drawing N independent, identically CHAPTER 2. APPROXIMATE SAMPLING 8
distributed (iid) samples x1, . . . , xN , then computingx ˆN . Xb N may, however, have unacceptably large variance, exponentially high in the case of approximating the per- manent. One practical variance reduction technique is known as importance sampling (IS). Importance sampling builds an estimator by sampling from a biased distribution in which the “important” or more heavily weighted states are visited more frequently. Helpful background references for importance sampling include Evans and Swartz [35], Asmussen and Glynn [6], Liu [68], and Robert and Casella [85]. Suppose there exist π(dω) random variables Y, Z, such that Y(ω) = X(ω) and Z(ω) = X(ω) γ(dω) . Importance sampling is based on the following simple identity:
$E[X] = \int_{\Omega} X(\omega)\, \frac{\pi(d\omega)}{\gamma(d\omega)}\, \gamma(d\omega) = E[Z]$.  (2.6)
The importance sampling identity (2.6) holds as long as Z is well defined, i.e. $\gamma(d\omega) = 0 \implies \pi(d\omega) = 0$ for ω ∈ Ω. γ is the importance distribution, and the ratio $W(\omega) \equiv \pi(d\omega)/\gamma(d\omega)$ is the importance weight of ω. One can draw N iid samples of Z and use $\hat{\mu}_{IS} \equiv \hat{Z}_N$ as the unbiased importance estimator of µ. If $\mathrm{var}[Z] < \mathrm{var}[X]$, $\hat{Z}_N$ will be a better estimate of µ than $\hat{X}_N$. A primary challenge in importance sampling is choosing an importance distribution γ that minimizes var[Z]. Practically, it is often the case that one only knows π and/or γ up to a constant factor, $\pi(d\omega) = f(d\omega)/C_X$, $\gamma(d\omega) = g(d\omega)/C_Y$, where $C_X$ and $C_Y$ are the normalizing constants of f and g. The ratio of the normalizing constants is denoted $C \equiv C_X / C_Y$, with the random variables $\widetilde{W}(\omega) \equiv f(d\omega)/g(d\omega)$ and $\widetilde{Z}(\omega) \equiv Y(\omega)\widetilde{W}(\omega) = C\,Z(\omega)$. Since $E[W] = 1$, one can build an unbiased estimator of the ratio C through draws of the unnormalized importance ratios, $\widehat{\widetilde{W}}_N$. Using the same samples $\omega_1, \dots, \omega_N$ to compute $\widehat{\widetilde{W}}_N$ as for $\widehat{\widetilde{Z}}_N$ leads to the biased importance estimator
$\hat{\mu}_{BIS} \equiv \frac{\widehat{\widetilde{Z}}_N(\omega_1, \dots, \omega_N)}{\widehat{\widetilde{W}}_N(\omega_1, \dots, \omega_N)}$.  (2.7)
Although biased for finite sample sizes, $\hat{\mu}_{BIS}$ is asymptotically unbiased and does not require normalizing constants. The optimal sampling distribution γ* is one for which $E[Z^2]$ is smallest. As per Rubinstein and Kroese [89], $E[Z^2]$ is minimized with $\gamma^*(d\omega) \propto |X(\omega)|\,\pi(d\omega)$. This gives

$E[Z^{*2}] = \int_{\Omega} Z^2(\omega)\, \gamma^*(d\omega)$  (2.8)
$= \left(\int_{\Omega} |X(\omega)|\, \pi(d\omega)\right)^2 = E[|X|]^2$.  (2.9)

In particular, if X ≥ 0, γ* provides a zero-variance estimator. Direct computation of the optimal sampling distribution is typically impossible, since it relies on $\int |X(\omega)|\,\pi(d\omega)$ as a normalizing constant, which is the quantity to be estimated in the first place. However, it is often helpful to use γ* as a guide to construct "good" importance sampling distributions.
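The identity (2.6) can be exercised on a small rare-event example. The sketch below estimates P(X > 3) for X ~ N(0, 1) by sampling from the shifted importance distribution γ = N(3, 1), whose weight π(x)/γ(x) = exp(9/2 − 3x) is available in closed form; the shift and sample sizes are illustrative choices, not from the text:

```python
import math
import random

random.seed(1)

def crude(n):
    # Crude Monte Carlo: almost every draw misses the event {x > 3}.
    return sum(random.gauss(0.0, 1.0) > 3.0 for _ in range(n)) / n

def importance(n):
    # Draw from gamma = N(3, 1) and reweight by w(x) = pi(x)/gamma(x).
    total = 0.0
    for _ in range(n):
        x = random.gauss(3.0, 1.0)
        if x > 3.0:
            total += math.exp(4.5 - 3.0 * x)  # phi(x)/phi(x - 3)
    return total / n

exact = 0.0013499  # 1 - Phi(3), for reference
print(crude(10_000), importance(10_000))
```

With the same budget, the importance estimate is far closer to 1 − Φ(3) ≈ 0.00135 than the crude one, since roughly half the γ-draws land in the event of interest.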
2.1.1 Effective Sample Size
An important concept when using importance sampling is degeneracy. Degeneracy occurs when the bulk of the importance weight mass is concentrated on one or a small number of importance samples. This means that the Monte Carlo approximation will be dominated by a small subset of samples, so the "effective" number of samples used to compute the approximation is small. A standard measure of degeneracy for importance sampling methods is therefore the effective sample size [61],
$ESS = \frac{N}{1 + \mathrm{var}[W]} = \frac{N}{E[W^2]}$,  (2.10)

where cv refers to the coefficient of variation of the importance weights ($cv = \sqrt{\mathrm{var}[W]} / E[W]$), $E[W] = 1$, and $\mathrm{var}[W] = E[W^2] - 1$. One justification for the use of the effective sample size comes from Liu [67], based on a note from Kong [61], using the delta method, and states that

$\frac{\mathrm{var}[\hat{x}_N]}{\mathrm{var}[\hat{\mu}_{BIS,N}]} \approx \frac{1}{1 + \mathrm{var}[W]}$.  (2.11)
This goes as follows. First note that using the standard delta method for ratio statistics [21] gives
$\mathrm{var}[\hat{\mu}_{BIS,N}] \approx \frac{1}{N}\left(\mathrm{var}[Z] + \mu^2\,\mathrm{var}[W] - 2\mu\,\mathrm{cov}(Z, W)\right)$.  (2.12)
Further note that
$\mathrm{cov}(Z, W) = E\!\left[\frac{\pi(X)}{\gamma(X)}\, X\right] - \mu$  (2.13)
$= \mathrm{cov}\!\left(\frac{\pi(X)}{\gamma(X)}, X\right) + \mu\, E\!\left[\frac{\pi(X)}{\gamma(X)}\right] - \mu$,  (2.14)

and that
$\mathrm{var}[Z] = E\!\left[\frac{\pi^2(X)}{\gamma^2(X)}\, X\right] - \mu^2$  (2.15)
$\approx E[X]\, E\!\left[\frac{\pi^2(X)}{\gamma^2(X)}\right] + \mathrm{var}(X)\, E\!\left[\frac{\pi(X)}{\gamma(X)}\right] + 2\mu\, \mathrm{cov}\!\left(\frac{\pi(X)}{\gamma(X)}, X\right) - \mu^2$.  (2.16)
Applying this to (2.12) gives
$\mathrm{var}[\hat{\mu}_{BIS,N}] \approx \frac{1}{N}\,\mathrm{var}(X)\,(1 + \mathrm{var}(W))$,  (2.17)

which yields (2.11). Note that the remainder term in (2.16) is
$E\!\left[\left(\frac{\pi(X)}{\gamma(X)} - E\!\left[\frac{\pi(X)}{\gamma(X)}\right]\right)(X - \mu)^2\right]$,  (2.18)

which can be large depending on the distribution of X. Typically in practice one knows neither the true variance of the importance weights nor the normalizing constants, so it is common to use the empirical unnormalized
effective sample size,
$\widehat{ESS} = \frac{\left(\sum_{i=1}^{N} \widetilde{W}(\omega_i)\right)^2}{\sum_{i=1}^{N} \widetilde{W}(\omega_i)^2}$  (2.19)

as a heuristic measure of degeneracy.
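A direct transcription of (2.19), assuming only a list of unnormalized weights:

```python
def effective_sample_size(weights):
    """Empirical ESS of (2.19): (sum w)^2 / sum(w^2), on unnormalized weights."""
    s = sum(weights)
    return s * s / sum(w * w for w in weights)

# Equal weights: no degeneracy, ESS equals the number of samples.
print(effective_sample_size([2.5] * 100))   # 100.0
# One dominant weight: ESS collapses toward 1.
print(effective_sample_size([100.0] + [1e-6] * 99))
```

Note that the estimate is invariant to rescaling the weights, which is why normalizing constants are not needed.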
2.1.2 Sampling Importance Resampling
Sampling importance resampling (SIR) is a technique for approximate sampling based on importance weights. Although not widely used in general approximate sampling settings, where Markov chain Monte Carlo methods are often more easily implemented and efficient, sampling importance resampling has found a niche in sequential settings, where Markov chain based approximate sampling methods carry a heavier computational burden. Sequential importance sampling is discussed further in §3.2. The goal of resampling in the sequential setting is generally to reduce degeneracy. The idea for SIR came originally from Rubin [87], and it was introduced in the sequential context by Gordon et al. [42]. SIR uses samples drawn from an importance distribution γ to approximately draw samples from the target distribution π. The procedure to draw M importance samples is as follows. Given N samples $y^{(1)}, \dots, y^{(N)}$ drawn according to γ, compute the importance weights $w^{(i)}$ for each sample. Choose an index i with probability proportional to $w^{(i)}$ and assign $\tilde{x} = y^{(i)}$. This process is repeated M times (with replacement) to generate a collection of approximate samples, $\{\tilde{x}^{(i)}\}_{i=1}^{M}$. The quality of the approximate samples depends on the sample size N and how closely γ matches π. Assuming mild regularity conditions on the importance distribution γ, asymptotic convergence to the target distribution can be shown (see Asmussen and Glynn [6], p. 387). Since only relative importance weights are needed for resampling, normalizing constants are not needed. One possible use of the collection of approximate samples $\{\tilde{x}^{(i)}\}_{i=1}^{N}$ is to construct
Algorithm 1 Sampling Importance Resampling (SIR)
  Draw N samples $y^{(1)}, \dots, y^{(N)} \sim P_y$.
  Compute importance weights $\{w^{(i)}\}_{i=1}^{N}$.
  Draw M samples $\tilde{x}^{(1)}, \dots, \tilde{x}^{(M)}$ with replacement from $\{y^{(i)}\}_{i=1}^{N}$ with probabilities proportional to $\{w^{(i)}\}_{i=1}^{N}$.

an estimator of E[X]
$\hat{\mu}_{SIR} = N^{-1} \sum_{i=1}^{N} \tilde{x}^{(i)}$.  (2.20)
However, this estimator is not very useful in practice, as $\hat{\mu}_{SIR}$ will always have higher variance than $\hat{\mu}_{IS}$ due to the additional variance from multinomial sampling. A better resampling method that can reduce the multinomial noise, known as residual resampling and introduced by Liu and Chen [69], is as follows. First, normalize the importance weights to sum to one, $\tilde{w}^{(i)} = w^{(i)} / \sum_{j=1}^{N} w^{(j)}$. Then take $\lfloor M\tilde{w}^{(j)} \rfloor$ copies of each sample j, for a total of $k = \sum_{j=1}^{N} \lfloor M\tilde{w}^{(j)} \rfloor$ samples; set $\tilde{w}^{(j)} \leftarrow M\tilde{w}^{(j)} - \lfloor M\tilde{w}^{(j)} \rfloor$, renormalize the importance weights, and take the remaining M − k samples with replacement from the resulting multinomial distribution. This procedure makes the same expected number of copies of each sample, but if k is large it can greatly reduce the multinomial noise. Another variation of SIR, introduced by Skare et al. [94], uses modified importance weights. Instead of choosing samples with probability $\tilde{w}^{(i)}$, one can use weights proportional to $\tilde{w}^{(i)} / (1 - \tilde{w}^{(i)})$. Using these weights with M fixed and letting N → ∞, Skare et al. [94] were able to show point-wise rates of convergence for $\tilde{X}$ to X of $O(N^{-1})$ when sampling with replacement and $O(N^{-2})$ when sampling without replacement. The idea of sampling without replacement for SIR when M ≪ N originally comes from Gelman [38], and intuitively can be thought of as producing an "intermediate representation" between the sampling and target distributions.
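The plain multinomial SIR step and the residual variant can be sketched in a few lines (pure Python, with `random.choices` performing the multinomial draw; the sample values are illustrative):

```python
import math
import random

random.seed(2)

def sir(samples, weights, m):
    """Multinomial SIR: draw m samples with probability proportional to weight."""
    return random.choices(samples, weights=weights, k=m)

def residual_resample(samples, weights, m):
    """Residual resampling: floor(m * w_j) deterministic copies of sample j,
    then a multinomial draw on the leftover fractional weights."""
    total = sum(weights)
    wt = [w / total for w in weights]          # normalize to sum to one
    counts = [math.floor(m * w) for w in wt]
    out = [s for s, c in zip(samples, counts) for _ in range(c)]
    residual = [m * w - c for w, c in zip(wt, counts)]
    if len(out) < m:                           # m - k leftover draws
        out += random.choices(samples, weights=residual, k=m - len(out))
    return out

print(sir(["a", "b", "c"], [5, 3, 2], 10))
print(residual_resample(["a", "b", "c"], [5, 3, 2], 10))
```

With weights (5, 3, 2) and m = 10, the residual scheme is fully deterministic (the floors account for all ten draws), which illustrates how it removes multinomial noise whenever the weights are nearly multiples of 1/m.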
2.2 Markov Chain Monte Carlo
Markov chain Monte Carlo is among the most widely used methods for difficult approximate sampling and estimation problems. This is because it is often easy to design an ergodic Markov chain that holds a target distribution π as its stationary distribution, for a wide variety of distributions [4]; such a chain can then be used to draw approximate samples from π and to construct Monte Carlo estimators. Techniques for designing such Markov chains include the Metropolis-Hastings algorithm, data augmentation, the Gibbs sampler, and the hit-and-run algorithm. For further background on Markov chain Monte Carlo methods, refer to Liu [68], Asmussen and Glynn [6], and Robert and Casella [84].
2.2.1 Markov Chains
A sequence of random variables $X_{1:t}$ is a discrete-time Markov process if the distribution of $X_t$ given the most recent state $x_{t-1}$ is independent of all the previous states, or
$P(X_t \mid x_{1:t-1}) = P(X_t \mid x_{t-1})$.  (2.21)
This property of Markov chains is referred to as memorylessness. If the possible values for each $X_t$ form a countable space, then this type of process is known as a Markov chain. Markov chains are defined by a kernel $K(v, v')$ specifying the relative probability that $X_{t+1} = v'$ given that $x_t = v$. One way to think of a Markov chain is as a random walk on a weighted directed graph G(V, E) with non-negative edge weights. Possible states are represented by vertices, and the next state $x_{t+1}$ given the current $x_t$ is chosen with probability proportional to edge weights. The expected return time of a state v ∈ V is the expected number of steps for the Markov chain starting in state v to return to v. If a state has a finite expected return time, it is positive recurrent. The periodicity of a state v is the greatest common divisor of all possible return times; if the periodicity of v is 1, then v is said to be aperiodic. If all states v ∈ V are positive recurrent and aperiodic, then the Markov chain is said to be ergodic, and admits a unique stationary distribution π, such that
$K^k(v, v') \to \pi(v')$ as $k \to \infty$.
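This convergence can be checked numerically for a tiny chain; a sketch with an illustrative two-state kernel (not an example from the text):

```python
def step(dist, kernel):
    """Push a distribution row vector through the transition kernel once."""
    n = len(dist)
    return [sum(dist[i] * kernel[i][j] for i in range(n)) for j in range(n)]

# A small ergodic two-state chain (all entries positive, rows sum to one).
K = [[0.9, 0.1],
     [0.5, 0.5]]
dist = [1.0, 0.0]        # start deterministically in state 0
for _ in range(100):     # iterating: K^k(v, .) approaches pi for any start v
    dist = step(dist, K)
# The stationary distribution solves pi K = pi; here pi = (5/6, 1/6).
print(dist)
```

Starting instead from state 1 gives the same limit, which is exactly the statement that $K^k(v, v')$ loses its dependence on the starting state v.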
2.2.2 Metropolis Hastings
The Metropolis-Hastings algorithm [73, 46] was one of the first Markov chain Monte Carlo methods proposed and used in practice, and is still one of the most commonly used forms. Its popularity can be attributed to the simplicity of designing efficient mechanisms to approximately sample from an arbitrary distribution π. The two main requirements of the algorithm are that the relative probability $\pi(v')/\pi(v)$ of two states v, v′ under π can be computed, and that an ergodic proposal Markov chain on the state space with transition kernel K can be sampled from. No normalizing constants are required, and the algorithm works extremely well for many applications [27]. The procedure is as follows. Starting from a state v, take a step in the Markov chain K to state v′, and compute the acceptance ratio
$a = \frac{\pi(v')\, K(v', v)}{\pi(v)\, K(v, v')}$.  (2.22)
If a > 1, then move to state v′; otherwise, draw a uniform random variable u and move to state v′ if a > u, else stay at v. One potential difficulty with this algorithm is in the choice of proposal kernel K. Ideally, K should be chosen such that $K(v, v')$ is approximately $\pi(v')$, which can sometimes be difficult in practice. An inappropriate choice of K can cause the Metropolis chain to have an extremely low rate of convergence. Another potential issue is that the Metropolis chain may not be ergodic even if the proposal chain is. However, for many important problems these difficulties can be overcome, and Metropolis-Hastings has proved to be extremely useful in a wide variety of contexts; it has been named one of the most important algorithms of the 20th century [10].
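A minimal random-walk Metropolis sketch, targeting a density known only up to a constant (the Gaussian target and step scale are illustrative assumptions, not from the text); with a symmetric proposal, the K-ratio in (2.22) cancels:

```python
import math
import random

random.seed(3)

def metropolis(log_target, x0, steps, scale=1.0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x, chain = x0, []
    for _ in range(steps):
        prop = x + random.gauss(0.0, scale)
        # Acceptance probability min(1, pi(prop)/pi(x)); K-terms cancel here.
        a = math.exp(min(0.0, log_target(prop) - log_target(x)))
        if random.random() < a:
            x = prop
        chain.append(x)
    return chain

# Target known only up to a constant: pi(x) proportional to exp(-x^2 / 2).
chain = metropolis(lambda x: -0.5 * x * x, 0.0, 50_000)
print(sum(chain) / len(chain))  # close to 0, the target mean
```

Note that only log-density differences appear, so no normalizing constant is ever computed, exactly as the text describes.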
2.2.3 Gibbs Sampler
The Gibbs sampler (introduced by Geman and Geman [39]; see Liu [68, chapter 6] for a good introduction) is another fundamental tool for designing ergodic Markov chains. Suppose that random variable X takes values in state space Ω with probability distribution π, and that each x ∈ Ω can be decomposed as x = {x1, . . . , xp}. The Gibbs sampler is as follows. Starting with an initial point x̃, cycle through coordinate indices j = 1, . . . , p, for each j sampling x̃j according to the conditional distribution
x̃j ∼ π(xj | x̃[−j]), (2.23)
where the notation x[−j] indicates taking all coordinate indices of x except for j, or
x[−j] := {x1, . . . , xj−1, xj+1, . . . , xp}. (2.24)
The method of cycling through coordinate indices is a design choice. If j is chosen uniformly at random at each step, the procedure is known as random-scan Gibbs sampling (summarized below in Algorithm 2); if instead one cycles through coordinate indices in a predetermined order, it is called systematic-scan Gibbs sampling.
Denote the Markov transition kernel for random-scan Gibbs sampling as KGibbs, with KGibbs(x, y) giving the probability density of the next state y conditioned on the current state x.

Algorithm 2 Random-scan Gibbs sampler.
  Start from an initial point x̃ ← {x̃1, . . . , x̃p}.
  while (not converged) do
    Pick an index j at random.
    Sample x̃j ∼ π(xj | x̃[−j]).
  end while

Under mild conditions the Gibbs sampler is positive recurrent and aperiodic (ergodic); since sampling from the conditional distribution (2.23) leaves π invariant, assuming ergodicity the Gibbs sampler admits π as its stationary distribution. One is not restricted to sampling only from the single-variable conditional distributions in (2.23). If one chooses to sample from the joint conditional distribution for multiple subindices j1, . . . , jp at once,
(x̃j1, . . . , x̃jp) ∼ π(xj1, . . . , xjp | x̃[−j1,...,−jp]), (2.25)
it is known as grouped Gibbs sampling. In grouped Gibbs, instead of sampling from the line defined by x̃[−j] as in standard Gibbs, one samples from the hyperplane defined by x̃[−j1,...,−jp] for some set of coordinate indices j1, . . . , jp. The set of coordinate indices may be chosen deterministically or according to a randomized scheme. It is not difficult to see that grouped Gibbs results in a faster mixing rate relative to standard Gibbs; see Liu [68] for details.
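Algorithm 2 can be sketched concretely for a target whose full conditionals are available in closed form (the bivariate normal with correlation ρ = 0.8 is an illustrative assumption, not an example from the text):

```python
import numpy as np

# Random-scan Gibbs (Algorithm 2) for a bivariate normal with correlation rho.
# Both full conditionals are closed-form:
#   X1 | X2 = x2  ~  N(rho * x2, 1 - rho^2),   and symmetrically for X2 | X1.
rho = 0.8
rng = np.random.default_rng(1)
x = np.zeros(2)                        # initial point x-tilde
samples = []
for _ in range(30000):
    j = rng.integers(2)                # pick a coordinate index j at random
    # sample x_j ~ pi(x_j | x_{[-j]})
    x[j] = rng.normal(rho * x[1 - j], np.sqrt(1.0 - rho ** 2))
    samples.append(x.copy())
samples = np.array(samples)
```

The empirical correlation of the chain converges to that of the target; a systematic scan would simply replace the random index draw with a deterministic cycle.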
2.2.4 Data Augmentation
The data augmentation algorithm, proposed by Tanner and Wong [96], can be thought of as a special case of the Gibbs sampler on a two-variable space X = {X1, X2}, where one is primarily interested in sampling from the first sub-variable X1; X2 is called the auxiliary variable. As in the standard Gibbs sampler, we draw samples of x1 according to π(x1|x2), then draw samples of x2 according to π(x2|x1), and repeat until satisfied. This yields joint samples x approximately distributed according to
π(x), and one may then “discard” the auxiliary variable to get samples x1 ∼ π(x1).
The “art” of data augmentation [99] is in the choice of auxiliary variable x2, which should ideally be chosen such that the conditional distributions π(x1|x2), π(x2|x1) are easily computed, and such that the resulting Markov chain is rapidly mixing.
2.2.5 Hit-and-Run
A generalized form of grouped Gibbs is known as the hit-and-run algorithm [3]: from the current state x, one randomly chooses a subset Lk ⊂ Ω with probability w(Lk), then samples the next state y according to the transition kernel Kk(x, y). The intuition behind hit-and-run is that it is not necessary to restrict oneself to sampling from hyperplanes of the state space; rather, one can choose arbitrary subsets L to sample from. In order to ensure convergence to the proper stationary distribution π, it is necessary to choose L, w, and K such that Kk(x, y) has stationary distribution proportional to wx(k)π(x). Note also that ergodicity of the hit-and-run Markov chain depends on the choices of L, K, and w.
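A classical concrete instance is uniform sampling from a convex body, where the subsets L are lines through the current point and the conditional of the uniform target on each chord is again uniform. The sketch below (the unit disk is an illustrative assumption, not an example from the text) picks a random direction, intersects the resulting line with the disk, and samples uniformly on the chord:

```python
import numpy as np

# Hit-and-run for the uniform distribution on the unit disk: the subsets L are
# lines through the current state.  Pick a uniform direction d, intersect the
# line {x + t d} with the disk, and draw the next state uniformly on that
# chord; the conditional of the uniform target on any chord is again uniform,
# so each move leaves pi invariant.
rng = np.random.default_rng(2)
x = np.zeros(2)
samples = []
for _ in range(20000):
    theta = rng.uniform(0.0, 2.0 * np.pi)
    d = np.array([np.cos(theta), np.sin(theta)])
    # chord endpoints: solve |x + t d|^2 = 1, i.e. t^2 + 2(x.d)t + (|x|^2 - 1) = 0
    b = x @ d
    disc = np.sqrt(b ** 2 - (x @ x - 1.0))
    t = rng.uniform(-b - disc, -b + disc)
    x = x + t * d
    samples.append(x.copy())
samples = np.array(samples)
```

For a non-uniform π one would instead sample t from the one-dimensional restriction of π to the chord, which is exactly the role played by the kernels Kk above.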
Hit-and-run requirements: choose L, w, and K such that, for each subset index k, Kk(x, y) has stationary distribution proportional to wx(k)π(x).

For each x ∈ Ω and coordinate index j, define subset indices kx,j such that kx,j = ky,j if and only if x[−j] = y[−j]. If one chooses subsets L and weights w such that

Lkx,j = {y : y[−j] = x[−j]}, (2.26)

wx̃(Lkx,j) = (1/n) 1{x = x̃}, and Kkx,j(x, y) proportional to π(y) 1{y ∈ Lkx,j}, then this is equivalent to single-variable random-scan Gibbs. This can be seen to satisfy the hit-and-run requirements, since for each x ∈ Lk, the probability of choosing Lk is 1/n. Hit-and-run also generalizes several other popular Markov chain Monte Carlo algorithms, including Swendsen-Wang, data augmentation, and slice sampling. See Andersen and Diaconis [3] for more details.

Chapter 3
Sequential Monte Carlo
In many important cases, one would like to analyze and develop models for data that are ordered or sequential in some natural way. This occurs for data with a time component, i.e. time-series data, but also in settings such as sampling from the space of self-avoiding walks [44, 86], contingency tables [23], and graphs with a prescribed degree sequence [14, 9]. For inference in these models, sequential Monte Carlo methods [31] have been developed. Sequential Monte Carlo methods include the class of algorithms known as particle filters, which were introduced in their current form by Gordon et al. [42] as the bootstrap filter. Particle filters are iterative, consisting of two basic elements: sequential importance sampling (SIS) and sampling importance resampling (SIR).
3.1 Sequential Models
First, some notation will be introduced. For each sample ω ∈ Ω, there is a function X : Ω → Ω^T, where Ω is the state space and T is a positive integer. Under this formulation, X is a discrete-time stochastic process. Typical examples include cases where the state space Ω is finite, N^n, or R^n, but more complicated examples can be considered as well. A sample x can be written as an array, X(ω) = {Xt(ω)}t∈1,...,T. As a shorthand, the ω is dropped and X1:T is referred to as a random variable with sub-indices X1, . . . , XT.
Figure 3.1: Dependence structure of hidden Markov models
In a sequential model, it will be assumed that evaluating and generating samples from the conditional probabilities P(xt|x1:t−1) for 1 ≤ t ≤ T is computationally feasible. In a hidden Markov model, X is a Markov process, and there is another coupled discrete-time stochastic process Y such that the conditional distribution of
Yt given xt is independent of the rest of the x and y variables, or
P(yt|x1:T , y1:t−1, yt+1:T ) = P(yt|xt). (3.1)
X is known as the hidden or latent process, and Y is called the observation process. A diagram showing the dependence structure of hidden Markov models is shown in Figure 3.1. Practical applications of hidden Markov models typically involve samples from the observation process Y, from which it is desired to make inferences about the latent process X. For example, one may wish to compute (and draw samples from) P(X|y), or to compute the expected value of functions of X conditioned on y. Another common problem is that the X and Y processes may be defined by some set of parameters θ for which we would like to find maximum likelihood estimates. In the case where the hidden process X is governed by an affine Gaussian process and the observation model for Yt conditioned on the hidden state xt is also affine Gaussian, one can use the Kalman filter [55] to efficiently make direct inferences about (and sample from) X conditioned on y. The Kalman filter is an iterative procedure that first computes E[Xt|xt−1], then updates based on the current observation yt to get E[Xt|xt−1, yt]; this is repeated for t = 1, . . . , T. Due to the Gaussian nature of the processes, knowing E[Xt|xt−1, yt] for each t is sufficient to compute and sample according to the full conditional distribution P[x1:T |y1:T ]. Estimates for the respective covariance matrices can also be determined iteratively. The efficiency of the Kalman filter makes it useful in real-world applications where Gaussian models are appropriate. In cases where the underlying hidden and observation processes are non-linear, versions of the Kalman filter such as the extended Kalman filter [51] or the unscented Kalman filter [54, 103] may be useful. However, these methods may not be applicable when the underlying state-space and observation processes are highly non-linear, or take values in non-Euclidean state spaces such as graphs or other combinatorial objects. In these cases, particle filtering (§3.3) offers an attractive alternative. See chapter 6 for an application of the Kalman and particle filters to multi-object tracking.
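The predict/update recursion can be sketched in one dimension (the AR(1) model, its parameters, and the function name below are illustrative assumptions, not taken from the text):

```python
import numpy as np

def kalman_filter(ys, a, q, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a x_{t-1} + N(0, q),  y_t = x_t + N(0, r)."""
    m, p = m0, p0
    means = np.empty(len(ys))
    for t, y in enumerate(ys):
        m_pred, p_pred = a * m, a * a * p + q     # predict: E[X_t | y_{1:t-1}]
        k = p_pred / (p_pred + r)                 # Kalman gain
        m = m_pred + k * (y - m_pred)             # update:  E[X_t | y_{1:t}]
        p = (1.0 - k) * p_pred
        means[t] = m
    return means

# Simulate the model and filter the observations.
rng = np.random.default_rng(3)
a, q, r, T = 0.95, 0.1, 0.5, 500
xs = np.zeros(T)
for t in range(1, T):
    xs[t] = a * xs[t - 1] + rng.normal(0.0, np.sqrt(q))
ys = xs + rng.normal(0.0, np.sqrt(r), size=T)
means = kalman_filter(ys, a, q, r)
```

The filtered means track the hidden state with lower error than the raw observations, and the same recursion on p yields the iterative covariance estimates mentioned above.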
3.2 Sequential Importance Sampling
Suppose X and Y are the latent and observation processes of a hidden Markov model.
If it is known how to sample Xt according to the law
P(xt|y1:T, x1:t−1) = P(xt|yt:T, xt−1), (3.2)

where the equality is due to the independence properties of X and Y, one can sample directly from X|y sequentially. Note, however, that this distribution is conditioned on all future observations yt:T for each xt. For non-Gaussian processes, sampling from this optimal importance distribution is usually impractical, so instead one can sample according to some other distribution γ and use importance sampling. Denote the tth contribution to the sequential target distribution as πt(x) ≡ P(xt, yt|xt−1)/P(yt|y1:t−1), and the target distribution of the first t states given the first t observations as π1:t(x) ≡ P(x1:t|y1:t). π1:t can be built sequentially as follows:
π1:t(x) = P(yt|x1:t, y1:t−1) P(x1:t|y1:t−1) / P(yt|y1:t−1) (3.3)
        = [P(yt|xt) P(xt|xt−1) / P(yt|y1:t−1)] P(x1:t−1|y1:t−1) (3.4)
        = [P(xt, yt|xt−1) / P(yt|y1:t−1)] π1:t−1(x) (3.5)
        = πt(x) π1:t−1(x). (3.6)
A sequential importance distribution γ1:t(x) is chosen to be defined sequentially, such that there exist functions γt(x) with

γ1:t(x) = ∏_{s=1}^{t} γs(x). (3.7)
The tth sequential contribution to the importance weight is Wt(x) = πt(x)/γt(x), and the importance weight after t time-steps is

W1:t(x) = ∏_{s=1}^{t} Ws(x). (3.8)
Remarks:
• The denominator of πt, P(yt|y1:t−1), is independent of x and is not required for approximate sampling. Since it may be difficult to compute, the unnormalized value is often used,

π̃1:t(x) = P(xt, yt|xt−1) π̃1:t−1(x). (3.9)

π̃1:t(x) has P(y1:t) as its normalizing constant. One can similarly use an unnormalized importance distribution γ̃ and importance weight W̃.
• π̃t(x) ∝ P(xt|yt, xt−1) P(yt|xt−1), so one sensible possibility is to choose a sequential importance distribution γt(x) proportional to P(xt|yt, xt−1). Note that this is the optimal (zero variance) importance distribution for Xt given yt and xt−1. However, xt chosen in this manner is no longer optimal at time-step t + 1, since πt+1 depends on xt through P(yt+1|xt). For this reason P(xt|yt, xt−1) is called the locally optimal choice of importance distribution. The sequential contribution to the relative importance weights in this case is Wt(x) ∝ P(yt|xt−1).
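To preview the degeneracy issue taken up in §3.3: if the sequential contributions Ws in (3.8) are i.i.d., the product W1:t becomes heavy-tailed. A small self-contained demonstration (the i.i.d. lognormal contributions are an assumption made for the illustration, and the effective sample size formula ESS = (Σi Wi)²/Σi Wi² stands in for (2.19), which is not reproduced here):

```python
import numpy as np

# If each sequential contribution W_s is i.i.d. lognormal, the product W_{1:t}
# in (3.8) becomes increasingly heavy-tailed, and the effective sample size
# ESS = (sum_i W_i)^2 / sum_i W_i^2 collapses as t grows.
rng = np.random.default_rng(4)
N, T = 1000, 50
log_w = np.zeros(N)
ess = []
for t in range(T):
    log_w += rng.normal(0.0, 1.0, size=N)   # multiply in the t-th contribution
    w = np.exp(log_w - log_w.max())         # stabilized, unnormalized weights
    ess.append(w.sum() ** 2 / (w * w).sum())
```

After one step the effective sample size is a large fraction of N; after fifty multiplicative steps nearly all of the weight mass sits on a handful of samples.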
3.3 Particle Filter
One serious issue with sequential importance sampling is that the importance weights can become degenerate after a small number of steps, with most of the importance weight mass concentrated on a small subset of samples, even when using the locally optimal importance distribution. To address this issue, Gordon et al. [42] suggested the use of sampling importance resampling (SIR) in conjunction with sequential importance sampling (SIS) to create what is now referred to as sequential Monte Carlo. When the underlying model is hidden Markov, this is known as the particle filter. A standard reference for particle filtering is Doucet and De Freitas [31].
Algorithm 3 Particle filter approximate sampling.
  Initialize each x̃0^(i) ∼ π0 for i = 1, . . . , N.
  for t ∈ 1, . . . , T do
    Draw each x̃t^(i) ∼ γt(·|x̃t−1^(i)).
    Update sequential importance weights W1:t^(i) according to (3.8).
    Compute the empirical effective sample size ÊSS according to (2.19).
    If ÊSS < N̂, resample each x̃^(i) with probability proportional to W1:t^(i).
  end for
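A minimal bootstrap version of Algorithm 3 can be sketched as follows (the scalar linear-Gaussian model and all parameter values are illustrative assumptions, not from the text; the prior transition plays the role of γt, so the incremental weight is the observation likelihood, and resampling is triggered when the empirical ESS drops below N/2):

```python
import numpy as np

# Bootstrap particle filter for the illustrative model
#   x_t = a x_{t-1} + N(0, q),   y_t = x_t + N(0, r).
rng = np.random.default_rng(5)
T, N = 200, 500
a, q, r = 0.9, 0.3, 1.0
xs = np.zeros(T)
for t in range(1, T):
    xs[t] = a * xs[t - 1] + rng.normal(0.0, np.sqrt(q))
ys = xs + rng.normal(0.0, np.sqrt(r), size=T)

particles = rng.normal(0.0, 1.0, size=N)
logw = np.zeros(N)
est = []
for t in range(T):
    particles = a * particles + rng.normal(0.0, np.sqrt(q), size=N)  # SIS proposal
    logw += -0.5 * (ys[t] - particles) ** 2 / r    # weight by P(y_t | x_t)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est.append(w @ particles)                      # filtered posterior mean
    if 1.0 / (w @ w) < N / 2:                      # empirical ESS below threshold
        particles = particles[rng.choice(N, size=N, p=w)]
        logw = np.zeros(N)
est = np.array(est)
```

The filtered posterior-mean estimates track the hidden state more closely than the raw observations, despite the model being simple enough that a Kalman filter would be exact.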
To see anecdotally why sequential importance sampling results in degenerate sample sets, consider the following case. Suppose that the sequential importance weight contribution Wt(X) is independently and identically distributed for each t; this would occur, for example, if γ is an affine Gaussian process. Then as t → ∞, by the central limit theorem, log W1:t(x), suitably centered and scaled, converges to N(0, 1). Therefore, in the limit W1:t is distributed according to a lognormal distribution, which is heavy-tailed. This implies that, generally speaking, we would expect a small number of samples to have very high importance weights relative to the other samples. Samples with low importance weights are also less likely to have high importance weights in the future. To address this issue, at each step one can perform sampling importance resampling. If SIR is applied at time-step t, this gives an approximate sample from π1:t(x). Note that for t < T this is not an approximate sample from the true target distribution π1:T(x1:t), which incorporates all future observations. Instead, the algorithm approximately samples from a sequence of intermediate distributions,
π1, π1:2, . . . , π1:T. In this manner, sequential Monte Carlo parallels simulated annealing, which also uses a sequence of approximate samples from intermediate distributions to sample from a target distribution (see §4.3.1 for more). Applying SIR has the effect of "pruning" the sample space, correcting previous "mistakes" and ensuring that the algorithm does not waste time on low-probability paths. Since each resampling adds multinomial noise and requires computational effort, it is desirable to apply resampling only when there is a clear benefit. For this reason, it is common practice to set some threshold N̂ and resample when the empirical effective sample size ÊSS falls below this threshold.

Chapter 4
Adaptive Importance Sampling
4.1 Background
Adaptive importance sampling (AIS) takes an indirect approach to building low variance estimators of E[X]. The goal of adaptive importance sampling is to find the best importance distribution g∗ within a restricted family of distributions G(·; v), with parameters v, whose likelihood functions are known or relatively easy to calculate. The idea is to iteratively build distributions within G based on a population of previously generated importance samples. A generic implementation of adaptive importance sampling is given in Algorithm 4. There are several possible choices for how to update the distribution parameter v based on the population of importance samples, including variance minimization [88] and the cross-entropy method [89]. As in the sequential setting, degeneracy is an important issue, and sampling importance resampling (see §2.1.2) may be used [20] in a similar manner as in the particle filter algorithm.
Algorithm 4 Adaptive importance sampling (AIS) algorithm
  Generate (x0^(i))1≤i≤N ∼ g0.
  Compute W0^(i) = f(x0^(i))/g0(x0^(i)).
  Generate (x̃0^(i))1≤i≤N by resampling (x0^(i))1≤i≤N based on W0^(i).
  k ← 0
  while (not converged) do
    k ← k + 1
    Update gk based on {x̃k−1^(i)}i≤N, restricted to G.
    Generate (xk^(i))1≤i≤N ∼ gk.
    Compute Wk^(i) = f(xk^(i))/gk(xk^(i)).
    Generate (x̃k^(i))1≤i≤N by resampling (xk^(i))1≤i≤N based on Wk^(i).
  end while
One of the primary challenges in designing an effective adaptive importance sampling algorithm is choosing the parametric family of distributions G. The choice of distribution family affects the quality of the estimator and the efficiency of the algorithm. See Rubinstein and Kroese [89] for examples of commonly used proposal distributions in a range of application settings. Ideally, the family of proposal distributions G should be flexible enough to specify a distribution that frequently visits "important" states. In other words, the best possible proposal distribution ĝ∗ ∈ G should be close to the optimal importance distribution g∗. On the other hand, the family of distributions G should be as simple as possible, i.e. have a relatively small parameter space and be easy to fit to data, both to avoid overfitting and so that ĝ∗ can be found with minimal computational effort. Intuitively, the family of distributions should contain a good "model" of the optimal distribution and easily incorporate some of the underlying structure of the problem.
4.1.1 Variance Minimization
A primary design issue in adaptive importance sampling algorithms is the choice of update rule for the importance distribution gk. Given a sample x = {x1, . . . , xN}, one natural rule would be to take as the next proposal distribution one that minimizes the sample variance of (2.6),
ĝ = argmin_g var_g[h(X) W(X)]. (4.1)
Since the expected value of h(X) f(X)/g(X) = h(X) W(X) is constant for any g with full support, this reduces to minimizing the second moment,
ĝ = argmin_g E_g[h²(X) W²(X)]. (4.2)
This is the idea underlying the "variance minimization" (VM) procedure: at step t, construct a new IS distribution gVM^(t) by choosing a sampling distribution that minimizes (4.2) over the samples (xt−1^(i))1≤i≤N. Often there is no analytic solution to (4.2), requiring numerical non-linear optimization at each step.
4.1.2 Cross-Entropy Method
Instead of minimizing the variance at each step, one can choose a distribution gCE that minimizes the Kullback-Leibler cross-entropy between gCE and the sample optimal importance distribution g∗(x). In the cross-entropy method [89], a sequence of distributions parameterized by v1, v2, . . . , vk is built iteratively. At each step, the Kullback-Leibler divergence (2.3) from g∗ to f, estimated from previously drawn samples, is minimized. One principal advantage of this method is that minimizing cross-entropy is often more computationally tractable than finding the minimum sample variance. Recall that the Kullback-Leibler divergence from g to f can be stated as
dKL(g‖f) = Eg[log g(X)] − Eg[log f(X)]. (4.3)
In cross-entropy adaptive importance sampling, the optimal parameter set v∗, minimizing dKL(g∗‖f(·; v)), is defined as
v∗ = argmax_v E_{g∗}[log f(X; v)], (4.4)
since the first term in (4.3) is constant with respect to v. One can rewrite this as

v∗ = argmax_v E_{g∗}[ (W(X)/W(X)) log f(X; v) ] (4.5)
   = argmax_v E_W[ (g∗(X)/W(X)) log f(X; v) ] (4.6)

for some reference distribution W(x). Since solving (4.6) directly is typically intractable, Rubinstein and Kroese [89] suggest an iterative procedure, using the distribution f(·; vt−1) as the reference distribution when solving for vt. One can then estimate the expectation in (4.6) via Monte Carlo simulation:
E_{vt−1}[ (g∗(X)/f(X; vt−1)) log f(X; v) ] ≈ (1/Cg∗) (1/N) Σ_{i=1}^{N} Wt−1(xi) log f(xi; v), (4.7)

where x1, . . . , xN ∼ f(·; vt−1), Wt−1(x) = |h(x)| P(G|H, x)/f(x; vt−1) is the importance weight of x under vt−1, and Cg∗ is the normalizing constant of g∗. Rubinstein and Kroese [89] suggest approximating f(·; v∗) at iteration t by maximizing (4.7) over v with respect to the empirical distribution of f(·; vt−1),
vt = argmax_v (1/N) Σ_{i=1}^{N} Wt−1(xi) log f(xi; v), with x1, . . . , xN ∼ f(·; vt−1). (4.8)
Note that for the purpose of computing vt, one can ignore the constant multiplier Cg∗. Maximizing the sum (4.8) is equivalent to finding the maximum likelihood estimator under v with each sample xi replicated Wt−1(xi) times. This leads to an interpretation of the cross-entropy method as iterative weighted maximum likelihood estimation [40]. For many families of distributions, computing the maximum likelihood estimator is fast, efficient, and well understood, so optimizing the sum (4.8) is often straightforward. For more on this connection between cross-entropy and maximum likelihood, see Asmussen and Glynn [6].
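As a concrete sketch of this weighted-MLE update, consider the textbook rare-event problem of estimating P(X > γ) for X ∼ Exp(1), whose true value is e^{−γ}. The exponential proposal family, the elite fraction ρ, and all parameter values below are illustrative assumptions, not taken from the text; for this family the update (4.8) is simply an importance-weighted mean of the elite samples:

```python
import numpy as np

# Cross-entropy method for P(X > gamma), X ~ Exp(1).  Proposals are Exp(mean v);
# the likelihood ratio is f(x)/g(x; v) = v * exp(-x (1 - 1/v)), and the
# weighted-MLE update of (4.8) for the exponential family is a weighted mean.
rng = np.random.default_rng(6)
gamma, N, rho = 20.0, 2000, 0.1
v = 1.0
for _ in range(15):
    x = rng.exponential(v, size=N)                # draw from g(.; v)
    w = v * np.exp(-x * (1.0 - 1.0 / v))          # importance weights f/g
    level = min(np.quantile(x, 1.0 - rho), gamma) # elite threshold
    elite = x >= level
    v = (w[elite] * x[elite]).sum() / w[elite].sum()  # weighted-MLE update
x = rng.exponential(v, size=N)
w = v * np.exp(-x * (1.0 - 1.0 / v))
est = np.mean((x > gamma) * w)                    # final importance estimate
```

The elite-sample updates drive v toward roughly γ + 1, close to the optimal exponential tilt, after which the final estimate of the 10⁻⁹-scale probability is typically accurate to within tens of percent at this sample size — a quantity naive Monte Carlo could not touch.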
4.2 Avoiding Degeneracy
Directly optimizing (4.8) to find a set of parameters vt may not always produce a distribution f(·; vt) that robustly generates "good" samples. Unless sample sizes are large, this estimator may over-fit the empirical data and lead to a degenerate approximating distribution f(·; vt), fit to a sample set in which the bulk of the importance weight is concentrated on one or a small number of samples. This situation can be diagnosed by a low effective sample size (see §2.1.2 for details). Rubinstein and Kroese [89] suggest several heuristic alterations to (4.8) that implicitly address the degeneracy issue. First, instead of optimizing (4.8) over all samples drawn from f(·; vt−1), one can use only the top ρ fraction of the samples, for some constant 0 < ρ < 1, and weight them all equally; this is known as the elite sample technique. Second, after computing the minimum sample cross-entropy, one can "smooth" it with the previous importance distribution according to a weight α (in distribution families where such parameter smoothing makes sense). Third, in the fully adaptive cross-entropy method, one adaptively adjusts the sample sizes drawn from f(·; vt) based on performance metrics. These adjustments work well for some types of problems, as demonstrated in the examples found in Rubinstein and Kroese [89]. For the complicated distributions on orderings encountered in §7, however, the basic cross-entropy method does not perform well, quickly leading to highly degenerate importance distributions. It also seems a worthwhile goal to fit the cross-entropy degeneracy problem within a larger information-theoretic or statistical learning context, where many tools for formally dealing with over-fitting have been developed. One robust criterion for the avoidance of over-fitting is the minimum description length (MDL) principle [43].
Minimum description length is a generalization of Occam's razor, the principle that one should use the simplest model that best fits the data. In practical terms, minimum description length gives an expression for the trade-off between model complexity and "goodness of fit". In addition to describing the data well (as measured by likelihood), one should compensate for over-fitting by taking into account the description length of the model, L(f(·; vt)). This is expressed as
L(f(·; vt), x) = L(f(·; vt)) − log(P(x|f(·; vt))) + const. (4.9)
See MacKay [70] chapter 28 for a very readable introduction to minimum description length.
One interpretation of the description length is to take L(f(·; vt)) to be the Shannon entropy of the distribution, i.e. L(f(·; vt)) = −Evt(log f(·; vt)). This is quite natural from an information-theoretic perspective, and directly penalizes distributions for deviating from the uniform distribution. A disadvantage of this formulation is that it may be difficult to estimate the entropy directly for complicated models. Another interpretation is to see L(f(·; vt)) as the log-likelihood of the parameters under some Bayesian prior. This may have the advantage of being easier to compute than the model entropy, but it may not be as interpretable as a measure of model complexity. We can use minimum description length principles when choosing a new distribution vt based on previously drawn samples. However, there is some ambiguity in how best to apply the technique to adaptive importance sampling. For instance, it is unclear how to properly weigh L(f(·; vt)) against log(P(x|f(·; vt))) in the objective function (4.9). In the standard model selection framework, when fitting to samples drawn from the "true" distribution of interest, no special weights are necessary: as the sample size increases, the log-likelihood term comes to dominate, allowing for more complicated models if they fit the data better. In the adaptive importance sampling setting, however, weighted samples drawn from the previous importance distributions are used only as a proxy for the optimal importance distribution g∗ when performing an update. Although many samples may be drawn, the effective sample size with respect to g∗ is often small, with the top few samples having importance weights orders of magnitude greater than the rest. In this iterative model selection framework, one should therefore weight the goodness-of-fit term based on the effective sample size.
In §7, several different heuristics are given for estimating this weight, such as gradually increasing the allowed entropy or assigning randomized weights to different samples when computing the effective sample size.
4.3 Related Methods
4.3.1 Annealed Importance Sampling
When one can sample directly from a reference distribution g with known normalizing constant Cg, a natural way to use importance sampling to estimate the normalizing constant Cf of a distribution πf (with unnormalized function f) is what Neal [76] refers to as the simple importance sampling estimator,
Cf/Cg = Eg[f(X)/g(X)] (4.10)
       ≈ (1/N) Σ_{i=1}^{N} f(xi)/g(xi), (4.11)

where (xi)1≤i≤N ∼ g. It is easy to see that (4.11) will have high variance if g and f are not close to one another, especially if g(x) is small where f(x) is large. One remedy for this is to use a sequence of distributions g = q0, q1, . . . , qn−1, qn = f and to chain the estimators together:
Cf/Cg = ∏_{j=0}^{n−1} Cj+1/Cj = ∏_{j=0}^{n−1} E_{qj}[ qj+1(Xj)/qj(Xj) ]. (4.12)
This is well known in computational physics as umbrella sampling. It has also been applied successfully in the approximate counting context, for applications such as counting the number of matchings in a graph [93] or estimating the volume of convex bodies [33]. The general idea is one of recursive estimation of size [2]; further examples can be found in Diaconis and Holmes [28]. One challenging issue associated with this family of techniques is that it may be very difficult or impossible to sample directly from an intermediate distribution of interest qj. The most straightforward technique is to use an ergodic Markov chain that holds qj as its stationary distribution to approximately sample according to qj. One can then draw correlated samples at each step of the Markov chain to build an estimator. It may be necessary to have a long "burn-in" at each intermediate step to reduce correlations and ensure convergence to stationarity [76]. The degree of correlation of the Markov chain samples will depend on the mixing rate; see §2.2 for more details.
If subsequent distributions qi and qi+1 are close enough to each other, each term in the product (4.12) will have low variance, and this will produce a good estimate of Cf. However, if the samples xj+1 are drawn using a Markov chain based on xj, this will introduce bias. As a remedy to this issue, Neal [75] introduced annealed importance sampling. The idea is to use a sequence of reversible transition kernels (Ti(x, y))1≤i≤n, where Ti(x, y) has stationary distribution πi, known up to a normalizing constant as qi. One can then produce a sample xn approximately distributed according to f, in a similar way as in simulated annealing [58], by starting at a state x0 drawn from g and applying the transition kernel Ti to xi−1 to generate xi for each 1 ≤ i ≤ n in sequence. One can then compute the importance weight of each sample xn^(j) as
W^(j) = ∏_{i=0}^{n−1} qi+1(xi)/qi(xi). (4.13)
Then an unbiased estimator of Cf/Cg is

Cf/Cg ≈ (1/N) Σ_{j=1}^{N} W^(j). (4.14)
Note that although annealed importance sampling produces an unbiased estimator, it may still have high variance depending on the sequence of transition kernels chosen (similar to the choice of annealing schedule in simulated annealing). The variance can also be reduced by taking a larger number of steps for each transition kernel.
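A compact sketch of this scheme on a toy problem (the problem itself is an illustrative assumption, not from the text): g is the unnormalized density e^{−x²/2}, f(x) = e^{−x²/(2s²)} with s = 3, so the true ratio Cf/Cg = s; the intermediate qi interpolate the log densities, and each particle takes one Metropolis step per temperature while accumulating the weight (4.13):

```python
import numpy as np

rng = np.random.default_rng(7)
s, N, n_temps = 3.0, 4000, 60
betas = np.linspace(0.0, 1.0, n_temps)

def log_q(x, b):                         # log q_beta (unnormalized), q_0 = g, q_1 = f
    return -0.5 * x ** 2 * ((1.0 - b) + b / s ** 2)

x = rng.normal(0.0, 1.0, size=N)         # exact draws from g = q_0
logw = np.zeros(N)
for b_prev, b in zip(betas[:-1], betas[1:]):
    logw += log_q(x, b) - log_q(x, b_prev)       # accumulate weight (4.13)
    y = x + rng.normal(0.0, 1.0, size=N)         # symmetric Metropolis proposal
    accept = np.log(rng.uniform(size=N)) < log_q(y, b) - log_q(x, b)
    x = np.where(accept, y, x)
ratio = np.exp(logw).mean()              # estimates C_f / C_g = s
```

Because the estimator is unbiased regardless of how well each kernel mixes, poor mixing shows up as variance rather than bias; more temperatures or more Metropolis steps per temperature reduce it.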
4.3.2 Population Monte Carlo
A generalized framework for the techniques in this chapter is given by Cappe et al. [20] under the name population Monte Carlo (PMC), summarized below in Algorithm 5. Instead of only allowing an importance distribution g taking values in a parametric family G based on the set of samples from the last iteration, population Monte Carlo allows more general update rules and can use samples from any of the previous time-steps. Many methods have been proposed for updating g based on a population of samples, such as using mixture-type models [30].
Algorithm 5 Generic population Monte Carlo (PMC) algorithm
  Generate (x0^(i))1≤i≤N ∼ g0.
  Compute W0^(i) = f(x0^(i))/g0(x0^(i)).
  Generate (x̃0^(i))1≤i≤N by resampling (x0^(i))1≤i≤N based on W0^(i).
  k ← 0
  while (not converged) do
    k ← k + 1
    Update gk based on {x̃j^(i)}j<k.
    Generate (xk^(i))1≤i≤N ∼ gk.
    Compute Wk^(i) = f(xk^(i))/gk(xk^(i)).
    Generate (x̃k^(i))1≤i≤N by resampling (xk^(i))1≤i≤N based on Wk^(i).
  end while

Chapter 5

Conditional Sampling Importance Resampling

In this chapter, conditional sampling importance resampling (CSIR), an extension of sampling importance resampling (SIR), is considered (see §2.1.2 for background on SIR). Conditional SIR applies to cases where it is difficult to draw directly from a target conditional distribution π, but it is possible to compute conditional distributions π(xi|xj) for coordinate indices i, j up to a constant factor. This setting occurs naturally in many sequential importance sampling contexts.

5.1 Motivation

In many application settings, it is not feasible to generate draws from the conditional distribution (2.23), but one would still like to apply the Gibbs sampler. For this purpose, Koch [60] proposed Gibbs SIR, which at each iteration performs a SIR step based on conditional draws from an importance distribution γ. The algorithm is as follows. Given an initial point x̃, for coordinate index j, draw N conditional samples xj^(1), . . . , xj^(N) according to γ(xj|x̃[−j]), then compute importance weights Wj^(i) = π(xj^(i)|x̃[−j])/γ(xj^(i)|x̃[−j]), and draw x̃j from {xj^(i)}i=1..N with probability proportional to Wj^(i). Due to the point-wise convergence of SIR (under mild conditions on γ and π), the conditional probabilities converge to (2.23) as N → ∞, so it is easy to see that as R, N → ∞ this process converges to π, where R is the number of Gibbs steps. The motivating application in Koch [60] is image reconstruction.

Algorithm 6 SIR Gibbs
  Start with arbitrary x̃.
  while (not converged) do
    Randomly choose coordinate index j.
    Draw N iid samples {xj^(i)}i=1..N according to γ(xj|x̃[−j]).
    Compute importance weights Wj^(i) = π(xj^(i)|x̃[−j])/γ(xj^(i)|x̃[−j]).
    Sample x̃j from {xj^(i)}i=1..N with probability proportional to Wj^(i).
  end while

A primary difficulty with the Gibbs SIR procedure is that it may be computationally expensive to draw samples from the conditional importance distribution γ(x|x̃[−j]). This is particularly the case in sequential Monte Carlo contexts, where importance distributions may be complicated and built sequentially over many time-steps. Below, a procedure is detailed that builds approximate conditional samples using only samples from the joint importance distribution; it is referred to as conditional sampling importance resampling (CSIR), and a simple example is given for the case of the multivariate normal distribution. Section §6 develops the motivating application to multi-object tracking.

5.2 Conditional Resampling

The goal in conditional SIR, as in standard SIR, is to approximately draw samples of a target distribution π using samples drawn from an importance distribution γ, both taking values in state space Ω. It is assumed that γ has full support with respect to π, i.e. π(x) > 0 ⇒ γ(x) > 0. Suppose that elements x ∈ Ω can be partitioned as x = {x1, . . . , xn}, and that one has access to the importance and target conditional distribution functions, π(xj|x[−j]) and γ(xj|x[−j]), up to a constant factor, but that these distributions are difficult to sample from directly. Suppose that one draws a collection of iid samples {x^(i)}i=1,...,N from γ. One can decompose these as a collection of samples from the marginal distributions, {x1^(i)}i=1..N, . . . , {xn^(i)}i=1..N.

The idea of CSIR is then to use these marginal samples as importance samples for the purpose of approximately sampling from the conditional distributions. One can think of this as a step of an approximate Gibbs Markov transition kernel. A step of CSIR Gibbs is as follows.
Starting from x̃ ∈ Ω, pick a coordinate index j at random, compute importance weights W_j^{(i)} = π(x_j^{(i)} | x̃_{[−j]}) / γ(x_j^{(i)}), and draw x̃_j from {x_j^{(i)}}_{i=1}^N with probability proportional to W_j^{(i)}. This process is summarized in Algorithm 7.

Algorithm 7 CSIR Gibbs
  Draw N independent importance samples {x^{(i)}}_{i=1}^N ∼ γ.
  Start with arbitrary x̃.
  while (not converged) do
    Pick an index j at random.
    Compute W_j^{(i)} = π(x_j^{(i)} | x̃_{[−j]}) / γ(x_j^{(i)}) for i = 1, ..., N.
    Draw x̃_j from {x_j^{(i)}}_{i=1}^N with probability proportional to W_j^{(i)}.
  end while

Theorem 1. If the Gibbs sampler is ergodic for π, and for each x ∈ Ω and index j, π(x_j | x_{[−j]}) / γ(x_j | x_{[−j]}) < ∞, then as N → ∞, CSIR Gibbs is ergodic and has stationary distribution π.

Proof. For each sample/coordinate pair x and j, the CSIR Gibbs transition kernel K_CSIR converges to the Gibbs transition kernel K due to the convergence of SIR.

One of the main advantages of CSIR is that it can exploit independence structure to greatly improve the quality of the approximate sample relative to SIR. This can yield lower-variance estimators than the standard importance sampling estimate μ̂_IS, unlike SIR. To properly take advantage of independence structure, Chapter 6 develops the notion of grouping subsets for CSIR, similar to the grouped Gibbs sampler or the L_k sets in the hit-and-run algorithm of §2.2.5. A disadvantage of CSIR is that extra computational effort may be needed to approximate conditionals.

5.2.1 Estimating Marginal Importance Weights

CSIR requires computation of the marginal importance distributions γ(x_j), which for complicated importance distributions is often impractical to do exactly. There are several ways to approximate them. One is to take the one-off conditional distribution of the sample, γ(x_j | x_{[−j]}). Since γ(x_j) = E[γ(x_j | X_{[−j]})], this is an unbiased estimator; in cases where X_j is independent of X_{[−j]}, it is exact.
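As a concrete illustration, the sketch below runs a CSIR Gibbs chain (Algorithm 7) on a toy problem: the target π is a bivariate normal with correlation ρ, the importance distribution γ is a standard bivariate normal, and, because γ has independent coordinates, the marginal γ(x_j) is known exactly. The correlation value, sample size, and step count here are illustrative choices, not values taken from the thesis.

```python
import math
import random

random.seed(0)

RHO = 0.8     # target correlation (illustrative choice)
N = 2000      # importance sample size
STEPS = 400   # CSIR Gibbs coordinate updates

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Draw N iid importance samples from gamma = N(0, I_2).
samples = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

def csir_gibbs_step(x_cur, j):
    """One CSIR Gibbs update of coordinate j (Algorithm 7): reuse the j-th
    marginal of the stored importance samples, reweighting each candidate by
    pi(x_j | x_{-j}) / gamma(x_j)."""
    other = x_cur[1 - j]
    # Target conditional: X_j | X_{-j} = x is N(rho * x, 1 - rho^2);
    # gamma's coordinates are independent, so gamma(x_j) = N(0, 1) exactly.
    weights = [normal_pdf(s[j], RHO * other, 1 - RHO ** 2) /
               normal_pdf(s[j], 0.0, 1.0) for s in samples]
    new_coord = random.choices([s[j] for s in samples], weights=weights, k=1)[0]
    return (new_coord, x_cur[1]) if j == 0 else (x_cur[0], new_coord)

x = samples[0]
draws = []
for step in range(STEPS):
    x = csir_gibbs_step(x, step % 2)
    draws.append(x)

# The chain should recover the target's positive correlation.
mean_xy = sum(a * b for a, b in draws) / len(draws)
print(round(mean_xy, 2))
```

Each update resamples only one coordinate from the stored marginal sample, so no new draws from γ are needed after the initial batch, which is the point of the method.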
Another method is to use the Monte Carlo estimator

    γ(x_j) ≈ (1/N) Σ_{i=1}^N γ(x_j | x_{[−j]}^{(i)}).

Instead of using all N indices, one can also use a random subset. Empirically, the Monte Carlo estimate appears to perform well, since x_j has been generated by the joint importance distribution γ and is not a rare occurrence. If coordinate index j exhibits a higher degree of independence, this Monte Carlo estimate will be more accurate.

5.2.2 Conditional Effective Sample Size

As in the standard SIR context, degeneracy of importance weights can occur in conditional resampling, with some sub-samples having much higher conditional importance weights than others for a given coordinate index j. To measure degeneracy, some heuristics extending the notion of effective sample size are introduced here. The marginal effective sample size (MESS) is defined as

    MESS_j = N E[W̃(X_j)]² / E[W̃(X_j)²],    (5.1)

where W̃ denotes the unnormalized importance weight; one can similarly define an empirical version. As explained above, one cannot generally compute the marginal distributions directly, so one can attempt to approximate MESS using conditional distributions. When using the same-sample conditional importance weights W_j^{(i)} = π(x_j^{(i)} | x_{[−j]}^{(i)}) / γ(x_j^{(i)} | x_{[−j]}^{(i)}), this is referred to as the local conditional effective sample size (local CESS),

    lCESS = ( Σ_{i=1}^N W_j^{(i)} )² / Σ_{i=1}^N ( W_j^{(i)} )².    (5.2)

Alternatively, one can use the cross-sample conditional importance weights W_j^{(i,k)} = π(x_j^{(i)} | x_{[−j]}^{(k)}) / γ(x_j^{(i)} | x_{[−j]}^{(k)}) to compute what will be called the global conditional effective sample size,

    gCESS = ( Σ_{i=1}^N Σ_{k=1}^N W_j^{(i,k)} )² / Σ_{i=1}^N Σ_{k=1}^N ( W_j^{(i,k)} )².    (5.3)

As when estimating the marginal distributions for sampling purposes, a subset of indices can be used instead of the full N when computing these Monte Carlo estimates.
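The two CESS heuristics can be computed directly from the conditional weight ratios. The sketch below does so for an assumed bivariate normal target with correlation ρ against a standard normal importance distribution (the setting and parameter values are illustrative, not from the thesis):

```python
import math
import random

random.seed(1)

N = 300      # importance sample size (illustrative)
RHO = 0.8    # target correlation (illustrative)

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Importance samples from gamma = N(0, I_2); under the target pi,
# X_0 | X_1 = x_1 is N(rho * x_1, 1 - rho^2).
samples = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

def cond_weight(xj, x_other):
    # pi(x_j | x_{-j}) / gamma(x_j | x_{-j}); gamma's coordinates are
    # independent, so its conditional equals its N(0, 1) marginal.
    return normal_pdf(xj, RHO * x_other, 1 - RHO ** 2) / normal_pdf(xj, 0.0, 1.0)

def ess(weights):
    # Effective sample size of an unnormalized weight vector.
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Local CESS (5.2): same-sample weights W_j^(i), one per sample.
lcess = ess([cond_weight(x0, x1) for x0, x1 in samples])

# Global CESS (5.3): cross-sample weights W_j^(i,k) over all N^2 pairs.
gcess = ess([cond_weight(samples[i][0], samples[k][1])
             for i in range(N) for k in range(N)])

print(lcess, gcess)
```

Note that lCESS lies in [1, N] while gCESS, being computed over N² weight pairs, lies in [1, N²]; in practice a random subset of pairs would replace the full double sum.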
Note that lCESS and gCESS are simply heuristics used to estimate sample quality, and are not directly related to the convergence of CSIR.

5.2.3 Importance Weight Accounting

Recall that a sample x has importance weight

    W(x) = π(x) / γ(x) = π(x_j | x_{[−j]}) π(x_{[−j]}) / ( γ(x_j | x_{[−j]}) γ(x_{[−j]}) ).    (5.4)

With CSIR, x̃_j is chosen approximately according to π(· | x̃_{[−j]}); therefore W(x̃′) ≈ π(x̃_{[−j]}) / γ(x̃_{[−j]}) = W(x̃) / W_j(x̃_j). Essentially, the algorithm is "resetting" the conditional weights for the resampled part. One can therefore perform a single step of CSIR for a sample x̃ and update its importance weight accordingly. However, since this is an approximation, it is generally not advisable to do so over several steps of CSIR, since errors will cascade. A better policy is to start x as a SIR sample, which initially resets all importance weights; one can then assume that importance weights remain the same for all samples.

5.3 Example: Multivariate Normal

The following example considers a target distribution π that is multivariate Normal with mean 0 and covariance C, using a spherical multivariate Normal importance distribution γ. Because one can sample directly from the target as well as from its conditional and marginal distributions, this allows easy evaluation of the performance of CSIR as compared to SIR. The importance random variable, denoted Y, is multivariate Normal with mean 0 and covariance α²I, and admits an importance density γ such that, for x ∈ R^n,

    γ(x) = (2π)^{−n/2} α^{−n} exp(−xᵀx / 2α²).

The target random variable is denoted X and is multivariate Normal with mean 0 and covariance C.
X admits a density π such that, for x ∈ R^n,

    π(x) = (2π)^{−n/2} |C|^{−1/2} exp(−xᵀC⁻¹x / 2).

For these choices of importance and target distributions, one can explicitly compute the conditional distributions. Recall that if X is decomposed as

    X = (X_a, X_b),    (5.5)

where X_a and X_b are column vectors of size q and n − q respectively, and C is partitioned conformably into blocks C_11, C_12, C_21, C_22, then the conditional distribution of X_a given X_b is q-dimensional multivariate normal with mean

    μ = C_12 C_22⁻¹ X_b

and covariance matrix

    Σ = C_11 − C_12 C_22⁻¹ C_21.

In the examples below, N samples {y^{(i)}}_{i=1,...,N} are drawn from the importance distribution and their importance weights are computed. To build the SIR sample {x̃_SIR^{(i)}}_{i=1,...,N}, N samples are drawn with replacement from the importance samples with probability proportional to their importance weights. To build the CSIR samples {x̃_CSIR^{(i)}}_{i=1,...,N}, each is initialized as a SIR sample, i.e. x̃_CSIR^{(i)} = x̃_SIR^{(i)}, and the CSIR Gibbs Markov step is then run, updating each coordinate a total of k times.

To test the quality of the approximate samples, a Monte Carlo estimate of the Kullback-Leibler divergence is computed. If γ and π are normal densities with respective means μ_γ, μ_π and covariance matrices Σ_γ, Σ_π, the Kullback-Leibler divergence can be written explicitly as

    D_KL(γ, π) = ½ [ tr(Σ_π⁻¹ Σ_γ) + (μ_π − μ_γ)ᵀ Σ_π⁻¹ (μ_π − μ_γ) + log(det Σ_π / det Σ_γ) − n ].    (5.6)

However, due to the multinomial resampling of SIR and CSIR, the distribution g̃ of an approximate sample x̃ cannot be expressed as a multivariate Normal. Instead, the KL-divergence between g̃ and π is approximated. To estimate the cross-entropy term E[log π(X̃)]: since samples have been drawn from g̃ and it is feasible to evaluate log π(x̃), one can use the crude Monte Carlo estimator (1/N) Σ_{i=1}^N log π(x̃^{(i)}). To estimate the entropy term E[log g̃(X̃)], crude Monte Carlo cannot be used, since in neither SIR nor CSIR is there an explicit representation of g̃.
Instead, a density estimate ĝ is built. There are many possible ways to build a density estimate, but in this case, since both the importance and target distributions are multivariate normal, ĝ is taken to be multivariate normal and the maximum likelihood fit is used. Crude Monte Carlo with ĝ is then used to estimate the entropy term, (1/N) Σ_{i=1}^N log ĝ(x̃^{(i)}).

For the examples below, three cases are chosen:

1. C_id = αI.

2. C_unif, a random matrix with eigenvalues chosen uniformly at random in (0, 1].

3. C_skew, a random matrix with eigenvalues chosen by a lognormal (μ = 0, σ = 1.5) in (0, 1], scaled to have maximum eigenvalue 1.

Figure 5.1: CSIR Normal example: eigenvalues of covariance matrices scaled to have max eigenvalue 1. Panels: (a) C_id, (b) C_unif, (c) C_skew.

The random matrices above were generated by taking QᵀΣQ, where Q is a random orthogonal matrix. For the case of the uncorrelated target distribution, C = αI with 0 < α < 1, the Kullback-Leibler divergence from the importance distribution γ to the target distribution π is

    D_KL(γ, π) = ½ ( n/α + n log α − n )    (5.7)
               = (n/2) ( 1/α + log α − 1 ),    (5.8)

so D_KL(γ, π) is linear in the dimension n. Using the iid properties of the importance and target distributions, one can see that for the conditional SIR approximate sample,

    D_KL(g_CSIR, π) = E[ log( g_CSIR(X̃_CSIR) / π(X̃_CSIR) ) ]    (5.9)
                    = Σ_{i=1}^n E[ log( g_CSIR(X̃_CSIR,i) / π(X̃_CSIR,i) ) ]    (5.10)
                    = n E[ log( g_CSIR(X̃_CSIR,1) / π(X̃_CSIR,1) ) ],    (5.11)

so for a given sample size N, D_KL(g_CSIR, π) also increases linearly in the dimension n. For the random covariance matrices, C is formed by pre- and post-multiplying the diagonal eigenvalue matrix by a random orthogonal matrix Q, obtained from a QR-decomposition of a matrix of n draws from a standard multivariate normal in n dimensions.

Figure 5.2: Estimated KL-divergence versus dimension n for approximate samples using CSIR and SIR, N = 50 samples, k = 5 passes of the Gibbs sampler. Panels: (a) C_id, (b) C_unif, (c) C_skew.

I repeat this process for 20 different randomly chosen covariance matrices for the uniform and skewed cases. Dotted lines are 95% confidence intervals based on 1000 bootstrap samples. Note that in these experiments, CSIR has much lower KL-divergence than SIR in every example, particularly for the random covariance matrices; both methods perform worst on the skewed covariance matrix. Obviously, in these examples one can easily sample directly from the target distribution, but they illustrate how CSIR can give a large improvement over SIR for high-dimensional state spaces. The next chapter uses this technique in a sequential importance sampling setting where the state space is high-dimensional.

Figure 5.3: Same experiments as in Figure 5.2, plotted by method: (a) SIR, (b) CSIR.

Chapter 6

Multi-Object Particle Tracking

This chapter applies conditional resampling in the particle filter context, with an application to multi-object tracking (MOT).
The novelty in this approach is that it allows the use of arbitrary jointly distributed forward movement and observation models, while maintaining asymptotic convergence properties and computational efficiency. To make the particle filter computationally feasible under this joint model, the use of conditional sampling importance resampling for sequential Monte Carlo is introduced. This modified particle filter tracking algorithm can handle unknown or varying numbers of objects, as well as the problem of associating observations with objects, without making parametric assumptions on the nature of the forward model or resorting to ad-hoc steps.

6.1 Background

6.1.1 Single Object Tracking

The task of inferring a single object's path from a sequence of (possibly noisy) observations is known as tracking. One way to formulate tracking is as an inference problem in a hidden Markov model (HMM). Recall that in a hidden Markov model, it is assumed that there is some underlying "hidden" Markov process X_{1:T} of interest which cannot be observed directly, together with a sample from an "observation" process Y_{1:T}. X_{1:T} is also called the state-space process. The simplest non-trivial hidden Markov model for tracking is a Gaussian state-space process X_{1:T} coupled with a Gaussian observation process Y_{1:T}, with X_t, Y_t ∈ R^n. This can be stated as

    X_t = X_{t−1} + N(0, Σ_s)    (6.1)
    Y_t = X_t + N(0, Σ_o).    (6.2)

In this case, one can use the Kalman filter [55] to solve the inference problem numerically. The Kalman filter is a sequential procedure that first computes E[X_t | X_{t−1}], then updates based on the current observation y_t to get E[X_t | X_{t−1}, y_t]; this is iterated over t = 1, ..., T. Due to the Gaussian nature of the processes, knowing E[X_t | X_{t−1}, y_t] for each t is sufficient to determine the full conditional distribution P[X_{1:t} | Y_{1:t}]. See Algorithm 8 for a step of the Kalman filter for this simple example.
Algorithm 8 Kalman filter step for simple Gaussian process
  P_t⁻ = P_{t−1} + Σ_s
  K_t = P_t⁻ (P_t⁻ + Σ_o)⁻¹
  x_t = x_{t−1} + K_t (y_t − x_{t−1})
  P_t = (I − K_t) P_t⁻

An alternative to the Kalman filter is particle filter tracking (or particle tracking). The modeling assumptions required to use the particle filter are quite general compared to those of the Kalman filter. The advantage of the particle filter is that more realistic, non-linear, and discontinuous observation and movement models may be specified; the disadvantage is an increase in computational complexity and implementation difficulty, usually requiring Monte Carlo simulation instead of fast and relatively simple deterministic computations.

Inference in tracking problems can often be posed in terms of integrals or expected values. While deterministic, quadrature-type methods exist for particle filters, in general the integrals to be computed are high-dimensional, in which case deterministic methods tend to be intractable. For this reason, it is common to resort to Monte Carlo methods for particle filtering (Algorithm 9), although some attempts at deterministic particle filtering have been made (see for example Doucet and De Freitas [31], ch. 5).

Algorithm 9 Single object particle filter tracking
  Draw initial particles x_0^{(1)}, ..., x_0^{(N)} from γ_0.
  for t in 1, ..., T do
    for each particle index i do
      Sample x_t^{(i)−} ∼ γ_t(· | x_{t−1}^{(i)}).
      Compute importance weight w_t^{(i)}.
    end for
    for each particle index i do
      Sample with replacement x_t^{(i)} from x_t^{(1)−}, ..., x_t^{(N)−} with probability proportional to w_t^{(i)}.
    end for
  end for

6.1.2 Multi Object Tracking

Multi-object tracking is a large and diverse field with many applications across science and engineering, and it is closely related to the single-object tracking problem. As noted in Vermaak et al. [100], most multi-object tracking algorithms fall into two categories.
The first approach is to run multiple copies of individual single-object trackers, then post-process the outputs of the single-object trackers to handle occlusion and/or confusion. Interactive effects between objects and observations are assumed to be negligible or are dealt with heuristically. These algorithms tend to be fast and often work well in practice, but if interactive effects are non-negligible, significant bias may be introduced, and convergence to the optimal solution is not guaranteed as the number of samples goes to infinity. Hue et al. [48] give an early example of this approach, with an algorithm for multi-object tracking that couples an 'association vector' approach with parallel particle filters. This model assumes independence in the observation and movement models, and uses a simulation-wide indicator test to determine whether to add or remove particles at a given time-step.

A second approach is to formally treat the multi-object tracking problem as an instance of single-object tracking by enlarging the state space to represent all object paths jointly as a single object. The process and observation models for this enlarged object can then be arbitrarily jointly distributed across all individual objects and observations, and computational techniques developed for single-object tracking, such as particle filtering, can be applied to make inferences in this model. In this vein, Khan et al. [56] give an algorithm for multi-object tracking that uses a Markov random field to define a joint movement distribution for interacting objects.

The drawback of the enlarged-state approach is that SIR becomes inefficient in high dimensions. To see why naive SIR is generally ineffective in high dimensions, consider the example of n objects moving independently with no interaction.
Parallel particle filters tracking each object individually would preferentially draw the most likely current states for each object. A naive joint-state particle filter implementation would select the best joint state-space particles, but any single joint particle is unlikely to contain all of the best individual current states. This line of reasoning shows that the naive joint-state particle filter would require a number of samples exponential in the dimension to achieve error comparable to parallel single-object particle filters, which is impractical for even a modest number of tracked objects. To compensate for this effect, the algorithm described in this chapter applies conditional resampling to take advantage of the large degree of independence between groups of objects that are not in close proximity to one another.

Other useful frameworks for multi-object tracking include the mixture particle filter [100] and the PHD filter [101]. For tutorials on general particle filter methods, see Chen [24] and Doucet and De Freitas [31]. An overview of particle-based approaches to multi-object tracking is given in the thesis of Ozkan [80].

6.1.3 Tracking Notation

A time-step index t is a sequential coordinate index in Z+. An observation φ is simply a position in R^n occurring at a time-step t. An observation set Y_t is a set of observations occurring at time-step t. Objects are assigned an index l, and object states are represented as ζ_{l,t}. A trajectory ζ_{l,1:t} is a sequence of states representing the movements of a single object, also called a path. Note that trajectories may contain leading and trailing sequences of null states ∅, indicating that the object entered and/or left the field of view at some point in time. An observation event ψ is an intermediate variable representing the process by which a set of observations was generated by a set of objects.
The object state set X_t at time t is a set containing the states of all objects at time-step t, while the event state Ψ_t is the set of all object events ψ at time t. The enhanced state at time t is the pair (X_t, Ψ_t) and is denoted X̂_t. A particle x_{1:t} is a set of trajectories representing the full joint evolution of the state space under the particle filter algorithm up to time-step t. Each particle has associated with it an importance weight W_{1:t}. Note that after a SIR step, x_{1:t} is an approximate sample from the target distribution at time t and its importance weight is reset to 1. State-space, event, and observation process samples are denoted in lower case as x, ψ, y respectively, and the corresponding random variables in upper case as X, Ψ, Y. In this chapter it is assumed that there are N particle samples from the state space, with i used as a sample index. The total number of time steps observed is denoted T.

6.2 Conditional SIR Particle Tracking

In multi-object tracking, there are various discrete 'decisions' that a tracking algorithm must make. These decisions correspond to discrete events in the forward simulation model, such as the deletion/creation of objects or occlusion, as well as the 'association' problem of assigning individual observations to objects. Addressing the association problem is the primary contribution of algorithms such as JPDAF [22]; this method and others deal with association via Monte Carlo. Dealing with unknown or changing numbers of objects is often addressed via ad-hoc methods, such as defining a 'decision rule' in which objects are added/removed across all particles simultaneously [48], or by assuming that the forward simulation follows some parametric form such as a mixture model [100].

In the 'enlarged state-space' particle filter, both the deletion/creation and association problems are handled in an asymptotically correct way, but as previously noted, this approach yields poor computational complexity. Conditional SIR can be used to improve performance. To facilitate this, grouping subset functions are defined to decompose the state space into two parts. A grouping subset function takes as input a particle and returns a grouping subset, which can be used to apply the grouped Gibbs or hit-and-run algorithms as described in §2.2.5. A key algorithmic design issue in CSIR is choosing appropriate grouping subset functions.

6.2.1 Grouping Subsets for Multi-Object Tracking

In the multi-object tracking context, one way to define grouping subset functions is to return sets of trajectories associated with individual observations φ ∈ y_t. G_φ(x_{1:t}) denotes the set of trajectories that are associated by events with observation φ for particle x up to time t. If there are no trajectories associated with φ in x_t, i.e. φ was considered a false positive in x_t, then the empty set ∅ is returned. As noted above, trajectories ζ_{1:t} may include leading and trailing null states ∅, corresponding to the object entering and leaving the field of view. A simplified example of grouping based on observations is shown in Figure 6.1.

Given a grouping subset G, it is possible to sample approximately from π(X_G | X_{[−G]}) using conditional SIR. If G is chosen independently of X, then this defines a Markov transition that leaves the stationary distribution invariant. Note that in the case of well-separated trajectories, applying conditional SIR to enlarged state-space particles is equivalent to running a separate particle filter for each object independently.
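The dimension argument above can be illustrated numerically by comparing the effective sample size of joint-state weights against coordinate-wise weights. The setting below (n independent objects, each with a per-coordinate importance weight given by an assumed N(0.5, 1) target against an N(0, 1) proposal) is an illustrative sketch, not the tracking model of this chapter:

```python
import math
import random

random.seed(2)

N = 1000   # number of particles
n = 20     # number of independent objects (dimensions)

def w1(x):
    # Per-coordinate importance weight: target N(0.5, 1) against an
    # N(0, 1) proposal (normalizing constants cancel in the ratio).
    return math.exp(-(x - 0.5) ** 2 / 2) / math.exp(-x ** 2 / 2)

def ess(weights):
    # Effective sample size of an unnormalized weight vector.
    return sum(weights) ** 2 / sum(w * w for w in weights)

particles = [[random.gauss(0, 1) for _ in range(n)] for _ in range(N)]

# Naive joint-state weights: product of per-object weights. The weight
# variance grows exponentially in n, so the joint ESS collapses.
joint_ess = ess([math.prod(w1(xj) for xj in p) for p in particles])

# Coordinate-wise (CSIR-style) weights for a single object stay healthy.
coord_ess = ess([w1(p[0]) for p in particles])

print(round(joint_ess, 1), round(coord_ess, 1))
```

With these illustrative parameters the joint ESS is a tiny fraction of N while the per-coordinate ESS remains close to N, which is exactly the gap that grouping-based conditional resampling exploits.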
6.2.1.1 Choosing/Pruning Grouping Functions

To choose the groupings to which conditional SIR is applied, a table of groupings is kept, with the empirical CESS for each grouping, and is updated after each iteration. There is much potential redundancy in groupings, and updating the table is expensive, so this grouping table is typically 'pruned'. In general, it is possible to use arbitrary rules about which groupings to use, based on aggregate particle sample properties, without affecting the asymptotic convergence properties. Pruning decision rules include only considering groupings that originated a constant number of time-steps in the past, not considering 'children' of well-established groups in subsequent time-steps, and removing 'eliminated' groups as determined by heuristics such as CESS. As noted, a wide range of pruning algorithms can be used.

Figure 6.1: Grouping subset functions based on observations, denoted by enclosed dashed lines. Filled circles: observations. x's, o's: individual objects from particles 1 and 2 respectively. Note that in the grouping on the right, the particles differ in the number of associated object trajectories.

6.3 Application: Tracking Harvester Ants

In this section, the CSIR approach is applied to tracking the movements of individual ants in a colony of harvester ants in a laboratory setting.

6.3.1 Object Detection

A key step in multi-object tracking is object recognition, i.e. identifying what constitutes an object and what does not. To facilitate this task, the tracking algorithm uses sophisticated object detection software known as GemIdent [47]. Each pixel is classified independently as belonging to one of the specified object types (or none). The algorithm then forms a graph representation of the classified pixels and uses spectral graph partitioning [91] to find clusters of pixels corresponding to individual objects.

6.3.1.1 Pixel Classification

Supervised learning refers to the practice of building a learning algorithm with the aid of a set of training points. Interactive supervised learning is a technique that recursively applies supervised learning to predict a response variable by asking the user for input, training the algorithm on the new input, reporting the results of the training to the user, and repeating until the user is satisfied. This interactivity streamlines the training process, allowing the user to initially identify a small number of example points, then add further points to correct mistakes made in the classification process. This minimizes the total number of training points the user must provide and shortens the total time involved in the training process.

As input to the supervised learning algorithm, it is necessary to provide precomputed features for each pixel to classify. The primary feature used here is a ring score. The algorithm first normalizes pixel values using the Mahalanobis distance based on a pre-defined color set. To compute the ring score, take the average normalized value relative to color c of all pixels at radius r from the pixel of interest. The algorithm computes this value for each color c and each radius r < R, for some pre-defined maximum R. To incorporate movement information into pixel classification, the ring scores of the same pixel in adjacent frames, both forward and backward in time, are also included.

Once features have been computed for each pixel, random forests [18] are used to classify each pixel based on those features. Random forests are used because of their relative simplicity, and for the current application they empirically give comparable results to other techniques such as support vector machines (SVMs).
Random forests also have the advantage of being easily interpretable, as it is possible to directly compute the relative importance of each feature in the classification process. This approach was compared to other machine learning techniques in the publicly available software library WEKA [37]. For a book-length treatment of statistical and machine learning techniques, see Hastie et al. [45].

6.3.1.2 Centroid Finding

After pixels are classified, it is necessary to determine the number and locations of ants. The algorithm uses a flood-fill algorithm to group adjacent pixels of the same type into "blobs". Small, disparate blobs occur due to incorrectly classified pixels. In addition, objects that are clustered together in an image tend to produce large connected blobs. The current application needs to determine which blobs are noise and which indicate the presence of ants, and how many. To do this, the algorithm forms a graph of the blob by connecting adjacent pixels, then uses spectral graph partitioning [91] to cut the blob. A good overview of spectral methods for graphs can be found in Spielman [95]. One way to think about spectral partitioning is as an approximation to the 'sparsest cut' problem, which seeks a cut that maximizes the number of vertices separated while minimizing the number of edges cut. The parameters of this centroid-finding step must be tuned. Figure 6.2 shows two blobs and the results of spectral partitioning.

6.3.2 Observation Model

While functioning relatively well in most cases, the GemIdent machine-learning algorithm is prone to error due to the presence of background clutter, ants moving in close proximity to one another, unusual orientations of ants as they traverse objects, and other factors, some of which confuse even human observers.
Additionally, there are tradeoffs between the quality of the images, the interactive training time of the interactive supervised learning algorithm, computational efficiency, and the accuracy of the classifications. It is desirable to have a tracking algorithm that can operate robustly in the presence of this relatively noisy process. Of particular concern is the propensity of the algorithm to split an individual ant into two observations or to merge two ants into a single observation.

Figure 6.2: Blob bisection via spectral partitioning.

Under the observation model, each Y_t is generated as the result of events that occur according to a probability distribution dependent on X_t. In the current application, there are two broad classes of object events: independent events and joint events. Independent Normal observation events correspond to the canonical single-trajectory/single-observation case, with an observation occurring at a location according to a bivariate Normal distribution centered at the object location, with standard deviation σ_o. Independent false negative events indicate that no observation was recorded for an individual object; this occurs for each object according to a Bernoulli random variable with parameter λ_n. Independent false positive events indicate a spurious observation not directly caused by any proximate object; these occur according to a uniform 2D Poisson process with parameter λ_p. Splitting is a special kind of false positive in which a pair of observations appears at the front and back of an object, due to overzealous segmentation; this occurs for each object according to a Bernoulli random variable with parameter λ_s. Finally, two adjacent objects have a probability of 'merging' with their neighbors as a function of their distance, with an observation occurring near the center of the merged objects. Merging occurs in the joint model between two objects at distance d with probability α_m exp(−d⁴/λ_m). Individual objects may be involved in multiple mergings at the same time.

Table 6.1: Observation event types.

  Event type                    Description                                              Parameters
  Independent observations      Single trajectory with a single observation              σ_o
  Independent false negatives   No observation recorded for an individual object         λ_n
  Independent false positives   Spurious observation not directly related to an object   λ_p
  Splitting                     Single object yields a pair of observations              λ_s
  Merging                       May occur when objects are within close proximity        λ_m, α_m

6.3.3 State-Space Model

In the current application it is assumed that individual objects move according to independent affine Gaussian stochastic processes, with drift vectors depending on an estimate of the object's velocity. For this model, the object state at time t can be decomposed as ζ_t = {u_t, v_t}, where u, v ∈ R² are the position and velocity vectors respectively, and the model written as

    u_t = u_{t−1} + v_{t−1} + N(0, σ_m² I)    (6.3)
    v_t = α(u_t − u_{t−1}) + (1 − α) v_{t−1},    (6.4)

where σ_m is the movement dispersion parameter. The possibility that individual objects can enter or leave the field of view is modeled according to a birth-death process with rate parameters λ_b, λ_d (both birth and death rates uniform with respect to space and time). This is a relatively simple movement model, but for the current application it appears to suffice, as most of the complications and non-linearities come from the observation process.

6.3.4 Importance Distribution

A critical consideration is the choice of importance distribution. As discussed in §3.2, the optimal importance distribution γ* samples x_t based on the previous object state and all current and future observations, i.e.
    γ*_t(x) = π(x_t | x_{t−1}, y_{t:T})    (6.5)

Since drawing from γ* is infeasible, the second best method practically available is often to use the locally optimal importance distribution,

    γ_t(x) = π(x_t | x_{t−1}, y_t).    (6.6)

In the current context it is generally impossible to sample from this importance distribution directly, so instead the algorithm uses Markov chain Monte Carlo, in particular the data augmentation algorithm of §2.2.4. To sample approximately from γt, the event set Ψt is used as an auxiliary variable for the data augmentation algorithm. The goal in this case is to draw a joint sample according to π(xt, Ψt | xt−1, yt), even though the primary interest is in xt. The algorithm does this by first generating a draw of Ψt conditioned on xt, then xt conditioned on Ψt, or

    Ψ_t ∼ π(Ψ_t | x_t, x_{t−1}, y_t)    (6.7)
    x_t ∼ π(x_t | Ψ_t, x_{t−1}, y_t),    (6.8)

and repeating. As a starting point, an initial value of xt is drawn from one forward step of the object state-space model, i.e. xt is drawn according to π(xt | xt−1).

Algorithm 10 Sampling π(xt, Ψt | yt, xt−1) via data augmentation.
  Start with projected trajectory positions xt.
  repeat
    Sample π(Ψt | xt, yt) via Algorithm 12.
    Sample π(xt | Ψt, yt, xt−1) as described in §6.3.4.1.
  until converged

6.3.4.1 Sampling from State-Space Given Events

Sampling from π(xt | Ψt, xt−1, yt) is decomposed by event types. Note that at this step, resampling xt means resampling the set of object positions, as object birth/death movements are considered when sampling from events. The algorithm cycles through each element of Ψt and performs actions based on the event type.

Independent normal observation  Both the movement and observation processes are Gaussian, so the object location can be modeled as bivariate Normal. This is due to the following relationship.
    π(x_t | x_{t−1}, y_t) = π(y_t | x_t) π(x_t | x_{t−1}) / π(y_t | x_{t−1})    (6.9)
                          ∝ π(y_t | x_t) π(x_t | x_{t−1})    (6.10)

When the event type is independent Gaussian movement and observation, both π(yt | xt) and π(xt | xt−1) are Normal, and the product of the distributions will also be Normal. Since independence in the covariance structure of the 2D coordinate axes is assumed for both the observation and movement models, the distribution π(ζt | ζt−1, φt) can be written as the product of the distributions for each coordinate axis. The coordinate-wise standard deviation and mean of the object position ut will then be

    σ_{u_t} = sqrt( σ_m² σ_o² / (σ_m² + σ_o²) )    (6.11)
    µ_{u_t} = ( (u_{t−1} + v_{t−1}) σ_o² + y_t σ_m² ) / (σ_m² + σ_o²),    (6.12)

and ut can be expressed as a N(µ_{u_t}, σ_{u_t}² I) random variable.

Independent false negative  In this case there is no observation information, so the object state is updated according to the forward movement model (6.3).

Independent false positive  Independent false positives are not related to any objects, so no object states need to be updated.

Splitting  Splitting is represented as an independent normal observation, using the average of the two points as the observation center.

Merging/Splitting  Given a series of mergings and splittings Ψ̂, associated subsets of observations ŷt, and a subset of object points x̂t, it is possible to compute the probability of the event set occurring as π(Ψ̂, ŷt | x̂t, xt−1). In order to sample from π(x̂t | Ψ̂, ŷt), the algorithm samples from a Metropolis-Hastings independence chain with proposal density Q(x̂t) = π(x̂t | x̂t−1). The acceptance probability of a new state x̂t′ will then be

    a = π(Ψ̂, ŷ_t | x̂_t′, x_{t−1}) / π(Ψ̂, ŷ_t | x̂_t, x_{t−1})    (6.13)

6.3.4.2 Sampling from Events Given State-Space

The second half of the data augmentation algorithm for sampling from the importance distribution (6.6) is to sample the events Ψt given the state-space positions xt.
    γ(Ψ_t | x_t) = π(Ψ_t | x_t, y_t)    (6.14)

To sample from γ(Ψt | xt) a Metropolis-Hastings Markov chain is used. This entails drawing samples from a proposal chain κ and accepting/rejecting them based on their likelihoods relative to the current state.

The underlying Markov chain used in the current approach is based on sampling from object/observation associations. The global version is as follows. Every object is (independently) associated with an observation with probability given by a decreasing function of their distance, f(d(ζ, φ)). It is then possible to directly evaluate the likelihood of this association. The algorithm samples from the target distribution using a Metropolized independence chain. This global proposal distribution is referred to as κ.

Figure 6.3: Association of objects with observations. 'Events' correspond to connected components in this bipartite graph, including Normal observations, splitting, merging, false positives, false negatives, and joint events.

For an edge set E, one can express κ as

    κ(E) = ∏_ζ [ ∏_{(ζ,φ)∈E} f(d(ζ,φ)) · ∏_{(ζ,φ)∉E} (1 − f(d(ζ,φ))) ]    (6.15)

f should be chosen to be close to the probability of an observation with position φ being generated in an event involving an object at location ζ: if d(ζ, φ) < σo an association is almost certain, if d(ζ, φ) < 2σo it is quite likely, and if d(ζ, φ) > 3σo it is improbable. The algorithm assumes this takes a parametric form such as

    f(d) = α exp(−d^p / β),    (6.16)

where p and β are chosen based on σo, and 0 < α ≤ 1 is related to the false positive and merging probabilities. An example would be to take α = .95, p = 4, and β = −(3σo)^p / log .01, which gives the plot below.
[Plot: association probability f(d) as a function of d/σo, decreasing smoothly from 1.0 near d = 0 to approximately 0.01 at d = 3σo.]

There are several local versions. One is to pick an object at random, resample its associations, and accept according to the Metropolis rule; this is an instance of "single-particle Metropolis". Another variation is to resample the associations for a subset of objects. This subset could be random, or could be taken as the k nearest neighboring objects to a randomly chosen object or observation, or all objects within distance d of a randomly chosen object or observation; k or d could be randomly chosen or set constant. This procedure will be ergodic if choices are centered around objects (nonzero probability for every bipartite graph), and ergodic for choices centered around observations if k or d are large enough that every object is close enough to an observation to be resampled at some point.

Refer to the current edge set as E and the proposal edge set as E′. The Metropolis acceptance value a is

    a = [γ(E′) / γ(E)] · [K(E′, E) / K(E, E′)]    (6.17)
      = [γ(E′) / γ(E)] · [κ(E) / κ(E′)]    (6.18)

The ratio can be directly computed as

    κ(E) / κ(E′) = ∏_{(ζ,φ)∈E, ∉E′} f(d(ζ,φ)) / (1 − f(d(ζ,φ))) · ∏_{(ζ,φ)∈E′, ∉E} (1 − f(d(ζ,φ))) / f(d(ζ,φ))    (6.19)

and similarly for the ratio

    γ(E′) / γ(E).    (6.20)

Algorithm 11 Sampling proposal event set Ψ via association distribution κ.
  Form bipartite association probability graph A(xt, yt) using association probability function f.
  Draw edges E independently with probability proportional to edge weights.
  Find connected components of E, forming event set Ψ.
  Return Ψ.
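Algorithm 11 can be sketched concretely. The snippet below is a minimal illustration, not the thesis implementation: it uses the example parameterization of (6.16) with α = .95, p = 4, and β = −(3σo)^p / log .01, and a small union-find to read off connected components; the function names `f_assoc` and `propose_event_set` are hypothetical.

```python
import math
import random

def f_assoc(d, sigma_o, alpha=0.95, p=4):
    """Association probability f(d) = alpha * exp(-d^p / beta), (6.16),
    with the example choice beta = -(3*sigma_o)^p / log(.01), which makes
    f(3*sigma_o) = 0.01 * alpha."""
    beta = -(3.0 * sigma_o) ** p / math.log(0.01)
    return alpha * math.exp(-(d ** p) / beta)

def propose_event_set(objects, observations, sigma_o, rng=None):
    """Algorithm 11 sketch: draw edges independently with probability f(d),
    then return the connected components of the bipartite graph as events."""
    rng = rng or random.Random(0)
    nodes = [('obj', i) for i in range(len(objects))] + \
            [('obs', j) for j in range(len(observations))]
    parent = {v: v for v in nodes}          # union-find over the bipartite graph

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    edges = []
    for i, (ox, oy) in enumerate(objects):
        for j, (px, py) in enumerate(observations):
            d = math.hypot(ox - px, oy - py)
            if rng.random() < f_assoc(d, sigma_o):
                edges.append((i, j))
                parent[find(('obj', i))] = find(('obs', j))

    components = {}
    for v in nodes:
        components.setdefault(find(v), []).append(v)
    return edges, list(components.values())
```

Because f decays like exp(−d⁴), pairs beyond a few σo essentially never link, which is what makes the association graph effectively sparse.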
Each edge set E defines an event set Ψ via the connected components of the association graph. However, this relationship is not injective: the same event set Ψ may result from multiple different edge sets, since connected components for joint events may be formed in multiple ways. Then

    κ(Ψ) = κ(E) / κ(E | Ψ)    (6.21)
         = ∑_{E′ ↦ Ψ} κ(E′)    (6.22)
         = E[κ(E) 1_{E ↦ Ψ}]    (6.23)

This can be decomposed by event components. For each ψ ∈ Ψ, define E_ψ to be the associated set of edges generated by κ for the objects and observations associated with ψ. Then κ(Ψ) can be decomposed as the probability that there are no edges between objects and observations not in the same events, multiplied by the probability of connectivity within event components:

    κ(Ψ) = κ({E \ ∪_{ψ∈Ψ} E_ψ} = ∅) · ∏_{ψ∈Ψ} κ(E_ψ ↦ ψ)    (6.24)

To compute the first term on the RHS of (6.24), the probability that there are no cross-component edges, simply compute

    κ({E \ ∪_{ψ∈Ψ} E_ψ} = ∅) = ∏_{(ζ,φ)∉Ψ} (1 − κ((ζ, φ))),    (6.25)

i.e. the product of one minus the probability of each non-component edge occurring. Since the association probability graph is effectively sparse (the probability of an edge between an object and an observation is effectively 0 if their distance exceeds 4σo), this probability can be computed quickly. To compute the individual component probabilities κ(E_ψ ↦ ψ), for small numbers of objects plus observations it is feasible to perform the calculations directly through enumeration. For large joint distributions, the algorithm estimates this probability through Monte Carlo estimation of

    κ(E_ψ ↦ ψ) = E[κ(E_ψ) 1_{E_ψ ↦ ψ}]    (6.26)

To compute the probability of an event set Ψt under γ, note that

    γ(Ψ_t | x_t) = π(Ψ_t | x_t, y_t)    (6.27)
                 = π(Ψ_t, x_t, y_t) / π(x_t, y_t)    (6.28)
                 = π(y_t | x_t, Ψ_t) π(Ψ_t | x_t) π(x_t) / π(x_t, y_t)    (6.29)
                 ∝ π(y_t | x_t, Ψ_t) π(Ψ_t | x_t)    (6.30)

Both of these probability distributions, π(yt | xt, Ψt) and π(Ψt | xt), are defined by the forward observation model.
The first term can be decomposed as the product of the individual event probabilities. This can be written as

    π(y_t | x_t, Ψ_t) = ∏_{ψ∈Ψ_t} π(y_{t,ψ} | x_{t,ψ}, ψ),    (6.31)

where y_{t,ψ} and x_{t,ψ} are the subsets of yt and xt associated with event ψ.

For the second term on the RHS of (6.30), π(Ψt | xt), recall that the 'event' random variable corresponds to a connected component in the association graph but does not carry any positional information. So this is the probability that the set of trajectories x_{t,ψ} produced a (joint) event, times the probability that these events produced the correct number of observations |y_{t,ψ}| given the object associations. The joint event-set probability can be written as

    π(Ψ_t | x_t) = π(Ψ_{y_t}, Ψ_{x_t} | x_t)    (6.32)
                 = π(Ψ_{y_t} | Ψ_{x_t}, x_t) π(Ψ_{x_t} | x_t)    (6.33)

To compute π(Ψ_{x_t} | xt), the 'object association probability' for the joint forward model is used to compute the probability of each component being connected times the probability of there being no cross-component edges, or

    π(Ψ_{x_t} | x_t) = ( ∏_{ψ∈Ψ} π(ψ | x_{t,ψ}) ) · ∏_{(φ,φ′)∉Ψ_x} (1 − π((φ, φ′) | x_t))    (6.34)

Combining these yields

    γ(Ψ | x_t) ∝ ( ∏_{ψ∈Ψ} π(y_{t,ψ}, ψ | x_{t,ψ}) ) · ∏_{(φ,φ′)∉Ψ_x} (1 − π((φ, φ′) | x_t)),    (6.35)

where π(y_{t,ψ}, ψ | x_{t,ψ}) can be broken down as

    π(y_{t,ψ}, ψ | x_{t,ψ}) = π(y_{t,ψ} | x_{t,ψ}, ψ) π(ψ_y | ψ_x, x_{t,ψ}) π(ψ_x | x_{t,ψ})    (6.36)

Algorithm 12 Metropolis sampling of π(Ψt | xt, yt).
  Start with event set Ψ.
  repeat
    Ψ′ ← Ψ.
    Pick a random subset of objects S ⊂ xt, using local or global criteria.
    Remove events ψ ∈ Ψ′ associated with S.
    Randomly sample new events for S according to κ, and add them to Ψ′.
    Compute acceptance probability a(Ψ, Ψ′) via (6.18).
    With probability min(1, a), set Ψ ← Ψ′.
  until converged

6.3.5 Computing Relative and Marginal Importance Weights

Another main element of the importance sampling algorithm is the computation of importance weights.
It is of interest to compute both the relative importance weights for an entire particle, and the marginal importance weights for grouping subsets.

6.3.5.1 Computing Particle Relative Importance Weights

The importance weight contribution from time t will be

    W_t(x) = π_t(x) / γ_t(x).    (6.37)

Since γt(x) ∝ P(xt | xt−1, yt), the contribution will be proportional to

    w̃_t(x) = P(y_t | x_{t−1}).    (6.38)

This can be computed via Monte Carlo as

    P(y_t | x_{t−1}) = E_{X_t | x_{t−1}}[P(y_t | X_t)],    (6.39)

where xt is drawn from P(xt | xt−1). However, this estimator may have high variance, since for any given xt it may be unlikely to generate a given yt. From the sampling step there are approximate samples from P(xt, Ψt | xt−1, yt), which is in fact the optimal importance distribution for estimating (6.39). However, this probability is only known up to a normalizing constant (the normalizing constant is in fact P(yt | xt−1)). One way of computing this normalizing constant is by observing the following:

    P(y_t | x_{t−1}) = P(y_t | x_t, Ψ_t) P(x_t, Ψ_t | x_{t−1}) / P(x_t, Ψ_t | y_t, x_{t−1})    (6.40)

    P(x_t, Ψ_t | y_t, x_{t−1}) / P(y_t | x_t, Ψ_t) = P(x_t, Ψ_t | x_{t−1}) / P(y_t | x_{t−1})    (6.41)

Integrating with respect to xt, Ψt gives

    1 / P(y_t | x_{t−1}) = E_{X_t, Ψ_t | y_t, x_{t−1}}[ 1 / P(y_t | X_t, Ψ_t) ]    (6.42)

    P(y_t | x_{t−1}) = ( E_{X_t, Ψ_t | y_t, x_{t−1}}[ P(y_t | X_t, Ψ_t)^{−1} ] )^{−1}.    (6.43)

Also note that computing P(yt | xt, Ψt) is straightforward: it is just the product of the observation probabilities for each event ψ ∈ Ψt, which are computed anyway when generating the joint samples. This means a Monte Carlo estimate based on E_{X_t, Ψ_t | y_t, x_{t−1}}[P(yt | Xt, Ψt)^{−1}] can be used to estimate P(yt | xt−1), and this estimate can be built concurrently while building the joint sample Xt, Ψt | yt, xt−1.
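The identity underlying (6.42)–(6.43), that the marginal likelihood equals the inverse of the posterior expectation of the inverse likelihood, can be verified exactly on a toy discrete model where every quantity is computable in closed form. All numbers below are arbitrary illustration values, not parameters from the tracking model.

```python
# Exact check of the identity (6.43) on a toy two-state model:
# P(y) = ( E_{x ~ P(x|y)}[ 1 / P(y|x) ] )^{-1}
prior = {0: 0.3, 1: 0.7}                 # P(x), arbitrary values
lik = {0: 0.2, 1: 0.6}                   # P(y|x) for one fixed observation y

p_y = sum(lik[x] * prior[x] for x in prior)           # direct marginal: 0.48
post = {x: lik[x] * prior[x] / p_y for x in prior}    # posterior P(x|y)
inv_mean = sum(post[x] * (1.0 / lik[x]) for x in prior)
p_y_harmonic = 1.0 / inv_mean                          # the target of (6.43)

assert abs(p_y - p_y_harmonic) < 1e-12
```

In the tracking algorithm the posterior expectation is not available exactly and is replaced by a Monte Carlo average over the draws of (Xt, Ψt) already produced by the data augmentation step.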
6.3.5.2 Computing Marginal Importance Weights

In order to perform resampling, it is necessary to estimate the marginal importance weights,

    W_j^{(i)} = π(x_j^{(i)} | x̃_{[−j]}) / γ(x_j^{(i)})    (6.44)

This computation depends on the choice of "coordinate index" j, which corresponds to a partition of the state space as discussed in §2.2.5. To choose this partition, the extended state-space (xt, Ψt) that includes the event set Ψt will be used. There are several different possible ways to choose partitions, as discussed in §6.2.1. The main purpose of conditional resampling is to allow for 'revisions' across multiple time-steps by 'grafting' sub-trajectories from one particle onto another. This is necessary because events represent realizations of 'decision points', and decisions that are plausible at time-step t may be much less plausible compared to alternatives when taking into account future observations.

As previously noted, there are many possible grouping functions to which conditional resampling may be applied. In the current application, trajectory groupings are formed based on well-separated observations. Recall that it is possible to choose grouping functions arbitrarily based on yt without introducing bias. The goal in this case is to form groups of individual trajectories based on single observations. The algorithm then keeps this grouping when/if the set of trajectories enters a joint merging/splitting setting, and decides whether to discard the grouping based on aggregate properties of the group. To resample across the grouping, positions from one trajectory are 'grafted' onto another using the last k time-steps, the resulting marginal importance weights are estimated, and resampling is performed based on these weights.

Algorithm 13 Conditional sampling importance resampling for particles.
  Determine which grouping sets to resample, forming new groupings at each observation and pruning old groupings using criteria such as age and CESS.
  Resample particles using SIR to get x̃_t^(1), ..., x̃_t^(N).
  for each grouping j do
    for i in 1, ..., N do
      Compute marginal importance weights using (6.44).
      Choose a grouping k with probability proportional to the marginal importance weights.
      Graft grouping G_j(x̃_t^(k)) onto x̃^(i) as described in §6.3.5.2.
    end for
  end for

Algorithm 14 Multi-object particle tracking for Harvester ants.
  Initialize particles x_1^(1), ..., x_1^(N) based on the prior distribution/first observations y1.
  Compute initial importance weights w_1^(1), ..., w_1^(N).
  for t in 2, ..., T do
    Advance each particle by sampling (x_t^(i), Ψ_t^(i)) ∼ γt via Algorithm 10.
    Estimate importance weights for each particle via (6.43) using samples from the previous step.
    Apply conditional SIR to construct new particles via Algorithm 13.
  end for

6.4 Empirical Results

6.4.1 Simulated Data

For the previous example, the "true" hidden state of the objects is unknown, as the example comes from real-world data. In this example, samples from the hidden state-space are simulated using the Gaussian forward model (6.3), and observations are drawn according to the observation model (§6.3.2). Choosing samples from this known, tractable model allows us to directly evaluate results from the particle tracking algorithm.

The simulated data starts with 100 objects distributed uniformly at random in a 500 by 500 grid. The simulation then proceeds by moving the objects according to the forward model, with movement dispersion parameter σ_m² = 2. The observation model is simulated with observation dispersion parameter σ_o² = 1.5, split probability parameter λs = .006 for each object at each time-step, and with a merging probability of λm = .45 (decreasing with distance).
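Generating such simulated data amounts to iterating the forward model (6.3)–(6.4). The sketch below is illustrative rather than the thesis code: the velocity-smoothing value α = 0.5 is a hypothetical choice (the text does not fix it for the simulation), and σm is set to √2 so that σ_m² = 2 as above.

```python
import random

def simulate_forward(n_objects=100, n_steps=50, size=500.0,
                     sigma_m=2 ** 0.5, alpha=0.5, seed=1):
    """Simulate the movement model (6.3)-(6.4):
      u_t = u_{t-1} + v_{t-1} + N(0, sigma_m^2 I)
      v_t = alpha*(u_t - u_{t-1}) + (1 - alpha)*v_{t-1}
    sigma_m = sqrt(2) corresponds to sigma_m^2 = 2; alpha = 0.5 is a
    hypothetical smoothing value, not one fixed by the text."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_objects)]
    vel = [(0.0, 0.0)] * n_objects
    tracks = [pos]                          # tracks[t][i] = position of object i
    for _ in range(n_steps - 1):
        new_pos, new_vel = [], []
        for (ux, uy), (vx, vy) in zip(pos, vel):
            nx = ux + vx + rng.gauss(0.0, sigma_m)   # (6.3), per coordinate
            ny = uy + vy + rng.gauss(0.0, sigma_m)
            new_pos.append((nx, ny))
            new_vel.append((alpha * (nx - ux) + (1 - alpha) * vx,   # (6.4)
                            alpha * (ny - uy) + (1 - alpha) * vy))
        pos, vel = new_pos, new_vel
        tracks.append(pos)
    return tracks
```

Observations would then be drawn from these positions via the event model of §6.3.2 (Normal jitter, false negatives/positives, splits, merges), which is omitted here for brevity.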
The results for one state-space sample, with path lengths and objects per frame, are summarized in Figure 6.4, and "observed" samples with the number of observations per frame are summarized in Figure 6.5. We then ran this example using particle tracking, first using a sample from the importance distribution (Figure 6.6), then using CSIR (Figure 6.7). As in the previous example, CSIR produces much longer path lengths, closely matching the "true" number of paths that last the entire simulation.

Figure 6.4: "True" distribution of path lengths and trajectories per frame, simulated example.

Figure 6.5: Centroid observations per frame, simulated example.

Figure 6.6: Distribution of path lengths and trajectories per frame using a sample from the importance distribution, simulated example.

6.4.2 Short Harvester Ant Video

As a first example of the tracking algorithm, we used GemVident to find centroids for a video file that has 753 frames. A screenshot of the centroid finding results for a single frame from GemVident is shown in Figure 6.8. The centroid finding algorithm for this video resulted in an average of 109 observations per frame, with the distribution of observations shown in Figure 6.9.

For this example, we first drew a sample from the importance distribution and inspected the quality of the sample. A video file of a draw from the importance distribution can be found at http://www.stanford.edu/~guetz/particleTracking/noCSIR.mp4. The blue dots represent trackings; the red dots are the centroids from GemVident. Note that the trackings tend to follow the observations for a short number of time-steps before getting lost. One can see this directly by looking at a histogram of the trajectory lengths over the sample (Figure 6.10).

We then applied the CSIR particle filter to this example using 20 particles. A video file of the CSIR tracking can be found at http://www.stanford.edu/~guetz/
particleTracking/CSIR.mp4.

Figure 6.7: Distribution of path lengths and trajectories per frame using CSIR, simulated example.

The CSIR result is much better. Looking at the trajectory lengths (Figure 6.11), note that a large percentage of the trajectories last the entire simulation. Also note that the number of trajectories per frame matches the number of observations much more closely for both the sample from the importance distribution and CSIR.

This example was repeated for N = 5, 20, 40 particles: N = 5 took 32 seconds, N = 20 took 126 seconds, and N = 40 took 253 seconds, so the increase is essentially linear in the number of particles. This is due to the fact that most grouping subsets are well separated from each other, so it is possible to use a single marginal importance weight computation for most particle/grouping subset pairs instead of the more complicated O(N²) version.

Figure 6.8: GemVident screenshot, showing centroids.

Figure 6.9: Centroid observations per frame from Harvester ant example.

Figure 6.10: Distribution of path lengths and trajectories per frame using a sample from the importance distribution, Harvester ant example.

Figure 6.11: Distribution of path lengths and trajectories per frame using CSIR, Harvester ant example.

Chapter 7

Network Growth Models

In recent years there has been much interest in the study of generative models that explain observed properties of networks derived from biology, sociology, and computer science. The next two chapters will discuss Monte Carlo methods for performing inference on network data structures. Among the most widely researched models are network growth models such as preferential attachment [8] and duplication/divergence (also known as vertex copying) [59, 25].
One reason for the focus on dynamic models of network growth is that they allow researchers to understand how networks develop over time, with relatively simple rules resulting in surprising complexity. For a survey of the history and applications of network growth models see Newman [78], and Durrett [32] for a more recent book-length treatment.

The primary motivations for studying generative models of network growth are to help explain, understand, and predict observed network data structures and features. For example, many real-world networks have degree sequences that appear to follow heavy-tailed distributions, such as the so-called power-law distributions, where the probability of a vertex having degree k is proportional to k^{−α} for α > 0. The classical models of random networks, such as the Erdős–Rényi G(n, p) model in which edges are placed between each pair of vertices according to independent Bernoulli trials with probability p, have asymptotically Poisson degree distributions, making them unsuitable for modeling networks with heavy-tailed degree distributions. In contrast, many network growth models such as preferential attachment and vertex copying produce asymptotically power-law networks.

7.1 Background

Many realistic models of network formation, whether of sociological, biological, or computer networks, involve growth from some smaller "seed" network. A network growth model is a stochastic process through which a network G is formed by adding vertices one by one, with edges created between the new vertex and pre-existing vertices. Well-known examples of network growth models include the so-called preferential attachment and vertex duplication models.

In the current context, network growth models are considered. Unless otherwise stated, graphs are assumed to be simple, i.e. they are undirected, have no self-loops, no multiple edges, and no edge weights.
Furthermore, the models used are assumed to be strict growth models, i.e. removal of vertices or edges is not allowed, edges cannot be "rewired" once they appear, and new edges always have one endpoint in the newly added vertex. The advantage of modeling networks through strict growth mechanisms, as opposed to a more general model that might allow edge and vertex deletions or rewirings, is that they are easier to analyze and simulate.

One issue with many commonly used network growth models is that they are often formulated in a way that is not statistically well-defined, with most real-world networks having zero measure. For example, the original Barabási–Albert preferential attachment model adds a deterministic number of edges k at each time-step; if a network does not have kn edges, then it has zero likelihood. Similar complaints can be made about many other common models, such as the vertex copying and forest fire models. The primary reason these models are not well-defined statistically is that they were formulated to make theoretical analysis of their aggregate properties easier, such as showing that they give power-law degree distributions or are well-connected. In the current context, however, it is preferable to use models that are statistically well-defined. There have been several attempts to make network growth models more statistically robust; see for example Leskovec et al. [65] and Sheridan et al. [90].

Given the order in which nodes arrived in an observed network, one can compute the likelihood of the network under a growth model by multiplying the likelihood contribution of each vertex as it enters the network. In other words, to compute the likelihood of a graph G given an ordering µ and model parameters θ, one simply forms the product

    W(G | θ, µ) = ∏_{i=1}^{n} P(x_{µ(i)} | x_{µ(1)}, ..., x_{µ(i−1)}),    (7.1)

where x_{µ(i)} is the event of the µ(i)th vertex entering G along with its corresponding edges.
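The product (7.1) can be made concrete by plugging in a specific growth model. The sketch below specializes, purely for illustration, to the uniform attachment model discussed in §7.1.1, where the i-th arriving vertex links to each earlier vertex independently with probability p; the function name `ua_order_likelihood` is hypothetical.

```python
def ua_order_likelihood(edges, mu, p):
    """Order likelihood (7.1) specialized to uniform attachment: the i-th
    arriving vertex links to each of its i-1 predecessors independently with
    probability p, so step i contributes p^k * (1-p)^((i-1)-k), where k is
    the number of edges from vertex mu[i] back to earlier vertices."""
    edge_set = {frozenset(e) for e in edges}
    likelihood = 1.0
    for i, v in enumerate(mu):
        k = sum(1 for u in mu[:i] if frozenset((u, v)) in edge_set)
        likelihood *= p ** k * (1 - p) ** (i - k)
    return likelihood
```

Under uniform attachment the result is the same for every ordering, since the product collapses to p^m (1 − p)^{C(n,2) − m}; this order-invariance makes a convenient sanity check. For models such as preferential attachment the per-step conditionals depend on the current degrees, and different orderings give different order likelihoods.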
Denote by W(G | θ, µ) the order likelihood of G.

This representation immediately yields a mechanism for computing the full likelihood of G by summing the order likelihoods (7.1) over Πn, the space of all possible orderings of length n:

    W(G | θ) = ∑_{µ∈Π_n} W(G | θ, µ) P(µ).    (7.2)

Here P(µ) is the prior distribution over the space of permutations Πn. In this chapter P(µ) is assumed to be uniform over the space of permutations, but in practice one may have prior knowledge about the order in which vertices entered the network and can construct a different prior distribution.

Since there are n! possible permutations of length n, brute-force enumeration is infeasible except for the smallest graphs. A natural idea is thus to construct an unbiased Monte Carlo estimator Ŵ(G|θ) of (7.2) by drawing N iid samples µ1, ..., µN distributed according to P(µ):

    Ŵ(G | θ) = (1/N) ∑_{i=1}^{N} W(G | θ, µ_i).    (7.3)

In general the estimator Ŵ may have high variance, making the Monte Carlo estimator little better than brute force. Fortunately, one can usually reduce the variance of Ŵ by using the importance sampling techniques described in §2.1.

7.1.1 Erdős–Rényi

Perhaps the simplest model of network growth is uniform attachment (UA), where edges are attached from a new vertex to pre-existing vertices independently at random with uniform constant probability p. This process is equivalent to the well-studied Erdős–Rényi random graph model G(n, p). The G(n, p) and closely related G(n, m) models were introduced by Gilbert [41] in 1959 and subsequently analyzed by Erdős and Rényi [34]. For comprehensive book-length treatments of G(n, p) and G(n, m) see Bollobás [16], Janson et al. [50], and also Durrett [32]. The advantage of G(n, p) lies in the simplicity of its analysis. Vertex degrees are distributed according to a Binomial Bin(n − 1, p) distribution.
Since edges are added independently according to a Bernoulli with parameter p, the number of edges m in G is a sufficient statistic. The likelihood of G given parameter p is simply

    W(G | p) = P( Bin( C(n, 2), p ) = m ),    (7.4)
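Equation (7.4) is straightforward to evaluate directly. The snippet below is a direct transcription, with m taken to be the observed edge count; the function name `gnp_likelihood` is hypothetical.

```python
from math import comb

def gnp_likelihood(n, m, p):
    """Likelihood (7.4): probability that a Binomial(C(n,2), p) draw equals
    the observed edge count m, m being a sufficient statistic for p."""
    trials = comb(n, 2)                     # C(n, 2) possible edges
    return comb(trials, m) * p ** m * (1 - p) ** (trials - m)
```

For example, a triangle on n = 3 vertices (m = 3 edges out of 3 possible) has likelihood p³, which equals 0.125 at p = 0.5.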