On the Method of Moments for Estimation in Latent Variable Models

Anastasia Podosinnikova

To cite this version:

Anastasia Podosinnikova. On the Method of Moments for Estimation in Latent Variable Models. Machine Learning [cs.LG]. Ecole Normale Superieure de Paris - ENS Paris, 2016. English. tel-01489260v1.

HAL Id: tel-01489260, https://tel.archives-ouvertes.fr/tel-01489260v1. Submitted on 14 Mar 2017 (v1), last revised 5 Apr 2018 (v2).

Doctoral thesis of the Université de recherche Paris Sciences Lettres – PSL Research University

prepared at the École normale supérieure

Sur la méthode des moments pour l'estimation des modèles à variables latentes
Doctoral school no. 386. Specialty: Computer Science. Defended on 01.12.2016.

On the method of moments for estimation in latent variable models
by Anastasia Podosinnikova

Jury composition:
- Animashree Anandkumar, University of California, Irvine (reviewer)
- Samuel Kaski, Aalto University (reviewer)
- Francis Bach, École normale supérieure (thesis advisor)
- Simon Lacoste-Julien, École normale supérieure (thesis advisor)
- Alexandre d'Aspremont, École normale supérieure (jury member)
- Pierre Comon, CNRS (jury member)
- Rémi Gribonval, INRIA (jury member)

Abstract

Latent variable models are powerful probabilistic tools for extracting useful latent structure from otherwise unstructured data and have proved useful in numerous applications such as natural language processing and computer vision. A special case of latent variable models are latent linear models, where observations originate from a linear transformation of latent variables. Despite their modeling simplicity, latent linear models are useful and widely used instruments for data analysis in practice and include, among others, such notable examples as probabilistic principal component analysis, canonical correlation analysis, independent component analysis, as well as probabilistic latent semantic indexing and latent Dirichlet allocation. That being said, it is well known that estimation and inference are often intractable for many latent linear models, and one has to resort to approximate methods that often come with no recovery guarantees.

One approach to address this problem, which has become popular lately, is the family of methods based on the method of moments. These methods often have guarantees of exact recovery in the idealized setting of an infinite data sample and well specified models, and they often retain theoretical guarantees when these conditions are not exactly satisfied. This is in contrast to the more standard and widely used methods based on variational inference and sampling, which makes moment matching based methods especially interesting. In this thesis, we focus on moment matching based estimation methods for different latent linear models.

In particular, by making the connections between independent component analysis (ICA) and latent Dirichlet allocation (LDA) more apparent, we introduce a topic model which we call discrete ICA. Through the close connection to ICA, which is a well understood latent linear model from the signal processing literature, we develop new estimation algorithms for the discrete ICA model with some theoretical guarantees and, in particular, with improved sample complexity compared to previous methods. Importantly, the discrete ICA model is semiparametric and the proposed estimation methods do not require any assumption on the prior distributions of the latent variables in the model.

Moreover, through the close connection between ICA and canonical correlation analysis (CCA), we propose several novel semiparametric multi-view models, closely related to both ICA and CCA, which are adapted to work with count or continuous data, or with a mixture of the two. We prove the identifiability of these models, which is a necessary property to ensure their interpretability. It appears that some linear models which are widely used for interpretation purposes are unidentifiable, and more meaningful analogues are therefore of interest. We also develop moment matching based estimation algorithms for the introduced semiparametric multi-view models, which again do not require any assumptions on the prior distributions of the latent variables.

For all the mentioned models, we perform an extensive experimental comparison of the proposed algorithms on both synthetic and real datasets and demonstrate their promising practical performance.

Acknowledgements

First and foremost, I would like to deeply thank my advisors Francis Bach and Simon Lacoste-Julien for their tremendous support, enthusiasm, motivation, and all the knowledge that they shared with me over the past three years. Francis and Simon shaped the way I think in a fundamental way and taught me to see important details and to address challenging problems in a clear and systematic way. Simon, Francis, thank you so much for this unique opportunity to work under your guidance and for the motivating environment.

I am grateful to Animashree Anandkumar and Samuel Kaski for accepting to review my thesis and to Alexandre d'Aspremont, Pierre Comon, and Rémi Gribonval for accepting to be the jury members of my thesis. I would like to express my sincere gratitude to all the jury committee members for reading the preliminary manuscript of my thesis, and for their corrections, valuable feedback, and challenging questions, which improved this thesis.

I would like to thank Matthias Hein, with whom I worked prior to starting my PhD, for introducing me to the exciting field of machine learning, and all the members of his group for their support. I would especially like to thank Martin Slawski for encouraging me to apply for a PhD position at the SIERRA project-team. I am also thankful to Guy Bresler, Philippe Rigollet, David Sontag, and Caroline Uhler from MIT for accepting me for a postdoc position.

I am very grateful to all the members (current and former) of the SIERRA and WILLOW project-teams, who are both great researchers and friends, for all the support they provided over these three years. This work was supported in part by the Microsoft Research–Inria Joint Centre, and I would like to thank Laurent Massoulie for his continuous help.

Last but not least, I would like to thank my parents, my brother, and all my relatives and friends for their tremendous support over these years.


Introduction

The goal of the estimation task in latent linear models is the estimation of the latent parameters of a model, in particular, the estimation of a linear transformation matrix. One approach to this task is based on the method of moments, which has become increasingly popular recently due to its theoretical guarantees. This thesis brings several contributions to the field of moment matching-based estimation for latent linear models in machine learning. The main part of this thesis consists of four chapters, where the first two chapters are introductory and bring together several concepts which are then used in the last two chapters for constructing new models and developing new estimation algorithms. Below we summarize the main contributions of each chapter.

Chapter 1: We start with an overview of some important latent linear models for different types of data (continuous vs. count data and single-view vs. multi-view), which we present in a unified framework with an emphasis on their identifiability properties.

Chapter 2: Despite the simplicity of these models, estimation and inference are often intractable for them. In this thesis, we focus on moment matching-based estimation methods, which often have some theoretical guarantees, as opposed to the widely used methods based on variational inference or sampling. In Chapter 2, we (a) review the main ideas of tensors and their decompositions, (b) connect higher-order statistics of latent linear models with the so-called canonical polyadic (CP) decomposition of tensors, and (c) briefly review the estimation and inference methods with an emphasis on the moment matching-based techniques with theoretical guarantees.

Chapter 3: As the first contribution of this thesis, we present a novel semiparametric topic model, called discrete ICA, which is closely related to independent component analysis and to such linear topic models as probabilistic latent semantic indexing and latent Dirichlet allocation, but has higher expressive power since it does not make any assumptions on the distribution of the latent variables. We prove that the higher-order cumulant-based tensors of discrete ICA, in the population case, are tensors in the form of the symmetric CP decomposition. This result is closely related to the previous result for the higher-order moment-based tensors of LDA. However, the derivations in the discrete ICA case are somewhat more straightforward due to the properties of cumulants, and the estimators have improved sample complexity. The estimation methods are then based on the approximation of this symmetric CP decomposition from sample estimates using different (orthogonal) diagonalization algorithms, which include the eigendecomposition-based (spectral) algorithm and the tensor power method. Note that the theoretical guarantees for these algorithms extend directly to the discrete ICA case. We further improve this by using orthogonal joint

diagonalization techniques. In an extensive set of experiments on synthetic and real data, we compare these algorithms with each other and with variational inference-based estimation methods.

Chapter 4: As the second contribution of this thesis, we present a novel semiparametric linear model for multi-view data, which is closely related to the probabilistic version of canonical correlation analysis. We call this model non-Gaussian CCA and prove that it is identifiable, as opposed to many other related latent linear models, especially for multi-view or aligned data. We further prove that the cumulant-based higher-order statistics of this new model, in the idealized population case, are tensors in the form of the CP decomposition, i.e. they are equal to a diagonal tensor multiplied by matrices along all modes. As opposed to discrete ICA and many other latent linear models considered in the machine learning literature in the context of moment matching-based estimation, these CP decompositions are not symmetric. However, we show that the estimation can still be performed using algorithms for the computation of the non-symmetric CP decomposition, which we refer to as non-orthogonal joint matrix diagonalization (NOJD) by similarity algorithms (this is equivalent to NOJD by congruence only in the orthogonal case). Moreover, we consider another important tool from the ICA literature, generalized covariance matrices, which can replace cumulant tensors in this algorithmic framework and significantly simplify the derivations. We demonstrate in a set of experiments with real and synthetic data the improved qualities of the new models and estimation methods.

In the final Chapter 5, we summarize the results and discuss future work. Note that the content of this thesis was previously published as Podosinnikova et al. [2015] and Podosinnikova et al. [2016]. Another publication by the author, Podosinnikova et al. [2014], which is not included here but was written while working on this thesis, presents an extension of the author's Master's thesis.

Table of Contents

1 Latent Linear Models
  1.1 Latent Linear Models for Single-View Data
    1.1.1 Gaussian Mixture Models
    1.1.2 Factor Analysis
    1.1.3 Probabilistic Principal Component Analysis
    1.1.4 Independent Component Analysis
    1.1.5 Dictionary Learning
  1.2 Latent Linear Models for Count Data
    1.2.1 Admixture and Topic Models
    1.2.2 Topic Models Terminology
    1.2.3 Probabilistic Latent Semantic Indexing
    1.2.4 Latent Dirichlet Allocation
    1.2.5 Other Topic Models
  1.3 Latent Linear Models for Multi-View Data
    1.3.1 Probabilistic Canonical Correlation Analysis
  1.4 Overcomplete Latent Linear Models

2 Tensors and Estimation in Latent Linear Models
  2.1 Tensors, Higher Order Statistics, and CPD
    2.1.1 Tensors
    2.1.2 The Canonical Polyadic Decomposition
    2.1.3 Tensor Rank and Low-Rank Approximation
    2.1.4 CP Uniqueness and Identifiability
  2.2 Higher Order Statistics
    2.2.1 Moments, Cumulants, and Generating Functions
    2.2.2 CPD of ICA Cumulants
    2.2.3 CPD of LDA Moments
  2.3 Algorithms for the CP Decomposition
    2.3.1 Algorithms for Orthogonal Symmetric CPD
    2.3.2 Algorithms for Non-Orthogonal Non-Symmetric CPD
  2.4 Latent Linear Models: Estimation and Inference
    2.4.1 The Expectation Maximization Algorithm
    2.4.2 Moment Matching Techniques

3 Moment Matching-Based Estimation in Topic Models
  3.1 Contributions
  3.2 Related Work
  3.3 Discrete ICA
    3.3.1 Topic Models are PCA for Count Data
    3.3.2 GP and Discrete ICA Cumulants
    3.3.3 Sample Complexity
  3.4 Estimation in the GP and DICA Models
    3.4.1 Analysis of the Whitening and Recovery Error
  3.5 Experiments
    3.5.1 Datasets
    3.5.2 Code and Complexity
    3.5.3 Comparison of the Diagonalization Algorithms
    3.5.4 The GP/DICA Cumulants vs. the LDA Moments
    3.5.5 Real Data Experiments
  3.6 Conclusion

4 Moment Matching-Based Estimation in Multi-View Models
  4.1 Contributions
  4.2 Related Work
  4.3 Non-Gaussian CCA
    4.3.1 Non-Gaussian, Discrete, and Mixed CCA
    4.3.2 Identifiability of Non-Gaussian CCA
    4.3.3 The Proof of Theorem 4.3.1
  4.4 Cumulants and Generalized Covariance Matrices
    4.4.1 Discrete CCA Cumulants
    4.4.2 Generalized Covariance Matrices
  4.5 Estimation in Non-Gaussian, Discrete, and Mixed CCA
  4.6 Experiments
    4.6.1 Synthetic Count Data
    4.6.2 Synthetic Continuous Data
    4.6.3 Real Data Experiment – Translation Topics
  4.7 Conclusion

5 Conclusion and Future Work
  5.1 Algorithms for the CP Decomposition
  5.2 Inference for Semiparametric Models

A Notation
  A.1 The List of Probability Distributions

B Discrete ICA
  B.1 The Order-Three DICA Cumulant
  B.2 The Sketch of the Proof for Proposition 3.3.1
    B.2.1 Expected Squared Error for the Sample Expectation
    B.2.2 Expected Squared Error for the Sample Covariance
    B.2.3 Expected Squared Error of the Estimator Ŝ for the GP/DICA Cumulants
    B.2.4 Auxiliary Expressions

C Implementation
  C.1 Implementation of Finite Sample Estimators
    C.1.1 Expressions for Fast Implementation of the LDA Moments Finite Sample Estimators
    C.1.2 Expressions for Fast Implementation of the DICA Cumulants Finite Sample Estimators
  C.2 Multi-View Models
    C.2.1 Finite Sample Estimators of the DCCA Cumulants

Notation

Vectors are denoted with lower case bold letters, e.g., x, z, α, etc. Their elements are denoted with lower case non-bold letters with a subscript, e.g., x_m denotes the m-th element of the vector x and α_k denotes the k-th element of the vector α. All vectors are assumed to be column vectors.

Matrices are denoted with upper case bold letters, e.g., D, I, A, Q, etc. The matrix I always denotes the identity matrix. Unless otherwise specified, the matrix Q stands for an orthogonal matrix, i.e. Q⊤Q = QQ⊤ = I, where Q⊤ is the transpose of Q. An element of a matrix is denoted with an upper case non-bold letter with subscripts, e.g., D_mn denotes the (m, n)-th element of the matrix D. A column of a matrix is denoted with a lower case bold letter with a subscript, e.g., d_k stands for the k-th column of the matrix D.

Tensors of order 3 and higher are denoted with upper case calligraphic bold letters, e.g., 𝒯, 𝒢, etc. An element of a tensor is denoted with an upper case calligraphic non-bold letter with appropriate subscripts, e.g., 𝒯_{m_1 m_2 ... m_S}, where S stands for the order of this tensor.

We use upper case non-bold letters to denote dimensions and sizes. For instance:
- M denotes the dimension of an observed variable, such as the dimension of the signal in the ICA context or the number of words in the vocabulary in the topic modeling context;
- K stands for the dimension of a latent variable, such as the number of sources in the ICA context or the number of topics in the topic modeling context;
- N stands for the number of observations in a sample, such as the number of documents in a corpus;
- L stands for the document length (i.e. the number of tokens in a document) in the topic modeling context;
- S is used to denote the order of a tensor, cumulant, or moment.

The indices for any dimension or size are denoted with the respective lower case letter, e.g., m ∈ [M] or n ∈ [N]. The lower case non-bold letter i is reserved for the imaginary unit $\sqrt{-1}$.

The matrix D is always a real matrix of size M × K and denotes a linear transformation of the latent variables, e.g., the mixing matrix (in ICA), the factor loading matrix (in factor analysis), the dictionary (in dictionary learning), or the topic matrix (in topic modeling). In some cases, additional structure is assumed for the matrix D. The vector x ∈ R^M always refers to an observed variable, such as the signal (in ICA) or the count vector of a document (in topic modeling). The vector α ∈ R^K always refers to a latent variable, such as the latent sources in ICA. If the vector of latent variables is additionally constrained to the probability simplex, e.g., the topic intensities or proportions in topic modeling, it is denoted as θ ∈ Δ_K, where the (K − 1)-probability simplex is $\Delta_K = \{\boldsymbol{\theta} \in \mathbb{R}^K : \sum_{k=1}^K \theta_k = 1,\ \theta_k \ge 0,\ \forall k \in [K]\}$. For a vector x ∈ R^M, the ℓ2-norm is denoted as $\|\mathbf{x}\|_2 = (\sum_{m=1}^M x_m^2)^{1/2}$ and the ℓ1-norm is denoted as $\|\mathbf{x}\|_1 = \sum_{m=1}^M |x_m|$. For a matrix A ∈ R^{M×K}, the Frobenius norm is defined as $\|\mathbf{A}\|_F = (\sum_{m=1}^M \sum_{k=1}^K A_{mk}^2)^{1/2} = \sqrt{\operatorname{Tr}(\mathbf{A}\mathbf{A}^\top)}$. For any matrix A, the diagonal matrix Diag(A) contains the diagonal values of A on its diagonal, i.e. [Diag(A)]_mm = A_mm. For any vector a, the diagonal matrix Diag(a) contains the vector a on its diagonal, i.e. [Diag(a)]_mm = a_m. The Kronecker delta δ(m_1, ..., m_S) is equal to 1 if and only if m_1 = ··· = m_S, and 0 otherwise. In Appendix A.1, we recall some of the probability distributions used in this thesis.

For the illustration of probabilistic models, we use the plate notation that is standard in the graphical modeling literature [see, e.g., Buntine, 1994, Bishop, 2006]. In this notation, bold black dots denote parameters of a model; transparent circles stand for latent variables; shaded circles stand for observed variables; and plates denote repetitions of such (conditionally) independent variables, where the number on a plate denotes the number of such repetitions.

Chapter 1

Latent Linear Models

Abstract

Latent linear models assume that the observations originate from a linear transformation of common latent variables, and they are widely used for unsupervised data modeling and analysis. Despite this modeling simplicity, a number of challenging problems have to be addressed for such models. The identifiability of latent linear models, which directly affects the interpretability of a model, is one such problem. It appears that many popular latent linear models are unidentifiable and, therefore, unsuitable for interpretation purposes. In this chapter, we outline some of the most important and widely used latent linear models in a unified framework with an emphasis on their identifiability properties, including: factor analysis (Section 1.1.2) and its particular case of probabilistic principal component analysis (Section 1.1.3); independent component analysis (Section 1.1.4) as its identifiable analogue; probabilistic canonical correlation analysis as their extension to multi-view aligned data; and the most basic topic models, probabilistic latent semantic indexing and latent Dirichlet allocation (Section 1.2), designed to handle count data. This forms the modeling basis of this thesis and helps us to introduce new models in later chapters.

1.1 Latent Linear Models for Single-View Data

1.1.1 Gaussian Mixture Models

One of the simplest latent linear models for continuous data is the Gaussian mixture model (GMM) [see, e.g., Bishop, 2006, Murphy, 2012, and references therein]. The model assumes K hidden states and a Gaussian distribution associated with each state. The generative process consists of two steps: (a) sampling the state from a discrete distribution and (b) sampling an observation from the Gaussian distribution associated with the sampled state (see a graphical representation of such a model in Figure 1-1a).

To formalize this model, one introduces a latent variable z which can take one of K discrete states {1, 2, ..., K}. It is convenient to model z using the one-hot encoding, i.e. as a K-vector z with only the k-th element equal to one and the rest equal to zero (the k-th canonical basis vector e_k), which corresponds to z = k. The discrete prior (see (A.4) in Appendix A.1 for the definition) is then used for the state z:

$$p(\mathbf{z}) = \operatorname{Mult}(1, \boldsymbol{\theta}) = \prod_{k=1}^{K} \theta_k^{z_k}, \qquad (1.1)$$

where the parameter θ is constrained to the (K − 1)-simplex, i.e. θ ∈ Δ_K. For every state k, a base distribution of the R^M-valued observed variable x is modeled as a Gaussian distribution (A.1), i.e. p(x | z_k = 1) = 𝒩(x | μ_k, Σ_k), which gives the conditional distribution

$$p(\mathbf{x} \mid \mathbf{z}) = \prod_{k=1}^{K} \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)^{z_k}. \qquad (1.2)$$

Therefore, the marginal distribution of the observed variable x is given by

$$p(\mathbf{x}) = \sum_{\mathbf{z}} p(\mathbf{z})\, p(\mathbf{x} \mid \mathbf{z}) = \sum_{k=1}^{K} \theta_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \qquad (1.3)$$

which is a convex combination of the base Gaussian distributions 𝒩(x | μ_k, Σ_k), or a Gaussian mixture, hence the name. The fact that the expectation E(x) = Dθ, where the matrix D is formed by stacking the centers μ_k, i.e. D = [μ_1, μ_2, ..., μ_K], explains why the GMM belongs to the class of latent linear models. The GMM is illustrated in Figure 1-1b using the plate notation that is standard in the graphical modeling literature [Buntine, 1994, Comon and Jutten, 2010, see also the Notation section]. By choosing different base distributions, one can obtain mixtures of other distributions, with topic models as an example (see Section 1.2).
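To make the generative process concrete, the following minimal NumPy sketch (ours, for illustration only; all dimensions and names are made up and not from the thesis) samples observations from a GMM and evaluates the mixture density (1.3) at a point:

import numpy as np

rng = np.random.default_rng(0)
K, M, N = 3, 2, 1000                        # number of states, observed dimension, sample size
theta = np.array([0.5, 0.3, 0.2])           # mixing weights on the simplex
mu = rng.normal(size=(K, M))                # Gaussian means (the columns of D when stacked)
Sigma = np.stack([s * np.eye(M) for s in (0.1, 0.2, 0.3)])

# Generative process: (a) sample the state z_n, (b) sample x_n from the chosen Gaussian.
z = rng.choice(K, size=N, p=theta)
x = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])

# Marginal density (1.3): a convex combination of the K Gaussian densities.
def gmm_density(point):
    dens = 0.0
    for k in range(K):
        diff = point - mu[k]
        quad = diff @ np.linalg.solve(Sigma[k], diff)
        norm = np.sqrt((2 * np.pi) ** M * np.linalg.det(Sigma[k]))
        dens += theta[k] * np.exp(-0.5 * quad) / norm
    return dens

print(gmm_density(x[0]))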

The estimation for Gaussian mixture models is a difficult task [see, e.g., Dasgupta, 1999, Arora and Kannan, 2001, Anandkumar et al., 2012b].

Figure 1-1 – The Gaussian mixture model (GMM): (a) a GMM graphical model; (b) a GMM plate diagram.

1.1.2 Factor Analysis

One problem with mixture models is that they only use a single state to generate each observation, i.e. each observation can only come from one of the K base distributions. Indeed, the latent variable in mixture models is represented using the one-hot encoding and only one state is sampled at a time. An alternative is to use a real valued vector α ∈ R^K to represent the latent variable. The simplest choice of the prior is again a Gaussian:

$$\boldsymbol{\alpha} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}). \qquad (1.4)$$

It is also natural to choose a Gaussian for the conditional distribution of the continuous observation vector x ∈ R^M:

$$\mathbf{x} \mid \boldsymbol{\alpha} \sim \mathcal{N}(\boldsymbol{\mu} + \mathbf{D}\boldsymbol{\alpha}, \boldsymbol{\Psi}), \qquad (1.5)$$

where the vector μ ∈ R^M is a mean (shift) vector, the matrix D ∈ R^{M×K} is called the factor loading matrix, and Ψ ∈ R^{M×M} is the covariance matrix. The elements of the latent variable are also called factors, while the columns of D are called factor loadings. This model is known under the name of factor analysis [Bartholomew, 1987, Basilevsky, 1994, Bartholomew et al., 2011] and it makes the conditional independence assumption that the elements x_1, x_2, ..., x_M of the observed variable x are conditionally independent given the latent variable α. Therefore, the covariance matrix Ψ is diagonal. It is not difficult to show that the marginal distribution of the observed variable is also a Gaussian:

$$p(\mathbf{x}) = \int p(\mathbf{x} \mid \boldsymbol{\alpha})\, p(\boldsymbol{\alpha})\, d\boldsymbol{\alpha} = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \mathbf{D}\mathbf{D}^\top + \boldsymbol{\Psi}). \qquad (1.6)$$

1. Note that the zero mean and unit covariance for the factor analysis latent variable can be chosen without loss of generality [see, e.g., Murphy, 2012].

Figure 1-2 – The factor analysis (FA) model: (a) an FA plate diagram; (b) an FA generative process diagram.

Intuitively, this means that the factor analysis model explains the covariance of the observed data as a combination of two terms: (a) the independent noise variance associated with each coordinate (in the matrix Ψ) and (b) the covariance between coordinates (captured in the matrix D). Moreover, this representation of the covariance uses a low-rank decomposition (if K < M) and only O(MK) parameters, instead of the O(M²) parameters of a full covariance Gaussian. Note, however, that if Ψ is not restricted to be diagonal, it can be trivially set to a full matrix and D to zero, in which case the latent factors would not be required. The factor analysis model is illustrated using the plate notation in Figure 1-2a. One can also view the factor analysis model from the generative point of view. In this case, the observed variable x is sampled by (a) first sampling the latent factors α, then (b) applying the linear transformation D to these sampled latent factors and adding the linear shift μ, and finally (c) adding the Gaussian noise:

x = 휇 + D훼 + 휀, (1.7)

where the R^M-valued additive Gaussian noise is ε ∼ 𝒩(0, Ψ) (see an illustration using the plate notation in Figure 1-2b). This point of view explains why factor analysis is a latent linear model: it is essentially a linear transformation of latent factors. (Note that the linear shift μ can be omitted in practice if one preliminarily centers the observations by subtracting the empirical mean; therefore, this variable is often ignored.)

Although inference in the factor analysis model is an easy task, the model is unidentifiable. Indeed, the covariance of the observed variable under the factor analysis model in (1.6) contains the term DD⊤. Let Q be an arbitrary K × K orthogonal matrix. Then right-multiplying D by this orthogonal matrix, i.e. D̃ = DQ, does not change the distribution: D̃D̃⊤ = DQQ⊤D⊤ = DD⊤. Thus a whole family of matrices D̃ gives rise to the same likelihood (1.6). Geometrically, multiplying D by an orthogonal matrix can be seen as a rotation of the latent factors α before generating the observations x. However, since α is drawn from an isotropic Gaussian, this rotation does not influence the likelihood. Consequently, one can neither uniquely identify the parameter D nor the latent factors α, independently of the type of estimation and inference methods used.

This unidentifiability does not influence the predictive performance of the factor analysis model, since the likelihood does not change. However, it does affect the factor loading matrix and, therefore, the interpretation of the latent factors. Since factor analysis is often used to uncover the latent structure in the data, this issue causes serious problems. Numerous attempts were made to address it by adding further assumptions to the model. These include heuristic methods for choosing a "meaningful" rotation of the latent factors, e.g., the varimax approach [Kaiser, 1958], which maximizes the variance of the squared loadings of a factor on all the variables. More rigorous approaches are based on adding supplementary constraints on the factor loading matrix; the most notable one is perhaps sparse principal component analysis [Zou et al., 2006], which is a separate field of research on its own [see, e.g., Archambeau and Bach, 2008, d'Aspremont et al., 2008, Journée et al., 2010, d'Aspremont et al., 2014]. An alternative approach is to use non-Gaussian priors for the latent factors, which is well known under the name of independent component analysis (see Section 1.1.4).

Factor analysis was also extended to multiway data, i.e. data presented in the form of a multidimensional array (of dimension 3 or higher; dimension 2 corresponds to a matrix), as parallel factor analysis (Parafac) [Harshman and Lundy, 1994], or three-mode principal component analysis [Kroonenberg, 1983]. Interestingly, Parafac is also the tensor decomposition which is used in the algorithmic framework of this thesis (see Section 2.1.2).
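The rotation invariance discussed above is easy to check numerically. The following NumPy sketch (ours; the dimensions are arbitrary) verifies that D and DQ induce the same marginal covariance DD⊤ + Ψ in (1.6), and hence the same Gaussian likelihood:

import numpy as np

rng = np.random.default_rng(0)
M, K = 5, 2

D = rng.normal(size=(M, K))                   # factor loading matrix
Psi = np.diag(rng.uniform(0.1, 1.0, size=M))  # diagonal noise covariance

# An arbitrary K x K orthogonal matrix Q, obtained from a QR factorization.
Q, _ = np.linalg.qr(rng.normal(size=(K, K)))
D_tilde = D @ Q                               # rotated loading matrix

# Both parameterizations give the same marginal covariance DD^T + Psi of eq. (1.6).
cov_original = D @ D.T + Psi
cov_rotated = D_tilde @ D_tilde.T + Psi
print(np.allclose(cov_original, cov_rotated))  # True: the likelihood cannot distinguish D from DQ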

1.1.3 Probabilistic Principal Component Analysis

Standard principal component analysis (PCA) [Pearson, 1901, Jolliffe, 2002] is an algebraic tool that finds a low-dimensional subspace such that, if the original data are projected onto this subspace, the variance of the projected data is maximized. It is well known that this subspace is defined by the empirical mean of the data sample and the eigenvectors of the empirical covariance matrix. The eigenvectors of this covariance matrix, sorted in decreasing order of the eigenvalues, are called principal directions. Although this PCA solution is uniquely defined (given that all eigenvalues are distinct), the principal components form only one possible basis of the "best" low-dimensional subspace; any other basis, e.g., one obtained by an orthogonal transformation of the principal components, would be a solution as well. As we shall see shortly, this solution is directly related to a special case of the factor analysis model. Therefore, standard PCA partially resolves the unidentifiability of factor analysis. However, since each principal component is a linear combination of the original variables, the PCA solution is still difficult to interpret.

Figure 1-3 – The probabilistic PCA and ICA models: (a) a PPCA plate diagram; (b) an ICA plate diagram.

Although PCA is not necessarily considered to be a method based on Gaussian distributions, it can be justified using Gaussians. Indeed, the particular case of the factor analysis model where the covariance is isotropic, i.e. Ψ = σ²I:

$$\boldsymbol{\alpha} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \qquad \mathbf{x} \mid \boldsymbol{\alpha} \sim \mathcal{N}(\boldsymbol{\mu} + \mathbf{D}\boldsymbol{\alpha}, \sigma^2 \mathbf{I}), \qquad (1.8)$$

is known under the name of probabilistic principal component analysis (see an illustration using the plate notation in Figure 1-3a). Roweis [1998] and Tipping and Bishop [1999] give a probabilistic interpretation of PCA: the PCA solution can be expressed as the maximum likelihood solution of the probabilistic principal component analysis model when σ → 0. In particular, the factor loading matrix of probabilistic PCA is equal to D = V(Λ − σ²I)^{1/2}Q, where V is the matrix with the principal directions in its columns, Λ is the diagonal matrix with the respective eigenvalues of the empirical covariance matrix on its diagonal, and Q is an arbitrary orthogonal matrix. This unidentifiability of probabilistic PCA is inherited from factor analysis; therefore, PCA is unidentifiable as well. Although the standard PCA solution is unique, PCA is defined by a subspace and the principal directions are only one basis of this subspace: an arbitrary rotation of this basis does not change the subspace.
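As an illustration of the maximum likelihood expression above, the following sketch (ours, with arbitrary dimensions and assuming K < M) builds a PPCA loading matrix from the eigendecomposition of the empirical covariance; σ² is estimated, as in Tipping and Bishop [1999], by the average of the discarded eigenvalues, and the arbitrary orthogonal factor Q is set to the identity:

import numpy as np

rng = np.random.default_rng(0)
M, K, N = 10, 3, 5000

# Synthetic data from a PPCA model (ground truth is used only to simulate data).
D_true = rng.normal(size=(M, K))
X = rng.normal(size=(N, K)) @ D_true.T + np.sqrt(0.1) * rng.normal(size=(N, M))

# Eigendecomposition of the empirical covariance of the centered data.
Xc = X - X.mean(axis=0)
eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / N)
order = np.argsort(eigval)[::-1]                 # sort in decreasing order
eigval, eigvec = eigval[order], eigvec[:, order]

# ML estimates: sigma^2 is the mean of the discarded eigenvalues,
# and D = V (Lambda - sigma^2 I)^{1/2} Q with Q = I here.
sigma2_ml = eigval[K:].mean()
V, Lam = eigvec[:, :K], np.diag(eigval[:K])
D_ml = V @ np.sqrt(Lam - sigma2_ml * np.eye(K))

# D_ml is determined only up to a right orthogonal factor Q, cf. the discussion above.
print(round(float(sigma2_ml), 3), D_ml.shape)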

1.1.4 Independent Component Analysis

Independent component analysis (ICA) [Jutten, 1987, Jutten and Hérault, 1991, Hyvärinen et al., 2001, Comon, 1994, Comon and Jutten, 2010] was originally developed in the blind source separation (BSS) context. A typical BSS problem is the so-called cocktail party problem: we are given several speakers (sources) and several

microphones (sensors), each detecting a noisy linear mixture of the source signals. The task is to separate the individual sources from the mixed signal.

Noisy ICA. ICA models this problem in a natural way as follows

x = D훼 + 휀, (1.9)

where the vector x ∈ R^M represents the observed signal, the vector α ∈ R^K with mutually independent components stands for the latent sources, the vector ε ∈ R^M is the additive noise, and the matrix D ∈ R^{M×K} is the mixing matrix.

Noiseless ICA. Often, to simplify estimation and inference in the ICA model, it is common to assume that the noise level is zero, in which case one rewrites the ICA model (1.9) as:

$$\mathbf{x} = \mathbf{D}\boldsymbol{\alpha}. \qquad (1.10)$$

Alternatively, another simplifying assumption for the noisy ICA model in (1.9) is that the noise ε is Gaussian (see, e.g., Section 2.4).

Identifiability. It is straightforward to see the connection between the factor analysis formulation in (1.7) and the ICA model in (1.9). In fact, factor analysis is a special case of the ICA model where the sources and the additive noise are constrained to be independent Gaussians (one can ignore the shift vector μ since observations can be centered to have zero mean). However, ICA generally relaxes the Gaussianity assumption, preserving only the independence of the sources, although assumptions on the additive noise may vary. The Gaussianity assumption on the sources can be too restrictive, and considering other priors can lead to models with higher expressive power. Moreover, as we mentioned in Section 1.1.2, the Gaussian latent factors (sources) are actually the reason for the unidentifiability of factor analysis. Indeed, a well known result 4 says that the mixing matrix and the latent sources of ICA are essentially identifiable (see below) if at most one source is Gaussian [Comon, 1994]. Hence, one can see ICA as an identifiable version of factor analysis.

In any case, the permutation and scaling of the mixing matrix and sources in the ICA model (as well as in all other latent linear models) can never be identified. Indeed, the product d_k α_k does not change if one simultaneously rescales (including a sign change) the terms by some non-zero constant c ≠ 0: (c d_k)(c^{-1} α_k) = d_k α_k; nor does the product Dα change if one consistently permutes both the columns of D and the elements of α. Therefore, it only makes sense to talk about identifiability up to permutation and scaling, which is sometimes referred to as essential identifiability [see, e.g., Comon and Jutten, 2010]. One can also define a canonical form where, e.g., the columns of the mixing matrix are constrained to have unit ℓ1-norm.

Independent Subspace Analysis (ISA). An interesting geometric interpretation of the permutation and scaling unidentifiability was provided by Cardoso [1998], where

4. Given that the number of latent sources does not exceed the dimension of the observations, K ≤ M, and the mixing matrix has full rank. In Section 1.4, we briefly discuss the other case.

ICA is equivalently interpreted as the sum of vectors w_k ∈ 𝒮_k:

$$\mathbf{x} = \sum_{k=1}^{K} \mathbf{w}_k + \boldsymbol{\varepsilon}, \qquad (1.11)$$

from the one-dimensional subspaces $\mathcal{S}_k := \{\mathbf{w} \in \mathbb{R}^M : \mathbf{w} = \alpha \mathbf{d}_k,\ \alpha \in \mathbb{R}\}$ determined by the vectors d_k. Each such subspace can actually be identified, given that the vectors d_k are linearly independent, but the representation of every such subspace is clearly not unique. This gives rise to multidimensional independent component analysis (MICA), which looks for orthogonal projections onto (not necessarily one-dimensional) subspaces $\mathcal{S}_r := \{\mathbf{w} \in \mathbb{R}^M : \mathbf{w} = \sum_{s=1}^{S_r} \alpha_s^{(r)} \mathbf{d}_s^{(r)},\ \alpha_s^{(r)} \in \mathbb{R}\}$, where $S_r$ is the dimension of the $r$-th subspace, rather than looking for the linear transformation Dα. In such a model, the source vector consists of blocks, $\boldsymbol{\alpha} = (\boldsymbol{\alpha}^{(1)}, \boldsymbol{\alpha}^{(2)}, \ldots, \boldsymbol{\alpha}^{(R)})$, where each block is $\boldsymbol{\alpha}^{(r)} = (\alpha_1^{(r)}, \alpha_2^{(r)}, \ldots, \alpha_{S_r}^{(r)})$ and the total number of sources is preserved, $\sum_{r=1}^{R} S_r = K$. For such sources, the independence assumption is replaced with the following: the components inside one block $\alpha_1^{(r)}, \alpha_2^{(r)}, \ldots, \alpha_{S_r}^{(r)}$ can be dependent; however, the blocks $\boldsymbol{\alpha}^{(1)}, \boldsymbol{\alpha}^{(2)}, \ldots, \boldsymbol{\alpha}^{(R)}$ are mutually independent. This model is also known under the name of independent subspace analysis (ISA) 5 [Hyvärinen and Hoyer, 2000]. Cardoso [1998] conjectured that the ISA problem can be solved by first solving the ICA task and then clustering the ICA elements into statistically independent groups. Szabó et al. [2007] prove that this is indeed the case: under some additional conditions, the solution of the ISA task reduces to a permutation of the solution of the ICA task [see also Szabó et al., 2012].

A special case of ICA estimation and inference algorithms, known as algebraic cumulant-based algorithms, is of central importance in this thesis. We describe these algorithms in Sections 2.2.2 and 2.4.2 and use them to develop fast and efficient algorithms for topic models through a close connection to ICA (see Chapter 3).
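To make the permutation and scaling ambiguity concrete, the following sketch (ours; the dimensions, the Laplace prior, and the particular permutation are arbitrary choices for illustration) shows that two different pairs of mixing matrix and sources generate exactly the same noiseless ICA observations:

import numpy as np

rng = np.random.default_rng(0)
M, K, N = 4, 3, 10000

# Noiseless ICA data: independent non-Gaussian (here Laplace) sources mixed by D.
D = rng.normal(size=(M, K))
alpha = rng.laplace(size=(K, N))
X = D @ alpha                                    # eq. (1.10), one column per observation

# Permutation and scaling ambiguity: with P a permutation matrix and Lambda an
# invertible diagonal scaling, (D P Lambda) and (Lambda^{-1} P^T alpha) form an
# equally valid mixing matrix and source matrix.
P = np.eye(K)[:, rng.permutation(K)]
Lam = np.diag(np.array([-2.0, 0.5, 3.0]))
D_alt = D @ P @ Lam
alpha_alt = np.linalg.inv(Lam) @ P.T @ alpha

print(np.allclose(X, D_alt @ alpha_alt))         # True: both factorizations explain the data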

1.1.5 Dictionary Learning

Another class of latent linear models is the signal processing tool called dictionary learning [Olshausen and Field, 1996, 1997], which targets the approximation of the observed signal x ∈ R^M by a linear combination of the dictionary atoms, which are the columns of the matrix D ∈ R^{M×K}. A special case of dictionary learning is the ℓ1-sparse coding problem. It aims at minimizing $\|\mathbf{x} - \mathbf{D}\boldsymbol{\alpha}\|_2^2 + \lambda \|\boldsymbol{\alpha}\|_1$, which is well known to enforce sparsity on α, given that the regularization parameter is chosen appropriately [Tibshirani, 1996, Chen et al., 1999]. This minimization problem is equivalent to the maximum a posteriori estimator of the noisy ICA model (1.9) where the additive noise is Gaussian and the sources are independent Laplace variables. The Laplace distribution is often considered as a sparsity inducing prior since it has (slightly) heavier tails than the Gaussian. Another way to see the connection between the two models

5. Note that although Hyvärinen and Hoyer [2000] make additional assumptions on the density of the source tuples, the name ISA is used in the literature in the more general setting.

is to replace the ℓ2-distance with the KL-divergence and to look for the so-called demixing matrix which minimizes the mutual information between the demixed signals [this is one of the approaches to ICA; see, e.g., Comon and Jutten, 2010]. However, these topics are outside the scope of this thesis.
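As an illustration of the ℓ1-sparse coding step with a fixed dictionary, here is a minimal iterative soft-thresholding (ISTA) sketch; ISTA is a standard solver for this objective, but its use here, as well as the step size and iteration count, are our own choices for illustration and not methods discussed in this thesis:

import numpy as np

rng = np.random.default_rng(0)
M, K = 20, 50
D = rng.normal(size=(M, K))
D /= np.linalg.norm(D, axis=0)                   # normalize the dictionary atoms
x = D[:, :3] @ np.array([1.0, -2.0, 0.5])        # a signal built from 3 atoms
lam = 0.1

# ISTA: gradient step on ||x - D a||_2^2, then soft-thresholding (the prox of lam * ||a||_1).
step = 0.5 / np.linalg.norm(D, 2) ** 2
a = np.zeros(K)
for _ in range(500):
    a = a - step * 2 * D.T @ (D @ a - x)
    a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)

print(np.count_nonzero(np.abs(a) > 1e-6), "active atoms out of", K)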

1.2 Latent Linear Models for Count Data

1.2.1 Admixture and Topic Models

The models described in Section 1.1 are designed for continuous data. Similar techniques are often desirable for count data, i.e. non-negative and discrete data, which often appear when working, e.g., with text or images. Directly applying these models to count data does not work in practice (in the noiseless setting): the equality x = Dα is only possible when both D and α are discrete and usually non-negative. Moreover, negative values, which can appear in the latent factor vectors or the factor loading matrix, create interpretation problems [Buntine and Jakulin, 2006]. To fix this, one could turn count data into continuous data, e.g., using the term frequency–inverse document frequency (tf-idf) values for text documents [Baeza-Yates and Ribeiro-Neto, 1999]. However, this does not solve the interpretability issues.

Topic Models. An algebraic adaptation of PCA to discrete data is well known as non-negative matrix factorization (NMF) [Lee and Seung, 1999, 2001]. NMF with the KL-divergence as the objective function is equivalent to probabilistic latent semantic indexing (pLSI) [see Section 1.2.3; Hofmann, 1999a,b], which is probably the simplest and historically one of the first probabilistic topic models. Topic models can be seen as probabilistic latent (linear) models adapted to count data. Latent Dirichlet allocation (LDA) [Blei et al., 2003] is probably the most widely used topic model and extends pLSI from a discrete mixture model to an admixture model (see below). In fact, it was shown that pLSI is a special case 6 of LDA [Girolami and Kabán, 2003]. Buntine and Jakulin [2006] propose to use the umbrella term discrete component analysis for these and related models. Indeed, all these models enforce constraints on the latent variables and the linear transformation matrix to preserve non-negativity and discreteness, which are intrinsic to count data.

Admixture Model for Count Data. A natural extension of the models from Section 1.1 to count data is to replace the equality x = Dα with the equality in expectation E(x | α) = Dα, which gives an admixture model [Pritchard et al., 2000]:

$$\boldsymbol{\alpha} \sim \mathrm{PD}_{\boldsymbol{\alpha}}(\mathbf{c}_1), \qquad \mathbf{x} \mid \boldsymbol{\alpha} \sim \mathrm{PD}_{\mathbf{x}}(\mathbf{D}\boldsymbol{\alpha}, \mathbf{c}_2), \quad \text{such that } \mathbb{E}(\mathbf{x} \mid \boldsymbol{\alpha}) = \mathbf{D}\boldsymbol{\alpha}, \qquad (1.12)$$

where PD_α(·) is a continuous vector-valued probability density function of the latent vectors α with a hyper-parameter vector c_1, and PD_x(·) is a discrete-valued probability distribution of the observation vector x conditioned on the latent vector α

6. Roughly speaking, pLSI is an ML/MAP estimate of LDA under the uniform prior on 휃.

with a hyper-parameter vector c_2 [Buntine and Jakulin, 2005]. Admixture models are latent linear models since the expectation of the observation vector is equal to a linear transformation of the latent vector.
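As a concrete instance of the admixture template (1.12), the sketch below (ours) samples count vectors from a gamma-Poisson admixture: a gamma prior on α and conditionally independent Poisson counts with mean Dα. This particular choice is made only for illustration here (it anticipates the GP model of Chapter 3) and clearly satisfies E(x | α) = Dα:

import numpy as np

rng = np.random.default_rng(0)
M, K, N = 8, 3, 5                                 # vocabulary size, topics, documents

D = rng.dirichlet(np.ones(M), size=K).T           # M x K matrix with columns on the simplex
c1 = np.full(K, 0.5)                              # gamma shape hyper-parameters

# Admixture sampling: a continuous latent alpha, then counts with conditional mean D alpha.
alpha = rng.gamma(shape=c1, scale=1.0, size=(N, K))   # one latent vector per document
X = rng.poisson(lam=alpha @ D.T)                      # N x M count matrix, E[x | alpha] = D alpha

print(X)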

1.2.2 Topic Models Terminology

Topic models [Steyvers and Griffiths, 2007, Blei and Lafferty, 2009, Blei, 2012] are probabilistic models that allow one to discover thematic information in text corpora and to annotate the documents using this information.

"Genetics"      "Evolution"      "Disease"       "Computers"
human           evolution        disease         computer
genome          evolutionary     host            models
dna             species          bacteria        information
genetic         organisms        diseases        data
genes           life             resistance      computers
sequence        origin           bacterial       system
gene            biology          new             network
molecular       groups           strains         systems
sequencing      phylogenetic     control         model
map             living           infectious      parallel
information     diversity        malaria         methods
genetics        group            parasite        networks
mapping         new              parasites       software
project         two              united          new
sequences       common           tuberculosis    simulations

Table 1.1 – An example of topics obtained by fitting a 100-topic LDA model to 17,000 articles from the journal Science. Each topic is represented by its top 15 most frequent words. The example is due to Blei [2012].

Although it is common to describe topic models using text modeling terminology, applications of topic models go far beyond information retrieval. For example, topic models were successfully applied in computer vision using the notion of visual words and the computer vision bag-of-words model [Sivic and Zisserman, 2003, Wang and Grimson, 2008, Sivic and Zisserman, 2009]. However, we restrict ourselves to the standard text corpora terminology, which we summarize below.

The vocabulary is a set 풲 := {휔1, 휔2,..., 휔푀 } of all the words in the language. The number 푀 of words in the vocabulary is called the vocabulary size. Each word 휔푚 is represented using the one-hot encoding over 푀 words. In the literature, the name term is also used to refer to a word.

The document is a set 풟 := {w_1, w_2, ..., w_L} of tokens w_ℓ ∈ R^M, for ℓ ∈ [L], where each token is some word, i.e. w_ℓ = ω_m, and L is the length of the document. Two tokens in a document can be equal to the same word from the vocabulary, but words are

unique. The bag-of-words model [Baeza-Yates and Ribeiro-Neto, 1999] assumes that the order of tokens in a document does not matter. The count vector x ∈ R^M of a document 풟 is a vector whose m-th element x_m is equal to the number of times the m-th word from the vocabulary appears in this document, i.e. $\mathbf{x} = \sum_{\ell=1}^{L} \mathbf{w}_\ell$.

The corpus is a set 풞 := {풟_1, 풟_2, ..., 풟_N} of N documents. The count matrix X of this corpus is the M × N matrix with the n-th column equal to the count vector x_n of the n-th document. The matrix X is sometimes also called the (word-document) co-occurrence matrix.

There are K topics in a model, where the k-th topic d_k is a parameter vector of a discrete distribution over the words in the vocabulary, i.e. d_k ∈ Δ_M (see also Table 1.1 for an example of topics displayed as their most probable words). The m-th element of such a vector indicates the probability with which the m-th word from the vocabulary appears in the k-th topic. The matrix D ∈ R^{M×K} obtained by stacking the K topics together, D = [d_1, d_2, ..., d_K], is called the topic matrix. Note that in our notation D_mk = d_km, i.e. the index order is reversed. We will always use the index k ∈ [K] to refer to topics, the index n ∈ [N] to refer to documents, the index m ∈ [M] to refer to words from the vocabulary, and the index ℓ ∈ [L_n] to refer to tokens of the n-th document.
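The following tiny sketch (ours, with a toy vocabulary) makes the terminology concrete: tokens are one-hot vectors over the vocabulary, and the count vector of a document is the sum of its tokens:

import numpy as np

vocabulary = ["cat", "dog", "fish", "bird"]        # M = 4 words
M = len(vocabulary)
word_index = {w: m for m, w in enumerate(vocabulary)}

document = ["dog", "cat", "dog", "fish"]           # L = 4 tokens (words may repeat)

# One-hot encode each token: w_l is the m-th canonical basis vector when token l is word m.
tokens = np.array([np.eye(M)[word_index[w]] for w in document])

# The count vector x is the sum of the tokens, x = sum_l w_l.
x = tokens.sum(axis=0)
print(dict(zip(vocabulary, x.astype(int))))        # {'cat': 1, 'dog': 2, 'fish': 1, 'bird': 0}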

1.2.3 Probabilistic Latent Semantic Indexing

Latent Semantic Indexing. LSI [Deerwester et al., 1990] is a linear algebra tool for mapping documents to a vector space of reduced dimensionality, the latent semantic space. LSI is obtained as a rank-K approximation (see Section 2.1.3) of the (word-document) co-occurrence matrix. LSI is nearly equivalent to standard PCA: the only difference is that in LSI the documents are not centered (the mean is not subtracted) prior to computing the SVD of the co-occurrence matrix, which is normally done to preserve sparsity. The hope behind LSI is that words with the same common meaning are mapped to roughly the same direction in the latent space, which allows one to compute meaningful association values between pairs of documents, even if the documents do not have any terms in common. However, LSI does not guarantee non-negative values in the latent space, which is undesirable for interpretation purposes with non-negative count data.

Probabilistic Latent Semantic Indexing. A direct probabilistic extension of LSI is probabilistic latent semantic indexing (pLSI) [Hofmann, 1999a,b]. The pLSI model is a discrete mixture model and, similarly to the Gaussian mixture model (see Section 1.1.1), the latent variable z of the pLSI model can take one of K states, modeled as before by the K-vector z with the one-hot encoding. The observed variables are documents, modeled as the N-vector i with the one-hot encoding, and tokens, modeled as the M-vectors w with the one-hot encoding. The generative pLSI model of a token in the so-called symmetric 7 parametrization (a)

7. A more common asymmetric parametrization of pLSI is described below.

Figure 1-4 – The probabilistic latent semantic indexing model (pLSI): (a) symmetric pLSI; (b) asymmetric pLSI.

first picks the topic z and then, given this topic, (b) picks the document i and (c) picks the token w for the picked document from the discrete distribution characterized by the k-th topic d_k, where k is such that z_k = 1. This gives the following joint probability model:

$$p(\mathbf{i}, \mathbf{w}) = \sum_{\mathbf{z}} p(\mathbf{z})\, p(\mathbf{i} \mid \mathbf{z})\, p(\mathbf{w} \mid \mathbf{z}). \qquad (1.13)$$

The respective graphical model is illustrated in Figure 1-4a. It is interesting to notice that in this formulation pLSI can be seen as a model for working with multi-view data (see Section 1.3 and Chapter 4) and directly admits an extension to more than two or three views (see the explanation under the probabilistic interpretation of the non-negative CP decomposition of tensors in Section 2.1.2). Therefore, pLSI easily extends to model the co-occurrence of three and more variables. It is also well known [Gaussier and Goutte, 2005, Ding et al., 2006, 2008] that pLSI can be seen as a probabilistic interpretation 8 of non-negative matrix factorization (NMF) [Lee and Seung, 1999, 2001]. Therefore, the mentioned multi-view extension of pLSI can be seen as a probabilistic interpretation of the non-negative canonical polyadic (NCP) decomposition of tensors (see Section 2.1.2).

The symmetric pLSI model makes the following two independence assumptions: (a) the bag-of-words assumption, i.e. the observation pairs (i, w) are assumed to be generated independently, and (b) tokens w are generated conditionally independently of the specific document identity i given the latent class (topic) z. Such a model allows one to handle (a) polysemous words, i.e. words that may have multiple senses and multiple types of usage in different contexts, and (b) synonyms, i.e. different words that may have similar meanings or denote the same context. It is not difficult to show, with the help of the Bayes rule applied to p(i | z), that the joint density in (1.13) can be equivalently represented as

$$p(\mathbf{i}, \mathbf{w}) = p(\mathbf{i})\, p(\mathbf{w} \mid \mathbf{i}), \qquad p(\mathbf{w} \mid \mathbf{i}) = \sum_{\mathbf{z}} p(\mathbf{w} \mid \mathbf{z})\, p(\mathbf{z} \mid \mathbf{i}), \qquad (1.14)$$

8. The maximum likelihood formulation of pLSI is equivalent to the minimization of the KL divergence for NMF.

which is known under the name of the asymmetric parametrization (see an illustration using the plate notation in Figure 1-4b). The conditional distribution p(z | i) is the discrete distribution with the parameter θ_n ∈ Δ_K, where n is such that i_n = 1:

$$p(\mathbf{z} \mid \mathbf{i}) = \prod_{k=1}^{K} \theta_{nk}^{z_k}.$$

Note that for each document there is one parameter θ_n; one can form a matrix Θ = [θ_1, θ_2, ..., θ_N] of these parameters. Substituting p(z | i) into the expression for p(w | i), we obtain:

$$p(\mathbf{w} \mid \mathbf{i}) = \sum_{k=1}^{K} \theta_{nk}\, p(\mathbf{w} \mid \mathbf{z}_k), \qquad (1.15)$$

where we used z_k to emphasize that this is the vector z with the k-th element equal to 1. Therefore, the conditional distribution of the tokens in a document is a mixture of the discrete distributions p(w | z_k) over the vocabulary of M words with K latent topics. Substituting the discrete distributions p(w | z_k) with the parameters d_k into the conditional distribution (1.15), we obtain:

$$p(\mathbf{w} \mid \mathbf{i}) = \sum_{k=1}^{K} \theta_{nk} \prod_{m=1}^{M} d_{km}^{w_m} = \sum_{k=1}^{K} \prod_{m=1}^{M} (\theta_{nk} d_{km})^{w_m} \overset{(a)}{=} \prod_{m=1}^{M} \Big[ \sum_{k=1}^{K} \theta_{nk} d_{km} \Big]^{w_m},$$

where, in the equality (a), we could exchange the sum and the product due to the special form of the one-hot encoded vector w. The sum in the brackets is exactly the m-th element of the vector Dθ_n, and the conditional distribution p(w | i_n) of the tokens in the n-th document is the discrete distribution with the parameter Dθ_n:

w|i푛 ∼ Mult(1, [D휃푛]),

where we used i_n to emphasize that this is the n-th document. This demonstrates that pLSI is a latent linear model: indeed, E(w | i_n) = Dθ_n. A significant drawback of pLSI is the number of parameters: the matrix Θ ∈ R^{K×N} and the topic matrix D ∈ R^{M×K} give in total O(NK + MK) parameters, i.e. the number of parameters grows linearly not only in the number of topics K and the number of words in the vocabulary M, but also in the number of documents N. Therefore, pLSI is prone to overfitting [Blei et al., 2003].
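A quick numerical check of this identity (ours, with made-up dimensions): mixing the K topic-word distributions with a document's topic proportions, as in (1.15), gives exactly the vector Dθ_n, so the tokens indeed follow Mult(1, Dθ_n):

import numpy as np

rng = np.random.default_rng(0)
M, K = 6, 3

D = rng.dirichlet(np.ones(M), size=K).T            # M x K topic matrix, columns in the simplex
theta_n = rng.dirichlet(np.ones(K))                # topic proportions of the n-th document

# Mixture form (1.15): sum_k theta_nk * p(w | z_k), where p(w | z_k) is the k-th topic d_k.
token_dist = sum(theta_n[k] * D[:, k] for k in range(K))

# Compact form: the same distribution is the discrete distribution with parameter D theta_n.
print(np.allclose(token_dist, D @ theta_n))        # True
print(token_dist.sum())                            # 1.0, a valid distribution over the M words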

1.2.4 Latent Dirichlet Allocation

Instead of defining a parameter θ_n for each document as in pLSI, latent Dirichlet allocation (LDA) [Blei et al., 2003] defines a single parameter c ∈ R_{++}^K for a corpus, and the topic intensities or topic proportions θ of each document are modeled as another latent variable. This Δ_K-valued θ has the Dirichlet distribution

(hence the name), since it is the conjugate prior to the multinomial distribution, which simplifies the estimation and inference procedure in the variational inference framework (see Section 2.4). However, note that although formally the number of parameters in the LDA model, O(KM), is lower than in pLSI, in the variational inference procedure for LDA a vector of topic intensities has to be estimated for every document; in this sense there is not much difference between the two models, and LDA essentially performs a kind of "smoothing" to avoid overfitting.

In the LDA model, all documents in a corpus are drawn independently and identically distributed in accordance with the following generative process of a document (see the definitions of the Poisson and Dirichlet distributions in Appendix A.1):

(0. Draw the document length L ∼ Poisson(λ).)
1. Draw the topic proportions θ ∼ Dirichlet(c).

2. For each of the 퐿 tokens wℓ :

(a) Draw the topic zℓ ∼ Mult(1, 휃).

(b) Draw the token wℓ|zℓ ∼ Mult (1, dzℓ ).

Note that d_{z_ℓ} denotes the k-th topic, where k corresponds to the non-zero entry of the vector z_ℓ. The number of topics K is assumed to be known and fixed. Although the Poisson assumption on L is normally ignored in practice, this assumption will be important to show the connection 9 of LDA with the gamma-Poisson model (see Chapter 3). We concisely summarize this generative process as follows:

$$\big(L \sim \operatorname{Poisson}(\lambda)\big), \qquad \boldsymbol{\theta} \sim \operatorname{Dirichlet}(\mathbf{c}), \qquad \mathbf{z}_\ell \mid \boldsymbol{\theta} \sim \operatorname{Mult}(1, \boldsymbol{\theta}), \qquad \mathbf{w}_\ell \mid \mathbf{z}_\ell \sim \operatorname{Mult}(1, \mathbf{d}_{z_\ell}). \qquad (1.16)$$

We emphasize that it is formulated in terms of tokens. The model (1.16) is illustrated with a plate diagram in Figure 1-5.
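The generative process (1.16) translates directly into code. The sketch below (ours, with arbitrary small dimensions) samples one document token by token, including the optional Poisson draw of the document length:

import numpy as np

rng = np.random.default_rng(0)
M, K = 6, 3                                        # vocabulary size and number of topics
lam = 8                                            # Poisson parameter for the document length
c = np.full(K, 0.5)                                # Dirichlet hyper-parameter

D = rng.dirichlet(np.ones(M), size=K).T            # M x K topic matrix [d_1, ..., d_K]

# Generative process (1.16) for a single document.
L = rng.poisson(lam)                               # (0.) document length
theta = rng.dirichlet(c)                           # 1. topic proportions
tokens = []
for _ in range(L):
    z = rng.choice(K, p=theta)                     # 2.(a) topic of the token
    w = rng.choice(M, p=D[:, z])                   # 2.(b) the token itself (a word index)
    tokens.append(w)

x = np.bincount(np.array(tokens, dtype=int), minlength=M)   # count vector of the document
print(L, x)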

One can think of the latent variables z_ℓ in the model (1.16) as auxiliary variables which were introduced for the convenience of inference. In fact, they can be marginalized out [Buntine, 2002, Buntine and Jakulin, 2004, 2006] to obtain an equivalent and more compact representation of the model. For completeness, we rigorously demonstrate the equivalence.

Marginalizing Out the Latent Variable z. The conditional distribution of the topic state vector z_ℓ of the ℓ-th token in a document, given the latent vector of topic intensities θ, is

$$p(\mathbf{z}_\ell \mid \boldsymbol{\theta}) = \prod_{k=1}^{K} \theta_k^{z_{\ell k}}$$

9. The result is basically that the two models are equivalent given the Poisson assumption on 퐿 for the LDA model.

Figure 1-5 – The latent Dirichlet allocation (LDA) model.

and the conditional distribution of the ℓ-th token, given the respective latent topic state vector z_ℓ and the latent topic intensities vector θ, has the form

$$p(\mathbf{w}_\ell \mid \mathbf{z}_\ell, \boldsymbol{\theta}) = \prod_{k=1}^{K} \prod_{m=1}^{M} d_{km}^{\,w_{\ell m} z_{\ell k}}.$$

The joint distribution of all the tokens w_1, w_2, ..., w_L in a document and the respective latent states z_1, z_2, ..., z_L takes the form

$$p(\mathbf{w}_1, \ldots, \mathbf{w}_L, \mathbf{z}_1, \ldots, \mathbf{z}_L \mid \boldsymbol{\theta}) = \prod_{\ell=1}^{L} p(\mathbf{w}_\ell, \mathbf{z}_\ell \mid \boldsymbol{\theta}),$$

where p(w_ℓ, z_ℓ | θ) = p(w_ℓ | z_ℓ, θ) p(z_ℓ | θ). Substituting the expressions from above into the marginal distribution of all the tokens, we get

$$\begin{aligned}
p(\mathbf{w}_1, \ldots, \mathbf{w}_L \mid \boldsymbol{\theta}) &= \prod_{\ell=1}^{L} \sum_{\mathbf{z}_\ell \in \mathcal{Z}} p(\mathbf{w}_\ell, \mathbf{z}_\ell \mid \boldsymbol{\theta}) \\
&\overset{(a)}{=} \prod_{\ell=1}^{L} \sum_{\mathbf{z}_\ell \in \mathcal{Z}} \prod_{k=1}^{K} \prod_{m=1}^{M} [\theta_k d_{km}]^{w_{\ell m} z_{\ell k}} \\
&= \prod_{\ell=1}^{L} \sum_{k=1}^{K} \prod_{m=1}^{M} [\theta_k d_{km}]^{w_{\ell m}} \\
&\overset{(b)}{=} \prod_{\ell=1}^{L} \prod_{m=1}^{M} \big([\mathbf{D}\boldsymbol{\theta}]_m\big)^{w_{\ell m}},
\end{aligned}$$

where $\mathcal{Z}$ stands for the set of all possible values of the one-hot encoded K-vector; in the equality (a) we used that $\prod_{k=1}^{K} \theta_k^{z_{\ell k}} = \prod_{k=1}^{K} (\theta_k^{z_{\ell k}})^{\sum_{m=1}^{M} w_{\ell m}} = \prod_{k=1}^{K} \prod_{m=1}^{M} \theta_k^{z_{\ell k} w_{\ell m}}$ since $\sum_{m=1}^{M} w_{\ell m} = 1$; and in the equality (b) we used that we can exchange the summation and the product due to the one-hot encoding of the vector w_ℓ. Finally, we can

further rewrite this marginal distribution as

$$p(\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_L \mid \boldsymbol{\theta}) = \prod_{m=1}^{M} \big([\mathbf{D}\boldsymbol{\theta}]_m\big)^{\sum_{\ell=1}^{L} w_{\ell m}}.$$

Recall that the count vector x of a document is equal to the sum of all tokens w_ℓ, i.e. $\mathbf{x} = \sum_{\ell=1}^{L} \mathbf{w}_\ell$. Let C_m denote the number of times the m-th word from the vocabulary appears in a document; then the vector $\mathbf{C} = [C_1, C_2, \ldots, C_M]^\top$ represents the counts of all words in this document. Given θ, the event that the random variable x is equal to the vector C is then equal to the event that, among the tokens w_1, w_2, ..., w_L of this document, there are C_m tokens corresponding to the m-th word. The second event can be realized in $L!/(C_1! C_2! \cdots C_M!)$ different ways, where L is the document length. Therefore, the count vector x is distributed in accordance with the multinomial distribution with the parameter Dθ and L trials:

$$p(\mathbf{x} \mid \boldsymbol{\theta}) = \frac{L!}{x_1! x_2! \cdots x_M!} \prod_{m=1}^{M} [\mathbf{D}\boldsymbol{\theta}]_m^{x_m}.$$

This gives an equivalent formulation of the generative process (1.16) of a document under the LDA model:

$$\big(L \sim \operatorname{Poisson}(\lambda)\big), \qquad \boldsymbol{\theta} \sim \operatorname{Dirichlet}(\mathbf{c}), \qquad \mathbf{x} \mid \boldsymbol{\theta} \sim \operatorname{Mult}(L, \mathbf{D}\boldsymbol{\theta}). \qquad \text{(LDA model)} \quad (1.17)$$

In this formulation, LDA is a special case of the admixture model (1.12), which justifies the place of the LDA model in the list of latent linear models. In formulation (3.2), the LDA model is also known under the names of multinomial PCA or the Dirichlet-multinomial model [Buntine, 2002, Buntine and Jakulin, 2004, 2006]. This formulation will prove useful in Chapter 3 for the formulation and motivation of the novel discrete ICA model.
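The equivalence between the token-level process (1.16) and the count-level formulation (1.17) can also be checked by simulation. The sketch below (ours) fixes θ and compares, over repeated documents, the counts obtained by summing sampled tokens against direct multinomial draws with parameter Dθ; both empirical means approach L·Dθ:

import numpy as np

rng = np.random.default_rng(0)
M, K, L, R = 5, 2, 50, 2000                        # vocabulary, topics, document length, repetitions

D = rng.dirichlet(np.ones(M), size=K).T            # M x K topic matrix
theta = rng.dirichlet(np.ones(K))                  # fixed topic proportions for this document

# Token-level sampling (1.16): a topic per token, then a word, then sum tokens to counts.
def sample_counts_tokenwise():
    topics = rng.choice(K, size=L, p=theta)
    words = [rng.choice(M, p=D[:, k]) for k in topics]
    return np.bincount(np.array(words, dtype=int), minlength=M)

counts_tokens = np.array([sample_counts_tokenwise() for _ in range(R)])

# Count-level sampling (1.17): a single multinomial draw with parameter D theta.
counts_mult = rng.multinomial(L, D @ theta, size=R)

print(np.round(counts_tokens.mean(axis=0), 2))
print(np.round(counts_mult.mean(axis=0), 2))
print(np.round(L * (D @ theta), 2))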

1.2.5 Other Topic Models

LDA and pLSI do not model correlations between topics (e.g., the fact that a topic about geology is more likely to appear together with a topic about chemistry than with one about sport). To address this, several classes of topic models were proposed, such as the correlated topic model [Blei and Lafferty, 2007] and the pachinko allocation model [Li and McCallum, 2006]. However, estimation and inference in these models is a more challenging task. Arabshahi and Anandkumar [2016] recently proposed latent normalized infinitely divisible topic models with an estimation method based on the method of moments, which goes along the lines of the methods developed in Chapter 3. Some models from this class allow modeling both positive and negative correlations among topics (see also Section 3.2). Moreover, we assumed that the number of topics K is fixed and known, which is rarely the case in practice. The Bayesian nonparametric

topic model [Teh et al., 2006] addresses this issue: the number of topics is determined by the collection during posterior inference. However, these and many other topic models are outside the scope of this thesis [for more details see, e.g., Blei, 2012].

1.3 Latent Linear Models for Multi-View Data

1.3.1 Probabilistic Canonical Correlation Analysis

Aligned or multi-view data often arise naturally in applications. For two views, such data are represented as two data sets such that each observation from one view is aligned with one observation in the other view and vice versa. For instance, one view can consist of sentences in one language and the other can contain translations of these sentences into another language, such that the same sentences in the two different languages are aligned. Formally, one is given two data sets (finite samples) :

\[
X^{(1)} = \big\{ x^{(1)}_1, x^{(1)}_2, \dots, x^{(1)}_N \big\} \subset \mathbb{R}^{M_1}, \qquad
X^{(2)} = \big\{ x^{(2)}_1, x^{(2)}_2, \dots, x^{(2)}_N \big\} \subset \mathbb{R}^{M_2}, \qquad (1.18)
\]

such that each pair $\big(x^{(1)}_n, x^{(2)}_n\big)$ is aligned. Naturally, the number of views can be larger than two, with tuples of aligned observations. Such data are also known in the literature under the names of dyadic, coupled, or paired data.

Canonical Correlation Analysis. Classical canonical correlation analysis (CCA) [Hotelling, 1936] aims to find a pair of linear transformations $D_1 \in \mathbb{R}^{M_1 \times K}$ and $D_2 \in \mathbb{R}^{M_2 \times K}$ of two observation vectors $x^{(1)} \in \mathbb{R}^{M_1}$ and $x^{(2)} \in \mathbb{R}^{M_2}$, each representing a data view, such that each component of the transformed variables in one data set, $D_1^\top x^{(1)}$, is correlated with a single component in the other set, $D_2^\top x^{(2)}$, i.e. the correlation is maximized. The columns of the matrices $D_1$ and $D_2$ are called canonical correlation directions. As with classical PCA, the CCA solution boils down to solving a generalized SVD problem.

Probabilistic Canonical Correlation Analysis. The following probabilistic interpretation of CCA was proposed by Browne [1979], Bach and Jordan [2005], Klami et al. [2013]. Given that the $K$ sources are independent and identically distributed standard normal random variables, $\alpha \sim \mathcal{N}(0, I_K)$, such a probabilistic CCA model, which we will refer to as the Gaussian CCA model, is given by

\[
x^{(1)} \mid \alpha \sim \mathcal{N}(\mu_1 + D_1 \alpha,\, \Psi_1), \qquad
x^{(2)} \mid \alpha \sim \mathcal{N}(\mu_2 + D_2 \alpha,\, \Psi_2), \qquad (1.19)
\]

where the covariance matrices $\Psi_1 \in \mathbb{R}^{M_1 \times M_1}$ and $\Psi_2 \in \mathbb{R}^{M_2 \times M_2}$ are positive semi-definite. The maximum likelihood estimators of the parameters $D_1$ and $D_2$ coincide with the canonical correlation directions, up to permutation, scaling, and left-multiplication by any invertible matrix. Therefore, like factor analysis and probabilistic PCA, probabilistic CCA is unidentifiable. By analogy with factor analysis, the Gaussian CCA model (1.19) can be equivalently

[Figure 1-6 – The probabilistic CCA model : (a) a PCCA plate diagram ; (b) a PCCA generative process diagram.]

represented through the following generative process

\[
x^{(1)} = \mu_1 + D_1 \alpha + \varepsilon^{(1)}, \qquad
x^{(2)} = \mu_2 + D_2 \alpha + \varepsilon^{(2)}, \qquad (1.20)
\]
where the additive noise vectors are multivariate normal random variables :

\[
\varepsilon^{(1)} \sim \mathcal{N}(0, \Psi_1), \qquad \varepsilon^{(2)} \sim \mathcal{N}(0, \Psi_2), \qquad (1.21)
\]
and the following independence assumptions are made :

\[
\alpha_1, \dots, \alpha_K \ \text{are mutually independent}, \qquad
\alpha \mathrel{\perp\!\!\!\perp} \big(\varepsilon^{(1)}, \varepsilon^{(2)}\big), \qquad
\varepsilon^{(1)} \mathrel{\perp\!\!\!\perp} \varepsilon^{(2)}. \qquad (1.22)
\]
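As a concrete illustration of the generative process (1.20)–(1.22), the following minimal sketch draws an aligned two-view sample from the Gaussian CCA model (assuming numpy ; all dimensions and parameter values are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)
M1, M2, K, N = 5, 4, 2, 1000                      # view dimensions, latent dimension, sample size

D1, mu1 = rng.normal(size=(M1, K)), rng.normal(size=M1)
D2, mu2 = rng.normal(size=(M2, K)), rng.normal(size=M2)
B1, B2 = rng.normal(size=(M1, M1)), rng.normal(size=(M2, M2))
Psi1, Psi2 = B1 @ B1.T, B2 @ B2.T                 # arbitrary (non-diagonal) PSD noise covariances

alpha = rng.normal(size=(N, K))                   # alpha ~ N(0, I_K), independent components
eps1 = rng.multivariate_normal(np.zeros(M1), Psi1, size=N)   # eps^(1) ~ N(0, Psi_1)
eps2 = rng.multivariate_normal(np.zeros(M2), Psi2, size=N)   # eps^(2) ~ N(0, Psi_2), independent of eps^(1)
X1 = mu1 + alpha @ D1.T + eps1                    # x^(1) = mu_1 + D_1 alpha + eps^(1)
X2 = mu2 + alpha @ D2.T + eps2                    # x^(2) = mu_2 + D_2 alpha + eps^(2)
```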

Therefore, Gaussian CCA is nearly an extension of factor analysis to two views. The difference is that the covariance matrices of the noise, $\Psi_1$ and $\Psi_2$, are not restricted to be diagonal and the view-specific noise may be arbitrarily correlated. The only requirement is that there are no correlations across the views. More specifically, to see how Gaussian CCA is related to factor analysis, one can stack the view-specific vectors (matrices) into a single vector (matrix) :

\[
x = \begin{pmatrix} x^{(1)} \\ x^{(2)} \end{pmatrix}, \qquad
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad
D = \begin{pmatrix} D_1 \\ D_2 \end{pmatrix}, \qquad
\varepsilon = \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix}, \qquad (1.23)
\]
which leads exactly to the generative model of factor analysis (1.7), with a single difference in the assumptions on the additive noise. Indeed, in factor analysis the additive noise is a zero-mean Gaussian variable with a diagonal covariance matrix $\Psi$, while in Gaussian CCA, the covariance of the zero-mean Gaussian additive noise has

the block-diagonal structure :

\[
\Psi = \begin{pmatrix} \Psi_1 & 0 \\ 0 & \Psi_2 \end{pmatrix}. \qquad (1.24)
\]

Using this stacking trick, one can see that the marginal distribution of the observations under the Gaussian CCA model is the same as the one for factor analysis (1.6) with the covariance matrix of the noise in the form (1.24). Extensions to more than two views are possible [see, e.g., Kettenring, 1971, Bach and Jordan, 2002], but are not considered in this thesis. By relaxing the Gaussianity assumption, ICA is transformed into an identifiable version of factor analysis. Similarly, by relaxing the Gaussianity assumption on the independent latent sources in Gaussian CCA, we obtain an identifiable version of CCA (see Chapter 4). Some other multi-view models are outlined in Section 4.2.

1.4 Overcomplete Latent Linear Models

It is common to distinguish the undercomplete and overcomplete cases of latent linear models. The undercomplete case assumes that there are fewer latent variables than observations in a model, i.e. $K < M$ (in the ICA literature, this case is also known as overdetermined). On the contrary, the overcomplete case assumes that there are more latent variables than observations in a model, i.e. $K > M$ (in the ICA literature, this case is also known as underdetermined). The equality case, $M = K$, corresponds to the complete or determined case if the linear transformation matrix $D$ is full rank. In the undercomplete case, the problem $D\alpha$ is well-posed if the matrix $D$ has full column rank, in the sense that the latent sources and the matrix $D$ can be identified up to permutation and scaling. This is an easier problem and this setting is assumed throughout this thesis. On the contrary, in the overcomplete case at least some columns of the linear transformation matrix are linearly dependent and the recovery of the matrix $D$ is a more difficult problem. Nevertheless, (essentially) unique recovery of the matrix $D$ is still possible without any additional assumptions [Lee et al., 1999, Lewicki and Sejnowski, 2000, Comon, 2004, De Lathauwer et al., 2007, De Lathauwer and Castaing, 2008, Comon, 1998] or in the topic modeling case [Anandkumar et al., 2015c]. However, recovery of the latent sources $\alpha$ requires additional assumptions in this case.


Chapter 2

Tensors and Estimation in Latent Linear Models

Abstract

In accordance with the method of moments, the parameters of a model are estimated by matching the population moments (or other statistics such as cumulants) of this model with the respective sample estimates of these moments (or other statistics). When the method of moments is used for estimation in latent linear models, higher-order statistics, which happen to be tensors, often have to be taken into account. Therefore, we first provide an overview of tensors and their decompositions in this chapter, with an emphasis on the canonical polyadic (CP) decomposition (in Section 2.1), and recall different higher-order statistics (moments and cumulants ; in Section 2.2). We then connect the CP decomposition of tensors with higher-order statistics of latent linear models, using as examples such models as independent component analysis (in Section 2.2.2) and latent Dirichlet allocation (in Section 2.2.3), and then give an overview of algorithms for the CP decomposition (in Section 2.3). This eventually allows us to perform the estimation in some latent linear models.

2.1 Tensors, Higher Order Statistics, and CPD

2.1.1 Tensors

Let $\mathbb{V}^{(1)}, \mathbb{V}^{(2)}, \dots, \mathbb{V}^{(S)}$ be $S$ vector spaces of dimension $M_1, M_2, \dots, M_S$, respectively. An order-$S$ tensor $\mathcal{T}$ is an element of the vector space obtained as a tensor product of these $S$ vector spaces, $\mathbb{V} := \mathbb{V}^{(1)} \otimes \mathbb{V}^{(2)} \otimes \cdots \otimes \mathbb{V}^{(S)}$, where $\otimes$ stands for the tensor (outer) product. The order $S$ of a tensor is the number of dimensions, also known as ways or modes. Thus, an order-1 tensor is a vector, an order-2 tensor is a matrix, and order-3 or higher tensors are referred to as higher-order tensors. For a detailed overview of tensors see, e.g., McCullagh [1987], Comon [2002], Kolda and Bader [2009], Comon [2009], Landsberg [2012], Comon [2014], and references therein.

Let $E^{(s)} := \big\{ e^{(s)}_1, e^{(s)}_2, \dots, e^{(s)}_{M_s} \big\}$ denote a basis of the vector space $\mathbb{V}^{(s)}$. Then the coordinates $\mathcal{T}_{m_1 m_2 \dots m_S}$ of any tensor $\mathcal{T} \in \mathbb{V}$ correspond to

\[
\mathcal{T} = \sum_{m_1=1}^{M_1} \sum_{m_2=1}^{M_2} \cdots \sum_{m_S=1}^{M_S}
\mathcal{T}_{m_1 m_2 \dots m_S}\; e^{(1)}_{m_1} \otimes e^{(2)}_{m_2} \otimes \cdots \otimes e^{(S)}_{m_S}, \qquad (2.1)
\]
where the summation is performed along all modes. Unless otherwise specified, we restrict ourselves to vector spaces over the field $\mathbb{R}$ of real numbers, in which case we write $\mathcal{T} \in \mathbb{R}^{M_1 \times M_2 \times \cdots \times M_S}$. Moreover, a basis $E^{(s)}$ is often chosen to be the canonical basis of $\mathbb{R}^{M_s}$, i.e. the vector $e^{(s)}_m$ is the $m$-th column of the $M_s \times M_s$ identity matrix, for all $m \in [M_s]$. In this case, a tensor $\mathcal{T}$ defined in (2.1) can be seen as

an 푆-way array with the (푚1, 푚2, . . . , 푚푆)-th element equal to 풯푚1푚2...푚푆 . The latter representation, however, is basis dependent.

The 푠-mode product of a tensor 풯 ∈ R푀1×푀2×···×푀푆 with a matrix A ∈ R퐾×푀푠 is denoted as 풯 ×푠 A and is a tensor of size 푀1 × · · · × 푀푠−1 × 퐾 × 푀푠+1 × · · · × 푀푆 with the (푚1, . . . , 푚푠−1, 푘, 푚푠+1, . . . , 푚푆)-th element equal to

\[
[\mathcal{T} \times_s A]_{m_1 \dots m_{s-1}\, k\, m_{s+1} \dots m_S}
:= \sum_{m_s=1}^{M_s} \mathcal{T}_{m_1 \dots m_{s-1}\, m_s\, m_{s+1} \dots m_S}\, A_{k m_s}. \qquad (2.2)
\]
The $s$-mode product is directly related to a change of basis and the so-called multilinearity property. Indeed, let $e^{(s)}_m = A^{(s)} e'^{(s)}_m$, for $m = 1, 2, \dots, M_s$, denote a change of basis in $\mathbb{R}^{M_s}$ for some given matrices $A^{(s)} \in \mathbb{R}^{M_s \times M_s}$, for all $s \in [S]$. Then the new coordinates $\mathcal{T}'_{k_1 k_2 \dots k_S}$ of the tensor $\mathcal{T}$ are expressed as a function of the original ones [Comon et al., 2009a] as

\[
\mathcal{T}'_{k_1 k_2 \dots k_S} = \sum_{m_1=1}^{M_1} \cdots \sum_{m_S=1}^{M_S}
A^{(1)}_{k_1 m_1} A^{(2)}_{k_2 m_2} \cdots A^{(S)}_{k_S m_S}\; \mathcal{T}_{m_1 m_2 \dots m_S}. \qquad (2.3)
\]

This, essentially, can be denoted as $\mathcal{T}' = \mathcal{T} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 \cdots \times_S A^{(S)}$. The property (2.3) of tensors is referred to as multilinearity and, in particular, it allows

to treat higher-order statistics (moments and cumulants) as tensors (see Property 2 in Section 2.2). Let us also define several special types of tensors. A tensor $\mathcal{T}$ is called diagonal if the only non-zero elements are the ones on the diagonal, i.e. $\mathcal{T}_{m_1 m_2 \dots m_S} \neq 0$ only if $m_1 = m_2 = \dots = m_S$. A tensor $\mathcal{T}$ is called cubical if the dimensions along all its modes coincide, i.e. $M_1 = M_2 = \dots = M_S = M$. A cubical tensor $\mathcal{T} \in \mathbb{R}^{M \times M \times \cdots \times M}$ is called symmetric if it remains unchanged under any permutation $\pi$ of its indices, i.e.
\[
\mathcal{T}_{\pi(m_1)\pi(m_2)\dots\pi(m_S)} = \mathcal{T}_{m_1 m_2 \dots m_S}. \qquad (2.4)
\]
Otherwise, a tensor $\mathcal{T}$ is called non-symmetric. A fiber of a tensor is a vector obtained by fixing all but one index in this tensor. Fibers are extensions of matrix rows and columns to tensors. A fiber is called a mode-$s$ fiber if the unfixed index is along the $s$-th dimension (or mode). A slice is a two-dimensional section of a tensor obtained by fixing all but two indices. Unfolding (a.k.a. matricization) refers to the process of rearranging the elements of a tensor into a matrix. Vectorization is the process of rearranging the elements of a tensor into a vector. For both unfolding and vectorization, the ordering of the elements is not important as long as it is consistent.
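To make the $s$-mode product (2.2) and the multilinearity property (2.3) concrete, here is a minimal numerical sketch (assuming numpy ; the dimensions are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(3, 4, 5))          # an order-3 tensor of size M1 x M2 x M3
A = rng.normal(size=(2, 4))             # a matrix in R^{K x M_2}, here K = 2

# 2-mode product (2.2): contract the second index of T against the second index of A.
T_times2_A = np.einsum('ijk,pj->ipk', T, A)        # shape (3, 2, 5)

# Multilinearity (2.3): matrices applied along every mode, T' = T x_1 A1 x_2 A2 x_3 A3.
A1, A2, A3 = (rng.normal(size=(d, d)) for d in T.shape)
T_prime = np.einsum('ijk,pi,qj,rk->pqr', T, A1, A2, A3)
```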

2.1.2 The Canonical Polyadic Decomposition

It turns out that working with tensors is much more challenging than working with matrices. To emphasize the difference, let us first briefly recall an important result from linear algebra—the singular value decomposition—and its possible extensions to tensors. The Singular Value Decomposition. Matrices — order-2 tensors — are well stu- died and their properties are well understood [see, e.g., Horn and Johnson, 2013]. One noticeable property of matrices is the singular value decomposition (SVD). It says that given a real or complex 푀1 × 푀2 matrix A of rank 푅, there exist uni- tary matrices U and V, of size 푀1 × 푀1 and 푀2 × 푀2 respectively, and values 휎1 ≥ 휎2 ≥ · · · ≥ 휎푅 > 0 = 휎푅+1 = ··· = 휎퐹 , where 퐹 = min [푀1, 푀2], such that A = UΣV*, (2.5) where Σ is an 푀1 × 푀2 diagonal matrix with the diagonal elements equal to 휎1, ... , * 휎퐹 , and V stands for the conjugate transpose of V. The columns of the matrix U are called the left singular vectors, the columns of the matrix V are called the right singular vectors, and the values 휎푚 are referred to as the singular values. The SVD formulation (2.5) can be equivalently rewritten as the sum of rank-1 terms :

\[
A = \sum_{r=1}^{R} \sigma_r\, u_r v_r^{*}, \qquad (2.6)
\]

where the vectors $u_1, \dots, u_R$ and $v_1, \dots, v_R$ are the first $R$ left and right singular vectors, respectively, and the superscript $(\cdot)^{*}$ stands for the conjugate transpose. Here the terms $u_r v_r^{*}$ are rank-1 matrices. Any rank-$R$ matrix admits the singular value decomposition and it is unique (up to scaling by $-1$) provided all singular values are different. An extension of the SVD to tensors is not straightforward. Consider the following decomposition of an order-3 tensor introduced by Tucker [1966]

풯 = 풢 ×1 U ×2 V ×3 W, (2.7) where 풢 is an order-3 tensor called the core tensor and the matrices U, V, and W are called the factor matrices. The decomposition problem would be to estimate the core tensor 풢 and factor matrices U, V, and W given the tensor 풯 . To obtain an equivalent of the SVD for tensors, we would have to make two assumptions in (2.7): (a) the tensor 풢 is diagonal and (b) the matrices U, V, and W are orthonormal. In general, if the number of free parameters in the right-hand side of (2.7) is smaller than the number of equations, then one can not solve the problem. This happens to be the case for an order-3 tensor under the mentioned diagonality and orthogonality assumptions. Therefore, it is impossible to define an extension of the SVD to higher- order tensors without relaxing some assumptions. By relaxing one or the other assumption, two main tensor equivalents of the SVD for tensors can be found in the literature. One of these decompositions, a tensor equivalent of the formulation in (2.5), is the Tucker decomposition [Tucker, 1966] where the tensor 풢 in (2.7) is allowed to have non-zero non-diagonal entries, but the orthonormality constraint is preserved. Note that in this case the dimensions of the core tensor 풢 are normally significantly smaller than the dimensions of the tensor 풯 . In the following, we use equation (2.7) to refer to the Tucker decomposition. Another decomposition, a tensor equivalent of the formulation in (2.6), is the so-called canonical polyadic decomposition, or the CP decomposition, where the tensor 풢 is constrained to be diagonal, but the orthonormality constraint on the factor matrices U, V, and W is relaxed, allowing, in particular, factor matrices with more columns than rows. The CP decomposition is of central importance in this thesis and will be discussed in detail in this and other sections. Not only the mentioned extensions of the SVD for tensors differ, but computing these decompositions is also a challenging task. In fact, most of the tensor problems are NP-hard [Hillar and Lim, 2013].

The Canonical Polyadic Decomposition. An order-푆 tensor 풯 ∈ R푀1×푀2×···×푀푆 is called a rank-1 tensor if it can be written as the tensor product of 푆 vectors, i.e. 풯 = u(1) ⊗ u(2) ⊗ · · · ⊗ u(푆), where u(푠) ∈ R푀푠 for 푠 ∈ [푆]. Any tensor 풯 admits a decomposition into a sum of rank-1 tensors as follows

\[
\mathcal{T} = \sum_{j=1}^{F} u^{(1)}_j \otimes u^{(2)}_j \otimes \cdots \otimes u^{(S)}_j, \qquad (2.8)
\]

[Figure 2-1 – A canonical polyadic decomposition of an order-3 tensor $\mathcal{T}$ as a sum of $F$ rank-1 terms $u_j \otimes v_j \otimes w_j$.]

with potentially very large 퐹 , which is called the canonical polyadic decomposition or CP decomposition (see an illustration on Figure 2-1). Note that for 푆 = 3 this is equivalent to the decomposition in (2.7) with the diagonal core tensor 풢 and relaxed orthonormality assumption on the factor matrices (with u := u(1), v := u(2), and (3) w := u ), since we can eliminate the scalar factors 풢푚푚푚 by respectively rescaling the factor vectors. For 푠 ∈ [푆], the order-푠 factor matrix U(푠) ∈ R푀푠×퐹 of the CP (푠) (푠) (푠) decomposition is obtained by stacking the factors u1 , u2 , ... , u퐹 columnwise (푠) [︀ (푠) (푠) (푠)]︀ into a matrix, i.e. U := u1 u2 ... u퐹 . We emphasize one more time that the matrices U(푠) are not assumed to be orthonormal. Moreover, the number of factors 퐹 does not have to be equal to any of dimensions and can, in particular, be larger than any of 푀1, 푀2, ... , 푀푆.

It is worth noting that the naming convention between the CP decomposition and factor analysis is confusing (see Section 1.1.2). The factors of the CP decomposition correspond to the factor loadings in factor analysis. The factors in factor analysis refer to the latent variables and, therefore, mean something different from the factors in the CP decomposition. Overall, the factor matrix of the CP decomposition is an equivalent of the factor loading matrix in factor analysis (see also a probabilistic interpretation of the non-negative CP decomposition below).

The canonical polyadic decomposition was originally introduced by Hitchcock[1927a,b]. The name canonical decomposition was given by Carroll and Chang[1970] and the name Parafac was introduced by Harshman[1970], Harshman and Lundy[1994], both in the psychometrics literature. For a detailed introduction to the CP decomposition and a literature overview see, e.g., Kolda and Bader[2009], Comon et al.[2009a].

The coordinate representation of the CP decomposition is as follows. Let $u^{(s)}_{m_s j}$ denote the $m_s$-th coordinate of the vector $u^{(s)}_j$ in a basis $E^{(s)}$ ; then the CP decomposition

(2.8) can be rewritten as

\[
\begin{aligned}
\mathcal{T} &= \sum_{j=1}^{F} \Bigg( \sum_{m_1=1}^{M_1} u^{(1)}_{m_1 j} e^{(1)}_{m_1} \Bigg) \otimes \Bigg( \sum_{m_2=1}^{M_2} u^{(2)}_{m_2 j} e^{(2)}_{m_2} \Bigg) \otimes \cdots \otimes \Bigg( \sum_{m_S=1}^{M_S} u^{(S)}_{m_S j} e^{(S)}_{m_S} \Bigg) \\
&= \sum_{m_1=1}^{M_1} \cdots \sum_{m_S=1}^{M_S} \Bigg( \sum_{j=1}^{F} u^{(1)}_{m_1 j} u^{(2)}_{m_2 j} \cdots u^{(S)}_{m_S j} \Bigg) e^{(1)}_{m_1} \otimes e^{(2)}_{m_2} \otimes \cdots \otimes e^{(S)}_{m_S}, \qquad (2.9)
\end{aligned}
\]

which shows that the coordinates are related as $\mathcal{T}_{m_1 m_2 \dots m_S} = \sum_{j=1}^{F} u^{(1)}_{m_1 j} u^{(2)}_{m_2 j} \cdots u^{(S)}_{m_S j}$. However, by analogy with the coordinate representation of a tensor in (2.1), the representation in (2.9) is basis dependent, as opposed to the representation in (2.8).
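The coordinate form of the CP decomposition translates directly into code. The sketch below builds a rank-$F$ order-3 tensor from its factor matrices and checks it against the explicit sum of rank-1 terms (assuming numpy ; dimensions and the number of factors are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
M1, M2, M3, F = 4, 5, 6, 3
U1, U2, U3 = (rng.normal(size=(M, F)) for M in (M1, M2, M3))   # factor matrices U^(s)

# Coordinates (2.9): T_{m1 m2 m3} = sum_j U1_{m1 j} U2_{m2 j} U3_{m3 j}.
T = np.einsum('ij,kj,lj->ikl', U1, U2, U3)

# The same tensor as an explicit sum of F rank-1 terms, cf. (2.8).
T_check = sum(np.multiply.outer(np.multiply.outer(U1[:, j], U2[:, j]), U3[:, j])
              for j in range(F))
assert np.allclose(T, T_check)
```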

Symmetric CP Decomposition. It is common to distinguish special types of CP decompositions. One of them is the symmetric CP decomposition [Comon et al., 2008] : for a symmetric tensor $\mathcal{T} \in \mathbb{R}^{M \times M \times \cdots \times M}$, the symmetric CP decomposition is defined as
\[
\mathcal{T} = \sum_{j=1}^{F} \lambda_j\, u_j \otimes u_j \otimes \cdots \otimes u_j. \qquad (2.10)
\]
As in the non-symmetric case, the matrix $U := [u_1, u_2, \dots, u_F] \in \mathbb{R}^{M \times F}$ is not assumed to be orthogonal and $F$ can be larger than $M$. As opposed to the non-symmetric CP decomposition (2.8), the introduction of the scalar factors $\lambda_j$ is essential. Indeed, in the non-symmetric decomposition (2.8), we can simply rescale the factors $u^{(1)}_j, u^{(2)}_j, \dots, u^{(S)}_j$ by $\lambda_j$ without changing the result. However, in the symmetric case such rescaling is not always possible : take for example an even-order tensor and $\lambda_j = -1$.

Nonnegative CP Decomposition. Another type of the CP decomposition is the nonnegative canonical polyadic (NCP) decomposition of a nonnegative tensor $\mathcal{T} \in \mathbb{R}^{M_1 \times M_2 \times \cdots \times M_S}$, which is defined as

\[
\mathcal{T} = \sum_{j=1}^{F} \lambda_j\, u^{(1)}_j \otimes u^{(2)}_j \otimes \cdots \otimes u^{(S)}_j, \qquad (2.11)
\]

where all factors are nonnegative, i.e. $u^{(s)}_j \geq 0$ and $\lambda_j \geq 0$, for all $j \in [F]$ and $s \in [S]$ [Carroll et al., 1989, Krijnen and Ten Berge, 1991, Paatero, 1997, Shashua and Hazan, 2005, Lim and Comon, 2009]. Note that, without loss of generality, the vectors $u^{(s)}_j$ can be restricted to belong to the probability simplex, i.e. $u^{(s)}_j \in \Delta_{M_s}$, for all $s \in [S]$. The nonnegative CP decomposition (2.11) is a straightforward extension of nonnegative matrix factorization [Paatero and Tapper, 1994, Lee and Seung, 1999] to tensors. Moreover, since NMF is equivalent 1 to probabilistic latent semantic indexing (pLSI ; see Section 1.2.3) [Gaussier and Goutte, 2005, Ding et al., 2006, 2008], and since the NCP decomposition is an extension of NMF to tensors, the nonnegative CP decomposition (2.11) has a probabilistic interpretation as an extension of pLSI to more than two random variables [Lim and Comon, 2009].

1. The KL-divergence based NMF objective is equivalent to the maximum likelihood objective of pLSI.

[Figure 2-2 – The probabilistic interpretation of NCP (multi-view pLSI) : a latent variable $z$ with three conditionally independent observed views $w^{(1)}$, $w^{(2)}$, $w^{(3)}$.]

Importantly, the best low-rank approximation problem for the nonnegative CP decomposition is well-posed [De Silva and Lim, 2008, Comon et al., 2008, Lim and Comon, 2009], i.e. a solution always exists. Some CP uniqueness results also extend to nonnegative tensors [Qi et al., 2016].

Probabilistic Interpretation of the NCP Decomposition. We discussed pLSI in Section 1.2.3, where we mentioned that pLSI admits a straightforward extension to the multi-view case via its symmetric parametrization (see also Figure 1-4a). Indeed, one just needs to add more observed variables (views), which are conditionally independent of the other variables given the latent variable (i.e., the naïve Bayes hypothesis) [Lim and Comon, 2009]. We illustrate such an extension to the case of three observed variables in Figure 2-2. Extending the pLSI generative model to this case of three views, we directly obtain the following marginal distribution
\[
p\big(w^{(1)}, w^{(2)}, w^{(3)}\big) = \sum_{z} p(z)\, p\big(w^{(1)} \mid z\big)\, p\big(w^{(2)} \mid z\big)\, p\big(w^{(3)} \mid z\big),
\]
where $p(z)$ and $p\big(w^{(j)} \mid z\big)$, for $j = 1, 2, 3$, are some discrete distributions. Therefore, this probability mass function can be rewritten as a non-negative CP decomposition, provided we “store” the distributions $p\big(w^{(j)} \mid z_k\big)$, for all $j$ and $k$, in matrices $D^{(j)} \in \mathbb{R}^{M_j \times K}$. Just as the minimization of the KL divergence in NMF is equivalent to the maximization of the likelihood in pLSI, the same equivalence holds between NCP and the three-view extension of pLSI : although the formulations of the problems are equivalent, different algorithms can be used for solving the factorization or the estimation and inference problems. This probabilistic interpretation of NCP clearly has straightforward extensions to more than three views (to tensors of order higher than three).
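The following minimal sketch constructs such a three-view joint probability table from $p(z)$ and the conditionals $D^{(j)}$ and confirms that it is a valid (nonnegative, normalized) CP tensor of the form (2.11) (assuming numpy ; dimensions are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
K, M1, M2, M3 = 3, 4, 5, 6
pz = rng.dirichlet(np.ones(K))                                            # p(z)
D1, D2, D3 = (rng.dirichlet(np.ones(M), size=K).T for M in (M1, M2, M3))  # columns: p(w^(j) | z_k)

# Joint pmf p(w1, w2, w3) = sum_k p(z_k) p(w1|z_k) p(w2|z_k) p(w3|z_k): a nonnegative CP tensor.
P = np.einsum('k,ik,jk,lk->ijl', pz, D1, D2, D3)
assert P.min() >= 0 and np.isclose(P.sum(), 1.0)
```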

2.1.3 Tensor Rank and Low-Rank Approximation

Tensor Rank. The tensor rank is defined as the minimal number $R$ of rank-1 terms in the CP decomposition (2.8) of a tensor [Hitchcock, 1927a, Kruskal, 1977]. The tensor

rank always exists and is well defined [see, e.g., Comon et al., 2009a]. Although the definition of the tensor rank is an exact extension of the definition of the matrix rank, the properties are quite different. For example, the rank of a real-valued tensor may actually be different over $\mathbb{R}$ and $\mathbb{C}$ [see, e.g., Kruskal, 1989, Ten Berge, 1991, Kolda and Bader, 2009]. Moreover, there does not exist a straightforward algorithm to determine the rank of a specific given tensor ; in fact, the problem is NP-hard [Håstad, 1990, Hillar and Lim, 2013]. Due to this complexity of the tensor rank, other types of tensor ranks are also introduced in the literature.

The symmetric tensor rank is defined as the minimal number 푅푠푦푚 of rank-1 terms in the symmetric CP decomposition (2.10) of a tensor [Comon et al., 2008]. It is clear that 푅푠푦푚 ≥ 푅 for any symmetric tensor 풯 , since any constraint on decomposable tensor may only increase the rank. However, it has been conjectured by Comon et al. [2008] that the rank and the symmetric rank are always equal, but this has not yet been proved in the general case.

The nonnegative tensor rank is defined as the minimal number $R_n$ of nonnegative rank-1 terms in the nonnegative CP decomposition (2.11) of a tensor [Lim and Comon, 2009]. The nonnegative rank is generally strictly larger than the rank. This is already the case for matrices, i.e. order-2 tensors [Comon, 2014]. A natural generalization of the matrix row and column rank to tensors is the mode-$s$ rank of a tensor. The mode-$s$ rank of a tensor $\mathcal{T}$ is the dimension of the space spanned 2 by the mode-$s$ fibers of this tensor. We denote the mode-$s$ rank of a tensor $\mathcal{T}$ as $R_s$. For matrices, the row rank is equal to the column rank and is equal to the number of rank-1 terms in the SVD, where the latter is the equivalent of the tensor rank $R$. For tensors, however, the numbers $R, R_1, R_2, \dots, R_S$ can all be different. Therefore, the $S$-tuple $(R_1, R_2, \dots, R_S)$ of all the mode-$s$ ranks of a tensor, which is called the multirank of this tensor, is of interest as well. Generic (resp. typical) ranks are the ranks that one encounters with probability one (resp. nonzero probability) when the entries of a tensor are drawn independently according to some continuous distribution [Lickteig, 1985, Comon, 2002, Comon and Ten Berge, 2008, Comon et al., 2009b, Comon, 2014]. For matrices, the generic and typical rank are equal to $\min(M_1, M_2)$. For a higher-order tensor, a generic rank over the real numbers does not necessarily exist (although over the complex numbers it always exists) and both generic and typical ranks can be strictly greater than $\min(R_1, R_2, \dots, R_S)$ and, in general, are hard to compute. This is another striking difference of the tensor rank from the matrix rank. The generic and typical ranks for some order-3 tensors of small dimensions have been determined by Comon and Ten Berge [2008], Comon et al. [2009b]. The probabilities for the typical rank of some tensors have been studied with numerical simulations. For example, the rank of a $2 \times 2 \times 2$ tensor with elements drawn as independent and identically distributed standard normal random variables $\mathcal{N}(0,1)$ is equal to 2 with probability $\pi/4$, and to 3 with probability $1 - \pi/4$

2. Recall that a mode-$s$ fiber of a tensor $\mathcal{T} \in \mathbb{R}^{M_1 \times \cdots \times M_S}$ is the $M_s$-dimensional vector obtained from $\mathcal{T}$ by varying the index $m_s \in [M_s]$ while keeping all other indices fixed.

[Kruskal, 1989]. For this particular setting, the exact values of the probabilities were also derived by Bergqvist [2013].

Finally, another useful notion of tensor rank is the border rank which is defined as the minimum number of rank-1 terms that are sufficient to approximate the given tensor with arbitrary small nonzero error. This notion of border rank arises when considering the best low-rank approximation of tensors (see below) and provides an explanation of the ill-posedness of this problem. The idea behind this problem is that a series of rank-푆 tensors can converge to a rank-(푆 + 1) tensor, i.e. the space of rank-푆 tensors is not closed [Kruskal et al., 1989]. Once more, the rank and border rank always coincide for matrices, but they do not generally coincide for higher-order tensors. An exception here is the nonnegative CP decomposition, where the nonnegative border rank coincides with the nonnegative rank [Lim and Comon, 2009] and, therefore, the low-rank approximation problem (see below) is well-posed in this case.

Tensor Low-rank Approximation. In practice, matrices or tensors are often ex- pected to have approximate low rank structure. For example, as we will see in Sec- tions 2.2.2 and 2.2.3, population higher-order statistics of some models are tensors with a low-rank CP structure (given the undercomplete setting). However, their sample estimators are perturbed by noise, which poses the question of the approxima- tion rather than exact estimation. The low-rank approximation is of special interest because, as we will see later, having high rank in the approximation often does not guarantee the uniqueness of the decomposition [see, e.g., Comon et al., 2009a].

The best low-rank approximation problem for matrices is well-posed and the solution can be easily obtained through the SVD. Indeed, recall that the best low-rank approximation of a rank-$R$ matrix $A \in \mathbb{R}^{M_1 \times M_2}$ by a matrix $\widehat{A} \in \mathbb{R}^{M_1 \times M_2}$ of rank $K < R$ is defined as

\[
\widehat{A} = \operatorname*{arg\,min}_{B \in \mathbb{R}^{M_1 \times M_2}} \|A - B\|_F^2 \quad \text{subject to} \quad \operatorname{rank}[B] \leq K. \qquad (2.12)
\]
By the renowned Eckart-Young theorem [Eckart and Young, 1936], the solution to this problem is directly obtained from the truncated singular value decomposition, that is $\widehat{A} = \sum_{k=1}^{K} \sigma_k\, u_k \otimes v_k$, where $\sigma_1, \dots, \sigma_K$ are the $K$ largest singular values of the matrix $A$ and $u_k$ and $v_k$, for $k \in [K]$, are the respective left and right singular vectors.
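A minimal numerical illustration of the Eckart-Young theorem, assuming numpy : the truncated SVD attains the best rank-$K$ Frobenius error, which equals the norm of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 6)) @ rng.normal(size=(6, 7))    # a matrix of rank (at most) 6
K = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_hat = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]            # truncated SVD: best rank-K approximation (2.12)

err = np.linalg.norm(A - A_hat)                          # Frobenius error of the approximation
print(err, np.sqrt(np.sum(s[K:] ** 2)))                  # the two values coincide
```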

Similarly to the matrix case, given a fixed value 퐾 < 푅 = rank[풯 ], one can be interested in approximating this tensor 풯 by a low rank-퐾 tensor. A natural way to define this problem for a given 풯 is as follows :

\[
\inf\; \Bigg\| \mathcal{T} - \sum_{k=1}^{K} u^{(1)}_k \otimes u^{(2)}_k \otimes \cdots \otimes u^{(S)}_k \Bigg\|^2, \qquad (2.13)
\]

where the (Frobenius) norm $\|\mathcal{T}\|$ of a tensor $\mathcal{T} \in \mathbb{R}^{M_1 \times M_2 \times \cdots \times M_S}$ is defined as

\[
\|\mathcal{T}\|^2 = \sum_{m_1=1}^{M_1} \cdots \sum_{m_S=1}^{M_S} \mathcal{T}^2_{m_1 m_2 \dots m_S}, \qquad (2.14)
\]
which is essentially an equivalent of the matrix Frobenius (or Euclidean) norm. 3 The major difficulty with the problem (2.13) is that there exist tensors $\mathcal{T}$ for which this infimum is not attained. This failure can occur with positive probability over a wide range of dimensions, orders, and ranks, and for any continuous measure of proximity (this includes all norms and Bregman divergences) used instead of the Frobenius norm [Kruskal et al., 1989, De Silva and Lim, 2008, Comon et al., 2009a]. Moreover, such failures can occur with positive probability and in some cases with certainty, i.e. the infimum in (2.13) is then never attained. From the algorithmic point of view, the ill-posedness of the low-rank tensor decomposition is a serious issue, since what can we compute when a problem does not have a solution ? In practice, this ill-conditioning shows up in the form of rank-1 summands that grow unbounded in magnitude while their sum remains bounded [Bini et al., 1979, 1980, Paatero, 2000, Kruskal et al., 1989, Comon et al., 2009a, Lim and Comon, 2009]. This phenomenon is known in the literature under the name of CP degeneracy. However, there are several exceptional cases where the low-rank tensor approximation problem is known to be well-posed. The first case is the best rank-1 tensor approximation problem [De Lathauwer et al., 2000]. This is also true in the case of symmetric tensors : the best symmetric rank-1 tensor approximation problem is well posed [Kofidis and Regalia, 2002]. Note that the best rank-1 approximation of a symmetric tensor is symmetric [Comon et al., 2008]. The second case where the CP decomposition is well posed is the case of matrices (order-2 tensors), where the low-rank approximation can always be found by the Eckart-Young theorem [Eckart and Young, 1936] (given all singular values are distinct). The third case is the nonnegative CP decomposition, which is always well posed [Lim and Comon, 2009]. However, in general, the best low-rank approximation problem is ill-posed for tensors, including the symmetric case [Comon et al., 2008, De Silva and Lim, 2008]. Moreover, computing the best low-rank approximation for tensors, even the best rank-1 approximation, is NP-hard in general [Hillar and Lim, 2013].

2.1.4 CP Uniqueness and Identifiability

As we have already mentioned, each rank-1 term in the CP decomposition is not uniquely represented by an outer product of vectors ; moreover, the order of the rank-1 terms can be arbitrarily changed. Therefore, one is more interested in the uniqueness of the rank-1 terms, and the scalar factors $\lambda_j$ (if any), in the CP decomposition, which is sometimes referred to as essential uniqueness or essential identifiability (up to permutation and scaling) [Comon, 2014].

3. In fact, the norm in (2.14) is the Euclidean norm of any unfolding of the tensor 풯 .

Sufficient Condition. The essential uniqueness of the CP decomposition (2.8) of a tensor $\mathcal{T}$ is ensured if the sufficient condition below is satisfied [Kruskal, 1977, Sidiropoulos and Bro, 2000, Stegeman and Sidiropoulos, 2007, Comon et al., 2009a] :

\[
\sum_{s=1}^{S} \operatorname{rank}_k\big(U^{(s)}\big) \geq 2R + S - 1,
\]
where $R$ is the rank of the tensor and the Kruskal rank or $k$-rank of a matrix $A$, denoted $\operatorname{rank}_k(A)$, is the maximal number $r$ such that any set of $r$ columns of $A$ is linearly independent [Kruskal, 1977]. With probability one, this condition is equivalent to
\[
\sum_{s=1}^{S} \min\,[M_s, F] \geq 2F + S - 1,
\]
where $M_s$ denotes the $s$-th dimension of $\mathcal{T}$ and $F$ is the number of rank-1 terms in the CPD of $\mathcal{T}$. This condition is not necessary. Some attempts to relax this condition, for special types of tensors or in general, were made by De Lathauwer [2006], Comon et al. [2009a], Domanov and De Lathauwer [2013a,b, 2014, 2016]. In the symmetric tensor case, the symmetric CP decomposition is essentially unique with probability 1 if the dimension does not exceed the order [Mella, 2006] ; however, this condition is quite restrictive as well.

Consequences. This condition basically says that if one tries to approximate a tensor by a sum of rank-1 terms, there is an upper limit on the number of terms in such an approximation which guarantees uniqueness. This limit can be increased in some special cases (e.g. with additional sparsity constraints) ; however, in general, this result is a motivation for a low-rank tensor approximation. We very briefly discuss some of the algorithms for the computation of the CP decomposition in Section 2.3.
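As an illustration of Kruskal's sufficient condition, the sketch below computes the $k$-rank of a matrix by brute force and checks the condition for randomly drawn order-3 factors (assuming numpy ; brute force is only feasible for small matrices).

```python
import numpy as np
from itertools import combinations

def kruskal_rank(A, tol=1e-10):
    """Largest r such that every set of r columns of A is linearly independent."""
    F = A.shape[1]
    for r in range(1, F + 1):
        if any(np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < r
               for cols in combinations(range(F), r)):
            return r - 1
    return F

rng = np.random.default_rng(0)
F, S = 4, 3
U = [rng.normal(size=(M, F)) for M in (5, 6, 3)]        # generic factor matrices U^(s)
lhs = sum(kruskal_rank(Us) for Us in U)                 # here: 4 + 4 + 3 = 11
print(lhs, ">=", 2 * F + S - 1, "->", lhs >= 2 * F + S - 1)
```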

2.2 Higher Order Statistics

In this section, we demonstrate how the CP decomposition of a tensor is related to estimation and inference in latent linear models. In fact, the population higher-order statistics of some latent linear models (see, e.g., Sections 2.2.2 and 2.2.3) are tensors in the CP form, with the factors of this CP decomposition being the linear transformation matrix (or matrices). Approximating such a decomposition of the sample estimators of these higher-order statistics (see Section 2.3 for some algorithms) thus allows one to perform the estimation in these models (see Sections 2.4.2, 3.4, and 4.5).

2.2.1 Moments, Cumulants, and Generating Functions

Higher-order moments and higher-order cumulants are examples of higher-order statistics. There is a one-to-one correspondence between moments and cumulants. Moreover, cumulants are often introduced as functions of moments. Despite the simpler definition of higher-order moments, most statistical calculations using cumulants

are easier than most statistical computations using moments, especially when dealing with independent random variables. Hence the importance of cumulants. In this section, we recall the definitions of moments, cumulants and closely related concepts, such as the moment- and cumulant-generating functions as well as the first and second characteristic functions. Higher-order cumulants of ICA (see Section 2.2.2), higher-order moment-based statistics of LDA (see Section 2.2.3), as well as higher-order cumulant-based statistics of other models (see Sections 3.3.2 and 4.4.1) are tensors in the form of (sometimes symmetric and sometimes non-negative) CP decompositions. The cumulant generating function is used for defining the generalized cumulants (in Section 4.4) and the proof of the identifiability results for the models introduced in Chapter 4 is based on the second characteristic function and its properties (see Section 4.3.2). For details on the topics covered in this section, see, e.g., McCullagh [1987], Stuart and Ord [1994], De Lathauwer [2010].

Moments. The order-$S$ moment tensor $\mu^{(S)}_x \in \mathbb{R}^{M \times M \times \cdots \times M}$ of an $\mathbb{R}^M$-valued random variable $x$ is defined element-wise as

\[
\big[\mu^{(S)}_x\big]_{m_1 m_2 \dots m_S} := \mathbb{E}\,(x_{m_1} x_{m_2} \cdots x_{m_S}), \qquad (2.15)
\]
for all $m_1, m_2, \dots, m_S \in [M]$. Note that the order-1 moment of a random vector $x$ is equal to its mean, i.e. $\mu^{(1)}_x = \mathbb{E}(x)$. Just as the expectation of a random variable, some random variables do not have finite moments and sometimes moments are not defined at all. For example, no mean or higher moments exist for the Cauchy distribution. Another example is the Student's t-distribution with infinite moments of order 3 or higher.

The Moment-Generating Function. The moment-generating function (MGF) of an $\mathbb{R}^M$-valued random variable $x$ is

\[
\mathrm{M}_x(t) := \mathbb{E}\big(e^{t^\top x}\big), \qquad (2.16)
\]
for any $t \in \mathbb{R}^M$. The MGF of a random variable does not always exist (since the expectation may not be finite). When the MGF is finite in a neighborhood of zero, it uniquely determines the probability distribution, i.e. if two random variables have the same MGF, they have the same distribution (except possibly at a countable number of points having 0 probability). The Taylor series expansion of the MGF at zero has the following form

\[
\mathrm{M}_x(t) = \sum_{j=0}^{\infty} \frac{1}{j!}\, \nabla^j \mathrm{M}_x(0) \times_1 t \times_2 t \cdots \times_j t
= 1 + \langle t, \nabla \mathrm{M}_x(0) \rangle + \frac{1}{2!} \big\langle t, \nabla^2 \mathrm{M}_x(0)\, t \big\rangle + \dots \qquad (2.17)
\]

where $\langle \cdot, \cdot \rangle$ is the inner product, i.e. $\langle a, b \rangle = a^\top b$, the mode-$s$ product $\times_s$ of a tensor with a vector is defined in (2.2) (note that $K = 1$ for a vector), and the derivatives

of the MGF are the moments, i.e. $\nabla^S \mathrm{M}_x(0) = \mu^{(S)}_x$ for any $S = 1, 2, 3, \dots$. Many distributions, e.g., normal, exponential, Poisson, etc., are uniquely defined by their moments. Such distributions are called M-determinate. However, unlike the MGF, two (or even infinitely many) distinct distributions may have the same moments. Such distributions are called M-equivalent and each of them is M-indeterminate. One of the best known M-indeterminate distributions is the log-normal distribution [see, e.g., Heyde, 1963]. Indeed, the log-normal distribution

\[
p(x) := \frac{1}{x \sqrt{2\pi}} \exp\!\left( -\frac{(\ln x)^2}{2} \right), \qquad x > 0,
\]
and the “perturbed” log-normal distribution

\[
q(x) := p(x)\,\big(1 + \sin(2\pi \ln x)\big), \qquad x > 0,
\]

have the same moments [see also Durrett, 2013, Chapter 3]. Another example of M- indeterminate distributions is the cube of the Laplace distribution [Stoyanov, 2006]. For more details and examples see, e.g., Stoyanov[2006], Lin and Stoyanov[2009], Durrett[2013] and references therein. Note however that non-uniqueness occurs only when the MGF Mx(t) is not analytic at zero [McCullagh, 1987, Section 2.2.1].

The First Characteristic Function. The first characteristic function of an R푀 - valued random variable x is defined as

\[
\varphi_x(t) := \mathbb{E}\big(e^{\,i\, t^\top x}\big), \qquad (2.18)
\]

for any t ∈ R푀 , where 푖 is the imaginary unit. From the definition in (2.18) it follows that the first characteristic function 휑x(t) is the Fourier transform of the probability measure (if the random variable x has a probability density). Unlike the moment-generating function, the first characteristic function always exists. Moreo- ver, in accordance with the uniqueness theorem [see, e.g., Jacod and Protter, 2004, Durrett, 2013], the Fourier transform of a probability measure on R푀 characterizes this probability measure. This means that the first characteristic function of any real- valued random variable completely defines its probability distribution. If a random variable admits a probability density function (PDF), then there is a one-to-one cor- respondence between this PDF and the first characteristic function : the latter is the inverse Fourier transform of the former.

Cumulants. The order-$S$ cumulant tensor $\kappa^{(S)}_x \in \mathbb{R}^{M \times M \times \cdots \times M}$ of an $\mathbb{R}^M$-valued random variable $x$ is defined element-wise as
\[
\big[\kappa^{(S)}_x\big]_{m_1 m_2 \dots m_S}
:= \sum (-1)^{s-1} (s-1)!\;
\mathbb{E}\Big[\prod_{j \in P_1} x_j\Big]\,
\mathbb{E}\Big[\prod_{j \in P_2} x_j\Big] \cdots
\mathbb{E}\Big[\prod_{j \in P_s} x_j\Big], \qquad (2.19)
\]

where the summation involves all possible partitions [푃1, 푃2, . . . , 푃푠], for 1 ≤ 푠 ≤ 푆, of the integers {푚1, 푚2, . . . , 푚푆}. Note that the first cumulant is equal to the mean; the second cumulant is equal to the covariance matrix ; the third cumulant is equal

to the third central moment :

\[
\kappa^{(1)}_x = \mathbb{E}(x), \qquad
\kappa^{(2)}_x = \operatorname{cov}(x), \qquad
\kappa^{(3)}_x = \mathbb{E}\big[(x - \mathbb{E}(x)) \otimes (x - \mathbb{E}(x)) \otimes (x - \mathbb{E}(x))\big]. \qquad (2.20)
\]
However, for order 4 and higher, the cumulants are different from the central moments and the interpretation of cumulants of order higher than 3 is not straightforward. For example, the order-4 cumulant is equal to

\[
\big[\kappa^{(4)}_x\big]_{m_1 m_2 m_3 m_4}
= \mathbb{E}[\bar{x}_{m_1} \bar{x}_{m_2} \bar{x}_{m_3} \bar{x}_{m_4}]
- \mathbb{E}[\bar{x}_{m_1} \bar{x}_{m_2}]\,\mathbb{E}[\bar{x}_{m_3} \bar{x}_{m_4}]
- \mathbb{E}[\bar{x}_{m_1} \bar{x}_{m_3}]\,\mathbb{E}[\bar{x}_{m_2} \bar{x}_{m_4}]
- \mathbb{E}[\bar{x}_{m_1} \bar{x}_{m_4}]\,\mathbb{E}[\bar{x}_{m_2} \bar{x}_{m_3}], \qquad (2.21)
\]
where $\bar{x}_m$ stands for the centered $m$-th element of the random vector $x$, i.e. $\bar{x}_m = x_m - \mathbb{E}[x_m]$. The cumulants of a set of random variables give an indication of their mutual statistical dependence (as we will see below, completely independent variables have zero cross-cumulants) and the higher-order cumulants of a single random variable are a measure of its non-Gaussianity, since the cumulants of a Gaussian random variable are all equal to zero for $S > 2$.

The (푚, 푚′)-th element of a covariance matrix cov(x), which is an order-2 cumulant, is sometimes called cross-covariance if 푚 ̸= 푚′. By analogy, we sometimes refer to non-diagonal elements of higher-order cumulants as cross-cumulants.

The Cumulant-Generating Function. The cumulant-generating function (CGF) of an R푀 -valued random variable x is defined as

\[
\mathrm{K}_x(t) := \log \mathbb{E}\big(e^{t^\top x}\big), \qquad (2.22)
\]
for any $t \in \mathbb{R}^M$. That is, the CGF is the logarithm of the MGF defined in (2.16). Just as moments are the coefficients of the Taylor series of the moment-generating function evaluated at zero, cumulants are the coefficients of the Taylor series of the cumulant-generating function evaluated at zero :

\[
\kappa^{(s)}_x = \nabla^s \mathrm{K}_x(0), \qquad s = 1, 2, \dots \qquad (2.23)
\]

Note that this way to define cumulants will be important in Chapter 4, where we introduce the so-called generalized cumulants. Since there is a one-to-one correspondence between moments and cumulants, the discussion related to the uniqueness of moments extends directly to cumulants. Marcinkiewicz [1939] showed that the normal distribution is the only distribution whose cumulant generating function is a polynomial, i.e. the only distribution having a finite number of non-zero cumulants (we will use this property in the proof of the identifiability theorem from Section 4.3.2 in Appendix 4.3.3). The MGF of a Poisson random variable $x$ with mean $\lambda$ is $\mathrm{M}_x(t) = \exp[\lambda(e^t - 1)]$ ; and the CGF of this random variable is $\mathrm{K}_x(t) = \lambda(e^t - 1)$. Consequently, all the cumulants of a Poisson random variable are

equal to the mean (we will use this property for the derivation of the DICA cumulants in Section 3.3.2).

The Second Characteristic Function. The second characteristic function of an R푀 -valued random variable x is defined as

\[
\psi_x(t) := \log \mathbb{E}\big(e^{\,i\, t^\top x}\big), \qquad (2.24)
\]

for any $t \in \mathbb{R}^M$. Hence, the second characteristic function is the logarithm 4 of the first characteristic function $\varphi_x(t)$ defined in (2.18) and, therefore, it also uniquely defines a distribution. The second characteristic function is the key tool for the identifiability proof in Chapter 4.

Properties of Cumulants. Since the definition of higher-order moments is somewhat simpler, they might seem more interesting at first sight than higher-order cumulants. However, cumulants have a number of important properties which they do not share with moments. Some of these properties are as follows [Nikias and Mendel, 1993, Nikias and Petropulu, 1993, De Lathauwer, 2010] :

1. Symmetry : real moments $\mu^{(S)}_x$ and cumulants $\kappa^{(S)}_x$ are symmetric tensors (see (2.4) for the definition), i.e. they remain unchanged under any permutation of their indices.

2. Multilinearity : if an $\mathbb{R}^M$-valued random variable $x$ and an $\mathbb{R}^K$-valued random variable $\alpha$ are linearly related, i.e. $x = D\alpha$ for some $D \in \mathbb{R}^{M \times K}$, then the moments $\mu^{(S)}_x$ and cumulants $\kappa^{(S)}_x$, for all $S = 1, 2, \dots$, are multilinear :

\[
\mu^{(S)}_x = \mu^{(S)}_\alpha \times_1 D^\top \times_2 D^\top \cdots \times_S D^\top, \qquad
\kappa^{(S)}_x = \kappa^{(S)}_\alpha \times_1 D^\top \times_2 D^\top \cdots \times_S D^\top. \qquad (2.25)
\]

Let $D$ be a basis transformation ($M = K$). Then (2.25) is the rule according to which moments and cumulants change under basis transformations. This is exactly the multilinearity property of tensors in (2.3) and, therefore, is the reason we can refer to moments and cumulants as tensors.

3. Even distribution : if a real random variable $x$ has an even probability density function $p_x(x)$, i.e., $p_x(x)$ is symmetric about the origin, then the odd moments, $\mu^{(S)}_x$, and cumulants, $\kappa^{(S)}_x$, for $S = 1, 3, 5, \dots$, of $x$ vanish.

4. Independence : if all the elements of an $\mathbb{R}^K$-valued random variable $\alpha$ are mutually independent, then the order-$S$ cumulant $\kappa^{(S)}_\alpha$ of this variable is diagonal for all $S = 1, 2, 3, \dots$. For example, for the order-2 cumulant (i.e. the covariance matrix) it holds that
\[
\kappa^{(2)}_\alpha = \operatorname{cov}(\alpha) = \operatorname{Diag}[\operatorname{var}(\alpha)] \qquad (2.26)
\]

4. Note that the complex logarithm in general is not uniquely defined. It is common to choose the principal value of log(푧), where 푧 = 푥 + 푖푦 is a complex number, as the logarithm whose imaginary part lies in the interval [−휋, 휋].

and in general for the order-$S$ cumulant (element-wise) one has

\[
\big[\kappa^{(S)}_\alpha\big]_{m_1 m_2 \dots m_S} = \delta(m_1, m_2, \dots, m_S)\; \kappa^{(S)}_{\alpha_{m_1}}, \qquad (2.27)
\]
(here $\kappa^{(S)}_{\alpha_{m_1}}$ denotes the order-$S$ cumulant of the scalar random variable $\alpha_{m_1}$),

where 훿 is the Kronecker delta, i.e. 훿(푚1, 푚2, . . . , 푚푆) = 1 if all indices are equal 푚1 = 푚2 = ··· = 푚푆 and 0 otherwise. Such property in general does not hold for moments. 5. Sum of independent variables : if an R푀 -valued random variable x and an R푀 -valued random variable y are independent, i.e. x ⊥⊥ y then the cumulant of their sum is equal to the sum of cumulants :

\[
\kappa^{(S)}_{x+y} = \kappa^{(S)}_x + \kappa^{(S)}_y, \qquad S = 1, 2, 3, \dots \qquad (2.28)
\]

This property does not hold for moments either. 6. non-Gaussianity : if 푦 is a Gaussian variable with the same mean and variance as a given random variable 푥, then for 푆 ≥ 3 it holds :

\[
\kappa^{(S)}_x = \mu^{(S)}_x - \mu^{(S)}_y. \qquad (2.29)
\]

As a consequence, higher-order cumulants of a Gaussian random variable are all zero. Therefore, in combination with the multilinearity property, this leads to an interesting property : higher-order cumulants do not change under additive independent Gaussian noise. That is, if $z$ is a Gaussian random variable independent of $x$, it holds that $\kappa^{(S)}_{x+z} = \kappa^{(S)}_x$ for all $S \geq 3$.

7. The law of total cumulance : for two $\mathbb{R}^M$-valued random variables $x$ and $y$ the following laws hold. For $S = 1$, the law of total expectation, also known under the name of the tower property :

\[
\big[\kappa^{(1)}_x\big]_m = \mathbb{E}[x_m] = \mathbb{E}[\mathbb{E}(x_m \mid y)]. \qquad (2.30)
\]
For $S = 2$, the law of total covariance :

\[
\big[\kappa^{(2)}_x\big]_{m_1 m_2} = \operatorname{cov}(x_{m_1}, x_{m_2})
= \mathbb{E}[\operatorname{cov}(x_{m_1}, x_{m_2} \mid y)] + \operatorname{cov}[\mathbb{E}(x_{m_1} \mid y), \mathbb{E}(x_{m_2} \mid y)]. \qquad (2.31)
\]
For $S = 3$, the law of total cumulance :

\[
\begin{aligned}
\big[\kappa^{(3)}_x\big]_{m_1 m_2 m_3}
&= \mathbb{E}[\operatorname{cum}(x_{m_1}, x_{m_2}, x_{m_3} \mid y)]
+ \operatorname{cum}[\mathbb{E}(x_{m_1} \mid y), \mathbb{E}(x_{m_2} \mid y), \mathbb{E}(x_{m_3} \mid y)] \\
&\quad + \operatorname{cov}[\mathbb{E}(x_{m_1} \mid y), \operatorname{cov}(x_{m_2}, x_{m_3} \mid y)]
+ \operatorname{cov}[\mathbb{E}(x_{m_2} \mid y), \operatorname{cov}(x_{m_1}, x_{m_3} \mid y)] \\
&\quad + \operatorname{cov}[\mathbb{E}(x_{m_3} \mid y), \operatorname{cov}(x_{m_1}, x_{m_2} \mid y)]. \qquad (2.32)
\end{aligned}
\]

One of the key differences between the MGF and the CGF (similar expressions can be written for the first and second characteristic functions) is as follows. The MGF of a sum of two independent random variables $\alpha_1$ and $\alpha_2$ is equal to the product of their MGFs : $\mathrm{M}_{\alpha_1 + \alpha_2}(t) = \mathrm{M}_{\alpha_1}(t)\, \mathrm{M}_{\alpha_2}(t)$. The CGF of a sum of two independent random variables $\alpha_1$ and $\alpha_2$, however, is equal to the sum of their CGFs : $\mathrm{K}_{\alpha_1 + \alpha_2}(t) = \mathrm{K}_{\alpha_1}(t) + \mathrm{K}_{\alpha_2}(t)$. This explains why cumulants, but not moments, of independent variables are diagonal. The CGF is separable and taking cross-derivatives with respect to the arguments associated with $\alpha_1$ and $\alpha_2$ enforces zero cross-terms ; this does not hold for the MGF (we will make extensive use of this property in Chapter 4, e.g., when deriving the CP form of the so-called generalized cumulants for multi-view linear models in Section 4.4.2).
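Properties 5 and 6 above are easy to check numerically with $k$-statistics (unbiased cumulant estimators). A minimal sketch, assuming numpy and scipy ; the particular distributions are arbitrary.

```python
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(0)
n = 1_000_000
a1 = rng.exponential(scale=2.0, size=n)       # two independent non-Gaussian variables
a2 = rng.gamma(shape=3.0, scale=1.0, size=n)

# Property 5: order-3 cumulants of independent variables add up under summation.
print(kstat(a1 + a2, 3), kstat(a1, 3) + kstat(a2, 3))   # close, up to sampling error

# Property 6: order-3 and order-4 cumulants of a Gaussian variable are (close to) zero.
g = rng.normal(size=n)
print(kstat(g, 3), kstat(g, 4))
```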

2.2.2 CPD of ICA Cumulants

In this section, we present a well-known result that the higher-order population cumulants of independent component analysis (ICA) are tensors in the form of a symmetric CP decomposition (plus potentially some noise). The CP factors in this decomposition are given by the mixing matrix and, therefore, this structure can be used for the estimation in the ICA model. Some references to the ICA methods based on this idea include, but are not limited to, Cardoso [1989, 1990], Cardoso and Souloumiac [1993], Souloumiac [1993], Cardoso [1999], De Lathauwer [2010]. Note that this is only one of many possible approaches to the estimation in the ICA model [see, e.g., Comon and Jutten, 2010] ; the details are beyond the scope of this thesis.

ICA cumulants. By the multilinearity (Property 2) and the sum of independent variables (Property 5) properties of cumulants from Section 2.2, the covariance and higher-order cumulants of the noisy ICA model (1.9) take the form :

\[
\operatorname{cov}(x) = D \operatorname{cov}(\alpha)\, D^\top + \operatorname{cov}(\varepsilon), \qquad (2.33)
\]
\[
\kappa^{(S)}_x = \kappa^{(S)}_\alpha \times_1 D^\top \times_2 D^\top \cdots \times_S D^\top + \kappa^{(S)}_\varepsilon, \qquad S = 3, 4, \dots, \qquad (2.34)
\]

where the covariance $\operatorname{cov}(\alpha)$ and the higher-order cumulants $\kappa^{(S)}_\alpha$ of the sources are diagonal by the independence property (Property 4) of cumulants, since the sources are independent. Furthermore, a higher-order cumulant $\kappa^{(S)}_\varepsilon$, for $S = 3, 4, \dots$, of the additive noise vanishes whenever the noise is Gaussian, by the non-Gaussianity property (Property 6). In the noiseless version (1.10) of ICA the noise is assumed to be zero and the covariance and higher-order cumulants take the form

\[
\operatorname{cov}(x) = D \operatorname{cov}(\alpha)\, D^\top, \qquad (2.35)
\]
\[
\kappa^{(S)}_x = \kappa^{(S)}_\alpha \times_1 D^\top \times_2 D^\top \cdots \times_S D^\top, \qquad S = 3, 4, \dots, \qquad (2.36)
\]

where $\operatorname{cov}(\alpha)$ and $\kappa^{(S)}_\alpha$ are diagonal. Therefore, the population ICA covariance (2.35) and cumulants (2.36) are represented as a (weighted) sum of rank-1 terms, which, in fact, is the symmetric CP decomposition (2.10) described in Section 2.1.2 (see the illustration in Figure 2-3). In Section 2.4.2, we discuss how to use this decomposition for

the estimation in the ICA model.

[Figure 2-3 – The ICA population covariance and order-3 cumulant : (a) the ICA population covariance matrix (2.33) ; (b) the order-3 ICA population cumulant (2.34).]
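The CP structure of the noiseless ICA statistics is easy to verify numerically : the empirical order-3 cumulant of $x = D\alpha$ matches the sum of rank-1 terms built from the columns of $D$ and the marginal third cumulants of the sources. A minimal sketch, assuming numpy ; the source distribution and dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N = 4, 3, 200_000
alpha = rng.exponential(scale=1.0, size=(N, K)) - 1.0   # independent, non-Gaussian, zero-mean sources
D = rng.normal(size=(M, K))                             # mixing matrix
x = alpha @ D.T                                         # noiseless ICA: x = D alpha

# Empirical order-3 cumulant of x (equal to the third central moment, cf. (2.20)).
xc = x - x.mean(axis=0)
kappa3_x = np.einsum('ni,nj,nk->ijk', xc, xc, xc) / N

# CP structure: sum_k kappa3(alpha_k) d_k (outer) d_k (outer) d_k.
kappa3_alpha = ((alpha - alpha.mean(axis=0)) ** 3).mean(axis=0)   # marginal third cumulants
kappa3_cp = np.einsum('k,ik,jk,lk->ijl', kappa3_alpha, D, D, D)

print(np.max(np.abs(kappa3_x - kappa3_cp)))             # small, up to sampling error
```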

2.2.3 CPD of LDA Moments

In this section, we present a known result that a well-defined combination of higher-order population moments 5 of the LDA model (1.16), like the ICA cumulants, are tensors in the form of the symmetric CP decomposition [Anandkumar et al., 2012a, 2013a, 2015a]. In addition to being symmetric, the CP decomposition of these LDA moments is also non-negative.

LDA Moments. For deriving the LDA moments, a document is assumed to be composed of at least three tokens : $L \geq 3$. As the LDA generative model (1.16) is only defined conditionally on the length $L$, this is not too problematic. However, given that we present (see Chapter 3) models which also model $L$, we mention for clarity that we can suppose that all expectations and probabilities defined below implicitly condition on $L \geq 3$. The order-3 LDA population moment is defined [see, e.g., Anandkumar et al., 2012a] as the expectation of the outer product of the first three tokens $w_1$, $w_2$, and $w_3$ of a document :

\[
\mu^{(3)}_x := \mathbb{E}(w_1 \otimes w_2 \otimes w_3), \qquad (2.37)
\]
5. To be more exact, some higher-order statistics closely related to moments.

where we used the count vector $x$ of a document as a general reference to these three tokens. Indeed, the tokens $w_\ell$ in the LDA generative model (1.16) are conditionally independent and identically distributed given the topic intensities $\theta$ of this document, or, equivalently, the order of the tokens in a document does not matter [Blei et al., 2003]. Therefore, we can use any three tokens from this document to define the order-3 LDA moment :
\[
\mu^{(3)}_x = \mathbb{E}(w_{\ell_1} \otimes w_{\ell_2} \otimes w_{\ell_3}),
\]

for any 1 ≤ ℓ1 ̸= ℓ2 ̸= ℓ3 ≤ 퐿. To highlight this arbitrary choice of tokens and to make the links with the U-statistics estimator presented later in (2.49), we thus use generic distinct ℓ1, ℓ2, and ℓ3 in the definition of the LDA moments, instead of ℓ1 = 1, ℓ2 = 2, and ℓ3 = 3 as done by Anandkumar et al.[2012a].

Using this notation, by the law of total expectation and the properties of the Dirichlet distribution, the moments of the LDA model (1.16) take the form [Anandkumar et al., 2012a]:

\[
\mu^{(1)}_x = \mathbb{E}(w_{\ell_1}) = D\, \frac{c}{c_0}, \qquad (2.38)
\]
\[
\mu^{(2)}_x = \mathbb{E}(w_{\ell_1} \otimes w_{\ell_2})
= \frac{c_0}{c_0 + 1}\, \mu^{(1)}_x \otimes \mu^{(1)}_x + \frac{1}{c_0 (c_0 + 1)}\, D \operatorname{Diag}(c)\, D^\top, \qquad (2.39)
\]

\[
\begin{aligned}
\mu^{(3)}_x &= \mathbb{E}(w_{\ell_1} \otimes w_{\ell_2} \otimes w_{\ell_3}) \\
&= C_1 \big[ \mathbb{E}(w_{\ell_1} \otimes w_{\ell_2} \otimes \mu^{(1)}_x) + \mathbb{E}(w_{\ell_1} \otimes \mu^{(1)}_x \otimes w_{\ell_3}) + \mathbb{E}(\mu^{(1)}_x \otimes w_{\ell_2} \otimes w_{\ell_3}) \big] \\
&\quad - C_2\, \mu^{(1)}_x \otimes \mu^{(1)}_x \otimes \mu^{(1)}_x + C_3 \sum_{k=1}^{K} c_k\, d_k \otimes d_k \otimes d_k, \qquad (2.40)
\end{aligned}
\]

where $C_1 = c_0 (c_0 + 2)^{-1}$, $C_2 = 2 c_0^2\, [(c_0 + 1)(c_0 + 2)]^{-1}$, $C_3 = 2\, [c_0 (c_0 + 1)(c_0 + 2)]^{-1}$, and $\otimes$ denotes the tensor product.

The last term in Equation (2.39) and the last term in Equation (2.40) are a matrix and an order-3 tensor in the form of a non-negative symmetric CP decomposition. Therefore, moving all other terms to the left-hand side of these equations, we obtain :

\[
(\mathit{Pairs}) = S^{LDA} := \mu^{(2)}_x - \frac{c_0}{c_0 + 1}\, \mu^{(1)}_x \otimes \mu^{(1)}_x, \qquad \text{LDA } S\text{-moment} \quad (2.41)
\]

\[
\begin{aligned}
(\mathit{Triples}) = \mathcal{T}^{LDA} := \mu^{(3)}_x &+ C_2\, \mu^{(1)}_x \otimes \mu^{(1)}_x \otimes \mu^{(1)}_x \\
&- \frac{c_0}{c_0 + 2} \big[ \mathbb{E}(w_{\ell_1} \otimes w_{\ell_2} \otimes \mu^{(1)}_x) + \mathbb{E}(w_{\ell_1} \otimes \mu^{(1)}_x \otimes w_{\ell_3}) + \mathbb{E}(\mu^{(1)}_x \otimes w_{\ell_2} \otimes w_{\ell_3}) \big], \qquad \text{LDA } \mathcal{T}\text{-moment} \quad (2.42)
\end{aligned}
\]

where $C_2 = 2 c_0^2\, [(c_0 + 1)(c_0 + 2)]^{-1}$. Slightly abusing terminology, we refer to the matrix $S^{LDA}$ and the tensor $\mathcal{T}^{LDA}$ as the LDA moments. They have the following diagonal

39 structure

\[
S^{LDA} = \frac{1}{c_0 (c_0 + 1)} \sum_{k=1}^{K} c_k\, d_k \otimes d_k, \qquad (2.43)
\]
\[
\mathcal{T}^{LDA} = \frac{2}{c_0 (c_0 + 1)(c_0 + 2)} \sum_{k=1}^{K} c_k\, d_k \otimes d_k \otimes d_k, \qquad (2.44)
\]
which is the symmetric non-negative CP decomposition (2.10).

Asymptotically Unbiased Finite Sample Estimators for the LDA Moments. In the rest of this section, we present asymptotically unbiased finite sample estimators for the LDA moments. In Appendix C.1.1, we also derive expressions for the fast implementation of these estimators.

Given realizations $w_{n\ell}$, $n \in [N]$, $\ell \in [L_n]$, of the token random variable $w_\ell$, we now give the expressions for the finite sample estimates 6 of the LDA moments $S^{LDA}$ and $\mathcal{T}^{LDA}$. We use the notation $\widehat{\mathbb{E}}$ below to express a U-statistics empirical expectation over the tokens within a document, uniformly averaged over the whole corpus. For example,

\[
\widehat{\mathbb{E}}\big(w_{\ell_1} \otimes w_{\ell_2} \otimes \widehat{\mu}^{(1)}_x\big)
:= \frac{1}{N} \sum_{n=1}^{N} \frac{1}{L_n (L_n - 1)} \sum_{\ell_1 = 1}^{L_n} \sum_{\substack{\ell_2 = 1 \\ \ell_2 \neq \ell_1}}^{L_n} w_{n\ell_1} \otimes w_{n\ell_2} \otimes \widehat{\mu}^{(1)}_x,
\]

where 퐿푛 stands for the number of tokens in the 푛-th document. This gives the following expressions for the finite sample estimates of the LDA moments :

\[
\widehat{S}^{LDA} := \widehat{\mu}^{(2)}_x - \frac{c_0}{c_0 + 1}\, \widehat{\mu}^{(1)}_x \otimes \widehat{\mu}^{(1)}_x, \qquad (2.45)
\]
\[
\begin{aligned}
\widehat{\mathcal{T}}^{LDA} := \widehat{\mu}^{(3)}_x &+ \frac{2 c_0^2}{(c_0 + 1)(c_0 + 2)}\, \widehat{\mu}^{(1)}_x \otimes \widehat{\mu}^{(1)}_x \otimes \widehat{\mu}^{(1)}_x \\
&- \frac{c_0}{c_0 + 2} \Big[ \widehat{\mathbb{E}}\big(w_{\ell_1} \otimes w_{\ell_2} \otimes \widehat{\mu}^{(1)}_x\big) + \widehat{\mathbb{E}}\big(w_{\ell_1} \otimes \widehat{\mu}^{(1)}_x \otimes w_{\ell_3}\big) + \widehat{\mathbb{E}}\big(\widehat{\mu}^{(1)}_x \otimes w_{\ell_2} \otimes w_{\ell_3}\big) \Big], \qquad (2.46)
\end{aligned}
\]
where, as suggested by Anandkumar et al. [2014], unbiased U-statistics estimates of

6. Note that because non-linear functions of $\widehat{\mu}^{(1)}_x$ appear in the expressions for $\widehat{S}^{LDA}$ (2.45) and $\widehat{\mathcal{T}}^{LDA}$ (2.46), the estimator is biased, i.e., $\mathbb{E}\big[\widehat{S}^{LDA}\big] \neq S^{LDA}$. The bias is small though : $\|\mathbb{E}(\widehat{S}^{LDA}) - S^{LDA}\| = O(N^{-1})$ and the estimator is asymptotically unbiased. This is in contrast with the estimator for the DICA cumulants (see Section 3.3.2), which is easily made unbiased.

$\mu^{(1)}_x$, $\mu^{(2)}_x$ and $\mu^{(3)}_x$ are :

\[
\widehat{\mu}^{(1)}_x := \widehat{\mathbb{E}}(w_\ell) = N^{-1} \sum_{n=1}^{N} L_n^{-1} \sum_{\ell=1}^{L_n} w_{n\ell}, \qquad (2.47)
\]
\[
\widehat{\mu}^{(2)}_x := \widehat{\mathbb{E}}(w_{\ell_1} \otimes w_{\ell_2})
= N^{-1} \sum_{n=1}^{N} \frac{1}{L_n (L_n - 1)} \sum_{\ell_1 = 1}^{L_n} \sum_{\substack{\ell_2 = 1 \\ \ell_2 \neq \ell_1}}^{L_n} w_{n\ell_1} \otimes w_{n\ell_2}, \qquad (2.48)
\]

\[
\widehat{\mu}^{(3)}_x := \widehat{\mathbb{E}}(w_{\ell_1} \otimes w_{\ell_2} \otimes w_{\ell_3})
= N^{-1} \sum_{n=1}^{N} \delta_{3n} \sum_{\ell_1 = 1}^{L_n} \sum_{\substack{\ell_2 = 1 \\ \ell_2 \neq \ell_1}}^{L_n} \sum_{\substack{\ell_3 = 1 \\ \ell_3 \neq \ell_1, \ell_2}}^{L_n} w_{n\ell_1} \otimes w_{n\ell_2} \otimes w_{n\ell_3}. \qquad (2.49)
\]

Here, the vectors $\delta_1$, $\delta_2$ and $\delta_3 \in \mathbb{R}^N$ are defined element-wise as $\delta_{1n} := L_n^{-1}$ ; $\delta_{2n} := (L_n (L_n - 1))^{-1}$, i.e., $\delta_{2n} = \big[\binom{L_n}{2}\, 2!\big]^{-1}$ is the inverse of the number of ways to choose an ordered pair of tokens out of $L_n$ tokens ; and $\delta_{3n} := (L_n (L_n - 1)(L_n - 2))^{-1}$, i.e., $\delta_{3n} = \big[\binom{L_n}{3}\, 3!\big]^{-1}$ is the inverse of the number of ways to choose an ordered triple of tokens out of $L_n$ tokens. Note that the vectors $\delta_1$, $\delta_2$, and $\delta_3$ have nothing to do with the Kronecker delta $\delta$.

There is a slight abuse of notation in the expressions above, as $w_\ell$ is sometimes treated as a random variable (i.e., in $\widehat{\mathbb{E}}(w_\ell)$, $\widehat{\mathbb{E}}(w_{\ell_1} \otimes w_{\ell_2})$, etc.) and sometimes as its realization. However, the difference is clear from the context.
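In practice these estimators can be computed from document count vectors rather than individual tokens : because the tokens are one-hot, $\sum_{\ell_1 \neq \ell_2} w_{\ell_1} w_{\ell_2}^\top = x x^\top - \operatorname{Diag}(x)$ for a document with count vector $x$ (expressions of this kind are derived in Appendix C.1.1). A minimal sketch of $\widehat{\mu}^{(1)}_x$ (2.47), $\widehat{\mu}^{(2)}_x$ (2.48), and $\widehat{S}^{LDA}$ (2.45), assuming numpy and an $N \times M$ count matrix with at least two tokens per document.

```python
import numpy as np

def lda_pairs_estimator(X, c0):
    """S^LDA estimate (2.45) from an N x M document-term count matrix X (all row sums >= 2)."""
    X = np.asarray(X, dtype=float)
    L = X.sum(axis=1)                                        # document lengths L_n
    mu1 = (X / L[:, None]).mean(axis=0)                      # (2.47)
    W = X / (L * (L - 1))[:, None]                           # per-document weights 1 / (L_n (L_n - 1))
    mu2 = (W.T @ X - np.diag(W.sum(axis=0))) / X.shape[0]    # (2.48), using x x^T - Diag(x)
    return mu2 - c0 / (c0 + 1.0) * np.outer(mu1, mu1)        # (2.45)

# Example with a tiny random corpus (arbitrary sizes; the extra one-hot term ensures L_n >= 2).
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(50, 20)) + 2 * np.eye(20)[rng.integers(20, size=50)]
S_hat = lda_pairs_estimator(X, c0=1.0)
```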

2.3 Algorithms for the CP Decomposition

In this section we review two special types of CP decompositions : the orthogonal symmetric CP decomposition (in Section 2.3.1) and the non-orthogonal non-symmetric CP decomposition (in Section 2.3.2), and discuss some of the algorithms for the approximation of these decompositions. The orthogonal decomposition is further used for the prewhitening-based estimation algorithms for the LDA and discrete ICA models (in Chapter 3) and the non-orthogonal decomposition is the basis for the new estimation algorithms for multi-view models (in Chapter 4).

2.3.1 Algorithms for Orthogonal Symmetric CPD

Orthogonal Symmetric CP Decomposition. Given a symmetric 퐾 × 퐾 × 퐾 tensor 풯 , the goal is to approximate this tensor as follows

\[
\mathcal{T} \approx \mathcal{G} \times_1 V^\top \times_2 V^\top \times_3 V^\top, \qquad (2.50)
\]

41 where the core tensor 풢 is diagonal with at most one zero diagonal element and the matrix V ∈ R퐾×퐾 is orthogonal, which can be equivalently rewritten as

\[
\mathcal{T} \approx \sum_{k=1}^{K} g_k\, v_k \otimes v_k \otimes v_k = \sum_{k=1}^{K} g_k\, v_k^{\otimes 3}, \qquad (2.51)
\]
where $g_k = \mathcal{G}_{kkk}$ and the vectors $v_1, \dots, v_K$ are the columns of $V$ and are orthogonal as well. In fact, this is a special case of the symmetric orthogonal CP decomposition, since the number of factors is equal to the dimension $K$. However, moment-based estimation methods for latent linear models can often be reduced to such a decomposition, which explains their interest. The orthogonal symmetric tensor decomposition was first investigated by Comon [1994]. The proposed algorithm was based on the idea of Jacobi-like sweeping. The decomposition in (2.51) can be seen as a direct extension of the symmetric matrix eigendecomposition to tensors (see also Section 2.1.2). However, the tensor decomposition in (2.50) or (2.51) is very different from its matrix predecessor. For example, the Eckart-Young SVD approximation theorem does not extend to this decomposition [Kolda, 2003] and not every symmetric tensor admits such an (exact) decomposition. Contrary to the matrix case, this tensor decomposition is unique up to trivial indeterminacies if at most one diagonal entry of the core tensor $\mathcal{G}$ is equal to zero [De Lathauwer, 2010]. Symmetric tensors which admit an exact decomposition in the form (2.51) are called orthogonally decomposable [Kolda, 2001]. In general, symmetric tensors do not necessarily admit an orthogonal CP decomposition (see Robeva [2016] for a classification of tensors that have such decompositions). This leaves room for different algorithms, which deal with the estimation error in different manners. In this section, we discuss two types of such algorithms : one, orthogonal joint diagonalization, is based on the ideas of contracting a tensor to matrices and jointly diagonalizing them, and the other, the tensor power method, is based on the extension of matrix power iterations to the tensor case.

Tensor Contraction or Projection. Since it is easier to work with matrices rather than with tensors, it is natural to define a transformation of a tensor to a matrix. One such transformation is the contraction (a.k.a. projection) of an order-3 tensor $\mathcal{T}$ with (onto) a vector $a$, which is defined element-wise as follows

[풯 (a)]_{푘1푘2} = ∑_{푘3=1}^{퐾} 풯_{푘1푘2푘3} 푎_{푘3}. (2.52)

This definition can naturally be extended to higher-order tensors. For example, a contraction of an order-4 tensor ℱ with a matrix A is defined as

[ℱ(A)]_{푘1푘2} = ∑_{푘3=1}^{퐾} ∑_{푘4=1}^{퐾} ℱ_{푘1푘2푘3푘4} 퐴_{푘3푘4}. (2.53)

Such a matrix ℱ(A) is also known in the literature under the name of a cumulant matrix [see, e.g., Cardoso, 1999]. Note that, although we are not going to use order-4 tensors in the algorithms described in this or other sections, this extension shows that these algorithms can also be extended to higher orders.
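To make the contraction operation concrete, the following minimal numpy sketch implements (2.52) and (2.53). The tensors, the vector a and the matrix A are random placeholders ; the code is only an illustration and not part of the thesis material.

import numpy as np

K = 5
T3 = np.random.randn(K, K, K)      # order-3 tensor
a = np.random.randn(K)             # projection vector

# [T(a)]_{k1 k2} = sum_{k3} T_{k1 k2 k3} a_{k3}, Equation (2.52)
T3_a = np.einsum('ijk,k->ij', T3, a)

T4 = np.random.randn(K, K, K, K)   # order-4 tensor
A = np.random.randn(K, K)          # projection matrix

# [F(A)]_{k1 k2} = sum_{k3,k4} F_{k1 k2 k3 k4} A_{k3 k4}, Equation (2.53)
T4_A = np.einsum('ijkl,kl->ij', T4, A)

print(T3_a.shape, T4_A.shape)      # (5, 5) (5, 5)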

The Eigendecomposition Based Algorithm

For a tensor in the (nearly) orthogonal form, e.g. (2.51), the contraction with some vector a takes the form :

풯 (a) ≈ ∑_{푘=1}^{퐾} 푔푘 ⟨v푘, a⟩ v푘 ⊗ v푘 = VΛV⊤, (2.54)

where Λ is the diagonal matrix with the 푘-th diagonal element equal to Λ푘푘 = 푔푘⟨v푘, a⟩ and ⊗ is the outer product, i.e. v푘 ⊗ v푘 = v푘v푘⊤. Since V is an orthogonal matrix, the expression on the RHS is actually the eigenvalue decomposition of a symmetric matrix, which is uniquely defined (up to permutation) given that all the diagonal elements of the matrix Λ are distinct.

This suggests a straightforward algorithm for the estimation of the orthogonal CP factors V, which we will call the eigendecomposition (ED-) based algorithm, consisting of two simple steps : (a) contract the tensor 풯 with a vector a and (b) compute the eigendecomposition of the contraction 풯 (a). The vector a can be chosen, e.g., uniformly at random from the unit ℓ2-sphere. In this case, the eigenvalues are distinct with probability 1. This makes the ED-based approach a valid algebraic technique and, indeed, if a tensor is exactly in the form of the orthogonal symmetric CP decomposition (i.e. with equality in (2.51)), such an algorithm finds an exact solution with probability 1. This ED-based algorithm is well known in the signal processing and machine learning literature [Cardoso, 1989, 1990, Anandkumar et al., 2012a,b, Hsu and Kakade, 2013].
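The two steps translate directly into a few lines of numpy. The sketch below builds a synthetic orthogonally decomposable tensor and recovers its factors with the ED-based algorithm ; all variable names and parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
K = 4
V, _ = np.linalg.qr(rng.standard_normal((K, K)))   # orthogonal factors
g = rng.uniform(1.0, 2.0, size=K)                  # nonzero weights

# Build T = sum_k g_k v_k (x) v_k (x) v_k, as in (2.51)
T = np.einsum('k,ik,jk,lk->ijl', g, V, V, V)

# Step (a): contract T with a random unit vector a
a = rng.standard_normal(K)
a /= np.linalg.norm(a)
Ta = np.einsum('ijl,l->ij', T, a)                  # = V diag(g_k <v_k, a>) V^T

# Step (b): eigendecomposition of the (symmetric) contraction
eigvals, eigvecs = np.linalg.eigh(Ta)

# Up to sign and permutation, the eigenvectors recover the columns of V
print(np.round(np.abs(eigvecs.T @ V), 3))          # close to a permutation matrix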

Orthogonal Joint Matrix Diagonalization

Things get more complicated when a tensor is not exactly in the orthogonal symmetric CPD form, which is always the case in practice. In this case, the contraction of a tensor with a vector leads to a significant loss of information since a tensor has 퐾³ elements, while its contraction has only 퐾². Therefore, in the presence of noise the estimate of V obtained with the ED-based algorithm is quite poor (see Section 3.5) and alternative methods are needed.

Target Matrices. One such approach is to consider several contractions with random vectors, instead of only one [Cardoso and Souloumiac, 1993, Kuleshov et al., 2015a]. Indeed, if we project a tensor 풯 onto 푃 vectors a1, a2, ..., a푃 , we get :

풯 (a1) ≈ VΛ1V⊤, 풯 (a2) ≈ VΛ2V⊤, ..., 풯 (a푃 ) ≈ VΛ푃 V⊤, (2.55)

where each Λ푝 is a diagonal matrix with the 푘-th diagonal element equal to [Λ푝]푘푘 = 푔푘⟨a푝, v푘⟩ for 푝 ∈ [푃 ]. In this section, we assume that the number 푃 is fixed and known ; the choice of this number is discussed in Sections 2.4.2 and 3.4.

Let us denote each contracted tensor as a matrix A푝 = 풯 (a푝), for 푝 ∈ [푃 ]. This gives the following set 풜 of 푃 matrices :

풜 = {A1, A2,..., A푃 } , (2.56)

where each square 퐾 × 퐾 matrix A푝 is expected to be approximately in the form A푝 ≈ VΛ푝V⊤. These matrices, which we aim to jointly diagonalize, are called the target matrices.

Orthogonal Joint Matrix Diagonalization Problem. The problem is formulated as follows : find an orthogonal matrix Q such that the matrices obtained by the congruence transformation of the target matrices :

Q⊤풜Q = {Q⊤A1Q, Q⊤A2Q, ..., Q⊤A푃 Q}, (2.57)

are jointly as diagonal as possible. This can be formalized as the following optimization problem. For a square matrix A, let Off(A) denote the sum of squared off-diagonal elements :

Off(A) = ∑_{푘1=1}^{퐾} ∑_{푘2=1, 푘2≠푘1}^{퐾} 퐴_{푘1푘2}² = ‖A − Diag(A)‖²_퐹 , (2.58)

where Diag(A) is the diagonal matrix with the diagonal values of A on its diagonal, i.e. [Diag(A)]푚푚 = 퐴푚푚. Then the orthogonal joint matrix diagonalization (OJD) problem is :

Q⋆ = arg min_{Q∈풮퐾} ∑_{푝=1}^{푃} Off(Q⊤A푝Q), (2.59)

where 풮퐾 stands for the Stiefel manifold, i.e. the set of all orthogonal matrices in R^{퐾×퐾} : 풮퐾 = {Q ∈ R^{퐾×퐾} : Q⊤Q = QQ⊤ = I}. Below we outline a Jacobi-like algorithm for this optimization problem.

On Gradient-Based Methods. Note that one could alternatively formulate this joint diagonalization problem by substituting VΛ푝V⊤, for 푝 ∈ [푃 ], into the objective function (2.58) and then minimizing ∑_{푝=1}^{푃} ‖VΛ푝V⊤ − Diag(VΛ푝V⊤)‖²_퐹 using gradient methods, so that at the optimum A푝 ≈ VΛ푝V⊤, for 푝 ∈ [푃 ]. However, rigorously this would not be a correct approach to solving the original problem of the CPD in (2.51), since the matrices Λ푝 depend on V (nevertheless, some joint diagonalization algorithms are built on such an idea). It is also straightforward to construct gradient based methods to optimize the problem in (2.59) over the Stiefel manifold. In fact, we compared the Jacobi-type algorithms described below with the gradient based methods implemented in the manopt toolbox [Boumal et al., 2014]. However, the latter were significantly slower while achieving equivalent results in terms of the objective value and the accuracy of the solution. This speed advantage of the Jacobi-like algorithm described below can be explained by the fact that this algorithm admits a closed form solution for the optimal Jacobi angle at every iteration.

Jacobi Rotation Matrix. The Jacobi (a.k.a. Givens) rotation matrix is the 퐾 × 퐾 matrix G(푟, 푞, 휃) defined as :

the identity matrix except for the four entries in rows and columns 푟 and 푞, namely

[G(푟, 푞, 휃)]푟푟 = cos(휃), [G(푟, 푞, 휃)]푟푞 = sin(휃), [G(푟, 푞, 휃)]푞푟 = − sin(휃), [G(푟, 푞, 휃)]푞푞 = cos(휃), (2.60)

where the angle 휃 is called the Jacobi (Givens) angle. A Jacobi rotation matrix is orthogonal and corresponds to a rotation transformation in the (푟, 푞)-plane when applied to a vector or matrix. The update G(푟, 푞, 휃)⊤A affects only two rows (푟-th and 푞-th) of A. Likewise, the update AG(푟, 푞, 휃) affects just two columns (푟-th and 푞-th) of A.

The Jacobi-Like Algorithm for Several Matrices. This algorithm is a direct extension of the Jacobi algorithm [Golub and Van Loan, 2013] for the computation of the eigendecomposition of a normal matrix. 7 In the single matrix case, any normal matrix can be diagonalized through a congruence transformation by an orthogonal matrix, i.e. there exists an orthogonal Q such that Λ = Q⊤AQ, where Λ is a diagonal matrix with the eigenvalues of A on the diagonal and the columns of Q contain the respective eigenvectors of A. In the multiple matrix case (2.56), however, a similar decomposition is in general not possible, unless the matrices are jointly diagonalizable [Horn and Johnson, 2013]. The problem of approximate joint diagonalization through a congruence transformation by an orthogonal matrix can be formulated as the optimization problem in (2.59) and below we describe an extension of the Jacobi algorithm to this problem [Bunse-Gerstner et al., 1993, Cardoso and Souloumiac, 1993, 1996]. The algorithm iteratively constructs a sequence of sets of matrices 풜(0), 풜(1), 풜(2), ..., 풜(퐽), where 퐽 is the number of iterations of the algorithm until convergence, such that 풜(0) = 풜 and

풜(푗+1) = G(푟, 푞, 휃(푗))⊤풜(푗) G(푟, 푞, 휃(푗)), 푗 = 1, 2, 3, . . . , 퐽, (2.61)

where 푟 and 푞 are chosen in accordance with some rule (see below) for each 푗.

7. A real square matrix A is called normal if A⊤A = AA⊤. A symmetric matrix is a special case of normal matrices.

Algorithm 1 Jacobi-Like Orthogonal Joint Diagonalization of Several Matrices
1: Initialize : 풜(0) ← 풜, Q(0) ← I푀 , and iterations 푗 ← 0
2: for sweeps 1, 2, ... do
3:   for 푟 = 1, . . . , 퐾 − 1 do
4:     for 푞 = 푟 + 1, . . . , 퐾 do
5:       Compute the optimal Jacobi angle 휃(푗) (closed form)
6:       Update Q(푗+1) ← Q(푗)G(푟, 푞, 휃(푗))
7:       Update 풜(푗+1) ← G(푗)⊤풜(푗)G(푗)
8:       Increase 푗 ← 푗 + 1
9:     end for
10:  end for
11: end for
12: Output : Λ̂푝 = Q(푗)⊤A푝(0)Q(푗) and V̂ = [Q(푗)]⊤

At every iteration, the optimal Jacobi angle 휃(푗) is found as

휃(푗) = arg min_{|휃|≤휋/4} ∑_{푝=1}^{푃} Off[G(푟, 푞, 휃)⊤ A푝(푗) G(푟, 푞, 휃)]. (2.62)

This is a linearly constrained 8 quadratic optimization problem in a single dimension ; a solution always exists and a closed form expression for the optimal Jacobi angle can be obtained [Cardoso and Souloumiac, 1996, Fu and Gao, 2006, Iferroudjene et al., 2009]. This iterative procedure is summarized in Algorithm 1. We will refer to this algorithm as orthogonal joint diagonalization or simply OJD. In the ICA literature, the algorithm is widely known under the name of joint approximate diagonalization of eigen-matrices (JADE) [Cardoso and Souloumiac, 1993]. In the case of a single matrix, 푃 = 1, this algorithm is exactly equivalent to the Jacobi algorithm, and the update (2.61) with the optimal Jacobi angle satisfying (2.62) has the property that each new matrix A(푗+1) is “more diagonal” than its predecessor A(푗). The algorithm converges when the off-diagonal entries, respectively the objective Off(A(푗+1)), are small enough to be declared zero.

Convergence Rate in the Single Matrix Case. In the single matrix case, each Jacobi update (2.61) involves 푂(푀) flops plus the cost of finding the optimal indices 푟 = 푟(푗) and 푞 = 푞(푗). In the classical Jacobi algorithm for a single matrix, one targets the maximal reduction of the non-diagonal elements of A(푗), and 푟 and 푞 are chosen such that 퐴푟푞² is maximal. In this case, one can show that Off(A(푗)) ≤ (1 − 푁^{−1})^푗 Off(A(0)), with 푁 = 퐾(퐾 − 1)/2, which implies a linear convergence rate. Moreover, the asymptotic convergence rate of the classical Jacobi algorithm is considerably better [see, e.g., Golub and Van Loan, 2013] : for 푗 large enough there is a constant 푐 such that

8. Note that the choice 휋/4, and not 휋/2, in the constraint is conventional, since the objective is 휋/2-periodic (it is quadratic in sin(휃) and cos(휃)) and, in the convergence theory of the classical Jacobi iteration, it is critical that |휃| < 휋/4 [Golub and Van Loan, 2013].


Figure 2-4 – The form of the core tensor of (2.50) approximated with OJD.

√Off(A(푗+푁)) ≤ 푐 · Off(A(푗)), which implies a (local) quadratic convergence rate. Independently of the choice of the pivots 푟 and 푞, the Jacobi algorithm converges to the global optimum.

Convergence Rate in the Case of Several Matrices. Bunse-Gerstner et al. [1993] analyzed the convergence properties of Algorithm 1 in the case of two matrices and proved a local quadratic convergence rate. They conjectured global convergence of the algorithm in the two matrix case, which they confirmed with simulation experiments. However, no global convergence results are known in the multiple matrix case for Algorithm 1.

The Cyclic Order. The problem of the classical Jacobi algorithm is that the optimal index choice requires 푂(푀²) flops and thus is more expensive than the cost of the Jacobi update. Therefore, in practice it is common to follow a lexicographic rule for the choice of the indices 푟 and 푞 in a row-by-row and column-by-column fashion, which is known as the cyclic Jacobi algorithm. In this case, the linear and (locally) quadratic convergence rates are not applicable anymore, although the algorithm is still guaranteed to converge to the global optimum.

Sweep. It is common to refer to 퐾(퐾 − 1)/2 consecutive iterations of any Jacobi-like algorithm as a sweep. In the case of the cyclic order, this refers to one pass over all elements of (symmetric) matrices.

Perturbation Analysis. Cardoso [1994a] provided a first order perturbation analysis of the joint diagonalizer Q in the case when the matrices A푝 are perturbed with additive noise.

Relation to the CP Decomposition of Tensors. We motivated the OJD Algorithm 1 as a method for approximating the orthogonal symmetric CPD (2.50) or (2.51). The approximation comes from the fact that the approximated core tensor 풢̂ is not diagonal, since the matrices V̂풜V̂⊤ are not exactly diagonal. Moreover, even if these matrices were very close to diagonal, the core tensor 풢̂ would still have a structure different from diagonal : the fibers 풢̂푘푘: are not guaranteed to be zero due to the contraction along the 3rd dimension of this tensor (see Figure 2-4).

Tensor Orthogonal Diagonalization (COM2). Another Jacobi-like algorithm for approximately finding the orthogonal symmetric CPD (2.50) does not project this tensor onto vectors but instead works with the tensor directly [COM2 ; Comon, 1994]. Instead of problem (2.59), it is formulated via the following optimization problem : Q⋆ = arg min_{Q∈풮퐾} Off(풯 ×1 Q⊤ ×2 Q⊤ ×3 Q⊤), where the objective is again the sum of squared non-diagonal elements, i.e. Off(풯 ) = ∑_{푚1≠푚2≠푚3} 풯²_{푚1푚2푚3}, and 풮퐾 is the Stiefel manifold. This algorithm directly extends the idea of iterative Jacobi updates. It has been proved by Souloumiac and Cardoso [1993] that the asymptotic accuracies (for infinitesimal errors in the cumulant estimate) of COM2 and JADE are the same, when these algorithms are applied to the ICA cumulant tensors. Chevalier [1995] performed an experimental comparison of these algorithms, which indicates that both methods seem to have the same accuracy in the (under)complete case.
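As an illustration of Algorithm 1, the sketch below jointly diagonalizes a set of nearly jointly diagonalizable symmetric matrices. The closed-form Jacobi angle uses the formula found in common real-valued JADE implementations (its rotation sign convention may differ from (2.60) by the sign of 휃) ; the data are synthetic and the code is only a sketch, not the thesis implementation.

import numpy as np

def off(A):
    # Sum of squared off-diagonal elements, Equation (2.58)
    return np.sum(A ** 2) - np.sum(np.diag(A) ** 2)

def ojd(matrices, sweeps=50, tol=1e-10):
    # Jacobi-like orthogonal joint diagonalization of symmetric K x K matrices
    A = [M.copy() for M in matrices]
    K = A[0].shape[0]
    Q = np.eye(K)
    for _ in range(sweeps):
        rotated = False
        for r in range(K - 1):
            for q in range(r + 1, K):
                # Closed-form optimal angle for the pair (r, q), as in JADE codes
                h = np.array([[M[r, r] - M[q, q], M[r, q] + M[q, r]] for M in A])
                G = h.T @ h
                ton, toff = G[0, 0] - G[1, 1], G[0, 1] + G[1, 0]
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) < tol:
                    continue
                rotated = True
                J = np.eye(K)                       # Givens rotation (JADE convention)
                J[r, r] = c; J[r, q] = -s
                J[q, r] = s; J[q, q] = c
                Q = Q @ J
                A = [J.T @ M @ J for M in A]
        if not rotated:
            break
    return Q, A

# Synthetic nearly jointly diagonalizable matrices A_p = V Lambda_p V^T + noise
rng = np.random.default_rng(0)
K, P = 5, 10
V, _ = np.linalg.qr(rng.standard_normal((K, K)))
target = [V @ np.diag(rng.standard_normal(K)) @ V.T + 1e-3 * rng.standard_normal((K, K))
          for _ in range(P)]
target = [(M + M.T) / 2 for M in target]            # keep the target matrices symmetric
Q, diag = ojd(target)
print(sum(off(M) for M in diag))                    # small: close to the noise level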

Tensor Power Method

Another algorithm for computing the orthogonal symmetric CP decomposition in (2.51) is the tensor power method (TPM) [Anandkumar et al., 2014, and references therein]. The idea behind the algorithm is to extend the matrix power method. Indeed, similarly to matrices, one can define eigenvalues and eigenvectors of a tensor [Lim, 2005, Qi, 2005] : a vector u that satisfies 풯 (I, u, u) = 휆u is called an eigenvector of the tensor 풯 and 휆 is called its eigenvalue. Note that 풯 (I, u, u) := 풯 ×1 I ×2 u ×3 u. It is straightforward to show that the vectors v푘 and scalars 휆푘 in (2.51) are eigenvectors and eigenvalues of 풯 , respectively. Contrary to the matrix case, these are not the only eigenvalues and eigenvectors. For example, if (휆1, v1) and (휆2, v2) are two eigenpairs of 풯 , then any vector u := (1/휆1)v1 + (1/휆2)v2 is an eigenvector as well [Anandkumar et al., 2014]. However, one can define the so-called robust eigenvectors of a tensor 풯 : if there exists an 휀 > 0 such that for all u ∈ {v′ ∈ R^퐾 : ‖v′ − v‖2 ≤ 휀}, repeated iteration of the map v̄ ↦→ 풯 (I, v̄, v̄)/‖풯 (I, v̄, v̄)‖2 starting from u converges to v, then v is called a robust eigenvector. If a tensor 풯 admits the decomposition in (2.51) then (a) the set of u ∈ R^퐾 which do not converge to some v푘 under the repeated iteration from above has measure zero and (b) the set of robust eigenvectors of 풯 is equal to {v1, v2, ..., v퐾 } (fixed-point characterization). Therefore, the orthogonal decomposition in (2.51) can be obtained with the tensor power method, which is summarized in Algorithm 2. Note that 풯 (u, u, u) := 풯 ×1 u ×2 u ×3 u and that this version of the algorithm recovers the vectors v푘 one by one through the deflation principle [see, e.g., Mackey, 2009, Comon and Jutten, 2010, Chapter 6]. This is also known in the literature as the analysis view. See also Wang et al. [2014] for an implementation and experimental comparison of the TPM for LDA.

Convergence Rate and Perturbation Analysis. When a tensor is orthogonally decomposable, the TPM guarantees the global recovery of the decomposition in (2.51) up to trivial indeterminacies at a quadratic convergence rate [Anandkumar et al., 2014]. However, in practice tensors are only approximately orthogonally decomposable (e.g., due to the finite sample noise). In this case, the TPM presented in Algorithm 2 is still able to approximately recover the decomposition (2.51). In particular, Anandkumar et al. [2014] provide a perturbation analysis, similar to Wedin's perturbation theorem for singular vectors of matrices [Wedin, 1972], that bounds the error of the approximate decomposition in the case of additive noise. This analysis allows one to set values 퐿_TPM and 푁_TPM for the TPM Algorithm 2, which guarantee robust and fast recovery of the decomposition.

Algorithm 2 Robust Tensor Power Method
1: Initialize : 풯^{(0)} ← 풯
2: for deflation steps 푗 = 1, 2, . . . , 퐾 do
3:   for random restarts 휏 = 1, 2, . . . , 퐿_TPM do
4:     Draw u_0^{(푗,휏)} ∈ R^퐾 uniformly at random from the ℓ2-unit sphere
5:     for power iterations 푡 = 1, 2, . . . , 푁_TPM do
6:       Update u_푡^{(푗,휏)} = 풯^{(푗−1)}(I, u_{푡−1}^{(푗,휏)}, u_{푡−1}^{(푗,휏)}) / ‖풯^{(푗−1)}(I, u_{푡−1}^{(푗,휏)}, u_{푡−1}^{(푗,휏)})‖2
7:     end for
8:     Set v^{(푗,휏)} ← u_푡^{(푗,휏)}
9:     Compute 휆^{(푗,휏)} = 풯^{(푗−1)}(v^{(푗,휏)}, v^{(푗,휏)}, v^{(푗,휏)})
10:  end for
11:  Set 휆^{(푗)} ← 휆^{(푗,휏⋆)} and v^{(푗)} ← v^{(푗,휏⋆)}, where 휏⋆ := arg max_{휏∈[퐿_TPM]} 휆^{(푗,휏)}
12:  Perform deflation : 풯^{(푗)} ← 풯^{(푗−1)} − 휆^{(푗)}[v^{(푗)}]^{⊗3}
13: end for
14: Output : eigenvectors v^{(1)}, v^{(2)}, ..., v^{(퐾)} and eigenvalues 휆^{(1)}, 휆^{(2)}, . . . , 휆^{(퐾)}

Intuitively, the meaning behind these numbers is the following. At every deflation step, 퐿_TPM defines the number of random restarts used to estimate a vector v푘, while 푁_TPM defines the maximal number of tensor power updates for every restart. If the number of power iterations 푡 exceeds 푁_TPM, then the starting point was “bad” and such a restart does not converge to any of the vectors v푘. If the initialization of a random restart was “good,” the power iterations converge quite fast (quadratically) to one of the vectors v푘 and in this case the number of iterations 푡 at convergence is below 푁_TPM. Hence, 퐿_TPM and 푁_TPM have to be large enough for some restarts to converge, but not too large in order to avoid unnecessary computations. On the other hand, an advantage of the tensor power method over the orthogonal joint diagonalization algorithms is that the TPM admits a direct extension to the overcomplete regime [Anandkumar et al., 2015b].
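The following numpy sketch mirrors Algorithm 2 on a synthetic orthogonally decomposable tensor ; n_restarts and n_iters play the roles of 퐿_TPM and 푁_TPM, all parameter values are arbitrary, and the code is an illustration rather than the thesis implementation.

import numpy as np

def tpm(T, K, n_restarts=10, n_iters=100, seed=None):
    # Robust tensor power method with deflation (sketch of Algorithm 2)
    rng = np.random.default_rng(seed)
    T = T.copy()
    vals, vecs = [], []
    for _ in range(K):                       # deflation steps
        best_v, best_lam = None, -np.inf
        for _ in range(n_restarts):          # random restarts
            u = rng.standard_normal(T.shape[0])
            u /= np.linalg.norm(u)
            for _ in range(n_iters):         # power iterations u <- T(I, u, u) / ||.||
                u = np.einsum('ijk,j,k->i', T, u, u)
                u /= np.linalg.norm(u)
            lam = np.einsum('ijk,i,j,k->', T, u, u, u)
            if lam > best_lam:
                best_lam, best_v = lam, u
        vals.append(best_lam)
        vecs.append(best_v)
        # deflation: T <- T - lambda v (x) v (x) v
        T -= best_lam * np.einsum('i,j,k->ijk', best_v, best_v, best_v)
    return np.array(vals), np.column_stack(vecs)

# Synthetic orthogonally decomposable tensor
rng = np.random.default_rng(1)
K = 4
V, _ = np.linalg.qr(rng.standard_normal((K, K)))
g = rng.uniform(1.0, 3.0, size=K)
T = np.einsum('k,ik,jk,lk->ijl', g, V, V, V)
vals, vecs = tpm(T, K, seed=2)
print(np.round(np.abs(vecs.T @ V), 2))       # close to a permutation matrix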

The tensor power method is very similar 9 to such ICA methods as, e.g., FastICA [Hyvärinen, 1999], which is also a deflation-based algorithm in the orthogonal (prewhitened) subspace, but whose variations depend on the choice of the contrast function. In general, an important benefit of such an approach is, under ideal conditions, the absence of spurious local extrema and global convergence [Hyvärinen, 1999, Papadias, 2000, Anandkumar et al., 2014].

9. FastICA with the cubic contrast is very close to the TPM for ICA cumulants, but the algorithms are not exactly equivalent.

2.3.2 Algorithms for Non-Orthogonal Non-Symmetric CPD

Non-Orthogonal Non-Symmetric CPD. The problem is as follows (see also Section 2.1.2). Given a non-symmetric 푀1 × 푀2 × 푀3 tensor 풯 , the goal is to approximate this tensor as

풯 ≈ 풢 ×1 V(1)⊤ ×2 V(2)⊤ ×3 V(3)⊤, (2.63)

where the 퐾 × 퐾 × 퐾 core tensor 풢 is assumed to be diagonal with non-zero diagonal elements, but the matrices V(푠) ∈ R^{푀푠×퐾}, for 푠 = 1, 2, 3, do not have to be all the same and are not assumed to be orthogonal, as opposed to the orthogonal symmetric case (2.50) (see also Section 2.1.2). An equivalent vector representation of (2.63) is :

풯 ≈ ∑_{푘=1}^{퐾} 푔푘 v푘(1) ⊗ v푘(2) ⊗ v푘(3), (2.64)

where 푔푘 = 풢푘푘푘 are again non-zero and the vectors v푘(푠) are the columns of the matrix V(푠), for 푠 = 1, 2, 3, and are not assumed to be orthogonal. Note that we assume the number 퐾 of the CP factors in (2.64) known and fixed ; we discuss the choice of 퐾 in Section 4.5. We also assume that 퐾 < min(푀1, 푀2) and that the matrices V(1) and V(2) have full column rank. 10

In this section, we show that the approximation problem in (2.63) or (2.64) can be reduced to the so-called non-orthogonal joint diagonalization by similarity and describe Jacobi-like algorithms for this problem [Fu and Gao, 2006, Iferroudjene et al., 2009, Luciani and Albera, 2010].

Non-Orthogonal Joint Diagonalization by Similarity

Target Matrices. Let us contract both sides of the expression (2.64) with a vector a0 (see Equation (2.52) for the definition) :

풯 (a0) ≈ V(1)Λ0V(2)⊤,

where the 퐾 × 퐾 matrix Λ0 is diagonal with diagonal elements Λ0,푘푘 := 푔푘⟨v푘(3), a0⟩. Assuming that the matrix 풯 (a0) has rank 퐾 (if the vector a0 is picked uniformly at random from the unit ℓ2-sphere, this holds with probability 1), one can then construct matrices W1 and W2 such that W1풯 (a0)W2⊤ = I. Note that since the decomposition in (2.64) is only identifiable up to permutation and scaling, we can rescale the matrices V(1) and V(2) in 풯 (a0) such that 풯 (a0) = V(1)V(2)⊤, without loss of generality. This means that the matrices W1V(1) and W2V(2) are invertible and

W1V(1) = (W2V(2))^{−⊤}.

Let us denote V := W1V(1) ; then it immediately follows from the expression above that (W2V(2))⊤ = V^{−1}. This essentially leads to the following set 풜 of target matrices :

10. This corresponds to the (under)complete case of Chapter 4.

풜 = {A1, A2, ..., A푃 }, (2.65)

where the 퐾 × 퐾 square matrices A푝 := W1풯 (a푝)W2⊤, for 푝 ∈ [푃 ], are all expected to be approximately in the form A푝 ≈ VΛ푝V^{−1}.

Non-Orthogonal Joint Diagonalization by Similarity Problem. This leads to the problem of approximate non-orthogonal joint diagonalization (NOJD) by similarity : find a matrix Q ∈ R^{퐾×퐾} such that the (not necessarily normal 11) matrices 풜 are (jointly) as diagonal as possible when subject to the similarity transformation with Q :

Q^{−1}풜Q = {Q^{−1}A1Q, Q^{−1}A2Q, ..., Q^{−1}A푃 Q}.

Note that in general this problem is different from the so-called problem of non-orthogonal joint diagonalization by congruence [see, e.g., Afsari, 2006, 2008, Souloumiac, 2009a], where the inverse of Q is replaced by the transpose. The two problems are only equivalent in the case when Q is orthogonal.
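The sketch below builds such target matrices from a synthetic non-symmetric tensor in the exact form (2.64). The SVD-based construction of W1 and W2 is one natural choice satisfying W1 풯(a0) W2⊤ = I ; the text above does not prescribe this particular recipe, so treat it as an assumption of the sketch.

import numpy as np

rng = np.random.default_rng(0)
M1, M2, M3, K, P = 8, 7, 6, 3, 5
V1, V2, V3 = (rng.standard_normal((m, K)) for m in (M1, M2, M3))
g = rng.uniform(1.0, 2.0, size=K)
T = np.einsum('k,ik,jk,lk->ijl', g, V1, V2, V3)      # exact CP form (2.64)

def contract(T, a):
    # T(a), Equation (2.52), applied along the third mode
    return np.einsum('ijl,l->ij', T, a)

a0 = rng.standard_normal(M3)
U, s, Vt = np.linalg.svd(contract(T, a0))
W1 = np.diag(s[:K] ** -0.5) @ U[:, :K].T             # K x M1
W2 = np.diag(s[:K] ** -0.5) @ Vt[:K, :]              # K x M2
print(np.allclose(W1 @ contract(T, a0) @ W2.T, np.eye(K)))   # True

# Target matrices (2.65): A_p = W1 T(a_p) W2^T, each of the form V Lambda_p V^{-1}
targets = [W1 @ contract(T, rng.standard_normal(M3)) @ W2.T for _ in range(P)]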

Jacobi-like algorithms [Fu and Gao, 2006, Iferroudjene et al., 2009, Luciani and Albera, 2010] for this problem extend both the orthogonal joint diagonalization algorithm from Section 2.3.1 and Jacobi-like algorithms for the eigendecomposition of non-normal matrices [see, e.g., Ruhe, 1968, Eberlein, 1962].

The Shear Matrix. Similar to the Jacobi rotation matrix (2.60), the shear matrix is the 퐾 × 퐾 matrix S(푟, 푞, 푦) defined as :

the identity matrix except for the four entries in rows and columns 푟 and 푞, namely

[S(푟, 푞, 푦)]푟푟 = cosh(푦), [S(푟, 푞, 푦)]푟푞 = sinh(푦), [S(푟, 푞, 푦)]푞푟 = sinh(푦), [S(푟, 푞, 푦)]푞푞 = cosh(푦), (2.66)

where 푦 ∈ R is called the shear parameter. A shear matrix S(푟, 푞, 푦) is non-orthogonal and corresponds to a skew transformation in the (푟, 푞)-plane when applied to a vector or matrix. The update S(푟, 푞, 푦)⊤A affects only two rows (푟-th and 푞-th) of A. Likewise, the update AS(푟, 푞, 푦) affects just two columns (푟-th and 푞-th) of A.
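A minimal sketch of the shear matrix (2.66) and two of its properties used later (non-orthogonality, and the fact that a similarity transform with it can change the Frobenius norm, unlike a Jacobi rotation). The dimensions and parameter values are arbitrary.

import numpy as np

def shear(K, r, q, y):
    # Shear matrix S(r, q, y) of (2.66)
    S = np.eye(K)
    S[r, r] = S[q, q] = np.cosh(y)
    S[r, q] = S[q, r] = np.sinh(y)
    return S

K, r, q, y = 5, 1, 3, 0.3
S = shear(K, r, q, y)
print(np.allclose(S @ S.T, np.eye(K)))                    # False: S is not orthogonal
print(np.allclose(np.linalg.inv(S), shear(K, r, q, -y)))  # True: inverse flips the sign of y

A = np.random.default_rng(0).standard_normal((K, K))
# The similarity transform with S changes the Frobenius norm (the normality measure)
print(np.linalg.norm(np.linalg.inv(S) @ A @ S), np.linalg.norm(A))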

Non-Orthogonal Joint Diagonalization by Similarity Algorithms. The main idea behind all the NOJD by similarity algorithms is to construct a sequence of matrices 풜(0), 풜(1), 풜(2), ... , 풜(퐽) :

풜(푗) = {A1(푗), A2(푗), ..., A푃(푗)},

11. Recall that a matrix A is normal if AA⊤ = A⊤A.

such that 풜(0) = 풜 and, at every iteration 푗, the matrices are constructed by consecutive similarity transformations, first with the optimal shear matrix S(푗) = S(푟, 푞, 푦(푗)) and then with the optimal Jacobi rotation G(푗) = G(푟, 푞, 휃(푗)) :

풜(푗) = G(푗)⊤S(푗)−1풜(푗−1)S(푗)G(푗), (2.67)

where the indices 푟 = 푟(푗) and 푞 = 푞(푗) are chosen in accordance with some rule (see below) and we discuss below the choice of the optimal shear parameter 푦(푗) and Jacobi angle 휃(푗). Note that in expression (2.67), the matrix S(푗)−1 denotes the inverse of the optimal shear matrix S(푗).

For convenience, let us separate the similarity transformations with the shear and Jacobi rotation matrices. Given the matrices 풜(푗−1) after the (푗 − 1)-th iteration, define :

풜′(푦) = S^{−1}(푟, 푞, 푦) 풜(푗−1) S(푟, 푞, 푦), 풜′(푗) = 풜′(푦(푗)), (2.68)
풜′′(휃) = G⊤(푟, 푞, 휃) 풜′(푗) G(푟, 푞, 휃), 풜(푗) = 풜′′(휃(푗)). (2.69)

Considering each of these transformations separately at each iteration of the algorithm leads to two consecutive updates : the shear and Jacobi updates.

The Shear Update. The shear update is defined as the shear transformation from (2.68) whose shear parameter 푦 is optimal in some sense. In fact, the algorithms by Fu and Gao [2006], Iferroudjene et al. [2009], Luciani and Albera [2010] differ essentially in the way this optimality of the shear transformation is defined. The sh-rt algorithm [Fu and Gao, 2006] and the JUST algorithm [Luciani and Albera, 2010] define the optimal shear parameter as :

푦(푗) = arg min_{푦∈R} ∑_{푝=1}^{푃} ‖A′푝(푦)‖²_퐹 , (2.70)

and propose some heuristics to approximate this optimal shear parameter, since a closed form solution to this problem cannot be easily obtained, as opposed to the Jacobi update case (see Section 2.3.1). Note that the squared Frobenius norm ‖A‖²_퐹 appearing in the objective is called the normality measure (see below). The JUST algorithm formulates the optimal shear parameter as 푦(푗) = arg min_{푦∈R} ∑_{푝=1}^{푃} Off(A′푝(푦)), where the sum of squared off-diagonal elements Off(·) is defined in (2.58), and proposes complex expressions for a closed form solution. Note that in both cases the objectives tend to infinity whenever 푦 → ±∞. Therefore, one can rewrite these optimization problems with the constraint 푦 ∈ [푦퐿, 푦푅] for 푦퐿 and 푦푅 sufficiently large. This allows one to compute the optimal solution of (2.70) with an exhaustive search approach. We compared experimentally all four approaches to the NOJD problem and observed that, although the convergence properties of the algorithms can slightly differ, the difference in the accuracy of the obtained solutions is barely noticeable.

The Jacobi Update. The Jacobi update (2.69) is exactly equivalent to the Jacobi update of the orthogonal joint diagonalization algorithm from Section 2.3.1 applied to the matrices 풜′(푗), and therefore its description is omitted.

Algorithm 3 Non-Orthogonal Joint Diagonalization (NOJD) by Similarity
1: Initialize : 풜(0) ← 풜, Q(0) ← I푀 , and iterations 푗 ← 0
2: for sweeps 1, 2, ... do
3:   for 푟 = 1, . . . , 퐾 − 1 do
4:     for 푞 = 푟 + 1, . . . , 퐾 do
5:       Increase 푗 ← 푗 + 1
6:       Compute or approximate the optimal shear parameter 푦(푗)
7:       Compute the optimal Jacobi angle 휃(푗) (closed form)
8:       Update Q(푗) ← Q(푗−1)S(푗)G(푗)
9:       Update 풜(푗) ← G(푗)⊤S(푗)−1풜(푗−1)S(푗)G(푗)
10:    end for
11:  end for
12: end for
13: Output : Q(푗)

Note that although Fu and Gao [2006], Iferroudjene et al. [2009] propose an approach different from that of Cardoso and Souloumiac [1996] to obtain a closed form solution for the optimal Jacobi angle, one can show that the solutions are equivalent. The NOJD by similarity algorithms are summarized in Algorithm 3.

Convergence. To the best of our knowledge, no theoretical analysis of the NOJD by similarity algorithms [Fu and Gao, 2006, Iferroudjene et al., 2009, Luciani and Albera, 2010] is available in the literature, except for the single matrix case when they boil down to the (non-normal or non-symmetric) eigendecomposition [Eberlein, 1962, Ruhe, 1968]. In the latter case, the algorithms converge globally at a (locally) quadratic convergence rate given some sophisticated rule for the order of the indices 푟 and 푞. A quadratic convergence rate is conjectured in the multiple matrix case, but has not been proved yet. In practice, we always use the lexicographical order for the choice of the indices 푟 and 푞.

Perturbation Analysis. To the best of our knowledge, no perturbation analysis of the NOJD by similarity algorithms is available in the literature. A potential extension of the results for the NOJD by congruence algorithms [Afsari, 2008, Kuleshov et al., 2015b,a] could be of interest.

Intuition Behind the Algorithms. The Schur decomposition says that for any diagonalizable matrix A there exists an orthogonal matrix Q such that Q⊤AQ = Λ + N, where the matrix Λ is diagonal with the eigenvalues of A on the diagonal and the matrix N is strictly upper triangular [Golub and Van Loan, 2013, Chapter 7]. Moreover, the infimum over non-singular matrices M of ‖M^{−1}AM‖²_퐹 equals ‖Λ‖²_퐹 and, therefore, a diagonalized version of the matrix A must have the smallest Frobenius norm [Ruhe, 1968]. Since the Jacobi transformation G(푟, 푞, 휃)⊤AG(푟, 푞, 휃) does not change the Frobenius norm, the Frobenius norm ‖M^{−1}AM‖²_퐹 can only be minimized with a shear transformation, i.e. when M is equal to (a product of) shear matrices. This explains why a shear transformation is necessary for the NOJD-type algorithms. If a matrix is normal, the strictly upper triangular matrix N in its Schur decomposition vanishes and the matrix M contains the eigenvectors of A in its columns. This explains why the squared Frobenius norm is called the normality measure : decreasing the normality measure of a matrix “moves” this matrix closer to a normal matrix ; this also explains why a normal diagonalizable matrix can be diagonalized by an orthogonal matrix, such as a Jacobi rotation matrix, which preserves the Frobenius norm. Hence, the optimal shear transformation decreases the deviation from normality by minimizing the normality measure, and then the optimal Jacobi transformation decreases the deviation from diagonality by minimizing the diagonality measure (the sum of squared off-diagonal elements).

2.4 Latent Linear Models : Estimation and Inference

In this thesis, we refer to the process of estimating the model parameters as estimation, while we use the term inference for the process of inferring the latent variables given an observation, which is the more standard terminology in the frequentist literature. For most probabilistic models of practical interest, estimation and inference are intractable : for example, this is the case for almost all the models from Chapter 1, with the exception of principal and canonical correlation analyses. For example, it was shown that the maximum a posteriori (MAP) based inference for LDA is NP-hard if the effective number of topics per document is large [Sontag and Roy, 2011]. Moreover, the maximum likelihood-based estimation of the topic matrix in topic models is also NP-hard [Arora et al., 2012]. Therefore, one is interested in approximation methods. Below we briefly recall some of them : the expectation maximization algorithm-based methods as well as some moment matching-based methods.

2.4.1 The Expectation Maximization Algorithm

The EM Algorithm

The expectation-maximization (EM) algorithm [Dempster et al., 1977, McLachlan and Krishnan, 2007] is a powerful method for finding maximum likelihood solutions for models with latent or missing variables. In general, let x denote all observed variables, z denote all latent variables, and 휃 denote all parameters of the model ; the likelihood function is given by

푝(x|휃) = ∫ 푝(x, z|휃) 푑z,

and the goal is to maximize the log of the likelihood, log 푝(x|휃) (the discussion is similar in the case of discrete latent variables). The EM algorithm is of interest when direct maximization of the likelihood 푝(x|휃) is difficult, but optimization of the complete-data likelihood 푝(x, z|휃) is significantly easier (e.g., the case of the GMMs or the pLSI model).

For any distribution 푞(z) over the latent variables, such that 푞(z) > 0 if 푝(x, z|휃) > 0, it holds by Jensen's inequality that

ℒ(휃) := log ∫ 푞(z) (푝(x, z|휃)/푞(z)) 푑z ≥ ∫ 푞(z) log (푝(x, z|휃)/푞(z)) 푑z := ℱ(푞, 휃),

where the functional ℱ(푞, 휃) on the RHS is a lower bound. The EM algorithm then maximizes this lower bound by alternating the maximization with respect to the distribution 푞(z) over the latent variables, keeping the parameters 휃_old fixed (the E step), and the maximization with respect to the parameters 휃, keeping the distribution 푞_old(z) over the latent variables fixed (the M step). One can show that such a procedure increases (or does not change) the log-likelihood function ℒ(휃) at every iteration. However, even if the maximization problems at the E and M steps have closed form solutions, the overall procedure is not guaranteed to converge to a local optimum (although it usually does), but it is guaranteed to converge to a stationary point. In general, however, the EM algorithm does not have any global convergence guarantees.
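As a toy illustration of the alternating E and M steps (not a model used in this thesis), the sketch below runs EM for a two-component one-dimensional Gaussian mixture, where both steps have closed forms ; all numerical values are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E step: posterior responsibilities q(z) under the current parameters
    dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    weighted = dens * np.array([pi, 1 - pi])
    resp = weighted / weighted.sum(axis=1, keepdims=True)
    # M step: re-estimate the parameters with q(z) fixed
    Nk = resp.sum(axis=0)
    pi = Nk[0] / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)

print(np.round(mu, 2), np.round(sigma, 2), round(pi, 2))  # close to (-2, 3), (1, 1), 0.3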

Variational Inference

Variational inference can be used to approximate the E step of the EM algorithm for models where the E step is intractable. In such a case, the EM algorithm is referred to as the variational EM algorithm. The idea of variational inference [Jordan, 1999, Jaakkola, 2001] is to approximate the distribution of the latent variables by maximizing the lower bound ℱ(푞, 휃) over some class of functions (probability distributions), keeping the parameters 휃 fixed. This has a straightforward probabilistic meaning since ℒ(휃) − ℱ(푞, 휃) = 퐾퐿(푞||푝), where 퐾퐿(푞||푝) is the KL divergence between the approximating distribution 푞 and the true distribution 푝 of the latent variables. Therefore, maximizing the lower bound ℱ(푞, 휃) over 푞 is equivalent to minimizing the KL-divergence 퐾퐿(푞||푝), which finds the distribution 푞 closest to 푝 within a given class of distributions. The class of distributions is chosen to be some tractable class, such as distributions that factorize, i.e. 푞(z) = ∏_푘 푞푘(푧푘), which gives the so-called variational mean field inference procedure. For more details, see, e.g., Bishop [2006], Murphy [2012].

Sampling Methods

Using sampling methods is another way to approximate the E step of the EM algorithm for models where the E step does not have a closed form solution. Markov chain Monte Carlo (MCMC) [Gilks et al., 1995] is a class of algorithms for sampling from complicated probability distributions based on constructing a Markov chain that converges to a desired distribution and then sampling from this Markov chain to approximate the objective of the M step. Each state of the chain is an assignment of values to the variables being sampled and, in practice, the state of the chain after a number of steps is used as a sample of the desired distribution. A popular sampling method is Gibbs sampling, which requires that all the conditional distributions of the target distribution can be sampled exactly. For example, Griffiths [2002] describes such a Gibbs sampling procedure for latent Dirichlet allocation. Since the Markov chain is only known to converge to a desired distribution asymptotically, the practical implementation with a finite number of steps (which is often difficult to choose) does not have any guarantees on the obtained solution. For more details, see, e.g., Bishop [2006], Murphy [2012].

2.4.2 Moment Matching Techniques

In this section, we outline the main idea of the method of moments for the estimation of the linear transformation matrix in latent linear models, using the example of the LDA model and the diagonal (symmetric CPD) form of the LDA cumulants (see Section 2.2.3). An advantage of this approach over the variational inference and sampling approaches described in Section 2.4.1 is the theoretical guarantees on the quality of the recovered topic matrix [Anandkumar et al., 2012a, 2014]. Note that other algorithms for the estimation and inference with theoretical guarantees are available [Arora et al., 2012, 2013, 2015]. The algorithms described in this section are applicable only in the case when the columns of the topic matrix are linearly independent, which implies that 퐾 ≤ 푀, i.e. the (under)complete case.

This class of algorithms is well known in the signal processing and machine learning literature (some related references can be found in Section 3.2). In the ICA literature, these algorithms are known as cumulant-based algorithms with prewhitening [see, e.g., Cardoso, 1989, 1990, Cardoso and Souloumiac, 1993, De Lathauwer, 2006, Comon and Jutten, 2010].

The common idea behind all these methods is to use the diagonal structure of the population statistics of a model for the estimation of the topic matrix. The problem is that the second-order information is often not sufficient for recovery (unless some additional assumptions, such as sparsity of the topic matrix, are made). This motivates using the third-order 12 information jointly with the second-order information, which can be seen as a problem of joint diagonalization of the matrix S := S퐿퐷퐴 and the tensor 풯 := 풯 퐿퐷퐴, both in the symmetric CP form with the topic matrix in place of the factor matrix, i.e.

S = D̃D̃⊤,
풯 = 풢 ×1 D̃⊤ ×2 D̃⊤ ×3 D̃⊤,

where the core tensor 풢 ∈ R^{퐾×퐾×퐾} is diagonal ; this form of the matrix S is without loss of generality due to the permutation and scaling unidentifiability of the model (however, the columns of the “topic” matrix D̃ do not have to sum to one anymore). Let A ∈ R^{퐾×푀} be a matrix such that ASA⊤ and 풯 ×1 A⊤ ×2 A⊤ ×3 A⊤ are diagonal.

12. Sometimes higher order information is used. For instance, when the sources in the ICA model are expected to have symmetric priors, the fourth-order information is used since the odd-order cumulants are zero for symmetric distributions.

Then, up to permutation and scaling, AD̃ = I and one can recover D̃ as D̃ = A† (see more details in Section 3.4).

Prewhitening. The goal of the prewhitening procedure is to find a matrix W ∈ R^{퐾×푀}, which is called the whitening matrix, such that

WSW⊤ = I. (2.71)

This procedure takes its name from the fact that in the ICA context the matrix S is just a covariance matrix cov(x). Hence, the prewhitening transformation (2.71) corresponds to a linear transformation of the observation vector x into another vector z := Wx with unit covariance cov(z) = I. A whitening matrix is not uniquely defined. Indeed, let Q be an arbitrary orthogonal matrix of the appropriate size and W be a whitening matrix. Then the matrix W̃ obtained by left multiplication of the whitening matrix W by this orthogonal matrix Q, i.e. W̃ := QW, is still a whitening matrix since it preserves the equality in (2.71) : W̃SW̃⊤ = QWSW⊤Q⊤ = QQ⊤ = I. A whitening matrix can be computed via the eigendecomposition of S. Let S = UΛU⊤ ; then W = [U_{1:퐾}Λ_{1:퐾}^{1/2}]† = Λ_{1:퐾}^{−1/2}U_{1:퐾}⊤, where Λ_{1:퐾} is the diagonal matrix formed by the 퐾 largest eigenvalues and U_{1:퐾} is the matrix which contains the respective eigenvectors in its columns.

The “Correct” Rotation. When W is found, the matrix A can be formed as A = QW, for some “correct” orthogonal matrix Q. Therefore, one is left with the problem of finding such an orthogonal matrix Q from the third-order information. For that, let us transform the tensor 풯 with the found whitening matrix along all modes ; this gives the 퐾 × 퐾 × 퐾 tensor 풯 := 풯 ×1 W⊤ ×2 W⊤ ×3 W⊤. In the ideal population case, this tensor 풯 is an orthogonally decomposable tensor (see Section 2.3.1 and the decomposition (2.50)). In the practical finite sample case, such a decomposition holds approximately. In any case, finding the factor matrix with one of the algorithms described in Section 2.3.1 boils down to finding a “correct” orthogonal matrix Q. More details on such algorithms can be found in Section 3.4.

Guarantees. Hence, in the ideal population case, such a two-step procedure with, e.g., the ED-based algorithm or the tensor power method used as the algorithm for the second step, guarantees the global solution of the estimation problem in LDA. In the practical finite sample case, lower bounds on the quality of the recovery are available [Anandkumar et al., 2012a, 2014].

Non-Orthogonal Approaches. Note that the prewhitening step introduces some error and is not necessarily the best way to solve the estimation problem [Cardoso, 1994b, De Lathauwer et al., 2005, Souloumiac, 2009b] given finite sample estimates of the matrix S and tensor 풯 in the symmetric CP form. Another class of algorithms is the so-called non-orthogonal joint diagonalization algorithms by congruence, which contract the tensor 풯 with random vectors without the preliminary prewhitening and then jointly diagonalize the obtained matrices, but with a non-orthogonal matrix [see, e.g., Afsari, 2006, 2008, Souloumiac, 2009a]. This problem is more difficult than the OJD problem described in Section 2.3.1.
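To wrap up this section, a minimal numpy sketch of the prewhitening step (2.71) : the whitening matrix is built from the 퐾 leading eigenpairs of a synthetic matrix S of rank 퐾, and the rotation ambiguity is checked explicitly. All values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
M, K = 10, 3
D = rng.random((M, K))
S = D @ np.diag(rng.uniform(1.0, 2.0, K)) @ D.T        # S = D Diag(.) D^T, rank K

eigvals, U = np.linalg.eigh(S)
idx = np.argsort(eigvals)[::-1][:K]                    # K largest eigenvalues
W = np.diag(eigvals[idx] ** -0.5) @ U[:, idx].T        # W = Lambda^{-1/2} U^T, K x M
print(np.allclose(W @ S @ W.T, np.eye(K)))             # True: W S W^T = I

# Any rotation Q W of a whitening matrix W is still a whitening matrix
Q, _ = np.linalg.qr(rng.standard_normal((K, K)))
W2 = Q @ W
print(np.allclose(W2 @ S @ W2.T, np.eye(K)))           # True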

Chapter 3

Moment Matching-Based Estimation in Topic Models

Abstract

In this chapter, we draw explicit links between latent Dirichlet allocation (LDA ; see Section 1.2.4) and discrete versions of independent component analysis (ICA ; see Section 1.1.4). Using this strong connection between LDA and ICA, we introduce a novel semiparametric topic model, which we call discrete independent component analysis (DICA). In the DICA model, no assumption on the latent sources is made (although a non-Gaussianity assumption is needed for identifiability ; see Chapter 4) and it is not necessary to know the distributions of the latent sources in order to perform the estimation, which increases the expressive power of the model.

While early work has focused on graphical-model approximate inference techniques such as variational inference or Gibbs sampling (see Section 2.4), tensor-based moment matching techniques have recently emerged as strong competitors due to their computational speed and theoretical guarantees [Anandkumar et al., 2012a, 2013a, 2014, 2015a]. We show that, similarly to the higher-order statistics closely related to moments of the LDA model (see Section 2.2.3), the population higher-order statistics closely related to cumulants of the discrete ICA model can be represented as tensors in the form of a symmetric non-negative canonical polyadic decomposition (see Section 2.1.2). This allows us to reuse numerous techniques from the ICA literature to develop novel tensor-based algorithms based on joint diagonalization for the estimation in topic models, showing improvement over their predecessors, the spectral algorithm [Anandkumar et al., 2012a, 2015a] and the tensor power method [Anandkumar et al., 2014], which nevertheless have strong theoretical guarantees. We also prove that in some practical scenarios the new cumulant based DICA tensors have improved sample complexity over the LDA based moment tensors. The content of this chapter was previously published as Podosinnikova et al. [2015].

3.1 Contributions

Below we outline the contributions of this chapter.
- In Section 3.3, we introduce a novel semiparametric topic model, discrete ICA, which is closely related to both latent Dirichlet allocation and independent component analysis, but has higher expressive power due to its semiparametric nature, where the distributions of the latent sources are not specified.
- In Section 3.3.2, we derive novel cumulant-based tensors for the gamma-Poisson and discrete ICA models and show that the population versions of these tensors take the form of the symmetric non-negative canonical polyadic decomposition (a.k.a. diagonal form). We also present sample complexity results for natural sample estimators of these tensors.
- In Section 3.4, we propose a novel algorithm for the estimation in topic models, which is based on orthogonal joint diagonalization of contractions of the DICA cumulant-based tensors after prewhitening. Other algorithms, such as the tensor power method and the ED-based algorithm (a.k.a. spectral method), are applicable for the estimation as well. Since the gamma-Poisson topic model is a special case of the DICA model, the algorithms also apply to the gamma-Poisson model without any modification.
- In Section 3.5, we perform an extensive experimental comparison of the diagonalization algorithms (orthogonal joint diagonalization, the tensor power method, and the spectral method) for the estimation in the LDA and DICA models as well as, for the first time 1 to the best of our knowledge, compare these algorithms with the variational inference-based algorithms.

3.2 Related Work

The algorithms proposed in this chapter are closely related both to recent learning algorithms for latent Dirichlet allocation (LDA) [Anandkumar et al., 2012a, 2013a, 2015a, 2014, Arabshahi and Anandkumar, 2016] and to orthogonal joint diagonalization type ICA algorithms [Bunse-Gerstner et al., 1993, Cardoso and Souloumiac, 1993, 1996, Cardoso, 1999, De Lathauwer, 2010]. Note also that a scalable implementation of the tensor power method for LDA [Anandkumar et al., 2014] is proposed by Wang et al. [2014]. Such algorithms often come with theoretical guarantees, as opposed to the variational inference and sampling methods.

Another class of topic modeling algorithms with theoretical guarantees is based on the matrix factorization point of view [Arora et al., 2012, 2013, 2015]. In this case, the key assumption is the presence of anchor words in topics, i.e. words that appear mostly in that topic, which is related to sparsity. Similarly to the discrete ICA introduced in this chapter, they do not make any assumptions on the topic intensity distribution,

1. Some of these algorithms were previously compared with the Gibbs sampling-based methods by Wang et al.[2014], but our experiments are more detailed.

but the assumptions on the topic matrix are somewhat stronger than the ones in this chapter.

Another related work is by Arabshahi and Anandkumar [2016], where a new class of topic models, called latent normalized infinitely divisible topic models, is introduced. They prove that moment-based higher-order statistics of these topic models are tensors in the CP form and that the tensor power method can be used for the estimation. This class of models is a direct extension of LDA where the topic intensities are modeled as a normalized infinitely divisible (NID) random vector [Favaro and Hadjicharalambous, 2011, Mangili and Benavoli, 2015], which is formed by normalizing independent positive infinitely divisible variables. The LDA model is a special case of such a class of models. Moreover, some models from this class allow modeling both positive and negative correlations among topics, thus being more flexible than correlated topic models or pachinko allocation [Li and McCallum, 2006]. Note that the diagonalization algorithms discussed in this chapter are also applicable for the estimation in latent NID topic models. As opposed to the DICA model, latent NID topic models require the specification of the prior for the topic intensities, while DICA is a semiparametric model, where the distribution is left unspecified.

3.3 Discrete ICA

In Section 1.2, we outlined latent linear models for count data, which include topic models and latent Dirichlet allocation in particular. We recalled the equivalence of the LDA model in the standard tokens formulation (1.16) to the counts formulation (1.17), which is also known as the multinomial PCA or Dirichlet-multinomial model. We also saw that these models are examples of the admixture model (1.12). In this section, we show that under mild assumptions the LDA model is equivalent to the gamma-Poisson model, which motivates us to introduce the new semiparametric discrete ICA model in Section 3.3.1. This model is designed for working with non-negative discrete data and is able to adjust to different prior distributions on the latent factors, hence having higher expressive power. We also show (in Section 3.3.2) that the population higher-order cumulant-based statistics of this DICA model admit a representation in the form of the symmetric non-negative CP decomposition with the topic vectors as factors, which further allows us to develop fast and efficient estimation algorithms (in Section 3.4). In Section 3.3.3, we present some sample complexity results for the DICA cumulant-based tensors.

3.3.1 Topic Models are PCA for Count Data

Latent Dirichlet allocation [Blei et al., 2003] is a generative probabilistic model for discrete data such as text corpora. In accordance with this model, a document is modeled as an admixture over the vocabulary of 푀 words with 퐾 latent topics. Specifically, the latent Dirichlet variable 휃 (a.k.a. the vector of topic intensities) represents the topic mixture proportion over the 퐾 topics for a document. The variable 휃 takes values in the (퐾 − 1)-probability simplex Δ퐾 . Given 휃, the topic state vector

zℓ|휃 of the ℓ-th token of this document is drawn from the discrete distribution with probability vector 휃. The ℓ-th token wℓ|zℓ, 휃 is then sampled from the discrete distribution with probability vector d_{zℓ}, which stands for the 푘-th topic d푘 where 푘 is such that [zℓ]푘 = 1. Each topic is a vector of probabilities over the words from the vocabulary, subject to the probability simplex constraint, i.e., d푘 ∈ Δ푀 for all 푘. This generative process 2 of a document is summarized as

(퐿 ∼ Poisson(휆)),
휃 ∼ Dirichlet(c),
zℓ|휃 ∼ Mult(1, 휃),                                   LDA-tok model (3.1)
wℓ|zℓ ∼ Mult(1, d_{zℓ}),

which is illustrated with the plate notation in Figure 3-1a. The Poisson assumption on the document length is discussed below. In Section 1.2.4, we rigorously showed the equivalence of the LDA-tok model to the LDA-counts model (1.17), which we refer to as the LDA model in this section :

(퐿 ∼ Poisson(휆)),
휃 ∼ Dirichlet(c),                                    LDA model (3.2)
x|휃 ∼ Mult(퐿, D휃),

which is illustrated with a plate diagram in Figure 3-1b. This model is also known under the names of multinomial PCA, the Dirichlet-multinomial model, or discrete PCA [see also Buntine, 2002, Buntine and Jakulin, 2004, 2006] :

LDA as Discrete PCA. Principal component analysis (PCA) admits the following probabilistic interpretation [Roweis, 1998, Tipping and Bishop, 1999, see also Section 1.1.3] :

훼 ∼ 풩 (0, I퐾 ),
x | 훼 ∼ 풩 (D훼, 휎²I푀 ),                               (3.3)

where D ∈ R^{푀×퐾} is a linear transformation matrix called the factor loading matrix and 휎 ∈ R++ is a positive parameter. Since the maximum likelihood estimate of the matrix D in the model above is equivalent to the standard PCA solution, the model (3.3) is referred to as probabilistic principal component analysis (PPCA).

The expectation of the observation vector in both models, (3.2) and (1.8), is equal to a linear transformation of the latent variables : E푝(x|휃)(x) = D휃 and E푝(x|훼)(x) = D훼, respectively. Therefore, through the close connection between these two models, Buntine [2002] proposes to consider the LDA model (3.2) as a discretization of principal component analysis via replacing the normal likelihood in (3.3) with the multinomial one and appropriately adjusting the prior, which is usually chosen as the conjugate prior for the likelihood. This lets us see LDA as PCA for count data.

2. Recall the probability density function of the Dirichlet distribution in (A.2) and the probability mass function of the discrete distribution in (A.4).

Note that other extensions of PCA to count data were proposed in the literature. For example, the categorical PCA model [Murphy, 2012, Chapter 12] :

훼 ∼ 풩 (0, I),
x|훼 ∼ ∏_{푚=1}^{푀} Mult(1, 풮(D푚훼 + w푚)),

where the parameters D푚 ∈ R^{푀×퐾} and w푚 ∈ R^푀 , and the softmax function 풮(y) transforms a 퐾-vector y of arbitrary real values into a 퐾-dimensional vector z = 풮(y) such that z ∈ Δ퐾 : 푧푗 = [풮(y)]푗 = 푒^{푦푗}/∑_푘 푒^{푦푘}. Similarly to probabilistic PCA, this model is unidentifiable due to the isotropic Gaussian prior for the latent variable. Moreover, estimation and inference in this model are difficult tasks, especially since the prior is not conjugate.

The LDA Document Length. Importantly, the LDA model does not model the document length. Indeed, although the original paper [Blei et al., 2003] proposes to model the document length as 퐿 ∼ Poisson(휆), this is never used in practice and, in particular, the parameter 휆 is not learned. Therefore, in the way that the LDA model is typically used, it does not provide a complete generative process of a document as there is no rule to sample 퐿. In this section, by modeling the document length we make the link with the gamma-Poisson model and further motivate the discrete ICA model.

The Gamma-Poisson Model. The GP model 3 was introduced by Canny [2004]. It models the latent variables 훼 = (훼1, 훼2, . . . , 훼퐾 ) as independent gamma variables and the counts 푥푚|훼 (the observations) as independent Poisson variables :

훼푘 ∼ Gamma(푐푘, 푏푘),
푥푚|훼 ∼ Poisson([D훼]푚),                               GP model (3.4)

where, as before, the matrix D = [d1, d2, ..., d퐾 ] ∈ R^{푀×퐾}, the shape parameters c = (푐1, 푐2, . . . , 푐퐾 ) ∈ R^퐾_{++}, and the rate parameters b = (푏1, 푏2, . . . , 푏퐾 ) ∈ R^퐾_{++}. The GP model is illustrated with a plate diagram in Figure 3-1c. Since the expectation of the count vector x = (푥1, 푥2, . . . , 푥푀 ) is equal to E푝(x|훼)(x) = D훼, the GP model is a particular case of the admixture model (1.12). In fact, the rate parameter controls the length of documents in the GP model and it is convenient to set 푏1 = 푏2 = ··· = 푏퐾 = 푏 ∈ R++ (see below). The gamma-Poisson model is known as a probabilistic model for non-negative matrix factorization : the original NMF updates [Lee and Seung, 1999, 2001] appear as the updates of the EM algorithm for maximum likelihood estimation in a particular gamma-Poisson model [Cemgil, 2009, Dikmen and Févotte, 2012, Paisley et al., 2014].

We will see shortly that the LDA model (3.1) with a small extension is equivalent to

3. Recall the probability density function of a gamma random variable in (A.6) and the probability mass function of a Poisson random variable in (A.5).

the GP model (3.4) [Buntine and Jakulin, 2004, Canny, 2004]. The extension naturally arises when the document length for the LDA model is modeled as a random variable from the gamma-Poisson mixture (which is equivalent to a negative binomial random variable), i.e.,

퐿|휆 ∼ Poisson(휆), 휆 ∼ Gamma(푐0, 푏), (3.5)

where 푐0 := ∑_{푘=1}^{퐾} 푐푘.

The Connection Between the LDA and GP Models. We further show that the LDA model supplemented with the additional constraint (3.5) on the document length is equivalent to the GP model (3.4). Let y = D휃 ∈ Δ푀 . It is known [see, e.g., Ross, 2010] that if 퐿 ∼ Poisson(휆) and x|퐿 ∼ Mult(퐿, y) (which means that 퐿 = ∑_푚 푥푚 with probability one), then 푥1, 푥2, . . . , 푥푀 are mutually independent Poisson random variables with parameters 휆푦1, 휆푦2, ..., 휆푦푀 . Hence, the LDA model with the document length assumption (3.5) is equivalent to the following model

휆 ∼ Gamma(푐0, 푏),
휃 ∼ Dirichlet(c),                                      (3.6)
푥푚 | 휆, 휃 ∼ Poisson([D(휆휃)]푚),

where 푐0 := ∑_푘 푐푘. To show the equivalence of the model (3.6) to the gamma-Poisson model (3.4), we use the fact that a Dirichlet random variable can be constructed from the normalization of independent gamma random variables [see, e.g., Frigyik et al., 2010]. More specifically, when 훼1, 훼2, . . . , 훼퐾 are mutually independent gamma random variables, each 훼푘 ∼ Gamma(푐푘, 푏), their sum is also a gamma random variable, ∑_푘 훼푘 ∼ Gamma(∑_푘 푐푘, 푏), and is thus distributed as 휆. It is known [see, e.g., Frigyik et al., 2010] that a Dirichlet random variable can be sampled by first sampling independent gamma random variables (훼푘) and then dividing each of them by their sum (휆) : 휃푘 = 훼푘/∑_{푘′} 훼_{푘′} ; in the other direction, the variables 훼푘 = 휆휃푘 are mutually independent, giving back the gamma-Poisson model (3.4).
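A quick numpy simulation of this equivalence : sampling 휆 and 휃 and setting 훼푘 = 휆휃푘 gives the same distribution as sampling independent gamma variables 훼푘 ∼ Gamma(푐푘, 푏). The parameter values below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
c = np.array([0.5, 1.0, 2.0])
b, n = 1.5, 200_000

# Route 1: independent gammas, as in the GP model (3.4)
alpha_gp = rng.gamma(shape=c, scale=1.0 / b, size=(n, len(c)))

# Route 2: lambda ~ Gamma(c0, b), theta ~ Dirichlet(c), alpha = lambda * theta
lam = rng.gamma(shape=c.sum(), scale=1.0 / b, size=n)
theta = rng.dirichlet(c, size=n)
alpha_lda = lam[:, None] * theta

print(alpha_gp.mean(axis=0), alpha_lda.mean(axis=0))   # both close to c / b
print(alpha_gp.var(axis=0), alpha_lda.var(axis=0))     # both close to c / b^2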

The GP Document Length. The derivations above also imply that the length of a document from the gamma-Poisson model (3.4) is the gamma-Poisson variable (3.5). Hence, by the laws of total expectation and total variance :

E(퐿) = E [E(퐿|휆)] = E(휆) = 푐0/푏,
var(퐿) = var [E(퐿|휆)] + E [var(퐿|휆)] = var(휆) + E(휆) = 푐0/푏² + 푐0/푏.

This means that the rate parameter 푏 can be seen as a scaling parameter for the document length, when 푐0 is already prescribed : the smaller 푏, the larger E(퐿). On the other hand, if we allow 푐0 to vary as well, only the ratio 푐0/푏 is important for the document length. We can then interpret the role of 푐0 as actually controlling the concentration of the distribution for the length 퐿 (through the variance). More specifically, we have that :

var(퐿)/(E(퐿))² = 1/E(퐿) + 1/푐0. (3.7)

Figure 3-1 – Plate diagrams for the models from Section 3.3 : (a) LDA (3.1), (b) LDA (3.2), (c) GP (3.4), (d) DICA (3.8).

For a fixed target document length E(퐿), we can increase the variance (and thus decrease the concentration) by using a smaller 푐0.

Discrete ICA. Buntine and Jakulin [2004] propose to refer to the gamma-Poisson model (3.4) as a discrete ICA (by analogy with the discrete PCA for LDA). It is more natural, however, to name the following model

훼1, . . . , 훼퐾 ∼ mutually independent,
푥푚|훼 ∼ Poisson([D훼]푚),                               DICA model (3.8)

as the discrete 4 ICA (DICA) model. The only difference between (3.8) and the standard ICA model without additive noise (see, e.g., Section 1.1.4) is the presence of the Poisson noise, which induces discrete, instead of continuous, values of 푥푚. Note also that (a) the discrete ICA model (3.8) is a semiparametric model [Bickel et al., 1998] that can adapt to any distribution on the topic intensities 훼푘 (the distribution is unknown and we do not care to estimate it) ; (b) the GP model (3.4) is a particular case of both the LDA model (3.2) and the DICA model (3.8) ; and (c) the DICA model is identifiable if the matrix D is full rank (note that the sources 훼 in the DICA model are assumed to be non-negative valued and, therefore, cannot be Gaussian). The discrete ICA model is illustrated with a plate diagram in Figure 3-1d.
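A minimal sketch of sampling a document-term count matrix from the DICA model (3.8), here with gamma-distributed topic intensities, i.e. the GP special case (3.4). The topic matrix D and all parameter values are synthetic placeholders.

import numpy as np

rng = np.random.default_rng(0)
M, K, N = 50, 3, 1000                                  # vocabulary size, topics, documents

D = rng.dirichlet(np.ones(M) * 0.5, size=K).T          # M x K topic matrix, columns on the simplex
c, b = np.array([0.3, 0.5, 0.8]), 0.02                 # gamma shape and rate parameters

alpha = rng.gamma(shape=c, scale=1.0 / b, size=(N, K)) # topic intensities (any prior would do)
X = rng.poisson(alpha @ D.T)                           # N x M counts, x_m ~ Poisson([D alpha]_m)

print(X.shape, X.sum(axis=1).mean())                   # average document length, about c.sum()/b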

3.3.2 GP and Discrete ICA Cumulants

In this section, we derive and analyze the novel cumulant-based tensors of the DICA model (3.8). As the GP model (3.4) is a particular case of the DICA model, the

4. Note that the name discrete ICA was also used in the literature in a different context : for the separation of discrete sources, not the discrete observations as here [see, e.g., Senecal and Amblard, 2000, 2001].

results of this section also apply to the GP model. See Section 2.2 for the definition of cumulants and some of their properties which are necessary for the understanding of this section.

Cumulants. The first three cumulants of an R^푀-valued random variable x are defined as (see also Section 2.2):

cum(x) := E(x), (3.9)
cum(x, x) := cov(x, x) = E[(x − E(x))(x − E(x))^⊤], (3.10)
cum(x, x, x) := E[(x − E(x)) ⊗ (x − E(x)) ⊗ (x − E(x))], (3.11)

where we denoted cum(x) = 휅_x^(1), cum(x, x) = 휅_x^(2), and cum(x, x, x) = 휅_x^(3) in the notation of Section 2.2. The essential property of cumulants, which does not hold for moments and that we use in this chapter, is that the cumulant tensor for a random vector with independent components is diagonal.

The S^DICA Cumulant. Let y = D훼 ; then for the Poisson random variable 푥푚|푦푚 ∼ Poisson(푦푚), the expectation is E(푥푚|푦푚) = 푦푚. Hence, by the law of total expectation and the linearity of expectation, the expectation in (3.9) has the following form

E(x) = E(E(x|y)) = E(y) = DE(훼). (3.12)

Further, the variance of the Poisson random variable 푥푚 is var(푥푚|푦푚) = 푦푚 and, as 푥1, 푥2, ... , 푥푀 are conditionally independent given y, then their covariance matrix is diagonal, i.e., cov(x, x|y) = Diag(y). Therefore, by the law of total covariance, the covariance in (3.10) has the form

cov(x, x) = E[cov(x, x|y)] + cov[E(x|y), E(x|y)] = Diag[E(y)] + cov(y, y) (3.13)
= Diag[E(x)] + D cov(훼, 훼) D^⊤,

where the last equality follows by the multilinearity property of cumulants. Moving the first term from the RHS of (3.13) to the LHS, we define

S^DICA := cov(x, x) − Diag[E(x)]. DICA S-cum. (3.14)

By the independence property of cumulants, the covariance of the latent variables 훼 is a diagonal matrix. Substituting this into the covariance matrix (3.13), and the covariance matrix (3.13) into the definition (3.14) of S^DICA, we obtain the following diagonal structure of S^DICA :

S^DICA = D cov(훼, 훼) D^⊤ = ∑_푘 var(훼푘) d푘 ⊗ d푘 = D Diag[var(훼)] D^⊤, (3.15)

where d푘 ⊗ d푘 = d푘 d푘^⊤ is the outer product. This diagonal structure is illustrated in Figure 3-2 and is different from the eigendecomposition : there is no orthogonality constraint on the matrix D in (3.15), but each column of D is constrained to the (푀 − 1)-simplex.

Figure 3-2 – The DICA population S^DICA and 풯^DICA cumulants : (a) the DICA population S^DICA cumulant (3.15) ; (b) the DICA population 풯^DICA cumulant (3.18).
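A quick numerical sanity check of the diagonal form (3.15): the following sketch (illustrative only; all sizes and distributions are arbitrary) estimates cov(x, x) − Diag[E(x)] by Monte Carlo for a small DICA model and compares it with D Diag[var(훼)] D^⊤.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, N = 8, 2, 500_000                          # small sizes; Monte Carlo error shrinks with N

D = rng.dirichlet(np.ones(M), size=K).T          # (M, K), columns on the simplex
alpha = rng.gamma(2.0, 5.0, size=(N, K))         # any non-negative prior on the intensities
X = rng.poisson(alpha @ D.T).astype(float)       # (N, M) counts

S_hat = np.cov(X, rowvar=False) - np.diag(X.mean(axis=0))   # sample version of S^DICA (3.14)
S_pop = D @ np.diag(alpha.var(axis=0)) @ D.T                # diagonal form (3.15)

print("max abs deviation:", np.abs(S_hat - S_pop).max())    # small, decreasing as N grows
```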

The 풯^DICA Cumulant. By analogy with the second order case, using the law of total cumulance, the multilinearity property of cumulants, the independence of 훼, and the expression (3.13), we derive in Appendix B.1 the expression (B.2), similar to (3.13), for the order-3 cumulant (3.11) of the DICA model :

cum(푥푚1, 푥푚2, 푥푚3) = [cum(훼, 훼, 훼) ×1 D^⊤ ×2 D^⊤ ×3 D^⊤]푚1푚2푚3 (3.16)
− 2훿(푚1, 푚2, 푚3) E(푥푚1) + 훿(푚2, 푚3) cov(푥푚1, 푥푚2)
+ 훿(푚1, 푚3) cov(푥푚1, 푥푚2) + 훿(푚1, 푚2) cov(푥푚1, 푥푚3),

where 훿 is the Kronecker delta. Moving all but the first terms in the RHS of this expression to the LHS, we define the tensor 풯^DICA element-wise as follows

풯^DICA_푚1푚2푚3 := cum(푥푚1, 푥푚2, 푥푚3) + 2훿(푚1, 푚2, 푚3) E(푥푚1)
− 훿(푚2, 푚3) cov(푥푚1, 푥푚2) DICA T-cum. (3.17)
− 훿(푚1, 푚3) cov(푥푚1, 푥푚2)
− 훿(푚1, 푚2) cov(푥푚1, 푥푚3).

Again, by the independence property of cumulants, the order-3 cumulant of the latent variable 훼 is a diagonal tensor. Substituting this into the order-3 cumulant (3.16) and then the order-3 cumulant in the definition (3.17) of the tensor 풯^DICA, we obtain the

following diagonal structure of the tensor 풯^DICA :

풯^DICA = ∑_푘 cum(훼푘, 훼푘, 훼푘) d푘 ⊗ d푘 ⊗ d푘, (3.18)

where ⊗ is the outer product. In fact, this is a non-negative symmetric CP decomposition (see Section 2.1.2) of the tensor 풯^DICA. See also the illustration in Figure 3-2. Sometimes, we will also refer to this form of the 풯^DICA tensor and the form (3.15) of the S matrix as the diagonal form or structure. In Section 3.4, we demonstrate how to use these properties of the DICA cumulants for the estimation in the DICA model.

The LDA Moments. In Section 2.2.3, we reviewed the moment-based LDA matrix S^LDA (2.41) and tensor 풯^LDA (2.42) [Anandkumar et al., 2012a, 2015a], which are analogues of the cumulant-based DICA matrix S^DICA (3.14) and tensor 풯^DICA (3.17). Slightly abusing terminology, we refer to the matrix S^LDA (2.41) and the tensor 풯^LDA (2.42) as the LDA moments and to the matrix S^DICA (3.14) and the tensor 풯^DICA (3.17) as the DICA cumulants. The diagonal structure (2.43) and (2.44) of the LDA moments is similar to the diagonal structure (3.15) and (3.18) of the DICA cumulants, though it arises through a slightly different argument. Indeed, the former is the result of properties of the Dirichlet distribution, while the latter is the result of the independence of the 훼's. However, one can think of the elements of a Dirichlet random vector as being almost independent (as, e.g., a Dirichlet random vector can be obtained from independent gamma variables through dividing each by their sum). This closeness of the structures of the LDA moments and the DICA cumulants can also be explained by the closeness of the respective models as discussed in Section 3.3. Importantly, due to this similarity, the algorithmic frameworks for both the DICA cumulants and the LDA moments coincide. However, LDA is parametric, while DICA is semiparametric.

The DICA cumulants have a somewhat more intuitive derivation than the LDA moments as they are expressed via the count vectors x (which are the sufficient statistics for the model) and not the tokens wℓ. Note also that the construction of the LDA moments depends on the unknown parameter 푐0. Given that we are in an unsupervised setting and that, moreover, the evaluation of LDA is a difficult task [Wallach et al., 2009b], setting this parameter is non-trivial. In Section 3.5.4, we observe experimentally that the LDA moments are sensitive to the choice of 푐0. Another (slight) difference, which can be seen as an advantage of the DICA cumulants over the LDA moments, is that the former do not require the somewhat artificial condition 퐿 ≥ 3 on the document length : they are well-defined for any document length.

3.3.3 Sample Complexity

Unbiased Finite Sample Estimators for the DICA Cumulants. In practice, population cumulants are never available and they have to be estimated using finite

sample estimators. Given a sample X = {x1, x2, . . . , x푁} of 푁 observations or documents, we define a finite sample estimate Ŝ^DICA of S^DICA (3.14) and 풯̂^DICA of 풯^DICA (3.17) for the DICA cumulants as :

Ŝ^DICA := ĉov(x, x) − Diag[Ê(x)], (3.19)

풯̂^DICA_푚1푚2푚3 := ĉum(푥푚1, 푥푚2, 푥푚3) + 2훿(푚1, 푚2, 푚3) Ê(푥푚1)
− 훿(푚2, 푚3) ĉov(푥푚1, 푥푚2) (3.20)
− 훿(푚1, 푚3) ĉov(푥푚1, 푥푚2)
− 훿(푚1, 푚2) ĉov(푥푚1, 푥푚3),

where unbiased estimators of the first three cumulants are

Ê(푥푚1) = (1/푁) ∑_{푛=1}^{푁} 푥푛푚1,
ĉov(푥푚1, 푥푚2) = (1/(푁 − 1)) ∑_{푛=1}^{푁} 푧푛푚1 푧푛푚2, (3.21)
ĉum(푥푚1, 푥푚2, 푥푚3) = (푁 / ((푁 − 1)(푁 − 2))) ∑_{푛=1}^{푁} 푧푛푚1 푧푛푚2 푧푛푚3,

where the word vocabulary indices are 푚1, 푚2, 푚3 = 1, 2, . . . , 푀 and the centered documents are 푧푛푚 := 푥푛푚 − Ê(푥푚). (The latter is introduced only for a compact representation of (3.21) and is different from the latent variable z in the LDA as well as other models.)
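As a concrete illustration of (3.19)–(3.21), the sketch below (an unoptimized reference implementation written for this text, not the fast expressions of Appendix C.1.2) assembles Ŝ^DICA and 풯̂^DICA from a matrix of count vectors for a small vocabulary.

```python
import numpy as np

def dica_cumulant_estimates(X):
    """X: (N, M) count matrix; returns (S_hat, T_hat) as in (3.19)-(3.21).
    Direct O(N M^3) construction; only meant for small M."""
    N, M = X.shape
    Ex = X.mean(axis=0)                                  # E_hat(x)
    Z = X - Ex                                           # centered documents z_nm
    cov = Z.T @ Z / (N - 1)                              # unbiased covariance
    cum3 = np.einsum('ni,nj,nk->ijk', Z, Z, Z) * N / ((N - 1) * (N - 2))

    S_hat = cov - np.diag(Ex)                            # (3.19)

    d = np.eye(M)                                        # delta(m, m') as a matrix
    T_hat = cum3.copy()
    T_hat += 2 * np.einsum('ij,ik,i->ijk', d, d, Ex)     # + 2 delta(m1,m2,m3) E(x_m1)
    T_hat -= np.einsum('jk,ij->ijk', d, cov)             # - delta(m2,m3) cov(x_m1, x_m2)
    T_hat -= np.einsum('ik,ij->ijk', d, cov)             # - delta(m1,m3) cov(x_m1, x_m2)
    T_hat -= np.einsum('ij,ik->ijk', d, cov)             # - delta(m1,m2) cov(x_m1, x_m3)
    return S_hat, T_hat
```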

Expressions for fast implementation of these finite sample estimators are derived in Appendix C.1.2. These expressions are used for the fast implementation of the algorithms in a software package, which is a part of this thesis work (see Appendix C).

Similar expressions for finite sample estimators of the LDA moment-based tensors are presented in Section 2.2.3 and expressions for their fast implementation are derived in Appendix C.1.1. These expressions are also used for fast implementation of the respective algorithms in the mentioned software package.

Sample Complexity. The following sample complexity results apply to the sample estimates of the GP cumulants : 5

Proposition 3.3.1. Under the GP model, the expected error for the sample estimator

5. Note that the expected squared error for the DICA cumulants is similar, but the expressions are less compact and, in general, depend on the prior on 훼푘.

Ŝ (3.19) for the GP cumulant S (3.14) is :

E[‖Ŝ^GP − S^GP‖_F] ≤ √(E[‖Ŝ^GP − S^GP‖_F²]) ≤ O( (1/√푁) max{∆퐿̄², 푐̄0퐿̄} ), (3.22)

where ∆ := max_푘 ‖d푘‖_2², 푐̄0 := min(1, 푐0) and 퐿̄ := E(퐿). A high probability bound could be derived using concentration inequalities for Poisson random variables [Boucheron et al., 2013] ; but the expectation already gives the right order of magnitude for the error (for example via Markov's inequality). A sketch of a proof for Proposition 3.3.1 can be found in Appendix B.2. See also the discussion of these sample complexity results at the end of Section 3.4.1.

We do not present the exact expression for the expected squared error for the estimator of 풯^GP, but due to a similar structure in the derivation, we expect the analogous bound E[‖풯̂^GP − 풯^GP‖_F] ≤ (1/√푁) max{∆^{3/2} 퐿̄³, 푐̄0^{3/2} 퐿̄^{3/2}}.

Current sample complexity results for the LDA moments [Anandkumar et al., 2012a] can be summarized as 푂(1/√푁). However, the proof (which can be found in the supplementary material [Anandkumar et al., 2013a]) analyzes only the case when finite sample estimates of the LDA moments are constructed from one triple per document, i.e., w1 ⊗ w2 ⊗ w3 only, and not from the U-statistics that average multiple (dependent) triples per document as in the practical expressions (2.45) and (2.46) (Section 2.2.3). Moreover, one has to be careful when comparing upper bounds. Nevertheless, comparing the bound (3.22) with the current theoretical results for the LDA moments, we see that the GP/DICA cumulants sample complexity contains the ℓ2-norm of the columns of the topic matrix D in the numerator, as opposed to the 푂(1) coefficient for the LDA moments. This norm can be significantly smaller than 1 for vectors in the simplex (e.g., ∆ = 푂(1/‖d푘‖0) for sparse topics). This suggests that the GP/DICA cumulants may have better finite sample convergence properties than the LDA moments, and our experimental results in Section 3.5 are indeed consistent with this statement. This difference, however, should decrease with the growth of the average document length in a corpus.

3.4 Estimation in the GP and DICA Models

In this section, we propose several algorithms for the estimation in the GP and DICA models. These algorithms are based on the symmetric CP structure of the DICA cumulant-based matrix S퐷퐼퐶퐴 and third-order tensor 풯 퐷퐼퐶퐴 and consist of a two step procedure similar to the one used for the estimation in the LDA (ICA) models as described in Section 2.4.2) : first, (a) the prewhitening of the matrix S퐷퐼퐶퐴 and, second, (b) finding the “right” orthogonal transformation using one of the algorithms for the orthogonal symmetric CPD of the prewhitened version of the tensor 풯 퐷퐼퐶퐴.

The second step can be performed in several different ways : through the eigendecomposition of a contraction of the prewhitened tensor 풯^DICA, through the tensor power method applied to the prewhitened tensor 풯^DICA, or through orthogonal joint diagonalization of several contractions of the prewhitened tensor 풯^DICA.

Prewhitening. By analogy with Section 2.4.2, we perform the prewhitening of the matrix S^DICA, that is, we find a matrix W ∈ R^{퐾×푀} such that W S^DICA W^⊤ = I_퐾. Such a matrix W is not uniquely defined, but can be easily found through the SVD of S^DICA and truncation of the 푀 − 퐾 smallest eigenpairs. The diagonal form (3.15) of the matrix S^DICA can be rewritten, without loss of generality, in the form S^DICA = D̃ D̃^⊤, where D̃ := D cov(훼)^{1/2}, since cov(훼) is diagonal and due to the scaling unidentifiability. Since an orthogonally transformed whitening matrix is still a whitening matrix, there exists an orthogonal matrix Q such that Q W D̃ = I (up to the unidentifiable permutation and scaling which we ignore in this section). This means that finding the “right” orthogonal matrix Q would allow us to recover the matrix D̃ (up to permutation and scaling). To fix this degree of freedom, we use the third-order tensor 풯^DICA. Let us transform the tensor 풯^DICA with our whitening matrix W along all modes :

풯^DICA := 풯^DICA ×1 W^⊤ ×2 W^⊤ ×3 W^⊤. (3.23)
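A possible implementation of this prewhitening step (a sketch with arbitrary helper names, not the thesis package): compute W from the top-K eigenpairs of Ŝ^DICA, check that W Ŝ W^⊤ ≈ I_K, and apply W to every mode of the tensor as in (3.23).

```python
import numpy as np

def whitening_matrix(S, K):
    """W (K x M) with W S W.T = I_K, built from the top-K eigenpairs of a PSD-like S."""
    eigval, eigvec = np.linalg.eigh(S)          # ascending order
    idx = np.argsort(eigval)[::-1][:K]          # K largest eigenvalues (assumed positive)
    U, lam = eigvec[:, idx], eigval[idx]
    return (U / np.sqrt(lam)).T                 # K x M

def whiten_tensor(T, W):
    """Whitened K x K x K tensor with entries sum_{abc} W_ia W_jb W_kc T_abc,
    i.e. the mode products of (3.23) applied to an M x M x M tensor T."""
    return np.einsum('abc,ia,jb,kc->ijk', T, W, W, W)
```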

When the matrix Q from above is known, transforming this tensor along all modes with Q diagonalizes the tensor, i.e. 풯^DICA ×1 Q^⊤ ×2 Q^⊤ ×3 Q^⊤ is diagonal. This suggests a two-step routine for the estimation of D, which consists of computing a whitening matrix W and an orthogonal matrix Q. Below we briefly outline three algorithms for the approximation of such a matrix Q, which were described in Sections 2.1.2 and 2.4.2. These algorithms can be applied to any pair of S and 풯 in the diagonal (symmetric CP) form (4.21), including the ICA cumulants (see Section 2.2.2), the LDA moments (see Section 2.2.3), the DICA cumulants (see Section 3.3.2), and higher-order statistics of many other models.

The Eigendecomposition Based Algorithm. The easiest approach to the approximation of the matrix Q is through the eigendecomposition (ED) of the prewhitened tensor 풯^DICA contracted with some vector u ∈ R^퐾 (see Section 2.4.2 for the definition of the contraction, a.k.a. projection, of a tensor with a vector), which is a matrix 풯^DICA(u). In the ideal case of the population cumulants, this method finds the global solution. However, in practice, when one deals with finite sample estimates of S^DICA and 풯^DICA, the diagonal form in (4.21) is only approximate. In this case, contracting a tensor with only one vector leads to an important loss of information (reducing 퐾³ to 퐾² elements), which leads to poor approximations in practice, which we also observe in experiments (see Section 3.5).

The Tensor Power Method. Another approach for the approximation of Q is the tensor power method [see Anandkumar et al., 2014, and references therein], which is a direct extension of the matrix power method to tensors. The main idea of the

Algorithm 4 The OJD algorithm for GP/DICA cumulants
1: Input : X ∈ R^{푀×푁}, 퐾, 1 ≤ 푃 ≤ 퐾 (number of random contractions)
2: Compute the sample estimate Ŝ^DICA ∈ R^{푀×푀}
3: Compute a whitening matrix Ŵ ∈ R^{퐾×푀} of Ŝ^DICA
4: Construct vectors {v1, v2, . . . , v푃} for random contractions as v푝 = Ŵ^⊤ u푝 and :
   option (a) : choose 푃 vectors {u1, u2, . . . , u푃} ⊆ R^퐾 uniformly at random from the unit ℓ2-sphere (푃 = 1 yields the ED-based algorithm)
   option (b) : set 푃 = 퐾 and choose the vectors {u1, u2, . . . , u푃} ⊆ R^퐾 as the canonical basis e1, e2, . . . , e퐾 of R^퐾
5: Construct target matrices B푝 = Ŵ 풯̂^DICA(Ŵ^⊤ u푝) Ŵ^⊤ ∈ R^{퐾×퐾} for each 푝 ∈ [푃]
6: Find an orthogonal matrix Q ∈ R^{퐾×퐾} that jointly diagonalizes the matrices Ŵ Ŝ^DICA Ŵ^⊤, B푝, 푝 ∈ [푃]
7: Output : the matrix Q Ŵ and the (nearly diagonal) matrices Q B푝 Q^⊤

algorithm is to estimate the columns of Q one-by-one using the deflation approach. At each deflation step, a vector q푘 is approximated through iterative tensor power updates as we outlined in Section 2.1.2. In the ideal case of the population cumulants, the algorithm finds the global solution at a quadratic convergence rate. In the practical case with finite sample and possibly misspecification errors, the tensor power method only finds an approximation of the matrix Q. Provided that the additive noise on the orthogonal symmetric CPD structure of the sample estimate of the tensor 풯^DICA is not large, a TPM approximation of a vector q푘 is close to this vector q푘 in terms of the ℓ2-error, in accordance with the perturbation analysis of the tensor power method [Anandkumar et al., 2014].

Orthogonal Joint Matrix Diagonalization. Another algorithm for the approximation of the matrix Q can be seen as a stabilization of the eigendecomposition-based algorithm. The key idea is to take several contractions of the prewhitened tensor 풯̂^DICA ×1 Ŵ^⊤ ×2 Ŵ^⊤ ×3 Ŵ^⊤ with different vectors u푝, for 푝 ∈ [푃], and jointly diagonalize the obtained matrices Ŵ 풯̂^DICA(Ŵ^⊤ u푝) Ŵ^⊤. This is also known as joint (symmetric) eigendecomposition of several matrices and we described a Jacobi-like algorithm for this problem in Section 2.3.1. This method applied to the estimation problem in the DICA model is outlined in Algorithm 4.
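To make step 5 of Algorithm 4 concrete, here is a sketch (illustrative, reusing the hypothetical names T_hat and W from the previous snippets) that builds the projected matrices B_p = Ŵ 풯̂(Ŵ^⊤ u_p) Ŵ^⊤; the orthogonal joint diagonalizer itself (a Jacobi-type routine, step 6) is assumed to be available and is not reproduced here.

```python
import numpy as np

def contraction_matrices(T_hat, W, U):
    """B_p = W T_hat(W^T u_p) W^T for each column u_p of U (K x P).
    T_hat(v) denotes the contraction of the M x M x M tensor with v along its third mode."""
    B = []
    for p in range(U.shape[1]):
        v = W.T @ U[:, p]                          # v_p = W^T u_p, length M
        T_v = np.einsum('abc,c->ab', T_hat, v)     # M x M matrix T_hat(v_p)
        B.append(W @ T_v @ W.T)                    # K x K target matrix
    return B

# Option (b) of Algorithm 4: take U = I_K, i.e. the canonical basis of R^K.
# B_list = contraction_matrices(T_hat, W, np.eye(K))
```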

The choice of the number 푃 of the contraction (projection) vectors v1, v2, . . . , v푃 is not straightforward. The case 푃 = 1 corresponds to the ED-based algorithm described above and leads to a tremendous loss of information contained in the original tensor. The choice of the canonical basis in R^퐾 and vectors v푝 := Ŵ^⊤ e푝 for 푝 ∈ [퐾] preserves all information contained in the original tensor, but can be computationally expensive if 퐾 is too large. Choosing 푃 between 1 and 퐾 which leads to a good trade-off between the quality of the approximation and the computational complexity is thus of interest. In practice, choosing even small values of 푃, e.g., 푃 = 2 or 3, leads to a significant improvement over the ED-based algorithm. Ideally, it would be of interest to show rigorously that setting 푃 to a value close to log(퐾) is sufficient for an accurate approximation (which is observed in practice).

Estimation of the Topic Matrix. An eventual goal of the procedure is the estimation of the topic matrix D given the outputs of the algorithms above : Ŵ and Q. Since (up to permutation and scaling) Q W D̃ = I, one can estimate D̂ in a multistep procedure : (a) compute the matrix Ŵ^† Q^⊤, (b) for each column of this matrix, set the sign such that the column has less negative mass than positive, as measured by the sum of squares of the elements of each sign, (c) truncate all negative values, and (d) normalize the columns of the resulting matrix to sum to one. The step in (b) is necessary due to the scaling unidentifiability, which includes scaling by −1. This heuristic procedure is necessary to preserve the non-negativity of the elements of the topic matrix. A more justified procedure would be of interest ; however, it is not readily available. One could consider using a non-negative CP decomposition instead of the two-step procedure outlined in this section, but this is not straightforward because of (a) the symmetric case and (b) the high dimension 푀 of the original space before prewhitening. We tried to integrate a non-negativity enforcing regularization into the optimization problem of the OJD algorithm. However, the search space, which is the Stiefel manifold in this case, is too restrictive and the resulting algorithm does not show any significant increase of the approximation quality in practice.
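The post-processing steps (a)–(d) can be written compactly; the sketch below (again illustrative, with W_hat and Q standing for the outputs of the previous steps) resolves the sign of each column, truncates negative entries, and renormalizes so that every column lies on the simplex.

```python
import numpy as np

def recover_topics(W_hat, Q):
    """Heuristic recovery of D from a whitening matrix W_hat (K x M) and an orthogonal Q (K x K)."""
    D_raw = np.linalg.pinv(W_hat) @ Q.T              # step (a): W^dagger Q^T, shape M x K
    pos = np.sum(np.clip(D_raw, 0, None) ** 2, axis=0)
    neg = np.sum(np.clip(-D_raw, 0, None) ** 2, axis=0)
    D_raw *= np.where(neg > pos, -1.0, 1.0)          # step (b): flip columns dominated by negative mass
    D_raw = np.clip(D_raw, 0, None)                  # step (c): truncate negative values
    return D_raw / D_raw.sum(axis=0, keepdims=True)  # step (d): columns sum to one
```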

3.4.1 Analysis of the Whitening and Recovery Error

In this section, we extend the topic recovery guarantees of Anandkumar et al. [2013a] to the ED-based algorithm for the GP and DICA models. Let S := S^DICA and 풯 := 풯^DICA. We can follow a similar analysis as in Appendix C of Anandkumar et al. [2013a] to derive the topic recovery error given the sample estimate error. In particular, if we define the following sampling errors 퐸_S and 퐸_풯 :

‖Ŝ − S‖ ≤ 퐸_S,
‖풯̂(u) − 풯(u)‖ ≤ ‖u‖_2 퐸_풯,

then the following form of their Lemma C.2 holds for the DICA cumulants :

‖Ŵ 풯̂(Ŵ^⊤u) Ŵ^⊤ − W 풯(W^⊤u) W^⊤‖ ≤ 휈 [ (max_푘 훾푘) 퐸_S / 휎_퐾(D̃)² + 퐸_풯 / 휎_퐾(D̃)³ ], (3.24)

where 휎푘(·) denotes the 푘-th singular value of a matrix, 휈 is a universal constant, and D̃ is such that S = D̃ D̃^⊤. For the DICA cumulants, 훾푘 := cum(훼푘, 훼푘, 훼푘) [var(훼푘)]^{−1.5}. Note that a similar expression holds for the LDA moments with 훾푘 = 2 √(푐0(푐0 + 1) [푐푘(푐0 + 2)²]^{−1}). Moreover, the GP cumulants are a special case of the DICA cumulants and, substituting the gamma prior on 훼, one readily obtains 훾푘 = 2푐푘^{−0.5}.

We note that the scaling of S is 푂(퐿̄²) for the GP/DICA cumulants, in contrast to 푂(1) for the LDA moments. Thus, to compare the upper bound (3.24) for the two types of moments, we need to put it in quantities which are common. In the first section of the Appendix C of Anandkumar et al. [2013a], it was mentioned that 휎_퐾(D̃) ≥ √(푐_min [푐0(푐0 + 1)]^{−1}) 휎_퐾(D) for the LDA moments, where 푐_min := min_푘 푐푘. We further switch to the GP model, where the dependence on 퐿 is transparent. However, similar results are readily available for the DICA model. In contrast, for the GP cumulants, we can show that 휎_퐾(D̃) ≥ 퐿̄ √푐_min 푐0^{−1} 휎_퐾(D), where 퐿̄ := 푐0/푏 (i.e. 퐿̄ := E(퐿)) is the expected length of a document in the GP model. Using this lower bound for the singular value, we thus get the following bound for the GP cumulant :

‖Ŵ 풯̂^GP(Ŵ^⊤u) Ŵ^⊤ − W 풯^GP(W^⊤u) W^⊤‖ ≤ (휈 / 푐_min^{3/2}) [ 2푐0² 퐸_S / (퐿̄² [휎_퐾(D)]²) + 푐0³ 퐸_풯 / (퐿̄³ [휎_퐾(D)]³) ]. (3.25)

The 푐_min^{3/2} factor is common for both the LDA moment and the GP cumulant ; however, the sample error 퐸_S term gets divided by 퐿̄² for the GP cumulant, as expected. Indeed, the prewhitening transformation for the GP matrix S^GP redivides the error 퐸_S on S^GP (3.22) by 퐿̄², which is the scale of S^GP. This means that the contribution from Ŝ^GP to the recovery error will scale as 푂((1/√푁) max{∆, 푐̄0/퐿̄}), where both ∆ and 푐̄0/퐿̄ are smaller than 1 and can be very small. This argument indicates that the (bound on the) sample complexity is better for the GP cumulants vs. the LDA moments.

The recovery error bound by Anandkumar et al. [2013a] is based on the bound (3.25), and thus, by showing that the error 퐸_S/퐿̄² for the GP cumulant is lower than the 퐸_S term for the LDA moment, we expect a similar gain for the recovery error, as the rest of the argument is the same for both types of moments (see Appendix C.2, C.3 and C.4 by Anandkumar et al. [2013a]).

3.5 Experiments

In this section, (a) we compare experimentally the GP/DICA cumulants with the LDA moments and (b) the spectral algorithm [Anandkumar et al., 2012a], the tensor power method [Anandkumar et al., 2014] (TPM), the joint diagonalization (JD) algorithm from Algorithm 4, and variational inference for LDA [Blei et al., 2003].

3.5.1 Datasets

Real Data. Real data includes the associated press (AP) dataset, from D. Blei's web page, 6 with 푁 = 2,243 documents, 푀 = 10,473 vocabulary words and the average document length 퐿̂ = 194 ; the NIPS papers dataset 7 [Globerson et al., 2007] of 2,483 NIPS papers and 14,036 words, and 퐿̂ = 1,321 ; the KOS dataset, 8 from the UCI Repository, with 3,430 documents and 6,906 words, and 퐿̂ = 136.

Semi-Synthetic Data. Semi-synthetic data are constructed by analogy with Arora et al. [2013] : (1) the LDA parameters D and c are learned from the real datasets with variational inference and (2) synthetic data are sampled from a model of interest with the given parameters D and c. This provides the ground truth parameters D and c. For each setting, data are sampled 5 times and the results are averaged. We plot error bars that are the minimum and maximum values. For the AP data, 퐾 ∈ {10, 50} topics are learned and, for the NIPS data, 퐾 ∈ {10, 90} topics are learned. For larger 퐾, the obtained topic matrix is ill-conditioned, which violates the identifiability condition for topic recovery using moment matching techniques [Anandkumar et al., 2012a]. All the documents with less than 3 tokens are resampled.

Sampling Techniques. All the sampling models have the parameter c, which is set to c = 푐0 c̄ / ‖c̄‖_1, where c̄ is the c learned from the real dataset with variational LDA, and 푐0 is a parameter that we can vary. The GP data are sampled from the gamma-Poisson model (3.4) with 푏 = 푐0/퐿̂ so that the expected document length is 퐿̂. The LDA-fix(퐿) data are sampled from the LDA model (3.2) with the document length being fixed to a given 퐿. The LDA-fix2(훾,퐿1,퐿2) data are sampled as follows : a (1 − 훾)-portion of the documents are sampled from the LDA-fix(퐿1) model with a given document length 퐿1 and a 훾-portion of the documents are sampled from the LDA-fix(퐿2) model with a given document length 퐿2.
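For concreteness, the GP and LDA-fix(L) sampling schemes described above can be sketched as follows (a hypothetical helper written for this text, not the exact script used for the experiments; D and c stand for the parameters learned by variational inference).

```python
import numpy as np

def sample_semi_synthetic(D, c, N, scheme="GP", L_fix=200, L_expected=200, rng=None):
    """Sample N count vectors from either the GP model (3.4) or the LDA-fix(L) model (3.2)."""
    rng = rng or np.random.default_rng(0)
    M, K = D.shape
    if scheme == "GP":
        b = c.sum() / L_expected                      # b = c0 / E(L), as in the text above
        alpha = rng.gamma(shape=c, scale=1.0 / b, size=(N, K))
        return rng.poisson(alpha @ D.T)               # (N, M) counts
    theta = rng.dirichlet(c, size=N)                  # LDA-fix(L): fixed document length
    return np.array([rng.multinomial(L_fix, D @ th) for th in theta])
```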

3.5.2 Code and Complexity

Our (mostly Matlab) implementations of the diagonalization algorithms (JD, Spec, and TPM) for both the GP/DICA cumulants and LDA moments are available online. 9 Moreover, all datasets and the code for reproducing our experiments are available. 10 Each experiment was run in a single thread.

The bottleneck for the spectral (i.e. ED-based), JD, and TPM algorithms is the computation of the cumulants/moments. However, the expressions (C.5) and (C.4) provide efficient formulas for fast computation of the GP/DICA cumulants and LDA moments (푂(푅푁퐾 + 푁퐾²), where 푅 is the largest number of non-zeros in the count vector x over all documents, see Appendix C.1), which makes even the Matlab implementation fast for large datasets (see, e.g., Table 3.1). Since all diagonalization

6. http://www.cs.columbia.edu/~blei/lda-c 7. http://ai.stanford.edu/~gal/data 8. https://archive.ics.uci.edu/ml/datasets/Bag+of+Words 9. https://github.com/anastasia-podosinnikova/dica-light 10. https://github.com/anastasia-podosinnikova/dica

algorithms (spectral, JD, TPM) perform the whitening step once, it is sufficient to compare their complexities by the number of times the cumulants/moments are computed.

Spectral. The spectral algorithm estimates the cumulants/moments only once, leading to 푂(푁퐾(푅 + 퐾)) complexity, and, therefore, is the fastest.

OJD. For JD, rather than estimating 푃 cumulants/moments separately, one can jointly estimate these values by precomputing and reusing some terms (e.g., 푊푋). However, the complexity is still 푂(푃푁퐾(푅 + 퐾)), although in practice it is sufficient to have 푃 = 퐾 or even smaller.

TPM. For TPM, some parts of the cumulants/moments can also be precomputed, but as TPM normally does many more iterations than 푃, it can be significantly slower. In general, the complexity of TPM can be significantly influenced by the initialization of the parameters of the algorithm. There are two main parameters : 퐿_TPM is the number of random restarts within one deflation step and 푁_TPM is the maximum number of iterations for each of the 퐿_TPM random restarts (note that these are different from the number of documents 푁 and the document length 퐿). Some restarts converge very fast (in much less than 푁_TPM iterations), while others are slow. Moreover, as follows from theoretical results [Anandkumar et al., 2014] and as we observed in practice, the restarts which converge to a good solution converge fast, while slow restarts normally converge to a worse solution. Nevertheless, in the worst case, the complexity is 푂(푁_TPM 퐿_TPM 푁퐾(푅 + 퐾)).

Note that for the experiment in Figure 3-3, 퐿_TPM = 10 and 푁_TPM = 100 and the run with the best objective is chosen. We believe that these values are reasonable in the sense that they provide a good accuracy solution (휀 = 10^{−5} for the norm of the difference of the vectors from the previous and the current iteration) in a small number of iterations ; however, they may not be the best ones.

JD Implementation. For the orthogonal joint diagonalization algorithm, we implemented a faster C++ version of the previous Matlab implementation 11 by J.-F. Cardoso. Moreover, the orthogonal joint diagonalization routine can be initialized in different ways : (a) with the 퐾 × 퐾 identity matrix or (b) with a random orthogonal 퐾 × 퐾 matrix. We tried different options and in nearly all cases the algorithm converged to the same solution, implying that initialization with the identity matrix is sufficient.

Whitening Matrix. For a large vocabulary size 푀, computation of a whitening matrix can be expensive (in terms of both memory and time). One possible solution would be to reduce the vocabulary size by, e.g., selecting words according to the tf-idf score, which is a standard practice in the topic modeling context. Another option is using a stochastic eigendecomposition [see, e.g., Halko et al., 2011] to approximate the whitening matrix.

Variational Inference. For variational inference, we used the code of D. Blei and

11. http://perso.telecom-paristech.fr/~cardoso/Algo/Joint_Diag/joint_diag_r.m

modified it for the estimation of a non-symmetric Dirichlet prior c, which is known to be important [Wallach et al., 2009a]. The default values of the tolerance/maximum number of iterations parameters are used for variational inference. The computational complexity of one iteration for one document of the variational inference algorithm is 푂(푅퐾), where 푅 is the number of non-zeros in the count vector for this document, and a significant number of such iterations is performed for each document.

Evaluation. The evaluation of topic recovery for semi-synthetic data is performed with the ℓ1-error between the recovered D̂ and true D topic matrices with the best permutation of columns : err_ℓ1(D̂, D) := min_{휋∈PERM} (1/2퐾) ∑_푘 ‖d̂_휋푘 − d푘‖_1 ∈ [0, 1]. The minimization is over the possible permutations 휋 ∈ PERM of the columns of D̂ and can be efficiently obtained with the Hungarian algorithm for bipartite matching [Kuhn, 1955].

Evaluation of the Real Data Experiments. For the evaluation of topic recovery in the real data case, we use an approximation of the log-likelihood for held out documents as the metric. The approximation is computed using a Chib-style method as described by Wallach et al. [2009b] using the implementation by the authors. 12 Importantly, this evaluation method is applicable to both the LDA model and the GP model. Indeed, as follows from Section 3.3, the GP model is equivalent to the LDA model when conditioning on the length of a document 퐿 (with the same 푐푘 hyperparameters), while the LDA model does not make any assumption on the document length. For the test log-likelihood comparison, we thus treat the GP model as an LDA model (we do not include the likelihood of the document length). We use our Matlab implementation of the GP/DICA cumulants, the LDA moments, and the diagonalization algorithms. The datasets and the code for reproducing our experiments are available online. 13
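The ℓ1-error with the best column permutation can be computed with a standard bipartite-matching routine; a minimal sketch (using scipy's implementation of the Hungarian algorithm, written for this text) is:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def l1_error(D_hat, D):
    """err_l1(D_hat, D) = min over column permutations of (1 / 2K) sum_k ||d_hat_{pi(k)} - d_k||_1."""
    K = D.shape[1]
    cost = np.abs(D_hat[:, :, None] - D[:, None, :]).sum(axis=0)  # cost[i, j] = ||d_hat_i - d_j||_1
    rows, cols = linear_sum_assignment(cost)                      # Hungarian algorithm [Kuhn, 1955]
    return cost[rows, cols].sum() / (2 * K)
```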

Initialization of the Parameter 푐0 for the LDA Moments. The construction of the LDA moments requires the parameter 푐0, which is not trivial to set in the unsupervised setting of topic modeling, especially taking into account the complexity of the evaluation for topic models [Wallach et al., 2009b]. For the semi-synthetic experiments, the true value of 푐0 is provided to the algorithms. It means that the LDA moments, in this case, have access to some oracle information, which in practice is never available. For real data experiments, 푐0 is set to the value obtained with variational inference. Experiments show that this choice was somewhat important (see, e.g., Figure 3-5). However, this requires more thorough investigation.

3.5.3 Comparison of the Diagonalization Algorithms

In Figure 3-3, we compare the diagonalization algorithms on the semi-synthetic AP dataset for 퐾 = 50 using the GP sampling. We compare the tensor power method [TPM ; Anandkumar et al., 2014], the spectral algorithm (Spec), the orthogonal joint

12. http://homepages.inf.ed.ac.uk/imurray2/pub/09etm 13. https://github.com/anastasia-podosinnikova/dica

Figure 3-3 – Comparison of the diagonalization algorithms. The topic matrix D and Dirichlet parameter c are learned for 퐾 = 50 from AP ; c is scaled to sum up to 0.5 and 푏 is set to fit the expected document length 퐿̂ = 200. The semi-synthetic dataset is sampled from GP ; the number of documents 푁 varies from 1,000 to 50,000. Left : GP/DICA moments. Right : LDA moments. Note : a smaller value of the ℓ1-error is better.

diagonalization algorithm (JD) described in Algorithm 4 with different options to choose the random projections : JD(k) takes 푃 = 퐾 vectors u푝 sampled uniformly from the unit ℓ2-sphere in R^퐾 and selects v푝 = W^⊤ u푝 (option (a) in Algorithm 4) ; JD selects the full basis e1, . . . , e퐾 in R^퐾 and sets v푝 = W^⊤ e푝 (as JADE [Cardoso and Souloumiac, 1993]) (option (b) in Algorithm 4) ; JD(f) chooses the full canonical basis of R^푀 as the projection vectors (computationally expensive).

Both the GP/DICA cumulants and LDA moments are well-specified in this setup. However, the LDA moments have a slower finite sample convergence and, hence, a larger estimation error for the same value of 푁. As expected, the spectral algorithm is always slightly inferior to the joint diagonalization algorithms. With the GP/DICA cumulants, where the estimation error is low, all algorithms demonstrate good performance, which also fulfills our expectations. However, although TPM shows almost perfect performance in the case of the GP/DICA cumulants (left), it significantly deteriorates for the LDA moments (right), which can be explained by the larger estimation error of the LDA moments and the lack of robustness of TPM. Overall, the orthogonal joint diagonalization algorithm with initialization of random projections as W^⊤ multiplied with the canonical basis in R^퐾 (JD) is both computationally efficient and fast.

Runtimes of the Diagonalization Algorithms

In Table 3.1, we present the running times of the algorithms from Section 3.5.3. JD and JD(k) are significantly faster than JD(f), as expected, although the performance in terms of the ℓ1-error is nearly the same for all of them. This indicates that preference should be given to the JD or JD(k) algorithms.

The running time of all LDA-algorithms is higher than the one of the GP/DICA-algorithms. This indicates that the computational complexity of the LDA-moments is slightly higher than the one of the GP/DICA-cumulants (compare, e.g., the times for the spectral algorithm, which almost completely consist of the computation of the moments/cumulants). Moreover, the runtime of TPM-LDA is significantly higher (half an hour vs. several hours) than the one of TPM-GP/DICA. This can be explained by the fact that the LDA-moments have more noise than the GP/DICA-cumulants and, hence, the convergence is slower. Interestingly, all versions of the JD algorithm are not that sensitive to noise. Computation of a whitening matrix is roughly 30 sec (this time is the same for all algorithms and is included in the numbers above).

            min    mean     max
JD-GP       148     192     247
JD-LDA      252     284     366
JD(k)-GP    157     190     247
JD(k)-LDA   264     290     318
JD(f)-GP   1628    1846    2058
JD(f)-LDA  2545    2649    2806
Spec-GP     101     107     111
Spec-LDA    107     140     193
TPM-GP     1734    2393    2726
TPM-LDA   12723   16460   19356

Table 3.1 – The running times in seconds of the algorithms from Figure 3-3, corresponding to the case when 푁 = 50,000. Each algorithm was run 5 times, so the times in the table display the minimum (min), mean, and maximum (max) time.

3.5.4 The GP/DICA Cumulants vs. the LDA Moments

In Figure 3-4, when sampling from the GP model (top, left), both the GP/DICA cumulants and LDA moments are well specified, which implies that the approximation error (i.e., the error w.r.t. the model (mis)fit) is low for both. The GP/DICA cumulants achieve low values of the estimation error already for 푁 = 10,000 documents independently of the number of topics, while the convergence is slower for the LDA moments. When sampling from the LDA-fix(200) model (top, right), the GP/DICA cumulants are then mis-specified and their approximation error is high, although the estimation error is low due to the faster finite sample convergence. One reason for the poor performance of the GP/DICA cumulants, in this case, is the absence of variance in the document length. Indeed, if documents with two different lengths are mixed by sampling from the LDA-fix2(0.5,20,200) model (bottom, left), the GP/DICA cumulants performance improves. Moreover, the experiment with a changing fraction 훾 of documents (bottom, right) shows that a non-zero variance on the length improves the performance of the GP/DICA cumulants. As real corpora usually have a non-zero variance for the document length in practice, this bad scenario for the GP/DICA cumulants is not likely to happen.

Figure 3-4 – Comparison of the GP/DICA cumulants and LDA moments. Two topic matrices and parameters c1 and c2 are learned from the NIPS dataset for 퐾 = 10 and 90 ; c1 and c2 are scaled to sum up to 푐0 = 1. Four corpora of different sizes 푁 from 1, 000 to 50, 000 : top, left : 푏 is set to fit the expected document length 퐿̂︀ = 1300 ; sampling from the GP model ; top, right : sampling from the LDA-fix(200) model ; bottom, left : sampling from the LDA-fix2(0.5,20,200) model. Bottom, right : the number of documents here is fixed to 푁 = 20, 000 ; sampling from the LDA- fix2(훾,20,200) model varying the values of the fraction 훾 from 0 to 1 with the step 0.1. Note : a smaller value of the ℓ1-error is better.

The LDA Moments vs. Parameter c0

In this section, we experimentally investigate the dependence of the LDA moments on the parameter 푐0. In Figure 3-5, the joint diagonalization algorithm with the LDA moments is compared for different values of 푐0 provided to the algorithm. The data are generated similarly to Figure 3-4. The experiment indicates that the LDA moments are somewhat sensitive to the choice of 푐0. For example, the recovery ℓ1-error doubles when moving from the correct choice 푐0 = 1 to an alternative 푐0 = 0.1 for 퐾 = 10 on the LDA-fix(200) dataset (the JD-LDA(10) line on the right of Figure 3-5).

Comparison of the ℓ1- and ℓ2-Errors

The sample complexity results [Anandkumar et al., 2012a] for the spectral algorithm for the LDA moments allow straightforward extensions to the GP/DICA cumulants, if the results from Proposition 3.3.1 are taken into account. The analysis is, however, in terms of the ℓ2-norm. Therefore, in Figure 3-6, we provide an experimental comparison of the ℓ1- and ℓ2-errors to verify that they are indeed behaving similarly.

Figure 3-5 – Performance of the LDA moments depending on the parameter 푐0. D and c are learned from the AP dataset for 퐾 = 10 and 퐾 = 50 and true 푐0 = 1. JD-GP(10) for 퐾 = 10 and JD-GP(50) for 퐾 = 50. Number of sampled documents 푁 = 20,000. For the error bars, each dataset is resampled 5 times. Data (left) : GP sampling ; (right) : LDA-fix(200) sampling. Note : a smaller value of the ℓ1-error is better.

Figure 3-6 – Comparison of the ℓ1- and ℓ2-errors on the NIPS semi-synthetic dataset as in Figure 3-4 (top, left). The ℓ2-norms of the topics were normalized to [0,1] for the computation of the ℓ2 error.

3.5.5 Real Data Experiments

In this section, we compare the algorithms on the AP, KOS, and NIPS datasets.

Datasets. The detailed experimental setup is as follows. Each dataset is separated into 5 training/evaluation pairs, where the documents for evaluation are chosen randomly and non-repetitively among the folds (600 documents are held out for KOS ; 400 documents are held out for AP ; 450 documents are held out for NIPS). Then, the model parameters are learned for a different number of topics. The evaluation of the held-out documents is performed with averaging over 5 folds. In Figure 3-7a and Figure 3-7b, on the y-axis, the predictive log-likelihood in bits averaged per token is presented. Note that, as the LDA moments require at least 3 tokens in each document, 1 document from the NIPS dataset and 3 documents from the AP dataset, which did not fulfill this requirement, were removed.

Figure 3-7 – Experiments with real data : (a) the AP (left) and KOS (right) datasets ; (b) the NIPS dataset. The y-axis shows the held-out log-likelihood (in bits) and the x-axis the number of topics 퐾, for JD-GP, JD-LDA, Spec-GP, Spec-LDA, VI, and VI-JD. Note : a higher value of the log-likelihood is better.

Algorithms. In Figure 3-7a, JD-GP, Spec-GP, JD-LDA, and Spec-LDA are compared with variational inference (VI) and with variational inference initialized with the output of JD-GP (VI-JD). We measure the held-out log-likelihood per token. The orthogonal joint diagonalization algorithm with the GP/DICA cumulants (JD-GP) demonstrates promising performance. In particular, the GP/DICA cumulants significantly outperform the LDA moments. Moreover, although variational inference performs better than the JD-GP algorithm, restarting variational inference with the output of the JD-GP algorithm systematically leads to better results. Similar behavior has already been observed [see, e.g., Cohen and Collins, 2014].

Importantly, we observed that VI, when initialized with the output of JD-GP, is consistently better in terms of the predictive log-likelihood. Therefore, the new algorithm can be used as a smarter initialization of other LDA/GP inference methods.

We also observe that the joint diagonalization algorithm for the LDA moments is worse than the spectral algorithm. This indicates that the diagonal structure (2.43) and (2.44) might not be present in the sample estimates (2.45) and (2.46) due to either model misspecification or to finite sample complexity issues.

3.6 Conclusion

In this chapter, we have proposed a new set of tensors for a discrete ICA model related to LDA, where word counts are directly modeled. These moments make fewer assumptions regarding distributions, and are theoretically and empirically more robust than previously proposed tensors for LDA, both on synthetic and real data. Following the ICA literature, we showed that our joint diagonalization procedure is also more robust. Once the topic matrix has been estimated in a semiparametric way where topic intensities are left unspecified, it would be interesting to learn the unknown distributions of the independent topic intensities.


Chapter 4

Moment Matching-Based Estimation in Multi-View Models

Abstract

Canonical correlation analysis (CCA), originally introduced by Hotelling [1936], and its probabilistic extension [e.g., Bach and Jordan, 2005], which we call Gaussian CCA, are widely used tools in applications with multi-view data, e.g., when working with text in several languages [e.g., Vinokourov et al., 2002] or in scene text recognition [e.g., Gordo, 2015]. Some extensions of Gaussian CCA are also applied, e.g., for mapping visual and textual features to the same latent space [e.g., Socher and Fei-Fei, 2010] or for machine translation [e.g., Haghighi et al., 2008].

In this chapter, we introduce a novel semiparametric extension of Gaussian CCA for multi-view models, which is able to model discrete data, count data, or a combination of discrete and count data that appear, e.g., in the applications mentioned above. We prove the essential identifiability of the new model, which also implies the identifiability of the discrete ICA model from Chapter 3. We first show that the higher-order cumulants of this model are in the form of a non-symmetric non-negative CPD, so that non-orthogonal joint diagonalization (NOJD) by congruence algorithms can be applied for the estimation. We further introduce generalized covariance matrices, which are Hessians of the cumulant generating function evaluated at some vector and are somewhat related to the contractions of third-order cumulants. Their use reduces the estimation problem in the model to the problem of (approximate) non-symmetric simultaneous eigendecomposition, which can be solved with the NOJD by congruence algorithms. The algorithms based on both higher-order cumulants and generalized covariance matrices demonstrate equivalent performance in terms of the solution quality, while dealing with the generalized covariance matrices is much easier than dealing with tensors, which significantly simplifies implementations. The content of this chapter was previously published as [Podosinnikova et al., 2016].

4.1 Contributions

Below we outline the contributions of this chapter.
- In Section 4.3, we introduce the novel non-Gaussian CCA model by relaxing the Gaussianity assumption on the sources in the Gaussian CCA model. We distinguish several special cases : (a) the discrete non-Gaussian CCA model, which is a direct extension of the discrete ICA model, to model multi-view count data, and (b) the mixed CCA model, where one view models continuous data while the other models count data.
- In Section 4.4.1, we introduce cumulant-based higher order statistics of the discrete CCA model and show that their population variant has the form of a non-negative non-symmetric CPD. Moreover, in Section 4.4.2, we introduce the so-called generalized covariance matrices, which are Hessians of the cumulant generating function evaluated at some vector and are somewhat related to the contractions of third-order cumulants. In the population case, these matrices can jointly be reduced to the simultaneous non-symmetric eigendecomposition form. Importantly, the form of the generalized covariance matrices and higher-order cumulants does not depend on the view-specific noise and only requires the independence of the noise between the views.
- In Section 4.5, we show that the estimation in the non-Gaussian CCA models can be reduced to an (approximate) non-symmetric eigendecomposition problem, which can be solved with non-orthogonal joint diagonalization (NOJD) by congruence algorithms. Note that this problem is different from the (approximate) symmetric eigendecomposition problem and, in particular, such algorithms as orthogonal joint diagonalization or the prewhitening-based methods from Section 3.4 can not be applied.
- In Section 4.6, we experimentally compare the new models and algorithms on synthetic and real datasets.

4.2 Related Work

The models introduced in this chapter are closely related to many other models. In particular, we have already mentioned (see Sections 1.2.3 and 2.1.2) that probabilistic latent semantic indexing can actually be seen as a model for multi-view data [Hofmann, 1999a,b, Hofmann et al., 1999, Chew et al., 2007]. This is also similar to topic models for annotated data [Blei and Jordan, 2003]. The key difference of the models introduced in this chapter from such topic models for annotated data is that the non-Gaussian CCA models are semiparametric and do not make any assumptions on the distributions of the latent sources.

Some other extensions of Gaussian CCA were proposed in the literature : exponential family CCA [Virtanen, 2010, Klami et al., 2010] and Bayesian CCA [see, e.g., Klami et al., 2013, and references therein]. Although exponential family CCA can also be discretized, it assumes in practice that the prior of the sources is a special combination

of Gaussians. Bayesian CCA models the factor loading matrices and the covariance matrix of Gaussian CCA. Sampling or approximate variational inference is used for estimation and inference. Both models, however, lack our identifiability guarantees and are quite different from the non-Gaussian CCA models.

Note that non-Gaussian CCA is different from the independent subspace model. Another similar model, multiset canonical correlation analysis [Li et al., 2009], and similar models where each data view is modeled as a separate ICA problem, are different from non-Gaussian CCA as well.

In the context of estimation through the method of moments, Song et al. [2014] consider a multi-view framework to deal with non-parametric mixture components, while our approach is semi-parametric with an explicit linear structure (our loading matrices) and makes the explicit link with CCA. Moreover, the non-Gaussian CCA model is closely related to the model of Anandkumar et al. [2013b]. However, they consider the DAG point of view and, like Arora et al. [2012], they assume that the topic matrix follows the graph expansion property, which is related to the anchor words assumption.

4.3 Non-Gaussian CCA

In Section 1.3.1, we review classical CCA for the so-called multi-view or aligned data as well as its probabilistic interpretation in the form of the probabilistic CCA graphical model [Browne, 1979, Bach and Jordan, 2005, Klami et al., 2013]. Similar to factor analysis and probabilistic PCA, this probabilistic CCA model has isotropic Gaussian latent variables and, therefore, is unidentifiable. In this section, we extend probabilistic CCA to the more general case of non-Gaussian latent variables and prove the identifiability of this model. Like ICA (see Section 1.1.4) and discrete ICA, introduced in Section 3.3, this non-Gaussian CCA model is semiparametric and no assumptions on the latent sources are needed for the estimation. We also consider two special cases of discrete and mixed CCA, where the observations are in the form of count data or a combination of count and continuous data respectively. First, let us recall the main concepts from Section 1.3.1.

4.3.1 Non-Gaussian, Discrete, and Mixed CCA

Gaussian Canonical Correlation Analysis. We saw in Section 1.3.1 that the following model, which we will refer to as the Gaussian CCA (GCCA) model,

x^(1)|훼 ∼ 풩(휇1 + D1 훼, Ψ1),
x^(2)|훼 ∼ 풩(휇2 + D2 훼, Ψ2), (4.1)

where the independent sources 훼 ∼ 풩(0, I_퐾) are Gaussian and the matrices Ψ1 ∈ R^{푀1×푀1} and Ψ2 ∈ R^{푀2×푀2} are positive definite, can be seen as a probabilistic interpretation of CCA. Indeed, the maximum likelihood estimators of the parameters D1 and D2 coincide with canonical correlation directions, up to permutation, scaling, and left-multiplication by any invertible matrix. Like factor analysis and probabilistic PCA, Gaussian CCA is unidentifiable due to the Gaussian prior on the sources.

By analogy with factor analysis, the GCCA model (4.1) can be equivalently represented with the following generative process

x^(1) = 휇1 + D1 훼 + 휀^(1),
x^(2) = 휇2 + D2 훼 + 휀^(2), (4.2)

where the additive noise vectors are normal random variables, 휀^(1) ∼ 풩(0, Ψ1) and 휀^(2) ∼ 풩(0, Ψ2), and are independent :

훼1, . . . , 훼퐾 are mutually independent,
훼 ⊥⊥ 휀^(1), 휀^(2) and 휀^(1) ⊥⊥ 휀^(2). (4.3)

The formulation in (4.2) indicates that GCCA is an extension of factor analysis to two views. The difference between the two models is that the CCA covariance matrices of the noise, Ψ1 and Ψ2, are not restricted to be diagonal as in factor analysis and hence the view-specific CCA noise may be arbitrarily correlated. The only requirement is that there are no correlations across the views.

The Stacking Trick. More specifically, to see how GCCA is related to factor analysis, one can stack the view specific vectors (matrices) into a single vector (matrix) :

x = (x^(1); x^(2)), 휇 = (휇1; 휇2), D = (D1; D2), 휀 = (휀^(1); 휀^(2)), (4.4)

which leads exactly to the generative model of factor analysis (1.7), i.e. x = 휇 + D훼 + 휀, with the only difference being the assumptions on the additive noise. Indeed, in factor analysis the additive noise is a zero-mean Gaussian variable with a diagonal covariance matrix Ψ, while in GCCA the covariance of the zero-mean Gaussian additive noise is :

Ψ = (Ψ1 0; 0 Ψ2), (4.5)

i.e. it has a block diagonal structure. Using this stacking trick, one can see that the marginal distribution of the observations under the GCCA model is the same as the one for factor analysis (1.6) with the covariance matrix of the noise in the form (4.5).
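The stacking in (4.4)–(4.5) is just vector and matrix concatenation; a small illustrative sketch (arbitrary dimensions and covariances chosen for the example) is:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
M1, M2, K = 5, 4, 3
D1, D2 = rng.standard_normal((M1, K)), rng.standard_normal((M2, K))
Psi1 = np.eye(M1)                     # any positive definite view-specific covariances
Psi2 = 2.0 * np.eye(M2)

D = np.vstack([D1, D2])               # stacked factor loading matrix (4.4)
Psi = block_diag(Psi1, Psi2)          # block-diagonal noise covariance (4.5)

# Marginal covariance of the stacked observation under GCCA, as in factor analysis:
cov_x = D @ D.T + Psi
```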

Figure 4-1 – The non-Gaussian CCA model.

Non-Gaussian Canonical Correlation Analysis. ICA addresses the unidentifiability of factor analysis by relaxing the Gaussianity assumption. We use the same strategy of relaxing the Gaussianity assumption on the sources in the GCCA model to introduce the new model

x^(1) = D1 훼 + 휀^(1),
x^(2) = D2 훼 + 휀^(2), Non-Gaussian CCA (4.6)

where we only keep the independence assumption (4.3). We refer to this model as non-Gaussian CCA (NCCA) and illustrate it in Figure 4-1. Note that we set the constant shift parameters to zero, 휇1 = 휇2 = 0, which can be done without loss of generality since in practice data can be centered to zero mean.

Discrete Canonical Correlation Analysis. Similarly to discrete ICA from Section 3.3, we can further “discretize” non-Gaussian CCA (4.6) by applying the Poisson distribution to each view (independently on each variable) :

x^(1)|훼, 휀^(1) ∼ Poisson(D1훼 + 휀^(1)),
x^(2)|훼, 휀^(2) ∼ Poisson(D2훼 + 휀^(2)), Discrete CCA (4.7)

where again the independence assumption (4.3) on the latent sources is made. This gives us the (non-Gaussian) discrete CCA (DCCA) model, which is adapted to count data (e.g., word counts in the bag-of-words model of text). In this case, the sources 훼, the noise 휀^(1) and 휀^(2), and the matrices D1 and D2 have non-negative components. Note that this model can be seen as a semiparametric extension of multi-view topic models, i.e. topic models for annotated data [see, e.g., Blei and Jordan, 2003].
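To illustrate the generative process (4.7), the following sketch (with arbitrary sizes, priors, and noise levels, not the experimental setup of Section 4.6) samples aligned count vectors for the two views:

```python
import numpy as np

rng = np.random.default_rng(0)
M1, M2, K, N = 15, 12, 3, 1000

D1 = rng.dirichlet(np.ones(M1), size=K).T        # (M1, K), non-negative loadings
D2 = rng.dirichlet(np.ones(M2), size=K).T        # (M2, K)

alpha = rng.gamma(0.5, 10.0, size=(N, K))        # shared non-negative sources (any prior works)
eps1 = rng.gamma(0.2, 1.0, size=(N, M1))         # view-specific non-negative noise,
eps2 = rng.gamma(0.2, 1.0, size=(N, M2))         # independent across the two views

X1 = rng.poisson(alpha @ D1.T + eps1)            # counts for view 1
X2 = rng.poisson(alpha @ D2.T + eps2)            # counts for view 2, aligned through alpha
```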

Noisy Discrete ICA. Using the stacking trick (4.4), it is straightforward to show that the discrete CCA model (4.7) is a special case of the following noisy discrete ICA model :

x|훼, 휀 ∼ Poisson(D훼 + 휀), Noisy Discrete ICA (4.8)

where the sources 훼 are independent and the noise and sources are independent, 훼 ⊥⊥ 휀. This model reduces to discrete ICA (3.8) when the noise 휀 tends to zero. When the noise 휀 takes the stacked form

휀 = (휀^(1); 휀^(2)), (4.9)

we obtain the discrete CCA model (4.7). This connection allows us to adapt the discrete ICA cumulant-based tensors from Section 3.3.2 to construct similar higher- order statistics for the discrete CCA model in Section 4.4.1. Note that there are significant differences between discrete CCA (or non-Gaussian CCA) and discrete ICA (or respectively classical independent component analysis). One difference, as we will see later, is that the estimation in discrete (non-Gaussian) CCA can be performed not only in the presence of Gaussian noise, 1 but for noise with a more general structure as long as the view-specific noises, 휀(1) and 휀(2), are independent. More importantly, the estimation methods for discrete (non-Gaussian) CCA proposed in this chapter can also be applied in the overcomplete case. For example, let us consider the following linear discrete ICA model :

x^(1)|훼, 훽^(1) ∼ Poisson(D1훼 + F1훽^(1)),
x^(2)|훼, 훽^(2) ∼ Poisson(D2훼 + F2훽^(2)), (4.10)

where 휀^(1) = F1훽^(1) and 휀^(2) = F2훽^(2) and the independence assumption (4.3) holds. By stacking variables we obtain the discrete ICA model x|훾 ∼ Poisson(D훾), where

x = (x^(1); x^(2)), 훾 = (훼; 훽^(1); 훽^(2)), D = (D1 F1 0 ; D2 0 F2).

The matrix D ∈ R^{푀×퐾0}, where 푀 = 푀1 + 푀2, 퐾0 = 퐾 + 퐾1 + 퐾2, and 퐾1 and 퐾2 are the numbers of columns of the matrices F1 and F2 respectively. Discrete CCA can perform the estimation in such a model even if 퐾0 > 푀 as long as D1 and D2 have full column rank. On the contrary, the cumulant-based estimation method for discrete ICA (see Section 3.4) can not work if 퐾0 > 푀 (see also the experiments in Section 4.6.1).

Mixed Canonical Correlation Analysis. Finally, by combining non-Gaussian and discrete CCA, we also introduce the mixed CCA (MCCA) model :

x^(1) = D1훼 + 휀^(1),
x^(2)|훼, 휀^(2) ∼ Poisson(D2훼 + 휀^(2)), Mixed CCA (4.11)

which is adapted to a combination of non-negative discrete and continuous data (e.g., such as images represented as continuous vectors aligned with text represented as

1. Note that the cumulant-based estimation for ICA, as well as for the new models, can handle Gaussian noise with known covariance due to the properties of cumulants (see Section 2.2.1 or Comon and Jutten[2010]).

90 counts). Note that no assumptions are made on distributions of the sources 훼 except for independence (4.3), which makes the MCCA model, as well as the NCCA and GCCA models, semiparametric.

Both discrete and mixed CCA models can be seen as special cases of the non-Gaussian CCA model with a complicated noise structure. However, it is useful to distinguish these two cases from the practical point of view plus we can benefit from the special structure when constructing the algorithms. All the three models can be used with different applications, e.g., in machine translation of text data in different languages [Vinokourov et al., 2002, Haghighi et al., 2008] or in computer vision for annotated images or videos [Socher and Fei-Fei, 2010, Gordo, 2015].

4.3.2 Identifiability of Non-Gaussian CCA

In this section, the identifiability of the factor loading matrices D1 and D2 is discussed. As before, the identifiability that we consider is the essential identifiability, i.e. the identifiability up to permutation and scaling.

ICA can be seen as an identifiable analog of factor analysis (see Section 1.1.4). In- deed, it is well known that ICA is identifiable if at most one source is Gaussian [Comon, 1994], while factor analysis is unidentifiable (see Section 1.1.2). Note that this identifiability results of ICA assume the linear independence of the columnsof the mixing matrix which also implies that we are in the (over-)determined, a.k.a. (under-)complete, case (퐾 ≤ 푀). The other overcomplete case is more complicated and is outside of the scope of this chapter.

The factor loading matrices, D1 and D2, and the latent sources, 훼, of the Gaussian CCA model (4.1), which can be seen as a multi-view extension of PPCA, are identi- fiable only up to multiplication by any invertible matrix [Bach and Jordan, 2005]. We show the identifiability results for the new models (4.6), (4.7), and (4.11) : the factor loading matrices D1 and D2 (hence the latent sources 훼) of these models are identi- fiable if at most one source is Gaussian (see the following Section 4.3.3 for a proof). These results also apply to the discrete ICA model (3.8) from Section 3.3

푀1×퐾 푀2×퐾 Theorem 4.3.1. Assume that matrices D1 ∈ R and D2 ∈ R , where (︀ (1) (1))︀ 퐾 ≤ min(푀1, 푀2), have full rank. If the covariance matrices cov x , x and (︀ (2) (2))︀ cov x , x exist and if at most one source 훼푘, for 푘 = 1, . . . , 퐾, is Gaussian and none of the sources are deterministic, then the models (4.6), (4.7), and (4.11) are identifiable (up to scaling and joint permutation).

Importantly, the permutation unidentifiability does not destroy the alignment inthe factor loading matrices, that is, for some permutation matrix Π, if D1Π is the factor loading matrix of the first view, than D2Π must be the factor loading matrix of the second view. This property is important for the interpretability of the factor loading matrices and, in particular, is used in our experiments in Section 4.6.

91 4.3.3 The Proof of Theorem 4.3.1

In this section, we prove that the factor loading matrices D1 and D2 of the non- Gaussian CCA (4.6), discrete CCA (4.7), and mixed CCA (4.11) models are identi- fiable up to permutation and scaling if at most one source 훼푘 is Gaussian. We provide a complete proof for the non-Gaussian CCA case and show that the other two cases can be proved by analogy.

Identifiability of Non-Gaussian CCA (4.6)

The proof uses the notion of the second characteristic function (SCF) of an R푀 -valued random variable x (see Section 2.2):

푖t⊤x 휑x(t) = log E(푒 ), for all t ∈ R푀 . The SCF completely defines the probability distribution of x. Impor- tant difference between the SCF and the cumulant generating function (2.22) is that the former always exists. The following property of the SCF is of central importance for the proof : if two (1) (2) ⊤ (1) (2) (1) random variables, z and z , are independent, then 휑A1z +A2z (t) = 휑z (A1 t)+ ⊤ 휑z(2) (A2 t), where A1 and A2 are any matrices of compatible sizes.

We can now use our CCA model to derive an expression of 휑x(t), where x is the vector formed by stacking two vectors x(1) and x(2). Indeed, defining a vector x by stacking (1) (2) the vectors x and x , the SCF of x for any t = [t1; t2], takes the form

⊤ (1) ⊤ (2) 푖t1 x +푖t2 x 휑x(t) = log E(푒 ) (푎) 푖훼⊤(D⊤t +D⊤t )+푖t⊤휀(1)+푖t⊤휀(2) = log E(푒 1 1 2 2 1 2 ) (푏) 푖훼⊤(D⊤t +D⊤t ) = log E(푒 1 1 2 2 ) 푖t⊤휀(1) 푖t⊤휀(2) + log E(푒 1 ) + log E(푒 2 ) ⊤ ⊤ = 휑훼(D1 t1 + D2 t2) + 휑휀(1) (t1) + 휑휀(2) (t2), where in (푎) we substituted the definition (4.6) of x(1) and x(2) and in (푏) we used (1) (2) the independence 훼 ⊥⊥ 휀 ⊥⊥ 휀 . Therefore, the blockwise mixed derivatives of 휑x are equal to ′′ ⊤ ⊤ ⊤ 휕1휕2휑x(t) = D1휑훼(D1 t1 + D2 t2)D2 , (4.12)

푀1×푀2 ′′ 2 where 휕1휕2휑x(t) := ∇t1 ∇t2 휑x(h(t1, t2)) ∈ R and 휑훼(u) := ∇u휑훼(u), does (1) (2) not depend on the noise vectors 휀 and 휀 . Note that we denoted h(t1, t2) = ⊤ ⊤ D1 t1 + D2 t2. For simplicity, we first prove the identifiability result when all components ofthe common sources are non-Gaussian. The high level idea of the proof is as follows. We assume two different representations of x(1) and x(2) and using (4.12) and the independence of the components of 훼 and the noises (assumption (1.22)), we first

92 show that the two potential dictionaries are related by an orthogonal matrix (and not any invertible matrix), and then show that this implies that the two potential sets of independent components are (orthogonal) linear combinations of each other, which, for non-Gaussian components which are not reduced to point masses, imposes that this orthogonal transformation is the combination of a permutation matrix and marginal scaling—a standard result from the ICA literature [Comon, 1994, Theorem 11].

Let us then assume that two equivalent representations of non-Gaussian CCA exist :

(1) (1) (1) x = D1훼 + 휀 = E1훽 + 휂 , (4.13) (2) (2) (2) x = D2훼 + 휀 = E2훽 + 휂 ,

⊤ where the other sources 훽 = (훽1, . . . , 훽퐾 ) are also assumed mutually independent and non-degenerate. As a standard practice in the ICA literature and without loss of generality as the sources have non-degenerate components, one can assume that the sources have unit , i.e. cov(훼, 훼) = 퐼 and cov(훽, 훽) = 퐼, by respectively rescaling the columns of the factor loading matrices. Under this assumption, the two expressions of the cross-covariance matrix are

(1) (2) ⊤ ⊤ cov(x , x ) = D1D2 = E1E2 , (4.14)

2 which, given that D1, D2 have full rank, implies that

−⊤ E1 = D1M, E2 = D2M , (4.15)

where M ∈ R퐾×퐾 is some invertible matrix. Substituting the representations (4.13) into the blockwise mixed derivatives of the SCF (4.12) and using the expressions (4.15) give ′′ ⊤ ⊤ ⊤ D1휑훼(D1 t1 + D2 t2)D2 ′′ ⊤ ⊤ −1 ⊤ −1 ⊤ = D1M휑훽(M D1 t1 + M D2 t2)M D2 ,

푀1 푀2 for all t1 ∈ R and t2 ∈ R . Since the matrices D1 and D2 have full rank, this can be rewritten as ′′ ⊤ ⊤ 휑훼(D1 t1 + D2 t2) ′′ ⊤ ⊤ −1 ⊤ −1 = M휑훽(M D1 t1 + M D2 t2)M ,

푀1 푀2 which holds for all t1 ∈ R and t2 ∈ R . Moreover, still since D1 and D2 have full 퐾 푀1 푀2 rank, we have, for any u1, u2 ∈ R the existence of t1 ∈ R and t2 ∈ R , such ⊤ ⊤ that u1 = D1 t1 and u2 = D2 t2, that is,

′′ ′′ ⊤ −1 −1 휑훼(u1 + u2) = M휑훽(M u1 + M u2)M , (4.16)

퐾 for all u1, u2 ∈ R .

2. The fact that D1, D2 have full rank and that E1, E2 have 퐾 columns, combined with (4.14), implies that E1, E2 have also full rank.

93 We will now prove two facts : 퐾 ′′ ⊤ (F1) For any vector v ∈ R , then 휑훽((M M − I)v) = −I, which will imply that MM⊤ = I because of the non-Gaussian assumptions. ⊤ ′′ ′′ 퐾 (F2) If MM = I, then 휑훼(u) = 휑M훽(u) for any u ∈ R , which will imply that M is the composition of a permutation and a scaling. This will end the proof.

Proof of fact (F1). By letting u1 = Mv and u2 = −Mv, we get :

′′ ′′ ⊤ −1 휑훼(0) = M휑훽((M M − I)v)M , (4.17)

3 ′′ Since 휑훼(0) = −cov(훼) = −I, one gets

′′ ⊤ 휑훽((M M − I)v) = −I,

for any v ∈ R퐾 . ′′ ⊤ ′′ Using the property that 휑A⊤훽(v) = A 휑훽(Av)A for any matrix A, and in particular ⊤ ′′ ⊤ with A = M M − I, we have that 휑A⊤훽(v) = −A A, i.e. is constant. If the second derivative of a function is constant, the function is quadratic. Therefore,

휑A⊤훽(·) is a quadratic function. Since the SCF completely defines the distribution of its variable (see, e.g., Jacod and Protter[2004]), A⊤훽 must be Gaussian (the SCF of a Gaussian random variable is a quadratic function). Given Lemma 9 from Comon [1994] (i.e., Cramer’s lemma : a linear combination of non-Gaussian random variables cannot be Gaussian unless the coefficients are all zero), this implies that A = 0, and hence M⊤M = I, i.e., M is an orthogonal matrix.

⊤ −1 Proof of fact (F2). Plugging M = M into (4.16), with u1 = 0 and u2 = 푢, gives ′′ ′′ ⊤ ⊤ ′′ 휑훼(u) = M휑훽(M u)M = 휑M훽(u), (4.18) 퐾 for any u ∈ R . By integrating both sides of (4.18) and using 휑훼(0) = 휑M훽(0) = 0, ⊤ 퐾 we get that 휑훼(u) = 휑M훽(u)+푖훾 u for all u ∈ R for some constant vector 훾. Using again that the SCF completely defines the distribution, it follows that 훼−훾 and M훽 have the same distribution. Since both 훼 and 훽 have independent components, this is only possible when M = ΛP, where P is a permutation matrix and Λ is some diagonal matrix [Comon, 1994, Theorem 11].

Case of a Single Gaussian Source Without loss of generality, we assume that the potential Gaussian source is the first one for 훼 and 훽. The first change is in the proof of fact (F1). We use the same argument up to the point where we conclude that A⊤훽 is a Gaussian vector. As

only 훽1 can be Gaussian, Cramer’s lemma implies that only the first row of A can ⊤ ⊤ have non-zero components, that is A = M M − I = e1f , where e1 is the first basis

⊤ 푖u⊤훼 푖u⊤훼 2 E(훼훼 푒 ) ⊤ E(훼푒 ) 3. Note that ∇u휑훼(u) = − 푖u⊤훼 + ℰ훼(u)ℰ훼(u) , where ℰ훼(u) = 푖u⊤훼 . E(푒 ) E(푒 )

94 vector and f any vector. Since M⊤M is symmetric, we must have

⊤ ⊤ M M = I + 푎e1e1 ,

where 푎 is a constant scalar different than −1 as M⊤M is invertible. This implies that M⊤M is an invertible diagonal matrix Λ, and hence MΛ−1/2 is an orthogonal matrix, which in turn implies that M−1 = Λ−1M⊤.

Plugging this into (4.16) gives, for any u1 and u2 :

′′ ′′ ⊤ −1 ⊤ −1 ⊤ 휑훼(u1 + u2) = M휑훽(M u1 + Λ M u2)Λ M .

′′ Given that diagonal matrices commute and that 휑훽 is diagonal for independent sources, this leads to

′′ −1/2 ′′ ⊤ −1 ⊤ −1/2 ⊤ 휑훼(u1 + u2) = MΛ 휑훽(M u1 + Λ M u2)Λ M .

퐾 ⊤ −1 ⊤ For any given v ∈ R , we are looking for u1 and u2 such that M u1 + Λ M u2 = −1/2 ⊤ ⊤ −1/2 Λ M v and u1 + u2 = v, which is always possible by setting M u2 = (Λ + −1 ⊤ ⊤ ⊤ ⊤ I) M v and M u1 = M v − M u2 by using the special structure of Λ. Thus, for any v, ′′ −1/2 ′′ −1/2 ⊤ −1/2 ⊤ ′′ 휑훼(v) = MΛ 휑훽(Λ M v)Λ M = 휑MΛ−1/2훽(v). Integrating as previously, this implies that the characteristic function of 훼 and MΛ−1/2훽 differ only by a linear function 푖훾⊤v, and thus, that 훼 − 훾 and MΛ−1/2훽 have the same distribution. This in turn, from Comon[1994, Theorem 11], implies that MΛ−1/2 is a product of a scaling and a permutation, which ends the proof.

Identifiability of Discrete CCA (4.7) and Mixed CCA (4.11)

Given the discrete CCA model, the SCF 휑x(t) takes the form

⊤ 푖t1 ⊤ 푖t2 휑x(t) = 휑훼(D1 (푒 − 1) + D2 (푒 − 1)) 푖t1 푖t2 + 휑휀(1) (푒 − 1) + 휑휀(2) (푒 − 1),

where 푒푖t푗 , for 푗 = 1, 2, denotes a vector with the 푚-th element equal to 푒푖[t푗 ]푚 , and we used the arguments analogous with the non-Gaussian case. The rest of the proof 푖t푗 extends with a correction that sometimes one has to replace D푗 with Diag[푒 ]D푗 and ⊤ 푖t푗 that u푗 = D푗 (푒 − 1) for 푗 = 1, 2. For the mixed CCA case, only the part related (2) to x and D2 changes in the same way as for the discrete CCA case.

4.4 Cumulants and Generalized Covariance Matrices In this section, we first derive the cumulant-based tensors for the discrete CCA model (Section 4.4.1). We show that the population versions of these higher-order statistics take the form of a non-symmetric canonical polyadic decomposition (see Section 2.1.2).

95 This is different from Chapter3, where the decomposition of the cumulant-based tensor of discrete ICA was instead symmetric. Nevertheless, this allows us to develop fast algorithms for the estimation in DCCA model (see Section 4.5).

We further introduce generalized covariance matrices (Section 4.4.2) for all the new models (4.6), (4.7), and (4.11). We show that these generalized covariance matrices have a special diagonal form, which is equivalent to non-symmetric CP decomposition in the matrix case. This allows us to develop fast algorithms for the estimation in the new models (Section 4.5). The use of generalized covariance matrices allows us to achieve the results equivalent to the ones based on tensors while working with matrices, which simplifies the derivations tremendously. This makes the generalized covariance matrices an attractive tool in the moment matching framework.

4.4.1 Discrete CCA Cumulants

In this section, we derive the DCCA cumulant-based higher-order statistics as an extension of the discrete ICA cumulant-based statistics derived in Section 3.3.2.

Discrete ICA Cumulants. In Section 3.3, we introduced the discrete ICA mo- del (3.8), where the observation vector x ∈ R푀 has conditionally independent Poisson components with mean D훼 and the latent sources vector 훼 ∈ R퐾 has independent non-negative components :

x|훼 ∼ Poisson(D훼), (4.19)

where for the estimation of the topic matrix D, we proposed an algorithm based on the moment matching method with the DICA cumulants. In Section 3.3.2, we defined these DICA cumulants : the S퐷퐼퐶퐴-covariance matrix and 풯 퐷퐼퐶퐴-cumulant tensor, as 퐷퐼퐶퐴 S := cov(x, x) − Diag [E(x)] , 퐷퐼퐶퐴 (4.20) 풯푚1푚2푚3 := cum(푥푚1 , 푥푚2 , 푥푚3 ) + 휏푚1푚2푚3 ,

where the indices 푚1, 푚2, and 푚3 take the values in 1, . . . , 푀, and

휏푚1푚2푚3 = 2훿 (푚1, 푚2, 푚3) E(푥푚1 ) − 훿 (푚2푚3) cov(푥푚1 , 푥푚2 )

− 훿 (푚1, 푚3) cov(푥푚1 , 푥푚2 ) − 훿 (푚1, 푚2) cov(푥푚1 , 푥푚3 ), where 훿 stands for the Kronecker delta. For completeness, we outline the derivation from Section 3.3.2 below. Let y := D훼. By the law of total expectation E(x) = E(x|y) = E(y) and by the law of total covariance

cov(x, x) = E[cov(x, x|y)] + cov[E(x|y), E(x|y)] = Diag[E(y)] + cov(y, y), since all the cumulants of a Poisson random variable with parameter y are equal to y. Therefore, S퐷퐼퐶퐴 = cov(y, y). Similarly, by the law of total cumulance 풯 퐷퐼퐶퐴 =

96 cum(y, y, y). Then, by the multilinearity property for cumulants, one obtains

S퐷퐼퐶퐴 = D cov(훼, 훼)D⊤, 퐷퐼퐶퐴 ⊤ ⊤ ⊤ (4.21) 풯 = cum(훼, 훼, 훼) ×1 D ×2 D ×3 D , where ×푖 denotes the 푖-mode tensor-matrix product (see Section 2.1.1). Since the covariance cov(훼, 훼) and cumulant cum(훼, 훼, 훼) of the independent sources are dia- gonal, (4.21) is called the diagonal form. The expressions (4.21) are also known as the symmetric non-negative CP decomposition of tensors (see Section 2.1.2) ; it is non-negative since the columns of the topic matrix are non-negative.

Noisy Discrete ICA Cumulants. We showed that the noisy DICA model (4.8) contains both discrete CCA and discrete ICA models as special cases. It is straight- forward to obtain the cumulants of the noisy DICA model. Let y := D훼 + 휀 and S푁퐷퐼퐶퐴 := S퐷퐼퐶퐴 and 풯 푁퐷퐼퐶퐴 := 풯 퐷퐼퐶퐴 are defined as in (4.20). Then a simple extension of the derivations from above gives S푁퐷퐼퐶퐴 = cov(y, y) and 풯 푁퐷퐼퐶퐴 = cum(y, y, y). Since the covariance matrix (cumulant tensor) of the sum of two independent multivariate random variables, D훼 and 휀, is equal to the sum of the covariance matrices (cumulant tensors) of these variables, the “perturbed” version of the decomposition in (4.21) follows

S푁퐷퐼퐶퐴 = D cov(훼, 훼)D⊤ + cov(휀), 푁퐷퐼퐶퐴 ⊤ ⊤ ⊤ (4.22) 풯 = cum(훼, 훼, 훼) ×1 D ×2 D ×3 D + cum(휀, 휀, 휀).

Note that this expressions are essentially equivalent to the ones for the noisy cumu- lants of noisy ICA model from Section 2.2.2.

DCCA Cumulants. By the stacking trick (4.4), discrete CCA (4.7) gives a noisy version of discrete ICA with a special form of the covariance matrix of the noise :

(︂cov(︀휀(1), 휀(1))︀ 0 )︂ cov(휀, 휀) = , (4.23) 0 cov(︀휀(2), 휀(2))︀ which is due to the independence 휀(1) ⊥⊥ 휀(2). Similarly, the cumulant cum(휀, 휀, 휀) of the noise has only two diagonal blocks which are non-zero. Therefore, considering only those parts of the S푁퐷퐼퐶퐴-covariance matrix and 풯 푁퐶퐶퐴-cumulant tensor of noisy DICA that correspond to zero blocks of the covariance cov(휀, 휀) and cumulant cum(휀, 휀, 휀), gives immediately a matrix and tensor with a diagonal structure similar to the one in (4.21). Those blocks are the cross-covariance and cross-cumulants of x(1) and x(2).

We define the S퐷퐶퐶퐴-covariance matrix of discrete CCA 4 as the cross-covariance matrix of x(1) and x(2) : 퐷퐶퐶퐴 (1) (2) S12 := cov(x , x ). (4.24)

퐷퐶퐶퐴 (2) (1) 퐷퐶퐶퐴 4. Note that S21 := cov(x , x ) is just the transpose of S12 .

97 퐷퐶퐶퐴 From (4.22) and (4.23), the matrix S12 has the following diagonal form

퐷퐶퐶퐴 ⊤ S12 = D1 cov(훼, 훼)D2 . (4.25)

퐷퐶퐶퐴 Similarly, we define the 풯 -cumulant tensors of discrete CCA, 푀1 × 푀2 × 푀1 퐷퐶퐶퐴 퐷퐶퐶퐴 (1) tensor 풯121 and 푀1 ×푀2 ×푀2 tensor 풯122 , through the cross-cumulants of x and x(2), for 푗 = 1, 2 :

[︀ 퐷퐶퐶퐴]︀ (︀ (1) (2) (푗) )︀ (︀ (1) (2) )︀ 풯 := cum 푥 , 푥 , 푥 − 훿 (푚푗, 푚˜ 푗) cov 푥 , 푥 , (4.26) 12푗 푚1푚2푚˜ 푗 푚1 푚2 푚푗 푚1 푚2 where the indices 푚1, 푚2, and 푚˜ 푗 take the values 푚1 ∈ 1, . . . , 푀1, 푚2 ∈ 1, . . . , 푀2, and 푚˜ 푗 ∈ 1, . . . , 푀푗. From (4.21) and the mentioned block structure (4.23) of cov(휀, 휀), the DCCA 풯 퐷퐶퐶퐴-cumulants have the diagonal form :

퐷퐶퐶퐴 ⊤ ⊤ ⊤ 풯121 = cum(훼, 훼, 훼) ×1 D1 ×2 D2 ×3 D1 , 퐷퐶퐶퐴 ⊤ ⊤ ⊤ (4.27) 풯122 = cum(훼, 훼, 훼) ×1 D1 ×2 D2 ×3 D2 .

In Section 4.5, we show how to estimate the factor loading matrices D1 and D2 using the diagonal form (4.25) and (4.27). Before that, in Section 4.4.2, we first derive the generalized covariance matrices of discrete ICA and the CCA models (4.6), (4.7), and (4.11) as an extension of ideas by Yeredor[2000], Todros and Hero[2013].

4.4.2 Generalized Covariance Matrices In this section, we introduce the generalization of the S-covariance matrix for both DICA and the CCA models (4.6), (4.7), and (4.11), which are obtained through the Hessian of the cumulant generating function. We show that (a) the generalized co- variance matrices can be used for approximation of the 풯 -cumulant tensors using directional derivatives and (b) in the DICA case, these generalized covariance ma- trices have the diagonal form analogous to (4.21), and, in the CCA case, they have the diagonal form analogous to (4.25). Therefore, generalized covariance matrices can be seen as a substitute for the 풯 -cumulant tensors in the moment matching fra- mework. This (a) significantly simplifies derivations and the final expressions used for implementation of resulting algorithms and (b) potentially improves the sample complexity, since only the second-order information is used.

Generalized Covariance Matrices The idea of generalized covariance matrices is inspired by the similar extension of the ICA cumulants by Yeredor[2000]. Cumulants and the Cumulant-Generating Function. We review some of the notions introduced in Section 2.2. The cumulant generating function (CGF) of a multivariate random variable 푥 ∈ R푀 is defined as :

t⊤x Kx(t) = log E(푒 ), (4.28)

98 푀 (푠) for t ∈ R . The cumulants 휅x , for 푠 = 1, 2, 3,... , are the coefficients of the Taylor series expansion of the CGF evaluated at zero. Therefore, the cumulants are the (푠) 푠 derivatives of the CGF evaluated at zero : 휅x = ∇ Kx(0), 푠 = 1, 2, 3,... , where 푠 ∇ Kx(t) is the 푠-th order derivative of Kx(t) with respect to t. Thus, the expectation of x is the gradient E(x) = ∇Kx(0) and the covariance of x is the Hessian cov(x, x) = 2 ∇ Kx(0) of the CGF evaluated at zero. Generalized Cumulants. The extension of cumulants then follows immediately : 푀 푠 for t ∈ R , we refer to the derivatives ∇ Kx(t) of the CGF as the generalized cumulants. The respective parameter t is called a processing point. In particular, the 2 gradient, ∇Kx(t), and Hessian, ∇ Kx(t), of the CGF are referred to as the generalized expectation and generalized covariance matrix, respectively :

(︀ t⊤x)︀ E x푒 ℰ (푡) := ∇K (t) = , (4.29) 푥 x (︀ ⊤ )︀ E 푒t x (︀ ⊤ t⊤x)︀ E xx 푒 풞 (푡) := ∇2K (t) = − ℰ (t)ℰ (t)⊤. (4.30) 푥 x (︀ ⊤ )︀ x x E 푒t x

Finite Sample Estimators of Generalized Cumulants. Following Yeredor[2000], Slapak and Yeredor[2012b], we use the most direct way of defining finite sample es- timators of the generalized expectation (4.29) and covariance matrix (4.30). Given a finite sample X = {x1, x2,..., x푁 }, an estimator of the generalized expectation is ∑︀푁 x 푤 ℰ (t) = 푛=1 푛 푛 (4.31) ̂︀x ∑︀푁 푛=1 푤푛 ⊤ t x푛 where weights 푤푛 = 푒 and an estimator of the generalized covariance is

∑︀푁 x x⊤푤 풞 (t) = 푛=1 푛 푛 푛 − ℰ (t)ℰ (t)⊤. (4.32) ̂︀x ∑︀푁 ̂︀x ̂︀x 푛=1 푤푛 Similarly, an estimator of the generalized S-covariance matrix is then

∑︀푁 (1) (2)⊤ ∑︀푁 (1) ∑︀푁 (2)⊤ 푛=1 x푛 x푛 푤푛 푛=1 x푛 푤푛 푛=1 x푛 푤푛 풞 (1) (2) (t) = − , ̂︀x ,x ∑︀푁 ∑︀푁 ∑︀푁 푛=1 푤푛 푛=1 푤푛 푛=1 푤푛

[︀ (1) (2)]︀ [︀ ]︀ 푀1 푀2 where x = x ; x and t = t1; t2 for some t1 ∈ R and t2 ∈ R . Note that some properties of the generalized expectation (4.29) and covariance matrix 5 (4.30) and their finite sample estimators (4.31) and (4.32) are analyzed by Slapak and Ye- redor[2012b]. Generalized Covariance Matrix is Diagonal for an Independent Vector. Likewise cumulants, the generalized cumulants are diagonal if a vector is independent.

5. Note that we find the name “generalized covariance matrix” to be more meaningful than “charrelation” matrix as was proposed by previous authors [see, e.g. Slapak and Yeredor, 2012a,b].

99 We show this in particular for the generalized expectation and covariance. Indeed, 퐾 the sources 훼 = (훼1, . . . , 훼퐾 ) are mutually independent. Therefore, for some h ∈ R , 훼⊤h their CGF (4.28) K훼(h) = log E(푒 ) takes the form

퐾 ∑︁ [︀ 훼푘ℎ푘 ]︀ K훼(h) = log E(푒 ) . 푘=1 Therefore, the 푘-th element of the generalized expectation (4.29) of 훼 is (separable in 훼푘): 훼푘ℎ푘 E(훼푘푒 ) [ℰ훼(h)]푘 = , (4.33) E(푒훼푘ℎ푘 ) and the generalized covariance (4.30) of 훼 is diagonal due to the separability and its 푘-th diagonal element is :

2 훼푘ℎ푘 E(훼푘푒 ) 2 [풞훼(h)]푘푘 = − [ℰ훼(h)]푘 . (4.34) E(푒훼푘ℎ푘 ) Likewise covariance matrices, these Hessians (a.k.a. generalized covariance matrices) are subject to the multilinearity property for a linear transformations of a vector, hence the resulting diagonal structure of the form (4.21). This is essentially the previous ICA work [Yeredor, 2000, Todros and Hero, 2013]. Below we generalize these ideas first to the discrete ICA case and then to the CCA models(4.6), (4.7), and (4.11).

Discrete ICA Generalized Covariance Matrices

Likewise covariance matrices, generalized covariance matrices of a vector with inde- pendent components are diagonal : they satisfy the multilinearity property 풞D훼(h) = ⊤ D 풞훼(h)D , and are equal to covariance matrices when h = 0. Therefore, we can expect that the derivations of the diagonal form (4.21) of the S-covariance matrices extends to the generalized covariance matrices case. By analogy with (4.20), we define the generalized S-covariance matrix of DICA :

퐷퐼퐶퐴 S (t) := 풞x(t) − Diag[ℰx(t)]. (4.35)

To derive the analog of the diagonal form (4.21) for S퐷퐼퐶퐴(t), we have to compute all the expectations in (4.29) and (4.30) for a Poisson random variable x with the para- meter y = D훼. In the derivations below we use the fact that the moment generating function of a Poisson random variable 푥 with the parameter 푦 has a very special form 푡 M푥(푡) = 푦(푒 − 1) and therefore all the cumulants of 푥 are equal to 푦.

Given the discrete ICA model (3.8), the generalized expectation (4.29) of x ∈ R푀

100 takes the form [︁ t⊤x ]︁ (x푒t⊤x) E E(x푒 |훼) ℰ (t) = E = x ⊤ [︀ ⊤ ]︀ E(푒t x) E E(푒t x|훼) 훼⊤h(t) t E(훼푒 ) = Diag[푒 ]D ⊤ E(푒훼 h(t)) t = Diag[푒 ]Dℰ훼(h(t)), where t ∈ R푀 is a parameter, h(t) = D⊤(푒t − 1), and 푒t denotes an 푀-vector with the 푚-th element equal to 푒푡푚 . Note that in the last equation we used the definition (4.29) of the generalized expectation ℰ훼(·).

Further, the generalized covariance (4.30) of x takes the form

⊤ t⊤x E(xx 푒 ) ⊤ 풞x(t) = ⊤ − ℰx(t)ℰx(t) E(푒t x) [︁ ⊤ ]︁ E E(xx⊤푒t x|훼) = − ℰ (t)ℰ (t)⊤. [︀ ⊤ ]︀ x x E E(푒t x|훼)

Plugging into this expression the expression for ℰx(t) and

⊤ t⊤x t ⊤ 훼⊤h(t) ⊤ t E(xx 푒 |훼) = Diag[푒 ]DE(훼훼 푒 )D Diag[푒 ] t [︁ 훼⊤h(t) ]︁ + Diag[푒 ]Diag DE(훼푒 ) we get t ⊤ t 풞x(t) = Diag[ℰx(t)] + Diag[푒 ]D풞훼(h(t))D Diag[푒 ], where we used the definition (4.30) of the generalized covariance of 훼. This gives

퐷퐼퐶퐴 (︀ t )︀ (︀ t )︀⊤ S (t) = Diag[푒 ]D 풞훼 (h(t)) Diag[푒 ]D , (4.36) which is a diagonal form similar (and equivalent for t = 0) to (4.21) since the generali- zed covariance matrix 풞훼(h) of independent sources is diagonal (see Equation (4.34)). Therefore, the generalized DICA S-covariance matrices, estimated at different proces- sing points t, can be used as a substitute of the contractions of the DICA 풯 -cumulant tensors in the moment matching framework. Interestingly enough, the latter can be approximated by the former via the directional derivatives, as we now show.

Approximating DICA 풯 -Cumulants with a Generalized Covariance Ma- 푀 ′ trix. Let 푓푚푚′ (t) = [풞x(t)]푚푚′ be a function R → R corresponding to the (푚, 푚 )- th element of the generalized covariance matrix. Then the following holds for its directional derivative at t0 along the direction t :

푓푚푚′ (t0 + 훿t) − 푓푚푚′ (t0) ⟨∇푓푚푚′ (t0), t⟩ = lim , 훿→0 훿

101 where ⟨·, ·⟩ stands for the inner product. Therefore, when using the fact that ∇푓(t0) = ∇풞x(t0) is the generalized order-3 cumulant of x at t0 and the definition of the contraction of a tensor 풯 with a vector t, defined element-wise as [풯 (t)]푚1푚2 = ∑︀푀 풯 푡 , one obtains for t = 0 the approximation of the cumulant cum(x, x, x) 푚3=1 푚1푚2푚3 푚3 0 with the generalized covariance matrix 풞x(t). Indeed, in such case ⟨∇푓푚푚′ (t0), t⟩ be- comes the (푚, 푚′)-th element of the cumulant cum(x, x, x) contracted with t, while the terms on the RHS are 푓푚푚′ (훿t) = 풞x(훿t) and 푓푚푚′ (0) = 풞x(0) = cov(x, x).

CCA Generalized Covariance Matrices For the CCA models (4.6), (4.7), and (4.11), straightforward generalizations of the ideas from Section 2.2.1 leads to the following definition of the generalized CCA S- covariance matrix :

(︀ (1) (2)⊤ t⊤x)︀ (︀ (1) t⊤x)︀ (︀ (2)⊤ t⊤x)︀ E x x 푒 E x 푒 E x 푒 S (t) := − , (4.37) 12 ⊤ (︀ ⊤ )︀ ⊤ E(푒t x) E 푒t x E(푒t x)

(1) (2) where the vectors x and t are obtained by vertically stacking x and x and t1 and t2 as in the stacking trick (4.4). We first briefly show that the generalized CCA S-cumulants contain equivalent information to the one from contracted CCA 풯 - cumulants, which is similar to the discrete ICA case above. We then prove the diagonal form of the generalized CCA S-cumulants for the mixed, non-Gaussian, and discrete CCA models. Approximating CCA 풯 -Cumulants with a Generalized Covariance Ma- ⊤ ⊤ 퐾 trix. Let us define v1 = W1 u1 and v2 = W2 u2 for some u1, u2 ∈ R . Then, approximations for the 풯 -cumulants (4.26) of discrete CCA take the following form : W1풯121(v1)W2 is approximated by the generalized S-covariances (4.37) S12(t) via the following expression

W S (훿t )W⊤ − W S (0)W⊤ W 풯 (v )W ≈ 1 12 1 2 1 12 2 1 121 1 2 훿 ⊤ − W1Diag(v1)S12W2 ,

(︂v )︂ where t = 1 and W 풯 (v )W is approximated by the generalized S-covariances 1 0 1 122 2 2 S12(t) via W S (훿t )W⊤ − W W (0)W⊤ W 풯 (v )W ≈ 1 12 2 2 1 12 2 1 122 2 2 훿 ⊤ − W1S12Diag(v2)W2 , (︂ 0 )︂ where t2 = and 훿 are chosen to be small. v2 Mixed CCA. To show the diagonal form, similar to the one in (4.36), of the genera- lized CCA S-cumulants, we present the detailed proof for the mixed CCA case ; the other two cases are contained in this proof and therefore are omitted. The CGF (4.28)

102 of mixed CCA (4.11) can be written as

(︀ t(1)⊤x(1)+t(2)⊤x(2) )︀ Kx(t) = log E 푒 [︁ (︀ t(1)⊤x(1)+t(2)⊤x(2) ⃒ (1) (2))︀]︁ = log E E 푒 ⃒훼, 휀 , 휀

(푎) [︁ (︀ t(1)⊤x(1) ⃒ (1))︀ (︀ t(2)⊤x(2) (2))︀]︁ = log E E 푒 ⃒훼, 휀 E 푒 |훼, 휀 (︂ (︀ )︀ [︀ ]︀)︂ (푏) t(1)⊤ D 훼+휀(1) D 훼+휀(2)⊤(푒t2 −1) = log E 푒 1 푒 2

(푐) (︁ 훼⊤h(t))︁ (︁ 휀(2)⊤(푒t2 −1))︁ t(1)⊤휀(1) = log E 푒 + log E 푒 + log E(푒 ),

⊤ ⊤ t2 (2) where h(t) = D1 t1 + D2 (푒 − 1), in (푎) we used the conditional independence of x ⊤ ⊤ t and x(2), in (푏) we used that E(푒t x) = 푒y (푒 −1), and in (푐) we used the independence assumption (4.3).

The generalized CCA S-covariance matrix is defined (equivalent to (4.37)) as

S12(t) := ∇t2 ∇t1 Kx(t).

Its gradient with respect to t1 is

(︀ 훼⊤h(t))︀ (︀ (1) t(1)⊤휀(1) )︀ D1E 훼푒 E 휀 푒 ∇ K (t) = + , t1 x (︀ ⊤ )︀ (︀ (1)⊤ (1) )︀ E 푒훼 h(t) E 푒t 휀 where the last term does not depend on t2. Computing the gradient of this expression with respect to t2 gives

푀퐶퐶퐴 (︀ [︀ t2 ]︀ )︀⊤ S12 (t) = D1풞훼(h(t)) Diag 푒 D2 , where we substituted expression (4.34) for the generalized covariance of the inde- pendent sources. Straightforward extension of this argument leads to similar ex- pressions for discrete and non-Gaussian CCA. We summarize below these expres- sions.

Discrete, Non-Gaussian, and Mixed Generalized Covariance. In the dis- 퐷퐶퐶퐴 crete CCA case, S12 (t) is essentially the upper-right block of the generalized S-covariance matrix S퐷퐼퐶퐴(t) of DICA and has the form

퐷퐶퐶퐴 (︀ t1 )︀ (︀ t2 )︀⊤ S12 (t) = Diag[푒 ]D1 풞훼(h(t)) Diag[푒 ]D2 , (4.38)

⊤ t where h(t) = D (푒 − 1) and the matrix D is obtained by vertically stacking D1 and D2 by analogy with (4.4). For non-Gaussian CCA, the diagonal form is

푁퐶퐶퐴 ⊤ S12 (t) = D1 풞훼 (h(t)) D2 , (4.39)

103 ⊤ ⊤ where h(t) = D1 t1 + D2 t2. Finally, for mixed CCA,

푀퐶퐶퐴 (︀ t2 )︀⊤ S12 (t) = D1 풞훼 (h(t)) Diag[푒 ]D2 , (4.40)

⊤ ⊤ t2 where h(t) = D1 t1 + D2 (푒 − 1). Since the generalized covariance matrix of the sources 풞훼(·) is diagonal, expressions (4.38)–(4.40) have the desired diagonal form.

4.5 Estimation in Non-Gaussian, Discrete, and Mixed CCA The standard algorithms such as TPM or orthogonal joint diagonalization cannot be used for the estimation of D1 and D2. Indeed, even after whitening, the matrices appearing in the diagonal form (4.25) and (4.27) or (4.38)–(4.40) are not orthogonal. This can be explained by the fact that in the CCA case the population cumulant- tensors take the form of non-symmetric rather than symmetric, as in the ICA or discrete ICA case, CP decomposition. As an alternative, we use Jacobi-like non- orthogonal diagonalization (by similarity) algorithms [Fu and Gao, 2006, Iferroudjene et al., 2009, Luciani and Albera, 2010]. These algorithms are discussed in Section 2.3.2 and we briefly outline the main ideas here.

The estimation of the factor loading matrices D1 and D2 of the CCA models (4.6), (4.7), and (4.11) via non-orthogonal joint diagonalization algorithms consists of the follo- wing steps : (a) construction of a set of matrices, called target matrices, to be jointly diagonalized (using finite sample estimators), (b) the prewhitening step, (c) anon- orthogonal joint diagonalization step, and (d) the final estimation of the factor loading matrices. Target Matrices. There are two ways to construct target matrices : either with the CCA S-matrices (4.24) and 풯 -cumulants (4.26) or the generalized covariance matrices (4.37) (D/N/MCCA). These matrices are estimated with finite sample esti- mators (see Section 4.4). When dealing with the S- and 풯 -cumulants, the target matrices are obtained via tensor projections. We define a projection 풯 (v) ∈ R푀1×푀2 of a third-order tensor 풯 ∈ R푀1×푀2×푀3 onto a vector v ∈ R푀3 as

푀 ∑︁3 [풯 (v)]푚1푚2 := [풯 ]푚1푚2푚3 푣푚3 . (4.41) 푚3=1

Note that the projection 풯 (v) is a matrix. Therefore, given 2푃 vectors

{v11, v21, v12, v22,..., v1푃 , v2푃 } ,

one can construct 2푃 + 1 matrices

{S12, 풯121(v1푝), 풯122(v2푝), for 푝 = 1, . . . , 푃 }, (4.42)

104 which have the diagonal form (4.25) and (4.27). Importantly, the tensors are never constructed (see Anandkumar et al.[2012a, 2014], Podosinnikova et al.[2015] and Ap- pendix C.2.1). The (computationally efficient) construction of target matrices from S- and 풯 -cumulants is discussed by in Appendix C.2.1. Alternatively, the target matrices can be constructed by estimating the generalized 푀1+푀2 S-covariance matrices at 푃 + 1 processing points 0, t1,..., t푃 ∈ R :

{S12 = S12(0), S12(t1),..., S12(t푃 )}, (4.43) which also have the diagonal form (4.38)–(4.40). It is interesting to mention the connection between the 풯 -cumulants and the generalized S-covariance matrices. The 풯 -cumulant can be approximated via the directional derivative of the generalized covariance matrix. However, in general, e.g., S12(t) with t = [t1; 0] is not exactly the same as 풯121(t1) and the former can be non-zero even when the latter is zero. This is important since order-4 and higher statistics are used with the method of moments when there is a risk that an order-3 statistic is zero like for symmetric sources. In general, the use of higher-order statistics increases the sample complexity and makes the resulting expressions quite complicated. Therefore, replacing the 풯 -cumulants with the generalized S-covariance matrices is potentially beneficial.

퐾×푀1 퐾×푀2 Prewhitening. The matrices W1 ∈ R and W2 ∈ R are called whitening matrices of S12 if ⊤ W1S12W2 = I퐾 , (4.44) where I퐾 is the 퐾-dimensional identity matrix. W1 and W2 are only defined up to multiplication by any invertible matrix W ∈ R퐾×퐾 , since any pair of matrices −⊤ Ŵ︁1 = QW1 and Ŵ︁2 = Q W2 also satisfy (4.44). In fact, using higher-order information (i.e. the 풯 -cumulants or the generalized covariances for t ̸= 0) allows to solve this ambiguity.

The whitening matrices can be computed via SVD of S12 as in the DICA case (see Section 3.4). When 푀1 and 푀2 are too large, one can use a randomized SVD algorithm [see, e.g., Halko et al., 2011] to avoid the construction of the large matrix S12 and to decrease the computational time. Applying Whitening Transform to DCCA 풯 -Cumulants. Transformation of the 풯 -cumulants (4.42) with whitening matrices W1 and W2 gives new tensors 풯̂︀12푗 ∈ R퐾×퐾×퐾 : ⊤ ⊤ ⊤ 풯̂︀12푗 := 풯12푗 ×1 W1 ×2 W2 ×3 W푗 , (4.45) where 푗 = 1, 2. Combining this transformation with the projection (4.41), one obtains 2푃 + 1 matrices ⊤ ⊤ ⊤ W1S12W2 , W1풯12푗(W푗 u푗푝)W2 , (4.46) ⊤ where 푝 = 1, . . . , 푃 and 푗 = 1, 2 and we used v푗푝 = W푗 u푗푝 to take into account 퐾 whitening along the third direction. By choosing u푗푝 ∈ R to be the canonical vectors 퐾 of the 푅 , the number of tensor projections is reduced from 푀 = 푀1 + 푀2 to 2퐾.

105 The Choice of Projection Vectors or Processing Points. For the DCCA S ⊤ and 풯 -cumulants (4.42), we choose the 퐾 projection vectors as v1푝 = W1 e푝 and ⊤ v2푝 = W2 e푝, where e푝 is one of the columns of the 퐾-identity matrix (i.e., a canonical vector). For the generalized S-covariances (4.43), we choose the processing points as t1푝 = 훿1v1푝 and t2푝 = 훿2v2푝, where 훿푗, for 푗 = 1, 2 are set to a small value such as 0.1 ∑︀ divided by 푚 E(|푥푗푚|)/푀푗, for 푗 = 1, 2.

When projecting a tensor 풯12푗 onto a vector, part of the information contained in this tensor gets lost. To preserve all information, one could project a tensor 풯12푗 onto the 푀푗 canonical basis of R to obtain 푀푗 matrices. However, this would be an expensive operation in terms of both memory and computational time. In practice, we use the fact, that the tensor 풯12푗, for 퐽 = 1, 2, is transformed with whitening matrices (4.45). Hence, the projection vector has to include multiplication by the whitening matrices. Since they reduce the dimension to 퐾, choosing the canonical basis in R퐾 becomes ⊤ ⊤ sufficient. Hence, the choice v1푝 = W1 e푝 and v2푝 = W2 e푝, where e푝 is one of the columns of the 퐾-identity matrix. Importantly, in practice, the tensors are never constructed (see Appendix C.2.1). The choice of the processing points of the generalized covariance matrices has to be done carefully. Indeed, if the values of t1 or t2 are too large, the exponentials blow up. Hence, it is reasonable to maintain the values of the processing points very small. Therefore, for 푗 = 1, 2, we set t푗푝 = 훿푗v푗푝 where 훿푗 is proportional to a parameter 훿 which is set to a small value (훿 = 0.1 by default), and the scale is determined by the inverse of the empirical average of the component of x(푗), i.e. :

푁푀푗 훿푗 := 훿 , (4.47) ∑︀푁 ∑︀푀푗 (푗) 푛=1 푚=1 |푋푚푛| for 푗 = 1, 2. See Section 4.6 for an experimental comparison of different values of 훿 (the default value used in other experiments is 훿 = 0.1). Non-Orthogonal Joint Diagonalization (NOJD). Let us consider joint diago- nalization of the generalized covariance matrices (4.43) (the same procedure holds for the S- and 풯 -cumulants (4.42)). Given the whitening matrices W1 and W2, the transformation of the generalized covariance matrices (4.43) gives 푃 +1 matrices

⊤ ⊤ {W1S12W2 , W1S12(t푝)W2 , 푝 ∈ [푃 ]}, (4.48)

퐾×퐾 where each matrix is in R and has reduced dimension since 퐾 < 푀1, 푀2. In practice, finite sample estimators are used to construct (4.43). Due to the diagonal form (4.25) and (4.38)–(4.40), each matrix in (4.43) has the form 6 ⊤ (W1D1) Diag(·)(W2D2) . Both D1 and D2 are (full) 퐾-rank matrices and W1 and W2 are 퐾-rank by construction. Therefore, the square matrices V1 = W1D1 and ⊤ V2 = W2D2 are invertible. From (4.25) and prewhitening, we get V1cov(훼, 훼)V2 = I

6. Note that when the diagonal form has terms Diag[푒t], we simply multiply the expression by Diag[푒−t].

106 −1 −1 and hence V2 = Diag[var(훼) ]V1 (the covariance matrix of the sources is dia- gonal and we assume they are non-deterministic, i.e. var(훼) ̸= 0). Substituting ⊤ this into W1S12(t)W2 and using the diagonal form (4.38)–(4.40), we obtain that −1 the matrices in (4.43) have the form V1Diag(·)V1 . Hence, we deal with the pro- blem of the following type : Given 푃 non-defective (a.k.a. diagonalizable) matrices 퐾×퐾 ℬ = {B1,..., B푃 }, where each matrix B푝 ∈ R , find an invertible matrix Q ∈ R퐾×퐾 such that

−1 −1 −1 QℬQ = {QB1Q ,..., QB푃 Q } (4.49) are (jointly) as diagonal as possible. This can be seen as a joint non-symmetric ei- genvalue problem. This problem should not be confused with the classical joint dia- gonalization problem by congruence (JDC), where Q−1 is replaced by Q⊤, except when Q is an orthogonal matrix [Luciani and Albera, 2010]. JDC is often used for ICA algorithms or moment matching based algorithms for graphical models when a whitening step is not desirable (see, e.g., Kuleshov et al.[2015a] and references therein). However, neither JDC nor the orthogonal diagonalization-type algorithms [such as, e.g., the tensor power method by Anandkumar et al., 2014] are applicable for the problem (4.49).

To solve the problem (4.49), we use the Jacobi-like non-orthogonal joint diagonali- zation (NOJD) by similarity algorithms [e.g., Fu and Gao, 2006, Iferroudjene et al., 2009, Luciani and Albera, 2010]. These algorithms are an extension of the ortho- gonal joint diagonalization algorithms based on Jacobi (=Givens) rotations [Golub and Van Loan, 1996, Bunse-Gerstner et al., 1993, Cardoso and Souloumiac, 1996]. Although these algorithms are quite stable in practice, we are not aware of any theo- retical guarantees about their convergence or stability to perturbation. 7

ED-Based Algorithm. By analogy with the orthogonal case [Section 3.4; Cardoso, 1989, Anandkumar et al., 2012a], we can easily extend the idea of the ED-based algo- rithm to the non-orthogonal one. Indeed, it amounts to performing whitening as before ⊤ and constructing only one matrix with the diagonal structure, e.g., B = W1S12(t)W2 for some t. Then, the matrix Q is obtained as the matrix of the eigenvectors of B. 퐾 The vector t can be, e.g., chosen as t = Wu, where W = [W1; W2] and u ∈ R is a vector sampled uniformly at random.

This ED-based algorithm and the NOJD algorithms are closely connected. In parti- cular, when B has real eigenvectors, the ED-based algorithm is equivalent to NOJD of B. Indeed, in such case, NOJD boils down to an algorithm for a non-symmetric eigenproblem [Eberlein, 1962, Ruhe, 1968]. In practice, however, due to the presence of noise and finite sample errors, B may have complex eigenvectors. In such case, the ED-based algorithm is different from NOJD. Importantly, the joint diagonalization

7. Note that recently novel perturbation analysis results were obtained for the simultaneous Schur decomposition, which is an algorithms that can be used for the computation of the non-orthogonal joint diagonalization problem by similarity [Colombo and Vlassis, 2016a,b,c]. An experimental com- parison and potential extension of these results to the Jacobi-like algorithms could be of interest.

107 type algorithms are known to be more stable in practice [see, e.g., Bach and Jordan, 2002, Podosinnikova et al., 2015]. While deriving precise theoretical guarantees is beyond the scope of this thesis, the techniques outlined by Anandkumar et al.[2012a] for the ED-based (spectral) algo- rithm for latent Dirichlet Allocation can potentially be extended. The main difference is obtaining the analogue of the SVD accuracy [Lemma C.3, Anandkumar et al., 2013a] for the eigendecomposition. This kind of analysis can potentially be extended with the techniques outlined by Stewart and Sun[1990, Chapter 4]. Nevertheless, with appropriate parametric assumptions on the sources, we expect that the above described extension of the ED-based algorithm should lead to similar guarantee as the ED-based (spectral) algorithm of Anandkumar et al.[2012a].

4.6 Experiments In this section, we illustrate the proposed algorithms on both synthetic and real data. In Section 4.6.1, we illustrate the estimation in the DCCA model on synthetic count data. In Section 4.6.2, we illustrate the estimation in the non-Gaussian CCA model on synthetic continuous data. In Section 4.6.3, we illustrate the estimation in the DCCA model on real data. The code for reproducing the experiments described in this section is available at https://github.com/anastasia-podosinnikova/cca.

4.6.1 Synthetic Count Data Synthetic Data. We first consider multi-view models for count data, which we also sometimes refer to as discrete data, and sample synthetic data to have ground truth information (i.e., matrices D1 and D2) for evaluation. We sample data from the linear DCCA model, which is defined as follows

훼 ∼ Gamma(c, b), (푗) 훽 ∼ Gamma(c푗, b푗), 푗 = 1, 2, (4.50) (푗) (푗) x ∼ Poisson(D푗훼 + F푗훽 ), 푗 = 1, 2, where the vector with independent components 훼 corresponds to the common sources and vectors with independent components 훽(푗), for 푗 = 1, 2, are the view-specific (푗) (noise) sources. Let us define s ∼ Poisson(D푗훼) to be the part of the sample due (푗) (푗) to the common sources and n ∼ Poisson(F푗훽 ) to be the part of the sample due to the noise (i.e., x(푗) = s(푗) +n(푗)). Below we explain the choice of the parameters. We define the expected sample length due to the common sources and noise sources, respectively :

⎡ 푀푗 ⎤ ⎡ 푀푗 ⎤ (푗) ∑︁ (푗) (푗) ∑︁ (푗) 퐿푠 := E ⎣ 푠푚 ⎦ , 퐿푛 := E ⎣ 푛푚 ⎦ , 푗 = 1, 2. 푚=1 푚=1

108 1000 X1 D1 800 F11 F12 600

400

200

0 0 200 400 600 800 1000

Figure 4-2 – An example of a 2D Discrete Synthetic Data sample drawn from the linear DCCA model (4.50) as explained in Section 4.6.1. The parameters are set to the following values : the dimensions are 푀1 = 푀2 = 퐾1 = 퐾2 = 2 and 퐾 = 1 ; the scale parameters are 푐 = 푐1 = 푐2 = 0.1 ; the rate parameters are set to ensure 퐿푠 = 퐿푛 = 100 ; the matrices are D1 = D2 with [D1]11 = [D1]12 = 0.5 and F1 = F2 with [F1]11 = [F1]22 = 0.9 and [F1]12 = [F1]21 = 0.1.

(푗) (2) (1) (2) For sampling, we fix the target values 퐿푠 := 퐿푠 = 퐿푠 and 퐿푛 := 퐿푛 = 퐿푛 . We assume that the prior parameters are uniform, i.e. c = 푐1, c푗 = 푐푗1, b = 푏1, and b푗 = 푏푗1, for 푗 = 1, 2, where 1 denotes a vector with all elements equal to 1 of a respective (not always equal) dimension. We set the parameters 푏 and 푏푗 to ensure the chosen values of the document lengths 퐿푠 and 퐿푛, i.e. 푏 = 퐾푐/퐿푠 and 푏푗 = 퐾푗푐푗/퐿푛 (see below the values of 푐 and 푐푗). We sample :

(a) 2D Data, where the dimensions are 푀1 = 푀2 = 퐾1 = 퐾2 = 2 and 퐾 = 1, the matrices are D1 = D2 with [D1]11 = [D1]12 = 0.5 and F1 = F2 with [F1]11 = [F1]22 = 0.9 and [F1]12 = [F1]21 = 0.1, the scale parameters of the sources are 푐 = 푐1 = 푐2 = 0.1, and the rate parameters of the sources are set to ensure the document lengths 퐿푠 = 퐿푛 = 100 (see Figure 4-2).

(b) 20D Data, where the dimensions are 푀1 = 푀2 = 퐾1 = 퐾2 = 20 and 퐾 = 10, each column of the matrices D푗 and F푗, for 푗 = 1, 2, is sampled from the symmetric Dirichlet distribution with the concentration parameter equal to 0.5, the scale parameters of the sources are 푐 = 0.3, and 푐1 = 푐2 = 0.1, and the rate parameters of the sources are set to ensure the document lengths 퐿푠 = 퐿푛 = 1, 000.

For each experiment, D푗 and F푗, for 푗 = 1, 2, are sampled once and, then, the stacked observations x푛, for 푛 ∈ [푁] are sampled for different sample sizes 푁 = 500 ; 1, 000 ; 2, 000 ; 5, 000 ; and 10, 000. For each 푁, 5 samples are drawn.

Metric. The evaluation is performed on a matrix D obtained by stacking D1 and D2

109 0.5 0.5 NMF NMF NMF° NMF° 0.4 DICA 0.4 DICA DCCA DCCA 0.3 DCCAg 0.3 DCCAg -error -error 1 1

ℓ 0.2 ℓ 0.2

0.1 0.1

0 0 0 2000 4000 6000 8000 10000 0 2000 4000 6000 8000 10000 Number of samples Number of samples

(a) 2D Data : the dimensions 푀1 = 푀2 = (b) 20D Data : the dimensions 푀1 = 퐾1 = 퐾2 = 2, and 퐾 = 1, the scale pa- 푀2 = 퐾1 = 퐾2 = 20, 퐾 = 10, the scale rameters 푐 = 푐1 = 푐2 = 0.1, and the do- parameters 푐 = 0.3, 푐1 = 푐2 = 0.1, and cument lengths 퐿푠 = 퐿푛 = 100. the document lengths 퐿푠 = 퐿푛 = 1, 000.

Figure 4-3 – Synthetic experiment with count data.

8 vertically. As in Section 3.5.2, we use as evaluation metric the normalized ℓ1-error between a recovered matrix D̂︀ and the true matrix D with the best permutation of 1 ∑︀ columns err1(D̂︀ , D) := min휋∈PERM 2퐾 푘 ‖d̂︀휋푘 − d푘‖1 ∈ [0, 1]. The minimization is over the possible permutations 휋 ∈ PERM of the columns of D̂︀ and can be efficiently obtained with the Hungarian algorithm for bipartite matching [Kuhn, 1955]. The (normalized) ℓ1-error takes the values in [0, 1] and smaller values of this error indicate better performance of an algorithm. Algorithms. We compare DCCA (implementation with the S- and 풯 -cumulants) and DCCAg (implementation with the generalized S-covariance matrices and the processing points initialized as described in Section 4.5) to DICA and the non-negative matrix factorization (NMF) algorithm with multiplicative updates for divergence [Lee and Seung, 2001]. To run DICA or NMF, we use the stacking trick (4.4). DCCA is set to estimate 퐾 components. DICA is set to estimate either 퐾0 = 퐾 + 퐾1 + 퐾2 or 푀 = 푀1 +푀2 components (whichever is the smallest, since DICA cannot work in the overcomplete case). NMF is always set to estimate 퐾0 components. For the evaluation ∘ of DICA/NMF, the 퐾 columns with the smallest ℓ1-error are chosen. NMF stands for NMF initialized with a matrix D of the form (4.4) with induced zeros ; otherwise NMF is initialized with (uniformly) random non-negative matrices. The running times are discussed in Section 4.6.3. Synthetic Experiment. We first perform an experiment with discrete synthetic 2D Data (see Figure 4-3a) and then repeat the same experiment when the size of the problem is 10 times larger (see Figure 4-3b). In practice, we observed that for 퐾 < 푀 all models work approximately equally well, except for NMF which breaks down in high dimensions. In the overcomplete case as in Figure 4-3, DCCA works

8. Note that the column order of matrices D1 and D2 is preserved (see the comment after Theorem 4.3.1).

110 1 0.001 0.8 0.01 0.1 0.5 0.6 1 def -error 1

ℓ 0.4

0.2

0 0 1000 2000 3000 Number of samples

Figure 4-4 – An experimental analysis of the performance of DCCAg with genera- lized covariance matrices using different parameters 훿푗 for the processing points. The numbers in the legend correspond to the values of 훿 defining 훿푗 via (4.47). The default value (def) is 훿 = 0.1. The experiment is performed on 20D Data (see description in the text). better. Sensitivity of the Generalized Covariance Matrices to the Choice of the Processing Points. We experimentally analyze the performance of the DCCAg al- gorithm based on the generalized S-covariance matrices vs. the parameters 훿1 and 훿2. We use the experimental setup of the synthetic count data from this section with 퐾1 = 퐾2 = 퐾 = 10, i.e. 20D Data. The results are presented in Figure 4-4.

4.6.2 Synthetic Continuous Data This experiment is essentially a continuous analogue to the synthetic experiment with the discrete data from Section 4.6.1. Synthetic Data. We sample synthetic data from the linear non-Gaussian CCA (NCCA) model : 훼 ∼푧훼Gamma(c, b),

훽푗 ∼푧훽푗 Gamma(c푗, b푗), 푗 = 1, 2, (4.51) (푗) (푗) x = D푗훼 + F푗훽 , 푗 = 1, 2,

where 푧훼 and 푧훽푗 , for 푗 = 1, 2, are Rademacher random variables (i.e., they take the values −1 or 1 with the equal probabilities). As in the count data case (see

Section 4.6.1), 훼 stands for the common sources, and 훽푗 stands for the view-specific (noise) sources. Note that both vector sources are non-Gaussian. The rate parameters of the gamma distribution are initialized by analogy with the discrete case. The

111 1 1 CCA CCA JADE JADE gNCCA gNCCA Spec Spec 0.5 0.5 -error -error 1 1 ℓ ℓ

0 0 0 500 1000 1500 2000 0 500 1000 1500 2000 Number of samples Number of samples

(a) The number of factors : 퐾1 = 퐾2 = (b) The number of factors : 퐾1 = 퐾2 = 퐾 = 1. 퐾 = 10.

Figure 4-5 – Synthetic experiment with continuous data. For both experiments, the parameters are 푀1 = 푀2 = 20, 푐 = 푐1 = 푐2 = 0.1 and 퐿푛 = 퐿푠 = 1000. The data are synthetic continuous (see the description in Section 4.6.2).

elements of the matrices D푗 and F푗, for 푗 = 1, 2, are sampled independently from the uniform distribution in [−1, 1]. Each column of D푗 and F푗, for 푗 = 1, 2, is normalized to have unit ℓ1-norm. Algorithms. We compare gNCCA (the implementation of NCCA with the genera- lized S-covariance matrices with the default values of the parameters 훿1 and 훿2 as described in Section 4.5), the ED-based algorithm for NCCA (also with the genera- lized S-covariance matrices), the JADE algorithm 9 [Cardoso and Souloumiac, 1993] for independent component analysis (ICA), and classical CCA. Synthetic Experiment. In Figure 4-5, the results of the experiment for the different number of topics are presented. The error of the classical CCA is high due to the mentioned unidentifiability issues.

4.6.3 Real Data Experiment – Translation Topics Real Data (Translation). Following Vinokourov et al.[2002], we illustrate the performance of DCCA by extracting bilingual topics from the Hansard collection [Vinokourov and Girolami, 2002] with aligned English and French proceedings of the 36-th Canadian Parliament. In Tables 4.1–4.5 on pages 115–117, we present the topics extracted after running DCCA with 퐾 = 20. For the real data experiment, we estimate the factor loading matrices (topics, in the following) D1 and D2 of aligned proceedings of the 36-th Canadian Parliament in English and French languages. 10 Although going into details of natural language processing (NLP) related problems

9. The code is available at http://perso.telecom-paristech.fr/ cardoso/Algo/Jade/jadeR.m. 10. The data are available at http://www.isi.edu/natural-language/download/hansard/.

112 is not the goal of this thesis, we do minor preprocessing (see below) of this text data to improve the presentation of the estimated bilingual topics D1 and D2. The 20 topics obtained with DCCA are presented in Tables 4.1–4.5 pages 115–117. For each topic, we display the 20 most frequent words (ordered from top to bottom in the decreasing order). Most of the topics have quite clear interpretation. Moreover, we can often observe the pairs of words which are each others translations in the topics. For example, - in the topic 10 : the phrase “pension plan” can be translated as “régime de retraite,” the word “benefits” as “prestations,” and abbreviations “CPP” and “RPC” stand for “Canada Pension Plan” and “Régime de pensions du Canada,” respectively ; - in the topic 3 : “OTAN” is the French abbreviation for “NATO,” the word “war” is translated as “guerre,” and the word “peace” as “paix ;” - in the topic 9 : “Nisga” is the name of an Indigenous (or “aboriginal”) people in British Columbia, the word “aboriginal” translates to French as “autochtontes,” and, e.g., the word “right” can be translated as “droit.” Note also that, e.g., in the topic 10, although the French words “ans” and “années” are present in the French topic, their English translation “year” is not, since it was removed as one of the 15 most frequent words in English (see below).

Data Preprocessing For the experiment, we use House Debate Training Set of the Hansard collection. To preprocess this text data, we perform case conversion, stemming, and removal of some stop words. For stemming, we use the SnowballStemmer of the NLTK toolbox [Bird et al., 2009] for both English and French languages. Although this stemmer has some problems (such as mapping several different forms of a word to a single stem in one language but not in the other), they are left beyond our consideration. Moreover, in addition to the standard stop words of the NLTK toolbox, we also removed the following words that we consider to be stop words for our task 11 (and their possible forms) : - from English : ask, become, believe, can, could, come, cost, cut, do, done, follow, get, give, go, know, let, like, listen, live, look, lost, make, may, met, move, must, need, put, say, see, show, take, think, talk, use, want, will, also, another, back, day, certain, certainly, even, final, finally, first, future, general, good, high, just, last, long, major, many, new, next, now, one, point, since, thing, time, today, way, well, without ; - from French (translations in brackets) : demander (ask), doit (must), deve- nir (become), dit (speak, talk), devoir (have to), donner (give), ila (he has), met (put), parler (speak, talk), penser (think), pourrait (could), pouvoir (can),

11. This list of words was obtained by looking at words that appear in the top-20 words of a large number of topics in a first experiment. Removing these words did not change much the content of the topics, but made them more interpretable.

113 prendre (take), savoir (know), aller (go), voir (see), vouloir (want), actuelle- ment, après (after), aujourd’hui (today), autres (other), bien (good), beaucoup (a lot), besoin (need), cas (case), cause, cela (it), certain, chose (thing), déjà (already), dernier (last), égal (equal), entre (between), façon (way), grand (big), jour (day), lorsque (when), neuf (new), passé (past), plus, point, présent, prêts (ready), prochain (next), quelque (some), suivant (next), unique. After stemming and removing stop words, several files had different number of docu- ments in each language and had to be removed too. The numbers of these files are : 16, 36, 49 55, 88, 103, 110, 114, 123, 155, 159, 204, 229, 240, 2-17, 2-35. We also removed the 15 most frequent words from each language. These include : - in English : Mr, govern, member, speaker, minist(er), Hon, Canadian, Canada, bill, hous(e), peopl(e), year, act, motion, question ; - in French : gouvern(er), président, loi, déput(é), ministr(e), canadien, Canada, projet, Monsieur, question, part(y), chambr(e), premi(er), motion, Hon. Removing these words is not necessary, but improves the presentation of the learned topics significantly. Indeed, the most frequent words tend to appear in nearly every topic (often in pairs in both languages as translations of each other, e.g., “member” and “député” or “Canada” in both languages, which confirms one more time the correctness of our algorithm).

(1) Finally, we select 푀1 = 푀2 = 5, 000 words for each language to form matrices X and X(2) each containing 푁 = 11, 969 documents in columns. As stemming removes the words endings, we map the stemmed words to the respective most frequent original words when showing off the topics in Tables 4.1-4.5.

Running Time For the real data experiment, the runtime of DCCA algorithm is 24 seconds in- cluding 22 seconds for SVD at the whitening step. In general, the computational complexity of the D/N/MCCA algorithms is bounded by the time of SVD plus 푂(푅푁퐾) + 푂(푁퐾2), where 푅 is the largest number of non-zero components in the stacked vector x = [x(1); x(2)], plus the time of NOJD for 푃 target matrices of the 퐾-by-퐾 size. In practice, DCCAg is faster than DCCA.

4.7 Conclusion We have proposed the first identifiable versions of CCA, together with moment mat- ching algorithms which allow the identification of the loading matrices in a semi- parametric framework, where no assumptions are made regarding the distribution of the source or the noise. We also introduced new sets of moments (our generalized covariance matrices), which could prove useful in other settings.

114 farmers agriculteurs division no nato otan tax impôts agriculture programme negatived vote kosovo kosovo budget budget program agricole paired rejetée forces militaires billion enfants farm pays declare voix military guerre families économie country important yeas mise war international income années support problème divided pairs troops pays country dollars industry aide nays porte country réfugiés debt pays trade agriculture vote contre world situation students finances provinces années order déclaration national paix children familles work secteur deputy suppléant peace yougoslavie money fiscal problem provinces thibeault vice international milosevic finance milliards issue gens mcclelland lethbridge conflict forces education libéraux us économie ms poisson milosevic serbes liberal jeunes tax industrie oversee mme debate intervention fund gens world dollars rise plantes support troupes care important help mesure past harvey action humanitaire poverty revenu federal faut army perdront refugees nations jobs mesure producers situation peterson sciences ground conflit benefits argent national réformiste heed liberté happen ethnique child santé business accord moral prière issue monde pay payer

Table 4.1 – The real data (translation) experiment. Topics 1 to 4.

work travail justice jeunes business entreprises board commission workers négociations young justice small petites wheat blé strike travailleurs crime victimes loans programme farmers agriculteurs legislation grève offenders systéme program banques grain administration union emploi victims crime bank finances producers producteurs agreement droit system mesure money important amendment grain labour syndicat legislation criminel finance économie market conseil right services sentence contrevenants access secteur directors ouest services accord youth peine jobs argent western amendement negotiations voix criminal ans economy emplois election comité chairman adopter court juge industry assurance support réformiste public réglement issue enfants financial financiére party propos party article law important billion appuyer farm important employees retour community gens support créer agriculture compte collective gens right tribunaux ovid choc clause prix agreed conseil reform droit merger accés ottawa no board collectivités country problème information milliards us dispositions arbitration postes problem réformiste size propos vote information grain grain person traité korea pme cwb mesure order trésor support faut companies obtenir states produits

Table 4.2 – The real data (translation) experiment. Topics 5 to 8.

115 nisga nisga pension régime newfoundland terre health santé treaty autochtones plan pensions amendment droit research recherche aboriginal traité fund cotisations school modifications care fédéral agreement accord benefits prestations education provinces federal provinces right droit public retraite right école provinces soins land nations investment emploi constitution comité budget budget reserve britannique money assurance provinces éducation billion dollars national indiennes contribution investissement committee enseignement social systéme british terre cpp fonds system systéme money finances columbia colombie retirement années reform enfants tax transfert indian réserves pay ans minority vote system milliards court non billion argent denominational amendement provincial domaine party affaires change important referendum constitution fund sociale law négociations liberal administration children religieux country années native bande legislation dollars quebec référendum quebec maladie non réformiste board propos parents article transfer important constitution constitution employment milliards students réformiste debt programme development application tax gens change québec liberal libéraux reform user rate taux party constitutionnelle services environnement legislation gestion amendment rpc labrador confessionnelles issue assurance

Table 4.3 – The real data (translation) experiment. Topics 9 to 12.

party pays tax agence quebec québec court pêches country politique provinces provinces federal québécois right droit issue important agency revenu information fédéral fisheries juge us comité federal impôts provinces provinces decision cours debate libéraux revenue fiscal protection protection fish gens liberal réformiste taxpayers fédéral right renseignements issue décision committee gens equalization contribuables legislation droit law important work débat system payer provincial personnel work pays order accord services taxe person privé us traité support démocratique accountability péréquation law protéger party conservateur reform québécois amendment argent constitution électronique debate région election réglement billion services privacy article justice problème world propos money fonction country commerce problem supréme quebec collégue party modifier electronic provinciaux community tribunaux standing parlementaire provincial article court bloc supreme faut national appuyer public ministére bloc vie country situation interest opposition business administration students application area victimes important élections reform déclaration section citoyens case appuyer right bloc office tps clear non order mesure public industrie support provinciaux states nationale parliament trouve

Table 4.4 – The real data (translation) experiment. Topics 13 to 16.

116 legislation important national important vote voix water eau issue environnement area gens yeas no trade ressources amendment mesure parks environnement division adopter resources accord committee enfants work parcs nays vote country environnement support comité country pays agreed non agreement important protection propos us marine deputy contre provinces industrie information pays development mesure paired dépenses industry américains industry appuyer support propos responsible accord protection pays concerned protection community fédéral treasury conseil export provinces right article federal jeunes divided budget environmental exportations important droit issue appuyer order crédit us échange change accord legislation années fiscal trésor freshwater conservateur world gens help assurance amount oui federal responsabilité law amendement liberal gestion pleased mise world effet families adopter world conservateur budget propos issue quantité work industrie responsible accord ms porte legislation traité children non concerned région infrastructure lib environment commerce order société committee problème board pairs responsible unis national porte problem nationale consent veuillent development économie states no important québec estimates vice culture alena

Table 4.5 – The real data (translation) experiment. Topics 17 to 20.


Chapter 5

Conclusion and Future Work

5.1 Algorithms for the CP Decomposition
In Sections 2.2.2, 2.2.3, 3.3.2, and 4.4.1, we saw that the population higher-order statistics of many latent linear models admit a representation in the form of a (often symmetric, sometimes also non-negative) CP decomposition with the linear transformation matrix as the factor matrix. This naturally poses the question of an optimal algorithm for the CPD computation. Importantly, the computation of the symmetric, rather than non-symmetric, CPD is the more challenging task.

The Symmetric CPD
Orthogonal Approaches. In the literature, numerous approaches have been proposed for this problem. One approach—the approach chosen in this thesis—is based on a prewhitening step, which leads to the problem of (approximately) orthogonal symmetric CPD. The popularity of this approach can be explained by : (a) the relatively easy theoretical analysis due to the orthogonality and the availability of global solution guarantees in the idealized population case, and (b) the reduced dimension of the target tensor due to the prewhitening step, leading, in particular, to relatively low computational complexity. Several algorithms are readily available for the approximation of the orthogonal symmetric CPD (see Section 2.3, plus gradient-based algorithms); however, the choice of one or another algorithm is not obvious in practical applications, and more extensive theoretical and experimental analysis is of interest. We saw in Section 3.5 that the tensor power method does not always find accurate approximations of q푘 when applied to the estimation of latent linear models. A possible explanation is the error introduced at the prewhitening step, since a sample estimate of the matrix S퐷퐼퐶퐴 is not exactly in the diagonal form (4.21). This is a known problem of orthogonal diagonalization methods based on prewhitening [see, e.g., Cardoso, 1994b, Souloumiac, 2009b]. Another potential problem is the deflation procedure. It is well known that an error made at a deflation step propagates to all consecutive steps. Moreover, Kolda [2001] provides

an example where the best rank-one approximation of a cubic tensor is not a factor in the best rank-two approximation, which could potentially be an issue in the finite sample case, where the target tensor is not exactly orthogonally decomposable. These and related questions still require further investigation.
Non-Orthogonal Approaches. A well known issue of the prewhitening-based approach is the propagation of the whitening error [Cardoso, 1994b] in the non-idealized case of finite sample estimates of higher-order statistics. Therefore, multiple attempts to develop algorithms without prewhitening have been made. In the symmetric case, these algorithms are mostly based on the idea of non-orthogonal joint matrix diagonalization and include both gradient-based algorithms [Yeredor, 2002, Yeredor et al., 2004, Vollgraf and Obermayer, 2006] as well as multiplicative update-based methods [Ziehe et al., 2004, Afsari, 2006, Souloumiac, 2009a, Mesloub et al., 2012]. However, these approaches are more difficult than in the orthogonal case [Afsari, 2007]. Although some comparison of these algorithms among each other and with some orthogonal-type methods is available in the literature [Dégerine and Kane, 2007, Souloumiac, 2009b, Chabriel et al., 2014], a more comprehensive study (especially of gradient-based vs. multiplicative update-based approaches) is needed, especially in the context where these algorithms are applied to the problem of estimation in latent linear models.
As opposed to the orthogonal case, non-orthogonal joint diagonalization does not perform preliminary dimensionality reduction. This means that all computations have to be performed with the original non-reduced tensor (of size 푀, where 푀 is the number of words in the vocabulary and not the number of topics 퐾), which can often be significantly larger, especially in the undercomplete case with 퐾 ≪ 푀, making non-orthogonal approaches computationally unattractive. One way to address this problem is the so-called sketching technique [Wang et al., 2015, Keriven et al., 2016a,b], which is becoming more and more popular in the context of large-scale learning. Another approach is adapting stochastic optimization methods [e.g., similar to Vu et al., 2015, Ge et al., 2015].
Non-Negative Approaches. The estimation problem in many latent linear models actually reduces to the joint non-negative symmetric CPD of second- and third-order statistics, with emphasis on non-negativity. The approaches widely known in the literature, including the ones of this thesis, do not explicitly handle the non-negativity constraint. Explicit integration of this constraint would necessarily lead to an improved accuracy of the estimation, provided the respective algorithms can be solved efficiently. Indeed, we performed multiple experiments for different orthogonal-type diagonalization algorithms in the context of topic models (Chapter 3), where we computed the so-called Amari metric [Amari et al., 1996], which measures the quality of joint diagonalization type algorithms, before and after the heuristic truncation (of negative values) step which ensures the non-negativity (see Section 3.4). We saw a significant increase of the error after such a truncation step. A potential approach to resolving this issue is performing the non-negative symmetric joint CP decomposition of second- and third-order statistics. Since the low-rank tensor

approximation in the non-negative setting is well-posed [Lim and Comon, 2009], it makes sense to investigate this direction. In the non-symmetric non-negative case, numerous algorithms, often based on alternating minimization, are known [Krijnen and Ten Berge, 1991, Bro and Jong, 1997, Bro and Sidiropoulos, 1998, Welling and Weber, 2001, Hazan et al., 2005, Shashua and Hazan, 2005, Cichocki et al., 2009, Lim and Comon, 2009, Royer et al., 2011, Zhou et al., 2014, Kim et al., 2014]. Extensions and comparisons of these algorithms to the symmetric case are therefore of interest.
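For reference, the Amari metric mentioned above can be computed in a few lines; the following is a minimal sketch (our own helper, not the code used for the experiments in Chapter 3), assuming a ground-truth matrix and an estimate whose columns may be arbitrarily permuted and rescaled.

```python
import numpy as np

def amari_metric(D_true, D_est):
    """Amari metric of Amari et al. [1996] between two mixing/topic matrices.
    It is (close to) zero iff D_est equals D_true up to permutation and
    rescaling of the columns, and grows with the estimation error."""
    P = np.abs(np.linalg.pinv(D_est) @ D_true)   # ~ scaled permutation if D_est is good
    K = P.shape[0]
    row = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    col = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return (row.sum() + col.sum()) / (2.0 * K * (K - 1))

# sanity check: a column-permuted and rescaled copy of D scores ~0
rng = np.random.default_rng(0)
D = rng.random((20, 5))
D_perm = D[:, rng.permutation(5)] * rng.uniform(0.5, 2.0, size=5)
print(amari_metric(D, D_perm))               # ~0
print(amari_metric(D, rng.random((20, 5))))  # clearly larger
```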

The Non-Symmetric CPD
Gradient-Based Methods and Alternating Least Squares. Similar questions of theoretical and experimental comparison of CPD algorithms naturally arise in the non-symmetric case. An extremely widely used approach in practice is the one based on the low-rank approximation formulation and alternating least squares type methods [Comon et al., 2009a, Kolda and Bader, 2009]. A major drawback of all these algorithms is the existence of different CP degeneracies : bottlenecks, swamps, and the general ill-posedness of the problem, which do cause problems in practice [Comon et al., 2009a]. Hence, understanding whether joint diagonalization-based approaches are preferable in this case is another important question.
Simultaneous Schur Decomposition. In addition to the non-orthogonal joint diagonalization (NOJD) by similarity algorithms (see Section 2.3.2), simultaneous Schur decomposition is another important approach to the non-symmetric CPD approximation problem. New perturbation analysis results have recently become available for these methods [De Lathauwer et al., 2004, Colombo and Vlassis, 2016a,b,c]. The most notable results are the so-called a posteriori bounds on the quality of the approximation, which take into account an approximate solution (an output of the algorithm) rather than a global minimizer as in a priori bounds [Cardoso, 1994a]. A comparison of these algorithms to NOJD by similarity, as well as a potential extension of this perturbation analysis, is of interest.
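As an illustration of the alternating least squares approach discussed at the beginning of this subsection, here is a minimal sketch of plain ALS for a rank-푅 approximation of a third-order tensor; the helper names (unfold, khatri_rao, cp_als) are ours, and no attempt is made to handle the degeneracies (swamps, bottlenecks) mentioned above.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization of a third-order tensor (C-order flattening)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R) -> (I*J) x R."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(T, R, n_iter=500, seed=0):
    """Plain alternating least squares for a rank-R CP approximation of T."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((T.shape[i], R)) for i in range(3))
    for _ in range(n_iter):
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# toy usage: an exactly rank-3 random tensor is typically recovered well
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((d, 3)) for d in (15, 12, 10))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, 3)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # small residual
```

On the noisy, possibly degenerate problems discussed above, a practical implementation would add regularization, restarts, or fall back to the joint diagonalization approaches.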

Joint Diagonalization Algorithms (Orthogonal and non-orthogonal) joint diagonalization algorithms are of interest on their own, e.g., in the generalized covariance matrices framework. Note that the latter is an important area of research on its own and further results, e.g., on the sample complexity of the generalized covariance matrices or showing asymptotic equivalence (which is observed experimentally) of the higher-order cumulants-based algorithms with the generalized covariance matrices-based algorithms, are of interest. An important question to answer for joint diagonalization methods is the choice of the number of target matrices or contraction vectors. Indeed, it is experimentally observed that choosing the number of matrices of the order of the logarithm of the dimension is sufficient. However, no theoretical results in this respect are known. Moreover, the results on the convergence rate and global convergence properties of

joint diagonalization algorithms are very limited. It would be interesting, e.g., to perform an analysis similar to the one of [Colombo and Vlassis, 2016a,b,c] to better understand these properties.
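To fix notation for such empirical studies, the quantity one typically monitors is the off-diagonal mass of the rotated target matrices; below is a minimal sketch of this criterion (the function name is ours), which could be used, e.g., to study experimentally how the number of target matrices affects the solution.

```python
import numpy as np

def off_criterion(V, target_matrices):
    """Joint diagonalization objective: total squared off-diagonal mass of
    V^T M V over all target matrices M (zero iff V jointly diagonalizes them)."""
    total = 0.0
    for M in target_matrices:
        D = V.T @ M @ V
        total += np.sum(D ** 2) - np.sum(np.diag(D) ** 2)
    return total

# toy usage: exactly jointly diagonalizable targets M_p = Q diag(lam_p) Q^T
rng = np.random.default_rng(0)
K, P = 10, 5
Q, _ = np.linalg.qr(rng.standard_normal((K, K)))
mats = [Q @ np.diag(rng.standard_normal(K)) @ Q.T for _ in range(P)]
print(off_criterion(Q, mats))          # ~0 (machine precision)
print(off_criterion(np.eye(K), mats))  # clearly non-zero
```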

The Overcomplete Case
Estimation in overcomplete models is a much more difficult task, but it is nevertheless still possible in some special cases or under additional assumptions [Cardoso, 1991, Yeredor, 2002, De Lathauwer et al., 2007, Anandkumar et al., 2015c]. Extending the ideas presented in this thesis to the overcomplete case is another important direction for future research.

5.2 Inference for Semiparametric Models
The models introduced in Chapters 3 and 4 of this thesis are semiparametric and are supposed to adapt to the densities of the latent sources. In the signal processing literature, some methods for density approximation are known [see, e.g., Comon and Jutten, 2010] and can be directly applied for inference in the introduced semiparametric models. Therefore, developing inference methods for the discrete ICA and the discrete, non-Gaussian, and mixed CCA models is another interesting direction for future research.

Appendix A

Notation

In Appendix A, we outline the list of probability distributions which are widely used in this thesis.

A.1 The List of Probability Distributions

In this section, we recall the probability distributions used in this thesis.
Multivariate Gaussian. The probability density function of the multivariate Gaussian distribution with the mean 휇 and the covariance matrix Σ of an ℝ^푀-valued continuous random variable x is :

𝒩(x | 휇, Σ) = (1 / √((2휋)^푀 |Σ|)) exp( −(1/2) (x − 휇)^⊤ Σ^{−1} (x − 휇) ),   (A.1)

where the covariance matrix Σ is strictly positive definite and |Σ| is its determinant. We write x ∼ 𝒩(휇, Σ) to denote that x is a Gaussian random variable with the mean 휇 and the covariance matrix Σ.
Dirichlet. The probability density function of the Dirichlet distribution is :

푝(휃 | c) := ( Γ(푐₀) / ∏_{푘=1}^{퐾} Γ(푐푘) ) ∏_{푘=1}^{퐾} 휃푘^{푐푘−1} I(휃 ∈ Δ_퐾),   (A.2)

where the parameter c ∈ ℝ^퐾_{++}, the parameter 푐₀ := ∑_{푘=1}^{퐾} 푐푘, Γ(·) is the gamma function,¹ and I(·) is the indicator function (it is 1 if the condition in the brackets is true and 0 otherwise). The Dirichlet distribution is supported on the (퐾 − 1)-simplex. We write 휃 ∼ Dirichlet(c) to denote that 휃 is a Dirichlet variable with the parameter c.

1. For complex numbers with a positive real part, the gamma function is defined via the convergent improper integral Γ(푡) = ∫₀^∞ 푥^{푡−1} 푒^{−푥} 푑푥.

When all elements of c are the same, 푐₁ = 푐₂ = ··· = 푐퐾, the Dirichlet distribution is symmetric. When c = 1, the Dirichlet distribution is equivalent to the uniform distribution on the (퐾 − 1)-simplex. When every 푐푘 > 1 (or 푐₀ > 퐾), the mode of the density function is somewhere in the middle of the simplex; therefore, most or all elements in a sampled vector 휃 are likely to be significantly larger than zero. When every 푐푘 < 1 (or 푐₀ < 퐾), the mode of the density function is near the vertices of the simplex; therefore, only a few elements in a sampled vector 휃 are likely to be significantly larger than zero. In the limit 푐₀ → 0, only one element of a sampled vector is significantly larger than zero (and nearly equal to one). It is sometimes useful to write the Dirichlet parameter c as a product c = 푐n, where 푐 is called the concentration parameter and n is the base measure. See also Frigyik et al. [2010].
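This behaviour is easy to see by sampling; the following short sketch counts, for a few concentration values, how many coordinates of a Dirichlet draw carry non-negligible mass (the 0.05 threshold is an arbitrary choice for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10
for c in [5.0, 1.0, 0.1]:
    theta = rng.dirichlet(np.full(K, c), size=1000)
    # average number of coordinates carrying non-negligible mass per sample
    print(c, (theta > 0.05).sum(axis=1).mean())
```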

Multinomial and Discrete. The multinomial distribution models 퐿 independent rolls of an 푀-sided die. A multinomial random vector x, which takes non-negative integer values summing to 퐿, i.e., ∑_{푚=1}^{푀} 푥푚 = 퐿, has the following probability mass function :

푝(x | 퐿, y) = ( 퐿! / ∏_{푚=1}^{푀} 푥푚! ) ∏_{푚=1}^{푀} 푦푚^{푥푚},   (A.3)

where the parameter y ∈ Δ푀. We write x ∼ Mult(퐿; y) to denote that x is a multinomial variable for 퐿 trials with the parameter y. When 퐿 = 1, the multinomial distribution is equivalent to the discrete distribution :

푝(x | y) = ∏_{푚=1}^{푀} 푦푚^{푥푚},   (A.4)

and we denote x ∼ Mult(1, y). For such a vector x, which has exactly one element equal to 1 and all other elements equal to zero, we say that it is represented using one-hot encoding.

Poisson. The probability mass function of the Poisson distribution with the parameter 휆 of a random variable 푥, which takes non-negative integer values, is

푝(푥 | 휆) = 휆^푥 푒^{−휆} / 푥!,   (A.5)

where 휆 ∈ ℝ_{++}. We write 푥 ∼ Poisson(휆) to denote that 푥 is a Poisson random variable with the parameter 휆. We also write x ∼ Poisson(휆) to denote that each element 푥푚 of x is a Poisson random variable with the parameter 휆푚. In Chapter 3, we use the fact that all cumulants of a Poisson random variable are equal to each other and equal to the parameter 휆 (see also Section 2.2).

Gamma. The density function of the gamma distribution with the shape 푐 and rate 푏 parameters of an ℝ_{++}-valued random variable 푥 is

푝(푥 | 푐, 푏) = ( 푏^푐 / Γ(푐) ) 푥^{푐−1} 푒^{−푏푥},   (A.6)

where the parameters 푐 ∈ ℝ_{++} and 푏 ∈ ℝ_{++}. We write 푥 ∼ Gamma(푐, 푏) to denote that 푥 is a gamma random variable with the shape and rate parameters 푐 and 푏, respectively. We write x ∼ Gamma(c, b) to denote that each element 푥푚 of x is a gamma random variable with the shape and rate parameters 푐푚 and 푏푚, respectively. Note that this definition is preferable to the one with the scale parameter (equal to 1/푏), since the density (A.6) leads to a convex optimization problem for maximum likelihood estimation, as opposed to the scale parametrization. Note that we mostly omit the dependence on parameters in the notation, e.g., instead of 푝(x|휃) we write 푝(x), and we use conditioning to show the dependence on latent variables only, e.g., 푝(x|z), where z is a latent variable.
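For concreteness, the following is a minimal sketch of sampling from the GP model with the shape–rate parametrization above (훼푘 ∼ Gamma(푐푘, 푏푘), y = D훼, 푥푚 ∼ Poisson(푦푚)); note that numpy parameterizes the gamma by shape and scale, hence the 1/b. The toy topic matrix D and parameter values are arbitrary illustrative choices.

```python
import numpy as np

def sample_gp(D, c, b, N, seed=0):
    """Draw N count vectors from the GP model: alpha_k ~ Gamma(c_k, rate b_k),
    y = D alpha, and x_m ~ Poisson(y_m) independently given y."""
    rng = np.random.default_rng(seed)
    # numpy's gamma uses shape and *scale*, hence scale = 1/b for rate b
    alpha = rng.gamma(shape=c, scale=1.0 / b, size=(N, len(c)))
    Y = alpha @ D.T                     # N x M matrix of Poisson intensities
    return rng.poisson(Y)               # N x M counts

# toy usage: K topics on the M-dimensional simplex
rng = np.random.default_rng(1)
M, K = 50, 3
D = rng.dirichlet(np.ones(M), size=K).T         # M x K, columns sum to one
X = sample_gp(D, c=np.full(K, 0.5), b=np.full(K, 0.1), N=5)
print(X.shape, X.sum(axis=1))                   # document lengths vary
```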

Appendix B

Discrete ICA

Appendix B is organized as follows :
- In Appendix B.1, we derive the symmetric canonical polyadic (a.k.a. diagonal) form of the population third-order cumulant of the DICA model.
- In Appendix B.2, we outline the proof of the sample complexity result for the DICA model (Proposition 3.3.1).

B.1 The Order-Three DICA Cumulant

In this section, we derive the order-3 DICA (and GP) cumulant. See Section 2.2 for the definition and properties of cumulants. We also use the property that all cumulants of a Poisson random variable with the parameter 휆 are equal to this parameter. Therefore, the order-3 cumulant of a univariate Poisson random variable 푥푚 with the parameter 푦푚 is

E((푥푚 − E(푥푚))³ | 푦푚) = 푦푚.

By the independence property of cumulants, the order-3 cumulant of x | y is a diagonal tensor with the (푚₁, 푚₂, 푚₃)-th element equal to

cum(푥_{푚₁}, 푥_{푚₂}, 푥_{푚₃} | y) = 훿(푚₁, 푚₂, 푚₃) 푦_{푚₁},   (B.1)

where 훿 is the Kronecker delta. Recall that the law of total cumulance reads as

cum(푥_{푚₁}, 푥_{푚₂}, 푥_{푚₃}) = E[ cum(푥_{푚₁}, 푥_{푚₂}, 푥_{푚₃} | y) ] + cum[ E(푥_{푚₁} | y), E(푥_{푚₂} | y), E(푥_{푚₃} | y) ]
    + cov[ E(푥_{푚₁} | y), cov(푥_{푚₂}, 푥_{푚₃} | y) ] + cov[ E(푥_{푚₂} | y), cov(푥_{푚₁}, 푥_{푚₃} | y) ] + cov[ E(푥_{푚₃} | y), cov(푥_{푚₁}, 푥_{푚₂} | y) ].

Substituting the cumulant (B.1) of x | y into this law of total cumulance, we obtain

cum(푥_{푚₁}, 푥_{푚₂}, 푥_{푚₃}) = 훿(푚₁, 푚₂, 푚₃) E(푦_{푚₁}) + cum(푦_{푚₁}, 푦_{푚₂}, 푦_{푚₃})
    + 훿(푚₂, 푚₃) cov(푦_{푚₁}, 푦_{푚₂}) + 훿(푚₁, 푚₃) cov(푦_{푚₁}, 푦_{푚₂}) + 훿(푚₁, 푚₂) cov(푦_{푚₁}, 푦_{푚₃})
  = cum(푦_{푚₁}, 푦_{푚₂}, 푦_{푚₃}) − 2 훿(푚₁, 푚₂, 푚₃) E(푥_{푚₁})
    + 훿(푚₂, 푚₃) cov(푥_{푚₁}, 푥_{푚₂}) + 훿(푚₁, 푚₃) cov(푥_{푚₁}, 푥_{푚₂}) + 훿(푚₁, 푚₂) cov(푥_{푚₁}, 푥_{푚₃}),

where we used the previous result from (3.13) that cov(y, y) = cov(x, x)−Diag(E(x)).

Finally, by the multilinearity property for cum(푦푚1 , 푦푚2 , 푦푚3 ), we obtain

cum(푥_{푚₁}, 푥_{푚₂}, 푥_{푚₃}) = [ cum(훼, 훼, 훼) ×₁ D^⊤ ×₂ D^⊤ ×₃ D^⊤ ]_{푚₁푚₂푚₃}
    − 2 훿(푚₁, 푚₂, 푚₃) E(푥_{푚₁}) + 훿(푚₂, 푚₃) cov(푥_{푚₁}, 푥_{푚₂})
    + 훿(푚₁, 푚₃) cov(푥_{푚₁}, 푥_{푚₂}) + 훿(푚₁, 푚₂) cov(푥_{푚₁}, 푥_{푚₃}),   (B.2)

where, as before, we used the result from (3.13) that cov(y, y) = cov(x, x) − Diag(E(x)).
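The identity cov(y, y) = cov(x, x) − Diag(E(x)) used in the derivation only relies on x | y being Poisson, so it can be checked by simulation; below is a minimal Monte Carlo sketch (the gamma prior and the dimensions are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N = 5, 2, 200_000
D = rng.dirichlet(np.ones(M), size=K).T            # M x K toy topic matrix
alpha = rng.gamma(shape=2.0, scale=1.0, size=(N, K))
Y = alpha @ D.T                                    # Poisson intensities
X = rng.poisson(Y)                                 # observed counts

lhs = np.cov(Y, rowvar=False)                      # cov(y, y)
rhs = np.cov(X, rowvar=False) - np.diag(X.mean(axis=0))
print(np.abs(lhs - rhs).max())                     # small (Monte Carlo error)
```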

B.2 The Sketch of the Proof for Proposition 3.3.1

B.2.1 Expected Squared Error for the Sample Expectation

The sample expectation Ê(x) = 푁^{−1} ∑_{푛=1}^{푁} x푛 is an unbiased estimator of the expectation, and its squared error is¹ :

E( ‖Ê(x) − E(x)‖₂² ) = ∑_{푚=1}^{푀} E[ (Ê(푥푚) − E(푥푚))² ]
  = 푁^{−2} ∑_{푚=1}^{푀} [ E( ∑_{푛=1}^{푁} (푥_{푛푚} − E(푥푚))² ) + E( ∑_{푛=1}^{푁} ∑_{푛′=1, 푛′≠푛}^{푁} (푥_{푛푚} − E(푥푚))(푥_{푛′푚} − E(푥푚)) ) ]
  = 푁^{−1} ∑_{푚=1}^{푀} E[ (푥푚 − E(푥푚))² ] = 푁^{−1} ∑_{푚=1}^{푀} var(푥푚).

1. Note that these derivations are partially based on the lecture notes of the "Machine Learning" course taught at Saarland University in the winter semester 2011/2012 by Prof. M. Hein.

Further, by the law of total variance :

E( ‖Ê(x) − E(x)‖₂² ) = 푁^{−1} ∑_{푚=1}^{푀} [ E(var(푥푚 | y)) + var(E(푥푚 | y)) ]
  = 푁^{−1} ∑_{푚=1}^{푀} [ E(푦푚) + var(푦푚) ] = 푁^{−1} [ ∑_{푘=1}^{퐾} E(훼푘) + ∑_{푘=1}^{퐾} ⟨d푘, d푘⟩ var(훼푘) ],

using the fact that ∑_{푚=1}^{푀} 퐷_{푚푘} = 1 for any 푘.
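Before the model-specific step, the identity E‖Ê(x) − E(x)‖² = 푁^{−1} ∑_m var(푥푚) can be checked numerically; the following sketch does so for a plain Poisson vector with arbitrary toy intensities.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, trials = 10, 50, 20_000
lam = rng.uniform(1.0, 5.0, size=M)        # Poisson intensities: var(x_m) = lam_m

errs = np.empty(trials)
for t in range(trials):
    X = rng.poisson(lam, size=(N, M))
    errs[t] = np.sum((X.mean(axis=0) - lam) ** 2)

print(errs.mean(), lam.sum() / N)          # the two numbers roughly agree
```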

B.2.2 Expected Squared Error for the Sample Covariance

The following finite sample estimator of the covariance matrix, defined as cov(x, x) = E(xx^⊤) − E(x)E(x)^⊤,

ĉov(x, x) = (푁 − 1)^{−1} ∑_{푛=1}^{푁} x푛x푛^⊤ − 푁(푁 − 1)^{−1} Ê(x)Ê(x)^⊤
  = (푁 − 1)^{−1} ∑_{푛=1}^{푁} x푛x푛^⊤ − (푁(푁 − 1))^{−1} ∑_{푛′=1}^{푁} ∑_{푛″=1}^{푁} x_{푛′} x_{푛″}^⊤   (B.3)
  = 푁^{−1} ∑_{푛=1}^{푁} ( x푛x푛^⊤ − (푁 − 1)^{−1} ∑_{푛′=1, 푛′≠푛}^{푁} x푛 x_{푛′}^⊤ )

is unbiased, i.e., E(ĉov(x, x)) = cov(x, x). Its squared error is equal to

E( ‖ĉov(x, x) − cov(x, x)‖_F² ) = ∑_{푚=1}^{푀} ∑_{푚′=1}^{푀} E[ (ĉov(푥푚, 푥_{푚′}) − E[ĉov(푥푚, 푥_{푚′})])² ].

The (푚, 푚′)-th element of the sum above is equal to :

푁^{−2} ∑_{푛,푛′} cov( 푥_{푛푚}푥_{푛푚′} − (푁 − 1)^{−1} ∑_{푛″≠푛} 푥_{푛푚}푥_{푛″푚′}, 푥_{푛′푚}푥_{푛′푚′} − (푁 − 1)^{−1} ∑_{푛‴≠푛′} 푥_{푛′푚}푥_{푛‴푚′} )
  = 푁^{−2} ∑_{푛,푛′} cov(푥_{푛푚}푥_{푛푚′}, 푥_{푛′푚}푥_{푛′푚′})
    − 2푁^{−2}(푁 − 1)^{−1} ∑_{푛,푛′} ∑_{푛″≠푛} cov(푥_{푛푚}푥_{푛″푚′}, 푥_{푛′푚}푥_{푛′푚′})
    + 푁^{−2}(푁 − 1)^{−2} ∑_{푛,푛′} ∑_{푛″≠푛} ∑_{푛‴≠푛′} cov(푥_{푛푚}푥_{푛″푚′}, 푥_{푛′푚}푥_{푛‴푚′}) ;

this (푚, 푚′)-th element is further equal to :

푁 1 ∑︁ cov (푥 푥 ′ , 푥 푥 ′ ) 푁 2 푛푚 푛푚 푛푚 푛푚 푛=1 [︃ 2 ∑︁ ∑︁ ∑︁ ∑︁ − cov (푥 푥 ′′ ′ , 푥 푥 ′ ) + cov (푥 푥 ′ ′ , 푥 ′ 푥 ′ ′ ) 푁 2(푁 − 1) 푛푚 푛 푚 푛푚 푛푚 푛푚 푛 푚 푛 푚 푛 푚 푛 푛′′ 푛 푛′ ∑︁ ∑︁ ∑︁ ∑︁ ∑︁ ∑︁ + cov (푥푛푚푥푛′′푚′ , 푥푛푚푥푛′′′푚′ ) + cov (푥푛푚푥푛′′푚′ , 푥푛′푚푥푛푚′ ) 푛 푛′′ 푛′′′ 푛′ 푛 푛′′ ]︃ ∑︁ ∑︁ ∑︁ ∑︁ ∑︁ ∑︁ + cov (푥푛푚푥푛′푚′ , 푥푛′푚푥푛′′′푚′ ) + cov (푥푛푚푥푛′′푚′ , 푥푛′푚푥푛′′푚′ ) , 푛′ 푛 푛′′′ 푛′ 푛 푛′′

where the summations of the form ∑_{푛} ∑_{푛′} denote ∑_{푛=1}^{푁} ∑_{푛′=1, 푛′≠푛}^{푁}, and the summations of the form ∑_{푛} ∑_{푛′} ∑_{푛″} denote ∑_{푛=1}^{푁} ∑_{푛′=1, 푛′≠푛}^{푁} ∑_{푛″=1, 푛″≠푛′, 푛″≠푛}^{푁}. We also used the mutual independence of the observations x푛 in a sample {x푛}_{푛=1}^{푁} to conclude that the covariance between two expressions involving only independent variables is zero. Substituting these elements back into the sum, we get :

(︀ 2 )︀ 1 ∑︁ (︀ 2 2 2)︀ ‖cov(x, x) − cov(x, x)‖ = 푁 (푥 푥 ′ ) − [ (푥 푥 ′ )] E ̂︁ 퐹 푁 2 E 푚 푚 E 푚 푚 푚,푚′

4 ∑︁ (︀ 2 )︀ − 푁(푁 − 1) (푥 푥 ′ ) (푥 ′ ) − (푥 푥 ′ ) (푥 ) (푥 ′ ) 푁 2(푁 − 1) E 푚 푚 E 푚 E 푚 푚 E 푚 E 푚 푚,푚′

2 ∑︁ (︀ 2 2 2 2)︀ + 푁(푁 − 1)(푁 − 2) (푥 )[ (푥 ′ )] − [ (푥 )] [ (푥 ′ )] 푁 2(푁 − 1)2 E 푚 E 푚 E 푚 E 푚 푚,푚′ 2 ∑︁ (︀ 2 2)︀ + 푁(푁 − 1)(푁 − 2) (푥 푥 ′ ) (푥 ) (푥 ′ ) − [ (푥 )] [ (푥 ′ )] 푁 2(푁 − 1)2 E 푚 푚 E 푚 E 푚 E 푚 E 푚 푚,푚′ + 푂 (︀푁 −2)︀ ,

where some terms with 푁 are kept explicit to emphasize that we used the fact that each document is independent and identically distributed. The summations of the form ∑_{푚,푚′} denote ∑_{푚=1}^{푀} ∑_{푚′=1}^{푀}. Subsequent simplification gives :

(︀ 2 )︀ 1 ∑︁ [︀ 2 ]︀ ‖cov(x, x) − cov(x, x)‖ = var(푥 푥 ′ ) + 2 [ (푥 )] var(푥 ′ ) E ̂︁ 퐹 푁 푚 푚 E 푚 푚 푚,푚′

1 ∑︁ (︀ −2)︀ + [2 (푥 ) (푥 ′ )cov(푥 , 푥 ′ ) − 4 (푥 )cov(푥 푥 ′ , 푥 ′ )] + 푂 푁 , 푁 E 푚 E 푚 푚 푚 E 푚 푚 푚 푚 푚,푚′ where in the last equality, by symmetry, the summation indexes 푚 and 푚′ can be exchanged. As 푥푚 ∼ Poisson(푦푚), by the law of total expectation and law of total covariance, it follows, for 푚 ̸= 푚′ (and using the auxiliary expressions from Sec-

tion B.2.4) :

2 2 2 [︀ 2 2 ]︀ 2 var(푥푚푥푚′ ) = E(푥푚푥푚′ ) − [E[푥푚푥푚′ ]] = E E(푥푚푥푚′ |y) − [E [E(푥푚푥푚′ |y)]] [︀ 2 2 2 2 ]︀ 2 = E 푦푚푦푚′ + 푦푚푦푚′ + 푦푚푦푚′ + 푦푚푦푚′ − [E(푦푚푦푚′ )] ,

2 2 2 2 2 2 [E(푥푚)] var(푥푚′ ) = [E(푦푚)] E(푦푚′ ) + [E(푦푚)] E(푦푚′ ) − [E(푦푚)] [E(푦푚′ )] , 2 2 E(푥푚)E(푥푚′ )cov(푥푚, 푥푚′ ) = E(푦푚푦푚′ )E(푦푚)E(푦푚′ ) − [E(푦푚)] [E(푦푚′ )] , [︀ 2 ]︀ E(푥푚)cov(푥푚푥푚′ , 푥푚′ ) = E(푦푚) E(푦푚푦푚′ ) + E(푦푚푦푚′ ) − E(푦푚푦푚′ )E(푦푚′ ) . Now, considering the 푚 = 푚′ case, we have :

2 4 [︀ 2 ]︀2 var(푥푚) = E[E(푥푚|y)] − E[E(푥푚|y)] [︀ 4 3 2 ]︀ [︀ [︀ 2 ]︀]︀2 = E 푦푚 + 6푦푚 + 7푦푚 + 푦푚 − E 푦푚 + 푦푚 , 2 [︀ 2 2]︀ E(푥푚)E(푥푚)cov(푥푚, 푥푚) = E(푦푚) E(푦푚) + E(푦푚) − [E(푦푚)] , 2 [︀ 3 2 [︀ 2 ]︀]︀ E(푥푚)cov(푥푚, 푥푚) = E(푦푚) E(푦푚) + 3E(푦푚) + E(푦푚) − E(푦푚) E(푦푚) + E(푦푚) .

∑︀퐾 Substitution of 푦푚 = 푘=1 퐷푚푘훼푘 gives the following

(︀ 2 )︀ −1 ∑︁ E ‖cov(̂︁ x, x) − cov(x, x)‖퐹 = 푁 ⟨d푘, d푘′ ⟩⟨d푘′′ , d푘′′′ ⟩풜푘푘′푘′′푘′′′ 푘,푘′,푘′′,푘′′′ −1 ∑︁ + 푁 [⟨d푘, d푘′ ⟩⟨d푘′′ , 1⟩ℬ푘푘′푘′′ + ⟨d푘 ∘ d푘′ , d푘′′ ⟩ℰ푘푘′푘′′ ] 푘,푘′,푘′′ −1 ∑︁ ∑︁ (︀ −2)︀ + 푁 [⟨d푘, 1⟩⟨d푘′ , 1⟩E(훼푘훼푘′ ) + ⟨d푘, d푘′ ⟩ℱ푘푘′ ] + ⟨d푘, 1⟩E(훼푘) + 푂 푁 , 푘,푘′ 푘 where 1 is the vector with all the elements equal to 1, the sign ∘ is the element-wise Hadamard product, and

풜푘푘′푘′′푘′′′ = E(훼푘훼푘′ 훼푘′′ 훼푘′′′ ) − E(훼푘훼푘′′ )E(훼푘′ 훼푘′′′ ) + 2E(훼푘)E(훼푘′ )E(훼푘′′ 훼푘′′′ ) − 2E(훼푘)E(훼푘′ )E(훼푘′′ )E(훼푘′′′ ) + 2E(훼푘훼푘′′ )E(훼푘′ )E(훼푘′′′ ) − 2E(훼푘)E(훼푘′ )E(훼푘′′ )E(훼푘′′′ ) − 4E(훼푘)E(훼푘′ 훼푘′′ 훼푘′′′ ) + 4E(훼푘)E(훼푘′ 훼푘′′ )E(훼푘′′′ ),

ℬ푘푘′푘′′ = 2E(훼푘훼푘′ 훼푘′′ ) + 2E(훼푘)E(훼푘′ )E(훼푘′′ ) − 4E(훼푘)E(훼푘′ 훼푘′′ ), ℰ푘푘′푘′′ = 4E(훼푘훼푘′ 훼푘′′ ) + 6E(훼푘)E(훼푘′ )E(훼푘′′ ) − 10E(훼푘훼푘′ )E(훼푘′′ ), ℱ푘푘′ = 6E(훼푘훼푘′ ) − 5E(훼푘)E(훼푘′ ), where we used the expressions from Section B.2.4.
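The unbiased covariance estimator (B.3) used throughout this subsection coincides with the usual (푁 − 1)-normalized sample covariance; the following small sketch verifies this numerically (a sanity check only, under arbitrary toy data).

```python
import numpy as np

def cov_unbiased(X):
    """Unbiased covariance estimator (B.3); X holds one observation per row."""
    N = X.shape[0]
    xbar = X.mean(axis=0)
    return (X.T @ X) / (N - 1) - N / (N - 1) * np.outer(xbar, xbar)

rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(7, 4)).astype(float)
print(np.allclose(cov_unbiased(X), np.cov(X, rowvar=False)))  # True
```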

B.2.3 Expected Squared Error of the Estimator Ŝ for the GP/DICA Cumulants

As the estimator 푆̂︀ (3.19) of 푆 (3.14) is unbiased, its expected squared error is

[︁ ]︁ [︂⃦ (︁ )︁⃦2 ]︂ 2 ⃦ ⃦ E ‖Ŝ︀ − S‖퐹 = E (cov(x, x) − cov(x, x)) + Diag[Ê︀(x)] − Diag [E(x)] ⃦ ̂︁ ⃦퐹 [︁ 2 ]︁ [︀ 2 ]︀ = E ‖Ê︀(x) − E(x)‖퐹 + E ‖cov(̂︁ x, x) − cov(x, x)‖퐹 푀 ∑︁ [︁(︁ )︁ ]︁ + 2 E Ê︀(푥푚) − E(푥푚) (cov(̂︁ 푥푚, 푥푚) − cov(푥푚, 푥푚)) . 푚=1 (B.4) As Ê︀(푥푚) and cov(̂︁ 푥푚, 푥푚) are unbiased, the 푚-th element of the last sum is equal to [︁ ]︁ cov Ê︀(푥푚), cov(̂︁ 푥푚, 푥푚) −2 ∑︁ [︀ 2 ]︀ −2 −1 ∑︁ = 푁 cov 푥푛푚, 푥푛′푚 − 푁 (푁 − 1) cov [푥푛푚, 푥푛′푚푥푛′′푚] 푛,푛′ 푛,푛′,푛′′̸=푛′ −2 ∑︁ [︀ 2 ]︀ −2 −1 ∑︁ (︀ −2)︀ = 푁 cov 푥푛푚, 푥푛푚 − 2푁 (푁 − 1) cov [푥푛푚, 푥푛′푚푥푛푚] + 푂 푁 푛 푛,푛′̸=푛 −1 3 −1 (︀ 2 3)︀ (︀ −2)︀ = 푁 E(푥푚) − 2푁 E(푥푚)E(푥푚) − [E(푥푚)] + 푂 푁 −1 3 −1 3 (︀ −2)︀ ≤ 푁 E(푥푚) + 2푁 [E(푥푚)] + 푂 푁 −1 [︀ 3 2 3]︀ (︀ −2)︀ = 푁 E(푦푚) + 3E(푦푚) + E(푦푚) + 2 [E(푦푚)] + 푂 푁 ,

where we neglected the negative term −E(푥푚²)E(푥푚) for the inequality, and the last equality follows from the expressions in Section B.2.4. Further, the fact that 푦푚 = ∑_{푘=1}^{퐾} 퐷_{푚푘} 훼푘 gives

∑_{푚=1}^{푀} cov[ Ê(푥푚), ĉov(푥푚, 푥푚) ] = 푁^{−1} ∑_{푘,푘′,푘″} ⟨d푘 ∘ d_{푘′}, d_{푘″}⟩ 풞_{푘푘′푘″}
    + 3푁^{−1} ∑_{푘,푘′} ⟨d푘, d_{푘′}⟩ E(훼푘훼_{푘′}) + 푁^{−1} ∑_{푘} ⟨d푘, 1⟩ E(훼푘) + 푂(푁^{−2}),

where ∘ denotes the element-wise Hadamard product and

풞_{푘푘′푘″} = E(훼푘훼_{푘′}훼_{푘″}) + 2E(훼푘)E(훼_{푘′})E(훼_{푘″}).

Plugging this and the expressions for E(‖Ê(x) − E(x)‖₂²) and E(‖ĉov(x, x) − cov(x, x)‖_F²) from Sections B.2.1 and B.2.2, respectively, into (B.4) gives

[︁ 2 ]︁ −1 ∑︁ −1 ∑︁ E ‖Ŝ︀ − S‖퐹 = 푁 ⟨d푘, d푘⟩var(훼푘) + 푁 E(훼푘) 푘 푘 −1 ∑︁ + 푁 ⟨d푘, d푘′ ⟩⟨d푘′′ , d푘′′′ ⟩풜푘푘′푘′′푘′′′ 푘,푘′,푘′′,푘′′′ −1 ∑︁ + 푁 [⟨d푘, d푘′ ⟩ℬ푘푘′푘′′ + 2⟨d푘 ∘ d푘′ , d푘′′ ⟩풞푘푘′푘′′ ] 푘,푘′,푘′′ −1 ∑︁ −1 ∑︁ (︀ −2)︀ + 푁 (1 + 6 ⟨d푘, d푘′ ⟩) E(훼푘훼푘′ ) + 2푁 E(훼푘) + 푂 푁 , 푘,푘′ 푘

where we used that, by the simplex constraint on the topics, ⟨d푘, 1⟩ = 1 for all 푘. To analyze this expression in more detail, let us now consider the GP model, i.e., 훼푘 ∼ Gamma(푐푘, 푏) :

4 3 2 3 2 ∑︁ 30푐0 + 23푐0 + 14푐0 + 8푐0 ∑︁ 6푐0 + 10푐0 + 4푐0 풜 ′ ′′ ′′′ ≤ , and ℬ ′ ′′ ≤ , 푘푘 푘 푘 푏4 푘푘 푘 푏3 푘,푘′,푘′′,푘′′′ 푘,푘′,푘′′ 3 2 3 2 ∑︁ 7푐0 + 6푐0 + 2푐0 ∑︁ 12푐0 + 10푐0 + 8푐0 풞 ′ ′′ ≤ , and ℰ ′ ′′ ≤ , 푘푘 푘 푏3 푘푘 푘 푏3 푘,푘′,푘′′ 푘,푘′,푘′′ 2 2 ∑︁ 2푐0 + 푐0 ∑︁ 2푐0 + 푐0 ℱ ′ ≤ and (훼 훼 ′ ) ≤ , 푘푘 푏2 E 푘 푘 푏2 푘,푘′ 푘,푘′ where we used the expressions from Section B.2.4, which gives

[︃ (︂ )︂2 [︂ 4 ]︂]︃ [︁ 2 ]︁ −1 2 푐0 푐0 푐0 푐0 E ‖Ŝ︀ − Ŝ︀‖퐹 ≤ 휈푁 max ‖d푘‖2 + + max⟨d푘, d푘′ ⟩ max , 푘 푏2 푏 푘,푘′ 푏4 푏4 [︂ [︂ 3 ]︂ (︂ )︂ [︂ 3 ]︂]︂ −1 푐0 푐0 푐0 푐0 + 휈푁 max⟨d푘, d푘′ ⟩ max , + max ⟨d푘 ∘ d푘′ , d푘′′ ⟩ max , 푘,푘′ 푏3 푏3 푘,푘′,푘′′ 푏3 푏3 [︂(︂ )︂ [︂ 2 ]︂]︂ −1 푐0 푐0 (︀ −2)︀ + 휈푁 1 + max⟨d푘, d푘′ ⟩ max , + 푂 푁 , 푘,푘′ 푏2 푏2

where 휈 ≤ 30 is a universal constant. As, by the Cauchy-Schwarz inequality,

max_{푘,푘′} ⟨d푘, d_{푘′}⟩ ≤ max_{푘} ‖d푘‖₂² =: ∆₁,
max_{푘,푘′,푘″} ⟨d푘 ∘ d_{푘′}, d_{푘″}⟩ ≤ max_{푘} ‖d푘‖_∞ ‖d푘‖₂² ≤ max_{푘} ‖d푘‖₂³ =: ∆₂.

133 2 (note that for the topics in the simplex, ∆2 ≤ ∆1 as well as ∆1 ≤ ∆1), it follows that

[︂ (︂ 2 3 )︂ 4 2 3 ]︂ [︁ 2 ]︁ −1 퐿 퐿 2 퐿 퐿 퐿 (︀ −2)︀ E ‖Ŝ︀ − S‖퐹 ≤ 휈푁 ∆1 + 2 + 퐿 + ∆1 3 + 2 + ∆2 2 + 푂 푁 푐¯0 푐¯0 푐¯0 푐¯0 푐¯0 −1 −3 [︀ 2 4 3 2 2 3 ]︀ (︀ −2)︀ ≤ 2휈푁 (¯푐0) ∆1퐿 +푐 ¯0∆1퐿 +푐 ¯0퐿 +푐 ¯0퐿 + 푂 푁 , where 푐¯0 = min(1, 푐0) ≤ 1 and, from Section 3.3, 푐0 = 푏퐿 where 퐿 is the expec- 3 ted document length. The second term 푐¯0∆1퐿 cannot be dominant as the system 3 2 2 3 2 4 푐¯0∆1퐿 > 푐¯0퐿 and 푐¯0∆1퐿 > ∆1퐿 is infeasible. Also, with the reasonable assump- 3 2 2 tion that 퐿 ≥ 1, we also have that the 4th term 푐¯0퐿 ≤ 푐¯0퐿 . Therefore,

E[ ‖Ŝ − S‖_F² ] ≤ 3휈푁^{−1} max( ∆₁²퐿⁴, 푐̄₀퐿² ) + 푂(푁^{−2}).

B.2.4 Auxiliary Expressions

As {푥푚}_{푚=1}^{푀} are conditionally independent given y in the DICA model (3.4), we have the following expressions, by the law of total expectation for 푚 ≠ 푚′ and the moments of the Poisson distribution with parameter 푦푚 :

E(푥푚) = E[E(푥푚 | 푦푚)] = E(푦푚),
E(푥푚²) = E[E(푥푚² | 푦푚)] = E(푦푚²) + E(푦푚),
E(푥푚³) = E[E(푥푚³ | 푦푚)] = E(푦푚³) + 3E(푦푚²) + E(푦푚),
E(푥푚⁴) = E[E(푥푚⁴ | 푦푚)] = E(푦푚⁴) + 6E(푦푚³) + 7E(푦푚²) + E(푦푚),
E(푥푚푥_{푚′}) = E[E(푥푚푥_{푚′} | y)] = E[E(푥푚 | 푦푚)E(푥_{푚′} | 푦_{푚′})] = E(푦푚푦_{푚′}),
E(푥푚²푥_{푚′}) = E[E(푥푚²푥_{푚′} | y)] = E[E(푥푚² | 푦푚)E(푥_{푚′} | 푦_{푚′})] = E(푦푚²푦_{푚′}) + E(푦푚푦_{푚′}),
E(푥푚²푥_{푚′}²) = E[E(푥푚² | 푦푚)E(푥_{푚′}² | 푦_{푚′})] = E(푦푚²푦_{푚′}²) + E(푦푚²푦_{푚′}) + E(푦푚푦_{푚′}²) + E(푦푚푦_{푚′}).

Moreover, the moments of 훼푘 ∼ Gamma(푐푘, 푏) are

E(훼푘) = 푐푘/푏,   E(훼푘²) = (푐푘² + 푐푘)/푏²,   E(훼푘³) = (푐푘³ + 3푐푘² + 2푐푘)/푏³,
E(훼푘⁴) = (푐푘⁴ + 6푐푘³ + 11푐푘² + 6푐푘)/푏⁴,   etc.

Appendix C

Implementation

Appendix C is organized as follows :
- In Appendix C.1, we derive expressions for a scalable and memory-efficient implementation of the LDA moments (see Section 2.2.3) and the DICA cumulants (see Section 3.3.2).
- In Appendix C.2, we derive expressions for a scalable and memory-efficient implementation of the DCCA cumulants (see Section 4.4.1).

C.1 Implementation of Finite Sample Estimators

Since both population statistics 풯_LDA and 풯_DICA are tensors in the CP form with D as the CP factors, the same algorithms are used for the estimation of the topic matrix D from finite sample estimators of these tensors. Clearly, working with tensors directly is prohibitive due to computation and storage issues. Most if not all algorithms, however, do not require the construction of these tensors. Instead, matrices of the form Ŵ 풯̂(v) Ŵ^⊤ are of interest. In practice, these matrices can be computed from a finite sample directly (avoiding the construction of tensors) in 푂(푀_s푁퐾) + 푂(푁퐾²) + 푂(푁퐾) flops, where 푀_s is the largest number of non-zero counts (unique words) in a document over the corpus : 푀_s = max_{푛=1,2,...,푁} ‖x푛‖₀, where ‖·‖₀ counts the number of non-zero elements of a vector. In this appendix, we derive the formulas for the computation of Ŵ 풯̂(v) Ŵ^⊤, for both the LDA and DICA tensors, in the claimed number of flops. The derivations are straightforward, but quite tedious.
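To make the complexity claim concrete, the following minimal sketch shows how the products involving the count matrix X are formed from a sparse representation, so that the cost scales with the number of non-zero counts rather than with 푀푁; the matrix sizes and density are arbitrary toy choices, not those of the experiments.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
M, N, K = 5000, 2000, 10
# toy sparse M x N count matrix X (documents in columns)
rows = rng.integers(0, M, size=60_000)
cols = rng.integers(0, N, size=60_000)
vals = rng.integers(1, 5, size=60_000).astype(float)
X = sparse.csc_matrix((vals, (rows, cols)), shape=(M, N))

W = rng.standard_normal((K, M))            # a K x M projection (e.g., whitening)
WX = X.T.dot(W.T).T                        # K x N, cost ~ K * nnz(X) flops
M_s = int(np.diff(X.indptr).max())         # largest number of unique words per doc
print(WX.shape, X.nnz, M_s)
```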

C.1.1 Expressions for Fast Implementation of the LDA Moments

Finite Sample Estimators

Finite Sample Estimators for the First Three LDA Moments. The unbiased estimators 휇̂_x^(1), 휇̂_x^(2), and 휇̂_x^(3) were defined in Equations (2.47)–(2.49). We rewrite

135 these expressions as :

휇̂_x^(1) = 푁^{−1} ∑_{푛=1}^{푁} 훿_{1푛} x푛 = 푁^{−1} X훿₁,   (C.1)

휇̂_x^(2) = 푁^{−1} ∑_{푛=1}^{푁} 훿_{2푛} [ x푛 ⊗ x푛 − ∑_{ℓ=1}^{퐿푛} w_{푛ℓ} ⊗ w_{푛ℓ} ] = 푁^{−1} ∑_{푛=1}^{푁} 훿_{2푛} ( x푛 ⊗ x푛 − Diag(x푛) )   (C.2)
  = 푁^{−1} [ X Diag(훿₂) X^⊤ − Diag(X훿₂) ],

휇̂_x^(3) = 푁^{−1} ∑_{푛=1}^{푁} 훿_{3푛} [ x푛 ⊗ x푛 ⊗ x푛 − ∑_{ℓ=1}^{퐿푛} w_{푛ℓ} ⊗ w_{푛ℓ} ⊗ w_{푛ℓ}   (C.3)
    − ∑_{ℓ₁=1}^{퐿푛} ∑_{ℓ₂=1, ℓ₂≠ℓ₁}^{퐿푛} ( w_{푛ℓ₁} ⊗ w_{푛ℓ₁} ⊗ w_{푛ℓ₂} + w_{푛ℓ₁} ⊗ w_{푛ℓ₂} ⊗ w_{푛ℓ₁} + w_{푛ℓ₁} ⊗ w_{푛ℓ₂} ⊗ w_{푛ℓ₂} ) ]
  = 푁^{−1} ∑_{푛=1}^{푁} 훿_{3푛} [ x푛 ⊗ x푛 ⊗ x푛 + 2 ∑_{푚=1}^{푀} 푥_{푛푚} (e푚 ⊗ e푚 ⊗ e푚)
    − ∑_{푚₁=1}^{푀} ∑_{푚₂=1}^{푀} 푥_{푛푚₁} 푥_{푛푚₂} ( e_{푚₁} ⊗ e_{푚₁} ⊗ e_{푚₂} + e_{푚₁} ⊗ e_{푚₂} ⊗ e_{푚₁} + e_{푚₁} ⊗ e_{푚₂} ⊗ e_{푚₂} ) ].

Finite Sample Estimators for the Matrix Ŵ 풯̂^LDA(v) Ŵ^⊤. A finite sample estimate of 풯̂^LDA was defined in Equation (2.46). The contraction with (a.k.a. projection onto) some vector v ∈ ℝ^푀 of this 푀 × 푀 × 푀 tensor can be written element-wise as :

푀 푀 [︁ ]︁ ∑︁ [︁ ]︁ ∑︁ 풯 퐿퐷퐴(v) = 휇(3) 푣 + 퐶 [휇(1)] [휇(1)] [휇(1)] 푣 ̂︀ ̂︀x 푚3 2 ̂︀x 푚1 ̂︀x 푚2 ̂︀x 푚3 푚3 푚1푚2 푚1푚2푚3 푚3=1 푚3=1 푀 ∑︁ [︁ ]︁ − 퐶 (w ⊗ w ⊗ 휇(1)) + (w ⊗ 휇(1) ⊗ w ) + (휇(1) ⊗ w ⊗ w ) 푣 , 1 Ê︀ ℓ1 ℓ2 ̂︀x Ê︀ ℓ1 ̂︀x ℓ3 Ê︀ ̂︀x ℓ2 ℓ3 푚3 푚1푚2푚3 푚3=1

where 퐶₁ = 푐₀(푐₀ + 2)^{−1} and 퐶₂ = 2푐₀²[(푐₀ + 1)(푐₀ + 2)]^{−1}. Plugging into this expression the expression (C.3) for the estimate 휇̂_x^(3), we get

푁 [︃ ]︃ [︁ 퐿퐷퐴 ]︁ −1 ∑︁ ∑︁ 풯̂︀ (v) = 푁 훿3푛 푥푛푚1 푥푛푚2 ⟨x푛, v⟩ + 2 훿(푚1, 푚2, 푚3)푥푛푚3 푣푚3 푚1푚2 푛=1 푚3 푁 푀 [︃ 푀 ]︃ −1 ∑︁ ∑︁ ∑︁ − 푁 훿3푛 푥푛푖푥푛푗 (e푖 ⊗ e푖 ⊗ e푗 + e푖 ⊗ e푗 ⊗ e푖 + e푖 ⊗ e푗 ⊗ e푗) 푣푚3 푛=1 푚3=1 푖,푗=1 푚1푚2푚3 + 퐶 [휇(1)] [휇(1)] ⟨휇(1), v⟩ 2 ̂︀x 푚1 ̂︀x 푚2 ̂︀x [︃ 푀 ]︃ ∑︁ (︁ )︁ − 퐶 [휇(2)] ⟨휇(1), v⟩ + [휇(2)] [휇(1)] 푣 + [휇(2)] [휇(1)] 푣 , 1 ̂︀x 푚1푚2 ̂︀x ̂︀x 푚1푚3 ̂︀x 푚2 푚3 ̂︀x 푚2푚3 ̂︀x 푚1 푚3 푚3=1

푀 where e1, e2, . . ., e푀 denote the canonical basis of R (i.e., the columns of the 푀 ×푀 identity matrix I). This further gives : [︁ ]︁ Ŵ︁ 풯̂︀ 퐿퐷퐴(v)Ŵ︁ ⊤ 푘1푘2 푁 [︃ 푀 ]︃ −1 ∑︁ ∑︁ = 푁 훿3푛 ⟨x푛, v⟩⟨x푛, Ŵ︁푘1 ⟩⟨x푛, Ŵ︁푘2 ⟩ + 2 푥푛푚푣푚푊̂︁푘1푚푊̂︁푘2푚 푛=1 푚=1 푁 푀 −1 ∑︁ ∑︁ (︁ )︁ − 푁 훿3푛 푥푛푖푥푛푗 푊̂︁푘1푖푊̂︁푘2푖푣푗 + 푊̂︁푘1푖푊̂︁푘2푗푣푖 + 푊̂︁푘1푖푊̂︁푘2푗푣푗 푛=1 푖,푗=1 [︁ ]︁ − 퐶 ⟨W , 휇(2)W ⟩ + ⟨W , 휇(2)v⟩⟨휇(1)W ⟩ + ⟨W , 휇(2)v⟩⟨휇(1), W ⟩ 1 ̂︁푘1 ̂︀x ̂︁푘2 ̂︁푘1 ̂︀x ̂︀x ̂︁푘2 ̂︁푘2 ̂︀x ̂︀x ̂︁푘1 + 퐶 ⟨휇(1), W ⟩⟨휇(1), W ⟩⟨휇(1), v⟩, 2 ̂︀x ̂︁푘1 ̂︀x ̂︁푘2 ̂︀x where Ŵ︁푘 denotes the 푘-th row of Ŵ︁ as a column vector. Introducing the counts 푀×푁 matrix X ∈ R where each element 푋푚푛 is the count of the 푚-th word in the 푛-th document, this further simplifies to :

퐿퐷퐴 ⊤ −1 [︀ ⊤ ]︀ ⊤ Ŵ︁ 풯̂︀ (v)Ŵ︁ = 푁 (WX̂︁ ) Diag (X v) ∘ 훿3 (WX̂︁ ) −1 [︀ ⊤ ]︀ ⊤ + 푁 Ŵ︁ Diag 2[(X훿3) ∘ v] − X[(X v) ∘ 훿3] Ŵ︁ −1 ⊤ −1 ⊤ − 푁 (Ŵ︁Diag[v]X) Diag[훿3](WX̂︁ ) − 푁 (WX̂︁ )Diag[훿3](Ŵ︁Diag[v]X) [︁ (1) (2) ⊤ (2) (1) ⊤ (1) (2) ⊤]︁ − 퐶1 ⟨휇̂︀x , v⟩(Ŵ︁휇̂︀x Ŵ︁ ) + (Ŵ︁(휇̂︀x v))(Ŵ︁휇̂︀x ) + (Ŵ︁휇̂︀x )(Ŵ︁(휇̂︀x v)) (1) (1) (1) ⊤ + 퐶2⟨휇̂︀x , v⟩(Ŵ︁휇̂︀x )(Ŵ︁휇̂︀x ) . Rewriting the last expression in a more compact form, we get :

Ŵ 풯̂^LDA(v) Ŵ^⊤ = 푁^{−1} [ T₁ + T₂ − T₃ − T₃^⊤ ] + 퐶₂ ⟨휇̂_x^(1), v⟩ (Ŵ휇̂_x^(1))(Ŵ휇̂_x^(1))^⊤
    − 퐶₁ [ ⟨휇̂_x^(1), v⟩ (Ŵ휇̂_x^(2)Ŵ^⊤) + T₄ + T₄^⊤ ],   (C.4)

where 퐶₁ = 푐₀(푐₀ + 2)^{−1}, 퐶₂ = 2푐₀²[(푐₀ + 1)(푐₀ + 2)]^{−1}, and

T₁ = (ŴX) Diag[ (X^⊤v) ∘ 훿₃ ] (ŴX)^⊤,
T₂ = Ŵ Diag[ 2[(X훿₃) ∘ v] − X[(X^⊤v) ∘ 훿₃] ] Ŵ^⊤,
T₃ = [Ŵ Diag(v) X] Diag(훿₃) (ŴX)^⊤,
T₄ = [Ŵ(휇̂_x^(2) v)] (Ŵ휇̂_x^(1))^⊤.

C.1.2 Expressions for Fast Implementation of the DICA Cumulants

Finite Sample Estimators

The finite sample estimator 풯̂︀ 퐷퐼퐶퐴 was defined in Equation (3.20). The contraction with (a.k.a. projection onto) some vector v ∈ R푀 of this 푀 × 푀 × 푀-tensor can be written element-wise as :

푀 푀 [︁ ]︁ ∑︁ ∑︁ 풯 퐷퐼퐶퐴(v) = cum(푥 , 푥 , 푥 )푣 + 2 훿(푚 , 푚 , 푚 ) (푥 )푣 ̂︀ ̂︂ 푚1 푚2 푚3 푚3 1 2 3 Ê︀ 푚3 푚3 푚1푚2 푚3=1 푚3=1 푀 ∑︁ − 훿(푚 , 푚 )cov(푥 , 푥 )푣 2 3 ̂︁ 푚1 푚2 푚3 푚3=1 푀 ∑︁ − 훿(푚 , 푚 )cov(푥 , 푥 )푣 1 3 ̂︁ 푚1 푚2 푚3 푚3=1 푀 ∑︁ − 훿(푚 , 푚 )cov(푥 , 푥 )푣 1 2 ̂︁ 푚1 푚3 푚3 푚3=1 푀 ∑︁ = cum(푥 , 푥 , 푥 )푣 + 2훿(푚 , 푚 ) (푥 )푣 ̂︂ 푚1 푚2 푚3 푚3 1 2 Ê︀ 푚1 푚1 푚3=1 푀 ∑︁ − cov(푥 , 푥 )푣 − cov(푥 , 푥 )푣 − 훿(푚 , 푚 ) cov(푥 , 푥 )푣 . ̂︁ 푚1 푚2 푚2 ̂︁ 푚1 푚2 푚1 1 2 ̂︁ 푚1 푚3 푚3 푚3=1

This further gives :

[︁ 퐷퐼퐶퐴 ⊤]︁ ⊤ 퐷퐼퐶퐴 Ŵ︁풯̂︀ (v)Ŵ︁ = Ŵ︁푘1 풯̂︀ (v)Ŵ︁푘2 푘1푘2 푀 푀 푀 ∑︁ ∑︁ ∑︁ = cum(푥 , 푥 , 푥 )푣 푊 푊 ̂︂ 푚1 푚2 푚3 푚3 ̂︁푘1푚1 ̂︁푘2푚2 푚1=1 푚2=1 푚3=1 푀 푀 ∑︁ ∑︁ + 2 훿(푚1, 푚2)Ê︀(푥푚1 )푣푚1 푊̂︁푘1푚1 푊̂︁푘2푚2 푚1=1 푚2=1 푀 푀 (푎) ∑︁ ∑︁ − cov(푥 , 푥 )푣 푊 푊 ̂︁ 푚1 푚2 푚2 ̂︁푘1푚1 ̂︁푘2푚2 푚1=1 푚2=1 푀 푀 (푏) ∑︁ ∑︁ − cov(푥 , 푥 )푣 푊 푊 ̂︁ 푚1 푚2 푚1 ̂︁푘1푚1 ̂︁푘2푚2 푚1=1 푚2=1 푀 푀 ∑︁ ∑︁ − cov(푥 , 푥 )푣 푊 푊 , ̂︁ 푚1 푚3 푚3 ̂︁푘1푚1 ̂︁푘2푚1 푚1=1 푚3=1 where Ŵ︁푘 denotes the 푘-th row of Ŵ︁ as a column vector. Note that the expressions (a) and (b) in the 4-th and 5-th line are not equal, due to the presence of 푘1 and 푘2 indices. By further plugging in the expressions (3.21) for the unbiased finite sample estimates of cov̂︁ and cum̂︂ , we get :

푁 [︁ 퐷퐼퐶퐴 ⊤]︁ ∑︁ Ŵ︁풯̂︀ (v)Ŵ︁ = 퐴1 ⟨Ŵ︁푘1 , x푛 − Ê︀(x)⟩⟨Ŵ︁푘2 , x푛 − Ê︀(x)⟩⟨v, x푛 − Ê︀(x)⟩ 푘 푘 1 2 푛=1 푀 ∑︁ + 2 Ê︀(푥푚)푣푚푊̂︁푘1푚푊̂︁푘2푚 푚=1 푁 ∑︁ − 퐴2 ⟨Ŵ︁푘1 , x푛 − Ê︀(x)⟩⟨v ∘ Ŵ︁푘2 , x푛 − Ê︀(x)⟩ 푛=1 푁 ∑︁ − 퐴2 ⟨v ∘ Ŵ︁푘1 , x푛 − Ê︀(x)⟩⟨Ŵ︁푘2 , x푛 − Ê︀(x)⟩ 푛=1 푁 ∑︁ − 퐴2 ⟨Ŵ︁푘1 ∘ Ŵ︁푘2 , x푛 − Ê︀(x)⟩⟨v, x푛 − Ê︀(x)⟩, 푛=1

where 퐴₁ = 푁[(푁 − 1)(푁 − 2)]^{−1}, 퐴₂ = (푁 − 1)^{−1}, and ∘ denotes the element-wise Hadamard product. Introducing the counts matrix X ∈ ℝ^{푀×푁}, where each element 푋_{푚푛} is the count of the 푚-th word in the 푛-th document, this further simplifies

to :

퐷퐼퐶퐴 ⊤ ⊤ ⊤ Ŵ︁풯̂︀ (v)Ŵ︁ = 퐴1 (WX̂︁ )Diag[X v](WX̂︁ ) [︁ ⊤ ⊤]︁ + 퐴1 ⟨v, Ê︀(x)⟩ 2푁(Ŵ︁Ê︀(x))(Ŵ︁Ê︀(x)) − (WX̂︁ )(WX̂︁ )

[︁ ⊤ ⊤ ⊤ ⊤]︁ − 퐴1 WX̂︁ (X v)(Ŵ︁Ê︀(x)) + Ŵ︁Ê︀(x)(WX̂︁ (X v)) ⊤ + 2Ŵ︁Diag[v ∘ Ê︀(x)]Ŵ︁ [︁ ⊤ ⊤ ⊤ ⊤]︁ − 퐴2 (WX̂︁ )(Ŵ︁Diag(v)X) + (Ŵ︁Diag(v)X)(WX̂︁ ) + Ŵ︁Diag[X(X v)]Ŵ︁

[︁ ⊤ ⊤]︁ + 퐴2 (Ŵ︁Ê︀(x))(Ŵ︁Diag[v]Ê︀(x)) + (Ŵ︁Diag[v]Ê︀(x))(Ŵ︁Ê︀(x)) ⊤ + 퐴2⟨v, Ê︀(x)⟩Ŵ︁Diag[Ê︀(x)]Ŵ︁ . Rewriting the last expression in a more compact form, we get :

Ŵ 풯̂^DICA(v) Ŵ^⊤ = 푁[(푁 − 1)(푁 − 2)]^{−1} [ T₁ + ⟨v, Ê(x)⟩ (T₂ − T₃) − (T₄ + T₄^⊤) ]
    + (푁 − 1)^{−1} [ T₅ + T₅^⊤ − T₆ − T₆^⊤ + Ŵ Diag(t) Ŵ^⊤ ],   (C.5)

where

T₁ = (ŴX) Diag[X^⊤v] (ŴX)^⊤,
T₂ = 2푁 (ŴÊ(x)) (ŴÊ(x))^⊤,
T₃ = (ŴX) (ŴX)^⊤,
T₄ = ŴX(X^⊤v) (ŴÊ(x))^⊤,
T₅ = (ŴX) (Ŵ Diag(v) X)^⊤,
T₆ = (Ŵ Diag(v) Ê(x)) (ŴÊ(x))^⊤,
t = 2(푁 − 1)[v ∘ Ê(x)] + ⟨v, Ê(x)⟩ Ê(x) − X(X^⊤v).
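As an illustration, the compact form (C.5) can be transcribed almost literally into code; the following sketch (our own, dense and unoptimized, unlike the memory-efficient sparse implementation discussed above) computes Ŵ 풯̂^DICA(v) Ŵ^⊤ from a count matrix, a projection matrix, and a contraction vector.

```python
import numpy as np

def dica_projection(W, X, v):
    """Compute W T_hat^DICA(v) W^T by transcribing the compact form (C.5).
    X : M x N count matrix (documents in columns), W : K x M, v : length M."""
    M, N = X.shape
    Ex = X.mean(axis=1)                       # E_hat(x)
    WX = W @ X                                # K x N
    WEx = W @ Ex                              # length K
    Xv = X.T @ v                              # length N
    vEx = float(v @ Ex)                       # <v, E_hat(x)>

    T1 = (WX * Xv) @ WX.T                     # (W X) Diag(X^T v) (W X)^T
    T2 = 2 * N * np.outer(WEx, WEx)
    T3 = WX @ WX.T
    T4 = np.outer(WX @ Xv, WEx)
    T5 = WX @ ((W * v) @ X).T                 # (W X) (W Diag(v) X)^T
    T6 = np.outer(W @ (v * Ex), WEx)
    t = 2 * (N - 1) * (v * Ex) + vEx * Ex - X @ Xv

    out = N / ((N - 1) * (N - 2)) * (T1 + vEx * (T2 - T3) - (T4 + T4.T))
    out += (T5 + T5.T - T6 - T6.T + (W * t) @ W.T) / (N - 1)
    return out

# toy usage
rng = np.random.default_rng(0)
M, N, K = 30, 500, 4
X = rng.poisson(1.0, size=(M, N)).astype(float)
W = rng.standard_normal((K, M))
v = rng.standard_normal(M)
print(dica_projection(W, X, v).shape)         # (K, K)
```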

C.2 Multi-View Models

C.2.1 Finite Sample Estimators of the DCCA Cumulants

In this section, we sketch the derivation of unbiased finite sample estimators for the CCA cumulants S₁₂, 풯₁₂₁, and 풯₁₂₂. Since the derivation is nearly identical to the derivation of the estimators for the DICA cumulants (see Appendix F.2 of Podosinnikova et al. [2015]), all details are omitted.

Given finite samples X^(1) = {x₁^(1), x₂^(1), ..., x_푁^(1)} and X^(2) = {x₁^(2), x₂^(2), ..., x_푁^(2)}, the finite sample estimator of the discrete CCA S-covariance (4.24), i.e., S₁₂ := cum(x^(1), x^(2)), takes the form

Ŝ₁₂ = 휂₁ [ X^(1) X^(2)⊤ − 푁 Ê(x^(1)) Ê(x^(2))^⊤ ],   (C.6)

where Ê(x^(1)) = 푁^{−1} ∑_{푛=1}^{푁} x푛^(1), Ê(x^(2)) = 푁^{−1} ∑_{푛=1}^{푁} x푛^(2), and 휂₁ = 1/(푁 − 1). Substitution of the finite sample estimators of the 2nd and 3rd cumulants (see, e.g., Appendix C.4 of Podosinnikova et al. [2015]) into the definition of the DCCA 풯-cumulants (4.26) leads to the following expressions

⊤ (1) (푗)⊤ (2) Ŵ︁1풯̂︀12푗(v푗)Ŵ︁2 = 휂2[(Ŵ︁1X )Diag(X v푗)] ⊗ (Ŵ︁2X ) (푗) (1) (2) + 휂2⟨v푗, Ê︀(x )⟩2푁[Ŵ︁1Ê︀(x )] ⊗ [Ŵ︁2Ê︀(x )] (푗) (1) (2) − 휂2⟨v푗, Ê︀(x )⟩(Ŵ︁1X ) ⊗ (Ŵ︁2X ) (1) (푗)⊤ (2) − 휂2[(Ŵ︁1X )(X v푗)] ⊗ [Ŵ︁2Ê︀(x )] (1) (2) (푗)⊤ − 휂2[Ŵ︁1Ê︀(x )] ⊗ [(Ŵ︁2X )(X v푗)] (푗) (1) (푗) (2) − 휂1Ŵ︁1 X ) ⊗ (Ŵ︁2 X ) (푗) (1) (푗) (2) + 휂1푁[Ŵ︁1 Ê︀(x )] ⊗ [Ŵ︁2 Ê︀(x )],

where 휂₂ = 푁/((푁 − 1)(푁 − 2)), Ŵ₁^(1) = Ŵ₁ Diag(v₁), Ŵ₂^(1) = Ŵ₂, Ŵ₁^(2) = Ŵ₁, and Ŵ₂^(2) = Ŵ₂ Diag(v₂).

In the expressions above, Ŵ₁ and Ŵ₂ denote whitening matrices of Ŝ₁₂, i.e., such that Ŵ₁ Ŝ₁₂ Ŵ₂^⊤ = I.
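For illustration, the estimator (C.6) and a pair of such whitening matrices can be obtained in a few lines via a truncated SVD; the sketch below is our own toy construction, and the generative model used for the synthetic counts is only meant to produce two correlated views.

```python
import numpy as np

def dcca_whitening(X1, X2, K):
    """Estimate S12_hat as in (C.6) and whitening matrices W1, W2 such that
    W1 S12_hat W2^T = I_K, using a rank-K truncated SVD."""
    N = X1.shape[1]
    m1, m2 = X1.mean(axis=1), X2.mean(axis=1)
    S12 = (X1 @ X2.T - N * np.outer(m1, m2)) / (N - 1)
    U, s, Vt = np.linalg.svd(S12, full_matrices=False)
    W1 = (U[:, :K] / np.sqrt(s[:K])).T            # K x M1
    W2 = Vt[:K] / np.sqrt(s[:K])[:, None]         # K x M2
    return S12, W1, W2

# toy two-view count data sharing K latent sources
rng = np.random.default_rng(0)
M1, M2, K, N = 40, 35, 3, 10_000
D1, D2 = rng.random((M1, K)), rng.random((M2, K))
alpha = rng.gamma(2.0, 1.0, size=(K, N))          # shared sources
X1 = rng.poisson(D1 @ alpha).astype(float)        # view-1 counts
X2 = rng.poisson(D2 @ alpha).astype(float)        # view-2 counts
S12, W1, W2 = dcca_whitening(X1, X2, K)
print(np.round(W1 @ S12 @ W2.T, 6))               # identity up to numerics
```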


Bibliography

B. Afsari. Simple LU and QR based non-orthogonal matrix joint diagonalization. In Proc. ICA, 2006.

B. Afsari. What can make joint diagonalization difficult ? In Proc. ICASSP, 2007.

B. Afsari. Sensitivity analysis for the problem of matrix joint diagonalization. SIAM J. Matrix Anal. Appl., 30(2) :1148–1171, 2008.

S. Amari, A. Cichocki, and H.H. Yang. A new learning algorithm for blind signal separation. In Adv. NIPS, 1996.

A. Anandkumar, D.P. Foster, D. Hsu, S.M. Kakade, and Y.-K. Liu. A spectral algo- rithm for latent Dirichlet allocation. In Adv. NIPS, 2012a.

A. Anandkumar, D. Hsu, and S.M. Kakade. A method of moments for mixture models and hidden Markov models. In Proc. COLT, 2012b.

A. Anandkumar, D.P. Foster, D. Hsu, S.M. Kakade, and Y.-K. Liu. A spectral algo- rithm for latent Dirichlet allocation. Technical report, arXiv :1204.6703v4, 2013a.

A. Anandkumar, D. Hsu, A. Javanmard, and S.M. Kakade. Learning topic mo- dels and latent Bayesian networks under expansion constraints. Technical report, arXiv :1209.5350v3, 2013b.

A. Anandkumar, R. Ge, D. Hsu, S.M. Kakade, and M. Telgarsky. Tensor decompo- sitions for learning latent variable models. J. Mach. Learn. Res., 15 :2773–2832, 2014.

A. Anandkumar, D.P. Foster, D. Hsu, S.M. Kakade, and Y.-K. Liu. A spectral algo- rithm for latent Dirichlet allocation. Algorithmica, 72(1) :193–214, 2015a.

A. Anandkumar, R. Ge, and M. Janzamin. Analyzing tensor power method dynamics in overcomplete regime. Technical report, arXiv :1411.1488v2, 2015b.

A. Anandkumar, D. Hsu, M. Janzamin, and S.M. Kakade. When are overcomplete topic models identifiable ? Uniqueness of tensor Tucker decompositions with struc- tured sparsity. J. Mach. Learn. Res., 16 :2643–2694, 2015c.

143 F. Arabshahi and A. Anandkumar. Beyond LDA : A unified framework for lear- ning latent normalized infinitely divisible topic models through spectral methods. Technical report, arXiv :1605.09080v4, 2016.

C. Archambeau and F. Bach. Sparse probabilistic projections. In Adv. NIPS, 2008.

S. Arora and R. Kannan. Learning mixtures of arbitrary Gaussians. In Proc. STOC, 2001.

S. Arora, R. Ge, and A. Moitra. Learning topic models – Going beyond SVD. In Proc. FOCS, 2012.

S. Arora, R. Ge, Y. Halpern, D. Mimno, A. Moitra, D. Sontag, Y. Wu, and M. Zhu. A practical algorithm for topic modeling with provable guarantees. In Proc. ICML, 2013.

S. Arora, R. Ge, T. Ma, and A. Moitra. Provable algorithms for inference in topic models. In Proc. ICML, 2015.

F. Bach and M.I. Jordan. Kernel independent component analysis. J. Mach. Learn. Res., 3 :1–48, 2002.

F. Bach and M.I. Jordan. A Probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California, Berkeley, 2005.

R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999.

D.J. Bartholomew. Latent Variable Models and Factor Analysis. Wiley, 1987.

D.J. Bartholomew, M. Knott, and I. Moustaki. Latent Variable Models and Factor Analysis : A Unified Approach. Wiley, 3rd edition, 2011.

A. Basilevsky. Statistical Factor Analysis and Related Methods : Theory and Appli- cations. Wiley, 1994.

G. Bergqvist. Exact probabilities for typical ranks of 2 × 2 × 2 and 3 × 3 × 2 tensors. Linear Algebra Appl., 438(2) :663–667, 2013.

P.J. Bickel, C.A.J. Klaassen, Y. Ritov, and J.A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Springer, 1998.

D. Bini, M. Capovani, G. Lotti, and F. Romani. O(n^{2.7799}) complexity for n × n approximate matrix multiplication. Inform. Process. Lett., 8(5) :234–235, 1979.

D. Bini, G. Lotti, and F. Romani. Approximate solutions for the bilinear form com- putational problem. SIAM J. Comput., 9(4) :692–697, 1980.

S. Bird, E. Loper, and E. Klein. Natural Language Processing with Python. O'Reilly Media Inc., 2009.

C.M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

D.M. Blei. Probabilistic topic models. Comm. ACM, 55(4) :77–84, 2012.

D.M. Blei and M.I. Jordan. Modeling annotated data. In Proc. SIGIR, 2003.

D.M. Blei and J.D. Lafferty. A correlated topic model of Science. Ann. Appl. Stat., 1(1) :17–35, 2007.

D.M. Blei and J.D. Lafferty. Topic models. In A.N. Srivastava and M. Sahami, editors, Text Mining : Classification, Clustering, and Applications, chapter 4, pages 71–94. CRC Press, 2009.

D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3 :993–1022, 2003.

S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities : A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15 :1455–1459, 2014. URL http://www.manopt.org.

R. Bro and S. Jong. A fast non-negativity constrained least squares algorithm. J. Chemometrics, 11(5) :393–401, 1997.

R. Bro and N. Sidiropoulos. Least squares algorithms under unimodality and non-negativity constraints. J. Chemometrics, 12(4) :223–247, 1998.

M.W. Browne. The maximum-likelihood solution in inter-battery factor analysis. Br. J. Math. Stat. Psychol., 32(1) :75–86, 1979.

A. Bunse-Gerstner, R. Byers, and V. Mehrmann. Numerical methods for simultaneous diagonalization. SIAM J. Matrix Anal. Appl., 14(4) :927–949, 1993.

W. Buntine and A. Jakulin. Discrete principal component analysis. Technical report, Helsinki Institute for Information Technology, 2005.

W. Buntine and A. Jakulin. Discrete component analysis. In Proc. SLSFS, 2006.

W.L. Buntine. Operations for learning with graphical models. J. Artif. Intell. Res., 2 :159–225, 1994.

W.L. Buntine. Variational extensions to EM and multinomial PCA. In Proc. ECML, 2002.

W.L. Buntine and A. Jakulin. Applying discrete PCA in data analysis. In Proc. UAI, 2004.

145 J. Canny. GaP : a factor model for discrete data. In Proc. SIGIR, 2004.

J.-F. Cardoso. Source separation using higher order moments. In Proc. ICASSP, 1989.

J.-F. Cardoso. Eigen-structure of the fourth-order cumulant tensor with application to the blind source separation problem. In Proc. ICASSP, 1990.

J.-F. Cardoso. Super-symmetric decomposition of the fourth-order cumulant tensor. Blind identification of more sources than sensors. In Proc. ICASSP, 1991.

J.-F. Cardoso. Perturbation of joint diagonalizers. Technical report, Télécom Paris, 1994a.

J.-F. Cardoso. On the performance of orthogonal source separation algorithms. In Proc. EUSIPCO, 1994b.

J.-F. Cardoso. Multidimensional independent component analysis. In Proc. ICASSP, 1998.

J.-F. Cardoso. High-order contrasts for independent component analysis. Neural Comput., 11 :157–192, 1999.

J.-F. Cardoso and A. Souloumiac. Blind beamforming for non Gaussian signals. In IEE Proc.-F, 1993.

J.-F. Cardoso and A. Souloumiac. Jacobi angles for simultaneous diagonalization. SIAM J. Mat. Anal. Appl., 17(1) :161–164, 1996.

J.D. Carroll and J.J. Chang. Analysis of individual differences in multidimensional scaling via an 푁-way generalization of “Eckart-Young” decomposition. Psychome- trika, 35(3) :283–319, 1970.

J.D. Carroll, G. De Soete, and S. Pruzansky. Fitting of the latent class model via iteratively reweighted least squares Candecomp with nonnegativity constraints. In R. Coppi, S. Bolasco, F. Critchley, and Y. Escoufier, editors, Multiway Data Ana- lysis, pages 463–472. Elsevier Science, 1989.

A.T. Cemgil. Bayesian inference for nonnegative matrix factorisation models. Com- put. Intell. Neurosci., 2009. Article ID 785152.

G. Chabriel, M. Kleinsteuber, E. Moreau, H. Shen, P. Tichavsky, and A. Yeredor. Joint matrices decompositions and blind source separation : A survey of methods, identification, and applications. IEEE Signal Process. Mag., 31(3) :34–43, 2014.

S.S. Chen, D.L. Donoho, and M.A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1) :33–61, 1999.

P. Chevalier. Méthodes aveugles de filtrage d’antennes. Revue d’Electronique et d’Electricité 3, pages 48–58, 1995.

146 P.A. Chew, B.W. Bader, T.G. Kolda, and A. Abdelali. Cross-language information retrieval using Parafac2. In Proc. KDD, 2007.

A. Cichocki, R. Zdunek, A.H. Phan, and S. Amari. Nonnegative Matrix and Tensor Factorizations : Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Wiley, 2009.

S. Cohen and M. Collins. A provably correct learning algorithm for latent-variable PCFGs. In ACL, 2014.

N. Colombo and N. Vlassis. Tensor decomposition via joint matrix Schur decompo- sition. In Proc. ICML, 2016a.

N. Colombo and N. Vlassis. Approximate joint matrix triangularization. Technical report, arXiv :1607.00514v1, 2016b.

N. Colombo and N. Vlassis. A posteriori error bounds for joint matrix decomposition problems. In Adv. NIPS, 2016c.

P. Comon. Independent component analysis, A new concept ? Signal Process., 36(3) : 287–314, 1994.

P. Comon. Blind channel identification and extraction of more sources than sensors. In Proc. SPIE, 1998.

P. Comon. Tensor decompositions : State of the art and applications. In J.G. McW- hirter and I.K. Proudler, editors, Mathematics in Signal Processing, pages 1–24. Oxford University Press, 2002.

P. Comon. Blind identification and source separation in 2 × 3 under-determined mixtures. IEEE Trans. Signal Process., 52(1) :11–22, 2004.

P. Comon. Tensors versus matrices usefulness and unexpected properties. In Proc. IEEE SSP, 2009.

P. Comon. Tensors : A brief introduction. IEEE Signal Process. Mag., 31 :44–53, 2014.

P. Comon and C. Jutten. Handbook of Blind Source Separation : Independent Component Analysis and Applications. Academic Press, 2010.

P. Comon and J.M.F. Ten Berge. Generic and typical ranks of three-way arrays. In Proc. ICASSP, 2008.

P. Comon, G. Golub, L.-H. Lim, and B. Mourrain. Symmetric tensors and symmetric tensor rank. SIAM J. Matrix Anal. Appl., 30(3) :1254–1279, 2008.

P. Comon, X. Luciani, and A.L.F. De Almeida. Tensor decompositions, alternating least squares and other tales. J. Chemometrics, 23 :393–405, 2009a.

147 P. Comon, J.M.F. Ten Berge, L. De Lathauwer, and J. Castaing. Generic and typical ranks of multi-way arrays. Linear Algebra Appl., 430(11) :2997–3007, 2009b.

S. Dasgupta. Learning mixtures of Gaussians. In Proc. FOCS, 1999.

A. d’Aspremont, F. Bach, and L. El Ghaoui. Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res., 9 :1269–1294, 2008.

A. d’Aspremont, F. Bach, and L. El Ghaoui. Approximation bounds for sparse prin- cipal component analysis. Math. Prog., 148(1) :89–110, 2014.

L. De Lathauwer. A link between the canonical decomposition in multilinear algebra and simultaneous matrix diagonalization. SIAM J. Matrix Anal. Appl., 28(3) : 642–666, 2006.

L. De Lathauwer. Algebraic methods after prewhitening. In P. Comon and C. Jutten, editors, Handbook of Blind Source Separation : Independent Component Analysis and Applications, chapter 5, pages 155–177. Academic Press, 2010.

L. De Lathauwer and J. Castaing. Blind identification of underdetermined mixtures by simultaneous matrix diagonalization. IEEE Trans. Signal Process., 56(3) :1096– 1105, 2008.

L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and rank- (푅1, 푅2, . . . , 푅푁 ) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl., 21(4) :1324–1342, 2000.

L. De Lathauwer, B. De Moor, and J. Vandewalle. Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition. SIAM J. Matrix Anal. Appl., 26(2) :295–327, 2004.

L. De Lathauwer, B. De Moor, and J. Vanderwalle. A prewhitening-induced bound on the identification error in independent component analysis. IEEE Trans. Circuits Syst. I, Reg. Papers, 52(3), 2005.

L. De Lathauwer, J. Castaing, and J.-F. Cardoso. Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Trans. Signal Process., 55 :2965– 2973, 2007.

V. De Silva and L.-H. Lim. Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix. Anal. Appl., 30(3) :1084–1127, 2008.

S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman. In- dexing by latent semantic analysis. J. Assoc. Inf. Sci. Technol., 41(6) :391–407, 1990.

S. Dégerine and E. Kane. A comparative study of approximate joint diagonalization algorithms for blind source separation in presence of additive noise. IEEE Trans. Signal Process., 55(6) :3022–3031, 2007.

148 A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B, 39(1) :1–38, 1977.

O. Dikmen and C. Févotte. Maximum marginal likelihood estimation for nonnegative dictionary learning in the gamma-Poisson model. IEEE Trans. Signal Process., 60 (10) :5163–5175, 2012.

C. Ding, T. Li, and W. Peng. Nonnegative matrix factorization and probabilistic latent semantic indexing : Equivalence, chi-square statistics, and a hybrid method. In Proc. AAAI, 2006.

C. Ding, T. Li, and W. Peng. On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Comput. Stat. Data Anal., 52(8) :3913–3927, 2008.

I. Domanov and L. De Lathauwer. On the uniqueness of the canonical polyadic decomposition of third-order tensors—Part I : Basic results and uniqueness of one factor matrix. SIAM J. Matrix Anal. Appl., 34(3) :855–875, 2013a.

I. Domanov and L. De Lathauwer. On the uniqueness of the canonical polyadic decom- position of third-order tensors—Part II : Uniqueness of the overall decomposition. SIAM J. Matrix Anal. Appl., 34(3) :876–903, 2013b.

I. Domanov and L. De Lathauwer. Canonical polyadic decomposition of third-order tensors : Reduction to generalized eigenvalue decomposition. SIAM J. Matrix Anal. Appl., 35(2) :636–660, 2014.

I. Domanov and L. De Lathauwer. Canonical polyadic decomposition of third-order tensors : Relaxed uniqueness conditions and algebraic algorithm. Technical report, arXiv :1501 :07251v2, 2016.

R. Durrett. Probability : Theory and Examples. Cambridge University Press, 4th edition, 2013.

P.J. Eberlein. A Jacobi-like method for the automatic computation of eigenvalues and eigenvectors of an arbitrary matrix. J. Soc. Indust. Appl. Math., 10(1) :74–88, 1962.

G. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3) :211–218, 1936.

S. Favaro, G. Hadjicharalambous, and I. Prünster. On a class of distributions on the simplex. J. Stat. Plan. Inference, 141(9) :2987–3004, 2011.

B.A. Frigyik, A. Kapila, and M.R. Gupta. Introduction to the Dirichlet distribu- tion and related processes. Technical Report UWEETR-2010-0006, University of Washington, 2010.

149 T. Fu and X. Gao. Simultaneous diagonalization with similarity transformation for non-defective matrices. In Proc. ICASSP, 2006.

E. Gaussier and C. Goutte. Relation between pLSA and NMF and implications. In Proc. SIGIR, 2005.

R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping from saddle points—Online stochastic gradient for tensor decomposition. Technical report, arXiv :1503.02101v1, 2015.

W.R. Gilks, S. Richardson, and D. Spiegelhalter. Markov Chain Monte Carlo in Practice. CRC Press, 1995.

M. Girolami and A. Kabán. On an equivalence between PLSI and LDA. In Proc. SIGIR, 2003.

A. Globerson, G. Chechik, F. Pereira, and N. Tishby. Euclidean embedding of co- occurrence data. J. Mach. Learn. Res., 8 :2265–2295, 2007.

G.H. Golub and C.F. Van Loan. Matrix Computations. John Hopkins University Press, 3rd edition, 1996.

G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, 4th edition, 2013.

A. Gordo. Supervised mid-level features for word image representation. In Proc. CVPR, 2015.

T. Griffiths. Gibbs sampling in the generative model of latent Dirichlet allocation. Technical report, Stanford University, 2002.

A. Haghighi, P. Liang, T. Berg-Kirkpatrick, and D. Klein. Learning bilingual lexicons from monolingual corpora. In Proc. ACL, 2008.

N. Halko, P.G. Martinsson, and J.A. Tropp. Finding structure with randomness : Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2) :217–288, 2011.

R.A. Harshman. Foundations of the Parafac procedure : Models and conditions for an explanatory multimodal factor analysis. UCLA Working Papers in Phonetics, 16 :1–84, 1970.

R.A. Harshman and M.E. Lundy. Parafac : Parallel factor analysis. Comput. Stat. Data Anal., 18(1) :39–72, 1994.

J. Håstad. Tensor rank is NP-complete. J. Algorithms, 11(4) :644–654, 1990.

T. Hazan, S. Polak, and A. Shashua. Sparse image coding using a 3D non-negative tensor factorization. In Proc. ICCV, 2005.

C.C. Heyde. On a property of the lognormal distribution. J. R. Stat. Soc. Series B, 25 :392–393, 1963.

C.J. Hillar and L.-H. Lim. Most tensor problems are NP-hard. J. ACM, 60(6), 2013.

F.L. Hitchcock. The expression of a tensor or a polyadic as a sum of products. J. Math. Phys., 6 :164–189, 1927a.

F.L. Hitchcock. Multiple invariants and generalized rank of a p-way matrix or tensor. J. Math. Phys., 7 :39–79, 1927b.

T. Hofmann. Probabilistic latent semantic indexing. In Proc. SIGIR, 1999a.

T. Hofmann. Probabilistic latent semantic analysis. In Proc. UAI, 1999b.

T. Hofmann, J. Puzicha, and M.I. Jordan. Learning from dyadic data. In Adv. NIPS, 1999.

R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, 2013.

H. Hotelling. Relations between two sets of variates. Biometrika, 28(3/4) :321–377, 1936.

D. Hsu and S.M. Kakade. Learning mixtures of spherical Gaussians : Moment methods and spectral decompositions. In Proc. ITCS, 2013.

A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Netw., 10(3) :626–634, 1999.

A. Hyvärinen and P. Hoyer. Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces. Neural Comput., 12(7) :1705–1720, 2000.

A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, 2001.

R. Iferroudjene, K. Abed-Meraim, and A. Belouchrani. A new Jacobi-like method for joint diagonalization of arbitrary non-defective matrices. Appl. Math. Comput., 211 :363–373, 2009.

T.S. Jaakkola. Tutorial on variational approximation methods. In M. Opper and D. Saad, editors, Advanced Mean Field Methods : Theory and Practice, pages 129–159. MIT Press, 2001.

J. Jacod and P. Protter. Probability Essentials. Springer, 2004.

I. Jolliffe. Principal Component Analysis. Springer, 2nd edition, 2002.

M.I. Jordan. Learning in Graphical Models. MIT Press, 1999.

M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre. Generalized power method for sparse principal component analysis. J. Mach. Learn. Res., 11 :517–553, 2010.

C. Jutten. Calcul neuromimétique et traitement du signal : Analyse en composantes indépendantes. PhD thesis, INP-USM Grenoble, 1987.

C. Jutten and J. Hérault. Blind separation of sources, part I : An adaptive algorithm based on neuromimetric architecture. Signal Process., 24(1) :1–10, 1991.

H.F. Kaiser. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23 :187–200, 1958.

N. Keriven, A. Bourrier, R. Gribonval, and P. Pérez. Sketching for large-scale learning of mixture models. Technical Report, HAL-01329195, 2016a.

N. Keriven, A. Bourrier, R. Gribonval, and P. Pérez. Sketching for large-scale learning of mixture models. In Proc. ICASSP, 2016b.

J.R. Kettenring. Canonical analysis of several sets of variables. Biometrika, 58(3) :433–451, 1971.

J. Kim, Y. He, and H. Park. Algorithms for nonnegative matrix and tensor factorizations : A unified view based on block coordinate descent framework. J. Global Optim., 58(2) :285–319, 2014.

A. Klami, S. Virtanen, and S. Kaski. Bayesian exponential family projections for coupled data sources. In Proc. UAI, 2010.

A. Klami, S. Virtanen, and S. Kaski. Bayesian canonical correlation analysis. J. Mach. Learn. Res., 14 :965–1003, 2013.

E. Kofidis and P.A. Regalia. On the best rank-1 approximation of higher-order supersymmetric tensors. SIAM J. Matrix Anal. Appl., 23(3) :863–884, 2002.

T.G. Kolda. Orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl., 23(1) :243–255, 2001.

T.G. Kolda. A counterexample to the possibility of an extension of the Eckart–Young low-rank approximation theorem for the orthogonal rank tensor decomposition. SIAM J. Matrix Anal. Appl., 24(3) :762–767, 2003.

T.G. Kolda and B.W. Bader. Tensor decompositions and applications. SIAM Rev., 51(3) :455–500, 2009.

W.P. Krijnen and J.M.F. Ten Berge. Contrastvrije oplossingen van het Candecomp/Parafac-model. Kwantitatieve Methoden, 12(37) :87–96, 1991.

P.M. Kroonenberg. Three-mode principal component analysis : Theory and applications. PhD thesis, University of Leiden, 1983.

J.B. Kruskal. Three-way arrays : Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl., 18(2) :95–138, 1977.

J.B. Kruskal. Rank, decomposition, and uniqueness for 3-way and 푛-way arrays. In R. Coppi, S. Bolasco, F. Critchley, and Y. Escoufier, editors, Multiway Data Analysis, pages 7–18. Elsevier Science, 1989.

J.B. Kruskal, R.A. Harshman, and M.E. Lundy. How 3-MFA data can cause degenerate Parafac solutions, among other relationships. In R. Coppi, S. Bolasco, F. Critchley, and Y. Escoufier, editors, Multiway Data Analysis, pages 115–122. Elsevier Science, 1989.

H.W. Kuhn. The Hungarian method for the assignment problem. Nav. Res. Logist. Q., 2(1-2) :83–97, 1955.

V. Kuleshov, A.T. Chaganty, and P. Liang. Tensor factorization via matrix factorization. In Proc. AISTATS, 2015a.

V. Kuleshov, A.T. Chaganty, and P. Liang. Simultaneous diagonalization : The asymmetric, low-rank, and noisy settings. Technical report, arXiv :1501.06318v2, 2015b.

J.M. Landsberg. Tensors : Geometry and Applications. American Mathematical Society, 2012.

D.D. Lee and H.S. Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401(6755) :788–791, 1999.

D.D. Lee and H.S. Seung. Algorithms for non-negative matrix factorization. In Adv. NIPS, 2001.

T.-W. Lee, M.S. Lewicki, M. Girolami, and T.J. Sejnowski. Blind source separation of more sources than mixtures using overcomplete representations. IEEE Signal Process. Lett., 6(4) :87–90, 1999.

M.S. Lewicki and T.J. Sejnowski. Learning overcomplete representations. Neural Comput., 12(2) :337–365, 2000.

W. Li and A. McCallum. Pachinko allocation : DAG-structured mixture models of topic correlations. In Proc. ICML, 2006.

Y.-O. Li, T. Adalı, W. Wang, and V.D. Calhoun. Joint blind source separation by multiset canonical correlation analysis. IEEE Trans. Signal Process., 57(10), 2009.

T. Lickteig. Typical tensorial rank. Linear Algebra Appl., 69 :95–120, 1985.

L.-H. Lim. Singular values and eigenvalues of tensors : A variational approach. In Proc. CAMSAP, 2005.

L.-H. Lim and P. Comon. Nonnegative approximations of nonnegative tensors. J. Chemometrics, 23(7) :432–441, 2009.

G.D. Lin and J. Stoyanov. The logarithmic skew-normal distributions are moment-indeterminate. J. Appl. Probab., 46 :909–916, 2009.

X. Luciani and L. Albera. Joint eigenvalue decomposition using polar matrix factorization. In Proc. LVA ICA, 2010.

L. Mackey. Deflation methods for sparse PCA. In Adv. NIPS, 2009.

F. Mangili and A. Benavoli. New prior near-ignorance models on the simplex. Int. J. Approx. Reason., 56(B) :278–306, 2015.

J. Marcinkiewicz. Sur une propriété de la loi de Gauß. Mathematische Zeitschrift, 44 :612–618, 1939.

P. McCullagh. Tensor Methods in Statistics. CRC Press, 1987.

G.J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley, 2nd edition, 2007.

M. Mella. Singularities of linear systems and the Waring problem. Trans. Am. Math. Soc., 358(12) :5523–5538, 2006.

A. Mesloub, K. Abed-Meraim, and A. Belouchrani. A new algorithm for complex non-orthogonal joint diagonalization based on shear and Givens rotations. IEEE Trans. Signal Process., 62(8), 2012.

K.P. Murphy. Machine Learning : A Probabilistic Perspective. MIT Press, 2012.

C. Nikias and J. Mendel. Signal processing with higher-order spectra. IEEE Signal Process. Mag., 10(3) :10–37, 1993.

C. Nikias and A. Petropulu. Higher-Order Spectra Analysis. A Nonlinear Signal Processing Framework. Prentice Hall, 1993.

B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381 :607–609, 1996.

B.A. Olshausen and D.J. Field. Sparse coding with an overcomplete basis set : A strategy employed by V1 ? Vision Res., 37(23) :3311–3325, 1997.

P. Paatero. A weighted non-negative least squares algorithm for three-way Parafac factor analysis. Chemometr. Intell. Lab., 38(2) :223–242, 1997.

P. Paatero. Construction and analysis of degenerate Parafac models. J. Chemometrics, 14(3) :285–299, 2000.

P. Paatero and U. Tapper. Positive matrix factorization : A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2) :111–126, 1994.

J. Paisley, D.M. Blei, and M.I. Jordan. Bayesian nonnegative matrix factorization with stochastic variational inference. In E.M. Airoldi, D.M. Blei, E.A. Erosheva, and S.E. Fienberg, editors, Handbook of Mixed Membership Models and Their Applications, chapter 11, pages 203–222. CRC Press, 2014.

C.B. Papadias. Globally convergent blind source separation based on a multiuser kurtosis maximization criterion. IEEE Trans. Signal Process., 48(12) :3508–3519, 2000.

K. Pearson. On lines and planes of closest fit to systems of points in space. Philos. Mag., 2(6) :559–572, 1901.

A. Podosinnikova, S. Setzer, and M. Hein. Robust PCA : Optimization of the robust reconstruction error over the Stiefel manifold. In Proc. GCPR, 2014.

A. Podosinnikova, F. Bach, and S. Lacoste-Julien. Rethinking LDA : Moment matching for discrete ICA. In Adv. NIPS, 2015.

A. Podosinnikova, F. Bach, and S. Lacoste-Julien. Beyond CCA : Moment matching for multi-view models. In Proc. ICML, 2016.

J.K. Pritchard, M. Stephens, and P. Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155(2) :945–959, 2000.

L. Qi. Eigenvalues of a real supersymmetric tensor. J. Symbolic Comput., 40 :1302–1324, 2005.

Y. Qi, P. Comon, and L.-H. Lim. Uniqueness of non-negative tensor approximations. IEEE Trans. Inf. Theory, 62(4) :2170–2183, 2016.

E.M. Robeva. Decomposing matrices, tensors, and images. PhD thesis, University of California, Berkeley, 2016.

S.M. Ross. Introduction to Probability Models. Elsevier, 10th edition, 2010.

S. Roweis. EM algorithms for PCA and SPCA. In Adv. NIPS, 1998.

J.-P. Royer, N. Thirion-Moreau, and P. Comon. Computing the polyadic decomposition of nonnegative third order tensors. Signal Process., 91(9) :2159–2171, 2011.

A. Ruhe. On the quadratic convergence of a generalization of the Jacobi method to arbitrary matrices. BIT Numer. Math., 8(3) :210–231, 1968.

S. Senecal and P.-O. Amblard. Bayesian separation of discrete sources via Gibbs sampling. In Proc. ICA, 2000.

S. Senecal and P.-O. Amblard. MCMC methods for discrete source separation. In Proc. AIP, 2001.

A. Shashua and T. Hazan. Non-negative tensor factorization with applications to statistics and computer vision. In Proc. ICML, 2005.

N.D. Sidiropoulos and R. Bro. On the uniqueness of multilinear decomposition of 푁-way arrays. J. Chemometrics, 14(3) :229–239, 2000.

J. Sivic and A. Zisserman. Video Google : A text retrieval approach to object matching in videos. In Proc. ICCV, 2003.

J. Sivic and A. Zisserman. Efficient visual search of videos cast as text retrieval. IEEE Trans. Pattern Anal. Mach. Intell., 31(4) :591–606, 2009.

A. Slapak and A. Yeredor. Charrelation matrix based ICA. In Proc. LVA ICA, 2012a.

A. Slapak and A. Yeredor. Charrelation and charm : Generic statistics incorporating higher-order information. IEEE Trans. Signal Process., 60(10) :5089–5106, 2012b.

R. Socher and L. Fei-Fei. Connecting modalities : Semi-supervised segmentation and annotation of images using unaligned text corpora. In Proc. CVPR, 2010.

L. Song, A. Anandkumar, B. Dai, and B. Xie. Nonparametric estimation of multi-view latent variable models. In Proc. ICML, 2014.

D. Sontag and D.M. Roy. Complexity of inference in latent Dirichlet allocation. In Adv. NIPS, 2011.

A. Souloumiac. Utilisation des statistiques d’ordre supérieur pour le filtrage et la séparation de sources en traitement d’antenne. PhD thesis, Ecole Nationale Supérieure des Télécommunications, 1993.

A. Souloumiac. Nonorthogonal joint diagonalization by combining Givens and hyperbolic rotations. IEEE Trans. Signal Process., 57(6) :2222–2231, 2009a.

A. Souloumiac. Joint diagonalization : Is non-orthogonal always preferable to orthogonal ? In Proc. IEEE CAMSAP, 2009b.

A. Souloumiac and J.-F. Cardoso. Performances en séparation de sources. In Proc. GRETSI, 1993.

A. Stegeman and N.D. Sidiropoulos. On Kruskal’s uniqueness condition for the Candecomp/Parafac decomposition. Linear Algebra Appl., 420(2-3) :540–552, 2007.

G.W. Stewart and J.G. Sun. Matrix Perturbation Theory. Academic Press, 1990.

M. Steyvers and T. Griffiths. Probabilistic topic models. In T.K. Landauer, D.S. McNamara, S. Dennis, and W. Kintsch, editors, Handbook of Latent Semantic Analysis, chapter 21, pages 427–448. Laurence Erlbaum, 2007.

J. Stoyanov. Determinacy of distributions by their moments. In Proc. ICMSM, 2006.

A. Stuart and K. Ord. Kendall’s Advanced Theory of Statistics, Volume 1 : Distribution Theory. Oxford University Press, 6th edition, 1994.

Z. Szabó, B. Póczos, and A. Lőrincz. Undercomplete blind subspace deconvolution. J. Mach. Learn. Res., 8 :1063–1095, 2007.

Z. Szabó, B. Póczos, and A. Lőrincz. Separation theorem for independent subspace analysis and its consequences. Pattern Recogn., 45 :1782–1791, 2012.

Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei. Hierarchical Dirichlet processes. J. Am. Stat. Assoc., 101(476) :1566–1581, 2006.

J.M.F. Ten Berge. Kruskal’s polynomial for 2 × 2 × 2 arrays and a generalization to 2 × 푛 × 푛 arrays. Psychometrika, 56(4) :631–636, 1991.

R. Tibshirani. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Series B, 58(1) :267–288, 1996.

M.E. Tipping and C.M. Bishop. Probabilistic principal component analysis. J. R. Stat. Soc. Series B, 61(3) :611–622, 1999.

K. Todros and A.O. Hero. Measure transformed independent component analysis. Technical report, arXiv :1501.06318v2, 2013.

L.R. Tucker. Some mathematical notes for three-mode factor analysis. Psychometrika, 31(3) :279–311, 1966.

A. Vinokourov and M. Girolami. A probabilistic framework for the hierarchic organisation and classification of document collections. J. Intell. Inf. Syst., 18(2/3) :153–172, 2002.

A. Vinokourov, J.R. Shawe-Taylor, and N. Cristianini. Inferring a semantic representation of text via cross-language correlation analysis. In Adv. NIPS, 2002.

S. Virtanen. Bayesian exponential family projections. Master’s thesis, Aalto University, 2010.

R. Vollgraf and K. Obermayer. Quadratic optimization for simultaneous matrix diagonalization. IEEE Trans. Signal Process., 54(9) :3270–3278, 2006.

X.T. Vu, S. Maire, C. Chaux, and N. Thirion-Moreau. A new stochastic optimization algorithm to decompose large nonnegative tensors. IEEE Signal Process. Lett., 22(10), 2015.

H.M. Wallach, D. Mimno, and A. McCallum. Rethinking LDA : Why priors matter. In Adv. NIPS, 2009a.

H.M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno. Evaluation methods for topic models. In Proc. ICML, 2009b.

C. Wang, X. Liu, Y. Song, and J. Han. Scalable moment-based inference for latent Dirichlet allocation. In Proc. KDD, 2014.

X. Wang and E. Grimson. Spatial latent Dirichlet allocation. In Adv. NIPS, 2008.

Y. Wang, H.-Y. Tung, A. Anandkumar, and A. Smola. Fast and guaranteed tensor decomposition via sketching. In Adv. NIPS, 2015.

P.-A. Wedin. Perturbation bounds in connection with singular value decomposition. BIT Numer. Math., 12(1) :99–111, 1972.

M. Welling and M. Weber. Positive tensor factorization. Pattern Recogn. Lett., 22(12) :1255–1261, 2001.

A. Yeredor. Blind source separation via the second characteristic function. Signal Process., 80(5) :897–902, 2000.

A. Yeredor. Non-orthogonal joint diagonalization in the least-squares sense with application in blind source separation. IEEE Trans. Signal Process., 50(7) :1545–1553, 2002.

A. Yeredor, A. Ziehe, and K.-R. Müller. Approximate joint diagonalization using a natural gradient approach. In Proc. ICA, 2004.

G. Zhou, A. Cichocki, Q. Zhao, and S. Xie. Nonnegative matrix and tensor factorizations : An algorithmic perspective. IEEE Signal Process. Mag., 31(3) :54–65, 2014.

A. Ziehe, P. Laskov, G. Nolte, and K.-R. Müller. A fast algorithm for joint diagonalization with non-orthogonal transformations and its application to blind source separation. J. Mach. Learn. Res., 5 :777–800, 2004.

H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. J. Comp. Graph. Stat., 15(2) :265–286, 2006.


Résumé

Les modèles linéaires latents sont des modèles statistiques puissants pour extraire la structure latente utile à partir de données non structurées par ailleurs. Ces modèles sont utiles dans de nombreuses applications telles que le traitement automatique du langage naturel et la vision artificielle. Pourtant, l’estimation et l’inférence sont souvent impossibles en temps polynomial pour de nombreux modèles linéaires latents et on doit utiliser des méthodes approximatives pour lesquelles il est difficile de récupérer les paramètres.

Plusieurs approches, introduites récemment, utilisent la méthode des moments. Elles permettent de retrouver les paramètres dans le cadre idéalisé d’un échantillon de données infini tiré selon certains modèles, mais elles viennent souvent avec des garanties théoriques dans les cas où ce n’est pas exactement satisfait. Ceci n’est pas le cas pour les méthodes couramment utilisées, fondées sur l’inférence variationnelle et l’échantillonnage, ce qui rend les méthodes à base de moments particulièrement intéressantes.

Dans cette thèse, nous nous concentrons sur les méthodes d’estimation fondées sur l’appariement de moments pour différents modèles linéaires latents. À l’aide d’un lien étroit avec l’analyse en composantes indépendantes, qui est un outil bien étudié par la communauté du traitement du signal, nous présentons plusieurs modèles semiparamétriques pour la modélisation thématique et dans un contexte multi-vues. Nous présentons des méthodes à base de moments ainsi que des algorithmes pour l’estimation dans ces modèles, et nous prouvons pour ces méthodes des résultats de complexité améliorée par rapport aux méthodes existantes. Nous donnons également des garanties d’identifiabilité, contrairement à d’autres modèles actuels. C’est une propriété importante pour assurer leur interprétabilité.

Pour tous les modèles mentionnés, nous effectuons une comparaison expérimentale extensive des algorithmes associés, à la fois sur des données synthétiques et des données réelles. Elle démontre leurs bonnes performances en pratique.

Abstract

Latent linear models are powerful probabilistic tools for extracting useful latent structure from otherwise unstructured data and have proved useful in numerous applications such as natural language processing and computer vision. However, estimation and inference are often intractable for many latent linear models and one has to resort to approximate methods, often with no recovery guarantees.

An alternative approach, which has been popular lately, is based on the method of moments. Such methods often have guarantees of exact recovery in the idealized setting of an infinite data sample and well-specified models, and they also often come with theoretical guarantees in cases where this is not exactly satisfied. This is opposed to more standard and widely used methods based on variational inference and sampling and, therefore, makes moment matching-based methods especially interesting.

In this thesis, we focus on moment matching-based estimation methods for different latent linear models. Using a close connection with independent component analysis, which is a well-studied tool from the signal processing literature, we introduce several semiparametric models in the topic modeling context and for multi-view settings, and we develop moment matching-based estimation methods for these models. These methods come with improved sample complexity results compared to previously proposed methods. The models are supplemented with identifiability guarantees, a necessary property to ensure their interpretability, as opposed to some other widely used models, which are unidentifiable.

For all mentioned models, we perform an extensive experimental comparison of the proposed algorithms on both synthetic and real datasets and demonstrate their promising practical performance.

Mots Clés : modèles thématiques, modèles à variables latentes, méthode des moments
Keywords : topic models, latent variable models, method of moments

Numéro national de thèse : xxxxx