Recurrent Gaussian Processes and Robust Dynamical Modeling
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSIDADE FEDERAL DO CEARA´ CENTRO DE TECNOLOGIA DEPARTAMENTO DE ENGENHARIA DE TELEINFORMATICA´ PROGRAMA DE POS-GRADUA¸C´ AO~ EM ENGENHARIA DE TELEINFORMATICA´ DOUTORADO EM ENGENHARIA DE TELEINFORMATICA´ CESAR´ LINCOLN CAVALCANTE MATTOS RECURRENT GAUSSIAN PROCESSES AND ROBUST DYNAMICAL MODELING FORTALEZA 2017 CESAR´ LINCOLN CAVALCANTE MATTOS RECURRENT GAUSSIAN PROCESSES AND ROBUST DYNAMICAL MODELING Tese apresentada ao Curso de Doutorado em Engenharia de Teleinform´atica do Programa de P´os-Gradua¸c~aoem Engenharia de Teleinform´atica do Centro de Tecnologia da Universidade Federal do Cear´a, como requisito parcial `a obten¸c~ao do t´ıtulo de doutor em Engenharia de Teleinform´atica. Area´ de Concentra¸c~ao: Sinais e sistemas Orientador: Prof. Dr. Guilherme de Alencar Barreto FORTALEZA 2017 Dados Internacionais de Catalogação na Publicação Universidade Federal do Ceará Biblioteca Universitária Gerada automaticamente pelo módulo Catalog, mediante os dados fornecidos pelo(a) autor(a) M39r Mattos, César Lincoln. Recurrent Gaussian Processes and Robust Dynamical Modeling / César Lincoln Mattos. – 2017. 189 f. : il. color. Tese (doutorado) – Universidade Federal do Ceará, Centro de Tecnologia, Programa de Pós-Graduação em Engenharia de Teleinformática, Fortaleza, 2017. Orientação: Prof. Dr. Guilherme de Alencar Barreto. 1. Processos Gaussianos. 2. Modelagem dinâmica. 3. Identificação de sistemas não-lineares. 4. Aprendizagem robusta. 5. Aprendizagem estocástica. I. Título. CDD 621.38 CESAR´ LINCOLN CAVALCANTE MATTOS RECURRENT GAUSSIAN PROCESSES AND ROBUST DYNAMICAL MODELING A thesis presented to the PhD course in Teleinformatics Engineering of the Graduate Program on Teleinformatics Engineering of the Center of Technology at Federal University of Cear´ain fulfillment of the the requirement for the degree of Doctor of Philosophy in Teleinformatics Engineering. Area of Concentration: Signals and systems Approved on: August 25, 2017 EXAMINING COMMITTEE Prof. Dr. Guilherme de Alencar Barreto (Supervisor) Universidade Federal do Cear´a(UFC) Prof. Dr. Ant^onio Marcos Nogueira Lima Universidade Federal de Campina Grande (UFCG) Prof. Dr. Oswaldo Luiz do Valle Costa Universidade de S~aoPaulo (USP) Prof. Dr. Jos´eAilton Alencar Andrade Universidade Federal do Cear´a(UFC) For my beloved family who always supports me: Fernando Lincoln, Carmen and Fer- nanda. ACKNOWLEDGEMENTS I would like to thank my supervisor, Prof. Guilherme de Alencar Barreto, for his long-time support, partnership and academic guidance. Guilherme knows a fine balance between a focused supervision and the freedom to explore and experiment. His patience, encouragement and dedication throughout all the PhD journey were invaluable. I thank my many friends in the PPGETI, for all the intra and extra campus conversations about the most diverse topics. You were very important to proportionate a suitable environment for all this period. I specially thank Jos´eDaniel Alencar and Amauri Holanda, with whom I had many valuable discussions. I am very grateful to Prof. Neil Lawrence, for kindly receiving me in his research group in the University of Sheffield. Neil has been a great intellectual inspiration and provided a significant boost to my studies. The way he works in his group and handles machine learning research will always have an influence on me. I was very lucky to be around Neil's ML group in SITraN for six months. I learned so much from the incredible people I met there. All the experiences we shared were critical for my adaptation and will be treasured. Thanks a lot, everyone! A special thanks to my co-authors, Andreas Damianou and Zhenwen Dai, who dedicated a lot of time to help me with my research (and are great guys!). I acknowledge the financial support of FUNCAP (for my PhD scholarship) and CAPES (for my time abroad via the PDSE program). Finally, this thesis is dedicated to my family, Fernando Lincoln, Carmen and Fernanda, whose support was fundamental to the completion of this work. \It is not knowledge, but the act of learning, not possession but the act of getting there, which grants the greatest enjoyment." (Carl Friedrich Gauss) RESUMO O estudo de sistemas din^amicos encontra-se disseminado em v´arias ´areas do conhecimento. Dados sequenciais s~aogerados constantemente por diversos fen^omenos, a maioria deles n~ao pass´ıveis de serem explicados por equa¸c~oes derivadas de leis f´ısicas e estruturas conhecidas. Nesse contexto, esta tese tem como objetivo abordar a tarefa de identifica¸c~aode sistemas n~aolineares, por meio da qual s~ao obtidos modelos diretamente a partir de observa¸c~oes sequenciais. Mais especificamente, n´os abordamos cen´arios desafiadores, tais como o aprendizado de rela¸c~oes temporais a partir de dados ruidosos, dados contendo valores discrepantes (outliers) e grandes conjuntos de dados. Na interface entre estat´ısticas, ci^encia da computa¸c~ao, an´alise de dados e engenharia encontra-se a comunidade de aprendizagem de m´aquina, que fornece ferramentas poderosas para encontrar padr~oesa partir de dados e fazer previs~oes. Nesse sentido, seguimos m´etodos baseados em Processos Gaussianos (PGs), uma abordagem probabil´ıstica pr´atica para a aprendizagem de m´aquinas de kernel. A partir de avan¸cosrecentes em modelagem geral baseada em PGs, introduzimos novas contribui¸c~oes para o exerc´ıcio de modelagem din^amica. Desse modo, propomos a nova fam´ılia de modelos de Processos Gaussianos Recorrentes (RGPs, da sigla em ingl^es) e estendemos seu conceito para lidar com requisitos de robustez a outliers e aprendizagem estoc´astica escal´avel. A estrutura hier´arquica e latente (n~ao-observada) desses modelos imp~oeexpress~oesn~ao-anal´ıticas, que s~aoresolvidas com a deriva¸c~ao de novos algoritmos variacionais para realizar infer^encia determinista aproximada como um problema de otimiza¸c~ao. As solu¸c~oes apresentadas permitem a propaga¸c~ao da incerteza tanto no treinamento quanto no teste, com foco em realizar simula¸c~aolivre. N´os avaliamos em detalhe os m´etodos propostos com benchmarks artificiais e reais da ´area de identifica¸c~ao de sistemas, assim como outras tarefas envolvendo dados din^amicos. Os resultados obtidos indicam que nossas proposi¸c~oes s~ao competitivas quando comparadas a modelos dispon´ıveis na literatura, mesmo nos cen´arios que apresentam as complica¸c~oes mencionadas, e que a modelagem din^amica baseada em PGs ´euma ´area de pesquisa promissora. Palavras-chave: Processos Gaussianos. Modelagem din^amica. Identifica¸c~aode sistemas n~ao-lineares. Aprendizagem robusta. Aprendizagem estoc´astica. ABSTRACT The study of dynamical systems is widespread across several areas of knowledge. Sequential data is generated constantly by different phenomena, most of them that we cannot explain by equations derived from known physical laws and structures. In such context, this thesis aims to tackle the task of nonlinear system identification, which builds models directly from sequential measurements. More specifically, we approach challenging scenarios, such as learning temporal relations from noisy data, data containing discrepant values (outliers) and large datasets. In the interface between statistics, computer science, data analysis and engineering lies the machine learning community, which brings powerful tools to find patterns from data and make predictions. In that sense, we follow methods based on Gaussian Processes (GP), a principled, practical, probabilistic approach to learning in kernel machines. We aim to exploit recent advances in general GP modeling to bring new contributions to the dynamical modeling exercise. Thus, we propose the novel family of Recurrent Gaussian Processes (RGPs) models and extend their concept to handle outlier-robust requirements and scalable stochastic learning. The hierarchical latent (non- observed) structure of those models impose intractabilities in the form of non-analytical expressions, which are handled with the derivation of new variational algorithms to perform approximate deterministic inference as an optimization problem. The presented solutions enable uncertainty propagation on both training and testing, with focus on free simulation. We comprehensively evaluate the introduced methods with both artificial and real system identification benchmarks, as well as other related dynamical settings. The obtained results indicate that our propositions are competitive when compared to models available in the literature within the aforementioned complicated setups and that GP-based dynamical modeling is a promising area of research. Keywords: Gaussian Processes. Dynamical modeling. Nonlinear system identification. Robust learning. Stochastic learning. LIST OF FIGURES Figure 1 { Block diagrams of common evaluation methodologies for system identi- fication. ................................... 25 Figure 2 { Examples of GP samples with different covariance functions. 34 Figure 3 { Graphical model detailing the relations between the variables in a standard GP model. 36 Figure 4 { Samples from the GP model before and after the observations. 38 Figure 5 { Illustration of the Bayesian model selection procedure. 39 Figure 6 { Simplified graphical models for the standard and augmented sparse GPs. 47 Figure 7 { Comparison of standard GP and variational sparse GP regression. 48 Figure 8 { Simplified graphical model for the GP-LVM. 49 Figure 9 { Graphical model for the Deep GP. 51 Figure 10 { Graphical model for the GP-NARX. 56 Figure 11 { Graphical model for the GP-SSM. 59 Figure 12 { RGP graphical model with H hidden