Tilburg University Multilevel Modeling for Data Streams with Dependent Observations

Tilburg University Multilevel modeling for data streams with dependent observations Ippel, L. Publication date: 2017 Document Version Publisher's PDF, also known as Version of record Link to publication in Tilburg University Research Portal Citation for published version (APA): Ippel, L. (2017). Multilevel modeling for data streams with dependent observations. [s.n.]. General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Download date: 25. sep. 2021 Multilevel Modeling for Data Streams with Dependent Observations Lianne Ippel Tilburg University Multilevel Modeling for Data Streams with Dependent Observations PROEFSCHRIFT ter verkrijging van de graad van doctor aan Tilburg University op gezag van de rector magnificus, prof.dr. E.H.L. Aarts, in het openbaar te verdedigen ten overstaan van een door het college voor promoties aangewezen commissie in de aula van de Universiteit op vrijdag 13 oktober 2017 om 10.00 uur door c 2017 L. Ippel All Rights Reserved. Gijsberdina Janna Elisabeth Ippel ⃝ geboren te Werkendam Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without written per- mission of the author. Printing was financially supported by Tilburg University. ISBN: 978-94-6295-757-2 Printed by: Proefschriftmaken || Vianen Cover design: Faboosh design & art Multilevel Modeling for Data Streams with Dependent Observations PROEFSCHRIFT ter verkrijging van de graad van doctor aan Tilburg University op gezag van de rector magnificus, prof.dr. E.H.L. Aarts, in het openbaar te verdedigen ten overstaan van een door het college voor promoties aangewezen commissie in de aula van de Universiteit op vrijdag 13 oktober 2017 om 10.00 uur door c 2017 L. Ippel All Rights Reserved. Gijsberdina Janna Elisabeth Ippel ⃝ geboren te Werkendam Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without written per- mission of the author. Printing was financially supported by Tilburg University. ISBN: 978-94-6295-757-2 Printed by: Proefschriftmaken || Vianen Cover design: Faboosh design & art Promotor: prof.dr. J. K. Vermunt Copromotor: prof.dr. M.C. Kaptein Overige leden van de Promotiecommissie: prof.dr. G.J.P. van Breukelen Preface prof.dr. M.E. Timmerman dr. M. Postma dr. M.A. Croon One of my early-childhood memories comes from second grade at primary school. I am standing at the desk of my teacher, a five-year old and a bit too witty, asking my teacher when I would finally learn how to write and how to do math. Done with playing with blocks and dolls, I wanted to learn more! However, I had to wait one more year before I could start writing and calculating. The eagerness to broaden my skills and deepen my knowledge has never left me. Years later, while finishing my Bachelor’s degree in Sociology, I decided to develop myself even more and I applied for the research master at the faculty of Social and Behavioral Sciences. I think it was not more than a month in the program, when Guy Moors ap- proached me. He asked me which topic I wanted to study during my PhD project. Honored, and admittedly a little stressed out because I didn’t feel like I had proven myself to be worthy of this position yet, we discussed several topics. Later in the program, I got the opportunity to work with Maurits Kaptein on my Master’s The- sis. After the research master, he became my PhD supervisor in the following four years. The book you are holding right now is the result of four years work. When I started this project, I never thought I was able to write the code, do the math, or have the writing skills to do this. Obviously, I have not accomplished the work on my own, but you will read more about that at the end of this book (Dankwoord). Lianne Ippel May, 2017 Promotor: prof.dr. J. K. Vermunt Copromotor: prof.dr. M.C. Kaptein Overige leden van de Promotiecommissie: prof.dr. G.J.P. van Breukelen Preface prof.dr. M.E. Timmerman dr. M. Postma dr. M.A. Croon One of my early-childhood memories comes from second grade at primary school. I am standing at the desk of my teacher, a five-year old and a bit too witty, asking my teacher when I would finally learn how to write and how to do math. Done with playing with blocks and dolls, I wanted to learn more! However, I had to wait one more year before I could start writing and calculating. The eagerness to broaden my skills and deepen my knowledge has never left me. Years later, while finishing my Bachelor’s degree in Sociology, I decided to develop myself even more and I applied for the research master at the faculty of Social and Behavioral Sciences. I think it was not more than a month in the program, when Guy Moors ap- proached me. He asked me which topic I wanted to study during my PhD project. Honored, and admittedly a little stressed out because I didn’t feel like I had proven myself to be worthy of this position yet, we discussed several topics. Later in the program, I got the opportunity to work with Maurits Kaptein on my Master’s The- sis. After the research master, he became my PhD supervisor in the following four years. The book you are holding right now is the result of four years work. When I started this project, I never thought I was able to write the code, do the math, or have the writing skills to do this. Obviously, I have not accomplished the work on my own, but you will read more about that at the end of this book (Dankwoord). Lianne Ippel May, 2017 vii Contents Preface v 1 Introduction 1 1.1 The era of data streams ............................ 1 1.2 Outline ..................................... 2 1.3 Contributions to the literature ........................ 7 2 Dealing with Data Streams: an Online, Row-by-Row, Estimation Tutorial. 9 2.1 Introduction .................................. 10 2.2 Dealing with Big Data: the options ..................... 12 2.3 From Conventional Analysis to Online Analysis ............. 14 2.3.1 Sample mean ............................. 14 2.3.2 Sample variance ........................... 15 2.3.3 Sample covariance .......................... 16 2.3.4 Linear regression ........................... 17 Computation time of linear regression ............... 18 2.3.5 Effect size η2 (ANOVA) ....................... 18 2.4 Online Estimation using Stochastic Gradient Descent .......... 21 2.4.1 Offline Gradient Descent ...................... 21 2.4.2 Online or Stochastic Gradient Descent ............... 23 2.4.3 Logistic regression: an Example of the Usage of SGD ...... 24 2.5 Online learning in practice: logistic regression in a data stream ..... 25 2.5.1 Switching to a safe well ....................... 25 2.5.2 Results ................................. 26 2.5.3 Learn rates ............................... 26 2.5.4 Starting values ............................ 27 2.6 Considerations analyzing Big Data and Data Streams .......... 28 2.7 Discussion ................................... 30 Appendix 2.A Online Correlation ........................ 31 Appendix 2.B Online linear regression ..................... 32 Appendix 2.C Stochastic Gradient Decent – Logistic regression ....... 33 Appendix 2.D Wells data example ........................ 34 vii Contents Preface v 1 Introduction 1 1.1 The era of data streams ............................ 1 1.2 Outline ..................................... 2 1.3 Contributions to the literature ........................ 7 2 Dealing with Data Streams: an Online, Row-by-Row, Estimation Tutorial. 9 2.1 Introduction .................................. 10 2.2 Dealing with Big Data: the options ..................... 12 2.3 From Conventional Analysis to Online Analysis ............. 14 2.3.1 Sample mean ............................. 14 2.3.2 Sample variance ........................... 15 2.3.3 Sample covariance .......................... 16 2.3.4 Linear regression ........................... 17 Computation time of linear regression ............... 18 2.3.5 Effect size η2 (ANOVA) ....................... 18 2.4 Online Estimation using Stochastic Gradient Descent .......... 21 2.4.1 Offline Gradient Descent ...................... 21 2.4.2 Online or Stochastic Gradient Descent ............... 23 2.4.3 Logistic regression: an Example of the Usage of SGD ...... 24 2.5 Online learning in practice: logistic regression in a data stream ..... 25 2.5.1 Switching to a safe well ....................... 25 2.5.2 Results ................................. 26 2.5.3 Learn rates ............................... 26 2.5.4 Starting values ............................ 27 2.6 Considerations analyzing Big Data and Data Streams .......... 28 2.7

Tilburg University Multilevel Modeling for Data Streams with Dependent Observations

A Primer for Analyzing Nested Data: Multilevel Modeling in SPSS Using

A Bayesian Multilevel Model for Time Series Applied to Learning in Experimental Auctions

Introduction to Multilevel Modeling

An Introduction to Bayesian Multilevel (Hierarchical) Modelling Using

3 Diagnostic Checks for Multilevel Models 141 Eroscedasticity, I.E., Non-Constant Variances of the Random Eﬀects

Multilevel Analysis

Multilevel Linear Models: the Basics

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Multilevel Models for Repeated Binary Outcomes: Attitudes and Vote Over the Electoral Cycle

User-Friendly Bayesian Regression Modeling: a Tutorial with Rstanarm and Shinystan

Fundamentals of Hierarchical Linear and Multilevel Modeling 1 G

Actuarial Applications of Hierarchical Modeling