Master Thesis in Statistics and Data Mining

Bayesian variable selection in linear mixed effects models

Vuong Tran

Division of Statistics and Machine Learning
Department of Computer and Information Science
Linköping University


Copyright The publishers will keep this document online on the Internet – or its possible replacement – from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Vuong Tran


Supervisors Bertil Wegmann, Linda Wänström

Examiner Mattias Villani



Abstract

Variable selection techniques have been well researched and used in many different fields. There is a rich literature on Bayesian variable selection in regression models, but only a small part of it concerns mixed effects models.

The topic of this thesis is Bayesian variable selection in linear mixed effects models. The chosen approach is to induce different shrinkage priors. Both unimodal shrinkage priors and spike-and-slab priors are used and compared. The distributions chosen, either as unimodal priors or as components of the spike-and-slab priors, are the Normal distribution, the Student-t distribution and the Laplace distribution.

Both simulation studies and a real dataset study have been carried out, with the intention of investigating and evaluating how well the chosen distributions perform as shrinkage priors.

Results obtained from the real dataset show that spike-and-slab priors yield a stronger shrinkage effect than unimodal priors do. However, inducing spike-and-slab priors carelessly, without considering whether the size of the data is sufficiently large, may lead to poor model parameter estimates. Results from the simulation studies indicate that a mixture of Laplace distributions for both the spike and slab components is the prior that yields the strongest shrinkage effect among the investigated shrinkage priors.



Acknowledgements

I would like to thank Bertil Wegmann and Linda Wänström for giving me the opportunity to work with this interesting topic. The meetings were always helpful and inspiring. I am really grateful for the fast responses, guidance and all the support I received during the whole course. Thanks for the great supervision.

I would also like to thank Allan Gholmi, for doing a great job with the opposition. Your suggestions and opinions have definitely improved the quality of this report.



Contents
1 Introduction
  1.1 Background
  1.2 Previous work
  1.3 Thesis Objective
2 Theory
  2.1 Variable selection in linear regression models
    2.1.1 General definition
    2.1.2 Objective of variable selection
  2.2 Bayesian variable selection in linear regression models
    2.2.1 Introduction
    2.2.2 Concept and properties
    2.2.3 Shrinkage priors
3 Methods
  3.1 Random Intercept Model (RIM)
    3.1.1 Unimodal shrinkage priors
    3.1.2 Spike-and-slab smoothing priors
  3.2 Linear mixed effects (LME) model
    3.2.1 Prior choice for the fixed effects
    3.2.2 Prior choice for the random effects
  3.3 Evaluation measurements
    3.3.1 Credibility intervals
    3.3.2 Effective sample size
    3.3.3 Potential scale reduction
    3.3.4 MCMC trajectories and cumulative mean plots
    3.3.5 Root mean squared error
4 Data
  4.1 Data simulations
    4.1.1 Study 1 – Random effects study
    4.1.2 Study 2 – Fixed effects study
    4.1.3 Study 3 – Fixed and random effects study
    4.1.4 Study 4 – Fixed and random effects with correlation study
    4.1.5 Study 5 – Fixed and random effects in higher dimensions study
  4.2 Real dataset
5 Results
  5.1 Study 1 – Random effects study
  5.2 Study 2 – Fixed effects study
  5.3 Study 3 – Fixed and random effects study
  5.4 Study 4 – Fixed and random effects with correlation study
  5.5 Study 5 – Fixed and random effects in higher dimensions study
  5.6 Real dataset
    5.6.1 Random intercept model
    5.6.2 Linear mixed effects model
6 Discussion
  6.1 Simulation studies
  6.2 Real dataset
7 Conclusions
8 Literature


1 Introduction

1.1 Background
Regression analysis is a common statistical technique for investigating and modeling the relationship between dependent and independent variables. This modeling technique has been widely used in many fields, including engineering, economics, management, and the biological, chemical, physical and social sciences. One could say that regression analysis may be the most commonly used statistical technique (Montgomery et al. 2012). As an example of a problem where regression analysis may be useful, suppose that a school supervisor wants to examine how well first-year high school students perform in a math test. From previous experience, the supervisor knows that, in general, students who do the recommended math assignments and attend classes normally do well in the test. If we let $y$ represent the math test score, and $x$ represent the number of math classes attended, then we could model the relationship between $y$ and $x$ as a simple linear regression model.

Repeated measures are commonly collected for analysis in research areas such as clinical trials, biology, sociology, and economic science (Cai and Dunson, 2008). Data on students within school classes can be seen as a repeated measures study: a class can be seen as an observation, and responses from students in the same class can be seen as repeated measures for the class. In contrast to studies that collect a single measure per subject, studies on subjects within groups (e.g. children in families, students in classes) have the added complication of within-group dependence. Such dependence may lead to variation in the average level of responses among groups, which justifies letting the regression coefficients in a regression model be group-specific. A regression model with only group-specific regression coefficients is called a random effects model. However, there are often variables that are correlated with the population-averaged response, which implies fixed effects components that are not group-specific. Linear regression models containing both fixed and random effects components are known as linear mixed models (Bates, 2005).

The problem of finding an optimal model from a set of plausible models has troubled the minds of many statisticians. In many statistical analyses the approach to finding the optimal model is to select the best subset of all available variables to include in the model. Variable selection in the linear mixed effects model can be done using both frequentist and Bayesian approaches. The fundamental of Bayesian data analysis is the process of inducing a probability model on both the parameters and the data, and summarizing the results by a posterior distribution on the parameters of the model. Gelman et al. (2014) summarized the process of Bayesian data analysis into three steps: 1) Set up a full probability model: a joint probability distribution for all observable and unobservable quantities in a problem.


2) Obtain the appropriate posterior distribution: calculate and interpret the conditional probability distribution of the unobserved quantities of interest, given the observed data.

3) Evaluate the fit of the model and the implications of the resulting posterior distribution: how well does the model fit the data, and are the substantive conclusions reasonable?

Bayesian inference about a parameter $\theta$ in terms of a probability statement given the observed data $y$ can be written as

$p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta)$ (1.1)

Or in words: the posterior distribution $p(\theta \mid y)$ is proportional ($\propto$) to the prior distribution $p(\theta)$ times the likelihood $p(y \mid \theta)$. Equation (1.1) is Bayes' theorem without the normalizing constant $p(y)$.
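Written out with the normalizing constant, the full form of (1.1) is

$p(\theta \mid y) = \dfrac{p(\theta)\, p(y \mid \theta)}{p(y)}, \qquad p(y) = \int p(\theta)\, p(y \mid \theta)\, d\theta$

where the marginal likelihood $p(y)$ ensures that the posterior integrates to one.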

One might prefer Bayesian methods over the frequentist approach when working with random effects, given practical advantages such as the ones mentioned by Kinney and Dunson (2008):
 Lack of reliance on asymptotic approximations to the marginal likelihood or to the distribution of the likelihood ratio test statistic.
 Allowance for including prior information, to fully account for model uncertainty through a probabilistic framework.

There is a rich literature on both Bayesian (e.g. Lee et al., 2003; Tadesse et al., 2005; Liang et al., 2008; O'Hara and Sillanpää, 2009) and frequentist methods for variable selection in linear regression with focus on the fixed effects component, but variable selection for mixed effects models has received much less attention. One reason for this is the common perspective that the primary focus of inference should be on the fixed effects, while the random effects are seen as nuisance parameters, needed to account for the complications of within-subject dependence (Kinney and Dunson, 2007). There have only been a few studies on Bayesian variable selection for the linear mixed effects model. Most work regarding linear mixed effects models has focused on the variances of the random effects, where a very small variance around zero for the posterior distribution of a group-specific regression coefficient implies no random effect. This approach has received most attention for high-dimensional problems and might be too simplistic for low-dimensional random effects problems. Therefore, it has been suggested to implement other methods for Bayesian variable selection in random effects models in the low-dimensional space, such as using different priors, e.g. unimodal shrinkage priors or spike-and-slab priors, as suggested by Frühwirth-Schnatter & Wagner (2010).


1.2 Previous work
O'Hara and Sillanpää (2009) introduced some important concepts and properties regarding the methodology of Bayesian variable selection in linear regression models. Concepts and properties that were mentioned are sparseness, tuning, adaption, analytical integration and model averaging. Different Bayesian variable selection methods were compared, such as Gibbs Variable Selection, Stochastic Search Variable Selection (SSVS), and adaptive shrinkage with Jeffreys prior or Laplace prior. The chosen methods were compared with respect to an overall qualitative assessment of different aspects, such as computational speed and effective sample size. The conclusion of interest from their simulation studies is that the Bayesian LASSO appears not to shrink the variables with small effects, as the spike is relatively flat. They suggest that sparsity has to be forced onto the model, since the data itself may not demand it.

Frühwirth-Schnatter & Wagner (2010) were interested in finding unit-specific random effects that deviate from the overall mean, instead of the classical discrimination where the random effects either exist or not. This was achieved by using different smoothing unimodal shrinkage priors and smoothing spike-and-slab priors. They considered unimodal distributions such as the Normal, Student-t and Laplace as shrinkage priors. The same distributions were used as components for the spike-and-slab priors. Four different simulated datasets were produced to compare the chosen priors. The diagnostic measurement they used to evaluate the shrinkage priors was the root mean square error (RMSE) for different parameters. Frühwirth-Schnatter and Wagner (2010) could not identify a consistently best component distribution from the simulation study; however, they could distinguish a few notable features. The results indicated that spike-and-slab priors are preferable as a random effects distribution if individual variable selection is of interest.

Kinney and Dunson (2008) focused on the Bayesian approach for selecting fixed and random effects simultaneously. Their target models are the linear and logistic mixed effects models. They discussed how Markov chain Monte Carlo (MCMC) algorithms are typically used to produce autocorrelated draws from the parameters' joint posterior distribution, when the posterior distribution cannot be derived analytically. Other useful Bayesian methods, such as the inclusion probability for a variable to be included in a model, and the importance of choosing appropriate priors, were also mentioned. They concluded that chains obtained by using Gibbs methods tend to have slow mixing and convergence for the random effects model parameters.

Cai and Dunson (2008) followed the same Bayesian approach as Kinney and Dunson (2008), but they were looking into the wider class of generalized linear mixed models (GLMMs). They utilized the common tactical approach of using finite mixture priors with a point mass at zero to reduce the fixed effects, while the random effects component is controlled by choosing a careful decomposition of the random effects covariance matrix. A standard prior for the parameters in the covariance matrix is the Wishart prior.

1.3 Thesis Objective
The lack of existing literature on Bayesian variable selection in linear mixed effects (LME) models, compared to other mixed regression methods, has motivated me to work with this type of modeling. The Bayesian approach will be used since it has clear advantages over the frequentist approach when it comes to applying LME models. The objective of this thesis is to do Bayesian variable selection in linear mixed effects models by inducing different types of spike-and-slab priors. Adhering to Frühwirth-Schnatter and Wagner's (2010) suggestions, I will mainly focus on the low-dimensional space. The following bullet points concretize the aims:

 Find out how large the dataset needs to be, in order to be sufficiently large when inducing different spike-and-slab priors on both the fixed effects and the random effects in linear mixed models.

 Compare unimodal shrinkage priors and spike-and-slab shrinkage priors to identify advantages and disadvantages of using spike-and-slab priors.

 Identify which of the chosen spike-and-slab priors is preferable.

The remainder of the thesis is structured as follows: Chapter 2 provides the conceptual theory needed to understand the main methods applied in this thesis. Chapter 3 describes the methodology and models applied in the thesis. Chapter 4 gives a general description of the datasets used in the thesis, followed by the results reported in chapter 5. The relevant findings are summarized and discussed in chapter 6. Finally, conclusions, as well as suggestions for future work, are drawn in chapter 7.


2 Theory
This chapter describes the theory necessary to understand the methods used in the thesis. A general definition and the goals of variable selection in linear regression models are described in section 2.1. Section 2.2 describes the Bayesian approach to variable selection in linear regression models. The basic concepts, properties and shrinkage priors used in the thesis are also introduced here.

2.1 Variable selection in linear regression models

2.1.1 General definition
Variable selection is a statistical technique for selecting, from a potentially large set of observed variables, a smaller subset which by itself can provide a good explanation of the phenomenon under study. More formally, variable selection in linear regression models refers to the task of discovering the best subset of explanatory variables by estimating $\boldsymbol{\beta} = \{\beta_1, \beta_2, \dots, \beta_p\}$, a set of coefficients that describe the relationship between a set of independent variables $\boldsymbol{X} = \{X_1, X_2, \dots, X_p\}$ and the dependent variable $y$. For example, consider a simple linear regression model extended with $p$ independent variables,

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon_i$ (2.1)

where $i = 1, 2, \dots, n$ indexes observations and the residuals are assumed to be Normally distributed with zero mean and unknown variance, $\epsilon_i \sim N(0, \sigma_e^2)$. Since each of the $p$ independent variables can either be included in the model or not, there exist $2^p$ different potential models. Stepwise regression, with either forward selection or backward elimination, may be used to find the best subset with the most influential explanatory variables on the dependent variable. However, if $p$ is moderately large, stepwise regression becomes impractical. Therefore, efficient search algorithms, such as the SSVS of George and McCulloch (1997), are needed to explore all possible $2^p$ models, or at least the most promising ones. Apart from finding the most influential independent variables, investigating the correlation and structural relationships between the independent variables is also of interest. High correlation between independent variables implies redundancy. Two variables that have no influence on the dependent variable by themselves may seem useless; however, an interaction between them may yield a significant influence on the dependent variable (Guyon and Elisseeff, 2003).


2.1.2 Objective of variable selection
There are two main goals of using variable selection methods in statistical analysis: (1) Prediction: construct an effective method to predict future observations, and eliminate redundant predictors that cause over-fitting and might worsen predictive performance. (2) Interpretation: find the most influential predictors that explain the main variation, which leads to a better understanding of the underlying process. Accomplishing both of these goals simultaneously is typically not possible, since prediction accuracy is often compromised by conciseness and interpretability. When the aim is to explain the response, the important thing is to select only those explanatory variables that have a high influence on the relationship between the response and the predictors. When prediction of the response is the aim, the actual choice of explanatory variables is of less importance, as long as the fitted values of the response variable are good. However, selecting only the important explanatory variables pays off in terms of optimizing the prediction error on new data. Optimal prediction will rarely be achieved by a single regression model without some form of model averaging (Rockova, 2013).

2.2 Bayesian variable selection in linear regression models

2.2.1 Introduction
In the Bayesian framework, the variable selection problem is transformed into a parameter estimation problem. Often, the task is to estimate the marginal posterior probability that a variable should be included in the model. Being able to correctly estimate predictors as having (nearly) zero or non-zero effects is important: removing predictors with non-zero effects leads to biased estimates, while including predictors with zero effects leads to losses in estimation precision and predictive performance of the model (Malsiner-Walli and Wagner, 2016). A commonly used computational method for fitting Bayesian models is the Markov chain Monte Carlo (MCMC) technique, which is used to produce autocorrelated draws from the joint posterior distribution of the parameters. For more details about this method, see Robert and Casella (2004). In many studies, the available candidate predictors have been chosen because there is a clear expectation that they influence the response, which makes the goal to infer how much each predictor contributes to explaining the outcome of the response variable. For these studies, the best strategy may therefore be to fit the full model and then interpret the sizes of the posterior estimates of the parameters in terms of their importance (O'Hara and Sillanpää, 2009).


2.2.2 Concept and properties
O'Hara and Sillanpää's (2009) description of concepts and properties can be summarized as follows.

Sparseness: The degree of sparseness, i.e. how complex the model should be, is an important property. An approach to handling model complexity is to use the prior probability of variable inclusion; smaller values of the prior inclusion probability lead to sparser models. George and McCulloch (1993) suggest a value of 0.5, which makes all models equally likely a priori and may improve the mixing properties of the MCMC sampler. Barbieri and Berger (2004) showed that the median probability model, i.e. the model including the predictors with posterior inclusion probability larger than 0.5, is the best model with respect to predictive performance. One can consider the degree of sparseness as a trade-off between bias and variance: in general, large sets of explanatory variables are desirable for prediction, while small sets are good for inference.

Tuning and adaption: A practical problem when implementing variable selection is how to tune the model to ensure well-mixing MCMC chains, i.e. allowing the sampler to jump efficiently between the inclusion and exclusion states. Components that can be changed by the user are, for example, the choice of prior distributions and the hyperparameter settings.

Model averaging and analytical integration: One characteristic of the Bayesian approach is the ability to integrate out unwanted parameters. This is useful if the marginal posterior distribution of every parameter is desired, which is in general always the case, since the parameters' marginal posterior probabilities are essential in the context of Bayesian variable selection.

2.2.3 Shrinkage priors

2.2.3.1 Unimodal Global-Local priors
The term global-local shrinkage prior refers to continuous shrinkage priors that can be represented as global-local (GL) mixtures of Gaussians,

$\theta_j \sim N(0, \Psi_j \tau), \quad \Psi_j \sim f, \quad \tau \sim g$ (2.2)

where $\theta_j$ are the parameters of interest, $\tau$ is the global shrinkage parameter, which shrinks small values towards zero, while the $\Psi_j$ are local scales, which allow the degree of shrinkage to vary; $f$ and $g$ are probability density functions. If one chooses a $g$ that puts a lot of mass near zero and an appropriate $f$, then the GL priors in (2.2) have the desirable property of shrinking insignificant values towards zero while at the same time having heavy tails, thus allowing the remaining values to deviate considerably from zero.

2.2.3.2 Spike-and-Slab prior
Many Bayesian variable selection methods use mixtures of two component priors: a spike component, concentrating its mass close to zero, which allows shrinkage of small effects to zero, and a slab component, which has a flat distribution, thus allowing a wide range of plausible effect values (Malsiner-Walli and Wagner, 2016). Spike-and-slab priors were introduced by Mitchell and Beauchamp (1988), further developed and made popular by George and McCulloch (1993; 1997), with later adjustments by Ishwaran and Rao (2005). Basically, there are two different types of spikes:
1) Absolutely continuous spike: specified by an absolutely continuous distribution. In practice, any unimodal continuous distribution with a mode at zero is applicable.
2) Dirac spike: specified by a point mass at zero.

Both of these spikes have their own advantages and disadvantages. The Dirac spike has the advantage of easier interpretation of the obtained results, while the absolutely continuous spike is a better choice if computational convenience is of interest.


3 Methods
Inspired by Frühwirth-Schnatter & Wagner (2010), the distributions that will be used as smoothing priors are the Normal distribution, the Student-t distribution and the double exponential distribution, also known as the Laplace distribution. Frühwirth-Schnatter & Wagner (2010) show in their article that there is almost no difference between using an absolutely continuous spike or a Dirac spike as a component of the spike-and-slab priors. Hence, only absolutely continuous spike components are used in this thesis, since this has the benefit of better computational speed compared to the classical Dirac spike used by Mitchell and Beauchamp (1988).

3.1 Random Intercept Model (RIM)
Recall the made-up example with the school supervisor who wanted to analyze how well students in a specific class performed in a math test using a simple linear regression model. Suppose now that the supervisor wants to expand his investigation; hence he collects more data from other classes to investigate whether there are any differences in math results between the classes. The well-educated supervisor knows that regression analysis with binary indicator variables is an appropriate approach to apply. However, the drawback of this approach, w.r.t. his intentions with the investigation, is that there may be too many indicator variables to introduce. He did some research and found out that an alternative approach to model this problem is to use a hierarchical modeling technique. He assumes that the intercept is likely to differ between classes if the within-subject dependencies are taken into consideration. Based on this assumption, he can use model (3.1) for his new project:

$y_{ij} = \boldsymbol{X}_{ij}' \boldsymbol{\beta} + \alpha_j + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma_e^2), \qquad \alpha_j = \alpha_0 + a_j$ (3.1)

Model (3.1) is formally known as the random intercept model (RIM), where $y_{ij}$ denotes the $i$:th response for subject $j$, with $i = 1, \dots, n_j$ observations and $j = 1, \dots, J$ subjects. $\boldsymbol{X}_{ij}'$ is a $(1 \times d)$ covariate vector for the unknown fixed regression coefficients $\boldsymbol{\beta} = (\beta_1, \beta_2, \dots, \beta_d)'$ of dimension $d$. $a_j$ is the subject-specific deviation from the overall intercept $\alpha_0$. The random intercepts $\alpha_1, \dots, \alpha_J$ are assumed to be independent given a random hyperparameter $\boldsymbol{\theta}$ with prior $p(\boldsymbol{\theta})$. The smoothing prior $p(\alpha_1, \dots, \alpha_J)$, which ties all the random intercepts together and encourages shrinkage towards the overall intercept, is of interest. Misspecification of this distribution may lead to inefficient and even inconsistent estimation of the regression coefficients $\boldsymbol{\beta}$ (Frühwirth-Schnatter & Wagner, 2010). Practically all priors can be represented in a hierarchical structure:

$\alpha_j \mid \psi_j \sim N(0, \psi_j), \qquad \psi_j \mid \boldsymbol{\theta} \sim p(\psi_j \mid \boldsymbol{\theta}),$ (3.2)

where $\alpha_1 \mid \psi_1, \alpha_2 \mid \psi_2, \dots, \alpha_J \mid \psi_J$ are independent and $p(\psi_j \mid \boldsymbol{\theta})$ depends on a hyperparameter $\boldsymbol{\theta}$. The goal is to identify choices of $p(\psi_j \mid \boldsymbol{\theta})$ which lead to strong shrinkage if many $\alpha_j$ are close to zero, but introduce little bias if the $\alpha_j$ are heterogeneous (Frühwirth-Schnatter & Wagner, 2010). The marginal distribution $p(\alpha_j \mid \boldsymbol{\theta})$ is obtained by integrating out the unwanted parameter $\psi_j$:

$p(\alpha_j \mid \boldsymbol{\theta}) = \int p(\alpha_j \mid \psi_j)\, p(\psi_j \mid \boldsymbol{\theta})\, d\psi_j$ (3.3)

3.1.1 Unimodal shrinkage priors
A standard choice when working with Gaussian data is to set $\psi_j \equiv Q$ with $Q \sim \mathcal{G}^{-1}(c_0, c_1)$. This choice leads to the Gaussian RIM (Frühwirth-Schnatter & Wagner, 2010):

$\alpha_j \mid Q \sim N(0, Q)$ (3.4)

Letting $\psi_j \mid v, Q \sim \mathcal{G}^{-1}(v, Q)$ leads to the Student-t RIM:

$\alpha_j \mid v, Q \sim t_{2v}\left(0, \frac{Q}{v}\right)$ (3.5)

Choosing $\psi_j \mid Q \sim \mathrm{Exp}\left(\frac{1}{2Q}\right)$ leads to the Laplace RIM:

$\alpha_j \mid Q \sim \mathrm{Lap}(\sqrt{Q})$ (3.6)

As can be seen from (3.4)-(3.6), all three random intercept models depend on a scaling factor $Q$. In the context of random effects models it is necessary to introduce a prior $p(Q)$ for $Q$, since this transforms a shrinkage prior for an individual random intercept into a smoothing prior across all the random intercepts (Frühwirth-Schnatter & Wagner, 2010).
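As an illustration of how the marginalization in (3.3) produces these models, the Laplace case can be worked out directly, using the standard exponential scale mixture representation of the Laplace distribution:

$p(\alpha_j \mid Q) = \int_0^\infty N(\alpha_j \mid 0, \psi_j)\, \frac{1}{2Q}\, e^{-\psi_j/(2Q)}\, d\psi_j = \frac{1}{2\sqrt{Q}} \exp\left(-\frac{|\alpha_j|}{\sqrt{Q}}\right)$

which is exactly the $\mathrm{Lap}(\sqrt{Q})$ density in (3.6).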

3.1.2 Spike-and-slab smoothing priors
Spike-and-slab priors, which are mixture distributions of two components, can be applied to the RIM and are denoted here as (Frühwirth-Schnatter & Wagner, 2010):

$p(\alpha_j \mid \omega, \boldsymbol{\theta}) = (1 - \omega)\, p_{spike}(\alpha_j \mid \boldsymbol{\theta}) + \omega\, p_{slab}(\alpha_j \mid \boldsymbol{\theta})$ (3.7)

The $\alpha_j$, $j = 1, \dots, J$, are assumed to be independent conditional on the hyperparameters $\omega$ and $\boldsymbol{\theta}$.


The value of the parameter $\omega$ in (3.7) can be used to classify each random intercept $\alpha_j$ into one of the two components. Classification is based on a hierarchical version of (3.7):

$p(\alpha_j \mid \delta_j, \boldsymbol{\theta}) = (1 - \delta_j)\, p_{spike}(\alpha_j \mid \boldsymbol{\theta}) + \delta_j\, p_{slab}(\alpha_j \mid \boldsymbol{\theta}), \qquad \Pr(\delta_j = 1 \mid \omega) = \omega$ (3.8)

$\delta_j$ is a binary indicator variable. The standard spike-and-slab prior for random effects variable selection is the finite Gaussian mixture distribution, denoted as (Frühwirth-Schnatter & Wagner, 2010):

$\alpha_j \mid \omega, Q \sim (1 - \omega)\, N(0, rQ) + \omega\, N(0, Q)$ (3.9)

The variance ratio $r$ is chosen to be considerably smaller than 1:

$r = \frac{V_{spike}(\alpha_j \mid \theta)}{V_{slab}(\alpha_j \mid \theta)} \ll 1$ (3.10)
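To make the role of the variance ratio concrete, the following minimal R sketch draws random intercepts from the Gaussian spike-and-slab prior (3.9); the values of $\omega$, $Q$ and $r$ are illustrative only, not the settings used later in the thesis.

```r
# Draws from the Gaussian spike-and-slab prior (3.9):
# alpha_j ~ (1 - omega) * N(0, r*Q) + omega * N(0, Q).
set.seed(1)
J     <- 1000      # number of random intercepts to draw
omega <- 0.4       # probability of belonging to the slab
Q     <- 15        # slab variance
r     <- 0.000025  # variance ratio (3.10): spike variance = r * Q

delta <- rbinom(J, size = 1, prob = omega)   # slab indicators, as in (3.8)
alpha <- rnorm(J, mean = 0,
               sd = sqrt(ifelse(delta == 1, Q, r * Q)))

summary(abs(alpha[delta == 0]))  # spike draws: on the order of sqrt(r*Q) ~ 0.02
summary(abs(alpha[delta == 1]))  # slab draws: on the order of sqrt(Q) ~ 3.9
```

Because $r \ll 1$, the spike draws are practically indistinguishable from zero, while the slab draws spread out freely.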

Ishwaran and Rao's (2005) modification of the spike-and-slab technique, where a spike-and-slab prior is induced on the variance parameter of the regression coefficient prior, is applicable in the random effects setting. This is done by inducing a spike-and-slab prior on $\psi_j$ in the hierarchical scale mixture prior (3.2):

$\psi_j \mid \omega, Q \sim (1 - \omega)\, p_{spike}(\psi_j \mid \dots) + \omega\, p_{slab}(\psi_j \mid \dots)$ (3.11)

Choosing an inverted Gamma distribution for both the spike and the slab in $\psi_j \mid \omega, Q$, i.e. $\psi_j \mid \delta_j = 0 \sim \mathcal{G}^{-1}(v, rQ)$ and $\psi_j \mid \delta_j = 1 \sim \mathcal{G}^{-1}(v, Q)$, leads to a two-component Student-t mixture distribution for $\alpha_j$:

$\alpha_j \mid \omega, Q \sim (1 - \omega)\, t_{2v}\left(0, \frac{rQ}{v}\right) + \omega\, t_{2v}\left(0, \frac{Q}{v}\right)$ (3.12)

Choosing the exponential distribution for both the spike and the slab in $\psi_j \mid \omega, Q$, i.e. $\psi_j \mid \delta_j = 0 \sim \mathrm{Exp}\left(\frac{1}{2rQ}\right)$ and $\psi_j \mid \delta_j = 1 \sim \mathrm{Exp}\left(\frac{1}{2Q}\right)$, leads to a two-component Laplace mixture distribution for $\alpha_j$:

$\alpha_j \mid \omega, Q \sim (1 - \omega)\, \mathrm{Lap}(\sqrt{rQ}) + \omega\, \mathrm{Lap}(\sqrt{Q})$ (3.13)

Choosing the same distribution for both components is not a requirement. One may combine distributions in (3.11) across families in a way that yields shrinkage in the spike while avoiding too much smoothing in the slab. One promising candidate for such a task is to combine the exponential density $\psi_j \mid \delta_j = 0 \sim \mathrm{Exp}\left(\frac{1}{2rQ}\right)$ with the inverted Gamma density $\psi_j \mid \delta_j = 1 \sim \mathcal{G}^{-1}(v, Q)$, which leads to a mixture of a Laplace distribution for the spike component and a Student-t distribution for the slab component:

$\alpha_j \mid \omega, Q \sim (1 - \omega)\, \mathrm{Lap}(\sqrt{rQ}) + \omega\, t_{2v}\left(0, \frac{Q}{v}\right)$ (3.14)

3.2 Linear mixed effects (LME) model
If we extend the RIM in equation (3.1) by allowing the explanatory variables to also have random effects that vary among subjects, we obtain a new regression model, known as the linear mixed effects (LME) model. More formally, if we have $J$ subjects under study, each with $n_j$ observations, let $y_{ij}$ denote the $i$:th response for subject $j$, let $\boldsymbol{X}_{ij}$ be a $p \times 1$ vector of predictors with fixed effects, and let $\boldsymbol{Z}_{ij}$ be a $q \times 1$ vector of predictors with random effects. Then the LME model can be written as:

$y_{ij} = \boldsymbol{X}_{ij}' \boldsymbol{\beta} + \boldsymbol{Z}_{ij}' \boldsymbol{\alpha}_j + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma_e^2),$ (3.15)

where $\boldsymbol{\alpha}_j \sim N(\boldsymbol{0}, \boldsymbol{\Omega})$. The covariance matrix $\boldsymbol{\Omega}$ is assumed to be positive semi-definite with zero off-diagonal values. $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)'$ are the coefficients considered to be fixed effects, and $\boldsymbol{\alpha}_j = (\alpha_{j0}, \dots, \alpha_{jq})$ are the random effects. If $\boldsymbol{X}_{ij}$ and $\boldsymbol{Z}_{ij}$ include all available predictors, then the problem of interest is to discover which predictors have fixed effects, random effects, or both. In practice, however, $\boldsymbol{Z}_{ij}$ is typically a subset of $\boldsymbol{X}_{ij}$ that is believed to have random effects.

$\boldsymbol{X}_{ij} = \boldsymbol{Z}_{ij}$ is assumed in this thesis, and the linear mixed effects model can hence be reformulated as follows:

$y_{ij} = (\beta_0 + \beta_{0j}) + \sum_{k=1}^{K} (\beta_k + \beta_{kj})\, x_{kij} + \epsilon_{ij}$ (3.16)

where $i = 1, 2, \dots, n_j$ indexes observations, $j = 1, 2, \dots, J$ groups, and $k = 1, 2, \dots, K$ explanatory variables, with $\epsilon_{ij} \sim N(0, \sigma_e^2)$.

3.2.1 Prior choice for the fixed effects

The fixed effects $\beta_k$ will be induced with different spike-and-slab priors:


$\beta_k \sim (1 - \delta_k)\, p_{spike}(\beta_k \mid \dots) + \delta_k\, p_{slab}(\beta_k \mid \dots)$ (3.17)
$\delta_k \mid w_k \sim \mathrm{Bernoulli}(w_k)$
$w_k \sim \mathrm{Uniform}(0, 1)$

$\delta_k$ is a binary indicator variable, and $w_k$ can be interpreted as the inclusion probability for the fixed effect $\beta_k$.

The mixture of two normal distributions as a spike-and-slab prior for the fixed effects can be expressed as in George and McCulloch (1993; 1997):

$\beta_k \sim (1 - \delta_k)\, N(0, \sigma^2 \tau_k^2) + \delta_k\, N(0, \sigma^2 c_k^2 \tau_k^2)$ (3.18)

It can be seen from (3.18) that when $\delta_k = 1$, $\beta_k \sim N(0, \sigma^2 c_k^2 \tau_k^2)$, and when $\delta_k = 0$, $\beta_k \sim N(0, \sigma^2 \tau_k^2)$. The idea behind this formulation is that $\tau_k$ should be set small, so that if $\delta_k = 0$ then $\beta_k$ would most likely be so small that it could safely be estimated as zero. The parameter $c_k$ should be set large, so that if $\delta_k = 1$, then a non-zero estimate of $\beta_k$ should be included in the model.

The hyperparameter $\tau_k$ will be set to a fixed constant, while $c_k$ can be considered either a fixed constant or a hyperparameter that could be estimated with a non-informative prior, such as the Jeffreys prior.

A mixture of Student-t densities as a spike-and-slab prior for the fixed effects can be expressed as:

$\beta_k \sim (1 - \delta_k)\, T(v, 0, \lambda_\delta \tau_k^2) + \delta_k\, T(v, 0, \lambda_\delta c_k^2 \tau_k^2)$ (3.19)

The parameter $v$ is the degrees of freedom of the Student-t distribution. A natural choice is to set $\lambda_\delta$ equal to the prior estimate $\hat{\sigma}^2$, since as $v$ gets larger, (3.18) and (3.19) should be approximately the same.

A mixture of Laplace densities as a spike-and-slab prior can, with the same structure, be expressed as:

$\beta_k \sim (1 - \delta_k)\, \mathrm{Lap}\left(\sqrt{\sigma^2 \tau_k^2}\right) + \delta_k\, \mathrm{Lap}\left(\sqrt{\sigma^2 \tau_k^2 c_k^2}\right)$ (3.20)

A mixture of a Laplace density spike component and a Student-t density slab component can then be denoted as:

$\beta_k \sim (1 - \delta_k)\, \mathrm{Lap}\left(\sqrt{\sigma^2 \tau_k^2}\right) + \delta_k\, T(v, 0, \lambda_\delta c_k^2 \tau_k^2)$ (3.21)


3.2.2 Prior choice for the random effects

The random effects $\beta_{lj} = \{\beta_{0j}, \beta_{1j}, \dots, \beta_{Kj}\}$ will be induced with spike-and-slab priors. As in formulas (3.9), (3.12), (3.13) and (3.14), the marginal normal mixture spike-and-slab prior for the random effects in the linear mixed model can be expressed as:

$\beta_{lj} \mid \omega_l, Q_l \sim (1 - \omega_l)\, N(0, r_l Q_l) + \omega_l\, N(0, Q_l)$ (3.22)

The Student-t mixtures spike-and-slab prior is denoted as:

$\beta_{lj} \mid \omega_l, Q_l \sim (1 - \omega_l)\, t_{2v}\left(0, \frac{r_l Q_l}{v}\right) + \omega_l\, t_{2v}\left(0, \frac{Q_l}{v}\right)$ (3.23)

The Laplace mixtures spike-and-slab prior is expressed as:

$\beta_{lj} \mid \omega_l, Q_l \sim (1 - \omega_l)\, \mathrm{Lap}(\sqrt{r_l Q_l}) + \omega_l\, \mathrm{Lap}(\sqrt{Q_l})$ (3.24)

A combination of a Laplace spike component with a Student-t slab component is hence denoted as:

$\beta_{lj} \mid \omega_l, Q_l \sim (1 - \omega_l)\, \mathrm{Lap}(\sqrt{r_l Q_l}) + \omega_l\, t_{2v}\left(0, \frac{Q_l}{v}\right)$ (3.25)

3.3 Evaluation measurements
The rstan package is an R package developed by the Stan Development Team (2016). It allows the user to utilize the Stan modeling language, a framework for doing Bayesian analysis. One of the major advantages of this framework is the flexibility that allows the user to define any model formulation of interest. It is also easy to incorporate subject matter knowledge, e.g. by inducing suitable prior distributions when estimating the model parameters. The function stan(...) in this package yields autocorrelated posterior draws (MCMC draws) for all parameters of interest. Measurements of interest that can be obtained from these draws are, for example, credibility intervals, the effective sample size and the potential scale reduction value.
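As a sketch of the workflow (the file name, data list and parameter names below are hypothetical placeholders, not the thesis code), a model written in the Stan language can be fitted from R as follows:

```r
library(rstan)

# Hypothetical data list for a random intercept model; the names must
# match the data block of the Stan program in "rim_nn.stan".
stan_data <- list(N = 300, J = 30, group = rep(1:30, each = 10),
                  y = rnorm(300), Q = 15, r = 0.000025)

fit <- stan(file = "rim_nn.stan", data = stan_data,
            iter = 4000, warmup = 2000, chains = 4, seed = 1)

draws <- extract(fit)  # autocorrelated posterior draws for all parameters
print(fit, pars = c("beta0", "omega0", "sigma_e"))
```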


3.3.1 Credibility intervals
In Bayesian statistics, a credibility interval, also known as a probability interval, is an interval of a posterior probability distribution or predictive distribution. Credibility intervals in Bayesian statistics are comparable to confidence intervals in frequentist statistics (Brown and Prescott, 2014).

3.3.2 Effective sample size
A drawback of MCMC methods is that the samples obtained will typically be autocorrelated within a chain. This increases the uncertainty of the estimation of posterior quantities of interest. $n_{eff}$ is a crude measure of effective sample size: it gives the number of independent samples with the same estimation power as the $N_{as}$ autocorrelated samples. For example, the estimation error is proportional to $1/\sqrt{n_{eff}}$ rather than $1/\sqrt{N_{as}}$, implying that a high $n_{eff}$ value is desirable (Stan Development Team, 2016).

3.3.3 Potential scale reduction

Markov chain draws must be monitored, either visually or by applying convergence diagnostics, since samples from the target distribution are only generated by the MCMC after the chain has converged. One way to monitor whether the chain has converged to the equilibrium distribution is to make use of the potential scale reduction $\hat{R}$. A value of $\hat{R} = 1$ indicates that all chains are at equilibrium, i.e. perfect convergence. If $\hat{R} > 1.1$, then convergence has not occurred, and the obtained results should not be used (Gelman and Rubin, 1992; Gelman et al. 2014; Stan Development Team, 2016).
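Assuming a fitted stanfit object fit as in the sketch above, both $n_{eff}$ and $\hat{R}$ can be read from the fit summary; a minimal example:

```r
# Per-parameter diagnostics from a fitted stanfit object "fit".
s <- summary(fit)$summary
s[, c("n_eff", "Rhat")]          # effective sample size and Rhat per parameter
rownames(s)[s[, "Rhat"] > 1.1]   # parameters that have not converged
```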

3.3.4 MCMC trajectories and cumulative mean plots

An alternative way to show that convergence has occurred for the parameters of interest is to use the posterior draws directly for visual diagnostics, such as MCMC trajectories and cumulative mean plots. A good indication of convergence is when the MCMC trajectories show well-mixing chains around some constant value, and when the cumulative mean plots show that the parameters of interest have reached stationarity and converged to some value.


3.3.5 Root mean squared error
The root mean squared error (RMSE) can be used to evaluate how well the random effects have been estimated. The RMSE value for the random effects in the random intercept model is calculated as:

$RMSE_{\alpha} = \sqrt{\frac{1}{J} \sum_{j=1}^{J} (\alpha_j - \hat{\alpha}_j)^2}$ (3.27)

where $j = 1, 2, \dots, J$ indexes subjects, and $\alpha_j$ and $\hat{\alpha}_j$ are the true and estimated random intercept deviations for the respective groups.

The RMSE value for each explanatory variable with a random effect in the linear mixed effects model is calculated as:

$RMSE_{Raneff\,\beta_l} = \sqrt{\frac{1}{J} \sum_{j=1}^{J} (\beta_{lj} - \hat{\beta}_{lj})^2}$ (3.28)

where $\beta_l = \{\beta_0, \beta_1, \dots, \beta_K\}$, $K$ is the number of explanatory variables, $j = 1, 2, \dots, J$ indexes subjects, and $\beta_{lj}$ and $\hat{\beta}_{lj}$ are the true and estimated random effect values for the respective subjects w.r.t. the corresponding variables.
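Given posterior mean estimates, (3.27) and (3.28) reduce to the same one-line computation; a minimal R helper, where the vectors of true and estimated effects are assumed to come from a simulation:

```r
# RMSE between true and estimated random effects, as in (3.27)/(3.28).
rmse <- function(true, est) sqrt(mean((true - est)^2))

# Hypothetical usage: alpha_true holds the simulated random intercepts,
# alpha_hat their posterior means, both vectors of length J.
# rmse(alpha_true, alpha_hat)
```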


4 Data
Chapter 4 describes the obtained real dataset and the simulated data, which have been constructed in a deliberately structured way. For example, the target model for simulation study 1 is the simplest LME model, and the targeted model complexity increases with every new simulation study. The purpose of this structure is to investigate how large the dataset needs to be, or more specifically, how many subjects and observations the dataset needs to have, to be sufficiently large when modeling linear mixed effects models using spike-and-slab priors.

4.1 Data simulations
Five simulation studies will be used for studying and comparing how the chosen priors behave in different scenarios. The parameter settings that the first four simulation studies have in common are as follows. The overall intercept $\beta_0$ is set to 4. The group-specific deviations from the overall intercept follow $\beta_{0j} \sim (1 - \omega_0)\, N(0, 0.1^2) + \omega_0\, N(0, 15^2)$, where $\omega_0$ is the weight parameter for the random effects. The noise term is $\epsilon_{ij} \sim N(0, 0.7^2)$. The reason behind choosing 0.1 as the standard deviation of the spike component and the arbitrary value 15 as the standard deviation of the slab component is that the spike distribution is supposed to have all its density mass concentrated around zero, while the slab distribution should have a more spread out density mass. The dataset sizes for all models of interest have been created in scenarios where the number of observations ($i$) and groups ($j$) starts from low numbers and increases to higher numbers (see table 4.1).

Table 4.1: Size of the dataset $N$, depending on how many groups $j$ and observations $i$ exist.

N = j*i    j=5    j=10   j=30   j=50
i=10       50     100    300    500
i=30       150    300    900    1500
i=50       250    500    1500   2500

4.1.1 Study 1 – Random effects study

The first simulation study targets the simplest linear mixed effects model, a random intercept model with no explanatory variables:

$y_{ij} = \beta_0 + \beta_{0j} + \epsilon_{ij}$ (4.1)

For this study, the weight for the group-specific deviations $\omega_0$ will be set to different fixed values: $\omega_0 = \{0.2, 0.4, 0.8\}$.
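A minimal R sketch of the Study 1 data-generating process, following the settings above (the seed and object names are arbitrary):

```r
# Simulate from the random intercept model (4.1):
# y_ij = beta0 + beta0_j + eps_ij, with spike-and-slab random intercepts.
set.seed(42)
J <- 30; I <- 10   # groups and observations per group
beta0  <- 4
omega0 <- 0.2

delta   <- rbinom(J, 1, omega0)                      # slab membership per group
beta0_j <- rnorm(J, 0, ifelse(delta == 1, 15, 0.1))  # sd 15 (slab), 0.1 (spike)

g <- rep(1:J, each = I)                          # group label per observation
y <- beta0 + beta0_j[g] + rnorm(J * I, 0, 0.7)   # add the N(0, 0.7^2) noise
```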


4.1.2 Study 2 – Fixed effects study
In the second simulation study, the model complexity of (4.1) is increased by adding two explanatory variables to the model, which may or may not have fixed effects.

$y_{ij} = \beta_0 + \beta_{0j} + \beta_1 X_{1ij} + \beta_2 X_{2ij} + \epsilon_{ij}$ (4.2)

The true values of the parameters in this study are as follows: $\omega_0 = 0.2$, the coefficient of the first explanatory variable $\beta_1$ is set to the arbitrary value 5, and the coefficient of the second explanatory variable $\beta_2$ is set to zero. The two explanatory variables are simulated independently from the standard normal distribution, $X_{1ij} \sim N(0, 1)$, $X_{2ij} \sim N(0, 1)$.

4.1.3 Study 3 – Fixed and random effects study
In the third simulation study, the model complexity is increased further by letting the explanatory variables in (4.2) have both fixed and random effects.

$y_{ij} = \beta_0 + \beta_{0j} + (\beta_1 + \beta_{1j}) X_{1ij} + (\beta_2 + \beta_{2j}) X_{2ij} + \epsilon_{ij}$ (4.3)

The true values for the newly added random effects in this model are as follows: $\beta_{1j} \sim (1 - \omega_1)\, N(0, 0.001) + \omega_1\, N(0, 15)$ and $\beta_{2j} \sim (1 - \omega_2)\, N(0, 0.001) + \omega_2\, N(0, 5)$, where the random effects weight for the first explanatory variable is $\omega_1 = 0.4$ and for the second explanatory variable $\omega_2 = 0.6$.

4.1.4 Study 4 – Fixed and random effects with correlation study
The fourth simulation study also targets model (4.3). However, unlike the previous study, the explanatory variables are now simulated from the bivariate normal distribution with zero means, unit standard deviations and correlation $\rho$, where $\rho = \{0, 0.3, 0.9\}$:

$\boldsymbol{X} = \begin{pmatrix} X_{1ij} \\ X_{2ij} \end{pmatrix} \sim N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$
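Correlated covariates of this kind can be drawn with MASS::mvrnorm; a small sketch for one setting of $\rho$:

```r
library(MASS)

# Draw N pairs (X1, X2) from the bivariate normal used in Study 4.
rho   <- 0.3
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)  # unit variances, correlation rho
X     <- mvrnorm(n = 500, mu = c(0, 0), Sigma = Sigma)
cor(X)  # sample correlation matrix, close to rho off the diagonal
```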

4.1.5 Study 5 – Fixed and random effects in higher dimensions study
The targeted model for simulation study 5 is the LME model (3.16). As in study 3, the objective is to study both the fixed and the random effects. The difference between study 3, which targets model (4.3), and study 5, which targets model (4.4), is the number of explanatory variables.


$y_{ij} = \beta_0 + \beta_{0j} + (\beta_1 + \beta_{1j}) X_{1ij} + \dots + (\beta_{10} + \beta_{10,j}) X_{10ij} + \epsilon_{ij}$ (4.4)

where the explanatory variables follow the multivariate normal distribution

$\boldsymbol{X}_{ij} = \begin{pmatrix} X_{1ij} \\ \vdots \\ X_{10ij} \end{pmatrix} \sim N_{10}\left( \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix} \right)$

with zero mean, unit standard deviation and no correlation. The group-specific deviations for the different explanatory variables all follow the same distribution: $\beta_{lj} \sim (1 - \omega_l)\, N(0, 0.001) + \omega_l\, N(0, 5)$, $l = 0, \dots, 10$. The true values of the random effects weights and the fixed effects coefficients are:

$\boldsymbol{\omega} = (0, 0, 0.3, 0.6, 0.3, 0.6, 0.3, 0.6, 0, 0, 0)$
$\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6, \beta_7, \beta_8, \beta_9, \beta_{10}) = (1, 3, 3, 3, -3, -3, 0, 0, 0, 0, 0)$

The motivation behind these chosen values is that some variables have neither fixed nor random effects, some have either fixed or random effects, and others have both fixed and random effects.

4.2 Real dataset
The real dataset was collected in the spring of 2013, with the purpose of analyzing classroom relationship qualities and social-cognitive associates of defending and passive bystanding in school bullying in Sweden. 900 students in grades 4-6, from 43 classes, participated in the study (Thornberg et al., 2017). The obtained raw dataset had 12 available variables; eight variables remain after removing those not of interest. Here is a list, including a short description, of the eight variables:

 Defending (de), response, mean value of five items (an item corresponds to a question in this context)
 Age, covariate, discrete ($X_1$)
 Teacher-student relationship quality (tsr), covariate, mean value of 15 items ($X_2$)
 Student-student relationship quality (ssr), covariate, mean value of eight items ($X_3$)
 Defender self-efficacy (dse), covariate, mean value of five items ($X_4$)
 Moral disengagement (mde), covariate, mean value of six items ($X_5$)
 Sex, covariate, binary ($X_6$)
 Class ID, identification variable, discrete


5 Results
This chapter presents the results of the simulation studies for the four different targeted models, and the comparison between unimodal shrinkage priors and spike-and-slab priors when analyzing the real dataset. A short description of which priors were induced when estimating the different model parameters is given first, followed by tables with the best estimated results obtained. The MCMC trajectories and cumulative mean plots are also visualized as an indication of whether convergence has occurred for the estimated model parameters.

For all tables shown in this chapter, NN abbreviates a mixture of a Normal spike component and a Normal slab component (see equation 3.9). LL abbreviates a mixture of a Laplace spike component and a Laplace slab component (see equation 3.13). StSt abbreviates a mixture of a Student-t spike component and a Student-t slab component (see equation 3.12). LSt abbreviates a mixture of a Laplace spike component and a Student-t slab component (see equation 3.14). J is the number of groups, and each group contains I observations. The true parameter value for each table is shown in the top row of the data-size column.

5.1 Study 1 – Random effects study
The target model for this study is the simplest linear mixed effects model, the random intercept model with no covariates (4.1). Recall that different random intercept models are obtained by inducing different spike-and-slab priors. The prior settings when estimating model (4.1) are as follows:

$\beta_0 \sim N(0, 100)$
$\sigma_e \sim \mathrm{Unif}(0, 100)$
$\omega_0 \sim \mathrm{Unif}(0, 1)$
$\beta_{0j} \sim (1 - \omega_0)\, p_{spike}(\beta_{0j} \mid \dots) + \omega_0\, p_{slab}(\beta_{0j} \mid \dots)$

The variance ratio parameter $r$ and the degrees of freedom parameter $v$ are set to the same constant values as in Frühwirth-Schnatter & Wagner (2010): $r = 0.000025$ and $v = 3$. The hyperparameter $Q$, which exists in all random intercept models, has a great impact on the obtained results. Estimating this value and making sure that convergence is reached for all different data sizes would be tedious and cannot be guaranteed. Hence, this parameter is held at different fixed values: $Q = \{5, 15, 45, 135\}$. Due to the number of priors and parameters of interest, the number of obtainable tables is 16 for each true value of $\omega_0$. Presenting them all would be tedious and is not of interest for the objective of this study. Hence, the best estimated results have been summarized in the following tables.
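Stan cannot sample the discrete indicators $\delta_j$ directly, so a spike-and-slab prior such as the NN prior is typically implemented by marginalizing the indicator out with log_mix. The following is a hypothetical sketch of such a model, not the thesis code; in particular, reading the $N(0, 100)$ prior for $\beta_0$ as having variance 100 is an assumption.

```r
# Hypothetical Stan program for the NN random intercept model (4.1),
# with the binary indicators marginalized out via log_mix.
rim_nn_code <- "
data {
  int<lower=1> N;                  // total number of observations
  int<lower=1> J;                  // number of groups
  int<lower=1, upper=J> group[N];  // group label per observation
  vector[N] y;
  real<lower=0> Q;                 // fixed slab variance
  real<lower=0> r;                 // variance ratio (spike / slab)
}
parameters {
  real beta0;                         // overall intercept
  vector[J] beta0_j;                  // group-specific deviations
  real<lower=0, upper=100> sigma_e;   // noise sd, implies Unif(0, 100) prior
  real<lower=0, upper=1> omega0;      // slab weight, implies Unif(0, 1) prior
}
model {
  beta0 ~ normal(0, 10);              // N(0, 100) prior, read as variance 100
  for (j in 1:J)
    target += log_mix(omega0,
                      normal_lpdf(beta0_j[j] | 0, sqrt(Q)),       // slab
                      normal_lpdf(beta0_j[j] | 0, sqrt(r * Q)));  // spike
  y ~ normal(beta0 + beta0_j[group], sigma_e);
}
"
# fit <- rstan::stan(model_code = rim_nn_code, data = stan_data, chains = 4)
```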


Table 5.1.a: Posterior mean estimations of the random effects weight $\omega_0$.

$\omega_0$ = 0.2   NN(Q=15)   LL(Q=135)   StSt(Q=45)   LSt(Q=45)
J=5,  I=10         0.36       0.43        0.35         0.35
J=5,  I=30         0.36       0.43        0.35         0.35
J=5,  I=50         0.36       0.43        0.35         0.34
J=10, I=10         0.21       0.22        0.17         0.17
J=10, I=30         0.20       0.21        0.17         0.17
J=10, I=50         0.21       0.22        0.18         0.18
J=30, I=10         0.21       0.22        0.19         0.19
J=30, I=30         0.21       0.22        0.19         0.19
J=30, I=50         0.21       0.22        0.19         0.19
J=50, I=10         0.19       0.20        0.17         0.17
J=50, I=30         0.19       0.20        0.17         0.17
J=50, I=50         0.19       0.20        0.17         0.17

Table 5.1.a shows the estimated posterior mean values of $\omega_0$ for different dataset sizes with different induced priors. The true value of $\omega_0$ is 0.2. Having only 5 groups seems to yield poor estimates of the weight $\omega_0$; the estimates become much better for all four combinations of spike-and-slab priors when the number of groups is 10 or larger. The StSt prior and the LSt prior yield in principle the same results, and their estimates of $\omega_0$ are consistently lower than those obtained when the NN prior or the LL prior is induced.

Table 5.1.b: Posterior mean estimations of the overall intercept $\beta_0$.

$\beta_0$ = 4   NN(Q=15)   LL(Q=135)   StSt(Q=45)   LSt(Q=45)
J=5,  I=10      3.01       2.71        3.06         2.93
J=5,  I=30      2.74       2.73        2.76         2.93
J=5,  I=50      2.75       2.90        2.93         2.85
J=10, I=10      5.07       4.89        5.02         4.96
J=10, I=30      5.03       4.75        4.86         4.83
J=10, I=50      5.12       4.84        4.92         5.02
J=30, I=10      3.64       3.58        3.61         3.59
J=30, I=30      3.71       3.49        3.66         3.70
J=30, I=50      3.64       3.67        3.69         3.69
J=50, I=10      3.57       3.40        3.49         3.53
J=50, I=30      3.53       3.42        3.58         3.63
J=50, I=50      3.53       3.39        3.56         3.51

The true value of the overall intercept is 4, as can be seen in the top-left corner of table 5.1.b. The estimated value of $\beta_0$ seems fairly good when the number of groups is 30 or more. There are no obvious differences between the four priors when it comes to the estimation of the overall intercept.

Table 5.1.c: Posterior mean estimations of the noise term $\sigma_e$.

$\sigma_e$ = 0.7   NN(Q=15)   LL(Q=135)   StSt(Q=45)   LSt(Q=45)
J=5,  I=10         0.63       0.64        0.64         0.64
J=5,  I=30         0.64       0.64        0.64         0.64
J=5,  I=50         0.70       0.70        0.70         0.70
J=10, I=10         0.71       0.71        0.71         0.71
J=10, I=30         0.71       0.71        0.71         0.71
J=10, I=50         0.73       0.73        0.73         0.73
J=30, I=10         0.70       0.70        0.70         0.70
J=30, I=30         0.74       0.74        0.74         0.74
J=30, I=50         0.71       0.71        0.71         0.71
J=50, I=10         0.67       0.67        0.67         0.67
J=50, I=30         0.70       0.70        0.70         0.70
J=50, I=50         0.72       0.72        0.72         0.72

Table 5.1.c shows the estimated posterior mean values of $\sigma_e$. The true value of the standard deviation is 0.7. The choice of prior does not affect the estimation of the standard deviation of the noise term. How well the noise term is estimated depends mainly on the size of the data; more data seems to yield better estimates of $\sigma_e$.

Table 5.1.d: RMSE values for the group specific deviation term $\beta_{0j}$.

RMSE $\beta_{0j}$   NN(Q=15)   LL(Q=135)   StSt(Q=45)   LSt(Q=45)
J=5,  I=10          1.12       1.41        1.06         1.19
J=5,  I=30          1.21       1.22        1.19         1.02
J=5,  I=50          1.24       1.09        1.06         1.14
J=10, I=10          1.05       0.87        1.00         0.94
J=10, I=30          1.07       0.80        0.90         0.88
J=10, I=50          1.10       0.82        0.91         1.00
J=30, I=10          0.35       0.40        0.37         0.39
J=30, I=30          0.30       0.52        0.35         0.32
J=30, I=50          0.38       0.35        0.33         0.33
J=50, I=10          0.44       0.60        0.52         0.48
J=50, I=30          0.48       0.59        0.44         0.39
J=50, I=50          0.47       0.63        0.46         0.51


The results that can be extracted from table 5.1.d are very similar to those obtained from table 5.1.b. The best results are obtained when the number of groups is equal to 30. There are no trends indicating that any specific prior choice is better than the others.

Figure 5.1.a: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of $\omega_0 = 0.2$ and the induced spike-and-slab prior was NN.

Figure 5.1.b: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of $\omega_0 = 0.2$ and the induced spike-and-slab prior was LL.

23

Figure 5.1.c: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of $\omega_0 = 0.2$ and the induced spike-and-slab prior was StSt.

Figure 5.1.d: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of $\omega_0 = 0.2$ and the induced spike-and-slab prior was LSt.

Figures 5.1.a-5.1.d show the MCMC trajectories and the cumulative mean plots for the different investigated parameters when the true weight for the random effects is $\omega_0 = 0.2$. The chosen figures represent the case where the data have 30 groups with 10 observations in each group (J=30, I=10). Although not shown, in the majority of cases the MCMC trajectories and the cumulative mean plots have similar trends: the MCMC trajectories have the desired appearance, and the parameters converge either at an early stage, after 100-300 iterations, or at a late stage, after 2000 iterations.


Table 5.2.a: Posterior mean estimations of the random effects weight $\omega_0$.

$\omega_0$ = 0.4   NN(Q=15)   LL(Q=135)   StSt(Q=45)   LSt(Q=45)
J=5,  I=10         0.51       0.55        0.48         0.49
J=5,  I=30         0.52       0.56        0.50         0.50
J=5,  I=50         0.52       0.57        0.49         0.50
J=10, I=10         0.61       0.60        0.53         0.53
J=10, I=30         0.61       0.62        0.53         0.53
J=10, I=50         0.61       0.59        0.54         0.53
J=30, I=10         0.48       0.44        0.36         0.36
J=30, I=30         0.48       0.45        0.35         0.35
J=30, I=50         0.48       0.43        0.35         0.35
J=50, I=10         0.38       0.38        0.32         0.32
J=50, I=30         0.39       0.39        0.33         0.32
J=50, I=50         0.39       0.38        0.33         0.33

Table 5.2.a shows the estimated posterior mean values of $\omega_0$ when the true value is 0.4. The spike-and-slab priors StSt and LSt yield basically the same estimates, which are consistently lower, for all data sizes, than those of the NN and LL spike-and-slab priors. At least 30 groups are needed here for both the NN and LL priors to provide fairly good estimates.

Table 5.2.b: Posterior mean estimations of the overall intercept $\beta_0$.

$\beta_0$ = 4   NN(Q=15)   LL(Q=135)   StSt(Q=45)   LSt(Q=45)
J=5,  I=10      8.26       9.20        8.39         8.33
J=5,  I=30      8.05       9.10        8.19         8.03
J=5,  I=50      8.07       9.05        8.03         8.38
J=10, I=10      0.85       0.85        0.12         0.17
J=10, I=30      0.87       0.93        0.26         0.27
J=10, I=50      0.80       1.14        0.32         0.38
J=30, I=10      4.46       4.81        5.18         5.14
J=30, I=30      4.64       4.99        5.22         5.29
J=30, I=50      4.52       4.92        5.13         5.16
J=50, I=10      2.05       1.98        1.83         1.84
J=50, I=30      1.96       1.87        1.87         1.82
J=50, I=50      1.95       2.01        1.71         1.82

The number of groups now needs to be 30 or more to get a fairly good estimate of the overall intercept when $\omega_0$ is changed from 0.2 to 0.4. For the cases where the dataset contains 30 or more groups, the NN prior yields the best estimates of the overall intercept, except for the case where the number of groups and the number of observations are both equal to 50.

Table 5.2.c: Posterior mean estimations of the noise term $\sigma_e$.

$\sigma_e$ = 0.7   NN(Q=15)   LL(Q=135)   StSt(Q=45)   LSt(Q=45)
J=5,  I=10         0.70       0.70        0.70         0.70
J=5,  I=30         0.70       0.70        0.70         0.70
J=5,  I=50         0.74       0.74        0.74         0.74
J=10, I=10         0.71       0.71        0.71         0.71
J=10, I=30         0.67       0.67        0.67         0.67
J=10, I=50         0.66       0.66        0.66         0.66
J=30, I=10         0.68       0.68        0.68         0.68
J=30, I=30         0.73       0.73        0.73         0.73
J=30, I=50         0.72       0.72        0.72         0.72
J=50, I=10         0.74       0.73        0.73         0.73
J=50, I=30         0.69       0.71        0.71         0.71
J=50, I=50         0.69       0.70        0.70         0.70

The estimates of the noise term $\sigma_e$ are basically the same here as they were before $\omega_0$ was increased from 0.2 to 0.4.

Table 5.2.d: RMSE values for the group specific deviation term $\beta_{0j}$.

RMSE $\beta_{0j}$   NN(Q=15)   LL(Q=135)   StSt(Q=45)   LSt(Q=45)
J=5,  I=10          4.26       5.20        4.39         4.34
J=5,  I=30          4.09       5.14        4.23         4.07
J=5,  I=50          4.10       5.08        4.06         4.41
J=10, I=10          3.12       3.12        3.85         3.80
J=10, I=30          3.18       3.13        3.80         3.79
J=10, I=50          3.21       2.86        3.69         3.62
J=30, I=10          0.50       0.85        1.21         1.17
J=30, I=30          0.62       0.97        1.19         1.26
J=30, I=50          0.52       0.91        1.12         1.15
J=50, I=10          1.94       2.00        2.15         2.14
J=50, I=30          2.02       2.10        2.11         2.15
J=50, I=50          2.05       1.99        2.29         2.17

The obtained RMSE values for $\beta_{0j}$ are highly correlated with the estimation of the overall intercept $\beta_0$: the closer the estimated overall intercept is to its true value, the lower the RMSE value obtained for $\beta_{0j}$.


Figure 5.2.a: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of $\omega_0 = 0.4$ and the induced spike-and-slab prior was NN.

Figure 5.2.b: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of $\omega_0 = 0.4$ and the induced spike-and-slab prior was LL.


Figure 5.2.c: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of $\omega_0 = 0.4$ and the induced spike-and-slab prior was StSt.

Figure 5.2.d: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of $\omega_0 = 0.4$ and the induced spike-and-slab prior was LSt.

Figures 5.2.a-5.2.d show the MCMC trajectories and the cumulative mean plots for the different investigated parameters when the true weight for the random effects is $\omega_0 = 0.4$. The chosen figures come from the same data size as before (J=30, I=10). The MCMC trajectories and the cumulative mean plots show similar trends as before: a desirable appearance for the trajectories, and convergence occurs.


Table 5.3.a: Posterior mean estimations of the random effects weight ω0.

ω0 = 0.8    NN(Q=15)  LL(Q=135)  StSt(Q=45)  LSt(Q=45)
J=5,I=10    0.69      0.67       0.64        0.63
J=5,I=30    0.68      0.66       0.63        0.63
J=5,I=50    0.69      0.67       0.64        0.65
J=10,I=10   0.78      0.73       0.70        0.70
J=10,I=30   0.78      0.73       0.70        0.70
J=10,I=50   0.78      0.74       0.71        0.70
J=30,I=10   0.85      0.84       0.77        0.77
J=30,I=30   0.62*     0.66*      0.37*       0.39*
J=30,I=50   0.31*     0.60*      0.38*       0.44*
J=50,I=10   0.82*     0.81*      0.64*       0.55*
J=50,I=30   0.31*     0.36*      0.23*       0.32*
J=50,I=50   0.34*     0.57*      0.37*       0.16*

Table 5.3.a shows the posterior mean estimates of ω0 when the true value is 0.8. The NN prior outperformed the other three spike-and-slab priors when the number of groups equals 10. As can be seen in Table 5.3.a, non-convergence for ω0 occurs when the number of groups is 30 or larger. The (*) after an estimated value indicates that convergence did not occur, so these results should not be used. Although less obvious, the same conclusion about the StSt and LSt priors can be drawn here as when the true value of ω0 was 0.2 or 0.4.

Table 5.3.b: Posterior mean estimations of the overall intercept β0.

β0 = 4      NN(Q=15)  LL(Q=135)  StSt(Q=45)  LSt(Q=45)
J=5,I=10    5.69      8.25       6.58        6.79
J=5,I=30    5.72      8.07       6.65        6.52
J=5,I=50    5.76      8.23       6.57        6.33
J=10,I=10   6.13      3.68       5.24        4.97
J=10,I=30   5.95      3.88       4.97        5.20
J=10,I=50   6.16      3.87       5.51        4.39
J=30,I=10   6.07      5.80       5.36        5.58
J=30,I=30   5.96*     4.38*      4.05*       2.81*
J=30,I=50   1.78*     2.97*      -0.38*      2.22*
J=50,I=10   2.36*     2.74*      1.56*       2.30*
J=50,I=30   -0.05*    -0.66*     -0.43*      0.05*
J=50,I=50   0.46*     0.43*      -0.70*      0.82*


With respect to the results in Table 5.3.b, the only prior that succeeded in estimating the overall intercept well is the LL prior, and this occurred when the number of groups equaled 10. The same non-convergence occurs here for β0 as for ω0 in Table 5.3.a, when the number of groups is 30 or larger.

Table 5.3.c: Posterior mean estimations of the noise term σe.

σe = 0.7    NN(Q=15)  LL(Q=135)  StSt(Q=45)  LSt(Q=45)
J=5,I=10    0.70      0.70       0.70        0.70
J=5,I=30    0.69      0.69       0.69        0.69
J=5,I=50    0.71      0.71       0.72        0.72
J=10,I=10   0.74      0.74       0.74        0.74
J=10,I=30   0.69      0.69       0.69        0.69
J=10,I=50   0.69      0.69       0.69        0.69
J=30,I=10   0.74      0.74       0.74        0.74
J=30,I=30   3.55*     3.56*      5.76*       5.36*
J=30,I=50   7.95*     4.25*      9.81*       5.45*
J=50,I=10   2.66*     1.98*      3.69*       4.41*
J=50,I=30   10.00*    10.00*     9.99*       9.99*
J=50,I=50   10.00*    10.00*     10.00*      10.00*

As one might expect, the same poor estimates and failures to converge also occur for σe.

Table 5.3.d: RMSE values for the group specific deviation term β0j.

RMSE β0j    NN(Q=15)  LL(Q=135)  StSt(Q=45)  LSt(Q=45)
J=5,I=10    1.67      4.23       2.56        2.77
J=5,I=30    1.76      4.11       2.69        2.56
J=5,I=50    1.75      4.21       2.55        2.32
J=10,I=10   2.28      0.34       1.41        1.14
J=10,I=30   1.98      0.17       1.01        1.23
J=10,I=50   2.18      0.15       1.53        0.42
J=30,I=10   2.14      1.87       1.43        1.65
J=30,I=30   4.23      3.67       7.16        6.42
J=30,I=50   9.50      4.67       12.28       6.41
J=50,I=10   2.65      2.16       4.63        5.67
J=50,I=30   15.83     14.25      14.22       14.19
J=50,I=50   14.08     13.13      14.57       14.68


Just as in the earlier cases, how well the four spike-and-slab priors estimate β0j (in terms of RMSE) is strongly tied to how well they estimate the true value of the overall intercept β0.

Figure 5.3.a: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of ω0 = 0.8 and the induced spike-and-slab prior was NN.

Figure 5.3.b: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of ω0 = 0.8 and the induced spike-and-slab prior was LL.


Figure 5.3.c: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of ω0 = 0.8 and the induced spike-and-slab prior was StSt.

Figure 5.3.d: MCMC trajectories and cumulative mean plots for the estimated parameters when the true value of ω0 = 0.8 and the induced spike-and-slab prior was LSt.

Figures 5.3.a-5.3.d show the MCMC trajectories and cumulative mean plots for the different parameters when the true weight for the random effects is ω0 = 0.8. As in the earlier cases, the figures shown originate from the data size with 30 groups and 10 observations per group (J=30, I=10). Unlike the earlier results, there are now cases where the parameter estimates never converged and the trajectories look poor. This occurred in the cases where the number of groups is 30 or larger.

5.2 Study 2 – Fixed effects study

The target model for this study is (4.2), where the interest lies in how the estimates of the fixed effects parameters β1 and β2 change with the size of the data and with different spike-and-slab priors. Hence it is not necessary to look into different values of ω0 for this study; only the case where the true value of ω0 = 0.2 is considered. The priors induced when estimating model (4.2) in study 2 are as follows:

β0 ~ N(0, 100)
εij ~ Unif(0, 100)
r = 0.000025
v = 3
ω0 ~ Unif(0, 1)

The hyperparameter Q and the group specific deviation from the intercept, β0j, are not as important now as they were in simulation study 1. Thus Q is held as a fixed constant and β0j is assumed to follow just one type of spike-and-slab prior. With respect to the results from simulation study 1, these are set as:

Q = 15
β0j ~ (1 − ω0)N(0, rQ) + ω0 N(0, Q)

The parameters of interest are given the priors

β1 ~ (1 − w1)p_spike(β1|…) + w1 p_slab(β1|…)
β2 ~ (1 − w2)p_spike(β2|…) + w2 p_slab(β2|…)

where the hyperparameters for the fixed effects are set to:

τ1², τ2² = 0.0001
c1², c2² = {45², 135², 405²}
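To make the mixture above concrete, the following is a minimal NumPy sketch (illustrative only, not the thesis's estimation code; the function name and the choice of w are hypothetical) of how draws from the NN version of such a spike-and-slab prior could be generated:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_nn_spike_slab(w, tau2, c2, sigma_e2, size=10000):
    """Draw from the NN mixture (1-w)N(0, s2*tau2) + w N(0, s2*tau2*c2),
    where s2 denotes the noise variance sigma_e^2."""
    in_slab = rng.random(size) < w
    sd = np.where(in_slab,
                  np.sqrt(sigma_e2 * tau2 * c2),   # slab component
                  np.sqrt(sigma_e2 * tau2))        # spike component
    return rng.normal(0.0, sd)

# One of the settings above: tau^2 = 0.0001, c^2 = 45^2, sigma_e^2 = 0.7^2
draws = draw_nn_spike_slab(w=0.5, tau2=1e-4, c2=45**2, sigma_e2=0.49)
```

The large variance ratio between slab and spike is what lets the mixture either shrink a coefficient essentially to zero or leave it nearly unpenalized.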

The best estimation results have been collected and are visualized in the following tables.


Table 5.4: Posterior mean estimations of the fixed effect weight w1.

W1 = 1      NN(c=45)  LL(c=45)  StSt(c=135)  LSt(c=135)
J=5,I=10    0.97      0.93      0.79         0.80
J=5,I=30    0.99      0.94      0.80         0.79
J=5,I=50    0.99      0.94      0.80         0.79
J=10,I=10   0.98      0.94      0.80         0.80
J=10,I=30   0.99      0.94      0.80         0.79
J=10,I=50   0.99      0.95      0.79         0.79
J=30,I=10   0.99      0.95      0.80         0.80
J=30,I=30   0.99      0.94      0.80         0.79
J=30,I=50   1.00      0.95      0.80         0.79
J=50,I=10   0.99      0.95      0.80         0.80
J=50,I=30   1.00      0.94      0.79         0.80
J=50,I=50   1.00      0.95      0.80         0.80

The ideal value for w1 is 1. The NN and LL spike-and-slab priors estimated the weight w1 very well when the hyperparameter c was set to 45, and StSt and LSt were in principle equally good. The same trend as before, where the StSt and LSt priors seem to underestimate the true expected weight compared to the other two priors, also occurs here. Studying the NN prior in Table 5.4, there is an indication that larger datasets yield better estimates of the fixed weight.

Table 5.5: Posterior mean estimations of the fixed effect weight w2.

W2 = 0      NN(c=405)  LL(c=405)  StSt(c=405)  LSt(c=405)
J=5,I=10    0.25       0.28       0.21         0.20
J=5,I=30    0.21       0.23       0.18         0.17
J=5,I=50    0.21       0.22       0.18         0.19
J=10,I=10   0.23       0.25       0.18         0.19
J=10,I=30   0.23       0.23       0.18         0.19
J=10,I=50   0.21       0.21       0.17         0.17
J=30,I=10   0.20       0.21       0.17         0.18
J=30,I=30   0.18       0.22       0.15         0.14
J=30,I=50   0.19       0.19       0.16         0.16
J=50,I=10   0.19       0.20       0.15         0.18
J=50,I=30   0.21       0.22       0.17         0.17
J=50,I=50   0.18       0.20       0.14         0.15

Table 5.5 shows that the StSt and LSt priors provide the best estimates with respect to the anticipated value of the weight w2, whose ideal value is 0. The NN prior seems to yield slightly better estimates than the LL prior.


Table 5.6: Posterior mean estimations of the coefficient β1.

β1 = 5      NN(c=135)  LL(c=45)  StSt(c=135)  LSt(c=135)
J=5,I=10    4.91       4.93      4.97         4.97
J=5,I=30    4.97       4.98      4.99         4.99
J=5,I=50    4.97       4.98      4.98         4.98
J=10,I=10   4.88       4.89      4.90         4.91
J=10,I=30   4.99       4.99      5.00         5.00
J=10,I=50   5.00       5.00      5.00         5.00
J=30,I=10   5.05       5.05      5.06         5.06
J=30,I=30   5.02       5.02      5.02         5.02
J=30,I=50   5.03       5.03      5.03         5.03
J=50,I=10   5.00       5.00      5.01         5.01
J=50,I=30   4.98       4.98      4.99         4.99
J=50,I=50   4.99       4.99      4.99         4.99

Table 5.7: Posterior mean estimations of the coefficient β2.

β2 = 0      NN(c=135)  LL(c=45)  StSt(c=45)  LSt(c=45)
J=5,I=10    0.05       0.04      0.02        0.02
J=5,I=30    0.00       0.00      0.00        0.00
J=5,I=50    -0.03      -0.02     -0.02       -0.02
J=10,I=10   -0.02      -0.01     -0.01       -0.01
J=10,I=30   -0.03      -0.02     -0.02       -0.02
J=10,I=50   0.02       0.02      0.01        0.01
J=30,I=10   0.00       0.00      0.00        0.00
J=30,I=30   0.01       0.01      0.01        0.01
J=30,I=50   0.00       0.00      0.00        0.00
J=50,I=10   0.00       0.00      0.00        0.00
J=50,I=30   -0.03      -0.03     -0.02       -0.02
J=50,I=50   0.00       0.00      0.00        0.00

The true values of β1 and β2 are 5 and 0 respectively, as can be seen in the top-left corners of Tables 5.6 and 5.7. Regardless of which spike-and-slab prior was induced, the obtained estimates of the fixed coefficients β1 and β2 are very good. There is no indication that any specific choice of spike-and-slab prior is better to use than the others.


Figure 5.5: MCMC trajectories and cumulative mean plots for the estimated parameters when the induced spike-and-slab prior was NN (J=30, I=10) and the hyperparameter c = 45.

Figure 5.6: MCMC trajectories and cumulative mean plots for the estimated parameters when the induced spike-and-slab prior was LL (J=30, I=10) and the hyperparameter c = 45.


Figure 5.7: MCMC trajectories and cumulative mean plots for the estimated parameters when the induced spike-and-slab prior was StSt (J=30, I=10) and the hyperparameter c = 135.

Figure 5.8: MCMC trajectories and cumulative mean plots for the estimated parameters when the induced spike-and-slab prior was LSt (J=30, I=10) and the hyperparameter c = 135.

Figures 5.5-5.8 show the MCMC trajectories and cumulative mean plots for the parameters of interest in this study, for the four different spike-and-slab priors. As can be seen in the figures, the MCMC chains look good and the posterior means seem to converge relatively fast, after approximately 1000 iterations.


5.3 Study 3 – Fixed and random effects study

The target model for this study is model (4.3). The main interest is to see how well the true values of the different parameters can be estimated when inducing spike-and-slab priors on both the fixed and the random effects. The parameters of most interest are the coefficients for both the fixed and the random effects of the explanatory variables X1ij and X2ij. The priors induced when estimating model (4.3) are as follows:

β0 ~ N(0, 100)
εij ~ Unif(0, 100)
rk = 0.000025, k = 0, 1, 2
Q0 = 15
vk = 3, k = 0, 1, 2
ω0, ω1, ω2 ~ Unif(0, 1)
β0j ~ (1 − ω0)N(0, r0Q0) + ω0 N(0, Q0)

A good parameter setting was obtained after a careful comparison of the estimates produced by the different combinations of priors and hyperparameter settings in study 2. The setting that yielded the most precise parameter estimates relative to the true values was

β1 ~ (1 − w1)N(0, σe²τ1²) + w1 N(0, σe²τ1²c1²)
β2 ~ (1 − w2)N(0, σe²τ2²) + w2 N(0, σe²τ2²c2²)

where c1², c2² = 135² and τ1², τ2² = 0.0001. This setting is therefore used when estimating model (4.3) in study 3, to limit the large number of prior combinations that would otherwise need to be estimated. When adding the possibility of random effects for the two explanatory variables, the additional parameters that need to be estimated are Q1, Q2, β1j and β2j. For the same reason as in simulation study 1, the scale hyperparameters Q1 and Q2 are held at different fixed constant values, Q1, Q2 = {5, 15, 45, 135}, and the group specific deviation parameters for the two explanatory variables follow the four different spike-and-slab prior combinations.

βkj ~ (1 − ωk)p_spike(βkj|…) + ωk p_slab(βkj|…), k = 1, 2
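For intuition about the data such a model describes, the following is a minimal simulation sketch. It assumes model (4.3) has the standard linear mixed effects form with a random intercept and random slopes, and it idealizes the near-zero spike by letting each group deviation be exactly zero with probability 1 − ωk; all names and default values are illustrative, not the thesis's code:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_lme(J, I, beta0=4.0, beta=(5.0, 0.0), Q=(15.0, 15.0, 15.0),
                 omega=(0.2, 0.4, 0.6), sigma_e=0.7):
    """Simulate grouped data from
    y_ij = (beta0 + b0j) + (beta1 + b1j) x1ij + (beta2 + b2j) x2ij + eps_ij,
    where each group deviation bkj is drawn from the slab N(0, Qk)
    with probability omega_k and is zero (the spike) otherwise."""
    b = np.stack([np.where(rng.random(J) < om,
                           rng.normal(0.0, np.sqrt(q), J), 0.0)
                  for om, q in zip(omega, Q)])           # shape (3, J)
    g = np.repeat(np.arange(J), I)                       # group index per row
    x1, x2 = rng.normal(size=(2, J * I))
    eps = rng.normal(0.0, sigma_e, J * I)
    y = (beta0 + b[0, g]) + (beta[0] + b[1, g]) * x1 + (beta[1] + b[2, g]) * x2 + eps
    return y, x1, x2, g
```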

The best estimation results when estimating model (4.3) can be seen in the following tables.


Table 5.8.a: Posterior mean estimations of the random effects weight ω1.

ω1 = 0.4    NN(Q1=15)  LL(Q1=135)  StSt(Q1=45)  LSt(Q1=45)
J=5,I=10    0.63       0.59        0.52         0.52
J=5,I=30    0.62       0.58        0.50         0.52
J=5,I=50    0.63       0.58        0.52         0.54
J=10,I=10   0.53       0.57        0.48         0.48
J=10,I=30   0.54       0.57        0.49         0.48
J=10,I=50   0.54       0.56        0.49         0.49
J=30,I=10   0.58       0.54        0.47         0.46
J=30,I=30   0.57       0.54        0.46         0.46
J=30,I=50   0.57       0.54        0.47         0.46
J=50,I=10   0.47       0.53        0.43         0.44
J=50,I=30   0.46       0.53        0.43         0.43
J=50,I=50   0.47       0.52        0.43         0.44

The true value of ω1 is 0.4. The information that can be derived from Table 5.8.a is that the StSt and LSt priors outperform the NN and LL priors, in the sense that they already yield fairly good estimates of the random effects weight ω1 when the number of groups is 10. In this context, fairly good refers to the difference between the estimated and true value being at most 0.1. The NN prior also yielded fairly good estimates, but not until the number of groups reached 50.

Table 5.8.b: Posterior mean estimations of the fixed effect weight w1.

w1 = 1      NN(Q1=15)  LL(Q1=135)  StSt(Q1=45)  LSt(Q1=45)  w1 study 2
J=5,I=10    0.51       0.52        0.51         0.51        0.95
J=5,I=30    0.50       0.50        0.50         0.50        0.96
J=5,I=50    0.51       0.50        0.50         0.49        0.97
J=10,I=10   0.52       0.53        0.50         0.51        0.96
J=10,I=30   0.51       0.53        0.52         0.53        0.96
J=10,I=50   0.52       0.53        0.52         0.52        0.97
J=30,I=10   0.62       0.71        0.65         0.66        0.97
J=30,I=30   0.64       0.69        0.66         0.67        0.97
J=30,I=50   0.66       0.71        0.65         0.65        0.97
J=50,I=10   0.91       0.88        0.89         0.90        0.97
J=50,I=30   0.90       0.88        0.90         0.90        0.97
J=50,I=50   0.90       0.87        0.90         0.89        0.97


Table 5.8.b shows how the NN prior's estimates of the fixed effect weight w1 changed from model (4.2) to model (4.3), when the explanatory variables are allowed to have both fixed and random effects. The last column, w1 study 2, shows the values obtained in model (4.2). All the different random effect priors affect the estimation of w1 roughly equally. The LL prior performed slightly better than the rest when the number of groups increased to 30. Unlike in simulation study 2, where only 5 groups were needed for the NN prior to estimate the fixed weight w1 well, at least 50 groups are now needed for fairly good estimates.

Table 5.8.c: Posterior mean estimations of the coefficient β1.

β1 = 5      NN(Q1=15)  LL(Q1=135)  StSt(Q1=45)  LSt(Q1=45)  β1 study 2
J=5,I=10    0.16       0.19        0.17         0.15        4.91
J=5,I=30    0.13       0.16        0.11         0.10        4.97
J=5,I=50    0.09       0.14        0.12         0.10        4.97
J=10,I=10   0.24       0.29        0.18         0.22        4.88
J=10,I=30   0.18       0.26        0.22         0.24        4.99
J=10,I=50   0.18       0.26        0.20         0.21        5.00
J=30,I=10   0.73       1.08        0.83         0.88        5.05
J=30,I=30   0.82       1.06        0.92         0.91        5.02
J=30,I=50   0.89       1.07        0.82         0.85        5.03
J=50,I=10   2.81       2.48        2.64         2.67        5.00
J=50,I=30   2.74       2.36        2.65         2.73        4.98
J=50,I=50   2.64       2.23        2.71         2.50        4.99

Table 5.8.c shows the estimated values of β1 when the first explanatory variable is allowed to have both fixed and random effects. The last column, β1 study 2, shows the values obtained in model (4.2). How well β1 is estimated depends very much on how well the fixed weight parameter w1 is estimated. Hence, the information in Table 5.8.c is in principle the same as in Table 5.8.b.


Table 5.9.a: Posterior mean estimations of the random effects weight ω2.

ω2 = 0.6    NN(Q2=5)  LL(Q2=15)  StSt(Q2=15)  LSt(Q2=15)
J=5,I=10    0.83      0.77       0.76         0.76
J=5,I=30    0.83      0.78       0.76         0.76
J=5,I=50    0.83      0.77       0.77         0.76
J=10,I=10   0.60      0.67       0.56         0.56
J=10,I=30   0.63      0.69       0.60         0.60
J=10,I=50   0.62      0.70       0.61         0.60
J=30,I=10   0.54      0.58       0.49         0.49
J=30,I=30   0.54      0.58       0.48         0.48
J=30,I=50   0.55      0.59       0.49         0.49
J=50,I=10   0.66      0.68       0.57         0.58
J=50,I=30   0.66      0.68       0.57         0.58
J=50,I=50   0.66      0.69       0.58         0.58

The true value of ω2 is 0.6, and the information gained from Table 5.9.a is that the StSt and LSt priors outperformed the NN and LL priors with respect to estimation accuracy. For the cases where the number of groups equals 30, the LL prior produced the best estimates.

Table 5.9.b: Posterior mean estimations of the fixed effect weight w2.

w2 = 0      NN(Q2=5)  LL(Q2=15)  StSt(Q2=15)  LSt(Q2=15)  w2 study 2
J=5,I=10    0.50      0.48       0.50         0.49        0.31
J=5,I=30    0.50      0.49       0.50         0.50        0.27
J=5,I=50    0.50      0.48       0.51         0.50        0.27
J=10,I=10   0.47      0.49       0.47         0.47        0.28
J=10,I=30   0.48      0.50       0.48         0.47        0.28
J=10,I=50   0.48      0.50       0.48         0.48        0.26
J=30,I=10   0.55      0.65       0.59         0.59        0.25
J=30,I=30   0.56      0.62       0.59         0.59        0.22
J=30,I=50   0.53      0.65       0.61         0.60        0.21
J=50,I=10   0.42      0.43       0.42         0.42        0.25
J=50,I=30   0.42      0.43       0.42         0.43        0.27
J=50,I=50   0.43      0.45       0.44         0.42        0.21

Table 5.9.b shows how the NN prior's estimates of the fixed effect weight w2 changed from model (4.2) to model (4.3), when the explanatory variables are allowed to have both fixed and random effects. The column w2 study 2 shows the values obtained in model (4.2), before the random effects were induced. All the different random effect priors affect the estimation of w2 roughly equally. The LL prior performed slightly worse than the other three chosen spike-and-slab priors when the number of groups was 30.

Table 5.9.c: Posterior mean estimations of the coefficient β2.

β2 = 0      NN(Q2=5)  LL(Q2=15)  StSt(Q2=15)  LSt(Q2=15)  β2 study 2
J=5,I=10    -0.20     -0.02      -0.14        -0.13       0.05
J=5,I=30    -0.17     -0.06      -0.12        -0.10       0.00
J=5,I=50    -0.19     -0.03      -0.12        -0.14       -0.03
J=10,I=10   0.07      0.00       0.07         0.06        -0.02
J=10,I=30   0.09      0.02       0.06         0.07        -0.03
J=10,I=50   0.08      0.01       0.04         0.06        0.02
J=30,I=10   -0.42     -0.73      -0.56        -0.55       0.00
J=30,I=30   -0.46     -0.66      -0.57        -0.52       0.01
J=30,I=50   -0.44     -0.73      -0.58        -0.55       0.00
J=50,I=10   0.03      -0.06      -0.02        0.02        0.00
J=50,I=30   0.07      -0.04      -0.05        -0.02       -0.03
J=50,I=50   0.04      -0.10      -0.03        0.01        0.00

Table 5.9.c shows how the NN prior's estimates of the fixed effect β2 changed from model (4.2) to model (4.3) when inducing different spike-and-slab priors on the random effects. When the number of groups is 10 or less, there is an indication that the LL prior has a stronger shrinkage effect than the other chosen priors.


Figure 5.9.a: MCMC trajectories for the estimated parameters when the induced spike-and-slab prior was NN.

Figure 5.9.b: Cumulative mean plots for the estimated parameters when the induced spike-and-slab prior was NN.


Figure 5.10.a: MCMC trajectories for the estimated parameters when the induced spike-and-slab prior was LL.

Figure 5.10.b: Cumulative mean plots for the estimated parameters when the induced spike-and-slab prior was LL.


Figure 5.11.a: MCMC trajectories for the estimated parameters when the induced spike-and-slab prior was StSt.

Figure 5.11.b: Cumulative mean plots for the estimated parameters when the induced spike-and-slab prior was StSt.


Figure 5.12.a: MCMC trajectories for the estimated parameters when the induced spike-and-slab prior was LSt.

Figure 5.12.b: Cumulative mean plots for the estimated parameters when the induced spike-and-slab prior was LSt.


The MCMC trajectories and cumulative mean plots of the estimated parameters for the four different spike-and-slab combinations induced on the random effects can be seen in Figures 5.9-5.12. The MCMC chains look good and convergence is reached early in most cases.

5.4 Study 4 – Fixed and random effects with correlation study

The target model for study 4 is the same as in study 3. The difference between the two studies is that the explanatory variables now have different degrees of correlation with each other. The main interest is to see whether the results obtained in study 3 change as the correlation between the explanatory variables increases. The prior choices and parameter settings for model estimation are therefore exactly the same as in study 3; examining all possible combinations of priors and hyperparameter settings would be tedious and is not of interest with respect to the goal of this study. A sketch of how such correlated predictors could be generated is given below.
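As an illustration (not the thesis's simulation code; the function name is hypothetical), predictors with a prescribed correlation ρ can be drawn from a bivariate normal distribution:

```python
import numpy as np

def correlated_predictors(n, rho, rng=None):
    """Draw n observations of (x1, x2) from a standard bivariate normal
    with correlation rho, e.g. rho in {0, 0.3, 0.9} as in this study."""
    if rng is None:
        rng = np.random.default_rng(0)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=n)
    return x[:, 0], x[:, 1]
```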

The priors and hyperparameter settings for this study are as follows:

β0 ~ N(0, 100)
β1 ~ (1 − w1)N(0, σe²τ1²) + w1 N(0, σe²τ1²c1²)
β2 ~ (1 − w2)N(0, σe²τ2²) + w2 N(0, σe²τ2²c2²)
εij ~ Unif(0, 100)
ω0, ω1, ω2 ~ Unif(0, 1)
β0j ~ (1 − ω0)N(0, r0Q0) + ω0 N(0, Q0)
β1j ~ (1 − ω1)N(0, r1Q1) + ω1 N(0, Q1)
β2j ~ (1 − ω2)N(0, r2Q2) + ω2 N(0, Q2)
τ1², τ2² = 0.0001
c1², c2² = 135²
Q0 = 15, Q1 = 5, Q2 = 15
rk = 0.000025, vk = 3, where k = 0, 1, 2

This was the combination of settings that yielded good parameter estimates in study 3. The changes in the estimated values of the parameters of interest as the degree of correlation increases can be seen in the following tables:


Table 5.10.a: Posterior mean estimations of the fixed effects weights.

W1 = 1      ρ = 0   ρ = 0.3  ρ = 0.9
J=5,I=10    0.51    0.51     0.50
J=5,I=30    0.50    0.51     0.51
J=5,I=50    0.51    0.50     0.51
J=10,I=10   0.52    0.52     0.51
J=10,I=30   0.51    0.51     0.51
J=10,I=50   0.52    0.52     0.52
J=30,I=10   0.62    0.64     0.61
J=30,I=30   0.64    0.62     0.62
J=30,I=50   0.66    0.64     0.61
J=50,I=10   0.91    0.90     0.90
J=50,I=30   0.90    0.90     0.90
J=50,I=50   0.90    0.90     0.89

W2 = 0      ρ = 0   ρ = 0.3  ρ = 0.9
J=5,I=10    0.50    0.49     0.51
J=5,I=30    0.50    0.50     0.50
J=5,I=50    0.50    0.51     0.51
J=10,I=10   0.47    0.47     0.48
J=10,I=30   0.48    0.47     0.47
J=10,I=50   0.48    0.47     0.48
J=30,I=10   0.55    0.57     0.61
J=30,I=30   0.56    0.56     0.57
J=30,I=50   0.53    0.58     0.60
J=50,I=10   0.42    0.43     0.43
J=50,I=30   0.42    0.43     0.43
J=50,I=50   0.43    0.43     0.43

What can be seen in Table 5.10.a is that, when the number of groups is 10 or less, increasing correlation does not affect the estimation of the fixed effects weights. When there are 30 or more groups, there is a weak indication that W1 trends downwards and W2 trends upwards as the correlation between the explanatory variables increases.

Table 5.10.b: Posterior mean estimations of the fixed effects coefficients.

β1 = 5      ρ = 0   ρ = 0.3  ρ = 0.9
J=5,I=10    0.16    0.15     0.11
J=5,I=30    0.13    0.10     0.11
J=5,I=50    0.09    0.08     0.15
J=10,I=10   0.24    0.25     0.19
J=10,I=30   0.18    0.21     0.22
J=10,I=50   0.18    0.20     0.24
J=30,I=10   0.73    0.83     0.62
J=30,I=30   0.82    0.73     0.72
J=30,I=50   0.89    0.80     0.70
J=50,I=10   2.81    2.81     2.64
J=50,I=30   2.74    2.58     2.76
J=50,I=50   2.64    2.67     2.56

β2 = 0      ρ = 0   ρ = 0.3  ρ = 0.9
J=5,I=10    -0.20   -0.21    -0.20
J=5,I=30    -0.17   -0.19    -0.15
J=5,I=50    -0.19   -0.17    -0.20
J=10,I=10   0.07    0.08     0.03
J=10,I=30   0.09    0.04     0.07
J=10,I=50   0.08    0.07     0.05
J=30,I=10   -0.42   -0.54    -0.54
J=30,I=30   -0.46   -0.48    -0.47
J=30,I=50   -0.44   -0.53    -0.54
J=50,I=10   0.03    0.02     -0.01
J=50,I=30   0.07    0.03     0.03
J=50,I=50   0.04    0.00     0.01


Table 5.10.b shows how the fixed effects coefficients change when different degrees of correlation between the explanatory variables are induced. The changes appear random, with no clear patterns; hence no trends can be identified as correlation increases.

Table 5.10.c: Posterior mean estimations of the random effects weights.

ω1 = 0.4    ρ = 0   ρ = 0.3  ρ = 0.9
J=5,I=10    0.63    0.64     0.63
J=5,I=30    0.62    0.64     0.63
J=5,I=50    0.63    0.63     0.63
J=10,I=10   0.53    0.53     0.53
J=10,I=30   0.54    0.53     0.54
J=10,I=50   0.54    0.53     0.55
J=30,I=10   0.58    0.56     0.57
J=30,I=30   0.57    0.58     0.57
J=30,I=50   0.57    0.57     0.57
J=50,I=10   0.47    0.46     0.48
J=50,I=30   0.46    0.47     0.47
J=50,I=50   0.47    0.46     0.47

ω2 = 0.6    ρ = 0   ρ = 0.3  ρ = 0.9
J=5,I=10    0.83    0.82     0.78
J=5,I=30    0.83    0.83     0.82
J=5,I=50    0.83    0.83     0.83
J=10,I=10   0.60    0.63     0.69
J=10,I=30   0.63    0.64     0.64
J=10,I=50   0.62    0.63     0.65
J=30,I=10   0.54    0.53     0.56
J=30,I=30   0.54    0.55     0.56
J=30,I=50   0.55    0.54     0.54
J=50,I=10   0.66    0.66     0.66
J=50,I=30   0.66    0.66     0.67
J=50,I=50   0.66    0.66     0.66

Table 5.10.c shows the estimated values of both random effect weights for different degrees of correlation between the explanatory variables. One can conclude that the estimates of the random effect weights are essentially the same regardless of the degree of correlation between the explanatory variables.

Figure 5.13.a: MCMC trajectories for the estimated fixed coefficients.


Figure 5.13.b: Cumulative mean plots for the estimated fixed coefficients.

Figure 5.14.a: MCMC trajectories for the estimated random effect weights.


Figure 5.14.b: Cumulative mean plots for the estimated random effect weights.

Figure 5.15.a: MCMC trajectories for the estimated parameters of the error term and the fixed effect weights.


Figure 5.15.b: Cumulative mean plots for the estimated parameters of the error term and the fixed effect weights.

Figures 5.13-5.15 show the MCMC trajectories and cumulative mean plots for all estimated parameters in the model. The first column of each figure shows the results obtained when there is no correlation between the explanatory variables, the second column when the correlation is 0.3, and the third column when the correlation is 0.9. Regardless of the degree of correlation, the figures show that, in all cases, the MCMC trajectories have the desired appearance and the parameter estimates converge.


5.5 Study 5 – Fixed and random effects in higher dimensions study

The target model for this study is model (4.4). The main interest is to see how well the true values of the different parameters can be estimated when inducing smoothing spike-and-slab priors on both the fixed and the random effects. The parameters of most interest are the coefficients for both the fixed and the random effects of all explanatory variables.

Due to lack of time, only three scenarios are examined, and the fixed effects are only induced with the NN prior. The data-size scenarios are: few groups with many observations (J=10, I=50), many groups with few observations (J=50, I=10), and many groups with many observations (J=50, I=50).

The priors induced when estimating model (4.4) are as follows:

β0 ~ N(0, 100)
βk ~ (1 − wk)N(0, σe²τk²) + wk N(0, σe²τk²ck²), k = 1, …, 10
εij ~ Unif(0, 100)
βlj ~ (1 − ωl)p_spike(βlj|…) + ωl p_slab(βlj|…)
ωl ~ Unif(0, 1), l = 0, …, 10

Knowledge obtained from study 3 is used here to avoid hyperparameter settings that lead to poor results. The hyperparameter settings are as follows:

τk² = 0.0001
ck² = 135²
Ql = 5 for NN, 15 for LL, StSt and LSt
rl = 0.000025
vl = 3
where k = 1, …, 10 and l = 0, …, 10


The results are collected and can be seen in the following tables and figures.

Table 5.11: Posterior mean estimations of the model parameters.

J=10, I=50   NN(Q=5)  LL(Q=15)  StSt(Q=15)  LSt(Q=15)
β0 = 1       1.02     1.02      1.02        1.02
β1 = 3       2.91     2.91      2.91        2.90
β2 = 3       1.01     1.04      1.00        0.99
β3 = 3       0.54     0.70      0.58        0.57
β4 = -3      -1.28    -1.45     -1.28       -1.27
β5 = -3      -0.65    -0.74     -0.63       -0.60
β6 = 0       -0.01    -0.11     -0.05       -0.05
β7 = 0       -0.21    -0.44     -0.25       -0.23
β8 = 0       0.00     -0.01     -0.01       -0.01
β9 = 0       -0.04    -0.04     -0.04       -0.04
β10 = 0      0.01     0.01      0.01        0.01
σe = 0.7     0.69     0.70      0.69        0.69
w1 = 1       0.93     0.93      0.93        0.93
w2 = 1       0.69     0.71      0.69        0.69
w3 = 1       0.58     0.62      0.60        0.59
w4 = 1       0.73     0.78      0.74        0.74
w5 = 1       0.61     0.63      0.60        0.60
w6 = 0       0.45     0.46      0.45        0.45
w7 = 0       0.50     0.57      0.51        0.51
w8 = 0       0.25     0.25      0.25        0.26
w9 = 0       0.29     0.30      0.29        0.30
w10 = 0      0.25     0.25      0.25        0.25
ω0 = 0       0.01     0.00      0.00        0.00
ω1 = 0       0.01     0.01      0.01        0.01
ω2 = 0.3     0.57     0.59      0.53        0.53
ω3 = 0.6     0.75     0.71      0.65        0.65
ω4 = 0.3     0.61     0.64      0.58        0.59
ω5 = 0.6     0.82     0.79      0.77        0.77
ω6 = 0.3     0.42     0.45      0.37        0.37
ω7 = 0.6     0.68     0.72      0.65        0.65
ω8 = 0       0.01     0.01      0.01        0.01
ω9 = 0       0.01     0.01      0.01        0.01
ω10 = 0      0.01     0.01      0.01        0.01


Figure 5.16.a: MCMC trajectories for some of the estimated parameters.

Figure 5.16.b: Cumulative mean plots for some of the estimated parameters.


Table 5.11 shows the estimated results for all parameters of interest when the different spike-and-slab priors were induced. The true value of each parameter can be seen in the first column of the table. The parameter estimates for the explanatory variables that have no effects are good, whereas all the explanatory variables with both fixed and random effects were poorly estimated. For the intercept, the estimates of both the fixed and the random effect are almost ideal.

There is an indication that the LL prior is the best of the chosen priors for estimating the fixed effect parameters of explanatory variables that truly have fixed effects. Table 5.11 also shows that the StSt and LSt priors are basically equally good.

Figures 5.16.a-5.16.b show that the chosen parameters have desirable MCMC trajectories and that convergence is obtained with respect to the cumulative mean plots.


Table 5.12: Posterior mean estimations of the model parameters.

J=50, I=10   NN(Q=5)  LL(Q=15)  StSt(Q=15)  LSt(Q=15)
β0 = 1       1.03     1.03      1.03        1.03
β1 = 3       3.13     3.14      3.13        3.13
β2 = 3       2.65     2.56      2.62        2.62
β3 = 3       1.66     1.84      1.81        1.82
β4 = -3      -2.84    -2.85     -2.83       -2.83
β5 = -3      -1.69    -1.46     -1.69       -1.68
β6 = 0       -0.13    -0.09     -0.12       -0.12
β7 = 0       -0.05    0.10      0.00        0.00
β8 = 0       0.05     0.06      0.06        0.06
β9 = 0       -0.03    -0.02     -0.02       -0.02
β10 = 0      -0.01    -0.01     -0.01       -0.01
σe = 0.7     0.79     0.80      0.79        0.79
w1 = 1       0.92     0.92      0.92        0.92
w2 = 1       0.90     0.89      0.90        0.90
w3 = 1       0.82     0.84      0.84        0.84
w4 = 1       0.91     0.91      0.91        0.91
w5 = 1       0.82     0.78      0.82        0.82
w6 = 0       0.39     0.37      0.39        0.39
w7 = 0       0.42     0.44      0.42        0.43
w8 = 0       0.30     0.30      0.30        0.30
w9 = 0       0.28     0.26      0.27        0.27
w10 = 0      0.26     0.26      0.26        0.26
ω0 = 0       0.02     0.01      0.01        0.01
ω1 = 0       0.02     0.02      0.02        0.02
ω2 = 0.3     0.31     0.34      0.28        0.28
ω3 = 0.6     0.53     0.55      0.46        0.46
ω4 = 0.3     0.28     0.30      0.26        0.26
ω5 = 0.6     0.62     0.67      0.55        0.55
ω6 = 0.3     0.28     0.28      0.24        0.24
ω7 = 0.6     0.68     0.71      0.61        0.60
ω8 = 0       0.02     0.02      0.02        0.02
ω9 = 0       0.02     0.02      0.02        0.02
ω10 = 0      0.02     0.02      0.01        0.01


Figure 5.17.a: MCMC trajectories for some of the estimated parameters.

Figure 5.17.b: Cumulative mean plots for some of the estimated parameters.


Table 5.12 shows the estimated results for all parameters of interest when the different spike-and-slab priors were induced. The true value of each parameter can be seen in the first column of the table. As can be seen, the parameter estimates for the explanatory variables that have no effects are good. All estimates for the explanatory variables with both fixed and random effects are now more accurate than in the case with few groups and many observations. For the intercept, the estimates of both the fixed and the random effect are almost ideal. All the random effects weights are fairly well estimated; as mentioned earlier in the thesis, fairly good refers to the difference between the true and estimated value being less than 0.1.

Table 5.12 shows that StSt and LSt are basically equally good. There are no clear patterns showing which of the chosen priors is better to use for the different effects.

Figures 5.17.a-5.17.b show that the chosen parameters have a desirable MCMC appearance and that convergence is obtained with respect to the cumulative mean plots.


Table 5.13: Posterior mean estimations of the model parameters.

J=50, I=50   NN(Q=5)  LL(Q=15)  StSt(Q=15)  LSt(Q=15)
β0 = 1       1.01     1.01      1.01        1.01
β1 = 3       3.01     3.01      3.01        3.01
β2 = 3       2.65     2.63      2.66        2.68
β3 = 3       1.72     1.91      1.85        1.86
β4 = -3      -2.76    -2.84     -2.79       -2.80
β5 = -3      -1.65    -1.32     -1.58       -1.61
β6 = 0       -0.20    -0.08     -0.19       -0.19
β7 = 0       -0.06    -0.03     -0.01       0.00
β8 = 0       -0.02    -0.02     -0.02       -0.02
β9 = 0       0.01     0.01      0.01        0.01
β10 = 0      -0.01    -0.01     -0.01       0.00
σe = 0.7     0.69     0.69      0.69        0.69
w1 = 1       0.93     0.93      0.93        0.93
w2 = 1       0.92     0.92      0.92        0.92
w3 = 1       0.85     0.87      0.86        0.86
w4 = 1       0.92     0.92      0.92        0.92
w5 = 1       0.84     0.79      0.84        0.84
w6 = 0       0.44     0.36      0.44        0.44
w7 = 0       0.44     0.44      0.43        0.43
w8 = 0       0.24     0.25      0.24        0.24
w9 = 0       0.21     0.21      0.21        0.22
w10 = 0      0.21     0.22      0.22        0.21
ω0 = 0       0.01     0.00      0.00        0.00
ω1 = 0       0.01     0.01      0.00        0.00
ω2 = 0.3     0.32     0.36      0.29        0.29
ω3 = 0.6     0.52     0.53      0.46        0.46
ω4 = 0.3     0.30     0.32      0.28        0.28
ω5 = 0.6     0.60     0.66      0.55        0.55
ω6 = 0.3     0.27     0.28      0.23        0.23
ω7 = 0.6     0.69     0.73      0.61        0.61
ω8 = 0       0.01     0.01      0.01        0.01
ω9 = 0       0.01     0.01      0.00        0.00
ω10 = 0      0.01     0.01      0.01        0.01


Figure 5.18.a: MCMC trajectories for some of the estimated parameters.

Figure 5.18.b: Cumulative mean plots for some of the estimated parameters.


Table 5.13 shows the estimated results for all parameters when the different spike-and-slab priors were induced. The true value of each parameter can be seen in the first column of the table. As can be seen, the parameter estimates for the explanatory variables that have no effects are good. The estimates of the fixed effects for the explanatory variables with both fixed and random effects are better here than in the previous case with the same number of groups but fewer observations. For the intercept, the estimates of both the fixed and the random effect are again almost ideal. All the random effects weights are fairly well estimated, except for ω7 when LL was induced. Table 5.13 shows that the StSt and LSt priors are basically equally good, and there are no clear patterns showing which of the chosen priors is better to use for the different effects.

Figures 5.18.a-5.18.b show that the chosen parameters have a desirable MCMC appearance and that convergence is obtained with respect to the cumulative mean plots.


5.6 Real dataset

Unimodal smoothing shrinkage priors:
The prior settings when using unimodal priors are as follows:

β0 ~ N(0, 100), overall intercept
βk ~ N(0, 100), k = 1, …, K explanatory variables
εij ~ Unif(0, 100)
βlj ~ {N(0, Ql), Lap(√Ql), t2v(0, Ql/v)}, l = 0, …, K

where the scale hyperparameter Q is estimated in the same way as Frühwirth-Schnatter & Wagner (2010) did in their article:

Q ~ G⁻¹(C1, C2/c), an inverse gamma distribution with fixed shape C1 and scale C2/c, where c is a distribution-specific fixed constant. The hyperparameter settings are as follows: C1 = 0.5, C2 = 0.2275; for the Normal distribution c = 1, for the Laplace distribution c = 2, and for the Student-t distribution c = 1/(v − 1), where the degrees of freedom parameter v = 3. These are the same settings that were used or recommended by Frühwirth-Schnatter & Wagner (2010).

Spike-and-slab smoothing shrinkage priors:
The priors induced when using spike-and-slab priors are as follows:

β0 ~ N(0, 100), overall intercept
βk ~ (1 − wk)N(0, σe²τk²) + wk N(0, σe²τk²ck²), k = 1, 2, …, K variables
εij ~ Unif(0, 100)
βlj ~ (1 − ωl)p_spike(βlj|…) + ωl p_slab(βlj|…), l = 0, 1, …, K variables
ωk ~ Unif(0, 1), the weights for the random effects
wk ~ Unif(0, 1), the weights for the fixed effects

Different from the simulation studies, Q will here be estimated as Frühwirth-Schnatter & Wagner (2010) did in their article:

Q | ω ~ G⁻¹(C1, C2/s*(ω))
s*(ω) = r(1 − ω)sc1 + ω·sc2

Q conditional on the weight ω follows an inverse gamma distribution with C1 and C2 as fixed constants, where sc1 is a distribution-specific constant for the spike component, sc2 is a distribution-specific constant for the slab component, and r is the variance ratio between the two components. C1 = 0.5 and C2 = 0.2275. The specific constant is 1 for the Normal distribution, 2 for the Laplace distribution and 1/(v − 1) for the Student-t distribution. These are the same settings that were used or recommended by Frühwirth-Schnatter & Wagner (2010).

The reason Q is estimated instead of being held fixed is to turn a shrinkage prior on an individual random effect into a smoothing prior across all random effects (Frühwirth-Schnatter & Wagner, 2010).
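To make the conditional scale concrete, here is a minimal sketch (the function name is illustrative, not from the thesis or the cited article) of the inverse-gamma scale as a function of ω:

```python
def conditional_ig_scale(omega, r, sc_spike, sc_slab, C2=0.2275):
    """Scale C2 / s*(omega) of the inverse gamma prior for Q, where
    s*(omega) = r*(1 - omega)*sc_spike + omega*sc_slab.
    sc_* is 1 (Normal), 2 (Laplace) or 1/(v - 1) (Student-t)."""
    s_star = r * (1.0 - omega) * sc_spike + omega * sc_slab
    return C2 / s_star

# e.g. an NN prior with r = 0.000025 and omega = 0.5:
print(conditional_ig_scale(0.5, 0.000025, 1.0, 1.0))
```

Note how a small ω (most effects in the spike) makes s*(ω) tiny and the scale large relative to it, so the prior for Q adapts to the estimated proportion of active random effects.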


The parameter ck will be estimated with a non-informative prior instead of being held fixed as in the simulation studies. This is done with the intention of minimizing the estimation errors that could arise from setting inappropriate hyperparameter values. The hyperparameter settings are as follows:

ck ~ N(0, 100)
v = 3
r = 0.000025
τk = 0.01, k = 1, …, 6

5.6.1 Random intercept model

Recall that a RIM can be expressed as in equation (3.1). For the obtained real dataset, the variables are as follows: β = (β1, …, β6) are the estimated coefficients for the six available explanatory variables, α0 is the overall intercept and αj is the class specific deviation from α0. Xkij are the available explanatory variables. yij and εij are the response and the noise error, respectively, for student i in class j, with i = 1, …, nj students, j = 1, …, 43 classes and k = 1, …, 6 explanatory variables.

5.6.1.1 Unimodal priors

The following tables were obtained when inducing the three chosen unimodal priors on the class specific deviation parameter αj.

Table 5.14.a: Summarized results of the real dataset when inducing unimodal normal distribution on the fixed effect and unimodal normal distribution on the random effect.

αj ~ N(.)   mean   sd    2.5%   97.5%  n_eff  R̂
α0          1.55   0.34  0.86   2.22   2407   1.00
β1          0.00   0.03  -0.05  0.05   2394   1.00
β2          0.16   0.04  0.07   0.24   4000   1.00
β3          0.03   0.03  -0.03  0.10   4000   1.00
β4          0.26   0.01  0.23   0.28   4000   1.00
β5          -0.15  0.03  -0.20  -0.10  4000   1.00
β6          -0.15  0.03  -0.21  -0.08  4000   1.00
σe          0.49   0.01  0.46   0.51   4000   1.00


Table 5.14.b: Summarized results of the real dataset when inducing unimodal normal distribution on the fixed effect and unimodal Laplace distribution on the random effect.

αj ~ Lap(.)  mean   sd    2.5%   97.5%  n_eff  R̂
α0           1.55   0.40  0.77   2.36   1487   1.00
β1           0.00   0.03  -0.06  0.06   1410   1.00
β2           0.16   0.04  0.07   0.24   4000   1.00
β3           0.02   0.04  -0.05  0.09   4000   1.00
β4           0.25   0.01  0.23   0.28   4000   1.00
β5           -0.15  0.03  -0.20  -0.09  4000   1.00
β6           -0.15  0.03  -0.21  -0.08  4000   1.00
σe           0.48   0.01  0.46   0.51   4000   1.00

Table 5.14.c: Summarized results of the real dataset when inducing unimodal normal distribution on the fixed effect and unimodal Student-t distribution on the random effect.

αj ~ t2v(.)  mean   sd    2.5%   97.5%  n_eff  R̂
α0           1.54   0.34  0.88   2.20   2531   1.00
β1           0.00   0.02  -0.05  0.05   2525   1.00
β2           0.16   0.04  0.07   0.24   3480   1.00
β3           0.03   0.03  -0.03  0.10   4000   1.00
β4           0.26   0.01  0.23   0.28   4000   1.00
β5           -0.15  0.03  -0.20  -0.10  4000   1.00
β6           -0.15  0.03  -0.21  -0.08  4000   1.00
σe           0.49   0.01  0.46   0.51   4000   1.00

Tables 5.14.a-5.14.c yield in principle the same estimates of all parameters. The measurements of interest that can be obtained from the tables are the posterior mean, the 95% credibility interval (CI), n_eff and R̂. For example, Table 5.14.b shows that the estimate of the overall intercept is 1.55, and according to the 95% CI its true value is expected to lie in the interval (0.77, 2.36). Convergence of the parameter estimates is obtained with respect to the R̂ values.
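For reference, a basic (non-split) version of the Gelman-Rubin R̂ statistic reported in these tables can be sketched as follows; the software used for the thesis may compute a more refined split-chain variant, so this is illustrative only:

```python
import numpy as np

def gelman_rubin(chains):
    """Basic R-hat from an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    var_plus = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_plus / W)
```

Values close to 1.00, as in the tables above, indicate that the chains have mixed and converged to the same distribution.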


Figure 5.19: MCMC trajectory and cumulative mean plot for the estimated overall intercept, for the RIM with unimodal priors.

Convergence can also be confirmed by looking at Figure 5.19, where the MCMC trajectory and the cumulative mean plot are shown.

Table 5.15: Summarized results of the random intercept when inducing unimodal normal distribution on the fixed effect and different unimodal distributions on the random effect.

αj ~        Potential (P)  class ID
Normal      7              1,11,19,20,31,32,40
Laplace     10             1,11,14,19,20,29,30,31,32,40
Student-t   5              1,11,20,31,40

Table 5.15 summarizes which classes may have a random intercept αj deviating from the overall mean when inducing the different unimodal smoothing priors. The column Potential shows how many classes have a ratio value higher than 1 when dividing the posterior mean estimate by the posterior standard deviation. This ad hoc measurement is used to flag classes whose random effects may deviate from the overall mean. Table 5.15 indicates that the Laplace unimodal prior has the least shrinkage effect.
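A minimal sketch of this ad hoc rule is given below; since class deviations can be negative, the absolute value of the posterior mean is used here, which the rule presumably intends (the function name is illustrative):

```python
import numpy as np

def potential_classes(post_means, post_sds):
    """1-based IDs of classes whose |posterior mean| / posterior sd > 1."""
    ratio = np.abs(np.asarray(post_means)) / np.asarray(post_sds)
    return list(np.flatnonzero(ratio > 1) + 1)
```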

5.6.1.2 Spike-and-Slab priors

The following tables were obtained when inducing the four chosen spike-and-slab priors on the class specific deviation parameter αj.


Table 5.16.a: Summarized results of the real dataset when inducing NN on the fixed effect and NN on the random intercept.

αj ~ NN     mean   sd    2.5%   97.5%  n_eff  R̂
α0          1.56   0.28  1.00   2.11   4000   1.00
β1          0.00   0.02  -0.04  0.04   4000   1.00
β2          0.15   0.04  0.07   0.24   4000   1.00
β3          0.03   0.03  -0.03  0.10   4000   1.00
β4          0.26   0.01  0.23   0.28   4000   1.00
β5          -0.15  0.03  -0.20  -0.10  4000   1.00
β6          -0.14  0.03  -0.21  -0.07  4000   1.00
ω0          0.15   0.06  0.02   0.27   901    1.00
w1          0.31   0.27  0.01   0.94   4000   1.00
w2          0.54   0.24  0.16   0.97   4000   1.00
w3          0.37   0.27  0.02   0.95   4000   1.00
w4          0.60   0.21  0.23   0.97   4000   1.00
w5          0.54   0.24  0.16   0.97   4000   1.00
w6          0.52   0.24  0.15   0.96   4000   1.00
σe          0.49   0.01  0.46   0.51   4000   1.00

Table 5.16.b: Summarized results of the real dataset when inducing NN on fixed effect and LL on the random intercept.

αj ~ LL     mean   sd    2.5%   97.5%  n_eff  R̂
α0          1.55   0.28  1.01   2.12   4000   1.00
β1          0.00   0.02  -0.04  0.04   4000   1.00
β2          0.15   0.04  0.07   0.23   4000   1.00
β3          0.03   0.03  -0.02  0.10   4000   1.00
β4          0.26   0.01  0.23   0.28   3936   1.00
β5          -0.15  0.03  -0.20  -0.10  4000   1.00
β6          -0.14  0.03  -0.21  -0.08  4000   1.00
ω0          0.01   0.01  0.00   0.03   854    1.00
w1          0.30   0.27  0.01   0.93   4000   1.00
w2          0.53   0.24  0.15   0.98   4000   1.00
w3          0.37   0.27  0.01   0.94   4000   1.00
w4          0.60   0.22  0.22   0.98   4000   1.00
w5          0.54   0.23  0.17   0.96   4000   1.00
w6          0.52   0.23  0.15   0.97   3398   1.00
σe          0.49   0.01  0.46   0.51   4000   1.00


Table 5.16.c: Summarized results of the real dataset when inducing NN on fixed effect and StSt on the random intercept.

αj ~ StSt   mean   sd    2.5%   97.5%  n_eff  R̂
α0          1.55   0.28  1.00   2.11   4000   1.00
β1          0.00   0.02  -0.04  0.04   4000   1.00
β2          0.15   0.04  0.07   0.24   4000   1.00
β3          0.03   0.03  -0.02  0.10   4000   1.00
β4          0.26   0.01  0.23   0.28   3705   1.00
β5          -0.15  0.03  -0.20  -0.10  4000   1.00
β6          -0.14  0.03  -0.21  -0.08  4000   1.00
ω0          0.11   0.05  0.01   0.20   840    1.00
w1          0.31   0.27  0.01   0.93   4000   1.00
w2          0.53   0.24  0.14   0.97   4000   1.00
w3          0.37   0.27  0.01   0.95   4000   1.00
w4          0.61   0.21  0.24   0.98   4000   1.00
w5          0.55   0.24  0.17   0.98   4000   1.00
w6          0.53   0.23  0.16   0.96   4000   1.00
σe          0.49   0.01  0.46   0.51   4000   1.00

Table 5.16.d: Summarized results of the real dataset when inducing NN on fixed effect and LSt on the random intercept.

αj ~ LSt    mean   sd    2.5%   97.5%  n_eff  R̂
α0          1.55   0.28  0.98   2.12   4000   1.00
β1          0.00   0.02  -0.04  0.04   4000   1.00
β2          0.16   0.04  0.07   0.24   4000   1.00
β3          0.03   0.03  -0.03  0.10   4000   1.00
β4          0.26   0.01  0.23   0.28   4000   1.00
β5          -0.15  0.03  -0.20  -0.09  3977   1.00
β6          -0.14  0.03  -0.21  -0.08  3947   1.00
ω0          0.11   0.05  0.02   0.21   1114   1.00
w1          0.30   0.27  0.01   0.93   4000   1.00
w2          0.54   0.24  0.16   0.96   4000   1.00
w3          0.37   0.27  0.01   0.94   4000   1.00
w4          0.61   0.21  0.24   0.97   4000   1.00
w5          0.54   0.24  0.16   0.97   4000   1.00
w6          0.53   0.24  0.15   0.97   4000   1.00
σe          0.49   0.01  0.46   0.51   4000   1.00

As expected, Tables 5.16.a-5.16.d yield in principle the same estimates of all parameters, except for ω0, the weight parameter for the random intercept. The estimated value of ω0 can be interpreted as the proportion of classes whose random intercept deviates from the overall intercept α0. The estimated weights for the fixed effects can be interpreted as inclusion probabilities for the corresponding explanatory variables. How high the inclusion probability should be for a variable to be included in the model is up to each analyst to decide. However, George and McCulloch (1993) and Barbieri and Berger (2004) suggested a value of 0.5, which implies that the variables that should be kept are X2, X4, X5 and X6. The choice of including these explanatory variables is also supported by the 95% CIs.
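As a sketch of this selection rule (the function name is illustrative; the inclusion probabilities are the posterior means of w1-w6 from Table 5.16.a):

```python
def median_probability_model(inclusion_probs, threshold=0.5):
    """1-based indices of variables whose inclusion probability meets the threshold."""
    return [k for k, w in enumerate(inclusion_probs, start=1) if w >= threshold]

print(median_probability_model([0.31, 0.54, 0.37, 0.60, 0.54, 0.52]))  # -> [2, 4, 5, 6]
```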

Table 5.17: Summarized results of the random intercept when inducing different spike-and-slab priors.

αj ~   Potential  class ID
NN     5          1,11,20,31,40
LL     3          11,20,40
StSt   4          11,20,31,40
LSt    4          11,20,31,40

Table 5.17 summarizes which classes may have a random intercept αj deviating from the overall mean when inducing the different spike-and-slab priors. The Potential column indicates that the spike-and-slab prior where both components have the Laplace density (LL) has the strongest shrinkage effect. The LSt and StSt priors yield the same shrinkage effect, and the NN prior has the least shrinkage effect.

Short summarized results for the RIM:

• The explanatory variables age and student-student relationship (ssr) should not be included in the model if the objective is a model with the best prediction performance. Defender self-efficacy (dse) has the highest influence on the response variable defending.
• Looking only at the spike-and-slab priors, the conclusion is that the Laplace distribution yields the strongest shrinkage effect, followed by the Student-t distribution, with the Normal distribution yielding the least shrinkage, which is expected. However, these results are not supported by the outcomes of the unimodal priors. Regardless of which type of prior is induced, the Student-t distribution seems to yield more shrinkage than the Normal distribution.

5.6.2 Linear mixed effects model

Recall that an LME model can be expressed as in model formulation (3.16). For the obtained real dataset, the variables are as follows: i = 1, 2, …, nj students; j = 1, 2, …, 43 classes; k = 1, …, 6 explanatory variables. βl, l = 0, …, 6 are the estimated fixed effects and βlj, l = 0, …, 6 are the estimated random effects. xkij are the available explanatory variables. yij and εij are the response and the noise error for student i in class j, respectively.

5.6.2.1 Unimodal priors

The following tables were obtained when inducing the three chosen unimodal priors on the class specific deviation parameters βlj.

Table 5.18.a: Summarized results of the real dataset when inducing unimodal normal distribution on the fixed effect and unimodal normal distribution on the random effect.

βlj ~ N(.)  mean   sd    2.5%   97.5%  n_eff  R̂
β0          1.33   0.64  0.11   2.57   968    1.00
β1          0.03   0.06  -0.08  0.13   968    1.00
β2          0.16   0.05  0.07   0.25   4000   1.00
β3          0.00   0.04  -0.08  0.07   4000   1.00
β4          0.25   0.01  0.23   0.28   4000   1.00
β5          -0.13  0.04  -0.20  -0.06  2636   1.00
β6          -0.15  0.04  -0.22  -0.07  4000   1.00
σe          0.47   0.01  0.45   0.49   4000   1.00


Table 5.18.b: Summarized results of the real dataset when inducing unimodal normal distribution on the fixed effect and unimodal Laplace distribution on the random effect.

βlj ~ Lap(.)  mean   sd    2.5%   97.5%  n_eff  R̂
β0            1.05   0.89  -0.71  2.82   1781   1.00
β1            0.05   0.08  -0.11  0.21   1668   1.00
β2            0.17   0.05  0.06   0.27   2185   1.00
β3            -0.01  0.05  -0.11  0.08   2381   1.00
β4            0.25   0.02  0.21   0.29   1210   1.00
β5            -0.12  0.05  -0.20  -0.03  1767   1.00
β6            -0.14  0.05  -0.24  -0.05  2335   1.00
σe            0.45   0.01  0.43   0.48   4000   1.00

Table 5.18.c: Summarized results of the real dataset when inducing unimodal normal distribution on the fixed effect and unimodal Student-t distribution on the random effect.

βlj ~ t2v(.)  mean   sd    2.5%   97.5%  n_eff  R̂
β0            1.41   0.58  0.28   2.55   2261   1.00
β1            0.02   0.05  -0.08  0.11   2241   1.00
β2            0.16   0.05  0.07   0.25   4000   1.00
β3            0.00   0.04  -0.07  0.08   4000   1.00
β4            0.25   0.01  0.23   0.28   4000   1.00
β5            -0.13  0.03  -0.19  -0.06  4000   1.00
β6            -0.15  0.04  -0.22  -0.07  4000   1.00
σe            0.47   0.01  0.45   0.50   4000   1.00

Tables 5.18.a-5.18.c yield very similar results for the fixed coefficients across the different unimodal priors. The overall intercept β0 differs slightly between the Normal and Student-t priors on the random effects, and β0 is not significantly different from zero when the Laplace prior is induced on the random effects.


Table 5.19: Summarized results of which classes may deviate from the overall mean for the different random effects, induced by the three different unimodal priors.

βlj ~            Potential  class ID
Normal     β0j   0          x
           β1j   0          x
           β2j   0          x
           β3j   0          x
           β4j   0          x
           β5j   4          1,20,25,40
           β6j   4          9,20,30,35
Laplace    β0j   0          x
           β1j   1          38
           β2j   2          33,34
           β3j   5          11,20,27,38,40
           β4j   11         4,8,9,10,11,14,16,26,32,33,38
           β5j   6          1,19,20,25,33,40
           β6j   7          9,14,19,20,30,35,41
Student-t  β0j   0          x
           β1j   0          x
           β2j   0          x
           β3j   0          x
           β4j   0          x
           β5j   4          1,20,25,40
           β6j   4          9,20,30,35

Table 5.19 summarizes which classes may have random effects for the different random effects parameters when inducing the different unimodal smoothing priors. As can be seen in the table, the Normal and Student-t distributions yield exactly the same results, both in the number of potential classes and in the class IDs. The Laplace distribution yields the same or more classes for the corresponding random effects parameters.


5.6.2.2 Spike-and-Slab priors

The following tables were obtained when inducing the four chosen spike-and-slab priors on the different deviation parameters βlj.

Table 5.20.a: Summarized results of the real dataset when inducing NN on fixed effects and NN on the random effects.

βlj ~ NN    mean   sd    2.5%   97.5%  n_eff  R̂
β0          1.54   0.31  0.91   2.15   8000   1.00
β1          0.00   0.02  -0.04  0.05   8000   1.00
β2          0.15   0.04  0.07   0.23   8000   1.00
β3          0.02   0.03  -0.03  0.08   8000   1.00
β4          0.25   0.01  0.23   0.28   8000   1.00
β5          -0.13  0.03  -0.19  -0.08  8000   1.00
β6          -0.14  0.04  -0.22  -0.06  8000   1.00
ω0          0.05   0.04  0.00   0.14   4856   1.00
ω1          0.03   0.02  0.00   0.07   5437   1.00
ω2          0.04   0.03  0.00   0.11   5163   1.00
ω3          0.04   0.03  0.00   0.10   4515   1.00
ω4          0.03   0.03  0.00   0.10   4519   1.00
ω5          0.07   0.04  0.00   0.16   1849   1.00
ω6          0.22   0.06  0.11   0.33   2684   1.00
w1          0.30   0.27  0.01   0.93   8000   1.00
w2          0.54   0.24  0.16   0.97   8000   1.00
w3          0.35   0.28  0.01   0.94   8000   1.00
w4          0.61   0.21  0.24   0.98   8000   1.00
w5          0.53   0.23  0.15   0.96   8000   1.00
w6          0.53   0.24  0.14   0.97   8000   1.00
σe          0.47   0.01  0.45   0.50   8000   1.00

Table 5.20.b: Summarized results of the real dataset when inducing NN on fixed effects and LL on the random effects.

βlj ~ LL    mean   sd    2.5%   97.5%  n_eff  R̂
β0          1.52   0.41  0.65   2.30   2197   1.00
β1          0.01   0.03  -0.06  0.08   2032   1.00
β2          0.15   0.04  0.07   0.24   8000   1.00
β3          0.01   0.03  -0.05  0.07   3987   1.00
β4          0.25   0.01  0.23   0.28   8000   1.00
β5          -0.13  0.03  -0.19  -0.06  4765   1.00
β6          -0.14  0.04  -0.22  -0.05  4137   1.00
ω0          0.01   0.01  0.00   0.02   2148   1.00
ω1          0.00   0.00  0.00   0.00   2166   1.00
ω2          0.00   0.00  0.00   0.01   2144   1.00
ω3          0.00   0.00  0.00   0.01   1806   1.00
ω4          0.00   0.00  0.00   0.01   2155   1.00
ω5          0.01   0.01  0.00   0.03   811    1.01
ω6          0.03   0.01  0.00   0.05   779    1.00
w1          0.34   0.27  0.01   0.93   1580   1.00
w2          0.53   0.24  0.15   0.97   2423   1.00
w3          0.33   0.27  0.01   0.95   2582   1.00
w4          0.61   0.21  0.25   0.98   2732   1.00
w5          0.51   0.24  0.13   0.96   992    1.00
w6          0.52   0.24  0.14   0.97   1515   1.01
σe          0.47   0.01  0.45   0.50   4058   1.00

Table 5.20.c: Summarized results of the real dataset when inducing NN on fixed effects and StSt on the random effects.

βlj ~ StSt  mean   sd    2.5%   97.5%  n_eff  R̂
β0          1.54   0.30  0.93   2.14   8000   1.00
β1          0.00   0.02  -0.04  0.05   8000   1.00
β2          0.15   0.04  0.07   0.23   8000   1.00
β3          0.02   0.03  -0.03  0.08   8000   1.00
β4          0.25   0.01  0.23   0.28   7742   1.00
β5          -0.13  0.03  -0.19  -0.08  8000   1.00
β6          -0.14  0.04  -0.22  -0.05  6459   1.00
ω0          0.04   0.03  0.00   0.10   4189   1.00
ω1          0.01   0.01  0.00   0.02   4778   1.00
ω2          0.02   0.02  0.00   0.06   3997   1.00
ω3          0.01   0.01  0.00   0.03   3724   1.00
ω4          0.01   0.00  0.00   0.02   3747   1.00
ω5          0.06   0.04  0.00   0.15   1526   1.01
ω6          0.16   0.04  0.08   0.26   1695   1.00
w1          0.30   0.27  0.01   0.93   8000   1.00
w2          0.54   0.24  0.15   0.97   8000   1.00
w3          0.35   0.27  0.01   0.94   8000   1.00
w4          0.61   0.22  0.23   0.98   6492   1.00
w5          0.53   0.24  0.16   0.97   8000   1.00
w6          0.52   0.24  0.13   0.97   8000   1.00
σe          0.47   0.01  0.45   0.50   8000   1.00


Table 5.20.d: Summarized results of the real dataset when inducing NN on fixed effects and LSt on the random effects.

βlj ~ LSt   mean   sd    2.5%   97.5%  n_eff  R̂
β0          1.54   0.32  0.89   2.18   4800   1.00
β1          0.00   0.02  -0.04  0.05   4275   1.00
β2          0.15   0.04  0.07   0.23   8000   1.00
β3          0.02   0.03  -0.04  0.08   8000   1.00
β4          0.25   0.01  0.23   0.28   8000   1.00
β5          -0.13  0.03  -0.19  -0.08  8000   1.00
β6          -0.14  0.04  -0.22  -0.06  4906   1.00
ω0          0.04   0.03  0.00   0.13   3367   1.00
ω1          0.04   0.03  0.00   0.12   3130   1.00
ω2          0.05   0.04  0.00   0.13   3674   1.00
ω3          0.05   0.04  0.00   0.13   3076   1.00
ω4          0.04   0.03  0.00   0.12   3435   1.00
ω5          0.09   0.05  0.00   0.20   1175   1.00
ω6          0.18   0.05  0.07   0.28   1246   1.00
w1          0.32   0.27  0.01   0.93   4787   1.00
w2          0.54   0.24  0.16   0.97   3996   1.00
w3          0.35   0.27  0.01   0.94   4788   1.00
w4          0.61   0.21  0.24   0.98   3327   1.00
w5          0.53   0.24  0.15   0.97   3145   1.00
w6          0.54   0.24  0.15   0.97   3453   1.00
σe          0.47   0.01  0.45   0.50   6656   1.00

Tables 5.20.a-5.20.d show that the fixed effects are approximately the same in all four tables, regardless of which spike-and-slab prior was induced on the random effects. Following the median probability model, the variables that should not be included in the final model are X1 and X3, since both have an inclusion probability below 0.5. The same conclusion can also be drawn from the 95% CIs of the fixed coefficient estimates. Note that an LME model with spike-and-slab priors is more complex than the RIM; the number of posterior draws for the different parameters was therefore increased from 4000 to 8000 to make sure that convergence is obtained.


Table 5.21: Summarized results of which classes may deviate from the overall mean for the different random effects, induced by the four different spike-and-slab priors.

βlj ~         Potential  class ID
NN      β0j   0          x
        β1j   0          x
        β2j   0          x
        β3j   0          x
        β4j   0          x
        β5j   4          1,20,25,40
        β6j   13         1,9,14,19,20,29-33,35,40,41
LL      β0j   0          x
        β1j   0          x
        β2j   0          x
        β3j   0          x
        β4j   0          x
        β5j   2          20,25
        β6j   3          20,30,35
StSt    β0j   0          x
        β1j   0          x
        β2j   0          x
        β3j   0          x
        β4j   0          x
        β5j   3          1,20,25
        β6j   12         1,9,14,19,20,29-32,35,40,41
LSt     β0j   0          x
        β1j   0          x
        β2j   0          x
        β3j   0          x
        β4j   0          x
        β5j   3          1,20,25
        β6j   11         9,14,19,20,29-32,35,40,41

Table 5.21 indicates that the LL prior has the strongest shrinkage effect, followed by the LSt and StSt priors. The NN prior has the least shrinkage effect among the chosen spike-and-slab priors.


Short summarized results for the LME model:

• The explanatory variables age and student-student relationship (ssr) should not be included in the model when adhering to the inclusion probability threshold of 0.5 suggested by George and McCulloch (1993) and Barbieri and Berger (2004). Defender self-efficacy (dse) has the highest influence on the response variable defending, and moral disengagement (mde) has a negative effect on defending.
• Taking both the unimodal and the spike-and-slab smoothing prior results into account, the prior that yields the most shrinkage is the Laplace distribution. The Student-t distribution yields the second most shrinkage and the Normal distribution yields the least shrinkage effect.


6 Discussion

This chapter critically assesses the obtained results, in terms of the adequacy of the method choices and the scientific context, for both the simulation studies and the real dataset study.

6.1 Simulation studies

To begin with, throughout the simulation studies and regardless of the model choice, the value of the common scale parameter Q needed to obtain similar results differs between the spike-and-slab priors. This indicates that the chosen priors have different shrinkage effects. As expected, the NN prior with Q equal to 15 yielded good parameter estimates; however, the remaining spike-and-slab priors also succeeded in providing good estimates. After analyzing formulas (3.4)-(3.6) carefully, I believe that when the true distribution of the random intercepts is a Normal distribution with a constant scale value Q, the Student-t distribution needs a scale value of v·Q and the Laplace distribution needs a scale value of Q² to yield similar results. These results can then be used to determine which of the chosen distributions has the strongest shrinkage effect. Even though the Q value for the Laplace distribution was set to 135, which is significantly lower than 15², it still managed to produce, in principle, the same results as the other distributions, and in some cases even better (see study 1). This indicates that the Laplace distribution as a slab component has the strongest shrinkage effect among the chosen distributions, a conclusion also reached by Frühwirth-Schnatter & Wagner (2010).
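For reference, under the usual textbook parameterizations (which may differ in detail from the exact forms in (3.4)-(3.6)), the variances that such a scale-matching argument compares are: Var[N(0, Q)] = Q for the Normal with variance Q, Var[Lap(b)] = 2b² for the Laplace with scale b, and Var[t_v(0, s)] = s²·v/(v − 2) for v > 2 for the Student-t with scale s. Equating these variances is one way to see why each distribution requires a different scale value in order to place a comparable amount of prior mass near zero.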

The objective of model (4.1) was to see whether the chosen spike-and-slab shrinkage priors can estimate the true value of the random effect weight ω0, and to study how this value affects the estimation of the parameters of interest. What can be concluded from study 1 is that having just 5 different groups is not enough to obtain a good estimate of the weight parameter for the random effects. It is common knowledge that more data gives better estimates, implying that if the number of groups increases, the estimate of ω0 should become more accurate, since more data is available. I believe that this is the main reason behind the pattern of the results obtained from study 1. The results from study 1 show that the closer the estimate of the random effect weight is to its true value, the better the estimates of the overall intercept β0 and the random effects β0j become. I think this pattern should be expected: the consequence of a poorly estimated ω0 is that the estimates of the different random effects β0j will also be poor. The data will then be biased, forcing the overall intercept β0 towards the value that corrects the error made by the poorly estimated ω0. For example, suppose the true value of the overall intercept is β0 = 1 and the true values of the random effects are β01 = 0.5, β02 = 0.2, β03 = 0.3. Assume that ω0 has been poorly estimated, with the consequence that we obtain the estimates β̂01 = 1, β̂02 = 0.7, β̂03 = 0.8. The data will hence be biased and force the estimated overall intercept β̂0 towards 0.5 to keep the same fitted group-level means. The spike-and-slab priors that yield the best parameter estimates are those where both the spike and slab components come from either the Normal distribution (which is expected, since this is the true distribution in the experimental trial) or the Laplace distribution.
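The compensation in this example can be checked with a few lines of R (my illustration): the fitted group-level means β0 + β0j are unchanged even though the individual parameters are wrong.

    beta0_true <- 1.0; b_true <- c(0.5, 0.2, 0.3)   # true parameters
    beta0_hat  <- 0.5; b_hat  <- c(1.0, 0.7, 0.8)   # poorly estimated ones
    beta0_true + b_true  # 1.5 1.2 1.3
    beta0_hat  + b_hat   # 1.5 1.2 1.3  (identical fitted group means)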

In study 2, model (4.1) was made more complex by adding two explanatory variables. This was done with the intention of studying how the chosen spike-and-slab priors affect the fixed effects β1 and β2, in contrast to simulation study 1, where the focus was the effect of inducing spike-and-slab priors on the random effects. The results obtained from study 2 show that, regardless of the choice of prior, the estimates of the two fixed effects β1 and β2 are equally good. These results show the pattern that anyone with basic knowledge of Bayesian learning should expect: as the data size increases, the influence of the prior information decreases. Unlike the estimate of ω0, which is influenced by the number of groups, the estimates of β1 and β2 are influenced by the size of the data, which is at least ten times larger than the number of groups in the structured data simulations. The parameters of interest in study 2 are the fixed weights W1 and W2. The reason for setting the constant β1 = 5, a non-zero value, and the other fixed constant β2 = 0 is that, when inducing spike-and-slab priors on the fixed effects, the estimated values of W1 and W2 are expected to be as close as possible to 1 and 0, respectively. Tables 5.4 and 5.5 from study 2 show that these desired values, or at least values close to them, can be obtained if the tuning parameter c_k is set properly, which in general is hard to do.
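To illustrate the role of the tuning parameter, the R sketch below (my illustration, not the thesis code; all numeric settings are assumptions) writes a spike-and-slab prior in the style of George and McCulloch (1993) as a two-component Normal mixture, where c_k controls how far apart the spike and slab scales are:

    # Spike N(0, tau^2) and slab N(0, (c_k * tau)^2), mixed with weight w.
    ss_density <- function(beta, w = 0.5, tau = 0.1, c_k = 100) {
      w * dnorm(beta, 0, c_k * tau) + (1 - w) * dnorm(beta, 0, tau)
    }
    # With a well separated spike and slab, a large effect such as beta_1 = 5
    # is supported almost entirely by the slab, while beta_2 = 0 is supported
    # by the spike; this is what drives W_1 towards 1 and W_2 towards 0.
    ss_density(5)   # dominated by the slab component
    ss_density(0)   # dominated by the spike component

If c_k is set too small, the two components overlap and the weights become uninformative, while a very large c_k can make the sampler mix poorly, which is why the setting is hard to get right.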

In study 3, model (4.2) was revised and made even more complex: the explanatory variables in the model are now allowed to have both fixed and random effects. The primary objective was to see what changes occur from model (4.2) to model (4.3) when inducing the chosen spike-and-slab priors on both the fixed and random effects. The results from study 3 show that the estimates of all parameters of interest are less accurate than in study 2. I think these less accurate estimates are to be expected, since the dataset itself is the same as before but there are now more parameters to estimate, which leads to more uncertainty. Before inducing spike-and-slab priors on both the fixed and random effects, one should therefore consider whether the dataset is sufficiently large.

The change made from study 3 to study 4 is that the explanatory variables now have some degree of correlation. The objective of simulation study 4 is to see whether the results change as the correlations between the explanatory variables increase.


The different degrees of correlation between the explanatory variables seem to yield small changes in the estimated values of the parameters that are part of the fixed effects. These changes do not seem to display any pattern as the correlation increases. I think the changes are more likely due to random variation in the simulations than to the correlation between the explanatory variables. The weights for the random effects yielded essentially the same estimated values regardless of the degree of correlation between the explanatory variables. Based on these findings, I conclude that the correlations between the explanatory variables do not have any significant effect on the estimated values of the parameters of interest when inducing spike-and-slab priors.

The objective of study 5 is basically the same as in study 3: to investigate how well the model parameters can be estimated when inducing spike-and-slab priors on both the fixed effects and the random effects. The difference between the two studies is the number of explanatory variables in the model. The new insight from study 5, which study 3 did not provide, is that the LL prior seems to be the best among the chosen priors for estimating the fixed effect parameters when the fixed effects truly exist. At the same time, the LL prior performed worst at estimating fixed effects that are small or zero. These observations lead me to believe that the Laplace distribution has the highest shrinkage power among the chosen distributions, but that this shrinkage is perhaps only effective for parameters whose values deviate substantially from zero.

6.2 Real dataset

The obtained real dataset has been used with the intention of comparing unimodal priors against spike-and-slab priors for the random intercept model and the linear mixed effects model. The main interest is to see whether both methods yield the same results regarding the selection of explanatory variables. Investigating which of the chosen distributions has the strongest shrinkage effect is also of interest.

Starting with the random intercept model: determining which of the unimodal distributions has the strongest shrinkage effect is not possible given the output that can be obtained from the Stan function in R. Hence, as an ad hoc measure, tables with the number of potential classes that may deviate from the overall mean were created. According to these tables, the unimodal prior that seems to have the least shrinkage effect is the Laplace distribution. This conclusion should be questioned, since the Laplace distribution is expected to have the strongest shrinkage effect among the priors chosen in this thesis. I strongly believe that the unimodal Laplace distribution does have the strongest shrinkage effect, but that this effect may only be efficient for classes whose values deviate substantially from the overall mean, while classes with insignificantly small deviations are not as affected by the shrinkage power of the unimodal Laplace distribution.
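One way such an ad hoc count could be computed from the MCMC output is sketched below in R (my illustration; draws is an assumed matrix of posterior samples of the class deviations, of the kind the thesis obtains from Stan): count the classes whose 95% posterior interval excludes a zero deviation from the overall mean.

    # draws: assumed S x J matrix of posterior samples of the class
    # deviations from the overall mean (one column per class).
    count_deviating <- function(draws, level = 0.95) {
      a  <- (1 - level) / 2
      ci <- apply(draws, 2, quantile, probs = c(a, 1 - a))
      sum(ci[1, ] > 0 | ci[2, ] < 0)  # intervals that exclude zero
    }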

A benefit of inducing spike-and-slab priors is that the weight parameter ω0 can be obtained. ω0 is useful in many ways: the interpretation of the parameter is convenient, it offers more options for incorporating a priori information, and it can be used to compare different prior distributions. According to the estimates of ω0, the LL prior has the strongest shrinkage effect, followed by the LSt and StSt priors, which have the same shrinkage effect, while the NN prior has the least. These results are also supported by the ad hoc measure of the ratio of potential classes whose random intercept value may deviate from the overall mean. I think this conclusion is to be expected. Essentially, the component that contributes the most information to the estimated results is the slab component, since the scale used for the spike component is so small that all its values can safely be estimated as zero. That is why spike-and-slab priors with a Laplace slab component are expected to yield more shrinkage than spike-and-slab priors with a Student-t or Normal slab component. When comparing the unimodal priors against the spike-and-slab priors, the obtained results indicate that the spike-and-slab priors yield more shrinkage than the unimodal priors. However, both prior techniques still yield the same best-subset variable selection for the obtained real dataset.
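The dominance of the slab component can be made explicit with a small R calculation (my illustration; the weight, spike scale and slab scale are assumed values): the posterior probability that a class deviation belongs to the slab is essentially 0 or 1 depending on whether the deviation is clearly non-zero.

    # Posterior probability that a deviation b comes from the slab, given
    # mixture weight w and Normal spike/slab components (assumed settings).
    slab_prob <- function(b, w = 0.3, spike_sd = 0.01, slab_sd = 2) {
      num <- w * dnorm(b, 0, slab_sd)
      num / (num + (1 - w) * dnorm(b, 0, spike_sd))
    }
    slab_prob(0.5)    # ~1: a clear deviation is assigned to the slab
    slab_prob(0.001)  # ~0: a tiny deviation is absorbed by the spike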

When applying the unimodal shrinkage priors and the spike-and-slab priors to the linear mixed effects model, the conclusions are basically the same as for the random intercept model. Just as in the random intercept model, the Laplace distribution seems to yield the least shrinkage effect when used as a unimodal prior, w.r.t. the ad hoc measure. The spike-and-slab prior in which both components consist of the Laplace distribution has the strongest shrinkage effect. The mixture of a Laplace spike distribution and a Student-t slab distribution has a slightly stronger shrinkage effect than the mixture of Student-t distributions, while the mixture of Normal distributions seems to yield the least shrinkage effect. This is to be anticipated given the results of the simulation studies and the previous work of Frühwirth-Schnatter and Wagner (2010).


7 Conclusions

This thesis has examined three different distributions, the Normal distribution, the Student-t distribution and the Laplace distribution, used either as unimodal priors or as components of spike-and-slab priors, for variable selection in linear mixed effects models.

There are numerous benefits to inducing spike-and-slab priors as shrinkage priors instead of utilizing unimodal shrinkage priors:

(I) Convenient interpretation: the obtainable weight parameters have a convenient interpretation, i.e. the weight for a fixed effect can be interpreted as an inclusion probability for that fixed effect, while the weight for the random effects can be seen as a measure of how large a portion of all groups is expected to have values that deviate from the overall mean.

(II) A convenient way to incorporate subject matter knowledge, i.e. by inducing a suitable prior directly on the weight parameters.

(III) More adaptable shrinkage effects, i.e. by having mixtures of distributions that come from different distribution families.

However, inducing spike-and-slab priors instead of unimodal shrinkage priors also has its disadvantages. If one carelessly induces spike-and-slab priors without considering whether the size of the data is sufficient, poor model parameter estimates are to be expected.

Based on the results from the simulation studies, the number of groups in the dataset is recommended to be around 50, preferably more, with a minimum of 10 observations per group, before one considers inducing spike-and-slab priors on both the fixed and random effects for variable selection in linear mixed effects models. No single prior distribution could be concluded to be preferable in all cases; however, the Laplace distribution should be used as the spike-and-slab shrinkage prior if a strong shrinkage effect is of interest.
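A dataset of the recommended size can be sketched directly; the R snippet below (my illustration, with arbitrary parameter values) simulates 50 groups with 10 observations each for a simple random intercept model of the kind studied in the thesis:

    set.seed(3)
    J <- 50; n_j <- 10                        # recommended design size
    group <- rep(seq_len(J), each = n_j)      # group labels
    b0j   <- rnorm(J, mean = 0, sd = 1)       # random intercepts
    x     <- rnorm(J * n_j)                   # one explanatory variable
    y     <- 1 + b0j[group] + 5 * x + rnorm(J * n_j)
    dat   <- data.frame(y, x, group = factor(group))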


8 Literature

Barbieri, M. M., & Berger, J. O. (2004). Optimal predictive model selection. The Annals of Statistics, 32, 870-897.

Bates, D. (2005). Fitting linear mixed models in R. R News, 5(1), 27-30.

Brown, H., & Prescott, R. (2014). Applied mixed models in medicine. John Wiley & Sons.

Cai, B., & Dunson, D. (2008). Bayesian variable selection in generalized linear mixed models. In Random Effect and Latent Variable Model Selection (pp. 63-91).

Frühwirth-Schnatter, S., & Wagner, H. (2010). Bayesian variable selection for random intercept modeling of Gaussian and non-Gaussian data. Department of Applied Statistics and Econometrics, Johannes Kepler Universität Linz, Austria, pp. 1-21.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Chapman & Hall/CRC.

George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88, 881-889.

George, E. I., & McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7, 339-374.

Griffin, J., & Brown, P. (2016). Hierarchical shrinkage priors for regression models. Bayesian Analysis.

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.

Ishwaran, H., & Rao, J. S. (2005). Spike and slab variable selection: frequentist and Bayesian strategies. The Annals of Statistics, 33, 730-773.

Kinney, S. K., & Dunson, D. B. (2007). Fixed and random effects selection in linear and logistic models. Biometrics, 63, 690-698.

Kinney, S. K., & Dunson, D. B. (2008). Bayesian model uncertainty in mixed effects models. In Random Effect and Latent Variable Model Selection (pp. 37-62). Springer New York.

Lee, K. E., Sha, N., Dougherty, E. R., Vannucci, M., & Mallick, B. K. (2003). Gene selection: a Bayesian variable selection approach. Bioinformatics, 19(1), 90-97.

Liang, F., Paulo, R., Molina, G., Clyde, M. A., & Berger, J. O. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association, 103(481), 410-423.

Lin, X. (1997). Variance component testing in generalized linear models with random effects. Biometrika, 84, 309-326.

Malsiner-Walli, G., & Wagner, H. (2016). Comparing spike and slab priors for Bayesian variable selection. Austrian Journal of Statistics, 40(4), 241-264.

Mitchell, T. J., & Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83, 1023-1036.

Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to linear regression analysis (5th ed.). John Wiley & Sons.

O'Hara, R. B., & Sillanpää, M. J. (2009). A review of Bayesian variable selection methods: what, how and which. Bayesian Analysis, 4(1), 85-118.

Robert, C., & Casella, G. (2004). Monte Carlo Statistical Methods (2nd ed.). New York: Springer-Verlag.

Rockova, V. (2013). Bayesian Variable Selection in High-dimensional Applications.

Sinharay, S., & Stern, H. S. (2001). Bayes factors for variance component testing in generalized linear mixed models. In Bayesian Methods with Applications to Science, Policy, and Official Statistics (pp. 507-516).

Smith, M., & Kohn, R. (2002). Parsimonious covariance matrix estimation for longitudinal data. Journal of the American Statistical Association, 97, 1141-1153.

Stan Development Team (2016). Stan Modeling Language Users Guide and Reference Manual, Version 2.14.0. http://mc-stan.org

Tadesse, M. G., Sha, N., & Vannucci, M. (2005). Bayesian variable selection in clustering high-dimensional data. Journal of the American Statistical Association, 100(470), 602-617.

Thornberg, R., Wänström, L., Hong, J. S., & Espelage, D. L. (2017). Classroom relationship qualities and social-cognitive correlates of defending and passive bystanding in school bullying in Sweden: A multilevel analysis. Journal of School Psychology, 63, 49-62.


LIU-IDA/STAT-A--17/005--SE.