Supplementary Materials

Tomic et al., SIMON, an automated machine learning system reveals immune signatures of influenza vaccine responses

Figure S1. Staining profiles and gating scheme of immune cell subsets analyzed using mass cytometry. Representative gating strategy for phenotype analysis of different blood-derived immune cell subsets analyzed using mass cytometry in the sample from one donor acquired before vaccination. In total, PBMC from 187 healthy donors were analyzed using the same gating scheme. Text above the plots indicates the parent population, while arrows show the gating strategy defining major immune cell subsets (CD4+ T cells, CD8+ T cells, B cells, NK cells, Tregs, NKT cells, etc.).

Figure S2. Distribution of high and low responders included in the initial dataset. Distribution of individuals in groups of high (red, n=64) and low (grey, n=123) responders by (A) CMV status, gender and study year. (B) Age distribution between high and low responders. Age is indicated in years.

Figure S3. Assays performed across different clinical studies and study years. Data from 5 different clinical studies (Study 15, 17, 18, 21 and 29) were included in the analysis. Flow cytometry was performed only in 2009; in other years, the phenotype of immune cells was determined by mass cytometry. Luminex (either 51- or 63-plex) was performed from 2008 to 2014. Finally, the signaling capacity of immune cells was analyzed by phosphorylation cytometry (PhosphoFlow) on a mass cytometer in 2013 and on a flow cytometer in all other years.

Figure S4. Visualization of the initial dataset in the context of missing values. Heatmap showing the distribution of data in the initial dataset. Each row represents a unique donor, while each column is one feature. Missing values are shown in grey, while present values are shown in black.

Figure S5. Performance evaluation of models built on datasets 13 and 36 after applying restriction filters. ROC curves are shown for all the models built on (A) dataset 36 and (B) dataset 13. Each model (classifier) is shown in the color indicated in the graph legend.

Figure S6. Heatmap of the correlation coefficients calculated between features from dataset 13. The heatmap shows the significant correlation coefficients between all the features from dataset 13, calculated using Pearson correlation (p<0.05). Non-significant values are left blank. The color of each circle follows the legend on the right side of the heatmap: red indicates negative correlation and blue indicates positive correlation.

Figure S7. Importance of features determined by Boruta. Boruta result plots for (A) dataset 13 and (B) dataset 36. Red boxplots represent the importance score of rejected features, while green boxplots show the minimal, average and maximum importance score for confirmed features. Blue boxplots show the importance score of a shadow feature. Yellow boxplots are tentative features.

Figure S8. Heatmap of the correlation coefficients calculated between features from dataset 36. The heatmap shows the significant correlation coefficients between all the features from dataset 36, calculated using Pearson correlation (p<0.05). Non-significant values are left blank. The color of each circle follows the legend on the right side of the heatmap: red indicates negative correlation and blue indicates positive correlation.

Tables S1-S19 are provided in a single Excel file.

Data files S1-S2

Online Methods References

List of references of R packages used for Supplementary Table 6:

1. Algorithm: ada Package: ada - Mark Culp, Kjell Johnson and George Michailidis (2016). ada: The R Package Ada for Stochastic Boosting. R package version 2.0-5. https://CRAN.R-project.org/package=ada
2. Algorithm: AdaBag Package: adabag - Esteban Alfaro, Matias Gamez, Noelia Garcia (2013). adabag: An R Package for Classification with Boosting and Bagging. Journal of Statistical Software, 54(2), 1-35. URL http://www.jstatsoft.org/v54/i02/
3. Algorithm: AdaBoost.M1 Package: adabag - Esteban Alfaro, Matias Gamez, Noelia Garcia (2013). adabag: An R Package for Classification with Boosting and Bagging. Journal of Statistical Software, 54(2), 1-35. URL http://www.jstatsoft.org/v54/i02/
4. Algorithm: adaboost Package: fastAdaboost - Sourav Chatterjee (2016). fastAdaboost: a Fast Implementation of Adaboost. R package version 1.0.0. https://CRAN.R-project.org/package=fastAdaboost
5. Algorithm: amdai Package: adaptDA - Charles Bouveyron (2014). adaptDA: Adaptive Mixture Discriminant Analysis. R package version 1.0. https://CRAN.R-project.org/package=adaptDA
6. Algorithm: avNNet Package: nnet - Venables, W. N. & Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0
7. Algorithm: awnb Package: caret - Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt. (2017). caret: Classification and Regression Training. R package version 6.0-76. https://CRAN.R-project.org/package=caret
8. Algorithm: bag Package: caret - Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt. (2017). caret: Classification and Regression Training. R package version 6.0-76. https://CRAN.R-project.org/package=caret
9. Algorithm: awtan Package: caret - Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt. (2017). caret: Classification and Regression Training. R package version 6.0-76. https://CRAN.R-project.org/package=caret
10. Algorithm: bagEarth Package: earth - Stephen Milborrow. Derived from mda:mars by Trevor Hastie and Rob Tibshirani. Uses Alan Miller's Fortran utilities with Thomas Lumley's leaps wrapper. (2017). earth: Multivariate Adaptive Regression Splines. R package version 4.5.0. https://CRAN.R-project.org/package=earth
11. Algorithm: bagEarthGCV Package: earth - Stephen Milborrow. Derived from mda:mars by Trevor Hastie and Rob Tibshirani. Uses Alan Miller's Fortran utilities with Thomas Lumley's leaps wrapper. (2017). earth: Multivariate Adaptive Regression Splines. R package version 4.5.0. https://CRAN.R-project.org/package=earth
12. Algorithm: bagFDA Package: earth - Stephen Milborrow. Derived from mda:mars by Trevor Hastie and Rob Tibshirani. Uses Alan Miller's Fortran utilities with Thomas Lumley's leaps wrapper. (2017). earth: Multivariate Adaptive Regression Splines. R package version 4.5.0. https://CRAN.R-project.org/package=earth
13. Algorithm: bagFDAGCV Package: earth - Stephen Milborrow. Derived from mda:mars by Trevor Hastie and Rob Tibshirani. Uses Alan Miller's Fortran utilities with Thomas Lumley's leaps wrapper. (2017). earth: Multivariate Adaptive Regression Splines. R package version 4.5.0. https://CRAN.R-project.org/package=earth
14. Algorithm: bam Package: mgcv - Wood, S.N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B) 73(1):3-36
15. Algorithm: bayesglm Package: arm - Andrew Gelman and Yu-Sung Su (2016). arm: Data Analysis Using Regression and Multilevel/Hierarchical Models. R package version 1.9-3. https://CRAN.R-project.org/package=arm
16. Algorithm: binda Package: binda - Sebastian Gibb and Korbinian Strimmer. (2015). binda: Multi-Class Discriminant Analysis using Binary Predictors. R package version 1.0.3. https://CRAN.R-project.org/package=binda
17. Algorithm: blackboost Package: party - Torsten Hothorn, Kurt Hornik and Achim Zeileis (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.
18. Algorithm: C5.0 Package: C50 - Max Kuhn, Steve Weston, Nathan Coulter and Mark Culp. C code for C5.0 by R. Quinlan (2015). C50: C5.0 Decision Trees and Rule-Based Models. R package version 0.1.0-24. https://CRAN.R-project.org/package=C50
19. Algorithm: C5.0Rules Package: C50 - Max Kuhn, Steve Weston, Nathan Coulter and Mark Culp. C code for C5.0 by R. Quinlan (2015). C50: C5.0 Decision Trees and Rule-Based Models. R package version 0.1.0-24. https://CRAN.R-project.org/package=C50
20. Algorithm: C5.0Tree Package: C50 - Max Kuhn, Steve Weston, Nathan Coulter and Mark Culp. C code for C5.0 by R. Quinlan (2015). C50: C5.0 Decision Trees and Rule-Based Models. R package version 0.1.0-24. https://CRAN.R-project.org/package=C50
21. Algorithm: cforest Package: party - Torsten Hothorn, Kurt Hornik and Achim Zeileis (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.
22. Algorithm: chaid Package: CHAID - The FoRt Student Project Team (2015). CHAID: CHi-squared Automated Interaction Detection. R package version 0.1-2.
23. Algorithm: ctree Package: party - Torsten Hothorn, Kurt Hornik and Achim Zeileis (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework.
