Page 1 of 66 Diabetes

Predictive Modeling of Type 1 Diabetes Stages Using Disparate Data Sources

Brigitte I. Frohnert1, Bobbie-Jo Webb-Robertson2, Lisa M. Bramer2, Sara M. Reehl2, Kathy Waugh1 Andrea K. Steck1, Jill M. Norris3, Marian Rewers1

1Barbara Davis Center for Diabetes, School of Medicine, University of Colorado, Aurora, CO.

2Computational and Statistical Analytics Division, Pacific Northwest National Laboratory, Richland, WA

3Department of Epidemiology, Colorado School of Public Health, University of Colorado, Aurora, CO

* Address correspondence to: Brigitte I. Frohnert, MD, PhD Assistant Professor of Pediatrics University of Colorado School of Medicine 1775 Aurora Court, B-140, Aurora, CO, 80045 Tel. 303-724-2323 Fax 303-724-6889

Email: [email protected]

Running title: Integrative modeling of type 1 diabetes risk

3 Tables, 4 Figures Total Word Count: 3999

1

Diabetes Publish Ahead of Print, published online November 18, 2019 Diabetes Page 2 of 66

ABSTRACT

This study aims to model genetic, immunologic, metabolomic and proteomic biomarkers for development of islet autoimmunity (IA) and progression to type 1 diabetes in a prospective high- risk cohort. We studied 67 children: 42 who developed IA (20/42 progressed to diabetes) and 25 controls matched for sex and age. Biomarkers were assessed at four time points: earliest available sample, just prior to IA, just after IA, and just prior to diabetes onset. Predictors of IA and progression to diabetes were identified across disparate sources using an integrative machine learning algorithm and optimization-based feature selection. Our integrative approach was predictive of IA (AUC 0.91) and progression to diabetes (AUC 0.92) based on standard cross- validation (CV). Amongst the strongest predictors of IA were change in serum ascorbate, 3- methyl-oxobutyrate, and the PTPN22 (rs2476601) polymorphism. Serum glucose, ADP , and mannose were amongst the strongest predictors of progression to diabetes. This proof-of-principle analysis is the first study to integrate large, diverse biomarker data sets into a limited number of features, highlighting differences in pathways leading to IA from those predicting progression to diabetes. Integrated models, if validated in independent populations, could provide novel clues concerning the pathways leading to IA and type 1 diabetes.

2 Page 3 of 66 Diabetes

Key words:

Type 1 diabetes

Prediction of type 1

Prediction model

Longitudinal analysis

Child

Islet autoimmunity

Metabolomics

Biomarkers

Proteomics

Genetics

Abbreviations:

AbPos: islet autoantibody positive group FDR: first-degree relative IA: islet autoimmunity KNN: K-nearest neighbors LDA: Linear Discriminant Analysis LR: logistic regression NB: Naïve Bayes RBF: radial basis function RF: random forest SVM (RBF): Support Vector Machine with a radial basis function kernel SVM (LIN): Support Vector Machine with a linear kernel T1D: type 1 diabetes group

3 Diabetes Page 4 of 66

Type 1 diabetes results from autoimmune destruction of insulin-producing pancreatic beta-cells.

Clinically-apparent diabetes is typically preceded by a period of islet autoimmunity (IA). marked by appearance of autoantibodies against islet autoantigens (1). While there is consensus that chronic autoimmune destruction of beta-cells is triggered by an interaction of environmental factor(s) with a relatively common genetic background, the specific cause remains elusive.

Prospective cohort studies have reported a number of demographic, immune (2–4), genetic (5–10), metabolomic (11,12), and proteomic (13–15) predictors of IA and/or progression from IA to diabetes. Each analytic approach offers unique insights; however, single data-stream analysis is unable to address the importance of technique-specific observations in the context of other analyses. Use of data fusion methods to integrate different data types can create models that are more complete and accurate than those derived from any individual source (16). Our objective was to provide proof-of-principle that machine learning Bayesian modeling of disparate biomarkers can yield useful integrated models for hypothesis generation. Applied to longitudinally collected biomarkers, such integrated models could provide novel clues concerning pathways leading to IA and/or diabetes. This integrative modeling approach could improve personalized prediction of progression through pre-symptomatic stages of type 1 diabetes.

RESEARCH DESIGN AND METHODS

Study Participants

We performed a nested case-control study of children participating in the Diabetes Autoimmunity

Study in the Young (DAISY) cohort. DAISY follows prospectively 2,547 children at increased risk for type 1 diabetes. The cohort consists of first-degree relatives of patients with type 1 diabetes

(FDR) and general population children with type 1 diabetes-susceptibility HLA-DR, DQ

4 Page 5 of 66 Diabetes

genotypes identified by newborn screening (17,18), recruited between 1993 and 2004. Follow-up

results are available through September 29, 2017. Written informed consent was obtained from

subjects and parents. The Colorado Multiple Institutional Review Board approved all protocols.

Outcome Measures

Autoantibodies were tested at 9, 15, and 24 months and, if negative, annually thereafter;

autoantibody positive children were re-tested every 3-6 months. Radio-immunoassays for insulin

(IAA), glutamic acid decarboxylase (GADA), insulinoma-associated 2 (IA-2A) and/or zinc

transporter 8 (ZnT8A) were as described (19–23). Subjects were considered persistently islet

autoantibody positive if they had ≥2 consecutive confirmed positive samples, not due to maternal

islet autoantibody transfer, or one confirmed positive sample and developed diabetes prior to next

sample collection. Diabetes was diagnosed using American Diabetes Association criteria.

Selection of subjects for analyses

67 children were selected in July of 2011 from the DAISY cohort for studies of metabolomic,

proteomic and immune predictors. Of those, 22 children were followed to diabetes (T1D group),

20 who had developed persistent IA and were islet autoantibody positive at their last study visit

(AbPos group), and 25 controls (Control [C] group). Controls were frequency matched with

subjects in the combined T1D and AbPos group on the HLA DR/DQ genotypes, age, sex and FDR

status. As of September 29, 2017, all controls have been negative for all islet autoantibodies. Of

the AbPos group, four progressed to diabetes in subsequent years at median age 17.8 years. These

individuals were retained in the AbPos group. Supplemental figure 1 describes subject selection.

Supplemental Table 1 presents individual autoantibody histories of all cases and controls at

relevant time points.

Specimens for Analysis

5 Diabetes Page 6 of 66

When available, samples for each subject were analyzed for metabolomic, proteomic and immune biomarkers at four time points:

T1 –earliest available sample prior to development of islet autoantibodies (typically age 9-

15 months);

T2 –just prior to development of first autoantibody;

T3 - just after development of first autoantibody, and

T4 –just prior to diagnosis of diabetes or most recent sample for AbPos subjects at time of

sample selection.

Of the T1D subjects, five were missing a T2 sample. Samples from control subjects were selected to frequency match storage time of samples from T1D and AbPos subjects combined.

Metabolomic analysis

Global metabolic profiling combined two separate ultrahigh performance liquid chromatography/tandem mass spectrometry (UHPLC/MS/MS2) injections, optimized for basic and acidic species, and gas chromatography/mass spectrometry (GS/MS) (Metabolon, Durham,

NC). All serum samples were stored at -80 oC ≤1 hour after collection, never thawed until analyses and processed essentially as described previously (24,25). Metabolites were identified by automated comparison of ion features in experimental samples to a reference library of chemical standard entries using software developed at Metabolon (26). 382 named metabolites were included in this analysis. For statistical analyses and data display, any missing values were assumed to be below limits of detection and these values were imputed with the compound minimum (minimum value imputation).

Proteomic Analysis

6 Page 7 of 66 Diabetes

Relative abundance of 1001 serum were measured by aptamers (Somalogic, Boulder, CO)

(27). Additionally, 49 (representing 24 proteins) were measured by LC-MRM/MS in the

laboratory of Drs. Thomas Metz and Qibin Zhang at PNNL as previously described (28).

Immune markers

Cytokines were measured using a Human Custom Cytokine 9-Plex assay (Meso Scale Discovery,

Rockville, MD) and included: Interferon [IFN]-α2a, interleukin [IL]-6, IL-17, IL-1β, interferon

gamma-induced protein [IP]-10, monocyte chemotactic protein [MCP]-1, IFN-γ, IL-1α, and IL-

1ra (4).

Genotyping

All 106 non-HLA SNPs available for these subjects were included. and reference for

genotyping are shown in Supplemental Table 2. As genetic data were derived from several

analyses, some were represented by more than one SNP and the rs2476601 SNP for PTPN22

was present in two genetic feature sets from separate analyses (Supplemental Table 2).

Metadata

Individual metadata included in model consisted of age at sampling, sex, Hispanic ethnicity, and

FDR status (classified as mother with type 1 diabetes, other FDR with type 1 diabetes or no FDR)

Additionally, subjects were classified by 4 HLA risk categories based on typing for HLA class II

alleles as previously described (29).

Statistics and Machine Learning

SAS version 9.4 (SAS institute, Cary, NC, USA) was used to analyze descriptive data for groups,

Integrative machine learning, based on the set of demographic characteristics, variants,

cytokines, proteins, and metabolites, was used to evaluate if predictive models can separate future

cases from controls, as well as identify the primary features that distinguish the groups. Two

7 Diabetes Page 8 of 66

disease stages were modeled; (A) Development of IA: transition from T1 to T2 among combined

AbPos and T1D subjects vs. controls, and (B) Progression to diabetes: transition from T3 to T4 among T1D subjects vs. AbPos subjects. Transition between time points are represented in analysis by determining log fold difference of cytokine, protein or metabolite between time points.

Integrative machine learning was performed using probability-based integration of multiple data streams via the Posterior Probability Product (P3)(16,30). Feature selection was performed in the context of the integrated model using a Repeated Optimization for Feature Interpretation (ROFI) approach (31). The ROFI-P3 algorithm for this analysis is described in detail in Supplemental

Figure 2. It has several key characteristics amenable to identifying and evaluating important features of diabetes progression across disparate data sources, which include allowing each dataset to be modeled with the optimal machine learning algorithm and features to be assigned importance metrics through repeated analyses.

The first step requires selection of the machine learning method to model each dataset (e.g., metabolomic, proteomic, etc.). We evaluated seven machine learning classification methods with all features in each dataset where a feature is the ratio of the measurements between time points for a case or control subject, including Random Forest (RF), Logistic Regression (LR), K-Nearest

Neighbors (KNN), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) with a radial basis function (RBF) kernel, SVM with a linear (LIN) kernel and Naïve Bayes (NB).

Supplemental Figure 3 shows average Area Under the Receiver Operating Characteristic Curve

(AUC) based on 5-fold CV repeated 100 times. The only requirement of this step is that a machine learning classifier can output the posterior probability defined as the probability that class i (푐푖) is observed given the data for subject s of dataset j (퐷푠푖); P(ci|Dsj). The posterior probabilities are

8 Page 9 of 66 Diabetes

generated using the standard functions in the R programming language for each machine learning

algorithm.

Integration via the P3 approach is a naïve product-based integration as the product of the posterior

∏ probability of each sample as related to each datasets, 푗푃(푐푖|퐷푠푗) (16,30). These integrated

probabilities can be used to compute the accuracy of the integrated model using a standard AUC.

Feature selection is performed on the integrated model to assure that features selected are those

that work best in combination across disparate sources. Selection utilizes a statistical optimization

algorithm, such as simulated annealing (SA), which is not affected by the order of features in the

dataset and allows the algorithm to move out of local minima by updating the solution at each

iteration based on the current feature state and sampling in proportion to including or excluding

the variable of interest. Thus, for each feature change proposal, this is based on looking at the

difference of the accuracy of the current state (AUCCurrent) and an updated solution (AUCUpdated).

 Updated Current  The updated solution is selected in proportion to exp AUC  AUC  based on a uniform   

distribution between 0 and 1, where  = 0.25 for this analysis. For each run of the algorithm, we

perform 100 random changes of individual features and keep or discard the change based on this

exponential difference between AUC values. After each 100 proposals and potential updates, we

determine if the solution has converged based on the difference between the AUC prior to the 100

feature evaluations (AUCprior) and the current solution. If this value is less than , which was set

to 1E-4 for this analysis, it is determined that the solution has converged (31).

Within ROFI-P3 the AUC is computed based on 5-fold CV for every feature evaluation. We repeat

the algorithm in conjunction with CV for 100 repetitions, each of which yields a single feature set

9 Diabetes Page 10 of 66

solution. We use the 100 repetitions to obtain a feature ensemble solution, which gives the likelihood that the feature would be selected for inclusion in the model. This is represented as the percentage of times that a specific feature was selected to be in the model. This also has the additional benefit of yielding robust measures of uncertainty on our classification accuracy metrics.

To evaluate the performance of the ROFI-P3 method versus established optimization methods, we compared ROFI-P3 to standard Recursive Feature Elimination (RFE) approaches. RFE is a method used extensively in biology in combination with various machine learning algorithms, such as

Linear Discriminant Analysis (LDA), and Support Vector Machines (SVM) (16,30,32,33). RFE is readily available in most statistical programming languages and is simple to implement. It is a greedy algorithm that sequentially eliminates the feature that yields the maximum AUC.

Data and Resource Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

RESULTS

Characteristics of the study subjects are presented in Table 1. Ages at visits ranged from 6 months to almost 21 years old with similar ages at time points T1, T2, T3 and T4 (Supplemental Table 3).

Demographic (metadata), genetic, immune, metabolic and proteomic biomarkers were analyzed across these four distinct time points.

10 Page 11 of 66 Diabetes

First, we determined the optimal machine learning algorithm for each of the data types

(Supplemental Figure 3). Next, we analyzed the ability of ROFI-P3 to predict the changes leading

up to two stages of type 1 diabetes; (A) seroconversion and (B) progression to diabetes. ROFI-P3

was performed and for each repetition features were selected. Each feature is represented as the

percentage of times it is selected as part of the model during 100 repetitions. For comparison, we

also ran RFE for multiple repetitions, each time permuting the features since the order of the

features has a direct relationship to those selected. This allowed us also to represent RFE features

as the percentage of times they were selected. To evaluate how well the methods work, a feature

selection threshold is selected and a ROC curve was generated only on this reduced model using

5-fold CV to build and test the model independently and to minimize over-fitting. Figures 1A and

1B show the results of these comparisons at a 50% frequency selection, i.e., features selected at

least 50% of the time for both ROFI-P3 and RFE, as well as if no feature selection is performed.

Various thresholds were evaluated and the ROFI-P3 integrated feature selection approach was

consistently more accurate than RFE, This demonstrated a clear advantage over both the RFE

selection and simple combination of all features for prediction of development of IA (AUC of 0.91

vs 0.84 and 0.64, respectively, p<0.0001) and progression (AUC of 0.92 vs 0.82 and 0.64,

respectively, p<0.0001) at a 50% feature selection threshold. We further evaluated the method in

the context of the classification of specific individuals rather than a global metric of classification.

If we select a defined reasonable false positive rate of 10% for the development of IA (Figure 1A),

we would correctly classify 67.6% of those who developed IA with ROFI-P3. The percentages

drop to 56.5% and 31.5% for the RFE selection and simple combination approaches, respectively.

The prediction is slightly better for the progression endpoint; ROFI-P3 correctly identifies 71.6%

11 Diabetes Page 12 of 66

compared to 61.4% for RFE and 18.3% for simple combination (Figure 1B). The top features selected by ROFI-P3 at a 50% frequency or higher for analysis of development of IA

(Supplemental Fig. 4A) and progression to diabetes (Supplemental Fig.4B) included features from five of the six datasets.

The top features selected by ROFI-P3 as predictors of IA are shown in Table 2. Percent selected is a measure of the number of times a particular feature was selected in the 100 iterations of the algorithm, an indication of the importance of that feature in the prediction of outcome.

Supplemental Table 4 shows all 76 features selected at 50% frequency or higher. Further detail regarding these features are shown in Supplemental Tables 5-8. Box plots of the log fold change in abundance in these top metabolites, proteins and peptides from T1 to T2 for IA and controls are shown in Figure 2.

The top features selected by ROFI-P3 as predictors of progression from IA to diabetes are shown in Table 3, while all 83 features selected at 50% frequency or higher are shown in Supplemental

Table 9. Characteristics of features including genotypes or direction of change in abundance from

T3 to T4 are further described in Supplemental Tables 10-13. Figure 3 shows the box plots for controls and AbPos groups of the log fold change of each feature from T3 to T4.

Of note, the metabolite glucose is the top selected feature, as could be expected during the progression to T1D. To evaluate the value of adding additional features to glucose, we ran a logistic regression using glucose alone as the predictor for progression to T1D compared to that of the selected ensemble. Change in glucose alone was able to classify a majority of cases, but the ROC

12 Page 13 of 66 Diabetes

AUC for the ROFI-P3 (0.91) was significantly higher than that for glucose alone (0.83, p<0.00001)

(Figure 1B). As expected for the endpoint of development of IA, the AUC of glucose is not

predictive, 0.48.

Overfitting is often an issue with machine learning, especially when samples sizes are only large

enough to allow CV. To evaluate if the top features could separate the groups with an unsupervised

approach, Principal Component Analysis (PCA), was utilized on only the top qualitative omics

features in Tables 1 and 2. From the PCA plot, the first component can visually separate the two

groups for both predictors of IA (Figure 4A) and progression from IA to diabetes (Figure 4B)

without prior knowledge of the groups. This demonstrates that although there may be some over-

fitting, the methodology in general is identifying features that can discriminate the groups of

interest.

DISCUSSION

Identification of causative factors in the development of islet autoimmunity and type 1 diabetes

has been elusive. Recent observations regarding the role of vitamin D in risk of IA has underlined

the importance of understanding environmental exposures in the context of genetic background

(34). Thus, analysis which integrates multiple data streams has the potential to identify unique

ensembles of pathogenic features. This proof-of-concept analysis represents the first integration of

disparate ‘omics data sets for the prediction of islet autoimmunity and type 1 diabetes. The ROFI-

P3 approach solves the feature selection process through hundreds of iterations, resulting in a

probability measure for each individual feature. This allows both the reduction of large data sets

to a smaller, more informative set of features as well as a robust measure of feature-level

13 Diabetes Page 14 of 66

uncertainty. The biomarker panels identified in this analysis represent an individualized prediction

algorithm based on a set of disparate features (e.g. metabolites, proteins in combination with

genetics and standard risk factors) selected in at least 50% of the iterations. These models predicted

development of IA and progression to diabetes with a ROC AUC of 0.91 and 0.92, respectively.

It should be noted that as the analysis incorporated change in protein, metabolite or cytokine over

time, selected features represent features whose change, not absolute value, is associated with

outcome.

To predict development of IA, several metadata features were included which serves to adjust the

analysis for these factors. Amongst the most highly selected features were all five metadata

elements: age, FDR status, Hispanic ethnicity, HLA risk group and sex, indicating that these

categories were important in conjunction with other features in the prediction of IA.

Two highly-selected features were genetic markers associated with development of IA, PTPN22

(rs2476601) and CTLA4 (rs3087243 and rs231775) (5,35). Both PTPN22 (rs2476601) and CTLA4

(rs3087243 and rs231775) were selected twice in this analysis as they were included in the feature set comprising data from multiple separate genetic analyses. The observation that the same SNPs were selected twice provides additional evidence of robustness of this analytical approach.

Many of the most frequently-selected features were metabolites. The highest selected feature was ascorbate (vitamin C), an important antioxidant. Ascorbate was present at lower relative abundance in participants who developed IA at the earliest time point (T1) relative to controls and rose over time (Figure 2), while controls started with a higher level and then showed a downward

14 Page 15 of 66 Diabetes

trend in ascorbate levels. Discrepant trajectories between these two groups were significantly

associated with IA outcome (Supplemental Table 6). Other metabolites whose change over time

predicted outcome included 3-methyl-oxobutyrate (alpha-ketoisovaleric acid), a degradation

product from valine as well as a precursor to valine for leucine synthesis. These branched-chain

amino acids are known to predict development of insulin resistance. They play an intriguing role

in promoting lymphocyte growth and proliferation as well as cytotoxic T lymphotyce activity and

have been previously identified as elevated prior seroconversion (36). 4-hydroxyhippuric acid is a

microbial end-product produced through polyphenol metabolism by intestinal microflora (37), and

serum levels are affected by altered gut permeability in mice (38). Pyroglutamic acid is a derivative

of L-glutamic acid, formed nonenzymatically from glutamate, glutamine, and gamma-

glutamylated peptides. Elevated blood levels of pyroglutamine may indicate problems in

antioxidant glutathione metabolism (39). This metabolite increased during progression to IA in

cases but decreased in controls.

Finally, among the most frequently features for development of islet autoimmunity, were multiple

proteins involved in immunity and inflammation: FCRL3 (Fc receptor-like protein 3), KLRK1,

MMP-2 and Activin A. Also selected were SSRP1, a protein involved in DNA repair and CSK21,

which plays a role in apoptosis and response to viral infections.

Of the five elements of metadata included in analysis for progression from IA to diabetes, only

age and FDR status were amongst the highest-selected features, indicating that these features were

important in conjunction with the constellation of other highly-selected features.

15 Diabetes Page 16 of 66

Interestingly, top-selected SNPs associated with development of IA were different from those associated with progression to diabetes. Of these, rs2157678 (HLA-DQB1 8.1) is associated with the ancestral HLA DR3-B8-A1 haplotype (40).

In analysis of progression from IA to diabetes, selected metabolomic features included multiple carbohydrates: glucose, mannose and ribose (Table 3, Figure 3). All three metabolites increased in abundance in children progressing to diabetes but decreased from T3 to T4 in the control group

(Supplemental Table 11). Additionally butyrylcarnitine, an acylcarnitine, was noted to increase from T3 to T4 in cases but decrease in controls. This could be explained by an overall increase in lipolysis secondary to progressive insulinopenia as one approaches clinical diabetes.

Among the top proteomic features in progression from IA to diabetes was Cystatin-F, a protein which modulates natural killer and T-cell cytotoxity and RAD51, which plays a role in DNA repair. Plasminogen, a protease important for lysis blood clots also plays a role in activating the . Proteins involved in cell adhesion and growth ((DRR1, IL-11 RA and

Spondin-1) were also among the most frequently selected features.

In summary, we demonstrated that the ROFI-P3 algorithm can identify and evaluate known and novel predictors of development of IA and progression to diabetes across disparate data sources.

Importantly, in children with high-risk HLA genotypes, changes in relative abundance of certain proteins and metabolites as well genetic markers predicted development of IA, and a distinct constellation of features predicted progression of persistent IA to diabetes. Seroconversion was associated with altered antioxidant profile, a finding that has been noted in humans (41) and NOD

16 Page 17 of 66 Diabetes

mice (42). Additionally, there are indications of altered gut permeability, another proposed

pathogenic mechanism (43). In contrast, progression from IA to diabetes was associated with

altered sugars and acyl carnitines, indicating a potential switch to alternate metabolic pathways as

relative insulin deficiency becomes more prominent.

The goal of this study was to develop a robust statistical machine learning model that predicts

development of IA and progression from IA to diabetes. The major advantage of this study is the

prospective characterization of developing autoimmunity over a prolonged period of time, with

repeat longitudinal measurements of biomarkers. The DAISY cohort has over 20 years of follow-

up. While a peak incidence of IA has been observed within the first 2 years of life (3,44), new

seroconversion has been observed well into adolescence and beyond (45). In addition, data from

multiple sources (clinical data, genetics, metabolomics, and proteomics) are available to be

integrated in a machine learning framework. Limitations of this study include the relatively small

numbers of subjects in each group. Larger cohort size could allow additional analysis, including

examination of whether age at seroconversion or specific endotypes play a role in the features

selected. A further limitation of the study is the potential bias to individuals with later

seroconversion. Both IA and T1D groups (Table 1) were older than reported median

seroconversion age of 2.3 years in the TEDDY study at 7 years of follow-up (46). In contrast,

studies such as DAISY (47) and BABYDIAB (44) with longer follow-up beyond the early peak in

autoimmunity, observe ongoing seroconversion into later childhood. Thus, selection of

participants included individuals with seroconversion at older ages. Further, attention to requisite

sample availability may have biased against individuals with early seroconversion who often have

17 Diabetes Page 18 of 66

exceedingly rapid progression. This may impact generalizability to such rapidly-progressing individuals.

Building predictive models via machine learning is an emerging strategy for identification of predictive biomarkers in type 1 diabetes and other diseases; however, challenges remain in the integration of large and diverse data sets. Machine learning strategies that incorporate feature selection allow identification of biomarkers which perform well in combination. This not only selects the most predictive features from among many, but also may lend insight into important biological mechanisms. Although P3 and ROFI have been both used previously to study ‘omics data, this is the first combination for feature selection in an integrative fashion. The feature sets identified using the ROFI-P3 strategy perform well in prediction of both IA and type 1 diabetes outcome. Further, identification of distinct panels of predictors underlines differences between processes leading to development of IA from pathways involved in progression to diabetes. The associated measure of probability adds further information for interpreting the utility of various biomarkers and could help researchers in identifying the best candidates to focus limited resources on validation. Further studies will determine whether these selected features can be validated in independent populations to predict progression to IA or type 1 diabetes.

ACKNOWLEDGEMENTS

B.I.F., B.J.W.R., and M.R. contributed to the design of the study, analyzed data, contributed to the discussion, and wrote manuscript. L.M.B. and S.M.R. analyzed data, contributed to the discussion and reviewed/edited manuscript. A.K.S., J.M.N. and K.W. researched data, contributed to discussion, and reviewed/edited manuscript. M.R. is principal investigator of the DAISY study.

18 Page 19 of 66 Diabetes

All authors approved the final version of the manuscript. M.R. is the guarantor of this work and

has full access to all the data in the study and takes responsibility for the integrity of the data and

the accuracy of the data analysis. All authors declare no conflicts of interest, real or perceived,

financial or nonfinancial.

This work was supported by JDRF (grants 17-2013-535,11-2010-206 and 5-ECR-2017-388-A-N)

and the National Institutes of Health (grants R01 DK32493, DK32083, DK049654, K12

DK094712, and P30 DK57516).

19 Diabetes Page 20 of 66

REFERENCES

1. Insel RA, Dunne JL, Atkinson MA, Chiang JL, Dabelea D, Gottlieb PA, et al. Staging Presymptomatic Type 1 Diabetes: A Scientific Statement of JDRF, the Endocrine Society, and the American Diabetes Association. Diabetes Care. 2015 Oct 1;38(10):1964–74.

2. Barker JM, Barriga KJ, Yu L, Miao D, Erlich HA, Norris JM, et al. Prediction of autoantibody positivity and progression to type 1 diabetes: Diabetes Autoimmunity Study in the Young (DAISY). J Clin Endocrinol Metab. 2004 Aug;89(8):3896–902.

3. Ilonen J, Hammais A, Laine A-P, Lempainen J, Vaarala O, Veijola R, et al. Patterns of β-cell autoantibody appearance and genetic associations during the first years of life. Diabetes. 2013 Oct;62(10):3636–40.

4. Waugh K, Snell-Bergeon J, Michels A, Dong F, Steck AK, Frohnert BI, et al. Increased inflammation is associated with islet autoimmunity and type 1 diabetes in the Diabetes Autoimmunity Study in the Young (DAISY). PloS One. 2017;12(4):e0174840.

5. Hermann R, Laine AP, Veijola R, Vahlberg T, Simell S, Lähde J, et al. The effect of HLA class II, insulin and CTLA4 gene regions on the development of humoral beta cell autoimmunity. Diabetologia. 2005 Sep 1;48(9):1766–75.

6. Steck AK, Wong R, Wagner B, Johnson K, Liu E, Romanos J, et al. Effects of non-HLA gene polymorphisms on development of islet autoimmunity and type 1 diabetes in a population with high-risk HLA-DR,DQ genotypes. Diabetes. 2012 Mar;61(3):753–8.

7. Törn C, Hadley D, Lee H-S, Hagopian W, Lernmark Å, Simell O, et al. Role of Type 1 Diabetes-Associated SNPs on Risk of Autoantibody Positivity in the TEDDY Study. Diabetes. 2015 May;64(5):1818–29.

8. Winkler C, Krumsiek J, Buettner F, Angermüller C, Giannopoulou EZ, Theis FJ, et al. Feature ranking of type 1 diabetes susceptibility genes improves prediction of type 1 diabetes. Diabetologia. 2014 Sep 4;57(12):2521–9.

9. Krischer JP, Liu X, Lernmark Å, Hagopian WA, Rewers MJ, She J-X, et al. The Influence of Type 1 Diabetes Genetic Susceptibility Regions, Age, Sex, and Family History on the Progression From Multiple Autoantibodies to Type 1 Diabetes: A TEDDY Study Report. Diabetes. 2017;66(12):3122–9.

10. Frohnert BI, Laimighofer M, Krumsiek J, Theis FJ, Winkler C, Norris JM, et al. Prediction of type 1 diabetes using a genetic risk model in the Diabetes Autoimmunity Study in the Young. Pediatr Diabetes. 2018;19(2):277–83.

11. Orešič M. Metabolomics in the Studies of Islet Autoimmunity and Type 1 Diabetes. Rev Diabet Stud RDS. 2012;9(4):236–47.

20 Page 21 of 66 Diabetes

12. Pflueger M, Seppänen-Laakso T, Suortti T, Hyötyläinen T, Achenbach P, Bonifacio E, et al. Age- and Islet Autoimmunity–Associated Differences in and Lipid Metabolites in Children at Risk for Type 1 Diabetes. Diabetes. 2011 Nov;60(11):2740–7.

13. Moulder R, Bhosale SD, Erkkilä T, Laajala E, Salmi J, Nguyen EV, et al. Serum Proteomes Distinguish Children Developing Type 1 Diabetes in a Cohort With HLA-Conferred Susceptibility. Diabetes. 2015 Jun 1;64(6):2265–78.

14. Toerne C von, Laimighofer M, Achenbach P, Beyerlein A, Gala T de las H, Krumsiek J, et al. serum markers in islet autoantibody-positive children. Diabetologia. 2017 Feb 1;60(2):287–95.

15. Liu C-W, Bramer L, Webb-Robertson B-J, Waugh K, Rewers MJ, Zhang Q. Temporal expression profiling of plasma proteins reveals oxidative stress in early stages of Type 1 Diabetes progression. J Proteomics. 2018 Feb 10;172:100–10.

16. Beagley N, Stratton KG, Webb-Robertson B-JM. VIBE 2.0: Visual Integration for Bayesian Evaluation. Bioinformatics. 2010 Jan 15;26(2):280–2.

17. Rewers M, Bugawan TL, Norris JM, Blair A, Beaty B, Hoffman M, et al. Newborn screening for HLA markers associated with IDDM: diabetes autoimmunity study in the young (DAISY). Diabetologia. 1996 Jul;39(7):807–12.

18. Norris JM, Yin X, Lamb MM, et al. Omega-3 polyunsaturated fatty acid intake and islet autoimmunity in children at increased risk for type 1 diabetes. JAMA. 2007 Sep 26;298(12):1420–8.

19. Gianani R, Rabin DU, Verge CF, Yu L, Babu SR, Pietropaolo M, et al. ICA512 autoantibody radioassay. Diabetes. 1995 Nov;44(11):1340–4.

20. Grubin CE, Daniels T, Toivola B, Landin-Olsson M, Hagopian WA, Li L, et al. A novel radioligand binding assay to determine diagnostic accuracy of isoform-specific glutamic acid decarboxylase antibodies in childhood IDDM. Diabetologia. 1994 Apr;37(4):344–50.

21. Yu L, Robles DT, Abiru N, Kaur P, Rewers M, Kelemen K, et al. Early expression of antiinsulin autoantibodies of humans and the NOD mouse: evidence for early determination of subsequent diabetes. Proc Natl Acad Sci U S A. 2000 Feb 15;97(4):1701–6.

22. Bonifacio E, Yu L, Williams AK, Eisenbarth GS, Bingley PJ, Marcovina SM, et al. Harmonization of glutamic acid decarboxylase and islet antigen-2 autoantibody assays for national institute of diabetes and digestive and kidney diseases consortia. J Clin Endocrinol Metab. 2010 Jul;95(7):3360–7.

23. Wenzlau JM, Juhl K, Yu L, Moua O, Sarkar SA, Gottlieb P, et al. The cation efflux transporter ZnT8 (Slc30A8) is a major autoantigen in human type 1 diabetes. Proc Natl Acad Sci U S A. 2007 Oct 23;104(43):17040–5.

21 Diabetes Page 22 of 66

24. Ohta T, Masutomi N, Tsutsui N, Sakairi T, Mitchell M, Milburn MV, et al. Untargeted Metabolomic Profiling as an Evaluative Tool of Fenofibrate-Induced Toxicology in Fischer 344 Male Rats. Toxicol Pathol. 2009 Jun 1;37(4):521–35.

25. Evans AM, DeHaven CD, Barrett T, Mitchell M, Milgram E. Integrated, Nontargeted Ultrahigh Performance Liquid Chromatography/Electrospray Ionization Tandem Mass Spectrometry Platform for the Identification and Relative Quantification of the Small- Molecule Complement of Biological Systems. Anal Chem. 2009 Aug 15;81(16):6656–67.

26. DeHaven CD, Evans AM, Dai H, Lawton KA. Organization of GC/MS and LC/MS metabolomics data into chemical libraries. J Cheminformatics. 2010 Oct 18;2:9.

27. Lollo B, Steele F, Gold L. Beyond antibodies: New affinity reagents to unlock the proteome. PROTEOMICS. 2014 Mar 1;14(6):638–44.

28. Zhang Q, Fillmore TL, Schepmoes AA, Clauss TRW, Gritsenko MA, Mueller PW, et al. Serum proteomics reveals systemic dysregulation of innate immunity in type 1 diabetes. J Exp Med. 2013 Jan 14;210(1):191–203.

29. Steck AK, Dong F, Wong R, Fouts A, Liu E, Romanos J, et al. Improving prediction of type 1 diabetes by testing non-HLA genetic variants in addition to HLA markers. Pediatr Diabetes. 2014 Aug;15(5):355–62.

30. Webb-Robertson B-JM, McCue LA, Beagley N, McDermott JE, Wunschel DS, Varnum SM, et al. A Bayesian integration model of high-throughput proteomics and metabolomics data for improved early detection of microbial infections. Pac Symp Biocomput Pac Symp Biocomput. 2009;451–63.

31. Webb-Robertson BJM, Bramer LM, Reehl SM, Metz TO, Zhang Q, Rewers MJ, et al. ROFI - The Use of Repeated Optimization for Feature Interpretation. In: 2016 International Conference on Computational Science and Computational Intelligence (CSCI). 2016. p. 29– 33.

32. Webb-Robertson B-J, Kreuzer H, Hart G, Ehleringer J, West J, Gill G, et al. Bayesian Integration of Isotope Ratio for Geographic Sourcing of Castor Beans. J Biomed Biotechnol. 2012;2012:450967.

33. Rolandsson O, Hägg E, Nilsson M, Hallmans G, Mincheva-Nilsson L, Lernmark Å. Prediction of diabetes with body mass index, oral glucose tolerance test and islet cell autoantibodies in a regional population. J Intern Med. 2001 Apr;249(4):279–88.

34. Norris JM, Lee H-S, Frederiksen B, Erlund I, Uusitalo U, Yang J, et al. Plasma 25- Hydroxyvitamin D Concentration and Risk of Islet Autoimmunity. Diabetes. 2018 Jan 1;67(1):146–54.

35. Steck AK, Liu SY, McFann K, Barriga KJ, Babu SR, Eisenbarth GS, et al. Association of the PTPN22/LYP gene with type 1 diabetes. Pediatr Diabetes. 2006 Oct;7(5):274–8.

22 Page 23 of 66 Diabetes

36. Oresic M, Simell S, Sysi-Aho M, Nanto-Salonen K, Seppanen-Laakso T, Parikka V, et al. Dysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetes. J Exp Med. 2008;205(13):2975–84.

37. Jacobs DM, Spiesser L, Garnier M, de Roo N, van Dorsten F, Hollebrands B, et al. SPE- NMR metabolite sub-profiling of urine. Anal Bioanal Chem. 2012 Nov;404(8):2349–61.

38. Orrego-Lagarón N, Martínez-Huélamo M, Vallverdú-Queralt A, Lamuela-Raventos RM, Escribano-Ferrer E. High gastrointestinal permeability and local metabolism of naringenin: influence of antibiotic treatment on absorption and metabolism. Br J Nutr. 2015 Jul;114(2):169–80.

39. Emmett M. Acetaminophen toxicity and 5-oxoproline (pyroglutamic acid): a tale of two cycles, one an ATP-depleting futile cycle and the other a useful cycle. Clin J Am Soc Nephrol CJASN. 2014 Jan;9(1):191–200.

40. Baschal EE, Aly TA, Jasinski JM, Steck AK, Johnson KN, Noble JA, et al. The frequent and conserved DR3-B8-A1 extended haplotype confers less diabetes risk than other DR3 haplotypes. Diabetes ObesMetab. 2009 Feb;11(Suppl 1):25–30.

41. Orešič M, Simell S, Sysi-Aho M, Näntö-Salonen K, Seppanen-Laakso T, Parikka V, et al. Dysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetes. J Exp Med. 2008 Dec 22;205(13):2975–84.

42. Brosche T, Platt D. The biological significance of plasmalogens in defense against oxidative damage. Exp Gerontol. 1998 Aug;33(5):363–9.

43. Vaarala O, Atkinson MA, Neu J. The “Perfect Storm” for Type 1 Diabetes. Diabetes. 2008 Oct;57(10):2555–62.

44. Ziegler A-G, Bonifacio E, and the BABYDIAB-BABYDIET Study Group. Age-related islet autoantibody incidence in offspring of patients with type 1 diabetes. Diabetologia. 2012 Jul 1;55(7):1937–43.

45. Yu L, Rewers M, Gianani R, Kawasaki E, Zhang Y, Verge C, et al. Antiislet autoantibodies usually develop sequentially rather than simultaneously. J Clin Endocrinol Metab. 1996 Dec;81(12):4264–7.

46. Vehik K, Lynch KF, Schatz DA, Akolkar B, Hagopian W, Rewers M, et al. Reversion of β- Cell Autoimmunity Changes Risk of Type 1 Diabetes: TEDDY Study. Diabetes Care. 2016 Sep;39(9):1535–42.

47. Frohnert BI, Ide L, Dong F, Barón AE, Steck AK, Norris JM, et al. Late-onset islet autoimmunity in childhood: the Diabetes Autoimmunity Study in the Young (DAISY). Diabetologia. 2017 Jun 1;60(6):998–1006.

23 Diabetes Page 24 of 66

Table 1. Characteristics of the study participants. AbPos: autoantibody positive subjects; T1D: subjects who progressed to type 1 diabetes. FDR: first-degree relative; NHW: non- Hispanic white; IA islet autoimmunity; x = neither HLA-DR4 nor DR3.

Characteristic Control AbPos T1D P-value N=25 N=20 N=22 HLA-DR group: 0.55 4/4, 4/3, or 4/x and DQB1*03:02 17 15 17 3/3 or 3/x 5 3 3 other 3 2 2

FDR (%) 9 (36%) 11 (55%) 15 (68%) 0.08 Female (%) 13 (52%) 8 (40%) 10 (45%) 0.72 NHW Ethnicity (%) 20 (80%) 15 (75%) 21 (95%) 0.17 Age (years) at development of IA, median (IQR) 7.4 (5.4, 9.9) 5.2 (2.9, 7.9) 0.06 Age (years) at development of diabetes, median (IQR) 11.0 (9.4, 13.7) - All comparisons by Chi-square except age at IA, which was compared using Wilcoxon rank sum test.

24 Page 25 of 66 Diabetes

Table 2. The top 16 predictors for development of islet autoimmunity. Analysis by ROFI-P3 comparing control subjects to pooled antibody positive and type 1 diabetes subjects. Ranking by selection frequency for metabolites and proteins (fold change from T1 to T2) or SNPs (risk allele count). FDR: first-degree relative.

Selected (%) Source Feature Function/Description 100 Metabolite Ascorbate (Vitamin C) Antioxidant and coenzyme 100 Metadata Age (years) Age at T1 Grouped by: mother with type 1 diabetes, other FDR (sibling or father) or 98 Metadata First-degree relative status no FDR. 94 Metabolite 3-methyl-2-oxobutyrate Branched chain organic acid, precursor to leucine and valine synthesis. 93 Protein FCRL3 (Fc receptor-like protein 3) Promotes TLR9-induced B-cell proliferation, activation and survival Microbial end-product derived from polyphenol metabolism by the 91 Metabolite 4-hydroxyhippurate microflora in the intestine 90 Metadata Hispanic Self-report of Hispanic ethnicity NKG2D type II integral membrane protein/ Stimulatory and costimulatory innate immune response on activated killer 90 Protein KLRK1 cells. Involved in immunosurveillance of virus-infected cells 89 SNP rs2476601 (PTPN22) Autoimmunity gene - negative regulator of T-cell receptor signaling The FACT complex plays a role in mRNA elongation, DNA replication 89 Protein SSRP1 (FACT complex subunit) and DNA repair 89 Metabolite Pyroglutamine Glutamine and glutathione metabolism. Metalloproteinase involved in diverse functions including angiogenesis, 87 Protein MMP-2 tissue repair and inflammation. Member of TGF-β superfamily of cytokines, plays role in regulation of 86 Protein Activin A tissue homeostasis, organ development, inflammation, cell proliferation, and apoptosis 85 SNP rs2476601 (PTPN22) Autoimmunity gene - negative regulator of T-cell receptor signaling Regulates various cellular processes including cell cycle progression, 84 Protein CSK21 apoptosis and transcription, as well as response to viral infections 84 SNP rs3087243 (CTLA4) Autoimmunity gene - negative regulator of T-cell responses

Proteins: http://www.genecards.org/ and http://www.uniprot.org SNPs: http://www.SNPedia.com/ Metabolite: http://www.hmdb.ca/

25 Diabetes Page 26 of 66

Table 3. The top 16 predictors of progression from islet autoimmunity to diabetes. Analysis by ROFI-P3 comparing islet autoantibody positive subjects who progressed to diabetes to those who did not progress. Ranking by selection frequency for metabolites and proteins (fold change from T3 to T4) or SNPs (risk allele count).

Selected (%) Source Feature Function/Description 100 Metabolite Glucose Carbohydrate metabolism 100 Metadata Age (years) Age at T3 100 Metabolite ADP fibrinogen DRR1 (Down-regulated in renal cell carcinoma 100 Protein Regulation of cytoskeleton organization and cell growth 1)/Actin-associated protein FAM107A 99 Metabolite Mannose Carbohydrate metabolism 98 Protein RAD51 (DNA repair protein RAD51 homolog 1) Response to DNA damage, DNA repair 98 Protein CYTF (Cystatin-F) Inhibits L, may play a role in immune regulation MAPKAPK3 (MAP kinase-activated protein 97 Protein Stress-activated /threonine protein kinase. kinase 3) 92 SNP MHC (rs3117103) SLE-associated genetic variant 6 91 SNP from GWAS for T1D (rs7221109) Type 1 diabetes-associated genetic variant on Chromosome 17 Dissolves in blood clots, plays a role in inflammation and tissue 90 Protein Plasminogen remodeling 89 Metabolite Ribose Carbohydrate metabolism 89 Protein IL-11 RA (Interleukin-11 receptor subunit alpha) Development and proliferation of mesenchymal cells Type 1 diabetes-associated genetic variant, associated with HLA DR3- 89 SNP HLA-DQB1 8.1 (rs2157678) B8-A1 87 Metabolite Butyrylcarnitine Fatty acid ester 86 Protein Spondin-1 Cell adhesion protein

Proteins: http://www.genecards.org/ and http://www.uniprot.org SNPs: http://www.SNPedia.com/ Metabolite: http://www.hmdb.ca/

26 Page 27 of 66 Diabetes

FIGURE LEGENDS

Figure 1: ROC curves comparing (A) Development of IA: comparison of control group to combined AbPos and T1D groups at transition from earliest time point (T1) to pre- seroconversion (T2) and (B) Progression to T1D: comparison of AbPos to T1D groups transition from post-seroconversion (T3) to before T1D diagnosis (T4). Dotted line: Prediction based on all features; Gray dashed line: (RFE) prediction based on features selected 50 percent of the time or more using recursive feature eleimination; Solid black line: (ROFI): prediction based on features selected at least 50 percent of the time using ROFI-P3 algorithm; Black dashed line (in panel B only): prediction based on glucose change from T3 to T4.

Figure 2. The top 10 protein, peptide and metabolite predictors for development of islet autoimmunity. For each analyte, the box plots show log fold change from time 1 (T1) to time 2 (T2) for cases and controls with individual values noted by circles. The value of log2(T2-T1) is positive with increasing trajectory and negative with decreasing trajectory.

Figure 3. The top 12 protein, peptide and metabolite predictors for progression to diabetes. For each analyte, the box plots show log fold change from time 3 (T3) to time 4 (T4) for cases and controls with individual values noted by circles. The value of log2(T4-T3) is positive with increasing trajectory and negative with decreasing trajectory.

Figure 4: Principal component analysis of (A) predictors of IA based on top 4 metabolites and 6 proteins and (B) progression from IA to diabetes on top 5 metabolites and 7 proteins. Open circles represent controls and closed circles represent combined AbPos and T1D groups.

27 Diabetes Page 28 of 66

Figure 1: ROC curves comparing (A) Development of IA: comparison of control group to combined AbPos and T1D groups at transition from early time points of T1 to T2 and (B) Progression to T1D: comparison of AbPos to T1D groups transition from post-seroconversion (T3) to before T1D diagnosis (T4). Dotted line: Prediction based on all features; Gray dashed line: (RFE) prediction based on features selected 50 percent of the time or more using recursive feature eleimination; Solid black line: (ROFI): prediction based on features selected at least 50 percent of the time using ROFI-P3 algorithm; Black dashed line (in panel B only): prediction based on glucose change from T3 to T4. Page 29 of 66 Diabetes

Figure 2. The top 10 protein, peptide and metabolite predictors for development of islet autoimmunity. For each analyte, the box plots show log fold change from time 2 to time 1 for cases and controls with individual values noted by circles.

307x335mm (150 x 150 DPI) Diabetes Page 30 of 66

Figure 3. The top 12 protein, peptide and metabolite predictors for progression to diabetes. For each analyte, the box plots show log fold change from time 3 to time 4 for cases and controls with individual values noted by circles.

309x335mm (150 x 150 DPI) Page 31 of 66 Diabetes

Figure 4: Principal component analysis of (A) predictors of IA based on top 4 metabolites and 6 proteins and (B) progression from IA to diabetes on top 5 metabolites and 7 proteins. Open circles represent controls and closed circles represent combined AbPos and T1D groups.

361x166mm (96 x 96 DPI) Diabetes Page 32 of 66

SUPPLEMENTAL FIGURES AND TABLES

Supplemental Figure 1. Flowchart of study population. IA Negative: individuals who were never persistently positive for any islet autoantibody; IA Positive: individuals who had one or more persistently positive islet autoantibodies; Type 1 diabetes: diagnosed by ADA criteria between 1994 and 2011; AbPos: Antibody positive group; T1D: type 1 diabetes group.

1 Page 33 of 66 Diabetes

Supplemental Figure 2: Description of Repeated Optimization for Feature Interpretation with Posterior Probability Product (ROFI- P3) Algorithm.

2 Diabetes Page 34 of 66

Supplemental Figure 3: Average Area Under the Receiver Operating Characteristic Curve (AUC) based on 5-fold cross-validation repeated 100 times for each data source and machine learning classification method.

Outlined by a box are the maximum average AUC values selected for each of the six datasets across seven machine learning algorithms for two disease stages: (A) seroconversion and (B) progression to diabetes. This was the optimal machine learning algorithm.

Methods used: RF: Random Forest; KNN: K-Nearest Neighbors; LDA: Linear Discriminant Analysis; LR: Logistic Regression; SVM (RBF): Support Vector Machine with a radial basis function kernel, SVM (LIN): SVM with a linear kernel; and NB: Naïve Bayes.

3 Page 35 of 66 Diabetes

Supplemental Figure 4: Number of features from each data stream retained at 50% selection for (A) Development of IA and (B) Progression from IA to diabetes

A

Metadata

Cytokines

Metabolites

Peptides

Data Stream Data Proteins

SNPs

0 10 20 30 40 50 Number of Features Selected B

Metadata

Cytokines

Metabolites

Peptides

Data Stream Data Proteins

SNPs

0 10 20 30 40 50 Number of Features Selected

4 Diabetes Page 36 of 66

Supplemental Table 1: Raw data for age and islet autoantibody status for control, islet autoantibody positive (AbPos) and type 1 diabetes (T1D) groups. Age and islet autoantibody status (1=positive, 0=negative) are given for each time point. Serum autoantibodies were measured for insulin (IAA), glutamic acid decarboxylase (GADA), insulinoma-associated protein 2 (IA-2A) and zinc transporter 8 (ZnT8A).

Group randomID Timepoint Age (years) GADA IA-2A IAA ZnT8A Control 11010 T1 0.8 0 0 0 T2 3.0 0 0 0 T3 6.0 0 0 0 T4 12.0 0 0 0 Last 19.0 0 0 0 Control 13063 T1 0.8 0 0 0 T2 2.3 0 0 0 T3 4.3 0 0 0 T4/Last 6.8 0 0 0 0 Control 21580 T1 0.7 0 0 0 T2 7.5 0 0 0 T3 9.3 0 0 0 T4/Last 13.4 0 0 0 Control 21645 T1 0.7 0 0 0 T2 3.0 0 0 0 T3 5.2 0 0 0 T4/Last 8.2 0 0 0 0 Control 24684 T1 0.8 0 0 0 T2 2.1 0 0 0 T3 9.1 0 0 0 T4 12.1 0 0 0 Last 18.8 0 0 0 Control 28102 T1 0.7 0 0 0 T2 4.2 0 0 0 T3 7.6 0 0 0 T4/Last 16.1 0 0 0 Control 28410 T1 1.9 0 0 0 T2 5.1 0 0 0 T3 6.1 0 0 0 T4 7.2 0 0 0 0 Last 14.0 0 0 0 Control 31888 5 Page 37 of 66 Diabetes

T1 0.7 0 0 0 T2 4.3 0 0 0 T3 6.3 0 0 0 T4 11.8 0 0 0 Last 19.2 0 0 0 Control 35055 T1 0.7 0 0 0 T2 3.0 0 0 0 T3 8.0 0 0 0 T4 16.0 0 0 0 Last 18.9 0 0 0 Control 38964 T1 0.8 0 0 0 T2 1.3 0 0 0 T3 2.1 0 0 0 T4 11.1 0 0 0 Last 18.5 0 0 0 Control 44935 T1 0.8 0 0 0 T2 2.0 0 0 0 T3 4.2 0 0 0 T4 6.9 0 0 0 Last 12.8 0 0 0 Control 51884 T1 1.2 0 0 0 T2 2.0 0 0 0 T3 3.1 0 0 0 T4/Last 4.1 0 0 0 0 Control 54642 T1 0.7 0 0 0 T2 2.0 0 0 0 T3 4.0 0 0 0 T4/Last 8.8 0 0 0 0 Control 57003 T1 1.3 0 0 0 T2 5.4 0 0 0 T3 11.5 0 0 0 T4 15.5 0 0 0 Last 17.5 0 0 0 0 Control 57732 T1 0.8 0 0 0 T2 1.3 0 0 0 T3 2.1 0 0 0 T4 3.1 0 0 0 0

6 Diabetes Page 38 of 66

Last 9.2 0 0 0 Control 61032 T1 0.9 0 0 0 T2 1.5 0 0 0 T3 2.0 0 0 0 T4 4.1 0 0 0 0 Last 16.3 0 0 0 Control 62568 T1 2.0 0 0 0 T2 4.5 0 0 0 T3 6.6 0 0 0 T4 7.6 0 0 0 Last 11.8 0 0 0 0 Control 66029 T1 1.2 0 0 0 T2 3.0 0 0 0 T3 7.1 0 0 0 T4 16.1 0 0 0 Last 23.0 0 0 0 Control 74515 T1 0.8 0 0 0 T2 3.2 0 0 0 T3 6.5 0 0 0 T4 12.1 0 0 0 Last 17.4 0 0 0 Control 81324 T1 0.7 0 0 0 T2 2.0 0 0 0 T3 4.2 0 0 0 0 T4 6.5 0 0 0 Last 14.8 0 0 0 Control 90948 T1 0.8 0 0 0 T2 6.1 0 0 0 T3 9.1 0 0 0 T4 16.0 0 0 0 Last 21.5 0 0 0 Control 92568 T1 0.8 0 0 0 T2 5.4 0 0 0 T3 8.7 0 0 0 T4 13.9 0 0 0 Last 17.6 0 0 0 Control 95491

7 Page 39 of 66 Diabetes

T1 0.9 0 0 0 T2 6.0 0 0 0 T3 8.0 0 0 0 T4/Last 15.0 0 0 0 Control 98695 T1 0.7 0 0 0 T2 5.1 0 0 0 T3 9.4 0 0 0 T4 15.4 0 0 0 Last 22.5 0 0 0 Control 99949 T1 0.8 0 0 0 T2 5.2 0 0 0 T3 8.2 0 0 0 T4 14.1 0 0 0 Last 18.6 0 0 0 AbPos 10929 T1 0.8 0 0 0 0 T2 4.3 0 0 0 0 IA 5.3 1 0 0 0 T3 6.6 1 0 0 0 T4 17.0 1 1 0 Last 23.4 1 1 0 0 AbPos 12712 T1 1.4 0 0 0 T2 3.8 0 0 0 IA 5.8 1 0 0 T3 6.0 1 0 1 T4 6.5 1 0 1 Last 10.0 1 1 1 0 AbPos 15560 T1 0.8 0 0 0 T2 5.0 0 0 0 0 IA 6.0 0 0 1 T3 6.7 0 0 0 1 T4 8.6 1 1 1 Last 14.9 1 1 0 1 AbPos 18895 T1 0.9 0 0 0 T2 5.5 0 0 0 IA 10.1 0 1 0 1 T3 10.2 0 1 0 1 T4 12.0 1 1 0 Last 18.1 0 1 0 1

8 Diabetes Page 40 of 66

AbPos 28916 T1 1.3 0 0 0 0 T2 7.3 0 0 0 0 IA 8.3 0 0 0 1 T3 9.3 0 0 0 1 T4 15.2 1 1 0 Last 19.9 0 1 0 1 AbPos 33999 T1 0.7 0 0 0 0 T2 3.1 0 0 0 0 IA 7.9 1 0 1 0 T3 8.1 1 0 1 0 T4 15.7 1 0 0 Last 21.6 1 0 0 0 AbPos 44628 T1 0.8 0 0 0 0 T2 2.0 0 0 0 0 IA 3.0 1 0 1 1 T3 3.1 1 0 1 1 T4 9.1 1 1 1 Last 14.2 1 1 1 1 AbPos 54078 T1 3.5 0 0 0 0 T2 5.5 0 0 0 0 IA 7.4 0 0 0 1 T3 10.3 1 0 0 0 T4 14.1 1 1 0 Last 19.9 1 1 0 0 AbPos 66191 T1 2.1 0 0 0 T2 11.0 0 0 0 0 IA 12.0 1 0 1 0 T3 12.2 1 0 1 0 T4 13.7 1 0 0 Last 20.0 1 0 0 1 AbPos 68942 T1 4.4 0 0 0 0 T2 6.5 0 0 0 0 IA 13.5 1 0 0 0 T3 13.5 1 0 0 0 T4/Last 18.3 1 1 0 0 AbPos 76157 T1 0.7 0 0 0 T2 5.1 0 0 0 0

9 Page 41 of 66 Diabetes

IA 6.0 1 0 0 0 T3 6.2 1 0 0 0 T4 7.2 1 0 1 1 T1D 8.6 AbPos 77508 T1 0.8 0 0 0 0 T2 6.4 0 0 0 0 IA 7.4 1 1 0 0 T3 8.5 1 0 0 0 T4 15.2 1 0 0 Last 20.7 1 0 0 0 AbPos 78545 T1 0.8 0 0 0 T2 4.9 0 0 0 0 IA 5.5 1 0 0 0 T3 5.6 1 0 0 0 T4 8.6 1 1 0 Last 14.6 1 1 0 1 AbPos 79070 T1 0.7 0 0 0 T2 8.5 0 0 0 0 IA 9.5 0 1 0 0 T3 10.3 1 1 1 1 T4 11.0 0 1 0 T1D 12.6 AbPos 81198 T1 0.9 0 0 0 0 T2 2.0 0 0 0 0 IA 3.1 1 0 1 0 T3 3.4 1 0 1 0 T4 7.5 1 0 0 Last 13.4 1 0 1 0 AbPos 84814 T1 0.8 0 0 0 0 T2 1.3 0 0 0 0 IA 3.0 0 0 0 1 T3 4.2 1 1 1 1 T4 10.5 1 1 1 T1D 12.4 1 1 1 1 AbPos 92799 T1 5.0 0 0 0 0 T2 10.7 0 0 0 0 IA 11.8 1 0 0 0 T3 11.9 1 0 0 0

10 Diabetes Page 42 of 66

T4 20.2 1 0 0 Last 26.5 1 0 0 0 AbPos 97632 T1 0.8 0 0 0 0 T2 3.1 0 0 0 0 IA 4.2 0 0 0 1 T3 5.4 1 0 1 0 T4 10.0 1 1 1 Last 16.2 1 1 0 1 AbPos 98530 T1 0.7 0 0 0 T2 8.8 0 0 0 0 IA 9.8 1 0 0 T3 10.0 1 0 0 T4 12.3 1 0 0 Last 17.4 1 0 0 0 AbPos 99827 T1 3.5 0 0 0 0 T2 9.1 0 0 0 0 IA 10.1 1 0 0 0 T3 10.6 1 0 0 0 T4 19.1 1 0 0 T1D 22.9 1 0 0 0 T1D 10030 T1 7.9 0 0 0 0 T2 9.0 0 0 0 0 IA 10.1 1 0 0 0 T3 10.9 1 1 0 0 T4/T1D 13.7 1 1 0 1 T1D 16375 T1 1.3 0 0 0 0 T2 2.0 0 0 0 0 IA 3.1 1 1 0 1 T3 3.3 1 1 0 1 T4 14.0 1 1 1 1 T1D 14.4 T1D 18431 T1 1.3 0 0 0 0 T2 9.0 0 0 0 0 IA 10.0 1 0 0 0 T3 10.2 1 0 0 0 T4/T1D 13.7 1 0 1 0 T1D 21270 T1 2.1 0 0 0 0

11 Page 43 of 66 Diabetes

T2 7.0 0 0 0 0 IA 7.9 0 1 1 0 T3 8.2 1 1 1 1 T4 13.6 0 1 0 1 T1D 13.7 T1D 34040 T1 0.8 0 0 0 0 T2 4.9 0 0 0 0 IA 6.9 1 0 0 1 T3 7.3 1 0 0 1 T4 11.0 1 1 0 1 T1D 11.2 1 1 0 1 T1D 39171 T1 0.8 0 0 0 0 IA 1.6 1 1 0 0 T3 3.9 0 1 0 0 T4 4.9 1 1 0 0 T1D 7.0 T1D 41850 T1 3.8 0 0 0 0 T2 7.0 0 0 0 0 IA 7.9 0 0 1 0 T3 9.0 1 0 1 0 T4 11.3 1 0 1 0 T1D 11.3 T1D 42578 T1 1.3 0 0 0 0 T2 6.4 0 0 0 0 IA 7.6 0 0 0 1 T3 8.9 1 1 1 1 T4/T1D 11.1 0 1 1 1 T1D 44461 T1 1.1 0 0 0 0 T2 4.0 0 0 0 0 IA 4.6 0 0 1 0 T3 4.9 0 0 1 0 T4 11.7 0 0 0 0 T1D 11.7 T1D 45654 T1 0.7 0 0 0 0 T2 1.8 0 0 0 0 IA 5.8 1 0 0 0 T3 7.1 1 0 1 1 T4 9.6 0 1 1 1

12 Diabetes Page 44 of 66

T1D 10.0 T1D 48987 T1 2.1 0 0 0 0 T2 3.1 0 0 0 0 IA 4.1 0 0 1 1 T3 5.8 0 0 0 1 T4 9.8 0 0 0 1 T1D 10.9 0 0 0 1 T1D 55932 T1 1.0 0 0 0 0 T2 1.2 0 0 0 0 IA 2.2 0 1 0 0 T3 2.8 1 1 0 0 T4 7.9 1 1 0 1 T1D 8.3 0 1 0 1 T1D 58044 T2 2.0 0 0 0 0 IA 3.0 0 0 0 1 T3 4.1 1 0 1 1 T4 8.9 1 1 0 T1D 9.4 T1D 58364 T1 1.2 0 0 0 0 IA 2.0 1 1 1 1 T3 2.2 1 1 1 1 T4 5.0 1 1 1 1 T1D 6.2 1 1 1 1 T1D 69161 T1 1.2 0 0 0 0 T2 2.0 0 0 0 0 IA 6.0 1 0 0 1 T3 8.4 0 0 0 1 T4/T1D 10.0 0 0 0 1 T1D 72005 T1 1.3 0 0 0 0 T2 5.2 0 0 0 0 IA 6.4 1 0 0 0 T3 6.9 1 0 0 0 T4 13.0 0 1 0 1 T1D 13.1 0 1 0 1 T1D 78552 T1 3.0 0 0 0 0 T2 9.4 0 0 0 0 IA 12.6 0 1 0 0

13 Page 45 of 66 Diabetes

T3 12.8 0 1 0 0 T4 14.7 0 1 0 0 T1D 15.0 T1D 79606 T1 1.2 0 0 0 0 IA 4.0 1 0 1 0 T3 4.1 1 1 0 0 T4/T1D 9.6 0 1 1 0 T1D 91993 T1 0.8 0 0 0 0 T2 1.3 0 0 0 0 IA 2.2 1 0 1 0 T3 3.1 1 1 0 1 T4 6.7 1 0 0 1 T1D 6.9 T1D 95373 T1 0.9 0 0 0 0 IA 2.9 1 1 0 1 T3 3.1 1 1 1 0 T4 10.4 0 1 0 1 T1D 10.6 T1D 95696 T1 2.0 0 0 0 0 T2 6.0 0 0 0 0 IA 9.0 1 0 0 T3 12.0 1 0 0 0 T4 14.0 1 0 0 T1D 14.0 T1D 98856 T1 0.9 0 0 0 0 IA 2.2 1 0 1 0 T3 2.9 1 1 0 1 T4/T1D 6.7 1 0 0 1

14 Diabetes Page 46 of 66

Supplemental Table 2: SNPs included in analysis and reference.

SNP Locus Reference rs11712165 ARHGAP1 10 rs1038914 ARHGEF25 4 rs2241880 ATG16L1 4 rs10806425 BACH2 10 rs10509540 C10orf59 7 rs4900384 C14orf 8 rs1465788 C14orf181 4 rs229541 C1QTNF6 4 rs9388489 C6orf173 4 rs13098911 CCR1-3, LTF 10 rs13314993 CCR4, GLB1 10 microsatellite CCR5 9 rs11711054 CCR5 5 rs4763879 CD69 7 rs12708716 CLEC16A 8 rs231775 CTLA4 7 rs3087243 CTLA4 9 rs4675374 CTLA4, ICOS, CD28 10 rs7202877 CTRB 8 rs11254363 CUBN 4 rs6013897 CYP24A1 3 rs4646536 CYP27B1 3 rs10741657 CYP2R1 3 rs12794714 CYP2R1 3 rs12785878 DHCR7/NADSYN1 3 rs7454108 DR4-DQB1*0302 1 rs3798719 ELOVL2 6 rs7744440 ELOVL2 6 rs1570069 ELOVL2 6 rs953413 ELOVL2 6 rs11221332 ETS1 10 rs174556 FADS1 6 rs174537 FADS1/FADS2 6 rs174570 FADS1/FADS2 6 rs174583 FADS2 6 rs2664170 GAB3 7 rs2282679 GC [vitamin D binding protein] 3 rs3755967 GC [vitamin D binding protein] 3 rs4588 GC [vitamin D binding protein] 3 rs7041 GC [vitamin D binding protein] 3 rs7020673 GLIS3 7 rs2290400 GSDMB/ORMDL3 8

15 Page 47 of 66 Diabetes

rs399604 HLA-DOA 4 rs1431403 HLA-DPA1 4 rs3135021 HLA-DPA1, HLA-DPB1 10 rs2064476 HLA-DPB1 4 rs9277554 HLA-DPB1 4 rs7775228 HLA-DQ2.2 5 rs4713586 HLA-DQ4 10 rs2187668 HLA-DQA1 10 rs2157678 HLA-DQB1 8.1 5 rs5753037 HORMAD2 8 rs16940765 HRH4 4 rs3024496 IL10 7 rs17810546 IL12A-AS1 10 rs1295686 IL13 9 rs20541 IL13 9 rs917997 IL18RAP, IL18R1 10 rs2069762 IL2 4 rs4505848 IL2 4 rs181206 IL27 8 rs4788084 IL27 8 rs12251307 IL2RA 7 rs12722563 IL2RA 4 rs2104286 IL2RA 4 rs1805010 IL4R 9 rs2107356 IL4R 9 rs6897932 IL7R 4 rs689 INS-23-Hph1 7 rs2825932 intergenic between C1QBPP and 4 FDPSP6 rs4667121 ITGA4, UBE2E3 10 rs13151961 KIAA1109 10 rs1464510 LPP 10 rs2160322 MAGI2 10 rs6962966 MAGI2 10 rs3117103 MHC 5 rs2853977 MICA 4 rs2066844 NOD2 4 rs10763976 PARD3 4 rs4379776 PARD3 10 rs17035378 PLEK, FBX048 10 rs947474 PRKCQ 4 rs1893217 PTPN2 2 rs478582 PTPN2 2 rs2476601 PTPN22 5 rs2476601 PTPN22 7

16 Diabetes Page 48 of 66 rs802734 PTPRK 10 rs7738609 PTPRK 5 rs13003464 PUS10 10 rs9792269 PVT1 10 rs842647 REL 5 rs10903122 RUNX3 10 rs9811792 SCHIP1, IL12A 10 rs2281808 SIRPG 7 rs7804356 SKAP2 4 rs7221109 SMARCE1 4 rs12928822 SOCS1, PRM1, PRM2 10 rs17718324 SPARC 4 rs1738074 TAGAP 10 rs5979785 TLR8 4 rs10517086 TNFAIP3 4 rs11203203 UBASH3A 7 rs3788013 UBASH3A 4 rs9976767 UBASH3A 8 rs11568820 VDR 2 rs1544410 VDR 2 rs2228570 VDR 2 rs1250552 ZMIZ1 10

1. Barker JM, Triolo TM, Aly TA, Baschal EE, Babu SR, Kretowski A, et al. Two single nucleotide polymorphisms identify the highest-risk diabetes HLA genotype. Diabetes. 2008 Nov;57(1939–327X (Electronic)):3152–5. 2. Frederiksen B, Liu E, Romanos J, Steck AK, Yin X, Kroehl M, et al. Investigation of the vitamin D receptor gene (VDR) and its interaction with protein tyrosine phosphatase, non-receptor type 2 gene (PTPN2) on risk of islet autoimmunity and type 1 diabetes: The Diabetes Autoimmunity Study in the Young (DAISY). J Steroid Biochem Mol Biol. 2013 Jan;133:51–7. 3. Frederiksen BN, Kroehl M, Fingerlin TE, Wong R, Steck AK, Rewers M, et al. Association Between Vitamin D Metabolism Gene Polymorphisms and Risk of Islet Autoimmunity and Progression to Type 1 Diabetes: The Diabetes Autoimmunity Study in the Young (DAISY). The Journal of Clinical Endocrinology & Metabolism. 2013 Aug 26;98(11):E1845–51. 4. Frederiksen BN, Steck AK, Kroehl M, Lamb MM, Wong R, Rewers M, et al. Evidence of Stage- and Age- Related Heterogeneity of Non-HLA SNPs and Risk of Islet Autoimmunity and Type 1 Diabetes: The Diabetes Autoimmunity Study in the Young. Clin Dev Immunol [Internet]. 2013 [cited 2015 Sep 18];2013. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866813/ 5. Romanos J, Rosen A, Kumar V, Trynka G, Franke L, Szperl A, et al. Improving risk prediction by testing non-HLA variants additional to HLA variants. Gut. 2014 Mar;63(1468–3288 (Electronic)):415–22. 6. Norris JM, Kroehl M, Fingerlin TE, Frederiksen BN, Seifert J, Wong R, et al. Erythrocyte membrane docosapentaenoic acid levels are associated with islet autoimmunity: the Diabetes Autoimmunity Study in the Young. Diabetologia. 2014 Feb;57(2):295–304. 7. Steck AK, Wong R, Wagner B, Johnson K, Liu E, Romanos J, et al. Effects of non-HLA gene polymorphisms on development of islet autoimmunity and type 1 diabetes in a population with high-risk HLA-DR,DQ genotypes. Diabetes. 2012 Mar;61(3):753–8. 8. Steck AK, Dong F, Wong R, Fouts A, Liu E, Romanos J, et al. Improving prediction of type 1 diabetes by testing non-HLA genetic variants in addition to HLA markers. Pediatr Diabetes. 2014 Aug;15(5):355–62.

17 Page 49 of 66 Diabetes

9. Steck AK, Bugawan TL, Valdes AM, Emery LM, Blair A, Norris JM, et al. Association of Non-HLA Genes With Type 1 Diabetes Autoimmunity. Diabetes. 2005 Aug 1;54(8):2482–6. 10. Gutierrez-Achury J, Romanos J, Bakker SF, Kumar V, Haas EC de, Trynka G, et al. Contrasting the Genetic Background of Type 1 Diabetes and Celiac Disease Autoimmunity. Dia Care. 2015 Oct 1;38(Supplement 2):S37–44.

18 Diabetes Page 50 of 66

Supplemental Table 3: Disease state group summary

Clinical Age at Sampling Event Group Gender N Time 1 Time 2 Time3 Time 4 Age (y) Age Age Age (min/max) (min/max) (min/max) (min/max) Control F 13 1.0 ± 0.5 3.6 ± 1.7 6.4 ± 2.3 10.9 ± 3.9 (0.7 / 2.0) (1.3/ 7.5) (2.1 / 9.4) (4.1 / 16.1) M 12 0.9 ± 0.2 3.6 ± 1.9 6.3 ± 3.0 11.0 ± 4.9 (0.7 / 1.3) (1.3 / 6.1) (2.0 / 11.5) (3.1 / 16.1)

AbPos F 8 1.8 ± 1.6 6.1 ± 3.8 8.2 ± 3.5 12.9 ± 3.9 (0.7 / 5.0) (1.3 / 11.0) (3.4 / 12.2) (7.5 / 20.2) M 12 1.4 ± 1.2 5.4 ± 2.1 8.0 ± 2.8 12.3 ± 4.5 (0.7 / 4.4) (2.0 / 9.1) (3.1 / 13.5) (6.5 / 19.1)

T1D F 10 1.6 ± 0.7 *5.1 ± 2.7 6.7 ± 3.8 10.8 ± 3.1 (0.8 / 3.0) (1.3 / 9.4) (2.2 / 12.8) (5.0 / 14.7) M 12 1.8 ± 2.1 *4.6 ± 3.1 6.2 ± 2.9 10.2 ± 3.0 (0.7 / 7.9) (1.3 / 9.0) (2.9 / 10.9) (4.9 / 14.0) *T1D group is missing samples for 2 females and 3 males at Time 2

19 Page 51 of 66 Diabetes

Supplemental Table 4: The top features in development of islet autoimmunity (IA). Analysis by ROFI-P3 comparing controls to pooled islet autoantibody positive (AbPos) and type 1 diabetes (T1D) subjects from T1 to T2 for metadata, cytokines, metabolites and proteins and SNPs. Ranking by selection frequency, reporting features selected at least 50% of the time. (See supplemental tables 5-9 for comparison of each feature for case vs. control.)

Selected (%) Source Feature 100 Metabolite Ascorbate (Vitamin C) 100 Metadata Age (years) 98 Metadata First-degree relative status 94 Metabolite 3-methyl-2-oxobutyrate 93 Protein FCRL3 (Fc receptor-like protein 3) 91 Metabolite 4-hydroxyhippurate 90 Metadata Hispanic 90 Protein NKG2-D type II integral membrane protein/ KLRK1 89 SNP PTPN22 (rs2476601) 89 Protein SSRP1 (FACT complex subunit) 89 Metabolite Pyroglutamine 87 Protein MMP-2 (Matrix Metallopeptidase 2) 86 Protein Activin A 85 SNP PTPN22 (rs2476601) 84 Protein CSK21 (Casein kinase II subunit alpha) 84 SNP CTLA4 ( rs3087243) 82 Protein HAI-1(Hepatocyte growth factor activator inhibitor type 1/Kunitz-type protease inhibitor 1) 82 Protein Macrophage scavenger receptor 79 Metadata HLA risk group 77 Metabolite Betaine 77 Protein CD22 (B-cell receptor CD22) 77 Peptide C6 (Complement component C6) 76 Metadata Sex 76 Protein CCL18 (C-C motif chemokine 18/PARC) 75 Protein CRIS3 (Cysteine-rich secretory protein 3) 75 Protein Hepcidin (Liver-expressed antimicrobial peptide 1/LEAP-1) 73 Metabolite 4-androsten-3b,17b-diol disulfate 1 73 Metabolite N-acetylornithine 73 Protein FSTL3 (Follistatin-related protein 3) 73 Protein Angiopoietin-2 72 SNP rs2290400 (GSDMB/ORMDL3 )

20 Diabetes Page 52 of 66

71 Metabolite Dihomo-linolenate (20:3n3 or n6) 71 Metabolite S-methylcysteine 71 Metabolite Xylitol 71 Protein MFRP (Membrane frizzled-related protein) 71 Protein FGFR-3 (Fibroblast growth factor receptor 3) 71 Protein BSSP4 (Brain-specific 4) 71 Protein SEPR (Seprase/Prolyl FAP) 70 Metabolite 7-hydroxyoctanoate 69 Protein Cathepsin 68 Protein C8 (Complement component 8) 67 Protein IGFBP-2 (Insulin-like growth factor-binding protein 2) 66 Metabolite Thymol sulfate 66 Protein Osteoblast-specific transcription factor 2 65 Metabolite Dodecanedioate 65 Protein Protein disulfide A3 65 Protein Coagulation 65 Protein Stanniocalcin-1 65 SNP rs2282679 (GC [vitamin D binding protein]) 64 Metabolite 4-androsten-3b,17b-diol disulfate 2 64 Metabolite Sphingosine 64 Protein LKHA4 (Leukotriene A-4 ) 64 Protein LYVE1 (Lymphatic vessel endothelial hyaluronic acid receptor 1) 64 Protein Testican-2 64 Protein ROR1 (Inactive tyrosine-protein kinase transmembrane receptor ROR1) 63 Metabolite Phosphate 63 Protein TrATPase (Tartrate-resistant acid phosphatase type 5) 63 Protein , HMW 62 Peptide CLU (Clusterin) 61 Protein Cystatin C 61 Protein VEGF sR2 (Vascular endothelial growth factor receptor 2) 61 Protein PSA6 (Proteasome subunit alpha type-6) 60 Protein Olfactomedin-4 60 Protein Thrombopoietin Receptor 60 Protein DPP2 (Dipeptidyl peptidase 2) 59 Metabolite Uridine 58 Metabolite 3-indoxyl sulfate

21 Page 53 of 66 Diabetes

58 Protein NKp30 (Natural killer cell p30-related protein/Natural cytotoxicity triggering receptor 3/ NCR3) 58 Protein CLF-1/CLC Complex (Cytokine receptor-like factor 1/ Cardiotrophin-like cytokine factor 1) 57 Protein MMP-3 (MMP-2 (Matrix Metallopeptidase 3) 57 Protein GFRa-2 (GDNF family receptor alpha-2) 57 SNP PRKCQ (rs947474) 57 SNP CTLA4 (rs231775) 56 Protein MAPK2 (MAP kinase-activated protein kinase 2/MAPKAPK2) 56 Protein MAPT (Microtubule-associated protein tau) 54 Metabolite Tryptophan 54 Protein C3d (Complement C3d fragment) 51 SNP PARD3 (rs10763976)

22 Diabetes Page 54 of 66

Supplemental Table 5: Comparison of SNP genotype features between cases (AbPos/T1D) and controls for development of islet autoimmunity (IA). Ranking by selection frequency, reporting features selected at least 50% of the time. Genotype associated with cases (combined AbPos/T1D group) vs controls are listed.

Control Case (AbPos/T1D) Selected (%) SNP Genotypes Genotypes 89 rs2476601 (PTPN22) GG AG 85 rs2476601 (PTPN22) GG AG 84 rs3087243 (CTLA4) AA AG or GG 72 rs2290400 (GSDMB/ORMDL3 ) GG AA 65 rs2282679 (GC [vitamin D binding AA AC protein]) 57 rs947474 (PRKCQ) AA AG or GG 57 rs231775 (CTLA4) AA GG 51 rs10763976 (PARD3) AG GG

23 Page 55 of 66 Diabetes

Supplemental Table 6: Comparison of metabolomic features between cases (AbPos/T1D) and controls for development of islet autoimmunity (IA). Ranking by selection frequency, reporting features selected at least 50% of the time. Metabolites identified by ultrahigh performance liquid chromatography/tandem mass spectrometry (UHPLC/MS/MS2) and gas chromatography/mass spectrometry (GS/MS). Mean Log2(fold Δ) is the mean fold change from time 1 to time 2 for each group. Positive number indicates increase in the measure over time; negative number indicates decrease over time.

Control Case (AbPos/T1D) Selected (%) Metabolite Mean Log2(fold Δ) Mean Log2(fold Δ) 100 Ascorbate (Vitamin C) -1.843 0.718 94 3-methyl-2-oxobutyrate 0.420 -0.357 91 4-hydroxyhippurate -0.590 0.133 89 Pyroglutamine -0.034 0.376 77 Betaine -0.288 0.034 73 4-androsten-3b,17b-diol -0.209 0.904 disulfate 1 73 N-acetylornithine -0.405 0.064 71 Dihomo-linolenate (20:3n3 or 0.123 -0.214 n6) 71 S-methylcysteine -0.423 -0.066 71 Xylitol -0.231 0.158 70 7-hydroxyoctanoate -0.729 -0.145 66 Thymol sulfate 0.063 1.554 65 Dodecanedioate -1.372 -0.744 64 4-androsten-3b,17b-diol -0.942 -0.199 disulfate 2 64 Sphingosine -0.770 -0.142 63 Phosphate 0.008 -0.086 59 Uridine -0.047 -0.351 58 3-indoxyl sulfate 1.264 0.602 54 Tryptophan 0.035 -0.033

24 Diabetes Page 56 of 66

Supplemental Table 7: Comparison of protein features between cases (AbPos/T1D) and controls for development of islet autoimmunity (IA). Ranking by selection frequency, reporting features selected at least 50% of the time. Proteins identified by aptamer (Somalogic). Mean Log2(fold Δ) is the mean fold change from time 1 to time 2 for each group. Positive number indicates increase in the measure over time; negative number indicates decrease over time.

Control Selected Mean Log2(fold Case (AbPos/T1D) (%) Δ) Mean Log2(fold Δ) 93 FCRL3 (Fc receptor-like protein 3) 0.008 -0.034 90 NKG2-D type II integral membrane protein/ KLRK1 0.178 0.064 89 SSRP1 (FACT complex subunit) 0.006 -0.137 87 MMP-2 (Matrix Metallopeptidase 2) -0.075 0.006 86 Activin A -0.046 -0.040 84 CSK21 (Casein kinase II subunit alpha) -0.223 -0.066 82 HAI-1(Hepatocyte growth factor activator inhibitor -0.002 0.038 type 1/Kunitz-type protease inhibitor 1) 82 Macrophage scavenger receptor 0.160 0.018 77 CD22 (B-cell receptor CD22) 0.100 0.020 76 CCL18 (C-C motif chemokine 18/PARC) -0.314 -0.238 75 CRIS3 (Cysteine-rich secretory protein 3) 0.145 0.067 75 Hepcidin (Liver-expressed antimicrobial -0.031 0.000 peptide 1/LEAP-1) 73 FSTL3 (Follistatin-related protein 3) 0.053 0.005 73 Angiopoietin-2 -0.045 -0.024 71 MFRP (Membrane frizzled-related protein) 0.126 0.077 71 FGFR-3 (Fibroblast growth factor receptor 3) -0.044 -0.100 71 BSSP4 (Brain-specific serine protease 4) 0.027 0.049 71 SEPR (Seprase/ FAP) -0.170 -0.155 69 Cathepsin H 0.337 0.252 68 C8 (Complement component 8) 0.070 0.013 67 IGFBP-2 (Insulin-like growth factor-binding 0.253 0.263 protein 2) 66 Osteoblast-specific transcription factor 2 0.084 -0.161 65 Protein disulfide isomerase A3 -0.270 -0.172 65 Coagulation Factor X -0.101 -0.220 65 Stanniocalcin-1 -0.040 0.052 64 LYVE1 (Lymphatic vessel endothelial hyaluronic -0.050 -0.072 acid receptor 1) 64 Testican-2 -0.088 -0.099 64 ROR1 (Inactive tyrosine-protein kinase -0.032 -0.051 transmembrane receptor ROR1) 63 TrATPase (Tartrate-resistant acid phosphatase -0.336 -0.240 type 5) 63 Kininogen, HMW -0.013 -0.005 61 Cystatin C -0.068 0.021

25 Page 57 of 66 Diabetes

61 VEGF sR2 (Vascular endothelial growth factor -0.126 0.034 receptor 2) 61 PSA6 (Proteasome subunit alpha type-6) -0.035 0.049 60 Thrombopoietin Receptor -0.274 -0.293 60 DPP2 (Dipeptidyl peptidase 2) 0.177 0.242 60 Olfactomedin-4 0.067 0.038 58 NKp30 (Natural killer cell p30-related -0.065 -0.052 protein/Natural cytotoxicity triggering receptor 3/ NCR3) 58 CLF-1/CLC Complex (Cytokine receptor-like factor -0.121 -0.030 1/ Cardiotrophin-like cytokine factor 1) 57 MMP-3 (Matrix Metallopeptidase 3) -0.285 -0.211 57 GFRa-2 (GDNF family receptor alpha-2) -0.216 -0.127 56 MAPK2 (MAP kinase-activated protein kinase 2/ 0.082 0.078 MAPKAPK2) 56 MAPT (Microtubule-associated protein tau) 0.012 -0.007 54 C3d (Complement C3d fragment) 0.238 0.182

26 Diabetes Page 58 of 66

Supplemental Table 8: Comparison of peptide features between cases (AbPos/T1D) and controls for development of islet autoimmunity (IA). Ranking by selection frequency, reporting features selected at least 50% of the time. Peptides identified by LC-MRM/MS. Mean Log2(fold Δ) is the mean fold change from time 1 to time 2 for each group. Positive number indicates increase in the measure over time; negative number indicates decrease over time.

Control Case (AbPos/T1D) Selected (%) Peptide Mean Log2(fold Δ) Mean Log2(fold Δ) 77 C6 (Complement component C6) 0.139 0.031 64 KNG1 (Kininogen-1) -0.008 -0.012 62 CLU (Clusterin) 0.073 0.084

27 Page 59 of 66 Diabetes

Supplemental Table 9: The top features in progression from islet autoimmunity (IA) to type 1 diabetes. Analysis by ROFI-P3 comparing islet autoantibody positive subjects (controls) to T1D subjects (Cases) from T3 to T4 for metadata, cytokines, metabolites and proteins and SNPs. Ranking by selection frequency, reporting features selected at least 50% of the time. (See supplemental tables 11 – 15 for comparison of each feature for case vs. control.)

Selected Source Feature (%) 100 Metabolite Glucose 100 Metadata Age (years) 100 Metabolite ADP fibrinogen DRR1 (Down-regulated in renal cell carcinoma 1)/ 100 Protein Actin-associated protein FAM107A 99 Metabolite Mannose 98 Protein RAD51 (DNA repair protein RAD51 homolog 1) 98 Protein Cystatin-F (CYTF) 97 Protein MAPKAPK3 (MAP kinase-activated protein kinase 3) 92 SNP MHC (rs3117103) 91 SNP from GWAS for T1D (rs7221109) 90 Protein Plasminogen 89 Metabolite Ribose 89 Protein IL-11 RA (Interleukin-11 receptor subunit alpha) 89 SNP HLA-DQB1 8.1 (rs2157678) 87 Metabolite Butyrylcarnitine 86 Protein Spondin-1 85 Metabolite Gluconate 84 Protein Tropomyosin 2 82 SNP UBASH3A (rs11203203) 80 SNP GSDMB/ORMDL3 (rs2290400) 80 Protein Carbonic Anhydrase X 80 Protein FCG2A (Low affinity immunoglobulin gamma Fc region receptor II-a) 80 Protein TPSB2 ( beta-2) 80 Protein MED-1 (Mediator of RNA polymerase II transcription subunit 1) 79 Protein Calcineurin 78 Metadata First-degree relative status 75 SNP MAGI2 (rs2160322) 75 SNP from GWAS for celiac (rs10806425) 74 SNP HORMAD2 (rs5753037) 74 SNP MHC (rs2853977)

28 Diabetes Page 60 of 66

74 SNP CTLA4 (rs231775) 74 Protein FCG2B (Low affinity immunoglobulin gamma Fc region receptor II-b) 73 Protein Desmoglein-1 73 Protein NET4 (Netrin-4) IGFBP-1 (Insulin-like growth factor-binding 72 Protein protein 1) 72 Metabolite 3-methylhistidine 71 Protein Haptoglobin, Mixed Type 71 Peptide SERPI 71 Protein uPA (-type ) 70 Protein PAK3 (Serine/threonine-protein kinase PAK 3) 70 Protein 2 PTP-1B (Protein-tyrosine phosphatase 1B/Tyrosine-protein phosphatase non- 69 Protein receptor type 1) 69 Protein IF4A3 (Eukaryotic initiation factor 4A-III) 68 Protein CCL17 (C-C motif chemokine 17)/TARC 68 Protein Discoidin domain receptor 2 68 Protein PAFAH beta subunit (Platelet-activating factor acetylhydrolase IB subunit beta) 67 Protein Proteinase-3 67 Metabolite , des-arg(9) 67 Metabolite 7-methylguanine 65 Protein Kininogen,HMW 64 Protein Fractalkine/CX3CL-1 64 Protein ARP19 (cAMP-regulated phosphoprotein 19) 64 Metabolite 1,6-anhydroglucose 63 Protein Trypsin 63 Protein SLIK1 (SLIT and NTRK-like protein 1) 63 Protein SH21A (SH2 domain-containing protein 1A) 62 Protein -Antichymotrypsin 62 Protein TrkC (NT-3 growth factor receptor) 62 Protein Mn SOD (Superoxide dismutase [Mn], mitochondrial) 61 Protein MBL (Mannose-binding ) 61 Protein Nucleoside diphosphate kinase A 60 Protein Mitogen-activated protein kinase 14 (MAPK14) 60 Metabolite 3-hydroxyisobutyrate 59 Metabolite Threonine 58 Peptide PPBP 58 Metabolite Phenyllactate (PLA)

29 Page 61 of 66 Diabetes

58 Protein EphB4 (Ephrin type-B receptor 4) 58 Metabolite Cortisone 58 Metabolite 1,5-anhydroglucitol (1,5-AG) 57 Protein CD109 antigen 56 Protein GPVI (Platelet glycoprotein VI) 55 Metabolite Isoleucylglycine 55 Metabolite Fucose 54 Metabolite 2-hydroxyglutarate 54 Metabolite 1-linoleoylglycerophosphocholine 53 Protein MCP-4 (Monocyte chemotactic protein 4) / C-C motif chemokine 13 53 Protein Caspase-2 53 Protein ARTS1 (Endoplasmic reticulum aminopeptidase 1) 52 Protein Cardiotrophin I 52 Protein -6 SARP-2 (Secreted apoptosis-related protein 2)/ sFRP-1 (Secreted frizzled- 51 Protein related protein 1) NKp44 (Natural killer cell p44-related protein)/ Natural cytotoxicity triggering 50 Protein receptor 2 / CD336 50 Protein LAG-1 (Lymphocyte activation gene 1 protein)/ C-C motif chemokine 4-like

30 Diabetes Page 62 of 66

Supplemental Table 10: Comparison of SNP genotype features between cases and controls for progression from islet autoimmunity (IA) to T1D. Ranking by selection frequency, reporting features selected at least 50% of the time. Genotype associated with cases (T1D group) vs controls (AbPos group) are listed.

Control (AbPos) Case (T1D) Selected (%) SNP Genotypes Genotypes 92 MHC (rs3117103) TT AT 91 from GWAS for T1D (rs7221109) CT CC 89 HLA-DQB1 8.1 (rs2157678) CC AC 82 UBASH3A (rs11203203) GG AA or AG 80 GSDMB/ORMDL3 (rs2290400) AG AA 75 MAGI2 (rs2160322) CG CC 75 from GWAS for celiac (rs10806425) CC AC 74 HORMAD2 (rs5753037) CC CT 74 MHC (rs2853977) TT AT 74 CTLA4 (rs231775) AG AA

31 Page 63 of 66 Diabetes

Supplemental Table 11: Comparison of metabolomics features between cases (T1D) and controls (AbPos) for progression from islet autoimmunity (IA) to T1D. Ranking by selection frequency, reporting features selected at least 50% of the time. Metabolites identified by ultrahigh performance liquid chromatography/tandem mass spectrometry (UHPLC/MS/MS2) and gas chromatography/mass spectrometry (GS/MS). Mean Log2(fold Δ) is the mean fold change from time 3 to time 4 for each group. Positive number indicates increase in the measure over time; negative number indicates decrease over time.

Control (AbPos) Case (T1D) Selected (%) Metabolite Mean Log2(fold Δ) Mean Log2(fold Δ) 100 Glucose -0.225 0.423 100 ADP fibrinogen 0.048 0.745 99 Mannose -0.023 1.000 89 Ribose -0.494 0.147 87 Butyrylcarnitine -0.247 0.092 85 Gluconate -0.218 0.442 72 3-methylhistidine 0.871 0.008 67 Bradykinin, des-arg(9) 0.541 0.306 67 7-methylguanine -0.252 0.273 64 1,6-anhydroglucose -0.086 0.896 60 3-hydroxyisobutyrate -0.339 -0.056 59 Threonine -0.075 0.100 58 Phenyllactate (PLA) 0.085 0.210 58 Cortisone 0.422 0.026 58 1,5-anhydroglucitol (1,5-AG) 0.026 -0.708 55 Isoleucylglycine -0.317 0.139 55 Fucose -0.027 0.276 54 2-hydroxyglutarate -0.910 -0.527 54 1-linoleoylglycerophosphocholine 0.013 0.230

32 Diabetes Page 64 of 66

Supplemental Table 12: Comparison of protein features between cases and controls for progression from islet autoimmunity (IA) to T1D. Ranking by selection frequency, reporting features selected at least 50% of the time. Proteins identified by aptamer (Somalogic). Mean Log2(fold Δ) is the mean fold change from time 3 to time 4 for each group. Positive number indicates increase in the measure over time; negative number indicates decrease over time.

Selected Control (AbPos) Case (T1D) (%) Mean Log2(fold Δ) Mean Log2(fold Δ) DRR1 (Down-regulated in renal cell carcinoma 0.085 0.008 100 1)/Actin-associated protein FAM107A 98 RAD51 (DNA repair protein RAD51 homolog 1) 0.273 -0.042 98 Cystatin-F (CYTF) -0.160 0.109 97 MAPKAPK3 (MAP kinase-activated protein kinase 3) -0.082 0.146 90 Plasminogen 0.096 -0.045 89 IL-11 RA -0.120 0.022 86 Spondin-1 -0.165 0.051 84 Tropomyosin 2 0.090 -0.038 80 Carbonic Anhydrase X 0.190 -0.008 FCG2A (Low affinity immunoglobulin gamma Fc -0.045 0.079 80 region receptor II-a) 80 TPSB2 (Tryptase beta-2) -0.120 -0.015 MED-1 (Mediator of RNA polymerase II transcription -0.017 0.077 80 subunit 1) 79 Calcineurin -0.276 -0.040 FCG2B (Low affinity immunoglobulin gamma Fc -0.087 0.125 74 region receptor II-b) 73 Desmoglein-1 0.273 0.001 73 NET4 (Netrin-4) -0.159 0.032 IGFBP-1 (Insulin-like growth factor-binding -0.466 -1.237 72 protein 1) 71 Haptoglobin, Mixed Type 0.241 -0.211 71 uPA (Urokinase-type plasminogen activator) -0.194 -0.001 70 PAK3 (Serine/threonine-protein kinase PAK 3) 0.134 -0.062 70 Trypsin 2 0.005 -0.333 PTP-1B (Protein-tyrosine phosphatase 1B/Tyrosine- 0.171 -0.022 69 protein phosphatase non-receptor type 1) 69 IF4A3 (Eukaryotic initiation factor 4A-III) 0.023 -0.110 68 CCL17 (C-C motif chemokine 17)/TARC -0.035 -0.072 68 Discoidin domain receptor 2 -0.065 0.058 PAFAH beta subunit (Platelet-activating factor 0.035 0.107 68 acetylhydrolase IB subunit beta) 67 Proteinase-3 0.529 0.228 65 Kininogen, HMW 0.078 -0.042 64 Fractalkine/CX3CL-1 0.197 -0.015 64 ARP19 (cAMP-regulated phosphoprotein 19) -0.134 0.039 63 Trypsin -0.123 -0.463

33 Page 65 of 66 Diabetes

63 SLIK1 (SLIT and NTRK-like protein 1) 0.186 0.042 63 SH21A (SH2 domain-containing protein 1A) -0.164 -0.045 62 -Antichymotrypsin 0.075 -0.026 62 TrkC (NT-3 growth factor receptor) -0.093 0.002 62 Mn SOD (Superoxide dismutase [Mn], mitochondrial) -0.025 -0.177 61 MBL (Mannose-binding protein C) -0.134 0.153 61 Nucleoside diphosphate kinase A 0.360 0.150 60 Mitogen-activated protein kinase 14 (MAPK14) -0.167 0.023 58 EphB4 (Ephrin type-B receptor 4) -0.018 0.050 57 CD109 antigen -0.275 0.082 56 GPVI (Platelet glycoprotein VI) 0.253 0.096 MCP-4 (Monocyte chemotactic protein 4) / C-C motif -0.073 0.010 53 chemokine 13 53 Caspase-2 -0.026 -0.071 53 ARTS1 (Endoplasmic reticulum aminopeptidase 1) 0.897 0.384 52 Cardiotrophin I 0.006 -0.033 52 Kallikrein-6 0.173 0.008 SARP-2 (Secreted apoptosis-related protein 2)/ sFRP- -0.127 -0.001 51 1 (Secreted frizzled-related protein 1) NKp44 (Natural killer cell p44-related protein)/ 0.026 -0.016 50 Natural cytotoxicity triggering receptor 2 / CD336 LAG-1 (Lymphocyte activation gene 1 protein)/ C-C 0.063 -0.001 50 motif chemokine 4-like

34 Diabetes Page 66 of 66

Supplemental Table 13: Comparison of peptide features between cases and controls for progression from islet autoimmunity (IA) to T1D. Ranking by selection frequency, reporting features selected at least 50% of the time. Peptides identified by LC-MRM/MS. Mean Log2(fold Δ) is the mean fold change from time 3 to time 4 for each group. Positive number indicates increase in the measure over time; negative number indicates decrease over time.

Control (AbPos) Case (T1D) Selected (%) Peptide Mean Log2(fold Δ) Mean Log2(fold Δ) 71 SERPIND1 (Heparin 2) 0.039 -0.001 58 PPBP (Platelet basic protein) 0.077 0.103

35