ARTICLE

https://doi.org/10.1038/s42003-019-0324-7 OPEN Risk prediction models for dementia constructed by supervised principal component analysis using miRNA expression data

Daichi Shigemizu 1,2,3,4, Shintaro Akiyama1, Yuya Asanomi1, Keith A. Boroevich 3, Alok Sharma 3,4,5,6,

1234567890():,; Tatsuhiko Tsunoda 2,3,4, Kana Matsukuma7, Makiko Ichikawa7, Hiroko Sudo7, Satoko Takizawa7, Takashi Sakurai8,9, Kouichi Ozaki1,3, Takahiro Ochiya10,11 & Shumpei Niida1

Alzheimer’s disease (AD) is the most common subtype of dementia, followed by Vascular Dementia (VaD), and Dementia with Lewy Bodies (DLB). Recently, microRNAs (miRNAs) have received a lot of attention as the novel biomarkers for dementia. Here, using serum miRNA expression of 1,601 Japanese individuals, we investigated potential miRNA bio- markers and constructed risk prediction models, based on a supervised principal component analysis (PCA) logistic regression method, according to the subtype of dementia. The final risk prediction model achieved a high accuracy of 0.873 on a validation cohort in AD, when using 78 miRNAs: Accuracy = 0.836 with 86 miRNAs in VaD; Accuracy = 0.825 with 110 miRNAs in DLB. To our knowledge, this is the first report applying miRNA-based risk prediction models to a dementia prospective cohort. Our study demonstrates our models to be effective in prospective disease risk prediction, and with further improvement may contribute to practical clinical use in dementia.

1 Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Aichi 474-8511, Japan. 2 Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan. 3 RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan. 4 CREST, JST, Tokyo 102-8666, Japan. 5 School of Engineering & Physics, University of the South Pacific, Suva, Fiji. 6 Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD 4111, Australia. 7 Toray Industries, Inc., Kamakura, Kanagawa 248-0036, Japan. 8 The Center for Comprehensive Care and Research on Memory Disorders, National Center for Geriatrics and Gerontology, Obu, Aichi 474-8511, Japan. 9 Department of Cognitive and Behavioral Science, Nagoya University Graduate School of Medicine, Nagoya, Aichi 466-8550, Japan. 10 Division of Molecular and Cellular Medicine, Fundamental Innovative Oncology Core Center, National Center Research Institute, Tokyo 104-0045, Japan. 11 Institute of Medical Science, Tokyo Medical University, Tokyo 160-8402, Japan. Correspondence and requests for materials should be addressed to D.S. (email: [email protected])

COMMUNICATIONS BIOLOGY | (2019) 2:77 | https://doi.org/10.1038/s42003-019-0324-7 | www.nature.com/commsbio 1 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0324-7

ith an increasingly aging global human population, the and the principal component scores (PC scores), we then con- Wnumber of people with dementia is rapidly increasing, structed risk prediction models based on a supervised principal and is estimated to reach 75 million by 2030 and component analysis (PCA) logistic regression method. Finally, we 135 million by 2050, worldwide1. Since dementia is a clinical determined the optimal miRNA and PC score set though cross- syndrome that leads to difficulties in daily activities involving validation. This final risk prediction model, constructed based on memory, language and behavior, this rapid increase raises a the entire discovery cohort, was evaluated with an independent substantial burden for medical care and public health systems2. validation cohort by the area under the receiver operating char- On the other hand, there is no current cure for this disease, and acteristic curve (AUC). We further evaluated the predictive ability the available treatments are only able to postpone the progres- of our model using a prospective cohort. Our findings indicate sion3. Therefore, identification of new biomarkers for earlier that the prediction models using serum miRNA expression data diagnosis and therapeutic intervention of the disease is promptly may be useful as biomarkers for dementia and contribute to the required4. development of future therapeutic measurement for this common The diagnosis of dementia is generally based on the patients’ but serious disorder. cognitive function5. Alzheimer’s disease (AD) is the most com- mon subtype of dementia, followed by vascular dementia (VaD), Results and dementia with Lewy bodies (DLB)1. While recent studies Japanese samples. We divided 1601 Japanese individuals (1021 have showed that three in the cerebrospinal fluid (CSF): AD cases, 91 VaD cases, 169 DLB cases, 32 mild cognitive amyloid-beta 1–42 (Aβ ), total tau (T-tau) and phosphorylated 142 impairment (MCI), and 288 NC) into a discovery cohort of 786 tau 181 (P-tau ), could be effective in characterizing AD6,7,itis 181 individuals (511 AD cases, 46 VaD cases, 85 DLB cases and 144 still challenging to use these CSF molecules as biomarkers in NC) and a validation cohort of 783 individuals (510 AD cases, 45 general physical examination for early diagnosis and therapeutic VaD cases, 84 DLB cases, and 144 NC) (see Materials and intervention due to the highly invasive collection process. In methods). The separation was performed to result in a similar addition, new imaging-based techniques, including positron distribution in the age between the discovery and validation emission tomography scans for detection of amyloid-beta cohorts for each disease (Table 1). deposition or tau tracers, and the volumetric magnetic reso- nance imaging with determination of hippocampal or medial temporal lobe atrophy, are not suitable for initial screening due to Construction of risk prediction models. Our risk prediction the high cost performance8–10. It has also been reported that models were constructed based on a supervised PCA logistic microRNAs play a key role in the control of glial cell development regression method. All approaches that we considered were car- in the central nervous system11. Therefore, the present study is ried out on datasets of the p pre-selected miRNAs (p ≤ 2562). The evaluated on the hypothesis that neurite and synapse destruction, selection of miRNAs was carried out based on the z-value in the associated with pathologic of dementia and other neurodegen- logistic regression. Nine-tenths of entire training set was used for erative diseases, can be detected in vitro by quantitative analysis the calculation of the z-values and to fit the model for each cross- of brain-enriched cell-free microRNA in the human blood5. validation step. The adjusted model was evaluated using the MicroRNAs (miRNAs) are approximately 22-nucleotide small remaining one-tenth of the training set. This process was repeated non-coding RNAs, which have been shown to regulate 10 times (10-fold cross-validation). The cutoff value T of the z- expression by binding to complementary regions of messenger values was then raised from 0.1 to 5.0 at an interval 0.1. The transcripts. The alteration of some miRNAs expression has number of top PC scores used, m, was set from 1 to 10 (Fig. 1). recently been found in neurons of patients with AD and other On the basis of the average AUC, we investigated all combina- neurodegenerative diseases12–14, and hence miRNAs are expected tions of the T and m, and in AD, the combination of (T, m) = to be useful as easily accessible and non-invasive biomarkers15. (4.5, 10) achieved the highest AUC of 0.877 in the discovery Here, we performed a comprehensive miRNA expression cohort. In VaD, a (T, m) = (4.0, 10) achieved an AUC = 0.923, analysis using 1601 serum samples, composed of dementia and in DLB, a (T, m) = (3.4, 9) achieved an AUC = 0.885 (Fig. 2). patients and individuals with cognitive normal function (referred Final risk prediction models were constructed based on the to as normal controls (NC)), in order to investigate new bio- optimal T and m detected in each disease using the entire training markers for earlier diagnosis and therapeutic intervention and to set (discovery cohort). The adjusted models were then evaluated construct risk prediction models using the biomarkers. We on the validation cohort, which was completely independent from applied 10-fold cross-validation to a discovery cohort of 1092 the discovery cohort. As a result, 78 miRNAs out of 2562 were individuals, separated from a validation cohort of 1089 indivi- employed for the final model construction in AD, which achieved duals. We performed a two-step procedure similar to those used an AUC of 0.874 in the validation cohort (Fig. 3a). Of the 78, two for risk prediction in several previous disease studies16–19.We miRNAs (MIMAT0004947 and MIMAT0022726) were AD- first selected effective miRNA biomarker candidates in the logistic specific miRNAs reported in previous studies20. The remaining regression risk prediction models. Using the pre-selected miRNAs previously reported miRNAs did not show significantly better

Table 1 Average age, sex and APOE information in the discovery and validation cohorts

Discovery cohort Validation cohort Phenotype #Sample Age Sex (Male) APOEa #Sample Age Sex (Male) APOEa AD 511 79.2 0.29 0.53 510 79.2 0.31 0.47 VaD 46 79.0 0.63 0.33 45 79.1 0.56 0.18 DLB 85 79.5 0.45 0.34 84 79.5 0.36 0.30 NC 144 71.7 0.49 0.22 144 71.8 0.56 0.15

APOE apolipoprotein E, AD Alzheimer’s disease, VaD vascular dementia, DLB dementia with Lewy bodies, NC normal control a APOE shows the average of the number of APOE ε4 allele genotype

2 COMMUNICATIONS BIOLOGY | (2019) 2:77 | https://doi.org/10.1038/s42003-019-0324-7 | www.nature.com/commsbio COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0324-7 ARTICLE

miRNA2 miRNA3 1. For each miRNA miRNA3 |z-value| > T

miRNA1

miRNA1

4. Evaluation of T and m 2. PCA using cross-validation

1 PC1 t 0 3. m PC scores

–1

0 1 x   logit(Pi ) = 1PC1 + ··· + mPCm, where PC = l x + ··· + l x i 1 1 n n PC2

Fig. 1 Workflow of our risk prediction model construction with supervised principal component analysis (PCA) logistic regression method. We calculated z-value in the logistic regression method for each microRNA (miRNA). The cutoff value of the z-value, T and the number of miRNAs, n, was pre-selected (1). The PCA was performed using the pre-selected miRNAs (2). The risk prediction models were constructed based on the combination of the miRNAs and m PC scores (3). This optimal parameter set (T, m) was determined in the discovery cohort using 10-fold cross-validation (4) outcome in logistic regression in the selection of miRNAs (Sup- were shared between VaD and DLB (32 miRNAs) and among plementary Data 1). A maximum average sensitivity and speci- all three (31 miRNAs) (Fig. 4a). AD possessed the most disease- ficity of the ROC curve was achieved at a sensitivity of 0.933 and specific miRNAs compared to the other diseases (AD; 29/78 = specificity of 0.660 in AD. The accuracy showed 0.873 when the 0.371, DLB; 34/110 = 0.309, VaD; 18/86 = 0.209) (Fig. 4a). prognostic index was 0.281 (Table 2). In a similar way, 86 miR- Overall, miRNAs regulate the expression of thousands of NAs and 110 miRNAs were employed for our final model con- -coding gene targets (mRNAs) at both post-transcriptional struction in VaD (Fig. 3b) and DLB (Fig. 3c), which achieved and translational levels21–23. To determine the biological AUCs of 0.867 and 0.870 in the validation cohort, respectively. A significance of our findings (miRNAs), we examined microRNA maximum average sensitivity and specificity of the ROC curve Target Prediction and Functional Study Database (miRDB24), was achieved at a sensitivity of 0.733 and specificity of 0.868 in which can predict miRNA functional target . The 78 VaD and at a sensitivity of 0.762 and specificity of 0.861 in DLB. miRNAs in AD were predicted to target 1755 genes. In the similar The accuracies in VaD and DLB were 0.836 and 0.825 when the way, the 86 miRNAs in VaD and 110 miRNAs in DLB were prognostic index was −0.761 and 0.0392, respectively (Table 2). predicted to target 2017 and 2521 genes, respectively. Compared We also constructed risk prediction models based on the with miRNAs, a large number of mRNAs were shared among logistic regression method using only clinical information (age, three diseases (miRNAs: 31/162 = 0.191, mRNAs: 960/3370 = sex and apolipoprotein E (APOE)) for the entire training data set. 0.285) (Fig. 4a, b). The adjusted models were then evaluated on the validation cohort. The AUCs achieved were 0.857 in AD, 0.813 in VaD and 0.827 in DLB. Our finding’s miRNAs contributed to an increase Functional modules using co-expression network analysis. of AUCs for the risk prediction models. We further compared our Since we detected several candidate gene targets in the three two-step method with a one-step penalized regression method, diseases, we next attempted to elucidate functional modules from LASSO (least absolute shrinkage and selection operator). We the candidates. We focused on the occurrence of hub genes, constructed risk prediction models based on the LASSO method which have relationships with many genes, through large-scale gene co-expression network analysis. The gene co-expression using all miRNAs in the entire training data set. The adjusted 25 models were then evaluated on the validation cohort. The AUCs information was gathered from the COXPRESdb database (see Materials and methods). Gene co-expression network visualiza- achieved were 0.898 in AD, 0.821 in VaD and 0.892 in DLB. For 26 AD and DLB, the one-step penalized regression method showed tion was performed using Cytoscape software . Three hub genes, similar AUCs to our two-step method, but the penalized method which co-expressed with >25 genes, were detected in the func- showed a lower AUC in VaD than our method (LASSO = 0.821, tional modules (EXOC5, DDX3X and YTHDF3, Fig. 5). EXOC5 our method = 0.867). was associated with AD and VaD, and the remaining two genes were common among the three diseases. These three genes were also verified to express in brain tissue through the Genotype- Effective miRNAs and the functional gene annotations. The Tissue Expression (GTEx) project27,28. number of miRNAs used for final risk prediction models were 78, 86 and 110 in AD, VaD and DLB, respectively (Supplemen- tary Data 2). We next examined the common and disease-specific Validation in a prospective cohort. We measured miRNA miRNAs among these three diseases. A large number of miRNAs expression for 32 MCI subjects, which were obtained from the

COMMUNICATIONS BIOLOGY | (2019) 2:77 | https://doi.org/10.1038/s42003-019-0324-7 | www.nature.com/commsbio 3 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0324-7

a AD |Z-value| AUC 0.3 0.5 0.7 0.9 1.1 1.3 1.5 1.6 1.7 1.8 1.9 2 2.1 2.2 2.3 2.4 2.5 2.6 2.8 2.9 3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 0.1 0.2 0.4 0.6 0.8 1 1.2 1.4 2.7 0.88 PC1 0.87 PC2 0.87 PC3 0.87 PC4 0.86 PC5 0.86 PC6 0.86 PC7

#PC score 0.85 PC8 0.85 PC9 0.84 PC10 0.84 b VaD 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4 0.92 PC1 0.91 PC2 0.9 PC3 0.88 PC4 0.87 PC5 0.86 PC6 0.84 PC7 #PC score 0.83 PC8 0.81 PC9 0.8 PC10 0.79 c DLB 0.3 0.5 0.7 0.8 1 1.8 2 2.3 2.8 3 3.1 3.3 3.7 3.8 4 0.1 0.2 0.4 0.6 0.9 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.9 2.1 2.2 2.4 2.5 2.6 2.7 2.9 3.2 3.4 3.5 3.6 3.9 0.88 PC1 0.88 PC2 0.87 PC3 0.86 PC4 0.85 PC5 0.84 PC6 0.83 PC7 #PC score 0.82 PC8 0.82 PC9 0.81 PC10 0.8

Fig. 2 Risk prediction models using 10-fold cross-validation on a discovery cohort. The x and y axes show the cutoff value of the z-value in the logistic regression (T) and the number of top principal component (PC) scores used (m) for the prediction models, respectively. In AD (a), a combination of (T, m) = (4.5, 10) achieved the highest AUC of 0.877 in the discovery cohort. In VaD (b), a (T, m) = (4.0, 10) achieved an AUC = 0.923, and in DLB (c), a (T, m) = (3.4, 9) achieved an AUC = 0.885. AUC area under the curve, AD Alzheimer’s disease, VaD vascular dementia, DLB dementia with Lewy bodies

abcAD VaD DLB 1.0 1.0 1.0

0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4 True positive rate True positive rate 0.2 0.2 0.2

0.0 0.0 0.0

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False positive rate False positive rate False positive rate

Fig. 3 The ROC curves of our risk prediction models in a validation cohort. Final risk prediction models were constructed based on the supervised principal component analysis (PCA) logistic regression method in each disease using the complete discovery cohort. The adjusted models were then evaluated on the validation cohort. The final model construction in AD achieved an AUC of 0.874 in the validation cohort (a): AUC = 0.867 in VaD (b); AUC = 0.870 in DLB (c). Sensitivity and specificity were maximized at a sensitivity of 0.933 and specificity of 0.660 in AD (a): a sensitivity of 0.733 and specificity of 0.868 in VaD (b); and a sensitivity of 0.762 and specificity of 0.861 in DLB (c). ROC receiver operating characteristic, AUC area under the curve, AD Alzheimer’s disease, VaD vascular dementia, DLB dementia with Lewy bodies prospective data, of which 10 subjects converted to AD after at greater than 0.281 predicted the subject would convert to AD least 6 months. We evaluated if our risk prediction model in AD (Table 2). As a result, all of the 10 converted subjects were cor- could predict the converted subjects after 6 months. Prognosis rectly predicted by our model (sensitivity = 1.0). Furthermore, indices (PIs) assigned to each subject were calculated by applying all 4 subjects predicted not to covert to AD did not actually 78 miRNA expression values to our prediction model. A PI score convert to AD (negative predictive value = 1.0). The remaining

4 COMMUNICATIONS BIOLOGY | (2019) 2:77 | https://doi.org/10.1038/s42003-019-0324-7 | www.nature.com/commsbio COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0324-7 ARTICLE

on the validation cohort18,40. The difference of the AUCs is due Table 2 Accuracy estimation in three diseases using the fi validation cohort to over tting of the model construction criterion. However, our risk models showed only small differences between discovery and validation cohorts in the AUCs (AD = 0.877 and 0.874, Disease PI cutoff Accuracy Sensitivity Specificity VaD = 0.923 and 0.867, and DLB = 0.885 and 0.870). These AD 0.281 0.873 0.933 0.660 results imply the miRNAs used in our models were efficient VaD −0.761 0.836 0.733 0.868 DLB 0.0392 0.825 0.762 0.861 to classify disease samples and non-disease samples, although additional replication studies are necessary in future work. We PI prognostic index, AD Alzheimer’s disease, VaD vascular dementia, DLB dementia with Lewy also constructed the risk prediction models using a larger max- bodies imum value of PC scores (m = 50). In AD and DLB, the AUCs of the final models were slightly increased in a validation cohort in the combination of (T, m) = (3.6, 41) in AD and that ab of (T, m) = (3.2, 12) in DLB, compared with those in m = 10 =+ =+ VaD VaD (AD 0.007, DLB 0.002, Supplementary Table 1), but that 18 304 AD 122 AD in VaD was considerably decreased in the combination of 5 (T, m) = (3.8, 13) (VaD = −0.015, Supplementary Table 1). For 29 423 all, a larger number of miRNAs were required for the final model = = 32 31 631 960 construction: m 50 (AD, VaD, DLB) (171, 134, 143) miR- NAs, m = 10 (AD, VaD, DLB) = (78, 86, 110) miRNAs (Sup- 13 250 plementary Table 1). When considering prediction models with a low number of biomarkers, our approaches would be efficient 680 34 DLB DLB for optimal risk prediction models. We further compared our final models using pre-selected miRNAs to those using all miR- NAs. Our models using pre-selected miRNAs had superior AUCs Fig. 4 Effective microRNAs (miRNAs) and genes used in risk prediction to those using all miRNAs in all three diseases for both m = 10 fi model. a The number of miRNAs used for nal risk prediction models were and m = 50 (Supplementary Table 2). Investigations using larger 78, 86 and 110 in AD, VaD and DLB, respectively. The pie-chart showed the sample sizes will lead to further improvement in the performance fi common and disease-speci c miRNAs among these three diseases. b Using of risk prediction models. microRNA Target Prediction and Functional Study Database (miRDB), The annotation of gene targets for miRNAs is critical for which can predict miRNA functional target genes, the 78 miRNAs in AD functional characterization of our findings. We used miRDB24 for were predicted to target 1755 genes. The 86 miRNAs in VaD and 110 these functional gene annotation from miRNAs. A large number miRNAs in DLB were predicted to target 2017 and 2521 genes, respectively. of genes associated with the dementia was detected. We further fi The pie-chart showed the common and disease-speci c target genes elucidated three functionally important modules (i.e. hub genes, ’ among these three diseases. AD Alzheimer s disease, VaD vascular EXOC5, DDX3X and YTHDF3) through large-scale gene co- dementia, DLB dementia with Lewy bodies expression network analysis. These three genes were verified to express in brain tissue through the GTEx project27,28. Jun et al.41 18 subjects were predicted to convert to AD, but had not yet have reported that a single-nucleotide polymorphism (SNP) converted (specificity = 0.18) (Table 3). For further validation in the EXOC5 showed evidence for association with AD. DDX3X, of this discordance, we may have to follow-up with the subjects the DEAD (Asp-Glu-Ala-Asp) box helicase 3, X-linked, belongs in the future and use the additional comprehensive information, to ATP-dependent RNA helicase, the activation of which is including genetic data and/or whole transcriptome data, for fur- – associated with cancer in many tissues, including brain42 44. ther improvement for practical clinical use in dementia. Previous studies have reported that DDX3X expression level is positively correlated with poor survival outcome in human Discussion glioma45. Also, several studies have reported that YTHDF New biomarkers for early diagnosis and intervention have been protein could be associated with accumulation of m6A-modified examined in many diseases29–31. The role of serum miRNAs has transcripts46, and this m6A mRNA modification is critical for recently been reviewed with emphasis on their impact on the glioblastoma stem cell self-renewal and tumorigenesis47. Fur- etiopathogenesis of sporadic AD32 and cancers21–23,33,34. For thermore, recent transcriptomic meta-analyses revealed that AD example, dysregulated serum miRNAs, such as the down- and glioblastoma patients had similar expression patterns in a regulation of miR-137, miR-181c, miR-9, and miR-29a/b in the number of genes48. These observations support the existence blood of AD patients, have been identified35–38. It has also been of molecular substrates that could partially account for direct reported that expressional differences between AD and other co-morbidity relationships49–51. These results suggest that the dementia types were observed for some miRNAs39. However, three hub genes detected could not only play a key role in due to the small sample sizes in these previous reports, com- pathogenesis of dementia, but also contribute to discovery of prehensive miRNA expression analysis had not been performed novel drug targets. in AD or the other subtypes of dementia. Therefore, in this study The diagnosis of dementia is not always consistent with brain we investigated biomarkers with respect to each subtype of pathological changes52,53. Also, elderly dementia patients often dementia, using serum miRNAs and a larger sample size. have concomitant cerebrovascular disease as well as We first detected optimal parameters for risk prediction other concomitant neurodegenerative disease pathologies54.We models using cross-validation of a discovery cohort. The final proposed a methodology that finds the best risk prediction models were then constructed with the optimal parameters using model for each disease rather than a general model that could be the complete discovery cohort. The adjusted models were finally applied to any data set. Our proposed models might be able to evaluated on an independent validation cohort using AUC as differentiate these complex neurological disorders. However, the discriminative accuracy of these risk prediction models. In further refinement of this methodology will be required before its general, these risk prediction models on cross-validation of the practical use in healthcare. One way may be to consider genetic discovery cohort achieved higher AUCs than the adjust models variations, such as SNPs and insertions and deletions (indels)

COMMUNICATIONS BIOLOGY | (2019) 2:77 | https://doi.org/10.1038/s42003-019-0324-7 | www.nature.com/commsbio 5 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0324-7

TSC1

SPOP

MAP3K11

HECTD4 ARID1B

TRAFD1 FAM234A CEP97

CSNK1G2 ZCCHC3 CCDC43 LENG8

DGKZ

MGRN1 ORMDL3

PPP1R11

HRAS

TADA2B

MIEN1

MPRIP

PCGF3

CTCF USP32

CPSF7

BNIP3L

URM1 NAA60

MLXIP CHD8

CCP110 PPDPF MARK2 ATXN7L3

STK35

AGPAT1 SDC3 DCAF7

PLXNA1

SLC27A4 KMT2D

CAPN1

FAM222B

PGAP3

CIC AXIN1 PRKACA PTPA

PHF12 TOB1

LARP1

FBXO45

BMPR2

FAM177A1 CD276

SUPT6H

SSFA2 BCOR

SEC16A UBALD1

MAU2 PIP5K1C

TRAPPC2

FBXL5

ZC3H4 SEPT7

CPOX TBC1D10B RNF11

GTPBP2 DDX6 SRP54

KIAA0232 MAP4K5

PPP1R9B

YIPF6

ARID1A

CNOT3

KANSL3

PA2G4 MSL2 DYNC1LI2

SNX27

TOB2

TXLNG

PACS1

NPLOC4

SCAMP4 WDTC1 FMNL2

ZER1 BAZ2A RAVER1

ACTN4

RPS6KB1

AP1G1 SMURF2

HDGF

TFE3 C6orf62 CARM1 ATXN1L

TULP4

PRR14L ZNF609

TATDN3

PIP4K2A

ZNF780A HMGA1

USP37

ZYX

BICD2

HCFC1

SIN3A VEZF1 RPS6KA4 BMT2

PDPK1

HNRNPUL2 STK40 TBC1D22B

SSBP3 ARHGDIA ELF1

PRDM2

DENND4A KDM2A DKC1

LPGAT1

KHSRP

RAB5B

SPTBN1 EP300

SMARCD1 SAMD4B RPRD2 GCN1

GPD2 PEX5

TAPT1

WASF2

AGO1

ESCO1 MLEC MACO1

SRP68 GNAI2 MED14 KLHL12

LRRC59 ZFR ATXN2 RAB11B

STRN3

AKAP13

CDK12 PCYOX1 BCL2L1 DISP3 TADA1

PIK3R2

KDM5A

FXR2

MAP3K3

USP9X

SRRM1 SIAH1

ATXN2L EDEM1

RAP2A WNK1

ELMSAN1 ABCB10 PITHD1 PJA2 PPP1R10 NSUN2

DNAJB9

WASL

MIS12 SMARCC2 ASH1L TMEM87A

FAF2

YPEL5 AGO3

UBXN7 CREBBP

CAMKK2 LUZP1 MINK1

BAZ1A SCAMP5 KMT2C SLC8A3

RAB35

GIGYF1

DAZAP1 KIF2A EPC2

WDR37

TLK2 FAM76A GAPVD1 CHD4

LARP4B BOD1L1 PML CALR

GPATCH8 SOCS5 BRWD3

DYRK1A

CAPZB ABCD1 NIPBL

MOCS3 RYBP EFR3B ITSN2 PHF20

HECTD1

ARF3 MED13TAF4 ZFP91

MCFD2 BRD2

COCH THRAP3 ELK4 TMEM87B CCDC93

GFPT1

AKIRIN2 KAT6A

TSNAX

USF3 CWC25

PNPLA2

COQ10B

MED1

TAGLN2 C6orf106 RCOR3

HIPK1

ATXN7L3B

BICD1 ZBTB6

KCTD5

WDR26 KPNA6 TAF5L OAZ2 STX6

PUM1 PAPD7

PPP3R1 PHF8

GORASP2 HNRNPU XRN1

CYB5R3

TMEM39A UBE3B

UQCC1

NR1D2

ARIH1 UHRF2

SURF4

SZRD1

POLDIP3 ANKRD17

PATZ1 RALBP1

UBE3C

COL4A3BP

ARID4B RAB3IL1 PEA15

CDC42BPA RBM14 ARCN1

MFN2

BROX

DDA1 CCDC88A TMED5 LTN1

ILF3 KMT2E ANKRD12

SLC22A23 ZBTB43

MDGA1 USP10

PIK3CA

ATG14

ZFAND3 RBM41

JMY

PHC3

ASB7

RELA

SLC35A2 CFL2

ZNF654 HSPA13 MDM4

RAB11FIP2 NFAT5

RICTOR

INO80D LCOR SEC23A CBL

ERC1

MGA FRS2

AMMECR1L

SPIN1

DUSP3 WNT10B NCOA6 MIER1

ZDHHC5 FBXO11

PTBP1

YTHDC1

GOSR2 EIF4G2 FAM20B

SPRTN RSBN1 ZNF292

CSMD2 KPNB1 ZYG11B

NFRKB

MYBL1 LMNB2

SEC24A

KLHL28 DNAJC3

GABPB1

SEC22C YLPM1 HIF1AN SMAD5

SIKE1

SRPRA DSCR3 CUL1

CNOT2 SBNO1

SNRK

CREBRF PLEKHM3

ATRX

KCTD20

CRKL

SUCO YTHDF2 ATF6

KDM6A

LAMP1

EIF4G3

AGO2

NUFIP2

INPP5A RHOT1 FNBP4

CYLD JARID2 ZNF655 RIF1

SAMD8

UPF2

LNPK PIKFYVE DNAJB4 TOR1AIP2 HNRNPA2B1 HERC3

RASGEF1B

NRAS

FAM126B

DNASE2

KMT2A CASK WDR47

TAOK1 SMG1

UBE2J1 DDX3X SRSF5 TSC22D2

FKBP1A

C5orf24 ARID4A

PSME3

DESI2

KLHL24 CCNYL1 RMND5A

SFPQ GOLT1B BTAF1

FBXO28

PHIP

CORO1C TRRAP

FKBP14 CISD2

OSER1

TUBG1

SLC31A1 AP2B1

RNF38 MYCBP2

SPCS3

ITM2B

PAN3 TNPO1

PRDM10 OPA1

JUND SELENOI TRIM23

MARCH6 PATL1 UBQLN1 ATXN7

CSNK1G3

SLC30A7 UBE4A

CALU

EDC3 MAPK1IP1L

TMEM33

SENP7 BTRC

APH1A

IKZF5

ZBTB41 USP42 NFYA KPNA4 ELMOD2 OGFOD1 STRN

SEL1L SP4 FLT1

MIGA1

SSR3 ZCCHC8 ZFC3H1 ING3 ACTR1A MIER3

GMFB SPAG9 CDC73

GPBP1 IRF2BP2 NR2C2

FAM199X

NUCKS1

NAA16

CHD1

SYNJ1 RBM39

GNA13 TM9SF2

SOS2

DCUN1D1 RABL3 OGT ZNF12 SEPHS1 CSNK2A1

SRSF6 TPP2

PHAX

FNDC3A INTS6 YTHDF3

ELAVL1 ARIH2

KPNA1 BIRC6 CDK17 BRWD1 KAT6B SYNGAP1 TACC1 GTF2I

EEA1 MAPK14

RERE CLIC4 ARGLU1 RBM26

AMMECR1 POGZ ABHD17B TNRC6A CERS6 ZBTB21

RALGAPB

EPM2AIP1

UBE4B BZW1

BLZF1 RNF6 SPN SLC39A9

AP1S1 CEP120 POLR2E RSRC2

SAR1A PGGT1B

SRSF11

TBC1D15 SOGA1

SPTY2D1 PDE4C UBR3 LIN7C

ATF2

EPG5 PPM1B PAXBP1 RHOB MYSM1 ZMYM5 STX16 EXOC5 FBXW11 CPSF6 RAB28 UTRN

SAR1B

UACA

UBA6 KHDC4 DLGAP4 LUC7L3 PRR11

C5orf22 REEP3

ARL8B

CHUK

CDK13 PDAP1

AFF4 TMEM50B PANK3

TMEM167A ZC3H7B

UBQLN4 C20orf24

USP47 PTPN11

CNIH4 SLMAP TNKS2

ZNF644

PGF

CTDSPL2 PRPF40A TASP1

ESF1 SCAF11

ATF7IP

PTBP3

RAB6A EZH1 CNIH1 PUM2

ENSA DOCK5 SP3 IMPAD1 ARF6

BTG2 SS18

TRAM1 CBFB BBX SLF2

NONO

U2SURP

EIF5 TBL1XR1 ZRANB2

SETX DMC1 KBTBD2

ERLIN1

TMEM245 API5 JMJD6 UFM1 MAP3K2

WAPL

G3BP1

STAG1 SCYL2 RBMX

GABPA ZNF440 STXBP3

EDEM3 ABRAXAS2 EBLN2 LARP4

IL13RA1 RNF111 USP34 FBXL3 CREBZF DUSP1

OSGIN2 NAA50 ACTR2 GSPT1 USP24

MAPK1 SLC25A46

TMEM30A RBM15 UBR5

OPHN1

CCPG1 RAPGEF6

SEC24B AGPS

CELF1 GPATCH2L

TOR1AIP1 MBNL1 TM9SF3 CTNNB1

TARDBP SREK1 MTMR6 VPS35 THUMPD1

FOS KLF4 CAPRIN1 ZFAND6

ZMAT3

ZNF207 NUDT4 ATL2 MPZL1

HNRNPA3

SLC4A7

TMED2 KLF6 BTBD7 C2orf49 AEBP2 PRPF4B ATP6V1A SNX13

NPAT

CHD6

SRSF1 MAPRE1

MSANTD2 PCDHGA8

CPD PTP4A1 RREB1 SORT1

FAR1 NAA25 ARFGEF1 PHTF2

MBTD1

FOPNL RAB14

UBE2B SRSF2 ZMYND11

ATF3 SH3BGRL CREB1 ABHD13 SRP19 SLAIN2 PRKRA

PAPOLA

ZXDC JOSD1

NR4A2 PDS5B

YWHAZ CUL3

ATP6AP2

NAMPT

PAFAH1B1 AZIN1 UBE2D3

NAA15

ADAM10 SCOC ANKRD28

COMMD10 DDIT3 PPM1A

GGNBP2 ITPKB

DLG1 FBXO33

PCNP PPTC7

MAFF B3GNT2 SELENOT HIPK3 UBE2K RAB18

ZBTB44

MSI2

PLXDC1 DENND1B SRPK2 XPO1 ZNF202 CMTR2 STK4 TRPM7 TRA2A

EML4 CUL5

RAB21 BECN1 REST

PDE7A CGGBP1 CAB39

MAP3K8 KIF5B PTPN12

TFCP2 ZNF148 ARID2 FOSL2

MFN1

GXYLT1 SMNDC1 REEP5 PFKFB3 ACTR10

ARFIP1 CREM

MXD1 HACD3

DUSP2 HNRNPF

CTBS

PPHLN1 AP5M1 CNOT8 DAZAP2 RHOA KLHL15 BRCC3 GDAP2

CSNK1A1

STAG2

RNF7 RANBP9

STAM PAPD4 GPBP1L1

ACSL3

PNN

NAA30

PCDHGA10 PLK3 BHLHE40 SYNCRIP

SNX1

TAB2

ITPRIP

SPOPL

USP38 OAZ1 DPP8

RLIM PSME4

PPP1CA

TMEM161B ANKRD13A

FMR1 HNRNPC CBLL1

MAPK8 RB1 PPP2R5E

HAUS6

ATF1 RSBN1L SENP6

SET

ATF7 TRUB1 TMEM106B ARPC5 RFX7

CUL4B GIGYF2

BTG1

VIRMA

CRK

FGFR1OP2

SUMO2 NSF

ACTG1

TRA2B NCK1 TMEM260 DSTN

RAP1A

PCNX1

CDKN1A

SKP1 ZZZ3

HNRNPA0 VEGFA

SRSF3 NAB1 MARCH5

RALGAPA2

PCDHGC3

VAPA CCDC91

SHPRH MED13L ABI2

STAU1

AP4E1 KMT5B FAM76B

SUB1

TIA1 BNIP2

PAIP2

ZNF706 GSK3B PCDHGA3 PPP6R3

NT5C2 FXR1 RAP1B PPP1R15B

ZNF430

BRD4

TIMM17A

ZNF117

SOS1 NF1 TFDP2

CCDC126 IRF2

BIRC2 PGK1 PRKCI

ZNF107 ETF1

SERF2 CLASP2

RABEP1 MYO9A

RNF138 VAPB PNRC1

ZKSCAN8 SEC11A TIPRL ZNF138

QTRT2 RALYL ASXL1 DCP2

PPIP5K2

EIF2S1 TMCC1

VDAC1 CELF3 NUTF2

YBX1

PIGA

ADO

KATNBL1

ZNF468

PPP2CA

HSPD1 PCDHGA1 CAPZA1 MYNN

TFAM

SMIM7

ZKSCAN1

ZNF254 VGLL4

MAP2K1

GPI

UBE3A

RIT1 COPS8 PRKAA1 TBC1D23 KCMF1 ZNF765 LDHA

ZNF518A HSP90B1 CACYBP SCAI MAP2K4 SERINC3

ZMYM4

DIP2B

CCNC

RALGPS2 YWHAQ NFYB

UBE2D2

KRIT1 MITD1

CUL4A

TMEM183A HSPA9 ZNF267 SEC63

SLC35B3 WDR7 PPIF PI4K2B OXR1

RP2 LAMTOR3

ATP11C

COPS2 M6PR

CAAP1

MFSD6

NRBF2

VPS4B

THAP1

XRCC5

TIAL1

UBE2G1 PDCD6IP

DPM1 TCAIM

FGFR2

CLIP3 PURA

BTBD1 UTP15 DOCK7 VPS13A PCMTD1

GRSF1

CNBP

PHF21A DPY19L4

MAPK6 PGRMC2 TMEM151B PFDN4

RNF141 SLC35A5

VPS36 PHF6 RAP2C

CLSPN DTX3L

ZDHHC21 MAP1A

TRAPPC8

STX17 EIF3J HDAC2

SMARCAD1

CSDE1

SMAD2

DCAF10

RNF146 KRCC1

MCM10 SCO1

CHEK1

ABCE1

AGGF1

CREBL2

C18orf25

FEN1

EIF2S2 ANKRD46 RHEB

MBLAC2

NIP7

CDC6 PPP4R1 GINS3 FAM220A

ARMC1

TMEM151A

PARG ARAP2

KIAA1468 DUSP11 NDC1

C6orf89

SEZ6L2

FKTN

RFX3

LRRC57 PIAS2

SEH1L TAP1

MOB1B METAP1 CACUL1

SUN1

IDH3A

POU2F2

ZNF189

DHX33 UBE2W TNKS FAM149B1 PCMT1

LTBR

ZNF22 NACC1

CD3EAP

PEX3

STX7 TMEM248 GOLPH3 AKAP11AP3M1 GSK3A

WDR43

TRIM8 WDR3

PAK1IP1

BBOF1

CSTF3 ZNF507 ACSL4

NUDCD1

BBS10 VWA8

RANBP1 NHLRC3

CPEB3

FAM84A

STK3

CNOT7

ORC3 CLCN3

MED28

ELAVL3 ANAPC10

LTBP3

GSE1

KCNC2

FLT4 RALA

ARSK

MAK16

ERCC4

PTPRA

SLC6A2 JAG1

ADARB1 ASB1 ENY2

DLG4 DPY19L3 WBP1L

LRCH1

TFRC

SEMA6D DCBLD2

MEX3D

ADAMTS10

LANCL2 TPD52L3 SLC39A8

RASL10B ZNF302 VPS37A

ADPRHL2 MLLT1

PDE4A

FAM91A1

ZNF322 PCYT2

SNRPF

LSM6

MEX3A RTN4IP1

CDH15 MGAT5B SLC35E2B

ACAA2

ORC4 PSMB2 ERI1 BTF3L4

CDK4 ZFP62

ZNF439

UBE2J2

PTMS

ORAI2 ARHGAP23 KCTD9

RPRD1B EFR3A

SPTBN4 SIRT5 ZNF763

COQ3

VSX1

PNKD CNPY2

COX14

PSMC4

PDSS2

MRPL51

MTCH2

PRMT1 COQ5 EIF4E2

COA3

DYNLL1

PHF20L1

Fig. 5 Gene co-expression network analysis. Node size corresponds to the number of connected edges. The gene name is displayed for nodes with >25 edges. Node color corresponds to which diseases the gene is associated: AD (orange), VaD (blue), DLB (green), AD and VaD (pink), AD and DLB (yellow), VaD and DLB (purple), and all three (red). AD Alzheimer’s disease, VaD vascular dementia, DLB dementia with Lewy bodies

Table 3 Validation using the prospective cohort Clinical samples. All 1601 serum subjects and the associated clinical data were distributed from the NCGG Biobank, which collects human biomaterials and data for geriatrics research. Of them, 1021 subjects were patients with AD: Prospective cohort 91 patients with VaD, 169 patients with DLB, 32 patients with MCI and 288 subjects were normal controls with normal cognitive function (NC). NCs Conversion MCI to AD MCI to MCI Total who had subjective cognitive complaints, but normal cognition on the neu- Prediction MCI to AD 10 18 28 ropsychological assessment, were categorized as normal controls. The AD and MCI to MCI 0 4 4 MCI subjects were diagnosed with a probable or possible AD based on the criteria ’ 55,56 Total 10 22 32 of the National Institute on Aging Alzheimer s Association workgroups . We used the probable ADs as AD subjects in this study. The VaD and DLB MCI mild cognitive impairment, AD Alzheimer’s disease subjects were diagnosed based on the criteria of report of the NINDS-AIREN International Workshop57 and fourth report of the DLB Consortium58, respec- tively. The diagnosis of all subjects was conducted based on medical history, physical examination and diagnostic tests, neurological examination, neu- and gene expressions. The development of next-generation ropsychological tests and brain imaging with magnetic resonance imaging or sequencing technology has facilitated comprehensive analysis of computerized tomography by experts including neurologists, psychiatrists, geria- these genetic and expression data. There is no doubt that these tricians or a neurosurgeon, all experts in dementia who are familiar with its additional data would contribute to further improvement of our diagnostic criteria. Comprehensive neuropsychological tests included Mini-Mental State Examination (MMSE), Alzheimer’s Disease Assessment Scale Cognitive risk prediction models. Component Japanese version, Logical Memory I and II from the Wechsler Memory Scale–Revised, frontal assessment battery, Raven’s colored progressive matrices and Geriatric Depression Scale59. If necessary, dopamine transporter imaging and Materials and methods metaiodobenzylguanidine myocardial scintigraphy were performed for the diag- Ethics statements. This study was approved by the ethics committee of the nosis of DLB. Pathological tests and biomarkers in cerebrospinal fluid tests were National Center for Geriatrics and Gerontology (NCGG). The design and per- not used for the diagnosis of dementia. For all of the subjects, the status of the ε formance of current study involving human subjects were clearly described in a APOE 4 allele genotype (the major genetic risk factor with AD) and the MMSE research protocol. All participants were voluntary and completed informed consent score were obtained. All subjects were >60 years in age. All NC subjects had a in writing before registering to NCGG Biobank. MMSE score of >23.

6 COMMUNICATIONS BIOLOGY | (2019) 2:77 | https://doi.org/10.1038/s42003-019-0324-7 | www.nature.com/commsbio COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0324-7 ARTICLE miRNA expression. Serum samples were isolated from whole blood following the Code availability. We used open source program languages R (version 3.4.1), Ruby standard operating procedure of NCGG Biobank. In brief, blood samples tubes (version 2.4.0) and Python (version 3.5.1) to analyze data and create plots. Code is were gently inverted a few times, put in an upright-position for at least 30 min to available upon request from the corresponding authors. clot, and then centrifuged for 15 min at 3500 rpm at 4 °C. After centrifugation, serum was transferred to storage tubes containing 500 μl per tube and immediately Reporting summary. Further information on experimental design is available in stored in −80 °C freezers. Total RNA was extracted from a 300 μl serum sample the Nature Research Reporting Summary linked to this article. using a 3D‐Gene RNA extraction reagent from a liquid sample kit (Toray Indus- tries, Inc.), as previously described34. Comprehensive miRNA expression analysis was performed using a 3D‐Gene miRNA Labeling kit and a 3D‐Gene Human Data availability miRNA Oligo Chip (Toray Industries, Inc.), which was designed to detect 2562 All microarray data (2562 miRNAs) of this study are publicly available through the Gene miRNA sequences registered in miRBase release 21 (http://www.mirbase.org/). Expression Omnibus (GEO) database at the National Center for Biotechnology Infor- The normalization of miRNA expression was performed by the following steps. mation (NCBI) and accessible through GEO series accession number GSE120584. Mean and standard deviation (SD) were calculated using a set of pre-selected Datasets generated during the current study are available from the corresponding author negative control signals (background signals), the top and bottom 5% of which on reasonable request. were removed. Signal values greater than mean + 2 SD of the background signals were replaced using log2(signal–mean) and labeled effective signals. The remaining signal values were replaced by the minimum of the effective signals–0.1. Received: 21 May 2018 Accepted: 24 January 2019 Undetected signal values were replaced by the average signal of each miRNA signal. To normalize the signals across different microarrays, a set of pre-selected internal control miRNAs (miR-149-3p, miR-2861 and miR-4463), which had been stably detected in more than 500 serum samples, was used. Each miRNA signal value was standardized with the ratio of the average signal of the three internal control References miRNA signals34. 1. Robinson, L., Tang, E. & Taylor, J. P. Dementia: timely diagnosis and early intervention. BMJ 350, h3029 (2015). Risk prediction model construction. We calculated the z-value corresponding to 2. Haan, M. N. & Wallace, R. Can dementia be prevented? Brain aging in a the miRNA in the logistic regression model in each disease (AD, VaD and DLB, population-based context. Annu. Rev. Public Health 25,1–24 (2004). Fig. 1) in the following way: 3. Kim, D. H. et al. Genetic markers for diagnosis and pathogenesis of Alzheimer’s disease. Gene 545, 185–193 (2014). ðÞ¼α þ α ´ þ α ´ þ α ´ þ α ´ ; logit Pi 0 1 Sexi 2 Agei 3 APOEi 4 miRNAi 4. Spires-Jones, T. L. & Hyman, B. T. The intersection of amyloid beta and tau at synapses in Alzheimer’s disease. Neuron 82, 756–771 (2014). The z-value was the regression coefficient divided by its standard error. The cutoff 5. Sheinerman, K. S. et al. Plasma microRNA biomarkers for detection of mild value, T, of the z-value, and n, the number of miRNAs (n = 1, …, 2562), was pre- cognitive impairment. Aging 4, 590–605 (2012). selected (Fig. 1). Next, the PCA was performed using the pre-selected miRNAs. The 6. Fagan, A. M. et al. Comparison of analytical platforms for cerebrospinal fluid risk prediction models were constructed based on the combination of the miRNAs measures of beta-amyloid 1–42, total tau, and p-tau181 for identifying Alzheimer and PC scores as defined by Fig. 1: disease amyloid plaque . Arch. Neurol. 68,1137–1144 (2011). 7. De Meyer, G. et al. Diagnosis-independent Alzheimer disease biomarker ðÞ¼β ´ þ ¼ þ β ´ ; logit Pi 1 PC1 m PCm signature in cognitively normal elderly people. Arch. Neurol. 67, 949–956 (2010). where PC = l × x + … + l × x , and x is the normalized expression value of 8. Mistur, R. et al. Current challenges for the early detection of Alzheimer’s i 1 1 n n j – miRNAj. These calculations were iteratively performed for all combinations of disease: brain imaging and CSF studies. J. Clin. Neurol. 5, 153 166 (2009). cutoff values (T = 0.1, 0.2, …, 5.0) and the top PC scores (m = 1, …,10) (Fig. 1). 9. Miller, G. Alzheimer’s biomarker initiative hits its stride. Science 326, 386–389 This optimal parameter set (T, m) was determined in the discovery cohort using (2009). 10-fold cross-validation. The regression method used in this study was conducted 10. Schmand, B., Eikelenboom, P., van Gool, W. A. & Alzheimer’s Disease using the glmnet package in the statistical software R60. Neuroimaging Initiative. Value of neuropsychological tests, neuroimaging, and biomarkers for diagnosing Alzheimer’s disease in younger and older age – Evaluation of risk prediction models. All data were strictly separated into the cohorts. J. Am. Geriatr. Soc. 59, 1705 1710 (2011). 11. Zheng, K., Li, H., Huang, H. & Qiu, M. MicroRNAs and glial cell discovery cohort and validation cohort. An optimal parameter set (T, m) was – detected using 10-fold cross-validation in the discovery cohort with respect to each development. Neuroscientist 18, 114 118 (2012). disease (AD, VaD and DLB). Final models were constructed with the optimal 12. Satoh, J. MicroRNAs and their therapeutic potential for human diseases: ’ parameter sets using the complete discovery cohort. The adjusted models were aberrant microRNA expression in Alzheimer s disease brains. J. Pharmacol. – evaluated on an independent validation cohort. The receiver operator characteristic Sci. 114, 269 275 (2010). fi ’ (ROC) curves61 on the validation cohort and the AUC were used as the dis- 13. Cogswell, J. P. et al. Identi cation of miRNA changes in Alzheimer s disease criminative accuracy of the risk prediction models. In order to further apply these brain and CSF yields putative biomarkers and insights into disease pathways. final risk prediction models to prospective cohort data, we calculated prognostic J. Alzheimers Dis. 14,27–41 (2008). index in each sample as defined by: 14. Tacutu, R., Budovsky, A., Yanai, H. & Fraifeld, V. E. Molecular links between X cellular senescence, longevity and age-related diseases - a systems biology ¼ β ´ ; prognostic index i PCi perspective. Aging 3, 1178–1191 (2011). i 15. Femminella, G. D., Ferrara, N. & Rengo, G. The emerging role of microRNAs in Alzheimer’s disease. Front. Physiol. 6, 40 (2015). where β is the estimated regression coefficient of each PC score using a supervised 16. Bair, E. & Tibshirani, R. Semi-supervised methods to predict patient survival PCA logistic regression method in the discovery cohort. These optimal prognostic from gene expression data. PLoS Biol. 2, E108 (2004). fi indices were determined using a maximum average sensitivity and speci city of the 17. Kooperberg, C., LeBlanc, M. & Obenchain, V. Risk prediction using genome- ROC curve in the discovery cohort. wide association studies. Genet. Epidemiol. 34, 643–652 (2010). 18. Shigemizu, D. et al. The construction of risk prediction models using GWAS Target gene annotation of miRNAs. The functional gene annotation of miRNAs data and its application to a type 2 diabetes prospective cohort. PLoS One 9, was conducted using miRDB, which includes predicted gene targets regulated by a e92549 (2014). comprehensive 6709 miRNAs24. All the gene targets have a prediction score in the 19. Liang, Y. et al. Cancer survival analysis using semi-supervised learning range between 0 and 100 assigned by MirTarget V3, with a higher score repre- method based on Cox and AFT models with L1/2 regularization. BMC Med. senting more statistical confidence in the prediction result. Only gene targets with Genom. 9, 11 (2016). the score of >90 were used as functional gene annotation for our analysis. 20. Wu, H. Z. et al. Circulating microRNAs as biomarkers of Alzheimer’s disease: a systematic review. J. Alzheimers Dis. 49, 755–766 (2016). Gene co-expression network analysis. COXPRESdb25 provides gene co- 21. Heneghan, H. M., Miller, N., Kelly, R., Newell, J. & Kerin, M. J. Systemic expression relationships for 11 animal species (human, mouse, rat, monkey, dog, miRNA-195 differentiates breast cancer from other malignancies and is a chicken, zebrafish, fly, nematode, budding yeast and fission yeast). For all gene potential biomarker for detecting noninvasive and early stage disease. – pairs, Pearson’s correlation coefficients were calculated, and these values were Oncologist 15, 673 682 (2010). transferred to the Mutual Rank (MR) value62, which is the geometric average of 22. Asaga, S. et al. Direct serum assay for microRNA-21 concentrations in early asymmetric ranks in co-expressed gene lists. In this study, gene pairs with a MR < and advanced breast cancer. Clin. Chem. 57,84–91 (2011). 20 and Pearson’s correlation coefficients > 0.4 in human were used as co-expression 23. Roth, C. et al. Circulating microRNAs as blood-based markers for patients genes. The gene co-expression network was generated using Cytoscape v3.5.126. with primary and metastatic breast cancer. Breast Cancer Res. 12, R90 (2010).

COMMUNICATIONS BIOLOGY | (2019) 2:77 | https://doi.org/10.1038/s42003-019-0324-7 | www.nature.com/commsbio 7 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0324-7

24. Wong, N. & Wang, X. miRDB: an online resource for microRNA target 54. Kapasi, A., DeCarli, C. & Schneider, J. A. Impact of multiple pathologies on prediction and functional annotations. Nucleic Acids Res. 43, D146–D152 the threshold for clinically overt dementia. Acta Neuropathol. 134, 171–186 (2015). (2017). 25. Okamura, Y. et al. COXPRESdb in 2015: coexpression database for animal 55. McKhann, G. M. et al. The diagnosis of dementia due to Alzheimer’s disease: species by DNA-microarray and RNAseq-based expression data with multiple recommendations from the National Institute on Aging-Alzheimer’s quality assessment systems. Nucleic Acids Res. 43, D82–D86 (2015). Association workgroups on diagnostic guidelines for Alzheimer’s disease. 26. Shannon, P. et al. Cytoscape: a software environment for integrated models of Alzheimers Dement. 7, 263–269 (2011). biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003). 56. Albert, M. S. et al. The diagnosis of mild cognitive impairment due to 27. Consortium, G. T. The Genotype-Tissue Expression (GTEx) project. Nat. Alzheimer’s disease: recommendations from the National Institute on Aging- Genet. 45, 580–585 (2013). Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s 28. Consortium, G. T. Human genomics. The Genotype-Tissue Expression disease. Alzheimers Dement. 7, 270–279 (2011). (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 57. Roman, G. C. et al. Vascular dementia: diagnostic criteria for research studies. 648–660 (2015). Report of the NINDS-AIREN International Workshop. Neurology 43, 29. Fang, C. et al. Serum microRNAs are promising novel biomarkers for diffuse 250–260 (1993). large B cell lymphoma. Ann. Hematol. 91, 553–559 (2012). 58. McKeith, I. G. et al. Diagnosis and management of dementia with Lewy 30. Mitchell, P. S. et al. Circulating microRNAs as stable blood-based markers bodies: fourth consensus report of the DLB Consortium. Neurology 89,88–100 for cancer detection. Proc. Natl. Acad. Sci. USA 105, 10513–10518 (2008). (2017). 31. Mizuno, H. et al. Identification of muscle-specific microRNAs in serum of 59. Kawai, Y. et al. Neuropsychological differentiation between Alzheimer’s muscular dystrophy animal models: promising novel blood-based markers disease and dementia with Lewy bodies in a memory clinic. Psychogeriatrics for muscular dystrophy. PLoS ONE 6, e18388 (2011). 13, 157–163 (2013). 32. Maes, O. C., Chertkow, H. M., Wang, E. & Schipper, H. M. MicroRNA: 60. R Development Core Team. R: A Language and Environment for Statistical implications for Alzheimer disease and other human CNS disorders. Curr. Computing (R Foundation for Statistical Computing, Vienna, 2009). Genom. 10, 154–168 (2009). 61. Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing 33. Zhu, W., Qin, W., Atasoy, U. & Sauter, E. R. Circulating microRNAs in breast classifier performance in R. Bioinformatics 21, 3940–3941 (2005). cancer and healthy subjects. BMC Res. Notes 2, 89 (2009). 62. Obayashi, T. & Kinoshita, K. Rank of correlation coefficient as a comparable 34. Shimomura, A. et al. Novel combination of serum microRNA for detecting measure for biological significance of gene coexpression. DNA Res. 16, breast cancer in the early stage. Cancer Sci. 107, 326–334 (2016). 249–260 (2009). 35. Geekiyanage, H., Jicha, G. A., Nelson, P. T. & Chan, C. Blood serum miRNA: non-invasive biomarkers for Alzheimer’s disease. Exp. Neurol. 235, 491–496 (2012). Acknowledgements 36. Bekris, L. M. et al. MicroRNA in Alzheimer’s disease: an exploratory study We thank NCGG Biobank for providing the study materials, clinical information and in brain, cerebrospinal fluid and plasma. Biomarkers 18, 455–466 (2013). technical support. This study was supported by the “Development of Diagnostic Tech- 37. Kumar, P. et al. Circulating miRNA biomarkers for Alzheimer’s disease. nology for Detection of miRNA in Body Fluids” grant from the Japan Agency for PLoS One 8, e69807 (2013). Medical Research and Development and New Energy and Industrial Technology 38. Kiko, T. et al. MicroRNAs in plasma and cerebrospinal fluid as potential Development Organization (to S.N., Grant Number JP17ae0101013). markers for Alzheimer’s disease. J. Alzheimers Dis. 39, 253–259 (2014). 39. Sorensen, S. S., Nygaard, A. B. & Christensen, T. miRNA expression profiles in Author contributions cerebrospinal fluid and blood of patients with Alzheimer’s disease and other D.S. and S.A. developed the method and performed the analyses; K.M., M.I., H.S. and types of dementia - an exploratory study. Transl. Neurodegener. 5, 6 (2016). S.T. performed the experiments of miRNA expression; Y.A., K.A.B., A.S. and T.T. pro- 40. Shigemizu, D. et al. The prediction models for postoperative overall survival vided the technical assistance; T.S. contributed to data acquisition and the analyses; D.S., and disease-free survival in patients with breast cancer. Cancer Med. 6, T. O., K.O. and S.N. wrote the manuscript and organized this work. All authors con- 1627–1638 (2017). tributed to and approved the final manuscript. 41. Jun, G. et al. Genome-wide scan suggested novel Alzheimer’s disease susceptibility genes by factoring influence of APOE. J. Alzheimer’s Assoc. 7, S187 (2011). Additional information 42. Bol, G. M. et al. Expression of the RNA helicase DDX3 and the hypoxia Supplementary information accompanies this paper at https://doi.org/10.1038/s42003- response in breast cancer. PLoS One 8, e63548 (2013). 019-0324-7. 43. Wu, D. W. et al. DDX3 loss by p53 inactivation promotes tumor malignancy via the /Slug/E-cadherin pathway and poor patient outcome in non- Competing interests: The authors declare no competing interests. small-cell lung cancer. Oncogene 33, 1515–1526 (2014). 44. Sun, M., Song, L., Zhou, T., Gillespie, G. Y. & Jope, R. S. The role of DDX3 in Reprints and permission information is available online at http://npg.nature.com/ regulating Snail. Biochim. Biophys. Acta 1813, 438–447 (2011). reprintsandpermissions/ 45. Hueng, D. Y. et al. DDX3X biomarker correlates with poor survival in human gliomas. Int. J. Mol. Sci. 16, 15578–15591 (2015). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in 46. Shi, H. et al. YTHDF3 facilitates translation and decay of N(6)- published maps and institutional affiliations. methyladenosine-modified RNA. Cell Res. 27, 315–328 (2017). 47. Cui, Q. et al. m(6)A RNA methylation regulates the self-renewal and tumorigenesis of glioblastoma stem cells. Cell Rep. 18, 2622–2634 (2017). Open Access This article is licensed under a Creative Commons 48. Sanchez-Valle, J. et al. A molecular hypothesis to explain direct and inverse Attribution 4.0 International License, which permits use, sharing, co-morbidities between Alzheimer’s disease, glioblastoma and lung cancer. Sci. adaptation, distribution and reproduction in any medium or format, as long as you give Rep. 7, 4474 (2017). appropriate credit to the original author(s) and the source, provide a link to the Creative 49. Lehrer, S. Glioblastoma and dementia may share a common cause. Med. Commons license, and indicate if changes were made. The images or other third party Hypotheses 75,67–68 (2010). material in this article are included in the article’s Creative Commons license, unless 50. Driver, J. A. et al. Inverse association between cancer and Alzheimer’s disease: indicated otherwise in a credit line to the material. If material is not included in the results from the Framingham Heart Study. BMJ 344, e1442 (2012). article’s Creative Commons license and your intended use is not permitted by statutory 51. Musicco, M. et al. Inverse occurrence of cancer and Alzheimer disease: a regulation or exceeds the permitted use, you will need to obtain permission directly from – population-based incidence study. Neurology 81, 322 328 (2013). the copyright holder. To view a copy of this license, visit http://creativecommons.org/ ’ 52. Kosunen, O. et al. Diagnostic accuracy of Alzheimer s disease: a licenses/by/4.0/. neuropathological study. Acta Neuropathol. 91, 185–193 (1996). 53. Dubois, B. et al. Research criteria for the diagnosis of Alzheimer’s disease: revising the NINCDS-ADRDA criteria. Lancet Neurol. 6, 734–746 (2007). © The Author(s) 2019

8 COMMUNICATIONS BIOLOGY | (2019) 2:77 | https://doi.org/10.1038/s42003-019-0324-7 | www.nature.com/commsbio