Large-scale informatic analysis to algorithmically identify blood biomarkers of neurological damage

Grant C. O’Connell (a,1), Megan L. Alder (a), Christine G. Smothers (a), and Julia H. C. Chang (a)

(a) School of Nursing, Case Western Reserve University, Cleveland, OH 44106

Edited by Vincent T. Marchesi, Yale University School of Medicine, New Haven, CT, and approved July 9, 2020 (received for review April 23, 2020)

The identification of precision blood biomarkers which can accurately indicate damage to brain tissue could yield molecular diagnostics with the potential to improve how we detect and treat neurological pathologies. However, a majority of candidate blood biomarkers for neurological damage that are studied today are proteins which were arbitrarily proposed several decades before the advent of high-throughput omic techniques, and it is unclear whether they represent the best possible targets relative to the remainder of the proteome. Here, we leveraged mRNA expression data generated from nearly 12,000 human specimens to algorithmically evaluate over 17,000 protein-coding genes in terms of their potential to produce blood biomarkers for neurological damage based on their expression profiles both across the body and within the brain. The circulating levels of proteins associated with the top-ranked genes were then measured in blood sampled from a diverse cohort of patients diagnosed with a variety of acute and chronic neurological disorders, including ischemic stroke, hemorrhagic stroke, traumatic brain injury, Alzheimer’s disease, and multiple sclerosis, and evaluated for their diagnostic performance. Our analysis identifies several previously unexplored candidate blood biomarkers of neurological damage with possible clinical utility; for many of them, presence in blood is likely linked to specific cell-level pathologic processes. Furthermore, our findings also suggest that many frequently cited previously proposed blood biomarkers exhibit expression profiles which could limit their diagnostic efficacy.

molecular diagnostics | stroke | multiple sclerosis | traumatic brain injury | Alzheimer’s disease

Significance

The discovery and development of precision blood biomarkers which can accurately detect damage to brain tissue could transform how we diagnose and treat neurological pathologies. In this study, we used mRNA expression data generated from thousands of tissue samples to algorithmically evaluate nearly every protein-coding gene in the human genome in terms of potential to produce blood biomarkers for neurological damage based on expression profiles both across the body and within the brain. This unprecedented analysis identifies a plethora of previously unexplored candidate blood biomarkers which could have clinical utility for noninvasive diagnosis and monitoring of various common neurological conditions, including traumatic brain injury, stroke, and multiple sclerosis.

Author contributions: G.C.O. designed research; G.C.O. secured funding; G.C.O., M.L.A., and C.G.S. performed research; G.C.O. analyzed data; and G.C.O., M.L.A., C.G.S., and J.H.C.C. wrote the paper.
The authors declare no competing interest.
This article is a PNAS Direct Submission.
Published under the PNAS license.
(1) To whom correspondence may be addressed. Email: [email protected].
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2007719117/-/DCSupplemental.
First published August 6, 2020.
20764–20775 | PNAS | August 25, 2020 | vol. 117 | no. 34 | www.pnas.org/cgi/doi/10.1073/pnas.2007719117
MEDICAL SCIENCES

Collectively, neurological disorders are the leading cause of disability and second leading cause of death worldwide (1). The identification and development of precision blood biomarkers of neurological damage could dramatically improve how we diagnose and treat these debilitating conditions, and ultimately reduce their burden. For example, it is well established that rapid and accurate diagnosis of acute neurological injuries such as stroke and traumatic brain injury during the early stages of care significantly reduces mortality and morbidity (2, 3). However, the symptom-based assessments that are currently used for recognition of such injuries during triage have limited accuracy, and up to 35% of patients are misdiagnosed at initial clinician contact (4–8). In these acute conditions, the development of biomarker-based screening tools with the ability to accurately detect neurological damage could substantially reduce rates of mistriage, enable earlier access to intervention, and improve patient outcomes (9). With respect to chronic neurodegenerative diseases such as Alzheimer’s disease and multiple sclerosis, developing accurate blood biomarkers of neurological damage could allow for more confident early diagnosis, noninvasive tracking of disease progression, and real-time monitoring of response to therapy (10, 11).

Due to its specialized function, the proteomic composition of the brain is unique relative to other organs. Cellular disruption of neural tissue results in the release of brain-specific proteins into the extracellular environment, and ultimately into peripheral circulation. Thus, the detection of these proteins in the blood can serve as a surrogate marker of neurological damage. From a logical perspective, candidate proteins most ideally suited to serve as such biomarkers are those which display three predominant properties. First, they should exhibit highly enriched expression in brain tissue relative to other tissues, ensuring specificity. Second, they should be highly abundant within brain tissue, as lowly expressed proteins may not be released into circulation at high enough levels to enable detection. Third, they should exhibit ubiquitous expression across all brain regions, reducing the risk of false negative diagnosis in the case of focal damage.

A large number of existing candidate blood biomarkers of neurological damage studied today are proteins which were arbitrarily labeled as being brain specific decades ago (12–19); however, in many cases, their degree of enrichment in brain tissue has been poorly validated, especially in humans. Furthermore, a majority of these proteins were proposed as biomarkers without consideration for brain abundance or expressional variability across brain regions. Additionally, because many of them were suggested before the widespread availability and use of high-throughput omic techniques, they have predominantly been studied in low-throughput investigations only considering a handful of targets. Due to the unsystematic manner in which these existing candidates have been proposed and investigated, it is currently unclear whether they represent the best possible biomarkers relative to the remainder of the human proteome.

Thus, our goal was to systematically search the protein-coding genome to identify genes with the highest potential to produce blood biomarkers of neurological damage. To do this, we leveraged mRNA expression data generated from nearly 12,000 human specimens to algorithmically evaluate over 17,000 protein-coding genes in terms of a novel biomarker suitability score accounting for brain enrichment, brain abundance, and brain regional variability. Then, to determine whether the top-ranked genes identified in our algorithmic analysis could code for proteins with the potential to provide more detailed diagnostic information regarding the specific cellular nature of pathology, we leveraged single-cell sequencing data generated from human brain tissue to determine which cell populations the top-ranked genes are expressed within. Finally, in order to directly evaluate their diagnostic potential, the circulating levels of proteins associated with the top-ranked genes were measured in blood sampled from a diverse cohort of patients diagnosed with a variety of acute and chronic neurological disorders, including ischemic stroke, hemorrhagic stroke, traumatic brain injury, Alzheimer’s disease, and multiple sclerosis.

Our collective analysis identifies several previously unexplored candidate blood biomarkers of neurological damage with potential clinical utility; for many of them, presence in blood is likely linked to specific cell-level pathologic processes. Furthermore, our findings also suggest that several of the most frequently cited previously proposed blood biomarkers exhibit expression profiles which could limit their diagnostic performance in many clinical-use scenarios.

Results

Algorithmic Ranking of Biomarker Suitability. In order to discover genes with optimal expression profiles to produce blood biomarkers of neurological damage, we systematically searched the protein-coding genome to identify genes which exhibit high levels of expressional enrichment in brain tissue relative to nonneural tissues, abundant expression within the brain, and low variability in expression levels across brain regions. To do this, genomewide mRNA expression data were obtained via two publicly available datasets. The first dataset originated from the Genotype-Tissue Expression (GTEx) project (20), and was generated via RNA sequencing of 7,906 postmortem normal human specimens harvested from 28 different nonneural tissues, as well as eight anatomically distinct regions of brain (SI Appendix, Tables S1 and S2). The second dataset originated from the Allen Brain Atlas (ABA) project (21), and was generated via microarray analysis of 3,702 normal postmortem human brain specimens harvested from 232 anatomically distinct brain regions (SI Appendix, Tables S3 and S4). Data were filtered to retain 17,650 genes which were detected in both datasets and informatically annotated as coding for protein products based on the presence of an open reading frame. To determine how specific the expression of each protein-coding gene is to brain tissue, we calculated the level of fold enrichment in brain samples relative to samples from nonneural tissues within the GTEx dataset. In order to determine how abundantly expressed each gene is within the brain, we further calculated the average expression levels across brain samples contained in the GTEx dataset. In order to determine how ubiquitously expressed each gene is across the brain, we used the Gini coefficient, a statistical measure of inequality (22), to assess variability in expression levels across samples from different anatomical brain regions within the ABA dataset. Genes were then filtered to only retain those whose expression levels were enriched at least 100-fold within the brain, and the remaining genes were subsequently ranked for suitability to produce blood biomarkers of neurological damage based on a biomarker suitability score calculated by taking the mean of unity-normalized brain fold enrichment, brain abundance, and inverse brain regional variability values (Fig. 1A).

Clustering of samples in both datasets based on the expression levels of all 17,650 protein-coding genes using t-distributed stochastic neighborhood embedding (t-SNE) produced distinct clusters consistent with tissue type and brain region, indicating that the processed data were of high fidelity and properly annotated (SI Appendix, Fig. S1). In terms of algorithmic ranking, only 100 genes remained after filtering based on brain fold enrichment cutoff. The relationships between biomarker suitability score, brain fold enrichment, brain abundance, and brain regional variability for these remaining genes are indicated in Fig. 1B. The highest ranked genes according to biomarker suitability score generally exhibited a combination of high brain enrichment, high brain abundance, and low regional variability, while lower ranked genes tended to exhibit lower levels of brain enrichment, lower brain abundance, and higher regional variability.

The genes associated with two well-studied candidate biomarkers of neurological damage, glial fibrillary acidic protein (GFAP) (23), and myelin basic protein (MBP) (24–26), ranked near the top of the analysis at first and seventh, respectively. The gene coding for neurofilament light chain (NfL), another previously proposed and increasingly studied neurological damage biomarker (27), ranked 68th in the analysis. However, the remaining 97 of the 100 top-ranked genes all coded for proteins which have been largely unexplored as blood biomarkers to date, suggesting that there is a plethora of proteins with diagnostic potential which have yet to be investigated. For example, the remaining 6 of the top 8 ranked genes, which code for oligodendrocytic myelin paranodal and inner loop protein (OPALIN), metallothionein-3 (MT-3), synaptosomal-associated protein 25 (SNAP-25), beta-synuclein (β-synuclein), kinesin heavy chain isoform 5A (KIF5A), and myelin-associated oligodendrocyte basic protein (MOBP), all displayed optimal expression profiles, but their products have yet to be widely evaluated for use in blood-based diagnostics. Surprisingly, genes associated with some of the most high-profile and frequently cited previously proposed candidate biomarkers of neurological damage (10, 28–30), such as S100 calcium binding protein B (S100B), neuron-specific enolase (NSE), ubiquitin carboxyl-terminal hydrolase isozyme L1 (UCH-L1), alpha-II spectrin (spectrin-αII), Tau, neurofilament heavy chain (NfH), prion protein (PrP), and amyloid beta (Aβ), all failed to meet the fold enrichment cutoff.

Enrichment of the Top-Ranked Genes in Brain Tissue. Fig. 2 depicts the transcriptional expression levels of the top 50 algorithmically ranked genes, along with those of genes associated with several notable previously proposed candidate biomarkers, in each of the 29 tissue types interrogated in the GTEx dataset. Genes associated with three previously proposed biomarkers, GFAP, MBP, and NfL, demonstrated modest to high levels of brain enrichment, exhibiting 1,670-fold, 143-fold, and 119-fold higher expression levels, respectively, in brain specimens relative to specimens from nonneural tissues. However, genes associated with the remaining previously proposed biomarkers which we examined displayed relatively low levels of enrichment; transcripts associated with S100B, NSE, UCH-L1, spectrin-αII, Tau, NfH, PrP, and Aβ were only enriched 3- to 23-fold in brain specimens, suggesting they are not as brain specific as previously thought, and thus may be limited in their diagnostic specificity. Interestingly, some of the most brain-enriched genes were those coding for proteins which have yet to be widely investigated as blood biomarkers. For example, the genes which code for OPALIN, β-synuclein, and MOBP were transcriptionally enriched 2,430-fold, 475-fold, and 1,160-fold, respectively. This suggests that there are several proteins which have yet to be studied as blood biomarkers which may offer equivalent or higher levels of diagnostic specificity relative to those which are currently being investigated.

Abundance of the Top-Ranked Genes in Brain Tissue. Fig. 3 depicts the mean transcriptional expression levels of the top 50 algorithmically ranked genes, along with those of genes associated with several notable previously proposed candidate biomarkers of neurological damage, in brain specimens of the GTEx dataset. Genes associated with all of the previously proposed biomarkers which we examined exhibited expression levels which fell above

[Fig. 1A, workflow diagram: mRNA expression data from 17,650 protein-coding genes were drawn from two sources. Genotype-Tissue Expression RNA-seq data (GTEx dataset): 7,906 postmortem human specimens harvested from 29 discrete tissues including brain. Allen Brain Atlas microarray data (ABA dataset): 3,702 postmortem human specimens harvested from 232 anatomically distinct brain regions. Three metrics were computed per gene. Brain enrichment: fold enrichment in brain tissue vs. nonneural tissue (measured as fold difference). Brain abundance: average expression levels across brain specimens (measured as average TPM values). Brain regional variability: variability in expression levels across brain regions (measured with Gini coefficient). The initial gene set was filtered using a 100-fold brain enrichment cutoff, and the remaining genes were ranked using a biomarker suitability score equally considering brain fold enrichment, brain abundance, and the inverse of brain regional variability.]

[Fig. 1B, plot: brain enrichment (log2 fold enrichment), brain abundance (log2 TPM), brain regional variability (Gini coefficient), and biomarker suitability score, in arbitrary units from 0.0 to 1.0, for the genes ranked 1 to 100, each labeled by protein name and gene symbol.]

Fig. 1. Algorithmic search strategy and resultant top-ranked candidate genes according to biomarker suitability score. (A) Experimental workflow used to algorithmically assess the protein-coding genome for candidate biomarkers of neurological damage. (B) Relationship between biomarker suitability score, brain fold enrichment, brain abundance, and brain regional variability for the top-100 ranked genes. Brain fold enrichment, brain abundance, and brain regional variability values are presented scaled between 0 and 1 using unity normalization.
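The ranking procedure summarized in Fig. 1A can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`gini`, `unity_normalize`, `suitability_scores`) and all toy values are hypothetical, the metrics are normalized on their raw scale after filtering (the text does not say whether log-transformed values or the full gene set were used), and the "inverse" of regional variability is taken here as 1 minus the Gini coefficient.

```python
from statistics import mean


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula applied to the sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def unity_normalize(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def suitability_scores(metrics, cutoff=100.0):
    """Rank genes by a suitability score.

    metrics maps gene name -> (fold_enrichment, abundance, gini_value).
    Genes below the fold-enrichment cutoff are dropped first.
    Returns a list of (gene, score) pairs, best first.
    """
    kept = {g: m for g, m in metrics.items() if m[0] >= cutoff}
    names = list(kept)
    if not names:
        return []
    enrich = unity_normalize([kept[g][0] for g in names])
    abund = unity_normalize([kept[g][1] for g in names])
    # Assumption: invert variability as 1 - Gini, so even expression scores high.
    inv_var = unity_normalize([1.0 - kept[g][2] for g in names])
    scores = {
        g: mean([e, a, v]) for g, e, a, v in zip(names, enrich, abund, inv_var)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-region expression values for three made-up genes.
toy_regions = {
    "gene_A": [1.0, 1.0, 1.0, 5.0],
    "gene_B": [2.0, 2.0, 2.0, 2.0],
    "gene_C": [3.0, 3.0, 3.0, 3.0],
}
# Hypothetical (fold enrichment, abundance in TPM, Gini) per gene.
toy_metrics = {
    "gene_A": (1000.0, 50.0, gini(toy_regions["gene_A"])),
    "gene_B": (200.0, 100.0, gini(toy_regions["gene_B"])),
    "gene_C": (50.0, 500.0, gini(toy_regions["gene_C"])),  # fails the cutoff
}
ranked = suitability_scores(toy_metrics)
```

In this toy case `gene_C` is removed by the 100-fold cutoff despite its high abundance, and `gene_B` outranks `gene_A` because it is both more abundant and more evenly expressed, which mirrors the equal weighting of the three metrics described in the caption.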

the 90th percentile relative to all 17,650 protein-coding genes included in our analysis, suggesting high levels of abundance in brain tissue. Comparatively, many highly ranked genes coding for proteins that have yet to be widely considered as blood biomarkers exhibited similar or higher expression levels. For example, the genes which code for MT-3, SNAP-25, β-synuclein, and KIF5A all displayed expression levels which fell above the 99th percentile. From an analytical perspective, the fact that these genes exhibit such high transcriptional abundance in brain tissue is encouraging in terms of the possibility that their protein products could be detected in blood in the case of neurological damage.

Variability in Expression Levels of the Top-Ranked Genes across Brain Regions. Fig. 4 depicts the transcriptional expression levels of the top 50 algorithmically ranked genes, along with those of genes associated with other notable previously proposed candidate biomarkers, across specimens from each of the 232 brain regions examined in the ABA dataset, with the degree of regional variability indicated by Gini coefficient. Gini coefficient values ranged from 0.06 to 1.0 across all 17,650 protein-coding genes examined in the analysis, with values closer to 0 indicating nearly homogeneous expression across brain regions, and values approaching 1 indicating extreme inequality in expression across brain regions. Many genes associated with previously proposed biomarkers exhibited relatively low expressional variability, indicated by Gini coefficient values ranging from 0.11 to 0.36. However, a handful, including GFAP, MBP, NfL, and NfH, were more variable, with respective measured Gini values of 0.43, 0.53, 0.47, and 0.60. Comparatively, many top ranked genes associated with proteins that have yet to be widely investigated as blood biomarkers exhibited an equivalent or lower degree of expressional variability. For example, the genes which code for OPALIN, MT-3, SNAP-25, β-synuclein, and KIF5A

[Fig. 2, heat map: expression of each gene, scaled from minimum to maximum expression, across the 29 GTEx tissue types: colon, stomach, vagina, uterus, kidney, pancreas, testis, blood, nerve, pituitary gland, lung, thyroid, skeletal muscle, salivary gland, adipose, skin, cervix, esophagus, artery, prostate gland, adrenal gland, breast, spleen, ovary, small intestine, heart, liver, bladder, and brain. The accompanying rank, gene, protein, and fold enrichment columns are listed below.]

Rank  Gene  Protein  Fold enrichment
1  GFAP  Glial fibrillary acidic protein  1670.3
2  OPALIN  Opalin  2430.3
3  MT3  Metallothionein-3  197.0
4  SNAP25  Synaptosomal-associated protein 25  282.0
5  SNCB  Beta-synuclein  475.0
6  KIF5A  Kinesin heavy chain isoform 5A  339.6
7  MBP  Myelin basic protein  143.1
8  MOBP  Myelin-associated oligodendrocyte basic protein  1160.7
9  NCAN  Neurocan  687.4
10  CEND1  Cell cycle exit and neuronal differentiation protein 1  119.6
11  STMN2  Stathmin-2  163.4
12  C1ORF61  Protein CROC-4  564.6
13  GABRA1  Gamma-aminobutyric acid receptor subunit alpha-1  746.2
14  PLP1  Myelin proteolipid protein  111.2
15  GRIN1  Glutamate receptor ionotropic, NMDA 1  204.2
16  PSD2  PH and SEC7 domain-containing protein 2  165.3
17  TNR  Tenascin R  143.6
18  ELAVL3  ELAV-like protein 3  187.9
19  BCAN  Brevican  163.2
20  KCNJ9  G-protein-activated inward rectifier potassium channel 3  351.9
21  SEPTIN3  Neuronal-specific septin-3  118.8
22  NSG2  Neuronal vesicle trafficking-associated protein 2  131.8
23  CSPG5  Chondroitin sulfate proteoglycan 5  110.4
24  GAP43  Neuromodulin  132.9
25  STMN4  Stathmin 4  246.3
26  C2ORF80  Uncharacterized protein  550.5
27  HTR5A  5-hydroxytryptamine receptor 5A  220.1
28  PACSIN1  Syndapin-1  148.4
29  CPLX2  Complexin-2  116.3
30  CACNG7  Voltage-dependent calcium channel gamma-7 subunit  183.7
31  SLC1A2  Excitatory amino acid transporter 2  271.5
32  GPM6A  Neuronal membrane glycoprotein M6-a  110.4
33  AMER2  APC membrane recruitment protein 2  134.1
34  OLIG1  Oligodendrocyte transcription factor 1  278.0
35  GRM3  Metabotropic glutamate receptor 3  231.2
36  CACNG3  Voltage-dependent calcium channel gamma-3 subunit  851.2
37  HRH3  Histamine H3 receptor  181.1
38  OLIG2  Oligodendrocyte transcription factor 2  263.9
39  RIT2  GTP-binding protein Rit2  335.7
40  SYNPR  Synaptoporin  454.7
41  HPCA  Neuron-specific calcium-binding protein  469.5
42  FBXL16  F-box/LRR-repeat protein 16  114.0
43  NEUROD6  Neurogenic differentiation factor 6  1134.4
44  TMEM235  Transmembrane protein 235  487.5
45  TTC9B  Tetratricopeptide repeat protein 9B  281.9
46  GABBR2  Gamma-aminobutyric acid type B receptor subunit 2  126.7
47  GABRA3  Gamma-aminobutyric acid receptor subunit alpha-3  104.5
48  CNTNAP4  Contactin-associated protein-like 4  166.4
49  DIRAS2  GTP-binding protein Di-Ras2  145.4
50  SLC12A5  Solute carrier family 12 member 5  207.6
68  NEFL  Neurofilament light chain  119.1
>100  S100B  S100 calcium binding protein B  15.0
>100  ENO2  Neuron-specific enolase  16.8
>100  UCHL1  Ubiquitin carboxyl-terminal hydrolase isozyme L1  23.4
>100  SPTAN1  Alpha-II spectrin  2.6
>100  MAPT  Tau  14.4
>100  NEFH  Neurofilament heavy chain  3.8
>100  PRNP  Prion protein  4.0
>100  APP  Amyloid-beta precursor protein  2.6

Fig. 2. Transcriptional enrichment of the top-ranked candidate genes in brain tissue. Transcriptional expression levels of the top-50 algorithmically ranked genes, along with those of genes associated with several notable previously proposed candidate biomarkers of neurological damage, in each of the 29 tissue types interrogated in the GTEx dataset. Levels of fold enrichment in brain tissue relative to nonneural tissues are indicated.
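As a rough illustration of how a fold enrichment value like those reported in Fig. 2 could be computed for one gene, the sketch below divides mean brain expression by mean expression pooled across nonneural specimens. The exact calculation used in the study is not given in this section, so the pooling choice, the function name `brain_fold_enrichment`, and the toy TPM values are all assumptions made for illustration.

```python
from statistics import mean


def brain_fold_enrichment(expression_by_tissue, brain_key="Brain"):
    """Fold enrichment of one gene in brain vs. pooled nonneural specimens.

    expression_by_tissue maps tissue name -> list of per-specimen TPM values.
    Assumption: nonneural specimens are pooled before averaging.
    """
    brain_mean = mean(expression_by_tissue[brain_key])
    nonneural = [
        value
        for tissue, values in expression_by_tissue.items()
        if tissue != brain_key
        for value in values
    ]
    nonneural_mean = mean(nonneural)
    if nonneural_mean == 0:
        return float("inf")  # expressed in brain only
    return brain_mean / nonneural_mean


# Hypothetical TPM values for a single made-up gene.
toy_gene = {
    "Brain": [300.0, 500.0],
    "Liver": [1.0, 3.0],
    "Lung": [2.0, 2.0],
}
fold = brain_fold_enrichment(toy_gene)  # 400.0 / 2.0
passes_cutoff = fold >= 100.0           # the 100-fold filter described in the text
```

Averaging per-tissue means instead of pooling specimens would weight sparsely sampled tissues more heavily; either choice is plausible, which is why the pooling step is flagged as an assumption.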

displayed respective Gini coefficient values of 0.40, 0.16, 0.25, 0.21, and 0.29. This suggests that there are several yet to be explored candidate blood biomarkers which may exhibit enough homogeneity in brain expression to allow for detection of neurological damage across a variety of brain regions with relatively equivalent levels of diagnostic sensitivity.

Cellular and Subcellular Expression Profiles of Top-Ranked Genes. In an effort to provide biological context relevant to the use of their protein products as biomarkers, we examined the cellular and subcellular expression profiles of the top-50 ranked genes, as well as genes associated with other notable previously proposed candidate markers of neurological damage.

In order to determine which cell populations within the brain express the candidate genes, we leveraged a publicly available single-cell RNA-sequencing dataset originally generated by Darmanis et al. (31) to examine their transcriptional expression levels in five distinct types of cells isolated from surgically resected brain tissue harvested from a mix of living adult human donors (SI Appendix, Tables S5 and S6). A majority of the top-ranked genes exhibited relatively cell-specific expression profiles (Fig. 5A). For example, the genes coding for GFAP and MT-3 were predominantly expressed by astrocytes, while the genes coding for SNAP-25, β-synuclein, and KIF5A were predominantly expressed by neurons. Likewise, the genes coding for OPALIN, MBP, and MOBP were predominantly expressed by oligodendrocytes. This suggests that the presence of any of these proteins in the blood could be diagnostically attributable to damage to a specific brain cell population, which could provide pathophysiological context that adds to their clinical value as biomarkers. Genes associated with multiple notable previously proposed biomarkers including S100B, PrP, and spectrin-αII exhibited less isolated cellular expression profiles, suggesting they may not be able to provide similar contextual value.

In order to determine the subcellular distribution of proteins associated with the candidate genes, we utilized the Compartments database developed by Binder et al. (32) to retrieve subcellular localization confidence scores generated from an aggregate of sequence-based prediction, literature text-mining, and high-throughput microscopy-based screening. A majority of the top-ranked genes generated in our analysis code for proteins associated with the plasma membrane, cytosol, or cytoskeleton, and few coded for nuclear or secreted proteins (Fig. 5B). This is encouraging in terms of their use as biomarkers, as secreted proteins are more likely to be found in the blood in the absence of cellular damage, limiting diagnostic specificity, and proteins with predominantly nuclear localization may not easily diffuse from damaged cells into the extracellular environment, limiting their ability to be detected. Furthermore, in the case of some proteins associated with top-ranked genes, their highly specific subcellular localization, when considered along with their cellular expression profiles, could allow them to provide even more detailed pathophysiological context as biomarkers. For example, KIF5A is a cytoskeletal motor protein which we observed as being primarily expressed by neurons; thus it is possible that the presence of KIF5A in the blood may be associated with axonal damage specifically.

Circulating Levels of Top-Ranked Candidate Biomarkers in Patients with Neurological Disorders. In order to directly evaluate their potential for use as blood biomarkers, ELISA was used to measure the serum levels of proteins associated with the top-eight ranked

O’Connell et al. PNAS | August 25, 2020 | vol. 117 | no. 34 | 20767

[Fig. 3 graphic: brain expression (log2 TPM, mean ± SD) and genome-wide expression percentile for each gene, labeled by protein name, gene symbol, and rank.]

Fig. 3. Transcriptional abundance of top-ranked candidate genes in brain tissue. Average transcriptional expression levels of the top-50 algorithmically ranked genes, along with those of genes associated with several notable previously proposed candidate biomarkers of neurological damage, in brain specimens of the GTEx dataset.

candidate genes in a diverse cohort of patients with acute and chronic neurological damage, as well as in a group of neurologically normal controls. Subjects with neurological damage included patients with definitive clinical diagnoses of traumatic brain injury (TBI, n = 13), ischemic stroke (IS, n = 43), hemorrhagic stroke (HS, n = 5), Alzheimer’s disease (AD, n = 20), and multiple sclerosis (MS, n = 20). In cases of acute neurological damage such as traumatic brain injury and stroke, blood was collected immediately upon hospital admission. Neurologically normal controls (n = 85) included patients coenrolled in various other investigations of chronic disease who were deemed neurologically normal by a trained clinician. Patients with neurological damage and control subjects were relatively well matched in terms of gender and the prevalence of common comorbidities such as diabetes, dyslipidemia, and hypertension. The collective group of patients with neurological damage was also of similar age to control subjects, although ischemic stroke and Alzheimer’s disease patients tended to be older, while multiple sclerosis patients tended to be younger (SI Appendix, Table S7). However, we observed no significant correlations between the serum levels of any of the eight proteins and age after controlling for clinical diagnosis (SI Appendix, Table S8); this suggests that age was a relatively small contributor to the overall variance in the serum levels of these proteins in our study population, and that any modest intergroup age differences were unlikely to meaningfully confound downstream analyses.

The levels of all eight proteins were significantly elevated in the serum of patients with neurological damage relative to neurologically normal controls, providing evidence that they are in fact released into circulation as a result of damage to brain tissue. However, some elevations were condition specific, while others were more ubiquitous. For example, circulating levels of GFAP, KIF5A, and SNAP-25 were significantly elevated across nearly all neurological conditions (Fig. 6 A, D, and F), suggesting their presence in blood may serve as a general marker of neurological damage. Conversely, circulating levels of MT-3 were only significantly elevated in patients with acute ischemic stroke (Fig. 6C); this observation, taken with prior reports that MT-3 is robustly up-regulated in astrocytes and neurons in response to ischemic and hypoxic conditions (33–35), suggests MT-3 may have utility for detecting ischemia specifically. Likewise, elevations in circulating levels of MBP, MOBP, and OPALIN were most dramatic in patients with multiple sclerosis compared to patients with other neurological conditions (Fig. 6 B, G, and H), suggesting that they may have specific utility for detecting demyelinating pathologies.

Correlational analysis further supported the notion that the presence of these proteins in the blood could be linked to specific cellular pathology. The serum levels of the eight proteins all exhibited varying degrees of positive correlations with each other across all subjects after controlling for clinical diagnosis; however, hierarchical clustering of the proteins based on covariance produced perfect clustering according to the specific cell populations their transcripts were predominantly expressed by in our earlier single-cell analysis. For example, MBP, OPALIN, and MOBP, which displayed predominant transcriptional expression by oligodendrocytes, distinctly clustered with each other based on the correlation between their circulating levels. Similarly, SNAP-25, β-synuclein, and KIF5A, which displayed predominant transcriptional expression by neurons, formed a second distinct cluster, while MT-3 and GFAP, which displayed predominant transcriptional expression by astrocytes, formed a third distinct cluster (Fig. 6I). This overall pattern of correlation, taken with patterns of blood elevations we observed across different neurological conditions, provides evidence that the presence of these proteins in the blood may be attributable to specific pathophysiological processes affecting specific populations of cells. In particular, this may suggest that increased blood levels of MBP, MOBP, and OPALIN are a result of myelin damage, that increased blood levels

20768 | www.pnas.org/cgi/doi/10.1073/pnas.2007719117 O’Connell et al.

[Fig. 4 graphic: heat map of expression across brain regions (grouped under telencephalon, diencephalon, mesencephalon, metencephalon, myelencephalon, and sulci and spaces) for each gene, listed by rank, gene symbol, and protein name, with a per-gene variability value and a color scale labeled "Fold difference relative to median".]

Fig. 4. Variability in expression levels of the top-ranked candidate genes across brain regions. Transcriptional expression levels of the top-50 algorithmically ranked genes, along with those of genes associated with other notable previously proposed candidate biomarkers of neurological damage, across samples from each of the 232 brain regions examined in the ABA dataset. The degree of variability in expression levels across brain regions is indicated by the Gini coefficient.
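As a rough illustration of how a Gini coefficient summarizes variability in expression across brain regions, the following minimal Python sketch computes it for two made-up expression profiles. The function name `gini`, the rank-weighted formula chosen, and the example values are assumptions for illustration only; the source does not state how the coefficient was implemented or on what scale the expression values were taken.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero value")
    # Rank-weighted form: G = sum_i (2i - n - 1) * x_i / (n * sum(x)), i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical expression values across four brain regions (not study data).
ubiquitous = [5.0, 5.0, 5.0, 5.0]   # same level everywhere
focal = [0.0, 0.0, 0.0, 20.0]       # expressed in a single region only

print(gini(ubiquitous))  # evenly expressed gene
print(gini(focal))       # regionally restricted gene
```

Under this formulation an evenly expressed gene scores 0 and a gene confined to one of n regions scores (n - 1)/n, so lower values correspond to the ubiquitous brain-wide expression the ranking favors.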

of SNAP-25, β-synuclein, and KIF5A may be attributable to general neuron or axonal damage, and that increased blood levels of MT-3 and GFAP may be attributable to astrocyte damage or activation.

Diagnostic Performance. Receiver operating characteristic (ROC) analysis was subsequently used to determine how well the circulating levels of proteins associated with the top-eight ranked genes could diagnostically discriminate between patients with neurological damage and control subjects. In terms of differentiating the total pool of patients with neurological damage from control subjects, all eight proteins demonstrated modest diagnostic ability, producing area under curve (AUC) values ranging from 0.68 to 0.80, with SNAP-25, GFAP, and KIF5A offering the highest overall diagnostic performance, and MBP, β-synuclein, and MT-3 offering the lowest. When used in combination, the coordinate blood levels of all eight proteins yielded greater overall diagnostic performance than any individual marker alone, producing an AUC of 0.94, and could discriminate between groups with 84% sensitivity and 89% specificity (Fig. 7A). While this suggests that all eight proteins have utility for detection of general neurological damage, the levels of diagnostic performance we observed when considering the total subject pool are likely not reflective of their true diagnostic potential. Each of the neurological conditions we examined is a result of different pathophysiological processes; accordingly, some conditions were better diagnosed than others, and several proteins demonstrated condition-specific discriminatory ability consistent with their patterns of blood elevations.

For example, separate comparisons of each individual neurological condition to the control group revealed that MOBP, OPALIN, and MBP exhibited relatively poor performance for diagnosis of acute pathology. However, consistent with their potential link to oligodendrocyte damage, they were the three highest performing individual markers in diagnosis of multiple sclerosis; their use in combination produced an AUC of 0.92, and allowed for discrimination between groups with 90% sensitivity and 87% specificity (Fig. 7F). While they performed fairly well across all conditions, three markers potentially linked to astrocyte activation and neuron damage, SNAP-25, GFAP, and KIF5A, exhibited higher levels of performance in diagnosis of patients with acute pathology as opposed to patients with chronic pathology. They were the three highest performing individual markers for diagnosis of traumatic brain injury, where their use in combination produced an AUC of 0.98, and allowed for discrimination between groups with 100% sensitivity and 92% specificity (Fig. 7B). They were also the three highest performing individual markers with respect to diagnosis of hemorrhagic stroke, where their use in combination yielded an AUC of 0.99, and allowed for discrimination between groups with 100% sensitivity and 99% specificity (Fig. 7D). Unsurprisingly given its potential link to ischemic injury, MT-3 offered the highest individual level of diagnostic performance for detection of ischemic stroke; when used in

[Fig. 5 graphic: (A) "Cellular expression" heat map across neurons, astrocytes, oligodendrocytes, microglia, and endothelial cells, with the cell population of maximum expression, protein type/function, and protein length (aa) for each gene; (B) "Subcellular localization" confidence map across extracellular space, plasma membrane, cytosol, cytoskeleton, nucleus, mitochondrion, endoplasmic reticulum, Golgi apparatus, endosome, lysosome, and peroxisome. Color scales run from minimum to maximum expression and from low to high confidence.]

Fig. 5. Cellular and subcellular expression profiles of the top-ranked candidate genes. (A) Transcriptional expression levels of the top-50 algorithmically ranked genes, along with those of genes associated with other notable previously proposed candidate biomarkers of neurological damage, across various distinct human brain cell populations profiled by single-cell sequencing. The cell population with the highest average expression levels for each gene is indicated. (B) Descriptions of proteins coded for by the top-50 algorithmically ranked genes, as well as genes associated with other notable previously proposed candidate biomarkers of neurological damage, along with their predicted subcellular localizations. Predicted protein subcellular localizations are indicated as confidence scores; higher values indicate a greater degree of confidence that a given protein exhibits a given localization.

combination with the next two highest performing markers, SNAP-25 and KIF5A, the three markers produced an AUC of 0.90, and could discriminate between groups with 72.0% sensitivity and 93.0% specificity (Fig. 7C).

Further supportive of the idea that the presence of these proteins in the blood may be attributed to different pathophysiological mechanisms, in many cases, they displayed the ability to diagnostically discriminate between different neurological conditions with fairly high levels of accuracy, especially when used in combination (SI Appendix, Fig. S2). Taken together, the collective levels of diagnostic performance observed across these analyses suggest that the top-ranked candidate biomarkers identified using our algorithmic search strategy could have true clinical utility for detection of neurological damage across a multitude of common pathologies.

Discussion

In the work described here, we employed an algorithmic approach to systematically evaluate the protein-coding genome to identify genes with the highest potential to produce blood biomarkers of neurological damage. The results produced by our analysis, which is unprecedented in both its scale and scope, provide valuable insights into the diagnostic suitability of numerous previously proposed candidate biomarkers, and identify several previously unexplored candidate biomarkers with strong potential for future clinical use.

Three previously proposed candidate biomarkers, GFAP, MBP, and NfL, fell near the top of our algorithmic analysis, ranking at 1st, 7th, and 68th, respectively. These results strongly support those of numerous prior investigations which have reported elevations in their circulating levels in various states of neurological damage and provide further evidence that they may have true diagnostic utility (23–27). Unlike GFAP, MBP, and NfL, the remaining previously proposed biomarkers which were interrogated in our algorithmic analysis, including S100B, NSE, UCH-L1, spectrin-αII, Tau, NfH, PrP, and Aβ, all ranked outside of the top 100. This phenomenon was largely attributable to the fact that they displayed a shockingly low degree of brain enrichment at the transcriptional level, which was unexpected given that many of them are frequently cited as being brain-

[Fig. 6 graphic: (A–H) serum concentrations (pg/mL, log2 scale) of GFAP, OPALIN, MT-3, SNAP-25, β-synuclein, KIF5A, MBP, and MOBP in the Control, TBI, IS, HS, AD, and MS groups, each panel annotated with a Kruskal-Wallis P value, group median ± IQR, pooled median ± IQR, and significant post hoc tests; (I) correlation matrix of the eight proteins (color scale r = 0 to r = 1) with a cluster dendrogram labeled Neuron, Astrocyte, and Oligodendrocyte.]

Fig. 6. Circulating levels of proteins associated with the top-eight ranked candidate genes in patients with neurological disease. (A–H) Circulating concentrations of proteins associated with each of the top-eight ranked candidate genes in neurologically normal controls and patients diagnosed with traumatic brain injury (TBI), ischemic stroke (IS), hemorrhagic stroke (HS), Alzheimer’s disease (AD), and multiple sclerosis (MS). Data are presented on a log2 scale, with the minimum axis value corresponding to the lower limit of quantification. Protein concentrations were compared between groups with the Kruskal–Wallis test. Post hoc comparisons were made using the Mann–Whitney U test, with P values adjusted for multiple comparisons using the Holm–Bonferroni method. (I) Correlation matrix depicting the relationships between the circulating levels of proteins associated with the top-eight ranked genes across all subjects after controlling for diagnosis. The strength of partial correlations was assessed via Spearman’s rho, and P values were adjusted for multiple comparisons using the Holm–Bonferroni method. Hierarchical clustering indicates similarity in expression based on correlation; the predominant brain cell population of expression associated with each major cluster of proteins is indicated in the cluster dendrogram. *Statistically significant.
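The clustering described for panel I can be sketched in a few lines of Python using the partial correlation coefficients printed in the Fig. 6I matrix. This is only an illustrative sketch: the distance measure (1 - r), the use of average linkage, and the helper names `average_linkage` and `cluster` are assumptions, since the source does not specify the linkage method or software used.

```python
from itertools import combinations

names = ["MBP", "OPALIN", "MOBP", "GFAP", "MT-3",
         "beta-synuclein", "SNAP-25", "KIF5A"]

# Partial Spearman correlations as printed in Fig. 6I (upper triangle, row by row).
upper = [
    [0.59, 0.59, 0.16, 0.13, 0.15, 0.19, 0.21],  # MBP vs. the rest
    [0.61, 0.25, 0.26, 0.25, 0.25, 0.26],        # OPALIN
    [0.18, 0.13, 0.29, 0.26, 0.24],              # MOBP
    [0.46, 0.13, 0.49, 0.36],                    # GFAP
    [0.15, 0.32, 0.19],                          # MT-3
    [0.44, 0.42],                                # beta-synuclein
    [0.60],                                      # SNAP-25 vs. KIF5A
]

# Assumed distance: 1 - r, so strongly correlated proteins are close together.
dist = {}
for i, row in enumerate(upper):
    for k, r in enumerate(row):
        dist[frozenset((names[i], names[i + 1 + k]))] = 1.0 - r


def average_linkage(a, b):
    """Mean pairwise distance between two clusters (assumed linkage rule)."""
    pairs = [dist[frozenset((x, y))] for x in a for y in b]
    return sum(pairs) / len(pairs)


def cluster(n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [(name,) for name in names]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return {frozenset(c) for c in clusters}


groups = cluster(3)
for group in groups:
    print(sorted(group))
```

Under these assumptions, cutting the tree at three clusters groups MBP, OPALIN, and MOBP together, GFAP with MT-3, and β-synuclein with SNAP-25 and KIF5A, which matches the oligodendrocyte, astrocyte, and neuron groupings reported in the text.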

specific proteins (29, 30, 36). Because our findings suggest that their expression is poorly restricted to brain, they may be limited in their ability to accurately detect damage to brain tissue in the presence of concurrent damage to peripheral tissues and organs, which is often a feature in many neurological conditions. For example, in stroke, common comorbidities such as diabetes and atherosclerosis cause cellular damage to the vasculature and kidneys (37). In Alzheimer’s disease, a multitude of common age-related maladies and general atrophy cause similar cellular damage across the body (38). Perhaps most obviously, traumatic brain injury is often a result of collision sports, vehicular accidents, falls, and other types of events which commonly result in numerous peripheral injuries (39). Thus, generally, it is doubtful that blood measures of any of these proteins would be able to offer high levels of diagnostic performance in many true clinical-use scenarios.

This is likely why S100B, NSE, and UCH-L1 have performed poorly as blood biomarkers in several conditions, particularly ischemic stroke (40, 41) and multiple sclerosis (42, 43), when trialed under clinically applicable study designs. From this perspective, it is particularly interesting that the US Food and Drug Administration (FDA) recently approved a dual-analyte assay measuring circulating levels of UCH-L1 in combination with GFAP for clinical detection of traumatic brain injury (44). Based on our observations, it is possible that a majority of the discriminatory power of this assay is attributable to GFAP alone. If this is true, it would mean that a single-analyte assay targeting only GFAP may be able to achieve similar or equivalent performance, while being more cost effective and easier to implement at the point of care.

[Fig. 7 graphic: ROC curves (sensitivity (%) versus specificity (%)) for GFAP, OPALIN, MT-3, SNAP-25, β-synuclein, KIF5A, MBP, MOBP, and the combined marker set in six panels: (A) All ND vs. Control, (B) TBI vs. Control, (C) IS vs. Control, (D) HS vs. Control, (E) AD vs. Control, (F) MS vs. Control; each panel is accompanied by a plot of AUC, sensitivity, and specificity per marker.]

Fig. 7. Diagnostic performance of proteins associated with the top-eight ranked candidate genes. (A–F) ROC curves indicating the individual and combined abilities of circulating levels of proteins associated with the top-eight ranked genes to discriminate between neurologically normal controls and the total pool of patients diagnosed with neurological damage (All ND), patients diagnosed with traumatic brain injury (TBI), ischemic stroke (IS), hemorrhagic stroke (HS), Alzheimer’s disease (AD), and multiple sclerosis (MS). AUC, sensitivity, and specificity values with 95% confidence intervals are indicated. Sensitivity and specificity values are associated with the cutoff yielding the highest Youden value. For the comparison between control subjects and the total pool of patients with neurological damage, the combined ROC curve was generated using the circulating levels of all eight proteins. For comparisons between control subjects and specific neurological diagnoses, combined ROC curves were generated using the circulating levels of only the top three individually performing proteins, as determined by AUC. The statistical significance of AUC values was tested using the DeLong method, and P values were adjusted for multiple comparisons using the Holm–Bonferroni method. *Statistically significant AUC. A complete summary of all diagnostic statistics with optimal cutoff values can be found in SI Appendix, Table S9.
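The single-marker statistics named in this caption (AUC and the sensitivity and specificity at the cutoff with the highest Youden value) can be illustrated with a short Python sketch. The concentrations below are invented for illustration and are not study data; the function names, and the convention that a value at or above the cutoff counts as a positive test, are assumptions. The sketch does not attempt the combined multi-marker curves or the DeLong test.

```python
def auc(cases, controls):
    """AUC as P(case > control) + 0.5 * P(tie), the Mann-Whitney form."""
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def roc_points(cases, controls):
    """(cutoff, sensitivity, specificity), calling a test positive if value >= cutoff."""
    points = []
    for cutoff in sorted(set(cases) | set(controls)):
        sens = sum(v >= cutoff for v in cases) / len(cases)
        spec = sum(v < cutoff for v in controls) / len(controls)
        points.append((cutoff, sens, spec))
    return points


def youden_cutoff(cases, controls):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(cases, controls), key=lambda p: p[1] + p[2] - 1)


# Hypothetical serum concentrations in pg/mL (illustrative only).
cases = [45, 60, 80, 120, 200]
controls = [20, 25, 30, 40, 70]

print(auc(cases, controls))            # 23 of 25 case-control pairs ranked correctly
print(youden_cutoff(cases, controls))  # (cutoff, sensitivity, specificity)
```

With these made-up values, 23 of the 25 case-control pairs are ordered correctly (AUC 0.92), and the Youden-optimal cutoff of 45 pg/mL gives 100% sensitivity and 80% specificity.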

It is important to note that with respect to some of the previously proposed biomarkers assessed in our analysis, the lack of brain specificity we observed may not necessarily preclude them from being informative in certain neurological conditions, especially if they are directly involved in pathogenesis. For example, because Alzheimer’s disease is pathophysiologically associated with an accumulation of neurofibrillary tangles and plaques composed of tau and Aβ, measures of these proteins in the blood, particularly disease-related isoforms, may still offer diagnostic utility. However, even in this case, a lack of brain specificity may still limit the degree of accuracy these markers can truly achieve when measured in peripheral circulation; this could explain the fact that measures of tau and Aβ in cerebrospinal fluid have historically performed better than measures in blood for detection of Alzheimer’s pathology (45).

In addition to providing insights regarding the aforementioned previously proposed biomarkers, our algorithmic analysis also identified several previously unexplored candidates, as proteins associated with 97 of the top-100 ranked genes have yet to be widely investigated for blood-based diagnostic use. While we only focused on the highest six ranked of these markers in our confirmatory blood analysis, many others displayed body- and brain-wide expression profiles which suggest that they may have similar diagnostic utility. If they can be validated, this large pool of newly identified candidate markers could be extremely valuable, given that it is becoming increasingly evident that single blood markers are inadequate to provide high levels of accuracy for many neuro applications, and that the development of true precision diagnostics will likely require the use of multiple markers as part of multianalyte algorithmic assays (46–49).

Many of the top-ranked previously uninvestigated markers which we assessed in our blood analysis displayed patterns of elevations and levels of diagnostic performance which were encouraging in terms of future clinical use. For example, levels of OPALIN and MOBP were robustly elevated in patients with multiple sclerosis and appeared to indicate cellular damage to oligodendrocytes specifically. When combined with MBP, which is an often studied cerebrospinal fluid biomarker of multiple sclerosis-related neurological damage (50–52), they were able to discriminate between multiple sclerosis patients and controls with 90% sensitivity and 87% specificity, albeit in a limited sample size. Thus, diagnostics targeting circulating levels of these proteins could be useful in the initial diagnosis of multiple sclerosis, which is often subjective and ambiguous (53). However, these proteins may have the most future value for noninvasive longitudinal tracking of disease progression. Most molecular biomarkers which are clinically used in multiple sclerosis, such as oligoclonal band screening and autoantibody tests, detect immune system activity and are therefore indirect indicators of neurodegeneration (11). Because the presence of OPALIN, MOBP, and MBP in the blood appears to be directly linked to brain tissue damage, they could be useful for evaluating efficacy of therapeutic interventions, which is essential in developing and maintaining long-term individualized treatment strategies.

Two previously unexplored blood markers identified in our analysis, SNAP-25 and KIF5A, appeared to be linked to cellular damage to neurons specifically, and displayed a strong ability to detect acute neurological conditions, especially traumatic brain injury. In fact, when combined with GFAP, they were able to discriminate between traumatic brain injury patients and controls with 100% sensitivity and 92% specificity. Thus, these markers could be extremely valuable in screening patients for traumatic brain injury during triage, where timely recognition can avoid debilitating complications by ensuring patients are referred to an appropriate care team which is trained to manage neurological injury (3). Given that the symptom-based assessments predominantly used by clinicians for recognition of traumatic brain injury in the prehospital and early in-hospital setting have been reported to be as low as 30% sensitive (4, 5), and the only biomarker-based screening tool with FDA approval, which targets GFAP and UCH-L1, has been reported to be only about 35% specific (44), KIF5A and SNAP-25 could give much needed additional diagnostic power to current clinical tools if our findings can be validated in a larger cohort of patients.

Another relatively unexplored blood marker identified in our analysis, MT-3, displayed properties which suggest it could be similarly valuable for detection of ischemic stroke during triage. As with traumatic brain injury, early recognition is essential for a positive outcome, as it is estimated that 1.9 million neurons are permanently lost every minute without intervention (54). However, the ability to confidently rule out a stroke diagnosis in patients presenting with neurological symptoms is also important, as mistriaged patients with nonstroke disorders can put significant resource strain on stroke centers (55, 56). We observed robust elevation in circulating levels of MT-3 in ischemic stroke patients at hospital admission, and MT-3 was able to discriminate between ischemic stroke patients and controls with 70% sensitivity and 95% specificity. Given that the most commonly used symptom-based stroke recognition tools available to clinicians in the prehospital and early in-hospital setting are often unreliable, with levels of sensitivity ranging from 44 to 95%, and levels of specificity ranging from 21 to 78% (57), MT-3 could be a strong candidate for future use in biomarker-based stroke screening if our findings can be confirmed in a larger-scale follow-up investigation.

While our results are exciting, it is important to note that this study was not without limitations. Perhaps most notable is that our algorithmic analysis was performed at the transcriptional level, even though our end goal was to identify protein biomarkers. While the correlation between mRNA and protein expression is far from perfect, several studies have concluded that cellular mRNA levels are the primary determinant of protein levels, and that the former is a strong predictor of the latter (58, 59). Furthermore, the fact that a well-studied and established protein biomarker ranked first in our analysis, and that all of the top-ranked markers were validated at the protein level in our serum assays, suggests that our approach was robust despite this potential limitation. Another potential limitation is that we used conventional ELISA techniques to assay the concentrations of the top-eight ranked candidate markers in our serum analysis; due to the limited analytical sensitivity associated with conventional ELISA, in many instances, analyte levels in several samples fell below lower limits of quantitation or were undetected. While this did not hinder our ability to detect the presence of intergroup differences, it could have negatively impacted diagnostic performance. Thus, analysis of these markers in future work using more sensitive immunoassay techniques such as digital ELISA may reveal even higher levels of diagnostic performance than those which we have reported here (60, 61). Furthermore, because the capacity of digital ELISA to support multiplexing is increasing (62), it is possible that the future development of a multiplexed digital ELISA simultaneously targeting some or all of these proteins as a neuro panel could offer both high levels of analytical performance and convenience.

It is also worthwhile to note that the general biomarker discovery strategy employed in our analysis could be modified to identify additional neurological disease biomarkers. The primary goal of our analysis was to identify candidate markers of general neurological damage; thus, we searched for brain-enriched proteins which exhibit ubiquitous expression across brain regions. However, our analysis could be modified to search for brain-enriched proteins with region-specific expression profiles whose presence in the blood could indicate the precise location of pathology. Furthermore, our analysis could also be modified to identify other brain-enriched molecular species which could serve similar diagnostic function as the candidate proteins described here. For example, similar algorithmic analysis of publicly available micro-RNA (miRNA) datasets could be used to identify brain-enriched miRNAs (63), which may be beneficial in that miRNAs originating from the brain may be more detectable in the blood than proteins, given that they can be measured via PCR-based methods. Likewise, the datasets used in our analysis could be used to informatically predict brain-enriched metabolites (64); if this information were combined with other existing disease-specific gene expression datasets, it may be possible to identify blood-borne brain-originating metabolites with a high degree of disease specificity.

Our collective findings provide an unprecedented depth of insight into numerous previously proposed candidate blood biomarkers of neurological damage, and suggest that several may have limited diagnostic utility in many clinical-use scenarios due to a low degree of brain specificity. Just as importantly, we have also identified a plethora of previously unexplored biomarkers which have strong potential for clinical use in several common neurological conditions, particularly traumatic brain injury, stroke, and multiple sclerosis. Further clinical validation of these markers could lead to the development of blood-based precision molecular diagnostics with the potential to transform how we detect and monitor these debilitating pathologies.

Materials and Methods

Detailed methodology for all informatic analyses, recruitment of subjects, and serum protein measurements can be found in SI Appendix, Materials and Methods.

Protection of Human Subjects. All procedures performed in this study involving human participants were in accordance with the ethical standards set forth in the 1964 Helsinki declaration and its later amendments, and were approved by the institutional review boards of University Hospitals (Cleveland, OH) and Ruby Memorial Hospital (Morgantown, WV). Written informed consent was obtained from all subjects or their authorized representatives prior to all study procedures.

Data Availability. Genotype Tissue Expression RNA sequencing data are available from https://gtexportal.org/home/datasets. Allen Brain Atlas microarray data are available from https://human.brain-map.org/static/download. Brain tissue single-cell RNA sequencing data are available from the NCBI GEO database via accession number GSE67835 and can be found at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67835. Compartments subcellular localization data are available from https://compartments.jensenlab.org/Downloads. Blood biomarker data are available via the Open Science Framework and can be found at https://osf.io/rab7q.

ACKNOWLEDGMENTS. We thank Dr. Paul D. Chantler of West Virginia University, and Dr. Taura L. Barr, formerly of West Virginia University, for providing access to stroke and traumatic brain injury blood samples. We also thank Dr. Mary E. Kerr and Ali L. Clark for critical review of the manuscript. Work was financially supported by Case Western Reserve University Frances Payne Bolton School of Nursing start-up funds issued to G.C.O.

1. V. L. Feigin et al.; GBD 2016 Neurology Collaborators, Global, regional, and national burden of neurological disorders, 1990-2016: A systematic analysis for the global burden of disease study 2016. Lancet Neurol. 18, 459–480 (2019).
2. K. R. Lees et al.; ECASS, ATLANTIS, NINDS and EPITHET rt-PA Study Group, Time to treatment with intravenous alteplase and outcome in stroke: An updated pooled analysis of ECASS, ATLANTIS, NINDS, and EPITHET trials. Lancet 375, 1695–1703 (2010).
3. I. K. Moppett, Traumatic brain injury: Assessment, resuscitation and early management. Br. J. Anaesth. 99, 18–31 (2007).
4. G. Fuller, G. McClelland, T. Lawrence, W. Russell, F. Lecky, The diagnostic accuracy of the HITSNS prehospital triage rule for identifying patients with significant traumatic brain injury: A cohort study. Eur. J. Emerg. Med. 23, 61–64 (2016).
5. G. Fuller, T. Lawrence, M. Woodford, F. Lecky, The accuracy of alternative triage rules for identification of significant traumatic brain injury: A diagnostic cohort study. Emerg. Med. J. 31, 914–919 (2014).
6. A. E. Arch et al., Missed ischemic stroke diagnosis in the emergency department by emergency medicine and neurology services. Stroke 47, 668–673 (2016).
7. N. M. Lever et al., Missed opportunities for recognition of ischemic stroke in the emergency department. J. Emerg. Nurs. 39, 434–439 (2013).
8. B. Jiang et al., Pre-hospital delay and its associated factors in first-ever stroke registered in communities from three cities in China. Sci. Rep. 6, 29795 (2016).
9. D. N. Kernagis, D. T. Laskowitz, Evolving role of biomarkers in acute cerebrovascular disease. Ann. Neurol. 71, 289–303 (2012).
10. H. Hampel et al., Blood-based biomarkers for Alzheimer disease: Mapping the road to the clinic. Nat. Rev. Neurol. 14, 639–652 (2018).
11. M. Comabella, X. Montalban, Body fluid biomarkers in multiple sclerosis. Lancet Neurol. 13, 113–126 (2014).
12. M. D. Weingarten, A. H. Lockwood, S. Y. Hwo, M. W. Kirschner, A protein factor essential for microtubule assembly. Proc. Natl. Acad. Sci. U.S.A. 72, 1858–1862 (1975).
13. L. F. Eng, J. J. Vanderhaeghen, A. Bignami, B. Gerstl, An acidic protein isolated from fibrous astrocytes. Brain Res. 28, 351–354 (1971).
14. B. W. Moore, D. McGregor, Chromatographic and electrophoretic fractionation of soluble proteins of brain and liver. J. Biol. Chem. 240, 1647–1653 (1965).
15. B. W. Moore, A soluble protein characteristic of the . Biochem. Biophys. Res. Commun. 19, 739–744 (1965).
16. R. H. Laatsch, M. W. Kies, S. Gordon, E. C. Alvord Jr., The encephalomyelitic activity of myelin isolated by ultracentrifugation. J. Exp. Med. 115, 777–788 (1962).
17. J. F. Doran, P. Jackson, P. A. Kynoch, R. J. Thompson, Isolation of PGP 9.5, a new human neurone-specific protein detected by high-resolution two-dimensional electrophoresis. J. Neurochem. 40, 1542–1547 (1983).
18. G. G. Glenner, C. W. Wong, Alzheimer’s disease: Initial report of the purification and characterization of a novel cerebrovascular amyloid protein. Biochem. Biophys. Res. Commun. 120, 885–890 (1984).
19. P. N. Hoffman, R. J. Lasek, The slow component of axonal transport. Identification of major structural polypeptides of the axon and their generality among mammalian neurons. J. Cell Biol. 66, 351–366 (1975).
20. J. Lonsdale et al.; GTEx Consortium, The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
21. E. H. Shen, C. C. Overly, A. R. Jones, The Allen Human Brain Atlas: Comprehensive gene expression mapping of the human brain. Trends Neurosci. 35, 711–714 (2012).
22. S. O’Hagan, M. Wright Muelas, P. J. Day, E. Lundberg, D. B. Kell, GeneGini: Assessment via the Gini coefficient of reference “housekeeping” genes and diverse human transporter expression profiles. Cell Syst. 6, 230–244.e1 (2018).
23. Z. Yang, K. K. W. Wang, Glial fibrillary acidic protein: From intermediate filament assembly and gliosis to neurobiomarker. Trends Neurosci. 38, 364–374 (2015).
24. D. G. Thomas, N. R. Hoyle, P. Seeldrayers, Myelin basic protein immunoreactivity in serum of neurosurgical patients. J. Neurol. Neurosurg. Psychiatry 47, 173–175 (1984).
25. D. G. T. Thomas, J. W. Palfreyman, J. G. Ratcliffe, Serum-myelin-basic-protein assay in diagnosis and prognosis of patients with head injury. Lancet 1, 113–115 (1978).
26. N. Wąsik et al., Serum myelin basic protein as a marker of brain injury in aneurysmal subarachnoid haemorrhage. Acta Neurochir. (Wien) 162, 545–552 (2020).
27. M. Khalil et al., as biomarkers in neurological disorders. Nat. Rev. Neurol. 14, 577–589 (2018).
28. H. Zetterberg, D. H. Smith, K. Blennow, Biomarkers of mild traumatic brain injury in cerebrospinal fluid and blood. Nat. Rev. Neurol. 9, 201–210 (2013).
29. H. J. Kim, J. W. Tsao, A. G. Stanfill, The current state of biomarkers of mild traumatic brain injury. JCI Insight 3, e97105 (2018).
30. O. Y. Glushakova, A. V. Glushakov, E. R. Miller, A. B. Valadka, R. L. Hayes, Biomarkers for acute diagnosis and management of stroke in neurointensive care units. Brain Circ. 2, 28–47 (2016).
31. S. Darmanis et al., A survey of human brain transcriptome diversity at the single cell level. Proc. Natl. Acad. Sci. U.S.A. 112, 7285–7290 (2015).
32. J. X. Binder et al., COMPARTMENTS: Unification and visualization of protein subcellular localization evidence. Database (Oxford) 2014, bau012 (2014).
33. S. Yanagitani et al., Ischemia induces metallothionein III expression in neurons of rat brain. Life Sci. 64, 707–715 (1999).
34. K. Tanji et al., Expression of metallothionein-III induced by hypoxia attenuates hypoxia-induced cell death in vitro. Brain Res. 976, 125–129 (2003).
35. T. Yuguchi et al., Expression of growth inhibitory factor mRNA after focal ischemia in rat brain. J. Cereb. Blood Flow Metab. 17, 745–752 (1997).
36. J. Kamtchum-Tatuene, G. C. Jickling, Blood biomarkers for stroke diagnosis and management. Neuromolecular Med. 21, 344–368 (2019).
37. K. I. Gallacher, B. D. Jani, P. Hanlon, B. I. Nicholl, F. S. Mair, Multimorbidity in stroke. Stroke 50, 1919–1926 (2019).
38. C. López-Otín, M. A. Blasco, L. Partridge, M. Serrano, G. Kroemer, The hallmarks of aging. Cell 153, 1194–1217 (2013).
39. E. Rickels, K. von Wild, P. Wenzlaff, Head injury in Germany: A population-based prospective study on epidemiology, causes, treatment and outcome of all degrees of head-injury severity in two distinct areas. Brain Inj. 24, 1491–1504 (2010).
40. C. Ren et al., Assessment of serum UCH-L1 and GFAP in acute stroke patients. Sci. Rep. 6, 24588 (2016).
41. A. Bustamante et al., Blood biomarkers for the early diagnosis of stroke: The stroke-chip study. Stroke 48, 2419–2425 (2017).
42. M. W. Koch, S. George, W. Wall, V. Wee Yong, L. M. Metz, Serum NSE level and disability progression in multiple sclerosis. J. Neurol. Sci. 350, 46–50 (2015).
43. E. T. Lim et al., Serum S100B in primary progressive multiple sclerosis patients treated with interferon-beta-1a. J. Negat. Results Biomed. 3, 4 (2004).
44. J. J. Bazarian et al., Serum GFAP and UCH-L1 for prediction of absence of intracranial injuries on head CT (ALERT-TBI): A multicentre observational study. Lancet Neurol. 17, 782–789 (2018).
45. B. Olsson et al., CSF and blood biomarkers for the diagnosis of Alzheimer’s disease: A systematic review and meta-analysis. Lancet Neurol. 15, 673–684 (2016).
46. G. C. Jickling, F. R. Sharp, Biomarker panels in ischemic stroke. Stroke 46, 915–920 (2015).
47. M. I. Hiskens, A. G. Schneiders, M. Angoa-Pérez, R. K. Vella, A. S. Fenning, Blood biomarkers for assessment of mild traumatic brain injury and chronic traumatic encephalopathy. Biomarkers 25, 213–227 (2020).
48. S. Ray et al., Classification and prediction of clinical Alzheimer’s diagnosis based on plasma signaling proteins. Nat. Med. 13, 1359–1362 (2007).
49. G. C. O’Connell, P. Stafford, K. B. Walsh, O. Adeoye, T. L. Barr, High-throughput profiling of circulating antibody signatures for stroke diagnosis using small volumes of whole blood. Neurotherapeutics 16, 868–877 (2019).
50. F. Barkhof et al., A correlative triad of gadolinium-DTPA MRI, EDSS, and CSF-MBP in relapsing multiple sclerosis patients treated with high-dose intravenous methylprednisolone. Neurology 42, 63–67 (1992).
51. J. N. Whitaker, Myelin encephalitogenic protein fragments in cerebrospinal fluid of persons with multiple sclerosis. Neurology 27, 911–920 (1977).
52. K. J. Lamers, H. P. de Reus, P. J. Jongen, Myelin basic protein in CSF as indicator of disease activity in multiple sclerosis. Mult. Scler. 4, 124–126 (1998).
53. A. J. Solomon, J. R. Corboy, The tension between early diagnosis and misdiagnosis of multiple sclerosis. Nat. Rev. Neurol. 13, 567–572 (2017).
54. J. L. Saver, Time is brain–quantified. Stroke 37, 263–266 (2006).
55. J. Neves Briard et al., Stroke mimics transported by emergency medical services to a comprehensive stroke center: The magnitude of the problem. J. Stroke Cerebrovasc. Dis. 27, 2738–2745 (2018).
56. N. Goyal, S. Male, A. Al Wafai, S. Bellamkonda, R. Zand, Cost burden of stroke mimics and transient ischemic attack after intravenous tissue plasminogen activator treatment. J. Stroke Cerebrovasc. Dis. 24, 828–833 (2015).
57. Z. Zhelev, G. Walker, N. Henschke, J. Fridhandler, S. Yip, Prehospital stroke scales as screening tools for early identification of stroke and transient ischemic attack. Cochrane Database Syst. Rev. 4, CD011427 (2019).
58. J. J. Li, P. J. Bickel, M. D. Biggin, System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ 2, e270 (2014).
59. M. Jovanovic et al., Immunogenetics. Dynamic profiling of the protein life cycle in response to pathogens. Science 347, 1259038 (2015).
60. G. C. O’Connell et al., Use of high-sensitivity digital ELISA improves the diagnostic performance of circulating brain-specific proteins for detection of traumatic brain injury during triage. Neurol. Res. 42, 346–353 (2020).
61. G. C. O’Connell et al., Diagnosis of ischemic stroke using circulating levels of brain-specific proteins measured via high-sensitivity digital ELISA. Brain Res. 1739, 146861 (2020).
62. J. Lambert et al., Development of a high sensitivity 10-plex human cytokine assay using Simoa Planar Array technology. J. Immunol. 202, 52.21 (2019).
63. G. C. O’Connell, C. G. Smothers, C. Winkelman, Bioinformatic analysis of brain-specific miRNAs for identification of candidate traumatic brain injury blood biomarkers. Brain Inj. 34, 965–974 (2020).
64. A. Karnovsky et al., Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data. Bioinformatics 28, 373–380 (2012).

O’Connell et al. PNAS | August 25, 2020 | vol. 117 | no. 34 | 20775