Large-Scale Informatic Analysis to Algorithmically Identify Blood Biomarkers of Neurological Damage
Grant C. O’Connellᵃ,¹, Megan L. Alderᵃ, Christine G. Smothersᵃ, and Julia H. C. Changᵃ

ᵃSchool of Nursing, Case Western Reserve University, Cleveland, OH 44106

Edited by Vincent T. Marchesi, Yale University School of Medicine, New Haven, CT, and approved July 9, 2020 (received for review April 23, 2020)

The identification of precision blood biomarkers which can accurately indicate damage to brain tissue could yield molecular diagnostics with the potential to improve how we detect and treat neurological pathologies. However, a majority of candidate blood biomarkers for neurological damage that are studied today are proteins which were arbitrarily proposed several decades before the advent of high-throughput omic techniques, and it is unclear whether they represent the best possible targets relative to the remainder of the human proteome. Here, we leveraged mRNA expression data generated from nearly 12,000 human specimens to algorithmically evaluate over 17,000 protein-coding genes in terms of their potential to produce blood biomarkers for neurological damage based on their expression profiles both across the body and within the brain. The circulating levels of proteins associated with the top-ranked genes were then measured in blood sampled from a diverse cohort of patients diagnosed with a variety of acute and chronic neurological disorders, including ischemic stroke, hemorrhagic stroke, traumatic brain injury, Alzheimer’s disease, and multiple sclerosis, and evaluated for their diagnostic performance. Our analysis identifies several previously unexplored candidate blood biomarkers of neurological damage with possible clinical utility, for many of which presence in blood is likely linked to specific cell-level pathologic processes. Furthermore, our findings also suggest that many frequently cited previously proposed blood biomarkers exhibit expression profiles which could limit their diagnostic efficacy.

molecular diagnostics | stroke | multiple sclerosis | traumatic brain injury | Alzheimer’s disease

Significance

The discovery and development of precision blood biomarkers which can accurately detect damage to brain tissue could transform how we diagnose and treat neurological pathologies. In this study, we used mRNA expression data generated from thousands of tissue samples to algorithmically evaluate nearly every protein-coding gene in the human genome in terms of potential to produce blood biomarkers for neurological damage based on expression profiles both across the body and within the brain. This unprecedented analysis identifies a plethora of previously unexplored candidate blood biomarkers which could have clinical utility for noninvasive diagnosis and monitoring of various common neurological conditions, including traumatic brain injury, stroke, and multiple sclerosis.

Author contributions: G.C.O. designed research; G.C.O. secured funding; G.C.O., M.L.A., and C.G.S. performed research; G.C.O. analyzed data; and G.C.O., M.L.A., C.G.S., and J.H.C.C. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

Published under the PNAS license.

¹To whom correspondence may be addressed. Email: [email protected].

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2007719117/-/DCSupplemental.

First published August 6, 2020.

20764–20775 | PNAS | August 25, 2020 | vol. 117 | no. 34 www.pnas.org/cgi/doi/10.1073/pnas.2007719117

Collectively, neurological disorders are the leading cause of disability and second leading cause of death worldwide (1). The identification and development of precision blood biomarkers of neurological damage could dramatically improve how we diagnose and treat these debilitating conditions, and ultimately reduce their burden. For example, it is well established that rapid and accurate diagnosis of acute neurological injuries such as stroke and traumatic brain injury during the early stages of care significantly reduces mortality and morbidity (2, 3). However, the symptom-based assessments that are currently used for recognition of such injuries during triage have limited accuracy, and up to 35% of patients are misdiagnosed at initial clinician contact (4–8). In these acute conditions, the development of biomarker-based screening tools with the ability to accurately detect neurological damage could substantially reduce rates of mistriage, enable earlier access to intervention, and improve patient outcomes (9). With respect to chronic neurodegenerative diseases such as Alzheimer’s disease and multiple sclerosis, developing accurate blood biomarkers of neurological damage could allow for more confident early diagnosis, noninvasive tracking of disease progression, and real-time monitoring of response to therapy (10, 11).

Due to its specialized function, the proteomic composition of the brain is highly unique relative to other organs. Cellular disruption of neural tissue results in the release of brain-specific proteins into the extracellular environment, and ultimately into peripheral circulation. Thus, the detection of these proteins in the blood can serve as a surrogate marker of neurological damage. From a logical perspective, candidate proteins most ideally suited to serve as such biomarkers are those which display three predominant properties. First, they should exhibit highly enriched expression in brain tissue relative to other tissues, ensuring specificity. Second, they should be highly abundant within brain tissue, as lowly expressed proteins may not be released into circulation at high enough levels to enable detection. Third, they should exhibit ubiquitous expression across all brain regions, reducing the risk of false negative diagnosis in the case of focal damage.

A large number of existing candidate blood biomarkers of neurological damage studied today are proteins which were arbitrarily labeled as being brain specific decades ago (12–19); however, in many cases, their degree of enrichment in brain tissue has been poorly validated, especially in humans. Furthermore, a majority of these proteins were proposed as biomarkers without consideration for brain abundance or expressional variability across brain regions. Additionally, because many of them were suggested before the widespread availability and use of high-throughput omic techniques, they have predominantly been studied in low-throughput investigations only considering a handful of targets. Due to the unsystematic manner in which these existing candidates have been proposed and investigated, it is currently unclear whether they represent the best possible biomarkers relative to the remainder of the human proteome.

Thus, our goal was to systematically search the protein-coding genome to identify genes with the highest potential to produce blood biomarkers of neurological damage. To do this, we leveraged mRNA expression data generated from nearly 12,000 human specimens to algorithmically evaluate over 17,000 protein-coding genes in terms of a novel biomarker suitability score accounting for brain enrichment, brain abundance, and brain regional variability. Then, to determine whether the top-ranked genes identified in our algorithmic analysis could code for proteins with the potential to provide more detailed diagnostic information regarding the specific cellular nature of pathology, we leveraged single-cell sequencing data generated from human brain tissue to determine which cell populations the top-ranked genes are expressed within. Finally, in order to directly evaluate their diagnostic potential, the circulating levels of proteins associated with the top-ranked genes were measured in blood sampled from a diverse cohort of patients diagnosed with a variety of acute and chronic neurological disorders, including ischemic stroke, hemorrhagic stroke, traumatic brain injury, Alzheimer’s disease, and multiple sclerosis.

Our collective analysis identifies several previously unexplored candidate blood biomarkers of neurological damage with potential clinical utility, for many of which presence in blood is likely linked to specific cell-level pathologic processes. Furthermore, our findings also suggest that

annotated (SI Appendix, Fig. S1). In terms of algorithmic ranking, only 100 genes remained after filtering based on brain fold enrichment cutoff. The relationships between biomarker suitability score, brain fold enrichment, brain abundance, and brain regional variability for these remaining genes are indicated in Fig. 1B. The highest ranked genes according to biomarker suitability score generally exhibited a combination of high brain enrichment, high brain abundance, and low regional variability, while lower ranked genes tended to exhibit lower levels of brain enrichment, lower brain abundance, and higher regional variability.

The genes associated with two well-studied candidate biomarkers of neurological damage, glial fibrillary acidic protein (GFAP) (23), and myelin basic protein (MBP) (24–26), ranked in the top of the analysis at first and seventh, respectively. The gene coding for neurofilament light chain (NfL), another previously proposed and increasingly studied neurological damage biomarker (27), ranked 68th in the analysis. However, the remaining 97 of the 100 top-ranked genes all coded for proteins which have