Leveraging Large Genomic Datasets to Illuminate the Pathobiology of Autism Spectrum Disorders
Total Page:16
File Type:pdf, Size:1020Kb
www.nature.com/npp NEUROPSYCHOPHARMACOLOGY REVIEWS OPEN Leveraging large genomic datasets to illuminate the pathobiology of autism spectrum disorders Veronica B. Searles Quick1, Belinda Wang1 and Matthew W. State1 “Big data” approaches in the form of large-scale human genomic studies have led to striking advances in autism spectrum disorder (ASD) genetics. Similar to many other psychiatric syndromes, advances in genotyping technology, allowing for inexpensive genome-wide assays, has confirmed the contribution of polygenic inheritance involving common alleles of small effect, a handful of which have now been definitively identified. However, the past decade of gene discovery in ASD has been most notable for the application, in large family-based cohorts, of high-density microarray studies of submicroscopic chromosomal structure as well as high-throughput DNA sequencing—leading to the identification of an increasingly long list of risk regions and genes disrupted by rare, de novo germline mutations of large effect. This genomic architecture offers particular advantages for the illumination of biological mechanisms but also presents distinctive challenges. While the tremendous locus heterogeneity and functional pleiotropy associated with the more than 100 identified ASD-risk genes and regions is daunting, a growing armamentarium of comprehensive, large, foundational -omics databases, across species and capturing developmental trajectories, are increasingly contributing to a deeper understanding of ASD pathology. Neuropsychopharmacology (2021) 46:55–69; https://doi.org/10.1038/s41386-020-0768-y 1234567890();,: INTRODUCTION The identification of more than 100 large-effect risk genes Autism spectrum disorder (ASD) refers to a group of neurodeve- and genomic regions has already led to important insights lopmental disorders (NDDs) defined by social communication into molecular mechanisms. From the earliest results of reliable deficits and restricted, repetitive patterns of behavior or interests gene discovery efforts, synaptic structure and function and [1]. The prevalence of the syndrome is currently estimated to be chromatin modification have been identified as the most ~1.7% in the US population [2]. It has been known for many common points of functional convergence [4, 5, 10–12]. decades that genetic risk plays a critical role in ASD etiology [3]. Nonetheless, there remain important obstacles to leveraging However, only in the past dozen or so years has the systematic gene discovery to develop an actionable understanding of ASD and reliable identification of specific genes conferring liability for pathophysiology. ASD-gained real traction. This transformation has been the result A key issue is differentiating disease mechanisms from the of “shared big data”, made possible by the sequencing of the highly pleiotropic biology encoded by ASD-risk genes. It is more human genome, advent of high-density microarrays, development straightforward to demonstrate a range of biological conse- of high-throughput sequencing technologies, application of quences of a given risk mutation (e.g., differential gene expression, rigorous statistical methods that account for vast numbers of changes in electrophysiological properties, alterations in cellular comparisons, availability of increasingly comprehensive neurobio- proliferation, differentiation, or migration), than it is to confirm logical databases, and prioritization of rapid data-sharing and that a particular observation is a contributor to the human large-scale collaborations across research groups. disorder. A systems biology approach to defining salient aspects A key outcome of this research has been the identification of a of ASD pathology involves using large datasets to identify the large and growing number of genes and regions that definitively intersection or convergence among multiple risk genes some- confer ASD risk [4, 5] and the illumination of the overall genetic where along the path from genes to behavior—for instance in architecture of the syndrome—that is the type and relative relation to molecular pathways, cell types, anatomical regions, distribution of genetic variation underlying the disorder [6] (Fig. 1). and/or developmental stages. Increasingly, -omics approaches Overall, the past decade of research has confirmed that alleles that have been employed in this effort, empowered by a growing are common in the typically developing population exert very armamentarium of large-scale foundational databases character- small individual effects, yet carry the majority of population risk izing region- and cell-specific transcriptional activity across [7]. However, among the ASD clinical population, a substantial species, developmental patterns of gene expression and regula- contribution from rare, typically de novo variants of large effect tion including in humans, and proteomic pathways [13]. The have also been identified [8, 9] (Fig. 2), which offer distinctive rationale for and challenges facing these complementary opportunities and challenges in the pursuit of underlying approaches to elaborating pathophysiology will be discussed biological mechanisms. after a review of the current state of gene discovery in ASD. 1Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94143, USA Correspondence: Matthew W. State ([email protected]) These authors contributed equally: Veronica B. Searles Quick, Belinda Wang Received: 9 May 2020 Revised: 26 June 2020 Accepted: 6 July 2020 Published online: 15 July 2020 © The Author(s) 2020 Leveraging large genomic datasets to illuminate the pathobiology of. VB Searles Quick et al. 56 a Rare inherited variant Common inherited variant Fathers Mothers Children Fathers Mothers Variants in 300,000 human genome rare inherited Children De novo variant 3,000,000 70 de novo Fathers Mothers Children common inherited b Synonymous variant Missense variant Protein-truncating variant Gene from mother Gene from mother Gene from mother DNA Gene from father Gene from father Gene from father RNA Protein from mother Protein from mother Protein from mother Protein Protein from father Protein from father c DNA RNA Protein Fig. 1 Types of genetic variants. a The majority of genetic variation in the human genome is common (population frequency ≥ 1%, blue). These variants are transmitted from parents to offspring via Mendelian inheritance patterns. A smaller proportion is rare (≤1%, purple) and also transmitted from parents. ∼70 variants are de novo (red), observed only in the child, but not in either parent. b The impact of single- nucleotide variants (SNVs) and small (≤50 bp) insertion/deletions (indels) depends on their location in the genome. In the 1.5% of the genome that encodes proteins (the exome), these variants can either be synonymous (no change to the resulting protein), missense (a single amino acid is changed in the protein with variable functional impact), or protein-truncating (leads to nonsense-mediated decay and no protein). Variants and their consequences (red stars) are shown on the father’s allele, but can also arise on the maternal allele. c Copy number variants (CNVs) are large (≥50 bp to millions of nucleotides) deletions (resulting in no protein), or duplications (potentially resulting in excess protein). Figure adapted from Sanders [81] with author permission. ASD GENE DISCOVERY opportunities for gene discovery largely as a consequence of an A confluence of factors has led to the maturation of the field of extremely high correlation between variation at a single genetic ASD genetics: the sequencing of the human genome; associated locus and a distinctive, reliable phenotype—often characterized rapid advances in -omics technologies, a focus on studying rare by intellectual disability, neurological findings and/or dysmorphol- and de novo mutations as well as common alleles, early successes ogy, as well as an increased risk for social impairment. of parent advocacy groups in promoting the creation of large- As early as the 1990s, the study of these “monogenic” forms of scale open genetic resources [14], and highly effective and now ASD began to reveal themes that foreshadowed results of future long-standing partnerships among advocacy groups, academia, research examining larger ASD populations. These include the the National Institutes of Health and philanthropy promoting importance of rare mutations, the involvement of both coding as open data sharing and large-scale scientific collaborations. well as noncoding variation, the finding that a highly diverse array of molecules can all lead to an ASD phenotype, often coincident Mendelian forms of ASD with intellectual disability, and the illumination of implicated High-throughput -omics technologies and associated analytic biological functions, including RNA binding, synaptic structure and approaches have been critical to the emergence of systematic, function, the mammalian target of the rapamycin (mTOR) path- reliable ASD gene discovery, particularly among individuals with way, and gene regulation, epigenetics, and chromatin modifica- common forms of the syndrome. Nonetheless, the first insights tion [21]. into the genetic architecture and biology of social disability can be However, before focusing on the role of big data in ASD gene traced to studies of well-described genetic syndromes that pre- discovery, the above distinction between “syndromic” or “Mende- date the -omics era. These paradigmatic disorders, including lian”