Genes and Language
Total Page:16
File Type:pdf, Size:1020Kb
43 Key Issues and Future Directions: Genes and Language SIMON E. FISHER Determining the molecular mechanisms that underlie of such mutations is clearly illustrated by the example human capacities for speech, language, and reading is of FOXP21 (Fisher & Scharff, 2009). The identification a vibrant and exciting initiative. The prior chapters of of FOXP2 preceded the completion of the first full draft this part comprehensively reviewed the state- of- the- art of the human genome sequence. It relied on analy sis of on mapping ge ne tic factors that contribute to lan- an unusual family (the KE family) in which 15 relatives guage- and reading- related skills (Luciano & Bates, displayed apparent monogenic inheritance of develop- chapter 39), discovery of mutations involved in speech mental speech and language prob lems across three disorders (Morgan, chapter 40), and deciphering the successive generations. Even with such a large family impacts of key genes and mutations on development available to study, the pro cess of uncovering the caus- and function of the ner vous system, in dif ferent model ative mutation was a lengthy one. It began with a screen systems (Vernes, chapter 41). Insights in this area come for genomic regions linked to the disorder (Fisher, not only from language itself, but also by studying ge ne- Vargha- Khadem, Watkins, Monaco, & Pembrey, 1998), tic influences on the associated neural architecture, followed by an intensive gene- by- gene search through most notably the lateralization of much of the relevant one par tic u lar section of chromosome 7, containing functional circuitry (Francks, chapter 42). Here, in this many unknown genes that had never been sequenced final chapter of the part, I lay out the promise and chal- before (Lai et al., 2000). Clues to limit the search came lenges of the coming years for tracing ge ne tic pathways from an unrelated child with a similar disorder, along involved in language, in light of rapid improvements in with a chromosomal rearrangement involving that the available research tools. In par tic u lar, I focus on same region. Eventually, researchers pinpointed the the dramatically reduced costs and enhanced feasibil- KE family mutation (Lai, Fisher, Hurst, Vargha- ity of large- scale genotyping and high- throughput DNA Khadem, & Monaco, 2001), a change of a single base in sequencing and what this will mean for the future land- a crucial part of FOXP2 (at that time, a novel human scape of studies in this area. I emphasize that the sub- gene that had not yet been described). Subsequent stantial advances in molecular technologies will need studies could then look specifically at FOXP2 in other to be matched by increased sophistication in our smaller families. Although these targeted investiga- approach to defining and characterizing language- tions successfully confirmed an etiological role, by related phenotypes. Thus, interdisciplinary connec- finding other mutations of this gene in in de pen dent tions among molecular biologists, psychologists, cases/families with a similar profile of speech and lan- linguists, and neuroscientists are set to become ever guage deficits, they also illustrated that such mutations more essential for pro gress in this fast- moving research are extremely rare (MacDermot et al., 2005). Indeed, endeavor. I will begin by considering rare mutations more than a de cade and a half after discovering FOXP2, and the impact of next- generation DNA sequencing, most cases of speech and language disorders remain before moving on to discuss common ge ne tic variants unexplained at the ge ne tic level (Deriziotis & Fisher, and the prospect of systematic studies of interindivid- 2017; Graham & Fisher, 2015). ual variation in language proficiency. In the meantime, the broader landscape of ge ne tic research has been dramatically transformed by the 1. Rare Gene Variants and How to Find Them development of novel technologies for reading off a person’s genomic sequence (Metzker, 2010). Traditional Rare gene variants that disrupt aspects of speech and methods, such as Sanger sequencing, are highly labori- language development offer power ful win dows into ous, time- consuming, and expensive. These standard neurobiological pathways (see Vernes, chapter 41). The approaches were used to assem ble the first draft fundamental scientific value afforded by the discovery sequence of the human genome, work that took the best 609 part of a de cade, requiring the efforts of tens of thou- but often families with language- related disorders are sands of researchers around the world, at a cost of bil- not large enough to be sure that the observed pattern lions of US dollars. So- called next- generation techniques represents a statistically significant finding (i.e., that (Goodwin, McPherson, & McCombie, 2016), massively the cosegregation is beyond that expected by chance). parallel sequencing of a great many DNA fragments at The KE family is exceptional in this regard (Fisher the same time, allow for an almost entire human genome et al., 1998). Sometimes a study will identify several dif- to be sequenced from a simple saliva sample in a matter fer ent candidate mutations in a family, without con- of hours, at a cost of less than US$1,000 (based on esti- crete evidence to highlight a single true culprit. To mates in 2018, at time of writing). Another upcoming have high confidence that a gene is implicated in a generation of technologies involves direct sequencing of disorder, it is necessary to collate observations of this individual intact molecules of DNA from start to finish, same gene being disrupted by in de pen dent mutations without shearing up into fragments (Loose, 2017; Zaaijer in dif ferent families and/or cases. These factors make et al., 2017). So, as it becomes ever cheaper and quicker it challenging to identify completely novel etiological to readout the genomes of people with speech and lan- genes that have not already been confirmed as poten- guage disorders, how will this impact on our under- tial contributors to language- related disorders. standing of the ge ne tic under pinnings? To distinguish true causative mutations from benign In princi ple, the routine availability of whole genome variation in an affected family, cosegregation with the sequencing should make it considerably easier to study disorder is impor tant, but not sufficient. We also need pedigrees, such as the KE family, in which there are to draw inferences about the likely functional relevance multiple relatives affected with speech- , language- , of any variant of interest (i.e., determining whether it and/or reading- related disorders across several genera- will disturb gene function). Although it is sometimes tions. Thus, one might expect a substantial accelera- pos si ble to make confident in silico predictions about tion in the discovery of novel causative mutations. In the disruptive potential of a ge ne tic variant, based on practice, even with these new molecular technologies, DNA sequence data alone, the ability to do this varies pro gress in uncovering susceptibility genes has been considerably for dif fer ent parts of the genome. We are relatively slow, and it is worth considering some of the on firmest ground when it comes to the genomic explanations for this. Crucially, the majority of cases of regions that directly code for proteins. For example, a language- related disorders involve multifactorial etiol- DNA sequence variant in a protein- coding gene may ogy (Graham & Fisher, 2015), resulting from the com- lead to substitution of one amino acid for another at a bined effects of several, perhaps many, ge ne tic risk key point of the encoded protein, and if that substi- factors of small effect (as will be discussed in section 2). tuted amino acid has dif fer ent chemical properties it Researchers rely on a certain amount of serendipity to can distort the shape of the protein, preventing it from successfully track down those families that have truly carry ing out its normal cellular functions. (This is the monogenic forms, in which the disorder can be fully kind of mutation that was found in FOXP2 in the explained by a single ge ne tic mutation with a large affected members of the KE family; Lai et al., 2001.) effect size, also known as high penetrance. A pedigree Other well- studied types of mutation lead to truncation may show what appears to be a simple inheritance pat- of the encoded protein, so that it is missing crucial parts, tern at the phenotypic level, but this surface level view or may even stop the protein from being expressed at will not necessarily be reflected in the under lying biol- all (MacDermot et al., 2005). Importantly, the protein- ogy. Hence, it is not uncommon to come up empty- coding regions (collectively known as the exome) make handed after searching for a high- penetrance ge ne tic up only 1% to 2% of the ge ne tic makeup of a person— mutation in a family of interest, despite using the fast- the vast majority of the genome comprises noncoding est and most power ful sequencing techniques. DNA. Variants in noncoding regions can also play etio- Moreover, whole genome sequencing has revealed logical roles in a disorder, by disturbing the way that that each human individual carries a surprisingly large expression of a gene is regulated (Devanna et al., 2018), number of rare sequence variants that could poten- but our understanding of such effects is lagging behind tially disrupt gene function. It can be challenging to that for coding variants. Thus, even as it becomes triv- distinguish causative mutations from more benign ial to obtain whole genome sequences from people sequence variants in the genomic background (Der- affected with language- related disorders, and to accu- iziotis & Fisher, 2017). When a rare disruptive variant rately identify all the rare DNA variants that they carry, cosegregates with a disorder in multiple affected rela- we still face difficulties in interpreting the functional tives, this finding is consistent with an etiological role, significance of the variants, and this will impede 610 Simon E.