375 Years of Science at Harvard. Focus: Illuminating Science

From Datasets to Diseases: Pardis Sabeti’s search for signals of selection in the human genome

By Elizabeth Byrne

Eight years ago, scientists accomplished a huge feat that would change the face of biology by offering a key to unlock an understanding of human life. For the first time, the human genome was sequenced. Following the completion of the Human Genome Project in 2003, the genomes of thousands of people around the world were analyzed (8). Given the complexity of the human genetic code, the daunting task of assimilating a large dataset buried in the sequence remained a formidable challenge.

Given these sequences of A, T, C and G bases, Pardis Sabeti and her lab realized that they could identify loci in the human genome where natural selection has acted on human populations. Sabeti, now an Assistant Professor in Organismic and Evolutionary Biology at Harvard and an associate member of the Broad Institute of Harvard and MIT, has devised algorithms that scan the genome for signals of positive selection based on fundamental biological concepts (3). The results have suggested that disease and environmental pressures have shaped human populations genetically.

[Image credit: http://www.nlm.nih.gov/medlineplus/ency/imagepages/1494.htm]

As Sabeti explains, “Everything we do is based on the very simple and elegant principle of natural selection laid out by Darwin and Wallace – if a trait emerges that enhances the survival or reproductive success of its carriers, it will spread through the population.” The basic idea behind testing for natural selection relies on the principle of looking for mutations in the human genome that are new, but common. Sabeti looks for a newly arising variant, or “derived allele,” that confers selective benefit; a derived allele is one that occurs through a mutation event and is different from the wildtype allele that had been most prevalent in the population. If the derived allele causes an individual to have a selective advantage, that individual should have more offspring and will pass along the allele to at least some of their children. The children, in turn, will have a selective advantage and pass the allele along to their children. In this way, an advantageous allele will be widespread in a population to an extent that is disproportionate to how long it has been around (5).

To test the prevalence of a derived allele in a population, people within the population are genotyped and the percentage of people who have the mutant versus the wildtype allele is calculated. However, the method of testing whether a mutation is new is a more difficult proposition.

One intuitive way to determine whether an allele has been selected for is to identify mutations that are distributed disproportionately among populations. These mutations can be pinpointed by looking for alleles that have a very high frequency in one population compared to others (5). This phenomenon may signal some kind of environment-specific selective pressure acting on only one population. There are, however, specific clues of selection contained within the genome itself, and uncovering these clues is Sabeti’s focus.

Sabeti employed a basic idea in biology, that of homologous recombination during meiosis. In the splitting of the genetic material to form haploid sperm and egg cells, the two homologous chromosomes undergo crossing over to exchange information, a process that is responsible for the genetic variation

between siblings. If two loci on a chromosome are closer together, it is less likely that the crossover will occur between those loci, and these loci are said to be linked. If they are further apart, it is more likely that they will get split up during recombination. If two loci are sufficiently far apart, or are on different chromosomes altogether, they will not be linked during crossing over (5).

In examining evidence for the recent origin of an advantageous mutation, one can look at other variants nearby on the chromosome. When the mutation of interest is passed on to the next generation, the children will also inherit the flanking alleles. (A useful class of variants to study is single nucleotide polymorphisms, or SNPs, pronounced “snips.”) The pattern of alleles constitutes a sort of fingerprint, which is referred to as a haplotype. Over many generations, as homologous recombination happens in each generation, the length of the haplotype decreases because, over a longer period of time, there is a greater chance that crossing over will happen within the original haplotype block. Thus, the age of a mutation can be investigated by comparing the length of the haplotype among individuals who carry the mutation.

To find a new and common derived allele, therefore, the goal is to identify a long haplotype that surrounds the allele in a large proportion of individuals in the population (5).

Figure 1. Under natural selection, a new beneficial mutation will rise in frequency (prevalence) in a population. A schematic of polymorphisms along a chromosome, including the selected allele, before and after selection. As a new positively selected allele rises to high frequency, nearby linked alleles on the chromosome ‘hitchhike’ along with it to high frequency, creating a ‘selective sweep.’ Credit: http://www.nature.com/scitable/topicpage/evolutionary-adaptation-in-the-human-lineage-12397

With this theory in mind, the Sabeti lab has developed a variety of algorithms, such as the long-range haplotype (LRH) test and the cross-population extended haplotype homozygosity (XP-EHH) test, to measure haplotype length (6). The lab then combined the tests into the composite of multiple signals (CMS), which can further zoom in on a region that contains a signal of selection (4). A long haplotype covers many alleles that could be the causative agents of natural selection, so CMS helps focus on which specific alleles are good candidates for being the real target of selection in the selective sweep.

Sabeti’s work comes at a key moment in which genetics and computational biology are intersecting. “The genomic age is a tremendously exciting time for researchers like myself,” Sabeti says. “With millions and billions of datapoints to mine for patterns, computational biologists are like kids in a candy store with the endless possibilities to pursue.” The application of biologically inspired computational approaches to massive datasets of human genetic information has yielded exciting signals of selection in the human genome. Following the initial sequencing of the human genome, more and more data was generated from endeavors such as the 1000 Genomes Project and the International HapMap Project, which found allelic variants within human populations by publishing the alleles of individuals across the globe (1, 2).

After only a few years, this research has produced results that suggest the potential power of using computational biology to understand the evolution of human populations. The Sabeti lab is using signals hidden in the code of the genome to go on to study the interplay between humans and their environment, particularly within the realm of infectious diseases. The lab’s tasks, however, do not end with elucidating signals of selection in the genome. Sabeti is now following up on alleles that may have had an important impact on human evolution and is trying to figure out the story of selection behind them. An especially intriguing direction of investigation for the lab is elucidating the mark that the battle between infectious diseases and human populations has left on the human genome. “Infectious diseases are some of the most intriguing evolutionary pressures on humans,” Sabeti thinks, since “[t]hey have had such a tremendous impact on our evolution, and are themselves evolving over time in a continual arms race for survival.”

Some of the areas in the genome that were hits for the test of natural selection did not come as a surprise. One of the

most widely known examples of human evolution is a mutation in the Hemoglobin B (HBB) gene, which causes sickle cell anemia when an individual has two copies of the mutated gene; however, if the individual is heterozygous, that is, has one wildtype and one mutated copy, he or she will be resistant to malaria. Thus, as expected, in populations from areas in which malaria is endemic, the region of the genome containing this mutation is a potential target of selection based on Sabeti’s algorithms (7).

Other regions in which a source of selective advantage is unknown also seemed to exhibit characteristics of selection. The story of selection is still a mystery in these regions, and the lab is currently addressing the challenges of figuring out which particular mutation is causing the signal of selection and tracing its role in human evolution. The computational biology problem of developing algorithms turned into a pursuit of loci that, based on the characteristics of being new, common, and differentiated between populations, seemed to be under positive natural selection in the human genome. The lab is now investigating two genes in particular, LARGE and EDAR, which seem to have been integral in shaping the populations of West Africa and East Asia, respectively (6).

Based on these tests, LARGE is one gene that seems to be under strong positive natural selection in some African populations. Based on preliminary functional characterization of the gene so far, Sabeti and her team hypothesize that LARGE is involved in resistance to Lassa fever virus, the cause of a hemorrhagic fever endemic to West Africa. EDAR, another gene that appears to be under selection in China and other areas of [East Asia, is involved in the development of] hair follicles, sweat glands, and teeth. The mutation may give individuals who carry the allele better thermoregulation, helping them acclimate to their environment (6).

To understand exactly how these genes, and others, have functioned to shape human evolution, the lab is following up on each mutation to assess its function. Investigations are also under way in the specific regions where each gene’s activity has been selected for, including Nigeria for LARGE and China for EDAR.

In just a few years, the lab has transformed the way we understand natural selection. Through this novel approach, we can now rewind and retrace the tape of human history to understand what forces have shaped our populations. Moreover, we can understand how our bodies have naturally evolved to deal with the pressures of disease, potentially through understanding LARGE, or a challenging environment, as seen in the case of EDAR.

Figure 2. Virions of Lassa virus, the cause of Lassa fever, a hemorrhagic fever that is endemic to West Africa and may have shaped human evolution at the gene LARGE, which Sabeti identified in her tests for natural selection in the human genome. Credit: http://en.wikipedia.org/wiki/File:Lassa_virus.JPG

Elizabeth Byrne ‘14 is a Human Developmental and Regenerative Biology concentrator in Leverett House.

References

1. “About the 1000 Genomes Project.” 1000 Genomes: A Deep Catalog of Human Genetic Variation. 1000 Genomes, 2008-2010. Web. 25 Oct. 2011.
2. “About the International HapMap Project.” International HapMap Project. International HapMap Project, 26 Oct. 2006. Web. 25 Oct. 2011.
3. Davis, Nicole. “Tracking Our Traits: New Approach Specifies Evolution’s Footprints in the Human Genome.” Harvard Gazette, 7 Jan. 2010. Web. 25 Oct. 2011.
4. Grossman, S.R., I. Shlyakhter, et al. (2010). “A composite of multiple signals distinguishes causal variants in regions of positive selection.” Science 327: 883-886.
5. Sabeti, P.C., S.F. Schaffner, et al. (2006). “Positive natural selection in the human lineage.” Science 312: 1614-1620.
6. Sabeti, P.C., P. Varilly, et al. (2007). “Genome-wide detection and characterization of positive selection in human populations.” Nature 449: 913-918.
7. Sabeti, P.C. (2008). “Natural Selection: Uncovering Mechanisms of Evolutionary Adaptation to Infectious Disease.” Nature Education 1(1).
8. U.S. Department of Energy Genome Program’s Biological and Environmental Research Information System (BERIS). “Human Genome Project Information.” Genomics.energy.gov. U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program, 25 July 2011. Web. 25 Oct. 2011.

Harvard Science Review, fall 2011