Evolution of Genomic Structural Variation and Genomic Architecture in the Adaptive Radiations of African Cichlid fishes
ORIGINAL RESEARCH ARTICLE
published: 03 June 2014
doi: 10.3389/fgene.2014.00163

Shaohua Fan and Axel Meyer*
Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, Konstanz, Germany

Edited by: Philine G. D. Feulner, Max Planck Institute for Evolutionary Biology, Germany
Reviewed by: Harald Schneider, Natural History Museum, UK; Feng Zhang, Fudan University, China; Arne W. Nolte, Max Planck Society, Germany
*Correspondence: Axel Meyer, Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, Universitätstraße 10, Konstanz 78464, Germany. e-mail: [email protected]

African cichlid fishes are an ideal system for studying explosive rates of speciation and the origin of diversity in adaptive radiation. Within the last few million years, more than 2000 species have evolved in the Great Lakes of East Africa, the largest adaptive radiation in vertebrates. These young species show spectacular diversity in their coloration, morphology and behavior. However, little is known about the genomic basis of this astonishing diversity. Recently, five African cichlid genomes were sequenced, including that of the Nile Tilapia (Oreochromis niloticus), a basal and only moderately diversified lineage, and the genomes of four representative endemic species of the adaptive radiations, Neolamprologus brichardi, Astatotilapia burtoni, Metriaclima zebra, and Pundamilia nyererei. Using the Tilapia genome as a reference genome, we generated a high-resolution genomic variation map, consisting of single nucleotide polymorphisms (SNPs), short insertions and deletions (indels), inversions and deletions. In total, around 18.8, 17.7, 17.0, and 17.0 million SNPs, 2.3, 2.2, 1.4, and 1.9 million indels, 262, 306, 162, and 154 inversions, and 3509, 2705, 2710, and 2634 deletions were inferred to have evolved in N. brichardi, A. burtoni, P. nyererei, and M. zebra, respectively. Many of these variations affected the annotated gene regions in the genome. Different patterns of genetic variation were detected during the adaptive radiation of African cichlid fishes. For SNPs, the highest rate of evolution was detected in the common ancestor of N. brichardi, A. burtoni, P. nyererei, and M. zebra. However, for the evolution of inversions and deletions, we found that the rates at the terminal taxa are substantially higher than the rates at the ancestral lineages. The high-resolution map provides an ideal opportunity to understand the genomic bases of the adaptive radiation of African cichlid fishes.

Keywords: SNPs, insertions, deletions, inversions, adaptive radiation, structural variation

INTRODUCTION

Cichlid fishes provide one of the most extreme examples of adaptive radiation in vertebrates (Kocher, 2004; Salzburger and Meyer, 2004; Kuraku and Meyer, 2008; Henning and Meyer, in press). More than 2000 species have evolved during the last few million years in the Great Lakes of East Africa (Meyer, 1993; Salzburger et al., 2005; Elmer et al., 2009). With >250 endemic species in Lake Tanganyika, >800 species in Lake Malawi and >500 species in Lake Victoria, these are the largest adaptive radiations in vertebrates (Meyer et al., 1990; Stiassny and Meyer, 1999; Kocher, 2004; Henning and Meyer, in press). Lake Tanganyika is maximally 7–8 million years old, and is the oldest of these lakes (Sturmbauer et al., 1994). Lake Malawi is younger, at about 2–4 million years, and the current form of Lake Victoria is probably less than 100,000 years old. The species of these lakes, particularly those of Lakes Malawi and Victoria, are extremely young, yet spectacularly diverse in morphology, coloration and behavior. Since most species of Lakes Malawi and Victoria can be hybridized in the laboratory for forward genetic studies, we have referred to them as “natural mutants” that can be used for studying genomic diversification by natural and sexual selection (Meyer, 1993; Meyer et al., 1993, 1995; Kuraku and Meyer, 2008; Henning and Meyer, in press). A large body of work, including transcriptome sequencing (Salzburger et al., 2008; Lee et al., 2010; Baldo et al., 2011; Gunter et al., 2013), BAC library construction (Lang et al., 2006), candidate gene sequencing (Terai et al., 2002, 2006; Hofmann et al., 2009; Fan et al., 2011), and microarrays (Gunter et al., 2011; Loh et al., 2013), has been conducted in an effort to study the molecular basis of the adaptive radiation of African cichlid fishes (reviewed by Fan et al., 2012).

Recently, the cichlid genome consortium sequenced five African cichlid genomes, including the Nile Tilapia (Oreochromis niloticus), Neolamprologus brichardi (endemic to Lake Tanganyika), Astatotilapia burtoni (lives in and around Lake Tanganyika), Metriaclima zebra (endemic to Lake Malawi), and Pundamilia nyererei (endemic to Lake Victoria) (Brawand et al., submitted). The analysis of these five African cichlid genomes shows that their rapid evolution is associated not with one, but with multiple mechanisms, including an excess of gene duplications, transposable element expansions, fast evolution of conserved non-coding elements, and the evolution of novel micro RNAs (Brawand et al., submitted). However, genomic
However, genomic www.frontiersin.org June 2014 | Volume 5 | Article 163 | 1 Fan and Meyer Genome evolution in African cichlid fish work on cichlid diversification has lacked so far a compre- to exclude reads less than 50 bp in length with parameters -q hensive comparative analysis of large-scale genomic variation. 20, -l 50. Ordered by size, genomic variation can take on the form of We processed the reads from the paired-end and mate-paired single nucleotide polymorphisms (SNPs), short insertions and libraries separately. The overlapping paired-end reads (insertion deletions (indels), and larger structural variation (SVs, normally size: 180 bp, sequencing length: 100 bp) were first trimmed and > 50 bp). Furthermore, SVs can be classified as insertions, weonlykeptthefirstandthelast50bpinthefirstandsecond inversions, deletions, duplications and translocations (Alkan read using fastx_trimmer of the Fastx toolkit (version 0.0.13) et al., 2011a). The advent of next generation sequencing (NGS) (http://hannonlab.cshl.edu/fastx_toolkit/). For the mate-paired technologies has revolutionized the study of genomic variation libraries (insertion size 3000 bp, sequencing length: 36 bp), we (Alkan et al., 2011a), as high-density maps were generated for first mapped the raw reads to their corresponding genomes using various model systems. Such maps, reveal the presence of a wide BWA (version 0.7.3a-r367) (Li and Durbin, 2009)andexcluded spectrum of variation and are of important for gaining a deeper the read pairs that are facing each other, as these reads could understanding of phenotypic diversification and speciation be potential contaminations of paired-end reads in the mate- from a genomic perspective (Quinlan et al., 2010; Elmer and paired libraries. The remaining mate-paired reads were reverse- Meyer, 2011; Mills et al., 2011b; Zhan et al., 2011; Jones et al., complemented, therefore the orientation of the mate pair reads 2012b; Feulner et al., 2013; Zichner et al., 2013). 
For example, are as same as the paired-end reads, to fit the requirements of the recent studies have shown that adaptive evolution in three spine software in the downstream analyses. stickleback is associated with the reuse of standing variations, The filtered paired-end and mate-paired reads from Tilapia, but also with SVs such as chromosomal inversion and deletions N. bricharid, A. burtoni, P. nyererei,andM. zebra genomes were (Chan et al., 2010; Jones et al., 2012a,b; Feulner et al., 2013). mapped to the anchored Tilapia genome using Burrows-Wheeler In this study, using the recently sequenced five African cich- Aligner (BWA) with the default parameters (version 0.7.3a-r367) lid genomes, we investigated patterns of genomic variation that (Li and Durbin, 2009). Although mapping short reads against accompany the adaptive radiation of African cichlids. The aims a relative distantly related (around 4% sequence divergence in of the present study were 3-fold: first, we characterized the preva- coding regions) outgroup can be a challenge for BWA, the ref- lence and locations of the genomic variation in these five African erence genome is equidistant to N. bricharid, A. burtoni, P. nyer- cichlid genomes. Second, we analyzed the variation between these erei,andM. zebra, thus would not bias the placement of the genomes in a phylogenetic context, which enabled us to gain a reads and not affect the downstream analyses. The raw mapping deeper understanding of questions such as: when did this varia- results were converted to BAM format and ambiguously mapped tion originate? How do processes such as natural selection operate reads removed by requiring a mapping quality ≥ 20 using on different types of variation? Third, we inferred the poten- Samtools (version 0.1.19-44428cd) (Li et al., 2009). Duplicated tial functional impact of this variation using gene annotation read pairs were removed using the MarkDuplicates in the Picard information. 
These analyses of the genomic variation will not toolkit (version 1.92) (http://picard.sourceforge.net/). The fil- only enable us to assess the impact of the genomic differentia- tered bam files from the former steps were utilized for variation tion on the functional portions of the genome, but also elucidate detection.
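The read-preparation rules described above (trimming the overlapping paired-end reads, discarding mate pairs that face each other, reverse-complementing the remaining mate-pair reads, and requiring a mapping quality ≥ 20) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which relied on fastx_trimmer, BWA, Samtools, and Picard. The `Alignment` record and all function names are hypothetical, the trimming assumes that "the first and the last 50 bp" refer to the first and second read respectively, and the orientation test is simplified to a comparison of leftmost mapping positions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Alignment:
    """Hypothetical, simplified alignment record (not a BWA or Samtools type)."""
    chrom: str
    pos: int          # leftmost mapped position on the reference
    is_reverse: bool  # True if the read maps to the reverse strand
    mapq: int         # mapping quality reported by the aligner


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_overlapping_pair(read1: str, read2: str, keep: int = 50) -> Tuple[str, str]:
    """Keep the first `keep` bases of read 1 and the last `keep` bases of read 2.

    Assumption: this is one reading of the trimming rule in the text.
    """
    return read1[:keep], read2[-keep:]


def is_inward_facing(a: Alignment, b: Alignment) -> bool:
    """True if the two mates point toward each other (paired-end-like layout).

    Simplification: only strands and leftmost positions are compared.
    """
    if a.chrom != b.chrom or a.is_reverse == b.is_reverse:
        return False
    forward, reverse = (b, a) if a.is_reverse else (a, b)
    return forward.pos <= reverse.pos


def passes_mapq(aln: Alignment, min_mapq: int = 20) -> bool:
    """Keep only reads whose mapping quality is at least `min_mapq`."""
    return aln.mapq >= min_mapq
```

Under these assumptions, a mate pair would be dropped when `is_inward_facing` returns true, both reads of a retained mate pair would be passed through `reverse_complement`, and any read failing `passes_mapq` would be removed before variation detection.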