Phylogenetics of Freshwater Sculpin
Charlene Emerson

Introduction. Reconciling observed genetic variation with evolutionary history is often a daunting task. Phylogenetics is a method of study that is particularly well-suited to addressing this task by combining the measurable genetic variability of individuals, populations, or species with proposed evolutionary relationships and processes (Barraclough & Nee 2001). Historically, phylogenetic trees have been generated using a single locus approach, where variation within a single gene is used to create a phylogenetic gene tree. However, this single gene method may fail to account for the full variation across a genome.

Shortcomings of the single locus approach are most apparent in complex organisms, such as the freshwater sculpin (genus Cottus). Freshwater sculpin exhibit highly ambiguous morphology, as well as wide and overlapping distributions, resulting in considerable difficulty in species classification (Moyle 2002). Analyzing multiple genetic loci in such species, instead of a single locus, may provide a fuller picture of species variation.

I propose to use freshwater sculpin as a model to compare this modern multiple loci approach with the single locus approach. By comparing a species tree, generated by concatenating multiple nuclear markers, with the single gene trees, I hope to achieve greater understanding of evolutionary relationships within seven Eastern Pacific Cottus species, and to discern the advantages and disadvantages of the single locus and multiple loci approaches.

Phylogenetics. Before comparing the relative merits of these different approaches, it is important to understand the overall phylogenetic method. Phylogenetics compares the similarities and differences between homologous sequences of genetic material, which are sequences derived from a common ancestor (Scotland 2010). Closely related individuals, populations, or species should have fewer differences in the homologous sequence than distantly related species (Lemey, Salemi & Vandamme 2009). These differences in the genetic sequence arise from the actions of several evolutionary mechanisms: mutation, natural selection, genetic drift, and gene flow (Hartl 1981). Genomic differences arising from these mechanisms are often the basis for phylogenetic analysis (Davis & Nixon 1992). In the presence of these processes, the frequency of genetic alleles¹ will vary and a population will experience evolutionary change (Nei, Maruyama & Chakraborty 1975).

¹ Alleles are different forms of a single gene. Different alleles may create different traits in the organism in which they are carried.

Mutations, or changes in genomic sequence which often occur spontaneously, are propagated or eliminated by the action of natural selection. If a mutation is deleterious to a species, natural selection causes it to be eliminated from the population. If the mutation is favorable, natural selection will allow it to become established in a population. However, a harmful mutation may propagate by the action of genetic drift. Genetic drift is the changing of allelic frequencies due to random occurrences. Effects of genetic drift are more marked in small populations, since deleterious alleles can become more easily fixed within a smaller population (Frankham 2005). Genetic drift and natural selection are generally considered to increase genetic differences between populations, eventually leading to speciation². In contrast, gene flow is generally considered to decrease the occurrence of speciation (Via 1999, Porter & Johnson 2007). Gene flow is the transfer of genetic alleles between populations of a species. Isolated populations experience decreased gene flow, because fewer individuals immigrate to or emigrate from the population to exchange genes. Over time, gene flow may cease and the isolated population could become reproductively isolated from the rest of its species. The absence of gene flow is considered to be a major mechanism of evolution. Together, these four major processes change the allele frequencies of a population, altering the relative relatedness of populations and species. Studying this evolutionary relatedness in the context of genetic data is one of the major applications of phylogenetics.

² Speciation is the evolutionary process by which new species are formed.

Phylogenetic Trees. The phylogenetic tree is a major tool of phylogenetic analysis. One of the most basic methods of tree-making uses a single homologous sequence (a single locus tree) and compares this sequence across members of a group, within or between populations, or between closely related species. These single locus trees can then be used to determine the overall evolutionary relatedness between members of the sample group. However, an analysis that stops with examination of a single homologous sequence would be extremely limited in scope. Limiting analysis to a single locus tree will only represent the events occurring at that gene locus, and may not necessarily represent the entire organism or species.

Analysis of multiple genetic loci is vital for a more complete understanding of evolutionary relationships, especially when considering that differing homologous sequences are subjected to differing evolutionary pressures (Brito & Edwards 2009). Because each homologous sequence changes differently, depending on evolutionary pressures exerted on that particular locus, analysis of individual sequences will create different trees for the same group of organisms (Avise 2000). For example, a gene for eye color will be selected upon very differently than a gene for limb structure, generating different phylogenetic trees. By combining, or concatenating, multiple loci into an overall species tree, rather than a single locus tree, the most accurate representation of evolutionary relationships can be determined.

Generally, phylogenetic trees are constructed using a cladistic approach, where it is assumed that members of the tree share a common evolutionary history (NCBI 2004). The cladistic approach groups members of the tree by shared common ancestry, with members of the tree gradually diverging into individual groupings, or clades. Phylogenetic trees commonly take two forms: cladograms or phylograms. Cladograms only show the order of the branching changes of the homologous sequence, whereas in phylograms, the length of each branch of the tree corresponds to the number of changes that have occurred in the sequence (Hall 2004, Lemey et al. 2009).

Determining Accuracy of the Phylogenetic Diagram. Phylogenetic tree diagrams can also vary according to the statistical analysis used to generate the tree. Genomic sequences can differ as a result of a number of processes, making analyses of sequence changes complex. Components of the sequence cannot always be assumed to change with equal likelihood. For example, genetic changes that result in the production of a similarly functioning protein would seem more likely than changes that completely negate a protein’s function.

Additionally, multiple changes may happen at a site, with no way of knowing the total number of mutation occurrences. Changes may also occur that later revert back to the original sequence, causing organisms to seem more evolutionarily related than they actually are (Hall 2004, Lemey et al. 2009). When multiple changes occur at a nucleotide site, so that the sequence is no longer informative about true evolutionary relationships, it is called substitution saturation (Lemey et al. 2009).

Substitution rates of nucleotide sites are modeled by several mathematical formulas, with each model representing a different relative rate of change. Models are chosen for a specific data set using statistical selection software (Posada 2008). Among the most popular are MODELTEST and JMODELTEST. Model tests analyze the nucleotide sequences in the data set and select the model of nucleotide substitution that best fits the existing data.

The simplest model of nucleotide substitution is the Jukes-Cantor model, commonly called JC69 (Jukes & Cantor 1969). JC69 assumes equal nucleotide base frequencies and equal mutation rates for adenine, thymine, cytosine, and guanine³ in the genetic code. Models become more complex as other assumptions are made about the base sequence and substitution rate. Felsenstein (1981) created a model (F81) which

construct the phylogenetic tree. Several different methods have been developed to construct trees (Lemey et al. 2009, Rosenberg & Kumar 2001). Among these methods are the Maximum Likelihood (ML) and Bayesian analyses. Both are considered to be “discrete character” methods, where compared homologous sequences are aligned – each position is considered to be a “character” and the nucleotide in that position is a “state” (Lemey et al. 2009). Character-states are analyzed independently to determine relatedness between samples.

Maximum Likelihood examines different possible tree formations and searches for the most likely tree, according to a particular evolutionary model. Likelihood of possible trees is calculated, according to an algorithm, and the most likely tree is selected. An ML-generated tree can be supported by the use of bootstrap resampling. Bootstrapping takes a subsample of character-states and creates a tree based upon this subsample. The bootstrapping process is replicated numerous times, providing support for the final chosen tree. In contrast, a Bayesian analysis does not search for a single best tree. Bayesian