L11: Alignments & Evolution: MEGA

Biochem 711 – 2008

Table of Contents

Introduction
Acknowledgements
L11 Exercise A: Set up
  1. Launch MEGA
  2. Retrieve Sequence
L11 Exercise B: BLAST and Align within MEGA
  1. Launch MEGA web browser
  2. BLAST search within MEGA
    2.1. Paste sequence
    2.2. Select the database to be searched
    2.3. Optimization algorithm: blastn
    2.4. Press BLAST
    2.5. BLAST results
    2.6. Selecting results for alignment
  3. Preparing the Alignment within MEGA
    3.1. Add first sequence to alignment
    3.2. Add additional sequence to be aligned
    3.3. Save the current list
  4. Create the Alignment
    4.1. Algorithm: ClustalW
    4.2. Perform the alignment
    4.3. Adjustments to the Aligned Sequences
    4.4. Adjustments to the Alignment
L11 Exercise C: Calculate a Neighbor-Joining Tree
  1. Open alignment file
  2. Activate Neighbor-Joining
L11 Exercise D: Precision in Acquiring and Aligning Sequences
  1. Acquiring Query Sequence
  2. BLAST within MEGA
    2.1. Set-up
    2.2. BLAST results
  3. Build the alignment list
    3.1. Edit Sequence Names
    3.2. Edit START codons
  4. Translate to Protein
  5. Set parameters and calculate protein alignment
  6. Alignment adjustments
  7. Export alignment in DNA and protein forms
  8. Eliminate duplicate sequences
  9. Eliminate inadequate sequences
    9.1. Biology and Structure of the protein
    9.2. Remove sequence
  10. Estimate reliability of alignment with Average AA Identity
L11 Exercise E: Neighbor-Joining Phylogenetic Tree, Rooting
  1. Create Neighbor-Joining Tree
  2. Estimating the reliability of a tree: Bootstrapping
  3. Tree Rooting
    3.1. Finding an outgroup
    3.2. Rooting the tree
L11 Exercise F: End of laboratory

Introduction

✔ INFO http://en.wikipedia.org/wiki/ /Evolution, /Phylogenetics, /Phylogenetic_tree

In biology, evolution refers to changes in the inherited traits of a population of organisms from one generation to the next. Genes that are passed on to an organism's offspring produce the inherited traits that are the basis of evolution.

Phylogenetics is the study of evolutionary relatedness among various groups of organisms (e.g., species, populations).
A phylogenetic tree, also called an evolutionary tree, is a tree showing the evolutionary relationships among various biological species that are believed to have a common ancestor. In a phylogenetic tree, each node with descendants represents the most recent common ancestor of those descendants, and the edge lengths in some trees correspond to time estimates. Each node is called a taxonomic unit.

Taxonomy is the classification of organisms according to similarity.

Although phylogenetic trees produced on the basis of sequenced genes or genomic data in different species can provide evolutionary insight, they have important limitations. They do not necessarily represent the species' evolutionary history accurately.

✔ INFO http://en.wikipedia.org/wiki/Multiple_sequence_alignment

A multiple sequence alignment is a sequence alignment of three or more biological sequences. In general, the input set of query sequences is assumed to have an evolutionary relationship by which the sequences share a lineage and are descended from a common ancestor. From the resulting multiple sequence alignment, sequence homology can be inferred (homology refers to any similarity between characters that is due to their shared ancestry) and phylogenetic analysis can be conducted to assess the shared evolutionary origins amongst the sequences.

In practical terms, a tree is constructed from a multiple alignment of homologous sequences. The quality of the alignment is the most influential factor for the calculated trees.

In these exercises we will use the MEGA software, which can retrieve sequences, create a multiple sequence alignment with the Clustal algorithm, and calculate a tree with various methods.
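To make the link between an alignment and a tree more concrete, the sketch below computes pairwise p-distances (the proportion of aligned sites that differ) from a small alignment. A distance table of this kind is the typical input to distance-based tree methods such as Neighbor-Joining. This is only an illustration and not MEGA's own implementation: the three sequences are invented, the function names are hypothetical, and skipping gap columns pair by pair is one assumption among several possible gap treatments.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (an assumption; other gap treatments exist).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment: dict) -> dict:
    """Return {(name1, name2): p-distance} for every pair of sequences."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


# Invented toy alignment, for illustration only.
toy_alignment = {
    "seqA": "ATGGCT-AAC",
    "seqB": "ATGGCTGAAC",
    "seqC": "ATGACTGATC",
}

matrix = distance_matrix(toy_alignment)
for (m, n), d in matrix.items():
    print(f"{m} vs {n}: {d:.3f}")
```

A misplaced gap changes which residues are compared in each column, and therefore changes these distances, which is one way to see why alignment quality dominates the resulting tree.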
Quoting from the web site http://www.megasoftware.net/

MEGA 4: Molecular Evolutionary Genetics Analysis

MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses.

References:

Kumar S, Dudley J, Nei M & Tamura K (2008) MEGA: A biologist-centric software for evolutionary analysis of DNA and protein sequences. Briefings in Bioinformatics 9: 299-306.

Tamura K, Dudley J, Nei M & Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599.

Acknowledgements

This laboratory is loosely inspired by Barry G. Hall’s book “Phylogenetic Trees Made Easy: A How-to Manual, Third Edition”.

L11 Exercise A: Set up

MEGA 4 has been tested on the following Microsoft Windows® operating systems: Windows 95/98, NT, 2000, XP, and Vista. If you are working from home, you can download MEGA from http://www.megasoftware.net/

Your DMC computer should be running in Windows mode. If not, ask for help.

1. Launch MEGA

✔ TASK Launch MEGA with the menu cascade Start > All Programs > Mega 4

2. Retrieve Sequence

✔ INFO The first complete E. coli genome was announced on Sept 5th 1997 in the journal Science¹ as a milestone in complete genome elucidation. In the following example we will use the nuoL gene, defined as “NADH:ubiquinone oxidoreductase, membrane subunit L” within the complete genome entry.

Note: database searches now often find complete genomes. A particular gene is then referenced by the “begin” and “end” values that specify a sequence “region.” For example, the nuoL gene is defined within the complete genome of K12, which has the accession value CP000948:

  LOCUS       CP000948    1842 bp    DNA    linear    BCT 05-JUN-2008
  DEFINITION  Escherichia coli str. K12 substr. DH10B, complete genome.
  ACCESSION   CP000948 REGION: 2482992..2484833
  VERSION
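The “begin” and “end” values of the region are inclusive, so the length of the gene is end - begin + 1. The short sketch below checks this against the 1842 bp stated on the LOCUS line. The helper names parse_region and region_length are hypothetical, and the regular expression only assumes the "REGION: begin..end" layout shown in the record above.

```python
import re


def parse_region(accession_line: str) -> tuple:
    """Extract the (begin, end) coordinates from an ACCESSION line
    that carries a 'REGION: begin..end' annotation."""
    match = re.search(r"REGION:\s*(\d+)\.\.(\d+)", accession_line)
    if match is None:
        raise ValueError("no REGION begin..end values found")
    begin, end = (int(value) for value in match.groups())
    return begin, end


def region_length(begin: int, end: int) -> int:
    """Length in bases of an inclusive begin..end region."""
    return end - begin + 1


line = "ACCESSION CP000948 REGION: 2482992..2484833"
begin, end = parse_region(line)
length = region_length(begin, end)
print(begin, end, length)  # 2482992 2484833 1842
```

The result, 1842, matches the sequence length reported on the LOCUS line, which confirms that the record describes only the nuoL region and not the whole genome.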
