Multiple Sequence Alignments: Clustal Omega 1 2 3 4

08/12/2021 Multiple Sequence Alignments & Building Phylogenetic Trees • Basic BLAST: input query and search database for homologous sequences and see pair-wise matches (query/subject) Multiple Sequence Alignments: • Clustal Omega However, what if you want to specify the sequences to align? 1) Two sequences: NCBI BLAST — (bl2seq) 2) Three or more sequences: Clustal Omega Susan Parrish 3) Building trees from multiple sequence alignments: Clustal McDaniel College Simple Phylogeny Last Update: 08/2021 1 2 Clustal: Creating a Multiple Sequence Alignment Consensus Nucleotide Sequences • You SELECT sequences to align to identified conserved • A consensus nucleotide sequence is derived by making a nucleotides or amino acids multiple sequence alignment and calculating the most represented nucleotide at each position • What conserved sequences might you identify in a multiple • Provides insight into the functional regions of a given sequence alignment of DNA sequences? sequence (more important for function = conservation through evolution) • What conserved sequences might you identify in a multiple sequence alignment of amino acid sequences? • Example: If a DNA binding site is necessary to recruit a specific protein, that DNA sequence could be conserved during evolution • How could you use multiple sequence alignments to build phylogenetic trees? 3 4 Consensus Nucleotide Sequences: Prokaryotic -10 & -35 Promoter Consensus Promoters Sequence • The prokaryotic promoter consensus sequence • Determine consensus sequence found in promoter for RNA polymerase recruitment by multiple sequence alignments of • Closer to this sequence, more easily recognized by regions upstream of genes prokaryotic RNA polymerase • Closer a promoter is to consensus sequence, stronger the promoter in driving expression of gene (better at recruiting RNA polymerase) Sense • Individual promoters usually differ from the consensus at one or more positions 5 6 1 08/12/2021 Prokaryotic -10 & -35 Promoter Consensus A Eukaryotic Consensus Sequence Sequence Deep Thoughts: Any idea what this consensus sequence controls? 7 8 Proteins: Conserved Domains Proteins: Conserved Domains • Can find conserved domains in proteins by performing multiple sequence alignments of amino acid sequences • If a region of a protein has a particular function that is important (e.g., DNA binding domain/ protein-protein interaction domain/ enzymatic active site) will see conservation of amino acids within that domain • Conserved domains tend to fold independently of other parts of protein (therefore structure of domain conserved) • Appears that conserved domains have been “shuffled” during evolution to create new proteins 9 10 Many Proteins have Multiple Conserved Clustal Omega: Creating a Multiple Sequence Domains Alignment • Select sequences to align: • from NCBI or other sequence databases (e.g., FlyBase) • Nucleotide or amino acid • Obtain FASTA sequences and paste into text document • Click on the link for the EMBL-EBI Clustal Omega page: • https://www.ebi.ac.uk/Tools/msa/clustalo/ • Copy and paste selected FASTA formatted sequences into Clustal Omega window 11 12 2 08/12/2021 Clustal Omega: Clustal Omega: Creating the Initial Alignment Creating a Multiple Sequence Alignment 13 14 Generating the Sequence Files Generating the Sequence Files • Create a new text file • Prudent to save the text file after you add each – Using WordPad in MS Windows, TextEdit in macOS sequence to the file (in case of malfunction) • For each sequence you like to align: • After you have created the sequence file: – Retrieve the sequence in FASTA format – Select all the FASTA Formatted sequences in the – Copy and paste it into the text File, then hit return text document – Save the text File – Copy and paste the sequences into the “Enter your input sequences” text box in the Clustal Omega • Repeat the previous step until the text file window includes all the sequences you like to align 15 16 Clustal Omega: Enter Your Input Sequences Clustal Omega: The Alignment Report 17 18 3 08/12/2021 Clustal Omega: The Alignment Report Clustal Omega: The Alignment Report • Consensus symbols For protein alignments Symbol Description * Position with a fully conserved residue : Conservation between groups of strongly similar properties . Conservation between groups of weakly similar properties Position with residues that have different properties • Consensus symbols For nucleotide alignments Symbol Description * Position with fully conserved nucleotide Position with different nucleotides Ignore the “:” and “.” consensus symbols for nucleotide alignments 19 20 Clustal Omega: The Alignment Report 21 22 Simple Phylogeny: Making a Phylogenetic Tree Simple Phylogeny: Making a Phylogenetic Tree • Simple Phylogeny can only assemble a phylogenetic tree from multiple sequence alignments generated by Clustal Omega 23 24 4 08/12/2021 PHYLIP: Phylogeny Inference Package Simple Phylogeny: Making a Phylogenetic Tree 25 26 Neighbor Joining Neighbor Joining Yang Z and Rannala B. Nat Rev Genet. 2012 Mar 28;13(5):303-14. 27 28 Simple Phylogeny: Making a Phylogenetic Tree Simple Phylogeny: Making a Phylogenetic Tree CHANGE to “on”! 29 30 5 08/12/2021 Simple Phylogeny: Making a Phylogenetic Tree • A Cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny where the branches are of equal length, thus cladograms show common ancestry, but do not indicate the amount of evolutionary "time" separating taxa. 31 32 Simple Phylogeny: Making a Phylogenetic Tree Distance Value= # of substitutions as a proportion of the alignment, excluding gaps 33 6.

Multiple Sequence Alignments: Clustal Omega 1 2 3 4

T-Coffee Documentation Release Version 13.45.47.Aba98c5

Comparative Analysis of Multiple Sequence Alignment Tools

How to Generate a Publication-Quality Multiple Sequence Alignment (Thomas Weimbs, University of California Santa Barbara, 11/2012)

DNA Sequencing and Sorting: Identifying Genetic Variations

To Find Information About Arabidopsis Genes Leonore Reiser1, Shabari

"Phylogenetic Analysis of Protein Sequence Data Using The

Performance Evaluation of Leading Protein Multiple Sequence Alignment Methods

HMMER User's Guide

Homology & Alignment

Assembly Exercise

Introduction to Linux for Bioinformatics – Part II Paul Stothard, 2006-09-20

Syntax Highlighting for Computational Biology Artem Babaian1†, Anicet Ebou2, Alyssa Fegen3, Ho Yin (Jeffrey) Kam4, German E