T-Coffee Documentation Release Version 13.45.47.Aba98c5
Total Page:16
File Type:pdf, Size:1020Kb
T-Coffee Documentation Release Version_13.45.47.aba98c5 Cedric Notredame Aug 31, 2021 Contents 1 T-Coffee Installation 3 1.1 Installation................................................3 1.1.1 Unix/Linux Binaries......................................4 1.1.2 MacOS Binaries - Updated...................................4 1.1.3 Installation From Source/Binaries downloader (Mac OSX/Linux)...............4 1.2 Template based modes: PSI/TM-Coffee and Expresso.........................5 1.2.1 Why do I need BLAST with T-Coffee?.............................6 1.2.2 Using a BLAST local version on Unix.............................6 1.2.3 Using the EBI BLAST client..................................6 1.2.4 Using the NCBI BLAST client.................................7 1.2.5 Using another client.......................................7 1.3 Troubleshooting.............................................7 1.3.1 Third party packages......................................7 1.3.2 M-Coffee parameters......................................9 1.3.3 Structural modes (using PDB)................................. 10 1.3.4 R-Coffee associated packages................................. 10 2 Quick Start Regressive Algorithm 11 2.1 Introduction............................................... 11 2.2 Installation from source......................................... 12 2.3 Examples................................................. 12 2.3.1 Fast and accurate........................................ 12 2.3.2 Slower and more accurate.................................... 12 2.3.3 Very Fast............................................ 13 2.4 Regressive flags............................................. 13 3 Quick Start Unistrap 15 3.1 Introduction............................................... 15 3.2 Examples................................................. 16 3.2.1 Produce bootsrap replicates by shuffling sequences and guide tree sister nodes........ 16 3.2.2 Produce bootsrap replicates by shuffling the guide tree sister nodes.............. 16 3.2.3 Get list of supported methods.................................. 16 3.3 unistrap flags............................................... 17 4 Quick Start T-Coffee 19 4.1 Basic Command Lines (or modes).................................... 19 4.1.1 Protein sequences........................................ 19 i 4.1.2 DNA sequences......................................... 20 4.1.3 RNA sequences......................................... 20 4.2 Brief Overview of T-Coffee Tools.................................... 21 4.2.1 Alignment methods....................................... 21 4.2.1.1 T-Coffee........................................ 21 4.2.1.2 M-Coffee....................................... 22 4.2.1.3 Expresso........................................ 22 4.2.1.4 R-Coffee........................................ 23 4.2.1.5 Pro-Coffee....................................... 23 4.2.2 Evaluation tools......................................... 24 4.2.2.1 TCS (MSA evaluation based on consistency)..................... 24 4.2.2.2 iRMSD/APDB (MSA structural evaluation)..................... 24 4.2.2.3 STRIKE (single structure MSA evaluation)...................... 24 4.2.2.4 T-RMSD (structural clustering)............................ 25 4.3 Tutorial (Practical Examples)...................................... 25 4.3.1 Introduction........................................... 25 4.3.2 Materials............................................ 25 4.3.3 Procedures........................................... 26 5 T-Coffee Main Documentation 27 5.1 Before You Start. ............................................ 27 5.1.1 Foreword............................................ 27 5.1.2 Prerequisite for using T-Coffee................................. 27 5.1.3 Let’s have a try. ......................................... 28 5.2 What is T-Coffee ?............................................ 28 5.2.1 What is T-Coffee?........................................ 28 5.2.1.1 What does it do?.................................... 28 5.2.1.2 What can it align?................................... 28 5.2.1.3 How can I use it?................................... 29 5.2.1.4 Is there an online webserver?............................. 29 5.2.1.5 Is T-Coffee different from ClustalW?......................... 29 5.2.1.6 Is T-Coffee very accurate?............................... 29 5.2.2 What T-Coffee can and cannot do for you . .......................... 29 5.2.2.1 What T-Coffee can’t do................................ 29 5.2.2.2 What T-Coffee can do................................. 29 5.2.3 How does T-Coffee alignment works?............................. 30 5.3 Preparing Your Data........................................... 31 5.3.1 The reformatting utility: seq_reformat............................. 31 5.3.1.1 General introduction.................................. 31 5.3.1.2 Modification options.................................. 32 5.3.1.3 Using a “cache” file.................................. 32 5.3.1.3.1 What is a cache in T-Coffee?........................ 32 5.3.1.3.2 Preparing a sequence/alignment cache................... 32 5.3.1.3.3 Preparing a library cache.......................... 33 5.3.2 Modifying the format of your data............................... 34 5.3.2.1 Keeping/Protecting your sequence names....................... 34 5.3.2.2 Changing the sequence format............................ 34 5.3.2.3 Changing the case................................... 35 5.3.2.3.1 Changing the case of your sequences.................... 35 5.3.2.3.2 Changing the case of specific residues................... 35 5.3.2.3.3 Changing the case with a cache....................... 35 5.3.2.4 Coloring/Editing residues in an alignment...................... 36 5.3.2.4.1 Changing the default colors......................... 36 5.3.2.4.2 Coloring specific types of residues/nucleic acids.............. 36 ii 5.3.2.4.3 Coloring a specific residue of a specific sequence............. 36 5.3.2.4.4 Coloring according to the conservation................... 37 5.3.2.4.5 Coloring an alignment using a cache.................... 37 5.3.3 Modifying the data itself. .................................... 37 5.3.3.1 Modifiying sequences in your dataset......................... 37 5.3.3.1.1 Converting residues............................. 37 5.3.3.1.2 Extracting sequences according to a pattern................ 38 5.3.3.1.3 Extracting/Removing specific sequences by names............ 38 5.3.3.1.4 Extracting the most informative sequences................. 39 5.3.3.1.5 Extracting/Removing sequences with the % identity............ 39 5.3.3.1.6 Chaining important sequences....................... 41 5.3.3.2 Modifying columns/blocks in your dataset...................... 41 5.3.3.2.1 Removing gapped columns......................... 41 5.3.3.2.2 Extracting specific columns......................... 41 5.3.3.2.3 Extracting entire blocks........................... 42 5.3.3.2.4 Concatenating blocks or MSAs....................... 43 5.3.4 Manipulating DNA sequences................................. 43 5.3.4.1 Translating DNA sequences into protein sequences.................. 43 5.3.4.2 Back-translation with the bona fide DNA sequences................. 43 5.3.4.3 Finding the bona fide sequences for the back-translation............... 43 5.3.5 Manipulating RNA Sequences................................. 43 5.3.5.1 Producing a Stockholm output: adding predicted secondary structures....... 43 5.3.5.1.1 Producing/Adding a consensus structure.................. 43 5.3.5.1.2 Adding a precomputed consensus structure to an alignment........ 44 5.3.5.2 Analyzing a RNAalifold secondary structure prediction............... 44 5.3.5.2.1 Visualizing compensatory mutations.................... 44 5.3.5.2.2 Analyzing matching columns........................ 45 5.3.5.3 Comparing alternative folds.............................. 45 5.3.6 Phylogenetic Trees Manipulation................................ 45 5.3.6.1 Producing phylogenetic trees............................. 45 5.3.6.2 Comparing two phylogenetic trees.......................... 46 5.3.6.3 Scanning phylogenetic trees.............................. 47 5.3.6.4 Pruning phylogenetic trees.............................. 47 5.3.6.5 Tree Reformatting for MAFFT............................ 47 5.3.7 Manipulating structure files (PDB)............................... 48 5.3.7.1 Extracting a structure................................. 48 5.3.7.2 Adapting extract_from_pdb to your own environment................ 48 5.4 Aligning Your Sequences........................................ 49 5.4.1 General comments on alignments and aligners......................... 49 5.4.1.1 What is a good alignment?.............................. 49 5.4.1.2 The main methods and their scope.......................... 49 5.4.1.2.1 ClustalW is really everywhere. ....................... 49 5.4.1.2.2 MAFFT/MUSCLE to align big datasets.................. 50 5.4.1.2.3 T-Coffee/ProbCons, slow but accurate !!!.................. 50 5.4.1.3 Choosing the right package (without flipping a coin !)................ 50 5.4.2 Protein Sequences........................................ 51 5.4.2.1 General considerations................................ 51 5.4.2.2 Computing a simple MSA (default T-Coffee)..................... 52 5.4.2.3 Aligning multiple datasets/Combining multiple MSAs................ 52 5.4.2.4 Estimating the diversity in your alignment...................... 53 5.4.2.5 Comparing alternative alignments........................... 53 5.4.2.6 Comparing subsets of alternative alignments....................