DAMBE Manual

Total Page:16

File Type:pdf, Size:1020Kb

DAMBE Manual The DAMBE Manual Revise alignment (half done) Xuhua Xia, 2014 2 CONTENTS The DAMBE Manual.................................................................................................................................... 1 Instruction for students ................................................................................................................................ 5 Why bioinformatics? ................................................................................................................................. 5 Why DAMBE? .......................................................................................................................................... 5 Answers all questions at end of each lab ................................................................................................... 6 Acknowledgement ..................................................................................................................................... 6 Lab 1 sequence databases and string matchings ........................................................................................ 7 Summary ................................................................................................................................................... 7 Data resource and software tools............................................................................................................... 7 Representative database and web interface: NCBI, GenBank, and Entrez ........................................... 7 Annotated sequences ............................................................................................................................ 8 Sequence formats .................................................................................................................................. 8 Ambiguous codes, and gap symbols in sequences ................................................................................ 9 Why do we need to extract sequence elements? ................................................................................... 9 Retrieve the KAL153 genome with Entrez ..............................................................................................10 Extracting coding sequences from a GenBank file ..................................................................................10 Computing nucleotide frequencies ...........................................................................................................11 Computing Karlin-Altschul parameters ...................................................................................................12 Which subtype does KAL153 belong to? Is it recombinant? ...................................................................13 Subtypes in HIV-1 M group ................................................................................................................13 Retrieve reference subtype sequences .................................................................................................13 Use BLAST to identify subtypes .........................................................................................................13 Lecture questions: ....................................................................................................................................16 Lab questions: ..........................................................................................................................................16 Lab 2 Making sense of genomes: Position Weight Matrix (PWM) .........................................................17 Introduction ..............................................................................................................................................17 Two approaches to understand the meaning of a nucleotide sequence ................................................17 Extraction of annotated gene features from GenBank files .................................................................17 Position weight matrix .........................................................................................................................18 Objectives .................................................................................................................................................18 Procedures ................................................................................................................................................18 A brief peek into a GenBank file .........................................................................................................18 Extracting annotated sequence elements with DAMBE ......................................................................19 Characterize 5’ and 3’ splice sites with position weight matrix (PWM) .............................................19 Scan sequences for splice site signals ..................................................................................................22 Limitations of PWM ............................................................................................................................22 More questions .........................................................................................................................................23 Lab 3 Gibbs Sampler and Yeast Intron Properties ..................................................................................24 Introduction ..............................................................................................................................................24 Genetic switches ..................................................................................................................................24 Gibbs sampler and its application in molecular biology ......................................................................24 Identifying genetic motifs with Gibbs sampler ....................................................................................24 Objectives .................................................................................................................................................25 Procedures ................................................................................................................................................26 Copy column-based output from DAMBE to EXCEL ........................................................................26 Running Gibbs sampler in DAMBE ....................................................................................................27 Studying S1 and S2 distances ..............................................................................................................29 More questions .........................................................................................................................................30 3 Lab 5 Codon Usage Bias ............................................................................................................................. 32 Introduction ............................................................................................................................................. 32 Codon usage bias ................................................................................................................................ 32 Codon usage bias and tRNA abundance ............................................................................................. 36 Objectives ................................................................................................................................................ 37 Use RSCU and CAI to characterize codon usage ............................................................................... 37 Understand the relationship between tRNA abundance and codon usage .......................................... 37 Procedures ............................................................................................................................................... 37 Computing CAI and RSCU ................................................................................................................ 37 Identifying tRNA anticodon ............................................................................................................... 41 More questions ........................................................................................................................................ 43 Lab 6 RNA secondary structure, minimum folding energy, and IRES.................................................. 44 Introduction ............................................................................................................................................. 44 RNA secondary structure .................................................................................................................... 44 Minimum folding energy (MFE) ........................................................................................................ 44 5’ UTR secondary structure and translation economy ........................................................................ 44 Internal ribosomal entry site ............................................................................................................... 45 Objectives ................................................................................................................................................ 45 Procedures ..............................................................................................................................................
Recommended publications
  • Treebase: an R Package for Discovery, Access and Manipulation of Online Phylogenies
    Methods in Ecology and Evolution 2012, 3, 1060–1066 doi: 10.1111/j.2041-210X.2012.00247.x APPLICATION Treebase: an R package for discovery, access and manipulation of online phylogenies Carl Boettiger1* and Duncan Temple Lang2 1Center for Population Biology, University of California, Davis, CA,95616,USA; and 2Department of Statistics, University of California, Davis, CA,95616,USA Summary 1. The TreeBASE portal is an important and rapidly growing repository of phylogenetic data. The R statistical environment has also become a primary tool for applied phylogenetic analyses across a range of questions, from comparative evolution to community ecology to conservation planning. 2. We have developed treebase, an open-source software package (freely available from http://cran.r-project. org/web/packages/treebase) for the R programming environment, providing simplified, programmatic and inter- active access to phylogenetic data in the TreeBASE repository. 3. We illustrate how this package creates a bridge between the TreeBASE repository and the rapidly growing collection of R packages for phylogenetics that can reduce barriers to discovery and integration across phyloge- netic research. 4. We show how the treebase package can be used to facilitate replication of previous studies and testing of methods and hypotheses across a large sample of phylogenies, which may help make such important reproduc- ibility practices more common. Key-words: application programming interface, database, programmatic, R, software, TreeBASE, workflow include, but are not limited to, ancestral state reconstruction Introduction (Paradis 2004; Butler & King 2004), diversification analysis Applications that use phylogenetic information as part of their (Paradis 2004; Rabosky 2006; Harmon et al.
    [Show full text]
  • Regulatory Motifs of DNA
    Lecture 3: PWMs and other models for regulatory elements in DNA -Sequence motifs -Position weight matrices (PWMs) -Learning PWMs combinatorially, or probabilistically -Learning from an alignment -Ab initio learning: Baum-Welch & Gibbs sampling -Visualization of PWMs as sequence logos -Search methods for PWM occurrences -Cis-regulatory modules Regulatory motifs of DNA Measuring gene activity (’expression’) with a microarray 1 Jointly regulated gene sets Cluster of co-expressed genes, apply pattern discovery in regulatory regions 600 basepairs Expression profiles Retrieve Upstream regions Genes with similar expression pattern Pattern over-represented in cluster Find a hidden motif 2 Find a hidden motif Find a hidden motif (cont.) Multiple local alignment! Smith-Waterman?? 3 Motif? • Definition: Motif is a pattern that occurs (unexpectedly) often in a string or in a set of strings • The problem of finding repetitions is similar cgccgagtgacagagacgctaatcagg ctgtgttctcaggatgcgtaccgagtg ggagacagcagcacgaccagcggtggc agagacccttgcagacatcaagctctt tgggaacaagtggagcaccgatgatgt acagccgatcaatgacatttccctaat gcaggattacattgcagtgcccaagga gaagtatgccaagtaatacctccctca cagtg... 4 Longest repeat? Example: A 50 million bases long fragment of human genome. We found (using a suffix-tree) the following repetitive sequence of length 2559, with one occurrence at 28395980, and other at 28401554r ttagggtacatgtgcacaacgtgcaggtttgttacatatgtatacacgtgccatgatggtgtgctgcacccattaactcgtcatttagcgttaggtatatctcc gaatgctatccctcccccctccccccaccccacaacagtccccggtgtgtgatgttccccttcctgtgtccatgtgttctcattgttcaattcccacctatgagt
    [Show full text]
  • Motif Discovery in Sequential Data Kyle L. Jensen
    Motif discovery in sequential data by Kyle L. Jensen Submitted to the Department of Chemical Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY May © Kyle L. Jensen, . Author.............................................................. Department of Chemical Engineering April , Certified by . Gregory N. Stephanopoulos Bayer Professor of Chemical Engineering esis Supervisor Accepted by . William M. Deen Chairman, Department Committee on Graduate Students Motif discovery in sequential data by Kyle L. Jensen Submitted to the Department of Chemical Engineering on April , , in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract In this thesis, I discuss the application and development of methods for the automated discovery of motifs in sequential data. ese data include DNA sequences, protein sequences, and real– valued sequential data such as protein structures and timeseries of arbitrary dimension. As more genomes are sequenced and annotated, the need for automated, computational methods for analyzing biological data is increasing rapidly. In broad terms, the goal of this thesis is to treat sequential data sets as unknown languages and to develop tools for interpreting an understanding these languages. e first chapter of this thesis is an introduction to the fundamentals of motif discovery, which establishes a common mode of thought and vocabulary for the subsequent chapters. One of the central themes of this work is the use of grammatical models, which are more commonly associated with the field of computational linguistics. In the second chapter, I use grammatical models to design novel antimicrobial peptides (AmPs). AmPs are small proteins used by the innate immune system to combat bacterial infection in multicellular eukaryotes.
    [Show full text]
  • A General Pairwise Interaction Model Provides an Accurate Description of in Vivo Transcription Factor Binding Sites
    A General Pairwise Interaction Model Provides an Accurate Description of In Vivo Transcription Factor Binding Sites Marc Santolini, Thierry Mora, Vincent Hakim* Laboratoire de Physique Statistique, CNRS, Universite´ P. et M. Curie, Universite´ D. Diderot, E´cole Normale Supe´rieure, Paris, France Abstract The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to the transcription factor (TF) binding. However, this description ignores correlations between nucleotides at different positions, and is generally inaccurate: analysing fly and mouse in vivo ChIPseq data, we show that in most cases the PWM model fails to reproduce the observed statistics of TFBSs. To overcome this issue, we introduce the pairwise interaction model (PIM), a generalization of the PWM model. The model is based on the principle of maximum entropy and explicitly describes pairwise correlations between nucleotides at different positions, while being otherwise as unconstrained as possible. It is mathematically equivalent to considering a TF-DNA binding energy that depends additively on each nucleotide identity at all positions in the TFBS, like the PWM model, but also additively on pairs of nucleotides. We find that the PIM significantly improves over the PWM model, and even provides an optimal description of TFBS statistics within statistical noise. The PIM generalizes previous approaches to interdependent positions: it accounts for co-variation of two or more base pairs, and predicts secondary motifs, while outperforming multiple-motif models consisting of mixtures of PWMs.
    [Show full text]
  • A New Sequence Logo Plot to Highlight Enrichment and Depletion
    bioRxiv preprint doi: https://doi.org/10.1101/226597; this version posted November 29, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. A new sequence logo plot to highlight enrichment and depletion. Kushal K. Dey 1, Dongyue Xie 1, Matthew Stephens 1, 2 1 Department of Statistics, University of Chicago 2 Department of Human Genetics, University of Chicago * Corresponding Email : [email protected] Abstract Background : Sequence logo plots have become a standard graphical tool for visualizing sequence motifs in DNA, RNA or protein sequences. However standard logo plots primarily highlight enrichment of symbols, and may fail to highlight interesting depletions. Current alternatives that try to highlight depletion often produce visually cluttered logos. Results : We introduce a new sequence logo plot, the EDLogo plot, that highlights both enrichment and depletion, while minimizing visual clutter. We provide an easy-to-use and highly customizable R package Logolas to produce a range of logo plots, including EDLogo plots. This software also allows elements in the logo plot to be strings of characters, rather than a single character, extending the range of applications beyond the usual DNA, RNA or protein sequences. We illustrate our methods and software on applications to transcription factor binding site motifs, protein sequence alignments and cancer mutation signature profiles. Conclusion : Our new EDLogo plots, and flexible software implementation, can help data analysts visualize both enrichment and depletion of characters (DNA sequence bases, amino acids, etc) across a wide range of applications.
    [Show full text]
  • Biocreative 2012 Proceedings
    Proceedings of 2012 BioCreative Workshop April 4 -5, 2012 Washington, DC USA Editors: Cecilia Arighi Kevin Cohen Lynette Hirschman Martin Krallinger Zhiyong Lu Carolyn Mattingly Alfonso Valencia Thomas Wiegers John Wilbur Cathy Wu 2012 BioCreative Workshop Proceedings Table of Contents Preface…………………………………………………………………………………….......... iv Committees……………………………………………………………………………………... v Workshop Agenda…………………………………………………………………………….. vi Track 1 Collaborative Biocuration-Text Mining Development Task for Document Prioritization for Curation……………………………………..……………………………………………….. 2 T Wiegers, AP Davis, and CJ Mattingly System Description for the BioCreative 2012 Triage Task ………………………………... 20 S Kim, W Kim, CH Wei, Z Lu and WJ Wilbur Ranking of CTD articles and interactions using the OntoGene pipeline ……………..….. 25 F Rinaldi, S Clematide and S Hafner Selection of relevant articles for curation for the Comparative Toxicogenomic Database…………………………………………………………………………………………. 31 D Vishnyakova, E Pasche and P Ruch CoIN: a network exploration for document triage………………………………................... 39 YY Hsu and HY Kao DrTW: A Biomedical Term Weighting Method for Document Recommendation ………... 45 JH Ju, YD Chen and JH Chiang C2HI: a Complete CHemical Information decision system……………………………..….. 52 CH Ke, TLM Lee and JH Chiang Track 2 Overview of BioCreative Curation Workshop Track II: Curation Workflows….…………... 59 Z Lu and L Hirschman WormBase Literature Curation Workflow ……………………………………………………. 66 KV Auken, T Bieri, A Cabunoc, J Chan, Wj Chen, P Davis, A Duong, R Fang, C Grove, Tw Harris, K Howe, R Kishore, R Lee, Y Li, Hm Muller, C Nakamura, B Nash, P Ozersky, M Paulini, D Raciti, A Rangarajan, G Schindelman, Ma Tuli, D Wang, X Wang, G Williams, K Yook, J Hodgkin, M Berriman, R Durbin, P Kersey, J Spieth, L Stein and Pw Sternberg Literature curation workflow at The Arabidopsis Information Resource (TAIR)…..……… 72 D Li, R Muller, TZ Berardini and E Huala Summary of Curation Process for one component of the Mouse Genome Informatics Database Resource …………………………………………………………………………....
    [Show full text]
  • Predicting Expression Levels of ​De Novo​Protein Designs in Yeast
    Predicting expression levels of de novo protein designs in yeast through machine learning. ​ ​ Patrick Slogic A thesis submitted in partial fulfilment of the requirements for the degree of: Master of Science in Bioengineering University of Washington 2020 Committee: Eric Klavins Azadeh Yazdan-Shahmorad Program Authorized to Offer Degree: Bioengineering ©Copyright 2020 Patrick Slogic University of Washington Abstract Predicting expression levels of de novo protein designs in yeast through machine learning. ​ ​ Patrick Slogic Chair of the Supervisory Committee: Eric Klavins Department of Electrical and Computer Engineering Protein engineering has unlocked potential for the biomolecular synthesis of amino acid sequences with specified functions. With the many implications in enhancing therapy development, developing accurate functional models of recombinant genetic outcomes has shown great promise in unlocking the potential of novel protein structures. However, the complicated nature of structural determination requires an iterative process of design and testing among an enormous search space for potential designs. Expressed protein sequences through biological platforms are necessary to produce and analyze the functional efficacy of novel designs. In this study, machine learning principles are applied to a yeast surface display protocol with the goal of optimizing the search space of amino acid sequences that will express in yeast transformations. Machine learning networks are compared in predicting expression levels and generating similar sequences to find those likely to be successfully expressed. ACKNOWLEDGEMENTS I would like to thank Dr. Eric Klavins for accepting me as a master’s student in his laboratory and for providing direction and guidance in exploring the topics of machine learning and recombinant genetics throughout my graduate program in bioengineering.
    [Show full text]
  • Low-N Protein Engineering with Data-Efficient Deep Learning
    bioRxiv preprint doi: https://doi.org/10.1101/2020.01.23.917682; this version posted January 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Low-N protein engineering with data-efficient deep learning 1† 3† 2† 2 1,4 Surojit Biswas ,​ Grigory Khimulya ,​ Ethan C. Alley ,​ Kevin M. Esvelt ,​ George M. Church ​ ​ ​ ​ ​ 1 Wyss Institute for Biologically Inspired Engineering, Harvard University 2 MIT Media Lab, Massachusetts Institute of Technology 3 Somerville, MA 02143, USA 4 Department of Genetics, Harvard Medical School †These authors contributed equally to this work. *Correspondence to: George M. Church: gchurch{at}genetics.med.harvard.edu ​ Abstract Protein engineering has enormous academic and industrial potential. However, it is limited by the lack of experimental assays that are consistent with the design goal and sufficiently high-throughput to find rare, enhanced variants. Here we introduce a machine learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape and screen ten million sequences via in silico directed evolution. As demonstrated in two highly dissimilar proteins, avGFP and ​ ​ TEM-1 β-lactamase, top candidates from a single round are diverse and as active as engineered mutants ​ ​ obtained from previous multi-year, high-throughput efforts. Because it distills information from both global and local sequence landscapes, our model approximates protein function even before receiving experimental data, and generalizes from only single mutations to propose high-functioning epistatically non-trivial designs.
    [Show full text]
  • TESE Kelma Sirleide De Souza.Pdf
    UNIVERSIDADE FEDERAL DE PERNAMBUCO CENTRO DE CIÊNCIAS BIOLÓGICAS PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIAS BIOLÓGICAS KELMA SIRLEIDE DE SOUZA Efeitos bioquímicos e genotóxicos de poluentes na ostra Crassostrea sp., na região norte do complexo estuarino Canal de Santa Cruz, litoral norte de Pernambuco Recife 2016 KELMA SIRLEIDE DE SOUZA Efeitos bioquímicos e genotóxicos de poluentes na ostra Crassostrea sp., na região norte do complexo estuarino Canal de Santa Cruz, litoral norte de Pernambuco Tese apresentada ao Programa de Pós- Graduação em Ciências Biológicas da Universidade Federal de Pernambuco como pré- requisito para a obtenção do grau de doutor em Ciências Biológicas. Orientador: Prof. Dr. Ranilson de Souza Bezerra Co- Orientador: Dr. Caio Rodrigo Dias de Assis Recife 2016 Catalogação na fonte Elaine Barroso CRB 1728 Souza, Kelma Sirleide de Efeitos bioquímicos e genotóxicos de poluentes na ostra Crassostrea sp. na região norte do complexo estuarino Canal de Santa Cruz, litoral norte de Pernambuco / Kelma Sirleide de Souza - Recife: O Autor, 2016. 91 folhas: il., fig., tab. Orientador: Ranilson de Souza Bezerra Coorientador: Caio Rodrigo Silva de Assis Tese (doutorado) – Universidade Federal de Pernambuco. Centro de Biociências. Ciências Biológicas, 2016. Inclui referências 1. Ostras 2. Estuários 3. Santa Cruz, Canal de (PE) I. Bezerra, Ranilson de Souza (orient.) II. Assis, Caio Rodrigo Silva de (coorient.) III. Título 594.4 CDD (22.ed.) UFPE/CB-2017- 406 KELMA SIRLEIDE DE SOUZA Efeitos bioquímicos e genotóxicos de poluentes na ostra Crassostrea sp., na região norte do complexo estuarino Canal de Santa Cruz, litoral norte de Pernambuco Tese apresentada ao Programa de Pós- Graduação em Ciências Biológicas da Universidade Federal de Pernambuco como pré- requisito para a obtenção do grau de doutor em Ciências Biológicas.
    [Show full text]
  • Lecture 12. Gene Finding & Rnaseq
    Lecture 12. Gene Finding & RNAseq Michael Schatz March 4, 2020 Applied Comparative Genomics Assignment 4: Due Wed Mar 4 Assignment 5: Due Wed Mar 11 Oxford Nanopore MinION • Thumb drive sized sequencer powered over USB • Contains 512 channels • Four pores per channel, only one pore active at a time • Early access began in 2014 • Officially released in 2015 (the same year I met MiKe!) Assignment 5: Due March 8 Goal: Genome Annotations aatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaat gcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgct aatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaa tggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcg gctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcg gctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatcct gcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggct atgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcg gctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccg atgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgctaagctgggaat gcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctg ggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatg gtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgg gatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatc cgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcg
    [Show full text]
  • Interpretable Convolution Methods for Learning Genomic Sequence Motifs
    bioRxiv preprint doi: https://doi.org/10.1101/411934; this version posted September 8, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Interpretable Convolution Methods for Learning Genomic Sequence Motifs MS Ploenzke1*, RA Irizarry1,2 1 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA 2 Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA * Corresponding author: ploenzke(at)g.harvard.edu Abstract The first-layer filters employed in convolutional neural networks tend to learn, or extract, spatial features from the data. Within their application to genomic sequence data, these learned features are often visualized and interpreted by converting them to sequence logos; an information-based representation of the consensus nucleotide motif. The process to obtain such motifs, however, is done through post-training procedures which often discard the filter weights themselves and instead rely upon finding those sequences maximally correlated with the given filter. Moreover, the filters collectively learn motifs with high redundancy, often simply shifted representations of the same sequence. We propose a schema to learn sequence motifs directly through weight constraints and transformations such that the individual weights comprising the filter are directly interpretable as either position weight matrices (PWMs) or information gain matrices (IGMs). We additionally leverage regularization to encourage learning highly-representative motifs with low inter-filter redundancy. Through learning PWMs and IGMs directly we present preliminary results showcasing how our method is capable of incorporating previously-annotated database motifs along with learning motifs de novo and then outline a pipeline for how these tools may be used jointly in a data application.
    [Show full text]
  • Tutorial: Environment for Tree Exploration Release 3.0.0B7
    Tutorial: Environment for Tree Exploration Release 3.0.0b7 Jaime Huerta-Cepas November 25, 2015 Contents 1 Changelog history3 1.1 What’s new in ETE 2.3...................................3 1.2 What’s new in ETE 2.2...................................5 1.3 What’s new in ETE 2.1...................................9 2 The ETE tutorial 13 2.1 Working With Tree Data Structures............................ 13 2.2 The Programmable Tree Drawing Engine......................... 43 2.3 Phylogenetic Trees..................................... 64 2.4 Clustering Trees...................................... 80 2.5 Phylogenetic XML standards............................... 85 2.6 Interactive web tree visualization............................. 93 2.7 Testing Evolutionary Hypothesis.............................. 95 2.8 Dealing with the NCBI Taxonomy database........................ 105 2.9 SCRIPTS: orthoXML................................... 108 3 ETE’s Reference Guide 115 3.1 Master Tree class...................................... 115 3.2 Treeview module...................................... 127 3.3 PhyloTree class....................................... 138 3.4 Clustering module..................................... 141 3.5 Nexml module....................................... 142 3.6 Phyloxml Module..................................... 158 3.7 Seqgroup class....................................... 161 3.8 WebTreeApplication object................................ 161 3.9 EvolTree class....................................... 162 3.10 NCBITaxa class.....................................
    [Show full text]