Seventh Annual DOE Joint Genome Institute User Meeting

Sponsored by the U.S. Department of Energy Office of Science
March 20-22, 2012
Walnut Creek Marriott, Walnut Creek, California

Contents
Speaker Presentations
Poster Presentations
Attendees
Author Index

Speaker Presentations
Abstracts alphabetical by speaker

DOE Systems Biology Knowledgebase (KBase)
Adam Arkin, Lawrence Berkeley National Laboratory, Berkeley, California

The Genome of Selaginella, a Remnant of an Ancient Vascular Plant Lineage
Jody Banks, Botany and Plant Pathology, Purdue University, West Lafayette, Indiana

Plants with lignified vascular tissues first appeared on Earth about 400 million years ago and subsequently diverged into several lineages. Only two of them remain: the euphyllophytes, which include the ferns, gymnosperms, and angiosperms, and the lycophytes. The genome sequence of the lycophyte Selaginella moellendorffii described here is the first lycophyte genome to be sequenced. Its compact genome, about two-thirds the size of the Arabidopsis genome, has fewer genes, small intergenic regions and introns, and no evidence of polyploidy. By comparing the Selaginella proteome with those of earlier-diverging plants (Chlamydomonas and the moss Physcomitrella) and later-diverging angiosperms, we were able to identify genes whose appearance coincides with the evolution of traits specific to land plants. Among these traits are vascular tissues consisting of specialized lignified cell types.
Surprisingly, Selaginella produces an angiosperm-specific lignin. Recent studies indicate that the Selaginella lignin biosynthetic genes may be useful for modifying lignin in angiosperms.

Genomics of Energy and the Environment
Steven A. Benner, Foundation for Applied Molecular Evolution, The Westheimer Institute of Science and Technology, Gainesville, Florida

The Earth and its biosphere co-evolve, each having influenced the other over the planet's 4.5-billion-year history. Genomic sequence data provide an important resource for exploring and understanding this co-evolution, especially when experimental methods supplement theoretical modeling based on comparative sequence analysis. Paleogenetics provides experimental tools to test historical models based on genomic sequence analysis. Paleogenetics infers the sequences of ancestral genes and proteins from now-extinct organisms by analyzing the sequences of their descendants. It then exploits recombinant DNA technology to bring these ancient biomolecules back to life, so that they can be studied in the laboratory. This talk will describe the use of experimental paleogenetics to correlate the genomic, paleontological, and geological records of life on Earth. By resurrecting ancestral proteins from extinct organisms that lived long in the past, we can make broad statements about the chemistry behind adaptation, the nature of ancient environments, and the interactions between species in ecosystems. We will start in the present day and step back in time: first 40 million years, to the start of the most recent global climate deterioration; then 100 million years, to the age of the dinosaurs; and then more than 2 billion years, to the establishment of eubacteria on the planet. We will also discuss how this process is hindered by errors in modern genome sequence databases.
Further, we will discuss the construction of "naturally organized" genome sequence databases, such as the MasterCatalog, which allow efficient organization, search, error correction, and analysis, especially when compared with standard public genome sequence databases.

Getting to the Root of Things: Root Spatiotemporal Regulatory Networks
Siobhan Brady, Genome Center, Department of Biology, College of Biological Sciences, University of California, Davis, Davis, California

Plant root development provides a remarkably tractable system for delineating developmental gene regulatory networks and studying their function in a complex multicellular model system over developmental time. We present a gene regulatory network that regulates distinct transcriptional events in developmental time. Distinct regulatory modules were identified that temporally drive the expression of genes involved in xylem specification and in the subsequent synthesis of secondary cell wall metabolites associated with xylem differentiation.

Reprogramming Bacteria to Seek and Destroy Small Molecules
Justin P. Gallivan, Department of Chemistry and Center for Fundamental and Applied Molecular Evolution, Emory University, Atlanta, Georgia

Simple organisms, such as the bacterium E. coli, carry out a wide variety of complex functions. E. coli cells synthesize complex molecules, communicate with one another, move in response to changing conditions, and replicate themselves every 20 minutes. The programs that control these behaviors are stored in a genome that encodes just over a megabyte of digital information. In this talk, I will present our recent efforts to reprogram E. coli to sense new small molecules and to respond to them with predictable behaviors. Specifically, I will describe our efforts to create synthetic riboswitches: designer RNA sequences that control gene expression in a ligand-dependent fashion without the need for proteins.
I will show how synthetic riboswitches can be used to engineer bacteria with a variety of functions, including the ability to seek and destroy small molecules such as the herbicide atrazine.

The Evolution of Streamlined Genomes in Ocean Bacteria
Stephen J. Giovannoni* and J. Cameron Thrash, Department of Microbiology, Oregon State University, Corvallis, Oregon

The smallest genomes known from free-living cells are found in marine bacterioplankton. Genome sequences from cyanobacteria in the genus Prochlorococcus range in size from 1.6 to 2.7 Mbp. Genomes from heterotrophic cells in the SAR11 clade of Alphaproteobacteria are 1.3 to 1.5 Mbp, and genome sequences from obligate methylotrophs of the OM43 clade of Betaproteobacteria are 1.3 Mbp. Thus, in three metabolic categories, marine bacterioplankton hold the record for the smallest free-living genomes. Genome streamlining theory has been invoked to explain the small genomes of these cells. The essence of this theory is that selection is most efficient in microbial populations with large effective population sizes, and that it favors minimalism in the genomes and cell architecture of bacterioplankton because it selects for the efficient use of nutrient resources. Genome reduction also occurs in bacterial symbionts, where it has been attributed to genetic drift; there it produces very different genomic signatures, including the expansion of non-coding genetic material, loss of anaplerotic pathways, and elevated rates of non-synonymous substitution. Comparative study of SAR11 genomes and experimental studies with cultures have revealed the metabolic consequences of genome streamlining. Most strains are deficient in assimilatory sulphate reduction and in the normal pathways of glycine biosynthesis, making them dependent on organosulphur compounds and on glycine or glycine precursors for growth.
Many common regulatory systems are absent and are replaced by simpler systems for maintaining cellular homeostasis, often involving riboswitches. Studies of the ultrastructure and metaproteomics of cells from oligotrophic oceans show that SAR11 cells have high surface-to-volume ratios and a very high relative abundance of transport proteins, an apparent adaptation that enables efficient replication in ocean “deserts”. These observations support the broad conclusion that some bacterioplankton have sacrificed metabolic versatility for simplicity and genome reduction, which lets them use ambient nutrient resources efficiently at the cost of versatility. The question remains: how do the evolutionary history and ecology of these organisms differ from those of microbial plankton with genomes of average size?

Systems Biology Approaches to Dissecting Plant Cell Wall Deconstruction in a Model Filamentous Fungus
N. Louise Glass*, Sam Coradetti, Elizabeth Znameroski, Jianping Sun, James Craig, and Yi Yiong, Plant and Microbial Biology Department, University of California, Berkeley, Berkeley, California

Neurospora crassa colonizes burnt grasslands in the wild and metabolizes both cellulose and hemicellulose from plant cell walls. When switched from a favored carbon source such as sucrose to cellulose, N. crassa dramatically upregulates the expression of a wide variety of genes encoding lignocellulolytic enzymes, and the secretion of those enzymes. However, the means by which N. crassa and other filamentous fungi sense the presence of cellulose in the environment remain unclear. In N. crassa, cellobiose efficiently induces cellulase gene expression in the absence of intra- and extracellular β-glucosidase
Recommended publications
  • Phylogeny of the Pluteaceae (Agaricales, Basidiomycota): Taxonomy and Character Evolution
    AperTO - Archivio Istituzionale Open Access dell'Università di Torino. Phylogeny of the Pluteaceae (Agaricales, Basidiomycota): taxonomy and character evolution. This is the author's manuscript. This version has been available at http://hdl.handle.net/2318/74776 since 2016-10-06T16:59:44Z. Published version: DOI:10.1016/j.funbio.2010.09.012. Terms of use: Open Access. Anyone can freely access the full text of works made available as "Open Access". Works made available under a Creative Commons license can be used according to the terms and conditions of that license. Use of all other works requires consent of the right holder (author or publisher) if not exempted from copyright protection by the applicable law. 23 September 2021. This Accepted Author Manuscript (AAM) is copyrighted and published by Elsevier. It is posted here by agreement between Elsevier and the University of Turin. Changes resulting from the publishing process, such as editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this version of the text. The definitive version of the text was subsequently published in Fungal Biology, 115(1), 2011, 10.1016/j.funbio.2010.09.012. You may download, copy, and otherwise use the AAM for non-commercial purposes provided that your license is limited by the following restrictions: (1) You may use this AAM for non-commercial purposes only under the terms of the CC-BY-NC-ND license. (2) The integrity of the work and identification of the author, copyright owner, and publisher must be preserved in any copy.
  • Mushroom Cultivation
    CMS College of Science and Commerce (Autonomous), Model Examinations (October 2019), All UG Courses (except Bioscience), Semester V, EDC – Mushroom Cultivation. Answer keys are given in brackets.
    1. [A] To which division does it belong? A. Basidiomycetes; B. Pteridophyta; C. Thallophyta; D. Mollusca
    2. [A] Mushroom is: A. Saprophytic fungus; B. Autotrophic algae; C. Heterotrophic fungus; D. None of the above
    3. [B] Mycelium produces white or colored umbrella-shaped fruiting bodies called: A. Hyphae; B. Basidiocarp; C. Annulus; D. Seta
    4. [D] The basidiocarp consists of a fleshy stalk called ___ and an umbrella-like head borne on its top called ___: A. Hyphae and Seta; B. Seta and Annulus; C. Antheridia and Annulus; D. Stipe and Pileus
    5. [C] When the young fruiting body is completely enveloped by a thin membrane, the membrane is called ___: A. Mycelium; B. Rhizoids; C. Velum (veil); D. Septate
    6. [B] With the growth of ___, the velum ruptures, while a part of it remains attached to the stipe in the form of a ring or ___: A. Basidiocarp and Slender; B. Pileus and Annulus; C. Pyrenoid and Conjugation; D. Hyaline and Pyrenoid
    7. [D] On the lower side of the pileus, a number of vertical plate-like structures are present, called ___: A. Spores; B. Organelles; C. Mushroom Dryopteris; D. Gills
    8. [A] The gills on either side bear club-shaped basidia, which produce ___: A. Basidiocarp; B. Chloroplasts; C. Funaria; D. None of these
    9. [C] It grows during ___: A. Summer season; B. Winters; C. Rainy season; D. All seasons
    10. [A] One of the best edible species mushrooms under: A. Sahiwal; B. Kasur; C.
  • Mechanism of Sulfur Poisoning by H2S and SO2 of Nickel and Cobalt Based Catalysts for Dry Reforming of Methane
    MECHANISM OF SULFUR POISONING BY H2S AND SO2 OF NICKEL AND COBALT BASED CATALYSTS FOR DRY REFORMING OF METHANE A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the Degree of Master of Science in the Department of Chemical and Biological Engineering University of Saskatchewan Saskatoon By Francisco Javier Pacheco Gómez © Copyright Francisco Javier Pacheco Gómez, March 2016. All rights reserved. PERMISSION TO USE In presenting this thesis in partial fulfillment of the requirements for a Postgraduate degree from the University of Saskatchewan, I agree that the Libraries of this University may make it freely available for inspection. I further agree that permission for copying of this thesis/dissertation in any manner, in whole or in part, for scholarly purposes may be granted by Professor Hui Wang who supervised my thesis work. It is understood that any copying or publication or use of this thesis or parts for financial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to me and to the University of Saskatchewan in any scholarly use which may be made of any material in my thesis. DISCLAIMER The University of Saskatchewan was exclusively created to meet the thesis and/or exhibition requirements for the degree of Master of Science at the University of Saskatchewan. Reference in this thesis to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement, recommendation, or favouring by the University of Saskatchewan.
  • The U.S. Department of Energy's Ten-Year-Plans for the Office Of
    U.S. Department of Energy. The U.S. Department of Energy’s Ten-Year-Plans for the Office of Science National Laboratories, FY 2019. FY 2019 Annual Laboratory Plans for the Office of Science National Laboratories. Table of Contents: Introduction; Ames Laboratory (Lab-at-a-Glance; Mission and Overview; Core Capabilities; Science Strategy for the Future; Infrastructure); Argonne National Laboratory ...
  • "Phylogenetic Profiling"
    Introductory Review. Phylogenetic profiling. Matteo Pellegrini, Todd O. Yeates, and David Eisenberg, Howard Hughes Medical Institute, University of California at Los Angeles, Los Angeles, CA, USA; Sorel T. Fitz-Gibbon, IGPP Center for Astrobiology, University of California at Los Angeles, Los Angeles, CA, USA. 1. Introduction. Biology has been profoundly changed by the development of techniques to sequence DNA. The advent of rapid sequencing, in conjunction with the capability to assemble sequence fragments into complete genome sequences, enables researchers to read and analyze entire genomes of organisms. Parallel progress has been made in algorithms to study the evolutionary history of proteins. These techniques rely on the ability to measure the similarity of protein sequences in order to determine the likelihood that different proteins are descended from a common ancestor. It is therefore possible to reconstruct families of proteins that share a common ancestor. Combining these two capabilities, we can now not only determine which proteins are coded within an organism’s genome but also discover the evolutionary relationships between the proteins of multiple organisms. Phylogenetic profiling is the study of which protein types are found in which organisms. In order to perform phylogenetic profiling, one must first establish a classification of proteins into families. An example of such a classification scheme across a broad range of fully sequenced organisms is the Clusters of Orthologous Groups (Tatusov, 1997), where an attempt is made to group together proteins that perform a similar function. Next, each organism is described in terms of which protein families are coded or not coded in its genome.
  • Pellegrini M. Using Phylogenetic Profiles To
    Chapter 9. Using Phylogenetic Profiles to Predict Functional Relationships. Matteo Pellegrini. Abstract: Phylogenetic profiling involves the comparison of phylogenetic data across gene families. It is possible to construct phylogenetic trees, or related data structures, for specific gene families using a wide variety of tools and approaches. Phylogenetic profiling compares these data to determine which families have correlated or coupled evolution. The underlying assumption is that in certain cases these couplings may allow us to infer that the two families are functionally related, that is, that their functions in the cell are coupled. Although this technique can be applied to noncoding genes, it is more commonly used to assess the function of protein-coding genes. Examples of functionally related proteins include subunits of protein complexes and enzymes that perform consecutive steps along biochemical pathways. We hypothesize that the deletion of one of the families from a genome would then indirectly affect the function of the other. Dozens of different implementations of the phylogenetic profiling technique have been developed over the past decade. These range from the first simple approaches, which describe phylogenetic profiles as binary vectors, to the most complex ones, which attempt to model the coevolution of protein families on a phylogenetic tree. We discuss a set of these implementations and present the software and databases that are available to perform phylogenetic profiling. Key words: Phylogenetic profiles, Coevolution, Functional associations, Comparative genomics, Coevolving proteins. 1. Introduction. The remarkable improvements in sequencing technology that have occurred over the past few decades have made the sequencing of genomes an ever more routine task.
  • Production of Paddy Straw Mushroom (Volvariella volvacea): Paddy Straw Mushroom Is an Edible Mushroom of the Tropics and Subtropics
    Production of paddy straw mushroom (Volvariella volvacea): Paddy straw mushroom is an edible mushroom of the tropics and subtropics. It was first cultivated in China as early as 1822. Around 1932-35, the straw mushroom was introduced into the Philippines, Malaysia, and other Southeast Asian countries by overseas Chinese. In India, this mushroom was first cultivated in the early 1940s. In India, 19 edible species of Volvariella have been recorded, but cultivation methods have been devised for only three of them, viz. V. volvacea, V. esculenta (Mass) and V. diplasia. Volvariella volvacea is deep grey in colour and produces fewer fruiting bodies per bed, whereas V. diplasia is whitish or ashy in colour and produces more, smaller fruiting bodies. Paddy straw mushroom (Volvariella spp.), also called ‘straw mushroom’, is a fungus of the tropics and subtropics and has been cultivated for many years in India. Paddy straw mushroom is also known as “warm mushroom” because it grows at relatively high temperatures. It is a fast-growing mushroom, and under favorable growing conditions the total crop cycle is completed within 4-5 weeks. This mushroom can use a wide range of cellulosic materials, and the C:N ratio it needs is 40 to 60, quite high in comparison with other cultivated mushrooms. It can be grown quickly and easily on uncomposted substrates such as paddy straw and cotton waste or other cellulosic organic waste materials. Several species of Volvariella have reportedly been grown for food, but only three species of the straw mushroom, i.e. Volvariella volvacea, Volvariella esculenta and Volvariella diplasia, are cultivated artificially.
  • Second Annual DOE Joint Genome Institute User Meeting
    Second Annual DOE Joint Genome Institute User Meeting. Sponsored by the U.S. Department of Energy Office of Science, March 28–30, 2007, Marriott Hotel, Walnut Creek, California. Contents: Speaker Presentations (abstracts alphabetical by speaker); Poster Presentations (posters alphabetical by first author; *presenting author); Attendees (current as of March 9, 2007); Author Index. Speaker Presentations (abstracts alphabetical by speaker). The JGI Aspergillus niger Genome Project. Scott E. Baker, Fungal Biotechnology Team, Chemical and Biological Process Development Group, Pacific Northwest National Laboratory, Richland, WA. Aspergillus niger is an economically important filamentous ascomycete fungus that is used in industry for its prodigious production of citric acid and a number of enzymes. The DOE Joint Genome Institute has sequenced the genome of A. niger ATCC 1015, a wild-type strain and the source of the first patented microbial fermentation process for citric acid production. Preliminary annotation indicates the presence of over 250 glycosyl hydrolases. These enzymes are crucial for the degradation of lignocellulosic biomass into simple sugars and other chemical building blocks.
  • DOE Human Genome Program Contractor-Grantee Workshop VIII
    Human Genome Program, U.S. Department of Energy, Office of Biological and Environmental Research, SC-72 GTN, Germantown, MD 20874-1290; 301/903-6488, Fax: 301/903-8521. A limited number of print copies are available. Contact: Sheryl Martin, Human Genome Management Information System, Oak Ridge National Laboratory, 1060 Commerce Park, MS 6480, Oak Ridge, TN 37830; 865/576-6669, Fax: 865/574-9888. An electronic version of this document will be available on February 27, 2000, at the Human Genome Project Information Web site under Publications (http://www.ornl.gov/hgmis). Abstracts for this publication were submitted via the web. DOE/SC-0002. DOE Human Genome Program Contractor-Grantee Workshop VIII, February 27-March 2, 2000, Santa Fe, New Mexico. Date Published: February 2000. Prepared for the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Washington, DC 20874-1290. Prepared by Human Genome Management Information System, Oak Ridge National Laboratory, Oak Ridge, TN 37830. Managed by LOCKHEED MARTIN ENERGY RESEARCH CORP. for the U.S. DEPARTMENT OF ENERGY UNDER CONTRACT DE-AC05-96OR22464. Contents: Introduction to Contractor-Grantee Workshop VIII; Sequencing; 1. Sequence Analysis of Human Chromosome 19. Anne Olsen, Paul Predki, Ken Frankel, Laurie Gordon, Astrid Terry, Matt Nolan, Mark Wagner, Amy Brower, Andrea Aerts, Marne! Bondoc, Kristen Kadner, Manesh Shah, Richard Mural, Miriam Land, Denise Schmoyer, Sergey Petrov, Doug Hyatt, Morey Parang, Jay Snoddy, Ed Uberbacher, and the JGI Production Sequencing Team; 2. Draft Sequencing Procedures for Chromosome 16 Sequencing. Mark O.
  • Paddy Straw Mushroom (433)
    Pacific Pests, Pathogens and Weeds - Online edition. Paddy straw mushroom (433). Common Name: Paddy straw mushroom, straw mushroom, Chinese mushroom. Scientific Name: Volvariella volvacea. Distribution: It is cultivated widely in East and Southeast Asia, and introduced in many other regions, including Africa, North America and Australia. It is recorded from Solomon Islands. Use & Appearance: The paddy straw mushroom is grown on rice straw beds and picked immature, during the button or egg phase and before the veil ruptures (Photo 1). It is found in woodchips, rich garden soil, compost piles and, in the Pacific, on decaying trunks of fallen sago palm and empty fruit bunches of oil palm. They are often available fresh in Asia, but are more frequently found canned or dried in countries where they are not cultivated. Methods of cultivation are here: http://www.fao.org/3/ca4450en/ca4450en.pdf. Young stages are formed under a greyish-brown veil (‘universal veil’), which surrounds the mushroom at the ‘button stage’ (Photo 2). It breaks to allow the stem and cap to expand, leaving a dark brown cup-shaped structure (the ‘volva’) at the base (Photo 2). The cap is 5-12 cm in diameter, first ovoid, then cone-like and finally broadly convex or bell-shaped, dark grey in the centre, becoming silvery-white or brownish-grey towards the margins, and radially streaked with soft hairs (Photo 3). The cap tends to split at the edges.
    [Photo 1. Button stage of the paddy straw mushroom, Volvariella volvacea, showing many still enclosed in the veil, and others where the veil has broken.]
  • Integrating Genomic Data to Predict Transcription Factor Binding
    Integrating Genomic Data to Predict Transcription Factor Binding. Dustin T. Holloway (1), Mark Kon (2), Charles DeLisi (3). (1) Molecular Biology, Cell Biology and Biochemistry, Boston University, Boston, MA 02215, U.S.A.; (2) Department of Mathematics and Statistics, Boston University, Boston, MA 02215, U.S.A.; (3) Bioinformatics and Systems Biology, Boston University, Boston, MA 02215, U.S.A. Abstract: Transcription factor binding sites (TFBS) in gene promoter regions are often predicted by using position specific scoring matrices (PSSMs), which summarize sequence patterns of experimentally determined TF binding sites. Although PSSMs are more reliable than simple consensus string matching in predicting a true binding site, they generally result in high numbers of false positive hits. This study attempts to reduce the number of false positive matches and generate new predictions by integrating various types of genomic data using two methods: a Bayesian allocation procedure and support vector machine classification. Several sources of evidence will be explored to strengthen the prediction of a true TFBS in the Saccharomyces cerevisiae genome: binding site degeneracy, binding site conservation, phylogenetic profiling, TF binding site clustering, gene expression profiles, GO functional annotation, and k-mer counts in promoter regions. Binding site degeneracy (or redundancy) refers to the number of times a particular transcription factor’s binding motif is discovered in the upstream region of a gene. Phylogenetic conservation takes into account the number of orthologous upstream regions in other genomes that contain a particular binding site. Phylogenetic profiling refers to the presence or absence of a gene across a large set of genomes.
  • Scalable Phylogenetic Profiling Using Minhash Uncovers Likely Eukaryotic Sexual Reproduction Genes
    bioRxiv preprint doi: https://doi.org/10.1101/852491; this version posted November 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Scalable Phylogenetic Profiling using MinHash Uncovers Likely Eukaryotic Sexual Reproduction Genes. David Moi (1,2,3,*), Laurent Kilchoer (1,2,3), Pablo S. Aguilar (4,5), and Christophe Dessimoz (1,2,3,6,7,*). (1) Department of Computational Biology, University of Lausanne, Switzerland; (2) Center for Integrative Genomics, University of Lausanne, Switzerland; (3) SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; (4) Instituto de Investigaciones Biotecnologicas (IIBIO), Universidad Nacional de San Martín, Buenos Aires, Argentina; (5) Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE-CONICET); (6) Department of Genetics, Evolution, and Environment, University College London, UK; (7) Department of Computer Science, University College London, UK. *Corresponding authors. Abstract: Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families that tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been used more widely with prokaryotes than with eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require quadratic time or worse in the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles.
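The binary-vector form of phylogenetic profiling described in the two Pellegrini entries above can be illustrated with a short Python sketch. It assumes each protein family has already been reduced to a presence/absence vector over one shared, ordered list of genomes, and it uses Hamming distance, about the simplest similarity measure available, as a stand-in for the more sophisticated scores those works discuss. The family names, the six-genome profiles, and the `max_dist` cutoff are hypothetical.

```python
from itertools import combinations


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of genomes in which exactly one of the two families is present."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles: dict[str, tuple[int, ...]],
                 max_dist: int = 1) -> list[tuple[str, str, int]]:
    """Return family pairs whose profiles differ in at most max_dist genomes."""
    pairs = []
    for (f1, p1), (f2, p2) in combinations(sorted(profiles.items()), 2):
        d = hamming(p1, p2)
        if d <= max_dist:
            pairs.append((f1, f2, d))
    return pairs


# Hypothetical profiles over six genomes (1 = family present, 0 = absent).
profiles = {
    "famA": (1, 1, 0, 1, 0, 1),
    "famB": (1, 1, 0, 1, 0, 0),
    "famC": (0, 0, 1, 0, 1, 0),
    "famD": (1, 1, 1, 1, 1, 1),
}

if __name__ == "__main__":
    for f1, f2, d in linked_pairs(profiles):
        print(f1, f2, d)
```

Run as a script, it reports famA and famB, whose profiles differ in a single genome. Families present in every genome, like the hypothetical famD, carry little information: any two ubiquitous families would match perfectly without that implying a functional link.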
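The PSSM step that the Holloway et al. abstract above starts from can be sketched as follows: build a log-odds matrix from aligned example sites, score every window of a promoter sequence, and count the windows above a cutoff, which is one simple way to compute the "binding site degeneracy" feature that abstract defines. This is a minimal illustration, not the authors' pipeline. The pseudocount, the uniform background, the base-2 log-odds, the forward-strand-only scan, the toy sites, and the 5.0 threshold are all assumptions, and input sequences must contain only uppercase A, C, G, and T.

```python
import math
from typing import Optional

BASES = "ACGT"


def build_pssm(sites: list[str], pseudocount: float = 1.0,
               background: Optional[dict[str, float]] = None) -> list[dict[str, float]]:
    """Build a base-2 log-odds PSSM from equal-length aligned binding sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for i in range(width):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(BASES)
        # Smoothed base frequency at position i, relative to the background.
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                     for b in BASES})
    return pssm


def scan(seq: str, pssm: list[dict[str, float]]) -> list[tuple[int, float]]:
    """Score every window of seq (forward strand only) as a sum of column scores."""
    width = len(pssm)
    return [(i, sum(col[seq[i + j]] for j, col in enumerate(pssm)))
            for i in range(len(seq) - width + 1)]


def count_hits(seq: str, pssm: list[dict[str, float]], threshold: float) -> int:
    """Number of windows scoring at or above threshold (a simple degeneracy count)."""
    return sum(1 for _, score in scan(seq, pssm) if score >= threshold)


# Toy aligned sites and promoter fragment (hypothetical, for illustration only).
sites = ["TGACTCA", "TGACTAA", "TGAGTCA", "TGACTCA"]
pssm = build_pssm(sites)
promoter = "AATGACTCAGGTTTGAGTCATT"

if __name__ == "__main__":
    for pos, score in scan(promoter, pssm):
        if score >= 5.0:
            print(pos, round(score, 2))
    print("hits:", count_hits(promoter, pssm, 5.0))
```

With these inputs the scan flags two windows, at offsets 2 (TGACTCA) and 13 (TGAGTCA). On real promoter sets, a fixed cutoff like this also passes many chance matches, which is the false-positive problem the abstract sets out to reduce by integrating other genomic data.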
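The HogProf abstract above rests on a standard trick: MinHash signatures estimate the Jaccard similarity of two sets without comparing the sets directly, and banded locality-sensitive hashing finds candidate pairs without an all-against-all comparison. The Python sketch below shows only that generic trick; it does not reproduce HogProf's hierarchical orthologous group profiles or its actual indexing code. Each profile is assumed to be a set of taxon labels, salted BLAKE2b hashes stand in for random permutations, and the taxon sets and the `num_perm`, `bands`, and `rows` values are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash; each seed plays the role of one random permutation."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: set[str], num_perm: int = 128) -> list[int]:
    """For each hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot sketch an empty profile")
    return [min(_hash(seed, x) for x in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of positions where the minima agree estimates |A & B| / |A | B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(signatures: dict[str, list[int]], bands: int,
                   rows: int) -> set[tuple[str, str]]:
    """Banded LSH: names sharing any identical band of `rows` values become candidates."""
    buckets: dict[tuple, list[str]] = defaultdict(list)
    for name, sig in signatures.items():
        if len(sig) != bands * rows:
            raise ValueError("signature length must equal bands * rows")
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(name)
    pairs = set()
    for names in buckets.values():
        for x, y in combinations(sorted(names), 2):
            pairs.add((x, y))
    return pairs


# Hypothetical profiles: the set of taxa in which each family is retained.
profiles = {
    "famX": {f"taxon{i}" for i in range(0, 30)},
    "famY": {f"taxon{i}" for i in range(0, 30)},
    "famW": {f"taxon{i}" for i in range(10, 40)},
    "famZ": {f"other{i}" for i in range(30)},
}

if __name__ == "__main__":
    sigs = {name: minhash_signature(taxa) for name, taxa in profiles.items()}
    print("estimated J(famX, famW):", estimate_jaccard(sigs["famX"], sigs["famW"]))
    print("candidates:", sorted(lsh_candidates(sigs, bands=32, rows=4)))
```

With 32 bands of 4 rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^4)^32. Identical profiles therefore always collide, famX and famW (exact similarity 0.5) usually do, and disjoint profiles essentially never do; only the candidate pairs then need an exact comparison.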