Computational Approaches for Protein Function Prediction: a Survey
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
-
Gene Prediction and Genome Annotation
A Crash Course in Gene and Genome Annotation
Lieven Sterck, Bioinformatics & Systems Biology, VIB-UGent [email protected]
ProCoGen Dissemination Workshop, Riga, 5 Nov 2013
“Conifer sequencing: basic concepts in conifer genomics”
“This Project is financially supported by the European Commission under the 7th Framework Programme”
Genome annotation: finding the biologically relevant features on a raw genomic sequence (in a high-throughput manner)
Thx to: BSB - annotation team
• Lieven Sterck (Ectocarpus, higher plants, conifers, …)
• Yao-cheng Lin (Fungi, conifers, …)
• Stephane Rombauts (green alga, mites, …)
• Bram Verhelst (green algae)
• Pierre Rouzé
• Yves Van de Peer
Annotation experience
• Plant genomes: A. thaliana & relatives (e.g. A. lyrata), Poplar, Physcomitrella patens, Medicago, Tomato, Vitis, Apple, Eucalyptus, Zostera, Spruce, Oak, Orchids …
• Fungal genomes: Laccaria bicolor, Melampsora laricis-populina, Heterobasidion, other basidiomycetes, Glomus intraradices, Pichia pastoris, Geotrichum candidum, Candida ...
• Algal genomes: Ostreococcus spp, Micromonas, Bathycoccus, Phaeodactylum (and other diatoms), E. hux, Ectocarpus, Amoebophrya …
• Animal genomes: Tetranychus urticae, Brevipalpus spp (mites), ...
Why genome annotation?
• Raw sequence data is not useful for most biologists
• To be meaningful to them it has to be converted into biologically significant knowledge
-
Gene Structure Prediction
Gene finding
Lorenzo Cerutti, Swiss Institute of Bioinformatics
EMBNet course, September 2002
Introduction
Gene finding is about detecting coding regions and inferring gene structure.
Gene finding is difficult
• DNA sequence signals have low information content (degenerated and highly unspecific)
• It is difficult to discriminate real signals
• Sequencing errors
Prokaryotes
• High gene density and simple gene structure
• Short genes have little information
• Overlapping genes
Eukaryotes
• Low gene density and complex gene structure
• Alternative splicing
• Pseudo-genes
Gene finding strategies
Homology method
• Gene structure can be deduced by homology
• Requires a not too distant homologous sequence
Ab initio method
• Requires two types of information: compositional information and signal information
Gene finding: Homology method
Principles of the homology method
• Coding regions evolve slower than non-coding regions, i.e. local sequence similarity can be used as a gene finder.
• Homologous sequences reflect a common evolutionary origin and possibly a common gene structure, i.e. gene structure can be solved by homology (mRNAs, ESTs, proteins, domains).
• Standard homology search methods can be used (BLAST, Smith-Waterman, ...).
• Include "gene syntax" information (start/stop codons, ...).
Homology methods are also useful to confirm predictions inferred by other methods.
Homology method: a simple view
[Figure: a gene of unknown structure aligned by homology with a gene of known structure (Exon 1, Exon 2, Exon 3); DNA signals to find: ATG, GT, {TAA,TGA,TAG}, AG]
Procrustes
Procrustes is software that predicts gene structure from homology found in proteins (Gelfand et al., 1996)
• Principle of the algorithm
-
Molecular Basis for the Distinct Cellular Functions of the Lsm1-7 and Lsm2-8 Complexes
bioRxiv preprint doi: https://doi.org/10.1101/2020.04.22.055376; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Molecular basis for the distinct cellular functions of the Lsm1-7 and Lsm2-8 complexes
Eric J. Montemayor1,2, Johanna M. Virta1, Samuel M. Hayes1, Yuichiro Nomura1, David A. Brow2, Samuel E. Butcher1
1Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA.
2Department of Biomolecular Chemistry, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA.
Correspondence should be addressed to E.J.M. ([email protected]) and S.E.B. ([email protected]).
Abstract
Eukaryotes possess eight highly conserved Lsm (like Sm) proteins that assemble into circular, heteroheptameric complexes, bind RNA, and direct a diverse range of biological processes. Among the many essential functions of Lsm proteins, the cytoplasmic Lsm1-7 complex initiates mRNA decay, while the nuclear Lsm2-8 complex acts as a chaperone for U6 spliceosomal RNA. It has been unclear how these complexes perform their distinct functions while differing by only one out of seven subunits. Here, we elucidate the molecular basis for Lsm-RNA recognition and present four high-resolution structures of Lsm complexes bound to RNAs. The structures of Lsm2-8 bound to RNA identify the unique 2′,3′ cyclic phosphate end of U6 as a prime determinant of specificity. In contrast, the Lsm1-7 complex strongly discriminates against cyclic phosphates and tightly binds to oligouridylate tracts with terminal purines.
-
Posters A.Pdf
INVESTIGATING THE COUPLING MECHANISM IN THE E. COLI MULTIDRUG TRANSPORTER, MdfA, BY FLUORESCENCE SPECTROSCOPY
N. Fluman, D. Cohen-Karni, E. Bibi
Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
In bacteria, multidrug transporters couple the energetically favored import of protons to export of chemically-dissimilar drugs (substrates) from the cell. By this function, they render bacteria resistant against multiple drugs. In this work, fluorescence spectroscopy of purified protein is used to unravel the mechanism of coupling between protons and substrates in MdfA, an E. coli multidrug transporter. Intrinsic fluorescence of MdfA revealed that binding of an MdfA substrate, tetraphenylphosphonium (TPP), induced a conformational change in this transporter. The measured affinity of MdfA-TPP was increased in basic pH, raising a possibility that TPP might bind tighter to the deprotonated state of MdfA. Similar increases in affinity of TPP also occurred (1) in the presence of the substrate chloramphenicol, or (2) when MdfA is covalently labeled by the fluorophore monobromobimane at a putative chloramphenicol interacting site. We favor a mechanism by which basic pH, chloramphenicol binding, or labeling with monobromobimane, all induce a conformational change in MdfA, which results in deprotonation of the transporter and increase in the affinity of TPP.
PHENOTYPE CHARACTERIZATION OF AZOSPIRILLUM BRASILENSE Sp7 ABC TRANSPORTER (wzm) MUTANT
A. Lerner1,2, S. Burdman1, Y. Okon1,2
1Department of Plant Pathology and Microbiology, Faculty of Agricultural, Food and Environmental Quality Sciences, Hebrew University of Jerusalem, Rehovot, Israel, 2The Otto Warburg Center for Agricultural Biotechnology, Faculty of Agricultural, Food and Environmental Quality Sciences, Hebrew University of Jerusalem, Rehovot, Israel
Azospirillum, a free-living nitrogen fixer, belongs to the plant growth promoting rhizobacteria (PGPR), living in close association with plant roots.
-
Sox2-RNA Mechanisms of Chromosome Topological Control in Developing Forebrain
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.22.307215; this version posted September 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Title: Sox2-RNA mechanisms of chromosome topological control in developing forebrain
Ivelisse Cajigas1, Abhijit Chakraborty2, Madison Lynam1, Kelsey R Swyter1, Monique Bastidas1, Linden Collens1, Hao Luo1, Ferhat Ay2,3, Jhumku D. Kohtz1,4
1Department of Pediatrics, Northwestern University, Feinberg School of Medicine, Department of Human Molecular Genetics, Stanley Manne Children's Research Institute 2430 N Halsted, Chicago, IL 60614
2Centers for Autoimmunity and Cancer Immunotherapy, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
3School of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
4To whom correspondence should be addressed: [email protected]
Summary
Precise regulation of gene expression networks requires the selective targeting of DNA enhancers. The Evf2 long non-coding RNA regulates Dlx5/6 ultraconserved enhancer (UCE) interactions with long-range target genes, controlling gene expression over a 27Mb region in mouse developing forebrain. Here, we show that Evf2 long range gene repression occurs through multi-step mechanisms involving the transcription factor Sox2, a component of the Evf2 ribonucleoprotein complex (RNP).
-
Proteomics Provides Insights Into the Inhibition of Chinese Hamster V79
www.nature.com/scientificreports
Proteomics provides insights into the inhibition of Chinese hamster V79 cell proliferation in the deep underground environment
Jifeng Liu1,2, Tengfei Ma1,2, Mingzhong Gao3, Yilin Liu4, Jun Liu1, Shichao Wang2, Yike Xie2, Ling Wang2, Juan Cheng2, Shixi Liu1*, Jian Zou1,2*, Jiang Wu2, Weimin Li2 & Heping Xie2,3,5
As resources in the shallow depths of the earth are exhausted, people will spend extended periods of time in the deep underground space. However, little is known about how the deep underground environment affects the health of organisms. Hence, we established both a deep underground laboratory (DUGL) and an above ground laboratory (AGL) to investigate the effect of environmental factors on organisms. Six environmental parameters were monitored in the DUGL and AGL. Growth curves were recorded and tandem mass tag (TMT) proteomics analysis was performed to explore the proliferative ability and differentially abundant proteins (DAPs) in V79 cells (a cell line widely used in biological study in DUGLs) cultured in the DUGL and AGL. Parallel Reaction Monitoring was conducted to verify the TMT results. γ ray dose rate showed the most detectable difference between the two laboratories, whereby γ ray dose rate was significantly lower in the DUGL compared to the AGL. V79 cell proliferation was slower in the DUGL. Quantitative proteomics detected 980 DAPs (absolute fold change ≥ 1.2, p < 0.05) between V79 cells cultured in the DUGL and AGL. Of these, 576 proteins were up-regulated and 404 proteins were down-regulated in V79 cells cultured in the DUGL. KEGG pathway analysis revealed that seven pathways (e.g.
-
A Curated Benchmark of Enhancer-Gene Interactions for Evaluating Enhancer-Target Gene Prediction Methods
University of Massachusetts Medical School
eScholarship@UMMS
Open Access Articles, Open Access Publications by UMMS Authors
2020-01-22
A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods
Jill E. Moore, University of Massachusetts Medical School, et al.
Let us know how access to this document benefits you.
Follow this and additional works at: https://escholarship.umassmed.edu/oapubs
Part of the Bioinformatics Commons, Computational Biology Commons, Genetic Phenomena Commons, and the Genomics Commons
Repository Citation
Moore JE, Pratt HE, Purcaro MJ, Weng Z. (2020). A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods. Open Access Articles. https://doi.org/10.1186/s13059-019-1924-8. Retrieved from https://escholarship.umassmed.edu/oapubs/4118
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License. This material is brought to you by eScholarship@UMMS. It has been accepted for inclusion in Open Access Articles by an authorized administrator of eScholarship@UMMS. For more information, please contact [email protected].
Moore et al. Genome Biology (2020) 21:17 https://doi.org/10.1186/s13059-019-1924-8
RESEARCH, Open Access
A curated benchmark of enhancer-gene interactions for evaluating enhancer-target gene prediction methods
Jill E. Moore, Henry E. Pratt, Michael J. Purcaro and Zhiping Weng*
Abstract
Background: Many genome-wide collections of candidate cis-regulatory elements (cCREs) have been defined using genomic and epigenomic data, but it remains a major challenge to connect these elements to their target genes.
Results: To facilitate the development of computational methods for predicting target genes, we develop a Benchmark of candidate Enhancer-Gene Interactions (BENGI) by integrating the recently developed Registry of cCREs with experimentally derived genomic interactions.
-
There Is a Lot of Research on Gene Prediction Methods
LBNL #58992
Fungal Genomic Annotation
Igor V. Grigoriev1, Diego A. Martinez2 and Asaf A. Salamov1
1US Department of Energy Joint Genome Institute, Walnut Creek, CA 94598 ([email protected], [email protected]); 2Los Alamos National Laboratory Joint Genome Institute, P.O. Box 1663 Los Alamos, NM 87545 ([email protected]).
Sequencing technology in the last decade has advanced at an incredible pace. Currently there are hundreds of microbial genomes available with more still to come. Automated genome annotation aims to analyze this amount of sequence data in a high-throughput fashion and help researchers to understand the biology of these organisms. Manual curation of automatically annotated genomes validates the predictions and sets up 'gold' standards for improving the methodologies used. Here we review the methods and tools used for annotation of fungal genomes in different genome sequencing centers.
1. INTRODUCTION
In recent years the power of DNA sequencing has dramatically increased, with dedicated centers running 24 hours a day 7 days a week able to produce as much as 2 gigabases of raw sequence or more a month. The researchers who work on a variety of fungi are fortunate, as most fungal genomes are under 50 megabases and produce high-quality draft assembly almost as easily as bacteria. This feature of fungal genomes is a key reason that the first sequenced eukaryotic genome was of the ascomycete Saccharomyces cerevisiae (Goffeau et al. 1996). As of the submission of this chapter, one can obtain draft sequences of more than 100 fungal genomes (Table 1) and the list is growing. While some are species of the same genus (e.g., Aspergillus has three members and more coming), there still remains a height of data that could confuse and bury a researcher for many years.
-
Gene Prediction Using Deep Learning
FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO
Gene prediction using Deep Learning
Pedro Vieira Lamares Martins
Mestrado Integrado em Engenharia Informática e Computação
Supervisor: Rui Camacho (FEUP)
Second Supervisor: Nuno Fonseca (EBI-Cambridge, UK)
July 22, 2018
Approved in oral examination by the committee:
Chair: Doctor Jorge Alves da Silva
External Examiner: Doctor Carlos Manuel Abreu Gomes Ferreira
Supervisor: Doctor Rui Carlos Camacho de Sousa Ferreira da Silva
Abstract
Every living being has in its cells complex molecules called Deoxyribonucleic Acid (or DNA), which are responsible for all its biological features. This DNA molecule is condensed into larger structures called chromosomes, which all together compose the individual’s genome. Genes are size-varying DNA sequences which contain a code that is often used to synthesize proteins. Proteins are very large molecules which have a multitude of purposes within the individual’s body. Only a very small portion of the DNA has gene sequences. There is no accurate figure for the total number of genes in the human genome, but current estimates place that number between 20000 and 25000. Ever since the entire human genome was sequenced, there has been a consistent effort to identify the gene sequences. The number was initially thought to be much higher, but it has since been revised downward following improvements in gene finding techniques. Computational prediction of genes is among these improvements, and is nowadays an area of deep interest in bioinformatics as new tools focused on the theme are developed.
-
Prediction of Protein-Protein Interactions and Essential Genes Through Data Integration
Prediction of Protein-Protein Interactions and Essential Genes Through Data Integration
by Max Kotlyar
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Medical Biophysics, University of Toronto
Copyright © 2011 by Max Kotlyar
Abstract
The currently known network of human protein-protein interactions (PPIs) is providing new insights into diseases and helping to identify potential therapies. However, according to several estimates, the known interaction network may represent only 10% of the entire interactome – indicating that more comprehensive knowledge of the interactome could have a major impact on understanding and treating diseases. The primary aim of this thesis was to develop computational methods to provide increased coverage of the interactome. A secondary aim was to gain a better understanding of the link between networks and phenotype, by analyzing essential mouse genes. Two algorithms were developed to predict PPIs and provide increased coverage of the interactome: FpClass and mixed co-expression. FpClass differs from previous PPI prediction methods in two key ways: it integrates both positive and negative evidence for protein interactions, and it identifies synergies between predictive features. Through these approaches FpClass provides interaction networks with significantly improved reliability and interactome coverage. Compared to previous predicted human PPI networks, FpClass provides a network with over 10 times more interactions, about 2 times more proteins and a lower false discovery rate.
-
Supplementary Materials
Supplementary Materials COMPARATIVE ANALYSIS OF THE TRANSCRIPTOME, PROTEOME AND miRNA PROFILE OF KUPFFER CELLS AND MONOCYTES Andrey Elchaninov1,3*, Anastasiya Lokhonina1,3, Maria Nikitina2, Polina Vishnyakova1,3, Andrey Makarov1, Irina Arutyunyan1, Anastasiya Poltavets1, Evgeniya Kananykhina2, Sergey Kovalchuk4, Evgeny Karpulevich5,6, Galina Bolshakova2, Gennady Sukhikh1, Timur Fatkhudinov2,3 1 Laboratory of Regenerative Medicine, National Medical Research Center for Obstetrics, Gynecology and Perinatology Named after Academician V.I. Kulakov of Ministry of Healthcare of Russian Federation, Moscow, Russia 2 Laboratory of Growth and Development, Scientific Research Institute of Human Morphology, Moscow, Russia 3 Histology Department, Medical Institute, Peoples' Friendship University of Russia, Moscow, Russia 4 Laboratory of Bioinformatic methods for Combinatorial Chemistry and Biology, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow, Russia 5 Information Systems Department, Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia 6 Genome Engineering Laboratory, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia Figure S1. Flow cytometry analysis of unsorted blood sample. Representative forward, side scattering and histogram are shown. The proportions of negative cells were determined in relation to the isotype controls. The percentages of positive cells are indicated. The blue curve corresponds to the isotype control. Figure S2. Flow cytometry analysis of unsorted liver stromal cells. Representative forward, side scattering and histogram are shown. The proportions of negative cells were determined in relation to the isotype controls. The percentages of positive cells are indicated. The blue curve corresponds to the isotype control. Figure S3. MiRNAs expression analysis in monocytes and Kupffer cells. Full-length of heatmaps are presented. -
"An Overview of Gene Identification: Approaches, Strategies, and Considerations"
An Overview of Gene Identification: Approaches, Strategies, and Considerations (UNIT 4.1)
Modern biology has officially ushered in a new era with the completion of the sequencing of the human genome in April 2003. While often erroneously called the “post-genome” era, this milestone truly marks the beginning of the “genome era,” a time in which the availability of sequence data for many genomes will have a significant effect on how science is performed in the 21st century. While complete human sequence data is now available at an overall accuracy of 99.99%, the mere availability of all of these As, Cs, Ts, and Gs still does not answer some of the basic questions regarding the human genome: how many genes actually comprise the genome, how many of these genes code for multiple gene products, and where those genes actually lie along the complement of human chromosomes. Current estimates, based on preliminary analyses of the draft sequence, place the number of human genes at ∼30,000 (International Human Genome Sequencing Consortium, 2001). This number is in stark contrast to previously-suggested estimates, which had ranged as high as 140,000. A number that is in the 30,000 range brings into question the one-gene, one-protein hypothesis, underscoring the importance of processes such as alternative splicing in the generation of multiple gene products from a single gene. Finding all of the genes and the positions of those genes within the human genome sequence, and in other model organism genome sequences as well, requires the development and application of robust computational methods, some of which are listed in Table 4.1.1.