Non-Cellulosomal Cohesin- and Dockerin-Like Modules in the Three Domains of Life Ayelet Peera, Steven P

Total Page:16

File Type:pdf, Size:1020Kb

Non-Cellulosomal Cohesin- and Dockerin-Like Modules in the Three Domains of Life Ayelet Peera, Steven P 1 Non-cellulosomal Cohesin- and Dockerin-like Modules in the Three Domains of Life Ayelet Peera, Steven P. Smithb, Edward A. Bayerc,*, Raphael Lameda and Ilya Borovoka aDepartment of Molecular Microbiology and Biotechnology, Tel Aviv University, Ramat Aviv 69978 Israel bDepartment of Biochemistry, Queen’s University Kingston Ontario Canada K7L 3N6 cDepartment of Biological Sciences, Weizmann Institute of Science, Rehovot 76100 Israel *Corresponding author: Edward A. Bayer Tel: (+972) -8-934-2373 Fax: (+972)-8-946-8256 Email: [email protected] Supplementary Table S1. Compendium of cohesins and dockerins in the three domains of life. In order to discover new putative cohesin/dockerin-containing proteins, we used sequences of all the classical cohesin and dockerin modules from C. thermocellum, C. cellulovorans, C. cellulolyticum, B. cellulosolvens and Acetivibrio cellulolyticus as well as cohesins and dockerins recently discovered in rumen bacteria, Ruminococcus albus and R. flavefaciens as BlastP queries for the main NCBI Blast server against all non- redundant protein sequences deposited in GenBank/EMBL/DDBJ databases. We also performed extensive searches using the TblastN algorithm through all publicly available microbial genome databases including those attached to the NCBI BLAST server for bacterial genomes (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?), as well as several additional microbial genome databases – Microbial Genomics at the DOE Joint Genome Institute (http://genome.jgi-psf.org/mic_home.html), the Rumenomics database at TIGR/JCVI (http://tigrblast.tigr.org/rumenomics/index.cgi) and Bacterial Genomes at the Sanger Centre (http://www.sanger.ac.uk/Projects/Microbes/). Once a putative cohesin or dockerin-encoding gene product was identified, gene-walking techniques were employed to analyze and locate possible cellulosome-like gene clusters. The date of the final search of the database, used for construction of this table, was September 23, 2008. 2 ARCHAEA Euryarchaeota (No cohesins or dockerins are currently predicted in Crenarchaeota) Gene Protein Putative Putative Annotation and comments1 size (no. cohesin dockerin residues) Archaeoglobus fulgidus DSM 4304 / ATCC 49558 (Archaeoglobi; Archaeoglobales) AF2375 506 279..418 436..488 No obvious signal peptide (SP). TMD (69..91) + Cohesin + (operon) Dockerin Cohesin AF2375 TTIIAGSAEAPQGSDIQVPVKIENADKVGSINLILSYPNVLEVEDV LQGSLTQNSLFDYNVEGNQIKVGIADSNGISGDGSLFYVKFRVTGN EKAEQAENVKGKLRGLGQQLSEITLRNSHALTLQGIEIYDIDGNSV KV Dockerin AF23751 GDVNGDGEINSLDALLALQMSIG KVEPNPV ADMDGDGKVLAKDATEIMKMATD AF2376 190 33..156 - SP, Hypothetical protein, +Cohesin, a 'strong' TMD (167..185) (operon) Cohesin AF2376 MVVKIPDTSGAVGGTVEVPVEVENAQNLGSMDIVIVYDPTILKVQN VGKGELNKGLLSSNTGEGMVAISLADSKGINGKGSVAVITFQVLKA GSTDLTIQSVKAYDVNTHVDIPVKADNGKFEA 1In dockerin sequences, calcium-coordinating amino acid residues in positions 1, 3, 5, 9 and 12 are highlighted (those fitting the consensus are in blue, atypical residues – in gray; an asterisks follow the C-terminal residue in some proteins. 3 Haloarcula marismortui / Halobacterium marismortui ATCC 43049 (Halobacteria; Halobacteriales) phoD or 741 - 688..741 Alkaline phosphatase D, EC 3.1.3.1; a putative dockerin is similar rrnAC0273 to that in Mhun_3046, Methanospirillum hungatei, (chromosome I) Putative dockerin: EDVNGNGRVDYDDIQLLFDNFDD DSVALNKTA YDFNENGKLEYDDIVTLYSEVN* rrnAC0766 892 - 839..892 Putative cell surface protein, (chromosome I) Putative dockerin: EDLNGNGRLDYEDVQVLFSNMDS DSVQLNTGA YDFNENGKLDFADVTALYEEVN* rrnAC2380; 615 - 559-613 putative serine/threonine phosphatase (PF00149; Metallophos) operon? Putative dockerin: (chromosome I) GDADGDGDVDTDDVDRIQRSVAG EDSDIDRDA ADIDGDGDVDIGDAVSARNLAGD rrnAC2381; 442 42-175 - Hypothetical protein; Cohesin domain predicted in both, Pfam operon? (PF00963) and InterPro (IPR002102). (chromosome I) Putative cohesin: PEASGDALSVAPGETVTVQVWANATAVRGYQTNVTFDPTIAEIDSV AGSDDFDSPVTNVNNEAGWVAFNQLRSRETSDPVLAEITLTVPADA AGSTQLAFIEADTKLSNSDGETVAPAAFNSIELRVNNGSTTP rrnB0167 1562 - 126..180 Cbp, calcium-binding protein-like: approx. 8 TSP_3 like domains; (chromosome II) PF00404; dockerin_1 EDVNGDGAVNIVDVDALSRHLES TAAGANWSA YDYTGDNRTDVGDIQWLFVATRS Halobacterium salinarium NRC-1 / Halobacterium halobium NRC-1 (Halobacteria; Halobacteriales) VNG1951G 585 - 532..585 Subtilisin homolog or Sub; PF00404; Dockerin_1 EDVNGDGVVDMADVELLYQLALR NAVPESTTA FDVTGDGKFDMHDVQALTRQVE* 4 Haloquadratum walsbyi DSM 16790 (Halobacteria; Halobacteriales) HQ1205A, 1626 1285-1420 1571-1626 Probable cell surface adhesin, operon? Putative cohesin: VEGVELQSGNTTTLATNFTLREDTTADLTATVEFSNETAIENTDVT FSGSGGVPSSNDASLPVTQTTNSSGIVQFEIPASDTNDAYTITADS ANFSNASTGVSPAPGDSVSRDFSLEEAQAQEPSVSVNFTDQTV Putative dockerin: EDVNGDNATNFDDVVALAFVLTE KDDLTQQQRDA FDINDDGVVNFDDVVRLAFPGQ* HQ1208A, 619 - 558-619 Subtilisin-like serine protease Sub, operon? Putative dockerin: EDVNGDNGYTFDDVIALAFADTE RLTNQQRDA LDFDHDGNIDFDDVVELAFER* HQ1210A, 125 71-125 Hypothetical protein, operon? Putative dockerin: EDVNGDAESTFDDVIALAFVISQ TGQLTEQQRDA LNIDGEDDVDFDNVIALTFNI* HQ3467A, 2079 - 1952-2006; Probable cell surface glycoprotein Hmu3 operon? 2027-2079 Putative dockerin: EDINGDGEFNFDDVVALAFNTGT DPIQQNSDL FDINNDGVVNFDDVVALAFESPS Putative dockerin: EDINGDGELNFDDVVALAFNIES DAVQQNTDQ FDFDNDGNVDFDDVVELAFET* HQ3469A, 506 - 443-506 Probable cell surface glycoprotein, operon? Putative dockerin: EDINGDGKANFDDVVTFAFKITT NAVQENADL FDFNDDNDVNFDDVIELAFTLPT Bolhuis et al., 2006. 5 Methanococcoides burtonii DSM 6242 (Methanomicrobia; Methanosarcinales) Mbur_0125 418 35-167 - Hypothetical protein precursor; Cohesin domain, Pfam (PF00963) and InterPro (IPR002102): VSISPPDQLISIGSNVTVIIYVDPDVPVSGAQFDLDFDGDILSVIS ISEGDIFTNGVSTLFSGGTIDNLEGNIMNVYGALFGKNEVIKSGTF ATIVFETKSSGSSSLQLSNVVVSNSIGAAVPIAVINGIVSINGD Mbur_0728*, 1126 846-979 1072-1123 Putative Ig precursor, operon with Cohesin, Pfam (PF00963) and InterPro (IPR002102): Mbur_0729? VVMSSSAQIVSPGQPFSIGISIDPSEPVSGAQLDLLFDEQLVSAEG TIEGDLFNLNGASTLFNSGTIGDSGVITDIYCSIIGAAPVSSKGTM * see its ortholog, ATIAFTAGNTAGVAEFSLSDVVISNARSEGTPYTVTDTMVVI MM_2098 Putative dockerin: WDINRDCAVNILDITLVSQKIGT DGAGTD QDVNQDGVVNIQDLTLVAQHFGE Mbur_0729, 226 53-178 - SP, annotated as "Cellulosome anchoring protein, precursor"; operon with Cohesin, Pfam (PF00963) and InterPro (IPR002102): Mbur_0728? FTVSSGEVLNVAPGETFNVDVFVSPDVDIAGMQFDLLFDSSKFQID SVTEGNLFKQSGMGTFFVAASVTPGLLDNTYGCILGNAAVLTPANF ATITITANEHASGRSAFILKDVIVSDPKGNAVNV Methanococcus aeolicus Nankai-3 / ATCC BAA-1280 (Methanococci; Methanococcales), isolated from deep marine sediment from the Nankai Trough off the coast of Japan Maeo_0111 421 - 24-73 Periplasmic binding protein precursor, Putative N-terminal dockerin: GDVNEDGSINIIDVVYLFKNRNL DYEV GDINCDNSINIIDVVYLFKHYDQ Maeo_0115 703 25-155 SP, PKD domain containing protein precursor, Pfam PF00963; InterPro IPR002102; Cohesin: NITVELTPNNTNTIVGDTINLDLFVKNIPNDTKCSGFETTLSYDSN LLNLTDIKLSDISNNADLKEADLSTGSIKIAWFSNEPSGNITIASL SFKTLNEATSPPATISLSSKTAVSNINGVGYNNITIINSNITIN 6 Maeo_0117 182 22-150 SP, hypothetical protein, Pfam PF00963; InterPro IPR002102; Cohesin: ASNVEVNCPSEVNVGDTFDIVITIKNNESLSGFECQVNVPDNLKII NVLENDDIRKKASNYYKNDFSDRNGIISFMLINEPMQSDFYLATIK VKVLKDSSNTTIHITPKAADKDANEIKMNNINLKLNV Maeo_0118 454 - 36-85 SP, ABC-type Fe3+-hydroxamate transport system periplasmic component-like protein precursor, Putative N-terminal dockerin: GDVNADGSINIIDVVYLFKNRNL DYED GDINADNSINIIDVVYLFKHYDQ Maeo_0119 976 23-160 - SP, APHP domain protein, Pfam PF07705; CARDB (x2); PF04473; DUF553; Pfam PF00963; InterPro IPR002102; Cohesin: NVSVELTPSNVNAIVGDTISLDLVVKNIPNDTKCSGFDTTLSYDSN LLNLTDIKLSDISNNANLKRANLSTGFISLSWSSDEPSGNITIASL SFKALNEATSPTTISLTQTIVSDIEPKGYKITLNNSNIIITKPKAD Methanococcus vannielii SB / ATCC 35089 / DSM 1224 (+ two other strains) (Methanococci; Methanococcales) Mevan_0131 411 - 14-63 Periplasmic binding protein, precursor Putative N-terminal dockerin: GDVNGDGNINIADVVYLFKNRNV PIDV GDLNCDGNVNVADVVYLFRNYDK Mevan_0132 695 22-145 SP, Putative uncharacterized protein, PKD domain, Pfam PF00963; InterPro IPR002102; Cohesin: SVVLNSSSDSFFIGESFDLELNLLNIPSDKKCGGFETRISYDKTQL QLSSITLSKDLNSDIGDVSLSNGRISIVWFSTPPTGDINIAKIKFN VLKEGNSTIELLGTTVSDSNGFTYNNMTLSNLKVKLSPLGYTVT Mevan_0135 754 18-145 SP, Putative uncharacterized protein, Pfam PF00963; InterPro IPR002102; Cohesin: IELSPNVINIQKGNNFTVDVLLRDIPEIAGYDSKLNYDKTILKLNS ISLSTDVNSASLKDTNLNNDGSVSLLWMGNSVTGNLTVATFTFEAI NEGDTKISLKNAALSDIEGQGINNFDTLDSEVLVFE 7 Methanococcus voltae A3 (Methanococci; Methanococcales) MvolDRAFT_0511, 453 - 26-82 Putative iron transport system substrate-binding protein, operon? precursor, Putative dockerin: GNYDYRLGDVNCDGKISVSDVVY LFNNRNMDIED GDVNADNKISVSDVVYLFNYYDK MvolDRAFT_0512, 1103 36-170 - SP, APHP domain protein precursor, operon? Putative cohesin: TNDNIVVSLAPQTQQVNNGDQFVLELWVNNSNITISAFESNFVYDS SNVNLTAIQLSGIANTAGLKTVDVSDNSISLAWMGGGITESNVNIA NLTFETFNVSSNEIRLRNLDITNQGGISTMPTDEHILDASVEV MvolDRAFT_0578 789 49-186 - Hypothetical protein. Pfam PF00963; Cohesin; InterPro IPR002102; Cohesin: NYSMTLSPTVKSAGVGDSFELTLWGNTPKVSAIQSKFNYDTAMLEL TDIELGTIADTASSKRAEVSSSLITMLWFDPSTAPEGYYKIATLSF KVLKGGNTSIVPRNYEFSDNTGVSITPVINKANISVSEGMKAELII MvolDRAFT_0734 451 - 25-81 Putative iron transport system substrate-binding protein, precursor, Putative dockerin: GNYDYRLGDINEDGKISVSDVVY LFNNRDLDIEDADVN ADVNADNKISVSDVVYLFNYYDE Methanoculleus marisnigri JR1 / ATCC 35101 / DSM 1498 (Methanomicrobia; Methanomicrobiales) Memar_0673 857 - 800-852 2 CARDB (PF07705), Aminotrans_II_pyridoxalP_BS (IPR001917), Putative dockerin: GDLNGDGNVDWADVTIAAGMAQK TAPSDPA ADVNGDGTVDWKDVALLADFFFG
Recommended publications
  • A Parts List for Fungal Cellulosomes Revealed by Comparative Genomics Charles H
    LETTERS PUBLISHED: 30 MAY 2017 | VOLUME: 2 | ARTICLE NUMBER: 17087 A parts list for fungal cellulosomes revealed by comparative genomics Charles H. Haitjema1, Sean P. Gilmore1,JohnK.Henske1, Kevin V. Solomon1†, Randall de Groot1, Alan Kuo2, Stephen J. Mondo2, Asaf A. Salamov2, Kurt LaButti2, Zhiying Zhao2, Jennifer Chiniquy2, Kerrie Barry2, Heather M. Brewer3,SamuelO.Purvine3,AaronT.Wright4, Matthieu Hainaut5,6, Brigitte Boxma7,TheovanAlen7†, Johannes H. P. Hackstein7, Bernard Henrissat5,6,8, Scott E. Baker3, Igor V. Grigoriev2,9 and Michelle A. O’Malley1* Cellulosomes are large, multiprotein complexes that tether enzyme transcripts yet found in nature7. The vast majority of plant biomass-degrading enzymes together for improved these enzymes carry non-catalytic dockerin domains (NCDDs), hydrolysis1. These complexes were first described in anaerobic which mediate assembly into large multiprotein complexes, or bacteria, where species-specific dockerin domains mediate the cellulosomes7. Cellulosomes were first described in anaerobic assembly of enzymes onto cohesin motifs interspersed within bacteria, and the modular interaction scheme native to these protein scaffolds1. The versatile protein assembly mechanism complexes has rapidly been exploited for biomass conversion and conferred by the bacterial cohesin–dockerin interaction is now enzyme tethering applications8–10. However, since the first report a standard design principle for synthetic biology2,3.For of cellulosome-like structures in anaerobic fungi more than decades, analogous structures have been reported in anaerobic twenty years ago11, the identification of dockerin-binding cohesins, fungi, which are known to assemble by sequence-divergent or scaffoldins that mediate assembly, has yet to be accomplished non-catalytic dockerin domains (NCDDs)4. However, the due to a lack of genomic data, functional proteomics and components, modular assembly mechanism and functional characterized strains.
    [Show full text]
  • The Cellulosome Paradigm in an Extreme Alkaline Environment Paripok Phitsuwan, Sarah Moraïs, Bareket Dassa, Bernard Henrissat, Edward Bayer
    The Cellulosome Paradigm in An Extreme Alkaline Environment Paripok Phitsuwan, Sarah Moraïs, Bareket Dassa, Bernard Henrissat, Edward Bayer To cite this version: Paripok Phitsuwan, Sarah Moraïs, Bareket Dassa, Bernard Henrissat, Edward Bayer. The Cellulo- some Paradigm in An Extreme Alkaline Environment. Microorganisms, MDPI, 2019, 7 (9), pp.347. 10.3390/microorganisms7090347. hal-02612575 HAL Id: hal-02612575 https://hal-amu.archives-ouvertes.fr/hal-02612575 Submitted on 19 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License microorganisms Article The Cellulosome Paradigm in An Extreme Alkaline Environment 1, 1,2 1 3,4 Paripok Phitsuwan y, Sarah Moraïs , Bareket Dassa , Bernard Henrissat and Edward A. Bayer 1,* 1 Department of Biomolecular Sciences, The Weizmann Institute of Science, Rehovot 7610001, Israel 2 Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva 8499000, Israel 3 Architecture et Fonction des Macromolécules Biologiques, CNRS and Aix-Marseille University and CNRS, 13288 Marseille, France 4 USC1408, INRA, Architecture et Fonction des Macromolécules Biologiques, 13288 Marseille, France * Correspondence: [email protected] Present address: Division of Biochemical Technology, School of Bioresources and Technology, y King Mongkut’s University of Technology Thonburi, Bangkuntien, Bangkok 10150, Thailand.
    [Show full text]
  • An Update on Jacalin-Like Lectins and Their Role in Plant Defense
    International Journal of Molecular Sciences Review An Update on Jacalin-Like Lectins and Their Role in Plant Defense Lara Esch and Ulrich Schaffrath * ID Department of Plant Physiology, RWTH Aachen University, 52056 Aachen, Germany; [email protected] * Correspondence: [email protected]; Tel.: +49-241-8020100 Received: 30 June 2017; Accepted: 20 July 2017; Published: 22 July 2017 Abstract: Plant lectins are proteins that reversibly bind carbohydrates and are assumed to play an important role in plant development and resistance. Through the binding of carbohydrate ligands, lectins are involved in the perception of environmental signals and their translation into phenotypical responses. These processes require down-stream signaling cascades, often mediated by interacting proteins. Fusing the respective genes of two interacting proteins can be a way to increase the efficiency of this process. Most recently, proteins containing jacalin-related lectin (JRL) domains became a subject of plant resistance responses research. A meta-data analysis of fusion proteins containing JRL domains across different kingdoms revealed diverse partner domains ranging from kinases to toxins. Among them, proteins containing a JRL domain and a dirigent domain occur exclusively within monocotyledonous plants and show an unexpected high range of family member expansion compared to other JRL-fusion proteins. Rice, wheat, and barley plants overexpressing OsJAC1, a member of this family, are resistant against important fungal pathogens. We discuss the possibility that JRL domains also function as a decoy in fusion proteins and help to alert plants of the presence of attacking pathogens. Keywords: fusion protein; JRL domain; plant resistance; dirigent protein; decoy; chimeric protein 1.
    [Show full text]
  • Sequence Based Prediction of Novel Domains in the Cellulosome of Ruminiclostridium Thermocellum
    bioRxiv preprint doi: https://doi.org/10.1101/066357; this version posted July 27, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Sequence based prediction of novel domains in the cellulosome of Ruminiclostridium thermocellum Zarrin Basharat, Azra Yasmin* Microbiology & Biotechnology Research Lab, Department of Environmental Sciences, Fatima Jinnah Women University, 46000, Pakistan *Corresponding author Email:[email protected] Abstract Ruminiclostridium thermocellum strain ATCC 27405 is valuable with reference to the next generation biofuel production being a degrader of crystalline cellulose. The completion of its genome sequence has revealed that this organism carries 3,376 genes with more than hundred genes encoding for enzymes involved in cellulysis. Novel protein domain discovery in the cellulose degrading enzyme complex of this strain has been attempted to understand this organism at molecular level. Streamlined automated methods were employed to generate possibly unreported or new domains. A set of 12 novel Pfam-B domains was developed after detailed analysis. This finding will enhance our understanding of this bacterium and its molecular processes involved in the degradation of cellulose. This approach of in silico analysis prior to experimentation facilitates in lab study. Previously uncorrelated data has been utilized for rapid generation of new biological information in this study. Keywords: Domain prediction, in silico, R. thermocellum, cellulosome. Note: This research was conducted in 2014 for Clostridium thermocellum ATCC 27405. The bacterium was later reannotated as Ruminiclostridium thermocellum.
    [Show full text]
  • A Gene Co-Expression Network Model Identifies Yield-Related Vicinity
    www.nature.com/scientificreports OPEN A gene co-expression network model identifes yield-related vicinity networks in Jatropha curcas Received: 7 February 2018 Accepted: 4 June 2018 shoot system Published: xx xx xxxx Nisha Govender1,2, Siju Senan1, Zeti-Azura Mohamed-Hussein2,3 & Ratnam Wickneswari1 The plant shoot system consists of reproductive organs such as inforescences, buds and fruits, and the vegetative leaves and stems. In this study, the reproductive part of the Jatropha curcas shoot system, which includes the aerial shoots, shoots bearing the inforescence and inforescence were investigated in regard to gene-to-gene interactions underpinning yield-related biological processes. An RNA-seq based sequencing of shoot tissues performed on an Illumina HiSeq. 2500 platform generated 18 transcriptomes. Using the reference genome-based mapping approach, a total of 64 361 genes was identifed in all samples and the data was annotated against the non-redundant database by the BLAST2GO Pro. Suite. After removing the outlier genes and samples, a total of 12 734 genes across 17 samples were subjected to gene co-expression network construction using petal, an R library. A gene co-expression network model built with scale-free and small-world properties extracted four vicinity networks (VNs) with putative involvement in yield-related biological processes as follow; heat stress tolerance, foral and shoot meristem diferentiation, biosynthesis of chlorophyll molecules and laticifers, cell wall metabolism and epigenetic regulations. Our VNs revealed putative key players that could be adapted in breeding strategies for J. curcas shoot system improvements. Jatropha curcas L. or the physic nut is an environmentally friendly and cost-efective feedstock for sustainable biofuel production.
    [Show full text]
  • Sequence and Structural Analysis of the Asp-Box Motif and Asp-Box Beta
    BMC Structural Biology BioMed Central Research article Open Access Sequence and structural analysis of the Asp-box motif and Asp-box beta-propellers; a widespread propeller-type characteristic of the Vps10 domain family and several glycoside hydrolase families Esben M Quistgaard1,2 and Søren S Thirup*1 Address: 1MIND Centre, Department of Molecular Biology, University of Aarhus, Gustav Wieds Vej 10C, DK 8000 Århus C, Denmark and 2Department of Medical Biochemistry and Biophysics, Karolinska Institute, 17177 Stockholm, Sweden Email: Esben M Quistgaard - [email protected]; Søren S Thirup* - [email protected] * Corresponding author Published: 13 July 2009 Received: 14 May 2009 Accepted: 13 July 2009 BMC Structural Biology 2009, 9:46 doi:10.1186/1472-6807-9-46 This article is available from: http://www.biomedcentral.com/1472-6807/9/46 © 2009 Quistgaard and Thirup; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: The Asp-box is a short sequence and structure motif that folds as a well-defined β- hairpin. It is present in different folds, but occurs most prominently as repeats in β-propellers. Asp- box β-propellers are known to be characteristically irregular and to occur in many medically important proteins, most of which are glycosidase enzymes, but they are otherwise not well characterized and are only rarely treated as a distinct β-propeller family. We have analyzed the sequence, structure, function and occurrence of the Asp-box and s-Asp-box -a related shorter variant, and provide a comprehensive classification and computational analysis of the Asp-box β- propeller family.
    [Show full text]
  • STACK: a Toolkit for Analysing Β-Helix Proteins
    STACK: a toolkit for analysing ¯-helix proteins Master of Science Thesis (20 points) Salvatore Cappadona, Lars Diestelhorst Abstract ¯-helix proteins contain a solenoid fold consisting of repeated coils forming parallel ¯-sheets. Our goal is to formalise the intuitive notion of a ¯-helix in an objective algorithm. Our approach is based on first identifying residues stacks — linear spatial arrangements of residues with similar conformations — and then combining these elementary patterns to form ¯-coils and ¯-helices. Our algorithm has been implemented within STACK, a toolkit for analyzing ¯-helix proteins. STACK distinguishes aromatic, aliphatic and amidic stacks such as the asparagine ladder. Geometrical features are computed and stored in a relational database. These features include the axis of the ¯-helix, the stacks, the cross-sectional shape, the area of the coils and related packing information. An interface between STACK and a molecular visualisation program enables structural features to be highlighted automatically. i Contents 1 Introduction 1 2 Biological Background 2 2.1 Basic Concepts of Protein Structure ....................... 2 2.2 Secondary Structure ................................ 2 2.3 The ¯-Helix Fold .................................. 3 3 Parallel ¯-Helices 6 3.1 Introduction ..................................... 6 3.2 Nomenclature .................................... 6 3.2.1 Parallel ¯-Helix and its ¯-Sheets ..................... 6 3.2.2 Stacks ................................... 8 3.2.3 Coils ..................................... 8 3.2.4 The Core Region .............................. 8 3.3 Description of Known Structures ......................... 8 3.3.1 Helix Handedness .............................. 8 3.3.2 Right-Handed Parallel ¯-Helices ..................... 13 3.3.3 Left-Handed Parallel ¯-Helices ...................... 19 3.4 Amyloidosis .................................... 20 4 The STACK Toolkit 24 4.1 Identification of Structural Elements ....................... 24 4.1.1 Stacks ...................................
    [Show full text]
  • (12) United States Patent (10) Patent No.: US 6,989,232 B2 Burgess Et Al
    USOO6989232B2 (12) United States Patent (10) Patent No.: US 6,989,232 B2 Burgess et al. (45) Date of Patent: Jan. 24, 2006 (54) PROTEINS AND NUCLEIC ACIDS Adams et al, Nature 377 (Suppl), 3 (1995).* ENCODING SAME Nakayama et al, Genomics 51(1), 27(1998).* Mahairas et al. Accession No. B45150 (Oct. 21, 1997).* (75) Inventors: Catherine E. Burgess, Wethersfield, Wallace et al, Methods Enzymol. 152: 432 (1987).* CT (US); Pamela B. Conley, Palo Alto, GenBank Accession No.: CAB01233 (Jun. 20, 2001). CA (US); William M. Grosse, SWALL (SPTR) Accession No.: Q9N4G7 (Oct. 1, 2000). Branford, CT (US); Matthew Hart, San GenBank Accession No.: Z77666 (Jun. 20, 2001). Francisco, CA (US); Ramesh Kekuda, GenBank Accession No. XM 038002 (Oct. 16, 2001). Stamford, CT (US); Richard A. GenBank Accession No. XM 039746 (Oct. 16, 2001). Shimkets, West Haven, CT (US); GenBank Accession No.: AAF51854 (Oct. 4, 2000). Kimberly A. Spytek, New Haven, CT GenBank Accession No.: AAF52569 (Oct. 4, 2000). (US); Edward Szekeres, Jr., Branford, GenBank Accession No.: AAF53188 (Oct. 4, 2000). CT (US); James E. Tomlinson, GenBank Accession No.: AAF55 108 (Oct. 5, 2000). Burlingame, CA (US); James N. GenBank Accession No.: AAF58048 (Oct. 4, 2000). Topper, Los Altos, CA (US); Ruey-Bin GenBank Accession No.: AAF59281 (Oct. 4, 2000). Yang, San Mateo, CA (US) GenBank Accession No.: AE003598 (Oct. 4, 2000). GenBank Accession No.: AE003619 (Oct. 4, 2000). (73) Assignees: Millennium Pharmaceuticals, Inc., GenBank Accession No.: AE003636 (Oct. 4, 2000). Cambridge, MA (US); Curagen GenBank Accession No.: AE003706 (Oct. 5, 2000). Corporation, New Haven, CT (US) GenBank Accession No.: AE003808 (Oct.
    [Show full text]
  • Comparative Sequence Analysis of the South African Vaccine Strain and Two Virulent field Isolates of Lumpy Skin Disease Virus
    Arch Virol (2003) 148: 1335–1356 DOI 10.1007/s00705-003-0102-0 Comparative sequence analysis of the South African vaccine strain and two virulent field isolates of Lumpy skin disease virus P. D. Kara1, C. L. Afonso2, D. B. Wallace1, G. F. Kutish2, C. Abolnik1, Z. Lu2, F. T. Vreede3, L. C. F. Taljaard1, A. Zsak2, G. J. Viljoen1, and D. L. Rock2 1Biotechnology Division, Onderstepoort Veterinary Institute, Onderstepoort, South Africa 2Plum Island Animal Disease Centre, Agricultural Research Service, U.S. Department of Agriculture, Greenport, New York, U.S.A. 3IGBMC, Illkirch, France Received November 25, 2002; accepted February 17, 2003 Published online May 5, 2003 c Springer-Verlag 2003 Summary.The genomic sequences of 3 strains of Lumpy skin disease virus (LSDV) (Neethling type) were compared to determine molecular differences, viz. the South African vaccine strain (LW), a virulent field-strain from a recent outbreak in South Africa (LD), and the virulent Kenyan 2490 strain (LK). A comparison between the virulent field isolates indicates that in 29 of the 156 putative genes, only 38 encoded amino acid differences were found, mostly in the variable terminal regions. When the attenuated vaccine strain (LW) was compared with field isolate LD, a total of 438 amino acid substitutions were observed. These were also mainly in the terminal regions, but with notably more frameshifts leading to truncated ORFs as well as deletions and insertions. These modified ORFs encode proteins involved in the regulation of host immune responses, gene expression, DNA repair, host-range specificity and proteins with unassigned functions. We suggest that these differences could lead to restricted immuno-evasive mechanisms and virulence factors present in attenuated LSDV strains.
    [Show full text]
  • Systematic Analysis of Kelch Repeat F-Box (KFB) Protein Family and Identification of Phenolic Acid Regulation Members in Salvia Miltiorrhiza Bunge
    G C A T T A C G G C A T genes Article Systematic Analysis of Kelch Repeat F-box (KFB) Protein Family and Identification of Phenolic Acid Regulation Members in Salvia miltiorrhiza Bunge Haizheng Yu 1,2, Mengdan Jiang 3, Bingcong Xing 1,2, Lijun Liang 1,2, Bingxue Zhang 1,2 and Zongsuo Liang 1,2,3,* 1 Institute of Soil and Water Conservation, Chinese Academy of Sciences & Ministry of Water Resource, Yangling 712100, China; [email protected] (H.Y.); [email protected] (B.X.); [email protected] (L.L.); [email protected] (B.Z.) 2 University of the Chinese Academy of Sciences, Beijing 100049, China 3 Zhejiang Province Key Laboratory of Plant Secondary Metabolism and Regulation, College of Life Sciences and Medicine, Zhejiang Sci-Tech University, Hangzhou 310018, China; [email protected] * Correspondence: [email protected]; Tel.: +86-0571-86843684 Received: 10 April 2020; Accepted: 12 May 2020; Published: 16 May 2020 Abstract: S. miltiorrhiza is a well-known Chinese herb for the clinical treatment of cardiovascular and cerebrovascular diseases. Tanshinones and phenolic acids are the major secondary metabolites and significant pharmacological constituents of this plant. Kelch repeat F-box (KFB) proteins play important roles in plant secondary metabolism, but their regulation mechanism in S. miltiorrhiza has not been characterized. In this study, we systematically characterized the S. miltiorrhiza KFB gene family. In total, 31 SmKFB genes were isolated from S. miltiorrhiza. Phylogenetic analysis of those SmKFBs indicated that 31 SmKFBs can be divided into four groups. Thereinto, five SmKFBs (SmKFB1, 2, 3, 5, and 28) shared high homology with other plant KFBs which have been described to be regulators of secondary metabolism.
    [Show full text]
  • Transcriptomic Characterization of Caecomyces Churrovis: a Novel, Non‑Rhizoid‑Forming Lignocellulolytic Anaerobic Fungus John K
    Henske et al. Biotechnol Biofuels (2017) 10:305 https://doi.org/10.1186/s13068-017-0997-4 Biotechnology for Biofuels RESEARCH Open Access Transcriptomic characterization of Caecomyces churrovis: a novel, non‑rhizoid‑forming lignocellulolytic anaerobic fungus John K. Henske1, Sean P. Gilmore1, Doriv Knop1, Francis J. Cunningham1, Jessica A. Sexton1, Chuck R. Smallwood2, Vaithiyalingam Shutthanandan2, James E. Evans2, Michael K. Theodorou3 and Michelle A. O’Malley1* Abstract Anaerobic gut fungi are the primary colonizers of plant material in the rumen microbiome, but are poorly studied due to a lack of characterized isolates. While most genera of gut fungi form extensive rhizoidal networks, which likely participate in mechanical disruption of plant cell walls, fungi within the Caecomyces genus do not possess these rhizoids. Here, we describe a novel fungal isolate, Caecomyces churrovis, which forms spherical sporangia with a limited rhizoidal network yet secretes a diverse set of carbohydrate active enzymes (CAZymes) for plant cell wall hydrolysis. Despite lacking an extensive rhizoidal system, C. churrovis is capable of growth on fbrous substrates like switchgrass, reed canary grass, and corn stover, although faster growth is observed on soluble sugars. Gut fungi have been shown to use enzyme complexes (fungal cellulosomes) in which CAZymes bind to non-catalytic scafoldins to improve bio- mass degradation efciency. However, transcriptomic analysis and enzyme activity assays reveal that C. churrovis relies more on free enzymes compared to other gut fungal isolates. Only 15% of CAZyme transcripts contain non-catalytic dockerin domains in C. churrovis, compared to 30% in rhizoid-forming fungi. Furthermore, C. churrovis is enriched in GH43 enzymes that provide complementary hemicellulose degrading activities, suggesting that a wider variety of these activities are required to degrade plant biomass in the absence of an extensive fungal rhizoid network.
    [Show full text]
  • Protein Family Expansions and Biological Complexity
    Protein Family Expansions and Biological Complexity Christine Vogel1,2*, Cyrus Chothia1 1 Medical Research Council Laboratory of Molecular Biology, Cambridge, United Kingdom, 2 Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America During the course of evolution, new proteins are produced very largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies whose members descended from a common ancestor. The size of superfamilies can vary greatly. Also, during the course of evolution organisms of increasing complexity have arisen. In this paper we determine the identity of those superfamilies whose relative sizes in different organisms are highly correlated to the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes we took the number of different cell types of which they are composed. Of 1,219 superfamilies, there are 194 whose sizes in the 38 organisms are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half are involved in extracellular processes or regulation and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants.
    [Show full text]