From Sequence to Function: Case Studies in Structural and Functional Genomics

One of the main challenges facing biology is to assign biochemical and cellular functions to the thousands of hitherto uncharacterized gene products discovered by genome sequencing. This chapter discusses the strengths and limitations of the many experimental and computational methods, including those that use the vast amount of sequence information now available, to help determine protein structure and function. The chapter ends with two individual case studies that illustrate these methods in action, and show both their capabilities and the approaches that still must be developed to allow us to proceed from sequence to consequence.

4-0 Overview: From Sequence to Function in the Age of Genomics
4-1 Sequence Alignment and Comparison
4-2 Protein Profiling
4-3 Deriving Function from Sequence
4-4 Experimental Tools for Probing Protein Function
4-5 Divergent and Convergent Evolution
4-6 Structure from Sequence: Homology Modeling
4-7 Structure From Sequence: Profile-Based Threading and “Rosetta”
4-8 Deducing Function From Structure: Protein Superfamilies
4-9 Strategies for Identifying Binding Sites
4-10 Strategies for Identifying Catalytic Residues
4-11 TIM Barrels: One Structure with Diverse Functions
4-12 PLP Enzymes: Diverse Structures with One Function
4-13 Moonlighting: Proteins with More than One Function
4-14 Chameleon Sequences: One Sequence with More than One Fold
4-15 Prions, Amyloids and Serpins: Metastable Protein Folds
4-16 Functions for Uncharacterized Genes: Galactonate Dehydratase
4-17 Starting From Scratch: A Gene Product of Unknown Function

4-0 Overview: From Sequence to Function in the Age of Genomics

Genomics is making an increasing contribution to the study of protein structure and function

The relatively new discipline of genomics has great implications for the study of protein structure and function. The genome-sequencing programs are providing more amino-acid sequences of proteins of unknown function to analyze than ever before, and many computational and experimental tools are now available for comparing these sequences with those of proteins of known structure and function to search for clues to their roles in the cell or organism. Also underway are systematic efforts aimed at providing the three-dimensional structures, subcellular locations, interacting partners, and deletion phenotypes for all the gene products in several model organisms. These databases can also be searched for insights into the functions of these proteins and their corresponding proteins in other organisms.

Sequence and structural comparison can usually give only limited information, however, and comprehensively characterizing the function of an uncharacterized protein in a cell or organism will always require additional experimental investigations on the purified protein in vitro as well as cell biological and mutational studies in vivo. Different experimental methods are required to define a protein’s function precisely at biochemical, cellular, and organismal levels in order to characterize it completely, as shown in Figure 4-1.

In this chapter we first look at methods of comparing amino-acid sequences to determine their similarity and to search for related sequences in the sequence databases. Sequence comparison alone gives only limited information at present, and in most cases, other experimental and structural information is also important for indicating possible biochemical function and mechanism of action. We next provide a summary of some of the genome-driven experimental tools for probing function. We then describe computational methods that are being developed to deduce the protein fold of an uncharacterized protein from its sequence.

The existence of large families of structurally related proteins with similar functions, at least at the biochemical level, is enabling sequence and structural motifs characteristic of various functions to be identified. Protein structures can also be screened for possible ligand-binding sites and catalytic active sites by both computational and experimental methods. As we see next, predicting a protein’s function from its structure alone is complicated by the fact that evolution has produced proteins with almost identical structures but different functions, proteins with quite different structures but the same function, and even multifunctional proteins which have more than one biochemical function and numerous cellular and physiological functions. We shall also see that some proteins can adopt more than one stable protein fold, a change which can sometimes lead to disease.

The chapter ends with two case histories illustrating how a range of different approaches were combined to determine aspects of the functions of two uncharacterized proteins from the genome sequences of E. coli and yeast, respectively.

Definitions

genomics: the study of the DNA sequence and gene content of whole genomes.

Figure 4-1 Time and distance scales in functional genomics. The various levels of function of proteins encompass an enormous range of time (scale on the left) and distance (scale on the right). Depending on the time and distance regime involved, different experimental approaches are required to probe function. Since many genes code for proteins that act in processes that cross multiple levels on this diagram (for example, a protein kinase may catalyze tyrosine phosphorylation at typical enzyme rates, but may also be required for cell division in embryonic development), no single experimental technique is adequate to dissect all their roles. In the age of genomics, interdisciplinary approaches are essential to determine the functions of gene products.
Time | Process | Example System | Example Detection Methods | Distance
10^-15 sec | electron transfer | photosynthetic reaction center | optical spectroscopy | 1 Å
10^-9 sec | proton transfer | triosephosphate isomerase | fast kinetics | 2–10 Å
10^-6 sec | fastest enzyme reactions | catalase, fumarase, carbonic anhydrase | kinetics |
10^-3 sec | typical enzyme reactions | trypsin, protein kinase A, ketosteroid isomerase | kinetics, time-resolved X-ray, nuclear magnetic resonance |
sec | slow enzyme reactions/cycles | cytochrome P450, phosphofructokinase | kinetics, low T X-ray, nuclear magnetic resonance, mass spectroscopy | Å – nm
min/hour | protein synthesis/cell division | budding yeast cell | light microscopy, genetics, optical probes | nm – µm
day/year | embryonic development | mouse embryo | genetics, microscopy, microarray analysis | µm – m

©2004 New Science Press Ltd

4-1 Sequence Alignment and Comparison

Sequence comparison provides a measure of the relationship between genes

The comparison of one nucleotide or amino-acid sequence with another to find the degree of similarity between them is a key technique in present-day biology. A marked similarity between two gene or protein sequences may reflect the fact that they are derived by evolution from the same ancestral sequence. Sequences related in this way are called homologous and the evolutionary similarity between them is known as homology. Unknown genes from newly sequenced genomes can often be identified by searching for similar sequences in databases of known gene and protein sequences using computer programs such as BLAST and FASTA. Sequences of the same protein from different species can also be compared in order to deduce evolutionary relationships. Two genes that have evolved fairly recently from a

S.c. Kss1  INNQNSGFSTLSDDHVQYFTYQILRALKSIHSAQVI
H.s. Erk2  LKTQH-----LSNDHICYFLYQILRGLKYIHSANVL

S.c. Kss1  HRDIKPSNLLLNSNCDLKVCDFGLARCLASSSDSRET
H.s. Erk2  HRDLKPSNLLLNTTCDLKICDFGLARVA----DPDHD

Figure 4-2 Pairwise alignment. Part of an alignment of the amino-acid sequences of the kinase domains from two ERK-like kinases of the MAP kinase superfamily, Erk2 from humans and Kss1 from yeast. The region shown covers the kinase catalytic loop and part of the activation loop (see Figure 3-24). Identical residues highlighted in purple show the extensive similarity between these two homologous kinases (their evolutionary relationship can be seen in Figure 4-5).
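The degree of similarity between two aligned sequences, as in the Kss1/Erk2 fragment of Figure 4-2, is often summarized as percent identity. The following is a minimal Python sketch of that calculation, not the method used by BLAST or FASTA. It rests on several assumptions: the two printed blocks of the figure are treated as one contiguous aligned region, columns containing a gap in either sequence are left out of the denominator (other conventions divide by the full alignment length or by the shorter sequence), and the function name `percent_identity` is purely illustrative.

```python
# Aligned fragments transcribed from Figure 4-2. Joining the two printed
# blocks into one contiguous region is an assumption made for illustration.
# '-' marks a gap introduced by the alignment.
KSS1 = (
    "INNQNSGFSTLSDDHVQYFTYQILRALKSIHSAQVI"
    "HRDIKPSNLLLNSNCDLKVCDFGLARCLASSSDSRET"
)
ERK2 = (
    "LKTQH-----LSNDHICYFLYQILRGLKYIHSANVL"
    "HRDLKPSNLLLNTTCDLKICDFGLARVA----DPDHD"
)


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns holding identical residues.

    Hypothetical helper: columns where either sequence has a gap are
    skipped, which is only one of several conventions in use.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    pid = percent_identity(KSS1, ERK2)
    print(f"Kss1 vs Erk2 (Figure 4-2 fragment): {pid:.1f}% identity")
```

A high value over a short region such as this one is suggestive of homology but is not in itself a test of statistical significance; that requires comparison against the scores expected for unrelated sequences.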