Computational Analysis of Protein Function Within Complete Genomes

Total Page:16

File Type:pdf, Size:1020Kb

Computational Analysis of Protein Function Within Complete Genomes Computational Analysis of Protein Function within Complete Genomes Anton James Enright Wolfson College A dissertation submitted to the University of Cambridge for the degree of Doctor of Philosophy European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom. Email: [email protected] March 7, 2002 To My Parents and Kerstin This thesis is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. This thesis does not exceed the specified length limit of 300 pages as de- fined by the Biology Degree Committee. This thesis has been typeset in 12pt font using LATEX2ε accordingtothe specifications defined by the Board of Graduate Studies and the Biology Degree Committee. ii Computational Analysis of Protein Function within Complete Genomes Summary Anton James Enright March 7, 2002 Wolfson College Since the advent of complete genome sequencing, vast amounts of nucleotide and amino acid sequence data have been produced. These data need to be effectively analysed and verified so that they may be used for biologi- cal discovery. A significant proportion of predicted protein sequences from these complete genomes have poorly characterised or unknown functional annotations. This thesis describes a number of approaches which detail the computational analysis of amino acid sequences for the prediction and analy- sis of protein function within complete genomes. The first chapter is a short introduction to computational genome analysis while the second and third chapters describe how groups of related protein sequences (termed protein families) may be characterised using sequence clustering algorithms. Two novel and complementary sequence clustering algorithms will be presented together with details of how protein family information can be used to detect and describe the functions of proteins in complete genomes. Further research is described which uses this protein family information to analyse the molec- ular evolution of proteins within complete genomes. Recent developments in genome analysis have shown that the computational prediction of protein function in complete genomes is not limited to pure sequence homology meth- ods. So called genome context methods use other information from complete genome sequences to predict protein function. Examples of this include gene location or neighbourhood analysis and the analysis of the phyletic distri- bution of protein sequences. In chapter four a novel method for predicting whether two proteins are functionally associated or physically interact is de- scribed. This method is based on the detection of gene-fusion events within complete genomes. During this research many novel tools and methods for genome analysis, data-visualisation, data-mining and high-performance bio- logical computing were developed. Many of these tools represent interesting research projects in their own right, and formed the basis from which the research carried out in this thesis was conducted. Within the final chapter a selection of these novel algorithms developed during this research is described. Preface This thesis describes work carried out at the European Bioinformatics In- stitute (EBI) in Cambridge, UK, between October 1998 and February 2002. The EBI is an outstation of the European Molecular Biology Laboratory (EMBL). I was the recipient of an EMBL predoctoral fellowship to work with Dr. Christos Ouzounis in the Computational Genomics Group. There are many individuals who have provided assistance and encourage- ment throughout this work. I would first like to thank Professor Kenneth Wolfe and Professor David McConnell at the Smurfit Institute of Genetics in Trinity College Dublin, where I had previously been an undergraduate. While at Trinity I was given the best scientific education one could hope for, spanning classical genetics, mathematics and bioinformatics. I was encour- aged at Trinity to accept an internship with Digital Equipment Corporation in my final year. It was at Digital that I discovered how bad my programming skills actually were, and through the patient efforts of Dr. Eamonn O’Toole managed to rectify the situation somewhat. There are many people at the European Bioinformatics Institute who through their friendship and support have made this work an enjoyable and rewarding experience. Firstly, my supervisor Dr. Christos Ouzounis, whose patient support and scientific knowledge have greatly aided this research. The other members of the computational genomics group have helped me over the years in countless ways and have also had the dubious pleasure of discovering all of the bugs in software I have written. James Cuff and Michele Clamp helped me get on my feet as a bioinformatician by patiently providing advice and assistance on an almost daily basis. I’d also like to thank Steven Searle, who has helped me debug countless programs, and whose sage-like wisdom of all things computational still terrifies me. Stijn Van Dongen, the inventor of the MCL algorithm, deserves special mention for writing such a beautiful algorithm, and for helping me understand how it works. Finally, and not least, the members of the EBI ’A-Team’, especially Kerstin Dose, who keep the EBI running and who have sorted out countless problems for me. iv Contents Summary iii Preface iv 1 Introduction 1 1.1 Protein Sequence Alignment andSimilarityDetection..................... 3 1.1.1 The Needleman-Wunsch Algorithm . 4 1.1.2 TheSmith-WatermanAlgorithm............ 8 1.1.3 TheFASTAAlgorithm.................. 9 1.1.4 TheBLASTAlgorithm.................. 12 1.1.5 Statistics for Sequence Alignment Searching . 14 1.1.6 Profile Searching: PSI-BLASTandHMMER................ 22 1.2 Computational Biology Data Sources . 26 1.2.1 Nucleotide Sequence Databases . 26 1.2.2 ProteinSequenceDatabases............... 29 1.2.3 Protein Domain and Family Databases (Pfam&InterPro).................... 31 2 Algorithms for Automatic Protein Sequence Clustering 33 2.1GenomicProteinFamilyAnalysis................ 35 2.2ProteinSequenceClusteringTechniques............ 37 2.2.1 Multi-DomainProteins.................. 38 2.2.2 Orthology, Paralogy and Evolution . 40 2.2.3 AutomationandScaling................. 42 2.2.4 PreviousMethods..................... 44 2.3GeneRAGE............................ 49 2.3.1 InitialSteps........................ 49 2.3.2 SymmetrificationoftheMatrix............. 51 2.3.3 Detection of Multi-Domain Proteins . 52 v 2.3.4 ClusteringtheSimilarityMatrix............. 53 2.3.5 ValidationandTesting.................. 55 2.3.6 AlgorithmImplementation................ 58 2.3.7 Conclusions........................ 58 2.4Tribe-MCL............................ 60 2.4.1 Introduction........................ 60 2.4.2 Markov Clustering of Sequence Similarities . 62 2.4.3 Application of the MCL Algorithm to Biological Graphs 67 2.4.4 ValidationoftheAlgorithm............... 70 2.4.5 Large-Scale Family Detection . 72 2.4.6 Algorithm Implementation and Availability . 75 2.4.7 Conclusions........................ 75 3 Analysis of Protein Families 77 3.1 Exploration of Protein Families using GeneRAGE............................ 78 3.1.1 Detection of Novel Archaeal Domains . 78 3.1.2 Transcription Associated Protein Family Analysis.......................... 84 3.1.3 DetectionofSR-DomainProteins............ 89 3.2 Large-Scale Detection of Protein Families using Tribe-MCL . 93 3.2.1 Family Annotation of the Draft Human Genome . 93 3.3 Tribes: A Database of Protein Families . 99 3.3.1 A Complete Genomes Database . 100 3.3.2 Construction of the Tribes Database . 103 3.3.3 Protein Family Analysis using the Tribes Database . 106 4 Genomic Analysis of Protein Interaction 125 4.1 Computational Detection of Protein-ProteinInteractions...................126 4.1.1 Structural Biology Approaches . 126 4.1.2 Sequence and Genomic Context Approaches . 127 4.2 Prediction of Protein Function and Protein-Protein interaction using GeneFusionevents........................131 4.2.1 An Algorithm for the Detection of Gene Fusion Events 134 4.2.2 Application and Validation of the Algorithm using Four CompleteGenomes....................137 4.3 Exhaustive Detection of Protein-ProteinInteractions...................143 vi 4.3.1 Computational Protocol for Exhaustive Detection of GeneFusionEvents....................145 4.3.2 Results...........................148 4.3.3 Data Storage & Availability . 159 4.3.4 Discussion.........................159 5 Methods for Genome Analysis and Annotation 164 5.1 CAST: Detection of Compositionally BiasedRegionsinProteinSequences..............165 5.1.1 Introduction........................165 5.1.2 AlgorithmConcept....................166 5.1.3 Detection of Low-Complexity Residues . 168 5.1.4 FilteringProcedure....................169 5.1.5 Implementation......................169 5.2 BioLayout: A Visualisation Algorithm for Protein Sequence Similarities............................173 5.2.1 GraphLayoutTheory..................173 5.2.2 GraphLayoutAlgorithms................174 5.2.3 The BioLayout Algorithm . 175 5.2.4 ImplementationandUsage................177 5.3 TextQuest: Automatic Classification of Medline Abstracts . 183 5.3.1 Introduction........................183 5.3.2 Text-MiningApproaches.................184 5.3.3 TheTextQuestAlgorithm................185 5.3.4 DocumentClusteringResults..............188 5.3.5 Implementation......................191 5.4 Consensus Annotation of Protein Families . 192 5.4.1
Recommended publications
  • Anti-Rab11 Antibody (ARG41900)
    Product datasheet [email protected] ARG41900 Package: 100 μg anti-Rab11 antibody Store at: -20°C Summary Product Description Goat Polyclonal antibody recognizes Rab11 Tested Reactivity Hu, Ms, Rat, Dog, Mk Tested Application IHC-Fr, IHC-P, WB Host Goat Clonality Polyclonal Isotype IgG Target Name Rab11 Antigen Species Mouse Immunogen Purified recombinant peptides within aa. 110 to the C-terminus of Mouse Rab11a, Rab11b and Rab11c (Rab25). Conjugation Un-conjugated Alternate Names RAB11A: Rab-11; Ras-related protein Rab-11A; YL8 RAB11B: GTP-binding protein YPT3; H-YPT3; Ras-related protein Rab-11B RAB25: RAB11C; CATX-8; Ras-related protein Rab-25 Application Instructions Application table Application Dilution IHC-Fr 1:100 - 1:400 IHC-P 1:100 - 1:400 WB 1:250 - 1:2000 Application Note IHC-P: Antigen Retrieval: Heat mediation was recommended. * The dilutions indicate recommended starting dilutions and the optimal dilutions or concentrations should be determined by the scientist. Positive Control Hepa cell lysate Calculated Mw 24 kDa Observed Size ~ 26 kDa Properties Form Liquid Purification Affinity purification with immunogen. Buffer PBS, 0.05% Sodium azide and 20% Glycerol. Preservative 0.05% Sodium azide www.arigobio.com 1/3 Stabilizer 20% Glycerol Concentration 3 mg/ml Storage instruction For continuous use, store undiluted antibody at 2-8°C for up to a week. For long-term storage, aliquot and store at -20°C. Storage in frost free freezers is not recommended. Avoid repeated freeze/thaw cycles. Suggest spin the vial prior to opening. The antibody solution should be gently mixed before use. Note For laboratory research only, not for drug, diagnostic or other use.
    [Show full text]
  • A Bivalent Chromatin Structure Marks Key Developmental Genes in Embryonic Stem Cells
    A Bivalent Chromatin Structure Marks Key Developmental Genes in Embryonic Stem Cells Bradley E. Bernstein,1,2,3,* Tarjei S. Mikkelsen,3,4 Xiaohui Xie,3 Michael Kamal,3 Dana J. Huebert,1 James Cuff,3 Ben Fry,3 Alex Meissner,5 Marius Wernig,5 Kathrin Plath,5 Rudolf Jaenisch,5 Alexandre Wagschal,6 Robert Feil,6 Stuart L. Schreiber,3,7 and Eric S. Lander3,5 1 Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA 2 Department of Pathology, Harvard Medical School, Boston, MA 02115, USA 3 Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA 4 Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA 5 Whitehead Institute for Biomedical Research, MIT, Cambridge, MA 02139, USA 6 Institute of Molecular Genetics, CNRS UMR-5535 and University of Montpellier-II, Montpellier, France 7 Howard Hughes Medical Institute at the Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA *Contact: [email protected] DOI 10.1016/j.cell.2006.02.041 SUMMARY which in turn modulate chromatin structure (Jenuwein and Allis, 2001; Margueron et al., 2005). The core histones The most highly conserved noncoding ele- H2A, H2B, H3, and H4 are subject to dozens of different ments (HCNEs) in mammalian genomes cluster modifications, including acetylation, methylation, and within regions enriched for genes encoding de- phosphorylation. Histone H3 lysine 4 (Lys4) and lysine velopmentally important transcription factors 27 (Lys27) methylation are of particular interest as these (TFs). This suggests that HCNE-rich regions modifications are catalyzed, respectively, by trithorax- may contain key regulatory controls involved and Polycomb-group proteins, which mediate mitotic in- heritance of lineage-specific gene expression programs in development.
    [Show full text]
  • Molecular Insights Into DNA Interference by CRISPR-Associated Nuclease-Helicase Cas3
    Molecular insights into DNA interference by CRISPR-associated nuclease-helicase Cas3 Bei Gonga,1, Minsang Shinb,1, Jiali Suna, Che-Hun Junga,b, Edward L. Boltc, John van der Oostd, and Jeong-Sun Kima,b,2 aInterdisciplinary Graduate Program in Molecular Medicine, Chonnam National University, Gwangju 501-746, Korea; bDepartment of Chemistry, Chonnam National University, Gwangju 500-757, Korea; cSchool of Life Sciences, University of Nottingham Medical School, Nottingham NG72UH, United Kingdom; and dLaboratory of Microbiology, Wageningen University, 6703 HB, Wageningen, The Netherlands Edited by Wei Yang, National Institutes of Health, Bethesda, MD, and approved September 26, 2014 (received for review June 10, 2014) Mobile genetic elements in bacteria are neutralized by a system 21), Cascade complex by itself is not a nuclease for degradation based on clustered regularly interspaced short palindromic repeats of invader DNA (13). In Escherichia coli K-12 (type IE), Cascade (CRISPRs) and CRISPR-associated (Cas) proteins. Type I CRISPR-Cas recognizes and binds crRNA to complimentary sequence in systems use a “Cascade” ribonucleoprotein complex to guide RNA target DNA, generating an RNA mediated displacement loop specifically to complementary sequence in invader double-stranded (R-loop) of single-stranded (ss) DNA and RNA-DNA hybrid DNA (dsDNA), a process called “interference.” After target recogni- within double-stranded (ds) target DNA (13). Formation of the tion by Cascade, formation of an R-loop triggers recruitment of R-loop structure induces conformational change in Cascade subunits, triggering recruitment of Cas3 protein that is a nucle- a Cas3 nuclease-helicase, completing the interference process by – destroying the invader dsDNA.
    [Show full text]
  • Comparative Analysis of Multiple Sequence Alignment Tools
    I.J. Information Technology and Computer Science, 2018, 8, 24-30 Published Online August 2018 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijitcs.2018.08.04 Comparative Analysis of Multiple Sequence Alignment Tools Eman M. Mohamed Faculty of Computers and Information, Menoufia University, Egypt E-mail: [email protected]. Hamdy M. Mousa, Arabi E. keshk Faculty of Computers and Information, Menoufia University, Egypt E-mail: [email protected], [email protected]. Received: 24 April 2018; Accepted: 07 July 2018; Published: 08 August 2018 Abstract—The perfect alignment between three or more global alignment algorithm built-in dynamic sequences of Protein, RNA or DNA is a very difficult programming technique [1]. This algorithm maximizes task in bioinformatics. There are many techniques for the number of amino acid matches and minimizes the alignment multiple sequences. Many techniques number of required gaps to finds globally optimal maximize speed and do not concern with the accuracy of alignment. Local alignments are more useful for aligning the resulting alignment. Likewise, many techniques sub-regions of the sequences, whereas local alignment maximize accuracy and do not concern with the speed. maximizes sub-regions similarity alignment. One of the Reducing memory and execution time requirements and most known of Local alignment is Smith-Waterman increasing the accuracy of multiple sequence alignment algorithm [2]. on large-scale datasets are the vital goal of any technique. The paper introduces the comparative analysis of the Table 1. Pairwise vs. multiple sequence alignment most well-known programs (CLUSTAL-OMEGA, PSA MSA MAFFT, BROBCONS, KALIGN, RETALIGN, and Compare two biological Compare more than two MUSCLE).
    [Show full text]
  • Protein Prenylation Reactions As Tools for Site-Specific Protein Labeling and Identification of Prenylation Substrates
    Protein prenylation reactions as tools for site-specific protein labeling and identification of prenylation substrates Dissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) des Fachbereichs Chemie der Technischen Universität Dortmund Angefertigt am Max-Planck-Institut für molekulare Physiologie in Dortmund Vorgelegt von Dipl.-Chemikerin Thi Thanh Uyen Nguyen aus Jülich Dortmund, April 2009 Die vorliegende Arbeit wurde in der Zeit von Oktober 2005 bis April 2009 am Max-Planck- Institut für molekulare Physiologie in Dortmund unter der Anleitung von Prof. Dr. Roger S. Goody, Prof. Dr. Kirill Alexandrov und Prof. Dr. Herbert Waldmann durchgeführt. 1. Gutachter : Prof. Dr. R. S. Goody 2. Gutachter : Prof. Dr. H. Waldmann The results of this work were published in the following journals: “Exploiting the substrate tolerance of farnesyltransferase for site-selective protein derivatization” U.T.T. Nguyen, J. Cramer, J. Gomis, R. Reents, M. Gutierrez-Rodriguez, R.S. Goody, K. Alexandrov, H. Waldmann, Chembiochem 2007, 8 (4), 408-23. “Development of selective RabGGTase inhibitors and crystal structure of a RabGGTase- inhibitor complex” Z. Guo, Y.W. Wu, K.T. Tan, R.S. Bon, E. Guiu-Rozas, C. Delon, U.T.T. Nguyen, S. Wetzel,S. Arndt, R.S. Goody, W. Blankenfeldt, K. Alexandrov, H. Waldmann, Angew Chem Int Ed Engl 2008, 47 (20), 3747-50. “Analysis of the eukaryotic prenylome by isoprenoid affinity tagging” U.T.T. Nguyen, Z. Guo, C. Delon, Y.W. Wu, C. Deraeve, B. Franzel, R.S. Bon, W. Blankenfeldt, R.S. Goody, H. Waldmann, D. Wolters, K. Alexandrov, Nat Chem Biol 2009, 5 (4), 227-35.
    [Show full text]
  • Download Download
    Supplementary Figure S1. Results of flow cytometry analysis, performed to estimate CD34 positivity, after immunomagnetic separation in two different experiments. As monoclonal antibody for labeling the sample, the fluorescein isothiocyanate (FITC)- conjugated mouse anti-human CD34 MoAb (Mylteni) was used. Briefly, cell samples were incubated in the presence of the indicated MoAbs, at the proper dilution, in PBS containing 5% FCS and 1% Fc receptor (FcR) blocking reagent (Miltenyi) for 30 min at 4 C. Cells were then washed twice, resuspended with PBS and analyzed by a Coulter Epics XL (Coulter Electronics Inc., Hialeah, FL, USA) flow cytometer. only use Non-commercial 1 Supplementary Table S1. Complete list of the datasets used in this study and their sources. GEO Total samples Geo selected GEO accession of used Platform Reference series in series samples samples GSM142565 GSM142566 GSM142567 GSM142568 GSE6146 HG-U133A 14 8 - GSM142569 GSM142571 GSM142572 GSM142574 GSM51391 GSM51392 GSE2666 HG-U133A 36 4 1 GSM51393 GSM51394 only GSM321583 GSE12803 HG-U133A 20 3 GSM321584 2 GSM321585 use Promyelocytes_1 Promyelocytes_2 Promyelocytes_3 Promyelocytes_4 HG-U133A 8 8 3 GSE64282 Promyelocytes_5 Promyelocytes_6 Promyelocytes_7 Promyelocytes_8 Non-commercial 2 Supplementary Table S2. Chromosomal regions up-regulated in CD34+ samples as identified by the LAP procedure with the two-class statistics coded in the PREDA R package and an FDR threshold of 0.5. Functional enrichment analysis has been performed using DAVID (http://david.abcc.ncifcrf.gov/)
    [Show full text]
  • Yeast Genome Gazetteer P35-65
    gazetteer Metabolism 35 tRNA modification mitochondrial transport amino-acid metabolism other tRNA-transcription activities vesicular transport (Golgi network, etc.) nitrogen and sulphur metabolism mRNA synthesis peroxisomal transport nucleotide metabolism mRNA processing (splicing) vacuolar transport phosphate metabolism mRNA processing (5’-end, 3’-end processing extracellular transport carbohydrate metabolism and mRNA degradation) cellular import lipid, fatty-acid and sterol metabolism other mRNA-transcription activities other intracellular-transport activities biosynthesis of vitamins, cofactors and RNA transport prosthetic groups other transcription activities Cellular organization and biogenesis 54 ionic homeostasis organization and biogenesis of cell wall and Protein synthesis 48 plasma membrane Energy 40 ribosomal proteins organization and biogenesis of glycolysis translation (initiation,elongation and cytoskeleton gluconeogenesis termination) organization and biogenesis of endoplasmic pentose-phosphate pathway translational control reticulum and Golgi tricarboxylic-acid pathway tRNA synthetases organization and biogenesis of chromosome respiration other protein-synthesis activities structure fermentation mitochondrial organization and biogenesis metabolism of energy reserves (glycogen Protein destination 49 peroxisomal organization and biogenesis and trehalose) protein folding and stabilization endosomal organization and biogenesis other energy-generation activities protein targeting, sorting and translocation vacuolar and lysosomal
    [Show full text]
  • Anti- DDX17 Antibody
    anti- DDX17 antibody Product Information Catalog No.: FNab02296 Size: 100μg Form: liquid Purification: Immunogen affinity purified Purity: ≥95% as determined by SDS-PAGE Host: Rabbit Clonality: polyclonal Clone ID: None IsoType: IgG Storage: PBS with 0.02% sodium azide and 50% glycerol pH 7.3, -20℃ for 12 months (Avoid repeated freeze / thaw cycles.) Background RNA-dependent ATPase activity. Involved in transcriptional regulation. Transcriptional coactivator for estrogen receptor ESR1. Increases ESR1 AF-1 domain-mediated transactivation. Synergizes with DDX5 and SRA1 RNA to activate MYOD1 transcriptional activity and probably involved in skeletal muscle differentiation. Required for zinc-finger antiviral protein ZC3HAV1- mediated mRNA degradation. Immunogen information Immunogen: DEAD(Asp-Glu-Ala-Asp) box polypeptide 17 Synonyms: DDX17, DEAD box protein 17, DEAD box protein p72, P72, RH70, RNA dependent helicase p72 Observed MW: 72 kDa, 80 kDa Uniprot ID : Q92841 Application Reactivity: Human, Mouse, Rat Tested Application: ELISA, IHC, IF, WB, IP 1 Wuhan Fine Biotech Co., Ltd. B9 Bld, High-Tech Medical Devices Park, No. 818 Gaoxin Ave.East Lake High-Tech Development Zone.Wuhan, Hubei, China(430206) Tel :( 0086)027-87384275 Fax: (0086)027-87800889 www.fn-test.com Recommended dilution: WB: 1:500-1:2000; IP: 1:500-1:2000; IHC: 1:500-1:2000; IF: 1:10-1:100 Image: Immunohistochemistry of paraffin-embedded human kidney using FNab02296(DDX17 antibody) at dilution of 1:100 Immunofluorescent analysis of Hela cells, using DDX17 antibody FNab02296 at 1:25 dilution and Rhodamine-labeled goat anti-rabbit IgG (red). IP Result of anti-DDX17,P72 (IP: FNab02296, 4ug; Detection: FNab02296 1:1000) with mouse brain tissue lysate 3000ug.
    [Show full text]
  • 1 Supporting Information for a Microrna Network Regulates
    Supporting Information for A microRNA Network Regulates Expression and Biosynthesis of CFTR and CFTR-ΔF508 Shyam Ramachandrana,b, Philip H. Karpc, Peng Jiangc, Lynda S. Ostedgaardc, Amy E. Walza, John T. Fishere, Shaf Keshavjeeh, Kim A. Lennoxi, Ashley M. Jacobii, Scott D. Rosei, Mark A. Behlkei, Michael J. Welshb,c,d,g, Yi Xingb,c,f, Paul B. McCray Jr.a,b,c Author Affiliations: Department of Pediatricsa, Interdisciplinary Program in Geneticsb, Departments of Internal Medicinec, Molecular Physiology and Biophysicsd, Anatomy and Cell Biologye, Biomedical Engineeringf, Howard Hughes Medical Instituteg, Carver College of Medicine, University of Iowa, Iowa City, IA-52242 Division of Thoracic Surgeryh, Toronto General Hospital, University Health Network, University of Toronto, Toronto, Canada-M5G 2C4 Integrated DNA Technologiesi, Coralville, IA-52241 To whom correspondence should be addressed: Email: [email protected] (M.J.W.); yi- [email protected] (Y.X.); Email: [email protected] (P.B.M.) This PDF file includes: Materials and Methods References Fig. S1. miR-138 regulates SIN3A in a dose-dependent and site-specific manner. Fig. S2. miR-138 regulates endogenous SIN3A protein expression. Fig. S3. miR-138 regulates endogenous CFTR protein expression in Calu-3 cells. Fig. S4. miR-138 regulates endogenous CFTR protein expression in primary human airway epithelia. Fig. S5. miR-138 regulates CFTR expression in HeLa cells. Fig. S6. miR-138 regulates CFTR expression in HEK293T cells. Fig. S7. HeLa cells exhibit CFTR channel activity. Fig. S8. miR-138 improves CFTR processing. Fig. S9. miR-138 improves CFTR-ΔF508 processing. Fig. S10. SIN3A inhibition yields partial rescue of Cl- transport in CF epithelia.
    [Show full text]
  • "Phylogenetic Analysis of Protein Sequence Data Using The
    Phylogenetic Analysis of Protein Sequence UNIT 19.11 Data Using the Randomized Axelerated Maximum Likelihood (RAXML) Program Antonis Rokas1 1Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee ABSTRACT Phylogenetic analysis is the study of evolutionary relationships among molecules, phenotypes, and organisms. In the context of protein sequence data, phylogenetic analysis is one of the cornerstones of comparative sequence analysis and has many applications in the study of protein evolution and function. This unit provides a brief review of the principles of phylogenetic analysis and describes several different standard phylogenetic analyses of protein sequence data using the RAXML (Randomized Axelerated Maximum Likelihood) Program. Curr. Protoc. Mol. Biol. 96:19.11.1-19.11.14. C 2011 by John Wiley & Sons, Inc. Keywords: molecular evolution r bootstrap r multiple sequence alignment r amino acid substitution matrix r evolutionary relationship r systematics INTRODUCTION the baboon-colobus monkey lineage almost Phylogenetic analysis is a standard and es- 25 million years ago, whereas baboons and sential tool in any molecular biologist’s bioin- colobus monkeys diverged less than 15 mil- formatics toolkit that, in the context of pro- lion years ago (Sterner et al., 2006). Clearly, tein sequence analysis, enables us to study degree of sequence similarity does not equate the evolutionary history and change of pro- with degree of evolutionary relationship. teins and their function. Such analysis is es- A typical phylogenetic analysis of protein sential to understanding major evolutionary sequence data involves five distinct steps: (a) questions, such as the origins and history of data collection, (b) inference of homology, (c) macromolecules, developmental mechanisms, sequence alignment, (d) alignment trimming, phenotypes, and life itself.
    [Show full text]
  • ING3 Is Essential for Asymmetric Cell Division During Mouse Oocyte Maturation
    ING3 Is Essential for Asymmetric Cell Division during Mouse Oocyte Maturation Shinnosuke Suzuki1, Yusuke Nozawa1, Satoshi Tsukamoto2, Takehito Kaneko3, Hiroshi Imai1, Naojiro Minami1* 1 Laboratory of Reproductive Biology, Graduate School of Agriculture, Kyoto University, Kyoto, Japan, 2 Laboratory Animal and Genome Sciences Section, National Institute of Radiological Sciences, Chiba, Japan, 3 Institute of Laboratory Animals, Graduate School of Medicine, Kyoto University, Kyoto, Japan Abstract ING3 (inhibitor of growth family, member 3) is a subunit of the nucleosome acetyltransferase of histone 4 (NuA4) complex, which activates gene expression. ING3, which contains a plant homeodomain (PHD) motif that can bind to trimethylated lysine 4 on histone H3 (H3K4me3), is ubiquitously expressed in mammalian tissues and governs transcriptional regulation, cell cycle control, and apoptosis via p53-mediated transcription or the Fas/caspase-8 pathway. Thus, ING3 plays a number of important roles in various somatic cells. However, the role(s) of ING3 in germ cells remains unknown. Here, we show that loss of ING3 function led to the failure of asymmetric cell division and cortical reorganization in the mouse oocyte. Immunostaining showed that in fully grown germinal vesicle (GV) oocytes, ING3 localized predominantly in the GV. After germinal vesicle breakdown (GVBD), ING3 homogeneously localized in the cytoplasm. In oocytes where Ing3 was targeted by siRNA microinjection, we observed symmetric cell division during mouse oocyte maturation. In those oocytes, oocyte polarization was not established due to the failure to form an actin cap or a cortical granule-free domain (CGFD), the lack of which inhibited spindle migration. These features were among the main causes of abnormal symmetric cell division.
    [Show full text]
  • Performance Evaluation of Leading Protein Multiple Sequence Alignment Methods
    International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 – 8958, Volume-9 Issue-1, October 2019 Performance Evaluation of Leading Protein Multiple Sequence Alignment Methods Arunima Mishra, B. K. Tripathi, S. S. Soam MSA is a well-known method of alignment of three or more Abstract: Protein Multiple sequence alignment (MSA) is a biological sequences. Multiple sequence alignment is a very process, that helps in alignment of more than two protein intricate problem, therefore, computation of exact MSA is sequences to establish an evolutionary relationship between the only feasible for the very small number of sequences which is sequences. As part of Protein MSA, the biological sequences are not practical in real situations. Dynamic programming as used aligned in a way to identify maximum similarities. Over time the sequencing technologies are becoming more sophisticated and in pairwise sequence method is impractical for a large number hence the volume of biological data generated is increasing at an of sequences while performing MSA and therefore the enormous rate. This increase in volume of data poses a challenge heuristic algorithms with approximate approaches [7] have to the existing methods used to perform effective MSA as with the been proved more successful. Generally, various biological increase in data volume the computational complexities also sequences are organized into a two-dimensional array such increases and the speed to process decreases. The accuracy of that the residues in each column are homologous or having the MSA is another factor critically important as many bioinformatics same functionality. Many MSA methods were developed over inferences are dependent on the output of MSA.
    [Show full text]