Discovering Protein-Binding RNA Motifs with a Generative Model of RNA Sequences

Computational Biology and Chemistry 84 (2020) 107171
journal homepage: www.elsevier.com/locate/cbac

Research Article

Byungkyu Park, Kyungsook Han*
Department of Computer Engineering, Inha University, 22212 Incheon, South Korea

* Corresponding author. E-mail address: [email protected] (K. Han).
https://doi.org/10.1016/j.compbiolchem.2019.107171
Received 17 June 2019; Received in revised form 19 October 2019; Accepted 19 November 2019; Available online 07 January 2020
1476-9271/ © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).

Keywords: Protein-RNA interaction; Binding motif; Generator; Long short-term memory network

ABSTRACT

Recent advances in high-throughput experimental technologies have generated a huge amount of data on interactions between proteins and nucleic acids. Motivated by this wealth of experimental data, several computational methods have been developed either to predict binding sites in a sequence or to determine if an interaction exists between protein and nucleic acid sequences. However, most of the methods cannot be used to discover new nucleic acid sequences that bind to a target protein because they are classifiers rather than generators. In this paper we propose a generative model for constructing protein-binding RNA sequences and motifs using a long short-term memory (LSTM) neural network. Testing the model for several target proteins showed that RNA sequences generated by the model have high binding affinity and specificity for their target proteins and that the protein-binding motifs derived from the generated RNA sequences are comparable to the motifs from experimentally validated protein-binding RNA sequences. The results are promising and we believe this approach will help design more efficient in vitro or in vivo experiments by suggesting potential RNA aptamers for a target protein.

1. Introduction

Interactions between proteins and nucleic acids are involved in many biological processes such as transcription, RNA processing, and translation. A variety of in vitro and in vivo experimental methods have been developed to study interactions between proteins and nucleic acids, and the past decade has witnessed a large amount of data generated by the experimental methods. The experimental data have triggered the development of computational methods to predict binding sites in a sequence (Choi et al., 2017; Tuvshinjargal et al., 2016; Walia et al., 2014) or to determine if an interaction exists between a pair of sequences (Akbaripour-Elahabad et al., 2016; Alipanahi et al., 2015; Zhang and Liu, 2017). However, most of the computational methods are classifiers rather than generators, so they cannot be used to discover new protein-binding nucleic acid sequences.

Among the computational methods, neural networks have shown a certain degree of success in predicting the interactions between proteins and nucleic acids. DeepBind (Alipanahi et al., 2015), for example, is a convolutional neural network trained on a large amount of data from RNAcompete experiments (Ray et al., 2009, 2013). For the problem of predicting protein-binding sites of nucleic acid sequences, DeepBind contains hundreds of distinct prediction models, each for a different target protein. What DeepBind predicts is a binding score rather than protein-binding sites in the input nucleic acid sequence. Nonetheless, the binding score provided by DeepBind is informative, so we used the binding score in our study to estimate the affinity and specificity of nucleic acid sequences for a target protein.

A later model known as DeeperBind (Hassanzadeh and Wang, 2016) is a long short-term recurrent convolutional network for predicting the protein-binding specificity of DNA sequences. DeeperBind showed a better performance than DeepBind for some proteins, but its use is limited to the datasets from protein-binding microarrays. Both DeepBind and DeeperBind are classifiers rather than generators, and are not intended for finding protein-binding motifs or for constructing protein-binding nucleic acid sequences.

A more recent model called iDeep (Pan and Shen, 2017) uses a convolutional neural network and deep belief network to predict the RBP binding sites and motifs in RNAs. Five features (region type, clip-cobinding, structure, motif, CNN sequence) were used to train iDeep. In testing on 31 human RBPs, iDeep showed a better performance than DeepBind. However, iDeep has several drawbacks compared to DeepBind. First, iDeep contains only 31 human RBP models whereas DeepBind has 144 human RBP models. Second, iDeep requires input data with the 5 features (region type, clip-cobinding, structure, motif, CNN sequence) derived by iONMF (Stražar et al., 2016) whereas DeepBind takes only a sequence as input. Thus, if data generated by iONMF is not available for a target protein, iDeep cannot be used.

In this paper we propose a new approach to finding protein-binding RNA motifs with a generative model of protein-binding RNA sequences using a long short-term memory (LSTM) network. As an extension of a recurrent neural network (RNN), the LSTM network solves the vanishing gradient problem of RNN by introducing a gating mechanism (Hochreiter and Schmidhuber, 1997). LSTM networks are often used in deep learning and show good performance in speech recognition (Graves et al., 2013) and language translation (Sutskever et al., 2014). To the best of our knowledge, our generator is the first attempt to construct protein-binding RNA sequences using an LSTM network. The rest of the paper describes the architecture of the generator and discusses the protein-binding RNA sequences generated by the model and their motifs for several RNA-binding proteins (RBPs).

2. Method

2.1. Generator model of RNA sequences

For protein-binding RNA sequences, a generator model was implemented using char-rnn (https://github.com/karpathy/char-rnn). The model is composed of four layers of LSTM with 512 hidden neurons (Fig. 1). Given an RNA sequence, it reads one nucleotide of the input sequence at a time and predicts the next nucleotide in the sequence. The parameters of the model are updated by the difference between predicted and target nucleotides.

To train the model, we obtained a set of 213,300 unique RNA sequences (GEO: GSE15769), which were identified as binding sequences to RBPs by a custom Agilent 244K microarray experiment known as RNAcompete (Ray et al., 2009). Among the nine RBPs of RNAcompete, we excluded one yeast RBP and used the remaining eight human RBPs as target proteins in our study. Since the RBP-binding RNA sequences were 29–38 nucleotides (nt) long, the length of RNA sequences generated by our model was set to 40 nt.

The generator model was trained in the following way (Eqs. (1)–(5)). Let $x_t$ be a 4-bit vector representing the $t$th nucleotide ($n$) in the input sequence (Supplementary data 1). Only one element of $x_t$ is 1 and the others are 0 (one hot encoding). $y_t$ is a class indicator representing a target nucleotide. The LSTM calculates $z_t$ for $x_t$. Softmax changes $z_t$ to a vector of values between 0 and 1 that sum to 1 (one hot decoding). When training the model, the parameters of the neurons in the LSTM layers were updated using the loss calculated from the difference between the target nucleotide and predicted nucleotide. The loss function, defined by Eq. (5), is the mean of the negative log-likelihood of the prediction:

$$x_t = \text{number for an input nucleotide} \tag{1}$$

$$y_t = \text{number for a target nucleotide} \tag{2}$$

$$z_t = \mathrm{LSTM}(x_t) \tag{3}$$

$$y_t^n = \mathrm{softmax}(z_t) = \Pr(x_{t+1} = n \mid y_t) \tag{4}$$

$$\mathrm{loss} = -\sum \left(\log \Pr(x_{t+1} = n \mid y_t)\right) \tag{5}$$

Fig. 2 shows the loss of the model during the first 50 epochs of training for eight target proteins: HuR, SLM2, PTB, RBM4, U1A, SF2/ASF, FUSIP1, and YB1. The loss tended to decrease after a certain point as the model was trained longer, but the decreasing trend was not monotonic. We selected a generator model with the minimum loss value.

2.2. Definition of binding affinity and specificity

To assess the RNA sequences generated by our model, we defined the binding affinity and specificity of the sequences using the predictive binding score of DeepBind (Alipanahi et al., 2015). One problem with the DeepBind score is that the scale of the scores is arbitrary, so the scores from different DeepBind models are not directly comparable.

To make DeepBind scores comparable, we defined the binding affinity (AF) of an RNA sequence $s$ to a target protein $p$ as the probability that $s$ has a larger DeepBind score than a random sequence. To obtain an approximate value of the probability, we ran DeepBind on 200,000 random RNA sequences of 40 nucleotides and computed their binding affinity defined by Eq. (6) (see Fig. 3 for an illustration of deriving the binding affinity). Since the binding affinity of a sequence is a probability that the sequence has a larger DeepBind score than a random sequence, it is guaranteed to be in the range of [0, 1]. In the following equation, $\mathrm{Score}_m(s)$ denotes the score of sequence $s$ computed by DeepBind model $m$:

$$\mathrm{AF}_p(s) = \frac{1}{n} \sum_{i=1}^{n} \delta\left(x_i \le \mathrm{Score}_m(s)\right) \tag{6}$$

where $\delta(A) = 1$ if an event $A$ occurs; $\delta(A) = 0$ otherwise.

The binding specificity (SP) of RNA sequence $s$ to protein $p$ was defined by Eq. (7). In the equation, $M$ is a set of all generator models trained on data from the same type of experiment as $m$:

$$M_c = M - \{m\}$$

$$\mathrm{SP}_p(s) = \mathrm{AF}_p(s) - \frac{1}{|M_c|} \sum_{k \in M_c} \mathrm{AF}_k(s) \tag{7}$$