Predicting DNA, RNA, Ion, Peptide and Small Molecule Interaction Sites Within Protein Domains Anat Etzion-Fuchs 1,Davida.Todd2 and Mona Singh 1,2,*

Total Page:16

File Type:pdf, Size:1020Kb

Predicting DNA, RNA, Ion, Peptide and Small Molecule Interaction Sites Within Protein Domains Anat Etzion-Fuchs 1,Davida.Todd2 and Mona Singh 1,2,* Published online 17 May 2021 Nucleic Acids Research, 2021, Vol. 49, No. 13 e78 https://doi.org/10.1093/nar/gkab356 dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains Anat Etzion-Fuchs 1,DavidA.Todd2 and Mona Singh 1,2,* 1Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA and 2Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA Received November 12, 2020; Revised March 30, 2021; Editorial Decision April 20, 2021; Accepted April 22, 2021 Downloaded from https://academic.oup.com/nar/article/49/13/e78/6276922 by guest on 30 September 2021 ABSTRACT quence alone. Domains facilitate a wide range of functions, among the most important of which are mediating inter- Domains are instrumental in facilitating protein inter- actions. Knowing the types of interactions each domain actions with DNA, RNA, small molecules, ions and enables––whether with DNA, RNA, ions, small molecules peptides. Identifying ligand-binding domains within or other proteins––would be a great aid in protein function sequences is a critical step in protein function an- annotation. For example, proteins with DNA- and RNA- notation, and the ligand-binding properties of pro- binding activities are routinely analyzed and categorized teins are frequently analyzed based upon whether based upon identifying domain subsequences (6,7), and the they contain one of these domains. To date, how- functions of signaling proteins can be inferred based upon ever, knowledge of whether and how protein do- their composition of interaction domains (8). Nevertheless, mains interact with ligands has been limited to do- not all domains that bind these ligands are known, and new mains that have been observed in co-crystal struc- binding properties of domains continue to be elucidated ex- perimentally (9). tures; this leaves approximately two-thirds of hu- While identifying the types of molecules with which a man protein domain families uncharacterized with re- protein interacts has already proven to be an essential com- spect to whether and how they bind DNA, RNA, small ponent of functional annotation pipelines (10), pinpointing molecules, ions and peptides. To fill this gap, we in- the specific residues within proteins that mediate these in- troduce dSPRINT, a novel ensemble machine learn- teractions results in a deeper molecular understanding of ing method for predicting whether a domain binds protein function. For example, protein interaction sites can DNA, RNA, small molecules, ions or peptides, along be used to assess the impact of mutations in the context of with the positions within it that participate in these disease (11,12), to identify specificity-determining positions types of interactions. In stringent cross-validation (13) and to reason about interaction network evolution (14). testing, we demonstrate that dSPRINT has an excel- Recently, it has been shown that domains provide a frame- lent performance in uncovering ligand-binding posi- work within which to aggregate structural co-complex data and this per-domain aggregation allows accurate inference tions and domains. We also apply dSPRINT to newly of positions within domains that participate in interactions characterize the molecular functions of domains of (15). Domains annotated in this manner can then be used to unknown function. dSPRINT’s predictions can be identify interaction sites within proteins, and such domain- transferred from domains to sequences, enabling based annotation of interaction sites has been the basis of predictions about the ligand-binding properties of approaches to detect genes with perturbed functionalities in 95% of human genes. The dSPRINT framework and cancer (16) and to perform systematic cross-genomic anal- its predictions for 6503 human protein domains are ysis of regulatory network variation (17). freely available at http://protdomain.princeton.edu/ In this paper, we aim to develop methods to comprehen- dsprint. sively identify, for all domains found in the human genome, which types of ligands they bind along with the positions within them that participate in these interactions. To date, INTRODUCTION computational methods to characterize domain–ligand in- Domains are recurring protein subunits that are grouped teractions (15,18–20) have relied heavily on structural infor- into families that share sequence, structure and evolution- mation. However, structural data are not readily available ary descent (1–5). Protein domains are ubiquitous––over for all domain families, with only a third of the domain fam- 95% of human genes contain complete instances of ∼6000 ilies with instances in human protein sequences observed in Pfam domain families (1)––and can be identified from se- *To whom correspondence should be addressed. Tel: +1 609 258 2087; Fax: +1 609 258 1771; Email: [email protected] C The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. e78 Nucleic Acids Research, 2021, Vol. 49, No. 13 PAGE 2 OF 13 any co-complex crystal structure (see Supplementary Meth- MATERIALS AND METHODS ods S1.1.3). Identifying protein domains and creating our training set For the remaining domain families, current techniques fail to identify whether they are involved in mediating in- We first curate a training set of human protein domains teractions, what types of ligands they bind, and the specific with reliably annotated ligand-binding positions. We begin positions within them that participate in binding. A method by identifying all Pfam (1) domains with matches in hu- that can identify a domain’s ligands and interaction posi- man genes. This corresponds to 6029 protein domain fam- tions without relying on structural information not only ilies that collectively cover 95% of human genes (see Sup- would be a great aid in characterizing domains of unknown plementary Figure S1). Then, to construct our training set, function (DUFs) or other domains for which interaction in- we restrict to domains that (i) have >10 instances in the hu- formation is unknown but also would enable further down- man genome to leverage aggregating information across hu- stream analyses by identifying interaction sites within pro- man genes at the domain level, (ii) have per-position bind- teins that contain instances of these domains. Indeed, at ing labels (15) as described in the next section and (iii) pass Downloaded from https://academic.oup.com/nar/article/49/13/e78/6276922 by guest on 30 September 2021 present, the proteins for only ∼13% of human genes have certain domain similarity thresholds to avoid trivial cases co-complex structures with ligands and from which inter- where multiple near-identical domains are utilized. Supple- action sites can be derived (see Supplementary Figure S1). mentary Methods S1.1.1 describes the details of this pro- Here we develop a machine learning framework, cess including how we identify domain instances using hid- domain Sequence-based PRediction of INTeraction-sites den Markov Model (HMM) profile models from the Pfam- (dSPRINT), to predict DNA, RNA, ion, small molecule A (version 31) database (1) using HMMER (28), and Sup- and peptide binding positions within all protein domains plementary Methods S1.1.2 describes the domain similarity found in the human genome. While previous methods have filtering done using hhalign (v3.0.3) from the HH-suite29 ( ). predicted sites and regions within protein sequences that Ultimately, we obtain a training set of 44,872 domain posi- participate in interactions (e.g., (21–25), and see reviews tions within 327 domains. (26,27)), dSPRINT instead predicts the types of ligands a domain binds along with the specific interaction posi- Defining structure-based training labels within domains tions within it, even for domains that lack any structural characterization. dSPRINT’s focus on predicting the Domain positions are labeled as binding RNA, DNA, ion, ligand-binding properties of domains is complementary peptide and small-molecule based on InteracDome (15), a to existing approaches that annotate interaction sites collection of binding positions across 4128 protein domain within protein sequences. To train dSPRINT, we leverage families annotated via analysis of PDB co-complex crys- InteracDome (15), a large dataset of domains for which tal structures. For a position within a domain, its Interac- binding information has been annotated based on the Dome positional score corresponds to the fraction of Bi- analysis of co-complex crystal structures, while considering oLip (30) structural co-complexes in which the amino acid a comprehensive set of features extracted from a diverse found in that domain position is within 3.6A˚ of the lig- set of sources. Our framework aggregates information at and. For each InteracDome domain–ligand type pair anno- the level of protein domain positions, and features for each tated to have instances in at least three non-redundant PDB domain position are derived from sites corresponding to structures, we define positive (i.e., binding) and negative this position across instances within protein sequences. (i.e., non-binding) positions using the pair’s InteracDome- Site-based
Recommended publications
  • Antisense RNA Polymerase II Divergent Transcripts Are P-Tefb Dependent and Substrates for the RNA Exosome
    Antisense RNA polymerase II divergent transcripts are P-TEFb dependent and substrates for the RNA exosome Ryan A. Flynna,1,2, Albert E. Almadaa,b,1, Jesse R. Zamudioa, and Phillip A. Sharpa,b,3 aDavid H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139; and bDepartment of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139 Contributed by Phillip A. Sharp, May 12, 2011 (sent for review March 3, 2011) Divergent transcription occurs at the majority of RNA polymerase II PII carboxyl-terminal domain (CTD) at serine 2, DSIF, and (RNAPII) promoters in mouse embryonic stem cells (mESCs), and NELF results in the dissociation of NELF from the elongation this activity correlates with CpG islands. Here we report the char- complex and continuation of transcription (13). More recently acterization of upstream antisense transcription in regions encod- it was recognized, in mESCs, that c-Myc stimulates transcription ing transcription start site associated RNAs (TSSa-RNAs) at four at over a third of all cellular promoters by recruitment of P-TEFb divergent CpG island promoters: Isg20l1, Tcea1, Txn1, and Sf3b1. (12). Intriguingly in these same cells, NELF and DSIF have bi- We find that upstream antisense RNAs (uaRNAs) have distinct modal binding profiles at divergent TSSs. This suggests divergent capped 5′ termini and heterogeneous nonpolyadenylated 3′ ends. RNAPII complexes might be poised for signals controlling elon- uaRNAs are short-lived with average half-lives of 18 minutes and gation and opens up the possibility that in the antisense direction are present at 1–4 copies per cell, approximately one RNA per DNA P-TEF-b recruitment may be regulating release for productive template.
    [Show full text]
  • Coupling of Spliceosome Complexity to Intron Diversity
    bioRxiv preprint doi: https://doi.org/10.1101/2021.03.19.436190; this version posted March 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Coupling of spliceosome complexity to intron diversity Jade Sales-Lee1, Daniela S. Perry1, Bradley A. Bowser2, Jolene K. Diedrich3, Beiduo Rao1, Irene Beusch1, John R. Yates III3, Scott W. Roy4,6, and Hiten D. Madhani1,6,7 1Dept. of Biochemistry and Biophysics University of California – San Francisco San Francisco, CA 94158 2Dept. of Molecular and Cellular Biology University of California - Merced Merced, CA 95343 3Department of Molecular Medicine The Scripps Research Institute, La Jolla, CA 92037 4Dept. of Biology San Francisco State University San Francisco, CA 94132 5Chan-Zuckerberg Biohub San Francisco, CA 94158 6Corresponding authors: [email protected], [email protected] 7Lead Contact 1 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.19.436190; this version posted March 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. SUMMARY We determined that over 40 spliceosomal proteins are conserved between many fungal species and humans but were lost during the evolution of S. cerevisiae, an intron-poor yeast with unusually rigid splicing signals. We analyzed null mutations in a subset of these factors, most of which had not been investigated previously, in the intron-rich yeast Cryptococcus neoformans.
    [Show full text]
  • A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus
    Page 1 of 781 Diabetes A Computational Approach for Defining a Signature of β-Cell Golgi Stress in Diabetes Mellitus Robert N. Bone1,6,7, Olufunmilola Oyebamiji2, Sayali Talware2, Sharmila Selvaraj2, Preethi Krishnan3,6, Farooq Syed1,6,7, Huanmei Wu2, Carmella Evans-Molina 1,3,4,5,6,7,8* Departments of 1Pediatrics, 3Medicine, 4Anatomy, Cell Biology & Physiology, 5Biochemistry & Molecular Biology, the 6Center for Diabetes & Metabolic Diseases, and the 7Herman B. Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202; 2Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202; 8Roudebush VA Medical Center, Indianapolis, IN 46202. *Corresponding Author(s): Carmella Evans-Molina, MD, PhD ([email protected]) Indiana University School of Medicine, 635 Barnhill Drive, MS 2031A, Indianapolis, IN 46202, Telephone: (317) 274-4145, Fax (317) 274-4107 Running Title: Golgi Stress Response in Diabetes Word Count: 4358 Number of Figures: 6 Keywords: Golgi apparatus stress, Islets, β cell, Type 1 diabetes, Type 2 diabetes 1 Diabetes Publish Ahead of Print, published online August 20, 2020 Diabetes Page 2 of 781 ABSTRACT The Golgi apparatus (GA) is an important site of insulin processing and granule maturation, but whether GA organelle dysfunction and GA stress are present in the diabetic β-cell has not been tested. We utilized an informatics-based approach to develop a transcriptional signature of β-cell GA stress using existing RNA sequencing and microarray datasets generated using human islets from donors with diabetes and islets where type 1(T1D) and type 2 diabetes (T2D) had been modeled ex vivo. To narrow our results to GA-specific genes, we applied a filter set of 1,030 genes accepted as GA associated.
    [Show full text]
  • Análise Integrativa De Perfis Transcricionais De Pacientes Com
    UNIVERSIDADE DE SÃO PAULO FACULDADE DE MEDICINA DE RIBEIRÃO PRETO PROGRAMA DE PÓS-GRADUAÇÃO EM GENÉTICA ADRIANE FEIJÓ EVANGELISTA Análise integrativa de perfis transcricionais de pacientes com diabetes mellitus tipo 1, tipo 2 e gestacional, comparando-os com manifestações demográficas, clínicas, laboratoriais, fisiopatológicas e terapêuticas Ribeirão Preto – 2012 ADRIANE FEIJÓ EVANGELISTA Análise integrativa de perfis transcricionais de pacientes com diabetes mellitus tipo 1, tipo 2 e gestacional, comparando-os com manifestações demográficas, clínicas, laboratoriais, fisiopatológicas e terapêuticas Tese apresentada à Faculdade de Medicina de Ribeirão Preto da Universidade de São Paulo para obtenção do título de Doutor em Ciências. Área de Concentração: Genética Orientador: Prof. Dr. Eduardo Antonio Donadi Co-orientador: Prof. Dr. Geraldo A. S. Passos Ribeirão Preto – 2012 AUTORIZO A REPRODUÇÃO E DIVULGAÇÃO TOTAL OU PARCIAL DESTE TRABALHO, POR QUALQUER MEIO CONVENCIONAL OU ELETRÔNICO, PARA FINS DE ESTUDO E PESQUISA, DESDE QUE CITADA A FONTE. FICHA CATALOGRÁFICA Evangelista, Adriane Feijó Análise integrativa de perfis transcricionais de pacientes com diabetes mellitus tipo 1, tipo 2 e gestacional, comparando-os com manifestações demográficas, clínicas, laboratoriais, fisiopatológicas e terapêuticas. Ribeirão Preto, 2012 192p. Tese de Doutorado apresentada à Faculdade de Medicina de Ribeirão Preto da Universidade de São Paulo. Área de Concentração: Genética. Orientador: Donadi, Eduardo Antonio Co-orientador: Passos, Geraldo A. 1. Expressão gênica – microarrays 2. Análise bioinformática por module maps 3. Diabetes mellitus tipo 1 4. Diabetes mellitus tipo 2 5. Diabetes mellitus gestacional FOLHA DE APROVAÇÃO ADRIANE FEIJÓ EVANGELISTA Análise integrativa de perfis transcricionais de pacientes com diabetes mellitus tipo 1, tipo 2 e gestacional, comparando-os com manifestações demográficas, clínicas, laboratoriais, fisiopatológicas e terapêuticas.
    [Show full text]
  • A Genome-Wide Library of MADM Mice for Single-Cell Genetic Mosaic Analysis
    bioRxiv preprint doi: https://doi.org/10.1101/2020.06.05.136192; this version posted June 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Contreras et al., A Genome-wide Library of MADM Mice for Single-Cell Genetic Mosaic Analysis Ximena Contreras1, Amarbayasgalan Davaatseren1, Nicole Amberg1, Andi H. Hansen1, Johanna Sonntag1, Lill Andersen2, Tina Bernthaler2, Anna Heger1, Randy Johnson3, Lindsay A. Schwarz4,5, Liqun Luo4, Thomas Rülicke2 & Simon Hippenmeyer1,6,# 1 Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria 2 Institute of Laboratory Animal Science, University of Veterinary Medicine Vienna, Vienna, Austria 3 Department of Biochemistry and Molecular Biology, University of Texas, Houston, TX 77030, USA 4 HHMI and Department of Biology, Stanford University, Stanford, CA 94305, USA 5 Present address: St. Jude Children’s Research Hospital, Memphis, TN 38105, USA 6 Lead contact #Correspondence and requests for materials should be addressed to S.H. ([email protected]) 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.05.136192; this version posted June 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Contreras et al., SUMMARY Mosaic Analysis with Double Markers (MADM) offers a unique approach to visualize and concomitantly manipulate genetically-defined cells in mice with single-cell resolution.
    [Show full text]
  • Mouse Fam32a Knockout Project (CRISPR/Cas9)
    https://www.alphaknockout.com Mouse Fam32a Knockout Project (CRISPR/Cas9) Objective: To create a Fam32a knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering. Strategy summary: The Fam32a gene (NCBI Reference Sequence: NM_026455 ; Ensembl: ENSMUSG00000003039 ) is located on Mouse chromosome 8. 4 exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 4 (Transcript: ENSMUST00000003123). Exon 1~4 will be selected as target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO Mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Exon 1 starts from about 0.3% of the coding region. Exon 1~4 covers 100.0% of the coding region. The size of effective KO region: ~2461 bp. The KO region does not have any other known gene. Page 1 of 8 https://www.alphaknockout.com Overview of the Targeting Strategy Wildtype allele 5' gRNA region gRNA region 3' 1 2 3 4 Legends Exon of mouse Fam32a Knockout region Page 2 of 8 https://www.alphaknockout.com Overview of the Dot Plot (up) Window size: 15 bp Forward Reverse Complement Sequence 12 Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis. Overview of the Dot Plot (down) Window size: 15 bp Forward Reverse Complement Sequence 12 Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats.
    [Show full text]
  • Gene Regulation Underlies Environmental Adaptation in House Mice
    Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press Research Gene regulation underlies environmental adaptation in house mice Katya L. Mack,1 Mallory A. Ballinger,1 Megan Phifer-Rixey,2 and Michael W. Nachman1 1Department of Integrative Biology and Museum of Vertebrate Zoology, University of California, Berkeley, California 94720, USA; 2Department of Biology, Monmouth University, West Long Branch, New Jersey 07764, USA Changes in cis-regulatory regions are thought to play a major role in the genetic basis of adaptation. However, few studies have linked cis-regulatory variation with adaptation in natural populations. Here, using a combination of exome and RNA- seq data, we performed expression quantitative trait locus (eQTL) mapping and allele-specific expression analyses to study the genetic architecture of regulatory variation in wild house mice (Mus musculus domesticus) using individuals from five pop- ulations collected along a latitudinal cline in eastern North America. Mice in this transect showed clinal patterns of variation in several traits, including body mass. Mice were larger in more northern latitudes, in accordance with Bergmann’s rule. We identified 17 genes where cis-eQTLs were clinal outliers and for which expression level was correlated with latitude. Among these clinal outliers, we identified two genes (Adam17 and Bcat2) with cis-eQTLs that were associated with adaptive body mass variation and for which expression is correlated with body mass both within and between populations. Finally, we per- formed a weighted gene co-expression network analysis (WGCNA) to identify expression modules associated with measures of body size variation in these mice.
    [Show full text]
  • Lack of Polymorphisms in the Coding Region of the Highly Conserved Gene Encoding Transcription Elongation Factor S-II (TCEA1)
    Drug Discov Ther 2007;1(1):9-11. Brief Report Lack of polymorphisms in the coding region of the highly conserved gene encoding transcription elongation factor S-II (TCEA1) Takahiro Ito1, Kent Doi2, 3, Naoko Matsumoto4, Fumiko Kakihara4, Eisei Noiri2, Setsuo Hasegawa4, Katsushi Tokunaga3, Kazuhisa Sekimizu1, * 1 Division of Developmental Biochemistry, Graduate School of Pharmaceutical Sciences, University of Tokyo, Tokyo, Japan; 2 Department of Nephrology and Endocrinology, Graduate School of Medicine, University of Tokyo, Tokyo, Japan; 3 Department of Human Genetics, Graduate School of Medicine, University of Tokyo, Tokyo, Japan; 4 Sekino Clinical Pharmacology Clinic, Tokyo, Japan. ABSTRACT: Transcription elongation factor been identified in many eukaryotes including budding S-II stimulates mRNA chain elongation catalyzed yeast, fruit fly, mouse, and human (6-9). The human by RNA polymerase II. S-II is highly conserved S-II gene, designated TCEA1, was initially reported among eukaryotes and is essential for definitive to be a 2.5-kb intronless gene mapped on 3p22- > hematopoiesis in mice. In the present study, we p21.3 (10). We previously reported that the murine report the identification of five novel nucleotide S-II gene consists of 10 exons and maps on the variations in the human S-II gene in the Japanese proximal region of mouse chromosome 1, which is population. All five variations were located in syntenic to human chromosome 8q (11). Consistent introns, and no polymorphisms were found in the with the synteny between the mouse and human protein-coding region, suggesting strong negative chromosomes, recent progress in the human genome selection during gene evolution.
    [Show full text]
  • A High-Throughput Approach to Uncover Novel Roles of APOBEC2, a Functional Orphan of the AID/APOBEC Family
    Rockefeller University Digital Commons @ RU Student Theses and Dissertations 2018 A High-Throughput Approach to Uncover Novel Roles of APOBEC2, a Functional Orphan of the AID/APOBEC Family Linda Molla Follow this and additional works at: https://digitalcommons.rockefeller.edu/ student_theses_and_dissertations Part of the Life Sciences Commons A HIGH-THROUGHPUT APPROACH TO UNCOVER NOVEL ROLES OF APOBEC2, A FUNCTIONAL ORPHAN OF THE AID/APOBEC FAMILY A Thesis Presented to the Faculty of The Rockefeller University in Partial Fulfillment of the Requirements for the degree of Doctor of Philosophy by Linda Molla June 2018 © Copyright by Linda Molla 2018 A HIGH-THROUGHPUT APPROACH TO UNCOVER NOVEL ROLES OF APOBEC2, A FUNCTIONAL ORPHAN OF THE AID/APOBEC FAMILY Linda Molla, Ph.D. The Rockefeller University 2018 APOBEC2 is a member of the AID/APOBEC cytidine deaminase family of proteins. Unlike most of AID/APOBEC, however, APOBEC2’s function remains elusive. Previous research has implicated APOBEC2 in diverse organisms and cellular processes such as muscle biology (in Mus musculus), regeneration (in Danio rerio), and development (in Xenopus laevis). APOBEC2 has also been implicated in cancer. However the enzymatic activity, substrate or physiological target(s) of APOBEC2 are unknown. For this thesis, I have combined Next Generation Sequencing (NGS) techniques with state-of-the-art molecular biology to determine the physiological targets of APOBEC2. Using a cell culture muscle differentiation system, and RNA sequencing (RNA-Seq) by polyA capture, I demonstrated that unlike the AID/APOBEC family member APOBEC1, APOBEC2 is not an RNA editor. Using the same system combined with enhanced Reduced Representation Bisulfite Sequencing (eRRBS) analyses I showed that, unlike the AID/APOBEC family member AID, APOBEC2 does not act as a 5-methyl-C deaminase.
    [Show full text]
  • Mrna Editing, Processing and Quality Control in Caenorhabditis Elegans
    | WORMBOOK mRNA Editing, Processing and Quality Control in Caenorhabditis elegans Joshua A. Arribere,*,1 Hidehito Kuroyanagi,†,1 and Heather A. Hundley‡,1 *Department of MCD Biology, UC Santa Cruz, California 95064, †Laboratory of Gene Expression, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan, and ‡Medical Sciences Program, Indiana University School of Medicine-Bloomington, Indiana 47405 ABSTRACT While DNA serves as the blueprint of life, the distinct functions of each cell are determined by the dynamic expression of genes from the static genome. The amount and specific sequences of RNAs expressed in a given cell involves a number of regulated processes including RNA synthesis (transcription), processing, splicing, modification, polyadenylation, stability, translation, and degradation. As errors during mRNA production can create gene products that are deleterious to the organism, quality control mechanisms exist to survey and remove errors in mRNA expression and processing. Here, we will provide an overview of mRNA processing and quality control mechanisms that occur in Caenorhabditis elegans, with a focus on those that occur on protein-coding genes after transcription initiation. In addition, we will describe the genetic and technical approaches that have allowed studies in C. elegans to reveal important mechanistic insight into these processes. KEYWORDS Caenorhabditis elegans; splicing; RNA editing; RNA modification; polyadenylation; quality control; WormBook TABLE OF CONTENTS Abstract 531 RNA Editing and Modification 533 Adenosine-to-inosine RNA editing 533 The C. elegans A-to-I editing machinery 534 RNA editing in space and time 535 ADARs regulate the levels and fates of endogenous dsRNA 537 Are other modifications present in C.
    [Show full text]
  • Oral Administration of Lactobacillus Plantarum 299V
    Genes Nutr (2015) 10:10 DOI 10.1007/s12263-015-0461-7 RESEARCH PAPER Oral administration of Lactobacillus plantarum 299v modulates gene expression in the ileum of pigs: prediction of crosstalk between intestinal immune cells and sub-mucosal adipocytes 1 1,4 1,5 1 Marcel Hulst • Gabriele Gross • Yaping Liu • Arjan Hoekman • 2 1,3 1,3 Theo Niewold • Jan van der Meulen • Mari Smits Received: 19 November 2014 / Accepted: 28 March 2015 / Published online: 11 April 2015 Ó The Author(s) 2015. This article is published with open access at Springerlink.com Abstract To study host–probiotic interactions in parts of ileum. A higher expression level of several B cell-specific the intestine only accessible in humans by surgery (je- transcription factors/regulators was observed, suggesting junum, ileum and colon), pigs were used as model for that an influx of B cells from the periphery to the ileum humans. Groups of eight 6-week-old pigs were repeatedly and/or the proliferation of progenitor B cells to IgA-com- orally administered with 5 9 1012 CFU Lactobacillus mitted plasma cells in the Peyer’s patches of the ileum was plantarum 299v (L. plantarum 299v) or PBS, starting with stimulated. Genes coding for enzymes that metabolize a single dose followed by three consecutive daily dosings leukotriene B4, 1,25-dihydroxyvitamin D3 and steroids 10 days later. Gene expression was assessed with pooled were regulated in the ileum. Bioinformatics analysis pre- RNA samples isolated from jejunum, ileum and colon dicted that these metabolites may play a role in the scrapings of the eight pigs per group using Affymetrix crosstalk between intestinal immune cells and sub-mucosal porcine microarrays.
    [Show full text]
  • Gene Regulation and the Genomic Basis of Speciation and Adaptation in House Mice (Mus Musculus)
    Gene regulation and the genomic basis of speciation and adaptation in house mice (Mus musculus) By Katya L. Mack A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Integrative Biology in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Michael W. Nachman, Chair Professor Rasmus Nielsen Professor Craig T. Miller Fall 2018 Abstract Gene regulation and the genomic basis of speciation and adaptation in house mice (Mus musculus) by Katya Mack Doctor of Philosophy in Integrative Biology University of California, Berkeley Professor Michael W. Nachman, Chair Gene expression is a molecular phenotype that is essential to organismal form and fitness. However, how gene regulation evolves over evolutionary time and contributes to phenotypic differences within and between species is still not well understood. In my dissertation, I examined the role of gene regulation in adaptation and speciation in house mice (Mus musculus). In chapter 1, I reviewed theoretical models and empirical data on the role of gene regulation in the origin of new species. I discuss how regulatory divergence between species can result in hybrid dysfunction and point to areas that could benefit from future research. In chapter 2, I characterized regulatory divergence between M. m. domesticus and M. m. musculus associated with male hybrid sterility. The major model for the evolution of post-zygotic isolation proposes that hybrid sterility or inviability will evolve as a product of deleterious interactions (i.e., negative epistasis) between alleles at different loci when joined together in hybrids. As the regulation of gene expression is inherently based on interactions between loci, disruption of gene regulation in hybrids may be a common mechanism for post-zygotic isolation.
    [Show full text]