Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences

bioRxiv preprint doi: https://doi.org/10.1101/622803; this version posted August 31, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Alexander Rives *1,2, Joshua Meier *1, Tom Sercu *1, Siddharth Goyal *1, Zeming Lin 2, Demi Guo 3†, Myle Ott 1, C. Lawrence Zitnick 1, Jerry Ma 4†, Rob Fergus 2

*Equal contribution. †Work performed while at Facebook AI Research. 1 Facebook AI Research; 2 Dept. of Computer Science, New York University; 3 Harvard University; 4 Booth School of Business, University of Chicago & Yale Law School. Correspondence to: Alexander Rives <[email protected]>. Pre-trained models available at: <https://github.com/facebookresearch/esm>.

Abstract

In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Evolutionary-scale language modeling is a logical step toward predictive and generative artificial intelligence for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity. The resulting model contains information about biological properties in its representations. The representations are learned from sequence data alone; no information about the properties of the sequences is given through supervision or domain-specific features. The learned representation space has a multi-scale organization reflecting structure from the level of biochemical properties of amino acids to remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and can be identified by linear projections. Representation learning produces features that generalize across a range of applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure, and improving state-of-the-art features for long-range contact prediction.

1. Introduction

Growth in the number of protein sequences in public databases has followed an exponential trend over decades, creating a deep view into the breadth and diversity of protein sequences across life. This data is a promising ground for studying predictive and generative models for biology using artificial intelligence. Our focus here will be to fit a single model to many diverse sequences from across evolution. Accordingly we study high-capacity neural networks, investigating what can be learned about the biology of proteins from modeling evolutionary data at scale.

The idea that biological function and structure are recorded in the statistics of protein sequences selected through evolution has a long history (Yanofsky et al., 1964; Altschuh et al., 1987; 1988). Out of the possible random perturbations to a sequence, evolution is biased toward selecting those that are consistent with fitness (Göbel et al., 1994). The unobserved variables that determine a protein’s fitness, such as structure, function, and stability, leave a record in the distribution of observed natural sequences (Göbel et al., 1994).

Unlocking the information encoded in protein sequence variation is a longstanding problem in biology. An analogous problem in the field of artificial intelligence is natural language understanding, where the distributional hypothesis posits that a word’s semantics can be derived from the contexts in which it appears (Harris, 1954).

Recently, techniques based on self-supervision, a form of unsupervised learning in which context within the text is used to predict missing words, have been shown to materialize representations of word meaning that can generalize across natural language tasks (Collobert & Weston, 2008; Dai & Le, 2015; Peters et al., 2018; Devlin et al., 2018). The ability to learn such representations improves significantly with larger training datasets (Baevski et al., 2019; Radford et al., 2019).

Protein sequences result from a process greatly dissimilar to natural language. It is uncertain whether the models and objective functions effective for natural language transfer across differences between the domains. We explore this question by training high-capacity Transformer language models on evolutionary data. We investigate the resulting unsupervised representations for the presence of biological organizing principles and information about intrinsic biological properties. We find metric structure in the representation space that accords with organizing principles at scales from physicochemical to remote homology. We also find that secondary and tertiary protein structure can be identified in representations. The properties captured by the representations generalize across proteins. We apply the representations to a range of prediction tasks and find that they improve state-of-art features across the applications.

2. Background

Sequence alignment and search is a longstanding basis for comparative and statistical analysis of biological sequence data (Altschul et al., 1990; Altschul & Koonin, 1998; Eddy, 1998; Remmert et al., 2011). Search across large databases containing evolutionary diversity assembles related sequences into a multiple sequence alignment (MSA). Within sequence families, mutational patterns convey information about functional sites, stability, tertiary contacts, binding, and other properties (Altschuh et al., 1987; 1988; Göbel et al., 1994). Conserved sites correlate with functional and structural importance (Altschuh et al., 1987). Local biochemical and structural contexts are reflected in preferences for distinct classes of amino acids (Levitt, 1978). Covarying mutations have been associated with function, tertiary contacts, and binding (Göbel et al., 1994).

The prospect of inferring biological structure and function from evolutionary statistics has motivated development of machine learning on individual sequence families. Covariance, correlation, and mutual information have confounding effects from indirect couplings (Weigt et al., 2009). Maximum entropy methods disentangle direct interactions from indirect interactions by inferring parameters of a posited generating distribution for the sequence family (Weigt et al., 2009; Marks et al., 2011; Morcos et al., 2011; Jones et al., 2011; Balakrishnan et al., 2011; Ekeberg et al., 2013b). The generative picture can be extended to include latent variables (Riesselman et al., 2018).

Recently, self-supervision has emerged as a core direction in artificial intelligence research. Unlike supervised learning which requires manual annotation of each datapoint, self- [...] improvements in the learned representations. In recent work, self-supervision methods used in conjunction with large data and high-capacity models produced new state-of-the-art results approaching human performance on various question answering and semantic reasoning benchmarks (Devlin et al., 2018), and coherent natural text generation (Radford et al., 2019).

This paper explores self-supervised language modeling approaches that have demonstrated state-of-the-art performance on a range of natural language processing tasks, applying them to protein data in the form of unlabeled amino acid sequences. Since protein sequences use a small vocabulary of twenty canonical elements, the modeling problem is more similar to character-level language models (Mikolov et al., 2012; Kim et al., 2016) than word-level models. Like natural language, protein sequences also contain long-range dependencies, motivating use of architectures that detect and model distant context (Vaswani et al., 2017).

3. Scaling language models to 250 million diverse protein sequences

Large protein sequence databases contain diverse sequences sampled across life. In our experiments we explore datasets with up to 250 million sequences of the Uniparc database (The UniProt Consortium, 2007) which has 86 billion amino acids. This data is comparable in size to large text datasets that are being used to train high-capacity neural network architectures on natural language (Devlin et al., 2018; Radford et al., 2019). To model the data of evolution with fidelity, neural network architectures must have capacity and inductive biases to represent its breadth and diversity.

We investigate the Transformer (Vaswani et al., 2017), which has emerged as a powerful general-purpose model architecture for representation learning and generative modeling, outperforming recurrent and convolutional architectures in natural language settings. We use a deep Transformer (Devlin et al., 2018) with character sequences of amino acids from the proteins as input.

The Transformer processes inputs through a series of blocks that alternate self-attention with feed-forward connections. Self-attention allows the network
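
As an illustration of the character-level input and the predict-the-missing-token idea described above, the following is a minimal Python sketch and not the authors' implementation. It assumes a vocabulary of the twenty canonical one-letter amino acid codes plus a hypothetical `<mask>` token; the names `encode` and `mask_tokens` and the 15% masking rate are illustrative assumptions that do not come from the text.

```python
import random

# The twenty canonical amino acids, one letter each (standard one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical special token, named here only for illustration

# Map each symbol to an integer id; the mask token takes the last id.
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
VOCAB[MASK] = len(AMINO_ACIDS)


def encode(sequence):
    """Turn a protein string into a list of integer token ids, one per residue."""
    return [VOCAB[aa] for aa in sequence]


def mask_tokens(token_ids, mask_rate=0.15, rng=None):
    """Replace a random subset of positions with the mask id.

    Returns the corrupted ids and a dict {position: original id} holding the
    targets a model would be trained to predict from the surrounding context.
    The default mask_rate is an assumed value, not one stated in the text.
    """
    rng = rng or random.Random()
    # Mask at least one position when the sequence is non-empty.
    n_mask = min(len(token_ids), max(1, round(len(token_ids) * mask_rate)))
    positions = rng.sample(range(len(token_ids)), n_mask)
    corrupted = list(token_ids)
    targets = {}
    for pos in positions:
        targets[pos] = corrupted[pos]
        corrupted[pos] = VOCAB[MASK]
    return corrupted, targets


# Example: encode a short, made-up sequence and mask part of it.
ids = encode("MKTAYIAKQR")
corrupted, targets = mask_tokens(ids, rng=random.Random(0))
```

A model trained on such pairs sees `corrupted` as input and is scored only on the positions listed in `targets`, so the context around each masked residue is the only available signal.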