Novel Extensions of Label Propagation for Biomarker Discovery in Genomic Data

NOVEL EXTENSIONS OF LABEL PROPAGATION FOR BIOMARKER DISCOVERY IN GENOMIC DATA

by
Matthew E. Stokes
B.S. Systems and Control Engineering, Case Western Reserve University, 2008
M.S. Intelligent Systems Program / Biomedical Informatics, University of Pittsburgh, 2011

Submitted to the Graduate Faculty of the Kenneth P. Dietrich School of Arts and Sciences in fulfillment of the requirements for the degree of Doctor of Philosophy

University of Pittsburgh
2014

UNIVERSITY OF PITTSBURGH
DIETRICH SCHOOL OF ARTS AND SCIENCES

This dissertation proposal was presented by Matthew E. Stokes on July 17, 2014 and approved by

M. Michael Barmada, PhD, Department of Human Genetics
Gregory F. Cooper, MD, PhD, Department of Biomedical Informatics and the Intelligent Systems Program
Milos Hauskrecht, PhD, Department of Computer Science and the Intelligent Systems Program
Dissertation Advisor: Shyam Visweswaran, MD, PhD, Department of Biomedical Informatics and the Intelligent Systems Program

NOVEL EXTENSIONS OF LABEL PROPAGATION FOR BIOMARKER DISCOVERY IN GENOMIC DATA
Matthew E. Stokes, M.S.
University of Pittsburgh, 2014

Copyright © by Matthew E. Stokes 2014

NOVEL EXTENSIONS OF LABEL PROPAGATION FOR BIOMARKER DISCOVERY IN GENOMIC DATA
Matthew E. Stokes, PhD
University of Pittsburgh, 2014

One primary goal of analyzing genomic data is the identification of biomarkers that may be causative of, correlated with, or otherwise biologically relevant to disease phenotypes. In this work, I implement and extend a multivariate feature ranking algorithm called label propagation (LP) for biomarker discovery in genome-wide single-nucleotide polymorphism (SNP) data. This graph-based algorithm uses an iterative propagation method to efficiently compute the strength of association between a SNP and a phenotype. I developed three extensions to the LP algorithm, with the goal of tailoring it to genomic data. The first extension is a modification to the LP score which yields a variable-level score for each SNP, rather than a score for each SNP genotype. The second extension incorporates prior biological knowledge that is encoded as a prior value for each SNP. The third extension enables the combination of rankings produced by LP and another feature ranking algorithm.

The LP algorithm, its extensions, and two control algorithms (chi squared and sparse logistic regression) were applied to 11 genomic datasets, including a synthetic dataset, a semi-synthetic dataset, and nine genome-wide association study (GWAS) datasets covering eight diseases. The quality of each feature ranking algorithm was evaluated by using a subset of top-ranked SNPs to construct a classifier, whose predictive power was evaluated in terms of the area under the Receiver Operating Characteristic curve. Top-ranked SNPs were also evaluated for prior evidence of being associated with disease using evidence from the literature.

The LP algorithm was found to be effective at identifying predictive and biologically meaningful SNPs. The single-score extension performed significantly better than the original algorithm on the GWAS datasets. The prior knowledge extension did not improve on the feature ranking results, and in some cases it reduced the predictive power of top-ranked variants. The ranking combination method was effective for some pairs of algorithms, but not for others. Overall, this work’s main results are the formulation and evaluation of several algorithmic extensions of LP for use in the analysis of genomic data, as well as the identification of several disease-associated SNPs.

TABLE OF CONTENTS

NOVEL EXTENSIONS OF LABEL PROPAGATION FOR BIOMARKER DISCOVERY IN GENOMIC DATA
GLOSSARY
1.0 INTRODUCTION
    1.1 DIMENSIONALITY REDUCTION
    1.2 OVERVIEW OF LABEL PROPAGATION
    1.3 HYPOTHESIS AND SPECIFIC AIMS
    1.4 CONTRIBUTIONS
    1.5 OVERVIEW OF DISSERTATION
2.0 BACKGROUND
    2.1 GENETIC BASIS OF COMPLEX DISEASES
    2.2 GENETIC VARIATIONS
    2.3 GENOME-WIDE ASSOCIATION STUDIES
    2.4 ANALYSIS OF GENETIC VARIATION DATA
        2.4.1 Linkage Disequilibrium
        2.4.2 Population Stratification
        2.4.3 Computational Complexity
        2.4.4 Avoiding Bias
    2.5 DIMENSIONALITY REDUCTION
        2.5.1 Types of Dimensionality Reduction
    2.6 FEATURE SELECTION METHODS
        2.6.1 Filter Methods
        2.6.2 Wrapper Methods
        2.6.3 Embedded Methods
        2.6.4 Combining Feature Rankings
3.0 ALGORITHMIC METHODS
    3.1 ALGORITHMIC DETAILS
    3.2 DETAILS OF LP ALGORITHM FOR SNP DATA
    3.3 ADDITIONAL LP ALGORITHMS
    3.4 ALGORITHMIC EXTENSIONS
        3.4.1 LP1 score
        3.4.2 Incorporation of prior knowledge
            3.4.2.1 Sources of Knowledge
            3.4.2.2 Use of Prior Knowledge in LP
        3.4.3 Combining feature rankings
4.0 EXPERIMENTAL METHODS
    4.1 DATASETS
        4.1.1 Synthetic dataset
        4.1.2 GAW17 semi-synthetic dataset
        4.1.3 Late-onset Alzheimer’s disease datasets
        4.1.4 WTCCC GWAS datasets
        4.1.5 Quality control for GWAS datasets
        4.1.6 Prior knowledge sources
    4.2 EVALUATION
        4.2.1 Precision-recall curves
        4.2.2 Evaluation of predictive performance
        4.2.3 Reproducibility of biomarkers
        4.2.4 Evidence of biological validity
    4.3 COMPARISON ALGORITHMS
        4.3.1 Chi squared test
        4.3.2 Sigmoid Weighted ReliefF
        4.3.3 Sparse Logistic Regression
5.0 EXPERIMENTAL RESULTS
    5.1 EVALUATION OF LP3 AND LP1
        5.1.1 Synthetic Data
        5.1.2 GAW17 Data